What is Unicode and Why It Matters
Unicode is the universal standard that assigns a unique number (a code point) to each character of the world's writing systems. Think of it as a global phone book for letters, symbols, and characters. Without Unicode support, your language's unique characters cannot be reliably stored, exchanged, or displayed: they appear as question marks or boxes, or are replaced with incorrect characters.
Key fact: Unicode now encodes over 150,000 characters covering more than 160 modern and historic scripts, and the repertoire grows with each new version. If your language's characters aren't in Unicode, they cannot be reliably displayed on computers, phones, or the internet.
Unicode is critical because it ensures:
- Universal compatibility: Your text displays correctly on any device, anywhere in the world
- Data preservation: Digital texts remain readable for future generations
- Searchability: People can search for words in your language online
- Software development: Developers can create apps and websites supporting your language
- Digital communication: People can text, email, and post on social media in their language
How to Check if Your Characters Are in Unicode
Before starting any digitization project, you must verify whether your language's writing system is already supported in Unicode. Here are several ways to check:
Method 1: Unicode Character Search
Visit the Official Unicode Charts
Go to unicode.org/charts and look for your script. The code charts are organized by script (such as Latin, Cyrillic, or Cherokee) and grouped by region.
Use Online Unicode Tools
Websites like Compart Unicode or Unicode Table let you search for specific characters or browse by script.
Test Your Characters
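Try typing or pasting your language's text into several applications. If the characters display correctly across different platforms, they are probably in Unicode. However, correct display can also come from a custom font, so confirm each character's code point in the charts.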
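Looking up each character's code point is a more reliable check than visual testing. The minimal sketch below uses Python's standard `unicodedata` module to report the code point, official name, and general category of every distinct character in a sample string. It also flags Private Use Area and unassigned code points. Results reflect the Unicode version bundled with your Python interpreter (`unicodedata.unidata_version`), so a very recent addition may show as unassigned on an older Python. The function names `describe` and `check_text` are illustrative.

```python
import unicodedata

# Private Use Area ranges: the BMP block plus supplementary planes 15 and 16.
PUA_RANGES = [(0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD)]


def describe(ch: str) -> str:
    """Return a one-line report on how a single character is encoded."""
    cp = ord(ch)
    label = f"U+{cp:04X}"
    if any(lo <= cp <= hi for lo, hi in PUA_RANGES):
        return f"{label}  PRIVATE USE (not a standard Unicode character)"
    category = unicodedata.category(ch)
    if category == "Cn":  # "Other, not assigned" in the Unicode database
        return f"{label}  UNASSIGNED in this Python's Unicode database"
    return f"{label}  {unicodedata.name(ch, '<no name>')}  [{category}]"


def check_text(text: str) -> list[str]:
    """Describe each distinct non-space character, in first-seen order."""
    unique = dict.fromkeys(text)  # keeps order, drops duplicates
    return [describe(ch) for ch in unique if not ch.isspace()]


if __name__ == "__main__":
    print("Unicode database version:", unicodedata.unidata_version)
    # Cherokee sample, a Latin letter with macron, and one PUA code point.
    for line in check_text("ᏣᎳᎩ ā\ue000"):
        print(line)
```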
Common Scenarios
| Scenario | What You'll See | What It Means |
|---|---|---|
| Fully Supported | ᏣᎳᎩ ᎦᏬᏂᎯᏍᏗ | Characters display correctly everywhere, so you're ready to proceed! |
| Not Supported | □□□ ??? | Boxes or question marks on every system: if a code point check confirms the characters are missing from Unicode, a proposal is needed |
| Partial Support | Āā Ēē □□ | Some characters work and others don't, so identify the missing characters |
| Font Issue | Characters work in some apps but not others | Unicode support exists, but fonts need to be installed |
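Boxes and question marks usually have different causes. Boxes (often called "tofu") typically mean the font in use has no glyph for a character that may well be encoded. Question marks typically mean the text was converted to an encoding that cannot represent those characters, which destroys the original data. This short Python sketch reproduces the second case using the Cherokee sample from the table. The variable names are illustrative.

```python
text = "ᏣᎳᎩ"

# Stored as UTF-8, the text survives a round trip unchanged.
utf8_bytes = text.encode("utf-8")
assert utf8_bytes.decode("utf-8") == text

# Converting to Latin-1, which has no Cherokee letters, replaces each
# character with "?"; the original characters cannot be recovered.
lossy = text.encode("latin-1", errors="replace").decode("latin-1")
print(lossy)  # ???

# Decoding UTF-8 bytes with the wrong codec yields U+FFFD REPLACEMENT
# CHARACTER (often rendered as a diamond with a question mark) instead.
garbled = utf8_bytes.decode("ascii", errors="replace")
print(garbled)
```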
What If Your Characters Aren't in Unicode?
If your language's writing system isn't in Unicode, don't panic! Many scripts have been added to Unicode through the proposal process. Here's what you need to know:
The Unicode Proposal Process
Research and Documentation
Gather evidence of your writing system's use: historical documents, modern publications, educational materials, and examples of the script in actual use. The more evidence, the stronger your proposal.
Character Inventory
Create a complete list of all characters needed, including:
- Basic letters/symbols
- Diacritical marks (accents, tones)
- Punctuation specific to your language
- Numbers (if unique to your script)
- Any combining characters
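When compiling the inventory, check whether an apparently new letter is really an existing base letter plus combining marks. Unicode generally does not encode new precomposed letters when the same text can be written as a base character followed by combining marks, so such letters usually need no proposal. The Python sketch below uses `unicodedata.normalize` to show that the precomposed ā and the sequence a + U+0304 COMBINING MACRON are canonically equivalent. It also lists the code points a given letter decomposes into. The helper name `decompose` is illustrative.

```python
import unicodedata


def decompose(s: str) -> list[str]:
    """List the code points of s after canonical decomposition (NFD)."""
    return [f"U+{ord(c):04X} {unicodedata.name(c, '<no name>')}"
            for c in unicodedata.normalize("NFD", s)]


precomposed = "\u0101"   # ā as a single code point
sequence = "a\u0304"     # a followed by COMBINING MACRON

# NFC composes the sequence; NFD splits the precomposed letter apart.
assert unicodedata.normalize("NFC", sequence) == precomposed
assert unicodedata.normalize("NFD", precomposed) == sequence

# Any base + mark combination can be written as a sequence, even one
# with two stacked marks.
for line in decompose("\u0101\u0301"):   # ā plus COMBINING ACUTE ACCENT
    print(line)
```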
Technical Preparation
Work with Unicode experts or linguists to prepare technical documentation including character properties, encoding considerations, and implementation guidelines.
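Proposals must specify properties for each new character, such as its general category, bidirectional class, and canonical combining class. A practical starting point is to see how existing characters that behave similarly are defined. Python's `unicodedata` module exposes a subset of the Unicode Character Database, and the sketch below prints those properties for a few reference characters. The reference characters chosen here are only examples, and the full property set is defined in the Unicode Character Database itself.

```python
import unicodedata


def properties(ch: str) -> dict[str, object]:
    """Collect the character properties that unicodedata exposes."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch, "<no name>"),
        "general_category": unicodedata.category(ch),
        "bidi_class": unicodedata.bidirectional(ch),
        "combining_class": unicodedata.combining(ch),
        "decomposition": unicodedata.decomposition(ch),
        "mirrored": bool(unicodedata.mirrored(ch)),
    }


# A left-to-right letter, a combining mark, and a right-to-left letter.
for ch in ["A", "\u0301", "\u0627"]:
    props = properties(ch)
    print(", ".join(f"{key}={value!r}" for key, value in props.items()))
```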
Submit Proposal
Submit your proposal to the Unicode Technical Committee (UTC). The proposal must follow their specific format and include all required sections.
Review Process
The UTC meets quarterly to review proposals. Be prepared to answer questions and provide additional information. The process typically takes at least one to two years, and often longer if your proposal needs revisions.
Important: The Unicode proposal process is complex and technical. Consider partnering with organizations like the Script Encoding Initiative (SEI) at UC Berkeley, which helps communities prepare successful proposals.
Success Stories
Recent Unicode Additions
Many indigenous and minority scripts have been successfully added to Unicode in recent years:
- Osage (2016): The Osage Nation worked with linguists to add their script, enabling digital preservation of their language
- Adlam (2016): Created in the late 1980s to write the Fulani language, this script went from invention to Unicode in under 30 years
- Wancho (2019): Used in northeastern India, added after community advocacy
- Nandinagari (2019): Historical Brahmic script now preserved digitally
- Tangsa (2021): Script for the Tangsa languages of northeastern India and Myanmar, added through a collaborative effort
Temporary Solutions While Awaiting Unicode
While working on Unicode inclusion, you can still make progress with digitization using these approaches:
1. Private Use Area (PUA)
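Unicode reserves code points for private use: U+E000 to U+F8FF (6,400 code points) in the Basic Multilingual Plane, plus planes 15 and 16 (U+F0000 to U+FFFFD and U+100000 to U+10FFFD). You can assign your characters to these code points temporarily: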
Example: Assign your unique character to U+E000
Create fonts that display your character at this position
Share fonts within your community
Limitation: This only works where your custom font is installed. Text won't display correctly for others, and other groups may use the same code points for unrelated characters.
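Assigning PUA code points is easiest to keep consistent if the mapping is generated from a single character inventory and written down. The Python sketch below assigns consecutive code points starting at U+E000 to a hypothetical inventory. The labels and the function name `assign_pua` are placeholders and are not part of any standard.

```python
PUA_START, PUA_END = 0xE000, 0xF8FF  # BMP Private Use Area

# Hypothetical inventory: descriptive labels for characters not yet encoded.
INVENTORY = ["LETTER KA", "LETTER TA", "VOWEL SIGN I", "TONE MARK HIGH"]


def assign_pua(labels: list[str], start: int = PUA_START) -> dict[str, str]:
    """Map each label to a consecutive Private Use Area character."""
    table = {}
    for offset, label in enumerate(labels):
        cp = start + offset
        if not PUA_START <= cp <= PUA_END:
            raise ValueError(f"U+{cp:04X} is outside the BMP Private Use Area")
        table[label] = chr(cp)
    return table


table = assign_pua(INVENTORY)
for label, ch in table.items():
    print(f"U+{ord(ch):04X}  {label}")
```

Sharing the printed table alongside your fonts helps everyone in the community map the same label to the same code point.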
2. Transliteration Systems
Develop a consistent system to represent your language using existing characters:
- Use Latin letters with diacritics (ā, č, ñ)
- Use digraphs for single sounds (ch, sh, th)
- Use numbers for tones (ma1, ma2, ma3)
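Tone numbers are easy to type, while diacritics are easier to read, so it can help to store one form and convert to the other. This Python sketch converts tone numbers into diacritics under a hypothetical convention: 1 = macron, 2 = acute, 3 = caron, with the mark placed on the first vowel of the syllable. Substitute your own orthography's rules. The sketch normalizes to NFC, so a + combining macron becomes the single character ā wherever a precomposed form exists.

```python
import re
import unicodedata

# Hypothetical convention: tone number -> combining diacritic.
TONE_MARKS = {"1": "\u0304", "2": "\u0301", "3": "\u030C"}  # macron, acute, caron
VOWELS = "aeiou"

# A run of letters followed by a tone digit, e.g. "ma1".
SYLLABLE = re.compile(r"([^\W\d_]+)([1-3])")


def _mark_syllable(match: re.Match) -> str:
    letters, tone = match.group(1), match.group(2)
    for i, ch in enumerate(letters):
        if ch.lower() in VOWELS:
            # Insert the combining mark right after the first vowel.
            marked = letters[: i + 1] + TONE_MARKS[tone] + letters[i + 1:]
            return unicodedata.normalize("NFC", marked)
    return match.group(0)  # no vowel found: leave the syllable unchanged


def numbers_to_diacritics(text: str) -> str:
    return SYLLABLE.sub(_mark_syllable, text)


print(numbers_to_diacritics("ma1 ma2 ma3"))  # mā má mǎ
```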
3. Image-Based Solutions
For critical documents, keep high-quality images of the text while awaiting Unicode support. Once the script is encoded, OCR tools trained on it can convert these images into text.
4. Custom Encoding Documentation
Whatever temporary solution you choose, document it thoroughly:
- Create a conversion table between your system and standard Unicode
- Build tools to convert between formats
- Ensure your community understands the temporary nature
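A conversion table is most useful when the tools on both sides read the same data. The Python sketch below converts between PUA-encoded text and a Latin transliteration using one hypothetical table. When converting back, it tries longer transliterations such as "tsa" before shorter ones. The code points and transliterations are placeholders, and a round trip only works if your transliteration is unambiguous.

```python
# Hypothetical conversion table: PUA character -> Latin transliteration.
PUA_TO_LATIN = {"\uE000": "ka", "\uE001": "ta", "\uE002": "tsa"}
LATIN_TO_PUA = {latin: pua for pua, latin in PUA_TO_LATIN.items()}

# Longest transliterations first, so a longer match wins over a shorter one.
_LATIN_KEYS = sorted(LATIN_TO_PUA, key=len, reverse=True)


def pua_to_latin(text: str) -> str:
    return text.translate(str.maketrans(PUA_TO_LATIN))


def latin_to_pua(text: str) -> str:
    out, i = [], 0
    while i < len(text):
        for key in _LATIN_KEYS:
            if text.startswith(key, i):
                out.append(LATIN_TO_PUA[key])
                i += len(key)
                break
        else:  # no transliteration matched: copy the character as-is
            out.append(text[i])
            i += 1
    return "".join(out)


sample = "\uE000\uE002 \uE001"
latin = pua_to_latin(sample)
print(latin)  # katsa ta
assert latin_to_pua(latin) == sample
```

Once your script is encoded, adding a column of official code points to the same table lets a similar tool convert archived PUA text to standard Unicode.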
Resources and Support
Script Encoding Initiative
Free assistance with Unicode proposals for minority scripts
SEI at UC Berkeley
Pro Tip: Start Font Development Early
Even if your characters are in Unicode, you may need custom fonts for proper display. Start developing fonts alongside your Unicode efforts—you'll need them either way!
Ready to Move Forward?
Whether your characters are in Unicode or you need to submit a proposal, the next step is creating functional keyboards for your language.