Unicode Characters

Ensuring your language's writing system is properly represented in the digital world

What is Unicode and Why It Matters

Unicode is the universal standard that aims to assign a unique number, called a code point, to every character in every writing system in the world. Think of it as a global phone book for letters, symbols, and characters. Without Unicode support, your language's unique characters cannot be reliably used in the digital world: they appear as question marks or boxes, or are replaced with incorrect characters.

Key fact: Unicode now encodes over 150,000 characters covering more than 160 modern and historic scripts. If your language's characters aren't in Unicode, they cannot be reliably displayed or exchanged on computers, phones, or the internet.
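
To make the "unique number" idea concrete, here is a minimal Python sketch (the language and the function name `describe` are our own choices for illustration). For each character it shows the code point, which is the number Unicode assigns, and the bytes that number occupies when stored as UTF-8, the most common Unicode encoding on the web.

```python
import unicodedata


def describe(text: str) -> list[tuple[str, str, str]]:
    """Return (character, code point, UTF-8 bytes as hex) for each character."""
    rows = []
    for ch in text:
        code_point = f"U+{ord(ch):04X}"          # the number Unicode assigns
        utf8_hex = ch.encode("utf-8").hex(" ")    # how that number is stored as bytes
        rows.append((ch, code_point, utf8_hex))
    return rows


if __name__ == "__main__":
    # "a", Latin "Ā" (U+0100) and the Cherokee letter "Ꭰ" (U+13A0)
    for ch, cp, utf8_hex in describe("a\u0100\u13a0"):
        print(cp, unicodedata.name(ch), utf8_hex)
```

The output shows that one character can take different amounts of storage: "a" needs one byte, "Ā" needs two, and the Cherokee letter needs three. The code point stays the same on every device; only the byte encoding varies.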

Unicode is critical because it ensures:

  • Universal compatibility: Your text displays correctly on any device, anywhere in the world
  • Data preservation: Digital texts remain readable for future generations
  • Searchability: People can search for words in your language online
  • Software development: Developers can create apps and websites supporting your language
  • Digital communication: People can text, email, and post on social media in their language

How to Check if Your Characters Are in Unicode

Before starting any digitization project, you must verify whether your language's writing system is already supported in Unicode. Here are several ways to check:

Method 1: Unicode Character Search

1. Visit the Official Unicode Charts

Go to unicode.org/charts and look for your script. The charts are grouped by region and then by script (Latin, Cyrillic, Cherokee, etc.).

2. Use Online Unicode Tools

Websites like Compart Unicode or Unicode Table let you search for specific characters or browse by script.

3. Test Your Characters

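Try typing or pasting your language's text into various applications. If the characters display correctly across different platforms and fonts, they are likely encoded in Unicode. Text that displays correctly only with one special font may instead be using private-use code points, which are not standard Unicode.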
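
If you have Python available, you can also check characters programmatically. The sketch below (the function name `inspect_text` is hypothetical) uses the standard `unicodedata` module to report each character's code point, official name, and general category. It assumes that Python's bundled Unicode data is recent enough: `unicodedata.unidata_version` tells you which Unicode version your interpreter knows, and it can lag behind the latest release, so confirm anything reported as unassigned against the official charts.

```python
import unicodedata


def inspect_text(text: str) -> list[dict[str, str]]:
    """Report code point, name, general category, and status for each character."""
    report = []
    for ch in text:
        category = unicodedata.category(ch)
        if category == "Co":
            status = "private use (not standard Unicode)"
        elif category == "Cn":
            status = "not assigned in this Python's Unicode data"
        else:
            status = "assigned"
        report.append({
            "char": ch,
            "code_point": f"U+{ord(ch):04X}",
            "name": unicodedata.name(ch, "<no name>"),  # default avoids ValueError
            "category": category,
            "status": status,
        })
    return report


if __name__ == "__main__":
    print("Unicode data version:", unicodedata.unidata_version)
    # Cherokee sample plus one private-use character (U+E000) for contrast
    for row in inspect_text("\u13e3\u13b3\u13a9\ue000"):
        print(row["code_point"], row["category"], row["name"], "-", row["status"])
```

Category `Co` flags private-use characters and `Cn` flags code points with no assigned character, which are the two cases that usually explain text that "looks fine on my machine" but nowhere else.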

Common Scenarios

Scenario | What You'll See | What It Means
Fully supported | ᏣᎳᎩ ᎦᏬᏂᎯᏍᏗ | Characters display correctly everywhere; you're ready to proceed.
Not supported | □□□ or ??? | Characters appear as boxes or question marks. Check the code charts first, because a missing font also produces boxes; if the characters are absent from the charts, a Unicode proposal is needed.
Partial support | Āā Ēē □□ | Some characters work and others don't; identify the missing characters.
Font issue | Characters work in some apps but not others | The characters are in Unicode, but a font that covers them needs to be installed.
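
To tell a font issue apart from missing Unicode support, you can check which characters a specific font actually covers. This is a minimal Python sketch under two assumptions: the font file is a TrueType/OpenType font, and the third-party fontTools package is installed (its `TTFont` class and `getBestCmap()` method read the font's character map). The function names and the command-line usage are hypothetical.

```python
import sys


def missing_from_font(text: str, covered: set[int]) -> list[str]:
    """Return the distinct characters of text, in order of first use, that the font does not map."""
    missing: list[str] = []
    for ch in text:
        if ord(ch) not in covered and ch not in missing:
            missing.append(ch)
    return missing


def load_cmap(font_path: str) -> set[int]:
    """Read the code points a font maps to glyphs, via the third-party fontTools package."""
    from fontTools.ttLib import TTFont  # pip install fonttools

    font = TTFont(font_path)
    try:
        cmap = font.getBestCmap() or {}  # {code point: glyph name}; None if no Unicode cmap
        return set(cmap)
    finally:
        font.close()


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python check_font.py MyFont.ttf "text to check"
    font_path, sample = sys.argv[1], sys.argv[2]
    gaps = missing_from_font(sample, load_cmap(font_path))
    if gaps:
        print("Missing:", ", ".join(f"U+{ord(c):04X}" for c in gaps))
    else:
        print("All characters covered")
```

Coverage in the character map is necessary but not sufficient: scripts with complex shaping (stacking marks, contextual forms) also need the font's OpenType layout rules to render correctly.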

What If Your Characters Aren't in Unicode?

If your language's writing system isn't in Unicode, don't panic! Many scripts have been added to Unicode through the proposal process. Here's what you need to know:

The Unicode Proposal Process

1. Research and Documentation

Gather evidence of your writing system's use: historical documents, modern publications, educational materials, and examples of the script in actual use. The more evidence, the stronger your proposal.

2. Character Inventory

Create a complete list of all characters needed, including:

  • Basic letters/symbols
  • Diacritical marks (accents, tones)
  • Punctuation specific to your language
  • Numbers (if unique to your script)
  • Any combining characters
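
For characters that are already encoded (for example, when checking a partially supported script), a first draft of the inventory can be pulled from sample texts. The Python sketch below is one possible approach, not a required method; the function name `character_inventory` is hypothetical. It assumes your sample text is already digital, and it normalizes to NFD so precomposed letters such as "ā" split into a base letter plus a combining mark, making every combining character visible. Characters that are not yet encoded still have to be listed by hand.

```python
import unicodedata
from collections import Counter


def character_inventory(corpus: str) -> list[tuple[str, str, str, int]]:
    """List each distinct non-space character as (code point, category, name, count)."""
    decomposed = unicodedata.normalize("NFD", corpus)  # "ā" -> "a" + COMBINING MACRON
    counts = Counter(ch for ch in decomposed if not ch.isspace())
    inventory = []
    for ch, n in sorted(counts.items(), key=lambda item: ord(item[0])):
        inventory.append((
            f"U+{ord(ch):04X}",
            unicodedata.category(ch),  # e.g. Ll = lowercase letter, Mn = combining mark, Po = punctuation
            unicodedata.name(ch, "<unnamed>"),
            n,
        ))
    return inventory


if __name__ == "__main__":
    for row in character_inventory("\u0100\u0101 \u0112\u0113, \u0101."):
        print(*row)
```

Sorting by code point groups letters, marks, and punctuation in a predictable order, which makes gaps easier to spot when you compare the list against your writing system's full alphabet.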

3. Technical Preparation

Work with Unicode experts or linguists to prepare technical documentation including character properties, encoding considerations, and implementation guidelines.

4. Submit Proposal

Submit your proposal to the Unicode Technical Committee (UTC). The proposal must follow their specific format and include all required sections.

5. Review Process

The UTC meets quarterly to review proposals. Be prepared to answer questions and provide additional information. The process typically takes at least one to two years, and often longer for a new script.

Important: The Unicode proposal process is complex and technical. Consider partnering with organizations like the Script Encoding Initiative (SEI) at UC Berkeley, which helps communities prepare successful proposals.

Success Stories

Recent Unicode Additions

Many indigenous and minority scripts have been successfully added to Unicode in recent years:

  • Osage (2016): The Osage Nation worked with linguists to add their script, enabling digital preservation of their language
  • Adlam (2016): Created for the Fulani (Fula) language in the late 1980s, this script went from invention to Unicode in under 30 years
  • Wancho (2019): Used in northeastern India, added after community advocacy
  • Nandinagari (2019): Historical Brahmic script now preserved digitally
  • Tangsa (2021): Script for the Tangsa languages of India and Myanmar, added through collaborative effort

Temporary Solutions While Awaiting Unicode

While working on Unicode inclusion, you can still make progress with digitization using these approaches:

1. Private Use Area (PUA)

Unicode reserves the Private Use Area, U+E000 to U+F8FF (6,400 code points), for private use, and two further private-use planes exist beyond it. You can assign your characters to these codes temporarily:

For example:

  • Assign your unique character to U+E000
  • Create fonts that display your character at this position
  • Share the fonts within your community

Limitation: This only works where your custom font is installed; other people will not see the text correctly, and other projects may use the same private-use code points for different characters.
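
Keeping the assignments consistent is easier if they are generated once and saved as a table. This minimal Python sketch hands out consecutive code points starting at U+E000 and refuses to go past U+F8FF; the function `assign_pua` and the glyph names are hypothetical placeholders for your own characters.

```python
PUA_START, PUA_END = 0xE000, 0xF8FF  # Basic Multilingual Plane Private Use Area


def assign_pua(glyph_names: list[str], start: int = PUA_START) -> dict[str, str]:
    """Give each glyph name the next consecutive private-use code point."""
    if start < PUA_START or start + len(glyph_names) - 1 > PUA_END:
        raise ValueError("assignments would fall outside U+E000..U+F8FF")
    return {name: chr(start + i) for i, name in enumerate(glyph_names)}


if __name__ == "__main__":
    # Hypothetical names for characters not yet in Unicode
    table = assign_pua(["letter-kwa", "letter-tlo", "tone-mark-high"])
    for name, ch in table.items():
        print(f"U+{ord(ch):04X}\t{name}")  # keep this output as your assignment table
```

The printed table is what your font designer maps glyphs to, and it becomes the starting point for converting text to standard Unicode later.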

2. Transliteration Systems

Develop a consistent system to represent your language using existing characters:

  • Use Latin letters with diacritics (ā, č, ñ)
  • Create digraphs (ch, sh, th for single sounds)
  • Use numbers for tones (ma1, ma2, ma3)
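
A transliteration system is only useful if it can be read back unambiguously, which mostly means that digraphs such as "ch" must be matched before single letters such as "c". The Python sketch below shows greedy longest-match conversion. The example table, which maps transliteration units to IPA, is entirely hypothetical; the same function could later map each unit to native-script characters once they are encoded. Input is normalized to NFC so that "ā" typed as "a" plus a combining macron still matches the table entry.

```python
import unicodedata

# Hypothetical example table: transliteration units -> IPA.
EXAMPLE_TABLE = {
    "ch": "t\u0283",      # digraph for one sound (tʃ)
    "sh": "\u0283",       # ʃ
    "c": "ts",
    "s": "s",
    "m": "m",
    "a": "a",
    "\u0101": "a\u02d0",  # ā -> long a (aː)
    "1": "\u02e5",        # tone number -> tone letter ˥ (extra-high)
    "2": "\u02e7",        # ˧ (mid)
    "3": "\u02e9",        # ˩ (extra-low)
}


def convert(text: str, table: dict[str, str]) -> str:
    """Greedy longest-match conversion: multi-letter units such as "ch" win over "c"."""
    text = unicodedata.normalize("NFC", text)
    longest = max(map(len, table), default=1)
    out = []
    i = 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            unit = text[i:i + size]
            if unit in table:
                out.append(table[unit])
                i += size
                break
        else:
            out.append(text[i])  # unknown character: copy it unchanged
            i += 1
    return "".join(out)
```

For example, `convert("cha1", EXAMPLE_TABLE)` returns "tʃa˥", while `convert("ca", EXAMPLE_TABLE)` returns "tsa", because "ch" is tried before "c".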

3. Image-Based Solutions

For critical documents, use images of the text while awaiting Unicode support. Once the script is encoded, OCR tools trained for it can convert these images into searchable text.

4. Custom Encoding Documentation

Whatever temporary solution you choose, document it thoroughly:

  • Create a conversion table between your system and standard Unicode
  • Build tools to convert between formats
  • Ensure your community understands the temporary nature
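
A conversion tool can be as small as a lookup table plus a check that nothing was missed. In this Python sketch the table is hypothetical: it pretends that three private-use stand-ins correspond to "ā", "ŋ", and "a" with a combining tilde, simply to show that one stand-in may map to a single character or to a sequence. `str.maketrans` and `str.translate` are standard Python; the function names are our own.

```python
import unicodedata

# Hypothetical conversion table: private-use stand-ins -> standard Unicode.
PUA_TO_UNICODE = {
    "\ue000": "\u0101",   # stand-in for LATIN SMALL LETTER A WITH MACRON (ā)
    "\ue001": "\u014b",   # stand-in for LATIN SMALL LETTER ENG (ŋ)
    "\ue002": "a\u0303",  # stand-in for a sequence: "a" + COMBINING TILDE
}


def to_unicode(text: str) -> str:
    """Replace private-use stand-ins, then normalize to NFC (composed form)."""
    table = str.maketrans(PUA_TO_UNICODE)
    return unicodedata.normalize("NFC", text.translate(table))


def leftover_pua(text: str) -> list[str]:
    """List private-use code points still present, e.g. ones missing from the table."""
    return sorted({f"U+{ord(ch):04X}" for ch in text if unicodedata.category(ch) == "Co"})
```

Running `leftover_pua` on converted text is a simple way to confirm that the conversion table is complete before the private-use version is retired.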

Resources and Support

  • Unicode Consortium: official Unicode standards and proposal guidelines (unicode.org)
  • Script Encoding Initiative: free assistance with Unicode proposals for minority scripts (SEI at UC Berkeley)
  • SIL International: linguistic software and font development tools (software.sil.org)
  • Noto Fonts: Google's font family, which aims to cover every script in Unicode (Google Noto)

Pro Tip: Start Font Development Early

Even if your characters are in Unicode, you may need custom fonts for proper display. Start developing fonts alongside your Unicode efforts—you'll need them either way!

Ready to Move Forward?

Whether your characters are in Unicode or you need to submit a proposal, the next step is creating functional keyboards for your language.
