Homai Knowledge Base
Discover detailed guides and step-by-step tutorials designed to help you digitize your language, from creating fonts to training advanced AI models.
Introduction
Why Digitize Your Language?
Understand the significance of digital presence for preserving and revitalizing indigenous languages in today’s connected world.
Digitization Process Overview
Discover how your language goes digital: from concept to practical tools and apps.
Glossary
Not familiar with ASR, TTS, corpora, or Hugging Face? Here's your guide to essential terms.
Stage 1: Digital Foundations, Fonts and Keyboards
Unicode Characters
Check if your language’s characters exist in Unicode and learn how to request additions.
Creating Desktop Keyboards
Step-by-step instructions to build keyboard layouts for Windows, macOS, and Linux.
Ordering Mobile Keyboards
Find out who can create mobile keyboards for iOS and Android, what you’ll need, and estimated costs.
Stage 2: Data Collection and Preparation
Digitizing Texts (OCR)
How to convert printed books and documents into digital text using FineReader, vision LLMs, and other ML tools.
Digitizing Dictionaries
Methods for turning printed dictionaries into structured databases. This foundational step directly determines the quality of your future corpora and AI models.
Creating a Monolingual Corpus
How to clean and format digitized texts to create a high-quality text corpus.
Parallel Corpora
Where to find parallel texts, and which tools can automatically align sentence pairs.
Validating Alignments
Quickly check the quality of automatically aligned texts with volunteer and community help, using tools like Telegram bots.
Recording Audio for TTS
Best practices for choosing equipment, recording spaces, and speaker guidelines for quality speech synthesis datasets.
Collecting Data for ASR
Effective methods for gathering speech data: from Common Voice contributions to scripted recordings.
Uploading Data to Hugging Face
How to format and upload your text and audio datasets to Hugging Face. (Paid automation tool available!)
Stage 3: DIY Model Training
Training ASR Models (Automatic Speech Recognition)
An overview of leading models (Wav2Vec2, Whisper), including tutorials and code references for training your own ASR models.
Training TTS Models (Speech Synthesis)
Understand popular frameworks (Tacotron, VITS), their strengths and weaknesses, and follow step-by-step training instructions.
Coming Soon
Training MT Models (Machine Translation)
Learn the basics of neural machine translation (NMT), with detailed guides and code resources.
Creating Spellcheckers
Methods and tools to develop effective spellcheck systems for your language.
Coming Soon
Simplify Your Model Training!
We offer easy-to-use, paid tools for ASR, TTS, and MT training: no programming needed, just provide your dataset and launch your training.
Reach out to learn more
Stage 4: Applying Language Technologies
Ideas and Opportunities
Explore practical ways to put your ASR, TTS, and MT models to use: smart assistants, content translation, educational apps, and more.
Coming Soon
Complete, Ready-to-Use Solutions
Discover our turnkey products, such as smart speakers and automated video translation, that seamlessly integrate your trained AI models.
View Products