Term · technical
Script-based language detection
A method of classifying an ad creative's language by examining which Unicode script blocks dominate its text, without using a probabilistic language model. Example mappings used by tgadsspy: U+1200–U+137F (Ethiopic) → Amharic, U+1000–U+109F (Myanmar) → Burmese, U+0600–U+06FF (Arabic) → Arabic/Persian/Urdu (disambiguated by language-specific characters). For short ad copy in low-resource languages, where ML models lack training data, this approach is faster and more reliable than statistical detection. Latin-script languages (English, German, Indonesian, Malay) still require keyword/morphology heuristics because they share the same script block.
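The idea can be illustrated with a minimal Python sketch. It is not tgadsspy's actual implementation: the three code point ranges come from this entry, but the function names, the ISO 639-1 return codes, the "latin-undetermined" label, and the marker characters used to separate Urdu and Persian from Arabic are all assumptions made for illustration.

```python
from collections import Counter

# Code point ranges from the glossary entry; the script labels are illustrative.
SCRIPT_RANGES = [
    (0x1200, 0x137F, "ethiopic"),
    (0x1000, 0x109F, "myanmar"),
    (0x0600, 0x06FF, "arabic"),
]

# Scripts that map to a single language in this sketch (ISO 639-1 codes).
SCRIPT_TO_LANGUAGE = {"ethiopic": "am", "myanmar": "my"}

# Assumed marker letters, not taken from the source.
# Urdu: U+0679, U+0688, U+0691, U+06BA, U+06D2.
URDU_MARKERS = set("\u0679\u0688\u0691\u06ba\u06d2")
# Persian: U+067E, U+0686, U+0698, U+06AF. Urdu also uses most of these,
# so the Urdu check runs first.
PERSIAN_MARKERS = set("\u067e\u0686\u0698\u06af")


def script_of(ch):
    """Return a script label for one character, or None if it is not counted."""
    cp = ord(ch)
    for low, high, name in SCRIPT_RANGES:
        if low <= cp <= high:
            return name
    # Simplification: only unaccented ASCII letters count as Latin.
    if "a" <= ch <= "z" or "A" <= ch <= "Z":
        return "latin"
    return None


def detect_language(text):
    """Guess a language code from the dominant script in the text."""
    counts = Counter(s for s in map(script_of, text) if s is not None)
    if not counts:
        return "unknown"
    dominant = counts.most_common(1)[0][0]
    if dominant == "arabic":
        chars = set(text)
        if chars & URDU_MARKERS:
            return "ur"
        if chars & PERSIAN_MARKERS:
            return "fa"
        return "ar"
    if dominant == "latin":
        # Shared script block: keyword/morphology heuristics would go here.
        return "latin-undetermined"
    return SCRIPT_TO_LANGUAGE[dominant]
```

Counting characters per script, rather than stopping at the first match, keeps a Latin brand name or URL inside otherwise Ethiopic or Arabic copy from flipping the result.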
Related terms
- Niche Classification · Automatic or manual categorization of ads by vertical: crypto, trading, gambling, VPN, tech, news, retail, finance, gaming, educa…
- Geo Targeting · Setting ads to be shown only to users in specific countries or regions. On the Telegram Ads Platform, advertisers target by channe…
- Low-resource language · A language with limited training corpora available for machine-learning detection or translation, typically languages spoken by s…
Cite this entry
Telegram Ads Spy (2026). "Script-based language detection" in Telegram Ads glossary. https://tgadsspy.com/info/script-based-language-detection
Licensed CC-BY-4.0 — reuse allowed including commercial, attribution required.