9fans archive / 2006 / 05 / 179

From: Joel Salomon <joelcsalomon@gma...>
Subject: combining characters
Date: Sun, 21 May 2006 13:52:11 -0400

On 5/19/06, quanstro@qua... <quanstro@qua...> wrote:
> On Fri May 19 17:55:50 CDT 2006, joelcsalomon@gma... wrote:
> > Take Hebrew, for instance: 27 letters (including the 5 final forms) +
> > a few alternate forms, 15 vowel marks, 25+ cantillation marks --
> > that's more than 10,000 combinations right there.
>
> i don't know hebrew very well, but are you confusing glyphs with characters?
>
> for example arabic has three letter forms: initial, final and medial.
> (there is a different shape for the same letter at the beginning, middle
> and end of the word.)
>
> so in arabic, a good renderer would need three glyphs for each codepoint.

Hebrew final forms are not much of a problem; they are separate
characters, typed with different keys. It's the vowel marks needed
sometimes (whn yu cnnt nfr th vwls frm cntxt) and the cantillation
marks needed for Biblical text that make for the code space explosion.

Arabic text rendering -- for readable plain text, not just for "fancy"
typesetting -- requires yet another clever set of algorithms. I don't
know that there's a way to manage this complexity without giving
"fonts" their own context-aware programming language, like TrueType.

--Joel

-- 
It reverses the normal flow of conversation.
> What's wrong with top-posting?
> > Top-posting.
> > > What's the biggest scourge on plain text email discussions?
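
The "code space explosion" arithmetic in the message can be checked with a
short sketch. It uses only the counts quoted above (27 letters, 15 vowel
marks, 25 cantillation marks) and ignores the "few alternate forms" and the
"+" in "25+", so the product is a lower bound. The function names and the
sample sequence (bet, qamats, etnahta) are assumptions chosen for
illustration; they do not come from the thread.

```python
import unicodedata

# Counts quoted in the message; alternate forms and the "+" in "25+" are ignored.
LETTERS = 27
VOWEL_MARKS = 15
CANTILLATION_MARKS = 25


def precomposed_count(letters, vowels, cantillations):
    """Code points needed if every letter+vowel+cantillation
    combination were assigned its own precomposed code point."""
    return letters * vowels * cantillations


def combining_count(letters, vowels, cantillations):
    """Code points needed if vowel and cantillation marks are
    separate combining characters that follow the base letter."""
    return letters + vowels + cantillations


# Illustrative sequence (hypothetical example, not from the message):
# base letter bet, vowel point qamats, cantillation accent etnahta.
SAMPLE = "\u05D1" + "\u05B8" + "\u0591"


def describe(text):
    """Return (code point, is_combining) for each character in text.
    A nonzero canonical combining class marks a combining character."""
    return [(f"U+{ord(ch):04X}", unicodedata.combining(ch) != 0) for ch in text]


if __name__ == "__main__":
    print("precomposed:", precomposed_count(LETTERS, VOWEL_MARKS, CANTILLATION_MARKS))
    print("combining:  ", combining_count(LETTERS, VOWEL_MARKS, CANTILLATION_MARKS))
    for code_point, is_combining in describe(SAMPLE):
        print(code_point, "combining" if is_combining else "base")
```

With combining marks the repertoire grows by addition rather than
multiplication, which moves the difficulty out of the character set and into
the renderer that has to position the marks.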