So finally, here is the long-awaited draft:
http://tinyurl.com/34yckl
It addresses some of the OpenType, Unicode and font-related issues. Many of the issues discussed here have been a source of conflict, especially for ml_IN, so a detailed analysis like this was badly needed. I hope the illustrations in it provide some common guidelines. There is certainly scope for improvement, and I would like to hear from the various communities if they want any of the issues that were left out to be addressed as well.
The draft is open for discussion and feedback.
Hi राहुल,
Posting my comments here, primarily for Devanagari.
First, with reference to the issues that were raised.
1. अॅ (Issue 1): this form is also used in Hindi for transliterating foreign-language words, though it is more common in Marathi.
The same applies to आॅ, which should be rendered the same way as ऑ, although that would mean promoting two encodings of the same text. Perhaps the renderer can automatically convert आॅ to ऑ.
2. ऱ्: I'm curious whether there are any other instances in Marathi that require ZWJ, i.e. where leaving out the ZWJ would change the word grammatically. I ask because the use of ZWJ for ऱ् is often quoted on the Unicode mailing lists, even though ऱ् can be encoded without ZWJ.
3. Nothing can be done about the "redundant" code points, but we could have the rendering engine always produce the most compact encoding, i.e. automatically convert क़ to the single code point.
4. Lack of encoding for the ङ्क, ङ्ख, ङ्ग, ङ्घ conjuncts: most common fonts lack these conjuncts, so users resort to the anusvara instead. That is acceptable, but the font's glyphs are forcing a particular behavior on users. Encoding of these conjuncts should be mandatory.
5. Text editor: could you add a section on text editors as well, alongside fonts, the rendering engine and Unicode? Text editors should be designed to compact text to the form with the fewest code points.
For example, क़ typed as two code points gets replaced with the single code point, and र + halant + ZWJ gets replaced with ऱ + halant during storage or data transfer.
This way we save space (which is not really the goal) and also get a consistent code-point sequence for common words (which is the goal).
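Items 1, 3 and 5 above all come down to rewriting certain code-point sequences into a shorter equivalent. A minimal Python sketch of such an editor-side pass could look like the following; the `COMPACT_FORMS` table and the `compact()` and `code_points()` helpers are hypothetical and not part of any existing editor or renderer. Note that this is a custom mapping, not standard Unicode normalization: NFC and NFD both decompose क़ (U+0958) into क + ़, because U+0958 to U+095F are composition exclusions, so text compacted this way is not in NFC.

```python
import unicodedata

# Hypothetical "compaction" table (long spelling -> single-code-point form).
# This is NOT Unicode normalization: NFC/NFD decompose U+0958..U+095F
# (क़ etc.) into consonant + nukta, so compacted text is not in NFC.
COMPACT_FORMS = {
    # आ + candra E vowel sign -> ऑ (DEVANAGARI LETTER CANDRA O)
    "\u0906\u0945": "\u0911",
    # र + virama + ZWJ -> ऱ + virama (eyelash ra)
    "\u0930\u094d\u200d": "\u0931\u094d",
}
# consonant + nukta -> precomposed letter, for U+0958 (क़) .. U+095F (य़)
for cp in range(0x0958, 0x0960):
    COMPACT_FORMS[unicodedata.normalize("NFD", chr(cp))] = chr(cp)


def compact(text: str) -> str:
    """Replace each long spelling with its single-code-point form."""
    for long_form, short_form in COMPACT_FORMS.items():
        text = text.replace(long_form, short_form)
    return text


def code_points(text: str) -> str:
    """Show a string as space-separated U+XXXX values."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


if __name__ == "__main__":
    samples = [
        "\u0906\u0945\u092b\u093f\u0938",  # आॅफिस
        "\u0915\u093c\u0932\u092e",        # क़लम typed as क + ़
        "\u0930\u094d\u200d\u092f",        # र + ् + ZWJ + य
    ]
    for sample in samples:
        print(code_points(sample), "->", code_points(compact(sample)))
    # Standard normalization goes the other way for क़:
    print(code_points(unicodedata.normalize("NFC", "\u0958")))  # U+0915 U+093C
```

Because the table runs against the direction of NFC, an editor adopting something like this would also need to avoid NFC-normalizing the text afterwards, or the nukta forms would be split again.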
आलोक
Hi Alok,
Thanks for your comments.
1. I have received some feedback that an independent vowel for candra E, i.e. अॅ, has already been proposed to Unicode and is currently in the beta of Unicode 5.1.0. I don't think anything special is needed for the आॅ issue, since ऑ is already encoded, but we may manipulate the sequence in keyboard maps.
2. Users can always use ZWJ/ZWNJ as described in the Unicode Standard. They are generally used for alternative half forms and are sometimes used in Marathi. ZWNJ is more common on a few websites; that usage may not always follow Unicode, but in a few cases it is required.
3. Removing redundancy is also useful for ensuring proper combinations, but the actual data size cannot be reduced.
4. Formation of the conjuncts is entirely up to the font; at most we can make sure these conjuncts are included in the font. But convincing Unicode to add them would be very difficult.
5. The wide variety of text editors may not provide compact encoding, but we can always modify a few of them or create new ones.
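To make points 3 and 4 concrete, here is a small Python sketch of what the text actually stores; the `PAIRS` table and the variable names are illustrative only. It prints the UTF-8 size of the single-code-point and multi-code-point spellings (every Devanagari code point and ZWJ takes three bytes in UTF-8), and shows that a conjunct such as ङ्क has no code point of its own: it is stored as ङ + virama + क, while the anusvara spelling अंक is a different string.

```python
# Stored size of the compact vs. long spellings discussed above.
PAIRS = {
    "qa": ("\u0958", "\u0915\u093c"),                      # क़ vs क + ़
    "eyelash ra": ("\u0931\u094d", "\u0930\u094d\u200d"),  # ऱ् vs र + ् + ZWJ
}


def utf8_len(text: str) -> int:
    """Number of bytes the string occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


for name, (short, long_form) in PAIRS.items():
    print(f"{name}: {utf8_len(short)} bytes vs {utf8_len(long_form)} bytes")
# qa: 3 bytes vs 6 bytes
# eyelash ra: 6 bytes vs 9 bytes

# A conjunct like ङ्क is a sequence; whether it shows as a ligature is
# decided by the font, not by the stored text.
NGA, VIRAMA, ANUSVARA = "\u0919", "\u094d", "\u0902"
ank_conjunct = "\u0905" + NGA + VIRAMA + "\u0915"  # अङ्क
ank_anusvara = "\u0905" + ANUSVARA + "\u0915"      # अंक
print(len(ank_conjunct), len(ank_anusvara), ank_conjunct == ank_anusvara)
# 4 3 False
```

Since अङ्क and अंक are different strings, a plain comparison or search treats them as different words unless an application folds one spelling into the other.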