Unicode 5.1 release and Indic changes

Unicode 5.1 was released earlier this month, on 4th April. Here I have put up a diff of the Unicode 5.1 character database against that of Unicode 5.0. My buddy Parag also did a nice job of summarizing the Indic-specific changes, which I am restating here.

So, here are the updates to the UCD for Indian scripts:

A. New Indic Scripts Added to Unicode:

1. LEPCHA:

Lepcha is a language spoken by the Lepcha people in Sikkim, India, and in parts of Nepal and Bhutan. The Lepcha script (also known as "róng") is a syllabic script which has a lot of special marks and requires ligatures. Its genealogy is unclear. Early Lepcha manuscripts were written vertically, a sign of Chinese influence. Lepcha is considered to be one of the aboriginal languages of the area in which it is spoken.

The total number of speakers is around 50,000.

Unicode Range => U+1C00 to U+1C4F

Chart URL => http://www.unicode.org/charts/PDF/U1C00.pdf

2. OL-CHIKI:

The Ol Chiki script, also known as Ol Cemetʼ ("language of writing"), Ol Ciki, Ol (and sometimes as the Santali alphabet), was created in 1925 by Pandit Raghunath Murmu for the Santali language. Santali, the language of the Santals, belongs to the Munda subfamily of Austro-Asiatic and is related to Ho and Mundari. It is spoken by about six million people in India, Bangladesh, Nepal, and Bhutan. Most of its speakers live in India, in the states of Jharkhand, Assam, Bihar, Orissa, Tripura, and West Bengal. Although the language has its own alphabet in Ol Chiki, literacy is very low, between 10 and 30%.

Unicode Range => U+1C50 to U+1C7F

Chart URL => http://www.unicode.org/charts/PDF/U1C50.pdf

3. SAURASHTRA:

Saurashtra, more correctly Sauraṣṭri or Sauraṣṭram, and also known as Sourashtra, Sowrashtra, Saurashtram or Palkar, is an Indo-Aryan language spoken in parts of the southern Indian state of Tamil Nadu. The Saurashtra community is referred to by the same name, or sometimes by the Tamil name, Pattunoolkaarar. The Ethnologue puts the number of speakers at 510,000 (1997 IMA), although the actual number could be double this figure or even more.

Unicode Range => U+A880 to U+A8D9

Chart URL => http://www.unicode.org/charts/PDF/UA880.pdf
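
For anyone who wants to check whether a piece of text uses one of these three new scripts, here is a minimal Python sketch. It only uses the ranges and the chart URL pattern listed above; the names NEW_SCRIPT_RANGES, script_of and chart_url are my own illustrative choices, not part of any library, and the lookup of character names assumes a Python whose bundled Unicode database is 5.1 or newer.

```python
import unicodedata

# Ranges exactly as listed in this post (first to last code point).
NEW_SCRIPT_RANGES = {
    "LEPCHA": (0x1C00, 0x1C4F),
    "OL CHIKI": (0x1C50, 0x1C7F),
    "SAURASHTRA": (0xA880, 0xA8D9),
}


def script_of(ch):
    """Return the new-script name whose range contains ch, or None."""
    cp = ord(ch)
    for script, (first, last) in NEW_SCRIPT_RANGES.items():
        if first <= cp <= last:
            return script
    return None


def chart_url(first):
    """Build a chart URL from a range start, following the URL pattern above."""
    return "http://www.unicode.org/charts/PDF/U%04X.pdf" % first


for script, (first, last) in NEW_SCRIPT_RANGES.items():
    # Passing a default to unicodedata.name avoids a ValueError
    # if the code point has no name in the bundled database.
    first_name = unicodedata.name(chr(first), "<no name>")
    print(script, chart_url(first), first_name)
```

A plain range check like this is enough for three scripts; for anything larger, a sorted list with binary search would probably be a better fit.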


B. Updates to Existing Scripts in Unicode:

1. DEVANAGARI (2 New Characters):

0971; SIGN HIGH SPACING DOT
0972; LETTER CANDRA A


2. GURMUKHI (2 New Characters):

0A51; SIGN UDAAT
0A75; SIGN YAKASH


3. ORIYA (3 New Characters):

0B44; VOWEL SIGN VOCALIC RR
0B62; VOWEL SIGN VOCALIC L
0B63; VOWEL SIGN VOCALIC LL

4. TAMIL (1 New Character):

0BD0; OM

5. TELUGU (13 New Characters):

0C3D; SIGN AVAGRAHA
0C58; LETTER TSA
0C59; LETTER DZA
0C62; VOWEL SIGN VOCALIC L
0C63; VOWEL SIGN VOCALIC LL
0C78; FRACTION DIGIT ZERO FOR ODD POWERS OF FOUR
0C79; FRACTION DIGIT ONE FOR ODD POWERS OF FOUR
0C7A; FRACTION DIGIT TWO FOR ODD POWERS OF FOUR
0C7B; FRACTION DIGIT THREE FOR ODD POWERS OF FOUR
0C7C; FRACTION DIGIT ONE FOR EVEN POWERS OF FOUR
0C7D; FRACTION DIGIT TWO FOR EVEN POWERS OF FOUR
0C7E; FRACTION DIGIT THREE FOR EVEN POWERS OF FOUR
0C7F; SIGN TUUMU

6. MALAYALAM (17 New Characters):

0D3D; SIGN AVAGRAHA
0D44; VOWEL SIGN VOCALIC RR
0D62; VOWEL SIGN VOCALIC L
0D63; VOWEL SIGN VOCALIC LL
0D70; NUMBER TEN
0D71; NUMBER ONE HUNDRED
0D72; NUMBER ONE THOUSAND
0D73; FRACTION ONE QUARTER
0D74; FRACTION ONE HALF
0D75; FRACTION THREE QUARTERS
0D79; DATE MARK
0D7A; LETTER CHILLU NN
0D7B; LETTER CHILLU N
0D7C; LETTER CHILLU RR
0D7D; LETTER CHILLU L
0D7E; LETTER CHILLU LL
0D7F; LETTER CHILLU K

All the New Unicode Charts can now be found here:

http://www.unicode.org/charts/
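
The lists above give the character names without the script prefix; in the character database the full name carries it (for example 0972 is DEVANAGARI LETTER CANDRA A). As a rough way to check whether a given Python build already knows these characters, something like the following sketch could be used. It covers only a subset of the additions, the names NEW_CHARS and check_names are hypothetical, and it assumes the interpreter's unicodedata module is based on Unicode 5.1 or later (see unicodedata.unidata_version).

```python
import unicodedata

# A subset of the additions listed above, keyed by script prefix.
NEW_CHARS = {
    "DEVANAGARI": {
        0x0971: "SIGN HIGH SPACING DOT",
        0x0972: "LETTER CANDRA A",
    },
    "GURMUKHI": {
        0x0A51: "SIGN UDAAT",
        0x0A75: "SIGN YAKASH",
    },
    "ORIYA": {
        0x0B44: "VOWEL SIGN VOCALIC RR",
        0x0B62: "VOWEL SIGN VOCALIC L",
        0x0B63: "VOWEL SIGN VOCALIC LL",
    },
    "TAMIL": {
        0x0BD0: "OM",
    },
}


def check_names(table):
    """Return (code point, expected name, actual name) for every mismatch."""
    mismatches = []
    for script, chars in table.items():
        for cp, short_name in chars.items():
            expected = script + " " + short_name
            # The default is returned when the database has no name for cp.
            actual = unicodedata.name(chr(cp), "<unassigned>")
            if actual != expected:
                mismatches.append((cp, expected, actual))
    return mismatches


print("Unicode database version:", unicodedata.unidata_version)
print("Mismatches:", check_names(NEW_CHARS))
```

An empty mismatch list means the database agrees with the names in this post; an "<unassigned>" entry would suggest a database older than 5.1.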


There is a lot more to discuss about the Tamil and Malayalam changes than just the additional characters. On one hand, I think the Tamil community will be happy that Unicode has added Tamil named character sequences to simplify script processing; on the other hand, the Malayalam community is not so happy about the atomic chillu characters. Here is their opposition.

I am personally very happy that 0972 (LETTER CANDRA A) has been added to Devanagari. This will help fix 'Apple' and 'Anaconda' for Marathi. Also, the inclusion of the Ol Chiki script is a very good initiative.

There is actually a lot of work to be done for all these changes, ranging from fonts and rendering to keymaps and locales. I will have to come up with the details of all that very soon.
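
To make both points a little more concrete, here is a small Python sketch. It is only an illustration under a few assumptions: unicodedata.lookup resolves named sequences only from Python 3.3 onwards; the two Tamil sequence names are picked as examples; and I am assuming the usual pre-5.1 spelling of a chillu as consonant + virama + ZERO WIDTH JOINER. The helper same_after_nfc is my own name, and the sketch does not try to restate the Malayalam community's arguments.

```python
import unicodedata

# Tamil: a named sequence gives one name to several code points.
moo = unicodedata.lookup("TAMIL SYLLABLE MOO")
kss = unicodedata.lookup("TAMIL CONSONANT KSS")
print("TAMIL SYLLABLE MOO  =", [f"U+{ord(c):04X}" for c in moo])
print("TAMIL CONSONANT KSS =", [f"U+{ord(c):04X}" for c in kss])

# Malayalam: the new atomic chillu N versus the older sequence
# (assumed here to be NA + VIRAMA + ZERO WIDTH JOINER).
CHILLU_N_ATOMIC = "\u0D7B"
CHILLU_N_SEQUENCE = "\u0D28\u0D4D\u200D"


def same_after_nfc(a, b):
    """True if the two strings are identical after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The atomic chillu has no decomposition mapping, so normalization
# does not unify the two spellings.
print("Decomposition of U+0D7B:", repr(unicodedata.decomposition(CHILLU_N_ATOMIC)))
print("Same after NFC:", same_after_nfc(CHILLU_N_ATOMIC, CHILLU_N_SEQUENCE))
```

The practical consequence, as far as I can tell, is that search, sorting and comparison code has to deal with two spellings of the same chillu unless the text is converted one way or the other first.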
