Wednesday, September 11, 2013

My takeaways from the FUEL-GILT Conference 2013

This is going to be a long post, and I am going to think aloud while typing it out. So before you lose interest, let me first thank both Red Hat and CDAC for jointly hosting the FUEL-GILT Conference 2013. The two-day conference in Pune not only advanced its primary objective, the FUEL project, but also revived the Indic computing community. It had been a long time since the stalwarts of Indian language technology all got together on the same platform and worked on ideas and tasks that had been waiting for so long. The most significant highlight of the event has to be the growing harmony and collaboration between technologists, linguists, Government bodies and the Open Source community.

So what is FUEL? Going by the words on the brochure and what I have understood over the years, FUEL started with a simple idea: standardizing the most commonly used entries in the menus and submenus of a desktop, hence the name FUEL (Frequently Used Entries for Localization). Today the concept has grown to cover the web and mobile platforms as well. It may look like a simple idea, but considering its impact on the use of ICT across various platforms and languages, it actually addresses a very important issue. Without it, efforts to eliminate linguistic barriers in technology can easily be undermined. No wonder it has grown to cover more than 40 languages, and India's eGovernance standards are taking it seriously in their guidelines. For more on the FUEL project, see this and this.

The conference started with special addresses from Mr. Satish Mohan of Red Hat and Mr. M. D. Kulkarni of CDAC. Both of them highlighted the progress and achievements so far in the field of Indian language computing and also gave some crucial directions on the way forward and the things yet to be done. Although the event was organized under the banner of FUEL, the stage was set for contemplating problems and ideas in language computing that had been set aside for a long time, at least in the open source community.

Talking about the ideas and initiatives, the ones worth noting were the Zanata translation management tool, a Hadoop-based auto-translation framework, the UTRRS testing system for text rendering, a matrix-based translation assessment system, standardization of (or guidelines for) Indic fonts, issues and ideas around Indic typing on mobile, automatic language detection for input methods and, last but not least, my own appeal to extend FUEL efforts to vertical-specific localization.

In my humble opinion, there are already a lot of translation management tools; what is needed is to make them more user-friendly, improve their ability to suggest translations and ensure consistency of terminology. In this respect, I think Zanata does have the features and capabilities to smooth the path to accurate translations, and I am really hopeful that it will grow towards these goals. What I would like to see in the near future, though, is a public portal built on something like Zanata that helps both upstream and downstream localization. Such a portal could become a prominent starting point for any localization project and help new projects benefit from the translation memories created in previous ones.

We have been looking at Google Translate for a long time now. It has good support for European languages, but so far its performance with Indian languages has not been very impressive. Making auto-translation work seamlessly is not an easy task. It requires mammoth efforts, not just on the backend technology but also on the less technical work of training the algorithms and building the linguistic resources that these backend frameworks can use. It would not be sensible to expect one organization or team to achieve all of it; it needs efforts from hundreds of dedicated developers and linguists. From this perspective, the Hadoop-based framework discussed by Mr. Rajat Gupta may provide a strong technical foundation. But to flip the switch and make this machine translation giant work, it is going to need an enormous data corpus fed into it. I am neither an expert on translation quality nor aware of the plans for feeding the machine, but it would be really great if they succeed in collecting the unorganized data. I would love to see this development move into the public domain and get help from crowdsourcing, at least on the data side.

For those who do not know about UTRRS, it is a reference system for testing the rendering of local language scripts. Satyabrata Maitra gave a nice introduction to the system, which had long been awaited in the open source community, especially by those concerned about consistent rendering of Indic fonts. I hope this gives a head start to efforts to develop more Indic fonts that are bug-free across applications and platforms. There is an ongoing discussion about renaming the system, and my vote goes to SUTRA, which not only appears to be a sensible Indic word but also reflects what the system does: sutra means equation, and in a sense this framework provides us the equations to verify against when testing the rendering of local fonts.

Ever since 2004, when my team and I started developing the Samyak fonts, one of our major focuses has been ensuring standard multilingual text usage. We tried to standardize the styles and sizes of the fonts not only across the Indic scripts we developed them for but also against the Latin text that can co-exist with Indic text. On day 2 of the conference, Guntupalli Karunakar presented ideas on, and the need for, standardizing font features and defining levels of compatibility for Indic fonts based on the complex set of rules they support. The discussions went on to cover the problems of mixing text from various scripts while ensuring a similar look and feel, especially sizes and line widths, the same issue we have been trying to resolve for so long. I think there needs to be a general guideline on em-sizes and alignments so that font developers can ensure compatibility with at least a few of the most commonly used Latin fonts. Offline, I discussed the issue and a few of my suggestions with Peiying Mo of Mozilla and with my friend Ravi Pande, an Indic font designer. I am planning to look more closely at this issue and come up with a few guidelines that I will try to implement in the Samyak fonts and, if possible, in Lohit as well.

Coming to the mobile platform, it seems that with the progress in smartphones and Android, language computing on mobile is gaining pace. When Anivar Arvind presented the developments and problems in mobile Indic computing, I was reminded of the same old major hurdle: the lack of an efficient input method. Given the complexity of Indic scripts and their large character sets, no matter how big phones get, they are still too small for efficient input. And since users have yet to get comfortable typing on the desktop, it is a far-fetched hope that the same input mechanisms will thrive on mobile. Even if we assume that screen size is not an issue and we can have sufficiently large tablet displays, typing on a touchscreen is still very different from typing on a physical keyboard, especially given the need to hold the ‘shift’ key so frequently when typing Indian languages with some of the traditional keyboard layouts. I personally use the Ashoka map on my Android phone, but it is not even complete, let alone efficient. At GnowTantra, we have been researching various accessibility solutions, including ones that not only bridge linguistic barriers but also help blind users in their daily activities using handheld devices. I think we now have a new topic and direction to think about.

Pravin Satpute from Red Hat’s i18n team presented an innovative idea for seamless multilingual typing. He discussed the possibility of detecting the language being typed, so that the user does not have to switch layouts every time the language changes. I certainly get irritated when I am chatting by text and have to switch layouts whenever an English word inevitably comes up. It would be good to see something like this actually implemented. Although I am not sure how effectively such a system can handle the ambiguities of language detection, I hope it becomes possible, at least to some extent, for non-phonetic layouts such as InScript.

As for my own contribution to the conference, I appealed for extending FUEL to various verticals beyond ICT. This comes from our recent experience at GnowTantra, when we worked on the globalization of an application in the accounting domain. We realized that the difficulties of working in a particular vertical are not covered by general ICT localization. It needs a more focused approach and contributions from specialists in that vertical from various linguistic backgrounds. And even if we do it for one application, we do not solve the problem for other applications in the same vertical, since there is no standard guideline. Hence the case for FUEL. Thanks to Rajesh for accepting this request in the panel discussion. GnowTantra and I would like to contribute in whatever way possible to these efforts. Verticals demand specific solutions for specific problems, so unless we work on consistent solutions that extend the reach of these specific solutions, technology will not become accessible and useful to the masses.

Apart from all these technicalities and difficulties, one of the issues highlighted in various talks, including those by Mr. Ravikant, Mr. Ravishankar Shrivastava and a few others, was how useful translated terms actually are in conveying meaning. This reminds us that language is supposed to convey meaning, not just supply an alternative word. There were a lot of interesting discussions about Hinglish, Minglish, Malinglish and so on.

So these are my takeaways from the conference. But the most important and fortunate thing that happened, and one that in my opinion benefited both the community and the FUEL project, was the keynote from Mr. Sam Pitroda. He joined over video conference and shared his ideas, his concerns and his assurance of help in developing Indic computing solutions. I think this is one area of computing that needs collaboration more than anything else. Mr. Pitroda highlighted the importance of language computing and assured the government’s sustained collaborative efforts in the field. I hope he takes a closer look at opening up the huge wealth of linguistic resources, such as literature, language corpora and research material, that is available with government institutes but not yet in the public domain. He talked about convincing academic institutes to publish their PhD research on the internet, and about the government’s preference for Open Source. Despite the cultural barriers he mentioned around the non-sharing of knowledge resources in the country, this has to be very encouraging for the community, for the future of Indic computing and for the reach of technology in general.

I would once again like to thank the organizers, and especially Mr. Rajesh Ranjan, "the man behind FUEL" and my ex-colleague from Red Hat, for making both the project and the event a success. We hope to see even more significant achievements and more such events in the future.

[P.S.: Everything written here reflects my personal thought process and may not be accurate. I hope we have only healthy discussions. :)]