This is going to be a long post and I am going to think
aloud while typing it out. So before you lose interest, first of all, let me
thank both Red Hat and CDAC for jointly hosting the FUEL-GILT Conference 2013.
The two-day conference in Pune not only served its primary objective of
advancing the FUEL project but also revived the Indic computing community.
It is after a long time that the stalwarts of Indian language technology
got together on the same platform and worked on ideas and tasks that had been
pending for years. The most significant highlight of the event has to be the growing
harmony and collaboration between technologists, linguists, government
bodies and the open source community.
So what is FUEL? Going by the words on the brochure and what
I have understood over the years, FUEL started with a simple idea: standardize the most commonly
used entries in the menus and submenus of a desktop, hence the name FUEL (Frequently
Used Entries for Localization). Today the concept has gone on to cover the web
and mobile platforms as well. It may
look like a simple idea, but considering the impact it has on the use of
ICT on various platforms in various languages, it actually addresses a very
important issue. Its absence can very well undermine the effective
elimination of the linguistic barrier in technology. No wonder it has grown to
cover more than 40 languages, and eGovernance standards in India are taking it seriously
in their guidelines. For more on the FUEL project see this and this.
The conference started with special addresses from Mr. Satish
Mohan of Red Hat and Mr. M. D. Kulkarni of CDAC. Both of them highlighted the progress
and achievements so far in the field of Indian language computing and also gave
some crucial directions on the way forward and the things yet undone. Although organized
under the banner of FUEL, the stage was set for contemplating problems and
ideas related to language computing that had been set aside for a long time, at
least in the open source community.
Talking about the ideas and initiatives, the ones worth noting
had to be the Zanata translation management tool, a Hadoop-based auto-translation
framework, the UTRRS testing system for text rendering, a matrix-based translation
assessment system, standardization guidelines for Indic fonts, issues and
ideas around Indic typing on mobile, automatic language detection for input methods
and, last but not least, my own appeal to extend the FUEL effort to vertical-specific
localization.
In my humble opinion, there exist a lot of translation
management tools, but what is necessary is to make them more user-friendly and to
improve their ability to suggest translations and ensure consistency of
terminology. In this respect I think Zanata does have the features and capabilities
to smooth the process of producing accurate translations, and I am really hopeful that
it will grow towards these goals. What I would like to see in the near
future, though, is a public portal of something like Zanata that helps both
upstream and downstream localization. Such a portal can become a very prominent
starting point for any localization project and help new projects benefit from
the translation memories created by the work done in previous ones.
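To make the translation-memory idea concrete, here is a toy sketch of the core mechanism: look up the closest previously translated source string and reuse its translation. Zanata and similar tools are of course far more sophisticated; the memory contents below are invented for illustration.

```python
# A toy illustration of translation-memory reuse: given source strings
# already translated in earlier projects, suggest the closest match for a
# new string. The stored entries are made up for the example.
import difflib

translation_memory = {
    "Open a file": "फ़ाइल खोलें",
    "Save the document": "दस्तावेज़ सहेजें",
    "Print preview": "छपाई पूर्वावलोकन",
}

def suggest(source, cutoff=0.6):
    """Return (matched source, stored translation) for the closest entry, if any."""
    matches = difflib.get_close_matches(source, list(translation_memory),
                                        n=1, cutoff=cutoff)
    if matches:
        return matches[0], translation_memory[matches[0]]
    return None

print(suggest("Open file"))   # close enough to reuse the "Open a file" entry
```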
We have been looking at Google Translate for a long
time now. It does have good support for European languages, but so far the
performance with Indian languages has not been very impressive. Making
auto-translation work seamlessly is not an easy task. It requires mammoth
effort: not just the backend technology but also the less technical work of
training the algorithms and building the linguistic resources that these backend
frameworks can make use of. It would not be sane to expect one organization or team
to achieve all of it; it certainly needs efforts from hundreds of dedicated
developers and linguists. From this perspective, the Hadoop-based framework
discussed by Mr. Rajat Gupta may provide a strong technical foundation. But to
switch the plug on and make the machine translation giant work, it is going to need
an enormous data corpus fed to it. I am neither an expert on the quality
of the translation framework nor do I know of the plans to feed the machine, but it
would be really great if they indeed succeed in their efforts to collect the
unorganized data. I would love to see this development move into the
public domain and get help from crowd-sourcing, at least on the data side.
For those who do not know about UTRRS, it is the testing
reference system for the rendering of local language scripts. Satyabrata Maitra gave a
nice introduction to the system, which was long awaited in the open source
community, especially among those concerned about the consistent rendering of Indic
fonts. I hope this gives a head start to the effort of developing more Indic fonts
that are bug-free across applications and platforms. There is a discussion
going on about renaming the system, and my vote goes to SUTRA, which is not just
a sensible Indic word but also reflects what the system means, i.e.
an equation: in a sense this framework provides us the equations to verify against
when testing the rendering of local fonts.
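I am not describing how UTRRS itself is built, but the basic idea behind a rendering reference test can be sketched quickly: shape a known test string with the font under test and compare the resulting glyph sequence against a stored reference. Here is a minimal illustration using the uharfbuzz bindings; the font path and the expected glyph IDs are hypothetical.

```python
# A minimal sketch of a rendering reference check, in the spirit of UTRRS
# (this is NOT how UTRRS is implemented): shape a test string with HarfBuzz
# and compare the glyph sequence against a stored reference.
import uharfbuzz as hb

def shape_glyphs(font_path, text):
    """Return the list of glyph IDs HarfBuzz produces for `text`."""
    with open(font_path, "rb") as f:
        blob = hb.Blob(f.read())
    font = hb.Font(hb.Face(blob))
    buf = hb.Buffer()
    buf.add_str(text)
    buf.guess_segment_properties()   # pick script, language and direction
    hb.shape(font, buf)
    return [info.codepoint for info in buf.glyph_infos]

# Reference test case: the conjunct क्ष should shape into the expected glyphs.
expected = [231]                      # hypothetical glyph ID for the ligature
actual = shape_glyphs("Samyak-Devanagari.ttf", "\u0915\u094d\u0937")
print("PASS" if actual == expected else f"FAIL: got {actual}")
```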
Right since 2004, when my team and I started working on the
Samyak fonts, one of our major focuses has been to ensure consistent multilingual text
usage. We tried to standardize the styles and sizes of the fonts not only across the
Indic scripts we developed them for but also against the Latin text that can
co-exist with Indic text. On day 2 of the conference, Guntupalli Karunakar
presented ideas on, and the need for, standardizing font features and assigning various
levels of compatibility to Indic fonts based on the complex sets of rules
they support. The discussions went on to cover the problems of intermixing
text from various scripts and ensuring a similar look and feel, especially the
sizes and line widths, the very issue we have been trying to resolve for so
long. I think there needs to be a general guideline on em sizes and
alignments for font developers, to ensure compatibility with at least a few of the
most commonly used Latin fonts. Offline, I discussed the issue, along
with a few of my suggestions, with Peiying Mo of Mozilla and a friend of mine, the
Indic font designer Ravi Pande. I am planning to look closer at this issue and come
up with a few guidelines that I will try to implement in the Samyak fonts and, if
possible, in Lohit as well.
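As a first mechanical step towards such a guideline, the relevant vertical metrics of an Indic font and a Latin font can at least be compared side by side, normalized to the em size. A rough sketch with fontTools follows; the font file names are placeholders, and deciding which metrics to harmonize, and how closely, is exactly the work the guideline would have to do.

```python
# A rough sketch of comparing vertical metrics between an Indic font and a
# Latin font, normalized to the em size. Font file names are placeholders;
# sxHeight and sCapHeight exist only in OS/2 table version 2 and later.
from fontTools.ttLib import TTFont

def metrics(path):
    font = TTFont(path)
    upem = font["head"].unitsPerEm
    os2 = font["OS/2"]
    hhea = font["hhea"]
    return {
        "ascent": hhea.ascent / upem,
        "descent": hhea.descent / upem,
        "x_height": getattr(os2, "sxHeight", 0) / upem,
        "cap_height": getattr(os2, "sCapHeight", 0) / upem,
    }

indic = metrics("Samyak-Devanagari.ttf")   # placeholder paths
latin = metrics("DejaVuSans.ttf")
for key in indic:
    print(f"{key:10s}  indic={indic[key]:+.3f}  latin={latin[key]:+.3f}")
```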
Coming to the mobile platform, it seems that with the progress in
smartphone development and Android, language computing on mobile
is gaining pace. When Anivar Arvind presented the developments and problems in
mobile Indic computing, I was reminded of the same major hurdle: an efficient
input method. Considering the complexities of Indic scripts and their large
character sets, no matter how big phones get, they are still small
for efficient input. Considering that users are yet to get acquainted with
typing on the desktop, it is actually a far-fetched hope that the same input mechanisms
will thrive on the mobile platform. Even if we assume that screen size is
not an issue and we can have sufficiently large tablet displays, typing on
a touch screen is still very different from a physical keyboard, especially the need
to hold the 'shift' key so frequently while typing in Indian languages with some
of the traditional keyboard layouts. I personally use the Ashoka keymap on my Android
phone, but it is not even complete, let alone efficient. At GnowTantra, we have been researching
various accessibility solutions, including ones that not only bridge
linguistic barriers but also help blind users in their daily activities on
handheld devices. I think we now have a new topic and direction to think over.
Pravin Satpute from Red Hat’s i18n team presented an
innovative idea for seamless multi-lingual typing. He discussed the possibility
of detecting the language being typed without the user having to switch between
layouts every time the language changes. I
surely get irritated when I am chatting via text and have to switch the language
whenever an English word inevitably comes up. It would be good to get
something like this actually implemented. Although I am not sure how effectively
such a system can handle the ambiguities of language detection, I hope it becomes
possible to some extent, at least for non-phonetic layouts such as Inscript.
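Even without solving the full ambiguity problem, a crude first approximation is possible by looking at the Unicode ranges of the characters typed so far. The sketch below is only my illustration of that rough idea, not the approach presented at the conference.

```python
# A crude sketch of script detection from Unicode ranges, to hint at how an
# input method might guess when to switch layouts. This is only an
# illustration, not the system discussed at the conference.
def dominant_script(text):
    """Guess 'devanagari', 'latin', or 'other' from the characters typed so far."""
    counts = {"devanagari": 0, "latin": 0, "other": 0}
    for ch in text:
        cp = ord(ch)
        if 0x0900 <= cp <= 0x097F:          # Devanagari block
            counts["devanagari"] += 1
        elif "a" <= ch.lower() <= "z":       # basic Latin letters
            counts["latin"] += 1
        elif not ch.isspace():
            counts["other"] += 1
    return max(counts, key=counts.get)

print(dominant_script("मैं कल office जाऊँगा"))   # 'devanagari' overall
print(dominant_script("office"))                  # 'latin' for the English word
```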
Talking about my own contributions to the conference, I made
an appeal to extend FUEL to various verticals apart from ICT. This comes from
our recent experience at GnowTantra, when we worked on the globalization of an
application in the accounting domain. We realized that the difficulties of
working in a particular vertical are not covered by general ICT
localization. It needs a more focused approach and contributions from
specialists in the vertical from various linguistic backgrounds. And even if we
do it for one application, we do not solve the problem for other applications
in the same vertical, since there is no standard guideline. Hence FUEL. Thanks to Rajesh for accepting this request in
the panel discussion. GnowTantra and I would like to contribute in whichever way possible to
these efforts. Verticals provide specific solutions to specific problems, and
unless we work on consistent ways to increase the reach of these
specific solutions, technology will not become accessible and useful to the
masses.
Apart from all these technicalities and difficulties, one of
the issues highlighted in several talks, including the ones by Mr. Ravikant,
Mr. Ravishankar Shrivastava and a few others, was the usefulness of translated
terms in conveying meaning. This points us in a clear direction: language is
supposed to convey meaning, not just supply an alternative word. A lot of interesting
discussions went on about Hinglish, Minglish, Malinglish and so on.
So these are my takeaways from the conference. But the most important
and fortunate thing that happened and in my opinion benefited both the
community and the FUEL project was the keynote from Mr. Sam Pitroda. He joined
over video conference and shared his ideas, concerns and assurance of help for the
development of Indic computing solutions. I think this is one area of computing
that needs collaboration more than anything else. Mr. Pitroda highlighted the
importance of language computing and assured us of the government's sustained
collaborative efforts in the field. I hope he takes a closer look at opening up the huge wealth of linguistic resources, such as literature, language corpora and research material, that is available with government institutes but not yet in the public domain. He talked about convincing academic institutes to publish their PhD research work on the internet for everyone, and also about the government's preference for open source. In spite of the cultural barriers he mentioned around the non-sharing of knowledge resources in the country, this has to be very encouraging for the
community, the future of Indic computing and the reach of technology in general.
I would once again like to thank the organizers and
especially Mr. Rajesh Ranjan, "the man behind FUEL" and my ex-colleague from Red Hat, for making it a
successful project and event. We hope to see even more significant
achievements and more such events in the future.
[P.S.: Everything written here is my personal thought
process and it may not be accurate. Hope we have only healthy discussions. :)]
Thanks a lot, Rahul, for such a detailed report. GnowTantra's support will be very important for the FUEL Project.
Hi Rahul,
Firstly, thanks for the kind words about Zanata.
As Product Manager though I mostly appreciate your insight about where Zanata can go from here. To that end please do keep in contact with us (zanata-users@redhat.com) and feel free to contact me personally (irooskov@redhat.com).
I am always keen to hear feedback from the community on how we can grow.
Thanks,
Isaac
Hi Isaac, I am straightaway joining the list. It would be my pleasure to contribute as much as possible.