Interview: Jitendra Shah

Using OpenOffice.org to Switch to Unicode

Mayank Sharma



Unicode was developed to provide a standardized way of encoding multilingual text. By standardizing the encoding scheme and by making room for more than 65,000 characters, Unicode became an instant hit with local language computing developers around the world.

In India, which has more than a handful of languages and scripts, written both left-to-right and right-to-left, Unicode was welcomed with open arms.

Indictrans is a project that uses OpenOffice.org as a means of getting ordinary people to switch to Unicode. Prof. Jitendra Shah, the project leader of Indictrans, has worked on projects to convert government documents to Unicode. His pitch to government officers: "if you shift to OpenOffice.org with any free OpenType fonts, you get the Indian language capability through Unicode along with it. And you can make this shift to Unicode immediately. This can be done even on older hardware like the PIII or Celeron with 64MB RAM. The shift to OpenOffice.org enables you to standardize on Unicode, and any new PCs don't need to be loaded with a costly OS. GNU/Linux systems will do nicely."

In this interview, Shah explains his belief that "OpenOffice.org" implies affordable freedom for Indian languages.


Q. India has developed its own encodings, such as ISFOC and ISCII. Why, then, did you decide to go after Unicode instead of standardizing on them?

A: It's like this. If my mother organized a hall, bedroom and prayer place in a 10x10 apartment I would admire her for that. But this does not mean I have to keep her in the same state when I can afford 10 times the space.

ISFOC and ISCII were developed in the 8-bit constrained era. The most important aspect of ISCII is its phonetic pattern. This has been picked up lock, stock and barrel in Unicode. Hence we are losing little. Making the same encoding appear in different scripts at the 'switch of a button' is still easy, though not automatic. This is no great loss, as most computing goes on behind the scenes and users do not have to bother. The most advantageous aspect of Unicode is that Linux natively supports Unicode, while it does not support ISCII. This is the most important reason why migration to Unicode is justified.

The second most important reason why I particularly lean on Unicode is that free software can be much more easily spread using Unicode and supporters of ISCII have not yet come forward with free solutions based on ISCII. I have therefore adopted the route of translating ISCII content to Unicode and to then use it with Free/Open Source software. The Free Software Movement has of course consistently argued in favor of Unicode.

This does not prevent ardent Free/Open Source users from using some ASCII-based fonts in Emacs, mule, etc. ('Asian'). There is no doubt that the Centre for Development of Advanced Computing (CDAC, formerly NCST) showed the way in this direction a few years back.

Thirdly, if we were to standardize on anything, then it must be 'glocal' (i.e., global+local) and not just local. ISCII would be just local.
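Shah's remark that the ISCII layout was carried over into Unicode, and that showing the same text in another script is "easy though not automatic", can be illustrated with a small sketch. It relies on the fact that the Unicode blocks for the major Indian scripts are 128 code points wide and laid out in parallel, so a letter can often be moved to another script by shifting its code point. The function name `switch_script` and the table `BLOCK_START` are illustrative names introduced here, not part of any Indictrans tool, and the approach is deliberately naive.

```python
import unicodedata

# First code point of each script's Unicode block. The blocks are 0x80 wide
# and share a parallel, ISCII-derived layout.
BLOCK_START = {
    "devanagari": 0x0900,
    "bengali": 0x0980,
    "gurmukhi": 0x0A00,
    "gujarati": 0x0A80,
    "oriya": 0x0B00,
    "tamil": 0x0B80,
    "telugu": 0x0C00,
    "kannada": 0x0C80,
    "malayalam": 0x0D00,
}
BLOCK_SIZE = 0x80


def switch_script(text, source, target):
    """Naively re-express `text` in another script by shifting code points.

    Illustrative assumption: a character at offset N in the source block
    corresponds to offset N in the target block. Characters outside the
    source block, or whose target slot is unassigned, are left unchanged.
    """
    src, dst = BLOCK_START[source], BLOCK_START[target]
    out = []
    for ch in text:
        cp = ord(ch)
        if src <= cp < src + BLOCK_SIZE:
            candidate = chr(cp - src + dst)
            # Not every slot is assigned in every script; unicodedata.name
            # returns the default ("") for unassigned code points.
            if unicodedata.name(candidate, ""):
                ch = candidate
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(switch_script("कमल", "devanagari", "bengali"))  # prints কমল
```

The fallback for unassigned slots hints at why the switch is not fully automatic: some letters exist in one script but not in another, and a few positions near the end of each block carry script-specific characters, so a production converter would need per-script exception tables.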


Q. How are open source projects like OOo useful to India? How are you using OOo?

A: They are extremely useful to India because OOo is the single most important reason why migration to OSS/Free software will occur beyond the network servers. There is absolutely no doubt in my mind. Being open source, OpenOffice.org is like a glass box, which is far better than the black-box model.

I am using OOo as 'the' argument for shifting to Indian language enablement, whatever might be the OS [operating system--ed.].


Q. What are the obstacles in standardizing on Unicode? What problems exist for you to engage in OOo? How can these be resolved?

A: The main obstacle is the hand-holding required while dealing with the government departments that need to migrate. Many proprietary software lobbies campaign against Unicode. The big-budget-loving decision makers are attracted by the proprietary options.

They look at the deficiencies of Free/Open Source through a magnifying lens. They either overlook, or ignore if they do notice, issues like the total cost of ownership or the importance of the stigma of piracy. Once government bodies start migrating, others will follow more easily.

The education sector is important too. But that requires more lobbying. The system as of now is too geared toward minting money and not toward encouraging learning. The fly-by-night contractors who 'teach' computers in schools are happy with 'pirated', 'standard', 'familiar' software and would consider anything else substandard. The machines are usually discarded ones with low RAM, and hence a typical GNU/Linux system with current versions of OOo may not be loadable.

Another area is incorporating new keyboard layouts. The procedures are not clear (to me). Our government employees are hooked on 'typewriter' keyboards. How do I include them? I have done that for Yudit and have also done it using Java, from scratch. OOo can definitely help in this regard.

Support for dictionaries and spell-checkers, and better support for tables in word processors (particularly wide tables and tables in landscape format), is missing. This hurts. The issues with presentation software, like not supporting some of the most recent features in Microsoft PowerPoint, may not be that important, in my view. OpenOffice.org can even work on Windows and can support binary applications across platforms. But input methods and fonts are the main limitations (use IBM IME as input method and Arial Unicode MS as font). One wishes there was better support for OpenType.

In short, my students were not able to find a simple way to build a dictionary so that we could use the auto-completion feature in OOo. We also don't know the procedure for localizing OOo, whereas there is a simple procedure using gettext, msgfmt, po-file translation, etc. for C programs, GNOME, KDE etc. We don't know how to add a new keyboard layout (you will find typewriter, KeyLekh on our site in typebhaaratii). OOo has Inscript, which is the best.


Q. How do you envisage engaging your students in open-source projects, including OOo?

A: I have tried to work on spell-checkers, auto-completion, dictionaries and converters. I wish all these could be incorporated in OOo. You can see that I have tools for converting (font encoding to Unicode and vice versa, HTML or text, etc.). I wish I could incorporate them in OOo.
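The converters Shah mentions are not described in the interview, but the general idea of a font-encoding-to-Unicode converter can be sketched as a table lookup with longest-match-first replacement. Everything in the sketch below is an assumption made for illustration: the legacy codes in `LEGACY_TO_UNICODE` are invented and do not correspond to any real font, and `convert` is a hypothetical helper, not an Indictrans function.

```python
# HYPOTHETICAL mapping table: these legacy codes are invented for this
# example and do not describe any real font encoding.
LEGACY_TO_UNICODE = {
    "k": "\u0915",   # DEVANAGARI LETTER KA
    "kh": "\u0916",  # DEVANAGARI LETTER KHA
    "m": "\u092E",   # DEVANAGARI LETTER MA
    "l": "\u0932",   # DEVANAGARI LETTER LA
    "a": "\u093E",   # DEVANAGARI VOWEL SIGN AA
}

# The reverse direction is the inverted table.
UNICODE_TO_LEGACY = {v: k for k, v in LEGACY_TO_UNICODE.items()}


def convert(text, table):
    """Replace every known sequence in `text`, trying longer keys first.

    Characters that match no key are copied through unchanged.
    """
    keys = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            # No key matched at this position; keep the character as-is.
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    unicode_text = convert("khml", LEGACY_TO_UNICODE)
    print(unicode_text)
    print(convert(unicode_text, UNICODE_TO_LEGACY))
```

Trying longer keys first keeps "kh" from being split into "k" followed by an unmapped "h". A real converter would likely need more than a lookup table, for example reordering vowel signs that legacy fonts store in visual rather than logical order.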


Q. Why would you rather work on popular OSS for Localization purposes than creating your own from scratch?

A: I don't really know the answer except that I am not crazy to start from scratch for what is already available :)

Q. How does F/OSS help Indian democratic efforts? What are the issues that exist? How do you resolve them?

A: Fluency of information is an essential prerequisite for any democracy. F/OSS enables fluency of information by basing information on standards and also making applications reproducible without limits.

As for the issues, apart from the ones listed above, there is a need to localize OOo. Indictrans has had to keep that on the back burner, as it is struggling to help the team survive with other revenue-earning projects.

There are a lot of charity organizations and public spirited people. But there is no effective organization that can bring these amorphous forces together.

About the author

Mayank Sharma writes on open source development in India. He lives in New Delhi. He has generously donated this work to the OpenOffice.org Project.

This work is licensed under a Creative Commons License.