How to create a localized version of OOo
This document describes how to create a localized version and gives some brief background information about ISO codes and the resource system.
Build environment
First of all you need the source code and a working build environment. Please visit
http://tools.openoffice.org/
Building Linux
http://tools.openoffice.org/dev_docs/build_linux.html
Building Windows
http://tools.openoffice.org/dev_docs/build_windows_tcsh.html
http://tools.openoffice.org/dev_docs/build_windows.html
I suggest joining the dev@l10n mailing list, where you can find many people involved in localization:
http://l10n.openoffice.org/servlets/ProjectMailingListList
ISO Codes
Languages are handled within the build system using ISO codes. As discussed in Issuezilla Task #i8252#, we support a subset of RFC 3066 [1] for the language identifier.
RFC 3066 basically says:
1. Use ISO 639-1 [2] if possible.
2. Use ISO 639-2 [2] if no ISO 639-1 code exists.
3. Use an ISO 3166 [3] country code if necessary to separate two languages with the same language code, e.g. US English and British English.
This means we'll have these codes for example:
ISO 639-1 sv Swedish
ISO 3166 en-US US English
To prevent matching problems, languages with identical language and country code, like “de-DE”, are reduced to the language code “de”. Currently the build environment does not support ISO 639-2.
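The reduction rule above can be illustrated with a minimal sketch. The function name normalize_lang_code is hypothetical and is not part of the OOo build system; it only mirrors the rule as described in this section.

```python
def normalize_lang_code(tag: str) -> str:
    """Reduce tags like 'de-DE' to 'de'; leave 'en-US' and 'sv' unchanged.

    Hypothetical helper for illustration only, not an OOo build tool.
    """
    language, sep, country = tag.partition("-")
    # Collapse only when the country part repeats the language part.
    if sep and language.lower() == country.lower():
        return language.lower()
    return tag


for tag in ("de-DE", "en-US", "sv", "pt-BR"):
    print(tag, "->", normalize_lang_code(tag))
```

Tags whose country part differs from the language part are returned untouched, so “en-US” and “pt-BR” keep their country code.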
Resource system
The OpenOffice.org source code contains several file types in which strings and messages are declared:
*.src / *.hrc : Contains main UI strings
*.xrm: Contains readme strings
*.xcu: Contains configuration strings
*.ulf: Container for strings converted to several custom formats (.rc, .par)
*.xhp: Contains online help strings
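As a small illustration, the file types listed above can be written down as a lookup table. This is only a sketch: the names RESOURCE_KINDS and describe_resource_file are made up for this example and do not exist in the OOo tooling.

```python
import os

# Mapping taken from the list above; the dictionary itself is illustrative.
RESOURCE_KINDS = {
    ".src": "main UI strings",
    ".hrc": "main UI strings",
    ".xrm": "readme strings",
    ".xcu": "configuration strings",
    ".ulf": "strings converted to several custom formats (.rc, .par)",
    ".xhp": "online help strings",
}


def describe_resource_file(path):
    """Return the kind of strings a file holds, or None if it is not a resource file."""
    extension = os.path.splitext(path)[1].lower()
    return RESOURCE_KINDS.get(extension)


print(describe_resource_file("sw/source/ui/frmdlg/frmpage.src"))
print(describe_resource_file("sw/source/ui/frmdlg/frmpage.cxx"))
```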
An example of a src resource file:
... CheckBox CB_READ_ONLY ...
Untranslatable strings in src files have no language code:
Text = "50%"
The untranslatable strings in xml files are marked by the reserved language identifier “x-no-translate”:
|
... |
Another reserved language identifier, used for documentation purposes, is “x-comment”. These strings are known as the old developer English and are no longer used; the comments are outdated. The standard string encoding in all formats, such as src, ulf, xcs and xcu, is UTF-8.
The interface between the source code and the translation tooling is the so-called sdf file format (also known as gsi). The sdf intermediate file format is introduced here:
http://l10n.openoffice.org/L10N_Framework/Intermediate_file_format.html
Note that this format is strict and should not be violated. You can use the tool “gsicheck” in the module transex3 to perform simple format checks. In a deviation from the documented sdf file format, the language column has been changed from numeric language identifiers to ISO codes.
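To give a feeling for what a simple format check looks like, here is a rough sketch. It is not a replacement for gsicheck and makes two loud assumptions that should be verified against the format description linked above: the fields are tab-separated, and the language code sits in the tenth column (zero-based index 9). The sample rows are invented placeholders, not real OOo data.

```python
from collections import Counter

# ASSUMPTION: zero-based index of the language column; check the sdf format document.
LANG_COLUMN = 9


def summarize_sdf(lines, lang_column=LANG_COLUMN):
    """Return (entries per language, numbers of lines with a deviating column count)."""
    languages = Counter()
    bad_lines = []
    expected = None
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if expected is None:
            # The first row defines the column count every other row must match.
            expected = len(fields)
        if len(fields) != expected or len(fields) <= lang_column:
            bad_lines.append(number)
            continue
        languages[fields[lang_column]] += 1
    return languages, bad_lines


def sample_row(lang, columns=15):
    """Build an invented placeholder row; the column count is an assumption."""
    fields = ["x"] * columns
    fields[LANG_COLUMN] = lang
    return "\t".join(fields)


rows = [sample_row("en-US"), sample_row("km"), "broken\tline"]
languages, bad_lines = summarize_sdf(rows)
print(dict(languages), bad_lines)
```

The check only compares each row against the first one, so it reports inconsistent rows without hard-coding a column count.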
The source files contain only English and German strings. Each directory contains an sdf file particle called “localize.sdf”, which holds the strings for all other languages.
sw/source/ui/frmdlg/
    frmpage.cxx
    frmpage.src
    localize.sdf
    wrap.src
    ...
The content of the sdf particle is merged into temporary copies of the source files during the build. From those temporary source files, which now contain all strings, the resource files are created. You can use the localize tool to collect strings and merge them back into the sdf particles.
Real life
What steps are needed to create an OOo build for Khmer?
Several steps are necessary to add a new language to the OOo build environment.
First introduce the ISO code for Khmer (“km”) to the build environment by adding a new entry in solenv/inc/postset.mk:
... completelangiso=af ar be-BY bg br bn bn-BD bn-IN bs ca cs cy da de el en-GB en-US en-ZA eo es et eu fa fi fr ga gl gu-IN he hi-IN hr hu it ja km kn-IN ko lo lt lv mk ms ne nb nl nn nr ns pa-IN pl pt pt-BR ru rw sk sl sh-YU sr-CS ss st sv sw sw-TZ sx ta-IN th tn tr ts ve vi xh zh-CN zh-TW zu ...
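A quick way to double-check such an edit is to test whether the new code really appears as a whole word in the list. The sketch below assumes the completelangiso assignment sits on a single line, as in the excerpt above; the function has_language is hypothetical and the sample text is shortened.

```python
def has_language(makefile_text, code, variable="completelangiso"):
    """Return True if `code` is one of the whitespace-separated values of `variable`.

    ASSUMPTION: the assignment is on one line, without line continuations.
    """
    for line in makefile_text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip() == variable:
            return code in value.split()
    return False


# Shortened sample, not the full list from postset.mk.
sample = "completelangiso=af ar be-BY de en-US km zu"
print(has_language(sample, "km"))
print(has_language(sample, "xx"))
```

Splitting on whitespace avoids false matches, for example “sw” matching inside “sw-TZ”.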
To build the Khmer resources, set the new environment variable with “setenv WITH_LANG "km"” every time before building. You can also use configure to set this variable in the LinuxIntelEnv script.
Now introduce the language in tools/source/intntl/isolang.cxx:
... { LANGUAGE_KHMER, "km", "KH" }, ...
and in tools/inc/lang.hxx:
... #define LANGUAGE_KHMER 0x0453 ...
The hexadecimal value in lang.hxx is the Microsoft language identifier. The MS language IDs are used in core code and are of course also needed when storing documents in MS binary file formats. They are not relevant for UI localization, except in one place where they are used for values in a language listbox; see svx/source/dialog/langtab.src. For further information about that identifier, visit the Microsoft websites [4].
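For orientation, a Windows language identifier is a 16-bit value whose low 10 bits hold the primary language ID and whose remaining high bits hold the sublanguage ID. The following sketch splits the Khmer value from lang.hxx accordingly; the helper split_lang_id is made up for this example and does not exist in the OOo sources.

```python
def split_lang_id(lang_id):
    """Split a Windows language identifier into (primary, sublanguage)."""
    primary = lang_id & 0x3FF  # low 10 bits: primary language ID
    sub = lang_id >> 10        # remaining high bits: sublanguage ID
    return primary, sub


LANGUAGE_KHMER = 0x0453  # value from tools/inc/lang.hxx as quoted above

primary, sub = split_lang_id(LANGUAGE_KHMER)
print(hex(primary), hex(sub))
```

For 0x0453 this yields primary language 0x53 and sublanguage 0x1, which can be looked up in the primary and sublanguage tables listed under [4].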
Additional steps are needed for the complete introduction of this language; please consult this document:
http://www.khmeros.info/tools/localization_of_openoffice_2.0.html
Ensure that you have sourced the file LinuxIntelEnv so that you have a proper environment.
First build the whole office. This is needed to build all necessary tools.
cd instsetoo_native
build --all
Use the localize tool to extract and merge the strings
Extracting the strings:
perl -w localize.pl -e -l km=en-US -f khmer.sdf
Note that the Khmer strings fall back to US English if no translation exists yet. Choose the language that fits best as the fallback.
Now translate the strings in the newly created file “khmer.sdf”. You can do that by simply translating the file by hand or by using tooling. Common approaches are web-based translation or conversion to the PO file format and use of the corresponding translation tools, such as KBabel. Please have a look here:
http://www.khmeros.info/tools/oo2.0_program_translation.html
Merge the strings back:
perl -w localize.pl -m -l km -f khmer.sdf
The tool distributes the translated strings into the sdf particles.
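If you repeat the extract and merge round trip for several languages, it can help to generate the two command lines from one place. The sketch below only assembles the argument lists shown above; the function names are hypothetical, and no localize.pl options beyond those used in this document are assumed.

```python
def extract_command(lang, fallback, sdf_file):
    """Argument list for extracting strings, with `fallback` used for untranslated strings."""
    return ["perl", "-w", "localize.pl", "-e", "-l", f"{lang}={fallback}", "-f", sdf_file]


def merge_command(lang, sdf_file):
    """Argument list for merging translated strings back into the sdf particles."""
    return ["perl", "-w", "localize.pl", "-m", "-l", lang, "-f", sdf_file]


print(" ".join(extract_command("km", "en-US", "khmer.sdf")))
print(" ".join(merge_command("km", "khmer.sdf")))
```

The lists could be passed to subprocess.run(command, check=True) from a shell in which the build environment has already been sourced; that step is left out here because it depends on your setup.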
Create the localized build
cd instsetoo_native
setenv WITH_LANG "km"
build --all
After a successful build you should find localized install sets in your instsetoo_native/<platform>/OpenOffice/<package_format>/rpm/install tree.
Links
[0] Much more detailed additional documentation:
http://www.khmeros.info/tools/
[1] For more information about RFC 3066:
http://www.faqs.org/rfcs/rfc3066.html
[2] For more information about ISO 639-1 and ISO 639-2:
http://www.loc.gov/standards/iso639-2/
[3] For more information about ISO 3166:
http://www.iso.org/iso/en/prods-services/iso3166ma/index.html
[4] Microsoft Language Identifier
The complete list, not necessarily supported by Windows:
List of Locale ID (LCID) Values as Assigned by Microsoft
http://www.microsoft.com/globaldev/reference/lcid-all.mspx
NLS information page
http://www.microsoft.com/globaldev/nlsweb/
Table of Language Identifiers
http://msdn.microsoft.com/library/en-us/intl/nls_238z.asp
Primary Language Identifiers
http://msdn.microsoft.com/library/en-us/intl/nls_61df.asp
SubLanguage Identifiers
http://msdn.microsoft.com/library/en-us/intl/nls_19ir.asp
WD2000: Supported Language ID Reference Numbers (LCID)
http://support.microsoft.com/default.aspx?scid=KB;en-us;q221435


