Lingucomponent Sub-Project: Hyphenation
One of the goals of this project is to improve the quality of the hyphenator and to include more hyphenation dictionaries for various languages.
The OpenOffice.org hyphenator is based on the libhnj library by Raph Levien. It uses TeX hyphenation dictionaries with small corrections. Many languages are currently supported.
It is reasonably easy to port a TeX hyphenation dictionary (usually located in the tex/generic/hyphen/ directory of the TeX tree) to OpenOffice.org's hyphenation format. To help with hyphenation dictionary creation, a standalone version of the hyphenation code is available, with a simple example program that can be used for development and testing. Some hints follow:
- You need to replace or remove all TeX macros from the dictionary file.
- You need to call substrings.pl (can be found in the standalone hyphenator linked above).
  Usage: substrings.pl <patterns.dic> <newpatterns.dic>
  This will write the modified file to newpatterns.dic.
- You need to put an indicator of the character encoding used as the first line of the dictionary file (look in hyph_en.dic). Possible values are: ISO8859-1, ISO8859-2, ..., ISO8859-10, KOI8-R
- See Extensions development on how to deploy your hyphenation dictionary in OpenOffice.org.
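
Two of the hints above, removing the TeX macros and adding the encoding line, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name tex_patterns_to_dic is hypothetical, the input is assumed to contain a single \patterns{...} block with no nested braces, and everything outside that block (including any \hyphenation{...} exception list) is simply discarded. It does not replace the substrings.pl step, and the result should be reviewed by hand.

```python
import re


def tex_patterns_to_dic(tex_source, encoding="ISO8859-1"):
    """Build a draft dictionary text from the contents of a TeX pattern file.

    Hypothetical helper, not part of the standalone hyphenator.
    """
    # Strip TeX comments: '%' up to the end of the line (unless escaped as '\%').
    no_comments = re.sub(r"(?<!\\)%[^\n]*", "", tex_source)

    # Keep only the body of the \patterns{...} block.
    # Assumption: the block contains no nested braces.
    match = re.search(r"\\patterns\s*\{([^}]*)\}", no_comments)
    if match is None:
        raise ValueError("no \\patterns{...} block found")

    # Each whitespace-separated token in the block is one pattern.
    patterns = match.group(1).split()

    # First line: the character encoding indicator; then one pattern per line.
    return "\n".join([encoding] + patterns) + "\n"


if __name__ == "__main__":
    sample = (
        "% sample input\n"
        "\\patterns{ % start of patterns\n"
        ".ach4\n"
        ".ad4der 1ba\n"
        "}\n"
        "\\hyphenation{as-so-ciate}\n"
    )
    print(tex_patterns_to_dic(sample), end="")
```

When writing the result to disk, the file should be saved in the same encoding that is named on its first line.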