Apache OpenOffice (AOO) Bugzilla – Issue 21019
Read-out and assign language of IME to CJK+CTL input text
Last modified: 2007-06-20 17:17:00 UTC
MS Windows IMEs provide extensive information about the current text input, which OO.o can already read out. OO.o should make use of this extended information by reading out the language (and, if possible, the direction) and assigning it to the input text.
Example 1: A user chose Thai as the default language for CTL languages but types in Arabic. Currently, this Arabic text is assigned the Thai locale. This causes problems in cases where Thai uses different formatting than Arabic (e.g., distribution of the last line in justified text). With this new feature, OO.o would detect that Arabic text is being input (as opposed to the configured Thai locale) and would override this setting by formatting the input text as Arabic.
Example 2: When typing combinations of RTL and LTR text with weak characters (e.g., numbers and hyphens mixed with Arabic), OO.o currently often assigns the wrong text direction to the weak characters, resulting in constructs like '-5CIBARA' instead of '5-CIBARA'. Again, this erroneous behaviour could be avoided if OO.o "knew" which language (and thus which text direction) is used for the weak characters.
For the moment, it only makes sense to support CTL languages. It remains to be decided whether this should also apply to CJK languages. (It does not make sense for Western languages since, for example, we cannot differentiate between German (Germany) and German (Austria) input.)
Note: Since Unix IMEs do not report any language, this feature can only be implemented under Windows.
*** Issue 5966 has been marked as a duplicate of this issue. ***
*** Issue 19848 has been marked as a duplicate of this issue. ***
Issue 19848 is NOT a duplicate of this issue. Issue 19848 discusses internal representation and imported/legacy texts, while this issue discusses a particular input method; these are mostly orthogonal. As evidence, issue 18024 offers an alternative resolution to this issue (namely, manual insertion of RLM and LRM characters).
*** Issue 1035 has been marked as a duplicate of this issue. ***
Unicode 4.0.1 has recently been released with changes to the properties of several characters. Once OO (and some other projects) are updated to comply with these changes, the HebrewLetter+Hyphen+Number issue will finally be solved. See http://bugzilla.mozilla.org/show_bug.cgi?id=240943 for Mozilla's take on the subject. Note that bug 19848 has wrongly been marked as a duplicate of this bug (which has nothing to do with the hyphen issue), so, just to make sure this important update isn't missed, I'm posting it in both bugs. Sorry for the spam. Please consider reopening bug 19848, or filing a new one specifically for compliance with the aforementioned changes in Unicode. Prog.
Fixed in cws os30
reassigned to SBA.
SBA: "Example 1" works now: the language for CJK and CTL input is now set according to the chosen input method. To see this:
- Enable CJK and CTL support
- Switch the keyboard to Hebrew and type something
- Switch the keyboard to Arabic and type something
- Select some Hebrew letters
- Format > Character, tab page "Font" -> the CTL language for the Hebrew text is set to Hebrew.
Note: This also works with CJK languages.
This enables, for example, the linguistic components to check only the respective language in mixed CJK-CJK or CTL-CTL text, without the user having to manually set any language that is NOT the default CJK/CTL language.
It was decided not to do this for different Western languages. Most of them can be written with the same keyboard layout, and this is what most users do: when I write an English part in a document, I don't change the keyboard layout, and I am sure most "bilingual Western writers" do the same. Had this been implemented for Western languages as well, the input language would always override the Western default language, so it would be impossible to write in the default language of a document without having the keyboard set accordingly.
A scenario: a multilingual manual with different paragraph styles, each with a different Western language set. As soon as I type in one of them, the respective language is kept WITHOUT my having to switch the keyboard between German, English, French and Italian.
For "newly written multilingual text", any Western language other than the default one has to be set manually. Some ways to ease this:
1. Via character styles (tab page "Font")
2. Via paragraph styles (tab page "Font")
3. Via the context menu on a word that the online spellchecker flags as misspelled but recognizes as a word in another installed language. The context menu then offers to change the language of the word or of the entire paragraph (this hard-sets the character attribute "language").
About Example 2: See issue 18042 for future enhancements of weak characters.
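The rule SBA describes (CJK and CTL input takes the IME language, Western input keeps the document default) can be sketched roughly as below. This is a Python illustration only, not the OO.o code from cws os30. It assumes the keyboard language arrives as a Windows LANGID (the low word of the HKL that GetKeyboardLayout returns) and that the typed text has the same script type as that keyboard language; in reality the script type is decided per character. The helper names and the small language table are hypothetical and cover only a few languages mentioned in this issue.

```python
from enum import Enum


class ScriptType(Enum):
    WESTERN = "western"
    CJK = "cjk"
    CTL = "ctl"


# Primary language IDs as defined in the Windows SDK (winnt.h).
# Illustrative subset only, not a complete table.
LANG_ARABIC = 0x01
LANG_CHINESE = 0x04
LANG_HEBREW = 0x0D
LANG_JAPANESE = 0x11
LANG_KOREAN = 0x12
LANG_THAI = 0x1E

CJK_PRIMARY = {LANG_CHINESE, LANG_JAPANESE, LANG_KOREAN}
CTL_PRIMARY = {LANG_ARABIC, LANG_HEBREW, LANG_THAI}


def primary_lang_id(langid: int) -> int:
    """Same as the PRIMARYLANGID macro: the low 10 bits of a LANGID."""
    return langid & 0x3FF


def script_type_of(langid: int) -> ScriptType:
    """Classify a keyboard language (hypothetical helper)."""
    primary = primary_lang_id(langid)
    if primary in CJK_PRIMARY:
        return ScriptType.CJK
    if primary in CTL_PRIMARY:
        return ScriptType.CTL
    return ScriptType.WESTERN


def language_for_input(input_langid: int, defaults: dict) -> tuple:
    """Return (script type, LANGID) to assign to newly typed text.

    Assumption: the typed text has the same script type as the keyboard.
    CJK and CTL input takes the IME language; Western input keeps the
    document's default Western language.
    """
    script = script_type_of(input_langid)
    if script is ScriptType.WESTERN:
        return script, defaults[ScriptType.WESTERN]
    return script, input_langid


# Hypothetical document defaults: German (Germany), Chinese (PRC), Thai.
DEFAULTS = {
    ScriptType.WESTERN: 0x0407,
    ScriptType.CJK: 0x0804,
    ScriptType.CTL: 0x041E,
}

if __name__ == "__main__":
    # Arabic (Saudi Arabia) keyboard while the default CTL language is Thai.
    print(language_for_input(0x0401, DEFAULTS))
    # German (Austria) keyboard: the Western default is kept.
    print(language_for_input(0x0C07, DEFAULTS))
```

With these defaults, switching to an Arabic keyboard yields Arabic for the new CTL text (Example 1), while an Austrian German keyboard leaves the German Western default untouched, mirroring the decision not to apply this to Western languages.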
Set to verified.
The HyphenMinus+Number problem is not fixed. Please reopen this bug or, more fittingly, reopen/un-duplicate bug 19848. Tested with Writer 1.9.m49. Prog.
SBA, PLEASE reconsider "It was decided not to do this for different Western languages." Look at issues 1035 and 5966, which have been marked as duplicates: they are examples with _different_ keyboard layouts (Russian and English), and the same applies to e.g. Greek and English. Since this IME-reading functionality has already been developed, it should be configurable (switchable on/off) so that users can choose what works for them (even for Western-only languages, e.g. to define English and French and assign "United States - International" to both).
SBA: OK in 680m52. Closed. Note: I reopened issue 1035 because of the unsolved Greek and Russian keyboard input problem.