Apache OpenOffice (AOO) Bugzilla – Issue 16354
Punctuation font in R2L text differs from the default paragraph font
Last modified: 2013-08-07 15:00:08 UTC
(Original bug report by Mattan) To recreate the bug, follow these steps: 1) Write a Hebrew document with punctuation, using a font different from the default. 2) Check the font of a punctuation mark (say, a dot): it is Times New Roman, but it should be the same as the Hebrew font. One of the side effects is that when selecting a lot of Hebrew text, the font selection box becomes blank (as there are two different fonts in the selected text: the dots and the text). Pretty hard to notice, but Mattan really has it when it comes to obscure bugs...
It seems to me that OO keeps using the Roman font for punctuation/numbering instead of the CTL font; this would explain the font selection box becoming blank. Confirmed on Windows 2000.
DL->FME: Would you please take over?
FME->FT: Well, that's correct. The punctuation characters are classified as "Western", therefore the Western font is used for them. Do you think we have to change this?
*** Issue 16659 has been marked as a duplicate of this issue. ***
Issue 16659 is not necessarily related to this: there, the SIZE of the punctuation is different, not only the font. This is a very different issue, as this behaviour could be considered "normal" (as you can't tell which font to choose for neutral characters); the excessive font issue is definitely NOT normal and shouldn't happen. (Also submitted to Issue 16659.)
mehlng->fm: Quoting Frank Meirs "Do you think we should change that?" Well, the answer is definitely a major YES. It can be changed and definitely should be changed. The algorithm for determining its font is simple: if a neutral sign NS should be written R2L, it uses the non-Western font; otherwise it uses the Western font. It makes sense, MS Word does it, and we should too. Demonstration, caps=heb main directionality=L2R said the man,[1] bruto "SHALOM,[2] LACHEM" rendered as: said the man,[1] bruto "MEHCAL [2],MOLAHS" the [2] comma style should be Hebrew of course, and the [1] comma should definitely be English/Western style.
I agree. I cannot see why we would use Western punctuation within CTL text if the user does not change the IME/Keyboard layout.
FME->FT: A solution would be to classify the punctuation characters depending on the IME that was used to enter the character. FME->KHONG: What do you think? Is it possible for i18n to classify punctuation characters as WEAK or COMPLEX if they are located behind CTL characters, but have them LATIN if they are located behind Asian characters?
mehlng->FM,FT: No, no, no. It doesn't have to be connected to the keyboard layout; it should be connected to text directionality! In "SHA, LOM sha, lom", where caps are Hebrew, the first comma should be non-Western as its directionality is R2L, and the second comma should be Western as its directionality is L2R. This is so simple that I can't see the need for IME/keyboard layout here. Please note that you don't ALWAYS have a keyboard layout available (an on-screen keyboard for instance, or plain text import for another), so the less we depend on IME the better; please stop mentioning it in every corner. We can really avoid the need for such recognition.
*** Issue 13059 has been marked as a duplicate of this issue. ***
Karl->FMT: Since we have glyph substitution for glyphs missing from a font, I don't see a problem with making Latin punctuation a WEAK script type; it will get the real script type of the preceding characters when applying fonts and other language services. We currently make space (0x20 and 0xA0) WEAK type; we could extend the range to 0x20-0x2F, 0x3A-0x40, 0x5B-0x60 and 0x7B-0x7E, which covers all basic Latin punctuation.
FME: There are at least two problems with 'weak' punctuation characters: 1. Sample: SOME ASIAN TEXT. SOME MORE ASIAN TEXT (and English text). In this sample the first '.' will differ from the second '.', and the '(' will not match the ')' 2. Even worse: Already existing documents will change their formatting.
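A minimal sketch of the range extension proposed in the comment above, written in Python purely for illustration. The code point ranges are the ones quoted in the comment; the function names, the "CTL"/"LATIN" labels and the toy Hebrew-only script detector are assumptions made for this example and are not taken from the OpenOffice i18n code.

```python
# Code points already treated as WEAK, according to the comment above.
EXISTING_WEAK = {0x20, 0xA0}
# Proposed ranges (inclusive) covering basic Latin punctuation.
PROPOSED_WEAK_RANGES = [(0x20, 0x2F), (0x3A, 0x40), (0x5B, 0x60), (0x7B, 0x7E)]


def is_weak(ch):
    """Return True if ch would be WEAK under the proposed ranges."""
    cp = ord(ch)
    if cp in EXISTING_WEAK:
        return True
    return any(lo <= cp <= hi for lo, hi in PROPOSED_WEAK_RANGES)


def toy_strong_script(ch):
    """Hypothetical stand-in: Hebrew block -> CTL, everything else -> LATIN."""
    return "CTL" if 0x0590 <= ord(ch) <= 0x05FF else "LATIN"


def resolve_scripts(text, default="LATIN"):
    """Give every WEAK character the script of the preceding strong character."""
    resolved = []
    previous = default
    for ch in text:
        if not is_weak(ch):
            previous = toy_strong_script(ch)
        resolved.append(previous)
    return resolved
```

With this rule, a comma typed after Hebrew text resolves to CTL, while an opening parenthesis typed after Hebrew text and its closing partner typed after Latin text resolve to different scripts.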
*** Issue 18675 has been marked as a duplicate of this issue. ***
mehlng->FME: [1] This is just what I was saying, twice; thanks for explaining it to me. [2] "Even worse: Already existing documents will change their formatting." Why so? Importing raw text will require processing the raw text first. But a document? The document was autoformatted when it was written, so why should we change anything about it?
See also Issue 18675 - this issue causes a simple file to become extremely bloated when exporting to HTML, for no real reason.
FME: Ok, that's it. I'm out of this discussion. FME->FT: I'll reassign this issue to you.
Hi, instead of making such complicated solutions I vote for using just the same font for Western and CTL text.
DL->FT: Please assign this task to the responsible person.
------- Additional Comments From Falko Tesch 2003-09-01 06:20 PDT ------- >instead of making such complicated solutions I vote for using just >the same font for Western and CTL text. Hebrew and Western punctuation do not always match typographically, not to mention that some Hebrew punctuation does not even exist in Roman fonts (for example, the Sheqel sign or the shoulder hyphen). See also issue #19848 for problems that could be affected by this kind of decision.
FT: As long as we have not implemented there is no chance to fix this issue.
I missed out 21019 in the text above.
NO NO NO NO NO NO NO!!!!!!!! You do *NOT* need to implement IME recognition to solve this issue. You do *not*, I repeat NOT, N-O-T, need IME recognition to solve this issue. Let me clarify. ASIAN TEXT . (english text) We should only recognize punctuation by what was typed before it. I.e., a dot will become an Asian dot if ASIAN text was typed before it, but a parenthesis will NOT become Asian unless it is enclosed by Asian text; otherwise it will be English. This algorithm works 99% of the time, whereas IME works even less often. I haven't described it in great detail, but I think the idea is clear. Now repeat after me: We do not need IME recognition We do not need IME recognition We do not need IME recognition We do not need IME recognition We do not need IME recognition Really we don't; wise heuristics will improve things significantly.
FT->Mehling: Please lower your voice and calm down. If you keep on yelling and insisting the way you do now I'm not willing to discuss this issue with you any further! FT->All: I will further investigate this issue.
Mehlng->FT: I'm sorry for my rudeness, but it seems I've been constantly ignored regarding this issue. IMHO using IME is not wise and I'll be sorry to see OOo going this way. After explaining myself nicely, being so plainly ignored by you (stating that this issue depends on IME without any reference to my (very polite) previous comments) is pretty frustrating. Please remember I'm QA'ing for the community alone; no one pays me to do so. Anyway, please consult a Hebrew-speaking person about this issue, as it could matter (see shoshana's comment about typographical differences between punctuation marks). I'd suggest Shachar Shemesh (see http://www.shemesh.biz for contact details), as he handles such things very well.
FT: Here's a solution that will (IMHO) satisfy all needs without using an automatism that will discriminate one or the other users: OO.o 2.0 will introduce a paragraph attribute that determines what script will be used for punctuation characters within the ASCII range 0-127. In case of OO.o running with a Western locale (English, German etc.) Western script will be used as the default setting for punctuation characters in all paragraphs. The above goes for CTL and CJK script languages of course. In case that a different setting is desired the user will be able to change these settings manually by hard formatting or soft formatting (using a style).
mehlng->ft: Again, I see no point in using a strict rule for all punctuation, which will DEFINITELY cause mistakes (think of SHALOM hello, man, hello SHALOM), when a wiser approach exists. Please contact Shachar Shemesh ( http://www.shemesh.biz ) and he might explain it to you. In short, any punctuation between two R2L words is R2L as well.
FME->mehlng: [...] In short, any punctuation between two R2L words is R2L as well. [...] FT's proposal is not about the direction of the punctuation marks. It's a solution for the problem of which of the three fonts (Western, Asian, CTL) to use for the punctuation marks. The direction issue is discussed in i18024. I think this is a good solution, which perfectly matches the reported problem. Weak characters (e.g., space) will use the font of their predecessors (just like it is now); punctuation marks will use the font specified by the new paragraph attribute.
mehlng->fme: I meant that the punctuation's directionality, and thus its FONT, will be Hebrew. Think of this: SAID THE MAN "if we can't beat 'em - apply some complex BiDi algorithm to confuse them" All the punctuation will wrongly be using an R2L font, as it's an English quote in a Hebrew paragraph. Still, it obviously should use an L2R font, as it's part of an English sentence. With my method, all of the punctuation except the two quotation marks will correctly be using an English font.
FME->mehlng: Since the punctuation marks are currently defined as Latin, we do not have a problem with LTR text in RTL paragraphs. We currently do not have a problem with Asian text, because most likely, Asians will use their full-width punctuation marks. The only thing we would change, is that we would use the CTL font for text inside a RTL run, right?
mehlng->fme: Correct, we need to use the CTL (Middle-Eastern, Hebrew) font for punctuation between CTL words and OF COURSE(!!!) vice versa. In an L2R paragraph with an inline R2L sentence, we'll use the CTL (Hebrew) font for the punctuation inside the R2L sentence. For instance: SHALOM,[*] MAR KONILEMEL it was nice to meet you. The comma marked with [*] will be using an R2L (Hebrew) font. Thanks for commenting so fast.
dina: fwi
FME: Ok, I'll implement it this way: 1. Inside a LTR run: Since punctuation characters are defined as 'Latin' by default, these characters use the Latin font inside a LTR run. Nothing has to be changed in this case. 2. Inside a RTL run: All characters inside a RTL run will use the CTL font. Any objections?
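A minimal Python sketch of the rule stated in the comment above (Latin font inside a LTR run, CTL font inside a RTL run). It is not the code from porlay.cxx. Real runs come from the Unicode Bidirectional Algorithm; the run detection here is a deliberate simplification and an assumption of this example: only strong letters set a direction, a stretch of other characters between two strong characters of the same direction takes that direction, and any other stretch takes the paragraph direction. Digits and explicit embeddings are not handled. All function names and the "CTL"/"Western" labels are hypothetical.

```python
import unicodedata


def strong_direction(ch):
    """Return 'RTL' or 'LTR' for strong characters, None for everything else."""
    bidi = unicodedata.bidirectional(ch)
    if bidi in ("R", "AL"):
        return "RTL"
    if bidi == "L":
        return "LTR"
    return None


def resolve_directions(text, paragraph_dir="LTR"):
    """Assign a direction to every character (simplified, see lead-in)."""
    dirs = [strong_direction(ch) for ch in text]
    n = len(dirs)
    i = 0
    while i < n:
        if dirs[i] is not None:
            i += 1
            continue
        # Find the end of this stretch of non-strong characters.
        j = i
        while j < n and dirs[j] is None:
            j += 1
        before = dirs[i - 1] if i > 0 else paragraph_dir
        after = dirs[j] if j < n else paragraph_dir
        fill = before if before == after else paragraph_dir
        for k in range(i, j):
            dirs[k] = fill
        i = j
    return dirs


def fonts_for(text, paragraph_dir="LTR"):
    """CTL font inside RTL runs, Western font inside LTR runs."""
    return ["CTL" if d == "RTL" else "Western"
            for d in resolve_directions(text, paragraph_dir)]
```

Applied to a Hebrew rendering of the earlier "SHA, LOM sha, lom" example in a L2R paragraph, the first comma gets the CTL font and the second one the Western font.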
mehlng->fme: agreed. Punctuations between R2L and L2R paragraphs could use the paragraph font.
FME: Fixed in cws swq02: sw/source/core/text/porfld.cxx 1.41.10.1 sw/source/core/text/porfld.hxx 1.9.74.1 sw/source/core/text/porlay.cxx 1.43.50.1 FME->US: Please do some thorough testing with script changes, fields should also be considered. Thanx.
*** Issue 23282 has been marked as a duplicate of this issue. ***
Reassigned to QA.
FT: Added this feature to spec. See http://specs.openoffice.org/CTL/Editing_Bidirectional_Text.sxw chapter 1.7.
The URL you gave is giving me a 404. I don't see the file when browsing the specs either. Is it publicly available?
SBA: The correct URL is http://specs.openoffice.org/writer/CTL/Editing_Bidirectional_Text.sxw
SBA: Verified in CWS swq02.
SBA->Sforbes: Before we close this one, some RTL-LTR specialists like you should have a look at the 680 master build where the CWS swq02 is integrated (should be available within the next two weeks) and comment here.
The spec ignores my comment. In a R2L paragraph, *all* punctuation will be using the CTL font. This is clearly problematic, as I've shown[*]. I think the spec should be reconsidered. [*]The following is R2L: SHALOM, LACHEM I SAID "hi all! hi" The comma should obviously be CTL and the exclamation mark should be L2R. The exclamation mark is already recognized as L2R text by the BiDi algorithm; I can see no reason for strictly assigning the CTL font to all punctuation in a R2L paragraph.
us->mehlng: I don't get the point of reopening the issue. You explicitly agreed to FME's suggestion for a fix: "Inside a RTL run: All characters inside a RTL run will use the CTL font."
I probably misunderstood fme's note. I proposed that, inside the paragraph, a distinction be made for punctuation between L2R text. I thought "L2R run" in fme's proposal meant "piece of text inside the paragraph" and not "L2R paragraph", as he probably meant. fme thought that I agreed to strictly setting all punctuation fonts according to the paragraph's directionality, which I didn't. A misunderstanding.
FME: A 'run' in my understanding is a piece of text with the same direction. So a L2R run can be either 1. A complete L2R paragraph, 2. A piece of L2R text in a R2L paragraph, 3. A piece of L2R text inside a piece of R2L text inside a L2R paragraph and so on. So we should not have any problems, should we?
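A small Python sketch of this definition of a 'run'. It assumes, purely for illustration, that each character already carries an embedding level in the sense of the Unicode Bidirectional Algorithm, where even levels are left-to-right and odd levels are right-to-left; the thread does not say how the implementation stores this. The function name and the tuple layout are hypothetical.

```python
from itertools import groupby


def runs_from_levels(levels):
    """Group consecutive equal embedding levels into (start, end, direction) runs.

    `end` is exclusive. Even levels are LTR, odd levels are RTL.
    """
    runs = []
    start = 0
    for level, group in groupby(levels):
        length = len(list(group))
        direction = "RTL" if level % 2 else "LTR"
        runs.append((start, start + length, direction))
        start += length
    return runs
```

A complete L2R paragraph is a single level-0 run; L2R text inside R2L text inside a L2R paragraph shows up as a level-2 run nested between level-1 runs, and it is still a L2R run.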
FT: Due to legal reasons the specification mentioned above had to be removed from public access.
mehlng->fme: no, completely agreed and sorry for bothering you about this.
After agreement setting back to 'fixed'.
Re-closing 'fixed'/'verified' issue.
I don't quite see how fixing this bug would change the way Hyphen-Minus behaves in RTL texts (Bug 19848). Does it make OO Unicode 4.0.1 compliant? Now, excuse my newbieness, but where can I download a build with this fix? I'd like to test it out. Prog.
prog: the 680 snapshots are available from: http://download.openoffice.org/680/index.html