16354 – Punctuation font in R2L text differs from the default paragraph font

Summary: Punctuation font in R2L text differs from the default paragraph font

Status:	CLOSED FIXED

Alias:	None

Product:	Internationalization
Classification:	Code
Component:	BiDi
Version:	OOo 1.1 RC3
Hardware:	PC All

Importance:	P4 Trivial
Target Milestone:	---
Assignee:	ulf.stroehler
QA Contact:	issues@l10n

URL:
Keywords:	needhelp, needmoreinfo, oooqa

Duplicates (3):	13059 16659 23282
Depends on:	21019
Blocks:	18675 19848

Reported:	2003-07-02 21:27 UTC by mehlng
Modified:	2013-08-07 15:00 UTC
CC List:	6 users

See Also:
Issue Type:	DEFECT
Latest Confirmation in:	---
Developer Difficulty:	---

Description mehlng 2003-07-02 21:27:59 UTC

(Original bug report by Mattan)
To reproduce the bug, follow these steps:
1) Write a Hebrew document containing punctuation, using a font other than the default.
2) Check the font of a punctuation mark (say, a dot): it is Times New Roman, but it
should be the same as the Hebrew font.
One side effect is that when a large amount of Hebrew text is selected, the font
selection box becomes blank (because the selection contains two different fonts: one
for the dots and one for the text).

Pretty hard to notice, but Mattan really has a knack for obscure bugs...

Comment 1 sforbes 2003-07-10 15:37:53 UTC

It seems to me that OO keeps using the Roman font for
punctuation/numbering instead of the CTL font; this would explain the
"choose font" box becoming blank.

Confirmed on Windows 2000.

Comment 2 Dieter.Loeschky 2003-08-20 11:32:59 UTC

DL->FME: Would you please take over?

Comment 3 frank.meies 2003-08-20 11:50:17 UTC

FME->FT: Well, that's correct. The punctuation characters are
classified as "Western", therefore the Western font is used for them.
Do you think we have to change this?

Comment 4 frank.meies 2003-08-20 12:01:36 UTC

*** Issue 16659 has been marked as a duplicate of this issue. ***

Comment 5 mehlng 2003-08-20 12:25:17 UTC

Issue 16659 is not necessarily related to this one: there, the SIZE of the
punctuation is different, not only the font.
That is a very different issue. The behaviour reported here could be
considered "normal" (as you can't tell which font to choose for neutral
characters), but the excessive font issue is definitely NOT normal and
shouldn't happen.
(Also submitted to Issue 16659.)

Comment 6 mehlng 2003-08-20 12:30:33 UTC

mehlng->fme:
Quoting Frank Meies: "Do you think we should change that?"
Well, the answer is definitely a major YES. It can be changed and
definitely should be changed.
The algorithm for determining the font is simple: if a neutral sign (NS)
is to be written R2L, it uses the non-Western font; otherwise it uses the
Western font. It makes sense, MS Word does it, and we should too.
Demonstration (caps = Hebrew, main directionality = L2R):
said the man,[1] bruto "SHALOM,[2] LACHEM"
rendered as:
said the man,[1] bruto "MEHCAL [2],MOLAHS"
Comma [2] should of course use the Hebrew style, and comma [1]
should definitely use the English (Western) style.
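
A minimal sketch of the directionality rule proposed above, not the OOo implementation. It assumes the thread's convention that uppercase ASCII stands in for Hebrew, treats every non-letter as neutral, and resolves a neutral the way the Unicode BiDi rules for neutrals do in their simplest form: it takes the direction of its strong neighbours when they agree, otherwise the paragraph direction. The names `strong_dir` and `resolve_fonts` are hypothetical.

```python
def strong_dir(ch):
    """'R' for uppercase letters (stand-in for Hebrew), 'L' for lowercase, None for neutrals."""
    if ch.isalpha():
        return "R" if ch.isupper() else "L"
    return None


def resolve_fonts(text, base="L"):
    """Return one font name ('CTL' or 'Western') per character of text."""
    dirs = [strong_dir(c) for c in text]
    fonts = []
    for i, d in enumerate(dirs):
        if d is None:
            # Nearest strong character on each side; paragraph edges count as base.
            before = next((x for x in reversed(dirs[:i]) if x), base)
            after = next((x for x in dirs[i + 1:] if x), base)
            d = before if before == after else base
        fonts.append("CTL" if d == "R" else "Western")
    return fonts


sample = 'said the man, bruto "SHALOM, LACHEM"'
fonts = resolve_fonts(sample)
print(fonts[sample.index(",")], fonts[sample.rindex(",")])
```

Under these assumptions the first comma resolves to the Western font and the second to the CTL font, which is the outcome the comment asks for.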

Comment 7 falko.tesch 2003-08-22 11:10:27 UTC

I agree. I cannot see why we would use Western punctuation within CTL
text if the user does not change the IME/Keyboard layout.

Comment 8 frank.meies 2003-08-22 11:36:59 UTC

FME->FT: A solution would be to classify the punctuation characters
depending on the IME that was used to enter the character.

FME->KHONG: What do you think? Is it possible for i18n to classify
punctuation characters as WEAK or COMPLEX if they are located behind
CTL characters, but have them LATIN if they are located behind Asian
characters?

Comment 9 mehlng 2003-08-22 16:02:34 UTC

mehlng->FME,FT: No, no, no. It doesn't have to be connected to the keyboard
layout; it should be connected to text directionality!
In "SHA, LOM sha, lom", where caps are Hebrew, the first comma should be
non-Western, as its directionality is R2L, and the second comma should
be Western, as its directionality is L2R.
This is so simple that I can't see the need for IME/keyboard layout here.

Please note that you don't ALWAYS have a keyboard layout available (an on-screen
keyboard, for instance, or plain text import, for another), so
the less we depend on the IME the better. Please stop mentioning it at
every turn. We can really avoid the need for such recognition.

Comment 10 frank.meies 2003-08-25 09:04:52 UTC

*** Issue 13059 has been marked as a duplicate of this issue. ***

Comment 11 karl.hong 2003-08-26 00:13:19 UTC

Karl->FMT: Since we have glyph substitution for glyphs missing from a
font, I don't see a problem with making Latin punctuation the WEAK
script type; such characters will get the real script type of the preceding
characters when fonts and other language services are applied. We
currently treat space (0x20 and 0xA0) as WEAK type; we could extend
the range to 0x20-0x2F, 0x3A-0x40, 0x5B-0x60 and 0x7B-0x7E, which
covers all basic Latin punctuation.
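
As a quick check of the ranges quoted above, the sketch below marks a character as WEAK when its code point falls in one of them (plus the no-break space 0xA0 that the comment says is already WEAK) and confirms that every ASCII punctuation character is covered. `WEAK_RANGES` and `is_weak` are illustrative names, not identifiers from the OOo i18n code.

```python
import string

# Ranges proposed in the comment, plus 0xA0 (no-break space), already treated as WEAK.
WEAK_RANGES = [(0x20, 0x2F), (0x3A, 0x40), (0x5B, 0x60), (0x7B, 0x7E), (0xA0, 0xA0)]


def is_weak(ch):
    """True if the character's code point lies in one of the proposed WEAK ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in WEAK_RANGES)


# string.punctuation holds the 32 ASCII punctuation characters.
uncovered = [c for c in string.punctuation if not is_weak(c)]
print(uncovered)
```

Digits and letters stay outside the ranges, so only spaces and punctuation would inherit the script type of the preceding text.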

Comment 12 frank.meies 2003-08-26 07:43:41 UTC

FME: There are at least two problems with 'weak' punctuation characters:

1. Sample: SOME ASIAN TEXT. SOME MORE ASIAN TEXT (and English text).

In this sample the first '.' will differ from the second '.', and the
'(' will not match the ')'

2. Even worse: Already existing documents will change their formatting.
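
The first problem can be reproduced with a small sketch, assuming that weak characters simply inherit the script of the nearest preceding strong character. Uppercase letters stand in for Asian text and lowercase for Latin text (so the sample writes "english" in lowercase); `script_of` and `assign_scripts` are hypothetical helpers.

```python
def script_of(ch):
    """'ASIAN' for uppercase letters (stand-in), 'LATIN' for lowercase, None for weak characters."""
    if ch.isalpha():
        return "ASIAN" if ch.isupper() else "LATIN"
    return None


def assign_scripts(text, default="LATIN"):
    """Give each weak character the script of the nearest preceding strong character."""
    result = []
    current = default
    for ch in text:
        s = script_of(ch)
        if s is not None:
            current = s
        result.append(current)
    return result


sample = "SOME ASIAN TEXT. SOME MORE ASIAN TEXT (and english text)."
scripts = assign_scripts(sample)
for mark in (sample.index("."), sample.rindex("."), sample.index("("), sample.index(")")):
    print(repr(sample[mark]), scripts[mark])
```

With predecessor inheritance the two full stops end up in different scripts, and so do the opening and closing parentheses, which is the mismatch described in point 1.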

Comment 13 mehlng 2003-08-26 09:32:19 UTC

*** Issue 18675 has been marked as a duplicate of this issue. ***

Comment 14 mehlng 2003-08-26 09:39:56 UTC

mehlng->FME:
[1] This is just what I was saying, twice; thanks for explaining it to me.
[2] "Even worse: Already existing documents will change their
formatting." Why so? Importing raw text will require processing the raw
text first. But a document? The document was autoformatted when it
was written, so why should we change anything about it?

Comment 15 sforbes 2003-08-26 09:51:26 UTC

See also Issue 18675: this issue causes a simple file to become
extremely bloated when exported to HTML, for no real reason.

Comment 16 michael.brauer 2003-08-27 11:55:23 UTC

*** Issue 18675 has been marked as a duplicate of this issue. ***

Comment 17 frank.meies 2003-08-28 07:39:29 UTC

FME: Ok, that's it. I'm out of this discussion.

FME->FT: I'll reassign this issue to you.

Comment 18 falko.tesch 2003-09-01 14:20:12 UTC

Hi, instead of making such complicated solutions I vote for using just
the same font for Western and CTL text.

Comment 19 Dieter.Loeschky 2003-09-12 14:24:17 UTC

DL->FT: Please assign this task to the responsible person.

Comment 20 sforbes 2003-10-08 19:40:55 UTC

------- Additional Comments From Falko Tesch 2003-09-01 06:20 PDT -------
 
>instead of making such complicated solutions I vote for using just
>the same font for Western and CTL text.

Hebrew and Western punctuation do not always match typographically, not to mention
that some Hebrew punctuation does not even exist in Roman fonts (for example, the
Sheqel sign or the shoulder hyphen).
See also issue #19848 for problems that could be affected by this kind of decision.

Comment 21 falko.tesch 2003-10-27 16:30:40 UTC

FT: As long as we have not implemented  there is no chance to fix this
issue.

Comment 22 falko.tesch 2003-10-27 16:31:22 UTC

I missed out 21019 in the text above.

Comment 23 mehlng 2003-10-27 18:12:59 UTC

NO NO NO NO NO NO NO!!!!!!!! 
you do *NOT* need to implement IME recognition to solve this issue. 
You do *not* I repeat NOT , N-O-T need IME recognition to solve this issue. 
Let us clarify ourselves. 
ASIAN TEXT . (english text) 
We should only recognize punctuation by what was typed before it. I.e., a dot will
become an Asian dot if ASIAN text was typed before it, but a parenthesis will NOT
become Asian unless it is enclosed by Asian text; otherwise it will be English.
This algorithm works 99% of the time, whereas IME works even less often. I haven't
described it in great detail, but I think the idea is clear.
Now repeat after me: 
We do not need IME recognition 
We do not need IME recognition 
We do not need IME recognition 
We do not need IME recognition 
We do not need IME recognition 
Really we don't, wise heuristics will improve things significantly.

Comment 24 falko.tesch 2003-10-28 09:48:17 UTC

FT->mehlng: Please lower your voice and calm down. If you keep on
yelling and insisting the way you do now I'm not willing to discuss
this issue with you any further!

FT->All: I will further investigate this issue.

Comment 25 mehlng 2003-10-28 11:40:25 UTC

mehlng->FT: I'm sorry for my rudeness, but it seems I've been constantly ignored
regarding this issue. IMHO using IME is not wise, and I'd be sorry to see OOo going
this way. Explaining myself nicely and then being blatantly ignored by you (stating that
this issue depends on IME without any reference to my (very polite) previous comments)
is pretty disheartening. Please remember that I'm doing QA for the community alone; no one
pays me to do so.
Anyway, please consult a Hebrew-speaking person about this issue, as it could matter
(see Shoshana's comment about typographical differences between punctuation marks). I'd
suggest Shachar Shemesh (see http://www.shemesh.biz for contact details), as he
handles such things very well.

Comment 26 falko.tesch 2003-11-17 15:20:05 UTC

FT: Here's a solution that will (IMHO) satisfy all needs without using
an automatism that discriminates against one group of users or another:
OO.o 2.0 will introduce a paragraph attribute that determines which
script is used for punctuation characters within the ASCII range
0-127.
When OO.o runs with a Western locale (English, German, etc.),
Western script will be the default setting for punctuation
characters in all paragraphs.
The same applies to CTL and CJK script locales, of course.
If a different setting is desired, the user will be able to
change it manually by hard formatting or soft formatting
(using a style).
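
A rough sketch of how such a paragraph attribute could behave, under stated assumptions: the attribute name `punctuation_script`, the `Paragraph` class and the locale table are all hypothetical, and only the Western default for English and German comes from the comment. The `he` and `ja` entries are illustrative guesses at the "same for CTL and CJK" remark.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical locale defaults; only the 'en'/'de' -> Western entries are stated in the thread.
DEFAULT_PUNCT_SCRIPT = {"en": "Western", "de": "Western", "he": "CTL", "ja": "CJK"}


@dataclass
class Paragraph:
    text: str
    # None means "not set by the user": fall back to the locale default.
    punctuation_script: Optional[str] = None


def punctuation_script(par, locale):
    """Script used for ASCII punctuation in this paragraph."""
    return par.punctuation_script or DEFAULT_PUNCT_SCRIPT.get(locale, "Western")


plain = Paragraph("SHALOM, LACHEM")
styled = Paragraph("SHALOM, LACHEM", punctuation_script="CTL")
print(punctuation_script(plain, "en"), punctuation_script(styled, "en"))
```

Note that a single per-paragraph value cannot distinguish the two commas in a mixed-direction paragraph, which is the objection raised in the following comment.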

Comment 27 mehlng 2003-11-19 12:16:56 UTC

mehlng->ft:
Again, I see no point in using a strict rule for all punctuation, which
will DEFINITELY cause mistakes (think of SHALOM hello, man, hello
SHALOM), when a wiser approach exists.

Please contact Shachar Shemesh ( http://www.shemesh.biz ); he might be able to
explain it to you. In short, any punctuation between two
R2L words is R2L as well.

Comment 28 frank.meies 2003-11-20 07:53:31 UTC

FME->mehlng: [...] In short, any punctuation between two
R2L words is R2L as well. [...]

FT's proposal is not about the direction of the punctuation marks.
It's a solution for the problem, which of the three fonts (Western,
Asian, CTL) to use for the punctuation marks. The direction issue is
discussed in i18024.

I think this is a good solution, which perfectly matches the reported
problem. Weak characters (e.g., space) will use the font of their
predecessors (just like it is now), punctuation marks will use the
font specified by the new paragraph attribute.

Comment 29 mehlng 2003-11-30 18:48:26 UTC

mehlng->fme: 
I meant that the punctuation's directionality, and thus its FONT, will be Hebrew.
Think of this:
SAID THE MAN "if we can't beat 'em - apply some complex BiDi algorithm to confuse
them"
All the punctuation will wrongly use an R2L font, as it's an English quote in a
Hebrew paragraph. Yet it obviously should use an L2R font, as it is part of
an English sentence. With my method, all of it except the two quotation marks
will correctly use an English font.

Comment 30 frank.meies 2003-12-01 10:32:38 UTC

FME->mehlng: Since the punctuation marks are currently defined as
Latin, we do not have a problem with LTR text in RTL paragraphs. We
currently do not have a problem with Asian text, because most likely,
Asians will use their full-width punctuation marks. The only thing we
would change, is that we would use the CTL font for text inside a RTL
run, right?

Comment 31 mehlng 2003-12-01 12:22:09 UTC

mehlng->fme: 
Correct, we need to use the CTL (Middle Eastern, Hebrew) font for punctuation
between CTL words and OF COURSE(!!!) vice versa: in an L2R paragraph with an
inline R2L sentence, we'll use the CTL (Hebrew) font for the punctuation inside the
R2L sentence.
For instance: SHALOM,[*] MAR KONILEMEL it was nice to meet you.
The comma marked with [*] will use an R2L (Hebrew) font.

Thanks for commenting so fast.

Comment 32 sforbes 2003-12-03 11:02:24 UTC

dina: fwi

Comment 33 frank.meies 2003-12-09 11:16:06 UTC

FME: Ok, I'll implement it this way:

1. Inside a LTR run: Since punctuation characters are defined as 'Latin' by
default, these characters use the Latin font inside a LTR run. Nothing has to be
changed in this case.

2. Inside a RTL run: All characters inside a RTL run will use the CTL font.

Any objections?
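
The two rules above can be sketched as follows. This is not the code that went into porlay.cxx: it assumes the text has already been split into single-direction runs, ignores Asian text entirely, and uses a hypothetical sample in which caps stand in for Hebrew. `fonts_for_runs` is an illustrative name.

```python
def fonts_for_runs(runs):
    """Map each character to a font, given (direction, text) runs of a single direction each."""
    out = []
    for direction, text in runs:
        # Rule 2: everything inside an RTL run, punctuation included, uses the CTL font.
        # Rule 1: inside an LTR run punctuation stays 'Latin', i.e. the Western font.
        font = "CTL" if direction == "RTL" else "Western"
        out.extend((ch, font) for ch in text)
    return out


# An R2L paragraph with an inline English quote, already split into runs by hand.
runs = [("RTL", 'SHALOM, LACHEM I SAID "'), ("LTR", "hi all! hi"), ("RTL", '"')]
font_of = {ch: font for ch, font in fonts_for_runs(runs) if ch in ",!"}
print(font_of)
```

Because the decision is made per run rather than per paragraph, the comma between the Hebrew words gets the CTL font while the exclamation mark inside the English quote keeps the Western font.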

Comment 34 mehlng 2003-12-10 08:27:14 UTC

mehlng->fme:
agreed.
Punctuations between R2L and L2R paragraphs could use the paragraph font.

Comment 35 frank.meies 2003-12-16 10:00:04 UTC

FME: Fixed in cws swq02:

sw/source/core/text/porfld.cxx 1.41.10.1
sw/source/core/text/porfld.hxx 1.9.74.1
sw/source/core/text/porlay.cxx 1.43.50.1

FME->US: Please do some thorough testing with script changes, fields should also
be considered. Thanx.

Comment 36 sforbes 2003-12-23 12:46:12 UTC

*** Issue 23282 has been marked as a duplicate of this issue. ***

Comment 37 frank.meies 2004-01-07 15:42:31 UTC

Reassigned to QA.

Comment 38 falko.tesch 2004-01-09 11:01:51 UTC

FT: Added this feature to spec.
See 
http://specs.openoffice.org/CTL/Editing_Bidirectional_Text.sxw
chapter 1.7.

Comment 39 sforbes 2004-01-09 12:05:14 UTC

The URL you gave is giving me a 404. I don't see the file when browsing the specs either. Is it publicly
available?

Comment 40 stefan.baltzer 2004-02-03 17:36:41 UTC

SBA: The correct URL is
http://specs.openoffice.org/writer/CTL/Editing_Bidirectional_Text.sxw

Comment 41 stefan.baltzer 2004-02-04 12:04:35 UTC

SBA: Verified in CWS swq02.

Comment 42 stefan.baltzer 2004-02-04 12:09:05 UTC

SBA->Sforbes: Before we close this one, some RTL-LTR specialists like you should
have a look at the 680 master build where the CWS swq02 is integrated (should be
available within the next two weeks) and comment here.

Comment 43 mehlng 2004-02-15 09:21:04 UTC

The spec ignores my comment.
In an R2L paragraph, *all* punctuation will use the CTL font. This is clearly
problematic, as I've shown[*]. I think the spec should be reconsidered.
[*] The following is R2L:
SHALOM, LACHEM I SAID "hi all! hi"
The comma should obviously be CTL and the exclamation mark should be L2R. The
exclamation mark is already recognized as L2R text by the BiDi algorithm; I can see
no reason for strictly assigning the CTL font to all punctuation in an R2L paragraph.

Comment 44 ulf.stroehler 2004-06-03 14:38:18 UTC

us->mehlng: I don't see the point of reopening the issue. You explicitly
agreed to FME's suggestion for a fix: "Inside a RTL run: All characters inside a
RTL run will use the CTL font."

Comment 45 mehlng 2004-06-19 21:32:16 UTC

I probably misunderstood fme's note.
I proposed that inside the paragraph there would be a distinction for
punctuation between L2R text. I thought "L2R run" in fme's proposal meant
"a piece of text inside the paragraph" and not "an L2R paragraph", as he probably meant.
fme thought that I had agreed to strictly setting all punctuation's font
according to the paragraph's directionality, which I hadn't.
A misunderstanding.

Comment 46 frank.meies 2004-06-20 20:37:24 UTC

FME: A 'run' in my understanding is a piece of text with the same direction. So
a L2R run can be either 
1. A complete L2R paragraph,
2. A piece of L2R text in a R2L paragraph,
3. A piece of L2R text inside a piece of R2L text inside a L2R paragraph and so on.
So we should not have any problems, should we?
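
The three cases can be illustrated with a short sketch that groups characters into runs by embedding level, using the Unicode BiDi convention that even levels are L2R and odd levels are R2L. The levels here are supplied by hand for a made-up sample (caps stand in for Hebrew); computing them is the job of the BiDi algorithm and is not shown. `runs_from_levels` is a hypothetical helper.

```python
from itertools import groupby


def runs_from_levels(text, levels):
    """Split text into (direction, substring) runs of equal embedding level."""
    runs = []
    pos = 0
    for level, group in groupby(levels):
        length = len(list(group))
        # Even embedding levels are left-to-right, odd levels right-to-left.
        direction = "L2R" if level % 2 == 0 else "R2L"
        runs.append((direction, text[pos:pos + length]))
        pos += length
    return runs


# L2R paragraph (level 0) with R2L text (level 1) that itself contains L2R text (level 2).
pieces = [("he said ", 0), ("SHALOM ", 1), ("ok", 2), (" LACHEM", 1), (" to me", 0)]
text = "".join(p for p, _ in pieces)
levels = [lvl for p, lvl in pieces for _ in p]
for run in runs_from_levels(text, levels):
    print(run)
```

The nested "ok" forms its own L2R run even though it sits inside R2L text, which corresponds to case 3 in the list above.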

Comment 47 falko.tesch 2004-06-21 10:37:46 UTC

FT:Due to legal reasons the had to be removed from public access.

Comment 48 falko.tesch 2004-06-21 10:38:25 UTC

FT: Due to legal reasons the specification mentioned above had to be removed
from public access.

Comment 49 mehlng 2004-06-23 05:09:47 UTC

mehlng->fme:
no, completely agreed and sorry for bothering you about this.

Comment 50 ulf.stroehler 2004-07-29 13:52:32 UTC

After agreement setting back to 'fixed'.

Comment 51 ulf.stroehler 2004-07-29 13:54:56 UTC

Re-closing 'fixed'/'verified' issue.

Comment 52 prognathous 2004-08-02 12:53:44 UTC

I don't quite see how fixing this bug would change the way Hyphen-Minus behaves
in RTL texts (Bug 19848). Does it make OO Unicode 4.0.1 compliant?

Now, excuse my newbieness, but where can I download a build with this fix? I'd
like to test it out.

Prog.

Comment 53 sforbes 2004-08-02 13:12:43 UTC

prog: the 680 snapshots are available from:
http://download.openoffice.org/680/index.html