Issue 19565 - Save as RTF result depends on the current locale
Summary: Save as RTF result depends on the current locale
Status: CLOSED IRREPRODUCIBLE
Alias: None
Product: Writer
Classification: Application
Component: code
Version: OOo 1.1 RC4
Hardware: PC Linux, all
Importance: P3 Trivial
Target Milestone: ---
Assignee: michael.ruess
QA Contact: issues@sw
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2003-09-14 17:49 UTC by pavel
Modified: 2013-08-07 14:38 UTC

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
All files used/produced (8.34 KB, application/octet-stream)
2003-09-14 17:50 UTC, pavel

Description pavel 2003-09-14 17:49:11 UTC
Hi,

I wrote a file with Czech letters (attached as czech_letters.sxw) in English RC4.
Then I did the following:

pavel@pavel:~/OpenOffice.org1.1.0-English> LANG=C
pavel@pavel:~/OpenOffice.org1.1.0-English> ./soffice /tmp/czech_letters.sxw 
pavel@pavel:~/OpenOffice.org1.1.0-English> LANG=en_US
pavel@pavel:~/OpenOffice.org1.1.0-English> ./soffice /tmp/czech_letters.sxw 
pavel@pavel:~/OpenOffice.org1.1.0-English> LANG=cs_CZ
pavel@pavel:~/OpenOffice.org1.1.0-English> ./soffice /tmp/czech_letters.sxw 
pavel@pavel:~/OpenOffice.org1.1.0-English> LANG=ru_RU
pavel@pavel:~/OpenOffice.org1.1.0-English> ./soffice /tmp/czech_letters.sxw
pavel@pavel:~/OpenOffice.org1.1.0-English> LANG=da_DK
pavel@pavel:~/OpenOffice.org1.1.0-English> ./soffice /tmp/czech_letters.sxw

In each session, I saved the document as RTF under the name
locale_<locale>.rtf. All documents are attached.

The differences are only in fcharset:

diff from cs_CZ to da_DK:

-{\fonttbl{\f0\froman\fprq2\fcharset238
+{\fonttbl{\f0\froman\fprq2\fcharset0

What is the idea behind this?
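
To repeat this comparison across all the saved files, the \fcharset values can be pulled out of each font table. This is only a sketch: the function name font_charsets is made up for illustration, and a regular expression is not a real RTF parser.

```python
import re

# Matches the \fcharsetN control word used in RTF font table entries.
FCHARSET_RE = re.compile(r"\\fcharset(\d+)")

def font_charsets(rtf_source):
    """Return every \\fcharset value found in the RTF source, in order."""
    return [int(number) for number in FCHARSET_RE.findall(rtf_source)]

# The two font table lines from the diff above.
cs_line = r"{\fonttbl{\f0\froman\fprq2\fcharset238"
da_line = r"{\fonttbl{\f0\froman\fprq2\fcharset0"

print(font_charsets(cs_line), font_charsets(da_line))
```

Reading each locale_<locale>.rtf file and passing its contents to the function would show which charset each session wrote.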
Comment 1 pavel 2003-09-14 17:50:07 UTC
Created attachment 9303
All files used/produced
Comment 2 mci 2003-09-16 09:51:39 UTC
reassigned to mru
Comment 3 michael.ruess 2003-09-17 12:33:47 UTC
MRU->CMC: Did anything change in the limerickfilterteam08 CWS?
Could you give a short comment on this? Thanks.
Comment 4 caolanm 2003-09-17 12:58:00 UTC
The charset in the RTF comes from the font details. Different locales
may set different font details, and the filter just passes them
through. For RTF this implies that text in the document is encoded in
the charset of the font in use, unless it is written as Unicode.

For limerick, a fix for issue 12120 and issue 13486 was implemented to
fix corruption of some 8-bit characters exported to RTF. The problem
was a bad assumption in the code: it always exported a character as
codepage 1252 if its value was in that range, regardless of the font's
charset, which is what actually denotes the charset to use for these
8-bit characters. This happens to work for many characters because
many encodings are similar, but in many other circumstances it
doesn't, especially for Czech encodings.
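
The rule described above can be sketched in Python. This is an illustration under stated assumptions, not the filter's actual code: the name rtf_escape and the table FCHARSET_TO_CODEC are hypothetical, and the table covers only three charsets (0 as codepage 1252, 238 as codepage 1250, 204 as codepage 1251).

```python
# Assumed mapping from RTF \fcharset values to Python codec names.
FCHARSET_TO_CODEC = {0: "cp1252", 204: "cp1251", 238: "cp1250"}

def rtf_escape(text, fcharset):
    """Escape text for RTF using the codepage implied by the font's charset."""
    codec = FCHARSET_TO_CODEC[fcharset]
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)          # RTF syntax characters
        elif ord(ch) < 0x80:
            out.append(ch)                 # plain ASCII passes through
        else:
            try:
                # 8-bit escape in the font's own codepage.
                out.append("\\'%02x" % ch.encode(codec)[0])
            except UnicodeEncodeError:
                # Not representable in that codepage: write \uN with "?"
                # as the fallback character, one escape per UTF-16 unit.
                units = ch.encode("utf-16-le")
                for i in range(0, len(units), 2):
                    unit = int.from_bytes(units[i:i + 2], "little")
                    if unit > 0x7FFF:
                        unit -= 0x10000    # \uN takes a signed 16-bit value
                    out.append("\\u%d?" % unit)
    return "".join(out)

print(rtf_escape("č", 238))  # Czech letter under an Eastern European font
print(rtf_escape("č", 0))    # same letter under a Western font
```

With these assumptions, a letter such as "é" gets the same 8-bit escape under charset 0 and charset 238, which is why the old codepage 1252 assumption went unnoticed for many characters, while "č" only has an 8-bit form under charset 238.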

So the difference in the fonts' charsets between these documents is
OK, but the text in the documents should either differ between the
documents or be in Unicode wherever characters are used that sit at
different positions in the different fonts' charsets.
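
As a small illustration of why the text has to follow the font's charset, the snippet below decodes one byte under three Windows codepages. The byte value 0xE8 is chosen only as an example.

```python
# One 8-bit value, read back under three different codepages.
raw = b"\xe8"
decoded = {codec: raw.decode(codec) for codec in ("cp1250", "cp1252", "cp1251")}

for codec, char in decoded.items():
    print(codec, char)
```

The same byte yields a different letter under each codepage, so an 8-bit escape is only meaningful together with the charset of the font it is set in.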

cmc->mru: Do these examples work post-limerickfilterteam08? I think
they should work fine now.

When issue 19112 is implemented the produced RTF will be more
aesthetically pleasing, but that's really just a cosmetic change, and
shouldn't affect this.
Comment 6 michael.ruess 2003-09-17 14:35:10 UTC
Pavel,
I hope this helps. With this explanation in hand, I am closing the
issue as worksforme.
Comment 7 michael.ruess 2003-09-17 14:38:55 UTC
Closed.