Apache OpenOffice (AOO) Bugzilla – Issue 19565
Save as RTF result depends on the current locale
Last modified: 2013-08-07 14:38:26 UTC
Hi,

I wrote a file with Czech letters (attached as czech_letters.sxw) in English RC4. Then I did the following:

pavel@pavel:~/OpenOffice.org1.1.0-English> LANG=C
pavel@pavel:~/OpenOffice.org1.1.0-English> ./soffice /tmp/czech_letters.sxw
pavel@pavel:~/OpenOffice.org1.1.0-English> LANG=en_US
pavel@pavel:~/OpenOffice.org1.1.0-English> ./soffice /tmp/czech_letters.sxw
pavel@pavel:~/OpenOffice.org1.1.0-English> LANG=cs_CZ
pavel@pavel:~/OpenOffice.org1.1.0-English> ./soffice /tmp/czech_letters.sxw
pavel@pavel:~/OpenOffice.org1.1.0-English> LANG=ru_RU
pavel@pavel:~/OpenOffice.org1.1.0-English> ./soffice /tmp/czech_letters.sxw
pavel@pavel:~/OpenOffice.org1.1.0-English> LANG=da_DK
pavel@pavel:~/OpenOffice.org1.1.0-English> ./soffice /tmp/czech_letters.sxw

In each session, I saved the document as RTF under the name locale_<locale>.rtf. All documents are attached.

The only differences are in fcharset. Diff from cs_CZ to da_DK:

-{\fonttbl{\f0\froman\fprq2\fcharset238
+{\fonttbl{\f0\froman\fprq2\fcharset0

What is the idea behind this?
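For anyone who wants to compare the saved files without reading raw diffs, here is a minimal sketch that pulls the \fcharset value of each font table entry out of RTF text. The function name font_charsets and the font name SomeFont in the sample strings are made up for illustration; only the \f0\froman\fprq2\fcharset... prefix comes from the diff above. The regular expression is a rough heuristic, not a full RTF parser.

```python
import re

# Matches a font entry such as \f0\froman\fprq2\fcharset238 and captures
# the font index and the charset number. Heuristic only, not a real RTF parser.
FCHARSET_RE = re.compile(r"\\f(\d+)[^{};]*?\\fcharset(\d+)")


def font_charsets(rtf_text):
    """Return {font index: fcharset value} for each font entry found."""
    return {int(font): int(charset)
            for font, charset in FCHARSET_RE.findall(rtf_text)}


# Sample strings modelled on the diff in this report. The font name
# "SomeFont" is a placeholder, not taken from the attached files.
cs_cz = r"{\fonttbl{\f0\froman\fprq2\fcharset238 SomeFont;}}"
da_dk = r"{\fonttbl{\f0\froman\fprq2\fcharset0 SomeFont;}}"

print(font_charsets(cs_cz))  # {0: 238}
print(font_charsets(da_dk))  # {0: 0}
```

To check real files, read each locale_<locale>.rtf as text and pass the contents to font_charsets.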
Created attachment 9303: All files used/produced
reassigned to mru
MRU->CMC: Did anything change in the limerickfilterteam08 CWS? Could you give a short comment on this? Thanks.
The charset in the RTF comes from the font details. Different font details may be set in different locales, and the filter just passes them through. For RTF, however, this implies that the text in the document is encoded in the charset of the font in use, unless it is in Unicode.

For limerick, a fix for issue 12120 and issue 13486 was implemented to stop corruption of some 8-bit characters exported to RTF. The problem was a bad assumption in the code: it always exported a character as codepage 1252 if its value was in that range, regardless of the font's charset, which is what actually denotes the charset to use for these 8-bit characters. Because many encodings are similar, this works for many characters in many encodings, but in many other circumstances it does not, especially for Czech encodings.

So the differences in the font charsets between these documents are OK, but if characters are used that sit at different positions in the different fonts' charsets, the text should either differ between the documents or be in Unicode.

cmc->mru: Do these examples work post-limerickfilterteam08? I think they should work fine now. When issue 19112 is implemented the produced RTF will be more aesthetically pleasing, but that is really just a cosmetic change and shouldn't affect this.
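To illustrate the point about 8-bit characters being interpreted in the font's charset, here is a rough sketch, not the filter's actual code. It assumes the usual Windows mapping of fcharset 0 to codepage 1252, 238 to codepage 1250 and 204 to codepage 1251; the names CHARSET_TO_CODEC and rtf_escape are hypothetical. A character is written as an 8-bit \'xx escape when the font's codepage can represent it, and as a \uN Unicode escape otherwise.

```python
# Assumed mapping from RTF \fcharset values to Python codec names
# (0 = ANSI, 238 = Eastern European, 204 = Russian).
CHARSET_TO_CODEC = {0: "cp1252", 238: "cp1250", 204: "cp1251"}


def rtf_escape(text, fcharset):
    """Escape text for RTF using the codepage implied by the font's charset."""
    codec = CHARSET_TO_CODEC[fcharset]
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)          # RTF control characters
        elif ord(ch) < 128:
            out.append(ch)                 # plain ASCII passes through
        else:
            try:
                encoded = ch.encode(codec)
                out.append("\\'%02x" % encoded[0])
            except UnicodeEncodeError:
                # Not representable in this codepage: emit \uN with a '?'
                # fallback, one signed 16-bit value per UTF-16 code unit.
                units = ch.encode("utf-16-le")
                for i in range(0, len(units), 2):
                    code = int.from_bytes(units[i:i + 2], "little", signed=True)
                    out.append("\\u%d?" % code)
    return "".join(out)


# The same byte means different letters in different codepages.
print(b"\xe8".decode("cp1250"))   # č
print(b"\xe8".decode("cp1252"))   # è

print(rtf_escape("č", 238))       # \'e8
print(rtf_escape("č", 0))         # \u269?
print(rtf_escape("á", 238) == rtf_escape("á", 0))  # True
```

The last line shows why the old 1252 assumption often appeared to work: some accented letters have the same byte value in both codepages, while others such as č do not.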
Pavel, I hope this helps. Now, having this explanation, I close the issue as worksforme.
Closed.