Apache OpenOffice (AOO) Bugzilla – Issue 11018
Unknown characters replaced by question marks
Last modified: 2004-01-29 10:05:51 UTC
When a document contains characters that are not present in the current font, OpenOffice.org replaces these with question marks. Common examples include quotation marks, bullets and long dashes that are present in the Microsoft Windows code pages, but not in Latin-1. Using a question mark is a fundamentally flawed approach. The question mark can completely alter the *meaning* of the text. I've seen other applications use a hollow square for characters that cannot be shown. This is much better, since the square does not mean anything. The user will just see that something is not correct, instead of seeing a totally different text. (In some configurations, it appears that OpenOffice.org does use a hollow square. On my Linux computers, I always get the question mark.) If an unknowing reader reads a sentence with a question mark in the middle, he will assume that the writer was mad ;-) All the writer perhaps did was to insert a (typographically proper) long dash. Some examples include issues #4889, #6815, #8429 and #9794.
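The effect described above can be reproduced outside OpenOffice.org. The following is a minimal sketch using only Python's standard codecs; it illustrates the general mechanism (a character that exists in a Windows code page but not in Latin-1 being replaced by a question mark) and is not a model of OOo's own rendering code. The sample text is made up for illustration.

```python
# Illustration only: Python's codecs, not OpenOffice.org's rendering path.
text = "\u201ctest\u201d \u2014 done"  # typographic quotes and a long dash

# These characters exist in the Windows code page cp1252 ...
as_cp1252 = text.encode("cp1252")

# ... but not in Latin-1, so a "replace" fallback emits question marks.
as_latin1 = text.encode("latin-1", errors="replace")

print(as_cp1252)
print(as_latin1.decode("latin-1"))
```

The second line of output shows how a reader would see the sentence: the quotes and the dash have all become question marks, which is exactly the kind of change in meaning the report complains about.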
Created attachment 5263 [details] Screenshot showing the same text in Times New Roman and Helvetica (Debian Linux)
Hi reporter, thank you for using and supporting OOo. Does this problem still exist in “OOo 1.1.0”? If it does, please attach an OOo-document that reproduces this problem _here_in_the_issue_. All you say sounds completely logical, and an example document with a little description (The missing characters are "WZTLFX:-) Unicode xxx see screenshot ... , or something like that) would help to get a solution soon. CU Rainer
reqassigned to mci
Hi gaute, thanks for using and supporting openOffice... Please try our newest version and send a new comment if your problem occurs again... if there's no new comment until 12/04/2003 I assume this works now... I'll close this Issue then... set to worksforme...
Created attachment 11735 [details] Screenshot showing how typographical quotes appear in certain fonts (Bookman in this case)
Created attachment 11736 [details] The document used to create the last screenshot
Sorry for being late, but I haven't gotten around to install 1.1 on my Linux computers yet. When making the screenshots, I wrote "test", and the quotes were automatically converted to proper, typographical quotes, as defined in the standard English replacement settings. In some fonts, these characters don't exist, and I get the question mark instead. I think the question mark is plainly misleading, and would prefer something neutral, like a square symbol. (Or even better, fall back to an alternative quote symbol.) I use Debian Woody, with XFree86 4.1.
Hi, I cannot reproduce that with my 1.1.0 German version WIN98SE: 645m19(Build8693). I do not have "Bookman" on my PC, so I cannot test with that special font. I found "Simplex" as a font without "typographical quotes" (I tried with 'U+201C' and 'U+201D'), and it shows rectangles instead of the quotes, which seems to be a good solution. I tried to find some other fonts without 'U+201C' and 'U+201D' to test that again, but it is really hard work, especially because "Insert Special Character" shows very many fonts that cannot be inserted in a normal way. Maybe this is a special Linux issue and can be reproduced only by Linux users. I will ask them. Rainer
Without additional testing, I can say this is common under Linux, and OpenOffice.org is not the only application which displays this behaviour. I have the same thing in Netscape 7.1, Mozilla and several other apps. The problem seems to revolve around character encodings. Any app set to use UTF-8 will do this (I worked that out in Netscape... I usually have it set to UTF-8. Changed the default encoding to Latin 1 and the question-marks disappeared....) Unfortunately, this is probably not an option for OO.o.
Here is a screenshot from linux sparc. Open that document from qatesttool/writer/input/bt.sxw. It says "arial" at the top and shows ? for '. Change the font to, for example, Bitstream Vera and all is OK.
Created attachment 11746 [details] linux sparc example
I can confirm Alex's comments. If the character code is set to UTF-8, or Unicode 8, then all characters transfer without flaw. To set this in OOo, go to Tools | Options | Load/Save | HTML Compatibility, and select UTF-8 character set at the bottom right of the box. I have also found that I must do this in mozilla (browser and email application) too. Once this character set is the default in both places, my font problems went away. Those question marks can be a little embarrassing at times. I run RH9 Linux and OOo 1.1.0, but I think this is worth a try with the older versions of OOo too. Please try this, and then add your test results to the issue, so that we can decide how best to process it.
This did not help on my linux. I found utf-8 settings in mozilla for navigator languages, mail message display and mail composition. I changed the setting in Tools | Options | Load/Save | HTML Compatibility, then closed and restarted. Please see my attached new picture, where ' is displayed as both ' and ? on the same page. Maybe you can see what is different between those two ' ? I will attach the doc from qatesttool/writer/input/bt.sxw. Notice also in the window at the top the default style and the font says "arial" although the default is actually "thorndale" in this document and I do not have any "arial" :) I am using latest fix111
Created attachment 11764 [details] both ' and ? together
Created attachment 11765 [details] qatesttool/writer/input/bt.sxw
I downloaded and opened both of the latest PNG and the SXW files attached by sparcmoz. I see the question mark in the PNG file. I went the same place in the SXW file (page 10), and there is an apostrophe there in my view. I am running OOo1.1.0 on RH9 linux with the Gnome desktop. I tested the file in OOo1.0.2 also. My machine is set for UTF-8 character code in both versions of OOo. Are you running the KDE by chance, sparcmoz? I think there is another issue about this, where KDE was not working, and Gnome was not seeing the problem, but it was a paste issue, so a little different (issue 13089). Do you have access to a std build of OOo to test on your system? That could isolate the issue to the latest fix111 of OOo or your machine, perhaps. If we figure out how our systems differ too, maybe we can help to isolate this.
BTW, that Arial font most likely came in with the doc from a windows machine. I have had some of my docs do that. I purge it, because I found it slowed the opening of my files, especially when they got large. Not sure how this would affect others, if you were to share the file back to an m$ machine. My view of the file showed the Arial font too, but it still worked ok here, character-wise.
I just tested the last SXW file using OOo1.0.2 on KDE on RH9 Linux, and still, an apostrophe is there; so that theory has been proven wrong.
I have tested under RH9 with many fonts and XP with the stock lot and cannot duplicate the findings. I will test on my laptop under RH9 with only the stock fonts and see. (wait, wait) Works for me under OOo1.0.2 on the laptop. In all cases I tested bt.sxw and quotes.sxw and was unable to reproduce the problem, and I know the laptop has no extra fonts installed; in fact it's doing font substitution where the other two machines do not. All in all, unreproducible. If you want more done let me know.
I did a search on "font linux ?" and issue 5190 suggests font substitution. I went to tools -> options -> openoffice.org -> font > apply replacement table (tick). Arial was not in the existing list of fonts to be replaced, so I typed arial into the font box and selected a font that I know exists in the "replace with" box, tick, and lo. See how at how.png; the result is at enough.png. I suppose the question mark means the product knows the font needs replacing, so it would be a nice enhancement to automate? Are there some rules to know which font is best to replace with?
Created attachment 11769 [details] how to replace font with
Created attachment 11770 [details] replaced arial with bitstream vera sans
That is interesting. I replaced my Arial fonts manually, one document at a time, as I worked with each file. (silly me!) I too picked Bitstream Vera Sans as a replacement when I did this. At the same time, I also made Bitstream Vera Sans my default font. This makes me wonder, is my system ok, not because I use UTF-8, but because my default font is Bitstream Vera Sans, and it has the characters that enable it to fill in automatically when there is a missing character? Perhaps some fonts can handle the automation, and some cannot? Ger mentioned on the dev@qa list that he does not have UTF-8 selected as a character code. He also mentioned he tested lots of fonts, so this theory could be wrong too.
Can you attach the xlsfonts.txt file that results from the xlsfonts > xlsfonts.txt command? I think the problem is that an X11 font like Bookman or Helvetica claims to support the typographic quotes U+201C and U+201D, but when the X11 font is asked to display one of them it displays a question mark instead. The question mark used to be the "character/glyph not available" symbol that is nowadays typically indicated by an empty box.
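To see which encodings an X11 font is actually listed with, the xlsfonts output can be grouped by family. The following is a rough sketch, assuming the lines are full XLFD names (14 dash-separated fields after a leading dash); the function name and the sample lines are hypothetical and only stand in for a real xlsfonts.txt.

```python
from collections import defaultdict


def encodings_by_family(xlfd_lines):
    """Map each font family to the set of registry-encoding pairs it is listed with."""
    result = defaultdict(set)
    for line in xlfd_lines:
        parts = line.strip().split("-")
        # A full XLFD name has 14 fields after the leading "-".
        if len(parts) != 15 or parts[0] != "":
            continue  # skip aliases such as "fixed" and malformed lines
        family = parts[2]
        charset = parts[13] + "-" + parts[14]
        result[family].add(charset)
    return dict(result)


# Made-up sample lines in XLFD form, for illustration only.
sample = [
    "-adobe-helvetica-bold-o-normal--12-120-75-75-p-69-iso8859-1",
    "-adobe-helvetica-bold-o-normal--12-120-75-75-p-69-iso10646-1",
    "-adobe-bookman-light-r-normal--0-0-0-0-p-0-iso8859-1",
    "fixed",
]
print(encodings_by_family(sample))
```

A family that appears only with iso8859-1 cannot supply U+201C or U+201D itself, so it is a likely candidate for the behaviour discussed here.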
Created attachment 11779 [details] linux sparc debian unstable
Looking over the relevant code I found this checkin http://gsl.openoffice.org/source/browse/gsl/vcl/unx/source/gdi/salcvt.cxx.diff?r1=1.9&r2=1.10 which causes the typographical quotes to be treated as convertible to iso8859-1. Unfortunately the flag that is necessary to allow the corresponding U+201C,U+201D => U+0022 "ASCII quotation mark" conversion is not enabled.
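The idea of the U+201C,U+201D => U+0022 conversion can be sketched as follows. This is not the salcvt.cxx code; the table, the marker character and the function name are assumptions made up for illustration. It shows a fallback being tried first, with a neutral marker used only when no fallback exists.

```python
# Hypothetical fallback table; the real conversion lives in OOo's vcl code.
FALLBACKS = {
    "\u201c": '"',  # left double quotation mark -> ASCII quotation mark
    "\u201d": '"',  # right double quotation mark -> ASCII quotation mark
}
MISSING = "\u25a1"  # white square, a neutral "no glyph" marker (assumption)


def to_latin1_display(text):
    """Return text whose characters are all Latin-1 or the MISSING marker."""
    out = []
    for ch in text:
        ch = FALLBACKS.get(ch, ch)  # try a look-alike replacement first
        try:
            ch.encode("latin-1")
            out.append(ch)
        except UnicodeEncodeError:
            out.append(MISSING)  # no fallback known: show a neutral marker
    return "".join(out)


print(to_latin1_display("\u201ctest\u201d"))
```

With the fallback table in place the quotes degrade to plain ASCII quotation marks instead of question marks, which matches what the reporter asked for.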
Fixed in CWS vcl7pp1r4 for target OOo 1.1.1 and CWS vcl17 for target OOo2. As a workaround until they are integrated, I suggest setting up "font substitution" as described above for fonts that only display as X11 fonts on the display. E.g. Bookman->Bitstream Vera Serif and Helvetica->Bitstream Vera Sans.
Testing instructions:
1. use the spadmin program to remove all fonts
2. select a "victim X11 font" (use xlsfonts and look for e.g. Bookman)
3. use "xset q" to get a list of all fontpaths
4. in the fonts.dir/fonts.scale files of these directories remove all lines that mention the victim font and that do not end in -iso8859-1. Do this for each fontpath and reduce the number in the first line of the fonts.* file by the number of removed lines
5. do "xset fp rehash"
6. start Writer and use the victim font to write a typographic quotation mark
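Step 4 above is the error-prone part when done by hand. A possible sketch of that edit, assuming the usual fonts.dir layout (a count on the first line, then one entry per line ending in the XLFD name); the function name and the sample entries are hypothetical, and the sketch works on a list of lines rather than touching any file.

```python
def strip_victim(lines, victim):
    """Drop entries for `victim` that are not iso8859-1 and fix the count line."""
    header, entries = lines[0], lines[1:]
    kept = [
        e for e in entries
        # keep lines that do not mention the victim, or that end in -iso8859-1
        if victim.lower() not in e.lower() or e.rstrip().endswith("-iso8859-1")
    ]
    removed = len(entries) - len(kept)
    # reduce the number in the first line by the number of removed lines
    return [str(int(header.strip()) - removed)] + kept


# Made-up fonts.dir content, for illustration only.
sample_dir = [
    "3",
    "bkm1.pcf.gz -adobe-bookman-light-r-normal--12-120-75-75-p-67-iso8859-1",
    "bkm2.pcf.gz -adobe-bookman-light-r-normal--12-120-75-75-p-67-iso10646-1",
    "helv.pcf.gz -adobe-helvetica-medium-r-normal--12-120-75-75-p-67-iso10646-1",
]
for line in strip_victim(sample_dir, "bookman"):
    print(line)
```

Working on a copy of the file and running "xset fp rehash" afterwards, as in step 5, keeps the original font setup recoverable.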
I reset the target to 2.0 and closed the issue, as it is resolved/fixed. If we see this issue in released versions of 2.0 or later, then the issue can be reopened.
HDU->MacMoon: Closing a fixed issue before it is verified or integrated breaks the workflow. The fix would have to be removed from the Child WorkSpace and thus would not be integrated into mainline. Since everybody with X11 only fonts wants the fix, I'm reopening the issue.
Fixed in CWS vcl7pp1r4 for target OOo 1.1.1 and CWS vcl17 for target OOo2.
I am so sorry. I was under the impression that we should close all resolved issues. This has been discussed on the qa mail lists, and many times it has been said that there is no verification process in place. I stand corrected, and have learned from it. I will be more careful in the future. Thank you for your guidance.
HDU->MacMoon: I just checked OOo's QA pages and didn't find the document about the workflow, so I understand where the confusion comes from. I talked with the QA guy who can fix this and he confirmed that the pages will be updated soon and the pages will then also address the workflow. Thanks for bringing this issue to our attention.
HDU->US: please verify in CWS vcl7pp1r4.
Changing resolution to FIXED in order to mark issue verified.
Congratulations, this fix is admittedly a small step for mankind but a great step for OOo (and I don't mean it cynically).
Re-verified on resynced vcl7pp1r4. US->US: to make this fix work there have to be latin1 XFonts available, which is the default on any XFree86. Testscenario: remove all entries from the font path except latin1 and 2 unscaled XFonts (misc,75dpi,100dpi)
ok in master workspace srx645_m27s1-1.8738. Fix will be in forthcoming OOo 1.1.1. Closing Fixed/Verified issue.