Issue 11018 - Unknown characters replaced by question marks
Summary: Unknown characters replaced by question marks
Status: CLOSED FIXED
Alias: None
Product: gsl
Classification: Code
Component: code
Version: OOo 1.0.2
Hardware: All Linux, all
Importance: P3 Trivial
Target Milestone: OOo 1.1.1
Assignee: ulf.stroehler
QA Contact: issues@gsl
URL:
Keywords: oooqa
Depends on:
Blocks:
 
Reported: 2003-01-28 15:01 UTC by gaute
Modified: 2004-01-29 10:05 UTC
CC List: 4 users

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
Screenshot showing the same text in Times New Roman and Helvetica (Debian Linux) (15.16 KB, image/png)
2003-03-27 10:11 UTC, gaute
Screenshot showing how typographical quotes appear in certain fonts (Bookman in this case) (19.72 KB, image/png)
2003-12-03 18:37 UTC, gaute
The document used to create the last screenshot (5.29 KB, application/octet-stream)
2003-12-03 18:37 UTC, gaute
linux sparc example (85.96 KB, image/png)
2003-12-04 08:34 UTC, sparcmoz
both ' and ? together (129.49 KB, image/png)
2003-12-04 20:54 UTC, sparcmoz
qatesttool/writer/input/bt.sxw (73.87 KB, application/octet-stream)
2003-12-04 20:55 UTC, sparcmoz
how to replace font with (131.71 KB, image/png)
2003-12-05 04:34 UTC, sparcmoz
replaced arial with bitstream vera sans (145.89 KB, image/png)
2003-12-05 04:36 UTC, sparcmoz
linux sparc debian unstable (192.99 KB, text/plain)
2003-12-05 09:39 UTC, sparcmoz

Description gaute 2003-01-28 15:01:02 UTC
When a document contains characters that are not present in the current font,
OpenOffice.org replaces these with question marks. Common examples include
quotation marks, bullets and long dashes that are present in the Microsoft 
Windows code pages, but not in Latin-1.

Using a question mark is a fundamentally flawed approach. The question mark
can completely alter the *meaning* of the text. I've seen other applications
use a hollow square for characters that cannot be shown. This is much better,
since the square does not mean anything. The user will just see that
something is not correct, instead of seeing a totally different text.

(In some configurations, it appears that OpenOffice.org does use a hollow
square. On my Linux computers, I always get the question mark.)

If an unknowing reader reads a sentence with a question mark in the middle,
he will assume that the writer was mad ;-) Perhaps all the writer did was
to insert a (typographically proper) long dash.

Some examples include issues #4889, #6815, #8429 and #9794.
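
To make the complaint concrete, here is a minimal Python sketch (not OpenOffice.org code) of the two placeholder strategies discussed above. The `render` helper and the sample sentence are hypothetical; the only facts assumed are that Windows-1252 can encode these characters and Latin-1 cannot.

```python
# Characters that Windows-1252 defines but ISO-8859-1 (Latin-1) does not.
SAMPLES = {
    "left double quote": "\u201c",
    "right double quote": "\u201d",
    "bullet": "\u2022",
    "em dash": "\u2014",
}


def render(text, placeholder):
    """Replace every character that Latin-1 cannot encode with `placeholder`.

    Hypothetical helper for illustration only; it stands in for whatever the
    display layer does when a glyph is missing.
    """
    out = []
    for ch in text:
        try:
            ch.encode("latin-1")
            out.append(ch)
        except UnicodeEncodeError:
            out.append(placeholder)
    return "".join(out)


# A sentence containing a typographically proper long dash (U+2014).
sentence = "He left \u2014 and never came back."

with_question_mark = render(sentence, "?")       # reads like a question
with_hollow_square = render(sentence, "\u25a1")  # visibly "something missing"

print(with_question_mark)
print(with_hollow_square)
```

The first line printed reads as if the writer had asked a question mid-sentence; the second only signals that a glyph is missing.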
Comment 1 gaute 2003-03-27 10:11:46 UTC
Created attachment 5263 [details]
Screenshot showing the same text in Times New Roman and Helvetica (Debian Linux)
Comment 2 Rainer Bielefeld 2003-10-24 11:57:32 UTC
Hi reporter, 

thank you for using and supporting OOo.

Does this problem still exist in “OOo 1.1.0”?

If it does, please attach an OOo-document that reproduces this problem
 _here_in_the_issue_.

All you say sounds completely logical, and an example document with a
little description (the missing characters are "WZTLFX:-) Unicode xxx,
see screenshot ...", or something like that) would help to get a
solution soon.


CU

Rainer
Comment 3 mci 2003-11-20 17:06:36 UTC
reassigned to mci
Comment 4 mci 2003-11-20 17:10:17 UTC
Hi gaute,

thanks for using and supporting openOffice...

Please try our newest version and send a new comment if your problem 
occurs again...
if there's no new comment until 12/04/2003 I assume this works now...
I'll close this Issue then...

set to worksforme...
Comment 5 gaute 2003-12-03 18:37:07 UTC
Created attachment 11735 [details]
Screenshot showing how typographical quotes appear in certain fonts (Bookman in this case)
Comment 6 gaute 2003-12-03 18:37:48 UTC
Created attachment 11736 [details]
The document used to create the last screenshot
Comment 7 gaute 2003-12-03 18:44:17 UTC
Sorry for being late, but I haven't gotten around to installing 1.1 on my Linux computers yet.

When making the screenshots, I wrote "test", and the quotes were automatically converted 
to proper, typographical quotes, as defined in the standard English replacement settings. In 
some fonts, these characters don't exist, and I get the question mark instead. I think the 
question mark is plainly misleading, and would prefer something neutral, like a square 
symbol. (Or even better, fall back to an alternative quote symbol.)

I use Debian Woody, with XFree86 4.1.
Comment 8 Rainer Bielefeld 2003-12-04 07:12:28 UTC
Hi,

I can not reproduce that with my  1.1.0 German version WIN98SE:
645m19(Build8693) 

I do not have "Bookman" on my PC, so that I can not test with that
special font.

I found "Simplex" as a font without "typographical quotes" (I tried
with 'U+201C' and 'U+201D'), and it shows rectangles instead of the 
quotes, which seems to be a good solution. 

I tried to find some other fonts without 'U+201C' and 'U+201D' to test
that again, but it is really hard work, especially, because "Insert
Special Character" shows very many fonts that can not be inserted in a
normal way.

Maybe this is a Linux-specific issue and can be reproduced only by
Linux users. I will ask them.

Rainer
Comment 9 settantta 2003-12-04 08:00:39 UTC
Without additional testing, I can say this is common under Linux, and
OpenOffice.org is not the only application which displays this
behaviour. I have the same thing in Netscape 7.1, Mozilla and several
other apps. 

The problem seems to revolve around character encodings. Any app set
to use UTF-8 will do this (I worked that out in Netscape... I usually
have it set to UTF-8. Changed the default encoding to Latin 1 and the
question-marks disappeared....) Unfortunately, this is probably not an
option for OO.o.
Comment 10 sparcmoz 2003-12-04 08:33:45 UTC
Here is a screenshot from linux sparc. 
Open that document from qatesttool/writer/input/bt.sxw

It says "arial" at the top and shows ? for '

Change the font to, for example, Bitstream Vera and all is OK.

Comment 11 sparcmoz 2003-12-04 08:34:44 UTC
Created attachment 11746 [details]
linux sparc example
Comment 12 diane 2003-12-04 12:24:15 UTC
I can confirm Alex's comments. If the character code is set to UTF-8,
or Unicode 8, then all characters transfer without flaw. To set this
in OOo, go to Tools | Options | Load/Save | HTML Compatibility, and
select UTF-8 character set at the bottom right of the box. I have also
found that I must do this in mozilla (browser and email application)
too. Once this character set is the default in both places, my font
problems went away. Those question marks can be a little embarrassing
at times. I run RH9 Linux and OOo 1.1.0, but I think this is worth a
try with the older versions of OOo too. Please try this, and then add
your test results to the issue, so that we can decide how is best to
process it.

Comment 13 sparcmoz 2003-12-04 20:52:39 UTC
This did not help on my linux. I found utf-8 settings in mozilla for
navigator languages, mail message display and mail composition. I
changed in Tools | Options | Load/Save | HTML Compatibility and closed
and restarted. 
Please see my attached new picture, where ' is displayed as both '
and ? on the same page. Maybe you can see what is different between
those two ' ? I will attach the doc from qatesttool/writer/input/bt.sxw. 
Notice also that in the window at the top, the default style and the font
say "arial" although the default is actually "thorndale" in this
document and I do not have any "arial" :)

I am using latest fix111
Comment 14 sparcmoz 2003-12-04 20:54:04 UTC
Created attachment 11764 [details]
both ' and ? together
Comment 15 sparcmoz 2003-12-04 20:55:23 UTC
Created attachment 11765 [details]
qatesttool/writer/input/bt.sxw
Comment 16 diane 2003-12-04 22:01:42 UTC
I downloaded and opened both of the latest PNG and the SXW files
attached by sparcmoz. I see the question mark in the PNG file. I went
to the same place in the SXW file (page 10), and there is an apostrophe
there in my view. I am running OOo1.1.0 on RH9 linux with the Gnome
desktop. I tested the file in OOo1.0.2 also. My machine is set for
UTF-8 character code in both versions of OOo. Are you running the KDE
by chance, sparcmoz? I think there is another issue about this, where
KDE was not working, and Gnome was not seeing the problem, but it was
a paste issue, so a little different (issue 13089). Do you have access
to a std build of OOo to test on your system? That could isolate the
issue to the latest fix111 of OOo or your machine, perhaps. If we
figure out how our systems differ too, maybe we can help to isolate this.
Comment 17 diane 2003-12-04 22:12:32 UTC
BTW, that Arial font most likely came in with the doc from a windows
machine. I have had some of my docs do that. I purge it, because I
found it slowed the opening of my files, especially when they got
large. Not sure how this would affect others, if you were to share the
file back to an m$ machine. My view of the file showed the Arial font
too, but it still worked ok here, character-wise.
Comment 18 diane 2003-12-04 22:43:17 UTC
I just tested the last SXW file using OOo1.0.2 on KDE on RH9 Linux,
and still, an apostrophe is there; so that theory has been proven wrong.
Comment 19 grsingleton 2003-12-05 01:03:32 UTC
I have tested under RH9 with many fonts and XP with the stock lot and
cannot duplicate the findings. I will test on my laptop under RH9 with
only the stock fonts and see.  (wait, wait) Works for me under
OOo1.0.2 on the laptop. In all cases I tested bt.sxw and quotes.sxw and
was unable to reproduce the problem, and I know the laptop has no extra
fonts installed; in fact it's doing font substitution where the other
two machines do not.

All in all, unreproducible. If you want more done let me know.
Comment 20 sparcmoz 2003-12-05 04:33:29 UTC
I did search on "font linux ?" and issue 5190 suggests font substitution.

I did tools -> options -> openoffice.org -> font -> apply replacement
table (tick)

Arial was not in the existing list of fonts to be replaced, so I typed
arial into the font box and selected a font that I know exists in
the "replace with" box, ticked, and lo. See how in how.png; the result is
in enough.png.

I suppose the question mark means the product knows the font needs
replacing so it would be a nice enhancement to automate?

Are there some rules to know which font is best to replace with? 
 
Comment 21 sparcmoz 2003-12-05 04:34:46 UTC
Created attachment 11769 [details]
how to replace font with
Comment 22 sparcmoz 2003-12-05 04:36:03 UTC
Created attachment 11770 [details]
replaced arial with bitstream vera sans
Comment 23 diane 2003-12-05 05:13:53 UTC
That is interesting. I replaced my Arial fonts manually, one document
at a time, as I worked with each file. (silly me!) I too picked
Bitstream Vera Sans as a replacement when I did this. At the same
time, I also made Bitstream Vera Sans my default font. This makes me
wonder, is my system ok, not because I use UTF-8, but because my
default font is Bitstream Vera Sans, and it has the characters that
enable it to fill in automatically when there is a missing character?
Perhaps some fonts can handle the automation, and some cannot? Ger
mentioned on the dev@qa list that he does not have UTF-8 selected as a
character code. He also mentioned he tested lots of fonts, so this
theory could be wrong too.
Comment 24 hdu@apache.org 2003-12-05 09:26:17 UTC
Can you attach the xlsfonts.txt file that results from the
  xlsfonts > xlsfonts.txt
command?

I think the problem is that an X11 font like Bookman or Helvetica
claims to support the typographic quotes U+201C and U+201D, but when
the X11 font is asked to display one of them it displays a question
mark instead. The question mark used to be the "character/glyph not
available" symbol that is nowadays typically indicated by an empty box.
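
One way to read such an xlsfonts.txt is to group the listed X11 font names by family and see which character set registries each family is offered in. The following is a rough Python sketch; the helper name and the sample lines are made up for illustration, and it assumes the standard 14-field XLFD name layout where the family is the second field and the last two fields give registry and encoding.

```python
from collections import defaultdict


def charsets_by_family(xlsfonts_output):
    """Map each font family to the set of registry-encoding pairs listed for it.

    Hypothetical helper: it only parses full XLFD names and skips aliases
    such as "fixed" that do not start with "-".
    """
    result = defaultdict(set)
    for line in xlsfonts_output.splitlines():
        parts = line.strip().split("-")
        # A full XLFD name has a leading "-" plus 14 fields, so 15 parts.
        if len(parts) != 15 or parts[0] != "":
            continue
        family = parts[2].lower()
        result[family].add(f"{parts[13]}-{parts[14]}".lower())
    return dict(result)


# Made-up sample in the shape of xlsfonts output, not taken from the attachment.
sample = """\
-adobe-helvetica-medium-r-normal--12-120-75-75-p-67-iso8859-1
-adobe-bookman-light-r-normal--0-0-0-0-p-0-iso8859-1
fixed
"""

families = charsets_by_family(sample)
for family, charsets in sorted(families.items()):
    print(family, sorted(charsets))
```

A family that appears only with iso8859-1 is a likely candidate for the question mark behaviour described in this issue.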
Comment 25 sparcmoz 2003-12-05 09:39:04 UTC
Created attachment 11779 [details]
linux sparc debian unstable
Comment 26 hdu@apache.org 2003-12-05 11:26:33 UTC
Looking over the relevant code I found this checkin
http://gsl.openoffice.org/source/browse/gsl/vcl/unx/source/gdi/salcvt.cxx.diff?r1=1.9&r2=1.10
which causes the typographical quotes to be treated as convertible to
iso8859-1. Unfortunately the flag that is necessary to allow the
corresponding U+201C,U+201D => U+0022 "ASCII quotation mark"
conversion is not enabled.
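
The effect of that missing flag can be sketched in Python. This is not the salcvt.cxx code; `to_latin1`, `FALLBACKS` and the `allow_fallback` parameter are hypothetical names, and only the two mappings named above are included.

```python
# Fallback table: only the two mappings mentioned in this comment.
FALLBACKS = {
    "\u201c": '"',  # LEFT DOUBLE QUOTATION MARK  -> U+0022
    "\u201d": '"',  # RIGHT DOUBLE QUOTATION MARK -> U+0022
}


def to_latin1(text, allow_fallback=True):
    """Encode `text` as ISO-8859-1, optionally using the fallback table.

    Hypothetical illustration: `allow_fallback` plays the role of the flag
    that was not enabled. Without it, unmappable characters become "?".
    """
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode("latin-1")
        except UnicodeEncodeError:
            if allow_fallback and ch in FALLBACKS:
                out += FALLBACKS[ch].encode("latin-1")
            else:
                out += b"?"
    return bytes(out)


quoted = "\u201ctest\u201d"
print(to_latin1(quoted, allow_fallback=False))  # question marks around test
print(to_latin1(quoted, allow_fallback=True))   # ASCII quotes around test
```

With the fallback disabled the quotes degrade to question marks, which matches what the screenshots in this issue show.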
Comment 27 hdu@apache.org 2003-12-05 11:46:29 UTC
Fixed in CWS vcl7pp1r4 for target OOo 1.1.1 and CWS vcl17 for target OOo2.

As a workaround until they are integrated, I suggest setting up "font
substitution" as described above for fonts that are only available as X11
fonts on the display.
E.g. Bookman->Bitstream Vera Serif
and Helvetica->Bitstream Vera Sans.
Comment 28 hdu@apache.org 2003-12-05 12:37:45 UTC
Testing instructions:
1. use the spadmin program to remove all fonts
2. select a "victim X11 font" (use xlsfonts and look for e.g. Bookman)
3. use "xset q" to get a list of all fontpaths
4. in the fonts.dir/fonts.scale files of these directories remove all
lines that mention the victim font and that do not end in -iso8859-1.
Do this for each fontpath and reduce the number in the first line of
the fonts.* file by the number of removed lines
5. do "xset fp rehash"
6. start Writer and use the victim font to write a typographic
quotation mark
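
Step 4 is easy to get wrong by hand, so here is a rough Python sketch of the filtering. The function name and the sample content are hypothetical, it works on the file's text rather than touching the file itself, and it assumes the first line holds the entry count as described in step 4. Back up the real fonts.dir/fonts.scale before changing them.

```python
def prune_fonts_dir(text, victim):
    """Drop entries that mention `victim` and do not end in -iso8859-1.

    Hypothetical helper: takes the text of a fonts.dir or fonts.scale file
    and returns the new text with the count on the first line reduced by
    the number of removed entries.
    """
    lines = text.splitlines()
    count, entries = int(lines[0]), lines[1:]
    kept = [
        entry for entry in entries
        if not (
            victim.lower() in entry.lower()
            and not entry.rstrip().lower().endswith("-iso8859-1")
        )
    ]
    removed = len(entries) - len(kept)
    return "\n".join([str(count - removed)] + kept) + "\n"


# Made-up sample content, not copied from a real font directory.
sample_fonts_dir = """\
3
bkl.pfa -adobe-bookman-light-r-normal--0-0-0-0-p-0-iso8859-1
bkl.pfa -adobe-bookman-light-r-normal--0-0-0-0-p-0-iso10646-1
helv.pcf.gz -adobe-helvetica-medium-r-normal--12-120-75-75-p-67-iso8859-1
"""

pruned = prune_fonts_dir(sample_fonts_dir, "bookman")
print(pruned, end="")
```

After writing the pruned text back, continue with step 5 so the X server picks up the change.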
Comment 29 diane 2003-12-05 13:35:48 UTC
I reset the target to 2.0 and closed the issue, as it is
resolved/fixed. If we see this issue in released versions of 2.0 or
later, then the issue can be reopened.
Comment 30 hdu@apache.org 2003-12-12 14:07:08 UTC
HDU->MacMoon: Closing a fixed issue before it is verified or integrated breaks
the workflow. The fix would have to be removed from the Child WorkSpace and thus
would not be integrated into mainline. Since everybody with X11 only fonts wants
the fix, I'm reopening the issue.
Comment 31 hdu@apache.org 2003-12-12 14:08:30 UTC
Fixed in CWS vcl7pp1r4 for target OOo 1.1.1 and CWS vcl17 for target OOo2.
Comment 32 diane 2003-12-12 16:10:24 UTC
I am so sorry. I was under the impression that we should close all resolved
issues. This has been discussed on the qa mail lists, and many times it has been
said that there is no verification process in place. I stand corrected, and have
learned from it. I will be more careful in the future. Thank you for your guidance.
Comment 33 hdu@apache.org 2003-12-15 08:41:41 UTC
HDU->MacMoon: I just checked OOo's QA pages and didn't find the document about
the workflow, so I understand where the confusion comes from. I talked with the
QA guy who can fix this and he confirmed that the pages will be updated soon and
the pages will then also address the workflow. Thanks for bringing this issue to
our attention.
Comment 34 hdu@apache.org 2003-12-18 14:50:00 UTC
HDU->US: please verify in CWS vcl7pp1r4.
Comment 35 ulf.stroehler 2004-01-06 11:10:48 UTC
Changing resolution to FIXED in order to mark issue verified.
Comment 36 ulf.stroehler 2004-01-06 11:19:45 UTC
Congratulations, this fix is admittedly a small step for mankind but a great step
for OOo (and I don't mean it cynically).
Comment 37 ulf.stroehler 2004-01-22 16:07:21 UTC
Re-verified on resynced vcl7pp1r4.
US->US: to make this fix work there have to be latin1 XFonts available, which is
the default on any XFree86.
Test scenario: remove all entries from the font path except latin1 and 2 unscaled
XFonts (misc,75dpi,100dpi)
Comment 38 ulf.stroehler 2004-01-29 10:05:51 UTC
ok in master workspace srx645_m27s1-1.8738.
Fix will be in forthcoming OOo 1.1.1.
Closing Fixed/Verified issue.