Apache OpenOffice (AOO) Bugzilla – Issue 19826
Postscript output uses ambiguous font encoding values. Unparsable.
Last modified: 2005-02-17 12:08:59 UTC
Font encodings in PostScript output have changed in OpenOffice 1.1RCx and now use random, ambiguous values instead of the ASCII value of the referenced character. This is bad and should be corrected.

OpenOffice 1.0.3 PostScript output:

    /FontName /GnuMICRNormalHSet1 def
    /XUID [103 0 0 16#3CA9857D 13 16#7F837AB1 16#181136D6] def
    /FontMatrix [.001 0 0 .001 0 0] def
    /FontBBox [0 0 648 701] def
    /Encoding 256 array def
    0 1 255 {Encoding exch /.notdef put} for
    Encoding 32 /glyph0 put
    Encoding 48 /glyph1 put
    Encoding 49 /glyph2 put
    Encoding 50 /glyph3 put
    Encoding 51 /glyph4 put
    Encoding 52 /glyph5 put
    Encoding 53 /glyph6 put
    Encoding 54 /glyph7 put
    Encoding 55 /glyph8 put
    Encoding 56 /glyph9 put
    Encoding 57 /glyph10 put
    Encoding 65 /glyph11 put
    Encoding 67 /glyph12 put

OpenOffice 1.1RC3 PostScript output:

    /FontName (GnuMICRNormalHGSet2) cvn def
    /XUID [103 0 0 16#3CA9857D 14 16#501FB36A 16#87C0B0A7] def
    /FontMatrix [.001 0 0 .001 0 0] def
    /FontBBox [0 0 648 701] def
    /Encoding 256 array def
    0 1 255 {Encoding exch /.notdef put} for
    Encoding 0 /glyph0 put
    Encoding 8 /glyph1 put
    Encoding 7 /glyph2 put
    Encoding 6 /glyph3 put
    Encoding 5 /glyph4 put
    Encoding 4 /glyph5 put
    Encoding 3 /glyph6 put
    Encoding 2 /glyph7 put
    Encoding 10 /glyph8 put
    Encoding 11 /glyph9 put
    Encoding 12 /glyph10 put
    Encoding 13 /glyph11 put
    Encoding 9 /glyph12 put
    Encoding 1 /glyph13 put

The above font is a MICR check printing font. The characters used are 0-9, A, C, and ' ' (space). 1.0.3's output properly references these encodings by their ASCII values. For example:

    Encoding 32 /glyph0 put

references ASCII value 32, i.e. ' ' (space). However, for the same character in 1.1RC3:

    Encoding 8 /glyph1 put

the value 8 is used arbitrarily. This means that references to the actual text further on in the PS output use the value 8 instead of 32 for 'show' output.

For example, 1.0.3:

    /GnuMICRNormalHSet1 findfont 50 -50 matrix scale makefont setfont
    <43353433323131432041313233343132333441203536373839353637383943>
    [37 38 37 38 37 38 37 38 37 38 37 38 37 38 37 38 37 38 37 38 37 38 37 38 37 38 37 38 37 38 0] xshow

1.1RC3:

    (GnuMICRNormalHGSet2) cvn findfont 50 -50 matrix scale makefont setfont
    <0102030405060701080906050403060504030908020A0B0C0D020A0B0C0D01>
    [38 38 38 38 38 38 38 38 38 38 38 38 38 38 38 38 38 38 38 38 38 38 38 38 38 38 38 38 38 38 0] xshow

The data in question lies between the <>'s. They should be identical. In 1.0.3 the values are the hexadecimal ASCII values. In 1.1RC3 the values are the arbitrary values assigned in the font encoding.

This change makes it impossible to post-process an OO PostScript file with anything but a full PostScript engine. I found this bug because I am using OO as the template system for a sequential check printing program. OO 1.1's PostScript output can no longer be parsed in any reasonable way!
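To make the difference concrete, here is a minimal Python sketch of the kind of lightweight post-processing the report describes. It is not the reporter's actual tool: the helper names `parse_encoding` and `looks_like_ascii_text` are hypothetical, and the encoding entries are abridged from the samples above. It parses the `Encoding N /name put` entries and checks whether a hex show string can be read directly as ASCII.

```python
import re

# "Encoding N /glyphM put" entries, abridged from the two samples in the report.
ENCODING_1_0_3 = "Encoding 32 /glyph0 put Encoding 48 /glyph1 put Encoding 65 /glyph11 put"
ENCODING_1_1_RC3 = "Encoding 0 /glyph0 put Encoding 8 /glyph1 put Encoding 7 /glyph2 put"

# Hex strings passed to xshow, copied from the report.
SHOW_1_0_3 = "43353433323131432041313233343132333441203536373839353637383943"
SHOW_1_1_RC3 = "0102030405060701080906050403060504030908020A0B0C0D020A0B0C0D01"

# Matches only numeric entries, so the ".notdef" initialisation loop is skipped.
ENTRY = re.compile(r"Encoding\s+(\d+)\s+/(\w+)\s+put")


def parse_encoding(ps_text):
    """Return {character code: glyph name} for every 'Encoding N /name put'."""
    return {int(code): name for code, name in ENTRY.findall(ps_text)}


def looks_like_ascii_text(hex_string):
    """True if every byte of the show string is a printable ASCII character."""
    return all(0x20 <= b <= 0x7E for b in bytes.fromhex(hex_string))


print(parse_encoding(ENCODING_1_0_3))
print(bytes.fromhex(SHOW_1_0_3).decode("ascii"))
print(looks_like_ascii_text(SHOW_1_0_3), looks_like_ascii_text(SHOW_1_1_RC3))
```

The 1.0.3 string decodes straight to readable text, while every byte of the 1.1RC3 string falls in the control-character range, so a simple parser has only glyph names to go on.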
cp->dcinege: There is no warranty on the way we generate PostScript output. It is completely unspecified and you must not rely on any implementation detail. It is subject to change without further notice.

cp->pl: In 1.0 we generated an ASCII subset (HSet1), but in 1.1 we never seem to do that and instead start with a subset (HGSet2) for the same characters. Any ideas about that?
dcinege->cp: I should note that the remainder of the 1.1RC3 output still DOES encode based on the ASCII character values for other fonts (e.g. Arial). But for this font, for some reason, it does not (although in 1.0.3 it did). So its output is inconsistent with itself, and to me that's a bug. As for the output format, I understand it's subject to change, but in this area in particular, I've never seen PostScript output from a word-processing type program that didn't refer to characters by their underlying ASCII values in SOME way. (Over the years I've used 3 other programs for check templating before moving to OO. This is the first time I have found an unparsable condition.)
You'll find encoded characters used for Type 1 and printer built-in fonts (Times, Helvetica and the like) since they are addressed via their encoding. All TrueType fonts are subsetted, that is, only the used glyphs are put into the newly downloaded font; this was also the case in 1.0.3. The difference is that the printing code is no longer driven with characters but with glyph IDs, so the original Unicode code point is not known at the point the character is output. This is due to complex text layout, which makes it possible to print languages like Arabic, Thai and the like that do not have a simple character <-> glyph correlation. That being said, it would be possible to make an exception for ASCII, or even better ISO 8859-15, but it would require some rework. I'll see if I can do something in the 2.0 timeframe.
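As a rough illustration of why glyph-ID-driven output loses the ASCII values, the following Python sketch assigns single-byte codes to glyph IDs in order of first use. The ordering policy, the function names and the glyph IDs are all assumptions made for illustration; the thread does not say how OpenOffice actually numbers the subset.

```python
def assign_subset_codes(glyph_ids, start=1):
    """Give each distinct glyph ID the next free code, in order of first use.

    Hypothetical policy: the only input is glyph IDs from text layout, so
    there is no character value available to reuse as the code.
    """
    codes = {}
    for gid in glyph_ids:
        if gid not in codes:
            codes[gid] = start + len(codes)
    return codes


def show_string(glyph_ids, codes):
    """Build the hex string that would be passed to show/xshow."""
    return "".join(f"{codes[gid]:02X}" for gid in glyph_ids)


# Made-up glyph IDs for a short run of text; glyph 36 occurs twice.
run = [36, 20, 21, 3, 36]
codes = assign_subset_codes(run)
print(show_string(run, codes))  # 0102030401
```

Under this scheme the emitted bytes depend only on the order in which glyphs appear, which is why a post-processor cannot map them back to characters without extra information.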
adjusting component and type
according to the announcement on releases (http://www.openoffice.org/servlets/ReadMsg?list=releases&msgNo=7503) this issue will be re-targeted to OOo Later.
target
fixed in CWS vcl23; this will only work with Ansi1252 characters of course as all other characters have to be mapped into an arbitrary single byte glyph map.
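The behaviour described here can be sketched in Python as follows. This is an assumption-laden illustration, not the vcl23 code: `assign_codes` is a hypothetical helper, Python's `cp1252` codec stands in for "Ansi1252", and the fallback of taking the lowest unused code is made up.

```python
def assign_codes(text):
    """Map each character of text to a single-byte code.

    Characters encodable in Windows-1252 keep that byte value. Anything else
    takes the lowest code not already claimed (hypothetical fallback policy).
    """
    codes = {}
    # First pass: reserve the Windows-1252 byte for every character that has one.
    for ch in text:
        try:
            codes[ch] = ch.encode("cp1252")[0]
        except UnicodeEncodeError:
            pass
    used = set(codes.values())
    free = (c for c in range(1, 256) if c not in used)
    # Second pass: remaining characters get arbitrary free slots.
    for ch in text:
        if ch not in codes:
            codes[ch] = next(free)
    return codes


# 'A' and 'ä' keep their Windows-1252 values; Thai KO KAI (U+0E01) does not have one.
print(assign_codes("A\u00e4\u0e01"))
```

A real implementation would also have to handle more than 255 distinct glyphs per font, for instance by starting another subset font; this sketch simply runs out of codes in that case.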
reopen
ja->pl: please verify in CWS vcl23; output is now ansi encoded for ansi characters
fixed
JA: verified within CWS vcl23: äääöööüüüßßáéç now appears with its Unicode values in the PostScript output: <E4E4E4F6F6F6FCFCFCDFDFE1E9E7>
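The verification can be reproduced with a short Python check. It assumes Python's `cp1252` codec as the stand-in for the ANSI encoding mentioned earlier in the thread; for these particular characters the byte values coincide with the Unicode code points.

```python
# Hex string copied from the verification comment.
hex_string = "E4E4E4F6F6F6FCFCFCDFDFE1E9E7"

# Decode the bytes as Windows-1252 and compare code points with byte values.
raw = bytes.fromhex(hex_string)
text = raw.decode("cp1252")
same_values = [ord(ch) for ch in text] == list(raw)

print(text)
print(same_values)
```

The decoded text is the test string from the comment, so a post-processor can read such show strings directly again.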
JA: closing
*** Issue 42983 has been marked as a duplicate of this issue. ***