Apache OpenOffice (AOO) Bugzilla – Issue 78749
some Latin text needs CTL processing
Last modified: 2017-05-20 11:31:38 UTC
Issue 16032 (support OpenType features) is way too broad, so this issue has been created to split off one specific aspect of it: some Latin text needs to be treated as complex text. @fme: what problems do you see if the script detector treated some text that is currently detected as simple as complex instead (e.g. "i_ogonek+combining_accent")? Wouldn't this cause quite some trouble in the Latin/CJK/CTL font selection... do you see a way out of this? @moyogo/ruedin: our quality assurance team prefers concrete examples (docs, screenshots, ...) of what does not work now, so they can assess the priority of the problem and check the success of a fix...
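To make the script-detector question concrete, here is a minimal sketch of one possible rule: treat a run as complex as soon as it contains any combining mark. The function name `needs_complex_layout` is hypothetical and is not taken from the OOo code base; the real detector works on script types and would need more than this.

```python
import unicodedata


def needs_complex_layout(run: str) -> bool:
    """Hypothetical rule: a run needs complex layout if it contains any
    combining mark (Unicode general category Mn, Mc or Me)."""
    return any(unicodedata.category(ch).startswith("M") for ch in run)


# i-ogonek followed by a combining acute accent, the case named above
print(needs_complex_layout("\u012f\u0301"))  # True
# plain Latin text without combining marks stays on the simple path
print(needs_complex_layout("plain Latin"))   # False
```

A rule like this only flags runs that actually carry combining marks, so ordinary Latin text would keep its current font selection path.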
Created attachment 46139 [details] text requiring ccmp, mark and mkmk
Created attachment 46140 [details] output of text with Doulos SIL in Gedit with Pango and
Created attachment 46141 [details] output of sample text with DejaVu Sans in Gedit
Created attachment 46142 [details] Wrong output of sample text with Doulos SIL in OO Write
Created attachment 46143 [details] Wrong output of sample text with DejaVu Sans in OO Write
http://www.openoffice.org/nonav/issues/showattachment.cgi/46139/text.utf8
- the first line just needs 'mark' to position the mark below correctly
- the second line needs 'mark' for marks above, and 'ccmp' to decompose i and j when followed by a combining mark above
- the third line shows ligatures, not obligatory but the user should be able to enable them
- the fourth line needs 'mark' for marks, 'ccmp' for i, and 'mkmk' to stack the marks. You’ll notice there’s something wrong with the first g-with-a-hook: the marks are not in Unicode canonical order. To handle this properly the shaper should normalize base+marks before shaping.
- the fifth line needs mark stacking below
- the sixth line should have ligature ties at different heights, but DejaVu and Doulos SIL don’t do that. Junicode handles this with the 'kern' feature.
DejaVu fonts are installed by default on many Linux systems; the latest (MS Office, Vista) Tahoma, Arial, and Times New Roman have similar features for simple diacritics placement. http://dejavu.sourceforge.net
Doulos SIL is a font for linguistics or languages using stacked diacritics. http://scripts.sil.org/DoulosSILfont
Junicode is a medievalist font with good support for stacked diacritics. http://junicode.sourceforge.net/
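The point about canonical order for the g-with-a-hook can be illustrated with a small sketch. The mark sequence below (acute above typed before dot below) is an assumed example, not necessarily the sequence in the attachment, and `canonical_order` is a hypothetical helper name. NFD normalization applies the Unicode canonical ordering, which sorts adjacent marks by combining class.

```python
import unicodedata


def canonical_order(text: str) -> str:
    """Put combining marks into Unicode canonical order via NFD."""
    return unicodedata.normalize("NFD", text)


# Assumed sample: g with hook, then acute above (class 230), then dot below (class 220)
sample = "\u0260\u0301\u0323"
ordered = canonical_order(sample)

print([f"U+{ord(c):04X}" for c in ordered])
# ['U+0260', 'U+0323', 'U+0301']
print([unicodedata.combining(c) for c in ordered])
# [0, 220, 230]
```

A shaper that runs such a step first sees the mark below before the mark above, regardless of the order in which they were typed.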
Thank you for the nice samples! Especially the problem with the latest versions of the common DejaVu font is important for selecting the appropriate priority. To get an even better overview: which languages are most impacted by the problem? Do you happen to know if these languages are already represented in the OOo NLC (http://native-lang.openoffice.org/)?
btw: this should be named "some Standard scripts text needs CTL processing". Any language that doesn't benefit from legacy encodings with precomposed characters in Unicode is affected by this bug. This includes many African languages or other minority languages; Malagasy is an example in NLC. The 'locl' language-specific feature affects languages like Serbian and Macedonian. In theory any language using marks could be affected, if composed forms (base letter plus combining marks) are used instead of the legacy precomposed forms. For example in French "École" and "École" (precomposed É versus E followed by a combining acute accent) will not be rendered the same way in OO when they should.
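The two spellings of the French word can be written out explicitly. This is only a sketch with the code points spelled as escapes; it shows that the strings differ at the code point level but are canonically equivalent, which is why they should render identically.

```python
import unicodedata

precomposed = "\u00c9cole"   # É as the single code point U+00C9
decomposed = "E\u0301cole"   # E followed by U+0301 COMBINING ACUTE ACCENT

print(precomposed == decomposed)                                # False
print(len(precomposed), len(decomposed))                        # 5 6
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
```

For French the precomposed form exists, so normalizing to NFC hides the problem; for languages whose letters have no precomposed form, the combining sequence is the only spelling and mark positioning in the font is required.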
This issue is not restricted to Latin texts. Hebrew text needs 'ccmp' and 'mark' to display diacritical marks properly. Perhaps Arabic needs these too, and possibly some more, as it is more complicated than Hebrew. A free Hebrew font with these features can be found at http://culmus.sourceforge.net/devel/FrankOT.tar.gz This issue is not like 'making text prettier'. Without these features diacritics are absolutely unusable.
> Hebrew text needs 'ccmp' and 'mark' to display diacritical marks properly. @iorsh: are you aware of any Hebrew script that is improperly handled because these tables don't work? If yes: please write a separate issue (because Hebrew already gets CTL processing, whereas this issue is about text that didn't get CTL processing though it might need it). Then please assign the new Hebrew issue to me and cc mreimer and ayaninger...
Created attachment 46182 [details] output of sample text in OOo with Doulos SIL on Windows
Created attachment 46183 [details] output of sample text in OOo with SegoeUI on Windows
On Windows, with a recent Uniscribe, these features are enabled by default and are handled properly. As you can see in attachment 46182 [details] and attachment 46183 [details], Doulos SIL and Segoe UI have those features properly applied in OpenOffice on Windows. The first g-with-a-hook even has the marks at the right places, since Uniscribe reorders the characters before shaping. This means the same document renders correctly in OpenOffice on Windows, but not on Linux.
I'm sorry - I was wrong with my earlier comment. TrueType fonts with Hebrew 'mark'/'ccmp' features work fine.
Resolution of this issue is crucial for making oo.o usable for writing linguistic literature. MS Word doesn't support this either, so it isn't all that important for "market share" at present, but this is a severe problem for anyone working with orthographies not addressed by "precomposed" legacy encodings adopted into Unicode.
Excuse me, I cannot understand your logic sometimes. Since when do you follow exactly what MS Word is doing? It's OK to keep an eye on it for important features that are still missing from OO and for ideas, but sometimes I struggle to tell whether you are INDEED trying to produce a better product than M$ Office or not.
@dbachmann: Do you think it would be good to get extra developers on this? Is there some description of what files need to be touched and what should be added?
@simos, I am afraid I am not familiar enough with oo.o internals to know "which files should be touched". This appears to be a platform-dependent issue. As moyogo notes (June 22), the required features apparently work fine for Windows with a recent Uniscribe, but not on X11 (Linux / OS X), although X11 seems well capable of supporting OTF, as shown by gedit or yudit etc. Consequently, this issue should probably be assigned to an X11 wizard. For the typographers' view see also http://www.typophile.com/node/17517 http://typophile.com/node/28539
Setting a target.
target
"locl" is required for Serbian. There is an example of this on pango.org: http://www.pango.org/ScriptGallery?action=AttachFile&do=get&target=OpenTypeLanguage.png
*** Issue 96123 has been marked as a duplicate of this issue. ***
*** Issue 110477 has been marked as a duplicate of this issue. ***
*** Issue 111378 has been marked as a duplicate of this issue. ***
IMO, this issue is related to OpenOffice's text rendering engine (Graphite). Firefox and GTK+ apps use another text rendering engine, Pango, which is not affected, and KDE/Qt apps use Qt, which is OK too.
yaoziyuan, you are right. This has to do with the OO rendering engine. It makes OO look very bad. Since OO is so important to the free software community, it makes the whole community look bad.
You mentioned the rendering engines used by KDE4 and Gnome. But Windows Vista+ and Mac OS also have good support for these features. The effect is that any graphical web browser, much simpler word processors such as KWord, and even the lowliest text processor on these systems render mark placement fairly well (never mind MS Word). But not OpenOffice.
OpenOffice Writer is primarily about text display of high quality. Without that, all the fancy bells and whistles are for the garbage. Anything else you can do with it could be done better with some other tool.
Crucial features that still seem to be absent in OO:
- 'ccmp' ligature composition (as opposed to the ccmp decomposition table)
- 'mark'
- 'mkmk'
Certainly there are others.
Scripts that would use these features (and may be illegible without them) include Hebrew, Arabic, Thai, and Vietnamese, but there are many more. I don't know if OO supports 'abvm' and 'blwm', but these are used a lot with Indic scripts, as well as Tibetan. Mark positioning is also used for fine placement in general for languages that use marks, and that includes the *majority* of European languages. It makes the difference between text being very ugly and looking great. See for example: http://partners.adobe.com/public/developer/opentype/index_table_formats2.html
This issue makes the product look stupid and useless for a large fraction of potential users in the world, yet since this report was opened in 2007, it has been marked priority P3 "Of interest, but not planned or expected in this release" (and many related reports preceded it). Colleagues, let's get our priorities straight.
On GNU/Linux Open Office uses ICU for text layout. If there's a problem with OpenType features, look at the ICU code. cya, #
Reset assignee to the default "issues@openoffice.apache.org".