Apache OpenOffice (AOO) Bugzilla – Issue 69088
MML Import of multiple subscripts fails.
Last modified: 2013-08-07 14:54:45 UTC
While translating a LaTeX document to OOo I found that nested subscripts are not grouped correctly when imported from MML. This can be reproduced in oomath by deleting the SO5 annotation. Example:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE math:math PUBLIC "-//OpenOffice.org//DTD Modified W3C MathML 1.01//EN" "math.dtd">
<math:math xmlns:math="http://www.w3.org/1998/Math/MathML">
  <math:semantics>
    <math:msub>
      <math:mi>x</math:mi>
      <math:msub>
        <math:mi>s</math:mi>
        <math:mi>j</math:mi>
      </math:msub>
    </math:msub>
  </math:semantics>
</math:math>

This should give "x_{s_j}", but the import produces "x_s_j", which is not displayed as expected.
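A minimal sketch of the grouping the import is expected to perform, written in Python rather than in the C++ of the real importer. The names to_starmath and group are hypothetical and this is not the OOo code; it only illustrates that a script argument which is itself a compound element has to be wrapped in braces.

```python
import xml.etree.ElementTree as ET

# Same structure as the example in this report (DOCTYPE omitted for brevity).
EXAMPLE = (
    '<math:math xmlns:math="http://www.w3.org/1998/Math/MathML">'
    "<math:semantics><math:msub>"
    "<math:mi>x</math:mi>"
    "<math:msub><math:mi>s</math:mi><math:mi>j</math:mi></math:msub>"
    "</math:msub></math:semantics></math:math>"
)

# Only sub- and superscripts are handled in this sketch.
SCRIPT_OPS = {"msub": "_", "msup": "^"}


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def group(node):
    """Return a node as a script operand, braced unless it is a single token."""
    text = to_starmath(node)
    return text if local(node.tag) in ("mi", "mn") else "{" + text + "}"


def to_starmath(node):
    """Convert a small MathML subset to a StarMath-like command string."""
    name = local(node.tag)
    children = list(node)
    if name in ("mi", "mn", "mo"):
        return (node.text or "").strip()
    if name in SCRIPT_OPS and len(children) == 2:
        base, script = children
        return group(base) + SCRIPT_OPS[name] + group(script)
    # math, semantics, mrow and anything unknown: just join the children.
    return " ".join(to_starmath(child) for child in children)


print(to_starmath(ET.fromstring(EXAMPLE)))  # x_{s_j}
```

Without the braces added in group, the same input would come out as "x_s_j", which is the string reported above.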
Created attachment 38847 [details] Same excerpt as file
MRU->TL: the attached file is displayed as expected by the Firefox browser, but OO Math displays it as an error because it cannot interpret something like x_y_z. It should therefore be imported as x_{y_z}.
Created attachment 65809 [details] Correct MathML code in odt, which has wrong representation in OO
I have tested this with OO-3.1.1. Nothing has changed since 2006; the bug still exists. I made a simple LaTeX test file:

\documentclass{article}
\begin{document}
\[ R_{k} = \int_{t_{k}}^{t_{k+1}} y(t)\cdot s_{ref}(t) dt \]
\end{document}

Having opened the odt file produced by tex4ht, I saw the wrong OOMath formula:

R_k ="∫"_t_k^t_{k + 1} y { \( t \)} cdot s_{r e f} { \( t \)} d t

Then I unpacked the odt file and opened test-m2/content.xml, which contains the MathML of the given formula, in the Opera browser, which displayed it correctly. So I believe this is an OO bug. See attachment http://www.openoffice.org/nonav/issues/showattachment.cgi/65809/content.xml
tl->eugene_b: have a look at the target of the issue: it is not even targeted for a planned release. This is usually because of the large number of open issues and limited developer resources, and compared to others this one seemed to be of lesser importance when it was last evaluated. In general, to get a better target you need either to raise the issue by discussing it with QA or product management or, if you are able, to provide a patch for the problem. Patches always have high priority and will be integrated quickly if they are correct and don't introduce other problems.
tl, I'm working on this problem; it would be faster to solve it myself. If I find a solution, I'll send a patch. But I am new to OpenOffice. It is quite a large program and it will take a lot of time to understand the sources. For now I think the cause is somewhere in the build/starmath/mathml.cxx file.
Eugene_b, I hope you will manage to fix this long-standing bug, because it makes OO useless for the interchange of mathematical texts. Too bad that the OO developers don't see it as a higher-priority problem. I suspect that fixing it wouldn't take more than an hour for someone familiar with the OO code base. Note also that the starmath/mathml.cxx file is no longer in the trunk. The one to look at now (I think) is starmath/source/mathmlimport.cxx.
t3, thank you for the notice. I didn't use the trunk; I investigated the stable sources which my Gentoo system downloaded through its portage system. I want to tackle this problem seriously, so I need to get the latest sources from SVN. Unfortunately I'll have no time to dig into the sources for the next several days. The bug definitely seems easy to fix, but I'm unfamiliar with the OO sources and exploring them will also take some time. It seems nobody cares about this bug...
I have to conclude that the Sun ODF plugin 3.1 from http://www.sun.com/software/star/odf_plugin/get.jsp has exactly the same bug. Evidently it shares the same buggy code with OOo.
About the SUN odf plugin: more likely it just calls the OOo code to get things done.
I installed the ODF plugin on a system where OpenOffice was not installed, so this code must be included in the ODF plugin itself.
Created attachment 66094 [details] Formula with double subscripts, corrected in OOMath
Created attachment 66095 [details] The same file as http://www.openoffice.org/nonav/issues/showattachment.cgi/66094/content_corrected.mml, but with OOMath internal format string deleted
Some observations on the way OOMath handles math formulas. One can notice two things:
1) OOMath incorrectly interprets MathML code with double subscripts.
2) Having created a formula with double subscripts in OOMath, one can save it as MathML and reopen it correctly.

How can that be? The answer is that OOMath additionally saves the formula in its internal format. Look at attachment 66094 (http://www.openoffice.org/nonav/issues/showattachment.cgi/66094/content_corrected.mml). It is the formula with double subscripts, corrected and saved in OOMath. You can open it in OOMath and it is displayed correctly. I checked the MathML in this file (and reindented it for better readability) and the MathML is correct. But the file has an additional string:

<math:annotation math:encoding="StarMath 5.0">R_k ="∫"_{t_k}^{t_{k + 1}} y { \( t \)} cdot s_{r e f} { \( t \)} d t</math:annotation>

It is the code of the formula in the OOMath format. Having deleted this string, I obtained the file you can see in attachment 66095 (http://www.openoffice.org/nonav/issues/showattachment.cgi/66095/content_wo_oomath.mml). OOMath can't open it correctly!

So, some conclusions about the way OOMath handles formulas:
1) OOMath saves correct MathML code.
2) Along with the MathML, it saves an additional string with the formula in its own format.
3) While opening a document, OOMath looks for the string in its own format. Having found it, OOMath uses it and completely ignores the MathML code.
4) If the file does not contain the string in OOMath's format, OOMath has to use the MathML code, which it can't interpret correctly.
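The deletion step described above can be scripted. The following is a minimal Python sketch, not part of OOo; the function name strip_starmath_annotation and the SAMPLE string are made up for illustration, and it assumes the annotation carries an "encoding" attribute (with or without the math: prefix) whose value starts with "StarMath".

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Hypothetical input: a formula plus the extra StarMath annotation.
SAMPLE = (
    '<math:math xmlns:math="http://www.w3.org/1998/Math/MathML">'
    "<math:semantics>"
    "<math:msub><math:mi>x</math:mi><math:mi>s</math:mi></math:msub>"
    '<math:annotation math:encoding="StarMath 5.0">x_s</math:annotation>'
    "</math:semantics></math:math>"
)


def strip_starmath_annotation(xml_text):
    """Return (xml_without_starmath_annotation, number_of_removed_elements)."""
    # Keep the 'math:' prefix when the tree is serialised again.
    ET.register_namespace("math", MATHML_NS)
    root = ET.fromstring(xml_text)
    removed = 0
    # Snapshot the element list first so removal does not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag != "{%s}annotation" % MATHML_NS:
                continue
            encoding = (
                child.get("{%s}encoding" % MATHML_NS)
                or child.get("encoding")
                or ""
            )
            if encoding.startswith("StarMath"):
                parent.remove(child)
                removed += 1
    return ET.tostring(root, encoding="unicode"), removed


stripped, count = strip_starmath_annotation(SAMPLE)
print(count)
print(stripped)
```

Opening the stripped result forces the importer to work from the MathML alone, which is the situation in which the problem shows up.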
tl->eugene_b: Please try again once your content_wo_oomath.mml is a valid MathML file. I created a text document with a single formula and replaced the text in the content.xml file of the formula with the one from content_wo_oomath.mml attached here. Then I created a new odt from that by zipping all the files and folders. After that I ran it through the ODF validator. It failed and I got this output:

This file is NOT valid
Result details:
upload:///BBBneu.odt/Object 1//Object 1/content.xml[15,48]:Error:cvc-complex-type.3.2.2: Attribute 'math:stretchy' is not allowed to appear in element 'math:mo'.
upload:///BBBneu.odt/Object 1//Object 1/content.xml[38,60]:Error:cvc-complex-type.3.2.2: Attribute 'math:stretchy' is not allowed to appear in element 'math:mo'.
upload:///BBBneu.odt/Object 1//Object 1/content.xml[53,52]:Error:cvc-complex-type.3.2.2: Attribute 'math:stretchy' is not allowed to appear in element 'math:mo'.
upload:///BBBneu.odt/Object 1//Object 1/content.xml[59,52]:Error:cvc-complex-type.3.2.2: Attribute 'math:stretchy' is not allowed to appear in element 'math:mo'.
upload:///BBBneu.odt/Object 1//Object 1/content.xml[63,48]:Error:cvc-complex-type.3.2.2: Attribute 'math:stretchy' is not allowed to appear in element 'math:mo'.
upload:///BBBneu.odt/Object 1//Object 1/content.xml[84,48]:Error:cvc-complex-type.3.2.2: Attribute 'math:stretchy' is not allowed to appear in element 'math:mo'.
upload:///BBBneu.odt/Object 1//Object 1/content.xml[90,48]:Error:cvc-complex-type.3.2.2: Attribute 'math:stretchy' is not allowed to appear in element 'math:mo'.
upload:///BBBneu.odt/Object 1//Object 1/content.xml[101,22]:Error:cvc-complex-type.2.4.b: The content of element 'math:semantics' is not complete. One of '{"http://www.w3.org/1998/Math/MathML":annotation, "http://www.w3.org/1998/Math/MathML":annotation-xml}' is expected.
upload:///BBBneu.odt:Info:validation errors found
upload:///BBBneu.odt:Info:Generator: StarOffice/9$Win32 OpenOffice.org_project/300m64$Build-9446$CWS-tl76

That is, there are two types of errors: the one with the 'stretchy' attribute in 'mo' tags and the one with the 'semantics' tag. You need to fix those errors first before we can see whether there is a problem with the MathML import. The ODF validator can be found here: http://tools.services.openoffice.org/odfvalidator/
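The repacking step (unpack the odt, replace the formula's content.xml, zip everything again) can be sketched in Python as follows. This is only an illustration: pack_odf is a hypothetical helper, and it assumes the usual ODF packaging convention that the "mimetype" entry comes first in the archive and is stored uncompressed.

```python
import os
import zipfile


def pack_odf(src_dir, out_path):
    """Zip an unpacked ODF directory tree into a package at out_path.

    out_path should lie outside src_dir, otherwise the archive would
    try to include itself.
    """
    with zipfile.ZipFile(out_path, "w") as package:
        mimetype = os.path.join(src_dir, "mimetype")
        if os.path.exists(mimetype):
            # First entry, stored without compression.
            package.write(mimetype, "mimetype", compress_type=zipfile.ZIP_STORED)
        for folder, _dirs, files in os.walk(src_dir):
            for name in sorted(files):
                full = os.path.join(folder, name)
                # Archive names are relative to src_dir and use '/'.
                arcname = os.path.relpath(full, src_dir).replace(os.sep, "/")
                if arcname == "mimetype":
                    continue  # already written above
                package.write(full, arcname, compress_type=zipfile.ZIP_DEFLATED)
```

A call such as pack_odf("BBBneu_unpacked", "BBBneu.odt") (directory name made up here) would then produce a file that can be fed to the validator.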
Created attachment 66139 [details] Small bugdoc to reproduce the import problem
tl->eugene_b: to make things a bit easier for you I created a small bugdoc without the StarMath annotation tag to reproduce the problem. (Don't worry about the wrong replacement image; it will get fixed once you activate the formula.) If your fix solves the import problem, you should be able to open the document, double-click the formula and have it displayed correctly.
Created attachment 66205 [details] MathML file after cleaning up for ODF validation, "annotation" tag is removed
tl->eugene_b: OK, as far as I can see that one can be imported now. In my unfixed version the result is, of course, still incorrect. But does that one now work with your fix?
This file was taken from OOo (I corrected and resaved the file prepared with TeX4ht). OK, I manually cleaned up all the errors found by the ODF validator, and the file passed the check. After that, I removed the "annotation" tag from the MathML file (attachment 66205, http://www.openoffice.org/nonav/issues/showattachment.cgi/66205/content_valid_wo_annotate.mml). The ODF validator reports this as an error:

upload:///test2.odt/test-m2//test-m2/content.xml[2,953]:Error:cvc-complex-type.2.4.b: The content of element 'math:semantics' is not complete. One of '{"http://www.w3.org/1998/Math/MathML":annotation, "http://www.w3.org/1998/Math/MathML":annotation-xml}' is expected.

So the ODF validator requires the "annotation" tag. Next, look at the MathML page on Wikipedia (http://en.wikipedia.org/wiki/MathML). It has an example of MathML code with two "annotation" tags, one in TeX format and another in StarMath format:

<annotation encoding="TeX"> x=\frac{-b \pm \sqrt{b^2 - 4ac}}{2a} </annotation>
<annotation encoding="StarMath 5.0"> x={-b plusminus sqrt {b^2 - 4 ac}} over {2 a} </annotation>

Software should distinguish the annotation formats by the "encoding" attribute. But the ODF validator claims that the "encoding" attribute is incorrect:

upload:///test1.odt/test-m2//test-m2/content.xml[2,1136]:Error:cvc-complex-type.3.2.2: Attribute 'math:encoding' is not allowed to appear in element 'math:annotation'.

Next, I placed the annotation in TeX format into the MathML file and removed the "encoding" attribute (which is ignored by the ODF validator and OOo). The ODF validator detects this:

upload:///test2.odt/test-m2//test-m2/content.xml[2,948]:Error:cvc-complex-type.2.4.a: Invalid content was found starting with element 'annotation'. One of '{"http://www.w3.org/1998/Math/MathML":annotation, "http://www.w3.org/1998/Math/MathML":annotation-xml}' is expected.

The results: the ODF validator requires MathML embedded in ODF to have an "annotation" tag, and it must be exactly in StarMath format. Otherwise the file is reported as incorrect. But the Open Document Format itself is a standard independent of any particular software such as StarMath. Unfortunately its current version lacks a description of the formula format. Maybe the root of this problem lies here. I haven't found any mention of "StarMath 5.0" in the ODF standard or at http://www.w3.org/1998/Math/MathML. So at present the ODF validator at http://tools.services.openoffice.org/odfvalidator/ checks files not for compliance with the Open Document Format standard but for compatibility with OpenOffice. It is effectively an OOo validator. By the way, OOMath incorrectly displays the MathML formulas from the official MathML test suite, http://www.w3.org/Math/testsuite/
I tried another ODF validator: http://opendocumentfellowship.com/validator It found other errors, but it did not find the errors that the Sun ODF validator found previously. In particular, it allows the "annotation" tag to have an "encoding" attribute, and it allows the MathML to have no "annotation" tag at all. I haven't found a mandatory requirement for MathML to have an "annotation" tag either in the Open Document Format standard (ISO/IEC 26300) or in the MathML 2.0 description (W3C Recommendation, 21 October 2003, http://www.w3.org/TR/2003/REC-MathML2-20031021/).
tl->eugene_b: I agree that according to http://www.w3.org/TR/MathML2/chapter4.html#contm.semantics an annotation tag should not be required, and therefore our ODF validator might have a slight problem here. (I have forwarded a note about this to someone else.) However, since the corrected MathML can still be imported despite that, let's drop this particular item. (My attached sample doc does not have that tag either.) And no, w3.org not listing a "StarMath 5.0" annotation is not a problem at all. w3.org is just listing some examples of known existing encodings; that does not mean there is a predefined, fixed set from which annotations must be chosen. But I'm still curious about the effect of your fix: did it solve the problem? If so, it would seem we have a patch. In that case please attach it to this issue, change the type to 'PATCH' and assign the issue to me.