Issue 69635 - Writer cannot open .xhtml files, nor can it parse XHTML properly
Summary: Writer cannot open .xhtml files, nor can it parse XHTML properly
Status: CONFIRMED
Alias: None
Product: Writer
Classification: Application
Component: open-import
Version: OOo 2.0.2
Hardware: All All
Importance: P3 Trivial with 17 votes
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact:
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-09-18 13:50 UTC by saoshyant
Modified: 2013-02-07 22:35 UTC
CC List: 3 users

See Also:
Issue Type: FEATURE
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
a simple example -- shows up in 2.2 just as the source code (309 bytes, text/html)
2007-04-02 20:46 UTC, zephyrous

Description saoshyant 2006-09-18 13:50:41 UTC
How do you do?

I have searched the issue database and found some issues about XHTML, but they
didn't go into much detail on what I'd like to point out, so I'm submitting this.

I haven't tested it with other applications from the OOo suite besides Writer,
but I could check if need be. What I've seen under my Kubuntu installation is
that Writer is unable to import XHTML files when they carry the .xhtml extension.

If I change the extension to .html, Writer will load the file as if it were an
HTML tag-soup monstrosity, complete with an unparsed <?xml ?> declaration at the
top. This is not very noticeable when dealing with text only, but I'm sure it will
conflict if one adds anything that actually requires XHTML (MathML, SVG and whatnot).

Now, I'm sure it can't be that hard to support an open specification that's been
around for years, but if there are no resources to work on this issue, I
recommend (if the OOo license allows) using either the Gecko engine or KHTML.

I can also attach some XHTML test cases here if it helps.
Comment 1 michael.ruess 2006-09-18 14:38:46 UTC
Reassigned to JSI.
Comment 2 Joost Andrae 2006-09-18 17:25:59 UTC
AFAIK there is no import of XHTML but there's an export option.

Read this if you're interested:

http://xml.openoffice.org/sx2ml/

Changing issue type to ENHANCEMENT
Comment 3 jogi 2006-09-19 06:30:11 UTC
It's a FEATURE, not an enhancement, because no XHTML import filter exists and it
isn't defined how one should work (with the same technology as the export, and
with the same limitations?). This has to be decided by the requirements team.
Comment 4 saoshyant 2006-09-19 09:53:34 UTC
@ja

Thank you. That was interesting, but yes, I believe OOo needs import support for
XHTML, as well as proper rendering. Let's hope then that the requirements team
agrees with me.
Comment 5 zephyrous 2007-04-02 20:46:12 UTC
Created attachment 44170
a simple example -- shows up in 2.2 just as the source code
Comment 6 erikanderson3 2007-07-04 20:36:11 UTC
XHTML docs still show up in Writer as just the source in 2.2.1.  
Comment 7 shubhrakant 2008-01-31 07:26:05 UTC
*** Issue 69635 has been confirmed by votes. ***
Comment 8 batavus 2008-04-21 10:51:58 UTC
Importing (x)html worked fine in OOo 2.0.2 as shipped with Ubuntu 6.06 LTS. So
this should be "just" a bug in the current OOo 2.3.

I believe this to be an important feature, as it allows one to programmatically
produce documents with data processing apps and convert them to nice docs with
OpenOffice styles.

This is also a (working) feature in MS-Word.
Comment 9 batavus 2008-04-21 12:14:44 UTC
I forgot to mention that I stripped the headers from the (x)html file. Only the
part BETWEEN the body and /body tags was given to, imported and processed by
OOo 2.0.2, but the h and p tags were nicely converted to headings and plain text.
Still very useful.
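
For anyone who wants to reproduce the workaround described above, here is a
minimal sketch in Python. It only illustrates the manual step (keeping what sits
between the body tags); it is not how any OOo filter works. The function name
extract_body_fragment is made up for this example, and the sketch assumes the
input is well-formed XHTML that uses no named entities such as &nbsp;, which the
standard-library XML parser cannot resolve.

```python
import xml.etree.ElementTree as ET


def extract_body_fragment(xhtml_text: str) -> str:
    """Return the markup found between <body> and </body> (hypothetical helper)."""
    root = ET.fromstring(xhtml_text)

    # Drop the XHTML namespace from every tag so the output reads like
    # plain HTML (<p> instead of <ns0:p>).
    for element in root.iter():
        if isinstance(element.tag, str) and element.tag.startswith("{"):
            element.tag = element.tag.split("}", 1)[1]

    body = root.find("body")
    if body is None:
        raise ValueError("no <body> element found")

    # Leading text inside <body>, then each child (tostring keeps the tail text).
    parts = [body.text or ""]
    for child in body:
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts).strip()


sample = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>t</title></head>"
    "<body><h1>Title</h1><p>Text</p></body></html>"
)
fragment = extract_body_fragment(sample)
print(fragment)
```

The resulting fragment can be saved with an .html extension and opened in Writer,
which is the manual procedure this comment describes.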
Comment 10 erikanderson3 2008-12-28 01:36:15 UTC
Given that OOo seems to handle the *contents* of XHTML files correctly (i.e.
everything but the XHTML header, as noted above by batavus), it seems that OOo
only needs to change how it handles this header, and then add the .xhtml
extension to the list of openable filetypes, and therefore this bugfix should
theoretically be exceedingly simple.  Are there any devs following this issue
that could give us a rough target milestone forecast?  Would 3.2 be reasonable
to hope for?
Comment 11 Mathias_Bauer 2009-01-05 12:16:20 UTC
If it's really only a bug in parsing the header, and if we accept the limitations
of the current HTML import filter and just add the requirement that there
shouldn't be any unparsed XML content in the imported document: yes, that could
be low-hanging fruit.
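
Until something like that lands, the requirement above (no unparsed XML content
in the imported document) can be approximated on the user side. The following is
only a sketch of such a preprocessing step in Python, not the import filter
itself: it removes a leading <?xml ... ?> declaration and writes a copy with an
.html extension. The names strip_xml_declaration and save_as_html are made up
for this example, and UTF-8 input is an assumption.

```python
import re
from pathlib import Path

# Matches an optional byte-order mark, leading whitespace, and one XML declaration.
_XML_DECLARATION = re.compile(r"^\ufeff?\s*<\?xml[^>]*\?>\s*")


def strip_xml_declaration(text: str) -> str:
    """Remove a leading <?xml ... ?> declaration, if present (hypothetical helper)."""
    return _XML_DECLARATION.sub("", text, count=1)


def save_as_html(xhtml_path: str) -> Path:
    """Write a copy of the file without the declaration, using an .html suffix."""
    source = Path(xhtml_path)
    cleaned = strip_xml_declaration(source.read_text(encoding="utf-8"))  # assumes UTF-8
    target = source.with_suffix(".html")
    target.write_text(cleaned, encoding="utf-8")
    return target


print(strip_xml_declaration('<?xml version="1.0"?>\n<html/>'))
```

This leaves the DOCTYPE and everything else untouched, so the result is still
subject to the limitations of the existing HTML import.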