Apache OpenOffice (AOO) Bugzilla – Full Text Issue Listing |
Summary: | text box in wrong location when importing .doc or .rtf file from Word | | |
---|---|---|---|
Product: | Writer | Reporter: | dankegel <dank> |
Component: | ui | Assignee: | michael.ruess |
Status: | CLOSED FIXED | QA Contact: | issues@sw <issues> |
Severity: | Trivial | | |
Priority: | P3 | CC: | caolanm, issues, ivaroo |
Version: | OOo 1.1 Beta | Keywords: | ms_interoperability, oooqa |
Target Milestone: | --- | | |
Hardware: | PC | | |
OS: | All | | |
URL: | http://www.ieee.org/organizations/pubs/transactions/TRANS-JOUR.DOC | | |
Issue Type: | DEFECT | Latest Confirmation in: | --- |
Developer Difficulty: | --- | | |
Issue Depends on: | 27349 | | |
Issue Blocks: | | | |
Description
dankegel
2003-04-27 00:54:47 UTC
See http://a957.g.akamai.net/f/957/3680/1h/www.ieee.org/organizations/pubs/transactions/TRANS-JOUR.PDF for a PDF showing how it should look.

Created attachment 5922
Minimal test case. "line2" should be below "LINE1", but is superimposed on top of it instead.

OK, I attached a minimal test case (produced with Word, but trimmed down by hand a bit). The word "line2" should appear *below* the word "LINE1", but in OpenOffice, it appears on top of it. Abiword gets this right, by the way. The problem is the same whether the file is saved in .rtf or .doc.

I hereby confirm my own bug, as directed by the "how to help" page.

Reassigned to MRU

MRU->CMC: The problem is that these old WW6 frames are positioned above their anchor, though the properties tell me "0,00 cm vertical position"... Do you think that there's a possibility to "fix" anything here?

Can't be done correctly before 2.0, until I get the other placement options we need.

OK, adding ms_interoperability keyword to indicate that it's an MS Word interoperability problem that can't be solved until 2.0.

cmc->od: This is as far as we can go in the filter at the moment. I'll reassign this to you as you're looking into placement options and so on. This is an unusual case as it involves "old style" frames. These frames can be created manually by inserting a text box in Word and choosing format->text box->text box->convert to frame... It's good to consider the placement options (and layout behaviour) that are available with these frames in addition to the "normal" ones.

OD->Dan Kegel (20.08.2003): Hello Dan, I took a closer look at your example and I was wondering what MS Word does with the given vertical position of the frames: Both frames are anchored at the second paragraph and vertically positioned 0 cm to this paragraph with a distance to the text of 0,33 cm. Thus, both frames are supposed to have the same vertical position directly in front of the second paragraph, correct? But MS Word positioned these frames before the first paragraph with different vertical positions. When I input some text in the first paragraph (more than one line), every line of the first paragraph, except the last one, is positioned before the frames.
It doesn't seem - looking at the given PDF - that this was your intention, right? I think you should anchor the frames at the first paragraph. I figured out that MS Word behaves like this because of the given distance to the text. I changed the value of both frames to 0cm. Then MS Word behaves as I expected - both frames are positioned directly before the second paragraph and behind the first paragraph. If I change the value of frame 'LINE1' to 0,1cm, it is positioned before the first paragraph, keeping the given distance at the bottom, but with an unexpected distance of about 0,3cm at the top - I don't know why. Then I changed the value of frame 'line2' also to 0,1cm. Now it is also positioned before the first paragraph, keeping the expected distance at the bottom, but an unexpected distance of 0,5cm is kept between the frames - I don't know why. As you see, I'm a little bit confused about what MS Word does with your given positioning values. Such behaviour would be quite complicated to implement in Writer. I propose a workaround for your positioning problem: anchor the frames to the first paragraph and vertically position these frames relative to the page with appropriate values, e.g. 1,8 cm for frame 'LINE1' and 3,1 cm for frame 'line2'. Please give me feedback on whether this works for you.

cmc->od: These old style frames are a little bit interesting in Word. This type of frame is not actually an object like a drawing object; instead these frames are properties of the paragraphs inside the frame. So an old style frame is a series of paragraphs which all have the same absolute positioning properties. This might go some way towards explaining why the distance from text matters for where the frames are positioned in Word, as the frames are actually paragraphs and so are "text". Perhaps the distance from text value of one frame considers text inside other old style frames when deciding where to go.

OD->CMC (20.08.2003): Thanks for your comments.
As I figured out with Andreas (AMA), the distance between the two frames doesn't seem to depend on the 'distance to text' values. It seems to depend on how the MS Word layout engine works: It seems that first the body text is formatted without any frame. Next, frame 'LINE1' seems to be positioned considering the current paragraph position. Afterwards the body text is formatted again and now wraps around frame 'LINE1', but frame 'LINE1' isn't notified that its anchor paragraph has moved. Next, frame 'line2' seems to be positioned considering the new paragraph position. Afterwards, the body text is formatted again and now wraps around frame 'line2', but doesn't notify any frame of its movement. To prove this 'theory', set the following vertical position values: -0,5cm for frame 'LINE1' and -1,0cm for frame 'line2'. As you see, the frames now overlap and the positions seem to be calculated following the given 'theory'. What is your opinion about this 'theory'? For 'new style' frames (text boxes) a similar 'theory' about the layout engine holds, but with the difference that the body text isn't formatted until the last text box is positioned. Thus, the positions of both text boxes are determined by the paragraph position before the text boxes are inserted. For a mixture of 'old style' and 'new style' frames, we didn't find a consistent 'theory'. Can you confirm this?

dk->od: The document I found this in was a very important .doc file (at least according to Google, which dredged it up for me) that happened to have a .pdf version as well. I have no control over it. No workaround is therefore possible on the source side -- OpenOffice really does have to render this the same way as Microsoft Office.

OD->Dan (21.08.2003): Thanks for your comment, but your statement "OpenOffice really does have to render this the same way as Microsoft Office." doesn't help very much. OpenOffice is *not* a clone of Microsoft Office.
We are providing a filter for Microsoft Office word processing documents and we really try to be as close as possible. And we try to adjust our layout engine for such documents. But we can't be perfect, because during the import we have got all layout information for the document and we can't look into the Microsoft Office code, because it isn't open source. We also have to consider already existing OpenOffice documents, which are rendered with our current layout engine. These documents have to be rendered as they are in the current state. Thus, each adjustment of the layout engine has to consider this. In the given case, we have got the positioning values for the frames, but as you can see in the discussion, we have to try to understand how Microsoft Office uses these positioning values to find the corresponding position in the document. And I think you can agree that the algorithm which is used isn't very intuitive and doesn't correspond directly to the given positioning value. But help to improve our filter is very welcome, and if you have a concrete proposal for how the layout engine of Microsoft Office works and how we can implement this in OpenOffice, we would very much appreciate it.

Sorry, I'm just the messenger. Users are going to expect OOo to be able to load that file, that's all. I wish I had time to help implement the fix, but I am off saving the world in other ways...

OD (23.09.2003): accepted. With an adjusted formatting of frames with wrapping, this 'defect' can be solved. We do our best.

OD (24.09.2003): We face the challenge - after fully understanding the layout algorithm of MS Word, we will try to implement it.

OK, good luck! This might take care of one of the few remaining stumbling blocks for certain large automotive manufacturers. If you make progress on this, and need more test cases, I can send you some tough ones from a real live huge manufacturer who is waiting for this kind of fix before moving to OpenOffice/StarOffice.
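
The layout 'theory' OD describes in the 20.08.2003 comments can be illustrated with a toy model. The following is a minimal Python sketch under strong assumptions, not the actual algorithm of MS Word or Writer: the page is reduced to one vertical axis, a wrapped frame simply pushes overlapping body paragraphs below it, and 'distance to text' is ignored. The names `Frame`, `paragraph_tops`, `layout_old_style` and `layout_new_style` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    name: str
    anchor: int    # index of the anchor paragraph (hypothetical model)
    offset: float  # vertical position relative to the anchor top, in cm
    height: float  # frame height, in cm


def paragraph_tops(heights, placed):
    """Format body text top-down, pushing paragraphs below placed frames.

    `placed` is a list of (top, height) tuples for frames already positioned.
    """
    tops = []
    y = 0.0
    for h in heights:
        moved = True
        while moved:
            moved = False
            for top, frame_height in placed:
                # Paragraph would overlap the frame: move it below the frame.
                if y < top + frame_height and y + h > top:
                    y = top + frame_height
                    moved = True
        tops.append(y)
        y += h
    return tops


def layout_old_style(heights, frames):
    """Assumed 'old style' pass: reformat the body text after every frame."""
    placed = []
    result = {}
    for frame in frames:
        tops = paragraph_tops(heights, placed)    # wrap around earlier frames
        top = tops[frame.anchor] + frame.offset   # uses the *current* anchor position
        placed.append((top, frame.height))        # earlier frames are never moved again
        result[frame.name] = top
    return result


def layout_new_style(heights, frames):
    """Assumed 'new style' pass: all frames use the frame-free paragraph positions."""
    tops = paragraph_tops(heights, [])
    return {frame.name: tops[frame.anchor] + frame.offset for frame in frames}


if __name__ == "__main__":
    body = [0.5, 0.5]  # two body paragraphs, heights in cm
    frames = [Frame("LINE1", 1, 0.0, 1.0), Frame("line2", 1, 0.0, 1.0)]
    print(layout_old_style(body, frames))
    print(layout_new_style(body, frames))
```

In this model, two frames anchored at the second paragraph with a vertical position of 0 end up stacked in the 'old style' pass ('line2' below 'LINE1') and at the same position in the 'new style' pass. With vertical positions of -0.5 and -1.0 the 'old style' pass produces overlapping frames, which is consistent with OD's experiment. The model does not reproduce the unexplained extra distances OD observed.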

Created attachment 10526
An RTF file that is not recognizable in OOo

Just added an RTF file that makes the trouble even worse. This document looks nice in Word (some overprinting, but this is by design). The image is probably black because I messed it up a bit (sorry, but I had to). I'll upload a picture of how Word 2000 (and I personally) like it.

Amusing factoid: OOo 1.1.1rc crashes when you load and then save the document, TRANS-JOUR.DOC, that caused me to file this report in the first place.

Amusing factoid, part 2: the crash I just mentioned has been in issuezilla for a while (with a different document) as issue 24978.

Add dependence to issue #27349

OD->ivaroo: Please submit a separate issue for the RTF import of document 'Thermat.rtf'. It isn't handled by this issue.

fixed in cws swobjpos04 by issue #27349

Which milestone will the fix appear in, do you think? I just tested with 680m45, and while there has been a lot of improvement in rendering of TRANS-JOUR.DOC in the 14 months since I filed the bug, this issue persists (as does one other: not as much vertical space is used above the footnote on the first page, which is probably partly responsible for the strange positioning of Fig 1 on page 2 instead of page 3 as in the PDF).

OD->dankegel: Thank you for the compliments about our success in improving our Microsoft interoperability. It was a 'long way' for this issue to be solved. Several features have been implemented:
- 'Negative positions for Writer fly frames', specification found at http://specs.openoffice.org/writer/compatibility/negative_positions_for_Writer_fly_frames.sxw
- 'Follow text flow vs. leaving layout environment for Writer fly frames', specification found at http://specs.openoffice.org/writer/compatibility/follow_text_flow_vs_leaving_environment.sxw
- 'Vertical alignments at page areas for Writer fly frames', specification found at http://specs.openoffice.org/writer/compatibility/vertical_alignment_at_page_areas.sxw
- 'Adjust positioning of floating screen objects', specification found at http://specs.openoffice.org/writer/compatibility/adjust-object-positioning.sxw
- 'Adjust text wrapping', specification found at http://specs.openoffice.org/writer/compatibility/adjust-text-wrapping.sxw
- 'Unification of object positioning', specification found at http://specs.openoffice.org/writer/compatibility/unification_of_object_positioning.sxw
- and finally 'Positioning of floating screen objects with considering its wrapping mode', implemented in cws swobjpos04, specification found at http://specs.openoffice.org/writer/compatibility/obj-pos-without-wrapping.sxw
You see, we weren't on holiday for the last 14 months. The cws swobjpos04 is currently synchronising to SRC680m47. Afterwards an internal installation set will be built. This installation set is checked by the quality assurance team. If everything is ok, the cws is nominated for integration. I think it will take at least 3 weeks to nominate cws swobjpos04. Then release engineering will integrate the cws into the master. I don't know exactly which milestone that will be. BTW, in my local environment of cws swobjpos04, document 'TRANS-JOUR.DOC' looks nearly the same in Microsoft Word and Writer - the positions of the page breaks differ by about one line.

OK, I'll check again in a month or so. On your local copy, does the text flow around Fig. 1 properly? On the July snapshot, the text runs behind the picture. (Interestingly, it's in the right spot on the page; looks like it's anchored relative to the page, but the text doesn't notice the image in the way!)
The Q concept was quite aggressive on its MS compatibility goals, and it looks like you folks are on your way to really delivering.

OD->dankegel: Yes, in my local copy the text wraps around figure 1

Reopened to assign to QA

OD->MRU: Checked in internal installation set of cws swobjpos04 - please verify.

set status back to FIXED

Checked fix in CWS swdrawpos04. Checked in 680m52. Closed.