Issue 125179 - Opening complex docx document takes several minutes (but succeeds)
Summary: Opening complex docx document takes several minutes (but succeeds)
Status: CLOSED FIXED
Alias: None
Product: Writer
Classification: Application
Component: open-import
Version: 4.1.0
Hardware: PC All
Importance: P3 Critical
Target Milestone: 4.1.1
Assignee: Oliver-Rainer Wittmann
QA Contact:
URL:
Keywords: performance, regression
Depends on:
Blocks:
 
Reported: 2014-06-29 22:29 UTC by Tim Baigent
Modified: 2017-05-20 10:35 UTC

See Also:
Issue Type: DEFECT
Latest Confirmation in: 4.2.0-dev
Developer Difficulty: ---


Attachments
F# 3.1 draft language specification (919.39 KB, application/octet-stream)
2014-06-29 22:32 UTC, Tim Baigent

Description Tim Baigent 2014-06-29 22:29:26 UTC
Opening the attached docx file (the current F# draft language specification) initially appears to hang, with the soffice.bin process using 100% of one processor core.

It eventually succeeds (taking just short of 4 minutes on my Core i7 system, loading the document from SSD, with 32GB RAM).

If Writer was launched by double-clicking the file then no GUI becomes visible (and if Writer was already open then it remains unresponsive) until this process is complete.

The conversion result seems faultless. There is no problem saving to *.odt or *.doc format, and printing also works fine.

The document has 301 pages (in original format) and many tables.

Is a delay of this order to be expected? If so, some sort of UI (perhaps with a progress bar) would be helpful.
Comment 1 Tim Baigent 2014-06-29 22:32:31 UTC
Created attachment 83623
F# 3.1 draft language specification

Source: http://fsharp.org/specs/language-spec/
Comment 2 Ariel Constenla-Haile 2014-06-29 22:48:44 UTC
OK (though a little slow) with 3.4.1 and 4.0.1
Regression in 4.1.0 and nightly build (AOO420m1(Build:9800)- Rev. 1605918, 2014-06-27_04:11:21 - Rev. 1605944)

@Tim: if this is a blocker for your daily work, try reverting to previous version 4.0.1
Comment 3 Ariel Constenla-Haile 2014-06-29 22:50:46 UTC
(In reply to Tim Baigent from comment #0)
> Is a delay of this order to be expected? If so, some sort of UI (perhaps
> with a progress bar) would be helpful.

The lack of a progress bar seems to be another bug in the filter, because the document in Issue 125055 shows a progress bar (it is loaded with a different filter).
Comment 4 Tim Baigent 2014-06-29 23:23:37 UTC
@Ariel,

Thanks for giving your attention to this. (Any faster and you would have responded before I submitted the report!)

It's not a blocker for me personally since I can usually avoid docx.

Thanks again.
Comment 5 Oliver-Rainer Wittmann 2014-06-30 07:55:47 UTC
@Ariel:
Can you figure out, if this performance decrease has the same root cause as issue 125055?
Comment 6 Oliver-Rainer Wittmann 2014-06-30 07:57:18 UTC
@Tim:
Could you try the recent developer snapshot build that was created for early testing of our planned 4.1.1 release? You can find the builds at https://cwiki.apache.org/confluence/display/OOOUSERS/Development+Snapshot+Builds
Comment 7 Tim Baigent 2014-07-01 00:03:18 UTC
@Oliver,

With 4.1.1 M1 I get a slightly different result:

Opening the file by double-clicking it in Windows Explorer, I now immediately get an OpenOffice splash screen, with a wait cursor.

Otherwise the same. Process soffice.bin uses one full core, and the OpenOffice GUI with the converted document appears at 3 min 50 s.

Tim
Comment 8 Ariel Constenla-Haile 2014-07-01 05:50:36 UTC
(In reply to Oliver-Rainer Wittmann from comment #5)
> @Ariel:
> Can you figure out, if this performance decrease has the same root cause as
> issue 125055?

It is reproducible with 

AOO411m1(Build:9770)  -  Rev. 1603804
2014-06-19 10:08 - Linux x86_64

AOO420m1(Build:9800)- Rev. 1605918
2014-06-27_04:11:21 - Rev. 1605944

so the fix for issue 125055 doesn't seem to solve this one.
Comment 9 Kay 2014-07-03 21:22:58 UTC
With AOO411m1(Build:9770)  -  Rev. 1604099
2014-06-30_07:13:14-Rev.1606633

on Linux-32, and 4GB RAM, I gave up after 4 mins -- no document appeared, and AOO was hung up basically.
Comment 10 Armin Le Grand 2014-07-04 14:07:47 UTC
Took a look using a trunk debug version and VerySleepy. I do not have enough expertise in Writer, but wanted to check for a possibly easy-to-find bottleneck. Findings:

- SwXBookmark::attach takes a lot of time (deactivated to get over it)
- SwTable::CheckConsistency() takes a lot of time, maybe should only be called after importing (?)
- Only in trunk: SwFmt::GetBackground is used to decide whether to call SetCompletePaint(); this could be avoided by checking first whether it is already set
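
The last two findings point at two general patterns: running an expensive whole-structure check once after a bulk import rather than after every step, and testing a cheap flag before doing a costly lookup. Below is a minimal Python sketch of both ideas. It is only an illustration: `Table`, `Frame`, and every method name are hypothetical stand-ins, not the real `SwTable` or `SwFmt` code, and whether the check can safely be deferred in Writer is exactly the open question raised above.

```python
class Table:
    """Toy stand-in for a table model (hypothetical, not SwTable)."""

    def __init__(self):
        self.rows = []
        self.checks_run = 0

    def check_consistency(self):
        # Pretend this is expensive: it walks every row of the table.
        self.checks_run += 1
        width = len(self.rows[0]) if self.rows else 0
        return all(len(row) == width for row in self.rows)

    def add_row(self, row, check=True):
        self.rows.append(list(row))
        if check:
            return self.check_consistency()
        return True


def import_rows_checked_each_time(rows):
    """Run the full check after every inserted row."""
    table = Table()
    for row in rows:
        table.add_row(row)
    return table


def import_rows_checked_once(rows):
    """Defer the full check until the whole import is done."""
    table = Table()
    for row in rows:
        table.add_row(row, check=False)
    table.check_consistency()
    return table


class Frame:
    """Toy stand-in for a layout frame (hypothetical)."""

    def __init__(self):
        self.complete_paint = False
        self.background_lookups = 0

    def has_background(self):
        # Pretend this attribute lookup is costly.
        self.background_lookups += 1
        return True

    def mark_for_repaint(self):
        # Cheap flag test first; skip the lookup if the flag is already set.
        if self.complete_paint:
            return
        if self.has_background():
            self.complete_paint = True
```

With three rows, the first import variant runs the check three times and the second runs it once; calling `mark_for_repaint()` twice on one `Frame` performs a single background lookup.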
Comment 11 Armin Le Grand 2014-07-04 14:09:43 UTC
One more: After repagination a huge number of assertions at every action from SwTxtINetFmt::GetCharFmt() "<SwTxtINetFmt::GetCharFmt()> - missing character format at hyperlink attribute". Looks like something that should be corrected at import time (?)
Comment 12 Oliver-Rainer Wittmann 2014-07-07 11:41:06 UTC
taking over to have a closer look.
Comment 13 Oliver-Rainer Wittmann 2014-07-07 11:50:34 UTC
I have already figured out that the *.docx import of overlapping bookmarks is broken since AOO 4.1.0 - see issue 125215

For the in-place editing of Input Fields some new data structures are introduced at the mark manager which also cause a certain performance decrease.
Comment 14 Oliver-Rainer Wittmann 2014-07-08 12:23:38 UTC
(In reply to Oliver-Rainer Wittmann from comment #13)
> I have already figured out that the *.docx import of overlapping bookmarks
> is broken since AOO 4.1.0 - see issue 125215
> 
> For the in-place editing of Input Fields some new data structures are
> introduced at the mark manager which also cause a certain performance
> decrease.

Solutions to these two observations - provided with issue 125215 - solved the observed performance decrease.
Comment 15 Tim Baigent 2014-07-08 13:22:43 UTC
Great.

What was the problem? (Verbose details very welcome.)

In which builds is it fixed?
Comment 16 Oliver-Rainer Wittmann 2014-07-08 13:47:52 UTC
(In reply to Tim Baigent from comment #15)
> Great.
> 
> What was the problem? (Verbose details very welcome.)
> 

I think three issues are causing the performance decrease:
(a) The one which has been already fixed with issue 125055. The sorting of all mark containers was triggered each time text is inserted into a paragraph when a mark ends at the insertion position. This is needed to fix issue 124338. But the sorting is only needed, if another mark starts at the insertion position.

(b) The import of bookmarks of *.docx documents was completely broken. This caused the insertion of hundreds of wrong bookmarks without a corresponding bookmark name. The method to find a unique name is not efficient, but it does not have to be, because it should not be triggered that often.

(c) An additional mark container had been introduced for the enhancement of annotations/comments on text ranges. But the costs of this container do not justify its benefits. It was only used in two use cases, which could also work with existing containers.
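
As a rough illustration of point (a), the following Python sketch compares the two triggering rules for re-sorting. It is a toy model and not the actual mark manager code: `Mark`, `MarkContainer`, and the position-shifting rule are assumptions made for the example, and the sketch does not model why the order can change in Writer. It only counts how often each rule asks for a sort.

```python
from dataclasses import dataclass


@dataclass
class Mark:
    name: str
    start: int
    end: int


class MarkContainer:
    """Toy mark container (hypothetical) that counts its re-sorts."""

    def __init__(self, marks, sort_only_if_needed):
        self.marks = sorted(marks, key=lambda m: (m.start, m.end))
        self.sort_only_if_needed = sort_only_if_needed
        self.sort_count = 0

    def insert_text(self, pos, length):
        # Record which marks touch the insertion position before shifting.
        ends_here = any(m.end == pos for m in self.marks)
        starts_here = any(m.start == pos for m in self.marks)

        # Assumed toy rule: a mark starting at pos moves behind the new
        # text, a mark ending at pos does not grow.
        for m in self.marks:
            if m.start >= pos:
                m.start += length
            if m.end > pos:
                m.end += length

        if self.sort_only_if_needed:
            # Narrower rule: sort only if another mark also starts here.
            must_sort = ends_here and starts_here
        else:
            # Broader rule: sort whenever a mark ends at the position.
            must_sort = ends_here

        if must_sort:
            self.marks.sort(key=lambda m: (m.start, m.end))
            self.sort_count += 1


def make_marks():
    return [Mark("a", 0, 5), Mark("b", 5, 9), Mark("c", 12, 15)]
```

For example, inserting one character at position 5 (where `a` ends and `b` starts) triggers a sort under both rules, while a later insertion at the end of `c`, where no other mark starts, triggers a sort only under the broader rule.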

> In which builds is it fixed?

on trunk: the next build bot builds which work on trunk should include the fixes.

for planned 4.1.1: the fixes will not be included in the announced milestone 2, but the next milestone will contain them.
Comment 17 Tim Baigent 2014-07-09 10:36:26 UTC
(In reply to Oliver-Rainer Wittmann from comment #16)

Thanks for satisfying my curiosity. It's very interesting to get some feel for how these things are tackled. I've been very impressed with the process.