Apache OpenOffice (AOO) Bugzilla – Issue 62295
very slow (37 minutes) to load small (77KB) .sxw file
Last modified: 2017-05-20 11:18:18 UTC
When I attempt to load the attached 77KB .sxw document into OOo 1.1.4 on a 1.5GHz P4 with XP SP2 and 256MB RAM using a non-administrator account, the load begins to slow down after the 12th gray progress indicator segment during "Loading Document..." status. It slows down even more after the 13th segment, and a lot more after the 15th. The CPU pegs at 100%, mostly soffice.exe. After approximately 37 minutes, the document is loaded successfully and OOo seems not to have any problems with it. I am able to open other, much larger (401KB, for example) documents in a matter of seconds. I am also able to save the file in seconds, although it takes longer than normal for a file of this size. If I save the file as Microsoft Word 97/2000/XP .doc, loading also hangs a bit but still completes in under 1 minute. If I re-save the .doc as .sxw then load again, the new .sxw file has no problems loading (although the formatting has changed a bit due to Word/OOo compatibility). Unable to upgrade to 2.x at this time. // slow slowly long time forever load loading half hour start startup text writer sxw native file document multiple tables font fonts symbol symbols gray screen window indicator hang hangs freeze freezes maximum pegged cpu percent
Created attachment 34291 [details] file exhibiting problem
Reassigned to ES.
ES->FLR: the content.xml is 11 (eleven) MB in size and consists mainly of nested "<text:span text:style-name="AMA3 ss - Blank Line">" tags.
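The observation above can be checked without starting the office suite. What follows is only a minimal sketch: it assumes the .sxw file is an ordinary ZIP package with a content.xml member, and the helper names (uncompressed_size, max_nesting_depth, make_demo_package) are hypothetical, not part of any OOo tooling. The demo package is synthetic data, not the attached bugdoc.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def uncompressed_size(package, member="content.xml"):
    """Return the uncompressed size in bytes of one member of a ZIP package."""
    with zipfile.ZipFile(package) as zf:
        return zf.getinfo(member).file_size


def max_nesting_depth(package, local_name="span", member="content.xml"):
    """Return the deepest nesting of elements whose local name is local_name."""
    depth = 0
    deepest = 0
    with zipfile.ZipFile(package) as zf, zf.open(member) as stream:
        # iterparse streams the XML, so the whole tree is never held at once.
        for event, elem in ET.iterparse(stream, events=("start", "end")):
            # ElementTree reports namespaced tags as "{uri}local".
            is_match = elem.tag.rsplit("}", 1)[-1] == local_name
            if event == "start" and is_match:
                depth += 1
                deepest = max(deepest, depth)
            elif event == "end":
                if is_match:
                    depth -= 1
                elem.clear()  # drop finished content to keep memory low
    return deepest


def make_demo_package(depth):
    """Build a tiny in-memory package with `depth` nested spans (synthetic data)."""
    xml = (
        '<doc xmlns:text="urn:example:text"><text:p>'
        + "<text:span>" * depth + "x" + "</text:span>" * depth
        + "</text:p></doc>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("content.xml", xml)
    return buf, len(xml.encode("utf-8"))
```

Both helpers also accept a file path, so they could be pointed at a saved copy of the attachment. A large gap between the package size and the uncompressed size of content.xml, together with a high span nesting depth, would match the description above.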
flr: It seems that the style "AMA3 ss - Blank Line" is applied multiple times to a single paragraph. This can only happen via the API. My question is: how was the document created originally? Was it a special kind of filter or an XSL(T) script?
As we don't know anything about the origin of the document we will stop working on this issue. The fact that resaving the document fixes the problem encourages me to think that this document is an artefact. Question to the submitter: can you please explain how this document was created?
The document was created by, and has only been processed using, the OpenOffice end-user GUI, never a filter or external program. I did a lot of copy-and-paste, though, so perhaps it's a bug related to cut-and-paste. I am now using OpenOffice 2.0.2 and the problem continues. I resaved in ODT format, and when I reload the ODT, the load still takes "forever". When I save then load as a DOC file using OpenOffice, the load is almost immediate.
77KB in 37 minutes? I think something is wrong.
While analyzing a performance problem on Aqua (i85798) I noticed the root cause of this problem: The loop in SwXTextRange::_CreateNewBookmark() doesn't scale well with the number of bookmarks. The algorithmic complexity could be about O(n), but it is actually about O(n^2)!
The function _CreateNewBookmark() iterates over all bookmark names to make sure that the to-be-inserted new bookmark has a unique name. This is so slow because Writer's bookmark array is not sorted by name, so all elements have to be checked. I can't see the O(n^2) because the iteration over the names never finds a match while loading the bugdoc. It would only find one if the sal_Int32 bookmark index overflowed or if bookmarks with the name SwXTextPosition + <index> were inserted by other means. That doesn't happen here.
->AMA: I checked what happens if the string compare iterations are completely removed. Then it takes only two minutes to load the doc. Changing the bookmark array of SwDoc to some hashing container could do miracles here. Target changed to 3.0. Reassigned.
Update to my comment above: The method SwXTextRange::_CreateNewBookmark() itself is "only" O(n^1). Adding n bookmarks using this method is O(n^2) though.
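To illustrate the complexity argument, here is a minimal Python sketch. It is not the Writer C++ code: it only assumes names of the form SwXTextPosition + <index> as mentioned above, and the function names and counters are hypothetical. The first function checks uniqueness by scanning an unsorted list, the second uses a hash set as a stand-in for "some hashing container".

```python
def add_bookmarks_linear(n, prefix="SwXTextPosition"):
    """Add n names, checking uniqueness by scanning an unsorted list.

    Returns the names and the number of string comparisons performed.
    """
    names = []
    comparisons = 0
    next_index = 0
    for _ in range(n):
        while True:
            candidate = f"{prefix}{next_index}"
            next_index += 1
            clash = False
            for existing in names:  # one full pass per check: O(len(names))
                comparisons += 1
                if existing == candidate:
                    clash = True
                    break
            if not clash:
                break
        names.append(candidate)
    return names, comparisons


def add_bookmarks_hashed(n, prefix="SwXTextPosition"):
    """Add n names, checking uniqueness with a hash set.

    Returns the names and the number of membership lookups performed.
    """
    names = set()
    lookups = 0
    next_index = 0
    for _ in range(n):
        while True:
            candidate = f"{prefix}{next_index}"
            next_index += 1
            lookups += 1
            if candidate not in names:  # average O(1) per check
                break
        names.add(candidate)
    return names, lookups
```

When no name ever clashes, as with the bugdoc, the list version performs 0 + 1 + ... + (n - 1) = n(n - 1)/2 comparisons for n insertions, while the set version performs n lookups. Each single insertion is linear in the list version; it is the sequence of n insertions that is quadratic.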
Set target OOo3.x
keyword: performance
Bjoern, please have a look at the bookmark performance problems.
retargeting 3.3
confirmed in OOO320m18 (released OOo 3.2.1)
error is in old binfilter code only, closing as WONTFIX. Please reopen if the performance is also as bad using modern file formats (e.g. odt).
sxw was confused with sdw - this is not a binfilter issue. Reopened.
starting work on cws swbookmarkfixes01 => STARTED
retargeting 3.4
I fixed the issue with the bookmark creation, however loading this doc is still awfully slow. I see a lot of hint array operations -- no wonder judging by the description of the content.xml by es. Given that the document is rather pathological I am tempted to close the issue as WONTFIX.
pls. reassign or close issues. Thx.
Reset assignee to the default "issues@openoffice.apache.org".