Apache OpenOffice (AOO) Bugzilla – Issue 2800
Loading .sxc file takes ~10x longer than .xls file.
Last modified: 2013-08-07 15:14:58 UTC
I have a large complex spreadsheet that am I am converting from Excel to OOo (using v641C under linux). I can import this document in its original .xls format into OOo in approx. 30s. After saving the document in the native .sxc format, the file now requires >300s to load. I have tracked the problem down to the many repeated "if" statements on on sheet of the original document. Thus, I have been able to make a simple demonstration file for this problem. To reproduce: Simply create an Excel file with 256x256 block of "if" statements (or use my oneif_256x256.xls file). Now, open that document in OOo and save it in .sxc format (or use my oneif_256x256.sxc file). Close OOo. Start OOo fresh, and open the .sxc format file. Load time will be approx. 10x that of opening the .xls file. Note: this is *not* a Excel conversion issue. You can also create the document fresh in OOo and the .sxc file will still require a very long time to open. I mention the "10x load time to the .xls file" to point out that this must be an issue with the reading of the .sxc file format itself, and not some OOo internal issue.
Created attachment 919 [details] Excel 256x256 block of "if statements" example
Created attachment 920 [details] Excel 256x256 block of "if statements" example
Created attachment 921 [details] 256x256 block of "if statements" in sxc format
oops, send the Excel file twice. I know the second Excel attachment is okay, I'm not sure about the first one.
Hi Eike, it seems to be yours. Loading the sxc by SAB takes about 1:00 compiling the formulas takes about 1:43 and calculating the cellheights 0:05. Frank
gnumeric has a (somewhat) similar problem with file like this, which has supposedly been solved recently by Jody: http://bugzilla.gnome.org/show_bug.cgi?id=68723
Gee, this is something that could benefit from "shared formulas", reusable formula code. Also the "all formulas are listening to one and the same cell" mentioned in gnumeric could be an issue here in compile time.
I've just confirmed that this problem still exists in OOo v1.0.0. Load time for the .xls file is ~7s while the .sxc file is ~55s on my 600MHz laptop running RH Linux v7.2.
i just confirmed this problem still exists in v1.0.1. Load time of the .xls file is ~8s and of the .sxc file is ~65s on a 600MHz linux machine.
First of all: reading in an XML .sxc file is always slower than any binary file format, no matter if Excel .xls or SO5 .sdc, since we have to compile and interpret all formulas instead of reading in readily compiled token arrays and results. I optimized the process at various places, and the attached example file on my machine now loads in 18s instead of the 29s it previously needed. This is far from optimal, but at the moment it couldn't be much improved without reimplementation of some core parts, if at all. Note also that the IF() is not the problem, same behavior if you exchange all IF() with SUM(), for example. However, I did the changes on the SRX644 build line, no backport to OOO_STABLE_1 planned, therefor I set the issue to resolved later.
back-reference to iBIS task: #99677#
Hmmmm, I posted a comment a few minutes ago and it didn't register. Weird, I'll try again. ER- Thanks, I'll take a 40% improvement in load times anyday! You mention that XML format files will always be slower than binary file formats, and that makes perfect sense to me. But, then I wonder how gnumeric (which uses a XML file format) is able to load this same file (once converted to gnumeric format) in 3s? Maybe there is some interesting stuff in the changes Jody made which would be of interest of OOo? http://bugzilla.gnome.org/show_bug.cgi?id=68723 Regardless, thanks again Eike. -Jon
Without having investigated it any further I'd say that the file converted to Gnumeric format uses shared formula expressions, where a range of identical formulas is represented by one formula expression referenced by all subsequent cells of that range, so parsing has to be done only once. This is something not implemented yet in OOo Calc.
Okay, "shared formula expressions" sound like a good idea. Should we keep this bug open to track that new feature? My original document (the one that takes 300s to open) is one of many, many spreadsheets the financial people at my company use. Even with this improvement in load time you have done, I'm sure they will not switch to OOo if i takes them (300s minus ~40%) to load their documents. My point here is that we are not talking about waiting 18s in these real-world cases, but rather, many minutes. PS: Just to give you an idea: These spreadsheet they use have 365 columns, one for each day of the year, with a row for each item they are tracking, sometimes there are thousands of items. Each cell has many "if" statements, sometimes in ugly nested formats. Ugly? Yes. But, these are financial people, there is no talking to them about elegant spreadsheet design. :)
Have to reopen to reassign, rediculous.
reassign to QA
restore status
For what it is worth, the .sxc file loads ~25% faster in OOo 1.1b2 as compared to OOo 1.02. Great job! It still takes 35s on my home machine to load the .sxc, versus 9s to load the .xls file... but now these numbers are within the limit of being tolerable.
closed because remaining problem depends on implementation of shared formulars which is still planned for the near future. (see also PCD issue 20489)