Issue 1905 - Xcl Imp: Performance with large file
Summary: Xcl Imp: Performance with large file
Status: ACCEPTED
Alias: None
Product: Calc
Classification: Application
Component: ui (show other issues)
Version: 638
Hardware: PC Windows 2000
: P3 Trivial with 5 votes (vote)
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact:
URL: ftp://atari-source.com/pub/staroffice...
Keywords:
Depends on:
Blocks:
 
Reported: 2001-10-15 16:15 UTC by bryceman
Modified: 2013-08-07 15:12 UTC (History)
4 users (show)

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
Basic test for Large spreadsheet. It needs to have cells copied to work. (11.43 KB, application/vnd.oasis.opendocument.spreadsheet)
2005-11-04 15:40 UTC, mestech
no flags Details

Note You need to log in before you can comment on or make changes to this issue.
Description bryceman 2001-10-15 16:15:13 UTC
Ok, here is the issue posting for the thread I started, "Torture-testing 
OpenOffice" on the discuss mailing list.  The url link is a 20MB zip of the 
original excel spreadsheet.  At the end of this issue are urls for Win2k users 
showing the performance of SO/OOo and Excel in comparison opening the same csv 
file.  This is a complex issue and I hope that just posting my three main 
messages in that thread will serve as sufficient explanation.  However, there 
is a lot of text here, so I will try to summarize:

- SO/OOo will not successfully open large (~50MB) Excel Workbooks.
- SO/OOo takes an order of magnitude longer (3 min. vs. 10 sec.) than excel to 
open large, but reasonably sized data-only worksheets, even when converted to 
its native format.

Message 1:

My casual interest in OpenOffice was peaked when the StarOffice beta was 
released, so I thought I'd try some torture-testing.  ;-)  My boss at work has 
a huge spreadsheet that he's been using in Excel to analyze some test data.  
When I say huge, I mean 23,000 rows by 80 columns, with almost every column 
based on formulas.  It is a 50MB file in excel 2000 format, so it really is 
huge. Excel opens this file in about 10-15 seconds at 10-20% cpu, using 120MB 
of ram.  I tried OpenOffice with the 638c build, and it was ugly.  It took 
about 2 minutes at 100% cpu to get about 35% open, when I stopped it because, 
it was getting close to using 300MB of ram, sucking my 400MB of swap space 
completely dry.  Now I expect that a lot of this problem is overhead in 
converting Excel's twisted format, but I can't even get far enough to try the 
spreadsheet in OO's native format.  I don't really have the time (patience, 
really) to try again with my swap space increased drastically.  I like the idea 
and feel of OO, but if it fails a stress test this miserably, what am I to do?  
Any suggestions out there?

Message 2:

Well, I've tried some more stuff with the file this morning.  I increased my 
max paging file size to 1 GB and let OO run in the background at low priority 
so I could continue working.  OO has been running for about two hours now.  Its 
current status is completely unresponsive.  It took about 51 minutes of 
processor time, then dropped to 0-5% cpu usage.  I am assuming that this means 
that the actual conversion is complete.  I didn't check the memory usage very 
frequently during the conversion, the max I saw was ~230MB.  Since the cpu 
usage dropped, memory usage has varied from 20-100MB.  What is causing the 
varying memory usage, I have no idea.  Since the cpu usage dropped, I have left 
it as the foreground application for about 3 minutes with no response.  The 
task bar entry for it still reads "untitled," not the name of the converted 
excel spreadsheet.  FYI, I'm running Win2K SP2.  I've pretty much written off 
OO/SO as unsuitable for working in an MSOffice environment, but I still have 
hopes that an all OO/SO environment would be doable, maybe even pleasant.  ;-)  
The fact that OO/SO performs so horribly with a file that Excel handles well is 
simply irreconcilable.

Message 3:

I've done yet some more testing.  I exported the data-only csv from Excel and 
imported into OO.  It took 3 minutes to open, 3 minutes to save in OO format, 
and 3 minutes to open again in OO format.  Those times are cpu times as 
reported by the windows task manager, and compare to sub-10-second times for 
excel.  That is a huge margin.  I did a quick-and-very-dirty analysis of what 3 
minutes means as far as cycles per cell processed:
 
180 seconds * 900,000,000 cycles/second
-----------------------------------------------------------   =   ~84,000 
cycles (instructions) per cell.
24,000 rows * 80 columns
 
Now first of all, I freely admit that this is a rough analysis.  I'm assuming 
that my PIII 900 can actually get 1 instruction done per clock, which is very 
optimistic at best.  At first I thought that most of this time was because my 
system only(?!) has 256MB of ram, so I did some performance testing with the 
tool built into Win2k.  I looked at five different indicators over the time it 
took SO to load the csv and compared it to Excel.  If you're running Win2k, I 
think you'll be able to see the html files I've attached that use the 
performance activex control.  Otherwise, you should be able to look at the text 
and chart it in a spreadsheet app, like Calc for example.  ;-)  I was actually 
surprised by the uncharacteristically simple and understandable format MS used 
for the data.  Anyway, the graphs are telling for SO.  It briefly does a lot of 
paging while I'm assuming the file is read from disk, then does almost no 
paging for the remainder of the time, apparently spending almost all of its 
time on processing.  Please tell me why this much processing needs to be done 
on a data-only, text-based format?  I _really_ hope this is due to some bug, 
not inherent to the design of the filter or whatever is taking so long, because 
this really is unacceptable performance.  I have gotten the OK from my boss to 
post the files, I just need some place to put the ~20MB zip file so I can link 
the url in an issue.  If you guys have some space I could use, I'd be 
grateful.  I think we'd all like to see some more testing on this issue.

Attached files:
<a href="http://myhome.spu.edu/brycesho/SO_csv_open.htm">StarOffice Performance 
Graph</a>
<a href="http://myhome.spu.edu/brycesho/XL_csv_open.htm">Excel Performance 
Graph</a>
Comment 1 peter.junge 2001-10-15 16:44:49 UTC
Hi,
we know this problem but we still have Beta state. I think Daniel
knows best about the recent development of the Excel import.
Regards, Peter
Comment 2 daniel.rentz 2001-10-16 06:54:02 UTC
I'm currently working on performance enhancements and bug fixes for import/export of huge Excel files and will add your file to my tests.
But I think the main problem with this file is that it contains huge charts with >23000 data points. This is a known problem of our chart module and will hopefully be fixed in the planned new chart implementation (see http://graphics.openoffice.org/chart.html).
Btw: What are the links at the end of your comments? I cannot see anything there.
Comment 3 bryceman 2001-10-16 16:59:07 UTC
Reply to Comment 1:  The CSV issue is probably more frightening.  All 
that's happening is importing fixed data.  Granted, there's a lot of 
it, but 94 microseconds per cell is horrible.  That's an eternity in 
today's world of computing.

Reply to Comment 2:  It is most definitely not solely an issue with 
charting.  Please do not miscategorize this issue.  I've tried it 
with and without the charts in the file, and the results were the 
same.
Comment 4 daniel.rentz 2003-01-13 16:05:04 UTC
target->OOo2.0
Comment 5 thorsten.ziehm 2004-08-19 15:51:26 UTC
Because of limited resource for OOo2.0, it was decided to shift this tasks to
the next milestone. If somebody will be found, who can implement this until
OOo2.0, then this tasks will be re-targeted.
Comment 6 jimjams 2005-08-30 11:15:06 UTC
Just to let you know.
I have a spreadsheet with several chart on it, one of them is a simple EUR/USD
change tracking chart with about 2300 points.
Other data are not huge.
When I recalculate spreadsheet it takes about 45 seconds.
Removing the chart it calculate istantly.
So in my case the problem was the chart....
Comment 7 redndahead 2005-09-16 14:29:01 UTC
There should really be a look into this.  Here is a blog from a person from zdnet

http://blogs.zdnet.com/Ou/?p=102

This shows the test results opening his file.  I downloaded his file and
confirmed this.

Calc can't open the .xml file so I had to save it in .xls and then import it
into calc and save it as .ods.  I changed 1 column width and then hit save and
it took a considerable amount of time to save.
Comment 8 mestech 2005-10-28 18:46:57 UTC
Slow loading is a real problem, not just on Windows but also on Linux.  It make
autosaving even a greater issue.

To explain, this is being dicussed on the users mail list and I decided to run
my own tests.  I created a very large spreadsheet to test with.  The file sized
are in the 20M (ods).

20M Oct 27 12:13 Large_file_calc_test.ods
52M Oct 27 13:15 Large_file_calc_test.xls

Here are timing tests that I did. 

Problems.
Gnumeric doesn't support ODF spreadsheets.
KSpread says it will open ODF spreadsheets but crashes on every 
attempt to open the file.

        Gnumeric      OOo     Excel Native  Excel.ooo.export
Recalc      17.35   1:24.60     >1s           1:09.73
Save      2:21.48   4:32.48    17.65            17.73
Save as  14:35.15   2:40.98     7.64          2:37.22
Open        11.28   2:47.98    10.27          3:58.45

Also this topic came up on /. today.

 OpenOffice Bloated?
http://slashdot.org/article.pl?sid=05/10/27/1425232&tid=185

Note, in Linux.  A recent kernel upgrade sped things up to the above timings. 
It was much slower using the same spreadsheet a few days ago.  This at least an
improvement.

I understand that there is going to be a tradeoff in the converting from ODS to
the binary that is necessary to work but there must be a way of speeding this
process up.  The slashdot article posts two links that go into even further
issues of optimization problems.

If possible add Linux to the OS or I can create a new issue to cover Linux.
Comment 9 mestech 2005-11-04 15:31:49 UTC
I am going to attach a basic version of the spreadsheet I used.

This is with most of the rows deleted to minimize the size of the sheet.

All that needs to be done to recreate the spreadsheet is to copy row 7 all the
way down to row 16501.

Sheet 15 is complete as the last cell (F16501) is used for the calculation timing.
Comment 10 mestech 2005-11-04 15:40:53 UTC
Created attachment 31171 [details]
Basic test for Large spreadsheet.  It needs to have cells copied to work.
Comment 11 mestech 2005-11-04 15:46:01 UTC
My attachment actually requires that all the sheets need to have the cells
copied to all the rows down to 16501.  I had to modify it further due to the
size limit to be uploaded.

Comment 12 Ismael Fanlo 2010-03-20 12:54:25 UTC
I find this issue, to which exactly the problem I have in a company that I 
support. 

But my surprise is to see that the issue is in 2001, revived in 2005.

Now we are in 2010 and OpenOffice.org has its version 3.2. We like.
Someday it may solve this?

(sorry for my bad English, I'm Spanish speaker, and Google Translator is my 
friend)