Issue 76606 - Performance problem - "pasting" 65000x20 table from Calc to Base takes hours on 2.4GHz Pentium4
Status: CLOSED FIXED
Alias: None
Product: Base
Classification: Application
Component: code
Version: OOo 2.2
Hardware: All All
Importance: P3 Trivial
Target Milestone: OOo 3.2
Assignee: marc.neumann
QA Contact: issues@dba
URL:
Keywords: performance
Depends on:
Blocks:
 
Reported: 2007-04-23 10:00 UTC by kpalagin
Modified: 2009-07-20 20:57 UTC
CC List: 3 users

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
Data as csv file (931.14 KB, application/x-compressed)
2007-04-27 07:08 UTC, drewjensen.inbox

Description kpalagin 2007-04-23 10:00:03 UTC
Subject says it all. Repro steps:
1. Select data in Calc, Edit - Copy.
2. Start Base, create a new database, click Tables, right-click - Paste.
3. In the Copy Table wizard, select "Definition and data" and "Create primary key", 
click Next, add all columns to the right, click Next, then Create.

I can supply the .ods file (2.7 MB, confidential) that contains the data needed 
to reproduce the problem. 
It takes 3 hours on a 2.4GHz Pentium 4 (1 GB of RAM) to import all of the data 
into the database. What makes the problem even worse is that Office is unusable 
for any other task during the import and the machine becomes sluggish (apparently 
some thread is running at high priority).
Comment 1 kpalagin 2007-04-23 10:19:52 UTC
P.S. Import is completely CPU-bound - I see 100% CPU utilization during import.
Comment 2 christoph.lukasiak 2007-04-24 16:42:11 UTC
Tested on test PC P1800 (512 MB RAM) => two crashes (one of them after more than
an hour)

clu->fs: what is an appropriate time for 65000 datasets?
Comment 3 drewjensen.inbox 2007-04-25 09:09:10 UTC
You might want to separate two things here - The clipboard and importing to base.

I did the following:

Created a Calc file with 65,515 rows of 21 columns. Mixed data: decimal, date,
strings.

Created a Base file that connected to this Calc file.

Created a new embedded database Base file.

Dragged the table Sheet1 from the first Base file to the second Base file.

The import is not speedy but it finished in just under 18 minutes. This is on an
HP 810n ( AMD 3300, 640 Meg Ram ), WinXP Sp2 and a good mix of other
applications running at the time. Firefox ( 4 windows, 5 tabs in one - gmail,
Issue tracker ( 2 tabs), OOoForum, local file ), 2 cmd windows, HSQLdb running
as server, MySQL 5 running, iTunes playing music from an internet feed. Total of
11 open documents ( 4 base files, 2 forms, 5 query data views ) in OOo.  The PC
was sluggish but I was still able to work in Firefox entering 2 posts at the
forum without too much discomfort at all...and the music never stopped playing :>)
Comment 4 drewjensen.inbox 2007-04-27 06:19:46 UTC
This deserves an update.

I performed the same steps tonight as I did the other night, bringing the Calc
data into an embedded Base database. My machine was running just about the same
mix as the other night,

but this time using 2.3m_210.

Time to import data dropped to 10 min 2 seconds.

Another BIG difference - system responsiveness. The CPU is still pegged during
the transfer BUT OOo is offering up clock cycles much, much more often now. I
could work with the other applications almost as if the process was not
screaming along. Someone put the word 'Cooperative' into Base on Windows it
seems :>)

It also appeared that the total memory used by the soffice.bin process was less
than on 2.2, but I will need to run a few more tests before I can say for sure.
Comment 5 drewjensen.inbox 2007-04-27 07:08:27 UTC
Created attachment 44723 [details]
Data as csv file
Comment 6 drewjensen.inbox 2007-04-27 07:09:41 UTC
I thought others might want to check the performance on their machines, so I
saved the spreadsheet to a csv file and zipped it for attachment here.

Comment 7 kpalagin 2007-04-29 07:07:57 UTC
I have tried m210 and it is roughly the same.
I have discovered that if I do not enable primary key creation, the import 
finishes almost 4 times faster:
2.2 no PK   - 1m40s
2.2 with PK - 6m
(competitive analysis - Office 2003 takes 45s)

atjensen,
please make sure that your timings include PK creation.
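The "almost 4 times faster" figure in Comment 7 can be checked directly from the reported timings. A minimal Python sketch of that arithmetic, using only the numbers given in that comment (the dictionary keys are illustrative labels, not anything from OOo):

```python
# Import times reported in Comment 7 (OOo 2.2, Calc -> Base paste), in seconds.
# The key names are illustrative labels only.
timings_s = {
    "no_pk": 1 * 60 + 40,  # 1m40s without "Create primary key"
    "with_pk": 6 * 60,     # 6m with "Create primary key" enabled
    "office_2003": 45,     # competitive comparison
}

# How much slower the import gets when a primary key is created.
pk_slowdown = timings_s["with_pk"] / timings_s["no_pk"]

# How the fastest OOo 2.2 run (no PK) compares with Office 2003.
vs_office_2003 = timings_s["no_pk"] / timings_s["office_2003"]

print(f"Enabling PK creation: {pk_slowdown:.1f}x slower")
print(f"OOo 2.2 without PK vs. Office 2003: {vs_office_2003:.1f}x slower")
```

Run as-is, this shows a 3.6x slowdown from primary key creation (hence "almost 4 times") and about 2.2x between the no-PK OOo 2.2 run and Office 2003.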
Comment 8 drewjensen.inbox 2007-04-30 06:25:25 UTC
Yes, all my test runs had included creating a primary key field.

Looking at your times does seem to show that removing the system clipboard from
the equation makes quite a difference. But it is hardly the only factor to be
considered.

Here is another test run - on the same machine, but with a new software
configuration. There are only two differences: first, I changed OOo to use JRE 1.5.0_11
instead of the 1.6.0 used the other night; second, in getting ready for some contract
work, I shut down the MySQL server and fired up an Oracle 10g server on this machine.

I performed the same record import test as before using 2.3 M_210, with no
primary key being added. For this run my time was horrendous: 54 Minutes 20 Seconds.

OK - how many people are really going to have an Oracle server running on their
workstation - not many. So we can probably just toss this number out, but to be
sure: 

Next I left Oracle running and changed OOo to use JRE 1.6 again. Started the
transfer again. ( Just to be clear each time the HSQL embedded database is empty
) - I killed the process after 20 minutes.

So I shut down every Oracle service (server, recovery, agent, listener).
I left OOo using JRE 1.6 and I performed the transfer one more time, again no
primary key. Run time: 4 minutes 06 seconds

Finally - for the last test I changed OOo back to use JRE 1.5.0_11. Run time 5
Minutes 28 Seconds.

All the test runs were with 2.3m_210.

Comment 9 kpalagin 2007-05-01 18:30:06 UTC
I have done some more testing using 
http://www.openoffice.org/nonav/issues/showattachment.cgi/44736/Praha4.zip 
(make sure you set text delimiter empty, otherwise file would not import 
correctly) and version m210:
1. Using a different version of the JRE (1.5.11 vs 1.6.0) did not make a second of 
difference on WinXP.
2. Pasting that data on different OSes:
Suse 10.2 - 3 minutes
WinXP - 5m 50s
Vista - 35 minutes (yes, 35 minutes is correct).

Time spent in kernel (as seen in Task Manager) on Windows is around 65% on XP 
or around 90% on Vista.
Comment 10 drewjensen.inbox 2007-05-01 20:00:27 UTC
Another competitive analysis

Using the data in the Praha 4.csv file Kpalagin posted to populate a Calc sheet
in OOo, and then doing a copy and paste of the 13,000 rows into Kexi 2007 for Windows
> 9 SECONDS. That is with adding a primary key field - HOWEVER, the date fields
are converted to text. Importing into Kexi directly from the CSV file took
longer - 11 seconds.
Comment 11 Frank Schönheit 2009-02-18 12:59:15 UTC
fs->oj: One for your performance list ...
Comment 12 ocke.janssen 2009-02-25 11:13:23 UTC
Fixed in cws dbaperf1.

Have a look at the wiki page
http://wiki.services.openoffice.org/wiki/Base/Performance#Row_Fetching
Comment 13 ocke.janssen 2009-03-06 13:56:44 UTC
Please verify. Thanks.
Comment 14 marc.neumann 2009-04-20 08:17:49 UTC
verified in CWS dbaperf

Find more information about this CWS, such as when it will be available in the
master builds, in EIS, the Environment Information System:
http://eis.services.openoffice.org/EIS2/cws.ShowCWS?Path=DEV300%2Fdbaperf
Comment 15 thorsten.ziehm 2009-07-20 14:51:48 UTC
This issue is closed automatically and wasn't rechecked in a current version of
OOo. The fix should have been integrated in OOo for more than half a year. If
you think this issue isn't fixed in a current version (OOo 3.1), please reopen
it and change the field 'Target Milestone' accordingly.

If you want to download a current version of OOo =>
http://download.openoffice.org/index.html
If you want to know more about the handling of fixed/verified issues =>
http://wiki.services.openoffice.org/wiki/Handle_fixed_verified_issues
Comment 16 thorsten.ziehm 2009-07-20 15:34:54 UTC
Sorry, this issue was wrongly closed. It will be reopened automatically
and then set back to fixed/verified.
Comment 17 thorsten.ziehm 2009-07-20 15:39:26 UTC
Set to state 'fixed'.
Comment 18 thorsten.ziehm 2009-07-20 15:43:51 UTC
Set back to state 'verified/fixed'.

Again, sorry for the flood of mails.
Comment 19 kpalagin 2009-07-20 20:57:39 UTC
Using m52 on a P4 2.12GHz, jan29_2006-feb4_2006cf.zip gets copied a hundred times 
faster.
Thanks a lot!
Closing.