Apache OpenOffice (AOO) Bugzilla – Issue 19984
severe performance problem inserting database rows into calc via DnD
Last modified: 2006-05-31 14:29:06 UTC
Hi, I've been doing some testing of the database access code in OO.o 1.1rc4, and I've run into a few issues. This one relates to the insertion of rows into Calc from a MySQL database.

I have a relatively small database here (12,000 records) that I'm using for testing. If I drag & drop 10 rows from the "beginning" of the database (low-numbered records), the operation completes almost instantly. If, however, I drag & drop 10 rows from the "end" (high-numbered, more recent records), it takes up to a minute to complete. Note that each record is quite small: a few date fields, a couple of short text and char fields, and a short description (varchar(255) from memory). I'm using the database over a 10/100 LAN, with MySQL Connector-J and J2SDK/JRE 1.3.1. I will do more testing later (JRE 1.4, local socket, and ODBC).

Essentially, it looks like something is 'scanning' down the list of records before grabbing the ones needed. What, how, or why, I'm not clear on, but it will cause MAJOR performance headaches for decent-sized databases. Unless I'm forgetting something really critical, this just shouldn't be required to grab a given set of records, since the primary keys are already known. The number of records I import at a time (1 or 100) doesn't seem to affect the time taken much.

OpenOffice uses a LOT of CPU time during this process, becomes graphically unresponsive or almost so, and provides no progress meter; it appears to have crashed. Network activity is very heavy, and it's hard to tell if it's maxing the link (I'm on gigabit here, and the DB server is on 10/100, so it's hard to judge using gkrellm etc.).

I'm attaching a pair of packet traces, both in libpcap format. The first captures the network activity caused by a drag & drop of records at the "beginning" of the DB. The second is a drag & drop from the end of the DB; warning, it's pretty huge. A SELECT for a record toward the "end" of the DB using the MySQL command line client completes almost instantly.
To reproduce:
- Locate a medium-sized sample MySQL database of perhaps ten thousand records.
- Create a new spreadsheet in OO.o.
- Create a data source using MySQL Connector-J (ODBC untested as yet).
- Hit "F4" to open the data source explorer.
- Drill down to your chosen table in the database.
- Select a record near the beginning by clicking on the blank grey square to the left of the first field in the record.
- Drag it to the spreadsheet; it should appear almost immediately.
- Use the >| nav control to move to displaying the "end" of the database.
- Select a record from there and drag it to the spreadsheet; it should take at least 30 seconds to complete, at least over a LAN (local speed untested).

Expected behaviour:
- Data insertion speed is equal no matter what "position" in the database the records come from, and scales proportionally to the number of records inserted.

Note: the system OO.o is running on is a dual Xeon with 2 GB of RAM. OO.o appeared to max the CPU (it's a little hard to tell with 4 virtual CPUs). According to graphing done in Ethereal, network traffic was below one megabit for the SQL session, and the link was not maxed. The database server for this test is an AMD Athlon 1800+ with 512 MB of RAM. The Xeon runs RH8, the Athlon Debian 3.0. MySQL is version 3.23.54 on the server side and 3.23.58 on the client side.

More testing is needed, most importantly with:
- MySQL on a local machine using Connector-J
- MySQL on a remote machine using ODBC access
- MySQL on a remote machine using a newer JRE

I can do the first two at some stage, and will when I get some time. The other one I don't expect to make a difference, and I won't be able to test it, unfortunately.

It's worth noting that I'm doing this testing to see if it's now feasible to use OO.o to make user access to a 100,000-record database of newspaper stories easier. This is a reasonably sized database, but it would be totally unusable at present.
Created attachment 9634 [details] packet trace (libpcap) from insertion of low-numbered records
Created attachment 9636 [details] similar trace, of insertion of records with high primary key value
Interestingly, "seeking" to the end of the DB using a newly created form takes no longer than doing so in the data source explorer. This may be a Calc-specific problem.

Additionally, on closer examination of the packet traces, I noticed that the same record is being requested multiple times, which strikes me as VERY strange. This did not happen in the first trace, which was the first insert operation, but began happening right at record 1 of the second trace. The order of records SELECTed is: 1 2 3 2 3 4 3 4 5 4 5 6 5 6 7 6 7 8 (and the pattern appears to continue). So we're getting each record 3 times, and we don't even need to be getting them in the first place. Each response appears to contain the full reply I'd expect.
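The repeated SELECTs described above look like a three-row window that is re-requested in full each time the cursor advances by one row. A minimal sketch of that hypothesis (plain Python, not OO.o code; the window size of 3 is inferred from the trace):

```python
def windowed_fetch_sequence(target_row, window=3):
    """Simulate a cursor that re-fetches its whole window each time it
    advances one row, returning the record numbers requested in order."""
    requests = []
    for start in range(1, target_row + 1):
        requests.extend(range(start, start + window))
    return requests

# First requests match the observed trace: 1 2 3 2 3 4 3 4 5 4 5 6 ...
print(windowed_fetch_sequence(4))
```

Under this model, reaching record n generates roughly 3n requests, which would account for both the heavy network traffic and the triple fetch of each record.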
Still present in 1.1RC5, tested with same setup as before.
Correcting sub-component (see http://www.openoffice.org/issues/describecomponents.cgi?component=Database%20access, please) and default owner.
Craig, without investigating this at the moment, your first few sentences suggest to me that you may also have run into issue 15279.
Thanks; you're quite right. I had a look through IssueZilla to find similar issues before submitting this one, but didn't manage to turn up that bug; clearly I didn't look hard enough. This indicates that the problem occurs with two entirely different databases (MySQL, "Pervasive"), two different access methods (ODBC, MySQL Connector-J), on two entirely different host setups and even OSes. I guess it helps confirm that the issue is in OO.o and not, say, a Connector quirk. I'll post a reference to this bug at bug 15279; then this one can be closed as a dup. The packet traces etc. here will still be available for additional examination after bug closure, I presume? Just to clarify, the packet traces I've attached are suitable for use with Ethereal or tcpdump - any tool that supports libpcap format, really.
Hmm... I just did one more test, and turned up something very interesting. If I follow (essentially) the same procedure to drop data into Writer, the insert is essentially instant, no matter where in the database the record(s) come from. It doesn't matter what format (table or text) you use to insert the data. Perhaps this is a Calc bug rather than a DBA bug?
Set target, confirm, and send to the right developer.

Short description:
- create a table with ~2000 records
- display the table in the datasource browser
- D&D the first 4 records to a Calc document => that's fast (< 1 s)
- go to record ~1974
- D&D 4 records to the Calc document => this needs much more time than D&D of records from the beginning of the table (~5 s)

We can't reproduce runtimes of about 30 minutes; however, something goes wrong here.
s/minutes/seconds/ WRT runtime. I don't doubt you could manage a 30-minute runtime, but you'd have to have a database with very large records and/or something like a million records. At least, that's if time scales proportionally to record number, as I suspect it will after looking at the packet traces. It takes 30 seconds or so with a DB of 10k records on a _fast_ machine, at least in my tests. 5 seconds seems well in line with what I'd expect from grabbing record #2000 or so, as you're doing. Of course, it'll vary a bit depending on host speed, LAN speed, etc., but your results sound like they match mine pretty closely.

What gets me is why it starts retrieving records from record #1 in the first place, rather than just quietly fetching the required ones. Weird.

Anyway, I thought this ended up looking like a duplicate of bug 15279, as per the above discussion. If you agree, then perhaps it's best to close this as DUPLICATE and keep discussion on the initial bug. So long as the packet traces and such get retained when the bug is marked as a dup, that strikes me as the way to go.
Craig, I don't think this is a duplicate of bug 15279. 15279, if I understand it right, talks about refreshing an already existing data range, while this bug here talks about creating a new one with DnD. Different things happen in the two situations, so I really think we should keep them separate. The fact that the problem does not occur in Writer does indeed suggest that it's a Calc problem - which doesn't mean that there's no room for improvement on our side :).
fs->oj: I *suppose* that Calc is not using the RowSet which we provide in the DnD data, but creates a new one, and there travels to the n-th record. This, of course, will take longer the larger the record number is, and would explain the observations.
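If that supposition is right, the cost comes from emulating absolute positioning on a cursor that can only step forward, so reaching row n requires n sequential fetches. A minimal sketch of the effect (plain Python, not the actual Calc/SDBC code; class and method names are illustrative):

```python
class ForwardOnlyCursor:
    """Model of a cursor that can only advance one row at a time,
    counting how many row fetches each operation costs."""
    def __init__(self, rows):
        self.rows = rows
        self.pos = 0        # number of rows already consumed
        self.fetches = 0    # total fetch count

    def next(self):
        if self.pos >= len(self.rows):
            return None
        row = self.rows[self.pos]
        self.pos += 1
        self.fetches += 1
        return row

    def absolute(self, n):
        # Absolute positioning emulated by stepping from the start:
        # O(n) fetches rather than O(1).
        row = None
        while self.pos < n:
            row = self.next()
        return row

cursor = ForwardOnlyCursor(list(range(1, 10_001)))
cursor.absolute(9_990)   # reaching a high-numbered record...
print(cursor.fetches)    # ...costs thousands of fetches, not one
```

This would make insert time grow linearly with the record number, matching the observed behaviour, while a keyed SELECT (as from the MySQL command-line client) stays constant-time.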
Yeah, I get you - I totally missed the "(data->refresh range)" comment in bug 15279. The symptoms do sound very similar, however.
I've just been able to confirm this issue with an entirely different database. This time, I tested with the Progress SQL-92 database via the progress JDBC driver. The results appear the same, though the tcpdump can't be usefully interpreted (being native progress traffic). As with the other DB, d&d into writer is almost instant.
I'm inclined to agree this is an OO.org problem, and specifically something in Calc. I'm suffering it with the DataPilot; it's so slow the DataPilot is useless.

For comparison, on a K6-2 400 (64 MB SunPCI card, Win98), using MS ODBC and MyODBC 3.51: this is a small table compared to the final desired size, 32,900 rows with only 10 columns. Creating a simple pivot table in Excel97 takes ~1 minute, probably due to the low memory/speed of the client; mysqld never takes much CPU. Trying the same with OO.org 1.1 and the DataPilot is horrible. Just getting to the Layout stage takes 20 minutes. After laying out the 5 columns I need, it takes 50 minutes for the table to appear in Calc. Between selecting the table and the layout, or after hitting OK for the layout, OO.org doesn't refresh and gives no indication of progress.

I have also tried this with unixODBC and MyODBC on an Ultra10 with Solaris 8, and on the RHL-9 system running MySQL. In both cases OO.org takes well over an hour to create this simple table. SQL queries from isql (ODBC) or OO.org are nearly instant. Added to the speed difference with Excel97 (same machine, same drivers, same task), it appears that the DataPilot/Calc database interface needs serious work.
Priority should be P1; databases and Calc are unusable like this. Changing OS to ALL: I've seen it on Win98, Solaris, and Linux. It's still in OO.org 1.1 final.
This is not a crash, thus removing the dependency on issue 21786. This is not an MS interop problem, thus removing the dependency on issue 21783.
Removing dependency on issue 21783 (which is the MS interop tracking bug) - I don't think this is an interop issue. If somebody re-introduces the dependency, please justify it next time.
Changing subcomponent to 'none'.
*** Issue 21305 has been marked as a duplicate of this issue. ***
Fixed in CWS insight01.
Verified in CWS insight01. However, there is still an issue about D&D (issue 30306).
.