Apache OpenOffice (AOO) Bugzilla – Issue 22761
(still) Terrible performance on kernels 2.6.x while any CPU-hog running
Last modified: 2013-08-07 14:44:24 UTC
This problem with OO was supposedly fixed several releases ago, and it seemed to be due to some unusual usage of Linux's sched_yield() in OO. I am now trying OO 1.1.0-RC5, and even 680m13, on Linux kernel 2.6.0-test9-mm5, and the problem is still there. It is very simple to trigger:
a) Start any CPU hog on the box; a simple "yes" works well for this purpose.
b) Try to use OpenOffice. For example, try to save a modified file: it takes ages to finish, because the OpenOffice process hardly gets any CPU time to complete the task.
I have been told that some vendors package OpenOffice with a fix for the cause of this problem, but it seems the original versions from www.openoffice.org are still missing this fix. Without it, OO can be virtually unusable on boxes running 2.6.x kernels with some load in the background.
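The two steps above can be scripted so that timings are comparable between runs. The following is only a sketch in Python: the function name time_under_load is hypothetical, it assumes a "yes" binary is on the PATH, and the task callable stands in for whatever action is being measured (for example, driving a save in OpenOffice by hand and waiting for it to finish).

```python
import subprocess
import time


def time_under_load(task, hog_cmd=("yes",)):
    """Run task() while hog_cmd burns CPU; return elapsed wall-clock seconds.

    hog_cmd defaults to the "yes" command used in this report (assumed to be
    on the PATH). Its output is discarded, as with "yes > /dev/null".
    """
    hog = subprocess.Popen(hog_cmd, stdout=subprocess.DEVNULL)
    try:
        start = time.monotonic()
        task()
        return time.monotonic() - start
    finally:
        # Always stop the hog, even if task() raises.
        hog.terminate()
        hog.wait()
```

Calling time_under_load(task) once and timing the same task without the hog gives the "under load" and "no load" figures for one setup.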
Transferring to KR. US->KR: as said on the phone you volunteered to evaluate/dispatch this issue. Thanks a lot!
Jose, we tried to reproduce this with Fedora and kernel 2.6.0 and did not see the described behaviour. Could you give us more system information, and maybe update your kernel to 2.6.0 to see if it still behaves the same?
P.S.: We tried SRC680 and OOo1.1.
Additional information. All the following data was gathered on a Debian Sid box running Linux kernel version 2.6.1-rc3 and libc6 2.3.2.ds1-10. The test was to save a .sxw of approximately 125 pages with no graphics, 140 KiB in size.

OOo_1.1rc5_030926_LinuxIntel_install_es.tar.gz
----------------------------------------------
No load: 2 seconds.
"yes" running on an "xterm": 242 seconds.

OOo_680m13_LinuxIntel_install.tar.gz
------------------------------------
No load: 2 seconds.
"yes" running on an "xterm": after more than 18 minutes, I stopped "yes" and the save finished within a second.

Under load, with both versions the progress bar advances as fast as without load up to the middle, but from that point on it slows down. Hope it helps.
Hi, unfortunately I didn't have a Sid box available, so I tried instead on Sarge (Debian testing) with a stock 2.6.1 kernel. Everything works fine. Just a guess: are you sure that DMA is enabled?
This still works for me, even when disabling DMA. So, new state is "works for me".
I can reproduce this problem (or one like it) on a 2.4 kernel with ftp://66.92.65.9/pub/daily-temperature.sxc (a fairly large spreadsheet, to be sure) and OOo 1.1.1 (downloaded binary from OpenOffice.org). Specifically, I'm running SuSE 8.1 with the k_deflt-2.4.21-203 RPM kernel (stock), on a Dell Inspiron 8000 laptop with a 1600x1200 screen, 512 MB of RAM, and XFree86 4.3. Using glibc 2.2.5-184 and libstdc++-3.2.2-45 from SuSE's update site. It is of course possible that SuSE backported something from 2.6, but the point is that I have a reproducible test case. I'm nowhere near out of memory:

$ free
             total       used       free     shared    buffers     cached
Mem:        514804     463952      50852          0      52012     187668
-/+ buffers/cache:     224272     290532
Swap:       530104      10160     519944

The command I run is "yes > /dev/null" (indeed, even "nice -20 yes > /dev/null" exhibits the same problem). The exact observed symptoms are that OOo completes the initialization through the splash screen quickly, and starts loading the document. The first two dots on the progress bar pop up right away, and then it slows to a crawl. I've observed the same behavior while saving, but don't have numbers. Note that merely starting up OOo without a document doesn't really exhibit this problem. It took about 7 seconds with the system idle and 10 seconds with yes running. Notice that ps shows "yes" getting virtually all of the CPU, while OOo gets almost none. Note that I'm running "yes" at nice 20:

rlk 32391 92.6  0.0   1476   468 pts/1 RN 13:39 3:09 yes
rlk 32392  0.0  0.0   1248   280 pts/3 S  13:39 0:00 /usr/bin/time /us
rlk 32393  2.6 10.6 133228 55084 pts/3 R  13:39 0:05 /usr/local/OpenOf

rlk 32391 93.2  0.0   1476   468 pts/1 RN 13:39 4:11 yes
rlk 32393  1.9 11.0 134892 56724 pts/3 R  13:39 0:05 /usr/local/OpenOf

rlk 32391 92.6  0.0   1476   468 pts/1 RN 13:39 9:04 yes
rlk 32392  0.0  0.0   1248   280 pts/3 S  13:39 0:00 /usr/bin/time /us
rlk 32393  0.9 12.3 141868 63640 pts/3 R  13:39 0:05 /usr/local/OpenOf

It took about 14 minutes to complete.
In contrast, this spreadsheet loads in 40 seconds if the system is idle. Here is my modules list. I can retry without the taint if need be, but I believe I've done this before without the tainting module being loaded.

Module                  Size  Used by    Tainted: P
snd-pcm-oss            46432   0 (autoclean)
vpnmod                188864   0 (unused)
snd-mixer-oss          14072   1 (autoclean) [snd-pcm-oss]
isa-pnp                31100   0 (unused)
parport_pc             25928   1 (autoclean)
lp                      6272   0 (autoclean)
parport                23424   1 (autoclean) [parport_pc lp]
sd_mod                 12960   0 (autoclean) (unused)
ipv6                  213212  -1 (autoclean)
key                    65012   0 (autoclean) [ipv6]
snd-maestro3           14188   1
snd-pcm                67616   0 [snd-pcm-oss snd-maestro3]
snd-page-alloc          6516   0 [snd-pcm]
snd-timer              15424   0 [snd-pcm]
snd-ac97-codec         40440   0 [snd-maestro3]
snd                    36260   0 [snd-pcm-oss snd-mixer-oss snd-maestro3 snd-pcm snd-timer snd-ac97-codec]
soundcore               3684   0 [snd]
ds                      6752   2
yenta_socket           10304   2
pcmcia_core            44544   0 [ds yenta_socket]
visor                  11144   0 (unused)
usbserial              19964   0 [visor]
joydev                  5248   0 (unused)
evdev                   3904   0 (unused)
input                   3488   0 [joydev evdev]
uhci                   24688   0 (unused)
usbcore                59488   1 [visor usbserial uhci]
af_packet              12712   1 (autoclean)
3c59x                  26512   1
i8k                     5448   0 (unused)
nls_iso8859-1           2844   1 (autoclean)
nls_cp437               4348   1 (autoclean)
vfat                    9996   1 (autoclean)
fat                    31384   0 (autoclean) [vfat]
lvm-mod                63616   0 (autoclean)
sg                     32448   0 (autoclean) (unused)
scsi_mod               97196   2 (autoclean) [sd_mod sg]
ide-cd                 30208   0 (autoclean)
cdrom                  26368   0 (autoclean) [ide-cd]
reiserfs              204244   2
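For anyone collecting similar evidence, the ps samples in this comment can be reduced to a per-command CPU share with a few lines of Python. This is a sketch only: the function name cpu_share is made up for illustration, and it assumes the eleven-column "ps aux" layout (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND) that the samples appear to follow. The sample lines are copied from the first ps snapshot, truncated commands included.

```python
def cpu_share(ps_lines):
    """Map command -> (%CPU, niced) from 'ps aux'-style lines.

    Assumes the columns USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND.
    'niced' is True when the STAT column contains 'N' (low-priority process).
    """
    shares = {}
    for line in ps_lines:
        # Split into at most 11 fields so a command with spaces stays whole.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue  # skip headers or malformed lines
        command = fields[10]
        shares[command] = (float(fields[2]), "N" in fields[7])
    return shares


# First ps snapshot from this comment.
sample = [
    "rlk 32391 92.6 0.0 1476 468 pts/1 RN 13:39 3:09 yes",
    "rlk 32392 0.0 0.0 1248 280 pts/3 S 13:39 0:00 /usr/bin/time /us",
    "rlk 32393 2.6 10.6 133228 55084 pts/3 R 13:39 0:05 /usr/local/OpenOf",
]
```

On that snapshot, cpu_share(sample) reports "yes" at 92.6% and flagged as niced, while the OpenOffice process sits at 2.6%.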
dardhal, can reproduce the issue with the document you provided, will investigate further ....
In agreement with TZ, retargeted to OOo 1.1.3. Unfortunately it is too late for 1.1.2, and the change seems to be risky.
I confirm that behavior on RedHat 9 using OOo 1.1.1. I spent more than 5 minutes waiting for a Calc document to finish saving while another process was in an infinite loop. The progress bar went as fast as ever to the middle of the screen, and then it took about 5 minutes to get to the end. This kept happening until I noticed there was a process eating all the CPU. I killed it and OO.o saved the document in 3 seconds, as always. Other tasks were fast, and even working in the spreadsheet, entering data and scrolling, was as fast as ever. Only the save process was slowed down, right after the progress bar reached the middle. RedHat 9 uses a 2.4.20 kernel, but it has backported features from the 2.6.x series. I used the OO.o 1.1.1 from the official page, not a RedHat build.
I cannot reproduce it using Fedora Core 2 with a preemptive kernel and Fedora's OO.o build. I ran yes > /dev/null and I see it taking about 95% of the CPU, but OO.o saves a fairly big spreadsheet as fast as ever.
This problem seems to be related to "osl_yieldThread" in vcl/unx/source/app/saldata.cxx. "osl_yieldThread" calls "sched_yield". Just to cite the documentation:

    Technically, `sched_yield' causes the calling process to be made immediately ready to run (as opposed to running, which is what it was before). This means that if it has absolute priority higher than 0, it gets pushed onto the tail of the queue of processes that share its absolute priority and are ready to run, and it will run again when its turn next arrives. If its absolute priority is 0, it is more complicated, but still has the effect of yielding the CPU to other processes.

I will check with the owner what we can do about this.
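To make the quoted behaviour concrete, here is a minimal Python sketch of a loop that yields the CPU after every small unit of work. It is not the OOo code (which is C++ and is not shown in this issue); the function name process_items and the stand-in workload are hypothetical. Python's os.sched_yield() wraps the same system call on platforms that provide it, so the sketch guards the call with hasattr.

```python
import os
import time


def process_items(n_items, yield_each=True):
    """Do n_items small units of work, optionally yielding the CPU after each.

    Returns (result, elapsed_seconds). With yield_each=True every iteration
    calls sched_yield(), handing the CPU to any other runnable process.
    """
    can_yield = hasattr(os, "sched_yield")  # not available on every platform
    start = time.monotonic()
    total = 0
    for i in range(n_items):
        total += i * i  # stand-in for one unit of real work
        if yield_each and can_yield:
            os.sched_yield()
    return total, time.monotonic() - start
```

On an idle machine the two variants should take roughly the same time. With a CPU hog running, each yield may have to wait for the hog before the loop runs again, which would match the "fast without load, crawling under load" pattern reported in this issue; how large the effect is depends on the kernel's scheduler.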
PL, as discussed, you are so kind to take care of this and to remove the "osl_yieldThread" (saldata.cxx:872). Thanks :-).
Removed the osl_yieldThread statement as requested by kr. Committed in CWS vclppbugs4.
reopen for verification
pl->kr: please verify in CWS vclppbugs4
fixed
Verified in vclppbugs4.
Verified in OOo 1.1.3 RC. Dardhal, please verify as well and reopen it if not working.
I am currently downloading the following build from a nearby mirror:

-rw-r--r-- 80123557 sep 16 16:14 OOo_1.1.3rc_LinuxIntel_install.tar.gz

My modem is at full throttle, and is showing an ETA of 4 hours, so maybe I will have to wait until tomorrow to install this build and check if the problem is indeed gone. I will report back as soon as possible.
I have just tested the mentioned build, and the problem is gone. I created a heavy background CPU load (six to seven x-terminals executing "yes"), loaded a document in OpenOffice 1.1.3rc Writer, modified it and saved it, and the save speed is very good, nearly as good as without background CPU utilization. So the bug is gone, at least for me. Even the whole user interface now seems much more responsive under heavy load than it was before without any load. For example, an "Open..." dialog now changes directories quite fast, whereas before this release it was noticeably slower. However, take into account that my software setup has changed since the last time I tested the bug. I am now using Linux kernel 2.6.9-rc1 and libc 2.3.2.ds1-12 (from Debian Sid). Hope it helps, thank you all.
Thanks for checking this. Kay