Issue 22761 - (still) Terrible performance on kernels 2.6.x while any CPU-hog running
Status: CLOSED FIXED
Alias: None
Product: Writer
Classification: Application
Component: code
Version: OOo 1.1 RC5
Hardware: PC Linux, all
Importance: P2 Trivial
Target Milestone: ---
Assignee: kay.ramme
QA Contact: issues@sw
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2003-11-23 17:24 UTC by dardhal
Modified: 2013-08-07 14:44 UTC (History)

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Description dardhal 2003-11-23 17:24:21 UTC
This problem with OO was supposedly fixed several releases ago, and it seemed to
be due to some unusual usage of Linux's sched_yield() in OO.

Now I am trying OO 1.1.0-RC5, and even 680m13, on a Linux kernel 2.6.0-test9-mm5
and the problem is still there. It is very simple to trigger:
a) Start any CPU-hog on the box, for example a simple "yes" works well for this
purpose.
b) Try to use OpenOffice. For example, try to save some modified file: it takes
ages to finish, because the OpenOffice process hardly gets any CPU time to
complete the task.

I have been told that some vendors package OpenOffice with a fix for the cause
of this problem, but it seems the original versions from www.openoffice.org are
still missing this fix. Without this fix OO can be virtually unusable on boxes
running kernels 2.6.x with some load behind the scenes.
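
The reproduction steps above can be sketched as a small timing harness. This is
only an illustration, assuming a Linux box with Python 3: the helper
time_under_load and the constant PY_HOG are hypothetical names, and the
workload passed in stands in for the OpenOffice "save" operation, which this
sketch does not drive.

```python
import subprocess
import sys
import time

# Portable stand-in for "yes" (an assumption, not part of the report):
# a Python child process that spins forever.
PY_HOG = (sys.executable, "-c", "while True: pass")


def time_under_load(task, hog_cmd=("yes",), hogs=1):
    """Run task() while `hogs` copies of hog_cmd burn CPU in the background.

    Returns the wall-clock seconds task() took. With hogs=0 this measures
    the idle baseline.
    """
    procs = [
        subprocess.Popen(list(hog_cmd), stdout=subprocess.DEVNULL)
        for _ in range(hogs)
    ]
    try:
        start = time.monotonic()
        task()
        return time.monotonic() - start
    finally:
        # Always stop the hogs, even if task() raised.
        for proc in procs:
            proc.kill()
            proc.wait()
```

Comparing time_under_load(work, hogs=0) with time_under_load(work, hogs=1) for
the same workload gives the idle and loaded timings. A large gap between the
two would correspond to the behaviour described in this report.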
Comment 1 ulf.stroehler 2003-12-23 14:56:14 UTC
Transferring to KR.

US->KR: as said on the phone you volunteered to evaluate/dispatch this issue.
Thanks a lot!
Comment 2 kay.ramme 2004-01-14 13:55:30 UTC
Jose, we tried to reproduce this with a Fedora & 2.6.0 and did not see the
described behaviour. Could you give us more system information, and maybe update
your kernel to 2.6.0 and see if it still behaves the same?
Comment 3 kay.ramme 2004-01-14 14:23:36 UTC
P.S.: We tried SRC680 and OOo1.1.
Comment 4 dardhal 2004-01-17 11:38:04 UTC
Additional information. All the following data was gathered on a Linux Debian
Sid box running Linux kernel version 2.6.1-rc3 and libc6 2.3.2.ds1-10. The test
was to save a .sxw of approximately 125 pages with no graphics, and 140 KiB in size.

OOo_1.1rc5_030926_LinuxIntel_install_es.tar.gz
----------------------------------------------
No load: 2 seconds.
"yes" running on an "xterm": 242 seconds.

OOo_680m13_LinuxIntel_install.tar.gz
------------------------------------
No load: 2 seconds.
"yes" running on an "xterm": after more than 18 minutes, I stopped "yes" and in
a second the "save" ended.


Under load, with both versions the progress bar goes as fast as without load up
to the middle, but from that point it slows down.

Hope it helps.
Comment 5 kay.ramme 2004-01-26 09:07:04 UTC
Hi, unfortunately I didn't have a Sid available, so I tried instead on a Sarge
(Debian testing) with a stock 2.6.1 kernel. Everything works fine. Just a guess:
are you sure that DMA is enabled?
Comment 6 kay.ramme 2004-02-10 14:50:04 UTC
This still works for me, even when disabling DMA. So, new state is "works for me".
Comment 7 kay.ramme 2004-02-25 09:28:07 UTC
.
Comment 8 rlk 2004-04-20 18:55:20 UTC
I can reproduce this problem (or one like it) on a 2.4 kernel with
ftp://66.92.65.9/pub/daily-temperature.sxc (a fairly large spreadsheet, to be
sure) and OOo 1.1.1 (downloaded binary from OpenOffice.org).  Specifically, I'm
running SuSE 8.1 with the k_deflt-2.4.21-203 RPM kernel (stock), on a Dell
Inspiron 8000 laptop with a 1600x1200 screen, 512 MB of RAM, and XFree86 4.3. 
Using glibc 2.2.5-184 and libstdc++-3.2.2-45 from SuSE's update site.  It is of
course possible that SuSE back ported something from 2.6, but the point is that
I have a reproducible test case.  I'm nowhere near out of memory:

$ free
             total       used       free     shared    buffers     cached
Mem:        514804     463952      50852          0      52012     187668
-/+ buffers/cache:     224272     290532
Swap:       530104      10160     519944


The command I run is "yes > /dev/null" (indeed, even "nice -20 yes > /dev/null"
exhibits the same problem).  The exact observed symptoms are that OOo completes
the initialization through the splash screen quickly, and starts loading the
document.  The first two dots on the progress bar pop up right away, and then it
slows to a crawl.  I've observed the same behavior while saving, but don't have
numbers.

Note that merely starting up OOo without a document doesn't really exhibit this
problem.  It took about 7 seconds with the system idle and 10 seconds with yes
running.

Notice that ps shows "yes" getting virtually all of the CPU, while OOo gets
almost none.  Note that I'm running "yes" at nice 20:

rlk      32391 92.6  0.0  1476  468 pts/1    RN   13:39   3:09 yes
rlk      32392  0.0  0.0  1248  280 pts/3    S    13:39   0:00 /usr/bin/time /us
rlk      32393  2.6 10.6 133228 55084 pts/3  R    13:39   0:05 /usr/local/OpenOf

rlk      32391 93.2  0.0  1476  468 pts/1    RN   13:39   4:11 yes
rlk      32393  1.9 11.0 134892 56724 pts/3  R    13:39   0:05 /usr/local/OpenOf

rlk      32391 92.6  0.0  1476  468 pts/1    RN   13:39   9:04 yes
rlk      32392  0.0  0.0  1248  280 pts/3    S    13:39   0:00 /usr/bin/time /us
rlk      32393  0.9 12.3 141868 63640 pts/3  R    13:39   0:05 /usr/local/OpenOf


It took about 14 minutes to complete.  In contrast, this spreadsheet loads in 40
seconds if the system is idle.

Here is my modules list.  I can retry without the taint if need be, but I
believe I've done this before without the tainting module being loaded.

Module                  Size  Used by    Tainted: P  
snd-pcm-oss            46432   0 (autoclean)
vpnmod                188864   0 (unused)
snd-mixer-oss          14072   1 (autoclean) [snd-pcm-oss]
isa-pnp                31100   0 (unused)
parport_pc             25928   1 (autoclean)
lp                      6272   0 (autoclean)
parport                23424   1 (autoclean) [parport_pc lp]
sd_mod                 12960   0 (autoclean) (unused)
ipv6                  213212  -1 (autoclean)
key                    65012   0 (autoclean) [ipv6]
snd-maestro3           14188   1
snd-pcm                67616   0 [snd-pcm-oss snd-maestro3]
snd-page-alloc          6516   0 [snd-pcm]
snd-timer              15424   0 [snd-pcm]
snd-ac97-codec         40440   0 [snd-maestro3]
snd                    36260   0 [snd-pcm-oss snd-mixer-oss snd-maestro3 snd-pcm snd-timer snd-ac97-codec]
soundcore               3684   0 [snd]
ds                      6752   2
yenta_socket           10304   2
pcmcia_core            44544   0 [ds yenta_socket]
visor                  11144   0 (unused)
usbserial              19964   0 [visor]
joydev                  5248   0 (unused)
evdev                   3904   0 (unused)
input                   3488   0 [joydev evdev]
uhci                   24688   0 (unused)
usbcore                59488   1 [visor usbserial uhci]
af_packet              12712   1 (autoclean)
3c59x                  26512   1
i8k                     5448   0 (unused)
nls_iso8859-1           2844   1 (autoclean)
nls_cp437               4348   1 (autoclean)
vfat                    9996   1 (autoclean)
fat                    31384   0 (autoclean) [vfat]
lvm-mod                63616   0 (autoclean)
sg                     32448   0 (autoclean) (unused)
scsi_mod               97196   2 (autoclean) [sd_mod sg]
ide-cd                 30208   0 (autoclean)
cdrom                  26368   0 (autoclean) [ide-cd]
reiserfs              204244   2
Comment 9 kay.ramme 2004-04-21 11:48:05 UTC
dardhal, can reproduce the issue with the document you provided, will
investigate further ....
Comment 10 kay.ramme 2004-04-21 11:49:36 UTC
.
Comment 11 kay.ramme 2004-04-27 16:54:29 UTC
In agreement with TZ, retargeted to OOo 1.1.3. Unfortunately it is too late for
1.1.2, and the change seems to be risky.
Comment 12 fedetxf 2004-06-15 16:14:27 UTC
I confirm that behavior in RedHat 9 using OOo 1.1.1.
I spent more than 5 minutes waiting for a calc document to finish saving while
another process was in an infinite loop. The progress bar went as fast as ever to
the middle of the screen, and then it took about 5 minutes to get to the end. This
went on until I noticed there was a process eating all the CPU. I killed it and
OO.o saved the document in 3 seconds as always. Other tasks were fast, and even
working in the spreadsheet, entering data and scrolling, was as fast as ever. Only
the save process was slowed down right after the progress bar reached the middle.
RedHat 9 uses a 2.4.20 kernel, but it has backported features from the 2.6.x series.
I used the OO.o 1.1.1 from the official page, not a RedHat build.
Comment 13 fedetxf 2004-06-16 02:44:50 UTC
I cannot reproduce it using Fedora Core 2 with a preemptive kernel and Fedora's
OO.o build. I run yes > /dev/null and I see it is taking about 95% of CPU, but
OO.o saves a fairly big spreadsheet as fast as ever.
Comment 14 kay.ramme 2004-06-28 12:50:29 UTC
This problem seems to be related to "osl_yieldThread" in
vcl/unx/source/app/saldata.cxx. "osl_yieldThread" calls "sched_yield". Just to
cite the documentation:

  Technically, `sched_yield' causes the calling process to be made
     immediately ready to run (as opposed to running, which is what it
     was before).  This means that if it has absolute priority higher
     than 0, it gets pushed onto the tail of the queue of processes
     that share its absolute priority and are ready to run, and it will
     run again when its turn next arrives.  If its absolute priority is
     0, it is more complicated, but still has the effect of yielding
     the CPU to other processes.

I will check with the owner what we can do about this.
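
As a rough illustration of the pattern described above, here is a minimal
Python sketch of a work loop that yields the CPU after every step. It is not
the OOo code from saldata.cxx: run_steps is a hypothetical helper, the
arithmetic stands in for real work, and os.sched_yield() is used as the Python
wrapper around the same sched_yield call (available on Unix only).

```python
import os
import time


def run_steps(steps, yield_each_step):
    """Do `steps` small units of work, optionally yielding the CPU after each.

    Returns (result, elapsed_seconds). The result is identical either way;
    only the elapsed time can differ.
    """
    total = 0
    start = time.monotonic()
    for i in range(steps):
        total += i * i  # stand-in for one unit of real work
        if yield_each_step:
            # Voluntarily give up the CPU; the scheduler decides when
            # this process gets to run again.
            os.sched_yield()
    return total, time.monotonic() - start
```

On an idle machine the two variants should take roughly the same time. When
another CPU-bound process is runnable, each yield may hand the CPU to that
process before the loop continues, so the yielding variant can end up much
slower even though it does the same work. How pronounced the effect is depends
on the kernel's scheduler.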

Comment 15 kay.ramme 2004-06-28 13:18:06 UTC
PL, as discussed, you are so kind to take care of this and to remove the
"osl_yieldThread" (saldata.cxx:872). Thanks :-).
Comment 16 philipp.lohmann 2004-06-28 16:38:57 UTC
removed the osl_yieldThread statement as requested by kr.

committed in CWS vclppbugs4
Comment 17 philipp.lohmann 2004-07-01 11:23:47 UTC
reopen for verification
Comment 18 philipp.lohmann 2004-07-01 11:59:53 UTC
pl->kr: please verify in CWS vclppbugs4
Comment 19 philipp.lohmann 2004-07-01 12:00:14 UTC
fixed
Comment 20 kay.ramme 2004-07-02 08:10:29 UTC
Verified in vclppbugs4.
Comment 21 kay.ramme 2004-09-28 08:47:16 UTC
Verified in OOo 1.1.3 RC. Dardhal, please verify as well and reopen it if not
working.


Comment 22 dardhal 2004-09-28 19:46:34 UTC
I am currently downloading the following build from a nearby mirror:
-rw-r--r--     80123557   sep 16 16:14   OOo_1.1.3rc_LinuxIntel_install.tar.gz

My modem is at full throttle, and is showing an ETA of 4 hours, so maybe I will
have to wait until tomorrow to install this build and check if the problem is
indeed gone. I will report back as soon as possible.
Comment 23 dardhal 2004-09-29 18:53:51 UTC
I have just tested the mentioned build, and the problem is gone. I have created
a heavy background CPU load (consisting of six to seven x-terminals executing
"yes"), loaded a document in OpenOffice 1.1.3rc Writer, modified it and
saved...and the save speed is very good, nearly as good as the one without
background CPU utilization.

So the bug is gone, at least for me. Even the whole user interface seems now
much more responsive under heavy load than before without any load. For example,
an "Open..." dialog now changes directories quite fast, but before this release
it was noticeably slower.

However, take into account that my software setup has changed since the last
time I tested the bug. I am now using Linux kernel 2.6.9-rc1 and libc
2.3.2.ds1-12 (from Debian Sid).

Hope it helps, thank you all.
Comment 24 kay.ramme 2004-10-01 17:08:41 UTC
Thanks for checking this.

Kay