Apache OpenOffice (AOO) Bugzilla – Issue 18835
genbrk core dump during icu build
Last modified: 2013-08-07 15:34:48 UTC
genbrk attempts to unlock an invalid mutex (output below). My guess is that this occurs because the mutex is destroyed and then used again. If you change the initialization of the mutexes to PTHREAD_MUTEX_INITIALIZER, it still core dumps, but with a segmentation fault. In other words, ensuring the mutex is valid doesn't solve the problem.

    ICU_DATA=../data/out/build LD_LIBRARY_PATH=../common:../i18n:../tools/toolutil:../layout:../extra/ustdio:../tools/ctestfw:../data/out:../data:../stubdata/:$LD_LIBRARY_PATH ../tools/genbrk/genbrk -r ../data/brkitr/char.txt -o ../data/out/build/icudt22l_char.brk
    genbrk: Error detected by libpthread: Invalid mutex.
    Detected by file "/usr/src/lib/libpthread/pthread_mutex.c", line 312, function "pthread_mutex_unlock".
    See pthread(3) for information.
    [1] Abort trap (core dumped) ICU_DATA=../data...
    gmake[1]: *** [../data/out/build/icudt22l_char.brk] Error 134
    gmake[1]: Leaving directory `/home/work/openoffice/3rd-party/openoffice/icu/unxbsd.pro/misc/build/icu/source/data'
    gmake: *** [all-recursive] Error 2
    dmake: Error code 2, while making './unxbsd.pro/misc/build/so_built_so_icu'
    ---* TG_SLO.MK *---
mh->er: do you have any idea?
Not the slightest idea. Shouldn't a severe error such as invalid mutex handling also occur on other platforms? I therefore assume the real cause to be something else. The only thing I could say is go to IBM's ICU site http://oss.software.ibm.com/icu/ and look for a hint there or in their mailing lists. Unfortunately NetBSD is listed as "rarely tested" in the supported platforms table, see readme.html of the ICU distribution.
I have evaluated the problem further and have discovered that this is in fact an ICU bug. Immediate files of interest include unistr.cpp and umutex.c. The mutex that is being passed (in this case to pthread_mutex_lock) is uninitialized. A simple trace of the execution shows genbrk creating instances of UnicodeString, using them, and then destroying them. U_CAPI int32_t U_EXPORT2 umtx_atomic_inc(int32_t *p) is never called prior to the destruction of any of the UnicodeString instances. Consequently, when the destructor is called for those instances, U_CAPI int32_t U_EXPORT2 umtx_atomic_dec(int32_t *p) is called, which causes the (uninitialized) mutex to be locked, and the program exits due to an assertion. There are a number of hacks that could be put in place to avoid the immediate crash, such as initializing the mutex in umtx_atomic_dec or checking whether the mutex is valid. Such hacks allow genbrk to do what it should (as far as I can tell), but the bottom line is that there are underlying problems that should likely be resolved. I have reported this problem to the ICU people, but I get the feeling that they won't make fixing it a priority because I'm just a random Joe with an obscure platform.
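To make the "initialize in umtx_atomic_dec" hack mentioned above concrete, here is a minimal sketch in C. It is not ICU's code: model_mutex, model_mutex_ready, model_ensure_mutex, model_atomic_inc and model_atomic_dec are hypothetical stand-ins, and the sketch assumes the mutex is set up lazily with pthread_mutex_init. The point is only that both the increment and the decrement path go through the same guard, so a decrement that happens first no longer touches an uninitialized mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; this is not ICU's code. */
static pthread_mutex_t model_mutex;   /* no static initializer, as in the report */
static int model_mutex_ready = 0;     /* nonzero once pthread_mutex_init has run */

/* Initialize the mutex on first use. The unsynchronized flag check is not
 * thread-safe by itself; it only models the quick hack discussed above. */
static void model_ensure_mutex(void) {
    if (!model_mutex_ready) {
        int rc = pthread_mutex_init(&model_mutex, NULL);
        assert(rc == 0);
        (void)rc;
        model_mutex_ready = 1;
    }
}

/* Increment a reference count under the mutex and return the new value. */
int32_t model_atomic_inc(int32_t *p) {
    int32_t value;
    model_ensure_mutex();
    pthread_mutex_lock(&model_mutex);
    value = ++(*p);
    pthread_mutex_unlock(&model_mutex);
    return value;
}

/* Decrement a reference count under the mutex and return the new value.
 * The guard call is the hack: without it, a decrement that runs before
 * any increment would lock a mutex that was never initialized. */
int32_t model_atomic_dec(int32_t *p) {
    int32_t value;
    model_ensure_mutex();
    pthread_mutex_lock(&model_mutex);
    value = --(*p);
    pthread_mutex_unlock(&model_mutex);
    return value;
}
```

With this guard, calling model_atomic_dec on a count of 1 before any model_atomic_inc returns 0 instead of aborting. As noted in the comment above, this only avoids the immediate crash and does not address whatever else is wrong.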
I just had a short glance at unistr.cpp and umutex.c. The "umtx_atomic_dec() is called without umtx_atomic_inc() ever being called" case really seems to be the problem here, since only umtx_atomic_inc() initializes the global static mutex. However, the UnicodeString dtor only calls umtx_atomic_dec() via removeRef() in releaseArray(), which checks for (fFlags & kRefCounted) first. Now, it seems that fFlags contains kRefCounted only if allocate() was called for a long string (kLongString is aliased to kRefCounted), but there the refcount is initialized directly. Note that this is all without having debugged anything, just from looking at the sources. IMHO the problem could be boiled down to properly handling the allocate() case by not directly using *array++ = 1; but something like *array = 0; umtx_atomic_inc(array); ++array; instead, so the global static mutex would be initialized. I didn't try this, but it could work. From a performance point of view, of course, a proper one-time initialization during startup could be preferred. Btw: Did you file a bug against the ICU (http://www.jtcsv.com/cgi-bin/icu-bugs), or how did you report it? If not, please do so. If yes, what is the BugID?
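The one-time initialization idea can be sketched with POSIX pthread_once. This is an assumption about how such an initialization might look, not what ICU does: once_mutex, once_control, once_init_mutex, once_atomic_inc, once_atomic_dec and once_init_calls are hypothetical names, and pthread_once runs the initializer on first use rather than strictly at startup.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; this is not ICU's code. */
static pthread_mutex_t once_mutex;
static pthread_once_t once_control = PTHREAD_ONCE_INIT;
static int once_init_count = 0;   /* how many times the initializer ran */

/* Runs exactly once, no matter which entry point is reached first. */
static void once_init_mutex(void) {
    pthread_mutex_init(&once_mutex, NULL);
    ++once_init_count;
}

/* Increment a reference count under the mutex and return the new value. */
int32_t once_atomic_inc(int32_t *p) {
    int32_t value;
    pthread_once(&once_control, once_init_mutex);
    pthread_mutex_lock(&once_mutex);
    value = ++(*p);
    pthread_mutex_unlock(&once_mutex);
    return value;
}

/* Decrement a reference count under the mutex and return the new value. */
int32_t once_atomic_dec(int32_t *p) {
    int32_t value;
    pthread_once(&once_control, once_init_mutex);
    pthread_mutex_lock(&once_mutex);
    value = --(*p);
    pthread_mutex_unlock(&once_mutex);
    return value;
}

/* Report how often the initializer ran (expected: at most once). */
int once_init_calls(void) {
    return once_init_count;
}
```

Unlike a plain flag check, pthread_once is safe when several threads reach the first call at the same time, and the order of increment and decrement calls no longer matters.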
Yes, I filed a bug and it's currently listed as [icu-bug] incoming/3232. I agree, it is a matter of umtx_atomic_dec being called prior to any call to umtx_atomic_inc. The curious thing (and I have debugged it some) is that the UnicodeString instances are kLongString when they are destroyed but kShortString when they are created. So somewhere between construction and destruction the flags are being modified. I haven't identified where or why, so I can't say whether it's intended or not. On a side note, I have a number of issues filed and all but one stay in the state UNCONFIRMED. Is there a method to this madness?
Tyler, Thanks for filing the bug in the ICU bug tracker. Please check whether a simple int32_t init_mutex = 0; umtx_atomic_inc( &init_mutex ); inserted somewhere at the beginning of the main() routine of genbrk fixes the crash. Please attach the patch here if it does. The UnicodeString may be converted from kShortString to kLongString by means of the allocate() member method, maybe when characters are appended to the string. Regarding the UNCONFIRMED state of this (and other) issues: Normally developers don't have unconfirmed issues assigned to them (unless directly assigned by someone who knows them to be responsible for a specific area). QA members pick unconfirmed issues and try to verify whether the issue is a real issue or a duplicate or whatever, and then confirm the issue and forward it to a developer. This is a bit quirky in your case, because most likely there is no one else who'd verify the issues, since you're probably the only one who's building for NetBSD. I'll change that state for this issue, since I can't do anything other than believe you ;-)
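A small model of why the suggested priming call might help, under the assumption stated earlier in the thread that only the increment path initializes the mutex. Everything here is hypothetical (demo_mutex, demo_ready, DEMO_NOT_READY, demo_inc, demo_dec, demo_prime); instead of aborting like the real libpthread check, the decrement in this sketch returns a sentinel when the mutex has not been set up yet.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; this is not ICU's code. */
#define DEMO_NOT_READY INT32_MIN      /* sentinel: mutex was never initialized */

static pthread_mutex_t demo_mutex;
static int demo_ready = 0;

/* In this model, only the increment path initializes the mutex. */
int32_t demo_inc(int32_t *p) {
    int32_t value;
    if (!demo_ready) {
        pthread_mutex_init(&demo_mutex, NULL);
        demo_ready = 1;
    }
    pthread_mutex_lock(&demo_mutex);
    value = ++(*p);
    pthread_mutex_unlock(&demo_mutex);
    return value;
}

/* The decrement path assumes the mutex already exists. The sketch reports
 * the bad ordering with a sentinel instead of locking an invalid mutex. */
int32_t demo_dec(int32_t *p) {
    int32_t value;
    if (!demo_ready) {
        return DEMO_NOT_READY;
    }
    pthread_mutex_lock(&demo_mutex);
    value = --(*p);
    pthread_mutex_unlock(&demo_mutex);
    return value;
}

/* The workaround: one throwaway increment early in main() so that the
 * mutex exists before any decrement can run. */
void demo_prime(void) {
    int32_t dummy = 0;
    (void)demo_inc(&dummy);
}
```

In this model, demo_dec on a fresh count returns DEMO_NOT_READY and leaves the count unchanged, while the same call after demo_prime() succeeds. Whether the real genbrk behaves the same way is exactly what the requested test would show.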
Actually, I have already tried this. Since the mutexes in question are declared as static, I have initialized them with PTHREAD_MUTEX_INITIALIZER (if you look back to my original report). What occurs is that genbrk goes further but eventually core dumps due to a bad pointer elsewhere. I'll go back and do this again and try to give some useful details when I get time.
Hi Tyler, Since there isn't much I can do about this, I reassign this issue to you. Please update if there is any new information available, I'm on CC. Adding http://www.jtcsv.com/cgibin/icu-bugs?findid=3232 here as a quick link to the ICU bug tracking system. Thanks Eike
Tyler, as the target is OOo1.1.1 and there hasn't been feedback for a while, I re-target this one to OOo1.1.2 now.
Hi Michael, But why set to RESOLVED INVALID? Eike
Eike, I don't know :-) Just wanted to change the target.
retarget to 1.1.3, we are running out of time for 1.1.2
Since this is an ICU bug, I do not think we can solve it in time for 1.1.x. They seem to have fixed it in ICU 2.8 by rewriting the synchronization code (see http://www.jtcsv.com/cgibin/icu-bugs?findid=3014), so we won't be able to fix this properly for 1.1.x. Retargeting to OOoLater. 2.0 now uses ICU 2.6. ttyler: can you reproduce this on 2.0?
I'm seeing this under ARM Linux running Debian unstable whilst building 2.0 :-|
chg: target to PleaseHelp.
No additional insights for almost 2 years. Though OOo currently (2.0.3/4) still uses ICU 2.6, we're upgrading to 3.4/3.6 now, see CWS 'icuupgrade'. So if the mutex code was rewritten in ICU 2.8, we'll benefit from it. I'm closing this issue now.
Closing.