18835 – genbrk core dump during icu build

Summary: genbrk core dump during icu build

Status:	CLOSED NOT_AN_OOO_ISSUE

Alias:	None

Product:	Build Tools
Classification:	Code
Component:	code
Version:	OOo 1.1 RC3
Hardware:	PC NetBSD

Importance:	P1 (highest) Trivial
Target Milestone:	---
Assignee:	Unknown
QA Contact:	issues@tools

URL:
Keywords:

Depends on:
Blocks:

Reported:	2003-08-30 07:32 UTC by Unknown
Modified:	2013-08-07 15:34 UTC
CC List:	5 users

See Also:
Issue Type:	DEFECT
Latest Confirmation in:	---
Developer Difficulty:	---

Description Unknown 2003-08-30 07:32:36 UTC

genbrk attempts to unlock an invalid mutex (output below).

My guess is that this occurs because the mutex is destroyed and then used again. If you
change the initialization of the mutexes to PTHREAD_MUTEX_INITIALIZER it still core dumps,
but with a seg fault. In other words, ensuring the mutex is valid doesn't solve the problem.
 
 
ICU_DATA=../data/out/build 
LD_LIBRARY_PATH=../common:../i18n:../tools/toolutil:../layout:../extra/ustdio:../tools/ctestfw:../data/out:../data:../stubdata/:$LD_LIBRARY_PATH 
../tools/genbrk/genbrk -r ../data/brkitr/char.txt -o ../data/out/build/icudt22l_char.brk 
genbrk: Error detected by libpthread: Invalid mutex. 
Detected by file "/usr/src/lib/libpthread/pthread_mutex.c", line 312, function 
"pthread_mutex_unlock". 
See pthread(3) for information. 
[1]   Abort trap (core dumped) ICU_DATA=../data... 
gmake[1]: *** [../data/out/build/icudt22l_char.brk] Error 134 
gmake[1]: Leaving directory 
`/home/work/openoffice/3rd-party/openoffice/icu/unxbsd.pro/misc/build/icu/source/data' 
gmake: *** [all-recursive] Error 2 
dmake:  Error code 2, while making './unxbsd.pro/misc/build/so_built_so_icu' 
---* TG_SLO.MK *---
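
For reference, the static-initialization change mentioned in the description can be sketched roughly as follows. This is only an illustration under assumptions: the names g_static_mutex, g_counter, bump_counter and read_counter are hypothetical and do not come from ICU's umutex.c. It shows a file-scope mutex that is valid before any code runs, with the pthread return codes checked instead of ignored.

```c
#include <pthread.h>

/* Hypothetical file-scope mutex: valid from program start, no init call needed. */
static pthread_mutex_t g_static_mutex = PTHREAD_MUTEX_INITIALIZER;
static int g_counter = 0;

/* Increments the counter under the mutex.
 * Returns 0 on success, or the pthread error code if lock/unlock fails. */
int bump_counter(void)
{
    int rc = pthread_mutex_lock(&g_static_mutex);
    if (rc != 0) {
        return rc;
    }
    ++g_counter;
    return pthread_mutex_unlock(&g_static_mutex);
}

/* Reads the counter (no locking; single-threaded use in this sketch). */
int read_counter(void)
{
    return g_counter;
}
```

As the description notes, a change of this kind made the "Invalid mutex" abort go away but did not fix the build, so it should be read as a diagnostic step rather than a fix.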

Comment 1 Martin Hollmichel 2003-09-02 14:31:48 UTC

mh->er: do you have any idea?

Comment 2 ooo 2003-09-02 15:49:07 UTC

Not the slightest idea. Shouldn't a severe error such as invalid mutex
handling also occur on other platforms? I therefore assume the real
cause to be something else. The only thing I could say is go to IBM's
ICU site http://oss.software.ibm.com/icu/ and look for a hint there or
in their mailing lists. Unfortunately NetBSD is listed as "rarely
tested" in the supported platforms table, see readme.html of the ICU
distribution.

Comment 3 Unknown 2003-09-05 15:53:53 UTC

I have evaluated the problem further and have discovered that this is in fact
an ICU bug. Immediate files of interest include unistr.cpp and umutex.c.

The mutex being passed (in this case to pthread_mutex_lock) is uninitialized.
A simple trace of the execution shows genbrk creating instances of
UnicodeString, using them, and then destroying them.

U_CAPI int32_t U_EXPORT2 umtx_atomic_inc(int32_t *p) is never called prior
to the destruction of any of the UnicodeString instances. Consequently, when
the destructor is called for those instances,
U_CAPI int32_t U_EXPORT2 umtx_atomic_dec(int32_t *p) is called, which causes
the (uninitialized) mutex to be locked, resulting in an exit due to a failed
assertion.
                                                                                 
There are a number of hacks that could be put in place to avoid the immediate
crash, such as initializing the mutex in umtx_atomic_dec or checking whether the
mutex is valid. Doing such hacks allows genbrk to do what it should (as far as
I can tell), but the bottom line is that there are underlying problems that
should likely be resolved.

I have reported this problem to the ICU people, but I get the feeling that they won't
make fixing it a priority because I'm just a random joe with an obscure platform.
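
The asymmetry described above can be modelled in a few lines of C. This is a sketch under assumptions, not ICU's code: g_mutex, g_mutex_ready, sketch_atomic_inc and sketch_atomic_dec are hypothetical names, and the decrement path returns -1 at the point where the real code would go on to lock a mutex that was never initialized.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-ins for the global mutex and its "initialized" state. */
static pthread_mutex_t g_mutex;   /* deliberately not statically initialized */
static int g_mutex_ready = 0;

/* Increment path: the only place that initializes the mutex.
 * (The lazy check itself is not thread-safe; it is kept simple on purpose.) */
int32_t sketch_atomic_inc(int32_t *p)
{
    int32_t value;
    if (!g_mutex_ready) {
        pthread_mutex_init(&g_mutex, NULL);
        g_mutex_ready = 1;
    }
    pthread_mutex_lock(&g_mutex);
    value = ++(*p);
    pthread_mutex_unlock(&g_mutex);
    return value;
}

/* Decrement path: performs no initialization of its own.
 * Returns -1 where the modelled code would lock an uninitialized mutex,
 * otherwise stores the new count in *result and returns 0. */
int sketch_atomic_dec(int32_t *p, int32_t *result)
{
    if (!g_mutex_ready) {
        return -1;
    }
    pthread_mutex_lock(&g_mutex);
    *result = --(*p);
    pthread_mutex_unlock(&g_mutex);
    return 0;
}
```

Calling sketch_atomic_dec on a count that was set directly, before any sketch_atomic_inc call, takes the -1 branch, which corresponds to the situation seen in the trace.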

Comment 4 ooo 2003-09-08 13:26:35 UTC

I just had a short glance at unistr.cpp and umutex.c, the
"umtx_atomic_dec() is called without umtx_atomic_inc() being ever
called" really seems to be the problem here, since only
umtx_atomic_inc() initializes the global static mutex. However, the
UnicodeString dtor only calls umtx_atomic_dec() via removeRef() in
releaseArray(), which checks for (fFlags & kRefCounted) first. Now, it
seems that fFlags contains only kRefCounted if allocate() was called
for a long string (kLongString is aliased to kRefCounted), but there
the refcount is directly initialized. Note that this is all without
having debugged or anything, just looking at the sources.

IMHO the problem could be boiled down to properly handling the
allocate() case by not directly using
*array++ = 1;
but something like
*array = 0;
umtx_atomic_inc(array);
++array;
instead, so the global static mutex would be initialized. I didn't try
this but it could work. From a performance point of view, of course, a
proper one-time initialization during startup could be preferred.
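
One way a one-time initialization could look is sketched below, using pthread_once. Note that this swaps in a different technique from an explicit call at startup: the guard runs lazily on whichever of the two paths is reached first. All names (g_refcount_mutex, g_refcount_once, refcount_inc, refcount_dec) are hypothetical and are not taken from ICU.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical global mutex protecting reference counts. */
static pthread_mutex_t g_refcount_mutex;
static pthread_once_t g_refcount_once = PTHREAD_ONCE_INIT;

/* Runs exactly once, no matter which path gets here first. */
static void init_refcount_mutex(void)
{
    pthread_mutex_init(&g_refcount_mutex, NULL);
}

/* Increment under the mutex; returns the new count. */
int32_t refcount_inc(int32_t *p)
{
    int32_t value;
    pthread_once(&g_refcount_once, init_refcount_mutex);
    pthread_mutex_lock(&g_refcount_mutex);
    value = ++(*p);
    pthread_mutex_unlock(&g_refcount_mutex);
    return value;
}

/* Decrement under the mutex; safe even if refcount_inc was never called. */
int32_t refcount_dec(int32_t *p)
{
    int32_t value;
    pthread_once(&g_refcount_once, init_refcount_mutex);
    pthread_mutex_lock(&g_refcount_mutex);
    value = --(*p);
    pthread_mutex_unlock(&g_refcount_mutex);
    return value;
}
```

With this arrangement a count that was set directly to 1 can be decremented without any earlier increment, at the cost of one pthread_once check per call.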

Btw: Did you file a bug against the ICU
(http://www.jtcsv.com/cgi-bin/icu-bugs) or how did you report it? If
not, please do so. If yes, what is that BugID?

Comment 5 Unknown 2003-09-08 14:56:28 UTC

Yes, I filed a bug and it's currently listed as [icu-bug] 
incoming/3232. 
 
I agree, it is a matter of umtx_atomic_dec being called prior to any 
call to umtx_atomic_inc. 
 
The curious thing (and I have debugged it some) is that the
UnicodeString instances are kLongString when they are
destroyed but kShortString when they are created. So
somewhere between construction and destruction the flags are
being modified. I haven't identified where or why, so I can't say if
it's intended or not.

On a side note, I have a number of issues filed and all but one stay
in the state UNCONFIRMED. Is there a method to this madness?

Comment 6 ooo 2003-09-09 13:21:32 UTC

Tyler,

Thanks for filing the bug in the ICU bug tracker.

Please check whether a simple

int32_t init_mutex = 0;
umtx_atomic_inc( &init_mutex );

inserted somewhere at the beginning of the main() routine of genbrk
fixes the crash. Please attach the patch here if it does.

The UnicodeString may be converted from kShortString to kLongString by
means of the allocate() member method, maybe if characters are
appended to the string.
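
To illustrate how such a conversion could change the flags between construction and destruction, here is a small model in C. It is a sketch under assumptions and not the UnicodeString implementation: SketchString, SKETCH_SHORT, SKETCH_REFCOUNTED, SKETCH_INLINE_CAP and the sketch_* functions are hypothetical. The string starts with inline storage and moves to a heap block with a leading reference count once an append no longer fits, and that count is written directly, as in the allocate() code quoted earlier.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { SKETCH_SHORT = 0, SKETCH_REFCOUNTED = 1 };
#define SKETCH_INLINE_CAP 7

typedef struct {
    int flags;                              /* SKETCH_SHORT or SKETCH_REFCOUNTED */
    size_t length;
    char inline_buf[SKETCH_INLINE_CAP + 1]; /* used while the string is short */
    int32_t *block;                         /* heap block: refcount, then chars */
} SketchString;

void sketch_init(SketchString *s)
{
    s->flags = SKETCH_SHORT;
    s->length = 0;
    s->inline_buf[0] = '\0';
    s->block = NULL;
}

/* Returns the character storage currently in use. */
char *sketch_chars(SketchString *s)
{
    return (s->flags & SKETCH_REFCOUNTED) ? (char *)(s->block + 1) : s->inline_buf;
}

/* Appends text; switches to refcounted heap storage when it no longer fits inline. */
int sketch_append(SketchString *s, const char *text)
{
    size_t add = strlen(text);
    size_t new_len = s->length + add;
    int32_t *block;
    char *dst;

    if (!(s->flags & SKETCH_REFCOUNTED) && new_len <= SKETCH_INLINE_CAP) {
        memcpy(s->inline_buf + s->length, text, add + 1);
        s->length = new_len;
        return 0;
    }
    block = malloc(sizeof(int32_t) + new_len + 1);
    if (block == NULL) {
        return -1;
    }
    *block = 1;                              /* refcount set directly, no inc call */
    dst = (char *)(block + 1);
    memcpy(dst, sketch_chars(s), s->length); /* copy old contents first */
    memcpy(dst + s->length, text, add + 1);
    if (s->flags & SKETCH_REFCOUNTED) {
        free(s->block);                      /* this sketch never shares blocks */
    }
    s->block = block;
    s->flags |= SKETCH_REFCOUNTED;
    s->length = new_len;
    return 0;
}

/* Releases heap storage; only the refcounted case touches the count. */
void sketch_destroy(SketchString *s)
{
    if ((s->flags & SKETCH_REFCOUNTED) && --(*s->block) == 0) {
        free(s->block);
    }
    sketch_init(s);
}
```

A string built this way is short at construction and refcounted at destruction, and its count was never passed through an increment function, which matches the pattern reported in the earlier comments.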

Regarding the UNCONFIRMED state of this (and other) issues: Normally
developers don't have unconfirmed issues assigned to them (if not
directly assigned by someone knowing them to be responsible for a
specific area), and QA members pick unconfirmed issues and try to
verify whether the issue is a real issue or a duplicate or whatever,
and then confirm the issue and forward it to a developer. This is a
bit quirky in your case because most likely there is no one else who'd
verify the issues because you're probably the only one who's building
for NetBSD.

I'll change that state for this issue, since I can't do anything else
than believe you ;-)

Comment 7 Unknown 2003-09-09 14:32:25 UTC

Actually, I have already tried this. 
 
Since the mutexes in question are declared as static I have initialized them with 
PTHREAD_MUTEX_INITIALIZER (if you look back to my original report). 
 
What occurs is that genbrk goes further but eventually core dumps due to a bad
pointer elsewhere. I'll go back and do this again and try to give some useful
details when I get time.

Comment 8 ooo 2003-11-26 13:10:29 UTC

Hi Tyler,

Since there isn't much I can do about this, I reassign this issue to
you. Please update if there is any new information available, I'm on CC. 

Adding http://www.jtcsv.com/cgibin/icu-bugs?findid=3232 here as a
quick link to the ICU bug tracking system.

Thanks
Eike

Comment 9 michael.bemmer 2003-12-19 12:08:21 UTC

Tyler, as the target is OOo1.1.1 and there hasn't been feedback for a while, I
re-target this one to OOo1.1.2 now.

Comment 10 ooo 2004-01-06 12:03:56 UTC

Hi Michael,

But why set to RESOLVED INVALID?

Eike

Comment 11 michael.bemmer 2004-01-06 14:23:32 UTC

Eike, I don't know :-) Just wanted to change the target.

Comment 12 Martin Hollmichel 2004-04-23 17:32:53 UTC

retarget to 1.1.3, we are running out of time for 1.1.2

Comment 13 pavel 2004-08-17 11:56:08 UTC

Since this is an ICU bug, I do not think we can solve it in time for 1.1.x.

They seem to have fixed it in ICU 2.8 by rewriting the synchronization code (see
http://www.jtcsv.com/cgibin/icu-bugs?findid=3014), so we won't be able to fix this
properly for 1.1.x.

Retargeting to OOoLater.

2.0 now uses ICU 2.6.

ttyler: can you reproduce the same on 2.0?

Comment 14 pnaulls 2004-08-20 11:34:43 UTC

I'm seeing this under ARM Linux running Debian unstable whilst building 2.0 :-|

Comment 15 Martin Hollmichel 2004-10-29 15:45:30 UTC

chg: target to PleaseHelp.

Comment 16 ooo 2006-08-02 11:13:32 UTC

No additional insights for almost 2 years. Though OOo currently (2.0.3/4)
still uses ICU 2.6, we're upgrading to 3.4/3.6 now, see CWS 'icuupgrade'. So if
the mutex code was rewritten in ICU 2.8 we'll benefit from it. I'm closing this
issue now.

Comment 17 ooo 2006-08-02 11:13:56 UTC

Closing.