Apache OpenOffice (AOO) Bugzilla – Issue 117252
Data race due to non-atomic half-word accesses in sal/osl/unx/thread.c
Last modified: 2013-01-29 21:38:54 UTC
thread.c defines Thread_Impl as: typedef struct osl_thread_impl_st { pthread_t m_hThread; sal_uInt16 m_Ident; /* @@@ see TODO @@@ */ short m_Flags; oslWorkerFunction m_WorkerFunction; void* m_pData; pthread_mutex_t m_Lock; pthread_cond_t m_Cond; } Thread_Impl; m_Ident and m_Flags are half-words mapping onto the same 32-bit word. Therefore writes to one of these half-words could conflict with writes to the other half word. Suggestion: add padding members to have each member in its own word.
Created attachment 76040 [details] Adds padding members
Hi Oli, as far as I can see, all accesses to m_Ident respectively m_flags are either protected by acquiring m_Lock or be ensuring that only one reference exists. Am I missing anything?
I'm currently evaluating Intel Inspector on our code using UNO, and it's telling me the following: Data race between the following lines (and in this execution order): osl_joinWithThread: pImpl->m_Flags &= ~THREADIMPL_FLAGS_ATTACHED; and osl_getThreadIdentifier: Ident = pImpl->m_Ident; Sadly the tool didn't report complete stack traces, so I don't know who's calling joinWithThread and getThreadIdentifier. So I assume the following: thread B wants to join thread A and write the m_Flags member (half-word write) thread A is still executing and reads its thread id -> data race However, the thread id should still be valid, even without adding padding. The half-word write can't temporarily destroy the other half-word, can it?
What is unclean in the code (and apparently triggers the complaint) is that osl_getThreadIdentifier accesses m_Ident with m_Lock not locked (while osl_thread_start_Impl accesses it with m_Lock locked). One could argue that the complaint is false, though: For one, proper visibility by all threads of m_Ident written by osl_thread_start_Impl is guaranteed at read of m_Ident in osl_getThreadIdentifier, as there is synchronization on m_Cond/m_Lock in between. For another, modifications of m_Flags should not clobber m_Ident (it would be a problem if m_Ident written to instead of only read from in osl_getThreadIdentifier). Anyway, changing osl_getThreadIdentifier to lock m_Lock before accessing m_Ident would make the code cleaner.
Writes to m_Flags may not change m_Ident, even not temporarily because out-of-order-execution or instruction reordering. Locking the structure likely fixes the complaint, another fix is adding some padding as suggested by Oli. ->Oli: Could you verify that acquiring the lock before accessing m_Ident fixes the complaint? ->SB: As you said, the systematical right fix is acquiring the lock. Though for performance reasons I suggest to go with the padding approach. Comments?
"Writes to m_Flags may not change m_Ident, even not temporarily [due to] out-of-order-execution or instruction reordering." At least theoretically this is wrong. There is nothing in C++ nor Pthreads that would prohibit this. Practically though, this is more than unlikely (hence I used "should not"). "[The] systematical right fix is acquiring the lock. Though for performance reasons I suggest to go with the padding approach." I would opt for locking. For one, stick to "first make it right, then make it fast" (what makes you assume that locking here will degrade overall performance?). For another, padding would not make it that much better: as we argued, for practical purposes padding should not be needed; but for theoretical purposes, padding 16 bits between m_Ident and m_Flags would not change anything; the only thing that would be improved is the interaction with tools like Intel Inspector (which is improved by locking just as well).
Padding does address the complaints, practically solving this issue without the need to worry anymore, changing performance characteristics may lead to rework ;-) Anyway, please take over and change according to your preferences.
I checked the padding approach, this seems to satisfy the Inspector. I will not try the locking approach, since this will work anyway to get rid of the error.
getting rid of value "enhancement" for field "severity". For enhancement the field "issue type" shall be used.