Apache OpenOffice (AOO) Bugzilla – Issue 26595
PrinterInfoManager/SystemQueueInfo race condition
Last modified: 2006-04-04 20:59:15 UTC
In debugging the Mac OS X printer issues, there appears to be a race condition in printerinfomanager.cxx. PrinterInfoManager object: 1) c'tor creates the SystemQueueInfo object/thread a) SystemQueueInfo object calls Thread::create(), which then creates the thread and schedules run() 2) c'tor calls initialize() 3) initialize() does lots of stuff 4) initialize() calls getSystemPrintQueues() and retrieves what hopefully are the results of SystemQueueInfo::run() Somewhere between 1 and 4, run() should execute. However, there is no way to prevent PrinterInfoManager from knowing whether run() has finished or not. Theoretically, PrinterInfoManager functions should block on run() completing successfully before these functions attempt to access its data. I am observing behavior where SystemQueueInfo::run() does not complete its run by the time step (4) comes around. Therefore, there are no print queues returned to PrinterInfoManager object and it does not report any queues to OOo the first run through. I am also observing behavior where when the /etc/cups/printers.conf watch file is changed, that the PrinterInfoManager object is too quickly calling getSystemPrintQueues() in initialize() and hasChanged still != 0. The remedy seems to be this patch, although it still doesn't account for the lag time between when create() is called and when run() actually is called, which may or may not cause the race condition to still occur. An ideal situation would be one where the PrinterInfoManager has a callback of sorts that updates its printer list, rather than having the PrinterInfoManager keep checking SystemQueueInfo. ie, SQI notifies the PIM it has complete data. This would solve the possible problem of the system queue info querying (trying all print commands for system queues) taking a longer time than the PIM can cope with as well.
Created attachment 13863 [details] patch to make race condition much less likely
cp/pl: are there any problems with moving the Mutex up in the code? Before the patch, the Mutex was quite localized and only protected the actual data that is changed. With the patch, the whole routine is of course inside the Mutex. The desired effect here is to ensure that ->hasChanged() does not return until any pending run() is complete...
This bug pretty much blocks OS X release of 1.1.1 since no system print queues show up otherwise.
I think we shouldn't apply this as the whole point of the thread is that hasChanged may return befor it has finished. The problem being solved here is that the thread may never finish due to a hanging lpc command; if that would always finish the thread wouldn't be needed at all.
How do no system queues show up ? Every time a frame (e.g. dialogue, document) gets the focus, PrinterInfoManager::checkPrintersChanged should get called and if printers are available by then, then the application is told so. Is probably something amiss so that checkPrintersChanged does not really get called ?
cp->fa: please check as Philipp suggested, the thread is not required to terminate early but essentially only befor you actually try to print. Is the thread still running when you try to do so ? I wouldn't mind if you change the code for Mac only but "hanging print queues" _is_ an issue so that I cannot accept to move the mutex in general.
according to IRC agreement, retarget to 1.1.2 to get proper fix and not hold 1.1.1 release for main platforms. Final patch will be applied manually to MacOS X build.
Hi Philipp, I am trying to figure this out as well. I never see the problem that Dan describes and I am worried that if we apply tha patch, some users may see a long hang or startup delay. I noticed you said that the race is not improtant since every focus in on a new frame updates the information. When I looked at that code I see the following: if( GetSalData()->pFirstInstance_->maInstData.mbPrinterInit ) vcl_sal::PrinterUpdate::update(); So mbPrinterInit must be true for this to happen. Since that value is initialized to false, it must be first set to true. I see this happening in: CreateInfoPrinter CreatePrinter GetPrinterQueueInfo GetPrinterQueueState GetDefaultPrinter. My question is this, on a clean start after a fresh install (with no use of spadmin) which of these routines gets called first. Could the race be that when the File->Printer Settings focus is done, that mbPrinterInit is not true yet (no jobs have printed yet, no timers have expired yet). Since mbPrinterInit is never changed after that, perhaps we should simply be enabling update() constantly for any new frame focus in and checking the mbPrinterInit value only at the last possible moment someplace later in the code. Kevin
CreateInfoPrinter and CreatePrinter are called whenever an application creates a Printer object; writer does this per default on startup, some others do also but e.g. impress does not AFAIK. When an application wants to list the existing printers (e.g. for the printer settings dialog) GetPrinterQueueInfo gets called; if you startup writer it is extremely unlikely that you could fire up the printer dialog before the document gets the focus - unless focus events are somehow not propagated. An fprintf in vcl_sal::PrinterUpdate::update() should clarify things a bit i think; also a breakpoint might show whether it gets called and by whom - if gdb can be trusted to actually stop which i must sadly say is not always the case at least on x86.
set platform.
Hi, Adding myself to CC on this one since this patch holds up the Panther printing patch which depdends on it. Perhaps we can simply ifdef MACOSX here so that other paltforms are not affected since empirically the patch that Dan created is needed for proper working of MacOSX. Ideas? Kevin
Hi Dan, I ifdef'd your MutextGuard change for just MacOSX so that no other platforms would be affected and then added a FIXME comment to see why this is not needed for other platforms. I integrated your race fix into the Panther printing patches which were tested and found not to impact Jaguar users. So resolving this as fixed but we should remember to revisit this to figure out why this was needed for just us. Thanks, Kevin
close issue