Approximately 10-15% of the time, LibreOffice deadlocks on startup; based on the backtraces, it appears to happen while it is loading fonts. Neither writer nor the splash page process is chewing CPU; it just hangs completely. Previously in OpenOffice we had some similar problems, and deleting the fontcache file on each launch seemed to make it work better, but that technique is not working here. I'm attaching the two backtraces. I'm watching for a pattern.
Created attachment 57114: First Backtrace. Splash page starts, then both splash and writer halt.
Created attachment 57115: Second Backtrace
Also, this is running on the same server as OpenOffice and has access to the same fonts as OpenOffice. OpenOffice 3.3 is not experiencing this issue.
As Caolan says, looks like some potential configmgr deadlock.
#3 0x00007fd19495e7d0 in osl_acquireMutex () from /opt/libreoffice3.5/program/../ure-link/lib/libuno_sal.so.3
#4 0x00007fd1882a0a19 in osl_waitCondition () from /opt/libreoffice3.5/program/../program/configmgr.uno.so
#5 0x00007fd190d4f038 in utl::DefaultFontConfiguration::tryLocale(com::sun::star::lang::Locale const&, rtl::OUString const&) const () from /opt/libreoffice3.5/program/libutllo.so
#6 0x00007fd190d4f665 in utl::DefaultFontConfiguration::getDefaultFont(com::sun::star::lang::Locale const&, int) const ()

looks garbled - utl's tryLocale seems likely, but it calls directly via the vtable into configmgr, presumably something like:

    Reference< XNameAccess > xNode;
    if ( m_xConfigAccess->hasByName( it->second.aConfigLocaleString ) )
@all: Sberg was kind enough to send me configmgr.uno.so which I will install tomorrow and replicate.
Another thing that comes to mind is threadsafe statics. The thread that is completely within configmgr (apparently within configmgr::Components::WriteThread::run) calls __cxa_guard_acquire, i.e., comes across a local static variable (with non-trivial ctor), likely typeNames in writeNode (configmgr/source/writemodfile.cxx) or theLock in lock (configmgr/source/lock.cxx).

The main thread is in the SwModule ctor, which it must reach via

- SwModule::SwModule
- SwDLL::SwDLL
- (anonymous namespace)::SwDLLInstance::SwDLLInstance
- rtl::Static<{anonymous}::SwDLLInstance, {anonymous}::theSwDLLInstance>::get(void)
- SwGlobals::ensure
[...]

i.e., it also is within a local static ctor in rtl::Static::get (rtl/instance.hxx; thanks to HAVE_THREADSAFE_STATICS being generally enabled on Linux, cf. configure.in).

Now, "Some C++ runtimes use a single lock for all static variables, which can cause deadlock in multi-threaded applications." (cf. configure.in; and e.g., Mac OS X is known to be affected by this problem). It is not entirely clear to me which Linux GCC versions are affected by this problem (recent versions are known to no longer have this defect). But the LO Linux installation sets available from <http://www.libreoffice.org/download/> are built with a rather old GCC toolchain (cf. comment 3 to bug 45696), and I do not know whether that might still exhibit this problem. (OOo is far more conservative wrt exploiting -fthreadsafe-statics, i.e., it does not have the HAVE_THREADSAFE_STATICS optimizations in rtl/instance.hxx, so would not exhibit this problem.)
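To make the suspected interaction concrete, here is a minimal sketch of the pattern described above. It is not the configure.in test program and not LibreOffice code; Inner, Outer, inner_value and nested_static_init are hypothetical names. One thread sits inside the ctor of a local static while a second thread tries to initialize a different local static. If the runtime used a single lock for all static guards, the helper thread would block in __cxa_guard_acquire and the join would never return.

```cpp
#include <thread>

namespace {

struct Inner {
    int value;
    Inner() : value(42) {}
};

// Initializes a local static that is unrelated to `outer` below.
int inner_value() {
    static Inner inner;  // GCC guards this with __cxa_guard_acquire
    return inner.value;
}

struct Outer {
    int value;
    Outer() : value(0) {
        // While this ctor runs, the guard for `outer` is still held by the
        // calling thread. The helper thread needs the guard for `inner`.
        // With one runtime-wide lock for all guards, the helper would wait
        // for that lock while this thread waits in join(): a deadlock.
        std::thread helper([this] { value = inner_value(); });
        helper.join();
    }
};

}  // namespace

// Returns only if the runtime uses independent guards per static variable.
int nested_static_init() {
    static Outer outer;
    return outer.value;
}
```

Built with something like g++ -std=c++11 -pthread and called from main, nested_static_init() is expected to return on runtimes with per-variable guards; on a runtime with the single-lock behaviour quoted above it would be expected to hang instead.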
And indeed, compiling the C++ test program at <http://cgit.freedesktop.org/libreoffice/core/tree/configure.in?id=3ac780d8a2f8d1b94e9b4776d7f556274d3197dc#n4221> makes it run fine for me (Fedora 16 x86_64, based on GCC 4.6.2), but running it with LD_LIBRARY_PATH=/opt/libreoffice3.5/ure/lib (so that it picks up the libstdc++.so.6 and libgcc_s.so.1 GCC standard libraries from the official LO installation set, which come from a rather old GCC toolchain) makes it hang. That is, the assumption that all Linux GCC >= 4 have a working -fthreadsafe-statics is apparently wrong. Need to dig out the exact version where the problem got fixed.

Dave, a quick workaround should be to move away /opt/libreoffice3.5/ure/lib/{libgcc_s.so.1,libstdc++.so.6}. The deadlocks should hopefully go away then.
<mmeeks> dave_largo: ping ?
<dave_largo> mmeeks: Hey! What's up?
<mmeeks> dave_largo: did you manage to verify if removing those libraries fixes the deadlock on start for you ?
<dave_largo> Yes, deadlocks have stopped

As such this should be automatically fixed in 3.5.1 where we stop shipping those libraries I believe.
So resolving as a duplicate of the bug about removing the stdlibs.

*** This bug has been marked as a duplicate of bug 46246 ***
No, this is not really fixed by removing the stdlibs that ship with LO. (That only happens to make the bug go away, on systems with sufficiently recent stdlibs.) The real problem is the wrong way configure determines whether GCC -fthreadsafe-statics are actually usable (w/o causing deadlocks) on a given build platform. That still needs fixing.
> No, this is not really fixed by removing the stdlibs that ship with LO.
> (That only happens to make the bug go away, on systems with sufficiently
> recent stdlibs.)

Ok ;-) the option of shipping very recent stdlibs with the distribution instead of horribly old ones was considered & rejected IIRC; that would surely help in this case.

> The real problem is the wrong way configure determines whether
> GCC -fthreadsafe-statics are actually usable (w/o causing deadlocks)
> on a given build platform. That still needs fixing.

I guess, or we could add a --without-threadsafe-statics parameter that would force this off for the generic-linux build; it is after all a somewhat esoteric situation. Perhaps Fridrich might help out with that ?
Improved the configure check for a broken -fthreadsafe-statics now on master with <http://cgit.freedesktop.org/libreoffice/core/commit/?id=f78cb7da33a9f69e865b28b55a212bf1d11b1d7d> "Improve check for broken -fthreadsafe-statics." That should cause future Linux builds (starting with LO 3.5.1, once this fix is backported to libreoffice-3-5) available from <http://www.libreoffice.org/download/> to use more conservative code that cannot lead to these deadlocks.
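The "more conservative code" is not shown in this thread. As a rough illustration of the general idea only, and not the actual rtl/instance.hxx implementation, the sketch below creates a singleton under an explicit mutex with double-checked locking, so it does not depend on the compiler's guards for local statics. Config, get_config, g_instance and g_init_mutex are hypothetical names, and the sketch assumes a C++11 toolchain in which the two namespace-scope objects are constant-initialized.

```cpp
#include <atomic>
#include <mutex>

namespace {

struct Config {
    int value;
    Config() : value(7) {}
};

// Namespace-scope objects with constexpr constructors; the assumption here
// is that they are constant-initialized, so no runtime guard is involved.
std::atomic<Config*> g_instance(nullptr);
std::mutex g_init_mutex;

}  // namespace

// Lazily creates the single Config instance without a function-local static.
Config& get_config() {
    Config* p = g_instance.load(std::memory_order_acquire);
    if (p == nullptr) {
        // Slow path: only this one dedicated mutex is taken, not a lock
        // shared with unrelated static initializations.
        std::lock_guard<std::mutex> lock(g_init_mutex);
        p = g_instance.load(std::memory_order_relaxed);
        if (p == nullptr) {
            p = new Config();  // intentionally never deleted
            g_instance.store(p, std::memory_order_release);
        }
    }
    return *p;
}
```

The trade-off in this sketch is an explicit lock and an intentionally leaked object in exchange for not relying on -fthreadsafe-statics behaving correctly in whatever runtime is picked up at load time.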
Backported fix to libreoffice-3-5 (towards LO 3.5.1) as <http://cgit.freedesktop.org/libreoffice/core/commit/?h=libreoffice-3-5&id=bb0f6b0c7c745264da38b91e0eca39a6f5ad934d> "Improve check for broken -fthreadsafe-statics."