Bug 46129 - DeadLock At Startup While Loading Fonts
Summary: DeadLock At Startup While Loading Fonts
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: UI
Version (earliest affected): 3.5.0 release
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assignee: Stephan Bergmann
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-02-15 12:11 UTC by Dave Richards
Modified: 2012-02-20 06:01 UTC
CC List: 2 users

See Also:
Crash report or crash signature:


Attachments
First Backtrace (9.27 KB, text/plain)
2012-02-15 12:12 UTC, Dave Richards
Second Backtrace (9.29 KB, text/plain)
2012-02-15 12:12 UTC, Dave Richards

Description Dave Richards 2012-02-15 12:11:28 UTC
Approximately 10-15% of the time, LibreOffice deadlocks at startup, and based on the backtraces it appears to happen while fonts are being loaded.  Neither the Writer process nor the splash page process is using any CPU; it just hangs completely.

Previously, in OpenOffice, we had some similar problems, and deleting the fontcache file before each launch seemed to make things better.  But that technique is not working here.

I'm attaching the two backtraces.  I'm watching for a pattern.
Comment 1 Dave Richards 2012-02-15 12:12:15 UTC
Created attachment 57114 [details]
First Backtrace

The splash page starts, then both the splash process and Writer halt.
Comment 2 Dave Richards 2012-02-15 12:12:41 UTC
Created attachment 57115 [details]
Second Backtrace
Comment 3 Dave Richards 2012-02-15 12:13:34 UTC
Also, this is running on the same server as OpenOffice and has access to the same fonts as OpenOffice.  OpenOffice 3.3 is not experiencing this issue.
Comment 4 Michael Meeks 2012-02-15 12:17:28 UTC
As Caolan says, looks like some potential configmgr deadlock.
Comment 5 Michael Meeks 2012-02-15 12:29:23 UTC
#3  0x00007fd19495e7d0 in osl_acquireMutex () from /opt/libreoffice3.5/program/../ure-link/lib/libuno_sal.so.3
#4  0x00007fd1882a0a19 in osl_waitCondition () from /opt/libreoffice3.5/program/../program/configmgr.uno.so
#5  0x00007fd190d4f038 in utl::DefaultFontConfiguration::tryLocale(com::sun::star::lang::Locale const&, rtl::OUString const&) const () from /opt/libreoffice3.5/program/libutllo.so
#6  0x00007fd190d4f665 in utl::DefaultFontConfiguration::getDefaultFont(com::sun::star::lang::Locale const&, int) const ()

This looks garbled: utl's tryLocale seems a likely caller, but it calls into configmgr directly via the vtable, presumably with something like:

       Reference< XNameAccess > xNode;
       if ( m_xConfigAccess->hasByName( it->second.aConfigLocaleString ) )
Comment 6 Dave Richards 2012-02-15 13:21:39 UTC
@all:  Sberg was kind enough to send me a configmgr.uno.so, which I will install tomorrow and use to try to reproduce the problem.
Comment 7 Stephan Bergmann 2012-02-15 14:12:45 UTC
Another thing that comes to mind is threadsafe statics.  The thread that is completely within configmgr (apparently within configmgr::Components::WriteThread::run) calls __cxa_guard_acquire, i.e., comes across a local static variable (with non-trivial ctor), likely typeNames in writeNode (configmgr/source/writemodfile.cxx) or theLock in lock (configmgr/source/lock.cxx).

The main thread is in the SwModule ctor, which it must reach via

- SwModule::SwModule
- SwDLL::SwDLL
- (anonymous namespace)::SwDLLInstance::SwDLLInstance
- rtl::Static<{anonymous}::SwDLLInstance, {anonymous}::theSwDLLInstance>::get(void)
- SwGlobals::ensure
[...]

i.e., it also is within a local static ctor in rtl::Static::get (rtl/instance.hxx; thanks to HAVE_THREADSAFE_STATICS being generally enabled on Linux, cf. configure.in).

Now, "Some C++ runtimes use a single lock for all static variables, which can cause deadlock in multi-threaded applications." (cf. configure.in; and e.g., Mac OS X is known to be affected by this problem).  It is not entirely clear to me which Linux GCC versions are affected by this problem (recent versions are known to no longer have this defect).

But the LO Linux installation sets available from <http://www.libreoffice.org/download/> are built with a rather old GCC toolchain (cf. comment 3 to bug 45696), and I do not know whether that might still exhibit this problem.

(OOo is far more conservative wrt exploiting -fthreadsafe-statics, i.e., it does not have the HAVE_THREADSAFE_STATICS optimizations in rtl/instance.hxx, so would not exhibit this problem.)
Comment 8 Stephan Bergmann 2012-02-15 23:51:42 UTC
And indeed, compiling and running the C++ test program at <http://cgit.freedesktop.org/libreoffice/core/tree/configure.in?id=3ac780d8a2f8d1b94e9b4776d7f556274d3197dc#n4221> works fine for me (Fedora 16 x86_64, based on GCC 4.6.2), but running it with LD_LIBRARY_PATH=/opt/libreoffice3.5/ure/lib (so that it picks up the libstdc++.so.6 and libgcc_s.so.1 GCC standard libraries from the official LO installation set, which come from a rather old GCC toolchain) makes it hang.

That is, the assumption that all Linux GCC >= 4 have a working -fthreadsafe-statics is apparently wrong.  Need to dig out the exact version where the problem got fixed.

Dave, a quick workaround should be to move away /opt/libreoffice3.5/ure/lib/{libgcc_s.so.1,libstdc++.so.6}.  The deadlocks should hopefully go away then.
Comment 9 Michael Meeks 2012-02-17 13:15:35 UTC
<mmeeks> dave_largo: ping ?
<dave_largo> mmeeks: Hey!  What's up?
<mmeeks> dave_largo: did you manage to verify if removing those libraries fixes the deadlock on start for you ?
<dave_largo> Yes, deadlocks have stopped

So this should be fixed automatically in 3.5.1, where, I believe, we stop shipping those libraries.
Comment 10 Michael Meeks 2012-02-17 13:23:40 UTC
So resolving as a duplicate of the bug about removing the bundled stdlibs.

*** This bug has been marked as a duplicate of bug 46246 ***
Comment 11 Stephan Bergmann 2012-02-20 00:13:05 UTC
No, this is not really fixed by removing the stdlibs that ship with LO.  (That only happens to make the bug go away, on systems with sufficiently recent stdlibs.)  The real problem is the wrong way configure determines whether GCC -fthreadsafe-statics are actually usable (w/o causing deadlocks) on a given build platform.  That still needs fixing.
Comment 12 Michael Meeks 2012-02-20 03:48:01 UTC
> No, this is not really fixed by removing the stdlibs that ship with LO.
> (That only happens to make the bug go away, on systems with sufficiently
> recent stdlibs.)

Ok ;-) The option of shipping very recent stdlibs with the distribution instead of a horribly old one was considered and rejected, IIRC; that would surely help in this case.

>  The real problem is the wrong way configure determines whether
> GCC -fthreadsafe-statics are actually usable (w/o causing deadlocks)
> on a given build platform.  That still needs fixing.

I guess, or we could add a --without-threadsafe-statics parameter, that would force this off for the generic-linux build; it is after all a somewhat esoteric situation. Perhaps Fridrich might help out with that ?
Comment 13 Stephan Bergmann 2012-02-20 05:43:09 UTC
Improved the configure check for a broken -fthreadsafe-statics now on master with <http://cgit.freedesktop.org/libreoffice/core/commit/?id=f78cb7da33a9f69e865b28b55a212bf1d11b1d7d> "Improve check for broken -fthreadsafe-statics."  That should cause future Linux builds (starting with LO 3.5.1, once this fix is backported to libreoffice-3-5) available from <http://www.libreoffice.org/download/> to use more conservative code that cannot lead to these deadlocks.
Comment 14 Stephan Bergmann 2012-02-20 06:01:36 UTC
Backported fix to libreoffice-3-5 (towards LO 3.5.1) as <http://cgit.freedesktop.org/libreoffice/core/commit/?h=libreoffice-3-5&id=bb0f6b0c7c745264da38b91e0eca39a6f5ad934d> "Improve check for broken -fthreadsafe-statics."