Bug 69036 - segfault in osl_acquireMutex
Summary: segfault in osl_acquireMutex
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version: 4.2.0.0.alpha0+ Master (earliest affected)
Hardware: Other All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard: target:4.4.0
Keywords:
Depends on: 80205
Blocks:
Reported: 2013-09-06 13:46 UTC by Terrence Enger
Modified: 2014-06-29 12:53 UTC

See Also:
Crash report or crash signature:


Attachments
gdb on the core file (28.54 KB, text/plain)
2013-09-06 13:46 UTC, Terrence Enger
typescript: SIGSEGV under valgrind (251.31 KB, text/plain)
2013-09-11 12:43 UTC, Terrence Enger

Description Terrence Enger 2013-09-06 13:46:54 UTC
Created attachment 85355
gdb on the core file

STR, short version:
(*) Create database
(*) Create labels document
(*) Close labels document
(*) Close database.  Program segfaults.

In more detail, to reproduce the problem:

( 1) Download filing-labels.ods, the first attachment to bug 68912
     "Next Record".

( 2) Run LibreOffice from the command line with options
         --norestore --base.  

     Program displays "Database Wizard" step 1 "Select database".

( 3) Select "What do you want to do?" = "Connect to an existing
     database", in the dropdown list select Spreadsheet, and click
     <Next>.

     Program displays "Database Wizard" step 2 "Set up Spreadsheet
     connection".

( 4) Browse to the downloaded file and <Open> it.

     The Location control of "Database Wizard" shows the full pathname
     of the file.

( 5) Click <Next>.

     Program displays "Database Wizard" step 3 "Save and proceed" with
     "Yes, register the database for me" and "Open the database for
     editing" selected.

( 6) Click <Finish>.

     Program displays Save dialog.

( 7) For Name type /tmp/a.odb and click <Save>.

     Program displays the database window "a.odb ...".

( 8) Take menu options File > New > Labels.

     Program displays Labels dialog open at tab Labels.

( 9) In the dropdown list Database, select a; in the dropdown list
     Table select Sheet1; in the dropdown list "Database field" select
     Label; and click <New Document>.

     Program displays Writer window "Untitled 1 ...".

(10) Close window "Untitled 1 ...".

     Program presents dialog "Save document?".

(11) Click <Close without saving>.

     Program returns focus to "a.odb ...".

(12) Close window "a.odb...".

     Program segfaults.  In thread 1, stack frame 0 has an
     improbable-looking parameter:

         __pthread_mutex_lock (mutex=0x99999999)

     The first frame within LibreOffice itself is ...

        #2  0x00843d1c in osl_acquireMutex (Mutex=0x99999999)
            at /home/terry/lo_hacking/git/libo2/sal/osl/unx/mutex.c:114
                nRet = 0
                pMutex = 0x99999999
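
A repeating-byte value such as 0x99999999 is what a pointer-sized read looks like when the memory behind it was filled with a single byte and never written again. The sketch below only illustrates that arithmetic for a 32-bit word, matching this 32-bit build. It assumes, as comment 2 suggests, that the fill byte is 0x99; the function names are hypothetical and nothing here is LibreOffice or glibc code.

```cpp
#include <cstdint>
#include <cstring>

// Simulate reading a 32-bit word from memory that a debug allocator
// (assumed, not verified) has filled with one repeating byte.
std::uint32_t poisonedWord32(unsigned char fillByte) {
    unsigned char raw[sizeof(std::uint32_t)];
    std::memset(raw, fillByte, sizeof raw);   // stand-in for allocator poisoning
    std::uint32_t word;
    std::memcpy(&word, raw, sizeof word);     // read back without aliasing issues
    return word;
}

// True when a 32-bit value matches the assumed 0x99 fill pattern.
bool looksPoisoned(std::uint32_t word) {
    return word == 0x99999999u;
}
```

With these helpers, `looksPoisoned(poisonedWord32(0x99))` holds, while an ordinary code address such as the frame address 0x00843d1c does not match.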


My LibreOffice is master commit 9e9693b, fetched 2013-09-03, configured with

    --enable-option-checking=fatal
    --enable-dbgutil
    --enable-crashdump
    --without-system-postgresql
    --without-myspell-dicts
    --without-help
    --with-extra-buildid
    --without-doxygen
    --with-external-tar=/home/terry/lo_hacking/git/src
    --disable-werror

built and executing on ubuntu-natty (11.04) 32-bit:

    $ uname -a
    Linux cougar-natty 2.6.38-16-generic #67-Ubuntu SMP Thu Sep 6 18:00:43 UTC 2012 i686 athlon i386 GNU/Linux

    $ gcc --version
    gcc (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2
    Copyright (C) 2010 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.  There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

    $ java -version
    java version "1.6.0_24"
    OpenJDK Runtime Environment (IcedTea6 1.11.5) (6b24-1.11.5-0ubuntu1~11.04.1)
    OpenJDK Client VM (build 20.0-b12, mixed mode, sharing)


I think bug 55566 "open two odb files with macro at "open document"
event -> crash" may be the same bug.  It differs in that its crash
happens during open rather than close.  It is similar in that:
(*) comment 1 mentions an .odb plus an .odt, like this report
(*) comment 12, frame 2 shows osl_acquireMutex.
Comment 1 Terrence Enger 2013-09-06 13:48:59 UTC
Lionel,

Is this perhaps a dup of bug 55566 "open two odb files with macro at "open document" event -> crash" ?

Thanks,
Terry.
Comment 2 Lionel Elie Mamane 2013-09-06 23:55:51 UTC
(In reply to comment #1)
> Is this perhaps a dup of bug 55566 "open two odb files with macro at "open
> document" event -> crash" ?

I don't think so.

In this bug, osl_acquireMutex is called on an uninitialised value: 0x99999999 is a special canary for uninitialised memory in debug builds on GNU libc (GNU/Linux). The problem is probably linked to how static values are initialised (and their C++ constructors called) in multi-threaded code spread over several dynamic libraries, a mechanism I do not understand well. The code looks like:

    static ItemHolder1* pHolder = new ItemHolder1();
    pHolder->impl_addItem(eItem);

The constructor of ItemHolder1 should call the Mutex constructor which should initialise the m_aLock Mutex which impl_addItem tries to lock (via ClearableGuard); it seems that in this specific case m_aLock is allocated but not initialised...
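
For reference, a minimal sketch of the pattern being described, using standard C++11 types rather than the LibreOffice ones. `ItemHolderSketch`, `holder()` and `holdItem()` are hypothetical names, `std::mutex` stands in for the osl Mutex and `std::lock_guard` for ClearableGuard. It shows the expected ordering only: the mutex member is constructed before any method can lock it, and the function-local static is initialised exactly once.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for ItemHolder1; not the LibreOffice class.
class ItemHolderSketch {
public:
    // Members, including m_aLock, are constructed before this body runs.
    ItemHolderSketch() = default;

    void addItem(int eItem) {
        std::lock_guard<std::mutex> aGuard(m_aLock);  // role of ClearableGuard
        m_aItems.push_back(eItem);
    }

    std::size_t count() {
        std::lock_guard<std::mutex> aGuard(m_aLock);
        return m_aItems.size();
    }

private:
    std::mutex m_aLock;
    std::vector<int> m_aItems;
};

// Function-local static: since C++11 the initialiser runs once, even if
// several threads reach this line at the same time. The object is leaked
// on purpose, so it is never destroyed during shutdown.
ItemHolderSketch& holder() {
    static ItemHolderSketch* pHolder = new ItemHolderSketch();
    return *pHolder;
}

// Add an item and report how many are held.
std::size_t holdItem(int eItem) {
    holder().addItem(eItem);
    return holder().count();
}
```

If the crash seen here really involves a lock that was allocated but never constructed, it would have to come from something outside this pattern, for example the call happening while the library's statics are in an unexpected state during shutdown.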

That's the local problem. But maybe (not sure) the local problem is triggered by a higher-level problem, which is that connectivity::calc::OCalcConnection::disposing tries to dispose of its connection to Calc, but Calc is already dead? I say that because I see in the backtrace:

SfxApplication::GetOrCreate
SfxObjectShell::Close

GetOrCreate goes and *creates* the SfxApplication, which is ... unexpected when called from a destructor / close method: what Close does is get the (freshly created) SfxApplication and erase all its ObjectShells... It seems pointless to create it just for that.

Which raises the question of order of destruction of LibreOffice components during application shutdown... How do we handle dependencies between "major" components (Calc, Base, ...) in there? The question is probably complicated by circular dependencies between Calc and Base (connectivity and/or dbaccess module): Calc can use Base datasources, but Base can use Calc as a datasource :)

Stephan, do you think (one of) these questions would fall under your area of expertise? If not, any idea who to consult?
Comment 3 Stephan Bergmann 2013-09-09 10:40:05 UTC
(In reply to comment #2)
> That's the local problem. But maybe (not sure) the local problem is
> triggered by a higher-level problem, which is that
> connectivity::calc::OCalcConnection::disposing tries to dispose of its
> connection to Calc, but Calc is already dead? I say that because I see in
> the backtrace:
> 
> SfxApplication::GetOrCreate
> SfxObjectShell::Close
> 
> GetOrCreate goes and *creates* the SfxApplication, which is ... unexpected
> when called from a destructor / close method: what Close does is get the
> (freshly created) SfxApplication and erase all its ObjectShells... It seems
> pointless to create it just for that.

Yes, the root problem here apparently is that SfxApplication is re-created during shutdown, while the UNO service manager is being disposed and in turn disposes all registered services.  Those UNO services must generally do as little as possible during disposing, as infrastructure they depend on during normal operation may already have been taken down during shutdown.  I have no idea what to do in this particular case, but it is ultimately dbaccess::ODatabaseContext::disposing (frame 36) that is doing "too much."
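
One way to picture "do as little as possible during disposing" is a close path that looks up the application without ever creating it. The sketch below is an illustration under that assumption, not the actual LibreOffice code or any patch for this bug: `AppSketch`, `closeShell()` and the integer shell ids are hypothetical.

```cpp
#include <algorithm>
#include <memory>
#include <vector>

// Hypothetical stand-in for SfxApplication.
class AppSketch {
public:
    // Lookup only: returns nullptr when no application exists.
    static AppSketch* Get() { return instance().get(); }

    // Creates the application on first use; meant for normal operation.
    static AppSketch* GetOrCreate() {
        if (!instance()) {
            instance().reset(new AppSketch());
            ++creations();
        }
        return instance().get();
    }

    // Tear the application down, as shutdown would.
    static void Destroy() { instance().reset(); }

    // Counts how often an application object was created.
    static int& creations() { static int n = 0; return n; }

    std::vector<int> shells;  // ids of registered object shells

private:
    static std::unique_ptr<AppSketch>& instance() {
        static std::unique_ptr<AppSketch> p;
        return p;
    }
};

// Close path that is safe during teardown: it never re-creates the app.
void closeShell(int id) {
    AppSketch* app = AppSketch::Get();
    if (!app)
        return;  // application already gone, nothing to deregister
    std::vector<int>& s = app->shells;
    s.erase(std::remove(s.begin(), s.end(), id), s.end());
}
```

Calling `closeShell()` after `AppSketch::Destroy()` is then a no-op and leaves `creations()` unchanged, whereas a close path built on `GetOrCreate()` would bring a fresh application to life in the middle of shutdown.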
Comment 4 Terrence Enger 2013-09-11 12:43:50 UTC
Created attachment 85628
typescript: SIGSEGV under valgrind

Attaching this just in case it is interesting.

Some high points ...

(*) ODatabaseContext::disposing appears several times

(*) line 1598: invalid read in com::sun::star::uno::BaseReference::is;
    ODatabaseContext::disposing is in the stack

(*) line 1650: segfault in com::sun::star::uno::BaseReference::is();
    ODatabaseContext::disposing is in the stack
Comment 5 Julien Nabet 2013-10-19 19:33:02 UTC
On a Debian x86-64 PC with master sources updated today, I tried to reproduce this but got no crash.
However, I noticed these traces:

warn:legacy.osl:13046:1:sw/source/ui/dbui/dbmgr.cxx:1629: Exception in SwDBMgr::GetColumnSupplier

warn:fwk:13046:1:framework/source/fwi/threadhelp/transactionmanager.cxx:312: TransactionManager...: Owner instance already closed. Call was rejected!
Comment 6 Joel Madero 2014-06-18 20:04:43 UTC
Should this be closed as WFM or set to NEW?
Comment 7 Terrence Enger 2014-06-18 23:44:27 UTC
Nobody else has seen the segfault, and I cannot see it now because bug
80205 "assertion in OUString::operator[] at ustring.hxx:421" happens
as I try step (4) of this bug.  So, as little as we like to have a bug
UNCONFIRMED for a long time, I think that UNCONFIRMED is the truest
description of the situation.

I encountered this bug as I was trying to confirm bug 68912 "EDITING:
Label-wizard - Next Recored doesn't work".  Now, I have found another
bug while trying out this one again.  There seems to be a pattern
here.
Comment 8 Julien Nabet 2014-06-21 07:00:52 UTC
On a Debian x86-64 PC with master sources updated yesterday (I've got enable-dbg), I still don't reproduce this :-(
As in my comment 5, I noticed these:
warn:legacy.osl:7131:1:sw/source/uibase/dbui/dbmgr.cxx:1631: Exception in SwDBManager::GetColumnSupplier
warn:fwk:7131:1:framework/source/fwi/threadhelp/transactionmanager.cxx:274: TransactionManager...: Owner instance already closed. Call was rejected!
warn:tools.debug:7131:1:tools/source/debug/debug.cxx:297: no DbgTestSolarMutex function set


Following Maxim's fix for fdo#80205, could you try again with master sources from yesterday or later?
Also, do you have accessibility enabled? I'm asking because some bugs are triggered by that part.

If you still reproduce this, a backtrace (bt) would be useful.
Comment 9 Terrence Enger 2014-06-21 12:58:49 UTC
With master commit dc795cb, fetched 2014-06-20 1642 UTC, there is no
crash.

Thank you, Julien, for your attention.
Comment 10 Commit Notification 2014-06-27 15:09:14 UTC
Norbert Thiebaud committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=01a882039ec4d0edf4da7d3e10ffea569a3e4aee

fdo#69036 do not try to create a sfxApplication when we are tearing-down



The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 11 Terrence Enger 2014-06-29 12:53:20 UTC
With master commit 924a28a, fetched 2014-06-28 0333 UTC, the crash is
gone.  Thank you, Norbert.