Bug 55974 - CRASH: segfault while closing an Impress file
Summary: CRASH: segfault while closing an Impress file
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Impress (show other bugs)
Version:
(earliest affected)
4.0.0.0.alpha0+ Master
Hardware: Other Linux (All)
: medium normal
Assignee: Caolán McNamara
URL:
Whiteboard: BSA target:4.1.0 target:4.0.0.2
Keywords: regression
: 57901 58815 58993 59590 (view as bug list)
Depends on:
Blocks: mab4.0
  Show dependency treegraph
 
Reported: 2012-10-14 15:42 UTC by Julien Nabet
Modified: 2013-01-19 21:58 UTC (History)
10 users (show)

See Also:
Crash report or crash signature:


Attachments
bt with symbols + valgrind traces (65.57 KB, application/x-compressed)
2012-10-14 15:42 UTC, Julien Nabet
Details
bt + valgrind new (29.02 KB, application/bzip2)
2012-12-06 22:03 UTC, Julien Nabet
Details
bt + console logs on 4.0 branch (21.40 KB, text/plain)
2012-12-24 16:43 UTC, Julien Nabet
Details
bt + console logs on master (21.82 KB, text/plain)
2012-12-25 09:35 UTC, Julien Nabet
Details
new valgrind log (37.97 KB, application/bzip2)
2012-12-25 10:20 UTC, Julien Nabet
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Julien Nabet 2012-10-14 15:42:10 UTC
Created attachment 68552 [details]
bt with symbols + valgrind traces

Problem description: 
On pc Debian x86-64 with master sources updated today and a brand new file, I've got a crash during closing a file from another bugreport.
No specific logs but bt and Valgrind traces attached

Steps to reproduce:
1. Open the file from this link https://bugs.freedesktop.org/attachment.cgi?id=39659 (fdo#31072)
2. Just close the file or whole LO (File/close or just the cross at top right)

Current behavior:
Crash

Expected behavior:
No crash
              
Browser: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.7) Gecko/20100101 Firefox/10.0.7 Iceweasel/10.0.7
Comment 1 Julien Nabet 2012-10-14 17:09:14 UTC
With 3.6 branch updated yesterday (commit 170521349f3d5e3b6cc16890d66d77121bfd0312), I don't reproduce this.
Comment 2 Korrawit Pruegsanusak 2012-11-03 08:18:53 UTC
NO crash with build from tinderbox @16, pull time 2012-11-01 23.27.25, Build ID: 1219bc on Windows 7 64bit.

No crash with both (1) closing file (Ctrl+W) and (2) closing libo (Alt+F4)
Comment 3 Julien Nabet 2012-11-13 21:15:10 UTC
I'm still reproducing this with an equivalent bt.

Rainer: would you have some time to give it a try? (if you have another env than Windows since Korrawit tried it)
Comment 4 Rob Snelders 2012-12-06 21:16:49 UTC
I don't get the crash with a new build on Ubuntu x86_84 LO 4.1.0-master (updated to commit 8e43393c9514f39e6b43e581503b61177a00bd36)
Comment 5 Julien Nabet 2012-12-06 22:03:46 UTC
Created attachment 71102 [details]
bt + valgrind new

With master sources updated this morning, I still reproduce this :-(

I attached bt + valgrind traces.
Comment 6 Rainer Bielefeld Retired 2012-12-07 05:06:46 UTC
No crash for me with
- "LibreOffice 3.6.4.3" German UI/ German Locale [Build-ID: 2ef5aff] {pull date 2012-11-28} on German WIN7 Home Premium (64bit) 
-  parallel installation of  "LOdev  4.0.0.0.alpha1+   -  ENGLISH UI / German Locale  [Build ID:d6579884752c298a18b7cdfcc7d1614b55c9c7c)]"  {tinderbox: Win-x86@6, pull time 2012-12-06 02:46:58} on German WIN7 Home Premium (64bit) with own separate User Profile
Comment 7 Joel Madero 2012-12-11 16:05:53 UTC
I also can't reproduce this, pulled yesterday and running Bodhi Linux
Comment 8 Terrence Enger 2012-12-12 05:02:13 UTC
Using master commit 8450a99 (2012-12-07) and an attachment
<https://bugs.freedesktop.org/attachment.cgi?id=71340> to bug 58142, I
get this same crash.  At least, frames 0 through 18 of the backtraces
are as similar as you can expect them to be.

For comparison, this bug seems not to be present in version 3.6.4.3
(Build ID: 2ef5aff).  I am setting keyword "regression".
Comment 9 Julien Nabet 2012-12-12 06:52:25 UTC
Terrence: thank you Terrence for your test! We can update status to "New" now.
Could you remind your env and version?

Could you also give a try to fdo#57901? I would say it could be a dup.
(https://bugs.freedesktop.org/show_bug.cgi?id=57901)
Comment 10 Terrence Enger 2012-12-12 11:41:58 UTC
Sorry for skimping on information.

My LibreOffice is master commit 8450a99, fetched around 2012-12-08 02:00 UTC, configured with

    --enable-dbgutil
    --enable-crashdump
    --disable-build-mozilla
    --without-system-postgresql
    --without-myspell-dicts
    --without-help
    --with-extra-buildid

built and running on Linux ubuntu-natty (11.04) 32-bit

    $ uname -a
    Linux cougar-natty 2.6.38-16-generic #67-Ubuntu SMP Thu Sep 6 18:00:43 UTC 2012 i686 athlon i386 GNU/Linux

    $ gcc --version
    gcc (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2
    Copyright (C) 2010 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.  There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

    $ java -version
    java version "1.6.0_24"
    OpenJDK Runtime Environment (IcedTea6 1.11.5) (6b24-1.11.5-0ubuntu1~11.04.1)
    OpenJDK Client VM (build 20.0-b12, mixed mode, sharing)
Comment 11 Terrence Enger 2012-12-12 11:55:17 UTC
*** Bug 57901 has been marked as a duplicate of this bug. ***
Comment 12 Julien Nabet 2012-12-12 12:43:49 UTC
Terrence: thank you for your feedback.

Here's my autogen.lastrun
--with-system-odbc
--with-system-mysql
--enable-symbols
--enable-ext-barcode
--enable-ext-diagram
--enable-ext-google-docs
--enable-ext-hunart
--enable-ext-nlpsolver
--enable-ext-ct2n
--enable-ext-numbertext
--enable-ext-oooblogger
--enable-ext-pdfimport
--enable-postgresql-sdbc
--enable-ext-presenter-console
--enable-ext-presenter-minimizer
--enable-ext-report-builder
--enable-ext-typo
--enable-ext-validator
--enable-ext-watch-window
--enable-ext-wiki-publisher
--enable-dbus
--enable-graphite
--enable-evolution2
--enable-werror
--enable-debug
--enable-dbgutil
--enable-crashdump
--enable-dependency-tracking
--enable-online-update 

So at first look, only --enable-dbgutil and --enable-crashdump seem common between 
It could help.

Michael: we have several bts, Valgrind traces, any idea what extra info could be useful?
Comment 13 Julien Nabet 2012-12-14 22:39:25 UTC
Just noticed fdo#47917 that I put in "see also"
Comment 14 Michael Meeks 2012-12-15 10:34:30 UTC
The valgrind traces are brilliant - I guess they show that we're double-deleting a load of data (for some reason) after the false-positives from Java this one looks like the first sign of real badness with a quick scan:

==7216== Invalid write of size 8
==7216==    at 0x2D5341F1: ScrollBar::SetScrollHdl(Link const&) (scrbar.hxx:138)
==7216==    by 0x2D717101: sd::slidesorter::controller::ScrollBarManager::Disconnect() (SlsScrollBarManager.cxx:108)
==7216==    by 0x2D73AFDE: sd::slidesorter::SlideSorter::ReleaseListeners() (SlideSorter.cxx:395)
==7216==    by 0x2D73A79C: sd::slidesorter::SlideSorter::~SlideSorter() (SlideSorter.cxx:218)
==7216==    by 0x2D73AAD9: sd::slidesorter::SlideSorter::~SlideSorter() (SlideSorter.cxx:246)
==7216==    by 0x2D73CF1D: void boost::checked_delete<sd::slidesorter::SlideSorter>(sd::slidesorter::SlideSorter*) (checked_delete.hpp:34)

I guess it is another symptom of VCL's horrific lack of sane lifecycle handling (somehow) - that this SlideSorter object -looks- like it keeps references to VCL widgets with some naive: ::boost::shared_ptr<ScrollBar> mpVerticalScrollBar;

It's unclear to me that those guys are going to get NULL'd properly when some other code-path destroys the whole window hierarchy and makes those pointer invalid.
Comment 15 Michael Meeks 2012-12-15 10:40:52 UTC
The widgets are deleted by this (fairly obvious) recursive delete on the top-level.

==7216==    at 0x4C279DC: operator delete(void*) (vg_replace_malloc.c:457)
==7216==    by 0xA0FA2AB: ScrollBar::~ScrollBar() (scrbar.cxx:157)
==7216==    by 0x8E1F1D3: VCLXDevice::DestroyOutputDevice() (vclxdevice.cxx:56)
==7216==    by 0x8E66F69: VCLXWindow::dispose() (vclxwindow.cxx:957)
==7216==    by 0x8E8C28B: VCLXScrollBar::dispose() (vclxwindows.cxx:3365)
==7216==    by 0x9001D30: UnoWrapper::WindowDestroyed(Window*) (unowrapper.cxx:263)
==7216==    by 0xA56F779: Window::~Window() (window.cxx:4329)
==7216==    by 0xA51EFEB: SystemWindow::~SystemWindow() (syswin.cxx:85)
==7216==    by 0xA598248: WorkWindow::~WorkWindow() (wrkwin.cxx:142)
==7216==    by 0xA59829B: WorkWindow::~WorkWindow() (wrkwin.cxx:150)
==7216==    by 0x2D5FCC0B: void boost::checked_delete<WorkWindow>(WorkWindow*) (checked_delete.hpp:34)
==7216==    by 0x2D5FF541: boost::detail::sp_counted_impl_p<WorkWindow>::dispose() (sp_counted_impl.hpp:78)
==7216==    by 0x2D3EA43B: boost::detail::sp_counted_base::release() (sp_counted_base_gcc_x86.hpp:145)
==7216==    by 0x2D3EA4CA: boost::detail::shared_count::~shared_count() (shared_count.hpp:217)
==7216==    by 0x2D5F98FB: boost::shared_ptr<Window>::~shared_ptr() (shared_ptr.hpp:168)
==7216==    by 0x2D5F6AF0: sd::framework::BasicViewFactory::~BasicViewFactory() (BasicViewFactory.cxx:143)
==7216==    by 0x2D5F6BD1: sd::framework::BasicViewFactory::~BasicViewFactory() (BasicViewFactory.cxx:145)


Somehow we need to add a 'dispose' method to the SlideSorter (I guess) and ensure that the top-level impress window is calling dispose on it's child SlideSorter object, and that that is called first (I guess).

What a horror :-) and all because VCL doesn't use a sane reference-counting lifecycle model.
Comment 16 Julien Nabet 2012-12-24 16:43:26 UTC
Created attachment 72081 [details]
bt + console logs on 4.0 branch

Here is a new bt with 4.0 updated today commit 917616f1e15f06533f7380a309b1ccea0920f8f6 (so includes "fdo#56980, fdo#58267 don't leave stale SdrObject refs around")
Comment 17 Julien Nabet 2012-12-24 16:44:39 UTC
David: you may be interested by this one.
I'm gonna "make clean && make" on my local master sources and try to provide results ASAP.
Comment 18 Julien Nabet 2012-12-25 09:35:45 UTC
Created attachment 72099 [details]
bt + console logs on master

bt with master sources (commit 4a3018e4eceb981aadbadbe3eadff4c17f018357)
Comment 19 Julien Nabet 2012-12-25 10:20:11 UTC
Created attachment 72102 [details]
new valgrind log

attached new Valgrind log
Comment 20 Terrence Enger 2013-01-03 22:50:19 UTC
Bug 58933 "segfault closing Draw document without saving changes" has a backtrace very similar to the one here for the top 18 frames or so.  For all I know, it might even be the same bug.

In particular, checked_delete is the top frame or the next one.

Terry.
Comment 21 Rob Snelders 2013-01-03 22:59:17 UTC
*** Bug 58993 has been marked as a duplicate of this bug. ***
Comment 22 Caolán McNamara 2013-01-16 15:48:20 UTC
no crash before http://cgit.freedesktop.org/libreoffice/core/commit/?id=aa1927dc257b52edf96de220cc3797e02c83a0ae but crash after
Comment 23 Caolán McNamara 2013-01-16 16:00:50 UTC
caolanm->tbehrens: could you check if the original intent of aa1927dc257b52edf96de220cc3797e02c83a0ae is still ok after the above commit.
Comment 24 Not Assigned 2013-01-16 16:02:56 UTC
Caolan McNamara committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=a0d852b2ade42289af1e9b066a48c97aedeff3b1

Resolves: fdo#55974 segfault while closing an Impress file



The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 25 Caolán McNamara 2013-01-16 16:06:44 UTC
commit of comment #24 I mean. Proposed 4-0 backport as https://gerrit.libreoffice.org/#/c/1718/
Comment 26 Caolán McNamara 2013-01-16 16:31:49 UTC
*** Bug 58815 has been marked as a duplicate of this bug. ***
Comment 27 Thorsten Behrens (allotropia) 2013-01-16 17:16:14 UTC
(In reply to comment #23)
> caolanm->tbehrens: could you check if the original intent of
> aa1927dc257b52edf96de220cc3797e02c83a0ae is still ok after the above commit.
>
Looks good on first check - thx for the fix!
Comment 28 Julien Nabet 2013-01-16 19:01:33 UTC
Caolán: thank you for the fix, I made the test again with master sources and it was ok.
Comment 29 Not Assigned 2013-01-17 11:12:22 UTC
Caolan McNamara committed a patch related to this issue.
It has been pushed to "libreoffice-4-0":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=7273244d44e7ab1e64b57cd3de7f89e72cbd9507&h=libreoffice-4-0

Resolves: fdo#55974 segfault while closing an Impress file


It will be available in LibreOffice 4.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 30 Jorendc 2013-01-19 21:58:39 UTC
*** Bug 59590 has been marked as a duplicate of this bug. ***