Bug 61688 - SIGABRT with debug build in VclBuilder::handleChild
Summary: SIGABRT with debug build in VclBuilder::handleChild
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice (show other bugs)
Version:
(earliest affected)
4.1.0.0.alpha0+ Master
Hardware: Other Linux (All)
: medium normal
Assignee: Caolán McNamara
URL:
Whiteboard: target:4.1.0
Keywords:
Depends on:
Blocks:
 
Reported: 2013-03-02 14:14 UTC by Terrence Enger
Modified: 2013-08-21 18:23 UTC (History)
3 users (show)

See Also:
Crash report or crash signature:


Attachments
excerpt from typescript with backtrace (13.49 KB, text/plain)
2013-03-02 14:14 UTC, Terrence Enger
Details
typescript with SIGABRT at line 119 (28.75 KB, text/plain)
2013-03-04 20:27 UTC, Terrence Enger
Details
.odb with minimal embedded HSQLDB (2.91 KB, application/vnd.sun.xml.base)
2013-03-05 01:23 UTC, Terrence Enger
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Terrence Enger 2013-03-02 14:14:14 UTC
Created attachment 75780 [details]
excerpt from typescript with backtrace

Steps to reproduce:

(1) In the program directory, issue shell commands
         source ./ooenv
         ./soffice.bin --norestore --writer

    Program displays writer document "Untitled 1".

(2) In the writer document, type ctrl-F12.

    Expected program action:  to display dialog Insert Table.

    Actual program action:  crash with signal 6, SIGABRT


I observed this with master commit b355324, pulled around 
2013-03-01 16:30 UTC, built and running on ubunt-natty (11.04) 32-bit,
configured as ...

    --enable-dbgutil
    --enable-crashdump
    --disable-build-mozilla
    --without-system-postgresql
    --without-myspell-dicts
    --without-help
    --with-extra-buildid


From the backtrace, I think the "application" code leading to the
failure is in frame 27:

    #27 0x045538c5 in VclBuilder::handleChild (this=0x9762778, pParent=0x97601c0, reader=...) at /home/terry/lo_hacking/git/libo4/vcl/source/window/builder.cxx:1523
8                            aChilds.push_back(pChild);
    1519                }
    1520
    1521                //sort child order within parent so that     tabbing
    1522                //between controls goes in a visually sensible sequence
    1523                std::stable_sort(aChilds.begin(), aChilds.end(), sortIntoBestTabTraversalOrder(this));
    1524                for (size_t i = 0; i < aChilds.size(); ++i)
    1525                    reorderWithinParent(*aChilds[i], i);
    1526            }
    1527        }


The appearance of stl iterators in the backtrace tells me that the
crash is related to using a debug build.  In at least one other recent
similar case, a non-debug build did not crash, but it did show the
problem under valgrind.
Comment 1 Julien Nabet 2013-03-03 17:47:08 UTC
On pc Debian x86-64 with master sources updated today (commit 763e94b584411fd856253a06c869a82702f8c89d), I don't reproduce this crash.
Comment 2 Terrence Enger 2013-03-04 20:27:46 UTC
Created attachment 75916 [details]
typescript with SIGABRT at line 119

I saw this in a different context, but am reporting it here because of
VclBuilder::handleChild
(/home/terry/lo_hacking/git/libo2/vcl/source/window/builder.cxx:1523)
in both backtraces.

To reproduce ...

(1) Download and open the attached .odb file, which contains a really
    minimal embedded HSQLDB database.

    Program displays Window democrash.odb,

(2) In the left pane, Database, click <Tables>.

    Program displays table name Table1 in the lower right pane.

(3) Right click on table name Table1 and in the pop-up menu, select
    Open.

    Program displays window Table Data View.

(4) In the Table Data View window, right click on column heading
    "words".

    Program presents pop-up menu.

(5) From the pop-up menu, select "Column Format".

    Expected program action: to display a dialog box.

    Actual program action: Signal 6, SIGABRT.


I now have master commit f5ca04c, pulled around 2013-03-04 16:20 UTC.
Comment 3 Terrence Enger 2013-03-05 01:23:07 UTC
Created attachment 75935 [details]
.odb with minimal embedded HSQLDB
Comment 4 Julien Nabet 2013-03-16 18:24:57 UTC
On pc Debian x86-64 with master sources, I followed comment 2 but didn't reproduce the crash.

Terrence: Just for the update, do you still reproduce this one with a more recent version?
Comment 5 Terrence Enger 2013-03-17 23:08:22 UTC
(In reply to comment #4)
> On pc Debian x86-64 with master sources, I followed comment 2 but didn't
> reproduce the crash.

Does your build use debugging STL objects?  One of my configuration
parameters, --enable-dbgutil IIRC, turns on a lot of checking in the c++
library.  If you are using the debugging STL objects without seeing the
crash, that is surprising and interesting.

Of course, I only noticed the extra checking within the system library when
I looked into the backtrace from a crash.

> 
> Terrence: Just for the update, do you still reproduce this one with a more
> recent version?

Yup, I reproduced it with master commit 51a18f9, pulled aournd 2013-03-14
17:40 UTC.

This is problem is quite a nuisance for me.  It hits me when I try to check
that fdo#59262 "web help missing" still happens, and it stopped me
yesterday as I tried to confirm one new bug report against Base.

I did some stepping in gdb, but eventually gave up when gdb kept running
out of memory.  If I can find anything coherent in the notes I took then, I
think I shall ask for guidance on the dev list.
Comment 6 Julien Nabet 2013-03-18 06:32:06 UTC
Yes, I've got enable-dbgutil, here's my complete autogen:
--with-system-odbc
--enable-ext-barcode
--enable-ext-diagram
--enable-ext-google-docs
--enable-ext-hunart
--enable-ext-nlpsolver
--enable-ext-ct2n
--enable-ext-numbertext
--enable-postgresql-sdbc
--enable-ext-presenter-minimizer
--enable-ext-report-builder
--enable-ext-typo
--enable-ext-validator
--enable-ext-watch-window
--enable-ext-wiki-publisher
--enable-dbus
--enable-graphite
--enable-evolution2
--enable-werror
--enable-debug
--enable-dbgutil
--enable-crashdump
--enable-dependency-tracking
--enable-online-update

gcc (Debian 4.7.2-5) 4.7.2

make 3.81

java version "1.7.0_03"
OpenJDK Runtime Environment (IcedTea7 2.1.6) (7u3-2.1.6-1)
OpenJDK 64-Bit Server VM (build 22.0-b10, mixed mode)

I'll give a try to fdo#59262.
Comment 7 Julien Nabet 2013-03-18 06:33:37 UTC
Also, did you run a "make clean" recently? I know it's longer to build of course but it's just to be sure. (after complete gbuild conversion, OUString migration, etc.)
Comment 8 Terrence Enger 2013-03-18 12:36:09 UTC
Yup, did a make clean for this very build because of undefined symbols
in svgdocumenthandler.cxx.

My system has

    GNU Make 3.81

    gcc (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2

    java version "1.6.0_24"
    OpenJDK Runtime Environment (IcedTea6 1.11.5) (6b24-1.11.5-0ubuntu1~11.04.1)
    OpenJDK Client VM (build 20.0-b12, mixed mode, sharing)

and my c++ library is version 4.5.


In that library, my attention goes to __check_sorted, which says ...

      // Verify that the predicate is StrictWeakOrdering by checking that it
      // is irreflexive.
      _GLIBCXX_DEBUG_ASSERT(__first == __last || !__pred(*__first, *__first));

If I am interpreting the output from gdb correctly,
VclBuilder::sortIntoBestTabTraversalOrder::operator()(...) is neither
reflexive nor irreflexive.

Terry.
Comment 9 Julien Nabet 2013-03-18 18:56:52 UTC
Terrence:
Could you give glibc version?
Mine:
ldd --version
ldd (Debian EGLIBC 2.13-38) 2.13

Caolán: vcl, glib, you might be interested in this one.
Comment 10 Terrence Enger 2013-03-18 19:26:56 UTC
My system gives ...

    $ ldd --version
    ldd (Ubuntu EGLIBC 2.13-0ubuntu13.2) 2.13

As well, note directory "debug" in the name of the file with the
failing assertion: /usr/include/c++/4.5/debug/functions.h.  My
ubuntu-natty installation has packages libstdc++6 and
libstdc++6-4.5-dev, both of them version 4.5.2-8ubuntu4; it is the
latter which installed functions.h.

The requirement that the comparison functor supplied to stable_sort be
irreflexive seems really strange; I do not understand what it means
for a sorting algorithm to be stable if there is no possiblity of
equal keys.
Comment 11 Julien Nabet 2013-03-18 19:59:43 UTC
ok so glibc very similar (identical?)
but gcc and so libstd is different, could it be the root cause?
Package: libstdc++6-4.7-dev
Source: gcc-4.7
Version: 4.7.2-5
Comment 12 Terrence Enger 2013-03-18 22:23:36 UTC
I have looked at libstdc++6 version 4.7.2-2ubuntu1.  functions.h has
the essentially the same assertion.
Comment 13 Terrence Enger 2013-03-27 02:31:15 UTC
The SIGABRT disappears with application of this patch against master
commit 51a18f9 (2013-03-14).

--- a/vcl/source/window/builder.cxx
+++ b/vcl/source/window/builder.cxx
@@ -1389,6 +1389,10 @@ void VclBuilder::handleTabChild(Window *pParent, xmlreade
 //we sort these into a best-tab-order sequence
 bool VclBuilder::sortIntoBestTabTraversalOrder::operator()(const Window *pA, co
 {
+    // STL requires that this function be irreflexive
+    if( pA == pB )
+        return false;
+
     //sort child order within parent list by grid position
     sal_Int32 nTopA = pA->get_grid_top_attach();
     sal_Int32 nTopB = pB->get_grid_top_attach();

This patch changes the the value in the case that pA and pB are the
same pointer and they point to a label widget.  In this, it partly
contradicts commit 16d7194e
<http://cgit.freedesktop.org/libreoffice/core/commit/?id=16d7194ee73786c212e8639d41c7c31735ca930a>.


@Caolan,

Can you point me to an instance of the problem that your commit fixed?
IIUC, the problem is that the tab key gets caught in a too-small loop.
I wonder if my patch brings back the old problem.

Any other comments or guidance is welcome, of course.

Thanks,
Terry,
Comment 14 Caolán McNamara 2013-04-03 09:33:02 UTC
odd thing is that my libstdc++ with --enable-dbgutil doesn't break for me here. I have a history of getting strict weak ordering wrong
Comment 15 Commit Notification 2013-04-03 09:53:02 UTC
Terrence Enger committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=e0dd0e821b9b29e7489e0a17f7c43ef45c0941e9

Resolves: fdo#61688 SIGABRT with debug build in sortIntoBestTabTraversalOrder



The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 16 Caolán McNamara 2013-04-03 09:53:37 UTC
thanks terrence, that fix will do fine
Comment 17 Commit Notification 2013-04-04 09:35:08 UTC
Caolan McNamara committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=8c4f20938bc5d4013126a00a02e8eca80b250284

Related: fdo#61688 get strict ordering right



The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 18 Terrence Enger 2013-04-04 17:48:51 UTC
I have removed the first patch and applied Caolan's patch.  This bug is still fixed and my other problems are plausibly unrelated.  

So, I suggest that the first patch be reversed on the grounds that (a) as a performance optimization it is not needed, and (b) it is a distraction to the person reading the code of the function.

Caolan?

Terry.
Comment 19 Caolán McNamara 2013-04-04 19:27:19 UTC
sure, easiest thing is to push your proposed patch to gerrit and someone can take it from there if I don't get around to it.
Comment 20 Commit Notification 2013-04-15 12:56:40 UTC
Terrence Enger committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=e9c050115bb1654c3c91138820b2e9ea0cf7a991

fdo#61688 SIGABRT with debug build in VclBuilder::handleChild



The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 21 Terrence Enger 2013-08-17 19:14:32 UTC
My localbuilt LibreOffice is master commit ecb842e, fetched 2013-08-15 22:30 UTC, configured with
    --enable-option-checking=fatal
    --enable-dbgutil
    --enable-crashdump
    --without-system-postgresql
    --without-myspell-dicts
    --without-help
    --with-extra-buildid
    --without-doxygen
    --with-external-tar=/home/terry/lo_hacking/git/src
built on ubuntu-natty (11.04) 32-bit, an running variously on that system and ubuntu-quantal (12.10) 64-bit.

This is at least the second time where I have had a crash in a debug
iterator but others have been unable to reproduce it.  Bug 61688
"SIGABRT with debug build in VclBuilder::handleChild" is the example
which comes to mind.


Meanwhile, I have been trying with remarkably little success to use
gdb on the program.  The pop-up menu (or at least a gray box where the
pop=up menu was) continues to overlay other windows while the program
is at a breakpoint, and I lose random keystrokes and all mouse events.
Suggestions welcome.
Comment 22 Terrence Enger 2013-08-21 18:23:57 UTC
Whoops.  I intended comment 21 for bug 67128.