Bug 97277 - Windows: SEH and crash-handling / OpenGLZone
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version (earliest affected): 5.1.0.1 rc
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Duplicates: 97278
Depends on:
Blocks:
 
Reported: 2016-01-20 14:09 UTC by Michael Meeks
Modified: 2016-06-22 13:07 UTC (History)

See Also:
Crash report or crash signature:


Description Michael Meeks 2016-01-20 14:09:06 UTC
It -appears- that Windows SEH unwinding destroys local variables; so things like:

{
   OpenGLZone aZone;
   <segv>
}

Will happily do ~OpenGLZone() - as the SEH exception is unwound.
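
A minimal portable sketch of that effect, using an ordinary C++ exception as a stand-in for the SEH fault (an assumption, so that it runs on any platform; with -EHa the same unwinding would presumably apply to a real access violation). ZoneGuard, depthInsideZone and depthSeenByHandler are hypothetical names, not the real OpenGLZone code:

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for OpenGLZone: tracks how many zones are active.
struct ZoneGuard {
    static int depth;
    ZoneGuard() { ++depth; }
    ~ZoneGuard() { --depth; }
};
int ZoneGuard::depth = 0;

// Depth as seen from inside the zone, while the guard is still alive.
int depthInsideZone() {
    ZoneGuard aZone;
    return ZoneGuard::depth;
}

// Depth as seen by a top-level catch (...) after the "fault".
int depthSeenByHandler() {
    int seen = -1;
    try {
        ZoneGuard aZone;
        // Simulated fault; a C++ exception stands in for the SEH exception.
        throw std::runtime_error("simulated fault");
    } catch (...) {
        // Unwinding has already run ~ZoneGuard() before this point.
        seen = ZoneGuard::depth;
    }
    return seen;
}
```

By the time the handler runs, the guard's destructor has already reset the depth, so a handler at that level can no longer tell that the fault happened inside the zone.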

Since commit:

http://cgit.freedesktop.org/libreoffice/core/commit/?id=977a2f14dbb23a8ff1281e799f0c0af43aa2fb52

It appears that we catch all un-handled exceptions and print information about them - and call FatalError.

This would (seem) to include seg faults - but most interestingly, the FatalError handler in desktop/ doesn't appear to trigger the auto-save mechanism; just throws up a native dialog with "SEH exception" or somesuch - which seems crazy-odd.

At least, this is what I suspect ;-) it seems our seg-fault handler:

sal/osl/w32/signal.c uses:

    SetUnhandledExceptionFilter(SignalHandlerFunction);

while I imagine catch (...) at the very top-level handles ~everything at this stage ;-)

Quite possibly we want to use:

AddVectoredExceptionHandler

Instead to catch these bad-boys, and/or turn off SEH or ...

Needs further investigation.
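
For reference, a rough sketch of what registering a vectored handler could look like. This is not the LibreOffice signal code: installFirstChanceHandler, removeFirstChanceHandler, lastFaultCode and kIsWindows are hypothetical names, and the non-Windows branch exists only so that the sketch compiles elsewhere. A vectored handler is called before frame-based handlers and before any unwinding, and it also sees C++ exceptions and other first-chance events, so it filters on the exception code:

```cpp
#include <cassert>
#include <atomic>

#ifdef _WIN32
#include <windows.h>

constexpr bool kIsWindows = true;

// Code of the most recent access violation seen (0 if none).
static std::atomic<unsigned long> g_lastFaultCode{0};
static PVOID g_handle = nullptr;

// First-chance handler: runs before frame-based handlers and unwinding.
static LONG CALLBACK firstChanceHandler(PEXCEPTION_POINTERS info)
{
    const DWORD code = info->ExceptionRecord->ExceptionCode;
    if (code == EXCEPTION_ACCESS_VIOLATION)
        g_lastFaultCode.store(code);
    // Record only; let the normal handler search continue.
    return EXCEPTION_CONTINUE_SEARCH;
}

bool installFirstChanceHandler()
{
    if (!g_handle)
        g_handle = AddVectoredExceptionHandler(1 /* call first */, firstChanceHandler);
    return g_handle != nullptr;
}

void removeFirstChanceHandler()
{
    if (g_handle) {
        RemoveVectoredExceptionHandler(g_handle);
        g_handle = nullptr;
    }
}

unsigned long lastFaultCode() { return g_lastFaultCode.load(); }
#else
// Vectored exception handlers are a Windows API; elsewhere this is a no-op.
constexpr bool kIsWindows = false;
bool installFirstChanceHandler() { return false; }
void removeFirstChanceHandler() {}
unsigned long lastFaultCode() { return 0; }
#endif
```

A real handler would presumably trigger the crash-reporting / auto-save path at the point where this sketch only records the code.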
Comment 1 Michael Meeks 2016-01-20 14:10:16 UTC
It is also interesting that the ShowNativeDialog method appears to run the native Windows event queue - which then processes timeouts, which come back into VCL and can cause more grief & aggravation ... really VCL should have the 'FatalError' handler - and use it to turn off all pending VCL timeouts and ideally event processing etc. (I guess).
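
A toy sketch of that idea, assuming a much simpler event loop than VCL actually has. TimerQueue, disableForFatalError and fatalErrorThenPump are all hypothetical names; the point is only that the fatal-error path silences pending timeouts before anything pumps the event queue again:

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical timer queue; not VCL's actual scheduler.
class TimerQueue {
public:
    void addTimeout(std::function<void()> cb) {
        if (!disabled_)
            pending_.push_back(std::move(cb));
    }

    // Called from the fatal-error path: drop pending work, refuse new work.
    void disableForFatalError() {
        disabled_ = true;
        pending_.clear();
    }

    // Runs all pending timeouts; returns how many fired.
    int processPending() {
        if (disabled_)
            return 0;
        std::vector<std::function<void()>> batch;
        batch.swap(pending_);
        for (auto& cb : batch)
            cb();
        return static_cast<int>(batch.size());
    }

private:
    std::vector<std::function<void()>> pending_;
    bool disabled_ = false;
};

// Hypothetical fatal-error hook: disable timers first, then pump events
// (processPending stands in for the native dialog's event loop).
int fatalErrorThenPump(TimerQueue& queue) {
    queue.disableForFatalError();
    return queue.processPending();
}
```

With this ordering, a timeout that was queued before the crash never re-enters application code while the dialog is up.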
Comment 2 Michael Meeks 2016-01-20 14:16:29 UTC
Capturing some wisdom from IRC for future reference:

<sberg> mmeeks, looking it up, we pass -EHa (solenv/gbuild/platform/com_MSC_defs.mk), and that means we both catch SEH exceptions with C++ catch(...), and destroy C++ objects during SEH stack unwinding; we should probably switch to /EHs instead (<https://msdn.microsoft.com/en-us/library/1deeycx5.aspx>), can't remember why we do that nonsense
<sberg> mmeeks, reading it right now; so seems /EHs would be what you want there

Ultimately, it seems the crash handler does get triggered though - so quite what is going on here; I have no clue - needs more investigation & testing I think.
Comment 3 Stephan Bergmann 2016-01-20 14:30:38 UTC
see also <http://thread.gmane.org/gmane.comp.documentfoundation.libreoffice.scm/39102/focus=55516> "Re: [Libreoffice-commits] Resolves: #i123747# allow treating Window's SEH events as C++ exceptions"
Comment 4 Cor Nouws 2016-01-20 16:27:22 UTC
*** Bug 97278 has been marked as a duplicate of this bug. ***
Comment 5 Michael Meeks 2016-06-17 17:37:17 UTC
Markus - you had some thoughts here. IIRC your contention is that the SEH stuff in fact calls a handler, which traps and does the auto-save / cleanup - and then we do not propagate a structured exception up to:

desktop/source/app/app.cxx (Desktop::Main) and hit:

        catch( ...)
        {
            RequestHandler::SetDowning();
            FatalError( "Caught Unknown Exception: Aborting!");
        }

which is - far from ideal =)

If so - let's close the bug ...
Comment 6 Markus Mohrhard 2016-06-20 17:27:28 UTC
So let us sum up here what I understand about SEH right now.

Ideally we should never handle SEH (even MS warns that this leads to problems). Now the problem that you are seeing is most likely a difference between the handling of x86 and x64 in how the unwinding is handled.

The crash reporting shows that our signal handlers currently work: they translate the SEH exception into sal exception information and call the signal handler at that point.
After some reading, I think this happens at different times on x64 and x86, which might explain some of the strange stack traces that we see in the crash reports for x64, and your problem of the OpenGLZone variable being destroyed before the signal handler is called.

IMHO the new behavior of using /EHs should fix all of that: local objects are no longer destroyed and, as far as my understanding goes, the signal handler should be called as soon as the exception is generated, as there is no catch block any more.

I hope that my understanding of SEH is at least partly correct.
Comment 7 Michael Meeks 2016-06-20 19:37:32 UTC
Thanks; so I guess we (you? ;-) should close this one ?
Comment 8 Caolán McNamara 2016-06-22 13:07:01 UTC
Let's assume we're done here