Bug 133770 - SalUserEventList::isFrameAlive hang after crash
Summary: SalUserEventList::isFrameAlive hang after crash
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer (show other bugs)
Version:
(earliest affected)
7.0.0.0.alpha1+
Hardware: All Windows (All)
: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
: 131677 132219 (view as bug list)
Depends on:
Blocks: Crash
  Show dependency treegraph
 
Reported: 2020-06-07 18:27 UTC by Telesto
Modified: 2021-05-07 16:19 UTC (History)
7 users (show)

See Also:
Crash report or crash signature:
Regression By:


Attachments
Very sleepy profile stack (15.60 KB, application/vnd.oasis.opendocument.spreadsheet)
2020-06-07 18:28 UTC, Telesto
Details
Bibisect attempt (wrong) (5.75 KB, text/plain)
2020-06-09 19:13 UTC, Telesto
Details
BT with symbols VTune Profiler based on x39 build (70.00 KB, application/vnd.ms-excel)
2020-07-09 10:52 UTC, Telesto
Details
Flush data-stream BT (6.97 KB, text/plain)
2020-07-09 10:53 UTC, Telesto
Details
Bibisect log (3.92 KB, text/plain)
2020-07-28 18:44 UTC, Telesto
Details
Bibisect log (12.76 KB, application/vnd.oasis.opendocument.text)
2020-07-28 18:45 UTC, Telesto
Details
second file for comparison crash (291.18 KB, application/vnd.oasis.opendocument.text)
2020-08-02 10:17 UTC, Telesto
Details
Screenshot Document Recovery (35.02 KB, image/jpeg)
2020-12-07 22:19 UTC, Telesto
Details
Process manager after discarding recovery data and crashing LO the second time (147.69 KB, image/png)
2021-05-07 16:19 UTC, NISZ LibreOffice Team
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Telesto 2020-06-07 18:27:43 UTC
Description:
SalUserEventList::isFrameAlive hang after crash

Steps to Reproduce:
1. Open attachment 142401 [details] from bug 117896
2. Open Writer
3. Indent
4. Undo
5. Redo -> Crash (2-5 working starting from 6.3
6. Crash notifier.. Press OK -> Process still alive consuming 100% CPU

Can't reproduce it in the bibisect repro.. only the installed version.. 

Actual Results:
Process being alive

Expected Results:
Quite


Reproducible: Always


User Profile Reset: Yes



Additional Info:
Version: 7.1.0.0.alpha0+ (x64)
Build ID: 191288d6a7fb52b31038a21c4e71ee57ffa3bacd
CPU-threads: 4; Besturingssysteem: Windows 6.3 Build 9600; UI-render: Skia/Rooster; VCL: win
Locale: nl-NL (nl_NL); GI: nl-NL
Calc: CL
Comment 1 Telesto 2020-06-07 18:28:26 UTC
Created attachment 161741 [details]
Very sleepy profile stack
Comment 2 Telesto 2020-06-07 18:43:05 UTC
I can actually reproduce it in bibisect build..

Accept the OK for the crash notifier with enter.. sometimes you need to do it a two or three rounds before the problem occurs
Comment 3 Xisco Faulí 2020-06-09 12:06:27 UTC
(In reply to Telesto from comment #0)
> Description:
> SalUserEventList::isFrameAlive hang after crash
> 
> Steps to Reproduce:
> 1. Open attachment 142401 [details] from bug 117896

Mentioned file is a xlsx file. Please, indicate which file needs to be used
Comment 4 Telesto 2020-06-09 18:22:08 UTC
*** Bug 131677 has been marked as a duplicate of this bug. ***
Comment 5 Telesto 2020-06-09 18:24:15 UTC
(In reply to Xisco Faulí from comment #3)
> (In reply to Telesto from comment #0)
> > Description:
> > SalUserEventList::isFrameAlive hang after crash
> > 
> > Steps to Reproduce:
> > 1. Open attachment 142401 [details] from bug 117896
> 
> Mentioned file is a xlsx file. Please, indicate which file needs to be used

Hmm.. lets keep it more basic.
1. Open Writer document
2. Open Another writer document
3. Indent
4. Undo
5. Redo -> Crash - (expected)
6. Process still alive -> SalUserEventList::isFrameAlive consuming 19% CPU.. needs be killed from a task manager
Comment 6 Telesto 2020-06-09 19:13:53 UTC
Created attachment 161816 [details]
Bibisect attempt (wrong)

Bad is certain.. good not so.. took the wrong turn somewhere. must be before 29 March based on my previous report (duplicate)
Comment 7 Buovjaga 2020-06-10 19:03:30 UTC
For me it just opens LibreOffice again after the crash. Do you mean you see SalUserEventList::isFrameAlive under the soffice.bin process somehow?

Tested with and without Skia.

Version: 7.1.0.0.alpha0+ (x64)
Build ID: 7f6d7a0eb624d67421cd5af6462ee2a662fdff3a
CPU threads: 4; OS: Windows 10.0 Build 18362; UI render: default; VCL: win
Locale: fi-FI (fi_FI); UI: en-US
Calc: threaded
Comment 8 Telesto 2020-06-10 19:37:49 UTC
(In reply to Buovjaga from comment #7)
> For me it just opens LibreOffice again after the crash. Do you mean you see
> SalUserEventList::isFrameAlive under the soffice.bin process somehow?
> 
> Tested with and without Skia.
> 
> Version: 7.1.0.0.alpha0+ (x64)
> Build ID: 7f6d7a0eb624d67421cd5af6462ee2a662fdff3a
> CPU threads: 4; OS: Windows 10.0 Build 18362; UI render: default; VCL: win
> Locale: fi-FI (fi_FI); UI: en-US
> Calc: threaded

Did you try one or two rounds? It does not always happen.. 

SalUserEventList::isFrameAlive is the only process alive.. looping.. Very Sleepy CS (quite a nice tool.. to get some insight what the LibreOffice doing.. where it is spending time.. or if a hang bug is same or not.. )

SalUserEventList::isFrameAlive
SalBitmap::~SalBitmap
Bitmap::~Bitmap
Image::ImplInit
Image::~Image
com_sun_star_form_OTimeModel_get_implementation
unit_lok_process_events_to_idle
execute_onexit_table

Somehow a process is executed on exit..
Comment 9 Buovjaga 2020-06-10 19:45:17 UTC
(In reply to Telesto from comment #8)
> Did you try one or two rounds? It does not always happen.. 

Yes, several.

> SalUserEventList::isFrameAlive is the only process alive.. looping.. Very
> Sleepy CS (quite a nice tool.. to get some insight what the LibreOffice
> doing.. where it is spending time.. or if a hang bug is same or not.. )

Does this imply step 0 is using Very Sleepy? You can't see this in Windows Task Manager (ie. a hanging soffice.bin)?

Is the behaviour different for you and me in that LibreOffice does not restart for you after the recovery window?
Comment 10 Telesto 2020-06-10 20:34:18 UTC
(In reply to Buovjaga from comment #9)
> (In reply to Telesto from comment #8)
> > Did you try one or two rounds? It does not always happen.. 
> 
> Yes, several.
> 
> > SalUserEventList::isFrameAlive is the only process alive.. looping.. Very
> > Sleepy CS (quite a nice tool.. to get some insight what the LibreOffice
> > doing.. where it is spending time.. or if a hang bug is same or not.. )
> 
> Does this imply step 0 is using Very Sleepy? You can't see this in Windows
> Task Manager (ie. a hanging soffice.bin)?
> 
> Is the behaviour different for you and me in that LibreOffice does not
> restart for you after the recovery window?

soffice.bin is hanging (but still busy at around 25%). There is only one thread left SalUserEventList::isFrameAlive. All other thread are internal processes are death already.

There is no restart.. no LibreOffice window only a process running.. which needs to be killed manually.. If you attempt to launch LibreOffice again.. you get boot/launch loop. Splash screen, splash screen, splash screen .. until to older process is killed..

A bibisect is awfully hard. But of timing aspect, I assume.
Comment 11 Telesto 2020-07-09 06:33:00 UTC
See also bug 134674
Comment 12 Telesto 2020-07-09 10:51:47 UTC
@Xisco
Are you able to reproduce this (I'm not the only one: bug 134674)

1. Open Writer document
2. Open Another writer document
3. Indent
4. Undo
5. Redo -> Crash - (expected)
6. Process still alive -> SalUserEventList::isFrameAlive consuming 19% CPU.. needs be killed from a task manager

Sometimes it takes 2-3 rounds of crashing before the issue starts
Comment 13 Telesto 2020-07-09 10:52:34 UTC
Created attachment 162843 [details]
BT with symbols VTune Profiler based on x39 build
Comment 14 Telesto 2020-07-09 10:53:53 UTC
Created attachment 162844 [details]
Flush data-stream BT
Comment 15 Xisco Faulí 2020-07-10 16:28:53 UTC
(In reply to Telesto from comment #12)
> @Xisco
> Are you able to reproduce this (I'm not the only one: bug 134674)
> 
> 1. Open Writer document
> 2. Open Another writer document
> 3. Indent
> 4. Undo
> 5. Redo -> Crash - (expected)
> 6. Process still alive -> SalUserEventList::isFrameAlive consuming 19% CPU..
> needs be killed from a task manager
> 
> Sometimes it takes 2-3 rounds of crashing before the issue starts

Not reproducible with the bisect repository nor a local build

Version: 7.1.0.0.alpha0+
Build ID: 358674b87b8d9cd78079fb105aa81b50f4b5029b
CPU threads: 4; OS: Linux 4.19; UI render: default; VCL: gtk3
Locale: en-US (en_US.UTF-8); UI: en-US
Calc: threaded
Comment 16 Telesto 2020-07-11 07:27:12 UTC
(In reply to Xisco Faulí from comment #15)
I forgot to add, Windows only
Comment 17 Telesto 2020-07-20 18:05:33 UTC
*** Bug 132219 has been marked as a duplicate of this bug. ***
Comment 18 Telesto 2020-07-28 18:44:41 UTC
Created attachment 163711 [details]
Bibisect log

Following range 5dca5309207b6b3cd5bed68da47223e08a3ac3f8 is the first bad commit
commit 5dca5309207b6b3cd5bed68da47223e08a3ac3f8
Author: Norbert Thiebaud <nthiebaud@gmail.com>
Date:   Tue Mar 17 16:44:48 2020 -0700

    source sha:6a05f8810684024303047ac9105be4ff5ae8c536

    source sha:6a05f8810684024303047ac9105be4ff5ae8c536
    source sha:4e755e622f2d782d657626b6234fb3acd3d08e15
    source sha:7af11e2e051eedd790e0ed8c8ac0e1e667c1001b
    source sha:d1a4f95def7d65165a992613784564c02b1c76bb
    source sha:9323307d675b71c501534ee98872a2f00b465bc2
    source sha:d08d5c1857482cb3789ed2896921abeb83f4d217
    source sha:8d9d6d43a9d895eb781a7fb7f47b7e4342883829
    source sha:0c225c7c2b47d7ec57ab7f3f2a900aaac78031d0
    source sha:b389b5958787b142a42d95744f46ccc9b94cf0a9
    source sha:7520e2b2126c05aadb738256313d2f250b9ded62
    source sha:24973523ba59087185d434396fd614e73d72107f
Comment 19 Telesto 2020-07-28 18:45:31 UTC
Created attachment 163712 [details]
Bibisect log

1. Open the attached file
2. CTRL+A
3. CTRL+C
4. CTRL+N
5. CTRL+V
6. CTRL+Z -> Crash

Mostly 2 runs are needed
Comment 20 Telesto 2020-07-28 18:46:11 UTC
@Roman
You did encounter a hang after crash too? IIRC
Comment 21 Roman Kuznetsov 2020-07-28 20:01:18 UTC
(In reply to Telesto from comment #20)
> @Roman
> You did encounter a hang after crash too? IIRC

I saw a hang after crash in my own crash bug report (Skia problem for Fade slide transition effect)

I can test your example later
Comment 22 Telesto 2020-08-02 10:17:10 UTC
Download
1. https://wiki.documentfoundation.org/images/f/f3/GS50-GettingStartedLO.odt
2. Edit -> Track changes -> Compare documents -> Open the attached file (total different)
3. LibreOffice stops functioning. No termination; debugger still running.
Comment 23 Telesto 2020-08-02 10:17:39 UTC
Created attachment 163876 [details]
second file for comparison crash
Comment 24 Telesto 2020-12-06 13:10:30 UTC
@Jan-Marek..
The deadlock situation you're working on: Unlock scheduler in deinit for ProcessEventsToIdle bring this bug (with duplicates) (and more distant bug 135073 bug 131681)

For me lingering soffice.bin after crash only occurs on second and following crashes..

1. open a bug doc able to crash LibO
2. Let it crash -> file recovery dialog and such appears (everything OK) 
3. Make it crash again with same bug doc
4. Now we see a lingering soffice.bin at 25% CPU.
Comment 25 Jan-Marek Glogowski 2020-12-07 11:50:55 UTC
(In reply to Telesto from comment #24)
> @Jan-Marek..
> The deadlock situation you're working on: Unlock scheduler in deinit for
> ProcessEventsToIdle bring this bug (with duplicates) (and more distant bug
> 135073 bug 131681)

That Scheduler code Mike and I are working on is there for years:

commit fd0fff67798fea87217e65bb1561aa0d0e741c51
Author: Jan-Marek Glogowski <glogow@fbihome.de>
Date:   Fri Jul 28 17:13:20 2017 +0200

    Assert active Tasks on scheduler de-init

This ProcessEventsToIdle() is just enabled for a debug build, as this is just some debug facility to verify the active static Tasks list. Normally - at the point of Scheduler::ImplDeInitScheduler - no more Tasks will be processed, but the list of pending Tasks can be quite long (which may be a problem in itself).

> For me lingering soffice.bin after crash only occurs on second and following
> crashes..
> 
> 1. open a bug doc able to crash LibO
> 2. Let it crash -> file recovery dialog and such appears (everything OK) 
> 3. Make it crash again with same bug doc
> 4. Now we see a lingering soffice.bin at 25% CPU.

This whole bug report doesn't make sense. If LO really crashes, the process will be gone, so there is no way to "hang" in SalUserEventList::isFrameAlive. Maybe the LO process is not crashed but somehow deadlocked to begin with? No idea, how the watchdog is handling this.
Comment 26 Telesto 2020-12-07 12:36:19 UTC
(In reply to Jan-Marek Glogowski from comment #25)
> This whole bug report doesn't make sense. If LO really crashes, the process
> will be gone, so there is no way to "hang" in
> SalUserEventList::isFrameAlive. Maybe the LO process is not crashed but
> somehow deadlocked to begin with? No idea, how the watchdog is handling this.

First of, I misconceived where working on. Clearly a different topic. Me thinking I know something about the topic, while not knowing anything at all :P

Mea culpa

---About topic here 

The issue is more me lacking the adequate terminology/vocabulary. Take a STR which normally would crash/terminate LibreOffice. Directly without 'grace' or with intermediary dialog show up, LibreOffice crashed storing recovery information now.. Pressing OK should finally kill/terminate soffice.bin process. 

However In my case the 'termination' doesn't kick in. It works for the first round using known crasher, but second 'crash' gets stuck). So you get LibreOffice crashed storing recovery information dialog. You process OK. LibreOffice is gone from screen, Except the process still lingering in task manager (soffice.bin). Launching LibreOffice again, while soffice.bin still active in the background causes a bootloop. Splash screen, splash screen etc. until the older soffice.bin process lingering in the task manager is terminated manually.

Looking inside the lingering process with VerySleepy shows process SalUserEventList::isFrameAlive looping/running (25% CPU usage). Not that they tool is reliable for proper/clean stacks. But looks like for searching for a 'frame' which killed

Which I somehow connect with https://bugs.documentfoundation.org/show_bug.cgi?id=138022 (as it start somewhere with Skia implementation). But that's a n00b assessment not to be taken seriously :P

Or a deadlock issue or something like that. But as stated at the beginning.. I think I know something but probably know squad.
Comment 27 Jan-Marek Glogowski 2020-12-07 13:03:11 UTC
I just checked desktop/win32/source/loader.cxx, the LO watchdog process. It only does a loop of MsgWaitForMultipleObjects(1, &aProcessInfo.hProcess, ...) and then calls GetExitCodeProcess. I guess the loop won't terminate, while the process is running and then we check the exit code to handle some LO specific exit codes for uncaught exceptions or a normal restart.

Interestingly we don't check the return value from GetExitCodeProcess, so won't detect a case of STILL_ACTIVE. Maybe our watchdog is buggy?
Can you check the pid to verify, if it's really the old process or a new one and it's actually a startup problem?

Not that I yet tried to reproduce this. Just some curiosity.

----------

(In reply to Telesto from comment #26)
...
> LibreOffice crashed storing recovery information now..

If a process has really crashed, you have no way to "storing recovery information". These are just saved in some interval, while everything is still ok. Maybe LO does some additional stuff for the "uncaught exception" case, but in any case the process state will be compromised, so no way to store anything useful (except for the process backtrace, if possible, but that's just a bonus for developers).

P.S. I guess STR = Steps to Reproduce.
Comment 28 Telesto 2020-12-07 22:19:04 UTC
Created attachment 167918 [details]
Screenshot Document Recovery

(In reply to Jan-Marek Glogowski from comment #27)
> I just checked desktop/win32/source/loader.cxx, the LO watchdog process. It
> only does a loop of MsgWaitForMultipleObjects(1, &aProcessInfo.hProcess,
> ...) and then calls GetExitCodeProcess. I guess the loop won't terminate,
> while the process is running and then we check the exit code to handle some
> LO specific exit codes for uncaught exceptions or a normal restart.
> 
> Interestingly we don't check the return value from GetExitCodeProcess, so
> won't detect a case of STILL_ACTIVE. Maybe our watchdog is buggy?
> Can you check the pid to verify, if it's really the old process or a new one
> and it's actually a startup problem?
> 
> Not that I yet tried to reproduce this. Just some curiosity.

A) See screenshot What I mean with Document Recovery Windows with "graceful" crash. Instead of abruptly terminating.. 

Off-topic Note: Still not seeing the whole point of that dialog (but really of off topic)
* it manages to crash again while saving the recovery info in number of cases.. mostly if the crash is caused by rendering/layout flaw. Kind of hard to keep showcases if this.. people tend to fix the layout flaws (so you never reach the crash of the recovery dialog; so kind of masked). Also speculating this causing lovely recovery issues... 
 
* it turns up in 40% of the cases or so (again depending on crasher). The other 60% it simply skips it: the lovely abrupt termination.. 

--
B) It's the old process not exiting.. (did check pid specifically, but I memory usage 600 mb to different from new started process..

C) The screenshot is based on bug 138718 (which also illustrated the 'deadlock' or whatever is happening). Or I'm I the only one encountering this on my Windows 8.1 system? FWIW: as said the first time crash working properly.. failure when doing same thing again (so only the second and following runs). After reboot everything is back to normal, until the second crash. It isn't my user profile.. nor limited to specific LibO version.. 7.0 branch and up, I think (but maybe even 6.4?)
Comment 29 NISZ LibreOffice Team 2021-05-07 16:17:48 UTC
Oh, I can confirm this :).

* Take a known document and steps to crash LO. I used the one in bug 126226 and steps to make it crash (now this one got fixed in master, but any known crash can be used. This bug bothered me for months before finding how to reproduce.).

* Document recovery kicks in, and offers to restore the document. 
* Choose Discard here, this is the most important.
* Load the document again and make it crash again.
* This second time the recovery won't kick in, but LO keeps the processor 100% loaded. Only way to stop it is to kill via Process Manager.

I bibisected this to the same range as comment #18:

https://cgit.freedesktop.org/libreoffice/core/log/?qt=range&q=24973523ba59087185d434396fd614e73d72107f..6a05f8810684024303047ac9105be4ff5ae8c536 

Age	Commit message (Expand)	Author	Files	Lines
2020-03-04	Fix typo	Andrea Gelmini	1	-2/+2
2020-03-04	scroll to the row when putting the cursor in it	Caolán McNamara	1	-0/+3
2020-03-04	add iter_previous_sibling	Caolán McNamara	3	-0/+16
2020-03-04	uitest: speed up close_doc()	Miklos Vajna	1	-15/+13
2020-03-04	fix ToC links give unhelpful url popups	Mert Tumer	2	-3/+13
2020-03-04	tdf#125520 enhance internal OLE cloning	Armin Le Grand	1	-1/+14
2020-03-04	Fix typo	Andrea Gelmini	1	-1/+1
2020-03-04	tdf#129796 junk in reference edit box	Noel Grandin	1	-1/+1
2020-03-04	make some symbols private	Noel Grandin	5	-25/+19
2020-03-04	We have had C++11 for some time now	Tor Lillqvist	4	-20/+20
Comment 30 NISZ LibreOffice Team 2021-05-07 16:19:26 UTC
Created attachment 171755 [details]
Process manager after discarding recovery data and crashing LO the second time