Bug 101822 - glxtest zombie process on all non-Gtk+ X11 VCL platforms
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: graphics stack
Version (earliest affected): 5.0.6.3 release
Hardware: All Linux (All)
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard: target:5.3.0
Keywords:
Depends on:
Blocks:
 
Reported: 2016-08-31 16:26 UTC by Jan-Marek Glogowski
Modified: 2017-11-09 12:56 UTC (History)

See Also:
Crash report or crash signature:


Attachments
GDB session from a Gtk+ based LO start and shutdown (13.23 KB, text/plain)
2016-08-31 16:27 UTC, Jan-Marek Glogowski
GDB session from a KDE4 based LO start and shutdown (11.50 KB, text/plain)
2016-08-31 16:27 UTC, Jan-Marek Glogowski

Description Jan-Marek Glogowski 2016-08-31 16:26:24 UTC
Since working on our 5.0 release I've noticed a zombie process spawned from LO. sberg pointed me to glxtest, which I could verify to be the culprit.

The original problem is OpenGLHelper::isVCLOpenGLEnabled(), which calls officecfg::Office::Common::VCL::ForceOpenGL::get(). That call throws an exception in comphelper::getProcessServiceFactory(), which is caught somewhere down the stack. As a result, the glxtest child process is never cleaned up and stays around as a zombie, and some of the OpenGL setup is skipped. Actually I'm amazed that LO still manages to run with an unexpected exception in the OpenGL code :-)
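
To illustrate the failure mode described above, here is a minimal sketch, not LibreOffice code: a parent forks a short-lived child and then hits an exception before it can wait for it. All names (ChildReaper, spawnProbe, initThatThrows, probeIsReapedDespiteException) are hypothetical and only stand in for glxtest and the early configuration access. The sketch assumes a POSIX system and shows one possible way to make sure the child is always reaped, by doing the waitpid() in a destructor.

```cpp
#include <cassert>
#include <cerrno>
#include <stdexcept>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical RAII guard: reaps the child in its destructor, so the
// child is collected even when an exception unwinds the stack.
class ChildReaper
{
public:
    explicit ChildReaper(pid_t pid) : m_pid(pid) {}
    ~ChildReaper()
    {
        if (m_pid > 0)
        {
            int status = 0;
            // Retry if the wait is interrupted by a signal.
            while (waitpid(m_pid, &status, 0) == -1 && errno == EINTR) {}
        }
    }
    ChildReaper(const ChildReaper&) = delete;
    ChildReaper& operator=(const ChildReaper&) = delete;

private:
    pid_t m_pid;
};

// Hypothetical stand-in for a probe process such as glxtest.
pid_t spawnProbe()
{
    pid_t pid = fork();
    if (pid == 0)
        _exit(0); // child: pretend the probe finished immediately
    return pid;   // parent: child pid, or -1 if fork() failed
}

// Hypothetical stand-in for initialization that runs too early.
void initThatThrows()
{
    throw std::runtime_error("configuration not available yet");
}

// Returns true if the child was reaped although the init code threw.
bool probeIsReapedDespiteException()
{
    pid_t pid = spawnProbe();
    if (pid < 0)
        return false;
    try
    {
        ChildReaper reaper(pid); // destructor runs during unwinding
        initThatThrows();
    }
    catch (const std::runtime_error&)
    {
        // Swallowed on purpose, like a catch further down the stack.
    }
    // An already reaped child is no longer ours: waitpid fails with ECHILD.
    int status = 0;
    return waitpid(pid, &status, WNOHANG) == -1 && errno == ECHILD;
}
```

A caller could verify the behaviour with `assert(probeIsReapedDespiteException());`. Without the guard, the same control flow would skip the wait and leave the child as a zombie until the parent exits.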

I could verify that this problem happens with all non-Gtk+ based X11 VCL plugins (kde4, gen, tde).

In the end I did two debug runs, one with Gtk+ and one with KDE4, using breakpoints that print a backtrace and then continue (AKA break with commands ; bt ; c ; end).

This pointed to the commit "wait until we know the UI language before initializing gtk" - http://cgit.freedesktop.org/libreoffice/core/commit/?id=d07e7d692ddd2a9ab956a59bcc0f676c7d76bc10
With that commit, application initialization was split into two phases, because the Gtk+ i18n setup depends on the UI language being known.

From the attached debug logs:

#15 0x00002aaab118a514 in CreateSalInstance() () at /home/glg/Development/libreoffice/symbols/vcl/unx/generic/plugadapt/salplug.cxx:240
#16 0x00002aaab1107780 in InitVCL() () at /home/glg/Development/libreoffice/symbols/vcl/source/app/svmain.cxx:290

and

#14 0x00002aaac40a122b in GtkInstance::AfterAppInit() (this=0x754c50) at /home/glg/Development/libreoffice/symbols/vcl/unx/gtk/gtkinst.cxx:175
#15 0x00002aaab11078a8 in InitVCL() () at /home/glg/Development/libreoffice/symbols/vcl/source/app/svmain.cxx:304

So Gtk+ now delays some initialization, in particular the problematic SalDisplay::Init(), to AfterAppInit(). I guess this is not the origin of the problem, but something else silently depended on the move. I'm not sure whether it's easier to move the SalDisplay::Init() based initialization to AfterAppInit() in all other VCL plugins, especially given the EnsureInit() calls sprinkled everywhere in Gtk+ :-(
Comment 1 Jan-Marek Glogowski 2016-08-31 16:27:29 UTC
Created attachment 127097
GDB session from a Gtk+ based LO start and shutdown
Comment 2 Jan-Marek Glogowski 2016-08-31 16:27:56 UTC
Created attachment 127098
GDB session from a KDE4 based LO start and shutdown
Comment 3 Jan-Marek Glogowski 2016-09-01 10:13:33 UTC
So I investigated the catch point:

/home/glg/Development/libreoffice/symbols/vcl/unx/generic/app/saldisp.cxx:231

226         try {
227             bool bUseOpenGL = OpenGLHelper::isVCLOpenGLEnabled();
228             if (bUseOpenGL && BestOpenGLVisual(pDisplay, nScreen, rVI))
229                 return rVI.visualid == nDefVID;
230         }
231         catch (const css::uno::DeploymentException&)
232         {
233             // too early to try to access configmgr
234         }

which was included in 
commit 37800290245fd0462295a8bbaabd9d761929fa65
"vcl: Stop-gap solution to start the gen / kde / kde4 plugins again."

So first I'll revert it, as the try / catch block just papers over the original breaking commit. I hope this won't break any non-X11 backends, as I can't test them.

So for KDE4 I moved the SalKDEDisplay creation to KDESalInstance::AfterAppInit(), which "Works for me". I will look into tde and gen next.
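
The idea of deferring display creation can be sketched roughly as follows. This is not the actual KDE4 plugin code: Config, PluginInstance, afterAppInit and startupSucceeds are hypothetical names, and the sketch only assumes that the configuration becomes usable some time after the instance object is constructed.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for the configuration service, which is not
// usable until application initialization has finished.
struct Config
{
    bool ready = false;

    bool forceOpenGL() const
    {
        if (!ready)
            throw std::runtime_error("configuration accessed too early");
        return false;
    }
};

// Hypothetical plugin instance with a two-phase startup.
class PluginInstance
{
public:
    // Phase 1: runs before the configuration exists and must not read it.
    explicit PluginInstance(const Config& cfg) : m_cfg(cfg) {}

    // Phase 2: called once application initialization is done; only now
    // is the display set up, so reading the configuration is safe.
    void afterAppInit()
    {
        m_useOpenGL = m_cfg.forceOpenGL();
        m_displayReady = true;
    }

    bool displayReady() const { return m_displayReady; }
    bool useOpenGL() const { return m_useOpenGL; }

private:
    const Config& m_cfg;
    bool m_useOpenGL = false;
    bool m_displayReady = false;
};

// Walks through the startup order: construct, finish app init, then
// run the deferred display setup.
bool startupSucceeds()
{
    Config cfg;
    PluginInstance inst(cfg); // phase 1: no configuration access
    cfg.ready = true;         // application initialization completes
    inst.afterAppInit();      // phase 2: display setup
    return inst.displayReady();
}
```

With this ordering no exception has to be caught during display setup, which is why the stop-gap try / catch becomes unnecessary in the sketch.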
Comment 4 Commit Notification 2016-09-23 21:29:30 UTC
Jan-Marek Glogowski committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=3bc2b8c5e0c4213b53a974944189bdf7f8155502

tdf#101822 Always de-zombie the glxtest process

It will be available in 5.3.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 5 Commit Notification 2016-09-23 21:29:35 UTC
Jan-Marek Glogowski committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=06283e7b00f9f7b7ad1a3e30d2dcb85c8d550588

tdf#101822 X11 SalDisplay init => AfterAppInit

It will be available in 5.3.0.

Comment 6 Commit Notification 2016-09-23 21:29:40 UTC
Jan-Marek Glogowski committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=971947b38d1dbc6213e55403cf482a530cd9b449

tdf#101822 Revert "vcl: Stop-gap solution to ...

It will be available in 5.3.0.

Comment 7 Xisco Faulí 2016-09-26 10:27:25 UTC
Changing status to NEW
Comment 8 Xisco Faulí 2016-11-08 09:02:18 UTC
Hello Jan-Marek Glogowski,
Is this bug fixed?
If so, could you please close it as RESOLVED FIXED?