Both with LibO 3.4 beta-x and with my own builds under Ubuntu 10.04 x86-64, oosplash.bin does not quit; the system monitor says it is waiting in futex_wait_queue_me. After a period of inactivity, oosplash.bin crashes. With previous beta versions it was eating 100% of the CPU before crashing, but that is no longer the case with RC1 and my own builds. Under gdb I get the following information:

Program received signal SIGABRT, Aborted.
0x00007f008462ea75 in raise () from /lib/libc.so.6
(gdb) thread apply all backtrace

Thread 3 (Thread 0x7f008340b700 (LWP 5236)):
#0 0x00007f008514bbc9 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1 0x00007f0084be20c6 in ?? () from /home/jbf/LibO/libreoffice-3-4/install/program/../basis-link/ure-link/lib/libuno_sal.so.3
#2 0x00007f00851469ca in start_thread () from /lib/libpthread.so.0
#3 0x00007f00846e170d in clone () from /lib/libc.so.6
#4 0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7f0082409700 (LWP 5240)):
#0 0x00007f008514f48d in waitpid () from /lib/libpthread.so.0
#1 0x00007f0084bbce65 in ?? () from /home/jbf/LibO/libreoffice-3-4/install/program/../basis-link/ure-link/lib/libuno_sal.so.3
#2 0x00007f0084bbb2ec in ?? () from /home/jbf/LibO/libreoffice-3-4/install/program/../basis-link/ure-link/lib/libuno_sal.so.3
#3 0x00007f00851469ca in start_thread () from /lib/libpthread.so.0
#4 0x00007f00846e170d in clone () from /lib/libc.so.6
#5 0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7f008554e720 (LWP 5229)):
#0 0x00007f008462ea75 in raise () from /lib/libc.so.6
#1 0x00007f00846325c0 in abort () from /lib/libc.so.6
#2 0x00007f00846725e0 in ?? () from /lib/libc.so.6
#3 0x00007f0084e542a2 in ?? () from /usr/lib/libX11.so.6
#4 0x00007f0084e54c07 in _XEventsQueued () from /usr/lib/libX11.so.6
#5 0x00007f0084e2c2da in XFlush () from /usr/lib/libX11.so.6
#6 0x000000000040368c in splash_draw_progress ()
#7 0x0000000000405dcc in ?? ()
#8 0x0000000000406992 in main ()
(gdb)

Hope this helps to fix the problem. Best regards. JBF
Maybe a duplicate of Bug 35693
If I build LibreOffice 3.4 with debugging symbols I get the following trace:

Program received signal SIGABRT, Aborted.
0x00007fc63a9dca75 in raise () from /lib/libc.so.6
(gdb) thread apply all backtrace

Thread 3 (Thread 0x7fc6397b9700 (LWP 7504)):
#0 0x00007fc63b4f9bc9 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1 0x00007fc63af900c6 in rtl_cache_wsupdate_wait (arg=<value optimized out>) at alloc_cache.c:1417
#2 rtl_cache_wsupdate_all (arg=<value optimized out>) at alloc_cache.c:1561
#3 0x00007fc63b4f49ca in start_thread () from /lib/libpthread.so.0
#4 0x00007fc63aa8f70d in clone () from /lib/libc.so.6
#5 0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7fc6387b7700 (LWP 7508)):
#0 0x00007fc63b4fd48d in waitpid () from /lib/libpthread.so.0
#1 0x00007fc63af6ae65 in ChildStatusProc (pData=0x7fffeb3749e0) at process.c:612
#2 0x00007fc63af692ec in osl_thread_start_Impl (pData=<value optimized out>) at thread.c:276
#3 0x00007fc63b4f49ca in start_thread () from /lib/libpthread.so.0
#4 0x00007fc63aa8f70d in clone () from /lib/libc.so.6
#5 0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7fc63b8fc720 (LWP 7497)):
#0 0x00007fc63a9dca75 in raise () from /lib/libc.so.6
#1 0x00007fc63a9e05c0 in abort () from /lib/libc.so.6
#2 0x00007fc63aa205e0 in ?? () from /lib/libc.so.6
#3 0x00007fc63b2022a2 in ?? () from /usr/lib/libX11.so.6
#4 0x00007fc63b202c07 in _XEventsQueued () from /usr/lib/libX11.so.6
#5 0x00007fc63b1da2da in XFlush () from /usr/lib/libX11.so.6
#6 0x000000000040368c in process_events (progress=<value optimized out>) at splashx.c:579
#7 splash_draw_progress (progress=<value optimized out>) at splashx.c:616
#8 0x0000000000405dcc in sal_main_with_args (argc=<value optimized out>, argv=<value optimized out>) at start.c:981
#9 0x0000000000406992 in main (argc=2, argv=0x7fffeb378f08) at start.c:891
(gdb)

Best regards. JBF
Interesting :-) If I had to guess, I would say we are missing a CLOEXEC bit on our X socket, such that it is cloned into the soffice.bin sub-process, which then causes trouble later. Unfortunately, it's hard to tell. Any chance you can install some debuginfo packages for X and also glibc, and re-run inside gdb? That would give us trace information for deeper inside X (i.e. this bit):

#2 0x00007fc63aa205e0 in ?? () from /lib/libc.so.6
#3 0x00007fc63b2022a2 in ?? () from /usr/lib/libX11.so.6
#4 0x00007fc63b202c07 in _XEventsQueued () from /usr/lib/libX11.so.6
#5 0x00007fc63b1da2da in XFlush () from /usr/lib/libX11.so.6

I guess it is just an XIOerror, cf. http://cgit.freedesktop.org/xorg/lib/libX11/tree/src/xcb_io.c#n344

Another thing that would -really- help, particularly if it crashes nice and quickly like this, would be to do:

strace -f -o /tmp/log soffice -writer # or whatever you run

And when it has failed: gzip /tmp/log - and attach it here. Anyhow - interesting bug, thanks for the help !
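To illustrate the CLOEXEC guess above: a minimal sketch, in C, of how a close-on-exec bit is set on a file descriptor with fcntl(). The helper names set_cloexec and has_cloexec are hypothetical and are not taken from splashx.c; whether the X socket is actually the culprit here is still only a guess.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: mark fd close-on-exec so that it is closed
 * automatically when the process calls exec().
 * Returns 0 on success, -1 on error. */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}

/* Hypothetical helper: report whether fd has the close-on-exec bit.
 * Returns 1 if set, 0 if not set, -1 on error. */
int has_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return (flags & FD_CLOEXEC) ? 1 : 0;
}
```

For an Xlib connection, the descriptor would presumably be obtained with the ConnectionNumber(display) macro and passed to set_cloexec() before the child process is spawned.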
Well, I do not know what happened: I have been waiting for several days for oosplash.bin to crash, but he does not want to do that. He does not close cleanly either. Long live LibreOffice. JBF
lol - sorry about the lack of crash when we want it: that sucks. Anyhow - if you can find him please do update the bug ! (and thanks for your report & support).
Using the LibreOffice 3.4.0 release on Archlinux, I get similar behaviour. However, oosplash.bin does not crash, it only eats 100% of the CPU (in fact, I usually don't let it run longer than a dozen seconds before killing it, I hate overheating). The weird thing is that it usually happens something like 15 minutes after launching LibreOffice, which is a bit late for a splash screen ;)

Steps to reproduce (the bug does not seem to show up every time):
- open an odt document;
- forget LibreOffice on a spare desktop and start doing something else;
- after a certain amount of time (typically more than 10 minutes), the oosplash.bin process starts eating all the CPU.
I confirm this bug on a similar system (amd64, Archlinux, LO 3.4). However, I am unable to recompile LO with debugging symbols on this system. I tried running with strace as Michael suggests: it didn't crash, but strace produced a very big log file after half an hour (3.9 GB!) and spat a PANIC message to the console (I inadvertently closed the console before taking notes, sorry). I'm trying to reproduce this PANIC or the oosplash.bin crash, but it just comes and goes. Once I manage either, I'll post the result here.
Created attachment 48577: strace log file. Only the first 11MB of the actual log, which was 3.9GB originally.
Yes, I managed to reproduce it. oosplash.bin was using not 100% CPU, but a mere 15% (which still seems buggy and was overheating the computer). It continued like this until it crashed (normally I kill it, but I let it run to see what would happen). When it crashed (not sure if at the exact same time) a message was printed to the console: PANIC: handle_group_exit: 26869 leader 26859. This is the same message I mentioned earlier. The log is 3.9 GB. It only stopped growing because my root partition filled up entirely. I extracted the first 11MB of the file to upload here; it's attached above. If I should have taken the last 11MB, just let me know. I figured the last parts would be repeated garbage, since the file was growing indefinitely, but I could be wrong.
A very weird behavior, indeed... I tried running LO with --nologo (which should disable the splash screen). LibreOffice shows up as expected, and blocks the terminal (sounds natural). However, some time later, I noticed I got my prompt again, with a segfault... But LO is still running (I guess it had forked before). So the segfault might well come from oosplash.bin... Running with time shows a very consistent timing for this segfault:

zorun@tuxmachine ~$ time libreoffice --nologo
/usr/share/themes/Shiki-Brave/gtk-2.0/gtkrc:126: Murrine configuration option "gradients" is no longer supported and will be ignored.
Erreur de segmentation

real 17m4.360s
user 0m0.037s
sys 0m0.063s
zorun@tuxmachine ~$ time libreoffice --nologo
/usr/share/themes/Shiki-Brave/gtk-2.0/gtkrc:126: Murrine configuration option "gradients" is no longer supported and will be ignored.
Erreur de segmentation

real 17m4.358s
user 0m0.050s
sys 0m0.053s
zorun@tuxmachine ~$
zorun@tuxmachine ~$ libreoffice --version
LibreOffice 3.4 340m1(Build:12)

I'll attach a debug trace.
And 17 minutes later...

Reading symbols from /usr/lib/libreoffice/program/oosplash.bin...(no debugging symbols found)...done.
(gdb) r
Starting program: /usr/lib/libreoffice/program/oosplash.bin --nologo
[Thread debugging using libthread_db enabled]
[New Thread 0x7ffff5ea8700 (LWP 12364)]
[New Thread 0x7ffff5467700 (LWP 12365)]
[New Thread 0x7ffff4c66700 (LWP 12368)]
[Thread 0x7ffff5467700 (LWP 12365) exited]
/usr/share/themes/Shiki-Brave/gtk-2.0/gtkrc:126: Murrine configuration option "gradients" is no longer supported and will be ignored.

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff78c0056 in XSetForeground () from /usr/lib/libX11.so.6
(gdb)

Not very helpful...
I confirm the bug, in the same form as described in comment 6. Using Archlinux, x86_64. I first noticed the bug on version 3.4.0, but it is still present on 3.4.1. What can I do to help debug this? I seem to be able to reproduce it fairly easily (I just need to be patient; sometimes it takes about 20 minutes before oosplash.bin starts eating all the CPU cycles). This bug has also been reported on the Arch bugtracker, here is the link for reference: https://bugs.archlinux.org/task/24617 Thanks!
Ah - great catch :-) it seems this is just a free-memory read/write issue that eventually clobbers us. I've pushed a fix to master & am getting it reviewed for -3-4-2. Thanks ! :-)
(In reply to comment #13)
> Ah - great catch :-) it seems this is just a free-memory read/write issue that
> eventually clobbers us. I've pushed a fix to master & am getting it reviewed
> for -3-4-2.
>
> Thanks ! :-)

Any updates on this? This is still not fixed for me. LibreOffice 3.4.1 OOO340m1 (Build:103). Has the fix been accepted and/or is there an ETA on 3.4.2?
jw: Yes, this has been fixed with 3.4.2 see http://cgit.freedesktop.org/libreoffice/libs-core/commit/?h=libreoffice-3-4-2&id=8588ed1abda9973ac724e55adb38dadabddf6a05
Reopening, as setting display to NULL seems to open up a race condition: https://bugs.launchpad.net/ubuntu/+source/libreoffice/+bug/835153 Do we need a lock around splash_draw_progress() maybe? Just looking at the number of dupes this collected in a few days on a beta release makes me assume the condition fires way too often.
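As a rough illustration of the "lock around splash_draw_progress()" idea: a minimal sketch in C using a pthread mutex, so that the display pointer cannot be set to NULL while a draw is in progress. The names splash_lock, splash_display, splash_set_display and splash_draw_progress_locked are hypothetical stand-ins, not the real splashx.c code, and this is not the fix that was eventually committed.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the splash state; the real code would
 * hold a Display* here rather than a void*. */
static pthread_mutex_t splash_lock = PTHREAD_MUTEX_INITIALIZER;
static void *splash_display = NULL;

/* Set (or clear, with NULL) the display pointer under the lock. */
void splash_set_display(void *display)
{
    pthread_mutex_lock(&splash_lock);
    splash_display = display;
    pthread_mutex_unlock(&splash_lock);
}

/* Check the pointer and draw while holding the same lock, so the
 * pointer cannot change between the check and the drawing calls.
 * Returns 1 if drawing happened, 0 if it was skipped. */
int splash_draw_progress_locked(int progress)
{
    int drawn = 0;

    pthread_mutex_lock(&splash_lock);
    if (splash_display != NULL) {
        (void)progress; /* real code would issue the X drawing calls here */
        drawn = 1;
    }
    pthread_mutex_unlock(&splash_lock);
    return drawn;
}
```

The point of the design is that the NULL check and the drawing share one critical section; checking the pointer outside the lock would leave the same window open.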
I confirm that this bug is not fixed for me (LibO 3.4.3 on Ubuntu 10.04 x86_64). But, as I never install the package libreoffice-debian-menu for incompatibility reasons with LibreOffice 3.3.2 from Ubuntu PPA, I am not sure whether the problem is in LibO 3.4.x or in my installation. Kind regards. JBF
Identified fix on master: http://cgit.freedesktop.org/libreoffice/core/commit/desktop/unx/source/splashx.c?id=2daa098c3c1b49ce4d46306eddc45f7a66b7c021
cherrypicked as: http://cgit.freedesktop.org/libreoffice/libs-core/commit/?h=libreoffice-3-4&id=253ff23c3a93b5ea45a2451a8bc97fca19856a75 on libreoffice-3-4 to be in 3.4.4 and following releases.
resolving fixed then :-) thanks guys !