Description:
Ever since moving to 25.2.0.3 (AARCH64), I can reproducibly get a mutex lock in Base requiring a forced kill exit and restart.

1) The database connection is to a mysql backend using the direct (native) connector.
2) On screen, I display the main base window, and then open a table for direct editing of data.
3) I set a filter on the table to select a subset of records.
4) I edit one or two fields of a given record, and then validate those entries.
5) I then try to remove the filter, i.e. set it back to displaying all of the records.
6) At this point, LO goes into spinning beachball mode, and I'm required to force kill the application.
7) The Apple trace that is produced at the moment the LO process is terminated is enclosed.

It appears to show a concurrent mutex lock, or race condition, caused by an XAccessible listener:

comphelper::OInterfaceContainerHelper3<com::sun::star::form::XLoadListener>::forEach<comphelper::OInterfaceContainerHelper3<com::sun::star::form::XLoadListener>::NotifySingleListener<com::sun::star::lang::EventObject>>(comphelper::OInterfaceContainerHelper3<com::sun::star::form::XLoadListener>::NotifySingleListener<com::sun::star::lang::EventObject> const&) + 224 (libmergedlo.dylib + 10137008) [0x10bc3edb0]
22 FmXGridPeer::reloading(com::sun::star::lang::EventObject const&) + 104 (libmergedlo.dylib + 25632824) [0x10cb06038]
22 DbGridControl::setDataSource(com::sun::star::uno::Reference<com::sun::star::sdbc::XRowSet> const&, DbGridControlOptions) + 976 (libmergedlo.dylib + 25819700) [0x10cb33a34]
22 DbGridControl::RemoveRows() + 580 (libmergedlo.dylib + 25816064) [0x10cb32c00]
22 svt::EditBrowseBox::RemoveRows() + 20 (libmergedlo.dylib + 22718388) [0x10c83e7b4]
22 BrowseBox::Clear() + 436 (libmergedlo.dylib + 22619892) [0x10c8266f4]
22 accessibility::AccessibleBrowseBoxAccess::commitEvent(short, com::sun::star::uno::Any const&, com::sun::star::uno::Any const&) + 72 (libmergedlo.dylib + 42695428) [0x10db4bb04]
22 accessibility::AccessibleBrowseBoxBase::commitEvent(short, com::sun::star::uno::Any const&, com::sun::star::uno::Any const&) + 212 (libmergedlo.dylib + 42702980) [0x10db4d884]
22 comphelper::AccessibleEventNotifier::addEvent(unsigned int, com::sun::star::accessibility::AccessibleEventObject const&) + 400 (libmergedlo.dylib + 4619828) [0x10b6fbe34]
22 DocumentFocusListener::notifyEvent(com::sun::star::accessibility::AccessibleEventObject const&) + 248 (libvclplug_osxlo.dylib + 79324) [0x106a475dc]
22 DocumentFocusListener::detachRecursive(com::sun::star::uno::Reference<com::sun::star::accessibility::XAccessible> const&) + 76 (libvclplug_osxlo.dylib + 80692) [0x106a47b34]
22 accessibility::AccessibleBrowseBoxBase::getAccessibleStateSet() + 100 (libmergedlo.dylib + 42700056) [0x10db4cd18]
22 accessibility::AccessibleBrowseBoxBase::implCreateStateSet() + 116 (libmergedlo.dylib + 42705636) [0x10db4e2e4]
22 accessibility::AccessibleBrowseBoxBase::implIsShowing() + 60 (libmergedlo.dylib + 42705100) [0x10db4e0cc]
22 accessibility::AccessibleBrowseBoxAccess::getAccessibleContext() + 56 (libmergedlo.dylib + 42692372) [0x10db4af14]
22 std::__1::mutex::lock() + 16 (libc++.1.dylib + 141144) [0x198d93758]
22 _pthread_mutex_firstfit_lock_slow + 220 (libsystem_pthread.dylib + 7612) [0x198e56dbc]
22 __psynch_mutexwait + 8 (libsystem_kernel.dylib + 15292) [0x198e1dbbc]
*22 psynch_mtxcontinue + 0 (com.apple.kec.pthread + 9756) [0xfffffe000baba84c]

This does not occur with LO 24.8. ==> regression

Steps to Reproduce:
See description above

Actual Results:
Concurrent mutex locks causing hang, and requiring forced kill

Expected Results:
Shouldn't hang, the filter removal should return gracefully to the default grid control table resultset display.

Reproducible: Always

User Profile Reset: Yes

Additional Info:
Version: 25.2.0.3 (AARCH64) / LibreOffice Community
Build ID: e1cf4a87eb02d755bce1a01209907ea5ddc8f069
CPU threads: 8; OS: macOS 15.2; UI render: Skia/Raster; VCL: osx
Locale: fr-FR (fr_FR.UTF-8); UI: fr-FR
Calc: threaded
Created attachment 199147
Apple trace at moment of hang/kill

The Apple trace appears to show two concurrent mutex locks, which lead to a deadlock (spinning beachball) of the LO process.
Tried this one but couldn't confirm under Linux (OpenSUSE 15.6 64bit rpm)
Hmm, trawling through bugzilla hints that bug 148435 has resurfaced.
(In reply to Robert Großkopf from comment #2)
> Tried this one but couldn't confirm under Linux (OpenSUSE 15.6 64bit rpm)

Thanks Robert, pretty certain this is macOS specific.
(In reply to Alex Thurgood from comment #3)
> Hmm, trawling through bugzilla hints that bug 148435 has resurfaced.

Your Activity Monitor sample looks similar to the last sample in tdf#148435. Both samples have the grammar checker and other non-main threads waiting for something to unblock them. But the big difference is that on the main thread, this bug is in the accessibility code, whereas tdf#148435 was in the "idle task scheduler" code.

So adding @Michael Weghorn to see if he has any insights into the locking code in accessibility::AccessibleBrowseBoxAccess::getAccessibleContext(). My memory is hazy, but I remember that tdf#148435 was caused by trying to lock a non-recursive mutex when it is already locked. In other words, tdf#148435 was due to using a non-recursive lock as a recursive lock. On macOS, trying to lock a non-recursive lock that is already locked results in a hang, so possibly we need to replace the lock with a recursive lock?

I can't tell from your sample whether the mutex that accessibility::AccessibleBrowseBoxAccess::getAccessibleContext() is trying to lock is a non-recursive lock or not as, surprisingly, I couldn't find the code with a "grep -r AccessibleBrowseBoxAccess". :/
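
To make the suspected pattern concrete, here is a minimal sketch of a lock holder that notifies a listener which then calls back into the same object. It is not LibreOffice code: `AccessSketch`, `commitEvent`, `getContext` and `demoReentrantNotify` are hypothetical names chosen only to mirror the shape of the trace, and the assumption is that both calls guard the same mutex. With `std::recursive_mutex` the nested lock on the same thread succeeds; with a plain `std::mutex` the second lock would be undefined behaviour, which in practice tends to show up as the hang described above.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for an accessible object guarded by one mutex.
// Illustrative only; this is not the LibreOffice implementation.
class AccessSketch
{
public:
    using Listener = std::function<void(AccessSketch&)>;

    void addListener(Listener aListener)
    {
        std::lock_guard<std::recursive_mutex> aGuard(m_aMutex);
        m_aListeners.push_back(std::move(aListener));
    }

    // Takes the lock, then notifies listeners while still holding it.
    void commitEvent()
    {
        std::lock_guard<std::recursive_mutex> aGuard(m_aMutex);
        for (auto& rListener : m_aListeners)
            rListener(*this); // a listener may call back into getContext()
    }

    // Takes the same lock again when reached from inside a listener.
    // With std::mutex instead of std::recursive_mutex, this nested lock
    // on the owning thread would be undefined behaviour (typically a hang).
    int getContext()
    {
        std::lock_guard<std::recursive_mutex> aGuard(m_aMutex);
        return ++m_nContextRequests;
    }

private:
    std::recursive_mutex m_aMutex;
    std::vector<Listener> m_aListeners;
    int m_nContextRequests = 0;
};

// Runs the re-entrant call chain once and returns how many times
// getContext() was reached from inside the notification.
int demoReentrantNotify()
{
    AccessSketch aAccess;
    int nSeen = 0;
    aAccess.addListener([&nSeen](AccessSketch& rAccess) { nSeen = rAccess.getContext(); });
    aAccess.commitEvent(); // lock -> listener -> getContext() -> lock again
    assert(nSeen == 1);
    return nSeen;
}
```

Calling `demoReentrantNotify()` from a `main` completes normally because the same thread is allowed to take a recursive mutex more than once. Whether a recursive mutex or not holding the lock during notification is the better remedy is a design choice this sketch does not settle.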
(In reply to Patrick (volunteer) from comment #5)
> So adding @Michael Weghorn to see if he has any insights into the locking
> code in accessibility::AccessibleBrowseBoxAccess::getAccessibleContext(). My
> memory is hazy, but I remember that tdf#148435 was caused by trying to lock
> a non-recursive mutex when it is already locked. In other words, tdf#148435
> was due to using a non-recursive lock as a recursive lock. On macOS, trying
> to lock a non-recursive lock that is already locked results in a hang so
> possibly we need to replace the lock with a recursive lock?

I agree, the backtrace suggests that's the problem. I actually ran into the same issue while refactoring the AccessibleBrowseBox* classes and submitted

commit 71d1432714b7aba0a10ca5d072c87e46ec325271
Author: Michael Weghorn <m.weghorn@posteo.de>
Date:   Wed Feb 5 11:26:43 2025 +0100

    browsebox a11y: Use recursive mutex

on master. Backports for 25-2 and 25-2-1 are now pending in Gerrit:

https://gerrit.libreoffice.org/c/core/+/181542
https://gerrit.libreoffice.org/c/core/+/181543

> I can't tell from your sample if the mutex that
> accessibility::AccessibleBrowseBoxAccess::getAccessibleContext() is trying
> to lock is a non-recursive lock or not as, surprisingly, I couldn't find
> the code with a "grep -r AccessibleBrowseBoxAccess". :/

That class no longer exists on master. I dropped it while refactoring the BrowseBox a11y code, see `git log --grep="browsebox a11y"`, in particular

commit cc8d3dac879ce8be66b785efdfe62be4ca6676bd
Author: Michael Weghorn <m.weghorn@posteo.de>
Date:   Thu Feb 6 10:17:47 2025 +0100

    browsebox a11y: Drop AccessibleBrowseBoxAccess
(In reply to Alex Thurgood from comment #0)
> This does not occur with LO 24.8. ==> regression

For the record, quoting from the commit message of 71d1432714b7aba0a10ca5d072c87e46ec325271:

> Locking in AccessibleBrowseBoxAccess::commitEvent was added in
>
>     commit 67158da00e965c90495bb4f339ea25bbec898c60
>     Date:   Fri Oct 4 14:22:22 2024 +0100
>
>         cid#1608061 Data race condition
>
>         and
>
>         cid#1607995 Data race condition

which could explain why it regressed in 25.2.
> I actually ran into the
> same issue while refactoring the AccessibleBrowseBox* classes and submitted
>
> commit 71d1432714b7aba0a10ca5d072c87e46ec325271
> Author: Michael Weghorn <m.weghorn@posteo.de>
> Date:   Wed Feb 5 11:26:43 2025 +0100
>
>     browsebox a11y: Use recursive mutex
>
> on master. Backports for 25-2 and 25-2-1 now pending in Gerrit:
>
> https://gerrit.libreoffice.org/c/core/+/181542
> https://gerrit.libreoffice.org/c/core/+/181543

All of these are merged now, so once you update to 25.2.1, you shouldn't see this issue any more.