Bug 134641 - binaryurp bridge termination sporadically causes DisposedException in a different bridge
Status: UNCONFIRMED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: sdk
Version (earliest affected): Inherited From OOo
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: needsDevAdvice
Depends on:
Blocks:
 
Reported: 2020-07-08 09:07 UTC by Marc-Oliver Straub
Modified: 2025-11-27 11:27 UTC
CC List: 2 users

See Also:
Crash report or crash signature:


Description Marc-Oliver Straub 2020-07-08 09:07:58 UTC
Termination of a binaryurp bridge (e.g. because the remote process crashes) can cause a DisposedException in a different bridge. The expectation is that a bridge is not affected by the termination of other bridges.

We had 3 processes communicating with each other using binaryurp bridges:
Process A <-> process B
Process B <-> process C
Process A and process C don't talk to each other.

Process A requests process B to execute a method. As part of this method, process B needs to call process C:

A: call doSomethingInProcessB(), waiting for result
B: execute doSomethingInProcessB(), will now call doSomethingInProcessC()
C: idling

Process A is now terminated (due to one of its threads crashing, a kill, ...).
Process B notices that the bridge to process A has terminated and calls ThreadPool::dispose(nDisposeId). ThreadPool::dispose(..) walks through all JobQueues, calling JobQueue::dispose(nDisposeId) on each.

Since the doSomethingInProcessB() call is still being processed, the associated JobQueue contains the nDisposeId as the topmost entry in the callstack. JobQueue::dispose(..) finds the disposeId and sets it to 0. It then signals m_cndWait so that the bridge can terminate (jobqueue.cxx:143).

Concurrently, the worker thread currently executing doSomethingInProcessB() wants to call doSomethingInProcessC(). The IPC is sent out and JobQueue::enter(..) is called to wait for the result. JobQueue::enter(..) puts a different disposeId onto the callstack (since the call uses a different bridge) and should block on m_cndWait.wait() until the result arrives (jobqueue.cxx:73).

However, m_cndWait has already been signalled by JobQueue::dispose(), so JobQueue::enter(..) doesn't block, even though m_lstJobs is still empty (jobqueue.cxx:98). It resets m_cndWait and returns nullptr, which Bridge::makeCall() converts into a DisposedException (bridge.cxx:610), even though the bridge to process C is completely intact at this point in time.
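
The interleaving described above can be replayed deterministically in a small model. The following is only a sketch under stated assumptions: ModelQueue, Outcome, reproduceRace() and the bridge ids 1 and 2 are hypothetical names made up for illustration, the manual-reset condition is reduced to a plain bool, and none of this is the actual jobqueue.cxx code.

```cpp
#include <cassert>
#include <deque>

// Hypothetical, simplified model; names do not match the LibreOffice sources.
enum class Outcome { WouldBlock, GotJob, Disposed };

struct ModelQueue {
    std::deque<long> callstack; // dispose ids of nested calls, newest first
    std::deque<int> jobs;       // pending replies
    bool signalled = false;     // stands in for a manual-reset condition

    // Bridge teardown: zero matching ids; signal if the top entry was hit.
    void dispose(long id) {
        for (long& entry : callstack)
            if (entry == id) entry = 0;
        if (!callstack.empty() && callstack.front() == 0)
            signalled = true;
    }

    // One pass of the wait for a reply on the bridge identified by id.
    Outcome enter(long id) {
        callstack.push_front(id);
        if (!signalled)
            return Outcome::WouldBlock; // a real thread would sleep here
        // Woken up: the stale signal is consumed without checking whose it was.
        signalled = false;
        callstack.pop_front();
        if (jobs.empty())
            return Outcome::Disposed;   // caller would raise DisposedException
        jobs.pop_front();
        return Outcome::GotJob;
    }
};

// Replays the sequence from the report with made-up bridge ids.
inline Outcome reproduceRace() {
    const long bridgeA = 1, bridgeC = 2;
    ModelQueue q;
    q.callstack.push_front(bridgeA); // B is executing a request from A
    q.dispose(bridgeA);              // A dies, the bridge to A is torn down
    return q.enter(bridgeC);         // B now waits for the reply from C
}
```

In this model reproduceRace() reports Outcome::Disposed although only the id standing for the bridge to A was ever disposed, which mirrors the spurious DisposedException on the bridge to C.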

I'd suggest the following fixes:
* JobQueue::enter() should check for job == nullptr after resetting m_cndWait in jobqueue.cxx:98. If so, it should continue waiting instead of returning nullptr. This avoids the DisposedException, and the call to doSomethingInProcessC() will work correctly.

* JobQueue::enter() should check for m_lstCallstack == 0 and m_lstJob.empty() after processing a request (jobqueue.cxx:109). This ensures that the bridge terminates correctly once doSomethingInProcessB() has finished.
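
The intent of both suggestions can be sketched with a predicate-based wait. This is a hypothetical model and not a patch against jobqueue.cxx: FixedQueue, pushCaller(), topIsDisposed(), callSurvivesOtherDisposal() and the bridge ids are assumptions made for illustration, and std::condition_variable is swapped in for the condition used by the real code. The waiter re-checks the queue state after every wake-up, so a disposal aimed at an older callstack entry neither ends the wait early nor gets lost.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>

// Hypothetical model of the proposed behaviour; not the LibreOffice code.
class FixedQueue {
    std::mutex mutex_;
    std::condition_variable cond_;
    std::deque<long> callstack_; // dispose ids, newest first; 0 = disposed
    std::deque<int*> jobs_;      // pending replies

public:
    // Records a request that is currently being executed for bridge id.
    void pushCaller(long id) {
        std::lock_guard<std::mutex> guard(mutex_);
        callstack_.push_front(id);
    }

    // Bridge teardown: mark matching entries and wake all waiters.
    void dispose(long id) {
        std::lock_guard<std::mutex> guard(mutex_);
        for (long& entry : callstack_)
            if (entry == id) entry = 0;
        cond_.notify_all();
    }

    // Delivers a reply.
    void add(int* job) {
        std::lock_guard<std::mutex> guard(mutex_);
        jobs_.push_back(job);
        cond_.notify_all();
    }

    // Waits for a reply; returns nullptr only if this call's own entry
    // was disposed before a reply arrived.
    int* enter(long id) {
        std::unique_lock<std::mutex> lock(mutex_);
        callstack_.push_front(id);
        // Wake-ups meant for older entries fail the predicate and wait again.
        cond_.wait(lock, [this] {
            return !jobs_.empty() || callstack_.front() == 0;
        });
        int* result = nullptr;
        if (!jobs_.empty()) {
            result = jobs_.front();
            jobs_.pop_front();
        }
        callstack_.pop_front();
        return result;
    }

    // True when the innermost remaining call belongs to a disposed bridge.
    bool topIsDisposed() {
        std::lock_guard<std::mutex> guard(mutex_);
        return !callstack_.empty() && callstack_.front() == 0;
    }
};

// Replays the scenario: the bridge to A dies while B calls C.
inline bool callSurvivesOtherDisposal() {
    const long bridgeA = 1, bridgeC = 2;
    int reply = 42;
    FixedQueue q;
    q.pushCaller(bridgeA);           // request from A is in progress
    q.dispose(bridgeA);              // A dies
    int* got = nullptr;
    std::thread waiter([&] { got = q.enter(bridgeC); });
    q.add(&reply);                   // C answers
    waiter.join();
    return got == &reply && q.topIsDisposed();
}
```

In this model the call on the second bridge receives its reply, and the disposed entry for the first bridge is still visible on the callstack afterwards, so the outer request can wind down once it has finished.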
Comment 1 Eleonora Govallo 2021-08-04 20:30:29 UTC
Hello!
Do you still want to implement your proposal? If yes, please write to the IRC channel #libreoffice-dev.
You can also try to make a patch yourself using the information on this page: https://wiki.documentfoundation.org/Development/GetInvolved
Comment 2 QA Administrators 2022-02-02 03:40:34 UTC Comment hidden (obsolete)
Comment 3 Marc-Oliver Straub 2022-02-25 08:32:10 UTC
Yes, I plan to provide a fix for this issue soon.
Comment 4 QA Administrators 2025-11-27 11:27:32 UTC
Dear Marc-Oliver Straub,

To make sure we're focusing on the bugs that affect our users today, LibreOffice QA is asking bug reporters and confirmers to retest open, confirmed bugs which have not been touched for over a year.

There have been thousands of bug fixes and commits since anyone checked on this bug report. During that time, it's possible that the bug has been fixed, or the details of the problem have changed. We'd really appreciate your help in getting confirmation that the bug is still present.

If you have time, please do the following:

Test to see if the bug is still present with the latest version of LibreOffice from https://www.libreoffice.org/download/

If the bug is present, please leave a comment that includes the information from Help - About LibreOffice.
 
If the bug is NOT present, please set the bug's Status field to RESOLVED-WORKSFORME and leave a comment that includes the information from Help - About LibreOffice.

Please DO NOT

Update the version field
Reply via email (please reply directly on the bug tracker)
Set the bug's Status field to RESOLVED - FIXED (this status has a particular meaning that is not appropriate in this case)


If you want to do more to help you can test to see if your issue is a REGRESSION. To do so:
1. Download and install the oldest version of LibreOffice (usually 3.3 unless your bug pertains to a feature added after 3.3) from https://downloadarchive.documentfoundation.org/libreoffice/old/

2. Test your bug
3. Leave a comment with your results.
4a. If the bug was present with 3.3 - set version to 'inherited from OOo';
4b. If the bug was not present in 3.3 - add 'regression' to keyword


Feel free to come ask questions or to say hello in our QA chat: https://web.libera.chat/?settings=#libreoffice-qa

Thank you for helping us make LibreOffice even better for everyone!

Warm Regards,
QA Team

MassPing-UntouchedBug