Bug 115399 - Data race in typelib_typedescription_register
Summary: Data race in typelib_typedescription_register
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: sdk (show other bugs)
Version:
(earliest affected)
4.1.4.2 release
Hardware: All All
: medium normal
Assignee: Marc-Oliver Straub
URL:
Whiteboard: target:7.0.0
Keywords: needsDevAdvice
Depends on:
Blocks:
 
Reported: 2018-02-02 11:54 UTC by straub
Modified: 2020-05-18 09:52 UTC (History)
4 users (show)

See Also:
Crash report or crash signature:


Attachments
Example to reproduce my findings (26.40 KB, patch)
2020-04-21 08:58 UTC, straub
Details

Note You need to log in before you can comment on or make changes to this bug.
Description straub 2018-02-02 11:54:14 UTC
cppu/source/typelibtypelib.cxx, typelib_typedescription_register contains a data race that might cause crashes.

In short: typelib_typedescription_register replaces members of already stored type descriptions without synchronizing with other threads that might currently use those members.

Crash scenario is as follows:
* thread A is processing an IPC call involving an out parameter with struct type (the type description for this struct's bComplete member is 0 up to now)
** thread A is copying the C++ out parameter to the BinaryAny objects for transferring it via IPC, and calls _copyConstructStruct (copy.hxx)
** _copyConstructStruct copies member per member, using the type description's pMemberOffsets and ppTypeRefs member attributes

* at the same time thread B also needs this struct type for its work, and tries to complete the struct type description (complete(...) in typelib.cxx).
** after completing the type description, the completed description is registered via typelib_typedescription_register
** typelib_typedescription_register notices that the new description is complete, but the previously stored was not -> it calls typelib_typedescription_destructExtendedMembers(...) on the old, stored typedescription
** typelib_typedescription_destructExtendedMembers(...) deletes the memory used for the pMemberOffsets and ppTypeRefs members (while they are in use by thread A)
** typelib_typedescription_register replaces the members in the stored struct with those from the new struct

Depending on the timing, thread A accesses already freed memory (no crash), or sees already different data at the pMemberOffset/ppTypeRefs pointers. The latter case happened for me, pMemberOffsets[0] was 0 and pMemberOffsets[1] was 0, ie. have already been reused for something else.
My struct contained a long and a type member, and the code crashed since it wanted to copy the type member at offset 0, which contained a 1 -> access to memory 0x1 instead of just copying the 1 as long value.


The issue seems to be present in libreoffice 6.0.0.3 (at least from looking at the typelib code), I found the issue in libreoffice 4.1.4.2
Comment 1 Buovjaga 2018-02-24 15:43:39 UTC
Would be cool, if you submitted a patch to solve it: https://wiki.documentfoundation.org/Development/gerrit
Comment 2 Xisco Faulí 2018-03-27 09:12:38 UTC
(In reply to Buovjaga from comment #1)
> Would be cool, if you submitted a patch to solve it:
> https://wiki.documentfoundation.org/Development/gerrit

Dear Reporter,
Would you like to submit a patch using the link above ?
Comment 3 straub 2018-03-27 11:17:42 UTC
A workaround could be to not delete the extended data members, but this would create a memory leak. I have no idea how to fix the issue.
Comment 4 Xisco Faulí 2018-03-27 11:35:19 UTC
@Stephan, would you mind taking a look at the description provided ?
Comment 5 straub 2019-12-10 13:18:32 UTC
I've submitted a patch proposal to gerrit.
Comment 6 Stephan Bergmann 2019-12-10 13:34:43 UTC
(In reply to straub from comment #5)
> I've submitted a patch proposal to gerrit.

...in which case it can't hurt to give a link to it here
Comment 7 Buovjaga 2019-12-10 13:49:38 UTC
https://gerrit.libreoffice.org/#/c/84846/
Comment 8 Stephan Bergmann 2020-03-04 10:24:07 UTC
(In reply to straub from comment #0)
> Crash scenario is as follows:
> * thread A is processing an IPC call involving an out parameter with struct
> type (the type description for this struct's bComplete member is 0 up to now)
> ** thread A is copying the C++ out parameter to the BinaryAny objects for
> transferring it via IPC, and calls _copyConstructStruct (copy.hxx)
> ** _copyConstructStruct copies member per member, using the type
> description's pMemberOffsets and ppTypeRefs member attributes
> 
> * at the same time thread B also needs this struct type for its work, and
> tries to complete the struct type description (complete(...) in typelib.cxx).
> ** after completing the type description, the completed description is
> registered via typelib_typedescription_register
> ** typelib_typedescription_register notices that the new description is
> complete, but the previously stored was not -> it calls
> typelib_typedescription_destructExtendedMembers(...) on the old, stored
> typedescription
> ** typelib_typedescription_destructExtendedMembers(...) deletes the memory
> used for the pMemberOffsets and ppTypeRefs members (while they are in use by
> thread A)
> ** typelib_typedescription_register replaces the members in the stored
> struct with those from the new struct

Do you happen to have a self-contained reproducer for this?  Trying to reconstruct the scenario, I don't even manage to get into a situation where there is a typelib_TypeDescription instance with eTypeClass=typelib_TypeClass_STRUCT and bComplete=false.
Comment 9 straub 2020-03-09 07:08:07 UTC
I'll try to create a reproducer, bit I'll have to reverse-engineer our scenario that caused this behavior - so it'll take some time.
Comment 10 straub 2020-04-21 08:58:11 UTC
Created attachment 159778 [details]
Example to reproduce my findings
Comment 11 straub 2020-04-21 08:59:27 UTC
Hi all,
I was finally able to create a reproducer that at runs into TypeDescription::makeComplete() of a uno struct, which calls destruct_extended_members().
To achieve this, we need two contributors:
1) Some code that uses the struct's static type, eg. because of calling ::cppu::UnoType< ::foo::XOut >::get()
2) An IPC completing the type struct type description

Scenario description:
* process 1:uno.bin hosting an adapted foo.Counter component (odk/examples/cpp/counter copied to outstruct, adapted XCountable.idl having an out parameter of type XOut struct)
* process 2: uno.bin executable calling modified remoteclient (odk/examples/cpp/remoteclient) which is calling XCountable::outMethod([out] XOut outParam)

Attaching to uno.bin hosting outstruct (process 1), we can see that destruct_extended_members() on XOut will be called with the following stacktrace:
#0  typelib_typedescription_destructExtendedMembers (pTD=0x7fffe4001b00) at /workspaces/socbm451/mastraub/libreoffice6_repo/core/cppu/source/typelib/typelib.cxx:1283
#1  0x00007ffff7825b2a in typelib_typedescription_register (ppNewDescription=0x7ffff03cb928) at /workspaces/socbm451/mastraub/libreoffice6_repo/core/cppu/source/typelib/typelib.cxx:1493
#2  0x00007ffff782302d in (anonymous namespace)::complete (ppTypeDescr=0x7ffff03cbc08, initTables=true) at /workspaces/socbm451/mastraub/libreoffice6_repo/core/cppu/source/typelib/typelib.cxx:480
#3  0x00007ffff7828084 in typelib_typedescription_complete (ppTypeDescr=0x7ffff03cbc08) at /workspaces/socbm451/mastraub/libreoffice6_repo/core/cppu/source/typelib/typelib.cxx:2379
#4  0x00007ffff03f80a5 in com::sun::star::uno::TypeDescription::makeComplete (this=0x7ffff03cbc08) at /workspaces/socbm451/mastraub/libreoffice6_repo/core/include/typelib/typedescription.hxx:225
#5  0x00007ffff03f7a68 in binaryurp::Marshal::writeValue (this=0x659fc8, buffer=0x7ffff03cbc20, type=..., value=0x7fffe8000c00) at /workspaces/socbm451/mastraub/libreoffice6_repo/core/binaryurp/source/marshal.cxx:191
#6  0x00007ffff03f7724 in binaryurp::Marshal::writeValue (this=0x659fc8, buffer=0x7ffff03cbc20, type=..., value=...) at /workspaces/socbm451/mastraub/libreoffice6_repo/core/binaryurp/source/marshal.cxx:124
#7  0x00007ffff040ab33 in binaryurp::Writer::sendReply (this=0x659ec0, tid=..., member=..., setter=false, exception=false, returnValue=..., outArguments=std::vector of length 1, capacity 1 = {...}) at /workspaces/socbm451/mastraub/libreoffice6_repo/core/binaryurp/source/writer.cxx:390
#8  0x00007ffff0409dd8 in binaryurp::Writer::execute (this=0x659ec0) at /workspaces/socbm451/mastraub/libreoffice6_repo/core/binaryurp/source/writer.cxx:174
#9  0x00007ffff7053d06 in salhelper::Thread::run() () from /workspaces/socbm451/mastraub/libreoffice6_repo/core/instdir/program/libuno_salhelpergcc3.so.3
#10 0x00007ffff7053e9a in threadFunc () from /workspaces/socbm451/mastraub/libreoffice6_repo/core/instdir/program/libuno_salhelpergcc3.so.3
#11 0x00007ffff72a0028 in osl_thread_start_Impl(void*) () from /workspaces/socbm451/mastraub/libreoffice6_repo/core/instdir/program/libuno_sal.so.3
#12 0x00007ffff661ae25 in start_thread (arg=0x7ffff03cc700) at pthread_create.c:308
#13 0x00007ffff634834d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

The type description for XOut is being completed, the old struct member type descriptions destructed (pCTD->ppTypeRefs[i]). However, some other thread might currently use them.

I attached a patch that contains this small scenario code. Up to now I was unable to reproduce the 2nd thread using ppTypeRefs[] of the original, incomplete type description. Stephan Bergmann, could you please take a look?
Thanks, Oli
Comment 12 straub 2020-04-21 09:05:55 UTC
To run my example:
* apply the diff
* check the Makefile in remoteclient/Makefile for any paths that need adaption
* make in examples/cpp/outstruct
* make remoteclient.run in examples/cpp/remoteclient
Comment 13 Commit Notification 2020-04-23 19:51:45 UTC
Stephan Bergmann committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/500c4a1442919715a504e90db592dcce0f31e1a5

Related tdf#115399: Reorder code slightly

It will be available in 7.0.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 14 Commit Notification 2020-04-28 13:09:11 UTC
Stephan Bergmann committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/679163845bd2f8b6de703f8fe704eb8912bc4ba4

tdf#115399: Don't kill pre-existing typelib_TypeDescription members

It will be available in 7.0.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 15 straub 2020-04-29 09:03:30 UTC
I'm now applying the fix and will test the scenario where we saw the issue.
Looking at your changes, I think this fix should work fine.
Thank you, Oli
Comment 16 straub 2020-05-06 10:22:43 UTC
I'm running my testcase with this changeset endlessly, up to now it looks fine, no crash.
Comment 17 straub 2020-05-18 09:50:17 UTC
After my endless tests, I'm confident that the issue is fixed. Thank you Stephan Bergmann!