Bug 152710 - CRASH: importing ooo84576-1.odt, crashtest; corrupt document structure
Summary: CRASH: importing ooo84576-1.odt, crashtest; corrupt document structure
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer (show other bugs)
Version:
(earliest affected)
7.4.3.2 release
Hardware: All All
: high major
Assignee: Michael Stahl (allotropia)
URL:
Whiteboard: target:7.6.0 target:7.5.0.2 target:7.4.6
Keywords: bibisected, bisected, regression
Depends on:
Blocks: Crash
  Show dependency treegraph
 
Reported: 2022-12-28 21:41 UTC by Dave Gilbert
Modified: 2023-01-24 10:36 UTC (History)
4 users (show)

See Also:
Crash report or crash signature:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Dave Gilbert 2022-12-28 21:41:00 UTC
This is a crashtest import crash that's been live for a while;
the attachment on:

https://bz.apache.org/ooo/show_bug.cgi?id=84576

crashes on load reliably; the exact failure mode varies a bit - Noel recently added https://gerrit.libreoffice.org/c/core/+/143820  to stop it hanging.

It seems to be getting in a mess where rNodes is corrupt in SwList::SwList 

(gdb) p rNodes
$12 = (const SwNodes &) @0x1aeff10: {<BigPtrArray> = BigPtrArray of length 26 = {
[   0] 0x735cf60            StartNode ,
[   1] 0x73c8850              EndNode ,
[   2] 0x698cae0            StartNode ,
[   3] 0x69cb760              EndNode ,
[   4] 0x73226c0            StartNode ,
[   5]  0x7323900           StartNode ,
[   6]   0x699d198            GrfNode ,
[   7]  0x7376d10             EndNode ,
[   8]  0x7380be0           StartNode ,
[   9]   0x730b228            GrfNode ,
[  10]  0x7367a50             EndNode ,
[  11] 0x71fdc50              EndNode ,
[  12] 0x18a7670            StartNode ,
[  13] 0x27f2af0              EndNode ,
[  14] 0x73338d8             TextNode "",
[  15] 0x73474e0            StartNode ,
[  16]  0x69ad8d8            TextNode "Laisvos programos vaikų lavinimui",    <<<<<< Line 179
[  17]  0x7317548            TextNode "",
[  18]  0x732ce28            TextNode "(Penktas straipsnis apie rinkinį http://mokslui.akl.lt)",
[  19]  0x7376fb8            TextNode "Vaikai dažnai „lenda“ prie kompiuterio, ar turite ką jiems pasiūlyti?",
[  20]  0x7301868            TextNode "",
[  21]  0x7301a48            TextNode "Laisvas lavinamųjų žaidimų/užsiėmimų rinkinys „GCompris“ (http://gcompris.net) yra tobulinamas jau 10 metų. Kaip ir dauguma kokybiškų laisvų programų jis yra išverstas į daugumą pasaulio kalbų (apie 8"...,
[  22]  0x73b3978            TextNode ".",
[  23]  0x7351c48            TextNode "Yra 8 pagrindinės žaidimų/užsiėmimų (pamokėlių) grupės :",
[  24]  0x736be38            TextNode "",
[  25] 0x29ca640              EndNode }, m_vIndices = 0x73bce88, m_rMyDoc = @0x2402100, m_pEndOfPostIts = 0x73c8850, m_pEndOfInserts = 0x69cb760,
  m_pEndOfAutotext = 0x71fdc50, m_pEndOfRedlines = 0x27f2af0, m_pEndOfContent = std::unique_ptr<SwNode> = {get() = 0x29ca640},
  m_aOutlineNodes = {<o3tl::sorted_vector<SwNode*, CompareSwOutlineNodes, o3tl::find_unique, true>> = {m_vector = std::vector of length 1, capacity 1 = {
        0x69ad8d8}}, static npos = 18446744073709551615}, m_bInNodesDel = false, m_bInDelUpdOutline = false}

[14] isn't inside a start/end and when it tries to find the matching end it gets confused.
I've not figured out where that's coming from yet, but there are some other oddities with debug on:

warn:sw.xml:30389:30389:sw/source/filter/xml/XMLRedlineImportHelper.cxx:718: Recursive change tracking, removing
warn:sw.core:30389:30389:sw/source/core/doc/docbm.cxx:570: MarkManager::makeMark(..) - refusing to create mark on non-textnode

Looking at the original bug, I think that Recursive change tracking detection came from stopping the original crash
Comment 1 Stéphane Guillou (stragu) 2022-12-28 22:26:14 UTC
Reproduced crash with:

Version: 7.6.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 29c2bba1f3ef216d226c97197185066880fc1ab5
CPU threads: 8; OS: Linux 5.15; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
Calc: threaded

Lo 7.3.7.2 does not crash, LO 7.4 starts crashing:

Version: 7.4.3.2 / LibreOffice Community
Build ID: 1048a8393ae2eeec98dff31b5c133c5f1d08b890
CPU threads: 8; OS: Linux 5.15; UI render: default; VCL: gtk3
Locale: de-DE (en_AU.UTF-8); UI: en-US
Calc: threaded

Crash report for 7.4:

https://crashreport.libreoffice.org/stats/crash_details/5a2c5161-79d5-427a-b939-084ac4c3eb7a

7.5 and 7.6 fail to give me a crash report.
Comment 2 Dave Gilbert 2022-12-29 01:27:37 UTC
Commenting out the 'Recursive change tracking' removal code in XMLRedlineImportHelper.cxx does get past this import crash, so I guess it is that code that's mangling it.
Comment 3 Xisco Faulí 2022-12-29 11:10:02 UTC
Not reproduced in

Version: 6.0.0.0.alpha1+
Build ID: 6eeac3539ea4cac32d126c5e24141f262eb5a4d9
CPU threads: 8; OS: Linux 5.10; UI render: default; VCL: gtk3; 
Locale: es-ES (es_ES.UTF-8); Calc: group threaded
Comment 4 Xisco Faulí 2022-12-29 11:16:47 UTC
Regression introduced by

author	Michael Stahl <michael.stahl@allotropia.de>	2022-08-17 12:17:50 +0200
committer	Michael Stahl <michael.stahl@allotropia.de>	2022-08-17 14:04:42 +0200
commit 477e489e71b4a96ff10d9f2d2b802d91dec3e319 (patch)
tree 9bfb7475a4973fd5a3c4591bc47f06ef64a95a9d
parent 9f4405c94bc7d9d7500aedc1ade2d90955ab69d7 (diff)
forcepoint#108 sw: tweak a bit

Bisected with: bibisect-linux64-7.5

Adding Cc: to Michael Stahl
Comment 5 Dave Gilbert 2022-12-29 11:34:30 UTC
Which is the commit that added the recursive change tracking removal.
Comment 6 Caolán McNamara 2022-12-29 13:10:40 UTC
see also https://gerrit.libreoffice.org/c/core/+/143666
Comment 7 Dave Gilbert 2022-12-31 01:49:35 UTC
The recursive change in the input file is:

        <text:changed-region text:id="ct142065960">
          <text:deletion>
            <office:change-info>
              <dc:creator>Baltix user</dc:creator>
              <dc:date>2007-12-02T19:57:00</dc:date>
            </office:change-info>
            <text:p text:style-name="P1">tainment<text:change text:change-id="ct142065960"/></text:p>
          </text:deletion>
        </text:changed-region>

that id doesn't appear anywhere else in it.
Comment 8 Dave Gilbert 2023-01-01 18:27:06 UTC
This is actually the easy case of deleting a recursive change, and the delete itself seems to work OK.
Before the delete we have:

(gdb) n
733         pDoc->getIDocumentContentOperations().DeleteRange(aPaM);
(gdb) p aPaM
$3 = SwPaM = {point = SwPosition (node 7, offset 0), mark = SwPosition (node 9, offset 0)}
(gdb) p *pDoc->m_pNodes
$4 = {<BigPtrArray> = BigPtrArray of length 14 = {
[   0] 0x6063780            StartNode/SwNormalStartNode parent-start: 0x6063780 end: 0x60646f0, 
[   1] 0x60646f0              EndNode start: 0x6063780, 
[   2] 0x6064b00            StartNode/SwNormalStartNode parent-start: 0x6063780 end: 0x6066280, 
[   3] 0x6066280              EndNode start: 0x6064b00, 
[   4] 0x605a5a0            StartNode/SwNormalStartNode parent-start: 0x6063780 end: 0x6066480, 
[   5] 0x6066480              EndNode start: 0x605a5a0, 
[   6] 0x3381220            StartNode/SwNormalStartNode parent-start: 0x6063780 end: 0x6066390, 
[   7]  0x618b0c0           StartNode/SwNormalStartNode parent-start: 0x3381220 end: 0x618b120, 
[   8]   0x617e158           TextNode "tainment", 
[   9]  0x618b120             EndNode start: 0x618b0c0, 
[  10] 0x6066390              EndNode start: 0x3381220, 
[  11] 0x6064f40            StartNode/SwNormalStartNode parent-start: 0x6063780 end: 0x6063340, 
[  12]  0x5ec6498            TextNode "", 
[  13] 0x6063340              EndNode start: 0x6064f40}, m_vIndices = 0x607d7a8, m_rMyDoc = @0x2c17f60, m_pEndOfPostIts = 0x60646f0, 
  m_pEndOfInserts = 0x6066280, m_pEndOfAutotext = 0x6066480, m_pEndOfRedlines = 0x6066390, m_pEndOfContent = std::unique_ptr<SwNode> = {
    get() = 0x6063340}, m_aOutlineNodes = {<o3tl::sorted_vector<SwNode*, CompareSwOutlineNodes, o3tl::find_unique, true>> = {
      m_vector = std::vector of length 0, capacity 0}, static npos = 18446744073709551615}, m_bInNodesDel = false, m_bInDelUpdOutline = false}

RecursiveContains rls.i=7 rPos.i=8 rls.EoSI=9
RecursiveContains True!

and after:
(gdb) p *pDoc->m_pNodes
$5 = {<BigPtrArray> = BigPtrArray of length 11 = {
[   0] 0x6063780            StartNode/SwNormalStartNode parent-start: 0x6063780 end: 0x60646f0, 
[   1] 0x60646f0              EndNode start: 0x6063780, 
[   2] 0x6064b00            StartNode/SwNormalStartNode parent-start: 0x6063780 end: 0x6066280, 
[   3] 0x6066280              EndNode start: 0x6064b00, 
[   4] 0x605a5a0            StartNode/SwNormalStartNode parent-start: 0x6063780 end: 0x6066480, 
[   5] 0x6066480              EndNode start: 0x605a5a0, 
[   6] 0x3381220            StartNode/SwNormalStartNode parent-start: 0x6063780 end: 0x6066390, 
[   7] 0x6066390              EndNode start: 0x3381220, 
[   8] 0x6064f40            StartNode/SwNormalStartNode parent-start: 0x6063780 end: 0x6063340, 
[   9]  0x5ec6498            TextNode "", 
[  10] 0x6063340              EndNode start: 0x6064f40}, m_vIndices = 0x607d7a8, m_rMyDoc = @0x2c17f60, m_pEndOfPostIts = 0x60646f0, 
  m_pEndOfInserts = 0x6066280, m_pEndOfAutotext = 0x6066480, m_pEndOfRedlines = 0x6066390, m_pEndOfContent = std::unique_ptr<SwNode> = {
    get() = 0x6063340}, m_aOutlineNodes = {<o3tl::sorted_vector<SwNode*, CompareSwOutlineNodes, o3tl::find_unique, true>> = {
      m_vector = std::vector of length 0, capacity 0}, static npos = 18446744073709551615}, m_bInNodesDel = false, m_bInDelUpdOutline = false}
Comment 9 Dave Gilbert 2023-01-02 00:51:12 UTC
The oddity seems to happen during:

1712	    xTxtImport->InsertControlCharacter( ControlCharacter::APPEND_PARAGRAPH );

in:
#0  XMLParaContext::endFastElement(int) (this=0x15f1400) at /discs/fast/core/xmloff/source/text/txtparai.cxx:1715
#1  0x00007fffed1eb0ed in SvXMLImport::endFastElement(int) (this=0x49b1450, Element=197980) at /discs/fast/core/xmloff/source/core/xmlimp.cxx:870
#2  0x00007fffd02399dd in (anonymous namespace)::Entity::endElement() (this=0x499e190) at /discs/fast/core/sax/source/fastparser/fastparser.cxx:515

specifically (my line numbers are off a little from debug)

b sw/source/filter/xml/XMLRedlineImportHelper.cxx:728  (i.e. the triggering recursive check)
(click to load, hit the breakpoint)
p pDoc
b endElement
(gdb) p *((SwDoc *)0x1958470)->m_pNodes

(ok...)
c
c
<still good at this point>

goes into:
SvXMLImport::endFastElement (this=0x49d20b0, Element=197980) at /discs/fast/core/xmloff/source/core/xmlimp.cxx:o58
and then
#0  XMLParaContext::endFastElement(int) (this=0x15f1400) at /discs/fast/core/xmloff/source/text/txtparai.cxx:1688

and I think it's the call to:
1712	    xTxtImport->InsertControlCharacter( ControlCharacter::APPEND_PARAGRAPH );
that breaks it:

(gdb) p *((SwDoc *)0x1957c70)->m_pNodes
$56 = {<BigPtrArray> = BigPtrArray of length 11 = {
[   0] 0x4967ec0            StartNode/SwNormalStartNode parent-start: 0x4967ec0 end: 0x4968e30, 
[   1] 0x4968e30              EndNode start: 0x4967ec0, 
[   2] 0x4969240            StartNode/SwNormalStartNode parent-start: 0x4967ec0 end: 0x4969000, 
[   3] 0x4969000              EndNode start: 0x4969240, 
[   4] 0x495edf0            StartNode/SwNormalStartNode parent-start: 0x4967ec0 end: 0x496aec0, 
[   5] 0x496aec0              EndNode start: 0x495edf0, 
[   6] 0x4968020            StartNode/SwNormalStartNode parent-start: 0x4967ec0 end: 0x4968740, 
[   7] 0x4968740              EndNode start: 0x4968020, 
[   8] 0x49697f0            StartNode/SwNormalStartNode parent-start: 0x4967ec0 end: 0x4967bf0, 
[   9]  0x45a1458            TextNode "", 
[  10] 0x4967bf0              EndNode start: 0x49697f0}, m_vIndices = 0x49810a8, m_rMyDoc = @0x1957c70, m_pEndOfPostIts = 0x4968e30, 
  m_pEndOfInserts = 0x4969000, m_pEndOfAutotext = 0x496aec0, m_pEndOfRedlines = 0x4968740, m_pEndOfContent = std::unique_ptr<SwNode> = {get() = 0x4967bf0}, 
  m_aOutlineNodes = {<o3tl::sorted_vector<SwNode*, CompareSwOutlineNodes, o3tl::find_unique, true>> = {m_vector = std::vector of length 0, capacity 0}, 
    static npos = 18446744073709551615}, m_bInNodesDel = false, m_bInDelUpdOutline = false}
(gdb) n
1712	    xTxtImport->InsertControlCharacter( ControlCharacter::APPEND_PARAGRAPH );
(gdb) n
1715	    Reference < XTextCursor > xAttrCursor;
(gdb) p *((SwDoc *)0x1957c70)->m_pNodes
$57 = {<BigPtrArray> = BigPtrArray of length 12 = {
[   0] 0x4967ec0            StartNode/SwNormalStartNode parent-start: 0x4967ec0 end: 0x4968e30, 
[   1] 0x4968e30              EndNode start: 0x4967ec0, 
[   2] 0x4969240            StartNode/SwNormalStartNode parent-start: 0x4967ec0 end: 0x4969000, 
[   3] 0x4969000              EndNode start: 0x4969240, 
[   4] 0x495edf0            StartNode/SwNormalStartNode parent-start: 0x4967ec0 end: 0x496aec0, 
[   5] 0x496aec0              EndNode start: 0x495edf0, 
[   6] 0x4968020            StartNode/SwNormalStartNode parent-start: 0x4967ec0 end: 0x4968740, 
[   7] 0x4968740              EndNode start: 0x4968020, 
[   8] 0x49e15c8             TextNode "", 
[   9] 0x49697f0            StartNode/SwNormalStartNode parent-start: 0x4967ec0 end: 0x4967bf0, 
[  10]  0x45a1458            TextNode "", 
[  11] 0x4967bf0              EndNode start: 0x49697f0}, m_vIndices = 0x49810a8, m_rMyDoc = @0x1957c70, m_pEndOfPostIts = 0x4968e30, 
  m_pEndOfInserts = 0x4969000, m_pEndOfAutotext = 0x496aec0, m_pEndOfRedlines = 0x4968740, m_pEndOfContent = std::unique_ptr<SwNode> = {get() = 0x4967bf0}, 
  m_aOutlineNodes = {<o3tl::sorted_vector<SwNode*, CompareSwOutlineNodes, o3tl::find_unique, true>> = {m_vector = std::vector of length 0, capacity 0}, 
    static npos = 18446744073709551615}, m_bInNodesDel = false, m_bInDelUpdOutline = false}

which makes me think this is trying to modify the text that was deleted by the recursion fix up??
Comment 10 Michael Stahl (allotropia) 2023-01-05 12:42:12 UTC
thanks for the investigation, the crash happens because some cursor that's used by the import filter isn't invalidated when the nodes are deleted; then later nodes are inserted at invalid positions.

this is fixed with

https://gerrit.libreoffice.org/c/core/+/145077
https://gerrit.libreoffice.org/c/core/+/145078

now the file doesn't import because the invalidated cursor causes lots of exceptions, not sure if that's a problem - since it's obviously invalid...
Comment 11 Commit Notification 2023-01-05 18:26:17 UTC
Michael Stahl committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/8e05bdd26f21fc304978ff3b454cf355841ec75f

tdf#152710 sw: invalidate SwUnoCursors properly in DeleteRangeImpl()

It will be available in 7.6.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 12 Commit Notification 2023-01-05 18:26:19 UTC
Michael Stahl committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/475e59d29b7a6cc7f058af8ff863b3bb1a2a84a5

tdf#152710 sw: call and fix DeleteSection() instead

It will be available in 7.6.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 13 Commit Notification 2023-01-05 18:27:22 UTC
Michael Stahl committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/e5b5d9c8d33b1dd87e5a50856ad02f21df59dc5b

tdf#152710 xmloff: ignore exception in XMLChangedRegionImportContext

It will be available in 7.6.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 14 Michael Stahl (allotropia) 2023-01-05 18:29:01 UTC
fixing the import for this file turned out to be rather easy, but in case there's something in the paragraph after the bad change element there might be other places that can't handle the thrown exceptions...
Comment 15 Commit Notification 2023-01-06 20:46:20 UTC
Michael Stahl committed a patch related to this issue.
It has been pushed to "libreoffice-7-5":

https://git.libreoffice.org/core/commit/e8d36908ca57af37dfea747d32356d3175e2a9b0

tdf#152710 sw: invalidate SwUnoCursors properly in DeleteRangeImpl()

It will be available in 7.5.0.2.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 16 Commit Notification 2023-01-06 20:46:22 UTC
Michael Stahl committed a patch related to this issue.
It has been pushed to "libreoffice-7-5":

https://git.libreoffice.org/core/commit/e073e4afe5eb460083944eb22dc8d87ad334877a

tdf#152710 sw: call and fix DeleteSection() instead

It will be available in 7.5.0.2.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 17 Commit Notification 2023-01-06 20:47:25 UTC
Michael Stahl committed a patch related to this issue.
It has been pushed to "libreoffice-7-5":

https://git.libreoffice.org/core/commit/af920245710d1a51d98bba2c18b7002a32865e0b

tdf#152710 xmloff: ignore exception in XMLChangedRegionImportContext

It will be available in 7.5.0.2.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 18 Dave Gilbert 2023-01-07 23:54:48 UTC
Thanks for fixing that!  That's a bit deeper in the guts than I understood; I'd wondered about moving the deletion until later.

Now you say that the file is obviously invalid - true - but does ODF actually define any rules on the validity of change tracking info?  I couldn't find any.
Comment 19 Commit Notification 2023-01-08 20:32:01 UTC
Michael Stahl committed a patch related to this issue.
It has been pushed to "libreoffice-7-4":

https://git.libreoffice.org/core/commit/94246148b215b10062e1d425fec8f1647ff3ffea

tdf#152710 sw: invalidate SwUnoCursors properly in DeleteRangeImpl()

It will be available in 7.4.5.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 20 Stéphane Guillou (stragu) 2023-01-09 07:34:45 UTC
Verified fix with:

Version: 7.6.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 8ae84bb5566e12df64236a116b9d1889d6f5f052
CPU threads: 8; OS: Linux 5.15; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
Calc: threaded

Thanks everyone!
Comment 21 Commit Notification 2023-01-18 21:06:05 UTC
Xisco Fauli committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/6bf8ffbf90e701073a65e88ce4de26e40a24a07a

tdf#152710: sw_odfexport: Add unittest

It will be available in 7.6.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 22 Xisco Faulí 2023-01-24 10:36:25 UTC
7.4.5 was a hotfix release, updating target in status-whiteboard