Bug 150768 - Hang on opening and converting a DOCX file
Summary: Hang on opening and converting a DOCX file
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 3.6.0.4 release (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: bibisected, regression
Depends on:
Blocks: Layout-Loops, Writer-Loops
 
Reported: 2022-09-03 22:56 UTC by Hossein
Modified: 2023-11-06 23:26 UTC
CC List: 4 users

See Also:
Crash report or crash signature:


Attachments
Sample.docx (3.32 MB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2022-09-03 22:56 UTC, Hossein
Flamegraph (180.06 KB, application/x-bzip)
2022-09-09 09:11 UTC, Julien Nabet
file converted to ODT with recent master (3.33 MB, application/vnd.oasis.opendocument.text)
2022-10-04 19:33 UTC, Michael Stahl (allotropia)
PDF output from LO 3.5 (2.50 MB, application/pdf)
2022-10-10 12:28 UTC, Hossein

Description Hossein 2022-09-03 22:56:22 UTC
Created attachment 182193
Sample.docx

Trying to open and convert sample.docx causes LibreOffice to hang.

Description:
When trying to edit a page break (directly after the first page), LibreOffice crashes.

Steps to Reproduce:
1. Open sample.docx
2. Go to the last page, or try to convert it to PDF

Easier approach:

libreoffice7.0 --convert-to pdf ~/Downloads/sample.docx
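A hang is easier to tell apart from a merely slow conversion when the command runs under a time limit. A minimal sketch, assuming GNU coreutils `timeout` is available (it exits with status 124 when the limit is reached); the helper name `run_with_limit` and the limits used are illustrative, not part of LibreOffice:

```shell
#!/bin/sh
# Sketch: run a command under a time limit and report whether it hung.
# Assumes GNU coreutils `timeout`, which exits with status 124 when the
# limit is reached. `run_with_limit` is a hypothetical helper name.
run_with_limit() {
    limit="$1"
    shift
    rc=0
    # Send the command's own output to stderr so only the verdict is on stdout.
    timeout "$limit" "$@" >&2 || rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "hang"    # killed by timeout after $limit seconds
    elif [ "$rc" -eq 0 ]; then
        echo "ok"      # finished within the limit
    else
        echo "error"   # failed for some other reason
    fi
}

# Example, using the binary name and path from the steps above:
#   run_with_limit 30 libreoffice7.0 --convert-to pdf ~/Downloads/sample.docx
```

On an affected build the example invocation should print `hang`; on an unaffected one it should print `ok` once the PDF has been written.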

Actual Results:
Hang

Expected Results:
No hang; LibreOffice stays responsive


Reproducible: Always


User Profile Reset: Yes


Additional Info:

Reproducible with LO 7.0:

Version: 7.0.6.2
Build ID: 144abb84a525d8e30c9dbbefa69cbbf2d8d4ae3b
CPU threads: 8; OS: Linux 5.15; UI render: default; VCL: gtk3
Locale: fa-IR (en_US.UTF-8); UI: en-US
Calc: threaded

But not reproducible with LO 6.4:

Version: 6.4.0.1
Build ID: 1b6477b31f0334bd8620a96f0aeeb449b587be9f
CPU threads: 8; OS: Linux 5.15; UI render: default; VCL: gtk3; 
Locale: fa-IR (en_US.UTF-8); UI-Language: en-US
Calc: threaded
Comment 1 Hossein 2022-09-04 00:13:02 UTC
Bibisected using linux-64-7.0 to:

commit 81ec0039b2085faab49380c7a56af0c562d4c9e4
Author: Michael Stahl <Michael.Stahl@cib.de>
Date:   Mon Jan 20 13:48:27 2020 +0100

    tdf#129582 sw: fix copying of flys in header/footer in DOCX/RTF import
    
    The problem is that the exception for writerfilter in
    IsDestroyFrameAnchoredAtChar() and IsSelectFrameAnchoredAtPara() is
    wrong in the case when the header/footer content is copied via
    SwXText::copyText(); that is, previously the situation was that
    writerfilter relied on Delete not deleting such flys (for
    RemoveLastParagraph) but Copy copying them.
    
    (regression from 28b77c89dfcafae82cf2a6d85731b643ff9290e5
     and e75dd1fc992f168f24d66595265a978071cdd277)
    
    So restrict the writerfilter hack to delete; this causes a problem with
    ooxmlexport9 test testTdf100075: it has 2 flys anchored at the
    same paragraph; writerfilter will insert the content into the body and
    then convert to fly; when the 2nd one is converted it will copy the 1st
    fly and anchor it inside the 2nd fly but then unotext.cxx:1719 will
    reset its anchor to inside the body...
    
    Prevent this unwanted copy by relying on the new parameter bCopyText
    that was introduced in 04b2310aaa094794ceedaa1bb6ff1823a2d29d3e,
    but change things a bit so that the case that pass in the extra flag
    isn't the copyText() one that wants the *normal* selection semantics in
    writerfilter import, but the 2 known places that want the *exceptional*
    selection semantics in writerfilter import (hopefully there aren't more).
    
    This is not ideal and the various bool parameters to CopyRange() plus
    mbCopyIsMove plus mbIsRedlineMove should probably be consolidated
    into some flags enum passed to CopyRange().
    
    Change-Id: I638c7fa7ad0b4ec149aa6a1485e32f2c8e29ff5a
    Reviewed-on: https://gerrit.libreoffice.org/c/core/+/87072
    Tested-by: Jenkins
    Reviewed-by: Michael Stahl <michael.stahl@cib.de>


$ git bisect log
git bisect start
# good: [d67926cda658cfe40d35f9f0f203c3407f3700c9] source 9bc848cf0d301aa57eabcffa101a1cf87bad6470
git bisect good d67926cda658cfe40d35f9f0f203c3407f3700c9
# bad: [28c2621cf6a6d383bd0dfa3231adce6a6bff1fb4] source 626ea4e62a3e5005fe9825923a1c0c5bdb61cc08
git bisect bad 28c2621cf6a6d383bd0dfa3231adce6a6bff1fb4
# bad: [056a86d70b2c4322bac1bc3685eacd5364c1dbcf] source 368e9a829e07b3f8624898d69d2c00ec3bc590ec
git bisect bad 056a86d70b2c4322bac1bc3685eacd5364c1dbcf
# good: [13a5216e1fdd1c0b1b633301c84f441eeea45fa3] source c1599fc5c9800086548595d1f1464619a7024d06
git bisect good 13a5216e1fdd1c0b1b633301c84f441eeea45fa3
# bad: [9d0e41ac0017c0f2f5d1d86c617c4a95fbda5382] source 415c1b05242b80ca883596952caa0e179a07b409
git bisect bad 9d0e41ac0017c0f2f5d1d86c617c4a95fbda5382
# bad: [996c18801f638075875b1072355f69b910033715] source 0a64b33617299ece871a947828855b16e2482706
git bisect bad 996c18801f638075875b1072355f69b910033715
# good: [3b180e787931471468fe4cfac3ca0fc1efac8ebc] source 160cde8ec0473b4a0c8e15ee13520d83171aea8d
git bisect good 3b180e787931471468fe4cfac3ca0fc1efac8ebc
# good: [09616dc5a235f7c357f1d5a5614acf4814e55ff3] source 4bceda79065e91d6410d05931bff0324a9cbc321
git bisect good 09616dc5a235f7c357f1d5a5614acf4814e55ff3
# bad: [fb99ed4c0559e83ab6acc3b0dfcc93ae352e333e] source 998308c363dfad03143591aa18256d2669b4da11
git bisect bad fb99ed4c0559e83ab6acc3b0dfcc93ae352e333e
# bad: [e19de9bd39b48873908e612837b5b01c7450e837] source 51f8e04eaaea50b779e3882e87628a6e625e0fd8
git bisect bad e19de9bd39b48873908e612837b5b01c7450e837
# good: [b60917adec22b33c242694907e1e808f61d28110] source ad3580df085b3a3d66eb73cae997ea5ca178ccc1
git bisect good b60917adec22b33c242694907e1e808f61d28110
# bad: [d8dda5abc316268e71c6998105c0406d8661c8c5] source 68356ba158fa689f15e76763f24153976265ac84
git bisect bad d8dda5abc316268e71c6998105c0406d8661c8c5
# good: [5ee357d3fbdc61237afc6dba7b319eb5a12abe50] source d6628ddaf6e2acf53c5a7cbbcb201d700cd95f54
git bisect good 5ee357d3fbdc61237afc6dba7b319eb5a12abe50
# good: [72befe1183a1e433762e456d8c789891c3461d6b] source 901ae316b919680d59b064c6f79fb0910e6be7da
git bisect good 72befe1183a1e433762e456d8c789891c3461d6b
# bad: [af2ca860f235f5ae3dbe0d92f44a371123f5c3ab] source 81ec0039b2085faab49380c7a56af0c562d4c9e4
git bisect bad af2ca860f235f5ae3dbe0d92f44a371123f5c3ab
# first bad commit: [af2ca860f235f5ae3dbe0d92f44a371123f5c3ab] source 81ec0039b2085faab49380c7a56af0c562d4c9e4
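The log above marks each build good or bad individually. Git can drive the same loop with `git bisect run`, which treats exit code 0 as good, 125 as "skip this commit", and any other code from 1 to 127 as bad. A minimal sketch of such a check, assuming a bibisect checkout whose commits contain a runnable `instdir/program/soffice`, a copy of `sample.docx` in the checkout root, and GNU `timeout`. The function name, the 60-second limit, and the choice to skip (rather than mark bad) builds that fail without hanging are all assumptions:

```shell
#!/bin/sh
# Sketch of a check for `git bisect run` in a bibisect checkout.
# git-bisect convention: exit 0 = good, 125 = skip this commit,
# any other code from 1 to 127 = bad.
# Assumes GNU coreutils `timeout` (status 124 on time-out).
# `bisect_verdict` is a hypothetical helper name.
bisect_verdict() {
    limit="$1"
    shift
    rc=0
    timeout "$limit" "$@" >/dev/null 2>&1 || rc=$?
    case "$rc" in
        0)   return 0 ;;    # converted in time: good
        124) return 1 ;;    # still running at the limit: the hang, bad
        *)   return 125 ;;  # failed without hanging: skip, cannot judge
    esac
}

# Assumed usage: save this function plus the two lines below as check.sh
# outside the checkout,
#   bisect_verdict 60 instdir/program/soffice --headless --convert-to pdf sample.docx
#   exit $?
# then mark the known good/bad builds and run: git bisect run /path/to/check.sh
```

Skipping non-hanging failures keeps builds that cannot start from being blamed for the loop; if a crash were the symptom instead, that branch would need to return 1.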
Comment 2 Telesto 2022-09-04 12:18:01 UTC
Confirm
Version: 7.5.0.0.alpha0+ / LibreOffice Community
Build ID: 7a89eae97a970939174d59aa58147eaa194acaee
CPU threads: 8; OS: Mac OS X 12.3.1; UI render: Skia/Metal; VCL: osx
Locale: nl-NL (nl_NL.UTF-8); UI: en-US
Calc: threaded
Comment 3 Julien Nabet 2022-09-09 09:11:11 UTC
Created attachment 182333
Flamegraph

On pc Debian x86-64 with master sources updated today, I could reproduce this.
Perhaps this flamegraph might give some leads.
Comment 4 Michael Stahl (allotropia) 2022-10-04 19:33:18 UTC
Created attachment 182835
file converted to ODT with recent master


If I convert the attachment to ODT and then run 6.3 with --convert-to pdf, I get an infinite loop as well.

The identified commit only affects DOCX import, but the loop is in the layout code.

=> This is not really a regression from that commit.
Comment 5 Michael Stahl (allotropia) 2022-10-04 19:36:14 UTC
please test again with ODT file if it's really a regression
Comment 6 Hossein 2022-10-09 21:53:47 UTC
(In reply to Michael Stahl (allotropia) from comment #5)
> please test again with ODT file if it's really a regression

Confirmed again with my own build:

I have built and tested 81ec0039b2085faab49380c7a56af0c562d4c9e4 and also one commit before that, 901ae316b919680d59b064c6f79fb0910e6be7da.

I tried to create output using this command:

    instdir/program/soffice --headless --convert-to pdf sample.docx

In 81ec0039b208, soffice still had not finished the task after 30 seconds, whereas with the previous commit (901ae316b919) the task finished within a few seconds.
Comment 7 Michael Stahl (allotropia) 2022-10-10 10:18:52 UTC
(In reply to Hossein from comment #6)
>     instdir/program/soffice --headless --convert-to pdf sample.docx

the idea was to test with sample.odt
Comment 8 Hossein 2022-10-10 12:11:59 UTC
(In reply to Michael Stahl (allotropia) from comment #7)
> (In reply to Hossein from comment #6)
> >     instdir/program/soffice --headless --convert-to pdf sample.docx
> 
> the idea was to test with sample.odt
I tested the ODT file. It also gets stuck in the conversion, even with builds before 81ec0039b2085faab49380c7a56af0c562d4c9e4.
Comment 9 Hossein 2022-10-10 12:28:53 UTC
Created attachment 182944
PDF output from LO 3.5

At least in LibreOffice 3.5, exporting ODT to PDF works fine:

LibreOffice 3.5.0rc3 
Build ID: 7e68ba2-a744ebf-1f241b7-c506db1-7d53735

It hangs even in LO 5.3.
Comment 10 Telesto 2022-10-10 13:29:25 UTC
Freeze with
Version: 4.1.0.4
Build ID: 89ea49ddacd9aa532507cbf852f2bb22b1ace28

In my case, also with 3.5.7.2.

---
Non-developer speculation based on a VerySleepy stack profile:

TextWidth (or FontSize or Scaling) can't be calculated properly for substituted fonts. This throws off the table size calculation (does the table fit on a single page, or should it be split across 2 pages?), triggering an endless repagination loop.

I have had more loops/hangs with documents using substituted fonts in the past, but didn't keep track of those :-(

OutputDevice::RemoveFontSubstitute
OutputDevice::GetDevFontSize
OutputDevice::ImplGlyphFallbackLayout
OutputDevice::ImplNewFont
OutputDevice::ImplLayout
OutputDevice::GetTextArray
OutputDevice::GetTextWidth
SwFmtINetFmt::PutValue
SwTxtNode::BuildConversionMap
SwTableCellInfo::~SwTableCellInfo
SwTableCellInfo::~SwTableCellInfo
SwTableCellInfo::~SwTableCellInfo
SwTxtNode::GetScalingOfSelectedText
SwTxtNode::GetScalingOfSelectedText
SwTxtNode::GetScalingOfSelectedText
SwTxtNode::GetScalingOfSelectedText
SwTxtNode::GetScalingOfSelectedText
SwTableCellInfo::~SwTableCellInfo
SwTableCellInfo::~SwTableCellInfo
SwTableCellInfo::~SwTableCellInfo
SwTableCellInfo::~SwTableCellInfo
SwTxtNode::IsCollapse
Comment 11 Gabor Kelemen (allotropia) 2023-11-06 23:26:42 UTC
ODT attachment 182835 seems to have started looping between 3.5.0 and 3.6.0.

Bibisect with linux-43all points to the range:

https://cgit.freedesktop.org/libreoffice/core/log/?qt=range&q=0a8596dd8ebbbc80e87d4bdfafe3cf53355b7d43..c0a99301f5d854cad8baeaca798549424937598d

Of these, I see one Writer layout-touching commit:

https://cgit.freedesktop.org/libreoffice/core/commit/?id=8a233f17ae589b33e3b54ef9ebb1fcff41ef6cd7
n#750258: removed strange non-wrapping condition