Bug 135245 - FILEOPEN: DOCX: slow opening
Summary: FILEOPEN: DOCX: slow opening
Status: RESOLVED NOTABUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 7.1.0.0.alpha0+
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: bibisectRequest, filter:docx, perf, regression
Duplicates: 135171
Depends on:
Blocks: DOCX-Opening
Reported: 2020-07-28 19:25 UTC by Telesto
Modified: 2021-01-21 14:43 UTC (History)
CC List: 3 users

See Also:
Crash report or crash signature:


Attachments
Example file (22.81 KB, application/vnd.oasis.opendocument.text)
2020-07-28 19:25 UTC, Telesto
DOCX file created by LibreOffice 7.1 (171.76 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2020-07-31 16:39 UTC, Xisco Faulí
performance-bug_REDUCED2.odt: looking for duplication of page number frames (175.15 KB, application/vnd.oasis.opendocument.text)
2021-01-21 09:13 UTC, Justin L

Description Telesto 2020-07-28 19:25:13 UTC
Description:
FILEEXPORT DOCX: Heading off and slow opening

Steps to Reproduce:
1. Open the attached file
2. Save it as DOCX
3. Reload the file -> takes 30 seconds

Actual Results:
30 seconds of waiting time, and every row has multiple numbers

Expected Results:
Opening speed of 4.4.7.2 (4 seconds) and no broken header


Reproducible: Always


User Profile Reset: No



Additional Info:
Version: 7.1.0.0.alpha0+ (x64)
Build ID: <buildversion>
CPU threads: 4; OS: Windows 6.3 Build 9600; UI render: Skia/Raster; VCL: win
Locale: nl-NL (nl_NL); UI: en-US
Calc: CL
Comment 1 Telesto 2020-07-28 19:25:25 UTC
Created attachment 163713 [details]
Example file
Comment 2 Xisco Faulí 2020-07-31 16:38:50 UTC
One bug at a time.
It takes

real	3m41,778s
user	3m35,108s
sys	0m3,169s


in

Version: 7.1.0.0.alpha0+
Build ID: 231e1e416c039d1f9724962a89cf0573a3db48a2
CPU threads: 4; OS: Linux 4.19; UI render: default; VCL: gtk3
Locale: en-US (en_US.UTF-8); UI: en-US
Calc: threaded
Comment 3 Xisco Faulí 2020-07-31 16:39:48 UTC
Created attachment 163819 [details]
DOCX file created by LibreOffice 7.1
Comment 4 Xisco Faulí 2020-07-31 16:49:14 UTC
in

Version: 4.3.0.0.alpha1+
Build ID: c15927f20d4727c3b8de68497b6949e72f9e6e9e

it takes

real	0m27,966s
user	0m24,575s
sys	0m1,113s
Comment 5 Gabor Kelemen (allotropia) 2021-01-05 12:28:05 UTC
This seems to be quite similar to the bugdoc in bug #135171

Two problems here:
- The headers contain frames for page numbers, which are converted to drawing shapes on DOCX export. Loading many drawing shapes is slow.
- On opening the DOCX version in LO, the number of such shapes is a lot higher than in the original. This particular one went from ~150 to almost 6000 frames... the other in bug #135171 went from 21 to 2100 with the current nightly.
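
A rough way to check shape counts like these is to count anchored drawings per part of the DOCX package. The sketch below is an illustration only, not something taken from this report: it assumes that each exported frame shows up as one wp:anchor element in word/document.xml or in a header/footer part, and it ignores VML fallbacks, so its totals need not match the frame counts quoted above. The function name anchored_shapes_per_part is hypothetical.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML drawing namespace (the "wp:" prefix in DOCX parts).
WP_NS = "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"
ANCHOR_TAG = f"{{{WP_NS}}}anchor"


def anchored_shapes_per_part(docx):
    """Map the main document and each header/footer part to its wp:anchor count.

    `docx` may be a file path or a binary file-like object.
    """
    counts = {}
    with zipfile.ZipFile(docx) as zf:
        for name in zf.namelist():
            # Only look at the body, header and footer parts.
            if re.fullmatch(r"word/(document|header\d*|footer\d*)\.xml", name):
                root = ET.fromstring(zf.read(name))
                counts[name] = sum(1 for _ in root.iter(ANCHOR_TAG))
    return counts
```

Running it on a DOCX saved by LibreOffice (for example attachment 163819) and comparing the header parts against the body should show where the bulk of the shapes sits.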
Comment 6 Justin L 2021-01-21 09:13:50 UTC
Created attachment 169070 [details]
performance-bug_REDUCED2.odt: looking for duplication of page number frames

(In reply to Gabor Kelemen from comment #5)
> - On opening the DOCX version in LO, the number of such shapes is a lot
> higher than in the original. This particular one went from ~150 to almost 6000
> frames... the other in bug #135171 went from 21 to 2100 with the current nightly.

I bibisected using my minimized test with Linux's bibisect-43max. These bibisects are terrible - can't save or open files without crashing for huge stretches.
The last commit that was good is 
bibisect commit 79791a25c551c1e326c15a3ce325a1878a62c004
    source-hash-0351b59aea2b87c2685c0963a57145bdc75a7a86

The next commit where saving and loading works is
bibisect commit 5354507aaf9118ed4c1230ea167759a2274c1d4a
    source-hash-05955dd2096c29853f831d5d16b86c7b7ca00b28

So with a span of a mere 4400 commits, the range is https://cgit.freedesktop.org/libreoffice/core/log/?id=577dd32b1c4eb0a4cff574fbabca987cb52b831b&qt=range&q=0351b59aea2b87c2685c0963a57145bdc75a7a86..05955dd2096c29853f831d5d16b86c7b7ca00b28
Comment 7 Justin L 2021-01-21 09:37:47 UTC
Well, in these test documents, there truly are a TON of nested frames for the page numbers - as seen in the ODT's styles.xml.

So almost certainly the first commit from author Zolnai Tamás on 2014-02-16 17:35:14 with commit 05955dd2096c29853f831d5d16b86c7b7ca00b28
   DOCX export: nested text frames
   In Word it is not allowed to anchor a shape to another shape.
   That's why this code write text boxes only on the first level,
   nested frames is also written out on the same level because
   writeDMLText/WriteVMLText will push nested frames
   into m_aFramesOfParagraph's back.
is the reason for the "multiplication" of frames seen in the header.

So I'm not sure any of this could be considered a bug (except for the design of the original document).

(In reply to Gabor Kelemen from comment #5)
>  Loading many drawing shapes is slow.
Well yes - this can be considered a known problem, and these example documents are good proof of that. I imagine there is a bug report about that already.
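
The "same level" behaviour described in the quoted commit message can be pictured with a toy model. This is a sketch under assumptions, not LibreOffice code: the Frame class and flatten_frames function are hypothetical, and the only thing borrowed from the commit message is the idea that writing a frame pushes its nested frames onto the back of the pending list, so they are written out later at the top level instead of inside their parent.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    # Hypothetical stand-in for a text frame that may contain nested frames.
    name: str
    children: list["Frame"] = field(default_factory=list)


def flatten_frames(top_level):
    """Return frame names in the order a single-level writer would emit them."""
    pending = list(top_level)  # plays the role of the per-paragraph frame list
    written = []
    i = 0
    while i < len(pending):
        frame = pending[i]
        written.append(frame.name)       # the frame itself is written on the first level
        pending.extend(frame.children)   # nested frames are queued at the back
        i += 1
    return written
```

In this model a chain of frames nested inside each other ends up as the same number of sibling shapes, which is the flattening the commit message describes; it does not by itself model the increase in frame count reported above.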
Comment 8 Justin L 2021-01-21 09:39:21 UTC
*** Bug 135171 has been marked as a duplicate of this bug. ***
Comment 9 Telesto 2021-01-21 12:57:42 UTC
(In reply to Justin L from comment #7)
The bug report is going in the wrong direction :P. Someone marked my initial file as obsolete..

The source ODT file has only 20 frames (attachment 163713 [details]). Those tons of frames are generated by LibreOffice.

So I'm not interested in solving the issue of opening a DOCX with tons of frames; I'd prefer LibreOffice not creating those in the first place.


But, well, I never looked at the file properly. It was more a case of "it opens slowly, so something is wrong", without looking at the frame count. So the title/summary is kind of misleading.
Comment 10 Justin L 2021-01-21 14:22:56 UTC
(In reply to Telesto from comment #9)
> The source ODT file has only 20 frames.. attachment 163713 [details].. Those
> tons of frame are generated by LibreOffice

No - it actually has about 20 frames per page.  Look at styles.xml and you will see gazillions of frames within frames (11 to be exact) in the chapter5even and odd page styles.

It is identical in concept (and probably identical in terms of original source document) to the bug I marked as duplicate.
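
For anyone who wants to verify the frame count and nesting in styles.xml without reading the XML by hand, a small script along these lines could do it. It is only a sketch: the function name frame_stats is hypothetical, and it assumes that the frames in question are plain draw:frame elements in the chosen part of the ODT package.

```python
import zipfile
import xml.etree.ElementTree as ET

# ODF drawing namespace (the "draw:" prefix in ODT parts).
DRAW_NS = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
FRAME_TAG = f"{{{DRAW_NS}}}frame"


def frame_stats(odt, member="styles.xml"):
    """Return (total draw:frame count, deepest frame-in-frame nesting) for one ODT part.

    `odt` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(odt) as zf:
        root = ET.fromstring(zf.read(member))

    total = 0
    deepest = 0
    # Iterative depth-first walk, so deep nesting cannot hit the recursion limit.
    stack = [(root, 0)]
    while stack:
        elem, depth = stack.pop()
        if elem.tag == FRAME_TAG:
            depth += 1
            total += 1
            deepest = max(deepest, depth)
        stack.extend((child, depth) for child in elem)
    return total, deepest
```

Calling frame_stats("performance-bug_REDUCED2.odt") inspects the page styles; passing member="content.xml" checks the document body instead.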
Comment 11 Telesto 2021-01-21 14:43:34 UTC
Let's say: NAB (not a bug)