Bug 158556 - DOCX Document That Opens/Loads Extremely Slowly
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 7.6.3.2 release (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard: target:24.8.0 target:24.2.0.0.beta2 t...
Keywords: bibisected, bisected, perf, regression
Depends on:
Blocks: File-Opening
Reported: 2023-12-05 22:35 UTC by Tex2002ans
Modified: 2024-05-03 13:43 UTC (History)

See Also:
Crash report or crash signature:


Attachments
The.Century.Dictionary.-.Volume.3[cu31924091890602].docx (17.25 MB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2023-12-05 22:36 UTC, Tex2002ans
Description Tex2002ans 2023-12-05 22:35:07 UTC
Description:
Attached document does all of these extremely slowly:

- Loading/Opening from scratch
- Jumping to last page
- PAGE UP from last page

Steps to Reproduce:
TEST #1: OPENING DOCUMENT

1) Double-Click on attached DOCX to open it.

Microsoft Word 2016:

- 43 secs

LibreOffice 7.6.3:

- 10 mins 20 secs

While:

- Microsoft Word initially opens it faster
- LibreOffice shows only a white loading screen

After opening, they both still take quite a while to generate all the pages too...

- - - - - - - - - - -

TEST #2: JUMP TO FINAL PAGE + PAGE UP

After document load:

1) Press Ctrl+End.
- (To jump to the end of the document.)

2) After reaching final page and document finishes loading:
- Press PAGE UP on keyboard.

This is how long it takes:

Microsoft Word 2016:

1) 28 secs
2) instant

LibreOffice 7.6.3:

1) 7 mins 42 secs
2) >37 mins
- (I gave up after 37 mins.)

Actual Results:
>1 hour of waiting.

Expected Results:
<2 minutes of waiting.


Reproducible: Always


User Profile Reset: No

Additional Info:
NOTE ON DOCUMENT: This was a DOCX document I generated by using ABBYY Finereader 12 to:

- OCR a PDF.
- Export to DOCX.

It is a dictionary that is ~2 million words + ~1450 DOCX pages.

(Original was ~910 pages of dense, triple-column text.)

- - -

Tested on:

Version: 7.6.3.2 (X86_64) / LibreOffice Community
Build ID: 29d686fea9f6705b262d369fede658f824154cc0
CPU threads: 8; OS: Windows 10.0 Build 19045; UI render: Skia/Raster; VCL: win
Locale: en-US (en_US); UI: en-US
Calc: CL threaded
Comment 1 Tex2002ans 2023-12-05 22:36:19 UTC
Created attachment 191261 [details]
The.Century.Dictionary.-.Volume.3[cu31924091890602].docx
Comment 2 Caolán McNamara 2023-12-07 13:05:53 UTC
There are apparently 895 headers (and 76 footers) in this document. That seems to be where the time is spent, importing those.
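
A rough way to check such counts without opening the file in Writer is to list the parts of the DOCX package itself. The sketch below is only an approximation: it assumes the common (but not mandatory) naming convention `word/headerN.xml` / `word/footerN.xml`, and the helper names are made up for illustration. The authoritative list is the set of header/footer relationships in the package.

```python
import re
import zipfile

# Assumption: header/footer parts follow the usual word/headerN.xml and
# word/footerN.xml naming. Generators are free to use other part names.
_PART_RE = re.compile(r"^word/(header|footer)\d*\.xml$")


def count_header_footer_parts(names):
    """Count header and footer parts in a list of package member names."""
    counts = {"header": 0, "footer": 0}
    for name in names:
        match = _PART_RE.match(name)
        if match:
            counts[match.group(1)] += 1
    return counts


def count_in_docx(path):
    """Open a DOCX (a ZIP package) and count its header/footer parts."""
    with zipfile.ZipFile(path) as package:
        return count_header_footer_parts(package.namelist())
```

Calling `count_in_docx("some-file.docx")` returns a dictionary with the two counts; a very large "header" value would point to the import cost described above.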
Comment 3 Timur 2023-12-07 19:25:22 UTC
It is always a good idea to test previous versions.
The file loaded in 80 seconds for me until the regression commit from bug 133560, which is already in 7.5.6, 7.6.0 beta and 24.2 master:
source ba07bfcda6b9f256f636708e52283be0f3a90c8a
Comment 4 Timur 2023-12-08 10:45:13 UTC
The above commit is the main slowdown according to my scripted bisect; that should be confirmed by opening the file.
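
For anyone who wants to repeat a timing-based bisect, here is a minimal sketch of a helper that could be passed to `git bisect run`. It is not the script used for this comment. The 100-second threshold, the 1800-second timeout and the function names are assumptions, and a headless conversion to PDF measures load plus layout plus export, so it is only a proxy for the interactive load time.

```python
import subprocess
import sys
import tempfile
import time

GOOD, BAD, SKIP = 0, 1, 125  # exit codes that `git bisect run` interprets


def classify(elapsed_seconds, threshold_seconds):
    """Map a measured time to a `git bisect run` exit code."""
    if elapsed_seconds is None:
        return SKIP  # this build could not be measured at all
    return GOOD if elapsed_seconds <= threshold_seconds else BAD


def time_command(argv, timeout_seconds):
    """Run argv; return wall-clock seconds, or None if it failed to run."""
    start = time.monotonic()
    try:
        subprocess.run(argv, check=True, timeout=timeout_seconds,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except subprocess.TimeoutExpired:
        return float(timeout_seconds)  # at least this slow
    except (subprocess.CalledProcessError, OSError):
        return None
    return time.monotonic() - start


# Hypothetical usage: bisect_load.py <path-to-soffice> <path-to-docx>
if __name__ == "__main__" and len(sys.argv) == 3:
    soffice_path, docx_path = sys.argv[1], sys.argv[2]
    argv = [soffice_path, "--headless", "--convert-to", "pdf",
            "--outdir", tempfile.gettempdir(), docx_path]
    elapsed = time_command(argv, timeout_seconds=1800)
    sys.exit(classify(elapsed, threshold_seconds=100))
```

The threshold has to sit between the "fast" and "slow" timings seen on the test machine, and the timeout has to be larger than the threshold so that a timed-out run is still classified as bad.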

An earlier, smaller slowdown from 80 secs to 120 secs came in 7.3, 7.4 and 7.5 with d0a8f6857e93f1f4a26f05615618ff733bfb4851
author	Vasily Melenchuk <vasily.melenchuk@cib.de>	Mon Dec 27 2021 
tdf#143703 sw: always assign name for fly section
But somehow I was still seeing 80 secs before Mike's commit.

For me, the moral of this and other regression bugs I see is that changes should not be backported, except those for crashes or data loss.
Comment 5 Mike Kaganski 2023-12-08 11:14:16 UTC
(In reply to Timur from comment #4)
> For me, moral from this and other regression bugs I see is that changes
> should not be backported, except those for crashes or data loss.

This is debatable :-) I tend to agree when the backport goes to X.Y.7, which will be the last release in the branch. But otherwise, this regression shows why it is *a good thing* to backport: I still remember that patch now, which could help me fix it; if it had not been backported, the bug would only have been filed 6+ months later, and recalling the details would be much harder. (For this case, though, I have no idea how to fix it: we should be correct first, *then* fast, and I am not an expert in perf bugs.)
Comment 6 Commit Notification 2023-12-15 09:30:16 UTC
Mike Kaganski committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/dcae6615ed254cf7884fa6415f64561f85b93588

tdf#158556: provide objects anchored to node as a hidden property

It will be available in 24.8.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 7 Mike Kaganski 2023-12-15 09:41:39 UTC
So with the change I merged now, it is almost twice as fast opening this bugdoc on my system. Not as fast as before the change mentioned in comment 3, but that is likely as far as I can improve it; the change that introduced the more expensive algorithm can't be reverted completely, because that would make import wrong, as opposed to slow :-)

Hopefully the problem of DOCX documents with hundreds of images won't be too widespread.
Comment 8 Mike Kaganski 2023-12-15 09:43:34 UTC
Note that the merged change doesn't address the other points mentioned in comment 0: slow navigation after opening. I guess that aspect (unrelated to the opening time regression) needs its own report, where it can be handled independently.
Comment 9 Commit Notification 2023-12-21 11:03:57 UTC
Mike Kaganski committed a patch related to this issue.
It has been pushed to "libreoffice-24-2":

https://git.libreoffice.org/core/commit/2f9ecce61b0c806a33a9f641e43f4a71ed699fee

tdf#158556: provide objects anchored to node as a hidden property

It will be available in 24.2.0.0.beta2.

Comment 10 Tex2002ans 2024-01-09 22:44:23 UTC
On 24.8.0.0alpha0 (2024-01-06 02:53:39), it took:

- 7 mins 49 secs

for first paint... so about 25% better load speed for me.
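
As a quick check of that figure, the arithmetic can be written out as below. It uses the 10 mins 20 secs measured on 7.6.3 in the description and the 7 mins 49 secs measured here; the function names are just for illustration.

```python
def to_seconds(minutes, seconds):
    """Convert a minutes/seconds pair to seconds."""
    return minutes * 60 + seconds


def percent_reduction(before_seconds, after_seconds):
    """Percentage by which the load time dropped."""
    return 100.0 * (before_seconds - after_seconds) / before_seconds


before = to_seconds(10, 20)  # 7.6.3.2: 620 s
after = to_seconds(7, 49)    # 24.8 alpha daily: 469 s
print(f"{percent_reduction(before, after):.1f}% less load time")  # 24.4% less load time
```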

- - -

This was the daily I used to test:

Version: 24.8.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 25276df12abd9d002f7f899900434617b256f745
CPU threads: 8; OS: Windows 10.0 Build 19045; UI render: Skia/Vulkan; VCL: win
Locale: en-US (en_US); UI: en-US
Calc: CL threaded

- - -

> I guess, that that aspect (unrelated to the opening time regression) needs an own report, where it could be handled independently.

Should we treat Ctrl+End (then Page Up) as 2 separate performance bugs? Or just submit them as 1?

Side Note: And while this document is still a tiny bit faster to paint... it's still *extremely* slow. (And the second I moved the mouse after load, I got another big white screen/freeze.)
Comment 11 Mike Kaganski 2024-01-10 02:55:04 UTC
(In reply to Tex2002ans from comment #10)
> On 24.8.0.0alpha0 (2024-01-06 02:53:39), it took:
> 
> - 7 mins 49 secs
> 
> for first paint... so about 25% better load speed for me.

Heh, it basically matches my findings - from the commit message:

LibreOffice 7.6.0.3 (TDF build):
real    8m37.386s
Current master (my no-debug build):
real    10m6.776s
Current master with this patch (my no-debug build):
real    5m41.524s

... so when I wrote "almost twice as fast", I was only comparing "immediately before the patch" with "after the patch"; something else slowed things down further between 7.6.0.3 and master ...
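
To make the comparison explicit, the three `real` values above can be turned into ratios. This is a small sketch only; `parse_real` is a helper written for this illustration and handles just the `XmY.ZZZs` form shown above.

```python
import re


def parse_real(text):
    """Convert a `time`-style duration such as '8m37.386s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


timings = {
    "7.6.0.3": parse_real("8m37.386s"),
    "master": parse_real("10m6.776s"),
    "master+patch": parse_real("5m41.524s"),
}

# Patch effect measured against the build immediately before it.
speedup_vs_master = timings["master"] / timings["master+patch"]
# Patch effect measured against the 7.6.0.3 release build.
speedup_vs_release = timings["7.6.0.3"] / timings["master+patch"]

print(f"vs master:  {speedup_vs_master:.2f}x")
print(f"vs 7.6.0.3: {speedup_vs_release:.2f}x")
```

The first ratio is the "almost twice as fast" figure; the second is smaller because unpatched master was itself slower than 7.6.0.3.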

> > I guess, that that aspect (unrelated to the opening time regression) needs an own report, where it could be handled independently.
> 
> Should we treat Ctrl+End (then Page Up) as 2 separate performance bugs? Or
> just submit them as 1?

Please file it as a single bug. If it later turns out to be two different things, of course it might require a third report - but that's never a problem to do later :-D
Comment 12 Commit Notification 2024-03-21 07:39:55 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/241e2d68664e0e53cf02fe9986462c4a9ecd8d42

tdf#158556 speedup docx load

It will be available in 24.8.0.

Comment 13 Commit Notification 2024-03-21 20:22:47 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "libreoffice-24-2":

https://git.libreoffice.org/core/commit/e4519e38d3598c3e26f2585bbc2553bc7ff5db4c

tdf#158556 speedup docx load

It will be available in 24.2.3.

Comment 14 Commit Notification 2024-03-22 07:00:57 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/284d80825ec7cf3c39af91959e4bf3d539b066f4

tdf#158556 speed up SwNodes::RemoveNode

It will be available in 24.8.0.

Comment 15 Commit Notification 2024-03-25 11:25:11 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/ab29c857c669bcca3d8eea8a5a9e6ad5eae622d7

tdf#158556 speedup docx load

It will be available in 24.8.0.

Comment 16 Commit Notification 2024-03-25 11:25:13 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/77224aaec6ba89194a404805d7190f88e92fcc9f

tdf#158556 speedup docx load

It will be available in 24.8.0.

Comment 17 Noel Grandin 2024-03-25 12:54:25 UTC
The root of the remaining problem is how we create and update SwPageDesc. It is a very heavy object, and when we update it, we create one, then copy it to the destination document, then delete it.

We should surely be able to short circuit this process somehow, by passing ownership of the new SwPageDesc to the document, but I'm not sure how, given how many constituent parts the SwPageDesc has.
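
The difference between the two patterns can be shown with a toy model. This is not LibreOffice code: `PageDesc` and `Document` below are hypothetical stand-ins, and the sketch deliberately ignores the hard part mentioned above, namely that the real object's constituent parts may each be tied to a document.

```python
import copy


class PageDesc:
    """Toy stand-in for a heavy page description (hypothetical)."""

    def __init__(self, name, parts):
        self.name = name
        self.parts = parts  # e.g. header, footer and frame format data


class Document:
    """Toy document that stores page descriptions by name."""

    def __init__(self):
        self.page_descs = {}
        self.deep_copies = 0

    def update_by_copy(self, temp):
        """Pattern described above: copy the temporary in; caller then drops it."""
        self.page_descs[temp.name] = copy.deepcopy(temp)
        self.deep_copies += 1

    def adopt(self, temp):
        """Short-circuit idea: take over the freshly built object as-is."""
        self.page_descs[temp.name] = temp


doc = Document()
fresh = PageDesc("Standard", {"header": ["x"] * 1000, "footer": ["y"] * 1000})
doc.update_by_copy(fresh)  # pays for a full copy of every part
doc.adopt(fresh)           # no copy, the document now owns `fresh`
```

With hundreds of headers, the copy path repeats the expensive step once per update, while the adopt path only stores a reference.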
Comment 18 Tex2002ans 2024-03-25 14:40:23 UTC
Fantastic. Thank you so much, Noel.

I can't wait to test the dailies again and see how much faster it got. :)

(According to your notes, it looks like the load time is ~30% of what it was a week ago!)
Comment 19 Commit Notification 2024-04-08 14:55:27 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "libreoffice-24-2":

https://git.libreoffice.org/core/commit/4126c6b94e87f7ad2a1aa93d66bbb3edf67ec790

tdf#158556 speedup docx load

It will be available in 24.2.3.

Comment 20 Mike Kaganski 2024-05-03 13:43:44 UTC
Comparing the opening times on my system, I see ~110s using LO v.7.2.0, and ~105s using v.24.2.3. Thanks to Noel's immense work, I believe that this regression can be closed now?