Bug 135683 - FILEOPEN DOCX: Somewhat slow opening of document containing a 222-page table
Summary: FILEOPEN DOCX: Somewhat slow opening of document containing a 222-page table
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: Inherited From OOo (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard: target:7.3.0
Keywords: haveBacktrace, perf
Duplicates: 136748 148936
Depends on:
Blocks: DOCX-Tables DOCX-Opening Performance
Reported: 2020-08-12 21:03 UTC by Telesto
Modified: 2022-11-13 13:16 UTC

See Also:
Crash report or crash signature:


Attachments
Perf flamegraph (268.99 KB, image/svg+xml)
2021-08-28 12:43 UTC, Buovjaga

Description Telesto 2020-08-12 21:03:33 UTC
Description:
FILEOPEN DOCX: Slow opening of a document containing a 222-page table

Steps to Reproduce:
1. Open attachment 164209 from bug 135584
2. Note the time until the page counter reaches 222 pages and CPU usage drops
3. Save the file as ODT
4. Reload the file and measure the time until CPU usage drops

Note: disable the automatic spell checker

Actual Results:
Opening the ODT is far faster and smoother than opening the DOCX

Expected Results:
Maybe DOCX opening can be tweaked to come closer to the ODT load time


Reproducible: Always


User Profile Reset: No



Additional Info:
Found in
Version: 7.1.0.0.alpha0+ (x64)
Build ID: <buildversion>
CPU threads: 4; OS: Windows 6.3 Build 9600; UI render: Skia/Raster; VCL: win
Locale: nl-NL (nl_NL); UI: nl-NL
Calc: CL

and in
3.3.0
Comment 1 Dieter 2020-11-03 20:05:54 UTC
I confirm it with

Version: 7.0.3.1 (x64)
Build ID: d7547858d014d4cf69878db179d326fc3483e082
CPU threads: 4; OS: Windows 10.0 Build 19042; UI render: Skia/Raster; VCL: win
Locale: de-DE (de_DE); UI: en-GB
Calc: threaded

around 70 seconds with docx-file
around 12 seconds with odt-file
Comment 2 Dieter 2020-11-03 20:08:01 UTC
*** Bug 136227 has been marked as a duplicate of this bug. ***
Comment 3 Dieter 2020-11-03 20:11:22 UTC
*** Bug 136748 has been marked as a duplicate of this bug. ***
Comment 4 NISZ LibreOffice Team 2021-04-12 12:27:53 UTC
With current 7.2 bibisect on my old-ish machine, when measured with 

time OOO_EXIT_POST_STARTUP=1 isw ../test_file_tables.docx

I get values of about 15-18 seconds.

But when I measure the time until the page count reaches 222, that takes about 43-45 seconds, timed with my phone.

So after the XML processing finishes, the rendering seems to take another 30 seconds or so.

The file has a huge table and some 180 tracked changes according to Word.
If I accept all changes and open that version in Writer, then the page counter reaches 222 in ~35 seconds instead of ~45.
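
The two measurements in this comment can be separated with a small helper. The following is only a sketch: `time_command` and `layout_estimate` are hypothetical names, the path to the office binary is whatever your install uses, and only the `OOO_EXIT_POST_STARTUP=1` variable and the quoted figures come from the comment itself.

```python
import os
import subprocess
import time


def time_command(argv, extra_env=None):
    """Run argv to completion and return the elapsed wall-clock seconds."""
    env = dict(os.environ)
    env.update(extra_env or {})
    start = time.perf_counter()
    subprocess.run(argv, env=env, check=True)
    return time.perf_counter() - start


def layout_estimate(total_range, import_range):
    """Estimate layout time as (total - import) for (low, high) ranges.

    The lowest estimate pairs the shortest total with the longest import,
    and the highest estimate pairs the longest total with the shortest import.
    """
    return (total_range[0] - import_range[1], total_range[1] - import_range[0])
```

For a real run, one would call something like `time_command(["/path/to/soffice", "test_file_tables.docx"], {"OOO_EXIT_POST_STARTUP": "1"})`, where the binary path is an assumption. With the figures above, `layout_estimate((43, 45), (15, 18))` gives `(25, 30)`, i.e. roughly 25-30 seconds spent after the import itself.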
Comment 5 Buovjaga 2021-08-28 12:43:42 UTC
Created attachment 174588
Perf flamegraph

It's not very slow for me, about 10 secs to word count, but here is a flamegraph

Version: 7.3.0.0.alpha0+ / LibreOffice Community
Build ID: 58a5bd793a2ed57077fc598281cc74e16373b877
CPU threads: 8; OS: Linux 5.13; UI render: default; VCL: kf5 (cairo+xcb)
Locale: fi-FI (fi_FI.UTF-8); UI: en-US
Calc: threaded
Comment 6 Commit Notification 2021-08-28 13:13:46 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/69e0567e118f00f299b6aac645c249521eb0629f

tdf#135683  speed up layout of large writer tables

It will be available in 7.3.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 7 Commit Notification 2021-08-29 07:34:32 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/e3ea0e32657a41b48d9d9d28f6ad15af4c2a7abc

tdf#135683 speed up large writer table load

It will be available in 7.3.0.

Comment 8 Commit Notification 2021-08-31 12:27:47 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/bb5425ed3d8cc04e4242059a17912752d6b48c53

tdf#135683 speed up writer layout cache access

It will be available in 7.3.0.

Comment 9 Timur 2021-09-09 12:29:54 UTC
As explained, the file open itself is not slow, but the full load is, somewhat. Worse than this DOCX is bug 144373, for ODT.
Tracked changes have an influence, but bug 144208 is worse.
All of this is without measuring before and after the fix, just on 7.3+.

I don't see the point of so many different reports for the general problem of table performance in Writer. I think it's a long-known issue; maybe a meta bug should collect all of them.
Comment 10 Commit Notification 2021-09-14 17:56:12 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/d467cd0dd9e9cf3b018859a592e2638527bc7add

tdf#135683 speedup DocumentRedlineManager::GetRedlinePos

It will be available in 7.3.0.

Comment 11 Roman Kuznetsov 2021-09-18 17:35:40 UTC
It takes around 20 seconds from the start of opening the file until all 222 pages have loaded and CPU usage drops, in

Version: 7.3.0.0.alpha0+ (x64) / LibreOffice Community
Build ID: c5aef25352d20e052ec3a697f3cb979d3bbf9df6
CPU threads: 4; OS: Windows 10.0 Build 19043; UI render: default; VCL: win
Locale: ru-RU (ru_RU); UI: en-US
Calc: threaded
Comment 12 Commit Notification 2021-09-28 16:25:48 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/b62153753a9f21afb2a49110ef0459e427b0b01a

tdf#135683 speedup SwAttrHandler

It will be available in 7.3.0.

Comment 13 Noel Grandin 2021-09-28 16:26:52 UTC
Noting that one reason this is still slow is that this document has invalid (overlapping) redlines, which force the code to fall back to a slower search algorithm.
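
To illustrate why overlapping ranges matter for lookup speed, here is a minimal sketch. It is not the LibreOffice code: `RangeIndex` and its members are hypothetical, ranges are assumed to be half-open `(start, end)` tuples with `start <= end`, and the validity flag stands in for whatever check the real code performs.

```python
from bisect import bisect_right


class RangeIndex:
    """Position lookup over ranges, with a fallback for overlapping input."""

    def __init__(self, ranges):
        self.ranges = list(ranges)
        self.starts = [start for start, _ in self.ranges]
        # Checked once up front: each range must end at or before the next
        # one starts, which also implies the list is sorted by start.
        self.valid = all(
            a[1] <= b[0] for a, b in zip(self.ranges, self.ranges[1:])
        )

    def find(self, pos):
        """Return the index of the first range containing pos, or -1."""
        if self.valid:
            # Fast path: binary search over the start positions.
            i = bisect_right(self.starts, pos) - 1
            if i >= 0 and self.ranges[i][0] <= pos < self.ranges[i][1]:
                return i
            return -1
        # Fallback: ordering cannot be trusted, so scan every range.
        for i, (start, end) in enumerate(self.ranges):
            if start <= pos < end:
                return i
        return -1
```

With well-formed input each lookup costs a logarithmic number of comparisons; a single overlapping pair flips `valid` to false and every lookup becomes a linear scan, which adds up when there are many ranges and many lookups.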
Comment 14 Timur 2021-09-29 12:07:08 UTC Comment hidden (obsolete)
Comment 15 Telesto 2021-10-14 11:37:54 UTC
(In reply to Timur from comment #14)
> Noel, please explain how one can recognize those invalid (overlapping)
> redlines, so that we know in testing.

Likely the same as for bug 144995; See also https://gerrit.libreoffice.org/c/core/+/123458
Comment 16 Aron Budea 2021-10-14 21:43:08 UTC
(In reply to Commit Notification from comment #7)
> Noel Grandin committed a patch related to this issue.
> It has been pushed to "master":
> 
> https://git.libreoffice.org/core/commit/e3ea0e32657a41b48d9d9d28f6ad15af4c2a7abc
> 
> tdf#135683 speed up large writer table load
For the record, this has been reverted due to bug 144840.
Comment 17 Telesto 2021-10-24 10:34:24 UTC
(In reply to Aron Budea from comment #16)
> > tdf#135683 speed up large writer table load
> For the record, this has been reverted due to bug 144840.

A bit of a philosophical question (or me lacking information).

I'm asking myself: is the optimization fundamentally wrong, or is it simply uncovering some weird logic? I certainly understand that the person working on optimizations works at a 'high' level, not interested in or aware of every implementation aspect of everything involved.

And, lacking the interest to solve what he/she has broken somewhere else in the code, the first reflex is: let's revert. I'm not going to solve the specific problem (and maybe there are more?).

However, it feels like throwing the baby out with the bathwater. If someone bails at the first encounter with a problem (headwind), it's bad for progress, IMHO.

So I'm asking myself: has an assessment been made of why the problem occurs (or was the revert simply chosen as the easy course of action)? There are already so many unit tests etc., so I assume the optimization looked pretty good at first sight (and the bug is the exception).

The assessment shouldn't necessarily be made by the one pushing the commit. Obviously it's better handled by someone with more code knowledge in the area involved [something called collaboration]. I know the availability of developers is a scarce commodity, and this might be seen as throwing stuff over the fence (in bad faith).

I do notice that developers are mostly left on their own, getting the fallout on their plate even though they can't really help it (broken code somewhere else that they are unfamiliar with, so no intention to solve it). And nobody is interested in more/different (or yet unknown) bugs.

Another issue with pulling the commit too soon is the lack of data. You don't get enough feedback on whether there are more problems or only one. I certainly understand pulling a commit with 3-5 bugs reported against it that show problems in different parts of the code. But bailing out too soon makes progress really hard.

But I sometimes ask myself: is the current practice efficient/effective? Is there no better way to handle this?
Comment 18 Stéphane Guillou (stragu) 2021-12-31 02:22:42 UTC
Thank you Noel for your work on this, but I'm wondering if this should really be included in the 7.3 release notes?

I haven't noticed a particularly significant improvement between 7.2 and 7.3 for this particular document: 58 seconds and 53 seconds respectively until the number of pages shows as 222. Still a fair way away from the couple of seconds needed to open the same file saved as ODT.

Version: 7.2.4.1 / LibreOffice Community
Build ID: 27d75539669ac387bb498e35313b970b7fe9c4f9
CPU threads: 8; OS: Linux 5.4; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
Calc: threaded

Version: 7.3.0.1 / LibreOffice Community
Build ID: 840fe2f57ae5ad80d62bfa6e25550cb10ddabd1d
CPU threads: 8; OS: Linux 5.4; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
Calc: threaded
Comment 19 Telesto 2021-12-31 02:40:12 UTC
(In reply to stragu from comment #18)
> Thank you Noel for your work on this, but I'm wondering if this should
> really be included in the 7.3 release notes?

I don't think this should be mentioned. The core fix is no longer included after the revert. See comment 16
Comment 20 Roman Kuznetsov 2022-05-22 09:32:05 UTC
*** Bug 148936 has been marked as a duplicate of this bug. ***