Bug 131546 - FILEOPEN DOCX: File takes longer to open in master
Summary: FILEOPEN DOCX: File takes longer to open in master
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 7.0.0.0.alpha0+
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard: target:7.3.0
Keywords: bibisected, bisected, perf, regression
Depends on:
Blocks: DOCX-Tables
Reported: 2020-03-24 18:41 UTC by Xisco Faulí
Modified: 2021-08-17 15:40 UTC
CC List: 5 users

See Also:
Crash report or crash signature:


Description Xisco Faulí 2020-03-24 18:41:25 UTC
Steps to reproduce:
1. Open attachment 125390 from bug 100139 using 'time OOO_EXIT_POST_STARTUP=1 instdir/program/soffice'

It takes

real	3m58,908s
user	3m57,239s
sys	0m1,646s

in

Version: 7.0.0.0.alpha0+
Build ID: fd1cd5522283f279a01d6d673f676a1346e9358b
CPU threads: 4; OS: Linux 4.19; UI render: default; VCL: gtk3; 
Locale: en-US (en_US.UTF-8); UI-Language: en-US
Calc: threaded

and

real	1m38,501s
user	1m37,409s
sys	0m1,048s

in

Version: 6.4.0.0.alpha1+
Build ID: 9bc848cf0d301aa57eabcffa101a1cf87bad6470
CPU threads: 4; OS: Linux 4.19; UI render: default; VCL: gtk3; 
Locale: en-US (en_US.UTF-8); UI-Language: en-US
Calc: threaded
Comment 1 Xisco Faulí 2020-03-24 18:42:27 UTC
perf flame graph submitted by Julien in attachment 158953 from bug 100139
Comment 2 Xisco Faulí 2020-03-24 18:45:36 UTC
The opening time went from

real	1m38,501s
user	1m37,409s
sys	0m1,048s

to

real	2m34,595s
user	2m33,596s
sys	0m1,419s

after

https://cgit.freedesktop.org/libreoffice/core/commit/?id=2ab481b038b62b1ff576ac4d49d03c1798cd7f84

author	László Németh <nemeth@numbertext.org>	2020-01-08 14:26:40 +0100
committer	László Németh <nemeth@numbertext.org>	2020-01-09 18:00:16 +0100
commit 2ab481b038b62b1ff576ac4d49d03c1798cd7f84 (patch)
tree 9739e3b799bd06ba07d8cca7ad6c8b85de75dda8
parent 79084665f0e351a3f83fdee88071919f05ec9cc3 (diff)
tdf#90069 DOCX: fix character style of new table rows

later on, it went from

real	2m34,595s
user	2m33,596s
sys	0m1,419s

to

real	3m58,908s
user	3m57,239s
sys	0m1,646s

after

author	László Németh <nemeth@numbertext.org>	2020-02-17 14:34:11 +0100
committer	László Németh <nemeth@numbertext.org>	2020-02-19 16:46:18 +0100
commit 4d5c0eaf3e0d3d3bcd9e691fffee19b75f3d6631 (patch)
tree 6ed8e4a013884c28db01b9175dfc933141b7c395
parent faa2e7b7227b6b87379e7e136ea9ab63f37c3fc4 (diff)
tdf#118812 DOCX import: fix table style preference – part 2

Adding Cc: to László Németh

Bisected with bibisect-linux64-6.5
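The relative slowdowns are easier to compare than the raw timings. The following minimal Python sketch converts the `real` values quoted above (printed with a decimal comma by the reporter's locale) into seconds and prints the ratio for each bisected step. The `seconds` helper is hypothetical, not part of any LibreOffice tooling, and each timing comes from a single run, so the ratios are only indicative.

```python
import re


def seconds(duration: str) -> float:
    """Convert a bash `time` value such as '3m58,908s' into seconds.

    Accepts both ',' and '.' as the decimal separator.
    """
    match = re.fullmatch(r"(\d+)m(\d+)[,.](\d+)s", duration.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {duration!r}")
    minutes, secs, frac = match.groups()
    return int(minutes) * 60 + float(f"{secs}.{frac}")


# "real" (wall-clock) values quoted in comment 2.
baseline = seconds("1m38,501s")        # build before commit 2ab481b0
after_2ab481b0 = seconds("2m34,595s")  # after the tdf#90069 commit
after_4d5c0eaf = seconds("3m58,908s")  # after tdf#118812 part 2

steps = [
    ("2ab481b0 (tdf#90069)", baseline, after_2ab481b0),
    ("4d5c0eaf (tdf#118812)", after_2ab481b0, after_4d5c0eaf),
    ("both commits", baseline, after_4d5c0eaf),
]
for label, before, after in steps:
    ratio = after / before
    print(f"{label}: {ratio:.2f}x slower ({(ratio - 1) * 100:+.0f}%)")
```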
Comment 3 NISZ LibreOffice Team 2021-04-12 11:57:31 UTC
According to measurements made by our intern (thx Balázs Sántha!) there are a few problematic areas around large docx tables:

- in 6.3 and before, opening performance of large docx tables such as attachment 125390 was somewhat slow: for this file a not great, not terrible ~40-45 seconds on my laptop.
We can say bug 93660 is about this part of the problem.
- then it became slower in 6.4, taking 1:10 minutes, as bibisected in bug 136227 comment 3
- then it became even slower in 7.0, as bibisected here: around 2:45-2:50 minutes.

Other similar docx+large table bugs are: 
bug 76385 (nested tables load fast now, but seem to leak memory on Linux, though not on Windows)
bug 100139 (tracked changes made it slow to edit, not anymore)
bug 101149 (docx load has - interestingly! - become better, doc load is still bad; rendering also feels a bit janky: it renders some pages, stops, renders again, stops...)
bug 135683 (nothing special, duplicate of this one)
Comment 4 László Németh 2021-07-07 14:31:18 UTC
For https://cgit.freedesktop.org/libreoffice/core/commit/?id=2ab481b038b62b1ff576ac4d49d03c1798cd7f84,
it would be enough to apply the original patch only for the last table row, so likely there is an “easy” fix for the regression.
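The idea in the comment above can be illustrated with a toy model. This is a minimal sketch under stated assumptions: `Paragraph`, `import_table` and `make_table` are hypothetical stand-ins, not writerfilter code, and each paragraph's runs are collapsed into one dict of run properties. It only counts how many paragraphs get their run properties mirrored to paragraph level, with and without restricting this to the last row.

```python
from dataclasses import dataclass, field


@dataclass
class Paragraph:
    run_props: dict                                        # w:rPr of the runs
    para_char_props: dict = field(default_factory=dict)   # mirrored at paragraph level


def import_table(rows: list, last_row_only: bool) -> int:
    """Mirror run properties to paragraph level; return how many paragraphs were touched."""
    touched = 0
    for index, row in enumerate(rows):
        if last_row_only and index != len(rows) - 1:
            continue  # skip every row except the last one
        for para in row:
            para.para_char_props.update(para.run_props)
            touched += 1
    return touched


def make_table(n_rows: int, n_paras: int) -> list:
    # Hypothetical table where every paragraph has bold runs.
    return [[Paragraph({"bold": True}) for _ in range(n_paras)] for _ in range(n_rows)]


print(import_table(make_table(1000, 3), last_row_only=False))
print(import_table(make_table(1000, 3), last_row_only=True))
```

With 1000 rows of 3 paragraphs, the unrestricted loop touches 3000 paragraphs while the last-row-only variant touches 3. The fix that was eventually committed (comment 6) goes further and drops the import-time copy entirely.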
Comment 5 Commit Notification 2021-08-12 08:47:01 UTC
Balazs Santha committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/498d2b82187ec3ff58f076e0d15741e64c0505ba

tdf#131546 DOCX import: fix performance regression at tables

It will be available in 7.3.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 6 László Németh 2021-08-12 09:02:15 UTC
tdf#131546 DOCX import: fix performance regression at tables

Commit 2ab481b038b62b1ff576ac4d49d03c1798cd7f84 "tdf#90069 DOCX:
fix character style of new table rows" caused a ~20% slowdown
in the loading time of documents with huge tables, due to the
extra processing of the redundant w:rPr of table paragraph runs.
(In DOCX tables, MSO exports the run properties into the run and
paragraph sections too, probably because of compatibility or
usability reasons.)

Theoretically, in this case the run properties under the run
section win. On the other hand, when copying a row (e.g. upon
inserting a new one), LO copies only the properties applied at
paragraph level, so the mentioned run props had to be applied not
only as direct character formatting, but also as direct paragraph
formatting. This way, copying rows was supported. Unfortunately,
this "double" applying was done for every single paragraph, which
slowed down the opening time considerably. This patch gives a
workaround: it completely removes this double applying in
writerfilter by reverting commit
2ab481b038b62b1ff576ac4d49d03c1798cd7f84 (except its unit test),
and copies the mentioned run properties to paragraph level only
when needed: upon inserting a new row before/after. This way we
spare a lot of cycles, as most of the original applies had no real
use whatsoever.
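The approach described in the commit message can be sketched as follows. This is a hypothetical model, not the actual writerfilter or Writer code: `Paragraph`, `effective` and `insert_row` are illustrative names, and each paragraph's runs are collapsed into a single dict. Import does no extra work; only when a row is inserted are the source row's run properties promoted to paragraph level in the new row, since the row copy is assumed to keep paragraph-level attributes only.

```python
from dataclasses import dataclass, field


@dataclass
class Paragraph:
    # Direct character formatting on the runs (collapsed into one dict here).
    run_props: dict
    # Character attributes set at paragraph level.
    para_char_props: dict = field(default_factory=dict)

    def effective(self) -> dict:
        # Run-level properties win over paragraph-level ones.
        return {**self.para_char_props, **self.run_props}


def insert_row(rows: list, index: int, after: bool = True) -> list:
    """Insert a copy of rows[index] before or after it (hypothetical model).

    Only paragraph-level attributes survive the copy, so the source
    paragraphs' run properties are promoted to paragraph level here,
    on demand, instead of for every paragraph at import time.
    """
    new_row = []
    for para in rows[index]:
        copied = Paragraph(run_props={}, para_char_props=dict(para.para_char_props))
        copied.para_char_props.update(para.run_props)
        new_row.append(copied)
    rows.insert(index + 1 if after else index, new_row)
    return new_row


# Imported table: run properties stay on the runs, nothing is duplicated.
table = [[Paragraph({"bold": True})], [Paragraph({"italic": True})]]
new_last = insert_row(table, 1)  # e.g. "insert row after" on the last row
print(new_last[0].effective())   # {'italic': True}
```

In this model the extra cost is proportional to the number of rows the user inserts rather than to the number of paragraphs in the imported document, which is where the saved cycles come from.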