Bug 149147 - FILEOPEN LAYOUT specific 78-pages .DOC File opening takes minutes in Linux
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
(earliest affected) release
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
Keywords: bibisected, bisected, filter:doc, perf
Depends on:
Blocks: DOC-Opening Performance 149397
Reported: 2022-05-18 05:50 UTC by Andrew
Modified: 2022-11-13 12:10 UTC

See Also:
Crash report or crash signature:

DOC file 1 (703.00 KB, application/msword)
2022-05-18 05:50 UTC, Andrew
DOC file 2 (866.50 KB, application/msword)
2022-05-18 05:51 UTC, Andrew
DOC file 1 (703.00 KB, application/msword)
2022-05-18 06:07 UTC, Andrew
DOC file 2 (866.50 KB, application/msword)
2022-05-18 06:07 UTC, Andrew
Flamegraph (344.13 KB, application/x-bzip)
2022-05-18 11:36 UTC, Julien Nabet

Description Andrew 2022-05-18 05:50:56 UTC
Created attachment 180176
DOC file 1

This file takes a few minutes to open under Linux and only a few seconds under Windows.
Comment 1 Andrew 2022-05-18 05:51:29 UTC
Created attachment 180177
DOC file 2
Comment 2 Roman Kuznetsov 2022-05-18 06:03:25 UTC Comment hidden (obsolete)
Comment 3 Andrew 2022-05-18 06:07:26 UTC
Created attachment 180178
DOC file 1
Comment 4 Andrew 2022-05-18 06:07:53 UTC
Created attachment 180179
DOC file 2
Comment 5 Michael Weghorn 2022-05-18 06:46:12 UTC
(In reply to Roman Kuznetsov from comment #2)
> Confirm in 7.4 with vcl:kf5

For me, this is equally slow with gtk3, so it doesn't seem to be kf5-specific.

Version: / LibreOffice Community
Build ID: 75f7e057039aaa49558e22d18cad651d11589da9
CPU threads: 12; OS: Linux 5.17; UI render: default; VCL: gtk3
Locale: en-GB (en_GB.UTF-8); UI: en-US
Calc: threaded
Comment 6 Julien Nabet 2022-05-18 11:36:30 UTC
Created attachment 180184
Flamegraph

On a Debian x86-64 PC with master sources updated today and gen rendering (to avoid the accessibility part), I retrieved a Flamegraph for the opening of the first attachment.
Comment 7 Timur 2022-05-30 15:12:56 UTC
The DOC is 78 pages in MSO and more in LO. The DOCX is also slow to open, but is 89 pages.

7.4 112,53s user 0,42s system 97% cpu 1:55,71 total opens 142 pages
7.0 122,31s user 0,77s system 89% cpu 2:18,12 total opens 141 pages
6.4   4,89s user 0,54s system 26% cpu   20,84 total opens 139 pages
5.4  52,60s user 3,79s system 63% cpu 1:28,50 total
5.3  49,74s user 4,75s system 67% cpu 1:20,38 total
5.2   6,13s user 7,24s system 34% cpu   38,85 total  opens 80 pages
4.4   6,17s user 5,04s system 31% cpu   35,19 total
4.1   6,79s user 5,90s system 28% cpu   44,20 total

Both time and pages must be tracked. 
This is in my bibisect plan.
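
To track both time and pages across versions, the figures above can be read into a small table programmatically. The following is only a sketch: it assumes the lines follow the zsh-style `time` layout shown in this comment (decimal commas, `m:ss,cc` wall time, optional "opens N pages" suffix), and the names `parse_timings` and `to_seconds` are made up for illustration.

```python
import re

# Sample lines copied from the comment above.
SAMPLE = """\
7.4 112,53s user 0,42s system 97% cpu 1:55,71 total opens 142 pages
7.0 122,31s user 0,77s system 89% cpu 2:18,12 total opens 141 pages
6.4   4,89s user 0,54s system 26% cpu   20,84 total opens 139 pages
5.2   6,13s user 7,24s system 34% cpu   38,85 total  opens 80 pages
4.4   6,17s user 5,04s system 31% cpu   35,19 total
"""

# One timing line: version, user time, system time, CPU %, wall time, optional pages.
LINE = re.compile(
    r"(?P<version>\d+\.\d+)\s+(?P<user>\S+)s user\s+(?P<system>\S+)s system"
    r"\s+(?P<cpu>\d+)% cpu\s+(?P<total>\S+) total"
    r"(?:\s+opens (?P<pages>\d+) pages)?"
)


def to_seconds(field):
    """Convert '20,84' or '1:55,71' (decimal comma, optional minutes) to seconds."""
    seconds = 0.0
    for part in field.replace(",", ".").split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse_timings(text):
    """Return {version: {'user', 'system', 'total', 'pages'}}; pages may be None."""
    rows = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue  # skip anything that is not a timing line
        rows[m["version"]] = {
            "user": to_seconds(m["user"]),
            "system": to_seconds(m["system"]),
            "total": to_seconds(m["total"]),
            "pages": int(m["pages"]) if m["pages"] else None,
        }
    return rows


rows = parse_timings(SAMPLE)
for version, row in rows.items():
    pages = row["pages"] if row["pages"] is not None else "?"
    print(f"{version}: {row['user']:.2f}s user, {row['total']:.2f}s wall, {pages} pages")
```

With the numbers in that form, the user-time ratio between 7.4 and 6.4 (roughly 23x) and the jump in page count between 5.2 and 6.4 can be compared directly.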
Comment 8 Timur 2022-05-31 08:52:50 UTC
The times I got are from the first run; the next run is faster. For example in 5.3 it's the difference between 25s and 65s.
There, the slowdown was in:
source 7837860ff99577467fecb287cb0e3b111729b70a
author	Justin Luth <justin_luth@sil.org>	Dec 02 2016 
tdf#104333 revert ww8import: set table keep/split if emulated
But it just reverted commit 129f93e46c29b388d38e9097869fd3e72dc40a5e from bug 91083.
It was already 127 pages before the revert and 135 pages after, so the page increase started even earlier.

Then in 6.4 there was a real speedup (from a commit that doesn't have its own bug):
source 1cb7e4899b5ada902e99a0c964ee047950c07044
author	Michael Stahl <Michael.Stahl@cib.de>	2019-06-27 
sw: avoid deleting the iterated SwRowFrame on tdf104188-4.odt
Still 139 pages, though. 

And in 7.0 there was a slow down REGRESSION that this bug is about: 623d6cf06ccba392c1993a3b0ad271d508205e73
author	Justin Luth <justin.luth@collabora.com>	2020-04-21
tdf#73056 doc import: table margins - unknown byte is EndCell

CC: Justin. Please see.
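
Since the first run is noticeably slower than later ones, it can help to record the first run separately from the repeats when comparing builds. Below is a minimal sketch of that idea. It is not how the numbers in this comment were produced; `time_runs` and `summarize` are hypothetical helper names, and the example command in the trailing comment (headless PDF conversion of a local copy of the attachment) is an assumption about how one might force a full load and layout without the UI.

```python
import subprocess
import sys
import time


def time_runs(cmd, runs=3):
    """Run cmd several times and return the wall-clock duration of each run."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Separate the first (cold) run from the best of the later (warm) runs."""
    first, rest = durations[0], durations[1:]
    return {"first": first, "best_warm": min(rest) if rest else None}


# Example (assumed command and hypothetical paths, adjust to the build under test):
# cmd = ["soffice", "--headless", "--convert-to", "pdf",
#        "--outdir", "/tmp/out", "doc-file-1.doc"]
# print(summarize(time_runs(cmd, runs=3)))
```

Reporting both values per build keeps a cold-cache effect from being mistaken for a regression between versions.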
Comment 9 Justin L 2022-06-03 13:20:38 UTC
This is almost certainly just an exposed layout bug - removing "regression".
Confirmed comment 8's bisect with both DOC 1.doc and DOC 2.doc. That commit was not about layout code, so it probably just exposed a table layout issue.