Bug 79878 - Writer freezes for minutes when opening .docx due to slow XML parse
Summary: Writer freezes for minutes when opening .docx due to slow XML parse
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: filters and storage
Version (earliest affected): 4.2.4.2 release
Hardware: All, OS: All
Importance: high major
Assignee: Not Assigned
URL:
Whiteboard: target:6.2.0
Keywords: filter:docx, haveBacktrace, perf
Depends on:
Blocks: DOCX-Opening
Reported: 2014-06-10 12:28 UTC by maintel2
Modified: 2020-01-07 20:00 UTC
CC List: 6 users

See Also:
Crash report or crash signature:


Attachments
docx 1 (2.93 MB, application/x-rar-compressed), 2014-06-10 12:35 UTC, maintel2
docx 2 (2.93 MB, application/x-rar-compressed), 2014-06-10 12:37 UTC, maintel2
docx 3 (1.72 MB, application/x-rar-compressed), 2014-06-10 12:38 UTC, maintel2
Example file (7.69 MB, application/vnd.ms-word.document.12), 2016-12-07 20:12 UTC, Telesto
Callgrind output from master (7.09 MB, application/x-xz), 2018-11-25 11:48 UTC, Buovjaga
perf flamegraph (477.37 KB, application/x-bzip), 2020-01-07 20:00 UTC, Julien Nabet

Description maintel2 2014-06-10 12:28:16 UTC
Writer freezes for a few minutes when opening this file. Only after that does it start loading the file (Opening page 1/xxx, etc.).

Expected behaviour: open the Writer window, show the progress bar at the bottom, and read the file. It would also be worth checking whether opening .docx files in general can be made faster.

Thanks
Comment 1 maintel2 2014-06-10 12:35:18 UTC
Created attachment 100807
docx 1
Comment 2 maintel2 2014-06-10 12:37:11 UTC
Created attachment 100810
docx 2
Comment 3 maintel2 2014-06-10 12:38:55 UTC
Created attachment 100811
docx 3
Comment 4 Yousuf Philips (jay) (retired) 2014-06-10 20:14:51 UTC
Confirmed on Windows 7 64-bit with an Intel Core 2 CPU @ 1.83 GHz and 2.5 GB of RAM.

== Loading Time Test Results ==

LibO 3.5.7 : ~11 mins
LibO 4.0.6, 4.2.4 and 4.3 beta 2 : ~9 mins
Kingsoft Writer : ~5.5 mins
Word 2013 : ~4 secs

Note: 'Loading Time' is the time that the document takes to become available for browsing and the UI becomes usable.

Also confirmed on Linux Mint.
Comment 5 Michael Meeks 2014-06-10 20:34:55 UTC
I wouldn't bother profiling anything until bug 38513 is fixed; hopefully Tor is working on that and we'll get something soon =) Then again, if someone wants to run callgrind on it and get a trace, we'll soon see if it's a duplicate.
Comment 6 Yousuf Philips (jay) (retired) 2014-06-23 18:33:34 UTC
Now that bug 38513 has been closed, will that fix this issue, or will the new patches for bug 76260 solve it?
Comment 7 QA Administrators 2015-07-18 17:43:30 UTC Comment hidden (obsolete)
Comment 8 Buovjaga 2015-10-22 10:53:07 UTC
Progress bar appears immediately.

Win 7 Pro 64-bit Version: 5.1.0.0.alpha1+
Build ID: fcc2415ade6ae93710bbbda9f7e163045e323105
TinderBox: Win-x86@62-merge-TDF, Branch:MASTER, Time: 2015-10-21_16:55:13
Locale: fi-FI (fi_FI)
Comment 9 Yousuf Philips (jay) (retired) 2015-10-22 13:37:19 UTC
It took ~8 min 30 s to load on master, on the same laptop as in comment 4.

@Meeks: What is the next move?

Version: 5.1.0.0.alpha1+
Build ID: b684090d4f573eb339e93872d0cef07e69adc913
TinderBox: Linux-rpm_deb-x86_64@70-TDF, Branch:master, Time: 2015-10-16_01:50:06
Locale: en-US (en_US.UTF-8)
Comment 10 Robinson Tryon (qubit) 2015-12-09 18:08:26 UTC Comment hidden (obsolete)
Comment 11 Telesto 2016-12-07 20:12:54 UTC
Created attachment 129380
Example file
Comment 12 Telesto 2016-12-07 20:14:31 UTC
Repro with
Version: 5.4.0.0.alpha0+
Build ID: a9f56091b6422ec8c42f09b8472200ae4ab12548
CPU Threads: 4; OS Version: Windows 6.19; UI Render: default; 
TinderBox: Win-x86@42, Branch:master, Time: 2016-12-05_23:12:26
Locale: nl-NL (nl_NL); Calc: CL
Comment 13 Michael Meeks 2016-12-07 20:26:49 UTC
> @Meeks: What is the next move?

Find a developer who cares, or pay one to care =) I haven't looked at the file, but I imagine there is a -lot- of some construct (styles?) that makes this an unusual file that performs particularly poorly.

Normally, a callgrind profile, or a Windows VerySleepy profile (or whatever Aron is using), will quickly pinpoint the O(N^2) or O(N^3) piece of code that is consuming the time =)

HTH.
Comment 14 Aron Budea 2016-12-07 21:23:48 UTC
And the winner is: XML parsing.
The document has tons of tags and attributes.
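If XML parsing dominates, a natural first check is simply how many nodes the main document part contains. As a rough diagnostic (not LibreOffice code), the Python sketch below uses only the standard library to open a .docx as the ZIP package it is and count the elements and attributes in `word/document.xml`. The function name `count_xml_nodes` is hypothetical, and the counts only indicate how tag-heavy a file is; they do not predict Writer's load time.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET


def count_xml_nodes(docx_path, part="word/document.xml"):
    """Return (elements, attributes) counted in one XML part of a .docx.

    docx_path may be a filename or a binary file-like object.
    The part is streamed with iterparse; each element is cleared after
    it has been counted, which frees its text, attributes and children
    and keeps memory use modest even for multi-megabyte documents.
    """
    elements = 0
    attributes = 0
    with zipfile.ZipFile(docx_path) as zf:
        with zf.open(part) as stream:
            # "end" events fire once per element, after its children.
            for _event, elem in ET.iterparse(stream, events=("end",)):
                elements += 1
                # Namespace declarations (xmlns:...) are not included in
                # elem.attrib, so only real attributes are counted.
                attributes += len(elem.attrib)
                elem.clear()
    return elements, attributes


if __name__ == "__main__":
    for path in sys.argv[1:]:
        n_elem, n_attr = count_xml_nodes(path)
        print(f"{path}: {n_elem} elements, {n_attr} attributes")
```

Run it with one or more .docx paths as arguments; comparing the output for attachment 129380 against an ordinary document of similar page count gives a quick sense of how unusual its markup density is.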
Comment 15 Commit Notification 2018-07-08 09:49:23 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=a2193f8f33565cc896592acb9d3ab65c756d97fb

tdf#79878 perf loading docx file, sax improvements

It will be available in 6.2.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 16 Noel Grandin 2018-07-11 10:14:48 UTC
Note that commit

   https://cgit.freedesktop.org/libreoffice/core/commit/?id=2e6afbe65c75c919665927f62efa21140a020d46

was meant for this bug, but I used the wrong bug number in the commit message.
Comment 17 Commit Notification 2018-07-11 10:38:51 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=7d5c8923284b1ea8f82e30b7e8b2435e929e6c45

tdf#79878 perf loading docx file, more sax

It will be available in 6.2.0.
Comment 18 Commit Notification 2018-07-11 11:10:55 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=bd394492c165d27c96a44495d9ca694a242acb8f

tdf#79878 perf loading docx file, improve threading heuristic

It will be available in 6.2.0.
Comment 19 Buovjaga 2018-07-11 12:56:51 UTC
Tested with attachment 129380.

Time from the Start Center to a responsive document:

Daily build from 18 June: 48s

Master build with Noel's commits: 1min 10s

6.0.5: 38s

Arch Linux 64-bit
Version: 6.0.5.2
Build ID: 6.0.5-1
CPU threads: 8; OS: Linux 4.17; UI render: default; VCL: kde4; 
Locale: fi-FI (fi_FI.UTF-8); Calc: group

Arch Linux 64-bit
Version: 6.2.0.0.alpha0+
Build ID: bd394492c165d27c96a44495d9ca694a242acb8f
CPU threads: 8; OS: Linux 4.17; UI render: default; VCL: gtk3; 
Locale: fi-FI (fi_FI.UTF-8); Calc: group threaded
Built on July 11th 2018
Comment 20 Buovjaga 2018-07-11 13:33:51 UTC
Did a new build after reverting Noel's 3 commits.
Time: 48s

Note that I also checked with --safe-mode, so that the spellchecker would have no effect, but it did not change the time.
Comment 21 Buovjaga 2018-07-11 14:14:35 UTC
OK, I did a more useful test: I reverted the commits one at a time, from newest to oldest, rebuilding after each, and after reverting 2e6afbe65c75c919665927f62efa21140a020d46 the time to open dropped back to 48 s.
Comment 22 Commit Notification 2018-07-17 11:27:56 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=d6bd9c273483b12f1bb2ae398afdba977e3ec336

tdf#79878 perf loading docx file, disable SAX threading for writer

It will be available in 6.2.0.
Comment 23 Commit Notification 2018-07-18 06:38:53 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=99e626dee48e08d59304c8abe8abe84e7a99af3a

tdf#79878 perf loading docx file, use XMultiPropertySet

It will be available in 6.2.0.
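The commit title refers to `com.sun.star.beans.XMultiPropertySet`, a UNO interface whose `setPropertyValues` takes parallel sequences of property names and values, so several properties can be set in one call instead of one `setPropertyValue` call each. The Python toy model below is not UNO code and does not reproduce what the commit does internally; `ToyPropertySet`, `apply_one_by_one`, `apply_batched` and the `overhead_calls` counter are hypothetical, with the counter standing in for whatever fixed cost a real object might pay per call. It only illustrates why batching helps when such a per-call cost exists.

```python
class ToyPropertySet:
    """Hypothetical stand-in for an object offering both a single-property
    setter and an XMultiPropertySet-style batch setter."""

    def __init__(self):
        self.values = {}
        self.overhead_calls = 0  # how many times the per-call cost was paid

    def _per_call_overhead(self):
        # Placeholder for work assumed to happen once per call
        # (for example validation or change handling); an assumption,
        # not documented UNO behaviour.
        self.overhead_calls += 1

    def setPropertyValue(self, name, value):
        self._per_call_overhead()
        self.values[name] = value

    def setPropertyValues(self, names, values):
        # Parallel sequences of names and values, applied in one call.
        if len(names) != len(values):
            raise ValueError("names and values must have the same length")
        self._per_call_overhead()
        for name, value in zip(names, values):
            self.values[name] = value


def apply_one_by_one(obj, props):
    """One setter call (and one overhead) per property."""
    for name, value in props.items():
        obj.setPropertyValue(name, value)


def apply_batched(obj, props):
    """A single batch call, paying the overhead once."""
    obj.setPropertyValues(list(props), list(props.values()))


if __name__ == "__main__":
    # Property names are used only as labels here.
    props = {"CharHeight": 12.0, "CharWeight": 150.0, "ParaAdjust": 0}
    a, b = ToyPropertySet(), ToyPropertySet()
    apply_one_by_one(a, props)
    apply_batched(b, props)
    print(a.overhead_calls, b.overhead_calls)  # prints: 3 1
```

Both paths end in the same property values; only the number of times the simulated per-call cost is paid differs, which is the effect batching is meant to exploit when a document sets many properties per paragraph or run.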
Comment 24 Commit Notification 2018-07-20 06:59:30 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=c6acb048e6f40ead4110750a79eeb3d6d6d5865d

tdf#79878 perf loading docx file, pendingChars

It will be available in 6.2.0.
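The commit title only says "pendingChars". Assuming it refers to the common SAX technique of buffering character data until the next element boundary, so that text the parser splits across several `characters()` callbacks is handed on once, a minimal sketch using Python's standard `xml.sax` module looks like this. `BufferingHandler` and its `on_text` callback are hypothetical names; this is an illustration of the general technique, not the LibreOffice implementation.

```python
import xml.sax


class BufferingHandler(xml.sax.ContentHandler):
    """Collects character data in a pending buffer and delivers it once
    per run of text, instead of once per characters() callback."""

    def __init__(self, on_text):
        super().__init__()
        self._pending = []       # text chunks received but not yet delivered
        self._on_text = on_text  # hypothetical downstream consumer
        self.flushes = 0         # number of times text was delivered

    def _flush(self):
        # Join all pending chunks and hand them on in a single call.
        if self._pending:
            self._on_text("".join(self._pending))
            self._pending.clear()
            self.flushes += 1

    def characters(self, content):
        # The parser may report one text node as several chunks
        # (for example around entity references); just remember them.
        self._pending.append(content)

    def startElement(self, name, attrs):
        self._flush()

    def endElement(self, name):
        self._flush()

    def endDocument(self):
        self._flush()
```

For example, parsing `<p><r>a &amp; b</r></p>` with this handler delivers the text `a & b` to `on_text` once, even if the parser reports it in several pieces.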
Comment 25 Buovjaga 2018-11-25 11:48:26 UTC
Created attachment 147018
Callgrind output from master

Time to open is still 48 s, as in comments 20 and 21. Here is a callgrind trace.

Arch Linux 64-bit
Version: 6.3.0.0.alpha0+
Build ID: 51e6a95757906dff8b2819a4141bf3dc7938e95f
CPU threads: 8; OS: Linux 4.19; UI render: default; VCL: gtk3_kde5; 
Locale: fi-FI (fi_FI.UTF-8); UI-Language: en-US
Calc: threaded
Built on 24 November 2018
Comment 26 Richard Chen 2018-11-29 08:52:19 UTC
It took about 1 min 5 s to open the file.
Here is the version I used to open it:

Version: 6.3.0.0.alpha0+ (x64)
Build ID: 0f25a3c36f27fd51453b9a9115f236b83c143684
CPU threads: 8; OS: Windows 10.0; UI render: GL; VCL: win; 
TinderBox: Win-x86_64@42, Branch:master, Time: 2018-11-27_20:06:55
Locale: zh-TW (zh_TW); UI-Language: en-US
Calc: threaded
Comment 27 QA Administrators 2019-11-30 03:40:31 UTC Comment hidden (obsolete)
Comment 28 Roman Kuznetsov 2019-12-30 10:04:38 UTC
I didn't see a freeze, but Writer took a very long time to open the file.

It takes 2 min 42 s for me in

Version: 6.5.0.0.alpha0+ (x64)
Build ID: 2d736e1a0a2bbd41fe7793d52bbcc7bfc89c7da3
CPU threads: 4; OS: Windows 10.0 Build 18362; UI render: default; VCL: win; 
Locale: ru-RU (ru_RU); UI-Language: en-US
Calc: threaded
Comment 29 Xisco Faulí 2020-01-07 10:53:56 UTC
It takes

real	1m41,284s
user	1m40,108s
sys	0m0,789s

in

Version: 6.5.0.0.alpha0+
Build ID: bf540873f5e258452fed5006f65a403c95e7872a
CPU threads: 4; OS: Linux 4.15; UI render: default; VCL: gtk3; 
Locale: ca-ES (ca_ES.UTF-8); UI-Language: en-US
Calc: threaded

@Julien, would it be possible to have a perf graph for the attached document?
Comment 30 Julien Nabet 2020-01-07 11:01:05 UTC
(In reply to Xisco Faulí from comment #29)
> ...
> @Julien, would it be possible to have a perf graph for the attached document
> ?

No problem, I'll do it after my day job.
Comment 31 Julien Nabet 2020-01-07 20:00:16 UTC
Created attachment 156993
perf flamegraph

Here's a flamegraph captured on a Debian x86-64 PC with master sources updated today.