Bug 144395 - LibreOffice from 6.4 very slow and one CPU 100% while opening large Chinese .docx with 2521 pages
Summary: LibreOffice from 6.4 very slow and one CPU 100% while opening large Chinese ....
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 6.4.7.2 release (earliest affected)
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: bibisected, bisected, perf, regression
Depends on:
Blocks: DOCX-Opening
 
Reported: 2021-09-09 01:03 UTC by wokeness
Modified: 2023-05-22 16:23 UTC (History)
CC List: 5 users

See Also:
Crash report or crash signature:


Attachments
The problematic large .docx file. (5.20 MB, application/octet-stream)
2021-09-09 01:08 UTC, wokeness
The htop's view on Writer. (4.59 KB, image/jpeg)
2021-09-09 01:11 UTC, wokeness

Description wokeness 2021-09-09 01:03:57 UTC
Description:
The .docx file is 5.3 MB and contains Chinese characters.
Writer hangs while opening the file, with 100% CPU usage and growing memory.
I ran htop to watch Writer's status: it took about 6 minutes to display the file content, with sustained 100% CPU and 1 GB of memory.

Steps to Reproduce:
1. Open the large .docx file.
2. Run htop and watch Writer's status.

Actual Results:
About 6 minutes later Writer becomes operational, but CPU stays at 100% and memory use stays above 1 GB.

Expected Results:
The file opens within 1 minute, with normal CPU and memory consumption.


Reproducible: Always


User Profile Reset: No


OpenGL enabled: Yes

Additional Info:
Version: 7.2.0.4 / LibreOffice Community
Build ID: 20(Build:4)
CPU threads: 4; OS: Linux 5.10; UI render: default; VCL: gtk3
Locale: zh-CN (zh_CN.UTF-8); UI: zh-CN
7.2.0-2
Calc: threaded
Comment 1 wokeness 2021-09-09 01:08:37 UTC
Created attachment 174911
The problematic large .docx file.
Comment 2 wokeness 2021-09-09 01:11:21 UTC
Created attachment 174912
The htop's view on Writer.
Comment 3 Timur 2021-09-09 09:22:41 UTC
Reproduced. I modified the title to make it more specific and recognizable.
Could be a duplicate.

Interestingly, one CPU goes to 100%, and it is not always the same core; the load moves between cores.
This shows in 'time': user+sys is close to real, whereas with real multicore use it would be higher than real.

Comparison of versions on the same system, measured with time loexit:

5.2 m
real	1m49,340s
user	1m43,322s
sys	0m4,954s

6.0 m
real	1m33,933s
user	1m8,769s
sys	0m4,858s

6.3 m
real	0m50,940s
user	0m46,648s
sys	0m1,648s

6.4 o   1st time      2nd time
real	1m18,501s     1m11,729s
user	1m8,507s      1m6,241s
sys	0m3,689s      0m4,042s

6.4 m
real	3m33,772s
user	3m13,117s
sys	0m2,848s

7.3+ m
real	3m26,415s
user	3m22,688s
sys	0m3,214s

Based on these timings, I am marking this as a regression from 6.4.
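
The single-core observation can be checked numerically. The following is only a rough sketch, not part of the original measurements: it parses the `time` values quoted above (comma as decimal separator) and prints (user+sys)/real per run. The names `parse_time`, `cpu_ratio` and `runs` are made up for this illustration. A ratio near 1.0 means the load used about one core at a time.

```python
import re


def parse_time(value: str) -> float:
    """Convert a bash `time` value such as '3m33,772s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)[.,](\d+)s", value.strip())
    if match is None:
        raise ValueError(f"unrecognized time value: {value!r}")
    minutes, seconds, fraction = match.groups()
    return int(minutes) * 60 + float(f"{seconds}.{fraction}")


def cpu_ratio(real: str, user: str, sys: str) -> float:
    """Return (user + sys) / real; about 1.0 suggests one busy core."""
    return (parse_time(user) + parse_time(sys)) / parse_time(real)


# (real, user, sys) copied from the measurements in this comment.
runs = {
    "5.2 m": ("1m49,340s", "1m43,322s", "0m4,954s"),
    "6.3 m": ("0m50,940s", "0m46,648s", "0m1,648s"),
    "6.4 m": ("3m33,772s", "3m13,117s", "0m2,848s"),
    "7.3+ m": ("3m26,415s", "3m22,688s", "0m3,214s"),
}

for version, (real, user, sys_) in runs.items():
    ratio = cpu_ratio(real, user, sys_)
    print(f"{version}: real={parse_time(real):.1f}s cpu/real={ratio:.2f}")
```

None of these runs has a ratio above 1.0, which is consistent with the load staying on one core at a time.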
Comment 4 Timur 2021-09-09 10:42:34 UTC
Linux 6.4:
commit 8d34f5be1e71481c3a874bf8df2b05759da0701b
Date:   Tue Sep 17 16:56:00 2019 +0200
    source 5ba30f588d6e41a13d68b1461345fca7a7ca61ac
    pre 6e1cb2e9dd406fb2883460cefaa4660622996005

commit 5ba30f588d6e41a13d68b1461345fca7a7ca61ac
author	Michael Stahl <Michael.Stahl@cib.de>	Fri Sep 06 2019
tdf#64222 sw: better DOCX import/export of paragraph marker formatting
Comment 5 Timur 2021-10-01 10:54:02 UTC
Hi Michael. This slowdown was bibisected to your commit; please take a look.
Comment 6 Roman Kuznetsov 2022-05-28 19:55:06 UTC
I got a crash after 3 min 30 sec when trying to open the file in

Version: 7.4.0.0.alpha1+ (x64) / LibreOffice Community
Build ID: a8df5c815c8b002b7083b8777e3dd8beac573bf3
CPU threads: 4; OS: Windows 6.1 Service Pack 1 Build 7601; UI render: Skia/Raster; VCL: win
Locale: ru-RU (ru_RU); UI: ru-RU
Calc: CL
Comment 7 Gabor Kelemen (Collabora) 2023-05-16 09:48:48 UTC
This got better in 7.4.5 with

https://git.libreoffice.org/core/+/3998b98749739b2c499ffc4d83188e1034b66750

author	Miklos Vajna <vmiklos@collabora.com>	Mon Dec 19 08:47:18 2022 +0100
committer	Mike Kaganski <mike.kaganski@collabora.com>	Thu Dec 22 06:12:45 2022 +0000

sw: ODT import/export of DOCX's paragraph marker formatting

$ time OOO_EXIT_POST_STARTUP=1 isw --norestore note-taking.docx

real    0m29,571s

Timing at the previous commit:
$ time OOO_EXIT_POST_STARTUP=1 isw --norestore note-taking.docx

real    2m36,147s

in bibisect-7.6 master it's:
$ time OOO_EXIT_POST_STARTUP=1 isw --norestore note-taking.docx

real    0m28,639s

For comparison 6.0 bibisect gives for me:
$ time OOO_EXIT_POST_STARTUP=1 isw --norestore note-taking.docx

real    0m46,749s

Thanks Miklos for fixing this!

(Sure, the file could be reused for future perf torturing... 30s still sounds like it could be reduced.)
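
For reference, the improvement can be put as a ratio. This is only a small sketch using the `real` values quoted in this comment; the helper `seconds` and the variable names are made up for the illustration.

```python
def seconds(value: str) -> float:
    """Convert a `time` value such as '2m36,147s' to seconds."""
    minutes, rest = value.rstrip("s").split("m")
    return int(minutes) * 60 + float(rest.replace(",", "."))


# `real` values copied from the measurements in this comment.
before_fix = seconds("2m36,147s")   # commit just before the fix
after_fix = seconds("0m29,571s")    # commit with the fix
master_76 = seconds("0m28,639s")    # bibisect-7.6 master
bibisect_60 = seconds("0m46,749s")  # 6.0 bibisect, for comparison

speedup_from_fix = before_fix / after_fix
speedup_vs_60 = bibisect_60 / master_76

print(f"fix speedup: {speedup_from_fix:.1f}x")
print(f"7.6 master vs 6.0: {speedup_vs_60:.1f}x")
```

On these numbers the fix makes loading roughly five times faster, and 7.6 master also loads the file faster than the 6.0 bibisect build.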