Bug 156225 - High CPU usage in Writer when opening a commented DOCX file
Summary: High CPU usage in Writer when opening a commented DOCX file
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 7.4.0.3 release
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: bibisected, bisected, filter:docx, perf, regression
Depends on:
Blocks: DOCX-Comments
Reported: 2023-07-10 15:51 UTC by this.ven
Modified: 2023-11-23 18:40 UTC
CC List: 5 users

See Also:
Crash report or crash signature:


Attachments
Problematic DOCX file causing high CPU usage on open, but not after saving it (1.20 MB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2023-08-10 10:13 UTC, this.ven

Description this.ven 2023-07-10 15:51:28 UTC
Description:
I created a document of about 160 pages and 340,000 words, written in English, as an ODT file. Scribbr's proofreading service only accepts DOC and DOCX files, so I saved the document as a DOCX file. I suppose it was then edited (commented) using Microsoft Word.

When I open the commented file from Scribbr's proofreading service, Writer constantly uses about 25% of my CPU (one full core), making it hard to navigate the document, edit another one, or use other applications. Additionally, Writer crashes if it tries to auto-save the problematic DOCX file and another document.

As soon as I close the DOCX file from Scribbr, everything is back to normal. I suspect they may have (unwittingly) infected my document with problematic content or changed it in a way that makes Writer behave this way. Their support did not investigate this problem on their side.

Steps to Reproduce:
1. Create a document with about 160 pages and 340,000 words and save it as ODT.
2. Save the same document as a DOCX file.
3. Upload it to Scribbr's proofreading service and receive a corrected version.
4. Open the corrected DOCX file in Writer.

Actual Results:
Writer constantly uses about 25% of the CPU (one full core) while the corrected DOCX file is open.

Expected Results:
A commented DOCX file should not use that much CPU time.


Reproducible: Always


User Profile Reset: Yes

Additional Info:
Version: 7.5.4.2 (X86_64) / LibreOffice Community
Build ID: 50(Build:2)
CPU threads: 4; OS: Linux 6.4; UI render: default; VCL: gtk3
Locale: en-US (de_DE.utf8); UI: de-DE
7.5.4-3
Calc: threaded
Comment 1 Roman Kuznetsov 2023-07-16 09:06:54 UTC
Please attach your DOCX problem file here
Comment 2 this.ven 2023-07-17 12:09:14 UTC
I'm unsure about this, as this is my master's thesis and it contains some personally identifiable information. Is there a way to provide the file only to the small group of people working on this issue?

I uploaded it to Nextcloud and am sharing the password-protected link: https://hea.ven.uber.space/s/R5Gfj9PyYwSDHwp

My suggestion is that an official representative of The Document Foundation contacts me to get access to this file. Would that work?
Comment 3 QA Administrators 2023-07-18 03:15:28 UTC Comment hidden (obsolete)
Comment 4 ysui2022 2023-08-10 04:15:31 UTC
Could you please provide a sample file?
Comment 5 Buovjaga 2023-08-10 05:02:15 UTC
(In reply to this.ven from comment #2)
> I'm unsure about this. As this is my master's thesis and it contains some
> personally identifiable information. Is there a way to provide the file only
> to the small group of people working on that issue?

You could try sanitising it: https://wiki.documentfoundation.org/QA/Bugzilla/Sanitizing_Files_Before_Submission

If the sanitised version still exhibits the problem, you could upload it.
Comment 6 this.ven 2023-08-10 10:13:26 UTC
Created attachment 188900
Problematic DOCX file causing high CPU usage on open, but not after saving it

I sanitized the file according to the link you provided. It is now about half the size: 1.3 MB, compared with 2.5 MB for the original. The problem persists when opening the file.

However, I observed that the high CPU usage suddenly stopped after I saved the sanitization changes. If I freshly open the file again, the CPU usage is high again, so hopefully you can still reproduce the behavior.
Comment 7 Buovjaga 2023-08-10 13:30:56 UTC
Thanks, I can reproduce the CPU use, which continues for a while after opening the document. There are reports like bug 60418 for "excessive" amounts of comments, but this doc does not have thousands of them.
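
For anyone who wants to check the comment count without opening the file in Writer, here is a minimal Python sketch. It assumes the attachment is a transitional OOXML file that stores its comments in word/comments.xml, which is the usual layout for Word-produced DOCX files; the function name count_docx_comments is made up for illustration.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by transitional OOXML files (assumption:
# the document is not saved in the "strict" OOXML variant).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def count_docx_comments(docx):
    """Return the number of <w:comment> elements in a DOCX file.

    `docx` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as zf:
        # A DOCX without any comments usually has no comments part at all.
        if "word/comments.xml" not in zf.namelist():
            return 0
        root = ET.fromstring(zf.read("word/comments.xml"))
    # Comments are direct children of the <w:comments> root element.
    return len(root.findall(f"{{{W_NS}}}comment"))
```

Calling count_docx_comments on the path of the downloaded attachment gives a number that can be compared with the "thousands" seen in reports like bug 60418.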

Arch Linux 64-bit, X11
Version: 7.5.5.2 (X86_64) / LibreOffice Community
Build ID: 50(Build:2)
CPU threads: 8; OS: Linux 6.4; UI render: default; VCL: kf5 (cairo+xcb)
Locale: fi-FI (fi_FI.UTF-8); UI: en-US
7.5.5-1
Calc: CL threaded
Comment 8 Gabor Kelemen (allotropia) 2023-11-20 18:23:56 UTC
Opening this document was pretty tolerable up until 7.3 (~30 sec load+layout), broke first (to ~2 min 20-30sec load+layout) sometime in the 7.4 cycle.

Master is even worse, falls into a layout loop(?) - killed the process after 10 minutes. This started sometime after 7.6, that was still in the 2.5 minute range. A double regression :(.
Comment 9 BogdanB 2023-11-21 19:52:08 UTC
I cannot bibisect it; I get a crash.
Comment 10 Buovjaga 2023-11-23 18:40:42 UTC
(In reply to Gabor Kelemen (allotropia) from comment #8)
> Opening this document was pretty tolerable up until 7.3 (~30 sec
> load+layout), broke first (to ~2 min 20-30sec load+layout) sometime in the
> 7.4 cycle.

Bibisected with linux-64-7.4 to 3b0a0e70cb67fc2e1f9999d2e8cbb9cfcd8c670e
Related tdf#66039 DOCX import: fix Z-order of group shapes

I only had to do a couple of git bisect skips due to crashes.
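
For whoever picks up the second regression, a helper along these lines could automate the good/bad/skip decision with git bisect run. This is only a sketch under assumptions, not what was used for the result above: the 60-second threshold is a guess placed between the ~30 sec and ~2 min 20 sec timings from comment 8, and the names classify and run_once are made up. The exit codes are the ones git bisect run documents: 0 for good, 125 for skip, and other codes from 1 to 127 for bad.

```python
import subprocess
import time

# Exit codes understood by `git bisect run`.
GOOD, BAD, SKIP = 0, 1, 125


def classify(returncode, elapsed, threshold=60.0):
    """Map one test run to a `git bisect run` exit code.

    returncode is None when the run was stopped at the time limit.
    The 60 s default threshold is an assumption, not a measured value.
    """
    if returncode is None:
        return BAD   # hit the time limit: treat as the slow (bad) case
    if returncode != 0:
        return SKIP  # crash or failure: this build cannot be judged
    return GOOD if elapsed < threshold else BAD


def run_once(cmd, threshold=60.0):
    """Run cmd, kill it after `threshold` seconds, and classify the result."""
    start = time.monotonic()
    try:
        proc = subprocess.run(
            cmd,
            timeout=threshold,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        returncode = proc.returncode
    except subprocess.TimeoutExpired:
        returncode = None
    except OSError:
        return SKIP  # binary missing or not executable in this build
    return classify(returncode, time.monotonic() - start, threshold)
```

A wrapper script would call sys.exit(run_once([...])) with a command that loads and lays out the attachment, for example a headless conversion to PDF with the soffice binary of the bibisect build. Whether a headless conversion shows the same slowdown as opening the file in the UI is an assumption that would need checking first.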

> Master is even worse, falls into a layout loop(?) - killed the process after
> 10 minutes. This started sometime after 7.6, that was still in the 2.5
> minute range. A double regression :(.

I will leave this as an exercise to someone else.