I created an English-language document of about 160 pages and 340,000 words as an ODT file. Scribbr's proofreading service only accepts DOC and DOCX files, so the document was saved as a DOCX file, and I suppose it was edited (commented) using Microsoft Word.
When I open the commented file from Scribbr's proofreading service, Writer constantly uses about 25% of my CPU (one full core), making it hard to navigate the document, edit another one, or use other applications. Additionally, Writer crashes when it tries to auto-save the problematic DOCX file and another document.
As soon as I close the DOCX file from Scribbr, everything is back to normal. I suspect they may have (unwittingly) infected my document with problematic content or changed it in a way that makes Writer behave this way. Their support did not investigate this problem on their side.
Steps to Reproduce:
1. Create a document with about 160 pages and 340,000 words and save it as ODT.
2. Save the same document as a DOCX file.
3. Upload it to Scribbr's proofreading service and receive a corrected version.
4. Open the corrected DOCX file.
The problem persists.
A commented DOCX file should not use that much CPU time.
User Profile Reset: Yes
Version: 22.214.171.124 (X86_64) / LibreOffice Community
Build ID: 50(Build:2)
CPU threads: 4; OS: Linux 6.4; UI render: default; VCL: gtk3
Locale: en-US (de_DE.utf8); UI: de-DE
Please attach your DOCX problem file here
I'm unsure about this, as this is my master's thesis and it contains some personally identifiable information. Is there a way to provide the file only to the small group of people working on that issue?
I uploaded it to Nextcloud and am sharing the password-protected link: https://hea.ven.uber.space/s/R5Gfj9PyYwSDHwp
My suggestion is that an official representative of The Document Foundation contacts me to get access to this file. Would that work?
[Automated Action] NeedInfo-To-Unconfirmed
Could you please provide a sample file?
(In reply to this.ven from comment #2)
> I'm unsure about this, as this is my master's thesis and it contains some
> personally identifiable information. Is there a way to provide the file only
> to the small group of people working on that issue?
You could try sanitising it: https://wiki.documentfoundation.org/QA/Bugzilla/Sanitizing_Files_Before_Submission
If the sanitised version still exhibits the problem, you could upload it.
Created attachment 188900 [details]
Problematic DOCX file causing high CPU usage on open, but not after saving it
I sanitized the file according to the link you provided. It is now about 1.3 MB, roughly half the size of the 2.5 MB original. The problem persists when opening the file.
However, I observed that the high CPU usage suddenly stopped after I saved the sanitization changes. Hopefully you can still reproduce the behavior. If I open the file afresh, the CPU usage is high again.
Thanks, I can reproduce the high CPU usage, which continues for a while after opening the document. There are reports like bug 60418 for "excessive" amounts of comments, but this doc does not have thousands of them.
Arch Linux 64-bit, X11
Version: 126.96.36.199 (X86_64) / LibreOffice Community
Build ID: 50(Build:2)
CPU threads: 8; OS: Linux 6.4; UI render: default; VCL: kf5 (cairo+xcb)
Locale: fi-FI (fi_FI.UTF-8); UI: en-US
Calc: CL threaded
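For anyone who wants to check the comment count mentioned above without opening the file in Writer, here is a minimal Python sketch. It assumes the comments live in the conventional `word/comments.xml` part of the DOCX package (a DOCX file is a ZIP archive); the function name `count_docx_comments` and the file name `problem.docx` are made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by DOCX files.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def count_docx_comments(path):
    """Count w:comment elements; return 0 if the package has no comments part.

    Assumption: the comments part is stored at the conventional location
    word/comments.xml, which is where Word normally writes it.
    """
    with zipfile.ZipFile(path) as docx:
        if "word/comments.xml" not in docx.namelist():
            return 0
        root = ET.fromstring(docx.read("word/comments.xml"))
    # Comments are direct children of the w:comments root element.
    return len(root.findall(f"{{{W_NS}}}comment"))
```

Calling `count_docx_comments("problem.docx")` on a local copy of the attachment would give a number to compare against the "thousands of comments" reports.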
Opening this document was pretty tolerable up until 7.3 (~30 sec load+layout), broke first (to ~2 min 20-30sec load+layout) sometime in the 7.4 cycle.
Master is even worse, falls into a layout loop(?) - killed the process after 10 minutes. This started sometime after 7.6, that was still in the 2.5 minute range. A double regression :(.
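To compare timings like the ones above across versions in a repeatable way, a rough Python sketch along these lines might help. It is only an approximation: it assumes that a headless DOCX-to-PDF conversion (`soffice --headless --convert-to pdf`) exercises roughly the same import and layout work as opening the file interactively, which is not confirmed here. The helper names `time_command` and `convert_command` are hypothetical.

```python
import subprocess
import time


def time_command(cmd, timeout_s=600):
    """Run cmd and return elapsed wall-clock seconds, or None if it timed out."""
    start = time.perf_counter()
    try:
        subprocess.run(cmd, check=False, timeout=timeout_s,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except subprocess.TimeoutExpired:
        # The launched process is killed on timeout; helper processes it
        # spawned may still need to be cleaned up manually.
        return None
    return time.perf_counter() - start


def convert_command(soffice, docx_path, outdir):
    """Build a headless DOCX-to-PDF conversion command for a given soffice binary."""
    return [soffice, "--headless", "--convert-to", "pdf", "--outdir", outdir, docx_path]
```

For example, `time_command(convert_command("soffice", "problem.docx", "/tmp/out"))` returns the elapsed seconds, or `None` when the 10-minute default timeout is hit, which would correspond to the suspected layout loop on master. The paths here are placeholders.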
I cannot bibisect it; I get a crash.
(In reply to Gabor Kelemen (allotropia) from comment #8)
> Opening this document was pretty tolerable up until 7.3 (~30 sec
> load+layout), broke first (to ~2 min 20-30sec load+layout) sometime in the
> 7.4 cycle.
Bibisected with linux-64-7.4 to 3b0a0e70cb67fc2e1f9999d2e8cbb9cfcd8c670e
Related tdf#66039 DOCX import: fix Z-order of group shapes
I only had to do a couple of git bisect skips due to crashes.
> Master is even worse, falls into a layout loop(?) - killed the process after
> 10 minutes. This started sometime after 7.6, that was still in the 2.5
> minute range. A double regression :(.
I will leave this as an exercise to someone else.