Bug 93261 - Persistently high background CPU usage by soffice.bin with high-word-count Writer document and auto spell checking disabled
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 5.0.0.5 release (earliest affected)
Hardware: x86-64 (AMD64) Windows (All)
Importance: medium major
Assignee: Michael Stahl (allotropia)
URL:
Whiteboard: target:5.1.0 target:5.0.3
Keywords:
Duplicates: 93458
Depends on:
Blocks:
 
Reported: 2015-08-08 00:28 UTC by MartinPC
Modified: 2016-10-25 19:21 UTC
CC List: 6 users

See Also:
Crash report or crash signature:


Attachments
70,000-word plain-text document (389.65 KB, text/plain), 2015-08-12 20:46 UTC, MartinPC
70,000-word ODT document (204.44 KB, application/vnd.oasis.opendocument.text), 2015-08-12 20:48 UTC, MartinPC
570,000-word plain-text document (3.14 MB, text/plain), 2015-08-12 20:53 UTC, MartinPC
570,000-word ODT document (1.24 MB, application/vnd.oasis.opendocument.text), 2015-08-12 20:56 UTC, MartinPC

Description MartinPC 2015-08-08 00:28:54 UTC
Hardware & OS:

ThinkPad R61 with Intel Mobile Core 2 Duo T7500 @2.2GHz, NVIDIA Quadro NVS 140M GPU, and 8GB of RAM, running fully up-to-date Windows 7 x64 SP1.

Problem with LibreOffice 5.0.0.5 x64:

When I have loaded a ~10MB, ~14,000-word Writer file with over 100 embedded graphics formatted with 3 different custom frame styles, 19 different applied paragraph styles, 29 different applied character styles, and a fair number of instances of direct (non-style) formatting, and have finished letting it repaginate and redo the word count, soffice.bin's background CPU usage, while I'm not doing anything in LibreOffice or Writer, is 0%, as it should be.

When I have loaded a ~1.5MB, ~720,000-word Writer file with no embedded graphics, no frame styles, no table of contents, no index, no index entries, all paragraphs formatted in default paragraph style, all text formatted in default text style, and all direct (non-style) formatting removed, and have finished letting it repaginate and redo the word count -- for over an hour -- soffice.bin's background CPU usage, while I'm not doing anything in LibreOffice or Writer, hovers at around 50%, driving my CPU and mainboard temperature up by 25°C.

The same problem occurs in LibreOffice 4.4.5.2 x86, but to a lesser degree: soffice.bin's background CPU usage with a high-word-count Writer document loaded is "only" 20%.

Unfortunately, the documents with which I tested this bug are proprietary and confidential and I absolutely cannot post or share them. I am hoping that someone else will have a nonconfidential high-word-count document on hand and can confirm it without too much effort. If not, I will eventually try to find a high-word-count document in the public domain, test it, and submit it if it bears out my theory.

This bug was enough to cause me to downgrade to 4.4.5.2 x86, as I don't want to subject my fan, thermal compound, CPU, and mainboard to high temperatures over a long period of time, especially if I'm not getting any actual computational work done in exchange.

I really hope this bug can be confirmed with generic documents and fixed, as I quite liked some of the added features and (apparently) faster loading speed I saw in LO 5.
Comment 1 MartinPC 2015-08-08 00:38:44 UTC
By the way, the above testing was done with automatic (background) spell checking and grammar checking turned off, with both my personal profile and a default profile. The problem is not my profile or the automatic spell or grammar checking.
Comment 2 MartinPC 2015-08-12 05:50:30 UTC
Identical high background CPU usage confirmed in 5.0.0.5 x86.

Identical high background CPU usage confirmed with plain text (*.TXT) version of ~720,000-word document loaded, in both 5.0.0.5 x64 and 5.0.0.5 x86.

Background CPU usage with plain-text 720,000-word document eventually fell to ~0% in 4.4.5.2 x86.
Comment 3 MartinPC 2015-08-12 20:46:22 UTC
Created attachment 117870
70,000-word plain-text document

0% background CPU usage in 4.4.5.2 x86, 5.0.0.5 x86, and 5.0.0.5 x64
Comment 4 MartinPC 2015-08-12 20:48:59 UTC
Created attachment 117871
70,000-word ODT document

0% background CPU usage in 4.4.5.2 x86, 5.0.0.5 x86, and 5.0.0.5 x64
Comment 5 MartinPC 2015-08-12 20:53:20 UTC
Created attachment 117872
570,000-word plain-text document

Average of around 22%-27% background CPU usage in 4.4.5.2 x86.

48%-50% background CPU usage in 5.0.0.5 x86 and 5.0.0.5 x64.
Comment 6 MartinPC 2015-08-12 20:56:15 UTC
Created attachment 117873
570,000-word ODT document

0% background CPU usage in 4.4.5.2 x86.

48%-50% background CPU usage in 5.0.0.5 x86 and 5.0.0.5 x64.
Comment 7 MartinPC 2015-08-12 21:00:35 UTC
I downloaded two public-domain books from Project Gutenberg: Leo Tolstoy's ~570,000-word War and Peace and Alain-Fournier's ~70,000-word Le Grand Meaulnes.

Using Calibre, I converted the EPUB-format versions to DOCX, and using LibreOffice I saved them as ODT and TXT versions.

I maintained LibreOffice 4.4.5.2 x86 as my primary (registered) LibreOffice installation and added 5.0.0.5 x86 and 5.0.0.5 x64 as parallel (unregistered) installs. I pointed the bootstrap.ini files of the parallel installs to my existing user profile folder, since LO5 uses the LO4 profile and since I had previously excluded my profile as a potential culprit in this bug.

I then separately loaded each of the two books in each of the two formats in each of the three LibreOffice versions, waited for post-document-loading tasks (repagination and statistics updating?) to complete, and examined soffice.bin's background CPU usage in Task Manager. (System Explorer yielded similar CPU usage readings.)

In 4.4.5.2 x86, soffice.bin *32's background CPU usage was:

* A constant 0% for the 70,000-word TXT version of Le Grand Meaulnes
* A constant 0% for the 70,000-word ODT version of Le Grand Meaulnes

* Variable, between 0% and 28%, most often in the low to mid 20s, for the 570,000-word TXT version of War and Peace
* A constant 0% for the 570,000-word ODT version of War and Peace

In 5.0.0.5 x86, soffice.bin *32's background CPU usage was:

* A constant 0% for the 70,000-word TXT version of Le Grand Meaulnes
* A constant 0% for the 70,000-word ODT version of Le Grand Meaulnes

* 48%-50% for the 570,000-word TXT version of War and Peace
* 48%-50% for the 570,000-word ODT version of War and Peace

In 5.0.0.5 x64, soffice.bin's background CPU usage was:

* A constant 0% for the 70,000-word TXT version of Le Grand Meaulnes
* A constant 0% for the 70,000-word ODT version of Le Grand Meaulnes

* 48%-50% for the 570,000-word TXT version of War and Peace
* 48%-50% for the 570,000-word ODT version of War and Peace

I believe this "high-word-count" bug can be confirmed as new to, or aggravated in, both x86 and x64 Windows editions of 5.0.0.5. I cannot venture any guess as to why background CPU usage would be moderately high in 4.4.5.2 x86 for the plain-text version of the 570,000-word document but not for the ODT version.

I'm uploading the two public-domain documents I used to carry out this test, in each of the two formats I tested (TXT and ODT).
Comment 8 MartinPC 2015-08-15 20:46:28 UTC
See related Bug 93458.
Comment 9 Jean-Baptiste Faure 2015-08-16 19:56:20 UTC
Reproducible for me with LO 5.0.2.0+ built at home under Ubuntu 15.04 x86-64 with gcc 5.1.
I think that this bug is a duplicate of bug 92036.

Workaround that works for me: turn on automatic spell checking, then turn it off again. You may want to add its icon to the standard toolbar to make this easier.

Closing as duplicate. Please, feel free to reopen if you disagree.

Best regards. JBF

*** This bug has been marked as a duplicate of bug 92036 ***
Comment 10 Jean-Baptiste Faure 2015-08-16 20:06:47 UTC
*** Bug 93458 has been marked as a duplicate of this bug. ***
Comment 11 MartinPC 2015-08-16 21:15:42 UTC
Thanks for looking into this, Jean-Baptiste.

Short version: Your workaround didn't work for me, and I question whether this bug (93261) is actually a duplicate of Bug 92036. Accordingly, I am reopening it.

Long version: 

When I turned on automatic spell checking in 4.4.5.2 x86 for Windows, soffice.bin *32's background CPU usage jumped from a variable ~17% to a very stable 50%. When I turned it off again, it dropped back down to the variable ~17%.

When I tried your workaround in both 5.0.0.5 x86 and 5.0.0.5 x64 for Windows, it had no effect on persistent background CPU usage. (In x86, switching automatic spell checking on and off again did introduce a little more CPU variability at first, ranging between ~38% and ~72%, but soffice.bin *32 quickly stabilized at ~50%. In x64, the workaround had no discernible impact on CPU usage in either the short or long term.)

If the automatic spell checking controls are functional in 5.0.0.5 and are in fact capable of turning automatic spell checking off, spell checking is apparently not involved here. Accordingly, this bug (93261) might not be a duplicate of 92036, so I am reopening it.

Thank you again for looking into this.
Comment 12 Jean-Baptiste Faure 2015-08-17 04:41:00 UTC
About the workaround: it works only if I wait a few seconds between switch on and switch off the automatic spell checking.

Best regards. JBF
Comment 13 MartinPC 2015-08-17 05:21:47 UTC
(In reply to Jean-Baptiste Faure from comment #12)
> About the workaround: it works only if I wait a few seconds between switch
> on and switch off the automatic spell checking.
> 
> Best regards. JBF

I did that. In some of the tests, I turned automatic spell checking on and off using Tools > Automatic Spell Checking. In others, I did it by using Tools > Options > Language Settings > Writing Aids > Check spelling as you type. Sometimes I switched between on and off as fast as I could, and sometimes I deliberately waited anywhere between 5 seconds and several minutes before switching. Waiting didn't have any effect on the results. This workaround does not appear to work on this bug in 5.0.0.5 x86 and x64 for Windows.

If the automatic spell checking settings are working correctly in 5.0.0.5, this high-CPU-usage bug is probably a different bug from the spell-checking-loop bug, albeit with similar symptoms.

If this high-CPU-usage bug is really a duplicate of the spell-checking-loop bug, it may mean that automatic spell checking is never really getting turned off in 5.0.0.5, which would be yet another different bug.
Comment 14 Michael Stahl (allotropia) 2015-09-15 13:26:43 UTC
with auto spelling enabled, can reproduce the problem with 5.0.0.5
but not with current master or libreoffice-5-0 branch builds so it
is most likely a duplicate of bug 92036.

however with auto spelling disabled, the problem still happens
on current master.

it turns out that with auto spelling enabled, the auto spelling
function handles the auto-completion word collection too,
but with it disabled there is a separate auto-completion word
collection function, and that has a bug that it will
never mark an empty paragraph as "done".
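
A minimal Python sketch of that failure mode (an illustration only, not the actual Writer code; `Paragraph`, the `collected` flag, and both collector functions are hypothetical names). It assumes the "done" flag is only set as a side effect of handling a word, so a paragraph with no words stays pending on every idle pass:

```python
from dataclasses import dataclass


@dataclass
class Paragraph:
    text: str
    collected: bool = False  # hypothetical "done" flag for word collection


def collect_words_buggy(para, words):
    """Assumed buggy variant: the flag is only set while handling a word."""
    for word in para.text.split():
        words.add(word)
        para.collected = True  # never reached for a paragraph without words


def collect_words_fixed(para, words):
    """Fixed variant: the paragraph is marked done even if it has no words."""
    for word in para.text.split():
        words.add(word)
    para.collected = True


def idle_passes(paragraphs, collect, max_passes=5):
    """Return how many passes did work, or None if work is still pending."""
    words = set()
    for passes_done in range(max_passes):
        pending = [p for p in paragraphs if not p.collected]
        if not pending:
            return passes_done
        for para in pending:
            collect(para, words)
    still_pending = any(not p.collected for p in paragraphs)
    return None if still_pending else max_passes
```

With the buggy collector, any document containing an empty paragraph keeps reporting pending work, which is what keeps the idle handler (and the CPU) busy.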

so let's make this bug about the auto-completion problem...

the other problem here is that we get a timer every 5 ms,
and that interrupts the checking of a long paragraph (>200 words),
and with the >3000 pages in the large document we can easily spend
5 ms just iterating to the first not yet checked paragraph,
so that will time out again, and never finish.
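
The starvation described here can be sketched with a deterministic toy model in Python (an illustration under stated assumptions, not the actual patch; `run_slices`, the abstract "work units" standing in for the 5 ms budget, and the one-unit cost of skipping a finished paragraph are all hypothetical). Without resuming, every slice pays again to walk past the finished paragraphs, so a long paragraph far into the document never completes:

```python
def run_slices(word_counts, budget, resume, max_slices=1000):
    """Simulate idle slices; return slices needed, or None if never done.

    word_counts: words per paragraph; budget: work units per slice.
    Skipping a finished paragraph costs 1 unit; each word costs 1 unit.
    resume=False restarts the scan at paragraph 0 on every slice and
    throws away a partially processed paragraph when interrupted.
    """
    done = [False] * len(word_counts)
    start = 0     # paragraph to resume at (only used when resume=True)
    progress = 0  # words already handled in that paragraph
    for slice_no in range(1, max_slices + 1):
        left = budget
        i = start if resume else 0
        while i < len(word_counts) and left > 0:
            if done[i]:
                left -= 1  # cost of iterating past finished work
                i += 1
                continue
            already = progress if resume else 0
            need = word_counts[i] - already
            if need <= left:
                left -= need
                done[i] = True
                progress = 0
                i += 1
            else:
                if resume:
                    progress = already + left  # keep partial progress
                left = 0  # slice interrupted by the timer
        if resume:
            start = i
        if all(done):
            return slice_no
    return None
```

In this model, ten 3-word paragraphs followed by one 8-word paragraph with a budget of 10 units never finish when restarting from the top, but finish in 4 slices once position and partial progress are kept.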

fixed on master.
Comment 15 Commit Notification 2015-09-15 13:27:07 UTC
Michael Stahl committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=97c6dac69ac2ad9cb20ba4d3c167d22a19922700

tdf#93261: sw: fix idle auto-complete collection loop on empty paras

It will be available in 5.1.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 16 Commit Notification 2015-09-15 13:27:11 UTC
Michael Stahl committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=b4f35a7450830979b937ec6ae3b6d638302093d2

tdf#93261: sw: fix idle auto-complete collection loop on big paras

It will be available in 5.1.0.

Comment 17 Commit Notification 2015-09-15 14:15:47 UTC
Michael Stahl committed a patch related to this issue.
It has been pushed to "libreoffice-5-0":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=d121cc8037ddcb36763e665cf178791e6e3cafd5&h=libreoffice-5-0

tdf#93261: sw: fix idle auto-complete collection loop on empty paras

It will be available in 5.0.3.

Comment 18 Commit Notification 2015-09-15 14:15:51 UTC
Michael Stahl committed a patch related to this issue.
It has been pushed to "libreoffice-5-0":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=9962cb4bdaa7f9cf6c7a3c9e1dd17e2a51052588&h=libreoffice-5-0

tdf#93261: sw: fix idle auto-complete collection loop on big paras

It will be available in 5.0.3.

Comment 19 MartinPC 2015-09-22 18:09:54 UTC
Hi, Michael. I really appreciate your work on this.

I tested your hypothesis in a parallel installation of LibreOffice 5.0.1.2 (x64).

First, I disabled "word completion" and "collect words".

Then I loaded the document in which I first noticed this problem and searched for empty paragraphs (regular expression ^$). There were a few that were recognized as empty -- "anchor" paragraphs for embedded graphics. I added a space between the graphic and the end of paragraph mark and they were no longer recognized as empty (by the search function, at least; I can't speak to the word-collection routine). I saved the document, closed it, and reloaded it.
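
For anyone who wants to run the same check on the plain-text attachments outside Writer, here is a rough Python equivalent of the ^$ search (a sketch only: it treats each line of a text file as one paragraph, and `empty_paragraphs` is a hypothetical helper name; Writer's own regular-expression search works on the document model, not on raw text):

```python
import re

EMPTY = re.compile(r"^$")  # same pattern as the Find & Replace search


def empty_paragraphs(text):
    """Return 1-based numbers of lines (paragraphs) that contain nothing."""
    return [
        number
        for number, para in enumerate(text.splitlines(), start=1)
        if EMPTY.match(para)
    ]
```

As in the test above, a paragraph that contains even a single space no longer matches.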

Soffice.bin's CPU usage jumped to >50% on load but settled down to 0% within five or ten seconds. CPU usage remained low during a brief period of scrolling and reading.

After I added a new alphabetical index entry to the text (and did nothing else), soffice.bin's CPU usage jumped to 49%-50% and remained roughly at that level indefinitely. A timed autosave introduced some temporary variability, but afterwards CPU usage soon stabilized again at around 49%-50%. A manual save brought soffice.bin's background CPU usage back to 0%.

Doing File > Properties > Statistics > Update also triggered high background CPU usage by soffice.bin, with more variability at first, but eventually stabilizing at around 49%-50%, apparently indefinitely. A manual save again brought CPU usage down to 0%.

Updating the alphabetical index also triggered high background CPU usage (49%-50%) that persisted until the document was manually saved.

If these operations (Insert Index Entry, Statistics Update, Update Index or Table) are calling the same subroutine you fixed, I expect they will no longer be a problem. If they are calling similar independent subroutines, those subroutines might need to be fixed as well.

I have to be selective in how many parallel installs of LibreOffice I maintain on my computer (because of their impact on my maintenance and backup routines), but I will try to parallel-install the fresh release of 5.0.3 x64 when it comes out and repeat my tests in that. 

Thank you for your work on this bug. It sounds promising and I'm hopeful that I will be able to upgrade to LibreOffice 5 in the near future. All the best.