Bug 60418 - FILEOPEN and EDITING particular .odt with excessive lots of Comments causes heavy CPU and memory load, can crash on saving
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 3.6.5.2 release (earliest affected)
Hardware: All All
Importance: medium major
Assignee: Not Assigned
QA Contact: Florian Reisinger
URL: http://www.mail-archive.com/discuss@d...
Whiteboard: target:5.2.0
Keywords: perf
Duplicates: 60419 95244
Depends on:
Blocks: Writer-comments
Reported: 2013-02-07 12:17 UTC by Florian Reisinger
Modified: 2017-03-04 22:10 UTC (History)
CC List: 11 users

See Also:
Crash report or crash signature:


Attachments
Testfile (480.35 KB, application/vnd.oasis.opendocument.text)
2013-02-07 12:17 UTC, Florian Reisinger
Without Comment tags (525.73 KB, application/vnd.oasis.opendocument.text)
2013-02-07 22:16 UTC, GiorgioMigliaccio
Without Frame tags (430.20 KB, application/vnd.oasis.opendocument.text)
2013-02-07 22:16 UTC, GiorgioMigliaccio
Solution - Without Form tags (440.91 KB, application/vnd.oasis.opendocument.text)
2013-02-07 22:17 UTC, GiorgioMigliaccio
calc, test file with many comments in a worksheet (24.56 KB, application/vnd.oasis.opendocument.spreadsheet)
2017-01-23 20:57 UTC, Patrik
testcase, calc with many comments (24.56 KB, application/vnd.oasis.opendocument.spreadsheet)
2017-01-23 21:00 UTC, Patrik

Description Florian Reisinger 2013-02-07 12:17:06 UTC
Created attachment 74336
Testfile

1) Load attached document
On my computer (notebook, AMD X2 RM-75 2.2 GHz dual core) it took 1 min 11 sec to load the document, and LibreOffice crashed when closing...
On my Windows 7 x64 system LibreOffice needs 462.692 KB of memory with
Version 4.1.0.0.alpha0+ (Build ID: bdfd8de57bf5767ce5c179a5e8705c7587f7b32)
TinderBox: Win-x86@6, Branch:master, Time: 2013-02-06_22:06:22
and 468.420 KB with Version 3.6.5.2 (Build ID: 5b93205).

2) Try to close it (APPHANG, APPCRASH) [or it hangs for 30+ seconds]

But 1) is the "main" bug of this report
Comment 1 Rainer Bielefeld Retired 2013-02-07 14:32:30 UTC
Not NEW due to   
 <https://wiki.documentfoundation.org/QA/BugTriage#Set_Status_.26_Prioritize>
I will do some more tests

@Florian Reisinger
Who mentioned V. 3.6.5.2?
Which OSes have been tested?
What specific characteristics of the reporter's documents are under suspicion? 120 pages is not a very large document.
How were the documents created?
We should have the original reporter from the ML here.
Comment 2 Rainer Bielefeld Retired 2013-02-07 14:33:51 UTC
Document contains 2500 ODF Errors, most of them 'unexpected attribute "ls-langcode"'
Comment 3 Florian Reisinger 2013-02-07 15:01:30 UTC
@Giorgio Migliaccio: Please comment on why it is not ODF conformant....
If there is no 100% ODF compatible testfile I am going to close it as NOTOURBUG

@Rainer: I did the testing both on Win 7 (BTW It is in the description)

In my case: Win7 x64

the original reporter is in CC
It is just about the loading speed
I don't know how this is created --> If creation is not done by us --> NOTOURBUG, is it?

So, this should be all information I have...

PS: Could you please link to the ODF validator used...
Comment 4 Rainer Bielefeld Retired 2013-02-07 15:10:19 UTC
Currently my suspicion is that it might have to do with the very large number of comments in the document. AOOo 3.4.1 seems a little faster for FILEOPEN but has similar problems; I think this never worked better.
Deleting comments is especially difficult.
Comment 5 GiorgioMigliaccio 2013-02-07 15:12:54 UTC
*** Bug 60419 has been marked as a duplicate of this bug. ***
Comment 6 GiorgioMigliaccio 2013-02-07 15:17:38 UTC
(In reply to comment #3)
> @Giorgio Migliaccio: Please comment on why it is not ODF conformant....
> If there is no 100% ODF compatible testfile I am going to close it as
> NOTOURBUG
> 
> @Rainer: I did the testing both on Win 7 (BTW It is in the description)
> 
> In my case: Win7 x64
> 
> the original reporter is in CC
> It is just about the loading speed
> I don't know how this is created --> If creation is not done by us -->
> NOTOURBUG, is it?
> 
> So, this should be all information I have...
> 
> PS: Could you please link to the ODF validator used...

We've created the documents with LibreOffice 3.5.4 but, for integrating it into our own solution, we needed to extend the file with some of our own attributes which will contain a guid/key. 
So those attributes on your elements are the only thing that was added by us and is not according to the specs, but they shouldn't be the reason why these documents behave as they do in LibreOffice/OpenOffice.org.
The tests were run on Windows 7 32-bit and 64-bit with 4 GB of RAM, using LibreOffice versions 3.4.5, 3.5.4, 3.5.5, 3.6.x and 4.0 beta. All gave the same results, except that 3.4.5 gave slightly better results with our full-blown resolved document, containing all the sub-documents.
Comment 7 Rainer Bielefeld Retired 2013-02-07 15:47:15 UTC
Validator: <http://odf-validator2.rhcloud.com/odf-validator2/>

Documents with thousands of comments are not a usual application for an Office Suite, so I reduce priority.


@GiorgioMigliaccio:
Please do not cite and do not use your mail client for Comments.
Is there any possibility for you to check whether the comments cause the problems?
Comment 8 Rainer Bielefeld Retired 2013-02-07 15:55:10 UTC
Hm. The document contains more than 200000 frames for 42000 words; this is a VERY unusual document.
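
Counts like these can be checked by tallying the element names in the document's content.xml. The following is a minimal Python sketch, assuming the .odt is an ordinary zip package with a content.xml entry. The function names are illustrative and not part of any LibreOffice tool. Tags are reported in ElementTree's `{namespace}local-name` form.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter


def count_elements(content_xml):
    """Return a Counter of element tags found in a content.xml byte string."""
    root = ET.fromstring(content_xml)
    counts = Counter()
    for elem in root.iter():  # iter() also yields the root element itself
        counts[elem.tag] += 1
    return counts


def count_odt_elements(path):
    """Open an .odt (a zip package) and tally the tags in its content.xml."""
    with zipfile.ZipFile(path) as odt:
        return count_elements(odt.read("content.xml"))


# Hypothetical usage, with a placeholder file name:
#   for tag, n in count_odt_elements("Testfile.odt").most_common(10):
#       print(n, tag)
```

In ODF, comments are stored as `office:annotation` elements and frames as `draw:frame` elements, so unusually high counts for those tags (or for `form:form`) would stand out in the output.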

@GiorgioMigliaccio:
I wonder why I am offered the option to delete comments that seem to be related to a PW-protected area, which might be a REAL bug.
Can you please check that and submit a separate bug (and add me to CC) if my suspicion concerning comment deletion is correct?
Comment 9 GiorgioMigliaccio 2013-02-07 16:05:15 UTC
@Rainer: this document indeed contains a lot of frames. We mark ranges of text which are conditional or need to be looped with a start 'tag' and an end 'tag', which are physically put into the document using frames. Variables are also put there using frames, which get a specific attribute and GUID so we can link them to our own object model, which specifies some additional info for these ranges. We tried and investigated everything LibreOffice (or OpenOffice.org) had to offer at the time we started, and this turned out to be the most flexible solution.
Trying to delete a comment for user x just crashes LibreOffice 4.0.0.2.
I'll try to remove them from the xml, manually, and see if this improves the document speed and memory usage in LO.
Comment 10 Rainer Bielefeld Retired 2013-02-07 17:06:52 UTC
(In reply to comment #9)
That would be great, we will have to separate the task into bite-sized different bugs. I believe most promising would be to create several documents based on this sample here:
One only with lots of frames
One only with lots of comments
One only with lots of ...

For each of those troublemakers a separate bug (with "Blocks" set to this one) should be submitted. That way the possible root causes can be examined in separate steps, which would increase the chance of a quick fix.
Comment 11 GiorgioMigliaccio 2013-02-07 22:14:17 UTC
Ok, found it.
The particular document(s) contained hundreds of the following lines:
<form:form form:name="Form" form:apply-filter="true" form:command-type="table" form:control-implementation="ooo:com.sun.star.form.component.Form" office:target-frame="" xlink:href="" xlink:type="simple">
               <form:properties>
                  <form:property form:property-name="PropertyChangeNotificationEnabled" office:value-type="boolean" office:boolean-value="true"/>
               </form:properties>
</form:form>


Removing these from the content.xml sped up the document again to open in 5 secs and use only some 50 MB of memory!
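
The manual removal described above could be scripted roughly as follows. This is only a sketch, not the procedure actually used here. It assumes the `form:form` elements are flat (not nested inside each other), as in the snippet quoted above, and it removes them with a regular expression so that the rest of content.xml stays byte-identical. The function names are illustrative.

```python
import re
import zipfile

# Matches either a self-closing <form:form .../> or an opening tag up to the
# first </form:form>. Assumption: form:form elements are not nested.
FORM_BLOCK = re.compile(
    rb"<form:form(?=[\s/>])[^>]*?(?:/>|>.*?</form:form\s*>)", re.DOTALL
)


def strip_form_blocks(content_xml):
    """Return (content without form:form blocks, number of blocks removed)."""
    return FORM_BLOCK.subn(b"", content_xml)


def rewrite_odt(src_path, dst_path):
    """Copy an .odt package entry by entry, cleaning only content.xml."""
    removed = 0
    with zipfile.ZipFile(src_path) as zin, zipfile.ZipFile(dst_path, "w") as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "content.xml":
                data, removed = strip_form_blocks(data)
            # Reusing the ZipInfo keeps entry order and compression type,
            # so an uncompressed leading "mimetype" entry stays that way.
            zout.writestr(item, data)
    return removed
```

Writing to a new file rather than editing in place keeps the original available for comparing load time and memory use before and after.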

I also tried removing the frames and the comments from the document, and this didn't change much, even though there were hundreds of them in the document, so, this seems implemented very well! :-)

But the main culprit seems to be the form:form tag for some reason; I don't even know how or when it got into the documents.
The strange thing is that the same entry occurs hundreds of times in the same document.
Comment 12 GiorgioMigliaccio 2013-02-07 22:16:05 UTC
Created attachment 74383
Without Comment tags
Comment 13 GiorgioMigliaccio 2013-02-07 22:16:34 UTC
Created attachment 74384
Without Frame tags
Comment 14 GiorgioMigliaccio 2013-02-07 22:17:07 UTC
Created attachment 74385
Solution - Without Form tags
Comment 15 Rainer Bielefeld Retired 2013-02-08 18:25:46 UTC
@GiorgioMigliaccio (In reply to comment #11)
Thank you for the additional sample documents. Next week I will try to narrow down what causes the visibly worse performance compared to LibO 3.4.5.

The most promising way to get a more visible improvement for your root problem with the form:form tags might be to run some tests of your own to find out how they get into the documents. And if you suspect that it's a LibO bug, you should submit an additional new bug with a detailed description of how a developer can reproduce the bug with very simple test equipment. Please feel free to add me to CC.
Comment 16 David 2013-05-04 01:32:32 UTC
I believe this bug is a duplicate of bug 61558.  I've found that toggling the "Hidden Paragraphs" option under the view menu will cause it to start responding properly while selecting "Update All" under the tools menu will cause an extreme lag.
Comment 17 GiorgioMigliaccio 2013-05-06 07:02:13 UTC
Thanks David.
Will try that.

We still didn't manage to find out why or when these form:form tags got into the document, which caused the incredible slowdown and memory consumption.

But you mentioned 'Hidden paragraphs', and I know some of our document authors were playing with that feature, it might as well be the root cause of the problem.
So, perhaps, once you toggle that feature on/off, you get the form:form tags.

I will test this and give feedback.
Comment 18 David 2013-05-06 19:15:58 UTC
I don't think it actually has anything to do with hidden paragraphs since I have several large documents that act the same way and AFAIK they don't have any hidden paragraphs.
Comment 19 ign_christian 2013-07-09 14:34:56 UTC
I can open attachment 74336 in about half a minute using a Pentium 1.3 GHz with 2 GB RAM (LO 4.0.4.2 - Win7 32bit).
I confirm the crash when closing that file (perhaps it is a freeze rather than a crash, because document recovery does not start when I open LO again).
Comment 20 QA Administrators 2015-04-19 03:20:23 UTC Comment hidden (obsolete)
Comment 21 Buovjaga 2015-06-15 11:27:42 UTC
Still same slowness all around. Closing is just slow, doesn't crash.

Win 7 Pro 64-bit Version: 5.1.0.0.alpha1+
Build ID: 01a189abcd9a4ca472a74b3b2c000c9338fc2c91
TinderBox: Win-x86@39, Branch:master, Time: 2015-06-14_07:46:28
Locale: fi-FI (fi_FI)
Comment 22 Buovjaga 2015-10-24 15:53:12 UTC
*** Bug 95244 has been marked as a duplicate of this bug. ***
Comment 23 Luke Kendall 2015-11-13 02:22:04 UTC
This problem occurs with even something like only a thousand comments.

From observation, it's clear that at least one of the algorithms involved with handling comments is O(N^2) or worse.

Editing the manuscript of a novel for publication is not that unusual a use case, and the use of comments for the editing and revision process by the author and an external editor is the norm.  Even if comments that have been addressed are deleted, novels normally go through several rounds of revision, so the number of comments stays reasonably high for quite some time.

I had to break the file for my novel (the version with comments) into two halves for LO to be usable to work on it.

On a separate, unsplit version with no comments, I revised and added comments as needed.  Performance stayed fine until I reached about the 50% mark, when a slight effect on performance was detectable.  The degradation in performance rose slowly, then faster.  It was noticeable by the 65% - 70% mark; now at the 77% mark, with something like a thousand comments now added, performance is becoming a serious problem.  Just typing text into the body of the document now lags, the characters appearing only at the rate of one or two per second.

Splitting the file in two is going to be necessary, I suspect, which is unfortunate, as it is not only added work but LO appears to sometimes lose the formatting of italics in some paragraphs, meaning that the text cannot be trusted and needs to be painstakingly compared when I paste the documents together to make the final single file for generating the epub and other versions.

Would a simple (doubling) growable array of references to comments be a candidate data structure to handle comments?  The O(N^2) algorithm is the performance killer.
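
For reference, a doubling growable array of the kind asked about here can be sketched in a few lines of Python. This is purely illustrative and says nothing about how LibreOffice actually stores comments. The class name and the `copies` counter are made up for the example. The point is that appends are amortized O(1) and lookup by index is O(1), so building the array for N comments costs O(N) in total rather than O(N^2).

```python
class GrowableArray:
    """Doubling array: amortized O(1) append, O(1) lookup by index."""

    def __init__(self):
        self._capacity = 1
        self._size = 0
        self._slots = [None] * self._capacity
        self.copies = 0  # elements moved during resizes (for illustration)

    def append(self, item):
        if self._size == self._capacity:
            self._grow()
        self._slots[self._size] = item
        self._size += 1

    def _grow(self):
        # Double the capacity and move the existing references across.
        new_slots = [None] * (self._capacity * 2)
        for i in range(self._size):
            new_slots[i] = self._slots[i]
            self.copies += 1
        self._slots = new_slots
        self._capacity *= 2

    def __getitem__(self, index):
        if not 0 <= index < self._size:
            raise IndexError(index)
        return self._slots[index]

    def __len__(self):
        return self._size


def fill(n):
    """Append n placeholder comment references and return the array."""
    arr = GrowableArray()
    for i in range(n):
        arr.append(f"comment-{i}")
    return arr
```

Because the capacity doubles each time, the total number of elements copied during all resizes stays below twice the final size, which is where the amortized O(1) bound comes from.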

I'm seriously considering the idea of buying a copy of MS Office to run under Wine or Crossover because of this problem, especially as it's been prioritised so low. :-(
Comment 24 Buovjaga 2015-11-13 06:59:19 UTC
Well, the severity should be higher, it's true.
Comment 25 Luke Kendall 2015-11-14 13:41:46 UTC
It's not at the unusable point quite yet.  I've pasted some of my reply comments inside the editor's comments, and deleted mine, to reduce N. I may resort to inserting comments into the body of the document. :-(

Incidentally, the combination of the slow performance, plus LO's tendency to scroll to a different point within the comment when you click the cursor in it, means that on dozens of occasions today, I've modified the wrong part of the comment (because the insertion point, or the text I've sweep-selected, isn't what LO actually used because it repositioned the comment text before applying my edits).  So I then had to undo and reposition the comment text and then re-paste or retype my edit to the comment.  I.e. this behaviour induces errors, too.

That combines especially badly with the separately reported bug where LO scrolls back to the previous comment if the insertion point was in an earlier comment (possibly pages back), before applying your edit.
Comment 26 Robinson Tryon (qubit) 2015-12-09 18:08:08 UTC
Migrating Whiteboard tags to Keywords: (perf)
Comment 27 Luke Kendall 2016-02-27 04:28:40 UTC
I'm now using LO 5.1.0.3.  The issue is not improved.  Today I also noticed a new problem: in the document with many comments, as I address each, I delete the comment (so that performance will gradually improve).

For comments that require further discussion, I copy the text, with comment included, into the new version of the document and then reply to the comment.

Currently, it takes 10-20 seconds for the many-comment document to respond when I click on the comment.  I have found that if I click, wait, then hover or click on the drop-down comment-actions button, at some stage the "Activate this button..." balloon help will pop up, and then if I click on the arrow, the menu will appear within a second or so.

BUT, for comments for which the associated text is still selected, when I click on the "Delete" action in the comment menu, often now, instead of deleting the comment, the selected text that the comment refers to is deleted!  This happens about 50% of the time.

The first three times it happened, I convinced myself I must have hit the Delete key instead of selecting the Delete Comment menu item: but that is not the case.  The Delete Comment *sometimes* deletes the associated text, if it is selected.

You can Undo, and then select Delete Comment again, and the comment will be reliably deleted on this second attempt.  However, another oddity I noticed is that when the comment has been successfully deleted, if I Undo that operation, while the comment reappears, it points to just the very end of the span of text that it was previously associated with.
Comment 28 Commit Notification 2016-04-26 09:43:44 UTC
Aron Budea committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=f91674bd4b5022a63dc5e6a89fe9a1b832d96798

tdf#60418: improve perf of opening/closing odts with form tags

It will be available in 5.2.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 29 Luke Kendall 2016-04-26 10:04:15 UTC
(In reply to Rainer Bielefeld Retired from comment #7)
> Validator: <http://odf-validator2.rhcloud.com/odf-validator2/>
> 
> Documents with thousands of comments are not a usual application for an
> Office Suite, so I reduce priority.
> 
> 
> @GiorgioMigliaccio:
> Please do not cite and do not use your mail client for Comments.
> Is there any possibility for you to check whether the comments cause the
> problems?

I disagree: currently, Microsoft Word documents are the generally preferred method of reviewing and critiquing manuscripts, between authors and editors. It's wonderful that LO can export .docx format, and handles the comments.  But it is quite common to have 2-10 comments on most pages of a manuscript, and if the manuscript has 400 pages, having 2000 comments would not be unusual at all.

And there is clearly an N^2 algorithm in there, from my observations: as I delete comments, the speed improves.

And with around 2000 comments, splitting the file into a 1st half and a 2nd half, to avoid 10+ second waits for every interaction, took performance from unusable to good.
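
The effect of halving can be illustrated with a toy model. The sketch below is not LibreOffice code. It only assumes, as suspected above, that each comment is located by a linear scan over the list of all comments, and it counts the scan steps. The function name is made up for the example.

```python
def naive_scan_steps(n_comments):
    """Count list steps when every comment is found by a linear scan."""
    comments = list(range(n_comments))
    steps = 0
    for target in comments:
        for candidate in comments:
            steps += 1
            if candidate == target:
                break  # found it; the scan for this comment stops here
    return steps


def halving_ratio(n_comments):
    """How many times cheaper one half-sized document is than the full one."""
    return naive_scan_steps(n_comments) / naive_scan_steps(n_comments // 2)
```

Under this assumption the step count for N comments is N(N+1)/2, so a document with half the comments needs roughly a quarter of the work, which would be consistent with the large improvement seen after splitting.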

I look forward to trying out the new version: I hope it addresses not just the problem found related to forms, but also to the number of comments!
Comment 30 Aron Budea 2016-04-26 21:09:04 UTC
(In reply to Luke Kendall from comment #29)
> I look forward to trying out the new version: I hope it addresses not just
> the problem found related to forms, but also to the number of comments!

I'm just a random person looking at random issues trying to understand the codebase better, who found some quick wins, so don't get your hopes up. I checked only with the 3rd attachment (the 1st didn't even open), and only opening and closing times. I'll probably try to look into this some more in the future, but can't promise anything.
Comment 31 Luke Kendall 2016-04-27 00:35:39 UTC
(In reply to Aron Budea from comment #30)
> (In reply to Luke Kendall from comment #29)
> > I look forward to trying out the new version: I hope it addresses not just
> > the problem found related to forms, but also to the number of comments!
> 
> I'm just a random person looking at random issues trying to understand the
> codebase better, who found some quick wins, so don't get your hopes up. I
> checked only with the 3rd attachment (the 1st didn't even open), and only
> opening and closing times. I'll probably try to look into this some more in
> the future, but can't promise anything.

Okay, thanks, Aron.  I'll keep my fingers crossed: good luck!
Comment 32 Patrik 2017-01-23 20:57:10 UTC Comment hidden (off-topic)
Comment 33 Patrik 2017-01-23 21:00:20 UTC Comment hidden (obsolete)
Comment 34 Patrik 2017-01-23 21:05:54 UTC Comment hidden (off-topic)
Comment 35 Buovjaga 2017-01-24 05:27:34 UTC Comment hidden (off-topic)
Comment 36 Luke Kendall 2017-02-09 11:59:45 UTC
I've upgraded to LO 5.2.5.1 now, and also received the Word .docx for the 2nd half of my novel, with my editor's comments inserted.

The 2nd half was longer than the 1st - about 87k words.

When I tried to use it in LO, LO was basically unresponsive.  I managed to type three words, and (after a minute or two) undo that change.  And after that, I couldn't get any of the menus to respond, in a minute of clicking and waiting.

At that point I gave up, because waiting over a minute for every UI action was clearly impractical.

So I divided that file into two (making a file for the 3rd quarter, and one for the 4th quarter).  Once I'd done that, performance was (barely) acceptable: e.g. I could click on the drop-down arrow on a comment and after a few clicks, and waiting about 6 seconds, the comment menu would pop up.

It's quite clear that MS Office does not use an O(N^2) algorithm for its handling of comments, and LO does.

It would be great if someone could locate the section of code which is using multiple passes down a linear array or list, and replace it with something that's not O(N^2).

At least I have a workaround, since I can just work on my main document (which has at most a couple of hundred comments) and correct that, while referring to my editor's commented version (halved or halved again until it becomes responsive in LO), and delete each comment as I deal with it, so that performance quadratically improves as the number of comments drops.

LO as it stands however is not currently a suitable tool for someone working as an editor who needs to add comments to a novel.  For that, as LO stands, you need the Microsoft product, unfortunately.