Bug 125619 - 'performance problem with plenty comments' still in 6.3 alpha1, *only after save of file/autosave*
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version: 6.3.0.0.alpha1+ (earliest affected)
Hardware: All All
Importance: high normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: bibisected, bisected, perf, regression
Duplicates: 125641 159665
Depends on:
Blocks: Calc-Comments Regressions-ViewToDevice-Refactor
Reported: 2019-06-01 12:37 UTC by b.
Modified: 2024-02-10 11:04 UTC (History)
CC List: 10 users

See Also:
Crash report or crash signature:


Attachments
Spreadsheet with many comments (81.31 KB, application/octet-stream)
2019-10-07 14:34 UTC, David
many-comments.ods (att. 154800) from David shown as folder? after download (124.31 KB, image/png)
2019-11-08 08:07 UTC, b.
file 'content.xml' from att. 154800 opened in firefox (613.17 KB, image/png)
2019-11-08 09:10 UTC, b.
Flamegraph (175.77 KB, application/x-bzip)
2019-12-31 23:25 UTC, Julien Nabet
Flamegraph (with gen rendering) (63.47 KB, application/x-bzip)
2019-12-31 23:32 UTC, Julien Nabet
file with macro to test 'comment performance' in LO calc, (21.98 KB, application/vnd.oasis.opendocument.spreadsheet)
2020-09-10 18:53 UTC, b.
evil effect of (auto)save (276.55 KB, image/jpeg)
2020-09-11 11:10 UTC, b.

Description b. 2019-06-01 12:37:26 UTC
Description:
this posting is related to a very old bug #76324, but it is not! a duplicate of it. 

something is fixed / plastered / changed since that report, but something is left or new. 

calc shows massive performance issues when a file contains plenty of comments *and* an autosave has taken place. 

see repro script below, system win7 pro SP1 (x64), 

Steps to Reproduce:
1. idle computer
2. fresh start of calc
3. fill 100x100 cells with any data
4. open task manager
5. observe 'no significant cpu load' while moving the mouse around over the sheet, 
6. wait till first autosave of the file (you can set it to a short time for that)
7. retry hovering around, observe 'no significant cpu load', 
8. add comments to all these cells (add a comment to one and paste it over the others), 
9. see cpu-load peak on copying, 
10. hover around with the mouse, 
11. in some cases you see significant load, in other cases you have to wait for the subsequent autosave, after that you'll likely see impressive cpu load just from hovering around with the mouse ... 
12. for recheck: save the file, restart calc, open the file, no cpu load on hovering until the first autosave, 

imho this load is unnecessary, and! it negatively affects the performance of nearly any subsequent work or operation. 



Actual Results:
unproductive cpu load on mouse moves after autosave of files with plenty of comments.

Expected Results:
cpu load on mouse moves after an autosave should be similar to that of a freshly loaded file. 


Reproducible: Always


User Profile Reset: No



Additional Info:
Version: 6.3.0.0.alpha1 (x64)
Build-ID: 547edd20e527fb02900f6174973770d26306e2e7
CPU threads: 8; OS: Windows 6.1; UI render: default; VCL: win; 
Locale: de-DE (de_DE); UI-Language: de-DE
Calc: threaded
Comment 1 b. 2019-09-09 21:02:58 UTC
hello, 

problem also occurring after a normal 'save' or 'save as' of a file (with too many comments), 

problem occurring with different OS (Linux) and flavours of LO, 

problem also occurring and reported with AOO (apache open office), 

similar problem occurring with LO writer (plenty of comments slow down the program till unusable), 

problem not! occurring in Ex$el, which can bear 8 times more comments than calc before performance is impacted, with the difference 2 to 4 times bigger depending on the system tested on, 

problem not! occurring in Wo$d, which can bear 6 times more comments than writer before becoming really slow, with the difference 2 to 4 times bigger depending on the system tested on, 

reg. 

b.
Comment 2 b. 2019-09-23 22:40:22 UTC
look around, hug somebody and say 'there is hope', 

version 6.2.7.1 win-x64 doesn't! show this problem ... 

could somebody please retest? 

just 'poking around' i installed that version and surprisingly: 

there is no or at least very little performance impact from plenty of comments in a document, both regarding delayed reaction to mouse clicks and regarding cpu load produced by hovering around with the mouse ... 

no problem to work in a sheet with 30k commented cells, calc is 'responsive' as one could want and like it.  

- i hope this gives some help to find the source of the problem - 

crosschecked with linux ... :-( 

copying one commented cell to a 50x50 pattern: 7 minutes with 100% cpu load, 

copying that 50x50 field as one block below itself: 20 seconds, cpu load not observed, 

copying one cell with a comment to a 50x50 field below the existing 50x100 commented cells from previous steps: 20 minutes, 100% cpu load, "LibreOffice 6.2 Calc" is not responding, half an hour of 'wait', still not finished ... :-( ... took around one hour to complete ... 

test on linux doesn't need a save or autosave to show problems. 

thus i'll try to raise the importance, one hour to wait is similar to a crash ... didn't work, could pls. somebody with 'power' do that?
Comment 3 Xisco Faulí 2019-09-26 09:47:25 UTC
You can't confirm your own bugs. Moving it back to UNCONFIRMED until someone
else confirms it.
Comment 4 David 2019-10-06 15:10:25 UTC
I have not tried yet the test given in the first post, but I also have a large spreadsheet that began to exhibit this behaviour with version 6.3.  Version 6.2.7 works fine.  When I load the spreadsheet in version 6.3, the behaviour is normal until the spreadsheet gets saved, then it gets extremely slow when doing any input.  Removing all the comments corrects the problem.
Comment 5 Xisco Faulí 2019-10-07 09:02:20 UTC
(In reply to David from comment #4)
> I have not tried yet the test given in the first post, but I also have a
> large spreadsheet that began to exhibit this behaviour with version 6.3. 
> Version 6.2.7 works fine.  When I load the spreadsheet in version 6.3, the
> behaviour is normal until the spreadsheet gets saved, then it gets extremely
> slow when doing any input.  Removing all the comments corrects the problem.

Please attach a sample document, as this makes it easier for us to verify the bug. 
(Please note that the attachment will be public, remove any sensitive information before attaching it. 
See https://wiki.documentfoundation.org/QA/FAQ#How_can_I_eliminate_confidential_data_from_a_sample_document.3F for help on how to do so.)

I have set the bug's status to 'NEEDINFO'. Please change it back to 'UNCONFIRMED' once the requested document is provided.
Comment 6 David 2019-10-07 14:34:28 UTC
Created attachment 154800
Spreadsheet with many comments

Following the instructions in the first post provides a file that is unusable in 6.3, for all practical purposes, after an initial save.  Functions correctly in 6.2.7.  See attachment.
Comment 7 Telesto 2019-10-08 20:25:47 UTC
Repro with
Version: 6.4.0.0.alpha0+ (x86)
Build ID: c45d477b0a0038d9c25176cf7cff299e5ddf3a7a
CPU threads: 4; OS: Windows 6.3; UI render: default; VCL: win; 
TinderBox: Win-x86@42, Branch:master, Time: 2019-09-30_05:06:55
Locale: nl-NL (nl_NL); UI-Language: en-US
Calc: CL
Comment 8 b. 2019-11-08 08:07:42 UTC
Created attachment 155622
many-comments.ods (att. 154800) from David shown as folder? after download

hello @all, 
hello and thanks @David and @Telesto for help in digging down, 

no solution, but may be some help ... 

i always wanted to know how a '.ods' file works internally, and when downloading David's attachment 154800 some of it was revealed. Strangely enough it wasn't opened with calc, but displayed as a kind of 'folder', see attached screenshot. More in the next comment. 

b.
Comment 9 b. 2019-11-08 09:10:22 UTC
Created attachment 155628
file 'content.xml' from att. 154800 opened in firefox

... continued: 

the files manifest.xml and others opened in a text editor (gedit?) and showed some info too cryptic for me; the file content.xml sent gedit into digital nirvana with endless massive cpu-load and fan noise. So I thought this file was broken, but I managed to open it with firefox. 

It consists of a header with several style definitions, and then the block I marked blue in the screenshot repeats the same thing several thousand times, with two variables: 'svg:x' (from 80.99 to 6672.98 in steps of 64: 104 entries) and 'svg:y' (from 0 to 1267.2 in steps of about 12.8: 100 entries). 

(I assume that represents 104 columns from A to AZ and 100 rows from 1 to 100, with only graphical coordinates where to place something.)
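The inspection described above can be repeated without a text editor or browser. A minimal Python sketch, assuming the comments are stored as `office:annotation` elements carrying `svg:x` / `svg:y` attributes as in the screenshot; `summarize_annotations` and `make_sample_ods` are hypothetical helper names, the `pt` unit suffix is an assumption, and the tiny in-memory sample only stands in for the real attachment:

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

# ODF namespaces behind the office: and svg: prefixes used in content.xml.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
SVG_NS = "urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"


def _number(value):
    """Drop a trailing unit (assumed, e.g. 'pt') and return the numeric part."""
    match = re.match(r"[-+]?\d*\.?\d+", value or "")
    return float(match.group()) if match else None


def summarize_annotations(ods_file):
    """Return (annotation count, distinct svg:x values, distinct svg:y values)."""
    count, xs, ys = 0, set(), set()
    with zipfile.ZipFile(ods_file) as archive:
        with archive.open("content.xml") as stream:
            # iterparse reads the XML incrementally, element by element.
            for _, elem in ET.iterparse(stream):
                if elem.tag != f"{{{OFFICE_NS}}}annotation":
                    continue
                count += 1
                x = _number(elem.get(f"{{{SVG_NS}}}x"))
                y = _number(elem.get(f"{{{SVG_NS}}}y"))
                if x is not None:
                    xs.add(x)
                if y is not None:
                    ys.add(y)
    return count, sorted(xs), sorted(ys)


def make_sample_ods(columns=3, rows=2):
    """Build a tiny in-memory stand-in for an .ods file (illustration only)."""
    notes = "".join(
        f'<office:annotation svg:x="{80.99 + 64 * c:.2f}pt" svg:y="{12.8 * r:.1f}pt"/>'
        for r in range(rows)
        for c in range(columns)
    )
    content = (
        f'<office:document-content xmlns:office="{OFFICE_NS}" xmlns:svg="{SVG_NS}">'
        f"{notes}</office:document-content>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("content.xml", content)
    buffer.seek(0)
    return buffer


count, xs, ys = summarize_annotations(make_sample_ods())
print(count, len(xs), len(ys))
```

Pointing `summarize_annotations` at a downloaded copy of the attachment instead of the sample should give the comment count and the number of distinct x and y positions, which can then be compared with the 104 and 100 entries counted by hand.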

I have massive difficulties imagining any program that could work effectively with this data structure. 

The data is a !list! that can only be accessed by searching? And whose values can only be assigned to a cell with float calculations of graphical coordinates? 

if you do this for the full list on every dot of a move of the mouse ... you get massive unproductive overhead. 

One might do things like this to handle a manageable number of values ... but as the number of comments in calc is limited only by the number of cells in the sheet?, this structure leads to a situation where everything runs wonderfully while programming and testing with few comments, and massive problems are bound to occur with larger numbers of comments. 

(high-res screens may contribute to the impact by increasing the number of moves that must be evaluated.)

In my opinion, one would have to rework the data structure in such a way that:
- each entry for a comment is marked as to which cell it belongs
- these 'cell coordinates' can be used as an index to access the comments,
(a two-dimensional array [0 to max-used-col; 0 to max-used-row] of (string; attributes))?
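The proposal above can be sketched in a few lines. This is only an illustration of the idea, not a statement about how Calc stores comments internally; `CommentIndex` and its methods are hypothetical names, and a dictionary keyed by (column, row) is swapped in for the dense two-dimensional array so that cells without comments cost nothing:

```python
from typing import Dict, Optional, Tuple


class CommentIndex:
    """Hypothetical store that maps a (column, row) pair to its comment text."""

    def __init__(self) -> None:
        self._by_cell: Dict[Tuple[int, int], str] = {}

    def set(self, col: int, row: int, text: str) -> None:
        """Attach or replace the comment of one cell."""
        self._by_cell[(col, row)] = text

    def get(self, col: int, row: int) -> Optional[str]:
        """Look the comment up directly by cell address, without any search."""
        return self._by_cell.get((col, row))

    def remove(self, col: int, row: int) -> None:
        """Delete the comment of one cell if it has one."""
        self._by_cell.pop((col, row), None)

    def __len__(self) -> int:
        return len(self._by_cell)


# Same shape as the sample file: 104 columns by 100 rows, one comment each.
index = CommentIndex()
for row in range(100):
    for col in range(104):
        index.set(col, row, f"comment for column {col}, row {row}")

print(len(index))
print(index.get(103, 99))
print(index.get(500, 500))
```

A lookup by cell address in such a structure takes roughly the same time whether the sheet holds ten comments or ten thousand, which is the property asked for above.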

But that's a job for programmers, my contribution as a user is to show that the data structure used is something between ridiculous and terrible.

Please correct me and I apologize very much, of course, if I misunderstood something.

reg. 

b.
Comment 10 b. 2019-11-09 17:37:03 UTC
apologies: 

- there is! some more structure in the stored files, 

- cells are noted row by row from left to right, 

- all rows are filled till max-cols-used, 

- empty cells are padded with 'cols-repeated' or similar, 

- jump to next row is marked with </table:table-row>

but it's still a sequential list with sequential access, or did i miss more structure? 


observations: 

- *.ods files are a flavour of zip archives which can be opened with - e.g. - 7zip, 

- David's attachment is 'usable' with ver. 6.4.0.0 alpha1+, 

this version still shows 'enhanced cpu-load on mousemoves', and 'delayed reaction on mouseclicks', but reduced compared to former versions, 

only 6.2.7.1 was better, by a factor of about 2, 

David's file has about 10k comments, 

somewhere above 40k i touch a limit where no version is usable, 

in contrast, filling 4M cells with numbers or formulas is - nothing - and takes - nearly - no time, 

*****************************************************************************
the funny thing is that the negative impact of the above performance problems is at least doubled after any? save of the file, or even just a 'save a copy', and that this persists, 
*****************************************************************************

all above said for win version. 

reg. 

b.
Comment 11 Telesto 2019-12-31 17:20:05 UTC
Bibisected to
author	Armin Le Grand <Armin.Le.Grand@cib.de>	2018-10-12 11:13:09 +0200
committer	Armin Le Grand <Armin.Le.Grand@cib.de>	2018-11-27 11:33:10 +0100
commit d464d505fbf6e53a38afdd3661d320fac8c760d6 (patch)
tree 3bf7db8591172bf948198f19d36df5df886486bb
parent 3e1e2b6687b0259ae28441cc0d314de0d908776b (diff)
Refactor calc non-linear ViewToDevice transform

https://cgit.freedesktop.org/libreoffice/core/commit/?id=d464d505fbf6e53a38afdd3661d320fac8c760d6
Comment 12 Telesto 2019-12-31 19:09:42 UTC
Adding CC: to Armin Le Grand
Comment 13 Telesto 2019-12-31 19:12:09 UTC
@Julien
A flamegraph would be nice ;-)
Comment 14 Julien Nabet 2019-12-31 23:25:35 UTC
Created attachment 156866
Flamegraph

Here's a Flamegraph retrieved on pc Debian x86-64 with master sources updated today (gtk3 rendering).
Comment 15 Julien Nabet 2019-12-31 23:32:20 UTC
Created attachment 156867
Flamegraph (with gen rendering)
Comment 16 Xisco Faulí 2020-05-14 08:22:39 UTC
Still reproducible in

Version: 7.0.0.0.alpha1+
Build ID: 1ffe59ef31186e36ad0aa7bbcdd32e407ee8d26c
CPU threads: 4; OS: Linux 4.19; UI render: default; VCL: gtk3; 
Locale: en-US (en_US.UTF-8); UI: en-US
Calc: threaded

Steps to reproduce:
1. Open attachment 154800
2. Save it with another name

-> it hangs

@Noel, I thought you might be interested in this issue since Julien attached a flamegraph
Comment 17 Noel Grandin 2020-05-14 08:27:23 UTC
This is the same problem I mentioned in that other bug, that requires a fairly major refactoring of how comments/notes work in calc.
Comment 18 Julien Nabet 2020-05-14 08:42:54 UTC
(In reply to Noel Grandin from comment #17)
> This is the same problem I mentioned in that other bug, that requires a
> fairly major refactoring of how comments/notes work in calc.

Miklos: could it be a subject discussed on ESC?
Comment 19 Miklos Vajna 2020-05-14 08:59:49 UTC
Done, it's now on the ESC agenda, see <https://lists.freedesktop.org/archives/libreoffice/2020-May/085093.html>. However, just to set expectations: the ideal outcome of a bisected regression is that the author of the problematic commit deals with it. It's rare that it happens otherwise.
Comment 20 Xisco Faulí 2020-05-14 09:07:03 UTC
Maybe this is one of those bugs that seems like a regression to the end user but in the end it's just an old issue which is more visible now.
According to comment 17, that might be the case here, and this issue might be a duplicate of bug 76324
Comment 21 David 2020-09-09 23:50:52 UTC
Something has definitely changed with version 7.10 alpha.  Version 7.02 still has the problem, but in 7.10 alpha, using the sample file, changing cells after a save is now approximately twice as fast as in 7.02.  But changing cells in my real life file is now immediate.  There is no observable delay in that file in ver. 7.10 alpha, while it is still a major issue in 7.02.
Comment 22 b. 2020-09-10 18:53:17 UTC
Created attachment 165379
file with macro to test 'comment performance' in LO calc,

it gets better gradually, but the processing of comments in calc is still a huge problem,  

i have attached a sheet with a small macro, it should be self-explanatory, fresh loaded you likely have to activate the form control toolbar via [view - toolbars - form controls] and deactivate 'edit mode' there, otherwise the macro button won't work, 

with '1' and '0' in A2 and A3 you can time the generation of 10,000 cells without comments, and with '1' and '1' the time for the same with comments, 

with higher numbers in A2 more cells are created and it takes longer, 

i find that 1 second for 150,000 and 4 seconds for 2,000,000 cells without comments, compared to 411 seconds for 150,000 cells with comments, shows that the handling of comments in calc is still 'not very effective' ... 
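To put the timings above on a per-cell basis, here is a small Python calculation that uses only the figures reported in this comment (1 second and 411 seconds for 150,000 cells without and with comments); the variable names are illustrative:

```python
# Figures reported above for 150,000 generated cells.
cells = 150_000
seconds_without_comments = 1
seconds_with_comments = 411

per_cell_without = seconds_without_comments / cells
per_cell_with = seconds_with_comments / cells
slowdown = seconds_with_comments / seconds_without_comments

print(f"{per_cell_without * 1e6:.1f} microseconds per cell without comments")
print(f"{per_cell_with * 1e3:.2f} milliseconds per cell with comments")
print(f"slowdown factor: {slowdown:.0f}x")
```

That is roughly 6.7 microseconds per plain cell against 2.74 milliseconds per commented cell, a factor of 411 on this machine and build.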

the problem gets much worse after the first (auto-)save, the memory usage is more than doubled after that, maybe this is an indication, 

similar delays occur in the handling of 'normal' files with many comments; in one of the many threads about it someone had attached a Lenovo article list with 25,000 comments, useless for everyday work :-(
Comment 23 b. 2020-09-11 11:10:36 UTC
Created attachment 165389
evil effect of (auto)save

for those who do not believe in the 'bad influence' of autosave here is a visualization:  

i used the sheet / macro from the above comment to create 150,000 cells with comments (A2: 15 - A3: 1 - start macro), observed the resource usage and annotated the attached screenshot: 

1. - close calc (7.1.0.0.a0+), 
2. - open calc (7.1.0.0.a0+), 
3. - open the sheet with the macro, 
4. - start generation of the cells, 
5. - slow progress in the production, 
6. - start of autosave (after 9 minutes), 
7. - inadvertent opening of calc 7.0 parallel, 
8. - calc 7.0 closed, 7.1 continues to run, 
9. - memory full, 
10. - finished and display of the result, 

the used memory will be released only after closing calc, i.e. until then the computer works with 'memory full', 

that certainly does not! help the system to work smooth and performant, 

(same happens if the cell creation is finished before autosave) 

the effect of many comments is already unhealthy, 

autosave - or any other save - in addition is catastrophic ... :-(
Comment 24 Buovjaga 2022-06-20 07:32:29 UTC
b.: could you run new tests with a fresh master build as it was discovered that Calc comment handling performance has improved recently?
Comment 25 Gabor Kelemen (allotropia) 2022-06-30 19:12:45 UTC
I can reproduce the issue with the first attachment 154800 (about 10k cells filled with text+comment).
Upon opening the file my system monitor reports about 280 MB memory use for soffice.bin, then saving the file under another name makes this go to about 900 MB, and even closing the file does not free this memory.

Randomly pointing to cells and waiting for their comment box to show up also produces noticeable spikes in CPU use.

Version: 7.5.0.0.alpha0+ / LibreOffice Community
Build ID: 9c796266470183f673eb58a8637dfe621eefa8b3
CPU threads: 8; OS: Linux 5.4; UI render: default; VCL: gtk3
Locale: hu-HU (hu_HU.UTF-8); UI: en-US
Calc: threaded
Comment 26 b. 2022-12-25 16:52:43 UTC
I'm actually not using Calc but came across an issue which reminded me of the comments / notes problem.  
_Assume_ it could be something like 'loop in loop', 'nested loops', which work well when developed / tested / used with small datasets, but expose evil 'superlinear behaviour' reg. 'quadratic-' or 'exponential-explosion' with 'big data' above some threshold.  
Such may harm execution time ( recursions ) or mem-usage ( large tables ) or both.  
If someone knowing Calc's code finds such in it...  
- either avoiding nested loops,  
- if unavoidable work with binary search in sorted lists,  
- and or the proposals in  
https://answers.sap.com/questions/2497299/how-to-avoid-loop-inside-the-loop-.html , 
https://answers.sap.com/questions/546841/how-to-avoid-the-loop-inside-loop-.html ,  
https://stackoverflow.com/questions/18014889/how-to-avoid-quadratic-computation-resulting-from-double-for-loop-when-computi ,  
could possibly help?
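The second suggestion, binary search in a sorted list, can be illustrated with Python's standard `bisect` module. Whether Calc's code actually contains such a nested loop is only the assumption made above; the (row, column) tuples and both function names are hypothetical:

```python
import bisect


def find_linear(sorted_cells, target):
    """Scan from the start; the work grows in step with the list length."""
    for position, cell in enumerate(sorted_cells):
        if cell == target:
            return position
    return -1


def find_binary(sorted_cells, target):
    """Halve the search range each step; needs the list to be sorted."""
    position = bisect.bisect_left(sorted_cells, target)
    if position < len(sorted_cells) and sorted_cells[position] == target:
        return position
    return -1


# 100 rows by 104 columns of commented cells, generated in sorted order.
cells = [(row, col) for row in range(100) for col in range(104)]
target = (99, 103)

print(find_linear(cells, target))
print(find_binary(cells, target))
```

Both functions return the same position, but for the last of the 10,400 entries the linear scan compares against every entry while the binary search needs about 14 steps. If a lookup like this runs once per comment inside another loop over all comments, the linear variant gives the quadratic growth described above.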
Comment 27 Telesto 2023-01-24 00:27:02 UTC
*** Bug 125641 has been marked as a duplicate of this bug. ***
Comment 28 Thorsten Behrens (allotropia) 2024-01-29 10:01:11 UTC
Un-Ccing developer for the moment, old regression & very high workload.
Comment 29 Buovjaga 2024-02-10 11:04:57 UTC
*** Bug 159665 has been marked as a duplicate of this bug. ***