Bug 125428 - Huge ram amount used not freed
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version: 4.2.0.4 release (earliest affected)
Hardware: x86-64 (AMD64) All
Importance: high normal
Assignee: Not Assigned
Whiteboard: target:7.1.0 target:7.0.0.1 target:7.4.0
Keywords: bibisected, bisected, perf, regression
Blocks: Memory
Reported: 2019-05-21 15:13 UTC by polo
Modified: 2022-02-11 14:52 UTC

Attachments
mem usage while loading and closing files (87.58 KB, image/png), 2020-05-11 06:04 UTC, b.
Example file (10.30 MB, application/vnd.oasis.opendocument.spreadsheet), 2020-05-17 11:00 UTC, Telesto
Example file (111.34 KB, application/vnd.oasis.opendocument.spreadsheet), 2020-05-17 16:14 UTC, Telesto
Screenshot (97.69 KB, image/png), 2020-05-21 16:10 UTC, Telesto

Description polo 2019-05-21 15:13:48 UTC
When I open an ods file (22 MB), the RAM used is around 37% of 8 GB (total).
That file has about 100 columns and 4000 rows with data and formulas.

If I select all the columns and rows and then sort by two columns, the RAM used climbs to 66%. Then I save the file, but the RAM used stays the same (I expected it to drop once the job is done).

After working with that file for a while, the RAM usage has still not dropped.
Then if I close that file, the RAM usage falls only to 65% (again, this is not expected).
After closing Calc, the RAM usage finally drops back to the initial 37%.

Please tell me if I need to provide debugging output, and how to do it.
Comment 1 m_a_riosv 2019-05-21 17:44:44 UTC
What kind of file is it, an MS format? Please test whether clearing all direct cell formatting outside the data range helps in some way.
Comment 2 polo 2019-05-23 10:04:21 UTC
It is my own LibreOffice Calc file, created on Ubuntu.
Cells are formatted with colored columns and/or 2 data digits, nothing else.
Indeed, sorting only the range of data cells requires less memory, but it still increases the initial RAM amount, and the memory is not freed after saving (I would expect it to be).
Comment 4 m_a_riosv 2019-05-24 07:30:53 UTC
Can you attach the file after clearing or modifying any private information?
Comment 5 polo 2019-05-24 17:51:19 UTC
The file has nothing special built in. Simply data and some formulas using if(and(...))

I have also tested some other Calc files, and also Writer files (~50 pages) where I ran the 'find & replace' feature.

I got the same issue with every file I used: RAM is not recovered after the requested job is done.

That probably means Calc can't be blamed, but rather libreoffice-core, or even something outside it (mutter? or something else).
Comment 7 m_a_riosv 2019-05-25 15:45:36 UTC
Have you tested with a clean profile? Menu/Help/Restart in Safe Mode
Comment 8 polo 2019-05-29 09:48:20 UTC
Safe mode does not give a better result. I have also reproduced the issue on different PCs and with other Calc/Writer files.

So the question is: which of LibreOffice, the system (gnome-shell on an Xorg session) or their dependencies is in charge of dynamic RAM usage inside Calc/Writer?
Comment 9 Xisco Faulí 2019-05-30 10:37:21 UTC
Thank you for reporting the bug. Please attach a sample document, as this makes it easier for us to verify the bug. 
(Please note that the attachment will be public, remove any sensitive information before attaching it. 
See https://wiki.documentfoundation.org/QA/FAQ#How_can_I_eliminate_confidential_data_from_a_sample_document.3F for help on how to do so.)

I have set the bug's status to 'NEEDINFO'. Please change it back to 'UNCONFIRMED' once the requested document is provided.
Comment 11 duceil 2019-11-28 07:17:29 UTC
As explained earlier, the issue does not depend on the type of file used (ods/doc/txt/pdf): it is reproducible every time on different configs/versions, on systems running gnome/systemd/mutter on an Xorg session.

The simplest fact is shown via System-monitor applet, when:
- soffice is loaded: take note of % ram used
- then open a task (calc/writer or else); do some work and save; note ram %
- close that task; note ram % again : you see that ram is not freed
- then close soffice, and finally ram is freed.
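
For anyone who prefers numbers to the System-monitor applet, the same measurement could be scripted. This is only a minimal sketch, assuming a Linux system where /proc/<pid>/status exposes a VmRSS line; the function names and the sample values are made up for illustration and were not used by anyone in this report.

```python
import re
from pathlib import Path


def parse_vm_rss_kib(status_text: str) -> int:
    """Return the VmRSS value (in KiB) from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB$", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no VmRSS line found (kernel thread or non-Linux system?)")
    return int(match.group(1))


def resident_memory_kib(pid: int) -> int:
    """Read the resident set size of a running process on Linux."""
    return parse_vm_rss_kib(Path(f"/proc/{pid}/status").read_text())


# Canned status snippet with invented numbers, so the sketch runs without a live process.
sample = "Name:\tsoffice.bin\nVmPeak:\t 2500000 kB\nVmRSS:\t  912340 kB\nThreads:\t12\n"
rss_after_close = parse_vm_rss_kib(sample)
```

Calling resident_memory_kib() with the soffice.bin process id after each of the steps above (loaded, after work and save, after closing the document) would give one reading per step to compare.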

As asked in comment 8, we first need to determine the source of the problem. If you have suggestions to narrow down the issue, please let me know.

But I'm quite sure everyone can observe this problem on a similar config (at least).
Comment 12 duceil 2020-01-22 17:07:55 UTC
Feedback:

Tested LO 6.0.7: no RAM issue seen after the spike caused by a huge sort; a couple of seconds after the task is done, the RAM usage comes back to its previous level.

Tested the newer LO 6.2.4 and 6.3.4 against the same huge ods file with the same sorting job as described by polo: the RAM usage climbs as expected, but then it does not drop back as it should.

Works on Ubuntu Bionic with 6.0.7, but not on Disco and Eoan with newer versions.
Comment 13 Telesto 2020-01-22 20:08:19 UTC
(In reply to duceil from comment #12)
> Feedback:
> 
> Tested LO 6.0.7: no RAM issue seen after the spike caused by a huge sort; a
> couple of seconds after the task is done, the RAM usage comes back to its
> previous level.
> 
> Tested the newer LO 6.2.4 and 6.3.4 against the same huge ods file with the
> same sorting job as described by polo: the RAM usage climbs as expected, but
> then it does not drop back as it should.
> 
> Works on Ubuntu Bionic with 6.0.7, but not on Disco and Eoan with newer
> versions.

@duceil
Please attach a sample document, as this makes it easier for us to verify the bug. 
(Please note that the attachment will be public, remove any sensitive information before attaching it. 
See https://wiki.documentfoundation.org/QA/FAQ#How_can_I_eliminate_confidential_data_from_a_sample_document.3F for help on how to do so.)
Comment 14 b. 2020-05-11 06:04:13 UTC
Created attachment 160639
mem usage while loading and closing files,

For the Bug hunting Session 7.0.0.0.a1+: 

tried mem usage and release, see pic, 

files with random values, each 1.000.000 cells (A1:CV10000) filled with 
=CHAR(INT(RAND()*26+1+96))&CHAR(INT(RAND()*26+1+96))&CHAR(INT(RAND()*26+1+96))&CHAR(INT(RAND()*26+1+96))&CHAR(INT(RAND()*26+1+96))

loaded three such files - marks 1, 2, 3, 

constructed a fourth one - mark 4, 

subsequently sorted them by col A and B ascending - mem spikes at 5, 6, 7, 8, 
(useless on random data, i know that!)

opened one new empty file - mark 9, 

released the first four, without saving - marks A, B, C, D, 

what is left is one empty! file, without even any undo actions valid for it, blocking 2 GB of memory, 

memory freed on closing the program - mark E, 

did i / the OP miss some memory retention 'timeouts'? or is it really a mem leak?
Comment 15 Telesto 2020-05-11 07:38:57 UTC
(In reply to b. from comment #14)
Please attach the sample file you used. Otherwise everybody has to rebuild it themselves..

And thanks for testing Calc.. not quite my expertise
Comment 18 b. 2020-05-12 10:22:36 UTC
it's faster to copy and paste that formula: 

=CHAR(INT(RAND()*26+1+96))&CHAR(INT(RAND()*26+1+96))&CHAR(INT(RAND()*26+1+96))&CHAR(INT(RAND()*26+1+96))&CHAR(INT(RAND()*26+1+96))

in that area: 

A1:CV10000

than to manage a 10 mb download ...
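
To make clear what that formula produces, here is a small Python sketch of the same arithmetic. It assumes RAND() returns a value in [0, 1), as Python's random() does; the function names are made up for illustration.

```python
import random


def random_cell_text(rng: random.Random, length: int = 5) -> str:
    """Mimic =CHAR(INT(RAND()*26+1+96))&... : `length` random letters a-z."""
    # INT(RAND()*26+1+96) yields an integer code from 97 ('a') to 122 ('z').
    return "".join(chr(int(rng.random() * 26 + 1 + 96)) for _ in range(length))


def column_index(label: str) -> int:
    """Convert a column label such as 'CV' to its 1-based index."""
    index = 0
    for ch in label.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


rng = random.Random(0)  # fixed seed only to make the sketch repeatable
cell = random_cell_text(rng)
cells_in_range = column_index("CV") * 10000  # size of A1:CV10000
distinct_possible = 26 ** 5                  # number of different 5-letter strings
```

With 1,000,000 cells drawn from 11,881,376 possible strings, most cells hold a string that no other cell has, so such a sheet carries a very large number of distinct strings.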
Comment 21 Telesto 2020-05-17 16:22:30 UTC
1. open Calc
2. Tools -> Options -> Advanced -> Open Expert Configuration -> Search for Undo
Set undo steps from 100 down to 0
3. Close LibreOffice and restart
4. Open the attached file
5. Copy Column D
6. Start pasting column by column from E to S
7. Memory usage increases to 569 MB for me
8. Delete all the content -> Still 569 MB
9. Close the document -> 164 MB

Version: 7.0.0.0.alpha1+ (x64)
Build ID: f9790da286f2d2fa47f1748f8cfa6172c6622ca3
CPU threads: 4; OS: Windows 6.3 Build 9600; UI render: Skia/Raster; VCL: win; 
Locale: de-CH (nl_NL); UI: en-US
Calc: CL

I have no clue why the memory stays allocated until the document is closed.. it's not undo information 

Same can be observed in 4.4.7.2
Comment 22 Telesto 2020-05-17 16:29:46 UTC
@Julien
Do you have any insight in memory management of Calc? Not sure if it's really leaking.. looks more like a delayed release. Memory is freed eventually (after closing)..

But the current behavior can add up pretty quick.. And undo isn't even enabled (assuming the setting works as expected)
Comment 23 Julien Nabet 2020-05-17 17:58:16 UTC
(In reply to Telesto from comment #22)
> @Julien
> Do you have any insight in memory management of Calc? Not sure if it's
> really leaking.. looks more like a delayed release. Memory is freed
> eventually (after closing)..
> 
> But the current behavior can add up pretty quick.. And undo isn't even
> enabled (assuming the setting works as expected)

Sorry, I don't have any insight and don't think a Flamegraph would help. Apart from Eike, I don't know who could help. There was Kohei some years ago, but he now works only on mdds, if I understood correctly.
About leaks, there was a tool on Mac (I don't remember the name, and since I don't have a Mac anymore, I can't tell).
Comment 24 Telesto 2020-05-17 21:06:57 UTC
@Mike
Not to nag.. but more as a question.. should this be open as a bug? See comment 0 / comment 14
Comment 25 Mike Kaganski 2020-05-18 04:47:03 UTC
(In reply to Telesto from comment #24)

Hi!
Although this might need further analysis, I agree that not releasing memory *at all* after an operation (as 7-8 from c#14 might hint) must be treated as a bug. Yet, it needs investigating: is it really never released, e.g. on a timer? does sorting repeatedly on the same data (e.g. asc->desc->asc->...) produce a higher level after the spike each time? etc.

But if it's released subsequently, without unloading documents or exiting program, then IMO it needs much more substantial ground to be called a bug. That would mean proper memory handling, and only if that results in actual problems, the strategy might need to be revised.
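
The repeated-sort check could be evaluated along these lines. This is a hedged sketch only: the readings are invented, the 5 MB tolerance is an arbitrary assumption, and the function name is hypothetical.

```python
from typing import Sequence


def baseline_keeps_growing(baselines_mb: Sequence[float], tolerance_mb: float = 5.0) -> bool:
    """True if every settled reading exceeds the previous one by more than the tolerance."""
    if len(baselines_mb) < 2:
        raise ValueError("need at least two settled readings")
    return all(
        later - earlier > tolerance_mb
        for earlier, later in zip(baselines_mb, baselines_mb[1:])
    )


# Made-up readings (MB), taken after each sort once the spike has settled.
growing = [400.0, 520.0, 640.0, 770.0]   # every sort leaves more memory behind
plateau = [400.0, 520.0, 521.0, 519.5]   # one-off growth, then stable
```

A series like `growing` would point towards memory that is never released, while a series like `plateau` would look more like memory that is retained once and then reused.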
Comment 26 Telesto 2020-05-21 16:10:04 UTC
Created attachment 161088
Screenshot

@Mike,
Small question: how long is a timer based release 'allowed' to take.. 4 minutes after closing the document is fine?
Comment 27 Telesto 2020-05-21 18:00:44 UTC
1. Open attachment 160938
2. Copy Column D
3. Paste one by one until column S
4. CTRL+Z everything
5. CTRL+Y everything
6. Copy an empty cell (clearing the clipboard)
7. Open a new calc sheet
8. Close the one attached file without saving
9. Do nothing.. 1 GB in use.. won't be released.. 

The release is - surely - triggered by copying something to the clipboard.
Making edits to the new Calc file (typing) has no impact... as far as I noticed..
Opening a new file doesn't do anything either.

Found in
7.0

and in 
4.4.7.2

and in
Version: 4.3.7.2
Build ID: 8a35821d8636a03b8bf4e15b48f59794652c68ba

but not in
Version: 4.1.0.4 
Build ID: 89ea49ddacd9aa532507cbf852f2bb22b1ace28

Adding bibisectrequest assuming this is reproducible on Linux
Comment 28 Telesto 2020-05-23 08:48:20 UTC
Bisected to:
author	Kohei Yoshida <kohei.yoshida@collabora.com>	2013-10-10 20:24:21 -0400
committer	Kohei Yoshida <kohei.yoshida@collabora.com>	2013-10-11 12:14:27 -0400
commit 7333881bb7b04f7e4e2a28638024ae82a9c14e81 (patch)
tree 97548f94ab918d502b45a5dda40ece5ad4117617
parent 6255be7ca294d350143290c343673f264f42220c (diff)
Formula tokens, formula cells and formula interpreters to use shared strings.

https://cgit.freedesktop.org/libreoffice/core/commit/?id=7333881bb7b04f7e4e2a28638024ae82a9c14e81

This is not actually about the release of memory.. but about memory usage in general.
Before this commit +/-200 MB, after it 1 GB. Even the calculation was faster before..
Comment 29 Telesto 2020-05-23 09:08:28 UTC
@Noel
I'm being horribly opportunistic here.. and I'm not even sure if Calc is your thing.. but you have made some nice changes improving memory usage in general:

So maybe you have a clue why this commit is causing a massive increase in memory usage
https://cgit.freedesktop.org/libreoffice/core/commit/?id=7333881bb7b04f7e4e2a28638024ae82a9c14e81

Formula tokens, formula cells and formula interpreters to use shared strings.
Comment 30 Noel Grandin 2020-05-23 09:54:23 UTC
@Telesto that particular commit is intended to _lower_ memory usage in most situations, by creating a shared pool of string objects to use, instead of every single cell and formula using its own string object.

However, we never ever remove strings from that pool, so in this situation it leads to memory that is not freed until close.

We could flush that pool periodically, but it is not obvious when to flush that pool, in a way which will not cause performance regressions for some people.
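
As an illustration of the behaviour described here, a toy interning pool in Python. This is not the LibreOffice implementation; the class, its methods and the numbers are made up, and `purge()` only shows what a flush would do in principle.

```python
from typing import Dict, Set


class SharedStringPool:
    """Toy interning pool: one stored object per distinct string value."""

    def __init__(self) -> None:
        self._pool: Dict[str, str] = {}

    def intern(self, text: str) -> str:
        # Return the pooled copy, adding it on first sight. Nothing is removed here.
        return self._pool.setdefault(text, text)

    def purge(self, still_in_use: Set[str]) -> int:
        """Drop entries that no live cell uses; return how many were removed."""
        stale = [s for s in self._pool if s not in still_in_use]
        for s in stale:
            del self._pool[s]
        return len(stale)

    def __len__(self) -> int:
        return len(self._pool)


pool = SharedStringPool()
document_cells = [pool.intern(f"value-{i % 1000}") for i in range(10_000)]
size_while_open = len(pool)      # 1000 distinct strings shared by 10,000 cells

document_cells.clear()           # "close" the document
size_after_close = len(pool)     # the pool still holds every string

removed = pool.purge(still_in_use=set())
size_after_purge = len(pool)
```

Sharing saves memory while the cells exist (10,000 cells, 1000 stored strings), but without a purge the pool keeps all entries after the cells are gone.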
Comment 31 Telesto 2020-05-24 13:44:40 UTC
(In reply to Noel Grandin from comment #30)
> @Telesto that particular commit is intended to _lower_ memory usage in most
> situations, by creating a shared pool of string objects to use, instead of
> every single cell and formula using its own string object.
> 
> However, we never ever remove strings from that pool, so in this situation
> it leads to memory that is not freed until close.
> 
> We could flush that pool periodically, but it is not obvious when to flush
> that pool, in a way which will not cause performance regressions for some
> people.

I don't know enough to give any advice here.. There are two flaws, IMHO
* The pool isn't flushed periodically
* The pool isn't flushed properly after closing the document either. It needs some bizarre trigger to do so (like copying something to the clipboard)
  
I also have no clue how a pool functions. I lack the knowledge about smart pointers, unique pointers etc.. nor do I know when unused strings can be useful.. or how to track unused and used strings.. just emptying a pool on a timer doesn't sound smart.. 

I assume there are more strings stored than are actually present in the visual presentation in Calc itself. Is there no intelligent compiler feature to track unused strings.. as a sort of garbage collector? To flush things out after being unused for, say, 30 seconds.. 
Dumping everything into the pool until close doesn't sound like a proper solution either.. However, I'm not sure if the pool gets filled because of the specifics of the bug doc in question.. or if this is general behavior
Comment 32 b. 2020-05-24 20:08:45 UTC
@Noel: 

(In reply to Noel Grandin from comment #30)
> ... that particular commit is intended to _lower_ memory usage in most
> situations, by creating a shared pool of string objects to use, instead of
> every single cell and formula using its own string object.
> 
> However, we never ever remove strings from that pool, so in this situation
> it leads to memory that is not freed until close.

hi, thanks, that explains a lot ... 

is the referencing to this list 'uni-' or 'bidirectional'? 

when i think about it ... it's a clear advantage as long as it doesn't become a killer, and will become a killer once it's grown too big ... thus dangerous for big sheets / projects

is it possible / a good idea to implement it similar to linux hardlinks in filesystems, track referencing items and require the referencers to delete the reference as soon as it is no longer needed, or to check from the list whether the referencing objects are still 'alive'? and release the referenced string once all referencers are dead ... ?
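
The hardlink-style idea could look roughly like this. It is a toy sketch of reference counting under the assumption that every referencer reliably calls release(); the names are hypothetical and this says nothing about how the real pool tracks its users.

```python
from typing import Dict


class RefCountedStringPool:
    """Toy pool that frees a string once its last user releases it."""

    def __init__(self) -> None:
        self._counts: Dict[str, int] = {}

    def acquire(self, text: str) -> str:
        # A referencer (e.g. a cell) registers its use of the string.
        self._counts[text] = self._counts.get(text, 0) + 1
        return text

    def release(self, text: str) -> None:
        # The referencer gives the string up; the entry disappears at zero users.
        remaining = self._counts[text] - 1
        if remaining == 0:
            del self._counts[text]
        else:
            self._counts[text] = remaining

    def __len__(self) -> int:
        return len(self._counts)


pool = RefCountedStringPool()
a1 = pool.acquire("abcde")
b1 = pool.acquire("abcde")   # second cell sharing the same string
c1 = pool.acquire("vwxyz")

pool.release(a1)
entries_after_first_release = len(pool)   # "abcde" still has one user
pool.release(b1)
pool.release(c1)
entries_after_all_released = len(pool)
```

The cost of this design is bookkeeping on every acquire and release, which may be part of the performance concern raised in comment 30.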
Comment 33 duceil 2020-05-25 08:10:04 UTC
Only sharing my own feeling here.

Apps like Calc/Writer/... are run under the soffice umbrella.

This report suggests that soffice does not manage the apps' RAM usage and does not care whether jobs are saved/closed.

Indeed, decisions have to be taken about the 'undo' possibilities (how many?)

Maybe a custom user setting could explicitly register that the job is done:
1) pressing an icon, for example when the job is saved, which then frees the job's RAM.
2) RAM is freed when the job is closed (soffice does not need to store such data after the job has ended)
3) soffice definitely purges the RAM used by the app being closed (calc/writer/...)
Comment 34 Telesto 2020-05-31 11:23:59 UTC
Not related to "shared strings" - so a bit off topic - and probably inherited - but there is something odd going on with the undo/redo 'information pool': bug 133553

Memory usage has a steep curve even with the undo/redo steps set to 1
Comment 35 Commit Notification 2020-06-02 13:39:04 UTC
Luboš Luňák committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/861fbd998f2b526c2aea073c9471613bf728fa75

purge shared string pool if ScDocument is closed (tdf#125428)

It will be available in 7.1.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 36 Commit Notification 2020-06-03 09:39:00 UTC
Luboš Luňák committed a patch related to this issue.
It has been pushed to "libreoffice-7-0":

https://git.libreoffice.org/core/commit/b0556da3a0966c9c68c6a57909524058d0bb5e07

purge shared string pool if ScDocument is closed (tdf#125428)

It will be available in 7.0.0.1.
Comment 37 Commit Notification 2022-02-10 14:19:56 UTC
Luboš Luňák committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/5b2a153779b68e8454a66973579512fe17e376d5

do not call purge() on string pool too often (tdf#125428)

It will be available in 7.4.0.
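
The commit title does not say how the purge calls are limited. Purely as an illustration, one hypothetical way to rate-limit an expensive purge is sketched below; the class, the 10 second interval and the simulated clock are assumptions and do not describe the actual patch.

```python
from typing import Callable


class ThrottledPurger:
    """Run an expensive purge at most once per `min_interval` seconds."""

    def __init__(self, purge: Callable[[], None], min_interval: float,
                 clock: Callable[[], float]) -> None:
        self._purge = purge
        self._min_interval = min_interval
        self._clock = clock
        self._last_run = None  # time of the last purge that actually ran

    def request(self) -> bool:
        """Ask for a purge; return True only if it actually ran."""
        now = self._clock()
        if self._last_run is not None and now - self._last_run < self._min_interval:
            return False  # too soon after the previous purge, skip this one
        self._purge()
        self._last_run = now
        return True


# Simulated clock so the example is deterministic.
fake_time = [0.0]
runs = []
purger = ThrottledPurger(lambda: runs.append(fake_time[0]), min_interval=10.0,
                         clock=lambda: fake_time[0])

results = []
for t in (0.0, 1.0, 2.0, 12.0):   # four documents closed at these times
    fake_time[0] = t
    results.append(purger.request())
```

In this simple form a skipped request is dropped rather than deferred, so a real implementation would also need some way to catch up later, for example a delayed purge once things are idle.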