Bug 97989 - LO Calc very poor performance when saving to ODS or CSV on large spreadsheet
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version (earliest affected): 5.1.0.3 release
Hardware: x86-64 (AMD64) Windows (All)
Importance: medium major
Assignee: Markus Mohrhard
URL:
Whiteboard: target:5.2.0
Keywords: perf
Depends on:
Blocks:
 
Reported: 2016-02-18 21:58 UTC by rascal
Modified: 2016-10-25 19:08 UTC

See Also:
Crash report or crash signature:
Regression By:


Attachments
source file and performance usage charts for cases described in ticket (2.84 MB, application/force-download)
2016-02-18 21:58 UTC, rascal
test replicated on LO 5103x32 portable and it works fine (9.20 MB, application/force-download)
2016-02-18 22:10 UTC, rascal
CSV import settings (140.80 KB, image/jpeg)
2016-03-25 09:23 UTC, Buovjaga
Callgrind with 5.2 (8.15 MB, application/x-7z-compressed)
2016-03-25 18:28 UTC, Buovjaga

Description rascal 2016-02-18 21:58:44 UTC
Created attachment 122781
source file and performance usage charts for cases described in ticket

Hi,
I'd like to report a problem I discovered while working with larger files in LO Calc 5.1.0.3 x64 on Windows 7 (x64).
Calc saves larger files very slowly, which prevents any reasonable work with such files.

Findings:
It takes about 15-20 seconds to load a file with 200 000 rows.
It takes 6 minutes (!!!) to save to CSV (random line was copied and added at the bottom, so the file is not identical).
It takes 7 minutes (!!!) to save this 200k rows to ODS format (no changes in content).
It takes only around 20 seconds to save to XLSX (no changes in content).

I also tested the portable LO 4.4.7 x32 from PortableApps, and it works MUCH faster.
Times recorded:
Load of CSV - about 15 seconds
Save to CSV (again with a random row copied and attached at the bottom) - less than 10 seconds.
Save to ODS - 20 seconds
Save to XLSX - 10 seconds

What I find weird is that handling XLSX is faster than ODS, both saving and loading (one of the charts shows the load speeds).

The attachment contains:
1) Original CSV file with 200k rows - star0000-1-200k.csv
Data are from here
https://sdm.lbl.gov/fastbit/data/samples.html , file star2000.csv.gz - I just took 200k rows from it and saved them to a separate file (in Notepad).

2) Screenshots of charts from Process Hacker.
How to read the charts: top is CPU, middle is memory, bottom is I/O.
Vertical lines are 10-second markers.
The first bump is the loading of the file; the second bump (a veeery long CPU hog for LO 5.1.0) is the saving of the file to the specific format.
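
For anyone who wants to rebuild the test file from item 1 without a text editor, here is a minimal sketch. It assumes the 200k rows were simply the first 200,000 lines of the download (the report does not say which rows were taken), and the function name `take_rows` is made up for illustration.

```python
import gzip
import itertools


def take_rows(src_gz, dst_csv, n_rows=200_000):
    """Copy the first n_rows lines of a gzip-compressed text file to dst_csv.

    Returns the number of lines actually written (fewer than n_rows if the
    source is shorter). Assumption: "200k rows" means the first 200,000 lines.
    """
    written = 0
    with gzip.open(src_gz, "rt", encoding="utf-8", newline="") as src, \
            open(dst_csv, "w", encoding="utf-8", newline="") as dst:
        # islice stops after n_rows lines without reading the rest of the file.
        for line in itertools.islice(src, n_rows):
            dst.write(line)
            written += 1
    return written
```

Usage would be something like `take_rows("star2000.csv.gz", "star0000-1-200k.csv")`, using the file names mentioned above.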

Please note that while the CPU is hogged for a few minutes in the worst cases, there is almost no I/O activity.

OUTPUT files - not included due to the attachment size limit - I have them and can provide them on request.

Tested on Lenovo W520 laptop with i7-2760QM CPU. It is a quad-core with HT, so eating 1 cpu core is visible as 12.5% CPU load on the chart.
8GB of RAM installed in the laptop.
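
The 12.5% figure follows from the logical processor count: 4 physical cores with 2 hardware threads each give 8 logical processors, and one fully busy logical processor is 1/8 of the total. A small sketch of that arithmetic (the function name is made up for illustration):

```python
def single_core_share(physical_cores, threads_per_core):
    """Percentage of total CPU shown when one logical processor is fully busy."""
    logical_processors = physical_cores * threads_per_core
    return 100.0 / logical_processors


# Quad-core with Hyper-Threading, as on the i7-2760QM described above.
print(single_core_share(4, 2))  # prints 12.5
```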

I can run additional tests per instructions or provide more info, just let me know.

Thanks and hopefully this could be corrected soon as it makes working with larger files impossible.

Lastly - the loading and saving progress bar is NOT working in 5.1.0 x64 - LO Calc just freezes during the save.
Comment 1 rascal 2016-02-18 22:10:55 UTC
Created attachment 122782
test replicated on LO 5103x32 portable and it works fine

Test replicated on LO 5.1.0.3 x32 portable, and it works fine.
It looks like the issue occurs only on the x64 version.
Comment 2 MM 2016-02-18 23:49:04 UTC
The slow saving to ods seems like a dup, bug 93405
Comment 3 rascal 2016-02-19 07:45:35 UTC
Hi,
If you check the evidence, saving to CSV is also very slow, not only saving to ODS.
Moreover, it happens on version 5.1.0.3 x64, but not on the same version x32 and not on 4.7.7.2 x32.
It does not seem like a duplicate to me.
Given some guidance, I can run more tests (different versions, architectures, a clean profile, etc.).

thank you
Comment 4 MM 2016-02-19 12:17:07 UTC
(In reply to rascal from comment #3)
> If you check the evidence, very slow saving is also to CSV, not only ODS.
> Moreover it happens on version 5.1.0.3 x64, but not on same version x32 and
> not on 4.7.7.2 x32.

Well, normally it's one bug per report, not two. That ODS and CSV saving are both slow doesn't mean that they are related.
That's why I told you that the slow ODS saving might be a dup.
Comment 5 rascal 2016-02-19 13:33:28 UTC
OK, I understand. I thought there was a bug in a saving library / module / component and filed this as one bug, because the behavior is similar.
Also, I searched for duplicates when creating this and none seemed to describe similar issues, so I filed this.

I will have access to more PCs over the weekend, so I'll try the same on different machines and LO versions and will post the results.
If anything more specific is needed (log outputs, whatever...) I will need some guidance on this.

Thank you
Comment 6 MM 2016-02-19 22:01:14 UTC
Tested with v5.0.5.2 & v5.1.1.1 under Ubuntu 14.04 x64 and 5.1.1.1 under Windows 10 x64. Saving to ODS takes about 20-40 secs, not 6-7 mins.
Saving to CSV takes about 10-15 secs.

You might wanna try resetting your user profile and report back.
https://wiki.documentfoundation.org/UserProfile#Resolving_corruption_in_the_user_profile
Comment 7 rascal 2016-02-20 12:43:26 UTC
Just tested on different machines.

Second machine - older Win7 desktop, AMD Athlon X2 240 CPU, 4 GB RAM
5.0.4.1 x64 - same issue - CSV 8 minutes, ODS 10 minutes, forgot to test XLSX
5.1.0.3 x32 - same issue (though a bit faster) - CSV 5 minutes, ODS 6 minutes, XLSX only about 30 seconds
5.0.2.2 x64 - same issue - CSV 8 minutes, ODS 10 minutes, XLSX only about 30 seconds
5.1.0.3 x64 - same issue - CSV 8 minutes, ODS 10 minutes, XLSX only about 30 seconds
All versions tested were installed versions, no portables.

Third machine - Win7 x64 desktop, Intel i5-3470 quad-core CPU, 12 GB RAM
5.0.3.2 x64 - same issue - saving to CSV 3 minutes, to ODS 3 minutes, to XLSX 15 seconds
5.1.0.3 x32 portable - same issue but faster - saving to CSV 90 seconds, to ODS 90 seconds, to XLSX 15 seconds

Will try to reinstall and reset the user profile next.
Comment 8 rascal 2016-02-20 13:17:12 UTC
I still have the issue.
I uninstalled LO, removed the profile and reinstalled LO 5.1.0.3 x64 on the second (old desktop) and third (new desktop) machines, and the issue is still there.

I also tried disabling the antivirus (MSE), without any effect. On the laptop I use COMODO CIS.

Language versions used:
Laptop - English Windows, English LO
Desktops - Czech Windows, Czech LO

Portable versions were English.

I need to check the portable versions again, as the issue did not happen in one of the tests but did happen in others. :/
Comment 9 MM 2016-02-21 21:15:20 UTC
Note that if this is the same issue as in bug 93405, the ODS saving still isn't fixed. What is strange is that on multiple machines the saving is so much slower than on mine (under vbox).
Comment 10 rascal 2016-02-21 21:50:43 UTC
Exactly.
What I also don't understand is the following:
I tried version 5.1.0.3 x32 portable from portableapps.com.
On my laptop (the first tested machine) it performs OK - saving takes only about 20 seconds.
On the third machine (Win7 i5 desktop), however, it takes 90 seconds - so to me it looks like the same build behaves differently on different machines :/
Comment 11 rascal 2016-02-22 22:40:33 UTC
Just tried LO 5.1.1.1 x64 (uninstalled the previous version, removed profiles, installed the new one, rebooted) and still the same behavior :/
Saving to ODS takes 7 minutes, and saving to CSV takes about the same.
Comment 12 Buovjaga 2016-02-26 18:58:09 UTC
Repro. Saving to CSV took about 5 minutes.

Win 7 Pro 64-bit Version: 5.2.0.0.alpha0+
Build ID: ef02de2698d90fd874bddf3146165cbe85487bc5
CPU Threads: 4; OS Version: Windows 6.1; UI Render: default; 
TinderBox: Win-x86@39, Branch:master, Time: 2016-02-19_23:40:50
Locale: fi-FI (fi_FI)
Comment 13 rascal 2016-02-26 20:51:26 UTC
Just tested on another machine:
old ThinkPad T61, Win7 x32, 3 GB RAM, Core 2 Duo T7300 (2 GHz)
LO 5.1.0.3 x32 portable works without issue.
LO 5.0.5.2 x32 portable also works without issue.
In both cases loading the file took 30 seconds, saving to CSV 10 seconds and saving to ODS 30 seconds. These are acceptable times for such an old machine, and much better than what my new laptop with an i7 CPU achieves because of the bug.
Comment 14 Markus Mohrhard 2016-03-23 21:43:52 UTC
I can't reproduce the slow saving in master. With callgrind the export to csv is much faster than the import (at least until my import fix lands in master).

Unless there is something that is not mentioned in the bug report I can't reproduce the problem. The slow ODS export is somewhat expected based on the ODF spec. I have not yet profiled the ODF export.

Import performance fix is landing soon.
Comment 15 Commit Notification 2016-03-24 00:31:21 UTC
Markus Mohrhard committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=7da3a53958695bfb1405fa513f71beddc6c0ecb7

don't allocate and destroy a LocaleDataItem for each cell, tdf#97989

It will be available in 5.2.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
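
As a rough illustration of what a commit message like "don't allocate and destroy a LocaleDataItem for each cell" means in general, the sketch below contrasts building a helper object once per cell with building it once per run. This is not the LibreOffice code or the actual patch; `LocaleInfo` and both functions are hypothetical stand-ins invented for this example.

```python
class LocaleInfo:
    """Hypothetical stand-in for a locale object that is costly to build."""

    instances = 0  # counts constructions so the difference is observable

    def __init__(self, decimal_sep=","):
        LocaleInfo.instances += 1
        self.decimal_sep = decimal_sep

    def format(self, value):
        # Format with two decimals, then swap in the locale's separator.
        return f"{value:.2f}".replace(".", self.decimal_sep)


def format_cells_per_cell(values):
    # Anti-pattern: one LocaleInfo is created and thrown away for every cell.
    return [LocaleInfo().format(v) for v in values]


def format_cells_hoisted(values):
    # The object is created once, outside the loop, and reused for all cells.
    locale = LocaleInfo()
    return [locale.format(v) for v in values]
```

Both functions return the same strings; only the number of constructions differs (one per cell versus one in total), which is where the time goes when the sheet has hundreds of thousands of cells.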
Comment 16 Markus Mohrhard 2016-03-24 00:33:10 UTC
I can't reproduce the ODS problems either. At least according to callgrind there is nothing strange going on. Most of the time is spent writing xml elements and in the deflate function of the zip code.

If anyone can still reproduce this in current master I need better steps to reproduce this.

The pushed patch is only for CSV import.
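
To make the two phases named in this comment concrete (writing XML elements, then deflate in the zip code), here is a toy sketch that times them separately. It is only an illustration under loud assumptions: the element names are invented, the output is not a valid ODF package, and nothing here reflects how LibreOffice actually implements its export. Only the entry name `content.xml` is borrowed from real ODS packages.

```python
import io
import time
import zipfile
from xml.sax.saxutils import escape


def build_xml(rows):
    """Serialise rows of cell values into a toy XML string (not valid ODF)."""
    parts = ["<table>"]
    for row in rows:
        parts.append("<row>")
        for cell in row:
            parts.append("<cell>%s</cell>" % escape(str(cell)))
        parts.append("</row>")
    parts.append("</table>")
    return "".join(parts)


def write_package(rows):
    """Return (zip_bytes, seconds_building_xml, seconds_compressing)."""
    t0 = time.perf_counter()
    xml = build_xml(rows).encode("utf-8")
    t1 = time.perf_counter()
    buf = io.BytesIO()
    # ZIP_DEFLATED selects the deflate algorithm for the stored entry.
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("content.xml", xml)
    t2 = time.perf_counter()
    return buf.getvalue(), t1 - t0, t2 - t1
```

Both phases are pure CPU work on in-memory data, which would be consistent with the earlier observation of high CPU load with almost no I/O until the file is finally written.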
Comment 17 Buovjaga 2016-03-24 13:11:08 UTC
I still repro the CSV saving slowness of many minutes.

I was unable to do a cachegrind, it killed itself before UI launched.

Used these: https://wiki.documentfoundation.org/QA/BugReport/Debug_Information#GNU.2FLinux:_How_to_get_a_cachegrind_trace

Arch Linux 64-bit, KDE Plasma 5
Version: 5.2.0.0.alpha0+
Build ID: 8d267cdd48e8b736a81a9e76ea5803e6847d791e
CPU Threads: 8; OS Version: Linux 4.4; UI Render: default; 
Locale: fi-FI (fi_FI.UTF-8)
Built on March 24th 2016
Comment 18 Markus Mohrhard 2016-03-24 18:35:46 UTC
(In reply to Buovjaga from comment #17)
> I still repro the CSV saving slowness of many minutes.
> 
> I was unable to do a cachegrind, it killed itself before UI launched.
> 
> Used these:
> https://wiki.documentfoundation.org/QA/BugReport/Debug_Information#GNU.2FLinux:_How_to_get_a_cachegrind_trace
> 
> Arch Linux 64-bit, KDE Plasma 5
> Version: 5.2.0.0.alpha0+
> Build ID: 8d267cdd48e8b736a81a9e76ea5803e6847d791e
> CPU Threads: 8; OS Version: Linux 4.4; UI Render: default; 
> Locale: fi-FI (fi_FI.UTF-8)
> Built on March 24th 2016

Then give a detailed explanation of what you are doing. Explain it to me as if I were an idiot. Keep in mind that I'm a developer, so I might always miss something completely trivial. The first thing that comes to my mind is the CSV import settings.
Comment 19 Markus Mohrhard 2016-03-25 01:14:41 UTC
ODS export might have been the same problem as Bug 93405.
Comment 20 Buovjaga 2016-03-25 09:23:13 UTC
Created attachment 123825
CSV import settings

These are the settings offered to me in 5.2 and what I used when reproducing yesterday. Probably the settings I used when I originally reproduced on Windows were the same.

I follow the original description: select a row, copy it, go to the bottom and paste it into an empty row. Then I use Save As with a new name.
Comment 21 Buovjaga 2016-03-25 10:42:20 UTC
(In reply to Markus Mohrhard from comment #19)
> ODS export might have been the same problem as Bug 93405.

It is true, with a fresh build ODS export now takes only 38 seconds.

Markus discovered I am hitting this bug with valgrind: https://bugs.kde.org/show_bug.cgi?id=353370
I will have to compile valgrind from source later.

Arch Linux 64-bit, KDE Plasma 5
Version: 5.2.0.0.alpha0+
Build ID: 44a6d8ac3063511a149d4abdd6c2a556b3f477fe
CPU Threads: 8; OS Version: Linux 4.4; UI Render: default; 
Locale: fi-FI (fi_FI.UTF-8)
Built on March 25th 2016
Comment 22 rascal 2016-03-25 12:21:42 UTC
Hi Markus, Hi Buovjaga.

Just to clarify - CSV import was NEVER a problem - it took 10-20 seconds.
Save As to CSV or to ODS was the problem: it hogged the CPU for many minutes without any real I/O activity.
My understanding is that LO writes the XML file and then zips it, and the "write from memory to text/xml" step is what takes ages.

I did not observe it in all cases myself - on some computers and setups it worked fine, but I was not able to find the cause; there was no visible pattern...

I will try the daily builds, probably tomorrow, and will let you know. I'm not a developer, so I can only do normal tests, nothing fancy.

Thank you
Comment 23 rascal 2016-03-25 15:25:54 UTC
Also - can you please tell me exactly which daily build I am supposed to test? Sorry, there are too many Win-x86 versions under daily/master.

Thank you
Comment 24 Buovjaga 2016-03-25 15:40:30 UTC
(In reply to rascal from comment #23)
> Also - can you please tell me which daily build exactly am I supposed to
> test? sorry there is too many winx86 version under daily/master.

Test with whatever is the most recent, like http://dev-builds.libreoffice.org/daily/master/Win-x86@62-merge-TDF/current/

However, you should not test with TB39 builds as they are debug builds. Today I learned that a build with --enable-dbgutil or --enable-debug behaves weirdly performance-wise, so with this bug we cannot use them.
Comment 25 Buovjaga 2016-03-25 18:28:43 UTC
Created attachment 123851
Callgrind with 5.2

Finally got a callgrind from exporting the CSV.
Comment 26 Commit Notification 2016-03-25 20:05:34 UTC
Markus Mohrhard committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=007b317fef91aa809deff8380a9e62c350eaf511

use the ScRefCellValue already available, tdf#97989

It will be available in 5.2.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 27 Markus Mohrhard 2016-03-25 20:06:30 UTC
One case of csv export settings was still slow. Should be fixed now with the latest patch. ODS seems to be fast as well so will close the bug report.
Comment 28 Buovjaga 2016-03-25 21:44:13 UTC
I verified the patch fixes it.
Comment 29 rascal 2016-04-10 15:58:46 UTC
So I tried this version (hoping it's the right one to test): http://dev-builds.libreoffice.org/daily/master/Win-x86@62-merge-TDF/current/master~2016-04-06_20.10.15_LibreOfficeDev_5.2.0.0.alpha0_Win_x86_en-US_de_ar_ja_ru_qtz.msi
and it looks corrected. CSV saves very fast (a few seconds), and ODS takes some 15-20 seconds to save, which is fine for a file of this size. One thing though - when saving to ODS, the progress bar is not drawn at all (it stays blank until LO "unfreezes" and the save is completed).
Also, it would be nice to have this bugfix in the 5.1.x branch, but I have no idea if this is possible :)

Thanks!