Bug 100709 - Horrible performance opening ODS file saved from xlsx file (5 - 20 times slower than Excel)
Summary: Horrible performance opening ODS file saved from xlsx file (5 - 20 times slow...
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
5.3.2.2 release
Hardware: All All
: medium normal
Assignee: Eike Rathke
URL:
Whiteboard: target:5.3.0 target:5.5.0 target:5.4.0.1
Keywords: perf
Depends on:
Blocks:
 
Reported: 2016-07-01 06:31 UTC by Dan Dascalescu
Modified: 2017-12-08 14:45 UTC (History)
4 users (show)

See Also:
Crash report or crash signature:


Attachments
List of purchases, scrubbed. 9400 rows, 11 columns (473.78 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2016-07-01 06:31 UTC, Dan Dascalescu
Details
testcase (43.52 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2016-09-06 21:34 UTC, Maarten Bosmans
Details
reference output (842 bytes, text/csv)
2016-09-06 21:36 UTC, Maarten Bosmans
Details
Test case ODS for very slow loading (156.88 KB, application/vnd.oasis.opendocument.spreadsheet)
2017-05-07 07:23 UTC, Dan Dascalescu
Details
Removed macros, still loading as slow as before (166.94 KB, application/vnd.oasis.opendocument.spreadsheet)
2017-05-07 07:38 UTC, Dan Dascalescu
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Dan Dascalescu 2016-07-01 06:31:49 UTC
Created attachment 126017 [details]
List of purchases, scrubbed. 9400 rows, 11 columns

Opening the attached 9400x.xlsx in Excel 2007 takes less than a second. The file hasn't been touched by LibreOffice and is less than 500Kb.

Opening it in LibreCalc takes 5 seconds on Windows and 20 seconds on Ubuntu after the file is edited, saved and re-opened. Opening it is so slow that LibreCalc freezes and its window becomes dark grey.

Test machine: Core i5, 16GB RAM, SSD


Related: bug 67629
Comment 1 V Stuart Foote 2016-07-01 14:26:02 UTC
Can not confirm on Windows 10 Pro 64-bit en-US with
Version: 5.2.0.1 (x64)
Build ID: fcbcb4963bda8633ba72bd2108ca1e802aad557d
CPU Threads: 8; OS Version: Windows 6.19; UI Render: GL; 
Locale: en-US (en_US)

open 9 sec. (474KB)

simple edit to a field value

save and close 

reopen 9 sec. (343KB)
Comment 2 MM 2016-07-01 15:25:29 UTC
Confirmed with v5.1.4.2 under ubuntu 16.04 x64.
Unconfirmed with v5.2.0.1 under ubuntu 16.04 x64.

Seems already fixed, but not backported to the 5.1 branch.

Looks like a saving problem, as the file saved with v5.1 is also slow loading on v5.2. But saving with v5.2 and reloading is much faster.
Comment 3 Dan Dascalescu 2016-08-03 05:51:12 UTC
Opening the attached file as is in 5.2.0.1 still takes 5-6 seconds on the test machine described in my initial comment. Excel opens it in less than a second.
Comment 4 Maarten Bosmans 2016-08-23 10:49:39 UTC
I see the same behavior in git master.

The problem is excessive runtime spent in std::_Rb_tree_increment. This comes from iterating through the set of RowRangeStyles in oox::xls::SheetDataBuffer::addColXfStyle() over and over again.

Two things can be done about this:
 - Actually make use of the log(N) find() of the RBtree-backed set. Iterating through the set until you find an overlapping range is just stupid.
 - Try to batch several updates together. For each style there is a list of row ranges to which the style applies that need to be inserted into the ordered set. It appears that this list of ranges is ascending and non-overlapping (I still need to verify that is always the case). So instead of inserting each each rowrange individually in the ordered set, it could be faster to merge the list and the set using the fact that both are ordered.

I'm working on it.
Comment 5 Maarten Bosmans 2016-08-30 20:36:05 UTC
This is solved in https://gerrit.libreoffice.org/#/c/28510
(which isn't in master yet)
Comment 6 Commit Notification 2016-08-31 14:45:59 UTC
Maarten Bosmans committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=d16070a30d421098faedd51c4333d1d42163268f

Related: tdf#100709 Fix two bugs in SheetDataBuffer::addColXfStyle

It will be available in 5.3.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 7 Commit Notification 2016-08-31 17:42:24 UTC
Maarten Bosmans committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=4a63c145dcce8411c5707f6b99877cc87a4f6c5d

Resolves tdf#100709 Optimize SheetDataBuffer::addColXfStyle

It will be available in 5.3.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 8 Maarten Bosmans 2016-09-06 21:34:46 UTC
Created attachment 127182 [details]
testcase

This is a small testcase that exibits the two formatting bugs found while fixing the performance problem.
Comment 9 Maarten Bosmans 2016-09-06 21:36:46 UTC
Created attachment 127183 [details]
reference output

This is the output LO is supposed to convert the corresponding xlsx to. (and does in master with the above commits)
Comment 10 Maarten Bosmans 2016-09-06 21:42:56 UTC
I need some help converting the attached xlsx+csv pair to a working test case. 

First: how do I make a unit test to check that the tdf100709.xlsx file is correctly converted to the reference csv?

Furthermore, the tdf100709.xlsx file shows both bugs in the unfixed LO:
 - cell B52 should be formatted as "218", not "$218,00"
 - cell A75 should be formatted as "218", not "05-aug-00"
Of these two (distinct) problems, only the second can be checked with the csv file, because all dollar amounts get exported as plain "218", so the formatting difference isn't visible here. Is there a way around this?
Comment 11 Commit Notification 2016-09-09 01:32:43 UTC
Maarten Bosmans committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=6cf44da0395ce5d8dc9cfda838fb15c593507363

Related: tdf#100709 Add unit test

It will be available in 5.3.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 12 Xisco Faulí 2016-10-23 12:26:24 UTC
Hello Maarten,
Is this bug already fixed?
If so, could you please close it as RESOLVED FIXED?
Comment 13 MM 2016-10-23 21:20:40 UTC
With

Version: 5.3.0.0.alpha1+
Build ID: 928776b734c6aa188151bbce048d5bef4486dce7
CPU Threads: 2; OS Version: Linux 4.4; UI Render: default; VCL: gtk2; 
TinderBox: Linux-rpm_deb-x86_64@70-TDF, Branch:master, Time: 2016-10-23_00:39:08
Locale: en-US (en_US.UTF-8); Calc: single

it opens quite fast now (<2secs).
Comment 14 Xisco Faulí 2016-12-06 12:07:02 UTC
Closing this as RESOLVED FIXED as per comment 13
Comment 15 Dan Dascalescu 2017-05-07 07:23:48 UTC
Created attachment 133112 [details]
Test case ODS for very slow loading

I still see very slow load times (5+ seconds) with the attached file, while the source XLSX file I created it from loads in under one second in Excel 2007 on Wine, and had a bunch of additional data that I scrubbed (2 summary sheets, many more formulas, 4 extra columns etc.)

LibreCalc version information below. The hardware is the same as in the first comment - Core i5 CPU, 16GB RAM, Ubuntu 16.04.

Version: 5.3.2.2
Build ID: 6cd4f1ef626f15116896b1d8e1398b56da0d0ee1
CPU Threads: 4; OS Version: Linux 4.4; UI Render: default; VCL: gtk2; Layout Engine: new;
Comment 16 Dan Dascalescu 2017-05-07 07:38:13 UTC
Created attachment 133113 [details]
Removed macros, still loading as slow as before

Removed macros to eliminate that potential cause of slowness.

Looks like it's just the data and formulas.
Comment 17 Eike Rathke 2017-06-03 12:07:48 UTC
What slows things down are the excess columns formatted up to column ALT but not the entire row, for example in rows 3241, 3443, 3513, ...  Once those formattings are removed (ie. select F1:AMJ1048576 and Clear Direct Formatting from context menu) and the document is saved and reloaded the slowness is gone.

However, the bottleneck is in ScFlatSegmentsImpl<bool, bool>::getRangeData() from ScMultiSelIter::ScMultiSelIter() and ScMultiSelIter::Next() under ScTable::MergeSelectionPattern() which are created for each column to apply the formatting.

Digging deep into the coupling of ScMarkData, ScMultiSel, ScMultiCellIter and avoiding creation of temporary ScMultiCellIter instances with their segment trees might help.
Comment 18 Commit Notification 2017-06-03 22:12:40 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=bb9b6cb4e6cfc5fbb4733abb8e9f5758987c197a

Perf-sc: tdf#100709 avoid segment tree with ScMultiSelIter where possible

It will be available in 5.5.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 19 Commit Notification 2017-06-04 20:50:00 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "libreoffice-5-4":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=3a52f3bf5ecf0a2b8d4f4b87e19911ad8e7eaefa&h=libreoffice-5-4

Perf-sc: tdf#100709 avoid segment tree with ScMultiSelIter where possible

It will be available in 5.4.0.1.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 20 Commit Notification 2017-06-06 18:54:40 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=f300a627b9a88eb8814cc35844ed7c6aa7f19379

Perf-sc: tdf#100709 SfxPoolItem::IsVoidItem() instead of dynamic_cast

It will be available in 5.5.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 21 Commit Notification 2017-06-06 22:51:50 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=59c9d0653cc42560af48269bb8dee2c2b0b20f68

Perf-sc: tdf#100709 Use a "one and a half" alloc strategy for ScMarkArray

It will be available in 5.5.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 22 Commit Notification 2017-06-06 22:56:12 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "libreoffice-5-4":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=3f94381f72782e0c5b16df12de6d52de96e8a0d5&h=libreoffice-5-4

Perf-sc: tdf#100709 SfxPoolItem::IsVoidItem() instead of dynamic_cast

It will be available in 5.4.0.1.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 23 Commit Notification 2017-06-07 04:11:03 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "libreoffice-5-4":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=2f25f408228528395f722f9332020ba9b13ed334&h=libreoffice-5-4

Perf-sc: tdf#100709 Use a "one and a half" alloc strategy for ScMarkArray

It will be available in 5.4.0.1.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 24 Eike Rathke 2017-06-12 16:41:32 UTC
There's one remaining bottleneck with the memory allocation schema for ScAttrArray, similar to ScMarkArray; however, the same "one and a half" strategy can not be applied as there exists each one ScAttrArray for every column, and having 1.5 times more memory persistently (as long as the document lives in memory) allocated than needed isn't a good strategy there.. it doesn't matter for ScMarkArray because that is temporary.

Setting this bug to FIXED though.
Comment 25 Dan Dascalescu 2017-06-12 18:30:13 UTC
I've removed all rows similar to ones mentioned in C17, but the file still takes 5-6 seconds to load vs. ~1 in Excel.
Comment 26 Xisco Faulí 2017-06-12 20:35:17 UTC
(In reply to Dan Dascalescu from comment #25)
> I've removed all rows similar to ones mentioned in C17, but the file still
> takes 5-6 seconds to load vs. ~1 in Excel.

Could you please copy the info from Help - About LibreOffice?
Comment 27 Eike Rathke 2017-06-19 19:17:37 UTC
@Dan: so, which version did you try with?
Comment 28 Eike Rathke 2017-12-08 14:45:34 UTC
No answer, marking resolved fixed again.