Created attachment 126017 [details] List of purchases, scrubbed. 9400 rows, 11 columns Opening the attached 9400x.xlsx in Excel 2007 takes less than a second. The file hasn't been touched by LibreOffice and is less than 500Kb. Opening it in LibreCalc takes 5 seconds on Windows and 20 seconds on Ubuntu after the file is edited, saved and re-opened. Opening it is so slow that LibreCalc freezes and its window becomes dark grey. Test machine: Core i5, 16GB RAM, SSD Related: bug 67629
Can not confirm on Windows 10 Pro 64-bit en-US with Version: 5.2.0.1 (x64) Build ID: fcbcb4963bda8633ba72bd2108ca1e802aad557d CPU Threads: 8; OS Version: Windows 6.19; UI Render: GL; Locale: en-US (en_US) open 9 sec. (474KB) simple edit to a field value save and close reopen 9 sec. (343KB)
Confirmed with v5.1.4.2 under ubuntu 16.04 x64. Unconfirmed with v5.2.0.1 under ubuntu 16.04 x64. Seems already fixed, but not backported to the 5.1 branch. Looks like a saving problem, as the file saved with v5.1 is also slow loading on v5.2. But saving with v5.2 and reloading is much faster.
Opening the attached file as is in 5.2.0.1 still takes 5-6 seconds on the test machine described in my initial comment. Excel opens it in less than a second.
I see the same behavior in git master. The problem is excessive runtime spent in std::_Rb_tree_increment. This comes from iterating through the set of RowRangeStyles in oox::xls::SheetDataBuffer::addColXfStyle() over and over again. Two things can be done about this: - Actually make use of the log(N) find() of the RBtree-backed set. Iterating through the set until you find an overlapping range is just stupid. - Try to batch several updates together. For each style there is a list of row ranges to which the style applies that need to be inserted into the ordered set. It appears that this list of ranges is ascending and non-overlapping (I still need to verify that is always the case). So instead of inserting each each rowrange individually in the ordered set, it could be faster to merge the list and the set using the fact that both are ordered. I'm working on it.
This is solved in https://gerrit.libreoffice.org/#/c/28510 (which isn't in master yet)
Maarten Bosmans committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=d16070a30d421098faedd51c4333d1d42163268f Related: tdf#100709 Fix two bugs in SheetDataBuffer::addColXfStyle It will be available in 5.3.0. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Maarten Bosmans committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=4a63c145dcce8411c5707f6b99877cc87a4f6c5d Resolves tdf#100709 Optimize SheetDataBuffer::addColXfStyle It will be available in 5.3.0. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Created attachment 127182 [details] testcase This is a small testcase that exibits the two formatting bugs found while fixing the performance problem.
Created attachment 127183 [details] reference output This is the output LO is supposed to convert the corresponding xlsx to. (and does in master with the above commits)
I need some help converting the attached xlsx+csv pair to a working test case. First: how do I make a unit test to check that the tdf100709.xlsx file is correctly converted to the reference csv? Furthermore, the tdf100709.xlsx file shows both bugs in the unfixed LO: - cell B52 should be formatted as "218", not "$218,00" - cell A75 should be formatted as "218", not "05-aug-00" Of these two (distinct) problems, only the second can be checked with the csv file, because all dollar amounts get exported as plain "218", so the formatting difference isn't visible here. Is there a way around this?
Maarten Bosmans committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=6cf44da0395ce5d8dc9cfda838fb15c593507363 Related: tdf#100709 Add unit test It will be available in 5.3.0. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Hello Maarten, Is this bug already fixed? If so, could you please close it as RESOLVED FIXED?
With Version: 5.3.0.0.alpha1+ Build ID: 928776b734c6aa188151bbce048d5bef4486dce7 CPU Threads: 2; OS Version: Linux 4.4; UI Render: default; VCL: gtk2; TinderBox: Linux-rpm_deb-x86_64@70-TDF, Branch:master, Time: 2016-10-23_00:39:08 Locale: en-US (en_US.UTF-8); Calc: single it opens quite fast now (<2secs).
Closing this as RESOLVED FIXED as per comment 13
Created attachment 133112 [details] Test case ODS for very slow loading I still see very slow load times (5+ seconds) with the attached file, while the source XLSX file I created it from loads in under one second in Excel 2007 on Wine, and had a bunch of additional data that I scrubbed (2 summary sheets, many more formulas, 4 extra columns etc.) LibreCalc version information below. The hardware is the same as in the first comment - Core i5 CPU, 16GB RAM, Ubuntu 16.04. Version: 5.3.2.2 Build ID: 6cd4f1ef626f15116896b1d8e1398b56da0d0ee1 CPU Threads: 4; OS Version: Linux 4.4; UI Render: default; VCL: gtk2; Layout Engine: new;
Created attachment 133113 [details] Removed macros, still loading as slow as before Removed macros to eliminate that potential cause of slowness. Looks like it's just the data and formulas.
What slows things down are the excess columns formatted up to column ALT but not the entire row, for example in rows 3241, 3443, 3513, ... Once those formattings are removed (ie. select F1:AMJ1048576 and Clear Direct Formatting from context menu) and the document is saved and reloaded the slowness is gone. However, the bottleneck is in ScFlatSegmentsImpl<bool, bool>::getRangeData() from ScMultiSelIter::ScMultiSelIter() and ScMultiSelIter::Next() under ScTable::MergeSelectionPattern() which are created for each column to apply the formatting. Digging deep into the coupling of ScMarkData, ScMultiSel, ScMultiCellIter and avoiding creation of temporary ScMultiCellIter instances with their segment trees might help.
Eike Rathke committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=bb9b6cb4e6cfc5fbb4733abb8e9f5758987c197a Perf-sc: tdf#100709 avoid segment tree with ScMultiSelIter where possible It will be available in 5.5.0. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Eike Rathke committed a patch related to this issue. It has been pushed to "libreoffice-5-4": http://cgit.freedesktop.org/libreoffice/core/commit/?id=3a52f3bf5ecf0a2b8d4f4b87e19911ad8e7eaefa&h=libreoffice-5-4 Perf-sc: tdf#100709 avoid segment tree with ScMultiSelIter where possible It will be available in 5.4.0.1. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Eike Rathke committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=f300a627b9a88eb8814cc35844ed7c6aa7f19379 Perf-sc: tdf#100709 SfxPoolItem::IsVoidItem() instead of dynamic_cast It will be available in 5.5.0. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Eike Rathke committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=59c9d0653cc42560af48269bb8dee2c2b0b20f68 Perf-sc: tdf#100709 Use a "one and a half" alloc strategy for ScMarkArray It will be available in 5.5.0. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Eike Rathke committed a patch related to this issue. It has been pushed to "libreoffice-5-4": http://cgit.freedesktop.org/libreoffice/core/commit/?id=3f94381f72782e0c5b16df12de6d52de96e8a0d5&h=libreoffice-5-4 Perf-sc: tdf#100709 SfxPoolItem::IsVoidItem() instead of dynamic_cast It will be available in 5.4.0.1. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Eike Rathke committed a patch related to this issue. It has been pushed to "libreoffice-5-4": http://cgit.freedesktop.org/libreoffice/core/commit/?id=2f25f408228528395f722f9332020ba9b13ed334&h=libreoffice-5-4 Perf-sc: tdf#100709 Use a "one and a half" alloc strategy for ScMarkArray It will be available in 5.4.0.1. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
There's one remaining bottleneck with the memory allocation schema for ScAttrArray, similar to ScMarkArray; however, the same "one and a half" strategy can not be applied as there exists each one ScAttrArray for every column, and having 1.5 times more memory persistently (as long as the document lives in memory) allocated than needed isn't a good strategy there.. it doesn't matter for ScMarkArray because that is temporary. Setting this bug to FIXED though.
I've removed all rows similar to ones mentioned in C17, but the file still takes 5-6 seconds to load vs. ~1 in Excel.
(In reply to Dan Dascalescu from comment #25) > I've removed all rows similar to ones mentioned in C17, but the file still > takes 5-6 seconds to load vs. ~1 in Excel. Could you please copy the info from Help - About LibreOffice?
@Dan: so, which version did you try with?
No answer, marking resolved fixed again.