Description:
An ODS file opens slowly, but after toggling "Wrap text automatically" on and off it opens very fast.

Steps to Reproduce:
1. Open the attached file.
2. It opens slowly.
3. Select all (Ctrl+A) on the sheet "результат" ("result").
4. Enable Wrap text, then disable it again (use Format->Cells->Alignment->Wrap text automatically, or the icon on the toolbar).
5. Save the file under another name.
6. Close the current file.
7. Open the saved file -> it opens fast.

Actual Results:
The file opens slowly.

Expected Results:
The file opens fast.

Reproducible: Always

User Profile Reset: No

Additional Info:
Version: 6.4.0.0.alpha0+ (x64)
Build ID: ccfbe8b478f3daa8b5ec07a7e48dd5fbf8556811
CPU threads: 4; OS: Windows 10.0 Build 18362; UI render: default; VCL: win;
Locale: ru-RU (ru_RU); UI: ru-RU
Calc: threaded
Created attachment 155074: Problem ODS file
Julien, could you make a beautiful FlameGraph here?
No problem, I'll give it a try after my day job.
Created attachment 155096: perf flamegraph
On a Debian x86-64 PC, with master sources updated today (built with --enable-symbols), I retrieved this FlameGraph.
Here are the steps I did:
- open Calc
- launch data collection
- open the attached file
- close the file
I noticed 2 parts in the FlameGraph:
- ScMatrixImpl::GetDouble calling mdds::multi_type_matrix<matrix_trait>::get_numeric
- ScMatrixImpl::IsValueOrEmpty calling mdds::multi_type_matrix<matrix_trait>::get_type
Kohei: sorry to put you in cc; I know you've been working hard on mdds for some time now and have less time for LO. Anyway, I thought you might have an opinion on whether the bottleneck is in mdds or in LO.
I can't reproduce it in LO 6.0.4, but I can in 6.1.4 => regression
Seems to have started with:
https://gerrit.libreoffice.org/plugins/gitiles/core/+/693953dd4699887bd3f5bca2c3582b5fae1d6992

commit 693953dd4699887bd3f5bca2c3582b5fae1d6992
author Vasily Melenchuk <Vasily.Melenchuk@cib.de> Fri Apr 06 20:19:10 2018 +0300
committer Katarina Behrens <Katarina.Behrens@cib.de> Mon Oct 22 23:30:23 2018 +0200
tree 2091b2fe8d997ef84f149ace1e6a1f00fd8e08fe
parent fad764c02c7a9cd210bfa44ea0ce1ac5354d6427

tdf#62268: allow row height recalculation on document load

During document load rows with style:use-optimal-row-height="true" should recalculate it's height.

* includes: Row height tolerance level increase for unittest
* tdf#118086: calc: invalid row autoheight fixed

/cygdrive/d/sources/bibisect/bibisect-win32-6.1
$ git bisect bad
40ab4a5cf85d27950e409bd4af0086cd98213719 is the first bad commit
commit 40ab4a5cf85d27950e409bd4af0086cd98213719
Author: Norbert Thiebaud <nthiebaud@gmail.com>
Date: Mon Oct 22 14:51:42 2018 -0700

    source 693953dd4699887bd3f5bca2c3582b5fae1d6992

    source 693953dd4699887bd3f5bca2c3582b5fae1d6992

:040000 040000 109429f0b4e7074293591b4bb614714854730480 80631ed7ef70e4de2eefa29d10c39c893f835b37 M instdir

/cygdrive/d/sources/bibisect/bibisect-win32-6.1
$ git bisect log
# bad: [75d131082ce51ed5a898d97bdc2b7a9fe5ddb340] source 5b3765f4d881e7ddefd0c4aad6886a46f000b4fc
# good: [29d08f54c2f71ffee4fe12dbb24c5f5cbedecfd2] source 6eeac3539ea4cac32d126c5e24141f262eb5a4d9
git bisect start 'master' 'oldest'
# good: [6227e15df9be101688e37cd891817cd858b49e03] source b8b7f8a8f8d97088181d287bb75e74facece16c6
git bisect good 6227e15df9be101688e37cd891817cd858b49e03
# good: [50b236fe0d359b9d5cc9998d2e72009a90a11d08] source b6025e6cffe2024fefebd161ea739188b4b4fdaf
git bisect good 50b236fe0d359b9d5cc9998d2e72009a90a11d08
# good: [d59609a1bfbbb4f924492755719b7d340a51de1b] source eeaf4b0b7ad21da879554bdd93c9a9b97b8268d6
git bisect good d59609a1bfbbb4f924492755719b7d340a51de1b
# good: [75029ade8fd0bbe5abc530394b85b346b499bc55] source ec79d0127f90d65d722e46688b6cfcf2f5e59794
git bisect good 75029ade8fd0bbe5abc530394b85b346b499bc55
# bad: [ab0db3bf58bf2ed341d05f62c0beefc85c134af3] source 6ed498a71482109fea731bb84f288f978bea12dc
git bisect bad ab0db3bf58bf2ed341d05f62c0beefc85c134af3
# good: [7a121b163b7e4eab33c26d4d301b24f0a99d7bf8] source a42c65176f2791cf5e48578a8898bf03185adc89
git bisect good 7a121b163b7e4eab33c26d4d301b24f0a99d7bf8
# good: [5d779faf3839da9d060d6fb410ad30d0db1f3e66] source ac39aba9b2d08b061b0eef651f5ebc7a84391171
git bisect good 5d779faf3839da9d060d6fb410ad30d0db1f3e66
# bad: [e6b0819442248baf59e616887cf216f1da32be59] source 8b6f1d8460b931950b98b5968ff7734f3c128a4d
git bisect bad e6b0819442248baf59e616887cf216f1da32be59
# bad: [84f4d9d1c1b9d0a1c0f0307dafefb33cfbd78c65] source d6f563b37d8a694c6c1d4c9ef3ba746c7f019517
git bisect bad 84f4d9d1c1b9d0a1c0f0307dafefb33cfbd78c65
# good: [00b943e017c148697e3b4ab3c938edcb07e6a33a] source e67ca59e293c4dd37795150cf871e36ca1affb76
git bisect good 00b943e017c148697e3b4ab3c938edcb07e6a33a
# bad: [15b060764ee7934c58786891fab4d0f38a09498e] source 5de85be43198804573787d4186b156b5931c4a9f
git bisect bad 15b060764ee7934c58786891fab4d0f38a09498e
# good: [180656b1b8aed5295a44cbacded98f37e45f5f1d] source fad764c02c7a9cd210bfa44ea0ce1ac5354d6427
git bisect good 180656b1b8aed5295a44cbacded98f37e45f5f1d
# bad: [40ab4a5cf85d27950e409bd4af0086cd98213719] source 693953dd4699887bd3f5bca2c3582b5fae1d6992
git bisect bad 40ab4a5cf85d27950e409bd4af0086cd98213719
# first bad commit: [40ab4a5cf85d27950e409bd4af0086cd98213719] source 693953dd4699887bd3f5bca2c3582b5fae1d6992
I can still reproduce this issue on the latest 7.1 branch and also on the 7.2 branch. The slowness seems to be in adapting row height. I guess toggling Wrap text on/off removed the style:use-optimal-row-height='true' attribute, which is why the file opens fast in that case. The "Adapting row height" step is very annoying: for me, a 5MB ODS file may take one minute to open, stalling at the "Adapting row height" stage.
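To check that guess on a concrete file, one could count how many rows carry the flag before and after the Wrap text toggle. Below is a minimal Python sketch using only the standard library (zipfile, xml.etree.ElementTree). It assumes rows pick up the flag through automatic row styles (style:family="table-row") declared in the package's content.xml; row styles defined in styles.xml are not inspected. The helper names are hypothetical and not part of any LibreOffice tool.

```python
import zipfile
import xml.etree.ElementTree as ET

# ODF namespace URIs used by the elements and attributes we inspect.
NS = {
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
}


def q(prefix: str, local: str) -> str:
    """Build an ElementTree Clark-notation name such as '{uri}local'."""
    return "{%s}%s" % (NS[prefix], local)


def count_optimal_height_rows(content_xml: bytes) -> tuple[int, int]:
    """Return (rows_flagged, rows_total) over all tables in content.xml."""
    root = ET.fromstring(content_xml)

    # 1. Names of row styles whose row properties set
    #    style:use-optimal-row-height="true".
    optimal_styles = set()
    for st in root.iter(q("style", "style")):
        if st.get(q("style", "family")) != "table-row":
            continue
        name = st.get(q("style", "name"))
        props = st.find("style:table-row-properties", NS)
        if name and props is not None and \
                props.get(q("style", "use-optimal-row-height")) == "true":
            optimal_styles.add(name)

    # 2. Count rows, expanding table:number-rows-repeated.
    flagged = total = 0
    for row in root.iter(q("table", "table-row")):
        n = int(row.get(q("table", "number-rows-repeated"), "1"))
        total += n
        if row.get(q("table", "style-name")) in optimal_styles:
            flagged += n
    return flagged, total


def inspect_ods(path) -> tuple[int, int]:
    """Open an .ods package (path or file object) and count flagged rows."""
    with zipfile.ZipFile(path) as zf:
        return count_optimal_height_rows(zf.read("content.xml"))
```

Running inspect_ods on the original attachment and on the re-saved copy would show whether the flagged-row count actually drops; the sketch says nothing about load time itself.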
I would argue this is not fixable as a reported regression, since the underlying problem is the slow row height calculation. _Not_ recalculating row heights creates another set of bugs (see bug 62268), so how about we repurpose this as an "increase row height calculation speed" bug? As it stands now, I'd otherwise be inclined to close this as WONTFIX (better to be slow and correct in layout than the other way round).
At the very least, Calc should not add the style:use-optimal-row-height="true" attribute to every row of newly created documents. Currently, Calc adds this attribute to each row by default. The attribute should be set only if the user explicitly selects a row (or several rows), then right-clicks and selects "Optimal Row Height", or selects rows and then double-clicks on the row headers. Otherwise the attribute is useless. As I understand it, the main reason row heights get recalculated when this attribute is true is that, for programmatically generated documents that explicitly add the attribute, the row heights really should be recalculated (see bug 62268).
Agree.
ODF spec:
'''
20.394 style:use-optimal-row-height
The style:use-optimal-row-height attribute specifies that a row height should be recalculated automatically if content in the row changes.
The defined values for the style:use-optimal-row-height attribute are:
• false: row height should not be recalculated automatically if content in the row changes.
• true: row height should be recalculated automatically if content in the row changes.
'''
Well, according to this ODF standard, the attribute specifies that the row height should be recalculated when the content of the row changes (i.e., at editing time), not when the document is opened. When the row height is recalculated during editing, a new row height value is determined and saved to the ODF file. On file open, the application should use the stored row height directly rather than recalculating each one. For all new ODF spreadsheet documents, there is a defined row height for each row.

The rationale for recalculating row heights on file open, as in the commit for bug 62268, is that some manually generated ODF files define no row height value in the XML, usually because the programs that produce them are not professional OpenDocument producers and simply expect an ODF viewer such as Calc to calculate the row heights when the user opens the file. I agree that in such cases (when no row height value is defined) Calc should help by recalculating.

So, considering both bug 62268 and properly generated ODF files, the fix should be:

if rowHeight and rowHeight > 0:
    finalRowHeight = rowHeight
else:
    if useOptimalRowHeight:
        (recalculate row height)
        finalRowHeight = reCalculatedRowHeight
    else:
        finalRowHeight = defaultRowHeight

Improving the speed of the row height recalculation is a separate issue. However, for large spreadsheets even the fastest calculation may still cost a lot of time and CPU cycles, as it needs to loop over each row and each cell.
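Kevin's pseudocode can be turned into a small runnable function to make the branches explicit. This is only a Python sketch: Calc's real import code is C++, and the names here (final_row_height, recalculate, and the parameters) are hypothetical. The expensive layout step is passed in as a callable, so the sketch also shows that it runs only when no usable stored height exists.

```python
from typing import Callable, Optional


def final_row_height(
    row_height: Optional[float],
    use_optimal_row_height: bool,
    default_row_height: float,
    recalculate: Callable[[], float],
) -> float:
    """Choose the height for one row at load time.

    row_height is the style:row-height value read from the file,
    or None if the file does not specify one.
    """
    # A stored, positive height wins: no text layout needed on load.
    if row_height is not None and row_height > 0:
        return row_height
    # No usable stored height: honour the optimal-height flag
    # (the case bug 62268 is about).
    if use_optimal_row_height:
        return recalculate()
    return default_row_height
```

Under this logic, a file written by Calc, which stores a row height for every row, would never reach the recalculate() call.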
There is a lot to unpack here, so this will be a bit lengthy.

First off, as Kevin correctly explained, the use-optimal-row-height flag corresponds to re-calculating the optimal height when a cell value changes during editing; it is not a flag indicating whether row heights should be re-calculated on load. Treating it as one (as we now know) causes a noticeable performance degradation during file load for everyone, which is not a good look. If the standard is not clear about when this flag should trigger recalculation, in my opinion that point should be further clarified in the standard.

As for improving the performance of row height re-calculation, to me it's a lost cause, since that process is already known to be very expensive: it involves getting font metric information, as well as other text attributes, for every character involved. Caching certain font metric information was attempted in the past, which may improve performance in certain situations, but it's unclear how much that would help with row height re-calculation. Still, any attempt to speed it up would not come anywhere close to not running it at all.

As for a potential fix, the logic Kevin suggested is a reasonable approach, though I'm not sure whether we should check for the row height being 0. That's a corner case that would not happen when Calc is the generator, since setting the height to 0 would set the hidden flag while leaving the original height unchanged. I would just leave it as the generator's responsibility to never set the height to zero, or to leave out the value when the desired value is not clear. But that's just my opinion; I think either approach is fine.

Now, here is the bad news. The current ODF import filter code is notoriously hard to work with, since it was built on what is (IMO) the wrong architectural basis: (mostly) the UNO API. The UNO API is designed for run-time automation, with change notifications firing everywhere.
Building it on the UNO API unfortunately resulted in significant performance issues, not to mention making the code very, very difficult to follow, understand, and redesign in any significant way, since UNO promotes the idea of decoupling all the moving parts. My hope at the time was to slowly switch from populating the content via the UNO API to doing the same directly with ScDocument via its import-time specialized accessor, ScDocumentImport. You can see some traces of my earlier attempt in this part of Calc's code. If someone is up for it, my suggestion would be to try to populate the content via ScDocumentImport instead of using the UNO API. ScDocumentImport has direct access to ScDocument's private parts and is designed to populate the document content without unnecessary change notifications etc. I did make quite some inroads toward using ScDocumentImport to speed up loading for other, non-UNO-based import filters, but unfortunately I only made small progress with the native ODF import filter. It may be a good idea for someone to pick up the torch and continue.

Alternatively, it may actually be simpler to introduce an internal configuration option to toggle row height re-calculation on load (defaulting to off, of course) to satisfy the use case in tdf#62268. Fixing the import filter code would be the ideal approach, but I'm not sure anyone would want to even touch that code… I wouldn't, at least not willingly. ;-) Only those with enough bandwidth could tackle that code, and unfortunately I don't have much bandwidth these days.

As a final aside, my frustration with this code also motivated me to re-architect the ODF import filter in orcus, which can be turned on in Calc with some effort (right now it's disabled). But that filter is only 10 to 15% complete, so using it would be a long shot. Maybe someday it will become somewhat feature complete, but who knows.
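One way to combine Kohei's two remarks (no special case for a zero height, plus an internal switch that restores re-calculation on load and is off by default) is sketched below in Python. The function name and the recalc_on_load switch are hypothetical; this is not an existing LibreOffice setting or Calc's actual C++ code.

```python
from typing import Callable, Optional


def row_height_on_load(
    stored_height: Optional[float],
    use_optimal_row_height: bool,
    default_height: float,
    recalculate: Callable[[], float],
    recalc_on_load: bool = False,  # hypothetical internal option, off by default
) -> float:
    """Pick a row height at load time under the suggested policy."""
    # Opt-in path: re-run the expensive layout for every flagged row.
    if recalc_on_load and use_optimal_row_height:
        return recalculate()
    # Otherwise trust any stored height, including 0; a generator that
    # is unsure of the height should simply omit the value.
    if stored_height is not None:
        return stored_height
    # Height missing from the file: fall back to re-calculation if flagged.
    if use_optimal_row_height:
        return recalculate()
    return default_height
```

With the switch off, files that already store row heights load without any layout work; turning it on brings back the behaviour introduced for tdf#62268, where rows flagged with style:use-optimal-row-height="true" are recalculated on load.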
I was not able to reproduce the original test case: in my dev environment, with a debug build, I let Calc load the document for ~30 minutes without it finishing and then stopped it. But I do not think this performance issue is related to row height recalculation. See the previously attached perf.svg.bz2:
>ScFormulaCell::InterpretFormulaGroupThreading (917 samples, 56.57%)
>ScDocRowHeightUpdater::update (1 samples, 0.06%)
The first is, most of the time, a matrix multiplication done in multiple threads. The second is the row height calculation during load; it is almost impossible to find in this SVG.
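The tooltip percentages are a frame's sample count divided by the total: 917 samples at 56.57% implies roughly 917 / 0.5657 ≈ 1,621 samples overall, so the single ScDocRowHeightUpdater::update sample is about 1 / 1,621 ≈ 0.06%. To do the same bookkeeping over a whole profile instead of hovering over boxes, a minimal Python sketch over the "folded" stack format (one line per unique stack, frames joined by ";", followed by a sample count) could look like this. The function names are hypothetical. It counts every stack that contains a frame, so a result can be higher than a single box's tooltip when the same function appears under several callers.

```python
from collections import Counter
from typing import Iterable, Tuple


def inclusive_samples(folded_lines: Iterable[str]) -> Tuple[Counter, int]:
    """Sum samples per frame from folded stacks ("a;b;c 42" per line)."""
    per_frame: Counter = Counter()
    total = 0
    for line in folded_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # The count is the last space-separated field; C++ frame names may
        # themselves contain spaces, so split from the right.
        stack, _, count = line.rpartition(" ")
        n = int(count)
        total += n
        # Count each frame once per stack so recursion is not double-counted.
        for frame in set(stack.split(";")):
            per_frame[frame] += n
    return per_frame, total


def share_percent(per_frame: Counter, total: int, frame: str) -> float:
    """Percentage of all samples whose stack contains `frame`."""
    return 100.0 * per_frame[frame] / total if total else 0.0
```

Folded input of this shape is what the stackcollapse-perf.pl script from Brendan Gregg's FlameGraph repository produces from `perf script` output.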
14 seconds to open the file in:
Version: 7.5.0.0.alpha0+ / LibreOffice Community
Build ID: e4d23c27288b99c3ed3cfa332ff308b31c01f97d
CPU threads: 4; OS: Linux 5.14; UI render: default; VCL: gtk3
Locale: ru-RU (ru_RU.UTF-8); UI: en-US
Calc: threaded Jumbo
Intel Core 2 Quad 9450 here.
Dear Roman Kuznetsov,

To make sure we're focusing on the bugs that affect our users today, LibreOffice QA is asking bug reporters and confirmers to retest open, confirmed bugs which have not been touched for over a year.

There have been thousands of bug fixes and commits since anyone checked on this bug report. During that time, it's possible that the bug has been fixed, or the details of the problem have changed. We'd really appreciate your help in getting confirmation that the bug is still present.

If you have time, please do the following:
- Test to see if the bug is still present with the latest version of LibreOffice from https://www.libreoffice.org/download/
- If the bug is present, please leave a comment that includes the information from Help - About LibreOffice.
- If the bug is NOT present, please set the bug's Status field to RESOLVED-WORKSFORME and leave a comment that includes the information from Help - About LibreOffice.

Please DO NOT:
- Update the version field
- Reply via email (please reply directly on the bug tracker)
- Set the bug's Status field to RESOLVED - FIXED (this status has a particular meaning that is not appropriate in this case)

If you want to do more to help, you can test to see if your issue is a REGRESSION. To do so:
1. Download and install the oldest version of LibreOffice (usually 3.3 unless your bug pertains to a feature added after 3.3) from https://downloadarchive.documentfoundation.org/libreoffice/old/
2. Test your bug.
3. Leave a comment with your results.
4a. If the bug was present with 3.3, set the version to 'inherited from OOo';
4b. If the bug was not present in 3.3, add 'regression' to keyword.

Feel free to come ask questions or to say hello in our QA chat: https://web.libera.chat/?settings=#libreoffice-qa

Thank you for helping us make LibreOffice even better for everyone!

Warm Regards,
QA Team
MassPing-UntouchedBug
9 seconds to open the file in:
Version: 24.8.0.0.alpha1+ (X86_64) / LibreOffice Community
Build ID: d2eab48f697a1e6097778158f623f11306ac7a3d
CPU threads: 16; OS: Windows 10 X86_64 (10.0 build 19045); UI render: Skia/Raster; VCL: win
Locale: ru-RU (ru_RU); UI: ru-RU
Calc: CL threaded