Bug 154525 - [sample] Calc is extremely slow (3+ minutes) to open Lenovo's Accessories "Options Compatibility Matrix" spreadsheet
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version (earliest affected): 24.2.0.0 alpha0+
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: filter:ooxml, filter:xlsx, perf
Depends on:
Blocks: Calc-Threaded Performance
Reported: 2023-03-31 20:19 UTC by Jeff Fortin Tam
Modified: 2024-04-12 05:37 UTC
CC list: 5 users

See Also:
Crash report or crash signature:


Attachments
Screenshot of stack traces in sysprof (159.39 KB, image/png)
2023-04-01 01:20 UTC, Jeff Fortin Tam
sysprof 44 capture file with debug symbols activated (1.42 MB, application/x-xz)
2023-07-29 02:22 UTC, Jeff Fortin Tam
Screenshot of stack traces in sysprof 44 (top half) (389.93 KB, image/png)
2023-07-29 02:23 UTC, Jeff Fortin Tam
Screenshot of stack traces in sysprof 44 (bottom half) (149.28 KB, image/png)
2023-07-29 02:23 UTC, Jeff Fortin Tam
Flame graph screenshot from Sysprof 46 when loading the Lenovo accessories spreadsheet (82.77 KB, image/png)
2024-03-11 05:42 UTC, Jeff Fortin Tam
Flame graph screenshot from Sysprof 46 when loading the ODS spreadsheet from comment 5 (228.86 KB, image/png)
2024-03-11 05:46 UTC, Jeff Fortin Tam
xlsx sample file - Lenovo's accessories spreadsheet (7.72 MB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2024-04-09 17:33 UTC, Jeff Fortin Tam
video of a stopwatch showing the last 30 seconds of loading the Lenovo accessories spreadsheet on Linux with Calc 24.2.2.2 (726.14 KB, video/webm)
2024-04-09 23:15 UTC, Jeff Fortin Tam

Description Jeff Fortin Tam 2023-03-31 20:19:09 UTC
Description:
Trying to open the attached spreadsheet causes Calc to eat 100% of one of my CPU cores (it would be faster if it were multi-threaded) for 3 minutes and 20 seconds, on a ThinkPad X220T with an Intel® Core™ i5-2520M × 4 CPU and Mesa Intel® HD Graphics 3000 (SNB GT2), running Fedora 37's GNOME Xorg (X11) session.

Steps to Reproduce:
1. Grab this spreadsheet from https://download.lenovo.com/pccbbs/options_iso/ocm_mar_2023.xlsx (found by clicking the ThinkPad link on http://www.lenovo.com/accessoriesguide )
2. Open that file.
3. Put a kettle on your CPU to prepare some tea while you wait 😏

Actual Results:
3 min 20 s to open the spreadsheet and for the UI to become responsive again. During that time, one of my 4 CPU cores is at 100% usage, and temperatures climb to 75°C (it would probably be higher, closer to 90-100°C, if I didn't have really good thermal paste).

Even closing the app/document (read-only, with no unsaved changes) takes a solid 5-10 seconds, instead of being instantaneous.

Expected Results:
Open in 10-20 seconds or less.

Ideally use all my CPU cores to process the spreadsheet.


Reproducible: Always


User Profile Reset: No

Additional Info:
💡 It would probably be very beneficial for TDF to download, archive and integrate that spreadsheet into LibreOffice's standard performance/regression document test suite... it certainly makes a great benchmark.
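A minimal sketch of how such a load-time benchmark could be scripted, assuming `soffice` is on the PATH. It times a headless `--convert-to csv` run, which includes the export step, so it only gives an upper bound on pure load time. The helper names `soffice_convert_cmd` and `time_command` are hypothetical, not part of any existing LibreOffice test suite.

```python
import subprocess
import sys
import tempfile
import time
from pathlib import Path


def soffice_convert_cmd(doc: Path, outdir: Path) -> list[str]:
    """Headless conversion command; assumes `soffice` is on PATH."""
    return [
        "soffice",
        "--headless",
        "--convert-to", "csv",   # forces a full load, then exports
        "--outdir", str(outdir),
        str(doc),
    ]


def time_command(cmd: list[str]) -> float:
    """Run cmd to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python bench.py ocm_mar_2023.xlsx
    doc = Path(sys.argv[1])
    with tempfile.TemporaryDirectory() as out:
        elapsed = time_command(soffice_convert_cmd(doc, Path(out)))
    print(f"{doc.name}: {elapsed:.1f} s")
```

For the Flatpak build, the first element of the command would presumably be replaced by `flatpak run org.libreoffice.LibreOffice`; running the same script across versions would give comparable numbers without relying on a stopwatch.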

---

Version: 7.5.1.2 (X86_64) / LibreOffice Community
Build ID: fcbaee479e84c6cd81291587d2ee68cba099e129
CPU threads: 4; OS: Linux 6.0; UI render: default; VCL: gtk3
Flatpak
Calc: threaded
Comment 1 m_a_riosv 2023-03-31 21:45:33 UTC

*** This bug has been marked as a duplicate of bug 129228 ***
Comment 2 ady 2023-03-31 23:06:50 UTC
(In reply to m.a.riosv from comment #1)
> 
> *** This bug has been marked as a duplicate of bug 129228 ***

This new bug 154525 is being reported against LO 7.5. Bug 129228 has been set to FIXED for several years now.

Shouldn't this report be confirmed in some way before marking it as a duplicate of an old, already-FIXED report?

Isn't it possible that the description of the behavior is similar, but the problem is different (or came back again)?
Comment 3 Jeff Fortin Tam 2023-04-01 00:51:25 UTC
It doesn't look like a strict duplicate (of bug #129228) to me either, and "It's hard!" is not a resolution status in my view.

This being on 7.5, newer than 7.4, tells me it's also a different problem from what was investigated and fixed in https://llunak.blogspot.com/2022/07/making-unsorted-lookups-in-calc-fast.html

---

For comparison, I installed Gnumeric, and it was able to open this same file in less than 11 seconds, with zero lag (whether scrolling or changing pages) while editing the file after opening it. *Gnumeric*, ladies and gentlemen! On the same computer, same OS, etc. So there is no reason why LibreOffice Calc can't achieve sub-20-second open times (not to mention scrolling fluidly once the file is open, but I guess that's a different matter).

I also asked two friends to time how long it takes to open that file in Excel on Windows. Results: for one of them (with an older laptop) it took 13 seconds. For the other, it took... 3 seconds. Ouch.

I believe my initial "performance objective" guess remains valid: LibreOffice should aim to open this sample file "in 10-20 seconds or less."

LibreOffice taking over 3 minutes to open this file weakens the app's credibility for enterprise scenarios, or even simply for community marketing when you try to convince individuals to use it (and you are then embarrassed when it falls apart on their data).
Comment 4 Jeff Fortin Tam 2023-04-01 01:20:04 UTC
Created attachment 186384
Screenshot of stack traces in sysprof

I don't know if this helps, but looking at it in sysprof, it seems (to my untrained eye) that a lot of expensive work is going on in libsclo and libsvllo...
Comment 5 Jeff Fortin Tam 2023-04-04 19:40:43 UTC
Actually, not just the Lenovo accessories spreadsheet: the sample spreadsheet at https://fortintam.com/public/libreoffice-augustin-benchmark--million-rows-spreadsheet.ods exhibits the problem in a similar way. While this spreadsheet was originally used in bug #151207 to show various areas where multithreading would be beneficial, it can also serve as a general file-opening performance benchmark, even if file opening were multithreaded.

Presumably, even if loading were multithreaded and used all 4 logical CPUs instead of 1 to open the Lenovo spreadsheet, it would still take 50 seconds (instead of 3 min 20 s) on that computer, assuming perfect linear scaling. That is still much slower than the 3 to 13 seconds that Gnumeric and Excel manage. So I can imagine that multithreading could help, but it wouldn't be the entire solution by itself.
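The 50-second figure is 200 s divided by 4 threads, i.e. the best case. A small sketch using Amdahl's law (a standard model swapped in here, not something measured in this report) shows that any serial portion of the load makes the real figure worse; the parallel fraction `p` is a hypothetical parameter, since nobody has measured how much of the import could actually run in parallel.

```python
def amdahl_time(serial_time_s: float, threads: int, parallel_fraction: float) -> float:
    """Estimated wall-clock time under Amdahl's law.

    serial_time_s: measured single-threaded time (200 s = 3 min 20 s here).
    parallel_fraction: hypothetical share of the work that parallelizes (0..1).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    serial_part = serial_time_s * (1.0 - parallel_fraction)
    parallel_part = serial_time_s * parallel_fraction / threads
    return serial_part + parallel_part


baseline = 3 * 60 + 20  # 200 s on the X220T
for p in (1.0, 0.9, 0.5):
    print(f"p={p:.1f}: {amdahl_time(baseline, 4, p):.0f} s")
```

With the 200 s baseline this prints 50 s, 65 s and 125 s, so even a mostly parallel import would stay well above the 3 to 13 seconds seen in Gnumeric and Excel.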
Comment 6 Jeff Fortin Tam 2023-04-06 21:09:57 UTC
Not X11-specific, and potentially not graphics-stack-related: I also tested on Wayland with an Intel Xeon W3520 CPU (8 logical CPUs) and AMD Radeon "Pitcairn" R270 graphics, and it takes the same time to open this sheet.
Comment 7 ady 2023-04-06 21:21:59 UTC
Also reproduced under Win10. Setting to NEW.

I have not tested with other spreadsheet tools.
Comment 8 Jeff Fortin Tam 2023-07-29 02:22:11 UTC
Created attachment 188627
sysprof 44 capture file with debug symbols activated

I have now done `dnf debuginfo-install libreoffice libreoffice-calc` and recorded a new capture with Sysprof 44 on Fedora 38 in a Wayland GNOME session. Attached is the sysprof* capture file, which is probably going to be more useful than the previous capture attempt.

In the future, a recording with sysprof 45 could bring even more meaningful data and a nicer analysis GUI (but I could probably only record that in November-January, if nobody has identified the problem from the existing sysprof output here by then).

I will also attach some screenshots of what this sysprof capture looks like, so you can see the top stack traces / function calls "at a glance".

*: https://wiki.gnome.org/Apps/Sysprof
Comment 9 Jeff Fortin Tam 2023-07-29 02:23:02 UTC
Created attachment 188628
Screenshot of stack traces in sysprof 44 (top half)
Comment 10 Jeff Fortin Tam 2023-07-29 02:23:18 UTC
Created attachment 188629
Screenshot of stack traces in sysprof 44 (bottom half)
Comment 11 b. 2023-08-02 05:20:34 UTC
Most likely Calc's old performance killers:
'comments / notes / post-its / captions / annotations or tracking notes'
and conditional formatting.
Not comments in the bug report, but 'comments' / 'notes' in the file.
These have slowed LO Calc down for years, and this document has lots of them!
To find similar reports, search the bug tracker for 'comment' and 'slow', and follow the 'See Also' links too.
(There are old reports marked 'fixed'; _assume_ they are 'not'.)
Check the number of comments: press F5 (Navigator) and look under 'Comments'.
Verify the impact: remove the comments and reopen the file
(here: > 2 min → ~1 min).
_Assumption_: inefficient iterations / nested loops somewhere
('quadratic' or 'exponential' explosion; for other possible sources and influences, see the old reports).
Then remove the conditional formatting, save and reload
(here: ~1 min → < 40 s).
Tested with 7.4.5.1 under Linux.
As long as bugs are pushed around for years instead of being solved, the problems will be reported again and again.
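Besides the Navigator (F5), the number of comments can be checked without opening Calc at all. The sketch below assumes the usual SpreadsheetML layout, where cell comments live in `xl/commentsN.xml` parts as `<comment>` elements inside `<commentList>`; files that name or relate their comment parts differently (or use threaded comments) would need the package relationships followed instead. `count_xlsx_comments` is a hypothetical helper name.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace used by xlsx comment parts.
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"
# Assumption: comment parts follow the common xl/commentsN.xml naming.
COMMENTS_PART = re.compile(r"^xl/comments\d*\.xml$")


def count_xlsx_comments(source) -> int:
    """Count <comment> elements across all xl/comments*.xml parts.

    source: a path, a binary file object, or the raw bytes of an .xlsx file.
    """
    if isinstance(source, (bytes, bytearray)):
        source = io.BytesIO(source)
    total = 0
    with zipfile.ZipFile(source) as zf:
        for name in zf.namelist():
            if COMMENTS_PART.match(name):
                root = ET.fromstring(zf.read(name))
                total += sum(1 for _ in root.iter(f"{NS}comment"))
    return total
```

Called as `count_xlsx_comments("ocm_mar_2023.xlsx")`, this should report the same order of magnitude as the Navigator, which makes it easy to compare files before and after stripping comments.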
Comment 12 Noel Grandin 2023-09-13 12:40:04 UTC
Thanks to caolan's fixes, this opens in 10s (on a very fast machine), which means it should open in 20s or so on a more normal machine.
So I think we can consider this fixed.
Comment 13 Jeff Fortin Tam 2023-09-13 14:16:32 UTC
Interesting! I'm cautiously optimistic and hope to test this again with the next version when it comes out (unless the fixes could be backported to 7.6.x, or there is an easy Flatpak-style way for users to test the upcoming 24.2 version?). Is there a particular set of changes/commits that we could point to as the probable fix for this?
Comment 14 m_a_riosv 2023-09-13 23:31:05 UTC
24 seconds for me with
Version: 24.2.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 92d0fa8c3a7998e51c8f27409ed8c44f34c42c8e
CPU threads: 16; OS: Windows 10.0 Build 22621; UI render: Skia/Vulkan; VCL: win
Locale: es-ES (es_ES); UI: en-US Calc: CL threaded Jumbo

Should be a bit better in a release build.

@Caolán, could you please set the bug as FIXED?
Comment 15 Caolán McNamara 2023-09-15 09:46:26 UTC
I imagine that this was:

commit 2bda87fd8758448267c447ba26f1932325a1338d
Date:   Fri Aug 11 13:29:23 2023 +0100

    defer turning xlsx notes into SdrCaptions until activated
    
    to improve import performance

or similar in the comment area because there are over 40,000 comments in that document.
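The idea in that commit message, deferring the conversion of each imported note into a drawing object until it is activated, is essentially lazy initialization. The Python sketch below only illustrates the pattern: `Note`, `Caption`, `caption()` and `import_notes` are hypothetical names, not LibreOffice's C++ classes, and the "expensive" construction is simulated by a trivial object.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Caption:
    """Stand-in for an expensive drawing object (hypothetical)."""
    text: str


@dataclass
class Note:
    cell: str
    text: str
    _caption: Optional[Caption] = field(default=None, repr=False)

    def caption(self) -> Caption:
        # Build the drawing object on first access only, then reuse it.
        if self._caption is None:
            self._caption = Caption(self.text)
        return self._caption

    @property
    def is_materialized(self) -> bool:
        return self._caption is not None


def import_notes(raw: list[tuple[str, str]]) -> list[Note]:
    # Import stores only the cheap data; no Caption is created here.
    return [Note(cell, text) for cell, text in raw]


notes = import_notes([(f"A{i}", f"note {i}") for i in range(1, 40_001)])
print(sum(n.is_materialized for n in notes))  # prints 0
notes[0].caption()  # e.g. the user hovers over A1
print(sum(n.is_materialized for n in notes))  # prints 1
```

With over 40,000 notes, the import cost then scales with the cheap text storage, while the expensive objects are only built for the few notes a user actually looks at.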
Comment 16 Jeff Fortin Tam 2024-02-03 20:08:37 UTC
I had great hopes for this; unfortunately, I must report that the problem persists in the latest version, both with the Lenovo spreadsheet above and with the one-million-row spreadsheet sample mentioned in comment #5.

I have tested on the Wayland version of GNOME Shell 45.3 on Fedora 39, with:

Version: 24.2.0.3 (X86_64) / LibreOffice Community
Build ID: da48488a73ddd66ea24cf16bbc4f7b9c08e9bea1
CPU threads: 8; OS: Linux 6.6; UI render: default; VCL: gtk3
Locale: en-US (en_US.UTF-8); UI: en-US
Flatpak
Calc: threaded

On my Intel Xeon W3520 CPU (8 logical CPUs) and AMD Radeon "Pitcairn" R270 graphics, the sample Lenovo spreadsheet mentioned above takes 2 minutes and 31 seconds (roughly) with the setup above. The one-million rows spreadsheet sample mentioned in comment #5 still takes a minute. During that time, in both cases, only one of the CPU cores/threads is used, at 100%.
Comment 17 Jeff Fortin Tam 2024-02-03 20:39:45 UTC
As an additional note (just for the sake of completeness) regarding that Radeon R9 270: the "Allow use of OpenCL" setting seems to be turned off by default in the "LibreOffice > OpenCL" options; however, even if I enable it, processing times do not change, and in both cases I see roughly 0% usage in radeontop. Everything happens on the CPU, in a single thread, even though multi-threaded calculation is enabled.
Comment 18 m_a_riosv 2024-02-04 00:51:50 UTC
About ten seconds with
Version: 24.2.0.3 (X86_64) / LibreOffice Community
Build ID: da48488a73ddd66ea24cf16bbc4f7b9c08e9bea1
CPU threads: 16; OS: Windows 10.0 Build 22631; UI render: Skia/Raster; VCL: win
Locale: es-ES (es_ES); UI: en-US
Calc: CL threaded


The issue was still with
Version: 7.6.5.0.0+ (X86_64) / LibreOffice Community
Build ID: 2e65401cf50ca25e16a0f3d4b624e2b48c97644c
CPU threads: 16; OS: Windows 10.0 Build 22631; UI render: default; VCL: win
Locale: es-ES (es_ES); UI: en-US
Calc: CL threaded
Comment 19 Jeff Fortin Tam 2024-02-04 03:00:34 UTC
The pattern I'm noticing with the two others who replied so far is that those of you who now report better performance seem to be using the Skia renderer on Windows, whereas the Linux version (at least the Flatpak one) uses the "default" renderer (which you were also running before) with GTK3…
Comment 20 Jeff Fortin Tam 2024-03-11 05:42:27 UTC
Created attachment 193053
Flame graph screenshot from Sysprof 46 when loading the Lenovo accessories spreadsheet

Here is a flame graph performance profile of what happens during the 1 minute 30 seconds needed to open the Lenovo spreadsheet with LibreOffice 24.2 (Flatpak) on my ThinkPad T480 laptop running Fedora 39 on Wayland.

Version: 24.2.1.2 (X86_64) / LibreOffice Community
Build ID: db4def46b0453cc22e2d0305797cf981b68ef5ac
CPU threads: 8; OS: Linux 6.7; UI render: default; VCL: gtk3
Locale: fr-CA (en_CA.UTF-8); UI: en-US
Flatpak
Calc: threaded
Comment 21 Jeff Fortin Tam 2024-03-11 05:46:14 UTC
Created attachment 193054
Flame graph screenshot from Sysprof 46 when loading the ODS spreadsheet from comment 5

Here is another flame graph from Sysprof 46 under the same system conditions, this time using a simplified (only 150 thousand rows) version of the spreadsheet sample from comment #5.

This file being in ODS format, the flame graph exhibits different function calls than the Lenovo one (with its xlsx format).
Comment 22 Caolán McNamara 2024-03-11 20:00:28 UTC
The flame graphs appear to be massively dominated by loading, with next to no rendering, so I think Skia vs. GTK or Wayland doesn't matter, at least as far as the flame graphs are concerned.
Comment 23 Jeff Fortin Tam 2024-03-11 20:22:26 UTC
That would seem logical to me, because in practice, Calc does not show anything in the canvas during loading, it only shows the progressbar below the view, during the whole 1.5 minutes. Once it reaches the end of the progressbar / XML processing, it then renders the whole document in one pass, as far as I can tell from casual visual observation.

Apologies for my incorrect previous assumption that maybe something graphically-related explained the difference between my situation and other commenters'; I wonder why some people are not experiencing the issue, while it's still 100% reproducible here on any of my Linux computers. I presumed that the filters/backends would be the same across platforms…
Comment 24 Armondo Lopez 2024-04-03 20:39:25 UTC
I don't experience any extended load time in

Version: 24.2.1.2 (X86_64) / LibreOffice Community
Build ID: db4def46b0453cc22e2d0305797cf981b68ef5ac
CPU threads: 8; OS: Windows 10.0 Build 19045; UI render: Skia/Vulkan; VCL: win
Locale: en-US (en_US); UI: en-US
Calc: threaded

or

Version: 24.8.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: a2265e8faa099d9652efd12392c2877c2df1d1eb
CPU threads: 8; OS: Windows 10.0 Build 19045; UI render: Skia/Vulkan; VCL: win
Locale: en-US (en_US); UI: en-US
Calc: threaded
Comment 25 Timur 2024-04-09 08:37:36 UTC
Jeff, can you upload the file as attachment? External links are not reliable.
Comment 26 Jeff Fortin Tam 2024-04-09 17:33:16 UTC
Created attachment 193587
xlsx sample file - Lenovo's accessories spreadsheet

Attached is the xlsx sample (Lenovo's accessories spreadsheet) for posterity.

The other spreadsheet, which exhibits the slow performance problem but through different code paths (as seen in the flame graphs) because it is in OpenDocument format, is too big to attach (80 megabytes, exceeding this bug tracker's 30-megabyte file size limit), but I promise to keep hosting it at:

https://fortintam.com/public/libreoffice-augustin-benchmark--million-rows-spreadsheet.ods

…at least until the bug is fixed when it comes to loading that spreadsheet in addition to the xlsx one.
Comment 27 m_a_riosv 2024-04-09 20:56:17 UTC
Version: 24.8.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 432c866072aa62cf90168d569dc56cbc7269bcda
CPU threads: 16; OS: Windows 10.0 Build 22631; UI render: Skia/Raster; VCL: win
Locale: es-ES (es_ES); UI: en-US
Calc: CL threaded Jumbo

With: https://bugs.documentfoundation.org/attachment.cgi?id=193587
about 25 seconds.

With: https://fortintam.com/public/libreoffice-augustin-benchmark--million-rows-spreadsheet.ods
about 1 min 55 s
Comment 28 Jeff Fortin Tam 2024-04-09 23:15:06 UTC
Created attachment 193590
video of a stopwatch showing the last 30 seconds of loading the Lenovo accessories spreadsheet on Linux with Calc 24.2.2.2

Everyone reporting acceptable load times for the xlsx file so far is running Windows…

Here is video proof of just how slow it still is on Linux: even with a fairly recent CPU like an Intel Core i5-8350U, it takes over 3 min 50 s before loading completes and the contents show up. It remains sluggish to use afterwards.

I find it interesting, however, that m_a_riosv still experiences the slow load time on Windows with the ODS file; I did not expect that…
Comment 29 Jeff Fortin Tam 2024-04-09 23:16:27 UTC
I think Stéphane uses Linux too, so they could probably confirm whether they are seeing this issue manifesting differently across the two platforms on their end.
Comment 30 Timur 2024-04-10 17:51:20 UTC
I tested the original "Lenovo's accessories spreadsheet" on Linux, here with the gen VCL plugin.
As noted before, it contains a huge number of comments, and it loads fast if the comments are removed (I also checked conditional formatting, but the comments are the culprit).
I cannot confirm that a single CPU is used the whole time; rather, one CPU is busy for large chunks of time, but which one changes.

LO 7.5:   50-51 sec
LO 7.6:   54-57 sec
LO 24.2:  30-90 sec
LO 24.8+: 18-23 sec
MSO:          7 sec

Strange that loading times in 24.2 vary over such a wide range on the same computer.

Good that current master is the fastest. The biggest speed-up in 24.8 (from 56 to 24 s) comes from:

commit 2e1f9da8a6359c8909e087a92239aefd4851b116
author	Armin Le Grand (allotropia) <armin.le.grand.extern@allotropia.de>	Sat Dec 23 15:52:06 2023
Decouple ScPatternAttr from SfxItemPool

Judging by the timings alone, there should be room for more speed-up; I'm just not sure it's possible without an overhaul.
Comment 31 Stéphane Guillou (stragu) 2024-04-12 05:37:56 UTC
(In reply to Jeff Fortin Tam from comment #29)
> I think Stéphane uses Linux too, so they could probably confirm whether they
> are seeing this issue manifesting differently across the two platforms on
> their end.
With attachment 193587, I get:

gtk3:
- 24.2.2: ~1m25s
- current 24.8alpha0+ build: ~42s

gen:
- 24.2.2: ~1m20s
- current 24.8alpha0+ build: ~41s

So a clear improvement in 24.8 regardless of VCL plugin, but still too long.

Version: 24.8.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: d5dcc9de8ebce5d14be89ddeb6606ef0aeebf7a9
CPU threads: 8; OS: Linux 6.5; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
Calc: CL threaded

My Windows 11 setup is a VM, so it is more sluggish in general, but 24.2.1 and 24.8alpha0+ both take ~1 minute (most files take a couple of seconds).

Version: 24.8.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 7b9905df455b47977968a185a7c43f35541e018b
CPU threads: 4; OS: Windows 10.0 Build 22631; UI render: Skia/Raster; VCL: win
Locale: en-US (en_US); UI: en-US
Calc: threaded