Bug 151207 - [SAMPLE] Areas where multithreading would be needed to improve slow performance of common tasks in huge (million rows) spreadsheets
Status: UNCONFIRMED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version: unspecified (earliest affected)
Hardware: All All
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: perf
Depends on:
Blocks: Calc-Threaded Performance Multithreading
Reported: 2022-09-27 21:45 UTC by Jeff Fortin Tam
Modified: 2023-06-15 14:11 UTC (History)

See Also:
Crash report or crash signature:


Description Jeff Fortin Tam 2022-09-27 21:45:35 UTC
I've seen a handful of bug reports about the need for multithreading, but they typically end up closed because they are considered "not actionable" or too vague. My hope is that by providing a "torture test" sample file here, along with my measurements of the slowness of various common tasks, you will have reference points for the various things that would highly benefit from being multithreaded, as they apparently are not right now. I'm filing this as a single bug report because it would feel a bit ridiculous and overwhelming to open two dozen reports for each and every point where the issue manifests, and I hope this summary will prove sufficient.

To reproduce the issues on your end, download this "torture test" sample file of mine; it should be a useful benchmark for discovering the areas where LibreOffice would benefit most from multithreading in commonly used tasks:
https://fortintam.com/public/libreoffice-augustin-benchmark--million-rows-spreadsheet.ods

Here are some "obvious" areas where I've identified slow, single-core work happening (tested with my Intel Xeon W3520 CPU, which exposes 8 logical cores):

* Opening the file takes 3 minutes and 35 seconds, using a single core of my CPU. If it used all cores, we could presume it would take only about 27 seconds. Possibly related: bug #128396, which was closed as a duplicate of bug #65046, which was in turn closed as fixed; however, it is not fixed for big spreadsheets like the one I'm testing today.
* After selecting columns A to M and choosing "Data > Standard Filter", then choosing column D ("Page Title") in the "Field name" combobox, you need to wait 7 seconds before the UI unlocks and you can click the "Condition" combobox (to set the "Contains" condition, for example). The filtering itself (if you type "example" in the "Value" field and press OK) is very fast, however; it is just that selecting the column in the GUI causes something slow to happen, apparently to populate the Condition combobox.
* Selecting columns A to M and doing a standard sort operation on column E ("referring domains") takes 15 seconds and uses only one CPU core. If it used my 8 cores, it would probably take less than 2 seconds.
* Search & replace (across the whole sheet, or "Current selection only" after selecting columns A to M), to replace "Example" with "Banana": it takes 1 minute and 8 seconds on my machine because it currently only uses 1 of the cores. If it used the 8 cores, it could accomplish this in roughly 8.5 seconds.
* Saving the file (as a new file) takes 47 seconds and seems very CPU-bound (rather than I/O bound?) as I once again see only one of the CPU cores used at 100%; we can presume that making this multi-threaded would allow saving this big file in 6 to 12 seconds (I'm being conservative on my estimate here).
* Filtering with auto filters is also a very slow process here, as the GUI for controlling it is interactive and thus much slower to react than the "fire once" Standard Filter GUI. In particular, if you try to type a string in the filtering entry, it will immediately try to search and filter through all the possible valid values, which is extremely expensive. I suspect it does this immediately upon typing each and every character, which is bad from a performance standpoint; you would get massive performance gains by using a timeout-based search trigger like the one I am suggesting in bug #151206.
* When selecting columns A to M and clicking the "Pivot Table" button, after selecting "Current sheet" (or something like that), it takes 40 seconds for the main "Pivot Table Layout" dialog/wizard to appear. This is again single-threaded and could benefit significantly from some optimizations. Afterwards, however, generating the pivot table from the dragged values is very fast (roughly 5 seconds), so congrats on that!
* Creating a bar chart out of columns A and E (with A as a label column): after answering the wizard's questions and dragging into the sheet to insert/draw the chart, the app once again uses only a single core and takes approximately 3.5 minutes to show the chart; possibly, if it were multithreaded, it could take as little as 30 seconds on my computer.
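
As a rough illustration of the timeout-based search trigger mentioned in the auto filter item above, here is a minimal Python sketch. It is not LibreOffice code: the `Debouncer` class, the `simulate_typing` helper, and the delay values are all hypothetical, chosen only to show the idea of waiting for a pause in typing before running an expensive search.

```python
import threading
import time


class Debouncer:
    """Run `callback` only once `delay` seconds pass without a new trigger."""

    def __init__(self, delay, callback):
        self.delay = delay
        self.callback = callback
        self._timer = None
        self._lock = threading.Lock()

    def trigger(self, *args):
        with self._lock:
            # A new keystroke cancels the pending search and restarts the clock.
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self.delay, self.callback, args)
            self._timer.daemon = True
            self._timer.start()


def simulate_typing(text, delay=0.3, keystroke_gap=0.01):
    """Type `text` one character at a time; return the searches that ran."""
    searches = []  # stands in for the expensive filtering work (hypothetical)
    debouncer = Debouncer(delay, searches.append)
    for i in range(1, len(text) + 1):
        debouncer.trigger(text[:i])
        time.sleep(keystroke_gap)
    time.sleep(delay * 3)  # give the last pending timer time to fire
    return searches


if __name__ == "__main__":
    print(simulate_typing("example"))
```

With these assumed timings, typing "example" quickly should result in a single search for the full string instead of one search per character typed.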

Under all of these conditions, you can observe that LibreOffice Calc 7.4 uses only one of your CPU cores, at 100%. These problems could be vastly reduced if it were to split the problem space across cores/threads, which would mean that on most computers it would be at least 4 to 16 times faster (since most CPUs have 4 to 16 cores/threads nowadays), which would make LibreOffice Calc very compelling compared to the competition on that front.
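
The estimates in this report are simple divisions of the measured time by the number of cores, which assumes ideal linear scaling. The Python sketch below reproduces that arithmetic from the timings listed above and, for comparison, applies Amdahl's law. Note that the `parallel_fraction` value of 0.9 is purely an assumption for illustration; nothing in this report measures how much of each task could actually run in parallel.

```python
def ideal_time(seconds, cores):
    """Best case: the whole task splits evenly across all cores."""
    return seconds / cores


def amdahl_time(seconds, cores, parallel_fraction):
    """Amdahl's law: only `parallel_fraction` of the work speeds up."""
    serial_part = seconds * (1 - parallel_fraction)
    parallel_part = seconds * parallel_fraction / cores
    return serial_part + parallel_part


# Single-core timings from the list above, in seconds.
MEASURED = {
    "open file": 215,        # 3 min 35 s
    "standard sort": 15,
    "search & replace": 68,  # 1 min 8 s
    "save as new file": 47,
    "bar chart": 210,        # about 3.5 min
}

if __name__ == "__main__":
    cores = 8
    assumed_fraction = 0.9  # assumption, not a measurement
    for task, seconds in MEASURED.items():
        best = ideal_time(seconds, cores)
        hedged = amdahl_time(seconds, cores, assumed_fraction)
        print(f"{task}: ideal {best:.1f} s, Amdahl (p=0.9) {hedged:.1f} s")
```

Under that assumed 90% parallel fraction, opening the file would take about 46 seconds rather than the ideal 27, so the real gain depends heavily on how much of each operation can be parallelized.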
Comment 1 m_a_riosv 2022-09-28 08:13:34 UTC
Multithreading has been active in Calc for several versions now.
With a Calc spreadsheet open, see:
Menu/Tools/Options/LibreOffice Calc/Calculate - CPU Threading Settings
Comment 2 Jeff Fortin Tam 2022-09-28 13:54:11 UTC
That setting* is already enabled on my end, so evidently it is not working (or not implemented) for the use cases I have benchmarked above.


*: I don't even see why there is a setting for this; it should always be enabled by default for everyone. It's like having a setting for "Allow my Lamborghini to go faster than 20 km/h" or "Don't make me wait to see the doctor when they are already available" :)
Comment 3 m_a_riosv 2022-09-28 17:05:52 UTC
Don't set your own report to NEW unless you assign it to yourself in order to resolve it.

Sorry, do you really know what you are talking about? Multi-thread reprogramming is anything but a trivial task, and much worse in a project like LibreOffice with millions of lines of code.

Are you sure that all the described operations are amenable to multithreading?

On my Windows machine with 4 logical processors, Task Manager shows 100% when all four are working at 100%; 25% usually means one processor at 100%.

You have not added the information about your LibreOffice installation: Menu/Help/About LibreOffice (use the button in the middle to copy).
Comment 4 Jeff Fortin Tam 2023-04-04 19:25:13 UTC
> Sorry, do you really know what you are talking about?

Why attack me and question my competence for filing a bug report?


> Multi-thread reprogramming is anything but a trivial task,
> and much worse in a project like LibreOffice with millions of lines of code.

I never said this would be "trivial" or "easy" anywhere in my bug report, and I have no idea why you would think I said such a thing. Going out on a limb, the closest thing I can see is where I said 'Here are some "obvious" areas where I've identified slow, single-core work', where the word "obvious" meant that the problem is easily observed (or has a very noticeable impact) in those areas of the application, not that it would be "trivial" to fix.


> 100% when all four are working at 100%; 25% usually means one processor at 100%

Sure. I thought my wording was clear: when I said "uses only one of your CPU cores, at 100%.", I meant "100% of one core being used, and 0% of the others". Of course it was not the "combined" CPU usage that was at 100%, otherwise I wouldn't have filed this bug report. When you look at this in htop or gnome-system-monitor (or any CPU usage visualizer that separates the cores/threads instead of combining them), it is *very* clearly noticeable on my machine when 1 logical processor is pegged at 100% and the 7 others are sitting idle.
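
To make the two ways of reading "100%" concrete, here is a small Python sketch of the arithmetic. The `combined_usage` function and the sample values are illustrative assumptions, not readings taken from any real monitoring tool.

```python
def combined_usage(per_core_percentages):
    """Average the per-core load, as a combined CPU meter would show it."""
    return sum(per_core_percentages) / len(per_core_percentages)


# One logical processor pegged at 100%, the others idle (illustrative values).
one_busy_of_4 = [100.0, 0.0, 0.0, 0.0]
one_busy_of_8 = [100.0] + [0.0] * 7

if __name__ == "__main__":
    print(combined_usage(one_busy_of_4))  # combined view on 4 logical processors
    print(combined_usage(one_busy_of_8))  # combined view on 8 logical processors
```

One fully busy logical processor therefore appears as 25% on a combined meter with 4 logical processors and as 12.5% with 8, while a per-core view shows that single processor at 100%.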


> You have not added the information about your LibreOffice installation

Oops, here it is (this was probably with version 7.4 back then, but the current 7.5.x versions, provided by Fedora 38 or via the flathub flatpak, still exhibit the issue):

Version: 7.5.2.2 (X86_64)
Build ID: 50(Build:2)
CPU threads: 8; OS: Linux 6.2; UI render: default; VCL: gtk3
Locale: fr-CA (en_CA.UTF-8); UI: en-US
Calc: threaded
Comment 5 Xiaoc 2023-06-15 14:11:09 UTC
I think so. When processing Word documents that are hundreds of pages long, LibreOffice is also slower to open them than the competition, and we all want LibreOffice to be better.