Bug 138644 - large spreadsheet severe performance bug while creating pivot table - changing data field function is very slow
Status: RESOLVED INSUFFICIENTDATA
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version (earliest affected): 6.4.7.2 release
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Pivot-Table
Reported: 2020-12-03 15:02 UTC by Joseph Ervin
Modified: 2023-04-20 03:31 UTC
CC List: 4 users

See Also:
Crash report or crash signature:


Attachments
large-ish dummy spreadsheet to show the problem (2.42 MB, application/vnd.oasis.opendocument.spreadsheet)
2020-12-03 15:05 UTC, Joseph Ervin
Sample file with pt (9.68 MB, application/vnd.oasis.opendocument.spreadsheet)
2020-12-04 17:43 UTC, m_a_riosv
Better spreadsheet to show the problem (824.30 KB, application/vnd.oasis.opendocument.spreadsheet)
2020-12-11 14:52 UTC, Joseph Ervin
Backtrace changing source range (16.48 KB, text/plain)
2022-03-14 12:05 UTC, James Edwards-Jones

Description Joseph Ervin 2020-12-03 15:02:12 UTC
Description:
In a large spreadsheet (10 MB) with more than 100K rows, double-clicking an element in the "Data Fields" area while creating a pivot table, in order to change the data field function from, say, "Sum" to "Count", causes a long delay that depends on the size of the selected data.  This was not the case in versions from 6 months or a year ago (I'm not sure where it broke).

In my case, double-clicking on the data field eats 100% of my CPU for >5 minutes just to bring up the menu so I can change from "sum" to "count". 

I frequently use libreoffice in my job to create pivot tables from a large spreadsheet with over 100,000 rows.  It's been great until recently.  The last time I did this was in July 2020; I have updated libreoffice since then and only just became aware of the problem.

Steps to Reproduce:
1. Select all rows of a large spreadsheet.
2. Select Insert->Pivot Table, leave "current selection" selected, and click OK.
3. Drag an available field into the Data Fields area. It comes in with the Sum function by default.
4. Double-click the just-dragged field in the Data Fields area to change from Sum to, say, Count.  Notice the long delay before the dialog pops up.  That's the bug.  The delay gets longer and longer as the spreadsheet grows.

Actual Results:
Dialog box to change data field function selection takes a long amount of time to come up, dependent on the spreadsheet size.  The delay in my working spreadsheets is *minutes*.

Expected Results:
The dialog box should pop right up with no delay.  


Reproducible: Always


User Profile Reset: No



Additional Info:
Version: 6.4.7.2
Build ID: 6.4.7.2-3.fc32
CPU threads: 4; OS: Linux 5.6; UI render: default; VCL: gtk3; 
Locale: en-US (en_US.UTF-8); UI-Language: en-US
Calc: threaded
Comment 1 Joseph Ervin 2020-12-03 15:05:36 UTC
Created attachment 167802
large-ish dummy spreadsheet to show the problem

The attached spreadsheet is big enough to illustrate the delay in question, but with this spreadsheet the delay is only a couple of seconds, hardly enough to get excited about.  My real work spreadsheets have a lot more data, and the delay to open the data function selection dialog is about 5 minutes. I can't share those spreadsheets with you, though.
Comment 2 m_a_riosv 2020-12-04 17:43:52 UTC
Created attachment 167839
Sample file with pt

It takes about five seconds to update the pivot table with four times as much data as in your example (882,000 rows).

Version: 7.2.0.0.alpha0+ (x64)
Build ID: 761a672d62df1891b9f4f367a499b220ab2b33fa
CPU threads: 4; OS: Windows 10.0 Build 20180; UI render: Skia/Vulkan; VCL: win
Locale: es-ES (es_ES); UI: en-US Calc: CL
Comment 3 Joseph Ervin 2020-12-04 17:56:10 UTC
Yeah, the dummy spreadsheet I made seems simple enough, with a lot of the same data elements used over and over, that the really horrendous delays don't show up.  That sheet only gives me a delay of a few seconds as you've seen. The real spreadsheet I'm working with has a lot of columns with tons of unique data, and several sheets with 50K, 50K, 50K, and 170K rows.  I'm trying to do the pivot table on the big 170K row sheet, and it ties up my linux box for many minutes.

Prior versions of libreoffice (I don't know exactly where it started) didn't have this behavior at all, so something has changed that introduced some type of global spreadsheet operation that happens when trying to bring up that dialog to change the data function.

Since it creates the appearance of a hard hang (100% CPU for several minutes), I consider this to be a somewhat serious usability issue. 

There are workarounds, such as creating the pivot table over a trivial number of rows, and then editing its properties to add in the full range.
Comment 4 m_a_riosv 2020-12-05 16:40:45 UTC
But without a proper sample file to test, I can't confirm the bug.

Please test with a clean profile Menu/Help/Restart in Safe Mode
Comment 5 Joseph Ervin 2020-12-05 16:44:24 UTC
How long a delay is required before you'll consider the bug verified? I can't give you the work spreadsheet that I'm using, so I'll have to create one of similar size with other data.
Comment 6 Joseph Ervin 2020-12-05 16:49:30 UTC
Confirmed that the problem is unchanged when starting in safe mode.
Comment 7 m_a_riosv 2020-12-05 17:27:25 UTC
(In reply to Joseph Ervin from comment #5)
> How long a delay is required before you'll consider the bug verified? I
> can't give you the work spreadsheet that I'm using, so I'll have to create
> one of similar size with other data.

There is no rule for that; it depends on how much data there is and what the pivot table has to do.
Comment 8 Joseph Ervin 2020-12-11 14:52:37 UTC
Created attachment 168059
Better spreadsheet to show the problem

This spreadsheet seems to show the problem handily.  Please select the whole sheet and select "Insert->Pivot Table", Click OK to confirm "current selection" and then define thing1 as the rows, thing2 as the columns, and "serial" as the data, which defaults to "Sum - serial".  Then double-click "Sum - serial" to change it to something else (like count).  libreoffice will hang up with CPU at 100% for a long time.
Comment 9 m_a_riosv 2020-12-14 17:03:30 UTC
I can see the delay, but for me it takes about 8 seconds.
Comment 10 Joseph Ervin 2020-12-14 19:12:16 UTC
Sorry to ask, but what OS are you testing on?  I tried it with 6.2.7.1 on Windows, and the dialog box pops up instantly.  I upgraded the Windows install to 6.4.7.2, and now it takes 8 seconds like you reported.  So something changed between 6.2.7.1 and 6.4.7.2 that makes Windows go from "instantly" to "8 seconds", and at least on my Linux install makes it take "minutes". 

It's on my Fedora Linux install that it takes minutes for the dialog box to come up, and I want to check that you're testing on Linux as well. 

I'm running on Fedora 32
$ uname -a
Linux linux 5.6.14-300.fc32.x86_64 #1 SMP Wed May 20 20:47:32 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Aside from bringing up this one dialog box, libreoffice 6.4.7.2 is zippy quick on my Fedora32 linux box.
Comment 11 Joseph Ervin 2020-12-15 04:19:57 UTC
Just FYI, I updated my Fedora32 install tonight to the latest bits, and the behavior is unchanged; it still gets stuck with 100% CPU for minutes when trying to change the pivot table data field function from "Sum" to something else.

The fact that windows behavior changed also from bringing up the dialog instantly to taking 8 solid seconds of 100% CPU is clearly indicative of a regression.  There's something going on here.  Bringing up this dialog should take zero time. 

Please look into this regression.
Comment 12 s5t1e3v4e3m11@hotmail.com 2021-05-03 15:33:34 UTC
Valuable observations by the OP. 

Using the 3rd attachment of the OP (attachment 168059) my 5 year old Windows i5 laptop took about 4 sec. when I changed from sum to count (Version: 7.1.2.2 (x64) / LibreOffice Community).

But when, in the Pivot Table Layout, I double-click "serial" under "Row Fields" and then click "Options..." in THAT "Data Field" dialog (it has the same name as the dialog the OP mentioned, which is what brought me here; please let me know if I should file a different bug report), it took about 18 MINUTES until the "Data Field Options" dialog appeared.

The OP mentioned using many unique values; indeed this is the key to the poor performance. Creating 1,000,000 (1 million!) rows with just 10 distinct integers gets everything done almost instantly.
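
The contrast described above can be checked with generated data. The following is only a minimal sketch: the column names thing1, thing2, and serial are taken from the steps in comment 8, while the helper names make_rows and write_rows are hypothetical and the real attachment contents may differ. It produces rows whose number of distinct values is controlled by one parameter, so a "few distinct values" file and a "many distinct values" file of the same length can be compared.

```python
import csv
import random


def make_rows(n_rows, n_distinct, seed=0):
    """Yield (thing1, thing2, serial) tuples.

    Every column draws from at most n_distinct different integers.
    With n_distinct == n_rows, serial is unique in every row.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    for i in range(n_rows):
        yield (
            rng.randrange(n_distinct),
            rng.randrange(n_distinct),
            i % n_distinct,
        )


def write_rows(fileobj, rows):
    """Write a header line plus the given rows as CSV to an open text file."""
    writer = csv.writer(fileobj)
    writer.writerow(["thing1", "thing2", "serial"])
    writer.writerows(rows)
```

For example, opening a file with `open("few_distinct.csv", "w", newline="")` and calling `write_rows(f, make_rows(1_000_000, 10))` gives the low-cardinality case, and `make_rows(1_000_000, 1_000_000)` gives the case where serial is unique per row. Both files can then be imported into Calc and run through the same pivot table steps.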
Comment 13 James Edwards-Jones 2022-03-14 12:02:01 UTC
I am experiencing similar issues with a 140MB spreadsheet.

Double-clicking the action type always hangs and usually crashes, sometimes with a message saying the connection to the Wayland compositor has been lost.

The workaround I used to create the table was to select a smaller range of rows initially and then change the source, but this isn't reliable and most of the time the UI will hang indefinitely when changing the source to the full number of rows.


Something here doesn't scale well. Could the expensive computations be saved until confirming the pivot table dialog instead of taking place on every change and freezing the UI? Maybe calculate them only on a subset of data initially for large data sets?
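
To make the suggestion concrete, here is a minimal sketch of deferring an expensive per-field pass until it is actually needed. This is not LibreOffice code and says nothing about how Calc is implemented; the class FieldSummary, the function change_function, and the settings dictionary are all hypothetical names used only for illustration.

```python
from functools import cached_property


class FieldSummary:
    """Hypothetical stand-in for per-field data that is costly to compute."""

    def __init__(self, values):
        self._values = values
        self.scans = 0  # counts how often the expensive pass has run

    @cached_property
    def unique_sorted(self):
        # Runs only on first access; the result is cached afterwards.
        self.scans += 1
        return sorted(set(self._values))


def change_function(settings, new_function):
    """Return updated settings; changing the function touches no data."""
    return {**settings, "function": new_function}
```

In this sketch, calling `change_function({"field": "serial", "function": "sum"}, "count")` leaves `scans` at 0, and the full pass over the values happens once, on the first access to `unique_sorted` (for instance when the dialog is confirmed).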


If struggling to reproduce maybe try with a data set 1000x larger? Although my laptop is high spec so maybe something else is needed to reproduce.

Once/if the table gets successfully created it works ok, but I'm having to save after every action because the crashes are so predictable and retry dozens of times to succeed.
Comment 14 James Edwards-Jones 2022-03-14 12:05:23 UTC
Created attachment 178875
Backtrace changing source range
Comment 15 Roman Kuznetsov 2022-09-20 16:33:09 UTC
We are still waiting for an example file.

Setting to NEEDINFO until we get the example.
Comment 16 QA Administrators 2023-03-20 03:27:03 UTC Comment hidden (obsolete)
Comment 17 QA Administrators 2023-04-20 03:31:25 UTC
Dear Joseph Ervin,

Please read this message in its entirety before proceeding.

Your bug report is being closed as INSUFFICIENTDATA due to inactivity and
a lack of information which is needed in order to accurately
reproduce and confirm the problem. We encourage you to retest
your bug against the latest release. If the issue is still
present in the latest stable release, we need the following
information (please ignore any that you've already provided):

a) Provide details of your system including your operating
   system and the latest version of LibreOffice that you have
   confirmed the bug to be present

b) Provide easy to reproduce steps – the simpler the better

c) Provide any test case(s) which will help us confirm the problem

d) Provide screenshots of the problem if you think it might help

e) Read all comments and provide any requested information

Once all of this is done, please set the bug back to UNCONFIRMED
and we will attempt to reproduce the issue. Please do not:

a) respond via email 

b) update the version field in the bug or any of the other details
   on the top section of our bug tracker

Warm Regards,
QA Team

MassPing-NeedInfo-FollowUp