Bug 86019 - Poor performance opening spreadsheet with a particular chart
Summary: Poor performance opening spreadsheet with a particular chart
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
4.1.6.2 release
Hardware: x86-64 (AMD64) Linux (All)
: medium normal
Assignee: Dennis Francis
URL:
Whiteboard: target:6.0.0 target:5.4.2
Keywords: bibisected, bisected, perf, regression
Depends on:
Blocks: Chart
  Show dependency treegraph
 
Reported: 2014-11-07 20:51 UTC by rlk
Modified: 2017-09-23 09:03 UTC (History)
9 users (show)

See Also:
Crash report or crash signature:


Attachments
Spreadsheet demonstrating good save performance (2.84 MB, application/vnd.oasis.opendocument.spreadsheet)
2014-11-07 20:51 UTC, rlk
Details
Spreadsheet demonstrating poor save performance (2.85 MB, application/vnd.oasis.opendocument.spreadsheet)
2014-11-07 20:51 UTC, rlk
Details
Simpler case demonstrating the problem (1.59 MB, application/vnd.oasis.opendocument.spreadsheet)
2014-11-07 23:54 UTC, rlk
Details
Even simpler example (555.60 KB, application/vnd.oasis.opendocument.spreadsheet)
2014-11-08 01:40 UTC, rlk
Details

Note You need to log in before you can comment on or make changes to this bug.
Description rlk 2014-11-07 20:51:21 UTC
Created attachment 109115 [details]
Spreadsheet demonstrating good save performance

I have a spreadsheet which exhibits very different performance opening depending upon whether a certain data series is present in a chart or not.

The two attached spreadsheets demonstrate the issue.  They are identical except that one has the Row 2021 data series in the chart on the 5th page.  The version that does not have that row takes 22 seconds of CPU time to save (Core i7-920XM).  The version that does takes 163 seconds to save.

In addition, when I exit editing the chart (by clicking outside of the chart area), the version with the row 2021 data series takes several minutes before returning to the spreadsheet (didn't measure it).  The version without that data series takes maybe 10 seconds or so.
Comment 1 rlk 2014-11-07 20:51:59 UTC
Created attachment 109116 [details]
Spreadsheet demonstrating poor save performance
Comment 2 rlk 2014-11-07 20:53:05 UTC
Note that the Row 2021 data set is the only one that has the data series in a row; the other data series are all columnar (for what it may be worth).
Comment 3 Joel Madero 2014-11-07 21:48:40 UTC
Ubuntu 14.04
LibreOffice 4.3.3.2 release

@Rik - thanks for reporting! In the future please always include your operating system and version of libreoffice in your report :)

So this one is quite interesting/complex - findings:

Open time is essentially the same on both.
Save time is dramatically different - I gave it 3+ minutes and finally quit the program.

That being said, the spreadsheet is so complex that it is REALLY hard to figure out what is causing the problem. If a better and simpler test case could be provided, that would be fantastic.

Also tested on 3.3 (inherited from OOo)
Opening time is substantially slower in 3.3
Save time:
(1) for first sheet - save time was 35 seconds;
(2) for second sheet - save time was 106 seconds.

Because it seems like it's gotten slower - I am confirming this as NEW. That being said, it might just be a result of an incredibly complex chart which takes time to save....and the test document is not terribly useful for determining what is going on because it's so complex.

A bibisect might help (although will take some time) so asking for one to explicitly address the slowing down of save time for the second sheet.


Setting as:
New.
Comment 4 rlk 2014-11-07 23:03:29 UTC
(rlk, not rik)

I did include the OS and version, 4.1.6.2 on x64/Linux.  Just not in the comments.

I know that spreadsheet is hideously complicated.  Believe it or not, it's only a fraction of the real thing (it's the full data set, but only a small number of pages).  I'll see what I can do.

The interesting thing is that the only difference between the two spreadsheets is that the slow one has one extra data set in the graph, and that extra data set has only about half a dozen points (the others have hundreds up to a few thousand).  But the obvious difference is that the extra data set is in a pair of rows rather than columns.
Comment 5 rlk 2014-11-07 23:54:18 UTC
Created attachment 109121 [details]
Simpler case demonstrating the problem

OK, here's a simpler case.  Same spreadsheet, without the formulas and formats (everything is converted to its numerical representation).  Saves in 131 seconds.  Remove that "row 2021" data range and it saves in 3 seconds.
Comment 6 Joel Madero 2014-11-08 01:34:14 UTC
So awesome - thanks!
Comment 7 Joel Madero 2014-11-08 01:39:38 UTC
so just a quick question, row 2021 of which sheet? I just saved and it definitely was more than 3 seconds but maybe that's just because my system is slower than yours
Comment 8 rlk 2014-11-08 01:40:20 UTC
Created attachment 109122 [details]
Even simpler example

This one removes the irrelevant columns and data.  With the row 2021 dataset, it takes 100 seconds to save; without, it takes 2 seconds.

The issue is not whether it's a regression or not; it's that adding an additional tiny data range to a chart drastically inflates save time.
Comment 9 rlk 2014-11-08 01:42:00 UTC
row 2021 -- look in the chart; it's row 2021 of the Splits sheet (row 2019 is the X axis).  You'll need to scroll over to the right (or ctrl+end) to see the data in that row.
Comment 10 m_a_riosv 2014-11-08 19:30:42 UTC
I think the issue is that you are mixing data in columns with data in rows. All are column data except Row2021 just the source of the issue.

Even it works I think it's not by design. In fact wizard or range editor ask for data in columns or rows.

For me not a bug. But I hope someone with acknowledgement about charts can bring some light.
Comment 11 rlk 2014-11-08 19:48:04 UTC
I do suspect that that mixing is at least part of the problem, but it's a problem nonetheless.  If it's not supposed to work at all it should be documented and blocked, but it does appear to be allowed and it's far from clear why it should lead to such a large performance problem.
Comment 12 Terrence Enger 2014-11-09 20:56:27 UTC
Working the the bibisect-43all repository, I see from `git bisect bad`:

    ea5723ff672d4a3ffb9ee123a96dda4681703a50 is the first bad commit
    commit ea5723ff672d4a3ffb9ee123a96dda4681703a50
    Author: Bjoern Michaelsen <bjoern.michaelsen@canonical.com>
    Date:   Wed May 21 08:57:55 2014 +0000

        source-hash-1b8cfd5d1414da9127ff18e4701b40f5bceabb36
    
        commit 1b8cfd5d1414da9127ff18e4701b40f5bceabb36
        Author:     Markus Mohrhard <markus.mohrhard@collabora.co.uk>
        AuthorDate: Tue May 20 23:59:55 2014 +0200
        Commit:     Markus Mohrhard <markus.mohrhard@collabora.co.uk>
        CommitDate: Wed May 21 01:50:44 2014 +0200
    
            remove unnecessary whitespaces
    
            Change-Id: I4d93c0149aaf90e2477c1382aa51f8f08c967626

    :100644 100644 501fbfa427b8a231fe035cd8c6c9861d0483e887 03ad3b133ed354d5ab45bdc712a5355013402c17 M	ccache.log
    :100644 100644 94e1769dda5ee27e4952754e43c4d26fe19c2009 e900cb9aba87ebf78801bd60b03866aab0460fac M	commitmsg
    :100644 100644 8c966b034e7c757c8922b3ba931bd93077f23e64 b8dc772fd79ceee78f8854fb2bdf577904e75049 M	make.log
    :040000 040000 5f3414102b71bcb5a2c7f817d6502959169c4345 291f8ca8cacce30d162bc70628131b1a960519bb M	opt

and from `git bisect log`:

    # bad: [423a84c4f7068853974887d98442bc2a2d0cc91b] source-hash-c15927f20d4727c3b8de68497b6949e72f9e6e9e
    # good: [65fd30f5cb4cdd37995a33420ed8273c0a29bf00] source-hash-d6cde02dbce8c28c6af836e2dc1120f8a6ef9932
    git bisect start 'latest' 'oldest'
    # good: [e02439a3d6297a1f5334fa558ddec5ef4212c574] source-hash-6b8393474974d2af7a2cb3c47b3d5c081b550bdb
    git bisect good e02439a3d6297a1f5334fa558ddec5ef4212c574
    # good: [4850941efe43ae800be5c76e1102ab80ac2c085d] source-hash-980a6e552502f02f12c15bfb1c9f8e6269499f4b
    git bisect good 4850941efe43ae800be5c76e1102ab80ac2c085d
    # skip: [a900e72b6357882284c5955bdf939bf14269f5fb] source-hash-dd1050b182260a26a1d0ba6d0ef3a6fecc3f4e07
    git bisect skip a900e72b6357882284c5955bdf939bf14269f5fb
    # skip: [3dda83fc3a43afc6af7f5c0ffd029e610ec1b9a3] source-hash-c59b3d6c5c8096486730007d9b9b053793b90b1e
    git bisect skip 3dda83fc3a43afc6af7f5c0ffd029e610ec1b9a3
    # good: [4f705a8cfb1998b09f2062510b207d35a33647d8] source-hash-1eeb20f3958666ec6ba6e0fcf52e92e5eb447a14
    git bisect good 4f705a8cfb1998b09f2062510b207d35a33647d8
    # good: [465e2be02951f9645beb3024506a5212907caf5f] source-hash-674801eb4af21c9ae83c122499f15fa4f4785b0f
    git bisect good 465e2be02951f9645beb3024506a5212907caf5f
    # good: [1de992ed60dad1fd65ff0383a8610b0468d9d772] source-hash-8bf0b9536cb33dfcce8a811b70c2ead285300f3f
    git bisect good 1de992ed60dad1fd65ff0383a8610b0468d9d772
    # good: [882db5e268e28962bdf805c820a5e031b0df9936] source-hash-383dccc094f8c8c07b4298ce0b7406d18cd61cee
    git bisect good 882db5e268e28962bdf805c820a5e031b0df9936
    # good: [69e91b0c890097cfb1781733f260644bb1ec9ee8] source-hash-66fce1f61e7e088bd92e19ecb0dd94553de6f824
    git bisect good 69e91b0c890097cfb1781733f260644bb1ec9ee8
    # good: [426fdcf0b13134ffbc7f92900d5862fad3723192] source-hash-c9f3c508bb1a1d94fd6172b9cdac30278559f31c
    git bisect good 426fdcf0b13134ffbc7f92900d5862fad3723192
    # bad: [9f0fd4c327fa8765bef8a0e2a0e85a620ab6f835] source-hash-fe1e1d1292973211f0edfa64c05279ca59634652
    git bisect bad 9f0fd4c327fa8765bef8a0e2a0e85a620ab6f835
    # bad: [ea5723ff672d4a3ffb9ee123a96dda4681703a50] source-hash-1b8cfd5d1414da9127ff18e4701b40f5bceabb36
    git bisect bad ea5723ff672d4a3ffb9ee123a96dda4681703a50
    # first bad commit: [ea5723ff672d4a3ffb9ee123a96dda4681703a50] source-hash-1b8cfd5d1414da9127ff18e4701b40f5bceabb36
Comment 13 Matthew Francis 2015-01-14 11:16:35 UTC
This seems to have been caused by commit 2fc4f5cd750cc3899ffc6b2f831edcb01e7773b4, though it's not clear from the commit message why that would be. Before the commit, saving attachment 109122 [details] is ~instant, while after it takes minutes
(* it was necessary to cherry pick the fix from commit 498c314861f0913a5b31ee29efc38aad12c3a781 to avoid a crash opening the file)

Adding Cc: to sbergman@redhat.com; Any idea what relevance this commit could have to a massive performance regression on save? Thanks


commit 2fc4f5cd750cc3899ffc6b2f831edcb01e7773b4
Author: Kohei Yoshida <kohei.yoshida@collabora.com>
Date:   Mon May 19 16:00:39 2014 -0400

    Register OpenGL windows to ScGridWindow upon file load.
    
    Change-Id: I6fdce232ef41079f20a983dc72da032196078858

commit 498c314861f0913a5b31ee29efc38aad12c3a781
Author: Stephan Bergmann <sbergman@redhat.com>
Date:   Tue May 20 09:41:59 2014 +0200

    Hack to for now work around dpy == null crashing CppunitTest_chart2_*
    
    ...since 2fc4f5cd750cc3899ffc6b2f831edcb01e7773b4 "Register OpenGL windows to
    ScGridWindow upon file load."  Should be revisited and fixed properly.
    
    Change-Id: I5a3d08cb750fd9ec7c98d5b85250eaabd1c7a8b2
Comment 14 Stephan Bergmann 2015-01-14 11:32:57 UTC
(In reply to Matthew Francis from comment #13)
> This seems to have been caused by commit
> 2fc4f5cd750cc3899ffc6b2f831edcb01e7773b4, though it's not clear from the
> commit message why that would be. Before the commit, saving attachment
> 109122 [details] is ~instant, while after it takes minutes
> (* it was necessary to cherry pick the fix from commit
> 498c314861f0913a5b31ee29efc38aad12c3a781 to avoid a crash opening the file)
> 
> Adding Cc: to sbergman@redhat.com; Any idea what relevance this commit could
> have to a massive performance regression on save? Thanks

No idea at all about 2fc4f5cd750cc3899ffc6b2f831edcb01e7773b4, and no idea whether 498c314861f0913a5b31ee29efc38aad12c3a781 has any (additional) performance impact.  The latter commit was a completely uneducated attempt at making "make check" work again, without much insight into the former commit's content; it looks like I pinged Kohei and Moggi (now on cc) about the latter commit's "Should be revisited and fixed properly," back then, but don't remember anything further.
Comment 15 Robinson Tryon (qubit) 2015-12-13 11:10:50 UTC Comment hidden (obsolete)
Comment 16 Xisco Faulí 2016-10-03 09:23:36 UTC
Adding Cc: to Kohei Yoshida
Comment 17 Commit Notification 2017-08-17 05:38:11 UTC
Dennis Francis committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=b2cb3196868deb499bb91fc9bdbe6c8f7d1e9033

tdf#86019: Use ChartDataRowSource_COLUMNS as the chart...

It will be available in 6.0.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 18 Terrence Enger 2017-08-18 15:06:27 UTC
I am sorry to report that my results are disappointing.

Working on debian-stretch with the daily Linux dbgutil bibisect
repository, I see CPU times to open attachments
rowing-time-r-{7,10}.ods:

    LO                    ...7...      ...10...
    ------------------    -------      ---------
    dbgutil 2017-08-16    35.66        14.14
    dbgutil 2017-08-18    37.62        14.27
Comment 19 Markus Mohrhard 2017-08-18 15:08:17 UTC
(In reply to Terrence Enger from comment #18)
> I am sorry to report that my results are disappointing.
> 
> Working on debian-stretch with the daily Linux dbgutil bibisect
> repository, I see CPU times to open attachments
> rowing-time-r-{7,10}.ods:
> 
>     LO                    ...7...      ...10...
>     ------------------    -------      ---------
>     dbgutil 2017-08-16    35.66        14.14
>     dbgutil 2017-08-18    37.62        14.27

Don't ever report performance results with a dbgutil or a debug build. They are absolutely useless.

Either a release build or a symbols build.
Comment 20 Dennis Francis 2017-08-18 17:02:56 UTC
(In reply to Terrence Enger from comment #18)
> I am sorry to report that my results are disappointing.
> 
> Working on debian-stretch with the daily Linux dbgutil bibisect
> repository, I see CPU times to open attachments
> rowing-time-r-{7,10}.ods:
> 
>     LO                    ...7...      ...10...
>     ------------------    -------      ---------
>     dbgutil 2017-08-16    35.66        14.14
>     dbgutil 2017-08-18    37.62        14.27

Could you check the *save* performances of these documents in a release build as that is the main problem stated in the bug report content/comments/bibisection as opposed to what the title says. In my experiments with the last patch both "good" save perf document and "poor" save perf document have similar *save* performances in terms of running time. Thanks.
Comment 21 Xavier Van Wijmeersch 2017-08-19 10:50:25 UTC
opening the spreadsheet poor save performance takes 19 seconds
and saving takes 40 seconds

Version: 5.4.2.0.0+
Build ID: ddba061b07e55dfd2c91320ca790ff88618b132c
CPU threads: 8; OS: Linux 4.9; UI render: default; VCL: kde4; 
TinderBox: Linux-rpm_deb-x86_64@70-TDF, Branch:libreoffice-5-4, Time: 2017-08-17_01:01:48
Locale: nl-BE (en_US.UTF-8); Calc: group
Comment 22 Aron Budea 2017-08-19 12:30:09 UTC
(In reply to Xavier Van Wijmeersch from comment #21)
> opening the spreadsheet poor save performance takes 19 seconds
> and saving takes 40 seconds
> 
> Version: 5.4.2.0.0+
> Build ID: ddba061b07e55dfd2c91320ca790ff88618b132c


The commit is only in the 6.0 (master) branch for now.
Comment 23 Commit Notification 2017-08-23 21:59:45 UTC
Dennis Francis committed a patch related to this issue.
It has been pushed to "libreoffice-5-4":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=afd5b4b7eda6171817f5fa1a426d4c11a1a734c9&h=libreoffice-5-4

tdf#86019: Use ChartDataRowSource_COLUMNS as the chart...

It will be available in 5.4.2.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 24 Xisco Faulí 2017-09-23 08:17:29 UTC
A polite ping to Dennis Francis: is this bug fixed? if so, please close it as RESOLVED FIXED