Bug 81765 - FILEOPEN: slow loading in minutes of .ods and .xlsx with >1000 of conditional formats, also dump
Summary: FILEOPEN: slow loading in minutes of .ods and .xlsx with >1000 of conditional...
Status: REOPENED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
4.1.6.2 release
Hardware: All All
: medium normal
Assignee: Not Assigned
URL:
Whiteboard: BSA target:6.3.0
Keywords: haveBacktrace, perf
: 61484 115488 (view as bug list)
Depends on:
Blocks: Conditional-Formatting
  Show dependency treegraph
 
Reported: 2014-07-25 22:27 UTC by Eduardo Moreno
Modified: 2019-05-03 08:40 UTC (History)
11 users (show)

See Also:
Crash report or crash signature:


Attachments
The file original in ODF format and exported to OOXML (424.48 KB, application/x-zip-compressed)
2014-07-25 22:27 UTC, Eduardo Moreno
Details
Callgrind output from master (6.42 MB, application/x-xz)
2018-09-22 14:22 UTC, Buovjaga
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Eduardo Moreno 2014-07-25 22:27:35 UTC
Created attachment 103478 [details]
The file original in ODF format and exported to OOXML

Problem description:
I create many files with a template. Fill the data with macros. But now, when open the file, if the file is in local hard disk takes 5 minutes to open. If the file is in a shared folder takes 30 minutes or crash LibO. The problem is with very files.
Now, export the file to OOXML format and open with MS Excel. The file open in 5 seconds, in hard disk or net.

Steps to reproduce:
1. Download the files in ODF format
2. Open with LibreOffice and measure the time
3. Download the file in XLSX format
4. Open with MS Excel and measure the time 

Current behavior:
Long time to open in LibreOffice.

Expected behavior:
The some time to open in LibreOffice and Ms Excel.

              
Operating System: Windows 7
Version: 4.2.5.2 release
Comment 1 tommy27 2014-07-26 11:35:53 UTC
very slow loading of both the .ods and .xlsx version of that file.
tested under Win7x64 using LibO 4.2.5.2
Comment 2 Eike Rathke 2014-07-28 10:00:18 UTC
Also slow in 4.1.6.2
Comment 3 Eike Rathke 2014-07-28 12:41:13 UTC
Slowness is due to the massive amount (>1000) of conditional formats, most on sheets FT-ING-PR-02 and FT-ING-PR-03, many of them even applied to identical ranges. Looks like a result of heavy copy&paste actions, or the macro you mentioned creates them. All these conditionals also extremely slow down scrolling around in the document once it was loaded.

@Markus:
Shouldn't a conditional style when pasted be merged with its neighbors if identical? Can we somehow detect and correct this mess during document load?

@Eduardo:
You could unprotect the sheets and invoke Format->Conditional->Manage and manually remove all the single formats and extend one of them to cover the needed ranges. It's quite a tedious task though.
Comment 4 Eduardo Moreno 2014-07-28 14:26:12 UTC
(In reply to comment #3)
> Slowness is due to the massive amount (>1000) of conditional formats, most
> on sheets FT-ING-PR-02 and FT-ING-PR-03, many of them even applied to
> identical ranges. Looks like a result of heavy copy&paste actions, or the
> macro you mentioned creates them. All these conditionals also extremely slow
> down scrolling around in the document once it was loaded.
> 
> @Markus:
> Shouldn't a conditional style when pasted be merged with its neighbors if
> identical? Can we somehow detect and correct this mess during document load?
> 
> @Eduardo:
> You could unprotect the sheets and invoke Format->Conditional->Manage and
> manually remove all the single formats and extend one of them to cover the
> needed ranges. It's quite a tedious task though.

The password to unprotect is 123456. The macro create a conditional format when add a row. Never know how many rows will be add.

If you say the roblem is conditional format, I try to modify the macro to create only uone range of conditional format.

I report the changes.
Comment 5 Markus Mohrhard 2014-07-30 22:28:49 UTC
(In reply to comment #3)
> Slowness is due to the massive amount (>1000) of conditional formats, most
> on sheets FT-ING-PR-02 and FT-ING-PR-03, many of them even applied to
> identical ranges. Looks like a result of heavy copy&paste actions, or the
> macro you mentioned creates them. All these conditionals also extremely slow
> down scrolling around in the document once it was loaded.
> 
> @Markus:
> Shouldn't a conditional style when pasted be merged with its neighbors if
> identical? Can we somehow detect and correct this mess during document load?
> 

We already do that if a document is imported and does not contain the new conditional format entries. However it is much more complicated during copy&paste as we have now conditional formats that rely on the range.

Take the simple example of a conditional format that just formats one column with a color scale. Now copying that column to the next one has two solutions for what we could do with the conditional formatting: merge and create ahuge one spanning two columns or create a second one that just spans this new column.

There will always be cases where one or the other solution are wrong.

Therefore we have now the manage conditional formatting dialog which helps the user to change the formatting or range if it is not what he expected.

Sadly I don't see a perfect solution here.

Sadly the UNO API does not reflect all the possibilities (I have to support the old one as it is published) and does not allow to modify the range of a conditional formatting currently.
Comment 6 QA Administrators 2015-09-04 02:49:06 UTC Comment hidden (obsolete)
Comment 7 Timur 2018-07-25 12:18:40 UTC
To compare for me (one-time non-scientific measurement):
4.1:  6:40 i.e. 400 secs
6.2+: 2:50 i.e. 170 seconds
Making the title more precise.
Comment 8 Buovjaga 2018-09-22 14:22:53 UTC
Created attachment 145108 [details]
Callgrind output from master

Callgrind from opening the XLSX in case it might help

Arch Linux 64-bit
Version: 6.2.0.0.alpha0+
Build ID: 0ffa7a733d834647dfd59b864c52a015028822b6
CPU threads: 8; OS: Linux 4.18; UI render: default; VCL: gtk3_kde5; 
Locale: fi-FI (fi_FI.UTF-8); Calc: threaded
Built on September 21st 2018
Comment 9 Xisco Faulí 2019-04-02 13:58:46 UTC
ODS file takes

real	3m20,457s
user	3m19,592s
sys	0m0,441s

in

Version: 5.2.0.0.alpha1+
Build ID: 5b168b3fa568e48e795234dc5fa454bf24c9805e
CPU Threads: 4; OS Version: Linux 4.15; UI Render: default; 
Locale: ca-ES (ca_ES.UTF-8)

@Noel, seeing your recent optimizations, I thought you might be interested in this issue...
Comment 10 Xisco Faulí 2019-04-02 14:03:12 UTC
(In reply to Xisco Faulí from comment #9)
> ODS file takes
> 
> real	3m20,457s
> user	3m19,592s
> sys	0m0,441s
> 
> in
> 
> Version: 5.2.0.0.alpha1+
> Build ID: 5b168b3fa568e48e795234dc5fa454bf24c9805e
> CPU Threads: 4; OS Version: Linux 4.15; UI Render: default; 
> Locale: ca-ES (ca_ES.UTF-8)
> 
> @Noel, seeing your recent optimizations, I thought you might be interested
> in this issue...

I meant

Version: 6.3.0.0.alpha0+
Build ID: 3b518953a8141b0d5043c2f3996a92956fdc3a47
CPU threads: 4; OS: Linux 4.15; UI render: default; VCL: gtk3; 
Locale: ca-ES (ca_ES.UTF-8); UI-Language: en-US
Calc: threaded
Comment 11 Commit Notification 2019-04-20 06:03:17 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/+/ec7ba61a6164c805f5a71b077715b7e1521a2d62%5E%21

simplify SfxPoolItemArray_Impl (tdf#81765 related)

It will be available in 6.3.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 12 Commit Notification 2019-04-20 06:18:15 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/+/bcb0c9b4bee1d943d9c60f9d4512dba901f85f54%5E%21

flatten SfxItemPool_Impl (tdf#81765 related)

It will be available in 6.3.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 13 Buovjaga 2019-04-20 07:57:21 UTC
Did a new build just now and the XLSX opening is significantly improved.

Opening XLSX

6.3.0:
real    1m33,838s
user    1m25,919s
sys     0m2,076s

6.2.2:
real    2m1,142s
user    2m0,547s
sys     0m0,272s

Opening ODS:

6.3.0:
real    1m34,639s
user    1m34,088s
sys     0m0,285s

6.2.2:
real    1m36,199s
user    1m35,475s
sys     0m0,286s
Comment 14 Noel Grandin 2019-04-20 08:15:17 UTC
buovjava, that was just laying the groundwork, haven't landed the real improvement yet :-)
Comment 15 Commit Notification 2019-04-20 16:19:48 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/+/003d11f410b7e515981b3efbd65d936d94d87121%5E%21

tdf#81765 slow loading of .ods with >1000 of conditional formats

It will be available in 6.3.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 16 Noel Grandin 2019-04-20 17:11:11 UTC
I've got a couple of small tweaks still coming for this, but with the time now under 15s for me, I consider this closed
Comment 17 Telesto 2019-04-20 18:05:03 UTC
(In reply to Noel Grandin from comment #16)
> I've got a couple of small tweaks still coming for this, but with the time
> now under 15s for me, I consider this closed

Nice, many thanks Noel!
Comment 18 Commit Notification 2019-04-21 05:59:46 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/+/c757117afb398277a46e79ba22066c5bbf2c9f72%5E%21

tdf#81765 slow loading of .ods with >1000 of conditional formats, part 2

It will be available in 6.3.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 19 Commit Notification 2019-04-21 14:25:08 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/+/aae39e5dbfde76726e3ad6a2e99874490515da58%5E%21

tdf#81765 slow loading of .ods with >1000 of conditional formats, part 3

It will be available in 6.3.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 20 Xisco Faulí 2019-04-22 13:00:00 UTC
(In reply to Xisco Faulí from comment #9)
> ODS file takes
> 
> real	3m20,457s
> user	3m19,592s
> sys	0m0,441s

now it takes

real	0m28,094s
user	0m27,225s
sys	0m0,398s

in

Version: 6.3.0.0.alpha0+
Build ID: e913727c7ee3af0bb4031c6829abfb3373306492
CPU threads: 4; OS: Linux 4.15; UI render: default; VCL: gtk3; 
Locale: ca-ES (ca_ES.UTF-8); UI-Language: en-US
Calc: threaded

the XLSX takes

real	0m29,372s
user	0m29,097s
sys	0m0,380s

@Noel, thanks a lot for the huge improvements here!!
Comment 21 Xisco Faulí 2019-04-22 13:13:25 UTC
*** Bug 61484 has been marked as a duplicate of this bug. ***
Comment 22 Xisco Faulí 2019-04-23 21:05:44 UTC
*** Bug 115488 has been marked as a duplicate of this bug. ***
Comment 23 Xisco Faulí 2019-04-23 21:08:08 UTC
Checked again in

Version: 6.3.0.0.alpha0+
Build ID: 0a04150b6eefb5feb7ecefaa5cd63dbac8c1574f
CPU threads: 4; OS: Linux 4.15; UI render: default; VCL: gtk3; 
Locale: ca-ES (ca_ES.UTF-8); UI-Language: en-US
Calc: threaded

and the time measure is even better for me ( tried a few times )

real	0m22,981s
user	0m22,592s
sys	0m0,316s
Comment 24 Aron Budea 2019-04-24 04:59:18 UTC
Opening the XLS with today's daily build in Windows takes ~20s for me, nice improvement!

(In reply to Commit Notification from comment #15)
> https://git.libreoffice.org/core/+/
> 003d11f410b7e515981b3efbd65d936d94d87121%5E%21
> 
> tdf#81765 slow loading of .ods with >1000 of conditional formats
Let's mention the follow-up fix as well:
https://cgit.freedesktop.org/libreoffice/core/commit/?id=1e3d0a57689006cd7244481958025177c01e4d09
author		Noel Grandin <noel.grandin@collabora.co.uk>	2019-04-23 11:23:36 +0200
committer	Noel Grandin <noel.grandin@collabora.co.uk>	2019-04-23 19:56:53 +0200

fix bad comparison in StrCmp/StrLess
Comment 25 Commit Notification 2019-04-24 06:31:02 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/+/943a2f44bdc67cc795e553fd98693dba501ad717%5E%21

tdf#81765 slow loading of .ods with >1000 of conditional formats, part 4

It will be available in 6.3.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 26 Commit Notification 2019-05-02 15:58:21 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/+/585e0ac43b9bd8a2f714903034e435c84ae3fc96%5E%21

revert part of "tdf#81765 slow loading of .ods"

It will be available in 6.3.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 27 Noel Grandin 2019-05-02 16:16:48 UTC
Reopen this because I had to revert the major speedup part
Comment 28 Noel Grandin 2019-05-03 08:20:19 UTC
Attaching the IRC discussion of this problem 

May 02 14:10:15 <noelgrandin>	so of the items we stick into pools we modify all over the place
May 02 14:10:20 <noelgrandin>	s/so/some
May 02 14:10:39 <mst___>	noelgrandin, that sounds very wrong
May 02 14:11:44 <noelgrandin>	mst___, ScPatternAttr 
May 02 14:12:17 <noelgrandin>	and of course there are the SvxFieldItem things that editeng likes to modify
May 02 14:12:29 *	mst___ is surprised that such horrible abuse of interface contract is in sc, not sw...
May 02 14:13:12 <noelgrandin>	I think its a case of (a)busing the de-duplicate functionality they have
May 02 14:13:25 <erAck>	noelgrandin: where does that modify items in the pool?
May 02 14:14:27 <noelgrandin>	erAck, ScPatternAttr ends up in a pool, and ScPatternAttr extends SfxSetItem, and lots of places (1500+ lines) call GetItemSet() on it and modify stuff inside it, and it's operator== is dependant on it's contents
May 02 14:17:00 <noelgrandin>	which was not a problem when the code would always scan all the ScPatternAttr's in a pool to find a match, but as soon as I tried to make the scanning smarter...
May 02 14:19:21 *	erAck doesn't quite understand.. and doesn't have time to spend hours to dig into it now
May 02 14:21:11 <noelgrandin>	erAck, pool contain items, and when we add an item, we attempt to match against an existing item (de-duplicate for memory saving I guess). that process of matching becomes O(n^2) when loading certain documents. I made the scanning faster by sorting the list. But that sorting fails because various code likes to modify the ScPatternAttr that are inside a pool, resulting in an unsorted list inside the pool
May 02 14:23:44 <mmeeks>	noelgrandin: perhaps good to have some stack-traces from a hardware watchpoint of the state being changed (?) =)
May 02 14:24:59 <noelgrandin>	mmeeks, it gets changed from tons of places, I have a 1500+ line patch where I tried to move the changing inside the class so I could trigger a manual re-sort of the cached items in the pool.
May 02 14:25:02 <erAck>	noelgrandin: so what you call a pool is actually the SfxItemSet?
May 02 14:25:20 <noelgrandin>	erAck, no, the SfxItemSet is itself also an item in a pool 
May 02 14:25:41 <noelgrandin>	erAck, sorry, that should be SfxSetItem (which contains an SfxItemSet)
May 02 14:25:53 <noelgrandin>	all rather confusing
May 02 14:27:10 <noelgrandin>	anyhow, just a load-time perf improvement, backing it out no big deal
Comment 29 Xisco Faulí 2019-05-03 08:40:09 UTC
back to

real	3m44,168s
user	3m36,107s
sys	0m4,511s

in

Version: 6.3.0.0.alpha0+
Build ID: f20810f7829d9f3b7167df316e1303810b746366
CPU threads: 4; OS: Linux 4.15; UI render: default; VCL: gtk3; 
Locale: ca-ES (ca_ES.UTF-8); UI-Language: en-US
Calc: threaded