Bug 97177 - FILEOPEN: Document with huge autofilter takes long to open
Summary: FILEOPEN: Document with huge autofilter takes long to open
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
4.0.0.3 release
Hardware: All All
: medium normal
Assignee: Not Assigned
URL:
Whiteboard: target:7.1.0 target:7.0.0.1
Keywords: perf
Depends on:
Blocks: File-Opening
  Show dependency treegraph
 
Reported: 2016-01-16 11:32 UTC by sergio.callegari
Modified: 2020-07-18 22:25 UTC (History)
5 users (show)

See Also:
Crash report or crash signature:


Attachments
Same file in ODS format (6.80 MB, application/vnd.oasis.opendocument.spreadsheet)
2016-01-16 12:38 UTC, Usama
Details
Example file ODS (14.39 MB, application/vnd.oasis.opendocument.spreadsheet)
2020-06-05 18:47 UTC, Telesto
Details
Example file XLSX (14.87 MB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2020-06-05 18:51 UTC, Telesto
Details
framegraph ( 1st try ) (567.94 KB, image/svg+xml)
2020-06-19 14:51 UTC, Xisco Faulí
Details
framegraph ( 2st try ) (3.06 MB, image/svg+xml)
2020-06-20 16:32 UTC, Xisco Faulí
Details
perf flamegraph (414.91 KB, application/x-bzip)
2020-06-22 11:59 UTC, Julien Nabet
Details

Note You need to log in before you can comment on or make changes to this bug.
Description sergio.callegari 2016-01-16 11:32:20 UTC
Please refer to the file at https://www.elsevier.com/__data/assets/excel_doc/0015/91122/title_list.xlsx that has already been used as a benchmark in a large number of reports about LibO fileopen/filesave speed.  This is a huge xlsx file with all the scientific publications indexed in the Scopus database.

Recently, the opening time of such file has been addressed. In fact, recent LibO is quite efficient at opening it (6-7 s on my machine).  Thanks to the developers for that!

Unfortunately, a weird side effect of this action, is that LibO is now *much* more efficient at dealing with large spreadsheet when those are in the xlsx format than it its own one.

To see it, after having opened the file in attachment, save it as ods.  Saving still takes a lot of time, but I believe there is a separate bug for that. This is not the issue here.

Now, close LibO and restart it, trying to open the file again... This takes a long time (49-50 s on my machine).

The bad aspect is not just the long absolute opening time of the ods file. It is the weird fact that opening the native format takes 600% more time than opening the foreign one.  At least the file in the native file format is smaller (about 45% smaller).
Comment 1 Usama 2016-01-16 12:38:36 UTC
Created attachment 121986 [details]
Same file in ODS format

Hello,

Thank you for reporting the bug. I can confirm that the bug is present in LO 5.1.0.1 on Ubuntu 15.04

Opening the ODS file takes x10 times longer than the XLSX file that is the origin of the ODS one.
Comment 2 Tor Lillqvist 2016-01-18 09:44:54 UTC
"Native format" is a marketing / advocacy term, not a technical one.
Comment 3 sergio.callegari 2016-01-18 16:40:18 UTC
Always thought it was it was technical. Aren't /native/ formats those format in which LibO is validated to have correct roundtrip (and for which the internal data structures are guaranteed to have a non lossy correspondence with the file ones?) Isn't this the reason why LibO shows a warning dialog when saving in other formats?
Comment 4 Markus Mohrhard 2016-01-20 04:27:58 UTC
The specification around XLSX makes it much easier to tune the import performance than the ODF specification. We employ a number of steps in the XLSX import filter that are not really possible in ODF without changing the file format.
Comment 5 sergio.callegari 2016-01-20 09:59:57 UTC
Thanks, this is an interesting clarification.

One more question. Most of the time spent in loading the file in the ODS format, goes /before/ the progress bar starts working. Any clue about this? Is there anything that can be done about this specific aspect?
Comment 6 QA Administrators 2017-03-06 13:57:01 UTC Comment hidden (obsolete)
Comment 7 sergio.callegari 2017-03-07 13:18:33 UTC
Bug still present as of 5.3.1 RC 1.

On an haswell laptop, the xlsx file opens in a matter of seconds, the ods file opens in a time that is likely twenty times larger, making the ods format not an practical option.

In my opinion the most serious issue and the worse part of the matter is that the LibO file causes LibO to appear as frozen for a long time before showing a progress bar at all.

On opening a file, LibO should first of all show visual elements capable of giving the user some feedback about what is going on and only later start its file loading task.

Conversely, at least in Linux, when trying to open the ods file, you end up looking at an empty window that, not getting drawn, shows through the other applications that were previously on the desktop for almost a minute. E.g., if you had firefox open, you still see firefox through a frozen transparent LibO window, which prevents you from interacting with firefox unless you realize that you need to minimize it. Hence, LibO also ends up messing the Desktop experience.
Similarly, you get an entry that stays unnamed for about a minute on the task bar.
Comment 8 QA Administrators 2018-03-08 03:44:52 UTC Comment hidden (obsolete)
Comment 9 sergio.callegari 2018-03-08 11:00:04 UTC
Still an issue as of 6.0.2.1.

Opening the large spreadsheet in excel format takes much longer than in the xlsx format.

Again, issue is made much worse by the way in which progress bars appear.

"Loading" progress bar starts appearing only halfway through the opeining, meaning that the user for ~ half a minute only sees an apparently hung screen.
Comment 10 Roman Kuznetsov 2019-02-28 10:04:38 UTC
There isn't file on link https://www.elsevier.com/__data/assets/excel_doc/0015/91122/title_list.xlsx
Someone has it on his locale machine? Please attach it into bug directly

ODS file : opening it took 2m30sec in

Version: 6.3.0.0.alpha0+
Build ID: c57dc7d41bd62f933cffab6131edb7252606382d
CPU threads: 4; OS: Windows 6.1; UI render: default; VCL: win; 
Locale: ru-RU (ru_RU); UI-Language: en-US
Calc: threaded
Comment 11 Telesto 2020-06-05 10:22:17 UTC
*** Bug 108285 has been marked as a duplicate of this bug. ***
Comment 12 Telesto 2020-06-05 10:29:29 UTC
Changing to High -> Major
* Tedious slowness..
* Native format being substantial slower compared XLSX format; quite a bad impression.. (I'm actually using the XLSX to avoid the waiting time for ODS)
* An optimization seems to be within reach.. looking at the XLSX performance
Comment 13 Telesto 2020-06-05 11:52:03 UTC Comment hidden (obsolete)
Comment 14 Xisco Faulí 2020-06-05 18:36:32 UTC
I'm sure there will always be a XLSX file that open faster in Libo than the same file as ODS, but what's the point of having such a bug? it's too general and it will never be solved. Instead, let's use this ticket to focus on the performance issue seen while opening the attached document

it takes

real	1m59,400s
user	2m5,970s
sys	0m9,378s

in

Version: 7.1.0.0.alpha0+
Build ID: 8aee8c0cf1bdda1866594e75b0f9bd4b9a69c724
CPU threads: 4; OS: Linux 4.19; UI render: default; VCL: gtk3
Locale: en-US (en_US.UTF-8); UI: en-US
Calc: threaded
Comment 15 Xisco Faulí 2020-06-05 18:37:26 UTC
@Julien, would you mind getting a flamegraph from the attached document ?
Comment 16 Telesto 2020-06-05 18:47:53 UTC
Created attachment 161667 [details]
Example file ODS
Comment 17 Telesto 2020-06-05 18:51:10 UTC
Created attachment 161668 [details]
Example file XLSX
Comment 18 Telesto 2020-06-05 18:56:34 UTC
(In reply to Xisco Faulí from comment #14)
> I'm sure there will always be a XLSX file that open faster in Libo than the
> same file as ODS, but what's the point of having such a bug? it's too
> general and it will never be solved. Instead, let's use this ticket to focus
> on the performance issue seen while opening the attached document
> 
> it takes
> 
> real	1m59,400s
> user	2m5,970s
> sys	0m9,378s
> 
> in
> 
> Version: 7.1.0.0.alpha0+
> Build ID: 8aee8c0cf1bdda1866594e75b0f9bd4b9a69c724
> CPU threads: 4; OS: Linux 4.19; UI render: default; VCL: gtk3
> Locale: en-US (en_US.UTF-8); UI: en-US
> Calc: threaded

Sounds like GTK3.. 30 seconds
Version: 7.1.0.0.alpha0+ (x64)
Build ID: 191288d6a7fb52b31038a21c4e71ee57ffa3bacd
CPU threads: 4; OS: Windows 6.3 Build 9600; UI render: Skia/Raster; VCL: win
Locale: nl-NL (nl_NL); UI: en-US
Calc: CL
Comment 19 Telesto 2020-06-05 19:02:56 UTC
(In reply to Xisco Faulí from comment #14)
> I'm sure there will always be a XLSX file that open faster in Libo than the
> same file as ODS, but what's the point of having such a bug? it's too
> general and it will never be solved

Please explain.. to general.. the ODS takes 20 seconds.. the XLSX file within 7 seconds.. 

There is comment 4 - explaining the issue, but some improvement wouldn't hurt (if possible).. or should the XLSX format be default for 1024plus-Column?
Comment 20 Julien Nabet 2020-06-05 19:26:56 UTC
(In reply to Xisco Faulí from comment #15)
> @Julien, would you mind getting a flamegraph from the attached document ?

You can use https://bugs.documentfoundation.org/attachment.cgi?id=161648

More generally, let's wait more feedback from Flamegraphs before providing more.
Comment 21 Xisco Faulí 2020-06-05 19:45:25 UTC
(In reply to Julien Nabet from comment #20)
> (In reply to Xisco Faulí from comment #15)
> > @Julien, would you mind getting a flamegraph from the attached document ?
> 
> You can use https://bugs.documentfoundation.org/attachment.cgi?id=161648

I fail to see the relation between this ticket and bug 161648

> More generally, let's wait more feedback from Flamegraphs before providing
> more.

fair enough
Comment 22 Xisco Faulí 2020-06-05 19:58:07 UTC
(In reply to Telesto from comment #18)
> (In reply to Xisco Faulí from comment #14)
> > I'm sure there will always be a XLSX file that open faster in Libo than the
> > same file as ODS, but what's the point of having such a bug? it's too
> > general and it will never be solved. Instead, let's use this ticket to focus
> > on the performance issue seen while opening the attached document
> > 
> > it takes
> > 
> > real	1m59,400s
> > user	2m5,970s
> > sys	0m9,378s
> > 
> > in
> > 
> > Version: 7.1.0.0.alpha0+
> > Build ID: 8aee8c0cf1bdda1866594e75b0f9bd4b9a69c724
> > CPU threads: 4; OS: Linux 4.19; UI render: default; VCL: gtk3
> > Locale: en-US (en_US.UTF-8); UI: en-US
> > Calc: threaded
> 
> Sounds like GTK3.. 30 seconds
> Version: 7.1.0.0.alpha0+ (x64)
> Build ID: 191288d6a7fb52b31038a21c4e71ee57ffa3bacd
> CPU threads: 4; OS: Windows 6.3 Build 9600; UI render: Skia/Raster; VCL: win
> Locale: nl-NL (nl_NL); UI: en-US
> Calc: CL

Same time with x11

Version: 7.1.0.0.alpha0+
Build ID: 8aee8c0cf1bdda1866594e75b0f9bd4b9a69c724
CPU threads: 4; OS: Linux 4.19; UI render: default; VCL: x11
Locale: en-US (en_US.UTF-8); UI: en-US
Calc: threaded

and

Versão: 6.4.3.2 (x64)
ID da versão: 747b5d0ebf89f41c860ec2a39efd7cb15b54f2d8
Processos do CPU: 1; SO: Windows 6.1 Service Pack 1 Build 7601; Gestão da interface: padrão; VCL: win; 
Configuração regional: es-ES (es_ES); Idioma da interface: pt-PT
Calc: threaded

Again, if a file is opening faster with one filter than with another one, it's a problem in the filter itself.
Even better, I would use CSV as default for 1024plus-Column files. sarcasm off
Comment 23 Xisco Faulí 2020-06-05 20:01:17 UTC
Files attached are completely unrelated
Comment 24 Xisco Faulí 2020-06-05 20:08:07 UTC
Same results when multi-threaded calculation is disabled
Comment 25 Buovjaga 2020-06-17 09:14:46 UTC
Noel wrote a patch for this: https://gerrit.libreoffice.org/c/core/+/96516

Before:
real    0m15,074s
user    0m15,847s
sys     0m2,081s

After:
real    0m12,268s
user    0m14,442s
sys     0m0,749s

Arch Linux 64-bit
Version: 7.1.0.0.alpha0+
Build ID: ad0351b84926075297fb74abbe9b31a0455782af
CPU threads: 8; OS: Linux 5.7; UI render: default; VCL: kf5
Locale: fi-FI (fi_FI.UTF-8); UI: en-US
Calc: threaded
Built on 17 June 2020
Comment 26 Julien Nabet 2020-06-17 09:24:03 UTC
(In reply to Buovjaga from comment #25)
> Noel wrote a patch for this: https://gerrit.libreoffice.org/c/core/+/96516
> ...

Yes but for the moment a QA test fails on Jenkins.
Comment 27 Xisco Faulí 2020-06-17 12:34:54 UTC
For me it takes

real	1m38,793s
user	1m43,851s
sys	0m9,720s

without the patch and

real	1m22,985s
user	1m34,677s
sys	0m3,728s

with the patch
Comment 28 Xisco Faulí 2020-06-19 14:51:19 UTC
Created attachment 162210 [details]
framegraph ( 1st try )
Comment 29 Xisco Faulí 2020-06-20 16:32:33 UTC
Created attachment 162230 [details]
framegraph ( 2st try )
Comment 30 Commit Notification 2020-06-22 07:37:20 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/aabcf90da9a90240bddc140485f210dcab66724c

tdf#97177 speedup loading of large ODS file

It will be available in 7.1.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 31 Noel Grandin 2020-06-22 07:40:18 UTC
Pushed a small improvement, sorry, I don't have any more ideas
Comment 32 Xisco Faulí 2020-06-22 10:31:04 UTC
after Noel's commit it takes

real	1m10,605s
user	1m31,436s
sys	0m3,892s

and before 

real	1m25,321s
user	1m41,265s
sys	0m8,422s

so some improvement has been done. For the record, I double my ram last week and that might explain why it takes less time now than in previous measurements.
Comment 33 Julien Nabet 2020-06-22 11:59:36 UTC
Created attachment 162304 [details]
perf flamegraph

Here's a Flamegraph retrieved on pc Debian x86-64 with master sources updated today (after https://cgit.freedesktop.org/libreoffice/core/commit/?id=aabcf90da9a90240bddc140485f210dcab66724c:
"tdf#97177 speedup loading of large ODS file
drop ScSimpleRangeList and just use ScRangeList, which saves us a
conversion step.

Then teach ScRangeList to do a simple merge, since we are loading in row
order, and can just check the last few entries.

Then fix a case of optimisation doing the wrong thing in
ScAttrArray::SetPatternAreaImpl where std::vector::reserve repeatedly
resizes the data array and breaks the normal doubling-resizing inside
vector.

On my machine the time goes from 5.4s to 4.8s")
Comment 34 Commit Notification 2020-06-22 12:29:28 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "libreoffice-7-0":

https://git.libreoffice.org/core/commit/d1f613399d9c188ff4d5f41a57fbea6b02b536d9

tdf#97177 speedup loading of large ODS file

It will be available in 7.0.0.1.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.