Bug 135215 - FILEOPEN: XLSX: long time to open file
Summary: FILEOPEN: XLSX: long time to open file
Status: RESOLVED DUPLICATE of bug 81765
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version (earliest affected): 4.1 all versions
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard: target:7.4.0
Keywords: filter:xlsx, perf
Depends on:
Blocks: XLSX File-Opening
Reported: 2020-07-28 08:53 UTC by Xisco Faulí
Modified: 2022-02-21 11:30 UTC
CC List: 10 users

See Also:
Crash report or crash signature:
Regression By:


Attachments
sample file (2.67 MB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2020-07-28 08:53 UTC, Xisco Faulí
framegraph (39.97 KB, image/svg+xml)
2020-08-04 10:06 UTC, Xisco Faulí
Flamegraph (129.19 KB, application/x-bzip)
2022-02-02 12:37 UTC, Julien Nabet

Description Xisco Faulí 2020-07-28 08:53:50 UTC
Created attachment 163672
sample file

Steps to reproduce:
1. Open attached document

-> It takes

real	0m11,271s
user	0m8,051s
sys	0m1,396s


in

Version: 7.1.0.0.alpha0+
Build ID: b68c10a0d0e6f83b6b037da72210033cacb1677b
CPU threads: 4; OS: Linux 4.19; UI render: default; VCL: gtk3
Locale: en-US (en_US.UTF-8); UI: en-US
Calc: threaded
Comment 1 Xisco Faulí 2020-07-28 09:09:37 UTC
also reproduced in

Version 4.1.0.0.alpha0+ (Build ID: efca6f15609322f62a35619619a6d5fe5c9bd5a)
Comment 2 m.a.riosv 2020-08-04 07:57:44 UTC
Saving never finishes for me, even after clearing all cell formatting beyond the data.
Version: 6.4.6.1 (x64)
Build ID: 985dd72ca280d5c6da2e9f90f7ff9286cafe7ff8
CPU threads: 4; OS: Windows 10.0 Build 20180; UI render: default; VCL: win;
Locale: es-ES (es_ES); UI: es-ES
Calc:
Comment 3 Xisco Faulí 2020-08-04 10:06:12 UTC
Created attachment 163928
framegraph
Comment 4 Xisco Faulí 2020-08-04 10:06:53 UTC Comment hidden (obsolete)
Comment 5 Roman Kuznetsov 2020-08-13 18:46:00 UTC Comment hidden (obsolete)
Comment 6 Xisco Faulí 2020-09-25 10:46:14 UTC Comment hidden (obsolete)
Comment 7 Noel Grandin 2020-09-25 13:05:42 UTC
This is some kind of pathological document with 100k+ stylesheets.

Tor, I seem to remember you did some kind of stylesheet deduplication work at one point? Can you dig out that commit?
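If many of those 100k+ stylesheets carry identical attributes, deduplicating them on import is one way to shrink the set. The Python sketch below only illustrates hash-based deduplication in general; it is not LibreOffice's import code. It assumes a style can be reduced to a hashable value (the `Style` dataclass, its fields, and `deduplicate` are hypothetical) and maps every style to the first identical one seen.

```python
from dataclasses import dataclass

# Hypothetical, simplified cell style. frozen=True makes instances
# hashable, and two styles with equal fields compare and hash equal.
@dataclass(frozen=True)
class Style:
    font: str
    size: float
    bold: bool
    number_format: str

def deduplicate(styles):
    """Map every style to the index of its first identical occurrence.

    Returns (unique, remap) where remap[i] is the index in `unique`
    that replaces styles[i]. A dict lookup per style keeps the pass
    at expected O(n) instead of O(n^2) pairwise comparisons.
    """
    index_of = {}
    unique = []
    remap = []
    for style in styles:
        idx = index_of.get(style)
        if idx is None:          # first time we see this attribute set
            idx = len(unique)
            index_of[style] = idx
            unique.append(style)
        remap.append(idx)
    return unique, remap

if __name__ == "__main__":
    # 100,000 styles that are really only 3 distinct ones.
    base = [
        Style("Calibri", 11.0, False, "General"),
        Style("Calibri", 11.0, True, "General"),
        Style("Arial", 10.0, False, "0.00"),
    ]
    styles = [base[i % 3] for i in range(100_000)]
    unique, remap = deduplicate(styles)
    print(len(unique))  # prints 3
```

Cells would then refer to `unique[remap[i]]`, so the per-style cost later in loading depends on the number of distinct styles rather than on how many were written to the file.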
Comment 8 Tor Lillqvist 2020-09-25 13:46:03 UTC
I don't remember. Looking at my commits, you might be thinking of ea55492a6e55290d92a59324b3cb31ed958981ab perhaps, but that is about de-duplicating conditional formats, not style-sheets.
Comment 9 Roman Kuznetsov 2021-04-12 14:01:09 UTC
It took more than 10 minutes, so I killed the process.

Version: 7.2.0.0.alpha0+ (x64) / LibreOffice Community
Build ID: 7a0e0a84a02f505200331c19b28d45e898cd5a12
CPU threads: 4; OS: Windows 10.0 Build 18363; UI render: Skia/Raster; VCL: win
Locale: ru-RU (ru_RU); UI: ru-RU
Calc: threaded Jumbo
Comment 10 Xisco Faulí 2021-05-03 09:58:43 UTC
Still reproducible in

Version: 7.2.0.0.alpha0+ / LibreOffice Community
Build ID: 95d8eb87eb20351a2e5795fc8c16653c0f58d6b4
CPU threads: 4; OS: Linux 5.7; UI render: default; VCL: gtk3
Locale: en-US (en_US.UTF-8); UI: en-US
Calc: threaded

I killed LibreOffice after

real	6m36,025s
user	6m32,261s
sys	0m5,961s
Comment 11 Xisco Faulí 2022-02-01 16:55:43 UTC
It seems the import time has improved. In

Version: 7.3.0.0.alpha1+ / LibreOffice Community
Build ID: 229123ccc6f90ebf66b3e659bebbd53f8a9bdd3a
CPU threads: 8; OS: Linux 5.10; UI render: default; VCL: gtk3
Locale: es-ES (es_ES.UTF-8); UI: en-US
Calc: threaded

I killed LibreOffice after

real	8m21,517s
user	8m22,417s
sys	0m0,761s

in

Version: 7.4.0.0.alpha0+ / LibreOffice Community
Build ID: 0b397d8ef0a2615e8e6202804ca2f6cb58436fa5
CPU threads: 8; OS: Linux 5.10; UI render: default; VCL: gtk3
Locale: es-ES (es_ES.UTF-8); UI: en-US
Calc: threaded

it takes

real	0m55,122s
user	0m56,226s
sys	0m0,642s
Comment 12 Roman Kuznetsov 2022-02-01 18:27:55 UTC
(In reply to Xisco Faulí from comment #11)
> it seems the import time improved. in
> 
> in
> 
> Version: 7.4.0.0.alpha0+ / LibreOffice Community
> Build ID: 0b397d8ef0a2615e8e6202804ca2f6cb58436fa5
> CPU threads: 8; OS: Linux 5.10; UI render: default; VCL: gtk3
> Locale: es-ES (es_ES.UTF-8); UI: en-US
> Calc: threaded
> 
> it takes
> 
> real	0m55,122s
> user	0m56,226s
> sys	0m0,642s

Confirmed: I can now open it. It took around 3 minutes on my ultra-mobile CPU.
Comment 13 Xisco Faulí 2022-02-01 20:37:00 UTC
I did a reverse bisection; the situation was improved by

author	Noel Grandin <noel.grandin@collabora.co.uk>	2022-01-28 16:32:35 +0200
committer	Noel Grandin <noel.grandin@collabora.co.uk>	2022-01-29 10:40:31 +0100
commit 7f3682ecb8a40fe85b6525be9e73d49d76bb308b (patch)
tree 9f71d9a89ecb3d579cea72ebbd454817ae9dad51
parent 91622529794f0c519bec2938513a756f660e849c (diff)
fix loading file with very large number of styles

which has also been backported to libreoffice-7-3.

@Julien, any chance you could get a new flamegraph?
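A reverse bisection looks for the first commit where a problem is fixed rather than introduced. The Python sketch below shows the search itself under stated assumptions: commits form one ordered list, and a hypothetical `is_fast(commit)` predicate (in practice, building that commit and timing the file open) is False before the fix and True from the fix onwards. `first_good` is an illustrative name, not part of any tool.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

def first_good(commits: Sequence[T], is_fast: Callable[[T], bool]) -> T:
    """Return the first commit for which is_fast() is True.

    Assumes commits are in chronological order, the first one is slow,
    the last one is fast, and the predicate flips exactly once.
    Each probe halves the range, so about log2(len(commits)) builds
    are needed.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] slow, commits[hi] fast
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_fast(commits[mid]):
            hi = mid   # fix is at mid or earlier
        else:
            lo = mid   # fix is after mid
    return commits[hi]

if __name__ == "__main__":
    commits = [f"c{i}" for i in range(1000)]
    # Pretend the fix landed in commit c637.
    print(first_good(commits, lambda c: int(c[1:]) >= 637))  # prints c637
```

With plain git, the same inverted search can be expressed by renaming the bisect terms, e.g. `git bisect start --term-old=slow --term-new=fast`, so that "new" marks commits where the file already opens quickly.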
Comment 14 Julien Nabet 2022-02-02 12:37:29 UTC
Created attachment 177990
Flamegraph

Here's a flamegraph retrieved on a Debian x86-64 PC with master sources updated today (bff5d1a68a8b6f5776c5edb4ef0f919af1194d03) and gen rendering.
Comment 15 Luboš Luňák 2022-02-20 14:40:23 UTC
*** Bug 81765 has been marked as a duplicate of this bug. ***
Comment 16 Luboš Luňák 2022-02-20 18:31:18 UTC

*** This bug has been marked as a duplicate of bug 81765 ***
Comment 17 Commit Notification 2022-02-20 19:47:00 UTC
Luboš Luňák committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/e81400196cd9c24be32552a19851da4162d51c7a

fix ScPatternAttr lookup hashing (tdf#135215)

It will be available in 7.4.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
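The commit title points at the hashing used when looking up `ScPatternAttr` entries. The commit itself is not described here, so the Python sketch below only illustrates the general failure mode, assuming the problem was hash collisions: when many distinct keys share one hash value, every insertion has to call `__eq__` against each colliding entry, so inserting n keys costs about n²/2 comparisons. `WeakKey`, `GoodKey`, and `count_comparisons` are hypothetical names, not LibreOffice code.

```python
class CountingKey:
    """Key wrapper that counts how often __eq__ is called."""
    comparisons = 0

    def __init__(self, attrs):
        self.attrs = tuple(attrs)

    def __eq__(self, other):
        CountingKey.comparisons += 1
        return isinstance(other, CountingKey) and self.attrs == other.attrs

class WeakKey(CountingKey):
    # Hashes only the first attribute, so keys that differ only in
    # later attributes all land in the same hash bucket.
    def __hash__(self):
        return hash(self.attrs[0])

class GoodKey(CountingKey):
    # Hashes every attribute that __eq__ compares.
    def __hash__(self):
        return hash(self.attrs)

def count_comparisons(key_cls, n):
    """Insert n distinct keys into a set and return the __eq__ call count."""
    CountingKey.comparisons = 0
    table = set()
    for i in range(n):
        # Same font, distinct second attribute: distinct patterns.
        table.add(key_cls(("Calibri", i)))
    return CountingKey.comparisons

if __name__ == "__main__":
    for cls in (WeakKey, GoodKey):
        print(cls.__name__, count_comparisons(cls, 1000))
```

On CPython, the `WeakKey` count is expected to be in the hundreds of thousands for 1000 keys, while the `GoodKey` count should be near zero, because a set only calls `__eq__` when stored and probed hash values match.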
Comment 18 Xisco Faulí 2022-02-21 10:10:54 UTC
it takes

real	0m21,637s
user	0m22,673s
sys	0m0,622s


in

Version: 7.4.0.0.alpha0+ / LibreOffice Community
Build ID: 0723b41bed9bb4ad50d2993744a60177966d1a21
CPU threads: 8; OS: Linux 5.10; UI render: default; VCL: gtk3
Locale: es-ES (es_ES.UTF-8); UI: en-US
Calc: threaded

Very nice improvement.
@Luboš Luňák, thanks for fixing this issue!!
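The timings in this report are bash `time` output, which here uses a comma as decimal separator because it follows the shell's locale. The Python sketch below (the `parse_time` helper is hypothetical, not part of any tool) converts such values to seconds and compares the two completed 7.4.0.0.alpha0+ runs from comments 11 and 18. Runs that were killed (6m36s, 8m21s) only give lower bounds, so a ratio against them would understate the gain.

```python
import re

# Matches values such as "0m55,122s" or "0m55.122s".
_TIME_RE = re.compile(r"^(\d+)m(\d+)[,.](\d+)s$")

def parse_time(value: str) -> float:
    """Convert a bash `time` value like '8m21,517s' to seconds.

    Accepts ',' or '.' as decimal separator, since bash formats the
    value according to the locale.
    """
    m = _TIME_RE.match(value.strip())
    if m is None:
        raise ValueError(f"unrecognised time value: {value!r}")
    minutes, seconds, fraction = m.groups()
    return int(minutes) * 60 + float(f"{seconds}.{fraction}")

if __name__ == "__main__":
    before = parse_time("0m55,122s")  # 'real' time, comment 11
    after = parse_time("0m21,637s")   # 'real' time, comment 18
    print(f"{before / after:.2f}x faster")  # prints 2.55x faster
```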
Comment 19 Commit Notification 2022-02-21 11:30:13 UTC
Luboš Luňák committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/b26c34267cdf9d0b7ba4e2fda7ae706d5cd76299

replace SfxPoolItem::LookupHashCode() with Lookup() (tdf#135215)

It will be available in 7.4.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.