Bug 105036 - xlsx file with huge drawing.xml will not open
Summary: xlsx file with huge drawing.xml will not open
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version (earliest affected): 5.2.4.2 release
Hardware: All Windows (All)
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: filter:xlsx, perf
Depends on:
Blocks: Calc-large-spreadsheets
Reported: 2017-01-02 03:39 UTC by Howard Houliston
Modified: 2022-02-18 10:08 UTC
CC List: 6 users

See Also:
Crash report or crash signature:


Attachments
An xlsx file with four tabs (611.99 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2017-01-02 03:41 UTC, Howard Houliston

Description Howard Houliston 2017-01-02 03:39:31 UTC
Description:
1. Instead of opening, the attached xlsx file makes the open progress bar reach about 80% and then stop.
2. This happens whether the file is opened by double-click or by starting Calc and using File > Open.
3. It happens with version 5.2.4.2 on Win10 and version 5.1.4.2 on Linux Mint.
4. The file can be opened with Excel (2010).

Steps to Reproduce:
1. Attempt to open the file in whatever way you choose.


Actual Results:  
The open progress bar quickly gets to about 80% and stops. Task Manager showed that CPU usage had gone from around 30% to 60%, with very little disk activity. CPU usage went back down to 30% when Calc was closed.

Expected Results:
Calc should have shown the contents of the spreadsheet


Reproducible: Always

User Profile Reset: No

Additional Info:


User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0
Comment 1 Howard Houliston 2017-01-02 03:41:15 UTC
Created attachment 130081
An xlsx file with four tabs
Comment 2 Bartosz 2017-01-03 01:00:33 UTC
In that .xlsx document the drawing.xml files are huge (about 350000 lines per sheet).
We should optimize opening of such big files.
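
To check how large the drawing parts of a given .xlsx actually are, something like the following rough sketch could be used. It assumes the drawing parts sit under the conventional xl/drawings/ path inside the package (strictly, the location is defined by the package relationships), and the function name drawing_part_sizes is made up for illustration. The line count is only meaningful if the XML contains line breaks; the byte size is the more reliable figure.

```python
import zipfile


def drawing_part_sizes(xlsx_path):
    """Return (part name, uncompressed bytes, line count) per drawing part, largest first."""
    results = []
    with zipfile.ZipFile(xlsx_path) as zf:
        for info in zf.infolist():
            name = info.filename
            # Assumption: drawing parts live at the conventional xl/drawings/*.xml
            # location; relationship files (*.rels) and VML drawings are skipped.
            if name.startswith("xl/drawings/") and name.endswith(".xml"):
                data = zf.read(name)
                results.append((name, info.file_size, data.count(b"\n") + 1))
    return sorted(results, key=lambda r: r[1], reverse=True)
```

Calling drawing_part_sizes("some-file.xlsx") (the file name is a placeholder) and printing the first few entries should show quickly whether a slow import correlates with oversized drawing XML.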
Comment 3 QA Administrators 2018-01-04 03:35:37 UTC Comment hidden (obsolete)
Comment 4 Andrew 2018-04-14 06:21:02 UTC
This bug is still present in

Version: 6.0.3.2 (x64)
Build ID: 8f48d515416608e3a835360314dac7e47fd0b821
CPU threads: 4; OS: Windows 6.3; UI render: GL; 
Locale: en-GB (en_GB); Calc: CL
Comment 5 Roman Kuznetsov 2019-02-28 10:09:20 UTC
Still reproducible in

Version: 6.3.0.0.alpha0+
Build ID: c57dc7d41bd62f933cffab6131edb7252606382d
CPU threads: 4; OS: Windows 6.1; UI render: default; VCL: win; 
Locale: ru-RU (ru_RU); UI-Language: en-US
Calc: threaded
Comment 6 Timur 2020-10-12 06:51:26 UTC
*** Bug 126185 has been marked as a duplicate of this bug. ***
Comment 7 Timur 2020-10-12 06:55:46 UTC
Another example is XLSX attachment 152496 from bug 126185, with perf flamegraph attachment 166254.
Comment 8 Roman Kuznetsov 2021-04-12 13:44:17 UTC Comment hidden (obsolete)
Comment 9 Xisco Faulí 2021-04-12 15:32:45 UTC
(In reply to Roman Kuznetsov from comment #8)
> OK, the XLSX file from attach has more than 1024 columns. If we enable our
> Jumbo sheet feature, then Calc takes about 10 sec to open the file
> 
> Version: 7.2.0.0.alpha0+ (x64) / LibreOffice Community
> Build ID: 7a0e0a84a02f505200331c19b28d45e898cd5a12
> CPU threads: 4; OS: Windows 10.0 Build 18363; UI render: Skia/Raster; VCL:
> win
> Locale: ru-RU (ru_RU); UI: ru-RU
> Calc: threaded Jumbo
> 
> Xisco, can we close it as WFM or we should wait when the Jumbo sheet feature
> become a stable?

I think you should try without the jumbo sheet feature. It is faster with the feature enabled because the charts are not imported at all; see bug 134553.
Comment 10 Roman Kuznetsov 2021-04-12 16:01:51 UTC
(In reply to Xisco Faulí from comment #9)
> (In reply to Roman Kuznetsov from comment #8)
> > OK, the XLSX file from attach has more than 1024 columns. If we enable our
> > Jumbo sheet feature, then Calc takes about 10 sec to open the file
> > 
> > Version: 7.2.0.0.alpha0+ (x64) / LibreOffice Community
> > Build ID: 7a0e0a84a02f505200331c19b28d45e898cd5a12
> > CPU threads: 4; OS: Windows 10.0 Build 18363; UI render: Skia/Raster; VCL:
> > win
> > Locale: ru-RU (ru_RU); UI: ru-RU
> > Calc: threaded Jumbo
> > 
> > Xisco, can we close it as WFM or we should wait when the Jumbo sheet feature
> > become a stable?
> 
> I think you should try without using jumbo sheet feature. The reason why
> it's faster using the feature it's because the charts are not imported at
> all, see bug 134553

Without the Jumbo feature I get an error about more than 1024 columns, and the file takes a long time to open.
Comment 11 Luboš Luňák 2022-02-17 12:02:57 UTC
I see no problem with current master, so assuming one of my recent fixes took care of this.
Comment 12 Xisco Faulí 2022-02-17 16:02:13 UTC
(In reply to Luboš Luňák from comment #11)
> I see no problem with current master, so assuming one of my recent fixes
> took care of this.

Hi Luboš,
Actually, the import time regressed recently.
In a current master 

Version: 7.4.0.0.alpha0+ / LibreOffice Community
Build ID: 8942956e05f2208ffb666a2118f5db092c30ce6a
CPU threads: 8; OS: Linux 5.10; UI render: default; VCL: gtk3
Locale: en-GB (es_ES.UTF-8); UI: en-US
Calc: threaded Jumbo

it takes (measured with OOO_EXIT_POST_STARTUP=1)

real	0m37,653s
user	0m37,568s
sys	0m1,131s

while before

author	Luboš Luňák <l.lunak@collabora.com>	2022-02-07 18:06:12 +0100
committer	Luboš Luňák <l.lunak@collabora.com>	2022-02-08 12:26:18 +0100
commit fd4384c59eefc8f34d5fe90929d7cb44ee15b27f
tree 8543618389781675cafc6fabf88165d6e99b80ab
parent abc32f115ffd8df20ed122f6a769027b68da13f2
avoid overflows in ScFlatUInt16RowSegments

it takes

real	0m6,427s
user	0m7,055s
sys	0m0,368s
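
For comparing the two measurements above, a small helper like this sketch can turn the shell's time output into seconds. It accepts either a comma or a period as the decimal mark, since the figures here come from a locale that uses commas. The name parse_time_output is hypothetical, and the resulting ratio is plain arithmetic on the reported numbers; it says nothing about the cause of the difference.

```python
import re

# Matches lines such as "real<TAB>0m37,653s"; the decimal mark may be ',' or '.'.
_TIME_RE = re.compile(r"^(real|user|sys)\s+(\d+)m(\d+)[.,](\d+)s$")


def parse_time_output(text):
    """Parse shell `time` output into a dict of seconds keyed by real/user/sys."""
    result = {}
    for line in text.splitlines():
        m = _TIME_RE.match(line.strip())
        if m:
            label, minutes, secs, frac = m.groups()
            result[label] = int(minutes) * 60 + float(f"{secs}.{frac}")
    return result


# Figures copied from the two measurements in this comment.
after = parse_time_output("real\t0m37,653s\nuser\t0m37,568s\nsys\t0m1,131s")
before = parse_time_output("real\t0m6,427s\nuser\t0m7,055s\nsys\t0m0,368s")
slowdown = after["real"] / before["real"]
print(f"wall-clock ratio: {slowdown:.2f}x")
```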
Comment 13 Kevin Suo 2022-02-17 16:24:09 UTC Comment hidden (obsolete)
Comment 14 Xisco Faulí 2022-02-17 16:25:51 UTC
(In reply to Kevin Suo from comment #13)
> (In reply to Xisco Faulí from comment #12)
> Could you confirm you are not using a dbgutil build for such test on the
> recent master build, and did the same before and after that commit?

I checked with the bisect repository, so it is not a dbgutil build.
Comment 15 Luboš Luňák 2022-02-18 07:08:49 UTC
37s seems reasonable (it's 32s here), 6s is way too short. Given that the commit is a bugfix, presumably comment #9 applies.
Comment 16 Xisco Faulí 2022-02-18 10:05:02 UTC
(In reply to Luboš Luňák from comment #15)
> 37s seems reasonable (it's 32s here), 6s is way too short. Given that the
> commit is a bugfix, presumably comment #9 applies.

good point, let me try
Comment 17 Xisco Faulí 2022-02-18 10:08:20 UTC
it takes

real	0m32,677s
user	0m30,378s
sys	0m1,140s


in

Version: 7.4.0.0.alpha0+ / LibreOffice Community
Build ID: d697c96178d13725470192d63bd4fa1c202d0d2e
CPU threads: 8; OS: Linux 5.10; UI render: default; VCL: gtk3
Locale: es-ES (es_ES.UTF-8); UI: en-US
Calc: threaded

So the time is similar with and without jumbo sheets enabled. Sorry for the noise.
@Luboš, thanks for your work on this.