Bug 141182 - Excessively large ODS spreadsheet hangs LibreOffice
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version: 7.1.1.2 release (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard: target:7.4.0
Keywords: perf, wantBacktrace
Depends on:
Blocks:
 
Reported: 2021-03-22 17:18 UTC by Andrej Shadura
Modified: 2022-03-11 09:44 UTC

See Also:
Crash report or crash signature:
Regression By:


Attachments
Vykaz_opatrenie_2_02_2021.ods (938.34 KB, application/vnd.oasis.opendocument.spreadsheet)
2021-03-22 17:19 UTC, Andrej Shadura
Merged Cell removed (18.12 KB, application/vnd.oasis.opendocument.spreadsheet)
2021-03-22 23:26 UTC, [REDACTED]

Description Andrej Shadura 2021-03-22 17:18:41 UTC
Description:
I was going to fill in a government form at https://www.pomahameludom.sk/ called Vykaz_opatrenie_2_02_2021.ods (the link is called "Výkaz vo formáte ODS (961 KB)" in the section "2) SZČO, ktorým poklesli tržby") and it repeatedly made LO unresponsive. Having inspected the file, I found a 227408K content.xml file inside, mostly consisting of empty rows.
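
The oversized member can be spotted without extracting anything, because an .ods file is a ZIP archive and its directory records each member's uncompressed size. A minimal sketch, assuming only Python's standard zipfile module; the function name list_members is made up for illustration:

```python
import zipfile


def list_members(ods_path):
    """Return (name, uncompressed_bytes, compressed_bytes) per archive member.

    Only the ZIP directory is read, so nothing is decompressed.
    Results are sorted with the largest uncompressed member first.
    """
    with zipfile.ZipFile(ods_path) as archive:
        rows = [
            (info.filename, info.file_size, info.compress_size)
            for info in archive.infolist()
        ]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

Calling list_members("Vykaz_opatrenie_2_02_2021.ods") on the attached file would be expected to list content.xml first, given the size reported above.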

Steps to Reproduce:
1. Open Vykaz_opatrenie_2_02_2021.ods


Actual Results:
LibreOffice hangs

Expected Results:
Spreadsheet opens


Reproducible: Always


User Profile Reset: No



Additional Info:
N/A
Comment 1 Andrej Shadura 2021-03-22 17:19:25 UTC
Created attachment 170639 [details]
Vykaz_opatrenie_2_02_2021.ods
Comment 2 Xisco Faulí 2021-03-22 20:37:05 UTC
it takes

real	1m0,949s
user	1m4,331s
sys	0m1,198s


in

Version: 7.2.0.0.alpha0+ / LibreOffice Community
Build ID: 5262a9e88037decc26da84e7fa62f2955d4cdb85
CPU threads: 4; OS: Linux 5.7; UI render: default; VCL: gtk3
Locale: en-US (en_US.UTF-8); UI: en-US
Calc: threaded

to open.
How long does it take for you ?
Comment 3 [REDACTED] 2021-03-22 23:26:14 UTC
Created attachment 170642 [details]
Merged Cell removed

The problem is caused by one huge merged cell, which merges range B20:I1048576
into one single cell. Removing that merged cell finally ends up in content.xml of roughly 85 kB. See attachment.
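
For a sense of scale, the number of cell positions covered by that merge can be worked out from the range reference. A small sketch; the helper names are hypothetical, and it only counts positions in the range without claiming anything about how the importer or the file format stores them:

```python
import re

# Matches a plain A1-style reference such as "B20" (no sheet name, no "$").
_CELL = re.compile(r"^([A-Z]+)([0-9]+)$")


def column_index(letters):
    """Convert column letters to a 1-based index (A -> 1, Z -> 26, AA -> 27)."""
    index = 0
    for ch in letters:
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def range_size(a1_range):
    """Return (columns, rows, cells) for a range like 'B20:I1048576'."""
    start, end = a1_range.split(":")
    col1, row1 = _CELL.match(start).groups()
    col2, row2 = _CELL.match(end).groups()
    cols = column_index(col2) - column_index(col1) + 1
    rows = int(row2) - int(row1) + 1
    return cols, rows, cols * rows


print(range_size("B20:I1048576"))  # (8, 1048557, 8388456)
```

So the single merged cell spans 8 columns by 1,048,557 rows, about 8.4 million cell positions.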
Comment 4 [REDACTED] 2021-03-22 23:36:52 UTC
(In reply to Uwe Auer from comment #3)

> into one single cell. Removing that merged cell finally ends up in
> content.xml of roughly 85 kB. See attachment.

Commenting on my own comment: "Removing" is an imprecise phrase here. What I meant is that I split up the merged range B20:I1048576 and then merged the range H20:I20 into a new merged cell.
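
Merges like this can also be located without opening the document in a GUI. The sketch below streams through content.xml and reports cells whose row span is suspiciously large. It assumes merged cells carry the table:number-rows-spanned and table:number-columns-spanned attributes defined by ODF; the function name and the min_rows threshold are made up for illustration:

```python
import xml.etree.ElementTree as ET

TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
CELL_TAG = f"{{{TABLE_NS}}}table-cell"
ROWS_SPANNED = f"{{{TABLE_NS}}}number-rows-spanned"
COLS_SPANNED = f"{{{TABLE_NS}}}number-columns-spanned"


def find_large_merges(source, min_rows=1000):
    """Yield (rows_spanned, columns_spanned) for cells spanning >= min_rows rows.

    source is a path or a binary file object containing content.xml.
    The document is parsed incrementally rather than loaded in one piece.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == CELL_TAG:
            rows = int(elem.get(ROWS_SPANNED, "1"))
            if rows >= min_rows:
                yield rows, int(elem.get(COLS_SPANNED, "1"))
        # Drop text, attributes and children of finished elements to limit
        # memory use; parents still keep the (now empty) child objects.
        elem.clear()
```

The member can be read straight from the archive, for example with zipfile.ZipFile(path).open("content.xml") passed as source, as long as the generator is consumed while that file object is still open.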
Comment 5 m.a.riosv 2021-03-23 01:04:34 UTC
Excel 365 is not able to open the file.
Comment 6 Andrej Shadura 2021-03-23 08:05:11 UTC
(In reply to Xisco Faulí from comment #2)
> to open.
> How long does it take for you ?

In the end it *did* open (after about five minutes of LibreOffice being just a grey window, not reacting to clicks and not updating itself), but because the sheet was password-protected, I was unable to edit it from LibreOffice directly. I had to unzip it and remove the password protection with sed (vim loaded the file but became unusable).
Comment 7 Commit Notification 2022-03-05 16:25:29 UTC
Luboš Luňák committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/be99b23809687ca1143c8fe8d4ec3cfe6703c363

don't bother scanning nonexistent data (tdf#141182)

It will be available in 7.4.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 8 Roman Kuznetsov 2022-03-06 16:53:15 UTC
It took 30 sec to open the file in

Version: 7.4.0.0.alpha0+ / LibreOffice Community
Build ID: 7ac19fbce8a35f559eebb879cd0f232bfc95e703
CPU threads: 4; OS: Mac OS X 12.1; UI render: default; VCL: osx
Locale: ru-RU (ru_RU.UTF-8); UI: en-US
Calc: threaded Jumbo

This is on a 2014 Mac mini with a 2-core Intel i5.
Comment 9 Xisco Faulí 2022-03-08 12:39:03 UTC
before the fix it takes

real	3m50,839s
user	3m56,049s
sys	0m0,957s

and after it takes

real	1m5,081s
user	1m11,620s
sys	0m0,850s
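
To compare such measurements numerically, the wall-clock figures can be converted to seconds. A small sketch, assuming the shell's `time` format shown above with a comma or dot as decimal separator; to_seconds is a hypothetical helper name:

```python
import re

# Matches values such as "3m50,839s" or "1m5.081s".
_TIME = re.compile(r"^(\d+)m(\d+)[.,](\d+)s$")


def to_seconds(stamp):
    """Convert a `time`-style duration like '3m50,839s' to float seconds."""
    match = _TIME.match(stamp.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {stamp!r}")
    minutes, seconds, fraction = match.groups()
    return int(minutes) * 60 + float(f"{seconds}.{fraction}")


before = to_seconds("3m50,839s")  # "real" time before the fix
after = to_seconds("1m5,081s")    # "real" time after the fix
print(f"before={before:.3f}s after={after:.3f}s ratio={before / after:.2f}x")
```

For the two "real" values above, the ratio comes out at roughly 3.5, i.e. the file opens about three and a half times faster after the fix on this machine.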
Comment 10 Xisco Faulí 2022-03-08 13:32:39 UTC
Unfortunately there was a performance regression before the fix introduced by:

author	Justin Luth <justin_luth@sil.org>	2021-12-08 14:22:01 +0200
committer	Justin Luth <jluth@mail.com>	2021-12-11 07:20:35 +0100
commit 297ab561c6754f89326a1e8ce1751233669578d7 (patch)
tree b942b5a59beb4def1afd8ae3c3ec7a7172d1d2cb
parent 489d7298d2e609ee5900f05ba0064845a7a551ce (diff)
tdf#128895 sc xmlimport: create enough dynamic cols if props

Before this commit, the import time was

real	0m41,846s
user	0m47,622s
sys	0m0,884s

and after

real	3m49,591s
user	3m55,001s
sys	0m1,028s

so even after Luboš' fix, it takes longer than before 297ab561c6754f89326a1e8ce1751233669578d7. Reopening...
Comment 11 Mike Kaganski 2022-03-09 08:38:18 UTC
(In reply to Xisco Faulí from comment #10)
> Unfortunately there was a performance regression before the fix introduced
> by:
> 
> commit 297ab561c6754f89326a1e8ce1751233669578d7 (patch)
> tdf#128895 sc xmlimport: create enough dynamic cols if props
> 
> so even after Luboš' fix, it takes longer than before
> 297ab561c6754f89326a1e8ce1751233669578d7. Reopening...

Then this is the Jumbo Sheets problem, and it (a) needs an own issue (this one is fixed, the mentioned regression is separate), and (b) it needs to block bug 133764.
Comment 12 Justin L 2022-03-09 12:37:25 UTC
(In reply to Mike Kaganski from comment #11)
> Then this is the Jumbo Sheets problem, and it (a) needs an own issue (this
> one is fixed, the mentioned regression is separate), and (b) it needs to
> block bug 133764.

Well, Xisco was seeing this without enabling Jumbo mode, so I don't think it is necessarily a Jumbo issue (although certainly that would make it even worse). And my patch wasn't a jumbo-fix either.

I don't see my commit as a true regression, but the end result just shows another case where calc needs to improve performance.

Having a 200+MB content.xml file is a bit of a failure in its own right. On the machinery I have to work with, I haven't found any GUI tool that lets me inspect that file (without crashing on load). So I'm washing my hands of this one. I have no problems having my patch reverted if that is deemed valuable at this time.
Comment 13 Luboš Luňák 2022-03-09 18:29:07 UTC
(In reply to Justin L from comment #12)
> I haven't found any GUI tool that lets me inspect that file (without crashing on load).

'xmllint --format file > file2' and view it as a normal text file.

> I have no problems having my patch reverted if that is deemed valuable at this time.

I see no reason to revert it; correctness is more important than speed. Moreover, I cannot reproduce the slowdown: the file loads in 20s on my Ryzen 2500U.
Comment 14 Luboš Luňák 2022-03-11 09:11:17 UTC
And I've also pushed a fix that makes the workaround from 297ab561c6754f89326a1e8ce1751233669578d7 unnecessary. So I consider this fixed one way or another, if there's still some related problem, please file a new bugreport.
Comment 15 Xisco Faulí 2022-03-11 09:44:39 UTC
(In reply to Luboš Luňák from comment #14)
> And I've also pushed a fix that makes the workaround from
> 297ab561c6754f89326a1e8ce1751233669578d7 unnecessary. So I consider this
> fixed one way or another, if there's still some related problem, please file
> a new bugreport.

It takes

real	0m13,633s
user	0m16,895s
sys	0m0,429s

in

Version: 7.4.0.0.alpha0+ / LibreOffice Community
Build ID: dddee125cc32f1ad5228e598a7de04e9654e65c1
CPU threads: 8; OS: Linux 5.10; UI render: default; VCL: gtk3
Locale: es-ES (es_ES.UTF-8); UI: en-US
Calc: threaded

Thanks for fixing this issue!!