Bug 163872 - FILEOPEN Long time to open xlsx file (ScInterpreter::ScIfJump is very expensive to run)
Summary: FILEOPEN Long time to open xlsx file (ScInterpreter::ScIfJump is very expensi...
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
24.2.0.0 alpha0+
Hardware: All All
: medium normal
Assignee: Not Assigned
URL:
Whiteboard: target:25.2.0
Keywords: bibisected, bisected, perf, regression
Depends on:
Blocks: Performance
  Show dependency treegraph
 
Reported: 2024-11-12 21:32 UTC by raal
Modified: 2024-11-21 06:17 UTC (History)
4 users (show)

See Also:
Crash report or crash signature:


Attachments
163872_reduced.xlsx: reduced to sheet 75 - the slowest page by far (at least to that point) (67.83 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2024-11-19 17:40 UTC, Justin L
Details
163872_perf.svg: an incredible amount of time is spent in mdds _gnu.debug (2.69 MB, image/svg+xml)
2024-11-20 19:29 UTC, Justin L
Details

Note You need to log in before you can comment on or make changes to this bug.
Description raal 2024-11-12 21:32:25 UTC
Description:
Cannot open xlsx file.
Version: 25.2.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 7a7ba9cbee91485a9254949d1594352b3629c070
CPU threads: 4; OS: Linux 6.8; UI render: default; VCL: gtk3
Locale: cs-CZ (cs_CZ.UTF-8); UI: en-US
Calc: threaded

Steps to Reproduce:
1.open file from bug 163865


Actual Results:
LO freeze, file do not open 

Expected Results:
Open file


Reproducible: Always


User Profile Reset: No

Additional Info:
.
Comment 1 raal 2024-11-12 21:36:21 UTC
This seems to have begun at the below commit in bibisect repository/OS linux-64-24.2.
Adding Cc: to Noel Grandin ; Could you possibly take a look at this one?
Thanks
 b3c14523e3d622babc44ea6b0bc7e5c24eb14c00 is the first bad commit
commit b3c14523e3d622babc44ea6b0bc7e5c24eb14c00
Author: Jenkins Build User <tdf@pollux.tdf>
Date:   Tue Jul 11 19:46:40 2023 +0200

    source d15c4caabaa21e0efe3a08ffbe145390e802bab9

140260: tdf#123026 xlsx import: recalc optimal row height on import | https://gerrit.libreoffice.org/c/core/+/140260
Comment 2 m_a_riosv 2024-11-12 22:56:08 UTC
It opens for me with
Version: 24.8.3.2 (X86_64) / LibreOffice Community
Build ID: 48a6bac9e7e268aeb4c3483fcf825c94556d9f92
CPU threads: 16; OS: Windows 11 X86_64 (10.0 build 26100); UI render: Skia/Raster; VCL: win
Locale: es-ES (es_ES); UI: en-US
Calc: CL threaded

Takes a lot, but maybe is something with my configuration, it has external links.
And I couldn't reproduce the issue reported on tdf#163865
Comment 3 raal 2024-11-13 07:12:11 UTC
Thanks Miguel, you are right, it opens after some time. Before the commit it open at 2 seconds, now after 30 and CPU is at 100 %
Comment 4 Noel Grandin 2024-11-14 07:27:42 UTC
Adding Justin, maybe he has an idea for limiting the scope of the tdf#123026 fix to avoid this
Comment 5 Justin L 2024-11-16 20:54:47 UTC
(In reply to raal from comment #0)
> 1.open file from bug 163865
That would be ca7e9fa4-7667-4af3-a64d-8d1db535dd62-Ukraine_Support_Tracker_Release_19.xlsx (attachment 197568 [details])

This has 82 sheets, mostly with a few charts. The biggest sheet (Bilateral Assistance, MAIN DATA) has approximately 4000 rows of data.

I confirmed that setting the global config to NOT recalculate row height on file-open lets this document load. So that at least is a workaround available with 24.8. See bug 124098.

For me, it failed to load the worksheet - I stopped waiting after 30 minutes. I expect it has triggered either an error or a terribly unoptimized path. I see at least one path through MaybeInterpret() which loops through 1 million rows in ScIfJump.

It seems to be entering some kind of loop on nTab 45, column 2 in the first 54 rows. That is tab A.19 - which in LO 7.5 takes a minute or two to switch to. At some point it runs HandleRefArrayForParallelism which returns bAllowThreading and all the time is spend in here as it methodically chews through each row in tab 75 (Refugees recorded) column J one by one until 1647.

So we must have terrible performance evaluating 
=INDEX($E:$E,MATCH($J$10,IF($B:$B=$B1648,$I:$I),1))
Comment 6 Justin L 2024-11-19 17:40:21 UTC
Created attachment 197689 [details]
163872_reduced.xlsx: reduced to sheet 75 - the slowest page by far (at least to that point)

I'm amazed to find that we might be doing all this processing on rows that are flagged as manual height, so surely lots of optimization potential there. However, this worst sheet doesn't specify customHeight, so it does need to be analysed.
Comment 7 Justin L 2024-11-20 19:29:20 UTC
Created attachment 197702 [details]
163872_perf.svg: an incredible amount of time is spent in mdds _gnu.debug

So things are super slow for the testing developer, but nearly so bad for the end user.
Comment 8 Commit Notification 2024-11-21 01:37:34 UTC
Justin Luth committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/1bc4835bd00d12bee93671a01423f4646f4622f0

tdf#163872 ScTable::SetOptimalHeight: optimize if all rows ManualSize

It will be available in 25.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 9 Justin L 2024-11-21 01:43:46 UTC
Comment 8 doesn't solve this bug, but it does help with some of the sheets.

I don't know how to optimize ScInterpreter::ScIfJump().
I suspect it is very dangerous to do so. A person would need to have both a good understanding of what =IF() is supposed to be capable of doing, and also clearly understand how ScInterpreter works.