Bug 162126 - Performance regression for a huge XLSX file opening
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version (earliest affected): 25.2.0.0 alpha0+
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: bibisected, bisected, filter:xlsx, haveBacktrace, perf, regression
Depends on:
Blocks: XLSX
Reported: 2024-07-21 07:41 UTC by Roman Kuznetsov
Modified: 2024-07-23 11:30 UTC

Attachments
Perf flamegraph (2.82 MB, image/svg+xml)
2024-07-23 11:30 UTC, Buovjaga

Description Roman Kuznetsov 2024-07-21 07:41:53 UTC
Description:
Opening a huge XLSX file has become much slower (performance regression).

This is a follow-up to bug 146735.

Steps to Reproduce:
1. Download the XLSX file from https://disk.yandex.ru/d/wQjGswYE_m2D0w
2. Try to open it in the current 25.2 master build
3. It takes several minutes
4. Try to open the same file in version 7.4.0.3
5. It took only around 30 seconds for me

Actual Results:
LibreOffice takes several minutes to open the file

Expected Results:
LibreOffice opens the file in only about 30 seconds


Reproducible: Always


User Profile Reset: No

Additional Info:
Fast opening

Version: 7.4.0.3 (x64) / LibreOffice Community
Build ID: f85e47c08ddd19c015c0114a68350214f7066f5a
CPU threads: 16; OS: Windows 10.0 Build 19045; UI render: Skia/Vulkan; VCL: win
Locale: ru-RU (ru_RU); UI: ru-RU
Calc: CL

Slow opening

Version: 25.2.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: ccc3996cfcbebe14e9d5f3511906cfc64ddf3452
CPU threads: 16; OS: Windows 10 X86_64 (10.0 build 19045); UI render: Skia/Raster; VCL: win
Locale: ru-RU (ru_RU); UI: ru-RU
Calc: CL threaded
Comment 1 Roman Kuznetsov 2024-07-21 07:43:59 UTC
Julien, hi. Could you please test it and, if you can confirm the problem, make a perf flamegraph for this case? Thanks.
Comment 2 Buovjaga 2024-07-21 13:50:28 UTC
This seems difficult to bisect.

I adapted a script from https://wiki.documentfoundation.org/QA/Bibisect/Automation like this:

#!/usr/bin/env bash
# bisect_perf.sh  Called with 3+1 parameters: "file.ext" good-time timeout-time [fix]
fileext=$1; goodtime=$2; timeouttime=$3; regfix=$4
# Exit codes for "git bisect run": 0 marks the commit good, 1 marks it bad.
# With "fix" as the fourth parameter the two codes are swapped.
if [[ $regfix = "fix" ]]; then before=1; after=0; else before=0; after=1; fi
# timeout takes seconds by default
timeout "$timeouttime" bash -c ' OOO_EXIT_POST_STARTUP=1 SAL_USE_VCLPLUGIN=gen ./instdir/program/soffice --norestore "$@" ' bash /home/user/libobugs/"$fileext"
status=$?; echo "$status"
# SECONDS is a bash builtin: seconds elapsed since this script started
exptime=$SECONDS
if [ "$status" = 124 ]; then
    # 124: timeout killed the run; reported as the "before" state
    echo "timeout in $timeouttime, done in $exptime sec"; exit $before
elif [ "$status" = 134 ]; then
    # 134: the process aborted (128 + SIGABRT)
    echo "done in $exptime sec"; exit $after
elif [ "$status" = 42 ]; then
    if (( exptime < goodtime )); then
        echo "opens fast in $exptime sec"; exit $before
    elif (( exptime > goodtime )); then
        echo "opens slow in $exptime sec"; exit $after
    else
        # exptime equals goodtime: falls through without an explicit exit code
        echo "error"
    fi
else
    echo "other exit code"
    git reset --hard
fi
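
As written, the script reaches its end without an explicit exit code in the "error" and "other exit code" branches, so git bisect run normally sees status 0 and marks the commit good. A minimal sketch of an alternative mapping follows. The function name bisect_verdict is hypothetical, and treating a timeout as bad and any unexpected status as "skip" (exit code 125) is an assumption, not what the script above does.

```shell
# Hypothetical helper, not part of the script above.
# bisect_verdict STATUS ELAPSED GOODTIME
# Prints an exit code for "git bisect run":
#   0   good (the file opened within GOODTIME seconds)
#   1   bad  (the file opened too slowly, or the run timed out)
#   125 skip (the run ended in a way that cannot be judged)
bisect_verdict() {
    bv_status=$1
    bv_elapsed=$2
    bv_goodtime=$3
    if [ "$bv_status" -eq 124 ]; then
        # Assumption: a run killed by timeout counts as slow, hence bad.
        echo 1
    elif [ "$bv_status" -eq 42 ]; then
        # 42 is the status the script above treats as a completed run.
        if [ "$bv_elapsed" -le "$bv_goodtime" ]; then
            echo 0
        else
            echo 1
        fi
    else
        # Assumption: anything else (including an abort) is skipped.
        echo 125
    fi
}
```

Inside a bisect script this could be used as: exit "$(bisect_verdict "$status" "$exptime" "$goodtime")"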

I then started a run with bisect_perf.sh in the Linux 24.2 repo, which seemed to be the one where the slowdown appeared:

git bisect start master oldest && git bisect run /path/to/bisect_perf.sh "Тест.xlsx" 80 240

The limit for a good result was 80 seconds and the timeout was 4 minutes. However, the results of the actual bisect run were not reliable. The bad time I originally saw was about 2 min 30 s, but during the bisect run I got many timeouts, meaning runs of more than 4 minutes. Repeating the run a couple of times and comparing the blamed commit against the previous one showed that the result was incorrect.
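
One way to make such measurements less sensitive to a single noisy run is to time each build several times and compare the median against the limit. This is only a sketch: median_of and time_runs are hypothetical helper names, not part of the wiki script, and the timing uses whole-second wall-clock time from date. The exit status of the timed command is ignored here.

```shell
# Hypothetical helpers for repeated timing; not part of the wiki script.

# median_of N1 N2 ...
# Prints the median of the integer arguments. For an even count it
# prints the lower of the two middle values.
median_of() {
    mid=$(( ($# + 1) / 2 ))
    printf '%s\n' "$@" | sort -n | sed -n "${mid}p"
}

# time_runs COUNT CMD [ARGS...]
# Runs CMD COUNT times and prints the elapsed wall-clock seconds of
# each run on its own line. Output of CMD is discarded.
time_runs() {
    runs=$1
    shift
    i=0
    while [ "$i" -lt "$runs" ]; do
        start=$(date +%s)
        "$@" >/dev/null 2>&1
        end=$(date +%s)
        echo $(( end - start ))
        i=$(( i + 1 ))
    done
}

# Example use (paths and options as in the script above):
#   median_of $(time_runs 3 env OOO_EXIT_POST_STARTUP=1 timeout 240 \
#       ./instdir/program/soffice --norestore "/home/user/libobugs/Тест.xlsx")
```

The median of three or five runs could then be compared with the 80-second limit instead of a single measurement.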

I may continue investigating this.
Comment 3 Buovjaga 2024-07-21 15:22:20 UTC
Ok, checking with linux-64-25.2 repo, there was a further pessimisation, about +30 seconds. This time, I could bisect it reliably. The blamed commit is
 290c8f6e048fedf63437e3fdf629555ac89dd3ad
ITEM: Change SfxItemSet to use unordered_set

Maybe it's not fair to use the "Regression By" field in this case, since the commit is part of a huge rework project.
Comment 4 Buovjaga 2024-07-22 10:22:24 UTC
Noel's very recent c3e8dbc139c3b1644ea07101a8c1111572ffa017 seems to have improved the opening time by more than 10 seconds for me.
Comment 5 Roman Kuznetsov 2024-07-22 17:58:39 UTC
(In reply to Buovjaga from comment #3)
> Ok, checking with linux-64-25.2 repo, there was a further pessimisation,

I saw the same bad time in 24.2 too...
Comment 6 Julien Nabet 2024-07-23 10:46:44 UTC
I tried both buttons labelled "скачать" ("download"); neither does anything, so I can't test the file.
Comment 7 Buovjaga 2024-07-23 11:30:21 UTC
Created attachment 195449
Perf flamegraph

At the very least, it spends a lot of time calculating the optimal height and processing RichStrings.

Version: 25.2.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: a7de9cc5e89cd0d0c2f6363b2c0cc265c528b121
CPU threads: 8; OS: Linux 6.9; UI render: default; VCL: kf6 (cairo+wayland)
Locale: fi-FI (fi_FI.UTF-8); UI: en-US
Calc: CL threaded