Bug 79049 - FILEOPEN: OOXML Workbook file hangs when opening
Summary: FILEOPEN: OOXML Workbook file hangs when opening
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: filters and storage
Version (earliest affected): 3.5.0 release
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard: target:7.2.0
Keywords: bibisectRequest, filter:xlsx, haveBacktrace, perf, regression
Depends on:
Blocks: XLSX
Reported: 2014-05-22 05:10 UTC by Darren
Modified: 2021-06-07 01:30 UTC
CC List: 10 users

See Also:
Crash report or crash signature:


Attachments
File that hangs LibO (9.13 MB, application/vnd.openxmlformats-officedocument.spreadsheetml)
2015-06-21 18:35 UTC, Buovjaga
simplified document (1.89 MB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2015-09-15 16:01 UTC, Xisco Faulí
perf flamegraph (136.08 KB, application/x-bzip)
2021-04-13 19:10 UTC, Julien Nabet
perf flamegraph (102.28 KB, application/x-bzip)
2021-04-30 15:45 UTC, Julien Nabet
perf flamegraph during opening (82.34 KB, application/x-bzip)
2021-05-02 11:08 UTC, Julien Nabet

Description Darren 2014-05-22 05:10:43 UTC
I have tried opening the attached file in v4.0 on Linux and Windows, and in 4.2.4.2 on Windows. LibreOffice hangs each time.

The file can be opened in Excel.
Comment 1 Darren 2014-05-22 05:13:51 UTC
The file is 9 MB, which is too large to attach. Here is a URL to download it.

http://www.sco.ca.gov/Files-UPD/estates_of_deceased_persons_file.xlsx
Comment 2 tommy27 2014-05-22 18:36:06 UTC
Issue confirmed in LibO 4.2.4.2 under Win7 x64: it hangs when the loading progress bar is at 80-90%.

Tested with older versions back to 3.5.0 and it still hangs.
It also hangs with the recent 4.3.0.0alpha1+.

file is correctly loaded in Excel viewer
Comment 3 QA Administrators 2015-06-08 14:41:54 UTC Comment hidden (obsolete)
Comment 4 Buovjaga 2015-06-21 18:35:21 UTC
Created attachment 116711
File that hangs LibO

Our attachment file size limits are now higher, so attaching.
It still hangs.

Win 7 Pro 64-bit Version: 5.1.0.0.alpha1+
Build ID: 3ecef8cedb215e49237a11607197edc91639bfcd
TinderBox: Win-x86@62-merge-TDF, Branch:MASTER, Time: 2015-06-19_23:16:58
Locale: fi-FI (fi_FI)
Comment 5 Xisco Faulí 2015-09-15 16:01:55 UTC
Created attachment 118751
simplified document

This issue is still present in

Version: 5.0.1.2
Build ID: 81898c9f5c0d43f3473ba111d7b351050be20261
Locale: es-ES (es_ES)

on Windows 7 (64-bit)
Comment 6 Julien Nabet 2015-12-06 22:01:54 UTC
On a Debian x86-64 PC with master sources updated today, I could reproduce this.

In gdb I first noticed the call to "Application::Yield" in a loop in importSheetFragments (see http://opengrok.libreoffice.org/xref/core/sc/source/filter/oox/workbookfragment.cxx#344),
but "thread apply all bt" showed the problem was elsewhere.
After some calls to Yield, it showed lots of calls (I never saw the end of them, even after several minutes) to SheetDataBuffer::addColXfStyle (http://opengrok.libreoffice.org/xref/core/sc/source/filter/oox/sheetdatabuffer.cxx#358).
It's called at 2 locations:
1)
    431     std::map< FormatKeyPair, ApiCellRangeList > rangeStyleListMap;
    432     for( XfIdRangeListMap::const_iterator aIt = maXfIdRangeLists.begin(), aEnd = maXfIdRangeLists.end(); aIt != aEnd; ++aIt )
    433     {
    434         addIfNotInMyMap( getStyles(), rangeStyleListMap, aIt->first.first, aIt->first.second, aIt->second );
    435     }
    436     // gather all ranges that have the same style and apply them in bulk
    437     for (  std::map< FormatKeyPair, ApiCellRangeList >::iterator it = rangeStyleListMap.begin(), it_end = rangeStyleListMap.end(); it != it_end; ++it )
    438     {
    439         const ApiCellRangeList& rRanges( it->second );
    440         for ( ::std::vector< CellRangeAddress >::const_iterator it_range = rRanges.begin(), it_rangeend = rRanges.end(); it_range!=it_rangeend; ++it_range )
    441             addColXfStyle( it->first.first, it->first.second, *it_range );
    442     }

see http://opengrok.libreoffice.org/xref/core/sc/source/filter/oox/sheetdatabuffer.cxx#441

2) (some lines after)
    444     for ( std::map< sal_Int32, std::vector< ValueRange > >::iterator it = maXfIdRowRangeList.begin(), it_end =  maXfIdRowRangeList.end(); it != it_end; ++it )
    445     {
    446         ApiCellRangeList rangeList;
    447         AddressConverter& rAddrConv = getAddressConverter();
    448         // get all row ranges for id
    449         for ( std::vector< ValueRange >::iterator rangeIter = it->second.begin(), rangeIter_end = it->second.end(); rangeIter != rangeIter_end; ++rangeIter )
    450         {
    451             if ( it->first == -1 ) // it's a dud skip it
    452                 continue;
    453             CellRangeAddress aRange( getSheetIndex(), 0, rangeIter->mnFirst, rAddrConv.getMaxApiAddress().Column, rangeIter->mnLast );
    454 
    455             addColXfStyle( it->first, -1, aRange, true );
    456         }
    457     }

I don't understand the purpose of rangeStyleListMap, so I removed it for the test. However, I still reproduced this never-ending loop.

Anyway, the problem seems to be in this part: when I then commented out both parts, I could open the file in about 20 seconds (i5, 6 GB).
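
The comment "gather all ranges that have the same style and apply them in bulk" in the first snippet suggests the intent behind the map: collect the ranges per style key first, so that the costly per-range call runs as few times as possible. Below is a minimal sketch of that idea, not LibreOffice's implementation. `RowRange`, `StyleKey`, `coalesce` and `groupByStyle` are hypothetical names introduced for illustration, and the sketch assumes that ranges sharing a style can be merged when their rows overlap or touch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the importer's types (not LibreOffice classes).
struct RowRange { std::int32_t first; std::int32_t last; };   // inclusive rows
using StyleKey = std::pair<std::int32_t, std::int32_t>;       // (xf id, number format id)

// Sort one style's row ranges and merge those that overlap or touch.
std::vector<RowRange> coalesce(std::vector<RowRange> ranges)
{
    std::sort(ranges.begin(), ranges.end(),
              [](const RowRange& a, const RowRange& b) { return a.first < b.first; });
    std::vector<RowRange> merged;
    for (const RowRange& r : ranges)
    {
        assert(r.first <= r.last);   // each input range must be well formed
        if (!merged.empty() && r.first <= merged.back().last + 1)
            merged.back().last = std::max(merged.back().last, r.last);
        else
            merged.push_back(r);
    }
    return merged;
}

// Group (style, range) records by style, then coalesce each group so that
// an expensive per-range operation would run once per merged block.
std::map<StyleKey, std::vector<RowRange>>
groupByStyle(const std::vector<std::pair<StyleKey, RowRange>>& records)
{
    std::map<StyleKey, std::vector<RowRange>> grouped;
    for (const auto& [key, range] : records)
        grouped[key].push_back(range);
    for (auto& [key, ranges] : grouped)
        ranges = coalesce(std::move(ranges));
    return grouped;
}
```

With this shape, three records for the same style covering rows 0-4, 5-9 and 20-25 collapse into two blocks (0-9 and 20-25), so a per-block call would run twice instead of three times. Whether the real importer can merge ranges this way depends on details not shown in this thread.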
Comment 7 Robinson Tryon (qubit) 2015-12-09 17:50:07 UTC Comment hidden (obsolete)
Comment 8 Maarten Bosmans 2016-09-27 19:17:05 UTC
The biggest problem (SheetDataBuffer::addColXfStyle) is already addressed in tdf#100709.

The simplified document now opens in 15 minutes on my desktop.
There are some more things to be done, I'm working on it.
Comment 9 Xisco Faulí 2017-07-13 10:51:39 UTC Comment hidden (obsolete)
Comment 10 QA Administrators 2018-11-05 03:43:32 UTC Comment hidden (obsolete)
Comment 11 Xisco Faulí 2019-04-02 14:17:04 UTC
It takes

real	2m36,163s
user	2m34,301s
sys	0m2,062s

in

Version: 6.3.0.0.alpha0+
Build ID: 3b518953a8141b0d5043c2f3996a92956fdc3a47
CPU threads: 4; OS: Linux 4.15; UI render: default; VCL: gtk3; 
Locale: ca-ES (ca_ES.UTF-8); UI-Language: en-US
Calc: threaded
Comment 12 Xisco Faulí 2019-04-02 14:31:16 UTC
Version: 5.2.0.0.alpha1+
Build ID: 5b168b3fa568e48e795234dc5fa454bf24c9805e
CPU Threads: 4; OS Version: Linux 4.15; UI Render: default; 
Locale: ca-ES (ca_ES.UTF-8)

I killed LibreOffice after

real	13m36,830s
user	13m33,749s
sys	0m1,558s

So 6.3 is a huge improvement over 5.2.
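
To put a number on the comparison, the two `real` values (2m36,163s in comment 11 and 13m36,830s here) can be converted to seconds and divided. The sketch below is only an illustration of that arithmetic; `parseDuration` and `speedup` are hypothetical helper names, and the parser assumes the `<minutes>m<seconds>s` layout shown above, with either a comma or a dot as the decimal separator.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Convert a `time`-style duration such as "2m36,163s" to seconds.
// Accepts either ',' or '.' as the decimal separator.
double parseDuration(std::string text)
{
    for (char& c : text)
        if (c == ',')
            c = '.';
    const std::size_t m = text.find('m');
    assert(m != std::string::npos);                         // expects "<min>m<sec>s"
    const double minutes = std::stod(text.substr(0, m));
    const double seconds = std::stod(text.substr(m + 1));   // stod stops at the trailing 's'
    return minutes * 60.0 + seconds;
}

// Ratio of the older run's wall-clock time to the newer run's.
double speedup(const std::string& before, const std::string& after)
{
    return parseDuration(before) / parseDuration(after);
}
```

Calling `speedup("13m36,830s", "2m36,163s")` gives a ratio a little above 5. Because the 5.2 run was killed before the file finished loading, this is only a lower bound on the real improvement.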
Comment 13 Buovjaga 2019-04-02 14:42:53 UTC Comment hidden (obsolete)
Comment 14 Julien Nabet 2019-04-02 16:31:33 UTC
(In reply to Buovjaga from comment #13)
> Patch related to this: https://gerrit.libreoffice.org/#/c/29528/
> Abandoned due to inactivity.
> 
> Julien: any ideas?

I retrieved the patch locally, but there are too many conflicts.
Moreover, Eike's review comments had not been addressed.
Comment 15 Roman Kuznetsov 2021-04-12 14:40:49 UTC
LibreOffice still hangs in

Version: 7.2.0.0.alpha0+ (x64) / LibreOffice Community
Build ID: 7a0e0a84a02f505200331c19b28d45e898cd5a12
CPU threads: 4; OS: Windows 10.0 Build 18363; UI render: Skia/Raster; VCL: win
Locale: ru-RU (ru_RU); UI: ru-RU
Calc: threaded Jumbo
Comment 16 Julien Nabet 2021-04-13 19:10:40 UTC
Created attachment 171169
perf flamegraph

On pc Debian x86-64 with master sources updated today + gtk3 rendering, the simplified example seems to hang too.
Comment 17 Julien Nabet 2021-04-13 19:11:50 UTC
Noel: having noticed your work on optimizing parts of sc, I thought you might be interested in this one.
Comment 18 Julien Nabet 2021-04-13 19:46:58 UTC
Following up on my comment from 2015, there is indeed something very slow in sc/source/filter/oox/sheetdatabuffer.cxx.

With this patch, I no longer reproduce the hang (although the file is still not very quick to open):
diff --git a/sc/source/filter/oox/sheetdatabuffer.cxx b/sc/source/filter/oox/sheetdatabuffer.cxx
index de1d2c76f3c9..65db163e55bc 100644
--- a/sc/source/filter/oox/sheetdatabuffer.cxx
+++ b/sc/source/filter/oox/sheetdatabuffer.cxx
@@ -321,7 +321,7 @@ void SheetDataBuffer::setMergedRange( const ScRange& rRange )
 }
 
 typedef std::pair<sal_Int32, sal_Int32> FormatKeyPair;
-
+/*
 static void addIfNotInMyMap( const StylesBuffer& rStyles, std::map< FormatKeyPair, ScRangeList >& rMap, sal_Int32 nXfId, sal_Int32 nFormatId, const ScRangeList& rRangeList )
 {
     Xf* pXf1 = rStyles.getCellXf( nXfId ).get();
@@ -345,6 +345,7 @@ static void addIfNotInMyMap( const StylesBuffer& rStyles, std::map< FormatKeyPai
     }
     rMap[ FormatKeyPair( nXfId, nFormatId ) ] = rRangeList;
 }
+*/
 
 void SheetDataBuffer::addColXfStyle( sal_Int32 nXfId, sal_Int32 nFormatId, const ScRange& rAddress, bool bProcessRowRange )
 {
@@ -413,7 +414,7 @@ void SheetDataBuffer::finalizeImport()
 
     // write default formatting of remaining row range
     maXfIdRowRangeList[ maXfIdRowRange.mnXfId ].push_back( maXfIdRowRange.maRowRange );
-
+/*
     std::map< FormatKeyPair, ScRangeList > rangeStyleListMap;
     for( const auto& [rFormatKeyPair, rRangeList] : maXfIdRangeLists )
     {
@@ -493,7 +494,7 @@ void SheetDataBuffer::finalizeImport()
 
         rDocImport.setAttrEntries(getSheetIndex(), nScCol, std::move(aAttrParam));
     }
-
+*/
     // merge all cached merged ranges and update right/bottom cell borders
     for( const auto& rMergedRange : maMergedRanges )
         applyCellMerging( rMergedRange.maRange );
Comment 19 Xisco Faulí 2021-04-13 20:17:37 UTC
It seems the import time got worse somewhere in the 6-4 branch. Using the bisect repository, it takes

real	3m2,747s
user	2m58,752s
sys	0m7,018s

in

Version: 6.3.0.0.alpha1+
Build ID: c98b1f1cd43b3e109bcaf6324ef2d1f449b34099
CPU threads: 4; OS: Linux 5.7; UI render: default; VCL: gtk3; 
Locale: en-US (en_US.UTF-8); UI-Language: en-US
Calc: threaded

while in

Version: 6.4.0.0.alpha1+
Build ID: 9bc848cf0d301aa57eabcffa101a1cf87bad6470
CPU threads: 4; OS: Linux 5.7; UI render: default; VCL: gtk3; 
Locale: en-US (en_US.UTF-8); UI-Language: en-US
Calc: threaded


I killed LibreOffice after

real	6m21,139s
user	6m15,603s
sys	0m7,721s
Comment 20 Xisco Faulí 2021-04-13 21:32:34 UTC
On Linux, the bisection points to 2a775ef5ef0d2dfc2583341df0dd7abfff317915, which is obviously wrong. It probably marks a point where the build wasn't done incrementally, so the commit that introduced the regression might have been submitted earlier: https://cgit.freedesktop.org/libreoffice/core/log/?qt=range&q=2a775ef5ef0d2dfc2583341df0dd7abfff317915
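
For context, a bisection of this kind is a binary search for the first build that is slower than some threshold. The sketch below shows only that search logic. It is not the bibisect tooling itself: `firstSlowBuild` is a hypothetical name, the timings are assumed to be already measured and ordered from oldest to newest build, and the search assumes every build after the regression is also slow.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Return the index of the first build whose load time exceeds `limitSeconds`,
// assuming every build after the regression is also slow (monotonic history).
// Returns timings.size() when no build is slow.
std::size_t firstSlowBuild(const std::vector<double>& timings, double limitSeconds)
{
    assert(limitSeconds >= 0.0);
    std::size_t lo = 0;
    std::size_t hi = timings.size();
    while (lo < hi)
    {
        const std::size_t mid = lo + (hi - lo) / 2;
        if (timings[mid] > limitSeconds)
            hi = mid;        // the regression is at mid or earlier
        else
            lo = mid + 1;    // the regression is after mid
    }
    return lo;
}
```

The index found this way identifies a build, not necessarily a single source commit. If one build bundles many source commits, the search can only narrow the regression down to that bundle, which would be consistent with the misleading result described above.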
Comment 21 Commit Notification 2021-04-30 14:03:58 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/857caa5fc69b92e781457a1b67a89aa051c2d70f

tdf#79049 speed up OOXML workbook load

It will be available in 7.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 22 Julien Nabet 2021-04-30 15:45:30 UTC
Created attachment 171542
perf flamegraph

Here's an updated Flamegraph with master sources updated today (c90792cf4309557981d1f89febeff9157fd93b0c) still on the simplified example with gen rendering.
Comment 23 Commit Notification 2021-04-30 19:40:42 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/deac5c84732c3491a0ef5bf7f8c1552e6def4fc0

tdf#79049 speed up OOXML workbook load (2)

It will be available in 7.2.0.

Comment 24 Commit Notification 2021-04-30 19:41:55 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/61386aa03cd166473a58dbb4be0dd5e0ce82195c

tdf#79049 speed up OOXML workbook load (3)

It will be available in 7.2.0.

Comment 25 Roman Kuznetsov 2021-05-01 08:33:15 UTC
Opening Buovjaga's file takes 1.03 minutes in

Version: 7.2.0.0.alpha0+ / LibreOffice Community
Build ID: a52590d76b89dc75be2aa87f4287624c89f1e82f
CPU threads: 4; OS: Mac OS X 10.16; UI render: default; VCL: osx
Locale: ru-RU (ru_RU.UTF-8); UI: en-US
Calc: threaded


cool! It's a miracle made by Noel again!
Comment 26 Commit Notification 2021-05-02 08:01:14 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/86b345a963a64fd9b9a3cab522b3ac2e909977fd

tdf#79049 speed up OOXML workbook load (4)

It will be available in 7.2.0.

Comment 27 Commit Notification 2021-05-02 10:00:14 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/ffde7949ab6bd434b0f086d1a3bdf83f31aeda48

tdf#79049 speed up OOXML workbook load (5)

It will be available in 7.2.0.

Comment 28 Julien Nabet 2021-05-02 11:08:59 UTC
Created attachment 171587
perf flamegraph during opening

Here's an updated Flamegraph with master sources updated today (ffde7949ab6bd434b0f086d1a3bdf83f31aeda48)
Comment 29 Xisco Faulí 2021-05-03 07:59:56 UTC
It takes

real	0m38,064s
user	0m40,087s
sys	0m5,067s

in

Version: 7.2.0.0.alpha0+ / LibreOffice Community
Build ID: 95d8eb87eb20351a2e5795fc8c16653c0f58d6b4
CPU threads: 4; OS: Linux 5.7; UI render: default; VCL: gtk3
Locale: en-US (en_US.UTF-8); UI: en-US
Calc: threaded

while in

Version: 7.1.4.0.0+ / LibreOffice Community
Build ID: 06d5d625e1dc8489e51b962353ac423669e61fed
CPU threads: 4; OS: Linux 5.7; UI render: default; VCL: gtk3
Locale: en-US (en_US.UTF-8); UI: en-US
Calc: threaded

I killed LibreOffice after

real	10m51,050s
user	10m50,187s
sys	0m6,040s

@Noel, Nice work!!