Bug 79049 - FILEOPEN: OOXML Workbook file hangs when opening
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: filters and storage
Version (earliest affected): 3.5.0 release
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: filter:xlsx, haveBacktrace, perf
Depends on:
Blocks: XLSX
Reported: 2014-05-22 05:10 UTC by Darren
Modified: 2019-08-31 11:14 UTC (History)

See Also:
Crash report or crash signature:


Attachments
File that hangs LibO (9.13 MB, application/vnd.openxmlformats-officedocument.spreadsheetml)
2015-06-21 18:35 UTC, Buovjaga
simplified document (1.89 MB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2015-09-15 16:01 UTC, Xisco Faulí

Description Darren 2014-05-22 05:10:43 UTC
I have tried opening the attached file in v4.0 on Linux and Windows, and in 4.2.4.2 on Windows. LibreOffice hangs each time.

The file can be opened in Excel.
Comment 1 Darren 2014-05-22 05:13:51 UTC
The file is 9 MB, so I can't attach it. Here is a URL to download it:

http://www.sco.ca.gov/Files-UPD/estates_of_deceased_persons_file.xlsx
Comment 2 tommy27 2014-05-22 18:36:06 UTC
Issue confirmed in LibO 4.2.4.2 under Win7 x64: it hangs when the loading progress bar is at 80-90%.

I also tested older versions back to 3.5.0, and they hang as well.
It still hangs with the recent 4.3.0.0alpha1+.

The file loads correctly in Excel Viewer.
Comment 3 QA Administrators 2015-06-08 14:41:54 UTC Comment hidden (obsolete)
Comment 4 Buovjaga 2015-06-21 18:35:21 UTC
Created attachment 116711 [details]
File that hangs LibO

Our attachment file size limits are now higher, so attaching.
It still hangs.

Win 7 Pro 64-bit Version: 5.1.0.0.alpha1+
Build ID: 3ecef8cedb215e49237a11607197edc91639bfcd
TinderBox: Win-x86@62-merge-TDF, Branch:MASTER, Time: 2015-06-19_23:16:58
Locale: fi-FI (fi_FI)
Comment 5 Xisco Faulí 2015-09-15 16:01:55 UTC
Created attachment 118751 [details]
simplified document

This issue is still present in

Version: 5.0.1.2
Build ID: 81898c9f5c0d43f3473ba111d7b351050be20261
Locale: es-ES (es_ES)

on Windows 7 (64-bit)
Comment 6 Julien Nabet 2015-12-06 22:01:54 UTC
On a Debian x86-64 PC with master sources updated today, I could reproduce this.

First, in gdb I noticed the call to "Application::Yield" in a loop in importSheetFragments (see http://opengrok.libreoffice.org/xref/core/sc/source/filter/oox/workbookfragment.cxx#344),
but "thread apply all bt" showed that the problem was elsewhere.
Indeed, after some calls to Yield, it showed lots of calls (in fact, I never saw the end of them, even after several minutes) to SheetDataBuffer::addColXfStyle (http://opengrok.libreoffice.org/xref/core/sc/source/filter/oox/sheetdatabuffer.cxx#358).
It's called at 2 locations:
1)
    431     std::map< FormatKeyPair, ApiCellRangeList > rangeStyleListMap;
    432     for( XfIdRangeListMap::const_iterator aIt = maXfIdRangeLists.begin(), aEnd = maXfIdRangeLists.end(); aIt != aEnd; ++aIt )
    433     {
    434         addIfNotInMyMap( getStyles(), rangeStyleListMap, aIt->first.first, aIt->first.second, aIt->second );
    435     }
    436     // gather all ranges that have the same style and apply them in bulk
    437     for (  std::map< FormatKeyPair, ApiCellRangeList >::iterator it = rangeStyleListMap.begin(), it_end = rangeStyleListMap.end(); it != it_end; ++it )
    438     {
    439         const ApiCellRangeList& rRanges( it->second );
    440         for ( ::std::vector< CellRangeAddress >::const_iterator it_range = rRanges.begin(), it_rangeend = rRanges.end(); it_range!=it_rangeend; ++it_range )
    441             addColXfStyle( it->first.first, it->first.second, *it_range );
    442     }

see http://opengrok.libreoffice.org/xref/core/sc/source/filter/oox/sheetdatabuffer.cxx#441

2) (some lines after)
    444     for ( std::map< sal_Int32, std::vector< ValueRange > >::iterator it = maXfIdRowRangeList.begin(), it_end =  maXfIdRowRangeList.end(); it != it_end; ++it )
    445     {
    446         ApiCellRangeList rangeList;
    447         AddressConverter& rAddrConv = getAddressConverter();
    448         // get all row ranges for id
    449         for ( std::vector< ValueRange >::iterator rangeIter = it->second.begin(), rangeIter_end = it->second.end(); rangeIter != rangeIter_end; ++rangeIter )
    450         {
    451             if ( it->first == -1 ) // it's a dud skip it
    452                 continue;
    453             CellRangeAddress aRange( getSheetIndex(), 0, rangeIter->mnFirst, rAddrConv.getMaxApiAddress().Column, rangeIter->mnLast );
    454 
    455             addColXfStyle( it->first, -1, aRange, true );
    456         }
    457     }

I don't understand the purpose of rangeStyleListMap, so I removed it for the test. However, I could still reproduce this never-ending loop.

Anyway, the problem seems to be in this part: when I then commented out both parts, I could open the file in about 20 seconds (i5, 6 GB).
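
Regarding the purpose of rangeStyleListMap: the code comment at line 436 ("gather all ranges that have the same style and apply them in bulk") suggests it groups cell ranges by their style key before they are handed to addColXfStyle. Below is a minimal standalone sketch of that grouping idea only. It is not the LibreOffice implementation: CellRange, FormatKey, groupRangesByStyle and applyInBulk are hypothetical names, and the key is assumed to be a pair of integer ids (XF id, number format id) in the spirit of FormatKeyPair.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the CellRangeAddress used in the snippets above.
struct CellRange {
    std::int32_t sheet;
    std::int32_t startColumn;
    std::int32_t startRow;
    std::int32_t endColumn;
    std::int32_t endRow;
};

// Assumed layout of the style key: (XF id, number format id).
using FormatKey = std::pair<std::int32_t, std::int32_t>;
using RangeList = std::vector<CellRange>;
using StyledRange = std::pair<FormatKey, CellRange>;

// Collect every range under its style key, so that all ranges sharing
// one style end up in the same list.
std::map<FormatKey, RangeList> groupRangesByStyle(
    const std::vector<StyledRange>& styledRanges)
{
    std::map<FormatKey, RangeList> grouped;
    for (const StyledRange& entry : styledRanges) {
        const CellRange& range = entry.second;
        // Precondition of this sketch: ranges are normalized.
        assert(range.startColumn <= range.endColumn);
        assert(range.startRow <= range.endRow);
        grouped[entry.first].push_back(range);
    }
    return grouped;
}

// Visit the grouped ranges style by style, calling `apply` once per range
// (the role addColXfStyle plays in the loop at lines 437-442).
// Returns the number of calls made.
template <typename ApplyFn>
std::size_t applyInBulk(const std::map<FormatKey, RangeList>& grouped,
                        ApplyFn apply)
{
    std::size_t calls = 0;
    for (const auto& entry : grouped) {
        for (const CellRange& range : entry.second) {
            apply(entry.first.first, entry.first.second, range);
            ++calls;
        }
    }
    return calls;
}
```

Note that grouping does not reduce the number of per-range calls; it only orders them by style. That would be consistent with the observation above that removing rangeStyleListMap did not make the loop finish any sooner.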
Comment 7 Robinson Tryon (qubit) 2015-12-09 17:50:07 UTC Comment hidden (obsolete)
Comment 8 Maarten Bosmans 2016-09-27 19:17:05 UTC
The biggest problem (SheetDataBuffer::addColXfStyle) is already addressed in tdf#100709.

The simplified document now opens in 15 minutes on my desktop.
There are some more things to be done; I'm working on them.
Comment 9 Xisco Faulí 2017-07-13 10:51:39 UTC
Setting Assignee back to default. Please assign it back to yourself if you're still working on this issue.
Comment 10 QA Administrators 2018-11-05 03:43:32 UTC Comment hidden (obsolete)
Comment 11 Xisco Faulí 2019-04-02 14:17:04 UTC
It takes

real	2m36,163s
user	2m34,301s
sys	0m2,062s

in

Version: 6.3.0.0.alpha0+
Build ID: 3b518953a8141b0d5043c2f3996a92956fdc3a47
CPU threads: 4; OS: Linux 4.15; UI render: default; VCL: gtk3; 
Locale: ca-ES (ca_ES.UTF-8); UI-Language: en-US
Calc: threaded
Comment 12 Xisco Faulí 2019-04-02 14:31:16 UTC
Version: 5.2.0.0.alpha1+
Build ID: 5b168b3fa568e48e795234dc5fa454bf24c9805e
CPU Threads: 4; OS Version: Linux 4.15; UI Render: default; 
Locale: ca-ES (ca_ES.UTF-8)

With this older version, I killed LibO after

real	13m36,830s
user	13m33,749s
sys	0m1,558s

so the 2m36s measured in 6.3.0.0.alpha0+ (comment 11) is a huge improvement: at least 5 times faster.
Comment 13 Buovjaga 2019-04-02 14:42:53 UTC
Patch related to this: https://gerrit.libreoffice.org/#/c/29528/
Abandoned due to inactivity.

Julien: any ideas?
Comment 14 Julien Nabet 2019-04-02 16:31:33 UTC
(In reply to Buovjaga from comment #13)
> Patch related to this: https://gerrit.libreoffice.org/#/c/29528/
> Abandoned due to inactivity.
> 
> Julien: any ideas?

I retrieved the patch locally, but there are too many conflicts.
Moreover, Eike's comments hadn't been followed.