Bug 118662 - FILESAVE - Time to save a 500Kb xlsx file has grown and causes a dump (see comment 7)
Summary: FILESAVE - Time to save a 500Kb xlsx file has grown and causes a dump (see co...
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
6.2.0.0.alpha0+
Hardware: All Windows (All)
: medium normal
Assignee: Armin Le Grand
URL:
Whiteboard: target:6.2.0 target:6.3.0
Keywords: bibisected, perf, regression
Depends on:
Blocks: Regressions-AW080
  Show dependency treegraph
 
Reported: 2018-07-10 12:35 UTC by arnaud.boudier
Modified: 2020-08-03 12:22 UTC (History)
6 users (show)

See Also:
Crash report or crash signature:


Attachments
Xlsx test file to reproduce problem (499.78 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2018-07-10 12:36 UTC, arnaud.boudier
Details
xlsx test file - WinDbg debug from procdump.txt (20.84 KB, text/plain)
2018-07-16 11:03 UTC, Timur
Details

Note You need to log in before you can comment on or make changes to this bug.
Description arnaud.boudier 2018-07-10 12:35:30 UTC
Description:
Hello everyone,

I am using libreOffice ver 5.0.0.3 with a xlsx file that has a size of 500Kb.
The problem is that every time I save this file, it takes 5 seconds approximately to save. Even if I had only modified one cell, this is very strange.

Does libreOffice read the entire file every time we save ? It doesn't check for modification ?

I attached an xlsx example file full of "TOTO" that recreate my problem (I take 5 seconds to save it).

Maybe what I am experiencing is a normal behavior from libreOffice ... I don't know about the save strategy. 5 seconds for 500kb data seems to be huge no ?

Thank you

Steps to Reproduce:
1. Open xslx file attached
2. Change on cell value
3. Save

Actual Results:
It should take 5 seconds to save

Expected Results:
The save should be instantaneous


Reproducible: Always


User Profile Reset: No



Additional Info:
Comment 1 arnaud.boudier 2018-07-10 12:36:10 UTC
Created attachment 143423 [details]
Xlsx test file to reproduce problem
Comment 2 Xisco Faulí 2018-07-11 11:45:21 UTC Comment hidden (obsolete)
Comment 3 arnaud.boudier 2018-07-12 08:42:15 UTC
Hello,

Thank you for the reply, I have already tested with a more recent version(6.0.xx) and the save time was the same, maybe one second quicker.

You can test it on your side
Comment 4 Buovjaga 2018-07-15 17:52:33 UTC
If the idea is that you propose some different theoretical saving strategy, then I don't think we can accept this as a bug.
If the saving time would have become slower with the exact same file between different versions, that would have been a bug.
Comment 5 arnaud.boudier 2018-07-16 07:00:47 UTC Comment hidden (obsolete)
Comment 6 Buovjaga 2018-07-16 10:35:41 UTC
(In reply to arnaud.boudier from comment #5)
> Yes I am just wondering if it is normal to have a such long save time.
> 
> 5 seconds for a 500Kb document !

Well, you said it dropped to 4 seconds in the latest version :)
I would be worried, if it was 30 seconds.

I causes the same CPU fan noise for me in 3.6.7 (even though the progress indicator does not show in the old version).
Comment 7 Timur 2018-07-16 11:03:48 UTC
Created attachment 143570 [details]
xlsx test file - WinDbg debug from procdump.txt

When reporting a bug please indicate how file was created and used. 
This xlsx has 786 errors with Open XML SDK tool, mostly MaxInclusive and MaxExclusive constraints failed. 

Regardless, slow with 6.2+ is very slow (around 20 seconds for me in Win) and it yields a dump. Dump is ther even in LO 3.3.
So I'll set to New. Dump shouldn't happen. 

PS Why are you using LO 5.0.0.3? It may be understandable to use x.y.last (even than it's 5.4.7 now) but not some.old.first.
Comment 8 arnaud.boudier 2018-07-16 11:14:01 UTC
Thank you for the reply,

We are using libreOffice 5.0 because of history. It is this version that was used during our development phase. Because libreOffice 6.0 seems to not revolved our problem we didn't update the version?

I can not go into the detail because of confidential purpose ( That's why the xlsx file is full of "TOTO") but I can say that we are using the xlsx file to write tests configurations and save the result of these tests.

We have a software (used for our tests) that can interact through libreoffice (thanks to the java api) and change the value of certain cells.

The problem is that we are interacting with libreoffice a lot, and a long save time is creating instability.


Maybe the file is corrupted because of crash or anything ....
Comment 9 Eike Rathke 2018-07-16 14:06:59 UTC
Even if it takes 5 seconds for you (the attached sample file takes 3s for me) then it probably just is at it is. The file size doesn't tell anything because .xlsx is a compressed file format, with 31 sheets and some drawing elements (probably the comment boxes) the unzipped content in this case is 5.3MB, likely in your original document more because all those 125144 TOTO cell contents boil down to just one shared string. Also, there is no crash for me, earliest version tried 5.3.7.
Comment 10 Buovjaga 2018-07-16 15:45:32 UTC
Ok, then we will repurpose this to the recent regression.

Bibisecting with Win 6.2 repo, I get these suspects:
726d7e7b8b50dca9914329dbfd9491f7c8961f68
a28a839b9f9eeec1544c5ceeeabe7b1083ce1655
4b4942224b550235da228655677b5c068a053254

Likely it is https://cgit.freedesktop.org/libreoffice/core/commit/?id=4b4942224b550235da228655677b5c068a053254
SOSAW080: Derive SdrObjGroup from SdrObjList

It is blamed for several other bugs.

Adding Cc: to Armin Le Grand
Comment 11 Luke 2018-07-17 04:28:22 UTC
Noel,
You've been working in this area. Is this something you'd be interested in?
Comment 12 Timur 2018-07-17 10:18:13 UTC
(In reply to Buovjaga from comment #10)
> Ok, then we will repurpose this to the recent regression.
Are we speaking about the same recent regression that made this a lot slower? 
Did you test dump with procdump in Win?
Please write for both Win and Lin.

I have very slow fileopen which is 30 seconds with 6.2+ in Windows 7 (with OpenCL, much longer without). 
libo-master~2018-07-09_23.19.04_LibreOfficeDev_6.2.0.0.alpha0_Win_x86
Same system, Lin 5.4 is around 3 secs, Win 5.4 is around 5 secs.
Comment 13 Buovjaga 2018-07-17 10:36:42 UTC
(In reply to Timur from comment #12)
> (In reply to Buovjaga from comment #10)
> > Ok, then we will repurpose this to the recent regression.
> Are we speaking about the same recent regression that made this a lot
> slower? 
> Did you test dump with procdump in Win?

Yes, the slow down on Windows. I did not test with procdump.
Comment 14 Armin Le Grand 2018-07-18 12:19:24 UTC
Save as what...? Get the question to save as Excel or ODF...?
Had no lo-6-0 handy, so checked save time against 5.2 -> 5.2 takes around double the time (!)

@Buovjaga: Is there good proof/reasoning why this *should* have to do with SOSAW080? A good way to look myself? The file does not even have DrawingLayer objects, and - AFAICS - no Groups either...?
Comment 15 Armin Le Grand 2018-07-18 12:46:54 UTC
Made sure (by setting a Breakpoint) that there is *no single* SdrObjGroup involved -> this object was changed, so cannot have to do with my change.
Re-checked other SdrObjects, only some CustomShapes exist, but none of these uses any Group object (SdrObjGroup) at all...
Comment 16 Buovjaga 2018-07-18 12:48:13 UTC
(In reply to Armin Le Grand (CIB) from comment #14)
> Save as what...? Get the question to save as Excel or ODF...?
> Had no lo-6-0 handy, so checked save time against 5.2 -> 5.2 takes around
> double the time (!)
> 
> @Buovjaga: Is there good proof/reasoning why this *should* have to do with
> SOSAW080? A good way to look myself? The file does not even have
> DrawingLayer objects, and - AFAICS - no Groups either...?

Save as XLSX. No other good proof than my bibisection, which has two commits from you and one from Noel.

Do you think Noel's is more suspicious? https://cgit.freedesktop.org/libreoffice/core/commit/?id=a28a839b9f9eeec1544c5ceeeabe7b1083ce1655
Comment 17 Armin Le Grand 2018-07-18 16:13:02 UTC
@Buovjaga: probably not (I do not fully get what that typedefs are good for, but should do no harm...)

Checked that susp change is in master, but not in lo-6-1, thus will compare these...
Comment 18 Armin Le Grand 2018-07-18 16:55:56 UTC
VerySleepy data:
toto-6-1 profiling statistics:

Duration: 212.859000
Date: Wed Jul 18 18:39:05 2018
Samples: 201797

toto-6-2 profiling statistics:

Duration: 277.610000
Date: Wed Jul 18 18:32:30 2018
Samples: 279161

-> 30% more time needed.
Checking SfxBaseModel::storeAsURL (central method doing the store), following ChildCalls. Changes start at XclExpXmlStream::exportDocument:

toto-6-1 ChildCalls:

ExcDocument::ReadDoc	107.01s	51.36%	scfiltlo	c:\lo\work06_lo-6-1\sc\source\filter\excel\excdoc.cxx	746	0x3bcec6cf
ExcDocument::WriteXml	50.24s	24.12%	scfiltlo	c:\lo\work06_lo-6-1\sc\source\filter\excel\excdoc.cxx	832	0x3bced442
ExcDocument::~ExcDocument	47.80s	22.95%	scfiltlo	c:\lo\work06_lo-6-1\sc\source\filter\excel\excdoc.cxx	741	0x3bce6699
oox::core::FilterBase::commitStorage	1.97s	0.94%	ooxlo	c:\lo\work06_lo-6-1\oox\source\core\filterbase.cxx	348	0x3b668c94
std::_Tree<std::_Tmap_traits<rtl::OUString,std::pair<rtl::OUString,std::shared_ptr<sax_fastparser::FastSerializerHelper> >,std::less<rtl::OUString>,std::allocator<std::pair<rtl::OUString const ,std::pair<rtl::OUString,std::shared_ptr<sax_fastparser::FastSerializerHelper> > > >,0> >::clear	0.61s	0.29%	scfiltlo	c:\program files (x86)\microsoft visual studio 14.0\vc\include\xtree	1473	0x3be841d4
XclExpRootData::~XclExpRootData	0.47s	0.23%	scfiltlo	c:\lo\work06_lo-6-1\sc\source\filter\excel\xeroot.cxx	66	0x3be75c14
framework::StatusIndicator::start	0.14s	0.07%	fwklo	c:\lo\work06_lo-6-1\framework\source\helper\statusindicator.cxx	51	0x13b31520
XclExpXmlStream::CreateOutputStream	0.04s	0.02%	scfiltlo	c:\lo\work06_lo-6-1\sc\source\filter\excel\xestream.cxx	986	0x3be7fc89
framework::StatusIndicator::setValue	0.03s	0.02%	fwklo	c:\lo\work06_lo-6-1\framework\source\helper\statusindicator.cxx	113	0x13b31407
framework::StatusIndicator::end	0.03s	0.01%	fwklo	c:\lo\work06_lo-6-1\framework\source\helper\statusindicator.cxx	66	0x13b30ec5

toto-6-2 ChildCalls:

ExcDocument::WriteXml	164.96s	60.88%	scfiltlo	c:\lo\work01\sc\source\filter\excel\excdoc.cxx	851	0x3a1bb127
ExcDocument::ReadDoc	93.82s	34.63%	scfiltlo	c:\lo\work01\sc\source\filter\excel\excdoc.cxx	748	0x3a1ba171
ExcDocument::~ExcDocument	10.91s	4.03%	scfiltlo	c:\lo\work01\sc\source\filter\excel\excdoc.cxx	741	0x3a1b45f9
oox::core::FilterBase::commitStorage	0.69s	0.25%	ooxlo	c:\lo\work01\oox\source\core\filterbase.cxx	349	0x2a177b40
std::_Tree<std::_Tmap_traits<rtl::OUString,std::pair<rtl::OUString,std::shared_ptr<sax_fastparser::FastSerializerHelper> >,std::less<rtl::OUString>,std::allocator<std::pair<rtl::OUString const ,std::pair<rtl::OUString,std::shared_ptr<sax_fastparser::FastSerializerHelper> > > >,0> >::clear	0.31s	0.12%	scfiltlo	c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.14.26428\include\xtree	1463	0x3a333d19
XclExpRootData::~XclExpRootData	0.13s	0.05%	scfiltlo	c:\lo\work01\sc\source\filter\excel\xeroot.cxx	66	0x3a32675e
framework::StatusIndicator::setValue	0.08s	0.03%	fwklo	c:\lo\work01\framework\source\helper\statusindicator.cxx	113	0x52d19487
framework::StatusIndicator::end	0.04s	0.02%	fwklo	c:\lo\work01\framework\source\helper\statusindicator.cxx	66	0x52d18f45
framework::StatusIndicator::start	0.03s	0.01%	fwklo	c:\lo\work01\framework\source\helper\statusindicator.cxx	51	0x52d195a0

all except
ExcDocument::WriteXml	164.96s	60.88%
ExcDocument::ReadDoc	93.82s	34.63%
are peanuts.

toto-6-1 ChildCalls:
ExcDocument::ReadDoc	107.01s	51.36%
ExcDocument::WriteXml	50.24s	24.12%

toto-6-2 ChildCalls:
ExcDocument::WriteXml	164.96s	60.88%
ExcDocument::ReadDoc	93.82s	34.63%

-> ExcDocument::WriteXml got much more expensive. Why...?
Comment 19 Armin Le Grand 2018-07-18 17:02:58 UTC
ChildCalls for ExcTable::WriteXml

toto-6-1 ChildCalls:
XclExpCellTable::Finalize	32.26s	64.22%
XclExpRecordList<XclExpRecordBase>::SaveXml	17.97s	35.78%

toto-6-2 ChildCalls:
XclExpRecordList<XclExpRecordBase>::SaveXml	156.67s	94.99%
XclExpCellTable::Finalize	8.27s	5.01%

For ImplEESdrWriter::ImplWriteShape
toto-6-1 ImplEESdrWriter::ImplWriteShape	3.26s	100.00%
toto-6-2 ImplEESdrWriter::ImplWriteShape	144.04s	100.00%
-> this got massively slower indeed :-(
Comment 20 Armin Le Grand 2018-07-18 17:13:48 UTC
Very strange - in ImplEESdrWriter::ImplWriteShape

toto-6-2 says:
EscherPropertyContainer::CreateGraphicProperties	140.83s	94.64%

while toto-6-2 *has no entry* for CreateGraphicProperties at all (!)

Checked in the source - there seems no change, in 6.1 and 6.2, ImplEESdrWriter::ImplWriteShape has six calls to CreateGraphicProperties. Why is it never called in 6.1 version? Why is it called in 6.2 version (at all then?). Have to debug this...
Comment 21 Armin Le Grand 2018-07-18 17:46:26 UTC
Difference seems that SvxShape::GetBitmap does something in 6.2 while 6.1 does nothing for drawing.Caption objects. This is because in 6.1

GetSdrObject()->IsInserted() -> false

while for 6.2

GetSdrObject()->IsInserted() -> true

What means that we have a case where 'IsInserted' is not used as 'this object is member of a page or group', but abused to suppress special operations.
I am pretty confident when I look into 6.1 (tomorrow) who and why is setting bInserted (member of SdrObject) I will see some hack for Calc version of SdrCaptionObj...

All this BTW in an official UNO API implementation for getting GetPropertyValue("Metafile") what means this will not work for 'special' XShapes what is by itself bad enough.
Comment 22 Armin Le Grand 2018-07-19 17:00:46 UTC
Changed lo-6-1 code to check -> there are SdrCaptionObj's that are not inserted to a SdrPage.
Changed lo-6-2 -> *all* SdrObjects including the SdrCaptionObj's *are* added to the SdrPage.
-> Need to find out where and why they get added in lo-6-2 but not in lo-6-1. Will be hard to find - how do you find a call that is *not* made...?
-> made sure with added code that SdrCaptionObj's are not removed again in lo-6-1
-> made sure that at/after import all SdrCaptionObj's are inserted in lo-6-1
-> Found XclObjComment - at save time, temporary SdrCaptionObj's are created for export - these get *not* added to a SdrPage (!)
Comment 23 Armin Le Grand 2018-07-19 17:09:03 UTC
I had added a huge comment in XclObjComment already when I removed the cloning of the SdrCaptionObject, see there (will delete this in the fix). It defines why this cannot be reverted - it leads to errors itself.
Another reason to not revert this is that SdrObjects should be inserted to a SdrPage/SdrGroup, else functionality is not guaranteed.
To fix this:
- Add an exclusive boolean flag to SdrCaptionObj, only accessible from XclObjComment, that will be used to suppress graphic creation in SvxShape::GetBitmap
- SvxShape::GetBitmap is a biiig time killer as we saw, mainly because always a full E3dView is constructed and used - ARGH! There are more elegant methods to create graphics nowadays. Will refresh that code. Metafile target can use SingleObjectPainter, Bitamp target can use convertPrimitive2DSequenceToBitmapEx directly.
- SdrObject::bInserted/IsInserted()/SetInserted can be simplified. No need for a local bool, change SetInserted to a state change (InsertedStateChange()), IsInserted() will be directly depend on being a member in a SdrObjList.

With this in place we may just let the SdrCaptionObj's create the graphics, it will be faster. But better not, this may increase file size and these are really not needed (at least nut until now). As I have seen most of these are PostIt objects anyways.
Comment 24 Commit Notification 2018-07-20 18:29:54 UTC
Armin Le Grand committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=cbc992e7370ab006ea7c0f8520896845f79f7749

tdf#118662 Cleanup old hack with cloned SdrCaptionObj

It will be available in 6.2.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 25 Timur 2018-07-23 13:14:49 UTC
(In reply to Timur from comment #12)
> I have very slow fileopen which is 30 seconds with 6.2+ in Windows 7. 
> Same system, Lin 5.4 is around 3 secs, Win 5.4 is around 5 secs.
It's around 8 secs now in Windows. 

There's still a dump:
FOLLOWUP_IP: 
cppuhelper3MSC!cppu::defaultBootstrap_InitialComponentContext+1780
694483d0 55              push    ebp
Comment 26 Armin Le Grand 2018-07-25 08:30:40 UTC
@Timur: The task is not about file open - and I see neither crash nor performance issue compared to versions before.
Still -compared lo-6-1 with master on a somewhat comparable base (both with debug code, both in debugger, both with the nearly same BG load on the machine):
- master: 91s
- lo-6-1: 109s
-> I see no slowdown - when being optimistic, maybe even a slight speedup - but that may be measurement/comparability :-)

For saving (save-as, use XSLX):
- master: 78s
- lo-6-1: 114s
-> Thjis is quite a gain in speed - same grain of salt with measurement issues, but difference is big enough to say that speedup is 'relevant'.

Of course load/save can never be fast enough :-) but the task was about a regression that I suppose to be fixed with this change (well.. seems more than fixed :-))
Comment 27 Xisco Faulí 2018-07-25 08:39:37 UTC
Hi Armin, I agree with you in comment 26. The regression is fixed and the situation has even improved compared to previous versions and IMHO it should be closed as RESOLVED FIXED now.
If new bugs are encountered, they should be reported in new bugs...
Comment 28 Timur 2018-07-25 11:23:49 UTC
OK. 
I noticed the first time fileopen with some LO version is longer than on subsequent loads, like 25 secs vs. 5 secs. It's when fileopen procdump happens. Can be another issue. 
There is no dump on multiple fileopen and filesave actions and filesave looks improved comparing to previous master. 
I guess this one is as good as it is with it's 5 (Win) or 3 (Lin) seconds for save action, per comment 9. Not what reporter hoped but it's like that from OO time.
Comment 29 Xisco Faulí 2018-07-27 10:15:35 UTC
@Armin, should it be cherry-picked to 6-1 ?
Comment 30 Armin Le Grand 2018-08-01 18:04:52 UTC
@Xisco: Could, but would not do so. The created pics for the mentioned UNO API *should* all be the same (or better - at least faster created), but let us use the time in master to get proof of that. Too picky for lo-6-1
Comment 31 Buovjaga 2018-10-01 17:11:43 UTC
Filesave stats on Win 10:
6.1.2: 15s
Latest master: 3,5s

Taken with stopwatch.
Comment 32 Xisco Faulí 2018-10-01 17:46:02 UTC
(In reply to Buovjaga from comment #31)
> Filesave stats on Win 10:
> 6.1.2: 15s
> Latest master: 3,5s
> 
> Taken with stopwatch.

@Buovjaga, thanks for testing.

@Armin, after https://cgit.freedesktop.org/libreoffice/core/commit/?id=4b4942224b550235da228655677b5c068a053254 was backported, do you think it should be backported ?
On the downside, we would introduce bug 119180 in 6-1...
Comment 33 Commit Notification 2019-05-09 10:12:01 UTC
Caolán McNamara committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/+/82dc7a39364af0ccc7357f0dffe2a2cd04632f4e%5E%21

crashtesting: failure importing tdf118662-1.xlsx

It will be available in 6.3.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.