Bug 95225 - scalc is unreasonably slow to open and close with test file
Status: RESOLVED WORKSFORME
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version (earliest affected): unspecified
Hardware: All Linux (All)
Importance: low normal
Assignee: Not Assigned
URL:
Whiteboard: interoperability
Keywords: filter:xlsx, haveBacktrace, perf
Depends on:
Blocks: XLSX
Reported: 2015-10-21 13:19 UTC by John
Modified: 2017-08-18 23:03 UTC

Attachments
A file with empty cells that loads really slowly (12.43 KB, application/vnd.oasis.opendocument.spreadsheet)
2015-10-21 13:23 UTC, John
callgrind_1 (215.13 KB, text/plain)
2015-10-24 19:48 UTC, Joel Madero
Good Callgrind File (7.35 MB, text/plain)
2015-10-24 20:40 UTC, Joel Madero
Excel file that LibreOffice has trouble with (14.49 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2015-11-12 14:58 UTC, John

Description John 2015-10-21 13:19:34 UTC
User-Agent:       Mozilla/5.0 (X11; Linux x86_64; rv:41.0) Gecko/20100101 Firefox/41.0
Build Identifier: LibreOffice Version: 5.0.2.2 and earlier versions

The test file is cut down from one produced by Quickbooks via Microsoft Excel,
opened by scalc (very slowly) saved as .ods (very slowly).
I have deleted all cell content.  Nevertheless it takes several seconds to open and to complete the rendering of this file.
I've unzipped the file and the slowness appears to be due to sections in content.xml beginning
<form:textarea form:name="FILTER"
    form:control-implementation="ooo:com.sun.star.form.component.TextField"
    xml:id="control1" form:id="control1"
    form:current-value="AQCgFlByb2ZpdCAmIExvc3MAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

I don't know what these are, but apparently LibreOffice Calc thinks they mean something and wastes a lot of time trying to do something with them.
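
The inspection described above (unzip the .ods and look in content.xml for form controls carrying a very long value) could be scripted roughly as follows. This is only a sketch: the function name long_form_values and the 1000-character threshold are made up for illustration, and it assumes the attribute lives in the standard OpenDocument form namespace.

```python
import zipfile
import xml.etree.ElementTree as ET

# OpenDocument "form:" namespace, as used by form:textarea / form:current-value.
FORM_NS = "urn:oasis:names:tc:opendocument:xmlns:form:1.0"


def long_form_values(ods, min_len=1000):
    """Return (control name, value length) for every element in content.xml
    whose form:current-value is at least min_len characters long.

    `ods` may be a path or a file-like object; `min_len` is an arbitrary
    illustrative threshold, not something LibreOffice itself uses.
    """
    with zipfile.ZipFile(ods) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    hits = []
    for elem in root.iter():
        value = elem.get(f"{{{FORM_NS}}}current-value")
        if value is not None and len(value) >= min_len:
            hits.append((elem.get(f"{{{FORM_NS}}}name"), len(value)))
    return hits
```

Calling long_form_values("ReallyEmptySLOW.ods") on the test file should list any control whose stored value is suspiciously large.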

Reproducible: Always

Steps to Reproduce:
1. At the command line, type
/opt/libreoffice5.0/program/scalc ReallyEmptySLOW.ods
2. Watch how long it takes before scalc is ready to go...



I would love to upload the test file (only 12 kilobytes) but I don't see a way to do it here.

[Information automatically included from LibreOffice]
Locale: en-US
Module: SpreadsheetDocument
[Information guessed from browser]
OS: Linux (All)
OS is 64bit: yes


Reset User Profile? No
Comment 1 John 2015-10-21 13:23:43 UTC
Created attachment 119827 [details]
A file with empty cells that loads really slowly

Here is the test file that LibreOffice takes several seconds (on an intel i7) to open.
It is not an artificial example; it is cut down from a file created by Quickbooks via MS Excel.
Comment 2 Joel Madero 2015-10-21 15:15:10 UTC
Can you better describe how you're making this file? I don't fully understand it (but I don't use Quickbooks either). Are you making it, creating an xlsx, then converting that to ODS?

FWIW: the content of the XML is not the same as an empty spreadsheet (thus however you are making it is adding a bunch of crap, which explains the slowness). It also opens with some kind of an error (the XML) in Firefox saying that it's a bad XML file (lacks structure elements).

Your File XML: http://pastebin.com/jvzdBzSj
An empty spreadsheet created with LibreOffice: http://pastebin.com/NfjdyX19

Given that the files are dramatically different in content (even if visually they look the same), I am closing this as NOTOURBUG. There is no reason to believe this is our issue; however you are creating the file is creating extra stuff (which apparently isn't even right, as Firefox spits out errors about XML structure being messed up), and thus that's the issue.

Thanks
Comment 3 Joel Madero 2015-10-21 17:49:21 UTC
Hi There,

It does parse it but that takes time - thus probably the cycles. I'll push this back to UNCONFIRMED (for now) and try to find an expert to give a bit more feedback. 

I might also suggest reporting to Quickbooks (do they still exist?) and say "why do you add all this garbage to your xml exports..."
Comment 4 Eike Rathke 2015-10-21 20:04:01 UTC
As already noticed, the document is not empty. It contains two text area form controls in exactly the same place and both have their visibility set to invisible, hence the document *appears* to be empty. One of the forms contains a long string (7724 characters) as text area value, which is a Base64 encoded binary. I didn't investigate further, but it might be some graphics related file and contains (apparently the title) "Profit & Loss" and some notions of Arial font.
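
The Base64 observation can be checked against the fragment of the value quoted in the description. The sketch below decodes only the first 24 characters of that fragment (the rest is truncated in the report, so nothing beyond it is assumed) and pulls out runs of printable ASCII; the helper name printable_runs is illustrative.

```python
import base64
import re

# First 24 characters of the form:current-value quoted in the description.
# Only this prefix is used because the value is truncated in the report.
PREFIX = "AQCgFlByb2ZpdCAmIExvc3MA"


def printable_runs(data):
    """Return every run of four or more printable ASCII bytes in `data`,
    similar in spirit to the Unix `strings` tool."""
    return [m.group().decode("ascii")
            for m in re.finditer(rb"[\x20-\x7e]{4,}", data)]


decoded = base64.b64decode(PREFIX)
print(printable_runs(decoded))  # ['Profit & Loss']
```

The bytes around the title are non-printable, which fits the description of an embedded binary blob rather than text meant for display.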

The time spent might be due to the long string: the code tries to wrap it into a text area, but since there are no words it can't be wrapped and the iterator just eats along the string.

If someone wants to run it with callgrind and find the actual bottleneck of this abuse.. until then regard it as hit shappens.
Comment 5 Joel Madero 2015-10-21 20:24:13 UTC
moving to NEEDINFO - 

@John - if you want to do a callgrind then feel free to move it back to UNCONFIRMED so it can be investigated further.

export VALGRIND=callgrind

then just run soffice

Thanks
Comment 6 Joel Madero 2015-10-21 20:34:39 UTC
Just FYI - you need to build LibreOffice with symbols for this to work...I may try it myself. Poke me privately through email to remind me in a week if you're interested in it getting done
Comment 7 John 2015-10-22 09:06:18 UTC
(In reply to Joel Madero from comment #2)
> Can you better describe how you're making this file? I don't fully
> understand it (but I don't use Quickbooks either). Are you making it,
> creating an xlsx, then converting that to ODS?
> 
> FWIW: the content of the XML is not the same as an empty spreadsheet (thus
> however you are making it is adding a bunch of crap, which explains the
> slowness). It also opens with some kind of an error (the XML) in Firefox
> saying that it's a bad XML file (lacks structure elements).
> 
> Your File XML: http://pastebin.com/jvzdBzSj
> An empty spreadsheet created with LibreOffice: http://pastebin.com/NfjdyX19
> 
> Given that the files are dramatically different in content (even if visually
> they look the same), I am closing this as NOTOURBUG. There is no reason to
> believe this is our issue; however you are creating the file is creating
> extra stuff (which apparently isn't even right, as Firefox spits out errors
> about XML structure being messed up), and thus that's the issue.
> 
> Thanks

In Quickbooks, you can ask for a report; by default it creates the
report as an Excel xlsx file, which I then save from Excel.
This file is very slow to open in LibreOffice and also LibreOffice
is slow to close having opened it.  Microsoft Excel doesn't seem
to have trouble with it (but I can't check properly, I use Linux,
don't have Excel here, only at work with Quickbooks).

So, I did Ctrl-A Ctrl-minus to delete cell content, and saved it
as a .ods file.  This file is very slow to open in LibreOffice and
also LibreOffice is slow to close having opened it.

John
Comment 8 John 2015-10-22 09:15:21 UTC
(In reply to Eike Rathke from comment #4)
> As already noticed, the document is not empty. It contains two text area
> form controls in exactly the same place and both have their visibility set
> to invisible, hence the document *appears* to be empty. One of the forms
> contains a long string (7724 characters) as text area value, which is a
> Base64 encoded binary. I didn't investigate further, but it might be some
> graphics related file and contains (apparently the title) "Profit & Loss"
> and some notions of Arial font.
> 
> The time spent might be due to the long string: the code tries to wrap it
> into a text area, but since there are no words it can't be wrapped and the
> iterator just eats along the string.
> 
> If someone wants to run it with callgrind and find the actual bottleneck of
> this abuse.. until then regard it as hit shappens.

Eike, of course you're right in saying
> It contains two text area form controls
I can see this in the content.xml

You wrote:
> in exactly the same place and both have their visibility set
> to invisible, hence the document *appears* to be empty.
I didn't notice that.

You wrote:
> I didn't investigate further, but it might be some
> graphics related file and contains (apparently the title) "Profit & Loss"
> and some notions of Arial font.
That all makes sense, since Quickbooks was exporting a Profit & Loss report.

I've had a quick look at the original .xlsx file saved from Excel.
It has a section
<controls>
<control shapeId="1025" r:id="rId3" name="FILTER"/>
<control shapeId="1026" r:id="rId4" name="HEADER"/>
</controls>
which I suppose generated the offending section in the .ods file beginning
<form:textarea form:name="FILTER"
    form:control-implementation="ooo:com.sun.star.form.component.TextField"
    xml:id="control1" form:id="control1"
    form:current-value="AQCgFlByb2ZpdCAmIExvc3MAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

When I have access to a MS Windows machine, I'll cut down the file in MS Excel
and save as xlsx.  One could argue that the original bug is in reading the xlsx file.
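
To check which worksheet controls an .xlsx declares without opening it in any office suite, something like the following could be used. It is a sketch only: sheet_controls is a made-up name, and it assumes the sheet is stored at the conventional path xl/worksheets/sheet1.xml and that <controls> sits directly under the worksheet root in the main SpreadsheetML namespace.

```python
import zipfile
import xml.etree.ElementTree as ET

# Main SpreadsheetML namespace used by worksheet parts in .xlsx files.
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def sheet_controls(xlsx, sheet="xl/worksheets/sheet1.xml"):
    """Return (name, shapeId) for each <control> listed under <controls>
    in the given worksheet part. The default sheet path is an assumption."""
    with zipfile.ZipFile(xlsx) as zf:
        root = ET.fromstring(zf.read(sheet))
    path = f"{{{MAIN_NS}}}controls/{{{MAIN_NS}}}control"
    return [(c.get("name"), c.get("shapeId")) for c in root.findall(path)]
```

An empty list would mean the sheet declares no controls at that location; controls wrapped in other elements would need a different search path.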

John
Comment 9 John 2015-10-22 09:21:38 UTC
(In reply to Joel Madero from comment #3)
> Hi There,
> 
> It does parse it but that takes time - thus probably the cycles. I'll push
> this back to UNCONFIRMED (for now) and try to find an expert to give a bit
> more feedback. 
> 
> I might also suggest reporting to Quickbooks (do they still exist?) and say
> "why do you add all this garbage to your xml exports..."

Quickbooks was bought by Intuit years ago, I think, but they still develop and market it.  I'm using Quickbooks2012.  Nowadays they try to get you to use a cloud version.

At the moment, I'm not sure I can pin "adding garbage" on Quickbooks.
Maybe the garbage gets added when LibreOffice opens the xlsx file?
As I said elsewhere, I'll try to make a cut-down xlsx file so you can see where
the trouble started.

John
Comment 10 Joel Madero 2015-10-22 14:43:22 UTC
You can verify quickly that Quickbooks is the one adding the garbage to the file. Export the xlsx and, without opening it in LibreOffice, unzip it and check the content of the XML. I'm relatively sure it's not us.
Comment 11 Joel Madero 2015-10-24 19:23:19 UTC
Attached is a callgrind from master. It seems to open slightly faster in master now but it's still a bit slow.

@Eike - any further thoughts on this? Punting back to UNCONFIRMED for now but Eike if you think it's just what it is - let's just close as NOTABUG.
Comment 12 Joel Madero 2015-10-24 19:48:22 UTC
Created attachment 119930 [details]
callgrind_1
Comment 13 Joel Madero 2015-10-24 20:19:27 UTC
Sorry that callgrind is garbage and I'm working on another one (they take about 30 minutes to do).

That being said I'm pushing this back to NEEDINFO (sorry for the noise)

@John: Another thing you can do is use an ODS validator to make sure the file is actually a valid file, and also check the xls file with an xls validator. You can find these with google pretty easily :) If you get a chance to do this (thanks if you do!) please report your findings.
Comment 14 Joel Madero 2015-10-24 20:38:32 UTC
Got it this time. Attaching but leaving in NEEDINFO until we get feedback about the xls validation and ods validation.

@John - thanks a million for working with us. There's still a chance this gets closed but you've been a great help trying to figure out what in the world is going on.

If you're ever interested in helping out the QA team - basically by triaging and going through these precise steps on other bugs - we'd love to have you swing by the chat. 

http://webchat.freenode.net/?channels=libreoffice-qa
Comment 15 Joel Madero 2015-10-24 20:40:03 UTC
Created attachment 119931 [details]
Good Callgrind File
Comment 16 John 2015-11-12 14:58:16 UTC
Created attachment 120503 [details]
Excel file that LibreOffice has trouble with
Comment 17 John 2015-11-12 15:09:05 UTC
12Nov2015 Sorry for delay, I've not had access to MS Windows
I attach NearEmptyXLfromQB.xlsx an Excel file, created from Quickbooks.
It is very nearly empty.
It opens and closes quickly in Excel.
However, in LibreOffice it opens very slowly (tens of seconds), and when I close LibreOffice it hangs indefinitely (over a minute, at least) with the process still running.
The earlier ODS "file with empty cells that loads really slowly" was made from an xlsx file similar to NearEmptyXLfromQB.xlsx by opening xlsx file in LibreOffice, deleting cells and saving as an ODS.
So I think NearEmptyXLfromQB.xlsx is closer to the original problem.
It doesn't appear to have the form:textarea stuff of the ODS file (but then I suppose the source of the ODS form:textarea in XLSX will look different in OOXML).
Comment 18 John 2015-11-12 15:13:56 UTC
(In reply to Joel Madero from comment #14)
> @John - thanks a million for working with us. There's still a chance this
> gets closed but you've been a great help trying to figure out what in the
> world is going on.
> 
> If you're ever interested in helping out the QA team - basically by triaging
> and going through these precise steps on other bugs - we'd love to have you
> swing by the chat. 
> 
> http://webchat.freenode.net/?channels=libreoffice-qa

Thank you for the friendly comment.  As you see from the delay since my last,
I don't have a lot of time.  So probably you wouldn't want to wait weeks for me to get round to checking if I could reproduce a reported bug.

John
Comment 19 John 2015-11-12 17:10:51 UTC
I tried opening the file with gnumeric (apologies if that's a swear word here :-)
Gnumeric said 
Unexpected element 'controls' in state : worksheet
So, unzip NearEmptyXLfromQB.xlsx into a directory named unzipped,
edit xl/worksheets/sheet1.xml to remove <controls> ... </controls>,
then zip it up again (from inside unzipped) with
   zip -r ../NearEmptyXLfromQBnocontrols.xlsx *
Now LibreOffice opens it quickly and saves it quickly.
So that is the bug: LibreOffice can't handle <controls> ... </controls> in
the context of the xl/worksheets/sheet1.xml
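
The manual unzip / edit / zip workaround above could be automated along these lines. This is a rough sketch, not a recommended fix: strip_controls is an illustrative name, the sheet path is taken from the steps above, and the regular expression assumes the element appears literally as <controls> ... </controls> with no attributes or namespace prefix. Like the manual edit, it leaves the control parts and their relationships in the package untouched.

```python
import re
import zipfile

# Matches the whole <controls> ... </controls> block, including newlines.
CONTROLS = re.compile(rb"<controls>.*?</controls>", re.DOTALL)


def strip_controls(src, dst, sheet="xl/worksheets/sheet1.xml"):
    """Copy the .xlsx at `src` to `dst`, dropping the <controls> block
    from the given worksheet part and copying every other entry as-is."""
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for info in zin.infolist():
            data = zin.read(info.filename)
            if info.filename == sheet:
                data = CONTROLS.sub(b"", data)
            zout.writestr(info, data)
```

For example, strip_controls("NearEmptyXLfromQB.xlsx", "NearEmptyXLfromQBnocontrols.xlsx") would produce a file equivalent to the hand-edited one.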
Comment 20 Joel Madero 2015-12-06 18:38:25 UTC
Moving this to NEW as it is confirmed but in all honesty I don't see any developer tackling this soon. Quickbooks is doing some wonky export that looks empty but is not. This is a corner case bug and not affecting a lot of users (probably just 1 :) )

Moving to:
NEW
Normal - can slow down professional quality work but not going to prevent it (long time to load but it does in fact load)
Low - down from medium as this is definitely a corner case.
Comment 21 Yousuf Philips (jay) (retired) 2017-07-02 22:30:05 UTC
@Meeks, @Aron: A calc perf issue that Joel has done a callgrind for in comment 15. How can we move this issue along?
Comment 22 Michael Meeks 2017-07-03 09:41:40 UTC
> How can we move this issue along?

Great to have the callgrind trace; I guess we're waiting for a developer to have time to optimize for this case. Failing that, the next steps would be reading the callgrind trace in kcachegrind, working out which functions take most of the time, and then reading the code to see if there is anything clearly silly there. I'd encourage anyone interested to do that; the closer this gets to code, the easier & more interesting it is for developers to get involved. The KCachegrind view is excellent for understanding the code & data flow, which makes fixing this a reasonably easy entry-level task.

HTH.
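
For a first look at which functions dominate a callgrind trace without a GUI, a very simplified reader such as the one below may be enough. It is a sketch under several assumptions: the trace uses a single position column ("positions: line") and the first event column is the one of interest, and only self cost is summed (the cost line following a calls= line is the callee's inclusive cost and is skipped). The function name self_cost_by_function is made up; callgrind_annotate or KCachegrind remain the proper tools.

```python
import re
from collections import defaultdict

# "(12) name" defines compressed id 12; a bare "(12)" refers back to it.
_COMPRESSED = re.compile(r"^\((\d+)\)(?:\s+(.*))?$")


def self_cost_by_function(lines):
    """Sum the first event column per function (self cost only) and return
    (function, cost) pairs sorted by descending cost."""
    names = {}
    costs = defaultdict(int)
    current = None
    skip_next = False

    def resolve(spec):
        m = _COMPRESSED.match(spec)
        if not m:
            return spec  # uncompressed name
        ident, name = m.groups()
        if name:
            names[ident] = name
        return names.get(ident, f"({ident})")

    for raw in lines:
        line = raw.strip()
        if line.startswith("fn="):
            current = resolve(line[3:])
        elif line.startswith("cfn="):
            resolve(line[4:])  # only record the name; caller stays current
        elif line.startswith("calls="):
            skip_next = True   # next cost line is the callee's inclusive cost
        elif line and (line[0].isdigit() or line[0] in "+-*"):
            if skip_next:
                skip_next = False
                continue
            fields = line.split()
            if current is not None and len(fields) >= 2:
                costs[current] += int(fields[1])
    return sorted(costs.items(), key=lambda kv: kv[1], reverse=True)
```

Passing an open trace file to self_cost_by_function and printing the first few pairs gives a rough ranking to start reading code from.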
Comment 23 John 2017-08-18 19:55:48 UTC
Hi, this is John, I reported the bug originally.

Just to re-cap, my problem was an xlsx file generated by Quickbooks.
LibreOffice loaded it REALLY slowly.  Furthermore, if I saved the xlsx file from LibreOffice as an ods file, that loaded REALLY slowly too.

I have just installed LibreOffice_5.4.0.3_Linux_x86-64_deb on my Debian Jessie
system.  This version of LibreOffice opens xlsx files generated by Quickbooks
very promptly. It is a bit slow opening the file I saved from the much older LibreOffice as an ods file, but that, I think, is because the older LibreOffice mangled the file.

So, to sum up, I consider that this bug has been fixed in version 5.4.0.3
(and perhaps in earlier versions).

I am very happy!  Thank you to all those who contributed to the fix.

A Paypal donation is on its way.
Comment 24 Aron Budea 2017-08-18 23:03:31 UTC
Glad to hear that. This can be closed as WORKSFORME, then.