Bug 35543 - FILEOPEN: hangs when trying to open SpreadsheetML
Summary: FILEOPEN: hangs when trying to open SpreadsheetML
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
3.3.1 release
Hardware: All All
: medium normal
Assignee: Peter Jentsch
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-03-22 07:37 UTC by Max Remov
Modified: 2016-05-30 18:57 UTC (History)
3 users (show)

See Also:
Crash report or crash signature:


Attachments
FILEOPEN: Spreadsheet hangs (213.89 KB, application/x-zip-compressed)
2011-03-22 08:03 UTC, Max Remov
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Max Remov 2011-03-22 07:37:10 UTC
MSO 2007 opens file attached normally. Under Ubuntu 10.10.
Comment 1 Kohei Yoshida 2011-03-22 07:45:57 UTC
Perhaps you forgot to attach your file?
Comment 2 Max Remov 2011-03-22 08:03:04 UTC
Created attachment 44713 [details]
FILEOPEN: Spreadsheet hangs

unpack the zip
Comment 3 Rainer Bielefeld Retired 2011-03-22 10:15:42 UTC
Reproduciblen with reporter's sample document and LibO 3.3.1 WIN7. I will do further tests soon.
Comment 4 Rainer Bielefeld Retired 2011-03-22 12:22:22 UTC
[Reproducible] with "LibreOffice 3.3.2RC2  – WIN7  Home Premium  (64bit) German UI [OOO330m19 (Build:202 / tag 3.3.2.2)]".
No Idea whether LibO supports SpreadsheetML, but of course, it should not hang or crash.

OOo-dev 3.4 also hangs, MS EXCEL Viewer was not able to open the sample document (gave message, did not hang), OOo 3.1.1 tries text-import.

@Kohei:
Can you please check the hang problem?
Comment 5 Kohei Yoshida 2011-03-22 12:35:49 UTC
I don't officially support Excel 2003 XML file format, which is what this file is.  The current file uses XSLT which is not scalable & chokes on large-ish files (like this document).  And the only way to make this filter perform better is to write a new one from scratch, using raw C++ and ditch the current, poor-performing XSLT based filter.

So, our current filter is "better than nothing" but please don't expect a speedy file opening experience.
Comment 6 Kohei Yoshida 2011-03-22 12:44:48 UTC
c.f. http://en.wikipedia.org/wiki/Microsoft_Office_XML_formats
Comment 7 Kohei Yoshida 2011-03-22 12:45:44 UTC
(In reply to comment #5)
> I don't officially support Excel 2003 XML file format, 

I meant to say "I don't think we officially support"...
Comment 8 Rainer Bielefeld Retired 2011-03-22 23:20:44 UTC
@Kohei
I agree with your comments concerning SpreadsheetML support.

Please excuse me, I strongly disagree with your classification of this bug report. The simple rule is:

  Opening a document never causes a crash or a hang.

The report shows a vulnerability of LibO, that can not be accepted. If a user tries to open a unsuspicious looking document with .xls name extensions, and suddenly the program hangs and he looses data of work from last 1/4 hour (or may be even worse), he will think some unfriendly things about LibO

Although it seems that SpreadsheetML are not so often used (it seems I never saw one before), this hang is a major problem.

So I restored Importance.

Please feel free to reassign if this hang problem isn't your area.
Comment 9 Kohei Yoshida 2011-03-23 06:42:14 UTC
It's not a hang.  It's just taking a long time to open.  There is a difference.  And to fix this we either remove this filter entirely or write one from scratch.  So changing the severity won't change the outcome.
Comment 10 Kohei Yoshida 2011-03-23 06:48:56 UTC
Nobody works in this area, to be brutally honest.
Comment 11 Max Remov 2011-03-23 07:27:17 UTC
Just comment: Gnumeric opens this file normally but gives error message at first: "'Page 1'!A6781 : Invalid attribute 'Height', expected number, received '13.6875'"
Comment 12 Kohei Yoshida 2011-03-23 08:20:10 UTC
(In reply to comment #11)
> Just comment: Gnumeric opens this file normally but gives error message at
> first: "'Page 1'!A6781 : Invalid attribute 'Height', expected number, received
> '13.6875'"

Then that would serve as your workaround for now; open it in gnumeric and save as the real xls file to open it in LibO.

The gnumeric folks are at least smart enough to avoid XSLT to write their import filter.  Their filter is written in pure C, which is the way it should be.
Comment 13 Peter Jentsch 2011-03-23 15:48:54 UTC
At first glance I see two problems processing the document: 

create-header-footer-style makes assumptions about the content of @x:Data in PageSetup/Footer that don't hold for the document at hand (content of @x:Data should be "&[0-9]&[A-Z]", but here its "&Ц&С".

then, processing the rows in the document does a recursion for each row, redundantly evaluating some rather expensive expressions each time. depending on the maximum allowed recursion depth of the processor, it either quits with an error at some point or just takes *really* long on a sufficiently large document.
Comment 14 Kohei Yoshida 2011-03-23 17:20:46 UTC
Reassigning to Peter as we discussed on the mailing list.  I'll be in CC.
Comment 15 Peter Jentsch 2011-03-24 23:34:54 UTC
The current xslt based import is not designed to open spreadsheets larger than a couple of hundret rows or columns. It's algorithm recurses into the same template for each row and column in the spreadsheet. It is not only incredibly slow but also bails out after reaching the maximum recursion depth of the xslt processor used or the maximum stack size, whichever is smaller (usually after some 2000 or 3000 iteration steps). 

Unfortunately, avoiding that recursion is not trivial, as calculating the current row/column index according to span rules and explicit row/column indices requires to modify a variable outside the iteration scope while looping over the rows / columns, which is not possible in xslt (at least not in xslt 1.0). 

To work around this limitation in xslt, it should be possible to write an extension function to simply store integers, thus avoiding the recursion. This, and possibly another extension function to do unit conversions, should speed up conversion considerably. 

So much for now.
Comment 16 Peter Jentsch 2011-03-26 06:36:00 UTC
I'd suggest setting the importance to normal, because the export actually returns after a couple of minutes, so LibO doesn't hang indefinetely, causing data loss.
Comment 17 Rainer Bielefeld Retired 2011-03-26 08:40:28 UTC
I can live with severity "normal". 
But I doubt that a normal users without knowledge concerning all this discussion here will be patient enough to wait 1/4 hour whether LibO will start to respond again. At the latest after 14 minutes he will lose hope that LibO will start responding again and kill LibO with task manager - dataloss!
Comment 18 Peter Jentsch 2011-03-27 12:46:59 UTC
I think it should be possible to use the InteractionHandler passed to the import/export filter to ask a "continue/abort" question whenever the import/export exceeds a predefined timeout, and I'm going to try to add that to the libxslt based xslt filter (because I know there's a thread started there doing the actual transformation, which I could watch and interrupt if necessary). Actually I think this could and should be done centrally, applying to all IO or at least to loading and saving any kind of document, but that's currently beyond my capabilities.  

That wouldn't improve importing the file attached to this bug, but would generelly prevent data-loss caused by broken filters.
Comment 19 Rainer Bielefeld Retired 2011-03-27 21:41:55 UTC
@Peter Jentsch:
It would be great to have that "emergency brake" in LibO
Comment 20 Peter Jentsch 2011-05-01 04:38:51 UTC
A (poor) fix for this is available on to master: it introduces a timeout of 60 seconds when importing files. When the timeout elapsed, a general IO error is announced along with the choice to cancel or continue. 

User-experience wise this can be improved considerably, but at least it fixes the "opening bogus file apparently crashes LibO" part of this bug (for XSLT based filters, at least).

@kohei: could you please have a look at the patch: it's my first commit, and altough I tested the fix/workaround with the file attached to this bug on a not too old state of master, I feel like it might do something stupid that I don't recognize as such. 

Fix committed as 
http://cgit.freedesktop.org/libreoffice/filters/commit/?id=2e9f9d82110342601d28408ae77d63b673993ebe

I'm marking this bug as resolved and would ask Max or Rainer to verify that the situation has improved and to file a new bug for the remaining part (LibO is still unable to import the attached file).
Comment 21 Kohei Yoshida 2011-05-04 07:08:48 UTC
(In reply to comment #20)

> @kohei: could you please have a look at the patch: it's my first commit, and
> altough I tested the fix/workaround with the file attached to this bug on a not
> too old state of master, I feel like it might do something stupid that I don't
> recognize as such. 

Sorry. I have zero experience in this area.  Could you send a note about your commit to the mailing list?  Someone else may have more experience in this area.
Comment 22 Muthu 2011-05-12 16:25:37 UTC
I guess throwing a dialog box every 60 sec is annoying...Can we instead, throw the dialog only for the first timeout and wait indefinitely when user chooses 'continue'? Or a way to disable the timer - e.g. say the user ticks 'huge file' from the file open dialog? Or better could be find the filesize before actually reading the file and adjust the timeout accordingly?
Comment 23 Muthu 2011-05-12 16:30:20 UTC
Also, I don't know if this makes sense, but, would it be possible to move this to one layer above so that it works for any type of file-open? Assuming we are able to dynamically set the timeout and avoid multiple dialogs/timeouts?
Comment 24 Peter Jentsch 2011-05-13 12:41:24 UTC
(In reply to comment #22)
> I guess throwing a dialog box every 60 sec is annoying...Can we instead, throw
> the dialog only for the first timeout and wait indefinitely when user chooses
> 'continue'? Or a way to disable the timer - e.g. say the user ticks 'huge file'
> from the file open dialog? Or better could be find the filesize before actually
> reading the file and adjust the timeout accordingly?

It'll be rather easy to display the warning only the first time the timeout elapses. So you'd be able to tell LibO to try as long as it takes to open a file. 

I would suggest not to invent any heuristics about when to show and when not to show the timeout: after all, it depends very much on the filter implementation which files will take how long to load. The file attached to this bug ain't particulary large with regards to file size, it's not even a very large spreadsheet.
Comment 25 Peter Jentsch 2011-05-13 12:50:39 UTC
(In reply to comment #23)
> Also, I don't know if this makes sense, but, would it be possible to move this
> to one layer above so that it works for any type of file-open? Assuming we are
> able to dynamically set the timeout and avoid multiple dialogs/timeouts?

That makes a lot of sense to me: 
I'd love the timeout to not show an unspecific error message, but work and look very much like the dialog that opens in FF whenever a script on a page takes to long to execute, like "this file takes unexpectedly long to load, do you want to continue or stop loading the file". And it would be great if the warning dialog doesn't block the ongoing load-process, but simple disappear if the load finished with out the user reacting to the notification. 

Unfortunately, that's not a trivial thing to do and currently beyond my skills.
Comment 26 Björn Michaelsen 2011-11-08 14:45:53 UTC
From reading the comment, this doesnt exactly sound like RESOLVED FIXED, or am I getting this wrong? Locks more like a RESOLVED WONTFIX/WORKSFORME to me, or do I miss something?
Comment 27 Rainer Bielefeld Retired 2011-11-08 22:48:04 UTC
MinGW Master "LibO-dev 3.5.0 – WIN7 Home Premium (64bit) English UI [(Build ID:  2ba5d12-e8c71c5-41e7bcd-4b83b90)] (daily/MinGW_cross-compilation2011-10-25_00.12.09)" 
shows an "General Input/Output Error" 90 seconds after I started opening attached sample document. What ever that might mean.

@Bjoern
I agree, this one is not really fixed, The patch might be useful to prevent LibO from hanging, but it's not a real solution of the problem.
Comment 28 Peter Jentsch 2011-11-09 13:04:17 UTC
(In reply to comment #27)
> MinGW Master "LibO-dev 3.5.0 – WIN7 Home Premium (64bit) English UI [(Build ID:
>  2ba5d12-e8c71c5-41e7bcd-4b83b90)]
> (daily/MinGW_cross-compilation2011-10-25_00.12.09)" 
> shows an "General Input/Output Error" 90 seconds after I started opening
> attached sample document. What ever that might mean.
> 
> @Bjoern
> I agree, this one is not really fixed, The patch might be useful to prevent
> LibO from hanging, but it's not a real solution of the problem.

Hi all, 

I tried to summarize the current status in comment #20: it's fixed with respect to hanging LibO (and thus potentially losing changes), but not regarding the broken Excel2003ML import filter. I'd still suggest to file a new bug for the broken excel 2003 xml import. As an alternative, we could change the subject to remove the dreaded HANG keyword (because it no longer hangs :-)

I stopped working on the issue both because of limited ability and uncertainty about the gravity of the remaining problem: Excel2003 XML should be quickly diminishing in use.

Really fixing the problem would at least require to develop and extension function for libxslt or saxon that allows to calculate the cell position in the sheet, whatever the remaining shortcomings of that particular import filter are. 

What about moving the filter out to an extension? We'd loose word2003/excel2003 impex in the core, but the support is really limited currently and it'd be easier to explain the limitations to sb who deliberately installs the filter extension?