Bug 99499 - FILEOPEN: ODS: Calc file with linked images take a long time to load
Status: RESOLVED WORKSFORME
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version (earliest affected): 5.0.6.1 rc
Hardware: x86-64 (AMD64) All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard: target:5.3.0
Keywords: perf
Depends on:
Blocks: Calc-Images
Reported: 2016-04-25 19:43 UTC by John Navratil
Modified: 2019-04-02 14:43 UTC

See Also:
Crash report or crash signature:


Attachments
Spreadsheet which exhibits failure (25.51 KB, application/vnd.oasis.opendocument.spreadsheet)
2016-04-25 19:43 UTC, John Navratil
The document is NOT conformant ODF1.2! (28.26 KB, text/plain)
2016-04-25 20:54 UTC, raal

Description John Navratil 2016-04-25 19:43:04 UTC
Created attachment 124629
Spreadsheet which exhibits failure

I upgraded Fedora 23, which upgraded the Fedora build of Calc to 5.0.6.1.  The attached spreadsheet can no longer be opened; I waited 2 hours.  I tried on another machine (also upgraded) with the same result.  Before the upgrade I had noticed that this spreadsheet took what seemed to be a long time (several seconds) to load, so I suspect something in the spreadsheet itself is the cause.

'ps' shows minimal CPU usage by the Calc process: it consumes a second of user time every several minutes.  The system monitor doesn't show memory being consumed, but there is 10 KiB/s of network traffic in and out (the same data volume in each direction) which I cannot explain.

The application could be shut down without resorting to 'kill', in which case I was prompted to wait or to force a shutdown.  No other LO application will start while Calc is in this state.

An attempt to start Writer, for example, fails with no apparent action.  When Calc is shut down again after attempting to start Writer, a Calc instance is immediately started to begin a recovery of the default empty spreadsheet.  No such recovery attempt is initiated if there was no attempt to start Writer.

LO 5.0.6.1 ('about' doesn't say 'rc')
Build ID: 5.0.6.1-1.fc23

uname -r: 4.4.7-300.fc23.x86_64+debug
Comment 1 John Navratil 2016-04-25 19:53:23 UTC
After stopping Calc, the observed 10 KiB/s of network traffic dropped to zero.
Comment 2 raal 2016-04-25 20:54:59 UTC
Created attachment 124630
The document is NOT conformant ODF1.2!

Hello,
the document is not conformant ODF1.2.  http://odf-validator.rhcloud.com/
Comment 3 John Navratil 2016-04-25 21:28:58 UTC
Thank you very much for pointing this out to me.

It does raise the question of how a document maintained by LibreOffice can become non-conformant, but that is a question for another day.

It would also seem to be a bug that a non-conformant document would cause Calc to hang up.  Shouldn't I be able to attempt to open my mother's recipe for Chicken Cacciatore without such behavior?  What was Calc attempting with all the network traffic?  It would seem that more effective defense against malformed input is in order.  (I give you my spreadsheet as an example :)

I recovered the data by opening this file in Microsoft Excel 2007 which complained about some unreadable content, apparently dropped the formulae, but allowed me to export a CSV.  I would hope that LO could do the same.
Comment 4 MM 2016-04-25 21:48:17 UTC
(In reply to raal from comment #2)
> Created attachment 124630
> The document is NOT conformant ODF1.2!
> 
> Hello,
> the document is not conformant ODF1.2.  http://odf-validator.rhcloud.com/

Sorry, but even a little text [one or two words] saved as odt, or a few numbers in Calc saved as ods, are already too much for this validator in conforming mode: it says 'The document is NOT conformant ODF1.2!'
But it works with extended conforming. So how much can you trust it then?!

Same with the reporter's file, which passes with 'extended conforming'.
Comment 5 Maxim Monastirsky 2016-04-25 21:53:23 UTC
(In reply to raal from comment #2)
> The document is NOT conformant ODF1.2!
This is normal. See https://wiki.documentfoundation.org/ODF#ODF_Extensions
Comment 6 Maxim Monastirsky 2016-04-25 23:09:20 UTC
The spreadsheet has some references to a remote image - http://www.digikey.com/web%20export/common/mkt/en/help.png?requestedName=help?requestedName=help?requestedName=help?requestedName=help?requestedName=help

After unzipping and replacing all occurrences of the above inside content.xml with some dummy url, it opens fine in a few seconds.
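
The manual workaround above (unzip, edit content.xml, re-zip) could be scripted roughly as follows. This is only a sketch of the idea, not something taken from the LibreOffice code base: the function name `replace_link_in_ods` and its parameters are made up for illustration, and it assumes the link appears in content.xml exactly as the string you pass in (URLs inside XML may be escaped, for example `&` written as `&amp;`).

```python
import zipfile


def replace_link_in_ods(src_path, dst_path, old_url, new_url):
    """Copy an ODS package, replacing old_url with new_url in content.xml.

    Hypothetical helper for illustration. Returns the number of
    occurrences that were replaced.
    """
    replaced = 0
    with zipfile.ZipFile(src_path) as src, zipfile.ZipFile(dst_path, "w") as dst:
        # Keep the original entry order so "mimetype" stays first.
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == "content.xml":
                text = data.decode("utf-8")
                replaced = text.count(old_url)
                data = text.replace(old_url, new_url).encode("utf-8")
            # Reuse each entry's own compression method, so an
            # uncompressed (stored) "mimetype" entry stays uncompressed.
            dst.writestr(info, data, compress_type=info.compress_type)
    return replaced
```

Writing to a new file rather than editing in place keeps the original attachment untouched, which seems the safer choice when the goal is data recovery.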
Comment 7 John Navratil 2016-04-25 23:56:32 UTC
(In reply to Maxim Monastirsky from comment #6)
> The spreadsheet has some references to a remote image -
> http://www.digikey.com/web%20export/common/mkt/en/help.png?requestedName=help?requestedName=help?requestedName=help?requestedName=help?requestedName=help
> 
> After unzipping and replacing all occurrences of the above inside
> content.xml with some dummy url, it opens fine in a few seconds.

I repeated your actions (learned something in the process - thanks) and confirm the file is recovered.

At this point, I'm conflicted.  On the one hand, the behavior exhibited by Calc when attempting to open this file is very unfriendly.  It not only gave no clue as to the problem, but left no avenue to repair it.

On the other hand, perfect defense is impossible and this would seem to be an odd case (I'm not really sold on this argument).  Excel 2007 opened it, and without that avenue (or your manual edits) I'd have been out of luck.

I also opened that URL (they are all the same) to confirm its validity and am still unable to load my original spreadsheet.  It would seem that whatever failure has been triggered ought to be addressed.  Even if the URL were inaccessible, a 5-minute timeout should have fired an exception.  Instead I saw continuous network activity.  It seems that something is mishandling that URL.
Comment 8 raal 2016-04-26 04:50:16 UTC
Thanks for correcting me about the validator. I found bug 52547, which is the reason why the images are linked and not included in the file.

Similar writer bug 42742. Setting to new
Comment 9 MM 2016-05-14 10:43:26 UTC
(In reply to raal from comment #8)

> Similar writer bug 42742. Setting to new

Seems like a dup then.
Comment 10 John Navratil 2016-05-15 00:23:08 UTC
(In reply to MM from comment #9)
> (In reply to raal from comment #8)
> 
> > Similar writer bug 42742. Setting to new
> 
> Seems like a dup then.

Could be a dup.  However, in my case there were 15 links to the same existing PNG file and 10 KiB/s of network traffic for two hours without anything opening.  That is slightly different behavior from slow loading.
Comment 11 Giuseppe Castagno (aka beppec56) 2016-08-11 20:49:04 UTC
It seems there is a cyclical redirection driven by the web server.

The historical limit is 5 redirections; I am going to implement that in LO.
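
To illustrate the idea of capping redirections (this is not the LO patch, just a minimal sketch), a client can count the redirects it follows and give up once the limit is exceeded. The names `resolve`, `fetch` and `TooManyRedirects` are hypothetical; `fetch` stands in for whatever performs one HTTP request and is assumed to return a `(status, location)` pair.

```python
MAX_REDIRECTS = 5  # the limit mentioned above

REDIRECT_STATUSES = (301, 302, 303, 307, 308)


class TooManyRedirects(Exception):
    """Raised when a URL redirects more often than the allowed limit."""


def resolve(url, fetch, max_redirects=MAX_REDIRECTS):
    """Follow redirects starting at url and return the final URL.

    fetch(url) is a hypothetical callable returning (status, location),
    where location is the redirect target or None.
    """
    # One request for the original URL plus one per allowed redirect.
    for _ in range(max_redirects + 1):
        status, location = fetch(url)
        if status not in REDIRECT_STATUSES or location is None:
            return url
        url = location
    # Still being redirected after max_redirects hops: stop here
    # instead of looping forever on a cyclical redirection.
    raise TooManyRedirects(f"more than {max_redirects} redirections")
```

With a server that redirects in a cycle, `resolve` raises after six requests rather than generating network traffic indefinitely.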
Comment 12 Giuseppe Castagno (aka beppec56) 2016-08-12 10:03:12 UTC
BZ was offline, reporting the commit myself.

I committed a patch related to this issue, to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=18009fe8fbe3982141ddca3f1fcd0900a63150a6

Related: tdf#99499, add a limit to the number of http redirections

It allows the Calc file to load and open.

Current daily on Linux:
Version: 5.3.0.0.alpha0+
Build ID: 18009fe8fbe3982141ddca3f1fcd0900a63150a6
CPU Threads: 8; OS Version: Linux 3.13; UI Render: default; 
TinderBox: Linux-rpm_deb-x86_64@70-TDF, Branch:master, Time: 2016-08-12_06:44:06
Locale: en-US (en_US.UTF-8); Calc: group

has the patch; it can be used to test the loading.

There are other issues to be addressed, though; I'm still investigating.
Comment 13 Giuseppe Castagno (aka beppec56) 2016-08-14 10:37:54 UTC
LO currently has a limitation on the URI web queries it can perform.
For example, in this bug, the URI:
<http://www.digikey.com/web%20export/common/mkt/en/help.png?requestedName=help?requestedName=help?requestedName=help?requestedName=help?requestedName=help>

for LO should instead be simplified into:
<http://www.digikey.com/web%20export/common/mkt/en/help.png?requestedName=help>

The other URI:
<https://sealserver.trustkeeper.net/seal_image.php?customerId=84EDAB68F81B2B31985E5E20392A8AC1&amp;size=105x54&amp;style=normal?requestedName=seal_image?requestedName=seal_image?requestedName=seal_image?requestedName=seal_image>

for LO should instead be simplified into:
<https://sealserver.trustkeeper.net/seal_image.php?customerId=84EDAB68F81B2B31985E5E20392A8AC1&amp;size=105x54&amp;style=normal>

This kind of simplification only solves this specific instance; it would not work on other possible instances (e.g. with different, more complex web queries).

This needs further, longer work.
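
For illustration only, both simplifications listed above happen to be what you get by cutting the URI at its second `?`. That rule is my reading of the two examples, not something stated in the patch, and as noted it would not generalize to other malformed queries. The function name `truncate_at_second_question_mark` is made up.

```python
def truncate_at_second_question_mark(uri):
    """Drop everything from the second '?' onward (hypothetical helper).

    A URI has at most one '?' introducing the query part, so a second
    one is treated here as the start of the repeated junk.
    """
    first = uri.find("?")
    if first == -1:
        return uri  # no query at all, nothing to trim
    second = uri.find("?", first + 1)
    if second == -1:
        return uri  # a single, ordinary query
    return uri[:second]
```

Applied to the two URIs in this comment, it yields the two simplified forms shown, with the `&amp;` sequences left untouched.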
Comment 14 Xisco Faulí 2017-09-11 08:41:12 UTC
Dear developer,
This bug has been in ASSIGNED status for more than 3 months without any
activity. Resetting it to NEW.
Please assign it back to yourself if you're still working on this.
Comment 15 QA Administrators 2018-09-12 02:38:26 UTC Comment hidden (obsolete)
Comment 16 Roman Kuznetsov 2018-09-12 10:59:43 UTC
http://bugs.documentfoundation.org/attachment.cgi?id=124629 opens fast in

Version: 6.1.1.1
Build ID: 2718b4a18dfcc6a54ebe5f7b801ee7a47fa81e0c
CPU threads: 4; OS: Windows 6.1; UI render: default; 
Locale: ru-RU (ru_RU); Calc: group threaded

but I don't see any images in the spreadsheet. There are some links to PDFs in some cells.

@Xisco: what do you think now about this bug?
Comment 17 Xisco Faulí 2019-04-02 14:42:49 UTC
it takes

real	0m6,890s
user	0m2,263s
sys	0m0,232s

in

Version: 6.3.0.0.alpha0+
Build ID: 3b518953a8141b0d5043c2f3996a92956fdc3a47
CPU threads: 4; OS: Linux 4.15; UI render: default; VCL: gtk3; 
Locale: ca-ES (ca_ES.UTF-8); UI-Language: en-US
Calc: threaded

I guess we can close this as RESOLVED WORKSFORME
Comment 18 Xisco Faulí 2019-04-02 14:43:12 UTC
(In reply to Roman Kuznetsov from comment #16)
> http://bugs.documentfoundation.org/attachment.cgi?id=124629 opens fast in
> 
> Version: 6.1.1.1
> Build ID: 2718b4a18dfcc6a54ebe5f7b801ee7a47fa81e0c
> CPU threads: 4; OS: Windows 6.1; UI render: default; 
> Locale: ru-RU (ru_RU); Calc: group threaded
> 
> but I don't see any images in the spreadsheet. There are some links to PDFs in
> some cells.
> 
> @Xisco: what do you think now about this bug?

Which images do you mean?