Bug 92157 - Document with zero-sized graphic generating file format error / SAXParseException
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: filters and storage
Version (earliest affected): 5.0.0.0.beta3
Hardware: Other All
Importance: high major
Assignee: Not Assigned
URL:
Whiteboard: target:5.2.0 target:5.1.6
Keywords: bibisected, bisected, filter:docx, regression
Duplicates: 96905
Depends on:
Blocks:
 
Reported: 2015-06-18 13:13 UTC by Cor Nouws
Modified: 2016-09-24 10:03 UTC (History)
CC List: 7 users

See Also:
Crash report or crash signature:


Attachments
better test file (104.99 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2015-06-22 10:20 UTC, Cor Nouws
patch to ignore zero size for a graphic in docx (794 bytes, patch)
2015-10-22 08:07 UTC, libreoffice

Description Cor Nouws 2015-06-18 13:13:55 UTC
Downloaded a document from a Dutch government site; local copy:
 /home/cono/Documenten/DATA/Nou&Off/Projecten/VluchtelingenWerk Nederland/MigratieLibreOffice/TestDocumenten/port_'Onafhankelijke_casemanager_in_de_vreemdelingenketen._Perspectieven_vanuit_het_buitenland'.docx

Open that in 5.0.0beta3

File format error found at
SAXParseException: '[word/document.xml line2]: unknown error', Stream 'word/document.xml', Line 2, Column 21949(row,col)
Comment 1 Cor Nouws 2015-06-18 13:15:38 UTC
File opens fine in 4.4.4.2

(I had a somewhat similar problem with another file in beta1 or so, but that now opens fine in beta3.)
Comment 2 Julien Nabet 2015-06-18 18:41:24 UTC
Could you provide the link so we can download the doc?
Comment 4 MM 2015-06-19 12:26:12 UTC
No problem opening with v5.0.0.0 b3 under ubuntu 14.04 x64 and mint 17.1 x64.
Comment 5 Julien Nabet 2015-06-19 17:43:02 UTC
On a Debian x86-64 PC with master sources updated today, I can't reproduce this.
I also tried with the LO Debian package 4.4.4.1.

Cor: did you try with a new LO profile? Do you have accessibility enabled? If so, could you disable it and try again?
Comment 6 Cor Nouws 2015-06-22 10:20:30 UTC
Created attachment 116723
better test file

Hmm, there is something strange with the link and the file that I have.
So I attached the file that does give the problem for me (also in 510master with a clean user profile).
Comment 7 MM 2015-06-22 18:23:43 UTC
Confirmed with v5.0.0.0 b3 under mint 17.1 x64.
That last file doesn't open correctly.
Comment 8 Cor Nouws 2015-06-24 13:36:32 UTC
I can see that it opens OK in Version: 4.4.0.0.beta1
Build ID: 9af3d21234aa89dac653c0bd76648188cdeb683e
Locale: nl_NL

and bad in Version: 4.4.0.0.alpha2
Build ID: 24f0a5815f581dd9a7f09d30213a379edee6e9ac

Bibisecting, however, is not possible on 32-bit Ubuntu, so I rely on someone else to do that.
Comment 9 Cor Nouws 2015-06-24 13:37:11 UTC
(In reply to Cor Nouws from comment #8)
> I can see that it opens OK in Version: 4.4.0.0.beta1


Ignore! wrong issue for that comment!
Comment 10 Thomas Hackert 2015-07-12 16:02:16 UTC
Hello @ll
I can confirm it with LO
Version: 5.0.0.3
Build-ID: f79b5ba13f5e6cbad23f8038060e556217e66632
Gebietsschema: de-DE (de_DE.UTF-8)
(installed in parallel, following the instructions from https://wiki.documentfoundation.org/Installing_in_parallel/Linux) with the German langpack and helppack installed, under Debian Testing AMD64 ... :( But I also found #89100 ... Could it be that this bug is a duplicate?
HTH
Thomas.
Comment 11 Cor Nouws 2015-07-12 18:42:53 UTC
Hi Thomas,

(In reply to thackert from comment #10)
> ... :( But I found also #89100 ... Could it be, that this bug is a duplicate?

Could be... but there may be several problems that result in the same error message for the user, so it must be checked by the developers.

Thanks for mentioning the issue!
Comment 12 Michael Weghorn 2015-07-31 18:38:03 UTC
bisect result (using the "bibisect-50max" repository):

8b400e2c6b64ea88b911187a21de7090ee49f305 is the first bad commit
commit 8b400e2c6b64ea88b911187a21de7090ee49f305
Author: Matthew Francis <mjay.francis@gmail.com>
Date:   Wed May 27 18:16:49 2015 +0800

    source-hash-ebf767eeb2a169ba533e1b2ffccf16f41d95df35
    
    commit ebf767eeb2a169ba533e1b2ffccf16f41d95df35
    Author:     Michael Stahl <mstahl@redhat.com>
    AuthorDate: Thu Jan 22 12:50:07 2015 +0100
    Commit:     Michael Stahl <mstahl@redhat.com>
    CommitDate: Thu Jan 22 13:58:10 2015 +0100
    
        writerfilter: DOCX import: better error handling than "catch (...) {}"
    
        If there is a SAXParseException, OOXMLDocumentImpl::resolve() should not
        ignore it, because if it occurs in a substream some end tag handlers may
        not have been run and the DomainMapper may be in an inconsistent state,
        so continuing to parse the outer document is probably not a good idea.
    
        Also add some exception mangling so sfx2 can present a useful error
        dialog.
    
        Change-Id: I169ba6db25f2ae264af08a64edf76a6bf6757f85

:040000 040000 304d902d5bb07301189acae3bc2d1840d5ef1663 e47dc06f6dbb1f265fc8835f7eb9ac016f2afcc1 M	opt

---

$ git bisect log
# bad: [dda106fd616b7c0b8dc2370f6f1184501b01a49e] source-hash-0db96caf0fcce09b87621c11b584a6d81cc7df86
# good: [5b9dd620df316345477f0b6e6c9ed8ada7b6c091] source-hash-2851ce5afd0f37764cbbc2c2a9a63c7adc844311
git bisect start 'master' 'oldest'
# bad: [0c30a2c797b249d0cd804cb71554946e2276b557] source-hash-45aaec8206182c16025cbcb20651ddbdf558b95d
git bisect bad 0c30a2c797b249d0cd804cb71554946e2276b557
# good: [770ff0d1a74d2450c2decb349b62c5087e12c46b] source-hash-549b7fad48bb9ddcba7dfa92daea6ce917853a03
git bisect good 770ff0d1a74d2450c2decb349b62c5087e12c46b
# bad: [259e888083cf7697956bb7e5f2691e8153eadb4c] source-hash-1884c0bbd40f0ded41d7a1656cb64fb1f6368c36
git bisect bad 259e888083cf7697956bb7e5f2691e8153eadb4c
# good: [ee7c82541a2e99f76af570d3faa897504149913a] source-hash-54defd1bd3359c95e45891c7294847d0cebca753
git bisect good ee7c82541a2e99f76af570d3faa897504149913a
# bad: [504f60cf9ee84da75d4c15a62dedb18976129c14] source-hash-c8af68bc5adf093f9df803f6fe0147ac9d116169
git bisect bad 504f60cf9ee84da75d4c15a62dedb18976129c14
# good: [00c3cacafec11fdfbdf7f0c8c279503cd109d8a0] source-hash-f21114332bf670ab7f8e9b0a7f4d83d436d8fd9e
git bisect good 00c3cacafec11fdfbdf7f0c8c279503cd109d8a0
# bad: [5e1da738abc9f023f0c7bafcffc10d899b57a95b] source-hash-ef296e87b8afa1afdc08a23675658e0252dd2b86
git bisect bad 5e1da738abc9f023f0c7bafcffc10d899b57a95b
# good: [bdf9a49d5f818c69487628f49c13bed9bb2bc947] source-hash-df8c7d1c4e9d878797398fa5fd94477b04c2cc00
git bisect good bdf9a49d5f818c69487628f49c13bed9bb2bc947
# good: [7e264ef7d7c3096e9b779e5160c59419b53b138d] source-hash-f0d6e0e1e21afd0adf5bd01d771b2d83d8f13a48
git bisect good 7e264ef7d7c3096e9b779e5160c59419b53b138d
# bad: [3aa029ed7028303a0d5ebc84c697840c54c8df41] source-hash-134b523c425613848a2068f917c20a7a67fa0577
git bisect bad 3aa029ed7028303a0d5ebc84c697840c54c8df41
# bad: [4e106cf62e8a97370022bde02efcf044e1ed2c30] source-hash-c0c1b01a32b91984d61f2d0b9146719fcaed7e09
git bisect bad 4e106cf62e8a97370022bde02efcf044e1ed2c30
# good: [4e454b281b3cea9be43fceaa4c201f36a6a3d1be] source-hash-825e4995220209362c13ed5f07c98e43a5f456de
git bisect good 4e454b281b3cea9be43fceaa4c201f36a6a3d1be
# bad: [8b400e2c6b64ea88b911187a21de7090ee49f305] source-hash-ebf767eeb2a169ba533e1b2ffccf16f41d95df35
git bisect bad 8b400e2c6b64ea88b911187a21de7090ee49f305
# first bad commit: [8b400e2c6b64ea88b911187a21de7090ee49f305] source-hash-ebf767eeb2a169ba533e1b2ffccf16f41d95df35
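
The reasoning in the bisected commit message can be illustrated with a small sketch. This is not LibreOffice code: `Mapper`, `parse_substream`, `resolve_swallowing` and `resolve_propagating` are hypothetical names, and Python stands in for the C++ of writerfilter. It only shows why swallowing a parse error in a substream can leave the mapper in an inconsistent state, whereas propagating it with the stream name gives the caller something to report.

```python
class ParseError(Exception):
    """Stand-in for a SAXParseException (hypothetical)."""


class Mapper:
    """Toy stand-in for a domain mapper: tracks elements that are still open."""

    def __init__(self):
        self.open_elements = []

    def start(self, name):
        self.open_elements.append(name)

    def end(self, name):
        self.open_elements.pop()


def parse_substream(mapper, events):
    # Feed start/end events to the mapper; anything else is a parse error.
    for kind, name in events:
        if kind == "start":
            mapper.start(name)
        elif kind == "end":
            mapper.end(name)
        else:
            raise ParseError("unknown error")


def resolve_swallowing(mapper, events):
    # Analogous to "catch (...) {}": the error is hidden and parsing continues.
    try:
        parse_substream(mapper, events)
    except Exception:
        pass
    return mapper.open_elements


def resolve_propagating(mapper, events, stream_name):
    # Re-raise with the stream name so the caller can show a useful message.
    try:
        parse_substream(mapper, events)
    except ParseError as exc:
        raise ParseError("Stream '%s': %s" % (stream_name, exc)) from exc
    return mapper.open_elements


# The end handler for "p" never runs because the error occurs first.
EVENTS = [("start", "p"), ("bad", "x"), ("end", "p")]
```

With the swallowing variant the element "p" is left open after the call, which is the kind of inconsistent state the commit message warns about. The propagating variant is what made the previously hidden error in this bug visible to the user.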
Comment 13 Michael Weghorn 2015-07-31 18:50:07 UTC
On master, the behaviour is a bit different. When I open the file with the latest version in the "lo-linux-dbgutil-daily" bibisect repository (source-hash-2d9db406d301d722649ca539cacad823b89191ca), LibreOffice aborts with the following assertion failure:

soffice.bin: /home/vmiklos/git/libreoffice/master/sw/source/core/bastyp/index.cxx:226: virtual SwIndexReg::~SwIndexReg(): Assertion `!m_pFirst && !m_pLast && "There are still indices registered"' failed.
Comment 14 Timur 2015-08-27 13:21:39 UTC
Looks like a duplicate of the annoying bug 89100.
A pity about the 2 bibisects.

*** This bug has been marked as a duplicate of bug 89100 ***
Comment 15 libreoffice 2015-10-22 08:07:54 UTC
Created attachment 119870
patch to ignore zero size for a graphic in docx

Bug 89100 is about the uncovering of previously hidden errors; there are likely multiple errors that are now visible.

I've investigated the problem for this document: it contains a graphic with size 0, 0. (It contains the xml <a:graphic...<a:xfrm><a:off x="0" y="0"/><a:ext cx="0" cy="0"/></a:xfrm>). This fails the test in SwFormatFrmSize::PutValue (MID_FRMSIZE_SIZE), which makes SfxItemPropertySet::setPropertyValue throw an IllegalArgumentException, aborting the parser.

The attached patch skips setting the graphic size if the size is 0, 0. With this patch applied I can open the document.
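
To check whether another failing document has the same cause, a small script can look for zero-sized extents in the package. The following is only a sketch, not part of the patch: `count_zero_sized_extents` and `make_sample_docx` are hypothetical helper names, and it assumes the graphic lives in word/document.xml and that the `a:` prefix is bound to the standard DrawingML main namespace.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# DrawingML main namespace, conventionally bound to the "a:" prefix.
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def count_zero_sized_extents(docx_bytes, part="word/document.xml"):
    """Count <a:ext> elements in the given part whose cx and cy are both 0."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read(part))
    count = 0
    for ext in root.iter("{%s}ext" % A_NS):
        # Other <a:ext> elements (e.g. extension entries) have no cx/cy; skip them.
        if ext.get("cx") == "0" and ext.get("cy") == "0":
            count += 1
    return count


def make_sample_docx(cx, cy):
    """Build a tiny in-memory package with one <a:xfrm>, for demonstration only."""
    xml = (
        '<w:document xmlns:w="%s" xmlns:a="%s">'
        '<a:xfrm><a:off x="0" y="0"/><a:ext cx="%d" cy="%d"/></a:xfrm>'
        "</w:document>" % (W_NS, A_NS, cx, cy)
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("word/document.xml", xml)
    return buf.getvalue()
```

For a real file, the bytes could be read with `open(path, "rb").read()` and passed to `count_zero_sized_extents`; a non-zero result suggests the document may hit the same check.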
Comment 16 Julien Nabet 2015-10-23 21:37:55 UTC
(In reply to libreoffice from comment #15)
> ...
> The attached patch skips setting the graphic size if the size is 0, 0. With
> this patch applied I can open the document.
Perhaps you may be interested in contributing directly on LO? (see https://wiki.documentfoundation.org/Development/gerrit)
Comment 17 Robinson Tryon (qubit) 2015-12-17 04:38:04 UTC Comment hidden (obsolete)
Comment 18 Cor Nouws 2015-12-27 21:49:39 UTC
This problem is not fixed IMO.

In 5.1.0rc1 and a recent 5.2.0 daily build, it either opens in Draw or (if the filter is set explicitly to MS Word 2007-2013 XML) gives a general I/O error.
Comment 19 Cor Nouws 2015-12-27 21:52:07 UTC
(In reply to libreoffice from comment #15)
> The attached patch skips setting the graphic size if the size is 0, 0. With
> this patch applied I can open the document.

Hi arbruin,

It is more common to send patches directly to gerrit. See the link in comment #16
Thanks for looking into this!
Cor
Comment 20 Cor Nouws 2016-01-05 13:06:15 UTC
*** Bug 96905 has been marked as a duplicate of this bug. ***
Comment 21 Mike Kaganski 2016-01-06 01:25:37 UTC
(In reply to libreoffice from comment #15)
> I've investigated the problem for this document, and it's that the document
> has a graphic with size 0, 0. (It contains the xml
> <a:graphic...<a:xfrm><a:off x="0" y="0"/><a:ext cx="0" cy="0"/></a:xfrm>).
> This fails the test in SwFormatFrmSize::PutValue (MID_FRMSIZE_SIZE) which
> makes SfxItemPropertySet::setPropertyValue throw an
> IllegalArgumentException, aborting the parser.
> 
> The attached patch skips setting the graphic size if the size is 0, 0. With
> this patch applied I can open the document.

The fix for bug 95775 already relaxed the above-mentioned test in SwFormatFrmSize::PutValue (now it only checks that at least one of the two values, x or y, is not 0). That relaxation is apparently not enough for this particular issue; but I believe that the proper fix here would be not to avoid setting the size, as in the patch attached to comment 15, but to remove the check from SwFormatFrmSize::PutValue completely.
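
The three variants of the size test discussed here can be compared side by side. This is a sketch in Python, not the actual C++ in SwFormatFrmSize::PutValue; the function names are hypothetical, and the "strict" form (both dimensions positive) is an assumption about what the test looked like before bug 95775.

```python
def accepts_strict(width, height):
    # Assumed original test: both dimensions must be positive.
    return width > 0 and height > 0


def accepts_relaxed(width, height):
    # After bug 95775, as described above: at least one dimension is not 0.
    return width != 0 or height != 0


def accepts_any(width, height):
    # Proposed fix: no check at all, so a 0 x 0 graphic is accepted.
    return True


def summarize(sizes):
    """Return, per size, which of the three variants would accept it."""
    return {
        size: (accepts_strict(*size), accepts_relaxed(*size), accepts_any(*size))
        for size in sizes
    }
```

Only the last variant accepts the 0, 0 size from this document, which is consistent with the relaxed test still rejecting it.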

libreoffice@arbruijn.dds.nl, please go ahead and post an improved patch to gerrit. If you cannot do it, I'll prepare a patch around Jan 15th, with proper credit to you. Thank you for your work!
Comment 22 Mike Kaganski 2016-01-06 01:53:59 UTC
By the way: why would a government want to insert a zero-sized image into a document?
Of course, it's possible that it's intended to be shown on a programmatic event (say, using VBA), but it could also be used for tracking purposes...
Comment 23 Jeroen Hoek 2016-01-06 09:28:50 UTC
(In reply to Mike Kaganski from comment #22)
> By the way: why would a gvnmt want to insert a zero-sized image into a
> document?
> Of course, it's possible that it's intended to be shown on a programmatic
> event (say, using vba), but it also could be used for tracking purposes...

I am aware of several departments of our government using document generating software that allows users to fill in paragraphs of text and metadata such as addresses in a web application, which then generates a document that is almost, but not quite, completely unlike a proper OOXML document. That way, users cannot accidentally modify the government's chosen styling. This may be one of those cases.

It is possible to create a document that is well-formed XML, opens fine in Word, and is invalid or nonsensical OOXML, all at the same time.

That is, the cause may be incompetence rather than malice, although it would be quite interesting to actually find a government-created document that does phone home with tracking information!

Duplicate bug 96905 contains an attached document that fails in (probably) the same way. It may be interesting to examine that one as well.
Comment 24 Mike Kaganski 2016-01-09 08:47:00 UTC
Posted a patch to gerrit: https://gerrit.libreoffice.org/21287
Comment 25 Commit Notification 2016-02-14 01:58:26 UTC
Mike Kaganski committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=654f6ff28d7a148950b48ed8905d8f13a015a5b5

tdf#92157: allow both dimensions of a graphic to be 0

It will be available in 5.2.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 26 Chris Sherlock 2016-02-14 02:15:15 UTC
Whilst I've committed the patch (thanks to whoever posted this, libreoffice@arbruijn.dds.nl, and to Mike of course), there are still some unresolved questions. Especially around potential tracking!

I've posted on the dev mailing list asking for feedback on whether there is anything further we should be doing.
Comment 27 Cor Nouws 2016-02-14 10:06:40 UTC
Hi Jeroen,

(In reply to Jeroen Hoek from comment #23)
> [...]  which then generates a
> document that is almost, but not quite, completely unlike a proper OOXML
> document. [...] This may be one of those cases.

Do I read that right: "completely unlike a proper.."?

> That is, the cause may be incompetence rather than malice, although it would
> be quite interesting to actually find a government created document that
> does phone home with tracking information!

But I guess the 0x0 graphic must be seen as unrelated?
Comment 28 Cor Nouws 2016-06-20 12:21:12 UTC
In 5.2 this opens now.
In 5.0.x and 5.1.x it does not.
@Mike, *
Can 654f6ff28d7a148950b48ed8905d8f13a015a5b5 be backported please?
Comment 29 Commit Notification 2016-07-13 20:16:05 UTC
Mike Kaganski committed a patch related to this issue.
It has been pushed to "libreoffice-5-1":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=88dc41490189a6ccc218c633c6385d4e99af0216&h=libreoffice-5-1

tdf#92157: allow both dimensions of a graphic to be 0

It will be available in 5.1.6.
