Bug 146802 - SAXException: [word/document.xml line 2] - Lo tries to import VML equation
Summary: SAXException: [word/document.xml line 2] - Lo tries to import VML equation
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 7.4.0.0 alpha0+
Hardware: All All
Importance: medium normal
Assignee: Attila Bakos (NISZ)
URL:
Whiteboard: target:7.4.0
Keywords: bibisected, bisected
Depends on:
Blocks: DOCX-SAXParse
Reported: 2022-01-16 21:54 UTC by Gabor Kelemen (allotropia)
Modified: 2022-02-04 11:58 UTC (History)

See Also:
Crash report or crash signature:


Attachments
Screenshot of the error message (11.61 KB, image/png)
2022-01-16 21:54 UTC, Gabor Kelemen (allotropia)
Error message after pressing No to previous question (7.96 KB, image/png)
2022-01-16 21:55 UTC, Gabor Kelemen (allotropia)
Sample (xmllinted) (2.25 MB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2022-01-17 00:33 UTC, Aron Budea
The problem in the xml (194.51 KB, image/jpeg)
2022-01-17 12:20 UTC, Attila Bakos (NISZ)
The sample WITH problem minimised (70.66 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2022-01-17 12:23 UTC, Attila Bakos (NISZ)
The sample WITHOUT problem minimised (70.18 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2022-01-17 12:24 UTC, Attila Bakos (NISZ)
The minimised file is fixed (244.32 KB, image/jpeg)
2022-01-19 17:03 UTC, Attila Bakos (NISZ)
The original file without that formula (390.45 KB, image/jpeg)
2022-01-20 07:38 UTC, Attila Bakos (NISZ)
The sample without that formula (2.12 MB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2022-01-20 07:39 UTC, Attila Bakos (NISZ)
Pdf from the fixed master (969.61 KB, application/pdf)
2022-01-20 17:07 UTC, Attila Bakos (NISZ)

Description Gabor Kelemen (allotropia) 2022-01-16 21:54:26 UTC
Created attachment 177587 [details]
Screenshot of the error message

When opening attachment 96245 [details] from bug 76513 in current master, I get this error message:

An error occurred during opening the file. This may be caused by incorrect file contents.
The error details are:
SAXException: [word/document.xml line 2]: unknown error
Proceeding with import may cause data loss or corruption, and application may become unstable or crash.

Do you want to ignore the error and attempt to continue loading the file?

Pressing No gives another error:

File format error found at unsatisfied query for interface of type com.sun.star.frame.XModel3!
SAXParseException: '[word/document.xml line 2]: unknown error', Stream 'word/document.xml', Line 2, Column 1490111(row,col).

Started with: 

https://git.libreoffice.org/core/+/121cbc250b36290f0f8c7265fea57256dad69553

author	Attila Bakos (NISZ) <bakos.attilakaroly@nisz.hu>	Thu Nov 11 14:02:12 2021 +0100
committer	László Németh <nemeth@numbertext.org>	Thu Jan 06 10:41:32 2022 +0100

tdf#66039 DOCX: import textboxes (with tables, images etc.) in group shapes

Adding CC to: Attila Bakos
Comment 1 Gabor Kelemen (allotropia) 2022-01-16 21:55:07 UTC
Created attachment 177588 [details]
Error message after pressing No to previous question
Comment 2 Aron Budea 2022-01-17 00:33:00 UTC
Created attachment 177593 [details]
Sample (xmllinted)

Reproduced using LO Version: 7.4.0.0.alpha0+ (b4a281af53efa0c36ee1770e6cf4d800be77a6d2) / Windows.

I've reformatted attachment 96245 [details] with xmllint to get a more accurate line number, but opening the reformatted file hangs LO. Both files can be opened with LO 7.3.0.1.
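
The reformatting step can be approximated without xmllint. The following is a minimal sketch, assuming Python's standard library only; the function name `reindent_docx_part` is hypothetical and this is not necessarily how the attached sample was produced. It copies a DOCX and re-indents `word/document.xml` so that a SAX error reports a useful line number instead of "line 2".

```python
import zipfile
from xml.dom import minidom


def reindent_docx_part(src_path, dst_path, part="word/document.xml"):
    """Copy a DOCX, re-indenting one XML part so elements sit on separate lines.

    Hypothetical helper for diagnosis only; the output is not meant to
    replace the original document.
    """
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == part:
                # toprettyxml returns bytes when an encoding is given.
                dom = minidom.parseString(data)
                data = dom.toprettyxml(indent="  ", encoding="UTF-8")
            dst.writestr(info.filename, data)
```

Usage would be along the lines of `reindent_docx_part("in.docx", "out.docx")`, with both file names as placeholders. Other parts of the package are copied unchanged.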
Comment 3 Attila Bakos (NISZ) 2022-01-17 12:20:11 UTC
Created attachment 177598 [details]
The problem in the xml

Well, this is not really a regression. As my attachment shows, there is a VML equation inside the WPG textbox. Previously Writer imported nothing from the group shape textbox; now it tries to import the formula but, since that is not implemented, it throws an unhandled exception, as mentioned above. A patch from me for this case is coming soon (it will ignore the embedded VML part).
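
To check whether another document contains the same construct, a rough sketch along these lines may help. It assumes the usual OOXML namespace URIs for `wpg`, `wps` and `v`; the function name `vml_in_group_textboxes` is hypothetical and this is not the importer's own logic. It lists VML elements nested inside a textbox (`wps:txbx`) of a group shape (`wpg:wgp`).

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs as commonly declared in word/document.xml.
WPG = "http://schemas.microsoft.com/office/word/2010/wordprocessingGroup"
WPS = "http://schemas.microsoft.com/office/word/2010/wordprocessingShape"
VML = "urn:schemas-microsoft-com:vml"


def vml_in_group_textboxes(document_xml):
    """Return local names of VML elements found inside WPG group textboxes."""
    root = ET.fromstring(document_xml)
    hits = []
    for group in root.iter(f"{{{WPG}}}wgp"):
        for txbx in group.iter(f"{{{WPS}}}txbx"):
            for el in txbx.iter():
                if el.tag.startswith(f"{{{VML}}}"):
                    hits.append(el.tag.split("}", 1)[1])
    return hits


def scan_docx(path):
    """Read word/document.xml from a DOCX and scan it (path is a placeholder)."""
    with zipfile.ZipFile(path) as z:
        return vml_in_group_textboxes(z.read("word/document.xml"))
```

A non-empty result means the file has VML content in a grouped textbox, which is the situation described in this comment. VML elsewhere in the document is deliberately not reported.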
Comment 4 Attila Bakos (NISZ) 2022-01-17 12:23:26 UTC
Created attachment 177599 [details]
The sample WITH problem minimised
Comment 5 Attila Bakos (NISZ) 2022-01-17 12:24:08 UTC
Created attachment 177600 [details]
The sample WITHOUT problem minimised
Comment 6 Attila Bakos (NISZ) 2022-01-19 17:01:49 UTC
(In reply to Attila Bakos (NISZ) from comment #3)
> Created attachment 177598 [details]
> The problem in the xml
> 
> Well, this is not really a regression. As my attachment shows, there is a
> VML equation inside the WPG textbox. Previously Writer imported nothing
> from the group shape textbox; now it tries to import the formula but,
> since that is not implemented, it throws an unhandled exception, as
> mentioned above. A patch from me for this case is coming soon (it will
> ignore the embedded VML part).

And here is the patch: https://gerrit.libreoffice.org/c/core/+/128627
The good news: the problem with VML inside the WPG is fixed, so the minimised version can be loaded now. The bad news: the original file still hangs, so there are more problems in what Writer tries to load, and another fix is coming soon.
Comment 7 Attila Bakos (NISZ) 2022-01-19 17:03:13 UTC
Created attachment 177654 [details]
The minimised file is fixed
Comment 8 Attila Bakos (NISZ) 2022-01-20 07:38:29 UTC
Created attachment 177665 [details]
The original file without that formula

It seems to work in master without that embedded formula (on page 9, fig. 1.7 at the bottom of the page), as my attachment shows.
-> The previous fix is not enough for this case; it must be corrected, and a follow-up is coming soon.
Comment 9 Attila Bakos (NISZ) 2022-01-20 07:39:22 UTC
Created attachment 177666 [details]
The sample without that formula
Comment 10 Attila Bakos (NISZ) 2022-01-20 17:07:11 UTC
Created attachment 177677 [details]
Pdf from the fixed master

It seems it opens and can be converted to PDF without problems :)
Comment 11 Commit Notification 2022-02-03 08:13:40 UTC
Attila Bakos (NISZ) committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/4a38ca4035ac03571925e72cb47e0beb8da2003a

tdf#146802 OOXML import: fix embedded VML in grouped textbox

It will be available in 7.4.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 12 Timur 2022-02-04 11:58:07 UTC
attachment 96245 [details] and attachment 177593 [details] open now. 
Thanks, Attila.