Bug 114428

Summary: XHTML import: xml declaration results in plain text import into Writer
Product: LibreOffice Reporter: Miklos Vajna <vmiklos>
Component: Writer Assignee: Miklos Vajna <vmiklos>
Status: RESOLVED FIXED    
Severity: normal CC: dirk, shlomif
Priority: medium    
Version: unspecified   
Hardware: All   
OS: All   
Whiteboard: target:6.1.0 target:6.0.0.2 target:6.0.1
Attachments: Reproducer document.

Description Miklos Vajna 2017-12-12 15:29:45 UTC
Created attachment 138399
Reproducer document.

Steps to reproduce:

1) Open the attached bugdoc.
2) Expected result: XHTML file is imported into Writer.
3) Actual result: XHTML file is opened in Writer as plain text.

This only happens when the XHTML file has an XML declaration stating the encoding (and if the declaration is omitted, the W3C validator raises a warning).
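To make the trigger concrete, the sketch below uses a hypothetical minimal document of the kind described (it is not the attached reproducer) and inspects it with Python's standard `xml.etree.ElementTree`. The file is well-formed XHTML whose root element is in the XHTML namespace, yet its first bytes are the `<?xml` declaration rather than `<!DOCTYPE` or `<html`. A detector that only checks the start of the file for an HTML marker would, under that assumption, not classify it as HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal XHTML document (not the attached reproducer):
# it opens with an XML declaration that states the encoding.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example</title></head>
  <body><p>Hello</p></body>
</html>
"""

XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def root_tag(data: bytes) -> str:
    """Parse the bytes as XML and return the namespaced root element name."""
    return ET.fromstring(data).tag


if __name__ == "__main__":
    # The very first bytes are the declaration, not an HTML marker...
    print(SAMPLE[:5])                              # b'<?xml'
    # ...even though the document root is an XHTML <html> element.
    print(root_tag(SAMPLE) == XHTML_NS + "html")   # True
```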
Comment 1 Commit Notification 2017-12-13 13:45:43 UTC
Miklos Vajna committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=4af729f31c64c09c76ea8bcfa5067092571b92de

tdf#114428 filter: recognize XHTML with XML declaration as HTML

It will be available in 6.1.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
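The commit title suggests the fix lives in type detection. A minimal sketch of that idea, assuming a simple byte-level sniffer (the function `looks_like_html` and its `skip_xml_declaration` parameter are hypothetical, not LibreOffice's filter-detection API): strip an optional UTF-8 BOM and a leading XML declaration, then look for `<!DOCTYPE html` or `<html`, allowing comments in between.

```python
import re

# Hypothetical sketch, not LibreOffice's filter-detection code.
_UTF8_BOM = b"\xef\xbb\xbf"
# A leading <?xml ...?> declaration, optionally preceded by whitespace.
_XML_DECL = re.compile(rb"^\s*<\?xml[^>]*\?>", re.IGNORECASE)
# Optional comments, then <!DOCTYPE html or an <html> start tag.
_HTML_START = re.compile(
    rb"^\s*(?:<!--.*?-->\s*)*<(?:!doctype\s+html|html[\s>])",
    re.IGNORECASE | re.DOTALL,
)


def looks_like_html(data: bytes, skip_xml_declaration: bool = True) -> bool:
    """Guess whether the start of a file is (X)HTML markup."""
    head = data[:4096]
    if head.startswith(_UTF8_BOM):
        head = head[len(_UTF8_BOM):]
    if skip_xml_declaration:
        # The step the commit title describes: ignore the XML declaration.
        head = _XML_DECL.sub(b"", head, count=1)
    return _HTML_START.match(head) is not None


if __name__ == "__main__":
    doc = b'<?xml version="1.0" encoding="UTF-8"?>\n<html><body/></html>'
    print(looks_like_html(doc))                              # True
    print(looks_like_html(doc, skip_xml_declaration=False))  # False
```

Passing `skip_xml_declaration=False` mimics the reported symptom in this sketch: the same bytes are no longer recognized as HTML, so a caller would fall back to another filter such as plain text.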
Comment 2 Maxim Monastirsky 2017-12-13 14:21:32 UTC
*** Bug 37753 has been marked as a duplicate of this bug. ***
Comment 3 Commit Notification 2017-12-14 08:11:50 UTC
Miklos Vajna committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=14daba5bd0ba64ff53ad98de7a84537ff03024ea

Related: tdf#114428 filter: associate .xhtml with HTML import

It will be available in 6.1.0.
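This commit covers the other detection signal, the file extension. Conceptually, as the commit title suggests, an `.xhtml` extension now maps to the HTML import filter. The table and the `filter_for` helper below are hypothetical illustrations; the filter labels are placeholders, not LibreOffice's internal filter names.

```python
from pathlib import Path

# Hypothetical extension -> import-filter table. The filter labels are
# illustrative placeholders, not LibreOffice's internal filter names.
EXTENSION_FILTERS = {
    ".html": "HTML import",
    ".htm": "HTML import",
    ".xhtml": "HTML import",  # the kind of association the commit title describes
    ".txt": "plain text import",
}


def filter_for(path: str, default: str = "plain text import") -> str:
    """Pick an import filter from the file extension (case-insensitive)."""
    return EXTENSION_FILTERS.get(Path(path).suffix.lower(), default)


if __name__ == "__main__":
    print(filter_for("report.XHTML"))  # HTML import
    print(filter_for("notes.md"))      # plain text import
```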
Comment 4 Commit Notification 2017-12-14 08:11:57 UTC
Miklos Vajna committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=3fe64261b5658e28e2c0a1630cf878f066f77f0c

Related: tdf#114428 svtools HTML import: avoid XML declaration in body text

It will be available in 6.1.0.
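Once a file is routed to the HTML import, the declaration can still leak into the document as text. A toy illustration, assuming a tokenizer that only treats `<` followed by a letter, `/` or `!` as markup (the `body_text` helper is hypothetical and is not the svtools HTML parser): without special handling, `<?xml ...?>` survives as literal body text.

```python
import re

# Hypothetical sketch, not the svtools HTML parser. This toy tokenizer only
# treats "<" followed by a letter, "/" or "!" as markup, so a leading
# "<?xml ...?>" declaration would otherwise survive as literal text.
_TAG = re.compile(rb"<[A-Za-z/!][^>]*>")
_XML_DECL = re.compile(rb"^\s*<\?xml[^>]*\?>", re.IGNORECASE)


def body_text(data: bytes, skip_xml_declaration: bool = True) -> str:
    """Strip markup and return whitespace-normalized text content."""
    if skip_xml_declaration:
        # Drop the declaration before tokenizing so it never reaches the text.
        data = _XML_DECL.sub(b"", data, count=1)
    text = _TAG.sub(b"", data).decode("utf-8")
    return " ".join(text.split())


if __name__ == "__main__":
    doc = (b'<?xml version="1.0" encoding="UTF-8"?>\n'
           b'<html xmlns="http://www.w3.org/1999/xhtml"><body><p>Hello</p></body></html>')
    print(body_text(doc))                              # Hello
    print(body_text(doc, skip_xml_declaration=False))  # <?xml version="1.0" encoding="UTF-8"?> Hello
```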
Comment 5 Maxim Monastirsky 2018-01-06 21:20:16 UTC
*** Bug 114856 has been marked as a duplicate of this bug. ***
Comment 6 Dirk 2018-01-07 15:43:08 UTC
Since this bug prevents proper HTML documents from being parsed correctly, a backport to 5.x would be appreciated.
Comment 7 V Stuart Foote 2018-01-07 16:45:27 UTC
*** Bug 114856 has been marked as a duplicate of this bug. ***
Comment 8 V Stuart Foote 2018-01-07 17:04:06 UTC
Hi Miklos,

The fix seems to test out well on current master, so a backport to 6.0 looks reasonable.

The approach for handling the XHTML seems benign and maybe not too risky to backport to 5.4?

https://gerrit.libreoffice.org/#/c/46324/
https://gerrit.libreoffice.org/#/c/46387/
https://gerrit.libreoffice.org/#/c/46388/
Comment 9 Dirk 2018-01-09 09:01:25 UTC
+1

That would be great!
Comment 10 Miklos Vajna 2018-01-09 10:52:23 UTC
I've proposed them for libreoffice-6-0, let's see how it goes. It's somewhere between a feature and a bugfix... :-)
Comment 11 Commit Notification 2018-01-09 15:56:19 UTC
Miklos Vajna committed a patch related to this issue.
It has been pushed to "libreoffice-6-0":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=bf3940fc88e732a498598f0df61eafd63bbd5ce3&h=libreoffice-6-0

Related: tdf#114428 svtools HTML import: avoid XML declaration in body text

It will be available in 6.0.0.2.
Comment 12 Commit Notification 2018-01-27 20:04:09 UTC
Miklos Vajna committed a patch related to this issue.
It has been pushed to "libreoffice-6-0":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=f937a432c2351852e8b237c6e11dd9e43a2b28c9&h=libreoffice-6-0

Related: tdf#114428 filter: associate .xhtml with HTML import

It will be available in 6.0.1.
Comment 13 Commit Notification 2018-01-27 20:04:15 UTC
Miklos Vajna committed a patch related to this issue.
It has been pushed to "libreoffice-6-0":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=6aa65f7664fe0dbe8c9d4ba7f320ef216e928780&h=libreoffice-6-0

tdf#114428 filter: recognize XHTML with XML declaration as HTML

It will be available in 6.0.1.