Created attachment 138399 [details] Reproducer document. Steps to reproduce: 1) Open the attached bugdoc. Expected result: the XHTML file is imported into Writer as HTML. Actual result: the XHTML file is opened in Writer as plain text. This only happens when the XHTML file starts with an XML declaration stating the encoding. Omitting the declaration avoids the problem, but then the W3C validator raises a warning.
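To illustrate the kind of type detection the report is about, here is a minimal Python sketch, not LibreOffice code. It assumes a detector that sniffs the start of the file for an HTML doctype or `<html>` tag, and shows how skipping a leading XML declaration first lets XHTML be recognized. The helper name `looks_like_html` and the 64-character sniffing window are hypothetical.

```python
import re

# Hypothetical pattern: optional BOM, optional whitespace, then an XML declaration.
_XML_DECL = re.compile(r'^\ufeff?\s*<\?xml\b[^>]*\?>\s*', re.IGNORECASE)


def looks_like_html(text: str) -> bool:
    """Return True if the text starts like an HTML/XHTML document.

    A leading XML declaration (e.g. <?xml version="1.0" encoding="UTF-8"?>)
    is skipped before sniffing, so it does not hide the doctype or <html> tag.
    """
    # Drop the declaration if present, then any remaining BOM or whitespace.
    stripped = _XML_DECL.sub('', text, count=1).lstrip('\ufeff').lstrip()
    head = stripped[:64].lower()
    return head.startswith('<!doctype html') or head.startswith('<html')


sample = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
    '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>x</p></body></html>'
)
print(looks_like_html(sample))
```

A detector that only checked the very first bytes for a doctype or `<html>` tag would reject this sample, which would be consistent with the plain-text fallback described above.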
Miklos Vajna committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=4af729f31c64c09c76ea8bcfa5067092571b92de tdf#114428 filter: recognize XHTML with XML declaration as HTML It will be available in 6.1.0. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
*** Bug 37753 has been marked as a duplicate of this bug. ***
Miklos Vajna committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=14daba5bd0ba64ff53ad98de7a84537ff03024ea Related: tdf#114428 filter: associate .xhtml with HTML import It will be available in 6.1.0.
Miklos Vajna committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=3fe64261b5658e28e2c0a1630cf878f066f77f0c Related: tdf#114428 svtools HTML import: avoid XML declaration in body text It will be available in 6.1.0.
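The last commit title above, "avoid XML declaration in body text", can be illustrated with a short sketch. It uses Python's standard `html.parser` rather than the svtools parser, so it only shows the general idea: the XML declaration is routed to a processing-instruction handler and never reaches the collected body text. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class BodyTextCollector(HTMLParser):
    """Collect visible text, keeping processing instructions out of it."""

    def __init__(self):
        super().__init__()
        self.text = []          # non-empty text runs found in the document
        self.declarations = []  # processing instructions such as <?xml ...?>

    def handle_pi(self, data):
        # The XML declaration arrives here, not in handle_data.
        self.declarations.append(data)

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


def extract_text(markup):
    """Return (text runs, processing instructions) for the given markup."""
    collector = BodyTextCollector()
    collector.feed(markup)
    collector.close()
    return collector.text, collector.declarations


markup = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<html><body><p>Hello</p></body></html>'
)
text, declarations = extract_text(markup)
print(text)
```

If the declaration were treated as ordinary character data instead, it would show up at the top of the imported document, which is the symptom the commit title describes.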
*** Bug 114856 has been marked as a duplicate of this bug. ***
Since this bug prevents proper HTML documents from being parsed correctly, a backport to 5.x would be appreciated.
Hi Miklos, this seems to test out well on current master, so a backport to 6.0 seems reasonable. The approach for handling the XHTML seems benign, so maybe it is not too risky to backport to 5.4 as well? https://gerrit.libreoffice.org/#/c/46324/ https://gerrit.libreoffice.org/#/c/46387/ https://gerrit.libreoffice.org/#/c/46388/
+1 that would be great!
I've proposed them for libreoffice-6-0, let's see how it goes. It's somewhere between a feature and a bugfix... :-)
Miklos Vajna committed a patch related to this issue. It has been pushed to "libreoffice-6-0": http://cgit.freedesktop.org/libreoffice/core/commit/?id=bf3940fc88e732a498598f0df61eafd63bbd5ce3&h=libreoffice-6-0 Related: tdf#114428 svtools HTML import: avoid XML declaration in body text It will be available in 6.0.0.2.
Miklos Vajna committed a patch related to this issue. It has been pushed to "libreoffice-6-0": http://cgit.freedesktop.org/libreoffice/core/commit/?id=f937a432c2351852e8b237c6e11dd9e43a2b28c9&h=libreoffice-6-0 Related: tdf#114428 filter: associate .xhtml with HTML import It will be available in 6.0.1.
Miklos Vajna committed a patch related to this issue. It has been pushed to "libreoffice-6-0": http://cgit.freedesktop.org/libreoffice/core/commit/?id=6aa65f7664fe0dbe8c9d4ba7f320ef216e928780&h=libreoffice-6-0 tdf#114428 filter: recognize XHTML with XML declaration as HTML It will be available in 6.0.1.