When trying to import an HTML document into Writer, it seems to insist on <html> being on the first line. Otherwise the file is imported as plain text. This is broken, as nowadays^W for at least 20 years (since HTML 3.2, as far as I can recall) HTML has required a DOCTYPE header before that. My expectation is that the parser should be adjusted to be more flexible.
Hmm, not clear this is valid. The GUI allows you to select the filter for import/opening of a document into an LO module, and HTML files with .htm/.html extensions are opened into Writer Web by default. The Writer Web module provides an HTML source view mode to directly adjust content and markup. And if you want to open HTML with formatting into Writer, selecting the "HTML Document Writer (*.html, *.htm, *.xhtml)" filter in the GUI will handle it correctly. Beyond that, not clear there is an issue. Perhaps provide a sample document you believe is not being opened correctly into Writer Web, or into Writer with filter selection.
> And if you want to open HTML with formatting into Writer, selecting the "HTML Document Writer (*.html, *.htm, *.xhtml)" filter with the GUI will correctly handle it.
As a user, my expectation is that I don't have to fiddle with a GUI to tell a program what the expected behavior is.
> Beyond that, not clear there is an issue.
For me and others it is an issue, as people stumble over this (sorry) brain-dead behavior. I needed to google to find out how to import a freaking HTML file correctly.
> Perhaps provide a sample document you believe is not being opened correctly into Writer Web, or into Writer with filter selection.
The filter is not an option for me, as I said. Simple test:
prompt > wget eiklaut.net
prompt > libreoffice index.html
Instead of the second step you can just as well use the GUI, via File --> Open. It works as expected if I remove the first two lines from index.html.
(In reply to Dirk from comment #2)
> Simple test:
>
> prompt > wget eiklaut.net
> prompt > libreoffice index.html
This is *not* about the DOCTYPE header (which should work with current releases of LO), but rather about the additional <?xml version="1.0" encoding="utf-8" ?> line. LO 6.1 will include a fix for that too.
*** This bug has been marked as a duplicate of bug 114428 ***
This was not a feature request but a bug report. Can we have a solution for the current versions as well, please? My use case is that I deal with a lot of PROPER HTML documents which LibreOffice refuses to treat as such.
This issue is bug 114428 (or its dupe, bug 37753), which concerns the handling of XHTML, _not_ HTML. It is corrected for 6.1.0 on current master and is unlikely to be a candidate for backport to 5.4.
*** This bug has been marked as a duplicate of bug 114428 ***
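For anyone stuck on a release before 6.1: since comment 3 identifies the leading <?xml ... ?> line as what trips up the detection, one possible workaround is to strip that line before opening the file. The following is only a minimal sketch in Python, not anything LibreOffice ships; the function name strip_xml_declaration is made up for illustration, and it has not been verified against every affected LO version.

```python
import re

# Assumption: the only thing to remove is a single XML declaration at the very
# start of the file, optionally preceded by a BOM and/or whitespace.
_XML_DECL = re.compile(r'^\ufeff?\s*<\?xml[^>]*\?>\s*', re.IGNORECASE)


def strip_xml_declaration(text: str) -> str:
    """Return text without a leading <?xml ... ?> declaration, if present.

    Hypothetical helper for illustration; the DOCTYPE line and everything
    after it are left untouched.
    """
    return _XML_DECL.sub('', text, count=1)


if __name__ == '__main__':
    import sys

    # Usage: python strip_decl.py index.html > index_stripped.html
    with open(sys.argv[1], encoding='utf-8') as handle:
        sys.stdout.write(strip_xml_declaration(handle.read()))
```

Note that the script assumes the input is UTF-8. Removing the declaration also removes its encoding hint, so a document that does not declare its charset elsewhere (for example in a meta tag) may need one added.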