Created attachment 93500
Proof of the bug (testcase)
LibreOffice Writer/Web has trouble opening certain HTML files. The problem occurs when an HTML file's DOCTYPE declaration is not on the first line of the file. When LibreOffice opens any file whose <!DOCTYPE ...> is not on the first line, it shows the HTML source code instead of rendering the page. See the attached "Proof of the bug" testcase.
Steps to reproduce:
1. Make sure that you are running LibreOffice 4.2.
2. For a quick way to reproduce the bug, download the attachment labeled "Proof of the bug". It is a zip file containing four HTML files:
- 1) File that works.html - An HTML file that renders normally
- 2) File that doesn't work.html - A file that causes the bug
- 3) Moneydance-File that works.html - A version of the Moneydance-generated file, revised by the user who originally noticed this LibreOffice behavior, that renders normally
- 4) Moneydance-File that doesn't work.html - The original file generated by Moneydance that has the bug in it
If you want to make your own HTML files to further experiment with this bug, make one HTML file that has a DOCTYPE on the first line of the file, and one that has a DOCTYPE NOT on the first line.
3. Open the HTML files in LibreOffice Writer/Web (the ones that say "doesn't work" will make LibreOffice show the source, while the ones that say "works" render normally).
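For anyone who prefers to generate their own test files rather than use the attachment, here is a minimal Python sketch. The file names, the page body, and the helper name `write_test_files` are made up for illustration; only the placement of the DOCTYPE relative to the start of the file matters.

```python
from pathlib import Path

# Hypothetical page content; any valid HTML body would do.
DOCTYPE = "<!DOCTYPE html>\n"
BODY = "<html><head><title>Test</title></head><body><p>Hello</p></body></html>\n"


def write_test_files(directory):
    """Write one file with the DOCTYPE on line 1 and one with it on line 2."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    variants = {
        # Expected to render normally.
        "doctype-first-line.html": DOCTYPE + BODY,
        # A blank line precedes the DOCTYPE; expected to trigger the bug.
        "doctype-not-first-line.html": "\n" + DOCTYPE + BODY,
    }
    paths = {}
    for name, content in variants.items():
        path = directory / name
        path.write_text(content, encoding="utf-8")
        paths[name] = path
    return paths
```

Calling `write_test_files("some-directory")` and opening both resulting files in Writer/Web should show the difference described in step 3.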
Please don't nominate your own bugs - we have a procedure that QA/Devs do. Thanks
Hi xmlhttprequest, thanks for reporting.
Version: 18.104.22.168 Build ID: 05dceb5d363845f2cf968344d7adab8dcfb2ba71
Version: 22.214.171.124 Build ID: d7dbbd7842e6a58b0f521599204e827654e1fb8b
Version: 126.96.36.199.alpha0+ Build ID: ecf22894f522374cbdb8196d3bdef88e2fba7af9
TinderBox: Win-x86@39, Branch:master, Time: 2014-02-15_01:01:17
Version: 188.8.131.52.0+ Build ID: 2e2040401d99fe116b65b9661c3d4755091a660
Explicitly selecting the file type HTML Document (Writer) (*.html;*.htm) when opening makes the file open fine for me.
Importance is perhaps a little high, given that there is an easy workaround.
Most definitely overprioritized. Lowering severity to normal, since this is a normal bug; leaving priority at high because it's a regression.
Critical is meant for crashers, memory leaks, and similar bugs.
(In reply to comment #2)
> Explicitly selecting the file type HTML Document (Writer) (*.html;*.htm)
> when opening makes the file open fine for me.
So, is it safe to say that LibreOffice's auto-detection of file types thinks that the HTML file is not an HTML file when its <!DOCTYPE ...> is not on the first line? If so, I will update the bug.
Thank you for helping and for finding the workaround.
I never would have thought about trying that.
I'll take care of it.
I submitted a fix for master to gerrit: https://gerrit.libreoffice.org/8079/. Unfortunately 4.2 requires a different fix (which hopefully I'll do later).
Maxim Monastirsky committed a patch related to this issue.
It has been pushed to "master":
fdo#74595 Make HTML detection to follow specs
The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
Affected users are encouraged to test the fix and report feedback.
Version: 184.108.40.206.alpha0+ Build ID: 22b709e84a7b6d38cab2dd37f2f2b28e0fc9d062
TinderBox: Win-x86@39, Branch:master, Time: 2014-02-20_00:01:31
Closing this one as FIXED as per comment 7.
And just for kicks: VERIFIED as per comment 8.
*** Bug 79863 has been marked as a duplicate of this bug. ***
My bug 79863 was classified as a duplicate, but the HTML I provided has its DOCTYPE on the first line, just with a few blank spaces before it. Also, although this bug has been labelled verified-fixed, it still hasn't been fixed in 4.2.6.
Build ID: 2b959fb871a68f08a06850909abd16f71033aa3a
TinderBox: Linux-rpm_deb-x86@45-TDF, Branch:libreoffice-4-2, Time: 2014-06-06_06:33:25
(In reply to comment #12)
> My bug 79863 was classified as a duplicate, but the HTML I provided has its
> DOCTYPE on the first line, just with a few blank spaces before it.
Right, it was the same problem. LO required the DOCTYPE to be at the very beginning of the file. So it doesn't matter whether it has a space or a line break before it.
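To illustrate why a leading space and a leading line break fail the same way, here is a rough Python sketch. It is not LibreOffice's actual detection code; the function names, the case-insensitive comparison, and the exact set of skipped whitespace characters are assumptions made for the example.

```python
# Illustrative only: NOT LibreOffice's real detection logic.
DOCTYPE_PREFIX = b"<!doctype"
ASCII_WHITESPACE = b" \t\r\n\f"  # assumed set of skippable leading bytes


def detect_strict(data: bytes) -> bool:
    """Accept only files whose very first bytes are the DOCTYPE."""
    return data[:len(DOCTYPE_PREFIX)].lower() == DOCTYPE_PREFIX


def detect_lenient(data: bytes) -> bool:
    """Skip leading whitespace, then look for the DOCTYPE."""
    stripped = data.lstrip(ASCII_WHITESPACE)
    return stripped[:len(DOCTYPE_PREFIX)].lower() == DOCTYPE_PREFIX
```

With the strict check, both `b"\n<!DOCTYPE html>"` and `b"   <!DOCTYPE html>"` are rejected, because in each case the first byte is not `<`. The lenient check accepts both.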
> Also, although this
> bug has been labelled verified-fixed, it still hasn't been fixed in 4.2.6.
Right, it was fixed for 4.3 (see the whiteboard). That fix can't be applied to 4.2, because 4.2 uses different code for HTML detection.
(In reply to comment #13)
> Right, it was fixed for 4.3 (see the whiteboard). That fix can't be
> applied to 4.2, because 4.2 uses different code for HTML detection.
Yes, I do understand that the fix is different for 4.3 and 4.2, as you stated in comment 6, but you also stated "Unfortunately 4.2 requires a different fix (which hopefully I'll do later)." So are you confirming here that you aren't going to be doing a 4.2 fix?
(In reply to comment #14)
> So are you confirming here that you aren't going to be doing a 4.2 fix?
Probably not, but maybe I'll find some time for it at some point.
Good news for 4.2 users. Caolán pushed a fix for this to the 4.2 branch:
So this is fixed also for 4.2.6.
*** Bug 81865 has been marked as a duplicate of this bug. ***
*** Bug 82134 has been marked as a duplicate of this bug. ***