Bug 126879 - HTML-Export: Doctype doesn't coincide with end of Standalone-Tag
Summary: HTML-Export: Doctype doesn't coincide with end of Standalone-Tag
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: filters and storage
Version: 5.1.5.2 release (earliest affected)
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Andreas Heinisch
URL:
Whiteboard: target:7.3.0
Keywords:
Depends on:
Blocks: (X)HTML-Export
Reported: 2019-08-13 14:01 UTC by Robert Großkopf
Modified: 2021-09-13 20:10 UTC

See Also:
Crash report or crash signature:


Description Robert Großkopf 2019-08-13 14:01:08 UTC
Open an empty Writer document.
Save it as "HTML".
Have a look at the HTML-code.
It will look like this:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<meta http-equiv="content-type" content="text/html; charset=utf-8"/>
<meta name="generator" content="LibreOffice 6.3.0.4 (Linux)"/>
<br/>

The file declares DOCTYPE HTML 4.0 (which has been obsolete since 1999!), yet every standalone tag ends with "/>", which is only allowed in HTML5 or XHTML.
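
The mismatch described above can be checked mechanically. The following is only a rough sketch and not LibreOffice code: hasDoctypeMismatch is a hypothetical helper, and it uses plain substring matching on the assumption that "/>" does not occur inside text or attribute values.

```cpp
#include <string>

// Hypothetical helper, not part of LibreOffice: returns true if the markup
// declares an HTML 4.0 doctype but still contains an XML-style "/>" tag end.
// Assumption: "/>" never appears inside text content or attribute values.
bool hasDoctypeMismatch(const std::string& html)
{
    const bool declaresHtml4 = html.find("DTD HTML 4.0") != std::string::npos;
    const bool hasSelfClosedTag = html.find("/>") != std::string::npos;
    return declaresHtml4 && hasSelfClosedTag;
}
```

Called on the exported file shown above, this would report a mismatch. The same body under <!DOCTYPE html> would not.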

Two possible solutions:
1) Use <!DOCTYPE html> instead of the very old DOCTYPE. This declares the document as HTML5.
2) Remove the forward slash before ">" in all standalone tags.
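
To make the two options concrete, here is a minimal sketch of each as a post-processing step on the exported string. The function names useHtml5Doctype and dropSelfClosingSlashes are made up for illustration. A real fix would belong in the export filter itself, and the slash removal is deliberately naive.

```cpp
#include <string>

// Option 1 (hypothetical helper): swap the HTML 4.0 doctype for the HTML5 one.
std::string useHtml5Doctype(std::string html)
{
    const std::string oldDoctype =
        "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\">";
    const std::string::size_type pos = html.find(oldDoctype);
    if (pos != std::string::npos)
        html.replace(pos, oldDoctype.size(), "<!DOCTYPE html>");
    return html;
}

// Option 2 (hypothetical helper): keep the old doctype and turn "/>" into ">".
// Naive on purpose: it would also touch "/>" inside attribute values or text.
std::string dropSelfClosingSlashes(std::string html)
{
    std::string::size_type pos = 0;
    while ((pos = html.find("/>", pos)) != std::string::npos)
        html.erase(pos, 1); // removes the '/', leaving '>' at pos
    return html;
}
```

Option 1 changes a single line of the output. Option 2 has to touch every standalone tag.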

I would prefer to go forward, not back to 1999!

All tested with LO 6.3.0.4 (see above), OpenSUSE 15 64bit rpm Linux.
Comment 1 spots4as 2019-08-14 05:44:59 UTC
Confirmed with LO 6.0.7.3 (Ubuntu Mate 18.04)
Comment 2 Robert Großkopf 2019-08-14 07:36:42 UTC
I could also reproduce this in OpenSUSE 15 with LO 5.1.5.2, the oldest version I have installed here.
Comment 3 Andreas Heinisch 2021-08-11 22:01:23 UTC
The definitions currently in use are the following:

#define OOO_STRING_SVTOOLS_HTML_doctype40 "HTML PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\""
#define OOO_STRING_SVTOOLS_XHTML_doctype11                                                         \
    "html PUBLIC \"-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN\" "                                   \
    "\"http://www.w3.org/Math/DTD/mathml2/xhtml-math11-f.dtd\""

Are these both obsolete now? 

I could only find this documentation and have no idea whether we can simply drop them:
https://www.w3.org/QA/2002/04/valid-dtd-list.html
Comment 4 Robert Großkopf 2021-08-12 06:25:40 UTC
(In reply to Andreas Heinisch from comment #3)
> The definitions currently in use are the following:
> 
> #define OOO_STRING_SVTOOLS_HTML_doctype40
"HTML PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\""
has to be replaced by
"html"
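
A small sketch of what this replacement amounts to. The macro name EXAMPLE_HTML_DOCTYPE5 and the helper makeDoctypeLine are assumptions for illustration only. The thread does not say what the new define is called or how the export code assembles the line.

```cpp
#include <string>

// Assumed name for illustration; the thread does not name the new define.
#define EXAMPLE_HTML_DOCTYPE5 "html"

// Hypothetical helper: wraps a doctype body in the "<!DOCTYPE ...>" markup,
// which is presumably what the exporter does with the string it is given.
std::string makeDoctypeLine(const std::string& doctypeBody)
{
    return "<!DOCTYPE " + doctypeBody + ">";
}
```

With the body reduced to "html", the resulting line is the HTML5 doctype, which matches the "/>" tag endings that the export already writes.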

> #define OOO_STRING_SVTOOLS_XHTML_doctype11 \
>     "html PUBLIC \"-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN\" " \
>     "\"http://www.w3.org/Math/DTD/mathml2/xhtml-math11-f.dtd\""
> 
> Are these both obsolete now?

Don't know. 
See https://lists.w3.org/Archives/Public/www-math/2003Jun/0041.html
It seems it is sometimes better not to use a doctype at all, because the browser would otherwise try to fetch the whole DTD.

I have only looked at the difference between HTML4 and HTML5. Declaring HTML4 while writing standalone tags like <br/> is a bug.

The other doctypes are for XHTML files and work well here.
Comment 5 Commit Notification 2021-08-16 16:37:54 UTC
Andreas Heinisch committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/821d2f8c058f7b7f45e23203d98aa9237289e265

tdf#126879 - Drop obsolete DOCTYPE HTML 4.0

It will be available in 7.3.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 6 Commit Notification 2021-09-13 20:10:56 UTC
Andreas Heinisch committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/a7084f156a75ab363d2562b485b240bd350563fc

tdf#126879: sw_htmlexport: Add unittest

It will be available in 7.3.0.
