Bug 90704 - FILESAVE: EXPORT: HTML: Page margin is lost, outdated DTD
Status: RESOLVED DUPLICATE of bug 66044
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): unspecified
Hardware: Other All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-04-19 05:18 UTC by JC Ahangama
Modified: 2015-04-25 10:02 UTC

See Also:
Crash report or crash signature:


Attachments
Three files to help suggest exporting to HTML5 (5.35 MB, application/zip)
2015-04-19 05:18 UTC, JC Ahangama

Description JC Ahangama 2015-04-19 05:18:49 UTC
Created attachment 114900
Three files to help suggest exporting to HTML5

I exported a simple ODT document to HTML and found that the 0.79in page margin was not carried over. Looking at the source of the exported HTML file, I saw the margin stated as follows:
@page { margin: 0.79in }

The problem went away when I corrected it to:
body { margin: 0.79in 0.79in 0.79in 0.79in }

That is a true translation of the margins given in page formatting in the ODT file.
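
The translation described above can be sketched in a few lines. This is only an illustration, not LibreOffice's export code: the function name body_margin_css and its parameters are hypothetical, and it assumes the four page margins have already been read from the ODT page style as numbers in inches.

```python
def body_margin_css(top, right, bottom, left, unit="in"):
    """Build a CSS rule that applies the page margins to <body>.

    Hypothetical helper: the margins are assumed to come from the
    ODT page style; the values follow CSS shorthand order
    (top, right, bottom, left).
    """
    # ":g" drops trailing zeros, so 0.79 stays "0.79" and 1.0 becomes "1"
    values = " ".join(f"{m:g}{unit}" for m in (top, right, bottom, left))
    return f"body {{ margin: {values} }}"


print(body_margin_css(0.79, 0.79, 0.79, 0.79))
```

With all four margins set to 0.79, this produces the same rule as the hand-corrected line above.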

The paragraphs were separated by an empty <p...></p> plus a <br>, which sets the paragraphs too far apart from each other. Then I noticed that the document type used is a very old one:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>

So, I changed the page to the HTML5 standard:
<!DOCTYPE html>
<head>
<meta charset="utf-8">

Then I removed all HTML tags and traversed the document, simply translating the formatting given in the ODT file to CSS. The result was an HTML file nearly identical to the ODT file. It is perfectly faithful to the formatting the user specified and adds nothing to it. The only thing outside the user's specification is the UTF-8 transit-encoding instruction, which is outside the scope of defining the document. This appears to be a straightforward algorithm for exporting files to HTML.
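
A rough sketch of that traversal follows. Everything in it is an assumption made for illustration: the input model (a list of paragraph texts paired with CSS-ready style properties), the function name export_html5, the generated class names, and the <title> element, which is included because HTML5 validators normally require one. It does not reflect how LibreOffice stores or exports documents.

```python
from html import escape


def export_html5(paragraphs, page_margin="0.79in", title="Untitled"):
    """Emit an HTML5 page whose formatting lives entirely in CSS.

    paragraphs: list of (text, style) pairs, where style is a dict of
    CSS property -> value (hypothetical stand-in for ODT formatting).
    """
    rules = [f"body {{ margin: {page_margin} }}"]
    body = []
    for i, (text, style) in enumerate(paragraphs, start=1):
        # One CSS class per paragraph; no presentational HTML is emitted
        decls = "; ".join(f"{prop}: {value}" for prop, value in style.items())
        rules.append(f".p{i} {{ {decls} }}")
        body.append(f'<p class="p{i}">{escape(text)}</p>')
    return "\n".join(
        ["<!DOCTYPE html>", "<html>", "<head>", '<meta charset="utf-8">',
         f"<title>{escape(title)}</title>", "<style>", *rules, "</style>",
         "</head>", "<body>", *body, "</body>", "</html>"]
    )


print(export_html5([("Hello & goodbye", {"font-size": "12pt"})], title="demo"))
```

Each paragraph becomes a single <p> element, so no empty paragraphs or <br> elements are needed for spacing; spacing would come from CSS properties such as margin-bottom in the style dict.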

This new file specifies neither the language nor the ltr/rtl direction. First, the author of the ODT file did not specify them. Second, HTML5 expects the web server to guess the directionality. And 'lang' is only an informational item useful for machine translators, not a requirement of the HTML standard. This file was fully validated as HTML5 by the W3C validator.

The files in the attached zip folder:
demo.odt              <= the original doc
demo.html             <= file exported by LibreOffice Writer (LOW)
demo-html5.html       <= HTML5 version hand created

Please read the notes included in the source code of the HTML5 file. The specified fonts are not necessary to test the pages.

Thank you most humbly for your great program. I wish you will start exporting to the HTML5 standard.

JC
Comment 1 Buovjaga 2015-04-25 10:02:34 UTC
Thank you for your report. I found an existing request for HTML5 export, so I added a comment there asking people to look at your showcase and description. I'll close this as a duplicate.

*** This bug has been marked as a duplicate of bug 66044 ***