The HTML export of the Table of Contents is a little bit ugly, because the exported tab characters are displayed as single spaces, like:
It would be better to convert them to ellipses (U+2026) or double spaces ( ):
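A minimal sketch of the suggested conversion, written in Python for brevity. The helper name `replace_toc_tabs` is hypothetical and this is not LibreOffice code: it only swaps each tab in an entry's text for a visible separator. Plain double spaces would collapse into one space when the HTML is rendered, so the double-space variant here assumes no-break spaces (U+00A0).

```python
# Hypothetical helper, not LibreOffice code: replace the tab that separates a
# ToC entry's title from its page number with a visible separator.
ELLIPSIS = "\u2026"           # HORIZONTAL ELLIPSIS (U+2026)
DOUBLE_NBSP = "\u00a0\u00a0"  # two NO-BREAK SPACEs; HTML rendering does not collapse them


def replace_toc_tabs(entry: str, separator: str = ELLIPSIS) -> str:
    """Return `entry` with every tab character replaced by `separator`."""
    return entry.replace("\t", separator)
```

For example, `replace_toc_tabs("Heading1\t1")` returns `"Heading1…1"`. Passing `DOUBLE_NBSP` gives the double-space variant instead.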
Note: the HTML export of LibreOffice 4.4 has optional, CSS2-dependent dot leader support: http://cgit.freedesktop.org/libreoffice/core/commit/?id=3e17677f705d004ebb87d1268d640da1a1c8cdf4
The improvement referred to, with screenshots: https://wiki.documentfoundation.org/ReleaseNotes/4.4#Improved_Table_of_Contents_in_HTML_export
add dev-list to cc for some EasyHacks ...
Migrating Whiteboard tags to Keywords: (easyHack)
When the HTML compatibility options are configured and the “Save as” dialog is used, the page numbers are already right-aligned (float:right) and CSS dot leaders are included (content:”….)
So is the EasyHack to do the same for the “Export” dialog? Should this feature be turned on by default? Most users will probably never stumble across the HTML compatibility option, so this feature will not be used very often.
From my understanding so far, is the task to enhance ScHTMLExport::WriteHeader() and ScHTMLExport::WriteBody() in /core/sc/source/filter/html/htmlexp.cxx in the same way as SwHTMLWriter::WriteStream() in /core/sw/source/filter/html/wrthtml.cxx already does when rHtmlOptions.IsPrintLayoutExtension() is enabled?
I would like to take this Easy Hack and have a few questions:
I tested the HTML export with two different *.odt documents and found that the HTML output for the “Table of Contents” depends on how the document was originally created:
1) When the document was created with OpenOffice 4.2, the HTML export of LibreOffice 5.2 for the “Table of Contents” looks as expected (ugly formatting, but nothing is missing).
2) When the document was created with LibreOffice 5.2, the HTML is exported without page numbers.
There are also differences in the generated HTML source:
in 1), HTML paragraphs <p>....</p> are used, and
in 2), it is an HTML table <table>..<tbody> <tr> <td>...</td> …..
So do the missing page numbers in 2) need to be fixed as well?
Perhaps the “Table of Contents” is not detected correctly?
3) I also tested the HTML export using the “Save as” dialog. This worked with all documents without any problems.
Here, HTML spans are added for formatting, one for the heading and one for the page number:
<p style="margin-bottom: 0in" class="leaders"><span><a href="#__RefHeading___Toc136_1696943280">Heading1</a></span><span>1</span></p>
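The markup above has a simple, regular shape. Here is a minimal Python sketch that assembles the same line from its three parts (anchor, heading text, page number). `toc_leaders_paragraph` is a hypothetical name and this is not how SwHTMLWriter is written. The dot leaders and right alignment come from the stylesheet (float:right and content: rules, as described earlier), which is not reproduced here.

```python
from html import escape


def toc_leaders_paragraph(anchor: str, heading: str, page: str) -> str:
    """Build one ToC line in the shape of the "Save as" output:
    the linked heading and the page number each sit in their own <span>
    inside <p class="leaders">; dot leaders are left to the stylesheet."""
    link = f'<a href="#{escape(anchor)}">{escape(heading)}</a>'
    return (
        '<p style="margin-bottom: 0in" class="leaders">'
        f"<span>{link}</span><span>{escape(page)}</span></p>"
    )
```

Calling `toc_leaders_paragraph("__RefHeading___Toc136_1696943280", "Heading1", "1")` reproduces the line above character for character. Escaping keeps headings that contain `<` or `&` well-formed.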
So, as I understand it, there are two tasks to be done:
1. Fix the missing page numbers bug.
2. Implement better “Table of Contents” formatting for the “Export” dialog; parts of the “Save as” HTML export might be reused for this.
Am I on the right track?
To me, that sounds very much like the right track.
If you also take a look at
Then it is just happy hacking.
Regarding HTML export via the “Export” dialog, I got lost in the wrong module.
Meanwhile I learned that when the “Export” dialog is used, the HTML is exported by SwXMLExport::exportDoc (sw/source/filter/xml/xmlexp.cxx).
Is this correct?
Now there are two options to solve the Easy Hack:
1) using SwHTMLWriter for the “Export” dialog, or
2) enhancing SwXMLExport, copying or reusing parts of the “Save as” HTML export.
I would prefer the first option, because the second one sounds more like reinventing the wheel.
Are there any suggestions?
Sorry, Martin, that nobody answered your questions earlier, but you're understandably confused and on the wrong track:
The filter available from File->Export is the XHTML export filter, which is implemented via XSLT in filter/source/xslt/odf2xhtml. This is why you see SwXMLExport being used: first a flat ODF document is exported, and that is then converted to XHTML via the XSLT stylesheets. We generally try to ignore it, because, well, XSLT.
This bug is about the Writer HTML4 export filter, which is available from File->Save As; it's in sw/source/filter/html and implemented in C++.
If you say that we already write the "...", then perhaps somebody already implemented the requested feature without being aware that this bug exists?
The problem with the missing page numbers sounds like a different bug; please check whether it's already filed. You may of course try to fix it :)
Michael, thanks a lot for your help :-)
So the Easy Task is to improve the XML->(X)HTML transformation to get a nicer „Table of Content“? And this XSLT filter should be used for both dialogs, “Export” and “Save as”?
However, first I will investigate the missing page numbers bug.
(In reply to Martin Nathansen from comment #10)
> So the Easy Task is to improve the XML->(X)HTML transformation to get a
> nicer „Table of Content“?
The HTML one. Although if, as you say, the HTML one is already fixed, then of course we wouldn't object if you fixed the XHTML one too :)
> And this XSLT filter should be used for both
> dialogs, “Export” and “Save as”?
No, only for Export (it's not in "Save as" because there is no corresponding XHTML import filter, while there is an HTML4 one). XHTML and HTML4 are somewhat different file formats, I think.
Meanwhile I found out why the page numbers in the table of contents are not exported when the document was originally created in LO Writer, while they are exported from OO Writer documents.
The difference between the two Writer documents is the links in the LO Writer Table of Contents; the OO Table of Contents has no such links:
LO Writer Table of Contents:
<text:p text:style-name="P4"><text:a xlink:type="simple" xlink:href="#__RefHeading___Toc164_1531117683" text:style-name="Index_20_Link" text:visited-style-name="Index_20_Link">Heading1<text:tab/>1</text:a></text:p>
OO Writer Table of Contents:
Because of this difference, different XSL templates are chosen for the XHTML transformation. The choice between the two templates is made in the filter module, in filter/source/xslt/odf2xhtml/export/common/table_of_content.xsl, line 40:
<xsl:when test="parent::table-of-content and */text:tab or */*/text:tab">
For the LO Writer Table of Contents, the template "createIndexBodyTable" is applied, and this template seems to be unfinished. When the selector for this template is disabled, the LO Table of Contents is transformed in the same way as the OO Table of Contents.
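To see why the linked entries land in "createIndexBodyTable", the test on line 40 can be modeled with Python's ElementTree, whose limited XPath handles the `*/text:tab` and `*/*/text:tab` paths. This is a sketch under stated assumptions, not the stylesheet's actual evaluation. It assumes the context node is `text:index-body`, and it takes `parent::table-of-content` as a plain flag because its result depends on the stylesheet context. The link-free "OO" entry is hypothetical, since the real OO sample is not shown. In XPath 1.0, `and` binds tighter than `or`.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
XLINK_NS = "http://www.w3.org/1999/xlink"
NS = {"text": TEXT_NS}

# Trimmed LO Writer entry (style attributes dropped), wrapped in an index body.
LO_BODY = (
    f'<text:index-body xmlns:text="{TEXT_NS}" xmlns:xlink="{XLINK_NS}">'
    '<text:p><text:a xlink:type="simple" xlink:href="#__RefHeading___Toc164_1531117683">'
    "Heading1<text:tab/>1</text:a></text:p></text:index-body>"
)
# Hypothetical link-free entry standing in for the OO Writer sample.
OO_BODY = (
    f'<text:index-body xmlns:text="{TEXT_NS}">'
    "<text:p>Heading1<text:tab/>1</text:p></text:index-body>"
)


def selects_table_template(body_xml: str, parent_is_toc: bool) -> bool:
    """Model of the line-40 test, assuming text:index-body is the context node.

    `parent_is_toc` stands in for `parent::table-of-content`, which is not
    evaluated here."""
    body = ET.fromstring(body_xml)
    tab_in_child = bool(body.findall("*/text:tab", NS))         # */text:tab
    tab_in_grandchild = bool(body.findall("*/*/text:tab", NS))  # */*/text:tab
    # XPath 1.0 precedence: (A and B) or C
    return (parent_is_toc and tab_in_child) or tab_in_grandchild
```

In the LO entry the tab is nested inside `text:a`, so the `*/*/text:tab` branch alone selects the table template whatever the parent test yields. For the link-free entry, the result depends only on the parent test.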
So there are two options to continue with the EasyHack:
1) Fixing the bug in the XSL template "createIndexBodyTable" and improving the HTML table created by this template.
2) Implementing a new XSL template that exports HTML paragraphs (instead of an HTML table) and implements formatting similar to the HTML4 formatting in SwHTMLWriter.
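Whichever template is used, it has to separate each entry's title from its page number. In LO Writer output, that means looking inside the `text:a` link, because the tab sits there. Here is a minimal sketch of that step in Python's ElementTree rather than XSLT, for brevity. `split_toc_paragraph` is a hypothetical name, and the sketch assumes one tab per entry, as in the sample.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
XLINK_NS = "http://www.w3.org/1999/xlink"
TAB = f"{{{TEXT_NS}}}tab"  # text:tab in ElementTree's {namespace}local form

# Trimmed copy of the LO Writer ToC entry quoted above.
SAMPLE = (
    f'<text:p xmlns:text="{TEXT_NS}" xmlns:xlink="{XLINK_NS}">'
    '<text:a xlink:type="simple" xlink:href="#__RefHeading___Toc164_1531117683">'
    "Heading1<text:tab/>1</text:a></text:p>"
)


def split_toc_paragraph(p: ET.Element) -> tuple[str, str]:
    """Return (title, page) for one ToC paragraph, split at its text:tab.

    The tab may be nested (inside text:a in LO Writer output), so the walk
    descends into child elements. Assumes one tab per entry."""
    parts = ([], [])  # (title pieces, page-number pieces)
    after_tab = False

    def walk(el: ET.Element) -> None:
        nonlocal after_tab
        if el.tag == TAB:
            after_tab = True
        elif el.text:
            parts[int(after_tab)].append(el.text)
        for child in el:
            walk(child)
            if child.tail:  # text that follows the child inside `el`
                parts[int(after_tab)].append(child.tail)

    walk(p)
    return "".join(parts[0]), "".join(parts[1])
```

For the sample, `split_toc_paragraph(ET.fromstring(SAMPLE))` returns `('Heading1', '1')`. The two parts could then be written out as the two spans used by the HTML4 export.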
related bug opened:
JanI is default CC for Easy Hacks (Add Jan; remove LibreOffice Dev List from CC)
In my opinion this Easy Hack should be canceled:
The output of the HTML export is just one page and therefore page numbers are meaningless.
So the page numbers should be just removed from the ToC and that's all.
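If the page numbers are simply dropped, the transformation has to remove each entry's tab and the page number after it. Here is a minimal sketch on the ODF side using Python's ElementTree, standing in for whatever the XSLT would do. `drop_page_numbers` is a hypothetical name. The sketch assumes the page number is the tab's trailing text (or later siblings) within the same element, as in the LO sample.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
XLINK_NS = "http://www.w3.org/1999/xlink"
NS = {"text": TEXT_NS}
TAB = f"{{{TEXT_NS}}}tab"  # text:tab in ElementTree's {namespace}local form

# Trimmed copy of an LO Writer ToC entry: the tab and page number sit inside the link.
SAMPLE = (
    f'<text:p xmlns:text="{TEXT_NS}" xmlns:xlink="{XLINK_NS}">'
    '<text:a xlink:type="simple" xlink:href="#__RefHeading___Toc164_1531117683">'
    "Heading1<text:tab/>1</text:a></text:p>"
)


def drop_page_numbers(root: ET.Element) -> None:
    """Remove every text:tab below `root`, together with everything that
    follows it inside the same parent element (tail text, later siblings)."""
    for parent in list(root.iter()):  # snapshot, because the tree is edited below
        children = list(parent)
        for i, child in enumerate(children):
            if child.tag == TAB:
                for node in children[i:]:
                    parent.remove(node)  # an element's .tail is removed with it
                break
```

After `drop_page_numbers(p)` on the parsed sample, the entry's text is just `Heading1` and the link is kept.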
(In reply to Martin Nathansen from comment #15)
> In my opinion this Easy Hack should be canceled:
> The output of the HTML export is just one page and therefore page numbers
> are meaningless.
> So the page numbers should be just removed from the ToC and that's all.
Or a page break should be added, so that all pages can be exported. I have to test that, because I was pretty sure it exports multiple pages, BUT as one long file.
Closing as per suggestion.