Bug 89290 - HTML export: replace tabulator characters with double spaces or ellipses (three dot leader) in Table of Contents
Status: RESOLVED WONTFIX
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 4.2.5.2 release
Hardware: Other All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: difficultyBeginner, easyHack, skillCpp
Depends on:
Blocks:
 
Reported: 2015-02-10 15:42 UTC by László Németh
Modified: 2017-02-14 08:57 UTC
CC List: 2 users

See Also:
Crash report or crash signature:


Description László Németh 2015-02-10 15:42:14 UTC
The HTML export of the Table of Contents is a little bit ugly, because the exported tab characters are displayed as single spaces, like:

Title 5
Title2 10

It would be better to convert them to ellipses (U+2026) or double spaces (  ):

Title...5
Title2...10

or 

Title  5
Title2  10
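A minimal C++ sketch of the proposed substitution, assuming each ToC entry is available as a UTF-8 `std::string` with a tab between title and page number. `ReplaceTocTabs` and `TabStyle` are hypothetical names, not part of the LibreOffice code base. Browsers collapse runs of ordinary whitespace (tabs included) into a single space, which is why the tab shows up as one space; the "double space" variant therefore writes two `&nbsp;` entities rather than two plain spaces.

```cpp
#include <string>

// Hypothetical: which replacement to use for the tab in a ToC entry.
enum class TabStyle { Ellipsis, DoubleSpace };

// Replaces every tab in a ToC entry with either U+2026 HORIZONTAL ELLIPSIS
// (UTF-8 bytes E2 80 A6) or two non-breaking spaces, so browsers keep the gap.
std::string ReplaceTocTabs(const std::string& line, TabStyle style)
{
    const std::string replacement =
        (style == TabStyle::Ellipsis) ? "\xE2\x80\xA6" : "&nbsp;&nbsp;";
    std::string out;
    out.reserve(line.size() + replacement.size());
    for (char c : line) {
        if (c == '\t')
            out += replacement;  // tab between title and page number
        else
            out += c;            // all other characters are copied unchanged
    }
    return out;
}
```

For example, `ReplaceTocTabs("Title2\t10", TabStyle::DoubleSpace)` yields `Title2&nbsp;&nbsp;10`.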

Note: the HTML export of LibreOffice 4.4 has optional, CSS2-dependent support for dot leaders: http://cgit.freedesktop.org/libreoffice/core/commit/?id=3e17677f705d004ebb87d1268d640da1a1c8cdf4
Comment 1 László Németh 2015-02-10 15:44:19 UTC
The referred improvement with screen shots: https://wiki.documentfoundation.org/ReleaseNotes/4.4#Improved_Table_of_Contents_in_HTML_export
Comment 2 Björn Michaelsen 2015-05-19 13:01:12 UTC
add dev-list to cc for some EasyHacks ...
Comment 3 Robinson Tryon (qubit) 2015-12-10 11:50:47 UTC Comment hidden (obsolete)
Comment 4 Martin Nathansen 2016-01-24 13:33:02 UTC
When the HTML compatibility options are configured and the “Save As” dialog is used, the page numbers are already right-aligned (float:right) and CSS dot leaders are included (content:”….).

So is the EasyHack to do the same for the “Export” dialog? Should this feature be turned on by default? Most users will probably never stumble across the HTML compatibility option, so this feature will rarely be used.
Comment 5 Martin Nathansen 2016-01-25 15:59:58 UTC
From my understanding so far, the task is to enhance ScHTMLExport::WriteHeader() and ScHTMLExport::WriteBody() in /core/sc/source/filter/html/htmlexp.cxx
in a similar way to what is already implemented in SwHTMLWriter::WriteStream() in /core/sw/source/filter/html/wrthtml.cxx when rHtmlOptions.IsPrintLayoutExtension() is enabled?
Comment 6 Martin Nathansen 2016-02-03 17:14:18 UTC
I would like to take this Easy Hack and have a few questions:

I tested the HTML export with two different *.odt documents and found that the HTML output for the “Table of contents” depends on how the document was originally created:

1) When the document was created with OpenOffice 4.2, the HTML export of LibreOffice 5.2 for the “Table of Contents” looks as expected (ugly formatting, but nothing missing):

Table of Contents
Heading1  1
Heading2  1
  Heading21  1
  Heading22  2


2) When the document was created with LibreOffice 5.2 the HTML is exported without page numbers:

Table of Contents
Heading1
Heading2
  Heading21
  Heading22

There are also differences in the generated HTML Source:
In 1) HTML paragraphs <p>....</p> are used, and
in 2) an HTML table <table>..<tbody> <tr> <td>...</td> ….. is used.

So do the missing page numbers of 2) need to be fixed as well?
Perhaps the “Table of Contents” is not detected correctly?


3) I also tested the HTML export when using the “Save as” dialog. This was working with all documents without any problems:

Table of Contents
Heading1.......................................1
Heading2.......................................1
  Heading21....................................1
  Heading22....................................2

Here, for the formatting, HTML spans are added for the headings and for the page numbers respectively:
<p style="margin-bottom: 0in" class="leaders"><span><a href="#__RefHeading___Toc136_1696943280">Heading1</a></span><span>1</span></p>
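The "Save As" markup quoted above can be sketched as follows: each ToC entry is split at its last tab, and the title (wrapped in a link to its heading anchor) and the page number go into separate spans inside a paragraph of class "leaders". This is a hypothetical helper for illustration, not SwHTMLWriter's actual code; it assumes the anchor name is already known, leaves out the inline `style` attribute, and omits HTML escaping of the title. The dots themselves come from the CSS rules (float:right and generated content) mentioned in comment 4, not from this markup.

```cpp
#include <string>

// Hypothetical helper: builds one ToC paragraph in the "leaders" layout,
// e.g. "Heading1<TAB>1" -> <p class="leaders"><span><a ...>Heading1</a></span><span>1</span></p>.
// Escaping of &, < and > in the title is intentionally omitted here.
std::string MakeLeadersParagraph(const std::string& entry, const std::string& anchor)
{
    const std::string::size_type tab = entry.rfind('\t');
    // Without a tab, the whole entry is treated as the title and the page span stays empty.
    const std::string title = (tab == std::string::npos) ? entry : entry.substr(0, tab);
    const std::string page = (tab == std::string::npos) ? std::string() : entry.substr(tab + 1);
    return "<p class=\"leaders\"><span><a href=\"#" + anchor + "\">" + title +
           "</a></span><span>" + page + "</span></p>";
}
```

Splitting at the last tab (rather than the first) keeps any tabs inside a heading's own text in the title part.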

From my understanding, there are now two tasks to be done:
1. Fix the missing page numbers bug.
2. Implement better “Table of Contents” formatting for the “Export” dialog; parts of the “Save As” HTML export might be reused for this.

Am I on the right track?
Comment 7 jani 2016-02-03 17:42:00 UTC
To me, that sounds very much like the right track.

If you also take a look at
https://wiki.documentfoundation.org/Development/GetInvolved/DeveloperStepByStep

Then it is just happy hacking.
Comment 8 Martin Nathansen 2016-02-05 13:17:56 UTC
Regarding the HTML export via the “Export” dialog, I got lost in the wrong module.
I have since learned that when the “Export” dialog is used, the HTML is exported by SwXMLExport::exportDoc (sw/source/filter/xml/xmlexp.cxx).

Is this correct?

Now there are two options to solve the Easy Hack:

1) using SwHTMLWriter for the “Export” dialog, or

2) enhancing SwXMLExport, for which parts of the “Save As” HTML export might be copied or reused.

I would prefer the first option, because the second one sounds to me more like reinventing the wheel.

Are there any suggestions?
Comment 9 Michael Stahl (allotropia) 2016-02-05 14:01:26 UTC
sorry Martin that nobody answered your questions before, but you're understandably confused and on the wrong track:

the filter available from File->Export is the XHTML export filter, which is implemented via XSLT in filter/source/xslt/odf2xhtml.  this is why you see SwXMLExport being used: first a flat ODF document is exported, and that is then converted to XHTML via the XSLT stuff.  we generally try to ignore it, because, well, XSLT.

this bug is about the Writer HTML4 export filter, which is available from
File->Save As; it's in sw/source/filter/html and implemented in C++.

if you say that we already write the "..." then perhaps somebody
already implemented the requested feature without being aware
that this bug exists?

the problem with the missing page numbers sounds like a different bug,
please check if it's already filed, you may of course try to fix it :)
Comment 10 Martin Nathansen 2016-02-05 14:41:00 UTC
Michael, thanks a lot for your help :-)

So the Easy Task is to improve the XML->(X)HTML transformation to get a nicer „Table of Content“? And this XSLT filter should be used for both dialogs, “Export” and “Save as”?

However, first I will investigate the missing page numbers bug.
Comment 11 Michael Stahl (allotropia) 2016-02-09 14:10:46 UTC
(In reply to Martin Nathansen from comment #10)
> So the Easy Task is to improve the XML->(X)HTML transformation to get a
> nicer „Table of Content“?

the HTML one.  although if, as you say, the HTML one is already fixed,
then of course we wouldn't object if you fix the XHTML one too :)

> And this XSLT filter should be used for both
> dialogs, “Export” and “Save as”?

no, only for Export (it's not in "Save as" because there is no
corresponding XHTML import filter, while there is a HTML4 one).

XHTML and HTML4 are somewhat different file formats, i think.
Comment 12 Martin Nathansen 2016-02-11 17:34:35 UTC
I have now found out why the page numbers in the table of contents are not exported when the document was originally created with LO Writer, while they are exported from OO Writer documents.

The difference between the two Writer documents is the links in the LO Writer Table of Contents; the OO Table of Contents has no such links:

LO Writer Table of Contents:
…
<text:p text:style-name="P4"><text:a xlink:type="simple" xlink:href="#__RefHeading___Toc164_1531117683" text:style-name="Index_20_Link" text:visited-style-name="Index_20_Link">Heading1<text:tab/>1</text:a></text:p>
…

OO Writer Table of Contents:
…
<text:p text:style-name="P3">Heading1<text:tab/>1</text:p>
…
 
Because of this difference, different XSL templates are chosen for the XHTML transformation. The choice between the two templates is made in filter/source/xslt/odf2xhtml/export/common/table_of_content.xsl, line 40:
<xsl:when test="parent::table-of-content and */text:tab[1] or */*/text:tab[1]">

For the LO Writer Table of Contents the template "createIndexBodyTable" is applied, and this template seems to be unfinished. When the selector for this template is disabled, the LO Table of Contents is transformed in the same way as the OO Table of Contents.

So there are two options to continue with the EasyHack:

1) Fixing the bug in the XSL template "createIndexBodyTable" and improving the HTML table created by this template.

2) Implementing a new XSL template which exports HTML paragraphs (instead of an HTML table) and produces formatting similar to the HTML4 formatting in SwHTMLWriter.
Comment 13 Martin Nathansen 2016-02-12 15:29:54 UTC
related bug opened:
https://bugs.documentfoundation.org/show_bug.cgi?id=97801
Comment 14 Robinson Tryon (qubit) 2016-02-18 14:51:59 UTC Comment hidden (obsolete)
Comment 15 Martin Nathansen 2016-03-09 14:40:18 UTC
In my opinion this Easy Hack should be canceled:
The output of the HTML export is just one page and therefore page numbers are meaningless. 
So the page numbers should be just removed from the ToC and that's all.
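If the page numbers were simply dropped, as suggested here, the per-entry change could be as small as cutting each ToC line at its last tab. A minimal sketch, assuming the page number always follows the last tab; `StripTocPageNumber` is a hypothetical name, not an existing LibreOffice function.

```cpp
#include <string>

// Hypothetical helper: removes the page number from a ToC entry "Title<TAB>Page"
// by truncating at the last tab; entries without a tab are returned unchanged.
std::string StripTocPageNumber(const std::string& entry)
{
    const std::string::size_type tab = entry.rfind('\t');
    return (tab == std::string::npos) ? entry : entry.substr(0, tab);
}
```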
Comment 16 jani 2016-03-09 15:58:25 UTC
(In reply to Martin Nathansen from comment #15)
> In my opinion this Easy Hack should be canceled:
> The output of the HTML export is just one page and therefore page numbers
> are meaningless. 
> So the page numbers should be just removed from the ToC and that's all.


Or a page break should be added, so that all pages can be exported. I have to test that, because I was pretty sure it exports multiple pages, BUT as one long file.
Comment 17 jani 2017-01-23 12:56:57 UTC
Closing as per suggestion