Bug 66999 - XHTML export generates improper markup
Summary: XHTML export generates improper markup
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: filters and storage
Version (earliest affected): 4.0.3.3 release
Hardware: Other All
Importance: medium normal
Assignee: Robert Antoni Buj i Gelonch
URL:
Whiteboard: target:5.0.0 target:4.4.4
Keywords:
Duplicates: 66998
Depends on:
Blocks:
 
Reported: 2013-07-17 13:48 UTC by yet
Modified: 2015-05-14 16:11 UTC
CC List: 6 users

See Also:
Crash report or crash signature:


Attachments
DOCX (135.98 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2013-07-17 13:49 UTC, yet
XHTML conversion result (603.70 KB, text/html)
2013-07-17 13:49 UTC, yet
Outline numbering has been fixed in test case document (59.40 KB, application/vnd.oasis.opendocument.text)
2015-05-13 11:20 UTC, Robert Antoni Buj i Gelonch
Outline numbering has been fixed in test case document: exporting output (603.94 KB, text/html)
2015-05-13 11:22 UTC, Robert Antoni Buj i Gelonch
Example of CSS counters in Outline Numbering (2.78 KB, text/html)
2015-05-13 12:26 UTC, Robert Antoni Buj i Gelonch

Description yet 2013-07-17 13:48:56 UTC
I converted the attached DOCX file through UNOCONV to XHTML.

The generated XHTML file contains an invalid <H> tag.
Comment 1 yet 2013-07-17 13:49:27 UTC
Created attachment 82544
DOCX
Comment 2 yet 2013-07-17 13:49:54 UTC
Created attachment 82545
XHTML conversion result
Comment 3 David Tardon 2013-07-18 05:00:05 UTC
*** Bug 66998 has been marked as a duplicate of this bug. ***
Comment 4 Julien Nabet 2013-07-21 16:07:09 UTC
On a Debian x86-64 PC with master sources updated yesterday, I reproduced the problem, so I set the status to NEW and the platform to All.
Comment 5 QA Administrators 2015-04-01 14:41:49 UTC
** Please read this message in its entirety before responding **

To make sure we're focusing on the bugs that affect our users today, LibreOffice QA is asking bug reporters and confirmers to retest open, confirmed bugs which have not been touched for over a year.

There have been thousands of bug fixes and commits since anyone checked on this bug report. During that time, it's possible that the bug has been fixed, or the details of the problem have changed. We'd really appreciate your help in getting confirmation that the bug is still present.

If you have time, please do the following:

   *Test to see if the bug is still present on a currently supported version of LibreOffice (4.4.1 or later)
   https://www.libreoffice.org/download/

   *If the bug is present, please leave a comment that includes the version of LibreOffice and your operating system, and any changes you see in the bug behavior
 
   *If the bug is NOT present, please set the bug's Status field to RESOLVED-WORKSFORME and leave a short comment that includes your version of LibreOffice and Operating System

Please DO NOT

   *Update the version field
   *Reply via email (please reply directly on the bug tracker)
   *Set the bug's Status field to RESOLVED - FIXED (this status has a particular meaning that is not appropriate in this case)


If you want to do more to help you can test to see if your issue is a REGRESSION. To do so: 

1. Download and install oldest version of LibreOffice (usually 3.3 unless your bug pertains to a feature added after 3.3)

http://downloadarchive.documentfoundation.org/libreoffice/old/

2. Test your bug 
3. Leave a comment with your results. 
4a. If the bug was present with 3.3 - set version to "inherited from OOo"; 
4b. If the bug was not present in 3.3 - add "regression" to keyword


Feel free to come ask questions or to say hello in our QA chat: http://webchat.freenode.net/?channels=libreoffice-qa

Thank you for your help!

-- The LibreOffice QA Team
This NEW Message was generated on: 2015-04-01
Comment 6 Buovjaga 2015-04-24 10:18:11 UTC
Exporting from the LibO GUI and validating the result with http://validator.w3.org/, I get 46 errors.
Attachment 82545 gives 46 errors and 1 warning, so I guess the problem persists.

Win 7 Pro 64-bit Version: 5.0.0.0.alpha1+ (x64)
Build ID: f3375fa07f27bd2ade519af3c07d69040d10eaa9
TinderBox: Win-x86_64@42, Branch:master, Time: 2015-04-22_23:38:50
Locale: fi_FI
Comment 7 Julien Nabet 2015-05-11 19:56:20 UTC
Robert: having noticed your XSL work on the MediaWiki export, I thought you might be interested in this one.
See http://opengrok.libreoffice.org/xref/core/filter/source/xslt/odf2xhtml/export/xhtml/ for code pointer
If I'm wrong, don't hesitate to "uncc" yourself of course and sorry for the noise.
Comment 8 Robert Antoni Buj i Gelonch 2015-05-11 23:06:48 UTC
OK, a header can't be within a list in XHTML, and I think that LibreOffice's Outline Numbering can be converted to CSS nested counters: http://www.w3.org/TR/CSS2/generate.html#scope
Before I do anything, I'll first read the code carefully.
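For readers who want to check an exported file for this specific problem, here is a minimal Python sketch. It is not part of the export filter. It assumes the output is well-formed XML and that the rule being violated is the XHTML content model in which only <li> may be a direct child of <ol> or <ul>. The function name `find_misplaced_list_children` and the sample markup are hypothetical, made up for illustration.

```python
import xml.etree.ElementTree as ET

LIST_TAGS = {"ol", "ul"}


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def find_misplaced_list_children(xhtml_text):
    """Return (list_tag, child_tag) pairs for non-<li> children of <ol>/<ul>."""
    root = ET.fromstring(xhtml_text)
    problems = []
    for element in root.iter():
        if local_name(element.tag) in LIST_TAGS:
            for child in element:
                if local_name(child.tag) != "li":
                    problems.append(
                        (local_name(element.tag), local_name(child.tag))
                    )
    return problems


# Hypothetical sample: a heading placed directly inside an ordered list.
sample = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    "<ol><h1>Introduction</h1><li>First item</li></ol>"
    "</body></html>"
)
print(find_misplaced_list_children(sample))  # [('ol', 'h1')]
```

A real export may contain a DOCTYPE with named entities that this parser does not resolve, so the W3C validator remains the authoritative check.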
Comment 9 Robert Antoni Buj i Gelonch 2015-05-13 10:10:36 UTC
There is no issue with the export filter. The document is badly formatted because the headers are inside a numbered list; the user probably applied Outline Numbering incompletely. This can be fixed by clearing the formatting, removing the numbering, and selecting the desired style (Heading 1, Heading 2, ...) for each header (Outline Numbering is preserved).
Comment 10 yet 2015-05-13 10:16:55 UTC
So you are saying that improper input is an excuse for generating invalid output?
Comment 11 Robert Antoni Buj i Gelonch 2015-05-13 10:27:26 UTC
Yes, the incorrect Outline Numbering is the reason for the invalid output, because headers can't be within a numbered list.
Comment 12 yet 2015-05-13 10:35:35 UTC
With respect, your argument is complete nonsense. An export filter must never, ever generate improper output, regardless of any error situation, whether in the processing or in the input. Once again: you are generating improper HTML; this *is* a bug.
Comment 13 Robert Antoni Buj i Gelonch 2015-05-13 11:20:50 UTC
Created attachment 115557
Outline numbering has been fixed in test case document

There are still two errors in XML validation.
Comment 14 Robert Antoni Buj i Gelonch 2015-05-13 11:22:31 UTC
Created attachment 115558
Outline numbering has been fixed in test case document: exporting output
Comment 15 yet 2015-05-13 11:25:13 UTC
Is this an argument? Am I talking to a wall here? Do you have a basic understanding of quality assurance for functionality such as an output filter?
An output filter is supposed to produce something that is compatible with the specs of the format that it is supposed to generate. Is <H> a valid HTML tag? Instead of swallowing errors or performing stupid error handling, either raise an error message or deal with such an error situation properly. Generating trash output is not correct functionality for an output filter.
Comment 16 Robert Antoni Buj i Gelonch 2015-05-13 12:26:54 UTC
Created attachment 115559
Example of CSS counters in Outline Numbering

The W3C validator reported that a header is not permitted content in a list element (OL/UL) in the initial document, plus two more errors. From my point of view, the incorrect usage of the headers is a user mistake, not an issue in the export filter. For now, I'm focusing on these two errors. After that, I think we can reduce the complexity of Outline Numbering by using CSS counters, but that is an enhancement.
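To illustrate the counter idea, here is a rough Python simulation of how nested counters produce outline numbers: each heading level increments its own counter and resets every deeper one. This only mimics the behaviour for explanation; it is neither CSS nor the filter's code, and the function name `outline_numbers` is hypothetical.

```python
def outline_numbers(levels):
    """Return dotted outline numbers ('1', '1.1', ...) for heading levels.

    Each level increments its own counter and resets all deeper counters,
    which is the behaviour nested counters are meant to provide.
    """
    counters = []
    numbers = []
    for level in levels:
        # Make sure a counter exists for every level up to this one.
        while len(counters) < level:
            counters.append(0)
        # Moving to this level discards the counters of deeper levels.
        del counters[level:]
        counters[level - 1] += 1
        numbers.append(".".join(str(c) for c in counters))
    return numbers


# Heading levels as they might appear in a document: H1, H2, H2, H1, H2, H3.
print(outline_numbers([1, 2, 2, 1, 2, 3]))
# ['1', '1.1', '1.2', '2', '2.1', '2.1.1']
```

With this approach the numbering lives in the stylesheet rather than in nested list markup, so the headings themselves can stay outside any OL/UL element.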
Comment 17 Julien Nabet 2015-05-13 12:34:32 UTC
Michael/Miklos: there's a debate here about whether the result of an export filter should be valid even if the input is wrong. I thought you might have an opinion as experts in filters (though perhaps not in the XHTML export filter, which is written in XSL).

Also, will we keep the XSL filters or will they be converted to Python? I think I read about this, but I don't remember where or when.
Comment 18 David Tardon 2015-05-13 13:07:28 UTC
I am in favor of removing the XHTML export filter entirely. Seriously, nobody is going to touch that steaming pile of... XSLT. Actually, thinking about it, I am in favor of removing all the XSLT export filters, except maybe the MediaWiki one, for which we have received some improvements recently. Let's just admit that XSLT has never been the right tool for a complex document format conversion...
Comment 21 Commit Notification 2015-05-14 16:11:35 UTC
Robert Antoni Buj Gelonch committed a patch related to this issue.
It has been pushed to "libreoffice-4-4":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=445314e0bc5235b8a0c68348cd6ceed5517fc079&h=libreoffice-4-4

odf2xhtml: tdf#66999 character '–' is not allowed in the value of attribute 'id'

It will be available in 4.4.4.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
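
As background for the commit message above: the en dash (U+2013) is not a valid XML name character, so it cannot appear in an 'id' value. The following Python sketch shows one conservative way to clean such values. It is an illustration only and not the committed patch, which changes the XSLT filter; the helper `sanitize_id` and its ASCII-only whitelist are assumptions made for this example.

```python
import re

# Conservative ASCII subset of the characters allowed in an XML name.
_INVALID_ID_CHARS = re.compile(r"[^A-Za-z0-9_.-]")


def sanitize_id(raw):
    """Replace disallowed characters and ensure a valid first character."""
    cleaned = _INVALID_ID_CHARS.sub("_", raw)
    # An id must not be empty and must start with a letter or underscore.
    if not cleaned or not re.match(r"[A-Za-z_]", cleaned):
        cleaned = "id_" + cleaned
    return cleaned


print(sanitize_id("a1\u2013Introduction"))  # a1_Introduction
print(sanitize_id("1. Scope"))              # id_1._Scope
```

Different inputs can map to the same cleaned value, so a real implementation would also need to keep the resulting ids unique within the document.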
Comment 22 Commit Notification 2015-05-14 16:11:39 UTC
Robert Antoni Buj Gelonch committed a patch related to this issue.
It has been pushed to "libreoffice-4-4":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=04746c585a50f30f96a3b2cb7ce9dc3f1fdbd6bd&h=libreoffice-4-4

odf2xhtml: tdf#66999 there is no attribute 'name'

It will be available in 4.4.4.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.