I converted the attached DOCX file through UNOCONV to XHTML. The generated XHTML file contains an invalid <H> tag.
Created attachment 82544 [details] DOCX
Created attachment 82545 [details] XHTML conversion result
*** Bug 66998 has been marked as a duplicate of this bug. ***
On Debian x86-64, with master sources updated yesterday, I could reproduce the problem. Setting the status to NEW and the platform to All.
** Please read this message in its entirety before responding **

To make sure we're focusing on the bugs that affect our users today, LibreOffice QA is asking bug reporters and confirmers to retest open, confirmed bugs which have not been touched for over a year. There have been thousands of bug fixes and commits since anyone checked on this bug report. During that time, it's possible that the bug has been fixed, or the details of the problem have changed. We'd really appreciate your help in getting confirmation that the bug is still present.

If you have time, please do the following:
* Test to see if the bug is still present on a currently supported version of LibreOffice (4.4.1 or later): https://www.libreoffice.org/download/
* If the bug is present, please leave a comment that includes the version of LibreOffice and your operating system, and any changes you see in the bug behavior
* If the bug is NOT present, please set the bug's Status field to RESOLVED-WORKSFORME and leave a short comment that includes your version of LibreOffice and operating system

Please DO NOT:
* Update the version field
* Reply via email (please reply directly on the bug tracker)
* Set the bug's Status field to RESOLVED - FIXED (this status has a particular meaning that is not appropriate in this case)

If you want to do more to help, you can test to see if your issue is a REGRESSION. To do so:
1. Download and install the oldest version of LibreOffice (usually 3.3 unless your bug pertains to a feature added after 3.3): http://downloadarchive.documentfoundation.org/libreoffice/old/
2. Test your bug
3. Leave a comment with your results.
4a. If the bug was present with 3.3 - set version to "inherited from OOo";
4b. If the bug was not present in 3.3 - add "regression" to keyword

Feel free to come ask questions or to say hello in our QA chat: http://webchat.freenode.net/?channels=libreoffice-qa

Thank you for your help!

-- The LibreOffice QA Team

This NEW Message was generated on: 2015-04-01
Exporting from LibO GUI and validating with http://validator.w3.org/ I get 46 Errors. attachment 82545 [details] gives 46 Errors, 1 warning(s) so I guess the problem persists. Win 7 Pro 64-bit Version: 5.0.0.0.alpha1+ (x64) Build ID: f3375fa07f27bd2ade519af3c07d69040d10eaa9 TinderBox: Win-x86_64@42, Branch:master, Time: 2015-04-22_23:38:50 Locale: fi_FI
Robert: having noticed your XSL work on the MediaWiki export, I thought you might be interested in this one. See http://opengrok.libreoffice.org/xref/core/filter/source/xslt/odf2xhtml/export/xhtml/ for a code pointer. If I'm wrong, don't hesitate to "uncc" yourself of course, and sorry for the noise.
OK, a heading can't be directly within a list in XHTML, and I think that LibreOffice's Outline Numbering can be converted to CSS nested counters: http://www.w3.org/TR/CSS2/generate.html#scope Before I do anything, I'll read the code carefully.
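To illustrate the numbering that nested counters would be expected to produce for an outline, here is a rough Python model. It is not CSS and not part of the export filter; the function name `outline_numbers` and the idea of feeding it a flat list of heading levels are assumptions made only for this sketch.

```python
def outline_numbers(levels):
    """Return dotted outline numbers for a sequence of 1-based heading levels.

    Hypothetical helper: models one counter per nesting depth, where
    leaving a deeper level discards that level's counter.
    """
    counters = []
    numbers = []
    for level in levels:
        if level < 1:
            raise ValueError("heading levels start at 1")
        # Going deeper opens new counters, each starting at 0.
        while len(counters) < level:
            counters.append(0)
        # Coming back up drops the counters of the deeper levels.
        del counters[level:]
        # Increment the counter for the current level.
        counters[level - 1] += 1
        numbers.append(".".join(str(c) for c in counters))
    return numbers


if __name__ == "__main__":
    for number in outline_numbers([1, 2, 2, 1, 2]):
        print(number)
```

In a stylesheet the same effect would come from resetting a counter when a nesting level opens and incrementing it at each heading, which is why the headings themselves would no longer need to sit inside list elements.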
There is no issue with the export filter. The document is badly formatted because the headings are inside a numbered list; the user probably applied Outline Numbering incompletely. This can be fixed by cleaning up the formatting: remove the numbering, then select the desired style (Heading 1, Heading 2, ...) for each heading (Outline Numbering is preserved).
So you are saying that improper input is an excuse for generating invalid output?
Yes, the incorrect Outline Numbering is the reason for the invalid output, because headings can't be within a numbered list.
With respect, your argument is complete nonsense. An export filter must never generate improper output, whatever the error situation, whether the error arises during processing or in the input. Once again: you are generating improper HTML - this *is* a bug.
Created attachment 115557 [details] Outline numbering has been fixed in test case document. There are still two errors in XML validation.
Created attachment 115558 [details] Outline numbering has been fixed in test case document: exporting output
This is an argument? Am I talking to a wall here? Do you have a basic understanding of quality assurance for something like an output filter? An output filter is supposed to produce something that complies with the specs of the format it is supposed to generate. Is <H> a valid HTML tag? Instead of swallowing errors or performing stupid error handling, either raise an error message or deal with such an error situation properly. Generating trash output is not correct behavior for an output filter.
Created attachment 115559 [details] Example of CSS counters in Outline Numbering. The W3C validator reported that a heading was not permitted content in a list element (OL/UL) in the initial document, plus two more errors. From my point of view, the incorrect usage of the headings is a user mistake, not an issue in the export filter. For now, I'm focused on these two errors. After that, I think we can reduce the complexity of Outline Numbering by using CSS counters, but that's an enhancement.
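The validator finding about headings inside OL/UL can be reproduced locally with a small check. The following is only a sketch using Python's standard `xml.etree.ElementTree`; the function name `headings_directly_in_lists` is hypothetical, and it assumes the exported file is well-formed XML without named entities such as `&nbsp;`. It reports headings that are direct children of `ol` or `ul`, whose XHTML 1.0 content model allows only `li`.

```python
import xml.etree.ElementTree as ET

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
LISTS = {"ol", "ul"}


def local_name(tag):
    """Strip an '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def headings_directly_in_lists(xhtml_text):
    """Return (list_tag, heading_tag) pairs for headings placed directly in ol/ul.

    Hypothetical helper for illustration; it is not a full validator.
    """
    root = ET.fromstring(xhtml_text)
    found = []
    for parent in root.iter():
        if local_name(parent.tag) not in LISTS:
            continue
        for child in parent:
            if local_name(child.tag) in HEADINGS:
                found.append((local_name(parent.tag), local_name(child.tag)))
    return found


if __name__ == "__main__":
    sample = (
        '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
        "<ol><h1>Title</h1><li>item</li></ol>"
        "</body></html>"
    )
    print(headings_directly_in_lists(sample))
```

A heading wrapped in an `li` is not flagged by this sketch, since the content model of `li` permits block-level elements.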
Michael/Miklos: there's a debate here about whether the result of an export filter should be valid even if the input is wrong. I thought you might have an opinion as filter experts (though perhaps not on the XHTML export filter written in XSL). Also, will we keep the XSL filters, or will they be converted to Python? I think I read about this, but I don't remember where or when.
I am in favor of removing the XHTML export filter entirely. Seriously, nobody is going to touch that steaming pile of... XSLT. Actually, thinking about it, I am in favor of removing all the XSLT export filters, except maybe the MediaWiki one, for which we have received some improvements recently. Let's just admit that XSLT has never been the right tool for a complex document format conversion...
https://gerrit.libreoffice.org/gitweb?p=core.git;a=commitdiff;h=ebe4eb9d0ab2759d7631dd2a967a7031da7e4c5e
… and https://gerrit.libreoffice.org/gitweb?p=core.git;a=commitdiff;h=070f0abd4413102cb442bc5d3d39c1c530e8d4c6
Robert Antoni Buj Gelonch committed a patch related to this issue. It has been pushed to "libreoffice-4-4": http://cgit.freedesktop.org/libreoffice/core/commit/?id=445314e0bc5235b8a0c68348cd6ceed5517fc079&h=libreoffice-4-4 odf2xhtml: tdf#66999 character '–' is not allowed in the value of attribute 'id' It will be available in 4.4.4. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
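The commit message above concerns an en dash ending up in an `id` attribute value. As a rough illustration of that kind of cleanup, and not a rendering of the actual XSLT patch, the sketch below maps arbitrary text to a conservative ASCII-only identifier. The function name `sanitize_id`, the `id_` prefix, and the choice to replace each disallowed character with an underscore are all assumptions made for this example.

```python
import re

# Conservative ASCII subset: letters, digits, '_', '.', ':' and '-'.
_DISALLOWED = re.compile(r"[^A-Za-z0-9_.:-]")


def sanitize_id(raw, prefix="id_"):
    """Turn arbitrary text into a conservative value for an id attribute.

    Hypothetical helper: replaces every character outside the ASCII
    subset above with '_' and makes sure the result starts with a letter.
    """
    cleaned = _DISALLOWED.sub("_", raw)
    # After substitution only ASCII remains, so isalpha() means A-Z or a-z.
    if not cleaned or not cleaned[0].isalpha():
        cleaned = prefix + cleaned
    return cleaned


if __name__ == "__main__":
    # "\u2013" is the en dash named in the commit message.
    print(sanitize_id("Heading\u20131"))
    print(sanitize_id("1 Introduction"))
```

Two different inputs can collapse to the same result under this scheme, so a real implementation would also need some way to keep the generated ids unique within a document.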
Robert Antoni Buj Gelonch committed a patch related to this issue. It has been pushed to "libreoffice-4-4": http://cgit.freedesktop.org/libreoffice/core/commit/?id=04746c585a50f30f96a3b2cb7ce9dc3f1fdbd6bd&h=libreoffice-4-4 odf2xhtml: tdf#66999 there is no attribute 'name' It will be available in 4.4.4. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.