Bug 146264 - Export to XHTML duplicates text in list item
Summary: Export to XHTML duplicates text in list item
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: filters and storage
Version (earliest affected): 4.4 all versions
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard: target:7.4.0 target:7.6.0
Keywords:
Duplicates: 146263
Depends on:
Blocks: (X)HTML-Export
 
Reported: 2021-12-16 13:16 UTC by How can I remove my account?
Modified: 2023-05-08 07:43 UTC

See Also:
Crash report or crash signature:


Attachments
Trivial sample document. (16.91 KB, application/vnd.oasis.opendocument.text)
2021-12-16 13:17 UTC, How can I remove my account?
Description How can I remove my account? 2021-12-16 13:16:19 UTC
Description:
Open the attached document. Export to XHTML. View the resulting .html file in a browser.

Steps to Reproduce:
See above.

Actual Results:
Duplicated "Hello" text. One to the right of the image, one below.

Expected Results:
Just one "Hello", below the image, as in the .odt.


Reproducible: Always


User Profile Reset: No



Additional Info:
.
Comment 1 How can I remove my account? 2021-12-16 13:16:43 UTC
Code pointer: filter/source/xslt/odf2xhtml/export/xhtml/body.xsl
Comment 2 How can I remove my account? 2021-12-16 13:17:57 UTC
Created attachment 176965
Trivial sample document.
Comment 3 How can I remove my account? 2021-12-16 13:18:42 UTC
@Svante, can you perhaps immediately say what needs to be done in the XSLT to fix this?
Comment 4 V Stuart Foote 2021-12-16 13:31:55 UTC
*** Bug 146263 has been marked as a duplicate of this bug. ***
Comment 5 V Stuart Foote 2021-12-16 13:40:36 UTC
confirmed on Windows builds
Version: 7.2.4.1 (x64) / LibreOffice Community
Build ID: 27d75539669ac387bb498e35313b970b7fe9c4f9
CPU threads: 8; OS: Windows 10.0 Build 19044; UI render: Skia/Vulkan; VCL: win
Locale: en-US (en_US); UI: en-US
Calc: threaded

For the exported XHTML the "Hello" in the UL is duplicated.
Comment 6 Xisco Faulí 2021-12-16 16:12:55 UTC
also reproduced in

Version: 5.4.0.0.alpha1+
Build ID: 9feb7f7039a3b59974cbf266922177e961a52dd1
CPU threads: 4; OS: Linux 5.10; UI render: default; VCL: gtk3; 
Locale: en-US (en_US.UTF-8); Calc: group
Comment 7 Xisco Faulí 2021-12-16 16:14:12 UTC
and

Version: 4.3.0.0.alpha1+
Build ID: c15927f20d4727c3b8de68497b6949e72f9e6e9e
Comment 8 Svante Schubert 2021-12-22 18:16:46 UTC
Hej Tor, 

I am about to power down for the holiday season. I fixed something similar years ago, but cannot remember the details.

The input content.xml of your bug.odt shows:

ODF INPUT:
<text:list xml:id="list1078085969" text:style-name="L1">
    <text:list-item>
        <text:p text:style-name="P1">
            <draw:frame draw:style-name="fr1" draw:name="Image1" text:anchor-type="paragraph" svg:width="3.528cm" svg:height="3.528cm" draw:z-index="0">
                <draw:image xlink:href="Pictures/100000010000006400000064A2FB08F214CB5BE3.png" xlink:type="simple" xlink:show="embed" xlink:actuate="onLoad" draw:mime-type="image/png"/>
            </draw:frame>
            <text:span text:style-name="T1">Hello</text:span>
        </text:p>
    </text:list-item>
</text:list>

HTML OUTPUT:
  <ul>
    <li>
      <div class="P1" style="margin-left:0cm;"><span class="Bullet_20_Symbols"
          style="display:block;float:left;min-width:0.635cm;">•</span>
        <!--Next 'div' is emulating the top height of a draw:frame.-->
        <!--Next '
            div' is a draw:frame.
        -->
        <div style="height:3.528cm;width:3.528cm; padding:0;  float:left; position:relative; left:0cm; " class="fr1"
          id="Image1"><img style="height:3.528cm;width:3.528cm;" alt="" src="data:image/png;base64, <some-base64-data>" />
        </div>
        <!--Next 'div' added for floating.-->
        <div style="display:inline; position:relative; left:0cm;">Hello</div>
        <div xmlns:loext="urn:org:documentfoundation:names:experimental:office:xmlns:loext:1.0"
          style="clear:both; line-height:0; width:0; height:0; margin:0; padding:0;"> </div>Hello<span
          xmlns:loext="urn:org:documentfoundation:names:experimental:office:xmlns:loext:1.0" class="odfLiEnd" /> 
      </div>
    </li>
  </ul>

The ODF rendering shows the image first and, after it, the list bullet with the "Hello" text.
As the comments in the HTML output suggest, the error is likely in xhtml/body.xsl.
Obviously, the span content is being matched twice by the XSLT.
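
The duplication can be checked mechanically. The following is a minimal sketch, not part of the LibreOffice test suite: it parses a simplified stand-in for the exported list item (styles stripped, and the base64 image replaced by a hypothetical "image.png") and counts the text nodes equal to "Hello". The function name count_text is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for the exported list item shown above.
# Styles are stripped and the image source is a hypothetical placeholder.
EXPORTED = """
<ul>
  <li>
    <div class="P1">
      <span class="Bullet_20_Symbols">•</span>
      <div class="fr1" id="Image1"><img alt="" src="image.png" /></div>
      <div style="display:inline;">Hello</div>
      <div style="clear:both;"> </div>Hello<span class="odfLiEnd" />
    </div>
  </li>
</ul>
"""


def count_text(xml_source: str, needle: str) -> int:
    """Count text nodes (element text or tail) that equal `needle`."""
    root = ET.fromstring(xml_source)
    hits = 0
    for element in root.iter():
        for chunk in (element.text, element.tail):
            if chunk is not None and chunk.strip() == needle:
                hits += 1
    return hits


hello_count = count_text(EXPORTED, "Hello")
print(hello_count)
```

A correct export would yield a count of 1 for this document; the output above yields one hit inside the inline div and a second as trailing text after the clearing div.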

I have not worked with XSLT for a while and am quite busy with other tasks, so I guess I have to pass on this one. Have a nice holiday season...
Comment 9 How can I remove my account? 2021-12-29 14:10:32 UTC
Suggested patch in https://gerrit.libreoffice.org/c/core/+/127683
Comment 10 How can I remove my account? 2021-12-29 16:35:20 UTC
Hmm, Svante now sent me a slightly improved version of the body.xsl that seems to fix the problem, too, in a cleaner fashion, and he presumably actually understands how it works. Will resubmit the patch with that instead.

Sadly, though, that patch is based on https://github.com/oasis-tcs/odf-tc/blob/master/src/test/resources/odf1.3/tools/odf2html/export/xhtml/body.xsl . Thus it lacks some (small) cleanups that have been done to the body.xsl in the LibreOffice sources. I had no idea that our XSLT for this filter apparently is just a copy of some more authoritative (?) XSLT from OASIS... Perhaps we then shouldn't have those files in git at all, but download them from upstream? No idea, and I don't really want to know.
Comment 11 Svante Schubert 2021-12-29 21:26:13 UTC
Michael usually syncs these files: whenever we submit changes on the OASIS side, he moves the parts to the LO archive, and vice versa. I would assume the LO repo is the more authoritative repo - as long as we keep it synced... :-) I wanted to test the files with the spec and wrap my head around it more, but holiday time and family interrupted my aim, so fingers crossed we did not break something - I just saw the pattern that your new parameter was quite parallel to the existing one... Hopefully, one of us finds some quiet minutes to think this through... ;-)
Comment 12 How can I remove my account? 2021-12-30 08:48:41 UTC
But do we want to have the authoritative copy in LO, if the fact is that the only person who understands it works on the OASIS side? Isn't that counter-productive?
Comment 13 Svante Schubert 2021-12-30 10:31:50 UTC
I spent this morning looking into this - but I fear there are regressions in the ODF 1.3 part 4 formula spec: in chapter 4.4 some content is missing:
https://docs.oasis-open.org/office/OpenDocument/v1.3/os/part4-formula/OpenDocument-v1.3-os-part4-formula.html#__RefHeading__1017896_715980110
I have tested the XSLT "directly" with Saxon out of the box, using a Maven build environment and pom.xml. Otherwise, with LO, there would be noise from automatic style names changing during load/save of the specs between tests. My test transformation can be triggered stand-alone via mvn install (activating all test documents in the pom.xml - see https://github.com/svanteschubert/odf-tc/blob/html-floating-fix/pom.xml). All test files and the indented HTML output can be found in my fork: https://github.com/svanteschubert/odf-tc/tree/html-floating-fix/docs/odf1.3/tmp-test-output

I can also add some background to the meaning of the XSLT code part - what it does:

1. An ODF paragraph with a draw:frame as a child becomes an HTML div, not an HTML p as usual, so that the HTML stays valid.
2. A draw:frame with other elements on the same level (siblings that are more than ODF soft page breaks) becomes a "floating" div (CSS float left), embracing its following siblings within the new div.
This continues until the next following sibling is a draw:frame, which then becomes a left-floating div that in turn embraces its own following siblings. This handling is tricky.
3. The variable we are extending, "stopAtFirstFrame", marks a mode that deals with the content before the first draw:frame.

The problem that you correctly fixed earlier, Tor - likely only partly - and that I now understand better, is that the complex CSS floating routine for draw:frame described above - template mode="frameFloating" - is triggered from more than one spot.
Aside from <xsl:template match="text:p | draw:page">, the mode="frameFloating" is also entered from draw:frame, which is why you added the parameter, and also from the apply-templates in the list routine, where the duplication is triggered.
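
To illustrate the mechanism, here is a toy model in Python, not the actual XSLT. It assumes a paragraph is a flat list of children, that the frame handler emits a frame plus its following siblings, and that the caller either keeps iterating (the duplication) or stops at the first frame. All function names are hypothetical; only the idea of a "stop at first frame" flag comes from the thread.

```python
from typing import List

# Toy stand-in for the children of one ODF paragraph: a frame, then text.
PARAGRAPH = ["frame:Image1", "text:Hello"]


def is_frame(node: str) -> bool:
    return node.startswith("frame:")


def render_frame_floating(children: List[str], index: int) -> List[str]:
    """Emit the frame at `index`, wrap its following non-frame siblings in
    one div, then hand over to the next frame (if any)."""
    out = [f"<div float>{children[index]}</div>"]
    following = []
    next_index = index + 1
    while next_index < len(children) and not is_frame(children[next_index]):
        following.append(children[next_index])
        next_index += 1
    if following:
        out.append("<div inline>" + "".join(following) + "</div>")
    if next_index < len(children):
        out.extend(render_frame_floating(children, next_index))
    return out


def render_paragraph(children: List[str], stop_at_first_frame: bool) -> List[str]:
    """Render a paragraph. Without the stop flag, siblings after a frame are
    emitted by the frame handler and then again by this loop."""
    out = []
    for index, node in enumerate(children):
        if is_frame(node):
            out.extend(render_frame_floating(children, index))
            if stop_at_first_frame:
                break  # the frame handler already covered the rest
        else:
            out.append(node)
    return out


buggy = render_paragraph(PARAGRAPH, stop_at_first_frame=False)
fixed = render_paragraph(PARAGRAPH, stop_at_first_frame=True)
print("".join(buggy))
print("".join(fixed))
```

In this model the caller handles only the content before the first frame once the flag is set, and the frame handler owns everything after it, which is one way to read the description of "stopAtFirstFrame" above.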

Sorry, I have to leave this issue now: my time-boxed morning is over and I have my own windmills I need to ride against. Please make sure you test the specification before/after for regressions (best after indenting the XML, e.g. I manually used the jEdit editor with the XML plugin).
Good hunting, Tor!
Svante
Comment 14 Commit Notification 2022-01-03 17:06:46 UTC
Tor Lillqvist committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/11d2a2f5d260bb27d0e67f90579ca761cb2250ea

tdf#146264: Add a somewhat questionable hack to fix the issue

It will be available in 7.4.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 15 How can I remove my account? 2022-01-04 13:42:29 UTC
Will add a unit test for this bug fix, too. https://gerrit.libreoffice.org/c/core/+/127935
Comment 16 Commit Notification 2022-01-04 16:09:11 UTC
Tor Lillqvist committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/6bfeb2290c585e0e5fe982dde6ac57e4afca2e2f

tdf#146264: Add unit test

It will be available in 7.4.0.

Comment 17 Svante Schubert 2023-05-05 12:37:04 UTC
I have an update on this issue as Michael and I have worked on the XSLT filter for the ODF spec (being the same as the one for LO) but using some JavaScript for MathML by default.

We just finished a pull-request:
https://github.com/oasis-tcs/odf-tc/pull/47

Same fix I did for https://bugs.documentfoundation.org/show_bug.cgi?id=154989

There was an erroneous recursion in the XSLT, which I removed, making the earlier fix (the 'ugly hack' mentioned earlier) obsolete - so I removed it again.
I have now added some test files to the ODF TC git repo and will collect further ones, which Michael will add to the LO regression tests as well.


The major enhancement is that the alignment of images/frames is done by CSS position.
It all depends on the ODF attribute @text:anchor-type:
1) if the anchor is at the character ('as-char'), CSS position:static (the default) is used
2) if the anchor exists but is not 'as-char', CSS position:relative with float:left is used
3) if the anchor does not exist, CSS position:absolute will position the frame relative to its parent (mostly the body/page). The latter case fixed https://bugs.documentfoundation.org/show_bug.cgi?id=154989
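
The three cases can be summarised as a small lookup. This is a sketch of the rule as stated in this comment, written in Python for brevity; it is not the XSLT from the pull request, and the function name css_for_anchor is hypothetical.

```python
from typing import Optional


def css_for_anchor(anchor_type: Optional[str]) -> str:
    """Map a frame's text:anchor-type value to the CSS positioning listed
    above. None means the attribute is absent."""
    if anchor_type is None:
        # Case 3: no anchor, position relative to the parent (body/page).
        return "position:absolute;"
    if anchor_type == "as-char":
        # Case 1: anchored as a character, normal inline flow.
        return "position:static;"
    # Case 2: any other anchor type, e.g. "paragraph".
    return "position:relative; float:left;"


for value in ("as-char", "paragraph", None):
    print(value, "->", css_for_anchor(value))
```

The sample document in this bug uses text:anchor-type="paragraph", so it falls under case 2.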

In addition, I added a pageHeight and a background color for ODF graphics (@draw:fill-color), and fixed obvious typos/bugs in the XSLT flow:
https://github.com/oasis-tcs/odf-tc/pull/47/commits/f93dd81a5c6ba8f06a67e98f9f3dc4fd79ccab0c
(Note that in this commit I have not yet omitted the 'ugly hack' but renamed it; I removed it later and saw that it makes no difference.)
Comment 18 Svante Schubert 2023-05-05 12:47:46 UTC
Michael will take this back to the LO sources. Thank you for that, Michael!

PS: I forgot to mention that the doubled bullet was due to the existent @style:num-suffix, which is not rendered by LO.
Michael and I decided to remove both @style:num-prefix and @style:num-suffix for the HTML rendering of bullets!

PPS: Someone might be able to fix the HTML layout completely, matching the way LO renders it, by taking into account the image property @style:vertical-pos="top" found in the parent Graphics style in styles.xml.

Due to style:vertical-pos="top" the image has to be shown at the top of the paragraph, where the first list item is equal to the paragraph in LO.
For this reason, the image comes before the list, and the label of the list item should be held back for images with such an attribute.

The other values of this attribute should also be covered by test documents:
https://tdf.github.io/odftoolkit/odf1.3/OpenDocument-v1.3-reference.html#attribute_style:vertical-pos_1
"below"  "bottom"  "from-top"  "middle"  "top"

Also the test documents have to take into account that there might be multiple paragraphs ahead and/or after the image.

This might become a follow-up issue for someone...

I have already invested too much of my spare time in this, triggered by a hackfest from Thorsten Behrens and Michael Stahl's suggestion to work on these issues. Nice trick, Michael! ;-)
Comment 19 Commit Notification 2023-05-08 07:43:17 UTC
Svante Schubert committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/5178ade8a12cc52c02cd6288932e5a85dfbaea1b

XHTML export: Removing bullet suffix, which is not viewed in LO - see tdf146264

It will be available in 7.6.0.
