Bug 154989 - Text duplicated in XHTML export
Summary: Text duplicated in XHTML export
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Draw
Version (earliest affected): 7.1.5.2 release
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Svante Schubert
URL:
Whiteboard: target:7.6.0 target:7.5.4
Keywords: bibisected, bisected, regression
Depends on:
Blocks: (X)HTML-Export
Reported: 2023-04-24 15:21 UTC by Stéphane Guillou (stragu)
Modified: 2023-07-11 15:09 UTC (History)

See Also:
Crash report or crash signature:


Attachments
test ODG (68.01 KB, application/vnd.oasis.opendocument.graphics)
2023-04-24 15:21 UTC, Stéphane Guillou (stragu)
resulting HTML (66.36 KB, text/html)
2023-04-24 15:22 UTC, Stéphane Guillou (stragu)
New result, after adjusting filter for this test case (5.44 KB, text/html)
2023-04-26 19:26 UTC, Svante Schubert
comparison: ODG in LO; export to XHTML by 7.5.3; export to XHTML by 7.6 alpha0+ (302.64 KB, image/png)
2023-05-11 13:54 UTC, Stéphane Guillou (stragu)

Description Stéphane Guillou (stragu) 2023-04-24 15:21:28 UTC
Created attachment 186888 [details]
test ODG

Contents of text boxes are multiplied in XHTML export.

Steps to reproduce:
1. Open attachment
2. File > Export... > XHTML

Can also be done with command:

soffice --headless --convert-to "html:XHTML Draw File:UTF8" testfile.odg

Result: resulting HTML file has several copies of the contents of each text box.
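
To check an exported file for this symptom without reading it by eye, a minimal Python sketch along these lines may help. It only counts text chunks that occur more than once in the markup; the names `TextCollector` and `repeated_text` are hypothetical helpers, not part of LibreOffice, and text that is legitimately repeated in the drawing would also be reported.

```python
from collections import Counter
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect non-empty text chunks, ignoring <style> and <script> content."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("style", "script"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("style", "script") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Normalise whitespace so line wrapping does not hide duplicates.
        text = " ".join(data.split())
        if text and not self._skip_depth:
            self.chunks.append(text)


def repeated_text(html, min_count=2):
    """Return {text: count} for every text chunk seen at least min_count times."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    counts = Counter(parser.chunks)
    return {text: n for text, n in counts.items() if n >= min_count}
```

Assuming the conversion wrote its output to `testfile.html` (an assumption about the output name), the check could be run as `repeated_text(open("testfile.html", encoding="utf-8").read())`; an empty dictionary means no text chunk was repeated.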

Bibisected with linux-64-7.1 repo to first bad commit 312432afdcb4032232a4fa5729851b4f3d473125 which points to core commit 932be9b55ce8b996184e724127925c436130cecd which is a cherry-pick of:

commit f680b6d74209fd78c547201b2f14c6547e55c81b
author	Svante Schubert <svante.schubert@gmail.com>	Wed Sep 09 15:27:54 2020 +0200
committer	Michael Stahl <michael.stahl@allotropia.de>	Tue Mar 02 13:24:15 2021 +0100
HTML XSLT: Adding missing MathML siblings. The floating draw:frame sibling content being text were not shown, nor further occuring draw:frame (other MathML)
Reviewed-on: https://gerrit.libreoffice.org/c/core/+/111620

Svante and Michael, can you please have a look?
Comment 1 Stéphane Guillou (stragu) 2023-04-24 15:22:41 UTC
Created attachment 186889 [details]
resulting HTML
Comment 2 Svante Schubert 2023-04-26 10:46:21 UTC
I can reproduce this issue and will take a closer look at it, though with low priority in my work queue.
Comment 3 Svante Schubert 2023-04-26 19:26:16 UTC
Created attachment 186944 [details]
New result, after adjusting filter for this test case

Initial feedback: I found the erroneous recursion, which is used for emulating floating images (and their floating siblings).

I will need to create a test case for the floating behaviour to make sure that the previous functionality can coexist with the fix for this test case.
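
As a rough illustration of how a recursion over following siblings can multiply output, here is a toy Python model. It is not the actual XSLT from the filter: the function names are hypothetical, and the assumption that every frame both matches on its own and walks its following siblings is a guess made only for illustration.

```python
def render_buggy(frames):
    """Toy model: every frame emits itself, then recurses into its next sibling."""
    out = []

    def emit_with_siblings(i):
        out.append(frames[i])
        if i + 1 < len(frames):
            emit_with_siblings(i + 1)

    # Assumed flaw: each frame is also matched independently, so every frame
    # starts its own sibling chain and later frames are emitted repeatedly.
    for i in range(len(frames)):
        emit_with_siblings(i)
    return out


def render_fixed(frames):
    """Toy model: the sibling chain is started once, at the first frame only."""
    out = []

    def emit_with_siblings(i):
        out.append(frames[i])
        if i + 1 < len(frames):
            emit_with_siblings(i + 1)

    if frames:
        emit_with_siblings(0)
    return out


print(render_buggy(["A", "B", "C"]))  # ['A', 'B', 'C', 'B', 'C', 'C']
print(render_fixed(["A", "B", "C"]))  # ['A', 'B', 'C']
```

In this model the n-th frame is emitted n times, which resembles the reported symptom of several copies of each text box.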
Comment 4 Svante Schubert 2023-04-26 19:30:18 UTC
I have made several enhancements in the OASIS ODF TC GitHub repository, where we sync the XSLT filter with the LO version to create the HTML version of the ODF specification.

All current changes can be found at:

https://github.com/oasis-tcs/odf-tc/commit/2a76ccd24030e16ad284349ca75187e1a96f38e0

This will obviously break the floating functionality, but it is a good start: the document now looks very similar to the input, with colors, positions, etc. added! :-)

Thanks to Thorsten Behrens, who organized a hackfest in Hamburg and invited me and Michael Stahl. Michael brought this issue up and will (hopefully ;-) ) assist in merging the existing updates in the ODF-TC repo and with automated regression tests. This motivated me to fix this issue right away, even if there was no real priority.
Comment 5 Stéphane Guillou (stragu) 2023-04-28 21:16:57 UTC
Excellent, thank you for working on it and looking forward to seeing the improvements merged :)
Comment 6 Svante Schubert 2023-05-05 12:56:52 UTC
With Michael's help (he merged the earlier LO XSLT including fixes, see https://github.com/oasis-tcs/odf-tc/pull/46 ), I finished my part of the work in https://github.com/oasis-tcs/odf-tc/pull/47

Michael is now taking the work back to the LO repo and updating the regression tests to match the fixes we made (to avoid tests failing because of our fixes, i.e. false positives) ;-)

\o/
Comment 7 Commit Notification 2023-05-08 07:43:14 UTC
Svante Schubert committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/e857b12dada1468cb3bdb49ed5ea636df0b6d153

tdf#154989 filter: XHTML export: avoid duplicated frames

It will be available in 7.6.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 8 Commit Notification 2023-05-08 15:22:26 UTC
Svante Schubert committed a patch related to this issue.
It has been pushed to "libreoffice-7-5":

https://git.libreoffice.org/core/commit/cb27c6c1b82272e8812bcb446e7179cc4f32bf34

tdf#154989 filter: XHTML export: avoid duplicated frames

It will be available in 7.5.4.
Comment 9 Stéphane Guillou (stragu) 2023-05-11 13:51:52 UTC
Thank you both. Fabulous improvements in similarity to the original and in the structure of the source, verified in:

Version: 7.6.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 88bd66d258de5fee3d35aba80c61fec49eb2a969
CPU threads: 8; OS: Linux 5.15; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
Calc: threaded
Comment 10 Stéphane Guillou (stragu) 2023-05-11 13:54:05 UTC
Created attachment 187199 [details]
comparison: ODG in LO; export to XHTML by 7.5.3; export to XHTML by 7.6 alpha0+

Comparison of exports with example ODG.

Version: 7.6.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 88bd66d258de5fee3d35aba80c61fec49eb2a969
CPU threads: 8; OS: Linux 5.15; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
Calc: threaded