Bug 126716 - Export/Save as HTML produce different result from DOCX numbered list
Summary: Export/Save as HTML produce different result from DOCX numbered list
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version (earliest affected): 6.2.5.1 rc
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium minor
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: filter:docx, filter:html
Depends on:
Blocks: (X)HTML-Export
 
Reported: 2019-08-05 23:47 UTC by echan00
Modified: 2023-12-21 03:12 UTC (History)

See Also:
Crash report or crash signature:


Attachments
Sample .DOCX file to show bug with conversion to HTML (12.86 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2019-08-05 23:48 UTC, echan00

Description echan00 2019-08-05 23:47:36 UTC
Description:
https://gofile.io/?c=jwgcwg

The attached file converts to HTML correctly with LibreOffice 6.2.5.1 on macOS, but the same version (6.2.5.1) on Linux produces different HTML in which extra bullets are displayed.

Steps to Reproduce:
1. Download the .DOCX file from https://gofile.io/?c=jwgcwg
2. Convert the .DOCX file to HTML using the Linux build of LibreOffice
3. Convert the .DOCX file to HTML using the Mac build of LibreOffice

Actual Results:
Compare the two HTML files. The Linux version has bullets in front of the text (incorrect) and the Mac version does not (correct).

Expected Results:
Both resulting HTML files should NOT have any bullets in front of the text. It should look just like when the .DOCX file is opened.


Reproducible: Always


User Profile Reset: Yes



Additional Info:
Comment 1 echan00 2019-08-05 23:48:21 UTC
Created attachment 153150
Sample .DOCX file to show bug with conversion to HTML
Comment 2 echan00 2019-08-06 00:01:10 UTC
Upon further inspection:

Converting the attached file from DOCX to HTML using command line (e.g. soffice --headless --convert-to html problem.docx) results in a different HTML than exporting the file from the application GUI.
Comment 3 echan00 2019-08-06 01:18:24 UTC
It appears the application export function is more accurate than the command line conversion function.

In the case of the attachment in this bug, the application export is correct and accurate while the conversion function is incorrect.
Comment 5 Xisco Faulí 2020-02-18 17:09:49 UTC
A new major release of LibreOffice is available since this bug was reported.
Could you please try to reproduce it with the latest version of LibreOffice
from https://www.libreoffice.org/download/libreoffice-fresh/ ?
I have set the bug's status to 'NEEDINFO'. Please change it back to
'UNCONFIRMED' if the bug is still present in the latest version.
Comment 6 echan00 2020-02-19 06:04:36 UTC
The bug still exists in the new version.
Comment 7 Buovjaga 2020-05-10 15:49:03 UTC
What is the origin of this DOCX file? MS Office? Which version?
Comment 8 echan00 2020-05-10 18:54:37 UTC
Any DOCX file.
Comment 9 echan00 2020-05-10 18:55:13 UTC
Yes, MS Office (a version that uses DOCX).
Comment 10 Buovjaga 2020-05-10 19:29:51 UTC
Ok, so here's the deal:
If you unzip the docx file and look at word/document.xml, the paragraphs are defined as a numbered list, using numPr as documented here: https://c-rex.net/projects/samples/ooxml/e1/Part4/OOXML_P4_DOCX_numPr_topic_ID0EBBCM.html
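
The check described above can also be scripted. The following is a minimal sketch, not a LibreOffice tool: it assumes the standard WordprocessingML namespace and that the list membership is stored as w:numPr inside each paragraph's w:pPr. The function name numbered_paragraphs is made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# Standard WordprocessingML namespace used by word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def numbered_paragraphs(docx):
    """Return (text, numId, ilvl) for every paragraph that carries a w:numPr.

    `docx` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    found = []
    for para in root.iter(f"{{{W_NS}}}p"):
        num_pr = para.find("w:pPr/w:numPr", NS)
        if num_pr is None:
            continue  # ordinary paragraph, not part of a numbered list
        num_id = num_pr.find("w:numId", NS)
        ilvl = num_pr.find("w:ilvl", NS)
        # Concatenate all text runs of the paragraph.
        text = "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
        found.append((
            text,
            num_id.get(f"{{{W_NS}}}val") if num_id is not None else None,
            ilvl.get(f"{{{W_NS}}}val") if ilvl is not None else None,
        ))
    return found
```

Calling numbered_paragraphs("problem.docx") lists the paragraphs that the exporters may turn into list items. Whether the numbering is actually visible depends on the numbering definition in word/numbering.xml, which this sketch does not read.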

As the document does not show the numbering in LibreOffice, the result from "Export as xhtml" is more correct: it *does* create an ordered list <ol>, but it has the rule list-style: none; for <li> elements.

Also, the numbering in the XHTML export starts from 1, while the numbering in the document produced by "Save as html" starts from 0 for some reason.
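
To compare the two outputs without opening a browser, the list markup can be pulled out of each file. This is only a sketch: the class and function names are made up, and it looks only at attributes on the list tags themselves. A rule such as list-style: none that lives in a stylesheet block and targets <li> would not show up here and has to be checked in the CSS separately.

```python
from html.parser import HTMLParser


class ListInspector(HTMLParser):
    """Collect the attributes of every <ol>/<ul> and count <li> elements."""

    def __init__(self):
        super().__init__()
        self.lists = []  # one dict per <ol> or <ul> start tag
        self.items = 0   # number of <li> start tags seen

    def handle_starttag(self, tag, attrs):
        if tag in ("ol", "ul"):
            attributes = dict(attrs)
            self.lists.append({
                "tag": tag,
                "start": attributes.get("start"),
                "style": attributes.get("style"),
            })
        elif tag == "li":
            self.items += 1


def inspect_lists(html_text):
    """Return (list descriptions, number of list items) for an HTML string."""
    parser = ListInspector()
    parser.feed(html_text)
    parser.close()
    return parser.lists, parser.items
```

Running inspect_lists on the text of both exported files and comparing the "start" and "style" values should show where the two exports diverge.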

I know it is silly that we have two ways to save html...
Comment 11 echan00 2020-05-11 04:35:15 UTC
Thanks. I'm trying to verify your suggestion. I tried both commands below:

soffice --headless --convert-to html problem.docx

soffice --headless --convert-to xhtml problem.docx

The resulting HTML and XHTML files are identical. Is there a particular way to convert to the XHTML as you suggested?
Comment 12 echan00 2020-05-11 07:48:43 UTC
This seems to be related to this bug: https://bugs.documentfoundation.org/show_bug.cgi?id=67035
Comment 13 Buovjaga 2020-05-11 08:24:12 UTC
(In reply to echan00 from comment #11)
> Thanks. I'm trying to verify your suggestion. I tried both commands below:
> 
> soffice --headless --convert-to html problem.docx
> 
> soffice --headless --convert-to xhtml problem.docx
> 
> The resulting HTML and XHTML files are identical. Is there a particular way
> to convert to the XHTML as you suggested?

File - Export and pick XHTML
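
For a headless run, the export filter can be named explicitly after the target extension in --convert-to. The sketch below only builds the command line. The filter names "XHTML Writer File" and "HTML (StarWriter)" are assumptions that have not been verified in this report, and the helper name convert_command is made up.

```python
import subprocess


def convert_command(source, target_ext, filter_name=None, outdir="."):
    """Build an soffice command line for a headless conversion.

    When filter_name is given, it is appended as "ext:filter" so that the
    export filter is chosen explicitly instead of being inferred from the
    extension.
    """
    target = f"{target_ext}:{filter_name}" if filter_name else target_ext
    return ["soffice", "--headless", "--convert-to", target,
            "--outdir", outdir, source]


def convert(source, target_ext, filter_name=None, outdir="."):
    """Run the conversion; raises CalledProcessError if soffice fails."""
    subprocess.run(convert_command(source, target_ext, filter_name, outdir),
                   check=True)


# Assumed filter names (not confirmed in this report):
#   convert("problem.docx", "xhtml", "XHTML Writer File")
#   convert("problem.docx", "html", "HTML (StarWriter)")
```

If the assumed filter name is accepted, the output of the first call should match what "File - Export" with XHTML produces, which would explain why the two plain commands in comment 11 gave identical files.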
Comment 14 amyekut 2021-02-04 10:40:32 UTC
A similar problem: the font of a numbered list is wrong in the HTML output.
See the new ticket, Bug 140146, for the precise complaint.
Comment 15 Stéphane Guillou (stragu) 2021-06-27 12:22:10 UTC
Reproduced with LO 7.3 alpha0, 7.2 beta1 and 7.0.6 on Ubuntu 18.04, using "File > Save as... > HTML", and opening the resulting .html file in Firefox 89.0.1 or Chromium 91.0.4472.101. Two lines are numbered, starting at 0.

"File > Export > XHTML" results in HTML files without numbering, as expected.

Version: 7.3.0.0.alpha0+ / LibreOffice Community
Build ID: f446a203fa2897bab8ae7686c948a8bf060675c6
CPU threads: 8; OS: Linux 4.15; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
TinderBox: Linux-rpm_deb-x86_64@86-TDF, Branch:master, Time: 2021-06-24_15:16:38
Calc: threaded

Version: 7.2.0.0.beta1 / LibreOffice Community
Build ID: c6974f7afec4cd5195617ae48c6ef9aacfe85ddd
CPU threads: 8; OS: Linux 4.15; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
Calc: threaded

Version: 7.0.6.2
Build ID: 144abb84a525d8e30c9dbbefa69cbbf2d8d4ae3b
CPU threads: 8; OS: Linux 4.15; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
Calc: threaded
Comment 16 QA Administrators 2023-12-21 03:12:27 UTC
Dear echan00,

To make sure we're focusing on the bugs that affect our users today, LibreOffice QA is asking bug reporters and confirmers to retest open, confirmed bugs which have not been touched for over a year.

There have been thousands of bug fixes and commits since anyone checked on this bug report. During that time, it's possible that the bug has been fixed, or the details of the problem have changed. We'd really appreciate your help in getting confirmation that the bug is still present.

If you have time, please do the following:

Test to see if the bug is still present with the latest version of LibreOffice from https://www.libreoffice.org/download/

If the bug is present, please leave a comment that includes the information from Help - About LibreOffice.
 
If the bug is NOT present, please set the bug's Status field to RESOLVED-WORKSFORME and leave a comment that includes the information from Help - About LibreOffice.

Please DO NOT

Update the version field
Reply via email (please reply directly on the bug tracker)
Set the bug's Status field to RESOLVED - FIXED (this status has a particular meaning that is not appropriate in this case)


If you want to do more to help you can test to see if your issue is a REGRESSION. To do so:
1. Download and install oldest version of LibreOffice (usually 3.3 unless your bug pertains to a feature added after 3.3) from https://downloadarchive.documentfoundation.org/libreoffice/old/

2. Test your bug
3. Leave a comment with your results.
4a. If the bug was present with 3.3 - set version to 'inherited from OOo';
4b. If the bug was not present in 3.3 - add 'regression' to keyword


Feel free to come ask questions or to say hello in our QA chat: https://web.libera.chat/?settings=#libreoffice-qa

Thank you for helping us make LibreOffice even better for everyone!

Warm Regards,
QA Team

MassPing-UntouchedBug