Bug 99737 - limitation of the HTML filter to handle multiple adjacent tags
Summary: limitation of the HTML filter to handle multiple adjacent tags
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 4.2.8.2 release
Hardware: All All
Importance: medium minor
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: filter:docx
Depends on:
Blocks: (X)HTML-Export
Reported: 2016-05-09 05:24 UTC by Ramyani Ghosh
Modified: 2025-11-07 12:28 UTC
CC List: 2 users

See Also:
Crash report or crash signature:


Attachments
test.html (161 bytes, text/html)
2016-05-18 07:17 UTC, Buovjaga

Description Ramyani Ghosh 2016-05-09 05:24:27 UTC
<strong> and <em> do not work together when converting from HTML to DOCX with LibreOffice's "docx:Office Open XML Text" filter.

I converted the HTML file to DOCX with the following command:

soffice --headless --convert-to "docx:Office Open XML Text" 'test.html'

test.html:

    <html>
      <head>
      </head>
      <body>
        <strong><em>Apply em then strong</em></strong>
        <em><strong>Apply strong then em</strong></em>
      </body>
    </html>

When I convert this test.html to DOCX, `<strong><em>Apply em then strong</em></strong>` keeps only the `<em>` formatting, and `<em><strong>Apply strong then em</strong></em>` keeps only the `<strong>` formatting.
Comment 1 Ramyani Ghosh 2016-05-16 12:12:01 UTC
I first found this problem in LibreOffice 4.2.8.2. I later updated LibreOffice to 5.1.3.2, and it has the same issue.
Comment 2 Buovjaga 2016-05-18 07:17:07 UTC
Repro.

Win 7 Pro 64-bit Version: 5.2.0.0.alpha1+
Build ID: f688acfdae00ebdd891737e533d54368810185e1
CPU Threads: 4; OS Version: Windows 6.1; UI Render: default; 
TinderBox: Win-x86@62-merge-TDF, Branch:MASTER, Time: 2016-05-18_00:11:31
Locale: fi-FI (fi_FI)
Comment 3 Buovjaga 2016-05-18 07:17:22 UTC
Created attachment 125136 [details]
test.html
Comment 4 QA Administrators 2017-05-22 13:40:50 UTC Comment hidden (obsolete)
Comment 5 QA Administrators 2021-04-09 03:47:34 UTC Comment hidden (obsolete, spam)
Comment 6 QA Administrators 2025-05-11 03:11:07 UTC Comment hidden (obsolete)
Comment 7 Michael Hamann 2025-11-06 17:09:32 UTC
I can still reproduce this bug with the current LibreOffice version:

Version: 25.8.2.2 (X86_64) / LibreOffice Community
Build ID: 580(Build:2)
CPU threads: 16; OS: Linux 6.17; UI render: default; VCL: kf6 (cairo+wayland)
Locale: de-DE (de_DE.UTF-8); UI: de-DE
25.8.2-4
Calc: threaded

I can also reproduce this bug when converting to ODT instead of DOCX.
Comment 8 Michael Hamann 2025-11-07 12:28:43 UTC
If you don't mind changing the HTML a bit, I noticed that <b> doesn't have this problem. Something like

<p><b><em>This text is first b and then italic.</em></b></p>

works perfectly well. It also works when replacing <em> with <i>. I don't know whether changing those tags has any real consequences for the resulting documents, but they are certainly different: in ODT, <b> and <i> trigger a custom style instead of the "Strong_20_Emphasis" and "Emphasis" character styles. For now, I consider replacing one of the tags a usable workaround.
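A minimal sketch of this workaround, assuming the input is UTF-8 HTML and that swapping only <strong> for <b> is enough (as in the <b><em> example above). Both function names, use_presentational_tags and convert_with_workaround, are hypothetical helpers, not part of LibreOffice. The tag rewrite is a plain regular-expression substitution rather than an HTML parser, so tag-like text inside comments or scripts would be rewritten too. The conversion step reuses the soffice command from the description, adding --outdir to choose where the DOCX is written.

```python
"""Workaround sketch: swap <strong> (and optionally <em>) for <b>/<i> before converting."""
import re
import subprocess
from pathlib import Path

# Match opening or closing tags by name only; any attributes after the name are kept.
# The \b stops names such as <strongest> or <embed> from matching.
_STRONG_TAG = re.compile(r"<(/?)strong\b", re.IGNORECASE)
_EM_TAG = re.compile(r"<(/?)em\b", re.IGNORECASE)


def use_presentational_tags(html: str, replace_em: bool = False) -> str:
    """Hypothetical helper: rewrite <strong> as <b> and, optionally, <em> as <i>."""
    html = _STRONG_TAG.sub(r"<\g<1>b", html)
    if replace_em:
        html = _EM_TAG.sub(r"<\g<1>i", html)
    return html


def convert_with_workaround(src: Path, out_dir: Path, replace_em: bool = False) -> Path:
    """Hypothetical helper: patch a copy of src, then convert that copy with soffice."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # Write the patched HTML under a new name so the original file is never overwritten.
    patched = out_dir / f"{src.stem}-patched.html"
    patched.write_text(
        use_presentational_tags(src.read_text(encoding="utf-8"), replace_em),
        encoding="utf-8",
    )
    # Same filter string as in the bug description.
    subprocess.run(
        ["soffice", "--headless", "--convert-to", "docx:Office Open XML Text",
         "--outdir", str(out_dir), str(patched)],
        check=True,
    )
    # Assumption: soffice names the output after the input file's base name.
    return out_dir / f"{patched.stem}.docx"
```

For example, use_presentational_tags("<strong><em>Apply em then strong</em></strong>") returns "<b><em>Apply em then strong</em></b>". Keeping <em> by default leaves the italic text mapped to the "Emphasis" style; pass replace_em=True only if you also want <i>.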