Bug 163164 - Support altChunk referencing HTML in DOCX
Summary: Support altChunk referencing HTML in DOCX
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: Inherited From OOo (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: arsal4an
URL:
Whiteboard: target:25.8.0
Keywords: difficultyInteresting, easyHack, filter:docx, skillCpp
Depends on:
Blocks: File-Opening
Reported: 2024-09-26 11:42 UTC by darrask
Modified: 2025-05-08 10:27 UTC
CC List: 3 users

See Also:
Crash report or crash signature:


Attachments
faulty DOCX (37.00 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2024-09-26 11:42 UTC, darrask
A minimal valid DOCX+HTML sample (3.58 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2025-04-04 14:29 UTC, Mike Kaganski

Description darrask 2024-09-26 11:42:18 UTC
Description:
Open the attached DOCX. It is completely empty in LibreOffice Writer.

However, opening it with NextCloud reveals that this is actually a rather large document: https://nextcloud.inrae.fr/s/zEPjc5SCPaAiGno

Maybe the references or the very wide table caused a problem?

Steps to Reproduce:
1. Request the DOCX from me, as I cannot upload it here (downloading from NextCloud transforms it into another kind of file that is twice as large and does open in Writer)
2. Open in Writer and stare at the blank page

Actual Results:
blank page

Expected Results:
visible document


Reproducible: Always


User Profile Reset: No

Additional Info:
Version: 24.2.6.2 (X86_64) / LibreOffice Community
Build ID: 8e9a753d9daaea75c34b417ba1bdf556bf2fc5b3
CPU threads: 16; OS: Linux 6.8; UI render: default; VCL: gtk3
Locale: en-GB (en_GB.UTF-8); UI: en-GB
Calc: threaded
Comment 1 darrask 2024-09-26 11:42:55 UTC
Created attachment 196722
faulty DOCX
Comment 2 darrask 2024-09-26 11:43:51 UTC
(In reply to darrask from comment #0)
I found out how to add an attachment
Comment 3 Mike Kaganski 2024-09-26 12:05:47 UTC
Basically, this is the same as bug 151080 - but here, the altChunk is for HTML, not MHT. Anyway, we don't yet support *any* kind of altChunk content.
Comment 4 darrask 2024-09-26 12:11:18 UTC
Thanks Mike. A warning would be useful so users don't think the file is empty.
Comment 5 Mike Kaganski 2024-09-26 12:21:38 UTC
In this case, we already support HTML; so, unlike bug 151080, this can be implemented fairly easily: just run a secondary import at each altChunk reference (it's a "one-time conversion facility", as ECMA-376 Part 1 specifies, and it may appear anywhere a 'p' element is permitted).
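
For illustration only (this is not LibreOffice code): a minimal Python sketch of how the altChunk references in a DOCX package can be located and resolved to the parts they point at. It assumes the main document part lives at the conventional path word/document.xml, which the format does not strictly require. The names find_alt_chunks and build_sample_package are hypothetical helpers, and the sample package contains only the two parts the sketch reads, so it is not a complete DOCX.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def find_alt_chunks(docx_bytes):
    """Return [(relationship id, part name)] for each w:altChunk, in document order."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as pkg:
        # Assumption: the main document part is at the conventional location.
        rels_root = ET.fromstring(pkg.read("word/_rels/document.xml.rels"))
        doc_root = ET.fromstring(pkg.read("word/document.xml"))

    # Map relationship ids to part names; relative targets resolve against word/.
    targets = {}
    for rel in rels_root.iter(f"{{{PKG_REL_NS}}}Relationship"):
        target = rel.get("Target", "")
        if target.startswith("/"):
            part = target.lstrip("/")
        else:
            part = posixpath.normpath(posixpath.join("word", target))
        targets[rel.get("Id")] = part

    chunks = []
    for chunk in doc_root.iter(f"{{{W_NS}}}altChunk"):
        rid = chunk.get(f"{{{R_NS}}}id")
        chunks.append((rid, targets.get(rid)))
    return chunks


def build_sample_package():
    """Build an in-memory ZIP holding just the two parts read above (not a full DOCX)."""
    document = (
        f'<w:document xmlns:w="{W_NS}" xmlns:r="{R_NS}">'
        '<w:body><w:altChunk r:id="rId1"/></w:body></w:document>'
    )
    rels = (
        f'<Relationships xmlns="{PKG_REL_NS}">'
        f'<Relationship Id="rId1" Type="{R_NS}/aFChunk" Target="afchunk.html"/>'
        "</Relationships>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as pkg:
        pkg.writestr("word/document.xml", document)
        pkg.writestr("word/_rels/document.xml.rels", rels)
    return buf.getvalue()


if __name__ == "__main__":
    print(find_alt_chunks(build_sample_package()))
```

A document that renders blank in Writer but returns a non-empty list here is probably affected by this bug rather than genuinely empty.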
Comment 6 Mike Kaganski 2024-09-27 03:33:44 UTC
The idea of the fix is: during XML parse, when a reference is found, use the "Text from file" function (i.e., similar to what happens in SwView::InsertMedium). This must be implemented as a function in writerfilter::dmapper::DomainMapper (and its _Impl).
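
As a toy model of that idea (Python, for illustration only; it does not reflect the actual DomainMapper code or the committed patch): walk the body in document order and, whenever an altChunk is met, run a secondary import of the referenced HTML and splice its output in at that position. Everything here is a hypothetical stand-in: import_html reduces "import" to collecting text runs, and chunk_sources is assumed to map relationship ids to HTML that has already been read from the package.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"


class _TextCollector(HTMLParser):
    """Stand-in for a real HTML import: keeps each non-blank text run."""

    def __init__(self):
        super().__init__()
        self.blocks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.blocks.append(text)


def import_html(html):
    """Hypothetical secondary import: return the text runs found in the HTML."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return collector.blocks


def import_body(document_xml, chunk_sources):
    """Import paragraphs in order, expanding each altChunk where it occurs."""
    body = ET.fromstring(document_xml).find(f"{{{W_NS}}}body")
    output = []
    for child in body:
        if child.tag == f"{{{W_NS}}}p":
            # Ordinary paragraph: concatenate its text runs.
            output.append("".join(t.text or "" for t in child.iter(f"{{{W_NS}}}t")))
        elif child.tag == f"{{{W_NS}}}altChunk":
            # Reference found: run the secondary import at this position.
            rid = child.get(f"{{{R_NS}}}id")
            output.extend(import_html(chunk_sources[rid]))
    return output


SAMPLE_DOCUMENT = (
    f'<w:document xmlns:w="{W_NS}" xmlns:r="{R_NS}"><w:body>'
    "<w:p><w:r><w:t>Before</w:t></w:r></w:p>"
    '<w:altChunk r:id="rId1"/>'
    "<w:p><w:r><w:t>After</w:t></w:r></w:p>"
    "</w:body></w:document>"
)
SAMPLE_CHUNKS = {"rId1": "<html><body><p>Hello</p><p>World</p></body></html>"}

if __name__ == "__main__":
    print(import_body(SAMPLE_DOCUMENT, SAMPLE_CHUNKS))
```

The point of the model is only the ordering: the chunk's content lands between the surrounding paragraphs, where the reference sat, rather than being appended at the end or dropped.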
Comment 7 Mike Kaganski 2025-04-04 14:29:43 UTC
Created attachment 200167
A minimal valid DOCX+HTML sample
Comment 8 Commit Notification 2025-04-04 17:29:04 UTC
ArsalanKhan04 committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/c55ae722661d499cb27bc1f2727bc9873248adc5

tdf#163164 support altChunk referencing HTML in DOCX

It will be available in 25.8.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 9 Buovjaga 2025-05-08 09:45:38 UTC
Muhammad: feel free to close as fixed.