Bug 163164 - Support altChunk referencing HTML in DOCX
Summary: Support altChunk referencing HTML in DOCX
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: Inherited From OOo (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: difficultyInteresting, easyHack, filter:docx, skillCpp
Depends on:
Blocks:
 
Reported: 2024-09-26 11:42 UTC by darrask
Modified: 2024-09-28 03:16 UTC
CC List: 1 user

See Also:
Crash report or crash signature:


Attachments
faulty DOCX (37.00 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2024-09-26 11:42 UTC, darrask

Description darrask 2024-09-26 11:42:18 UTC
Description:
Open the attached DOCX. It is completely empty in LibreOffice Writer.

However, opening it with NextCloud reveals that this is actually a rather large document: https://nextcloud.inrae.fr/s/zEPjc5SCPaAiGno

Maybe the references or the very wide table caused a problem?

Steps to Reproduce:
1. Request the DOCX from me, as I cannot upload it here (downloading it from NextCloud transforms it into another kind of file that is twice as large and does open in Writer)
2. Open in Writer and stare at the blank page

Actual Results:
blank page

Expected Results:
visible document


Reproducible: Always


User Profile Reset: No

Additional Info:
Version: 24.2.6.2 (X86_64) / LibreOffice Community
Build ID: 8e9a753d9daaea75c34b417ba1bdf556bf2fc5b3
CPU threads: 16; OS: Linux 6.8; UI render: default; VCL: gtk3
Locale: en-GB (en_GB.UTF-8); UI: en-GB
Calc: threaded
Comment 1 darrask 2024-09-26 11:42:55 UTC
Created attachment 196722
faulty DOCX
Comment 2 darrask 2024-09-26 11:43:51 UTC
(In reply to darrask from comment #0)
I found out how to add an attachment
Comment 3 Mike Kaganski 2024-09-26 12:05:47 UTC
Basically, this is the same as bug 151080, but here the altChunk is for HTML, not MHT. Either way, we don't yet support *any* kind of altChunk content.
Comment 4 darrask 2024-09-26 12:11:18 UTC
Thanks Mike. A warning would be useful so users don't think the file is empty.
Comment 5 Mike Kaganski 2024-09-26 12:21:38 UTC
In this case, we already support HTML; so, unlike bug 151080, this can be implemented fairly easily by running a secondary import at each altChunk reference (it's a "one-time conversion facility", as ECMA-376 Part 1 specifies, and it may appear anywhere a 'p' element is permitted).
Comment 6 Mike Kaganski 2024-09-27 03:33:44 UTC
The idea of the fix is: during XML parsing, when an altChunk reference is found, use the "Text from file" function (i.e., similar to what happens in SwView::InsertMedium). This must be implemented as a function in writerfilter::dmapper::DomainMapper (and its _Impl).
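
To make the "when a reference is found" step concrete, here is a minimal standalone sketch of how an altChunk reference maps to the embedded part it points at. It is not writerfilter code: the function names (findAltChunkIds, parseRelationships, resolveAltChunkTargets) are hypothetical, and it uses std::regex on plain strings purely for illustration, whereas the real importer works on a tokenized XML stream. It assumes the usual DOCX layout, in which a w:altChunk element carries an r:id that is looked up in the document's relationships part to obtain the Target of the HTML part.

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collect the r:id of every <w:altChunk> element,
// in document order. Regex matching is an illustration only.
std::vector<std::string> findAltChunkIds(const std::string& documentXml)
{
    static const std::regex altChunkRe(R"rx(<w:altChunk\b[^>]*\br:id="([^"]*)")rx");
    std::vector<std::string> ids;
    for (std::sregex_iterator it(documentXml.begin(), documentXml.end(), altChunkRe), end;
         it != end; ++it)
    {
        assert(it->size() == 2); // whole match plus one capture group
        ids.push_back((*it)[1].str());
    }
    return ids;
}

// Hypothetical helper: map relationship Id -> Target from a .rels part.
// Attributes are extracted separately so their order does not matter.
std::map<std::string, std::string> parseRelationships(const std::string& relsXml)
{
    static const std::regex tagRe(R"rx(<Relationship\b[^>]*>)rx");
    static const std::regex idRe(R"rx(\bId="([^"]*)")rx");
    static const std::regex targetRe(R"rx(\bTarget="([^"]*)")rx");
    std::map<std::string, std::string> rels;
    for (std::sregex_iterator it(relsXml.begin(), relsXml.end(), tagRe), end; it != end; ++it)
    {
        const std::string tag = it->str();
        std::smatch id, target;
        if (std::regex_search(tag, id, idRe) && std::regex_search(tag, target, targetRe))
            rels[id[1].str()] = target[1].str();
    }
    return rels;
}

// For each altChunk reference, return the part that a secondary import
// would have to load at that position. Unresolvable ids are skipped.
std::vector<std::string> resolveAltChunkTargets(const std::string& documentXml,
                                                const std::string& relsXml)
{
    const auto rels = parseRelationships(relsXml);
    std::vector<std::string> targets;
    for (const auto& id : findAltChunkIds(documentXml))
    {
        const auto it = rels.find(id);
        if (it != rels.end())
            targets.push_back(it->second);
    }
    return targets;
}
```

Given a body containing `<w:altChunk r:id="htmlChunk1"/>` and a relationship whose Id is `htmlChunk1` and whose Target is `afchunk.htm`, resolveAltChunkTargets returns a one-element list holding `afchunk.htm`. In an actual fix, that resolved part would be handed to the existing HTML import at the current insert position, instead of being returned as a string.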