Bug 148270 - FILEOPEN DOCX: TOC is not correctly interpreted
Summary: FILEOPEN DOCX: TOC is not correctly interpreted
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 7.3.0.2 rc (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: bibisected, bisected, regression
Depends on:
Blocks: DOCX-TableofContents DOCX-Content_Control MSO-External-Producers
Reported: 2022-03-30 13:52 UTC by Thomas Gerbet
Modified: 2023-05-24 21:54 UTC (History)
CC List: 4 users

See Also:
Crash report or crash signature:


Attachments
Document reproducing the issue (35.16 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2022-03-30 13:52 UTC, Thomas Gerbet
Screenshot of the issue (8.60 KB, image/png)
2022-03-30 13:52 UTC, Thomas Gerbet
The document in Word 2013 and Writer (112.59 KB, image/png)
2022-04-07 11:36 UTC, Gabor Kelemen (allotropia)

Description Thomas Gerbet 2022-03-30 13:52:02 UTC
Created attachment 179211 [details]
Document reproducing the issue

The TOC is not correctly detected and I can see the raw OOXML instructions in the document. An example document reproducing the issue is attached.

The issue can be reproduced on 7.3.1.3 but not on 7.2.5.2 or 7.1.8.1.

I bisected it and it appears to have been introduced by commit a4432eb0946c0bc775b3d30b634bef5d66544f8d https://cgit.freedesktop.org/libreoffice/core/commit/?id=a4432eb0946c0bc775b3d30b634bef5d66544f8d
Comment 1 Thomas Gerbet 2022-03-30 13:52:45 UTC
Created attachment 179212 [details]
Screenshot of the issue
Comment 2 Rainer Bielefeld Retired 2022-03-31 05:28:51 UTC
Still REPRODUCIBLE with Server Installation of Version: 7.4.0.0.alpha0+ (x64)  Build ID b000d964fcc8849d10576bf3539bde7729db2eb1
CPU threads: 12; OS: Windows 10.0 Build 19044; UI render: default; VCL: win
Locale: de-DE (de_DE); UI: en-US  |  Calc: CL  |  Auto Colibre Theme  |  Special devUserProfile

Was still ok with Server Installation of Version: 6.0.7.3 (x64) 
Build ID: dc89aa7a9eabfd848af146d5086077aeed2ae4a5; CPU threads: 12; OS: Windows 10.0; UI render: GL; Locale: de-DE (de_DE); Calc: CL, Special devUserProfile

and 

Server Installation of Version:  4.0.0.3 WIN10
Build-ID  7545bee9c2a0782548772a21bc84a9dcc583b89;  Special devUserProfile

So REGRESSION
Comment 3 Rainer Bielefeld Retired 2022-03-31 05:30:23 UTC
a) No obvious DUP found with query <https://bugs.documentfoundation.org/buglist.cgi?cmdtype=dorem&remaction=run&namedcmd=DUPs148270&sharer_id=19321>
Comment 4 Timur 2022-03-31 13:42:32 UTC
Commit a4432eb0946c0bc775b3d30b634bef5d66544f8d is correct. Vasily, please see. 

Thomas, please explain how you made that DOCX; it seems to be generated. That doesn't affect the bug, as the behavior is the same even if the file is resaved in MSO, but it is interesting.
Comment 5 Thomas Gerbet 2022-03-31 18:22:24 UTC
Yes, I confirm this is a generated document.

It is generated by software called Tuleap, using a JS library called docx: https://github.com/dolanmiu/docx
However, we (I'm one of the maintainers of Tuleap) re-implemented the way the TOC is generated in order to prefill it with our context. I did not check whether the issue is also present when using only the docx library. You can see the code used to generate the TOC instruction here:

https://tuleap.net/plugins/git/tuleap/tuleap/stable?a=blob&hb=3b5ba2c21d161035ff1d371939edcf24096b4cc5&h=5716c31a40128f5f2129525896d3c28e2be3c719&f=plugins%2Fdocument_generation%2Fscripts%2Ftracker-report-action%2Fsrc%2FExporter%2FDOCX%2FTableOfContents%2Ftoc-field-instruction.ts

https://tuleap.net/plugins/git/tuleap/tuleap/stable?a=blob&hb=3b5ba2c21d161035ff1d371939edcf24096b4cc5&h=257848b38686120807fb4965c1f69078d680ac38&f=plugins%2Fdocument_generation%2Fscripts%2Ftracker-report-action%2Fsrc%2FExporter%2FDOCX%2FTableOfContents%2Ftable-of-contents.ts#L94
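
For readers who do not want to follow the links: in WordprocessingML a TOC is normally stored as a complex field, that is, a begin marker, an instruction string such as `TOC \o "1-3" \h \z \u`, a separator, the cached (prefilled) result, and an end marker. The Python sketch below only illustrates that general shape. It is not the Tuleap or docx code, the function names `toc_instruction` and `toc_field_paragraph` are hypothetical, and the switches chosen are an assumption rather than what the attached document uses.

```python
from xml.sax.saxutils import escape

# Main WordprocessingML namespace.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def toc_instruction(first_level=1, last_level=3):
    """Build a TOC field instruction for a range of heading levels.

    \\o selects the heading levels, \\h makes entries hyperlinks,
    \\z hides tab leaders and page numbers in web layout,
    \\u also uses applied paragraph outline levels.
    """
    if not 1 <= first_level <= last_level <= 9:
        raise ValueError("levels must satisfy 1 <= first <= last <= 9")
    return 'TOC \\o "%d-%d" \\h \\z \\u' % (first_level, last_level)


def toc_field_paragraph(instruction, cached_text):
    """Return one <w:p> holding a complex field with a prefilled result.

    Simplification: a real TOC spans several paragraphs, one per entry;
    here the whole cached result is a single run of text.
    """
    return (
        '<w:p xmlns:w="%s">' % W_NS
        + '<w:r><w:fldChar w:fldCharType="begin"/></w:r>'
        + '<w:r><w:instrText xml:space="preserve"> %s </w:instrText></w:r>'
        % escape(instruction)
        + '<w:r><w:fldChar w:fldCharType="separate"/></w:r>'
        + "<w:r><w:t>%s</w:t></w:r>" % escape(cached_text)
        + '<w:r><w:fldChar w:fldCharType="end"/></w:r>'
        + "</w:p>"
    )
```

When an importer fails to recognise the field, the text between the begin and separate markers is what ends up visible to the user as raw instructions.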
Comment 6 Gabor Kelemen (allotropia) 2022-04-07 11:36:41 UTC
Created attachment 179373 [details]
The document in Word 2013 and Writer

This is a content control field containing a TOC, which is not interpreted correctly in Writer.
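
To check whether another DOCX has the same structure, a small script can list the TOC field instructions in `word/document.xml` and report whether each one sits inside a content control (`w:sdt`). This is a minimal sketch using only the Python standard library. The function names are hypothetical, and it assumes that the instruction is stored in a single `w:instrText` element, which is not guaranteed because Word may split an instruction across several runs. It says nothing about how Writer's importer handles the field.

```python
import zipfile
import xml.etree.ElementTree as ET
from typing import List, Tuple, Union

# Main WordprocessingML namespace in ElementTree's {uri}tag notation.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def find_toc_fields(document_xml: Union[str, bytes]) -> List[Tuple[str, bool]]:
    """Return (instruction, inside_content_control) for each TOC field.

    Assumption: the whole instruction lives in one w:instrText element.
    """
    root = ET.fromstring(document_xml)
    results: List[Tuple[str, bool]] = []

    def walk(node, in_sdt):
        # Once below a w:sdt element, everything nested counts as inside it.
        in_sdt = in_sdt or node.tag == W + "sdt"
        if node.tag == W + "instrText":
            text = (node.text or "").strip()
            # The first token of a field instruction is the field type.
            if text.split()[:1] == ["TOC"]:
                results.append((text, in_sdt))
        for child in node:
            walk(child, in_sdt)

    walk(root, False)
    return results


def find_toc_fields_in_docx(path: str) -> List[Tuple[str, bool]]:
    """Open a .docx (a ZIP archive) and scan its main document part."""
    with zipfile.ZipFile(path) as archive:
        return find_toc_fields(archive.read("word/document.xml"))
```

Calling `find_toc_fields_in_docx("attachment.docx")` on a locally saved copy of the attachment (the file name is a placeholder) returns one tuple per TOC instruction found; a `True` in the second position means the field is wrapped in a content control.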

Version: 7.4.0.0.alpha0+ (x64) / LibreOffice Community
Build ID: cf4d5ed026c8799a70432a832a8a707c2e316216
CPU threads: 14; OS: Windows 10.0 Build 19044; UI render: default; VCL: win
Locale: en-US (hu_HU); UI: en-US
Calc: threaded Jumbo
Comment 7 Max Lay 2022-12-14 01:59:17 UTC
I can confirm that I can also reproduce this bug on 7.3.7 and 7.4.3. Interestingly, we are also using a document produced with the same JS library that Thomas uses.

Thomas, were you able to find a workaround?