Bug 154319 - ToC in DOCX has duplicated LS/LE elements; \d in TOC field gives displaced CI and text elements
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): unspecified
Hardware: All All
Importance: medium normal
Assignee: Mike Kaganski
URL:
Whiteboard: target:7.6.0 target:7.5.3
Keywords: filter:docx
Depends on:
Blocks:
 
Reported: 2023-03-22 05:35 UTC by Mike Kaganski
Modified: 2023-03-24 09:03 UTC

See Also:
Crash report or crash signature:


Attachments
ToC with \s and \d (18.52 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2023-03-22 05:35 UTC, Mike Kaganski

Description Mike Kaganski 2023-03-22 05:35:46 UTC
Created attachment 186130
ToC with \s and \d

The attached DOCX has a table of contents with this field code:

  { TOC \o "1-3" \h \z \u \s chapter \d ":" }

The \s and \d switches specify that each page number is preceded by a sequence number (here from the "chapter" sequence), separated from it by the given separator, a colon [1].
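To make the switch semantics concrete, here is a minimal Python sketch of how the field instruction (the text inside the braces) could be split into switches and their arguments. It is not LibreOffice's DOCX import code: tokenize and parse_toc_switches are hypothetical names, and the rule that a switch takes the next token as its argument unless that token is another switch is a simplification of the full field syntax. It shows that the quotes around the \d argument only delimit it; the separator itself is the bare colon, with no quotes and no trailing space.

```python
def tokenize(instr):
    """Split a field instruction into (text, was_quoted) tokens."""
    tokens = []
    i, n = 0, len(instr)
    while i < n:
        ch = instr[i]
        if ch.isspace():
            i += 1
        elif ch == '"':
            # Quoted argument: the quotes delimit the value and are not part of it.
            end = instr.find('"', i + 1)
            if end == -1:
                end = n  # unterminated quote: take the rest of the string
            tokens.append((instr[i + 1:end], True))
            i = end + 1
        else:
            start = i
            while i < n and not instr[i].isspace():
                i += 1
            tokens.append((instr[start:i], False))
    return tokens


def parse_toc_switches(instr):
    """Map each TOC switch letter to its argument (None for flag switches).

    Simplification (assumption): a switch takes the following token as its
    argument unless that token is itself an unquoted switch.
    """
    tokens = tokenize(instr)
    if not tokens or tokens[0][0].upper() != "TOC":
        raise ValueError("not a TOC field instruction")
    switches = {}
    i = 1
    while i < len(tokens):
        text, quoted = tokens[i]
        if not quoted and len(text) == 2 and text[0] == "\\":
            arg = None
            if i + 1 < len(tokens):
                nxt, nxt_quoted = tokens[i + 1]
                if nxt_quoted or not nxt.startswith("\\"):
                    arg = nxt
                    i += 1  # consume the argument token
            switches[text[1].lower()] = arg
        i += 1
    return switches


print(parse_toc_switches(r'TOC \o "1-3" \h \z \u \s chapter \d ":"'))
# {'o': '1-3', 'h': None, 'z': None, 'u': None, 's': 'chapter', 'd': ':'}
```

Keeping the "was quoted" flag per token means a quoted argument that happens to start with a backslash is never mistaken for a switch.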

The field produces the following ToC text:

  1	Ch 1	1:1
  1.1	Subch	1:1
  2	Ch 2	2:1
  3	Ch 3	3:1

Open the attached document in Writer and inspect the ToC entries. They have the following structure:

  [LS][LS][E#][E][T][#][CI]":" [LE][LE]

There are three problems with this structure:
1. [LS] and [LE] (hyperlink start/end) are duplicated;
2. The colon is enclosed in double quotes and is followed by a space;
3. [CI] and the colon are misplaced: they come after [#], while they must precede it.
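The three problems can be restated as a mapping from the observed entry structure to the expected one. The Python sketch below, built around a hypothetical normalize_entry helper, encodes that mapping only to illustrate the expected structure (for example, as a test oracle); it is not how the fix works, since the commit changes the parsing of the TOC field code rather than post-processing the entry. Tokens are written as in the report, and any token not in brackets is treated as literal text.

```python
def normalize_entry(tokens):
    """Return a corrected copy of a ToC entry token list (illustration only)."""
    out = []
    for tok in tokens:
        # Problem 1: drop a hyperlink start/end that repeats the previous token.
        if tok in ("[LS]", "[LE]") and out and out[-1] == tok:
            continue
        # Problem 2: literal text keeps neither the quotes nor surrounding spaces.
        if not tok.startswith("["):
            tok = tok.strip().strip('"')
        out.append(tok)
    # Problem 3: [CI] and the literal separator after it must precede [#].
    if "[CI]" in out and "[#]" in out:
        ci, page = out.index("[CI]"), out.index("[#]")
        if ci > page:
            end = ci + 1
            if end < len(out) and not out[end].startswith("["):
                end += 1  # take the separator along with [CI]
            moved = out[ci:end]
            del out[ci:end]  # page index is unaffected: it lies before ci
            out[page:page] = moved
    return out


observed = ["[LS]", "[LS]", "[E#]", "[E]", "[T]", "[#]", "[CI]", '":" ', "[LE]", "[LE]"]
print(normalize_entry(observed))
# ['[LS]', '[E#]', '[E]', '[T]', '[CI]', ':', '[#]', '[LE]']
```

Stripping quotes from every literal is a simplification: a literal that legitimately contains quotation marks would be altered by this sketch.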

After the ToC is updated, its text is:

  1 Ch 1	11":" 
  1.1 Subch	11.1":" 
  2 Ch 2	12":" 
  3 Ch 3	13":" 
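Reading this text against the entry structure explains the "11" before the quoted colon in the first line: the page number (1) is emitted first, immediately followed by the chapter number (1) from [CI], and then the literal text with its quotes and trailing space. The Python sketch below renders both the observed and the expected token sequences for that first entry. The per-token values in the hypothetical VALUES table, including the assumption that [E#] produces the heading number followed by a space, are illustrative and chosen only to match the first line of the output above.

```python
# Hypothetical per-token text for the first entry ("Ch 1", chapter 1, page 1).
# Assumption: [E#] yields the heading number plus a space; hyperlink marks
# ([LS]/[LE]) produce no visible text.
VALUES = {"[E#]": "1 ", "[E]": "Ch 1", "[T]": "\t", "[#]": "1", "[CI]": "1",
          "[LS]": "", "[LE]": ""}


def render(tokens, values=VALUES):
    """Concatenate each token's text; tokens not in brackets are literal text."""
    return "".join(values[t] if t.startswith("[") else t for t in tokens)


observed = ["[LS]", "[LS]", "[E#]", "[E]", "[T]", "[#]", "[CI]", '":" ', "[LE]", "[LE]"]
expected = ["[LS]", "[E#]", "[E]", "[T]", "[CI]", ":", "[#]", "[LE]"]

print(repr(render(observed)))  # '1 Ch 1\t11":" '
print(repr(render(expected)))  # '1 Ch 1\t1:1'
```

With [CI] and the bare colon placed before [#], the part after the tab becomes "1:1", matching the page-number part of the expected ToC text.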

Version: 7.5.2.1 (X86_64) / LibreOffice Community
Build ID: e8bf3b441b8370f8440b0339fd9490765a8d57ca
CPU threads: 12; OS: Windows 10.0 Build 19045; UI render: default; VCL: win
Locale: ru-RU (ru_RU); UI: en-US
Calc: CL threaded

[1] https://support.microsoft.com/en-us/office/field-codes-toc-table-of-contents-field-1f538bc4-60e6-4854-9f64-67754d78d05c
Comment 1 Commit Notification 2023-03-23 04:39:58 UTC
Mike Kaganski committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/76777c82fa4bb5080c135e2241c3f7122dcbb298

tdf#154319: fix TOC field codes parsing

It will be available in 7.6.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 2 Commit Notification 2023-03-24 09:03:43 UTC
Mike Kaganski committed a patch related to this issue.
It has been pushed to "libreoffice-7-5":

https://git.libreoffice.org/core/commit/128671288204136ceba258a5fe809c354728a175

tdf#154319: fix TOC field codes parsing

It will be available in 7.5.3.
