Bug 118972 - Writer should recognize ToC entries structure and formatting from MSO DOC/X (test with Update index)
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: Inherited From OOo (earliest affected)
Hardware: All All
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Duplicates: 118259 126998
Depends on:
Blocks: DOCX-TableofContents DOC-TableofContents
 
Reported: 2018-07-27 14:57 UTC by Timur
Modified: 2021-02-17 04:43 UTC (History)

See Also:
Crash report or crash signature:


Attachments
Test MSO DOCX with ToC (14.61 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2018-07-27 14:57 UTC, Timur
Test MSO DOC with ToC (31.50 KB, application/vnd.ms-word)
2018-07-27 14:58 UTC, Timur
Sample DOCX with different fonts (18.52 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2018-09-18 12:07 UTC, Aron Budea
Comparison of MSWord vs Writer rendering (23.08 KB, image/png)
2020-05-09 23:29 UTC, Alvaro Segura

Description Timur 2018-07-27 14:57:11 UTC
Created attachment 143804 [details]
Test MSO DOCX with ToC

Writer and Word do not seem to interoperate when importing Table of Contents (ToC) entries: neither recognizes the entry structure set by the other.
The ToC looks fine on import, but Update index shows that the ToC structure was not actually recognized and imported.

To test, open the attached Test DOCX with ToC (created in MSO) in LO, right-click the ToC and choose Update index. The structure changes, which is obvious because the default tab stops are set differently. That doesn't happen with a custom ToC saved in ODT.

I'll set this as Enhancement.
This report is about Fileopen. Filesave to DOC and DOCX is another issue. Update index on fileopen of an RT file loses structure.
Comment 1 Timur 2018-07-27 14:58:09 UTC
Created attachment 143805 [details]
Test MSO DOC with ToC
Comment 2 Xisco Faulí 2018-07-31 09:50:46 UTC
Confirmed in

Version: 6.2.0.0.alpha0+
Build ID: 72b099d279e7096d41a04fe8c0dd493a5fc18a33
CPU threads: 4; OS: Linux 4.15; UI render: default; VCL: gtk3; 
Locale: ca-ES (ca_ES.UTF-8); Calc: group threaded
Comment 3 Timur 2018-08-01 16:28:45 UTC
*** Bug 118259 has been marked as a duplicate of this bug. ***
Comment 4 Aron Budea 2018-09-18 12:07:31 UTC
Created attachment 144984 [details]
Sample DOCX with different fonts

And here's a sample with different fonts in the ToC that get lost upon update. While it's not strictly structure, I'd say it belongs in this ticket (if not, I can open a separate one).
Comment 5 Timur 2019-08-28 07:45:17 UTC
*** Bug 126998 has been marked as a duplicate of this bug. ***
Comment 6 Alvaro Segura 2020-05-09 23:29:49 UTC
Created attachment 160582 [details]
Comparison of MSWord vs Writer rendering

This screenshot shows the issue. It also highlights the importance of keeping those tabs in documents like this one, which try to keep a proper alignment of numbers and titles (a style so nicely done by LaTeX, for example). This is IMHO more important than it seems.

The style is kept upon loading the file, but is lost when updating the TOC.


Here is a possible hint:

My guess is that Writer's and Word's TOC systems are different (Writer's is more sophisticated, I'd say). In Writer one can explicitly define that a TAB must exist between number and title. However, Word does not specify anything about the separation of number and title. What Word does, I think, is use the same format used in the numbering of headings in the document. If the document has TABS after section numbers then the TOC will have TABS, too.

That is defined in the format of the multilevel numbering scheme: open "Define new multilevel list", hit "More >>" to expand the dialog to more options, and select an option for "Number followed by:". It can be TAB, SPACE or NOTHING, and this affects both the headings throughout the document and the Table of Contents.

Then, to achieve the same results, Writer could look at the existing format of multilevel heading numbering (which is correctly read) and add this TAB to TOC elements if the multilevel numbering uses TABS.
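
To illustrate the hint above, here is a rough Python sketch (not Writer's import code). In a DOCX, the "Number followed by:" setting is stored in word/numbering.xml as the w:suff element of each list level, with values "tab", "space" or "nothing"; when w:suff is absent, the level defaults to a tab. The function names level_suffixes and toc_needs_tab and the SAMPLE fragment are made up for this example, and the sketch ignores per-list w:lvlOverride entries.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"


def level_suffixes(numbering_xml):
    """Map (abstractNumId, ilvl) -> "tab", "space" or "nothing".

    Hypothetical helper: reads only w:abstractNum definitions.
    """
    root = ET.fromstring(numbering_xml)
    result = {}
    for abstract in root.iter(W + "abstractNum"):
        abs_id = abstract.get(W + "abstractNumId")
        for lvl in abstract.findall(W + "lvl"):
            ilvl = int(lvl.get(W + "ilvl"))
            suff = lvl.find(W + "suff")
            # A level without w:suff is followed by a tab by default.
            value = suff.get(W + "val") if suff is not None else "tab"
            result[(abs_id, ilvl)] = value
    return result


def toc_needs_tab(suffixes, abs_id, ilvl):
    """True if the ToC entry for this level should get a tab after the number."""
    return suffixes.get((abs_id, ilvl)) == "tab"


# Made-up numbering.xml fragment: level 0 has no w:suff, level 1 uses a space.
SAMPLE = """<w:numbering xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:abstractNum w:abstractNumId="0">
    <w:lvl w:ilvl="0"><w:numFmt w:val="decimal"/><w:lvlText w:val="%1"/></w:lvl>
    <w:lvl w:ilvl="1"><w:suff w:val="space"/><w:lvlText w:val="%1.%2"/></w:lvl>
  </w:abstractNum>
</w:numbering>"""

if __name__ == "__main__":
    suffixes = level_suffixes(SAMPLE)
    for (abs_id, ilvl), value in sorted(suffixes.items()):
        print(abs_id, ilvl, value, toc_needs_tab(suffixes, abs_id, ilvl))
```

On Update index, the importer could consult such a mapping to decide whether to put a tab stop between the chapter number and the entry text for each ToC level, instead of falling back to Writer's default structure.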