Bug 158279 - TOC links lost when converting .doc to HTML (steps in comment 5)
Summary: TOC links lost when converting .doc to HTML (steps in comment 5)
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version: 6.3.0.4 release (earliest affected)
Hardware: x86-64 (AMD64) All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard: target:24.8.0 target:24.2.1 target:7....
Keywords: bibisected, bisected, filter:doc, regression
Depends on:
Blocks: TableofContents-Indexes
Reported: 2023-11-20 08:37 UTC by Juan Ferri
Modified: 2024-03-04 09:20 UTC (History)
CC List: 3 users

See Also:
Crash report or crash signature:


Attachments
Input .doc file to reproduce the issue (60.39 KB, application/msword)
2023-11-20 08:39 UTC, Juan Ferri
HTML file generated with LO 7 with TOC not working (16.50 KB, text/html)
2023-11-20 08:39 UTC, Juan Ferri
HTML file generated with LO 6 with TOC working (16.78 KB, text/html)
2023-11-20 08:40 UTC, Juan Ferri

Description Juan Ferri 2023-11-20 08:37:38 UTC
Description:
I have a .doc file (attached as TOC_not_working_in_HTML.doc) whose table of contents has navigable links in Word.
The file has 2 sections and a TOC that links to them. If you open it in Word, you can use the TOC links to navigate to each section.

I generated the .html file using a macro: soffice --headless TOC_not_working_in_HTML.doc "macro:///Batch_Conversions.Module1.UpdateAllAndConvertToHTML(html)"
The macro opens the document in LO, runs Update All, and saves the file with the "HTML (StarWriter)" filter.
If I convert it directly instead of using the macro (soffice --headless --convert-to "html:HTML (StarWriter)" TOC_not_working_in_HTML.doc), the TOC in the .html file is plain text with no links at all.
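
For scripted reproduction, the direct conversion can be wrapped in a few lines of Python. This is only a sketch: the function names and the out_dir parameter are made up for illustration, soffice is assumed to be on PATH, and --outdir is a standard soffice option that the command above does not use. Like the plain command line, it does not update the index before exporting.

```python
import subprocess


def build_convert_command(doc_path, out_dir=".", soffice="soffice"):
    """Return the argv list for a headless .doc -> HTML conversion.

    Hypothetical helper: mirrors the --convert-to command from the report
    and adds --outdir so the output location is explicit.
    """
    return [
        soffice,
        "--headless",
        "--convert-to", "html:HTML (StarWriter)",
        "--outdir", out_dir,
        doc_path,
    ]


def convert_to_html(doc_path, out_dir="."):
    """Run the conversion; raises CalledProcessError if soffice fails."""
    return subprocess.run(build_convert_command(doc_path, out_dir), check=True)
```

Calling convert_to_html("TOC_not_working_in_HTML.doc") under each LibreOffice version gives one HTML file per version to compare.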

Here is the macro pseudocode:

Sub UpdateAllAndConvertToHTML( Optional Output_Format As String )
    ' Output_Format is accepted but not used below; the filter is hard-coded.

    ' Export filter for Writer's HTML output
    Dim Filter(0) As New com.sun.star.beans.PropertyValue
    Filter(0).Name  = "FilterName"
    Filter(0).Value = "HTML (StarWriter)"

    ' Run .uno:UpdateAll on the document before exporting
    UpdateAllAndKeepFileOpen

    ' Replace the 4-character extension (".doc") with ".html"
    CurFileName = ThisComponent.getLocation()
    NewFileName = Left( CurFileName, Len( CurFileName ) - 4 ) + ".html"
    ThisComponent.storeToURL( NewFileName, Filter() )
    ThisComponent.Close( True )

End Sub

Sub UpdateAllAndKeepFileOpen
    ' …
    document   = ThisComponent.CurrentController.Frame
    dispatcher = createUnoService("com.sun.star.frame.DispatchHelper")
    dispatcher.executeDispatch(document, ".uno:SelectAll", "", 0, Array())
    dispatcher.executeDispatch(document, ".uno:UpdateAll", "", 0, Array())
    ' …

End Sub

As you can see, the TOC in TOC_not_working_in_HTML_LO_7_BAD.html is not working: the references in the TOC do not take you to the different sections.
The LO version used in this case was: LibreOffice 7.6.1.1

If I use the older version (LibreOffice 6.0.7.3), the resulting html TOC works (see TOC_not_working_in_HTML_LO_6_GOOD.html file).


Steps to Reproduce:
1. Convert the example .doc file to HTML with LibreOffice, using a similar macro or the --convert-to command directly.

Actual Results:
The TOC in the resulting HTML file does not work in LO version 7.6.1.1.

Expected Results:
In the older version (6.0.7.3), following the same procedure, the TOC works in the HTML file.


Reproducible: Always


User Profile Reset: No

Additional Info:
Version: 7.6.1.1 (X86_64) / LibreOffice Community
Build ID: c7cda394c5de06de37d8109c310df89a4d4c3a98
CPU threads: 24; OS: Linux 3.10; UI render: default; VCL: gtk3
Locale: en-US (C); UI: en-US
Calc: threaded
Comment 1 Juan Ferri 2023-11-20 08:39:14 UTC
Created attachment 190915
Input .doc file to reproduce the issue
Comment 2 Juan Ferri 2023-11-20 08:39:39 UTC
Created attachment 190916
HTML file generated with LO 7 with TOC not working
Comment 3 Juan Ferri 2023-11-20 08:40:01 UTC
Created attachment 190917
HTML file generated with LO 6 with TOC working
Comment 4 BogdanB 2023-12-09 12:13:33 UTC
Juan, please wait for others to test this case and confirm this bug, and set this as New. Thanks for reporting the bug.
Comment 5 Buovjaga 2024-01-15 15:36:33 UTC
1. Open attachment 190915
2. Right-click the index on page 3, Update index
3. File - Save As... - HTML
4. Open the saved HTML file and click on a link in the ToC. It won't do anything.
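
A quick way to check step 4 without clicking through the file is to compare the in-page links with the anchors the HTML actually defines. The following Python sketch uses only the standard library; the names AnchorCollector and check_internal_links are hypothetical, and it assumes the ToC entries are exported as <a href="#..."> links pointing at name= or id= anchors, which has not been checked against the attached files.

```python
from html.parser import HTMLParser
from urllib.parse import unquote


class AnchorCollector(HTMLParser):
    """Collect in-page link fragments and the anchors a page defines."""

    def __init__(self):
        super().__init__()
        self.fragment_links = []      # fragments used in <a href="#...">
        self.defined_anchors = set()  # id= on any tag, name= on <a>

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if attr_map.get("id"):
            self.defined_anchors.add(attr_map["id"])
        if tag == "a":
            if attr_map.get("name"):
                self.defined_anchors.add(attr_map["name"])
            href = attr_map.get("href") or ""
            if href.startswith("#") and len(href) > 1:
                # Fragments may be percent-encoded in the href
                self.fragment_links.append(unquote(href[1:]))


def check_internal_links(html_text):
    """Return (all fragment links, fragment links with no matching anchor)."""
    collector = AnchorCollector()
    collector.feed(html_text)
    collector.close()
    dangling = [frag for frag in collector.fragment_links
                if frag not in collector.defined_anchors]
    return collector.fragment_links, dangling
```

Run on the two attached HTML files, a working export should report no dangling fragments, while a broken one would show either no fragment links at all or links whose targets are missing.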

Also repro on Windows.

Bibisected with linux-64-6.3 to 8ce36e943f0e50970925b2dd77729ef6036b4a49
move some searching inside IDocumentMarkAccess
Comment 6 Commit Notification 2024-02-07 17:45:29 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/0a32def8b519461b35b1e249d71ae9961b04400a

tdf#158279 TOC links lost when converting .doc to HTML

It will be available in 24.8.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 7 Commit Notification 2024-02-09 10:15:38 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "libreoffice-24-2":

https://git.libreoffice.org/core/commit/9de81ad9460e3fdf7733d5ba740f08894ce4b1a7

tdf#158279 TOC links lost when converting .doc to HTML

It will be available in 24.2.1.

Comment 8 Commit Notification 2024-02-12 15:09:20 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "libreoffice-7-6":

https://git.libreoffice.org/core/commit/d022e6a702639d38fe296fd0e8f057c0204e42aa

tdf#158279 TOC links lost when converting .doc to HTML

It will be available in 7.6.6.

Comment 9 Commit Notification 2024-02-15 20:38:39 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "libreoffice-7-6-5":

https://git.libreoffice.org/core/commit/5976bff00ae8eceedd139fbbb2621240108a2400

tdf#158279 TOC links lost when converting .doc to HTML

It will be available in 7.6.5.

Comment 10 Juan Ferri 2024-03-04 09:20:04 UTC
Bug fixed. Thanks!