Bug 147806

Summary: Dummy bookmarks generated when importing .doc files
Product: LibreOffice Reporter: Eyal Rozenberg <eyalroz1>
Component: filters and storage Assignee: Not Assigned <libreoffice-bugs>
Status: NEW ---    
Severity: normal CC: buzea.bogdan, ilmari.lauhakangas, telesto
Priority: medium Keywords: filter:doc
Version: 3.5.0 release   
Hardware: All   
OS: All   
Whiteboard:
Crash report or crash signature: Regression By:
Bug Depends on:    
Bug Blocks: 108288    
Attachments: Document exhibiting the bug
Older version in .doc format

Description Eyal Rozenberg 2022-03-06 16:52:45 UTC
When importing a .doc document, it seems multiple bookmarks are generated, named _RefXXXXXXXX, where XXXXXXXX is a long number (8 or 9 digits).
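For anyone triaging a document's bookmark list, the naming pattern described above can be matched mechanically. The following is only a minimal sketch: it assumes the generated names are exactly "_Ref" followed by 8 or 9 digits, as observed in this report, and the function name and the sample names are hypothetical, not part of LibreOffice.

```python
import re

# Assumption: generated names are "_Ref" followed by 8 or 9 digits,
# as observed in this report. Other digit counts are not matched.
DUMMY_BOOKMARK = re.compile(r"_Ref\d{8,9}")


def split_bookmarks(names):
    """Separate names matching the _RefXXXXXXXX pattern from all others."""
    dummy = [n for n in names if DUMMY_BOOKMARK.fullmatch(n)]
    other = [n for n in names if not DUMMY_BOOKMARK.fullmatch(n)]
    return dummy, other


# Hypothetical bookmark names, for illustration only.
dummy, other = split_bookmarks(
    ["_Ref12345678", "Introduction", "_Ref123456789", "_Ref1234"]
)
```

With the sample list, the two names with 8 and 9 digits land in `dummy`, while "Introduction" and the too-short "_Ref1234" land in `other`.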

I'm assuming these bookmarks have to do with the targets of references in the original document - but I'm not even sure.

Anyway, this doesn't seem right. References are references, bookmarks are bookmarks, and they should not be mixed up. 

Also, the original references are, more often than not, targeting numbered items/paragraphs, headings, actual bookmarks present in the Word document, or other similar targets. In those cases, I don't see any excuse for creating artificial bookmarks for the reference targets (and multiple duplicate ones to boot).

Seeing this with:
Version: 7.4.0.0.alpha0+ / LibreOffice Community
Build ID: fb9270b238cba4f36e595c5d7f4d85f6f3f18e1c
CPU threads: 4; OS: Linux 5.10; UI render: default; VCL: gtk3
Locale: en-IL (en_IL); UI: en-US

... but actually I had imported the .doc file with an earlier nightly of 7.4.0.0 from several weeks ago.
Comment 1 Telesto 2022-03-06 22:28:30 UTC
Please add an example file illustrating the behaviour
Comment 2 Eyal Rozenberg 2022-03-21 15:18:24 UTC
Created attachment 179010 [details]
Document exhibiting the bug

As per @telesto's request, I'm attaching a document with most of its contents removed. The remaining bookmarks (originally there were dozens and dozens) seem to correspond to cross-references and possibly their targets.
Comment 3 Buovjaga 2022-12-21 11:05:00 UTC
(In reply to Eyal Rozenberg from comment #2)
> Created attachment 179010 [details]
> Document exhibiting the bug
> 
> As per @telesto's request, I'm attaching a document with most of its
> contents removed. The remaining bookmarks (originally there were dozens and
> dozens) seem to correspond to cross-references and possibly their targets.

The attachment is an .odt file. Can you attach the original .doc? I get it that you might need to sanitise it in MS Office.
Comment 4 Eyal Rozenberg 2022-12-21 22:11:45 UTC
Created attachment 184302 [details]
Older version in .doc format

So, this is probably not the exact origin of the document I've already attached, but it's almost that. And when we open it in LO, we see these arbitrary-number-named refs.
Comment 5 QA Administrators 2022-12-22 03:36:29 UTC Comment hidden (obsolete)
Comment 6 Buovjaga 2022-12-22 07:25:07 UTC
Confirmed already in 3.5.0. I opened the file in MSO 365 (converted to new format), but I don't know how to see the references. There are no indicators even in the document text that they would exist.