Bug 148198 - Editing single hyperlink breaks it into smaller ones
Summary: Editing single hyperlink breaks it into smaller ones
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 7.2.6.2 release
Hardware: All All
Importance: medium normal
Assignee: Mike Kaganski
URL:
Whiteboard: target:7.5.0 target:7.4.0.2
Keywords:
Duplicates: 99912 112429 149949
Depends on:
Blocks:
 
Reported: 2022-03-26 08:33 UTC by Jordi
Modified: 2022-07-18 10:07 UTC
CC List: 3 users

See Also:
Crash report or crash signature:
Regression By:


Description Jordi 2022-03-26 08:33:54 UTC
As per https://ask.libreoffice.org/t/macro-to-get-list-of-hyperlinks-from-navigator/75503, when a character is added to the text of a hyperlink in Writer (by typing in the document, not through the dialog), the single link is broken up into multiple ones. Version 3 of LO doesn't have this issue; it was introduced in v4.
Comment 1 Mike Kaganski 2022-03-26 11:01:19 UTC
No repro using Version: 7.3.2.2 (x64) / LibreOffice Community
Build ID: 49f2b1bff42cfccbd8f788c8dc32c1c309559be0
CPU threads: 12; OS: Windows 10.0 Build 19044; UI render: default; VCL: win
Locale: en-US (ru_RU); UI: en-US
Calc:
Comment 2 Mike Kaganski 2022-03-26 11:22:17 UTC
Or, rather, *if* I post the *exact* steps to repro, it is not a bug.

0. Make sure that in Options/Writer/Comparison, "Random number ... - [x] Store it when changing the document" is *checked*.
1. Create a hyperlink (using Ctrl+K "Insert->Hyperlink" dialog).
2. Save the document *and reload it*.
3. Edit the text of the hyperlink by typing inside the text in Writer's main window (not through the dialog).
=> At this stage the macros show three links.
4. Save and reload
=> At this stage, Navigator also shows three links.

The reason is that the random number (enabled at step 0, and used to make document comparison more robust and useful) creates a new property on the inserted text run. That splits the previously single text run with the hyperlink into three runs: the first without the random number, the second with it, and the third without. Every separate text run is its own hyperlink (from the code's point of view).
A possible bug is that Navigator is not properly updated after step 3.
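
The run splitting described in this comment can be illustrated with a small toy model. This is only a sketch, not Writer's actual data structures: the `Run` class, the `insert_text` helper, and the sample URL and rsid value are all hypothetical, and the helper handles only an insertion strictly inside an existing run.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Run:
    """Toy text run: a span of text sharing one set of attributes (hypothetical)."""
    text: str
    url: Optional[str] = None   # hyperlink target, if any
    rsid: Optional[int] = None  # the "random number" stored on edits


def insert_text(runs: List[Run], pos: int, text: str, rsid: int) -> List[Run]:
    """Insert `text` at character offset `pos`, tagging it with `rsid`.

    The new text inherits the hyperlink of the run it lands in, but its
    different rsid forces it into a run of its own, so the host run is
    cut into a left part and a right part around it.
    """
    out: List[Run] = []
    offset = 0
    for run in runs:
        end = offset + len(run.text)
        if offset < pos < end:
            cut = pos - offset
            out.append(Run(run.text[:cut], run.url, run.rsid))
            out.append(Run(text, run.url, rsid))
            out.append(Run(run.text[cut:], run.url, run.rsid))
        else:
            out.append(run)
        offset = end
    return out


# One hyperlink stored as one run, as after the save and reload in step 2.
link = [Run("LibreOffice", url="https://www.libreoffice.org/")]
# Step 3: type one character in the middle of the link text.
edited = insert_text(link, 5, "X", rsid=12345)
for run in edited:
    print(repr(run.text), run.url, run.rsid)
```

All three resulting runs still carry the same URL; only the rsid differs. Code that counts one hyperlink per run therefore reports three links.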

A possible enhancement would be to use nested text runs instead of the non-nesting, side-by-side runs created now.

(In reply to Jordi from comment #0)
> Version 3 of LO doesn't have this issue; it was introduced in v4

It was introduced in https://git.libreoffice.org/core/+/062eaeffe7cb986255063bb9b0a5f3fb3fc8e34c in 2011 (for version 3.6); then it was made optional in version 5.0 [1].

[1] https://wiki.documentfoundation.org/ReleaseNotes/5.0#Optional_RSIDs.
Comment 3 Jordi 2022-03-26 14:57:04 UTC
Mike, 

On my clean install of 7.2.6.2 (Win10 x64), under Options/Writer/Comparison, "Random number ...", the first two options are disabled (grayed out) and "[x] Store it when changing the document" is INDEED *checked*.

So given the above, I see a bug either way. Either #1 or #2 below is a bug:

1) Editing the text of a link in Writer's main window shouldn't break up a link into multiple links.

2) Navigator is not updated to reflect what is happening, if #1 is not a bug (which it has to be; otherwise it's the strangest behaviour).

Thanks.
Comment 4 Jordi 2022-03-26 15:24:37 UTC
(In reply to Mike Kaganski from comment #2)

I just installed the latest version,

Version: 7.3.2.2 (x64) / LibreOffice Community
Build ID: 49f2b1bff42cfccbd8f788c8dc32c1c309559be0
CPU threads: 12; OS: Windows 10.0 Build 19043; UI render: Skia/Raster; VCL: win
Locale: en-AU (en_AU); UI: en-GB
Calc: CL

with the same behaviour. The option is also checked. I fail to see how this is expected behaviour and NOT a bug.
Comment 5 Mike Kaganski 2022-07-11 12:47:47 UTC
*** Bug 149949 has been marked as a duplicate of this bug. ***
Comment 6 Mike Kaganski 2022-07-11 12:48:52 UTC
Likely we need to join the adjacent text runs with the same link address into a single hyperlink.
Comment 7 Mike Kaganski 2022-07-11 13:20:02 UTC
*** Bug 149949 has been marked as a duplicate of this bug. ***
Comment 8 Mike Kaganski 2022-07-11 13:38:33 UTC
*** Bug 99912 has been marked as a duplicate of this bug. ***
Comment 9 Mike Kaganski 2022-07-11 13:40:34 UTC
*** Bug 112429 has been marked as a duplicate of this bug. ***
Comment 10 phv 2022-07-11 14:48:47 UTC
Since bug report #149949 is a duplicate of this one, I want to add my own experience. In my case, the comparison option is disabled and multiple character styles were previously applied to the hyperlink, resulting in the link being subdivided on save.

As explained in my bug report, Word manages to save the same document in ODT format without subdividing the link by character style. Once saved in Word and reopened in Writer, the document still displays a single link with several character styles. Two test documents are attached to the above-mentioned report.

While Writer's behavior does not affect the functionality of the document, it is unacceptable, because a single link that appears as such when it is saved should remain so. Altering the link without notification overrides the user's work. The readability of the links in the Navigator and their management are both strongly compromised.
Comment 11 Mike Kaganski 2022-07-11 15:52:18 UTC
FTR:

Opening attachment 181227 from bug 149949 in Writer gives a single hyperlink. Saving to ODT splits the hyperlink again, but saving to DOCX instead keeps a single hyperlink.

It seems that the relevant DOCX changes were commits 46b6bad7db21f3743a26b328f23e5d66f8211bb8, 482cdf173eca106848672acfe4923faf4584b1a7, and f176c9ba7be7f3051a52b9f57b56124038c0cfd6.

The related ODF export code is XMLTextParagraphExport::exportTextRange in xmloff/source/text/txtparae.cxx.
Comment 12 Mike Kaganski 2022-07-13 09:28:06 UTC
https://gerrit.libreoffice.org/c/core/+/137013 will merge adjacent identical hyperlinks on save to ODF. Unfortunately, I found no easy way to get the true hyperlink boundaries at the level where ODF is exported; this means that with that change, it will be impossible to save two adjacent *but separate* identical links to ODF (they will merge on save, the opposite of what happens now). This has the nice (?) property of healing already incorrectly split hyperlinks (note that this will happen *on save*, so don't expect to see merged hyperlinks immediately on opening old "broken" files in a new version with that change, when/if it gets merged).

My hope is that having two adjacent fully identical hyperlinks is a highly unlikely scenario, much less frequent than a single hyperlink consisting of differently formatted (or otherwise internally different) parts.
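
The merge-on-export idea, and the trade-off it implies, can be sketched in a few lines. This is an illustration only, not the code of the Gerrit change: the `export_links` function is hypothetical, runs are modelled as plain `(text, url)` tuples, and only the URL is compared, whereas the real export code would have to compare all of the hyperlink data.

```python
from itertools import groupby
from typing import List, Optional, Tuple

# Toy model (hypothetical): a run is (text, hyperlink URL or None).
RunT = Tuple[str, Optional[str]]


def export_links(runs: List[RunT]) -> List[Tuple[Optional[str], List[str]]]:
    """Group adjacent runs with the same URL under one link element.

    Each result entry is (url, [run texts]). The individual runs survive
    as children, so per-run formatting could still be written out
    inside a single link.
    """
    return [
        (url, [text for text, _ in group])
        for url, group in groupby(runs, key=lambda run: run[1])
    ]


URL = "https://www.libreoffice.org/"

# A link that was split into three runs by an edit: exported as one link.
split_link = [("See ", None), ("Libre", URL), ("X", URL), ("Office", URL)]
print(export_links(split_link))

# The trade-off: two deliberately separate but identical adjacent links
# cannot be told apart at this level, so they are merged as well.
two_links = [("first", URL), ("second", URL)]
print(export_links(two_links))
```

Grouping by equality of neighbouring runs is what heals previously split links, and it is also exactly why two adjacent identical links can no longer be kept apart.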
Comment 13 Jordi 2022-07-13 11:59:15 UTC
Hmm... I don't think the problem is in the part of the code involved in saving the document, because the following test case shows the breakage occurring during editing. For example:

1. Create a blank file; don't save it.
2. Disable autosave just in case.
3. Insert some dummy text and create a link.
4. Insert a character in the middle (deleting doesn't trigger it), typing in Writer rather than using the dialog.

The code linked in the original post of this bug will now return multiple links, yet the file has not been saved.

HTH
Comment 14 Mike Kaganski 2022-07-13 12:09:12 UTC
(In reply to Jordi from comment #13)
> The code linked in the original post of this bug

Which specific code? The Ask question has at least two pieces of code, one of which is getNextHyperlink, which uses createEnumeration on paragraph objects to get access to the paragraph's text runs. Indeed, adding text somewhere in the middle of an existing document adds a new text run with a new rsid, as explained in comment 2. It is also mentioned in the commit message of the change from comment 12, where I wrote that in the place where ODF is written, we can't get access to the actual hyperlink boundaries, and thus I had to check each text run's hyperlink data to find identical data and merge them.

Another piece of code from the Ask question (by KamilLanda) uses the Navigator frame directly. It shows that the hyperlink itself is not split. By the way, during the work on the patch, I discovered that the assumed bug of Navigator not showing the updated hyperlink count is not a bug, because the hyperlink itself is not split during the edit.
Comment 15 Jordi 2022-07-13 12:29:34 UTC
(In reply to Mike Kaganski from comment #14)

Apologies, I should have reread the whole thread. Comment 12, upon a quick read, sounded like you were saying it occurred during saving.

Just ignore my previous message.
Comment 16 Mike Kaganski 2022-07-13 12:50:12 UTC
(In reply to Jordi from comment #15)
> Comment 12 upon quick read sounded like you were saying it occurred during saving.

I was, actually. Yes, "it" happens during saving. The question is: what is "it"?
Specifically, splitting the hyperlink at text run boundaries happens during saving. Let me state it again: when you insert something in your example, *a new separate text run* is created, so the *old* bigger text run gets split; but the *hyperlink* does not get split at this point in Writer's internals (yet).

OTOH, when you use createEnumeration on paragraph objects, you don't get a list of hyperlinks, but a list of text runs. There is XAccessibleHypertext [1], which you may use to get the correct hyperlink count and the hyperlinks' boundaries.

[1] https://api.libreoffice.org/docs/idl/ref/interfacecom_1_1sun_1_1star_1_1accessibility_1_1XAccessibleHypertext.html
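
For macros that keep using the text-run enumeration, one workaround is to count groups of adjacent runs that share a link target rather than counting runs. The following is a minimal Python macro sketch, not a tested recipe: `iter_portion_urls` and `count_links` are hypothetical helper names, and the code assumes each text portion exposes the `TextPortionType` and `HyperLinkURL` properties. It does not use XAccessibleHypertext, and, like the ODF export change, it cannot tell two adjacent identical links apart.

```python
from itertools import groupby


def iter_portion_urls(doc):
    """Yield (text, url) for every text portion of a Writer document.

    `doc` is assumed to be a Writer document model obtained through
    PyUNO; the url is empty for portions that are not part of a link.
    """
    paragraphs = doc.getText().createEnumeration()
    while paragraphs.hasMoreElements():
        paragraph = paragraphs.nextElement()
        # Tables are enumerated here too; skip anything that is not a paragraph.
        if not paragraph.supportsService("com.sun.star.text.Paragraph"):
            continue
        portions = paragraph.createEnumeration()
        while portions.hasMoreElements():
            portion = portions.nextElement()
            if portion.TextPortionType == "Text":
                yield portion.getString(), portion.HyperLinkURL
        # Paragraph boundary marker, so links never merge across paragraphs.
        yield "", None


def count_links(portions):
    """Count maximal groups of adjacent portions sharing a non-empty URL."""
    return sum(1 for url, _ in groupby(portions, key=lambda p: p[1]) if url)
```

Inside a LibreOffice Python macro this could be called as `count_links(iter_portion_urls(XSCRIPTCONTEXT.getDocument()))`. A link split into three runs by an edit is then counted once.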
Comment 17 Commit Notification 2022-07-14 05:43:36 UTC
Mike Kaganski committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/9f4af852c4050d45bb5ab314480fc83639bea90a

tdf#148198: merge identical hyperlinks of adjacent text ranges on ODF export

It will be available in 7.5.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 18 phv 2022-07-15 10:45:25 UTC
Since we can't say it enough, thank you Mike for fixing this bug in such a short time. Now my documents, saved and then exported to PDF, no longer display several successive hyperlinks for the same target.

However, I notice that LibreOffice still has trouble combining attributes between adjacent elements (as in #146458), but that's another issue.
Comment 19 Commit Notification 2022-07-18 10:07:35 UTC
Mike Kaganski committed a patch related to this issue.
It has been pushed to "libreoffice-7-4":

https://git.libreoffice.org/core/commit/c50fbdceafdd4b857954f098e38cae03e8bc6064

tdf#148198: merge identical hyperlinks of adjacent text ranges on ODF export

It will be available in 7.4.0.2.
