Bug 145381 - Hyperlinks losing final Bracket (Writer, Impress, etc)
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version (earliest affected): Inherited From OOo
Hardware: All All
Importance: medium normal
Assignee: Armin Le Grand
URL:
Whiteboard: target:7.4.0 target:7.3.0.0.beta2
Keywords:
Duplicates: 141104
Depends on:
Blocks: Hyperlink
Reported: 2021-10-29 13:08 UTC by Alexander
Modified: 2021-12-30 10:32 UTC

See Also:
Crash report or crash signature:


Description Alexander 2021-10-29 13:08:16 UTC
Description:
Copy a URL into an open document (Writer, Impress, ...) and press Enter, which turns it into a blue hyperlink. The URL must have a closing bracket ")" as its last character. The resulting hyperlink is missing the final ")": the bracket is shown as normal (black) text and is not part of the hyperlink. You can check this with Copy Link Address [I am guessing the English name, as I use the German UI].
I copied lots of Wikipedia article links [e.g. "https://en.wikipedia.org/wiki/Rank_(linear_algebra)"] and the bug shows up often (about 100%).
It happens in Version 7.2.2.2 (Win10, x64) and 6.1.5.2 on Raspberry-Pi-400.

Steps to Reproduce:
1. Copy an affected URL (e.g. "https://en.wikipedia.org/wiki/Rank_(linear_algebra)") into an open document (e.g. Writer or Impress) and press Enter.
2. Observe that the final bracket is not part of the link (blue vs. black color) and is missing in Copy Link Address (or Edit Link).
3. -

Actual Results:
final ")" char is missing in Hyperlink object, e.g. here we have "https://en.wikipedia.org/wiki/Rank_(linear_algebra" plus a separate ")" as text.

Expected Results:
final ")" should stay part of Hyperlink/URL.


Reproducible: Always


User Profile Reset: No



Additional Info:
[Information automatically included from LibreOffice]
Locale: de
Module: PresentationDocument - but also Writer (likely all others).
[Information guessed from browser]
OS: Windows 10 (All) - but also Raspberry-PI-400-OS.
OS is 64bit: yes
Comment 1 Alexander 2021-10-29 13:39:15 UTC
Version: 7.2.2.2 (x64) / LibreOffice Community
Build ID: 02b2acce88a210515b4a5bb2e46cbfb63fe97d56
CPU threads: 8; OS: Windows 10.0 Build 19043; UI render: Skia/Raster; VCL: win
Locale: de-DE (de_DE); UI: de-DE
Calc: threaded
Comment 2 Mike Kaganski 2021-10-29 15:21:32 UTC
Repro in 7.2.2.2, and also in OOo 3.2.0.

(Funny thing is how e.g. Bugzilla also has the same flaw.)

FTR: the normative reference of interest here is RFC3986 Appendix C "Delimiting a URI in Context" [1], which specifically describes how to tell URLs from the rest of the text. Unfortunately, it does not discuss parentheses specifically; but since parentheses are valid characters in URLs, it seems natural to not exclude them when delimiting an URL.

[1] https://datatracker.ietf.org/doc/html/rfc3986#appendix-C
Comment 3 Michael Warner 2021-10-29 19:11:14 UTC
This is a duplicate of bug 141104.

Also consider the discussion in bug 113526. 

Putting a whole URL in parens or as part of a subexpression (like this: https://www.example.com) is a common use case and the trailing paren shouldn't be part of the link then, but IMHO if there is an opening paren inside the URL then the matching closing one should be added.
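
The heuristic suggested above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code of any LibreOffice patch; the function name trim_unmatched_closing_parens is hypothetical, and the sketch assumes the candidate URL has already been cut out of the surrounding text, possibly with trailing ")" characters attached.

```python
def trim_unmatched_closing_parens(candidate: str) -> str:
    """Drop trailing ')' characters that have no matching '(' in the candidate.

    Hypothetical helper: a trailing ')' is kept only while the number of
    ')' characters does not exceed the number of '(' characters.
    """
    url = candidate
    while url.endswith(")") and url.count(")") > url.count("("):
        url = url[:-1]
    return url


# The Wikipedia URL from the description keeps its closing bracket.
print(trim_unmatched_closing_parens(
    "https://en.wikipedia.org/wiki/Rank_(linear_algebra)"))

# A URL written inside parentheses loses the bracket that belongs to the text.
print(trim_unmatched_closing_parens("https://www.example.com)"))
```

Counting brackets is a deliberately simple choice; it does not check nesting order, which a real autocorrect implementation might want to do.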
Comment 4 Michael Warner 2021-10-29 19:12:18 UTC
*** Bug 141104 has been marked as a duplicate of this bug. ***
Comment 5 Armin Le Grand 2021-12-09 15:22:44 UTC
Taking a look...
Comment 6 Armin Le Grand 2021-12-09 16:22:30 UTC
While I started to look, just a hint since I am curious:
In the original description here, the closing bracket is already not part of the link offered after '1.'. If I hover over it, use copy_link from the context menu and paste it directly here in the textbox, I get:

https://en.wikipedia.org/wiki/Rank_(linear_algebra

This may be related to Firefox, or bugzilla, or the original text having been formatted like this already - many possibilities...

If I select with the mouse ensuring that the right closing bracket is included, I get the expected paste here in this edit window. All in the browser, no LibreOffice involved (yet)...
Comment 7 Armin Le Grand 2021-12-09 16:44:07 UTC
To get it reproduced, I have to copy below link using mouse & including the closing bracket. Pasting shows described behavior in all apps.

Also tried: CTRL-click on the link in the description here (in the browser) to open a new tab. In that tab, the URL is missing the ')', too, and shows a 'Did you mean...' text, so it does not find the page.
If I edit the URL in the URL line of the browser and add ')', the correct page is found.
I did that to allow me to copy the URL from the browser as URL-type (maybe/should be a clipboard format) to try LO with that -> result pretty much the same, but RETURN is needed in SW/SD to get it detected as a link. In SC it just gets pasted including the closing bracket, but does not get detected as a URL (no blueish color).
-> This shows that the behavior, and thus the code, is not really equal everywhere.

Also strange: In all apps I can right-click on that closing bracket and get the URL dialog, so I *can* do 'Edit URL', then indeed with just the closing bracket being in the dialog as URL. Very strange...

Will now look in the debugger at what happens here when the URL gets pasted...
Comment 8 Armin Le Grand 2021-12-09 17:08:24 UTC
Will concentrate on Writer 1st. Three URLs checked:

(1) the one from this task
https://bugs.documentfoundation.org/show_bug.cgi?id=145381
-> gets inserted, interpreted as URL after return

(2) the one from description here, copied using context-menu & Copy_Link
-> gets inserted, interpreted as URL after return

(3) the fully copied one, including the closing bracket:
This one behaves differently, depending on how I copy it from Firefox:
(a) if using context menu "copy", it immediately gets interpreted as a link; the closing bracket is there but separated. An extra return is inserted and the cursor is one line deeper. So there are two returns, one after the URL and one extra.
(b) if using context menu "copy link", I have to press return myself to get it interpreted, or a space. If I type a closing bracket, it does not get interpreted as a link; the same goes for many other characters, it seems to need whitespace.

Still strange, but will concentrate on 3a, that is closest to the description. Maybe there is more to be found.
Comment 9 Armin Le Grand 2021-12-14 13:52:06 UTC
Identified one of the problems (call it problem (A)), fix is at https://gerrit.libreoffice.org/c/core/+/126832...
Comment 10 Commit Notification 2021-12-15 11:27:44 UTC
Armin Le Grand (Allotropia) committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/76f29376183be48c076ada06159581ea981de3d1

tdf#145381 handle closing brackets in URLs correctly

It will be available in 7.4.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 11 Commit Notification 2021-12-15 12:34:40 UTC
Armin Le Grand (Allotropia) committed a patch related to this issue.
It has been pushed to "libreoffice-7-3":

https://git.libreoffice.org/core/commit/fd06b1b2689d4189fd94beade3983af4acc5ffc3

tdf#145381 handle closing brackets in URLs correctly

It will be available in 7.3.0.0.beta2.

Comment 12 Armin Le Grand 2021-12-16 10:34:25 UTC Comment hidden (obsolete)
Comment 13 Armin Le Grand 2021-12-16 10:36:17 UTC Comment hidden (obsolete)
Comment 14 Armin Le Grand 2021-12-16 10:37:34 UTC Comment hidden (obsolete)
Comment 15 Armin Le Grand 2021-12-16 10:38:43 UTC Comment hidden (obsolete)
Comment 16 Mike Kaganski 2021-12-16 10:53:09 UTC
(In reply to Armin Le Grand from comment #15)

Please see comment 2, where the issue with *Bugzilla* linkifier is confirmed ;-)
Comment 17 Armin Le Grand 2021-12-16 13:21:49 UTC Comment hidden (obsolete)
Comment 18 Armin Le Grand 2021-12-16 13:22:47 UTC Comment hidden (obsolete)
Comment 19 Armin Le Grand 2021-12-16 15:16:32 UTC Comment hidden (obsolete)
Comment 20 Buovjaga 2021-12-16 15:44:21 UTC
You are hitting an issue in Bugzilla's regexp: https://bugzilla.mozilla.org/show_bug.cgi?id=663299

Workaround: https://en.wikipedia.org/wiki/Rank_%28linear_algebra%29
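
The workaround replaces the parentheses with their percent-encoded forms (%28 and %29), so a linkifier that stops at ")" no longer sees one. A minimal Python sketch of that substitution follows; the helper name encode_parens is hypothetical, and only the two bracket characters are touched.

```python
def encode_parens(url: str) -> str:
    """Percent-encode '(' and ')' and leave everything else unchanged.

    Hypothetical helper for illustration; it does not validate the URL.
    """
    return url.replace("(", "%28").replace(")", "%29")


print(encode_parens("https://en.wikipedia.org/wiki/Rank_(linear_algebra)"))
```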
Comment 21 Armin Le Grand 2021-12-17 09:29:30 UTC
Okay, I found a way to get a correct URL to my clipboard:

Use Thunderbird, new email, add some returns to get space, add the URL in between, and correct it by adding the closing bracket. It will put up some 'resistance' and add the closing bracket as a 2nd link - argh. Still, using context menu and edit-link (which also opens on double-click, we should do that, too?) I managed to get a correct URL. On that URL, select by dragging the mouse, then context menu & copy.

Open SW, paste...

Now at SwTransferable::PasteFileContent you get:
   nFormat = SotClipboardFormatId::HTML
and
   0x55555afc47a0 "<meta http-equiv=\"content-type\" content=\"text/html; charset=utf-8\"><a moz-do-not-send=\"true\" href=\"https://en.wikipedia.org/wiki/Rank_(linear_algebra)\">https://en.wikipedia.org/wiki/Rank_(linear_algebra)</a>"

as content. That gets parsed correctly by HTMLParser and creates a correct link in LO; CTRL-click works.

Conclusion:
=> All errors of case (B) happen due to other software feeding HTML with error(s) in it in clipboard
=> I can confirm now that HTMLParser & LO work correctly
=> No further fix needed, but maybe information that this is caused by other SW having the same/a similar bug
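
To illustrate the conclusion: when the clipboard HTML already carries the full href, any HTML parser can recover the link including the closing bracket. The sketch below uses Python's standard html.parser, not LibreOffice's HTMLParser; the names LinkCollector and extract_hrefs are hypothetical, and the input string is the clipboard content quoted above.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def extract_hrefs(html: str) -> list:
    """Return all link targets found in the given HTML fragment."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.hrefs


# Clipboard content as reported from SwTransferable::PasteFileContent above.
CLIPBOARD_HTML = (
    '<meta http-equiv="content-type" content="text/html; charset=utf-8">'
    '<a moz-do-not-send="true" '
    'href="https://en.wikipedia.org/wiki/Rank_(linear_algebra)">'
    'https://en.wikipedia.org/wiki/Rank_(linear_algebra)</a>'
)

print(extract_hrefs(CLIPBOARD_HTML))
```

If the source application had written the href without the final ")", the same parser would faithfully return the truncated link, which is the situation described for case (B).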
Comment 22 Buovjaga 2021-12-17 09:35:42 UTC
I verify the fix

Arch Linux 64-bit
Version: 7.4.0.0.alpha0+ / LibreOffice Community
Build ID: ab4ee55a2a03ce93debcda41d817a95517a711f0
CPU threads: 8; OS: Linux 5.15; UI render: default; VCL: kf5 (cairo+xcb)
Locale: fi-FI (fi_FI.UTF-8); UI: en-US
Calc: threaded
Built on 16 December 2021
Comment 23 Alexander 2021-12-17 09:45:26 UTC
But why does this happen also it link is copied from notepad.exe?
There is no extra hidden data then (e.g. font style, etc).
Perhaps I get something wrong, but at least I wonder.
Anyway: thanks for your work and happy x-mas.
Comment 24 Alexander 2021-12-17 09:45:58 UTC
typo: it => if
Comment 25 Buovjaga 2021-12-17 09:57:11 UTC
(In reply to Alexander from comment #23)
> But why does this happen also it link is copied from notepad.exe?
> There is no extra hidden data then (e.g. font style, etc).
> Perhaps I get something wrong, but at least I wonder.
> Anyway: thanks for your work and happy x-mas.

He fixed the issue two days ago and was talking about something else. You can test the fix with Win-x86_64@tb77-TDF from https://dev-builds.libreoffice.org/daily/master/current.html
Comment 26 BogdanB 2021-12-23 11:51:09 UTC
I tested the bug in 
Version: 7.3.0.1.0+ (x64) / LibreOffice Community
Build ID: 821e5733ce2149544fb6ff0b3d39923340f93fa7
CPU threads: 4; OS: Windows 10.0 Build 19044; UI render: Skia/Raster; VCL: win
Locale: ro-RO (ro_RO); UI: en-US
Calc: threaded

If I press Enter after the link, the final bracket is included in the link and it works well.