Bug 146754 - Inserting a hyperlink without scheme and with port creates an invalid URL
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version (earliest affected): unspecified
Hardware: All All
Importance: medium normal
Assignee: Mike Kaganski
URL:
Whiteboard: target:7.4.0
Keywords:
Depends on:
Blocks:
 
Reported: 2022-01-14 07:01 UTC by Mike Kaganski
Modified: 2022-01-18 10:21 UTC

See Also:
Crash report or crash signature:


Description Mike Kaganski 2022-01-14 07:01:49 UTC
In a Writer document, use Insert->Hyperlink (Ctrl+K), check that "Protocol:" is Web, and type "www.wikipedia.org:80" into the "URL:" field. Click OK, then try to Ctrl+Click the created hyperlink to open the web page in the browser.

=> This fails (at least on Windows 10, it opens a "You need a new app to open this www.wikipedia.org link" message, with "Look for an app in the Microsoft Store" as the only available option; testing with Ubuntu, Ctrl+click and "Open Hyperlink" from context menu both do nothing).

Changing the URL in the dialog from "www.wikipedia.org:80" into "http://www.wikipedia.org:80" fixes this.

The problem is that the colon separating the host from the port, in the absence of a real scheme part in the URL, is treated as separating an unknown "www.wikipedia.org" scheme from the hierarchical part of the URL. This happens in INetURLObject::setAbsURIRef [1], which is called with bSmart = true, and m_eSmartScheme = Http (2) that comes from the dialog's "Protocol:" setting.
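
To illustrate why the host is mistaken for a scheme, here is a minimal standalone sketch, not the LibreOffice code. It assumes the generic scheme grammar from RFC 3986 (a letter followed by letters, digits, "+", "-" or "."); genericScheme is a hypothetical helper name used only for this illustration.

```cpp
#include <cctype>
#include <string>
#include <string_view>

// Hypothetical helper (not part of INetURLObject): returns the text before
// the first ':' if it matches the RFC 3986 scheme grammar
//   ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
// and an empty string otherwise.
std::string genericScheme(std::string_view url)
{
    const auto colon = url.find(':');
    if (colon == std::string_view::npos || colon == 0)
        return {};
    // The first character of a scheme must be a letter.
    if (!std::isalpha(static_cast<unsigned char>(url[0])))
        return {};
    for (std::size_t i = 1; i < colon; ++i)
    {
        const unsigned char c = static_cast<unsigned char>(url[i]);
        if (!std::isalnum(c) && c != '+' && c != '-' && c != '.')
            return {};
    }
    return std::string(url.substr(0, colon));
}
```

Under this grammar, "www.wikipedia.org" is a syntactically valid scheme name, so a parser that looks only at the text before the first colon cannot distinguish it from a real scheme such as "http".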

IMO, we need to consider bSmart when trying to detect unknown schemes. There may be different approaches: the easiest would be to always use m_eSmartScheme when bSmart is true and m_eSmartScheme is not INetProtocol::NotValid; a stricter check could be that the next character after the found colon is a digit; and an even stricter check could be that the sequence of characters after the colon till the end of URL/forward slash/query separator/fragment separator only consists of digits.
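
The strictest of these checks could look roughly like the following standalone sketch. It is not the committed patch; looksLikeHostPort is a hypothetical name, and the sketch assumes a non-empty port is required.

```cpp
#include <cctype>
#include <string_view>

// Hypothetical helper: true when the characters between the first ':' and
// the next '/', '?', '#' (or the end of the string) form a non-empty run of
// decimal digits, i.e. the input looks like "host:port[/...]".
bool looksLikeHostPort(std::string_view url)
{
    const auto colon = url.find(':');
    if (colon == std::string_view::npos || colon == 0)
        return false; // no colon, or nothing before it
    const auto end = url.find_first_of("/?#", colon + 1);
    const auto port = url.substr(
        colon + 1,
        end == std::string_view::npos ? std::string_view::npos : end - colon - 1);
    if (port.empty())
        return false; // assumption: an empty port does not count
    for (const char c : port)
    {
        if (!std::isdigit(static_cast<unsigned char>(c)))
            return false;
    }
    return true;
}
```

With this sketch, "www.wikipedia.org:80" and "www.wikipedia.org:80/wiki" are accepted, while "http://www.wikipedia.org:80" and "mailto:user@example.org" are not. An input such as "tel:12345" would also be accepted, so a check like this only makes sense in the bSmart case, where a fallback scheme is available.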

This would also improve URLTransformer::parseSmart() for similar cases, which uses the same INetURLObject::setAbsURIRef call.

Stephan: what do you think?

[1] https://opengrok.libreoffice.org/xref/core/tools/source/fsys/urlobj.cxx?r=9f4aad49&mo=28507&fi=722#872
Comment 1 Stephan Bergmann 2022-01-14 07:24:16 UTC
(In reply to Mike Kaganski from comment #0)
> IMO, we need to consider bSmart when trying to detect unknown schemes. There
> may be different approaches, the easiest would be to always use
> m_eSmartScheme when bSmart is true and m_eSmartScheme is not
> INetProtocol::NotValid; a more strict check could be that the next character
> after the found colon is a digit; and even stricter check could be that the
> sequence of characters after the colon till the end of URL/forward
> slash/query separator/fragment separator only consists of digits.
[...]
> Stephan: what do you think?

Yes, some check that it is a mostly-valid URL starting with host:port and missing scheme:// prefix sounds good for the bSmart case.
Comment 2 Commit Notification 2022-01-15 09:53:48 UTC
Mike Kaganski committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/6b973753d407d66dfa5fda86547246c486ab7087

tdf#146754: consider xyz:123 as host:port when parsing URLs smart

It will be available in 7.4.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.