In a Writer document, use Insert->Hyperlink (Ctrl+K), check that "Protocol:" is Web, and type "www.wikipedia.org:80" into the "URL:" field. Click OK, and try to Ctrl+Click the created hyperlink to open the web page in the browser.

=> This fails. On Windows 10, it opens a "You need a new app to open this www.wikipedia.org link" message, with "Look for an app in the Microsoft Store" as the only available option. On Ubuntu, Ctrl+Click and "Open Hyperlink" from the context menu both do nothing.

Changing the URL in the dialog from "www.wikipedia.org:80" to "http://www.wikipedia.org:80" fixes this.

The problem is that the colon separating the host from the port, in the absence of a real scheme part in the URL, is treated as separating an unknown "www.wikipedia.org" scheme from the hierarchical part of the URL. This happens in INetURLObject::setAbsURIRef [1], which is called with bSmart = true and m_eSmartScheme = Http (2); the latter comes from the dialog's "Protocol:" setting.

IMO, we need to consider bSmart when trying to detect unknown schemes. There may be different approaches. The easiest would be to always use m_eSmartScheme when bSmart is true and m_eSmartScheme is not INetProtocol::NotValid. A stricter check could be that the next character after the found colon is a digit. An even stricter check could be that the sequence of characters after the colon, up to the end of the URL, a forward slash, a query separator, or a fragment separator, consists only of digits.

This would also improve URLTransformer::parseSmart() for similar cases, since it uses the same INetURLObject::setAbsURIRef call.

Stephan: what do you think?

[1] https://opengrok.libreoffice.org/xref/core/tools/source/fsys/urlobj.cxx?r=9f4aad49&mo=28507&fi=722#872
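For illustration, the strictest of the checks proposed above could look roughly like the sketch below. It is not the code in urlobj.cxx: the function name looksLikeHostPort is hypothetical, it works on a plain std::string_view rather than on the types INetURLObject uses, and it additionally assumes that at least one digit must follow the colon.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical helper, not part of INetURLObject.
// Returns true when the text after the first colon is a non-empty run of
// digits that ends at the end of the string, '/', '?' or '#'.
bool looksLikeHostPort(std::string_view url)
{
    const std::size_t colon = url.find(':');
    if (colon == std::string_view::npos || colon == 0)
        return false; // no colon, or nothing before it

    const std::size_t start = colon + 1;
    std::size_t i = start;
    while (i < url.size() && std::isdigit(static_cast<unsigned char>(url[i])))
        ++i;

    if (i == start)
        return false; // assumption: an empty port does not count

    // The digit run must end at a URL delimiter or at the end of the input.
    return i == url.size() || url[i] == '/' || url[i] == '?' || url[i] == '#';
}
```

With this sketch, "www.wikipedia.org:80" and "www.wikipedia.org:80/wiki" are accepted, while "http://www.wikipedia.org:80" is rejected because its first colon is followed by a slash. A string such as "tel:12345" would also be accepted, so a check like this only makes sense as an extra condition when bSmart is true and m_eSmartScheme is set.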
(In reply to Mike Kaganski from comment #0)
> IMO, we need to consider bSmart when trying to detect unknown schemes. There
> may be different approaches, the easiest would be to always use
> m_eSmartScheme when bSmart is true and m_eSmartScheme is not
> INetProtocol::NotValid; a more strict check could be that the next character
> after the found colon is a digit; and even stricter check could be that the
> sequence of characters after the colon till the end of URL/forward
> slash/query separator/fragment separator only consists of digits. [...]
> Stephan: what do you think?

Yes, some check that it is a mostly-valid URL starting with host:port and missing the scheme:// prefix sounds good for the bSmart case.
Mike Kaganski committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/core/commit/6b973753d407d66dfa5fda86547246c486ab7087 tdf#146754: consider xyz:123 as host:port when parsing URLs smart It will be available in 7.4.0. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.