Created attachment 121223 [details]
Spreadsheet with user-defined Basic functions
OK:       ConvertToURL("C:\Foo Bar") --> file:///C:/Foo%20Bar
Fail:     ConvertToURL("C:\Foo&Bar") --> file:///C:/Foo&Bar
Expected: ConvertToURL("C:\Foo&Bar") --> file:///C:/Foo%26Bar
Problem occurs with characters
! $ & ' ( ) * + , / : = @
ConvertToURL correctly encodes the following characters:
<space> " # % ; ? [ \ ] { | }
Reference:
https://tools.ietf.org/html/rfc3986#page-12
Same with PyUNO:
>>> s = '/foo & bar/concept.odt'
>>> uno.systemPathToFileUrl(s)
'file:///foo%20&%20bar/concept.odt'
What is the correct procedure to test this bug? I ran the Path2URL macro and it said A Scripting Framework error occurred while running the Basic script Standard.Module1.Path2URL. Message: wrong number of parameters! Set to NEEDINFO. Change back to UNCONFIRMED after you have provided the information.
The functions are called from the spreadsheet cells. Look at the cell formulas.
Sorry about that, I guess the concept was too alien to me. Confirmed.
Win 7 Pro 64-bit Version: 5.2.0.0.alpha0+
Build ID: 917d59a84124d1022bd1912874e7a53c674784f1
CPU Threads: 4; OS Version: Windows 6.1; UI Render: default;
TinderBox: Win-x86@62-merge-TDF, Branch:MASTER, Time: 2015-12-12_12:17:04
Locale: fi-FI (fi_FI)
Thanks for filing. IMO an enhancement ?
(In reply to Cor Nouws from comment #5)
> Thanks for filing. IMO an enhancement ?
No enhancement. One of my Basic macros truncated valid file paths when I was assuming that function convertToURL is able to deal with valid paths like
C:/foo & bar/document.odt
Since Basic does not even provide basic string functions helping to implement my own URL encoding, I start my code with an error message and program exit when the following function returns True:
Function IsBadURL(sURL$) As Boolean
    Dim a(), s$, nSplit&, sPN$
    nSplit = 9 'file:///
    If getGUIType = 1 then nSplit = 11 'file:///C:
    sPN = mid(sURL, nSplit)
    a() = Array("!", "$", "&", "'", "(", ")", "*", "+", ",", ":", "=", "@")
    for each s in a()
        if instr(sPN, s) > 0 then
            IsBadURL = True
            exit function
        endif
    next
End Function
OK.
Weird work-around for the Basic fanboys among us:
Function WorkAround_ConvertToURL(sysPath) As String
    Dim oSrv, a(), sURL$, nSplit&, sCode$, sProtocol$, sPathName$, sChar$, sRoot$
    sURL = convertToURL(sysPath)
    a() = Array("!", "$", "&", "'", "(", ")", "*", "+", ",", ":", "=", "@")
    If getGUIType = 1 then
        nSplit = 11
    else
        nSplit = 8
    endif
    sProtocol = left(sURL, nSPlit)
    sPathName = mid(sURL, nSplit +1)
    oSrv = createUnoService("com.sun.star.sheet.FunctionAccess")
    For each sChar in a()
        if instr(sPathName, sChar) > 0 then
            sCode = "%"& hex(asc(sChar))
            sPathName = oSrv.callFunction("SUBSTITUTE", Array(sPathName, sChar, sCode))
        endif
    next
    WorkAround_ConvertToURL = sProtocol & sPathName
End Function
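For comparison, the same post-processing can be sketched in Python, the language of the PyUNO example in comment 1. This is only an illustration and not part of any attachment: encode_extra and EXTRA are hypothetical names, and the input is assumed to be a file URL that ConvertToURL or uno.systemPathToFileUrl has already returned.

```python
# Characters replaced by the Basic workaround (copied from its Array).
EXTRA = "!$&'()*+,:=@"


def encode_extra(file_url: str, windows: bool = False) -> str:
    """Percent-encode the EXTRA characters that follow the URL prefix.

    Hypothetical helper: the split positions mirror the Basic code,
    11 keeps 'file:///C:/' and 8 keeps 'file:///'.
    """
    split = 11 if windows else 8
    prefix, path = file_url[:split], file_url[split:]
    encoded = "".join(
        "%{:02X}".format(ord(ch)) if ch in EXTRA else ch
        for ch in path
    )
    return prefix + encoded
```

Like the Basic version, this assumes a one-letter drive on Windows and no host part in the URL.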
Created attachment 121619 [details]
bt with debug symbols
After some tests with gdb, it seems the decision whether to escape is made here:
http://opengrok.libreoffice.org/xref/core/tools/source/fsys/urlobj.cxx#541
inline bool mustEncode(sal_uInt32 nUTF32, INetURLObject::Part ePart)
{
    return !rtl::isAscii(nUTF32) || !(aMustEncodeMap[nUTF32] & ePart);
}
(gdb indicates ePart=INetURLObject::PART_PCHAR)
which then leads to:
http://opengrok.libreoffice.org/xref/core/tools/source/fsys/urlobj.cxx#441
a 128-element array.
If I apply this patch:
diff --git a/tools/source/fsys/urlobj.cxx b/tools/source/fsys/urlobj.cxx
index 771c3bb..bf279f16f 100644
--- a/tools/source/fsys/urlobj.cxx
+++ b/tools/source/fsys/urlobj.cxx
@@ -442,7 +442,7 @@ static sal_uInt32 const aMustEncodeMap[128]
     = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 /*   */ PP,
-/* ! */ PA +PD+PE+PF+PG+PH+PI+PJ+PK+PL+PM+PN+PO+PP+PQ+PR,
+/* ! */ PA +PD+PE+PF+PG+PH+PI+PJ+PK   +PM+PN+PO+PP+PQ+PR,
 /* " */ PM+PN +PP,
 /* # */ PM,
 /* $ */ PA +PD+PE+PF+PG+PH+PI+PJ+PK+PL+PM+PN+PO+PP+PQ+PR,
"!" is escaped. However, I don't know the impact of this change on other parts of LO.
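The lookup quoted above is a per-character bitmask test: a character is left alone only if its table entry carries the bit of the URL part being written. A toy Python model of that test shows why dropping a single flag from the "!" row flips the result. The flag names and values and the three-entry table are hypothetical stand-ins; the real aMustEncodeMap has 128 entries and many more part flags.

```python
from enum import IntFlag


class Part(IntFlag):
    # Hypothetical stand-ins for two of the PA..PR flags; not the real values.
    PCHAR = 1
    OTHER = 2


# Toy table: the parts in which a character may appear unencoded.
ALLOWED = {
    "a": Part.PCHAR | Part.OTHER,
    "!": Part.PCHAR | Part.OTHER,  # row before the patch
    " ": Part.OTHER,
}

# Row after the patch: the PCHAR bit is removed for "!".
PATCHED = dict(ALLOWED)
PATCHED["!"] = Part.OTHER


def must_encode(ch: str, part: Part, table=ALLOWED) -> bool:
    """Model of mustEncode(): non-ASCII, or the part's bit is not set."""
    if ord(ch) >= 128:
        return True
    # Characters missing from this toy table are treated as "no bits set".
    return not (table.get(ch, Part(0)) & part)
```

With the unpatched table "!" passes through for Part.PCHAR; with PATCHED it has to be encoded, which matches the observation that "!" is escaped after the patch.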
** Please read this message in its entirety before responding **
To make sure we're focusing on the bugs that affect our users today, LibreOffice QA is asking bug reporters and confirmers to retest open, confirmed bugs which have not been touched for over a year.
There have been thousands of bug fixes and commits since anyone checked on this bug report. During that time, it's possible that the bug has been fixed, or the details of the problem have changed. We'd really appreciate your help in getting confirmation that the bug is still present.
If you have time, please do the following:
Test to see if the bug is still present on a currently supported version of LibreOffice (5.1.6 or 5.2.3): https://www.libreoffice.org/download/
If the bug is present, please leave a comment that includes the version of LibreOffice and your operating system, and any changes you see in the bug behavior
If the bug is NOT present, please set the bug's Status field to RESOLVED-WORKSFORME and leave a short comment that includes your version of LibreOffice and Operating System
Please DO NOT
    Update the version field
    Reply via email (please reply directly on the bug tracker)
    Set the bug's Status field to RESOLVED - FIXED (this status has a particular meaning that is not appropriate in this case)
If you want to do more to help you can test to see if your issue is a REGRESSION. To do so:
1. Download and install oldest version of LibreOffice (usually 3.3 unless your bug pertains to a feature added after 3.3) http://downloadarchive.documentfoundation.org/libreoffice/old/
2. Test your bug
3. Leave a comment with your results.
4a. If the bug was present with 3.3 - set version to "inherited from OOo";
4b. If the bug was not present in 3.3 - add "regression" to keyword
Feel free to come ask questions or to say hello in our QA chat: http://webchat.freenode.net/?channels=libreoffice-qa
Thank you for helping us make LibreOffice even better for everyone!
Warm Regards,
QA Team
MassPing-UntouchedBug-20170103
This is still in 5.2.3.
Still in
Version: 7.3.4.2 / LibreOffice Community
Build ID: 728fec16bd5f605073805c3c9e7c4212a0120dc5
CPU threads: 4; OS: Linux 5.4; UI render: default; VCL: x11
Locale: de-DE (de_DE.UTF-8); UI: en-US
Calc: threaded
Why should & be URL-encoded? It only has a special meaning in the query part, i.e. after question mark.
IMHO it is not the task of ConvertToURL() to percent-encode anything except space as %20, as that is the only character that is neither unreserved nor reserved in a URI, and which of the reserved characters are allowed depends on the actual URI scheme and protocol. See https://datatracker.ietf.org/doc/html/rfc3986#section-2 and following. It is the task of the application programmer (or another function) to know which reserved characters act as delimiters in a specific URI scheme and need to be percent-encoded.
The URI scheme is known in the function - it is 'file:' URL scheme. It should encode more characters, including # (which otherwise would be misinterpreted as start of fragment) or backslash (in non-Windows paths), which is invalid in URLs. But the characters listed in comment 0 are all normal in file URLs. Note that forward slash delimits hierarchical path parts.
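The difference between & and # is easy to see with a generic URL parser. A small sketch using Python's standard urllib.parse (split_file_url is a hypothetical helper name, and the URLs are just the examples from this report):

```python
from urllib.parse import urlsplit


def split_file_url(url: str):
    """Return (path, query, fragment) as a generic URL parser sees them."""
    parts = urlsplit(url)
    return parts.path, parts.query, parts.fragment


# '&' stays inside the path, while a raw '#' cuts the path short.
ampersand = split_file_url("file:///C:/Foo&Bar")
raw_hash = split_file_url("file:///C:/Foo#Bar")
encoded_hash = split_file_url("file:///C:/Foo%23Bar")
```

Here ampersand keeps the whole "/C:/Foo&Bar" as its path, raw_hash ends up with path "/C:/Foo" and fragment "Bar", and encoded_hash keeps "/C:/Foo%23Bar" as its path.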
(In reply to Andreas Säger from comment #0)
> Fail: ConvertToURL("C:\Foo&Bar") --> file:///C:/Foo&Bar
> Expected: ConvertToURL("C:\Foo&Bar") --> file:///C:/Foo%26Bar
This is an incorrect expectation. The URL is not required to percent-encode every non-alphanumeric character. This character - as well as the rest listed in the "Problem occurs with characters" - is OK in the URL unchanged. Not percent-encoding it is OK.
> Reference:
> https://tools.ietf.org/html/rfc3986#page-12
The reference describes *how* to percent-encode octets, and also *when*; and the latter is phrased "when that octet's corresponding character is outside the allowed set or is being used as a delimiter of, or within, the component". And this is exactly why the characters discussed in the comment 0 are OK in file URL.
(In reply to Andreas Säger from comment #6)
> One of my Basic macros truncated valid file paths when I was
> assuming that function convertToURL is able to deal with valid paths like
> C:/foo & bar/document.odt
So the bug is somewhere in the said macro. It likely has some incorrect assumptions about URL syntax.
Closing NOTABUG.
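To illustrate that both spellings name the same file, here is a minimal sketch with Python's urllib.parse. decoded_path is a hypothetical helper, and it only undoes percent-encoding of the path component; it does not convert the result to a system path.

```python
from urllib.parse import unquote, urlsplit


def decoded_path(url: str) -> str:
    """Percent-decode only the path component of a URL."""
    return unquote(urlsplit(url).path)


plain = decoded_path("file:///C:/Foo&Bar")
encoded = decoded_path("file:///C:/Foo%26Bar")
```

Both calls give "/C:/Foo&Bar", so a consumer that decodes the path correctly cannot tell the two URLs apart.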
(In reply to Andreas Säger from comment #0)
Just to expand the last comment.

RFC 8089 [1] sect. 2 defines 'file' URL as:
> file-URI       = file-scheme ":" file-hier-part
>
> file-hier-part = ( "//" auth-path )
>                / local-path
>
> auth-path      = [ file-auth ] path-absolute
>
> local-path     = path-absolute
>
> file-auth      = "localhost"
>                / host
with the explanation "importing the "host" and "path-absolute" rules from [RFC3986] (as updated by [RFC6874])". RFC 3986 defines them this way:
> host          = ...
> path-absolute = "/" [ segment-nz *( "/" segment ) ]
> segment       = *pchar
> segment-nz    = 1*pchar
> pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
> pct-encoded   = "%" HEXDIG HEXDIG
> unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
> sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
>               / "*" / "+" / "," / ";" / "="
And finally, ALPHA, DIGIT, and HEXDIG are defined in RFC 2234 in an obvious way.

The scheme [1] is known both in Basic's ConvertToURL, and in Python's uno.systemPathToFileUrl, as mentioned in comment 18, simply because the functions are defined to convert system path to file URL. The system path characters are converted to characters of the URL's 'file-hier-part'; more precisely, to 'path-absolute', because 'file-auth' is not applicable to system-path-to-URL conversion (the auth information is not available in the system path). Further, I omitted the 'host' part, because the issue did not consider "hosted" system paths, like UNC; leaving it out avoids additional complexity (which doesn't change anything, just clutters the text).

Now let us consider the issue character by character:
> Problem occurs with characters
> !
> $
> &
> '
> (
> )
> *
> +
> ,
> =
All the above are 'sub-delims', explicitly allowed in 'pchar', which makes up both 'segment' and 'segment-nz' of 'path-absolute'.
> /
This one is explicitly shown as the character delimiting segments in 'path-absolute'. Both Linux filesystems and Windows filesystems use this character as a hierarchy delimiter, so whenever it appears in the source path, it gets "converted" to itself in the resulting URL.
> :
> @
These two are allowed explicitly in 'pchar'.
> ConvertToURL correctly encodes the following characters:
> <space>
> "
> {
> |
> }
These do not appear at all in the "very limited set" of characters of which URIs consist (RFC 3986 sect. 1.2.1; Appendix A), so they must be percent-encoded.
> \
This one gets converted to "/" on Windows, since it's a hierarchy separator there; on Linux, it behaves the same as the five above.
> %
This is the character used in 'pct-encoded'; and its conversion is explicitly defined in RFC 3986 sect. 2.4.
> #
> ?
> [
> ]
These are 'gen-delims' that aren't explicitly mentioned as allowed in the 'path-absolute' components.
> ;
And this one is interesting. It is part of 'sub-delims', so in theory, could stay as is. Interestingly, both INetURLObject [2] and osl_getFileURLFromSystemPath [3] (which are used in Basic and Python, respectively), percent-encode it. The most likely reason is that the previous version of the "Uniform Resource Identifiers (URI): Generic Syntax" standard, RFC 2396, didn't allow that character there.

[1] https://www.rfc-editor.org/rfc/rfc8089
[2] https://opengrok.libreoffice.org/xref/core/include/tools/urlobj.hxx?r=485300f9#179
[3] https://opengrok.libreoffice.org/xref/core/include/osl/file.h?r=0ce7c84c#1443
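The character-by-character analysis above can be restated as a small Python sketch. The sets are transcribed from the RFC 3986 rules quoted in this comment; classify and allowed_in_segment are hypothetical helper names, and nothing here is taken from the LibreOffice sources.

```python
import string

# Transcribed from the RFC 3986 rules quoted above.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
SUB_DELIMS = set("!$&'()*+,;=")
PCHAR_LITERALS = UNRESERVED | SUB_DELIMS | set(":@")


def allowed_in_segment(ch: str) -> bool:
    """True if ch may appear literally inside a 'segment' (a literal pchar)."""
    return ch in PCHAR_LITERALS


def classify(ch: str) -> str:
    """Name the class a single character falls into for 'path-absolute'."""
    if len(ch) != 1:
        raise ValueError("expected a single character")
    if ch in UNRESERVED:
        return "unreserved"
    if ch in SUB_DELIMS:
        return "sub-delims"
    if ch in ":@":
        return "allowed explicitly in pchar"
    if ch == "/":
        return "segment delimiter"
    if ch == "%":
        return "pct-encoded introducer"
    if ch in "#?[]":
        return "gen-delims, not allowed in path-absolute"
    return "not a URI character"
```

Every character from the "Problem occurs with" list except "/" satisfies allowed_in_segment, while space, double quote, braces, pipe and backslash fall through to the last branch.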