Bug 112225 - Percent-encoding (URL encoding) shouldn't be used for all hyperlink characters except space
Status: RESOLVED NOTABUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version (earliest affected): 5.4.0.3 release
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: needsDevAdvice
Depends on:
Blocks: Hyperlink
Reported: 2017-09-05 08:36 UTC by Thomas Lendo
Modified: 2022-07-12 12:48 UTC
CC List: 5 users

See Also:
Crash report or crash signature:


Attachments
Example files in a zip archive (odt, docx and the exported pdf files) (508.65 KB, application/x-zip-compressed)
2017-12-14 10:35 UTC, Thomas Lendo

Description Thomas Lendo 2017-09-05 08:36:38 UTC
Is it still necessary today to percent-encode (URL-encode) all UTF-8 characters in hyperlinks? Some Windows programs like PDF-XChange have problems opening such encoded hyperlinks (e.g. "f%C3%BCr" is problematic but not "für").

For space (%20) an exception can be made for web browsers and other programs that handle space as a resource separator.

MS Word 2013 doesn't percent-encode hyperlinks. I haven't seen a problem due to that behavior. Has anyone else?
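
For illustration, the encoding in question can be reproduced with a few lines of Python. This is only a sketch using the standard library's `urllib.parse`; it is not how LibreOffice or Word produce their hyperlinks, and the file name is taken from the attachment discussed in this report.

```python
from urllib.parse import quote, unquote

# Percent-encoding replaces each UTF-8 byte of a non-ASCII character with %XX.
word = "für"
encoded = quote(word)      # 'ü' is U+00FC, i.e. the UTF-8 bytes C3 BC
print(encoded)             # f%C3%BCr
print(unquote(encoded))    # für

# A space becomes %20, the one escape the report proposes to keep.
encoded_name = quote("Bug 112225 Straßenbrücken.odt")
print(encoded_name)        # Bug%20112225%20Stra%C3%9Fenbr%C3%BCcken.odt
```

A consumer that decodes the escapes recovers the original name, so the encoded and unencoded forms carry the same information.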
Comment 1 Xisco Faulí 2017-12-14 09:02:21 UTC
Could you please share a document created by MS Word 2013?
Comment 2 Thomas Lendo 2017-12-14 10:35:08 UTC
Created attachment 138438
Example files in a zip archive (odt, docx and the exported pdf files)

Xisco, the zip archive contains:
* "Bug 112225 Straßenbrücken.odt" created with LibO 6.1 with a link to the docx file
* "Bug 112225 Straßenbrücken.docx" created with MSO Word 2013 with a link to the odt file
* pdf file that was saved from within MSO Word 2013
* pdf file that was exported by LibO 6.1.0.0.alpha0+ (Win-x86_64@42, 2017-12-12_00:23:14, Windows 10.0)
Comment 3 Thomas Lendo 2017-12-14 10:46:38 UTC
PS: I forgot to use the option "Export URLs relative to file system" in the PDF options of LibO 6.1. Anyway, it's odd that you can't use a relative path in the hyperlink dialog.
Comment 4 Anton F 2021-07-22 10:13:37 UTC
Not reproduced.

Version: 7.3.0.0.alpha0+ / LibreOffice Community
Build ID: cd2b5168e8ef1cb6e721bc5220421464ed723096
CPU threads: 2; OS: Linux 5.4; UI render: default; VCL: gtk3
Locale: ru-RU (ru_RU.UTF-8); UI: en-US
TinderBox: Linux-rpm_deb-x86_64@86-TDF, Branch:master, Time: 2021-07-21_14:56:23
Calc: threaded
Comment 5 Thomas Lendo 2022-02-03 16:04:20 UTC
Still reproducible with

Version: 7.4.0.0.alpha0+ (x64) / LibreOffice Community
Build ID: 52443996eff721e612ac4afc1eb1a53bb8a3e06f
CPU threads: 12; OS: Windows 10.0 Build 19043; UI render: Skia/Vulkan; VCL: win
Locale: de-AT (de_AT); UI: de-DE
Calc: threaded



Steps:
- Open the attached zip archive.
- Extract the odt and docx files.
- Open the odt file in Writer, the docx file in MS Word.
- Export from Writer to pdf and save from Word to pdf.
- Open the 2 pdf files and look at the hyperlink tooltip. For example, ü is shown as %C3%BC and ß is shown as %C3%9F in the pdf file exported from Writer.
Comment 6 Stephan Bergmann 2022-07-12 12:48:23 UTC
(In reply to Thomas Lendo from comment #0)
> Is it still necessary today to percent-encode (URL-encode) all UTF-8
> characters in hyperlinks? Some Windows programs like PDF-XChange have
> problems opening such encoded hyperlinks (e.g. "f%C3%BCr" is problematic
> but not "für").

That appears to be a shortcoming of those "Windows programs like PDF-XChange".  (And I suggest you get in touch with them.)

With your sample PDF files from attachment 138438, "Bug 112225 Straßenbrücken (exported with LibO 6.1).pdf" contains

> <</Type/Action/S/URI/URI(file:///C:/Users/ths/Desktop/Bug%20112225%20Stra%C3%9Fenbr%C3%BCcken.docx)>>

while "Bug 112225 Straßenbrücken (saved with Word 2013).pdf" contains

> <</Type/Action/S/URI/URI(Bug%20112225%20Straßenbrücken.odt) >>

when interpreting the file's bytes as UTF-8.  (The difference in absolute vs. relative URL is apparently as per your comment 3.)

But at least <https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf> "PDF 1.7", section 12.6.4.7 "Interactive Features: Actions: Action Types: URI Actions" in table 206 "Additional entries specific to a URI action" on page 424, specifies that key "URI" is of type "ASCII string" with value: "(Required) The uniform resource identifier to resolve, encoded in 7-bit ASCII."

And your "...(exported with LibO 6.1).pdf" matches that requirement while your "...(saved with Word 2013).pdf" does not.
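
The 7-bit ASCII requirement can be checked mechanically against the two URI values quoted above. The following Python sketch is only an illustration; the helper name `is_7bit_ascii` is made up for this example, and the strings are the values as they read when the file's bytes are interpreted as UTF-8.

```python
from urllib.parse import unquote

# URI values quoted above (file bytes interpreted as UTF-8).
libo_uri = "file:///C:/Users/ths/Desktop/Bug%20112225%20Stra%C3%9Fenbr%C3%BCcken.docx"
word_uri = "Bug%20112225%20Straßenbrücken.odt"

def is_7bit_ascii(uri: str) -> bool:
    """Return True if every character lies in the 7-bit ASCII range."""
    return all(ord(ch) < 128 for ch in uri)

print(is_7bit_ascii(libo_uri))  # True
print(is_7bit_ascii(word_uri))  # False

# Decoding the escapes recovers the human-readable file name.
print(unquote(libo_uri).rsplit("/", 1)[-1])  # Bug 112225 Straßenbrücken.docx
```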

(See also <https://git.libreoffice.org/core/+/a346dfccd7e342d776dd59eb3ed128557e22a1bf%5E%21> "tdf#70833: IDNA support when exporing hyperlinks to PDF" for how we are careful to write ASCII-only URI values.)
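
As a rough illustration of what an ASCII-only URI value means for a non-ASCII host name, the sketch below uses Python's built-in `idna` codec for the host and percent-encoding for the path. It is not the LibreOffice code from that commit; the host `bücher.example` and the helper `to_ascii_uri` are hypothetical and exist only for this example.

```python
from urllib.parse import quote

def to_ascii_uri(scheme: str, host: str, path: str) -> str:
    """Build an ASCII-only URI: IDNA for the host, percent-encoding for the path."""
    ascii_host = host.encode("idna").decode("ascii")  # Punycode ("xn--") labels
    ascii_path = quote(path)                          # UTF-8 bytes as %XX, '/' kept
    return f"{scheme}://{ascii_host}{ascii_path}"

# Hypothetical host and path, chosen only to show both mechanisms.
uri = to_ascii_uri("http", "bücher.example", "/für")
print(uri)  # http://xn--bcher-kva.example/f%C3%BCr
```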