Bug 152627 - PDF with hyperlinks: links are lost with import to Draw
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: filters and storage
Version (earliest affected): 7.3.2.2 release
Hardware: All, OS: All
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: filter:pdf
Duplicates: 157242
Depends on:
Blocks: PDF-Import-Draw
Reported: 2022-12-21 01:14 UTC by Graham Perrin
Modified: 2023-09-28 19:00 UTC (History)
CC List: 2 users

See Also:
Crash report or crash signature:


Attachments
Screenshot: Okular and LibreOffice Draw views of a PDF (560.34 KB, image/png)
2022-12-21 01:14 UTC, Graham Perrin
PDF for example 02 (20.50 KB, application/pdf)
2022-12-21 01:19 UTC, Graham Perrin
The email from which I produced the PDF for example 02 (9.23 KB, message/rfc822)
2022-12-21 01:24 UTC, Graham Perrin
PDF produced by Draw (45.39 KB, application/pdf)
2022-12-21 01:38 UTC, Graham Perrin

Description Graham Perrin 2022-12-21 01:14:14 UTC
Created attachment 184279
Screenshot: Okular and LibreOffice Draw views of a PDF

Visual breakage
===============


Example 01
----------

The screenshot at bug 149216 comment 0. 

Blue underlining neither begins at the beginning of the link, nor ends at the end of the link. 

Underlining overextends, to part of an adjacent word that was not originally part of the linked text. 


Example 02
----------

Screenshot attached. 

Upper window: Okular. 

Lower window: LibreOffice Draw. 

Environment: FreeBSD 14.0-CURRENT. 

% pkg info -x libreoffice
libreoffice-7.4.3.2_1
% uname -aKU
FreeBSD mowa219-gjp4-8570p-freebsd 14.0-CURRENT FreeBSD 14.0-CURRENT #27 main-n259662-ebdf27b6f367-dirty: Sun Dec 11 11:31:52 GMT 2022     grahamperrin@mowa219-gjp4-8570p-freebsd:/usr/obj/usr/src/amd64.amd64/sys/GENERIC-NODEBUG amd64 1400074 1400074
%
Comment 1 Graham Perrin 2022-12-21 01:19:16 UTC
Created attachment 184280
PDF for example 02

This PDF was used for the screenshot for example 02 in comment 0.
Comment 2 Graham Perrin 2022-12-21 01:24:09 UTC
Created attachment 184281
The email from which I produced the PDF for example 02

I opened this .eml file in Thunderbird. Printed, destination: Save as PDF.
Comment 3 Graham Perrin 2022-12-21 01:38:46 UTC
Created attachment 184282
PDF produced by Draw

Functional breakage
===================

The screenshot at comment 0 shows me pointing at a usable link in an Okular view of the PDF at comment 1. 

In the PDF attached to this comment, produced by Draw: 

* there's the (somewhat broken) appearance of links

* with the file open in Okular, pointing at what appears to be a link 
  does not cause the cursor to change (from an arrow, to a hand); and 
  clicking does not have the required effect.
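
One rough way to confirm this without a viewer is to check whether each file still contains URI actions. The following is a minimal Python sketch, assuming link targets are stored as literal strings, either in plain objects or inside Flate-compressed streams. `find_uris` is a hypothetical helper, not part of LibreOffice, poppler or Okular, and it is a crude byte scan rather than a real PDF parser (it ignores hex strings, nested parentheses, encryption and other stream filters):

```python
import re
import zlib

# A URI action with a literal-string target, e.g. /URI (https://example.org/)
URI_PATTERN = re.compile(rb"/URI\s*\(((?:\\.|[^\\()])*)\)")
# Raw stream payloads; trailing end-of-line bytes are tolerated when inflating.
STREAM_PATTERN = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def find_uris(pdf_bytes: bytes) -> list[str]:
    """Return URI targets found in the raw bytes and in inflatable streams."""
    chunks = [pdf_bytes]
    for match in STREAM_PATTERN.finditer(pdf_bytes):
        try:
            # decompressobj ignores bytes that follow the end of the zlib data
            chunks.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            pass  # not Flate data (image, font, other filter): skip it

    found: list[str] = []
    for chunk in chunks:
        for hit in URI_PATTERN.finditer(chunk):
            uri = hit.group(1).decode("latin-1")
            if uri not in found:
                found.append(uri)
    return found
```

Calling `find_uris` on the bytes of the PDF at comment 1 and on the bytes of the PDF attached here would allow a side-by-side comparison: if the first call lists the expected URLs and the second returns an empty list, that would be consistent with the link annotations having been dropped somewhere between import and export.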
Comment 4 Graham Perrin 2022-12-21 01:43:44 UTC
(In reply to Graham Perrin from comment #0)

> …
> 
> Example 02
> ----------
> 
> Screenshot attached. 
> 
> Upper window: Okular. 
> 
> Lower window: LibreOffice Draw. 
> 
> …

Now I see, the fonts: 

* monospace (correct), viewed in Okular
* not monospace in LibreOffice Draw. 

Is that bug 85295?

If so, does it partly explain the misplaced beginnings and endings of blue underlines?
Comment 5 V Stuart Foote 2022-12-21 18:25:24 UTC
It seems the poppler-based pdfio filter (pdfiprocessor.cxx) performs no parsing of the URL/URI; LibreOffice is not a PDF viewer or editor.

The xpdf/poppler import is a not-so-simple extraction of page formatting; the text runs are extracted without substantive logic to reassemble syntax or structure. The URI underlines are graphic decoration, not attributes of the individual glyphs, so it is reasonable that they are not aligned with the extracted text runs.

Alternatively, the pdfium based insert is a full PDF viewer and renders the content to document canvas with high fidelity.

IMHO => NAB, this is not a bug, but providing logic in the PDF import filter to recognize URI would be useful.

Strangely can't find an enhancement request for filter import of URL from PDF, setting => Enhancement
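
To make the requested enhancement concrete, here is a minimal Python sketch of one possible approach: attaching a link's URI to every extracted text run whose bounding box is mostly covered by the link rectangle. It is only an illustration under stated assumptions, not LibreOffice's import code. `Rect`, `TextRun`, `LinkAnnotation`, `attach_links` and the 0.5 overlap threshold are all hypothetical names and values, and the sketch assumes the filter already has page-space rectangles for both the text runs and the link annotations:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rect:
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        # Empty or inverted rectangles have zero area
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)

    def intersection(self, other: "Rect") -> "Rect":
        return Rect(max(self.x0, other.x0), max(self.y0, other.y0),
                    min(self.x1, other.x1), min(self.y1, other.y1))


@dataclass
class TextRun:  # hypothetical stand-in for an extracted text run
    text: str
    box: Rect
    uri: Optional[str] = None


@dataclass
class LinkAnnotation:  # hypothetical stand-in for a PDF link annotation
    uri: str
    rect: Rect


def attach_links(runs, links, min_overlap=0.5):
    """Set run.uri when a link rectangle covers at least min_overlap of the run."""
    for run in runs:
        run_area = run.box.area()
        if run_area == 0:
            continue
        for link in links:
            covered = run.box.intersection(link.rect).area() / run_area
            if covered >= min_overlap:
                run.uri = link.uri
                break
    return runs
```

A sketch like this only works when text runs happen to break at link boundaries. Where a single run spans linked and unlinked text, the run would first have to be split at the edges of the link rectangle, which this example does not attempt.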
Comment 6 Graham Perrin 2022-12-22 02:22:22 UTC
Thank you, 

(In reply to V Stuart Foote from comment #5)

> … the pdfium based insert 

Please, can you put this in a LibreOffice context for me? 

I see <https://pdfium.googlesource.com/pdfium/+/refs/heads/main/README.md>, however it's entirely new to me. 


> Strangely can't find an enhancement request for filter import of URL from
> PDF, setting => Enhancement

I'll edit the summary line here to focus on the enhancement. 

----

Assume that the visual breakage (essentially: monospace misrepresented as proportional) falls under bug 85295.
Comment 7 V Stuart Foote 2022-12-22 02:46:54 UTC
(In reply to Graham Perrin from comment #6)
> > … the pdfium based insert 
> 
> Please, can you put this in a LibreOffice context for me? 
> 
> I see <https://pdfium.googlesource.com/pdfium/+/refs/heads/main/README.md>,
> however it's entirely new to me. 
> 

PDFium is the Google-sponsored PDF viewer for the Chrome browser. Its open source libs provide an alternative to the poppler libs for parsing PDF pages to LO's vcl canvas. See bug 89727 and its See Also links...

these commits:
=-ref-=
https://gerrit.libreoffice.org/26586
https://gerrit.libreoffice.org/26628
https://gerrit.libreoffice.org/26695
https://gerrit.libreoffice.org/26706
https://gerrit.libreoffice.org/26724
https://gerrit.libreoffice.org/26739
https://gerrit.libreoffice.org/26743
https://gerrit.libreoffice.org/26755
Comment 8 Graham Perrin 2022-12-22 21:19:10 UTC
(In reply to V Stuart Foote from comment #7)

Thanks, no mention of PDFium in LibreOffice Help :-(

I had to search the Internet for a while to discover the feature. 

Draw requires use of the Insert menu (never the File menu) to use PDFium with a PDF file … something like that?
Comment 9 Buovjaga 2023-09-28 10:04:25 UTC
*** Bug 157242 has been marked as a duplicate of this bug. ***
Comment 10 Eyal Rozenberg 2023-09-28 19:00:44 UTC
(In reply to V Stuart Foote from comment #5)
> LibreOffice is not a PDF viewer or editor.

I have recently given a presentation at LOCon about this very issue:

https://app.box.com/s/9y6aiisdagyavppi4132qakt8fck2gif

bottom line:

LibreOffice is the most popular FOSS application people use to edit PDFs. And while it is not appropriate for precision edits of a PDF's structure - an office suite is quite well suited for tasks like editing the exported PDF from a web page. So that's not a valid argument.

LibreOffice's import filter for PDFs is simply faulty, and must be improved - for the benefit of many users and millions of potential users. It's as simple as that.


Also, on an unrelated note: the hyperlinks are also discarded by the Writer import filter.