Bug 101155 - FILEOPEN PDF text overlaps because text is split to many boxes, with duplicated characters
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: filters and storage
Version (earliest affected): 3.6.7.2 release
Hardware: x86-64 (AMD64) All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard: interoperability
Keywords: filter:pdf
Duplicates: 160277
Depends on:
Blocks: PDF-Import-Draw
Reported: 2016-07-27 14:32 UTC by E.Mi
Modified: 2024-03-20 22:34 UTC (History)
CC List: 5 users

See Also:
Crash report or crash signature:


Attachments
pdf (134.46 KB, application/pdf)
2016-07-27 14:32 UTC, E.Mi
Comparison (396.67 KB, image/png)
2016-07-27 14:35 UTC, E.Mi

Description E.Mi 2016-07-27 14:32:34 UTC
Created attachment 126435
pdf
Comment 1 E.Mi 2016-07-27 14:35:03 UTC
Created attachment 126436
Comparison
Comment 2 Heiko Tietze 2016-07-30 07:42:31 UTC
/confirmed (moving from Draw to filter & storage)

The PDF uses Garamond, which is not installed on my Linux system. Perhaps the issue can be attributed to font substitution.

Version: 5.2.0.3
Build ID: 7dbd85f5a18cfeaf6801c594fc43a5edadc2df0c
CPU Threads: 8; OS Version: Linux 4.6; UI Render: default; 
Locale: de-DE (en_US.UTF-8)
Comment 3 Buovjaga 2016-08-05 18:35:16 UTC
It already overlaps in 3.6 (just checking for regressions here..).
Well, I don't have Garamond either.

Arch Linux 64-bit
Version 3.6.7.2 (Build ID: e183d5b)
Comment 4 QA Administrators 2017-09-01 11:20:42 UTC Comment hidden (obsolete)
Comment 5 QA Administrators 2020-05-23 03:42:53 UTC Comment hidden (obsolete)
Comment 6 Timur 2021-03-05 12:16:00 UTC
Reproduced in 7.2+ on Linux without Garamond installed. It looks better when the font is replaced with EB Garamond.
But the real issue is that the text of a single line, e.g. the title, is split into many boxes, so they don't fit.
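
To illustrate why many small boxes fail to fit once a font is substituted, here is a minimal sketch. It is not LibreOffice code: the `Box` class, the `overlaps_after_substitution` helper, the widths, and the 1.25 scale factor are all hypothetical. It assumes each box keeps the x position stored in the PDF while its text is rendered wider by the substitute font.

```python
from dataclasses import dataclass


@dataclass
class Box:
    text: str
    x: float      # left edge as positioned in the PDF (stays fixed)
    width: float  # advance width in the original font


def overlaps_after_substitution(boxes, scale):
    """Overlap between each box and its right-hand neighbour when every
    box keeps its x position but renders `scale` times wider."""
    result = []
    for left, right in zip(boxes, boxes[1:]):
        rendered_right_edge = left.x + left.width * scale
        result.append(max(0.0, rendered_right_edge - right.x))
    return result


# Hypothetical word "Text" split into one box per character.
title = [Box("T", 0.0, 8.0), Box("e", 8.0, 6.0),
         Box("x", 14.0, 10.0), Box("t", 24.0, 6.0)]

print(overlaps_after_substitution(title, 1.0))   # [0.0, 0.0, 0.0]
print(overlaps_after_substitution(title, 1.25))  # [2.0, 1.5, 2.5]
```

With the original metrics the boxes abut exactly. With a wider substitute, every box runs into its neighbour, whereas a single box for the whole word would only end a little further to the right.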
Comment 7 QA Administrators 2023-03-06 04:23:02 UTC Comment hidden (obsolete)
Comment 8 Stéphane Guillou (stragu) 2024-03-20 02:15:26 UTC
Same in recent trunk build:

Version: 24.8.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 479b5bbe8ca2177ba7574e7aa2308b5d0de1895c
CPU threads: 8; OS: Linux 6.5; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
Calc: CL threaded

And on Windows 11.

However, when opening the PDF in Firefox and selecting characters with Shift + Arrow keys, e.g. in the "Nome" text run, the selection jumps to the line below before reaching the "e", as if the whole document were split into columns of characters.

Same file is handled better in e.g. Evince and Okular, in which it is possible to select whole lines of text.

Wondering where that kind of splitting is coming from (i.e. if whatever created the PDF is somehow at fault), and if it is possible to mimic the grouping that some PDF readers manage?

In any case, very much related to bug 32249.
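
As a rough idea of the kind of grouping asked about above, here is a minimal sketch of one plausible heuristic. It is not taken from Evince, Okular, or LibreOffice: the `Fragment` class, the `group_into_lines` helper, the coordinates, and the tolerance are hypothetical. It assumes each fragment carries a left edge and a baseline, with y growing down the page.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    text: str
    x: float  # left edge
    y: float  # baseline position (assumed to grow downwards)


def group_into_lines(fragments, y_tolerance=1.0):
    """Cluster fragments whose baselines are within y_tolerance of the
    first fragment of a cluster, then order each cluster left to right."""
    lines = []  # list of (baseline, [fragments])
    for frag in sorted(fragments, key=lambda f: f.y):
        if lines and abs(frag.y - lines[-1][0]) <= y_tolerance:
            lines[-1][1].append(frag)
        else:
            lines.append((frag.y, [frag]))
    return ["".join(f.text for f in sorted(members, key=lambda f: f.x))
            for _, members in lines]


# Hypothetical fragments stored column by column, as the Firefox
# selection behaviour suggests: first characters of both lines, then
# second characters, and so on.
column_order = [
    Fragment("N", 0, 10), Fragment("D", 0, 30),
    Fragment("o", 8, 10), Fragment("a", 8, 30),
    Fragment("m", 14, 10), Fragment("t", 14, 30),
    Fragment("e", 24, 10), Fragment("a", 24, 30),
]

print(group_into_lines(column_order))  # ['Nome', 'Data']
```

A real importer would also need to handle gaps between words, rotated text, and mixed font sizes, which this sketch ignores.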
Comment 9 Stéphane Guillou (stragu) 2024-03-20 02:17:45 UTC
*** Bug 160277 has been marked as a duplicate of this bug. ***
Comment 10 Stéphane Guillou (stragu) 2024-03-20 02:19:05 UTC
Another sample PDF with the same splitting issue is in attachment 193198 from bug 160277.
Comment 11 Eyal Rozenberg 2024-03-20 22:34:35 UTC
Can reproduce this with:

Version: 24.8.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: ffccbf4762a9ae810bcdd21c41fccdd436e7bfc9
CPU threads: 4; OS: Linux 6.6; UI render: default; VCL: gtk3
Locale: he-IL (en_IL); UI: en-US
Calc: threaded

and I _do_ have Garamond - although with Garamond, the effect is much more subtle.