Bug 165411 - PDF Import: Monospaced attribute of fonts dropped
Summary: PDF Import: Monospaced attribute of fonts dropped
Status: UNCONFIRMED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: filters and storage
Version (earliest affected): unspecified
Hardware: All All
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: PDF-Import-Draw Fonts PDF-Import-Writer
 
Reported: 2025-02-23 23:31 UTC by Eyal Rozenberg
Modified: 2025-02-25 08:12 UTC

See Also:
Crash report or crash signature:


Attachments
Side-by-side screenshot: Attachment 199370 Pg 5 in LO Draw 25.8 nightly vs Atril PDF viewer (301.17 KB, image/png)
2025-02-24 16:11 UTC, Eyal Rozenberg

Description Eyal Rozenberg 2025-02-23 23:31:21 UTC
Consider the PDF document in attachment 198915 [details] - page 5, "Installation" section, last paragraph. It's a single-line command listing, set in a monospace variant of the Computer Modern font.

Unfortunately, when opened in LO (Draw or Writer), it's treated as just plain old "Computer Modern" - variable-space, Roman, serif, medium weight - like the text in the previous lines. This is unlike the boldface Computer Modern at the beginning of the same section (the word "Important:"), which LO does recognize as bold.

(There are additional command listings on that page, below and above the example I gave.)
Comment 1 V Stuart Foote 2025-02-24 14:55:44 UTC
@Eyal, give us a clip of Adobe Reader vs. LO Draw. I'm looking but not seeing an issue on the 5th page of the PDF.
Comment 2 Eyal Rozenberg 2025-02-24 16:11:54 UTC
Created attachment 199421 [details]
Side-by-side screenshot: Attachment 199370 [details] Pg 5 in LO Draw 25.8 nightly vs Atril PDF viewer

As requested. Top is a recent LibreOffice Draw 25.8 nightly, bottom is the Atril PDF viewer. I'm sure it would be the same with Adobe.
Comment 3 V Stuart Foote 2025-02-24 16:27:07 UTC
(In reply to Eyal Rozenberg from comment #0)
> Consider the PDF document in attachment 198915 [details] - page 5,

(In reply to Eyal Rozenberg from comment #2)
> Created attachment 199421 [details]

Oh, makes more sense now. Think you meant attachment 199370 [details] from bug 16536
Comment 4 V Stuart Foote 2025-02-24 16:30:50 UTC
(In reply to V Stuart Foote from comment #3)

s/bug 16536/bug 165363/
Comment 5 Eyal Rozenberg 2025-02-24 19:10:21 UTC
(In reply to V Stuart Foote from comment #4)

Yes, sorry and thanks. Too bad we don't have the ability to edit comments.

I did mean attachment 199370 [details] from bug 165363. It's the Emacs org-mode guide.
Comment 6 V Stuart Foote 2025-02-24 22:53:52 UTC
A PDF font library only holds the glyphs. Spacing of individual glyphs is encoded into the Text object streams, which we don't directly parse. Just the start position, size and text runs themselves.

Much like your ask in bug 165396, it would require much more processing to parse each Text object stream, sum up the stream's glyph positions, and then use that to detect and distribute intercharacter spacing in the resulting Draw text shape.
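
To illustrate the kind of extra processing described above, here is a minimal Python sketch. It assumes the x-origins of the glyphs in one text run have already been extracted from the stream; the function name `looks_fixed_pitch` and the 1% tolerance are hypothetical and not part of LibreOffice.

```python
from typing import Sequence


def looks_fixed_pitch(x_origins: Sequence[float], tolerance: float = 0.01) -> bool:
    """Guess whether a text run is monospaced from its glyph x-origins.

    Hypothetical helper: treats the run as fixed-pitch when every advance
    (distance between consecutive glyph origins) is within `tolerance`
    (relative) of the mean advance.
    """
    if len(x_origins) < 3:
        return False  # fewer than two advances: not enough evidence
    advances = [b - a for a, b in zip(x_origins, x_origins[1:])]
    mean = sum(advances) / len(advances)
    if mean <= 0:
        return False  # right-to-left or degenerate run; not handled here
    return all(abs(adv - mean) <= tolerance * mean for adv in advances)
```

Even this toy version needs the per-glyph positions of every run, which is the additional parsing the comment refers to.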

Text object conversion is simple now: we read the text run and locate its position, without concern for whether the font was fixed-pitch or proportionally spaced.

It's mostly reliable now, with bidi and complex composite glyphs causing occasional issues, but this would add too much complexity to what is suited to the task now.

IMHO => WF
Comment 7 Eyal Rozenberg 2025-02-24 22:59:11 UTC
(In reply to V Stuart Foote from comment #6)
> A PDF font library only holds the glyphs. Spacing of individual glyphs is
> encoded into the Text object streams, which we don't directly parse. Just
> the start position, size and text runs themselves.


The problem is not (mostly) with the spacing: The problem is with the choice of font variant. The Computer Modern typeface has monospaced variants, just like it has italic variants. The ask is not for LO to detect italicization of glyphs, nor their being spaced uniformly, but rather to notice the PDF tells us to use certain typeface variants. LO manages to do that for Bold and Italic - but not for monospace. So,

> it would require much more processing to parse each Text object stream 

No, it would not.
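
For illustration only: the PDF specification (ISO 32000-1, the Flags entry of the font descriptor) gives each font a bit field in which bit 1 marks FixedPitch, bit 2 Serif, bit 7 Italic and bit 19 ForceBold. A minimal Python sketch of reading those bits follows. The function name `describe_font` is hypothetical, and whether the attached PDF actually sets the FixedPitch bit, or whether LO's importer consults these flags at all, is an assumption not established in this thread.

```python
# Font descriptor flag bits as numbered in the PDF specification
# (bit 1 is the least significant bit).
FIXED_PITCH = 1 << 0   # bit 1
SERIF = 1 << 1         # bit 2
ITALIC = 1 << 6        # bit 7
FORCE_BOLD = 1 << 18   # bit 19


def describe_font(flags: int) -> dict:
    """Decode the variant-related bits of a font descriptor /Flags value.

    Hypothetical helper: it only tests individual bits and ignores the
    rest, so it needs no access to the text object streams.
    """
    return {
        "fixed_pitch": bool(flags & FIXED_PITCH),
        "serif": bool(flags & SERIF),
        "italic": bool(flags & ITALIC),
        "force_bold": bool(flags & FORCE_BOLD),
    }
```

For example, a /Flags value of 33 (bits 1 and 6, FixedPitch and Nonsymbolic) would decode to fixed_pitch being true without looking at any glyph positions.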