169402 – Some characters overlapped in a PDF on import with CID font subsets

Summary: Some characters overlapped in a PDF on import with CID font subsets

Status:	NEW

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	Draw
Version: (earliest affected)	25.8.2.2 release
Hardware:	x86-64 (AMD64) All

Importance:	medium normal
Assignee:	Not Assigned

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:	PDF-Import-Draw

Reported:	2025-11-12 12:40 UTC by voldemarz@gmail.com
Modified:	2025-11-12 21:53 UTC
CC List:	2 users

See Also:	169174
Crash report or crash signature:

Attachments
PDF that does not appear properly in LibreOffice (664.45 KB, application/pdf) 2025-11-12 12:40 UTC, voldemarz@gmail.com
Comparison between LibreOffice (top) and Brave browser (bottom) (465.88 KB, image/webp) 2025-11-12 12:43 UTC, voldemarz@gmail.com

Description voldemarz@gmail.com 2025-11-12 12:40:12 UTC

Created attachment 203897
PDF that does not appear properly in LibreOffice

Some characters in the PDF overlap. I don't see a clear pattern; it might be related to the Latvian special characters “ā, ē, ū, š, ģ, ķ, ļ, ž, č, ņ”.

The problematic PDF is attached. It was exported by MS Word on Windows and appears fine in all other PDF viewers I have tried.

LibreOffice was installed via Discover on Kubuntu 24.04 from Flathub. No settings were modified.

Comment 1 voldemarz@gmail.com 2025-11-12 12:43:38 UTC

Created attachment 203898
Comparison between LibreOffice (top) and Brave browser (bottom)

Comment 2 V Stuart Foote 2025-11-12 14:33:29 UTC

The five "fonts" recorded in the PDF are unusable.

CIDFont+F1
CIDFont+F2
CIDFont+F3
CIDFont+F4
CIDFont+F5

The majority of the text runs are in CIDFont+F4.

IIANM, the Character ID (CID) subset fonts mean nothing to the Poppler --> Cairo import filter that LibreOffice uses to convert PDF objects into sd text strings, so an *unmanaged font fallback* occurs for each unknown font.
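To illustrate why these names leave the fallback nothing to work with: the PDF specification marks an embedded font subset by prefixing its BaseFont with a tag of exactly six uppercase letters and a "+", e.g. BEAPJB+CMSS17. Stripping such a tag recovers a real family name that a substitution could look for, whereas names like CIDFont+F4 don't follow that convention and carry no family name at all. This is a minimal sketch, not the Poppler or LibreOffice code; the function name `base_font_name` is hypothetical.

```python
import re

# PDF subset tag: exactly six uppercase ASCII letters followed by '+'.
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+(?P<base>.+)$")


def base_font_name(base_font: str) -> tuple[str, bool]:
    """Return (name without subset tag, whether a standard subset tag was found).

    Hypothetical helper for illustration only.
    """
    m = SUBSET_TAG.match(base_font)
    if m:
        return m.group("base"), True
    # No standard tag: the whole string is all we have to go on.
    return base_font, False


print(base_font_name("BEAPJB+CMSS17"))  # ('CMSS17', True)
print(base_font_name("CIDFont+F4"))     # ('CIDFont+F4', False)
```

In the second case there is simply no family name to match against installed fonts, so any fallback is a guess.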

Currently you can use the Tools -> Options -> Fonts dialog to substitute a system font of your choosing. I used the serif XITS font (for its excellent font metrics), but you probably want a sans-serif.

Alternatively, if you have the source OOXML on your system, you could check which fonts it uses and set LibreOffice's font substitution to the exact font of the document.

Also, it seems one can force MS Word's PDF export into its "Best for printing" mode, which should embed the actual fonts rather than CID subsets.

@Dave, any insight on the poppler --> Cairo handling of CID subsets in our pdfio import filters? Is this another aspect of our poppler builds not including font support?

Comment 3 V Stuart Foote 2025-11-12 21:04:22 UTC

@fpy, the font subsets in the See Also bug 169174 are not CID.

Rather, they are dvips-rendered PS Type 1 fonts. I would think those would be near hopeless to sort out, and they are not really the issue here.

Comment 4 fpy 2025-11-12 21:18:05 UTC

(In reply to V Stuart Foote from comment #3)
> @fpy, font subsets in the see also bug 169174 are not CID 
> 
> Rather they are dvips rendered as PS type 1, would think those would be near
> hopeless to try to sort out, 

So, WONTFIX?

> and not really the issue here.

OK, not a DUPLICATE.

Comment 5 Dave Gilbert 2025-11-12 21:53:54 UTC

(In reply to V Stuart Foote from comment #3)
> @fpy, font subsets in the see also bug 169174 are not CID 
> 
> Rather they are dvips rendered as PS type 1, would think those would be near
> hopeless to try to sort out, and not really the issue here.

If I understand correctly, the PDF does have character widths encoded in it,

6 0 obj
<<
/Type /Font
/BaseFont /BEAPJB+CMSS17
....
/Widths [ 625 313 313 313 313 313 313 313 313 313 ...

so I guess we could do something to find the best match based on that.

(Maybe even just choosing something no larger would work to avoid overlaps.)
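The width-based matching suggested here could look roughly like the following sketch. It is not LibreOffice code: the candidate fonts, their widths, and the FirstChar value are made-up illustrative numbers (real candidates would come from installed system fonts), and all function names are hypothetical. Widths are in thousandths of the font size, the unit of /Widths for simple fonts. The CID fonts in this bug would store widths in a /W array (with /DW as default) rather than /Widths, but once decoded into a code -> width map the comparison is the same. The `no_larger` flag follows the "no larger" idea: a font can be the closest match on average yet still be wider for some glyphs, which is exactly what produces overlaps.

```python
from typing import Mapping, Optional, Sequence

Widths = Mapping[int, float]  # character code -> advance width (1/1000 of font size)


def pdf_widths(first_char: int, widths: Sequence[float]) -> dict[int, float]:
    """Build a code -> width map from a simple font's /FirstChar and /Widths."""
    return {first_char + i: float(w) for i, w in enumerate(widths)}


def width_error(pdf: Widths, cand: Widths) -> float:
    """Mean absolute width difference over the codes both fonts define."""
    common = pdf.keys() & cand.keys()
    if not common:
        return float("inf")
    return sum(abs(pdf[c] - cand[c]) for c in common) / len(common)


def never_wider(pdf: Widths, cand: Widths) -> bool:
    """True if no shared glyph of the candidate exceeds the PDF's recorded advance."""
    common = pdf.keys() & cand.keys()
    return bool(common) and all(cand[c] <= pdf[c] for c in common)


def choose_fallback(pdf: Widths, candidates: Mapping[str, Widths],
                    no_larger: bool = False) -> Optional[str]:
    """Pick the candidate with the closest widths, optionally only non-overflowing ones."""
    pool = {name: w for name, w in candidates.items()
            if not no_larger or never_wider(pdf, w)}
    if not pool:
        return None
    return min(pool, key=lambda name: width_error(pdf, pool[name]))


# Illustrative numbers only: FirstChar and all candidate widths are made up.
pdf = pdf_widths(65, [625, 313, 313])
candidates = {
    "Wide Sans":   {65: 700, 66: 400, 67: 400},
    "Narrow Sans": {65: 610, 66: 320, 67: 300},
    "Condensed":   {65: 560, 66: 280, 67: 290},
}
print(choose_fallback(pdf, candidates))                  # Narrow Sans
print(choose_fallback(pdf, candidates, no_larger=True))  # Condensed
```

Here "Narrow Sans" is closest on average but is 7 units too wide at code 66, so with `no_larger=True` the slightly looser "Condensed" wins and no glyph can spill into its neighbour.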

PDF does weird font stuff all the time, partly because it tends to embed fonts, but also because the way it encodes text is set up for using font subsets.
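The point about text encoding can be made concrete with a small sketch. Assuming the CID fonts here use the Identity-H encoding (common for embedded CID subsets, but not verified for this file), each pair of bytes in a string operand is a big-endian code equal to the CID, and in a subset those numbers are essentially slots in the embedded font rather than characters. Only the optional /ToUnicode CMap ties them back to Unicode. The ToUnicode table and the string below are invented for illustration, and `decode_identity_h` is a hypothetical helper.

```python
def decode_identity_h(data: bytes) -> list[int]:
    """Split a string shown with Identity-H into 2-byte big-endian codes (= CIDs)."""
    if len(data) % 2:
        raise ValueError("Identity-H strings must have an even number of bytes")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


# Hypothetical ToUnicode mapping for one subset. In another PDF (or another
# subset of the same font) the same letters may well get different codes.
to_unicode = {0x0003: "R", 0x0011: "ī", 0x0024: "g", 0x0049: "a"}

text_operand = bytes.fromhex("0003001100240049")  # as in <0003001100240049> Tj
cids = decode_identity_h(text_operand)
print(cids)  # [3, 17, 36, 73]
# U+FFFD marks codes with no Unicode mapping.
print("".join(to_unicode.get(c, "\ufffd") for c in cids))  # Rīga
```

Without the embedded font (or its metrics), an importer knows which glyph slot is drawn but not how wide a substitute glyph for that character should be.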