170147 – Garbled text and font encoding issues when opening PDFs created via "Print to PDF" drivers

Summary: Garbled text and font encoding issues when opening PDFs created via "Print to...

Status:	RESOLVED NOTOURBUG

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	Draw
Version: (earliest affected)	26.8.0.0 alpha0+ master
Hardware:	All All

Importance:	medium normal
Assignee:	Not Assigned

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:	PDF-Import-Draw

Reported:	2025-12-27 17:49 UTC by Jose L Viejo
Modified:	2025-12-28 21:50 UTC
CC List:	1 user

See Also:
Crash report or crash signature:

Attachments
sample pdf, printed pdf to pdf (1.16 MB, application/pdf) 2025-12-27 20:44 UTC, Jose L Viejo

Description Jose L Viejo 2025-12-27 17:49:08 UTC

Description:
I’m reporting an issue with how LibreOffice Draw handles text in PDFs that were generated with a "Print to PDF" driver (e.g., from a browser) instead of being downloaded directly or exported natively.

When a user chooses "Print to PDF" to save a PDF, the resulting file often appears corrupted in Draw: the text is displayed as "garbage" (random symbols) or rendered in a completely wrong font.

The exact same files open and render perfectly in almost any other PDF viewer (Adobe Acrobat, Chrome, Edge, etc.).

Even if I run the file through PDF repair tools, no errors are flagged, yet Draw still fails to render the text correctly.

It seems that Draw struggles to interpret the font encoding or font subsets created by these print drivers, leaving the file illegible for editing while other applications read it just fine.

Steps to Reproduce:
1. Open a PDF on the web in a browser.
2. Instead of downloading it directly, use a "Print to PDF" driver to save it.
3. Open the resulting file in LibreOffice Draw. Draw does not report any issue when opening it.
4. Notice that the text is unreadable/garbled.

Actual Results:
Text is garbled in PDFs produced by printing a PDF to PDF; directly downloaded PDFs open without problems.

Expected Results:
Draw should behave like the readers that open PDF-to-PDF printed files without any problems.

Reproducible: Always

User Profile Reset: No

Additional Info:
Version: 25.8.4.2 (X86_64)
Build ID: 290daaa01b999472f0c7a3890eb6a550fd74c6df
CPU threads: 4; OS: Windows 10 X86_64 (build 19045); UI render: Skia/Raster; VCL: win
Locale: en-US (en_US); UI: en-US
Calc: CL threaded

Comment 1 V Stuart Foote 2025-12-27 19:50:59 UTC

There are two PDF import filters available to LibreOffice.

Check whether the Insert -> Image filter (pdfium based) meets your needs with its high-fidelity raster rendering.

By contrast, the File -> Open filter parses the PDF with poppler and renders it to cairo objects, which may cause some issues and loss of fidelity.

Comment 2 Jose L Viejo 2025-12-27 20:44:31 UTC

Created attachment 204828
sample pdf, printed pdf to pdf

Comment 3 Jose L Viejo 2025-12-27 20:46:41 UTC

Thanks for your prompt reply. Inserting it as an image opens it correctly, but the objects are lost, leaving only a rather unusable image.


In Draw, File -> Open gives me unreadable objects, one for each piece of text.

In Draw, Insert -> Image gives me a single, readable image, but it's unusable for a bank statement.

In Acrobat Reader, File -> Open opens it correctly, preserving the selectable text.

In PDF24 Reader, File -> Open opens it correctly and preserves the selectable text, but it groups the text, and when I select text the selection doesn't follow the mouse.

Comment 4 V Stuart Foote 2025-12-27 22:39:24 UTC

Sorry, but uncompressing your "Bullzip PDF" print does not expose any usable font or glyph /ToUnicode tables for the poppler/cairo import filter. I get a different set of character glyphs when the PDF is uncompressed.

Meanwhile, the fully pdfium-based filter renders the "printout" to a raster image placed onto the document canvas.
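
As a rough illustration of the kind of check described above (not necessarily how it was done for this report), the sketch below counts font dictionaries and /ToUnicode references in the raw bytes of an already-uncompressed PDF. The function name `count_font_markers` is hypothetical, and the approach is an assumption: it does no real PDF parsing and will miss anything stored in compressed streams.

```python
import re

# "/Type /Font" (with or without the space), but not "/Type /FontDescriptor".
FONT_TYPE = re.compile(rb"/Type\s*/Font(?![A-Za-z])")
# A reference to a ToUnicode CMap inside a font dictionary.
TO_UNICODE = re.compile(rb"/ToUnicode(?![A-Za-z])")


def count_font_markers(pdf_bytes: bytes) -> dict:
    """Count font dictionaries and /ToUnicode references in raw PDF bytes.

    Only meaningful when object dictionaries are stored uncompressed;
    this is a byte-level scan, not a PDF parser.
    """
    return {
        "fonts": len(FONT_TYPE.findall(pdf_bytes)),
        "to_unicode": len(TO_UNICODE.findall(pdf_bytes)),
    }
```

For a real file the bytes would come from something like `open(path, "rb").read()`. If the count of fonts is well above the count of /ToUnicode references, at least some fonts probably give a text extractor nothing to map glyph codes back to Unicode.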

Otherwise, PDFs are not intended to be "editable", and the project's import filter, which reconstructs text spans extracted via poppler into cairo draw objects, is *dependent* on text strings being exposed to the poppler libs.

Having a non-legible text extraction by poppler is indicative that the source PDF is encoded with other than Unicode, or with Unicode PUA.
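
The PUA symptom mentioned above can be checked mechanically on text extracted or copied out of a PDF. The sketch below is only a heuristic under that assumption; the helper names `is_private_use` and `private_use_ratio` are hypothetical and not part of LibreOffice or poppler.

```python
def is_private_use(ch: str) -> bool:
    """True if the character lies in one of the Unicode Private Use Areas."""
    cp = ord(ch)
    return (
        0xE000 <= cp <= 0xF8FF          # BMP Private Use Area
        or 0xF0000 <= cp <= 0xFFFFD     # Supplementary Private Use Area-A
        or 0x100000 <= cp <= 0x10FFFD   # Supplementary Private Use Area-B
    )


def private_use_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are PUA code points."""
    visible = [ch for ch in text if not ch.isspace()]
    if not visible:
        return 0.0
    return sum(is_private_use(ch) for ch in visible) / len(visible)
```

A ratio close to 1.0 suggests the text was mapped into the PUA; a ratio near 0.0 with text that is still unreadable points instead at a non-Unicode encoding or arbitrary glyph codes.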

I would note that, when checked, I got similar results opening the PDF with Inkscape, which is likewise poppler based. It doesn't work there either.

=> NOB

Comment 5 Eyal Rozenberg 2025-12-28 00:54:32 UTC Comment hidden (obsolete)

(In reply to V Stuart Foote from comment #1)
Stuart, stop trying to deflect bug reports about opening PDFs by suggesting people insert them as images. That's a different feature and not an alternative to actually importing the PDF.

Comment 6 Eyal Rozenberg 2025-12-28 01:04:52 UTC

I might actually say this is "not a bug", in the following sense: If I open this PDF in a regular PDF viewer app, like KPDF, and copy the text out of it - I get the same gibberish that OP has reported. That is, it seems that junk is what's in the file, and Bullzip somehow mapped this to glyphs of visually-readable text.

If the text _had_ been accessible in a PDF viewer, I would say this is our bug, since fonts not being available is no reason not to produce a reasonable rendering with a fallback font.

On the other hand - we could say that this is a bug in Bullzip PDF, so NOTOURBUG is also reasonable.

Comment 7 Jose L Viejo 2025-12-28 19:33:28 UTC

I find the 'Not a Bug' assessment arbitrary. Relying on KPDF’s failure to justify LibreOffice's behavior is flawed logic; it simply suggests that KPDF shares the same limitation or rendering bug as Draw.

Blaming Bullzip or labeling the file content as 'junk' overlooks the critical fact that Adobe Acrobat Reader—the reference implementation of the PDF standard—renders and prints the file perfectly.

We should not use third-party interpreters (like KPDF, PDF24, etc.) as the benchmark for correctness. If they all fail, it is not an excuse for LibreOffice to fail as well. The fact remains: the file works flawlessly in Acrobat, meaning the visual data is valid. LibreOffice is failing to interpret what the standard viewer handles correctly. Therefore, this is a compatibility bug.

Comment 8 V Stuart Foote 2025-12-28 21:50:19 UTC

(In reply to Jose L Viejo from comment #7)
> I find the 'Not a Bug' assessment arbitrary. Relying on KPDF’s failure to
> justify LibreOffice's behavior is flawed logic; it simply suggests that KPDF
> shares the same limitation or rendering bug as Draw.
> 
> Blaming Bullzip or labeling the file content as 'junk' overlooks the
> critical fact that Adobe Acrobat Reader—the reference implementation of the
> PDF standard—renders and prints the file perfectly.
> 
> We should not use third-party interpreters (like KPDF, PDF24, etc.) as the
> benchmark for correctness. If they all fail, it is not an excuse for
> LibreOffice to fail as well. The fact remains: the file works flawlessly in
> Acrobat, meaning the visual data is valid. LibreOffice is failing to
> interpret what the standard viewer handles correctly. Therefore, this is a
> compatibility bug

No. You are missing the point. LibreOffice reads the obfuscated PDF just fine using a pdfium based filter--as you indicated.

LibreOffice is not failing; it is doing exactly what it is able to do. And the PDF is internally corrupt from the perspective of the poppler parsing library.

For these, or any other PDFs in which poppler cannot read text spans, the object streams *are* parsed with poppler lib calls, but they contain obfuscated character mappings used for the print job, without the Unicode glyph tables poppler needs to map characters back to their Unicode code points. Those code points are *necessary* to assemble the cairo drawing objects placed onto the LibreOffice drawing canvas or Writer page.

What is passed is garbage because it is not structured in a fashion usable by poppler lib extraction. Just reality.

Absent that minimal structure, there is nothing an import filter parsing text runs from the PDF can do; it is just reading what's there. Any poppler-based application will suffer the same way.
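
A toy model may make the point concrete. It assumes a print driver that embeds a font subset and assigns glyph codes in order of first use. That is an illustrative assumption, not a description of what Bullzip or poppler actually do, and real PDFs use CMap streams rather than Python dictionaries. All names below are hypothetical.

```python
def build_subset_encoding(text: str) -> dict:
    """Assign glyph codes 1, 2, 3, ... in order of first appearance."""
    codes = {}
    for ch in text:
        if ch not in codes:
            codes[ch] = len(codes) + 1
    return codes


def encode(text: str, codes: dict) -> list:
    """Turn text into the glyph codes stored in the page content."""
    return [codes[ch] for ch in text]


def decode(glyph_codes: list, to_unicode: dict = None) -> str:
    """Recover text from glyph codes.

    With a code -> character map the text comes back intact. Without one,
    the only option is to treat each raw code as a code point.
    """
    if to_unicode is None:
        return "".join(chr(code) for code in glyph_codes)
    return "".join(to_unicode[code] for code in glyph_codes)


codes = build_subset_encoding("SALDO")
glyphs = encode("SALDO", codes)
to_unicode = {code: ch for ch, code in codes.items()}

with_map = decode(glyphs, to_unicode)   # readable text
without_map = decode(glyphs)            # control characters, i.e. garbage
```

A renderer needs only the glyph codes and the embedded outlines to draw the page correctly, which is why the file can look fine on screen while text extraction returns garbage.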

Contrast these results with the pdfium-based filter, which makes *no* attempt at all to parse text runs; it simply places what it receives as a discrete object. It reads exactly what the PDF page layout states for each object, with each character and graphic element laid out as parsed, all assembled and then rendered to a raster image (with some user control).

=> NOT OUR BUG (i.e. not our responsibility to correct Bullzip PDF, or the poppler project)