Bug 78216 - PDF export should not remap embedded font
Summary: PDF export should not remap embedded font
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Printing and PDF export
(earliest affected) release
Hardware: Other All
Importance: medium normal
Assignee: Not Assigned
Keywords: filter:pdf
Duplicates: 78215
Depends on:
Blocks: PDF-Export
Reported: 2014-05-03 07:17 UTC by Markus Klingspor
Modified: 2019-08-21 21:00 UTC

Description Markus Klingspor 2014-05-03 07:17:32 UTC
When exporting to PDF, the font gets embedded into the PDF. To save space, only the characters actually used are embedded if the font can be subsetted. The characters are remapped to codes starting at 0. To make it possible to extract the text from the PDF, a ToUnicode CMap is added. However, when the PDF is converted to PostScript and back to PDF, the CMap is lost, because PostScript has no notion of ToUnicode and instead relies on CMaps embedded in the font.
When printing the document to PDF, the font still gets subsetted, but the characters are not remapped. Therefore, although the ToUnicode map also gets lost during processing, the text can still be extracted.
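The difference between the two code paths can be illustrated with a toy model. This is only a sketch under strong simplifying assumptions: it treats an un-remapped character code as the character's Unicode code point, which real PDF font encodings do not guarantee, and the function names (`subset_with_remap`, `subset_without_remap`, `extract`) are hypothetical and are not LibreOffice code.

```python
def subset_with_remap(text):
    """Assign codes 0, 1, 2, ... to characters in order of first use.

    Returns the encoded text and the code -> character map that plays
    the role of the ToUnicode CMap in this toy model.
    """
    code_of = {}
    for ch in text:
        if ch not in code_of:
            code_of[ch] = len(code_of)
    codes = [code_of[ch] for ch in text]
    to_unicode = {code: ch for ch, code in code_of.items()}
    return codes, to_unicode


def subset_without_remap(text):
    """Keep each character's original code (assumed here to be its code point)."""
    return [ord(ch) for ch in text]


def extract(codes, to_unicode=None):
    """Recover text from codes.

    With a map, look each code up. Without one (the map was lost in the
    PostScript round trip), fall back to reading codes as code points.
    """
    if to_unicode is not None:
        return "".join(to_unicode[c] for c in codes)
    return "".join(chr(c) for c in codes)


text = "Hello"
remapped, cmap = subset_with_remap(text)

print(remapped)                                   # codes assigned in order of first use
print(extract(remapped, cmap))                    # map present: text is recovered
print(repr(extract(remapped)))                    # map lost: control characters, not text
print(extract(subset_without_remap(text)))        # no remap: text survives without a map
```

In this model the remapped codes carry no meaning of their own, so losing the map makes the text unrecoverable, while the un-remapped codes can still be read directly.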
The effect can be reproduced by exporting to PDF and then using pdf2ps followed by gs -sDEVICE=pdfwrite to generate a PDF again.
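The round trip described above might be scripted roughly as follows. The report only names `pdf2ps` and `gs -sDEVICE=pdfwrite`; the file names, the use of Ghostscript's `-o` option to set the output file, and the helper names `roundtrip_commands` and `run_roundtrip` are assumptions made for this sketch.

```python
import shutil
import subprocess


def roundtrip_commands(src_pdf, ps_path, out_pdf):
    """Build the two commands: PDF -> PostScript, then PostScript -> PDF."""
    to_ps = ["pdf2ps", src_pdf, ps_path]
    # -o sets the output file (an assumption; the report gives only -sDEVICE=pdfwrite)
    to_pdf = ["gs", "-sDEVICE=pdfwrite", "-o", out_pdf, ps_path]
    return [to_ps, to_pdf]


def run_roundtrip(src_pdf, ps_path="roundtrip.ps", out_pdf="roundtrip.pdf"):
    """Run the round trip if both tools are on PATH; return True if it ran."""
    if not (shutil.which("pdf2ps") and shutil.which("gs")):
        return False
    for cmd in roundtrip_commands(src_pdf, ps_path, out_pdf):
        subprocess.run(cmd, check=True)
    return True
```

Calling `run_roundtrip("exported.pdf")` on a file produced by the PDF export, and then trying to copy or extract text from `roundtrip.pdf`, should show whether the text is still recoverable.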
Comment 1 Julien Nabet 2014-05-03 08:20:00 UTC
*** Bug 78215 has been marked as a duplicate of this bug. ***
Comment 2 steve 2014-10-27 12:38:06 UTC
Since we have a duplicate report, this issue has effectively already been confirmed, so setting to NEW.
Comment 3 QA Administrators 2015-12-20 16:07:07 UTC Comment hidden (obsolete)
Comment 4 Markus Klingspor 2016-01-13 16:16:45 UTC
The problem still exists with Version on MacOSX 10.11.2
Comment 5 QA Administrators 2017-10-30 08:29:10 UTC Comment hidden (obsolete)
Comment 6 Khaled Hosny 2019-08-21 21:00:26 UTC
This is how PDF works, and nearly every PDF producer does the same. Copying text from PostScript is not something we can generally support, since there are often complex character-to-glyph relationships that can’t be addressed by simply not remapping the font.