Bug 85295 - PDF: handling of embedded fonts, glyphs, subsets
Status: RESOLVED WONTFIX
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Draw
Version: unspecified (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: filter:pdf
Duplicates: 79044 147178 150132
Depends on:
Blocks: PDF-Import-Draw
Reported: 2014-10-21 16:19 UTC by Alexandr
Modified: 2023-04-24 18:03 UTC
CC List: 9 users

See Also:
Crash report or crash signature:


Attachments
A file where the integrated font ain't used, instead a much bigger one that messes it all up. (336.42 KB, application/x-xz)
2015-02-09 23:59 UTC, Jouni Järvinen
An Okular list of three embedded subsets (23.36 KB, image/png)
2022-12-22 01:24 UTC, Graham Perrin

Description Alexandr 2014-10-21 16:19:05 UTC
The PDF file format allows fonts to be embedded. This makes PDF suitable when you need correct representation on different systems. Unfortunately, LibreOffice pdfimport does not support this feature and often destroys documents (see bug 63834, bug 76500, bug 77830 and bug 79044). If it is impossible to import the fonts, it would be useful to at least preserve them, so that LibreOffice is usable for small changes to a PDF.
The difficulty is that usually not the whole font is embedded, but only a subset of it.
Comment 1 A (Andy) 2014-10-24 19:49:29 UTC
Reproducible with LO 4.3.2.2 (Win 8.1)
Comment 2 Jouni Järvinen 2015-02-09 23:58:05 UTC
Reproducible on 4.4.0.3, Win7 x64.
Comment 3 Jouni Järvinen 2015-02-09 23:59:23 UTC
Created attachment 113273
A file where the integrated font ain't used, instead a much bigger one that messes it all up.
Comment 4 QA Administrators 2016-02-21 08:34:34 UTC Comment hidden (obsolete)
Comment 5 Jouni Järvinen 2016-02-21 16:39:42 UTC
Reproducible on 5.1.0 RC3 aka 5.1.0.3, Win7 x64.

Overview of issues:
1) texts aren't center-aligned, instead more than somewhat to the right, flowing over the borders
2) in case of multi-row text,
2.1) the font is too large
2.2) text ain't on multiple rows
3) the pages seem to be of different sizes from each other

The font is too big for this purpose. That alone causes #1 and #2.*.
Comment 6 Jouni Järvinen 2016-02-21 16:41:46 UTC
(In reply to Jouni Järvinen from comment #5)
> Reproducible on 5.1.0 RC3 aka 5.1.0.3, Win7 x64.
> 
> Overview of issues:
> 1) texts aren't center-aligned, instead more than somewhat to the right,
> flowing over the borders
> 2) in case of multi-row text,
> 2.1) the font is too large
> 2.2) text ain't on multiple rows
> 3) the pages seem to be of different sizes from each other
> 
> The font is too big for this purpose. That alone causes #1 and #2.*.

My bad on that one, I didn't re-check my text. Ignore it, and take this one instead:

Overview of issues:
1) texts aren't center-aligned, instead more than somewhat to the right, flowing over the borders
2) the pages seem to be of different sizes from each other in LO, but perfect in Foxit Reader, everything being in the middle.
Comment 7 Heiko Tietze 2016-05-10 12:03:43 UTC
*** Bug 79044 has been marked as a duplicate of this bug. ***
Comment 8 QA Administrators 2017-09-01 11:15:16 UTC Comment hidden (obsolete)
Comment 9 Michael Stahl (allotropia) 2017-10-12 19:56:28 UTC
PDF does not embed fonts, it embeds those glyphs which are used.

LibreOffice is an editing application and so importing only a subset
of a font is frustrating and useless in practice because you can't
edit the document to add letters that weren't used in the original PDF.
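
To make the point about editing concrete, here is a minimal sketch in Python. It assumes a hypothetical subset that holds glyphs only for the characters of the string "Hello PDF" and reports which characters of an edited string have no glyph. The helper name `missing_glyphs` and the sample strings are illustrative and not taken from LibreOffice.

```python
def missing_glyphs(subset_chars, new_text):
    """Return the characters of new_text that the subset has no glyph for."""
    available = set(subset_chars)
    missing = []
    # Keep first-seen order and report each missing character only once.
    for ch in new_text:
        if ch not in available and ch not in missing:
            missing.append(ch)
    return missing


# Hypothetical subset: only the glyphs used by the original text were embedded.
subset = "Hello PDF"
print(missing_glyphs(subset, "Hello PDF"))   # unchanged text
print(missing_glyphs(subset, "Hello ODF!"))  # an edit that needs new glyphs
```

The second call shows the problem: the upper-case "O" and the "!" were never used in the original text, so a subset would have nothing to draw them with.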
Comment 10 LibreTraining 2018-05-17 21:53:37 UTC
(In reply to Michael Stahl from comment #9)
> PDF does not embed fonts, it embeds those glyphs which are used.
> 
> LibreOffice is an editing application and so importing only a subset
> of a font is frustrating and useless in practice because you can't
> edit the document to add letters that weren't used in the original PDF.

PDFs can embed full and complete installable fonts.

The fonts must have the embedding setting in the font set to Installable.
For example, the Liberation and Noto fonts included with LO are Installable.

The application creating the PDF must embed the entire font file.
Adobe applications will not do this no matter if the font allows it.
There is no reason why LibreOffice cannot be enabled to embed an entire font in a PDF when it is clearly allowed by the font creator.

The application reading the PDF must support installing the embedded font file.
Some PDF readers will install the embedded fonts.
Again, there is no reason why LibreOffice when importing a PDF cannot be enabled to install embedded fonts when the fonts are properly embedded, and the font allows it.

The font sub-setting that goes on is caused by the font's internal embedding restrictions, by the application that creates the PDF (which either restricts full embedding or cannot do it), or both.
Sometimes the font sub-setting is a user option setting which is designed to minimize the size of the PDF. Just like ODT files with embedded fonts, PDF files with embedded full font files can be quite large.

So this is possible.
It is not a limitation of the PDF format.
It is a limitation of LibreOffice.
Comment 11 Timur 2022-02-28 14:15:00 UTC
I don't know why this was closed. A bug may (unfortunately) remain open for years or decades, but I don't support WontFix.
I often see similar reports that a "PDF doesn't look good", where submitters are surely not satisfied with any explanation that leaves the wrong look in place.

Some more examples are attachment 165948 from bug 137128, where AvenirLTStd-Heavy is in the PDF but AvenirLTStd in LO, and attachment 178037 from bug 147178, where ArialMT is in the PDF but Arial in LO.
Comment 12 V Stuart Foote 2022-02-28 15:18:36 UTC
(In reply to Timur from comment #11)
> I don't know why this was closed. A bug may (unfortunately) remain open for
> years or decades, but I don't support WontFix.
> I often see similar reports that a "PDF doesn't look good", where submitters
> are surely not satisfied with any explanation that leaves the wrong look in
> place.
> 
> Some more examples are attachment 165948 from bug 137128, where
> AvenirLTStd-Heavy is in the PDF but AvenirLTStd in LO, and attachment 178037
> from bug 147178, where ArialMT is in the PDF but Arial in LO.

Michael S. in comment 9 is correct. Unless the creator of the PDF intentionally and legally embeds the entire font, only the subset of glyphs actually used in the document is available in the PDF. That covers the vast majority of PDFs that LO users would be importing or inserting.

While it is possible to parse and load a fully embedded font, doing so would be unreliable and error-prone, as there is no real standard for it across the many PDF generators.

Insertion to canvas via the pdfium libs is a non-issue, leaving import ("opening") into an ODG drawing, where internal PDF font naming is notoriously fickle--all meaning that what we do now on import is appropriate. And for the few PDFs that actually embed a full font, we are safe to ignore the practice.

LibreOffice is NOT a PDF editor, and we have adjusted the PDF form handling to use only the "builtin" PDF reader fonts [1] on export. We have no obligation to do more with font handling for what is intended to be a non-editable presentation format--doing more only disappoints users who discover that neither their imported nor their exported PDF retains fidelity to the original source font.

IMHO => WF was and remains the correct resolution to avoid unreasonable user expectation of what is a low frequency usage. Not worth any dev effort.

=-ref-=
[1] for bug 50879 - https://gerrit.libreoffice.org/c/core/+/99032
Comment 13 Timur 2022-02-28 15:31:49 UTC
*** Bug 147178 has been marked as a duplicate of this bug. ***
Comment 14 Heiko Tietze 2022-03-01 07:25:47 UTC
AFAIU, we do not use the few glyphs stored in the PDF because they do not make up a complete font and editing won't be possible. Makes sense to me. If the PDF contains the whole font, on the other hand, we should use it.

I suggest showing an infobar if a font was not shipped with the PDF. Something like "Fonts in this document have been replaced. For full editing capabilities please ask the original author to embed the whole font."
(My concern is the list of bugs in c0 regarding the issue.)

And we should provide options to export the font, because it's our mission to make documents readable cross-platform and cross-application. According to [1], the open font spec knows: No embedding, Print and preview, Editable, Installable. And MSO allows saving fonts accordingly.

[1] https://www.microsoft.com/en-us/microsoft-365/blog/2015/07/06/document-font-embedding-demystified/
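
As a rough illustration of the four levels named above: in OpenType fonts they are recorded in the `fsType` field of the OS/2 table. The Python sketch below decodes such a value. It assumes the usage-permission bits are mutually exclusive, as in newer versions of the OS/2 table, and flags anything else as unrecognised. The function name `describe_fs_type` and the sample values are illustrative only, and reading the field out of an actual font file is left out.

```python
# Usage-permission bits of the OS/2 fsType field (lowest four bits).
USAGE_MASK = 0x000F
LEVELS = {
    0x0000: "Installable",
    0x0002: "No embedding",       # "Restricted License embedding" in the spec
    0x0004: "Print and preview",
    0x0008: "Editable",
}
NO_SUBSETTING = 0x0100   # font may only be embedded as a whole
BITMAP_ONLY = 0x0200     # only bitmaps from the font may be embedded


def describe_fs_type(fs_type):
    """Decode an fsType value into an embedding level and two extra flags."""
    level = LEVELS.get(fs_type & USAGE_MASK, "Unrecognised combination")
    return {
        "level": level,
        "no_subsetting": bool(fs_type & NO_SUBSETTING),
        "bitmap_only": bool(fs_type & BITMAP_ONLY),
    }


# Sample values, chosen for illustration only.
for value in (0x0000, 0x0008, 0x0104):
    print(hex(value), describe_fs_type(value))
```

Under this reading, only a value whose permission bits are all zero would allow an importing application to install an embedded font.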
Comment 15 Timur 2022-07-27 07:53:27 UTC
*** Bug 150132 has been marked as a duplicate of this bug. ***
Comment 16 Graham Perrin 2022-12-22 01:24:28 UTC
Created attachment 184304
An Okular list of three embedded subsets

Please: is the attachment at bug 152627 comment 0 also an example of this bug 85295?

In this case, there was no intention to add any character (instead, I might have removed parts of e-mail addresses).

Opening the file (keyword: filter:pdf) results in an essential difference: 

* wherever there's a monospace font, Draw presents proportional (not monospace).

The attachment here shows the three embedded subsets.
Comment 17 Graham Perrin 2022-12-22 02:14:19 UTC
(In reply to Graham Perrin from comment #16)

> * wherever there's a monospace font, Draw presents 
>   proportional (not monospace).

Might this improve through the change that's currently associated with bug 143659? 

Maybe more useful than the preceding screenshot: the information below was amongst output from strings(1) for the PDF. 


…
57 0 obj
<< /Type /FontDescriptor
   /FontName /IRJSJC+CairoFont-0-0
   /Flags 4
   /FontBBox [ -64 -240 907 760 ]
   /ItalicAngle 0
   /Ascent 760
   /Descent -240
   /CapHeight 760
   /StemV 80
   /StemH 80
   /FontFile3 53 0 R
>>
endobj
26 0 obj
<< /Type /Font
   /Subtype /Type1
   /BaseFont /IRJSJC+CairoFont-0-0
   /FirstChar 32
   /LastChar 117
   /FontDescriptor 57 0 R
   /Encoding /WinAnsiEncoding
   /Widths [ 260 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 285 0 0 0 0 0 0 0 0 0 740 0 549 0 0 0 0 0 0 0 0 0 0 0 0 551 579 0 0 0 0 0 0 0 0 0 0 0 0 604 633 514 0 591 0 0 0 0 305 0 0 982 0 619 0 0 454 0 434 657 ]
    /ToUnicode 55 0 R
>>
endobj
…


…
62 0 obj
<< /Type /FontDescriptor
   /FontName /CTGIPS+CairoFont-1-0
   /Flags 4
   /FontBBox [ 0 -240 917 765 ]
   /ItalicAngle 0
   /Ascent 765
   /Descent -240
   /CapHeight 765
   /StemV 80
   /StemH 80
   /FontFile3 58 0 R
>>
endobj
27 0 obj
<< /Type /Font
   /Subtype /Type1
   /BaseFont /CTGIPS+CairoFont-1-0
   /FirstChar 32
   /LastChar 133
   /FontDescriptor 62 0 R
   /Encoding /WinAnsiEncoding
   /Widths [ 260 0 408 0 0 0 0 0 0 0 0 0 268 322 0 372 572 572 572 572 572 0 0 0 0 0 268 0 572 0 572 0 899 0 0 0 730 0 519 0 0 0 0 0 0 0 0 0 605 0 0 0 0 0 0 930 0 0 0 0 0 0 0 0 0 561 0 480 615 564 344 615 618 258 0 0 258 935 618 605 615 0 413 479 361 0 0 0 0 510 0 0 0 0 0 0 0 0 0 0 0 791 ]
    /ToUnicode 60 0 R
>>
endobj
…


…
67 0 obj
<< /Type /FontDescriptor
   /FontName /UUAJCH+CairoFont-2-0
   /Flags 4
   /FontBBox [ 0 -235 602 765 ]
   /ItalicAngle 0
   /Ascent 765
   /Descent -235
   /CapHeight 765
   /StemV 80
   /StemH 80
   /FontFile3 63 0 R
>>
endobj
28 0 obj
<< /Type /Font
   /Subtype /Type1
   /BaseFont /UUAJCH+CairoFont-2-0
   /FirstChar 32
   /LastChar 133
   /FontDescriptor 67 0 R
   /Encoding /WinAnsiEncoding
   /Widths [ 602 0 0 0 0 0 0 0 602 602 602 0 602 602 602 602 602 602 602 602 602 602 602 602 602 602 602 0 0 602 0 602 602 602 602 602 602 602 602 0 0 602 0 0 0 602 602 0 602 0 602 602 602 602 602 602 602 602 0 0 0 0 0 602 0 602 602 602 602 602 602 602 602 602 602 602 602 602 602 602 602 602 602 602 602 602 602 602 602 602 602 0 0 0 0 0 0 0 0 0 0 602 ]
    /ToUnicode 65 0 R
>>
endobj
…
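
Font objects like the ones above can also be checked mechanically. The Python sketch below is a rough heuristic, not a PDF parser: it pulls each `/BaseFont` name and `/Widths` array out of such text with regular expressions, splits off the subset tag (six upper-case letters followed by `+`, the PDF convention for naming font subsets), and treats a font as fixed-pitch when all of its non-zero widths are equal. The function names are hypothetical, and the `SAMPLE` text is an abbreviated excerpt in the same layout, not the full objects.

```python
import re

# A subset font name is prefixed with a tag of six upper-case letters and "+".
SUBSET_TAG = re.compile(r"^([A-Z]{6})\+(.+)$")
# Pair each /BaseFont name with the next /Widths array that follows it.
FONT_ENTRY = re.compile(r"/BaseFont\s+/(\S+).*?/Widths\s*\[([^\]]*)\]", re.S)


def split_subset_tag(base_font):
    """Return (tag, name); tag is None when the name has no subset prefix."""
    m = SUBSET_TAG.match(base_font)
    return (m.group(1), m.group(2)) if m else (None, base_font)


def looks_fixed_pitch(widths):
    """True when every glyph that is present (non-zero width) has one width."""
    present = {w for w in widths if w != 0}
    return len(present) == 1


def summarize_fonts(dump):
    """Return a list of (name, subset_tag, fixed_pitch) for each font found."""
    results = []
    for base_font, widths_text in FONT_ENTRY.findall(dump):
        widths = [float(w) for w in widths_text.split()]
        tag, name = split_subset_tag(base_font)
        results.append((name, tag, looks_fixed_pitch(widths)))
    return results


# Abbreviated excerpt for illustration; the real Widths arrays are longer.
SAMPLE = """
26 0 obj
<< /Type /Font
   /BaseFont /IRJSJC+CairoFont-0-0
   /Widths [ 260 0 0 285 740 549 ]
>>
endobj
28 0 obj
<< /Type /Font
   /BaseFont /UUAJCH+CairoFont-2-0
   /Widths [ 602 0 0 602 602 602 ]
>>
endobj
"""

for name, tag, fixed in summarize_fonts(SAMPLE):
    print(name, tag, "fixed-pitch" if fixed else "proportional")
```

By this heuristic the third font (CairoFont-2-0, every listed width 602) is the monospace one, and the subset tags in the names indicate that only subsets are embedded.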
Comment 18 Graham Perrin 2022-12-22 02:16:48 UTC
> Might this improve through the change that's currently associated with bug
> 143659? 

My bad, a typo there. Should have been bug 143095 …

Apologies for the noise.
Comment 19 ⁨خالد حسني⁩ 2022-12-22 07:55:48 UTC
Closing again as WONTFIX for the same reasons explained in https://bugs.documentfoundation.org/show_bug.cgi?id=101220#c37.

An extra info bar might be a UX improvement, but that deserves its own issue.
Comment 20 ⁨خالد حسني⁩ 2022-12-22 07:58:58 UTC
(In reply to LibreTraining from comment #10)
> (In reply to Michael Stahl from comment #9)
> > PDF does not embed fonts, it embeds those glyphs which are used.
> > 
> > LibreOffice is an editing application and so importing only a subset
> > of a font is frustrating and useless in practice because you can't
> > edit the document to add letters that weren't used in the original PDF.
> 
> PDFs can embed full and complete installable fonts.

Yes, but this is rarely done because it can increase file size significantly, so even if we supported extracting such fonts it would not help with any of the reported issues, since these PDFs do not embed full fonts.