Bug 101220 - Fonts subset into PDF are not being used to render PDF to canvas, receive questionable fallback replacement
Status: RESOLVED WONTFIX
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: filters and storage
Version (earliest affected): 3.6.7.2 release
Hardware: x86-64 (AMD64) All
Importance: high normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: difficultyInteresting, easyHack, filter:pdf, skillCpp, topicDebug
Duplicates: 101373 101422 101427 106203 108003 114846 115776 128054 129893 140005
Depends on:
Blocks: PDF-Import-Draw
Reported: 2016-07-30 08:07 UTC by E.Mi
Modified: 2022-10-07 11:57 UTC
CC List: 20 users

See Also:
Crash report or crash signature:


Attachments
pdf (218.03 KB, application/pdf), 2016-07-30 08:07 UTC, E.Mi
screenshot (385.81 KB, image/png), 2016-07-30 08:09 UTC, E.Mi
Screenshot (78.85 KB, image/png), 2016-07-30 11:10 UTC, m_a_riosv
Evince properties (136.13 KB, image/jpeg), 2016-08-08 07:55 UTC, E.Mi
PDF sample (73.49 KB, application/pdf), 2016-10-16 14:36 UTC, RGB

Description E.Mi 2016-07-30 08:07:10 UTC
Created attachment 126482 [details]
pdf
Comment 1 E.Mi 2016-07-30 08:09:14 UTC
Created attachment 126483 [details]
screenshot
Comment 2 m_a_riosv 2016-07-30 11:10:42 UTC
Created attachment 126487 [details]
Screenshot

Cannot reproduce:
Win10 x64
Version: 5.1.5.2 (x64)
Build ID: 7a864d8825610a8c07cfc3bc01dd4fce6a9447e5
CPU Threads: 1; OS Version: Windows 6.19; UI Render: default; 
Locale: es-ES (es_ES); Calc: CL
Comment 3 V Stuart Foote 2016-07-31 12:13:21 UTC
Cannot reproduce. The checkboxes are well formed and the PDF filter works correctly when opening into Draw, and also with the new filter when inserting as an image with a recent master build.

On CentOS/RHEL 7.2 64-bit en-US with parallel installs of

Version: 5.1.5.2
Build ID: 7a864d8825610a8c07cfc3bc01dd4fce6a9447e5
CPU Threads: 2; OS Version: Linux 3.10; UI Render: default; 
Locale: en-US (en_US.UTF-8); Calc: group

Version: 5.3.0.0.alpha0+
Build ID: faddbce32ed863bda4238e54dd11df1b468ccd86
CPU Threads: 2; OS Version: Linux 3.10; UI Render: default; 
TinderBox: Linux-rpm_deb-x86_64@70-TDF, Branch:master, Time: 2016-07-01_05:15:51
Locale: en-US (en_US.UTF-8)


Also correct filter handling on Windows 10 Pro 64-bit en-US with

Version: 5.1.5.2 (x64) (/A parallel install)
Build ID: 7a864d8825610a8c07cfc3bc01dd4fce6a9447e5
CPU Threads: 8; OS Version: Windows 6.19; UI Render: GL; 
Locale: en-US (en_US); Calc: CL

Version: 5.2.0.4 (x64) (fully installed)
Build ID: 066b007f5ebcc236395c7d282ba488bca6720265
CPU Threads: 8; OS Version: Windows 6.19; UI Render: default; 
Locale: en-US (en_US)

Version: 5.3.0.0.alpha0+ (/A parallel install)
Build ID: 4a6329badc9c8679945d1a1ec225e26e15d7bfd2
CPU Threads: 8; OS Version: Windows 6.2; UI Render: GL; 
TinderBox: Win-x86@62-merge-TDF, Branch:MASTER, Time: 2016-07-30_10:25:39
Locale: en-US (en_US); Calc: CL
Comment 4 V Stuart Foote 2016-07-31 12:15:31 UTC
@ekari,

Please provide your Linux distro and desktop environment in use so we can attempt to reproduce your issue.
Comment 5 E.Mi 2016-07-31 12:42:14 UTC
I'm using Ubuntu GNOME 16.04.1; the desktop is GNOME Shell.
Comment 6 Buovjaga 2016-08-07 12:28:17 UTC
Confirmed.

Arch Linux 64-bit, KDE Plasma 5
Version: 5.3.0.0.alpha0+
Build ID: f3d26af51588af441f62fb69bb7a5432845226ac
CPU Threads: 8; OS Version: Linux 4.6; UI Render: default; 
Locale: fi-FI (fi_FI.UTF-8); Calc: group
Built on August 5th 2016

Arch Linux 64-bit
Version 3.6.7.2 (Build ID: e183d5b)
Comment 7 V Stuart Foote 2016-08-07 13:32:12 UTC
The two check boxes in use are glyphs from specific Microsoft CID-encoding and Identity-H symbol fonts with Unicode PUA values:

upper set is U+F035 from Wingdings2


lower set is U+F031 from Webdings


Both fonts appear to be subset embedded into the PDF.

So, for some reason, this looks to be a font substitution issue on Linux.

I guess the question is how Evince, Okular, or XPDF handle the checkbox glyphs. If they also balk, the issue is with the OS and desktop. If not, the issue is in the LibreOffice filter.
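
For reference, a minimal sketch (Python, standard library only) showing that both code points named above fall in the Unicode Private Use Area, so they carry no standard meaning and a fallback font has no reason to supply a matching glyph. The helper name `is_bmp_private_use` is made up for this illustration.

```python
import unicodedata

# Basic Multilingual Plane Private Use Area: U+E000..U+F8FF.
BMP_PUA = range(0xE000, 0xF8FF + 1)


def is_bmp_private_use(code_point: int) -> bool:
    """Return True if the code point lies in the BMP Private Use Area."""
    return code_point in BMP_PUA


for cp in (0xF035, 0xF031):
    # General category "Co" means "Other, private use".
    print(f"U+{cp:04X}", is_bmp_private_use(cp), unicodedata.category(chr(cp)))
```

What a PUA code point displays depends entirely on the font that defines it, which is why substituting a different font changes or loses the checkbox.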
Comment 8 V Stuart Foote 2016-08-08 02:20:29 UTC
NEEDINFO: how do Evince, Okular, or XPDF handle the test PDF?
Comment 9 Buovjaga 2016-08-08 07:54:53 UTC
Okular handled it just fine.
Comment 10 E.Mi 2016-08-08 07:55:10 UTC
Created attachment 126660 [details]
Evince properties

These are the document properties shown by Evince. It says that some other fonts are not embedded in the PDF and that substitution will occur.
Comment 11 V Stuart Foote 2016-08-08 12:53:06 UTC
Well, since Evince and Okular handle extracting the subsetted Webdings and Wingdings2, it looks to be our filter handling.

It is not clear whether this is an issue with LibreOffice font substitution on Linux, or with the PDF filter's extraction of the subset fonts from the PDF.

Another NEEDINFO: could you use a current build of master and check the handling of the PDF from Insert -> Image (filter enhancements for bug 89727), rather than opening the PDF into Draw (or Impress or Writer)?
Comment 12 E.Mi 2016-08-09 08:06:05 UTC
@Buovjaga

Could you test for me?
Comment 13 V Stuart Foote 2016-08-10 13:15:29 UTC
(In reply to ekari from comment #12)
> @Buovjaga
> 
> Could you test for me?

Version: 5.3.0.0.alpha0+
Build ID: 107a7cc5a2f1c018cbba6b35f3ea590027f8ec9a
CPU Threads: 2; OS Version: Linux 3.10; UI Render: default; 
TinderBox: Linux-rpm_deb-x86_64@70-TDF, Branch:master, Time: 2016-08-10_00:20:02
Locale: en-US (en_US.UTF-8); Calc: group

Confirmed: the LibreOffice PDF filter(s) are not extracting the subset fonts for use in composing the PDF onto the canvas.

The new insert-as-image filter also does not extract the subset fonts for use.

Breaking the PDF apart in Draw and selecting the text shows the checkbox glyphs from Wingdings2 and Webdings as not installed and receiving a replacement glyph.
Comment 14 V Stuart Foote 2016-08-10 14:32:11 UTC
*** Bug 101427 has been marked as a duplicate of this bug. ***
Comment 15 V Stuart Foote 2016-08-10 14:46:29 UTC
Reviewing duplicate bug 101427: on page 17 of attachment 126720 [details], the subset fonts not being extracted are Computer Modern and Bitstream Charter.

Text rendered in those fonts receives a fallback font with different metrics.

This behavior of the PDF filter also affects Windows builds (and, I assume, OS X builds); it is not limited to Linux systems.

If the fonts are not installed and available on the system, the subset fonts in the PDF are not used when composing the PDF onto the LibreOffice document canvas.
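
For anyone triaging similar reports: subset fonts can usually be recognized by name, because the PDF specification tags a subset font's name with six uppercase letters followed by a plus sign. A minimal sketch, assuming the font names have already been read from the PDF by some other tool; the names used below are made-up examples, not taken from the attachments here.

```python
import re

# Subset tag: exactly six uppercase letters, then "+", then the base font name.
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+(?P<base>.+)$")


def split_subset_tag(font_name: str):
    """Return (is_subset, base_name) for a font name read from a PDF."""
    match = SUBSET_TAG.match(font_name)
    if match is None:
        return False, font_name
    return True, match.group("base")


# Hypothetical names for illustration only.
for name in ("ABCDEF+Wingdings2", "Helvetica"):
    print(name, split_subset_tag(name))
```

A name that carries the tag tells you only that the font is a subset; it says nothing about which glyphs survived the subsetting.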
Comment 16 vvort 2016-08-10 15:32:53 UTC
Is this report the same as bug 85295?
Comment 17 Thorsten Behrens (allotropia) 2016-08-10 15:51:45 UTC
Interesting - since the pdf import internally generates an odf file, now that font embedding for odf is implemented, that should not be excessively hard to fix.

Code pointers:
 - sdext/source/pdfimport/tree/drawtreevisiting.cxx: aFontProps[ "fo:font-family" ] = rFont.familyName
   for where to write the stuff
 - xmloff/source/style/XMLFontAutoStylePool.cxx: if( tryToEmbedFonts ) for code that actually embeds fonts (note that you want the bExportFlat case)
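
To make the target concrete, here is a rough sketch of the kind of ODF font declaration that references an embedded font file. This is not LibreOffice code; it only builds the XML with Python's standard library. It assumes the package (zipped) form with the font stored under a hypothetical path `Fonts/Wingdings2.ttf`; the flat-XML case mentioned above has no package to point into and is not shown.

```python
import xml.etree.ElementTree as ET

NS = {
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "svg": "urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0",
    "xlink": "http://www.w3.org/1999/xlink",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix: str, local: str) -> str:
    """Build a namespace-qualified name in ElementTree's {uri}local form."""
    return f"{{{NS[prefix]}}}{local}"


def font_face(family: str, href: str) -> ET.Element:
    """Build a style:font-face element pointing at an embedded font file."""
    face = ET.Element(
        q("style", "font-face"),
        {q("style", "name"): family, q("svg", "font-family"): family},
    )
    src = ET.SubElement(face, q("svg", "font-face-src"))
    ET.SubElement(
        src,
        q("svg", "font-face-uri"),
        {q("xlink", "href"): href, q("xlink", "type"): "simple"},
    )
    return face


# The family name and path are illustrative assumptions.
print(ET.tostring(font_face("Wingdings2", "Fonts/Wingdings2.ttf"), encoding="unicode"))
```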
Comment 18 vvort 2016-08-10 15:57:37 UTC
> that should not be excessively hard to fix

Please note that many "incorrect" characters have no Unicode representation.
And it would be hard to embed a character with code 0x01 into ODF, for example.
But I could be wrong, of course.
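
A small sketch of why code 0x01 is a problem for an XML-based format such as ODF: XML 1.0 excludes most control characters from its allowed character set, even when written as a numeric character reference. The check below follows the XML 1.0 `Char` production; the function names are made up for this illustration.

```python
import xml.etree.ElementTree as ET


def is_xml10_char(cp: int) -> bool:
    """Return True if the code point is allowed by the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def parses_as_text(cp: int) -> bool:
    """Try to parse a document holding the code point as a character reference."""
    try:
        ET.fromstring(f"<t>&#{cp};</t>")
        return True
    except ET.ParseError:
        return False


for cp in (0x01, 0xF035):
    print(f"0x{cp:04X}", is_xml10_char(cp), parses_as_text(cp))
```

Note that a Private Use Area value such as U+F035 is a legal XML character, so the concern applies to glyphs addressed by low control codes rather than to PUA-mapped ones.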
Comment 19 V Stuart Foote 2016-08-10 16:59:37 UTC
(In reply to vvort from comment #16)
> Is this report the same as bug 85295?

Yes, that one started the same but went awry. Let's give it a See Also and concentrate here.

Don't we have to do more work in poppler-cairo in the import filter to pull the fonts out? Something has been missing from the PDF import filter since it was implemented, if Evince, Okular, and even ImageMagick and GIMP have no trouble extracting the fonts. Interestingly, Inkscape also chokes on the subset and substitutes.
Comment 20 jani 2016-08-11 06:38:58 UTC
The topic<foo> keyword and a code pointer are missing; both are required for easyhacks.

setting status to NEEDINFO
Comment 21 jani 2016-08-11 10:20:04 UTC Comment hidden (obsolete)
Comment 22 jani 2016-08-11 10:20:04 UTC Comment hidden (obsolete)
Comment 23 V Stuart Foote 2016-08-12 13:59:31 UTC
*** Bug 101422 has been marked as a duplicate of this bug. ***
Comment 24 V Stuart Foote 2016-08-12 14:05:26 UTC
*** Bug 101373 has been marked as a duplicate of this bug. ***
Comment 25 V Stuart Foote 2016-08-21 02:59:38 UTC
*** Bug 101611 has been marked as a duplicate of this bug. ***
Comment 26 RGB 2016-10-16 14:36:55 UTC
Created attachment 128030 [details]
PDF sample

I just tested today's 5.3 alpha build and found this problem when inserting a PDF as an image into Writer. All text (numbers and Greek characters) is in the wrong font.

Attached is a simple PDF created when testing the LabPlot software. The PDF looks OK in Okular, and its properties show that the font is correctly embedded.

Note that the font is substituted even if it is present on the system.

If the PDF is turned into SVG and then inserted, the SVG works without a problem.
Comment 27 V Stuart Foote 2017-03-01 23:47:42 UTC
*** Bug 106203 has been marked as a duplicate of this bug. ***
Comment 28 V Stuart Foote 2017-05-22 17:35:27 UTC
*** Bug 108003 has been marked as a duplicate of this bug. ***
Comment 29 RGB 2017-05-22 18:28:04 UTC
The problem seems fixed for 5.4.0.0.beta1!
Comment 30 V Stuart Foote 2017-05-22 19:36:32 UTC
(In reply to RGB from comment #29)
> The problem seems fixed for 5.4.0.0.beta1!

No, please look more closely...  

The ipdf filter to insert is now pdfium based and handles subset fonts correctly, rendering a high fidelity but low resolution bitmap to document canvas. 

But the pdfimport filter continues to use fallback font substitution rather than parsing the subset fonts. The result has poor fidelity to the original fonts, but the text stays as high-resolution vector draw objects. Problems arise when the fallback replacement lacks coverage of a code point, or when the source font uses Private Use Area (PUA) addressing.
Comment 31 V Stuart Foote 2018-01-07 15:06:43 UTC
*** Bug 114846 has been marked as a duplicate of this bug. ***
Comment 32 V Stuart Foote 2018-02-16 17:27:20 UTC
*** Bug 115776 has been marked as a duplicate of this bug. ***
Comment 33 V Stuart Foote 2019-10-09 17:20:49 UTC
*** Bug 128054 has been marked as a duplicate of this bug. ***
Comment 34 V Stuart Foote 2019-10-09 17:26:32 UTC
(In reply to V Stuart Foote from comment #33)
> *** Bug 128054 has been marked as a duplicate of this bug. ***

On Windows build of 6.3.2.2

attachment 154867 [details] from the duplicate issue opens with fallback fonts (poorly), but inserts cleanly as an image.
Comment 35 V Stuart Foote 2020-01-09 03:46:19 UTC
*** Bug 129893 has been marked as a duplicate of this bug. ***
Comment 36 V Stuart Foote 2021-01-29 21:26:02 UTC
*** Bug 140005 has been marked as a duplicate of this bug. ***
Comment 37 ⁨خالد حسني⁩ 2022-09-28 16:22:07 UTC
This is unfixable. Fonts embedded in PDF are not generally usable for editing.

1) They are usually subset fonts, so any character not used in the document with a given font will be missing.
2) They are often Type 1 or bare CFF fonts, neither of which we can use as standalone fonts.
3) They are also usually stripped of any OpenType layout tables (so no ligatures, kerning, Arabic or Indic support, etc.).
4) Glyphs are re-encoded and the cmap table is usually not usable, since the original Unicode encoding is lost.

The only case where the embedded fonts are usable is when a full OpenType font is embedded, and this rarely happens because of the file size implications.
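
To illustrate point 1 with a toy model (plain Python, no font library involved): treat a subset font as nothing more than the set of characters the original document used, then see what an edit would need. The strings are invented for the example.

```python
def missing_characters(subset_chars, new_text):
    """Return the characters of new_text that the subset lacks, in first-seen order."""
    seen = set()
    missing = []
    for ch in new_text:
        if ch not in subset_chars and ch not in seen:
            seen.add(ch)
            missing.append(ch)
    return missing


# The subset holds only what the original text needed.
original = "Total: 42"
subset = set(original)

# Editing a single digit already asks for a glyph the subset never contained.
print(missing_characters(subset, "Total: 43"))  # ['3']
```

Real subsets are keyed by glyph rather than by character, which is where point 4 makes matters worse, but the editing problem is the same.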

My suggestion is to close as WONTFIX.
Comment 38 Hossein 2022-10-07 11:57:50 UTC
(In reply to خالد حسني from comment #37)
> My suggestion is to close as WONTFIX.
I agree.

In case someone wants to extract embedded fonts from a PDF file, there are several programs and services for this purpose. Some are listed here:

How can I extract embedded fonts from a PDF as valid font files?
https://stackoverflow.com/q/3488042/2316442