Bug 114844 - Font substitution in attached PDF changed between 5.2 and 5.3
Status: RESOLVED NOTABUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: graphics stack
Version (earliest affected): 5.3.0.3 release
Hardware: All Linux (All)
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: bibisected, bisected, regression
Depends on:
Blocks: Font-Substitution
Reported: 2018-01-05 10:51 UTC by Aron Budea
Modified: 2018-04-30 22:59 UTC (History)
CC List: 4 users

See Also:
Crash report or crash signature:


Attachments
2016CommunicationRallyjobapplication.pdf (206.41 KB, application/pdf)
2018-01-05 10:51 UTC, Aron Budea
Screenshot (173.39 KB, image/png)
2018-01-05 10:52 UTC, Aron Budea
Comparison screenshot (5.2.0.4 vs 6.1 build, Linux) (220.50 KB, image/png)
2018-01-07 02:29 UTC, Aron Budea

Description Aron Budea 2018-01-05 10:51:54 UTC
Created attachment 138893 [details]
2016CommunicationRallyjobapplication.pdf

Open the attached PDF in Draw (downloaded from [1]).

=> Text in the note and several table cells don't fit and overlap other cells.
Checkboxes are also rendered incorrectly, but that's independent from this bug.

Observed using LO 6.1 master build (a0e136d2cbb3784ddfcbddcfed5d784c8e4c9a64) & 5.3.0.3 / Ubuntu 17.04.
PDF is rendered fine in 5.2.0.4.
=> regression

PDF also looks fine in 6.0.0.1 / Windows 7. => Linux only

The bug starts with the following commit (with SAL_USE_COMMON_LAYOUT environment variable set):

https://cgit.freedesktop.org/libreoffice/core/commit/?id=828b8cf4d26c4d72c1f2146fd7a5bbb3b0465718
author		Akash Jain <akash96j@gmail.com>	2016-07-06 10:35:24 +0530
committer	Khaled Hosny <khaledhosny@eglug.org>	2016-10-18 20:41:29 +0200

"GSoC: Integrate new CommonSalLayout in unx/ code"

[1] http://www.oces.tulsacounty.org/4h/4hForms/2016CommunicationRallyjobapplication.pdf
Comment 1 Aron Budea 2018-01-05 10:52:20 UTC
Created attachment 138894 [details]
Screenshot
Comment 2 V Stuart Foote 2018-01-05 16:49:49 UTC
Your screen shot shows font substitutions are being made for several of the PalatinoLinoType fonts subsetted into the PDF. Similar for me on Windows builds.

So, IMHO this is correct behavior (if referenced font is not installed on system) but then fallback handling of font metrics is not ideal.

Unlike other PDF "viewers", for our purposes of extracting PDF content, on filter import of a PDF the font substitution has to be made--as we likely will need to change text to use glyphs that are not included with the available subset in the PDF. 

And the trouble comes with the fallback mechanism, which is passed off to the OS to deal with. 

I don't know if the PDF font embedding includes all the metrics--anyone?  But not sure we can improve that if the font metrics are not available.

So, not really an issue with HarfBuzz?
Comment 3 Xavier Van Wijmeersch 2018-01-06 17:21:57 UTC
I opened the attachment and the only problem I see is that the checkboxes are rendered incorrectly.

Version: 5.3.8.0.0+
Build ID: 7f1297d9b4f449eb9ada8008fb21b7046d1a8f19
CPU Threads: 8; OS Version: Linux 4.14; UI Render: default; VCL: kde4; Layout Engine: new; 
TinderBox: Linux-rpm_deb-x86_64@70-TDF, Branch:libreoffice-5-3, Time: 2017-11-10_15:56:34
Locale: nl-BE (en_US.UTF-8); Calc: group

Version: 6.1.0.0.alpha0+
Build ID: 4ead201c578ce4cc17f65d2a97a591e112307a1a
CPU threads: 8; OS: Linux 4.14; UI render: default; VCL: kde4; 
TinderBox: Linux-rpm_deb-x86_64@70-TDF, Branch:master, Time: 2017-12-31_00:43:41
Locale: nl-BE (en_US.UTF-8); Calc: group threaded
Comment 4 Aron Budea 2018-01-07 02:28:09 UTC
(In reply to V Stuart Foote from comment #2)
> Your screen shot shows font substitutions are being made for several of the
> PalatinoLinoType fonts subsetted into the PDF. Similar for me on Windows
> builds.
Right, I should've paid closer attention to the details; the fonts are obviously different. Let me attach a comparison screenshot between 5.2.0.4 and the 6.1 master build.

> Unlike other PDF "viewers", for our purposes of extracting PDF content, on
> filter import of a PDF the font substitution has to be made--as we likely
> will need to change text to use glyphs that are not included with the
> available subset in the PDF. 
This sounds logical, however it doesn't explain two things:
- why did the font substitution change between 5.2 and 5.3, and what does it have to do with the common layout change?
- why is it fine in Windows? I don't have the font there either, and it pretty much looks the same as the pre-5.3 version in Linux.
Comment 5 Aron Budea 2018-01-07 02:29:08 UTC
Created attachment 138932 [details]
Comparison screenshot (5.2.0.4 vs 6.1 build, Linux)
Comment 6 خالد حسني (Khaled Hosny) 2018-01-07 10:53:04 UTC
What are the fonts used in the document in each version? Probably the pre-HarfBuzz version is using a Type 1 font that has closer metrics to the font used in the PDF (which wouldn't be an issue on Windows as it doesn't usually come with Type 1 fonts).
Comment 7 V Stuart Foote 2018-01-07 14:45:18 UTC
(In reply to Khaled Hosny from comment #6)
> What are the fonts used in the document in each version? 

Looks like reading the font and its metrics happens in the import filter [1], but then gets handled for fallback elsewhere.

Ironic that the only way to tell now which font gets used is to export from Draw to PDF and compare. We'd need something like bug 61134 or bug 78186 to help here.
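
A rough way to do that comparison from the command line, assuming poppler's `pdffonts` tool is installed, is to list the fonts in the original PDF and in the PDF exported from Draw, then diff the two sets. The sketch below is only an illustration: the helper names `parse_pdffonts` and `fonts_in_pdf` are hypothetical, and the parser assumes the usual `pdffonts` table layout (a header row, a row of dashes, then one row per font with the name in the first column).

```python
import re
import subprocess

# Subsetted fonts carry a six-uppercase-letter tag and "+" before the name.
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")


def parse_pdffonts(output):
    """Return the set of font names found in pdffonts-style table output."""
    lines = output.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("---"):
            # The first run of dashes marks the width of the name column.
            width = len(line.split()[0])
            rows = lines[i + 1:]
            break
    else:
        return set()  # no separator row, so nothing recognisable to parse

    names = set()
    for row in rows:
        name = row[:width].strip()
        if name:
            names.add(SUBSET_TAG.sub("", name))
    return names


def fonts_in_pdf(path):
    """Run pdffonts on a file (assumes the tool is on PATH) and parse it."""
    result = subprocess.run(
        ["pdffonts", path], capture_output=True, text=True, check=True
    )
    return parse_pdffonts(result.stdout)
```

With two hypothetical files, `fonts_in_pdf("original.pdf") - fonts_in_pdf("exported.pdf")` would give the fonts that were replaced on import, and the reverse difference the fonts that replaced them.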


=-ref-=
[1] https://opengrok.libreoffice.org/xref/core/sdext/source/pdfimport/wrapper/wrapper.cxx?#588
Comment 8 Aron Budea 2018-01-10 12:35:38 UTC
As far as I can see the good versions (pre-5.3 Linux versions and pre/post-5.3 Windows versions) actually use Palatino Linotype, and post-5.3 Linux versions use Bitstream Vera Serif.
Comment 9 Adolfo Jayme Barrientos 2018-01-12 09:47:23 UTC
(In reply to Khaled Hosny from comment #6)
> What are the fonts used in the document in each version? Probably the
> pre-HarfBuzz version is using a Type 1 font that has closer metrics to the
> font used in the PDF (which wouldn't be an issue on Windows as it doesn't
> usually come with Type 1 fonts).

That would be my guess as well; Ubuntu includes URW Palladio L as a Palatino substitute, but that is a Type 1 font, which would explain its disappearance from newer LibreOffice versions.
Comment 10 Aron Budea 2018-01-12 14:15:57 UTC
(In reply to Adolfo Jayme from comment #9)
> That would be my guess as well; Ubuntu includes URW Palladio L as a Palatino
> substitute, but that is a Type 1 font, which would explain its disappearance
> from newer LibreOffice versions.
Ah, I didn't know there were clones of this font. Not only that, there are actually new FOSS versions as well, FPL Neu and TeX Gyre Pagella.
E.g., in Ubuntu, TeX Gyre Pagella can be installed separately through the 'tex-gyre' package, which fixes font substitution in the attached PDF.
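
To check which installed font fontconfig would return for a missing family, before and after installing such a package, something like the following may help. It is a minimal sketch that assumes the `fc-match` tool from fontconfig is on the PATH; `parse_fc_match` and `match_font` are hypothetical helper names. LibreOffice may also apply its own replacement rules, so the result only approximates what Draw ends up using.

```python
import shutil
import subprocess

# Ask fc-match for exactly three fields, separated by "|" for easy parsing.
FC_FORMAT = "%{family}|%{style}|%{file}"


def parse_fc_match(output):
    """Split 'family|style|file' output from fc-match into a dict."""
    family, style, path = output.strip().split("|", 2)
    return {"family": family, "style": style, "file": path}


def match_font(requested):
    """Return fontconfig's best match for a family name, or None if
    fc-match is not installed."""
    if shutil.which("fc-match") is None:
        return None
    result = subprocess.run(
        ["fc-match", "-f", FC_FORMAT, requested],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_fc_match(result.stdout)
```

Calling `match_font("Palatino Linotype")` on a system without that font shows which family would be substituted; if the answer is a font with very different metrics, overlapping text like that in the screenshots is to be expected.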

A general question about font substitution: could the handling of metrics be improved when the font is missing, to avoid bad-looking output such as that in attachment 138932 [details]? I don't know what kind of information can be collected from the PDF.
Comment 11 ⁨خالد حسني⁩ 2018-04-30 22:59:40 UTC
(In reply to Aron Budea from comment #10)
> A general question about font substitution, could the handling of metrics be
> improved when the font is missing.

No. If the font is missing, we have no idea what its metrics were; this kind of information is not embedded in Office documents.

Closing this as not a bug. If one does not have the exact font, all bets are off and the end result is system-dependent.