Bug 123234 - Missing characters while exporting to PDF using certain fonts
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Printing and PDF export
Version: Inherited From OOo (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: ⁨خالد حسني⁩
URL:
Whiteboard: target:7.5.0 target:7.4.3 inReleaseNo...
Keywords: bibisected, filter:pdf, needUITest
Depends on:
Blocks: PDF-Export
Reported: 2019-02-07 19:20 UTC by tamius.han
Modified: 2022-12-08 14:30 UTC

See Also:
Crash report or crash signature:


Attachments
Screenshot of the issue (313.71 KB, image/png)
2019-02-07 19:22 UTC, tamius.han
Sample document (4.64 MB, application/vnd.oasis.opendocument.text)
2019-02-07 19:25 UTC, tamius.han
Sample document gets rendered as this (26.11 KB, application/pdf)
2019-02-07 19:26 UTC, tamius.han

Description tamius.han 2019-02-07 19:20:24 UTC
When exporting to PDF, certain special characters (žščć) do not appear in the exported PDF.

Said characters are definitely supported by the font, though, as Writer (and every other program for that matter) renders them without a problem. They only go missing when exporting to PDF.
Comment 1 tamius.han 2019-02-07 19:22:11 UTC
Created attachment 148987
Screenshot of the issue
Comment 2 tamius.han 2019-02-07 19:25:52 UTC
Created attachment 148988
Sample document
Comment 3 tamius.han 2019-02-07 19:26:51 UTC
Created attachment 148989
Sample document gets rendered as this
Comment 4 tamius.han 2019-02-07 19:27:50 UTC
The IM Fell fonts are provided by the otf-im-fell-types package in the AUR (Arch User Repository): https://aur.archlinux.org/packages/otf-im-fell-types/
Comment 5 Durgapriyanka 2019-02-08 16:16:41 UTC
Thank you for reporting the bug. I can confirm the bug is present in

Version: 6.3.0.0.alpha0+
Build ID: b6b28931435e44aca92b8c0e1659f701e3ed1a87
CPU threads: 2; OS: Windows 6.1; UI render: default; VCL: win; 
TinderBox: Win-x86@42, Branch:master, Time: 2019-01-30_06:57:04
Locale: en-US (en_US); UI-Language: en-US
Calc: threaded
Comment 6 Xisco Faulí 2019-02-12 17:38:15 UTC
Also reproduced in

Version: 5.2.0.0.alpha0+
Build ID: 3ca42d8d51174010d5e8a32b96e9b4c0b3730a53
Threads 4; Ver: 4.15; Render: default; 

Version: 4.3.0.0.alpha1+
Build ID: c15927f20d4727c3b8de68497b6949e72f9e6e9e

but not in

Version 4.1.0.0.alpha0+ (Build ID: efca6f15609322f62a35619619a6d5fe5c9bd5a)
Comment 7 Buovjaga 2019-02-12 19:59:28 UTC
(In reply to Xisco Faulí from comment #6)
> but not in
> 
> Version 4.1.0.0.alpha0+ (Build ID: efca6f15609322f62a35619619a6d5fe5c9bd5a)

Weird: I repro in oldest of 41max, oldest of 43all (+ last36onmaster).
Comment 8 Buovjaga 2019-04-15 15:59:24 UTC
Repro in oldest commit of win32-4.3
Comment 9 Timur 2021-03-19 10:17:57 UTC
Reproduced in 6.3.0 but not in 6.4.0 or 7.2+ on Windows. So it looks fixed there, but a rebisect and, I guess, a UI test would be useful.

Linux still has the bug in 7.2+. GUI and headless seem to behave differently.
In the bibisect repo 50max, latest is good and 5.2 oldest is bad with headless, so it is not bibisectable that way.
In 41max, oldest is good and latest is bad; I used GEN.
Comment 12 Timur 2021-03-19 11:14:12 UTC
@Xisco, please see if the sources for previous bibisect commits are of some value, just 11 of them:
https://cgit.freedesktop.org/libreoffice/core/log/?qt=range&q=6db8d0ba581463dfe1a791404044e7b1a1051bfa..c12ab867f282e783507fcf74ab5c90e784681f65
Comment 13 ⁨خالد حسني⁩ 2022-09-13 19:23:39 UTC
I very much doubt this is fixed anywhere (unless one is using the TTF version of the fonts). It is not a regression either: the code where this bug happens hasn’t changed since it was originally written.

These characters use a deprecated mechanism for making accented glyphs in the CFF table, and our font subsetter does not support it.
Comment 14 ⁨خالد حسني⁩ 2022-09-13 23:51:42 UTC
I thought this was going to be a quick fix, but it is a bit more complicated.

These OpenType fonts contain a CFF table describing glyph outlines, but when embedding fonts in PDF we convert the CFF table to a Type 1 font.
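
As a rough illustration of the CFF versus TTF distinction: the first four bytes of a font file indicate which outline flavour it carries. `OTTO` marks an OpenType font with CFF outlines, while `0x00010000` (or `true`) marks TrueType outlines. The helper name `sfnt_flavor` is hypothetical and not part of LibreOffice; this is only a minimal Python sketch for checking which version of a font is installed.

```python
def sfnt_flavor(header: bytes) -> str:
    """Classify a font by the first four bytes of its sfnt header."""
    if len(header) < 4:
        raise ValueError("need at least 4 bytes of the font file")
    tag = header[:4]
    if tag == b"OTTO":
        # OpenType with CFF (or CFF2) outlines: the case affected here.
        return "cff"
    if tag in (b"\x00\x01\x00\x00", b"true"):
        # TrueType outlines (glyf table).
        return "truetype"
    if tag == b"ttcf":
        # Font collection; each member font has its own header.
        return "collection"
    return "unknown"
```

With a real file this could be called as `sfnt_flavor(open(path, "rb").read(4))`, where `path` is whatever font file is being checked.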

The CFF table uses what is called the Type 2 Charstring format (Type 1 fonts use the Type 1 format). The Type 2 spec has a deprecated section with this text:

> endchar – adx ady bchar achar endchar (14) |–
>
> In addition to the optional width (see section 4.2, “Operator for Finishing a
> Path” for more details) endchar may have four extra arguments that correspond
> exactly to the last four arguments of the Type 1 charstring command “seac”
> (see Type 1 Font Format book). The Type 1 charstring command argument asb is
> not included because all sidebearings are considered to be zero and hence
> unencoded in Type 2 charstrings.
>
> It is important to note the following restrictions which are the same as
> those for Type 1 but frequently overlooked.
> The bchar and achar refer to glyph names in StandardEncoding and not to any
> current font encoding or re-encoding. This requires that a glyph name be
> determined from bchar and achar via StandardEncoding and then the appropriate
> charstring be located by that name.
>
> This construct can only be used to build glyphs from components named in
> StandardEncoding. This construct may not be nested. 
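
To make the quoted rule concrete, here is a minimal Python sketch of how the operands left on the stack at `endchar` could be interpreted. It is not the LibreOffice code; the function name `split_endchar_args` and the table `STANDARD_ENCODING_EXCERPT` are hypothetical, and the table lists only a handful of StandardEncoding entries for illustration.

```python
# Partial excerpt of StandardEncoding (code -> glyph name). Only a few
# entries are listed; a real implementation needs the full table.
STANDARD_ENCODING_EXCERPT = {
    97: "a", 99: "c", 101: "e", 115: "s", 122: "z",
    193: "grave", 194: "acute", 195: "circumflex",
    200: "dieresis", 207: "caron",
}


def split_endchar_args(stack):
    """Return (width, accent_info) for the operands present at endchar.

    accent_info is None for a plain endchar, otherwise a tuple
    (adx, ady, base_name, accent_name).
    """
    if len(stack) < 4:
        # Plain endchar: at most the optional width is present.
        width = stack[0] if stack else None
        return width, None
    # Deprecated form: the last four operands are adx ady bchar achar,
    # and anything before them is the optional width.
    width = stack[0] if len(stack) > 4 else None
    adx, ady, bchar, achar = stack[-4:]
    # bchar and achar are StandardEncoding codes, not font encoding codes.
    base_name = STANDARD_ENCODING_EXCERPT[bchar]
    accent_name = STANDARD_ENCODING_EXCERPT[achar]
    return width, (adx, ady, base_name, accent_name)
```

For a made-up operand stack `[0, 550, 122, 207]`, this resolves the base glyph to `z` and the accent to `caron`, which is how a glyph such as ž could be described with this mechanism.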

So my first thought was to re-encode this deprecated use of the endchar operator as a seac operator, and this almost works, except when achar and bchar are not part of the font subset already determined. At this level we can’t extend the font subset to include them (this has to be determined at a much higher level than the code that can catch this endchar use). So the next fix is to decompose these accented glyphs and use the outlines of bchar and achar directly in the subset font, but this is more involved than I can allocate time for right now.
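
The limitation of the seac approach can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code from the patch: `reencode_as_seac` is a hypothetical name, the subset is modelled as a plain set of glyph names, and asb is taken as 0 because the quoted text says sidebearings are considered zero in Type 2 (whether that is right for the converted Type 1 font is an assumption).

```python
def reencode_as_seac(adx, ady, bchar, achar, base_name, accent_name, subset):
    """Return Type 1 seac operands, or None if seac cannot be used.

    subset is the set of glyph names already chosen for the subset font.
    """
    if base_name not in subset or accent_name not in subset:
        # seac refers to other glyphs by StandardEncoding code, so both
        # components must be in the subset font. They cannot be added from
        # here, so the caller has to fall back to decomposing the glyph.
        return None
    asb = 0  # assumption: sidebearing treated as zero, per the Type 2 text
    return [asb, adx, ady, bchar, achar, "seac"]
```

A `None` result corresponds to the case described above, where the base or accent glyph is missing from the subset and the outlines have to be copied into the accented glyph instead.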

For code pointers, start at https://git.libreoffice.org/core/+/refs/heads/master/vcl/source/fontsubset/cff.cxx#890, where size() >= 4 indicates a deprecated use of endchar, then figure out how to get the outlines of bchar and achar and offset achar by adx and ady.
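
The decomposition idea can be sketched in Python like this. It is a simplification and not the actual subsetter logic: outlines are modelled as lists of contours with absolute (x, y) points, whereas real charstrings use relative drawing operators and hints, and the name `decompose_accented` is hypothetical.

```python
def decompose_accented(base_contours, accent_contours, adx, ady):
    """Build one outline from a base glyph and an accent shifted by (adx, ady).

    Each argument is a list of contours; a contour is a list of (x, y) points.
    The inputs are not modified.
    """
    shifted_accent = [
        [(x + adx, y + ady) for (x, y) in contour]
        for contour in accent_contours
    ]
    # The composite glyph is the base outline followed by the moved accent.
    return [list(contour) for contour in base_contours] + shifted_accent
```

Because the result carries both outlines itself, it no longer depends on the base or accent glyph being present in the font subset.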
Comment 15 ⁨خالد حسني⁩ 2022-09-14 08:33:33 UTC
A patch is at https://gerrit.libreoffice.org/c/core/+/139908 but it does not work quite right. The accented letters show up, but the accents are shifted. It is already too much code to support a deprecated feature. At this point, I’d recommend using the TTF versions of the fonts or asking the designer not to use this deprecated mechanism.
Comment 16 Commit Notification 2022-10-15 17:09:28 UTC
Khaled Hosny committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/03ff7ee47c6b4e0dbf38a040825aaca53ce2ed28

tdf#123234: Fix subsetting CFF deprecated endchar

It will be available in 7.5.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 17 Commit Notification 2022-10-15 20:40:59 UTC
Khaled Hosny committed a patch related to this issue.
It has been pushed to "libreoffice-7-4":

https://git.libreoffice.org/core/commit/45318605384fac327a8b487e0132c4559d25a29c

tdf#123234: Fix subsetting CFF deprecated endchar

It will be available in 7.4.3.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 18 Stéphane Guillou (stragu) 2022-12-08 14:30:10 UTC
Fix verified in:

Version: 7.5.0.0.alpha1+ (X86_64) / LibreOffice Community
Build ID: ad085990b8073a122ac5222e5220f8f1d6826dcf
CPU threads: 8; OS: Linux 5.15; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
Calc: threaded

Thanks Khaled!