Bug 112152 - Narrow No-Break Space (U+202F) causes PDF Error by using bundled Liberation fonts
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Printing and PDF export
Version: 5.1.0.3 release (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: خالد حسني
URL:
Whiteboard: target:7.5.0 target:7.4.3
Keywords: bibisected, bisected, regression
Duplicates: 129672
Depends on:
Blocks: PDF-Export-Invalid
 
Reported: 2017-08-31 19:16 UTC by janina-cleemann
Modified: 2022-10-15 13:20 UTC (History)
CC List: 8 users

See Also:
Crash report or crash signature:
Regression By:


Attachments
Document with narrow no-break space (7.54 KB, application/vnd.oasis.opendocument.text)
2017-09-09 16:20 UTC, Buovjaga
ODT - with U+202F (8.36 KB, application/vnd.oasis.opendocument.text)
2019-04-11 15:42 UTC, [REDACTED]
ODT - without U+202F (8.36 KB, application/vnd.oasis.opendocument.text)
2019-04-11 15:45 UTC, [REDACTED]
PDF - CUPS-PDF print/export of ODT - with U+202F (9.28 KB, application/pdf)
2019-04-11 15:48 UTC, [REDACTED]
Adobe Reader DC still errors on the imbedded font (38.37 KB, image/png)
2022-10-02 17:06 UTC, V Stuart Foote
LO750 Writer exportToPDF with embeddedFont, Acrobat errors can not extract the embedded font (8.67 KB, application/pdf)
2022-10-02 23:17 UTC, V Stuart Foote

Description janina-cleemann 2017-08-31 19:16:08 UTC
Description:
When a document that uses the Liberation Serif font and contains a Narrow No-Break Space is exported to PDF, the following error occurs after scrolling: "Cannot extract the embedded font 'BAAAAA+LiberationSerif'. Some characters may not display or print correctly."

Steps to Reproduce:
1. Copy a Narrow No-Break Space, e.g. from Wikipedia
2. Paste it into a Writer document
3. Export to PDF
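
For testers who would rather not copy the character from a web page, the character in step 1 can also be produced programmatically. The following is a minimal Python sketch and not part of the original report; the helper name `find_nnbsp` is made up for illustration. It creates U+202F and reports where it occurs in a string, which can help confirm that a pasted text really contains the character.

```python
import unicodedata

# U+202F NARROW NO-BREAK SPACE, the character used in step 1.
NNBSP = "\u202f"


def find_nnbsp(text):
    """Return the indices at which U+202F occurs in text (hypothetical helper)."""
    return [i for i, ch in enumerate(text) if ch == NNBSP]


if __name__ == "__main__":
    sample = "10" + NNBSP + "km"
    print(unicodedata.name(NNBSP))   # official Unicode character name
    print(find_nnbsp(sample))        # positions of U+202F in the sample
    print(NNBSP.encode("utf-8"))     # UTF-8 bytes to look for in raw text
```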

Actual Results:
The following error message appears: "Cannot extract the embedded font 'BAAAAA+LiberationSerif'. Some characters may not display or print correctly."

Expected Results:
No error message.


Reproducible: Always

User Profile Reset: No

Additional Info:


User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36 Edge/15.15063
Comment 1 Buovjaga 2017-09-09 16:20:45 UTC
Created attachment 136136 [details]
Document with narrow no-break space

I don't see any problems with PDF export.

Arch Linux 64-bit, KDE Plasma 5
Version: 6.0.0.0.alpha0+
Build ID: a27eb931c22313d4dd5c73b35358c0532d20b79e
CPU threads: 8; OS: Linux 4.12; UI render: default; VCL: kde4; 
Locale: fi-FI (fi_FI.UTF-8); Calc: group
Built on September 8th 2017
Comment 2 Xisco Faulí 2018-10-17 17:26:32 UTC
Could you please try to reproduce it with the latest version of LibreOffice
from https://www.libreoffice.org/download/libreoffice-fresh/ ?
I have set the bug's status to 'NEEDINFO'. Please change it back to
'UNCONFIRMED' if the bug is still present in the latest version.
Comment 3 [REDACTED] 2019-04-11 15:42:51 UTC
Created attachment 150701 [details]
ODT - with U+202F
Comment 4 [REDACTED] 2019-04-11 15:44:36 UTC
Prompted by a question at ask.libreoffice.org (see https://ask.libreoffice.org/en/question/190046/exported-pdf-gives-liberationserif-font-error-in-adobe-acrobat-reader/), I ran the following test.

1) Create one document containing a single U+202F character (Liberation Sans)
2) Create one document with almost the same text but no U+202F character
3) Export each .odt file to PDF using LibreOffice's PDF Export function
4) Open both PDFs in Adobe Acrobat Reader DC 2018.010.20099 on Windows 10 Pro 1809.17763.437
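
Step 3 can be scripted for repeated testing. The sketch below assumes the `soffice` binary is on the PATH and relies on LibreOffice's headless `--convert-to pdf` mode; the helper names and file names are placeholders. The test in this comment used the PDF Export function in the UI, so treating headless conversion as equivalent is an assumption.

```python
import subprocess


def pdf_export_command(odt_path, out_dir, soffice="soffice"):
    """Build the headless conversion command line (hypothetical helper)."""
    return [soffice, "--headless", "--convert-to", "pdf",
            "--outdir", out_dir, odt_path]


def export_all(odt_paths, out_dir):
    """Convert each document; raises CalledProcessError if soffice fails."""
    for path in odt_paths:
        subprocess.run(pdf_export_command(path, out_dir), check=True)


if __name__ == "__main__":
    # Placeholder names for the two test documents from steps 1 and 2.
    # Only the commands are printed here; call export_all() to run them.
    for name in ("with_u202f.odt", "without_u202f.odt"):
        print(" ".join(pdf_export_command(name, "pdf_out")))
```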

Result 1 - File containing U+202F
This confirms the reporter's result: I get the same message when clicking on the text in Acrobat Reader. Please note: other PDF readers may not report the error/warning, and a PDF produced by another creator such as cups-pdf does not trigger the error when opened in Adobe Reader.

Result 2 - File *not* containing U+202F
No error/warning reported in Acrobat Reader.

PDF creation done with:
Version: 6.2.2.2
Build ID: 2b840030fec2aae0fd2658d8d4f9548af4e3518d
CPU threads: 8; OS: Linux 4.12; UI render: default; VCL: kde5; 
Locale: en-GB (en_GB); UI-Language: en-US
Calc: threaded
Comment 5 [REDACTED] 2019-04-11 15:45:50 UTC
Created attachment 150702 [details]
ODT - without U+202F
Comment 6 [REDACTED] 2019-04-11 15:48:05 UTC
Created attachment 150703 [details]
PDF - CUPS-PDF print/export of ODT - with U+202F
Comment 7 Buovjaga 2019-04-11 18:10:32 UTC
Setting status to NEW per comment 4.
Comment 8 V Stuart Foote 2019-04-11 21:44:18 UTC
Confirmed on Windows 10 Ent 64-bit en-US (1803) with
Version: 6.3.0.0.alpha0+ (x64)
Build ID: 74ed80b5744fdfacf9b9c3ef8ab235c64510c20d
CPU threads: 8; OS: Windows 10.0; UI render: GL; VCL: win; 
TinderBox: Win-x86_64@42, Branch:master, Time: 2019-04-11_04:13:57
Locale: en-US (en_US); UI-Language: en-US
Calc: CL

and Adobe Acrobat CC 2019.008.20080

I created a Writer document in Liberation Sans containing the text:
"Text with U+202F entered:  right there." and exported it to PDF.

On initial opening of the exported PDF, Acrobat reports:

"Cannot extract the embedded font 'BAAAAA+LiberationSans'. Some characters may not display or print correctly."

After uncompressing the PDF, the content stream for "Text with U+202F entered:  right there."
shows that the NNBSP (U+202F) is handled in the text runs. The font seems well described, and the BT -> ET runs look correct. In the excerpt below, the second run, Tf<12>Tj, holds the NNBSP.

BT
56.8 724 Td /F1 12 Tf[<01>110<02>-1<0304>2<05>2<06>-2<07>-2<04>2<08>-1<05>2<09>-2<0A>8<0B>-1<0C>-1<0B>-1<0D>2<05>-5<02>6<0E>-1<04>2<02>-1<0F02>-1<10>-1<11>2<05>]TJ
ET
Q
q 0 0 0 rg
BT
200.6 724 Td /F1 12 Tf<12>Tj
ET
Q
q 0 0 0 rg
BT
203 724 Td /F1 12 Tf[<0F07>-2<13>6<08>-1<04>-5<05>2<04>2<08>-1<02>-1<0F02>-1<14>]TJ
ET
Q
Q endstream
endobj
6 0 obj
<< /Font 7 0 R /ProcSet [ /PDF /Text ] >>
endobj
7 0 obj
<< /F1 8 0 R >>
endobj
8 0 obj
<< /BaseFont /BAAAAA+LiberationSans /FirstChar 0 /FontDescriptor 9 0 R /LastChar 20 /Subtype /TrueType /ToUnicode 10 0 R /Type /Font /Widths [ 750 610 556 500 277 277 722 222 556 722 583 556 556 610 556 333 556 277 200 556 277 ] >>
endobj
9 0 obj
<< /Ascent 905 /CapHeight 979 /Descent -211 /Flags 4 /FontBBox [ -543 -303 1301 980 ] /FontFile2 11 0 R /FontName /BAAAAA+LiberationSans /ItalicAngle 0 /StemV 80 /Type /FontDescriptor >>
endobj
10 0 obj
<< /Length 558 >>
stream
/CIDInit/ProcSet findresource begin
12 dict begin
begincmap
/CIDSystemInfo<<
/Registry (Adobe)
/Ordering (UCS)
/Supplement 0
>> def
/CMapName/Adobe-Identity-UCS def
/CMapType 2 def
1 begincodespacerange
<00> <FF>
endcodespacerange
20 beginbfchar
<01> <0054>
<02> <0065>
<03> <0078>
<04> <0074>
<05> <0020>
<06> <0077>
<07> <0069>
<08> <0068>
<09> <0055>
<0A> <002B>
<0B> <0032>
<0C> <0030>
<0D> <0046>
<0E> <006E>
<0F> <0072>
<10> <0064>
<11> <003A>
<12> <202F>
<13> <0067>
<14> <002E>
endbfchar
endcmap
CMapName currentdict /CMap defineresource pop
end
end
endstream
endobj
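
To check that the runs above really carry the expected text, the character codes can be mapped back to Unicode through the ToUnicode CMap. The following Python sketch does this for the three runs and the bfchar entries copied from this comment. It assumes a simple font with one-byte codes, as the `<00> <FF>` codespace range suggests, and it is not a general PDF parser; the function names are made up for illustration.

```python
import re

# bfchar entries copied from the ToUnicode CMap above.
BFCHAR = """
<01> <0054>
<02> <0065>
<03> <0078>
<04> <0074>
<05> <0020>
<06> <0077>
<07> <0069>
<08> <0068>
<09> <0055>
<0A> <002B>
<0B> <0032>
<0C> <0030>
<0D> <0046>
<0E> <006E>
<0F> <0072>
<10> <0064>
<11> <003A>
<12> <202F>
<13> <0067>
<14> <002E>
"""

# Operands of the three TJ/Tj operators from the content stream above.
RUNS = [
    "[<01>110<02>-1<0304>2<05>2<06>-2<07>-2<04>2<08>-1<05>2<09>-2<0A>8"
    "<0B>-1<0C>-1<0B>-1<0D>2<05>-5<02>6<0E>-1<04>2<02>-1<0F02>-1<10>-1<11>2<05>]",
    "<12>",
    "[<0F07>-2<13>6<08>-1<04>-5<05>2<04>2<08>-1<02>-1<0F02>-1<14>]",
]


def parse_bfchar(text):
    """Map each character code (int) to the Unicode string it stands for."""
    mapping = {}
    for code, uni in re.findall(r"<([0-9A-Fa-f]+)>[ \t]+<([0-9A-Fa-f]+)>", text):
        mapping[int(code, 16)] = bytes.fromhex(uni).decode("utf-16-be")
    return mapping


def decode_run(operand, mapping):
    """Decode the hex strings in a Tj/TJ operand, one byte per character code.

    Kerning numbers between the hex strings are ignored.
    """
    chars = []
    for hex_string in re.findall(r"<([0-9A-Fa-f]+)>", operand):
        for code in bytes.fromhex(hex_string):
            chars.append(mapping.get(code, "\ufffd"))
    return "".join(chars)


if __name__ == "__main__":
    cmap = parse_bfchar(BFCHAR)
    decoded = "".join(decode_run(run, cmap) for run in RUNS)
    print(ascii(decoded))  # ascii() makes the U+202F escape visible
```

Decoding this way shows that code `<12>` in the second run is the only one that maps to U+202F, which matches the description of the runs above.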


Repeating the steps without the U+202F character produces no error when opening the PDF in Acrobat.

Looking at the font, glyph #2037 (uni202F) is defined, but as a space it of course has no visible outline.
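
For background, a glyph with no outline is typically stored in a TrueType font as a zero-length entry: the `loca` table holds one offset per glyph plus a final end offset, and two equal consecutive offsets mean the glyph has no data in `glyf`. The Python sketch below illustrates only that general TrueType convention on a made-up `loca` table. It does not inspect Liberation Sans, the function names are hypothetical, and it makes no claim about where the export goes wrong.

```python
import struct


def glyph_lengths(loca_bytes, long_format=True):
    """Return the byte length of each glyph described by a 'loca' table."""
    if long_format:
        count = len(loca_bytes) // 4
        offsets = list(struct.unpack(f">{count}L", loca_bytes))
    else:
        # The short format stores each offset divided by two.
        count = len(loca_bytes) // 2
        offsets = [2 * o for o in struct.unpack(f">{count}H", loca_bytes)]
    return [offsets[i + 1] - offsets[i] for i in range(len(offsets) - 1)]


def empty_glyphs(loca_bytes, long_format=True):
    """Return the glyph IDs whose 'glyf' data has zero length."""
    lengths = glyph_lengths(loca_bytes, long_format)
    return [gid for gid, size in enumerate(lengths) if size == 0]


if __name__ == "__main__":
    # Made-up table for three glyphs; glyph 1 is empty, like a space glyph.
    loca = struct.pack(">4L", 0, 40, 40, 100)
    print(glyph_lengths(loca))
    print(empty_glyphs(loca))
```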

So it is a coin flip whether this is an issue with the font or with Acrobat being too sensitive to a single glyph. Either way I'd lean toward NotOurBug.
Comment 9 Cristian Secară 2019-04-12 10:13:16 UTC
As mentioned in my question on ask.libreoffice.org [1], when OpenOffice is used to generate the .pdf from the same .odt file, the resulting file no longer shows an error in AAR DC. So, are you sure about "NotOurBug"?

Here are both versions for comparison:
https://www.secarica.ro/test/libo_nnbsp.pdf
https://www.secarica.ro/test/opno_nnbsp.pdf
(and source
https://www.secarica.ro/test/libo_nnbsp.odt)

[1] https://ask.libreoffice.org/en/question/190046/exported-pdf-gives-liberationserif-font-error-in-adobe-acrobat-reader/
Comment 10 V Stuart Foote 2019-04-12 15:24:35 UTC
A search of the Adobe forums, or Google, for the string "Cannot extract the embedded font" shows this is a long-running issue in Acrobat with multiple PDF generators, including multiple Adobe products. Adobe has patched Acrobat/Reader for embedded font issues, composite characters for example.

IMHO Adobe is risk-averse regarding font licensing, so it is conservative in handling font embedding in PDF; I believe its filters are overly sensitive when reading PDF compared to other viewers.

LibreOffice has reworked its PDF generator compared to earlier OpenOffice derivatives, so do not expect similar behavior there.

I will poke at this some more. If it is specific to a block of Unicode glyphs, like "General Punctuation" which holds NNBS, for specific font producers, it could require opening an issue with Adobe, or in this case with Red Hat for a rebuild of the bundled Liberation fonts.

@Miklos, any thought on a direction for this specific bug?
Comment 11 Miklos Vajna 2019-04-15 07:35:58 UTC
No thoughts off the top of my head, sorry. :-)
Comment 12 randombugs 2021-07-06 16:18:09 UTC
I reported this bug to Ghostscript. They suggested I post it here, and they also provided some additional information about it: https://bugs.ghostscript.com/show_bug.cgi?id=703890

Maybe it's nothing new, but to be honest I don't understand a thing about it. Just trying to be useful :).
Comment 13 Aron Budea 2021-09-11 02:00:48 UTC
Seems related to bug 129672, this regressed from the same commit in 5.1, and shows the same error in Adobe Reader:

https://cgit.freedesktop.org/libreoffice/core/commit/?id=41007842ed9bb5d6165792a197769f72dae55a2c
author		Martin Hosken <martin_hosken@sil.org>	2015-09-10 10:14:18 +0700
committer	Martin Hosken <martin_hosken@sil.org>	2015-09-14 01:16:40 +0000

Refactor graphite integration and update graphite
Comment 18 خالد حسني 2022-09-29 04:14:20 UTC
I can reproduce this with 7.4 but not with master, so I believe this is fixed. It would be nice to know what commit fixed it.
Comment 19 Timur 2022-09-29 10:24:46 UTC
I still see the same message opening exported PDF with Adobe: "Cannot extract the embedded font 'BAAAAA+LiberationSans'. Some characters may not display or print correctly."

Version: 7.5.0.0.alpha0+ / LibreOffice Community
Build ID: bb47ffbc9d36e83695aa0d01767d3f83533c04e0
Comment 20 Timur 2022-09-29 10:48:28 UTC Comment hidden (obsolete)
Comment 21 خالد حسني 2022-09-29 12:09:03 UTC Comment hidden (obsolete)
Comment 22 خالد حسني 2022-09-29 12:11:02 UTC
(In reply to خالد حسني from comment #21)
> (In reply to Timur from comment #20)
> > New again unless clarified. That build is 6 days old.
> 
> You are testing with a week-old build, I tested with master.

Also, FWIW, I tested first with 7.4 and I reproduced the issue there.
Comment 23 خالد حسني 2022-10-02 07:08:28 UTC
Can anyone else reproduce this?
Comment 24 V Stuart Foote 2022-10-02 17:06:08 UTC
Created attachment 182791 [details]
Adobe Reader DC still errors on the imbedded font

Recreated the test ODT from comment 8 using Liberation Sans.

The PDF opens in Adobe Acrobat Reader DC (2022.002.20191 64-bit), but placing the Reader text cursor in the string gives the same message: "Cannot extract the embedded font 'BAAAAA+LiberationSans'. Some characters may not display or print correctly."

2022-10-01 TB77 nightly
Version: 7.5.0.0.alpha0+ (x64) / LibreOffice Community
Build ID: c3b5eea4304ad6815b491f549fce008a9630c213
CPU threads: 8; OS: Windows 10.0 Build 19044; UI render: Skia/Raster; VCL: win
Locale: en-US (en_US); UI: en-US
Calc: threaded
Comment 25 خالد حسني 2022-10-02 21:22:25 UTC Comment hidden (obsolete)
Comment 26 خالد حسني 2022-10-02 21:26:37 UTC
(In reply to خالد حسني from comment #25)
> OK, the file opens with Adobe Acrobat but not Adobe Acrobat Reader DC.

Actually no, it opens fine in both. Please attach a new PDF that shows the issue.
Comment 27 V Stuart Foote 2022-10-02 23:17:13 UTC
Created attachment 182793 [details]
LO750 Writer exportToPDF  with embeddedFont, Acrobat errors can not extract the embedded font

The error tip pops open in Adobe Reader when the text selection cursor is positioned over the text run.
Comment 28 Commit Notification 2022-10-03 03:50:39 UTC
Khaled Hosny committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/bb67f10786fd5e232b198d09139c41078c3fc60d

tdf#112152: Fix subsetting empty glyphs

It will be available in 7.5.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 29 Commit Notification 2022-10-03 09:34:32 UTC
Khaled Hosny committed a patch related to this issue.
It has been pushed to "libreoffice-7-4":

https://git.libreoffice.org/core/commit/a350f1d1f19d9a3d9a400b6ae410b44c662a64b3

tdf#112152: Fix subsetting empty glyphs

It will be available in 7.4.3.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 30 Timur 2022-10-04 13:23:04 UTC
Verified, thanks.
Comment 31 خالد حسني 2022-10-15 13:20:48 UTC
*** Bug 129672 has been marked as a duplicate of this bug. ***