Bug 112152 - Narrow No-Break Space (U+202F) causes PDF Error by using bundled Liberation fonts
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Printing and PDF export
Version (earliest affected): 5.1.0.3 release
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: bibisected, bisected, regression
Depends on:
Blocks: PDF-Export-Invalid
Reported: 2017-08-31 19:16 UTC by janina-cleemann
Modified: 2021-10-13 01:09 UTC (History)
CC List: 6 users

See Also:
Crash report or crash signature:


Attachments
Document with narrow no-break space (7.54 KB, application/vnd.oasis.opendocument.text)
2017-09-09 16:20 UTC, Buovjaga
ODT - with U+202F (8.36 KB, application/vnd.oasis.opendocument.text)
2019-04-11 15:42 UTC, [REDACTED]
ODT - without U+202F (8.36 KB, application/vnd.oasis.opendocument.text)
2019-04-11 15:45 UTC, [REDACTED]
PDF - CUPS-PDF print/export of ODT - with U+202F (9.28 KB, application/pdf)
2019-04-11 15:48 UTC, [REDACTED]

Description janina-cleemann 2017-08-31 19:16:08 UTC
Description:
When a document uses the Liberation Serif font with a Narrow No-Break Space and is exported to PDF, the following error occurs after scrolling: "Die eingebettete Schrift "BAAAAA+LiberationSerif" konnte nicht entnommen werden. Einige Zeichen werden u. U. nicht korrekt angezeigt bzw. gedruckt." (English: "Cannot extract the embedded font 'BAAAAA+LiberationSerif'. Some characters may not display or print correctly.")

Steps to Reproduce:
1. Copy a Narrow No-Break Space (U+202F), e.g. from Wikipedia
2. Paste it into a Writer document
3. Export to PDF
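
If copying the character from a web page is inconvenient, it can also be produced programmatically. The following is a minimal Python sketch, not part of the original report: the sample sentence and the file name `nnbsp_sample.txt` are hypothetical, and the resulting text still has to be pasted into (or opened in) Writer and set in a Liberation font by hand.

```python
import unicodedata

NNBSP = "\u202f"  # NARROW NO-BREAK SPACE

# Hypothetical sample sentence; any text containing U+202F should do.
sample = f"10{NNBSP}km"

# Confirm the character really is U+202F before using it in Writer.
for ch in sample:
    if ch == NNBSP:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Write the sample as UTF-8 so it can be opened or pasted into a Writer document.
with open("nnbsp_sample.txt", "w", encoding="utf-8") as fh:
    fh.write(sample + "\n")
```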

Actual Results:  
The following error message occurs (translated from the German UI): "Cannot extract the embedded font 'BAAAAA+LiberationSerif'. Some characters may not display or print correctly."

Expected Results:
No error message.


Reproducible: Always

User Profile Reset: No

Additional Info:


User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36 Edge/15.15063
Comment 1 Buovjaga 2017-09-09 16:20:45 UTC
Created attachment 136136 [details]
Document with narrow no-break space

I don't see any problems with PDF export.

Arch Linux 64-bit, KDE Plasma 5
Version: 6.0.0.0.alpha0+
Build ID: a27eb931c22313d4dd5c73b35358c0532d20b79e
CPU threads: 8; OS: Linux 4.12; UI render: default; VCL: kde4; 
Locale: fi-FI (fi_FI.UTF-8); Calc: group
Built on September 8th 2017
Comment 2 Xisco Faulí 2018-10-17 17:26:32 UTC
Could you please try to reproduce it with the latest version of LibreOffice
from https://www.libreoffice.org/download/libreoffice-fresh/ ?
I have set the bug's status to 'NEEDINFO'. Please change it back to
'UNCONFIRMED' if the bug is still present in the latest version.
Comment 3 [REDACTED] 2019-04-11 15:42:51 UTC
Created attachment 150701 [details]
ODT - with U+202F
Comment 4 [REDACTED] 2019-04-11 15:44:36 UTC
Due to a question at ask.libreoffice.org (see https://ask.libreoffice.org/en/question/190046/exported-pdf-gives-liberationserif-font-error-in-adobe-acrobat-reader/) I've made the following test.

1) Create one document containing a single U+202F character (Liberation Sans)
2) Create one document without U+202F but with almost the same text
3) Export each .odt file to PDF using LibreOffice's PDF export function
4) Open both PDFs in Adobe Acrobat Reader DC 2018.010.20099 on Windows 10 Pro 1809.17763.437

Result 1 - File containing U+202F
This confirms the reporter's observation: I get the same message when clicking on the text in Acrobat Reader. Please note: other PDF readers may not report the error / warning, and a PDF produced by another creator such as cups-pdf does not trigger the error when opened in Adobe Reader.

Result 2 - File *not* containing U+202F
No error / warning in Acrobat Reader reported.

PDF creation done with:
Version: 6.2.2.2
Build ID: 2b840030fec2aae0fd2658d8d4f9548af4e3518d
CPU threads: 8; OS: Linux 4.12; UI render: default; VCL: kde5; 
Locale: en-GB (en_GB); UI-Language: en-US
Calc: threaded
Comment 5 [REDACTED] 2019-04-11 15:45:50 UTC
Created attachment 150702 [details]
ODT - without U+202F
Comment 6 [REDACTED] 2019-04-11 15:48:05 UTC
Created attachment 150703 [details]
PDF - CUPS-PDF print/export of ODT - with U+202F
Comment 7 Buovjaga 2019-04-11 18:10:32 UTC
New per comment 4
Comment 8 V Stuart Foote 2019-04-11 21:44:18 UTC
Confirmed on Windows 10 Ent 64-bit en-US (1803) with
Version: 6.3.0.0.alpha0+ (x64)
Build ID: 74ed80b5744fdfacf9b9c3ef8ab235c64510c20d
CPU threads: 8; OS: Windows 10.0; UI render: GL; VCL: win; 
TinderBox: Win-x86_64@42, Branch:master, Time: 2019-04-11_04:13:57
Locale: en-US (en_US); UI-Language: en-US
Calc: CL

and Adobe Acrobat CC 2019.008.20080

I created a Writer document in Liberation Sans containing the text:
"Text with U+202F entered:  right there." and exported it to PDF.

On initial opening of the exported PDF, Acrobat reports:

"Cannot extract the embedded font 'BAAAAA+LiberationSans'. Some characters may not display or print correctly."

Uncompressing the PDF shows that, in the content stream for that text, the NNBSP is handled in the runs. The font seems well described, and the BT -> ET runs look correct. In the clip that follows, the second run, holding Tf<12>Tj, is the NNBSP.

BT
56.8 724 Td /F1 12 Tf[<01>110<02>-1<0304>2<05>2<06>-2<07>-2<04>2<08>-1<05>2<09>-2<0A>8<0B>-1<0C>-1<0B>-1<0D>2<05>-5<02>6<0E>-1<04>2<02>-1<0F02>-1<10>-1<11>2<05>]TJ
ET
Q
q 0 0 0 rg
BT
200.6 724 Td /F1 12 Tf<12>Tj
ET
Q
q 0 0 0 rg
BT
203 724 Td /F1 12 Tf[<0F07>-2<13>6<08>-1<04>-5<05>2<04>2<08>-1<02>-1<0F02>-1<14>]TJ
ET
Q
Q endstream
endobj
6 0 obj
<< /Font 7 0 R /ProcSet [ /PDF /Text ] >>
endobj
7 0 obj
<< /F1 8 0 R >>
endobj
8 0 obj
<< /BaseFont /BAAAAA+LiberationSans /FirstChar 0 /FontDescriptor 9 0 R /LastChar 20 /Subtype /TrueType /ToUnicode 10 0 R /Type /Font /Widths [ 750 610 556 500 277 277 722 222 556 722 583 556 556 610 556 333 556 277 200 556 277 ] >>
endobj
9 0 obj
<< /Ascent 905 /CapHeight 979 /Descent -211 /Flags 4 /FontBBox [ -543 -303 1301 980 ] /FontFile2 11 0 R /FontName /BAAAAA+LiberationSans /ItalicAngle 0 /StemV 80 /Type /FontDescriptor >>
endobj
10 0 obj
<< /Length 558 >>
stream
/CIDInit/ProcSet findresource begin
12 dict begin
begincmap
/CIDSystemInfo<<
/Registry (Adobe)
/Ordering (UCS)
/Supplement 0
>> def
/CMapName/Adobe-Identity-UCS def
/CMapType 2 def
1 begincodespacerange
<00> <FF>
endcodespacerange
20 beginbfchar
<01> <0054>
<02> <0065>
<03> <0078>
<04> <0074>
<05> <0020>
<06> <0077>
<07> <0069>
<08> <0068>
<09> <0055>
<0A> <002B>
<0B> <0032>
<0C> <0030>
<0D> <0046>
<0E> <006E>
<0F> <0072>
<10> <0064>
<11> <003A>
<12> <202F>
<13> <0067>
<14> <002E>
endbfchar
endcmap
CMapName currentdict /CMap defineresource pop
end
end
endstream
endobj
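
As a cross-check of the dump, the three text runs can be decoded through the ToUnicode table to confirm that code <12> is the only one mapping to U+202F. The following is a minimal Python sketch, assuming one-byte character codes (as the <00> <FF> codespace range suggests); the names `BFCHAR`, `RUNS`, `parse_bfchar` and `decode_run` are hypothetical helpers, not LibreOffice or Acrobat APIs.

```python
import re

# bfchar pairs copied from the ToUnicode CMap (object 10) in the dump.
BFCHAR = """
<01> <0054>
<02> <0065>
<03> <0078>
<04> <0074>
<05> <0020>
<06> <0077>
<07> <0069>
<08> <0068>
<09> <0055>
<0A> <002B>
<0B> <0032>
<0C> <0030>
<0D> <0046>
<0E> <006E>
<0F> <0072>
<10> <0064>
<11> <003A>
<12> <202F>
<13> <0067>
<14> <002E>
"""

# Operands of the three TJ/Tj operators, copied from the content stream.
RUNS = [
    "[<01>110<02>-1<0304>2<05>2<06>-2<07>-2<04>2<08>-1<05>2<09>-2<0A>8"
    "<0B>-1<0C>-1<0B>-1<0D>2<05>-5<02>6<0E>-1<04>2<02>-1<0F02>-1<10>-1<11>2<05>]",
    "<12>",
    "[<0F07>-2<13>6<08>-1<04>-5<05>2<04>2<08>-1<02>-1<0F02>-1<14>]",
]


def parse_bfchar(text):
    """Map each one-byte character code to its Unicode character."""
    pairs = re.findall(r"<([0-9A-Fa-f]{2})>\s*<([0-9A-Fa-f]{4})>", text)
    return {int(code, 16): chr(int(uni, 16)) for code, uni in pairs}


def decode_run(operand, cmap):
    """Decode every hex string in a Tj/TJ operand; kerning numbers are skipped."""
    chars = []
    for hex_string in re.findall(r"<([0-9A-Fa-f]+)>", operand):
        for code in bytes.fromhex(hex_string):
            chars.append(cmap[code])
    return "".join(chars)


cmap = parse_bfchar(BFCHAR)
text = "".join(decode_run(run, cmap) for run in RUNS)
print(ascii(text))
```

Under these assumptions the decoded string is the test sentence with an ordinary space followed by U+202F after the colon, which suggests the character mapping itself is intact.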


Repeating the steps without U+202F produces no error when the PDF is opened in Acrobat.

Looking at the font, glyph #2037 (uni202F) is defined, but of course, as a space, it has no graphical feature.
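
Although the glyph has no outline, its advance width in the exported PDF can be sanity-checked against the text positions in the dump. The following is a rough Python sketch, assuming the usual simple-font convention that /Widths entries are in 1/1000 of a text-space unit and that no character spacing or horizontal scaling is in effect (none appears in the clip); the constant and function names are hypothetical.

```python
import math

FIRST_CHAR = 0  # /FirstChar from font object 8
LAST_CHAR = 20  # /LastChar from font object 8
WIDTHS = [750, 610, 556, 500, 277, 277, 722, 222, 556, 722, 583,
          556, 556, 610, 556, 333, 556, 277, 200, 556, 277]
FONT_SIZE = 12.0   # from "/F1 12 Tf"
NNBSP_CODE = 0x12  # code mapped to U+202F in the ToUnicode CMap


def advance_pt(code):
    """Horizontal advance of a character code in points at FONT_SIZE."""
    return WIDTHS[code - FIRST_CHAR] / 1000.0 * FONT_SIZE


x_nnbsp = 200.6  # Td x-position of the run holding <12>
x_next = 203.0   # Td x-position of the run that follows

print("entries in /Widths:", len(WIDTHS))
print("NNBSP advance (pt):", advance_pt(NNBSP_CODE))
print("gap between runs (pt):", round(x_next - x_nnbsp, 3))
```

Under these assumptions the 200-unit width of code <12> corresponds to 2.4 pt at 12 pt, which matches the distance between the two Td positions, so the width table and the layout agree for this glyph.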

So it is a coin flip whether this is an issue with the font or with Acrobat being too sensitive to a single glyph. Either way I'd lean toward NotOurBug.
Comment 9 Cristian Secară 2019-04-12 10:13:16 UTC
As mentioned in my question on ask.libreoffice.org [1], when OpenOffice is used to generate the .pdf from the same .odt file, the resulting file no longer shows an error in AAR DC. So, are you sure about "NotOurBug"?

Here are both versions for comparison:
https://www.secarica.ro/test/libo_nnbsp.pdf
https://www.secarica.ro/test/opno_nnbsp.pdf
(and source
https://www.secarica.ro/test/libo_nnbsp.odt)

[1] https://ask.libreoffice.org/en/question/190046/exported-pdf-gives-liberationserif-font-error-in-adobe-acrobat-reader/
Comment 10 V Stuart Foote 2019-04-12 15:24:35 UTC
A search of the Adobe forums, or Google, for the string "Cannot extract the embedded font" will show this is a long running issue with Acrobat with multiple PDF generators, including multiple Adobe products. Adobe has patched Acrobat/Reader for embedded font issues--composite characters for example.

IMHO Adobe is risk-averse regarding font licensing, so it is conservative in handling font embedding in PDF; I believe its filters are overly sensitive when reading PDF compared to other viewers.

LibreOffice has reworked its PDF generator compared to earlier OpenOffice derivatives, so do not expect similar behavior there.

Will poke at this some more. If it is specific to a block of Unicode glyphs, like "General Punctuation" which holds the NNBSP, for specific font producers, it could require an issue opened with Adobe, or in this case with Red Hat for a rebuild of the bundled Liberation fonts.

@Miklos, any thought on a direction for this specific bug?
Comment 11 Miklos Vajna 2019-04-15 07:35:58 UTC
No thoughts off the top of my head, sorry. :-)
Comment 12 randombugs 2021-07-06 16:18:09 UTC
I reported this bug to Ghostscript. They suggested I post it here and provided some additional information about it: https://bugs.ghostscript.com/show_bug.cgi?id=703890

Maybe it's nothing new, but to be honest I don't understand a thing about it. Just trying to be useful :).
Comment 13 Aron Budea 2021-09-11 02:00:48 UTC
Seems related to bug 129672, this regressed from the same commit in 5.1, and shows the same error in Adobe Reader:

https://cgit.freedesktop.org/libreoffice/core/commit/?id=41007842ed9bb5d6165792a197769f72dae55a2c
author		Martin Hosken <martin_hosken@sil.org>	2015-09-10 10:14:18 +0700
committer	Martin Hosken <martin_hosken@sil.org>	2015-09-14 01:16:40 +0000

Refactor graphite integration and update graphite