Bug 151525 - FILEOPEN: LibreOffice Draw renders emojis from some PDFs as the Unicode Replacement Character '�'
Status: RESOLVED NOTOURBUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Draw
Version (earliest affected): 7.3.6.2 release
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: filter:pdf
Depends on:
Blocks: PDF-Import-Draw
Reported: 2022-10-14 10:13 UTC by Thomas Szymczak
Modified: 2022-12-31 11:06 UTC
CC List: 5 users

See Also:
Crash report or crash signature:


Attachments
I typed a small HTML file to show a Web page with emojis. I used it to create emoji-test.pdf by opening the HTML file in Firefox and printing it to a file. (325 bytes, text/html)
2022-10-14 10:20 UTC, Thomas Szymczak
Then I printed the HTML file to PDF in Firefox, creating this file. (10.59 KB, application/pdf)
2022-10-14 10:22 UTC, Thomas Szymczak
How the file looks when I open it in Draw (21.28 KB, image/png)
2022-10-14 10:23 UTC, Thomas Szymczak
How the file looks when I open it in Atril PDF reader (correct rendering) (20.85 KB, image/png)
2022-10-14 10:24 UTC, Thomas Szymczak
What happens when I open emoji-test.pdf in Draw, then export it as a PDF again. Now it looks wrong in any viewer. (13.10 KB, application/pdf)
2022-10-14 10:29 UTC, Thomas Szymczak
HTML source of emoji test, gzipped to avoid text encoding issues (234 bytes, application/gzip)
2022-10-25 11:33 UTC, Thomas Szymczak

Description Thomas Szymczak 2022-10-14 10:13:32 UTC
Description:
LibreOffice Draw is unable to properly display emojis from at least some PDF files. If you open a file containing emojis, they are displayed as replacement characters (�) in different colors. If you export the file as a PDF again and then open the exported file in another viewer, the emojis also show up as replacement characters.

Steps to Reproduce:
Open a particular PDF file in LibreOffice Draw. I will attach a file that reproduces the bug.

Actual Results:
Emojis are displayed as colored Unicode replacement characters, like this but in different colors: ����.

Expected Results:
Emojis are displayed as faces, animals, etc., just as they are when the PDF is opened in another viewer.


Reproducible: Always


User Profile Reset: No


OpenGL enabled: Yes

Additional Info:
Version: 7.3.6.2 / LibreOffice Community
Build ID: 30(Build:2)
CPU threads: 4; OS: Linux 5.15; UI render: default; VCL: gtk3
Locale: en-US (en_US.UTF-8); UI: en-US
Ubuntu package version: 1:7.3.6-0ubuntu0.22.04.1
Calc: threaded
Comment 1 Thomas Szymczak 2022-10-14 10:20:49 UTC
Created attachment 183041
I typed a small HTML file to show a Web page with emojis. I used it to create emoji-test.pdf by opening the HTML file in Firefox and printing it to a file.
Comment 2 Thomas Szymczak 2022-10-14 10:22:11 UTC
Created attachment 183042
Then I printed the HTML file to PDF in Firefox, creating this file.
Comment 3 Thomas Szymczak 2022-10-14 10:23:27 UTC
Created attachment 183043
How the file looks when I open it in Draw
Comment 4 Thomas Szymczak 2022-10-14 10:24:20 UTC
Created attachment 183044
How the file looks when I open it in Atril PDF reader (correct rendering)
Comment 5 Thomas Szymczak 2022-10-14 10:27:45 UTC
Unfortunately, the HTML file itself got corrupted when I uploaded it. Looks like I hit an encoding bug while trying to report an encoding bug.
Comment 6 Thomas Szymczak 2022-10-14 10:29:45 UTC
Created attachment 183045
What happens when I open emoji-test.pdf in Draw, then export it as a PDF again. Now it looks wrong in any viewer.
Comment 7 Thomas Szymczak 2022-10-14 10:53:51 UTC
I was able to reproduce this on another computer with the LibreOffice profile reset. It was also running LO 7.3.6.2 on Ubuntu. This eliminates the config files as a cause.
Comment 8 Roman Kuznetsov 2022-10-15 15:11:09 UTC
Confirmed the problem in:

Version: 7.5.0.0.alpha0+ / LibreOffice Community
Build ID: 55ee3ede2bb0211e895053ed3a54bb1c99cc94ca
CPU threads: 4; OS: Linux 5.15; UI render: default; VCL: kf5 (cairo+xcb)
Locale: ru-RU (ru_RU.UTF-8); UI: en-US
Calc: threaded

Okular opens the PDF file correctly.
Comment 9 Kevin Suo 2022-10-21 12:12:52 UTC
I am quite sure this is a font substitution issue in the sdext.pdfimport filter. What font are you using for the emoji? The PDF says it uses CairoFont-0-0 and CairoFont-1-1. When the file is opened in Draw, the font name is shown in italics due to bug 143095. I don't have CairoFont installed, so I don't know whether the emoji comes back when you manually set the font to CairoFont.
Comment 10 Thomas Szymczak 2022-10-25 11:33:59 UTC
Created attachment 183256
HTML source of emoji test, gzipped to avoid text encoding issues
Comment 11 Thomas Szymczak 2022-10-25 11:56:28 UTC
(In reply to Kevin Suo from comment #9)
> I am quite sure this is a font substitution issue in the sdext.pdfimport
> filter. What font are you using for the emoji? The pdf says it uses
> CairoFont-0-0 and CairoFont-1-1. When open in Draw, the font name is italic
> due to bug 143095. I don't have CairoFont installed so I don't know whether
> the emoji comes back when you manually set the font to CairoFont.

Are you saying we need to see what happens when we specify the font within the PDF, or when I change the font after opening it?

Changing the font of the text after I open it doesn't help.

I then tried changing the font of the PDF by specifying it in the HTML. The behavior is strange. If I switch the HTML to a serif font, the PDF I make by printing it in Firefox also uses a serif font when I open it in my default PDF viewer (Atril). When I open it in Draw, however, the text looks the same as before (sans-serif).
Comment 12 ⁨خالد حسني⁩ 2022-12-24 00:10:18 UTC
This sounds like an encoding issue, possibly the PDF importer is mishandling surrogate pairs and they end up converted to replacement character which is often used in encoding errors.
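A minimal Python sketch of the mechanism suggested in comment 12, assuming nothing about the actual contents of the attached PDF: an emoji outside the Basic Multilingual Plane (U+1F600 is used here as an arbitrary example) is stored in UTF-16, the encoding PDF uses for Unicode text strings and ToUnicode destinations, as a high/low surrogate pair. If the pair is split, the lone surrogate cannot be decoded and a lenient decoder substitutes U+FFFD. The helper name `to_surrogate_pair` is hypothetical and is not part of LibreOffice.

```python
# Sketch: how a supplementary-plane emoji turns into U+FFFD when its
# UTF-16 surrogate pair is mishandled. U+1F600 is an arbitrary example.

def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into UTF-16 high/low surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)

emoji = "\U0001F600"
high, low = to_surrogate_pair(ord(emoji))
print(f"{high:04X} {low:04X}")  # D83D DE00

utf16 = emoji.encode("utf-16-be")  # four bytes: D8 3D DE 00
print(utf16.decode("utf-16-be") == emoji)  # True: intact pair round-trips

# Keeping only the first 16-bit unit leaves an unpaired high surrogate,
# which the decoder replaces with U+FFFD under errors="replace".
broken = utf16[:2].decode("utf-16-be", errors="replace")
print(ascii(broken))  # '\ufffd'
```

In this sketch the damage is done by the decoder; comment 13 below points to a different cause, where the U+FFFD is already present in the file.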
Comment 13 ⁨خالد حسني⁩ 2022-12-25 00:11:52 UTC
The PDF is actually broken, the ToUnicode of the emoji font embedded in the PDF maps everything to U+FFFD (�), so the text representation is unrecoverable.
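To illustrate why the text is unrecoverable, here is a simplified Python sketch of how a PDF text extractor resolves glyph codes through a ToUnicode CMap. The CMap fragment is hypothetical (written in the form comment 13 describes, with every code mapped to U+FFFD) and is not copied from attachment 183042. The parser handles only `bfchar` entries with one-byte codes; real CMaps can also use `bfrange` sections and multi-byte codespaces, which this sketch ignores.

```python
import re

# Hypothetical ToUnicode CMap fragment: every glyph code maps to U+FFFD.
BROKEN_CMAP = """
begincmap
1 begincodespacerange
<00> <FF>
endcodespacerange
4 beginbfchar
<01> <FFFD>
<02> <FFFD>
<03> <FFFD>
<04> <FFFD>
endbfchar
endcmap
"""

BFCHAR_BLOCK = re.compile(r"beginbfchar(.*?)endbfchar", re.S)
PAIR = re.compile(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>")

def parse_bfchar(cmap: str) -> dict[int, str]:
    """Map glyph codes to Unicode text using only bfchar entries."""
    mapping = {}
    for block in BFCHAR_BLOCK.findall(cmap):
        for src, dst in PAIR.findall(block):
            # ToUnicode destinations are UTF-16BE hex strings.
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping

def extract_text(codes: bytes, mapping: dict[int, str]) -> str:
    # One byte per glyph code; assumes every code has a CMap entry.
    return "".join(mapping[c] for c in codes)

to_unicode = parse_bfchar(BROKEN_CMAP)
text = extract_text(bytes([1, 2, 3, 4]), to_unicode)
print(ascii(text))  # '\ufffd\ufffd\ufffd\ufffd'
```

Because the mapping only ever yields U+FFFD, the original emoji code points never appear in the file. Any importer, Draw included, can therefore only reproduce the replacement character, and a re-export carries it along. This is consistent with attachment 183045.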
Comment 14 Thomas Szymczak 2022-12-31 02:57:40 UTC
Is this a bug in Firefox or in the font used?
Comment 15 ⁨خالد حسني⁩ 2022-12-31 11:06:56 UTC
(In reply to Thomas Szymczak from comment #14)
> Is this a bug in Firefox or the font used?

Most likely Firefox.