Bug 91666 - PDF: Broken import of non-BMP characters
Status: RESOLVED NOTOURBUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Draw
Version (earliest affected): 5.0 all versions
Hardware: x86-64 (AMD64) All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: PDF-Import-Draw
Reported: 2015-05-27 14:25 UTC by neomix
Modified: 2022-12-25 02:14 UTC

See Also:
Crash report or crash signature:


Attachments
an example of a pdf document (285.24 KB, application/pdf)
2015-05-27 14:25 UTC, neomix
LOO 5.4.2.1-64 odg with errors (44.27 KB, application/vnd.oasis.opendocument.graphics)
2017-09-20 22:39 UTC, paulystefan

Description neomix 2015-05-27 14:25:09 UTC
Created attachment 116073
an example of a pdf document

When the attached PDF is opened in LibreOffice 4.4.3 through 5.0.0.0beta1, the text is imported with the wrong encoding. In Foxit, Adobe, and STDU everything is fine.
Comment 1 raal 2015-05-27 20:01:09 UTC
I can confirm with Version: 5.1.0.0.alpha1+
Build ID: b9630867d17c01ec41f6461b1e96288f3932248c
TinderBox: Linux-rpm_deb-x86_64@46-TDF, Branch:master, Time: 2015-05-25_00:46:48
Comment 2 Urmas 2015-06-23 11:17:28 UTC
The file contains PUA-B characters, which the importer truncates to their lower 16-bit word.
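
To illustrate the truncation described in this comment, here is a minimal Python sketch. It is not the importer's code: the function names and the sample code point U+100041 are assumptions chosen for illustration. It compares keeping only the low 16 bits of a Supplementary Private Use Area-B code point with encoding it as a UTF-16 surrogate pair.

```python
def truncate_to_low_word(code_point: int) -> int:
    """Keep only the low 16 bits of a code point (the lossy behaviour)."""
    return code_point & 0xFFFF


def to_utf16_units(code_point: int) -> list:
    """Encode a code point as UTF-16 code units.

    Code points above U+FFFF need a surrogate pair.
    """
    if code_point <= 0xFFFF:
        return [code_point]
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # low (trail) surrogate
    return [high, low]


# Hypothetical PUA-B code point (PUA-B spans U+100000..U+10FFFD).
sample = 0x100041

truncated = truncate_to_low_word(sample)
print(hex(truncated), repr(chr(truncated)))          # 0x41 'A'
print([hex(unit) for unit in to_utf16_units(sample)])  # ['0xdbc0', '0xdc41']
```

After truncation the private-use code point collapses into an unrelated BMP character, which would look like text in the wrong encoding.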
Comment 3 vvort 2016-06-07 10:36:46 UTC
If you try to copy this text with Adobe/Foxit, you will see the same problems.
Comment 4 Xisco Faulí 2017-06-12 11:41:27 UTC
Changing version back to the earliest affected version.
Comment 5 paulystefan 2017-09-20 22:39:13 UTC
Created attachment 136410
LOO 5.4.2.1-64 odg with errors

No problem in GSview 5.0 or Acrobat PDF reader; LOO 5.4.2.1-64 shows the same problem, also with the new pdfium engine.
Comment 7 Roman Kuznetsov 2018-09-21 12:42:57 UTC
Still reproducible in 6.1.1.2.
Comment 9 neomix 2019-09-22 09:50:01 UTC
6.3.1.2 still shows the errors.
Comment 11 Rajasekaran Karunanithi 2022-11-26 00:11:24 UTC
Still reproducible in LO 7.4.2.3 under Windows 10 (x64).

Version: 7.4.2.3 (x64) / LibreOffice Community
Build ID: 382eef1f22670f7f4118c8c2dd222ec7ad009daf
CPU threads: 4; OS: Windows 10.0 Build 19044; UI render: Skia/Raster; VCL: win
Locale: ta-IN (en_IN); UI: en-US
Calc: threaded
Comment 12 ⁨خالد حسني⁩ 2022-12-25 02:14:47 UTC
The PDF is broken: there is no ToUnicode mapping from the glyphs to the input text, so the textual content can’t be properly extracted.
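
As a toy illustration of why a missing ToUnicode mapping prevents extraction, here is a minimal Python sketch. It does not reflect how LibreOffice or any PDF library works: `extract_text`, the glyph codes, and the dictionary standing in for a ToUnicode CMap are all hypothetical.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def extract_text(glyph_codes, to_unicode=None):
    """Turn font-specific glyph codes into text.

    `to_unicode` stands in for a font's ToUnicode mapping. Without it,
    the glyph codes carry no Unicode meaning, so this sketch emits
    U+FFFD for each glyph instead of guessing.
    """
    if to_unicode is None:
        return REPLACEMENT * len(glyph_codes)
    return "".join(to_unicode.get(code, REPLACEMENT) for code in glyph_codes)


# Hypothetical glyph codes as they might appear in a content stream.
glyphs = [1, 2, 3]
# Hypothetical mapping from glyph codes to Unicode strings.
mapping = {1: "P", 2: "D", 3: "F"}

print(extract_text(glyphs, mapping))  # PDF
print(extract_text(glyphs))           # three U+FFFD characters
```

A viewer can still draw the glyph shapes correctly in both cases, which fits the earlier observation that the file displays fine in other readers while copying its text fails.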