Bug 121938 - PDF import: Romanian special characters not shown
Summary: PDF import: Romanian special characters not shown
Status: RESOLVED WORKSFORME
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Draw
Version: 4.1.0.4 release (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: filter:pdf
Depends on:
Blocks: PDF-Import-Draw
Reported: 2018-12-06 09:45 UTC by san_ionut
Modified: 2022-12-25 14:10 UTC

See Also:
Crash report or crash signature:


Attachments
Missing diacritics (29.64 KB, image/png)
2018-12-06 09:55 UTC, san_ionut
Same text in the original PDF (24.42 KB, image/png)
2018-12-06 09:56 UTC, san_ionut
PDF with Romanian diacritics (451.94 KB, application/pdf)
2019-01-04 13:23 UTC, san_ionut
Another PDF with Romanian diacritics (2.07 MB, application/pdf)
2019-01-04 13:30 UTC, san_ionut
odf with text Derulati (8.28 KB, application/vnd.oasis.opendocument.text)
2021-08-22 12:24 UTC, BogdanB
PDF generated from derulati.odt (7.26 KB, application/pdf)
2021-08-22 12:25 UTC, BogdanB

Description san_ionut 2018-12-06 09:45:38 UTC
Description:
When opening a PDF in LibreOffice, the Romanian special characters are not shown.

Steps to Reproduce:
1. Open a PDF file in LibreOffice that contains Romanian special characters (ă, î, ș, ț, Ă, Î, Ș, Ț).
2. Make sure the Romanian text is set in a font that includes all of the above characters (generic Windows fonts such as the Arial family or Times New Roman usually do).

Actual Results:
The characters ș, ț, Ș, Ț are not shown

Expected Results:
ș, ț, Ș, Ț should be shown


Reproducible: Always


User Profile Reset: Yes


OpenGL enabled: Yes

Additional Info:
I've tested this on several .pdf files and the result is always the same
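
For reference, the characters listed in the steps above can be identified by code point. The following is a minimal sketch, assuming the report's ș and ț are the comma-below letters (U+0219, U+021B) rather than the older cedilla forms (U+015F, U+0163); the helper name `describe` and the coarse block labels are illustrative only. It shows that the four characters reported missing all sit in the Latin Extended-B block, while ă and î do not. Whether that is related to the import failure is not established in this report.

```python
import unicodedata

# The eight characters from the steps to reproduce, written as escapes
# so that the exact code points are unambiguous.
ROMANIAN = "\u0103\u00ee\u0219\u021b\u0102\u00ce\u0218\u021a"


def describe(ch):
    """Return (code point, Unicode name, block label) for one character.

    The block label is deliberately coarse: it only distinguishes the
    ranges that matter for the eight characters above.
    """
    cp = ord(ch)
    if cp <= 0x00FF:
        block = "Basic Latin / Latin-1 Supplement"
    elif cp <= 0x017F:
        block = "Latin Extended-A"
    elif cp <= 0x024F:
        block = "Latin Extended-B"
    else:
        block = "other"
    return f"U+{cp:04X}", unicodedata.name(ch), block


for ch in ROMANIAN:
    print(ch, *describe(ch))
```
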
Comment 1 san_ionut 2018-12-06 09:55:13 UTC
Created attachment 147317 [details]
Missing diacritics
Comment 2 san_ionut 2018-12-06 09:56:35 UTC
Created attachment 147318 [details]
Same text in the original PDF
Comment 3 raal 2018-12-30 15:15:55 UTC Comment hidden (obsolete)
Comment 4 san_ionut 2019-01-04 13:23:55 UTC
Created attachment 148029 [details]
PDF with Romanian diacritics

This is a PDF file in which the Romanian diacritics ș, Ș, ț and Ț are not shown when it is opened in Draw.
Comment 5 san_ionut 2019-01-04 13:30:57 UTC
Created attachment 148030 [details]
Another PDF with Romanian diacritics

Same behavior: the ș, Ș, ț and Ț characters are missing.
Comment 6 raal 2019-01-05 13:38:53 UTC
Confirmed.
Version: 6.3.0.0.alpha0+
Build ID: ef58bf56ad292656ad2de0a417eda72cc170f782
CPU threads: 4; OS: Linux 4.15; UI render: default; VCL: gtk3; 
and Version 4.1.0.0.alpha0+ (Build ID: efca6f15609322f62a35619619a6d5fe5c9bd5a)
Comment 7 BogdanB 2019-01-16 04:52:10 UTC
I am Romanian as well, and I confirm this on
Version: 6.3.0.0.alpha0+
Build ID: afbbdcc216a84b59fb263777659b044c4a7cf6f0
CPU threads: 4; OS: Linux 4.15; UI render: default; VCL: gtk3; 
TinderBox: Linux-rpm_deb-x86_64@86-TDF, Branch:master, Time: 2019-01-13_03:54:12
Locale: ro-RO (ro_RO.UTF-8); UI-Language: en-US
Calc: threaded
Comment 8 QA Administrators 2021-01-16 04:16:29 UTC Comment hidden (obsolete)
Comment 9 BogdanB 2021-01-16 09:25:08 UTC
I downloaded the attachment from comment 5 again and opened it with LibreOffice. The diacritics are still missing.

Version: 7.2.0.0.alpha0+
Build ID: 5adc93a9a9426ef79054751be2904896f787a8a2
CPU threads: 4; OS: Linux 5.8; UI render: default; VCL: gtk3
Locale: ro-RO (ro_RO.UTF-8); UI: en-US
Calc: threaded
Comment 10 Kevin Suo 2021-08-22 09:11:17 UTC
On current master, the fonts are correctly applied as Arial or Arial Bold.
One issue I noticed: when I open the PDF with a PDF reader (such as Evince on Linux) and select the text, I only get "Derula i" rather than "Derulați".
Comment 11 BogdanB 2021-08-22 09:46:43 UTC
I confirm what Kevin noticed, a strange behaviour in the PDF itself: the diacritics are correct only visually, not when a word is selected.

When I copy and paste text from the PDF into any text editor, the diacritics are missing.

Could this be not our bug?
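
The selection and copy-paste check described above can be sketched as a small comparison. This is only an illustration: the function name `lost_characters` is hypothetical, and it assumes the extracted text has the same length as the expected text, with each lost letter replaced by a single other character, as in the "Derula i" example.

```python
# Comma-below letters reported missing in this bug: ș, ț, Ș, Ț.
TARGETS = "\u0219\u021b\u0218\u021a"


def lost_characters(expected, extracted):
    """Return (index, char) pairs for target letters in `expected`
    that do not appear at the same position in `extracted`."""
    lost = []
    for i, ch in enumerate(expected):
        if ch in TARGETS and (i >= len(extracted) or extracted[i] != ch):
            lost.append((i, ch))
    return lost


# Text as it should read, and text as obtained by selecting it in a PDF reader.
print(lost_characters("Derula\u021bi", "Derula i"))
```

An empty list would mean the copied text kept every ș and ț, so the problem would lie in rendering rather than in the text layer of the PDF.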
Comment 12 Kevin Suo 2021-08-22 10:17:16 UTC
Was the ț in "Derulați" exported as an image or some other object in the PDF? Does it work if you prepare a document in Draw or Writer, export it to PDF, and then reopen it with Draw?

Since the character is shown when the file is opened with a PDF viewer, it is of course Draw's bug. But it will be very hard to fix if we don't know where to find the character's text on the page.
Comment 13 Kevin Suo 2021-08-22 10:27:32 UTC
Even stranger, if I open the PDF with Firefox, the ț is shown and selectable...

The xpdf script (<libreoffice path>/program/xpdfimport) processed the ț character as a blank space. So, if it is a LibreOffice bug, then something must be wrong with that xpdf script (or even in upstream poppler, which LibreOffice uses to process the PDF).
Comment 14 Kevin Suo 2021-08-22 10:44:25 UTC
I GUESS the problem may be somewhere in the member function
PDFOutDev::drawChar

in sdext/source/pdfimport/xpdfwrapper/pdfioutdev_gpl.cxx

i.e., this needs debugging to see whether the special character is correctly passed in but then lost within this function.
Comment 15 BogdanB 2021-08-22 12:24:43 UTC
Created attachment 174472 [details]
odf with text Derulati
Comment 16 BogdanB 2021-08-22 12:25:43 UTC
Created attachment 174473 [details]
PDF generated from derulati.odt
Comment 17 BogdanB 2021-08-22 12:27:40 UTC
I attached an odt file with the text "Derulați" and also the PDF file exported from it. As you can see, this PDF opens correctly in Draw, and LibreOffice reads the text and diacritics correctly.

So I think the problem is in the PDF file used by the reporter. Maybe it is not encoded correctly.
Comment 18 ⁨خالد حسني⁩ 2022-12-25 01:48:52 UTC
I can reproduce this issue with any of the attached PDFs. Can anyone else reproduce it?

Version: 7.4.3.2 / LibreOffice Community
Build ID: 1048a8393ae2eeec98dff31b5c133c5f1d08b890
CPU threads: 10; OS: Mac OS X 13.0.1; UI render: default; VCL: osx
Locale: en-US (en_EG.UTF-8); UI: en-US
Calc: threaded
Comment 19 BogdanB 2022-12-25 06:09:54 UTC
I can NOT reproduce with
Version: 7.5.0.0.alpha1+ (X86_64) / LibreOffice Community
Build ID: ad085990b8073a122ac5222e5220f8f1d6826dcf
CPU threads: 16; OS: Linux 5.15; UI render: default; VCL: gtk3
Locale: ro-RO (ro_RO.UTF-8); UI: en-US
Calc: threaded

I also can NOT reproduce with
Version: 7.4.0.2 / LibreOffice Community
Build ID: 1512ce97d7ed39dce3121f7e15651fd8895f950e
CPU threads: 16; OS: Linux 5.15; UI render: default; VCL: gtk3
Locale: ro-RO (ro_RO.UTF-8); UI: en-US
Calc: threaded

But I should mention that I have ttf-mscorefonts-installer (Microsoft fonts) installed on my Linux computer.
(sudo add-apt-repository multiverse, then sudo apt update && sudo apt install ttf-mscorefonts-installer)
Comment 20 ⁨خالد حسني⁩ 2022-12-25 14:10:09 UTC
Sorry, I meant to say I cannot reproduce. Closing; please reopen if you can still reproduce with an up-to-date version of LibreOffice.