Bug 113543 - When exporting to tagged PDF, characters in certain fonts are lost or misrepresented in the underlying text.
Status: RESOLVED WORKSFORME
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 5.4.2.2 release
Hardware: All Windows (All)
Importance: medium normal
Assignee: Not Assigned
Blocks: PDF-Export
Reported: 2017-10-31 03:38 UTC by Quentin Christensen
Modified: 2019-04-22 13:07 UTC (History)
Description Quentin Christensen 2017-10-31 03:38:14 UTC
I am using Google's Raleway font: https://fonts.google.com/specimen/Raleway

When I export to tagged PDF, the exported PDF looks normal. However, if you read it with a screen reader, or copy the text from the PDF, characters may be missing or misrepresented. The problem seems to get worse the further the document is zoomed out.

To reproduce:

1. Set the font to Raleway.
2. Type the word "certificate".
3. Set the zoom to 20% (either Ctrl+mouse scroll wheel down, or View > Zoom).
4. Export to PDF (alt+f, then e) and be sure to check "tagged PDF".
5. Open the exported PDF file in Adobe Reader.
6. Copy all the text.  Press CONTROL+A to select all, then CONTROL+C to copy.
7. Paste the text into notepad, LO writer, or anywhere else.

The expected result is that the pasted text should be the same as the original text - the word "certificate" in this case.

The actual result is a word with incorrect characters.  Usually either a missing i (certifcate), or substituted letters (certificcte) or both (certifccte).
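
To make the three failure modes above easier to compare, here is a minimal sketch that diffs the expected word against the pasted text using Python's standard `difflib`. The helper name `describe_differences` is made up for this illustration; it is not part of LibreOffice or any PDF tool.

```python
from difflib import SequenceMatcher


def describe_differences(expected: str, actual: str) -> list[str]:
    """Describe how `actual` deviates from `expected`, one note per change."""
    notes = []
    matcher = SequenceMatcher(None, expected, actual, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "delete":
            # Characters present in the original but absent from the pasted text.
            notes.append(f"missing {expected[i1:i2]!r} at position {i1}")
        elif tag == "replace":
            # Characters that came back as something else.
            notes.append(
                f"{expected[i1:i2]!r} became {actual[j1:j2]!r} at position {i1}"
            )
        elif tag == "insert":
            # Characters in the pasted text that were never typed.
            notes.append(f"extra {actual[j1:j2]!r} at position {i1}")
    return notes


if __name__ == "__main__":
    for pasted in ("certifcate", "certificcte", "certifccte"):
        print(pasted, "->", describe_differences("certificate", pasted))
```

Positions are zero-based offsets into the original word, so the second "i" of "certificate" is position 6 and the "a" is position 8.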

I tested on both Windows 7 and Windows 10 with LibreOffice 5.4.2.2 (x64).

On the Windows 7 machine, increasing the zoom level was less likely to fix the issue than it was on Windows 10.
Comment 1 Dieter Praas 2017-10-31 09:09:21 UTC
Reproducible for me with LO 6.0 (got "certifccte" as result)

Version: 6.0.0.0.alpha1 (x64)
Build ID: c1d1f859b268f650143d48f294999cda0fa57350
CPU threads: 4; OS: Windows 10.0; UI render: default; 
Locale: de-DE (de_DE); Calc: group
Comment 2 Timur 2018-04-19 10:07:11 UTC
I did the following:
- got Raleway v 4020 fonts from https://github.com/impallari/Raleway/ 
- installed in Windows 7
- set Raleway-v4020 (basic and also black)
- followed repro steps
But no repro with LO 6.0 or with 6.1+.
Not sure whether this is related to Bug 115117, or what zoom and tagging have to do with it.
Please test again.
Comment 3 Quentin Christensen 2018-04-19 23:41:11 UTC
Hi Timur,

I tested on the build I was running this morning:

LibreOffice 6.0.0.3 (x64)
Build ID: 64a0f66915f38c6217de274f0aa8e15618924765
CPU threads: 8; OS: Windows 10.0; UI render: default; 
Locale: en-AU (en_AU); Calc: group

While that build was updating, I reproduced the problem on my other computer, which runs Windows 7 with LibreOffice 5.4.2.2 (x64).

Now, I've updated and rebooted and my main PC is running:
Version: 6.0.3.2 (x64)
Build ID: 8f48d515416608e3a835360314dac7e47fd0b821
CPU threads: 8; OS: Windows 10.0; UI render: default; 
Locale: en-AU (en_AU); Calc: group

My Windows 10 is the latest Fast Insider build, though since it also reproduces on Windows 7, I don't think the Windows version is so important here.

For just one word, tagging isn't necessary, true - I needed it for the file I was originally creating.

Zoom does seem to have an effect though.

When I have zoom set to 20%, my exported PDF has the word "certifccte" instead of "certificate".  When I zoom in to 200%, the word becomes "certifccate".

I can definitely still reproduce this.  I originally downloaded my Raleway font from Google rather than GitHub.  If it is an issue with the font itself, perhaps the GitHub version is newer, although I expect more people would find it through Google.  Surely, though, the font only controls how the text is rendered; it shouldn't change what the underlying text is?
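
On the question of whether a font can change the underlying text: a PDF page stores glyph codes rather than characters, and copy/paste or a screen reader recovers text through a glyph-to-Unicode table (the ToUnicode CMap) written alongside the embedded font. The toy model below is only a sketch of how a bad table could produce "certifccte". It is not a confirmed diagnosis of this bug. The glyph IDs, both tables, and the assumption that "fi" is shaped as a single ligature glyph are all hypothetical.

```python
# Hypothetical glyph IDs for a toy font; a real font uses different numbers.
GLYPH_C, GLYPH_E, GLYPH_R, GLYPH_T, GLYPH_I, GLYPH_FI, GLYPH_A = range(1, 8)

# "certificate" after shaping, assuming "f" + "i" were merged into one ligature glyph.
SHAPED_WORD = [
    GLYPH_C, GLYPH_E, GLYPH_R, GLYPH_T, GLYPH_I,
    GLYPH_FI,
    GLYPH_C, GLYPH_A, GLYPH_T, GLYPH_E,
]

# What a correct glyph-to-Unicode table would say.
GOOD_TO_UNICODE = {
    GLYPH_C: "c", GLYPH_E: "e", GLYPH_R: "r", GLYPH_T: "t",
    GLYPH_I: "i", GLYPH_FI: "fi", GLYPH_A: "a",
}

# A hypothetical broken table: the ligature maps to "f" only,
# and the "a" glyph is wrongly mapped to "c".
BROKEN_TO_UNICODE = dict(GOOD_TO_UNICODE)
BROKEN_TO_UNICODE[GLYPH_FI] = "f"
BROKEN_TO_UNICODE[GLYPH_A] = "c"


def extract_text(glyph_ids, to_unicode):
    """Recover text the way a PDF reader would: one table lookup per glyph."""
    return "".join(to_unicode.get(glyph, "\ufffd") for glyph in glyph_ids)


if __name__ == "__main__":
    print(extract_text(SHAPED_WORD, GOOD_TO_UNICODE))
    print(extract_text(SHAPED_WORD, BROKEN_TO_UNICODE))
```

In this model the page looks identical in both cases, because the same glyphs are drawn; only the text recovered through the table differs.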
Comment 4 QA Administrators 2019-04-21 02:52:38 UTC Comment hidden (obsolete)
Comment 5 Timur 2019-04-22 13:07:24 UTC
No repro now. Probably resolved with Bug 66597 or Bug 115117. 
I'll close as WFM.