Bug 116911 - PDF generated from LibreOffice Writer has a problem where text from the document, when copied gets mangled.
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Printing and PDF export
Hardware: x86-64 (AMD64) Windows (All)
Importance: medium normal
Assignee: Not Assigned
Depends on:
Blocks: PDF-Export
Reported: 2018-04-10 02:44 UTC by Kevin Buchs
Modified: 2018-04-17 15:33 UTC
CC List: 2 users

See Also:
Crash report or crash signature:

Original Writer document (651.48 KB, application/vnd.oasis.opendocument.text)
2018-04-10 02:46 UTC, Kevin Buchs
PDF exported by LibreOffice (1.18 MB, application/pdf)
2018-04-10 02:48 UTC, Kevin Buchs
Text document pasted from copy of PDF (2.95 KB, text/plain)
2018-04-10 02:49 UTC, Kevin Buchs
My PDF export options (17.61 KB, image/png)
2018-04-10 02:50 UTC, Kevin Buchs
text file of corrupted strings dumped from PDF (4.37 KB, text/plain)
2018-04-10 05:57 UTC, V Stuart Foote
text file extracted strings after font change to Default style and export to PDF (4.43 KB, text/plain)
2018-04-10 05:58 UTC, V Stuart Foote

Description Kevin Buchs 2018-04-10 02:44:22 UTC
I regularly create documents which are then converted to PDF by LibreOffice. When I open the PDFs in one of several PDF readers, highlight and copy the text, and then paste it into a text editor or into the editor of a WordPress site, the text is garbled. Most of the characters and sentences are correct, but there are sections where different characters appear or no characters appear. I would guess that overall it is 90-95% correct.

Steps to Reproduce:
1. Export PDF
2. Open in PDF reader. Copy some text to clipboard.
3. Paste into plain text processor
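
One way to put a number on the "90-95% correct" estimate is to compare the text pasted in step 3 against the text of the original document. The following is a minimal sketch, not part of the original report; the function name `copy_fidelity` is hypothetical, and whitespace is normalized on the assumption that line wrapping in the PDF should not count as corruption.

```python
from difflib import SequenceMatcher


def copy_fidelity(original: str, pasted: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0 for the two texts."""
    # Collapse all runs of whitespace so that line breaks introduced by
    # the PDF layout are not counted as differences.
    norm_original = " ".join(original.split())
    norm_pasted = " ".join(pasted.split())
    # autojunk=False keeps the heuristic from discarding frequent
    # characters when comparing long texts.
    matcher = SequenceMatcher(None, norm_original, norm_pasted, autojunk=False)
    return matcher.ratio()
```

A ratio close to 1.0 means the pasted text matches the source; lower values indicate more garbled characters.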

Actual Results:  
I will attach the original document, the PDF and the text file generated from the copy/paste. I will also attach an image of my very plain/default PDF export settings. 

Expected Results:
see above

Reproducible: Always

User Profile Reset: Yes

OpenGL enabled: Yes

Additional Info:

User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36
Comment 1 Kevin Buchs 2018-04-10 02:46:51 UTC
Created attachment 141239 [details]
Original Writer document
Comment 2 Kevin Buchs 2018-04-10 02:48:00 UTC
Created attachment 141240 [details]
PDF exported by LibreOffice
Comment 3 Kevin Buchs 2018-04-10 02:49:00 UTC
Created attachment 141241 [details]
Text document pasted from copy of PDF
Comment 4 Kevin Buchs 2018-04-10 02:50:00 UTC
Created attachment 141242 [details]
My PDF export options
Comment 5 V Stuart Foote 2018-04-10 05:48:48 UTC
Confirmed on Windows 10 Pro 64-bit en-US with
 Version: (x64)
Build ID: 8f48d515416608e3a835360314dac7e47fd0b821
CPU threads: 8; OS: Windows 10.0; UI render: default; 
Locale: en-US (en_US); Calc: CL

So this is really weird. Is it a problem with the font, or with the export filter to PDF?

In the sample ODT, the Default paragraph style has been modified to use the Calibri font, and direct formatting is applied to increase the font size.

The PDF exported from the ODT renders with the correct appearance in various viewers (Adobe Reader, Firefox, Chrome). But as noted, selecting, copying, and pasting text from the document garbles some characters: l -> a, a -> l, o -> i, i -> o, g -> " "
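
To illustrate what the substitutions listed above do to a piece of text, here is a small sketch that applies exactly those five pairs. It is an illustration only: the name `garble` is hypothetical, and since the report says only some sections of the text are affected, the real PDF does not apply this mapping uniformly.

```python
# Character substitutions observed in the pasted text (from this comment).
GARBLE_TABLE = str.maketrans({"l": "a", "a": "l", "o": "i", "i": "o", "g": " "})


def garble(text: str) -> str:
    """Apply the observed substitutions to every character of text."""
    return text.translate(GARBLE_TABLE)
```

Note that l/a and o/i are swapped pairwise, so applying `garble` twice restores any text that contains no "g"; the g -> " " substitution loses information and cannot be undone this way.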

Dumping the text of the PDF with gs (i.e. 'gswin64c -sDEVICE=txtwrite -o output.txt Tue-1000-Apr-10-2018-Devotion-2.pdf') also produces corrupted strings.
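
For repeated checks, the gs dump above could be scripted. This is a sketch under assumptions: it uses only the `-sDEVICE=txtwrite` and `-o` options quoted in the command above, the helper names `txtwrite_command` and `dump_pdf_text` are hypothetical, and the executable name is a parameter because `gswin64c` is the 64-bit Windows console build (other platforms typically use `gs`).

```python
import subprocess
from pathlib import Path


def txtwrite_command(pdf, out, gs_exe="gswin64c"):
    """Build the Ghostscript argument list used for the text dump."""
    return [gs_exe, "-sDEVICE=txtwrite", "-o", str(out), str(pdf)]


def dump_pdf_text(pdf, out, gs_exe="gswin64c"):
    """Run Ghostscript and return the extracted text (requires gs installed)."""
    # check=True raises CalledProcessError if Ghostscript exits with an error.
    subprocess.run(txtwrite_command(pdf, out, gs_exe), check=True)
    # The output encoding is assumed to be UTF-8; undecodable bytes are replaced.
    return Path(out).read_text(encoding="utf-8", errors="replace")
```

Running `dump_pdf_text` on the Calibri export and on an export made with a different font, then comparing the two results, gives the same check as the manual steps in this comment.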

If I change the Default style to use a font other than Calibri--e.g. Liberation Sans, or Arial--the resulting PDF then has no string corruption on copy/paste or if dumping the strings with gs.

For now, the workaround is to use a different font.

But weird!
Comment 6 V Stuart Foote 2018-04-10 05:57:21 UTC
Created attachment 141247 [details]
text file of corrupted strings dumped from PDF

This text file is a gswin64c string dump of the PDF exported from the ODT with Calibri. The PDF is composed correctly, but the strings have characters transposed. The same document, with the font of the Default Paragraph style changed to Arial or Liberation Sans and exported to PDF, also views correctly, and its extracted strings show no glitches (attached in the next comment).
Comment 7 V Stuart Foote 2018-04-10 05:58:54 UTC
Created attachment 141248 [details]
text file extracted strings after font change to Default style and export to PDF
Comment 8 Timur 2018-04-17 15:33:56 UTC
WFM in master. I guess it's a duplicate of Bug 115117. Please see LO 6.0.4.