Bug 116911 - PDF generated from LibreOffice Writer has a problem where text from the document, when copied gets mangled.
Status: RESOLVED WORKSFORME
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Printing and PDF export
Version (earliest affected): unspecified
Hardware: x86-64 (AMD64) Windows (All)
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: PDF-Export
Reported: 2018-04-10 02:44 UTC by Kevin Buchs
Modified: 2018-04-17 15:33 UTC (History)
CC List: 2 users

See Also:
Crash report or crash signature:


Attachments
Original Writer document (651.48 KB, application/vnd.oasis.opendocument.text)
2018-04-10 02:46 UTC, Kevin Buchs
PDF exported by LibreOffice (1.18 MB, application/pdf)
2018-04-10 02:48 UTC, Kevin Buchs
Text document pasted from copy of PDF (2.95 KB, text/plain)
2018-04-10 02:49 UTC, Kevin Buchs
My PDF export options (17.61 KB, image/png)
2018-04-10 02:50 UTC, Kevin Buchs
text file of corrupted strings dumped from PDF (4.37 KB, text/plain)
2018-04-10 05:57 UTC, V Stuart Foote
text file extracted strings after font change to Default style and export to PDF (4.43 KB, text/plain)
2018-04-10 05:58 UTC, V Stuart Foote

Description Kevin Buchs 2018-04-10 02:44:22 UTC
Description:
I regularly create documents that LibreOffice then converts to PDF. When I open one of these PDFs in any of several PDF readers, highlight and copy the text, and paste it into a text editor or into the editor of a WordPress site, the text is garbled. Most of the characters and sentences are correct, but in some sections different characters appear or characters are missing. I would guess that overall it is 90-95% correct.

Steps to Reproduce:
1. Export PDF
2. Open in PDF reader. Copy some text to clipboard.
3. Paste into plain text processor

Actual Results:  
I will attach the original document, the PDF and the text file generated from the copy/paste. I will also attach an image of my very plain/default PDF export settings. 

Expected Results:
The pasted text should match the text in the document.


Reproducible: Always


User Profile Reset: Yes


OpenGL enabled: Yes

Additional Info:


User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36
Comment 1 Kevin Buchs 2018-04-10 02:46:51 UTC
Created attachment 141239 [details]
Original Writer document
Comment 2 Kevin Buchs 2018-04-10 02:48:00 UTC
Created attachment 141240 [details]
PDF exported by LibreOffice
Comment 3 Kevin Buchs 2018-04-10 02:49:00 UTC
Created attachment 141241 [details]
Text document pasted from copy of PDF
Comment 4 Kevin Buchs 2018-04-10 02:50:00 UTC
Created attachment 141242 [details]
My PDF export options
Comment 5 V Stuart Foote 2018-04-10 05:48:48 UTC
Confirmed on Windows 10 Pro 64-bit en-US with
 Version: 6.0.3.2 (x64)
Build ID: 8f48d515416608e3a835360314dac7e47fd0b821
CPU threads: 8; OS: Windows 10.0; UI render: default; 
Locale: en-US (en_US); Calc: CL

So this is really weird. Is it a problem with the font, or with the PDF export filter?

In the sample ODT, the Default paragraph style has been modified to use the Calibri font, and direct formatting is applied to increase the font size.

Exporting the ODT to PDF gives a result that renders with the correct appearance in various viewers (Adobe Reader, Firefox, Chrome). But as noted, a select, copy, and paste from the document garbles characters: l -> a, a -> l, o -> i, i -> o, g -> " "
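
To illustrate what an affected run of text looks like, here is a minimal Python sketch that applies the substitutions listed above to a string. The name `garble` and the table `OBSERVED_SWAPS` are hypothetical, and the table holds only the five substitutions seen here, not necessarily every one in the PDF. In the real PDF only some sections are affected, whereas this sketch rewrites every matching character.

```python
# Substitutions observed on copy/paste from the exported PDF.
# Assumption: this is only the subset listed in this comment, not a complete table.
OBSERVED_SWAPS = {"l": "a", "a": "l", "o": "i", "i": "o", "g": " "}


def garble(text: str) -> str:
    """Apply the observed substitutions to every matching character of text."""
    return text.translate(str.maketrans(OBSERVED_SWAPS))


if __name__ == "__main__":
    # Shows how an ordinary sentence would read inside an affected section.
    print(repr(garble("a daily devotion")))
```

Because "l"/"a" and "o"/"i" are swapped in pairs, applying `garble` twice restores those four letters; only the "g" that became a space is lost.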

Then dumping the text of the PDF with gs (i.e. 'gswin64c -sDEVICE=txtwrite -o output.txt Tue-1000-Apr-10-2018-Devotion-2.pdf') also yields corrupted strings.
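
The gs dump can be scripted for repeated checks. A minimal Python sketch, assuming Ghostscript is installed and `gswin64c` is on the PATH; `build_gs_command` and `dump_pdf_text` are hypothetical helper names, while the `-sDEVICE=txtwrite` and `-o` options are the ones from the command above. Reading the output as UTF-8 is also an assumption.

```python
import subprocess
from pathlib import Path


def build_gs_command(pdf_path, out_path, gs_exe="gswin64c"):
    """Return the argument list for the txtwrite dump shown in this comment."""
    return [gs_exe, "-sDEVICE=txtwrite", "-o", str(out_path), str(pdf_path)]


def dump_pdf_text(pdf_path, out_path, gs_exe="gswin64c"):
    """Run Ghostscript and return the extracted text.

    Requires Ghostscript to be installed; raises CalledProcessError on failure.
    """
    subprocess.run(build_gs_command(pdf_path, out_path, gs_exe), check=True)
    # Assumption: the txtwrite output is read as UTF-8; undecodable bytes are replaced.
    return Path(out_path).read_text(encoding="utf-8", errors="replace")
```

On Linux or macOS the executable is usually named `gs`, which can be passed as `gs_exe`.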

If I change the Default style to use a font other than Calibri--e.g. Liberation Sans, or Arial--the resulting PDF then has no string corruption on copy/paste or if dumping the strings with gs.

For now, a workaround is to use a different font.

But weird!
Comment 6 V Stuart Foote 2018-04-10 05:57:21 UTC
Created attachment 141247 [details]
text file of corrupted strings dumped from PDF

This text file is a gswin64c string dump of the PDF exported from the ODT with Calibri. The PDF is composed correctly, but the strings have characters transposed. The same document, with the font of the Default Paragraph style changed to Arial or Liberation Sans and exported to PDF, also views correctly, and its extracted strings show no glitches (attached in the next comment).
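
One way to tabulate the transpositions is to compare a clean dump against a corrupted one character by character. A minimal Python sketch, assuming the two dumps are aligned line for line and that corrupted lines keep the same length as the clean ones (neither is guaranteed for gs output); `substitution_counts` is a hypothetical helper name.

```python
from collections import Counter


def substitution_counts(clean_text: str, garbled_text: str) -> Counter:
    """Count (clean_char, garbled_char) pairs where the two dumps differ.

    Assumption: lines correspond one-to-one. Lines whose lengths differ are
    skipped, because a positional comparison would be meaningless for them.
    """
    counts = Counter()
    for clean_line, bad_line in zip(clean_text.splitlines(), garbled_text.splitlines()):
        if len(clean_line) != len(bad_line):
            continue
        for good_ch, bad_ch in zip(clean_line, bad_line):
            if good_ch != bad_ch:
                counts[(good_ch, bad_ch)] += 1
    return counts
```

Calling `substitution_counts(clean, garbled).most_common()` on the contents of the two attached dumps would list the most frequent character swaps first.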
Comment 7 V Stuart Foote 2018-04-10 05:58:54 UTC
Created attachment 141248 [details]
text file extracted strings after font change to Default style and export to PDF
Comment 8 Timur 2018-04-17 15:33:56 UTC
WFM in master. I guess it's a duplicate of Bug 115117. Please see LO 6.0.4.