I regularly create documents that are then converted to PDF by LibreOffice. When I open a PDF in any of several PDF readers, highlight and copy the text, and paste it into a text editor or into the editor of a WordPress site, the text is garbled. Most of the characters and sentences are correct, but in some sections different characters appear or no characters appear at all. I would guess it is 90-95% correct overall.
Steps to Reproduce:
1. Export PDF
2. Open in PDF reader. Copy some text to clipboard.
3. Paste into plain text processor
I will attach the original document, the PDF and the text file generated from the copy/paste. I will also attach an image of my very plain/default PDF export settings.
User Profile Reset: Yes
OpenGL enabled: Yes
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36
Created attachment 141239 [details]
Original Writer document
Created attachment 141240 [details]
PDF exported by LibreOffice
Created attachment 141241 [details]
Text document pasted from copy of PDF
Created attachment 141242 [details]
My PDF export options
Confirmed on Windows 10 Pro 64-bit en-US with
Version: 184.108.40.206 (x64)
Build ID: 8f48d515416608e3a835360314dac7e47fd0b821
CPU threads: 8; OS: Windows 10.0; UI render: default;
Locale: en-US (en_US); Calc: CL
So this is really weird. Is it a problem with the font, or with the PDF export filter?
In the sample ODT, the Default paragraph style has been modified to use Calibri, and direct formatting is applied to increase the font size.
Exporting the ODT to PDF gives a result that renders correctly in various viewers (Adobe Reader, Firefox, Chrome). But as noted, selecting, copying, and pasting text from the document garbles characters: l -> a, a -> l, o -> i, i -> o, g -> " "
Dumping the text of the PDF with gs (i.e. 'gswin64c -sDEVICE=txtwrite -o output.txt Tue-1000-Apr-10-2018-Devotion-2.pdf') also produces corrupted strings.
If I change the Default style to use a font other than Calibri, e.g. Liberation Sans or Arial, the resulting PDF has no string corruption on copy/paste or when dumping the strings with gs.
For now, the workaround is to use a different font.
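For anyone wanting to check their own PDFs for the same pattern, here is a minimal Python sketch that tallies character substitutions between the expected text and the text copied out of the PDF. It assumes the two strings line up position by position (the swaps noted above are one-for-one, so lengths match). The function name and the sample phrase are made up for illustration and are not taken from the attached files.

```python
from collections import Counter


def tally_substitutions(expected: str, extracted: str) -> Counter:
    """Count (expected_char, extracted_char) pairs that differ at the same position."""
    if len(expected) != len(extracted):
        raise ValueError("texts must have the same length for a positional comparison")
    return Counter((e, x) for e, x in zip(expected, extracted) if e != x)


# Hypothetical sample: apply the swaps reported in this bug to a made-up phrase.
expected = "a long oil gig"
swaps = str.maketrans({"l": "a", "a": "l", "o": "i", "i": "o", "g": " "})
extracted = expected.translate(swaps)

for (src, dst), count in sorted(tally_substitutions(expected, extracted).items()):
    print(f"{src!r} -> {dst!r}: {count}")
```

A consistent, symmetric table (l <-> a, o <-> i) would suggest a glyph-to-character mapping problem in the exported PDF rather than random corruption, but that is only an inference from the pattern.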
Created attachment 141247 [details]
text file of corrupted strings dumped from PDF
This text file is a gswin64c string dump of the PDF exported from the ODT with Calibri. The PDF is composed correctly, but the strings have characters transposed. The same document, with the Default Paragraph style changed to use Arial or Liberation Sans and exported to PDF, also views correctly, and no glitches appear when the strings are extracted (attached in the next post).
Created attachment 141248 [details]
text file of strings extracted after changing the Default style font and exporting to PDF
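To compare the two string dumps without reading them side by side, a short Python sketch using the standard difflib module can list only the lines that differ. This assumes both dumps break lines in the same places, which may not hold once the font changes. The file labels arial.txt and calibri.txt and the sample strings are hypothetical, not the attachment contents.

```python
import difflib


def dump_diff(reference: str, suspect: str) -> list[str]:
    """Return a unified diff (no context lines) between two text dumps."""
    return list(
        difflib.unified_diff(
            reference.splitlines(),
            suspect.splitlines(),
            fromfile="arial.txt",    # hypothetical label for the clean dump
            tofile="calibri.txt",    # hypothetical label for the corrupted dump
            lineterm="",
            n=0,                     # show only the differing lines
        )
    )


# Made-up two-line sample; the second line has l/a and o/i swapped.
clean = "good line\nhello world"
garbled = "good line\nheaai wirad"

for line in dump_diff(clean, garbled):
    print(line)
```

To run it on real dumps, read each file with open(path, encoding="utf-8").read() and pass the two strings in.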
WFM in master. I guess it's a duplicate of Bug 115117. Please see LO 6.0.4.