Bug 115123 - Copying text from PDF gives corrupt text in new file depending on font
Summary: Copying text from PDF gives corrupt text in new file depending on font
Status: RESOLVED WORKSFORME
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Printing and PDF export
Version: 6.0.0.2 rc (earliest affected)
Hardware: x86-64 (AMD64) Windows (All)
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-01-20 11:21 UTC by dehcjam
Modified: 2018-04-19 09:44 UTC
CC List: 2 users

See Also:
Crash report or crash signature:


Attachments
PDF file with FreeSerif that looks good (24.99 KB, application/pdf)
2018-01-21 12:20 UTC, dehcjam
Text output after copy and paste from the PDF file with FreeSerif (11.26 KB, application/vnd.oasis.opendocument.text)
2018-01-21 12:21 UTC, dehcjam
PDF file with Times New Roman - copy and paste text from it works (44.24 KB, application/pdf)
2018-01-21 12:23 UTC, dehcjam
Original ODT file created with LO 6.0.0.2 (9.54 KB, application/vnd.oasis.opendocument.text)
2018-01-21 12:24 UTC, dehcjam

Description dehcjam 2018-01-20 11:21:55 UTC
Description:
A PDF was generated from an ODT file with Writer 6.0.0.2 (RC2).
When text is copied from the PDF file in Adobe Acrobat Reader and then pasted into a new file, the pasted text is corrupt if the FreeSerif or Linux Libertine font is used. With Times New Roman, the copied text is correct.
OS: Windows 10. The same problem was observed with LO 6.0.0.1 in openSUSE Tumbleweed when using FreeSerif.

Steps to Reproduce:
1. Create an ODT file with Writer 6.0 RC2.
2. Generate a PDF from it.
3. Check that the PDF looks good.
4. Copy text from the PDF file with Adobe Acrobat Reader DC (18.009.20050) and paste it into a new file.

Actual Results:  
The pasted text is corrupt depending on the font used. It is corrupt with FreeSerif (version from 2012-05-03) or Linux Libertine (5.3.0) but correct with Times New Roman.

Expected Results:
The text copied from the PDF file should be correct.


Reproducible: Always


User Profile Reset: No



Additional Info:
The same problem was observed with LO 6.0.0.1 in openSUSE Tumbleweed when using FreeSerif.


User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0
Comment 1 Chavdar 2018-01-20 16:15:21 UTC
I could not find any problems with corrupted text.

Reproduced the steps exactly and used Liberation Serif and Linux Libertine G fonts.

Version: 6.0.0.2 (x64)
Build ID: 06b618bb6f431d27fd2def25aa19c833e29b61cd
CPU threads: 4; OS: Windows 10.0; UI render: default; 
Locale: bg-BG (bg_BG); Calc: group
Comment 2 Jean-Baptiste Faure 2018-01-20 21:06:13 UTC
What do you mean by "corrupt"?
Please provide a test file and describe step by step what to do to reproduce the problem.
Do you experience the same problem if you paste the copied text in another text editor?

Status set to NEEDINFO; please set it back to UNCONFIRMED once the requested
information is provided.

Best regards. JBF
Comment 3 dehcjam 2018-01-21 12:20:08 UTC
Created attachment 139238
PDF file with FreeSerif that looks good
Comment 4 dehcjam 2018-01-21 12:21:29 UTC
Created attachment 139239
Text output after copy and paste from the PDF file with FreeSerif
Comment 5 dehcjam 2018-01-21 12:23:18 UTC
Created attachment 139240
PDF file with Times New Roman - copy and paste text from it works
Comment 6 dehcjam 2018-01-21 12:24:59 UTC
Created attachment 139241
Original ODT file created with LO 6.0.0.2
Comment 7 dehcjam 2018-01-21 12:32:21 UTC
The problem is still reproducible here.
Steps:
1. Create ODT file
2. Export ODT to PDF with LO 6.0.0.2 in Windows 10
3. Copy text from the PDF file and paste it in a new ODT file.
Result: The text is messed up, as can be seen in the attached file named LO6-PDF-copy-text-corrupt-output.odt.
It is also messed up when the text is pasted into WordPad in Windows or into KWrite in openSUSE 42.3.
Doing the same with LO 5.3.4.2 in openSUSE 42.3 gives the expected result: after copy and paste, the text is the same as in the PDF file.
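For anyone retesting, the comparison between the original text and the pasted text can also be checked mechanically rather than by eye. The following is only a minimal sketch using Python's standard difflib module; the function names mismatch_ratio and first_difference are hypothetical helpers introduced here, and the caller is assumed to supply the original and pasted text as plain strings (nothing below reads the attachments).

```python
import difflib


def mismatch_ratio(expected: str, pasted: str) -> float:
    """Return 0.0 when the texts are identical, up to 1.0 when nothing matches."""
    # autojunk=False keeps the comparison exact for longer texts.
    matcher = difflib.SequenceMatcher(None, expected, pasted, autojunk=False)
    return 1.0 - matcher.ratio()


def first_difference(expected: str, pasted: str) -> int:
    """Return the index of the first differing character, or -1 if identical."""
    for i, (a, b) in enumerate(zip(expected, pasted)):
        if a != b:
            return i
    # One text is a prefix of the other: the difference starts where it ends.
    if len(expected) != len(pasted):
        return min(len(expected), len(pasted))
    return -1
```

A ratio of 0.0 together with a first difference of -1 would mean the pasted text matches the original exactly; anything else points at where the corruption starts.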
Comment 8 Xisco Faulí 2018-01-22 14:57:59 UTC
Putting back to UNCONFIRMED, as steps have been provided in comment 7.
Comment 9 Chavdar 2018-01-22 15:51:26 UTC
Confirmed

Copying text from the file "PDF file with FreeSerif that looks good" produces corrupted pasted text.
The Times New Roman one pastes as expected.

Version: 5.4.4.2 (x64)
Build ID: 2524958677847fb3bb44820e40380acbe820f960
CPU threads: 4; OS: Windows 6.19; UI render: default; 
Locale: bg-BG (bg_BG); Calc: group

Version: 6.1.0.0.alpha0+
Build ID: d28e10b095b4ee0986fbe86170928bf077da04b9
CPU threads: 4; OS: Windows 10.0; UI render: default; 
TinderBox: Win-x86@62-TDF, Branch:MASTER, Time: 2018-01-13_22:59:50
Locale: bg-BG (bg_BG); Calc: group threaded
Comment 10 Jean-Baptiste Faure 2018-01-22 21:20:10 UTC
Confirmed under Ubuntu 16.04 with LO 5.4.4, LO 6.0.1.0.0+ and the current master.
No problem if the font used is Caladea, but the same problem occurs with the Ubuntu font.
The problem is also seen when importing the PDF with LibreOffice Draw.

Best regards. JBF
Comment 11 Timur 2018-04-19 09:44:55 UTC
WorksForMe now in master. I guess a dupe of Bug 115117. Please test with 6.0.4.