Created attachment 139227: Archive containing original document and constructed pdf files
When using the "Export as PDF" menu option to create a PDF, some links cannot be copied properly using Adobe Acrobat Reader DC version 2018.009.20050 under Windows 10. Links copied from the PDF have some letters changed or added and therefore become corrupt. Similar misbehavior happens when printing to a PDF file using the printer driver in Windows 10.
Attached are:
1. Example docx document - test.docx
2. pdf created using the LibreOffice PDF export - test.pdf
3. pdf created using the Windows print to PDF driver - test_print_as_pdf.pdf
To reproduce the problem, copy the text from test.pdf into Notepad. It will result in:
2012 - Present: The Reference Model: A Disease Model for Diabeti disease progression based on use of iomputng power and literature referenies. See: htps://simtk.org/proeeits/therefmodel
Do the same for test_print_as_pdf.pdf and it will result in:
2012 - Present: The Reference Model: A Disease Model for Diabec disease progression based on use of compung power and literature references. See: hps://simtk.org/projects/therefmodel
Clearly there is some incompatibility between software components, since different conversions to PDF create different outcomes. It is also possible that this is a bug in Adobe's copy function.
Hopefully this description is sufficient to reproduce the issue.
Jacob
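To make the corruption easier to spot when reproducing, the copied text can be compared word by word against the expected text. The following is only a minimal sketch: the helper name `corrupted_words` is made up for illustration, the expected string is taken from the pdftotext output quoted later in this report, and the positional matching assumes both strings have the same number of words.

```python
# Expected text (as extracted correctly later in this report) and the
# text copied from test.pdf in Acrobat Reader, as given in the description.
expected = ("Diabetic disease progression based on use of computing power "
            "and literature references. See: https://simtk.org/projects/therefmodel")
copied = ("Diabeti disease progression based on use of iomputng power "
          "and literature referenies. See: htps://simtk.org/proeeits/therefmodel")


def corrupted_words(expected: str, copied: str) -> list[tuple[str, str]]:
    """Return (expected_word, copied_word) pairs that differ.

    Words are matched by position, so this only works when no word
    was split or merged by the copy operation.
    """
    return [(e, c) for e, c in zip(expected.split(), copied.split()) if e != c]


for exp_word, cop_word in corrupted_words(expected, copied):
    print(f"{exp_word!r} was copied as {cop_word!r}")
```

Every word it reports contains a "ti", "tt" or "c"/"j" sequence, which is a hint that the problem is tied to specific glyphs rather than to the text as a whole.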
The issue is related to the fonts used. I see it happens with Calibri and Carlito. It can be seen when text from the PDF is copied and pasted back. Started in 5.3.0.
What is happening here is that Calibri has a "ti" ligature that is enabled by default, and the PDF we produce has problems with copying ligatures from fonts built in certain ways. Before the switch to HarfBuzz we didn't enable ligatures for Latin text at all, so the issue was masked. This is not actually a regression: copying text with ligatures and other advanced text layout features has always been broken. A simple workaround is to disable ligatures; the proper fix is tracked in bug 66597.
*** This bug has been marked as a duplicate of bug 66597 ***
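To illustrate why a ligature can break copying, here is a deliberately simplified model, not the actual LibreOffice or viewer code. It assumes the PDF stores a run of glyphs and that the viewer turns each glyph back into text through a lookup table. The glyph name `t_i` and both tables are hypothetical, and the sketch reproduces only the dropped letter, not the other substitutions seen in the report.

```python
# "computing" after shaping: the letters t and i were merged into one
# ligature glyph, here given the hypothetical name "t_i".
shaped_run = ["c", "o", "m", "p", "u", "t_i", "n", "g"]


def extract(run, to_unicode):
    """Map each glyph back to text; unknown glyphs become U+FFFD."""
    return "".join(to_unicode.get(glyph, "\ufffd") for glyph in run)


# Single-letter glyphs map to themselves in both tables.
base = {ch: ch for ch in "compung"}

# A correct table maps the ligature glyph to BOTH of its letters.
good_map = {**base, "t_i": "ti"}
# A faulty table maps the ligature glyph to one letter only.
bad_map = {**base, "t_i": "t"}

print(extract(shaped_run, good_map))  # computing
print(extract(shaped_run, bad_map))   # computng
```

With ligatures disabled the run would contain separate `t` and `i` glyphs, so even the faulty table would give the right text, which is why disabling ligatures works as a workaround in this model.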
*** Bug 116284 has been marked as a duplicate of this bug. ***
Bug 66597 is becoming a kind of meta bug with different issues lumped together; let's separate the different issues.
*** Bug 116056 has been marked as a duplicate of this bug. ***
*** Bug 116490 has been marked as a duplicate of this bug. ***
Khaled Hosny committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=b94a66ebc8db6c5ca9c7dcfdfbb06b49deae4939 tdf#115117: Fix PDF ToUnicode CMAP for ligatures It will be available in 6.1.0. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
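For readers unfamiliar with the ToUnicode CMap mentioned in the commit title: it is the table inside a PDF that tells viewers which Unicode text each character code stands for. A `bfchar` entry pairs a source code with a destination string in UTF-16BE hex, and that string may hold more than one character, which is what a ligature needs. The sketch below only shows the shape of such an entry. It is not the code from the patch, the function name is made up, and the glyph code 0x0123 is an arbitrary example value.

```python
def bfchar_entry(glyph_code: int, text: str) -> str:
    """Format one ToUnicode bfchar line: <source code> <UTF-16BE text>."""
    src = f"{glyph_code:04X}"                      # 2-byte character code
    dst = text.encode("utf-16-be").hex().upper()   # one or more UTF-16 units
    return f"<{src}> <{dst}>"


# A "ti" ligature glyph (hypothetical code 0x0123) maps to two characters.
print(bfchar_entry(0x0123, "ti"))  # <0123> <00740069>
```

A viewer that reads this entry can paste "ti" for the single ligature glyph, so copied text keeps every letter.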
With the fix we get "ti " and "htt ps", i.e. all characters are present but with an extra space: on use of computi ng power and literature references. See: htt ps://simtk.org/projects/therefmodel Surely better than it was, but can you please explain the space?
Khaled Hosny committed a patch related to this issue. It has been pushed to "libreoffice-6-0": http://cgit.freedesktop.org/libreoffice/core/commit/?id=90fb652ebbc4b16ae5001140076f52209e913345&h=libreoffice-6-0 tdf#115117: Fix PDF ToUnicode CMAP for ligatures It will be available in 6.0.4. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Confirmed fixed for me on 6.0.4. Thanks!
(In reply to Timur from comment #8)
> With fix we get "ti " and "htt ps" i.e. all chars but with some space:
> on use of computi ng power and literature references. See:
> htt ps://simtk.org/projects/therefmodel
Which file is this?
(In reply to Khaled Hosny from comment #11)
> Which file is this?
DOCX from this bug.
And also in duplicate 116490 with word "final" that becomes "fi nal".
(In reply to Timur from comment #12)
> (In reply to Khaled Hosny from comment #11)
> > Which file is this?
> DOCX from this bug.
> And also in duplicate 116490 with word "final" that becomes "fi nal".
I cannot reproduce that, here is the text extracted with pdftotext:
2012 - Present: The Reference Model: A Disease Model for Diabetic disease progression based on use of computing power and literature references. See: https://simtk.org/projects/therefmodel
(In reply to Khaled Hosny from comment #13)
> (In reply to Timur from comment #12)
> > (In reply to Khaled Hosny from comment #11)
> > > Which file is this?
> > DOCX from this bug.
> > And also in duplicate 116490 with word "final" that becomes "fi nal".
>
> I cannot reproduce that, here is the text extracted with pdftotext:
>
> 2012 - Present: The Reference Model: A Disease Model for Diabetic disease
> progression based
> on use of computing power and literature references. See:
> https://simtk.org/projects/therefmodel
The text copied from Acrobat Reader DC:
2012 - Present: The Reference Model: A Disease Model for Diabetic disease progression based on use of computing power and literature references. See: https://simtk.org/projects/therefmodel
I copied the text from the LO-exported test.pdf in PDF-Xchange Viewer 2.5 into Notepad, or back into LO, and I see the space. But when I copy text from the same test.pdf from within Adobe Reader or Master PDF Editor, it's fine. Sorry I didn't test both. Maybe it's about my viewer (https://www.tracker-software.com/product/pdf-xchange-viewer/download?fileid=446). But text copied from the MSO-exported test.pdf from within the same PDF-Xchange Viewer is fine, so I thought it was an LO issue. Something is different; I'm not saying it is wrong.
To conclude: this looks like a viewer bug. I use an old version because I have a license for it. The new version from https://www.tracker-software.com/product/pdf-xchange-editor/download?fileid=613 doesn't copy a space. Thank you.
*** Bug 117451 has been marked as a duplicate of this bug. ***