66181 – PDF printing of pre-formatted section does not preserve spaces - cut&paste is then wrong.

Status:	RESOLVED NOTOURBUG

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	Printing and PDF export
Version: (earliest affected)	4.1.0.0.beta2
Hardware:	All All

Importance:	medium enhancement
Assignee:	Not Assigned

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:	PDF-Export

Reported:	2013-06-26 00:29 UTC by gordon.lack
Modified:	2023-07-03 09:31 UTC
CC List:	4 users

See Also:	148025
Crash report or crash signature:

Attachments
Sample document (as .odt, .html and .pdf) to show the issue. Also the three *.py files resulting from a cut&paste from each (60.46 KB, application/x-gzip) 2013-06-26 00:29 UTC, gordon.lack
Example with wide character spacing (17.75 KB, application/gzip) 2017-10-20 13:25 UTC, Tim Retout

Description gordon.lack 2013-06-26 00:29:45 UTC

Created attachment 81435
Sample document (as .odt, .html and .pdf) to show the issue.  Also the three *.py files resulting from a cut&paste from each

I can put Python program source code into a LibreOffice document and mark it as pre-formatted (or Source Code).

If I cut&paste this into a text editor from LibreOffice itself, it works.

If I export it as an html file, and cut&paste from that, it works.

But if I export it as a PDF file, a cut&paste from that fails, as the PDF no longer retains the information about spaces, which is rather essential to Python code.

The problem also applies to other languages, as in general any layout is lost, and spaces can be lost from inside strings, etc.
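
To illustrate why lost whitespace breaks pasted Python, here is a minimal sketch that simulates a paste which drops leading spaces and then checks whether the result still parses. The snippet and the helper names `strip_indentation` and `parses` are hypothetical, chosen for illustration only; `ast.parse` is the standard-library parser.

```python
import ast

# A small Python snippet whose meaning depends on leading spaces.
ORIGINAL = """\
def greet(name):
    if name:
        return "Hello, " + name
    return "Hello"
"""

def strip_indentation(source):
    """Simulate a copy/paste that loses the leading whitespace of every line."""
    return "\n".join(line.lstrip() for line in source.splitlines()) + "\n"

def parses(source):
    """Return True if the source is syntactically valid Python."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False

print(parses(ORIGINAL))                     # True
print(parses(strip_indentation(ORIGINAL)))  # False
```

Spaces lost inside a string literal (for example `"a  b"` becoming `"a b"`) would still parse, but would silently change the program's output.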

I would like PDF exports to contain real space formatting in those document sections which are pre-formatted, in just the same way that the HTML export marks them as "<pre>".

As a start, does anyone know which parts of the source actually handle these outputs for html and PDF?  I could then try hacking around in the code myself, but at the moment I'm lost as to where this is (not) going on.

Comment 1 Tim Retout 2017-10-20 13:25:35 UTC

Created attachment 137143
Example with wide character spacing

I have run into what looks like a very closely related issue in LibreOffice 5.2.2; it seems to affect wide character spacing, not just pre-formatted text.

In this example, spaces are *added* rather than removed.  The HTML version does not have spaces added.  Interestingly, spaces are not inserted between all characters:

    A R EA S O F E XP E RT I SE

This breaks copy/paste, pdf2text and searching on the generated PDF.

Comment 2 flywire 2022-03-20 09:17:18 UTC

*** Bug 148025 has been marked as a duplicate of this bug. ***

Comment 4 Khaled Hosny 2023-07-03 09:31:37 UTC

This depends on the PDF viewer. The PDF contains the correct glyph-to-text mapping and the space glyph is present. Some PDF viewers will interpret wide gaps between glyphs as spaces even if there is no space glyph there.

I tested Adobe Acrobat Reader and Apple’s Preview and both copied the text correctly. Poppler-based PDF viewers show extra spaces.

There is not much we can do about this, short of Bug 117428, which is a heavy hammer and would make it impossible to select individual characters.
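
The gap-based behaviour described in Comment 4 can be sketched as a toy text extractor. This is a hypothetical, simplified model, not Poppler's or Acrobat's actual code: each glyph carries its mapped text, an x position and an advance width, and a space is inserted whenever the gap to the next glyph exceeds an assumed threshold of `0.3 * font_size`. The names `Glyph`, `extract_text`, `layout` and the threshold value are all illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Glyph:
    char: str     # text the glyph maps to (e.g. via a ToUnicode mapping)
    x: float      # left edge of the glyph, in text-space units
    width: float  # advance width of the glyph

def extract_text(glyphs, font_size, gap_ratio=0.3):
    """Rebuild one line of text, inferring spaces from horizontal gaps.

    Hypothetical rule: emit a space when the gap between the previous
    glyph's right edge and the next glyph's left edge exceeds
    gap_ratio * font_size, unless a real space glyph is already there.
    """
    out = []
    for i, g in enumerate(glyphs):
        if i > 0:
            prev = glyphs[i - 1]
            gap = g.x - (prev.x + prev.width)
            if gap > gap_ratio * font_size and prev.char != " " and g.char != " ":
                out.append(" ")  # inferred space: no space glyph in the stream
        out.append(g.char)
    return "".join(out)

def layout(text, advance, letter_spacing=0.0):
    """Place characters left to right with a fixed advance plus extra letter spacing."""
    glyphs, x = [], 0.0
    for ch in text:
        glyphs.append(Glyph(ch, x, advance))
        x += advance + letter_spacing
    return glyphs

print(extract_text(layout("AREAS", 6), font_size=10))                    # AREAS
print(extract_text(layout("AREAS", 6, letter_spacing=4), font_size=10))  # A R E A S
```

In this toy model every pair of glyphs crosses the threshold once letter spacing is added; with real glyph widths and kerning, only some pairs might, which could be consistent with the uneven result shown in Comment 1.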