Copying and pasting source code from the LibreOffice PDF guides loses the indentation spaces.

Reproducing:
For example, the bottom of https://documentation.libreoffice.org/assets/Uploads/Documentation/en/GS7.3/GS73-GettingStarted.pdf#page=431 has sample Python code. Copying and pasting it as text loses the blank line, and any line with leading spaces is pasted with a single space before the code, losing the other spaces. Tabs are ignored. The code is formatted in Liberation Mono, a fixed-pitch font, with 4 spaces per indent.

Sample output:
import uno
def HelloWorld():
 doc = XSCRIPTCONTEXT.getDocument()
 cell = doc.Sheets[0]['A1']
 cell.setString('Hello World from Python')
 return

This is important for a language like Python because indents are part of the syntax. The space characters are expected to be retained, at least as an option.
Here's a better example:

*Text.txt*
def hello_world():
    print("Hello World!")
hello_world()
*File ends line above with no CRLF*

The method of capturing the text is important. Exporting as PDF with default settings retains the leading spaces but drops blank lines. Using:
java -jar pdfbox-app-2.0.25.jar ExtractText test.pdf test1.txt

Version: 7.3.1.3 (x64) / LibreOffice Community
Build ID: a69ca51ded25f3eefd52d7bf9a5fad8c90b87951
CPU threads: 8; OS: Windows 10.0 Build 19043; UI render: Skia/Raster; VCL: win
Locale: en-AU (en_AU); UI: en-GB
Calc: threaded

============================================================

java -jar pdfbox-app-2.0.25.jar TextToPDF -standardFont Courier test.pdf test.txt
java -jar pdfbox-app-2.0.25.jar ExtractText test.pdf test1.txt
Retains blank lines (but appends a space to each line and a final CRLF).
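To make this kind of round-trip comparison repeatable, here is a minimal Python sketch that compares the original text with the extracted text and reports what whitespace was lost. The function name and the sample strings are hypothetical and only for illustration. It assumes the non-blank lines come back in the same order.

```python
def whitespace_losses(original, extracted):
    """Summarise whitespace differences between source text and extracted text."""
    orig_lines = original.splitlines()
    extr_lines = extracted.splitlines()

    # Whitespace-only lines are counted as blank on both sides.
    orig_blank = sum(1 for line in orig_lines if not line.strip())
    extr_blank = sum(1 for line in extr_lines if not line.strip())

    def indent(line):
        # Number of leading space characters.
        return len(line) - len(line.lstrip(" "))

    # Pair up the non-blank lines (assumes they survive in the same order).
    orig_code = [line for line in orig_lines if line.strip()]
    extr_code = [line for line in extr_lines if line.strip()]
    indent_changed = [
        o.strip()
        for o, e in zip(orig_code, extr_code)
        if o.strip() == e.strip() and indent(o) != indent(e)
    ]
    return {
        "blank_lines_dropped": orig_blank - extr_blank,
        "indent_changed": indent_changed,
    }


# Hypothetical sample: one blank line dropped, one indent reduced to a single space.
source = 'def hello_world():\n    print("Hello World!")\n\nhello_world()'
pasted = 'def hello_world():\n print("Hello World!")\nhello_world()\n'
print(whitespace_losses(source, pasted))
```

Running it on test.txt and test1.txt from the commands above would show which of the two problems (indentation or blank lines) a given export and extraction path has.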
Blank lines might be a problem because there is no such thing as a newline character in a PDF; the next paragraph is just started further down the page. A viable workaround might be to add a single space to empty lines. However, if any code was developed for the Python REPL, a space on a line intended to be blank would cause it to fail. https://pdf-xchange.eu/ will select and copy text including all white space for pasting, but it can't pick up blank lines (because they don't exist).
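The single-space workaround can be sketched in Python as a pair of helpers: one pads empty lines before the text goes into the document, the other turns whitespace-only lines back into empty ones after pasting. Both function names are hypothetical. Whether a space-only line actually ends up as a text run in the exported PDF is an assumption that has not been verified here.

```python
def pad_blank_lines(source, filler=" "):
    """Replace each empty line with `filler` so the line carries real text."""
    return "\n".join(line if line else filler for line in source.split("\n"))


def unpad_blank_lines(pasted):
    """After pasting, turn whitespace-only lines back into empty lines."""
    return "\n".join(line if line.strip() else "" for line in pasted.split("\n"))


code = "def f():\n    return 1\n\nf()"
padded = pad_blank_lines(code)
assert padded == "def f():\n    return 1\n \nf()"
assert unpad_blank_lines(padded) == code
```

The second helper is there because of the REPL concern: the padded text should be cleaned up before it is fed to an interactive interpreter.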
Pasting the selection from https://pdf-xchange.eu/ will contain the required blank lines *if* they are written in the PDF. LibreOffice does not write blank lines in the PDF. I suggest an enhancement to tweak the format to support it, e.g.:

stream
/F1 10 Tf
BT
40 763.07751 Td
0 -11.0775 Td
(Lorem ipsum dolor sit amet,) Tj
0 -11.0775 Td
( consectetur adipiscing ) Tj
0 -11.0775 Td
() Tj
0 -11.0775 Td
(elit. sed do eiusmod) Tj
ET
endstream

Decoded LibreOffice PDF:

stream
0.1 w
q 0 0.028 595.275 841.861 re
W* n
q 0 0 0 rg
BT
56.8 776.789 Td /F1 10 Tf<0102030405020606070809070A06010B0C0D>Tj
ET
Q
q 0 0 0 rg
BT
56.8 765.489 Td /F1 10 Tf<040404040E0A0F10110B1213020606070414070A060115120C04>Tj
ET
Q
q 0 0 0 rg
BT
56.8 742.789 Td /F1 10 Tf<05020606070809070A06010B0C>Tj
ET
Q
Q
endstream

There is no "() Tj" for the blank line (unpacking the characters within the lines is not important to demonstrate this).

------------------------------------------------------------

Indented code is a potential complication that could be overcome by formatting it left-aligned.
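For reference, the blank line is still visible in the decoded LibreOffice stream as a larger vertical gap between baselines (776.789 to 765.489, then 765.489 to 742.789). A minimal Python sketch of how an extraction tool could infer blank lines from those gaps follows. It assumes one BT ... ET block per text line and a constant line height, and the function names are hypothetical. It does not describe what any particular viewer actually does.

```python
import re

# Absolute "x y Td" positioning that directly follows BT, as in the decoded stream.
TD_AFTER_BT = re.compile(r"\bBT\s+(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s+Td\b")


def baselines(stream):
    """Return the y coordinate of each text line, in stream order."""
    return [float(y) for _x, y in TD_AFTER_BT.findall(stream)]


def blank_lines_between(ys):
    """Estimate the number of blank lines between each pair of consecutive lines."""
    # PDF y coordinates decrease down the page, so each gap is previous minus next.
    gaps = [upper - lower for upper, lower in zip(ys, ys[1:])]
    if not gaps:
        return []
    leading = min(gaps)  # assumption: the smallest gap is one normal line height
    return [round(gap / leading) - 1 for gap in gaps]


decoded = (
    "q 0 0 0 rg BT 56.8 776.789 Td /F1 10 Tf<0102>Tj ET Q "
    "q 0 0 0 rg BT 56.8 765.489 Td /F1 10 Tf<0404>Tj ET Q "
    "q 0 0 0 rg BT 56.8 742.789 Td /F1 10 Tf<0502>Tj ET Q"
)
print(blank_lines_between(baselines(decoded)))  # [0, 1]
```

The hex strings in the sample are shortened placeholders; only the coordinates are taken from the stream above. The result says there is no blank line between the first two lines and one between the second and third, which matches the source text.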
*** This bug has been marked as a duplicate of bug 66181 ***
Created attachment 179058 [details] XpdfReader screenshot showing selection
Two issues with pdf:
1. Leading spaces lost UNLESS caption in selection
2. Extra blank line above MsgBox

Demonstration
=============
The functionality depends on the pdf viewer used.

1a. Using XpdfReader Version 4.03 www.xpdfreader.com
* Open https://www.pitonyak.org/OOME_3_0.pdf#page=88
* Copy/Paste Listing to a text editor:

Listing 59. Modified bubble sort.
Sub ExampleForNextSort
Dim iEntry(10) As Integer
Dim iOuter As Integer, iInner As Integer, iTemp As Integer
Dim bSomethingChanged As Boolean
' Fill the array with integers between -10 and 10
For iOuter = LBound(iEntry()) To Ubound(iEntry())
iEntry(iOuter) = Int((20 * Rnd) -10)
Next iOuter
' iOuter runs from the highest item to the lowest
For iOuter = UBound(iEntry()) To LBound(iEntry()) Step -1
'Assume that the array is already sorted and see if this is incorrect
bSomethingChanged = False
For iInner = LBound(iEntry()) To iOuter-1
If iEntry(iInner) > iEntry(iInner+1) Then
iTemp = iEntry(iInner)
iEntry(iInner) = iEntry(iInner+1)
iEntry(iInner+1) = iTemp
bSomethingChanged = True
End If
Next iInner
'If the array is already sorted then stop looping!
If Not bSomethingChanged Then Exit For
Next iOuter
Dim s$
For iOuter = LBound(iEntry()) To Ubound(iEntry())
s = s & iOuter & " : " & iEntry(iOuter) & CHR$(10)
Next iOuter
MsgBox s, 0, "Sorted Array"
End Sub

1b. Repeat in Acrobat, Brave, SumatraPDF etc:

Listing 59. Modified bubble sort.
Sub ExampleForNextSort
Dim iEntry(10) As Integer
Dim iOuter As Integer, iInner As Integer, iTemp As Integer
Dim bSomethingChanged As Boolean
' Fill the array with integers between -10 and 10
For iOuter = LBound(iEntry()) To Ubound(iEntry())
iEntry(iOuter) = Int((20 * Rnd) -10)
Next iOuter
' iOuter runs from the highest item to the lowest
For iOuter = UBound(iEntry()) To LBound(iEntry()) Step -1
'Assume that the array is already sorted and see if this is incorrect
bSomethingChanged = False
For iInner = LBound(iEntry()) To iOuter-1
If iEntry(iInner) > iEntry(iInner+1) Then
iTemp = iEntry(iInner)
iEntry(iInner) = iEntry(iInner+1)
iEntry(iInner+1) = iTemp
bSomethingChanged = True
End If
Next iInner
'If the array is already sorted then stop looping!
If Not bSomethingChanged Then Exit For
Next iOuter
Dim s$
For iOuter = LBound(iEntry()) To Ubound(iEntry())
s = s & iOuter & " : " & iEntry(iOuter) & CHR$(10)
Next iOuter
MsgBox s, 0, "Sorted Array"
End Sub

2a. Another example using XpdfReader to open https://documentation.libreoffice.org/assets/Uploads/Documentation/en/GS7.3/GS73-GettingStarted.pdf#page=425

Sub AppendHello
Dim oDoc
Dim sTextService$
Dim oCurs
REM ThisComponent refers to the currently active document.
oDoc = ThisComponent
REM Verify that this is a text document.
sTextService = "com.sun.star.text.TextDocument"
If NOT oDoc.supportsService(sTextService) Then
MsgBox "This macro only works with a text document"
Exit Sub
End If
REM Get the view cursor from the current controller.
oCurs = oDoc.currentController.getViewCursor()
REM Move the cursor to the end of the document.
oCurs.gotoEnd(False)
REM Insert text "Hello" at the end of the document.
oCurs.Text.insertString(oCurs, "Hello", False)
End Sub

2b. Let's try again including the Listing Caption:

Listing 5: Append the text “Hello” at the end of to the current document
Sub AppendHello
Dim oDoc
Dim sTextService$
Dim oCurs
REM ThisComponent refers to the currently active document.
oDoc = ThisComponent
REM Verify that this is a text document.
sTextService = "com.sun.star.text.TextDocument"
If NOT oDoc.supportsService(sTextService) Then
MsgBox "This macro only works with a text document"
Exit Sub
End If
REM Get the view cursor from the current controller.
oCurs = oDoc.currentController.getViewCursor()
REM Move the cursor to the end of the document.
oCurs.gotoEnd(False)
REM Insert text "Hello" at the end of the document.
oCurs.Text.insertString(oCurs, "Hello", False)
End Sub
I repro with https://documentation.libreoffice.org/assets/Uploads/Documentation/en/GS7.3/GS73-GettingStarted.pdf#page=431 copied to a text editor
The leading spaces are in the PDF, viewers may choose not to copy them, but there is not much we can do about this. PDF is not a good format for exchanging plain text data because it is a purely visual format and preservation of the underlying textual data is rather limited.
(In reply to خالد حسني from comment #8)
> PDF is not a good format for exchanging plain text data because it is a
> purely visual format and preservation of the underlying textual data is
> rather limited.

Agree. Regardless, it IS being used to exchange plain text so we should do what we can to support it, especially with such a minimal request.

> The leading spaces are in the PDF, viewers may choose not to copy them, but
> there is not much we can do about this.

Did you look at the comment detail?

"There is no "() Tj" for blank line"

Explicitly writing blank lines really improves the functionality of code as demonstrated. Leading spaces are more problematic but a suitable browser could be recommended in the guides.

This is an enhancement request and it would be simple to implement. I'd like it to remain open.
(In reply to flywire from comment #9)
> (In reply to خالد حسني from comment #8)
> > PDF is not a good format for exchanging plain text data because it is a
> > purely visual format and preservation of the underlying textual data is
> > rather limited.
>
> Agree. Regardless, it IS being used to exchange plain text so we should do
> what we can to support it, especially with such a minimal request.
>
> > The leading spaces are in the PDF, viewers may choose not to copy them, but
> > there is not much we can do about this.
>
> Did you look at the comment detail?
>
> "There is no "() Tj" for blank line"
>
> Explicitly writing blank lines really improves the functionality of code as
> demonstrated. Leading spaces are more problematic but a suitable browser
> could be recommended in the guides.

I don’t think we have any knowledge of blank lines by the time we are writing PDF output.

> This is an enhancement request and it would be simple to implement. I'd like
> it to remain open.

Feel free to re-open if you are planning to work on it, otherwise I don’t think it is as simple to implement as it might seem.