Bug 149216 - Draw pdf import: wrong width of imported text boxes
Summary: Draw pdf import: wrong width of imported text boxes
Status: RESOLVED DUPLICATE of bug 49705
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Draw
Version (earliest affected): 7.3.2.2 release
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
Reported: 2022-05-21 11:56 UTC by Yan Pas
Modified: 2024-03-20 13:28 UTC
CC List: 5 users

See Also:
Crash report or crash signature:


Attachments
screenshot (431.59 KB, image/png)
2022-05-21 11:56 UTC, Yan Pas
the document, page 3 (108.15 KB, application/pdf)
2022-05-21 11:58 UTC, Yan Pas
pg3 of attachment 180282 extracted (PDFtk) and inserted (3.98 MB, application/vnd.oasis.opendocument.text)
2022-05-21 13:39 UTC, V Stuart Foote

Description Yan Pas 2022-05-21 11:56:43 UTC
Created attachment 180281
screenshot

After importing a PDF document into Draw, some text boxes are wider than in the original document, so I'm unable to edit this PDF.
Comment 1 Yan Pas 2022-05-21 11:58:09 UTC
Created attachment 180282
the document, page 3
Comment 2 V Stuart Foote 2022-05-21 13:39:38 UTC
Created attachment 180284
pg3 of attachment 180282 extracted (PDFtk) and inserted

LibreOffice is not a PDF editor. The PDF import filter correctly reads the text runs and assigns a reasonable font. That a span extends past the "margins" of the original is not a bug; the text runs are independent of the other page decorations.

Fidelity of page 3, extracted and inserted (and so rendered using the pdfium-based libs), is flawless.

LibreOffice is *NOT* a PDF editor (there is no such program); attempting to use it as one is wrong.
Comment 3 Yan Pas 2022-05-22 11:16:41 UTC
OK, I see. I wish it could edit PDFs the way MS Word and Google Docs do (yes, this document was imported into Draw, not Writer). Mark it as invalid, then, if the behaviour is expected.
Comment 4 V Stuart Foote 2022-05-22 17:18:27 UTC
Yes, MS Word 2019 does a fair job of structuring its PDF import, and it can save the result out to a functional ODF Text document (.ODT) if you need one.

At the moment, Justin Luth's work on bug 118370, which cleans up LibreOffice Text boxes and provides a means to merge a selection of multiple Text boxes into a single new Text box on the Draw canvas, offers a functional, if manual, process via the UNO command .uno:TextCombine.

The resulting Text Box does not resize on the Draw canvas to match margin layout of other imported PDF elements, but the resulting Text Box can be resized as needed to manually compose the Draw document.

There is no corresponding command when the PDF is imported through the filter into Writer or Impress. It is not clear that implementing the same UNO TextCombine command for text boxes in Writer is even feasible.

More general handling for extracting text runs from PDF is open as bug 32249, but in sum the LibreOffice filter offerings and object framework are not suited to parsing PDF layout back into meaningful, editable ODF documents with acceptable fidelity. PDF is a presentation format; it is not intended to be "edited".
Comment 5 Stéphane Guillou (stragu) 2024-03-20 13:28:10 UTC
Reproduced with the sample PDF in:

Version: 24.8.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 53c5d570cab036b23f4969b858a648c8f0c24f93
CPU threads: 8; OS: Linux 6.5; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
Calc: CL threaded

This issue relates to the justification of the source paragraph, so let's mark it as a duplicate of bug 49705.
Thanks!

*** This bug has been marked as a duplicate of bug 49705 ***