1. Open the attached PDF in Draw
2. Consider the two-line title on the first page, specifically the vertical placement of the second line, at a (relatively large) vertical distance from the first line
3. Select the two textboxes of the title text
4. In the context menu for the selected boxes, choose "Consolidate Text"
5. Consider again the vertical placement of the second line of the title text and its distance from the first line
Expected result: The second line of the title does not move. The space between the two lines is set (either via space-after on the first line or space-before on the second line, preferably the latter) so as to maintain the vertical position of the second title line.
Actual result: The second line of the title moves up.
Note that preserving the position matters when the two lines come from a document you did not create, whose text you wish to edit but whose layout, especially vertical layout, must be maintained. This is especially important when editing imported PDF files.
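To make the expected result concrete, here is a minimal sketch of the space-before value that would keep the second line in place. It is not LibreOffice code: `TextBox`, `space_before_second`, and the 1/100 mm units are hypothetical, and it assumes the consolidated box keeps the first box's top edge while ignoring text box padding and the second paragraph's own line spacing.

```python
from dataclasses import dataclass


@dataclass
class TextBox:
    """Hypothetical model of a Draw text box (units assumed to be 1/100 mm)."""
    top: int          # y coordinate of the box's top edge
    text_height: int  # height actually occupied by the box's text


def space_before_second(first: TextBox, second: TextBox) -> int:
    """Spacing above the second box's first paragraph so that, after
    consolidation into a box starting at first.top, that paragraph still
    starts at second.top. A negative result means the boxes overlapped
    vertically, which would need negative paragraph spacing to reproduce."""
    first_text_bottom = first.top + first.text_height
    return second.top - first_text_bottom


line1 = TextBox(top=2000, text_height=800)
line2 = TextBox(top=4500, text_height=800)
print(space_before_second(line1, line2))  # 1700
```

The overlapping case (for example a second box with `top=2500`) yields a negative value, which is why supporting negative spacing above paragraphs comes up later in the thread.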
Created attachment 189652
PDF to import into Draw for reproducing the bug
PDF for use with the reproduction instructions.
Created attachment 189653
A simple LibreOffice Draw file for reproduction
For a simpler reproduction:
1. Create a new drawing in LO.
2. Insert two textboxes with some text at a non-trivial vertical distance from each other.
3. Select both textboxes.
4. In the context menu, select Consolidate Text.
The attachment lets you skip steps 1 and 2.
Why? PDF is not intended as an editable document format. And LibreOffice is certainly not a PDF editor.
Despite the work Justin Luth did on 'Consolidate Text' for bug 118370, bug 32249 remains open to implement a different filter import approach that would do more efficient parsing of BT/ET spans and 'ActualText' streams and render text to paragraph objects rather than draw text box objects.
Otherwise there is little need or justification to put development effort into the PDF import filter(s) that drive the simple Consolidate Text handling. The text spans for the individual Draw text boxes have been parsed, as has the canvas position of each Draw object.
When the user elects to consolidate text, the formatting of the initial text box selection is the basis for the resulting consolidation; the subsequent text boxes are not considered, as they are irrelevant to the intent.
That was spelled out in Justin's commit.
As I understand it, we use PDFium, and I wonder whether we should defer these tickets to https://groups.google.com/g/pdfium-bugs.
(In reply to V Stuart Foote from comment #3)
This is not a bug against the PDF import filter. This is not a bug requiring "editing PDFs". This is a bug about consolidating two textboxes.
> When user elects to consolidate text, the formatting of the initial text box
> selection is the basis for the resulting consolidation
Yes, this is a bug. That's because the user can always choose to apply the first text box's formatting uniformly if they want to, but they can't restore the second text box's formatting of its former content in the consolidated box after the consolidation.
(In reply to Eyal Rozenberg from comment #5)
> Yes, this is a bug. And that's because the user can always choose to apply
> the first text box' format uniformly, if they want to; but they can't
> restore the second textbox' formatting of its former content, in the
> consolidated box, after the consolidation.
Let me clarify that:
* If the formatting and placement of each text box are maintained on consolidation, the user can always choose afterwards to apply the first text box's formatting uniformly to the consolidated box.
* If the placement and formatting of the second box are forgotten upon consolidation, the user cannot restore them. Even if the second text box used a named style, that won't help, since the vertical space between the text boxes is not captured anywhere.
So the consolidation should choose the less destructive alternative. Now, Justin wrote that "It is the user's responsibility to afterwards fix up paragraphing, and set paragraph properties.", and that's fine; in fact, it agrees even better with my suggestion.
(In reply to Heiko Tietze from comment #4)
> To my understanding we use PDFium and I wonder if we should defer those
> tickets to https://groups.google.com/g/pdfium-bugs.
1. This bug is not about the import filter...
2. As far as I know, we use PDFium for raster rendering, not for extracting multiple objects; and again, that doesn't matter for the purposes of this bug.
The 'Consolidate Text' utility was implemented expressly for bug 118370 as a means of working with the lexically disjointed text spans that the import filter brings in from PDF as Draw text box objects.
Realistically, no one authoring a document would prepare/place text boxes and then choose to consolidate them. And other than with imported PDF content, concerns about losing the spatial placement/formatting of subsequent Draw objects are a non-issue.
So, as implemented, the utility is intended only for handling PDF imports, and as such it is suited to the task; the needed enhancement belongs in the import process, to be done for bug 32249.
Implementing bug 32249 would probably continue to use the poppler/cairo libraries as now, or could possibly shift to a PDFium-based solution. In either case, the aim would be to import PDF text spans directly to ODF paragraph objects or to consolidated sd text box objects, depending on the LO module receiving the import.
This is NAB, and I see no need to enhance the utility as implemented.
Created attachment 189946
A more complex ODG test file to have the full picture
Here is another test file, including character and paragraph formatting, as well as text box formatting, to have the full picture.
* (Note that paragraph and character formatting _is_ preserved; text box formatting is not.)
* Hoping to preserve the relative position of the different pieces of text is unrealistic. Your example uses the specific case of two text boxes that happen to be left-aligned and that don't overlap in their X and/or Y positions. But in other cases, consolidating will involve LTR boxes that are not left-aligned and that would end up overlapping.
* If we were to somehow implement what you are asking in a manner that accommodates the above, the original paragraph formatting would have to be amended with a new custom value for Below or Above Paragraph Spacing – which would then propagate to new inserted paragraphs. This means that it could go _against_ the expectation of someone joining two text boxes in order to have a more consistent document with equal spacing of paragraphs. (i.e. "Why did this keep the wrong spacing? I wanted to get something consistent!")
Whatever the feature was designed for in the first place, it is not used exclusively for PDF editing (even though it's particularly useful for that, as the documentation says), and it would be bad to break existing uses.
IMHO, because of the above, this is a "won't fix", and we should focus on solving bug 32249 to resolve the issue of too many separate text boxes on PDF import.
(In reply to V Stuart Foote from comment #8)
> Realistically, no one authoring a document would prepare/place text boxes
> and then choose to consolidate them.
Well, if you could consolidate shapes as well as shapeless text boxes, then it does become more realistic that this might be desirable. Although, granted, in that case you would probably not care about vertical positioning.
> And other than with import of PDF
> content, concerns for loss of spatial placement/formatting of subsequent
> draw objects is a non-issue.
Hmm... I wonder if there are indeed no other use-cases. That's a pretty strong assumption. But Ok, I'll grant you this as well.
> So as implemented the utility is intended only for handling import of PDF,
> and as such it is suited to task with needed enhancement to the import
> process to be done for bug 32249
No. Bug 32249 is about making it easier to edit imported PDFs. If, when consolidating text, you lose the v-positioning, that hinders the editing process: You either mess up the document, or have to artificially re-position text to reproduce what you've lost. That's bad.
Plus, once a feature is introduced into LO, it should be implemented properly, not just to the minimum level that someone believes a certain bug report requires. "Consolidate Text", taken on its own, should, and I would say must, maintain positioning; the user may then choose to drop this formatting if they wish.
(In reply to Stéphane Guillou (stragu) from comment #9)
> Here is another test file, including character and paragraph formatting, as
> well as text box formatting, to have the full picture.
Good example to think about this problem, yes.
> * (Noting that paragraph and character formatting _is_ conserved. Not text
> box formatting.)
Not exactly. That is, think of the horizontal endpoints of the text in the top textbox. If you consolidate it with the one below it - the paragraphs don't end at the same place, they end near the right edge of the combined textbox. So, the "right indent 0" is maintained, but not the right edge of the paragraph. If that were maintained, the vertical positioning of the text from the first textbox would also be maintained.
> * Hoping to conserve relative position of the different pieces of text is
> unrealistic. Your example uses the specific case of two text boxes that
> happen to be left-aligned and that don't overlap in their X and/or Y
Well, that is a trickier case, yes. However, if we supported negative spacing above paragraphs, it would be possible (and easy).
> But in other cases, consolidating will involve LTR boxes that are
> not left-aligned and that would end up in an overlap.
Why does the left-aligning matter? That's a paragraph-level feature.
> * If we were to somehow implement what you are asking in a manner that
> accommodates the above, the original paragraph formatting would have to be
> amended with a new custom value for Below or Above Paragraph Spacing – which
> would then propagate to new inserted paragraphs.
It would, yes.
> This means that it could go
> _against_ the expectation of someone joining two text boxes in order to have
> a more consistent document with equal spacing of paragraphs.
1. We are in Draw, not Writer. I might have accepted this line of reasoning there.
2. The user knows they've consolidated text in a way that maintains vertical (and horizontal) positioning, i.e. that they get slightly "contorted" settings to make that happen. If they want a consistent text box with equal spacing etc., they can very easily clear the direct formatting (DF) we introduce. But if we make things simple and consistent, the user can't reposition the content without significant effort and a good memory.
> (i.e. "Why did
> this keep the wrong spacing? I wanted to get something consistent!")
"Why? Since it looks the same as before the consolidation."
"I want something consistent. This has a bunch of DF originating in PDF placement and text box consolidation; let me clear the DF and set my own formatting."
> Whatever the feature was designed for in the first place, it is not used
> exclusively for PDF editing (even though it's particularly useful for it, as
> the documentation says) and it would be bad to break existing.
So, Stuart made the opposite argument. I think the behavior I suggest is better for the feature as-is: maintaining rather than losing information.
However, if we can sketch out a non-PDF-import use case in which text box consolidation is supposed to act differently, then perhaps we could offer both behaviors, e.g. via:
* A post-command balloon widget, à la the paste options balloon in MS Office, or
* Simply two commands, e.g. "Consolidate" and "Consolidate Text" or whatever
* A "maintain positioning?" dialog box
The question of how precisely we import PDF is, to me, a marketing and development-effort question rather than a UX one. Removing the keyword.