Bug 154727 - pdf import: odd text layout (tabs?) (UK IHT 407)
Summary: pdf import: odd text layout (tabs?) (UK IHT 407)
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Draw
Version: 7.6.0.0 alpha0+ (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: PDF-Import-Draw
 
Reported: 2023-04-09 12:59 UTC by Dave Gilbert
Modified: 2025-04-09 03:11 UTC (History)

See Also:
Crash report or crash signature:


Attachments
Original IHT407 form unfilled (96.95 KB, application/pdf)
2023-04-09 12:59 UTC, Dave Gilbert
Screenshot of LO's rendering of this document (168.02 KB, image/png)
2023-04-09 13:00 UTC, Dave Gilbert
Okular's nice rendering of the same file (119.11 KB, image/png)
2023-04-09 13:02 UTC, Dave Gilbert
attachment 186549 inserted to document (168.43 KB, image/png)
2023-04-09 15:42 UTC, V Stuart Foote

Description Dave Gilbert 2023-04-09 12:59:43 UTC
Created attachment 186549 [details]
Original IHT407 form unfilled

In the attached UK government form, the text is laid out on a grid in LibreOffice Draw, but not in any other PDF viewer.
It looks to me as if it's tab-related: the Navigator shows all of that text as containing tabs rather than spaces.
Comment 1 Dave Gilbert 2023-04-09 13:00:45 UTC
Created attachment 186550 [details]
Screenshot of LO's rendering of this document
Comment 2 Dave Gilbert 2023-04-09 13:02:17 UTC
Created attachment 186551 [details]
Okular's nice rendering of the same file
Comment 3 V Stuart Foote 2023-04-09 15:42:52 UTC
Created attachment 186553 [details]
attachment 186549 [details] inserted to document

Works for me when I "insert" the PDF one page at a time as an image. See attached. Once inserted, you can "break" the image and then "consolidate" the text spans back into lexically meaningful runs to reassemble sentences and paragraphs.

Otherwise LibreOffice is not a PDF "viewer". 

YMMV, but personally I would never attempt to fill a form using LibreOffice, as doing so is "out of scope". IIUC these UK forms are meant to be filled online, with newer forms also using the obligatory 'GDS Transport' font. This form embeds only subsets of IRModena-Regular and IRModena-Bold, and when those fonts are not installed on the local system LibreOffice will substitute others.

When "opened" as a document (in Draw, Impress or Writer, depending on the filter selected), the LibreOffice import filter reads the text runs of the PDF and creates a draw text box shape for each run. There can be multiple draw shapes per line of text, and the position and size of each shape frame depend on a combination of the PDF sequence and the font details from the PDF. In the Draw module you can "consolidate" the text boxes back into sentences and paragraphs, but the line heights can shift. Breaking the inserted PDF image offers a little more fidelity, but both approaches require font substitution. PDFs generated from source documents that use fonts with odd metrics are going to have issues, just as here.
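
To illustrate the idea of consolidating per-run text boxes back into lines, here is a minimal Python sketch. It is not LibreOffice's actual implementation: the `TextRun` type, the `consolidate_runs` function, and the tolerance values are all hypothetical, and it assumes each run carries a left edge, a baseline, and an advance width, with y increasing down the page.

```python
from dataclasses import dataclass


@dataclass
class TextRun:
    """Hypothetical stand-in for one imported text box (not a LibreOffice type)."""
    x: float      # left edge of the run
    y: float      # baseline position; assumed to increase down the page
    width: float  # advance width of the run with the font actually used
    text: str


def consolidate_runs(runs, y_tolerance=1.0, space_gap=1.5):
    """Group runs into visual lines by baseline, then join each line left to right.

    Both thresholds are illustrative assumptions, not values taken from LibreOffice.
    """
    lines = []  # each entry is the list of runs that share one baseline
    for run in sorted(runs, key=lambda r: (r.y, r.x)):
        if lines and abs(lines[-1][0].y - run.y) <= y_tolerance:
            lines[-1].append(run)
        else:
            lines.append([run])

    result = []
    for line in lines:
        line.sort(key=lambda r: r.x)
        parts = [line[0].text]
        for prev, cur in zip(line, line[1:]):
            # A gap wider than space_gap is treated as a word break.
            gap = cur.x - (prev.x + prev.width)
            parts.append((" " if gap > space_gap else "") + cur.text)
        result.append("".join(parts))
    return result


example_runs = [
    TextRun(60, 100, 15, "Tax"),
    TextRun(10, 100, 20, "Inher"),
    TextRun(30, 100, 25, "itance"),
    TextRun(10, 120, 30, "Form"),
]
consolidated = consolidate_runs(example_runs)
```

Because the gap is computed from each run's width, a substituted font with different metrics changes where word breaks and overlaps appear, which is one reason consolidation after font substitution can shift the layout.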

IMHO => NOB
Comment 4 Dave Gilbert 2023-04-09 15:55:33 UTC
Hi,
  Thanks - however, I'm not sure this is a simple font substitution screwup.
Each of the words seems to have been tab-aligned - how did that happen?

(IHT411 is showing simple font substitution issues, with text in some places running just a bit longer than needed and overlapping other content, but I'd agree that's not a bug).

I agree about not using LO for form filling; what I was actually trying to do was use it for manual editing, because the other PDF viewers were breaking and corrupting form field values.

(We've just fixed two Okular bugs on this).
Comment 5 V Stuart Foote 2023-04-09 16:01:25 UTC
@Miklos, the Cairo based ipdf import filter does fail rather notably parsing the draw text object placements and sizing compared to the pdfium based filter. Should we do better?
Comment 6 V Stuart Foote 2023-04-09 16:20:51 UTC
The ipdf import filter's seemingly grid/tabbed layout issues are noted in these builds:

Version: 7.6.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 1e9f4de320f67d1218c710bcee1969a2324c6888
CPU threads: 8; OS: Windows 10.0 Build 19045; UI render: Skia/Vulkan; VCL: win
Locale: en-US (en_US); UI: en-US
Calc: threaded

Version: 7.5.2.2 (X86_64) / LibreOffice Community
Build ID: 53bb9681a964705cf672590721dbc85eb4d0c3a2
CPU threads: 8; OS: Windows 10.0 Build 19045; UI render: Skia/Vulkan; VCL: win
Locale: en-US (en_US); UI: en-US
Calc: threaded

Version: 6.4.7.2 (x64)
Build ID: 639b8ac485750d5696d7590a72ef1b496725cfb5
CPU threads: 8; OS: Windows 10.0 Build 19045; UI render: GL; VCL: win; 
Locale: en-US (en_US); UI-Language: en-US
Calc: threaded
Comment 7 V Stuart Foote 2023-04-09 16:30:04 UTC
(In reply to V Stuart Foote from comment #6)
> ipdf filter import seemingly grid/tabed layout issues noted these builds:
> 
likewise with 
Version: 5.4.7.2 (x64)
Build ID: c838ef25c16710f8838b1faec480ebba495259d0
CPU threads: 8; OS: Windows 6.19; UI render: GL; 
Locale: en-US (en_US); Calc: group

so work on bug 50879 is not involved (export only but just checking).
Comment 8 QA Administrators 2025-04-09 03:11:21 UTC
Dear Dave Gilbert,

To make sure we're focusing on the bugs that affect our users today, LibreOffice QA is asking bug reporters and confirmers to retest open, confirmed bugs which have not been touched for over a year.

There have been thousands of bug fixes and commits since anyone checked on this bug report. During that time, it's possible that the bug has been fixed, or the details of the problem have changed. We'd really appreciate your help in getting confirmation that the bug is still present.

If you have time, please do the following:

Test to see if the bug is still present with the latest version of LibreOffice from https://www.libreoffice.org/download/

If the bug is present, please leave a comment that includes the information from Help - About LibreOffice.
 
If the bug is NOT present, please set the bug's Status field to RESOLVED-WORKSFORME and leave a comment that includes the information from Help - About LibreOffice.

Please DO NOT

Update the version field
Reply via email (please reply directly on the bug tracker)
Set the bug's Status field to RESOLVED - FIXED (this status has a particular meaning that is not appropriate in this case)


If you want to do more to help you can test to see if your issue is a REGRESSION. To do so:
1. Download and install oldest version of LibreOffice (usually 3.3 unless your bug pertains to a feature added after 3.3) from https://downloadarchive.documentfoundation.org/libreoffice/old/

2. Test your bug
3. Leave a comment with your results.
4a. If the bug was present with 3.3 - set version to 'inherited from OOo';
4b. If the bug was not present in 3.3 - add 'regression' to keyword


Feel free to come ask questions or to say hello in our QA chat: https://web.libera.chat/?settings=#libreoffice-qa

Thank you for helping us make LibreOffice even better for everyone!

Warm Regards,
QA Team

MassPing-UntouchedBug