Bug 150217 - Add import Filter support for PDFs holding XFA based form content
Status: UNCONFIRMED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Draw
Version (earliest affected): 7.2.7.2 release
Hardware: All (OS: All)
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: needsDevEval
Depends on:
Blocks: PDF-Import-Draw PDF-Insert
Reported: 2022-08-01 08:10 UTC by mikeclemmons_2000
Modified: 2022-08-02 06:20 UTC (History)

See Also:
Crash report or crash signature:


Attachments
Draw cannot open PDF (1.84 MB, application/pdf)
2022-08-01 08:10 UTC, mikeclemmons_2000

Description mikeclemmons_2000 2022-08-01 08:10:08 UTC
Created attachment 181526
Draw cannot open PDF

PDF: https://www.canb.uscourts.gov/sites/default/files/forms/denb-request-form-version-5.15.pdf

Result:

If this message is not eventually replaced by the proper contents of the document, your PDF viewer may not be able to display this type of document. 

Expected result:

What the Firefox PDF viewer shows
Comment 1 Mike Kaganski 2022-08-01 11:00:30 UTC
Not sure it's a bug, or that we should "fix" it.

LibreOffice is *not* a PDF viewer or editor. It can import some subset of PDF into its own document model, but PDF forms, scripting, and everything dynamic are out of scope for LibreOffice, IMO.

So it's completely normal that it correctly imports the placeholder.
Comment 2 V Stuart Foote 2022-08-01 15:19:26 UTC
PDFs with scripted XFA (Adobe's XML Forms Architecture), such as this one generated by Adobe's LiveCycle Designer ES 10.1, remain a common source document LibreOffice users need to manipulate in some fashion. Inability to even view them is distracting.

Our current PDF import filters (both the pdfium and poppler based implementations) do not parse the XML describing the form.

pdfium now has an XFA parser [1], poppler is looking at it [2]. And, mozilla has added XFA support to pdf.js [3].

XFA based forms are not going away anytime soon, so ability to at least expose the PDF form as an image with our pdfium implementation seems reasonable minimal handling.

Worth the effort?

=-ref-=

[1] https://github.com/chromium/pdfium/tree/master/xfa
[2] https://gitlab.freedesktop.org/poppler/poppler/-/issues/530
[3] https://github.com/mozilla/pdf.js/issues/2373
Comment 3 Mike Kaganski 2022-08-01 15:43:54 UTC
(In reply to V Stuart Foote from comment #2)
> PDF with scripted XFA ... remain a common source document
> LibreOffice users need to manipulate in some fashion.

This needs clarifying. Why do they need that? To create a static image?

LibreOffice imports PDF as a set of graphical objects. PDF forms are means to provide data to some services. The two worlds don't intersect.

> pdfium now has an XFA parser [1], poppler is looking at it [2]. And, mozilla
> has added XFA support to pdf.js [3].

Setting aside the general-purpose PDF libraries supporting dynamic PDF content (which is natural, given their general-purposeness), Mozilla's decision is natural (given that opening PDFs in browsers is a norm, and users expect to interact with such PDFs in a normal way), and is orthogonal to how LibreOffice opens these files, so this reference is also unrelated.

> XFA based forms are not going away anytime soon, so ability to at least
> expose the PDF form as an image with our pdfium implementation seems
> reasonable minimal handling.

Again: why? Providing some "minimal" image-like support for a purely dynamic feature seems worse to me than a clear "we do not support it".
Comment 4 V Stuart Foote 2022-08-01 16:18:22 UTC
(In reply to Mike Kaganski from comment #3)
As you note in comment 1, LibreOffice is *not* a PDF viewer or an editor.

For bug 89727 the project implemented a simple insert from PDF as image, for whatever purpose the user may need; we should at least provide the ability to render an XFA based PDF form to a static image.

To me that is a reasonable minimal function, in line with our position that LO is not a PDF viewer or editor. But we should be able to see the content of any PDF, XFA forms included. Bug 114234 is also open for a dialog to manipulate insert/import of multi-page PDFs.

Beyond that, it is dev's choice whether to provide conversion on export to a functional form, which would require a much more complex import. There is no requirement driving LO's ability to do so.
Comment 5 Miklos Vajna 2022-08-02 06:20:01 UTC
pdfium has a feature flag to support XFA, but as Mike says it would be quite some effort to get that working (just to set expectations). The other trouble is that XFA would mean we also bundle the V8 JavaScript engine, which is again quite some maintenance.

My take would be that it's not impossible to do this (after all, Chrome's PDF viewer is somewhat similar to how you can view PDFs in Draw, and there this works), but it's quite hard. It's easier for browsers, which already have a JS engine at hand.
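
For triage purposes, a rough sketch (not LibreOffice code) of how such files could be told apart from ordinary PDFs before the placeholder page is imported. An XFA form is referenced through an /XFA entry in the document's /AcroForm dictionary, so a crude byte scan can flag it. The function name `classify_pdf_form` and its return labels are hypothetical, and the scan assumes the key is not hidden inside a compressed object stream; a real check would go through the PDF library's own parser.

```python
import re

# Assumption: the keys appear in the raw bytes, i.e. the AcroForm dictionary
# is not stored inside a compressed object stream. The lookahead avoids
# matching longer names that merely start with /XFA or /AcroForm.
_XFA_KEY = re.compile(rb"/XFA(?![A-Za-z0-9])")
_ACROFORM_KEY = re.compile(rb"/AcroForm(?![A-Za-z0-9])")


def classify_pdf_form(data: bytes) -> str:
    """Return 'xfa', 'acroform', 'none', or 'not-pdf' (hypothetical labels)."""
    # The %PDF- header is expected near the start of the file.
    if b"%PDF-" not in data[:1024]:
        return "not-pdf"
    if _XFA_KEY.search(data):
        return "xfa"
    if _ACROFORM_KEY.search(data):
        return "acroform"
    return "none"


if __name__ == "__main__":
    import sys

    # Usage: python classify.py some-form.pdf
    with open(sys.argv[1], "rb") as handle:
        print(classify_pdf_form(handle.read()))
```

With something like this, an importer could at least report "XFA form, not supported" instead of silently showing the placeholder text; a result of 'none' is not conclusive because of the compressed-stream caveat above.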