Bug 160196 - Combined "Hybrid PDF" and "Archival PDF" options generate non-conformant PDF/A files
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version (earliest affected): 24.2.1.2 release
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard: target:25.8.0
Keywords:
Depends on:
Blocks: PDF-Export
 
Reported: 2024-03-14 06:55 UTC by peter.wyatt
Modified: 2025-01-22 14:33 UTC (History)
CC List: 2 users

See Also:
Crash report or crash signature:


Attachments
Invalid PDF/A-2 file with file attachment (Hybrid PDF w/ ODF) (38.20 KB, application/pdf)
2024-03-14 23:43 UTC, peter.wyatt
From Dev build: incorrect PDF/UA-1 conformance in PDF 2.0 file (68.99 KB, application/pdf)
2025-01-20 02:33 UTC, peter.wyatt

Description peter.wyatt 2024-03-14 06:55:46 UTC
LibreOffice 24.2 release allows a user to check both(!) the "Hybrid PDF (embed ODF file)" and "Archival (PDF/A, ISO 19005)" options. However, the resultant PDF can NEVER BE COMPLIANT with PDF/A-1 or PDF/A-2 at any conformance level, as an embedded ODF file is not permitted by either of these older PDF/A standards (PDF/A-1 forbids embedded files entirely, and PDF/A-2 only permits embedded files that are themselves PDF/A).

You need to use PDF/A-3 (ISO 19005-3:2012) - quoting introduction of PDF/A-3:

"This part of ISO 19005 adds a new goal (beyond that of ISO 19005-2) which is to enable PDF documents to serve as containers for other file formats, so that a single physical file can contain not only the visual representation but also other representations including the original authored version, richer semantic formats, and others. This part of ISO 19005 does not address the long-term suitability of formats, that may be embedded, other than those compliant with any part of this International Standard." - see https://www.iso.org/obp/ui/en/#iso:std:iso:19005:-3:ed-1:v1:en

Without checking the details of every possible PDF output that LibreOffice might generate, it is probably safe (or at least very very close) to simply change the XMP metadata conformance information to be PDF/A-3 instead of PDF/A-2 for this situation. Of course, you should validate the generated PDFs with a tool such as veraPDF to check conformance.

PS. When you look at PDF/A-3, also read Annex E "Associated Files": since the embedded ODF file is the "source" of the PDF export, you should additionally add an AF entry to the document catalog that references the ODF file, with an AFRelationship key value of Source. Associated Files were also directly adopted into PDF 2.0 (ISO 32000-2) to facilitate open data transfer and are being used by the likes of LaTeX for MathML, etc. A few PDF readers are also now supporting Associated Files in their attachment panes.
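
As a rough illustration of the mismatch described in this report, the Python sketch below reads the declared PDF/A part from an XMP packet and flags the Hybrid PDF conflict. The helper names and the sample XMP fragment are hypothetical and are not taken from LibreOffice. It only inspects the declared conformance, so a validator such as veraPDF is still needed to check actual compliance.

```python
import re

# pdfaid:part may be serialized as an element (<pdfaid:part>2</pdfaid:part>)
# or as an attribute (pdfaid:part="2"); this pattern accepts either form.
_PART_RE = re.compile(r"""pdfaid:part(?:>\s*|\s*=\s*["'])(\d)""")


def declared_pdfa_part(xmp):
    """Return the declared PDF/A part as an int, or None if absent."""
    match = _PART_RE.search(xmp)
    return int(match.group(1)) if match else None


def hybrid_odf_conflict(xmp, has_embedded_odf):
    """True when an embedded ODF file is combined with PDF/A-1 or PDF/A-2."""
    return has_embedded_odf and declared_pdfa_part(xmp) in (1, 2)


# Hypothetical XMP fragment, for illustration only.
SAMPLE_XMP = """
<rdf:Description xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
  <pdfaid:part>2</pdfaid:part>
  <pdfaid:conformance>B</pdfaid:conformance>
</rdf:Description>
"""

print(declared_pdfa_part(SAMPLE_XMP))         # 2
print(hybrid_odf_conflict(SAMPLE_XMP, True))  # True
```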
Comment 1 peter.wyatt 2024-03-14 23:43:53 UTC
Created attachment 193117 [details]
Invalid PDF/A-2 file with file attachment (Hybrid PDF w/ ODF)

Example invalid PDF/A-2 that contains the ODF file attachment
Comment 2 Tomaz Vajngerl 2024-04-04 10:49:32 UTC
Good point - thanks!
Comment 3 peter.wyatt 2024-04-05 02:27:29 UTC
If you want to pass me some sample corrected PDFs, I'm more than happy to check them for you for both ISO subset compliance as well as correctness against the PDF specifications.
Comment 4 Commit Notification 2024-12-26 06:52:14 UTC
Tomaž Vajngerl committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/5364b7b663b2be30d802474618dea3db14a2182a

tdf#160196 Embedded files are not allowed in PDF/A-1 and PDF/A-2

It will be available in 25.8.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 5 Tomaz Vajngerl 2025-01-13 06:26:45 UTC
This is fixed in master...
Comment 6 peter.wyatt 2025-01-20 02:32:28 UTC
Thanks for your efforts!

I'm using the LibreOffice-Dev release to check things and noted the following:

* a PDF saved with options "PDF 2.0 base" and PDF/UA and Hybrid ODF is correctly identified as PDF/A-4f but has PDF/UA-1 instead of PDF/UA-2 conformance. PDF/UA-1 only applies to PDF 1.x files as PDF 2.0 files have to use PDF/UA-2. See attachment "pdfa-4f+ua+odf.pdf"

I tried testing with VeraPDF/Rest Docker container but this also has a bug (see https://github.com/veraPDF/veraPDF-library/issues/1501). Please note that the current veraPDF implementation only tests against a single declared PDF ISO subset so you will need to manually select each conformance level separately.
Comment 7 peter.wyatt 2025-01-20 02:33:23 UTC
Created attachment 198617 [details]
From Dev build: incorrect PDF/UA-1 conformance in PDF 2.0 file
Comment 8 peter.wyatt 2025-01-20 02:55:38 UTC
More:

* PDF/A-3 files that include "Hybrid ODF" also need to use Associated Files. ISO 19005-3 was the first place that Associated Files was documented before it was included in PDF 2.0 - so do what you do for PDF 2.0 (i.e. add AFRelationship=Source for the ODF to the file spec dictionary) and all will be good!

You can use veraPDF to confirm this.
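
To make the shape of the requested objects concrete, here is a minimal Python sketch that prints a file specification dictionary carrying AFRelationship=Source, plus a catalog AF entry pointing at it. The object numbers, file name and function names are made up for illustration. This is not how LibreOffice writes its output, and it omits the other entries a conforming file needs on the embedded file stream.

```python
def filespec_object(obj_num, stream_obj_num, filename, relationship="Source"):
    """Render a file specification dictionary as PDF syntax text.

    Assumes `filename` contains no characters that need escaping inside
    a PDF literal string (parentheses, backslashes).
    """
    return (
        f"{obj_num} 0 obj\n"
        f"<< /Type /Filespec\n"
        f"   /F ({filename})\n"
        f"   /UF ({filename})\n"
        f"   /EF << /F {stream_obj_num} 0 R >>\n"
        f"   /AFRelationship /{relationship}\n"
        f">>\n"
        f"endobj"
    )


def catalog_af_entry(filespec_obj_nums):
    """Render the /AF array that the document catalog would carry."""
    refs = " ".join(f"{n} 0 R" for n in filespec_obj_nums)
    return f"/AF [ {refs} ]"


# Hypothetical object numbers: 20 is the file spec, 21 the embedded stream.
print(filespec_object(20, 21, "Original.odt"))
print(catalog_af_entry([20]))
```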
Comment 9 Tomaz Vajngerl 2025-01-22 06:25:55 UTC
Hi Peter, 

I have fixed most of what you reported - just need to get them into master.

One thing that is left for PDF/UA-2 (when using an empty document as source) according to VeraPDF is:
<rule specification="ISO 14289-2:2024" clause="8.8" testNumber="1" status="failed" failedChecks="1" tags="syntax"> 
    <description>All destinations whose target lies within the current document shall be structure destinations</description>
    <object>PDDestination</object>
    <test>isStructDestination == true</test>
    <check status="failed">
        <context>root/document[0]/OpenActionDestination[0]</context>
        <errorMessage>Destination in Outline item, OpenAction or Link annotation is not a structure destination</errorMessage>
    </check>
</rule>

Do you know what this is referring to? (I don't have the specs for PDF/UA-2)
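
For anyone triaging reports like the one above, this small Python sketch pulls the failed checks out of veraPDF rule elements. It assumes the report uses un-namespaced element names exactly as quoted in this comment, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# The rule element quoted in this comment, reproduced as sample input.
RULE_XML = """<rule specification="ISO 14289-2:2024" clause="8.8" testNumber="1" status="failed" failedChecks="1" tags="syntax">
    <description>All destinations whose target lies within the current document shall be structure destinations</description>
    <object>PDDestination</object>
    <test>isStructDestination == true</test>
    <check status="failed">
        <context>root/document[0]/OpenActionDestination[0]</context>
        <errorMessage>Destination in Outline item, OpenAction or Link annotation is not a structure destination</errorMessage>
    </check>
</rule>"""


def failed_checks(xml_text):
    """Return (specification, clause, context) for every failed check."""
    root = ET.fromstring(xml_text)
    results = []
    # Element.iter() also yields the root itself, so this works whether
    # the input is a single <rule> or a larger report containing rules.
    for rule in root.iter("rule"):
        if rule.get("status") != "failed":
            continue
        for check in rule.iter("check"):
            if check.get("status") == "failed":
                results.append(
                    (
                        rule.get("specification"),
                        rule.get("clause"),
                        check.findtext("context"),
                    )
                )
    return results


for spec, clause, context in failed_checks(RULE_XML):
    print(f"{spec} clause {clause}: {context}")
```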
Comment 10 Commit Notification 2025-01-22 09:12:34 UTC
Tomaž Vajngerl committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/d23c5560c13fc10ef367ae1d3bbf6c790dde2a61

tdf#160196 add support for PDF/UA-2 and NS for struct. elem.

It will be available in 25.8.0.

Comment 11 peter.wyatt 2025-01-22 12:34:32 UTC
Hi Tomaz,
Structure Destinations (SDs) are defined in clause 12.3.2.3 of ISO 32000-2:2020 (no cost edition). They look like standard Destinations but reference a structure tree element rather than a page object.

PDF/UA-2 is a licensed ISO publication so I cannot share or quote it as you know, but if you look at the PDF Association's free "Well-Tagged PDF" Specification (https://pdfa.org/wtpdf) and follow its guidance you will be 100% compliant to PDF/UA-2 in all aspects. Helpfully, the WTPDF clauses use the same numbers as PDF/UA-2 so just look at WTPDF clause 8.8 for SDs...

BTW I did find other issues with my testing of LibreOffice-Dev concerning the combination of PDF/A-1 and PDF/UA-1. This is a very interesting combination as PDF/A-1 was written against Adobe PDF 1.4 while PDF/UA-1 was written against ISO 32000-1:2008 (PDF 1.7) - this means some PDF/UA requirements cannot be met with a PDF 1.4 feature set (e.g. the Tabs entry; certain structure elements; etc as they were all introduced from PDF 1.5 to PDF 1.7). PDF/A-2 and PDF/A-3 are both also written against ISO 32000-1:2008 so either of these will be OK. 

I'm seeking guidance and recommendations from our community of experts, but I think(!!) our recommendation will be that PDF/UA-1 is only compatible with PDF/A-2 or PDF/A-3 and not PDF/A-1 - and that such dual-conforming files should always be marked as PDF 1.7 (unless you really want to write a lot of code to jump thro' hoops and work out precise versioning - which I would strongly recommend against). Technically PDF/A-1 is superseded by PDF/A-2 and PDF/A-3, which are equivalently capable - there is NO technical reason to use PDF/A-1 anymore.

When there is a formal answer available I will post back here but for now please consider the above (i.e. don't allow PDF/A-1 and PDF/UA-1 combination - or force it to PDF/A-2 or PDF/A-3 whenever PDF/UA-1 is used). Similarly PDF/UA-1 cannot be used with PDF 2.0 files - that is what PDF/UA-2 is for.
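
The combinations discussed in this thread can be summarized as a small option check. The Python sketch below is only an illustration: the function and parameter names are hypothetical, it does not reflect LibreOffice's export code, and the PDF/A-1 plus PDF/UA-1 rule encodes the tentative recommendation above rather than a formal answer.

```python
def check_export_options(pdf_version, pdfa_part=None, pdfua_part=None,
                         hybrid_odf=False):
    """Return a list of problems for a requested export combination.

    pdf_version: base PDF version as a string, e.g. "1.7" or "2.0".
    pdfa_part / pdfua_part: requested ISO part numbers, or None.
    hybrid_odf: True when the ODF source is embedded in the PDF.
    """
    problems = []
    if hybrid_odf and pdfa_part in (1, 2):
        problems.append(
            "an embedded ODF file is not permitted in PDF/A-1 or PDF/A-2; "
            "use PDF/A-3"
        )
    if pdfua_part == 1 and pdf_version.startswith("2."):
        problems.append(
            "PDF/UA-1 only applies to PDF 1.x; PDF 2.0 files use PDF/UA-2"
        )
    if pdfua_part == 1 and pdfa_part == 1:
        # Tentative: pending formal guidance mentioned in the comment.
        problems.append(
            "PDF/UA-1 with PDF/A-1 is discouraged; use PDF/A-2 or PDF/A-3"
        )
    return problems


for problem in check_export_options("1.4", pdfa_part=1, pdfua_part=1,
                                    hybrid_odf=True):
    print(problem)
```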
Comment 12 Commit Notification 2025-01-22 14:33:46 UTC
Tomaž Vajngerl committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/31b601d8fc2afa73fb0afd9c1d58ad488aa038cf

tdf#160196 add /AF entry to catalog, enable assoc. files in PDF/A-3

It will be available in 25.8.0.
