Bug 157517 - possible regression in PDF/UA export: PDF/UA identifier missing when PDF/A also used
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Printing and PDF export
Version (earliest affected): 7.6.0.2 rc
Hardware: x86-64 (AMD64) All
Importance: medium normal
Assignee: Michael Stahl (allotropia)
URL:
Whiteboard: target:24.2.0 target:7.6.3
Keywords: accessibility, bibisected, bisected, regression
Depends on:
Blocks: PDF-Accessibility
Reported: 2023-09-29 17:30 UTC by (hede)
Modified: 2023-10-18 20:27 UTC
CC List: 5 users

See Also:
Crash report or crash signature:


Attachments
demo file (11.37 KB, application/vnd.oasis.opendocument.text)
2023-09-29 17:31 UTC, (hede)
PDF/UA generated by LibreOffice 7.6.2.1 (25.58 KB, application/pdf)
2023-09-29 17:32 UTC, (hede)
PDF/UA generated by LibreOffice 7.5.6.2 (26.40 KB, application/pdf)
2023-09-29 17:32 UTC, (hede)
PAC screenshot failing PDF (76.84 KB, image/png)
2023-09-29 17:32 UTC, (hede)
PAC screenshot with successful PDF (46.06 KB, image/png)
2023-09-29 17:33 UTC, (hede)
PDF/UA Reference Suite example - complies to both PDF/A and PDF/UA-1 (61.04 KB, application/pdf)
2023-10-16 20:58 UTC, peter.wyatt

Description (hede) 2023-09-29 17:30:38 UTC
Description:
I have documents that pass the PDF Accessibility Checker (PAC) 2021 when exported with LibreOffice 7.5.6.2 but fail when exported with LibreOffice 7.6.2.1. The error is:
metadata -> PDF/UA identifier missing

Same export options and same document; LibreOffice was uninstalled and reinstalled on the same computer.

Steps to Reproduce:
1. Create a simple odt document (sourefile.odt)
2. export it as PDF/UA with LibreOffice 7.5.6.2 (pac-OK-7562.pdf)
3. export it as PDF/UA with LibreOffice 7.6.2.1 (pac-NOK-7621.pdf)


Actual Results:
1. Check the PDF generated by LibreOffice 7.5.6.2 with PAC 2021: it passes (pac-with-7562.png)
2. Check the PDF generated by LibreOffice 7.6.2.1 with PAC 2021: it fails with "PDF/UA identifier missing" (pac-with-7621-metadata.png)

Expected Results:
The PDF/UA file generated by LibreOffice 7.6.2.1 should also pass the PAC 2021 check.


Reproducible: Always


User Profile Reset: No

Additional Info:
I will attach (if I'm allowed to) demo files and screenshots from PAC
Comment 1 (hede) 2023-09-29 17:31:26 UTC
Created attachment 189888
demo file
Comment 2 (hede) 2023-09-29 17:32:03 UTC
Created attachment 189889
PDF/UA generated by LibreOffice 7.6.2.1
Comment 3 (hede) 2023-09-29 17:32:24 UTC
Created attachment 189890
PDF/UA generated by LibreOffice 7.5.6.2
Comment 4 (hede) 2023-09-29 17:32:54 UTC
Created attachment 189891
PAC screenshot failing PDF
Comment 5 (hede) 2023-09-29 17:33:52 UTC
Created attachment 189892
PAC screenshot with successful PDF
Comment 6 raal 2023-10-11 14:55:11 UTC
If someone knows of an accessibility checker that runs on Linux, I can bisect. Marking as regression.
Comment 7 Stéphane Guillou (stragu) 2023-10-13 21:30:09 UTC
No repro in:

Version: 7.6.3.0.0+ (X86_64) / LibreOffice Community
Build ID: ba808a28f5ea365eaf8fe5d9c7c91b417633d75f
CPU threads: 8; OS: Linux 5.15; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
Calc: threaded

When I check your attachment 189889 with https://demo.verapdf.org/ using the PDF/UA-1 profile, it reports that the following rule failed:

Specification: ISO 14289-1:2014, Clause: 5, Test number: 1
Rule: The PDF/UA version and conformance level of a file shall be specified using the PDF/UA Identification extension schema.
Result: Failed (1 occurrence)
https://github.com/veraPDF/veraPDF-validation-profiles/wiki/PDFUA-Part-1-rules#rule-5-1

But my own PDF/UA export of attachment 189888 does not fail this rule.

Can you please:
- share the PDF options used in the export dialog
- test a daily build of 7.6.3: https://dev-builds.libreoffice.org/daily/libreoffice-7-6/
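The veraPDF rule quoted above is about the PDF/UA identification property `pdfuaid:part` in the document's XMP metadata. A minimal Python sketch of that check follows, assuming the XMP packet has already been extracted from the PDF as a string (extraction is not shown). The function names are hypothetical, and this is not how veraPDF or PAC implement the rule.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PDFUAID_NS = "http://www.aiim.org/pdfua/ns/id/"


def pdfua_part(xmp: str):
    """Return the pdfuaid:part value declared in an XMP packet, or None (hypothetical helper)."""
    root = ET.fromstring(xmp)
    key = f"{{{PDFUAID_NS}}}part"  # ElementTree's '{namespace}name' notation
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        # Attribute form: <rdf:Description pdfuaid:part="1"/>
        if key in desc.attrib:
            return desc.attrib[key].strip()
        # Element form: <pdfuaid:part>1</pdfuaid:part>
        child = desc.find(key)
        if child is not None and child.text:
            return child.text.strip()
    return None


def has_pdfua_identifier(xmp: str) -> bool:
    """True if the packet identifies the file as PDF/UA-1 (part 1)."""
    return pdfua_part(xmp) == "1"
```

XMP lets a simple property appear either as an attribute of `rdf:Description` or as a child element, so the sketch checks both forms; a packet carrying only `pdfaid:*` properties yields `None`.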
Comment 8 (hede) 2023-10-14 07:48:52 UTC
PDF exported with the following export options:
- PDF/A-3b
- PDF/UA
- export placeholders

I think it's the PDF/A option: without it, the PAC 2021 tests succeed.

I've checked two additional LibreOffice versions:
- Version 7.6.2.1 in Arch Linux
- current daily build for Windows: https://dev-builds.libreoffice.org/daily/libreoffice-7-6/Win-x86_64@tb77-TDF/2023-10-13_05.41.16/LibreOfficeDev_7.6.3.0.0_Win_x86-64.msi

Both still fail the PAC 2021 test for me with these options.

It still seems to be a regression, since the PDF/A + PDF/UA export succeeds with LibreOffice 7.5.6.2.
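To reproduce without the export dialog, a headless conversion with the same three options might look like the Python sketch below. It assumes the JSON-style filter options accepted by `soffice --convert-to` in recent LibreOffice versions (7.4 or later) and that `SelectPdfVersion` value 3 selects PDF/A-3b; both should be checked against the PDF export command-line help for the version under test. `pdf_export_argv` is a hypothetical helper.

```python
import json


def pdf_export_argv(source: str, outdir: str = ".") -> list:
    """Build an soffice command line for PDF/A-3b + PDF/UA export (hypothetical helper)."""
    # Assumption: JSON filter options for --convert-to (LibreOffice 7.4 or later).
    options = {
        "SelectPdfVersion": {"type": "long", "value": "3"},  # assumed: 3 = PDF/A-3b
        "PDFUACompliance": {"type": "boolean", "value": "true"},
        "ExportPlaceholders": {"type": "boolean", "value": "true"},
    }
    filter_spec = "pdf:writer_pdf_Export:" + json.dumps(options, separators=(",", ":"))
    return ["soffice", "--headless", "--convert-to", filter_spec, "--outdir", outdir, source]
```

The list can be passed to `subprocess.run(pdf_export_argv("sourefile.odt"), check=True)` on a machine with `soffice` on the PATH; building an argument list rather than a shell string avoids quoting problems with the JSON.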
Comment 9 Stéphane Guillou (stragu) 2023-10-14 23:41:33 UTC
Yes, I can now reproduce if I use both PDF/A-3b and PDF/UA.

Bibisected with linux-64-7.6 repo to first bad commit 3543a32a8872c7b77b978aa7c20f8f9a49af9061 which points to core commit c4b12d06698402984b3ffdbd2c139f261fa35ca1 which is a cherry-pick of:

commit 41717420af68994c2fde522ea86db6e5ed643034
author     Michael Stahl, Fri Jul 07 16:43:45 2023 +0200
committer  Michael Stahl, Fri Jul 07 18:32:30 2023 +0200
tdf#153472 vcl: PDF/A export: produce valid XMP metadata
Reviewed-on: https://gerrit.libreoffice.org/c/core/+/154169

Michael, I assume this change was unintended. Can you please have a look?
Comment 10 Michael Stahl (allotropia) 2023-10-16 11:18:22 UTC
So, as the commit message indicates, veraPDF reports when validating PDF/A that it is invalid to add the PDF/UA tag to the metadata.

PAC, on the other hand, insists that the PDF/UA tag must be present in a PDF/UA document.

What tags should a document that is both PDF/A and PDF/UA have in its XMP metadata?
I don't know (and currently I don't have a copy of ISO 19005...)
Comment 11 peter.wyatt 2023-10-16 20:56:40 UTC
I asked the VeraPDF team and this is what they said:

* Attachment 189889 has PDF/A identification XMP, but no PDF/UA identification XMP => passes PDF/A validation (veraPDF), but doesn’t pass PDF/UA validation (both veraPDF and PAC2021)

* Attachment 189890 has both PDF/A and PDF/UA identification XMPs, BUT has no extension schema for PDF/UA identification => fails PDF/A validation because it uses the metadata property “pdfuaid:part”, which is defined neither in the XMP specifications nor in any embedded extension schema. Passes PDF/UA validation (both veraPDF and PAC2021).

The main issue is that with some combination of version and settings LibreOffice adds both PDF/A and PDF/UA identification XMPs but fails to add the extension schema for pdfuaid properties required for PDF/A compliance. 

I added an attachment, one of the files from the PDF/UA Reference Suite, which complies with both PDF/A-2A and PDF/UA-1 and, in particular, contains the necessary XMP extension schema.
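The three cases described above can be expressed as a small consistency check on an XMP packet. This Python sketch is a simplification, not the veraPDF algorithm: it only asks whether any `pdfuaid` property is used and, for a PDF/A packet, whether the `pdfuaid` namespace is declared through a `pdfaSchema:namespaceURI` entry of an extension schema. Function names and problem strings are hypothetical.

```python
import xml.etree.ElementTree as ET

PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"
PDFUAID_NS = "http://www.aiim.org/pdfua/ns/id/"
PDFA_SCHEMA_NS = "http://www.aiim.org/pdfa/ns/schema#"


def uses_namespace(root, ns: str) -> bool:
    """True if any element or attribute in the tree belongs to namespace ns."""
    prefix = f"{{{ns}}}"
    for el in root.iter():
        if el.tag.startswith(prefix) or any(a.startswith(prefix) for a in el.attrib):
            return True
    return False


def declared_extension_namespaces(root) -> set:
    """Namespace URIs declared via pdfaSchema:namespaceURI in extension schemas."""
    return {
        el.text.strip()
        for el in root.iter(f"{{{PDFA_SCHEMA_NS}}}namespaceURI")
        if el.text
    }


def xmp_problems(xmp: str) -> list:
    """List the identification problems described in this bug for one packet (sketch)."""
    root = ET.fromstring(xmp)
    problems = []
    uses_ua = uses_namespace(root, PDFUAID_NS)
    if not uses_ua:
        # Like attachment 189889: PDF/UA validation fails.
        problems.append("PDF/UA: pdfuaid identification missing")
    elif uses_namespace(root, PDFAID_NS) and PDFUAID_NS not in declared_extension_namespaces(root):
        # Like attachment 189890: PDF/A validation fails.
        problems.append("PDF/A: pdfuaid used without an extension schema")
    return problems
```

A packet shaped like the reference-suite file (both identifications plus an extension schema for `pdfuaid`) produces an empty list.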
Comment 12 peter.wyatt 2023-10-16 20:58:32 UTC
Created attachment 190246
PDF/UA Reference Suite example - complies to both PDF/A and PDF/UA-1

Comment 13 Michael Stahl (allotropia) 2023-10-17 07:45:06 UTC
Thank you very much, Peter!

This extension schema appears to be static XML that can simply be inserted into the XMP.
Comment 14 peter.wyatt 2023-10-17 20:11:41 UTC
Correct: this is a static addition for the combination of PDF/UA with PDF/A. If you add further metadata, you also need to add the corresponding extension schemas to comply with PDF/A requirements.
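A sketch of that static addition, written in Python with `xml.etree.ElementTree` rather than in LibreOffice's C++ (this is not the actual patch): it appends one `rdf:Description` carrying a `pdfaExtension:schemas` entry for the `pdfuaid` namespace, unless an extension schema already declares it. Only the `part` property is declared here; a complete schema may declare further `pdfuaid` properties. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EXT_NS = "http://www.aiim.org/pdfa/ns/extension/"
SCHEMA_NS = "http://www.aiim.org/pdfa/ns/schema#"
PROP_NS = "http://www.aiim.org/pdfa/ns/property#"
PDFUAID_NS = "http://www.aiim.org/pdfua/ns/id/"

# Keep the conventional prefixes when the tree is serialized again.
for _prefix, _uri in [
    ("x", "adobe:ns:meta/"),
    ("rdf", RDF_NS),
    ("pdfaExtension", EXT_NS),
    ("pdfaSchema", SCHEMA_NS),
    ("pdfaProperty", PROP_NS),
    ("pdfuaid", PDFUAID_NS),
    ("pdfaid", "http://www.aiim.org/pdfa/ns/id/"),
]:
    ET.register_namespace(_prefix, _uri)


def q(ns: str, name: str) -> str:
    """Clark notation '{namespace}name' as used by ElementTree."""
    return f"{{{ns}}}{name}"


def add_pdfua_extension_schema(xmp: str) -> str:
    """Return xmp with a PDF/A extension schema for pdfuaid appended (hypothetical helper)."""
    root = ET.fromstring(xmp)
    # Nothing to do if some extension schema already declares the namespace.
    for el in root.iter(q(SCHEMA_NS, "namespaceURI")):
        if (el.text or "").strip() == PDFUAID_NS:
            return xmp
    rdf = next(root.iter(q(RDF_NS, "RDF")), None)
    if rdf is None:
        raise ValueError("XMP packet has no rdf:RDF element")

    desc = ET.SubElement(rdf, q(RDF_NS, "Description"), {q(RDF_NS, "about"): ""})
    bag = ET.SubElement(ET.SubElement(desc, q(EXT_NS, "schemas")), q(RDF_NS, "Bag"))
    schema = ET.SubElement(bag, q(RDF_NS, "li"), {q(RDF_NS, "parseType"): "Resource"})
    ET.SubElement(schema, q(SCHEMA_NS, "namespaceURI")).text = PDFUAID_NS
    ET.SubElement(schema, q(SCHEMA_NS, "prefix")).text = "pdfuaid"
    ET.SubElement(schema, q(SCHEMA_NS, "schema")).text = "PDF/UA identification schema"
    seq = ET.SubElement(ET.SubElement(schema, q(SCHEMA_NS, "property")), q(RDF_NS, "Seq"))
    part = ET.SubElement(seq, q(RDF_NS, "li"), {q(RDF_NS, "parseType"): "Resource"})
    # Describe the single property pdfuaid:part used by the identification.
    for field, value in [
        ("name", "part"),
        ("valueType", "Integer"),
        ("category", "internal"),
        ("description", "PDF/UA version identifier"),
    ]:
        ET.SubElement(part, q(PROP_NS, field)).text = value
    return ET.tostring(root, encoding="unicode")
```

Because the schema does not depend on the document, it can be emitted verbatim whenever both PDF/A and PDF/UA are selected; the early return makes the function safe to call twice on the same packet.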
Comment 15 Commit Notification 2023-10-18 14:58:39 UTC
Michael Stahl committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/a4971aab4d57bf9177c55c3fb0e163e0db7c48fd

tdf#157517 vcl: PDF/UA export: add PDF/A extension schema to XMP

It will be available in 24.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 16 Michael Stahl (allotropia) 2023-10-18 14:59:46 UTC
fixed on master
Comment 17 Commit Notification 2023-10-18 20:27:14 UTC
Michael Stahl committed a patch related to this issue.
It has been pushed to "libreoffice-7-6":

https://git.libreoffice.org/core/commit/72163db77aa1c160a63e64d5637de4a383966c76

tdf#157517 vcl: PDF/UA export: add PDF/A extension schema to XMP

It will be available in 7.6.3.
