Bug 106651 - PDF/A documents fail validation - ISO 19005-1:2005 violations (see comment #5)
Status: RESOLVED WORKSFORME
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Printing and PDF export
Version (earliest affected): 5.2.3.3 release
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: regression
Depends on:
Blocks: PDF-Export-Invalid
Reported: 2017-03-19 23:38 UTC by Jim Avera
Modified: 2021-08-22 00:56 UTC
CC List: 3 users

See Also:
Crash report or crash signature:


Attachments
Test writer document (9.08 KB, application/vnd.oasis.opendocument.text)
2017-03-19 23:40 UTC, Jim Avera
Complete .xml file sent back by the pdf/a validation service (2.29 KB, application/xml)
2017-03-19 23:41 UTC, Jim Avera
veraPDF failed PDF/A validation report (25.04 KB, text/html)
2017-07-19 23:06 UTC, Moritz

Description Jim Avera 2017-03-19 23:38:48 UTC
Description:
Writer docs exported to PDF/A fail the free validation test at www.validatepdfa.com, which appears to use the validation software sold by Solid Documents.

(This may very well be an error in their validation tests, but somebody who knows this stuff may want to check that it isn't a bona fide LO bug.)

The errors all seem to have something to do with font names not being representable in Unicode. This seems a bit unlikely to me :-) but the error messages are
  "The font 'EAAAAA+ArialMT' violates a condition to be mapped to Unicode"
  (and similarly for every other standard font)

Now that I think about it, maybe the problem is that the encoding of the entire file is being misrecognized by the validation software (or the PDF/A file really is mis-encoded).
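For context: the "EAAAAA+" part of these names is a subset tag. The PDF Reference names an embedded font subset by putting six uppercase letters and a "+" in front of the real font name, so "EAAAAA+ArialMT" is a subset of ArialMT, and clause 6.3.8 (quoted in comment 5) concerns the font's mapping from character codes to Unicode rather than the font name itself. A minimal Python sketch that separates the two parts; `split_subset_name` is a hypothetical helper, not part of LibreOffice or any validator:

```python
import re
from typing import Optional, Tuple

# A subset tag is exactly six uppercase ASCII letters followed by "+".
SUBSET_TAG = re.compile(r"^([A-Z]{6})\+(.+)$")


def split_subset_name(base_font: str) -> Tuple[Optional[str], str]:
    """Split a /BaseFont name into (subset tag, underlying font name).

    The tag is None when the name carries no subset prefix.
    """
    match = SUBSET_TAG.match(base_font)
    if match:
        return match.group(1), match.group(2)
    return None, base_font


if __name__ == "__main__":
    for name in ("EAAAAA+ArialMT", "CAAAAA+TimesNewRomanPS-ItalicMT", "Helvetica"):
        tag, real_name = split_subset_name(name)
        print(f"{name}: tag={tag}, font={real_name}")
```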


Steps to Reproduce:
1. Load the attached Writer doc (probably any will do)
2. File->Export to PDF and select "PDF/A" in the options
3. Send an email to validate@validatepdfa.com with no Subject, the .pdf file attached, and a message body containing the following 5 lines. AFAIK this won't sign you up for spam. But.
/Service (www.validatepdfa.com)
/Subscribe false
/Language /en
/Country /us
/ZipResults false
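Step 3 can also be scripted. A minimal sketch using Python's standard `email` package that builds (but does not send) such a request; the sender address and the helper name `build_validation_request` are placeholders, and whether the service still answers these requests has not been checked:

```python
from email.message import EmailMessage

# The five control lines from step 3, one per line in the message body.
CONTROL_LINES = [
    "/Service (www.validatepdfa.com)",
    "/Subscribe false",
    "/Language /en",
    "/Country /us",
    "/ZipResults false",
]


def build_validation_request(pdf_bytes: bytes, filename: str, sender: str) -> EmailMessage:
    """Build, but do not send, a validation request as described in step 3."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = "validate@validatepdfa.com"
    # Deliberately no Subject header, per the reproduction steps.
    msg.set_content("\n".join(CONTROL_LINES) + "\n")
    # Attaching turns the message into multipart/mixed with the PDF as a part.
    msg.add_attachment(pdf_bytes, maintype="application", subtype="pdf", filename=filename)
    return msg
```

Actually sending it would need something like `smtplib.SMTP(host).send_message(msg)` against your own mail server, where `host` is a placeholder.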



Actual Results:  
The service sends back an email saying the validation failed, and with a .xml attachment giving the details, which include the following:

      <metadata>
        <problem severity="warning" objectID="59" clause=" TN0003" standard="pdfa">Recommended property 'format' for schema 'dc' missing</problem>
      </metadata>
      <fonts>
        <problem severity="fatalError" objectID="13" clause="6.3.8" standard="pdfa">The font 'HAAAAA+CourierNewPSMT' violates a condition to be mapped to Unicode</problem>
        <problem severity="fatalError" objectID="18" clause="6.3.8" standard="pdfa">The font 'CAAAAA+TimesNewRomanPS-ItalicMT' violates a condition to be mapped to Unicode</problem>
        <problem severity="fatalError" objectID="23" clause="6.3.8" standard="pdfa">The font 'BAAAAA+TimesNewRomanPSMT' violates a condition to be mapped to Unicode</problem>
        <problem severity="fatalError" objectID="28" clause="6.3.8" standard="pdfa">The font 'DAAAAA+TimesNewRomanPS-BoldItalicMT' violates a condition to be mapped to Unicode</problem>
        <problem severity="fatalError" objectID="33" clause="6.3.8" standard="pdfa">The font 'EAAAAA+ArialMT' violates a condition to be mapped to Unicode</problem>
        <problem severity="fatalError" objectID="38" clause="6.3.8" standard="pdfa">The font 'FAAAAA+Arial-ItalicMT' violates a condition to be mapped to Unicode</problem>
        <problem severity="fatalError" objectID="43" clause="6.3.8" standard="pdfa">The font 'IAAAAA+CourierNewPS-ItalicMT' violates a condition to be mapped to Unicode</problem>
        <problem severity="fatalError" objectID="48" clause="6.3.8" standard="pdfa">The font 'JAAAAA+CourierNewPS-BoldItalicMT' violates a condition to be mapped to Unicode</problem>
        <problem severity="fatalError" objectID="53" clause="6.3.8" standard="pdfa">The font 'GAAAAA+Arial-BoldItalicMT' violates a condition to be mapped to Unicode</problem>
      </fonts>
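With the complete report (attachment 132019), the fatal font problems can be pulled out mechanically. A sketch with Python's `xml.etree.ElementTree`; the `<report>` root element in the sample is an assumption, since only the `<metadata>` and `<fonts>` fragments are quoted here, and `fatal_font_problems` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Trimmed sample; the <report> root element is an assumption, since only the
# <metadata> and <fonts> fragments of the real file are quoted above.
SAMPLE_REPORT = """<report>
  <metadata>
    <problem severity="warning" objectID="59" clause=" TN0003" standard="pdfa">Recommended property 'format' for schema 'dc' missing</problem>
  </metadata>
  <fonts>
    <problem severity="fatalError" objectID="33" clause="6.3.8" standard="pdfa">The font 'EAAAAA+ArialMT' violates a condition to be mapped to Unicode</problem>
    <problem severity="fatalError" objectID="38" clause="6.3.8" standard="pdfa">The font 'FAAAAA+Arial-ItalicMT' violates a condition to be mapped to Unicode</problem>
  </fonts>
</report>"""


def fatal_font_problems(report_xml: str) -> List[Dict[str, str]]:
    """Return every fatalError <problem> found inside a <fonts> element."""
    root = ET.fromstring(report_xml)
    results = []
    for fonts in root.iter("fonts"):
        for problem in fonts.iter("problem"):
            if problem.get("severity") == "fatalError":
                results.append({
                    "objectID": problem.get("objectID", ""),
                    "clause": problem.get("clause", "").strip(),
                    "message": (problem.text or "").strip(),
                })
    return results


if __name__ == "__main__":
    for p in fatal_font_problems(SAMPLE_REPORT):
        print(f"object {p['objectID']} (clause {p['clause']}): {p['message']}")
```

The metadata warning is skipped on purpose, since only the `<fonts>` entries are fatal here.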


Expected Results:
Joy and Beauty


Reproducible: Always

User Profile Reset: No

Additional Info:


User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0
Comment 1 Jim Avera 2017-03-19 23:40:38 UTC
Created attachment 132018 [details]
Test writer document
Comment 2 Jim Avera 2017-03-19 23:41:27 UTC
Created attachment 132019 [details]
Complete .xml file sent back by the pdf/a validation service
Comment 3 Etienne Desautels 2017-04-04 17:38:20 UTC
I think I have the same bug: when creating a PDF/A and validating it in Acrobat I have this error:
Text cannot be mapped to unicode (154 matches on 2 pages)

And when I copy text from the produced PDF in Preview.app (OS X), all accented characters are missing or wrong. So it looks like the /ToUnicode mapping is not generated correctly for characters made from multiple glyphs.

The problem is also present when exporting to a normal (non-PDF/A) PDF: the Unicode mapping is wrong there too.

For me the problem is present in version 5.2.3.3 but it's not present in versions 5.1.4.2 and 5.1.6.2. So that looks like a regression.
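The copy/paste symptom above is consistent with a missing or wrong /ToUnicode CMap: without it, a viewer cannot turn the glyph codes in the content stream back into text. For illustration only, a minimal Python sketch that builds a ToUnicode CMap (PDF Reference 5.9) from a code-to-text table, assuming 2-byte character codes as used with Identity-H; it is not LibreOffice's export code, `to_unicode_cmap` is a hypothetical helper, and the codes in the example are made up. A single code may map to several Unicode characters (a ligature glyph, say), which the bfchar destination string allows:

```python
from typing import Dict

MAX_BFCHAR_ENTRIES = 100  # a single beginbfchar block may hold at most 100 mappings


def utf16be_hex(text: str) -> str:
    """Unicode text as the UTF-16BE hex string a ToUnicode CMap expects."""
    return text.encode("utf-16-be").hex().upper()


def to_unicode_cmap(mapping: Dict[int, str]) -> str:
    """Build a minimal ToUnicode CMap for 2-byte character codes.

    `mapping` maps each character code used in the content stream to the
    Unicode text it represents; that text may be more than one character.
    """
    lines = [
        "/CIDInit /ProcSet findresource begin",
        "12 dict begin",
        "begincmap",
        "/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def",
        "/CMapName /Adobe-Identity-UCS def",
        "/CMapType 2 def",
        "1 begincodespacerange",
        "<0000> <FFFF>",
        "endcodespacerange",
    ]
    items = sorted(mapping.items())
    # Split the mappings into blocks of at most 100 entries each.
    for start in range(0, len(items), MAX_BFCHAR_ENTRIES):
        chunk = items[start:start + MAX_BFCHAR_ENTRIES]
        lines.append(f"{len(chunk)} beginbfchar")
        for code, text in chunk:
            lines.append(f"<{code:04X}> <{utf16be_hex(text)}>")
        lines.append("endbfchar")
    lines += [
        "endcmap",
        "CMapName currentdict /CMap defineresource pop",
        "end",
        "end",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up glyph codes, purely for illustration.
    print(to_unicode_cmap({0x0003: "A", 0x0010: "\u00e9", 0x0011: "fi"}))
```

Running it prints a CMap whose bfchar entries include `<0011> <00660069>`, i.e. one code mapping to the two characters "fi".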
Comment 4 Buovjaga 2017-04-12 18:20:04 UTC
NEW per comment 3
Comment 5 Moritz 2017-07-19 23:06:10 UTC
Created attachment 134743 [details]
veraPDF failed PDF/A validation report

I can confirm this is still a problem in 5.2.7.2.

But what led me to this ticket was not the proprietary online validation service; it was the validation report from the open-source offline PDF/A validator veraPDF, whose report is very similar. I've attached the full report for the same t.odt attached earlier in this ticket.

> Rule: Specification: ISO 19005-1:2005, Clause: 6.3.8, Test number: 1	
> 
> The font dictionary shall include a ToUnicode entry whose value is a CMap 
> stream object that maps character codes to Unicode values, as described in 
> PDF Reference 5.9, unless the font meets any of the following three 
> conditions: (*) fonts that use the predefined encodings MacRomanEncoding, 
> MacExpertEncoding or WinAnsiEncoding, or that use the predefined Identity-H 
> or Identity-V CMaps; (*) Type 1 fonts whose character names are taken from 
> the Adobe standard Latin character set or the set of named characters in the 
> Symbol font, as defined in PDF Reference Appendix D; (*) Type 0 fonts whose
> descendant CIDFont uses the Adobe-GB1, Adobe-CNS1, Adobe-Japan1 or
> Adobe-Korea1 character collections.
> 
> Failed
> 21 occurrences
> Glyph	
> toUnicode != null	
> root/document[0]/pages[0](1 0 obj PDPage)/contentStream[0](2 0 obj PDContentStream)/operators[73]/usedGlyphs[13](HAAAAA+CourierNewPSMT 10 0)
> root/document[0]/pages[0](1 0 obj PDPage)/contentStream[0](2 0 obj PDContentStream)/operators[83]/usedGlyphs[5](IAAAAA+CourierNewPS-ItalicMT 5 0)
> root/document[0]/pages[0](1 0 obj PDPage)/contentStream[0](2 0 obj PDContentStream)/operators[13]/usedGlyphs[4](BAAAAA+TimesNewRomanPSMT 5 0)
> [...]
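For anyone triaging similar reports, the exemptions in the quoted rule can be read as a small decision procedure. Below is a minimal Python sketch of a literal reading of that text, assuming a simplified dict in place of a real PDF font dictionary; the function name and the `standard_glyph_names` key are hypothetical. It is not veraPDF's implementation: veraPDF evaluates the check per used glyph, as the `usedGlyphs` paths in the report show.

```python
from typing import Any, Dict

PREDEFINED_SIMPLE_ENCODINGS = {"MacRomanEncoding", "MacExpertEncoding", "WinAnsiEncoding"}
IDENTITY_CMAPS = {"Identity-H", "Identity-V"}
EXEMPT_ADOBE_ORDERINGS = {"GB1", "CNS1", "Japan1", "Korea1"}


def violates_clause_6_3_8(font: Dict[str, Any]) -> bool:
    """Literal reading of the clause 6.3.8 text quoted above.

    `font` is a simplified dict standing in for a PDF font dictionary.
    Returns True when the font has no /ToUnicode entry and none of the
    three exemptions applies.
    """
    if font.get("ToUnicode") is not None:
        return False
    encoding = font.get("Encoding")
    # Exemption 1: predefined simple encodings, or Identity-H / Identity-V CMaps.
    # Only name-valued encodings are considered; dictionary encodings fall through.
    if isinstance(encoding, str) and (
        encoding in PREDEFINED_SIMPLE_ENCODINGS or encoding in IDENTITY_CMAPS
    ):
        return False
    # Exemption 2: Type 1 fonts whose glyph names come from the Adobe standard
    # Latin or Symbol sets. Checking that needs the Appendix D tables, so it
    # is modelled here as a hypothetical precomputed flag.
    if font.get("Subtype") == "Type1" and font.get("standard_glyph_names", False):
        return False
    # Exemption 3: Type 0 fonts whose descendant CIDFont uses the Adobe-GB1,
    # Adobe-CNS1, Adobe-Japan1 or Adobe-Korea1 character collection.
    if font.get("Subtype") == "Type0":
        for cid_font in font.get("DescendantFonts", []):
            info = cid_font.get("CIDSystemInfo", {})
            if info.get("Registry") == "Adobe" and info.get("Ordering") in EXEMPT_ADOBE_ORDERINGS:
                return False
    return True


if __name__ == "__main__":
    # Made-up dictionary, not taken from the attached PDF.
    example = {"Subtype": "TrueType", "BaseFont": "HAAAAA+CourierNewPSMT"}
    print(violates_clause_6_3_8(example))
```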
Comment 6 Buovjaga 2018-05-29 17:09:07 UTC
Jim, Moritz: can you re-test? Already with 5.3.0 I get a correct result with profile PDF/A-1a in http://demo.verapdf.org/
Comment 7 Moritz 2018-05-29 18:24:59 UTC
(In reply to Buovjaga from comment #6)
> Jim, Moritz: can you re-test? Already with 5.3.0 I get a correct result with
> profile PDF/A-1a in http://demo.verapdf.org/

Cool. I can confirm: I just exported the t.odt from this ticket to PDF/A in LibreOffice 6.0.4.1, and with the veraPDF validator 1.12.1 it now passes validation without any errors. I also tried validatepdfa.com, but I have not received a reply yet.
Comment 8 Buovjaga 2018-06-02 17:49:05 UTC
Let's close as WFM.