Bug 66580 - exported PDF is invalid because of forbidden custom keys in the trailer
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Printing and PDF export
Version (earliest affected): Inherited From OOo
Hardware: Other All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: filter:pdf
Depends on:
Blocks: PDF-Export
Reported: 2013-07-04 09:39 UTC by Jos van den Oever
Modified: 2020-01-03 10:12 UTC (History)
CC List: 8 users

See Also:
Crash report or crash signature:


Description Jos van den Oever 2013-07-04 09:39:34 UTC
PDF exported by LibreOffice contains the key /DocChecksum in the trailer dictionary. When an ODF document is embedded, it also contains the key /AdditionalStreams.

These keys are not defined in the PDF 1.4 specification. The specification forbids use of custom keys in the trailer:

===
A PDF producer or Acrobat plug-in extension may also add keys to any PDF
object that is implemented as a dictionary, except the file trailer dictionary (see Section 3.4.4, “File Trailer”). In addition, a PDF producer or Acrobat plug-in may create tags that indicate the role of marked-content operators (PDF 1.2), as described in Section 9.5, “Marked Content.”
===

A strict PDF validator would declare PDF documents saved by LibreOffice invalid.
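
A minimal sketch of the check such a validator might perform, assuming a classic (uncompressed) `trailer` section rather than a PDF 1.5 cross-reference stream. The function names and the sample bytes are hypothetical and only illustrate the idea; the sample is not actual LibreOffice output. The allowed set is the list of trailer entries defined in PDF 1.4 (Size, Prev, Root, Encrypt, Info, ID).

```python
import re

# Trailer dictionary entries defined by the PDF 1.4 specification.
STANDARD_TRAILER_KEYS = {"Size", "Prev", "Root", "Encrypt", "Info", "ID"}

# Very small tokenizer: delimiters, names, hex strings, simple literal
# strings, and everything else (numbers, "R", keywords).
_TOKEN = re.compile(
    rb"<<|>>|\[|\]|/[^\s<>\[\]()/%]*|<[0-9A-Fa-f\s]*>|\([^()]*\)|[^\s<>\[\]()/]+"
)

def trailer_keys(pdf_bytes):
    """Return the top-level keys of the last trailer dictionary."""
    start = pdf_bytes.rfind(b"trailer")
    if start < 0:
        return []
    keys, depth, prev_was_key = [], 0, False
    for match in _TOKEN.finditer(pdf_bytes, start + len(b"trailer")):
        tok = match.group()
        if tok in (b"<<", b"["):
            depth += 1
            prev_was_key = False
        elif tok in (b">>", b"]"):
            depth -= 1
            if depth == 0:
                break  # end of the trailer dictionary
            prev_was_key = False
        elif depth == 1 and tok.startswith(b"/") and not prev_was_key:
            # A name at the top level is a key unless it directly
            # follows a key (then it is that key's value).
            keys.append(tok[1:].decode("latin-1"))
            prev_was_key = True
        else:
            prev_was_key = False
    return keys

def custom_trailer_keys(pdf_bytes):
    """Return trailer keys that PDF 1.4 does not define."""
    return [k for k in trailer_keys(pdf_bytes) if k not in STANDARD_TRAILER_KEYS]

# Hypothetical trailer resembling the one described in this report.
sample = b"""trailer
<</Size 20/Root 18 0 R
/Info 19 0 R
/ID [ <AB12> <AB12> ]
/DocChecksum /0123ABCD
/AdditionalStreams [/application#2Fvnd.oasis.opendocument.text 17 0 R]
>>
startxref
1234
%%EOF
"""
print(custom_trailer_keys(sample))
```

For the sample above the two keys from this report are the only ones flagged. A real validator would use a full PDF parser; this sketch ignores escaped parentheses in strings and incremental-update subtleties.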
Comment 1 Cor Nouws 2013-07-04 09:55:09 UTC
Hi Jos,
Thanks for the report.
I set this to New, trusting your expertise in this ;)
Is this an issue new with the 410beta2 or already in older versions too?
Best,
Cor
Comment 2 Jos van den Oever 2013-07-04 10:06:15 UTC
The /DocChecksum and /AdditionalStreams were added to OpenOffice on 2007-03-26.

In the LibreOffice git repository this is commit d217c079d7b3ca7b5039428594e7cdfdf9a0c4a9
Comment 3 Cor Nouws 2013-07-04 10:34:02 UTC
Thanks! I changed the version according to your info.
Comment 4 kurt.pfeifle 2014-05-04 17:14:39 UTC
What will happen on this?

Do you need suggestions about how to implement these features in a spec conforming way?
Comment 5 kurt.pfeifle 2014-05-04 17:18:29 UTC
To give you a link to the relevant PDF-1.4 specification:

     http://acroeng.adobe.com/PDFReference/PDF_1.4/PDF%20Reference%201.4.pdf

The quote given by Jos in his bug report is from named page 723 (as printed on page), page 743 (as counted from first), in appendix E, "PDF Name Registry". 

Here is a recently created website holding *all* PDF specifications ever published by Adobe:

     http://acroeng.adobe.com/wp/?page_id=321
Comment 6 kurt.pfeifle 2014-05-09 21:12:44 UTC
According to these test results:

    https://docs.google.com/spreadsheets/d/1Ok37dvlRSpzKpdKJ6gYycM5QzM7sv4_YCybHbiFMVFI

none of 36 different PDF viewers or applications had a problem displaying or processing the tested hybrid PDF created by LibreOffice.

Hence I took the liberty to set importance of this bug to much lower for now. I won't protest if someone even closed it as WONTFIX unless there appears other evidence of real life problems...
Comment 7 QA Administrators 2015-06-08 14:41:36 UTC Comment hidden (obsolete)
Comment 8 QA Administrators 2016-09-20 10:00:33 UTC Comment hidden (obsolete)
Comment 9 Jos van den Oever 2016-09-20 10:59:11 UTC
PDF documents created with version 5.1.2.2.0 on Linux 4.4 still add the key /DocChecksum and /AdditionalStreams to the PDF files.
Comment 10 kurt.pfeifle 2017-10-22 20:01:35 UTC
(In reply to Jos van den Oever from comment #9)
> PDF documents created with version 5.1.2.2.0 on Linux 4.4 still add the key
> /DocChecksum and /AdditionalStreams to the PDF files.

Jos, the additional (proprietary) keys used by OpenOffice/LibreOffice to embed
the original OpenDocument file into the Hybrid PDF are not doing any real
h a r m:

  * As I showed in comment #6 none of the 36 tested PDF viewers has any problem
    opening and displaying a Hybrid PDF!

There are other reasons which would make  M E  want to modify the way LO creates
a Hybrid PDF:

  * N O N E  of the other PDF readers has a way to detect that there is an
    embedded OpenDocument file in the PDF!

The reason is that OO/LO implemented this feature in a non-standard, "proprietary" way, while they could have used the standards-defined "embed another file into the PDF" feature (see for example bug 95328 and its comments).

And there  A R E  good use cases to be able to detect the embeddedness of the
original OpenDocument file in a PDF even by non-OO/LO applications:

  - User(s) may not be aware of this when they open the PDF in a PDF reader.
    However, the reader may draw their attention to the fact of the original
    ODT/ODS document being embedded. After all, whoever embedded the original
    document into the Hybrid PDF most likely  W A N T E D  it to be editable.

  - Users may need/want to extract the embedded ODT/ODS file without switching 
    to LibreOffice first (which may not even be installed on their currently
    used computer system).

  - I could easily think of more use cases, why it would be good to be able to
    D E T E C T  the fact of the embedded original and editable file and also
    to  E X T R A C T  it from the PDF via a software other than OO/LO.
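
To illustrate the detection problem, here is a crude byte-level sketch, assuming the relevant names appear as plain text in the file (it would miss anything stored inside compressed object streams). The function name and the result keys are hypothetical. A generic tool looks for the standard /EmbeddedFiles name tree; it has no reason to look for /AdditionalStreams.

```python
import re

def embedding_markers(pdf_bytes):
    """Report which embedding markers appear as plain text in the file."""
    def has_name(name):
        # Match the PDF name only when it is not a prefix of a longer name.
        return re.search(name + rb"(?![A-Za-z0-9])", pdf_bytes) is not None
    return {
        "standard_embedded_files": has_name(rb"/EmbeddedFiles"),
        "hybrid_additional_streams": has_name(rb"/AdditionalStreams"),
    }

# Hypothetical fragments, not real LibreOffice output.
hybrid = b"trailer\n<</Size 20/Root 18 0 R/AdditionalStreams [/x 17 0 R]>>"
standard = b"1 0 obj\n<< /Type /Catalog /Names << /EmbeddedFiles 5 0 R >> >>\nendobj"
print(embedding_markers(hybrid))
print(embedding_markers(standard))
```

A hybrid PDF as described in this bug would only set the second flag, which is why other applications cannot discover the embedded original.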
Comment 11 QA Administrators 2018-10-23 02:50:06 UTC Comment hidden (obsolete)
Comment 12 Alexis de Lattre 2020-01-02 12:20:36 UTC
It is very, very strange that a project such as LibreOffice, which promotes the OpenDocument standard and interoperability in general, doesn't respect the PDF standard! Adding proprietary keys in the PDF trailer that only LibreOffice can read is certainly a bad practice. Using the "Embedded Files" feature of the PDF standard is clearly the way to go!

This could be the occasion to add support for Embedded Files in the LibreOffice PDF export. Embedded Files in PDF is becoming a widely used feature thanks to electronic invoicing standards such as Factur-X/ZUGFeRD. These standards use the Embedded Files feature of the PDF standard to add an XML file to a PDF invoice (to allow automatic processing of the invoice), and they allow other documents to be added as attachments of the PDF (documents that justify the invoice, for example a signed acceptance form).

For instance, I recently developed a LibreOffice extension to be able to generate Factur-X invoices from LibreOffice Calc (cf https://github.com/akretion/factur-x-libreoffice-extension). This extension contains a Python macro that post-processes the PDF file generated by LibreOffice to add the XML file as attachment to the PDF. The code of this macro would be much simpler if the PDF export feature of LibreOffice had native support for Embedded Files. And generating structured electronic invoices (with Factur-X, ZUGFeRD or other standards) is starting to be compulsory in some countries (for example, it is now compulsory in France when you invoice the public sector).
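
For readers unfamiliar with the Embedded Files feature, the following sketch builds the three pieces the PDF specification describes: an embedded file stream, a file specification dictionary pointing to it, and the /EmbeddedFiles name tree entry for the document catalog. It is an illustration under stated assumptions, not the code of the extension above or of LibreOffice: the function names, the object numbers and the file name are hypothetical, and a real writer would also have to update the catalog and the cross-reference table.

```python
def pdf_name(text):
    """Encode text as a PDF name, escaping special bytes as #XX."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if 33 <= byte <= 126 and ch not in "#()<>[]{}/%":
            out.append(ch)
        else:
            out.append("#%02X" % byte)
    return "/" + "".join(out)

def pdf_string(text):
    """Encode simple ASCII text as a PDF literal string."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return "(" + escaped + ")"

def embedded_file_objects(filename, mime_type, data, first_obj=10):
    """Return (stream object, filespec object, catalog /Names entry)."""
    stream_num, spec_num = first_obj, first_obj + 1
    # Embedded file stream: the raw bytes of the attachment, unfiltered.
    header = "%d 0 obj\n<< /Type /EmbeddedFile /Subtype %s /Length %d >>\nstream\n" % (
        stream_num, pdf_name(mime_type), len(data))
    stream_obj = header.encode("ascii") + data + b"\nendstream\nendobj\n"
    # File specification: names the file and references the stream via /EF.
    spec_obj = ("%d 0 obj\n<< /Type /Filespec /F %s /EF << /F %d 0 R >> >>\nendobj\n" % (
        spec_num, pdf_string(filename), stream_num)).encode("ascii")
    # Name tree entry that makes the file discoverable from the catalog.
    names_entry = "/Names << /EmbeddedFiles << /Names [ %s %d 0 R ] >> >>" % (
        pdf_string(filename), spec_num)
    return stream_obj, spec_obj, names_entry

stream_obj, spec_obj, names_entry = embedded_file_objects(
    "original.odt", "application/vnd.oasis.opendocument.text", b"PK\x03\x04")
print(spec_obj.decode("ascii"))
print(names_entry)
```

Because the file is reachable from the catalog through a standard name tree, any conforming reader can list and extract it, which is the property the trailer-based approach lacks.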
Comment 13 Julien Nabet 2020-01-02 19:37:54 UTC
Michael/Miklos/Tomaž: I don't know who the PDF expert is, so I thought one of you might have some idea.

The problem here is that the "AdditionalStreams" keyword doesn't exist in the PDF standard.
Looking at the git history of d217c079d7b3ca7b5039428594e7cdfdf9a0c4a9, it was added with:
commit d217c079d7b3ca7b5039428594e7cdfdf9a0c4a9
Author: Ivo Hinkelmann <ihi@openoffice.org>
Date:   Mon Mar 26 10:21:15 2007 +0000
    INTEGRATION: CWS ipdf (1.92.80); FILE MERGED
    2007/01/19 16:08:58 pl 1.92.80.8: #137143# ecnrypt add streams
    2007/01/19 11:48:56 pl 1.92.80.7: RESYNC: (1.99-1.102); FILE MERGED
    2006/10/04 18:52:04 pl 1.92.80.6: RESYNC: (1.96-1.99); FILE MERGED
    2006/07/25 09:31:00 pl 1.92.80.5: RESYNC: (1.93-1.96); FILE MERGED
    2006/07/04 16:34:49 pl 1.92.80.4: removed a warning
    2006/07/04 13:48:22 pl 1.92.80.3: RESYNC: (1.92-1.93); FILE MERGED
    2006/06/26 15:00:09 pl 1.92.80.2: #137143# emit document checksum
    2006/06/12 16:53:42 pl 1.92.80.1: #137143# add AddStream interface

Shouldn't it be removed, made read-only (I mean LO may read this in old files but should replace the keyword when modifying them), or at minimum deprecated?
Instead, there's "EmbeddedFiles" in the specs (see https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdf_reference_archive/pdf_reference_1-7.pdf)

LO should respect the PDF standard. I set this one to normal importance, but it should be even higher than that.
Comment 14 Jean-Baptiste Faure 2020-01-02 20:27:41 UTC
According to comment #13 I changed version to inherited from OOo.

Best regards. JBF
Comment 15 Michael Meeks 2020-01-02 20:28:34 UTC
The hybrid PDF functionality was a great innovation, and the standard didn't cover it then of course. It would be great to find some resources / and/or interested people to implement the new standard using EmbeddedFiles. Alexis - are you interested in some code pointers there ? hacking the core to rename a few attributes and re-structuring the stream is likely to be a good start. I imagine a Collaboran would be happy to mentor someone that wanted to work on this themselves, but we can't resource a fix absent a customer ourselves today.
Comment 16 Julien Nabet 2020-01-02 21:30:53 UTC
Thank you Michael for your very quick feedback! :-)

I took a look at https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/pdf_reference_archives/PDFReference.pdf, which is the 1.4 version (released in 2001 according to https://fr.wikipedia.org/wiki/Portable_Document_Format#Versions). Version 1.4 already had the "EmbeddedFiles" keyword (see part 3.3).
So I wondered why the non-standard "AdditionalStreams" was added when this keyword already existed.
Or perhaps I misunderstood this?
Comment 17 kurt.pfeifle 2020-01-02 22:26:04 UTC
(In reply to Michael Meeks from comment #15)
> The hybrid PDF functionality was a great innovation, and the standard didn't
> cover it then of course.

Indeed the hybrid PDF functionality  i s   a great innovation.
However, it could even then have been implemented (and still can be) by using
the standard-conforming method of embedding the source file.

If done, this would have the advantage that every standard compliant PDF
viewer or PDF processing software could auto-discover the embedded source
file and let the user "do something" with it even in the absence of a
LibreOffice installation on his system.
Comment 18 Julien Nabet 2020-01-03 10:12:55 UTC
Here are some code pointers:
https://opengrok.libreoffice.org/search?project=core&full=AdditionalStreams&defs=&refs=&path=&hist=&type=&si=full

To create the pdf:
emitTrailer() method from vcl/source/gdi/pdfwriter_impl.cxx

The rest seems related to PDF import