| Summary: | soffice builds of pdf files are unreproducible | ||
|---|---|---|---|
| Product: | LibreOffice | Reporter: | Rene Engelhard <rene> |
| Component: | Printing and PDF export | Assignee: | Not Assigned <libreoffice-bugs> |
| Status: | NEW --- | ||
| Severity: | normal | CC: | stephane.guillou, thb |
| Priority: | medium | Keywords: | filter:pdf |
| Version: | 24.2.0.3 release | ||
| Hardware: | All | ||
| OS: | All | ||
| See Also: | http://bugs.debian.org/1065448 | ||
| Whiteboard: | |||
| Crash report or crash signature: | Regression By: | ||
| Bug Depends on: | |||
| Bug Blocks: | 103378 | ||
| Attachments: |
ODP test file used for illustration
HTML diff of PDFs with decompressed streams |
||
|
Description
Rene Engelhard
2024-03-04 21:05:44 UTC
Thorsten, what do you think? Don't ODT files, for example, also store a timestamp at each save?
But ODT files stay the same (unless changed and re-saved, of course), so they are by definition reproducible. PDF files that are rebuilt from a .doc/.od? on every package build (as in this and other cases in Debian) differ each time. (Or, if one wants to go that route: the "source file" (od?) stays the same anyway and the "binary" (PDF) changes. That's a possible analogy.)
Created attachment 193373 [details]
ODP test file used for illustration
Created attachment 193374 [details]
HTML diff of PDFs with decompressed streams
Reproducibility seems indeed not possible at this stage. I've attached an example to show that there is more going on than just different time stamps.
Steps to reproduce the example:
1. Export slide.odp to PDF twice.
2. Decompress streams in the two PDFs with mutool:
   mutool clean -d slide1.pdf tmp1.pdf
   mutool clean -d slide2.pdf tmp2.pdf
3. Generate diff html with vim:
   vimdiff tmp1.pdf tmp2.pdf -c TOhtml -c 'w! diff.html' -c 'qa!'
There are four points where the PDFs differ:
- A binary stream (length is also different).
- xmp:CreateDate tag.
- /CreationDate field.
- PDF Trailer ID, which is just a random blob.
As far as I understand, random trailer IDs are sometimes useful for document tracking, but they are not critical.
It would be helpful to have an option to create reproducible PDFs, e.g. with a command-line option, or to disable all variable parts when SOURCE_DATE_EPOCH is set.
OK, let's set as new. |
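
To tell the metadata differences listed above (xmp:CreateDate, /CreationDate, trailer ID) apart from the remaining binary-stream difference, the decompressed PDFs could be compared after masking the three metadata fields. The following is only a rough sketch: the regular expressions are assumptions about how these fields typically look in a decompressed PDF, not something taken from the LibreOffice export code, and the function names are made up for illustration. Masking changes byte lengths, so the result is a comparison aid only and not a valid PDF.

```python
import re

# Assumed shapes of the three volatile metadata fields named in the report.
# Real files may encode them differently (e.g. hex strings for dates).
VOLATILE_PATTERNS = [
    # Info dictionary entry, e.g. /CreationDate(D:20240304210544+01'00')
    (re.compile(rb"/CreationDate\s*\([^)]*\)"),
     b"/CreationDate(MASKED)"),
    # XMP metadata element, e.g. <xmp:CreateDate>2024-03-04T21:05:44Z</xmp:CreateDate>
    (re.compile(rb"<xmp:CreateDate>[^<]*</xmp:CreateDate>"),
     b"<xmp:CreateDate>MASKED</xmp:CreateDate>"),
    # Trailer identifier, e.g. /ID [ <0A1B...> <0A1B...> ]
    (re.compile(rb"/ID\s*\[\s*<[0-9A-Fa-f]*>\s*<[0-9A-Fa-f]*>\s*\]"),
     b"/ID[<MASKED><MASKED>]"),
]


def mask_volatile_fields(pdf_bytes: bytes) -> bytes:
    """Replace the values of the assumed volatile fields with a fixed marker."""
    for pattern, replacement in VOLATILE_PATTERNS:
        pdf_bytes = pattern.sub(replacement, pdf_bytes)
    return pdf_bytes


def equal_after_masking(a: bytes, b: bytes) -> bool:
    """True if two PDFs are byte-identical once the volatile fields are masked."""
    return mask_volatile_fields(a) == mask_volatile_fields(b)
```

With the files from step 2, this could be called as `equal_after_masking(open("tmp1.pdf", "rb").read(), open("tmp2.pdf", "rb").read())`. A result of `False` would indicate a difference outside the three metadata fields, such as the binary stream mentioned in the report.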