Description: In the output from PDFWriterImpl::emitCatalog(), `/MediaBox` parameters are truncated to integers. And we might lose precision in `/MediaBox`. Steps to Reproduce: 1. Create Writer Document. 2. Select from the menu: File -> Export As -> Export As PDF..., and press Export. 3. Save as sample.pdf 4. Inspect the sample.pdf (e.g. `cat sample.pdf`, or `grep -a /MediaBox sample.pdf`). Actual Results: There are two `/MediaBox`s: ``` /MediaBox[0 0 595.303937007874 841.889763779528] ``` and ``` /MediaBox[ 0 0 595 841 ] ``` The former is the output from `PDFPage::emit()` and the latter is from `PDFWriterImpl::emitCatalog()`. Expected Results: The two `/MediaBox` should be same: ``` /MediaBox[0 0 595.303937007874 841.889763779528] ``` and ``` /MediaBox[0 0 595.303937007874 841.889763779528] ``` Reproducible: Always User Profile Reset: No Additional Info: This is a patch for the commit eae375a224b44cd5bb18e180732668668ee9178f ======== From 132753a0026072c13506c4765622c0eada9c31b3 Mon Sep 17 00:00:00 2001 From: Lemures Lemniscati <lemures.lemniscati@gmail.com> Date: Wed, 16 Mar 2022 21:06:37 +0900 Subject: [PATCH] Use double instead of sal_Int32 for nMediaBoxWidth and nMediaBoxHeight `nMediaBoxWidth` and `nMediaBoxHeight` in `PDFWriterImpl::emitCatalog()` should be `double` in accordance with `PDFPage::emit()`. --- vcl/source/gdi/pdfwriter_impl.cxx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/vcl/source/gdi/pdfwriter_impl.cxx b/vcl/source/gdi/pdfwriter_impl.cxx index fc15be2be..1847eb7d7 100644 --- a/vcl/source/gdi/pdfwriter_impl.cxx +++ b/vcl/source/gdi/pdfwriter_impl.cxx @@ -4539,8 +4539,8 @@ bool PDFWriterImpl::emitCatalog() aLine.append( getResourceDictObj() ); aLine.append( " 0 R\n" ); - sal_Int32 nMediaBoxWidth = 0; - sal_Int32 nMediaBoxHeight = 0; + double nMediaBoxWidth = 0; + double nMediaBoxHeight = 0; sal_Int32 nUserUnit = 1; if( m_aPages.empty() ) // sanity check, this should not happen { -- 2.35.1
Created attachment 178914 [details] Use double instead of sal_Int32 for nMediaBoxWidth and nMediaBoxHeight in `PDFWriterImpl::emitCatalog()`
(In reply to Lemures Lemniscati from comment #0) I tested it with a version: バージョン: 6.4.0.3 (x64) Build ID: b0a288ab3d2d4774cb44b62f04d5d28733ac6df8 CPU threads: 8; OS:Windows 10.0 Build 22000; UI render: default; VCL: win; ロケール: ja-JP (ja_JP); UIの言語: ja-JP Calc: threaded https://git.libreoffice.org/core/+log/b0a288ab3d2d4774cb44b62f04d5d28733ac6df8
Thought you might be interested in https://wiki.documentfoundation.org/Development/GetInvolved, specifically https://wiki.documentfoundation.org/Development/GetInvolved#Prepare_to_submit_patches since you've already done most of the job: - find the bug - analyze it - download code and build it - find a fix and test it So you just need to provide license statement and get a gerrit account to submit your patch for review from your behalf!
Thank you. But I didn't try compiling and testing, yet. I have a poor environment only, so I'll try it later when possible. I didn't push any patch to gerrit yet. But, I add here a statement just in case someone proceeds to making the modification with the patch: All of my past & future contributions to LibreOffice may be licensed under the MPLv2/LGPLv3+ dual license.
I gave it a try, I reproduced what you described: <</Type/Page/Parent 4 0 R/Resources 6 0 R/MediaBox[0 0 595.303937007874 841.889763779528]/Group<</S/Transparency/CS/DeviceRGB/I true>>/Contents 2 0 R>> /MediaBox[ 0 0 595 841 ] after the patch, I got: <</Type/Page/Parent 4 0 R/Resources 6 0 R/MediaBox[0 0 595.303937007874 841.889763779528]/Group<</S/Transparency/CS/DeviceRGB/I true>>/Contents 2 0 R>> /MediaBox[ 0 0 595.303937007874 841.889763779528 ] I submitted the patch for review on your behalf here: https://gerrit.libreoffice.org/c/core/+/131716 Let's wait for the feedback now.
Lemures Lemniscati committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/core/commit/fb48a6408a515057a03c485bf00df5c34706595b tdf#148033: Loss of precision in /MediaBox (PDFWriterImpl::emitCatalog()) It will be available in 7.4.0. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
I cherry-picked the patch for 7.3 branch, let's wait review. Meanwhile, I assigned you since you found the fix.
Thank you for testing and pushing the patch. I've sent a license statement to libreoffice(at)lists.freedesktop.org.
Lemures Lemniscati committed a patch related to this issue. It has been pushed to "libreoffice-7-3": https://git.libreoffice.org/core/commit/48aaca5ba9c27a247ed502c3db827c6ac9f34df9 tdf#148033: Loss of precision in /MediaBox (PDFWriterImpl::emitCatalog()) It will be available in 7.3.3. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Let's put this one to FIXED now the patch is also on 7.3 branch. Good job and thank you for the fix! :-)
Thank you! :)
I try to find an information about the MediaBox in the global catalog, and can't so far. The information I find is about Page, not global. Since this bug cared about the precision of that, do you know where it's documented, and how is it used?
(In reply to Mike Kaganski from comment #12) Sorry for the noise; I see in PDF 32000-1:2008 (PDF 1.7 ISO standard), 7.7.3 "Page Tree", 7.7.3.4 "Inheritance of Page Attributes": > Some page attributes ... are designed as *inheritable*. ... > EXAMPLE A document may specify the same media box for all of its pages by > including a MediaBox entry in the root node of the page tree. If > necessary, an individual page object may override this inherited > value with a MediaBox entry of its own. The "root node of the page tree" is a "Nodes" object, having pages as its Kids - which is what was handled here (we only write one such object for all the pages). Calculation of this data for a strange value of maximal rectangle of all page sizes in the root was implemented in commit 98468607f7a8d0b1f5f7e3ecd09f756aea904d00 (INTEGRATION: CWS vcl87 (1.122.16); FILE MERGED, 2008-04-03, related to i#75941). It may result in a rectangle not corresponding for any single page - e.g., if one outputs a landscape and a portrait A4 pages, the rectangle here would have maximal width and maximal height of the two, like in /MediaBox[ 0 0 841.889763779528 841.889763779528 ] And since we export every page's MediaBox explicitly, this value is just a bloat, and should be dropped completely. By the way, the OOo bug mentioned above told about problems in some third-party software (like ImageMagick) confused by the mismatch of the arbitrary default values written there before (which came from commit df0f52d3aadea5c4d5f600d1533901af1087b464 - #100608# preparations for PDF export, 2002-07-08), and the individual pages' boxes. That was that third-party software bug - I wonder if it was addressed over the years; but anyway, not including this useless data would already prevent that problem from happening.