Bug 148033 - Loss of precision in /MediaBox: `nMediaBoxWidth` and `nMediaBoxHeight` in `PDFWriterImpl::emitCatalog()` should be `double` in accordance with `PDFPage::emit()`.
Summary: Loss of precision in /MediaBox: `nMediaBoxWidth` and `nMediaBoxHeight` in `PD...
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer (show other bugs)
Version:
(earliest affected)
unspecified
Hardware: All All
: medium normal
Assignee: Lemures Lemniscati
URL:
Whiteboard: target:7.4.0 target:7.3.3
Keywords:
Depends on:
Blocks:
 
Reported: 2022-03-16 12:51 UTC by Lemures Lemniscati
Modified: 2023-07-26 11:23 UTC (History)
1 user (show)

See Also:
Crash report or crash signature:


Attachments
Use double instead of sal_Int32 for nMediaBoxWidth and nMediaBoxHeight in `PDFWriterImpl::emitCatalog()` (1.04 KB, patch)
2022-03-16 12:55 UTC, Lemures Lemniscati
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Lemures Lemniscati 2022-03-16 12:51:40 UTC
Description:
In the output from PDFWriterImpl::emitCatalog(), `/MediaBox` parameters are truncated to integers. And we might lose precision in `/MediaBox`.

Steps to Reproduce:
1. Create Writer Document.
2. Select from the menu: File -> Export As -> Export As PDF..., and press Export.
3. Save as sample.pdf
4. Inspect the sample.pdf (e.g. `cat sample.pdf`, or `grep -a /MediaBox sample.pdf`).

Actual Results:
There are two `/MediaBox`s:
```
/MediaBox[0 0 595.303937007874 841.889763779528]
```
and
```
/MediaBox[ 0 0 595 841 ]
```

The former is the output from `PDFPage::emit()` and the latter is from `PDFWriterImpl::emitCatalog()`.

Expected Results:
The two `/MediaBox` should be same:
```
/MediaBox[0 0 595.303937007874 841.889763779528]
```
and
```
/MediaBox[0 0 595.303937007874 841.889763779528]
```



Reproducible: Always


User Profile Reset: No



Additional Info:
This is a patch for the commit eae375a224b44cd5bb18e180732668668ee9178f
========
From 132753a0026072c13506c4765622c0eada9c31b3 Mon Sep 17 00:00:00 2001
From: Lemures Lemniscati <lemures.lemniscati@gmail.com>
Date: Wed, 16 Mar 2022 21:06:37 +0900
Subject: [PATCH] Use double instead of sal_Int32 for nMediaBoxWidth and
 nMediaBoxHeight

`nMediaBoxWidth` and `nMediaBoxHeight` in `PDFWriterImpl::emitCatalog()`
should be `double` in accordance with `PDFPage::emit()`.
---
 vcl/source/gdi/pdfwriter_impl.cxx | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/vcl/source/gdi/pdfwriter_impl.cxx b/vcl/source/gdi/pdfwriter_impl.cxx
index fc15be2be..1847eb7d7 100644
--- a/vcl/source/gdi/pdfwriter_impl.cxx
+++ b/vcl/source/gdi/pdfwriter_impl.cxx
@@ -4539,8 +4539,8 @@ bool PDFWriterImpl::emitCatalog()
     aLine.append( getResourceDictObj() );
     aLine.append( " 0 R\n" );
 
-    sal_Int32 nMediaBoxWidth = 0;
-    sal_Int32 nMediaBoxHeight = 0;
+    double nMediaBoxWidth = 0;
+    double nMediaBoxHeight = 0;
     sal_Int32 nUserUnit = 1;
     if( m_aPages.empty() ) // sanity check, this should not happen
     {
-- 
2.35.1
Comment 1 Lemures Lemniscati 2022-03-16 12:55:58 UTC
Created attachment 178914 [details]
Use double instead of sal_Int32 for nMediaBoxWidth and  nMediaBoxHeight in `PDFWriterImpl::emitCatalog()`
Comment 2 Lemures Lemniscati 2022-03-16 13:04:47 UTC
(In reply to Lemures Lemniscati from comment #0)

I tested it with a version:
バージョン: 6.4.0.3 (x64)
Build ID: b0a288ab3d2d4774cb44b62f04d5d28733ac6df8
CPU threads: 8; OS:Windows 10.0 Build 22000; UI render: default; VCL: win; 
ロケール: ja-JP (ja_JP); UIの言語: ja-JP
Calc: threaded

https://git.libreoffice.org/core/+log/b0a288ab3d2d4774cb44b62f04d5d28733ac6df8
Comment 3 Julien Nabet 2022-03-16 19:42:01 UTC
Thought you might be interested in https://wiki.documentfoundation.org/Development/GetInvolved, specifically https://wiki.documentfoundation.org/Development/GetInvolved#Prepare_to_submit_patches since you've already done most of the job:
- find the bug
- analyze it
- download code and build it
- find a fix and test it

So you just need to provide license statement and get a gerrit account to submit your patch for review from your behalf!
Comment 4 Lemures Lemniscati 2022-03-17 14:07:17 UTC
Thank you.
But I didn't try compiling and testing, yet.
I have a poor environment only, so I'll try it later when possible.


I didn't push any patch to gerrit yet.
But, I add here a statement just in case someone proceeds to making the modification with the patch:

   All of my past & future contributions to LibreOffice may be
   licensed under the MPLv2/LGPLv3+ dual license.
Comment 5 Julien Nabet 2022-03-17 18:02:12 UTC
I gave it a try, I reproduced what you described:
<</Type/Page/Parent 4 0 R/Resources 6 0 R/MediaBox[0 0 595.303937007874 841.889763779528]/Group<</S/Transparency/CS/DeviceRGB/I true>>/Contents 2 0 R>>
/MediaBox[ 0 0 595 841 ]

after the patch, I got:
<</Type/Page/Parent 4 0 R/Resources 6 0 R/MediaBox[0 0 595.303937007874 841.889763779528]/Group<</S/Transparency/CS/DeviceRGB/I true>>/Contents 2 0 R>>
/MediaBox[ 0 0 595.303937007874 841.889763779528 ]

I submitted the patch for review on your behalf here:
https://gerrit.libreoffice.org/c/core/+/131716
Let's wait for the feedback now.
Comment 6 Commit Notification 2022-03-17 21:13:13 UTC
Lemures Lemniscati committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/fb48a6408a515057a03c485bf00df5c34706595b

tdf#148033: Loss of precision in /MediaBox (PDFWriterImpl::emitCatalog())

It will be available in 7.4.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 7 Julien Nabet 2022-03-17 21:18:43 UTC
I cherry-picked the patch for 7.3 branch, let's wait review.

Meanwhile, I assigned you since you found the fix.
Comment 8 Lemures Lemniscati 2022-03-17 22:43:18 UTC
Thank you for testing and pushing the patch.
I've sent a license statement to libreoffice(at)lists.freedesktop.org.
Comment 9 Commit Notification 2022-03-18 15:17:15 UTC
Lemures Lemniscati committed a patch related to this issue.
It has been pushed to "libreoffice-7-3":

https://git.libreoffice.org/core/commit/48aaca5ba9c27a247ed502c3db827c6ac9f34df9

tdf#148033: Loss of precision in /MediaBox (PDFWriterImpl::emitCatalog())

It will be available in 7.3.3.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 10 Julien Nabet 2022-03-18 15:44:22 UTC
Let's put this one to FIXED now the patch is also on 7.3 branch.

Good job and thank you for the fix! :-)
Comment 11 Lemures Lemniscati 2022-03-18 16:12:36 UTC
Thank you!  :)
Comment 12 Mike Kaganski 2023-07-26 07:51:31 UTC
I try to find an information about the MediaBox in the global catalog, and can't so far. The information I find is about Page, not global.

Since this bug cared about the precision of that, do you know where it's documented, and how is it used?
Comment 13 Mike Kaganski 2023-07-26 09:29:34 UTC
(In reply to Mike Kaganski from comment #12)

Sorry for the noise; I see in PDF 32000-1:2008 (PDF 1.7 ISO standard), 7.7.3 "Page Tree", 7.7.3.4 "Inheritance of Page Attributes":

> Some page attributes ... are designed as *inheritable*. ...
> EXAMPLE    A document may specify the same media box for all of its pages by
>            including a MediaBox entry in the root node of the page tree. If
>            necessary, an individual page object may override this inherited
>            value with a MediaBox entry of its own.

The "root node of the page tree" is a "Nodes" object, having pages as its Kids - which is what was handled here (we only write one such object for all the pages).

Calculation of this data for a strange value of maximal rectangle of all page sizes in the root was implemented in commit 98468607f7a8d0b1f5f7e3ecd09f756aea904d00 (INTEGRATION: CWS vcl87 (1.122.16); FILE MERGED, 2008-04-03, related to i#75941). It may result in a rectangle not corresponding for any single page - e.g., if one outputs a landscape and a portrait A4 pages, the rectangle here would have maximal width and maximal height of the two, like in

  /MediaBox[ 0 0 841.889763779528 841.889763779528 ]

And since we export every page's MediaBox explicitly, this value is just a bloat, and should be dropped completely.

By the way, the OOo bug mentioned above told about problems in some third-party software (like ImageMagick) confused by the mismatch of the arbitrary default values written there before (which came from commit df0f52d3aadea5c4d5f600d1533901af1087b464 - #100608# preparations for PDF export, 2002-07-08), and the individual pages' boxes. That was that third-party software bug - I wonder if it was addressed over the years; but anyway, not including this useless data would already prevent that problem from happening.