Bug 101611 - pdfimport filter does not honor page cropping (masking) as set in a PDF document, resulting pages in LO document are oversize (comment 4)
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: filters and storage
Version: 5.1.4.2 release (earliest affected)
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Dave Gilbert
URL:
Whiteboard: target:25.2.0
Keywords: filter:pdf
Depends on:
Blocks: PDF-Import-Draw
Reported: 2016-08-19 11:56 UTC by E.Mi
Modified: 2024-08-29 13:02 UTC (History)
CC List: 2 users

See Also:
Crash report or crash signature:


Attachments
pdf (1.86 MB, application/pdf)
2016-08-19 11:56 UTC, E.Mi
Side by side (347.91 KB, image/jpeg)
2016-08-19 11:57 UTC, E.Mi

Description E.Mi 2016-08-19 11:56:30 UTC
Created attachment 126906
pdf
Comment 1 E.Mi 2016-08-19 11:57:40 UTC
Created attachment 126907
Side by side
Comment 2 V Stuart Foote 2016-08-21 02:59:38 UTC
This is clearly a duplicate of bug 101220 -- in this case the embedded TimesNewRomanPS font family is not being extracted and used in the pdfimport filter.

The fallback font selected on Linux does not match the font metrics (Windows does a bit better). While that fallback could potentially be improved, the correct way to resolve this is to extract the embedded font(s) and render the document with better fidelity using the source fonts.

@ekari, please stop making these duplicate bug submissions for PDF exhibiting poor fidelity on font substitution--they are all the same issue.

*** This bug has been marked as a duplicate of bug 101220 ***
Comment 3 E.Mi 2016-08-21 07:53:56 UTC
I was referring to the blue flowchart, which is bigger than in the original, and the red flowchart, which has a strange image instead of a bell.
Comment 4 V Stuart Foote 2016-08-21 12:23:11 UTC
The oversize bubble object is because on import the page is not being cropped and formatted as specified in the PDF.

The sample document's base page is 9.92" x 6.99". Crop values of 0.583" top and bottom, 0.833" left, and 0.819" right are applied to it, specifying an intended size of 8.26" x 5.82".
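
As a quick check of that arithmetic, here is a minimal sketch that subtracts the crop values from the base page size. The function name `cropped_size` and its parameter names are illustrative only and are not taken from the PDF, poppler, or LibreOffice.

```python
def cropped_size(width, height, left, right, top, bottom):
    """Return (width, height) after removing crop margins; all values in inches."""
    return width - left - right, height - top - bottom


# Figures quoted in this comment for the sample document.
w, h = cropped_size(9.92, 6.99, left=0.833, right=0.819, top=0.583, bottom=0.583)
print(f"{w:.3f} in x {h:.3f} in")
```

The result is about 8.27" x 5.82", which agrees with the stated intended size to within the rounding of the quoted inputs.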

The LibreOffice pdfimport filter mishandles that. It only sees the 9.92" x 6.99" base page size, and then applies margins of 0.20" left & right and 0.39" top & bottom. The crop margins that should mask/resize the page are not handled and are lost.

This is similar to bug 86211, which is the general case that clipping is not implemented, but here it is more specific: the import filter does not recognize the page cropping that should be applied. So the resulting page in LibreOffice is larger than what the PDF describes, and additional margin space is then added on top of that.
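
To picture the step that is being skipped: in the PDF page model, the CropBox defaults to the MediaBox when it is absent, and is otherwise limited to the MediaBox. The sketch below picks the visible page rectangle on that basis. `effective_page_box` and the tuple layout are hypothetical and do not describe the actual pdfimport or poppler code; the box values are derived from the figures quoted above, converted to points (72 per inch).

```python
from typing import Optional, Tuple

# (x0, y0, x1, y1) in points, origin at the lower-left corner of the page.
Box = Tuple[float, float, float, float]


def effective_page_box(media_box: Box, crop_box: Optional[Box] = None) -> Box:
    """Return the visible page area: the CropBox clipped to the MediaBox."""
    if crop_box is None:
        # No CropBox given, so the whole MediaBox is visible.
        return media_box
    mx0, my0, mx1, my1 = media_box
    cx0, cy0, cx1, cy1 = crop_box
    return (max(mx0, cx0), max(my0, cy0), min(mx1, cx1), min(my1, cy1))


INCH = 72.0  # points per inch
media = (0.0, 0.0, 9.92 * INCH, 6.99 * INCH)
crop = (0.833 * INCH, 0.583 * INCH, (9.92 - 0.819) * INCH, (6.99 - 0.583) * INCH)

x0, y0, x1, y1 = effective_page_box(media, crop)
page_w_in = (x1 - x0) / INCH
page_h_in = (y1 - y0) / INCH
```

An importer that sizes the page from `media` alone ends up with the 9.92" x 6.99" page described here, while sizing from the effective box gives the smaller intended page.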

Testing a recent master build on Windows 10 Pro 64-bit en-US, which includes the upgrade to current poppler (ver 0.46) made for bug 101460, shows that the upgrade does not affect this aspect of the import filter.

Version: 5.3.0.0.alpha0+ (x64)
Build ID: 932804559e845fb8ec6ac3a3b49308136a7e81e6
CPU Threads: 8; OS Version: Windows 6.19; UI Render: GL; 
TinderBox: Win-x86_64@62-TDF, Branch:MASTER, Time: 2016-08-20_21:42:18
Locale: en-US (en_US); Calc: CL

Restating the issue and setting the status to NEW.

Otherwise, the Bell glyph (a PUA U+F041 symbol) comes from the subset-embedded MSOutlook symbol font; that is bug 101220, as are the other font layout issues.
Comment 5 V Stuart Foote 2016-08-21 12:52:30 UTC
@ekari, thanks for reporting this valid issue. But please do not attach complete documents as examples. It does not help the QA process. It is much better to extract a page or two, and provide *annotated* screen clips--especially if unable to extract pages from the example PDF.

And, *please* be mindful of copyright--the commercial documents you submit are clearly not covered by a Creative Commons license. Extracting a page or two meets "fair use" tests for copyright; attaching the whole document is questionable.
Comment 6 E.Mi 2016-08-21 19:14:47 UTC
@V Stuart Foote What software do you recommend for extracting a page without altering the contents? I tried GIMP and it removed the embedded fonts and other stuff.
Comment 7 V Stuart Foote 2016-08-21 20:55:17 UTC
I hold licenses for and use both iceni Infix PDF editor (v 6.50) and Adobe Acrobat (v 9.5.5); both companies are moving to subscription-based licensing for more current releases. Either one makes unadulterated page extractions of content, though each tweaks some of the metadata. There are other choices as well.

And on Linux a number of products will allow you to "extract" pages without structural changes to the content, but again with some metadata tweaks.

Master PDF Editor
https://code-industry.net/masterpdfeditor/

PDF Studio
http://www.qoppa.com/pdfstudio/
Comment 8 QA Administrators 2017-09-01 11:21:04 UTC Comment hidden (obsolete)
Comment 9 V Stuart Foote 2019-03-18 16:35:25 UTC
The issue of the pdfimport filter not correctly cropping the PDF page remains in a recent master/6.3.0alpha0+ build.

Version: 6.3.0.0.alpha0+
Build ID: 5fe551931d49a64ca4ea793a5016c098e41e84cd
CPU threads: 8; OS: Windows 10.0; UI render: default; VCL: win; 
Locale: en-US (en_US); UI-Language: en-US
Calc: CL
Comment 10 QA Administrators 2021-03-18 04:16:53 UTC Comment hidden (obsolete)
Comment 11 QA Administrators 2023-03-19 03:25:29 UTC Comment hidden (obsolete)
Comment 12 Dave Gilbert 2024-08-24 23:23:28 UTC
I'm working on a fix that, I think, happens to also fix this.
Your test file has been useful, though, since it's highlighting a bug in my fix, so I'll try and get that fixed first.
In short there's very little clipping in the pdf code!
Comment 13 V Stuart Foote 2024-08-25 11:19:13 UTC
(In reply to Dave Gilbert from comment #12)
> I'm working on a fix that, I think, happens to also fix this.
> Your test file has been useful, though, since it's highlighting a bug in my
> fix, so I'll try and get that fixed first.
> In short there's very little clipping in the pdf code!

@Dave G.

Thank you. As you proceed, perhaps check the See Also bug 86211 for applicability when working up your patch. And assign that issue to yourself as well.

Then, when you submit patches implementing clipping masks via git, please just include both tdf#101611 *and* tdf#86211 in the first line of the patch.
Comment 14 Dave Gilbert 2024-08-26 12:25:45 UTC
Side note: The example PDF here uses JPX/JPEG2000 embedded images; when we build our embedded libpoppler we don't enable that (since we don't have a JPEG2000 lib built?), so all the images in this document appear blank. It works fine when using Fedora/Debian's system libpoppler.
This message has been brought to you to warn my future self not to lose another day on it.
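
For anyone hitting the same blank-image symptom, a rough way to check whether a PDF uses JPX images is to look for the `/JPXDecode` filter name in the raw file bytes. This is only a heuristic sketch: `uses_jpx` is a hypothetical helper, not part of poppler or LibreOffice, and it assumes the filter name is written literally in the image stream dictionaries, which is usual but not guaranteed.

```python
def uses_jpx(pdf_bytes: bytes) -> bool:
    """Heuristic: True if the raw PDF data names the JPXDecode (JPEG 2000) filter."""
    return b"/JPXDecode" in pdf_bytes


# Tiny stand-ins for image stream dictionaries, just to exercise the helper.
jpx_dict = b"<< /Type /XObject /Subtype /Image /Filter /JPXDecode >>"
jpeg_dict = b"<< /Type /XObject /Subtype /Image /Filter /DCTDecode >>"
print(uses_jpx(jpx_dict), uses_jpx(jpeg_dict))
```

On a real file, the bytes would come from reading the PDF in binary mode, for example with `pathlib.Path(...).read_bytes()`.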
Comment 15 Commit Notification 2024-08-29 12:30:17 UTC
Dr. David Alan Gilbert committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/b416c5b8e32632a63e1e791c34896e17d89f7982

tdf#101611, tdf#108813, tdf#86211, sdext,pdfimport: Clip fills

It will be available in 25.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 16 Dave Gilbert 2024-08-29 13:02:15 UTC
The commit just pushed to master seems to fix this fine.