Bug 101563 - Export to PDF with linked images creates huge PDF files.
Summary: Export to PDF with linked images creates huge PDF files.
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Printing and PDF export
Version (earliest affected): 5.2.0.4 release
Hardware: All All
Importance: medium major
Assignee: Not Assigned
URL:
Whiteboard: target:5.4.0 target:5.3.0.1 target:5.2.5
Keywords: bibisected, bisected, filter:pdf, regression
Depends on:
Blocks: PDF-Export
Reported: 2016-08-16 16:19 UTC by Paddy Landau
Modified: 2017-02-01 16:51 UTC
CC List: 8 users

See Also:
Crash report or crash signature:


Attachments
Sample document and image (2.04 MB, application/zip)
2016-08-16 16:19 UTC, Paddy Landau

Description Paddy Landau 2016-08-16 16:19:15 UTC
Created attachment 126857
Sample document and image

SUMMARY

When exporting a document with a linked image to PDF, the resulting PDF file is huge.

In my tests, it varies between 6 and 10 times larger than required.

STEPS TO REPRODUCE

1. Create a document with at least one linked image. (A sample document and image are provided in the attachment.)

2. Export to PDF using lossless export.

WHAT IS EXPECTED

• The exported PDF should be a size commensurate with the original document and linked images.

WHAT HAPPENS INSTEAD

• The PDF file is many times larger.

• In my tests, PDF files are between 6 and 10 times larger than from previous LibreOffice versions.

• In the attached sample, the PDF from previous versions is consistently 2.1 MB, but from version 5.2.0 it is ten times larger at 21 MB.

MORE INFORMATION

• Tested on versions 5.0.6, 5.1.4, 5.1.5 and 5.2.0.
  — Versions 5.0.6, 5.1.4 and 5.1.5 all work correctly.
  — Version 5.2.0 has this bug.

• This is a regression: the same problem happened a long time ago and was fixed. I haven't tested versions prior to 5.0.6 because I don't know how to obtain them.

• This might seem minor, but it is significant when the resulting PDF file greatly exceeds 100 MB (instead of the expected 20 MB), the PDF file is a website download, and there are many PDF files.

• An expected workaround might be to use embedded images instead of linked images, but that is not a sensible option when the images can change.
Comment 1 Julien Nabet 2016-08-20 13:39:34 UTC
On a Debian x86-64 PC with master sources updated today, I could reproduce this.

I noticed this on console when loading odt file:
warn:legacy.osl:6272:1:sw/source/core/graphic/ndgrf.cxx:596: Cannot swap in graphic
Comment 2 Paddy Landau 2016-08-20 14:56:10 UTC
@Julien Nabet, thank you for confirming this.

I don't get any error on the console, except for a segfault when closing LibreOffice:

terminate called after throwing an instance of 'com::sun::star::uno::DeploymentException'

But that has nothing to do with the document or PDF.
Comment 3 Cor Nouws 2016-08-22 10:26:22 UTC
Hi Paddy,

So you are basically saying that in some previous version, an exported PDF linked to the images instead of embedding them? Are you sure?

(Isn't this the problem from bug 99723?)
Comment 4 Paddy Landau 2016-08-22 16:00:00 UTC
@Cor, no, the final PDF contains the files.

What I'm saying is that Export to PDF somehow inflates the file size dramatically. Versions 5.0.6 to 5.1.5 do not do this.

Bug 99723 looks similar. It may be the same, but your experience was not as dramatic as mine, where the sizes were at least six times larger.
Comment 5 Julien Nabet 2016-08-22 19:22:53 UTC
Let's set this back to NEW, since the bug has been confirmed.
Comment 6 Cor Nouws 2016-08-25 13:17:06 UTC
(In reply to Paddy Landau from comment #4)

> Bug 99723 looks similar. It may be the same, but your experience was not as
> dramatic as mine, where the sizes were at least six times larger.

I tested with 3.3.0.4
The difference between the compressed and uncompressed image is 259 kB vs. 9 MB.
So definitely a huge difference.
I am setting this as a duplicate of bug 99723.

*** This bug has been marked as a duplicate of bug 99723 ***
Comment 7 Paddy Landau 2016-08-25 17:33:50 UTC
@Cor — this is not a duplicate of bug 99723.

Bug 99723: File size is not reduced with required compression, but acts as if lossless was specified.

Bug 101563: File size is made 6–10 times larger when asking for lossless, doing the opposite of compression. (It's as if the JPG files were converted to BMP.)

--------------------------

In the sample given in this bug, the original files are
• ODT 0.1 MB
• image 2 MB
• total 2.2 MB (discrepancy due to rounding).

The PDF from LO 5.1.5.2 is correct at 2.1 MB, whereas the PDF from 5.2.0.4 is an astonishing 20.6 MB.

Using compression 90% instead of lossless, the sample from bug 99723 results in an acceptable 2.1 MB from both LO 5.1.5.2 and 5.2.0.4.

--------------------------

So, this is not the same problem as reported in bug 99723.
Comment 8 Xisco Faulí 2016-08-30 16:46:43 UTC
Issue introduced in range 28ac7d0f0cea9067d7faba3b72a164729df26e5d..c58655c5a221d986fa3c3eed2f28810269205721
Comment 9 Paddy Landau 2016-09-29 14:50:06 UTC
I confirm that LO 5.2.2.2 (released today) still has the bug.
Comment 10 Cor Nouws 2016-10-27 10:17:36 UTC
Hi Paddy,

Any chance you could test a daily build?
thanks
Comment 11 Paddy Landau 2016-10-27 10:39:07 UTC
@Cor Nouws
I've just tried running the latest Daily, but unfortunately it crashes on startup on my machine. I'll try again tomorrow with a new build.
Comment 12 Paddy Landau 2016-10-28 09:36:56 UTC
@Cor Nouws

I have tested with today's version (28-Oct-2016 04:19), and unfortunately it still has the same problem.

LibreOfficeDev 5.3.0.0.alpha1 eb07ae8fc52378d9b59bcb6a7df8bb022b8b9cc0
Comment 13 James Murray 2016-11-12 22:58:37 UTC
I'm seeing an issue with PDF size too; could this be a different bug, though?

With the same source file using jpg compression at 90% and 600dpi I get a file three times larger than I used to.

-rw-rw-rw- 1 jsm jsm 11709486 Nov 12 22:46 MS3baseV30_Hardware-1.4-2015-10-12-lo5.0.0.5.pdf
-rw-rw-rw- 1 jsm jsm 36890076 Nov 12 22:55 MS3baseV30_Hardware-1.4-2015-10-12-lo5.1.2.2.pdf
-rw-rw-rw- 1 jsm jsm 37589447 Nov 12 22:47 MS3baseV30_Hardware-1.4-2015-10-12-lo5.2.2.2.pdf
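For reference, the byte counts in the listing above work out to these ratios against the 5.0.0.5 export, which matches the "three times larger" estimate:

```latex
\[
\frac{36\,890\,076}{11\,709\,486} \approx 3.15 \quad (\text{5.1.2.2}),
\qquad
\frac{37\,589\,447}{11\,709\,486} \approx 3.21 \quad (\text{5.2.2.2})
\]
```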
Comment 14 Paddy Landau 2016-11-13 15:27:37 UTC
@James Murray
I don't know if this bug is related. Could it be related to bug 99723?
It might be worth your while to test the development version 5.3.
Comment 15 Xisco Faulí 2016-11-14 09:58:05 UTC
Yes, it seems a duplicate of bug 99723.
I can no longer reproduce it in 

Version: 5.3.0.0.alpha1+
Build ID: fef32a42c8bd8fd640d6c9cdc2f839fb43ad490c
CPU Threads: 4; OS Version: Linux 4.8; UI Render: GL; VCL: gtk3; Layout Engine: new; 
Locale: ca-ES (ca_ES.UTF-8); Calc: group

*** This bug has been marked as a duplicate of bug 99723 ***
Comment 16 Paddy Landau 2016-11-14 10:22:25 UTC
Please do not mark this as a duplicate of bug 99723. It is for something quite different.
Comment 17 Xisco Faulí 2016-11-14 10:24:50 UTC
Why is it different?
In

Version: 5.3.0.0.alpha1+
Build ID: fef32a42c8bd8fd640d6c9cdc2f839fb43ad490c
CPU Threads: 4; OS Version: Linux 4.8; UI Render: GL; VCL: gtk3; Layout Engine: new; 
Locale: ca-ES (ca_ES.UTF-8); Calc: group

the output file size is 89.2 kB (89215 bytes).
Comment 18 Paddy Landau 2016-11-14 10:28:16 UTC
@Xisco Fauli — see comment #7.
Comment 19 Xisco Faulí 2016-11-14 10:51:02 UTC
Even though you say it's not a duplicate of bug 99723, it can be closed as RESOLVED WORKSFORME, since it's no longer reproducible in

Version: 5.3.0.0.alpha1+
Build ID: fef32a42c8bd8fd640d6c9cdc2f839fb43ad490c
CPU Threads: 4; OS Version: Linux 4.8; UI Render: GL; VCL: gtk3; Layout Engine: new; 
Locale: ca-ES (ca_ES.UTF-8); Calc: group
Comment 20 Paddy Landau 2016-11-14 11:52:11 UTC
@Xisco Fauli — I have just downloaded the latest development version:

14 November 2016 LibreOfficeDev 5.3.0.0.alpha1

Linux Ubuntu 16.04 64-bit:
Build 2559ab66fd2976df54fc7d66bac5b7c0f7c23370

Windows 10 64-bit:
Build c5f5b3e5334c52502c1de28828a44ad469c68850

I am still getting this error on both Linux and Windows 10.

Did you check with embedded images or linked images? You can try the sample that I attached to the initial report. Again, see comment #7 for details.

Reopening.
Comment 21 Aron Budea 2016-11-15 09:25:47 UTC
I could also still reproduce it with the same build as Xisco's (home built, but the same commit). The file size of the exported PDF of the attached sample is ~20MB.

Version: 5.3.0.0.alpha1+
Build ID: fef32a42c8bd8fd640d6c9cdc2f839fb43ad490c
CPU Threads: 4; OS Version: Windows 6.1; UI Render: default; Layout Engine: new; 
Locale: hu-HU (hu_HU); Calc: CL
Comment 22 raal 2016-11-19 18:49:28 UTC
This seems to have begun at the below commit.
Adding Cc: to Noel Grandin; Could you possibly take a look at this one? Thanks

author	Noel Grandin <noel@peralex.com>	2016-04-13 09:30:11 (GMT)
committer	Noel Grandin <noel@peralex.com>	2016-04-13 11:27:53 (GMT)
commit	19b34c0039c6293f9b37aa70f8055aa2be28ba09 (patch)
tree	04463a78141cd94ee70cd463ba7687993410c276
parent	fe8896bab01ccb595c993e54866a01f554b54f4f (diff)
loplugin:passstuffbyref in svtools

e46c94bf8b05440ece9d69b09c253d9aab6d4f6b is the first bad commit
commit e46c94bf8b05440ece9d69b09c253d9aab6d4f6b
Author: Norbert Thiebaud <nthiebaud@gmail.com>
Date:   Fri Apr 22 22:59:28 2016 -0700

    source sha:19b34c0039c6293f9b37aa70f8055aa2be28ba09
	
 git bisect log
# bad: [6380ca07b05f68dedcaa379302cfe1fa478571c4] source sha:60b74fe1775e647545d2da1fcc58a4c63ec18aa5
# good: [1f670510f08cb800cbae2a1dd6ea70d3542e4721] source sha:49c2b9808df8a6b197dec666dfc0cda6321a4306
git bisect start 'origin/master' 'oldest'
# good: [38f37b8ec1a2d199bb957cfd2581df7d1b273b74] source sha:c0da1080b61a1d51654fc34fdaeba373226065ff
git bisect good 38f37b8ec1a2d199bb957cfd2581df7d1b273b74
# good: [11ae494d8c566f23e0ef84ba0cc25fb1388b67f7] source sha:470cfa9860232ab70e017e6084d80f80d469555c
git bisect good 11ae494d8c566f23e0ef84ba0cc25fb1388b67f7
# bad: [ee4cfd75d2452b8c416b4ec27358f7a905d6f5cf] source sha:aa544a002e534a313ad9dd365e80f052789d9963
git bisect bad ee4cfd75d2452b8c416b4ec27358f7a905d6f5cf
# bad: [c59865b07f405048acae57452454009f8bc50235] source sha:b477a9e0b620a5e1c709e404c5a4e816ef5794f1
git bisect bad c59865b07f405048acae57452454009f8bc50235
# bad: [d23917903409b837fede67cc707378f23af45806] source sha:d9508c82330ffce6b20fb7ed13c7bcc01f298053
git bisect bad d23917903409b837fede67cc707378f23af45806
# bad: [f0f1ed701513ccddfe6e05c054a5c2172651d941] source sha:b8eb2946511ce617323b13dffe2b1d9704e0be60
git bisect bad f0f1ed701513ccddfe6e05c054a5c2172651d941
# good: [416a6423bc982d3e9b86f5966ad3d23debe8fd85] source sha:32102b9aa75a296b99f3fdaf370bd83bfd629f4e
git bisect good 416a6423bc982d3e9b86f5966ad3d23debe8fd85
# bad: [352ff855ee0cee178f1b605421ae6d35fee32c46] source sha:9a31442171cf8bd79574c318d91ef220ee7389bb
git bisect bad 352ff855ee0cee178f1b605421ae6d35fee32c46
# good: [393a7d3fd679779c61bdfcee1ee0e6d1ca04d5fb] source sha:299d938bf05faf60b848a9d4862e58bb42db3e65
git bisect good 393a7d3fd679779c61bdfcee1ee0e6d1ca04d5fb
# bad: [01622b95dbce9171197721077f3a710a76891a9d] source sha:ebe94af4eca68360c99f3421f1298f94747de003
git bisect bad 01622b95dbce9171197721077f3a710a76891a9d
# good: [20952a1b02c5a66c575eab2a20950876187b8c5f] source sha:523036daaddf466eee46183bbec9a71d45c48a41
git bisect good 20952a1b02c5a66c575eab2a20950876187b8c5f
# bad: [e46c94bf8b05440ece9d69b09c253d9aab6d4f6b] source sha:19b34c0039c6293f9b37aa70f8055aa2be28ba09
git bisect bad e46c94bf8b05440ece9d69b09c253d9aab6d4f6b
# good: [e5c4e40d209dd676441a205320dfd0bd68a331d4] source sha:fe8896bab01ccb595c993e54866a01f554b54f4f
git bisect good e5c4e40d209dd676441a205320dfd0bd68a331d4
# first bad commit: [e46c94bf8b05440ece9d69b09c253d9aab6d4f6b] source sha:19b34c0039c6293f9b37aa70f8055aa2be28ba09
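The bisect log above amounts to a binary search over the ordered bibisect builds for the first one where the bug appears. A minimal C++ sketch of that search, assuming the builds are ordered oldest to newest, the first is known good, the last is known bad, and the bug stays present once introduced. `FindFirstBad` and its predicate are hypothetical names for illustration; they are not part of git or the bibisect repositories.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Binary search for the first bad build, in the spirit of "git bisect".
// Assumptions: rBuilds is ordered oldest to newest, rBuilds.front() is good,
// rBuilds.back() is bad, and there is a single good-to-bad transition.
std::size_t FindFirstBad(const std::vector<std::string>& rBuilds,
                         const std::function<bool(const std::string&)>& rIsBad)
{
    assert(rBuilds.size() >= 2);
    std::size_t nGood = 0;                      // invariant: rBuilds[nGood] is good
    std::size_t nBad = rBuilds.size() - 1;      // invariant: rBuilds[nBad] is bad
    while (nBad - nGood > 1)
    {
        const std::size_t nMid = nGood + (nBad - nGood) / 2;
        if (rIsBad(rBuilds[nMid]))
            nBad = nMid;                        // like "git bisect bad"
        else
            nGood = nMid;                       // like "git bisect good"
    }
    return nBad;                                // rBuilds[nBad - 1] is good, rBuilds[nBad] is bad
}
```

Each bibisect build can bundle several source commits, so the result only narrows things to the commits between `rBuilds[nBad - 1]` and `rBuilds[nBad]`; that is the question raised in comment 23 and answered in comment 24. If the bug has more than one cause, the single-transition assumption no longer holds.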
Comment 23 Noel Grandin 2016-12-08 05:48:21 UTC
@raal reverting that commit didn't fix this problem for me.

Are you sure you didn't hit a range of commits when bibisecting?
Comment 24 raal 2016-12-08 16:40:11 UTC
(In reply to Noel Grandin from comment #23)
> @raal reverting that commit didn't fix this problem for me.
> 
> Are you sure you didn't hit a range of commits when bibisecting?

Hello Noel,
I retested with the repo ~/bibisect-win32-5.2:

$ git checkout e46c94bf8b05440ece9d69b09c253d9aab6d4f6b
Checking out files: 100% (21872/21872), done.
Previous HEAD position was 1f67051... source sha:49c2b9808df8a6b197dec666dfc0cda6321a4306
HEAD is now at e46c94b... source sha:19b34c0039c6293f9b37aa70f8055aa2be28ba09

The bug is present here.

$ git checkout HEAD~1
Checking out files: 100% (83/83), done.
Previous HEAD position was e46c94b... source sha:19b34c0039c6293f9b37aa70f8055aa2be28ba09
HEAD is now at e5c4e40... source sha:fe8896bab01ccb595c993e54866a01f554b54f4f

The bug is not present here, so the bibisect result should be correct.

The ~/bibisect-win32-5.2 repo is a "max" repo, so each build contains only one commit.
Comment 25 Noel Grandin 2016-12-10 12:58:02 UTC
Is this a Windows-only bug? I'm having trouble finding a point in time where this __works__ on Linux, and I've gone back 500 revisions from e46c94bf8b05440ece9d69b09c253d9aab6d4f6b.
Comment 26 Paddy Landau 2016-12-10 13:37:23 UTC
@Noel Grandin — No, both Windows and Linux have the bug. I don't have access to a Mac, so I can't test the Mac version.

LO versions 5.0.6, 5.1.4, 5.1.5 and 5.1.6.2 definitely all work correctly.

The bug was originally present in an old version — I don't recall which one, unfortunately — and has recurred starting with version 5.2.0.
Comment 27 Cor Nouws 2016-12-10 19:56:36 UTC
(In reply to Noel Grandin from comment #25)
> Is this a windows only bug? Having trouble finding a point in time where
> this __works__ on Linux and I've gone back 500 revisions from
> e46c94bf8b05440ece9d69b09c253d9aab6d4f6b

I have not reproduced the bug on Linux.
Comment 28 Aron Budea 2016-12-11 05:55:56 UTC
If I revert this line:
const OUString&         GetLink() const { return maLink; }
http://opengrok.libreoffice.org/xref/core/include/svtools/grfmgr.hxx#392

to:
OUString                GetLink() const { return maLink; }

...the exported PDF returns to its normal size. Quite interesting.
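One way to see why the getter's return type could matter (the actual mechanism is traced in comment 45): a `const OUString&` result refers to the member `maLink` itself, so it follows any later change to that member, whereas a by-value result is an independent copy. The sketch below is a minimal illustration under stated assumptions: `std::string` stands in for `OUString`, and `LinkHolder` and `ObserveAfterClear` are hypothetical names, not LibreOffice code.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical minimal holder, not the real svtools GraphicObject;
// std::string stands in for OUString.
struct LinkHolder
{
    std::string maLink;

    const std::string& GetLinkRef() const { return maLink; }  // refers to the member itself
    std::string GetLinkValue() const { return maLink; }       // independent copy
};

// Takes one result of each kind, then clears the member (as a later setter
// might) and reports what each result holds afterwards.
std::pair<std::string, std::string> ObserveAfterClear()
{
    LinkHolder aHolder{"image.jpg"};
    const std::string& rRef = aHolder.GetLinkRef();
    std::string aValue = aHolder.GetLinkValue();
    aHolder.maLink.clear();
    return {rRef, aValue};   // first: "" (follows the member), second: "image.jpg"
}
```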
Comment 29 raal 2016-12-11 14:20:46 UTC
*** Bug 104479 has been marked as a duplicate of this bug. ***
Comment 30 Paddy Landau 2016-12-11 15:03:05 UTC
Please note that bug 104479 is a duplicate of bug 99723, not of this bug, which is for something else (please see comment #7). I shall mark the bugs as appropriate.
Comment 31 Steve Edmonds 2016-12-11 18:30:50 UTC
Bug 104479 does not seem to be a duplicate of either 101563 or 99723, but it may be related.
In my instance I have no linked images.
Changing the compression does reduce file size.
Comment 32 Julien Nabet 2016-12-11 20:05:02 UTC
(In reply to Aron Budea from comment #28)
> If I revert this line:
> const OUString&         GetLink() const { return maLink; }
> http://opengrok.libreoffice.org/xref/core/include/svtools/grfmgr.hxx#392
> 
> to:
> OUString                GetLink() const { return maLink; }
> 
> ...the exported PDF returns to its normal size. Quite interesting.

On a Debian x86-64 PC with master sources updated today, I could reproduce the initial problem. I tested the revert, but it doesn't change anything. (I must admit I only ran "make svl.build".)
Do you confirm the effect of this change on your PC?
Comment 33 Julien Nabet 2016-12-11 20:06:23 UTC
(In reply to Julien Nabet from comment #32)
> (In reply to Aron Budea from comment #28)
> > If I revert this line:
> > const OUString&         GetLink() const { return maLink; }
> > http://opengrok.libreoffice.org/xref/core/include/svtools/grfmgr.hxx#392
> > 
> > to:
> > OUString                GetLink() const { return maLink; }
> > 
> > ...the exported PDF returns to its normal size. Quite interesting.
> 
> On a Debian x86-64 PC with master sources updated today, I could reproduce the
> initial problem. I tested the revert, but it doesn't change anything. (I must
> admit I only ran "make svl.build".)
> Do you confirm the effect of this change on your PC?

Oops, this header is included in many places, not just svl. I need to run "make" at the root.
Comment 34 Aron Budea 2016-12-11 21:04:13 UTC
(In reply to Julien Nabet from comment #33)
> Oops, this header is included in many places, not just svl. I need to run
> "make" at the root.

Yes, it's called from a lot of places. As mentioned in comment 28, this reversion fixed PDF export for me. I wanted to look into it further, because it's a very peculiar issue, but compilation took quite some time, and I haven't had the opportunity.
Comment 35 Julien Nabet 2016-12-11 23:21:35 UTC
(In reply to Aron Budea from comment #34)
> (In reply to Julien Nabet from comment #33)
> > Oups, this file is called at many places, not just svl. I must run "make" at
> > root.
> 
> Yes, it's called from a lot of places. As mentioned in comment 28, this
> reversion fixed PDF export for me. I wanted to look into it further, because
> it's a very peculiar issue, but compilation took quite some time, and I
> haven't had the opportunity.

The build takes so long that it seems like building from scratch. I'm giving up on this one.
It's far too long just to test this small change, especially since I don't see how it could affect the problem.
Comment 36 Aron Budea 2016-12-12 04:07:30 UTC
Julien, no worries, thanks for giving it a try.

I tested, and could reproduce the bug with 5.3beta2 / Ubuntu 16.04 (so, in Linux).

Noel, for reproduction please note that the settings have to be changed as shown in the first image (set to Lossless compression, remove checkbox from Reduce image resolution).
Comment 37 Noel Grandin 2016-12-12 07:53:49 UTC
I'm going to push a patch with Aron's suggested change.

Would someone mind bibisecting this one on Linux? - I suspect we may have more than one cause here. Thanks.
Comment 38 Commit Notification 2016-12-12 09:03:42 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=b7f92a21a458fc6fa68894fbc881eda0a1e8325e

tdf#101563 - Export to PDF with linked images creates huge PDF files.

It will be available in 5.4.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 39 Paddy Landau 2016-12-13 14:07:36 UTC
I have tested today's current version, and I'm pleased to report that the bug has been fixed.

Version: 5.4.0.0.alpha0+
Build ID: 08fa2e9307c9e4a49e18ecb0b4e9461492122fe3

Thank you, everyone who helped.
Comment 40 Steve Edmonds 2016-12-13 19:23:54 UTC
If you still have Version: 5.4.0.0.alpha0+ installed, can I ask if you could please test against the file in comment 3 of bug 104479 to see the impact. 
I have noticed the PDF size growing progressively from 5.0.6.3 to 5.2.4.1: 2.5 MB => 6.3 MB => 11 MB. Settings: 90% compression, images reduced to 300 DPI, only "Export bookmarks" checked.
Comment 41 Commit Notification 2016-12-13 19:40:32 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "libreoffice-5-3":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=871d610bc9b162ae68b263d857cf4168d124d180&h=libreoffice-5-3

tdf#101563 - Export to PDF with linked images creates huge PDF files.

It will be available in 5.3.0.1.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 42 Julien Nabet 2016-12-13 19:42:15 UTC
Backport on 5.3 branch: https://gerrit.libreoffice.org/#/c/31973/1
Backport on 5.2 branch, on review: see https://gerrit.libreoffice.org/#/c/31974/

Let's set this one to FIXED now.
Comment 43 Steve Edmonds 2016-12-13 20:24:13 UTC
I have not delved this far forward in LO development before. If I wanted to check the effect of the backport to the 5.3 branch, when and where would that be built?
Would it be in http://dev-builds.libreoffice.org/daily/libreoffice-5-3/Win-x86@62-merge-TDF/current/, which has a 5-day-old build of 5.3.0.0.beta1?
Comment 44 Julien Nabet 2016-12-13 20:37:39 UTC
(In reply to Steve Edmonds from comment #43)
> I have not delved this far forward in LO development before. If I wanted to
> check the effect of the backport to the 5.3 branch, when and where would that
> be built?
> Would it be in
> http://dev-builds.libreoffice.org/daily/libreoffice-5-3/Win-x86@62-merge-TDF/
> current/, which has a 5-day-old build of 5.3.0.0.beta1?

In general you must wait 24 to 48 hours (for the Linux build); sometimes it takes longer, as seems to be the case for Windows builds. You can check whether a daily build includes the commit by looking at the daily build's "build id".

E.g., if you go to http://dev-builds.libreoffice.org/daily/libreoffice-5-3/Win-x86_64@62-TDF/current/
there's a build info text file:
libreoffice-5-3~2016-12-08_16.10.30_build_info.txt
Reading its first lines, you'll find this:
core:7f47d68c4310b8bae09286a81036a6fa669a1705

Now, go to this URL to see all the commits of the 5.3 branch:
https://cgit.freedesktop.org/libreoffice/core/log/?h=libreoffice-5-3
1) Change list entry from "log msg" to "range"
2) In the blank area, copy paste 7f47d68c4310b8bae09286a81036a6fa669a1705
3) Click "Search" button
=> you'll see the last commit included in the build.
Here, it's https://cgit.freedesktop.org/libreoffice/core/commit/?h=libreoffice-5-3&id=871d610bc9b162ae68b263d857cf4168d124d180

Perhaps there's a faster way but I don't know it.
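The manual check above can be sketched as two small C++ helpers, under stated assumptions: the build info file contains a `core:<sha>` line as quoted above, the branch log is available as a list of full commit ids ordered newest first (as the cgit log page lists them), and the branch history is linear. `ExtractCoreSha` and `BuildContainsCommit` are hypothetical names; nothing here talks to cgit or git.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Assumption: the build info text has a line "core:<sha>"; other lines are ignored.
std::optional<std::string> ExtractCoreSha(const std::string& rBuildInfo)
{
    const std::string aPrefix = "core:";
    std::istringstream aStream(rBuildInfo);
    std::string aLine;
    while (std::getline(aStream, aLine))
    {
        if (!aLine.empty() && aLine.back() == '\r')
            aLine.pop_back();                       // tolerate Windows line endings
        if (aLine.compare(0, aPrefix.size(), aPrefix) == 0)
            return aLine.substr(aPrefix.size());
    }
    return std::nullopt;
}

// Assumption: rBranchLog holds full commit ids, newest first, on a linear branch.
// The build includes the fix if the fix is the build commit or older than it.
bool BuildContainsCommit(const std::vector<std::string>& rBranchLog,
                         const std::string& rBuildSha, const std::string& rFixSha)
{
    const std::size_t nNotFound = rBranchLog.size();
    std::size_t nBuild = nNotFound;
    std::size_t nFix = nNotFound;
    for (std::size_t i = 0; i < rBranchLog.size(); ++i)
    {
        if (rBranchLog[i] == rBuildSha)
            nBuild = i;
        if (rBranchLog[i] == rFixSha)
            nFix = i;
    }
    if (nBuild == nNotFound || nFix == nNotFound)
        return false;                               // unknown commit: cannot tell
    return nFix >= nBuild;                          // larger index means older commit
}
```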
Comment 45 Aron Budea 2016-12-14 01:10:11 UTC
I think I found where this is coming from.
I put a breakpoint in the GetLink() function mentioned above, which was called here while opening the file:

pSwGrfNode->SetGraphic(aGrf, rGrfObj.GetLink());
http://opengrok.libreoffice.org/xref/core/sw/source/core/docnode/swbaslnk.cxx#166

Then I followed a few levels deeper:

void GraphicObject::SetGraphic( const Graphic& rGraphic, const OUString& rLink )
{
    SetGraphic( rGraphic );
    maLink = rLink;
}

The problem here is that maLink and rLink refer to the same string object, and SetGraphic( rGraphic ) clears maLink, so the link is lost.

I'd change the line above to this:

pSwGrfNode->SetGraphic(aGrf, OUString(rGrfObj.GetLink()));

(maybe with a comment mentioning it's intentional, since rGrfObj is coming from pSwGrfNode a couple of lines earlier)

I tested this particular change in Windows with the previous version of GetLink() (returning reference), and it fixed the size of PDF export for me in Windows 7.
Noel, would you mind updating the fix?
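A minimal C++ sketch of the aliasing described in comment 45, assuming (as stated there) that the one-argument `SetGraphic` clears `maLink`. `GraphicObjectSketch` and `ReloadGraphic` are hypothetical; `std::string` replaces `OUString` and an `int` replaces `Graphic`, so this models only the link handling, not the real svtools class.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for GraphicObject, modelling only the link handling.
class GraphicObjectSketch
{
public:
    const std::string& GetLink() const { return maLink; }   // reference-returning getter

    void SetGraphic(int nGraphic)
    {
        mnGraphic = nGraphic;
        maLink.clear();                  // per comment 45: SetGraphic(rGraphic) clears maLink
    }

    void SetGraphic(int nGraphic, const std::string& rLink)
    {
        SetGraphic(nGraphic);            // if rLink refers to maLink, it is now empty
        maLink = rLink;                  // self-assignment of the empty string: link lost
    }

private:
    int mnGraphic = 0;
    std::string maLink;
};

// Mirrors pSwGrfNode->SetGraphic(aGrf, rGrfObj.GetLink()) where rGrfObj is the
// node's own graphic object; bCopyAtCallSite mirrors OUString(rGrfObj.GetLink()).
std::string ReloadGraphic(bool bCopyAtCallSite)
{
    GraphicObjectSketch aObj;
    aObj.SetGraphic(1, "image.jpg");
    if (bCopyAtCallSite)
        aObj.SetGraphic(2, std::string(aObj.GetLink()));  // temporary copy, independent of maLink
    else
        aObj.SetGraphic(2, aObj.GetLink());               // argument aliases maLink
    return aObj.GetLink();
}
```

Passing `false` reproduces the lost link; passing `true` corresponds to the call-site copy proposed above. The drawback of a call-site fix is that any other caller passing the object's own link needs the same care, which is why the following comments move the fix into `SetGraphic` itself.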
Comment 46 Noel Grandin 2016-12-14 06:44:57 UTC
Maybe if we changed it to detect self-assignment?

void GraphicObject::SetGraphic( const Graphic& rGraphic, const OUString& rLink )
{
    // avoid self-assignment, because SetGraphic clears maLink
    if ( rGraphic != this->maGraphic && rLink != this->maLink)
    {
        SetGraphic( rGraphic );
        maLink = rLink;
    }
}
Comment 47 Noel Grandin 2016-12-14 07:26:20 UTC
Actually that should be

void GraphicObject::SetGraphic( const Graphic& rGraphic, const OUString& rLink )
{
    // avoid self-assignment, because SetGraphic clears maLink
    if ( rGraphic != this->maGraphic || rLink != this->maLink)
    {
        SetGraphic( rGraphic );
        maLink = rLink;
    }
}
Comment 48 Paddy Landau 2016-12-14 08:44:00 UTC
@Steve Edmonds re comment #40

I have posted my results on bug #104479 comment #17.
Comment 49 Commit Notification 2016-12-14 08:54:43 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "libreoffice-5-2":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=e8b9fb81685db158b8b1285b2de627573a31ed76&h=libreoffice-5-2

tdf#101563 - Export to PDF with linked images creates huge PDF files.

It will be available in 5.2.5.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 50 Aron Budea 2016-12-15 06:12:30 UTC
(In reply to Noel Grandin from comment #47)
> void GraphicObject::SetGraphic( const Graphic& rGraphic, const OUString&
> rLink )
> {
>     // avoid self-assignment, because SetGraphic clears maLink
>     if ( rGraphic != this->maGraphic || rLink != this->maLink)
>     {
>         SetGraphic( rGraphic );
>         maLink = rLink;
>     }
> }

You're right that this seems to be the best place to deal with this potential issue. I wouldn't combine the two conditions, though: if somehow the first is true (so rGraphic != this->maGraphic) but the second is false (so rLink == this->maLink, or rather &rLink == &this->maLink), the bug is still triggered.

This might never happen with the current surrounding code, but can it be ruled out completely?

How about this:

void GraphicObject::SetGraphic( const Graphic& rGraphic, const OUString& rLink )
{
    // avoid self-assignment, because SetGraphic clears maLink
    if (rGraphic == this->maGraphic)
    {
        maLink = rLink;
    }
    else if (&rLink != &this->maLink)
    {
        SetGraphic( rGraphic );
        maLink = rLink;
    }
    else
    {
        OUString aLinkCopy( rLink );
        SetGraphic( rGraphic );
        maLink = aLinkCopy;
    }
}

(I haven't tested the code)
Comment 51 Noel Grandin 2016-12-15 06:48:31 UTC
Good point Aron. Something like this is probably simpler:

 void GraphicObject::SetGraphic( const Graphic& rGraphic, const OUString& rLink )
 {
     // in case we are called from a situation where rLink and maLink are the same thing,
     // we need a copy because SetGraphic clears maLink
     OUString sLinkCopy = rLink;
     SetGraphic( rGraphic );
     maLink = sLinkCopy;
 }
Comment 52 Aron Budea 2016-12-15 22:16:01 UTC
Much simpler indeed. Looks good to me.
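To compare the guard from comment 47 with the copy-first version from comment 51 on the case that matters (a new graphic together with the object's own link), here is a hedged sketch using the same simplified model: `std::string` for `OUString`, `int` for `Graphic`, and a one-argument `SetGraphic` that clears `maLink`. The class and function names are hypothetical and only illustrate the two approaches.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for GraphicObject, modelling only the link handling.
class GraphicObjectSketch
{
public:
    const std::string& GetLink() const { return maLink; }

    void SetGraphic(int nGraphic)
    {
        mnGraphic = nGraphic;
        maLink.clear();                  // the one-argument setter drops the link
    }

    // Variant from comment 47: skip only when nothing changes.
    void SetGraphicGuarded(int nGraphic, const std::string& rLink)
    {
        if (nGraphic != mnGraphic || rLink != maLink)
        {
            SetGraphic(nGraphic);        // still clears rLink when it aliases maLink
            maLink = rLink;
        }
    }

    // Variant from comment 51: copy first, then clear, then assign.
    void SetGraphicCopyFirst(int nGraphic, const std::string& rLink)
    {
        std::string sLinkCopy = rLink;   // taken before maLink can be cleared
        SetGraphic(nGraphic);
        maLink = sLinkCopy;
    }

private:
    int mnGraphic = 0;
    std::string maLink;
};

// New graphic, same (aliased) link: the situation from swbaslnk.cxx.
std::string ReloadWithGuard()
{
    GraphicObjectSketch aObj;
    aObj.SetGraphicCopyFirst(1, "image.jpg");
    aObj.SetGraphicGuarded(2, aObj.GetLink());
    return aObj.GetLink();
}

std::string ReloadWithCopy()
{
    GraphicObjectSketch aObj;
    aObj.SetGraphicCopyFirst(1, "image.jpg");
    aObj.SetGraphicCopyFirst(2, aObj.GetLink());
    return aObj.GetLink();
}
```

`ReloadWithGuard()` returns an empty link, matching Aron's concern in comment 50, while `ReloadWithCopy()` keeps `"image.jpg"`. The copy-first version costs one string copy per call (cheap for the reference-counted `OUString`) but needs no reasoning about when the arguments alias members.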
Comment 53 Julien Nabet 2017-01-13 18:05:26 UTC
Aron/Noel: would one of you have a little time to submit a patch to gerrit with the change discussed in the last comments?
(I could do it too if you want, just tell me :-))
Comment 54 Commit Notification 2017-01-13 18:23:48 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=24fa5d0570b997cc92f1fdf412f517f8d4021207

better fix for tdf#101563: Export to PDF creates huge PDF files

It will be available in 5.4.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 55 Paddy Landau 2017-01-27 15:07:18 UTC
I have installed yesterday's release 5.2.5.1.

I'm pleased to say that this bug has been fixed!

Thank you, everyone who played a part in fixing this bug.
Comment 56 Paddy Landau 2017-02-01 16:49:32 UTC
I tested this today on 5.3.0.3, and again I'm pleased to report that it has also been fixed here.

I think that this bug can be marked fixed.

Thank you again.
Comment 57 Julien Nabet 2017-02-01 16:51:02 UTC
Thank you for your feedback Paddy.
Let's set this one to FIXED then.