Bug 104479 - Export as PDF of ODT with JPGs produces much larger PDFs
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 5.1.6.2 release
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: bibisected, bisected, regression
Duplicates: 105045
Depends on:
Blocks: PDF-Export JPEG-compression-regressions
Reported: 2016-12-07 21:34 UTC by Steve Edmonds
Modified: 2019-07-15 15:26 UTC
13 users

See Also:
Crash report or crash signature:


Attachments
PDF of historical size. (2.16 MB, application/pdf)
2016-12-07 21:34 UTC, Steve Edmonds
PDF from 5.2.4 (7.40 MB, application/pdf)
2016-12-07 21:35 UTC, Steve Edmonds
the file being used in the transformation into pdf (1.45 MB, application/vnd.openxmlformats-officedocument.presentationml.presentation)
2017-05-08 14:43 UTC, Douglas C. R. Paes
Writer document with image that is corrupted in PDFs (143.84 KB, application/vnd.oasis.opendocument.text)
2017-05-29 21:11 UTC, Steve Edmonds
PDF from writer document with corrupted image. (151.29 KB, application/pdf)
2017-05-29 21:12 UTC, Steve Edmonds
See comment 32 (693.12 KB, application/zip)
2017-05-30 08:30 UTC, Paddy Landau

Description Steve Edmonds 2016-12-07 21:34:12 UTC
Created attachment 129381
PDF of historical size.

PDF files produced with 5.2.4 are much larger than on 5.1.6 or 5.2.3.3.
With image resolution reduced to 300 dpi, a PDF that used to be 6 MB is now 10.5 MB under 5.2.4.
With image resolution reduced to 150 dpi, the 6 MB file became 2.2 MB with 5.1.6, whereas the 10.5 MB file became 7.4 MB with 5.2.4.
Comment 1 Steve Edmonds 2016-12-07 21:35:47 UTC
Created attachment 129382
PDF from 5.2.4

PDF of increased size.
Comment 2 Steve Edmonds 2016-12-07 21:37:54 UTC
Full version information.
Version: 5.2.4.1
Build ID: 20m0(Build:1)
CPU Threads: 4; OS Version: Linux 3.16; UI Render: default; VCL: kde4; 
Locale: en-NZ (en_US.UTF-8); Calc: group
Comment 3 Steve Edmonds 2016-12-07 21:45:18 UTC
Writer file https://drive.google.com/open?id=0ByFEFUXgJhGkZ0ZQekRrY1dqeG8
Comment 4 Steve Edmonds 2016-12-07 22:24:31 UTC
Also noticed now in Version: 5.2.3.3, Build ID: 20m0(Build:3)
Comment 5 MM 2016-12-07 22:29:20 UTC Comment hidden (obsolete)
Comment 6 Steve Edmonds 2016-12-07 23:09:58 UTC Comment hidden (obsolete)
Comment 7 Thomas Hackert 2016-12-10 16:08:09 UTC
Hello Steve, *,
thank you very much for reporting this bug :) I can reproduce it with

OS: Debian Testing AMD64
LO: Version: 5.2.3.3
Build ID: 1:5.2.3-2
CPU Threads: 4; OS Version: Linux 4.5; UI Render: default; VCL: x11;
Locale: de-DE (de_DE.UTF-8); Calc: group
(Debian's own version)

LO: Version: 5.2.3.3
Build ID: d54a8868f08a7b39642414cf2c8ef2f228f780cf
CPU Threads: 4; OS Version: Linux 4.5; UI Render: default;
Locale: de-DE (de_DE.UTF-8); Calc: group

LO: Version: 5.2.4.1
Build ID: 9b50003582f07ac674d6451e411e9b77cccd2b22
CPU Threads: 4; OS Version: Linux 4.5; UI Render: default; VCL: gtk2;
Locale: de-DE (de_DE.UTF-8); Calc: group

LO: Version: 5.2.3.1
Build ID: 01ec8f357e651ca9656837b783cf7e6a32ee4d92
CPU Threads: 4; OS Version: Linux 4.5; UI Render: default;
Locale: de-DE (de_DE.UTF-8); Calc: group

LO: Version: 5.2.0.4
Build ID: 066b007f5ebcc236395c7d282ba488bca6720265
CPU Threads: 4; OS Version: Linux 4.5; UI Render: default;
Locale: de-DE (de_DE.UTF-8)
(the last ones all parallel installed, following the instructions from https://wiki.documentfoundation.org/Installing_in_parallel/Linux)

but not in

LO: Version: 5.1.6.2
Build ID: 07ac168c60a517dba0f0d7bc7540f5afa45f0909
CPU Threads: 4; OS Version: Linux 4.5; UI Render: default;
Locale: de-DE (de_DE.UTF-8); Calc: group

so I am setting the status to "NEW" and changing the version to 5.2.0.4.
Comment 8 raal 2016-12-11 14:20:46 UTC
Hello,

Thank you for submitting the bug. The bug has previously been reported, so this bug will be added as a duplicate of it. You will automatically be CCed to updates made to the other bug.

*** This bug has been marked as a duplicate of bug 101563 ***
Comment 9 Paddy Landau 2016-12-11 15:03:52 UTC

*** This bug has been marked as a duplicate of bug 99723 ***
Comment 10 Paddy Landau 2016-12-11 15:04:33 UTC
This is not a duplicate of bug 101563, which is for something else, but for 99723.
Comment 11 Steve Edmonds 2016-12-11 18:17:12 UTC
(In reply to Paddy Landau from comment #9)
> 
> *** This bug has been marked as a duplicate of bug 99723 ***

This is not a duplicate of Bug 99723 - Setting image Compression in PDF export does not result in smaller file size.

Changing the compression in my file does reduce image size.
Comment 12 Steve Edmonds 2016-12-11 18:19:07 UTC
(In reply to Paddy Landau from comment #10)
> This is not a duplicate of bug 101563, which is for something else, but for
> 99723.

Correct, not a duplicate of bug 101563, 101563 states linked images, mine are all embedded.
Comment 13 Steve Edmonds 2016-12-11 18:26:59 UTC
Comment 11 should read "Changing the compression in my file does reduce file (PDF) size."
Comment 14 Steve Edmonds 2016-12-13 19:11:58 UTC
Delving further back, this has been a progressive issue and may be an accumulation of multiple bugs.
Using the same file (from comment 3), 90% compression, reduced 300dpi image resolution.
LO 5.0.6.3. PDF 2.5MB
LO 5.1.6.2.0+ PDF 6.3MB
LO 5.2.4.1 PDF 11MB
Comment 15 Julien Nabet 2016-12-13 20:45:47 UTC
On pc Debian x86-64 with master sources updated today (so it includes https://cgit.freedesktop.org/libreoffice/core/commit/?id=b7f92a21a458fc6fa68894fbc881eda0a1e8325e), here are the results I get:
-rw-r--r-- 1 julien julien 22909072 déc.  13 21:41 PT252-PT253manual.odt
-rw-r--r-- 1 julien julien  7563159 déc.  13 21:43 PT252-PT253manual.pdf

I tested with 90% compression, reduced 300dpi image resolution
Comment 16 Steve Edmonds 2016-12-13 21:24:19 UTC
Thanks, as I get the same results I quoted on Linux (X64) and windows (X32), this probably means the commit for 101563 doesn't fix this bug.
Comment 17 Paddy Landau 2016-12-14 08:43:12 UTC
My results are as follows.

LO version  Compression   Size (MB)
5.1.6.2     Lossless        28.3
5.2.3.3     Lossless        28.3
5.4.0.0     Lossless        28.3

5.1.6.2     90%, 300 dpi     6.4
5.2.3.3     90%, 300 dpi    11.0
5.4.0.0     90%, 300 dpi     7.6
Comment 18 Steve Edmonds 2016-12-14 18:25:25 UTC
Thanks, I think that confirms this bug is still present as we should be getting a 2.5MB file.
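
The sizes reported so far can be put on a common scale. The following is a minimal sketch, assuming the 2.5 MB PDF from LO 5.0.6.3 (comment 14) as the baseline and the 90%, 300 dpi figures from comment 17 as the measurements; the name `growth_factors` is hypothetical and only used for illustration.

```python
# Sizes in MB at 90% JPEG quality and 300 dpi, copied from comment 17.
REPORTED_MB = {"5.1.6.2": 6.4, "5.2.3.3": 11.0, "5.4.0.0": 7.6}

# Assumed baseline: the 2.5 MB PDF produced by LO 5.0.6.3 (comment 14).
BASELINE_MB = 2.5


def growth_factors(sizes_mb, baseline_mb):
    """Return each reported size as a multiple of the baseline size."""
    return {version: size / baseline_mb for version, size in sizes_mb.items()}


for version, factor in growth_factors(REPORTED_MB, BASELINE_MB).items():
    print(f"LO {version}: {factor:.2f}x the baseline size")
```

Expressed this way, every version in the table produces a PDF more than twice the baseline size for the same settings.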
Comment 19 Aron Budea 2017-01-04 06:12:22 UTC Comment hidden (bibisection)
Comment 20 Aron Budea 2017-01-04 06:32:46 UTC
This traces back to the same commit as bug 99723. Adding Cc: to Michael Meeks.

https://cgit.freedesktop.org/libreoffice/core/commit/?id=76ec54e8c9f3580450bca85236a4f5af0c328588

author	Michael Meeks <michael.meeks@collabora.com>	2016-02-08 14:24:15 (GMT)
committer	Michael Meeks <michael.meeks@collabora.com>	2016-02-09 00:09:08 (GMT)

tdf#97662 - Try to preserve original compressed JPEGs harder.


The file in question contains 106 JPG/PNG images, ~22 MB altogether, but 20 images of size 0.2 to 2 MB make up almost all of that size (and 5 of those are PNGs).
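
For anyone who wants to repeat this kind of inventory on their own document: an ODT file is a ZIP package, and embedded images are normally stored under `Pictures/`. The following is a minimal sketch under that assumption; the function name `largest_embedded_images` is hypothetical, and it is not necessarily how the figures above were obtained.

```python
import zipfile


def largest_embedded_images(odt_path, top=20):
    """Return (name, uncompressed size in bytes) for the largest files under Pictures/."""
    with zipfile.ZipFile(odt_path) as package:
        images = [
            (info.filename, info.file_size)
            for info in package.infolist()
            if info.filename.startswith("Pictures/") and not info.is_dir()
        ]
    # Largest first, so the handful of images that dominate the size stand out.
    images.sort(key=lambda item: item[1], reverse=True)
    return images[:top]
```

Calling `largest_embedded_images("document.odt")` (the file name is a placeholder) returns the biggest embedded images first, which is usually enough to see which ones dominate the export size.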

The fix to bug 101458 is responsible for this change in size (in 5.2.4.2 it's roughly the same as in 5.4.0.0 below):

(In reply to Paddy Landau from comment #17)
> 5.2.3.3     90%, 300 dpi    11.0
> 5.4.0.0     90%, 300 dpi     7.6
Comment 21 raal 2017-01-08 13:59:18 UTC
*** Bug 105045 has been marked as a duplicate of this bug. ***
Comment 22 clubchef 2017-01-26 18:47:55 UTC
When will this bug (the PDF problem) be fixed?!
Unfortunately, it is still present in LO 5.2.5.
Comment 23 Aron Budea 2017-01-26 21:05:24 UTC
Clubchef, there's no ETA, but if it's causing you trouble, you could install an earlier version separately from the current one, and use that for exporting to PDF (5.0.6 and 5.1.1 are free from this regression). Details and download links are available here:
https://wiki.documentfoundation.org/Installing_in_parallel
Comment 24 Xisco Faulí 2017-03-23 10:57:05 UTC
*** Bug 106627 has been marked as a duplicate of this bug. ***
Comment 25 Douglas C. R. Paes 2017-05-08 14:43:51 UTC
Created attachment 133160
the file being used in the transformation into pdf
Comment 26 Douglas C. R. Paes 2017-05-08 15:01:55 UTC
We are facing problems trying to export PPTX files into PDF.

All the files are provided as links because of the 10 MB attachment limit.

The original file (https://drive.google.com/open?id=0B2d7BMp8tlURbHdZNGJicjNJdG8) is 1.5 MB.

The LibreOffice-generated PDF (https://drive.google.com/open?id=0B2d7BMp8tlURNWQ0bGQ4YW9ONTQ) is 82.5 MB.

The same PPTX converted to PDF by Microsoft Office (https://drive.google.com/open?id=0B2d7BMp8tlURX1lsMERlYlNyMkU) is 10.7 MB.

Besides the size problem, the other problem we have is CPU usage, which stays at 100% for a long time.

On a Mac with 16 GB of RAM and 4 cores, it took 8 minutes to finish (in the fastest attempt).
On another machine, with 16 GB of RAM and 8 cores, it took 5 minutes.
On our server, which runs Ubuntu 16.04 and also has 16 GB of RAM and 8 cores, it takes more than 10 minutes (this server runs other resource-intensive services, which is why it is slower).

I hope the provided files help in the problem investigation.

Let me know if you guys need more information from me.
Comment 27 Paddy Landau 2017-05-08 17:57:12 UTC
@Douglas C. R. Paes
There is the obvious question: are you exporting from MS and LO with the same settings, i.e. image compression and reduction?

I attempted this on Linux Ubuntu 16.04 (64-bit), with image compression 80% and size reduction to 150dpi.

Not only did the CPU hit 100% (one CPU at a time, which is to be expected), but the RAM also hit 100%, with the swap file reaching 2 GB. There is clearly something wrong.
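
To compare runs with identical settings, the export can be scripted instead of going through the dialog. The following is a rough sketch only. It assumes a `soffice` binary on the PATH and the JSON filter-option syntax of `--convert-to`, which to my knowledge is only available in LibreOffice releases newer than the ones discussed in this report. The property names (`Quality`, `ReduceImageResolution`, `MaxImageResolution`, `UseLosslessCompression`) are taken from the PDF export filter options and should be verified against the installed version. The helper names `build_export_command` and `timed_export` are hypothetical.

```python
import json
import subprocess
import time


def build_export_command(source, outdir, quality=90, max_dpi=300,
                         filter_name="writer_pdf_Export", soffice="soffice"):
    """Build a headless PDF export command with explicit image settings."""
    # Assumed filter options; check them against your LibreOffice version.
    options = {
        "UseLosslessCompression": {"type": "boolean", "value": "false"},
        "Quality": {"type": "long", "value": str(quality)},
        "ReduceImageResolution": {"type": "boolean", "value": "true"},
        "MaxImageResolution": {"type": "long", "value": str(max_dpi)},
    }
    convert_to = f"pdf:{filter_name}:{json.dumps(options)}"
    return [soffice, "--headless", "--convert-to", convert_to,
            "--outdir", outdir, source]


def timed_export(command):
    """Run the export and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(command, check=True)
    return time.perf_counter() - start
```

For a presentation such as the PPTX above, `filter_name="impress_pdf_Export"` would be the matching filter. With 80% and 150 dpi, the call would be `timed_export(build_export_command("deck.pptx", "out", quality=80, max_dpi=150, filter_name="impress_pdf_Export"))`, where the file and directory names are placeholders.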
Comment 28 Telesto 2017-05-23 20:51:37 UTC
(In reply to Douglas C. R. Paes from comment #26)

> Besides the size problem, the other problem we have is the CPU usage that
> goes 100% for  much time.
I created a new report (bug 108038) for this one and also for comment 27 (bug 108037 because LibO is also crashing because of the memory usage)
Comment 29 Steve Edmonds 2017-05-29 21:06:52 UTC
I am posting this here first as it could be related to image export to PDF, otherwise it is another PDF export bug.
The exported PDF shows the image as a grey fuzzy shape. I am now on LO 5.2.5.1 on openSUSE.
Files attached named bungImage.
Comment 30 Steve Edmonds 2017-05-29 21:11:16 UTC
Created attachment 133702
Writer document with image that is corrupted in PDFs
Comment 31 Steve Edmonds 2017-05-29 21:12:37 UTC
Created attachment 133703
PDF from writer document with corrupted image.
Comment 32 Paddy Landau 2017-05-30 08:28:43 UTC
@Steve Edmonds

I get the same result as you do, and yet…

• When I download your attachment and open it, I see the same as you do.

• But when I view your attachment using Google Chrome, I see the image correctly, albeit in grayscale.

Quite bizarre.

However, as far as I can tell, this seems to be a separate issue. Please would you raise a new bug report, attach not only your document and PDF but also the original image, and post the link here so that we can see the bug report?

More information:

I extracted the image from your ODT file (using Archive Manager); re-saved the JPEG without change using GIMP (which saved as a different file size); and replaced the image in your document. This time, the export worked correctly. Therefore, I suspect that something is up with your original image — possibly, LibreOffice not recognising some metadata? Or, your original image has invalid metadata and GIMP could figure it out?

I shall attach:
• The extracted image so that you can check if it's the same as your original
• The image as re-saved by GIMP
• Your document but with the re-saved image (same size and location as the original)
• That document as a PDF
Comment 33 Paddy Landau 2017-05-30 08:30:07 UTC
Created attachment 133718
See comment 32

See comment 32 for an explanation of the contents of this ZIP file.
Comment 34 Steve Edmonds 2017-05-30 20:09:38 UTC
(In reply to Paddy Landau from comment #32)
> @Steve Edmonds
> 
> I get the same result as you do, and yet…
> 
> • When I download your attachment and open it, I see the same as you do.
> 
> • But when I view your attachment using Google Chrome, I see the image
> correctly, albeit in grayscale.
> 
> Quite bizarre.
> 
> However, as far as I can tell, this seems to be a separate issue. Please
> would you raise a new bug report, attach not only your document and PDF but
> also the original image, and post the link here so that we can see the bug
> report?
> 
> More information:
> 
> I extracted the image from your ODT file (using Archive Manager); re-saved
> the JPEG without change using GIMP (which saved as a different file size);
> and replaced the image in your document. This time, the export worked
> correctly. Therefore, I suspect that something is up with your original
> image — possibly, LibreOffice not recognising some metadata? Or, your
> original image has invalid metadata and GIMP could figure it out?
> 
> I shall attach:
> • The extracted image so that you can check if it's the same as your original
> • The image as re-saved by GIMP
> • Your document but with the re-saved image (same size and location as the
> original)
> • That document as a PDF

If I edit the image in GIMP, Writer can process it. I don't have an original image; it was provided in a document. The image extracted from my document opens fine in Gwenview and GIMP and prints to PDF fine from there. I wondered if the image handling problem leading to bloated PDFs is causing the issue I see. I will open a new bug.
Comment 35 Aron Budea 2017-05-30 21:14:52 UTC
Paddy, Steve, this other issue is bug 102928, please see bug 102928 comment 16.
Comment 36 Steve Edmonds 2017-05-30 22:54:20 UTC
Thanks, that is it. I couldn't locate that in my bug search; the JPG is CMYK. Interesting that this seemed to start about the same time as the PDF bloating.
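
A quick way to spot such images before exporting is to read the number of colour components from each JPEG's frame header. The following is a minimal sketch using only the standard library. It assumes that a JPEG declaring 4 components is CMYK (or YCCK), which is the usual case but not guaranteed, and that embedded images live under `Pictures/` in the ODT package. The function names are hypothetical.

```python
import struct
import zipfile

# Start-of-frame markers (SOF0..SOF15) minus DHT (0xC4), JPG (0xC8) and DAC (0xCC).
SOF_MARKERS = set(range(0xC0, 0xD0)) - {0xC4, 0xC8, 0xCC}


def jpeg_component_count(data):
    """Return the number of colour components declared in a JPEG, or None."""
    if data[:2] != b"\xff\xd8":  # every JPEG starts with the SOI marker
        return None
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            return None  # not positioned on a marker; give up
        marker = data[pos + 1]
        if marker == 0xFF:  # padding byte before a marker
            pos += 1
            continue
        if marker == 0x01 or 0xD0 <= marker <= 0xD9:  # markers without a length field
            pos += 2
            continue
        if marker == 0xDA:  # start of scan reached without a frame header
            return None
        if marker in SOF_MARKERS:
            # Layout: FF Cx, length (2), precision (1), height (2), width (2), components (1)
            return data[pos + 9] if pos + 9 < len(data) else None
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        pos += 2 + length
    return None


def four_component_jpegs(odt_path):
    """List JPEGs under Pictures/ in an ODT package that declare 4 components."""
    hits = []
    with zipfile.ZipFile(odt_path) as package:
        for name in package.namelist():
            if name.startswith("Pictures/") and name.lower().endswith((".jpg", ".jpeg")):
                if jpeg_component_count(package.read(name)) == 4:
                    hits.append(name)
    return hits
```

Any file returned by `four_component_jpegs("document.odt")` (placeholder name) is a candidate for re-saving as RGB in an image editor before export.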
Comment 37 Douglas C. R. Paes 2017-08-07 12:51:38 UTC
(In reply to Paddy Landau from comment #27)
> @Douglas C. R. Paes
> There is the obvious question: are you exporting from MS and LO with the
> same settings, i.e. image compression and reduction?
> 
> I attempted this on Linux Ubuntu 16.04 (64-bit), with image compression 80%
> and size reduction to 150dpi.
> 
> Not only did the CPU hit 100% (one CPU at a time, which is to be expected),
> but also the RAM hit 100% with the swap file hitting 2Gb. There is clearly
> something wrong.

Sorry, but I received no notification about your message here.
I used no special parameters for the conversion in either LibreOffice or MS Office.
In both of them, the only thing I did was open the PPTX file and then use the Save as PDF action.
Comment 38 Sascha Grebe 2018-04-12 11:11:36 UTC
The problem "Export as PDF produces much larger PDFs" still exists in LO 6.0.3.2. Exports at 80% / 125 dpi are much too large.

Can anyone tell me when this will be fixed?!
Comment 39 Miklos Vajna 2018-05-18 11:26:46 UTC
I think the "too large PDF" problem was bug 105954, I fixed that recently. Can you confirm that we can close this bug as a duplicate of that? Thanks.
Comment 40 clubchef 2018-05-18 12:02:14 UTC
I can confirm that the PDF export is still not working correctly in LO 6.0.4.2 (x64). The created PDFs are much too big.
Comment 41 Aron Budea 2018-05-18 12:03:38 UTC
(In reply to Miklos Vajna from comment #39)
> I think the "too large PDF" problem was bug 105954, I fixed that recently.
> Can you confirm that we can close this bug as a duplicate of that? Thanks.
Confirm with a daily build, that is:
https://dev-builds.libreoffice.org/daily/master/

@clubchef: the fix isn't in 6.0.4.2 yet; it will be part of 6.0.5. Please test with a daily build.
Comment 42 Miklos Vajna 2018-05-18 12:28:00 UTC
I fixed that other bug yesterday, so use a daily build >= 2018-05-18, please.
Comment 43 Timur 2018-05-21 16:27:07 UTC
(In reply to Steve Edmonds from comment #14)
> Using the same file (from comment 3), 90% compression, reduced 300dpi image
> resolution.
> LO 5.0.6.3. PDF 2.5MB
> LO 5.1.6.2.0+ PDF 6.3MB
> LO 5.2.4.1 PDF 11MB

LO 6.1+ before fix: PDF 7,6 MB
LO 6.1+ after fix: PDF 2,8 MB

*** This bug has been marked as a duplicate of bug 105954 ***
Comment 44 clubchef 2018-06-22 07:48:24 UTC
Jesus, it works!!! Thanks a lot Miklos!!!
(Tested with LO 6.0.5.2 x64)
Comment 45 Yannick Chiron 2019-07-11 17:10:55 UTC
This bug was fixed in 6.2.4.2 but has been reintroduced in LibreOffice 6.2.5.2.

I will add that, very strangely, this bug reappears regularly across LibreOffice versions.

I know this very well, as I always keep LibreOffice updated and create PDFs from Writer ODT newsletter files (containing text and JPEG images).

My last letter is 895 KB in odt format.

With lossless compression (no quality reduction), the PDF is 4.1 MB.

At 99% compression quality, the pdf is 1.2 MB. 

At 95% compression quality, the pdf is 972 KB. 

At 90% compression quality, the pdf is 756 KB. 

I use a Macbook Pro (2.3 GHz Intel Core i7), macOS Mojave 10.14.5 

I can provide attachments of similar documents if needed.
Comment 46 Steve Edmonds 2019-07-11 20:59:47 UTC
I do not notice this on Version: 6.2.5.2 Build ID: 20(Build:2) from openSUSE build service. I have re-checked original file and also against a PDF created with 6.4.2 a few days ago before updating to 6.5.2.
Comment 47 Aron Budea 2019-07-11 22:37:12 UTC
Yannick, as it's a different bug, please open a new bug report, and provide samples. Feel free to add the URL to this one to the See Also field.
Comment 48 Paddy Landau 2019-07-12 09:09:42 UTC
I have noticed an oddity. This has been the case for some months.

• I open a large document and export to PDF: It's fine.

• I make any change, even a tiny one (e.g. add or delete a comma), and export to PDF: The result is a massive PDF.

• I close that changed document, reopen it without further changes, and export to PDF again: It's fine.

My latest test, with version 6.2.4.2, shows an increase from a 25.6 MB PDF to 150.0 MB. That's an extraordinary 486% increase. Closing and reopening the document brings it back to a 25.6 MB PDF.
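
The percentage can be checked with a short calculation. This is only a sketch of the arithmetic, using the two sizes quoted above; the function name `percent_increase` is hypothetical.

```python
def percent_increase(before_mb, after_mb):
    """Return the growth from before_mb to after_mb as a percentage of before_mb."""
    return (after_mb - before_mb) / before_mb * 100


# Figures from the comment above: 25.6 MB normally, 150.0 MB after a small edit.
print(f"{percent_increase(25.6, 150.0):.0f}% larger")
```

The same helper can be used to compare the export before and after closing and reopening a document.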

I've got into the habit of always closing and reopening a document before exporting to PDF.

Until this bug is fixed, I hope that this tip helps some people.
Comment 49 Yannick Chiron 2019-07-15 15:26:29 UTC
While trying to create a sample showing the faulty behaviour, I wanted to blur people's faces in the picture. But then I noticed that the PDF export had a correct size. After many attempts, I tried deleting the original picture from the original ODT file, saving the ODT file, reinserting the same picture, and exporting as PDF with no loss again. Now it has a correct size as well. This leaves me very confused, as I don't know what sometimes happens to make my PDF export big. But it seems that in these cases one imported picture is responsible, and deleting it, saving, and reinserting it solves the problem.

I will add that all the pictures are opened in GIMP 2.10.12 to be cropped, resized and compressed before being inserted, but only some of them provoke the bug.