Bug 134339 - Export to pdf creates big bloated file
Summary: Export to pdf creates big bloated file
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Printing and PDF export
Version (earliest affected): 6.3.6.2 release
Hardware: All macOS (All)
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: MacOS-Wishlist PDF-Export-Options-Dialog
Reported: 2020-06-27 12:12 UTC by Hugh Craddock
Modified: 2022-09-03 10:28 UTC

See Also:
Crash report or crash signature:
Regression By:


Attachments
sample file (27.08 MB, application/vnd.oasis.opendocument.text)
2020-08-11 10:42 UTC, Xisco Faulí
Description Hugh Craddock 2020-06-27 12:12:43 UTC
Description:
Exporting a large document containing images creates a bloated PDF on the second and subsequent attempts (the first export after opening LO is normal). Bloated means about five times the expected size: in the files below, about 28 MB becomes about 140 MB.
See www.craddocks.co.uk/files/Poulton source.odt
www.craddocks.co.uk/files/Poulton_normal.pdf (exported first time after opening LO)
www.craddocks.co.uk/files/Poulton_big.pdf (subsequently)
Similar results with other files of similar size containing images.

Steps to Reproduce:
1. Open file
2. Select File > Export As > Export as PDF and choose Lossless compression

Actual Results:
First time after opening LO: no problem, exported file is expected size.
Second and subsequent occasions: bloated file size.

Expected Results:
Exported pdf is similar size to odt file.
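
The expectation above can be expressed as a simple size check. This is only a minimal sketch: the helper names and the 1.5x threshold are illustrative assumptions (nothing LibreOffice defines), and the sizes are the approximate figures quoted in this report.

```python
import os

MB = 1000 * 1000  # decimal megabytes, as a file manager would typically report them


def size_ratio(pdf_bytes: int, odt_bytes: int) -> float:
    """Return how many times larger the exported PDF is than its ODT source."""
    if odt_bytes <= 0:
        raise ValueError("source size must be positive")
    return pdf_bytes / odt_bytes


def is_bloated(pdf_bytes: int, odt_bytes: int, threshold: float = 1.5) -> bool:
    """Flag an export whose size exceeds the source by the (assumed) threshold."""
    return size_ratio(pdf_bytes, odt_bytes) > threshold


def check_files(pdf_path: str, odt_path: str) -> bool:
    """Same check, reading the two sizes from disk."""
    return is_bloated(os.path.getsize(pdf_path), os.path.getsize(odt_path))


# Approximate figures from the report: 28 MB source, 28 MB and 140 MB exports.
print(size_ratio(140 * MB, 28 * MB))  # 5.0
print(is_bloated(28 * MB, 28 * MB))   # False
print(is_bloated(140 * MB, 28 * MB))  # True
```

With real files, `check_files("Poulton_big.pdf", "Poulton_source.odt")` would apply the same test to downloaded copies of the sample files.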


Reproducible: Always


User Profile Reset: Yes



Additional Info:
Recently upgraded to 6.3.6.2 from previous version (at least 12 months old): bug not previously encountered.
Version: 6.3.6.2
Build ID: 2196df99b074d8a661f4036fca8fa0cbfa33a497
CPU threads: 4; OS: Mac OS X 10.15.5; UI render: default; VCL: osx; 
Locale: en-GB (en_GB.UTF-8); UI-Language: en-US
Calc: threaded
Comment 1 Hugh Craddock 2020-06-28 10:37:18 UTC
Can now confirm this is not happening in v.6.0.4.2, which I have reinstalled.
May be similar to bug 128978.
There is also a problem with exporting as pdf to a NAS: LO crashes instead of writing the file — again, not experienced with v.6.0.4.2.  This isn't logged separately, only mentioned here in case it's relevant.
Comment 2 Hugh Craddock 2020-06-28 10:42:19 UTC
Sorry, the source file URL should have been:
www.craddocks.co.uk/files/Poulton_source.odt
Comment 3 Rizal Muttaqin 2020-07-04 03:31:43 UTC
With lossless compression (other options left at their defaults), I got 105.1 MB

Version: 6.4.3.2
Build ID: 1:6.4.3-0ubuntu0.18.04.1
CPU threads: 4; OS: Linux 4.20; UI render: default; VCL: kf5; 
Locale: id-ID (id_ID.UTF-8); UI-Language: id-ID
Calc: threaded

Version: 7.1.0.0.alpha0+
Build ID: 54c866828a9b85c23830e4a8be2c27b59ffd3cd5
CPU threads: 4; OS: Linux 4.20; UI render: default; VCL: kf5
Locale: id-ID (id_ID.UTF-8); UI: id-ID
TinderBox: Linux-rpm_deb-x86_64@86-TDF, Branch:master, Time: 2020-07-02_17:46:36
Calc: threaded
Comment 4 Xisco Faulí 2020-08-11 10:42:33 UTC
Created attachment 164149
sample file
Comment 5 Roman Kuznetsov 2020-09-16 20:10:28 UTC
It seems that the "Reduce image resolution" option should be deactivated when the "Lossless" variant is selected.

That option defaults to 300 DPI, and it just increases the size of the images in the PDF instead of decreasing it :D

When I disabled that option I got a 28 MB PDF file.

I'm not sure it's a bug.

Thorsten, you are a guru on PDF export. Can you give us your opinion here, please?
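
One way to see why resampling to 300 DPI could enlarge a losslessly stored image: the stored data scales with the pixel count, not with the size of the original (possibly JPEG-compressed) file. The sketch below is a rough estimate only. It assumes 3 bytes per RGB pixel before compression, the function name and the 6 x 4 inch example are hypothetical, and it does not describe how the LibreOffice exporter works internally.

```python
def raw_rgb_bytes(width_in: float, height_in: float, dpi: int,
                  bytes_per_pixel: int = 3) -> int:
    """Estimate the uncompressed size of an image rendered at `dpi`."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


# Hypothetical photo placed at 6 x 4 inches on the page.
at_300 = raw_rgb_bytes(6, 4, 300)
at_150 = raw_rgb_bytes(6, 4, 150)

print(at_300)           # 6480000
print(at_150)           # 1620000
print(at_300 / at_150)  # 4.0
```

Lossless compression then shrinks these raw figures, but photographic content usually compresses far less well losslessly than it does as JPEG, which would be consistent with the larger PDF reported here.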
Comment 6 Justin L 2020-12-15 16:08:45 UTC
Sounds similar to bug 134736.
Comment 7 Justin L 2020-12-15 16:15:12 UTC
There has been a fair amount of work done in 7.1 and 7.2 around PDFs, and it seems to me that some of it is around reducing the size. For example:
	pdf: deduplicate resources when copying from external PDF stream

For an overview of any commits referencing PDF, see
https://cgit.freedesktop.org/libreoffice/core/log/?qt=grep&q=pdf.

So it is worth re-testing with 7.1 beta or master. (But I don't really have bandwidth to waste downloading a 30MB example file...)
Comment 8 Roman Kuznetsov 2020-12-15 18:07:42 UTC
(In reply to Justin L from comment #7)
> There has been a fair amount of work done in 7.1 and 7.2 around PDFs, and it
> seems to me that some of it is around reducing the size. For example:
> 	pdf: deduplicate resources when copying from external PDF stream
> 
> For an overview of any commits referencing PDF, see
> https://cgit.freedesktop.org/libreoffice/core/log/?qt=grep&q=pdf.
> 
> So it is worth re-testing with 7.1 beta or master. (But I don't really have
> bandwidth to waste downloading a 30MB example file...)

I still get a 102 MB PDF file in

Version: 7.2.0.0.alpha0+ (x64)
Build ID: 761a672d62df1891b9f4f367a499b220ab2b33fa
CPU threads: 4; OS: Windows 6.1 Service Pack 1 Build 7601; UI render: Skia/Raster; VCL: win
Locale: ru-RU (ru_RU); UI: en-US
Calc: CL

And I still think that option's default value of 300 DPI is too high.

I would suggest Hugh just use JPEG compression for any document.
Comment 9 Hugh Craddock 2020-12-15 18:38:20 UTC
Thank you for the recent comments. I'm afraid I'm stuck with v.6.0.4.2 at present for precisely the reason identified in the bug report, so unable easily to perform any new tests (I'm not sure I'd be comfortable running two versions).

But please note two aspects of the original report: first that the exported pdf was bloated only on the <second> such export after opening LibreOffice (i.e. the first export was roughly equivalent in size to the odt file); and secondly, that on second and subsequent exports, the pdf is about <five times> the size of the odt file.  I could change the compression or DPI settings, but I'd suggest an exported pdf ought to be roughly equivalent in size: higher compression/lower DPI ought to secure a smaller export file size, not a smaller but still bloated file size.  And as the export file size was in any case inconsistent, clearly something wasn't quite right and has changed since v.6.0.4.2.

I respect Justin's wish not to download a 30MB file for testing purposes — it's just this is the sort of file size where these problems were apparent.
Comment 10 Alex Thurgood 2021-02-10 11:52:26 UTC
Testing against 

Version: 7.0.3.1
Build ID: d7547858d014d4cf69878db179d326fc3483e082
CPU threads: 8; OS: Mac OS X 10.16; UI render: default; VCL: osx
Locale: fr-FR (fr_FR.UTF-8); Langue IHM : fr-FR
Calc: threaded

Export to PDF :
Lossless compression
Reduce resolution of images to : 300 dpi
PDF 1/A archive format

1st time: PDF file size 110.7 MB
2nd time: PDF file size 110.7 MB
3rd time: PDF file size 110.7 MB

What am I doing differently, or what didn't I understand ?
Comment 11 Hugh Craddock 2021-02-10 21:11:50 UTC
(In reply to Alex Thurgood from comment #10)
> 1st time : PDF file size 110.7 Mb
> What am I doing differently, or what didn't I understand ?

Thanks Alex for taking the trouble to test and report.

That's striking.  My original post had 'reduce image resolution' unchecked. What do you get please if you export with lossless compression and no 'reduce image resolution'? It ought to be an exported pdf with similar file size to the original odt file — but wasn't in my experience.

Hugh
Comment 12 Alex Thurgood 2021-02-11 08:57:55 UTC
(In reply to Hugh Craddock from comment #11)

 
> That's striking.  My original post had 'reduce image resolution' unchecked.
> What do you get please if you export with lossless compression and no
> 'reduce image resolution'? It ought to be an exported pdf with similar file
> size to the original odt file — but wasn't in my experience.
> 

Hi Hug,

It seems that there are other similar bug reports around similar situations :

bug 119634 (which actually seems quite similar to your bug report)

bug 119492

bug 134736

bug 93462


I imagine that any export which modifies the image resolution and the compression applied will necessarily have an effect on the end file size. The problem is finding out which part of this isn't actually working correctly.
Comment 13 Alex Thurgood 2021-02-11 08:58:54 UTC
(In reply to Alex Thurgood from comment #12)
> (In reply to Hugh Craddock from comment #11)
> 
>  
> > That's striking.  My original post had 'reduce image resolution' unchecked.
> > What do you get please if you export with lossless compression and no
> > 'reduce image resolution'? It ought to be an exported pdf with similar file
> > size to the original odt file — but wasn't in my experience.
> > 
> 
> Hi Hug,
> 

Sorry Hugh, I didn't mean to cut short your name !
Comment 14 Hugh Craddock 2021-02-11 12:21:24 UTC
(In reply to Alex Thurgood from comment #13)
> (In reply to Alex Thurgood from comment #12)

Hi Alex, yes there may be an overlap with bug 119634.  I had a good look round before posting the original report, but this one escaped me.
Comment 15 Hugh Craddock 2021-06-10 12:01:20 UTC
I've now had to upgrade from v.6.0.4.2 to v.7.0.6.2, owing to a new Mac Air running OS v.11.3.1 which appears incapable of running the earlier version, and the problem remains in this later release.

To reiterate: open LO, open the specimen document (now kindly saved as attachment 164149), and export as PDF with lossless compression selected. The file size is approximately the same as the .odt file (I had 28.4 MB for the .odt and 28.8 MB for the PDF, which is what one would expect).  Now repeat the export (having changed nothing in the document nor in LO settings), and it comes out about five times larger (139.7 MB against 28.4 MB).  The export process is also slower.

This should be eminently reproducible, and as described in Bug 119634, erratic. It's also, in my view, a serious impediment to using larger documents in LO.  I've found the error occurs in documents over a threshold which lies somewhere between 10–30MB.
Comment 16 Telesto 2021-06-11 09:39:12 UTC
27,4 MB lossless with 
Version: 7.2.0.0.alpha1+ (x64) / LibreOffice Community
Build ID: 3b57ebb445df8a2bc3d916ea79f8af45e20e4e62
CPU threads: 4; OS: Windows 6.3 Build 9600; UI render: Skia/Raster; VCL: win
Locale: nl-NL (nl_NL); UI: en-US
Calc: CL

-> Windows
Comment 17 Hugh Craddock 2021-06-11 20:17:49 UTC
Hi Telesto, what happened when you repeated the export a second time?
Comment 18 Hugh Craddock 2021-12-29 18:40:40 UTC
This problem manifests in a slightly different way in 7.1.8.1 on a Mac running Monterey 12.0.1.

The sample file, which is 28.4 MB, correctly exports with lossless compression as a PDF of 28.6 MB.  There is no change on repeated attempts.  However, edit the file by adding just one character, and it now exports as a PDF of around 140 MB.  Saving the now-edited file as odt still produces a file of 28.4 MB.

Plainly, adding one character to a file should not inflate the file size of a pdf export by a factor of five (and one would expect the pdf to be roughly proportionate in size to the odt file).
Comment 19 zcrhonek 2021-12-30 07:18:43 UTC
Tested with Version: 7.4.0.0.alpha0+ / LibreOffice Community
Build ID: c13db6e792cc347ffff4585f23866f195651f21f
CPU threads: 4; OS: Linux 5.11; UI render: default; VCL: gtk3
Locale: cs-CZ (cs_CZ.UTF-8); UI: en-US
Calc: threaded Jumbo

And I can not confirm:

PDF options:
lossless compression
"Reduce image resolution" unticked

1st export 28,8 MB
2nd export 28,8 MB
add 2 characters to text
3rd export 28,8 MB
save odt file
4th export 28,8 MB

Maybe a macOS-specific bug.
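
A test run like the one above can be checked mechanically by looking for a jump between consecutive exports. This is a sketch under stated assumptions: the function name and the factor of 2 are hypothetical, the first list holds the sizes from this comment, and the second uses the figures Hugh reported in comment 15.

```python
def find_jumps(sizes_mb, factor=2.0):
    """Return (export_number, previous_mb, current_mb) for every export
    that is at least `factor` times the size of the one before it."""
    jumps = []
    for i in range(1, len(sizes_mb)):
        previous, current = sizes_mb[i - 1], sizes_mb[i]
        if previous > 0 and current >= factor * previous:
            jumps.append((i + 1, previous, current))  # 1-based export number
    return jumps


linux_run = [28.8, 28.8, 28.8, 28.8]  # four exports, sizes in MB, from this comment
mac_run = [28.8, 139.7]               # first and second export from comment 15

print(find_jumps(linux_run))  # []
print(find_jumps(mac_run))    # [(2, 28.8, 139.7)]
```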
Comment 20 eisa01 2022-02-06 11:50:32 UTC
I cannot confirm either. File size is around 28-29 MB regardless of whether I edit it or not

(Chose Lossless, and disabled reduce DPI)

If I have reduce DPI to 300 enabled, I get a 106 MB file

Does it still occur on the just-released LO 7.3?

Version: 7.3.0.3 / LibreOffice Community
Build ID: 0f246aa12d0eee4a0f7adcefbf7c878fc2238db3
CPU threads: 10; OS: Mac OS X 12.2; UI render: Skia/Metal; VCL: osx
Locale: en-US (en_US.UTF-8); UI: en-US
Calc: threaded
Comment 21 QA Administrators 2022-08-06 03:34:14 UTC Comment hidden (obsolete)
Comment 22 Roman Kuznetsov 2022-08-07 11:11:00 UTC
Still reproducible (Lossless compression + Reduce image resolution to 300 dpi => 106 MB file size) in

Version: 7.5.0.0.alpha0+ / LibreOffice Community
Build ID: d75c5c1f61a174b3b333e9db6536ab15cc37d00b
CPU threads: 4; OS: Mac OS X 12.5; UI render: Skia/Raster; VCL: osx
Locale: ru-RU (ru_RU.UTF-8); UI: en-US
Calc: threaded Jumbo

And if I deselect the "Reduce image resolution to" option, I get a 28 MB PDF file.
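
For comparing the two settings outside the dialog, a headless conversion may help. The sketch below only builds the command line and does not run it. It assumes a LibreOffice build that accepts JSON filter options after the filter name in `--convert-to` (check the help of the installed version), and it assumes the PDF export properties `UseLosslessCompression`, `ReduceImageResolution` and `MaxImageResolution` behave as the dialog options do. The function name and paths are hypothetical.

```python
import json


def pdf_convert_command(source, outdir, lossless=True,
                        reduce_resolution=False, max_dpi=300):
    """Build (but do not run) a headless PDF export command."""
    # Assumed option names; verify them against the installed version.
    options = {
        "UseLosslessCompression": {"type": "boolean", "value": str(lossless).lower()},
        "ReduceImageResolution": {"type": "boolean", "value": str(reduce_resolution).lower()},
        "MaxImageResolution": {"type": "long", "value": str(max_dpi)},
    }
    filter_arg = "pdf:writer_pdf_Export:" + json.dumps(options, separators=(",", ":"))
    return ["soffice", "--headless", "--convert-to", filter_arg,
            "--outdir", outdir, source]


# One command per setting, so the resulting file sizes can be compared.
without_reduce = pdf_convert_command("Poulton_source.odt", "out_plain")
with_reduce = pdf_convert_command("Poulton_source.odt", "out_300dpi",
                                  reduce_resolution=True)
print(without_reduce[3])
print(with_reduce[3])
```

Each headless conversion starts a fresh process, so this would not reproduce the "second export in the same session" symptom from the original report; it is only a way to compare the two option sets.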