Bug 74670 - Saving a pptx (created with MSO) in Impress as .pptx more than quadruples file-size (due to copies of images)
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Impress
Version: 3.3.0 release (earliest affected)
Hardware: Other All
Importance: medium normal
Assignee: Tünde Tóth
URL:
Whiteboard: target:7.4.0
Keywords: filter:pptx
Depends on:
Blocks: PPTX
Reported: 2014-02-07 12:34 UTC by Liam Smit
Modified: 2022-03-31 09:56 UTC

See Also:
Crash report or crash signature:


Attachments
Internal file listing of MSO.pptx, LO.pptx and LO.odp (5.12 KB, application/gzip)
2014-02-07 12:34 UTC, Liam Smit
md5sum hashes of files in ppt/media (923 bytes, application/gzip)
2014-02-07 13:03 UTC, Liam Smit
LibreOffice save as odp (315.42 KB, application/gzip)
2014-02-27 13:46 UTC, Liam Smit
LibreOffice save as pptx (789.28 KB, application/gzip)
2014-02-27 13:47 UTC, Liam Smit
Original MS Office Power Point presentation file saved in MS PowerPoint 2010. (1.42 MB, application/gzip)
2014-09-11 14:08 UTC, Liam Smit
MS Office Power Point presentation format file saved in LibreOffice Impress 4.2.6. (2.15 MB, application/gzip)
2014-09-11 14:10 UTC, Liam Smit
MS Office Power Point presentation file saved in LibreOffice Impress 4.2.6 in ODP format. (843.45 KB, application/gzip)
2014-09-11 14:12 UTC, Liam Smit

Description Liam Smit 2014-02-07 12:34:53 UTC
Created attachment 93600 [details]
Internal file listing of MSO.pptx, LO.pptx and LO.odp

I received a presentation saved in .pptx format by MS Office 2010:
1,4M Feb  7 11:28 mo_save_as_pptx.pptx

I opened it in Impress and then saved it as a .pptx file without making any changes. The resulting file is more than four times as large:
6,5M Feb  7 12:11 lo_save_as_pptx.pptx

Saving the same file as an .odp file without making any changes results in a file that is about a quarter the size of the original:
368K Feb  7 14:15 lo_save_as_odp.odp

So it seems that LibreOffice imports the .pptx file correctly but cannot write it back out correctly as .pptx, whereas writing to .odp seems fine.

Examining the resulting files reveals that when Impress writes out a .pptx file, many of the image files in it are duplicate copies of images used in the presentation (logos, backgrounds, etc).

Please see the attached archive for the file-listing of each of the three files i.e. MS Office .pptx, LibreOffice .pptx and LibreOffice .odp.
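
For anyone who wants to repeat the comparison without unpacking the files by hand, here is a minimal sketch that counts the image entries in each archive and sums their uncompressed sizes. The function name `media_summary` is made up for this example. The `ppt/media/` prefix matches the folder mentioned in this report; for .odp files the images are assumed to live under `Pictures/`, so pass that prefix instead.

```python
import zipfile


def media_summary(path_or_file, prefix="ppt/media/"):
    """Return (file_count, total_uncompressed_bytes) for entries under prefix.

    `prefix` is an assumption about where the images are stored inside the
    archive; adjust it for other formats (e.g. "Pictures/" for .odp).
    """
    with zipfile.ZipFile(path_or_file) as zf:
        infos = [
            info for info in zf.infolist()
            if info.filename.startswith(prefix) and not info.is_dir()
        ]
    return len(infos), sum(info.file_size for info in infos)
```

Calling `media_summary("lo_save_as_pptx.pptx")` and `media_summary("mo_save_as_pptx.pptx")` and comparing the two tuples should show whether the size increase comes from the media folder.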
Comment 1 Liam Smit 2014-02-07 13:03:28 UTC Comment hidden (obsolete)
Comment 2 Liam Smit 2014-02-24 10:30:32 UTC Comment hidden (obsolete)
Comment 3 retired 2014-02-27 12:43:56 UTC Comment hidden (obsolete)
Comment 4 Liam Smit 2014-02-27 13:00:14 UTC Comment hidden (obsolete)
Comment 5 Liam Smit 2014-02-27 13:46:09 UTC Comment hidden (obsolete)
Comment 6 Liam Smit 2014-02-27 13:47:27 UTC Comment hidden (obsolete)
Comment 7 QA Administrators 2014-09-03 21:32:37 UTC Comment hidden (obsolete)
Comment 8 Liam Smit 2014-09-11 14:08:40 UTC
Created attachment 106135 [details]
Original MS Office Power Point presentation file saved in MS PowerPoint 2010.

Note: the text and numbers were intentionally scrambled.
Comment 9 Liam Smit 2014-09-11 14:10:45 UTC
Created attachment 106136 [details]
MS Office Power Point presentation format file saved in LibreOffice Impress 4.2.6.

Note: the text and numbers were intentionally scrambled.
Comment 10 Liam Smit 2014-09-11 14:12:56 UTC
Created attachment 106137 [details]
MS Office Power Point presentation file saved in LibreOffice Impress 4.2.6 in ODP format.

Note: the text and numbers were intentionally scrambled.
Comment 11 Liam Smit 2014-09-11 14:28:05 UTC
OK let's start again. Please ignore everything before Comment 8.

I've uploaded three files:

1.) A PowerPoint-format file saved in MS Office PowerPoint 2010, which is 1.6 MB:
1.6M Sep 11 15:26 Staff_Update_September_2014_mso.pptx

2.) Opening the file from 1.) in LibreOffice Impress 4.2.6 and saving it as .pptx more than triples its size, to 5.8 MB:
5.8M Sep 11 15:31 Staff_Update_September_2014_lo.pptx

3.) Opening the file from 1.) in LibreOffice Impress 4.2.6 and saving it as .odp roughly halves its size, to 883 KB:
883K Sep 11 15:50 Staff_Update_September_2014_lo.odp


From what I can determine, when Impress writes out a .pptx file (i.e. file 2 above), many of the images that are used in the presentation (logos, backgrounds, etc) get saved multiple times as separate copies of the same image.


Possibly a problem in the export filter to .pptx?
Comment 12 ign_christian 2014-09-12 15:29:16 UTC
Confirmed under Ubuntu 12.04 x86 with:
- LO 4.0.6.2 : size becomes 3.1 MB
- LO 4.1.6.2 : size becomes 5.9 MB
- LO 4.2.6.3 : size becomes 6.0 MB
- LO 4.3.1.2 : can't be saved again as pptx, strange..

After unzipping the pptx, we can see many duplicate images in the 'media' folder. In the original pptx, the 'media' folder contains only 21 image files. After saving in 4.2.6.3 or 4.1.6.2, it grows to 269 files; in 4.0.6.2, to 110 files.

Not every image has duplicates; most of the duplicates are the logo in the footer (possibly because of master slides).
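
The duplicate check described above can be scripted. This is only a sketch: it groups the entries under `ppt/media/` by a SHA-256 hash of their bytes and reports every group with more than one member. The function name `find_duplicate_media` is invented for this example, and byte-identical content is assumed to be a good enough definition of "duplicate" (re-encoded copies of the same picture would not be caught).

```python
import hashlib
import zipfile
from collections import defaultdict


def find_duplicate_media(path_or_file, prefix="ppt/media/"):
    """Group archive entries under `prefix` by the SHA-256 of their content.

    Returns a dict mapping digest -> sorted list of entry names, keeping
    only digests that are shared by two or more entries.
    """
    groups = defaultdict(list)
    with zipfile.ZipFile(path_or_file) as zf:
        for info in zf.infolist():
            if info.is_dir() or not info.filename.startswith(prefix):
                continue
            digest = hashlib.sha256(zf.read(info)).hexdigest()
            groups[digest].append(info.filename)
    return {d: sorted(names) for d, names in groups.items() if len(names) > 1}
```

Running it on the original file and on the resaved one should make it easy to see which images (for example the footer logo) are written repeatedly.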
Comment 13 ign_christian 2014-09-12 15:35:36 UTC
(In reply to comment #12)
Sorry, I forgot to mention: I only tried saving as pptx.
Comment 14 QA Administrators 2016-02-21 08:36:41 UTC Comment hidden (obsolete)
Comment 15 Liam Smit 2016-02-22 06:55:39 UTC
Problem still exists in LibreOffice 5.1:

Original MS Office file:
1,6M Sep 11  2014 Staff_Update_September_2014_mso.pptx


Saved as PPTX by LO 5.1:
5,8M Feb 22 08:41 Staff_Update_September_2014_mso_lo_5-1.pptx


Saved as ODP by LO 5.1:
833K Feb 22 08:40 Staff_Update_September_2014_mso_lo_5-1.odp


OS: Ubuntu 14.04.4 LTS
LO: Via PPA (5.1.0 rc3)
Comment 16 QA Administrators 2017-03-06 15:35:37 UTC Comment hidden (obsolete)
Comment 17 Roman Kuznetsov 2019-01-11 12:08:13 UTC
Still reproducible in

Version: 6.3.0.0.alpha0+
Build ID: 6b4ea2d8ddd681fec98773d7e0bbec9657a1fc08
CPU threads: 4; OS: Windows 6.1; UI render: default; VCL: win; 
Locale: ru-RU (ru_RU); UI-Language: en-US
Calc: threaded

I got 3.2 MB for the resaved PPTX.
Comment 18 QA Administrators 2021-01-11 03:57:04 UTC Comment hidden (obsolete)
Comment 19 NISZ LibreOffice Team 2021-01-12 14:07:44 UTC
Still goes from 1.5 MB to 3.2 MB in:

Version: 7.2.0.0.alpha0+ (x64)
Build ID: 80497c7d81af36f703d122ac78baa26387a5854d
CPU threads: 4; OS: Windows 6.3 Build 9600; UI render: Skia/Raster; VCL: win
Locale: en-US (hu_HU); UI: en-US
Calc: CL
Comment 20 Commit Notification 2022-03-30 16:25:52 UTC
Tünde Tóth committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/aea8043bc5f5187498fa450505d6de9d6986e2a6

tdf#74670 tdf#91286 PPTX XLSX export: save image once

It will be available in 7.4.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
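
To illustrate the general idea behind "save image once", here is a small sketch in Python. It is not the code from the commit above, and it makes no claim about how the LibreOffice export filter is implemented; `MediaWriter` and `add_image` are hypothetical names. The assumption is simply that the writer remembers a content hash for every image it has already stored and hands back the existing entry name when the same bytes show up again.

```python
import hashlib
import zipfile


class MediaWriter:
    """Hypothetical helper: writes each distinct image into the archive once."""

    def __init__(self, zf, prefix="ppt/media/"):
        self._zf = zf
        self._prefix = prefix
        self._by_digest = {}  # content digest -> entry name already written

    def add_image(self, data, extension="png"):
        """Return the archive name for `data`, writing it only if unseen."""
        digest = hashlib.sha256(data).hexdigest()
        name = self._by_digest.get(digest)
        if name is None:
            # First time this content is seen: store it under a new name.
            name = f"{self._prefix}image{len(self._by_digest) + 1}.{extension}"
            self._zf.writestr(name, data)
            self._by_digest[digest] = name
        return name
```

With a writer like this, every slide that uses the same footer logo would reference one shared entry instead of each getting its own copy.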
Comment 21 NISZ LibreOffice Team 2022-03-31 09:56:26 UTC
Verified in:
Version: 7.4.0.0.alpha0+ (x64) / LibreOffice Community
Build ID: a3988b2d147a2442b348d58b79dbd6e71472b7af
CPU threads: 4; OS: Windows 10.0 Build 18363; UI render: Skia/Raster; VCL: win
Locale: hu-HU (hu_HU); UI: en-US
Calc: threaded