Bug 74670

Summary: Saving a pptx (created with MSO) in Impress as .pptx more than quadruples file-size (due to copies of images)
Product: LibreOffice
Reporter: Liam Smit <liam.smit>
Component: Impress
Assignee: Tünde Tóth <tundeth>
Status: VERIFIED FIXED
Severity: normal
CC: libreoffice
Priority: medium
Keywords: filter:pptx
Version: 3.3.0 release
Hardware: Other
OS: All
See Also: https://bugs.documentfoundation.org/show_bug.cgi?id=118535
https://bugs.documentfoundation.org/show_bug.cgi?id=91286
Whiteboard: target:7.4.0
Crash report or crash signature:
Regression By:
Bug Depends on:
Bug Blocks: 108226
Attachments: Internal file listing of MSO.pptx, LO.pptx and LO.odp
md5sum hashes of files in ppt/media
LibreOffice save as odp
LibreOffice save as pptx
Original MS Office PowerPoint presentation file saved in MS PowerPoint 2010.
MS Office PowerPoint presentation format file saved in LibreOffice Impress 4.2.6.
MS Office PowerPoint presentation file saved in LibreOffice Impress 4.2.6 in ODP format.

Description Liam Smit 2014-02-07 12:34:53 UTC
Created attachment 93600 [details]
Internal file listing of MSO.pptx, LO.pptx and LO.odp

I received a presentation saved in .pptx format by MS Office 2010:
1,4M Feb  7 11:28 mo_save_as_pptx.pptx

I opened it in Impress and then saved it as a .pptx file without making any changes. The resulting file is more than four times as large:
6,5M Feb  7 12:11 lo_save_as_pptx.pptx

Saving the same file as a .odp file without making any changes results in a file about a quarter the size of the original:
368K Feb  7 14:15 lo_save_as_odp.odp

So it seems that LibreOffice imports the .pptx file correctly but cannot write it back out correctly as .pptx; writing to .odp seems fine.

Examining the resulting files reveals that when Impress writes out a .pptx file, many of the stored images are duplicate copies of the same images used in the presentation (logos, backgrounds, etc.).

Please see the attached archive for the file listing of each of the three files, i.e. the MS Office .pptx, the LibreOffice .pptx and the LibreOffice .odp.
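To check for the duplicated images described above without unpacking the archive by hand, a short Python sketch can group every file under ppt/media/ by its MD5 digest, which is the same comparison the attached md5sum listing makes. The function name `find_duplicate_media` is hypothetical; it relies only on the standard `zipfile` and `hashlib` modules, since a .pptx file is a ZIP archive.

```python
import hashlib
import zipfile
from collections import defaultdict


def find_duplicate_media(pptx_path):
    """Group files under ppt/media/ in a .pptx archive by MD5 digest.

    Hypothetical helper. Returns a dict mapping each digest that occurs
    more than once to the sorted list of archive members sharing it.
    """
    groups = defaultdict(list)
    with zipfile.ZipFile(pptx_path) as archive:
        for name in archive.namelist():
            # Only embedded media parts are of interest; skip XML parts
            # and any directory entries.
            if not name.startswith("ppt/media/") or name.endswith("/"):
                continue
            digest = hashlib.md5(archive.read(name)).hexdigest()
            groups[digest].append(name)
    # Keep only content that is stored more than once.
    return {d: sorted(names) for d, names in groups.items() if len(names) > 1}
```

For example, `find_duplicate_media("lo_save_as_pptx.pptx")` should list the repeated logos and backgrounds, while an empty dict means no image is stored twice.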
Comment 1 Liam Smit 2014-02-07 13:03:28 UTC Comment hidden (obsolete)
Comment 2 Liam Smit 2014-02-24 10:30:32 UTC Comment hidden (obsolete)
Comment 3 retired 2014-02-27 12:43:56 UTC Comment hidden (obsolete)
Comment 4 Liam Smit 2014-02-27 13:00:14 UTC Comment hidden (obsolete)
Comment 5 Liam Smit 2014-02-27 13:46:09 UTC Comment hidden (obsolete)
Comment 6 Liam Smit 2014-02-27 13:47:27 UTC Comment hidden (obsolete)
Comment 7 QA Administrators 2014-09-03 21:32:37 UTC Comment hidden (obsolete)
Comment 8 Liam Smit 2014-09-11 14:08:40 UTC
Created attachment 106135 [details]
Original MS Office PowerPoint presentation file saved in MS PowerPoint 2010.

Note: the text and numbers were intentionally scrambled.
Comment 9 Liam Smit 2014-09-11 14:10:45 UTC
Created attachment 106136 [details]
MS Office PowerPoint presentation format file saved in LibreOffice Impress 4.2.6.

Note: text and numbers were intentionally scrambled.
Comment 10 Liam Smit 2014-09-11 14:12:56 UTC
Created attachment 106137 [details]
MS Office PowerPoint presentation file saved in LibreOffice Impress 4.2.6 in ODP format.

Note: text and numbers were intentionally scrambled.
Comment 11 Liam Smit 2014-09-11 14:28:05 UTC
OK, let's start again. Please ignore everything before Comment 8.

I've uploaded three files:

1.) PowerPoint file saved in MS Office PowerPoint 2010, 1.6 MB in size:
1.6M Sep 11 15:26 Staff_Update_September_2014_mso.pptx

2.) Opening the file from 1.) in LibreOffice Impress 4.2.6 and saving it as .pptx more than triples its size, to 5.8 MB:
5.8M Sep 11 15:31 Staff_Update_September_2014_lo.pptx

3.) Opening the file from 1.) in LibreOffice Impress 4.2.6 and saving it as .odp roughly halves its size, to 883 KB:
883K Sep 11 15:50 Staff_Update_September_2014_lo.odp

From what I can determine, when Impress writes out a .pptx file (i.e. file 2 above), many of the images used in the presentation (logos, backgrounds, etc.) are saved multiple times as separate copies of the same image.

Possibly a problem in the .pptx export filter?
Comment 12 ign_christian 2014-09-12 15:29:16 UTC
Confirmed under Ubuntu 12.04 x86 with:
- LO 4.0.6.2 : size become 3.1 MB
- LO 4.1.6.2 : size become 5.9 MB
- LO 4.2.6.3 : size become 6.0 MB
- LO 4.3.1.2 : cannot be re-saved as pptx at all, strangely.

After unzipping the pptx, the 'media' folder contains many duplicate images. In the original pptx, the 'media' folder contains only 21 image files; after saving in 4.2.6.3 or 4.1.6.2 it grows to 269 files, and after saving in 4.0.6.2 to 110 files.

Not all images have duplicates; most of the duplicates are the footer logo (possibly because of the master slides).
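The media-folder counts above (21 files in the original versus 269 after re-saving) can be reproduced with a minimal Python sketch that also estimates how much space the redundant copies take. `media_stats` is a hypothetical helper; it sums uncompressed sizes (`ZipInfo.file_size`), which for already-compressed image formats such as PNG and JPEG should be close to the space they occupy in the archive.

```python
import hashlib
import zipfile


def media_stats(pptx_path):
    """Summarise the ppt/media/ folder of a .pptx archive (hypothetical helper).

    Returns (file_count, unique_count, redundant_bytes), where
    redundant_bytes is the uncompressed size of every copy beyond the
    first of each distinct image.
    """
    seen = set()
    file_count = 0
    redundant_bytes = 0
    with zipfile.ZipFile(pptx_path) as archive:
        for info in archive.infolist():
            if not info.filename.startswith("ppt/media/") or info.is_dir():
                continue
            file_count += 1
            digest = hashlib.md5(archive.read(info)).hexdigest()
            if digest in seen:
                # Same bytes already counted once: this copy is redundant.
                redundant_bytes += info.file_size
            else:
                seen.add(digest)
    return file_count, len(seen), redundant_bytes
```

Running it on the original and on the re-saved file side by side would show whether the file count grows while the number of distinct images stays the same, which is what the duplication described here implies.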
Comment 13 ign_christian 2014-09-12 15:35:36 UTC
(In reply to comment #12)
Sorry, I forgot to mention: I only tried saving as pptx.
Comment 14 QA Administrators 2016-02-21 08:36:41 UTC Comment hidden (obsolete)
Comment 15 Liam Smit 2016-02-22 06:55:39 UTC
Problem still exists in LibreOffice 5.1:

Original MS Office file:
1,6M Sep 11  2014 Staff_Update_September_2014_mso.pptx


Saved as PPTX by LO 5.1
5,8M Feb 22 08:41 Staff_Update_September_2014_mso_lo_5-1.pptx


Saved as ODP by LO 5.1:
833K Feb 22 08:40 Staff_Update_September_2014_mso_lo_5-1.odp


OS: Ubuntu 14.04.4 LTS
LO: Via PPA (5.1.0 rc3)
Comment 16 QA Administrators 2017-03-06 15:35:37 UTC Comment hidden (obsolete)
Comment 17 Roman Kuznetsov 2019-01-11 12:08:13 UTC
Still reproducible in:

Version: 6.3.0.0.alpha0+
Build ID: 6b4ea2d8ddd681fec98773d7e0bbec9657a1fc08
CPU threads: 4; OS: Windows 6.1; UI render: default; VCL: win; 
Locale: ru-RU (ru_RU); UI-Language: en-US
Calc: threaded

I got 3.2 MB for the re-saved PPTX.
Comment 18 QA Administrators 2021-01-11 03:57:04 UTC Comment hidden (obsolete)
Comment 19 NISZ LibreOffice Team 2021-01-12 14:07:44 UTC
Still grows from 1.5 MB to 3.2 MB in:

Version: 7.2.0.0.alpha0+ (x64)
Build ID: 80497c7d81af36f703d122ac78baa26387a5854d
CPU threads: 4; OS: Windows 6.3 Build 9600; UI render: Skia/Raster; VCL: win
Locale: en-US (hu_HU); UI: en-US
Calc: CL
Comment 20 Commit Notification 2022-03-30 16:25:52 UTC
Tünde Tóth committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/aea8043bc5f5187498fa450505d6de9d6986e2a6

tdf#74670 tdf#91286 PPTX XLSX export: save image once

It will be available in 7.4.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
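The commit title ("save image once") describes the fix only at a high level. As an illustration of the general idea, content-based deduplication during export, and explicitly not the code from the commit (LibreOffice's export filters are written in C++), here is a Python sketch. The `MediaPartWriter` class and its `add_image` method are hypothetical: each image is keyed by a SHA-1 digest of its bytes, and a repeated image gets back the part name of its first copy, so every slide or layout relationship can point at a single ppt/media/ entry.

```python
import hashlib


class MediaPartWriter:
    """Hypothetical exporter helper that stores each distinct image once."""

    def __init__(self):
        self._by_digest = {}  # digest -> part name already assigned
        self.parts = {}       # part name -> bytes to write into the archive

    def add_image(self, data, extension):
        """Return the part name a relationship should reference for `data`."""
        digest = hashlib.sha1(data).hexdigest()
        existing = self._by_digest.get(digest)
        if existing is not None:
            # Identical bytes were seen before: reuse that part, no new copy.
            return existing
        name = f"ppt/media/image{len(self.parts) + 1}.{extension}"
        self.parts[name] = data
        self._by_digest[digest] = name
        return name
```

Under these assumptions, a footer logo placed on every master and layout would be written once, with each relationship targeting the same part name. A production implementation might also compare the bytes when digests match, to rule out hash collisions.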
Comment 21 NISZ LibreOffice Team 2022-03-31 09:56:26 UTC
Verified in:
Version: 7.4.0.0.alpha0+ (x64) / LibreOffice Community
Build ID: a3988b2d147a2442b348d58b79dbd6e71472b7af
CPU threads: 4; OS: Windows 10.0 Build 18363; UI render: Skia/Raster; VCL: win
Locale: hu-HU (hu_HU); UI: en-US
Calc: threaded