Bug 115005 - Regression LibO 6.0RC creates much larger files than 5.4 by including duplicate/redundant images
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Impress
Version: 6.0.0.2 rc (earliest affected)
Hardware: All Linux (All)
Importance: medium normal
Assignee: Serge Krot (CIB)
URL:
Whiteboard: target:6.1.0 target:6.0.4
Keywords: bibisected, bisected, regression
Depends on:
Blocks: Save 116266
 
Reported: 2018-01-14 18:17 UTC by sergio.callegari
Modified: 2018-05-12 12:07 UTC

See Also:
Crash report or crash signature:


Attachments
Sample file showing the issue (50.61 KB, application/vnd.oasis.opendocument.presentation)
2018-01-26 21:37 UTC, sergio.callegari
File saved by LibreOffice 5.4.4.2 (26.36 KB, application/vnd.oasis.opendocument.presentation)
2018-02-10 20:26 UTC, OfficeUser
File saved by LibreOffice 6.0.1.1 (43.50 KB, application/vnd.oasis.opendocument.presentation)
2018-02-10 20:27 UTC, OfficeUser

Description sergio.callegari 2018-01-14 18:17:52 UTC
Description:
I have a presentation which is about 800kB. If I open it with LibO6.0 RC2 and save it again, the file size jumps up to about 1.3MB.

This seems to be due to how images are managed.

Opening the presentation file with a zip tool reveals that the LibO 6.0 version of the presentation includes multiple png "copies" of an emf image included in the presentation.

Unfortunately, I cannot attach the presentation I'm working on here now. I'll try to see if I can create a reproducible test case. In the meantime, I am posting the bug in case:

- someone else experiences the same issue, so that they can refer to this report and maybe help provide a test case
- some developer can immediately recognize what may have led to this regression

Steps to Reproduce:
See description

Actual Results:  
See description

Expected Results:
See description


Reproducible: Always


User Profile Reset: No



Additional Info:
[Information automatically included from LibreOffice]
Locale: en-US
Module: StartModule
[Information guessed from browser]
OS: Linux (All)
OS is 64bit: yes


User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:57.0) Gecko/20100101 Firefox/57.0
Comment 1 sergio.callegari 2018-01-14 18:42:14 UTC
I seem to be able to reliably reproduce a difference between 5.4 and 6.0 in that 6.0 always stores a png version of any emf/wmf image inserted in a presentation, which already seems a regression to me.

It looks like I am not able to reproduce the case where I had multiple versions of the png associated to the same emf image (which was used both in master pages and in the slides).
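
One way to check whether a saved file carries such PNG versions next to its metafiles is to look for frames that hold more than one image. The following is a minimal sketch, not LibreOffice code: it assumes the extra PNG is written as an additional `draw:image` child of the same `draw:frame`, and the function name `frames_with_fallback` is made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# Standard ODF / XLink namespace URIs.
NS = {
    "draw": "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0",
    "xlink": "http://www.w3.org/1999/xlink",
}
HREF = "{%s}href" % NS["xlink"]


def frames_with_fallback(odf_path, members=("content.xml", "styles.xml")):
    """Return one list of image hrefs per draw:frame that holds several images.

    Hypothetical helper: assumes a fallback PNG is stored as a second
    draw:image child of the frame that also holds the vector image.
    """
    result = []
    with zipfile.ZipFile(odf_path) as package:
        names = set(package.namelist())
        for member in members:
            if member not in names:
                continue
            root = ET.fromstring(package.read(member))
            for frame in root.iter("{%s}frame" % NS["draw"]):
                hrefs = [img.get(HREF) for img in frame.findall("draw:image", NS)]
                if len(hrefs) > 1:
                    result.append(hrefs)
    return result
```

Calling `frames_with_fallback("demo.odp")` on a file saved by 5.4 and on the same file saved by 6.0 should show whether the additional images are per-frame fallbacks or something else.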
Comment 2 MM 2018-01-14 23:36:00 UTC
Could be it's: https://wiki.documentfoundation.org/ReleaseNotes/6.0#Improvements_to_ODF_Export

"Metafiles which were previously saved in the internal SVM (Star View Metafile) format are now accompanied by a PNG fallback graphic. This makes it easier for other ODF readers to display the graphics."
Comment 3 sergio.callegari 2018-01-15 10:25:17 UTC
Looks like a reasonable explanation for what I observed in my comment from 14 Jan (even if I'd very much prefer to have it configurable in the "compatibility" options, as this means saving all vector images twice, which often makes the difference between a document that can be sent via email and one that requires some large file attachment service).

What remains unexplained is the case where I got 3 identical PNGs of the same vector image in the odp file. They went away by resaving the doc with LibO 5.4, so - unfortunately - right now I do not have a file for analysis.

I suspect, but I cannot be sure, that this occurred after importing master pages in a presentation from another template, where these master pages from the other template contained the same images that were already in the presentation where they got imported. But I cannot be sure...
Comment 4 Xisco Faulí 2018-01-15 10:42:21 UTC
> What remains unexplained is the case where I got 3 identical PNGs of the
> same vector image in the odp file. They went away by resaving the doc with
> LibO 5.4, so - unfortunately - right now I do not have a file for analysis.

Putting to NEEDINFO until a file is provided...
Comment 5 sergio.callegari 2018-01-15 14:48:47 UTC
Here we go again. I have a file where, on opening the odp with a zip tool, I see two perfectly identical (same CRC) png files corresponding to the same wmf.

Archive:  demo.odp
  Length      Date    Time    Name
---------  ---------- -----   ----
       47  2018-01-15 14:38   mimetype
     7205  2018-01-15 14:38   Thumbnails/thumbnail.png
     1888  2018-01-15 14:38   meta.xml
    12049  2018-01-15 14:38   settings.xml
    33698  2018-01-15 14:38   content.xml
   298760  2018-01-15 14:38   Pictures/10000201000002C3000002C376C5E25DC0676B4B.png
   298760  2018-01-15 14:38   Pictures/10000201000002C3000002C3A4E810BA42055A45.png
      859  2018-01-15 14:38   Pictures/1000000000000020000000204B249CA79A42C6D7.png
        0  2018-01-15 14:38   Configurations2/floater/
        0  2018-01-15 14:38   Configurations2/menubar/
        0  2018-01-15 14:38   Configurations2/progressbar/
        0  2018-01-15 14:38   Configurations2/toolbar/
        0  2018-01-15 14:38   Configurations2/accelerator/current.xml
        0  2018-01-15 14:38   Configurations2/statusbar/
        0  2018-01-15 14:38   Configurations2/images/Bitmaps/
        0  2018-01-15 14:38   Configurations2/popupmenu/
        0  2018-01-15 14:38   Configurations2/toolpanel/
     1603  2018-01-15 14:38   META-INF/manifest.xml
   176031  2018-01-15 14:38   styles.xml
   315552  2018-01-15 14:38   Pictures/1004D0A00000E7400000E74040C4C8430774A921.wmf
---------                     -------
  1146452                     20 files

Furthermore, when this happens, one of the copies of that image appearing in the presentation gets deteriorated, as if rather than using the "perfect" vector version, LibO started using just a png for it.

Unfortunately, the file on which I am having the issue, while not particularly sensitive, includes vector versions of the template used for slides at my Institution and should not be openly shared.
Comment 6 sergio.callegari 2018-01-15 14:54:02 UTC
Indeed, one of the images stops being an emf and becomes a png! This is evident from using the "save" function, which now offers to save a png rather than an emf.

Specifically, I have some logo in vector form and cropped in some master pages. I also have the same logo in non cropped form in the last slide. This latter logo gets replaced by a png and, at the same time, the png file starts appearing twice in the odp.

With this the matter seems more serious than I originally expected, because some document content is lost (the vector image) and replaced by a lower fidelity one (the png version of the same image).
Comment 7 Xisco Faulí 2018-01-16 17:26:37 UTC
Could you please share the file?
Comment 8 sergio.callegari 2018-01-26 21:37:09 UTC
Created attachment 139388
Sample file showing the issue

Please find attached a file showing the issue.

Inside the "Pictures" folder, there are two png images, both 1.7kB in size and both with CRC 84D41D37, i.e. they are the same file.

    1742  Stored     1742   0% 2018-01-26 21:24 84d41d37  Pictures/10000200000001090000010943CE3636A5225AEB.png
    1742  Stored     1742   0% 2018-01-26 21:24 84d41d37  Pictures/100002000000010900000109E2FD4B392411EDF7.png

These correspond to the vector image

    5225  Stored     5225   0% 2018-01-26 21:24 ca96b3f8  Pictures/2000001000001B5900001B59FFB0B197CBF7C754.svm

This file shows the problem quite well. When a vector image is inserted in the presentation and then copied multiple times, LibO 6 on some occasions makes one png per copy, rather than one for each distinct vector image.
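
The manual check above (same size and same CRC for two entries under Pictures/) can be automated. The following is a minimal sketch using only the Python standard library; the function name `find_duplicate_pictures` is made up for illustration, and it compares SHA-256 digests of the uncompressed bytes rather than the zip CRC, which is a stricter test of "same file".

```python
import hashlib
import zipfile
from collections import defaultdict


def find_duplicate_pictures(odf_path):
    """Return groups of Pictures/ entries whose contents are byte-identical."""
    groups = defaultdict(list)
    with zipfile.ZipFile(odf_path) as package:
        for info in package.infolist():
            if info.is_dir() or not info.filename.startswith("Pictures/"):
                continue
            # Hash the uncompressed bytes of each picture.
            digest = hashlib.sha256(package.read(info)).hexdigest()
            groups[digest].append(info.filename)
    # Keep only digests shared by more than one entry.
    return [sorted(names) for names in groups.values() if len(names) > 1]
```

Running it on the attached sample file should list the two PNG entries shown above as a single group, if they are indeed identical.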
Comment 9 OfficeUser 2018-02-10 20:24:55 UTC
Confirmed. Using the builds 5.4.4.2 and 6.0.1.1 I did the following.

- With 5.4.4.2 I opened the attached file and deleted sheet 2.
- Saved the document as reference

- I opened the reference file in 5.4.4.2 and copied the two images (one after the other) into sheet 2
- Saved as "test_ref_5.4.4.2.odp".


- I opened the reference file in 6.0.1.1 and copied the two images (one after the other) into sheet 2
- Saved as "test_ref_6.0.1.1.odp".

Result:
test_ref_5.4.4.2.odp contains one PNG file. Size: 27.0 kB
test_ref_6.0.1.1.odp contains FIVE PNG files. Size: 44.5 kB


==> The file size grows by about 65% with 6.0.1.1!

I will attach the files saved by the two different LibreOffice builds.
Comment 10 OfficeUser 2018-02-10 20:26:58 UTC
Created attachment 139763
File saved by LibreOffice 5.4.4.2
Comment 11 OfficeUser 2018-02-10 20:27:39 UTC
Created attachment 139764
File saved by LibreOffice 6.0.1.1
Comment 12 Xisco Faulí 2018-02-15 18:28:58 UTC
Using attachment 139763,

it points me to

author	Samuel Mehrbrodt <Samuel.Mehrbrodt@cib.de>	2018-01-12 17:32:41 +0100
committer	Samuel Mehrbrodt <Samuel.Mehrbrodt@cib.de>	2018-01-15 13:50:10 +0100
commit	3da86d8987db6223b0acc5d8a1b56f7e0c54bbef (patch)
tree	0afdf8c0a0497ebfd8ef1303bfc51c5a3177a4a5
parent	0623f3a8f5d6fbc5e9b933cb034184084e8ac666 (diff)
tdf#114488 Rank multiple images also for flat odf
Only the file extension was considered before
which is not available in flat odf.

Now both internal and external URLs are resolved to their respective mimetype.



The file size is 30K before that commit and 39K after it.

Personally I don't consider a 9K increase a bug. I would, if the difference were from 30K to 300K ( 10 times ) or greater.
Closing as RESOLVED WONTFIX
Comment 13 OfficeUser 2018-02-15 21:21:40 UTC
@Xisco: Thanks for the bibisect!

In other cases, the size increase might be much higher. In my case it was +65 percent. Imagine files that are several megabytes...

I have added Samuel. Let's wait for his feedback. Perhaps it is easy to fix and just something he has overlooked...
Comment 14 OfficeUser 2018-02-15 21:29:49 UTC
Link to the patch related issue: Bug 114488
Comment 15 sergio.callegari 2018-02-15 21:57:55 UTC
Quoting from the initial report.

> I have a presentation which is about 800kB. If I open it with LibO6.0 RC2
> and save it again, the file size jumps up to about 1.3MB.

This is almost 2X.

I wonder if someone could clarify what "Rank multiple images also for flat odf. Only the file extension was considered before which is not available in flat odf. Now both internal and external URLs are resolved to their respective mimetype." actually means.

In my case, I insert an svg and then copy and paste it and I get multiple identical equivalent pngs in the odf. Why should both "internal" and "external" URLs be involved?

I am trying to understand what is going on, since I think that if LibO >=6 cannot get fixed, it should be possible to at least write an offline odf tool to get rid of all the duplicate figures, updating the internal references to them. I have tried keeping a LibO 5.4 around to do the roundtrip through it, to reduce file sizes, but this seems unreliable (at times figures disappear) and does more than desired (eliminates all the pngs corresponding to svgs, not just the redundant ones).
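
An offline tool of the kind described here could be sketched as follows. This is a rough illustration under stated assumptions, not a tested production tool: the function name `dedupe_pictures` is hypothetical, it assumes pictures are referenced only from content.xml and styles.xml by their package path, that manifest entries are self-closing `manifest:file-entry` elements, and that the package is neither encrypted nor signed.

```python
import hashlib
import re
import zipfile

# Assumption: only these members reference picture paths.
XML_MEMBERS = {"content.xml", "styles.xml"}
MANIFEST = "META-INF/manifest.xml"


def dedupe_pictures(src_path, dst_path):
    """Copy an ODF package, keeping one copy of each identical picture.

    Returns a dict mapping each dropped picture path to the path kept.
    """
    with zipfile.ZipFile(src_path) as src:
        infos = src.infolist()
        data = {info.filename: src.read(info) for info in infos}

    # Map every redundant picture to the first entry with the same bytes.
    first_seen = {}
    replaced = {}
    for info in infos:
        name = info.filename
        if info.is_dir() or not name.startswith("Pictures/"):
            continue
        digest = hashlib.sha256(data[name]).digest()
        if digest in first_seen:
            replaced[name] = first_seen[digest]
        else:
            first_seen[digest] = name

    with zipfile.ZipFile(dst_path, "w") as dst:
        for info in infos:  # original order keeps "mimetype" first
            name = info.filename
            if name in replaced:
                continue  # drop the redundant copy
            payload = data[name]
            if name in XML_MEMBERS:
                # Point references at the copy that is kept.
                for old, new in replaced.items():
                    payload = payload.replace(old.encode(), new.encode())
            elif name == MANIFEST:
                # Remove the manifest entry of each dropped picture.
                for old in replaced:
                    pattern = (rb'<manifest:file-entry\b[^>]*manifest:full-path="'
                               + re.escape(old.encode()) + rb'"[^>]*/>\s*')
                    payload = re.sub(pattern, b"", payload)
            dst.writestr(info, payload)  # reuses each entry's compression type
    return replaced
```

Passing the original `ZipInfo` objects to `writestr` keeps the entry order and per-entry compression, so the "mimetype" member stays first and stored as in the source package. The result should always be opened and checked in LibreOffice before the original file is discarded.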
Comment 16 Samuel Mehrbrodt (CIB) 2018-02-22 07:23:36 UTC
So as far as I understand there are a few issues here:
1) Image size grows because of the fallback images. I think we can consider adding an option whether to include fallback images or not.
2) Vector images are being replaced with PNGs, and the vector images are gone afterwards. Needs investigation.
3) Fallback PNGs are sometimes added twice. Also needs investigation.

Related commits:
https://cgit.freedesktop.org/libreoffice/core/commit/?id=6b3cc69fd2b2de5ace68f2739eb383267d66f76f
https://cgit.freedesktop.org/libreoffice/core/commit/?id=38602abc2d2b59bc3644e37797b9b1bc779fd993
https://cgit.freedesktop.org/libreoffice/core/commit/?id=2d3023c9713c4c7cac732a6831c69dec581a7751

(The commit mentioned above (3da86d8987db6223b0acc5d8a1b56f7e0c54bbef) should be unrelated to this issue, as it only affects importing.)
Comment 17 sergio.callegari 2018-02-22 11:18:47 UTC
Nice summary. I'd add that

a) I have a feeling that 2) might be related to round trips involving both LibO 6 and LibO 5.x.

b) When roundtrips involving LibO 5.x are successful, 3) is often solved. That is: take a LibO file with duplicate equivalent PNGs; open it with LibO 5.4.x; save (all the equivalent PNGs are discarded); open with LibO 6.x; save (equivalent PNGs are re-created, typically not in duplicate fashion).
Comment 18 Commit Notification 2018-03-20 06:48:44 UTC
Serge Krot committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=79b2f1cb36ea4fec61b0620085313eb53fce9fa0

tdf#115005 Do not remove original vector images from slides

It will be available in 6.1.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 19 Commit Notification 2018-03-21 07:46:26 UTC
Serge Krot committed a patch related to this issue.
It has been pushed to "libreoffice-6-0":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=070f3db51da48c70cde12050c18fb03de2192c0f&h=libreoffice-6-0

tdf#115005 Do not remove original vector images from slides

It will be available in 6.0.4.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 20 Commit Notification 2018-03-28 06:57:25 UTC
Serge Krot committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=1c1160967acf49cffae8921f3ab8361821bbaaaf

tdf#115005: New option to prevent adding fallback images

It will be available in 6.1.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 21 Samuel Mehrbrodt (CIB) 2018-04-04 06:05:22 UTC
*** Bug 115898 has been marked as a duplicate of this bug. ***
Comment 22 Xisco Faulí 2018-04-19 11:05:41 UTC
This issue is still reproducible in master

Version: 6.1.0.0.alpha0+
Build ID: cb5f6503f593d7c7a719542281b9efd274134f7c
CPU threads: 4; OS: Linux 4.13; UI render: default; VCL: gtk3; 
Locale: ca-ES (ca_ES.UTF-8); Calc: group

Let's keep bug 117074 as a follow-up bug...
Comment 23 opensuse.lietuviu.kalba 2018-05-12 12:07:03 UTC
(In reply to Commit Notification from comment #18)
> Serge Krot committed a patch related to this issue.
> It has been pushed to "master":
> 
> http://cgit.freedesktop.org/libreoffice/core/commit/
> ?id=79b2f1cb36ea4fec61b0620085313eb53fce9fa0
> 
> tdf#115005 Do not remove original vector images from slides
> 


Sorry, Serge Krot, but it seems your changes are unrelated to the description:

https://cgit.freedesktop.org/libreoffice/core/commit/?id=79b2f1cb36ea4fec61b0620085313eb53fce9fa0 and https://cgit.freedesktop.org/libreoffice/core/commit/?id=070f3db51da48c70cde12050c18fb03de2192c0f&h=libreoffice-6-0 talk only about SVG, and about SVG as image/x-vclgraphic.

https://www.openoffice.org/api/docs/common/ref/com/sun/star/graphic/GraphicDescriptor.html says "internal mime type image/x-vclgraphic, in which case the original mime type is not available anymore"

https://www.openoffice.org/api/docs/common/ref/com/sun/star/graphic/GraphicDescriptor.html indicates that many vector images have their own mimetypes, e.g. image/svg+xml, image/x-emf, image/x-eps, image/x-wmf. Strangely, I don't see PDF, though it is supported.

I have opened a new bug for PDF images being discarded from ODT documents (though not slides): https://bugs.documentfoundation.org/show_bug.cgi?id=117576