Bug 73270 - FILESAVE: images lost after saving to new file name and immediately exporting to PDF
Status: VERIFIED WORKSFORME
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 3.6.3.2 release (earliest affected)
Hardware: x86-64 (AMD64) Windows (All)
Importance: medium critical
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 47148
 
Reported: 2014-01-03 23:49 UTC by Matthias Basler
Modified: 2016-01-22 18:00 UTC (History)
9 users (show)

See Also:
Crash report or crash signature:


Attachments
Report with all images as created by GRAMPS (custom report) (646.37 KB, application/vnd.oasis.opendocument.text)
2014-01-04 20:43 UTC, Matthias Basler
Report after saving to different name and different directory and then saving again. (557.19 KB, application/vnd.oasis.opendocument.text)
2014-01-04 20:45 UTC, Matthias Basler
Exported PDF (with some missing images) after saving to different name and directory (583.11 KB, application/pdf)
2014-01-04 20:58 UTC, Matthias Basler
Document example in pdf format (does not exhibit problem) (1.43 MB, application/pdf)
2014-06-25 14:21 UTC, Wade D. Peterson

Description Matthias Basler 2014-01-03 23:49:25 UTC
I discovered a critical bug similar to 52226, but on Win7, which leads to data loss:

A large document containing lots of images is saved as ODF to a *new* file name (Save as...). Subsequently some images fail to load and if the document is then saved again (using Save or Save as), the saved file will have these images completely missing. These images obviously disappeared when saving to a new file.

The first saved file is ~50MB (like the original one), the second one only 30MB.

This is reproducible for me, at least with the particular document I use. (Sorry, I cannot attach the file since it contains sensitive data.)

It is an issue I am pretty sure I reported or confirmed 1-2 years ago already (might have been at the OOo bug tracker though). At that time it happened with AutoSave enabled, but obviously now AutoSave is not involved.

I am on Win7, 64 Bit on Core-i5 with LO 3.6.3
Comment 1 Matthias Basler 2014-01-03 23:52:26 UTC
P.S. Before you ask me to try this out in LO 4.1.x, I should note that due to another major reported issue (freezing when scrolling documents with lots of images), LO4.1.x is practically unusable for me as well. I have just reverted to 3.6.x.
Comment 2 Matthias Basler 2014-01-04 00:39:20 UTC
Reproduced with 3.6.7.2 as well (Win7, 64 Bit).

I should also note that I cannot reproduce this issue with only a 40-page subset of the original 300-page document. It probably occurs only with large documents, such as books.
Comment 3 retired 2014-01-04 10:38:00 UTC
Matthias, can you please provide a test document so that this can be tested and subsequently confirmed? If your document contains sensitive data, please clear that or replace it with random information. In your case, you could just replace any sensitive images with a random image that you repeat - I'm sure you can find plenty of cat pictures on the net ;)

Without a test document there isn't really anything that can be looked into.

Setting to NEEDINFO until more detail is provided.

After providing the requested info, please reset this bug to UNCONFIRMED. Thanks :)
Comment 4 Matthias Basler 2014-01-04 15:06:38 UTC
Well, having spent 3 hours trying to create a sample document, I give up.

It seems that every time I simplify an affected document and save it again, that document is then "stable", that is, I cannot reproduce the issue on it any more.
I even created a Lorem ipsum document with 100 pages and 200 images, but again I could not reproduce the issue on it.

---
There are, however, two details about the issue that I can share:

On the first document where the error occurs (a custom genealogy report generated by the open source GRAMPS application), the corrupted file consistently ends up at 30MB instead of the 50MB of the initial file. There is no obvious rule as to which images get lost, as far as I can tell.
The corrupted file has only 178 images (instead of the initial 193 images) in its "Pictures" subdirectory, and I can give you a typical before<->after section of the content.xml:

Before (correct):
-----
<draw:frame draw:style-name="fr4" draw:name="MYIMAGE_JPG" text:anchor-type="paragraph" svg:y="0cm" svg:width="3.11cm" svg:height="4.001cm" draw:z-index="1">
<draw:image xlink:href="Pictures/10000000000002BC000003848D4E4622.jpg" xlink:type="simple" xlink:show="embed" xlink:actuate="onLoad"/></draw:frame> </text:p></table:table-cell></table:table-row></table:table></table:table-cell></table:table-row>
-----

After:
-----
<draw:frame draw:style-name="fr4" draw:name="MYIMAGE_JPG" text:anchor-type="paragraph" svg:y="0cm" svg:width="3.11cm" svg:height="4.001cm" draw:z-index="3">
<draw:image/></draw:frame> </text:p></table:table-cell></table:table-row></table:table></table:table-cell></table:table-row>
-----
(Note that I replaced the filename with "MYIMAGE" for privacy reasons.)

As you can see, the "xlink:href" has been removed and only a "<draw:image/>" tag remains of the image. This is the same effect as described in bug 52638, comment 12. So my guess is that we have a duplicate here and that bug 52638 is not a Mac OS X-only issue.
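
For anyone who wants to check a saved file for this pattern without unpacking it by hand, here is a minimal sketch. It assumes the .odt is a regular ODF zip package with a content.xml at the top level, and it uses the standard ODF drawing and XLink namespaces. The function name find_broken_images is only an illustration and is not part of LibreOffice. It reports every frame whose draw:image has neither an xlink:href nor any child element (such as inline binary data).

```python
import zipfile
import xml.etree.ElementTree as ET

# Standard ODF / XLink namespace URIs used in content.xml.
DRAW_NS = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
XLINK_NS = "http://www.w3.org/1999/xlink"


def find_broken_images(odt_path):
    """Return the draw:name of each frame holding an empty <draw:image/>.

    Hypothetical helper: an image counts as broken when it has no
    xlink:href attribute and no child elements.
    """
    with zipfile.ZipFile(odt_path) as package:
        root = ET.fromstring(package.read("content.xml"))

    broken = []
    for frame in root.iter(f"{{{DRAW_NS}}}frame"):
        for image in frame.findall(f"{{{DRAW_NS}}}image"):
            has_href = image.get(f"{{{XLINK_NS}}}href") is not None
            has_children = len(image) > 0
            if not has_href and not has_children:
                broken.append(frame.get(f"{{{DRAW_NS}}}name", "(unnamed)"))
    return broken
```

Calling find_broken_images("report.odt") on a file like the "After" sample above should list the affected frame names; an empty list means no image lost its link in content.xml.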
Comment 5 Matthias Basler 2014-01-04 20:43:43 UTC
Created attachment 91494 [details]
Report with all images as created by GRAMPS (custom report)
Comment 6 Matthias Basler 2014-01-04 20:45:10 UTC
Created attachment 91495 [details]
Report after saving to different name and different directory and then saving again.
Comment 7 Matthias Basler 2014-01-04 20:58:25 UTC
Created attachment 91496 [details]
Exported PDF (with some missing images) after saving to different name and directory

I finally have a test case for you. It is yet another genealogy report from the GRAMPS database, but this time just a sample database, so no sensitive data. Fortunately (well, for testing) the problem already happens with this small report.

It is interesting to note that the affected images are the three images on the right-hand side of the first report page. In my previous long report the missing images were also usually on the right-hand side (right-aligned), although not every right-aligned image went missing, either there or in this sample. Now that I think of it: the affected images in the other problematic book were also on the right-hand side. This is probably not just coincidence.

If I scroll down after saving the document for the first time and then up I can see the "Read error" message on the affected images. If I then do a PDF export the images are missing in the resulting PDF, just the frames remain.

To me this looks as if during the "Save as" operation the links of the affected images to their in-memory location get broken, thus LO being unable to show and correctly save those images afterwards.

I hope you now have everything you need to fix this bug.
Comment 8 Matthias Basler 2014-01-04 21:01:29 UTC
Corrected the version to 3.6.3 (I had accidentally set it to 3.4.3) and changed the status to UNCONFIRMED again as requested.
Comment 9 tommy27 2014-02-02 09:57:12 UTC
(In reply to comment #7)
> Created attachment 91496 [details]
> Exported PDF (with some missing images) after saving to different name and
> directory
> 
> I finally have a test case for you.

this will finally allow debugging


> It is interesting to note that the affected images are the three images on
> the right hand side of the first report page.
> .... 
>  The affected images in the other problematic book were also on
> the right hand side. This is probably not just coincidence.

maybe not.

> If I scroll down after saving the document for the first time and then up I
> can see the "Read error" message on the affected images. If I then do a PDF
> export the images are missing in the resulting PDF, just the frames remain.

I confirm both things under Win7 64bit using 4.1.4.2.
however, if I close the .odt with "read error" on the images and then reopen it, there are no more missing images, and even after another "down & up" page scroll the images stay there and can be exported to PDF with no issue.

> To me this looks as if during the "Save as" operation the links of the
> affected images to their in-memory location get broken, thus LO being unable
> to show and correctly save those images afterwards.

let's hear a developer's impression.
adding Writer expert to CC list.


by the way, Matthias Basler, which LibO version are you currently using?
the 3.6.x branch is now obsolete and it's suggested to upgrade to the 4.1.x stable release (.4 already available, .5 going to be released in a few days)
Comment 10 tommy27 2014-02-02 10:06:45 UTC
still reproducible with Version: 4.3.0.0.alpha0+
Build ID: a995462e6855061816c6529c366f20ace2b45868
TinderBox: Win-x86@42, Branch:master, Time: 2014-01-31_23:29:34

edited summary notes.

actually the bug is just reproducible if you save test file with another file name, don't close it and directly export it to PDF.

PDF output will have missing images and the source ODT will temporarily show read error alerts.

however if you save test file to another file name, then close it and reopen the new version, PDF output will be correct and you will not notice any read error message in source ODT.

so it seems that images are temporarily corrupted after first save but then work again if you close and reopen.
Comment 11 Matthias Basler 2014-02-03 19:59:14 UTC
Hi tommy27.

Your observations in comment 10 are exactly what I observed as well. Please note that PDF export (after the first save) is only one aspect; the much more critical one is that saving a second time (without closing LO and reopening the file in between) will also create a corrupt file.

See my comment 1 as for one reason why I am still on 3.6.x branch. Basically working on very large documents is almost impossible on 4.1 due to two issues, and one, the very slow loading time, is still persistent on 4.2 to my knowledge.
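
To tell whether a second save has actually dropped images, the two saved packages can be compared directly. The following is a rough sketch, not a LibreOffice tool: the names picture_fingerprints and lost_pictures are made up for illustration, and it assumes that images are stored under Pictures/ in the zip package and that their bytes are not re-encoded between saves, so that CRC-32 plus size identifies the same picture even if it gets renamed.

```python
import zipfile


def picture_fingerprints(odt_path):
    """Map (CRC-32, uncompressed size) to the entry name for each picture.

    Hypothetical helper: only entries under Pictures/ are considered.
    """
    with zipfile.ZipFile(odt_path) as package:
        return {
            (info.CRC, info.file_size): info.filename
            for info in package.infolist()
            if info.filename.startswith("Pictures/") and not info.is_dir()
        }


def lost_pictures(first_save, second_save):
    """Return sorted names of pictures in first_save with no match in second_save."""
    before = picture_fingerprints(first_save)
    after = picture_fingerprints(second_save)
    return sorted(name for key, name in before.items() if key not in after)
```

For example, lost_pictures("first_save.odt", "second_save.odt") would list the pictures that existed after the first "Save as" but are gone after the second save; both file names here are placeholders.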
Comment 12 Cor Nouws 2014-02-03 20:12:29 UTC
(In reply to comment #11)

> See my comment 1 as for one reason why I am still on 3.6.x branch. Basically
> working on very large documents is almost impossible on 4.1 due to two
> issues, and one, the very slow loading time, is still persistent on 4.2 to
> my knowledge.

Hi Matthias,

I've seen some commits yesterday about images, larger files and such.
It would not surprise me if the cause of that trouble is the source of this issue too.
If you could try a daily build one of these days, that would be great :)
thanks a lot!

Cor
Comment 13 Matthias Basler 2014-02-03 20:20:42 UTC
Indeed Michael Stahl has fixed bug 73300 just yesterday, which was the main showstopper for me.
Comment 14 Matthias Basler 2014-06-20 12:42:22 UTC
Today I tested this issue with LibO 4.3.0.1 (RC1) and was not able to reproduce this phenomenon with any of three test documents, including the one attached here and two large sample "books" with lots of images.

Interestingly, I had one document open for quite some time today (on 4.3.0.1), during which I created several new documents and did tests in them. When I later returned to the first document, some images would show the "Read error" warning and a saved version of the document would indeed have these images missing.

Not sure what I should think of this...

Can anyone else please check if this is still reproducible with 4.3.0.1 or higher?
Comment 15 Wade D. Peterson 2014-06-25 14:21:11 UTC
Created attachment 101749 [details]
Document example in pdf format (does not exhibit problem)
Comment 16 Wade D. Peterson 2014-06-25 14:51:58 UTC
I just wanted to confirm that I have had a problem similar to that reported by Matthias for some years.  That is, when saving documents in pdf format the illustrations are sometimes lost.

This problem has occurred on earlier versions of OpenOffice, as well as my current version of LibreOffice 4.1.6.2 (stable) running under Windows 8.

This problem has been intermittent and is not reproducible.  It seems to occur on large files with many illustrations.  To show an example of the type of files where the problem occurred in the past, I've attached an example of a pdf file that correctly formatted under OpenOffice (WbBridge_28MAR2012.pdf).  This file was made about two years ago and formatted correctly, but the same problem would occasionally occur.  I was able to format this file correctly only after I manually paged through the entire document.  Once I had briefly inspected every page, I was able to successfully save it as a pdf.  [My intuition is that some type of caching operation was going on in the background while I did this].

This problem seems to be related to another problem where saving the document erases all of the illustrations.  For example, my current project is several hundred pages long with 132 illustrations.  The normal file size is about 2,747 Kb, but sometimes it wipes out all of the illustrations when I save it.  It did it yesterday using LibreOffice 4.1.6.2, and the resulting file size was cut down to 318 Kb.  Sorry, I can't publicly share this document until I deposit it at the Copyright Office.

Most of my illustrations are done with LibreOffice Draw.  I copy them from the Draw application and paste them into the Writer application using 'Edit' -> 'Paste Special' -> 'GDI Metafile'.  I change the file names every day by modifying the date at the end of the file name (I do this for legal reasons as a record of invention).  This supports the thread above, where the problem seems to happen when the document is saved as a new file.  Under my technique, I save it as a new file every day.  However, I have seen similar behavior when I do a 'copy and paste' under the Windows file system, and then change the file name.  After opening the new file for the first time, it sometimes seems to do the same thing.

This is going to be a tough bug to fix as it seems to happen only on big files and is intermittent.

(In reply to comment #14)
> Today I tested this issue with LibO 4.3.0.1 (RC1) and was not able to
> reproduce this phenomenon with either of three test documents, including the
> one attached here and two large sample "book" with lots of images.
> 
> Interestingly I had one document open for quite some time today (on
> 4.3.0.1), during which I created several new documents and do tests in them.
> When I later returned to the first document, some images would show the
> "Read error" warning and a saved version of the document would indeed have
> these images missing. 
> 
> Not sure what I should think of this...
> 
> Can please anyone else check if this is still reproducible with 4.3.0.1 or
> higher?
Comment 17 Cor Nouws 2014-07-08 07:43:15 UTC
isn't this just a sample of bug 47148 ??
Comment 18 Rupert Kolb 2014-07-11 09:46:22 UTC
Confirmation of comment 14:

I created a document (size about 23 MB, 220 images, 130 pages) with 4.2.x and former versions.
When opening it with 4.3.0.1 everything is still ok.
But after saving, and opening it with 4.2.5.2 or 4.3.0.1 once more, some(!) images are missing.

For me it is a "blocker". I have to stay at a 4.2 version, until this is solved.
Comment 19 Björn Michaelsen 2014-07-11 18:16:07 UTC
(In reply to comment #18)
> Confirmation of comment 14:
> I created a document (size about 23 MB, 220 images, 130 pages) with 4.2.x
> and former versions.
> When opening it with 4.3.0.1 everything is still ok.
> But after saving, and opening it with 4.2.5.2 or 4.3.0.1 once more, some(!)
> images are missing.

Great! Could you make that document available somewhere and explicitly name the images missing (e.g. by page number)?

With that, this might be bibisectable and would help to make sure if this is a regression in LibreOffice or if it is inherited. Alternatively, consider doing the bibisect yourself: https://wiki.documentfoundation.org/Bibisect

Also, please restrict yourself to triage information in bugzilla comments. Thanks.
Comment 20 Alex Thurgood 2015-01-03 17:41:03 UTC
Adding self to CC if not already on
Comment 21 QA Administrators 2016-01-17 20:04:27 UTC Comment hidden (obsolete)
Comment 22 Jean-Baptiste Faure 2016-01-22 06:08:43 UTC
Not reproducible for me in LO 5.1.1.0.0+ built at home under Ubuntu 15.10 x86-64.
Please test under MS-Windows.

Best regards. JBF
Comment 23 tommy27 2016-01-22 06:24:43 UTC
bug is gone under Win8.1 x64 using LibO 5.0.4.2

WORKSFORME
Comment 24 Matthias Basler 2016-01-22 16:51:13 UTC
I checked the bug using attachment 91494 [details] and could not reproduce it under Win7 64 Bit and LO 5.0.4.2. Neither could I reproduce it with the large report with which I originally discovered the bug.

Great.
Comment 25 tommy27 2016-01-22 18:00:06 UTC
Happy End  :-)