Bug 34805 - Recovery of MS OFFICE documents fails after crash
Summary: Recovery of MS OFFICE documents fails after crash
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 3.3.1 release
Hardware: Other All
Importance: highest critical
Assignee: Not Assigned
URL:
Whiteboard: target:3.4.2
Keywords:
Depends on:
Blocks: mab3.4
 
Reported: 2011-02-27 12:34 UTC by Steve Edmonds
Modified: 2011-12-23 17:09 UTC
CC List: 7 users

See Also:
Crash report or crash signature:


Attachments
Screenshots, See Comment 2 (98.90 KB, application/pdf)
2011-05-30 11:12 UTC, Rainer Bielefeld Retired
Failed recovery 3.4.1, note 9 (60.53 KB, image/png)
2011-07-16 13:37 UTC, Steve Edmonds
[1] Screenshots: AutoRecovery process - error messages (94.21 KB, application/pdf)
2011-07-17 04:27 UTC, manj_k
[2] Original 'autorecovery.doc' (created with LibO 3.4.2 RC1) (56.00 KB, application/msword)
2011-07-17 04:28 UTC, manj_k
[3] Recovered file 'autorecovery.doc_1.odt' (58.04 KB, application/vnd.oasis.opendocument.text)
2011-07-17 04:30 UTC, manj_k
[4] AutoRecovery info 1 / backup folder – 'autorecovery.doc_0.odt' (57.79 KB, application/vnd.oasis.opendocument.text)
2011-07-17 04:32 UTC, manj_k
[5] AutoRecovery info 2 / Temp folder – 'lu4989y.tmp' (57.79 KB, application/octet-stream)
2011-07-17 04:33 UTC, manj_k
[6] Original file 'He heard quiet steps behind him.doc' (created with MSWord 2000) (22.50 KB, application/msword)
2011-07-17 06:33 UTC, manj_k
[7] Recovered file 1 'He heard quiet steps behind him.doc' (15.00 KB, application/msword)
2011-07-17 06:35 UTC, manj_k
[8] Recovered file 2 'He%20heard%20quiet%20steps%20behind%20him.doc_0.odt' (16.52 KB, application/vnd.oasis.opendocument.text)
2011-07-17 06:37 UTC, manj_k
[9] AutoRecovery info 1 / backup folder – 'He heard quiet steps behind him.doc_0.odt' (16.52 KB, application/vnd.oasis.opendocument.text)
2011-07-17 06:39 UTC, manj_k
[10] AutoRecovery info 2 / Temp folder – 'luautmz.tmp' (16.52 KB, application/octet-stream)
2011-07-17 06:41 UTC, manj_k

Description Steve Edmonds 2011-02-27 12:34:46 UTC
If the file you are editing is a .doc file, the autorecovery information cannot be recovered.
Edit a .doc file and set the autorecovery information save interval to, say, 5 minutes.

Add some content to the file and wait more than 5 minutes without saving.
Crash (kill) LibreOffice. The document fails to recover, with the message "Read-Error: Error reading file".

Reproducible on OS X and Windows 7.
Also occurs in OpenOffice.org 3.3.
Comment 1 Steve Edmonds 2011-02-27 16:33:04 UTC
Also under Linux (OpenSUSE)
Comment 2 Rainer Bielefeld Retired 2011-05-30 11:08:53 UTC
[Reproducible] with "LibreOffice 3.4.0RC2  – WIN7  Home Premium  (64bit) English UI [OOO340m1 (Build:12)]". I created a test document "mytest.doc" and provoked a crash by exporting it to PDF, using "Bug 36820 - PDFEXPORT with option "Tagged PDF" crashes WRITER documents". LibO crashed, and when I restarted it I got some "Read error" message during recovery; I do not remember the exact wording.

I checked the backup folder and found a file "mytest.doc_1.odt". After I copied it to another folder and renamed it to "something.doc", LibO was able to open "something.doc" without problems, and the document was a perfect copy of my "mytest.doc" with all its contents.

So it seems that either the name of the recovery copy "mytest.doc_1.odt" is wrong, or LibO expects a different name, or something else is going on.

In an additional test I renamed the backup document from "__mytest.doc_0.odt" to "__mytest.doc", which was the name of my new sample, but that did not work either: when I started recovery I got the error message "__mytest.doc_0.odt not available".

I also did a similar test with an Excel document. Believe it or not, this time the .doc and the .odt I had also opened were recovered without problems, but for the Excel document I got the error message "Read Error, Unknown or unsupported Excel file format."

It seems recovery for MS OFFICE documents is messed up.
Comment 3 Rainer Bielefeld Retired 2011-05-30 11:12:12 UTC
Created attachment 47332 [details]
Screenshots, See Comment 2
Comment 4 Petr Mladek 2011-06-09 07:36:57 UTC
This might cause data loss => we should do our best to fix it for 3.4.1 => increasing severity and priority.
Comment 5 jjthaden 2011-07-01 15:32:34 UTC
(In reply to comment #1)
> Also under Linux (OpenSUSE)

Also in Win XP
Comment 6 Don't use this account, use tml@iki.fi 2011-07-15 02:33:02 UTC
Could not reproduce in a straightforward test with 3.4.2 rc1 on Windows XP. I set the autorecovery interval to one minute, edit a .doc document by adding a bit of text, wait for over one minute, and terminate the soffice.bin process in Task Manager. Then I restart LibreOffice. It offers to recover the document that was being edited, and I click "Start Recovery". Next it offers to send a crash report (huh, I thought we didn't have this functionality; where do these crash reports go?), and I click Cancel. The document opens with my unsaved additions intact.

Is there something special about the initial reporter's situation? Where do you store the temporary files, what is the path for "Temporary files" in Tools:Options:LibreOffice:Paths ?

I think the claim that the problem the initial bug reporter sees is specific to MSO file types is a red herring. As far as I understand, autorecovery backup files are in fact saved in the appropriate ODF format, i.e. .odt for text documents (though the file names used end in .tmp). Autorecovery also works for documents that were created from scratch and never saved in any format at all; how would that work if autorecovery used the format the document was loaded as, or last saved as?

Rainer, I am afraid I am going to ignore your complex comment, it just confuses me... I don't know what backup folder and files you are talking about.
Comment 7 Rainer Bielefeld Retired 2011-07-15 03:00:55 UTC
(In reply to comment #6)

> Rainer, I am afraid I am going to ignore your complex comment, it just confuses
> me... I don't know what backup folder and files you are talking about.

It confuses me now, too. I wrote that comment while testing, with all the confusion that came with it. I think the upshot is that the recovery documents are available, but LibO does not find them because it expects different names (or something similar). If you do not find another approach (and if you can't reproduce it, that will be hard ...), I can retry and maybe find a more direct way.
Comment 8 Rainer Bielefeld Retired 2011-07-16 07:40:43 UTC
That was simple, it seems the bug has vanished. No longer reproducible with "LibreOffice 3.4.1  - WIN7  Home Premium (64bit) German UI [OOO340m1 (Build:103)]", also works fine with Master dev-build.

Closing due to latest results.
Comment 9 Steve Edmonds 2011-07-16 13:35:52 UTC
I can still reproduce this every time on 3.4.1 on OSX.

Set "Save auto recovery information" to 2 minutes.
Don't tick "Always create backup copy"

Open a .doc file
Change some things
Kill LO (force quit on a mac)

Open LO again. The attached screen shot is what I get, document is not recovered.
steve
Comment 10 Steve Edmonds 2011-07-16 13:37:29 UTC
Created attachment 49184 [details]
Failed recovery 3.4.1, note 9
Comment 11 Rainer Bielefeld Retired 2011-07-16 23:40:00 UTC
@Steve:
Strange! Since the problem is no longer reproducible for me, might we have two different causes with the same symptom?

I have some difficulty understanding your crash method; can you please use a known crash bug instead (see below)?

Can you do an additional test: try to crash LibO with a document you can attach here (containing a simple line like "This is first test YYYYMMDDhhmm"), using "[Bug 39159] PRINTING: Switching radio buttons "Selection <-> All" Crashes"? (If it works, please confirm that bug there for your OS.)
Please check your backup folder before recovery:
a) Is a copy of the document in the backup folder?
b) If yes, copy the backup document to another folder.
c) Check whether that copy can be opened after recovery.
d) If not, attach the copy here as well.
e) Try again with a clean new user profile.

We need much more information than a simple "it does not work"
Comment 12 manj_k 2011-07-17 04:25:45 UTC
Reproducible [1]
with LibO 3.4.2 RC1 on WinXP 32b
[LibreOffice 3.4.2 OOO340m1 (Build:201)] 

AutoRecovery doesn't work for the attached 'autorecovery.doc'. [2]
The recovered file is always the last (manually) saved file. [3]

The up-to-date AutoRecovery info is saved:
1. in the user\backup folder 
(as 'autorecovery.doc_0.odt' / 'autorecovery.doc_1.odt')

2. in the LOKALE~1\Temp folder
(as 'lu4989y.tmp', or a similar name)

(or rather, the paths set in Tools > Options > LibreOffice > Paths >
Backups / Temporary files).

The up-to-date AutoRecovery info isn't available for the AutoRecovery process.


Attachments:

[1] Screenshots: AutoRecovery process - error messages

[2] Original 'autorecovery.doc' (created with LibO 3.4.2 RC1)

[3] Recovered file 'autorecovery.doc_1.odt'

[4] AutoRecovery info 1 / backup folder – 'autorecovery.doc_0.odt'

[5] AutoRecovery info 2 / Temp folder – 'lu4989y.tmp'
Comment 13 manj_k 2011-07-17 04:27:34 UTC
Created attachment 49203 [details]
[1] Screenshots: AutoRecovery process - error messages
Comment 14 manj_k 2011-07-17 04:28:32 UTC
Created attachment 49204 [details]
[2] Original 'autorecovery.doc' (created with LibO 3.4.2 RC1)
Comment 15 manj_k 2011-07-17 04:30:32 UTC
Created attachment 49205 [details]
[3] Recovered file 'autorecovery.doc_1.odt'
Comment 16 manj_k 2011-07-17 04:32:52 UTC
Created attachment 49206 [details]
[4] AutoRecovery info 1 / backup folder – 'autorecovery.doc_0.odt'
Comment 17 manj_k 2011-07-17 04:33:54 UTC
Created attachment 49207 [details]
[5] AutoRecovery info 2 / Temp folder – 'lu4989y.tmp'
Comment 18 manj_k 2011-07-17 06:31:41 UTC
LibO 3.4.2 RC1 on WinXP 32b
[LibreOffice 3.4.2 OOO340m1 (Build:201)]

Another sample, without data loss

1. 'He heard quiet steps behind him.doc' - created with MSWord 2000 [6]
2. Opened with Writer
3. Modified the document
4. Killed soffice.bin

The recovered file 'He heard quiet steps behind him.doc' was the last saved file (without modifications) [7]

From AutoRecovery process (step 4):
"The automatic recovery process was interrupted.
The document listed below will be saved in the folder noted below if you click 'Save'"
(Path: Tools > Options > LibreOffice > Paths > My Documents)

This saved file 'He%20heard%20quiet%20steps%20behind%20him.doc_0.odt' is up-to-date with all current AutoRecovery info. [8] [9] [10]


Attachments:

[6] Original file 'He heard quiet steps behind him.doc' (created with MSWord 2000)

[7] Recovered file 1 'He heard quiet steps behind him.doc'

[8] Recovered file 2 'He%20heard%20quiet%20steps%20behind%20him.doc_0.odt'

[9] AutoRecovery info 1 / backup folder – 'He heard quiet steps behind him.doc_0.odt'

[10] AutoRecovery info 2 / Temp folder – 'luautmz.tmp'
Comment 19 manj_k 2011-07-17 06:33:41 UTC
Created attachment 49209 [details]
[6] Original file 'He heard quiet steps behind him.doc' (created with MSWord 2000)
Comment 20 manj_k 2011-07-17 06:35:04 UTC
Created attachment 49210 [details]
[7] Recovered file 1 'He heard quiet steps behind him.doc'
Comment 21 manj_k 2011-07-17 06:37:19 UTC
Created attachment 49211 [details]
[8] Recovered file 2 'He%20heard%20quiet%20steps%20behind%20him.doc_0.odt'
Comment 22 manj_k 2011-07-17 06:39:49 UTC
Created attachment 49212 [details]
[9] AutoRecovery info 1 / backup folder – 'He heard quiet steps behind him.doc_0.odt'
Comment 23 manj_k 2011-07-17 06:41:31 UTC
Created attachment 49213 [details]
[10] AutoRecovery info 2 / Temp folder – 'luautmz.tmp'
Comment 24 Rainer Bielefeld Retired 2011-07-17 07:27:36 UTC
Sigh, with manj_k's "autorecovery.doc" and proceeding as per Comment 11 I can reproduce the problem with 3.4.1 (Read error).

Unbelievable: today I get the problem with my simple test document from Comment 8. Unfortunately I have no idea why this problem sometimes does not appear.

I believe the most promising approach will be that Tor tries again to reproduce with "autorecovery.doc"?
Comment 25 Rainer Bielefeld Retired 2011-07-17 07:53:22 UTC
Nothing mysterious about my good test results any longer: I did the tests from Comment 8 with my source .odt documents, not with my test .doc documents :-(

@manj_k:
Thank you for your endurance.
Comment 26 Don't use this account, use tml@iki.fi 2011-07-18 00:53:47 UTC
I think it is fairly pointless to attach screenshots, intermediate backup files, recovered files etc. to this bug report; I am not really going to look at them. What is needed is exact reproduction instructions that work 100% reliably, plus possibly a sample document if for some reason the problem cannot be reproduced from exact minimal instructions alone.

I am now going to concentrate on the autorecovery.doc from comment #14.

For the Mac, wouldn't using the kill command from a shell be better than "force quit" from Finder? For all I know, "force quit" might first try to make the process quit in a "clean" fashion, and thus not match a real crash.
Comment 27 Don't use this account, use tml@iki.fi 2011-07-18 01:10:27 UTC
BTW, why do you people use *two* minutes when testing this? Why not one minute, which would make it a bit less tedious to reproduce the bug? Have you noticed a difference in behaviour between one and two minutes?
Comment 28 Don't use this account, use tml@iki.fi 2011-07-18 01:14:43 UTC
OK, with the autorecovery.doc from comment I indeed see the stupid "Read-Error. Error reading file" message when the autorecovery is in progress. Could it be the image in the document that causes the recovery problem? Anyway, no need for any more instructions or attachments now, thanks!
Comment 29 Don't use this account, use tml@iki.fi 2011-07-18 04:58:51 UTC
BTW, when I talked in comment #6 about the automatically saved backup copies of documents always being in ODF, I was misunderstanding, I think... I was looking at the files in the temp directory's lu*.tmp subdirectory, not at those in LO's user/backup directory.
Comment 30 Don't use this account, use tml@iki.fi 2011-07-18 05:02:31 UTC
Oh, and to reproduce the problem from scratch, it is enough to insert a picture from a file into an empty text document, and save as .doc. Then open it, add some minimal amount of text, and wait for it to be auto-saved. Then kill soffice.bin, start LO, and watch the recovery fail.
Comment 31 Don't use this account, use tml@iki.fi 2011-07-18 05:09:03 UTC
Eh, and now I can reproduce it even with no picture inserted into the document, wtf. I thought I had checked that already last week.
Comment 32 Steve Edmonds 2011-07-18 13:12:31 UTC
I have had it on SUSE Linux; I first noticed it with LO 3.3.2 in a real "hang" situation.
I can reproduce it every time on Linux with LO 3.3.2 and on a Mac with 3.4.1, using a .doc without a picture.
I always start with a .doc created in MSO and sent to me, and I have not yet had a .doc that does not exhibit this problem. The .doc is edited but neither saved nor saved as, and I wait for the autorecovery interval to elapse before killing LO or flicking the switch at the wall.
Comment 33 Don't use this account, use tml@iki.fi 2011-07-19 02:51:39 UTC
My current theory is that the autosave files (the ones stored in the user-specific user/backup directory) are always in ODF format (and indeed have names that say so, like x4.doc_0.odt for a document loaded from the file x4.doc). They do seem to be well-formed, proper ODF documents. But when the autorecovery then tries to restore the in-progress edit state of the document from the autosave file, it uses the filter for the original document, i.e. it tries to load the autosave file x4.doc_0.odt as if it were a .doc document.
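The theory above can be checked without trusting file names, because ODF files and legacy binary .doc files have distinct signatures. The sketch below is an illustration only, not LibreOffice code; `sniffContainer` and the `Container` enum are hypothetical names. It relies on two documented facts: binary Office files are OLE2 compound files that start with the magic bytes D0 CF 11 E0 A1 B1 1A E1, and the ODF packaging rules require the `mimetype` entry to be the first zip entry, stored uncompressed and without an extra field, so its MIME type string begins at byte 38 of the file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Rough classification of the files involved in this bug.
enum class Container { Odf, Ole2, Unknown };

// Reads a little-endian 16-bit field (zip headers are little-endian).
std::uint16_t le16(const unsigned char* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

// Classifies a file by its leading bytes, ignoring its name and extension.
Container sniffContainer(const std::vector<unsigned char>& head) {
    // OLE2 compound file magic, used by binary .doc/.xls/.ppt files.
    static const unsigned char kOle2Magic[8] = {0xD0, 0xCF, 0x11, 0xE0,
                                                0xA1, 0xB1, 0x1A, 0xE1};
    if (head.size() >= sizeof kOle2Magic &&
        std::memcmp(head.data(), kOle2Magic, sizeof kOle2Magic) == 0) {
        return Container::Ole2;
    }

    // ODF package: a zip whose first local file header describes a stored
    // (uncompressed) "mimetype" entry with no extra field, so the entry name
    // is at offset 30 and the MIME type string starts at offset 38.
    static const char kOdfPrefix[] = "application/vnd.oasis.opendocument.";
    const std::size_t prefixLen = sizeof kOdfPrefix - 1;
    const std::size_t dataOffset = 38;
    if (head.size() >= dataOffset + prefixLen &&
        std::memcmp(head.data(), "PK\x03\x04", 4) == 0 &&  // local file header
        le16(head.data() + 8) == 0 &&                      // method 0: stored
        le16(head.data() + 26) == 8 &&                     // name length
        le16(head.data() + 28) == 0 &&                     // extra field length
        std::memcmp(head.data() + 30, "mimetype", 8) == 0 &&
        std::memcmp(head.data() + dataOffset, kOdfPrefix, prefixLen) == 0) {
        return Container::Odf;
    }
    return Container::Unknown;
}
```

Fed the first hundred or so bytes of a backup such as x4.doc_0.odt, a check like this would be expected to report `Container::Odf`, while a binary Word 97 original such as x4.doc would report `Container::Ole2`, regardless of which filter name the recovery code passes along.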
Comment 34 Don't use this account, use tml@iki.fi 2011-07-19 03:06:53 UTC
Here is a stack trace from SfxBaseModel::load() while attempting the recovery, where in the seqArguments "FilterName" is "MS Word 97" even though "URL" is "file:///C:/Documents%20and%20Settings/N.N/Application%20Data/LibreOffice/3/user/backup/x4.doc_0.odt", and that file indeed is a proper .odt file (and it *does* contain the document as it was when being edited and I intentionally killed the soffice.bin process).


>	sfxmi.dll!SfxBaseModel::load(const com::sun::star::uno::Sequence<com::sun::star::beans::PropertyValue> & seqArguments=0x00e9ec80 {size=9})  Line 1852	C++
 	sfxmi.dll!SfxBaseModel::recoverFromFile(const rtl::OUString & i_SourceLocation={...}, const rtl::OUString & i_SalvagedFile={...}, const com::sun::star::uno::Sequence<com::sun::star::beans::PropertyValue> & i_MediaDescriptor=0x00e9ee3c {size=8})  Line 1788 + 0x30 bytes	C++
 	fwkmi.dll!framework::AutoRecovery::implts_openOneDoc(const rtl::OUString & sURL={...}, comphelper::MediaDescriptor & lDescriptor={...}, framework::AutoRecovery::TDocumentInfo & rInfo={...})  Line 2663 + 0xb2 bytes	C++
 	fwkmi.dll!framework::AutoRecovery::implts_openDocs(const framework::DispatchParams & aParams={...})  Line 2548	C++
 	fwkmi.dll!framework::AutoRecovery::implts_doRecovery(const framework::DispatchParams & aParams={...})  Line 3018 + 0xc bytes	C++
 	fwkmi.dll!framework::AutoRecovery::implts_dispatch(const framework::DispatchParams & aParams={...})  Line 637	C++
 	fwkmi.dll!framework::AutoRecovery::implts_asyncDispatch(void * __formal=0x00000000)  Line 1631	C++
 	fwkmi.dll!framework::AutoRecovery::LinkStubimplts_asyncDispatch(void * pThis=0x06164908, void * pCaller=0x00000000)  Line 1620 + 0xf bytes	C++
 	tlmi.dll!01183544() 	
 	[Frames below may be incorrect and/or missing, no symbols loaded for tlmi.dll]	
 	vclmi.dll!vcl::EventPoster::DoEvent_Impl()  + 0x12 bytes	C++
 	vclmi.dll!vcl::EventPoster::LinkStubDoEvent_Impl()  + 0xe bytes	C++
 	tlmi.dll!01183544() 	
 	vclmi.dll!ImplHandleClose()  + 0x151 bytes	C++
 	vclmi.dll!ImplWindowFrameProc()  + 0x2e5 bytes	C++
 	vclmi.dll!SalFrame::CallCallback()  + 0x16 bytes	C++
 	vclmi.dll!SalFrameWndProc()  + 0x788 bytes	C++
 	vclmi.dll!SalFrameWndProcW()  + 0x30 bytes	C++
 	user32.dll!7e368734() 	
 	user32.dll!7e368816() 	
 	user32.dll!7e3689cd() 	
 	user32.dll!7e368a10() 	
 	vclmi.dll!ImplDispatchMessage()  + 0xc bytes	C++
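The mismatch visible in this trace can be reduced to a few lines. In this hypothetical sketch a `std::map` stands in for the UNO media descriptor, modelling only the two properties shown above, "URL" and "FilterName"; `descriptorForSalvagedFile` is an invented name, and the ODF text filter name "writer8" is an assumption made for illustration. The rule it applies (switch to the ODF filter when the salvaged copy is an .odt) is one possible fix, not necessarily what the commit in comment 39 does.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a UNO media descriptor: property name -> value.
// Only the two properties visible in the stack trace are modelled.
using MediaDescriptor = std::map<std::string, std::string>;

// Assumed name of the ODF text document filter.
const std::string kOdfTextFilter = "writer8";

bool endsWith(const std::string& s, const std::string& suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Builds the descriptor used to load an autosaved (salvaged) copy.
// Copying "FilterName" from the original document unchanged reproduces the
// bug: "MS Word 97" ends up paired with an .odt backup file. This sketch
// overrides the filter when the backup is an ODF text file.
MediaDescriptor descriptorForSalvagedFile(const MediaDescriptor& original,
                                          const std::string& salvagedUrl) {
    assert(!salvagedUrl.empty());
    MediaDescriptor result = original;
    result["URL"] = salvagedUrl;
    if (endsWith(salvagedUrl, ".odt")) {
        result["FilterName"] = kOdfTextFilter;
    }
    return result;
}
```

Keying on the file extension keeps the sketch short; inspecting the file contents instead would not depend on the backup naming scheme at all.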
Comment 35 Rainer Bielefeld Retired 2011-07-19 03:21:32 UTC
(In reply to comment #34)
Yes, that's it!

When I
1. copy and paste the original document autorecovery.doc into the backup folder
   after the crash,
2. delete autorecovery.doc_1.odt in the backup folder, and
3. rename autorecovery.doc to autorecovery.doc_1.odt,
then the backup file is opened during recovery without problems.

That is what Tor's results would predict.
Comment 36 Don't use this account, use tml@iki.fi 2011-07-19 04:29:40 UTC
Yeah, the interesting question now is which is the true intentional of the people who have written this code: 1) that documents indeed always are autosaved as ODF, (and the use of the MS Word filter then to recover an autosaved text document is a bug), or 2) that a document is autosaved in the format it is associated with (the one it was loaded from, or saved as, whichever was later, or something) (and that a text document at least in fact is autosaved as .odt is a bug).
Comment 37 Rainer Bielefeld Retired 2011-07-19 05:36:11 UTC
Heritage from OOo? I did some tests:
a) OOo 3.1.1 and 3.4-Dev OOo300m103(Build 9578) both create .odt backup 
   files in the backup folder
b) I checked a crash with OOo-Dev3.4, recovery shows the same "Read file error"
   like LibO

Whatever that might tell us about "the true intentional of the
people who have written this code".

I did not find a related OOo Issue
Comment 38 Don't use this account, use tml@iki.fi 2011-07-19 05:46:02 UTC
Could it really be that the use of "foreign" (non-ODF) formats is so rare among the Sun/Oracle developers writing this code that nobody even tried if the autorecovery code in the 3.4 development line works for non-ODF documents?
Comment 39 Don't use this account, use tml@iki.fi 2011-07-19 07:35:44 UTC
It seems that this small change fixes the problem: http://cgit.freedesktop.org/libreoffice/libs-core/commit/?id=6c397dfce4f7afddb55329b738388ab4eb16b7f8

Committed to master, one review needed for the 3-4 branch, three reviews needed for the 3-4-2 branch.
Comment 40 Don't use this account, use tml@iki.fi 2011-07-20 03:08:12 UTC
Fridrich cherry-picked it to the libreoffice-3-4 branch, will thus be in 3.4.3.
Comment 41 Michael Meeks 2011-07-20 03:23:42 UTC
It seems the bug was introduced Jan 2010:

commit 502e0900447f2eab00ead2e2703fe0395ac92baa
Author: Frank Schoenheit [fs] <frank.schoenheit@sun.com>
Date:   Tue Jan 5 22:32:38 2010 +0100

in the big re-write of the recovery code. Good to have it fixed, but certainly not a LibreOffice specific bug.
Comment 42 Petr Mladek 2011-07-25 10:45:27 UTC
We needed 3.4.2-rc3, so we added this fix there.