Bug 125401 - FILESAVE: Saving a document creates a new file (new inode number) thereby messing-up with the creation time of the document
Summary: FILESAVE: Saving a document creates a new file (new inode number) thereby mes...
Status: RESOLVED NOTOURBUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice (show other bugs)
Version:
(earliest affected)
6.1.0.3 release
Hardware: All Linux (All)
: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: bibisected, bisected, regression
Depends on:
Blocks: rename-before-copy-regressions
  Show dependency treegraph
 
Reported: 2019-05-20 20:02 UTC by Alex
Modified: 2019-07-10 07:02 UTC (History)
2 users (show)

See Also:
Crash report or crash signature:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Alex 2019-05-20 20:02:17 UTC
Description:
With LibreOffice 6.2.3.2 on Mint 19.1 Cinnamon

I have tested with Writer, Calc and Impress, and the problem is present with each program.

When one opens an existing document, upon saving (whether or not changes have been made) instead of updating the content and modification date of the file, a new file is created (new inode number) and the creation time of the document is changed to the saved/modification time.

Steps to Reproduce:
1. Open an existing document
2. Save it as is
3. Check the timestamps in the file explorer
   and/or
   check the inode and timestamps in command line using:
   1- `stat \path\to\document
   2- `sudo debugfs -R 'stat /path/to/document' /dev/sdXn | grep crtime 

Actual Results:
The inode number is modified.
The creation time is updated.

Expected Results:
The inode number should stay the same.
The creation time should not be updated.


Reproducible: Always


User Profile Reset: No



Additional Info:
Comment 1 Aron Budea 2019-05-27 04:37:17 UTC
Confirmed in 6.3.0.0.alpha1+ (38ac0586448d4f07811b139f62f62686b029feba) / Ubuntu 18.04.
No issue in 6.0.0.3.
=> regression

Bibisected to the following commit using repo bibisect-linux-64-6.1. I only checked whether the inode number stays the same in the stat command output. Adding Cc: to Miklos Vajna, please take a look.

https://cgit.freedesktop.org/libreoffice/core/commit/?id=2157a3536f97ff5ae7c82611a801fef7e3708983
author		Miklos Vajna <vmiklos@collabora.co.uk>	2018-01-08 16:49:25 +0100
committer	Miklos Vajna <vmiklos@collabora.co.uk>	2018-01-09 09:09:27 +0100

sfx2 store: try rename before copying
Comment 2 Miklos Vajna 2019-05-27 07:28:23 UTC
I'm not sure what to fix here. My understanding is that ~all software has this best practice, that if you overwrite a large file, then first you write to a tmp file next to the final target, and once you're done, you delete the original and move the tmp file to the final destination.

Sure, if really wanted we can keep this for Windows only where we have ReplaceFileW(), but I would rather close this as wontfix.

References:

- https://lwn.net/Articles/682721/ "the standard POSIX atomic-overwrite trick (write to temporary file in same directory, rename over the top)"
- GLib's g_file_set_contents() does the same
Comment 3 Alex 2019-05-29 14:02:15 UTC
How is it done in Windows? From the link to LWN it seems it uses the tunnel effect... is that correct? Else, couldn't it be implemented the same way in Linux?

Maybe a work around in Linux (to at least preserve the creation time) would be, upon creation of the temporary copy, to make the creation time of this temporary copy the same as the original file.
Comment 4 Miklos Vajna 2019-05-29 14:29:32 UTC
Windows provides <https://docs.microsoft.com/en-us/windows/desktop/api/winbase/nf-winbase-replacefilea>, the closest what we have is already happening on Linux, as far as I see. That's why I lean towards just closing this as won't fix.

Just checked: next to glib, rsync doesn't have any code to tweak ctimes.
Comment 5 Alex 2019-05-29 14:50:18 UTC
Some additional thoughts.

Reading about the g_file_set_contents() function[1] it points to the fact that changing the inode number also breaks existing hard links to the original file (wow!).

I don't know how it is done exactly at the code level, but to solve both the 'creation time' (crtime) and 'hard link' issues, here is another suggestion: why not overwrite the content of the original file with the content of the temporary file rather than overwriting the file itself?

To avoid the risk of losing any information in case of crash (whether the original data or the new data), just before the overwriting of the content an additional temporary copy of the original file could be created. This would preserve the content of that file in case of crash in the process of writing.

Added to note: mind that ctime (change time) and crtime (creation time) are different attributes.


[1] https://developer.gnome.org/glib/stable/glib-File-Utilities.html#g-file-set-contents
Comment 6 Miklos Vajna 2019-05-29 15:32:34 UTC
We already detect the situation when the hard link count is >1 and go back to the old slow way. I just don't see a similar way for the creation time part, it seems the rest of the world just don't care about that. ;-)
Comment 7 Alex 2019-05-29 15:46:04 UTC
Ah. Good that the hard link problem is taken care of. :)

Not sure that the rest of the world just don't care, see for instance this recent thread here:
https://www.linuxquestions.org/questions/linux-newbie-8/how-do-i-preserve-crtime-creation-birth-time-when-copying-from-windows-ntfs-to-linux-ext4-4175625229/

I admit it is a different case-scenario so I don't try to lean heavily on it, but the thread is still an interesting read.

Reflecting on this read, I just noticed that for the file replacement, instead of doing:
$ mv SOURCE DEST
(which preserves neither the creation time nor the modification time)
one could do:
$ cp -p SOURCE DEST
(which preserve both).

Wouldn't it be simple/easy to implement?
Comment 8 Alex 2019-05-29 15:55:39 UTC
Bonus: this method might also eliminate the need to take care of hard links. :)
Comment 9 Miklos Vajna 2019-05-30 06:59:10 UTC
Thanks for the 'cp -p' example, I'll take a look what APIs it uses.
Comment 10 Alex 2019-05-30 16:12:05 UTC
No problem.

I don't know if it can be useful but I also came across the case of rsync, which has an -X (--xattr) option to preserve extended attributes.

See:
https://wiki.archlinux.org/index.php/File_permissions_and_attributes#Preserving_extended_attributes
Comment 11 Alex 2019-05-30 17:10:46 UTC
I also found some additional information in this Q&A:
https://unix.stackexchange.com/questions/118840/preserving-extended-attributes-with-cp-rsync/119980#119980

By the way, thanks for having a look into this.
Comment 12 Alex 2019-05-30 22:25:09 UTC
Ah, no. I just realized that timestamps attributes are independent of extended attributes.
This was a false lead, sorry.
Comment 13 Alex 2019-06-02 21:29:24 UTC
(In reply to Miklos Vajna from comment #4)

I figured the following regarding rsync.

In case the file DEST already exists, the commands
$ rsync -t --inplace SOURCE DEST
preserves the crtime of DEST and assigns to DEST the mtime of SOURCE.

That is, the result regarding those timestamps is the same as with
$ cp -p SOURCE DEST
Comment 14 Miklos Vajna 2019-07-09 10:46:03 UTC
I got back to this and I'm afraid 'cp -a' can't copy the creation time of the destination document after all.

Here is what I tried:

1) Save test.odt using Writer.

2) Check creation time:

$ sudo debugfs -R "stat $PWD/test.odt" /dev/mapper/system-root | grep crtime
crtime: 0x5d246d4a:24aeed68 -- Tue Jul  9 12:32:42 2019

3) 'cp -a test.odt test2.odt'

4) Check creation time of test2.odt:

$ sudo debugfs -R "stat $PWD/test2.odt" /dev/mapper/system-root | grep crtime
crtime: 0x5d246eaf:7a4c63cc -- Tue Jul  9 12:38:39 2019

Based on this, I still think we can't do much here -> closing, sorry. The document has meta.xml BTW, which always contains the creation / modification time, and that is correct.

If it is discovered that there is some sane API we can use, similar to stat() and S_ISLNK(), then it would make sense to reopen this.

(Marking as "not our bug", given that this works fine on Windows, it's a shorcoming of the APIs we have available inside #ifdef UNX, I would say.)
Comment 15 Alex 2019-07-09 19:12:32 UTC
(In reply to Miklos Vajna from comment #14)
> I got back to this and I'm afraid 'cp -a' can't copy the creation time of
> the destination document after all.
> 
> Here is what I tried:
> 
> 1) Save test.odt using Writer.
> 
> 2) Check creation time:
> 
> $ sudo debugfs -R "stat $PWD/test.odt" /dev/mapper/system-root | grep crtime
> crtime: 0x5d246d4a:24aeed68 -- Tue Jul  9 12:32:42 2019
> 
> 3) 'cp -a test.odt test2.odt'
> 
> 4) Check creation time of test2.odt:
> 
> $ sudo debugfs -R "stat $PWD/test2.odt" /dev/mapper/system-root | grep crtime
> crtime: 0x5d246eaf:7a4c63cc -- Tue Jul  9 12:38:39 2019
> 
> Based on this, I still think we can't do much here -> closing, sorry. The
> document has meta.xml BTW, which always contains the creation / modification
> time, and that is correct.
> 
> If it is discovered that there is some sane API we can use, similar to
> stat() and S_ISLNK(), then it would make sense to reopen this.
> 
> (Marking as "not our bug", given that this works fine on Windows, it's a
> shorcoming of the APIs we have available inside #ifdef UNX, I would say.)

Hello Miklos and thank you for looking into this.

Following your explanation, I fear you did not get the point.

Of course if you copy a file using 'cp -a SOURCE DEST' to a NEW (non existing) destination file DEST, the new destination will have an updated creation time (since it is created at the time the 'cp' command is called.

However, the command 'cp -a SOURCE DEST' applied to an already existing DEST file preserves the creation time of DEST.

This is what could be used to preserve the creation time of any document being edited, where SOURCE would be the cached and currently being edited version, and DEST the initial file from which the edition resumed.

Could you consider reassessing the scenario and therefore maybe reopening the bug?

Thanks,

Alex
Comment 16 Alex 2019-07-09 19:30:17 UTC
PS: I realize I was maybe not very clear either in my initial explanation, sorry about that...
Comment 17 Miklos Vajna 2019-07-10 07:02:47 UTC
I get what you say, but it goes against the whole idea described comment 2, and comment 4 already mentions how we have an OS-provided API to do this on Windows and how on Linux we manually move the temp file to the final destination.

Given that create time is only observable via some debug API as root, I think the cost of supporting this scenario is way larger than the benefit of not copying over data from a temp file byte by byte to the final destination, especially for e.g. larger spreadsheets.

If some other developer find a good API we can use here for the UNX case, I think it makes sense to re-open this.