Bug 88298 - [RFE] Make Flat XML ODF format git friendly
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: filters and storage
Version: unspecified (earliest affected)
Hardware: Other All
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Duplicates: 91098
Depends on:
Blocks: ODF-Flat
Reported: 2015-01-11 18:31 UTC by David Juran
Modified: 2019-07-19 09:40 UTC (History)
CC List: 4 users

See Also:
Crash report or crash signature:


Description David Juran 2015-01-11 18:31:38 UTC
As mentioned in Bug 88127, I would like to save LibreOffice documents in a Git repository with the ability to diff and merge them. One starting point is to use the Flat XML ODF format and commit those files to Git, but this is complicated by timestamps and similar metadata that cause the flat file to change even though no changes have been made to the document itself.

This RFE is about (optionally) making the Flat XML ODF format less "chatty" by excluding some metadata which would make version control of documents easier to accomplish.
Comment 1 Robinson Tryon (qubit) 2015-01-11 19:05:37 UTC
(In reply to David Juran from comment #0)
> This RFE is about (optionally) making the Flat XML ODF format less "chatty"
> by excluding some metadata which would make version control of documents
> easier to accomplish.

Sounds like a good proposal for improvement.

Severity -> enhancement
Status -> NEW

It would be great to have some initial notes about what kinds of "chatty" information could be stripped or re-organized to make the FODF formats more attractive for diffing.

Here's one challenge: Indenting and adding newlines to the XML. Here's what the FODTs look like now:

<office:meta><meta:creation-date>2015-01-11T13:55:01.410705604</meta:creation-date><dc:date>2015-01-11T13:55:18.841988521</dc:date><meta:editing-duration>PT17S</meta:editing-duration><meta:editing-cycles>1</meta:editing-cycles><meta:document-statistic meta:table-count="0" meta:image-count="0" meta:object-count="0" meta:page-count="1" meta:paragraph-count="1" meta:word-count="2" meta:character-count="13" meta:non-whitespace-character-count="12"/>

Uggh. Run that through 'xmllint --format', and you can get something human-readable that diffs pretty reasonably:

<     <dc:date>2015-01-11T13:55:18.841988521</dc:date>
<     <meta:editing-duration>PT17S</meta:editing-duration>
<     <meta:editing-cycles>1</meta:editing-cycles>
---
>     <dc:date>2015-01-11T13:55:41.538627598</dc:date>
>     <meta:editing-duration>PT39S</meta:editing-duration>
>     <meta:editing-cycles>2</meta:editing-cycles>
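
For anyone without xmllint at hand, the same idea can be sketched in Python. This is only an illustration and not part of LibreOffice: pretty_print and diff_saves are hypothetical helper names. It assumes the flat file declares its namespaces, has no existing indentation, and that whitespace added inside mixed-content paragraphs does not matter to you.

```python
import difflib
import xml.dom.minidom


def pretty_print(xml_text):
    """Re-indent XML so that each element starts on its own line."""
    dom = xml.dom.minidom.parseString(xml_text)
    return dom.toprettyxml(indent="  ")


def diff_saves(old_xml, new_xml):
    """Return unified-diff lines between two saves of the same document."""
    old_lines = pretty_print(old_xml).splitlines()
    new_lines = pretty_print(new_xml).splitlines()
    # n=0 drops context lines so only the changed elements are listed.
    return list(difflib.unified_diff(old_lines, new_lines, lineterm="", n=0))
```

Passing two saves of the same document to diff_saves should list only the elements that really changed, much like the dc:date and meta:editing-cycles lines above.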
Comment 2 David Tardon 2015-05-11 07:02:43 UTC
*** Bug 91098 has been marked as a duplicate of this bug. ***
Comment 3 V字龍(Vdragon) 2017-03-20 14:49:29 UTC
Here's a Git clean filter implementation that strips most of the annoying tags:
https://github.com/Lin-Buo-Ren/Useful_Git_Clean_and_Smudge_Filters

You may check out the source here:
https://github.com/Lin-Buo-Ren/Useful_Git_Clean_and_Smudge_Filters/blob/master/Git%20Clean%20and%20Smudge%20Filters/clean-odf-flat-xml.bash
Comment 4 V字龍(Vdragon) 2018-01-03 19:15:28 UTC
The previously mentioned clean filter has moved to the following address:
https://github.com/libreoffice-tw/Clean-Filter-for-Flat-XML-ODF-Documents
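
To show the general shape of such a clean filter, here is a minimal Python sketch. It is not the linked bash filter and does not claim to match what that project strips. It removes only the three office:meta children that changed in the diff in comment 1, and it assumes the file uses the conventional office:, meta: and dc: prefixes. The names strip_volatile_metadata and run_filter are hypothetical.

```python
import xml.dom.minidom

# Elements that changed between two saves in the diff from comment 1.
# Matching is by prefixed tag name, which assumes the conventional
# prefixes; other prefixes would need namespace-aware matching.
VOLATILE = {"dc:date", "meta:editing-duration", "meta:editing-cycles"}


def strip_volatile_metadata(xml_text):
    """Return the document with volatile children of office:meta removed."""
    dom = xml.dom.minidom.parseString(xml_text)
    for meta in dom.getElementsByTagName("office:meta"):
        # Copy the child list first because nodes are removed while looping.
        for child in list(meta.childNodes):
            if child.nodeType == child.ELEMENT_NODE and child.tagName in VOLATILE:
                meta.removeChild(child)
    return dom.toxml()


def run_filter(stdin, stdout):
    """Git clean filters read the file on stdin and write the result to stdout."""
    stdout.write(strip_volatile_metadata(stdin.read()).encode("utf-8"))


# To use this as a filter script, end the file with:
#   import sys
#   run_filter(sys.stdin.buffer, sys.stdout.buffer)
```

Only direct children of office:meta are touched, so a dc:date inside a comment annotation in the document body is left alone. Such a script could be registered with something like git config filter.fodt-clean.clean "python3 clean_fodt.py" plus a .gitattributes line *.fodt filter=fodt-clean, where the filter name and script name are placeholders.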
Comment 5 Basil Eric Rabi 2018-02-04 15:57:16 UTC
Aside from the timestamps and printer metadata, saving an unmodified Calc file in .fods format changes a lot of "style:name" entries in the XML file. The style:name values seem to be just swapped around. This causes a lot of diff noise in the XML file even when no modification has been made to the formatting or the content.
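
One possible workaround, sketched here in Python, is to rename each automatic style after a hash of its properties so that a swap of names no longer shows up in a diff. This is a heuristic illustration only, not something LibreOffice does. The function name stabilize_style_names and the auto_ prefix are made up. It assumes the conventional office: and style: prefixes, treats any attribute whose name ends in "style-name" as a style reference, and assumes an automatic style name is not shared with a style of another family.

```python
import hashlib
import xml.dom.minidom


def _signature(node):
    """Canonical description of a style element, ignoring style:name."""
    attrs = sorted(
        (name, value) for name, value in node.attributes.items()
        if name != "style:name"
    )
    children = []
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            children.append(_signature(child))
        elif child.nodeType == child.TEXT_NODE and child.data.strip():
            children.append(child.data)
    return repr((node.tagName, attrs, children))


def stabilize_style_names(xml_text):
    """Rename automatic styles to content-derived names and fix references."""
    dom = xml.dom.minidom.parseString(xml_text)
    rename, used = {}, set()
    for container in dom.getElementsByTagName("office:automatic-styles"):
        for style in container.childNodes:
            if style.nodeType != style.ELEMENT_NODE:
                continue
            if not style.hasAttribute("style:name"):
                continue
            digest = hashlib.sha1(_signature(style).encode("utf-8")).hexdigest()[:8]
            new_name, suffix = "auto_" + digest, 1
            while new_name in used:  # identical styles: number them in order
                suffix += 1
                new_name = "auto_%s_%d" % (digest, suffix)
            used.add(new_name)
            rename[style.getAttribute("style:name")] = new_name
            style.setAttribute("style:name", new_name)
    # Heuristic: update every attribute that looks like a style reference.
    for element in dom.getElementsByTagName("*"):
        for name, value in list(element.attributes.items()):
            if name.endswith("style-name") and value in rename:
                element.setAttribute(name, rename[value])
    return dom.toxml()
```

Known gaps in this sketch: a style that refers to another automatic style (for example through style:data-style-name) is hashed with the old name of that style, and the order of the style definitions in the file is left as it is, so reordered definitions would still produce diff noise.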
Comment 6 Mike Kaganski 2019-05-01 07:44:51 UTC
(In reply to Basil Eric Rabi from comment #5)
> saving an unmodified calc
> file in .fods format changes a lot of "style:name" entries in the xml file.
> The style:name values seem to be just swapped around. This causes a lot of
> diff noises in the xml file even when no modification is done on the format
> or in the content.

https://git.libreoffice.org/core/+/eb128a7d6bbc27b4dbbf9461c81c90e40203b114 *possibly* would address that part.
Comment 7 regivanx 2019-07-19 09:40:18 UTC
I have already used .fodt files in Git, and automatic styles are painful.

1) they prevent transparent comparison of two versions of files in git,
2) they make it difficult to apply an xsl style sheet to a set of fodt files, because the automatic style numbers are random,
3) the file is less readable by a human being.

It is possible to avoid many of these automatic styles in LibreOffice Writer by not storing the random revision numbers (RSIDs) used for document comparison:

Tools -> Options -> LibreOffice Writer -> Comparison -> Random number to improve accuracy of document comparison -> uncheck "Store it when changing the document"

My opinion is that automatic styles are not useful for all purposes, and there should be a LibreOffice command-line option to replace automatic styles with named styles in files (for example, implemented as an XSL style sheet).

Ideally, flat files should not use automatic styles by default, in order to be easier to handle with external programs (git, xsltproc...).