As mentioned in Bug 88127, I would like to save LibreOffice documents in a Git repository with the ability to diff and merge them. One starting point is to save documents in the Flat XML ODF format and commit those to Git, but this is complicated by timestamps and similar metadata that cause the flat file to change even though no changes have been made to the document itself.
This RFE is about (optionally) making the Flat XML ODF format less "chatty" by excluding some metadata which would make version control of documents easier to accomplish.
(In reply to David Juran from comment #0)
> This RFE is about (optionally) making the Flat XML ODF format less "chatty"
> by excluding some metadata which would make version control of documents
> easier to accomplish.
Sounds like a good proposal for improvement.
Severity -> enhancement
Status -> NEW
It would be great to have some initial notes about what kinds of "chatty" information could be stripped or re-organized to make the FODF formats more attractive for diffing.
Here's one challenge: Indenting and adding newlines to the XML. Here's what the FODTs look like now:
<office:meta><meta:creation-date>2015-01-11T13:55:01.410705604</meta:creation-date><dc:date>2015-01-11T13:55:18.841988521</dc:date><meta:editing-duration>PT17S</meta:editing-duration><meta:editing-cycles>1</meta:editing-cycles><meta:document-statistic meta:table-count="0" meta:image-count="0" meta:object-count="0" meta:page-count="1" meta:paragraph-count="1" meta:word-count="2" meta:character-count="13" meta:non-whitespace-character-count="12"/>
Uggh. Run that through 'xmllint --format', and you can get something human-readable that diffs pretty reasonably:
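Where xmllint is not at hand, the same kind of reformatting can be sketched in Python. This is only an illustration using the standard library's xml.dom.minidom. The helper name pretty_print_fodt and the trimmed sample document are made up for the example, and the output will not be byte-identical to what xmllint produces.

```python
from xml.dom import minidom


def pretty_print_fodt(xml_text: str) -> str:
    """Re-serialize an XML document with one element per line, indented.

    Hypothetical helper for illustration; not part of LibreOffice or xmllint.
    """
    dom = minidom.parseString(xml_text)
    pretty = dom.toprettyxml(indent="  ")
    # Drop whitespace-only lines so the result stays compact.
    return "\n".join(line for line in pretty.splitlines() if line.strip())


# Trimmed sample: the root element must declare the namespaces it uses,
# as a complete flat ODF file does.
sample = (
    '<office:document'
    ' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
    ' xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0">'
    '<office:meta>'
    '<meta:editing-duration>PT17S</meta:editing-duration>'
    '<meta:editing-cycles>1</meta:editing-cycles>'
    '</office:meta>'
    '</office:document>'
)

if __name__ == "__main__":
    print(pretty_print_fodt(sample))
```

With one element per line, a change to a single metadata value shows up as a one-line diff instead of a change to one enormous line.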
*** Bug 91098 has been marked as a duplicate of this bug. ***
Here's a git clean filter implementation that strips most of the annoying tags
You may check out the source here:
The previously mentioned clean filter has moved to the following address:
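For readers unfamiliar with clean filters, the general idea can be sketched as follows. This is not the filter referred to above, just a minimal illustration that assumes the volatile elements are the ones visible in the office:meta sample earlier in this report and that office:meta carries no attributes. The names VOLATILE_ELEMENTS, strip_volatile_metadata and run_filter are hypothetical, and a regex-based approach is a shortcut rather than a robust XML transformation.

```python
import re

# Assumption: the volatile elements seen in the <office:meta> sample earlier
# in this report. A real filter would likely need a longer list.
VOLATILE_ELEMENTS = (
    "meta:creation-date",
    "dc:date",
    "meta:editing-duration",
    "meta:editing-cycles",
    "meta:document-statistic",
)

# Only touch the metadata block, so that e.g. dc:date elements elsewhere
# in the document are left alone.
META_BLOCK = re.compile(r"<office:meta>.*?</office:meta>", re.DOTALL)


def _strip_elements(meta_xml: str) -> str:
    for name in VOLATILE_ELEMENTS:
        tag = re.escape(name)
        # Self-closing form first, e.g. <meta:document-statistic .../>
        meta_xml = re.sub(rf"<{tag}(?:\s[^>]*)?/>", "", meta_xml)
        # Then the paired form, e.g. <dc:date>...</dc:date>
        meta_xml = re.sub(
            rf"<{tag}(?:\s[^>]*)?>.*?</{tag}>", "", meta_xml, flags=re.DOTALL
        )
    return meta_xml


def strip_volatile_metadata(xml_text: str) -> str:
    """Remove the volatile elements from every <office:meta> block."""
    return META_BLOCK.sub(lambda m: _strip_elements(m.group(0)), xml_text)


def run_filter(source, sink) -> None:
    """Read a document from source and write the cleaned text to sink."""
    sink.write(strip_volatile_metadata(source.read()))


if __name__ == "__main__":
    sample = (
        "<office:meta>"
        "<meta:creation-date>2015-01-11T13:55:01.410705604</meta:creation-date>"
        "<dc:date>2015-01-11T13:55:18.841988521</dc:date>"
        "<dc:title>Report</dc:title>"
        '<meta:document-statistic meta:table-count="0" meta:page-count="1"/>'
        "</office:meta>"
    )
    print(strip_volatile_metadata(sample))
```

To use something like this as an actual clean filter, the script would call run_filter(sys.stdin, sys.stdout) and be registered through a `filter.<driver>.clean` setting in the Git configuration, paired with a `.gitattributes` entry that assigns that driver to the flat ODF file extensions. The driver name is up to the user.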
Aside from the time stamps and printer metadata, saving an unmodified Calc file in .fods format changes a lot of "style:name" entries in the XML file. The style:name values seem to be just swapped around. This causes a lot of diff noise in the XML file even when no modification has been made to the formatting or the content.
(In reply to Basil Eric Rabi from comment #5)
> saving an unmodified Calc
> file in .fods format changes a lot of "style:name" entries in the XML file.
> The style:name values seem to be just swapped around. This causes a lot of
> diff noise in the XML file even when no modification has been made to the
> formatting or the content.
https://git.libreoffice.org/core/+/eb128a7d6bbc27b4dbbf9461c81c90e40203b114 *possibly* would address that part.
I have already used FODT files in Git, and automatic styles are painful:
1) they prevent transparent comparison of two versions of a file in Git,
2) they make it difficult to apply an XSL style sheet to a set of FODT files, because the automatic style numbers are random,
3) they make the file less readable by a human being.
It is possible to disable the use of automatic styles in LibreOffice Writer:
Tools -> Options -> LibreOffice Writer -> Comparison -> Random number to improve accuracy of document comparison -> uncheck "Store it when changing the document"
My opinion is that automatic styles are not useful for all purposes, and there should be a LibreOffice command-line option to replace automatic styles with named styles in files (that is, an XSL style sheet).
Ideally, flat files should not use automatic styles by default, in order to be easier to handle with external programs (git, xsltproc...).
Any movement on this enhancement? I have a 300-page policy document that I am tracking in Git, and the noise creates a large Git overhead with each save.
*** This bug has been marked as a duplicate of bug 85660 ***