Bug 70625 - Add --cat parameter to make git diffs pretty
Summary: Add --cat parameter to make git diffs pretty
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version (earliest affected): 4.2.0.0.alpha0+ Master
Hardware: Other All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard: reviewed:2022
Keywords: difficultyMedium, easyHack, skillCpp, topicCleanup
Duplicates: 152446 152451
Depends on:
Blocks:
 
Reported: 2013-10-18 17:27 UTC by Michael Meeks
Modified: 2024-02-07 18:33 UTC
CC List: 3 users

See Also:
Crash report or crash signature:


Description Michael Meeks 2013-10-18 17:27:04 UTC
I just read:
   http://git-scm.com/book/ch7-2.html (search for odt)

And realized we could become a much more useful part of the git ecosystem if we had a simple '--cat' mode that dumped most (if not all) formats as flat-odf to allow easier diffing (and logging?)

What we need (instead of that embarrassing script) is something that handles doc and ODF and spreadsheets etc. and dumps text so:

loffice --cat <filename> | less

We will inevitably need to use a /tmp file and adapt the existing --convert-to txt code to do this easily, but - it'd be great to have that built-in.
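
A minimal sketch of the approach described above, assuming the existing --convert-to txt:Text path and a temporary directory. The function name lo_cat and the SOFFICE override variable are made up for illustration; this is a shell-level emulation, not the built-in option being requested here.

```shell
#!/bin/sh
# Hypothetical helper: emulate `loffice --cat <file>` with --convert-to
# and a temporary directory. SOFFICE is an assumed override for the
# binary name (defaults to soffice).
lo_cat() {
    src=$1
    tmpdir=$(mktemp -d) || return 1
    # Silence the converter's own messages so that only the document
    # text reaches stdout.
    if "${SOFFICE:-soffice}" --headless --convert-to txt:Text \
            --outdir "$tmpdir" "$src" >/dev/null 2>&1; then
        base=$(basename "$src")
        # The converter names its output after the input file,
        # with the extension replaced by .txt.
        cat "$tmpdir/${base%.*}.txt"
        status=$?
    else
        status=1
    fi
    rm -rf "$tmpdir"
    return $status
}
```

Usage would then look like `lo_cat report.odt | less`. Doing this in shell means every caller has to manage the temporary directory, which is what a built-in option would avoid.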

Code pointers:

desktop/source/app/cmdlineargs.cxx
http://cgit.freedesktop.org/libreoffice/core/tree/desktop/source/app/cmdlinehelp.cxx#n50
http://cgit.freedesktop.org/libreoffice/core/tree/desktop/source/app/cmdlineargs.cxx

Check out GetConversionList:

http://cgit.freedesktop.org/libreoffice/core/tree/desktop/source/app/app.cxx#n2443

Of course - the slight downer is that the factory process, and the command-line-arg parsing piece are separated by a process / factory barrier - so it's possible we'd need to add a round-trip reply that returns the /tmp filename and then cat that.

Thanks for poking !
Comment 1 Björn Michaelsen 2013-10-18 20:03:40 UTC
CCing developer list to Easy Hacks missing this.
Comment 2 How can I remove my account? 2013-10-19 08:11:40 UTC
Please note that there will be lots of false differences between (flat) ODF exports of even only minimally edited versions of a document, though, thanks to gratuitous randomness in the ODF output. See my recent changes in master that check for the LIBO_ONEWAY_STABLE_ODF_EXPORT environment variable and, if it is set, do ODF output in a more "stable" manner. Unfortunately, as the "ONEWAY" part of the env var name indicates, this is not intended to be roundtrip-safe, so that code path cannot be made the default. It would be great if people who actually understand the issues involved would figure out roundtrip-safe ways to solve the problem (that task is likely not an EasyHack).
Comment 3 Michael Meeks 2013-10-19 14:43:04 UTC
Sure - I think the flat-odf idea is prolly not a great one - instead we should just convert to text. Then of course we have a paragraph / line-wrapping problem instead: that small changes perturb that a lot, but ... c'est la vie.

I agree that ODF is hardly easy to read on the command-line; but plain-text: more so ;-)
Comment 4 How can I remove my account? 2013-10-19 16:35:16 UTC
Ah, I didn't read the linked article so I thought you meant flat ODF for storage of docs, but yeah, if just for diffing, then plain text obviously is better.
Comment 5 Deena Francis 2014-07-29 03:02:14 UTC
I'd like to work on this as my first open source contribution.
Comment 6 Deena Francis 2014-07-29 11:46:22 UTC
There is a small problem with the idea: the --convert-to option prints a string to stdout indicating the file names involved.

For example :

$ soffice --headless --convert-to txt:Text --outdir /tmp /tmp/filezBIL6j.odt

This prints out the following string to stdout :


"""

convert /tmp/filezBIL6j.odt -> /tmp/filezBIL6j.txt using Text

"""

If we are to reuse --convert-to code, this string will be present along with the --cat output.
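
As a stopgap outside the LibreOffice code, the status line could be filtered out in shell. This is only a sketch: the function name strip_convert_status is made up, and the pattern is inferred from the single sample line shown above.

```shell
#!/bin/sh
# Drop the "convert <src> -> <dst> using <filter>" status line from a
# stream. Caveat: a document line that happens to match the same pattern
# would be dropped as well.
strip_convert_status() {
    grep -v '^convert .* -> .* using '
}
```

Filtering after the fact is fragile, so suppressing the message at its source when --cat is in effect would be the cleaner fix.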

Unfortunately, I could not find where this string gets printed in the code using http://opengrok.libreoffice.org/
Comment 7 Maxim Monastirsky 2014-07-29 11:56:24 UTC
(In reply to comment #6)
> Unfortunately, I could not find where this string gets printed in the code
> using http://opengrok.libreoffice.org/
It's here:
http://opengrok.libreoffice.org/xref/core/desktop/source/app/dispatchwatcher.cxx#489
Comment 8 Michael Meeks 2014-07-29 15:16:43 UTC
We could of course add a different option to LibreOffice specific to this functionality (perhaps) eg. a --cat <file> parameter ? that could output the plain text on stdout - and avoid the necessity to manage /tmp files in shell - which is a bit horrible =)
Comment 9 Deena Francis 2014-07-30 00:43:05 UTC
Added my changes to gerrit for review.

https://gerrit.libreoffice.org/#/c/10623/
Comment 10 Commit Notification 2014-08-15 06:39:27 UTC
deenafrancis committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=3d318e6cf4a183e14a043840b9990958c7527536

fdo#70625 Add --cat parameter to make git diffs pretty



The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 11 Michael Meeks 2014-08-15 06:41:23 UTC
Nice patch Deena - thanks for that.

A few more things might be a good idea:

* perhaps auto-enable --headless on Linux - there are other settings to force windows not to show on Mac etc. I think ;-)

* work out what we want for spreadsheets / presentations - export as CSV ? or ... something there would be good I guess.

Then I guess we need to persuade someone to knock up some sample git config bits such that we can get nice human readable diffs easily - perhaps dropping that in the wiki ? [ and the 4.4 features wiki page I guess - perhaps the SparkleShare people would appreciate that too ? ].

Anyhow - a really great start; - oh ! and also can you send an E-mail like this:

https://wiki.documentfoundation.org/Development/Developers#Developers_and_Contributors_list

so we get the auditing right =)

Thanks !
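
One possible shape for the sample git config bits mentioned above, as a sketch. It relies on git's textconv mechanism (diff.<driver>.textconv plus a .gitattributes entry). The driver name "lotext", the function names and the extension list are assumptions made for illustration.

```shell
#!/bin/sh
# Print .gitattributes lines that route the given extensions through a
# diff driver. The driver name "lotext" is made up for this sketch.
lo_gitattributes() {
    for ext in "$@"; do
        printf '*.%s diff=lotext\n' "$ext"
    done
}

# Register the driver in the current repository. git calls the textconv
# command with the path of the file to convert as its single argument
# and diffs whatever the command prints on stdout.
lo_diff_setup() {
    git config diff.lotext.textconv 'soffice --headless --cat' &&
    lo_gitattributes odt doc docx >> .gitattributes
}
```

After running lo_diff_setup inside a repository, `git diff` and `git log -p` should show text diffs for the listed extensions. Spreadsheet and presentation extensions are left out because --cat does not cover them yet.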
Comment 12 Deena Francis 2014-08-15 13:52:38 UTC
Thanks for verifying and accepting the patch. 
I will work on improving the --cat feature for document formats other than those supported by swriter.

Comment 13 Robinson Tryon (qubit) 2015-12-14 04:59:02 UTC Comment hidden (obsolete)
Comment 14 Robinson Tryon (qubit) 2016-02-18 14:52:33 UTC Comment hidden (obsolete)
Comment 15 Michael Meeks 2019-03-20 09:14:23 UTC
This needs extending to spreadsheet & impress formats I guess =) thanks though Deena ! =)
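
To make the open question concrete, here is a sketch of how a wrapper might pick a --convert-to target per file type, using the CSV idea floated in comment 11. The function name lo_text_target and the extension lists are assumptions. Presentations are deliberately left unmapped because no target format has been agreed for them.

```shell
#!/bin/sh
# Map a file name to a --convert-to target suited to text diffing.
# Returns non-zero for types with no agreed mapping (e.g. presentations).
lo_text_target() {
    case "${1##*.}" in
        odt|doc|docx|rtf) echo 'txt:Text' ;;
        ods|xls|xlsx)     echo 'csv' ;;
        *)                return 1 ;;
    esac
}
```

For example, `soffice --headless --convert-to "$(lo_text_target budget.ods)" budget.ods` would run the spreadsheet through the CSV export.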
Comment 16 Matt K 2021-03-27 03:24:58 UTC Comment hidden (off-topic)
Comment 17 woundorf 2021-03-29 13:59:41 UTC Comment hidden (off-topic)
Comment 18 Matt K 2021-04-01 02:46:45 UTC Comment hidden (off-topic)
Comment 19 Mike Kaganski 2022-12-10 09:44:14 UTC Comment hidden (off-topic)
Comment 20 Mike Kaganski 2022-12-10 09:46:59 UTC
*** Bug 152451 has been marked as a duplicate of this bug. ***
Comment 21 Mike Kaganski 2022-12-10 09:47:11 UTC
*** Bug 152446 has been marked as a duplicate of this bug. ***
Comment 22 Tagwerk 2022-12-10 11:48:01 UTC
In reply to Mike Kaganski ...
> *** Bug 152451 has been marked as a duplicate of this bug. ***
> *** Bug 152446 has been marked as a duplicate of this bug. ***
In that case...

    +1 for some love for the two bugs

Not sure how much love would be needed, maybe just a little as conversion to PDF is already there.
Comment 23 Moritz Duge (allotropia) (a.k.a. kolAflash) 2023-03-24 11:58:57 UTC
Hint: "--cat" doesn't work if another LibreOffice instance is running.

Bug 129713 - soffice --cat option doesn't output document text contents when there is another instance running
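
A possible workaround, offered as an untested assumption rather than a confirmed fix for bug 129713: run --cat with its own throwaway user profile via the -env:UserInstallation bootstrap parameter, so the call does not attach to the instance that is already running. The function name lo_cat_isolated and the SOFFICE override variable are made up for illustration.

```shell
#!/bin/sh
# Run --cat against a private, temporary user profile.
# Assumption: a separate profile keeps the call from being handed over
# to an already running instance.
lo_cat_isolated() {
    profile=$(mktemp -d) || return 1
    "${SOFFICE:-soffice}" "-env:UserInstallation=file://$profile" \
        --headless --cat "$@"
    status=$?
    rm -rf "$profile"
    return $status
}
```

Creating a fresh profile on every call is slow, so this fits occasional use better than a diff driver that runs once per file.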
Comment 24 Matt K 2023-07-24 23:07:26 UTC
(In reply to Matt K from comment #16)
> but html works for all 3;
> should that then be used (objects are converted to images for .odt and .ods)?

I debugged the --cat output path for .odp files and can easily make the --cat option work for HTML output (similar to "--convert-to html" command line); is this acceptable or do we want to iterate every object and print its text contents without HTML tags present?
Comment 25 Tagwerk 2024-02-07 18:33:51 UTC
(In reply to Matt K from comment #24)
I don't think we should be mixing things: if it is plain text (UTF-8) for documents, it should probably be plain text (UTF-8) for presentation formats.

As above a +1 for any work making --cat work for all formats.

(I'll throw in a mention of an obscure edge case for documents: Bug 159583)