I just read http://git-scm.com/book/ch7-2.html (search for "odt") and realized we could become a much more useful part of the git ecosystem if we had a simple '--cat' mode that dumped most (if not all) formats as flat-odf to allow easier diffing (and logging?).

What we need (instead of that embarrassing script) is something that handles doc, ODF, spreadsheets etc. and dumps text, so:

    loffice --cat <filename> | less

We will inevitably need to use a /tmp file and adapt the existing --convert-to txt code to do this easily, but it'd be great to have that built in.

Code pointers:
desktop/source/app/cmdlineargs.cxx
http://cgit.freedesktop.org/libreoffice/core/tree/desktop/source/app/cmdlinehelp.cxx#n50
http://cgit.freedesktop.org/libreoffice/core/tree/desktop/source/app/cmdlineargs.cxx

Check out GetConversionList:
http://cgit.freedesktop.org/libreoffice/core/tree/desktop/source/app/app.cxx#n2443

Of course, the slight downer is that the factory process and the command-line-arg parsing piece are separated by a process / factory barrier, so it's possible we'd need to add a round-trip reply that returns the /tmp filename and then cat that.

Thanks for poking!
CCing developer list to Easy Hacks missing this.
Please note that there will be lots of false differences between (flat) ODF exports of even only minimally edited versions of a document, though, thanks to gratuitous randomness in the ODF output. See my recent changes in master that check for the LIBO_ONEWAY_STABLE_ODF_EXPORT environment variable and, if it is set, do ODF output in a more "stable" manner. Unfortunately, as the "ONEWAY" part of the env var name indicates, this is not intended to be roundtrip-safe, so that code path cannot be made the default. It would be great if people who actually understand the issues involved would figure out roundtrip-safe ways to solve the problem (that task is likely not an EasyHack).
Sure - I think the flat-odf idea is prolly not a great one - instead we should just convert to text. Then of course we have a paragraph / line-wrapping problem instead: that small changes perturb that a lot, but ... c'est la vie. I agree that ODF is hardly easy to read on the command-line; but plain-text: more so ;-)
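The line-wrapping concern can be illustrated with a small sketch. It compares two text dumps word by word, so that a reflowed paragraph does not show up as a wall of changed lines. The helper name `word_changes` and the sample strings are hypothetical, and this is only one possible way to soften the problem, not something the thread settled on.

```python
import difflib


def word_changes(old: str, new: str) -> list[tuple[str, str, str]]:
    """Return (operation, old_words, new_words) for each changed region.

    Both inputs are split on any whitespace, so where the line breaks
    fall has no influence on the result.
    """
    a, b = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, " ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Same sentence, one word changed, wrapped at different columns.
old = "The quick brown fox\njumps over the lazy dog.\n"
new = "The quick brown\nfox leaps over the\nlazy dog.\n"

print(word_changes(old, new))
```

A line-based diff of these two strings would flag every line, while the word-based comparison reports only the single replaced word. On the git side, `git diff --word-diff` offers a comparable view.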
Ah, I didn't read the linked article, so I thought you meant flat ODF for storage of docs. But yeah, if it's just for diffing, then plain text obviously is better.
I'd like to work on this as my first open source contribution.
There is a small problem with the idea: the --convert-to option prints a string to stdout indicating the file names involved. For example:

    $ soffice --headless --convert-to txt:Text --outdir /tmp /tmp/filezBIL6j.odt

This prints the following string to stdout:

    convert /tmp/filezBIL6j.odt -> /tmp/filezBIL6j.txt using Text

If we are to reuse the --convert-to code, this string will be present along with the --cat output. Unfortunately, I could not find where this string gets printed in the code using http://opengrok.libreoffice.org/
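Until the message is suppressed at the source, a caller could filter it out of captured output. The sketch below is only an illustration: the pattern is derived solely from the sample message quoted above, other LibreOffice versions may word it differently, and the names `STATUS_LINE` and `strip_status_lines` are hypothetical.

```python
import re

# Assumption: the status line always has the shape seen in the sample above,
# i.e. "convert <input> -> <output> using <filter>".
STATUS_LINE = re.compile(r"^convert .+ -> .+ using .+$")


def strip_status_lines(output: str) -> str:
    """Drop lines that look like the --convert-to status message."""
    kept = [
        line
        for line in output.splitlines(keepends=True)
        if not STATUS_LINE.match(line.rstrip("\r\n"))
    ]
    return "".join(kept)


captured = (
    "convert /tmp/filezBIL6j.odt -> /tmp/filezBIL6j.txt using Text\n"
    "First paragraph of the document.\n"
)
print(strip_status_lines(captured), end="")
```

The obvious weakness is that a document line which happens to match the same shape would be dropped too, which is one reason to fix this inside LibreOffice rather than in a wrapper.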
(In reply to comment #6)
> Unfortunately, I could not find where this string gets printed in the code
> using http://opengrok.libreoffice.org/

It's here: http://opengrok.libreoffice.org/xref/core/desktop/source/app/dispatchwatcher.cxx#489
We could of course add a different option to LibreOffice specific to this functionality (perhaps) eg. a --cat <file> parameter ? that could output the plain text on stdout - and avoid the necessity to manage /tmp files in shell - which is a bit horrible =)
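For comparison, this is roughly the temp-file juggling a wrapper has to do today with --convert-to. It is a sketch under assumptions: the binary is called `soffice`, the output file is named after the input's stem with a `.txt` extension (as in the sample output earlier in this thread), and the exported text can be read as UTF-8. The function name `convert_to_text` and the injectable `run` parameter are hypothetical and exist only to make the sketch testable without LibreOffice installed.

```python
import subprocess
import tempfile
from pathlib import Path


def convert_to_text(source: str, binary: str = "soffice", run=subprocess.run) -> str:
    """Convert `source` to plain text via a temporary directory and return the text."""
    with tempfile.TemporaryDirectory() as outdir:
        run(
            [binary, "--headless", "--convert-to", "txt:Text",
             "--outdir", outdir, source],
            check=True,
            stdout=subprocess.DEVNULL,  # discard the "convert ... -> ..." status line
        )
        # Assumption: output is <stem>.txt inside outdir.
        produced = Path(outdir) / (Path(source).stem + ".txt")
        # Assumption: UTF-8 output; undecodable bytes are replaced, not fatal.
        return produced.read_text(encoding="utf-8", errors="replace")
```

Usage would be something like `print(convert_to_text("/tmp/filezBIL6j.odt"))`. A built-in --cat makes all of this unnecessary.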
Added my changes to gerrit for review. https://gerrit.libreoffice.org/#/c/10623/
deenafrancis committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=3d318e6cf4a183e14a043840b9990958c7527536 fdo#70625 Add --cat parameter to make git diffs pretty The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Nice patch Deena - thanks for that.

A few more things might be a good idea:

* perhaps auto-enable --headless on Linux - there are other settings to force windows not to show on Mac etc. I think ;-)

* work out what we want for spreadsheets / presentations - export as CSV ? or ... something there would be good I guess.

Then I guess we need to persuade someone to knock up some sample git config bits such that we can get nice human readable diffs easily - perhaps dropping that in the wiki ? [ and the 4.4 features wiki page I guess - perhaps the SparkleShare people would appreciate that too ? ].

Anyhow - a really great start; - oh ! and also can you send an E-mail like this:

https://wiki.documentfoundation.org/Development/Developers#Developers_and_Contributors_list

so we get the auditing right =)

Thanks !
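The "sample git config bits" could look roughly like what the following sketch generates. It relies on git's `textconv` diff driver mechanism, which runs the configured command with the file name appended and diffs whatever it prints. The driver name `libreoffice`, the extension list, and the helper name `git_textconv_setup` are hypothetical choices, and it assumes `soffice` is on the PATH and that --cat handles the listed formats.

```python
def git_textconv_setup(
    extensions=("odt", "doc"),
    driver="libreoffice",
    command="soffice --headless --cat",
):
    """Return (.gitattributes text, git config text) for a textconv diff driver."""
    # One line per extension, mapping matching files to the diff driver.
    attributes = "".join(f"*.{ext} diff={driver}\n" for ext in extensions)
    # Driver definition; git appends the path of the file to convert.
    config = f'[diff "{driver}"]\n\ttextconv = {command}\n'
    return attributes, config


attributes, config = git_textconv_setup()
print(attributes)
print(config)
```

The first string would go into `.gitattributes` in the repository, the second into `.git/config` or `~/.gitconfig`. After that, `git diff` and `git log -p` should show text diffs for the matching files.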
(In reply to comment #11) Thanks for verifying and accepting the patch. I will work on improving the --cat feature for document formats other than those supported by swriter.
Migrating Whiteboard tags to Keywords: (EasyHack DifficultyBeginner SkillCpp TopicCleanup ) [NinjaEdit]
JanI is default CC for Easy Hacks (Add Jan; remove LibreOffice Dev List from CC) [NinjaEdit]
This needs extending to spreadsheet & impress formats I guess =) thanks though Deena ! =)
Using version 7.1.1.2 on Windows 10, the --cat option seems to not work from command line (nothing is printed); was this intentionally changed and intended to not be supported (even though the help still prints "--cat" as an option)? The "--convert-to txt" option works for .odt, but doesn't convert text in objects (e.g. Insert -> Shape); should objects be supported, or just simply do what --convert-to does but send output to console rather than a file? Also, "--convert-to txt" doesn't work for .ods and .odp files, and "--convert-to csv" doesn't work for .odp, but html works for all 3; should that then be used (objects are converted to images for .odt and .ods)?
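The observations above can be summarised as a lookup from input type to a --convert-to target that was reported to work. This is only a sketch of that report for 7.1.1.2 on Windows: the names `WORKING_TARGETS` and `pick_target` are hypothetical, and csv for .ods is implied by the comment rather than stated outright.

```python
from pathlib import Path

# Targets reported to work, in order of preference.
# Assumption: csv works for .ods (implied above, not stated explicitly).
WORKING_TARGETS = {
    ".odt": ["txt", "html"],
    ".ods": ["csv", "html"],
    ".odp": ["html"],
}


def pick_target(filename: str) -> str:
    """Return the preferred --convert-to target for a file, by extension."""
    suffix = Path(filename).suffix.lower()
    targets = WORKING_TARGETS.get(suffix)
    if not targets:
        raise ValueError(f"no known working target for {suffix!r}")
    return targets[0]


for name in ("letter.odt", "budget.ods", "slides.odp"):
    print(name, "->", pick_target(name))
```

This only records which conversions ran. It does not answer the open question of whether HTML is an acceptable output for --cat.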
I want to add that the --cat option does not work (empty output) if LibreOffice is open. Maybe this is the problem encountered by Matt K.
(In reply to woundorf from comment #17) > the --cat option does not work (empty output) if LibreOffice is open I don't see any output on Windows whether or not LibreOffice is open.
Comment 16, comment 17 and comment 18 are about bug 129713 and bug 112536.
*** Bug 152451 has been marked as a duplicate of this bug. ***
*** Bug 152446 has been marked as a duplicate of this bug. ***
In reply to Mike Kaganski ...
> *** Bug 152451 has been marked as a duplicate of this bug. ***
> *** Bug 152446 has been marked as a duplicate of this bug. ***

In that case... +1 for some love for the two bugs. Not sure how much love would be needed; maybe just a little, as conversion to PDF is already there.
Hint: "--cat" doesn't work if another LibreOffice instance is running. Bug 129713 - soffice --cat option doesn't output document text contents when there is another instance running
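One thing a wrapper could try is to start the --cat call with its own user profile, using the `-env:UserInstallation=` bootstrap parameter so that the call does not attach to the already running instance. Whether this actually avoids bug 129713 is an assumption and is not confirmed anywhere in this thread. The function name `cat_command` and the profile directory are hypothetical.

```python
from pathlib import Path


def cat_command(document: str, profile_dir: str, binary: str = "soffice") -> list[str]:
    """Build an argv that runs --cat against a separate user profile."""
    # UserInstallation expects a file:// URL, so convert the directory path.
    profile_uri = Path(profile_dir).resolve().as_uri()
    return [
        binary,
        f"-env:UserInstallation={profile_uri}",
        "--headless",
        "--cat",
        document,
    ]


print(cat_command("report.odt", "lo-cat-profile"))
```

The resulting list can be passed to `subprocess.run(..., capture_output=True)`. The first start with a fresh profile is typically slower because the profile has to be created.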
(In reply to Matt K from comment #16)
> but html works for all 3;
> should that then be used (objects are converted to images for .odt and .ods)?

I debugged the --cat output path for .odp files and can easily make the --cat option work for HTML output (similar to the "--convert-to html" command line). Is this acceptable, or do we want to iterate over every object and print its text contents without HTML tags present?
(In reply to Matt K from comment #24)
> (In reply to Matt K from comment #16)
> > but html works for all 3;
> > should that then be used (objects are converted to images for .odt and .ods)?
>
> I debugged the --cat output path for .odp files and can easily make the
> --cat option work for HTML output (similar to "--convert-to html" command
> line); is this acceptable or do we want to iterate every object and print
> its text contents without HTML tags present?

I don't think we should be mixing things: if it is plain text (UTF-8) for documents, it should probably be plain text (UTF-8) for presentation formats too. As above, a +1 for any work making --cat work for all formats. (I'll throw in a mention of an obscure edge case for documents: Bug 159583)