Bug 93104 (Binary-ODF) - Add "binary" file types bodt, bods, bodp, etc.
Summary: Add "binary" file types bodt, bods, bodp, etc.
Status: RESOLVED WONTFIX
Alias: Binary-ODF
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version (earliest affected): 4.4.4.3 release
Hardware: All All
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Format-Filters ODF
Reported: 2015-08-03 18:59 UTC by Olivier Hallot
Modified: 2019-03-11 14:05 UTC

Description Olivier Hallot 2015-08-03 18:59:03 UTC
It seems that Excel has a special "binary" format that allows a very fast opening/saving of the data.

This is particularly useful when the spreadsheet is very large and consists mostly of data. I once stumbled into such a beast, an xls of 200MB that was unusable when converted to ODF.

So why not implement a "binary" ODF file format, where basically we store the document in a format that reduces the overhead of XML parsing?

Like "Flat ODF", where data is unzipped and with just one file, the "binary" ODF: 

* does not need to be compliant with any other standard,
* does not need to care about interoperability,
* will load and save much faster,
* "use at your own risk".

As a suggestion, the file types could be bods, bodt, bodp, bodf, bodb, bodc.
Comment 1 Joel Madero 2015-08-03 19:35:41 UTC
Wouldn't this have to go through the ODF committee? I don't really know how these things play out but I didn't think we had the ability to just create new file formats.
Comment 2 Olivier Hallot 2015-08-03 19:44:41 UTC
Hi Joel: 
No need to go to the ODF committee, because we are not claiming a new standard at the moment. We just want to read and store huge files more quickly.

Besides, I don't think "Flat ODF", as we already have it, is supported by anything other than LibreOffice. "Flat ODF" is very handy for peeking and poking at some XML content with a plain text editor, outside of LibreOffice (at the user's own risk).

regards
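
To illustrate the difference between a packaged and a flat document: a packaged file is a ZIP container holding the XML, while a flat file is that XML stored directly, which is why a plain text editor can open it. A minimal sketch using only the Python standard library; the XML and the helper names are made-up stand-ins, and the package here is much simpler than a real ODF package:

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Made-up stand-in for document content; NOT a valid ODF document.
XML = b"<doc><cell>1</cell><cell>2</cell></doc>"


def make_package(xml_bytes: bytes) -> bytes:
    """Wrap the XML in a ZIP container, roughly how a packaged document is stored."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("content.xml", xml_bytes)
    return buf.getvalue()


def read_package(package: bytes) -> ET.Element:
    """Packaged variant: unzip the XML stream first, then parse it."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        return ET.fromstring(zf.read("content.xml"))


def read_flat(xml_bytes: bytes) -> ET.Element:
    """Flat variant: the file is plain XML, so parse it directly."""
    return ET.fromstring(xml_bytes)


package = make_package(XML)
cells_from_package = [c.text for c in read_package(package).iter("cell")]
cells_from_flat = [c.text for c in read_flat(XML).iter("cell")]
```

Both paths end in the same XML parse; the flat variant only skips the container step.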
Comment 3 Joel Madero 2015-08-03 19:46:34 UTC
Reasonable enhancement then! Thanks for that explanation.

As you know - who knows if/when this will be implemented as it would really require a volunteer to know quite a bit and be able to implement a pretty challenging feature.

That being said, setting to NEW. Maybe an interesting GSoC project for next year?
Comment 4 Maxim Monastirsky 2015-08-03 20:59:06 UTC
(In reply to Olivier Hallot from comment #0)
> "binary" format that allows a very fast opening/saving of the data.
Any source for such claims? Does Excel open xlsx file much slower than xls?

> I once stumbled into such a beast, an xls of 200MB that was unusable
> when converted to ODF.
Then you should report it as a bug, and it should be fixed. It's not a reason to invent a new format (and we already have too many of them to maintain).

> So why not implement a "binary" ODF file format
I don't understand. "ODF" is all about XML, if it's not XML, then it's not ODF.

> Like "Flat ODF", where data is unzipped and with just one file
Even MS Excel binary format is actually a container for several files ("streams"), just like ZIP. It's just a different format than ZIP.

> * does not need to be compliant with any other standard,
> * does not need to care about interoperability,
So what's the point of such a format if it creates vendor lock-in? We already had binary formats once, in the StarOffice/OOo days, and we dropped them because of this. (Not to mention that an XML format is easy to fix by hand if it gets corrupted.)

> * will load and save much faster,
See above. You can't claim such a thing without proving it. The XML parsing overhead is not so high. And after all, it depends on the implementation. I'm sure that a filter for a binary format could be horribly slow if poorly implemented.

> * "use at your own risk".
And you expect that people will follow this? Even now, people don't follow the recommendation of working with ODF and exporting to DOC/DOCX/whatever only if you need to open the file with MS Office. Even worse - people keep saving to the MS Word/Excel 2003 XML formats, although they are known to be in a bad state, and data loss is almost guaranteed with each save.

So IMHO this bug should be closed as WONTFIX. You can also ask on the dev ml. I'm pretty sure you'll get exactly the same response there.
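
The speed claim is testable. A rough sketch of how one could measure it, using only the Python standard library and assuming a made-up layout of three 64-bit floats per row (all names and the row count are hypothetical). It compares an XML encoding with a fixed-width binary encoding of the same numbers, for both writing and reading, and says nothing about the actual LibreOffice or Excel filters:

```python
import struct
import time
import xml.etree.ElementTree as ET

ROWS = 20_000  # hypothetical data size
data = [(float(i), i * 0.5, i * 2.0) for i in range(ROWS)]

# Assumed binary layout: three little-endian 64-bit floats per row.
ROW = struct.Struct("<3d")


def to_xml(rows):
    """Serialize rows as a simple made-up XML structure."""
    parts = ["<sheet>"]
    for a, b, c in rows:
        parts.append(f"<row><c>{a!r}</c><c>{b!r}</c><c>{c!r}</c></row>")
    parts.append("</sheet>")
    return "".join(parts).encode("utf-8")


def from_xml(blob):
    """Parse the XML and convert every cell back to a float."""
    root = ET.fromstring(blob)
    return [tuple(float(c.text) for c in row) for row in root]


def to_binary(rows):
    """Serialize rows as packed fixed-width records."""
    return b"".join(ROW.pack(*r) for r in rows)


def from_binary(blob):
    """Unpack the fixed-width records back into tuples."""
    return list(ROW.iter_unpack(blob))


def timed(fn, arg):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(arg)
    return result, time.perf_counter() - start


xml_blob, t_xml_write = timed(to_xml, data)
bin_blob, t_bin_write = timed(to_binary, data)
xml_rows, t_xml_read = timed(from_xml, xml_blob)
bin_rows, t_bin_read = timed(from_binary, bin_blob)

print(f"XML:    write {t_xml_write:.4f}s  read {t_xml_read:.4f}s  {len(xml_blob)} bytes")
print(f"binary: write {t_bin_write:.4f}s  read {t_bin_read:.4f}s  {len(bin_blob)} bytes")
```

The numbers depend heavily on the implementation and the data, so a toy like this can only show how to measure, not which format wins in a real filter.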
Comment 5 Maxim Monastirsky 2015-08-03 21:39:05 UTC
(In reply to Olivier Hallot from comment #2)
> Besides, I don't think "Flat ODF", as we already have it, is supported by
> anything other than LibreOffice.
And yet, it conforms to the ODF standard, see section 2.2.1-c of the ODF 1.2 spec.
Comment 6 MM 2015-08-03 21:42:45 UTC
(In reply to Maxim Monastirsky from comment #4)
> (In reply to Olivier Hallot from comment #0)

> > * does not need to be compliant with any other standard,
> > * does not need to care about interoperability,
> So what's the point of such a format if it creates vendor lock-in? We already
> had binary formats once, in the StarOffice/OOo days, and we dropped them
> because of this. (Not to mention that an XML format is easy to fix by hand if
> it gets corrupted.)

I agree with that. LibreOffice should be about open formats, not closing them off with another set of binaries that no other programs can read or will support.

> > * will load and save much faster,
> See above. You can't claim such a thing without proving it. The XML parsing
> overhead is not so high. And after all, it depends on the implementation. I'm
> sure that a filter for a binary format could be horribly slow if poorly
> implemented.

From what I've read, reading Excel binary is a bit faster than XML, but slower when writing.
Comment 7 David Tardon 2015-09-07 08:04:11 UTC
I think this is a terrible idea. LibreOffice's headline is "Moved by freedom -- powered by standards". "Standards" in this case means usage of ODF. Introduction of a new proprietary format goes directly against that. (I could also mention that such a filter would be a big chunk of code that we'd be stuck with supporting ~forever.)
Comment 8 Thomas Lendo 2018-09-29 18:55:28 UTC
Adding needsUXEval.

I don't like the idea of a special, new file format especially because it's binary and LibreOffice-only.
Comment 9 Heiko Tietze 2019-03-11 14:05:29 UTC
File size might have been an issue in the past, when users dealt with a gazillion data points. But then, storage is cheap and you can pack the file for sharing. As for usability, another format adds confusion. So let's close this as WF, primarily since "bodf" would lack standardization and because Calc is not meant for extensive data analysis (talking about GB+).