It seems that Excel has a special file format that allows saving the data in a special "binary" format that allows a very fast opening/saving of the data.
This is particularly useful when the spreadsheet is very large and consists mostly of data. I once stumbled upon such a beast: an xls of 200MB that was unusable when converted to ODF.
So why not implement a "binary" ODF file format, where basically we store the document in a format that reduces the overhead of XML parsing?
Like "Flat ODF", where data is unzipped and with just one file, the "binary" ODF:
* does not need to be compliant with any other standard,
* does not need to care about interoperability,
* will load and save much faster,
* "use at your own risk".
Suggested file types: bods, bodt, bodp, bodf, bodb, bodc.
Wouldn't this have to go through the ODF committee? I don't really know how these things play out but I didn't think we had the ability to just create new file formats.
No need to go to the ODF committee, because at the moment we are not proposing a new standard... We just want to read and store huge files more quickly.
Besides, I don't think "Flat ODF", as we already have it, is supported by anything other than LibreOffice. "Flat ODF" is very handy to peek & poke at some XML content with a plain text editor, outside of LibreOffice (at the user's own risk).
Reasonable enhancement then! Thanks for that explanation.
As you know, there's no telling if or when this will be implemented, since it would require a volunteer who knows quite a bit and is able to implement a pretty challenging feature.
That being said, setting to NEW. Maybe an interesting GSoC project for next year?
(In reply to Olivier Hallot from comment #0)
> "binary" format that allows a very fast opening/saving of the data.
Any source for such claims? Does Excel open xlsx files much more slowly than xls files?
> I once stumbled into such beast with a xls of 200MB that was unusable
> when converted to ODF.
Then you should report it as a bug, and it should be fixed. It's not a reason to invent a new format (and we already have too many of them to maintain).
> So why not implement a "binary" ODF file format
I don't understand. "ODF" is all about XML, if it's not XML, then it's not ODF.
> Like "Flat ODF", where data is unzipped and with just one file
Even the MS Excel binary format is actually a container for several files ("streams"), just like ZIP. It just uses a different container format than ZIP.
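To illustrate the container point: a regular (non-flat) ODF file is itself a ZIP package holding several member files. The following is only a minimal Python sketch, not LibreOffice code. The function names `build_demo_package` and `list_members` are made up for illustration, and the placeholder XML bodies do not form a valid document; only the member names follow the usual ODF package layout.

```python
import io
import zipfile


def build_demo_package() -> bytes:
    """Build a tiny in-memory ZIP that mimics the layout of an ODF package.

    Hypothetical helper for illustration; the result is NOT a valid document.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The mimetype entry comes first and is stored uncompressed
        # (ZipInfo defaults to ZIP_STORED).
        zf.writestr(
            zipfile.ZipInfo("mimetype"),
            "application/vnd.oasis.opendocument.spreadsheet",
        )
        # The remaining members are compressed; bodies are placeholders.
        for name in ("content.xml", "styles.xml", "meta.xml",
                     "META-INF/manifest.xml"):
            zf.writestr(name, "<placeholder/>",
                        compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


def list_members(data: bytes) -> list:
    """Return the member names of a ZIP archive held in memory."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.namelist()


if __name__ == "__main__":
    for member in list_members(build_demo_package()):
        print(member)
```

Pointing `zipfile.ZipFile` at a real .ods file and calling `namelist()` should list the same kind of members.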
> * does not need to be compliant with any other standard,
> * does not need to care about interoperability,
So what's the point of such a format if it creates vendor lock-in? We already had binary formats once, in the StarOffice/OOo days, and we dropped them because of this. (Not to mention that an XML format is easy to fix by hand if it gets corrupted.)
> * will load and save much faster,
See above. You can't claim such a thing without proving it. The XML parsing overhead is not so high. And after all, it depends on the implementation. I'm sure that a filter for a binary format could be horribly slow if poorly implemented.
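One way such a claim could be checked is a small measurement. The sketch below is only an assumption-laden toy: it compares Python's `xml.etree.ElementTree` against `struct` on an invented one-row layout (a `<row>` of `<cell>` elements versus a count followed by packed doubles). It says nothing about the actual LibreOffice or Excel filters; all function names and both data layouts are hypothetical.

```python
import struct
import time
import xml.etree.ElementTree as ET


def make_xml(values):
    """Serialize numbers as a toy XML row (invented layout, not ODF)."""
    cells = "".join(f"<cell>{v}</cell>" for v in values)
    return f"<row>{cells}</row>".encode("utf-8")


def make_binary(values):
    """Serialize numbers as a little-endian uint32 count plus float64s."""
    return struct.pack(f"<I{len(values)}d", len(values), *values)


def parse_xml(data):
    """Parse the toy XML row back into a list of floats."""
    root = ET.fromstring(data)
    return [float(cell.text) for cell in root.iter("cell")]


def parse_binary(data):
    """Parse the toy binary layout back into a list of floats."""
    (count,) = struct.unpack_from("<I", data, 0)
    return list(struct.unpack_from(f"<{count}d", data, 4))


def time_it(func, data, repeats=5):
    """Return the best wall-clock time of `repeats` calls to func(data)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(data)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    values = [i * 0.5 for i in range(200_000)]
    print("xml   :", time_it(parse_xml, make_xml(values)))
    print("binary:", time_it(parse_binary, make_binary(values)))
```

Whatever numbers this prints only describe these two toy parsers on one machine; a real comparison would have to time the actual import filters on the same document.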
> * "use at your own risk".
And you expect that people will follow this? Even now people don't follow the recommendation to work with ODF and export to DOC/DOCX/whatever only when the file needs to be opened with MS Office. Even worse, people keep saving to the MS Word/Excel 2003 XML formats, although they are known to be in a bad state, and data loss is almost guaranteed with each save.
So IMHO this bug should be closed as WONTFIX. You can also ask on the dev ml. I'm pretty sure you'll get there exactly the same response.
(In reply to Olivier Hallot from comment #2)
> Besides, I don't think "Flat ODF" as we already have, is supported by
> anything else than LibreOffice.
And yet, it conforms to the ODF standard, see section 2.2.1-c of the ODF 1.2 spec.
(In reply to Maxim Monastirsky from comment #4)
> (In reply to Olivier Hallot from comment #0)
> > * does not need to be compliant with any other standard,
> > * does not need to care about interoperability,
> So what's the point of such format if it creates vendor lock-in? We already
> had once binary formats in StarOffice/OOo days, and we dropped them, because
> of this. (Not to mention that XML format is easy to fix by hand if it gets
> some corruption.)
I agree with that. LibreOffice should be about open formats, not closing them off with another set of binary formats that no other programs can read or will support.
> > * will load and save much faster,
> See above. You can't claim such thing without proving it. The XML parsing
> overhead is not so high. And after all it depends on implementation. I'm
> sure that a filter of a binary format could be horribly slow, if poorly
From what I've read, reading the Excel binary format is a bit faster than reading XML, but writing it is slower.
I think this is a terrible idea. LibreOffice's headline is "Moved by freedom -- powered by standards". "Standards" in this case means usage of ODF. Introducing a new proprietary format goes directly against that. (I could also mention that such a filter would be a big chunk of code that we'd be stuck with supporting ~forever.)
I don't like the idea of a special, new file format, especially because it's binary and LibreOffice-only.
File size might have been an issue in the past when users dealt with a gazillion data points. But storage is cheap now, and you can compress the file for sharing. As for usability, another format adds confusion. So let's close this as WF, primarily because "bodf" would lack standardization and because Calc is not meant for extensive data analysis (talking about GB+).