Bug 137308 - import newer / zstandard as zip-format
Summary: import newer / zstandard as zip-format
Status: ASSIGNED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version (earliest affected): unspecified
Hardware: All All
Importance: medium enhancement
Assignee: Akshay Kumar Dubey
URL:
Whiteboard:
Keywords: difficultyInteresting, easyHack, skillCpp
Depends on:
Blocks: Format-Filters
Reported: 2020-10-07 10:56 UTC by paulystefan
Modified: 2025-04-01 13:40 UTC
CC List: 4 users

See Also:
Crash report or crash signature:


Description paulystefan 2020-10-07 10:56:59 UTC
Zstandard is a fast compression format that has been available inside ZIP containers since version 6.3.8 of the ZIP file format specification.

It is much faster than the older compression methods used in ZIP files.

https://en.wikipedia.org/wiki/Zip_(file_format)

https://facebook.github.io/zstd/

Its dictionary compression mode is also ideal for text applications.

Zstandard is a fast compression algorithm, providing high compression ratios. It also offers a special mode for small data, called dictionary compression. The reference library offers a very wide range of speed / compression trade-off, and is backed by an extremely fast decoder (see benchmarks below). Zstandard library is provided as open source software using a BSD license. Its format is stable and published as IETF RFC 8478


It has no recovery record like WinRAR or Parchive, but it is faster.
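For reference, here is a minimal C++ sketch of how an importer could tell which compression method a ZIP entry uses. The method is a little-endian 16-bit field at offset 8 of the 30-byte local file header (signature 0x04034b50). Method 0 is STORED, 8 is DEFLATED, and 93 is the Zstandard ID (20 is the earlier, deprecated zstd ID). The function names are hypothetical and are not taken from LibreOffice's package/ code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Reads a little-endian 16-bit value at offset `off`.
std::uint16_t readLE16(const std::vector<std::uint8_t>& b, std::size_t off) {
    assert(off + 1 < b.size());
    return static_cast<std::uint16_t>(b[off] | (b[off + 1] << 8));
}

// Reads a little-endian 32-bit value at offset `off`.
std::uint32_t readLE32(const std::vector<std::uint8_t>& b, std::size_t off) {
    assert(off + 3 < b.size());
    return static_cast<std::uint32_t>(b[off]) |
           (static_cast<std::uint32_t>(b[off + 1]) << 8) |
           (static_cast<std::uint32_t>(b[off + 2]) << 16) |
           (static_cast<std::uint32_t>(b[off + 3]) << 24);
}

// Returns the compression method stored in the local file header at the
// start of `b`, or std::nullopt if `b` does not begin with such a header.
std::optional<std::uint16_t> localHeaderMethod(const std::vector<std::uint8_t>& b) {
    constexpr std::uint32_t kLocalHeaderSig = 0x04034b50u; // bytes "PK\3\4"
    if (b.size() < 30 || readLE32(b, 0) != kLocalHeaderSig) {
        return std::nullopt;
    }
    return readLE16(b, 8); // offset 8: compression method
}

// Human-readable name for the method IDs relevant to this discussion.
std::string methodName(std::uint16_t method) {
    switch (method) {
        case 0:  return "stored";
        case 8:  return "deflate";
        case 20: return "zstd (deprecated ID)";
        case 93: return "zstd";
        default: return "other";
    }
}
```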
Comment 1 Mike Kaganski 2020-10-07 11:18:25 UTC
Sigh. Please try reading before submitting ideas - especially when your previous idea (tdf#137305) has a reply with a link to the relevant part of the standard.

Specifically, [1]:

> An OpenDocument Package shall meet the following requirements:
> A)It shall be a Zip file, as defined by [ZIP]. All files contained in the
> Zip file shall be non compressed (STORED) or compressed using the “deflate”
> (DEFLATED) algorithm.

The standard does not allow any other algorithms. Doing otherwise would make the package non-conformant/invalid.
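The rule quoted above comes down to a check on each entry's method ID. Here is a minimal sketch, assuming a hypothetical PackageEntry record filled in from the ZIP central directory. This is not an existing LibreOffice type.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical summary of one package entry, as read from the ZIP central directory.
struct PackageEntry {
    std::string name;
    std::uint16_t method; // ZIP compression method ID
};

// ODF 1.3 Part 2 permits only STORED (0) and DEFLATED (8).
bool isOdfConformantMethod(std::uint16_t method) {
    return method == 0 || method == 8;
}

// Returns the names of entries that would make the package non-conformant.
std::vector<std::string> nonConformantEntries(const std::vector<PackageEntry>& entries) {
    std::vector<std::string> bad;
    for (const auto& e : entries) {
        if (!isOdfConformantMethod(e.method)) {
            bad.push_back(e.name);
        }
    }
    return bad;
}
```

Under this rule, a package containing a single zstd (method 93) entry would already be rejected.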

Your suggestion is not in the right place, again. You should submit it to OASIS, with the accompanying analysis of the gains that you expect to get, for typical documents (with text and images), both size-wise and performance-wise. It is WONTFIX here until the standard changes.

[1] http://docs.oasis-open.org/office/OpenDocument/v1.3/OpenDocument-v1.3-part2-packages.html#__RefHeading__752791_826425813
Comment 2 paulystefan 2020-10-07 11:25:35 UTC
But this is part of your work to improve LO.

Are the LO developers not in contact with the OASIS organization?

This enhancement is not for tomorrow, but for next year, the next decade, the next century, or millions of years from now.
Comment 3 Michael Meeks 2020-10-07 11:38:39 UTC
Let me re-open this and turn it into an easy hack.

Clearly we should be able to import files in newer ZIP formats such as zstandard.

My hope would be that libzip would (eventually) be able to do that for us; but anyhow - the inflator code is here - it explicitly uses inflate.

https://git.libreoffice.org/core/+/refs/heads/master/package/source/zipapi/Inflater.cxx

Possibly we will want to include a new external module for zstd (I guess) to re-use that code - the BSD seems fine, and to add that to readlicense_oo.
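Whatever backend is used, a zstd decoder should only receive data that really is a Zstandard frame. Every standard zstd frame begins with the 4-byte magic number 0xFD2FB528, stored little-endian (RFC 8478). That makes a cheap sanity check possible before decoding an entry whose header says method 93. The following is only a sketch, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Magic number that starts every standard Zstandard frame (RFC 8478),
// serialized little-endian as the bytes 28 B5 2F FD.
constexpr std::uint32_t kZstdFrameMagic = 0xFD2FB528u;

// Returns true if `data` begins with the Zstandard frame magic number.
bool startsWithZstdFrame(const std::uint8_t* data, std::size_t len) {
    assert(data != nullptr || len == 0);
    if (len < 4) {
        return false;
    }
    const std::uint32_t magic = static_cast<std::uint32_t>(data[0]) |
                                (static_cast<std::uint32_t>(data[1]) << 8) |
                                (static_cast<std::uint32_t>(data[2]) << 16) |
                                (static_cast<std::uint32_t>(data[3]) << 24);
    return magic == kZstdFrameMagic;
}
```

Skippable frames use a different magic number range and are not handled by this check.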

Beyond that I think we'll want read support widely deployed for some years - before getting this into the standard, and then some more years before turning it on by default.

Contributions much appreciated =)
Comment 4 Mike Kaganski 2020-10-07 11:49:44 UTC
However, see also ISO/IEC 21320-1 "Document Container File — Part 1: Core", which, according to the Wikipedia article mentioned in comment 0, requires that "Files in ZIP archives may only be stored uncompressed, or using the "deflate" compression (i.e. compression method may contain the value "0" - stored or "8" - deflated)".
Comment 5 paulystefan 2020-10-07 15:32:16 UTC
Zstandard is up to 4 times faster on modern hardware and offers more compression options.

So users could have a standard-conformant ZIP ODT mode for archiving (the current one) and a fast ZIP ODT mode with zstandard for working.

Zstandard compression could also be used for fast internal autosaving to a second, parallel work file during a session.

Huge files are the main target for this improvement.
Comment 7 Mike Kaganski 2020-10-16 10:49:21 UTC
(In reply to paulystefan from comment #6)
> so kernel starts 4 times faster.

Please don't turn this request into unmanageable advertising board (and no, "kernel starts 4 times faster" is wrong, 4 times improvements were reported for decompression, while boot time improvements were much more modest).
Comment 8 paulystefan 2020-10-26 15:32:07 UTC
OK.

I only want to show the current direction of other open source communities in this area.

For all compression and decompression in LibreOffice, there is potential in modern codecs like zstandard.

With LibreOffice 7, some old software and hardware without modern CPU features are no longer supported.
So more is possible in this field, also because SSDs with 500 MB/s and more are now the norm instead of HDDs with 50 to 100 MB/s.

Benchmarks on this can be found by searching the internet for "7zip zstandard" and similar terms.
Comment 9 paulystefan 2022-07-01 01:32:11 UTC
Zstandard (currently version 1.5.2) might also be an option in the near future for the LO installation files, giving smaller downloads and faster installation.
Comment 10 Akshay Kumar Dubey 2025-04-01 09:08:41 UTC
Hi,

I have assigned this issue to myself and would like to work on implementing Zstandard compression in LibreOffice. So far, I have cloned the Zstandard (zstd) repository from Facebook and placed its lib part in the external/zstd folder. I have also attempted to configure it in configure.ac by referring to the existing zlib configuration, though I believe I may not have done it correctly. Additionally, I have tried to introduce a flag to switch between zstd and zlib.

I would appreciate any suggestions or guidance on how to proceed further. Any insights on properly configuring and integrating Zstandard within LibreOffice would be especially helpful.

Thanks,
Akshay
Comment 11 Michael Meeks 2025-04-01 13:40:04 UTC
According to this - there is now a zstd type added to the ZIP format https://github.com/facebook/zstd/issues/1378

While we should be very careful about producing these files (ie. we should probably not do that) - I would expect that it is useful to add support for de-compressing OD* files that have zstd support.

We use zstd in COOL - and it is as advertised way faster than the deflate we used before - the fundamentally slow piece is always compression not de-compression really, the bit format is better and so on and it's not a huge CPU/RAM hog blah blah =)

As for standardization - I'd support that but we should of course have an implementation or two first.

So - how to do that - step #1: we should add zstd into its own external/ folder as an optional dependency and make it build there inside our build-system based on the external/zlib template.

Then step #2 - we should attack the package/source/ folders - if you:

$ git grep -i deflate # in there

you'll see the entry points for the existing deflate code. Luckily the zstd API is IIRC remarkably similar to the zlib one. Clearly we should leave the deflate path alone and keep that working, and re-factor the code wherever necessary to abstract the compression algorithm there. So perhaps three commits:

1. external zstd compilation
2. re-factor package/ code to allow plug-able compress/decompress
3. implement a zstd compression backend.
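Step 2 could look roughly like the following sketch, which puts decompression behind an interface keyed by the ZIP method ID. All class names are hypothetical. Only a STORED backend is implemented here. The existing zlib Inflater path (method 8) and a zstd backend (method 93) would be registered the same way, but they are not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical interface: one backend per ZIP compression method ID.
class Decompressor {
public:
    virtual ~Decompressor() = default;
    // Decodes `in`; `expectedSize` is the uncompressed size from the ZIP header.
    virtual std::vector<std::uint8_t> decompress(const std::vector<std::uint8_t>& in,
                                                 std::size_t expectedSize) = 0;
};

// Method 0 (STORED): the data is copied through unchanged.
class StoredDecompressor : public Decompressor {
public:
    std::vector<std::uint8_t> decompress(const std::vector<std::uint8_t>& in,
                                         std::size_t expectedSize) override {
        if (in.size() != expectedSize) {
            throw std::runtime_error("stored entry size mismatch");
        }
        return in;
    }
};

using DecompressorFactory = std::function<std::unique_ptr<Decompressor>()>;

// Maps ZIP method IDs to backend factories; a deflate (8) or zstd (93)
// backend would be registered with add() just like the stored one.
class DecompressorRegistry {
public:
    void add(std::uint16_t method, DecompressorFactory factory) {
        factories_[method] = std::move(factory);
    }
    // Returns nullptr when no backend is registered for `method`.
    std::unique_ptr<Decompressor> create(std::uint16_t method) const {
        auto it = factories_.find(method);
        if (it == factories_.end()) {
            return nullptr;
        }
        return it->second();
    }
private:
    std::map<std::uint16_t, DecompressorFactory> factories_;
};
```

Keeping the registry separate from the backends means the deflate path can stay untouched, while zstd is only registered when the optional external dependency is built.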

How does that sound ? =)

Akshay - sounds like you're making progress with 1. I would expect us to compile not from a git checkout but from a release tarball, and we would really want to see that pushed to gerrit so we can look at the code changes.

Thanks!