From the daily build language pack LibreOfficeDev_24.8.0.0.alpha0_Linux_x86-64_deb_langpack_de.tar.gz, 7z reports that the deb package below uses the xz compression method:
------------------------------
$ 7z l libreofficedev24.8-dict-de_24.8.0.0.alpha0-1_amd64.deb
[...]
Type = xz
Physical Size = 7713188
Method = LZMA2:23 CRC64
Streams = 1
Blocks = 4
------------------------------
Next, I extracted the content of this deb package with the command below:
7z x libreofficedev24.8-dict-de_24.8.0.0.alpha0-1_amd64.deb
This extracts the following file: data.tar

Next, I compressed this tar archive with xz at several compression levels:
------------------------------
$ xz -1 --threads=1 --stdout data.tar > d-1.xz
$ xz -4 --threads=1 --stdout data.tar > d-4.xz
$ xz -5 --threads=1 --stdout data.tar > d-5.xz
$ xz -9 --threads=1 --stdout data.tar > d-9.xz
------------------------------
Judging by the file sizes, the deb archive does not use the maximum xz compression level:
------------------------------
$ ls -lhS *
-rw-r--r-- 1 j j  82M Dec 16 03:40 data.tar
-rw-r--r-- 1 j j  16M Dec 16 14:21 d-1.xz
-rw-r--r-- 1 j j  11M Dec 16 14:22 d-4.xz
-rw-r--r-- 1 j j 7,4M Dec 16 03:40 libreofficedev24.8-dict-de_24.8.0.0.alpha0-1_amd64.deb
-rw-r--r-- 1 j j 6,7M Dec 16 14:24 d-5.xz
-rw-r--r-- 1 j j 3,8M Dec 16 14:25 d-9.xz
------------------------------
The minimum memory requirement for LibreOffice on GNU/Linux is 256 MB, with 512 MB recommended: https://www.libreoffice.org/get-help/system-requirements/#Linux
The xz man page says: "decompressing a file created with xz -9 currently requires 65 MiB of memory".
So nothing prevents using the maximum xz level ("-9") for LibreOffice deb packages. Users would download the deb files faster and save storage, and the servers would save storage and bandwidth. Since each package is compressed once but downloaded many times, the maximum xz compression level would provide an overall benefit.
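The size comparison above can be reproduced with Python's standard `lzma` module, which wraps liblzma. This is a minimal sketch, assuming the whole tar file fits in memory; `compare_presets` is a hypothetical helper, and its single-threaded output is only expected to be close to, not byte-identical with, what the `xz` command line produces.

```python
import lzma


def compare_presets(data: bytes, presets=(1, 4, 5, 9)) -> dict:
    """Compress `data` once per xz preset and return {preset: size in bytes}."""
    sizes = {}
    for preset in presets:
        # lzma.compress() emits a single-stream .xz file (FORMAT_XZ is the
        # default), comparable to `xz -N --threads=1 --stdout`.
        compressed = lzma.compress(data, preset=preset)
        # Sanity check: decompressing must give back the original bytes.
        assert lzma.decompress(compressed) == data
        sizes[preset] = len(compressed)
    return sizes
```

For example, `compare_presets(open("data.tar", "rb").read())` should give sizes in the same order as the `ls -lhS` listing. Per the xz man page, preset 9 needs about 674 MiB of memory while compressing.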
Thanks Jérôme. Do you know what difference it makes in the time needed to decompress such packages, and in overall install times? Could you test that too?
On xz decompression speed, the unxz man page says: "On the same hardware, the decompression speed is approximately a constant number of bytes of compressed data per second. In other words, the better the compression, the faster the decompression will usually be."
My xz/unxz version:
-------------
$ unxz --version
xz (XZ Utils) 5.2.2
liblzma 5.2.2
$
-------------
To measure part of the overall installation time, we can pipe the xz decompression into the file extraction process (tar in my test). I ran the test below with the core deb archive, which is the largest. I made sure only one terminal was running:
-------------
$ mkdir t
$ dpkg-deb --extract LibreOfficeDev_24.8.0.0.alpha0_Linux_x86-64_deb/DEBS/lodevbasis24.8-core_24.8.0.0.alpha0-1_amd64.deb t
$ tar cf sys-tree.tar t
$ xz -9 --threads=1 --stdout sys-tree.tar > sys-tree-9.tar.xz
$ xz -1 --threads=1 --stdout sys-tree.tar > sys-tree-1.tar.xz
$ rm -rf t && mkdir t
$ time ( unxz --to-stdout sys-tree-1.tar.xz | tar xf - --directory t )

real	0m5,969s
user	0m5,992s
sys	0m0,528s
$ rm -rf t && mkdir t
$ time ( unxz --to-stdout sys-tree-9.tar.xz | tar xf - --directory t )

real	0m5,930s
user	0m5,360s
sys	0m0,588s
$ rm -rf t && mkdir t
$ time ( unxz --to-stdout sys-tree-1.tar.xz | tar xf - --directory t )

real	0m6,093s
user	0m6,004s
sys	0m0,560s
$ rm -rf t && mkdir t
$ time ( unxz --to-stdout sys-tree-9.tar.xz | tar xf - --directory t )

real	0m5,905s
user	0m5,368s
sys	0m0,624s
$
-------------
On my hardware, the core deb archive compressed with '-9' decompresses slightly faster than the one compressed with '-1'.
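The decompression half of this measurement can be sketched in Python as well. This hypothetical helper times only `lzma.decompress` (no tar extraction or disk writes) and keeps the best of several runs to reduce noise, which the transcript does by hand with two runs per level.

```python
import lzma
import time


def time_decompression(data: bytes, preset: int, repeats: int = 3) -> float:
    """Compress `data` once at `preset`, then return the best wall-clock
    time (in seconds) out of `repeats` decompressions."""
    compressed = lzma.compress(data, preset=preset)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        restored = lzma.decompress(compressed)
        elapsed = time.perf_counter() - start
        assert restored == data  # decompression must be lossless
        best = min(best, elapsed)
    return best
```

Calling it with preset 1 and preset 9 on the same `sys-tree.tar` content would give the in-memory counterpart of the two `time` runs above.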
Thanks Jérôme. From what you describe, this makes sense. Cloph, is that an easy switch in the packaging configuration?
Most deb packages are created by the "epm" tool: distro-configs/LibreOfficeLinux.conf passes the "--enable-epm" option to autogen.sh. epm has no command-line option for the compression level. The desktop integration package is built with dpkg-deb in sysui/CustomTarget_deb.mk.
The epm source file workdir/UnpackedTarball/epm/deb.c shows:
---
if (Verbosity)
  puts("Building Debian binary distribution...");

if (run_command(directory, "dpkg --build %s", name))
---
So the epm tool itself calls the dpkg program. The dpkg-deb man page lists three environment variables that control how dpkg-deb (and thus dpkg) compresses packages:
DPKG_DEB_COMPRESSOR_TYPE=xz
DPKG_DEB_COMPRESSOR_LEVEL=9
DPKG_DEB_THREADS_MAX=1
Could we set those variables in the config_host.mk.in file?
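The idea can be sketched in Python: copy the environment, add the three variables, and run the same `dpkg --build` command that epm runs, so that the dpkg-deb it delegates to inherits them. This is only an illustration; `dpkg_build_env` and `build_deb` are hypothetical helpers, and the actual fix belongs in the build's makefiles.

```python
import os
import subprocess


def dpkg_build_env(compressor="xz", level=9, threads_max=1, base=None):
    """Return a copy of `base` (default: os.environ) with dpkg-deb's
    compression variables set; see dpkg-deb(1) for supported versions."""
    env = dict(os.environ if base is None else base)
    env["DPKG_DEB_COMPRESSOR_TYPE"] = compressor
    env["DPKG_DEB_COMPRESSOR_LEVEL"] = str(level)
    env["DPKG_DEB_THREADS_MAX"] = str(threads_max)
    return env


def build_deb(directory, name):
    # Mirrors epm's `dpkg --build <name>` run from `directory`; dpkg hands
    # --build over to dpkg-deb, which inherits this environment.
    return subprocess.run(["dpkg", "--build", name], cwd=directory,
                          env=dpkg_build_env(), check=True)
```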
Created attachment 198353 [details] the proposed fix for this bug
(In reply to Jérôme from comment #7)
> Created attachment 198353 [details]
> the proposed fix for this bug

Please submit it to Gerrit: https://wiki.documentfoundation.org/Development/gerrit/setup
If you want to do it entirely via the web, after creating a Gerrit account you can visit https://git.libreoffice.org/core/+/refs/heads/master/config_host.mk.in and click the [edit] link to immediately create a new change for the file: https://wiki.documentfoundation.org/Documentation/GerritEditing
Also: https://wiki.documentfoundation.org/Development/GetInvolved#License_statement
I just submitted it to Gerrit.
Created attachment 198378 [details]
comparison of deb archive sizes with or without the new variable definitions

Following Christian Lohmaier's suggestion, I moved those variable definitions into the packaging recipe in instsetoo_native/CustomTarget_install.mk instead.
First, I built without defining those variables, using the configuration below:
./autogen.sh --with-distro=LibreOfficeLinux --with-package-format=deb --disable-online-update --disable-breakpad
Then I recorded the deb package sizes in workdir/installation and deleted all those deb archives. Finally, I defined the new variables and built again.
The attached file compares the deb archive sizes. The total saving for the 43 deb archives is 2.6% (10 MiB).
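The before/after comparison can be sketched as two small functions: collect `*.deb` sizes under a directory such as workdir/installation, then compute the saving over the packages present in both builds. Both helpers are hypothetical; the attachment's exact method is not shown in the thread.

```python
from pathlib import Path


def deb_sizes(root) -> dict:
    """Map each *.deb file name found under `root` to its size in bytes."""
    return {p.name: p.stat().st_size for p in Path(root).rglob("*.deb")}


def total_saving(before: dict, after: dict):
    """Return (saved_bytes, saved_percent) over packages present in both builds."""
    common = before.keys() & after.keys()
    total_before = sum(before[name] for name in common)
    total_after = sum(after[name] for name in common)
    saved = total_before - total_after
    percent = 100.0 * saved / total_before if total_before else 0.0
    return saved, percent
```

Usage: snapshot `deb_sizes("workdir/installation")` after each build, then pass the two dictionaries to `total_saving`.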
As a reviewer, Christian Lohmaier notes that the TDF baseline currently has dpkg version 1.20.9. He points out that the man page https://man7.org/linux/man-pages/man1/dpkg-deb.1.html lists the following versions as the first to support the environment variables:
- 1.21.10 for DPKG_DEB_COMPRESSOR_LEVEL
- 1.21.10 for DPKG_DEB_COMPRESSOR_TYPE
- 1.21.9 for DPKG_DEB_THREADS_MAX
Currently the only way would be to call "dpkg-deb -Zxz -z9" instead of "dpkg" (perhaps as an additional patch on workdir/UnpackedTarball/epm). The same man page says the command-line options were first supported in:
- 1.16.2 for "-z"
- 1.15.6 for "-Z"
- 1.21.9 for "--threads-max"
Patching the epm source may break the build with older versions of dpkg (and we could not use --threads-max yet). Perhaps the environment-variable method is gentler until the next TDF baseline update: older dpkg versions simply ignore variables they do not know, so the build keeps working and the settings take effect once dpkg is updated.
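The version constraints above amount to a simple gate. The sketch below assumes plain dotted versions such as "1.20.9"; real Debian version ordering (epochs, `~`, revisions) is richer, and `dpkg --compare-versions` implements it exactly. All names are hypothetical.

```python
def parse_version(text: str) -> tuple:
    """Turn a plain dotted version such as '1.20.9' into (1, 20, 9)."""
    return tuple(int(part) for part in text.split("."))


# First dpkg releases supporting each mechanism, as quoted from dpkg-deb(1).
ENV_COMPRESSOR_VARS = parse_version("1.21.10")  # DPKG_DEB_COMPRESSOR_{TYPE,LEVEL}
THREADS_MAX = parse_version("1.21.9")           # DPKG_DEB_THREADS_MAX, --threads-max
OPTION_LOWER_Z = parse_version("1.16.2")        # -z (level); -Z dates from 1.15.6


def compression_mechanism(dpkg_version: str) -> str:
    """Say how the compression level can be chosen with this dpkg."""
    version = parse_version(dpkg_version)
    if version >= ENV_COMPRESSOR_VARS:
        return "env"   # DPKG_DEB_COMPRESSOR_* variables are honoured
    if version >= OPTION_LOWER_Z:
        return "cli"   # only an explicit `dpkg-deb -Zxz -z9` works
    return "none"


def can_limit_threads(dpkg_version: str) -> bool:
    return parse_version(dpkg_version) >= THREADS_MAX
```

With the TDF baseline, `compression_mechanism("1.20.9")` yields `"cli"` and `can_limit_threads("1.20.9")` is `False`, which matches the reviewer's point.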
The xz man page (https://manpages.debian.org/testing/xz-utils/xz.1.en.html) says that presets -6 to -9 share the same settings that affect compression speed:
-----------
Preset   DictSize   CompCPU   CompMem   DecMem
  -0     256 KiB       0        3 MiB    1 MiB
  -1       1 MiB       1        9 MiB    2 MiB
  -2       2 MiB       2       17 MiB    3 MiB
  -3       4 MiB       3       32 MiB    5 MiB
  -4       4 MiB       4       48 MiB    5 MiB
  -5       8 MiB       5       94 MiB    9 MiB
  -6       8 MiB       6       94 MiB    9 MiB
  -7      16 MiB       6      186 MiB   17 MiB
  -8      32 MiB       6      370 MiB   33 MiB
  -9      64 MiB       6      674 MiB   65 MiB

Column descriptions:
[...]
CompCPU is a simplified representation of the LZMA2 settings that affect compression speed. The dictionary size affects speed too, so while CompCPU is the same for levels -6 ... -9, higher levels still tend to be a little slower.
------------
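The preset table also explains the 7z output from the original report. Assuming 7z's "LZMA2:23" means a dictionary of 2^23 bytes (it shows the exponent when the size is a power of two), that is 8 MiB, which matches presets -5 and -6 only. The sketch below encodes the table (sizes in KiB) and picks the strongest preset whose decompression memory fits a budget; all helper names are hypothetical.

```python
MIB = 1024  # KiB per MiB

# xz(1) preset table: preset -> (DictSize, CompCPU, CompMem, DecMem), in KiB.
PRESETS = {
    0: (256, 0, 3 * MIB, 1 * MIB),
    1: (1 * MIB, 1, 9 * MIB, 2 * MIB),
    2: (2 * MIB, 2, 17 * MIB, 3 * MIB),
    3: (4 * MIB, 3, 32 * MIB, 5 * MIB),
    4: (4 * MIB, 4, 48 * MIB, 5 * MIB),
    5: (8 * MIB, 5, 94 * MIB, 9 * MIB),
    6: (8 * MIB, 6, 94 * MIB, 9 * MIB),
    7: (16 * MIB, 6, 186 * MIB, 17 * MIB),
    8: (32 * MIB, 6, 370 * MIB, 33 * MIB),
    9: (64 * MIB, 6, 674 * MIB, 65 * MIB),
}


def dict_kib_from_7z(method: str) -> int:
    """'LZMA2:23 CRC64' -> 8192 KiB (assumes the power-of-two notation)."""
    exponent = int(method.split(":", 1)[1].split()[0])
    return 2 ** exponent // 1024


def presets_with_dict(dict_kib: int) -> list:
    return [p for p, (d, _cpu, _cm, _dm) in PRESETS.items() if d == dict_kib]


def strongest_preset_within(dec_mem_kib: int):
    """Highest preset whose decompressor memory fits the budget, or None."""
    fitting = [p for p, (_d, _cpu, _cm, dm) in PRESETS.items() if dm <= dec_mem_kib]
    return max(fitting) if fitting else None
```

With the 256 MB minimum requirement as the budget, `strongest_preset_within(256 * MIB)` returns 9.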
Created attachment 198493 [details]
--with-lang=ALL comparison of deb archive sizes with 3 configurations

I used this configuration:
./autogen.sh --with-distro=LibreOfficeLinux --with-lang=ALL --with-package-format=deb --disable-online-update --disable-breakpad
Level 9 does not seem to help on small files (the current language-dependent packages other than dictionaries).
3 test cases:
- A: default (environment variables undefined)
- B: xz method, single thread
- C: all B variables + level 9
Created attachment 198542 [details]
another session --with-lang=ALL

I don't understand the statistics. Between tests I run:
1. make clean
2. ./autogen.sh --with-distro=LibreOfficeLinux --with-lang=ALL --with-package-format=deb --disable-online-update --disable-breakpad
3. make
I don't restart my computer between tests. I noticed that /tmp (on tmpfs) contains many directories named like "ooopackaging*", with modification times from the previous tests. This may reduce the available memory from one test to the next.
Created attachment 198599 [details]
--with-lang=ALL comparison of deb archive sizes with 3 configurations

I used this configuration:
./autogen.sh --with-distro=LibreOfficeLinux --with-lang=ALL --with-package-format=deb --disable-online-update --disable-breakpad
3 test cases:
- A: default (environment variables undefined)
- B: xz method, single thread
- C: all B variables + level 9
Compared to my previous tests:
- I put /tmp on the hard disk instead of RAM,
- I set "export PARALLELISM=1".
My test host has only 12 GiB of physical RAM, so parallel jobs could compete for memory.
Level 9 does not seem to help on small files (the current language-dependent packages other than dictionaries). However, the archives in case B are slightly smaller than in case A: the single-thread option improves compression. It should also reduce memory consumption.
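A plausible reason why single-threaded compression is smaller: multi-threaded xz splits the input into blocks compressed independently, so matches cannot reach back into earlier blocks. The sketch below approximates that by compressing each block as a separate .xz stream (real xz writes several blocks in one stream, so the per-block overhead differs slightly). The helper names and the repeated-noise input are illustrative assumptions, chosen so that all redundancy lies between blocks.

```python
import lzma
import os


def repeated_noise(chunk_size: int, copies: int) -> bytes:
    """Test input: one incompressible chunk repeated `copies` times."""
    return os.urandom(chunk_size) * copies


def single_stream_size(data: bytes, preset: int = 6) -> int:
    return len(lzma.compress(data, preset=preset))


def independent_blocks_size(data: bytes, n_blocks: int, preset: int = 6) -> int:
    """Approximate multi-threaded xz by compressing each block on its own."""
    block = -(-len(data) // n_blocks)  # ceiling division
    parts = [lzma.compress(data[i:i + block], preset=preset)
             for i in range(0, len(data), block)]
    blob = b"".join(parts)
    # Concatenated .xz streams are still valid input for lzma.decompress().
    assert lzma.decompress(blob) == data
    return len(blob)
```

On `repeated_noise(64 * 1024, 4)`, the single stream stores the chunk about once, while four independent blocks store it four times; on real packages the gap is much smaller, in line with the slight difference between cases A and B.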
Created attachment 198748 [details]
--with-lang=ALL comparison of deb archive sizes across methods and levels

I used this configuration:
./autogen.sh --with-distro=LibreOfficeLinux --with-lang=ALL --with-package-format=deb --disable-online-update --disable-breakpad
For all cases, I set "export PARALLELISM=1". My host has 12 GiB of physical RAM.
In instsetoo_native/CustomTarget_install.mk:
- "DPKG_DEB_THREADS_MAX=1" is always set,
- DPKG_DEB_COMPRESSOR_TYPE takes "none", "zstd", "gzip" or "xz",
- DPKG_DEB_COMPRESSOR_LEVEL takes several values.
On my host, the best setting appears to be:
DPKG_DEB_COMPRESSOR_TYPE = xz
DPKG_DEB_COMPRESSOR_LEVEL = 7
Maybe the archives are too small to benefit from the strongest levels.
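A sweep like this one can be sketched with the standard library: try every method and level on the same data and sort by size. Here `zlib.compress` stands in for gzip (same DEFLATE data, slightly different header and trailer), and zstd is left out because it is not in the standard library. The `sweep` helper is hypothetical.

```python
import lzma
import zlib


def sweep(data: bytes, gzip_levels=range(1, 10), xz_levels=range(0, 10)):
    """Return (method, level, size) triples sorted from smallest to largest."""
    results = [("none", 0, len(data))]
    results += [("gzip", lvl, len(zlib.compress(data, lvl))) for lvl in gzip_levels]
    results += [("xz", lvl, len(lzma.compress(data, preset=lvl))) for lvl in xz_levels]
    return sorted(results, key=lambda r: r[2])
```

Running it per package, rather than on the total, shows whether the best level depends on the kind of package.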
I will try to use the XZ_OPT environment variable for xz.
Created attachment 198777 [details]
--with-lang=ALL comparison of deb archive sizes with several XZ_OPT values

I used this configuration:
./autogen.sh --with-distro=LibreOfficeLinux --with-lang=ALL --with-package-format=deb --disable-online-update --disable-breakpad
For all cases, I set "export PARALLELISM=1" because my host has 12 GiB of physical RAM.
In instsetoo_native/CustomTarget_install.mk:
- "DPKG_DEB_THREADS_MAX=1" and "DPKG_DEB_COMPRESSOR_TYPE=xz" are always set,
- XZ_OPT always contains "--threads=1 --memlimit=max",
- XZ_OPT takes several compression levels (0 to 9, some with --extreme, and --x86).
The final compression seems to depend on the kind of archive (and maybe its size).
I will try to propose a patch with:
- dictionaries at level 2,
- help at level 8,
- level 5 by default.
Access to the "Installed-Size" field could help to choose the compression level.
We build smaller archives faster when disabling xz multi-threaded compression ("make -j 24" already runs dpkg processes in parallel).
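The proposed per-package levels can be sketched as a mapping from package name to an XZ_OPT value, which the xz command-line tool reads from the environment. The name patterns ("-dict-", "-help-") are assumptions about package naming, and whether XZ_OPT reaches the compressor depends on how the patched build invokes it, which the thread does not show. All names are hypothetical.

```python
import os


def xz_level_for(package_name: str) -> int:
    # Hypothetical patterns: dictionaries -> 2, help -> 8, everything else -> 5.
    if "-dict-" in package_name:
        return 2
    if "-help-" in package_name:
        return 8
    return 5


def xz_opt_for(package_name: str) -> str:
    # --threads=1: make already runs several packaging jobs in parallel.
    return f"-{xz_level_for(package_name)} --threads=1"


def env_for(package_name: str, base=None) -> dict:
    """Copy of `base` (default: os.environ) with XZ_OPT set for this package."""
    env = dict(os.environ if base is None else base)
    env["XZ_OPT"] = xz_opt_for(package_name)
    return env
```

A variant could choose the level from the package's "Installed-Size" instead of its name, as suggested above.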
Created attachment 199411 [details]
total package size decreases with the patch on epm

I submitted a patch to Gerrit that reduces the total size by 16 MiB (2.4%). The attached file shows the comparison results.
Created attachment 199536 [details]
comparison of xz maximum dictionary size values

With a 128 MiB maximum dictionary size, the epm patch saves 17 MiB (2.4%) and reduces CPU usage during compression (about 4% less "user+sys" time).
A 128 MiB maximum xz dictionary size ensures that we respect the 256 MiB hardware memory prerequisite for installing LO on Linux (xz decompression needs roughly as much memory as the dictionary size of the archive).
The patch enables --threads=1, which improves overall CPU efficiency (and the compression ratio). Without this option, with n processors you have n "make" processes, each running n compression threads: n^2 concurrent threads on the same resources. This explains why the "real" time increases in this test with PARALLELISM=1 (this variable has no effect on the number of parallel xz threads).
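The two pieces of arithmetic in this comment can be sketched as follows. All names are hypothetical; `xz_opt_with_dict_cap` only illustrates the `--lzma2=preset=N,dict=SIZE` syntax documented in xz(1) and is not necessarily the XZ_OPT value the patch sets.

```python
def concurrent_xz_threads(make_jobs: int, xz_threads: int) -> int:
    """Compressor threads when `make_jobs` jobs each run xz with `xz_threads`."""
    return make_jobs * xz_threads


def fits_install_memory(dict_mib: int, ram_requirement_mib: int = 256) -> bool:
    # Rule of thumb from this comment: decompression needs about the
    # dictionary size; xz(1) lists DecMem as dictionary + ~1 MiB.
    return dict_mib + 1 <= ram_requirement_mib


def xz_opt_with_dict_cap(preset: int, dict_mib: int = 128) -> str:
    # Illustrative only: single-threaded LZMA2 with an explicit dictionary.
    return f"--threads=1 --lzma2=preset={preset},dict={dict_mib}MiB"
```

With 4 cores, `concurrent_xz_threads(4, 4)` gives 16 threads competing for 4 CPUs, against 4 with `--threads=1`.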
I think the patch is ready for review here : https://gerrit.libreoffice.org/c/core/+/179624
The latest patch should provide an immediate benefit on build hosts without updating dpkg (see comment 11). It now uses the XZ_OPT environment variable, which seems to have been available in xz for years.
Created attachment 199841 [details]
patch comparison against master after unsetting PARALLELISM

The patch limits the memory consumption of compression (< 1.3 GiB for xz with a 128 MiB maximum dictionary size), so I can now unset PARALLELISM. I tested it on my host with 4 cores and 12 GiB of RAM, running each test build with the default "make -j 4". The attached file shows the results.
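The memory figure can be checked against the xz preset table. Extrapolating linearly from the -9 row (64 MiB dictionary, 674 MiB compressor memory) gives about 1348 MiB, roughly 1.3 GiB, for a 128 MiB dictionary. With "make -j 4", four such single-threaded xz processes would peak around 5.3 GiB, within the 12 GiB host. The linear scaling is an approximation and the helper names are hypothetical.

```python
def estimate_comp_mem_mib(dict_mib: float) -> float:
    # Linear extrapolation from the xz(1) preset table row for -9:
    # 64 MiB dictionary -> 674 MiB compressor memory (BT4 match finder).
    return dict_mib * 674 / 64


def parallel_peak_gib(jobs: int, dict_mib: float) -> float:
    """Worst case when `jobs` single-threaded xz processes run at once."""
    return jobs * estimate_comp_mem_mib(dict_mib) / 1024
```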