108631 – ENHANCEMENT: Optimization of the file-save strategy

Summary: ENHANCEMENT: Optimization of the file-save strategy

Status:	NEW

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	filters and storage
Version: (earliest affected)	Inherited From OOo
Hardware:	All All

Importance:	medium enhancement
Assignee:	Not Assigned

URL:
Whiteboard:
Keywords:	perf

Depends on:
Blocks:	Too-Much-File-Access

Reported:	2017-06-19 07:37 UTC by Telesto
Modified:	2019-04-19 11:01 UTC
CC List:	2 users

See Also:	84246
Crash report or crash signature:

Attachments
Example file (3.20 MB, application/vnd.oasis.opendocument.spreadsheet) 2017-06-19 07:38 UTC, Telesto

Description Telesto 2017-06-19 07:37:35 UTC

Description:
I'm not a developer, so I have no idea why it works the way it does. However, saving does not seem very efficient for large files, especially large spreadsheets (or Writer documents).

When the sample file is saved, around 450 MB is written to disk on every regular save and every autosave, apparently as some sort of caching mechanism before the actual save. The exported file is only around 3.19 MB. A caching mechanism like this will use up an SSD's write cycles quite fast.

Another oddity is that after saving, the file is loaded again. This looks quite inefficient to me, especially for large files.

As far as I know, Excel writes only what is necessary to disk.

Steps to Reproduce:
1. Open the attached file
2. Save a copy and monitor disk writes (e.g. with Process Explorer)

Actual Results:  
- Around 450 MB gets written to the disk
- The saved file gets reloaded 

Expected Results:
- Fewer writes to disk
- No reload (or at least not to this extent)


Reproducible: Always

User Profile Reset: No

Additional Info:
Version: 6.0.0.0.alpha0+
Build ID: cbf371e07fd5dea1ea08a1f299360d1273961ebd
CPU threads: 4; OS: Windows 6.19; UI render: default; 
TinderBox: Win-x86@42, Branch:master, Time: 2017-06-14_23:13:57
Locale: en-US (nl_NL); Calc: CL

Also reproducible with 3.0.0.


User-Agent: Mozilla/5.0 (Windows NT 6.2; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0

Comment 1 Telesto 2017-06-19 07:38:22 UTC

Created attachment 134124
Example file

Comment 2 Aron Budea 2017-06-20 00:08:26 UTC

Confirmed with LO 5.4beta2.
For me it's even more than 450 MB, close to the ~580 MB size of content.xml.

I'm not touching severity, but I'd rather consider this a performance bug.
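The figures above (roughly 450 to 580 MB written for a 3.19 MB file, close to the uncompressed size of content.xml) suggest that the uncompressed stream reaches the disk before it is compressed. That is an assumption, not something confirmed in this report. The Python sketch below is not LibreOffice code: it uses only the standard `zipfile` module and a hypothetical generated content.xml (`ROW`, `N_ROWS`, and both function names are illustrative) to compare how many bytes are written when the uncompressed stream goes through a temporary file versus when it is deflated directly into the ZIP entry.

```python
import os
import tempfile
import zipfile

# Hypothetical stand-in for a big spreadsheet's content.xml: repetitive XML
# that compresses extremely well. Row markup and count are illustrative only.
ROW = b'<table:table-row><table:table-cell office:value="1"/></table:table-row>\n'
N_ROWS = 20_000


def generate_rows():
    """Yield the document body chunk by chunk, as an exporter might."""
    for _ in range(N_ROWS):
        yield ROW


def save_via_temp_file(path: str) -> int:
    """Write the uncompressed stream to a temp file, then zip it.
    Returns the total number of bytes written to files."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        for chunk in generate_rows():
            tmp.write(chunk)
        tmp_name = tmp.name
    written = os.path.getsize(tmp_name)  # full uncompressed size goes to a file
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(tmp_name, "content.xml")
    os.remove(tmp_name)
    return written + os.path.getsize(path)  # plus the compressed archive


def save_streaming(path: str) -> int:
    """Deflate on the fly directly into the archive entry (Python 3.6+).
    Returns the number of bytes written, i.e. just the archive size."""
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as zf:
        with zf.open("content.xml", "w") as entry:
            for chunk in generate_rows():
                entry.write(chunk)
    return os.path.getsize(path)


if __name__ == "__main__":
    out = os.path.join(tempfile.mkdtemp(), "demo.zip")
    print("via temp file:", save_via_temp_file(out), "bytes written")
    print("streaming:    ", save_streaming(out), "bytes written")
```

Going through a temporary file costs at least the full uncompressed size in extra writes, while streaming writes only the archive. Streaming relies on the output being seekable (or on a ZIP data descriptor) so that the entry's sizes and CRC can be recorded after the data has been written.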

Comment 3 Aron Budea 2017-06-22 22:28:17 UTC

Eike and Markus told me it's likely something done in generic storage code, and not Calc-specific.

Comment 4 Aron Budea 2017-08-04 20:40:35 UTC

This commit should be relevant (clue from Markus):
https://cgit.freedesktop.org/libreoffice/core/commit/?id=f92183833fa569006602ac7e93c906d2094e0d4d
author		Matúš Kukan <matus.kukan@collabora.com>	2014-12-13 23:11:53 (GMT)
committer	Matúš Kukan <matus.kukan@collabora.com>	2014-12-13 23:21:20 (GMT)

"package: Better to use temporary files for huge memory zip streams

ZipPackageBuffer was holding the whole compressed data stream in one uno::Sequence which seems to be a lot for big documents in some cases."
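The commit message describes a common pattern: keep a stream in memory while it is small and move it to a temporary file once it grows too large. The Python sketch below illustrates that pattern only in general terms. `SpillBuffer`, `deflate_into` and the thresholds are hypothetical, not the actual `ZipPackageBuffer` implementation, whose limit is not stated here. It shows the trade-off relevant to this bug: once the threshold is exceeded, everything passing through the buffer is written to a temporary file, so memory use is traded for disk writes.

```python
import io
import tempfile
import zlib


class SpillBuffer:
    """Hypothetical buffer: bytes stay in memory until `threshold` would be
    exceeded, then everything moves to an anonymous temporary file."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        self._file = io.BytesIO()
        self.spilled = False

    def write(self, data: bytes) -> int:
        if not self.spilled and self._file.tell() + len(data) > self.threshold:
            tmp = tempfile.TemporaryFile()
            old = self._file
            tmp.write(old.getvalue())  # copy what was buffered in memory so far
            old.close()
            self._file = tmp
            self.spilled = True
        return self._file.write(data)

    def read_all(self) -> bytes:
        self._file.seek(0)
        return self._file.read()

    def close(self) -> None:
        self._file.close()


def deflate_into(buf: SpillBuffer, chunks) -> None:
    """Compress `chunks` as a raw deflate stream (the format used for
    deflated ZIP entries) and append the output to `buf`."""
    comp = zlib.compressobj(wbits=-15)
    for chunk in chunks:
        buf.write(comp.compress(chunk))
    buf.write(comp.flush())
```

Python's standard library offers the same behavior ready-made as `tempfile.SpooledTemporaryFile(max_size=...)`.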

Comment 5 Buovjaga 2019-04-19 11:01:29 UTC

https://bugs.documentfoundation.org/show_bug.cgi?id=113042#c24 mentions plans to work on zip compression. Perhaps this aspect can be dealt with as well.