Bug 159598 - Merging a whole column increases file size unnecessarily (comment 8)
Summary: Merging a whole column increases file size unnecessarily (comment 8)
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version (earliest affected): Inherited From OOo
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: filter:ods
Depends on:
Blocks: Calc-Merge-Split
Reported: 2024-02-06 13:48 UTC by Heiko Tietze
Modified: 2024-04-16 19:00 UTC

See Also:
Crash report or crash signature:


Attachments
Test file (1.41 MB, application/vnd.oasis.opendocument.spreadsheet)
2024-02-06 13:48 UTC, Heiko Tietze

Description Heiko Tietze 2024-02-06 13:48:28 UTC
Created attachment 192432
Test file

Configured a few cells with column width, row height, a few validity checks, and some merged cells, and hid the remaining content. No more than 100 cells in total.

Saved, and at the next step, while trying to add protection, LibreOffice stalled. So I went back to the saved document, but without success either: I have to kill the application on load.

Created on macOS with 7.6.4, and reproduced in a VM with

SAL_USE_VCLPLUGIN=gen instdir/program/scalc --safe-mode /home/ht/UTTT_Board.ods
Comment 1 ady 2024-02-06 14:48:32 UTC
The main problem generating the delay/lag is the hidden rows/columns.

1. Open attachment 192432 (it might take more than a minute, YMMV).
2. Select all cells.
3. Menu Format > Columns > Show. The hidden columns will be displayed.
4. Select all cells; Menu Format > Rows > Show. The hidden rows will be displayed.
5. Save as a different name and close.

6. Now opening the new file will take some seconds, not minutes.

This issue with (all "inactive") hidden rows and columns has been mentioned several times in more than one report. I don't recall whether a specific report is dedicated only to this issue.

Probably a simpler ods file will show the main issue anyway (so there will be no need to wait minutes to initially open the sample file).

Setting as NEW (until someone finds a dupe, if that happens).
Comment 2 Telesto 2024-02-06 19:23:45 UTC Comment hidden (obsolete)
Comment 3 Telesto 2024-02-06 19:45:21 UTC
Well, the content.xml is a lovely 300 MB file, with the content repeating itself endlessly.

Do you have the steps from scratch to generate such a file? The file itself is broken, IMHO
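
For anyone who wants to check this without opening the file in Calc: an .ods is a ZIP archive, so the stored and unpacked size of each member can be read straight from the archive directory. A minimal sketch, assuming only Python's standard zipfile module; the function name member_sizes is made up for illustration.

```python
import zipfile

def member_sizes(ods_path):
    """Map each archive member to (compressed bytes, uncompressed bytes).

    ods_path may be a file name or a seekable binary file object.
    """
    with zipfile.ZipFile(ods_path) as zf:
        return {info.filename: (info.compress_size, info.file_size)
                for info in zf.infolist()}
```

Calling member_sizes("UTTT_Board.ods")["content.xml"] should show how a 1.41 MB attachment can unpack to a content.xml of the size mentioned above, since highly repetitive XML compresses very well.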
Comment 4 ady 2024-02-06 20:15:28 UTC
(In reply to Telesto from comment #2)
> Maybe bug 156297

Actually that is a good example of the delays that are caused by hiding the inactive columns and/or rows.

With all columns shown, starting from cell A1, use PgDn (press and hold the key for 10 seconds) to scroll. Take note of the row.
Now hide "inactive" columns. Go back to A1.
Once again, use PgDn (press and hold the key for 10 seconds) to scroll. Take note of the row and compare it to the prior case.

The scroll is much slower (it lags) when the inactive columns are hidden.

(In reply to Telesto from comment #3)
> The file itself
> is broken, IMHO

IDK whether it is broken or not, but Calc eventually opens it. It takes a lot of time, but there is no error message, nor a recovery info warning, nor anything of that kind.

Just un-hiding the hidden columns and rows, the same operations (whichever they might be) are performed much faster.
Comment 5 Heiko Tietze 2024-02-07 09:32:29 UTC
I don't have STR for the broken file. I described everything I did, and when I retraced the steps it worked. If the document opens for Ady (my notebook is not powerful enough), I suggest resolving as INVALID.

Hidden col/row issues seem to be tracked, and what happened for the inflated document is unclear.
Comment 6 Heiko Tietze 2024-02-07 12:06:38 UTC
Tracked the issue down and the STR are
* hide row 11 - max
* merge col B
=> file size grows extraordinarily, and consequently I cannot deal with the document on slow machines

Now I wonder if this is reported somewhere.
Comment 7 Heiko Tietze 2024-02-07 12:09:20 UTC
(In reply to Heiko Tietze from comment #6)
> * hide row 11 - max
Or just merge a whole column without any other modification
Comment 8 ady 2024-02-07 14:51:08 UTC
(In reply to Heiko Tietze from comment #7)
> (In reply to Heiko Tietze from comment #6)
> > * hide row 11 - max
> Or just merge a whole column without any other modification

Using LO 7.6.3.2 on MS Windows, I just tested the following:
1. New empty spreadsheet.
2. Save as premerge.ods (8KB)
3. Merge entire columns C:D.
4. Save as postmerge.ods (720KB)

Opening premerge.ods takes a second.
Opening postmerge.ods takes 10 seconds.

I guess we could conclude that several characteristics of attachment 192432 would multiply the size and the time to perform any action.

Having operations (or formulas) on _entire_ rows/columns in Calc tends to have such effects.

I still think that the main reason for the delay is the "inactive" rows/columns being hidden, but it would have to be proven in relation to the other factors.

@Heiko,

This report could be left open for the "entire merged columns" effects (performance and/or size?). Setting some prior comments as OT would be helpful in such case. Bug 156297 is already focused on the "inactive" hidden area generating delays for every action (e.g. scroll, open file...). 

Alternatively, a new report could be generated focusing on the "entire merged columns" effects, with the STR that I just posted in this comment.
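
To put a number on the size part of those steps independently of ZIP compression, the unpacked content.xml of the two saved files can be compared. This is only a sketch: content_growth is a hypothetical helper name, and it measures the XML size, not the time Calc needs to load the file.

```python
import zipfile

def content_growth(before_path, after_path, member="content.xml"):
    """Return (before_bytes, after_bytes, ratio) for the unpacked size of `member`."""
    def unpacked(path):
        # file_size is the uncompressed size recorded in the ZIP directory.
        with zipfile.ZipFile(path) as zf:
            return zf.getinfo(member).file_size

    before, after = unpacked(before_path), unpacked(after_path)
    return before, after, after / before
```

For the files from the steps above this would be called as content_growth("premerge.ods", "postmerge.ods").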
Comment 9 Heiko Tietze 2024-02-08 09:01:31 UTC
(In reply to ady from comment #8)
> This report could be left open...
Up to QA to decide (prolly we have tickets with this content)
Comment 10 Stéphane Guillou (stragu) 2024-04-16 04:28:08 UTC
(In reply to Heiko Tietze from comment #9)
> (In reply to ady from comment #8)
> > This report could be left open...
> Up to QA to decide (prolly we have tickets with this content)

OK, let's refocus, using comment 8 for steps.

Results for me:
- pre-merge ODS: 7.9 kb
- post-merge ODS (merged cell is C1:D1048576): 736.5 kb

Even for a single merged column, I get 592.3 kb.

For comparison, if I instead only input some text in cell D1048576 and save, the ODS stays at 8.1 kb.

Version: 24.8.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 8b599d60fef80039cdfe636a771c3fc8eb1028c3
CPU threads: 8; OS: Linux 6.5; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
Calc: CL threaded

OOo 3.3 goes to 568 kb (selection is also C1:D1048576), so marking as inherited.
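
To see what the extra kilobytes actually consist of, one option is to tally the element names in content.xml. The sketch below streams the XML so it also copes with very large files; it makes no assumption about which elements Calc writes for a merged column, and count_elements is just an illustrative name.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

def count_elements(ods_path, member="content.xml"):
    """Count XML elements in `member` by local name (namespace stripped)."""
    counts = Counter()
    with zipfile.ZipFile(ods_path) as zf, zf.open(member) as stream:
        for _event, elem in ET.iterparse(stream, events=("end",)):
            # ElementTree reports tags as "{namespace-uri}local-name".
            counts[elem.tag.rsplit("}", 1)[-1]] += 1
            # Discard the finished element's contents to limit memory use.
            elem.clear()
    return counts
```

Comparing count_elements(...).most_common(5) for the pre-merge and post-merge files would show which elements account for the growth.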

I think the discussion in 144521 might be relevant here.
Regina, any input regarding the ODF spec?

---

Side notes:
- I tried a format comparison, but XLSX with two merged columns does not change size (5.1 kb before and after) - that's because full-row and full-column merges are lost.
- Google Docs only merges for the first 1000 rows, resulting in a much smaller file, of course. We shouldn't follow that though. content.xml looks extremely redundant as well.
Comment 11 ady 2024-04-16 19:00:06 UTC
(In reply to Stéphane Guillou (stragu) from comment #10)

> Results for me:
> - pre-merge ODS: 7.9 kb
> - post-merge ODS (merged cell is C1:D1048576): 736.5 kb
> 
> Even for a single merged column, I get 592.3 kb.

The counterpart of the report (as in comment 8) is the time to open each of these, which is directly related to the initial problem in comment 0.

Even if the size could be somehow reduced, if one consequence of such reduction would be additional time (to open, or to some other relevant action on the file), then the problem presented in comment 0 would remain (i.e. opening would take such a long time in certain systems that users would assume some crash/hang/corruption).