Bug 145995 - Defined DATA ranges merge destructively if cells separating them are inadvertently filled (autofilter extending data area)
Summary: Defined DATA ranges merge destructively if cells separating them are inadvert...
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
3.4.0 release
Hardware: All All
: high major
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: bibisectNotNeeded, dataLoss, regression
Depends on:
Blocks: AutoFilter Calc-Merge-Split
  Show dependency treegraph
 
Reported: 2021-12-01 18:51 UTC by Colin
Modified: 2024-12-03 11:58 UTC (History)
2 users (show)

See Also:
Crash report or crash signature:


Attachments
Example file (14.72 KB, application/vnd.oasis.opendocument.spreadsheet)
2021-12-01 18:52 UTC, Colin
Details
Additional Examples (12.23 KB, application/vnd.oasis.opendocument.spreadsheet)
2021-12-11 10:17 UTC, Colin
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Colin 2021-12-01 18:51:20 UTC
Description:
When two (or more) arrays are converted to Database ranges and those ranges are indexed with autofilters, then; if the user inadvertently enters data into the unbounded rows between the db ranges - thereby bridging the gap between the two ranges - those ranges become merged and any autofilter action will prove destructive to the entire structure.
Even such an insignificant action as selecting the autofilter button will cause the destruction.
The range definitions will be amended to show the first range encompassing the area for both ranges and the second range "appears" to have its own boundaries. This is however an illusion. Any attempts to redefine the boundaries of the first range may appear to have been successful but the damage has already been done. The amendment will have zero impact and further inspection will demonstrate that the second range is still subsumed by the first range with the definitions showing the erroneous structure, not the amendments.
The test file attached will even demonstrate that undoing all the actions that created the bond, doesn't revert the file to two ranges.
It is possible to remediate by manually selecting the second range and moving it to a location outside the "global range". At this point, the definition for the second range will probably disappear. The first range can be redefined to the original parameters and the second range can be moved back to the original location. The surplus lines in the first range can simply be deleted or the definition can be amended as there is no data in the extension.
In my experience, the ranges may lose their integrity so it's wise to actually delete the definitions and start afresh.
The symptoms I have observed when the ranges are first separated are:-
One range definition may actually disappear
The autofilter buttons may still be present but inoperative
 

Steps to Reproduce:
With the attached file
Observe the db range definitions MENU>Data>Define Range - select each range and confirm
Select the autofilter at B3 and observe it only indicates the first range
Repeat at B12
Enter data at B10 and select the autofilter at B3 to verify
Enter data at B11 and select the autofilter at B3 to verify
Observe the data now includes all of the second array
Try deleting the data in B10:B11
Observe that autofilter B3 still includes the second array
Try editing the definitions to repair the array and observe the futility.
Try saving, closing, opening and then autofilter B3 - still corrupt.
Select B12:E18 and relocate the array to H25
verify the extent of the first array and either edit the definition or simply delete the rows until the definition only encompasses B3:E9
Select the second array and relocate to its original location.
The probability is that you will need to redefine the autofilter for at least one array
Both ranges should now perform as expected

Actual Results:
Two arrays destructively merged into one

Expected Results:
Discrete arrays


Reproducible: Always


User Profile Reset: Yes


OpenGL enabled: Yes

Additional Info:
Version: 7.2.2.2 (x64) / LibreOffice Community
Build ID: 02b2acce88a210515b4a5bb2e46cbfb63fe97d56
CPU threads: 4; OS: Windows 10.0 Build 19043; UI render: Skia/Raster; VCL: win
Locale: sv-SE (en_GB); UI: en-GB
Calc: threaded
Comment 1 Colin 2021-12-01 18:52:03 UTC
Created attachment 176636 [details]
Example file
Comment 2 Colin 2021-12-02 05:44:17 UTC
Workarounds:
1 Data Validation of the cells immediately above the header - say minimum of 5000 characters or perhaps a 256bit hash of something
2 Cell protection of the cells immediately above the header - ensure this doesn't interfere with any regular sheet operations.
Comment 3 Colin 2021-12-11 10:17:06 UTC
Created attachment 176861 [details]
Additional Examples

I'm attaching another sheet demonstrating some more unexpected results.
In all honesty, I was experimenting to see just what would break it and how.
Initially, the sheet was created as defined with the array A3:D18 included within the scope of the DB Range "Assorted"
I then created the SUM() on row 2.
Initially, I attempted to create the autofilters on row 4 - the numbers, but the process arbitrarily moved this to row 3 - the characters. Could be reasonable as that was the "top" of the defined DB Range
I then edited the range to A1:D18 and it automatically created the autofilter on row 1 but did not erase the filter from row 3 - Maybe not quite so reasonable.
I then consolidated B1:C1 into a merged cell then the same for B3:C3 - KEEPING the hidden cell contents.
I then "froze" row 3
Any attempt to activate any filter creates the information message "Ranges containing merged cells can only be sorted without formats.
Unmerging the cells permits filtering on any "active" filter but note that the filter has now been removed from B3:C3
Repeated attempts to re-filter either B3:C3 or A3:D3 with the autifilter MENU button toggles row 1 on & off.

The results of any filter activation without the "merged" cells providing a "veto" Reflect the "unusual" structure. and as per the initial report - all the rows become sortable elements within the array.
Interestingly, the SUM() function now includes itself with the appropriate circular reference error.
Comment 4 Buovjaga 2022-11-25 13:58:00 UTC
Repro with file. I tested the problem of deleting the extra data not removing the second range from the filter and I noticed it was not present in 3.3.0, but was already in 3.5.0.

Arch Linux 64-bit
Version: 7.5.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: d96d1b380d3c626522407397e56d68030a73e02a
CPU threads: 8; OS: Linux 6.0; UI render: default; VCL: kf5 (cairo+xcb)
Locale: fi-FI (fi_FI.UTF-8); UI: en-US
Calc: threaded
Built on 25 November 2022
Comment 5 Buovjaga 2024-05-10 09:35:04 UTC
(In reply to Colin from comment #0)
> The test file attached will even demonstrate that undoing all the actions
> that created the bond, doesn't revert the file to two ranges.

Raising priority and severity as users likely will not figure out another workaround than undo to restore the data.
Comment 6 Colin 2024-05-10 09:51:36 UTC
(In reply to Buovjaga from comment #5)
> (In reply to Colin from comment #0)
> > The test file attached will even demonstrate that undoing all the actions
> > that created the bond, doesn't revert the file to two ranges.
> 
> Raising priority and severity as users likely will not figure out another
> workaround than undo to restore the data.

I think it also needs to be addressed in conjunction with Bug160018 which is a later and far more damaging manifestation of the same issue which was introduced with 7.6.5.
Comment 7 Buovjaga 2024-05-11 07:59:57 UTC
Knowing the issue appeared somewhere between 3.3 and 3.5, I did a rough search through the logs based on the release feature freeze dates:

git log --all --grep="autofilter" -i --since="1.10.2010" --until="11.12.2011" sc

Studying the log, this looked very relevant:

commit 03417cf345a6044f97de4b1c18ba01807e39010f
Author: Kohei Yoshida <kyoshida@novell.com>
Date:   Fri Apr 1 21:05:08 2011 -0400

    Re-implement DB area extension prior to autofilter launching.
    
    This time store the extended DB area permanently.  This makes things
    a bit simpler.

diff --git a/sc/source/core/data/documen3.cxx b/sc/source/core/data/documen3.cxx
index c99145cd0309..afa80bc73d31 100644
--- a/sc/source/core/data/documen3.cxx
+++ b/sc/source/core/data/documen3.cxx
@@ -1354,6 +1354,15 @@ sal_Bool ScDocument::GetFilterEntries(
             SCCOL nEndCol;
             SCROW nEndRow;
             pDBData->GetArea( nAreaTab, nStartCol, nStartRow, nEndCol, nEndRow );
+
+            // Extend the DB area to include data rows immediately below.
+            // Note that the change is stored back to the DB.
+            SCCOL nCol1a = nStartCol, nCol2a = nEndCol;
+            SCROW nRow1a = nStartRow, nRow2a = nEndRow;
+            GetDataArea(nTab, nCol1a, nRow1a, nCol2a, nRow2a, false, false);
+            nEndRow = nRow2a;
+            pDBData->SetArea(nTab, nStartCol, nStartRow, nEndCol, nEndRow);
+
             if (pDBData->HasHeader())
                 ++nStartRow;

It has a follow-up commit

commit 9f9ff37fe372897e2682a4bcedd2b76b9c770979
Author: Kohei Yoshida <kyoshida@novell.com>
Date:   Fri Apr 1 23:04:45 2011 -0400

    Extend data area for standard and special filters as well.

The code now lives here: https://opengrok.libreoffice.org/xref/core/sc/source/core/tool/dbdata.cxx?r=7ad4641e#643

Commenting out the content of the ScDBData::ExtendDataArea function and compiling, it indeed stops adding data written to rows below the range to the autofilter.

The function calls ScDocument::GetDataArea in https://opengrok.libreoffice.org/xref/core/sc/source/core/data/document.cxx?r=0fb98963#1069
which in turn calls ScTable::GetDataArea in https://opengrok.libreoffice.org/xref/core/sc/source/core/data/table1.cxx?r=97af15c8#880

Apparently it will modify the state of the area even though it confusingly has "Get" in its name, leading the reader to think it has a passive role. In a comment it says: "It may shrink or extend the area given as input".

Call sites for ExtendDataArea:
https://opengrok.libreoffice.org/xref/core/sc/source/core/data/documen3.cxx?r=97af15c8#1569
https://opengrok.libreoffice.org/xref/core/sc/source/ui/view/tabvwshc.cxx?r=387a9c44#339 (multiple)

I think first the undoing should be made to work.
Then, the extending logic should be modified to stop extending, if it encounters an autofilter. Or can there be a use case where the current behaviour would be desirable? It seems counterintuitive and more like "the computer doing things I didn't ask it to", so I don't know who would miss it.
Comment 8 Colin 2024-05-11 08:31:17 UTC
(In reply to Buovjaga from comment #7)
> Knowing the issue appeared somewhere between 3.3 and 3.5, I did a rough
> search through the logs based on the release feature freeze dates:
> 
> git log --all --grep="autofilter" -i --since="1.10.2010"
> --until="11.12.2011" sc
> 
> Studying the log, this looked very relevant:
> 

> +
> +            // Extend the DB area to include data rows immediately below.
> +            // Note that the change is stored back to the DB.
> +            SCCOL nCol1a = nStartCol, nCol2a = nEndCol;
> +            SCROW nRow1a = nStartRow, nRow2a = nEndRow;
> +            GetDataArea(nTab, nCol1a, nRow1a, nCol2a, nRow2a, false, false);
> +            nEndRow = nRow2a;
> +            pDBData->SetArea(nTab, nStartCol, nStartRow, nEndCol, nEndRow);
> +

Not being a coder I only THINK my comment may be relevant;

It appears to me that something may be performing detection in a column. if that detection is assuming it may encounter a condition lower down that same column then it could miss a situation where the upper table is more columns than the lower table AND the lower table is offset so it commences some columns to the right.

Clearly, if the detection is over the entire columnar range of the upper table then I just ably demonstrated my stupidity.

 
> I think first the undoing should be made to work.

100% Agree

> Then, the extending logic should be modified to stop extending, if it
> encounters an autofilter. Or can there be a use case where the current
> behaviour would be desirable? It seems counterintuitive and more like "the
> computer doing things I didn't ask it to", so I don't know who would miss it.

I cannot imagine any use for a procedure that ALWAYS actively destroys tables. Whoever invented backups is a saint;)
Comment 9 Colin 2024-05-11 08:38:04 UTC
(In reply to Colin from comment #8)
> (In reply to Buovjaga from comment #7)

> 
> Not being a coder I only THINK my comment may be relevant;
> 
> It appears to me that something may be performing detection in a column. if
> that detection is assuming it may encounter a condition lower down that same
> column then it could miss a situation where the upper table is more columns
> than the lower table AND the lower table is offset so it commences some
> columns to the right.
> 
> Clearly, if the detection is over the entire columnar range of the upper
> table then I just ably demonstrated my stupidity.

Upon reflection, I did identify in my report for BUG160018 that the anomaly was not triggered when columns to the right of the "primary" column were the culprits