Bug 107779 - Sorting breaks rows data alignement when functions include columns with no data
Summary: Sorting breaks rows data alignement when functions include columns with no data
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
5.3.3.2 release
Hardware: All All
: medium enhancement
Assignee: Not Assigned
URL: https://help.libreoffice.org/7.3/en-U...
Whiteboard:
Keywords:
Depends on:
Blocks: Sorting Calc-Function
  Show dependency treegraph
 
Reported: 2017-05-11 17:21 UTC by Luca
Modified: 2022-02-12 10:44 UTC (History)
8 users (show)

See Also:
Crash report or crash signature:


Attachments
Empty cells not sorted, even if have a color background (34.03 KB, image/png)
2020-06-13 15:58 UTC, Xavier Guillot
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Luca 2017-05-11 17:21:09 UTC
In Tools / Options / LibreOffice Calc / General
select:

[X] Update references when sorting range of cells.


1) Setup a Calc spreadsheet this way:

      A             B          C        D

1     a       =sum(C1:D1)      1        1
2     b       =sum(C2:D2)      2        2
3     c       =sum(C3:D3)      3        3


This way a is worth 2, b is worth 4, c is worth 6, you see:

      A             B            C        D

1     a             2            1        1
2     b             4            2        2
3     c             6            3        3

2) Now select all then sort by column A descending using Data / Sort. You see, as expected:

      A             B            C        D

1     c             6            3        3
2     b             4            2        2
3     a             2            1        1


3) Now revert to original sorting (ctrl+z), then modify function in B, as to include column E - which doesn't contain data - in the sum:

      A             B            C        D

1     a       =sum(C1:E1)      1        1
2     b       =sum(C2:E2)      2        2
3     c       =sum(C3:E3)      3        3

The result is the same as in 1, as expected.

4) Now repeat 2): select all then sort by column A descending using Data / Sort. You get the wrong result in column B:

      A             B            C        D

1     c             2            3        3
2     b             4            2        2
3     a             6            1        1

That's because the references in column B have been wrongfully reordered,
following column A but ignoring C and  D:

      A             B            C        D

1     c       =somma(C3:E3)      3        3
2     b       =somma(C2:E2)      2        2
3     a       =somma(C1:E1)      1        1


5) If (step 3) you add any value in column D, then repeat step 2), sorting order is correct. That's to say, starting from:

      A             B            C        D       E

1     a       =somma(C1:E1)      1        1
2     b       =somma(C2:E2)      2        2
3     c       =somma(C3:E3)      3        3       3

The descending order on column A produces, as expected:

      A             B            C        D       E

1     c             9            3        3       3
2     b             4            2        2
3     a             2            1        1

=================================================================

This is bad. And it's not the only case when Calc breaks the raws integrity without any warning. Also see bug # 107385:
https://bugs.documentfoundation.org/show_bug.cgi?id=107385

--
Comment 1 Eike Rathke 2017-05-11 18:05:43 UTC
With "Update references when sorting range of cells" this is expected. Without a value in column E the sort range is A1:D3, but a reference like C3:E3 points outside that range so is not moved with the data but adjusted to point to the original location. That's exactly why the option exists.
Comment 2 Luca 2017-05-12 10:08:36 UTC
(In reply to Eike Rathke from comment #1)
> With "Update references when sorting range of cells" this is expected.
> Without a value in column E the sort range is A1:D3, but a reference like C3:E3 
> points outside that range so is not moved with the data but adjusted to point 
> to the original location. That's exactly why the option exists. 

That doesn't make sense to me: if I select all (step 2) I explicity tell the program to sort all the rows and columns. That's how the sorting function normally works in Calc. 

If this behaviour is intentional, then the bug is not to warn users that their sorting area has been arbitrary restricted and rows integrity will be lost. 

I also see that warnings about the selected sorting area extention seem to have disappeared in last versions of Calc. I think they should be reinstated. There should be no situation when the records integrity is at risk without any warning.

Please, understand that not intuitive, dangerous and incoherent behaviours cannot be considered as "normal".
Comment 3 Xisco Faulí 2017-07-26 17:38:10 UTC
Moving back to UNCONFIRMED. It needs to be confirmed by a third person
Comment 4 Jean-Baptiste Faure 2017-08-05 20:58:06 UTC
(In reply to Luca from comment #0)
> In Tools / Options / LibreOffice Calc / General
> select:
> 
> [X] Update references when sorting range of cells.

Why did you check this option ?
Please, have a look at bug 81309 and bug 81633 to understand where this option comes from.
bug 81309: Sorting should automatically adjust references
bug 81633: Sorting shouldn't always automatically adjust references

Set status to NEEDINFO, please set it back to UNCONFIRMED once requested
informations are provided.

Best regards. JBF
Comment 5 Luca 2017-12-26 10:05:13 UTC
(In reply to Jean-Baptiste Faure from comment #4)
> (In reply to Luca from comment #0)
> > In Tools / Options / LibreOffice Calc / General
> > select:
> > 
> > [X] Update references when sorting range of cells.
> 
> Why did you check this option ?
> Please, have a look at bug 81309 and bug 81633 to understand where this
> option comes from.
> bug 81309: Sorting should automatically adjust references
> bug 81633: Sorting shouldn't always automatically adjust references
> 
> Set status to NEEDINFO, please set it back to UNCONFIRMED once requested
> informations are provided.

(Sorry for my delayed reply)

The option [X] Update references when sorting range of cells
is necessary when you have two sheets in the same file with cells in a column of the first sheet pointing to corresponding cells in a column of the second sheet, because without that option, resorting one sheet would result in loosing data rows integrity:

Sheet 1:                             Sheet 2:

A                   B                A            B   

=$'Sheet 2'.A1      1a               a            2a 
=$'Sheet 2'.A2      1b               b            2b
=$'Sheet 2'.A3      1c               c            2c

(values a, b, c, must appear in column As of Sheet 1 and Sheet 2, always connected to corresponding values in column Bs, no matter how you sort one sheet or the other)

But I think you're missing the point. There are two issues here: 1) lacking of warnings, whenever a sorting operation would result in data loss by breaking the rows integrity; 2) arbitrary restriction of a specifically selected area to perform an operation that will result in data loss (this is the worst problem, see my previous messages for details). 

As for the issue in 1): when you select one column and then try a sorting operation, the program already issues a warning and proposes an extension of the selection. The case I described here are far more dangerous and less intuitive, all the more reason they should need a warning.
Comment 6 Buovjaga 2017-12-26 19:32:44 UTC
Luca: would bug 96464 solve it for your use case?
Comment 7 Jean-Baptiste Faure 2018-06-24 19:43:19 UTC
Please answer question in comment #6.

Status set to NEEDINFO, please set it back to UNCONFIRMED once requested
informations are provided.

Best regards. JBF
Comment 8 Luca 2019-01-02 14:07:05 UTC
(In reply to Buovjaga from comment #6)
> Luca: would bug 96464 solve it for your use case?

I guess not, if I understand how it works. What is needed is:

1) an option to never ever brake the integrity of a line without a warning
2) if the user selects a range of lines/rows/cells, then applies any sorting operation, the sorting must affect completely and exclusively the selected area, no exceptions.
3) In the help file, a clear, complete and exhaustive documentation of Calc behaviours in every case.

I have yet another simple case where I would need the option
[X] Update references when sorting range of cells

      A           B           C      

1     value1      value3             
2     value2      value4
3                             =B2 (value4)

If I modify sorting of rows 1 and 2 according to the column B values, I want the C3 reference updated so that it always points to value4, no matter if it's in B1 or B2. But I can't use the options because it's too dangerous. Without 1) and 2) as I described here above, it's very likely that you will end up with loosing your datasheet completely without even knowing, because of the messed up lines.
Comment 9 Luca 2019-01-05 16:38:41 UTC
(In reply to Luca from comment #8)

> I have yet another simple case where I would need the option
> [X] Update references when sorting range of cells
> 
>       A           B           C      
> 
> 1     value1      value3             
> 2     value2      value4
> 3                             =B2 (value4)
> 
> If I modify sorting of rows 1 and 2 according to the column B values, I want
> the C3 reference updated so that it always points to value4, no matter if
> it's in B1 or B2. But I can't use the options because it's too dangerous.
> Without 1) and 2) as I described here above, it's very likely that you will
> end up with loosing your datasheet completely without even knowing, because
> of the messed up lines.

I would add: even the more confusing all this is, if you consider that when manually resorting rows (alt+drag) the reference in C3 gets updated, but if you use the sorting function instead to obtain the same result, it does not.  

If all this is intentionally by design, well, it's very bad design, in my opinion. Counterintuitive and dangerous. For sure it would require extensive and exhaustive description of all possible cases, in the help file.
Comment 10 Oliver Grimm 2019-07-09 09:08:37 UTC
when I was trying to confirm this bug, I did not find the option "[X] Update references when sorting range of cells." in the german (de-de) version of the sort dialog. Has it been removed recently? my version here is 6.1.5.2.
Comment 11 Buovjaga 2019-07-09 10:46:39 UTC
(In reply to Oliver Grimm from comment #10)
> when I was trying to confirm this bug, I did not find the option "[X] Update
> references when sorting range of cells." in the german (de-de) version of
> the sort dialog. Has it been removed recently? my version here is 6.1.5.2.

It is still in Tools / Options / LibreOffice Calc / General: Input settings.
Above it is "Expand references when new columns/rows are inserted".
Below it is "Highlight selection in column/row headers".
Comment 12 Oliver Grimm 2019-07-10 09:21:57 UTC
ok, thanks, found it now B-]

While I do understand the two use cases for setting or unsetting this option, I do not understand Eikes reasoning in comment #1. Why should a cell be left out of sorting only because if refers to something outside the selected range?

When I read "[X] Update references when sorting range of cells." I assume that references should be updated after sorting all non-formula data. Nothing else.

Therefore I tend to second Lucas point here:
from comment #5: "2) arbitrary restriction of a specifically selected area" is a problem, if not bug, in my opinion. It does not affect my daily work since I do not use that option. But it needs to be solved.
Comment 13 b. 2020-05-11 19:20:07 UTC
from a short check: 

it holds with not checking the option 'update references' and including the empty colum E in the range to be sorted ...
Comment 14 Luca 2020-05-12 08:02:13 UTC Comment hidden (no-value)
Comment 15 Xavier Guillot 2020-06-13 15:58:58 UTC Comment hidden (off-topic)
Comment 16 Robert Lacroix 2020-08-25 20:59:16 UTC Comment hidden (off-topic)
Comment 17 Robert Lacroix 2020-08-25 21:27:32 UTC Comment hidden (obsolete)
Comment 18 Jean-Baptiste Faure 2022-02-09 15:11:23 UTC Comment hidden (obsolete)
Comment 19 Buovjaga 2022-02-09 15:16:57 UTC Comment hidden (obsolete)
Comment 20 Katie Chan 2022-02-09 19:00:09 UTC Comment hidden (obsolete)
Comment 21 Jean-Baptiste Faure 2022-02-09 20:34:46 UTC Comment hidden (obsolete)
Comment 22 Heiko Tietze 2022-02-10 08:56:33 UTC
To me, "Update references when sorting range of cells" is clearly a switch to not sort the cells but the references. But at least we have a documentation gap.
Comment 23 Eike Rathke 2022-02-10 16:13:39 UTC
Clearly sort always sorts, regardless of the option.. but if and how the references are adjusted is exactly the point of that option, which was introduced at some time because people don't agree what a sort should actually do with references.

If unchecked, references are not touched, so relative references still point to the same relative positions if their formula cell moved, i.e. to within the same sorted row stay on that row after the sort.

If checked, references pointing (partially) outside of the sorted area are adjusted to still point to the original range, which for relative references *changes* (hence updates) the reference.

The behaviour the bug submitter wants is the option *unchecked* (which is the default), i.e. after the sort on row 1 there's =SUM(C1:E1) and then all works as expected.

However, with the option checked the current behaviour comes to a surprise because the sort range is shrunk to the actual used area and thus the formula points outside. Shrinking is done because sorting empty areas (no data and no attribution) is superfluous and sorting those areas as well would just slow down things significantly. This is especially important *because* people hit Ctrl+A select all or select entire rows or columns to sort an actually smaller range.

In case the option is checked, shrinking the sort range must take references in formula cells into account, if they point to within the selected area but outside the used area, then stop shrinking there if so to have the references adjusted (or rather not adjusted) as expected.
Comment 24 Luca 2022-02-12 10:44:56 UTC
(In reply to Eike Rathke from comment #23)
> However, with the option checked the current behaviour comes to a surprise
> because the sort range is shrunk to the actual used area and thus the
> formula points outside. Shrinking is done because sorting empty areas (no
> data and no attribution) is superfluous 

It's the logic of "arbitrarily shrinking user selected areas" that's plain dangerous. When a user specifically selects something, then the program should execute. If there is some reason not to execute, the program should issue a warning (and user should be able to disable that warning). Program's own arbitrary decision not to comply with user's disposition, without any warning, is dangerously bad design. In this particular case it causes data loss. This is priority against any other consideration. 

> and sorting those areas as well would just slow down things significantly. 

Is the slowing down caused by empty cells sorting really an issue? Maybe not a very scientific test, but I just added a value in cell ZZ10000 to extend the "select all" area in one of my spreadsheets to a ~5 million empty cells area and I didn't notice any significant slowdown when sorting.

> This is especially important *because* people hit Ctrl+A select all or 
> select entire rows or columns to sort an actually smaller range.

That's true, and why that happens? Right because of general lack of transparency about how areas to be sorted are arbitrarily selected by the program without warning, nor appropriate documentation. E.g., if there are empty rows or columns creating gaps between data, and I hit the sorting button without selecting anything, who says that I want to sort only the area within the gaps around the cell that has focus? That's Calc's arbitrary decision, breaking integrity of my records, no warning. If gaps are outside the window and I don't remember that there are some, here again, I lose data. This cause insecurity. To be sure and avoid data loss, users just hit ctr+a all the times. In the example case, the program should instead select all or nothing at all. If I need to partially sort a sheet, I will select the area myself.

> In case the option is checked, shrinking the sort range must take references
> in formula cells into account, if they point to within the selected area but
> outside the used area, then stop shrinking there if so to have the
> references adjusted (or rather not adjusted) as expected.

Yes, that would solve this problem, but not the others. Till breaking records is considered a normal harmless behaviour, there will always be problems with the sorting functions.