Bug 153206 - Better document de logic behind sorting data in Calc (alpha-numeric and mix of formats)
Summary: Better document de logic behind sorting data in Calc (alpha-numeric and mix o...
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Documentation (show other bugs)
Version:
(earliest affected)
7.4.4.2 release
Hardware: All All
: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Sorting
  Show dependency treegraph
 
Reported: 2023-01-25 19:17 UTC by Hendrik Maryns
Modified: 2023-03-07 11:13 UTC (History)
3 users (show)

See Also:
Crash report or crash signature:


Attachments
example (12.00 KB, image/png)
2023-01-25 19:18 UTC, Hendrik Maryns
Details
2nd example (9.81 KB, image/png)
2023-01-25 19:19 UTC, Hendrik Maryns
Details
LO example (10.66 KB, application/vnd.oasis.opendocument.spreadsheet)
2023-01-25 20:53 UTC, Hendrik Maryns
Details
To open the LibreofficeDev Calc and then given the values in cell lexicographically order, and then select the all cells and using the sort function in the LO Calc then its working fine to sort values (114.92 KB, image/png)
2023-01-29 07:36 UTC, Sathish Kumar
Details
5 different test cases with example data (created in LO 7.4.5) (15.46 KB, application/vnd.oasis.opendocument.spreadsheet)
2023-03-06 11:19 UTC, Stéphane Guillou (stragu)
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Hendrik Maryns 2023-01-25 19:17:33 UTC
Description:
I enter a bunch of strings in cells, some are numbers, some aren’t, since they contain a range expressed by "/".  I mark the cells as text.  Sorting them gives strange result: most are in correct order, but some aren’t.  Changing sort order or settings does not make a difference.

Steps to Reproduce:
1. enter some numbers in cells, and some article number like number-letter combinations
2. mark all as text (optional)
3. sort by clicking the relevant button or via the menu

One screenshot is from a Windows PC (the longer one), another from a Linux.

Actual Results:
see screenshots

Expected Results:
Since they are marked as text, I expect them to be sorted lexicographically.


Reproducible: Always


User Profile Reset: No

Additional Info:
Did not reset user profile, but seen on multiple computers.
Comment 1 Hendrik Maryns 2023-01-25 19:18:54 UTC
Created attachment 184909 [details]
example
Comment 2 Hendrik Maryns 2023-01-25 19:19:09 UTC
Created attachment 184910 [details]
2nd example
Comment 3 Mike Kaganski 2023-01-25 20:04:37 UTC
Your screenshots don't show anything strange. On attachment 184909 [details], cells A1:A4 likely contain numbers, and the rest are textual cells; the sort order is numbers first, ascending; then texts, in alphabetical order.

Attachment 184910 [details] shows similar picture: cells A1:A5 contain numbers and are sorted ascending; cells A6:A8 contain texts and are sorted alphabetically.

Is there something else, not visible on screenshots, that constitutes the problem? Note that screenshots are least useful data; attaching the spreadsheet document with the data allows people to see and experiment with the problem, and not assume things (e.g., it's my assumption that first attachment's A5 is text, and I have no way to check that, using e.g. View->Value Highlighting).
Comment 4 Hendrik Maryns 2023-01-25 20:53:06 UTC
Created attachment 184912 [details]
LO example

That explains, but is IMHO very counter-intuitive.  I have been looking around and there is nog obvious way to have everything sorted alphabetically (actually, lexicographically).  I would have expected marking cells as text would lead to that.  Thinking of it, I do not understand this default behavior.  Why numbers first and then text?  I would expect that as soon as LO notes not all cells are numbers, it would resort to alphabetical sorting.

So maybe this is an RFE or it should be documented better, or there should be a button in the sort dialog to choose that everything should be treated as text.  ‘Natural sort order’ makes no difference.

Actually, I made another example and here the sort order is still different.  I  attach the unsorted example.
Comment 5 Stéphane Guillou (stragu) 2023-01-25 23:15:48 UTC
I agree this could be better documented. We've got:

- https://help.libreoffice.org/latest/en-US/text/scalc/01/12030100.html
- https://help.libreoffice.org/latest/en-US/text/scalc/01/12030200.html

There's mention that "Number fields are sorted by size and text fields by the order of the characters", but it doesn't clarify what comes first, and also that the alphabetical part starts with numbers.

Hendrik, the example you provide gives me a result that makes sense:

215468
499432
6545146
306518/8
v123245

- First, numbers from lowest to highest (three first rows)
- Then, text strings, alphabetically, with numbers first (two last rows)

What might be more surprising to users is the inclusion of dates and times in the column. The logic should also be explained in the documentation.

Should we change this into a documentation improvement?
Comment 6 Hendrik Maryns 2023-01-26 08:07:34 UTC
(In reply to Stéphane Guillou (stragu) from comment #5)
> I agree this could be better documented. We've got:
> 
> - https://help.libreoffice.org/latest/en-US/text/scalc/01/12030100.html
> - https://help.libreoffice.org/latest/en-US/text/scalc/01/12030200.html
> 
> There's mention that "Number fields are sorted by size and text fields by
> the order of the characters", but it doesn't clarify what comes first, and
> also that the alphabetical part starts with numbers.

And ‘You can define the sort rules on Data - Sort - Options.’ is not really true; this is exactly the point I try to make here.

> Hendrik, the example you provide gives me a result that makes sense:
> - First, numbers from lowest to highest (three first rows)
> - Then, text strings, alphabetically, with numbers first (two last rows)

They make sense if you look at it with a particular logic, which you are probably familiar with.  As a non-programming end user, this is a non-intuitive logic.

> What might be more surprising to users is the inclusion of dates and times
> in the column. The logic should also be explained in the documentation.

Agree!

> Should we change this into a documentation improvement?

I hope that, apart from fixing the documentation, it can also be enhanced so that one can choose to sort dates, times, numbers etc. as text.  Makes sense in my case, though maybe in another situation you would want it differently.
Comment 7 Mike Kaganski 2023-01-26 09:24:59 UTC
I couldn't find in Excel 2016 a feature like our "Enable natural sort"; and this is a feature that *likely* doesn't save to files (so just a UI function). It *could* be possible to change it to consider numbers and texts similarly ...

However:
1. It still could be used in macros, and they could break then
 => we could use a separate "[x] treat numbers as text" checkbox to keep the old behavior unchanged, and only have new behavior when explicitly configured

2. The numbers may be formatted in different ways; in which position should a *number* 0.1 be placed in the sorted list, when it is formatted as: "0.0"; "#.0"; ";;"; "[NatNum12]0.0"?
 => we could only use the display format then, solving the problem of dates at the same time...

3. Anyway, having a mix of numbers and number-like things in cells is a sign of user error. Is it really needed? IMO, we better have some notification in sort function that it sorted a mix, so possibly user needs to check their data (e.g., using a balloon tip or an infobar)?
Comment 8 Hendrik Maryns 2023-01-26 10:00:49 UTC
(In reply to Mike Kaganski from comment #7)
> 3. Anyway, having a mix of numbers and number-like things in cells is a sign
> of user error. Is it really needed? IMO, we better have some notification in
> sort function that it sorted a mix, so possibly user needs to check their
> data (e.g., using a balloon tip or an infobar)?

Agree.  In my case, it is not a user error, they are all article numbers in a hardware store, which I want to sort so it is easier for my colleagues to enter them into the sticker printer (unfortunately, no digital way to do that yet).  So I want them to be treated as text, all of them, even those that really are numbers and do not contain letters or symbols.
Comment 9 Mike Kaganski 2023-01-26 10:31:58 UTC
(In reply to Hendrik Maryns from comment #8)
> In my case, it is not a user error, they are all article numbers in
> a hardware store

And this is *exactly* the user error. A *number* (in Calc sense) is anything that you use to *perform calculations*. Identifiers of any kind (like "article numbers") are *not* numbers, and should be stored as *text* (either by prepending number-like entries with ', or by pre-formatting the column as text *prior to entry*).

Without that, one can realize at some point that "numbers" that they enter would transform to somethin unexpected: e.g., one enters "000123", and gets "123"; or one enters "123/4", and gets "30 3/4"...

And the sort "misbehaving" is just another manifestation of *the* user error.
Comment 10 Hendrik Maryns 2023-01-26 10:35:38 UTC
(In reply to Mike Kaganski from comment #9)
> (In reply to Hendrik Maryns from comment #8)
> > In my case, it is not a user error, they are all article numbers in
> > a hardware store
> 
> And this is *exactly* the user error. A *number* (in Calc sense) is anything
> that you use to *perform calculations*. Identifiers of any kind (like
> "article numbers") are *not* numbers, and should be stored as *text* (either
> by prepending number-like entries with ', or by pre-formatting the column as
> text *prior to entry*).
> 
> Without that, one can realize at some point that "numbers" that they enter
> would transform to somethin unexpected: e.g., one enters "000123", and gets
> "123"; or one enters "123/4", and gets "30 3/4"...
> 
> And the sort "misbehaving" is just another manifestation of *the* user error.

OK, then might I suggest that the above information, which I suppose is documented somewhere, is linked to from the help article about sorting.
Comment 11 QA Administrators 2023-01-27 03:25:39 UTC Comment hidden (obsolete)
Comment 12 Sathish Kumar 2023-01-29 07:36:21 UTC
Created attachment 184990 [details]
To open the LibreofficeDev Calc and then given the values in cell lexicographically order, and then select the all cells and using the sort function in the LO Calc then its working fine to sort values
Comment 13 Hendrik Maryns 2023-01-30 09:11:29 UTC
Comment on attachment 184990 [details]
To open the LibreofficeDev Calc and then given the values in cell lexicographically order, and then select the all cells and using the sort function in the LO Calc then its working fine to sort values

I’m afraid I do not understand what you mean.
Comment 14 Stéphane Guillou (stragu) 2023-03-06 11:17:40 UTC
I went ahead and changed this report to a Documentation one, given the conversation so far. Hope you all agree.
Comment 15 Stéphane Guillou (stragu) 2023-03-06 11:19:01 UTC
Created attachment 185792 [details]
5 different test cases with example data (created in LO 7.4.5)

Here is a test spreadsheet that hopefully helps clarify how order changes depending on how data is entered and how cells are formatted.
Comment 16 Hendrik Maryns 2023-03-06 19:42:20 UTC
(In reply to Stéphane Guillou (stragu) from comment #14)
> I went ahead and changed this report to a Documentation one, given the
> conversation so far. Hope you all agree.

I would plead that the behavior be changed to sorting everything as string as soon as mixed data is detected.  I think that is more intuitive to non-technical users.
Comment 17 Eike Rathke 2023-03-07 11:13:56 UTC
That would break all functionality that relies on properly sorted data, especially LOOKUP(), HLOOKUP(), VLOOKUP(), MATCH() and maybe others.