Bug Hunting Session
Bug 91305 - Sort button does not sort first cell if it has text format
Summary: Sort button does not sort first cell if it has text format
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
Inherited From OOo
Hardware: All All
: medium normal
Assignee: Yan Pas
URL:
Whiteboard: target:5.3.0 target:5.2.2 target:6.1.0
Keywords:
: 92186 93335 100575 102292 (view as bug list)
Depends on:
Blocks:
 
Reported: 2015-05-15 12:50 UTC by Yan Pas
Modified: 2018-06-30 22:46 UTC (History)
13 users (show)

See Also:
Crash report or crash signature:


Attachments
gdb_ScTable::HasColHeader (253.66 KB, image/png)
2016-06-17 22:05 UTC, Yan Pas
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Yan Pas 2015-05-15 12:50:03 UTC
When I highlight column of cells containing text and press button sort from A to Z - first cell acts like header but who do need to highlight header when sort? I think it is related to #63416
Comment 1 raal 2015-05-17 17:43:30 UTC
Reproducible from LO 3.5

Workaround - Data-Sort-Options - untick Range contains column labels
Comment 2 Yan Pas 2015-05-17 17:46:11 UTC
Moreover if I sort numbers - first cell doesn't act like header. If text - I'm to select one more cell above 1st. Why is it unconfirmed?
Comment 3 Buovjaga 2015-05-19 11:17:57 UTC
(In reply to Yan Pashkovsky from comment #2)
> Why is it unconfirmed?

Sometimes we forget to set to NEW.
Comment 4 Buovjaga 2015-06-21 15:22:37 UTC
*** Bug 92186 has been marked as a duplicate of this bug. ***
Comment 5 Jacques Guilleron 2015-08-10 21:33:58 UTC
*** Bug 93335 has been marked as a duplicate of this bug. ***
Comment 6 Yan Pas 2015-09-20 19:41:27 UTC
Additional sub-bugs: 
1) Selecting some empty cells and pressing SORT causes empty cells to move below. No matter if you sort from A to Z or from Z to A.
2) Placing digit to the header position and running set of sorts will make digit to be on 2nd or last position.
3) Sorting of text-cells applied to rows (not columns) behaves the same buggy
Comment 7 V Stuart Foote 2016-06-04 18:07:53 UTC
Up for code review https://gerrit.libreoffice.org/#/c/25883/

Personally I have some concern. The UI provided from Data ->Sort dialog Option tab actually seems to have the right logic for setting the "Range contains..." labels when text or number values are present. 

Having contains row/column checked enabled by default when a selection of cells containing text actually seems the desired UI.
 
=-note-=
The UI responds to the Option tab Direction radio button, setting of sorting rows or columns and toggling the Sort Option checkbox accordingly Column labels/Row labels.

There is a pretty sophisticated algorithm at work here--not sure it really needs to be changed. 

@Caolán any comment, or code pointers for Yan Pas?
Comment 8 Yan Pas 2016-06-04 18:50:27 UTC
My patch only changes behavior of Sort Asc and Sort Desc actions (consequently buttons) to traditional. It has nothing to do with checkboxes and UI.

From this 
http://opengrok.libreoffice.org/search?q=pDoc-%3EHasColHeader&project=core&defs=&refs=&path=&hist=
I suppose that copy-paste programming took place

Current behavior should be definetely changed cause algorithm sets bHasHeader=true for text cells and sets bHasHeader=false for number cells.
Comment 9 V Stuart Foote 2016-06-04 20:33:34 UTC
(In reply to Yan Pas from comment #8)
> My patch only changes behavior of Sort Asc and Sort Desc actions
> (consequently buttons) to traditional. It has nothing to do with checkboxes
> and UI.
> ...

I could be wrong, but I think you are misguided regards the single button actions on the Cacl Standard toolbar Sort, Sort Ascending, and Sort Descenting--as linked to .uno:DataSort, .uno:SortAscending, .uno:SortDescending respectively.

While the button Ascending and Descending are applied with defaults (Sort always calls the dialog), there is underlying logic about how the defaults are set depending on the data type (the bHasHeader value), and the default can be overridden if needed from the Sort UI.

> Current behavior should be definetely changed cause algorithm sets
> bHasHeader=true for text cells and sets bHasHeader=false for number cells.

That might not be the best approach. The button action gets applied to the selection only if the selection is contiguous--otherwise the Sort dialog will open. But the source is common from the .uno command of the two buttons, or from the GUI Sort dialog. 

So again, hard coding the default is probably not what is needed for the standard toolbar button actions.

IIUC for reliable function it would be better to *always* open the GUI sort dialog, prepopulated with ascending/descending per the .uno action, and land on the options tab to configure--*rather* than hard coding default. But that likely would not be appealing UX, so retaining the column/row label logic would be the better choice--perhaps something can be done to improve the label no-label logic.

The "See also" links to development of the sort dialog and addressing default behaviors of rows/column labels during development in the OOo era.
Comment 10 V Stuart Foote 2016-06-05 17:23:41 UTC
(In reply to V Stuart Foote from comment #9)
> ... so retaining the column/row label logic
> would be the better choice--perhaps something can be done to improve the
> label no-label logic.

Have to figure out where it is but suggest the "has header" logic for defaults *could* be adjusted something like this:

I see this as an 80% use solution...

1. test initial cells for only text or only numbers as now

2. test if selection is just one row || or just one column as now (offer dialog to expand)

3. test if initial cell(s) only text [row/column sorts], does selection include cells from Row 1 or column A?

   3a. if initial cell is text & has cells from Row 1 or Col A-- retain current bHasHeader default true (so must open UI and adjust "range contains...")

   3b. if initial cell is text & no cells from Row 1 or Col A, set bHasHeader to false (our 80% fix--interior selections including headers would need to be set from sort options UI).

4. otherwise, if initial cell is number, retain current bHasHeader default of "false" (this does not accommodate use of numbers for column headers, but sort options UI facilitates)
Comment 11 Yan Pas 2016-06-05 20:15:01 UTC
OK, I'll try to find how to fix it in some other place. Where can default checks be stored? Glade file (.ui) even does not contain the dialog checkboxes... Are you sure that these two buttons (sortAsc, sortDesc) are awared of default values from generic Sort dialog?

What's the reason for marking any cell as header automatically? (3a) Can you name any use case when it is handy?
Comment 12 V Stuart Foote 2016-06-06 01:53:19 UTC
(In reply to Yan Pas from comment #11)
> OK, I'll try to find how to fix it in some other place. Where can default
> checks be stored? Glade file (.ui) even does not contain the dialog
> checkboxes... Are you sure that these two buttons (sortAsc, sortDesc) are
> awared of default values from generic Sort dialog?
> 

The button actions assigned .uno:SortAscending and .uno:SortDescending are integral to the Sort code--they are just called with defaults.  But, I believe the logic mostly is contained in the cellsh2.cxx that you were poking at.

Code pointers for Yan? Anyone...

> What's the reason for marking any cell as header automatically? (3a) Can you
> name any use case when it is handy?

It is the nature of a "spread sheet" where selecting a column or selecting a row--if the initial cell is "text" rather than a number--it probably represents a header rather than a data element and should not be sorted. The autoset filter follows that simplistic logic.

Those defaults break down when using a spread sheet's cells to hold not-a-number table values.  When selecting a range of text/alpha-numeric cells to sort, chances are the column or row header would still be in sheet Col A or Row 1. And if the selection does not include either, then for text/alpha-numeric cells the bHasHeader default should probably be autoset to false.  

And that's my suggested 80% solution.
Comment 13 Eike Rathke 2016-06-06 15:52:00 UTC
Removing functionality that works for most uses because sorting areas of cells usually does involve header cells, and is part of the usual work flow of most users, to cater for some corner case use (i.e. only one column selected) is not the way to go. Rather, detect the special case that only one column is selected and only if so and the ScDBData area involved does not indicate there is a header row, only then assume that there is no header row. On top of that, if the first row content is text and the next row is numeric then again assume that there indeed is a header row.
Comment 14 V Stuart Foote 2016-06-06 16:16:39 UTC
(In reply to Eike Rathke from comment #13)
> Removing functionality that works for most uses because sorting areas of
> cells usually does involve header cells, and is part of the usual work flow
> of most users, to cater for some corner case use (i.e. only one column
> selected) is not the way to go. 

Yes I think the hard coding of the bHasHeader value false has been quashed.

> Rather, detect the special case that only
> one column is selected and only if so and the ScDBData area involved does
> not indicate there is a header row, only then assume that there is no header
> row. On top of that, if the first row content is text and the next row is
> numeric then again assume that there indeed is a header row.

So for determining bHasHeader, is the ScDBData area recalculated when cells are selected? Or are they attributes of the entire sheet with focus? 

And in either case where in source is the logic for determining if the area contains header row?  

I keep getting lost working thorugh the source, but seems like now if it has Text in the initial cell (column or row) it receives bHasHeader true.  

Otherwise not apparent that the value must always be set true--clearly selections interior to the initial row or column of a sheet are less likely to contain textual or numeric values intended as headers. Can't that be added to the sort logic?
Comment 15 Eike Rathke 2016-06-06 20:35:48 UTC
There are two types of ScDBData areas, one is the defined database ranges as under Data -> Define Range, the second is an anonymous DBData range, of which one per sheet can exist and is redefined with each selection that operates on different areas to be sorted or filtered and is not a defined database range. The defined database ranges have an attribute whether a header row exists, for the anonymous ranges it is determined by using ScDocument::HasColHeader()

If you set a breakpoint in ScDocument::HasColHeader() you'll notice that it gets called twice for an anonymous range, once from ScDocShell::GetDBData() before aDocument.GetAnonymousDBData() is called (and if there is none yet a new ScDBData is created and set via aDocument.SetAnonymousDBData()), the other from ScCellShell::ExecuteDB(). The latter is called after ScDBData* pDBData has already be obtained, so likely it is superfluous IF the ScDBData is not anonymous and already indicates a header row, but instead it should be checked if the to be sorted area includes the header row if one is defined at ScDBData, and if not then set aSortParam.bHasHeader=false. However, for an anonymous ScDBData area it is not enough to rely on bHasHeader==false because after it was defined last, data and thus column headers could had been added.

ScTable::HasColHeader() could do a special handling for if only one column was selected. That needs thorough investigation of all places that call it or ScDocument::HasColHeader() whether it's really applicable, or needs a parameter to be added to differentiate.

Also note that there are several places that create an anonymous ScDBData maybe followed by a ScDocument::HasColHeader(), maybe those could be streamlined into one or two functions.

And maybe I forgot to mention one or two things.. ;-)
Comment 16 almos 2016-06-06 21:04:32 UTC
(In reply to V Stuart Foote from comment #12)
> It is the nature of a "spread sheet" where selecting a column or selecting a
> row--if the initial cell is "text" rather than a number--it probably
> represents a header rather than a data element and should not be sorted. The
> autoset filter follows that simplistic logic.

Really? Have you ever tried to sort a column of text cells?

Here's what MS Excel does:
- if first row is text, second is number, first row is header
- if first row is text, second row is text, no header
- if first row is number, no header
AFAICT the "first row is text" condition is true if at least one cell is text.

LO does this:
- if first row is all text, first row is header
Comment 17 Yan Pas 2016-06-17 22:05:04 UTC
Created attachment 125718 [details]
gdb_ScTable::HasColHeader

I am rewriting ScTable::HasColHeader function currently and has noticed that it accepts wrong arguments (see gdb output). hasColHeader should check every row, not every column. 
What kind of refactor would be better?
1) Call pDoc->HasRowHeader from cellsh2.cxx
2) Rewrite ScTable::HasColHeader
3) Pass different set of arguments to ScTable::HasColHeader.

My current variant:
bool ScTable::HasColHeader( SCCOL nStartCol, SCROW nStartRow, SCCOL nEndCol, SCROW /* nEndRow */ ) const
{
    CellType firstCellType = GetCellType(nStartCol, nStartRow);
    if (firstCellType != CELLTYPE_STRING && firstCellType != CELLTYPE_EDIT)
        return false;
    for (SCCOL nCol=nStartCol+1; nCol<=nEndCol; nCol++)
    {
        if (GetCellType( nCol, nStartRow ) != CELLTYPE_VALUE)
            return false;
    }
    return true;
}
Comment 18 Buovjaga 2016-06-26 12:52:41 UTC
*** Bug 100575 has been marked as a duplicate of this bug. ***
Comment 19 Yan Pas 2016-08-05 22:04:57 UTC Comment hidden (off-topic)
Comment 20 Commit Notification 2016-08-18 15:47:31 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=63592999a067d76c896ed79204330e1a4b934c80

refine HasColHeader()/HasRowHeader() conditions, tdf#91305 related

It will be available in 5.3.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 21 Eike Rathke 2016-08-18 18:06:06 UTC
Actually also commit https://gerrit.libreoffice.org/gitweb?p=core.git;a=commit;h=4cd9e45a439b654c8e1ff7983fe7e4bd073b9c92 by Yan Pas was part of this, but apparently Bugzilla had hick-ups when the commit notification was to be added.
Comment 22 Yan Pas 2016-08-20 13:50:40 UTC
How to backport it to 5.2 ?
Comment 23 Commit Notification 2016-08-29 12:04:00 UTC
Yan Pashkovsky committed a patch related to this issue.
It has been pushed to "libreoffice-5-2":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=f45d0d62917606a02dfd646b9ad91b92de462a91&h=libreoffice-5-2

tdf#91305 fix sort calc

It will be available in 5.2.2.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 24 m.a.riosv 2016-09-20 21:37:06 UTC
*** Bug 102292 has been marked as a duplicate of this bug. ***
Comment 25 Commit Notification 2018-05-09 15:41:41 UTC
Zdeněk Crhonek committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=1887806c401e3f2d522f500c9b83bdd452030c49

uitest for bug tdf#91305

It will be available in 6.1.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.