Bug 149984 - Add counts to AutoFilter drop down (with Copy)
Summary: Add counts to AutoFilter drop down (with Copy)
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
Inherited From OOo
Hardware: All All
: low enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: AutoFilter
  Show dependency treegraph
 
Reported: 2022-07-13 18:45 UTC by Pierre Fortin
Modified: 2023-10-25 11:00 UTC (History)
4 users (show)

See Also:
Crash report or crash signature:


Attachments
Mock up of suggestion (61.02 KB, image/png)
2022-07-14 16:00 UTC, Pierre Fortin
Details
update mock up (65.66 KB, image/png)
2022-07-15 14:13 UTC, Pierre Fortin
Details
OnlyOffice 7.4.1.36 filter (85.68 KB, image/png)
2023-10-05 06:20 UTC, Stéphane Guillou (stragu)
Details
OpenRefine 3.7.5 text facet (179.52 KB, image/png)
2023-10-05 06:21 UTC, Stéphane Guillou (stragu)
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Pierre Fortin 2022-07-13 18:45:55 UTC
Description:
Using AutoFilter to check each value independently is time consuming.  Having counts next to each value and ability to sort by count would help immensely.

Steps to Reproduce:
1. Data>AutoFilter
2. 
3.

Actual Results:
Selecting a filter gives all the values in a column.  


Expected Results:
It would be very useful to have the option to display a count next to each value, and Sort {Asc,Des} by Count...



Reproducible: Always


User Profile Reset: No



Additional Info:
We spend a lot of time:
- choose one value; examine the rows
- choose another value, examine the rows
- repeat...

Having counts of each value and being able to sort on counts would be a great time saver:
  [x] Value1 1234567
  [x] Value2    5432
  [x] Value3      12
The "n of m records found" is often all we are looking for. This enhancement would eliminate the need to select and wait for results.
Comment 1 Rafael Lima 2022-07-13 20:06:23 UTC
I think this might impact performance on very large tables.

However, I see the value in having the count information for each unique entry in the autofilter.

Let's add the UX team in the loop.
Comment 2 Pierre Fortin 2022-07-13 21:49:30 UTC
(In reply to Rafael Lima from comment #1)
> I think this might impact performance on very large tables.

Agreed; but the status line is updated with the count of selected rows. Seems like that counting code would be the best place to count each unique entry; then sorting by count should be no worse than: Filter by Condition>{Top,Bottom} 10

Thanks for considering. We daily deal with script-generated .csv files which have to be limited to 1M rows; so having 16M rows is a HUGE improvement for us. That reduces the amount of file splitting we have to do.  Our worst case .csv is over 75M records; but now having 16M rows will do until CPU horsepower and memory get much better.  :)
Comment 3 Eike Rathke 2022-07-13 22:44:12 UTC
You want to use a pivot table, menu Data -> Pivot Table, with Count on a Data Field. See https://help.libreoffice.org/7.4/en-GB/text/scalc/guide/datapilot_createtable.html?DbPAR=CALC
Comment 4 Pierre Fortin 2022-07-14 01:57:23 UTC
I'm not seeing how a pivot table is going to help me...  

My sheets, on average, consist of millions of rows of text data; no numeric fields to add or compute.  The AutoFilter lets me select categories within columns such as city, gender, age, street, whatever...  With those selections, the raw data on the filtered rows is all I want/need to see.  Having totals on the AutoFilter per unique value would help decide if it's even worth drilling down into certain values. 

In a multi-million row sheet, most columns have small sets of unique values: State: 1, Moved-To: ~50, Moved-From: ~50, gender(sex): 3, County: 100, UniqueID: rows-1, etc...   The number of rows based on the filters is in the Status Bar (N of M records found) is of interest; but it's the raw data (all columns) that we want to see; not some summary.  Our "summary" is the set of visible rows.

Unless the pivot table rules have changed, our data violates the first rule of pivot tables: lots of empty cells.  In some fields, it's important to select rows which have empty cells in specific columns; such as large building tenants without unit numbers...

I'm open to seeing if pivot tables are useful here; but it seems like lots of wasted user time when a quick glance at a filter's list of values often tells a story at a glance, as in:  "Whoa! Something's not right here... there are too {many,few} whatevers in this set... Let's look at this [set of] value(s)..."

The goal is simply filtered views of entire rows; not sales numbers, etc.
Comment 5 Heiko Tietze 2022-07-14 06:39:47 UTC
Please have a look at the Data Provider feature. Being kind of experimental it's not yet included in the UI, you have to add the commands via Tools > Customize. The DP allows you to filter and manipulate data before loading into the sheet - and if you connect to the same source it's just an update later.

Summary from the last GSoC project is here: https://libreoffice-dataproviders.blogspot.com/2021/08/data-providers.html


Back to the actual request: Enhancing the sort function with count sounds like a can of worms to me. Why not Sum()? And UX wise it adds quite some complexity where average user wrap their mind around what "Sort Ascending by Count" could mean.

So my take would be to rather write a macro. Shouldn't be too difficult. If the DP is not helpful...
Comment 6 Pierre Fortin 2022-07-14 16:00:11 UTC
Created attachment 181265 [details]
Mock up of suggestion

(In reply to Heiko Tietze from comment #5)
> Please have a look at the Data Provider feature. Being kind of experimental
> it's not yet included in the UI, you have to add the commands via Tools >
> Customize. The DP allows you to filter and manipulate data before loading
> into the sheet - and if you connect to the same source it's just an update
> later.

Sounds interesting; we may be able to use this for final reports; asuming this can be used to conditionally load data. It would not help during our investigation phases, we need all data where we do whatifs based on text content. Much easier and faster than running a series of SQL queries, where modifying the query, rerun, view new results would slow down the "didn't expect xyz, filter out a,b,d,g,x from another column; oh wow!" process.

> Summary from the last GSoC project is here:
> https://libreoffice-dataproviders.blogspot.com/2021/08/data-providers.html
> 
> 
> Back to the actual request: Enhancing the sort function with count sounds
> like a can of worms to me. Why not Sum()? 

This data has nothing to Sum; it's all about counts of content. CountIf('Active') would make more sense; but we are often surprised by new values... AutoFilter is a much faster process for us.  I work with a team of individuals, all from home.  [BTW, one member reports that LO 7.5 (Windows) crashes on save (any format); otherwise, he's happy with his first time loading 8M+ rows, formatting and editing.  Doubt he's the type to file bug reports though, and I don't have his details.]

> And UX wise it adds quite some
> complexity where average user wrap their mind around what "Sort Ascending by
> Count" could mean.

Agreed. BTW, what does "Filter by Condition>Top 10" really mean?  :)  It sounds like this may be what I want, only worded differently...  
Trying with tiny sheet (240K rows)...
Nope.  Column contains only {Active,Inactive,Temporary} -- Filter by Condition>Top 10 hides every row. Now, I have no idea what {Top,Bottom} 10 means. Looks like I'll have to dig into the docs...  No time today...

> So my take would be to rather write a macro. Shouldn't be too difficult. If
> the DP is not helpful...

We're dealing with AutoFilter as it is:
- load data
- filter & view results
- next depends on what we see... 
- add/change/remove filters & view new results
- repeat

Writing a macro to replicate AutoFilter (including a UI), with only counts of data occurrences added...  ;p

Playing with this tiny sheet, I just spotted something totally new which gives me another issue to dig into...  In this case, it appears there are hundreds of unique values -- AutoFilter had to discover every unique value and create a selector for each.  I would think counting each value inside that loop should be minor. Although the display is more complex... See attachment:  each unique value has a count, and total of unique values next to All...   HTH
Comment 7 Heiko Tietze 2022-07-15 06:16:33 UTC
(In reply to Pierre Fortin from comment #6)
> Created attachment 181265 [details]
> Mock up of suggestion

Indeed, this would be acceptable UX-wise.

Foo          (12) = (COUNTIF(<RANGE>,"Foo")
Bar         (276)
Baz           (3)
Very long... (42)

Added the brackets to more clearly indicate the count, not necessarily a good idea. And ellipsis to cover for long names.

Question is if such pre-processing is cheap or has serious impact on performance.

> Agreed. BTW, what does "Filter by Condition>Top 10" really mean?  :)
https://help.libreoffice.org/7.3/en-US/text/scalc/01/12040100.html

1,1,1,2,2,2...,10,20,30...100 shows 10..100 with TOP10 not 1,2... with the highest rank. An alternative to your idea could be to have RANK10 that filters for the highest countif() items.
Comment 8 Pierre Fortin 2022-07-15 14:13:33 UTC
Created attachment 181284 [details]
update mock up

The parens don't add anything IMHO...  the ellipsis can do dual duty: indicate truncation and separate the value & count.
I chose the sample counts because that's the type of data we're dealing with...

> Question is if such pre-processing is cheap or has serious impact on performance.

In my python scripts, adding additional counts within loops is virtually unnoticeable...  e.g., 
    try:    COUNTS[value] += 1
    except: COUNTS.update( { value: 1 } )

[v]All   3046 [v] [x]
would indicate the number of values -- a LONG list (e.g., 3046) is realistic; with the relatively tiny window, scrolling through such a long list is why adding a sort capability would help...  While sort would be great, the counts are the main idea.  

In this mock up, I added a way to sort the list, and was struggling with how to represent this idea in the event it's accepted -- here's an idea:
^ = ascending counts
x = default sort of values
v = descending counts
although the UI code may not allow such granularity...

Thanks for considering...
Comment 9 Heiko Tietze 2022-08-01 08:22:46 UTC
We discussed the topic in the design meeting.

It mixes the feature AutoFilter with another function, countIf(). Then there will for sure be others with different handy functions to add. Making it optional is a prerequisite. The visual noise of the numbers could be solved with some kind of "pill/badge" styling (smaller font size, colored background). 

There is no strong argument against adding this function.
Comment 10 Pierre Fortin 2022-08-01 11:05:35 UTC
(In reply to Heiko Tietze from comment #9)
 
> There is no strong argument against adding this function.

This is wonderful news!!  I was using the filters extensively last night for a specific research item, wishing I had that feature.  I found myself selecting different values with the sole purpose of getting counts for various values; so I was doing a highly repetitive repetitive task when it hit me...

This feature would get MUCH use here -- don't know why I didn't add the following to the request:  have the ability to copy the value/count list to the copy/paste buffer, making this data instantly available for reports/emails...

I'm quite happy with the current 'dev' version... Having multiple sheets open last night with some containing 5-8M+ rows was a dream; being able to copy a values/counts list into a report would have saved MUCH time getting this info.

Based on last night's results, today I will be modifying a python script to gather similar results from multi-terabytes of data now that I have a better understanding of that data.  This feature (with Copy) on top of the JUMBO feature would be an AWESOME time-saver for us!!
THANK YOU!!!!
Comment 11 Pierre Fortin 2022-08-01 11:16:05 UTC
Trying to focus on most important parts of this request, I would be super happy with values/counts, and the ability to Copy those.  Sorting can wait (even be eliminated) since that can be done once the value/count data is pasted to another sheet. No need to overload the feature...  :)
Comment 12 Pierre Fortin 2022-10-20 11:46:03 UTC Comment hidden (off-topic)
Comment 13 Pierre Fortin 2023-01-27 20:38:08 UTC Comment hidden (off-topic)
Comment 14 Heiko Tietze 2023-01-30 13:32:22 UTC Comment hidden (off-topic)
Comment 15 Stéphane Guillou (stragu) 2023-10-05 06:20:35 UTC
Created attachment 190031 [details]
OnlyOffice 7.4.1.36 filter

I agree this would be a welcome addition.

Here's the counts in OnlyOffice 7.4.1.36's equivalent "Filter".
Comment 16 Stéphane Guillou (stragu) 2023-10-05 06:21:43 UTC
Created attachment 190032 [details]
OpenRefine 3.7.5 text facet

... and OpenRefine 3.7.5's "Text Facet" (which can be used like LO's AutoFilter).