144017 – New Function to count duplicates in list

Bug 144017 - New Function to count duplicates in list

Summary: New Function to count duplicates in list

Status:	RESOLVED INVALID

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	Calc (show other bugs)
Version: (earliest affected)	unspecified
Hardware:	All All

Importance:	medium enhancement
Assignee:	Not Assigned

URL:
Whiteboard:
Keywords:	needsUXEval

Depends on:
Blocks:

Reported:	2021-08-23 07:57 UTC by John Murrell
Modified:	2021-08-27 10:27 UTC (History)
CC List:	2 users (show)

See Also:
Crash report or crash signature:

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description John Murrell 2021-08-23 07:57:05 UTC

Description:
Calc has a function to apply conditional formatting to duplicate values in a list but this only provides visual information though when the 3 bugs are 
cleared it should be possible to filter on the duplicate values.

My suggestion is for a new function that compares a value in a cell with the values in a list and returns the number of duplicates. Zero being returned if the value does not exist in the list otherwise the number of occurrences. The cell to be compared can either be in the list in which case it will provide the number of times a value is duplicated or else external to the list where it provides the number of matches. 

To make the function more useful it could have a flag which either reports all the matches or only the first duplicate. In the first mode it would be useful to carry out other operations on the duplicate cells and in the second to count the total number of duplicates.

Suggested name for function is something like COUNTMATCH(comparison cell, list, mode) as it is an expanded version of Match with different functionality.

Steps to Reproduce:
1. Enhancement as above
2.
3.

Actual Results:
NA

Expected Results:
NA


Reproducible: Always


User Profile Reset: No


OpenGL enabled: Yes

Additional Info:
There are some work arounds such as using pivot tables to count the number of duplicate values and sorting the pivot table by the number of duplicates but this is difficult to use partly because of the need to refresh the pivot table if the data changes and also the difficulty of retrieving an unknown number of results from a pivot table. The pivot table method also does not work if the cell to be matched is outside the pivot table source array.

Comment 1 Mike Kaganski 2021-08-23 08:35:24 UTC

This is completely useless IMO. There are infinite possible things that one may think would be nice to have a dedicated function, but when there are already functions that allow one to have the data, it would only create clutter and increase learning curve, not improve experience in any way.

To know how many duplicates are there in a list for a cell in a list:

  =COUNTIFS(A1:A1000;A1)

To know how many duplicates are there in a list for a value not necessary in the list:

  =COUNTIFS(A1:A1000;B1)
  =COUNTIFS(A1:A1000;23)

To know if this is the first duplicate of this value:

  =AND(COUNTIFS($A2:$A$1000;A1)>0;COUNTIFS($A$1:$A1;A1)=1)

... and so on. Creating own formulas for complex result is the power that spreadsheets give you, and trying to invent a dedicated function for each such case is counter-productive.

WONTFIX IMO.

Comment 2 John Murrell 2021-08-26 15:07:08 UTC

Ok,

The function COUNTIFS does something similar to what I was proposing. So I will close this.

However COUNTIFS does not actually work in my application, I think because the list of random numbers re-calculates each time COUNTIFS runs so the results keep on changing. As a result when you filter on the COUNTIFS >1 you it does not show the duplicates and takes about 1/2 hour to run. Not sure if this is a bug or expected behaviour but the only way it appears to get the duplicated results is to use a pivot table or do a filter on the cell colours when the filtering on conditional results bug is fixed.

Comment 3 Eike Rathke 2021-08-27 10:27:03 UTC

(In reply to John Murrell from comment #2)
> However COUNTIFS does not actually work in my application, I think because
> the list of random numbers re-calculates each time COUNTIFS runs so the
> results keep on changing.
RAND() and RANDBETWEEN() generate new random numbers on each cell input and whenever any recalculation occurs. You could use RAND.NV() or RANDBETWEEN.NV() for non-volatile random numbers.