Bug 85976 - [RFE] Add a "remove duplicate records" command
Summary: [RFE] Add a "remove duplicate records" command
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
Inherited From OOo
Hardware: All All
: high enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: difficultyInteresting, easyHack, skillCpp, topicDebug
: 109519 124758 144744 (view as bug list)
Depends on:
Blocks: Calc-Enhancements Data-Filter
  Show dependency treegraph
 
Reported: 2014-11-06 17:31 UTC by Mr. Bugz
Modified: 2023-11-29 13:17 UTC (History)
29 users (show)

See Also:
Crash report or crash signature:


Attachments
Remove Duplicates button in Data Ribbon in Excel 2016 (171.06 KB, image/png)
2018-10-22 13:03 UTC, Pedro
Details
Only Office also acquired the Remove Duplicates functionality (36.12 KB, image/png)
2022-01-10 15:58 UTC, Pedro
Details
Remove Duplicates in Google Sheets (1.15 MB, image/png)
2022-01-10 16:08 UTC, Pedro
Details
Why remove the whole row? (102.34 KB, image/png)
2023-01-14 19:52 UTC, gmolleda
Details
WPS Worksheets Manage Duplicates menu (50.50 KB, image/png)
2023-02-23 15:54 UTC, Pedro
Details
Dialog (33.66 KB, image/png)
2023-02-23 15:56 UTC, Pedro
Details
Highlight values (25.63 KB, image/png)
2023-02-23 15:56 UTC, Pedro
Details
Fetch unique values (30.61 KB, image/png)
2023-02-23 16:00 UTC, Pedro
Details
License changed to MPL2.0 (50.65 KB, image/png)
2023-03-08 10:06 UTC, Pedro
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Mr. Bugz 2014-11-06 17:31:54 UTC
Excel has a feature where you can remove duplicates with a single click. This would be very helpful to have in LibreOffice as well.
Comment 1 Joel Madero 2014-11-06 18:00:34 UTC Comment hidden (obsolete)
Comment 2 libreoffice 2014-11-24 17:54:39 UTC
Yes, that is what I was looking for. But I still think it should be an optional button/dialog available on the main toolbar without having to go through all that.

Like this:
https://thinkandbegin.files.wordpress.com/2012/05/remove-dup-2.png
Comment 3 m.a.riosv 2015-07-29 22:02:51 UTC
*** Bug 92990 has been marked as a duplicate of this bug. ***
Comment 4 Erelyn Alves 2015-07-31 15:21:38 UTC
Below also another program that easily performs the procedure.

WPS Spreadsheets
http://pt-br.tinypic.com/r/2h663gj/8
Comment 5 Mike Kaganski 2018-09-07 08:20:10 UTC
Additional detail: in Excel, the function *removes* duplicates in-place, while filter in LO only allows hiding in-place, or removing by copying to a different location. So actually, LO lacks the functionality.
Comment 6 Roman Kuznetsov 2018-09-07 08:22:39 UTC
(In reply to Joel Madero from comment #1)
> Like this:
> http://milospjanic.blogspot.com/2011/10/how-to-remove-duplicates-in-
> libreoffice.html ?

IMHO, in Excel it made more clear for users. Different dialogue with very simple options. And in result in Excel we have only rows without any duplicates and without copying of result to another range...
Comment 7 Cor Nouws 2018-10-18 14:42:37 UTC
*** Bug 109519 has been marked as a duplicate of this bug. ***
Comment 8 Pedro 2018-10-22 13:01:28 UTC
I would also like to request that "Remove Duplicates" is added to Calc.
In Excel, removing duplicates from a column is a one click action. In Calc I have to go through multiple steps to obtain this. It's a waste of time if it's encessary to do it in multiple spreadsheets, it's a waste of time to make a macro and other spreadsheets offer this by default.

There is an extension that adds this function to Calc so I propose that this is added to the default installation, that a .uno command is created for this button to be integrated in the Notebookbars as well.

https://extensions.libreoffice.org/extensions/remove-duplicates
Comment 9 Pedro 2018-10-22 13:03:37 UTC
Created attachment 145904 [details]
Remove Duplicates button in Data Ribbon in Excel 2016
Comment 10 Pedro 2018-10-22 13:08:28 UTC Comment hidden (obsolete)
Comment 11 Pedro 2018-10-30 18:06:45 UTC Comment hidden (no-value)
Comment 12 Thomas Lendo 2018-11-06 08:45:42 UTC
*** Bug 73712 has been marked as a duplicate of this bug. ***
Comment 13 Pedro 2018-12-18 12:25:57 UTC Comment hidden (obsolete)
Comment 14 V Stuart Foote 2019-04-15 19:37:25 UTC
*** Bug 124758 has been marked as a duplicate of this bug. ***
Comment 15 Roman Kuznetsov 2019-11-09 22:42:18 UTC
(In reply to Pedro from comment #13)
> Well, Muhammet Kara after checking the extension confirmed that it is
> written in Basic, meaning that it is a macro.
> 
> If someone would translate it to C++, it would be awesome.

Now there is faster extension https://extensions.libreoffice.org/extensions/remove-duplicates-fast for it
Comment 16 Pedro 2019-11-11 10:03:13 UTC
Couldn't this extension be added by default to LibreOffice?
Comment 17 Pedro 2019-11-11 10:04:29 UTC Comment hidden (no-value)
Comment 18 Mark Leo 2020-05-17 18:37:52 UTC Comment hidden (spam)
Comment 19 Heiko Tietze 2020-06-05 11:34:20 UTC
Since we have a working solution with the standard filter dialog it should be easy to add a new UNO command and run the filter procedure with a predefined setting. 

Code pointer

sc/source/ui/view/tabvwshc.cxx #308
sc/source/ui/dbgui/sfiltdlg.cxx

Would understand this as a medium to interesting difficulty.
Comment 20 Pedro 2020-06-06 09:58:02 UTC
Just to mention: this is not just filtering. It also deletes the cells with duplicate values. But I guess that is one minor thing to add.
Comment 21 Andrea Winslet 2020-06-22 06:05:32 UTC Comment hidden (spam)
Comment 22 stutisharma900 2020-09-10 02:57:40 UTC Comment hidden (spam)
Comment 23 Pedro 2020-11-23 10:24:01 UTC Comment hidden (obsolete)
Comment 25 Buovjaga 2020-11-23 11:26:08 UTC Comment hidden (obsolete)
Comment 26 Pedro 2020-11-23 13:09:53 UTC Comment hidden (obsolete)
Comment 27 Evgeniy 2021-02-28 18:28:14 UTC Comment hidden (me-too)
Comment 28 breetlee9211 2021-03-19 07:39:15 UTC Comment hidden (spam)
Comment 29 Richard Swayar 2021-07-05 09:12:34 UTC Comment hidden (spam)
Comment 30 Pedro 2022-01-10 15:58:41 UTC
Created attachment 177433 [details]
Only Office also acquired the Remove Duplicates functionality

Only Office also got the Remove Duplicates functionality by default.
Comment 31 Pedro 2022-01-10 16:08:54 UTC
Created attachment 177434 [details]
Remove Duplicates in Google Sheets

This feature is also present in Google Sheets.
Comment 32 Heiko Tietze 2022-01-11 08:44:57 UTC
*** Bug 144744 has been marked as a duplicate of this bug. ***
Comment 33 Mike Kaganski 2022-01-11 09:00:06 UTC
Note that any feature request having a working extension with compatible license is already an easy hack. Just use its source code as the template that provides the required logic - it uses UNO commands, which may be easily converted to a C++ code, and assign a new UNO command to that new function.

(Given that Remove Duplicates Fast is based on Remove Duplicates (https://github.com/ACTom/lo-extension-removeduplicates) which is GPLv3, which doesn't allow to use its code in LO directly because we need an MPL-compatible license, interested parties may ask the author for changed license in a github issue.)
Comment 34 Pedro 2022-01-11 12:09:10 UTC
What MPL specific license do you recommend?
Comment 35 Pedro 2022-01-11 12:13:59 UTC
Asked the dev in a new issue.
Comment 36 Pedro 2022-01-11 12:28:55 UTC
The developer already changed the license to MPL 2.0. Hopefully this allows someone to pick this up. :D
Comment 37 gmolleda 2023-01-14 19:37:54 UTC
Important, also the values of the duplicate selected cells should be deleted and not the entire rows. Moving the remaining data up to leave the blank spaces below.
Comment 38 gmolleda 2023-01-14 19:52:54 UTC
Created attachment 184664 [details]
Why remove the whole row?

The filter hides the entire 8th row, including the letter f (view image attached).
The correct behavior would be to remove values 1 and 4 from row 8. Do not filter by hiding row 8 but move values 2 and 1 from row 9 up one cell and leave B9 and C9 empty.
I know this was a standard filter, but the button that they put specifically to remove duplicates should not be a filter, but actually remove the duplicate values within the selection and not the entire row.
Comment 39 Pedro 2023-02-23 15:54:17 UTC
Created attachment 185548 [details]
WPS Worksheets Manage Duplicates menu

WPS Worksheets is vastly superior in managing duplicates to any other office suite.
It not only allows for removal, but also highlighting and also fetching unique values or highlight them.
Comment 40 Pedro 2023-02-23 15:56:01 UTC
Created attachment 185549 [details]
Dialog

It allows selecting duplicates in selected range, in two ranges within the sheet, in different sheets in the worksheet and in different worksheets!
Comment 41 Pedro 2023-02-23 15:56:43 UTC
Created attachment 185550 [details]
Highlight values
Comment 42 Pedro 2023-02-23 16:00:15 UTC
Created attachment 185551 [details]
Fetch unique values

This is a feature already available in MSO since at least 2007 and it's a feature that's been actively worked on in other Office suites as well, to a point where it is very well designed.

All in all, this is a feature that is sorely lacking in Calc for several years now and with the evolution of this feature visible in OnlyOffice and WPS Worksheets (available in Linux as well), and with the RemoveDuplicates extension having compatible license with LibO it's hard to understand why this has been overlooked for so long now.
Comment 43 Eike Rathke 2023-02-28 16:18:46 UTC
(In reply to Pedro from comment #42)
> with the RemoveDuplicates
> extension having compatible license with LibO it's hard to understand why
> this has been overlooked for so long now.
0. both extensions
https://extensions.libreoffice.org/en/extensions/show/remove-duplicates
https://extensions.libreoffice.org/en/extensions/show/remove-duplicates-fast
are licensed GPL (whatever version) and thus are *not* compatible with LibreOffice licensing.

1. even if they were, that tells nothing about the source code whether it would fit into LO core code (or even be in C++ that it could).

2. if those extensions fulfil the requirements, then why not use them.
Comment 44 Heiko Tietze 2023-03-01 08:24:21 UTC
If we realize bug 149933 searching for duplicates could be one option too.
Comment 45 Pedro 2023-03-08 10:05:28 UTC
(In reply to Eike Rathke from comment #43)
> (In reply to Pedro from comment #42)
> > with the RemoveDuplicates
> > extension having compatible license with LibO it's hard to understand why
> > this has been overlooked for so long now.
> 0. both extensions
> https://extensions.libreoffice.org/en/extensions/show/remove-duplicates
> https://extensions.libreoffice.org/en/extensions/show/remove-duplicates-fast
> are licensed GPL (whatever version) and thus are *not* compatible with
> LibreOffice licensing.
> 
> 1. even if they were, that tells nothing about the source code whether it
> would fit into LO core code (or even be in C++ that it could).
> 
> 2. if those extensions fulfil the requirements, then why not use them.

Eike Rathke, the developer changed the license on his github repo to MPL 2.0.
Comment 46 Pedro 2023-03-08 10:06:05 UTC
Created attachment 185837 [details]
License changed to MPL2.0
Comment 47 Pedro 2023-03-08 10:06:51 UTC
The Fast extension was an improvement done by Mike Kaganski and Kompilainnen I believe. They did not change the license on their extension yet.
Comment 48 Eyal Rozenberg 2023-03-10 19:39:26 UTC
Am definitely missing this in Calc right now.
Comment 49 Mike Kaganski 2023-10-15 13:27:44 UTC
(In reply to Pedro from comment #47)
> The Fast extension was an improvement done by Mike Kaganski and Kompilainnen
> I believe. They did not change the license on their extension yet.

Since our extension was based on the previous one, our license was necessarily the same. Since the old extension's license is now MPL 2.0, I am glad to re-license my work under MPL 2.0.

Roman's turn.
Comment 50 Eyal Rozenberg 2023-10-15 17:16:16 UTC
I'm assuming this bug is about adding the command. If we also want to simplify/alter the filtering dialog - that should be a separate bug.

(If I'm wrong - please change the title, clarify the bug's scope in a comment, and refer to the comment in the title)
Comment 51 Mike Kaganski 2023-10-16 15:32:27 UTC
(In reply to Eyal Rozenberg from comment #50)

The original request was to implement a feature *like Excel's "remove duplicates"*. Filters are orthogonal to that, they never remove any duplicates, only hide or do a partial copy.
Comment 52 JosephGill 2023-11-29 13:17:47 UTC Comment hidden (spam)