Bug 166121 - "Remove Duplicate" needs to be improved to handle cases with many duplicates
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version: 25.2.2.2 release (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: perf
Depends on:
Blocks:
 
Reported: 2025-04-10 10:04 UTC by nobu
Modified: 2025-07-25 15:33 UTC

See Also:
Crash report or crash signature:


Attachments
Example file (24.20 KB, application/vnd.oasis.opendocument.spreadsheet)
2025-04-23 18:02 UTC, Johannes

Description nobu 2025-04-10 10:04:48 UTC
Description:
"Remove Duplicate" needs to be improved to handle cases with many duplicates.

Steps to Reproduce:
0. This is an extreme example for illustration.
1. Open a new Calc document.
2. Enter "A" in Cell Range [ A1:A100000 ].
3. Select Cell Range [ A1:A100000 ].
4. Menu - Data > Duplicates
5. Set the Actions option to "Remove"; leave the other options at their defaults.
6. Click the OK button.
7. It took 7 seconds for Cells [A2:A100000] to be deleted on my PC.
8. Perform Undo.
9. It took 20 seconds on my PC.
10. Select Cell Range [ A1:A100000 ].
11. Menu - Data > Duplicates
12. Set the Actions option to "Select"; leave the other options at their defaults.
13. Click the OK button.
14. On my PC, it took less than a second for Cells [A2:A100000] to be selected.
15. Menu - Sheet > Delete Cells ( Ctrl + - ) > select "Shift cells up".
16. Click the OK button.
17. It took less than a second to delete Cells [A2:A100000] on my PC.
18. Perform Undo.
19. Undo took less than a second on my PC.
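
For anyone who would rather not fill 100,000 cells by hand, the input for step 2 can be produced as a CSV file and opened in Calc. This is only a convenience sketch: the function name `write_repro_csv` and the output file name are made up for illustration, and it assumes that a column imported from CSV behaves the same as one typed in.

```python
import csv


def write_repro_csv(path: str, rows: int = 100_000, value: str = "A") -> int:
    """Write `rows` lines, each holding the single cell `value`, to `path`.

    Returns the number of rows written. Opening the file in Calc should give
    a column A1:A<rows> filled with `value` (assumption: CSV import is
    equivalent to typing the values for the purpose of this report).
    """
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        for _ in range(rows):
            writer.writerow([value])
    return rows
```

Calling `write_repro_csv("duplicates_repro.csv")` and opening the result in Calc gives the starting point for step 3.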

Actual Results:
[Steps 2-9] In an extreme case like this, the "Remove" action is too slow: 7 seconds to remove the duplicates and 20 seconds to undo.

Expected Results:
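[Steps 10-19] At the very least, the "Remove" action should be faster than doing the same thing manually (select the duplicates, then delete the cells), where each step took less than a second.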


Reproducible: Always


User Profile Reset: No

Additional Info:

If the rows to be deleted are consecutive, it should be possible to delete them all in a single operation.

Having this many duplicates may be unusual, but the case should still be taken into account.
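
The idea of deleting consecutive rows together can be sketched outside of Calc. The Python below is not LibreOffice's implementation, and nothing here claims to know how the current "Remove" action works internally; the function names `duplicate_row_runs` and `remove_duplicates` are hypothetical. It only shows that duplicate rows can be grouped into contiguous runs first, so that the case in this report needs a single range deletion instead of 99,999 individual ones.

```python
from typing import Hashable, List, Sequence, Tuple


def duplicate_row_runs(values: Sequence[Hashable]) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) 0-based index runs of duplicate rows.

    A row counts as a duplicate if its value already appeared in an earlier
    row. Adjacent duplicate rows are merged into one run.
    """
    seen = set()
    runs: List[Tuple[int, int]] = []
    for index, value in enumerate(values):
        if value in seen:
            if runs and runs[-1][1] == index - 1:
                # Extend the current run instead of starting a new one.
                runs[-1] = (runs[-1][0], index)
            else:
                runs.append((index, index))
        else:
            seen.add(value)
    return runs


def remove_duplicates(values: Sequence[Hashable]) -> list:
    """Delete duplicate rows one run at a time, bottom-up.

    Working from the last run to the first keeps the indices of the
    remaining runs valid after each deletion.
    """
    rows = list(values)
    for start, end in reversed(duplicate_row_runs(rows)):
        del rows[start:end + 1]
    return rows
```

For the column from the steps above, `duplicate_row_runs(["A"] * 100000)` yields the single run `(1, 99999)`, which corresponds to the range A2:A100000 that the manual workaround deletes in one go.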

Reproducible with

Version: 25.8.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 7c46846ad1aeeca7e38eb0aada6dc99b5fb701e3
CPU threads: 4; OS: Windows 10 X86_64 (build 19045); UI render: Skia/Raster; VCL: win
Locale: ja-JP (ja_JP); UI: ja-JP
Calc: CL threaded
Comment 1 Johannes 2025-04-23 18:01:58 UTC
I can confirm that the functionality for removing duplicates is very slow with a lot of data - even with examples that are not extreme.

Reproducible with
Version: 25.8.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: ea4921a3e31cf87c60e2eceeec46cccdc6a79b23
CPU threads: 4; OS: Linux 6.14; UI render: default; VCL: kf5 (cairo+wayland)
Locale: nl-NL (nl_NL.UTF-8); UI: en-US
Calc: threaded
Comment 2 Johannes 2025-04-23 18:02:32 UTC
Created attachment 200477
Example file

Example file added.