Bug 166121 - "Remove Duplicate" needs to be improved to handle cases with many duplicates
Summary: "Remove Duplicate" needs to be improved to handle cases with many duplicates
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version: 25.2.2.2 release (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard: target:26.2.0
Keywords: perf
Depends on:
Blocks: Calc-Enhancements Performance
Reported: 2025-04-10 10:04 UTC by nobu
Modified: 2025-09-09 19:44 UTC
CC List: 2 users

See Also:
Crash report or crash signature:


Attachments
Example file (24.20 KB, application/vnd.oasis.opendocument.spreadsheet)
2025-04-23 18:02 UTC, Johannes

Description nobu 2025-04-10 10:04:48 UTC
Description:
"Remove Duplicate" needs to be improved to handle cases with many duplicates.

Steps to Reproduce:
0. This is an extreme example to illustrate.
1. Open new Calc.
2. Insert "A" in Cell Range [ A1:A100000 ].
3. Select Cell Range [ A1:A100000 ].
4. Menu - Data > Duplicates
5. Set the Action option to "Remove" and leave the other options at their defaults.
6. Push OK Button.
7. It took 7 seconds for Cells [A2:A100000] to be deleted on my PC.
8. Perform Undo.
9. It took 20 seconds on my PC.
10. Select Cell Range [ A1:A100000 ].
11. Menu - Data > Duplicates
12. Set the Action option to "Select" and leave the other options at their defaults.
13. Push OK Button.
14. On my PC, it took less than a second for Cells [A2:A100000] to be selected.
15. Menu - Sheet > Delete Cells  ( Ctrl + - ) > Check the "Shift cells up".
16. Push OK Button.
17. It took less than a second to delete Cells [A2:A100000] on my PC.
18. Perform Undo.
19. Undo took less than a second on my PC.
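
For anyone who wants to reproduce steps 1 to 3 without filling the range by hand, here is a minimal sketch that writes the same test column to a CSV file that Calc can open. The function names and the file name `duplicates.csv` are my own choices for illustration and are not part of LibreOffice.

```python
import csv
import io


def build_duplicate_column(value: str = "A", rows: int = 100_000) -> list[list[str]]:
    """Return `rows` single-cell rows that all hold the same value (A1:A100000 by default)."""
    return [[value] for _ in range(rows)]


def write_csv(rows: list[list[str]], stream) -> None:
    """Write the rows to any text stream, one cell per line."""
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerows(rows)


def write_test_file(path: str = "duplicates.csv") -> None:
    """Hypothetical helper: save the test column to `path` for opening in Calc."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        write_csv(build_duplicate_column(), f)
```

Calling `write_test_file()` and opening the resulting file in Calc should give the same starting point as step 3; steps 4 onward are then performed in the UI as described.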

Actual Results:
[2. ~ 9.] In such an extreme case, processing is too slow.

Expected Results:
[10. ~ 19.] At the very least, the "Remove" action should be faster than selecting and deleting the duplicates manually.


Reproducible: Always


User Profile Reset: No

Additional Info:

If the rows to be deleted are consecutive, it should be possible to delete them all at once.

Such a large number of duplicates may be unusual, but the case should still be handled.
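
To illustrate the idea of deleting consecutive rows in one go, here is a rough sketch in Python. It is not how Calc implements the feature; `coalesce_rows` and `delete_rows` are hypothetical names, the sheet is modelled as a plain list, and row indices are 0-based (so A2 is index 1).

```python
from typing import Iterable


def coalesce_rows(rows: Iterable[int]) -> list[tuple[int, int]]:
    """Group row indices into inclusive (first, last) runs of consecutive rows."""
    runs: list[tuple[int, int]] = []
    for row in sorted(set(rows)):
        if runs and row == runs[-1][1] + 1:
            # The row directly follows the current run, so extend that run.
            runs[-1] = (runs[-1][0], row)
        else:
            # There is a gap, so start a new run.
            runs.append((row, row))
    return runs


def delete_rows(cells: list, rows: Iterable[int]) -> int:
    """Delete the given rows from `cells` in place and return the number of delete operations."""
    runs = coalesce_rows(rows)
    # Work from the bottom up so earlier indices stay valid after each deletion.
    for first, last in reversed(runs):
        del cells[first:last + 1]
    return len(runs)


# The extreme case from this report: 100000 identical cells, rows 2..100000 are duplicates.
cells = ["A"] * 100_000
operations = delete_rows(cells, range(1, 100_000))
```

In the extreme case above, all 99999 duplicate rows form a single run, so only one delete operation is needed instead of one per row. With scattered duplicates the number of operations equals the number of runs, which is never more than the number of duplicate rows.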

Reproducible with

Version: 25.8.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 7c46846ad1aeeca7e38eb0aada6dc99b5fb701e3
CPU threads: 4; OS: Windows 10 X86_64 (build 19045); UI render: Skia/Raster; VCL: win
Locale: ja-JP (ja_JP); UI: ja-JP
Calc: CL threaded
Comment 1 Johannes 2025-04-23 18:01:58 UTC
I can confirm that the functionality for removing duplicates is very slow with a lot of data - even with examples that are not extreme.

Reproducible with
Version: 25.8.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: ea4921a3e31cf87c60e2eceeec46cccdc6a79b23
CPU threads: 4; OS: Linux 6.14; UI render: default; VCL: kf5 (cairo+wayland)
Locale: nl-NL (nl_NL.UTF-8); UI: en-US
Calc: threaded
Comment 2 Johannes 2025-04-23 18:02:32 UTC
Created attachment 200477 [details]
Example file

Example file added.
Comment 4 Commit Notification 2025-09-09 19:44:18 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/29425a537bc182b38efcb1120e9f430b28d503a4

tdf#166121 supress row height calc during "Remove Duplicate"

It will be available in 26.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 5 Commit Notification 2025-09-09 19:44:20 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/56fde5e66f22d68237150c501583fdf734c69c96

tdf#166121 reduce cost of clipboard during Remove Duplicates

It will be available in 26.2.0.
