Description:
Referring to the new feature "Remove Duplicates" - see https://bugs.documentfoundation.org/show_bug.cgi?id=85976

This is more an enhancement of the new feature than a bug, but since the new feature is much too slow, this report can be seen as a bug as well.

I have written my own version of Remove Duplicates based on a self-developed Python script. In an example with approx. 10,000 rows and 2 columns, my script needs 0.2 seconds, while the new built-in function needs 28 seconds for the same data (a time multiplier of 133). A subsequent undo takes practically no time with my version (let's assume 0.05 seconds), but 86 seconds with the built-in version (time multiplier: 1720). It looks like the built-in function needs a big improvement to become usable.

I can share my Python script if needed. The way it works is: the data are read from the spreadsheet into a Python table, all the work is done in that table, and the table is then written back to the spreadsheet (a rough sketch of this approach is appended at the end of this report). The script also has an option to sort the data automatically, which gives a further big speed-up. With 10,000 rows and a low number of columns this is not needed; with 100,000 rows it should be used to avoid long waiting times. The only issue with my script is a limitation of the "selection.getData" function, which stops working with more than 262,144 rows (2^18). That limitation should be filed as a separate bug report if needed.

Steps to Reproduce:
Remove Duplicates:
1. Create a spreadsheet with 10,000 rows and 2 columns
2. Run the new built-in "Remove Duplicates" function

Undo:
1. Press Ctrl+Z directly after the Remove Duplicates above

Actual Results:
Very slow: > 20 seconds for Remove Duplicates and > 80 seconds for the undo

Expected Results:
Both steps should take less than one second

Reproducible: Always

User Profile Reset: No

Additional Info:
I think everything is already mentioned in the Description.
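Appendix: a minimal sketch of the approach described above. This is not the attached script; the function names are made up for illustration, and it assumes the sheet data has already been read into "rows" as a sequence of row tuples in one bulk call and will be written back in one bulk call afterwards.

# Keep the first occurrence of each row, preserving the original order.
def remove_duplicate_rows(rows):
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:        # row tuples are hashable
            seen.add(row)
            unique.append(row)
    return unique

# Variant with the optional pre-sort mentioned above: after sorting, identical
# rows are adjacent, so comparing with the previously kept row is enough.
def remove_duplicates_sorted(rows):
    # Assumes the values within each column are mutually comparable
    # (all numbers or all text); otherwise sorted() raises TypeError.
    out = []
    for row in sorted(rows):
        if not out or row != out[-1]:
            out.append(row)
    return out

The point of this layout is that all work happens in memory and the spreadsheet is touched only twice: one bulk read and one bulk write.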
Thank you for reporting the bug. Please attach a sample document, as this makes it easier for us to verify the bug. I have set the bug's status to 'NEEDINFO'. Please change it back to 'UNCONFIRMED' once the requested document is provided. (Please note that the attachment will be public, remove any sensitive information before attaching it. See https://wiki.documentfoundation.org/QA/FAQ#How_can_I_eliminate_confidential_data_from_a_sample_document.3F for help on how to do so.)
Created attachment 199598 [details]
ODS File with an example on which to run the "Remove Duplicates" function
Really slow with

Version: 25.2.1.2 (X86_64) / LibreOffice Community
Build ID: d3abf4aee5fd705e4a92bba33a32f40bc4e56f49
CPU threads: 16; OS: Windows 11 X86_64 (10.0 build 26100); UI render: Skia/Raster; VCL: win
Locale: es-ES (es_ES); UI: en-US
Calc: CL threaded

Only a bit quicker with

Version: 25.8.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 1622d672b8cc721d5f9917931f6d8d999f218f7a
CPU threads: 16; OS: Windows 11 X86_64 (build 26100); UI render: Skia/Raster; VCL: win
Locale: en-US (es_ES); UI: en-GB
Calc: CL threaded
(In reply to Hartmut from comment #0)
> I have written my own version of Remove Duplicates based on a self-developed
> Python script.

Can you please share your Python script for analysis?
On my PC, it can be done in about 1.7 seconds in Basic and 0.07 seconds in Python. However, the Python route ignores cell formatting (same as the Unique function). The built-in function should at least be faster than what can be done in Basic.
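For context, a rough sketch of how a Python macro can process the current selection through the UNO data array in two bulk calls. This is not the attached script; the function name is made up, it assumes a single contiguous range is selected, and it runs in a macro context (XSCRIPTCONTEXT is provided by LibreOffice). The data array carries only cell values, which is presumably why cell formatting is ignored on this route.

def remove_duplicates_in_selection(*args):
    doc = XSCRIPTCONTEXT.getDocument()
    # Assumes the current selection is one contiguous cell range.
    sel = doc.getCurrentController().getSelection()
    data = sel.getDataArray()            # tuple of row tuples, values only

    seen, unique = set(), []
    for row in data:
        if row not in seen:
            seen.add(row)
            unique.append(row)

    # Write the kept rows back into a range of matching height starting at
    # the same top-left cell.
    addr = sel.getRangeAddress()
    sheet = sel.getSpreadsheet()
    target = sheet.getCellRangeByPosition(
        addr.StartColumn, addr.StartRow,
        addr.EndColumn, addr.StartRow + len(unique) - 1)
    target.setDataArray(tuple(unique))

Rows below the shortened block keep their previous contents, so a complete script would also have to clear the leftover rows; and because only values travel through the data array, formatting is not moved along with the values and formulas are flattened to their results.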
Created attachment 199613 [details]
My Python Script for "Remove Duplicates"

This is my Python script for "Remove Duplicates". The dialogue boxes it needs are created within the script. I started with a Basic script first, but it was too slow, so I re-coded it in Python and used this as an opportunity to learn Python.