Bug 150936 - CALC MENU>DATA>More FILTER s> Advanced FILTER >options>No duplicates Duplicates the first item in the selection
Summary: CALC MENU>DATA>More FILTER s> Advanced FILTER >options>No duplicates Duplica...
Status: UNCONFIRMED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
7.3.5.2 release
Hardware: All Windows (All)
: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-09-13 19:21 UTC by Colin
Modified: 2023-02-06 17:19 UTC (History)
2 users (show)

See Also:
Crash report or crash signature:


Attachments
Small sample ready to prove the error (7.36 KB, application/vnd.oasis.opendocument.spreadsheet)
2022-09-13 19:22 UTC, Colin
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Colin 2022-09-13 19:21:50 UTC
Description:
The Data Filter option for deduplicating a selected list will reproduce the second iteration of the first item at the sequential location where it is encountered and then complete the deduplication correctly

Steps to Reproduce:
With the enclosed sheet select the data in column E
MENU>DATA>More Filters>Advanced filter>Options check No Duplicates
Left mouse in the upper address box adjacent to "undefined"
Select the data in column E - Bug 1 it doesn't auto fill the selection
Check Copy results to
Left mouse in the lower address box adjacent to "undefined"
Focus H1 and "OK"
Observe that the second iteration of "One" is replicated at its unsorted location and then the remaining entries are deduped.
The limitation of the three groups of seven is not significant - merely a short list to demonstrate.
All other duplications have been removed when I intentionally replicated 119 full-length names ten times with only the first item duplicating once.

Actual Results:
The first item is duplicated when found in the list but only the 2nd iteration of that item.

Expected Results:
deduplicated list


Reproducible: Always


User Profile Reset: No


OpenGL enabled: Yes

Additional Info:
Version: 7.3.5.2 (x64) / LibreOffice Community
Build ID: 184fe81b8c8c30d8b5082578aee2fed2ea847c01
CPU threads: 4; OS: Windows 10.0 Build 19043; UI render: Skia/Raster; VCL: win
Locale: sv-SE (en_GB); UI: en-GB
Calc: threaded
Comment 1 Colin 2022-09-13 19:22:34 UTC
Created attachment 182425 [details]
Small sample ready to prove the error
Comment 2 Rafael Lima 2022-09-13 19:33:25 UTC
Try following the steps described here. They worked for me:

https://help.libreoffice.org/latest/en-US/text/scalc/guide/remove_duplicates.html?DbPAR=CALC#bm_id3153896

As for your workflow, notice that for some reason the option "Range contains column labels" is checked by default and cannot be unchecked (at least on my end). I really could not figure out why it can't be unchecked, but this is the reason "One" appears twice... the first time it is considered as the column label.
Comment 3 Rafael Lima 2022-09-13 19:49:35 UTC
As for the issue that the selected range does not appear in the dialog, I created a separate bug report to discuss it (see bug 150937).
Comment 4 Colin 2022-09-13 19:56:10 UTC
(In reply to Rafael Lima from comment #3)
> As for the issue that the selected range does not appear in the dialog, I
> created a separate bug report to discuss it (see bug 150937).

I was busy typing a response when you posted the additional info - quite topical really. I will try to identify my earlier report of the same ilk.

 I still think this excerpt from my abandoned comment is worth consideration;
    

    Whilst I never tried the method defined in the link, I'm quite happy to accept that it functions under some circumstances but not when approaching it by another regular menu access.

    ........ I have reported earlier in a different bug report scenario where the selection isn't passed to the process but needs to be re-selected from within the process.

    I guess I was just unlucky
Comment 5 Buovjaga 2023-02-03 14:39:32 UTC
Is there a need to keep this report open? Sorry, just quickly looked at the conversation and I'm wondering about it.
Comment 6 Colin 2023-02-03 14:57:59 UTC
(In reply to Buovjaga from comment #5)
> Is there a need to keep this report open? Sorry, just quickly looked at the
> conversation and I'm wondering about it.


As I understand it, the discussions have identified numerous ways in which it fails and numerous workarounds.

If you believe another report adequately covers the issue, feel the pros and cons are cancelling each other out or you're happy to see duplicates in what purports to be a deduplicated list then I won't lose any sleep over this report being abandoned.

I imagine deduping is rarely a repeating process in any spreadsheet. I certainly only ever attempt it -as a standalone process - when I have a potentially duplicated list that needs cleaning up before being incorporated into the working environment.

As It's simple enough to get Excel to dedupe and paste the data back to LO that's what I have utilised as my workaround.
Comment 7 Pedro 2023-02-06 09:21:40 UTC
So you use a closed source piece of software to address a limitation of Calc?
Just a reminder that the RemoveDuplicates extension exists, and that it has a compatible license with LibO. This means it could be implemented by default without issues.
Comment 8 Colin 2023-02-06 09:38:24 UTC
(In reply to Pedro from comment #7)
> So you use a closed source piece of software to address a limitation of Calc?

Not Any More

> Just a reminder that the RemoveDuplicates extension exists, and that it has
> a compatible license with LibO. This means it could be implemented by
> default without issues.

Sweet, Thanks.

Is it my lucky day and you're going to tell me there's a setting to make it case-insensitive?

I noticed two similar extensions, Remove and Remove...Fast. I grabbed the first as it has a better user rating - The right choice? (without liability:)  )

Should I "pull" the report as there's an authorised extension for the job?
Comment 9 ady 2023-02-06 17:19:58 UTC
(In reply to Buovjaga from comment #5)
> Is there a need to keep this report open? Sorry, just quickly looked at the
> conversation and I'm wondering about it.

What about the:
"Range contains column labels"
being forcefully checked by default and grayed out (so it cannot be unchecked)?

Moreover, there _seems_ to be no way to select a range that doesn't include the first non-empty cell (E1 in this case). For example, in the first box we could type-in (or select) the range $E$2:$E$33, but it would ultimately result in "This range doesn't contain a valid query." The workaround for this seems to be to select the range $E$2:$E$33 beforehand (i.e. before opening the advance filter dialog). In any case, the first box is not pre-filled with the selection range.

A minor side note. In LO, the (pre)selection is not exactly (quote):
"Select the data in column E"

I mean that the selection cannot be the entire column E, but only the specific cells with the relevant data ($E$1:$E$33, or $E$2:$E$33, or whatever specific range), but not the entire column. Emphasis on _data_ (range), not in (entire) column.