Bug 123095 - Auto filter result is wrong when cell value contains full-width and half-width characters and case-insensitive filtering is used
Summary: Auto filter result is wrong when cell value contains full-width and half-widt...
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
(earliest affected) release
Hardware: All All
: medium normal
Assignee: Not Assigned
Keywords: bibisected, bisected, regression
Depends on:
Blocks: AutoFilter
  Show dependency treegraph
Reported: 2019-02-01 04:31 UTC by Kevin Suo
Modified: 2020-11-20 12:28 UTC (History)
4 users (show)

See Also:
Crash report or crash signature:

Test ODS File (10.55 KB, application/vnd.oasis.opendocument.spreadsheet)
2019-02-01 04:31 UTC, Kevin Suo

Note You need to log in before you can comment on or make changes to this bug.
Description Kevin Suo 2019-02-01 04:31:42 UTC
Created attachment 148812 [details]
Test ODS File

The attached ODS file contains values "乙二醇(进口料件)" and "乙二醇(进口料件)". The first value has half-width parentheses (i.e., "()"), while the second value has full-width parentheses (i.e., "()"). When using auto-filter, the autofilter dropdown box shows only the half-width value.

Steps to Reproduce:
1. Click the drop-down list of auto filter in the attached ODS file. 

Expected Result:
The list shows values "乙二醇(进口料件)" and "乙二醇(进口料件)". These are different values.

Currect Result:
The list shows value "乙二醇(进口料件)" only. 

This issue is hard to identify. I produced wrong results in many mission-critical work. Calc autofilter is useless under this circumstance.
Comment 1 Kevin Suo 2019-02-01 04:33:08 UTC
Reproduced in versions:

Build ID:1:6.0.7-0ubuntu0.18.04.2
CPU 线程:4; 操作系统:Linux 4.15; UI 渲染:默认; VCL: gtk3; 
区域语言:zh-CN (zh_CN.UTF-8); Calc: group

Build ID: f18954c1ba9116b85c32b6bdbc0188d3e0fd24c7
CPU 线程: 4; 操作系统: Linux 4.15; UI 渲染: 默认; VCL: gtk2; 
区域语言: zh-CN (zh_CN.UTF-8); Calc: group threaded

Ubuntu 18.04 LTS.
Comment 2 Durgapriyanka 2019-02-01 18:04:32 UTC
Thank you for reporting the bug. I confirm this bug in

Build ID: b6b28931435e44aca92b8c0e1659f701e3ed1a87
CPU threads: 2; OS: Windows 6.1; UI render: default; VCL: win; 
TinderBox: Win-x86@42, Branch:master, Time: 2019-01-30_06:57:04
Locale: en-US (en_US); UI-Language: en-US
Calc: threaded
Comment 3 Buovjaga 2019-02-11 14:30:50 UTC
Bibisected on Linux with 41max to range https://gerrit.libreoffice.org/plugins/gitiles/core/+log/8a39c27633e75819e219bc5d179375fb4308a3ca..dc173b7f2a550185404aacbc6da744cb6d1880fc

This seems to be the one: https://gerrit.libreoffice.org/plugins/gitiles/core/+/338140e5201bc23625eacd1a4740622e6241cbc2%5E%21/
bnc#492440: Sort autofilter popup items correctly.

The severity is normal rather than major as there is no crash or a completely nonworking feature https://wiki.documentfoundation.org/images/0/06/Prioritizing_Bugs_Flowchart.jpg
Comment 4 Takeshi Abe 2019-02-14 15:19:53 UTC
By the way the reported behavior is due to the default case-insensitive filtering.
A simple workaround is to open the Standard Filter dialog by choosing "Standard Filter ..." in the popup, check "Case sensitive" in the "Options" drawer, and enter OK.  Then both "乙二醇(进口料件)" and "乙二醇(进口料件)" appears in the drop-down list.
Comment 5 Kevin Suo 2020-11-20 03:45:53 UTC
Still reproducible on libreoffice 7.0.

This behaviour is very dangerous - It may cause hard-to-identify error in your spreadsheet results.

Adding cc to Kohei Yoshida as it was bibisected to one of your old commit. Could you please take a loook? Thanks.
Comment 6 Kevin Suo 2020-11-20 10:46:36 UTC
I confirm it is caused by 
commit 338140e5201bc23625eacd1a4740622e6241cbc2
author	Kohei Yoshida <kohei.yoshida@gmail.com>	Wed Jan 30 22:32:28 2013 -0500

bnc#492440: Sort autofilter popup items correctly.

Changing GetCaseCollator back to GetCaseTransliteration will make the autofilter in this case work OK again.
Comment 7 Kevin Suo 2020-11-20 11:13:58 UTC Comment hidden (obsolete)
Comment 8 Kevin Suo 2020-11-20 11:15:34 UTC
Maybe we can treat "(x)" and "(x)" to be the same because case-sensitive is off, but the problem is that if they are treated the same, then if I select "(x)" in the dropdown list and click OK, then the filtered result should include both "(x)" and "(x)".

Steps to Reproduce:

1. Produce the following records:

2. Set autofilter.
--> It shows "(x)" and "y". 
(Fine, if we agree that the half-width and full-with brackets are the same).

3. Select "(x)" only in the dropdown, click OK.
--> Bug! Only the "(x)" is in the filtered result.
(This is the risk part I am concerning. You will miss the "(x)" in your data analyzing flow.)

I have no idea whether there are other chars being treated this way.