Bug 123095 - Auto filter result is wrong when cell value contains full-width and half-width characters and case-insensitive filtering is used
Summary: Auto filter result is wrong when cell value contains full-width and half-widt...
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
4.1.0.4 release
Hardware: All All
: medium normal
Assignee: Andreas Heinisch
URL:
Whiteboard: target:7.5.0 target:7.4.3
Keywords: bibisected, bisected, regression
Depends on:
Blocks: AutoFilter
  Show dependency treegraph
 
Reported: 2019-02-01 04:31 UTC by Kevin Suo
Modified: 2023-11-28 09:13 UTC (History)
4 users (show)

See Also:
Crash report or crash signature:


Attachments
Test ODS File (10.55 KB, application/vnd.oasis.opendocument.spreadsheet)
2019-02-01 04:31 UTC, Kevin Suo
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Kevin Suo 2019-02-01 04:31:42 UTC
Created attachment 148812 [details]
Test ODS File

Problem:
The attached ODS file contains values "乙二醇(进口料件)" and "乙二醇(进口料件)". The first value has half-width parentheses (i.e., "()"), while the second value has full-width parentheses (i.e., "()"). When using auto-filter, the autofilter dropdown box shows only the half-width value.

Steps to Reproduce:
1. Click the drop-down list of auto filter in the attached ODS file. 

Expected Result:
The list shows values "乙二醇(进口料件)" and "乙二醇(进口料件)". These are different values.

Currect Result:
The list shows value "乙二醇(进口料件)" only. 

This issue is hard to identify. I produced wrong results in many mission-critical work. Calc autofilter is useless under this circumstance.
Comment 1 Kevin Suo 2019-02-01 04:33:08 UTC
Reproduced in versions:

Version: 6.0.7.3
Build ID:1:6.0.7-0ubuntu0.18.04.2
CPU 线程:4; 操作系统:Linux 4.15; UI 渲染:默认; VCL: gtk3; 
区域语言:zh-CN (zh_CN.UTF-8); Calc: group

版本: 6.1.5.1
Build ID: f18954c1ba9116b85c32b6bdbc0188d3e0fd24c7
CPU 线程: 4; 操作系统: Linux 4.15; UI 渲染: 默认; VCL: gtk2; 
区域语言: zh-CN (zh_CN.UTF-8); Calc: group threaded

Ubuntu 18.04 LTS.
Comment 2 Durgapriyanka 2019-02-01 18:04:32 UTC
Thank you for reporting the bug. I confirm this bug in

Version: 6.3.0.0.alpha0+
Build ID: b6b28931435e44aca92b8c0e1659f701e3ed1a87
CPU threads: 2; OS: Windows 6.1; UI render: default; VCL: win; 
TinderBox: Win-x86@42, Branch:master, Time: 2019-01-30_06:57:04
Locale: en-US (en_US); UI-Language: en-US
Calc: threaded
Comment 3 Buovjaga 2019-02-11 14:30:50 UTC
Bibisected on Linux with 41max to range https://gerrit.libreoffice.org/plugins/gitiles/core/+log/8a39c27633e75819e219bc5d179375fb4308a3ca..dc173b7f2a550185404aacbc6da744cb6d1880fc

This seems to be the one: https://gerrit.libreoffice.org/plugins/gitiles/core/+/338140e5201bc23625eacd1a4740622e6241cbc2%5E%21/
bnc#492440: Sort autofilter popup items correctly.

The severity is normal rather than major as there is no crash or a completely nonworking feature https://wiki.documentfoundation.org/images/0/06/Prioritizing_Bugs_Flowchart.jpg
Comment 4 Takeshi Abe 2019-02-14 15:19:53 UTC
By the way the reported behavior is due to the default case-insensitive filtering.
A simple workaround is to open the Standard Filter dialog by choosing "Standard Filter ..." in the popup, check "Case sensitive" in the "Options" drawer, and enter OK.  Then both "乙二醇(进口料件)" and "乙二醇(进口料件)" appears in the drop-down list.
Comment 5 Kevin Suo 2020-11-20 03:45:53 UTC
Still reproducible on libreoffice 7.0.

This behaviour is very dangerous - It may cause hard-to-identify error in your spreadsheet results.

Adding cc to Kohei Yoshida as it was bibisected to one of your old commit. Could you please take a loook? Thanks.
Comment 6 Kevin Suo 2020-11-20 10:46:36 UTC
I confirm it is caused by 
commit 338140e5201bc23625eacd1a4740622e6241cbc2
author	Kohei Yoshida <kohei.yoshida@gmail.com>	Wed Jan 30 22:32:28 2013 -0500

bnc#492440: Sort autofilter popup items correctly.

Changing GetCaseCollator back to GetCaseTransliteration will make the autofilter in this case work OK again.
Comment 7 Kevin Suo 2020-11-20 11:13:58 UTC Comment hidden (obsolete)
Comment 8 Kevin Suo 2020-11-20 11:15:34 UTC
Maybe we can treat "(x)" and "(x)" to be the same because case-sensitive is off, but the problem is that if they are treated the same, then if I select "(x)" in the dropdown list and click OK, then the filtered result should include both "(x)" and "(x)".

Steps to Reproduce:

1. Produce the following records:
a
(x)
(x)
y

2. Set autofilter.
--> It shows "(x)" and "y". 
(Fine, if we agree that the half-width and full-with brackets are the same).

3. Select "(x)" only in the dropdown, click OK.
--> Bug! Only the "(x)" is in the filtered result.
(This is the risk part I am concerning. You will miss the "(x)" in your data analyzing flow.)


I have no idea whether there are other chars being treated this way.
Comment 9 Stéphane Guillou (stragu) 2021-06-19 23:21:12 UTC
(In reply to Takeshi Abe from comment #4)
> By the way the reported behavior is due to the default case-insensitive
> filtering.
> A simple workaround is to open the Standard Filter dialog by choosing
> "Standard Filter ..." in the popup, check "Case sensitive" in the "Options"
> drawer, and enter OK.  Then both "乙二醇(进口料件)" and "乙二醇(进口料件)" appears in the
> drop-down list.

I don't think that is right, or maybe the behaviour has changed since then, but with original attachment 148812 [details] you'll notice that:

1. Copy cell A3 and paste into AutoFilter search field, click OK
2. Result: nothing is shown
3. Go back to showing all (or Ctrl + Z)
4. Open Standard Filter, make sure "Case sensitive" is _not_ ticked
5. Paste the same contents to create the condition "Name = 乙二醇(进口料件)"
6. Result: the corresponding row (row 3) is shown

So, with AutoFilter (by default case-insensitive) and Standard Filter (regardless of case sensitivity), we get different results in the case of those special characters.

Reproduced with:

Version: 7.3.0.0.alpha0+ / LibreOffice Community
Build ID: 94d552f94b427f884c004dba5d4619ecf729d605
CPU threads: 8; OS: Linux 4.15; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
TinderBox: Linux-rpm_deb-x86_64@86-TDF, Branch:master, Time: 2021-06-18_13:30:27
Calc: threaded
Comment 10 Commit Notification 2022-10-13 10:29:24 UTC
Andreas Heinisch committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/2e887e04c0008a4eb6cbf34050b6fa463a33599f

tdf#125363, tdf#123095 - Use CaseTransliteration for autofilter

It will be available in 7.5.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 11 Commit Notification 2022-10-13 16:11:01 UTC
Andreas Heinisch committed a patch related to this issue.
It has been pushed to "libreoffice-7-4":

https://git.libreoffice.org/core/commit/1b1ad0e3d5988c5e16dabfaa40252a22dab517b7

tdf#125363, tdf#123095 - Use CaseTransliteration for autofilter

It will be available in 7.4.3.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.