Bug 125363 - UI: LibreOffice Calc's AutoFilter treats combining and modifier letters the same as plain letters in the value list
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version: 4.1.0.4 release (earliest affected)
Hardware: x86-64 (AMD64) All
Importance: medium normal
Assignee: Andreas Heinisch
URL:
Whiteboard: target:7.5.0 target:7.4.3
Keywords: bibisected, regression
Duplicates: 105314
Depends on:
Blocks: AutoFilter
Reported: 2019-05-18 19:24 UTC by Carsten Becker
Modified: 2023-11-28 09:13 UTC (History)

See Also:
Crash report or crash signature:


Attachments
Example as requested by comment #2 (10.23 KB, application/vnd.oasis.opendocument.spreadsheet)
2019-06-11 05:36 UTC, Carsten Becker
bisect result (3.58 KB, text/plain)
2019-08-25 19:39 UTC, raal

Description Carsten Becker 2019-05-18 19:24:56 UTC
Description:
LibreOffice Calc's 'AutoFilter' popup menu treats combining diacritic letters (like U+0364, U+0366) and modifier letters (like U+1D49) as identical to their plain equivalents in the value list. For instance, in a column that contains both 'tuon' and 'tuͦn' (variant spellings of Middle High German for 'to do'), only one variant will be listed (see screenshot). As expected, filtering for the listed variant indeed only lists those cells in which that variant appears verbatim, but not the other variant.

Steps to Reproduce:
1. Create a list of words in a column in which some words contain a plain, regular letter and others contain the corresponding combining diacritic letter or modifier letter in its place (e.g. tuon, tuͦn; guet, guͤt; vröude, vröudͤ). Make the first cell in the column a field identifier such as 'mylist', 'example', or '123'.
2. Data > AutoFilter
3. OK, use first line as header
4. Click the dropdown arrow in the header of the column that contains our word list

Actual Results:
Only one variant of the word is listed in the value list: either with the plain letter (tuon, guet, vröude) or the combining/modifier letter (tuͦn, guͤt, vröudͤ).

Expected Results:
Both variants are listed: AutoFilter doesn't treat superscript/modifier letters the same as their plain equivalents, i.e. all of tuon, tuͦn, guet, guͤt, vröude, vröudͤ are listed as values occurring in the selected column.
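
To make the expectation concrete, here is a minimal Python sketch (not LibreOffice code) that spells out the word pairs with explicit escapes and prints the code points involved. It assumes the 'ö' is the precomposed U+00F6, and it uses U+1D49 for the last word, following the correction in comment 1. The helper name `describe` is made up for illustration.

```python
import unicodedata

# Word pairs from the report; escapes make the special code points visible.
pairs = [
    ("tuon", "tu\u0366n"),                # U+0366 COMBINING LATIN SMALL LETTER O
    ("guet", "gu\u0364t"),                # U+0364 COMBINING LATIN SMALL LETTER E
    ("vr\u00f6ude", "vr\u00f6ud\u1d49"),  # U+1D49 MODIFIER LETTER SMALL E
]


def describe(word):
    """List every code point of `word` with its Unicode character name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in word]


# Under plain string equality all six spellings are different values.
distinct_values = {word for pair in pairs for word in pair}

for plain, variant in pairs:
    print(plain, describe(plain))
    print(variant, describe(variant))
print(len(distinct_values), "distinct values")
```

A value list built on exact string equality would therefore contain all six entries.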


Reproducible: Always


User Profile Reset: No



Additional Info:
Screenshot: https://i.imgur.com/DTZhV6e.png
Version: 6.0.7.3
Build-ID: 1:6.0.7-0ubuntu0.18.04.5
CPU-Threads: 4; OS: Linux 4.15; UI-Render: Standard; VCL: kde4; 
Localization schema: de-DE (de_DE.UTF-8); Calc: group
Comment 1 Carsten Becker 2019-05-18 19:29:46 UTC
That should have been 'vröudᵉ', not *vröudͤ.
Comment 2 Xisco Faulí 2019-06-10 16:28:19 UTC
Thank you for reporting the bug. Please attach a sample document, as this makes it easier for us to verify the bug. 
(Please note that the attachment will be public; remove any sensitive information before attaching it. 
See https://wiki.documentfoundation.org/QA/FAQ#How_can_I_eliminate_confidential_data_from_a_sample_document.3F for help on how to do so.)

I have set the bug's status to 'NEEDINFO'. Please change it back to 'UNCONFIRMED' once the requested document is provided.
Comment 3 Carsten Becker 2019-06-11 05:36:15 UTC
Created attachment 152087 [details]
Example as requested by comment #2

Added example file as requested by Xisco Faulí in comment #2
Comment 4 raal 2019-08-12 19:17:28 UTC
Confirmed with Version: 6.4.0.0.alpha0+
Build ID: 2812610f4f39ed5892da08864893c758325d1d39
CPU threads: 4; OS: Linux 4.15; UI render: default; VCL: gtk3; 
but not in LO Version 4.1.0.0.alpha0+ (Build ID: efca6f15609322f62a35619619a6d5fe5c9bd5a)
Comment 5 raal 2019-08-25 19:39:19 UTC
Created attachment 153646 [details]
bisect result

bibisected with bibisect-41max. The result is a range of 70 commits.
Comment 6 himajin100000 2020-06-11 16:00:38 UTC Comment hidden (obsolete)
Comment 7 himajin100000 2020-06-11 16:01:25 UTC Comment hidden (obsolete)
Comment 8 himajin100000 2020-07-15 17:28:19 UTC
006F  ; [.213C.0020.0002] # LATIN SMALL LETTER O
0366  ; [.213C.0020.0004] # COMBINING LATIN SMALL LETTER O

1D49  ; [.2007.0020.0014] # MODIFIER LETTER SMALL E
0065  ; [.2007.0020.0002] # LATIN SMALL LETTER E

https://dencode.com/en/string/unicode-normalization
https://unicode.org/reports/tr10/#Main_Algorithm
http://www.unicode.org/Public/UCA/13.0.0/allkeys.txt
https://opengrok.libreoffice.org/xref/core/sc/source/core/data/global.cxx?r=3ac9f491#1045
https://opengrok.libreoffice.org/xref/core/offapi/com/sun/star/i18n/CollatorOptions.idl?r=944eb990#31
https://opengrok.libreoffice.org/xref/core/i18npool/source/collator/collator_unicode.cxx?r=b122a39c#411

I guess that the cause of the bug, whatever the appropriate implementation is, is that these letters differ only in their TERTIARY weight, while the option sets the collator's strength to SECONDARY.
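
The guess above can be illustrated with a small Python sketch. It is a simplified model and not the ICU or LibreOffice implementation: it takes the four collation elements quoted in this comment as (primary, secondary, tertiary) tuples and compares single characters by keeping only the weight levels up to a chosen strength. The names `WEIGHTS`, `key` and `equal_at` are made up for illustration.

```python
# Collation elements quoted above, as (primary, secondary, tertiary) weights.
WEIGHTS = {
    "\u006f": (0x213C, 0x0020, 0x0002),  # LATIN SMALL LETTER O
    "\u0366": (0x213C, 0x0020, 0x0004),  # COMBINING LATIN SMALL LETTER O
    "\u0065": (0x2007, 0x0020, 0x0002),  # LATIN SMALL LETTER E
    "\u1d49": (0x2007, 0x0020, 0x0014),  # MODIFIER LETTER SMALL E
}

PRIMARY, SECONDARY, TERTIARY = 1, 2, 3


def key(ch, strength):
    """Keep only the weight levels up to `strength` (simplified model)."""
    return WEIGHTS[ch][:strength]


def equal_at(a, b, strength):
    """True if the two characters compare equal at the given strength."""
    return key(a, strength) == key(b, strength)


print(equal_at("o", "\u0366", SECONDARY))  # True: tertiary weight is ignored
print(equal_at("o", "\u0366", TERTIARY))   # False: 0x0002 vs 0x0004
print(equal_at("e", "\u1d49", SECONDARY))  # True
print(equal_at("e", "\u1d49", TERTIARY))   # False: 0x0002 vs 0x0014
```

In this model, a comparison at SECONDARY strength cannot tell the plain letter from its combining or modifier counterpart, which matches the merged entries in the value list.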
Comment 9 Kevin Suo 2020-11-20 12:41:21 UTC
The attached bibisect range is meaningless, as the hashes appear to be commit IDs of the binary repository, not source hashes.

raal: If you still have the bibisect-41max repo, would you please identify the source-hash of those commits?

Seems to be a duplicate of bug 123095, although different characters are affected.
Comment 10 himajin100000 2020-11-20 12:54:54 UTC
0028  ; [*0328.0020.0002] # LEFT PARENTHESIS
FF08  ; [*0328.0020.0003] # FULLWIDTH LEFT PARENTHESIS
Comment 11 Stéphane Guillou (stragu) 2021-06-19 23:02:27 UTC
Reproduced in:

Version: 7.3.0.0.alpha0+ / LibreOffice Community
Build ID: 94d552f94b427f884c004dba5d4619ecf729d605
CPU threads: 8; OS: Linux 4.15; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
TinderBox: Linux-rpm_deb-x86_64@86-TDF, Branch:master, Time: 2021-06-18_13:30:27
Calc: threaded

I think this is serious because, as mentioned in Bug 123095, whatever subset you select in the value list, some values will never show.

In the example document, there are 20 rows of data. In the AutoFilter list, there are three values to choose from: tuon, guet, vröude. If ticked, these options respectively show 3, 4 and 4 values: a total of 11.
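
The mismatch between listed entries and reachable rows can be modelled in a few lines of Python. This is a sketch under assumptions, not the actual AutoFilter code: the value list is deduplicated with a coarse key (a hypothetical folding table standing in for the coarse comparison), the first spelling seen is kept as the representative, and filtering then matches cells verbatim. The six-row column is invented and is not the attached document.

```python
# Hypothetical folding that mimics a comparison blind to these differences.
FOLD = str.maketrans({"\u0366": "o", "\u0364": "e", "\u1d49": "e"})


def coarse_key(value):
    """Map combining/modifier letters to their plain equivalents."""
    return value.translate(FOLD)


def value_list(column):
    """Keep the first-seen spelling for each coarse key (assumed policy)."""
    seen = {}
    for value in column:
        seen.setdefault(coarse_key(value), value)
    return list(seen.values())


def filter_rows(column, selected):
    """Filtering matches cell contents verbatim, as the report describes."""
    return [value for value in column if value in selected]


# Invented sample column, not the attached example file.
column = ["tuon", "tu\u0366n", "guet", "gu\u0364t", "tuon", "gu\u0364t"]

entries = value_list(column)
reachable = filter_rows(column, set(entries))
unreachable = [value for value in column if value not in entries]

print(entries)           # one spelling per coarse key
print(len(reachable))    # rows shown when every entry is ticked
print(len(unreachable))  # rows that no selection can show
```

With every entry ticked, only 3 of the 6 sample rows are shown. The same effect would explain why the three entries in the example document cover 11 of the 20 rows.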
Comment 12 Stéphane Guillou (stragu) 2021-06-21 02:12:46 UTC
Confirmed on Windows as well, with slightly different result:

The AutoFilter value list shows only the 3 values "guͤt, tuon, vröude", which would filter in 2, 4 and 4 rows respectively.

Version: 7.0.6.2 (x64)
Build ID: 144abb84a525d8e30c9dbbefa69cbbf2d8d4ae3b
CPU threads: 8; OS: Windows 10.0 Build 19042; UI render: default; VCL: win
Locale: en-AU (en_AU); UI: en-US
Calc: threaded

and:

Version: 7.2.0.0.alpha1+ (x64) / LibreOffice Community
Build ID: aa9cb8e14749e7fb7a83b55a2bb095501f731a18
CPU threads: 8; OS: Windows 10.0 Build 19042; UI render: Skia/Raster; VCL: win
Locale: en-AU (en_AU); UI: en-US
Calc: threaded
Comment 13 Commit Notification 2022-10-13 10:29:17 UTC
Andreas Heinisch committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/2e887e04c0008a4eb6cbf34050b6fa463a33599f

tdf#125363, tdf#123095 - Use CaseTransliteration for autofilter

It will be available in 7.5.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 14 Commit Notification 2022-10-13 16:10:52 UTC
Andreas Heinisch committed a patch related to this issue.
It has been pushed to "libreoffice-7-4":

https://git.libreoffice.org/core/commit/1b1ad0e3d5988c5e16dabfaa40252a22dab517b7

tdf#125363, tdf#123095 - Use CaseTransliteration for autofilter

It will be available in 7.4.3.
Comment 15 Andreas Heinisch 2022-12-14 11:25:50 UTC
*** Bug 105314 has been marked as a duplicate of this bug. ***