Bug 148650 - InStr finds match in string containing diacritics
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: BASIC
Version (earliest affected): 7.3.2.2 release
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Diacritics
Reported: 2022-04-18 13:23 UTC by Jordi
Modified: 2024-08-01 17:27 UTC
CC List: 4 users

Description Jordi 2022-04-18 13:23:45 UTC
After moving from

Version: 7.0.5.2 (x64)
Build ID: 64390860c6cd0aca4beafafcfd84613dd9dfb63a
CPU threads: 12; OS: Windows 10.0 Build 19043; UI render: Skia/Raster; VCL: win
Locale: en-AU (en_AU); UI: en-GB
Calc: CL

to version 7.3.2.2 (with a clean profile), one of my macros that uses InStr to search a string of Greek characters no longer returns the same value. For example, the following code returns 0 in the earlier version but 8 in 7.3.2.2:


	Dim grkchrs: grkchrs = "άέίόύώήΐΰΆΈΊΌΎΏΉ"
	Dim r: r = instr(grkchrs, "ι")
	
	print r


I also reset my profile on the old version to be safe, and got the same result.

Thanks.
Comment 1 Vladimir Sokolinskiy 2022-04-18 14:24:59 UTC
Returns 8 in:
Version: 7.3.2.2 (x64) / LibreOffice Community
Build ID: 49f2b1bff42cfccbd8f788c8dc32c1c309559be0
CPU threads: 6; OS: Windows 10.0 Build 19042; UI render: default; VCL: win
Locale: ru-RU (ru_RU); UI: ru-RU
Calc: threaded
Comment 2 Mike Kaganski 2022-04-18 14:28:13 UTC
The transliteration (with only TransliterationFlags::IGNORE_CASE set!) that is used for the case-insensitive match converts the 16-character string into these 20 characters:
'ά' - 'ά'
'έ' - 'έ'
'ί' - 'ί'
'ό' - 'ό'
'ύ' - 'ύ'
'ώ' - 'ώ'
'ή' - 'ή'
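'ΐ' - 'ι' (U+03B9)
      '̈' (U+0308 COMBINING DIAERESIS)
      '́' (U+0301 COMBINING ACUTE ACCENT)
'ΰ' - 'υ' (U+03C5)
      '̈' (U+0308 COMBINING DIAERESIS)
      '́' (U+0301 COMBINING ACUTE ACCENT)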
'Ά' - 'ά'
'Έ' - 'έ'
'Ί' - 'ί'
'Ό' - 'ό'
'Ύ' - 'ύ'
'Ώ' - 'ώ'
'Ή' - 'ή'

Indeed, the searched character is found there.
Of course, it *seems* that the original code should use case-sensitive ("binary") comparison, and replacing 'instr(grkchrs, "ι")' with 'instr(1, grkchrs, "ι", 0)' gives the expected 0. But is the transliteration correct in this case?

Eike: do you know if it's correct?
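
The expansion described above can be reproduced outside LibreOffice. The following is a minimal Python sketch, assuming that Unicode full case folding (Python's str.casefold) maps these characters the same way as the IGNORE_CASE transliteration; the thread does not confirm which mapping table LibreOffice uses. The instr function is a hypothetical helper that only mimics the 1-based return convention of Basic's InStr; it is not the Basic runtime function.

```python
# The 16 precomposed Greek letters from the report, written as escapes so the
# exact code points are unambiguous: άέίόύώήΐΰΆΈΊΌΎΏΉ
GRKCHRS = (
    "\u03ac\u03ad\u03af\u03cc\u03cd\u03ce\u03ae\u0390\u03b0"
    "\u0386\u0388\u038a\u038c\u038e\u038f\u0389"
)
IOTA = "\u03b9"  # plain small iota, the search string from the report


def instr(haystack: str, needle: str, ignore_case: bool = True) -> int:
    """Hypothetical helper: 1-based position of needle, or 0 if absent."""
    if ignore_case:
        # Assumption: full case folding stands in for IGNORE_CASE.
        haystack, needle = haystack.casefold(), needle.casefold()
    # str.find returns -1 when there is no match, so +1 yields 0.
    return haystack.find(needle) + 1


folded = GRKCHRS.casefold()
print(len(GRKCHRS), len(folded))                # 16 20
print(instr(GRKCHRS, IOTA))                     # 8 (match inside folded 'ΐ')
print(instr(GRKCHRS, IOTA, ignore_case=False))  # 0 (exact code point match)
```

Under that assumption, the folded string has 20 code points and the plain iota is found at position 8, which matches the value reported for 7.3.2.2. The exact comparison finds nothing, as 'instr(1, grkchrs, "ι", 0)' does.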
Comment 3 Mike Kaganski 2022-04-18 14:37:43 UTC
FTR: Calc's '=SEARCH("ι";A1)' also returns 8.
Comment 4 Jordi 2022-04-18 19:33:11 UTC
(In reply to Mike Kaganski from comment #2)

Great, thanks for the heads-up on the mode option. It solves my immediate problem. Alas, 7.3.2.2 is unstable for me, so back to my old version I go.
Comment 5 Andreas Heinisch 2022-05-02 19:38:53 UTC
Should we close this as NOTABUG since Calc's '=SEARCH("ι";A1)' returns the same result as the macro?
Comment 6 Mike Kaganski 2022-05-02 20:47:59 UTC
(In reply to Andreas Heinisch from comment #5)

The problem here is the inconsistency IMO. Every character with a diacritic could be represented as a base character plus combining characters. But only two were decomposed like that.
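
The inconsistency can be made concrete with a short Python sketch. It again assumes that Unicode full case folding (str.casefold) stands in for the IGNORE_CASE transliteration, which the thread does not confirm, and it uses canonical decomposition (NFD) as one possible definition of "base character plus combining characters".

```python
import unicodedata

# Same 16 characters as in the report: άέίόύώήΐΰΆΈΊΌΎΏΉ
GRKCHRS = (
    "\u03ac\u03ad\u03af\u03cc\u03cd\u03ce\u03ae\u0390\u03b0"
    "\u0386\u0388\u038a\u038c\u038e\u038f\u0389"
)

# Characters that full case folding expands into more than one code point.
expanded_by_folding = [c for c in GRKCHRS if len(c.casefold()) > 1]

# Characters that canonical decomposition splits into a base letter plus
# one or more combining marks.
split_by_nfd = [
    c for c in GRKCHRS if len(unicodedata.normalize("NFD", c)) > 1
]

print(len(expanded_by_folding), len(split_by_nfd))  # 2 16
```

All 16 characters have a canonical decomposition, but case folding expands only 'ΐ' and 'ΰ', so a case-insensitive search sees a bare base letter for those two characters only.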