132389 – BASIC: Replace is only case-insensitive for ASCII characters

Summary: BASIC: Replace is only case-insensitive for ASCII characters

Status:	RESOLVED FIXED

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	BASIC
Version: (earliest affected)	unspecified
Hardware:	All All

Importance:	medium normal
Assignee:	Andreas Heinisch

URL:
Whiteboard:	target:7.0.0 target:7.2.0
Keywords:

Depends on:
Blocks:	Macro-StarBasic

Reported:	2020-04-24 22:03 UTC by Mike Kaganski
Modified:	2021-09-02 04:15 UTC
CC List:	3 users

See Also:	141045 142243 142487 110003 144245
Crash report or crash signature:

Description Mike Kaganski 2020-04-24 22:03:18 UTC

> Sub TestReplace2
>   MsgBox Replace("АБВабв", "б", "*") ' test Cyrillic characters
>   MsgBox Replace("ABCabc", "b", "*") ' test ASCII characters
> End Sub

The first call returns "АБВа*в", but the correct result is "А*Ва*в", because Replace is case-insensitive by default [1]. The second call correctly returns "A*Ca*c".

Replace should allow case-insensitive operation for non-ASCII characters, too.
Code pointer: SbRtl_Replace in basic/source/runtime/methods.cxx.

[1] https://help.libreoffice.org/6.4/en-US/text/sbasic/shared/replace.html
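
For comparison, the expected behavior can be sketched outside of BASIC. The Python snippet below is only an illustration, not the LibreOffice implementation: the helper name replace_case_insensitive is hypothetical, and it relies on Python's re module, whose IGNORECASE flag is Unicode-aware for str patterns.

```python
import re


def replace_case_insensitive(text: str, search: str, repl: str) -> str:
    """Replace every occurrence of `search` in `text`, ignoring case."""
    # re.escape makes the search string a literal pattern; re.IGNORECASE
    # is Unicode-aware for str patterns, so it also covers Cyrillic.
    pattern = re.compile(re.escape(search), re.IGNORECASE)
    # A function replacement keeps `repl` literal (no backslash handling).
    return pattern.sub(lambda _match: repl, text)


print(replace_case_insensitive("АБВабв", "б", "*"))  # А*Ва*в
print(replace_case_insensitive("ABCabc", "b", "*"))  # A*Ca*c
```

Both calls replace the upper-case and the lower-case occurrence, which is what Replace should do in its default mode.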

Comment 1 Commit Notification 2020-05-21 06:51:35 UTC

Andreas Heinisch committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/3ff159d35770ac3454ee909b348cb4f4ca8b0b9b

tdf#132389 - case-insensitive operation for non-ASCII characters

It will be available in 7.0.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.

Comment 2 Commit Notification 2021-05-13 18:04:23 UTC

Andreas Heinisch committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/698e27d29cf0612634720c818ee773bfac6c40d1

tdf#132389 - Case-insensitive operation for non-ASCII characters

It will be available in 7.2.0.

Comment 3 Stephan Bergmann 2021-05-14 14:58:52 UTC

Note that the Unicode standard defines a concept of locale-independent "default caseless matching" (D144 in section 3.13 "Default Case Algorithms", <https://www.unicode.org/versions/Unicode13.0.0/ch03.pdf>), which might be more appropriate to use here than any specific locale-dependent approach.

Comment 4 Andreas Heinisch 2021-05-14 16:31:09 UTC

This is something I cannot decide, because I do not have enough insight into locale-dependent vs. locale-independent comparison.

The linked document even defines two possible ways to do default caseless matching:

D144 
A string X is a caseless match for a string Y if and only if:
toCasefold(X) = toCasefold(Y)

D145
A string X is a canonical caseless match for a string Y if and only if:
NFD(toCasefold(NFD(X))) = NFD(toCasefold(NFD(Y)))
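
To make the difference between the two definitions concrete, here is a minimal sketch in Python. It assumes that Python's str.casefold() is an acceptable stand-in for toCasefold and uses unicodedata.normalize for NFD; the function names are hypothetical, and nothing here says how LibreOffice implements or should implement this.

```python
import unicodedata


def caseless_match(x: str, y: str) -> bool:
    """D144: toCasefold(X) = toCasefold(Y)."""
    return x.casefold() == y.casefold()


def canonical_caseless_match(x: str, y: str) -> bool:
    """D145: NFD(toCasefold(NFD(X))) = NFD(toCasefold(NFD(Y)))."""
    def fold(s: str) -> str:
        nfd = unicodedata.normalize("NFD", s)
        return unicodedata.normalize("NFD", nfd.casefold())
    return fold(x) == fold(y)


precomposed = "\u00c9"  # upper-case E with acute as a single code point
decomposed = "e\u0301"  # lower-case e followed by a combining acute accent

print(caseless_match("АБВ", "абв"))                       # True
print(caseless_match("Straße", "STRASSE"))                # True
print(caseless_match(precomposed, decomposed))            # False
print(canonical_caseless_match(precomposed, decomposed))  # True
```

D144 already handles Cyrillic and one-to-many foldings such as "ß" to "ss" without any locale, while D145 additionally treats precomposed and decomposed forms of the same character as equal.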

Is toCasefold the same method as the one defined in https://opengrok.libreoffice.org/xref/core/i18npool/source/transliteration/transliteration_Ignore.cxx?r=c6b7f555#85, or is there another implementation?