Bug 132389 - BASIC: Replace is only case-insensitive for ASCII characters
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: BASIC
Version (earliest affected): unspecified
Hardware: All All
Importance: medium normal
Assignee: Andreas Heinisch
URL:
Whiteboard: target:7.0.0 target:7.2.0
Keywords:
Depends on:
Blocks: Macro-StarBasic
Reported: 2020-04-24 22:03 UTC by Mike Kaganski
Modified: 2021-09-02 04:15 UTC
CC List: 3 users

See Also:
Crash report or crash signature:


Description Mike Kaganski 2020-04-24 22:03:18 UTC
> Sub TestReplace2
>   MsgBox Replace("АБВабв", "б", "*") ' test Cyrillic characters
>   MsgBox Replace("ABCabc", "b", "*") ' test ASCII characters
> End Sub

The first call shows "АБВа*в", but the correct result is "А*Ва*в", because Replace is case-insensitive by default [1]. The second call correctly shows "A*Ca*c".

Replace should allow case-insensitive operation for non-ASCII characters, too.
Code pointer: SbRtl_Replace in basic/source/runtime/methods.cxx.

[1] https://help.libreoffice.org/6.4/en-US/text/sbasic/shared/replace.html
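
For illustration only, the difference between the reported and the expected behaviour can be reproduced outside LibreOffice. The following is a minimal Python sketch, not the BASIC runtime code; the function name replace_ignore_case and its ascii_only parameter are hypothetical. It assumes that restricting case-insensitive matching to ASCII is a fair model of the bug.

```python
import re

def replace_ignore_case(text: str, find: str, repl: str, ascii_only: bool = False) -> str:
    """Replace every case-insensitive occurrence of `find` in `text` with `repl`."""
    flags = re.IGNORECASE
    if ascii_only:
        # re.ASCII limits case-insensitive matching to ASCII letters,
        # which models the behaviour reported in this bug.
        flags |= re.ASCII
    # re.escape treats `find` as a literal string; the lambda keeps
    # `repl` literal as well (no backslash or group expansion).
    return re.sub(re.escape(find), lambda _match: repl, text, flags=flags)

print(replace_ignore_case("АБВабв", "б", "*", ascii_only=True))  # reported: АБВа*в
print(replace_ignore_case("АБВабв", "б", "*"))                   # expected: А*Ва*в
print(replace_ignore_case("ABCabc", "b", "*", ascii_only=True))  # A*Ca*c
```

The ASCII example gives the same result in both modes, which matches the observation that only non-ASCII input is affected.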
Comment 1 Commit Notification 2020-05-21 06:51:35 UTC
Andreas Heinisch committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/3ff159d35770ac3454ee909b348cb4f4ca8b0b9b

tdf#132389 - case-insensitive operation for non-ASCII characters

It will be available in 7.0.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 2 Commit Notification 2021-05-13 18:04:23 UTC
Andreas Heinisch committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/698e27d29cf0612634720c818ee773bfac6c40d1

tdf#132389 - Case-insensitive operation for non-ASCII characters

It will be available in 7.2.0.

Comment 3 Stephan Bergmann 2021-05-14 14:58:52 UTC
Note that the Unicode standard defines a concept of locale-independent "default caseless matching" (D144 in section 3.13 "Default Case Algorithms", <https://www.unicode.org/versions/Unicode13.0.0/ch03.pdf>), which might be more appropriate to use here than any specific locale-dependent approach.
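
As a rough illustration of why case folding is a separate concept from lowercasing, here is a small Python sketch. It assumes Python's str.casefold() as a stand-in for the Unicode toCasefold operation and str.lower() as a stand-in for a plain lowercase comparison; neither is the LibreOffice implementation, and the helper names are hypothetical.

```python
def match_by_lower(x: str, y: str) -> bool:
    # Compare after plain lowercasing.
    return x.lower() == y.lower()

def match_by_casefold(x: str, y: str) -> bool:
    # Compare after Unicode case folding, which is locale-independent
    # and may expand characters (for example "ß" folds to "ss").
    return x.casefold() == y.casefold()

print(match_by_lower("АБВ", "абв"))            # True
print(match_by_lower("Straße", "STRASSE"))     # False
print(match_by_casefold("Straße", "STRASSE"))  # True
```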
Comment 4 Andreas Heinisch 2021-05-14 16:31:09 UTC
I cannot decide this, because I lack insight into locale-dependent vs. locale-independent comparison.

The linked document even defines two possible ways to do default caseless matching:

D144 
A string X is a caseless match for a string Y if and only if:
toCasefold(X) = toCasefold(Y)

D145
A string X is a canonical caseless match for a string Y if and only if:
NFD(toCasefold(NFD(X))) = NFD(toCasefold(NFD(Y)))

Is toCasefold the same as the method defined in https://opengrok.libreoffice.org/xref/core/i18npool/source/transliteration/transliteration_Ignore.cxx?r=c6b7f555#85, or is there another implementation?
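
To make the difference between D144 and D145 concrete, the two definitions can be sketched in Python. This assumes str.casefold() approximates toCasefold and uses unicodedata.normalize for NFD; the function names are hypothetical, and the sketch says nothing about which implementation LibreOffice uses.

```python
import unicodedata

def nfd(s: str) -> str:
    # Canonical decomposition (NFD).
    return unicodedata.normalize("NFD", s)

def caseless_match(x: str, y: str) -> bool:
    # D144: toCasefold(X) = toCasefold(Y)
    return x.casefold() == y.casefold()

def canonical_caseless_match(x: str, y: str) -> bool:
    # D145: NFD(toCasefold(NFD(X))) = NFD(toCasefold(NFD(Y)))
    return nfd(nfd(x).casefold()) == nfd(nfd(y).casefold())

precomposed = "\u00C9"  # "É" as a single code point
decomposed = "e\u0301"  # "e" followed by a combining acute accent

print(caseless_match(precomposed, decomposed))            # False
print(canonical_caseless_match(precomposed, decomposed))  # True
```

The two definitions differ only for strings that are canonically equivalent but encoded differently; for the Cyrillic example in the description both return the same result.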