Bug 158185 - VBA Support 1: The Range.Find method interprets "#" as a wildcard character
Summary: VBA Support 1: The Range.Find method interprets "#" as a wildcard character
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
Inherited From OOo
Hardware: All All
: medium normal
Assignee: Mike Kaganski
URL:
Whiteboard: target:24.2.0
Keywords:
Depends on:
Blocks:
 
Reported: 2023-11-12 16:41 UTC by Vladimir Sokolinskiy
Modified: 2023-11-15 13:06 UTC (History)
2 users (show)

See Also:
Crash report or crash signature:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Vladimir Sokolinskiy 2023-11-12 16:41:16 UTC
Run the following macro on a Calc document that has Sheet1:

Option VBASupport 1
Option Explicit
Sub test()
  With Worksheets("Sheet1")
    Dim r As Range
    .Range("A1").Value = "#a"
    .Range("A2").Value = "1a"
    Set r = .Cells.Find(What:="#a", LookIn:=xlValues, LookAt:=xlPart, _
            SearchOrder:=xlByRows, SearchDirection:=xlNext, MatchCase:=False, SearchFormat:=False)
    If Not (r Is Nothing) Then MsgBox "Found " & r.Address
  End With
End Sub

Result: Found $A$2

Excel VBA shows: Found $A$1

Possible reason for the discrepancy: Calc interprets the # sign as a wildcard character, Excel does not.
Comment 1 Eike Rathke 2023-11-13 00:08:24 UTC
It looks like the Calc implementation assumed that # in VBA is not supposed to find a literal # character, but is a wildcard matching a single digit instead. See https://learn.microsoft.com/en-us/office/vba/Language/Reference/User-Interface-Help/wildcard-characters-used-in-string-comparisons . So returning $A$2 as the first match looks to be correct according to that.

But, with Find() Excel apparently (to be verified) does not do what that documentation says.. and indeed https://learn.microsoft.com/en-us/office/vba/api/excel.range.find says for What only "Can be a string or any Microsoft Excel data type." and only later in Remarks mentions "To find cells that match more complicated patterns, use a For Each...Next statement with the Like operator."

It seems Calc implemented that Like operator behaviour with the Find() function (translating those wildcard operators to regular expressions).

Code pointers:
sc/source/ui/vba/vbarange.cxx ScVbaRange::Find() and ScVbaRange::Replace() and their calls to VBAToRegexp(sWhat) and see use of SC_UNO_SRCHREGEXP property.

That may be wrong, but caveat, changing it may as well break existing VBA code created with OOo/LibreOffice that relies on it.. though maybe people didn't dive that far into it.
Comment 2 Vladimir Sokolinskiy 2023-11-13 12:58:51 UTC
Hello Eike!
Thank you very much for your answer!

For myself, I distinguish between the worlds of “VBA” and “Excel”. Your first link is to the Like operator of the "VBA" world. I don't know of Excel worksheet functions or Excel object methods that support the Like operator wildcard character set.

By the way, the Excel Range.Find method, in addition to laconic documentation, has a number of, let’s say, not generally known features.
For example, macro (Excel VBA)

Sub test()
  Dim r As Range
  Set r = Cells.Find(What:="#", LookIn:=xlValues, lookat:=xlPart, SearchFormat:=False)
  MsgBox r.Address
End Sub

will find, among other things, cells with numeric content in which “###” characters are displayed due to insufficient column width.
It is unlikely that this behavior should be taken into account when emulating the Range.Find method in Calc.  :)
Comment 3 Mike Kaganski 2023-11-13 16:12:35 UTC
Excel supports wildcards in searches, including autofilters; but they are limited to the three characters that Eike implemented in LI 5.2 [1]. E.g., a search string like "?a" should find both the cells.

So the fix should be sanitizing the VBAToRegexp function, which is only used here in sc/source/ui/vba/vbarange.cxx.

Note that wildcards are alsoo used in ScInterpreter::ScMatch, which has lines

        if (rItem.meType == ScQueryEntry::ByString)
        {
            bool bIsVBAMode = mrDoc.IsInVBAMode();

            if ( bIsVBAMode )
                rParam.eSearchType = utl::SearchParam::SearchType::Wildcard;
            else
                ...

Maybe it might be unified...

[1] https://wiki.documentfoundation.org/ReleaseNotes/5.2#Support_wildcards_to_be_compatible_with_XLS/XLSX_and_with_ODF_1.2
Comment 4 Mike Kaganski 2023-11-13 16:34:03 UTC
Heh, basically, it seems enough to drop the conversion to regexp, and use SC_UNO_SRCHWILDCARD instead of SC_UNO_SRCHREGEXP :-)
Comment 5 Vladimir Sokolinskiy 2023-11-13 17:59:48 UTC
Mike, thank you!
Comment 6 Eike Rathke 2023-11-13 19:40:19 UTC
(In reply to Mike Kaganski from comment #4)
> Heh, basically, it seems enough to drop the conversion to regexp, and use
> SC_UNO_SRCHWILDCARD instead of SC_UNO_SRCHREGEXP :-)
But that would be equally wrong, if Excel Range.Find() searches for the literal text string. Then it should be none of the SC_UNO_SRCH... properties.
Comment 7 Mike Kaganski 2023-11-13 20:10:47 UTC
(In reply to Eike Rathke from comment #6)
> But that would be equally wrong, if Excel Range.Find() searches for the
> literal text string.

:-) But of course I tested that it uses wildcards ;)
Comment 8 Commit Notification 2023-11-13 22:07:00 UTC
Mike Kaganski committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/c60a9db1f2a8e2a088c6b89bcdff4901b28f2864

tdf#158185: fix Excel's Range.Find and Range.Replace wildcard recognition

It will be available in 24.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 9 Mike Kaganski 2023-11-14 05:57:53 UTC
Let's call this fixed. Both Find and Replace should now do the right job. As opposed to AutoFilter, which seems to not work before, and with https://gerrit.libreoffice.org/c/core/+/159395, which I -1'ed, because it is not enough.

Please file a new bug for AutoFilter method.
Comment 10 Mike Kaganski 2023-11-14 06:33:28 UTC
Eike: in f84b8c03462238b821724b7f504ad141c83fcf8f and 1f3357013ba1f319a3bcddf4c9a658c46e8c0390, you mentioned '\' as an escapement character in context of "search in Word". But I was under an impression that MS Office defines only '~' as the wildcard escapement character, also in Word ... what was the story there?
Comment 11 Mike Kaganski 2023-11-14 06:41:42 UTC
Possibly there's some confusion, with "wildcards" used on MS Office differently in different context.

Excel:
https://support.microsoft.com/en-us/office/using-wildcard-characters-in-searches-ef94362e-9999-4350-ad74-4d2371110adb

Word:
https://learn.microsoft.com/en-us/office/dev/add-ins/word/search-option-guidance

Access:
https://support.microsoft.com/en-us/office/examples-of-wildcard-characters-939e153f-bd30-47e4-a763-61897c87b3f4

What is implemented in LibreOffice is the Excel syntax; and backslash here is not applicable. I'd expect *at least* have the default value of css::util::SearchOptions2::WildcardEscapeCharacter (and i18nutil::SearchOptions2::WildcardEscapeCharacter) to be '~', which would let me avoid the API CHANGE that I made in the last commit, only to set the escapement character - Eike, what do you think?
Comment 12 Eike Rathke 2023-11-15 13:06:03 UTC
(In reply to Mike Kaganski from comment #10)
> Eike: in f84b8c03462238b821724b7f504ad141c83fcf8f and
> 1f3357013ba1f319a3bcddf4c9a658c46e8c0390, you mentioned '\' as an escapement
> character in context of "search in Word". But I was under an impression that
> MS Office defines only '~' as the wildcard escapement character, also in
> Word ... what was the story there?
IIRC the Find&Replace (or whatever it's called) dialog in Word uses (used?) \ backslash as escape character, but I don't really remember..
I added support for an arbitrary escape character in case Writer wanted to implement MS wildcard search as well, which it never did.


(In reply to Mike Kaganski from comment #11)
> Possibly there's some confusion, with "wildcards" used on MS Office
> differently in different context.
Possibly as well..

> What is implemented in LibreOffice is the Excel syntax; and backslash here
> is not applicable. I'd expect *at least* have the default value of
> css::util::SearchOptions2::WildcardEscapeCharacter (and
> i18nutil::SearchOptions2::WildcardEscapeCharacter) to be '~', which would
> let me avoid the API CHANGE that I made in the last commit, only to set the
> escapement character - Eike, what do you think?
To me, the \ backslash escape character is more "natural" (like in all shell and regex context), and the ~ tilde character is only some Microsoft quirk. The wildcard search itself using ?* is not MS-specific, only ~ is.

Changing the default value would be not only an API CHANGE but an INCOMPATIBLE API CHANGE, so I'd rather avoid that.