Bug 144723 - regular expression in find and replace is case insensitive
Summary: regular expression in find and replace is case insensitive
Status: RESOLVED NOTABUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version: 7.2.1.1 rc (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-09-25 15:55 UTC by johnks
Modified: 2021-09-26 10:23 UTC
CC List: 2 users

See Also:
Crash report or crash signature:


Attachments
test for regex find and replace (10.77 KB, application/vnd.oasis.opendocument.spreadsheet)
2021-09-25 15:56 UTC, johnks
screenshot of find and replace regex (119.42 KB, image/png)
2021-09-25 15:57 UTC, johnks
Screencast using https://regex101.com/ (260.49 KB, image/gif)
2021-09-25 19:01 UTC, Mike Kaganski

Description johnks 2021-09-25 15:55:20 UTC
Description:
I suppose this is a bug in the Find and Replace "Regular expressions" checkbox: it accepts a regex, but matches it case-insensitively, while =regex correctly respects case in expressions. The attached screenshot shows Find and Replace finding the text while =regex does not.

Steps to Reproduce:
1. Open the file regex.odt. Check that the regex is working, with both small "p" and capital "P", in the two rows. The same is confirmed by checking the string and expression on online regex testers like https://regex101.com/ and https://www.regextester.com/
2. Now copy the small "p" string, open Find and Replace (Ctrl+H) and select "Regular expressions".


Actual Results:
With "Regular expressions" checked, Find and Replace should not find anything, because the search string contains a small "p" while the text contains a capital "P". Find and Replace still finds it. This is wrong behaviour.

Expected Results:
Find and Replace should follow =regex and be case sensitive, so that building expressions is the same in Find and in formulae.




Reproducible: Always


User Profile Reset: No



Additional Info:
Version: 7.2.1.2 / LibreOffice Community
Build ID: 87b77fad49947c1441b67c559c339af8f3517e22
CPU threads: 4; OS: Linux 5.11; UI render: default; VCL: gtk3
Locale: en-IN (en_IN); UI: en-US
Flatpak
Calc: threaded
Comment 1 johnks 2021-09-25 15:56:50 UTC
Created attachment 175271 [details]
test for regex find and replace
Comment 2 johnks 2021-09-25 15:57:13 UTC
Created attachment 175272 [details]
screenshot of find and replace regex
Comment 3 Buovjaga 2021-09-25 16:37:59 UTC
And did you check "Match Case"?

Your steps are not very clear. You should be more exact.
Comment 4 Mike Kaganski 2021-09-25 16:46:45 UTC
(In reply to johnks from comment #0)
> Expected Results:
> Find and Replace should follow =regex and be case sensitive, so that
> building expressions is the same in Find and in formulae.

No. Find and Replace has its own control (visible on your screenshot: "Match case" under the Find box). You may use flags to change this. Just because a new spreadsheet function was introduced in 6.2 doesn't mean that a dialog people are familiar with should stop following its checkbox and start working like the function, which has no checkbox :)

This is not a bug.
Comment 5 johnks 2021-09-25 17:16:04 UTC
(In reply to Mike Kaganski from comment #4)
> (In reply to johnks from comment #0)
> > Expected Results:
> > Find and Replace should follow =regex and be case sensitive, so that
> > building expressions is the same in Find and in formulae.
> 
> No. Find and Replace has its own control (visible on your screenshot:
> "Match case" under the Find box). You may use flags to change this. Just
> because a new spreadsheet function was introduced in 6.2 doesn't mean that
> a dialog people are familiar with should stop following its checkbox and
> start working like the function, which has no checkbox :)
> 
> This is not a bug.

No. My understanding is this. When you use regular expressions, you put all the work of finding and filtering text on the regular expression itself and not on the software.

No other regex tester or even the =regex formula assumes the user should use a checkbox to select or unselect case sensitivity. That is the job of the regular expression itself. A regular expression must find text on its own and not expect users to preselect something else. Please understand what I am trying to say.
Comment 6 johnks 2021-09-25 17:17:59 UTC
This is also why selecting "Regular expressions" disables the other options, "Wildcards" and "Similarity search": their work is being done by the regular expression. Why should "Match case" not be disabled like them?
Comment 7 m_a_riosv 2021-09-25 17:41:47 UTC
You can take a look at https://bugs.documentfoundation.org/show_bug.cgi?id=78840
Comment 8 Mike Kaganski 2021-09-25 17:55:50 UTC
(In reply to johnks from comment #5)
> no. my understanding is this.

False.
In any case, regular expressions have *some* initial state of the flags that define their behavior. In the dialog's case, the initial state is controlled by the checkbox. This is convenient, *consistent* (all controls that are applicable to all modes work uniformly), and does not limit anyone, because if you prefer, you can just set the case-sensitive mode.

Again: this is *not* a bug.
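
The idea that the checkbox supplies the initial flag state can be sketched in Python. This is only an illustration: it uses Python's re module as a stand-in for the ICU engine, and the function name find_matches and its match_case parameter are hypothetical names that model the "Match case" checkbox, not LibreOffice code.

```python
import re

def find_matches(pattern: str, text: str, match_case: bool = False) -> list:
    """Hypothetical model of a search dialog: the checkbox picks the initial flags."""
    # An unchecked "Match case" box corresponds to compiling with the ignore-case flag.
    flags = 0 if match_case else re.IGNORECASE
    return re.findall(pattern, text, flags)

text = "Page one, page two"  # made-up sample text
print(find_matches(r"p\w+", text))                   # ['Page', 'page']
print(find_matches(r"p\w+", text, match_case=True))  # ['page']
```

The pattern is identical in both calls; only the flag chosen by the caller differs.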
Comment 9 Mike Kaganski 2021-09-25 18:04:45 UTC
For the reference:

LibreOffice uses the ICU regex engine. You may look at 'Case Insensitive Matching' in its help page [1]. It reads:

> Case insensitive matching is specified by the UREGEX_CASE_INSENSITIVE flag
> during pattern compilation ...

This is roughly the same for any other regex engine: the software defines the case sensitivity applied to the *compiled regex*, and then the user *may* redefine the flags inside the regex.

You seem not to be familiar with this concept, but you may look at the re.IGNORECASE flag in Python's re module [2], which does the same when passed to re.compile.

So the checkbox in the dialog controls just that, and it is consistent and correct.

[1] https://unicode-org.github.io/icu/userguide/strings/regexp.html#case-insensitive-matching
[2] https://docs.python.org/3/library/re.html#re.I
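
A short sketch of the two layers described above, using Python's re module from reference [2] rather than ICU itself. The sample text is made up. The scoped inline form (?-i:...) needs Python 3.6 or later; ICU's user guide describes comparable inline flag settings, but the exact syntax there should be checked against [1].

```python
import re

text = "Total Price"  # made-up sample text

# Layer 1: the host program chooses the flags when it compiles the pattern.
sensitive = re.compile(r"price")
insensitive = re.compile(r"price", re.IGNORECASE)

print(sensitive.search(text) is None)    # True: "price" does not equal "Price"
print(insensitive.search(text).group())  # Price

# Layer 2: the pattern author can redefine the flags inside the regex.
# (?-i:...) switches case-insensitivity off for the enclosed group only.
overridden = re.compile(r"(?-i:price)", re.IGNORECASE)
print(overridden.search(text) is None)   # True

# (?i) at the start switches it on even when the host passed no flag.
opted_in = re.compile(r"(?i)price")
print(opted_in.search(text).group())     # Price
```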
Comment 10 Mike Kaganski 2021-09-25 19:01:33 UTC
Created attachment 175274 [details]
Screencast using https://regex101.com/

(In reply to johnks from comment #0)
> the same is confirmed by checking the string and expression on online regex
> testers like https://regex101.com/ and https://www.regextester.com/

(In reply to johnks from comment #5)
> no other regex tester or even the =regex formula assumes user should use a
> checkbox to select or unselect case sensitivity.

Please see the attached screencast. You may check the same in https://www.regextester.com/ yourself by clicking on its "flags" control.
Comment 11 Buovjaga 2021-09-26 06:18:00 UTC
(In reply to johnks from comment #5)
> no other regex tester or even the =regex formula assumes user should use a
> checkbox to select or unselect case sensitivity. that is the job of the
> regular expression itself. a regular expression must find text on its own
> and not expect users to pre select something else. please understand what i
> am trying to say

Plenty of other editors behave like LibreOffice, for example Kate editor.
Comment 12 Mike Kaganski 2021-09-26 08:55:04 UTC
(In reply to Buovjaga from comment #11)
> Plenty of other editors behave like LibreOffice, for example Kate editor.

Indeed. E.g.:

1. Linux: gedit: https://i.imgur.com/CkCK2Ti.png
2. Cross-platform: VSCode: https://i.imgur.com/PJfM7U9.png (Match Case and Regular expressions selected together)
3. Windows: notepad++: https://i.imgur.com/bg0Tkrn.png

And so on...
Comment 13 johnks 2021-09-26 09:36:41 UTC
(In reply to Mike Kaganski from comment #12)
> (In reply to Buovjaga from comment #11)
> > Plenty of other editors behave like LibreOffice, for example Kate editor.
> 
> Indeed. E.g.:
> 
> 1. Linux: gedit: https://i.imgur.com/CkCK2Ti.png
> 2. Cross-platform: VSCode: https://i.imgur.com/PJfM7U9.png (Match Case and
> Regular expressions selected together)
> 3. Windows: notepad++: https://i.imgur.com/bg0Tkrn.png
> 
> And so on...

I see your point, but check this screenshot of kwrite: https://ibb.co/5GDFbHP
The match case button is selected by default. If I unselect it in kwrite, the search becomes case insensitive; otherwise it is case sensitive by default, and I assume all the mentioned programs are too. Can you please confirm whether going into regular expressions selects match case by default or not? It does for me in kwrite but not in LibreOffice.
Comment 14 Mike Kaganski 2021-09-26 10:23:19 UTC
(In reply to johnks from comment #13)

You are shifting the goalposts - for the second time ;)

It absolutely does not matter what kwrite chooses as its defaults. You may check that different applications have different defaults (and even different sets of supported flags). For example, look at comment 10 and at the *two web services that you suggested for testing*: the two services have different default values for the "multiline" flag. This shows that it is up to the software to decide the default value for the mode.
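
How a host-chosen default changes the result of one and the same pattern can be sketched with the "multiline" flag. Python's re module is used here purely as an illustration, and the sample text is made up; it does not reproduce either web service.

```python
import re

text = "alpha\nbeta\ngamma"  # made-up three-line sample
pattern = r"^\w+$"

# Multiline off (Python's default): ^ and $ anchor only at the ends of the whole string.
print(re.findall(pattern, text))                # []

# Multiline on: ^ and $ also anchor at every line break.
print(re.findall(pattern, text, re.MULTILINE))  # ['alpha', 'beta', 'gamma']
```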

gedit and VSCode both default to "case insensitive"; I didn't check notepad++ in that regard (I don't want to reset its settings). But even if they all had case-sensitive by default, it wouldn't change anything. LibreOffice is a separate application, and it is up to LibreOffice to decide which mode is most reasonable for the majority of *its* users.

Anyway, *given that it is fine to control the flag using the checkbox* (as we already agreed), it is absolutely fine to *not* change the checkbox state when *another* checkbox changes (they are *independent*). And further, given that the *default* mode is case-insensitive plain search, it is absolutely reasonable that when you check regular expressions from that default mode, case sensitivity is not enabled. Doing otherwise would be unexpected and wrong (creating hidden unobvious relations between settings).
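
The independence of the two settings can be modelled as two unrelated fields. This is a minimal sketch; the class SearchOptions and its field names are hypothetical and are not taken from LibreOffice's code.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class SearchOptions:
    """Hypothetical model of two independent dialog checkboxes."""
    regular_expressions: bool = False  # default: plain search
    match_case: bool = False           # default: case-insensitive

defaults = SearchOptions()

# Ticking "Regular expressions" changes only that one field.
with_regex = replace(defaults, regular_expressions=True)
print(with_regex)  # SearchOptions(regular_expressions=True, match_case=False)
```

Coupling the fields, so that enabling one silently flips the other, would be the hidden relation between settings that the comment argues against.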

The only thing that actually needs fixing is retaining the "case sensitivity" flag across sessions, similar to bug 112271. But that is a different issue and needs its own report (if one does not exist yet), because this one was created with wrong premises, already contains three different ideas, and would be confusing and unmanageable if converted to "let's keep the value of the checkbox across sessions".