Description: Find returns the beginning or the end of string found when using regex. The easiest way to explain this is an example: The string result formula 123456789 5 =SZUKAJ.TEKST("3?5"; A1) 123456789 3 =SZUKAJ.TEKST("345"; B2) 123456790 3 =SZUKAJ.TEKST("3.5"; B3) 123456791 5 =SZUKAJ.TEKST("3*5"; B4) 123456792 3 =SZUKAJ.TEKST("3[0-9]5"; B5) 123456795 3 =SZUKAJ.TEKST("3[0-9]*5"; B6) (Polish szukaj.text = find) One can notice that when we use "?" or "*" we get the position of the last character. When "." or [0-9] or [0-9]* is used the position of the first character is returned which, I suppose, is correct. Steps to Reproduce: 1. Use find function with '.' and '?' locate some substring in string e.g "345" (test "345", "3.5" and "3?5") in "123456789". 2. Compare the results (returned position) Actual Results: when searching "345" in "123456789" position 3 is returned (correct) when searching "3?5" in "123456789" position 5 is returned (not correct) when searching "3*5" in "123456789" position 5 is returned (not correct) when searching "3[0-9]5" in "123456789" position 3 is returned (correct) Expected Results: I suppose that always the begining of found substring should be returned (3 in the above example). Reproducible: Always User Profile Reset: No OpenGL enabled: Yes Additional Info: Tested on 6.0.5.1(Linux), 6.0.5.2 (win), 6.0.2.1 (win). Always the same.
Created attachment 143267 [details] An example in .ODS
Text search using glob() sytle "wildcard" syntax was implemented for bug 72196 [1] and compatibility with MS Excel products. Otherwise IIUC Search and our Regex implementation is provided by ICU libs. They are completely different search implementations. Personally I see no reason to go back in and adjust the search logic or resulting pointer--but others can decide that. @Eike? =-ref-= [1] https://cgit.freedesktop.org/libreoffice/core/commit/?id=f1a4663c819bf698f95a75b5a3319506c66f2778
This has nothing to do with Wildcard search, the document has Regular Expressions enabled. This is not a bug. One has to understand how regular expressions work. Specifically the ? and * operators can not be used the same as in wildcards. 3?5 means zero or one 3 followed by one 5. For the sequence "345" this matches only 5. 3*5 means zero or more 3 followed by one 5. For the sequence "345" this matches only 5. For the available metacharacters, operators and set expressions in ICU and thus LibreOffice see http://userguide.icu-project.org/strings/regexp
Actually I've got this problem when striping the last digit (and possibly a character) from the name names of samples from some analyses. I've got such names as in first column and would like to strip last digit and possibly letter. I was surprised to get different results for "KHe3r" and "KHe3" when using "[0-9]?$". One can deal with the problem with "[0-9][a-z]{0,1}$" which work as I expected. =SZUKAJ.TEKST(REGEX; AX) "[0-9]?$" stripped text* "[0-9][a-z]{0,1}$" stripped text* K-7dpi3 7 K-7dpi 7 K-7dpi KHe1 4 KHe 4 KHe KHe2s 6 KHe2s 4 KHe KHe3 4 KHe 4 KHe KHe3r 6 KHe3r 4 KHe KL101a 7 KL101a 5 KL10 KL102 5 KL10 5 KL10 KL103 5 KL10 5 KL10 KL10-3dpi1 10 KL10-3dpi 10 KL10-3dpi KL10-3dpi2 10 KL10-3dpi 10 KL10-3dpi MM-10dpi1 9 MM-10dpi 9 MM-10dpi *stripped text returned by =FRAGMENT.TEKSTU(A1; 1; [result from the coulumn with position]-1) Thank you for clarifying the problem. I describe the solution which works for me. Maybe it will be helpful for somebody.
Testing expressions in an online tester (for example https://regex101.com/) may help, it even gives some explanation about the expression entered and if quantifiers are greedy/docile/lazy/possessive. The default PCRE dialect is fine and quite close to the Java inspired regex of ICU, though there are subtle differences. Note that the match offsets in that tester (and others) are 0-based while in Calc they are 1-based. For the quantifiers subtleties see https://www.rexegg.com/regex-quantifiers.html