Bug 126764 - REGEX() function with an integer as 'Occurence' returns the complete 'Text' if the respective match doesn't exist. It should return #N/A.
Summary: REGEX() function with an integer as 'Occurence' returns the complete 'Text' i...
Status: CLOSED NOTABUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
6.3.0.2 rc
Hardware: All All
: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-08-08 09:51 UTC by Wolfgang Jäger
Modified: 2019-10-16 19:50 UTC (History)
1 user (show)

See Also:
Crash report or crash signature:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Wolfgang Jäger 2019-08-08 09:51:17 UTC
To justify my choice of "major" for severity see last paragraph!

Example

Current behaviour:
=REGEX("12-34-56";"[^-]*-";"";3) returns "12-34-56".

Expected behaviour and reasons:
#N/A should be returned.
Since the ordered replacement couldn't be performed due to the lack of a third match, the actual behaviour is gravely misleading. 

Additional explanations:
Some real applications will need much more complicated formulae to get a workaround under the current behaviour.
In an actual use-case I need to rely on something like
=LEFT(REGEX($A9;$E$2;"";1);IF(CURRENT()=$A9;0;1)) instead of the clear and simple
=IFNA(LEFT(REGEX($A9;$E$2;"";1);1);"")

In more complicated cases it may get difficult if not impossible to catch the error without helper cells and without relying on functions (SEARCH() e.g.)capable of applying RegEx ONLY IF the respective global setting is enabled.

[I do not know how google sheets do whatever concerning REGEX(). Again a case where conflicts concerning quality vs compatibility may occur. I would appreciate quality higher generally.]

Since the REGEX() function was only implemented recently (V6.1) it may still be feasible to change that without endangering the functionality of currently working sheets. therefore a fix should be created REALLY FAST.
Comment 1 Eike Rathke 2019-08-08 12:29:09 UTC
There is no 3rd occurrence of the expression in text, hence nothing is replaced. This essentially is similar to =REGEX("a";"b";"") and is standard replacement behaviour in tools like sed and awk, if there is no match then nothing is replaced. Specifically gawk's gensub() that knows the occurrence parameter as well behaves the same:

  gawk -e '{print gensub(/b/,"",1)}'

Try with different input of "a" and "b".

If you need to check if there actually is a 3rd occurrence to be replaced then do an explicit check:

  =IF(ISNA(REGEX("12-34-56";"[^-]*-";;3));NA();REGEX("12-34-56";"[^-]*-";"";3))

Unconditionally returning #N/A if there is no match in the replace case IMHO is not an option, otherwise the usual logic of replacements with regular expressions would be negated and all replacements would have to use IFNA().

What we maybe could do is add an optional parameter to the function whether #N/A shall be returned in replace cases or not. That would have to be hidden in the Function Wizard and input hints as we have UI translations freeze for 6.3, but in general would be possible even for 6.3. Earlier releases reading such additional parameter if present of course would fail then.
Comment 2 Wolfgang Jäger 2019-08-11 12:58:44 UTC
(In reply to Eike Rathke from comment #1)
> There is no 3rd occurrence of the expression in text, hence nothing is
> replaced. This essentially is similar to =REGEX("a";"b";"") and is standard
> replacement behaviour in tools like sed and awk, ...

Sorry, I lack the needed knowledge, and I am afraid I will not be able to make it available to me soon(or at all).

> If you need to check if there actually is a 3rd occurrence to be replaced
> then do an explicit check:
> =IF(ISNA(REGEX("12-34-56";"[^-]*-";;3));NA();REGEX("12-34-56";"[^-]*-";"";3))

Well, I have to apologize that I had not even studied the help on REGEX() thorougly enough. Having caught up on this, however, I still cannot easily be satisfied with using presence/absence of an optional parameter this way. Does not cause inconsistencies, but also doesn't fit into any scheme or concept I would know of.

> Unconditionally returning #N/A if there is no match in the replace case IMHO
> is not an option, otherwise the usual logic of replacements with regular
> expressions would be negated and all replacements would have to use IFNA().

You are right.

> What we maybe could do is add an optional parameter to the function whether
> #N/A shall be returned in replace cases or not. That would have to be hidden
> in the Function Wizard and input hints as we have UI translations freeze for
> 6.3, but in general would be possible even for 6.3. Earlier releases reading
> such additional parameter if present of course would fail then.

No actual need to get into tat trouble, I think.

I would suggest to set this bug RESOLVED NOTABUG, but I hesitate to do it myself.