70629 – Searching with regular expressions doesn't give expected result

Summary: Searching with regular expressions doesn't give expected result

Status:	CLOSED NOTABUG

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	Writer
Version: (earliest affected)	4.1.1.2 release
Hardware:	x86 (IA32) Linux (All)

Importance:	medium normal
Assignee:	Not Assigned

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2013-10-18 18:59 UTC by wettererscheinung
Modified:	2013-10-21 14:43 UTC
CC List:	1 user

See Also:	70627
Crash report or crash signature:

Description wettererscheinung 2013-10-18 18:59:56 UTC

When I make a search using regular expressions, like the following:

\{06(>|)(1|2|3|4|)(: |)([:alpha:]|[:space:]|_|)+\}.+\{(/|)06(>|)(1|2|3|4|)(: |)([:alpha:]|[:space:]|_|)+\}

in the following text:

{01>1}{02>1>2}Nam liber{06>3} tempor cum soluta nobis eleifend{/06>3} option congue nihil imperdiet doming id quod mazim placerat facer possim assum. Lorem ipsum dolor sit amet, consectetuer adipiscing elit,{05>1} sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat. Ut wisi enim ad minim veniam, quis nostrud {06>3}exerci tation ullamcorper suscipit lobortis nisl {/06>3}ut aliquip ex{/05>1: Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis} ea commodo consequat.  {/02>1>2} {/01>1}

The highlighted result does not run, as expected, from the first occurrence (opening tag) to the next closing tag, but rather from the first occurrence to the last closing tag in that paragraph.

Is there a possibility to change that?

-> See also Bug 70627 "LibreOffice-Writer crashes entering a complicated search string in find&replace dialogue" where I tried to do a workaround which leads to a crash.

Comment 1 Eike Rathke 2013-10-21 14:43:03 UTC

The .+ between the opening and closing tags matches any character one or more times, and as many times as possible (it is greedy). The pattern match does exactly what you asked it to do ;-)
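
To make the greedy behaviour concrete, here is a minimal sketch using Python's re module rather than LibreOffice's ICU engine. The tag pattern is reduced to the literal {06>3} pair and the sample text is made up for illustration; both are assumptions, not the reporter's full expression.

```python
import re

# Shortened, made-up sample with two tagged spans in one paragraph.
text = "a{06>3} one{/06>3} b {06>3}two {/06>3}c"

# Greedy: .+ consumes as much as it can and then backtracks only as far
# as needed, so the match ends at the LAST closing tag in the text.
greedy = re.search(r"\{06>3\}.+\{/06>3\}", text)
print(greedy.group(0))
```

The single match spans both tagged sections, which is the behaviour described in the report.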

To find the first possible match of the closing tag, the "find any" part of the pattern needs to exclude such a closing tag. In the simplest case this could be [^{]+ instead of .+, provided the text between the tags cannot contain a { character. If it can, the expression gets more complicated and may involve look-ahead and whatnot. For available regex operators see http://userguide.icu-project.org/strings/regexp#TOC-Regular-Expression-Operators
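
The two approaches can be sketched roughly as follows, again in Python's re module and with a simplified literal tag pair and made-up sample text (both assumptions). The look-ahead form shown is one possible construction, not necessarily the one meant above; ICU lists the same (?! ... ) operator, but this sketch has not been run inside Writer's search dialog.

```python
import re

OPEN, CLOSE = r"\{06>3\}", r"\{/06>3\}"

# Made-up sample: the first 06 span contains another tag, the second does not.
text = "x{06>3}a {05>1}b{/05>1} c{/06>3} y {06>3}d{/06>3}"

# Variant 1: [^{]+ instead of .+
# Only finds spans that contain no "{" between the tags.
no_brace = re.findall(OPEN + r"[^{]+" + CLOSE, text)

# Variant 2: negative look-ahead
# Consume one character at a time, but only while the closing tag
# does not start at the current position.
tempered = re.findall(OPEN + r"(?:(?!" + CLOSE + r").)+" + CLOSE, text)

print(no_brace)
print(tempered)
```

Variant 1 skips the first span because of the inner {05>1} tag, while variant 2 stops each match at the nearest closing tag.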

If the same tags can appear nested, it is even more cumbersome to match the corresponding closing tag. And if such tags are arbitrary, it gets nearly impossible to match all combinations; this is similar to all the hopeless attempts to parse full-flavored HTML using regular expressions. See also http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454
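
For nested tags, one alternative outside the Find & Replace dialog is to use a regular expression only to locate the individual tags and to do the pairing with a stack. The following Python sketch assumes a simplified, hypothetical tag grammar guessed from the sample text in the description ({06>3} opens, {/06>3} closes, optional ": note" suffix); the function name paired_spans is likewise made up for illustration.

```python
import re

# Hypothetical simplified tag grammar: {06>3} opens, {/06>3} closes,
# and a tag may carry a ": note" suffix before the closing brace.
TAG = re.compile(r"\{(/?)(\d+(?:>\d+)*)(?::[^}]*)?\}")

def paired_spans(text):
    """Return each properly paired span, in the order the closing tags appear."""
    stack = []   # (tag name, start offset) of tags that are still open
    spans = []
    for m in TAG.finditer(text):
        closing, name = m.group(1) == "/", m.group(2)
        if not closing:
            stack.append((name, m.start()))
        elif stack and stack[-1][0] == name:
            _, start = stack.pop()
            spans.append(text[start:m.end()])
        # A closing tag that does not match the innermost open tag is ignored.
    return spans

nested = "{06>3}a{06>3}b{/06>3}c{/06>3}"
print(paired_spans(nested))
```

Because the stack remembers which opening tag is innermost, the inner and outer {06>3} spans are paired correctly, which a single regular expression without recursion cannot do in general.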