Bug 150794 - Incorrect replace with regular expressions
Summary: Incorrect replace with regular expressions
Status: RESOLVED NOTABUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 7.3.5.2 release
Hardware: All Linux (All)
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-09-05 13:03 UTC by Mark van Rossum
Modified: 2022-09-06 13:39 UTC
CC List: 1 user

See Also:
Crash report or crash signature:


Attachments

Description Mark van Rossum 2022-09-05 13:03:06 UTC
Description:
When replacing with regular expressions, the new paragraph character remains, even though it was meant to be replaced.


Steps to Reproduce:
Create a new document with the following content:
1
2

Replace '1$' with 'x'.
This should give:
x2


Actual Results:
However, the new paragraph character remains:
x
2
That is, the paragraph break is not removed.


Expected Results:
x2


Reproducible: Always


User Profile Reset: No



Additional Info:
Maybe this is as intended, but editors such as Kate do this correctly.
Comment 1 Mike Kaganski 2022-09-05 13:46:12 UTC
While the general inability of Writer to search across several paragraphs is itself a real issue, this specific case is not a bug.

1. There is *no* "paragraph character" in Writer at all. Paragraphs are separate objects, and no character separates them. A Writer document is not plain text with a CR, LF, or any other single character carrying that special meaning.

2. More importantly: in regular expressions, $ is not a metacharacter that *matches such an end-of-paragraph character*, but a predicate (a simple form of look-ahead assertion). It does not itself match any character; it only makes the adjacent expression match when that expression is at the end (of either a paragraph or the whole text). It should *not* "select" whatever separates paragraphs, so Kate's behavior is not "correct" in that regard; it uses some extension of the convention.

See the regular expression syntax documentation [1] used in LibreOffice, which states explicitly:

> $	Match at the end of a line. Line terminating characters are \u000a,
> 	\u000b, \u000c, \u000d, \u0085, \u2028, \u2029 and the sequence \u000d
> 	\u000a.

Note the wording "match /at/ the end": the "at" means that the end itself is not matched, only the characters that appear at that end.

You can also test regexes, and the result of replacing "1$" with anything, at resources such as https://regex101.com/.
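The zero-width behavior of $ can also be reproduced in a plain-text engine. The sketch below uses Python's `re` module purely as a stand-in: LibreOffice uses ICU, and Python's `$` in `MULTILINE` mode only treats `\n` as a line end, not the full ICU list of terminators quoted above. The two-line string is an assumed plain-text analogue of the two paragraphs in the report.

```python
import re

# Assumed plain-text analogue of the two paragraphs "1" and "2".
text = "1\n2"

# '$' is a zero-width assertion: it only requires that '1' sits at a line end.
# The newline itself is not part of the match, so it survives the replacement.
anchored = re.sub(r"1$", "x", text, flags=re.MULTILINE)

# To remove the break in plain text, the terminator must be matched explicitly.
consumed = re.sub(r"1\n", "x", text)

print(repr(anchored))  # 'x\n2'
print(repr(consumed))  # 'x2'
```

The first result matches what Writer does; the second is what the report expected, and it only happens when the line terminator is part of the pattern.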

[1] https://unicode-org.github.io/icu/userguide/strings/regexp.html
Comment 2 Eike Rathke 2022-09-06 13:39:44 UTC
Confusion may have arisen from the help page
https://help.libreoffice.org/7.4/en-GB/text/shared/01/02100001.html?&DbPAR=WRITER
which states for $, in its second paragraph:
"$ on its own matches the end of a paragraph. This way it is possible to search and replace paragraph breaks."
The emphasis, though, is on *on its own*: searching *only* for $ matches the paragraph end, which can then be replaced, whereas in combination with other text, as in a$, the $ only anchors that text to the paragraph end.
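In a generic plain-text engine, even a lone $ stays zero-width, so the special handling described in the help appears to be a Writer-specific convention rather than standard regex semantics. A minimal illustration, again using Python's `re` module only as an assumed stand-in for a plain-text engine (not ICU, and not Writer's search):

```python
import re

# Assumed plain-text analogue of two paragraphs "a" and "b".
text = "a\nb"

# A lone '$' matches the empty position at each line end. Substituting it
# inserts text there instead of deleting the line break.
marked = re.sub(r"$", "|", text, flags=re.MULTILINE)

print(repr(marked))  # 'a|\nb|'
```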