Bug 112575 - Improve handling of paragraph endings in regular expression replacements
Summary: Improve handling of paragraph endings in regular expression replacements
Status: RESOLVED DUPLICATE of bug 108256
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 5.4.1.2 release
Hardware: All All
Importance: low enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Find-Search
 
Reported: 2017-09-22 14:40 UTC by Daniel Grigoras
Modified: 2018-02-17 11:13 UTC

See Also:
Crash report or crash signature:
Regression By:


Attachments
sample document with desired "Reset Value: " string (25.76 KB, application/vnd.oasis.opendocument.text)
2017-09-23 15:30 UTC, V Stuart Foote

Description Daniel Grigoras 2017-09-22 14:40:00 UTC
Description:
I have not found a regular expression for searching and replacing combinations of paragraph marks and text.

If I search for \n in my document, all that is found are the paragraph marks inserted before figure captions found in the frames of the figures.

Also, the help page on regular expressions (https://help.libreoffice.org/Common/List_of_Regular_Expressions) wrongly states that \n is for searching line breaks inserted via Shift+Enter. Shift+Enter inserts a page break and not a line break.

MS Word has a "Special" button in the Search & Replace window to easily pick regular expressions. This should also be a feature of LibreOffice.

Please correct this issue and make \n or \p to represent a paragraph mark.

Use case: I want to search for certain lines containing certain text and replace that text together with the line on which that text was placed. I can't make any such replacement as \nTextToReplace is not found.

Steps to Reproduce:
-

Actual Results:  
-

Expected Results:
-


Reproducible: Always

User Profile Reset: No

Additional Info:


User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:55.0) Gecko/20100101 Firefox/55.0
Comment 1 Dieter 2017-09-22 15:00:50 UTC
Duplicate of 108256 ?
Comment 2 Daniel Grigoras 2017-09-22 15:07:25 UTC
(In reply to Dieter Praas from comment #1)
> Duplicate of 108256 ?

No, though the content of my reply there is similar to what I reported here.
The main issue reported there, as here, is indeed the lack of a regular expression, or of a regular expression meaning, for paragraph marks.
Here I decided to ask for a dedicated regular expression for paragraph marks and also for an enhancement that makes picking and inserting regular expressions easy, as in MS Word.
Comment 3 Jacques Guilleron 2017-09-22 17:13:52 UTC
Hi Daniel,

Did you try the $ character with Regular expressions ticked under Options?
Comment 4 V Stuart Foote 2017-09-22 18:05:18 UTC
(In reply to Jacques Guilleron from comment #3)
> Hi Daniel,
> 
> Did you try the $ character with Regular expressions ticked under Options?

With Regular expressions enabled use:
$ for paragraph marks
\n for line breaks

Replace with an empty string to clear them. Unfortunately, they don't work in buffer syntax.
Comment 5 Daniel Grigoras 2017-09-22 18:12:52 UTC
$ for paragraph mark is actually counterintuitive.

If \n stands for manual line break, then \p should have been the regular expression for a paragraph mark.

The issue is that I cannot search for $TextToReplace, while in MS Word I can search for ^13TextToReplace and have it replaced with what I want, with nothing in this case.
Comment 6 V Stuart Foote 2017-09-22 21:27:10 UTC
(In reply to Daniel Grigoras from comment #5)
> $ for paragraph mark is actually counterintuitive.
> 
> If \n stands for manual line break, then \p should have been the regular
> expression for a paragraph mark.

No, this is just the way the legacy OOo search and its use of the ICU-based regular expression syntax have evolved.

LibreOffice implements the ICU string search with current ICU (59.1) libraries: 
http://userguide.icu-project.org/strings/regexp

ICU Regex treats "$" as "Match at the end of a line. Line terminating characters are \u000a, \u000b, \u000c, \u000d, \u0085, \u2028, \u2029 and the sequence \u000d \u000a."

But there is no ICU regular expression for "paragraph mark"; rather, that is an LO-defined 'end of paragraph object' (CH_PAR), which is notated in search with $ and represented on the document canvas with the Unicode PILCROW glyph (\u00b6) in all fonts. You cannot search for the PILCROW itself, only for the "$" notation.

The "line break" (CH_BREAK) symbol "\n", or its Unicode code point "\u000a", is searchable.

And as noted in bug 108256, a Replace with "\n" will convert the line break to a paragraph end mark. Not wrong, just unusual for more adept regex users.

> 
> The issue is that I cannot search for $TextToReplace, while in MS Word I can
> search for ^13TextToReplace and have it replaced with what I want, with
> nothing in this case.

$TextToReplace has no meaning: in ICU regex the $ is the last position in the match.
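
A minimal sketch of why "$" followed by literal text cannot match. It uses Python's re module as a stand-in, which is an assumption: Python's engine is neither ICU nor LibreOffice's search, but "$" is a zero-width end-of-line assertion in both. The sample string and the idea of modelling paragraphs as lines separated by "\n" are hypothetical.

```python
import re

# Hypothetical model: each paragraph is one line, separated by "\n".
# This is NOT how LibreOffice stores paragraphs; it only illustrates "$".
text = "Intro\nTextToReplace here\nOutro"

# "$" consumes no characters. After it matches, the engine still sits
# in front of the "\n" (or at the very end), so a literal "T" cannot follow.
after_anchor = re.search(r"$TextToReplace", text, flags=re.MULTILINE)

# With the anchor last, the pattern works: text first, then end of line.
before_anchor = re.search(r"here$", text, flags=re.MULTILINE)

# Matching the separator itself consumes it, comparable to the
# "^13TextToReplace" search described for Word in this thread.
with_separator = re.search(r"\nTextToReplace", text)

print(after_anchor)                  # None
print(before_anchor.group())         # here
print(repr(with_separator.group()))  # '\nTextToReplace'
```

In this model the request in this report amounts to asking for a token that consumes the paragraph boundary, rather than one that only asserts its position.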

=-ref-=
ICU (59.1) String search - regexp
http://userguide.icu-project.org/strings/regexp

LibreOffice Help
https://help.libreoffice.org/Writer/Using_Wildcards_in_Text_Searches

https://help.libreoffice.org/Common/List_of_Regular_Expressions
Note specifics of handling of $ and \n in LibreOffice--the help article is correct.
Comment 7 Daniel Grigoras 2017-09-23 14:44:33 UTC
$TextToReplace has no meaning?
What do you mean?

If ^13TextToReplace has a useful and usable meaning in MS Word, why shouldn't $TextToReplace have a useful and usable meaning in LibreOffice Writer?

Stuart, maybe you have difficulties understanding abstract things. I'll give you a concrete example of what I want to search for and replace: I want to search for and replace "Reset Value: " and leave no empty line behind in so doing. In Word I can do this by simply searching for "^13Reset Value: " (Use wildcards ticked) and replacing it with nothing. How would I be able to do this in LibreOffice Writer if "$Reset Value: " has no meaning?

PS: It seems that we will gradually be switching back to MS Word, so starting sometime in the near future you won't hear from me again reporting LibreOffice bugs and shortcomings.
Comment 8 V Stuart Foote 2017-09-23 15:27:55 UTC
(In reply to Daniel Grigoras from comment #7)
> $TextToReplace has no meaning?
> What do you mean?
> 

Functionally, the LibreOffice regular expression for a paragraph mark is "$".

> If ^13TextToReplace has a useful and usable meaning in MS Word, why
> shouldn't $TextToReplace have a useful and usable meaning in LibreOffice
> Writer?
> 
> Stuart, maybe you have difficulties understanding abstract things. I'll give
> you a concrete example of what I want to search for and replace: I want to
> search for and replace "Reset Value: " and leave no empty line behind in so
> doing. In Word I can do this by simply searching for "^13Reset Value: " (Use
> wildcards ticked) and replacing it with nothing. How would I be able to do
> this in LibreOffice Writer if "$Reset Value: " has no meaning?

Wrong syntax. Reverse the position of the "$" paragraph end.

Attaching a sample document, use these find strings with Regular expressions enabled:

"Reset Value: $"

".*Reset Value: $"

"Reset Value: \n"

".*Reset Value: \n"

Convince yourself then close this as INVALID as that is what it is ;-)
Comment 9 V Stuart Foote 2017-09-23 15:30:47 UTC
Created attachment 136487 [details]
sample document with desired "Reset Value: " string
Comment 10 Daniel Grigoras 2017-09-23 16:16:03 UTC
"Reset Value: $" in Writer only replaces the text and leaves an empty paragraph behind, while I want both the text and the paragraph on which the text exists removed. "^13Reset Value: " in Word removes both the text and the paragraph.
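
The difference described above can be sketched with Python's re module. This is an illustration under stated assumptions, not LibreOffice or Word code: the document is modelled as a hypothetical string whose paragraphs are separated by "\n", and Word's "^13" is modelled as a pattern that consumes that separator.

```python
import re

# Hypothetical three-paragraph document, paragraphs separated by "\n".
doc = "Register A\nReset Value: \nRegister B"

# Writer-style search "Reset Value: $": the "$" is zero-width, so only
# the text is removed and the now-empty paragraph stays behind.
text_only = re.sub(r"Reset Value: $", "", doc, flags=re.MULTILINE)

# Word-style search "^13Reset Value: ": the preceding paragraph mark is
# part of the match, so the text and its paragraph disappear together.
text_and_mark = re.sub(r"\nReset Value: ", "", doc)

print(text_only.split("\n"))      # ['Register A', '', 'Register B']
print(text_and_mark.split("\n"))  # ['Register A', 'Register B']
```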

The simple fact that I cannot search for "$Reset Value: ", but I can search for "Reset Value: $" is telling that you have an issue, so please give up on sophistry and on trying to bury this bug.
Comment 11 V Stuart Foote 2017-09-23 20:00:03 UTC
(In reply to Daniel Grigoras from comment #10)
> "Reset Value: $" in Writer only replaces the text and leaves an empty
> paragraph behind, while I want both the text and the paragraph on which the
> text exists removed. "^13Reset Value: " in Word removes both the text and
> the paragraph.

Replace ".*Reset Value: $" with nothing, then follow up by replacing "^$" with nothing.
 
> 
> The simple fact that I cannot search for "$Reset Value: ", but I can search
> for "Reset Value: $" is telling that you have an issue...

I don't have anything. What LibreOffice has are differences in syntax from MS Office's implementation. If you are happy with MS Word, use it.
 
Otherwise learn LibreOffice's syntax.
Comment 12 Daniel Grigoras 2017-09-25 09:39:26 UTC
(In reply to V Stuart Foote from comment #11)
> Replace ".*Reset Value: $" with nothing, then follow up by replacing "^$" with nothing.
>  

Oh, you sure are dishonest.
Your solution does not work because:
1. One has to make two different replacements instead of just one as is possible in MS Word.
2. The second replacement replaces all empty paragraphs, even the ones that are not related to the empty paragraphs left behind by the first replacement, that is, the empty paragraphs that should not be removed.
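
The two points above can be sketched in Python. Assumptions: the document is a hypothetical list of paragraph strings, and the "^$" replacement is modelled as dropping every empty paragraph, which is how this thread describes its effect. None of this is LibreOffice code.

```python
import re

# Hypothetical document: one intentional blank paragraph and one
# "Reset Value: " paragraph that should vanish completely.
paragraphs = ["Title", "", "Body text", "Reset Value: ", "More text"]

# Step 1: ".*Reset Value: $" replaced with nothing, paragraph by paragraph.
step1 = [re.sub(r".*Reset Value: $", "", p) for p in paragraphs]

# Step 2: "^$" replaced with nothing, modelled as removing ALL empty
# paragraphs, including the intentional one after "Title".
step2 = [p for p in step1 if p != ""]

# Single-pass alternative: drop only the paragraphs that match the text.
one_pass = [p for p in paragraphs if not re.fullmatch(r".*Reset Value: ", p)]

print(step2)     # ['Title', 'Body text', 'More text']
print(one_pass)  # ['Title', '', 'Body text', 'More text']
```

The single-pass variant keeps the blank paragraph after "Title", which is the behaviour requested in this report.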

Dishonesty, dishonesty, dishonesty.
It's really sad that some can be so dishonest and deceitful.
Comment 13 V Stuart Foote 2017-09-25 12:19:37 UTC
And again, the issue is invalid. The "$" is a valid representation of paragraph object endings.

But since you seem insistent, poof--it is an enhancement. Meanwhile, please learn to use the LibreOffice syntax that works correctly.
Comment 14 Xisco Faulí 2017-09-26 08:43:34 UTC
(In reply to V Stuart Foote from comment #13)
> And again, the issue is invalid. The "$" is a valid representation of
> paragraph object endings.

I agree with you. Closing as RESOLVED NOTABUG
Comment 15 Daniel Grigoras 2017-09-26 08:53:28 UTC
Indeed, this is an enhancement that should be made. So I'm opening this ticket again until this basic enhancement is made.
However, closing this ticket with the argument that this is not a bug but a needed enhancement is, again, proof of self-righteousness and dishonesty.
So please improve LibreOffice's regular expression syntax.
Comment 16 Buovjaga 2017-12-26 12:15:47 UTC

*** This bug has been marked as a duplicate of bug 91033 ***
Comment 17 V Stuart Foote 2018-02-17 00:02:02 UTC
Working well with current master Windows builds.
Version: 6.1.0.0.alpha0+ (x64)
Build ID: e1082e45361a92a31adedcc3ed0a35c704bca543
CPU threads: 8; OS: Windows 10.0; UI render: GL; 
TinderBox: Win-x86_64@42, Branch:master, Time: 2018-02-15_23:24:06
Locale: en-US (en_US); Calc: group

*** This bug has been marked as a duplicate of bug 102374 ***
Comment 18 Justin L 2018-02-17 11:13:22 UTC
Not a duplicate of bug 102374 since it is searching for text PLUS paragraph mark.

Returning as a duplicate of 108256 (which bug 91033 has also become a duplicate of).

*** This bug has been marked as a duplicate of bug 108256 ***