Bug 58744 - EDITING: Cannot search with paragraph breaks or replace with line breaks; inconsistencies in search expressions
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 4.0 all versions
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-12-25 10:12 UTC by y3kcjd5
Modified: 2017-10-03 07:56 UTC
CC List: 6 users

See Also:
Crash report or crash signature:


Description y3kcjd5 2012-12-25 10:12:53 UTC
Forenote: had I the option, I would have picked 'design flaw' or 'annoyance' for the severity, as this seems not to be an outright bug. I wouldn't call it an enhancement either, since this issue involves functionality that people (at least I) would expect to be present in the first place. If there is a better category, please file accordingly.

Several of the inconsistencies in how search symbols and expressions are defined in LibreOffice bug me to no end. Specifically, I find it quite annoying that I am unable to search with paragraph breaks like I can any other character, especially since
a. they are very common
b. they represent a significant part of text flow
c. Writer/LibreOffice imposes an absolute limit on how many characters may exist in a paragraph
In addition, as someone who works in a language (Japanese) where line breaks are also an important element, I also find it frustrating that (AFAIK) it is impossible to replace search strings with a line break character.
Current behavior (with regular expressions enabled):
-Search for '$' selects paragraph break
-Search for '^$' selects 2nd of two consecutive paragraph breaks
-Search for search string preceded/followed by '^'/'$' selects search string following/preceding paragraph break but not paragraph break itself
-Search for something followed by '^' or something preceded by '$' returns no results
-Search for '\n' and replace with '\n' replaces line break with paragraph break
-The restriction on the number of characters in a paragraph effectively makes the '$' search string useless: any attempt to replace '$' simply replaces every paragraph break in the document with some other character, and the resulting one-paragraph monster behaves unexpectedly once it exceeds the paragraph limit (most documents are longer than that limit).
Note: In v3.6.3 or earlier, there were also some bugs where line breaks were sometimes incompletely treated like paragraph breaks; e.g. at least some of the above would (buggily) apply to line breaks as well as paragraph breaks. These appear to have been fixed in the new beta.

In other words, the '^' and '$' symbols represent the beginning and end of a paragraph (the cursor positions immediately following and preceding a paragraph break), respectively. However, in the inconsistent cases of searches for '$' or '^$', the '$' symbol is suddenly treated as the paragraph break itself. Furthermore, in the replace field, the same '$' symbol is combined with numbers to represent parenthesized strings from the search field (e.g. '$0', '$3'), and the '\n' sequence becomes the symbol for a paragraph break, leaving no symbol for a line break in the replace field.
The ultimate result is multiple inconsistencies in what various symbols represent, the inability to search for (and ergo replace) paragraph breaks in (effectively) any but the empty-paragraph ('^$') case, and the inability to replace anything with a line break (its search symbol, '\n', having been usurped by the paragraph break in the replace field).
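For comparison, a conventional regex engine keeps the two roles apart. The sketch below uses Python's `re` module and assumes, purely for illustration, that a document is plain text with one paragraph per line (paragraph break = "\n"); this is not how Writer stores text. With `re.MULTILINE`, '^' and '$' only ever match zero-width positions, while '\n' matches the break character itself.

```python
import re

# Illustrative model only (assumption): one paragraph per line, break = "\n".
text = "one\ntwo"

# '$' is a zero-width position: substituting at it inserts text and never
# consumes the break.
appended = re.sub(r"$", "!", text, flags=re.MULTILINE)

# '\n' is the break character itself, so replacing it really joins paragraphs.
joined = re.sub(r"\n", " ", text)

# '^$' finds an empty paragraph as an (empty) position, not as a character.
empty_paragraphs = re.findall(r"^$", "a\n\nb", flags=re.MULTILINE)

print(repr(appended))    # 'one!\ntwo!'
print(joined)            # one two
print(empty_paragraphs)  # ['']
```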
Below, I have prepared two alternative solutions to this mess. The first is the most self-consistent and complete (and therefore the one I prefer), at a moderate cost in consistency with previous versions; it would also seem to be the easier one to implement. The second, on the other hand, compromises self-consistency and ease of implementation somewhat in favor of consistency with previous versions.

Proposal 1:
Of the '^' and '$' symbols, choose one (preferably '$', as '^' also means 'not' in some cases, e.g. '[^abc]') to represent the paragraph break character, and return the other to regular character status. In the replace field, inherit the search field definitions: '^' or '$' for a paragraph break, '\n' for a line break, and a number enclosed in parentheses to represent parenthesized search strings, in the same way '$' followed by a number is currently used.
This scheme maintains all previous function; for example (assuming '$' for paragraph break):
-Search '^something' Replace 'else' becomes Search '($)something' Replace '(1)else'
-Search 'something$' Replace 'else' becomes Search 'something($)' Replace 'else(1)'
-Search 's.m.th.ng' Replace 'still $0' becomes Search 's.m.th.ng' Replace 'still (0)'
-Search '\)(parenthetext)\(' Replace '($1)' becomes Search '\)(parenthetext)\(' Replace '\((1)\)'
-Search '\n' Replace '\n' becomes Search '\n' Replace '$'
but at the same time enables functions like:
-Search 'linebreakgoeshere' Replace '\n'
-Search 'something$else' Replace 'something\nelse'
-Search '(else)$(something)' Replace '(2)$(1)'
-Search '(.)${4,}(.)' Replace '(1)$$$(2)'
This scheme is more intuitive because symbols are used self-consistently (paragraph and line breaks are treated like other characters, and expressions represent the same function in both search and replace fields), and it consequently reduces the need to consult the help.
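To make Proposal 1 concrete, here is a rough emulation on top of Python's `re`. It is a sketch under stated assumptions, not LibreOffice code: paragraph breaks are modeled as U+2029 (PARAGRAPH SEPARATOR) and line breaks as "\n"; `translate_search`, `translate_replace` and `proposal1_sub` are hypothetical helpers; and bracket expressions are handled only naively. In the search field an unescaped '$' becomes the paragraph break character and '^' a literal caret; in the replace field '$' inserts a paragraph break, '\n' a line break, and '(N)' the Nth parenthesized group.

```python
import re

# Assumptions for this sketch (not LibreOffice internals):
PARA = "\u2029"  # paragraph break, modeled as U+2029 PARAGRAPH SEPARATOR
LINE = "\n"      # line break (Shift+Enter), modeled as a newline


def translate_search(pattern):
    """Proposal 1 search syntax -> Python regex ('$' = paragraph break, '^' literal)."""
    out, i, in_class = [], 0, False
    while i < len(pattern):
        c = pattern[i]
        if c == "\\" and i + 1 < len(pattern):
            out.append(pattern[i:i + 2])  # keep escapes such as \n, \( and \) unchanged
            i += 2
            continue
        if in_class:
            in_class = c != "]"           # naive: '[^...]' keeps its usual meaning
            out.append(c)
        elif c == "[":
            in_class = True
            out.append(c)
        elif c == "$":
            out.append(PARA)              # '$' is the paragraph break character itself
        elif c == "^":
            out.append(r"\^")             # '^' is an ordinary character outside brackets
        else:
            out.append(c)
        i += 1
    return "".join(out)


def translate_replace(template):
    """Proposal 1 replace syntax -> callable for re.sub ('(N)' = group N)."""
    parts, i = [], 0  # each part is a literal string or an int group number
    while i < len(template):
        c = template[i]
        if c == "\\" and i + 1 < len(template):
            nxt = template[i + 1]
            parts.append(LINE if nxt == "n" else nxt)  # '\n' = line break, '\(' = '('
            i += 2
        elif c == "$":
            parts.append(PARA)
            i += 1
        else:
            group_ref = re.match(r"\((\d+)\)", template[i:])
            if group_ref:
                parts.append(int(group_ref.group(1)))
                i += group_ref.end()
            else:
                parts.append(c)
                i += 1

    def expand(match):
        return "".join(
            (match.group(p) or "") if isinstance(p, int) else p for p in parts
        )

    return expand


def proposal1_sub(search, replace, text):
    return re.sub(translate_search(search), translate_replace(replace), text)


doc = "else" + PARA + "something"
print(proposal1_sub("(else)$(something)", "(2)$(1)", doc) == "something" + PARA + "else")  # True
```

The same helper handles the other examples above, e.g. `proposal1_sub(r"\n", "$", text)` turns line breaks into paragraph breaks, and `proposal1_sub("(.)${4,}(.)", "(1)$$$(2)", text)` caps runs of paragraph breaks at three.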

Proposal 2:
Should there be some valid and pressing reason to maintain the current configuration as much as possible, the following changes can be implemented with minimal change to current definitions, at a cost to self-consistency/intuitiveness and ease of implementation:
Consistently recognize '^' and '$' as the cursor positions following and preceding a paragraph break, respectively, and '\n' as the line break character. In the replace field, '$^' would represent the paragraph break character.
Changes to current definitions would be limited to those examples similar to:
-Search '\n' Replace '\n' becomes Search '\n' Replace '$^'
whereas the following new functions would become available:
-Search '$something' Replace 'something' (would remove paragraph break preceding 'something', whereas Search '^something' would not)
-Search 'something$else' or 'something^else' or 'something$^else' Replace 'something\nelse' (would replace the paragraph break between 'something' and 'else' with a line break)
-Search 'something^${2,}else' or 'something^{2,}$else' or 'something^{3,}else' Replace 'something$^else' (would replace 2 or more consecutive paragraph breaks between 'something' and 'else' with a single paragraph break)
-Search '(s.m.th.ng)^' Replace '$^$1' (would move 's.m.th.ng' from the end of the paragraph to the beginning of the next)
-Search 's.m.th.ng$' Replace '$^$0' or '$^&' (would insert a paragraph break before 's.m.th.ng' at the end of a paragraph)
-Search '(else)$(something)' Replace '$2$^$1' (would switch 'else' at the end of a paragraph and 'something' at the beginning of the following paragraph)
In summary, this would effectively enable the same functions as Proposal 1 would, only in a manner more consistent with the search expression definitions of previous versions, albeit considerably more convoluted.
Comment 1 Joel Madero 2013-01-11 17:52:36 UTC
This is an enhancement request, as nothing is broken. I think this particular bug report is a bit overbearing (multiple suggestions in one go), but since they are all related I'll just mark it as NEW and let a developer decide where to go from there.

Marking as:
New (Confirmed)
Enhancement
High - quite a few functional suggestions
Comment 2 Timur 2013-03-21 13:35:16 UTC
Is this the same as Bug 38261 - Better Find&Replace with regular expressions?
Comment 3 DouglasCarnall 2013-04-03 06:32:31 UTC
tl;dr "regex" that can't handle line breaks and hard returns nicely is not worth the candle

***
I'd just like to add my voice to y3kcjd5's (and many others'; see linked discussion at 46165) in asking for this "enhancement" to be prioritized.

I've scarequoted "enhancement" because not being able to perform the following very common real world use case makes LibreOffice Writer feel DEEPLY BROKEN.

Text in which each line break has been converted into a hard return is a common phenomenon: e.g. when copying text from an EMAIL and pasting it into a DOCUMENT. When would anyone ever do that?

One simple pragmatic way to fix this using find and replace functionality commonly available in e.g. Microsoft Word, is as follows:

Find and replace all occurrences of ¶¶ with e.g. §§§ [mark the pars we do want]
Find and replace all occurrences of ¶ with a space [zap the pars we don't]
Find and replace all occurrences of §§§ with ¶ [recreate pars we marked in first step]

Done.
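The three-step workaround above is easy to express outside Writer. A minimal Python sketch, assuming the pasted email is plain text where every hard return (¶) is a newline and that the placeholder "§§§" never occurs in the text; `unwrap_hard_returns` is a hypothetical name:

```python
def unwrap_hard_returns(text, marker="§§§"):
    """Join hard-wrapped lines while keeping blank-line paragraph breaks.

    Assumes a newline stands for a hard return and that `marker` does not
    already occur in `text`.
    """
    text = text.replace("\n\n", marker)  # step 1: mark the paragraph breaks we want
    text = text.replace("\n", " ")       # step 2: zap the hard returns we don't
    return text.replace(marker, "\n")    # step 3: recreate one break per marked spot


email = "Dear all,\n\nThis is a long\nwrapped line.\n\nBye"
print(unwrap_hard_returns(email))
# Dear all,
# This is a long wrapped line.
# Bye
```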

Of course, anyone capable of initiating the sequence above is also almost certainly competent to find a workaround elsewhere. But it feels very broken not to be able to do it within the application, perhaps because somewhere along the line someone deliberately hobbled \p, possibly because of this 65535-character paragraph limit (huh?).

I'm not sure I fully follow y3kcjd5's proposal, but if Writer can be tweaked to make the sequence I outline above easily possible, it would meet most of my regex needs.
Comment 4 Paul Weiss 2013-04-03 14:44:07 UTC
(In reply to comment #3)
I agree!
Comment 5 David 2014-01-01 10:44:20 UTC
I'd like to add my voice to the request that this be moved toward a solution.

I also support comment #3 and appreciate DouglasCarnall's pithy "tl;dr", which captures the situation nicely. (I can't see what is relevant in the "linked discussion at 46165", however; is that bug number a typo?)

Another "real world" scenario is finding and removing duplicate paragraphs. This is a common workflow: sort paragraphs > search for contiguous identical paragraphs > remove duplicates.

At the moment, so far as I know, this is not possible in Writer, either with the "native" CTRL-H + regex search/replace (which cannot match across $ paragraph boundaries), or with the AltSearch extension (which cannot use back-references in searches with * -- although it can make matches across paragraph breaks).
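For reference, the sort-then-deduplicate workflow needs exactly the two capabilities described as missing: a pattern that spans paragraph breaks and a back-reference to an earlier paragraph. A sketch with Python's `re`, again assuming one paragraph per line with a newline as the paragraph break (`dedupe_paragraphs` is a hypothetical name):

```python
import re


def dedupe_paragraphs(text):
    """Sort paragraphs, then collapse runs of identical neighbours into one."""
    # Step 1: sort paragraphs so that duplicates become contiguous.
    sorted_text = "\n".join(sorted(text.split("\n")))
    # Step 2: '^(.*)' captures a whole paragraph; '(?:\n\1)+' matches one or more
    # following breaks, each followed by an identical copy; '$' prevents a copy
    # from matching only a prefix of the next paragraph.
    return re.sub(r"^(.*)(?:\n\1)+$", r"\1", sorted_text, flags=re.MULTILINE)


print(dedupe_paragraphs("b\na\nb\na\nc"))
# a
# b
# c
```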

There is some discussion of this at AskLibreOffice:

http://ask.libreoffice.org/en/question/27682/regular-expression-references-not-working/

It would be wonderful to see this "fixed"!
Comment 6 crxssi 2014-01-25 17:21:30 UTC
I was also seeking a solution to this "bug", as I am increasingly annoyed that I cannot search for $$ and replace it with $. Or perhaps $^$ and replace with $. Or \n\n and replace with \n. It is silly that I have to copy text from LO Writer into Gedit to do something so basic and then copy it back. Go ahead and try it: it seems it is impossible to remove double newlines in LO/OO!
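In an engine where '\n' means the same character in the search pattern and in the replacement, the request above is a one-liner. A Python sketch, assuming plain text with a newline as the break (`collapse_blank_paragraphs` is a hypothetical name):

```python
import re


def collapse_blank_paragraphs(text):
    # Two or more consecutive breaks become a single one; the same '\n'
    # escape is used in the pattern and in the replacement.
    return re.sub(r"\n{2,}", r"\n", text)


print(repr(collapse_blank_paragraphs("first\n\n\nsecond\n\nthird")))
# 'first\nsecond\nthird'
```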

The other thing that makes absolutely no sense to me is even described in the LO/OO help file:

""""
\n in the Search for text box stands for a line break that was inserted with the Shift+Enter key combination.
\n in the Replace with text box stands for a paragraph break that can be entered with the Enter or Return key.
""""

Really?  LO/OO treats \n as two totally different things depending on if it is in the search box or the replace box?  That is about as user-friendly as a cracked glass of nitroglycerin!  Wouldn't it have made more sense to keep \n as a paragraph break and invent something new or reuse something not used for a line break... maybe use \r or something?

I think the current behavior with at least the above example is horribly broken at worst and very poorly designed at best.  Seems to me to be a toss-up between a "bug" and an "enhancement".
Comment 7 Joel Madero 2015-05-02 15:41:22 UTC Comment hidden (obsolete)
Comment 8 MarjaE 2015-11-26 22:36:40 UTC
It's still an issue in 5.0.
Comment 9 y3kcjd5 2017-04-15 16:32:48 UTC
As far as I can tell, this remains unaddressed.
Comment 10 Marcel Partap 2017-10-02 06:37:22 UTC
I just hit the use case of pasting an email. Searching for a solution, I found several suggestions, none of which worked, so after wasting nearly half an hour trying to `sudo make sandwich s/$/\n/g` I was left to replace paragraph breaks with line breaks manually. \n representing a line break in search and a paragraph break when replacing is obviously & utterly silly.
Comment 11 Marcel Partap 2017-10-02 06:39:35 UTC
Oh, and the workaround mentioned in several posts (just copying a line break and pasting it into the replace field) did not work on this Linux rig.