Bug 58744 - EDITING: Cannot search with paragraph breaks or replace with line breaks; inconsistencies in search expressions
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 4.0 all versions
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Find-Search
Reported: 2012-12-25 10:12 UTC by y3kcjd5
Modified: 2020-05-21 06:38 UTC

See Also:
Crash report or crash signature:


Attachments

Description y3kcjd5 2012-12-25 10:12:53 UTC
Forenote: had I the option, I would have picked 'design flaw' or 'annoyance' for the severity, as this seems not to be an outright bug, but I wouldn't call it an enhancement since this issue involves function people (at least I) would expect present in the first place. If there is a better category, please file accordingly.

Several of the inconsistencies in how search symbols and expressions are defined in LibreOffice bug me to no end. Specifically, I find it quite annoying that I am unable to search with paragraph breaks like I can any other character, especially since
a. they are very common
b. they represent a significant part of text flow
c. Writer/LibreOffice imposes an absolute limit on how many characters may exist in a paragraph
In addition, as someone who works in a language (Japanese) where line breaks are also an important element, I also find it frustrating that (AFAIK) it is impossible to replace search strings with a line break character.
Current behavior (with regular expressions enabled):
-Search for '$' selects paragraph break
-Search for '^$' selects 2nd of two consecutive paragraph breaks
-Search for search string preceded/followed by '^'/'$' selects search string following/preceding paragraph break but not paragraph break itself
-Search for something followed by '^' or something preceded by '$' returns no results
-Search for '\n' and replace with '\n' replaces line break with paragraph break
-The restriction on number of characters in a paragraph effectively makes the '$' search string useless, as any attempt to replace '$' would simply replace all paragraph breaks in the document with some other character, resulting in unexpected behavior when the now one-paragraph-monster document exceeds the paragraph limit (most documents are longer than the paragraph limit).
Note: In v3.6.3 or earlier, there were also some bugs where line breaks were sometimes incompletely treated like paragraph breaks; e.g. at least some of the above would (buggily) apply to line breaks as well as paragraph breaks. These appear to have been fixed in the new beta.
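
For comparison, here is a minimal sketch of how a mainstream regex engine treats the same symbols. It uses Python's `re` module, not Writer's search engine, and it assumes each paragraph is one line, so that "\n" stands in for a paragraph break. Under that assumption '^' and '$' are always zero-width positions, and the break itself is an ordinary character that can be matched and replaced.

```python
import re

# Assumption: one paragraph per line, so "\n" models a paragraph break.
text = "first\n\nthird"

# '$' and '^' are zero-width positions: they match but select no characters.
ends = [m.span() for m in re.finditer(r"$", text, flags=re.MULTILINE)]
empties = [m.span() for m in re.finditer(r"^$", text, flags=re.MULTILINE)]

# The break itself is an ordinary character and can be searched and replaced.
joined = re.sub(r"\n+", " ", text)

print(ends)     # positions just before each break and at the end of the text
print(empties)  # the single empty paragraph
print(joined)
```

Every span in `ends` and `empties` has equal start and end offsets, which is the behavior the report describes as "cursor position" rather than "paragraph break character".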

In other words, the '^' and '$' symbols represent the beginning and end of a paragraph (the cursor positions immediately following and preceding a paragraph break), respectively. However, in the inconsistent cases of searches for '$' or '^$', the '$' symbol is suddenly treated as the paragraph break symbol itself. Furthermore, in the replace field, the same '$' symbol is then used with numbers to represent parenthesized strings in the search field (e.g. '$0', '$3'), and the '\n' sequence becomes the symbol used for a paragraph break, leaving no symbol for line break in the replace field.
The ultimate result is multiple inconsistencies in what various symbols represent, the inability to search for (and ergo replace) paragraph breaks in (effectively) any but the empty paragraph ('^$') case, and the inability to replace anything with a line break (its search symbol, '\n', having been usurped by the paragraph break in the replace field).
Following, I have prepared two alternate solutions to this mess: the first being the most self-consistent and complete (and therefore the one I prefer) at a moderate cost of consistency with previous versions. This would also seem the easier to implement. The second, on the other hand, compromises self-consistency and ease of implementation somewhat in favor of version-consistency.

Proposal 1:
Of the '^' and '$' symbols, choose one (preferably '$', as '^' also means 'not' in some cases) to represent the paragraph break character and return the other to regular character status. In the replace field, inherit the search field definitions: '^' or '$' for paragraph break, '\n' for line break, and a number enclosed in parentheses to represent parenthesized search strings in the same manner '$' followed by a number is currently used.
This scheme maintains all previous function; for example (assuming '$' for paragraph break):
-Search '^something' Replace 'else' becomes Search '($)something' Replace '(1)else'
-Search 'something$' Replace 'else' becomes Search 'something($)' Replace 'else(1)'
-Search 's.m.th.ng' Replace 'still $0' becomes Search 's.m.th.ng' Replace 'still (0)'
-Search '\)(parenthetext)\(' Replace '($1)' becomes Search '\)(parenthetext)\(' Replace '\((1)\)'
-Search '\n' Replace '\n' becomes Search '\n' Replace '$'
but at the same time enables functions like:
-Search 'linebreakgoeshere' Replace '\n'
-Search 'something$else' Replace 'something\nelse'
-Search '(else)$(something)' Replace '(2)$(1)'
-Search '(.)${4,}(.)' Replace '(1)$$$(2)'
This scheme is more intuitive because symbols are used self-consistently (paragraph and line breaks are treated like other characters, and expressions represent the same function in both search and replace fields) and consequently reduces consultation with the help manual.
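
The core idea of Proposal 1 (breaks are ordinary characters in both fields) can be modelled outside Writer. The following is a minimal sketch in Python, not an implementation of the proposed syntax: the stand-in character `PARA` and all function names are assumptions made for illustration, and Python's own `\1` group syntax is used in place of the proposed '(1)'.

```python
import re

PARA = "\u2029"  # assumption: stand-in character for a paragraph break
LINE = "\n"      # line break (Shift+Enter)

def swap_across_break(text: str) -> str:
    # Models: Search '(else)$(something)'  Replace '(2)$(1)'
    return re.sub("(else)" + PARA + "(something)", r"\2" + PARA + r"\1", text)

def para_to_line_break(text: str) -> str:
    # Models: Search 'something$else'  Replace 'something\nelse'
    return text.replace("something" + PARA + "else", "something" + LINE + "else")

def squeeze_breaks(text: str) -> str:
    # Models: Search '(.)${4,}(.)'  Replace '(1)$$$(2)'
    return re.sub("(.)" + PARA + "{4,}(.)", r"\1" + PARA * 3 + r"\2", text)
```

Because the break is just another character in the modelled text, the search and replace sides need no special cases, which is the self-consistency argument made above.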

Proposal 2:
Should there be some valid and pressing reason the current configuration be maintained as much as possible, the following changes can be implemented with minimal change to current definitions, at a cost to self-consistency/intuitiveness and ease of implementation:
Consistently recognize '^' and '$' as the cursor positions following and preceding a paragraph break, respectively, and '\n' as the line break character. In the replace field, '$^' would represent the paragraph break character.
Changes to current definitions would be limited to those examples similar to:
-Search '\n' Replace '\n' becomes Search '\n' Replace '$^'
whereas the following new functions would become available:
-Search '$something' Replace 'something' (would remove paragraph break preceding 'something', whereas Search '^something' would not)
-Search 'something$else' or 'something^else' or 'something$^else' Replace 'something\nelse' (would replace paragraph break between 'something' and 'else' with a line break)
-Search 'something^${2,}else' or 'something^{2,}$else' or 'something^{3,}else' Replace 'something$^else' (would replace 2 or more consecutive paragraph breaks between 'something' and 'else' with a single paragraph break)
-Search '(s.m.th.ng)^' Replace '$^$1' (would move 's.m.th.ng' from the end of the paragraph to the beginning of the next)
-Search 's.m.th.ng$' Replace '$^$0' or '$^&' (would insert a paragraph break before 's.m.th.ng' at the end of a paragraph)
-Search '(else)$(something)' Replace '$2$^$1' (would switch 'else' at the end of a paragraph and 'something' at the beginning of the following paragraph)
In summary, this would effectively enable the same functions as Proposal 1 would, only in a manner more consistent with the search expression definitions of previous versions, albeit considerably more convoluted.
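
As a rough illustration of the replace-field half of Proposal 2 only, the sketch below expands a replace string in which '$^' means paragraph break, '$N' means group N and '&' means the whole match. It is written in Python against a modelled text. The stand-in character `PARA` and the helper `expand_replacement` are hypothetical, and the search-field semantics of the proposal are not modelled.

```python
import re

PARA = "\u2029"  # assumption: stand-in character for a paragraph break

def expand_replacement(repl: str, match: re.Match) -> str:
    """Expand a Proposal 2 style replace string for one match."""
    def token(t: re.Match) -> str:
        tok = t.group(0)
        if tok == "$^":
            return PARA                      # paragraph break
        if tok == "&":
            return match.group(0)            # whole match
        return match.group(int(tok[1:])) or ""  # '$N' -> group N
    # '$^' is tried first so it is never misread as another token.
    return re.sub(r"\$\^|\$\d|&", token, repl)
```

For example, with a match of '(else)' + break + '(something)', the replace string '$2$^$1' expands to 'something', a paragraph break, then 'else'.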
Comment 1 Joel Madero 2013-01-11 17:52:36 UTC
This is an enhancement request as nothing is broken. I think this particular bug report is a bit overbearing (multiple suggestions in one go) but since they are all related I'll just mark as NEW and let a developer decide where to go from there.

Marking as:
New (Confirmed)
Enhancement
High - quite a few functional suggestions
Comment 2 Timur 2013-03-21 13:35:16 UTC
Is this the same as Bug 38261 - Better Find&Replace with regular expressions?
Comment 3 DouglasCarnall 2013-04-03 06:32:31 UTC
tl;dr "regex" that can't handle line breaks and hard returns nicely is not worth the candle

***
I'd just like to add my voice to y3kcjd5's (and many others: see linked discussion at 46165) for this "enhancement" to be prioritized. 

I've scarequoted "enhancement" because not being able to perform the following very common real world use case makes LibreOffice Writer feel DEEPLY BROKEN.

Text in which each line break has been converted into a hard return is a common phenomenon: e.g. when copying text from an EMAIL and pasting it into a DOCUMENT. When would anyone ever do that?

One simple pragmatic way to fix this using find and replace functionality commonly available in e.g. Microsoft Word, is as follows:

Find and replace all occurrences of ¶¶ with e.g. §§§ [mark the pars we do want]
Find and replace all occurrences of ¶ with a space [zap the pars we don't]
Find and replace all occurrences of §§§ with ¶ [recreate pars we marked in first step]

Done.
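
The three steps above can be sketched on plain text as follows. This is a minimal illustration in Python rather than anything Writer does: it assumes "\n" stands in for a hard return (¶), and the function name and the default marker are only for illustration. The marker must not already occur in the text.

```python
def unwrap_hard_returns(text: str, marker: str = "§§§") -> str:
    # Assumption: "\n" models a hard return; marker must not occur in text.
    text = text.replace("\n\n", marker)  # mark the pars we do want
    text = text.replace("\n", " ")       # zap the pars we don't
    return text.replace(marker, "\n")    # recreate the pars we marked

print(unwrap_hard_returns("Hello\nworld.\n\nNew\npara."))
```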

Of course, anyone capable of initiating the sequence above is also almost certainly competent to find a workaround elsewhere. But it feels very broken not to be able to just do it within the application, perhaps because somewhere along the line someone deliberately hobbled \p, perhaps because of this 65535-character paragraph limit (huh?).

I'm not sure I fully follow y3kcjd5's proposal, but if Writer can be tweaked to make the sequence I outline above easily possible, it would meet most of my regex needs.
Comment 4 Paul Weiss 2013-04-03 14:44:07 UTC
(In reply to comment #3)
I agree!
Comment 5 David 2014-01-01 10:44:20 UTC
I'd like to add my voice to the request for this development to move to solution.

I also support comment #3 and appreciate DouglasCarnall's pithy "tl;dr" that captures the situation nicely. (I can't see what is relevant in the "linked discussion at 46165", however - is that bug number a typo?)

Another "real world" scenario is the possibility of finding and removing duplicate paragraphs. This is a common workflow: sort paragraphs > search on contiguous identical paragraphs > remove duplicates.

At the moment, so far as I know, this is not possible in Writer, either with the "native" CTRL-H + regex search/replace (which cannot match across $ paragraph boundaries), or with the AltSearch extension (which cannot use back-references in searches with * -- although it can make matches across paragraph breaks).
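
To make the requested workflow concrete, here is a minimal sketch of the "remove contiguous identical paragraphs" step using a back-reference that spans a paragraph boundary. It uses Python's `re` on plain text, assuming one paragraph per line with "\n" standing in for the paragraph break. The function name is hypothetical, and this is not something Writer's Find & Replace can currently do.

```python
import re

def drop_adjacent_duplicates(text: str) -> str:
    # Assumption: one paragraph per line; "\n" models a paragraph break.
    # ^(.*)$ captures a whole paragraph; (?:\n\1$)+ matches one or more
    # identical paragraphs that immediately follow it.
    return re.sub(r"^(.*)$(?:\n\1$)+", r"\1", text, flags=re.MULTILINE)

print(drop_adjacent_duplicates("a\na\nb\nb\nb\nc"))
```

The `$` after `\1` matters: without it, a paragraph that merely starts with the previous paragraph's text would be partly eaten.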

There is some discussion of this at AskLibreOffice:

http://ask.libreoffice.org/en/question/27682/regular-expression-references-not-working/

It would be wonderful to see this "fixed"!
Comment 6 crxssi 2014-01-25 17:21:30 UTC
I also was seeking out a solution to this "bug" as I am increasingly annoyed that I cannot search for $$ and replace with $.  Or perhaps $^$ and replace with $.  Or \n\n and replace with \n.  It is silly that I have to copy text from LO Writer into Gedit so I can do something so basic and then copy it back.  Go ahead and try it-  it seems it is impossible to remove double newlines in LO/OO!

The other thing that makes absolutely no sense to me is even described in the LO/OO help file:

""""
\n in the Search for text box stands for a line break that was inserted with the Shift+Enter key combination.
\n in the Replace with text box stands for a paragraph break that can be entered with the Enter or Return key.
""""

Really?  LO/OO treats \n as two totally different things depending on if it is in the search box or the replace box?  That is about as user-friendly as a cracked glass of nitroglycerin!  Wouldn't it have made more sense to keep \n as a paragraph break and invent something new or reuse something not used for a line break... maybe use \r or something?

I think the current behavior with at least the above example is horribly broken at worst and very poorly designed at best.  Seems to me to be a toss-up between a "bug" and an "enhancement".
Comment 7 Joel Madero 2015-05-02 15:41:22 UTC Comment hidden (obsolete)
Comment 8 MarjaE 2015-11-26 22:36:40 UTC
It's still an issue in 5.0.
Comment 9 y3kcjd5 2017-04-15 16:32:48 UTC
As far as I can tell, this remains unaddressed.
Comment 10 Marcel Partap 2017-10-02 06:37:22 UTC
I just hit the use case of pasting an email. Searching for a solution, I found several suggestions, none of which worked, so after wasting nearly half an hour trying to `sudo make sandwich s/$/\n/g` I was left to replace paragraph breaks with line breaks manually. \n representing a line break in search and a paragraph break when replacing is obviously & utterly silly.
Comment 11 Marcel Partap 2017-10-02 06:39:35 UTC
Oh and the workaround mentioned in several posts to just copy a line break and paste it into the replace field did not work on this linux rig.
Comment 12 himajin100000 2019-07-14 19:00:24 UTC
I haven't looked closely at the following code, so this is just a guess,
but it gives me the impression that this code has something to do with the behavior described in this bug report.

https://opengrok.libreoffice.org/xref/core/sw/source/core/crsr/findtxt.cxx?a=true&r=28bff4bd&h=377#377
Comment 13 himajin100000 2019-07-14 19:01:32 UTC
oops, typos
has => has
Comment 14 Luke Kendall 2020-05-21 06:33:57 UTC
Example Use Case

Just a note to show some of the problems and contortions this causes.  In other words it's a use case for an author of a book. In my case, using Writer 6.4.2.2.

I wanted to adjust certain paragraphs to have zero indent (i.e. apply my new ChapterBdy1st paragraph style), as per normal book typesetting conventions. So this applies as follows within my book to:

A. 1st para of chapter
B. 1st para after an empty line ‘scene break’
C. 1st para after an otherwise-empty solo ‘-’ line indicating a change of scene

Because some ereaders delete blank lines, I use non-breaking spaces to ensure my 'scene breaks' don't get removed.  The following recipe is what I needed to achieve the desired results (took about an hour for a 450pp book):

You need to make sure the document is clean: no blank lines at the ends of chapters (otherwise the following chapter header will be found and accidentally converted to ChapterBdy1st). Also check there are no other special cases (like blank lines before quotation indents or poems or lyrics). So a manual scan first is worthwhile.

B. For the non-breaking space 'empty' lines:

    1. replace all ^<nbsp>$ with an empty line
    2. select whole document from starting chapter to end, but omitting early empty paragraphs
    3. Find All in Selection of ^$
    4. change paragraph style to ChapterBdy1st
    5. select whole document from starting chapter to end, but omitting early empty paragraphs
    6. change all ^$ to <nbsp>\n

C. Then do a similar replace of solo ‘-’ lines by empty lines, do the above substitutions, except the final one would change all ^$ to -\n, then find all ^-$ lines and apply Centered style and even Chapter Body style:

    1. Replace all ^-$ with an empty line
    2. select whole document from starting chapter to end, but omitting early empty paragraphs
    3. Find All in Selection of ^$
    4. change paragraph style to ChapterBdy1st
    5. select whole document from starting chapter to end, but omitting early empty paragraphs
    6. change all ^$ to -\n
    7. Find All ^-$
    8. Apply Centered style
    9. Apply Chapter Body style

A. Unfortunately I can’t think of a way to change the 1st para of each chapter via a single F&R. So the final step was to manually find each Chapter Title one by one with F&R, and then select the following paragraph and change it to ChapterBdy1st.
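
For illustration, the rule behind cases A, B and C can be stated in a few lines once paragraphs are available as a list. The sketch below is plain Python over a made-up `Para` model, not the Writer API. The style names ChapterBdy1st and Chapter Body come from the recipe above, while "Chapter Title" as a paragraph style name and every identifier in the code are assumptions.

```python
from dataclasses import dataclass

NBSP = "\u00a0"
SCENE_BREAKS = ("", NBSP, "-")  # empty line, non-breaking-space line, solo '-'

@dataclass
class Para:
    text: str
    style: str

def mark_first_paragraphs(paras: list[Para]) -> None:
    """Apply ChapterBdy1st after a chapter title (A) or a scene break (B, C)."""
    for prev, cur in zip(paras, paras[1:]):
        if cur.text in SCENE_BREAKS or cur.style != "Chapter Body":
            continue  # never restyle break lines or non-body paragraphs
        after_title = prev.style == "Chapter Title"  # assumed style name
        after_break = prev.text in SCENE_BREAKS
        if after_title or after_break:
            cur.style = "ChapterBdy1st"
```

Checking `cur.style` first guards against the problem mentioned in the recipe, where a chapter header following a stray blank line would be converted by accident.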
Comment 15 Luke Kendall 2020-05-21 06:38:48 UTC
That second step 6 (change all ^$ to -\n) should have read:

6. Replace All in Selection ^$ to -\n
Comment 16 Luke Kendall 2020-05-21 06:38:58 UTC Comment hidden (obsolete)