Description:
The documentation states "A search using a regular expression will work only within one paragraph. To search using a regular expression in more than one paragraph, do a separate search in each paragraph." THIS IS A BUG, not a feature, and needs to be fixed, since A REGULAR EXPRESSION MAY INCLUDE ONE OR MANY PARAGRAPH BREAKS. Obviously this is an archaic way of avoiding a bug, or limitation, that prevented the search feature from working correctly when CR/LFs were encountered.

Steps to Reproduce:
1. Open help file:///C:/Program%20Files/LibreOffice/help/en-US/text/swriter/guide/search_regexp.html?&DbPAR=WRITER&System=WIN
2. Observe tip: "A search using a regular expression will work only within one paragraph. To search using a regular expression in more than one paragraph, do a separate search in each paragraph."
3. Fail. Because your intended search "Find" includes a line break in the middle.

Actual Results:
Search online in vain for a workaround. Find one AltSearch extension, 5 years out of date, unsupported on current LO version.

Expected Results:
I expect to be able to form a search term that is supported by MS Word and/or Notepad++.

Reproducible: Always

User Profile Reset: No

Additional Info:
It should support searching an entire document even when regular expressions are used. Optimally it should support searching an entire document when using a Boost regular expression engine. At the very least it should support searching an entire document when using BASIC EXTENDED CODES, please see https://npp-user-manual.org/docs/searching/#extended-search-mode for examples. Also see https://github.com/notepad-plus-plus/notepad-plus-plus and https://extensions.libreoffice.org/en/extensions/show/alternative-dialog-find-replace-for-writer
Created attachment 192440 [details] Notepad++ example of "Extended Mode" search Explained at https://npp-user-manual.org/docs/searching/#extended-search-mode
importance should be changed to major since this has been a problem for a substantial number of users for over two decades
(In reply to -t from comment #2) > importance should be changed to major since this has been a problem for a > substantial number of users for over two decades over 10 years, not 20 lol
Please don't set your own reports to NEW; someone else must do it, unless you are going to fix it yourself.
(In reply to m_a_riosv from comment #4) > some[one] else well, can definitely "confirm" the problem, indeed far from new ... e.g. https://ask.libreoffice.org/t/find-replace-including-a-paragraph-mark/1390/9
Actually the help article (par_id3153414) is not accurate. Calc ICU lib regexp search/replace is a bit less "global" than ICU regexp searches in Writer. And there is also a "match" mode for wildcards and interoperability with Excel XLS and XLSX sheet formats. [1][2][3]

In other words, full document search and replace with ICU lib regular expressions *already* works by default in all modules, with the F&R dialog offering an "off" toggle for the mode in Writer or Calc, while the "Replace" field of the F&R dialog does not directly execute the regexp.

The comment "will work only within one paragraph..." in the help article (par_id3153414) should have been removed when the Wildcard content was reworked for bug 142574 [4]. A broader rework of regexp (more seamless use) is covered by see-also bug 38261.

IMHO this bug reports a documentation issue.

=-ref-=
[1] https://books.libreoffice.org/en/CG24/CG2402-EnteringandEditingData.html#toc72
[2] https://books.libreoffice.org/en/WG75/WG7503-TextAdvanced.html#toc15
[3] https://help.libreoffice.org/24.8/en-US/text/swriter/guide/search_regexp.html?DbPAR=WRITER
[4] https://gerrit.libreoffice.org/c/help/+/120573
(In reply to V Stuart Foote from comment #6) I don't see how this "works". The problem is that it can't match anything across paragraphs' bounds, like "two last characters of paragraph and three first characters of the next paragraph". This is not a documentation issue; this is not "INVALID". But I'm sure we have this filed somewhere already.
OK, we don't/can't test "paragraph" breaks with regexp because there are none (that ICU can parse), and IIRC we would have to refactor to support the regexp look-behind/look-ahead modes. Otherwise, full document runs can be edited with regexp, just not in a single pass. It is trivial to replace all paragraph ends (our $, as represented by the Pilcrow glyph) with a marker, then parse the entire document text run, and after any changes restore the structure by replacing the "marker" with a new paragraph ('\n') to recreate the paragraph breaks. Obviously this can have issues with a heavily styled document.
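For illustration, the marker round trip described above can be sketched on plain strings. This is only a sketch under stated assumptions: it uses Python's `re` module rather than ICU, a list of strings stands in for the paragraphs of a Writer document, and `MARKER` and `replace_across_paragraphs` are hypothetical names, not part of any LibreOffice API.

```python
import re

# Hypothetical placeholder; it must not occur anywhere in the real text.
MARKER = "@@PARA@@"


def replace_across_paragraphs(paragraphs, pattern, repl):
    """Join paragraphs with MARKER, run one regex pass, then split again."""
    joined = MARKER.join(paragraphs)
    edited = re.sub(pattern, repl, joined)
    return edited.split(MARKER)


paragraphs = ["first paragraph ends with xyz", "abc starts the second", "third"]
# The paragraph break is spelled out in the pattern as the escaped marker.
pattern = "xyz" + re.escape(MARKER) + "abc"
result = replace_across_paragraphs(paragraphs, pattern, "xyz abc")
print(result)
```

Because the paragraphs are reduced to bare strings here, any per-paragraph formatting is outside the sketch, which is the same concern raised above for heavily styled documents.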
And sorry, meant to leave it NEW against documentation, not Resolved Invalid...
(In reply to V Stuart Foote from comment #6)
> Actually the help article (par_id3153414) is not accurate.

fixed here https://gerrit.libreoffice.org/c/help/+/171538
feel free to review ;)

> A broader rework of regexp (more seamless use) is in see also bug 38261.

trying to, progressively.
Pierre F committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/help/commit/e3caa53e99709b7099611b67cf73e9bdbd8801ea more (simple) regex examples + fix note on paragraph limitation. tdf#38261, tdf#159607
(In reply to Commit Notification from comment #11)
> Pierre F committed a patch related to this issue.
> It has been pushed to "master":
>
> https://git.libreoffice.org/help/commit/e3caa53e99709b7099611b67cf73e9bdbd8801ea
>
> more (simple) regex examples + fix note on paragraph limitation. tdf#38261,
> tdf#159607

Thanks Pierre, the documentation and examples are getting better, but I'm not sure we yet have quite the correct wording about Regexp matches and Paragraphs.

"A search using a regular expression will work only within one paragraph. That is, a \n will match a line break within a paragraph."

We can in fact match a string in *every* paragraph of a document in one pass. We just can't match strings at the Paragraph bounds, e.g. a Paragraph ending with xyz$ or starting with ^abc (we would need to implement support for look-ahead / look-behind syntax to be able to construct a pattern to match).
(In reply to V Stuart Foote from comment #12)
thanks Stuart for the feedback,

> ... not sure we
> yet have quite the correct wording about Regexp matches and Paragraphs.
>
> "A search using a regular expression will work only within one paragraph.
> That is, a \n will match a line break within a paragraph."
>
> We can in fact match a string in *every* paragraph of a document in one pass.

yes. what I'm trying to say is that a regex can't cross OVER the paragraph boundary (see attachment). feel free to suggest a better wording of course.

> We just can't match strings at the Paragraph bounds, e.g. a Paragraph ending
> with xyz$ or starting with ^abc

huh? we definitely can! (see 2nd attachment) the only limitation is that "^" has to be followed by something; an example is given in the help.

> (needing to implement support look-ahead /
> look-behind syntax to be able to construct a pattern to match).

PS. this bugzilla is a pain for simple editing. need to upgrade or move to Ask! :/
Created attachment 195994 [details] regex can't cross OVER the paragraph boundary
Created attachment 195995 [details] match strings at the Paragraph bounds, e.g. xyz$ or ^abc
(In reply to fpy from comment #13) > > yes. what I'm trying to say, is a regex can't cross OVER the paragraph > boundary (see attachment) > feel free to suggest a better wording of course. > Yes, exactly. > > We just can't match strings at the Paragraph bounds, e.g. a Paragraph ending > > with xyz$ or starting with ^abc > > huh? we definitely can! see 2nd attachment) > the only limitation is for "^" to be followed by something, example given in > the help. > Sorry, a fingerflub. Knew that was not perfect after I sent it, thought about submitting "s/xyz$ or starting with ^abc/xyz$.*^abc/" correction to merge the strings. But... BZ > > PS. this bugzilla is a pain for simple editing. need to upgrade or move to > Ask! :/ Yep BZ can be tedious/unforgiving, but not sure Ask or SE style would be any sort of improvement for organizing issues that BZ does well. Thanks for working on this.
(In reply to V Stuart Foote from comment #16)
> > yes. what I'm trying to say, is a regex can't cross OVER the paragraph
> > boundary (see attachment)
> > feel free to suggest a better wording of course.
> >
>
> Yes, exactly.

And that needs to be the guidance in the Help articles and user guide: that the regexp pattern can't match OVER|ACROSS bounds between paragraphs (currently, without the needed look-behind/look-ahead implementation).
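As a rough illustration of the behaviour agreed on above, the sketch below searches each paragraph separately. It is an assumption-laden model, not Writer's actual implementation: it uses Python's `re` module instead of ICU, plain strings stand in for paragraphs, and `find_per_paragraph` is a hypothetical helper name.

```python
import re


def find_per_paragraph(paragraphs, pattern):
    """Run the regex on each paragraph on its own; return (index, match) pairs."""
    regex = re.compile(pattern)
    hits = []
    for index, text in enumerate(paragraphs):
        for match in regex.finditer(text):
            hits.append((index, match.group(0)))
    return hits


paragraphs = ["this paragraph ends with xyz", "abc starts this one"]

# Anchors at the paragraph bounds match, because each paragraph is its own subject.
end_hits = find_per_paragraph(paragraphs, r"xyz$")
start_hits = find_per_paragraph(paragraphs, r"^abc")

# A pattern that needs text from both sides of the boundary finds nothing,
# because no single subject string ever contains the break.
cross_hits = find_per_paragraph(paragraphs, r"xyz\nabc")

print(end_hits, start_hits, cross_hits)
```

In this model one pass still visits every paragraph, so a pattern can match in all of them; only a match that would span the break between two paragraphs is out of reach.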