Bug 167871 - RegEx search and replace inserts literal $1 instead of match
Summary: RegEx search and replace inserts literal $1 instead of match
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 24.2.4.2 release (earliest affected)
Hardware: All Linux (All)
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Find&Replace-Regex
Reported: 2025-08-08 15:35 UTC by R H
Modified: 2025-09-22 13:49 UTC (History)
CC List: 2 users

See Also:
Crash report or crash signature:


Attachments
Find.and.Replace.-.Regular.Expression.-.SOFT.HYPHEN.-.U+00AD.odt (12.84 KB, application/vnd.oasis.opendocument.text)
2025-09-19 16:35 UTC, Tex2002ans

Description R H 2025-08-08 15:35:03 UTC
Description:
I have a text like
…Dann müssen wir "an­täuschen", dass der…

I use a regex search/replace to change "an­täuschen" into „antäuschen“ like so:

Find: \b"([:alpha:]+)"\b
Replace: „$1“

The replacement yields

…Dann müssen wir „$1“, dass der…

The word also has a conditional hyphen (soft hyphen, U+00AD) between n and t.
Regular expressions are enabled, as is the diacritic-sensitive option.

Steps to Reproduce:
1. Write text like the example in the description.
2. Enter the search and replace patterns; check "Regular expressions" and "Diacritic-sensitive".
3. Click "Find Next", then "Replace".

Actual Results:
replacement is „$1“

Expected Results:
„antäuschen“


Reproducible: Always


User Profile Reset: No

Additional Info:
Will try the latest version, too.
Comment 1 R H 2025-08-08 15:56:53 UTC
Behaviour seems to depend on the conditional hyphen. 

However, \b"([:alpha:]+[:cntrl:]*[:alpha:]+)"\b doesn’t help, either.

Also, since „Replace“ jumps to the next match, I can’t see if the replacement works as expected.
Comment 2 R H 2025-08-08 15:57:35 UTC
Happens in 25.2.5.2, too.
Comment 3 fpy 2025-08-09 01:11:26 UTC
Note also that a soft hyphen is not matched by the any-character regex (".").
Comment 4 Tex2002ans 2025-09-19 16:35:28 UTC
Created attachment 202907
Find.and.Replace.-.Regular.Expression.-.SOFT.HYPHEN.-.U+00AD.odt

I attached a sample document with 3 examples:

- no hyphen
- HYPHEN-MINUS
- SOFT HYPHEN
--- Comment 0's bug definitely happens in the 3rd sentence!

- - -

I confirm this happens in:

Version: 25.8.1.1 (X86_64)
Build ID: 54047653041915e595ad4e45cccea684809c77b5
CPU threads: 8; OS: Windows 11 X86_64 (build 22631); UI render: Skia/Vulkan; VCL: win
Locale: en-US (en_US); UI: en-US
Calc: threaded

... but it looks like the:

- SOFT HYPHEN (U+00AD)

is handled strangely.

Currently, LO can "find" the text, but does not seem to "capture" it into a regex Group.

But I think the root cause of this bug is...

If "Regular Expression" mode is ON:

- `[:alpha:]` SHOULD NOT match the SOFT HYPHEN character.
- `[:alpha:]` SHOULD ONLY match alphabetic characters.
- SOFT HYPHEN should be treated as a...
--- "punctuation mark", roughly equivalent to "a HYPHEN" (U+002D)!
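
For reference, here is a small Python sketch of how the SOFT HYPHEN is classified outside LibreOffice. It only illustrates the expectation above and makes no claim about LO's regex engine. The `[^\W\d_]` class is a common Python stand-in for "letters only" and is an assumption of this sketch, not LO syntax. Note that Unicode assigns U+00AD the general category "Cf" (format), so strictly speaking it is neither a letter nor punctuation.

```python
import re
import unicodedata

SOFT_HYPHEN = "\u00ad"

# Unicode general category of U+00AD: "Cf" (format character).
category = unicodedata.category(SOFT_HYPHEN)

# Python does not consider the soft hyphen alphabetic.
is_alpha = SOFT_HYPHEN.isalpha()

# Stand-in for a "letters only" class (assumption of this sketch, not LO syntax).
LETTERS = re.compile(r"[^\W\d_]+")

plain = "antäuschen"
with_shy = "an" + SOFT_HYPHEN + "täuschen"

# The whole word is letters only without the soft hyphen, but not with it.
plain_is_letters = LETTERS.fullmatch(plain) is not None
shy_is_letters = LETTERS.fullmatch(with_shy) is not None

print(category, is_alpha, plain_is_letters, shy_is_letters)
```

Under these assumptions, a letters-only class would be expected to stop at the soft hyphen rather than match across it.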

- - -

STEPS TO REPRODUCE

0. Open attached document.

1. Edit > Find and Replace (Ctrl+H).

2. Expand "Other Options", then make sure these 2 checkboxes are ON:

- Regular Expressions
- Diacritic-sensitive

3. In the 2 boxes, type:

- Find: \b"([:alpha:]+)"\b
- Replace: „$1“

4. Press the "Replace All" button.

ACTUAL

After pressing "Find All" and/or "Replace All":

- 2 hits
--- 1st line turned into „antäuschen“
--- 3rd line turned into „$1“
----- = BUG

EXPECTED

After pressing "Find All" and/or "Replace All":

- 1 hit
--- 1st line turned into „antäuschen“
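
The expected result can be sketched with Python's `re` module. This is only a rough analogue: the three sentences are hypothetical stand-ins for the attachment's content, `[^\W\d_]` stands in for `[:alpha:]`, and the `\b` anchors are dropped because Python's `\b` does not match between a space and a quotation mark.

```python
import re

SHY = "\u00ad"  # SOFT HYPHEN

# Hypothetical stand-ins for the three sentences in the attached document.
lines = [
    'Dann müssen wir "antäuschen", dass der',              # no hyphen
    'Dann müssen wir "an-täuschen", dass der',             # HYPHEN-MINUS
    'Dann müssen wir "an' + SHY + 'täuschen", dass der',   # SOFT HYPHEN
]

# Quoted run of letters only; the capture group holds the word.
PATTERN = re.compile(r'"([^\W\d_]+)"')

# subn returns (new_text, number_of_replacements) for each line.
results = [PATTERN.subn(r"„\1“", line) for line in lines]
hits = sum(count for _, count in results)

for new_text, count in results:
    print(count, new_text)
```

In this sketch only the first sentence is replaced, and the capture group is substituted rather than inserted as a literal `$1` or `\1`.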

- - -

NOTES on Comment 3:

Hmmm... very strange.

I can get the SOFT HYPHEN to match with a period.

For example, do Step 3:

- Find: \b"(.+?)"\b
- Replace: „$1“

and this will retain the SOFT HYPHEN + any inner text, while flipping the quotes.
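
For comparison, the same lazy any-character pattern behaves this way in Python's `re` module (again only an analogue, with a hypothetical sample sentence and without the `\b` anchors, which Python would not match next to the quotation marks):

```python
import re

SHY = "\u00ad"  # SOFT HYPHEN

# Hypothetical sample sentence with a soft hyphen between "n" and "t".
text = 'Dann müssen wir "an' + SHY + 'täuschen", dass der'

# "." matches any character except a newline, including U+00AD.
replaced = re.sub(r'"(.+?)"', r"„\1“", text)

print(repr(replaced))
```

Here the soft hyphen ends up inside the capture group and survives the replacement, which matches the behaviour described above.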

But if you do this:

- Find: \b"an.
- Replace: „ZZZ

LO will act like the SOFT HYPHEN isn't even there and match/replace both:

- "ant
   - 4 characters
- "an-t
   - 5 characters
   - '-' = invisible SOFT HYPHEN position
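
One way to picture the 4-versus-5-character result is a search that matches against a copy of the text with soft hyphens removed and then maps the hit back to the original offsets. The Python sketch below is purely hypothetical: `find_ignoring_shy` is an illustrative helper, not LibreOffice code, and nothing in this report confirms that LO works this way.

```python
import re

SHY = "\u00ad"  # SOFT HYPHEN


def find_ignoring_shy(pattern, text):
    """Hypothetical helper: match on text without soft hyphens,
    then return the corresponding slice of the original text."""
    # index_map[i] is the offset in `text` of the i-th kept character.
    index_map = [i for i, ch in enumerate(text) if ch != SHY]
    stripped = "".join(text[i] for i in index_map)
    m = re.search(pattern, stripped)
    if m is None or m.start() == m.end():
        return None
    start = index_map[m.start()]
    end = index_map[m.end() - 1] + 1
    return text[start:end]


plain = '"antäuschen"'
with_shy = '"an' + SHY + 'täuschen"'

# A literal engine lets "." consume the soft hyphen itself.
literal_plain = re.search(r'"an.', plain).group()
literal_shy = re.search(r'"an.', with_shy).group()

# The skipping variant behaves as if the soft hyphen were not there.
skipping_plain = find_ignoring_shy(r'"an.', plain)
skipping_shy = find_ignoring_shy(r'"an.', with_shy)

print(len(literal_shy), len(skipping_plain), len(skipping_shy))
```

In this sketch the skipping variant returns 4 characters for the plain word and 5 for the soft-hyphen word, the same spans reported above, whereas a literal engine would stop right after the soft hyphen.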

Hmmmm... so something weird is definitely going on with the SOFT HYPHEN and regex.

It could be because SOFT HYPHEN is a weirdly unique character, acting as "punctuation" AND "a format code" while also being "invisible" at the same time.