Description:
I have a text like

…Dann müssen wir "antäuschen", dass der…

I use a regex search/replace to change "antäuschen" into „antäuschen“, like so:

Find: \b"([:alpha:]+)"\b
Replace: „$1“

The replacement yields

…Dann müssen wir „$1“, dass der…

The word also has a conditional hyphen between n and t. Regex is enabled, as is diacritic-sensitive.

Steps to Reproduce:
1. Write text like in the description.
2. Enter the search and replace patterns; check regex and diacritic-sensitive.
3. Find next, then replace.

Actual Results:
The replacement is „$1“.

Expected Results:
„antäuschen“

Reproducible: Always

User Profile Reset: No

Additional Info:
Will try the latest version, too.
Behaviour seems to depend on the conditional hyphen. However, \b"([:alpha:]+[:cntrl:]*[:alpha:]+)"\b doesn’t help, either. Also, since „Replace“ jumps to the next match, I can’t see if the replacement works as expected.
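One possible reason the [:cntrl:] variant does not help is that the soft hyphen is not a control character in Unicode terms. The sketch below uses Python's standard unicodedata module (not LibreOffice's regex engine) to print the general category of the characters involved. It assumes that [:cntrl:] corresponds to general category Cc, as in the usual Unicode mapping of POSIX classes; whether LibreOffice applies exactly that mapping is not confirmed here. The helper name describe is made up for illustration.

```python
import unicodedata


def describe(ch):
    """Return (code point, general category, Unicode name) for one character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.category(ch),
        unicodedata.name(ch, "<unnamed>"),
    )


# A letter, the ordinary hyphen, and the soft (conditional) hyphen.
for ch in ("t", "-", "\u00ad"):
    print(describe(ch))

# ('U+0074', 'Ll', 'LATIN SMALL LETTER T')
# ('U+002D', 'Pd', 'HYPHEN-MINUS')
# ('U+00AD', 'Cf', 'SOFT HYPHEN')
```

U+00AD is in category Cf (format), so a class restricted to Cc would not be expected to match it.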
Happens in 25.2.5.2, too.
Note also that a soft hyphen is not matched by the any-character regex (".").
Created attachment 202907 [details]
Find.and.Replace.-.Regular.Expression.-.SOFT.HYPHEN.-.U+00AD.odt

I attached a sample document with 3 examples:

- no hyphen
- HYPHEN-MINUS
- SOFT HYPHEN
  - Comment 0's bug definitely happens in the 3rd sentence!

- - -

I confirm this happens in:

Version: 25.8.1.1 (X86_64)
Build ID: 54047653041915e595ad4e45cccea684809c77b5
CPU threads: 8; OS: Windows 11 X86_64 (build 22631); UI render: Skia/Vulkan; VCL: win
Locale: en-US (en_US); UI: en-US
Calc: threaded

... but it looks like the SOFT HYPHEN (U+00AD) is handled strangely.

Currently, LO can "find" the text, but does not seem to "capture" it into a regex group.

But I think the root cause of this bug is...

If "Regular Expression" mode is ON:

- `[:alpha:]` SHOULD NOT match the SOFT HYPHEN character.
- `[:alpha:]` SHOULD ONLY match alphabetic characters.
- SOFT HYPHEN should be treated as a...
  - "punctuation mark", roughly equivalent to "a HYPHEN" (U+002D)!

- - -

STEPS TO REPRODUCE

0. Open attached document.
1. Edit > Find and Replace (Ctrl+H).
2. Expand "Other Options", then make sure these 2 checkboxes are ON:
   - Regular Expressions
   - Diacritic-sensitive
3. In the 2 boxes, type:
   - Find: \b"([:alpha:]+)"\b
   - Replace: „$1“
4. Press the "Replace All" button.

ACTUAL

After pressing "Find All" and/or "Replace All":

- 2 hits
  - 1st line turned into „antäuschen“
  - 3rd line turned into „$1“ (= BUG)

EXPECTED

After pressing "Find All" and/or "Replace All":

- 1 hit
  - 1st line turned into „antäuschen“

- - -

NOTES on Comment 3:

Hmmm... very strange. I can get the SOFT HYPHEN to match with a period.

For example, do Step 3 with:

- Find: \b"(.+?)"\b
- Replace: „$1“

and this will retain the SOFT HYPHEN + any inner text, while flipping the quotes.

But if you do this:

- Find: \b"an.
- Replace: „ZZZ

LO will act like the SOFT HYPHEN isn't even there and match/replace both:

- "ant
  - 4 characters
- "an-t
  - 5 characters
  - '-' = invisible SOFT HYPHEN position

Hmmmm... so something weird is definitely going on with the SOFT HYPHEN and regex. It could be because SOFT HYPHEN is a weirdly unique character, acting as "punctuation" AND "a format code" AND is "invisible" at the same time.
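For comparison, here is a minimal sketch of the EXPECTED behaviour described above, written with Python's re module rather than LibreOffice's regex engine, so it says nothing about where the LO bug actually lives. The function names requote_letters and requote_any are made up for illustration. `[^\W\d_]` stands in for "letters only", and the \b anchors are dropped because Python's \b only matches between a word character and a non-word character, which is never the case between a space and a quote.

```python
import re

SOFT_HYPHEN = "\u00ad"


def requote_letters(text):
    """Swap "word" for „word“ only when the quoted text is letters only."""
    return re.sub(r'"([^\W\d_]+)"', r"„\1“", text)


def requote_any(text):
    """Swap quotes around any non-empty quoted text (non-greedy)."""
    return re.sub(r'"(.+?)"', r"„\1“", text)


plain = 'Dann müssen wir "antäuschen", dass der'
soft = f'Dann müssen wir "an{SOFT_HYPHEN}täuschen", dass der'

# Letters only: the plain word is requoted, the soft-hyphen word is left alone.
print(requote_letters(plain) == 'Dann müssen wir „antäuschen“, dass der')
print(requote_letters(soft) == soft)

# Any character: the soft hyphen is captured and kept in the replacement.
print(requote_any(soft) == f'Dann müssen wir „an{SOFT_HYPHEN}täuschen“, dass der')
```

In this engine the letters-only pattern yields one hit across the two sentences, and the captured group is always substituted, never a literal $1 or \1.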