As someone working with gothic fonts I want to be able to search for *exactly* "long s" (letter ſ) without finding "s" and "ß" in order to be able to quickly find wrong occurrences of this letter, e.g. at the end of a word.
How to reproduce:
1. Enter the following text in an empty document:
"Ich ſelbſt kann daſ Lied, das ich mit Maß ſinge."
Note that the text contains several "long s" characters, as used in old German texts. Make sure you use an OpenType font that supports this letter (e.g. "Unifraktur Maguntia").
2. Try to automatically replace wrong occurrences of "ſ" at the end of a word by entering the following into the Search/replace dialog:
Find: "ſ "
Replace: "s "
You may then set the search to be case sensitive.
- Expected: the search finds (and replaces) "ſ " in the fourth word only.
- Actual: the search finds "ſ ", "s " and even "ß " and, if run automatically, would replace every "ß " with "s ", which is not intended.
I have pondered the help but found no indication that this "automagical" detection of similar letters can be turned off. Using regular expressions does not help.
On Windows 10 Pro 64-bit en-US with
Version: 126.96.36.199 (x64)
Build ID: f99d75f39f1c57ebdd7ffc5f42867c12031db97a
CPU Threads: 8; OS Version: Windows 6.19; UI Render: GL;
Locale: en-US (en_US)
Works for me when searching for U+017f<space> with "Match Case" checked, as well as with both "Regular Expression" and "Match Case" checked and searching for the regex "\u017f\b".
Switching on "Match case" seems indeed to have the desired effect.
(Using regex or not does not make any difference.)
It would be good if this were documented on the corresponding help page, since imho the current behaviour is not obvious to users. Should we leave this ticket open and possibly rename it?
The regex "\b" allows you to match at the end of a word, including the last word of a paragraph, which is what you asked for.
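To illustrate the "\u017f\b" pattern outside the dialog, here is a minimal sketch using Python's `re` module. It is not how LibreOffice performs the search; it only assumes that a word boundary behaves comparably for this sample sentence. The function name `fix_final_long_s` is made up for the example. Python's `re` is case sensitive by default, which corresponds to having "Match case" checked.

```python
import re

# Sample sentence from the report; only "daſ" has a long s at the end of a word.
SAMPLE = "Ich ſelbſt kann daſ Lied, das ich mit Maß ſinge."

# \u017f is the long s; \b asserts a word boundary directly after it,
# so the pattern also matches at the very end of a paragraph.
LONG_S_AT_WORD_END = re.compile(r"\u017f\b")


def fix_final_long_s(text: str) -> str:
    """Replace a long s at the end of a word with a round s (hypothetical helper)."""
    return LONG_S_AT_WORD_END.sub("s", text)


if __name__ == "__main__":
    print(fix_final_long_s(SAMPLE))
```

The long s inside "ſelbſt" and at the start of "ſinge" is left alone because it is followed by another letter, and "Maß" is untouched because the pattern names only U+017F.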
The "Match case":
"Distinguishes between uppercase and lowercase characters."
So yes, there is some strange logic regarding the lowercase Long S (ſ U+017F), the Small Letter S (s U+0073), the Capital Letter S (S U+0053), the Sharp S (ß U+00DF) and its uppercase variant (ẞ U+1E9E). But checking "Match case" allows you to identify the specific glyph you need without that interfering.
What would you change the help to read, that would be more informative?
On the page "Suchen & Ersetzen" (engl. "Search & Replace") I suggest:
Unterscheidet zwischen Groß- und Kleinbuchstaben. Ist diese Option angehakt, wird auch nicht nach bestimmten Varianten von Buchstaben gesucht (etwa s, ß und langes s, wenn "s" eingegeben wurde), sondern nur nach den exakten Buchstaben.
Suggested English text:
... If enabled, this also disables the search for certain letter variants (such as s, ß, and long s when "s" is entered).
Note that I have the German version of the help, so if the English differs and already contains this hint, just ignore my remark.
It is interesting to see that other letter variants like ŝ or š are not found, so I wonder if there is a documentation of letters considered "equivalent" somewhere?
P.S. Yes, your idea with regex "\b" is good. I had overlooked this one.
S is uppercase of ſ, and SS of ß, so there is no way to distinguish them except by case.
I believe this is related to how Unicode deals with case mapping:
The case of sharp S is defined in:
The most likely explanation is that "Match case" does not use case matching while searching for "s". If the option is disabled, all case matches are considered.
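The case-mapping explanation can be tried out with a small Python sketch. It relies on Python's built-in Unicode case folding (`str.casefold`) as a stand-in for a case-insensitive search; this is an assumption for illustration and says nothing about LibreOffice's actual implementation. The helper `count_matches` is made up for the example.

```python
# Sample sentence from the report.
SAMPLE = "Ich ſelbſt kann daſ Lied, das ich mit Maß ſinge."


def count_matches(text: str, needle: str, match_case: bool) -> int:
    """Count occurrences of needle; without match_case, compare case-folded forms."""
    if match_case:
        return text.count(needle)
    # Case folding maps S, s and ſ to "s", and ß and ẞ to "ss".
    return text.casefold().count(needle.casefold())


if __name__ == "__main__":
    # Show how each letter behaves under the standard case operations.
    for ch in ["s", "S", "ſ", "ß", "ẞ", "ŝ", "š"]:
        print(f"U+{ord(ch):04X} upper={ch.upper()!r} "
              f"lower={ch.lower()!r} casefold={ch.casefold()!r}")

    print(count_matches(SAMPLE, "ſ ", match_case=True))
    print(count_matches(SAMPLE, "ſ ", match_case=False))
```

With case matching on, only "daſ " is counted. With it off, "daſ ", "das " and "Maß " all count, which mirrors the reported behaviour. Letters such as ŝ or š fold to themselves, so under this model they are not treated as variants of "s".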
The current help page for LO 7.3 has not been updated yet.
Rafael Lima committed a patch related to this issue.
It has been pushed to "master":
tdf#100480 Clarify the use of "Match case"