Bug 143128 - Searching for German sharp s / eszett (ß,ẞ) provides wrong results
Summary: Searching for German sharp s / eszett (ß,ẞ) provides wrong results
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): Inherited From OOo
Hardware: All (platform), All (OS)
Importance: medium normal
Assignee: Eike Rathke
URL:
Whiteboard: target:7.3.0 target:7.2.5
Keywords:
Depends on:
Blocks:
 
Reported: 2021-06-30 10:13 UTC by Stephan
Modified: 2021-12-06 13:28 UTC (History)
CC List: 5 users

See Also:
Crash report or crash signature:


Attachments
Example (41.76 KB, image/png)
2021-11-09 08:40 UTC, Heiko Tietze

Description Stephan 2021-06-30 10:13:13 UTC
Description:
The search function of LibreOffice Writer shows results that do not match the search string. Searching for "ä" matches every "a" in the text, and searching for "ß" matches every "ss" in the text, which makes it unusable for German.


Actual Results:
Searching for "ä" matches every "a" in the text, and searching for "ß" matches every "ss" in the text, which makes it unusable for German.

Expected Results:
Searching for "ä" should only show search results with "ä" and so on.



Reproducible: Always


User Profile Reset: No



Additional Info:
Comment 1 Christian Lohmaier 2021-06-30 10:29:09 UTC
Notabug.

ß also matching ss is intentional and is an old feature. If you don't want that, you need to enable case-sensitive matching.

Allowing umlauts to also match the base characters is a relatively new feature (but is not the default). Check the "Diakritisch-sensitiv" ("Diacritic-sensitive") option in the search dialog to only find ä / ö.
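
For illustration only, here is a minimal Python sketch of what a diacritic-sensitive toggle does conceptually. It is not LibreOffice's implementation; the helper names strip_diacritics and matches are hypothetical. It assumes that diacritic-insensitive matching can be modelled by decomposing the text (Unicode NFD) and dropping combining marks.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose to NFD and drop combining marks (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches(needle: str, haystack: str, diacritic_sensitive: bool) -> bool:
    """Substring match, optionally ignoring diacritical marks."""
    if not diacritic_sensitive:
        needle = strip_diacritics(needle)
        haystack = strip_diacritics(haystack)
    return needle in haystack


print(matches("ä", "Kabel", diacritic_sensitive=False))  # True: "ä" is reduced to "a"
print(matches("ä", "Kabel", diacritic_sensitive=True))   # False: only a real "ä" counts
print(strip_diacritics("ß") == "ß")                      # True: ß carries no combining mark
```

Under this model ß is left untouched, because it has no canonical decomposition into a base letter plus a mark.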
Comment 2 Stephan 2021-06-30 14:56:41 UTC
ä, ö, ü
"Check the "Diakritisch-sensitiv" option in the search dialog to only find ä / ö."

This works, thank you.

but:

ß
"ß also matching ss is intentional [...] if you don't want that, you need to enable case-sensitive matching"

This is not good. When I have to activate case-sensitive matching to get only the real "ß", I miss all hits of a word at the beginning of a sentence: "daß" will be found, "Daß" will not. Searching for the isolated letter "ß" is not practicable either, especially in longer texts, since in many words it is correct.

Why not activate matching for "ß" with the button "diakritisch-sensitiv", too? Since there is no capital ß in typography, the letter ß is never case-sensitive...
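
The trade-off described in this comment can be sketched in Python. This is an illustration under stated assumptions, not Writer's search code: find_words, TEXT and the three mode names are hypothetical, and the "simple-lower" mode only models the behaviour asked for here (ignore case, but keep ß distinct from ss).

```python
import re

TEXT = "Daß er kam, daß sie ging, dass es regnet."


def find_words(needle: str, text: str, mode: str) -> list[str]:
    """Return the whole words in `text` that match `needle` in the given mode."""
    words = re.findall(r"\w+", text)
    if mode == "case-sensitive":
        # Exact comparison: misses the capitalised word at the sentence start.
        return [w for w in words if w == needle]
    if mode == "full-fold":
        # Full Unicode case folding: ß folds to "ss", so "dass" matches too.
        return [w for w in words if w.casefold() == needle.casefold()]
    if mode == "simple-lower":
        # Hypothetical mode: lowercasing keeps ß as ß, so "dass" stays distinct.
        return [w for w in words if w.lower() == needle.lower()]
    raise ValueError(f"unknown mode: {mode}")


print(find_words("daß", TEXT, "case-sensitive"))  # ['daß']
print(find_words("daß", TEXT, "full-fold"))       # ['Daß', 'daß', 'dass']
print(find_words("daß", TEXT, "simple-lower"))    # ['Daß', 'daß']
```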
Comment 3 Peter Wiegel 2021-07-14 09:04:21 UTC
That's wrong, there is an uppercase ß in typography, at Unicode code point U+1E9E.

The international standard associated with Unicode (UCS), ISO/IEC 10646, was updated to reflect the addition on 24 June 2008. The capital ß (ẞ) was finally adopted as an option in standard German orthography in 2017.

Therefore case-sensitive search will not be suitable anymore.
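
As a quick check of the code points involved, the following Python snippet prints the Unicode names and the default case mappings of both letters. It uses only the standard library; the constant names SMALL and CAPITAL are chosen here for illustration.

```python
import unicodedata

SMALL = "\u00df"    # ß
CAPITAL = "\u1e9e"  # ẞ

print(unicodedata.name(SMALL))    # LATIN SMALL LETTER SHARP S
print(unicodedata.name(CAPITAL))  # LATIN CAPITAL LETTER SHARP S

# The default mappings are asymmetric: ß uppercases to "SS", not to ẞ,
# while ẞ lowercases to ß.
print(SMALL.upper())    # SS
print(CAPITAL.lower())  # ß
```

Because of this asymmetry, a round trip through upper() and lower() turns ß into "ss" rather than back into ß.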
Comment 4 Stephan 2021-07-14 13:33:21 UTC
Have you ever tried to type an uppercase "ß" in LibreOffice Writer?
Have you ever known anybody who did so in their text?

So, at least those who know ISO/IEC 10646 will find "ß" via Writer.

Impractical.
Comment 5 Heiko Tietze 2021-11-09 08:40:59 UTC
Created attachment 176159: Example

Both ß and ss are found unless match case is checked.
Comment 6 Heiko Tietze 2021-11-09 08:42:07 UTC
Eike, is there a reason why ß/ss is not distinguished by the diacritic-sensitive option?
Comment 7 Eike Rathke 2021-11-09 17:33:07 UTC
ß/ss has *nothing* to do with diacritics / diacritical marks, only with case-(in)sensitivity and replacements. While the uppercase ẞ exists in German orthography (btw, it does not in de-CH, Switzerland), the uppercase replacement ⟨SS⟩ and even the lowercase ⟨ss⟩ are still commonly used, especially if the letter ⟨ß⟩ is not available, and the uppercase ⟨ẞ⟩ is used even less. Additionally, in Swiss Standard German "⟨ss⟩ usually replaces every ⟨ß⟩". Also, many software applications convert a lowercase ⟨ß⟩ to uppercase ⟨SS⟩.

Finding those variants when searching case-insensitively seems logical to me.

However, not finding uppercase ẞ when searching for lowercase ß case-insensitively, and vice versa, is a bug.
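
The expected behaviour can be illustrated with full Unicode case folding, here via Python's str.casefold(). This is only a sketch of the concept; LibreOffice uses its own transliteration code, and the helper name equal_ignoring_case is hypothetical.

```python
from itertools import combinations


def equal_ignoring_case(a: str, b: str) -> bool:
    """Compare two strings under full Unicode case folding."""
    return a.casefold() == b.casefold()


variants = ["ß", "ẞ", "ss", "SS"]

# Every pair of variants compares equal, including lowercase ß vs uppercase ẞ.
for a, b in combinations(variants, 2):
    print(a, b, equal_ignoring_case(a, b))

print(equal_ignoring_case("Straße", "STRASSE"))  # True
print(equal_ignoring_case("Maße", "Masse"))      # True, although these are different words
```

The last line shows the cost of folding ß to ss: distinct words such as "Maße" and "Masse" can no longer be told apart in a case-insensitive search.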
Comment 8 Eike Rathke 2021-11-09 17:38:12 UTC
For "replacements" read "transliteration equivalents".
Comment 9 Commit Notification 2021-11-11 00:28:29 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/07a2afa4904ac51c9c61aaf41a9d6c7d41126531

Resolves: tdf#110003 tdf#143128 handle lowercase ß vs uppercase ẞ folding

It will be available in 7.3.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 10 Commit Notification 2021-11-11 09:40:33 UTC
Xisco Fauli committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/9f8e4bd8501e9bb9e286cffed5f35d0f0075e9b8

tdf#143128: sw: Add UItest

It will be available in 7.3.0.

Comment 11 Commit Notification 2021-11-11 14:30:55 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "libreoffice-7-2":

https://git.libreoffice.org/core/commit/b7c707223cb44f9423294295ac5d04cc1e2314a2

Resolves: tdf#110003 tdf#143128 handle lowercase ß vs uppercase ẞ folding

It will be available in 7.2.4.

Comment 12 Christian Lohmaier 2021-12-06 13:28:48 UTC
7.2.4 was a hotfix release, updating target in status-whiteboard