111846 – Find & Replace: Rename diacritics and kashida search options

Summary: Find & Replace: Rename diacritics and kashida search options

Status:	RESOLVED FIXED

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	LibreOffice
Version: (earliest affected)	5.4.0.3 release
Hardware:	All All

Importance:	medium normal
Assignee:	Not Assigned

URL:
Whiteboard:	target:6.0.0
Keywords:

Depends on:
Blocks:	Find&Replace-Dialog

Reported:	2017-08-16 07:56 UTC by Thomas Lendo
Modified:	2023-01-17 14:22 UTC
CC List:	7 users

See Also:	116242 116835 98544 153062
Crash report or crash signature:

Description Thomas Lendo 2017-08-16 07:56:59 UTC

The terms "ignore diacritics" and "ignore kashida" should be renamed in the Find & Replace dialog window.

First, the UI should use a "positive" description rather than a negative one wherever possible, because positive wording is easier for the user to read. For example, "include diacritics" would be understood faster than "ignore diacritics", since it matches the user's wish to find (positive connotation) something.

Second, all extended options ("other options") would then have no check mark by default. This would unify the visual appearance of these options, and the user could easily see that a check mark means a user-defined change.

Comment 1 V Stuart Foote 2017-08-16 13:57:07 UTC

+1, but not just a simple string change. Won't the search logic also need to be toggled?

Comment 2 Thomas Lendo 2017-08-23 20:13:54 UTC

(In reply to V Stuart Foote from comment #1)
> +1, but not just a simple string change. Won't the search logic also need to
> be toggled?
You're right, the logic must be toggled so that "not checkmarked" is the default for these two options.

Comment 3 Heiko Tietze 2017-09-21 13:18:42 UTC

Sounds like an easyhack.

Comment 4 Heiko Tietze 2017-10-02 06:51:13 UTC

Jim, as you were successful with bug 112437, you may want to continue here. It should be even simpler. Feel free to ask about UX/design at #libreoffice-dev (htietze) or regarding development on #libreoffice-dev.

PS: Still in assigned state, is there more work to be done on the other issue?

Comment 5 Jim Raykowski 2017-10-03 01:10:39 UTC

Heiko,

I have made the changes and would like to test them before committing but do not have a document that contains diacritics and kashida. Could you or someone supply a test document?

Comment 6 Heiko Tietze 2017-10-03 05:38:41 UTC

(In reply to Jim Raykowski from comment #5)
> I have made the changes and would like to test them before committing but do
> not have a document that contains diacritics and kashida. Could you or
> someone supply a test document?

Great! (Let me check for more tasks if you are so fast *g*)

A simple document wouldn't help much when you want to search. Diacritics are easy: use the French accent in á (press ' first then a, or just copy/paste). Another character is the circumflex in ê (^ + e). In a sentence containing "Hêlló Wôèrld", a search for the same text without the accents should match only when [x] Ignore Diacritics is checked (or the equivalent option after your changes), and not when it is unchecked.
https://en.wikipedia.org/wiki/Diacritic

More difficult is the kashida. Wikipedia gives the example الحمد vs الحمــــــد (to me it looks like a straight line). When I enter the latter into a text and search for the former, it is found only when [x] Ignore Kashida is on.
https://en.wikipedia.org/wiki/Kashida
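
For anyone who wants to reproduce these two test cases outside the dialog, here is a minimal Python sketch. It is not LibreOffice's search code: the function names are made up for illustration, diacritics are approximated as Unicode combining marks (category Mn) after NFD decomposition, and kashida is assumed to be the single character U+0640 (ARABIC TATWEEL).

```python
import unicodedata

TATWEEL = "\u0640"  # ARABIC TATWEEL, assumed here to be the kashida character


def strip_diacritics(text: str) -> str:
    """Decompose to NFD and drop combining marks (category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def strip_kashida(text: str) -> str:
    """Remove every tatweel so stretched and plain spellings compare equal."""
    return text.replace(TATWEEL, "")


def found(needle: str, haystack: str, *, ignore_diacritics: bool = False,
          ignore_kashida: bool = False) -> bool:
    """Hypothetical stand-in for 'Find Next': plain substring search."""
    if ignore_diacritics:
        needle, haystack = strip_diacritics(needle), strip_diacritics(haystack)
    if ignore_kashida:
        needle, haystack = strip_kashida(needle), strip_kashida(haystack)
    return needle in haystack


# The Wikipedia word without and with six tatweels before the last letter.
PLAIN = "\u0627\u0644\u062d\u0645\u062f"
STRETCHED = "\u0627\u0644\u062d\u0645" + TATWEEL * 6 + "\u062f"

print(found("Hello", "Hêlló Wôèrld", ignore_diacritics=True))   # True
print(found("Hello", "Hêlló Wôèrld", ignore_diacritics=False))  # False
print(found(PLAIN, STRETCHED, ignore_kashida=True))             # True
print(found(PLAIN, STRETCHED, ignore_kashida=False))            # False
```

Letters that have no canonical decomposition are left untouched by this approach, so it only approximates what a real transliteration module does.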

Comment 7 Heiko Tietze 2017-10-03 05:40:48 UTC

(In reply to Heiko Tietze from comment #6)
> use the French <strike>accent</strike> in á...

"acute" of course

Comment 8 Julien Nabet 2017-10-03 07:00:09 UTC

nitpick: in French, "a" never takes an acute accent, only a grave accent: "à"
Idem for "u" -> "ù"
On "e", you can have both : "é" and "è" and even circumflex "ê"

Comment 9 Jim Raykowski 2017-10-03 08:52:58 UTC

Thanks for the tip on how to use the keyboard to enter diacritics. I actually used the insert special characters from the standard toolbar to do some testing and also found the insert->special character... in the menu. 

This task has given me appreciation for Ignore.

What follows is an attempt to relate my findings.

Using a test document with multiple occurrences of 'Atest' with and without a diacritic above the A.

-'Atest' with diacritic entered in the Find: edit box
--'Include diacritics' checked 
'Find Next' moves to the next occurrence of 'Atest' with diacritic and skips any 'Atest' without diacritic. 
--'Include diacritics' unchecked
'Find Next' moves to the next occurrence of 'Atest' with or without diacritic.

-'Atest' without diacritic entered in the Find: edit box
--'Include diacritics' checked
'Find Next' moves to the next occurrence of 'Atest' without diacritic and skips any 'Atest' with diacritic.
--'Include diacritics' unchecked
'Find Next' moves to the next occurrence of 'Atest' with or without diacritic.

It seems Include is not the logical opposite of Ignore. 
Looking forward to thoughts on this.

Comment 10 Heiko Tietze 2017-10-03 09:11:26 UTC

The request is to rename the labels and to invert the internal logic so that everything behaves as before. 

Current situation
[ ] Ignore diacritics: Hello is not found in "Héllo World"
[x] Ignore diacritics: Hello will be found in "Héllo World" 

Changed scenario
[x] Exact diacritics: Hello is not found in "Héllo World"
[ ] Exact diacritics: Hello will be found in "Héllo World"

(first option is the default)

I'm not a native speaker and perhaps someone has better ideas of 'use diacritics in the search'. Perhaps 'Consider diacritics'.

The same applies to kashida, where 'exact' sounds weird to me.
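
The "invert the internal logic so that everything behaves as before" part can be sketched as follows. This is only an illustration in Python, not the actual patch: `SearchFlags` and `flags_from_checkboxes` are hypothetical names, and the assumption is that the search engine keeps its existing "ignore" flags while the dialog shows the positively phrased checkboxes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchFlags:
    """Hypothetical internal flags, still phrased as 'ignore'."""
    ignore_diacritics: bool
    ignore_kashida: bool


def flags_from_checkboxes(diacritics_checked: bool,
                          kashida_checked: bool) -> SearchFlags:
    """Map the renamed (positive) checkboxes onto the old (negative) flags.

    A checked box means "respect this feature", which is the exact
    opposite of the old "ignore" flag, so each value is simply negated.
    """
    return SearchFlags(
        ignore_diacritics=not diacritics_checked,
        ignore_kashida=not kashida_checked,
    )


# New default: both boxes unchecked, which reproduces the old default
# of both "Ignore ..." boxes being checked.
print(flags_from_checkboxes(False, False))
```

With this mapping the engine itself does not change; only the meaning of the check mark in the dialog is flipped.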

Comment 11 Thomas Lendo 2017-10-03 09:59:44 UTC

(In reply to Heiko Tietze from comment #10)
> Current situation
> [ ] Ignore diacritics: Hello is not found in "Héllo World"
> [x] Ignore diacritics: Hello will be found in "Héllo World" <--- default
> 
> Changed scenario
> [x] Exact diacritics: Hello is not found in "Héllo World"
> [ ] Exact diacritics: Hello will be found in "Héllo World" <--- default
> 
> (first option is the default)
Just to clarify, the second line of each situation is the default.

Off topic:
Interestingly for me, the diacritics in German are the umlauts ö, ü, ä, but I never stumbled over this search option, because German has very few words that differ only in a diacritic, and where they do, the difference is mostly between the standard language and dialects (e.g. hupfen vs. hüpfen).

Comment 12 Jim Raykowski 2017-10-03 18:09:57 UTC

Heiko, your explanation of the requirements and your clear test cases are much appreciated.

As originally proposed by Thomas "Include" seems to be the correct word to use here.

Current behavior of changes made - 

[ ] Include diacritics: Hello is found in "Héllo World"
                        Hello is found in "Hello World"
[X] Include diacritics: Hello is not found in "Héllo World"
                        Hello is found in "Hello World" 

[ ] Include diacritics: Héllo is found in "Héllo World"
                        Héllo is found in "Hello World"                         
[X] Include diacritics: Héllo is found in "Héllo World"
                        Héllo is not found in "Hello World" 

Correct? 

Is this similar to what would be considered a unit test?
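
On the unit test question: the table above is close to what a table-driven test looks like. Below is a small Python sketch of that idea, not a LibreOffice test. `matches` is a hypothetical stand-in for the search, with diacritics approximated as Unicode combining marks, and the eight rows are copied from the table.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose to NFD and drop combining marks (category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def matches(needle: str, haystack: str, include_diacritics: bool) -> bool:
    """Hypothetical stand-in for the search behind the checkbox."""
    if include_diacritics:
        # Compare as typed, but in one normalization form (NFC) so that
        # precomposed and decomposed spellings of the same letter agree.
        needle = unicodedata.normalize("NFC", needle)
        haystack = unicodedata.normalize("NFC", haystack)
    else:
        needle, haystack = strip_diacritics(needle), strip_diacritics(haystack)
    return needle in haystack


# (include_diacritics, needle, haystack, expected) taken from the table.
CASES = [
    (False, "Hello", "Héllo World", True),
    (False, "Hello", "Hello World", True),
    (True,  "Hello", "Héllo World", False),
    (True,  "Hello", "Hello World", True),
    (False, "Héllo", "Héllo World", True),
    (False, "Héllo", "Hello World", True),
    (True,  "Héllo", "Héllo World", True),
    (True,  "Héllo", "Hello World", False),
]


def failing_cases():
    """Return every row whose actual result differs from the expected one."""
    return [case for case in CASES
            if matches(case[1], case[2], case[0]) != case[3]]


print(failing_cases())  # an empty list means all eight rows pass
```

A real unit test would run the same rows against the actual search code instead of this stand-in.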

Comment 13 Heiko Tietze 2017-10-03 18:52:24 UTC

(In reply to Jim Raykowski from comment #12)
> Correct? 

Yes, that's the current behavior. Because checkboxes should phrase the action in a positive way like "[x] Save on Close" instead of "[ ] Don't Save on Exit" (kind of double negation) the proposal was to rephrase and to invert the logic accordingly.

> Is this similar to what would be considered a unit test?

I'm not a programmer. Shinnok, is it?

Comment 14 Jim Raykowski 2017-10-03 22:18:33 UTC

link to commit

https://gerrit.libreoffice.org/#/c/43103/ 

Is there an easier way to change the commit message than to amend and commit again?

Comment 15 Heiko Tietze 2017-10-04 07:06:54 UTC Comment hidden (off-topic)

(In reply to Jim Raykowski from comment #14)
> Is there an easier way to change the commit message than to amend and commit
> again?

In Gerrit, click the first list item 'commit message' and go into edit mode by clicking the icon next of 'Patch set x'.

Comment 16 Khaled Hosny 2017-10-04 10:51:43 UTC

I don’t agree with the proposed change here or its rationale. “Ignore diacritics/kashida” is pretty clear, while “Include diacritics/kashida” is ambiguous, and it is not clear at all what kind of inclusion is supposed to happen. It is also pretty much a standard term: https://www.google.com.eg/search?q=%22ignore+diacritics%22&oq=%22ignore+diacritics%22, which cannot be said of the proposal here.

I think we are trying to fix a non-issue here.

Comment 17 Heiko Tietze 2017-10-04 11:25:31 UTC

(In reply to Khaled Hosny from comment #16)
> It is also pretty much a standard term:
> https://www.google.com.eg/
> search?q=%22ignore+diacritics%22&oq=%22ignore+diacritics%22 ...

The opposite search reveals a comparable number of results, the same for "diacritic-sensitive" vs. "diacritic-insensitive". 

Would you prefer "[x] case-insensitive" over "[ ] case-sensitive" (we use '[ ] Match case')? I'm 60/40. 

(The case-insensitive term could be an alternative to 'ignore' when we decide to keep the current logic.)

I've put the topic onto the design team agenda.

Comment 18 Khaled Hosny 2017-10-04 11:52:12 UTC

(In reply to Heiko Tietze from comment #17)
> (In reply to Khaled Hosny from comment #16)
> > It is also pretty much a standard term:
> > https://www.google.com.eg/
> > search?q=%22ignore+diacritics%22&oq=%22ignore+diacritics%22 ...
> 
> The opposite search reveals a comparable number of results, 

Searching for https://www.google.com.eg/search?q=%22include+diacritics%22 does not show anything relevant.

> the same for
> "diacritic-sensitive" vs. "diacritic-insensitive". 
> 
> Would you prefer "[x] case-insensitive" over "[ ] case-sensitive" (we use '[
> ] Match case')? I'm 60/40.

I think the -ve/+ve is a red herring; whatever term is popular should be used. Consistency just for the sake of it is meaningless.

Comment 19 Thomas Lendo 2017-10-04 18:06:23 UTC

The 2 main reasons are stated in the initial post. From the UX point of view it's worth discussing that and taking it into consideration.

Many search results in Google of "ignore diacritics" are developer-related. The search functionality of LibO is used by non-developers mostly. Also "include" is only a suggestion--a good and better term has to be found by English native speakers so that it will be understood immediately.

Comment 20 Khaled Hosny 2017-10-05 00:28:41 UTC

(In reply to Thomas Lendo from comment #19)
> The 2 main reasons are stated in the initial post. From the UX point of view
> it's worth discussing that and taking it into consideration.
> 
> Many search results in Google of "ignore diacritics" are developer-related.
> The search functionality of LibO is used by non-developers mostly. Also
> "include" is only a suggestion--a good and better term has to be found by
> English native speakers so that it will be understood immediately.

Inventing new jargon is unlikely to help users, unlike sticking to existing nomenclature. That is similar to the never-ending attempts to replace the floppy disk save icon with something modern just because someone thinks this will help users who never saw an actual floppy disk.

Comment 21 Michael Meeks 2017-10-05 12:04:59 UTC

When it comes to Kashida - Khaled's opinion has the huge up-side of
being from someone who actually uses that -a-lot- ;-) what with being
an expert in this area. I also don't believe that the term 'Kashida'
or 'Diacritic' are going to be instantly obvious to any native English
person (FWIW) - just my 2 cents =)

Comment 22 Heiko Tietze 2017-10-26 15:16:18 UTC

We discussed the topic in the design team and decided to use the patch. The double negation of using "[x] Ignore <foo>" to disable a function hurts usability. The suggestion is, however, to rename the options to "[ ] <Foo>-sensitive" to improve familiarity.

Comment 23 Commit Notification 2017-10-26 15:25:34 UTC

Jim Raykowski committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=d4064927a2e83c974d4ee9538081e8a4fcdb1e34

tdf#111846 Find & Replace: Rename diacritics and kashida options

It will be available in 6.0.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.

Comment 24 Adolfo Jayme Barrientos 2017-10-26 21:46:49 UTC

I don’t like the wording used in the patch. The XP XPS Viewer has “Include kashida” and Office 2010 has “Match kashida” [1], which for me are better wording choices.

[1] https://blogs.technet.microsoft.com/office_global_experience/2010/08/11/find-and-replace-for-the-arabic-script/

Comment 25 Heiko Tietze 2017-10-26 22:20:30 UTC

(In reply to Adolfo Jayme from comment #24)
> The XP XPS Viewer has “Include kashida” and Office 2010 has “Match kashida” [1], 
> which for me are better wording choices.

Include Diacritics/Kashida was the first proposal and rejected here. MSDN writes about Diacritics Sensitivity and IIRC some other big players too. 

(OT: This topic, like some others, was on the agenda for a couple of weeks and no one commented. It's better to discuss before the patch is submitted.)

Comment 26 V Stuart Foote 2018-03-06 20:31:29 UTC

Have a little glitch with this change, see bug 116242

Find & Replace searches now break unless the Diacritic-sensitive checkbox is checked, but that checkbox is hidden when no CTL language is enabled.

Comment 27 Eike Rathke (retired, only occasionally showing up) 2018-03-07 19:47:12 UTC

That's not just a little glitch, that's a crunchy bug, and also the default presets were wrongly chosen, clearly one normally does not want to ignore diacritics. Additionally, having Not-Diacritic-sensitive and Not-Kashida-sensitive as the implied defaults for every search adds a heavy extra performance penalty, specifically for ignore-diacritics: the transliteration has to decompose every text first to normalize diacritics.

Comment 28 Khaled Hosny 2018-03-07 20:44:16 UTC

(In reply to Eike Rathke from comment #27)
> That's not just a little glitch, that's a crunchy bug, and also the default
> presets were wrongly chosen, clearly one normally does not want to ignore
> diacritics.

Not in Arabic or in languages where diacritics are not parts of the letters (in Arabic خالد and خَالِدْ are the same word). It is just like case-insensitive search being the default.

> Additionally, having Not-Diacritic-sensitive and Not-Kashida-sensitive as
> the implied defaults for every search adds a heavy extra performance
> penalty, specifically for ignore-diacritics: the transliteration has to
> decompose every text first to normalize diacritics.

I didn’t notice any performance difference last I tried this, but I didn’t do any actual performance testing.
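
To make both points concrete, here is a rough Python sketch, not LibreOffice's transliteration. It assumes diacritics can be approximated as Unicode combining marks and uses a made-up `equal` helper. The two spellings of the name compare equal only after the marks are stripped, and that stripping is the extra pass over the text that the diacritic-sensitive path skips. It says nothing about how large the cost is in practice.

```python
import unicodedata

# The two spellings of the name mentioned above, written as escapes:
# without marks, and with fatha, kasra and sukun.
BARE = "\u062e\u0627\u0644\u062f"
VOCALIZED = "\u062e\u064e\u0627\u0644\u0650\u062f\u0652"


def strip_diacritics(text: str) -> str:
    """Extra pass: decompose to NFD, then drop combining marks (Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def equal(a: str, b: str, diacritic_sensitive: bool) -> bool:
    """Hypothetical comparison used by a search."""
    if diacritic_sensitive:
        return a == b  # direct comparison, no normalization pass
    return strip_diacritics(a) == strip_diacritics(b)


print(equal(BARE, VOCALIZED, diacritic_sensitive=True))   # False
print(equal(BARE, VOCALIZED, diacritic_sensitive=False))  # True
```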

Comment 29 Mihkel Tõnnov 2018-04-05 18:16:59 UTC

(In reply to Khaled Hosny from comment #28)
> (In reply to Eike Rathke from comment #27)
> > That's not just a little glitch, that's a crunchy bug, and also the default
> > presets were wrongly chosen, clearly one normally does not want to ignore
> > diacritics.
> 
> Not in Arabic or in languages where diacritics are not parts of the letters
> (in Arabic خالد and خَالِدْ are the same word). It is just like
> case-insensitive search being the default.

Could we perhaps have per-language/locale default settings for these two options, regardless of how they are phrased? There are quite big differences in the "status" of letters with diacritics (or "diacritics") also among languages that are written in Latin script.

On the one side, there are languages like English, French, and German, where ä, ö, ü, é/ê/è/ë etc. are considered variations of the "base" letter (so a/ä, o/ö, etc. are also collated together in dictionaries).

On the other side are languages like Estonian, Finnish, Icelandic, Swedish, Latvian, Hungarian, Polish, where ä, å, á, ā etc. are considered to be separate letters in their own right, and therefore shouldn't be ignored/merged during searching, at least not by default.

For instance in Estonian, treating a/ä, o/õ/ö, u/ü, s/š, z/ž as equivalent makes almost* zero sense - when searching for either of the words in pairs like laas/lääs, too/töö, loog/lõõg, sokk/šokk, the other one should not be matched. Similar principle applies in the other languages I mentioned, so the current default setting is completely counter-intuitive for many users.

* "Almost zero" because treating õ/ö as equal might make sense in historical texts (but that's a rather marginal use case), and treating z/ž as equal makes some sense because z/ž could only ever be confused in loanwords and foreign names, e.g. people might not know without a dictionary if the Croatian capital is called Zagreb or Žagreb. But that's minutiae.

...And then there are cases like Lithuanian, where ą is considered an independent letter, while ã/à are considered variants of a, and are mainly used in dictionaries to indicate stress/length.

I'll open a new enhancement request about this.
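
To illustrate what such a per-locale default could look like, here is a minimal Python sketch. Nothing like this exists in LibreOffice as far as this report shows. The function name and the table are hypothetical, and the language list simply encodes the examples given above using ISO 639-1 codes.

```python
# Languages named above where letters with diacritics count as separate
# letters, so the search should be diacritic-sensitive by default.
# Assumption: ISO 639-1 codes for Estonian, Finnish, Icelandic, Swedish,
# Latvian, Hungarian and Polish.
SENSITIVE_BY_DEFAULT = {"et", "fi", "is", "sv", "lv", "hu", "pl"}


def default_diacritic_sensitive(locale: str) -> bool:
    """Return the proposed default for a locale such as 'et-EE' or 'en_US'.

    Everything not listed (English, French, German, Arabic, ...) keeps the
    current behaviour, i.e. diacritics are ignored by default.
    """
    language = locale.replace("_", "-").split("-")[0].lower()
    return language in SENSITIVE_BY_DEFAULT


print(default_diacritic_sensitive("et-EE"))  # True
print(default_diacritic_sensitive("en_US"))  # False
# Lithuanian mixes both kinds of letters, so a single on/off default
# cannot express it; this sketch just falls back to the current default.
print(default_diacritic_sensitive("lt-LT"))  # False
```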