The terms "ignore diacritics" and "ignore kashida" in the Find & Replace dialog should be renamed, for two reasons. First, the UI should use a positive description rather than a negative one wherever possible, because positive wording is easier for the user to read. For example, "include diacritics" is understood faster than "ignore diacritics", since it matches the user's wish to find something (a positive action). Second, all extended options ("other options") would then have no check mark by default. This would unify the visual appearance of these options, and the user could easily see that a check mark means a user-defined change.
+1, but not just a simple string change. Won't the search logic also need to be toggled?
(In reply to V Stuart Foote from comment #1)
> +1, but not just a simple string change. Won't the search logic also need to
> be toggled?
You're right, the logic must be toggled so that "not checkmarked" is the default for these two options.
Sounds like an easyhack.
Jim, as you were successful with bug 112437 you may want to continue here. This one should be even simpler. Feel free to ask about UX/design at #libreoffice-dev (htietze) or regarding development on #libreoffice-dev. PS: That bug is still in assigned state; is there more work to be done on it?
Heiko, I have made the changes and would like to test them before committing but do not have a document that contains diacritics and kashida. Could you or someone supply a test document?
(In reply to Jim Raykowski from comment #5)
> I have made the changes and would like to test them before committing but do
> not have a document that contains diacritics and kashida. Could you or
> someone supply a test document?
Great! (Let me check for more tasks if you are so fast *g*)
A simple document wouldn't help much when you want to search. Diacritics are easy: use the French accent in á (press ' first then a, or just copy/paste). Another character is the circumflex in ê (^ + e). In a sentence containing "Hêlló Wôèrld", a search for "Hello World" should find it when [x] Ignore Diacritics is checked (or the equivalent setting after your changes), but not when it is unchecked. https://en.wikipedia.org/wiki/Diacritic
More difficult is the kashida. Wikipedia gives the example الحمد vs الحمــــــد (to me it looks like a straight line). When I enter the latter into a text and search for the former, it is found only when [x] Ignore Kashida is on. https://en.wikipedia.org/wiki/Kashida
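As a rough illustration of the kashida example above, here is a minimal Python sketch. It is not LibreOffice's implementation: it assumes the kashida is encoded as U+0640 ARABIC TATWEEL and that ignoring it amounts to removing that character from both the search term and the text. The names `strip_kashida` and `contains` are hypothetical.

```python
TATWEEL = "\u0640"  # ARABIC TATWEEL, the character used for kashida elongation

def strip_kashida(text: str) -> str:
    """Remove every tatweel character from the text (assumed meaning of 'ignore kashida')."""
    return text.replace(TATWEEL, "")

def contains(haystack: str, needle: str, ignore_kashida: bool) -> bool:
    """Plain substring search, optionally ignoring kashida on both sides."""
    if ignore_kashida:
        haystack = strip_kashida(haystack)
        needle = strip_kashida(needle)
    return needle in haystack

# The Wikipedia example: the same word without and with kashida.
plain = "\u0627\u0644\u062d\u0645\u062f"               # alef, lam, hah, meem, dal
stretched = plain[:4] + TATWEEL * 6 + plain[4:]        # kashida inserted before the last letter

print(contains(stretched, plain, ignore_kashida=True))   # found
print(contains(stretched, plain, ignore_kashida=False))  # not found
```

With kashida ignored, the stretched word collapses to the plain one and the search succeeds; without it, the tatweel characters break the substring match.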
(In reply to Heiko Tietze from comment #6)
> use the French <strike>accent</strike> in á...
"acute" of course
Nitpick: in French, "a" never takes an acute accent, only a grave accent: "à". Likewise for "u": "ù". On "e" you can have both, "é" and "è", and even a circumflex: "ê".
Thanks for the tip on how to use the keyboard to enter diacritics. I actually used Insert Special Characters from the Standard toolbar to do some testing, and also found Insert > Special Character... in the menu. This task has given me an appreciation for Ignore. What follows is an attempt to relate my findings, using a test document with multiple occurrences of 'Atest' with and without a diacritic above the A.
- 'Atest' with diacritic entered in the Find: edit box
-- 'Include diacritics' checked: 'Find Next' moves to the next occurrence of 'Atest' with diacritic and skips any 'Atest' without diacritic.
-- 'Include diacritics' unchecked: 'Find Next' moves to the next occurrence of 'Atest' with or without diacritic.
- 'Atest' without diacritic entered in the Find: edit box
-- 'Include diacritics' checked: 'Find Next' moves to the next occurrence of 'Atest' without diacritic and skips any 'Atest' with diacritic.
-- 'Include diacritics' unchecked: 'Find Next' moves to the next occurrence of 'Atest' with or without diacritic.
It seems Include is not the logical opposite of Ignore. Looking forward to thoughts on this.
The request is to rename the labels and to invert the internal logic so that everything behaves as before.
Current situation
[ ] Ignore diacritics: Hello is not found in "Héllo World"
[x] Ignore diacritics: Hello will be found in "Héllo World"
Changed scenario
[x] Exact diacritics: Hello is not found in "Héllo World"
[ ] Exact diacritics: Hello will be found in "Héllo World"
(first option is the default)
I'm not a native speaker, and perhaps someone has a better idea for expressing 'use diacritics in the search'. Perhaps 'Consider diacritics'. The same applies to kashida, where 'exact' sounds weird to me.
(In reply to Heiko Tietze from comment #10)
> Current situation
> [ ] Ignore diacritics: Hello is not found in "Héllo World"
> [x] Ignore diacritics: Hello will be found in "Héllo World" <--- default
>
> Changed scenario
> [x] Exact diacritics: Hello is not found in "Héllo World"
> [ ] Exact diacritics: Hello will be found in "Héllo World" <--- default
>
> (first option is the default)
Just to clarify, the second line of each situation is the default.
Off topic: Interesting for me is that in German the diacritics are the umlauts ö, ü, ä, but I never stumbled over this search option, because German has so few words whose only difference is a diacritic character, and where such a difference exists it is mostly between the standard language and dialects (e.g. hupfen vs. hüpfen).
Heiko, your explanation of requirements and clear test cases are much appreciated. As originally proposed by Thomas, "Include" seems to be the correct word to use here.
Current behavior of changes made -
[ ] Include diacritics:
Hello is found in "Héllo World"
Hello is found in "Hello World"
[X] Include diacritics:
Hello is not found in "Héllo World"
Hello is found in "Hello World"
[ ] Include diacritics:
Héllo is found in "Héllo World"
Héllo is found in "Hello World"
[X] Include diacritics:
Héllo is found in "Héllo World"
Héllo is not found in "Hello World"
Correct? Is this similar to what would be considered a unit test?
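The eight cases listed above can be written down as executable checks, which is roughly what a unit test would do. The following is only a sketch in Python, not the LibreOffice search code: it assumes that "not including" diacritics means removing Unicode combining marks after NFD decomposition, and the names `strip_diacritics`, `found`, and `include_diacritics` are hypothetical.

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Decompose to NFD and drop combining marks (assumed meaning of ignoring diacritics)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def found(needle: str, haystack: str, include_diacritics: bool) -> bool:
    """Substring search that either respects or ignores diacritics."""
    if include_diacritics:
        # Compare composed forms so that equal-looking strings compare equal.
        needle = unicodedata.normalize("NFC", needle)
        haystack = unicodedata.normalize("NFC", haystack)
    else:
        needle = strip_diacritics(needle)
        haystack = strip_diacritics(haystack)
    return needle in haystack

PLAIN = "Hello"
ACCENTED = "H\u00e9llo"  # "Héllo"

# (include_diacritics, search term, text, expected result), taken from the list above
CASES = [
    (False, PLAIN,    ACCENTED + " World", True),
    (False, PLAIN,    PLAIN + " World",    True),
    (True,  PLAIN,    ACCENTED + " World", False),
    (True,  PLAIN,    PLAIN + " World",    True),
    (False, ACCENTED, ACCENTED + " World", True),
    (False, ACCENTED, PLAIN + " World",    True),
    (True,  ACCENTED, ACCENTED + " World", True),
    (True,  ACCENTED, PLAIN + " World",    False),
]

for include, needle, haystack, expected in CASES:
    assert found(needle, haystack, include) == expected, (include, needle, haystack)
```

Each tuple corresponds to one line of the behavior list, so a regression in the inverted logic would show up as a failing assertion.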
(In reply to Jim Raykowski from comment #12)
> Correct?
Yes, that's the current behavior. Because checkboxes should phrase the action in a positive way, like "[x] Save on Close" instead of "[ ] Don't Save on Exit" (a kind of double negation), the proposal was to rephrase and to invert the logic accordingly.
> Is this similar to what would be considered a unit test?
I'm not a programmer. Shinnok, is it?
link to commit https://gerrit.libreoffice.org/#/c/43103/ Is there an easier way to change the commit message than to amend and commit again?
(In reply to Jim Raykowski from comment #14)
> Is there an easier way to change the commit message than to amend and commit
> again?
In Gerrit, click the first list item 'commit message' and go into edit mode by clicking the icon next to 'Patch set x'.
I don’t agree with the proposed change here or its rationale. “Ignore diacritics/kashida” is pretty clear, while “Include diacritics/kashida” is ambiguous: it is not clear at all what kind of inclusion is supposed to happen. “Ignore diacritics” is also pretty much a standard term: https://www.google.com.eg/search?q=%22ignore+diacritics%22&oq=%22ignore+diacritics%22 , which cannot be said of the proposal here. I think we are trying to fix a non-issue here.
(In reply to Khaled Hosny from comment #16)
> It is also pretty much a standard term:
> https://www.google.com.eg/search?q=%22ignore+diacritics%22&oq=%22ignore+diacritics%22 ...
The opposite search reveals a comparable number of results, the same for "diacritic-sensitive" vs. "diacritic-insensitive".
Would you prefer "[x] case-insensitive" over "[ ] case-sensitive" (we use '[ ] Match case')? I'm 60/40. (The case-insensitive term could be an alternative to 'ignore' when we decide to keep the current logic.)
I've put the topic on the design team agenda.
(In reply to Heiko Tietze from comment #17)
> (In reply to Khaled Hosny from comment #16)
> > It is also pretty much a standard term:
> > https://www.google.com.eg/search?q=%22ignore+diacritics%22&oq=%22ignore+diacritics%22 ...
>
> The opposite search reveals a comparable number of results,
Searching for https://www.google.com.eg/search?q=%22include+diacritics%22 does not show anything relevant.
> the same for
> "diacritic-sensitive" vs. "diacritic-insensitive".
>
> Would you prefer "[x] case-insensitive" over "[ ] case-sensitive" (we use
> '[ ] Match case')? I'm 60/40.
I think the negative/positive distinction is a red herring; whichever term is popular should be used. Consistency just for the sake of it is meaningless.
The two main reasons are stated in the initial post. From the UX point of view, it's worth discussing that and taking it into consideration. Many of the Google search results for "ignore diacritics" are developer-related, whereas the search functionality of LibO is mostly used by non-developers. Also, "include" is only a suggestion; a better term has to be found by native English speakers so that it will be understood immediately.
(In reply to Thomas Lendo from comment #19)
> The 2 main reasons are stated in the initial post. From the UX point of view
> it's worth to discuss that and to take it into consideration.
>
> Many search results in Google of "ignore diacritics" are developer-related.
> The search functionality of LibO is used by non-developers mostly. Also
> "include" is only a suggestion--a good and better term has to be found by
> English native speakers so that it will be understood immediately.
Inventing new jargon is unlikely to help users, unlike sticking to existing nomenclature. That is similar to the never-ending attempts to replace the floppy disk save icon with something modern just because someone thinks this will help users who never saw an actual floppy disk.
When it comes to Kashida - Khaled's opinion has the huge up-side of being from someone who actually uses that -a-lot- ;-) what with being an expert in this area. I also don't believe that the term 'Kashida' or 'Diacritic' are going to be instantly obvious to any native English person (FWIW) - just my 2 cents =)
We discussed the topic in the design team and decided to use the patch. The double negation in checking "[x] Ignore <foo>" to disable a function hurts usability. The suggestion, however, is to rename the options to "[ ] <Foo>-sensitive" to improve familiarity.
Jim Raykowski committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=d4064927a2e83c974d4ee9538081e8a4fcdb1e34 tdf#111846 Find & Replace: Rename diacritics and kashida options It will be available in 6.0.0. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
I don’t like the wording used in the patch. The XP XPS Viewer has “Include kashida” and Office 2010 has “Match kashida” [1], which for me are better wording choices. [1] https://blogs.technet.microsoft.com/office_global_experience/2010/08/11/find-and-replace-for-the-arabic-script/
(In reply to Adolfo Jayme from comment #24)
> The XP XPS Viewer has “Include kashida” and Office 2010 has “Match kashida” [1],
> which for me are better wording choices.
Include Diacritics/Kashida was the first proposal and was rejected here. MSDN writes about Diacritics Sensitivity, and IIRC some other big players do too. (OT: This topic, like some others, was on the agenda for a couple of weeks and no one commented. It's better to discuss before the patch is submitted.)
There is a little glitch with this change, see bug 116242: Find & Replace searches now break unless the Diacritic-sensitive checkbox is checked, but that checkbox is hidden when no CTL language is enabled.
That's not just a little glitch, that's a crunchy bug, and the default presets were also wrongly chosen: clearly one normally does not want to ignore diacritics. Additionally, having Not-Diacritic-sensitive and Not-Kashida-sensitive as the implied defaults for every search adds a heavy extra performance penalty, specifically for ignore-diacritics, because the transliteration has to decompose every text first to normalize diacritics.
(In reply to Eike Rathke from comment #27)
> That's not just a little glitch, that's a crunchy bug, and also the default
> presets were wrongly chosen, clearly one normally does not want to ignore
> diacritics.
Not in Arabic or in languages where diacritics are not parts of the letters (in Arabic خالد and خَالِدْ are the same word). It is just like case-insensitive search being the default.
> Additionally having Not-Diacritic-sensitive and
> Not-Kashida-sensitive being the implied defaults for every search adds
> specifically for ignore-diacritic a heavy extra performance penalty
> transliteration decomposing every text first to normalize diacritics.
I didn’t notice any performance difference last I tried this, but I didn’t do any actual performance testing.
(In reply to Khaled Hosny from comment #28)
> (In reply to Eike Rathke from comment #27)
> > That's not just a little glitch, that's a crunchy bug, and also the default
> > presets were wrongly chosen, clearly one normally does not want to ignore
> > diacritics.
>
> Not in Arabic or in languages where diacritics are not parts of the letters
> (in Arabic خالد and خَالِدْ are the same word). It is just like
> case-insensitive search being the default.
Could we perhaps have a per-language/locale default setting for these two options, regardless of how they are phrased?
There are quite big differences in the "status" of letters with diacritics (or "diacritics"), even among languages that are written in Latin script. On the one side, there are languages like English, French, and German, where ä, ö, ü, é/ê/è/ë etc. are considered variations of the "base" letter (so a/ä, o/ö, etc. are also collated together in dictionaries). On the other side are languages like Estonian, Finnish, Icelandic, Swedish, Latvian, Hungarian, and Polish, where ä, å, á, ā etc. are considered separate letters in their own right, and therefore shouldn't be ignored/merged during searching, at least not by default.
For instance, in Estonian, treating a/ä, o/õ/ö, u/ü, s/š, z/ž as equivalent makes almost* zero sense: when searching for either of the words in pairs like laas/lääs, too/töö, loog/lõõg, sokk/šokk, the other one should not be matched. A similar principle applies in the other languages I mentioned, so the current default setting is completely counter-intuitive for many users.
* "Almost zero" because treating õ/ö as equal might make sense in historical texts (but that's a rather marginal use case), and treating z/ž as equal makes some sense because z/ž could only ever be confused in loanwords and foreign names, e.g. people might not know without a dictionary if the Croatian capital is called Zagreb or Žagreb. But that's minutiae.
...And then there are cases like Lithuanian, where ą is considered an independent letter, while ã/à are considered variants of a and are mainly used in dictionaries to indicate stress/length. I'll open a new enhancement request about this.
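To make the per-locale idea concrete, here is a small Python sketch. Everything in it is an assumption for illustration: the `PROTECTED_LETTERS` table, its contents, and the functions `fold` and `matches` are hypothetical and do not reflect any existing LibreOffice setting. The idea is that letters a locale treats as independent are kept as-is, while all other combining marks are removed before comparing.

```python
import unicodedata

# Hypothetical per-locale sets of letters that count as letters in their own right.
# The Estonian and Lithuanian entries follow the examples given in the comments above.
PROTECTED_LETTERS = {
    "et": set("\u00e4\u00f5\u00f6\u00fc\u0161\u017e"),  # ä õ ö ü š ž
    "lt": set("\u0105"),                                # ą
}

def fold(text: str, locale: str) -> str:
    """Strip diacritics except on letters the locale treats as independent."""
    protected = PROTECTED_LETTERS.get(locale, set())
    folded = []
    for ch in unicodedata.normalize("NFC", text):
        if ch.lower() in protected:
            folded.append(ch)  # keep the letter untouched
        else:
            decomposed = unicodedata.normalize("NFD", ch)
            folded.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(folded)

def matches(needle: str, haystack: str, locale: str) -> bool:
    """Substring search after locale-aware folding of both strings."""
    return fold(needle, locale) in fold(haystack, locale)

# Estonian: laas and lääs are different words and should not match each other.
print(matches("laas", "l\u00e4\u00e4s", "et"))
# A locale without protected letters folds ä to a, so the same search matches.
print(matches("laas", "l\u00e4\u00e4s", "en"))
# Lithuanian: ą is kept, while ã and à fold to a.
print(fold("\u0105\u00e3\u00e0", "lt"))
```

A per-letter table like this would also cover the mixed Lithuanian case, which a single on/off default per locale could not express.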