Bug 150693 - Confusing language in Similarity Search
Summary: Confusing language in Similarity Search
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 6.3.6.2 release
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: difficultyBeginner, easyHack, skillDesign, topicUI
Depends on:
Blocks: Writer-UX Find&Replace-Dialog
Reported: 2022-08-30 17:25 UTC by Tuomas Hietala
Modified: 2022-09-10 03:49 UTC (History)
CC List: 8 users

See Also:
Crash report or crash signature:
Regression By:


Description Tuomas Hietala 2022-08-30 17:25:34 UTC
Description:
The Similarity Search dialog has some confusing language.

"Exchange characters" does not really mean that characters will be exchanged. The actual meaning is "(the maximum number of) exchangeable characters".
https://translations.documentfoundation.org/translate/libo_ui-master/cuimessages/en/?checksum=cc0ffa5f06739be3

"Add characters" does not really mean that characters will be added. The actual meaning is "(the maximum number of) additional characters".
https://translations.documentfoundation.org/translate/libo_ui-master/cuimessages/en/?checksum=64413dc15303310b

"Remove characters" does not really mean that characters will be removed. The actual meaning is "(the maximum number of) missing characters".
https://translations.documentfoundation.org/translate/libo_ui-master/cuimessages/en/?checksum=d0c4cae6b614b3eb

See Help for more information:
https://help.libreoffice.org/latest/en-US/text/shared/01/02100100.html


Steps to Reproduce:
1. In Writer, go to Edit -> Find & Replace
2. Check the Similarity search check box
3. Click the Similarities... button.


Actual Results:
The strings "Exchange characters", "Add characters" and "Remove characters" are used in the UI.

Expected Results:
More descriptive strings such as "Exchangeable characters:", "Additional characters:" and "Missing characters:" are used instead.


Reproducible: Always


User Profile Reset: No



Additional Info:
n/a
Comment 1 Heiko Tietze 2022-08-31 08:38:21 UTC
No strong opinions from my side.
Comment 2 Rafael Lima 2022-08-31 22:05:39 UTC
Here the use of tooltips is very helpful. But I would change the first one (as well as the description in the help page)

For "Exchange characters" the tooltip is "Enter the number of characters in the search term that can be exchanged". Maybe a better tooltip would be "Enter the number of characters that can differ from the search term"

Here are some suggestions for the labels:
"Number of different characters"
"Number of additional characters"
"Number of missing characters"

Or maybe a reduced version

"Different by [ ] characters"
"   Larger by [ ] characters"
"  Shorter by [ ] characters"

Where [ ] is where the entry box is positioned.
Comment 3 Tuomas Hietala 2022-09-01 16:54:16 UTC
(In reply to Rafael Lima from comment #2)
> Here the use of tooltips is very helpful. But I would change the first one
> (as well as the description in the help page)
> 
> For "Exchange characters" the tooltip is "Enter the number of characters in
> the search term that can be exchanged". Maybe a better tooltip would be
> "Enter the number of characters that can differ from the search term"
> 
> Here are some suggestions for the labels:
> "Number of different characters"
> "Number of additional characters"
> "Number of missing characters"

I agree that "different" is a better word here.

> Or maybe a reduced version
> 
> "Different by [ ] characters"
> "   Larger by [ ] characters"
> "  Shorter by [ ] characters"
> 
> Where [ ] is where the entry box is positioned.

This would work well for English (and many other languages), but not necessarily for all languages. I think hardcoding a particular sentence structure in the UI should be avoided.
Comment 4 Heiko Tietze 2022-09-02 07:14:57 UTC
(In reply to Tuomas Hietala from comment #3)
> > "Different by [ ] characters"
> 
> This would work well for English (and many other languages), but not
> necessarily for all languages. I think hardcoding a particular sentence
> structure in the UI should be avoided.

Isn't this kind of a <label><value><unit> sequence? It would be easier to refuse this good idea if we had an example where it's not working.
Comment 5 Eike Rathke 2022-09-05 10:15:05 UTC
(In reply to Rafael Lima from comment #2)
> For "Exchange characters" the tooltip is "Enter the number of characters in
> the search term that can be exchanged". Maybe a better tooltip would be
> "Enter the number of characters that can differ from the search term"
That is not what it does. "ab" differs from "a" by one character but there is no character replaced/exchanged/substituted. There is one deletion if going from "ab" to "a".


> Here are some suggestions for the labels:
> "Number of different characters"
> "Number of additional characters"
> "Number of missing characters"
> 
> Or maybe a reduced version
> 
> "Different by [ ] characters"
> "   Larger by [ ] characters"
> "  Shorter by [ ] characters"

I think that's not any better. It may be hard to describe in three words for each option what it actually does, but "different by" is too vague and does not describe the replacement/substitution parameter; "larger by" sounds as if the matched string may contain x more characters but that is only one possible effect of the parameter; similar for "shorter by". Also, characters are not "missing".

For choosing the wording it may be important to know roughly about the Weighted Levenshtein Distance (WLD) algorithm. It looks for a possible transformation of the search term to a text string by measuring an "edit distance". That transformation can be accomplished by different operations, for example "ab" can be transformed to "ac" by either replacing/substituting 'b' with 'c' (distance of 1), or by removing/deleting 'b' and then adding/inserting 'c' (distance of 2).
See also https://en.wikipedia.org/wiki/Levenshtein_distance
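
To make the "ab" to "ac" example above concrete, here is a minimal sketch of a plain (unweighted) edit distance in Python. It is not LibreOffice's WLD implementation: the function name `edit_distance` and the `allow_substitution` flag are made up for illustration, and every operation is assumed to cost 1.

```python
def edit_distance(source, target, allow_substitution=True):
    """Unit-cost edit distance between two strings (illustrative only).

    With allow_substitution=False, a changed character can only be
    handled as one deletion plus one insertion.
    """
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0] * cols for _ in range(rows)]

    # Turning a prefix into the empty string takes one deletion per character.
    for i in range(rows):
        dist[i][0] = i
    # Building a prefix from the empty string takes one insertion per character.
    for j in range(cols):
        dist[0][j] = j

    for i in range(1, rows):
        for j in range(1, cols):
            delete = dist[i - 1][j] + 1
            insert = dist[i][j - 1] + 1
            best = min(delete, insert)
            if source[i - 1] == target[j - 1]:
                # Characters match: no operation needed at this position.
                best = min(best, dist[i - 1][j - 1])
            elif allow_substitution:
                # Replace one character with another in a single step.
                best = min(best, dist[i - 1][j - 1] + 1)
            dist[i][j] = best

    return dist[-1][-1]


print(edit_distance("ab", "ac"))                            # one replacement
print(edit_distance("ab", "ac", allow_substitution=False))  # delete 'b', insert 'c'
print(edit_distance("ab", "a"))                             # one deletion
```

The same pair of strings gets a different distance depending on which operations are permitted, which is why each of the three dialog fields limits one kind of operation rather than describing how the matched string looks.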
Comment 6 Heiko Tietze 2022-09-05 13:29:20 UTC
Levenshtein talks about "Insert/Delete/Replace [n]" characters. We use "Add/Remove/Exchange characters [n]", which is not much of an improvement, but at least a small one.
Comment 7 Tuomas Hietala 2022-09-08 13:40:33 UTC
(In reply to Heiko Tietze from comment #4)
> (In reply to Tuomas Hietala from comment #3)
> > > "Different by [ ] characters"
> > 
> > This would work well for English (and many other languages), but not
> > necessarily for all languages. I think hardcoding a particular sentence
> > structure in the UI should be avoided.
> 
> Isn't this kind of a <label><value><unit> sequence? It would be easier to
> refuse this good idea if we had an example where it's not working.

On second thought, this actually wouldn't be a problem here: the structure <string 1>[input box]<string 2> accommodates any word order, because either of the strings can be left empty if necessary.
Comment 8 Stéphane Guillou (stragu) 2022-09-08 14:58:30 UTC
The labels were the same in:

Version: 6.3.6.2
Build ID: 2196df99b074d8a661f4036fca8fa0cbfa33a497
CPU threads: 8; OS: Linux 5.15; UI render: default; VCL: gtk3; 
Locale: en-AU (en_AU.UTF-8); UI-Language: en-US
Calc: threaded

I would be happy with:

- Exchange at most [] characters
- Add at most [] characters
- Remove at most [] characters

(In reply to Eike Rathke from comment #5)
> For choosing the wording it may be important to know roughly about the
> Weighted Levenshtein Distance (WLD) algorithm. It looks for a possible
> transformation of the search term to a text string by measuring an "edit
> distance". That transformation can be accomplished by different operations,
> for example "ab" can be transformed to "ac" by either replacing/substituting
> 'b' with 'c' (distance of 1), or by removing/deleting 'b' and then
> adding/inserting 'c' (distance of 2).
> See also https://en.wikipedia.org/wiki/Levenshtein_distance

In that sense, the documentation could be improved:

https://help.libreoffice.org/7.5/en-US/text/shared/01/02100100.html?System=UNIX&DbPAR=WRITER&HID=cui/ui/similaritysearchdialog/grid1#bm_id3154815

It could use wording like "how many times a character can be added when computing the edit distance between the search string and the matched string".

Related to this, there's bug 129492
Comment 9 Heiko Tietze 2022-09-09 08:39:35 UTC
We discussed this topic in the design meeting.

Basically, shorter labels are better than verbose ones. The idea of treating "characters" as a kind of unit goes in this direction.

Whether the ultimate string is "Exchange at most" or "Different by" or just "Add" should be decided by native speakers (or the one who implements it). Personally I prefer the second option.

Code pointer:
cui/uiconfig/ui/similaritysearchdialog.ui