Bug 129492 - [LOCALHELP] Need good example of use for similarity search
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Documentation
Version: unspecified (earliest affected)
Hardware: All All
Importance: medium enhancement
Assignee: Not Assigned
URL: https://help.libreoffice.org/6.3/en-U...
Whiteboard: target:7.1.0
Keywords:
Depends on:
Blocks: Find-Search Help-Changes-Features
Reported: 2019-12-19 10:23 UTC by Heiko Tietze
Modified: 2022-09-09 08:33 UTC
CC List: 3 users

See Also:
Crash report or crash signature:


Attachments
Document with instructions for testing similarity search (11.62 KB, application/vnd.oasis.opendocument.text)
2020-10-09 14:37 UTC, sdc.blanco

Description Heiko Tietze 2019-12-19 10:23:24 UTC
The workflow and use case of the similarity search are difficult to understand. In particular, the Combine option reads as if the parameters are joined with a logical OR when it is checked, whereas without it all parameters have to be met. However, that is not clear, and an example is missing from the documentation.
Comment 1 Eike Rathke 2019-12-19 17:30:52 UTC
Maybe this can shed some light:

The algorithm used is a Weighted Levenshtein Distance (including wildcards ? and *).

The mathematical definition of the real WLD means EITHER maximum X replacements OR Y characters shorter OR Z characters longer, where a mix of operations is allowed but each operation draws from a shared 100% pool of operations.

The relaxed mode (Combine in the UI, SplitCount internally) allows a maximum of X replacements AND/OR Y characters shorter AND/OR Z characters longer. Only insertions and deletions share one pool from which they draw; replacements use a second, independent pool. This is closer to what a user who is not familiar with WLD expects.

More details and an example can be found in the comments at
https://opengrok.libreoffice.org/xref/core/i18npool/source/search/levdis.hxx?r=ee8f0a10#26
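
To make the two pool rules above concrete, here is a minimal sketch that only restates the description in this comment. It is not the code in levdis.hxx. The function names, the parameter names, and the idea of passing pre-counted operations are all assumptions made for illustration.

```python
from fractions import Fraction


def _share(count, limit):
    """Fraction of a pool used by `count` operations when `limit` are allowed."""
    if count == 0:
        return Fraction(0)
    if limit == 0:
        # Assumption: using an operation whose limit is 0 always overflows.
        return Fraction(2)
    return Fraction(count, limit)


def matches_strict(replaced, shorter, longer, max_replace, max_shorter, max_longer):
    """Strict WLD as described above: all operations draw from one shared pool."""
    used = (_share(replaced, max_replace)
            + _share(shorter, max_shorter)
            + _share(longer, max_longer))
    return used <= 1


def matches_relaxed(replaced, shorter, longer, max_replace, max_shorter, max_longer):
    """Relaxed (Combine) mode: replacements have their own pool,
    insertions and deletions share a second one."""
    replace_ok = _share(replaced, max_replace) <= 1
    length_ok = _share(shorter, max_shorter) + _share(longer, max_longer) <= 1
    return replace_ok and length_ok


if __name__ == "__main__":
    limits = (2, 2, 2)  # X replacements, Y shorter, Z longer
    # Two replacements plus one extra character:
    print(matches_strict(2, 0, 1, *limits))   # shared pool is overdrawn
    print(matches_relaxed(2, 0, 1, *limits))  # each pool stays within its limit
```

Under these assumptions, with all three limits set to 2, a candidate that needs two replacements and one extra character is rejected by the strict rule but accepted by the relaxed one, which may be the kind of practical difference the help page could illustrate.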
Comment 2 sdc.blanco 2020-10-02 13:24:28 UTC
(In reply to Eike Rathke from comment #1)
> Maybe this can shed some light:
...but not enough.

Was going to try to improve, but not sure I understand completely.

If "Combine" is UNchecked in the Similarity Search dialog, then what happens? (i.e., how is it different from when Combine is checked?)

(The source code says EITHER. Does that mean that each parameter is used exactly for a match, which in everyday thinking sounds like "combine"?)

The mathematical explanation of relaxed WLD sounds like what one would expect (in everyday language use) if the descriptions of each parameter (on the help page) are Combined.

(a guess for) Possible text for help page under Combine heading:

"If unchecked, then search matches any item that matches one of the three parameters.
If checked, then an intelligent combination of the settings for exchange, add, and remove characters is used."

cc: Eike Rathke

Just curious:  if these interpretations are correct, then it is hard to understand how checking or unchecking Combine will make a big difference in practice.  If that naive speculation is completely wrong, then a practical "tip" about when it is better to choose one or the other would be good (and could be added to the help page in a "tip" box).
Comment 3 sdc.blanco 2020-10-09 14:37:30 UTC
Created attachment 166244
Document with instructions for testing similarity search

Is similarity search supposed to be able to find two words (i.e., two letter strings with a space between them)?
If yes, then maybe there is a bug.  
If no, I will include a note in the documentation.  
See attached file for simple, detailed instructions about how to experience the behavior (tested with 7.1.0.0.alpha0+).
Comment 4 sdc.blanco 2020-10-10 02:14:58 UTC
(In reply to sdc.blanco from comment #3)
> Is similarity search supposed to be able to find two words (i.e, two letter
> strings with a space between them)? 
See bug 126294 for similar problem.
Comment 5 Commit Notification 2020-10-20 20:52:28 UTC
Seth Chaiklin committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/help/commit/bbb9b402a4197f412a411efeef434b168d0ce96d

Partially resolves: tdf#129492 (and related to: tdf#64739) improve explanation of Similarity search
Comment 6 sdc.blanco 2020-10-20 22:01:10 UTC
(In reply to Heiko Tietze from comment #0)
> The workflow and use case of the similarity search is difficult to
> understand. In particular, Combine reads as if the parameters are put
> together with a logical OR while without all parameters have to be met.
The logic of Combine should be explained now, along with some tips about usage.

> an example is missing at the documentation.
It is still missing.  Therefore "partially resolved"  

Who can provide a good, short, useful example?
Comment 7 Eike Rathke 2022-09-09 08:33:32 UTC
https://en.wikipedia.org/wiki/Levenshtein_distance#Example
However, as Wikipedia is CC-BY-SA and mentioning every BY in the help is quite cumbersome (or do we already do that with a template?), rather link to it or create a new example with a similarly small number of steps.
Or just link to the article altogether.
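
As one possible starting point for a new example, the sketch below computes the plain (unweighted) Levenshtein distance between two words. It is a textbook dynamic-programming version, not the weighted variant LibreOffice uses, and the function name and sample words are chosen here purely for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character replacements, insertions,
    and deletions needed to turn `a` into `b`."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # replace ca with cb, or keep it
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # A swapped pair of letters counts as two edits in plain Levenshtein.
    print(levenshtein("search", "serach"))  # 2
    print(levenshtein("writer", "writers"))  # 1
```

A help-page example could then walk through one such pair step by step and state which of the three dialog settings each edit would count against.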