129492 – [LOCALHELP] Need good example of use for similarity search

Summary: [LOCALHELP] Need good example of use for similarity search

Status:	RESOLVED FIXED

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	Documentation
Version: (earliest affected)	unspecified
Hardware:	All All

Importance:	medium enhancement
Assignee:	Not Assigned

URL:	https://help.libreoffice.org/6.3/en-U...
Whiteboard:	target:7.1.0 target:25.2.0
Keywords:

Depends on:
Blocks:	Find-Search Help-Changes-Features

Reported:	2019-12-19 10:23 UTC by Heiko Tietze
Modified:	2024-08-26 20:45 UTC
CC List:	4 users

See Also:	150693
Crash report or crash signature:

Attachments
Document with instructions for testing similarity search (11.62 KB, application/vnd.oasis.opendocument.text) 2020-10-09 14:37 UTC, sdc.blanco

Description Heiko Tietze 2019-12-19 10:23:24 UTC

The workflow and use case of the similarity search are difficult to understand. In particular, Combine reads as if the parameters are put together with a logical OR, while without it all parameters have to be met. However, that is not clear, and the documentation lacks an example.

Comment 1 Eike Rathke 2019-12-19 17:30:52 UTC

Maybe this can shed some light:

The algorithm used is a Weighted Levenshtein Distance (including wildcards ? and *).

The mathematical definition of the real WLD means EITHER maximum X replacements OR Y characters shorter OR Z characters longer, where a mix of operations is allowed but each operation draws from a shared 100% pool of operations.

The relaxed mode (Combine in the UI, SplitCount internally) allows a maximum of X replacements AND/OR Y characters shorter AND/OR Z characters longer. Only insertions and deletions share one pool from which they draw; replacements use a second, independent pool. This is closer to what a user who is not familiar with WLD expects.

More details and an example can be found in the comments at
https://opengrok.libreoffice.org/xref/core/i18npool/source/search/levdis.hxx?r=ee8f0a10#26
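
As a rough illustration of the two modes described in this comment, the pool logic can be sketched as below. This is only one reading of the explanation above, not the code in levdis.hxx: the function names, the parameter names, and the idea of checking already-counted operations are assumptions made for the example. x, y and z stand for the three dialog limits (replacements, characters shorter, characters longer).

```python
from fractions import Fraction


def pool_share(used, allowed):
    """Share of a 100% pool consumed by `used` operations when `allowed` of them would empty it."""
    if used == 0:
        return Fraction(0)
    if allowed == 0:
        # Assumption: any use of an operation with a limit of 0 overdraws the pool.
        return Fraction(2)
    return Fraction(used, allowed)


def within_limits(replaced, shorter, longer, x, y, z, combine):
    """Check hypothetical operation counts against the limits x, y, z.

    combine=False: all three kinds of operation draw from one shared pool.
    combine=True:  replacements have their own pool; the two length
                   changes share a second pool.
    """
    replace_share = pool_share(replaced, x)
    length_share = pool_share(shorter, y) + pool_share(longer, z)
    if combine:
        return replace_share <= 1 and length_share <= 1
    return replace_share + length_share <= 1


# With all limits set to 2: two replacements plus one character shorter
print(within_limits(2, 1, 0, 2, 2, 2, combine=False))  # False
print(within_limits(2, 1, 0, 2, 2, 2, combine=True))   # True
```

Under this reading, a candidate that uses up all allowed replacements and also differs in length is rejected with Combine unchecked but accepted with Combine checked, which is where the two modes differ in practice.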

Comment 2 sdc.blanco 2020-10-02 13:24:28 UTC

(In reply to Eike Rathke from comment #1)
> Maybe this can shed some light:
...but not enough.

Was going to try to improve, but not sure I understand completely.

If "Combine" is UNchecked in the Similarity Search dialog, then what happens? (i.e., how is it different from when Combine is checked?)

(The source code says EITHER. Does that mean that each parameter is used exactly for a match? In everyday thinking, that sounds like "combine".)

The mathematical explanation of relaxed WLD sounds like what one would expect (in everyday language use) if the descriptions of each parameter (on the help page) are Combined.

(a guess for) Possible text for help page under Combine heading:

"If unchecked, then search matches any item that matches one of the three parameters.
If checked, then an intelligent combination of the settings for exchange, add, and remove characters is used."

cc: Eike Rathke

Just curious:  if these interpretations are correct, then it is hard to understand how checking or unchecking Combine will make a big difference in practice.  If that naive speculation is completely wrong, then a practical "tip" about when it is better to choose one or the other would be good (and could be added to the help page in a "tip" box).

Comment 3 sdc.blanco 2020-10-09 14:37:30 UTC

Created attachment 166244
Document with instructions for testing similarity search

Is similarity search supposed to be able to find two words (i.e., two letter strings with a space between them)?
If yes, then maybe there is a bug.
If no, I will include a note in the documentation.
See attached file for simple, detailed instructions about how to experience the behavior (tested with 7.1.0.0.alpha0+).

Comment 4 sdc.blanco 2020-10-10 02:14:58 UTC

(In reply to sdc.blanco from comment #3)
> Is similarity search supposed to be able to find two words (i.e, two letter
> strings with a space between them)? 
See bug 126294 for a similar problem.

Comment 5 Commit Notification 2020-10-20 20:52:28 UTC

Seth Chaiklin committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/help/commit/bbb9b402a4197f412a411efeef434b168d0ce96d

Partially resolves: tdf#129492 (and related to: tdf#64739) improve explanation of Similarity search

Comment 6 sdc.blanco 2020-10-20 22:01:10 UTC

(In reply to Heiko Tietze from comment #0)
> The workflow and use case of the similarity search are difficult to
> understand. In particular, Combine reads as if the parameters are put
> together with a logical OR, while without it all parameters have to be met.
The logic of Combine should be explained now, along with some tips about usage.

> an example is missing at the documentation.
It is still missing.  Therefore "partially resolved"  

Who can provide a good, short useful example?

Comment 7 Eike Rathke 2022-09-09 08:33:32 UTC

https://en.wikipedia.org/wiki/Levenshtein_distance#Example
However, as Wikipedia is CC-BY-SA and mentioning every BY in the help is quite cumbersome (or do we already do that with a template?), rather link to it or create a new example with similarly few steps.
Or just link to the article altogether.
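
For a new example with few steps, a minimal sketch of the plain (unweighted) Levenshtein distance might help. It counts every replacement, insertion and deletion as one step, so it does not model the weighted variant, the wildcards, or the Combine option discussed earlier in this bug. The function name and the sample words are chosen only for illustration.

```python
def levenshtein(a, b):
    """Minimum number of single-character edits (replace, insert, delete) turning a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # replace ca with cb, free if they are equal
            ))
        prev = curr
    return prev[-1]


print(levenshtein("color", "colour"))   # 1: insert "u"
print(levenshtein("search", "serach"))  # 2: two replacements
```

In help-page terms, "colour" is one character longer than "color", while "serach" differs from "search" by two exchanged characters.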

Comment 8 Commit Notification 2024-08-14 13:58:40 UTC

Olivier Hallot committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/help/commit/9d2a16b7eb33cf0ff58e010d502d64c6dfcdff4f

tdf#129492 Similarity search examples