Bug 126294 - Similarity search does not find results when searching for strings with a space or hyphen
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 5.0.2.2 release
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Duplicates: 133141
Depends on:
Blocks: Find&Replace-Dialog
 
Reported: 2019-07-09 01:32 UTC by Joel M
Modified: 2022-09-08 14:42 UTC (History)

See Also:
Crash report or crash signature:


Attachments
Instructions and text document for demonstrating problem (11.62 KB, application/vnd.oasis.opendocument.text)
2020-10-20 21:58 UTC, sdc.blanco

Description Joel M 2019-07-09 01:32:16 UTC
Description:
In Writer 6.2.4.2 (x64), the Find & Replace dialog's "similarity search" option fails to find results if you have multiple words or whitespace in the "find" input. It appears to be treating whitespace or extra words like added or exchanged characters.

Steps to Reproduce:
1. Create a new Writer document.
2. Type something. Example: "For example, this document."
3. Open Find & Replace. Search for "document". This works.
4. Check "Similarity search" and set "Exchange characters" to 2. Search again. This works.
5. Search for "docummmt" instead. This works.
6. Search for "this docummmt" or even just " docummmt". This does not work.
7. Increase "Add characters", "Exchange characters", or both. With large enough values the search succeeds, but it is unclear which combination is required.

Actual Results:
"Search key not found"

Expected Results:
Highlight 1 result


Reproducible: Always


User Profile Reset: No



Additional Info:
Comment 1 Dieter 2019-07-09 06:58:25 UTC
(In reply to Joel M from comment #0)
> 6. Search for "this docummmt" or even just " docummmt". This does not work.

" docummmt" works for me, but not "this docummmt" using

Version: 6.4.0.0.alpha0+ (x64)
Build ID: ae823e4633a76d13cebc6432b9e44b9b2862326b
CPU threads: 4; OS: Windows 10.0; UI render: GL; VCL: win; 
TinderBox: Win-x86_64@42, Branch:master, Time: 2019-06-26_23:06:07
Locale: de-DE (de_DE); UI-Language: en-US
Calc: threaded

and also in

Version: 6.2.5.2 (x64)
Build ID: 1ec314fa52f458adc18c4f025c545a4e8b22c159
CPU threads: 4; OS: Windows 10.0; UI render: default; VCL: win; 
Locale: de-DE (de_DE); UI-Language: de-DE
Calc: threaded

Thank you for reporting the bug. To be certain the reported issue is not related to corruption in the user profile, could you please reset your LibreOffice profile (https://wiki.documentfoundation.org/UserProfile) and re-test?

I have set the bug's status to 'NEEDINFO'. Please change it back to 'UNCONFIRMED' if the issue is still present
Comment 2 Joel M 2019-07-18 22:26:08 UTC
I'll try the profile reset thing. Meanwhile, I've reproduced the bug on:

Windows 8
LibreOffice 5.0.2.2

So this is apparently not new. Increasing the "Remove characters" count seems to have a strong effect on whether there is a hit, but you also need to increase either "Add characters" or "Exchange characters" when searching for "this docummmt".

Presumably it's doing something like looking at each word in the document individually but comparing it to the whole search string.
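
To make that guess concrete, here is a minimal Python sketch. It is not LibreOffice's code: it uses plain unit-cost Levenshtein distance as a stand-in for the similarity search, and the helper names (edit_distance, best_per_word, best_per_window) are hypothetical. It only contrasts comparing the whole search string against each word with comparing it against a same-length slice of the text.

```python
TEXT = "For example, this document."


def edit_distance(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance (an assumption, not LibreOffice's weighting)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # remove a character from a
                curr[j - 1] + 1,           # add a character to a
                prev[j - 1] + (ca != cb),  # exchange (or keep) a character
            ))
        prev = curr
    return prev[-1]


def best_per_word(query: str, text: str) -> int:
    """Hypothesised behaviour: whole query compared against each single word."""
    words = [w.strip('.,"') for w in text.split()]
    return min(edit_distance(query, w) for w in words)


def best_per_window(query: str, text: str) -> int:
    """Alternative: whole query compared against every slice of equal length."""
    n = len(query)
    starts = range(max(1, len(text) - n + 1))
    return min(edit_distance(query, text[i:i + n]) for i in starts)


print(best_per_word("docummmt", TEXT))         # 2
print(best_per_word("this docummmt", TEXT))    # 7
print(best_per_window("this docummmt", TEXT))  # 2
```

In this toy model the two-word query only comes within reach of the single word "document" after five extra removals (dropping "this "), on top of the two exchanges. That would be consistent with the remove count mattering so much, but it is only a model of the guess above, not a confirmed explanation.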
Comment 3 QA Administrators 2019-07-19 02:54:00 UTC Comment hidden (obsolete)
Comment 4 Joel M 2019-07-27 18:19:31 UTC
(In reply to Dieter Praas from comment #1) 
> Thank you for reporting the bug. To be certain the reported issue is not
> related to corruption in the user profile, could you please reset your
> LibreOffice profile (https://wiki.documentfoundation.org/UserProfile) and
> re-test?
> 
> I have set the bug's status to 'NEEDINFO'. Please change it back to
> 'UNCONFIRMED' if the issue is still present

Per the directions you linked, I restarted LO in safe mode (using the default setting) on the Windows 10 / LO 6.2.4.2 (x64) machine. I noticed the toolbars and icons were different, so I think safe mode was properly ignoring my profile.

I followed my steps above and got the same result: similarity search seems to choke when searches involve whitespace. Increasing the "Remove characters" count did result in a match, even though it shouldn't have been necessary. (So my original step #7 may be wrong -- it seems like remove characters is more important than add or exchange for this behavior.)
Comment 5 Dieter 2019-07-28 04:24:04 UTC
Set to NEW. It still works for me with an additional whitespace, but fails when searching for two words (I tried with "this documnt").
Comment 6 Buovjaga 2020-09-04 14:33:56 UTC
*** Bug 133141 has been marked as a duplicate of this bug. ***
Comment 7 sdc.blanco 2020-10-20 21:58:29 UTC
Created attachment 166559
Instructions and text document for demonstrating problem

I encountered this problem independently and made a test file (attached, with instructions in the file).

The steps to reproduce (STR) in the test file demonstrate the failure to find strings containing a space.

They also show an additional problem: strings containing a hyphen cannot be found.
Comment 8 Stéphane Guillou (stragu) 2022-09-08 14:42:50 UTC
Confirmed for both spaces and hyphens in:

Version: 7.5.0.0.alpha0+ / LibreOffice Community
Build ID: 24087697d5cf78aac346d4dcea0596373e15a95c
CPU threads: 8; OS: Linux 5.15; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
Calc: threaded

and

Version: 7.3.6.2 / LibreOffice Community
Build ID: c28ca90fd6e1a19e189fc16c05f8f8924961e12e
CPU threads: 8; OS: Linux 5.15; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
Calc: threaded

Changing earliest known version affected according to comment 2.