Bug 146837 - Find and Replace is behaving strangely (inconsistent "Find")
Status: RESOLVED NOTABUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 7.2.5.2 release
Hardware: x86 (IA32) Windows (All)
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-01-18 13:24 UTC by brian.the2brit2
Modified: 2022-01-20 13:43 UTC
CC List: 2 users

See Also:
Crash report or crash signature:


Attachments
Blocks of text with country stats, some not separated (17.48 KB, text/plain)
2022-01-18 13:45 UTC, brian.the2brit2

Description brian.the2brit2 2022-01-18 13:24:47 UTC
Description:
Hello :-)
I was doing a search and replace on a file of 190+ blocks of text. The doc started off as one column of words and numbers (file attached), but I used Find & Replace (F&R) to convert the CRs to tabs (more about why later). Then I wanted to find all the text items (the names of the countries) and make each one the start of a line.
I used the word-boundary regular expression "alpha" to find the words
 (i.e. \b[:alpha;]   )
That did work, BUT ONLY ON SOME OF THE WORDS!  It skipped a large chunk of the file and found another word.
I tried \bSlo because I had the words Slovakia and Slovenia on the page, separated by lines with several other words. F&R DID find them.
I was beginning to think "What on Earth is going on here?"  I looked at the source webpage for clues to formatting - no help.  I tried "Clear all direct formatting": no improvement.
I tried [:alpha:][:alpha:] - and that's when I realised a bug report was called for!  "Find Next" meandered down the page highlighting pairs of characters at random intervals - no discernible pattern!  Large jumps and no similarity between choices of letters!
By the way, the reason I was doing all this was to get a table of statistics in the same form as the one in the newspaper (I just highlighted the table on the webpage and did Ctrl-C, then pasted into the Writer document).  It gave me 5 lines per country, so I converted all the end-of-lines to tabs, intending to then make all the names (the text-items) begin with "newlines".  Then I could read it into Calc as a "Tab-sep-variable" file.   If there is a better way to get a copy of a table, please let me know :-))
'Bye for Now,
 Brian Howe 

Steps to Reproduce:
1. Load my (attached) file into Writer
2. Use Find & Replace (with Regular Expressions selected) to "Find" \b[:alpha;]
3. Repeat with [:alpha;][:alpha:]

Actual Results:
Only some of the names of countries were found. Many were skipped over.

Expected Results:
I expected each word to be highlighted after each click of the "Find Next" button.
In the second example (step 3 above) I expected the highlighting to step along two characters at a time.


Reproducible: Always


User Profile Reset: No



Additional Info:
Version: 7.2.5.2 (x64) / LibreOffice Community
Build ID: 499f9727c189e6ef3471021d6132d4c694f357e5
CPU threads: 4; OS: Windows 10.0 Build 19043; UI render: Skia/Raster; VCL: win
Locale: en-GB (en_GB); UI: en-GB
Calc: threaded

The attached file contains the Web-link of the page from which the "table" was copied
Comment 1 brian.the2brit2 2022-01-18 13:45:12 UTC
Created attachment 177633
Blocks of text with country stats, some not separated

I just did a check to ensure that the file I sent (one of 6 stages in the testing process) DID give the behaviour I described (it was the original) and I could not reproduce the misbehaviour!  I substituted a later file in the series (one in which I had done some Find & Replace operations, and that is attached now) but I still could not see the misbehaviour!
I wonder if it has something to do with the "Replace" attempts I did that went wrong.  It may have been cleared by me closing the files.  Perhaps when I reopened the files all was well?  I don't have the time right now to go through all those "didn't work" operations, but there must have been some reason for the strange behaviour.
Comment 2 Tex2002ans 2022-01-19 20:22:10 UTC
(This should probably be marked as NOTABUG.)

You have an error in your Regular Expressions.

The correct regex for "alphabetic" characters is:

- [:alpha:]

Notice TWO COLONS.

Yours was accidentally using a SEMICOLON at the end.

For a full list of what can be used in LibreOffice 7.2, see the help page:

"List of Regular Expressions"
https://help.libreoffice.org/latest/en-US/text/shared/01/02100001.html

- - -

Note: What your original regular expression was looking for:

- [:alpha;]

"Hey, look for any ONE of these characters:

- a colon
- OR 'a'
- OR 'l'
- OR 'p'
- OR 'h'
- OR semicolon"

That's why you were getting all these weird, "random", combinations appearing.
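
- - -

To see the same effect outside Writer, here is a rough Python sketch. It is only an approximation: Python's re module stands in for Writer's regex engine, and since re has no POSIX classes it reads [:alpha;] as a plain set of literal characters, which is the reading described above. The sample line and the helper name words_hit are made up for illustration, the correct [:alpha:] is approximated by [^\W\d_], and re.IGNORECASE assumes "Match case" was left unticked in the Find & Replace dialog.

```python
import re

# Hypothetical sample line (NOT taken from the attached file).
sample = "Albania\t12\tSlovakia\t7\tLatvia\t3\tPoland\t9"

# Assumed stand-in for a correct [:alpha:]: any word character that is
# neither a digit nor an underscore, i.e. a letter.
LETTER = r"[^\W\d_]"

# The mistyped pattern from step 2. Read as a plain character set it means
# "one of : a l p h ;" right after a word boundary.
typo_word_start = re.compile(r"\b[:alpha;]", re.IGNORECASE)

# What was intended: any letter right after a word boundary.
any_word_start = re.compile(r"\b" + LETTER)

# The pattern from step 3: one of : a l p h ; followed by any letter.
typo_pair = re.compile(r"[:alpha;]" + LETTER, re.IGNORECASE)


def words_hit(pattern, text):
    """Return the word that begins at each position where pattern matches."""
    return [re.match(r"\w*", text[m.start():]).group()
            for m in pattern.finditer(text)]


print(words_hit(typo_word_start, sample))  # ['Albania', 'Latvia', 'Poland']
print(words_hit(any_word_start, sample))   # ['Albania', 'Slovakia', 'Latvia', 'Poland']
print(typo_pair.findall(sample))           # ['Al', 'an', 'lo', 'ak', 'La', 'Po', 'la']
```

With the typo, only words starting with a, l, p or h are hit, so "Slovakia" is skipped, and the two-character search lands on scattered pairs instead of stepping along two letters at a time.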