Bug 105883 - Use unicode normalization for search and replace
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version (earliest affected): 4.1.6.2 release
Hardware: All All
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Find-Search
Reported: 2017-02-09 14:57 UTC by Jan-Marek Glogowski
Modified: 2023-10-03 19:14 UTC
CC List: 3 users

See Also:
Crash report or crash signature:


Attachments
A document with the test string ööo (8.90 KB, application/vnd.oasis.opendocument.text)
2017-02-09 16:01 UTC, Jan-Marek Glogowski

Description Jan-Marek Glogowski 2017-02-09 14:57:39 UTC
Description:
I'm currently testing Unicode compatibility including different input methods on Windows and Linux.

There are two ways to represent the German 'ö': either as the single Unicode character U+00F6 (LATIN SMALL LETTER O WITH DIAERESIS) or as an 'o' followed by U+0308 (COMBINING DIAERESIS).

Under Unicode normalization (e.g. NFC), these two sequences are considered equal, but LO treats them as different.
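
To illustrate the point, here is a minimal sketch in Python using the standard unicodedata module (not LO code, and not ICU). The variable names are made up for this example.

```python
import unicodedata

# The two representations of 'ö' described above.
precomposed = "\u00f6"   # LATIN SMALL LETTER O WITH DIAERESIS
decomposed = "o\u0308"   # 'o' followed by COMBINING DIAERESIS

# Compared code point by code point, the strings differ.
raw_equal = precomposed == decomposed

# After normalizing both sides to the same form, they compare equal.
nfc_equal = (unicodedata.normalize("NFC", precomposed)
             == unicodedata.normalize("NFC", decomposed))
nfd_equal = (unicodedata.normalize("NFD", precomposed)
             == unicodedata.normalize("NFD", decomposed))

print(raw_equal, nfc_equal, nfd_equal)
```

Either form works for the comparison as long as both operands use the same one.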

LO 5.3 added an option to the search and replace dialog to ignore diacritics in general. This can be used as a kind of workaround for search, but it doesn't help with replace, because it also matches a plain 'o'.
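
A rough sketch of why the workaround fails for replace, again in plain Python rather than LO code. It assumes the test string is U+00F6, then 'o' + U+0308, then a plain 'o', and it only handles a needle that is a single character plus its combining marks. The helper names (clusters, strip_marks, replace_clusters) are hypothetical.

```python
import unicodedata

def clusters(text):
    """Split text into base characters with their trailing combining marks."""
    out = []
    for ch in text:
        if out and unicodedata.combining(ch):
            out[-1] += ch          # attach the mark to the previous base character
        else:
            out.append(ch)
    return out

def nfc(s):
    return unicodedata.normalize("NFC", s)

def strip_marks(s):
    """Emulate 'ignore diacritics': decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def replace_clusters(text, needle, replacement, key):
    """Replace every cluster whose key equals key(needle)."""
    target = key(needle)
    return "".join(replacement if key(c) == target else c for c in clusters(text))

text = "\u00f6" + "o\u0308" + "o"   # assumed content of the test string

literal = replace_clusters(text, "\u00f6", "X", key=lambda s: s)
normalized = replace_clusters(text, "\u00f6", "X", key=nfc)
no_diacritics = replace_clusters(text, "\u00f6", "X", key=strip_marks)
```

With literal comparison only the precomposed 'ö' is replaced, with NFC comparison both 'ö' are replaced, and with diacritics ignored the trailing plain 'o' is replaced as well.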

Actually I also tested gedit and kate and only gedit finds both matches in the "ööo"-string. kate in KDE4 at least loads the text correctly, while KF5 loads it as "öoö" :-(

I just tested 4.1.6, but I guess it's inherited from OOo.

Steps to Reproduce:
Open the attached document and search for ö in 'ööo'.

Actual Results:  
You get one or three matches, depending on the "ignore diacritics" setting.

Expected Results:
You should get two or three matches depending on the "ignore diacritics" setting.
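
The actual and expected counts can be reproduced with a small Python sketch. This is only an illustration under assumptions: the test string is taken to be U+00F6, then 'o' + U+0308, then a plain 'o', the search term is the precomposed 'ö', and the function names are invented here.

```python
import unicodedata

def count_literal(haystack, needle):
    """Code-point-exact search, as LO does today."""
    return haystack.count(needle)

def count_normalized(haystack, needle, form="NFC"):
    """Normalize both sides to the same form before searching."""
    return unicodedata.normalize(form, haystack).count(
        unicodedata.normalize(form, needle))

def strip_diacritics(text):
    """Decompose, then drop all combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def count_ignoring_diacritics(haystack, needle):
    """Rough stand-in for the 'ignore diacritics' option."""
    return strip_diacritics(haystack).count(strip_diacritics(needle))

text = "\u00f6" + "o\u0308" + "o"   # assumed content of the test string
needle = "\u00f6"

print(count_literal(text, needle))              # actual result without the option
print(count_normalized(text, needle))           # expected result without the option
print(count_ignoring_diacritics(text, needle))  # result with "ignore diacritics"
```

Note that counting in a normalized copy loses the offsets into the original text, so a real implementation would also have to map match positions back.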


Reproducible: Always

User Profile Reset: No

Additional Info:


User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:51.0) Gecko/20100101 Firefox/51.0
Comment 1 Jan-Marek Glogowski 2017-02-09 16:01:00 UTC
Created attachment 131044
A document with the test string ööo
Comment 2 V Stuart Foote 2017-02-10 04:29:09 UTC
I assume we would continue to use ICU for this.

=-ref-=
http://www.icu-project.org/userguide/normalization
http://userguide.icu-project.org/transforms/normalization