Bug 52204 - Find/Search option should support ignoring diacritics (Arabic Tashkeel and Hebrew nikkud)
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version: 3.5.3 release (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: abdulmajeed
URL:
Whiteboard: target:4.2.0
Keywords:
Depends on:
Blocks: RTL-CTL Find&Replace-Dialog
Reported: 2012-07-17 15:13 UTC by Munzir Taha
Modified: 2018-03-11 22:23 UTC (History)

See Also:
Crash report or crash signature:


Description Munzir Taha 2012-07-17 15:13:23 UTC
If a file contains Arabic text with Tashkeel, e.g. (الرَّحْمَنُ), and I search for the same word without the Tashkeel (الرحمن), the search returns zero results. The Arabic diacritics are optional characters. For more information, see the Wikipedia page:
https://en.wikipedia.org/wiki/Arabic_diacritics

We need to add an option to the Find and Replace dialog boxes alongside "Match case", so that we also have "Match Tashkeel", which is not enabled by default.
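
The mismatch can be reproduced outside LibreOffice. Below is a minimal Python sketch, assuming the vocalized word is encoded with the code points spelled out in the comments (the relative order of shadda and fatha in the reporter's file is an assumption). It only illustrates why a plain substring comparison fails; it says nothing about how LibreOffice itself searches.

```python
# Vocalized word, written with explicit escapes so the marks are visible.
vocalized = (
    "\u0627\u0644"        # alef, lam
    "\u0631\u0651\u064E"  # reh, shadda, fatha
    "\u062D\u0652"        # hah, sukun
    "\u0645\u064E"        # meem, fatha
    "\u0646\u064F"        # noon, damma
)

# The same word typed without Tashkeel: alef, lam, reh, hah, meem, noon.
plain = "\u0627\u0644\u0631\u062D\u0645\u0646"

# The marks sit between the base letters, so the plain word is not a
# contiguous substring of the vocalized one.
found = plain in vocalized
print(found, len(vocalized), len(plain))
```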
Comment 1 Urmas 2012-07-20 12:14:33 UTC
The search fails to find even a single letter like reh or mim in that string. It's clearly a bug.
Comment 2 Lior Kaplan 2012-11-10 20:20:36 UTC
Also relevant for Hebrew with its diacritics (Nikkud).
Comment 3 Commit Notification 2013-06-21 09:20:42 UTC
abdulmajeed ahmed committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=448fa131b2dafac305d88480e469cc4bc0515d68

Fix fdo#52204 add new feature ignore diacritics in search for CTL



The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 4 abdulmajeed 2013-06-21 09:24:40 UTC
(In reply to comment #2)
> Also relevant for Hebrew with its diacritics (Nikkud).

Lior: could you provide me with the Unicode code points of the diacritics you want to ignore in Hebrew?
Comment 5 Munzir Taha 2013-06-23 18:35:50 UTC
@abdulmajeed:
Thanks a lot for the patch, but please add the other missing diacritics from https://en.wikipedia.org/wiki/Arabic_diacritics. You missed some, such as Maddah, Dagger alif and Alif waslah. It would also make the code clearer if you added a comment with the Unicode name next to each hex number.
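
As an illustration of pairing hex values with Unicode names, here is a small Python sketch using the standard `unicodedata` module. The three code points are picked here as likely matches for the marks named above (an assumption, not taken from the patch), and the `names` and `categories` dictionaries are hypothetical helpers for this example only.

```python
import unicodedata

# Assumed code points for Maddah, Dagger alif and Alif waslah.
code_points = (0x0653, 0x0670, 0x0671)

# Look up the official character name and general category of each one.
names = {cp: unicodedata.name(chr(cp)) for cp in code_points}
categories = {cp: unicodedata.category(chr(cp)) for cp in code_points}

for cp in code_points:
    print(f"0x{cp:04X}  {names[cp]}  ({categories[cp]})")
```

The category column shows that U+0671 is a letter (`Lo`) rather than a nonspacing mark (`Mn`), unlike the other two.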
Comment 6 Commit Notification 2013-06-24 08:21:38 UTC
abdulmajeed ahmed committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=64245c108aec557f62c254486aa354382bd445ce

fdo#52204 add more diacritics for Arabic



Comment 7 ⁨خالد حسني⁩ 2013-06-24 08:32:40 UTC
Instead of hard coding Arabic marks, one should query the Unicode general category of the character using ICU, something like "u_getIntPropertyValue(c, UCHAR_GENERAL_CATEGORY) == U_NON_SPACING_MARK".
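
The idea can be sketched in Python, whose standard `unicodedata.category` plays the role of the ICU general-category query here. This is a rough analogue under that assumption, not the code that was committed; `strip_nonspacing_marks` and `find_ignoring_marks` are hypothetical names.

```python
import unicodedata


def strip_nonspacing_marks(text: str) -> str:
    """Drop every character whose general category is Mn (nonspacing mark)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


def find_ignoring_marks(haystack: str, needle: str) -> int:
    """Search with marks removed from both sides.

    Returns an index into the *stripped* haystack, or -1 if there is no match.
    """
    return strip_nonspacing_marks(haystack).find(strip_nonspacing_marks(needle))


# Arabic word with Tashkeel, and the same word without it.
arabic = "\u0627\u0644\u0631\u0651\u064E\u062D\u0652\u0645\u064E\u0646\u064F"
arabic_plain = "\u0627\u0644\u0631\u062D\u0645\u0646"

# Hebrew word with nikkud (shin, qamats, shin dot, lamed, vav, holam, final mem).
hebrew = "\u05E9\u05B8\u05C1\u05DC\u05D5\u05B9\u05DD"
hebrew_plain = "\u05E9\u05DC\u05D5\u05DD"

print(find_ignoring_marks(arabic, arabic_plain))
print(find_ignoring_marks(hebrew, hebrew_plain))
```

Because the check is based on the character category rather than a hand-written list, the same function covers Arabic and Hebrew marks. A real search would also need to map match positions back to the original text, which this sketch does not attempt.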
Comment 8 abdulmajeed 2013-06-24 08:45:14 UTC
@Munzir: Thank you. I have added them, except Alif waslah.

The correct behavior for ٱ, أ, إ, ا and آ is not to ignore them but to treat them as equal to each other, which is different from what is implemented here, so this problem counts as a separate bug.
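
To show how this differs from ignoring marks, here is a minimal Python sketch of folding the alef variants onto the plain alef before comparing. The mapping table and the `fold_alef` helper are hypothetical and only illustrate the "make equal to each other" behavior described above; nothing in this bug implements it.

```python
ALEF = "\u0627"  # plain alef

# Alef wasla, alef with hamza above, alef with hamza below, alef with madda above.
ALEF_VARIANTS = "\u0671\u0623\u0625\u0622"

# Translation table that maps every variant to the plain alef.
ALEF_FOLD = str.maketrans({variant: ALEF for variant in ALEF_VARIANTS})


def fold_alef(text: str) -> str:
    """Replace each alef variant with the plain alef; other characters are kept."""
    return text.translate(ALEF_FOLD)


# Two spellings that differ only in the form of the initial alef.
with_hamza = "\u0623\u062D\u0645\u062F"
without_hamza = "\u0627\u062D\u0645\u062F"
print(fold_alef(with_hamza) == fold_alef(without_hamza))
```

The letters are replaced rather than dropped, so the folded text keeps its length, whereas ignoring diacritics removes characters.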
Comment 9 Commit Notification 2013-06-24 10:10:14 UTC
abdulmajeed ahmed committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=a089ed2bf90fdb293c8502e4ab47cbbe027234f8

Better approach for solving fdo#52204



Comment 10 abdulmajeed 2013-06-24 10:14:46 UTC
Thank you, @Khaled. I think it should now work for all CTL languages.
Comment 11 Munzir Taha 2013-06-24 11:40:23 UTC
Great work! But why is the target 4.2? Can't you commit this to the 4-0 or even 4-1 branches instead of master?
Comment 12 Lior Kaplan 2013-06-24 11:42:06 UTC
We always commit to master, and then cherry-pick to other branches. This is the way we test features/fixes.

BTW, I still owe you the answer regarding Hebrew.
Comment 13 Commit Notification 2013-06-25 08:42:07 UTC
abdulmajeed ahmed committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=96456205067220cc73bffae6ae860dd120641660

Add Ignore-Diacritics to find toolbar for CTL fdo#52204



Comment 14 Lior Kaplan 2013-07-21 08:16:45 UTC
Verified on a build from master (tested with text in Hebrew).