If a file contains an Arabic word written with diacritics, e.g. (الرَّحْمَنُ), and I search for the same word without the Tashkeel (الرحمن), the search returns zero results. Arabic diacritics are optional characters. For more information, see the Wikipedia page: https://en.wikipedia.org/wiki/Arabic_diacritics We need to add an option to the Find and Find & Replace dialog boxes alongside "Match case", so that we also have "Match Tashkeel", which is not enabled by default.
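To illustrate the report, here is a minimal Python sketch (not LibreOffice code) of why a plain code-point comparison misses the word. The two strings are my own transcription of the example above into escapes, so the exact order of the marks is an assumption.

```python
import unicodedata

# Assumed transcription of the vocalized word from the report (with Tashkeel).
vocalized = "\u0627\u0644\u0631\u064E\u0651\u062D\u0652\u0645\u064E\u0646\u064F"
# The same word without Tashkeel, as typed into the search box.
plain = "\u0627\u0644\u0631\u062D\u0645\u0646"

# A naive substring search compares code points, so the marks that sit
# between the base letters break the match.
print(plain in vocalized)  # False

# The extra code points are exactly the diacritics.
extra = [unicodedata.name(ch) for ch in vocalized if ch not in plain]
print(extra)
```

The diacritics are separate code points stored after the letters they modify, which is why the unvocalized search string is not a substring of the vocalized text.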
The search fails even to find a single letter like reh or mim in that string. It's clearly a bug.
Also relevant for Hebrew with its diacritics (Nikkud).
abdulmajeed ahmed committed a patch related to this issue. It has been pushed to "master":
http://cgit.freedesktop.org/libreoffice/core/commit/?id=448fa131b2dafac305d88480e469cc4bc0515d68
Fix fdo#52204 add new feature ignore diacritics in search for CTL
The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
(In reply to comment #2)
> Also relevant for Hebrew with its diacritics (Nikkud).
Lior: could you provide me with the Unicode code points of the diacritics you want to ignore in Hebrew?
@abdulmajeed: Thanks a lot for the patch, but please add the other diacritics that are missing; see https://en.wikipedia.org/wiki/Arabic_diacritics. You missed some, such as: Maddah, Dagger alif, Alif waslah, .... It would also make the code clearer if you added a comment giving the Unicode name next to each hex number.
abdulmajeed ahmed committed a patch related to this issue. It has been pushed to "master":
http://cgit.freedesktop.org/libreoffice/core/commit/?id=64245c108aec557f62c254486aa354382bd445ce
fdo#52204 add more diacritics for Arabic
Instead of hard coding Arabic marks, one should query the Unicode general category of the character using ICU, something like "u_getIntPropertyValue(c, UCHAR_GENERAL_CATEGORY) == U_NON_SPACING_MARK".
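The category-based approach suggested here can be sketched in Python, using the standard library's `unicodedata.category(ch) == "Mn"` as a stand-in for the ICU check. This is only an illustration under that assumption, not the code that went into LibreOffice; the helper names `strip_nonspacing_marks` and `contains_ignoring_diacritics` are hypothetical.

```python
import unicodedata


def strip_nonspacing_marks(text: str) -> str:
    """Drop every character whose general category is Mn (nonspacing mark)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


def contains_ignoring_diacritics(needle: str, haystack: str) -> bool:
    """Substring test after removing nonspacing marks from both sides."""
    return strip_nonspacing_marks(needle) in strip_nonspacing_marks(haystack)


# Arabic: vocalized text, unvocalized search string (assumed transcription).
arabic_text = "\u0627\u0644\u0631\u064E\u0651\u062D\u0652\u0645\u064E\u0646\u064F"
arabic_query = "\u0627\u0644\u0631\u062D\u0645\u0646"

# Hebrew: "shalom" with Nikkud, and the same word without it.
hebrew_text = "\u05E9\u05B8\u05C1\u05DC\u05D5\u05B9\u05DD"
hebrew_query = "\u05E9\u05DC\u05D5\u05DD"

print(contains_ignoring_diacritics(arabic_query, arabic_text))  # True
print(contains_ignoring_diacritics(hebrew_query, hebrew_text))  # True
```

Because the test depends on the general category rather than a list of Arabic code points, the same function covers Hebrew Nikkud without any script-specific table. Precomposed letters are left untouched, since they are not nonspacing marks themselves.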
@Munzir thank you, I have added them, except Alif waslah, because for all of ٱأإاآ the correct behavior is not to ignore the character but to treat them as equal to each other. This is different from what is implemented here, so that problem counts as a separate bug.
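The distinction made here (folding the alif forms into one letter rather than deleting them) could look roughly like the following Python sketch. It is an assumption about the desired behavior for that separate bug, not existing LibreOffice code, and `fold_alef` is a hypothetical name.

```python
ALEF = "\u0627"  # ARABIC LETTER ALEF

# The other four forms listed in the comment above:
# alef wasla, alef with hamza above, alef with hamza below, alef with madda above.
ALEF_VARIANTS = "\u0671\u0623\u0625\u0622"

# Translation table mapping each variant to the bare alef.
_ALEF_FOLD = str.maketrans({variant: ALEF for variant in ALEF_VARIANTS})


def fold_alef(text: str) -> str:
    """Replace every alef variant with the bare alef; other characters are kept."""
    return text.translate(_ALEF_FOLD)


# A word starting with alef-with-hamza-above matches its bare-alef spelling
# once both sides are folded.
with_hamza = "\u0623\u062D\u0645\u062F"
bare = "\u0627\u062D\u0645\u062F"
print(fold_alef(with_hamza) == fold_alef(bare))  # True
```

Unlike mark stripping, the folded string keeps its length: the letter is replaced, not removed, so match positions in the original text stay valid.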
abdulmajeed ahmed committed a patch related to this issue. It has been pushed to "master":
http://cgit.freedesktop.org/libreoffice/core/commit/?id=a089ed2bf90fdb293c8502e4ab47cbbe027234f8
Better approach for solving fdo#52204
Thank you @Khaled, so now I think it should work for all CTL languages.
Great work! But why is the target 4.2? Can't you commit this to the 4-0 or even 4-1 branches instead of master?
We always commit to master, and then cherry-pick to other branches. This is the way we test features/fixes. BTW, I still owe you the answer regarding Hebrew.
abdulmajeed ahmed committed a patch related to this issue. It has been pushed to "master":
http://cgit.freedesktop.org/libreoffice/core/commit/?id=96456205067220cc73bffae6ae860dd120641660
Add Ignore-Diacritics to find toolbar for CTL fdo#52204
Verified on a build from master (tested with text in Hebrew).