Bug 120119 - "Find bar" does not search for Arabic text containing "diacritics"
Status: RESOLVED NOTABUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version (earliest affected): 6.1.1.2 release
Hardware: x86-64 (AMD64) All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: needsUXEval
Depends on:
Blocks: Find-Search Arabic-and-Farsi Diacritics
Reported: 2018-09-26 06:00 UTC by Hatem Wasfy
Modified: 2024-08-04 21:32 UTC

See Also:
Crash report or crash signature:


Attachments

Description Hatem Wasfy 2018-09-26 06:00:23 UTC
Description:
(1) Quick background:
The Arabic writing system uses marks called "Arabic diacritics":
https://en.wikipedia.org/wiki/Arabic_diacritics

Arabic diacritics carry significant information at the level of sound, meaning, and grammar.


(2) The Bug:

If we write some Arabic text with "Arabic diacritics" in LibreOffice Writer and then use the find function to search for a word that already exists in the text, the search fails as if it were a different word.

(3) Example for the bug: 

--> For example, given this Arabic text:

بِسْمِ ٱللهِ ٱلرَّحْمٰنِ ٱلرَّحِيمِ

يَا أَيُّهَا النَّاسُ اتَّقُوا رَبَّكُمْ ۚ إِنَّ زَلْزَلَةَ السَّاعَةِ شَيْءٌ عَظِيمٌ (1) يَوْمَ تَرَوْنَهَا تَذْهَلُ كُلُّ مُرْضِعَةٍ عَمَّا أَرْضَعَتْ وَتَضَعُ كُلُّ ذَاتِ حَمْلٍ حَمْلَهَا وَتَرَى النَّاسَ  سُكَارَىٰ وَمَا هُم بِسُكَارَىٰ وَلَٰكِنَّ عَذَابَ اللَّهِ شَدِيدٌ


--> Then we try to search for the word:
الناس

--> The search then fails to find a match, although the word occurs more than once in the text.
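
The mismatch in the example above can be sketched outside LibreOffice. This is only an illustration using Python's standard `unicodedata` module, not LibreOffice's search code (which, per the discussion in this report, relies on ICU). The helper name `strip_marks` is hypothetical. The two strings are spelled with code point escapes so that the diacritics are explicit.

```python
import unicodedata

def strip_marks(text: str) -> str:
    """Drop nonspacing combining marks (Unicode general category Mn)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")

# The word as it appears in the vocalized sample text:
# alef, lam, noon + shadda + fatha, alef, seen + damma
vocalized = "\u0627\u0644\u0646\u0651\u064E\u0627\u0633\u064F"

# The same word as typed into the search field, without diacritics
query = "\u0627\u0644\u0646\u0627\u0633"

# A plain substring search compares code points, so the marks break the match.
print(query in vocalized)               # False

# After removing the marks from the document text, the word is found.
print(query in strip_marks(vocalized))  # True
```

The diacritics are separate code points that sit between the base letters, so a diacritic-sensitive comparison sees two different character sequences.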


Steps to Reproduce:
1. Open LibreOffice Writer

2. Write Arabic text using "Arabic diacritics"
(Or, for a quick test, copy and paste sample Arabic text from: https://en.wikipedia.org/wiki/Arabic_diacritics)

3. Press Control + F to find text in the page

4. Type one Arabic word from the written text, but without "Arabic diacritics".



Actual Results:
LibreOffice fails to find the word.


Expected Results:
LibreOffice should be able to find the existing word inside the text.



Reproducible: Always


User Profile Reset: No



Additional Info:
Comment 1 V Stuart Foote 2018-09-26 12:47:10 UTC
Rather than the <Ctrl>+F "Find bar", does search behave when using the <Ctrl>+H "Find & Replace" dialog?

IIUC, search for diacritics is handled without Unicode transformation by the ICU libs, but the Find bar and the Find & Replace dialog get different defaults.
Comment 2 Hatem Wasfy 2018-09-26 14:05:34 UTC
I tried (Control + H) with the option unticked, and then it can identify the word.

But (Control + F) still fails.
We still need a fix for (Control + F), which is the case I am reporting this bug about.
Comment 3 Hatem Wasfy 2018-09-26 14:09:16 UTC
I tried (Control + H) with the option "Diacritic-sensitive" unticked, and then it can identify the word.

But (Control + F) still fails.
We still need a fix for (Control + F), which is the case I am reporting this bug about.
Comment 4 V Stuart Foote 2018-09-26 14:43:02 UTC
Out of my wheelhouse...

Would this mean the search routines for the Find bar are applying one of the ICU transforms (obscuring the diacritics and other kashida)?  OK in the more robust Find & Replace, but misconfigured for the simpler Find bar use?
Comment 5 Hatem Wasfy 2018-09-26 14:48:49 UTC
Exactly,
It is OK in the more robust Find & Replace, but misconfigured for the simpler Find bar.
Comment 6 Naruhiko Ogasawara 2018-09-27 20:46:35 UTC
I could reproduce this issue with the following two versions:

Version: 6.1.1.2
Build ID: libreoffice-6.1.1.2-snap1
CPU threads: 4; OS: Linux 4.15; UI render: default; VCL: gtk3; 
Locale: ja-JP (ja_JP.UTF-8); Calc: group threaded

Version: 6.2.0.0.alpha0+
Build ID: d077b30dba618daace0373e9b7e7fe84f982c6aa
CPU threads: 4; OS: Linux 4.15; UI render: default; VCL: gtk2; 
TinderBox: Linux-rpm_deb-x86_64@70-TDF, Branch:master, Time: 2018-09-26_23:17:55
Locale: ja-JP (ja_JP.UTF-8); Calc: threaded
Comment 7 Shinji Enoki 2018-09-27 20:54:56 UTC
I can reproduce in the following environment.

OS: Debian jessie x86-64
Version: 6.1.0.3
Build ID: efb621ed25068d70781dc026f7e9c5187a4decd1
CPU threads: 4; OS:Linux 3.16; UI render: default; VCL: gtk2; 
Locale: ja-JP (ja_JP.utf8); Calc: group threaded
Run in safe mode of LibreOffice
Comment 8 Xisco Faulí 2018-10-17 11:08:07 UTC
I can reproduce it only if 'Complex text layout' in Tools - Options - Language is disabled
Comment 9 Xisco Faulí 2018-10-17 11:12:21 UTC
It's interesting that looking for الناس in Firefox doesn't find the word in the Arabic text either...
Comment 10 ⁨خالد حسني⁩ 2018-10-17 12:14:18 UTC
Firefox does not support skipping diacritics in search. Very few open source applications do.
Comment 11 Eike Rathke 2018-10-17 13:40:31 UTC
FWIW, diacritics and other settings are available in the Find & Replace (Ctrl+H) dialog; the Find Bar does not provide these settings. The (IMHO good) design decision was to no longer inherit the current/last settings from Find & Replace (it once did), as that turned out to be too confusing. If additional options are needed, they need to be added to the Find Bar. Specifically, Diacritic-Sensitive cannot be disabled unconditionally (by applying a corresponding ICU Unicode transformation) because that interferes with languages where diacritics are distinct characters.
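
To illustrate the interference described above, here is a minimal sketch in Python. It uses the standard `unicodedata` module as a stand-in for an ICU transformation and is not LibreOffice code. The function names `fold_all` and `fold_arabic_only`, and the set `ARABIC_MARKS`, are hypothetical and exist only for this example.

```python
import unicodedata

def fold_all(text: str) -> str:
    """Decompose, then drop every nonspacing mark, in any script."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

# Assumption for this sketch: U+064B..U+0652 (tanwin, short vowels,
# shadda, sukun) plus U+0670 (superscript alef).
ARABIC_MARKS = set(range(0x064B, 0x0653)) | {0x0670}

def fold_arabic_only(text: str) -> str:
    """Drop only the Arabic marks listed above; leave other scripts alone."""
    return "".join(ch for ch in text if ord(ch) not in ARABIC_MARKS)

vocalized = "\u0627\u0644\u0646\u0651\u064E\u0627\u0633\u064F"  # with diacritics
query = "\u0627\u0644\u0646\u0627\u0633"                        # without

# Unconditional folding also collapses "å" (U+00E5) to "a", although
# these are distinct letters in the Swedish alphabet.
print(fold_all("\u00e5") == "a")                 # True

# Folding restricted to Arabic marks keeps "å" intact and still matches.
print(fold_arabic_only("\u00e5") == "\u00e5")    # True
print(fold_arabic_only(vocalized) == query)      # True
```

A script-restricted approach like `fold_arabic_only` is only one possible direction. Whether it suits the Find Bar is a design question that this sketch does not settle.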
Comment 13 V Stuart Foote 2020-02-12 08:39:09 UTC
Are we OK here with a WONTFIX, and so continue to direct the more robust search into the Find & Replace dialog? 

Meaning, let's keep the <Ctrl>+F 'Find' toolbar lightweight and continue use of ICU libs for search to be diacritic-insensitive.

Annoying for some scripts/locales, but essential to clarity of usage of 'Find' toolbar vs. the 'Find & Replace...' dialog.
Comment 14 Heiko Tietze 2020-02-18 12:22:54 UTC
(In reply to V Stuart Foote from comment #13)
> Are we OK here with a WONTFIX

Yes. See also bug 130603 and bug 129469.