163616 – Match diacritics

Summary: Match diacritics

Status:	RESOLVED FIXED

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	Writer
Version: (earliest affected)	unspecified
Hardware:	All All

Importance:	medium enhancement
Assignee:	Johann Lorber

URL:
Whiteboard:	target:25.8.0 inReleaseNotes:25.8
Keywords:	difficultyInteresting, easyHack, skillCpp, topicUI

Duplicates (1):	163605
Depends on:
Blocks:	Find-Toolbar

Reported:	2024-10-25 12:22 UTC by madhavkiran.sodum
Modified:	2025-04-18 15:29 UTC
CC List:	7 users

See Also:	52204 119200 129469 134150
Crash report or crash signature:

Attachments
a visual image on how it should look (62.87 KB, image/png) 2024-10-25 12:22 UTC, madhavkiran.sodum
How it looks on Mozilla Firefox (40.13 KB, image/png) 2024-10-25 12:32 UTC, madhavkiran.sodum

Description madhavkiran.sodum 2024-10-25 12:22:38 UTC

Created attachment 197232
a visual image on how it should look

Enhancement: Could a "Match diacritics" check box be added next to the "Match Case" check box in the Find toolbar?

It would be a useful tool for people working with diacritics. It is often too painful to open the full Find & Replace window and check and uncheck the "Diacritic-sensitive" check box.

Please see the attached image on how it could look.

Thank you
Regards

Comment 1 madhavkiran.sodum 2024-10-25 12:32:04 UTC

Created attachment 197233
How it looks on Mozilla Firefox

This is how Mozilla Firefox does it.

Comment 2 V Stuart Foote 2024-10-25 20:27:27 UTC

See the previous UX-Design discussion and decision to keep the Quick Find Search toolbar simple:

https://bugs.documentfoundation.org/show_bug.cgi?id=129469#c10

That means that, by default, Kashida are ignored and folded in with other strings (ignoreDiacritics_CTL), no different from whole-word or regular-expression requests.

IMHO => WF (WONTFIX)
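For readers unfamiliar with the two CTL options, here is a minimal Python sketch of what "ignore diacritics" and "ignore Kashida" folding could look like for Arabic text. It assumes that ignoring Kashida means deleting the tatweel character U+0640 and that ignoring diacritics means deleting combining marks such as the fatha (U+064E). `ctl_fold` is a hypothetical helper, not LibreOffice's ignoreDiacritics_CTL code.

```python
import unicodedata

TATWEEL = "\u0640"  # ARABIC TATWEEL, the character used for Kashida stretching


def ctl_fold(text: str, ignore_diacritics: bool = True,
             ignore_kashida: bool = True) -> str:
    """Hypothetical folding: remove Kashida and/or combining marks."""
    if ignore_kashida:
        text = text.replace(TATWEEL, "")
    if ignore_diacritics:
        decomposed = unicodedata.normalize("NFD", text)
        # Harakat such as the fatha are combining marks (general category Mn).
        text = "".join(ch for ch in decomposed
                       if not unicodedata.category(ch).startswith("M"))
    return text


plain = "\u0643\u062a\u0628"                      # kaf, teh, beh with no vowel marks
voweled = "\u0643\u064e\u062a\u064e\u0628\u064e"  # same letters, each with a fatha
stretched = "\u0643\u062a\u0640\u0640\u0628"      # same letters, two tatweels inserted

print(ctl_fold(voweled) == plain)                           # True
print(ctl_fold(stretched) == plain)                         # True
print(ctl_fold(voweled, ignore_diacritics=False) == plain)  # False
```

Because the tatweel is a letter-like character (general category Lm) rather than a combining mark, the mark-stripping step alone would not remove it, so in this sketch the two options are genuinely independent switches.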

Comment 3 madhavkiran.sodum 2024-10-26 07:52:08 UTC

I definitely respect the previous thoughts and decision. Just a few days with LibreOffice has convinced me that this is a way more powerful tool than Microsoft Office. I just don't feel like going back to Microsoft Office now.

But some more thoughts for consideration in this matter:

1. The "diacritic-sensitive" check box doesn't seem to be synchronized across all the find interfaces. I have it turned on in the Find & Replace dialogue but it doesn't work in the quick find toolbar.

2. The Whole Word Match problem can still be circumvented quite easily in most cases by putting spaces before and after the search term.

3. Match Kashida really seems to be a different issue. Kashida is not a character but a code used to help justify the text. It is optional in Arabic and applicable only to Arabic script, and some programs add it by default. Diacritics/accents are not like that: they are not optional, since they can change the meaning of words. That is why even ASCII (though in a very basic way) includes diacritical marks, and so does ISO/IEC 8859-1:1998 ("extended" ASCII and the like). Far more languages use Roman letters with accents/diacritics than use Arabic script.

4. Regular expressions are definitely non-trivial searches and are rightly not placed in the quick search toolbar.

5. Right now LO's implementation doesn't follow the collation strength rules. We can search while ignoring both case and accents, but not while ignoring case only (since accents are ignored by default). Ideally, IMHO, we should have a simple option to toggle between the first three collation strengths:

[Quote]
The Strength attribute determines whether accent or case is taken into account when collating or comparing text strings. In writing systems without case or accent, the Strength attribute controls similarly important features.
The possible values are: primary (1), secondary (2), tertiary (3), quaternary (4), and identity (I).

To ignore:

—accent and case, use the primary strength level
—case only, use the secondary strength level
—neither accent nor case, use the tertiary strength level

Almost all characters can be distinguished by the first three strength levels, therefore in most locales the default Strength attribute is set at the tertiary level. However if the Alternate attribute (described in a following row) is set to shifted, then the quaternary strength level can be used to break ties among white space characters, punctuation marks, and symbols that would otherwise be ignored.
[End of Quote]
https://www.ibm.com/docs/en/db2/11.5?topic=collation-unicode-algorithm-based-collations
https://www.php.net/manual/en/collator.setstrength.php

With the Match Case check box we can switch between strength levels 2 and 3, and by implementing a Match Diacritics check box we could switch between strength levels 1 and 2.

6. Match Case and Match Diacritics are really very similar in implementation. With Match Case off, "s" matches "s", "S" and "ß" (the sharp S used in German), whereas with Match Diacritics off, "s" would match "s", "ś", "ṣ", etc.

7. Diacritics have become very common in English on Android phones (even the US QWERTY layout offers many accented characters on long press), Macs and iPhones. Kashida is found in only one script and cannot be typed with anything like the ease with which we type diacritics today.

8. Globalization has made it necessary to work with many languages in a single document, and English itself has loanwords from other languages that use accents...

9. If the sidebar Find and the quick find toolbar have almost the same features, then the quick find toolbar offers no extra advantage.

10. Unfortunately, one of LO's most powerful features, customization, cannot solve this issue either, because Match Diacritics does not exist as a .uno command.

Regards
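Points 5 and 6 above can be illustrated with a small Python sketch. It is an approximation, not LibreOffice's search code: it assumes that Unicode NFD decomposition plus dropping combining marks is a fair stand-in for "ignore diacritics", and that `str.casefold()` is a fair stand-in for "ignore case". Real UCA/ICU collation compares weighted sort keys rather than rewriting strings. `fold`, `strength_level` and `find_words` are hypothetical names.

```python
import unicodedata


def fold(text: str, match_case: bool, match_diacritics: bool) -> str:
    """Hypothetical folding so that plain string equality approximates
    the comparison strength selected by the two check boxes."""
    if not match_case:
        text = text.casefold()  # "Schön" -> "schön"
    text = unicodedata.normalize("NFD", text)  # "ö" -> "o" + U+0308
    if not match_diacritics:
        # Drop combining marks (general category M*), e.g. U+0308.
        text = "".join(ch for ch in text
                       if not unicodedata.category(ch).startswith("M"))
    return text


def strength_level(match_case: bool, match_diacritics: bool):
    """Map the two check boxes onto the first three UCA strength levels."""
    if not match_case and not match_diacritics:
        return 1  # primary: ignore accents and case
    if not match_case and match_diacritics:
        return 2  # secondary: ignore case only
    if match_case and match_diacritics:
        return 3  # tertiary: ignore neither
    return None  # case-sensitive but accent-blind: none of the three levels


def find_words(query: str, text: str, match_case: bool = False,
               match_diacritics: bool = False) -> list:
    """Return the words in text that equal query after folding."""
    key = fold(query, match_case, match_diacritics)
    return [w for w in text.split()
            if fold(w, match_case, match_diacritics) == key]


sample = "schon schön Schön"
print(find_words("schön", sample))                         # ['schon', 'schön', 'Schön']
print(find_words("schön", sample, match_diacritics=True))  # ['schön', 'Schön']
print(find_words("schön", sample, match_case=True,
                 match_diacritics=True))                   # ['schön']
```

With Match Diacritics ticked, toggling Match Case moves between levels 3 and 2; with Match Case unticked, toggling Match Diacritics moves between levels 2 and 1. This is the pairing described in point 5. The remaining combination (case-sensitive but accent-blind) is not one of the three plain levels, which is why the sketch returns `None` for it.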

Comment 4 Heiko Tietze 2024-10-28 08:37:11 UTC

The QFS (quick find/search toolbar) is supposed to be as quick and simple as possible. The SF dialog (search and find) aims to support all scenarios with full flexibility. We frequently discuss whether SF and QFS need to depend on each other or should become disentangled (which should currently be the case).

Do you see a need to review the discussion regarding diacritics?

Regarding comment 3, I struggle to see the point. We usually handle one topic per ticket, and if you have a specific enhancement idea beyond the diacritics issue, please don't hesitate to file another ticket.

If it's about the collation strength rules, a clear -1 from my side. No average user understands what this means. Ideally tickets start with a use case like "I want to achieve <foo> by doing <bar>. What hinders me is <baz>."

Comment 5 madhavkiran.sodum 2024-10-28 10:08:22 UTC

A review for diacritics would be welcome.

In comment 3, I tried to answer the points made in https://bugs.documentfoundation.org/show_bug.cgi?id=129469#c10, which was cited in comment 2.

IMHO, if we can type something easily, then we should be able to search for it easily too. Just as typing lower and upper case is easy on any keyboard, it is becoming increasingly easy to type accents (whether on a phone or a computer). Most users will find this option necessary.

Given this, the QFS will find better utility if this option for matching diacritics is provided (the way Mozilla Firefox has done it without making the search bar look complicated). People have appreciated Mozilla's feature (https://rubenerd.com/firefox-can-now-match-on-diacritics/). 

Regarding collation strength, I tried to give references for further reading.
What I mean is that, according to the IBM quote, "Almost all characters can be distinguished by the first three strength levels". So the first three levels cover almost all cases usually encountered, and our QFS should handle at least these three strength levels.

Other complicated or rare searches (such as regular expressions, Kashida etc.) could be consigned to the SF dialog, as is currently done. The users who know and use regular expressions or Kashida are definitely few, but that is not so with diacritics.

Regards

Comment 6 madhavkiran.sodum 2024-10-28 10:15:37 UTC

A lack of this feature will also affect many Indic scripts, where vowel marks are treated as diacritics. This has already been reported as a bug: https://bugs.documentfoundation.org/show_bug.cgi?id=163605

Comment 7 Heiko Tietze 2024-10-28 10:37:15 UTC

(In reply to madhavkiran.sodum from comment #5)
> IMHO if we can type something easily, then we should be able to search it
> easily too.
And you can. What you ask for is to limit the number of results. I'm German, and the word "schön" (with an umlaut diacritic) has a different meaning from "schon". But in my opinion there is little benefit in narrowing down the search, while on the other hand it complicates the UI unnecessarily.

Comment 8 Heiko Tietze 2024-10-28 10:38:59 UTC

(In reply to madhavkiran.sodum from comment #6)
> A lack of this feature will also affect the many Indic scripts...
We could show the checkbox only if CJK/CTL is enabled.

Comment 9 madhavkiran.sodum 2024-10-28 11:19:19 UTC

(In reply to Heiko Tietze from comment #7)
> (In reply to madhavkiran.sodum from comment #5)
> > IMHO if we can type something easily, then we should be able to search it
> > easily too.
> And you can. What you ask for is to limit the number of results. I'm German
> and the words "schön" (with diacritic umlaut) has a different meaning than
> "schon". But there is little benefit, in my opinion, to narrow down the
> search, which complicates the UI unnecessarily on the other hand.

No sir, I don't intend to narrow down the search without good reason. What I intend is to give the user a reasonable option through the check box while still keeping the UI simple, as Firefox has done. Honestly, their UI does not look that complicated, does it? It looks simple as well as adequate, since it would cover the majority of use cases.

If the Quick Find toolbar catches too many matches the user does not want, we may produce the search results quickly, but the user will end up spending more time filtering out what he/she wants, so the search is not so quick in the end.
This is especially the case when the QFS is disentangled from the sidebar Find (which lists all the search results), and one has to go through each and every result to check whether what was caught is what was searched for. If the QFS were linked to the sidebar results, the user would at least only have to scroll down the list and select whatever he/she wants.

When a document runs to double-digit page counts, the number of words with diacritics also increases... And other European languages do have many words with diacritics.

Comment 10 madhavkiran.sodum 2024-10-28 11:30:09 UTC

 
> IMHO if we can type something easily, then we should be able to search it
> easily too. 

My appeal here is that if we can type diacritics easily, then we should also be able to search for exactly those diacritics easily. Nothing more, nothing less.

Regards

Comment 11 V Stuart Foote 2024-10-28 11:42:13 UTC

(In reply to V Stuart Foote from comment #2)
> Meaning by default Kashida are ignored and folded in with other strings

Sorry, I realized I put that wrong; the issue is diacritics:

s/Kashida/diacritics & Kashida/


(In reply to Heiko Tietze from comment #8)
> We could show the checkbox only if CTK/CTL is enabled.

No, we either add it back to the QFS find bar across the Western/CJK/CTL trifecta, or not at all. 

IIRC, some of the past UI search misbehavior (between the Find bar and the Find and Replace... dialog) happened because ignoreDiacritics_CTL() actually applied across the language groupings, affecting diacritic- and Kashida-sensitive search.

Also, keep in mind that the QFS findbar is linked to the SB Navigator's 'Repeat search' and to the new SB 'Find' deck. So any config change to the toolbar will also need to affect the SB decks/content panel (i.e. include/mirror the checkbox control there).

I think we could use comments here from both @Jim and @Khaled.
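The requirement that the find bar and the sidebar decks stay in step (and the synchronization complaint in point 1 of comment 3) is essentially a shared-state problem. The sketch below shows the general observer pattern in Python under the assumption of one shared options object that every UI surface subscribes to. `SharedSearchOptions`, `CheckBoxPanel` and `set_option` are hypothetical names; this does not reflect how LibreOffice's toolbar controllers or sidebar panels are actually wired.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

OPTION_NAMES = ("match_case", "match_diacritics")


@dataclass
class SharedSearchOptions:
    """Hypothetical single source of truth for the find options."""
    match_case: bool = False
    match_diacritics: bool = False
    _listeners: List[Callable[[str, bool], None]] = field(
        default_factory=list, init=False, repr=False)

    def subscribe(self, listener: Callable[[str, bool], None]) -> None:
        self._listeners.append(listener)

    def set_option(self, name: str, value: bool) -> None:
        if name not in OPTION_NAMES:
            raise ValueError(f"unknown option: {name}")
        if getattr(self, name) == value:
            return  # no change, so no notification
        setattr(self, name, value)
        for listener in self._listeners:
            listener(name, value)


class CheckBoxPanel:
    """Stand-in for the check boxes of one UI surface (find bar, sidebar deck)."""

    def __init__(self, options: SharedSearchOptions) -> None:
        self.state: Dict[str, bool] = {n: getattr(options, n) for n in OPTION_NAMES}
        options.subscribe(self._on_option_changed)

    def _on_option_changed(self, name: str, value: bool) -> None:
        self.state[name] = value


options = SharedSearchOptions()
find_bar = CheckBoxPanel(options)
sidebar_find_deck = CheckBoxPanel(options)

# Ticking "Match Diacritics" in either surface goes through the shared object,
# so both surfaces end up showing the same state.
options.set_option("match_diacritics", True)
print(find_bar.state, sidebar_find_deck.state)
```

The design choice being illustrated is that no surface owns the option: each one only reflects the shared value, so adding a check box to the toolbar automatically implies mirroring it in the sidebar.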

Comment 12 madhavkiran.sodum 2024-10-28 14:27:04 UTC

A summary of the points and an answer as to why Match Diacritics is different from previous requests for other features in the QFS:

1. Match Case and Match Diacritics are both character level search options and similar in implementation. 
2. They are within the scope of an average user and affect many European and Indic languages. 
3. If I can easily type something accurately, then I should be able to easily find it accurately as well.

On the other hand:
1. Find Whole Words Only is not a character-level but a word-level search, so it is reasonable to push it to the SF dialog.
2. Kashida is neither a character- nor a word-level search option. It is a justification system and is also optional. It is mostly limited to one instance per line and one instance per paragraph, and it affects only one writing system, Arabic. So it is reasonable to push it to the SF dialog.
3. Regular expressions are non-trivial search options whose usage is generally beyond the knowledge of an average user, so it is reasonable to push them to the SF dialog.

Comment 13 Heiko Tietze 2024-11-07 07:31:11 UTC

We discussed the topic in the design meeting and agree with the reasoning. Diacritics should be available in the QFS.

Comment 14 madhavkiran.sodum 2024-11-08 07:36:22 UTC

Thank you. I really appreciate it.

Regards

Comment 15 raal 2024-11-10 09:27:03 UTC

*** Bug 163605 has been marked as a duplicate of this bug. ***

Comment 16 mahayogananda 2024-11-11 05:08:36 UTC

Thank you. 

In Indic scripts, the "diacritics" are even more important. One often simply can't find the word one needs with the "diacritics" ignored—because *all vowel marks* are treated as diacritics. (That is another discussion, why that should be!)

In English, that would be equivalent to searching for "dog" and having the search return "dog", "dig", "dug" and "doog", as well as "dogo".

So I'm glad this option will be added to the search bar.
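The "dog/dig/dug" effect can be reproduced with Devanagari in a short Python sketch. It assumes, as this comment describes, that an accent-blind search discards all combining marks; Unicode classifies Devanagari vowel signs (matras) such as U+093F and U+093E as combining marks (category Mc), so three different Hindi words collapse to the same search key. `strip_marks` and `accent_blind_matches` are hypothetical helpers, not LibreOffice code.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Drop every combining mark (general category M*), as an
    accent-blind search is assumed to do in this sketch."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed
                   if not unicodedata.category(ch).startswith("M"))


def accent_blind_matches(query: str, words: list) -> list:
    """Return every word whose mark-stripped form equals the query's."""
    key = strip_marks(query)
    return [w for w in words if strip_marks(w) == key]


# Hindi: dal "group", dil "heart", daal "lentils".
# The vowel signs U+093F (i) and U+093E (aa) are combining marks.
words = ["दल", "दिल", "दाल"]
print(accent_blind_matches("दिल", words))  # ['दल', 'दिल', 'दाल']
```

Searching for "heart" therefore also returns "group" and "lentils", which is the Devanagari counterpart of the English analogy above.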

Comment 17 Jim Raykowski 2024-11-15 06:32:46 UTC

FWIW, following how Match Case is implemented is how I would go about adding Match Diacritics to the find bar. I'd also do some research on user interface controller internals[1] but wouldn't let that scare me away ;-)

Here are the files touched by the implementation of Match Case:

svx/source/tbxctrls/tbunosearchcontrollers.cxx
include/svx/strings.hrc
officecfg/registry/data/org/openoffice/Office/UI/Controller.xcu
officecfg/registry/data/org/openoffice/Office/UI/GenericCommands.xcu
svx/util/svx.component

sw/uiconfig/swriter/toolbar/findbar.xml
and other findbar.xml files

[1] https://wiki.openoffice.org/wiki/Framework/Article/OpenOffice.org_2.0_User_Interface_Controller_Internals#Toolbar

Comment 18 Commit Notification 2025-03-19 00:16:39 UTC

jlorber committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/4957d832c76598e78a57324dad5b4de7345a33e2

tdf#163616 add Match Diacritics to the find bar

It will be available in 25.8.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.

Comment 19 Buovjaga 2025-03-19 05:53:08 UTC

Johann: I noticed Madhavkiran has a related report, bug 163652 in case you want to take a look.

Comment 20 Johann Lorber 2025-03-26 15:37:28 UTC

(In reply to Buovjaga from comment #19)
> Johann: I noticed Madhavkiran has a related report, bug 163652 in case you
> want to take a look.

I'll look into it, though I'm not sure I'll have the time to fix it just yet.