168593 – Character dialog search should find by common aliases, not just by official name

Summary: Character dialog search should find by common aliases, not just by official name

Status:	UNCONFIRMED

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	LibreOffice
Version: (earliest affected)	unspecified
Hardware:	All All

Importance:	medium enhancement
Assignee:	Not Assigned

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:	Special-Character

Reported:	2025-09-28 21:30 UTC by Eyal Rozenberg
Modified:	2025-09-29 18:56 UTC
CC List:	4 users

See Also:	112267 114721
Crash report or crash signature:

Description Eyal Rozenberg 2025-09-28 21:30:08 UTC

When I search for a character in the (special) character dialog, I may not know exactly what it's called officially, but I know what it means; so, I might be providing an alias rather than the formal name.

For example: I want to find the semi-rotated cross shape we use for multiplying values. I am likely to search for it as "times" or as "product" or "prod" for short. But - I won't find it. Only if I write a part of "multiplication sign" would I get it.

That's not good enough. We should make the effort to find some mapping of character official names / Unicode values to multiple unofficial names, or descriptions, or aliases (and if push comes to shove, create it collaboratively somehow), and allow users to search that too - of course, using the LO UI, not an external tool.
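
To make the request concrete, here is a minimal sketch of what an alias-aware search could look like. It is not LibreOffice code: the `ALIASES` table, the `search_characters` function and the default code point range are all made up for illustration, and a real implementation would presumably take its aliases from an external data source rather than a hand-written dictionary. Official names come from Python's standard `unicodedata` module.

```python
import unicodedata

# Hypothetical alias table (assumption, not a real LibreOffice or Unicode
# data file): code point -> informal names a user might type.
ALIASES = {
    0x00D7: ["times", "product", "prod"],
    0x00F7: ["divide", "over"],
}

def search_characters(query, code_points=range(0x20, 0x3000)):
    """Return (code point, official name) pairs whose official name
    or any alias contains the query, case-insensitively."""
    q = query.strip().lower()
    if not q:
        return []
    hits = []
    for cp in code_points:
        # Characters without an official name (e.g. controls) are skipped.
        name = unicodedata.name(chr(cp), "")
        if not name:
            continue
        aliases = ALIASES.get(cp, [])
        if q in name.lower() or any(q in alias for alias in aliases):
            hits.append((cp, name))
    return hits

if __name__ == "__main__":
    for cp, name in search_characters("times"):
        print(f"U+{cp:04X} {chr(cp)} {name}")
```

With the table above, searching for "times" or "prod" returns U+00D7 MULTIPLICATION SIGN alongside whatever official names happen to contain the query, while searching for "multiplication" keeps working exactly as it does today.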

Comment 1 V Stuart Foote 2025-09-29 01:28:44 UTC

Seems a dupe of bug 112267, and I remain opposed to attempting any alias annotation or translation/localization of Unicode glyph names for use in the SCD UI.

Proper charmap representation in our SCD is the key to working with Unicode. I argued against collapsing our font tables, and instead for showing the full font chart with the Unicode sequence intact, including "no-covered" blanks for font-omitted glyphs.

Translation, rather than alias annotation might be feasible--but it would be so much to ask of our l10n translators.

As noted on bug 112267, a Pootle-based "translation" project from the FSF hosting era (https://github.com/samhocevar/unicode-translation/tree/master/po) was stood up with partial "translations" in 11 languages of Unicode at the 4.0 release.

Would be a straightforward l10n effort, but frankly not clear TDF is the right organization to broker it. Though could make the case that we already have a community of active translators dedicated to the LibreOffice effort.

Except that since it would have to respond to user locale we can't afford to do it half way. If started it would have to be finished for our supported locales, and there would need to be rework of SCD to respond to locale.

At Unicode 17.0, each PO file could require up to 297,000 records--though some subset ~30,000 is more likely. And, imagine in this era an initial translation against our LO delivered scripts could be automated in some sequence, so l10n effort could be QC/validation--and once complete somewhat static.

Comment 2 Heiko Tietze 2025-09-29 08:21:01 UTC

I am against opening this Pandora's box of aliases. And technically it means to add a description to any character of random fonts => NOB/WF

Comment 3 Eyal Rozenberg 2025-09-29 08:26:44 UTC

(In reply to Heiko Tietze from comment #2)
> I am against opening this Pandora's box of aliases.

You have given not even a hint of why you believe it's a Pandora's box. 

> And technically it means to
> add a description to any character of random fonts

No it doesn't, it means describing Unicode characters - nothing random.

Comment 4 Heiko Tietze 2025-09-29 08:30:15 UTC

(In reply to Eyal Rozenberg from comment #3)
> No it doesn't...
"As of Unicode version 17.0, there are 297,334 assigned characters with code points..." - and you expect the LibreOffice project to add aliases to all of these code points. If not all, you have to explain why not Tamil, for example, or give users/the community a chance to do so. And then again, why LibreOffice?

Comment 5 Eyal Rozenberg 2025-09-29 08:39:28 UTC

(In reply to Heiko Tietze from comment #4)
> "As of Unicode version 17.0, there are 297,334 assigned characters with code
> points..." - and you expect the LibreOffice project to add aliases to all of
> these code points.

Do you expect LibreOffice to even have these characters available in the SCD? And keep all of their names? So that users can search those names?

Well, of course you do. And similarly, the answer to your question is - of course I expect LO to add aliases to all of these code points.

> why LibreOffice?

Because we have an SCD. If we always used an external app / system utility for that, you could say that it's NAB.

Comment 6 Heiko Tietze 2025-09-29 08:54:00 UTC

If you describe the problem "Cannot find the glyph <foo>" rather than demanding a (far-fetched) solution, this request would be a duplicate of bug 114721.

Comment 7 Eyal Rozenberg 2025-09-29 08:56:37 UTC

(In reply to Heiko Tietze from comment #6)
> If you describe the problem "Cannot find the glyph <foo>" rather than
> demanding a (far-fetched) solution, this request would be a duplicate of bug
> 114721.

That's a very creative idea for a dupe :-P

However, I am fully seized of your line of thought, and encourage you to open a meta-bug for improving discoverability of characters on the SCD. If you do I'll definitely mark both of the bugs as blockers... and I'm only saying this with half-tongue-in-cheek.

Comment 8 Eyal Rozenberg 2025-09-29 10:58:49 UTC

(In reply to V Stuart Foote from comment #1)
> Seems a dupe of bug 112267

This is not about translation and localization. Let's forget about all other languages (for the moment) and just focus on English. Or even just en_GB or en_US if you like.

> and I remain opposed to attempting any alias annotation
> or translation/localization of Unicode glyph names for use in the
> SCD UI.
>
> Proper charmap representation in our SCD is the key to working with Unicode,

It's not _the_ key. It's _a_ key. Searching by character name is a perfectly valid thing to do.

> Translation, rather than alias annotation

Those are orthogonal.

Comment 9 V Stuart Foote 2025-09-29 13:15:34 UTC

Sorry, but while *Translation* of the standard Unicode glyph names might be in scope of our l10n efforts, any effort to devise a viable range of descriptive aliases (in English) for some subset of Unicode would be unsupportable and of limited use as its scope would be so constrained.

Even in the dedicated Unicode manipulation utility BabelMap, authored and maintained by Andrew West, who statically annotates some percentage of the Unicode Names with additional descriptive notes, the non-Unicode annotation is not included in a Name search. Very hard to justify why our charmap implementation would need to descriptively annotate and provide search thereof. To be followed by demands to translate.

Just _No_, instead focus on the l10n effort of bug 112267 against the standard Unicode Name framework.

IMHO => WF

Comment 10 Eyal Rozenberg 2025-09-29 18:56:48 UTC

(In reply to V Stuart Foote from comment #9)
> Sorry, but while *Translation* of the standard Unicode glyph names might be
> in scope of our l10n efforts, any effort to devise

We shouldn't devise this ourselves of course.

> a viable range of
> descriptive aliases (in English) for some subset of Unicode would be
> unsupportable

This is Unicode characters, not our UI. Our UI changes. Assigned Unicode code points basically don't change.

> and of limited use as its scope would be so constrained.

The use is that you can search for characters by their name. Which right now, you can't really, i.e. you can only use the official version of their name, which you don't know to begin with.