Bug 112267 - Localization of (most used) unicode names
Summary: Localization of (most used) unicode names
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: UI
Version (earliest affected): 6.0.0.0.alpha0+
Hardware: All All
Importance: lowest enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Special-Character Not-Localizable
Reported: 2017-09-07 08:53 UTC by Heiko Tietze
Modified: 2021-06-03 10:36 UTC
CC List: 7 users

See Also:
Crash report or crash signature:


Attachments

Description Heiko Tietze 2017-09-07 08:53:43 UTC
With the new special character dialog it is now possible to search for Unicode names like "cross". This workflow is meant to be the primary way for average users to deal with (infrequently used) special characters. The question is whether it should also be possible to search by localized names such as "Kreuz" in German or "Croix" in French.

Any mapping should keep the original unicode names, for example 'CROSS MARK' for U+274C.

Patch with some discussion https://gerrit.libreoffice.org/#/c/40563/
Unicode table in German https://unicode-table.com/de/
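A minimal sketch of what "keep the original Unicode names" could mean for search, assuming localized names are stored only as extra aliases next to the official name. The `LOCALIZED_ALIASES` table and the `search` function are hypothetical illustrations, not LibreOffice code; Python's `unicodedata.name` stands in for whatever name source the dialog actually uses.

```python
import unicodedata

# Hypothetical alias table: language -> code point -> localized names.
# The official Unicode name is never replaced, only supplemented.
LOCALIZED_ALIASES = {
    "de": {0x274C: ["Kreuz"]},
    "fr": {0x274C: ["Croix"]},
}

def search(query, lang, candidates):
    """Return code points whose official name or localized alias contains query."""
    q = query.casefold()
    aliases = LOCALIZED_ALIASES.get(lang, {})
    hits = []
    for cp in candidates:
        # Empty string for unassigned code points, so they never match.
        official = unicodedata.name(chr(cp), "")
        names = [official] + aliases.get(cp, [])
        if any(q in name.casefold() for name in names):
            hits.append(cp)
    return hits
```

With this layout a German user can find U+274C via `search("kreuz", "de", range(0x2700, 0x2800))`, while `search("cross", "de", ...)` still matches 'CROSS MARK', so a missing translation never hides a character.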
Comment 1 V Stuart Foote 2017-09-07 11:54:12 UTC
_No_, just no! As of Unicode 10.0 there are ~136,000 defined code points, each with a Unicode Consortium approved "name". While name "translation" for each script would be a percentage of the total, it would still be a daunting task. IMHO what is being suggested would sink the l10n team and is ill-advised.

If ICU were to provide such a library, maybe. But they don't. -> WONTFIX
Comment 2 sophie 2017-09-07 12:15:12 UTC
Last time we discussed it the idea was to add a string notifying that the search is only in English. Has it changed? Sophie
Comment 3 Thorsten Behrens (allotropia) 2017-09-07 12:17:11 UTC
I think the idea was to piggy-back on the existing translations (cf. the link from Heiko).
Comment 4 Thomas Lendo 2017-09-07 20:07:35 UTC
I suggest reusing official or similar translations of Unicode characters, as it's logistically not feasible to localize them all ourselves.

As most people in the world (except developers, IT professionals, people from Sweden ;-) etc.) don't know English names for characters, the search restricted to English names is mostly useless.
Comment 5 Adolfo Jayme Barrientos 2017-09-08 04:21:47 UTC
> I suggest to reuse official or similar translations of Unicode characters

There is no such thing. The Unicode Standard is English-only. It used to be released in French as well, but that was in the past.

This is why you should be able to search for characters by their code point hexadecimal number.
Comment 6 sophie 2017-09-08 09:24:56 UTC
The link Heiko posted seems to belong to a company named Unicode-Table, which is not part of Unicode (https://unicode-table.com/en/about/). As Adolfo stated, no translation other than French is available on the Unicode site. Sophie
Comment 7 sophie 2017-09-08 09:55:53 UTC
And I can't find a license on their GitHub repo https://github.com/unicode-table/unicode-table-data/wiki/Index-%28English%29, do you know under what license they release the tables?
Comment 8 Eike Rathke 2017-09-08 10:06:22 UTC
(In reply to sophie from comment #6)
> The link Heiko posted seems to be a company named Unicode-Table, but they
> are not part of Unicode (https://unicode-table.com/en/about/).
And the translations are incomplete, as any translation of Unicode character names probably ever will be. A localized search that finds only the few characters that happen to be translated is useless.
Comment 9 Muhammet Kara 2017-09-08 12:17:26 UTC
Speaking as a translator: I don't think it is possible (for us) to provide translations for all the characters in the Unicode table.

Speaking as a user: The search feature is completely useless for me if I need to know the characters' English names while I am using a localized LO.

So I see only one way to go: enable search for characters by their hexadecimal number along with their English names.

But in case we find a source of translated strings for those characters, that would be fantastic. :)
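Searching by hexadecimal code point alongside the English names needs no translation at all. A minimal sketch, assuming the dialog accepts forms like "U+274C", "0x274C" or a bare "274C"; that input convention and the `search` function are hypothetical, not the dialog's actual behavior.

```python
import re
import unicodedata

# Assumed query syntax: optional "U+" or "0x" prefix, then 1 to 6 hex digits.
_HEX_QUERY = re.compile(r"^(?:[Uu]\+|0[xX])?([0-9A-Fa-f]{1,6})$")

def search(query, candidates):
    """Match by hexadecimal code point first, then by English Unicode name."""
    query = query.strip()
    hits = []
    m = _HEX_QUERY.match(query)
    if m:
        cp = int(m.group(1), 16)
        # Only accept valid, assigned code points (name() returns "" otherwise).
        if cp <= 0x10FFFF and unicodedata.name(chr(cp), ""):
            hits.append(cp)
    q = query.casefold()
    for cp in candidates:
        # Substring match on the English name, skipping the exact hex hit.
        if cp not in hits and q in unicodedata.name(chr(cp), "").casefold():
            hits.append(cp)
    return hits
```

Some words are also valid hex ("face", "add"), so the sketch returns both kinds of match instead of picking one interpretation.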
Comment 10 Thomas Lendo 2017-09-10 06:14:38 UTC
If there is no ready translation available for reuse, I would suggest either a partial localization of the most used characters, like arrow and cross, or a localization with the help of programmers: for example, in http://www.unicode.org/charts/PDF/U0800.pdf only a few words need to be localized (Samaritan, letter, mark, vowel, sign, long, short ...), while the language-specific names (alaf, bit, gaman ...) needn't be localized. I assume such a naming system exists in most code blocks.
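The word-level idea above can be sketched as a small glossary applied to the official names, leaving unknown (script-specific) words untouched. The German glossary entries and the `localize_name` helper are hypothetical examples, not existing translations.

```python
import unicodedata

# Hypothetical German glossary for generic words that recur across names.
# Script-specific letter names (ALAF, BIT, GAMAN, ...) are left as-is.
GLOSSARY_DE = {
    "SAMARITAN": "SAMARITANISCH",
    "LETTER": "BUCHSTABE",
    "MARK": "ZEICHEN",
    "VOWEL": "VOKAL",
    "SIGN": "ZEICHEN",
    "LONG": "LANG",
    "SHORT": "KURZ",
}

def localize_name(cp, glossary):
    """Translate known words of the official name one by one; keep the rest."""
    official = unicodedata.name(chr(cp), "")
    return " ".join(glossary.get(word, word) for word in official.split())
```

For U+0800 this yields "SAMARITANISCH BUCHSTABE ALAF". A glossary of this kind is far smaller than the full list of names, but word-by-word output keeps English word order and grammar, so the result would likely serve only as an extra search key next to the official name.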
Comment 11 Yousuf Philips (jay) (retired) 2018-01-19 15:24:10 UTC
As stated by others, this would be a difficult task for l10n teams to take on with very little benefit, so let's just close it.
Comment 12 Adolfo Jayme Barrientos 2021-06-03 05:01:46 UTC
I learned today that this exists: https://github.com/samhocevar/unicode-translation

If we integrate that project (once it’s more complete) as a dependency, then it would be possible to localize Unicode characters!!
Comment 13 Eike Rathke 2021-06-03 10:36:25 UTC
That's been unmaintained for 5 years.