With the new Special Character dialog it is now possible to search for Unicode names like "cross". This workflow is intended to be the primary method for average users to insert infrequently used special characters. The question is whether it should also be possible to search by localized names such as "Kreuz" in German or "Croix" in French. Any mapping should keep the original Unicode names, for example 'CROSS MARK' for U+274C.

Patch with some discussion: https://gerrit.libreoffice.org/#/c/40563/
Unicode table in German: https://unicode-table.com/de/
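A rough sketch of what such a mapping could look like, keeping the official names as the single source of truth (the container and function names here are hypothetical, not taken from the patch):

#include <map>
#include <set>
#include <string>

// Hypothetical localized index: a lowercase search term maps to the
// canonical (English) Unicode names it refers to.
std::map<std::string, std::set<std::string>> aLocalizedIndex = {
    { "kreuz", { "CROSS MARK", "LATIN CROSS" } },   // de
    { "croix", { "CROSS MARK", "LATIN CROSS" } },   // fr
};

std::set<std::string> lookupCanonicalNames(const std::string& rTerm)
{
    auto it = aLocalizedIndex.find(rTerm);
    return it != aLocalizedIndex.end() ? it->second : std::set<std::string>();
}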
_No_, just no! As of Unicode version 10.0 there are ~136,000 defined code points, each with a Unicode-Consortium-approved "name". While name "translation" for each script would cover only a percentage of the total, it would still be a daunting task. IMHO what is being suggested would sink the l10n team and is ill-advised. If ICU were to provide such a library, maybe. But they don't. -> WF
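For what it's worth, ICU does expose the official (English-only) character names through u_charName() in unicode/uchar.h; what it does not ship is any localized name data. A minimal sketch, just to show where the English names would come from:

#include <unicode/uchar.h>
#include <cstdio>

int main()
{
    char aName[256];
    UErrorCode nStatus = U_ZERO_ERROR;
    // ICU only knows the Unicode-approved English name, e.g. "CROSS MARK".
    int32_t nLen = u_charName(0x274C, U_UNICODE_CHAR_NAME,
                              aName, sizeof(aName), &nStatus);
    if (U_SUCCESS(nStatus) && nLen > 0)
        std::printf("U+274C -> %s\n", aName);
    return 0;
}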
Last time we discussed this, the idea was to add a string notifying the user that the search works only in English. Has that changed? Sophie
I think the idea was to piggy-back on the existing translations (cf. the link from Heiko).
I suggest reusing official or similar translations of Unicode characters, as it's logistically not doable to localize them all by ourselves. As most people in the world (except developers, IT professionals, people from Sweden ;-) etc.) don't know the English names of characters, a search restricted to English names is mostly useless.
> I suggest reusing official or similar translations of Unicode characters

There is no such thing. The Unicode Standard is English-only. It used to be released in French as well, but that is in the past. This is why you should be able to search for characters by their hexadecimal code point number.
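A minimal sketch of what parsing such input could look like (the accepted forms "274C" and "U+274C" are an assumption, not a specification):

#include <cstdint>
#include <string>

// Parse "274C" or "U+274C" into a code point; returns false on invalid input.
bool parseCodePoint(std::string sInput, uint32_t& rCodePoint)
{
    if (sInput.rfind("U+", 0) == 0 || sInput.rfind("u+", 0) == 0)
        sInput = sInput.substr(2);
    if (sInput.empty() || sInput.size() > 6)
        return false;
    rCodePoint = 0;
    for (char c : sInput)
    {
        int nDigit;
        if (c >= '0' && c <= '9')      nDigit = c - '0';
        else if (c >= 'A' && c <= 'F') nDigit = c - 'A' + 10;
        else if (c >= 'a' && c <= 'f') nDigit = c - 'a' + 10;
        else return false;
        rCodePoint = rCodePoint * 16 + nDigit;
    }
    return rCodePoint <= 0x10FFFF;
}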
The link Heiko posted seems to be a company named Unicode-Table, but they are not part of Unicode (https://unicode-table.com/en/about/). As Adolfo stated, there is no translation other than French available on the Unicode site. Sophie
And I can't find a license on their GitHub repo (https://github.com/unicode-table/unicode-table-data/wiki/Index-%28English%29). Do you know under what license they release the tables?
(In reply to sophie from comment #6)
> The link Heiko posted seems to be a company named Unicode-Table, but they
> are not part of Unicode (https://unicode-table.com/en/about/).

And the translations are incomplete, as probably any translation of Unicode character names will ever be. A localized search leading to only the few matching characters that happen to be translated is useless.
Speaking as a translator: I don't think it is possible (for us) to provide translations for all those characters in the Unicode table. Speaking as a user: The search feature is completely useless for me if I need to know the characters' English names while I am using a localized LO. So I see only one way to go: enable search for characters by their hexadecimal number along with their English names. But if we find a source of translated strings for those characters, that would be fantastic. :)
If no ready-made translation is available for reuse, I would suggest either a partial localization of the most-used characters, like arrow and cross, or a localization with the help of programmers: for example, in http://www.unicode.org/charts/PDF/U0800.pdf there are only a few words that need to be localized (Samaritan, letter, mark, vowel, sign, long, short ...); the language-specific names (alaf, bit, gaman ...) needn't be localized. I assume such a naming system exists in most code blocks.
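A sketch of that idea (data and names are made up for illustration): split the official name into words, translate only the generic vocabulary, and pass the language-specific letter names through unchanged.

#include <map>
#include <sstream>
#include <string>

// e.g. with { {"SAMARITAN", "SAMARITANISCHER"}, {"LETTER", "BUCHSTABE"} }:
// "SAMARITAN LETTER ALAF" -> "SAMARITANISCHER BUCHSTABE ALAF"
std::string localizeName(const std::string& rUnicodeName,
                         const std::map<std::string, std::string>& rGenericWords)
{
    std::istringstream aStream(rUnicodeName);
    std::string sWord, sResult;
    while (aStream >> sWord)
    {
        auto it = rGenericWords.find(sWord);
        if (!sResult.empty())
            sResult += ' ';
        sResult += (it != rGenericWords.end()) ? it->second : sWord;
    }
    return sResult;
}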
As stated by others, this would be a difficult task for l10n teams to take on with very little benefit, so let's just close it.
I learned today that this exists: https://github.com/samhocevar/unicode-translation

If we integrate that project (once it's more complete) as a dependency, then it would be possible to localize Unicode character names!!
That has been unmaintained for 5 years.
> The question is whether it should also be possible to search by localized names such as "Kreuz" in German or "Croix" in French.

It should.

English should not have preferential status w.r.t. the labeling of characters. Either we have localized search-by-name, or we do not have search-by-name at all.

... that is, except if this functionality is "outsourced". If we use some system facility / library to perform this search, then I would say that it is NAB, but we should then file relevant bugs with whoever provides us this functionality.

(In reply to V Stuart Foote from comment #1)
> _No_, just no! As of Unicode version 10.0 there are ~136,000 defined code
> points, each with a Unicode-Consortium-approved "name".

If you were to argue in favor of removing all of these names and the search-by-name functionality altogether, then OK. Otherwise, why should it be English? Our users, in the general case, do not know any English. And the more adoption we see in Asia and Africa, the more true this will be.

(In reply to Muhammet Kara from comment #9)
> Speaking as a translator: I don't think it is possible (for us) to provide
> translations for all those characters in the Unicode table.

Please distinguish "possible" from "reasonable to invest effort in". It actually is possible for translation teams to translate the names of all characters; it is just (probably) not worthwhile relative to most localization work.

> But if we find a source of translated strings for those characters,
> that would be fantastic. :)

Such translations should indeed happen, and perhaps have already happened, irrespective of LibreOffice. Unicode is a widely accepted international standard, and countries, or even commercial enterprises whose work involves typesetting, standardization, catalogues of characters etc., may already have performed such translations, or are likely to do so in the future. The resolution of this bug could depend on such efforts, not just on ours.

> So I see only one way to go: enable search for characters by their
> hexadecimal number along with their English names.

If you have the hex value, then it's more like specifying something than searching for it, but regardless, being able to type the hex code in the search would also help. I would file a separate bug about that, though.

(In reply to Eike Rathke from comment #13)
> That has been unmaintained for 5 years.

True, there was just one commit in there. But that one commit covers a very large range of characters for several important languages: Amharic, Kinyarwanda, German, Finnish, French, Irish, Danish, Dutch, Polish, Serbian and Slovak.
(In reply to Eyal Rozenberg from comment #14)
> ...
> If you have the hex value, then it's more like specifying something than
> searching for it, but regardless, being able to type the hex code in the
> search would also help. I would file a separate bug about that, though.
> ...

The SCD already provides lookup/positioning by hex or decimal value. And for 25.2, Mike K. added actual name search by entry of the Unicode point value [1] for bug 111816.

=-ref-=
[1] https://gerrit.libreoffice.org/c/core/+/171458
(In reply to Eyal Rozenberg from comment #14)
> > The question is whether it should also be possible to search by localized names such as "Kreuz" in German or "Croix" in French.
>
> It should.
>
> English should not have preferential status w.r.t. the labeling of characters.
> Either we have localized search-by-name, or we do not have search-by-name at
> all.
>
> ... that is, except if this functionality is "outsourced". If we use some
> system facility / library to perform this search, then I would say that it
> is NAB, but we should then file relevant bugs with whoever provides us this
> functionality.
>
> (In reply to V Stuart Foote from comment #1)
> > _No_, just no! As of Unicode version 10.0 there are ~136,000 defined code
> > points, each with a Unicode-Consortium-approved "name".
>
> If you were to argue in favor of removing all of these names and the
> search-by-name functionality altogether, then OK. Otherwise, why should it
> be English? Our users, in the general case, do not know any English. And the
> more adoption we see in Asia and Africa, the more true this will be.

We have made no claim to provide more than an implementation of the Unicode Standard. The Standard does its naming in English; absent translation of those standard names, we should not search at all? That is not tenable, and rather naive. It is like saying our coding and annotation should be localized. We search/filter against the Standard: by Unicode point value, by name, and by Unicode block grouping.

If l10n translations of the standard Unicode names can be accomplished to some degree of completeness, then it would make sense to support those localized names both in the SCD's UI display and in its search features. Absent the effort requested here, we work against the names provided by the Unicode Standard.