Bug 160249 - Support marking text as having an arbitrary language
Summary: Support marking text as having an arbitrary language
Status: UNCONFIRMED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version (earliest affected): Inherited From OOo
Hardware: All All
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 151290
Reported: 2024-03-17 21:30 UTC by Eyal Rozenberg
Modified: 2024-04-12 10:38 UTC
3 users (show)

See Also:
Crash report or crash signature:


Description Eyal Rozenberg 2024-03-17 21:30:14 UTC
There are lots of artificial domain-specific languages - for programming, for specifying constraints, etc. People sometimes even invent their own toy languages. Once it becomes possible to specify the language of a stretch of text (bug 151290), we should make it possible to specify any language the user wishes to name - i.e. specifying an arbitrary language whose name is provided by the user.
Comment 1 Eyal Rozenberg 2024-03-17 21:31:48 UTC
UI-wise, we should probably not just let the user type a language in whatever combo-box we provide them with, as they are likely to mistype a common language name and get an arbitrary language instead. We should probably offer adding a new language to the set of recognized ones.
Comment 2 Stéphane Guillou (stragu) 2024-04-03 04:54:50 UTC
UX/Design team, what do you think?
We already have "no language", and bug 160256 asks for the "undetermined" code.

Seems to me like a very niche use case. The programming language example speaks to me the most: it could be the missing building block that allows extensions to deal with "spellchecking" and syntax highlighting on code snippets.
Comment 3 Heiko Tietze 2024-04-03 08:00:52 UTC
It's not a label that you assign to some text but an identifier for a bunch of other tools, such as the spellchecker, hyphenation, text completion/replacement, literal number formats, etc. Offering customization for all of these goes far beyond what is usable or needed.

Ultimately there is no use case that you solve. We aim to cover all ISO 639 languages, and if you introduce a new one please bring it up to this committee first :-).
Comment 4 Eyal Rozenberg 2024-04-03 08:15:38 UTC Comment hidden (obsolete)
Comment 5 Eike Rathke 2024-04-03 13:42:24 UTC
For 'und' see https://bugs.documentfoundation.org/show_bug.cgi?id=160256#c3 which I just added.
For artificial languages see https://wiki.documentfoundation.org/ReleaseNotes/7.5#Language_tags

That way (just entering the language tag in the combo box field or defining it per autocorr/acor_*.dat, or spell-checkers defining language tags) _any_ valid language tag can be used. There is no need to further clutter the language listbox with predefined values almost nobody uses, unless locale data exists.
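
To illustrate what a syntactically well-formed language tag looks like, here is a minimal Python sketch. It is a simplified approximation of the BCP 47 (RFC 5646) grammar, not LibreOffice's actual check: the function name `is_well_formed` is hypothetical, grandfathered tags are ignored, and no lookup against the subtag registry is done.

```python
import re

# Simplified BCP 47 well-formedness pattern (assumption: grandfathered tags
# and registry validation are deliberately left out of this sketch).
_LANGTAG = re.compile(
    r"""
    (?:
        (?:[a-z]{2,3}(?:-[a-z]{3}){0,3}|[a-z]{4,8})  # language, optional extlang
        (?:-[a-z]{4})?                               # script, e.g. Latn
        (?:-(?:[a-z]{2}|[0-9]{3}))?                  # region, e.g. US or 419
        (?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*     # variants
        (?:-[0-9a-wy-z](?:-[a-z0-9]{2,8})+)*         # extensions
        (?:-x(?:-[a-z0-9]{1,8})+)?                   # trailing private use
    |
        x(?:-[a-z0-9]{1,8})+                         # private-use-only tag
    )
    """,
    re.VERBOSE | re.IGNORECASE,
)


def is_well_formed(tag: str) -> bool:
    """Return True if the whole string matches the simplified grammar."""
    return _LANGTAG.fullmatch(tag) is not None


if __name__ == "__main__":
    for candidate in ("en-US", "qaa-x-toylang", "x-mytoy", "en_US", "my toy language"):
        print(candidate, is_well_formed(candidate))
```

Note that well-formed is weaker than registered: a string such as "English" passes this syntax check because 5 to 8 letter primary subtags are allowed by the grammar, even though no such subtag is registered.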
Comment 6 Eyal Rozenberg 2024-04-03 20:21:15 UTC
(In reply to Eyal Rozenberg from comment #4)

Sorry, I messed up that reply by confusing arbitrary and undetermined. Let me take that back and reply again.

(In reply to Stéphane Guillou (stragu) from comment #2)
> Seems to me like very niche use case.

It is somewhat of a niche case, but then - there are quite a lot of highly obscure, probably-not-spoken languages that we support. I mean, this would have wildly more usage than runic ancient Hungarian...

> We already have "no language"

Do we have "no language"? Where?

And - it's important to have "arbitrary"; but I agree it's not as important as having "no language" or "undetermined language".

> It's not a label that you assign to some text 

Why not? And remember, we're mostly focused on a post-151290 situation.

> Ultimately there is no use case that you solve.

Like with any language code, the use case is telling the app/the reader what is known and not known about the language of a piece of text. That's both a use case for automated tools and for manual editing.
Comment 7 Stéphane Guillou (stragu) 2024-04-04 02:20:04 UTC
(In reply to Eyal Rozenberg from comment #6)
> (In reply to Stéphane Guillou (stragu) from comment #2)
> > We already have "no language"
> 
> Do we have "no language"? Where?
Isn't that what "[None]" is?
Comment 8 Eyal Rozenberg 2024-04-04 07:32:08 UTC
(In reply to Stéphane Guillou (stragu) from comment #7)
> Isn't that what "[None]" is?

Now that you mention it, I found it... but I did not think of typing "[", I only looked for "None" or "No language". It's true that it's at the top of the list, but it's a long list and my default position in it is somewhere in the middle.
Comment 9 Eike Rathke 2024-04-11 14:58:37 UTC
Again, one can enter any arbitrary syntactically valid BCP 47 language tag in the language list combobox. Whether or not that stays tied to the font attributes (bug 151290), and even if it is moved elsewhere, the functionality will persist.
So what is this bug even about?
Comment 10 Eyal Rozenberg 2024-04-11 17:06:37 UTC
(In reply to Eike Rathke from comment #9)

1. This is not a bug, it's an enhancement request.

2. I don't know what a BCP 47 language tag is; the ask here is to be able to enter any language name, not just items available on the list.
Comment 11 Eike Rathke 2024-04-12 10:38:47 UTC
If you don't know about language tags, the page at https://wiki.documentfoundation.org/ReleaseNotes/7.5#Language_tags that I gave in comment 5 also has two links to follow to learn about them. Once you know how languages can be tagged, you'll probably agree that taking an arbitrary language name, all the more so one where people invented "their own toy languages", and converting it to a specific language tag would be next to impossible.
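
To make the difficulty concrete, here is a small Python sketch of the naive approach: squeezing a free-form name into a BCP 47 private-use tag (the `x-...` form, whose subtags are limited to 8 alphanumeric characters each). The helper `private_use_tag` is purely hypothetical and is not something LibreOffice does; it only shows that such a mapping is arbitrary.

```python
import re


def private_use_tag(name: str) -> str:
    """Hypothetical helper: derive a private-use tag from a free-form name."""
    # Keep only ASCII letters and digits, lowercased.
    slug = re.sub(r"[^a-z0-9]", "", name.lower())
    if not slug:
        raise ValueError("name contains no ASCII letters or digits")
    # Private-use subtags hold at most 8 characters, so split the slug.
    chunks = [slug[i:i + 8] for i in range(0, len(slug), 8)]
    return "x-" + "-".join(chunks)


if __name__ == "__main__":
    print(private_use_tag("My Toy Language"))  # x-mytoylan-guage
    print(private_use_tag("MyToyLang"))        # x-mytoylan-g
```

Two spellings of what a user may consider the same language produce different tags, and neither tag means anything to another document, spell-checker, or installation, so nothing interoperable is gained by the conversion.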