Bug 151215 - Let me choose different fonts for different languages in the same group
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version: Inherited From OOo (earliest affected)
Hardware: All All
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Languages
Reported: 2022-09-28 14:29 UTC by Eyal Rozenberg
Modified: 2023-05-08 09:48 UTC

See Also:
Crash report or crash signature:


Description Eyal Rozenberg 2022-09-28 14:29:01 UTC
I'm writing a document with text in both English and Russian (or perhaps Hebrew and Arabic) - two languages from the same language group.

I want to be able to set my paragraph styles to have a different choice of font for each of these two languages - to use in the same paragraph.

LibreOffice should let me do that, and of course persist my choice to an ODF file.
Comment 1 Eyal Rozenberg 2022-09-28 21:12:51 UTC
While this may seem like a UI/UX issue - that's only the tip of the iceberg.

The iceberg is the proper support for languages in LibreOffice. And before the whole iceberg is tackled, it's premature to consider the UI implications IMHO.
Comment 2 Telesto 2022-09-29 07:25:07 UTC
I use a broad interpretation of User Experience, one that includes lacking functionality/features, which can also give a bad user experience...

And, well, there are only so many people who are able to give feedback on this request: the UX team; Regina for the ODF part; Mike for styles.
Comment 3 Mike Kaganski 2022-09-29 08:25:39 UTC
Having a per-lang font assignment would likely be the most flexible and natural solution, indeed together with a usable UI where some defaults would allow one to avoid assigning fonts to each of 5000+ human languages ;-D

+1 from me - but this indeed needs a new format feature.
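
A minimal sketch of what such a per-language assignment with defaults could look like, assuming languages are identified by BCP 47 tags. The table contents, font choices and the name `font_for_language` are hypothetical and only illustrate the idea; nothing here reflects an existing LibreOffice or ODF feature.

```python
from typing import Optional

# Hypothetical document-wide default, used for any language without an entry.
DEFAULT_FONT = "Liberation Sans"

# Hypothetical per-language overrides, keyed by BCP 47 language tag.
FONT_BY_LANGUAGE = {
    "ru": "PT Serif",
    "he": "David Libre",
}


def font_for_language(tag: Optional[str]) -> str:
    """Return the font for a language tag.

    Tries the full tag first, then progressively shorter prefixes
    (e.g. "ru-RU" -> "ru"), and finally the document-wide default.
    """
    if tag:
        parts = tag.split("-")
        while parts:
            candidate = "-".join(parts)
            if candidate in FONT_BY_LANGUAGE:
                return FONT_BY_LANGUAGE[candidate]
            parts.pop()
    return DEFAULT_FONT
```

With a single default plus a handful of explicit entries, only the languages actually used in a document need an assignment.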
Comment 4 Eyal Rozenberg 2022-09-29 09:25:05 UTC
(In reply to Mike Kaganski from comment #3)

Some more challenges would be:

* Which font(s) should be used for neutral characters?
* What happens when the sets of glyphs/characters for two languages are neither disjoint nor a subset+superset? i.e. when they just partially overlap?
Comment 5 Mike Kaganski 2022-09-29 09:30:10 UTC
(In reply to Eyal Rozenberg from comment #4)


IMO, these are exactly what the proposal should solve: attaching a font to a language must mean that when you mark any part of the text as having this language, this font is used.
Comment 6 Regina Henschel 2022-10-02 19:49:36 UTC
Setting a language to a portion of text is only possible via a character style or as default character style from a paragraph style. And the character style has on the same tab the possibility to set the font. That font is then used with this language. So the problem is not clear to me.
Comment 7 Eyal Rozenberg 2022-10-03 03:51:56 UTC
(In reply to Regina Henschel from comment #6)
> Setting a language to a portion of text is only possible via a character
> style or as default character style from a paragraph style. And the
> character style has on the same tab the possibility to set the font. That
> font is then used with this language. So the problem is not clear to me.


This is inappropriate. Following our discussion a few days back, I've filed a separate bug about it: Bug 151290.
Comment 8 Mike Kaganski 2022-10-03 05:53:03 UTC
(In reply to Regina Henschel from comment #6)

Please check Bug 148257 (in See Also). Basically: you can't assign a complex language to e.g. a random space; and the character formatting assigned to the text run will use magic to decide which of the *three* fonts (Western/Complex/Asian) to use on a given character.

By the way, does the "only possible via a character style" include autostyles (direct formatting), which is definitely an option?
Comment 9 Mike Kaganski 2022-10-03 06:04:52 UTC
But in the frame of this specific proposal, the issue is that you have to have several runs of text having different font assignments *in addition to* having different language assignments: no automatic mapping between a language and a font is possible.

By the way, a character style (autostyle) does not have to specify a language, so a language-neutral style is perfectly possible. OTOH, the current standard does not allow assigning only *one* language to any run: using fallback, any text run has three languages at once, and the mentioned magic defines the group of properties to apply to a given character in the run.
Comment 10 Heiko Tietze 2022-10-07 10:16:56 UTC
UI-wise we could use a tree and have something like:

+ Western: [ Liberation Sans ]
+ Complex: [ Noto Kufi       ]
+ Asian:   [ Noto Asia       ]

and when you expand the section, the various languages are listed, taking the default from its parent

+ Western: [ Liberation Sans ]
- Complex: [ Noto Kufi       ]
  Arab     [ Noto Kufi       ] 
  Armenia  [ Serif Armenia   ] 
  Hebrew   [ Noto Kufi       ]
  ...

OTOH, we apply this font to the Default PS and I wonder how two languages could be mixed in a document without using different PS/CS. And if you do so, changing the font is simple.
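
A minimal sketch of the inheritance in the tree mockup above, assuming each language belongs to exactly one group and takes the group's font unless it has its own entry. The entries are copied from the mockup; the function name `resolve_font` and the choice of "Western" as the group for unlisted languages are assumptions made for illustration.

```python
# Group-level fonts, as in the collapsed tree of the mockup.
GROUP_FONT = {
    "Western": "Liberation Sans",
    "Complex": "Noto Kufi",
    "Asian": "Noto Asia",
}

# Which group each language sits under (only the entries shown in the mockup).
LANGUAGE_GROUP = {
    "Arab": "Complex",
    "Armenia": "Complex",
    "Hebrew": "Complex",
}

# Languages whose font differs from the parent group's default.
LANGUAGE_FONT = {
    "Armenia": "Serif Armenia",
}


def resolve_font(language: str) -> str:
    """Return the language's own font, else the font of its parent group."""
    if language in LANGUAGE_FONT:
        return LANGUAGE_FONT[language]
    # Assumption: languages not listed anywhere fall under "Western".
    group = LANGUAGE_GROUP.get(language, "Western")
    return GROUP_FONT[group]
```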
Comment 11 Mike Kaganski 2022-10-07 11:29:05 UTC
(In reply to Heiko Tietze from comment #10)

IMO, having language groups is a wrong concept. What does any such group really mean? Some artificial separation, reflecting some history of computers, nothing more. The languages in the "groups" are so diverse that the grouping simply doesn't make sense. If at all, just one default + specific languages with their specific assignments look much more reasonable, and I don't expect to see very many real-life documents that would require 50+ such entries in a single document, so the collapsed tree would largely be ergonomically bad (any real use would start with expansion of 1-2 collapsed elements that would not take much space anyway).

> I wonder how two languages could be mixed in a document without using different PS/CS.

Let us consider concept and implementation separately.

Personally I have no problem *storing* language as an autostyle (=DF). Similar to the RCIDs used for unique numbering for document comparison.

And similar to those RCIDs, I would love to *not* have the language in any kind of style UI, because - again, I completely agree, that language of a text is conceptually *not* a formatting, it is data (something that converts a sequence of meaningless characters into a word).

By the way, thank you Eyal for wording this proposal and thinking deeply about it: I kind of felt that there's something unnatural with it, and used to promote use of DF (through "use system input language" feature) over styles for this (unlike some advanced/pro users who advocate for styles only - possibly because only Qt allows using that system input language on Linux), but I never realized it that deeply.

> And if you do so, changing the font is simple.

Let me describe a *good* workflow.

1. User defines the language->font mapping table.
2. User types text, and every time they switch keyboard layout, LibreOffice knows which language is used. The user doesn't use *any* specific means to tell LibreOffice either the language or the font. E.g., I have "ENG/US" right now in my taskbar; I type these words, and Writer marks them English (US) without my intervention - and if this proposal is implemented, it also uses the proper font automatically. Then I press Shift+Alt (as I do hundreds of times a day, switching keyboard layout), have "RUS" on the taskbar, and type что-то по-русски - and Writer knows from the OS, that it was a Russian piece of text (and applies the respective font).
3. I decide that I needed to mark some of my commas to be part of Russian - well, because I likely changed the keyboard layout too early / too late (and now it looks a bit inconsistent, and - what's worse - is semantically wrong). I mark the non-character symbols, apply the language (not style!), as I would do today (say, in the status bar), and the font applies accordingly. (And that would also work if I marked them with e.g. Hebrew - so resolving bug 148257).
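
The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the event list stands in for what the OS reports, and `Run`, `build_runs`, `set_language` and the font table are hypothetical names, not Writer internals.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Step 1: the user-defined language -> font table (hypothetical values).
FONT_BY_LANGUAGE = {"en-US": "Liberation Serif", "ru-RU": "PT Serif"}
DEFAULT_FONT = "Liberation Serif"


@dataclass
class Run:
    text: str
    language: str
    font: str


def font_for(language: str) -> str:
    return FONT_BY_LANGUAGE.get(language, DEFAULT_FONT)


def build_runs(events: List[Tuple[str, str]]) -> List[Run]:
    """Step 2: turn ("layout", tag) / ("type", text) events into tagged runs."""
    runs: List[Run] = []
    current: Optional[str] = None
    for kind, value in events:
        if kind == "layout":
            current = value  # the OS reports a keyboard layout switch
        elif kind == "type":
            if current is None:
                raise ValueError("typed text before any layout was reported")
            if runs and runs[-1].language == current:
                runs[-1].text += value  # same language: extend the last run
            else:
                runs.append(Run(value, current, font_for(current)))
        else:
            raise ValueError(f"unknown event kind: {kind}")
    return runs


def set_language(run: Run, language: str) -> Run:
    """Step 3: re-mark a run with another language; the font follows."""
    return Run(run.text, language, font_for(language))
```

For example, re-marking a stray comma typed under the English layout as Russian is a single `set_language` call, with no separate font change.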

Please remember: even if you are a Linux user, for whom the discussed automatic language application is not available, or if you use Roman scripts only and so don't use different keyboard layouts, that is only a tiny part of the real use of the software. Windows users (for whom the feature is available) are ~90% of the user base, and most of the world uses non-Roman scripts at least to some extent.

So please consider all this from this point of view. Maybe you should also try to configure and use the feature at least for some time to get used to it, and feel what I'm talking about ;)
Comment 12 Eyal Rozenberg 2022-10-07 11:43:22 UTC
(In reply to Heiko Tietze from comment #10)
Actually, once separate languages are supported, it may no longer make sense to even expose the "language groups". What use are they to the user? It's not typical for Arabic fonts to have Hebrew glyphs, and vice versa; nor is it typical IIANM for Japanese fonts to have Korean glyphs.

... the only exception I can think of is fonts which actually cover most/all languages in a language group. But even then - the grouping would more likely be by written script, i.e. all Latin-alphabet languages but not Cyrillic or Greek; or all Arabic-script languages, including Urdu and Farsi, but not Hebrew or Adlam.


Before thinking about the UI itself, let's think about what the user needs to be able to do using the dialog.

That seems to be:

* Setting the fallback mechanism for when the actually-desired fonts don't have the glyphs you want.
* Making per-language font choices for some specific languages.
* Choosing what happens to neutral characters (i.e. which language's font they adhere to)
* Understanding which additional languages would be covered "for free" by a font (e.g. if I want to occasionally use a French word with accents in my English text - can I?)
* Controlling overlaps between fonts beyond the perfectly neutral characters - what gets preferred?

That's a "maximum" list, assuming I haven't forgotten anything.


To stress the point: I suggest we first agree on what needs to be doable using the dialog, and then proceed to think about the UI for it.
Comment 13 Eyal Rozenberg 2022-10-07 11:50:32 UTC
(In reply to Eyal Rozenberg from comment #12)
Oh, I just noticed I was overly focused on the font selection dialog. We also have the drop-down box on the toolbar or in the side-bar; and the effect of a keyboard language switch, which Mike was just discussing. The function of each of these three is somewhat different.
Comment 14 Mike Kaganski 2022-10-07 12:19:55 UTC
(In reply to Eyal Rozenberg from comment #12)
> * Choosing what happens to neutral characters (i.e. which language's font
> they adhere to)

Disagree with some specific part of this wording. Specifically: I totally agree that it would be useful to define a font used for "neutral" characters ... but I believe that it should not be "bind them to font of language X", but rather an own entry. Additionally, in this case it looks like we have an xor case: *either* we map all neutral characters to some font, *or* honor the language of the text run to decide the font. Without the latter, we would not be able to resolve bug 148257; but if we allow the latter, a normal workflow on Windows (naturally marking every piece of text with current system input language) would already apply a language, so neutral characters would have to obey ...

> * Understanding which additional languages would be covered "for free" by a
> font (e.g. if I want to occasionally use a French word with accents in my
> English text - can I?)

I don't see how would that be reasonable part of e.g. Writer - we are not a dedicated tool for font management. If at all, I believe that Special Character dialog (which needs much of love anyway) could be used for something like that. However, if you have an appealing UI mockup that makes it natural in a place like the discussed configuration - why not.

> * Controlling overlaps between fonts beyond the perfectly neutral characters
> - what gets preferred?

This needs some elaboration.
Comment 15 Mike Kaganski 2022-10-07 12:22:45 UTC
(In reply to Mike Kaganski from comment #14)
> Disagree with some specific part of this wording. Specifically: I totally
> agree that it would be useful to define a font used for "neutral" characters
> ... but I believe that it should not be "bind them to font of language X",
> but rather an own entry. Additionally, in this case it looks like we have an
> xor case: *either* we map all neutral characters to some font, *or* honor
> the language of the text run to decide the font.

... and maybe we *only* need a checkbox like "always use *Default* (language-independent) font for neutral characters, irrespective of the language". Why would Default differ from "Neutral" case? In an extreme case, we could define specific fonts for every language we use, and then the default (fallback) would only be used for the neutral, right?
Comment 16 Eyal Rozenberg 2022-10-19 21:02:09 UTC
(In reply to Mike Kaganski from comment #14)
> (In reply to Eyal Rozenberg from comment #12)
> > * Choosing what happens to neutral characters (i.e. which language's font
> > they adhere to)
> 
> Disagree with some specific part of this wording. Specifically: I totally
> agree that it would be useful to define a font used for "neutral" characters
> ... but I believe that it should not be "bind them to font of language X",
> but rather an own entry.

I don't have a strong opinion on this, I (think I) am ok with your approach.

> Additionally, in this case it looks like we have an
> xor case: *either* we map all neutral characters to some font, *or* honor
> the language of the text run to decide the font. Without the latter, we
> would not be able to resolve bug 148257; but if we allow the latter, a
> normal workflow on Windows (naturally marking every piece of text with
> current system input language) would already apply a language, so neutral
> characters would have to obey ...

That's a good point; but - let's suppose that we keep track of the language the user is typing text in, and save that information in the document. That now creates 3 basic options for treating neutrals:

1. Honor what the document says explicitly about the language of the neutrals (using spans, or whatever's in the ODF spec).
2. Apply whatever standard heuristic we have for recognizing language runs in text without language indicated specifically, and assign neutrals according to the language they are determined to be in.
3. Render neutrals as some global override language.

and perhaps a combination of these three, e.g. use (1.) when possible and (3.) as a fallback.

Also (2.) may not actually be that useful in some cases, e.g. "שלום,hello" - is the comma English, or Hebrew? Or even: "Hello,bonjour" - comma in English or French? And finally, "grand,brand" - is this English all the way, or maybe French and English, or even French and Dutch, in which "brand" means fire? ... and then there's no way to make a decent call on which language to choose for the comma.
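
A minimal sketch of combining the three options, assuming each of them either yields a language tag or nothing. The function name `neutral_language` and its parameters are hypothetical; the default order mirrors the "(1.) when possible and (3.) as a fallback" combination mentioned above.

```python
from typing import Optional, Sequence


def neutral_language(
    explicit: Optional[str],
    heuristic: Optional[str],
    override: Optional[str],
    order: Sequence[str] = ("explicit", "override"),
) -> Optional[str]:
    """Pick the language of a neutral character.

    explicit  - what the document says about this character, option (1.)
    heuristic - what a run-detection heuristic guessed, option (2.)
    override  - a global language for all neutrals, option (3.)
    order     - which options to consult, first match wins
    """
    candidates = {
        "explicit": explicit,
        "heuristic": heuristic,
        "override": override,
    }
    for name in order:
        if candidates[name] is not None:
            return candidates[name]
    return None  # nothing applicable: the caller decides what to do
```

For the comma in "שלום,hello", a heuristic has no sound basis for a choice, so leaving "heuristic" out of `order` and relying on explicit marking plus an override avoids guessing.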


> > * Understanding which additional languages would be covered "for free" by a
> > font (e.g. if I want to occasionally use a French word with accents in my
> > English text - can I?)
> 
> I don't see how would that be reasonable part of e.g. Writer - we are not a
> dedicated tool for font management. If at all, I believe that Special
> Character dialog (which needs much of love anyway) could be used for
> something like that. However, if you have an appealing UI mockup that makes
> it natural in a place like the discussed configuration - why not.

I was trying to describe a maximal set of potential functionality. We could decide this is too much.

> > * Controlling overlaps between fonts beyond the perfectly neutral characters
> > - what gets preferred?
> 
> This needs some elaboration.

Suppose you want font F1 for language L1 and font F2 for language L2 - but that L1 and L2 have some non-neutral glyphs _in common_ and other glyphs which aren't common. Like Arabic and Farsi I guess. Now, which font should we use for a single glyph in the intersection of L1 and L2? ... if it's something we're typing, I guess you could say "use the locale's language" - but what if you've entered it as a special character? Or if you've pasted plain text? Or if you've used a Unicode hex value to enter a character that's not in the language's set of glyphs?

I realize I'm getting into corner cases here, but again, I've tried to capture the maximal set of potential functionality, and we could decide to ignore this, or force a default, or have another mechanism for handling it (like UI and an UNO command for setting the language explicitly, à la bug 148257). If we had that, the font selection dialog could do a little less and corrections could use the language forcing mechanism. Or not.
Comment 17 Eyal Rozenberg 2022-10-19 21:35:34 UTC
Speaking of setting the language using the keyboard layout... I was reminded of bug 113298.
Comment 18 Heiko Tietze 2022-11-03 08:22:22 UTC
Plenty of input from UX, happy to help when implementation has further questions.
Comment 19 Mike Kaganski 2023-02-23 12:35:40 UTC
I happened to come across the MS theme implementation in OOXML.

The "Font Scheme" description provides two "Font Collections" (major and minor, used for headings and body text, resp.).

ECMA-376 Part 1 (2016), L.4.3.2.5 Major and Minor Font (Font Collection):

> A font collection consists of a font definition for Latin, East Asian, and
> complex script. On top of these three definitions, one can also define a font
> for use in a specific language or languages.

An example of such OOXML markup, taken from a random DOCX (word/theme/theme1.xml), looks like this:

            <a:minorFont>
                <a:latin typeface="Calibri" panose="020F0502020204030204"/>
                <a:ea typeface=""/>
                <a:cs typeface=""/>
                <a:font script="Jpan" typeface="游明朝"/>
                <a:font script="Hang" typeface="맑은 고딕"/>
                <a:font script="Hans" typeface="等线"/>
                <a:font script="Hant" typeface="新細明體"/>
                <a:font script="Arab" typeface="Arial"/>
                <a:font script="Hebr" typeface="Arial"/>
                <a:font script="Thai" typeface="Cordia New"/>
                <a:font script="Ethi" typeface="Nyala"/>
                <a:font script="Beng" typeface="Vrinda"/>
                <a:font script="Gujr" typeface="Shruti"/>
                <a:font script="Khmr" typeface="DaunPenh"/>
                <a:font script="Knda" typeface="Tunga"/>
                <a:font script="Guru" typeface="Raavi"/>
                <a:font script="Cans" typeface="Euphemia"/>
                <a:font script="Cher" typeface="Plantagenet Cherokee"/>
                <a:font script="Yiii" typeface="Microsoft Yi Baiti"/>
                <a:font script="Tibt" typeface="Microsoft Himalaya"/>
                <a:font script="Thaa" typeface="MV Boli"/>
                <a:font script="Deva" typeface="Mangal"/>
                <a:font script="Telu" typeface="Gautami"/>
                <a:font script="Taml" typeface="Latha"/>
                <a:font script="Syrc" typeface="Estrangelo Edessa"/>
                <a:font script="Orya" typeface="Kalinga"/>
                <a:font script="Mlym" typeface="Kartika"/>
                <a:font script="Laoo" typeface="DokChampa"/>
                <a:font script="Sinh" typeface="Iskoola Pota"/>
                <a:font script="Mong" typeface="Mongolian Baiti"/>
                <a:font script="Viet" typeface="Arial"/>
                <a:font script="Uigh" typeface="Microsoft Uighur"/>
                <a:font script="Geor" typeface="Sylfaen"/>
                <a:font script="Armn" typeface="Arial"/>
                <a:font script="Bugi" typeface="Leelawadee UI"/>
                <a:font script="Bopo" typeface="Microsoft JhengHei"/>
                <a:font script="Java" typeface="Javanese Text"/>
                <a:font script="Lisu" typeface="Segoe UI"/>
                <a:font script="Mymr" typeface="Myanmar Text"/>
                <a:font script="Nkoo" typeface="Ebrima"/>
                <a:font script="Olck" typeface="Nirmala UI"/>
                <a:font script="Osma" typeface="Ebrima"/>
                <a:font script="Phag" typeface="Phagspa"/>
                <a:font script="Syrn" typeface="Estrangelo Edessa"/>
                <a:font script="Syrj" typeface="Estrangelo Edessa"/>
                <a:font script="Syre" typeface="Estrangelo Edessa"/>
                <a:font script="Sora" typeface="Nirmala UI"/>
                <a:font script="Tale" typeface="Microsoft Tai Le"/>
                <a:font script="Talu" typeface="Microsoft New Tai Lue"/>
                <a:font script="Tfng" typeface="Ebrima"/>
            </a:minorFont>

Look how MS already realized that the font should be bound to a *language*, and the three "groups" are just a compatibility artifact.
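
A minimal sketch of reading such a font collection into a lookup table, using Python's standard xml.etree.ElementTree. The sample is a shortened copy of the markup above; falling back to the `a:latin` typeface for a script without an entry is a simplification assumed here for illustration, not a description of how Word actually resolves fonts. Note that the `script` attribute holds four-letter script codes (e.g. "Hebr", "Thai").

```python
import xml.etree.ElementTree as ET

# DrawingML main namespace, bound to the "a:" prefix in theme1.xml.
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"

# Shortened copy of the a:minorFont element quoted above.
SAMPLE = f"""<a:minorFont xmlns:a="{A_NS}">
  <a:latin typeface="Calibri" panose="020F0502020204030204"/>
  <a:ea typeface=""/>
  <a:cs typeface=""/>
  <a:font script="Hebr" typeface="Arial"/>
  <a:font script="Thai" typeface="Cordia New"/>
</a:minorFont>"""


def fonts_by_script(xml_text: str) -> dict:
    """Map each script code of the a:font children to its typeface."""
    root = ET.fromstring(xml_text)
    return {
        font.get("script"): font.get("typeface")
        for font in root.findall(f"{{{A_NS}}}font")
    }


def font_for_script(xml_text: str, script: str) -> str:
    """Return the typeface for a script, else the a:latin typeface.

    The fallback to a:latin is an assumption made for this sketch.
    """
    mapping = fonts_by_script(xml_text)
    if script in mapping:
        return mapping[script]
    root = ET.fromstring(xml_text)
    return root.find(f"{{{A_NS}}}latin").get("typeface")
```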