When I write a document, I often use character styles such as "Emphasis", "Internet Link", "Quotation". Naturally, I want to use these styles for text in different languages - and not define separate styles named "Arabic Emphasis", "Hebrew Emphasis", "N'Ko emphasis" etc. However, as Regina Henschel tells me, it is currently the case that the choice of language is a feature of a character style (or at least - the choice of a single language in each language group). That also does not make sense semantically: The languages I use are part of the content, not the style. I can take Hebrew text and change its "style" - but it will not become Arabic text. So, this should change. The language (and the language group) of a stretch of text must be _removed_ from the character style (explicit or default-style in a paragraph style).
I completely agree (modulo the fact that we must stay compatible, and so must support existing documents that use styles precisely for the language definition). Also, the problem of easily marking runs as having a specific language must be solved, including for platforms that do not provide a system input language and for users who do not use that feature. What styles could/should provide is a mapping from a language to a set of formatting, for multiple languages inside a single style - bug 151215. The language applied to a text run shouldn't be formatting itself, and thus having it as part of a style is conceptually wrong.
If you assign the character style "Emphasis" to a portion of text in a paragraph, this generates a <text:span> element (6.1.7) in the file markup. In this <text:span> element you will find the text:style-name="Emphasis" attribute (19.880.33). The style "Emphasis" is a <style:style> element (16.2) in the file markup in styles.xml. This <style:style> element has a style:name="Emphasis" attribute (19.502), which identifies the style, and a style:family="text" attribute (19.480), which determines which properties may be specified in this style. In the case of family "text", the properties are contained as attributes in a <style:text-properties> element (16.29.29). Up to 84 properties exist, but you may use a subset of them. Section 16.2 of the standard specifies how the value of a property is determined when it is not contained in the <style:text-properties> element of the style referenced by the object being styled. [The section numbers refer to ODF 1.3.]

These <text:span> elements may be nested; however, as the file format is XML, the elements cannot overlap. That means that ODF allows several character styles to be applied to the same portion of text. But that is currently not correctly implemented (bug 115311).

If you want your style "Emphasis" not to include the language, then simply do not specify the language in the style. You must not touch the language field in the dialog, otherwise the language is set. If you are unsure whether the language is set, look at the Organizer tab of the style modify dialog. To remove a language setting from the style, you have to use the "Reset to Parent" button on the "Font" page and then set the desired other properties on that page again.

Many of the properties depend on the script type of a character. The script type of a character can be "latin", "asian" or "complex". The Unicode code point determines which of the three script types applies, not the language.
Script-type-dependent properties have three variants, e.g. fo:font-style, style:font-style-asian and style:font-style-complex. Only one of them is active for a given character. So if you set e.g. "italic" for "Western text" and "bold" for "CTL text", the "Emphasis" character style should work for English, Hebrew and Farsi alike. If not, that is a bug.

A language is set by the attributes fo:language and fo:country and their "asian" and "complex" variants. These are attributes of a <style:text-properties> element. This <style:text-properties> element can be a child element of a style of family "text", which corresponds to character styles. It can also be a child element of a style of family "paragraph". So removing the ability to set a language in a character style or a paragraph style is not possible. We can only try to make the UI reflect the relationships more clearly, for example by moving the language settings to their own tab, so that they cannot be changed by accident when working with other settings.
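To make the markup described in the comment above more concrete, here is a minimal Python sketch that parses a hand-written styles fragment, reports which language attributes a character style actually sets, and looks up the variant of a script-type-dependent property. The sample XML, the "HebrewOnly" style and all function names are assumptions made for illustration and are not taken from a real document. Unset properties simply come back as None here; the inheritance rules of section 16.2 are not modelled.

```python
import xml.etree.ElementTree as ET

# ODF namespace URIs for the prefixes used in the comment above.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "fo": "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0",
}

# Hand-written sample (an assumption, not extracted from a real styles.xml).
SAMPLE_STYLES = """\
<office:styles
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
    xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0">
  <style:style style:name="Emphasis" style:family="text">
    <style:text-properties fo:font-style="italic"
                           style:font-weight-complex="bold"/>
  </style:style>
  <style:style style:name="HebrewOnly" style:family="text">
    <style:text-properties style:language-complex="he"
                           style:country-complex="IL"/>
  </style:style>
</office:styles>
"""

# Language-related attributes: the "latin" pair plus asian/complex variants.
LANGUAGE_ATTRS = [
    ("fo", "language"), ("fo", "country"),
    ("style", "language-asian"), ("style", "country-asian"),
    ("style", "language-complex"), ("style", "country-complex"),
]


def qname(prefix, local):
    """Build the '{namespace}local' key that ElementTree uses."""
    return "{%s}%s" % (NS[prefix], local)


def text_properties(root, style_name):
    """Return the attributes of <style:text-properties> for a character style."""
    for style in root.findall("style:style", NS):
        if (style.get(qname("style", "name")) == style_name
                and style.get(qname("style", "family")) == "text"):
            props = style.find("style:text-properties", NS)
            return dict(props.attrib) if props is not None else {}
    raise KeyError(style_name)


def language_settings(root, style_name):
    """List the language attributes that the style sets explicitly."""
    props = text_properties(root, style_name)
    return {
        f"{prefix}:{local}": props[qname(prefix, local)]
        for prefix, local in LANGUAGE_ATTRS
        if qname(prefix, local) in props
    }


def active_property(root, style_name, base, script_type):
    """Pick the variant of a property for 'latin', 'asian' or 'complex'."""
    if script_type == "latin":
        key = qname("fo", base)
    else:
        key = qname("style", f"{base}-{script_type}")
    return text_properties(root, style_name).get(key)


root = ET.fromstring(SAMPLE_STYLES)
print(language_settings(root, "Emphasis"))
print(language_settings(root, "HebrewOnly"))
print(active_property(root, "Emphasis", "font-weight", "complex"))
```

In this sample, "Emphasis" carries formatting for two script types but no language attribute at all, which is the situation the comment recommends for a language-neutral character style.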
(In reply to Regina Henschel from comment #2)

Noting the use of `text:span` I am reminded of HTML span, and HTML in general. In that standard, the language is an attribute separate from the style (e.g. `<p lang="de-DE" style="bunch of CSS here">`).

> In case of family "text", the properties are contained as attributes in a
> <style:text-properties> element (16.29.29).

Yes, I see:
https://docs.oasis-open.org/office/OpenDocument/v1.3/os/part3-schema/OpenDocument-v1.3-os-part3-schema.html#element-style_text-properties

So, fo:country and fo:language should be removed from style:text-properties. And they should be otherwise settable on text:span's, and probably some other text:XXXX elements. And maybe even other elements. And - styles should be able to carry properties for multiple languages, in multiple language-groups.

> If you want, that your style "Emphasis" does not include the language, then
> simply do not specify the language in the style.

But then - how would the Emphasis style use different font properties for Arabic text and for Hebrew text?

Anyway, I believe it should not be _possible_ to specify a language as part of a style; I claim this is a design mistake in the ODF spec.
I like having the ability to set language to None on certain styles. For example, the style I'm using for programming code is set to Language=none because I want it to be exempt from spelling checks. Some styles may have a decorative purpose (e.g. those based on Symbol or Wingdings characters), which again benefit from setting language=none. However, I can't think of a scenario where I would set a specific language on a style. As Eyal Rozenberg points out, language is really part of the content.
(In reply to Panos Stokas from comment #4)
> I like having the ability to set language to None on certain styles.
>
> For example, the style I'm using for programming code is set to
> Language=none because I want it to be exempt from spelling checks.
>
> Some styles may have a decorative purpose (eg. those based on Symbol or
> Wingdings characters) which again benefit from setting language=none.
>
> However I can't think of a scenario where I would set a specific language to
> a style. As Eyal Rozenberg points, language is really part of the content.

You CAN already set the Language to NONE on any style you need.
(In reply to BogdanB from comment #5)
> (In reply to Panos Stokas from comment #4)
> > For example, the style I'm using for programming code is set to
> > Language=none because I want it to be exempt from spelling checks.
> You CAN already to set the Language to NONE to any style you need.

Indeed, and I like that.
I also observed this issue in the context of cells in Calc. I created this forum topic for that: https://ask.libreoffice.org/t/improve-location-of-cell-language-setting-in-calc/102849 Shouldn't the status of this bug be NEW?
Let me bring my 2 cents to this debate.

As mentioned in several comments, _language_ is an inherent property of text. Presently, this can only be set through a character style. But styles in general are tools to **format** text, i.e. change its appearance and flow properties. The language attribute in the Font tab mixes two layers: the abstract semantic layer associated with the meaning of the text and the "graphical" decoration layer. As pointed out in another comment, language tagging should be separate from the formatting layer.

Comment #4 mentions a common usage of the Font language attribute to switch off spellchecking (e.g. for computer code). However, I think this is semantically wrong. Computer code is just another language (_None_ to avoid mistaking it for a human language) and this too is part of the data.

Presently, writing multi-lingual documents is a real pain because it means duplicating styles.

I also don't like the idea of retrieving the current language from the keyboard layout. A keyboard, for me, is a language-neutral device to enter characters. I don't switch layouts for the sake of switching language because my keyboards have a single engraving. I do switch layouts, but only because I configured various layouts for access to infrequent characters, while continuing to type in the same language. The keyboard layout (again, in my workflow) is only a description of the physical keyboard (I have one intl-US in addition to my locale) without any implication about the language I type.

Not using the Font tab language attribute is a way to make styles universal. But this means the language of a sequence is set with direct formatting, which is generally bad because there is no UI for it and no visual feedback.

Auto-detecting the current language based on glyphs seems infeasible to me: too many languages share characters (e.g. all West-European languages share the Latin set, Japanese and Chinese share Kanji, …).

I don't grasp the present notion of "groups".
What is the commonality between Arabic and Hindi in the "Complex" group? Layout rules are dramatically different.

What would make sense is language tagging. This should not be based on glyphs. Many glyphs are "neutral", like punctuation and in some respects "ordinary" digits. Consequently, only the author's markup can eliminate ambiguities.

I acknowledge that the matter is difficult and that compatibility with existing documents must be preserved. The Font tab language setting could be kept for that, but documentation should discourage its use as obsoleted by a new feature (separate from the formatting layer).
(In reply to ajlittoz from comment #8)
> Comment #4 mentions a common usage of the Font language attribute to switch
> off spellchecking (e.g. for computer code). However, I think this is
> semantically wrong. Computer code is just another language (_None_ to avoid
> mistaking it for a human language) and this is too part of the data.

This is a good point, but there are actually three separate issues here:

* Text with no language
* Languages for programming and other specific domains rather than languages developed for general-purpose speech and writing.
* Text in arbitrary languages LibreOffice does not know about a priori.

> I don't like either the idea to retrieve current
> language from keyboard layout.

I don't believe that was suggested in the context of this bug. The effect of the chosen keyboard layout on the entered text's language is an interesting discussion to have, but let's not have it in this bug.

> Auto-detecting current language based on glyph seems to me infeasible:

It's indeed quite infeasible. However, in the context of "filling in" language tagging for a document we obtain with no language tagging - that might be a reasonable "limited-effort" heuristic. At any rate - doing so is also a matter for another, dependent, bug :-)

> I don't grasp the present notion of "groups". What is the commonality
> between Arabic and Hindi in the "Complex" group? Layout rules are
> dramatically different.

Well, there's some similarity in how typesetting is handled: A lot of glyph-joining. OTOH, there are ligatures in Latin/Western languages too... as for "Asian" languages - those are the ideographic ones, so again, similarity in handling. But it's basically historical reasons.

> What would make sense is language tagging. This should not be based on
> glyph. Many glyphs are "neutral", like punctuation and in some aspects
> "ordinary" digits. Consequently, only author's mark up can eliminate
> ambiguities.

Indeed.
We need more of the LO community to realize the significance and necessity of this fundamental change, for it to gather enough momentum to be executed.

> I acknowledge that the matter is difficult and compatibility with existing
> documents must be preserved. Font tab language setting could be kept for
> that but documentation should discourage its use as obsoleted by a new
> feature (separate from the formatting layer).

We will have compatibility considerations for the UI, and compatibility considerations for the document markup, and both must be handled with some care.
*** Bug 160248 has been marked as a duplicate of this bug. ***
Sorry, if my rant is not fit for this bug, but it's really madness how LO handles multilanguage documents on a per-style basis. Who the hell invented this bullshit, and how am I supposed to get a correct spell check with this?

Вслед за Брирли (Brierley, 1937) и Якобcон (Jacobson, 1953), исходившими из психоаналитической клиники, и Арнольдом (Arnold, 1970а, 1970b), Изардом (Izard, 1978), Кнаппом (Knapp, 1978) и Эмде (Emde, 1987; Emde et а1., 1978), исходившими из...

Really people, do something! It's 2024 out there! Now I have to see if I can hack and merge some hunspell dictionaries to make a multilanguage one :(((
(In reply to Konstantin from comment #11)
> Sorry, if my rant is not fit for this bug

Yes. Even ignoring that a rant is off-topic in the bug tracker by definition (it only clutters discussion and makes bugs less maintainable), it's completely off-topic here: I suppose you use Linux or macOS, and your issue is actually that the language is not detected automatically, which is bug 108151 (keyboard layout and its associated input language are used nicely on Windows, and your problem doesn't arise at all for me when I'm on Windows).
There are many characters that are common to multiple script groups. Currently, we assign characters to script groups using a hard-coded algorithm. It's not possible to make a perfect algorithm, so we get it wrong: sometimes in ways we can hack around (e.g. bug 112594), and sometimes in ways we can't (e.g. bug 66791). For legacy/compatibility reasons, many characters that should be treated as common are also hard-coded as Western characters, which forces CJK and CTL language users to care about Western scripts while proofing even though they aren't really using them. Explicit language runs would give users control over the behavior of these ambiguous characters. This is a missing but much-needed escape hatch for users working outside of the Western script group happy path. I am therefore marking this bug new. Using explicit language runs to disambiguate script groups would also be a good MVP for this feature. It would be useful on its own, and per-language style features could build upon those changes.
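To illustrate why such "common" characters are ambiguous and how an explicit language run could act as the tie-breaker, here is a deliberately crude Python sketch. It is not LibreOffice's algorithm: the function names, the name-prefix heuristic, the grouping of Greek and Cyrillic with Western, and the "fall back to Western" default are all assumptions chosen only to mimic the behavior described in the comment above.

```python
import unicodedata


def script_hint(ch):
    """Very rough guess at a character's script group.

    Returns 'western', 'asian', 'complex', or 'common' for characters
    (punctuation, spaces, symbols, decimal digits) that belong to no
    single group. Based on Unicode character names, purely as a demo.
    """
    category = unicodedata.category(ch)
    if category[0] in "PZS" or category == "Nd":
        return "common"
    name = unicodedata.name(ch, "")
    if name.startswith(("LATIN", "GREEK", "CYRILLIC")):
        return "western"
    if name.startswith(("CJK", "HIRAGANA", "KATAKANA", "HANGUL")):
        return "asian"
    if name.startswith(("HEBREW", "ARABIC", "DEVANAGARI", "THAI")):
        return "complex"
    return "common"


def assign_groups(text, explicit_group=None):
    """Assign a script group to every character of a run.

    Common characters take the explicit group of the run if one is
    given (the hypothetical "explicit language run"); otherwise they
    follow the previous unambiguous character, starting from 'western'
    as a stand-in for the legacy default.
    """
    groups = []
    previous = "western"
    for ch in text:
        hint = script_hint(ch)
        if hint == "common":
            hint = explicit_group or previous
        else:
            previous = hint
        groups.append(hint)
    return groups


print(assign_groups("(א)"))
print(assign_groups("(א)", explicit_group="complex"))
```

Without an explicit group, the opening parenthesis in this sample is classified as Western even though the run is Hebrew; with the run explicitly marked, all three characters land in the same group.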
(In reply to Jonathan Clark from comment #13)
> This is a missing but much-needed escape hatch for
> users working outside of the Western script group happy path. I am therefore
> marking this bug new.

I of course agree with the marking, but I think we should frame it as much more than an escape hatch. Language is something fundamental in human symbolic communication - and should have a "place of honor" in the markup of documents. It's true that workarounds cover a whole lot - we do have a usable multilingual office suite after all - but the coverage is conceptually convoluted; and we can't cover the "harder stuff" without jumping through even more hoops. For a super-concrete example, look at bug 132000.
If this already has clear definitions in the ODF specification, we can try to implement it.
(In reply to Volga from comment #15)
> If this is already have clear definitions in ODF specification, so we can
> try to make implementation.

Unfortunately, this would require changes to the spec, see comment #3. I don't know if the ODF changes would be extensive (possibly not so much), but their impact is rather fundamental.
So it would be nice to work out a proper solution and submit it as an amendment to the ODF specification.
*** Bug 159734 has been marked as a duplicate of this bug. ***
(In reply to Buovjaga from comment #18)

Quoting the key complaint from the dupe bug, by user lvm:

> Styles include mandatory language setting (in edit style-font-language)
> and when a style is applied to text it overwrites language information
> with own setting which is a huge inconvenience especially if this text is
> multilingual.
(In reply to Eyal Rozenberg from comment #16)
> (In reply to Volga from comment #15)
> > If this is already have clear definitions in ODF specification, so we can
> > try to make implementation.
>
> Unfortunately, this would require changes to the spec, see comment #3. I
> don't know if the ODF changes would be extensive (possibly not so much), but
> their impact is rather fundamental.

And that's why I suggested a workaround: an [undefined] language setting for a style, which applies formatting but leaves the language information of the original text intact.
(In reply to Eyal Rozenberg from comment #0)
> When I write a document, I often use character styles such as "Emphasis",
> "Internet Link". "Quotation" Naturally, I want to use these styles for text
> in different languages - and not define separate styles named "Arabic
> Emphasis", "Hebrew Emphasis", "N'Ko emphasis" etc.

That is possible. Currently the character styles "Emphasis", "Internet Link" and "Quotation" do not contain any language information. So these styles currently use the language setting of a parent character style or of a surrounding character style, if one exists, and otherwise the language of the surrounding paragraph.

Please have a look at the tab "General" of 'Edit Style'. There you can see what is actually contained in the style. Do not get confused by the theoretically possible settings in the other tabs. Currently the language is only included in the character style when you change the language in the tab "Font".

Removing a language is cumbersome. You first need to apply "Reset to Parent" to the "Font" tab, then close and reopen the dialog, and then set the other previously contained settings again. Verify the settings in the tab "General". A request for a more direct way to control the actually contained settings is in bug 128960.

It might help users if we introduced a new tab "Language" and moved the fields from the tab "Font" to the new tab "Language".

>
> However, as Regina Henschel tells me, it is currently the case that the
> choice of language is a feature of a character style (or at least - the
> choice of a single language in each language group).

The language setting is not mandatory for a character style.

>
> That does also not make sense semantically: The languages I use are part of
> the content, not the style. I can take Hebrew text and change its "style" -
> but it will not become Arabic text.

Think of it this way: You have a portion of text, and the character style contains everything that is special about this portion of text.
This can be font color, borders, position, and language as well.

>
> So, this should change. The language (and the language group) of a stretch
> of text must be _removed_ from the character style (explicit or
> default-style in a paragraph style).

What markup do you suggest instead for the file?
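The fallback order described earlier in this comment (the style's own setting, then a parent character style, then a surrounding character style, then the surrounding paragraph) can be sketched roughly as follows. This is only a toy model: the `Style` class, the function names and the exact precedence between parent and surrounding styles are assumptions for illustration, not the ODF or LibreOffice implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Style:
    """Toy model of a style: a name, an optional language, an optional parent."""
    name: str
    language: Optional[str] = None
    parent: Optional["Style"] = None


def style_language(style: Optional[Style]) -> Optional[str]:
    """Return the first explicitly set language along the parent chain."""
    while style is not None:
        if style.language is not None:
            return style.language
        style = style.parent
    return None


def effective_language(span_styles: List[Style], paragraph_style: Style) -> Optional[str]:
    """Resolve the language of a text portion.

    span_styles lists the character styles of the enclosing spans,
    innermost first; the paragraph style is the last resort.
    """
    for style in span_styles:
        language = style_language(style)
        if language is not None:
            return language
    return style_language(paragraph_style)


paragraph = Style("Default Paragraph Style", language="en-US")
emphasis = Style("Emphasis")                      # carries no language
hebrew = Style("Hebrew Text", language="he-IL")   # hypothetical style
strong_hebrew = Style("Strong Hebrew", parent=hebrew)

print(effective_language([emphasis], paragraph))
print(effective_language([emphasis, hebrew], paragraph))
print(effective_language([strong_hebrew], paragraph))
```

In this model a language-free "Emphasis" never overrides anything: it picks up whatever the surrounding span or paragraph provides.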