Bug 151290 - A language must not be a feature of a character/paragraph style
Status: UNCONFIRMED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version (earliest affected): 7.4.1.2 release
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: ODF-spec Languages
Reported: 2022-10-03 03:51 UTC by Eyal Rozenberg
Modified: 2022-10-03 20:26 UTC
CC List: 3 users

See Also:
Crash report or crash signature:
Regression By:


Description Eyal Rozenberg 2022-10-03 03:51:20 UTC
When I write a document, I often use character styles such as "Emphasis", "Internet Link" and "Quotation". Naturally, I want to use these styles for text in different languages - and not define separate styles named "Arabic Emphasis", "Hebrew Emphasis", "N'Ko Emphasis" etc.

However, as Regina Henschel tells me, it is currently the case that the choice of language is a feature of a character style (or at least - the choice of a single language in each language group).

That also does not make sense semantically: the languages I use are part of the content, not the style. I can take Hebrew text and change its "style" - but it will not become Arabic text.

So, this should change. The language (and the language group) of a stretch of text must be _removed_ from the character style (explicit or default-style in a paragraph style).
Comment 1 Mike Kaganski 2022-10-03 09:13:02 UTC
I completely agree (modulo the fact that we must stay compatible, and so must support existing documents that use styles exactly for the language definition). Also, the problem of easily marking runs as having a specific language must be solved, including for platforms that do not provide a system input language and for users who do not use that feature.

What styles could/should provide is a mapping from a language to a set of formatting, for multiple languages inside a single style - bug 151215. The language applied to the text run shouldn't be formatting itself, and thus having it as part of a style is conceptually wrong.
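
A minimal sketch of that idea, under loud assumptions: the language lives on the text run, and the style only maps languages to formatting. Neither ODF nor LibreOffice offers this today; `Run`, `CharStyle`, `resolve` and the property keys are hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """A stretch of text; the language is part of the content (hypothetical model)."""
    text: str
    lang: str


@dataclass
class CharStyle:
    """A style with default formatting plus optional per-language overrides."""
    name: str
    default: dict
    per_language: dict = field(default_factory=dict)


def resolve(style, run):
    """Return the formatting for a run: defaults, overridden by the run's language."""
    formatting = dict(style.default)
    formatting.update(style.per_language.get(run.lang, {}))
    return formatting


emphasis = CharStyle(
    name="Emphasis",
    default={"font-style": "italic"},
    per_language={
        "he": {"font-style": "normal", "font-weight": "bold"},
        "ar": {"font-style": "normal", "text-underline": "single"},
    },
)

english = resolve(emphasis, Run("stressed", "en"))
hebrew = resolve(emphasis, Run("מודגש", "he"))
print(english)
print(hebrew)
```

In this model, changing the style of a run never changes its language, and a single "Emphasis" style serves every language.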
Comment 2 Regina Henschel 2022-10-03 16:24:32 UTC
If you assign the character style "Emphasis" to a portion of text in a paragraph, then this generates a <text:span> element (6.1.7) in file markup. In this <text:span> element you will find the text:style-name="Emphasis" attribute (19.880.33).
The style "Emphasis" is a <style:style> (16.2) element in file markup in styles.xml. This <style:style> element has a style:name="Emphasis" attribute (19.502), which identifies the style, and a style:family="text" attribute (19.480), which determines which properties may be specified in this style.
In case of family "text", the properties are contained as attributes in a <style:text-properties> element (16.29.29). Up to 84 properties exist, but you may use a subset of them. Section 16.2 of the standard specifies how the value of a property is determined when it is not contained in the <style:text-properties> element of the style referenced by the object being styled.
[The section numbers refer to ODF 1.3.]
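
As a rough illustration of the markup described above, the following sketch resolves a <text:span> to the attributes of its character style. The sample document is an assumption made for brevity: it puts the style and the span in one simplified fragment, whereas a real package keeps them in styles.xml and content.xml. The helper name `text_properties_for_span` is hypothetical, and the lookup ignores parent styles and defaults.

```python
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
FO_NS = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"

# Simplified, hypothetical single-file fragment (not a complete ODF document).
SAMPLE = f"""
<office:document xmlns:office="{OFFICE_NS}" xmlns:text="{TEXT_NS}"
                 xmlns:style="{STYLE_NS}" xmlns:fo="{FO_NS}">
  <office:styles>
    <style:style style:name="Emphasis" style:family="text">
      <style:text-properties fo:font-style="italic"
                             style:font-style-complex="italic"/>
    </style:style>
  </office:styles>
  <office:body>
    <office:text>
      <text:p>Some <text:span text:style-name="Emphasis">stressed</text:span> words.</text:p>
    </office:text>
  </office:body>
</office:document>
"""


def text_properties_for_span(root, span):
    """Return the attributes of the <style:text-properties> of the span's style."""
    wanted = span.get(f"{{{TEXT_NS}}}style-name")
    for style in root.iter(f"{{{STYLE_NS}}}style"):
        same_name = style.get(f"{{{STYLE_NS}}}name") == wanted
        is_character_style = style.get(f"{{{STYLE_NS}}}family") == "text"
        if same_name and is_character_style:
            props = style.find(f"{{{STYLE_NS}}}text-properties")
            return dict(props.attrib) if props is not None else {}
    return {}


root = ET.fromstring(SAMPLE.strip())
span = next(root.iter(f"{{{TEXT_NS}}}span"))
props = text_properties_for_span(root, span)
print(props)
```

In this sample the style sets no fo:language attribute, so the span's language would have to come from elsewhere, such as a parent or default style.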

These <text:span> elements may be nested; however, as the file format is XML, the elements cannot overlap. That means ODF allows several character styles to be applied to the same portion of text. But that is currently not implemented correctly (bug 115311).

If you want your style "Emphasis" not to include the language, then simply do not specify the language in the style. You must not touch the language field in the dialog, otherwise the language is set. If you are unsure whether the language is set, look at the Organizer tab of the style modify dialog. To remove a language setting from the style, you have to use the "Reset to Parent" button on the "Font" page and then set the other desired properties on that page again.

Many of the properties depend on the script type of a character. The script type of a character can be "latin", "asian" or "complex". The Unicode code point determines which of the three script types applies, not the language. Script-type-dependent properties have three variants, e.g. fo:font-style, style:font-style-asian and style:font-style-complex. Only one of them is active for a character. So if you set e.g. "italic" for "Western text" and "bold" for "CTL text", the "Emphasis" character style should work for English, Hebrew and Farsi as well. If not, that is a bug.
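
The selection of the active variant can be sketched roughly as follows. The code point ranges are a deliberately tiny, assumed subset chosen for illustration; they are not the classification LibreOffice actually uses, and `script_type` and `active_properties` are hypothetical helper names.

```python
# Assumed, incomplete code point ranges, for illustration only.
COMPLEX_RANGES = [(0x0590, 0x05FF), (0x0600, 0x06FF), (0x0E00, 0x0E7F)]  # Hebrew, Arabic, Thai
ASIAN_RANGES = [(0x3040, 0x30FF), (0x4E00, 0x9FFF), (0xAC00, 0xD7AF)]    # Kana, CJK, Hangul

# Attribute name per script type for two script-type-dependent properties.
VARIANTS = {
    "font-style": {
        "latin": "fo:font-style",
        "asian": "style:font-style-asian",
        "complex": "style:font-style-complex",
    },
    "font-weight": {
        "latin": "fo:font-weight",
        "asian": "style:font-weight-asian",
        "complex": "style:font-weight-complex",
    },
}


def script_type(ch):
    """Classify one character by its code point; the language plays no role."""
    cp = ord(ch)
    if any(lo <= cp <= hi for lo, hi in COMPLEX_RANGES):
        return "complex"
    if any(lo <= cp <= hi for lo, hi in ASIAN_RANGES):
        return "asian"
    return "latin"


def active_properties(ch, style_props):
    """Pick, for each property, the one variant that applies to this character."""
    script = script_type(ch)
    active = {}
    for prop, names in VARIANTS.items():
        value = style_props.get(names[script])
        if value is not None:
            active[prop] = value
    return active


# "italic" for Western text and "bold" for CTL text, as in the example above.
emphasis = {"fo:font-style": "italic", "style:font-weight-complex": "bold"}

for ch in ("a", "א", "ف", "漢"):
    print(ch, script_type(ch), active_properties(ch, emphasis))
```

The Hebrew and the Arabic-script characters end up with the same "complex" variant. That is why the three variants alone cannot distinguish Hebrew formatting from Arabic or Farsi formatting.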

A language is set by the attributes fo:language and fo:country and their "asian" and "complex" variants. These are attributes of a <style:text-properties> element. This <style:text-properties> element can be a child element of a style of family "text". That corresponds to character styles. It can also be a child element of a style of family "paragraph". So removing the ability to set a language in a character style or a paragraph style is not possible. We can only try to make the UI reflect these relationships more clearly. For example, the language settings could be moved to their own tab, so that they cannot be changed by accident when working with other settings.
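
To check whether a given style sets a language at all, one could inspect those attributes directly in the file markup. The sketch below assumes a hypothetical "Hebrew Emphasis" style and a hypothetical helper `language_settings`; it only looks at the style itself, not at parent styles.

```python
import xml.etree.ElementTree as ET

STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
FO_NS = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"

# (language attribute, country attribute) for each script type.
LANGUAGE_ATTRS = {
    "latin": (f"{{{FO_NS}}}language", f"{{{FO_NS}}}country"),
    "asian": (f"{{{STYLE_NS}}}language-asian", f"{{{STYLE_NS}}}country-asian"),
    "complex": (f"{{{STYLE_NS}}}language-complex", f"{{{STYLE_NS}}}country-complex"),
}

# Hypothetical character style that carries a language for complex-script text.
SAMPLE = f"""
<style:style xmlns:style="{STYLE_NS}" xmlns:fo="{FO_NS}"
             style:name="Hebrew Emphasis" style:family="text">
  <style:text-properties fo:font-style="italic"
                         style:language-complex="he" style:country-complex="IL"/>
</style:style>
"""


def language_settings(style_element):
    """Map script type to (language, country) for every language the style sets."""
    props = style_element.find(f"{{{STYLE_NS}}}text-properties")
    if props is None:
        return {}
    found = {}
    for script, (lang_attr, country_attr) in LANGUAGE_ATTRS.items():
        lang = props.get(lang_attr)
        if lang is not None:
            found[script] = (lang, props.get(country_attr))
    return found


style = ET.fromstring(SAMPLE.strip())
settings = language_settings(style)
print(settings)
```

An empty result means the style itself leaves the language untouched, which is the state the "Reset to Parent" procedure described earlier aims for.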
Comment 3 Eyal Rozenberg 2022-10-03 20:26:52 UTC
(In reply to Regina Henschel from comment #2)
Noting the use of `text:span` I am reminded of HTML span, and HTML in general. In that standard, the language is an attribute separate from the style (e.g. `<p lang="de-DE" style="bunch of CSS here">`).

> In case of family "text", the properties are contained as attributes in a
> <style:text-properties> element (16.29.29).

Yes, I see:

https://docs.oasis-open.org/office/OpenDocument/v1.3/os/part3-schema/OpenDocument-v1.3-os-part3-schema.html#element-style_text-properties

So, fo:country and fo:language should be removed from style:text-properties. Instead, they should be settable on text:span elements, probably on some other text:XXXX elements, and maybe even on other elements.

And - styles should be able to carry properties for multiple languages, in multiple language-groups.

> If you want, that your style "Emphasis" does not include the language, then
> simply do not specify the language in the style.

But then - how would the Emphasis style use different font properties for Arabic text and for Hebrew text?

Anyway, I believe it should not be _possible_ to specify a language as part of a style; I claim this is a design mistake in the ODF spec.