Bug 154799 - The ODF partitioning of scripts/languages into "Western", "RTL + CTL" and "Asian" is invalid
Status: UNCONFIRMED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version: 7.5.1.2 release (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
Blocks: Languages
Reported: 2023-04-13 23:00 UTC by Eyal Rozenberg
Modified: 2023-04-14 18:54 UTC

Description Eyal Rozenberg 2023-04-13 23:00:44 UTC
As we all know, languages are not scripts (see also bug 154793), and some languages may be written with either a "Western" script or one which would be considered RTL/CTL. Example: Turkish.

So, let's focus on scripts for a moment. What makes a script have "complex text layout"?

Well, I couldn't quite find an answer in the ODF spec; see for example:

https://docs.oasis-open.org/office/OpenDocument/v1.3/os/part3-schema/OpenDocument-v1.3-os-part3-schema.html#property-style_script-type

Following Frank Oberle's text (linked to in bug 92655), I looked at the Wikipedia definition:

"Complex text layout (CTL) or complex text rendering is the typesetting of writing systems in which the shape or positioning of a grapheme depends on its relation to other graphemes. The term is used in the field of software internationalization, where each grapheme is a character."

Well, cursive Latin script is certainly like that. Actually, even non-cursive Latin script is like that, due to digraphs and ligatures: in German, consecutive s's constituting an ß; in Serbo-Croatian, lj; and so on. Yes, these are not very _common_, but having them means one must assume they can occur, making the layout "complex". Also, in Greek, you have the intra-word sigma, σ, and the final-form sigma, ς.


On the other hand, if you consider non-cursive Hebrew, there are only a few letters which have special forms: 5 out of 22 are like the Greek sigma, with a regular form and a final form. The rest only have one form. So why is Hebrew in a separate category from Greek?
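
To make the comparison concrete, here is a minimal Python sketch of both cases. The helper `apply_hebrew_final_form` and its mapping table are hypothetical names made up for this illustration; they are not part of LibreOffice or ODF. The Greek half relies only on Python's built-in `str.lower()`, which selects the final sigma at the end of a word.

```python
# Illustrative only: word-final letter forms in Greek and Hebrew.
# The helper and table below are assumptions for this sketch,
# not LibreOffice or ODF code.

# The five Hebrew letters that have a distinct word-final form.
HEBREW_FINAL_FORMS = {
    "\u05DB": "\u05DA",  # kaf   -> final kaf
    "\u05DE": "\u05DD",  # mem   -> final mem
    "\u05E0": "\u05DF",  # nun   -> final nun
    "\u05E4": "\u05E3",  # pe    -> final pe
    "\u05E6": "\u05E5",  # tsadi -> final tsadi
}


def apply_hebrew_final_form(word: str) -> str:
    """Replace the last letter of a word with its final form, if it has one."""
    if not word:
        return word
    return word[:-1] + HEBREW_FINAL_FORMS.get(word[-1], word[-1])


# Greek: capital sigma (U+03A3) lowercases to final sigma (U+03C2)
# when it ends a word, and to regular sigma (U+03C3) otherwise.
greek = "\u039F\u0394\u039F\u03A3".lower()

# Hebrew: "shalom" typed with a regular mem, then given its final mem.
hebrew = apply_hebrew_final_form("\u05E9\u05DC\u05D5\u05DE")
```

In both scripts the rule is the same kind of thing: the shape of one letter depends on whether it ends the word.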

One could argue "well, Hebrew script is written right-to-left" - but then, I'd call that Euro-centric bias. Why is right-to-left "complex" and left-to-right "simple"? Because somebody wrote LTR-only implementations before thinking of RTL? Surely that's not a valid reason.

So, perhaps one could claim that Hebrew is not complex (CTL), it's just RTL. But that raises another question: Why group CTL with RTL scripts? Again, it seems like the rationale is basically "scripts we thought about later so we put them in another box". That doesn't fly.

Another argument might be: "This is what Microsoft Office does" - and historically, perhaps that's how this made its way into StarOffice/OpenOffice and ODF. But MSO makes many choices, some right and some wrong; this one is wrong.

As for Asian languages - what sets them apart? If it's mostly/solely the possibility of writing in vertical direction - Latin actually has that (see bug 154756 and links therein). Is there something else justifying their being a separate group?

---

Note: Bug 42123 also made this claim, but more in the context of requests such as bug 151215, plus the argument that "complexity" is subjective.
Comment 1 Eyal Rozenberg 2023-04-14 08:47:33 UTC
See also bug 151215, comment 19, which opines that the grouping is, at least in some contexts, an artifact maintained for compatibility with past decisions.
Comment 2 Regina Henschel 2023-04-14 17:55:54 UTC
The specification of the style:script-type attribute (section 20.358 in part 3 of ODF 1.3) has a clear definition of the "complex" script type. It lists all Unicode ranges that belong to the "complex" script type.

Please do not mix the definition of a term in a standard with the meaning of a term in general.

If you think that the ODF standard needs to be improved in this area, the LibreOffice bugtracker is the wrong place for your remarks. The correct way is to send a mail to office-comment@lists.oasis-open.org.

For details see
https://www.oasis-open.org/committees/comments/index.php?wg_abbrev=office
Comment 3 Eyal Rozenberg 2023-04-14 18:29:01 UTC
(In reply to Regina Henschel from comment #2)


> The specification of the style:script-type attribute (section 20.358 in part 3
> of ODF 1.3) has a clear definition of the "complex" script type.
> It lists all Unicode ranges that belong to the "complex" script type.

I didn't say the definition was unclear, I said it was invalid.

> Please do not mix the definition of a term in a standard with the meaning of
> a term in general.

Once these definitions impact behavior rather than mere wording, the mixture has already occurred; and that's doubly the case when the terms themselves are used in the UI. One way to address this bug is to encapsulate the use of these terms and spare the users from having to recognize and accept them.

> If you think that the ODF standard needs to be improved in this area, the
> LibreOffice bugtracker is the wrong place for your remarks. The correct way
> is to send a mail to office-comment@lists.oasis-open.org.

I'll write there, as well - but if there's a problem with ODF, LO should not simply carry it forward uncritically.
Comment 4 Eyal Rozenberg 2023-04-14 18:54:21 UTC
(In reply to Eyal Rozenberg from comment #3)
> I didn't say the definition was unclear, I said it was invalid.

Wait, I take that back. I said I "couldn't find an answer". So, there is a definition, but it doesn't answer what makes a script complex, it just says which scripts are defined to be complex.
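
A definition of that kind boils down to a lookup table. A minimal sketch, under loud assumptions: the ranges below are a handful of standard Unicode block boundaries picked for illustration and are NOT the table from the ODF specification, and the labels "western", "ctl" and "asian" and the function `script_type` are hypothetical names that merely mirror the three groups discussed in this bug.

```python
# Illustrative only: a script-type "definition by enumeration".
# The ranges are a few standard Unicode block boundaries chosen for this
# sketch; they are NOT copied from the ODF specification, and the labels
# are made-up names for the three groups discussed in this bug.
from typing import List, Tuple

SCRIPT_RANGES: List[Tuple[int, int, str]] = [
    (0x0000, 0x024F, "western"),  # Basic Latin .. Latin Extended-B
    (0x0370, 0x03FF, "western"),  # Greek and Coptic
    (0x0590, 0x05FF, "ctl"),      # Hebrew
    (0x0600, 0x06FF, "ctl"),      # Arabic
    (0x0900, 0x097F, "ctl"),      # Devanagari
    (0x3040, 0x30FF, "asian"),    # Hiragana and Katakana
    (0x4E00, 0x9FFF, "asian"),    # CJK Unified Ideographs
]


def script_type(ch: str) -> str:
    """Return the group whose range contains the character's code point."""
    cp = ord(ch)
    for low, high, kind in SCRIPT_RANGES:
        if low <= cp <= high:
            return kind
    return "unclassified"  # not covered by this toy table
```

The table answers "which group is this character in?" for any code point it covers, but nothing in it records why a range got its label, which is the gap this comment points at.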