Currently, our implementation of style:script-type only affects the appearance of characters that the ODF specification leaves unmapped to a script type (see ODF 1.3 Table 22). Although this technically complies with the standard, it ignores many characters for which users may reasonably want to override our default behavior, such as numerals and certain mathematical symbols. We should investigate how to safely expand style:script-type to cover more characters. For the benefit of interoperability, we should also evaluate the algorithm described in the OOXML standard (ECMA-376-1:2016 17.3.2.26).
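To make the current precedence concrete, here is a minimal Python sketch built on a toy lookup table. The table rows, the function names (`table_lookup`, `resolve_current`) and the "latin" fallback are all hypothetical; they are not taken from our code or transcribed from ODF 1.3 Table 22. The sketch only shows the ordering: once the table maps a character, style:script-type is never consulted, so a digit stays "latin" even when the hint says "asian".

```python
from typing import Optional

# (first code point, last code point, script type).
# Illustrative rows assumed for this sketch only; NOT a transcription of ODF 1.3 Table 22.
STANDARD_TABLE = [
    (0x0000, 0x007F, "latin"),    # Basic Latin, including digits and punctuation
    (0x0600, 0x06FF, "complex"),  # Arabic block
    (0x4E00, 0x9FFF, "asian"),    # CJK Unified Ideographs
]


def table_lookup(ch: str) -> Optional[str]:
    """Return the script type the (toy) standard table assigns, or None if unmapped."""
    cp = ord(ch)
    for first, last, script_type in STANDARD_TABLE:
        if first <= cp <= last:
            return script_type
    return None


def resolve_current(ch: str, hint: Optional[str], default: str = "latin") -> str:
    """Current behavior: the style:script-type hint only applies to unmapped characters."""
    mapped = table_lookup(ch)
    if mapped is not None:
        return mapped  # the table wins; the hint is ignored
    return hint if hint is not None else default
```

With this table, `resolve_current("5", "asian")` returns "latin", while an unmapped symbol such as U+2200 (∀) does follow the hint. That asymmetry is the gap described above.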
I'd say the mapping by the standard is just plain wrong for many of those characters. Since when are numerals and punctuation marks "western"?

I would suggest we have a list of choices for the Unicode character mapping scheme, in Tools > Options > Languages and Locales. It could have 3 items:

* LibreOffice heuristic
* ODF 1.4 §20.358 Table 23 mapping
* ECMA-376-1:2016 OOXML 17.3.2.26 mapping (or whatever MSO is using)

The LO heuristic would be whatever modifications we decide we want to make to the ODF table.

If the Unicode consortium defines which characters are strongly associated with one or a set of languages, we could compose that mapping with our language-to-language-group mapping (which we have, right?), so that if the languages aren't fully within a single language group, we try to apply the hint to choose between the groups. That could be a fourth option, "Unicode-consortium-based". Or we could just use that ourselves and drop the "LO heuristic" option.
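A rough Python sketch of the proposed composition, using made-up data. `CHAR_TO_LANGUAGES` stands in for a per-character language association (whether Unicode publishes anything like that is exactly the open question), and `LANGUAGE_TO_GROUP` stands in for our language-to-group table. All names, rows and the fallback order are hypothetical. The hint only decides the outcome when a character's languages span more than one group, or when the character is unknown.

```python
from typing import Dict, FrozenSet, Optional

# Hypothetical data: languages each character is "strongly associated" with.
CHAR_TO_LANGUAGES: Dict[str, FrozenSet[str]] = {
    "a": frozenset({"en", "fr", "de"}),
    "\u4e2d": frozenset({"zh", "ja"}),  # 中
    "\u0645": frozenset({"ar", "fa"}),  # Arabic letter meem
    "5": frozenset({"en", "zh", "ar"}),
}

# Hypothetical language -> language group (script type) table.
LANGUAGE_TO_GROUP: Dict[str, str] = {
    "en": "latin", "fr": "latin", "de": "latin",
    "zh": "asian", "ja": "asian",
    "ar": "complex", "fa": "complex",
}


def resolve_group(ch: str, hint: Optional[str], default: str = "latin") -> str:
    """Compose char -> languages -> groups; use the hint only to break ties."""
    languages = CHAR_TO_LANGUAGES.get(ch, frozenset())
    groups = {LANGUAGE_TO_GROUP[lang] for lang in languages if lang in LANGUAGE_TO_GROUP}
    if len(groups) == 1:
        return next(iter(groups))  # all associated languages share one group
    if hint is not None and (hint in groups or not groups):
        return hint  # ambiguous or unknown character: let the hint choose
    if default in groups or not groups:
        return default
    return min(groups)  # deterministic fallback among the candidate groups
```

Here a Latin letter ignores an "asian" hint, while a digit, whose languages span three groups, follows it.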
Is this
(In reply to Eyal Rozenberg from comment #1)
> I'd say the mapping by the standard is just plain wrong for many of those
> characters. Since when are numerals and punctuation marks "western"?
> 
> I would suggest we have a list of choices regarding the Unicode character
> mapping scheme, in Tools > Options > Languages and Locales : It could have 3
> items:
> 
> * LibreOffice heuristic
> * ODF 1.4 §20.358 Table 23 mapping
> * ECMA-376-1:2016 OOXML 17.3.2.26 mapping (or whatever MSO is using)

I'm worried it might be asking too much to expect users to have an opinion about this. I'm in favor of an OOXML compatibility flag once our implementation is robust enough that the differences start to matter, but only because we can make that choice for the user automatically when they open a DOCX file.

> If the Unicode consortium defines which characters are strongly associated
> with one or a set of languagues, we could compose that mapping with our
> language-to-language-group mapping (which we have, right?), - so that if the
> languages aren't fully within a single language group, we try to apply the
> hint to choose between the groups. That could be a fourth and call that a
> fourth option, "Unicode-consortium-based". Or just use that ourselves and
> drop the "LO Heuristic" option.

Unicode doesn't have an opinion about language, but they do associate characters with scripts. The ODF/OOXML standards already obey Unicode for the most part, with the exception of mishandling the "common" script type.
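A minimal Python sketch of resolving by the Unicode Script property instead, assuming a handful of rows taken from Unicode's Scripts.txt (real code would more likely ask ICU via `uscript_getScript`). Characters with a specific script keep their group. Common characters (digits, punctuation) have no group of their own, so here they take the style:script-type hint if one is set, otherwise the type of the preceding strong character, otherwise a default. That contextual fallback is an assumption for illustration, not something either standard specifies, and `SCRIPT_RANGES`, `script_of` and `resolve_run` are hypothetical names.

```python
from typing import List, Optional

# (first, last, Unicode Script value): a few rows assumed from Scripts.txt.
SCRIPT_RANGES = [
    (0x0020, 0x0040, "Common"),  # space, ASCII punctuation, digits, '@'
    (0x0041, 0x005A, "Latin"),
    (0x005B, 0x0060, "Common"),
    (0x0061, 0x007A, "Latin"),
    (0x0621, 0x063A, "Arabic"),
    (0x4E00, 0x9FFF, "Han"),
]

# Scripts that belong to exactly one ODF script type; Common is deliberately absent.
SCRIPT_TO_TYPE = {"Latin": "latin", "Han": "asian", "Arabic": "complex"}


def script_of(ch: str) -> str:
    cp = ord(ch)
    for first, last, script in SCRIPT_RANGES:
        if first <= cp <= last:
            return script
    return "Unknown"


def resolve_run(text: str, hint: Optional[str] = None, default: str = "latin") -> List[str]:
    """Assign a script type to each character of a text run.

    Strong characters map through SCRIPT_TO_TYPE. Common or unknown characters
    use the hint if set, else the preceding strong character's type, else `default`.
    """
    result: List[str] = []
    previous: Optional[str] = None
    for ch in text:
        script_type = SCRIPT_TO_TYPE.get(script_of(ch))
        if script_type is not None:
            previous = script_type
        elif hint is not None:
            script_type = hint
        else:
            script_type = previous if previous is not None else default
        result.append(script_type)
    return result
```

For example, the digit in "中5" resolves to "asian" and the digit in an Arabic word resolves to "complex", instead of both being forced to "latin".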