See bug 166011 for more details about the style:script-type attribute. Once the style:script-type attribute is implemented, it can be used to hint whether particular characters (ODF 1.3 20.358 "weak [UNICODE] characters") should be interpreted as a user-specified script type, rather than the script type determined by our hard-coded algorithm. The approach used by certain other word processors is to set their equivalent attribute when users explicitly mark text as having a particular language. This also seems natural to me, but we should evaluate our options before implementation.
Can you outline the pros and cons of "apply a script"? Or why is this an extra ticket?
(In reply to Heiko Tietze from comment #1)
> Can you outline the pros and cons of "apply a script"? Or why is this an
> extra ticket?

Currently, users cannot influence how LibreOffice determines the script type for certain ambiguous characters (e.g. punctuation). This bug proposes overloading Tools->Language->For Selection to also set a hint that ambiguous characters in the selection should be treated as the script type of the indicated language.

Pros:
- We want to expose this control somehow, and doing it this way avoids adding more to the user interface.
- Doing it this way is the precedent set by other suites.

Cons:
- This changes the behavior of an existing feature.
- Document appearance may differ depending on whether languages were set through the menu/statusbar or via styles.
Some related discussion around the topic:

Bug 103036 - rework document Language Setting into a dedicated dialog
Bug 104318 - CTL, CJK & Western Language GUI controls confusing, need rework for a "Global" user community
Bug 146928 - Rework font selection dialog for multiple language groups - don't hide CJK/CTL tab

The latter has been implemented and we could add some kind of radio or tool button to force the text (DF, CS, and PS) to this style family.

As an ignorant writer of the Western script I never know how this works. Assuming some mixed text like "Lorem ipsum أو شكل توضع", how do I know whether the caret is on the left tab in the character tab or the right? This would be solved with the mutually exclusive control. Alternatively we move the active tab to the left side, stretching the dual-list paradigm a bit. But this could be done easily with standard controls.

Is there a chance to get rid of the trinity, somehow, and speak only about languages rather than scripts? IOW, changing the "Language" dropdown from English to Arabic affects the script automatically - and needs to pick a proper font... and vice versa, meaning that switching from Liberation Sans to Noto Sans CJK SC turns the language into Chinese and the script into CJK. Sounds wrong to me.
Whoever visits this bug - please read the comments on bug 166011 first, especially those of you who don't know what style:script-type is.

-----------------

That being said - I don't quite understand this bug report.

> when users manually set a language for a run

This is not supported in LibreOffice. Such a capability is not required for resolving 166011 AFAICT. Perhaps you mean for this bug to depend on bug 148257 as well? Then, it would be helpful to also set the already-ODF-standardized style:script-type.

> [style:script-type] can be used to hint whether ... characters... should
> be interpreted as a user-specified script type

You mean, "dictate", not "hint", right?
(In reply to Heiko Tietze from comment #3) Even given my question to Jonathan, I wanted to remind you.... > Is there a chance to get rid of the trinity, somehow, and speak only about > languages rather than scripts? That is bug 162331 (and to a lesser extent bug 151215). But also remember, that it is fundamentally wrong to set the language of text using the formatting/font selection dialogs. This is a mis-design both in the ODF spec and in LO, as discussed in bug 151215.
(In reply to Eyal Rozenberg from comment #5) > But also remember, that it is fundamentally wrong to set the language of > text using the formatting/font selection dialogs. This is a mis-design both > in the ODF spec ..., as discussed in bug 151215. ODF has nothing to do here. While I agree with you (as you know) about the idea that language is not formatting, but a part of content - the way how it's stored internally is orthogonal; and furthermore, the specific wording that you chose: "it is fundamentally wrong to set the language of text using the formatting/font selection dialogs" - which I agree - makes it completely a UI problem, i.e., again, unrelated to file format spec (which, as you know, doesn't prescribe any dialogs at all).
Created attachment 200236 [details] Mockup (In reply to Heiko Tietze from comment #3) > Alternatively we move the active tab to the left side stretching the > dual-list paradigm a bit. But this could be done easily with standard > controls. Illustration for this idea. Switched the sides and the active tab is right-hand now.
(In reply to Eyal Rozenberg from comment #4) > You mean, "dictate", not "hint", right? No, I don't; the documented intention for style:script-type is to help resolve ambiguity in the existing heuristic for characters which do not conclusively belong to a particular script category. It should not override the heuristic entirely, and is not a substitute for features that do. However, it is useful for interop as it roughly corresponds to the OOXML w:hint attribute.
IIUC this would happen under the hood, allowing "weak characters" to be properly handled when contained in a span responding to a style:script-type attribute (when implemented as for bug 166011). Absent this accommodation, the span could end up splitting around such ambiguous Unicode "weak" glyphs. Assume ICU libs would have handling to isolate the script ambiguous "weak" Unicode--and what we'd have to provide is the ability to fold them into the span in the appropriate script. +1
(In reply to Jonathan Clark from comment #8)
> No, I don't; the documented intention for style:script-type is to help
> resolve ambiguity in the existing heuristic for characters which do not
> conclusively belong to a particular script category. It should not override
> the heuristic entirely, and is not a substitute for features that do.

Do you mean, not override the heuristics for strong characters, or not override the heuristics for weak characters as well? Sorry for nitpicking the semantics of what you're saying, I want to be 100% sure I'm on the same page with you about this. (And other people are definitely finding it difficult to be on the same page; it is quite confusing for those who don't have the script-assignment-related bugs on the top of their minds.)
We discussed the topic in the design meeting. Idea from comment 7 might be an improvement although it means to not limit the style:script-type tag to just weak characters. We should keep in mind that this is kind of a direct formatting and should be reverted with Clear DF.
(In reply to Eyal Rozenberg from comment #10)
> (In reply to Jonathan Clark from comment #8)
> > No, I don't; the documented intention for style:script-type is to help
> > resolve ambiguity in the existing heuristic for characters which do not
> > conclusively belong to a particular script category. It should not override
> > the heuristic entirely, and is not a substitute for features that do.
>
> Do you mean, not override the heuristics for strong characters, or not
> override the heuristics for weak characters as well? Sorry for nitpicking
> the semantics of what you're saying, I want to be 100% sure I'm on the same
> page with you about this. (And other people are definitely finding it
> difficult to be on the same page; it is quite confusing for those who don't
> have the script-assignment-related bugs on the top of their minds.)

No problem. To a large extent, this issue is due to LO implementation details. It's more difficult to explain without a common understanding of those details.

The ODF specification assigns certain Unicode code point ranges to Latin, Asian, or Complex (ODF 1.3 Table 22). Many code points are left unassigned; for these, handling is *implementation-defined*.

style:script-type should only apply to these *implementation-defined* code points. It should not affect code points that are explicitly specified in the standard.

---

This behavior differs from other proposals involving explicit language/script settings, which should override handling of all code points.
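The rule described in this comment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ASSIGNED_RANGES` table holds a handful of example ranges picked for the demo and is not a transcription of ODF 1.3 Table 22, and `resolve_script`, `fallback_heuristic`, and the "latin" fallback are hypothetical names and behavior, not LibreOffice code.

```python
from typing import Optional

# Illustrative ranges only; NOT a transcription of ODF 1.3 Table 22.
ASSIGNED_RANGES = [
    (0x0000, 0x007F, "latin"),    # Basic Latin, incl. digits and ASCII punctuation
    (0x0590, 0x05FF, "complex"),  # Hebrew block
    (0x0600, 0x06FF, "complex"),  # Arabic block
    (0x4E00, 0x9FFF, "asian"),    # CJK Unified Ideographs
]

VALID_HINTS = ("latin", "asian", "complex")


def fallback_heuristic(code_point: int) -> str:
    """Hypothetical stand-in for the implementation-defined handling."""
    return "latin"


def resolve_script(ch: str, hint: Optional[str] = None) -> str:
    """Return the script type for one character.

    Explicitly assigned code points ignore the hint; the hint is only
    consulted for code points the table leaves unassigned.
    """
    code_point = ord(ch)
    for low, high, script in ASSIGNED_RANGES:
        if low <= code_point <= high:
            return script
    if hint in VALID_HINTS:
        return hint
    return fallback_heuristic(code_point)
```

With this toy table, `resolve_script("א", hint="latin")` still yields "complex" because the Hebrew block is explicitly assigned, while `resolve_script("\u2026", hint="complex")` follows the hint because U+2026 is not listed in the toy table.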
Jonathan Clark committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/core/commit/1df13d7ff133837185e6412eeefb77dadd46d056 tdf#166012 Implemented GUI support for style:script-type It will be available in 26.2.0. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
This commit makes Writer set an appropriate style:script-type value when users explicitly assign a language to a selection, with either the menu item or the status bar. This implementation matches what other office suites do. I haven't touched the font tab, or the other topics above relating to bug 162331. I think there's consensus that something should be done to remove or minimize the role of script types. My vote is to assume we can make that change sooner rather than later, and treat the UI side of style:script-type as a stopgap.
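As a rough illustration of the behavior described here, the sketch below maps a language tag to a style:script-type value. The helper name `script_type_for_language` and the small hand-picked table are assumptions made for this example; the mapping used by the actual commit is not reproduced here.

```python
from typing import Optional

# Hypothetical, hand-picked mapping from primary language subtag to a
# style:script-type value; not LibreOffice's actual table.
SCRIPT_TYPE_BY_LANGUAGE = {
    "en": "latin",
    "de": "latin",
    "he": "complex",
    "ar": "complex",
    "zh": "asian",
    "ja": "asian",
}


def script_type_for_language(language_tag: str) -> Optional[str]:
    """Return the script-type hint for a tag such as 'he-IL', or None.

    None means "set no hint", which is what this sketch does for
    languages missing from the table.
    """
    primary = language_tag.replace("_", "-").split("-")[0].lower()
    return SCRIPT_TYPE_BY_LANGUAGE.get(primary)
```

In this sketch, marking a selection as `he-IL` would produce the hint "complex", and an unknown tag would leave the hint unset.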
Jonathan Clark committed a patch related to this issue. It has been pushed to "libreoffice-25-8": https://git.libreoffice.org/core/commit/c2de81072653f84dfe8ed607501719a594b7fe5b tdf#166012 Implemented GUI support for style:script-type It will be available in 25.8.0.0.beta2. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
This doesn't fully work for me. That is, when I select some text in multiple paragraphs, and say it all has a single language that I'm forcing, and write a file out (say fodt), I only get style:script-type set for _some_ of the paragraphs. Will attach soon, suggest reopening unless I'm mistaken somehow.
Created attachment 201248 [details] File with the problem manifesting I chose all paragraphs on the first and third sections (i.e. the ones named "LTR paragraphs with language manually set to Hebrew:" and "RTL paragraphs with language manually set to Hebrew:" ; but not the headings, just the other lines), and marked them as Hebrew using the taskbar. You will note that only the first two lines have this affecting their auto-style. I will say that the file originally had some spans on the first and second paragraphs, which I removed; and I don't know how they got created exactly.
(In reply to Eyal Rozenberg from comment #17) > Created attachment 201248 [details] > File with the problem manifesting > > I chose all paragraphs on the first and third sections (i.e. the ones named > "LTR paragraphs with language manually set to Hebrew:" and "RTL paragraphs > with language manually set to Hebrew:" ; but not the headings, just the > other lines), and marked them as Hebrew using the taskbar. You will note > that only the first two lines have this affecting their auto-style. > > I will say that the file originally had some spans on the first and second > paragraphs, which I removed; and I don't know how they got created exactly. I downloaded this file. Looking at the Style Inspector, Char Script Type Hint is set to complex (4) for all of those paragraphs. Is there somewhere else I should be looking?
(In reply to Jonathan Clark from comment #18)
> I downloaded this file. Looking at the Style Inspector, Char Script Type
> Hint is set to complex (4) for all of those paragraphs. Is there somewhere
> else I should be looking?

Eyal: any update on this?
(In reply to Eyal Rozenberg from comment #17)
> and marked them as Hebrew using the taskbar.

I meant, the status bar. I selected the paragraphs (of the first and third sections), clicked the language indicator on the status bar, and chose Hebrew.

(In reply to Jonathan Clark from comment #18)
> I downloaded this file. Looking at the Style Inspector, Char Script Type
> Hint is set to complex (4) for all of those paragraphs. Is there somewhere
> else I should be looking?

Yes. Save the file to an FODT. The styles are unchanged.

Also, showing "4" on the style inspector doesn't seem right. Shouldn't the numeric value be looked up and shown as a string?

(In reply to Buovjaga from comment #19)
> Eyal: any update on this?

See above comments. This does not seem to be working - at least in the sense of not surviving a save.

I was using:

Version: 26.2.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: bc58a54d513702bc07906627dce073f05d7978fd
CPU threads: 4; OS: Linux 6.12; UI render: default; VCL: gtk3
Locale: en-IL (en_IL); UI: en-US
Please ignore my last comment. It does seem to work, both during the edit session and on save; I was checking the wrong way.

One last point before I verify: Showing "4" on the style inspector doesn't seem right. Shouldn't the numeric value be looked up and shown as a string? If you can do that - great; if you believe it's out-of-scope for this bug, say so, and then I'd say it's VERIFIED.

And a note to buovjaga and other QA people who may be looking at this: This does work, but in the example file (attachment 201248 [details]) the only observable effect is the length of the spaces, which increases if you use the Hebrew (= RTL/CTL group) hint, since the font for that family is larger. One might have expected some of the digits and the punctuation marks to be switched to the Hebrew font, but that doesn't happen because of bug 167301, i.e. because Jonathan's implementation (rightly) adheres to the ODF spec, and the ODF spec considers digits and punctuation to be Western, which is wrong.
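For anyone who wants to check a saved FODT without reading the XML by eye, here is a minimal Python sketch that lists which styles carry a style:script-type attribute. It assumes the attribute sits on a style:text-properties child of a style:style element; the function name `styles_with_script_type` and the `SAMPLE` fragment are made up for this example and are not taken from the attachment.

```python
import io
import xml.etree.ElementTree as ET

STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"

# Hand-written fragment for demonstration; not taken from attachment 201248.
SAMPLE = """<office:document
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0">
  <office:automatic-styles>
    <style:style style:name="T1" style:family="text">
      <style:text-properties style:script-type="complex"/>
    </style:style>
    <style:style style:name="T2" style:family="text">
      <style:text-properties/>
    </style:style>
  </office:automatic-styles>
</office:document>"""


def styles_with_script_type(source) -> dict:
    """Map style name -> style:script-type for styles that set it.

    `source` is a file path or a file-like object containing flat ODF XML.
    """
    root = ET.parse(source).getroot()
    found = {}
    for style in root.iter(f"{{{STYLE_NS}}}style"):
        props = style.find(f"{{{STYLE_NS}}}text-properties")
        if props is None:
            continue
        value = props.get(f"{{{STYLE_NS}}}script-type")
        if value is not None:
            found[style.get(f"{{{STYLE_NS}}}name")] = value
    return found


result = styles_with_script_type(io.BytesIO(SAMPLE.encode("utf-8")))
```

For a real document, pass the path of the saved file instead of the in-memory sample; styles that do not set the attribute are simply left out of the result.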