Bug 92655 - CTL/CJK: some Language & Script Handling Bugs and Reviving a Dead Horse
Status: RESOLVED INVALID
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: unspecified (earliest affected)
Hardware: All All
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard: multipleBugs
Keywords: needsDevEval
Depends on:
Blocks: Font-Rendering CJK
Reported: 2015-07-09 15:30 UTC by Frank
Modified: 2023-04-13 19:53 UTC (History)
CC List: 7 users

See Also:
Crash report or crash signature:


Attachments
Detailed steps to reproduce the bugs listed (1.10 MB, application/pdf)
2015-07-09 15:30 UTC, Frank
General discussion of Complex Text Attributes (1.14 MB, application/pdf)
2015-07-09 15:31 UTC, Frank

Description Frank 2015-07-09 15:30:05 UTC
Created attachment 117159
Detailed steps to reproduce the bugs listed

This submission deliberately ignores the usual recommended practice of creating separate reports for each bug or enhancement request. This is done for two reasons:

1) It seems almost certain that many – if not all – of the bugs and/or questionable behaviors described in this report are very closely related, as are the enhancement suggestions. It seems more advantageous to view these as a forest than as individual trees (as has been done in the past with a few of them) in order to properly assess their impact on non-Latin word processing.

2) Rather than suggesting that these bugs and questionable behaviors be ‘fixed,’ my suggestion is rather that a complete redesign and (hopefully) overhaul of that part of LibreOffice related to Languages, Scripts, and Complex Text Layout (CTL) be undertaken, with the whole concept of CTL being abandoned completely. It is unnecessary with contemporary operating systems, and it causes problems.

The attached ‘Bugs-and-a-Horse.pdf’ document will provide those interested with detailed explanations, steps for reproducing the behaviors, and explicit examples demonstrating how to test these behaviors without requiring ANY knowledge of the particular languages or scripts (and there are several) used in the examples. I’ll address the important ‘Dead Horse’ reference after first enumerating the Bugs.

The specific bugs and annoyances being reported here can be summarized as follows:

Bug #1:  Writer considers shared Latin characters (space, punctuation, etc.) as specifically Latin when detecting non-Latin scripts, causing confusing cursor movement in right-to-left scripts as well as other inappropriate behaviors. –see page 2 of ‘Bugs-and-a-Horse’.

Bug #2:  Writer displays/reports an incorrect ‘Text Language’ in use. –see page 3 of ‘Bugs-and-a-Horse’.

Bug #3:  Writer makes arbitrary, unrequested, inappropriate, and often befuddling font and glyph substitutions, although its choices can usually be determined with a little effort. –see page 3.

Bug #4:  Writer then makes even further font substitutions (i.e. beyond and in addition to those referenced in Bug #3), and these require quite a bit of effort to determine. –see page 3. Bug #4 might be specific to Linux distributions. –see page 30 of ‘Bugs-and-a-Horse’.

Bug #5:  Writer reports the wrong Font-in-Use. –see page 3.

Annoyance #1:  I’ve classified the effort required to determine the font in use as Annoyance #1.

Recommended Enhancement #1: Under Tools > Options > LibreOffice > Appearance, include an option to add a color setting to indicate font substitution and/or glyph substitution (per step 10, bullet three in the ‘Bugs-and-a-Horse’ document). A related enhancement might be an indication of whether ‘bold’ or ‘italic’ text is using a legitimate bold or italic variant of the font in use, or whether either is being simulated.

Annoyance #2+:  Only one so-called ‘CTL’ script can be used in a document without a significant level of pain and confusion. –see page 4 of ‘Bugs-and-a-Horse’. I've only listed this as an annoyance, as it has been an open bug (42123) for some time.

The reason for the ‘+’ is that the CTL acronym is used as if it refers to a characteristic of language, rather than as a characteristic of a script. This is not mere quibbling over semantics, as this misunderstanding adversely affects many of Writer’s operations.

Annoyance #3:  Directly shifting between different ‘Tools > Options’ panels isn’t always possible.  –see page 8 of ‘Bugs-and-a-Horse’.

Annoyance #4:   The five language settings on the ‘Basic Fonts (Western)’ and ‘Basic Fonts (CTL)’ panels are tedious to use and could be made a bit more friendly. –see page 8 (ibid).

Unnumbered Annoyance: I occasionally need to restart Writer rather than simply close the ‘Languages’ panel in order for CTL font and size settings to take effect, but I have never been able to pin down the circumstances required to reproduce it, so it is only mentioned briefly –see page 9.

Annoyance #5:  Handling of keyboard shortcuts when non-Latin Input Methods are selected could use some attention. Under some conditions, [Ctrl + whatever] isn’t detected –see page 9. I've only listed this as an annoyance, as it is already an open bug (66772).

Bug #6:  Characters entered can be ‘lost’ or ignored under repeatable conditions. –see page 10 (ibid).

Recommended Enhancement #2: Under Tools > Options > LibreOffice > Appearance, include an option to color-code text that has direct formatting applied. (per step 25 in ‘Bugs-and-a-Horse.’)

Unnumbered Annoyance: When multiple cells in a Writer table are selected, the ‘Clear Direct Formatting’ function doesn’t seem to work. I haven’t thoroughly tested this, so it is only mentioned briefly.

Bug #7:  Incorrect style applied when setting CTL fonts. –see page 11 of ‘Bugs-and-a-Horse’.

Annoyance #6:  Recommended workarounds for several issues discussed here are tedious –see page 12.

Annoyance #7:  Layout of ‘Languages’ Panel can be misleading –see page 13 of ‘Bugs-and-a-Horse’.

Bug #8:  The taxonomy of the ‘Languages’ panel choices is logically flawed.  –see page 13.

Bug #9:  Full Justification is improperly completed in Right-to-Left scripts (ironic, since Arabic’s more sophisticated Kashideh RTL justification seems to be handled well).  –see page 13.

Bug #10: Characters in Right-to-Left scripts cannot be rotated; furthermore if they are pasted into a frame formatted with character rotation, their order is reversed.  –see page 17.
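
For readers who want a concrete picture of what Bug #1 is asking for, here is a minimal Python sketch of the alternative behavior: shared characters (spaces, punctuation, digits) are treated as neutral and inherit the script of the text around them instead of being classified as Latin. This is an illustration only, not LibreOffice's code. The names `char_script` and `script_runs` are hypothetical, and guessing a script from the first word of the Unicode character name is a shortcut assumed here for brevity; a real implementation would use the Unicode Script property.

```python
import unicodedata


def char_script(ch):
    """Rough script guess (assumption for this sketch): the first word of the
    Unicode character name for letters, None for shared/neutral characters."""
    if unicodedata.category(ch).startswith("L"):
        return unicodedata.name(ch, "UNKNOWN").split()[0]
    return None  # space, punctuation, digits, marks: no script of their own


def script_runs(text):
    """Split text into (script, substring) runs.

    Neutral characters join the run that precedes them; neutral characters at
    the very start adopt the script of the first letter that follows.
    """
    runs = []  # each entry is a mutable [script, substring] pair
    for ch in text:
        script = char_script(ch)
        if runs and (script is None or script == runs[-1][0]):
            runs[-1][1] += ch          # same script, or a neutral: extend run
        elif runs and runs[-1][0] is None:
            runs[-1][0] = script       # leading neutrals adopt this script
            runs[-1][1] += ch
        else:
            runs.append([script, ch])  # a different script starts a new run
    return [(script, chunk) for script, chunk in runs]


# Two Hebrew words separated by a comma and a space (written as escapes).
sample = "\u05e9\u05dc\u05d5\u05dd, \u05e2\u05d5\u05dc\u05dd"
print(len(script_runs(sample)))  # 1: the comma and space stay in the Hebrew run
```

With this rule the comma and space between two Hebrew words do not interrupt the run, which is the behavior the cursor-movement complaint in Bug #1 expects.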

In order to become familiar with the scope of these bugs and annoyances, I would suggest reading through the entire document before attempting to follow all the steps in detail, but that’s just my opinion.

While ‘Bugs-and-a-Horse’ began as a series of steps needed to reproduce the behaviors of concern, it soon grew into a rambling (and possibly opinionated) essay. The ‘dead horse’ reference, an allusion to the English-language idiom ‘beating a dead horse,’ reflects another objective of ‘Bugs-and-a-Horse’: to reopen some occasionally passionate earlier discussions about similar bug reports that ended up being ignored, avoided, unresolved, or simply misunderstood. I’ll be referring to these as the ‘Voices of Reason’ discussions; the name is based on a July 2012 comment during those exchanges in which poster Shahar Or said that ‘the voices of reason’ have been present here. These can be seen at any of these links:

http://lists.freedesktop.org/archives/libreoffice/2012-June/033552.html
http://lists.freedesktop.org/archives/libreoffice/2012-July/034427.html
http://lists.freedesktop.org/archives/libreoffice/2012-July/034958.html 

The ‘Bugs-and-a-Horse’ document is intended to support my beliefs that a) due to progress, some fundamental sections of LibreOffice have become obsolete and perhaps even irrelevant since they were first developed, and b) the time to revise this subsystem is now, and it should be given priority. Therefore, some historical and architectural perspective supporting that view is included.

A second document, titled ‘Exploring_CTL.pdf’, is also attached as a supplemental discussion of various aspects of what is known as ‘Complex Text Layout.’
Comment 1 Frank 2015-07-09 15:31:56 UTC
Created attachment 117160
General discussion of Complex Text Attributes

Attached here, as the posting method only permits one file to be attached.
Comment 2 Cor Nouws 2015-07-09 16:01:18 UTC
Hi Frank,

Thanks a lot for the detailed writing...
You do know that we work with the rule: one report for one issue?
Ciao,
Cor
Comment 3 Frank 2015-07-09 16:19:13 UTC
I do indeed, but I'm a big fan of Admiral Grace Hopper who purportedly said it's easier to ask for forgiveness than for permission - or something to that effect.

That's why I began with:

Quote:
This submission deliberately ignores the usual recommended practice of creating separate reports for each bug or enhancement request. This is done for two reasons:

1) It seems almost certain that many – if not all – of the bugs and/or questionable behaviors described in this report are very closely related, as are the enhancement suggestions. ******It seems more advantageous to view these as a forest than as individual trees****** (as has been done in the past with a few of them) in order to properly assess their impact on non-Latin word processing.

2) Rather than suggesting that these bugs and questionable behaviors be ‘fixed,’ my suggestion is rather that a complete redesign and (hopefully) overhaul of that part of LibreOffice related to Languages, Scripts, and Complex Text Layout (CTL) be undertaken, with the whole concept of CTL being abandoned completely. It is unnecessary with contemporary operating systems, and it causes problems.
End Quote:

Regards ....

Frank
Comment 4 V Stuart Foote 2015-07-09 17:10:09 UTC
Well, as this is such a broad topic, and while it contains legitimate "issues" with the structure implementing CTL support, I am setting it to be an enhancement.

Let's see what comes of needsDevEval
Comment 5 Joel Madero 2015-07-09 20:43:01 UTC
This bug should be closed as INVALID and it should be split up in multiple bug reports. I know of absolutely no developer that has the time or patience to read a book to find out about 10 issues....
Comment 6 Joel Madero 2015-07-09 20:44:07 UTC
actually I'll close it myself - this isn't a valid report. It's impossible for a developer to "take an issue" because "taking the bug" means they take 10 issues (which no developer will do). Please report the issues separately with clear reproducible steps and sample documents where appropriate. Also please include your OS and version of LibreOffice in the comment. Thanks
Comment 7 V Stuart Foote 2015-07-09 21:35:01 UTC
@Joel, sorry, but the multipleBugs whiteboard tag clearly allows this, so closing it is a bit heavy-handed.

Frank's attached document is actually a rather good technical analysis laying out specifics of the problem in LibreOffice's implementation of CJK, CTL and support for non-Latin Unicode.

It has been re-annotated from its "bug" defect status to be an "enhancement", which is exactly what needs to be accomplished by redevelopment of how LibreOffice supports polyglot word processing.
Comment 8 Joel Madero 2015-07-09 21:45:37 UTC
Moved back to NEW with a lot of hesitation - having this kind of overwhelming bug report is a good way to get developers to ignore it in my opinion. So my recommendation is to split it up and then I believe (not positive) that there is already a "meta bug" for CTL issues that these individual bugs could block but - I'm not going to argue over the semantics of how to report a bug.

Why is it needsDevEval? It's new - and it's not an easy hack - what else is there to evaluate? NEW implied that someone confirmed all 10 of the issues listed already.
Comment 9 Frank 2015-07-09 22:02:10 UTC
Since the posting obviously annoyed you, I suppose I should respond in some fashion, so here goes:

You're right - no developer will take on ten bugs, and that certainly wasn't my intent. Perhaps I should have posted this somewhere else, but it wasn't obvious where that might be, since it relates more to poor design and/or architecture than bug reporting.

It's obvious you haven't actually read either of the documents, so here's a summary.

LibreOffice supposedly supports using multiple languages - but it really doesn't. It can of course be forced to incorporate several scripts at once, and the documents I posted are there to demonstrate how doing so is beyond painful. The examples are also structured so these examples can be easily reproduced by folks who have no familiarity at all with Thai, or Hindi, or Arabic, or Hebrew (among the variety of scripts and languages I use to demonstrate the problems).

Although a developer will not "take" all ten, it is probably close to impossible for any (single) developer to handle just one of these bugs in isolation without being aware of the other bugs AS WELL AS THE COMMON CONTEXT in which they appear. If they are not viewed as a whole, such a hypothetical developer would be forced to add yet more hacks to the handling of text presentation - something that is pretty fundamental to a word processor.

The listing of "bugs" was not intended as a request that each of these be fixed; it was intended as evidence that there is a fundamental issue with some obvious text layout commonality lurking beneath the surface.

1) If it helps to motivate you to read at least the first document through, it contains quotes from someone with the same name as you who's been posting on this list for a long time ... (re: your comment "to find out about 10 issues": this guy with the same name as you said he didn't even know what CTL was used for. I'm merely attempting to let him know what that is so he can see why it doesn't work well at all and confirm that for himself.)

2) If it helps to motivate you to read at least the first document through, were you aware that there are repeatable instances where right-to-left text is laid out left-to-right with the characters in reverse order?

3) If it helps to motivate you to read at least the first document through, were you aware that entering numbers to begin paragraphs in Hebrew or Arabic text is actually funny to watch? Try it - it's sure to make you think that Candid Camera is lurking somewhere in the background!

At least load the first pdf into a tablet or e-reader and save it for when you need something to read in the bathroom - you might find that there is actually something wrong - something that might possibly explain why LibreOffice is more or less not anyone's first choice when utilizing multiple languages.

Have a great day - and if I ticked you off (yes, it came through loud and clear), I apologize; that was not at all my intent. I actually want to use the product - I want all the capabilities it has, which the apps that handle multiple languages just don't have.

Frank
Comment 10 Frank 2015-07-09 22:05:29 UTC
Hi again, Joel:

I read the note you just added. I should also mention that the document discusses how such a redesign might be approached (I'm not actually ignorant of development processes - I'm just old and retired and unused to volunteer developers picking and choosing).

If you're having trouble deciding what to do with this (and I'm unaware of the "meta-bug" you referred to), at least glance through what I wrote.

Have a great day!

Frank
Comment 11 V Stuart Foote 2015-07-09 22:19:09 UTC
(In reply to Joel Madero from comment #8)
> Why is it needsDevEval? It's new - and it's not an easy hack - what else is
> there to evaluate? 

Is anyone but a developer (hopefully Frank's "ideal developer" as on page 21 of the treatise) comfortable with the underlying code? I worked several of the examples and find no flaw with Frank's argument for needed improvements.

> NEW implied that someone confirmed all 10 of the issues
> listed already.

No, it is NEW because there is no question that, when accepted as an enhancement, it is (while perhaps overambitious in scope) a valid enhancement to LibreOffice, and so is deserving of developer review and comment. In any case, Frank's write-up is an enjoyable read.
Comment 12 Joel Madero 2015-07-09 22:23:26 UTC
Didn't tick me off at all ;) I'm blunt - I'm known for being blunt, I don't apologize for being blunt, if I was ticked off my choice of language would have been more colorful :)

My point was just that if the goal is to have issues resolved - the best way to do that is to have 1 bug per report. If the goal is instead just to list out some issues for "future discussion" - then I think this is just fine (although then I might suggest the UX mailing list is a better place for discussion).

But VSF is a committed contributor and when I disagree with another contributor who knows how things move forward - I tend to err on the side of "they win" unless it's more serious than this :)

Thanks all for your continued patience, contributions, and support of LibreOffice!
Comment 13 ⁨خالد حسني⁩ 2015-07-10 00:16:09 UTC
A 48-page bug(s) description document is the best way to make sure that no developer will ever look into this. I mean, I’m a native speaker of a language whose script is complex and also written from right to left, and I have hacked the text layout code of LibreOffice quite a bit, and I couldn’t get past page 2, not to mention that I’m still confused as hell about the page and a half that I managed to read and I can’t tell what any of this is about.
Comment 14 Joel Madero 2015-07-10 03:56:06 UTC
Glad one of our most respected developers agrees with me that this "bug report" is completely useless ;) Again though - feel free to leave it open as NEW and let no developer ever touch it.

Also I just found time to fully read that last remark to me and the smart ass tone is another awesome way to get people to ignore whatever issues this might raise.
Comment 15 Cor Nouws 2015-07-10 08:27:26 UTC
Just as a contrast (not that I dislike variation in people's styles), here is what I learned from Synerzip/CloudOn devs at the Bern conference.

- For each issue they have one document with often just one word/line/graphic.
- Open in A and do/or open in B.
- It looks X (see e.g. picture) and should do Y (see picture).

That is IMO more inviting to check and handle.
Comment 16 Joel Madero 2015-07-12 16:36:57 UTC
Given the feedback to split this up, and given that expert advice has already been provided (Khaled Hosny is an expert), I'm closing this as INVALID again. Please do as others have requested (again, including Khaled Hosny, who is an expert in the code, understands CTL, has been involved in that area of code, and took the time to read 2 pages of that novel without understanding any of it) and report these issues separately in a clear and succinct way (like Cor suggested), without providing the steps in an attachment.

So to summarize:
One bug = one report;
One report contains reproducible steps in the report, not in an attachment;
Each report should have a *simple* attachment demonstrating the issue;
Screenshots can be nice as well.
Comment 17 Robinson Tryon (qubit) 2015-12-18 10:39:47 UTC
Migrating Whiteboard tags to Keywords: (needsDevEval)
[NinjaEdit]
Comment 18 Michael Meeks 2016-03-12 21:31:13 UTC
I agree with Joel, but I love the thought and attention that has gone into this report too =) Thanks Frank !

Here is what I suggest; if Frank has the time, I'd like to see a tracker bug for this; potentially we could re-title this one; or just create a smaller one like the CJK meta-bug above; and then split each related item out; many of them are feature requests, some are bugs - eg. the semi-random font substitution issue particularly across platforms is a known horror that we are looking into fixing in various ways (at least on Windows) with DirectWrite.

Anyhow - so - I don't want to lose the goodness here; and I'd like to provide a place for QA to aggregate this lot together so that (in some distant future) we can gather like-minded developers onto this, put together a plan and execute on it to make this beautiful in LibreOffice.

Frank - you seem rather clueful =) are you able to hack on any of these too ?

Thanks !
Comment 19 Frank 2016-03-12 22:11:06 UTC
(In reply to Michael Meeks from comment #18)
Hi Michael:

I was pleasantly surprised to learn that someone actually read my document. Before retirement, I was part of the "corporate" world, and was rather astonished to learn how different the world of open source was, and naively thought I could make some architectural contribution in an area with which I have a lot of familiarity.

As for any further involvement on my part, I would have to say No: I've learned my lesson.

I do, in fact, still use LibreOffice when it's called for, but certainly not for anything that requires multiple languages. And I continue to find that each automatic update causes stomach cramps and itchy eyeballs as I wait to see what's been broken (in the version I currently have - Version: 5.0.5.2: Build ID: 1:5.0.5~rc2-0ubuntu1~trusty1: Locale: en-US (en_US.UTF-8) - for instance, I find that when I work with Writer, the memory leaks are way beyond what I've experienced since the days of 3.x, to the point where my system sometimes locks up completely. And it's been quite a while since I've experienced that with any Linux distro).

"Testing" seems to consist of "seeing if some change works," which is of course about the least significant part of that process in professional circles. Another attitude that makes me unsuitable for any such effort is the seemingly utter disdain for architectural underpinnings. Ugh!

I understand that working on bugs ("hacking" as you refer to it) is how young or inexperienced programmers can get involved with something "real," and I certainly think that's a good thing (although in the real world, we would have a mentor, something the open source community likely can't afford). And I realize that complaining about a product that does many things well and for FREE is a bit disingenuous on my part.

But thanks for what seem to be some kind words, and best of luck.

Frank
Comment 20 Eyal Rozenberg 2023-04-13 19:07:25 UTC
Hello Frank,

I have just stumbled upon this "bug", which is more like a manifesto / position paper / perspective on multi-language and multi-script handling in LO.

I wish I'd learned about it sooner, since it relates very closely to several individual bugs I have filed - taking the "one bug at a time" approach, despite those sometimes being provocative or a mote-in-the-eye. Thanks go to Mike Kaganski for pointing me here.

I'm going to mark some of these as related; I'll then read your attached document, even if things might have changed a bit over 8 years; then I hope we can have a bit of a discussion; and finally, perhaps we would be able to file additional separate and more actionable bugs based on what you've written. That is, if you're still interested.

So,

Bug 151290: A language must not be a feature of a character/paragraph style
Bug 148257: Need ability to explicitly set the language of a piece of text 
Bug 151215: Support different fonts for different languages in the same group
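
The language-versus-script distinction that runs through the original description (Annoyance #2+) and the bugs listed above can be made concrete with a short Python sketch. It assumes text is tagged with BCP 47 language tags that carry an explicit script subtag, and it decides direction from the script rather than from the language. The helper names `script_subtag` and `is_rtl`, and the small `RTL_SCRIPTS` set, are hypothetical and deliberately incomplete; this is not how LibreOffice currently stores or interprets language attributes.

```python
# ISO 15924 codes for a few right-to-left scripts (illustrative, not complete).
RTL_SCRIPTS = {"Arab", "Hebr", "Syrc", "Thaa", "Nkoo"}


def script_subtag(tag):
    """Return the 4-letter script subtag of a BCP 47 tag, or None if absent."""
    for part in tag.replace("_", "-").split("-")[1:]:
        if len(part) == 1:
            break  # a singleton starts an extension or private-use section
        if len(part) == 4 and part.isalpha():
            return part.title()
    return None


def is_rtl(tag):
    """Direction follows the script subtag, not the language subtag."""
    return script_subtag(tag) in RTL_SCRIPTS


# Punjabi is written in both the Arabic script and Gurmukhi.
print(is_rtl("pa-Arab-PK"))  # True
print(is_rtl("pa-Guru-IN"))  # False
```

The same language (`pa`) gives two different answers depending on the script, which is the point of treating layout complexity as a property of the script. A tag without a script subtag (for example `he`) returns False here; a fuller version would need default-script data for each language.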