Created attachment 52834
A sample ODT text with multiple diacritical marks in the letters
In some RTL languages (and perhaps in some LTR languages too but I don't know), there are diacritical marks added to the letters which stand for vocalization, syntax and other matters. These marks are independent Unicode characters, but they appear within or attached to a specific letter. One letter can have several such marks attached to it.
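As a minimal illustration of marks being independent Unicode characters, the Python sketch below lists the code points behind one pointed Hebrew letter. The sample string (SHIN with a dagesh, a shin dot and a qamats) is an assumption chosen for illustration and is not taken from the attached ODT; `describe` is a hypothetical helper name.

```python
import unicodedata

# Assumed sample: one visible letter built from four code points.
pointed_letter = "\u05e9\u05bc\u05c1\u05b8"

def describe(text):
    """Return (code point, Unicode name, combining class) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch))
        for ch in text
    ]

for row in describe(pointed_letter):
    print(row)
```

The first entry (the letter itself) has combining class 0, while each mark has a non-zero combining class, which is why a renderer attaches the marks to the preceding letter.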
In most common word processors (e.g. Word and Google Docs), the cursor takes these marks into account when it moves. For instance, if a letter has two such marks the cursor will move over the letter in three steps. This allows for convenient manual editing, because an individual mark can easily be selected, deleted, or copied elsewhere.
In LibreOffice Writer, however, the cursor moves over each letter in a single step, no matter how many diacritical marks it has, thus making it difficult or impossible to accomplish basic editing functions in vocalized texts.
This bug was originally reported to the Hebrew OO/LO forum, where it was pointed out that the LO behavior also has an advantage in making it less tedious for the cursor to pass through large blocks of text. The discussion appears here (in Hebrew):
The question therefore is whether LO currently provides an option to choose the behavior of the cursor. If that option currently exists then no one has yet been able to locate it; if it doesn't exist then it should be provided to the user at least as an option and possibly as the default. Otherwise, the cursor should behave as in Word and Google Docs so that all normal editing functions are possible.
A sample ODT text with multiple diacritical marks in the letters is attached.
I can confirm this. As one who works with Hebrew poetry, which contains many vowel marks, this bug harms my productivity.
I have a few comments to add:
1. Note that RTL languages use diacritical characters a little differently from Latin-script languages. Latin-script languages have letters with diacritical marks above them as part of the keyboard layout, and each such letter is treated as a single character. RTL languages add the diacritical characters next to the letter, and word processors treat them as separate characters.
2. For those who don't read Hebrew: besides describing the problem, the thread linked in the first message raises two other issues:
- If you put the cursor before the letter and press Delete, the letter and all the diacritical characters that follow it are deleted. If you put the cursor after the letter and press Backspace, only one diacritical character is deleted. Pressing Backspace again deletes the next diacritical character, and so on until none are left and the letter itself is deleted.
- While the issue in this bug is a valid problem, I'm not sure what the desired behavior is. Needing many more keystrokes to move the cursor because of the diacritical characters might be too much (think of children's books, which usually have diacritical characters).
3. The same problem is present in both Linux and Windows.
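The difference described in point 1 can be sketched with Unicode normalization in Python. The two sample strings are assumptions chosen for illustration: a Latin "e" followed by a combining acute accent, and a Hebrew BET followed by a QAMATS. Under NFC the Latin pair composes into a single precomposed character, while the Hebrew pair stays as two code points.

```python
import unicodedata

latin_decomposed = "e\u0301"     # "e" + COMBINING ACUTE ACCENT
hebrew_pointed = "\u05d1\u05b8"  # BET + QAMATS

# NFC composes a base and a mark only when a suitable precomposed character exists.
latin_nfc = unicodedata.normalize("NFC", latin_decomposed)
hebrew_nfc = unicodedata.normalize("NFC", hebrew_pointed)

print(len(latin_nfc), len(hebrew_nfc))  # prints "1 2"
```

So for pointed Hebrew a word processor always has to deal with a base letter plus separate mark characters, whatever normalization is applied.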
The "wanted behavior" obviously has to support the full range of editing. And most Hebrew text doesn't have a lot of diacritical marks, so stepping through the marks would not slow the cursor down much in it.
So I move that we decide the "wanted behavior" is default like in Word and Google Docs, but with an added option to change the behavior of the cursor so that it will pass over letters in a single step like now.
What LibreOffice basically does, or tries to do at least, is follow http://unicode.org/reports/tr29/ and it should work like so...
a) The cursor keys skip over a full grapheme
b) A "delete" deletes a full grapheme
c) A backspace however, chips away a single codepoint at a time
d) An alt+cursor key moves a single code-point at a time
So... you should be able to use alt+cursor to move the cursor inside a grapheme codepoint-by-codepoint and delete a single codepoint that makes that up. Cursor sort of lacks smarts so it doesn't have a way to indicate which constituent codepoint it's currently addressing, but alt+cursor should allow individual code-point addressing anyway.
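The four behaviours a) to d) can be sketched in Python on a plain string. This is a simplified model and not LibreOffice's implementation: it assumes a grapheme is a base character plus any following characters in a Unicode "Mark" category, which is much cruder than the TR29 rules. All function names are hypothetical, and positions are indices in logical (storage) order.

```python
import unicodedata

def is_mark(ch):
    # Simplification: every Mark category (Mn, Mc, Me) attaches to the preceding base.
    return unicodedata.category(ch).startswith("M")

def next_grapheme(text, pos):
    """a) Cursor key: skip the current character and all marks attached to it."""
    if pos >= len(text):
        return len(text)
    pos += 1
    while pos < len(text) and is_mark(text[pos]):
        pos += 1
    return pos

def delete_forward(text, pos):
    """b) Delete: remove everything up to the next grapheme boundary."""
    return text[:pos] + text[next_grapheme(text, pos):]

def backspace(text, pos):
    """c) Backspace: remove only the single code point before the cursor."""
    if pos == 0:
        return text, 0
    return text[:pos - 1] + text[pos:], pos - 1

def next_codepoint(text, pos):
    """d) Alt+cursor key: advance by exactly one code point."""
    return min(pos + 1, len(text))

# Assumed sample: SHIN + DAGESH + SHIN DOT + QAMATS, then LAMED.
sample = "\u05e9\u05bc\u05c1\u05b8\u05dc"
print(next_grapheme(sample, 0), next_codepoint(sample, 0))
```

In this model the cursor key jumps from index 0 to index 4, while alt+cursor stops at 1, 2 and 3 on the way, which is what makes a single mark reachable for Backspace.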
i18npool/source/breakiterator/data/char.txt is involved in what's considered a grapheme cluster or not FWIW
So, looking only at comment #1: how and where do Word and Google Docs draw the cursor when it is addressing a diacritical mark? That is, is it visually obvious whether the mark or the affected character is being addressed? A screenshot might be helpful for that at least.
Thanks Caolán for the additional information.
What was most interesting to me was d). I tried out alt+cursor and it does indeed move one code-point at a time. That is already very helpful for editing in LO. It would be nice if there was a way to make people aware of this feature, because there seems to be no indication of it for the user and no one (even LO programmers) seems to be aware of it when asked...
From trying it out, however, what alt+cursor does *not* seem to allow is selection of an individual code point. If you move along with alt+cursor while also holding shift in order to select the text, the cursor doesn't move at all, so the code point cannot be selected and pasted elsewhere. This is a very real gap in functionality.
Regarding the way the cursor moves graphically in Word and Google Docs: It approximates how far it has moved through the full grapheme, appearing half-way through in the middle, or 1/3 or 2/3 through. The visual representation isn't perfect, but it is functional enough and the user gets a sense as to where the cursor is "actually" standing. When you "select" one code-point within the grapheme (including the letter itself without the additional code-points), only part of the width is colored as "selected". As an example, I will now attach a screenshot of an example in Google Docs where a letter was followed by two code-points, and the first of these code points (i.e. the "middle" of the three total characters) is selected, the picture showing the selection as the middle third of the full grapheme highlighted.
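The proportional placement described above can be sketched as follows. This is only a guess at the observable behaviour, not how Word or Google Docs actually compute it: the grapheme's width is assumed to be divided evenly among its code points, and `caret_offsets` and `selection_span` are hypothetical helper names.

```python
def caret_offsets(grapheme_width, codepoint_count):
    """Horizontal caret positions: one before each code point plus one at the end."""
    return [grapheme_width * k / codepoint_count for k in range(codepoint_count + 1)]

def selection_span(grapheme_width, codepoint_count, index):
    """Highlighted interval when only the code point at `index` is selected."""
    offsets = caret_offsets(grapheme_width, codepoint_count)
    return offsets[index], offsets[index + 1]

# A letter followed by two marks (three code points), drawn 12 units wide.
print(caret_offsets(12.0, 3))      # [0.0, 4.0, 8.0, 12.0]
print(selection_span(12.0, 3, 1))  # (4.0, 8.0)
```

With three code points, selecting the middle one highlights the middle third of the grapheme, matching the screenshot description.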
Created attachment 53332
A letter with two added code points, where the first code-point has been individually selected in Google Docs.
uh huh, not great either in e.g. google-docs
a) It might be that we haven't implemented alt+shift+arrow, or that it's eaten by the window manager before it gets to us. That needs looking at anyway.
b) Maybe we can come up with some better scheme to visually indicate selection of part of the underlying codepoints that make up a glyph. Graphite has some awesome stuff there, but that needs Graphite fonts anyway I think, so we still need a general-case solution
c) Maybe word-of-mouth will be sufficient to get alt+cursor known now :-)
"uh huh, not great either in e.g. google-docs"
Actually it works great in Google Docs. Try it out at the link below, which I made open to editing:
I added "ক্ট্র" which is Bengali KA+HALANT+TTA+HALANT+RA, i.e. 5 codepoints that make up one glyph/grapheme. For that it seems to end up with the "extra" cursor positions well outside the glyph.
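A quick Python check of that string, as a sketch: it lists the five code points and then applies a deliberately naive "base plus following marks" grouping (an assumption for illustration, not the TR29 rules and not what char.txt does). Because TTA and RA are letters rather than marks, the naive rule splits the conjunct into three pieces, which hints at why script-specific cluster rules are needed.

```python
import unicodedata

conjunct = "\u0995\u09cd\u099f\u09cd\u09b0"  # KA + HALANT + TTA + HALANT + RA

names = [unicodedata.name(ch) for ch in conjunct]

def naive_clusters(text):
    """Group each non-mark character with the marks that directly follow it."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

print(len(conjunct), names)
print([len(c) for c in naive_clusters(conjunct)])
```

Note that Unicode names the HALANT character "BENGALI SIGN VIRAMA".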
Since this is LibreOffice's expected behaviour according to comment #4, shouldn't we close this bug?
yeah... that's true. It is working as intended. But the cursor positioning is still suboptimal, not that google docs or msword appears to do a lot better than us on my comment #9 example.
I've logged a new bug, bug 54494, specifically for improving the cursor traversal, with some notes and a half-remembered demo of a super-cool cursor solution that I once saw in some SIL-sponsored software. I'll set this as a duplicate of that to keep the association.
*** This bug has been marked as a duplicate of bug 54494 ***
(In reply to Caolán McNamara from comment #11)
> yeah... that's true. It is working as intended. But the cursor positioning
> is still suboptimal, not that google docs or msword appears to do a lot
> better than us on my comment #9 example.
> I've logged a new bug as bug 54494 specifically for improving the cursor
> traversal with some notes and a half remembered demo I saw of a super-cool
> cursor solution I saw in a demo of some SIL sponsored software I saw once
> and I'll set this as a duplicate of that to keep the association
> *** This bug has been marked as a duplicate of bug 54494 ***
Is it related to my bugs 91764 and 100854?
How can I set LibreOffice to support diacritics, so that it does not skip over them when navigating with the right arrow key?
Thanks for your help and God bless you.