I am using LibreOffice 3.3.3 on Kubuntu Natty and 3.4.2 on Windows XP. On both installations I observe the following bug:

What I did: I typed the Tamil word சித்திரை (name of a month) into an LO application. I used the left- and right-cursor keys to navigate the word.

Expected behaviour: The "correct" native-user-perceived grapheme cluster split-out is: சி|த்|தி|ரை (ci|t|ti|rai) as in Tamil, vowelless consonants are considered independent grapheme clusters on their own. So it should be possible to place the cursor at any of the above positions indicated by the |, especially between த் and தி.

Actual behaviour: LO analyses the word as சி|த்தி|ரை (ci|t.ti|rai) and does not allow cursor placement between the த் and தி. When my cursor is to the left of த் and I press the right-cursor key, the cursor moves to the right of தி, and vice versa with left-cursor key.

Background: Generally in Indic scripts like Devanagari, a vowelless consonant would be taken into the same grapheme cluster as a following consonant, as mostly ligatures or conjoining forms between the consonants will occur in such cases. For example, the same word presented in Devanagari would be analysed as चि|त्ति|रै (ci|t.ti|rai) since the vowelless "t" ligates (or in the absence of ligature in a font, takes a conjoining form) with the following consonant.

However in Tamil, vowelless consonants never ligate or form conjoining forms with following consonants. (The only exception is க் ligating with ஷ to form க்ஷ.) Therefore vowelless consonants in Tamil are always perceived by native users as grapheme clusters on their own. Therefore a native user expects to be able to place a cursor immediately before or after a vowelless consonant, so: சி|த்|தி|ரை

Other applications (for example Firefox, which I am using now to report this bug) correctly treat த் and தி as separate grapheme clusters in the word சித்திரை.
Note: This faulty behaviour is seen for all CONSONANT + VIRAMA + CONSONANT sequences in Tamil in LibreOffice. The one obvious exception is க்ஷ i.e. KA + VIRAMA + SSA where the ligature is formed and so the native user does *not* expect to place the cursor in the middle of it.
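To make the expected behaviour concrete, here is a minimal Python sketch of the segmentation described above. It is not LibreOffice code; the character ranges and the names (LETTER, SIGN, expected_clusters) are simplifying assumptions made for illustration only, and the sketch ignores characters outside the basic Tamil letters, vowel signs and virama.

```python
import re

# Assumed, simplified character classes (not taken from LibreOffice's rule file).
KA, SSA, VIRAMA = "\u0B95", "\u0BB7", "\u0BCD"
LETTER = "[\u0B85-\u0BB9]"   # independent vowels and consonants
SIGN = "[\u0BBE-\u0BCD]"     # dependent vowel signs and the virama

CLUSTER = re.compile(
    f"{KA}{VIRAMA}{SSA}{SIGN}*"  # KA + VIRAMA + SSA ligature stays in one cluster
    f"|{LETTER}{SIGN}*"          # otherwise: one letter plus its trailing signs
    "|.",                        # anything else is a cluster by itself
    re.DOTALL,
)

def expected_clusters(text):
    """Split text into the clusters the cursor is expected to stop between."""
    return [m.group(0) for m in CLUSTER.finditer(text)]
```

With these assumptions, expected_clusters("சித்திரை") yields the four clusters சி, த், தி, ரை, while க்ஷ comes back as a single cluster.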
Thanks for the bug report. Reproduced with text copy-pasted from here in 3.5.2 on Fedora 64-bit, but not reproducible in 3.3.4, so this is a regression.
@Caolan: What do you think about this bug?
i18npool/source/breakiterator/data/char_in.txt might need some sort of adjustment (or it's just wrong), and i18npool/qa/cppunit/test_breakiterator.cxx should be updated with these examples.

The rule we're apparently following is:

$TamilLetter = [\u0B85-\u0BB9];
$TamilSignVirama = \u0BCD;
$TamilLetter ($TamilSignVirama $TamilLetter?)+;

It probably needs a bit more head-scratching to get that rule right.
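As a rough illustration of why that rule glues த் to the following consonant, the rule can be approximated with an ordinary regular expression. This is only a sketch: real break-iterator rules are not evaluated like Python regexes, and OLD_RULE and glued_spans are hypothetical names, not anything from the LibreOffice sources.

```python
import re

# Regex approximation of: $TamilLetter ($TamilSignVirama $TamilLetter?)+
# Assumption: a plain regex is close enough to show which spans are kept unbroken.
OLD_RULE = re.compile("[\u0B85-\u0BB9](?:\u0BCD[\u0B85-\u0BB9]?)+")

def glued_spans(text):
    """Return the substrings the approximated rule would keep unbroken."""
    return [m.group(0) for m in OLD_RULE.finditer(text)]
```

Applied to சித்திரை, the only span matched is TA + VIRAMA + TA, which is exactly the position where the reporter cannot place the cursor.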
Caolan McNamara committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=16cd97480d0681d37f86e89366e1f9964ec16ef8 Resolves: fdo#40292 Tamil grapheme cluster rules
I'll freely admit I'm no expert here, but the provided examples now apparently do the expected thing. Daily builds that should include this fix for testing will be available tomorrow (or the day after) at e.g. http://dev-builds.libreoffice.org/daily/Win-x86@6-fast/master/current/ (for Windows) and http://dev-builds.libreoffice.org/daily/Linux-x86_10-Release_Configuration/master/current/ (for Linux).
Thanks!
Now confirmed fixed on the latest trunk, viz. http://dev-builds.libreoffice.org/daily/Win-x86@6-fast/master/current/master~2012-06-14_22.09.53_LibO-Dev_3.7.0alpha0_Win_x86_install_en-US.msi

BTW it doesn't seem to be fixed in the LO 3.5 series. I'm using LO 3.5.3 on Kubuntu Precise, where this problem still exists.

BTW don't you think that this rule in your commit http://cgit.freedesktop.org/libreoffice/core/commit/?id=16cd97480d0681d37f86e89366e1f9964ec16ef8

+$TamilSsa $TamilSignVirama $TamilKa;

should be:

+$TamilKa $TamilSignVirama $TamilSsa;

See the other rules in your commit.
Re the versions it's in: see the "whiteboard" section above. That's supposed to keep track of which versions a fix was committed to. I only committed to "master" and didn't backport to 3.5.X.

Re the rules:

+$TamilKa $TamilSignVirama $TamilSsa; <- this one is for going forwards
...
+$TamilSsa $TamilSignVirama $TamilKa; <- this one is for going backwards

so I think this is what we want here.
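The forwards/backwards distinction can be sketched in a few lines of Python, assuming (as the explanation above implies) that the backward rule is matched against the characters as they are met when moving from right to left. FORWARD, REVERSE and seen_going_backwards are illustrative names only.

```python
KA, SSA, VIRAMA = "\u0B95", "\u0BB7", "\u0BCD"

FORWARD = [KA, VIRAMA, SSA]   # character order written in the forward rule
REVERSE = [SSA, VIRAMA, KA]   # character order written in the backward rule

def seen_going_backwards(text):
    """List the characters of text in the order met when moving right to left."""
    return list(reversed(text))
```

Moving backwards over the ligature KA + VIRAMA + SSA, the characters are met as SSA, VIRAMA, KA, which is the order the backward rule spells out.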