Bug 33092 - Autocomplete display double character for this word [CTL / Thai]
Summary: Autocomplete display double character for this word [CTL / Thai]
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 3.3.2 release
Hardware: Other Windows (All)
Importance: medium normal
Assignee: Caolán McNamara
Depends on:
Blocks: AutoCorrect-Complete 33891
Reported: 2011-01-13 23:59 UTC by Tantai
Modified: 2016-10-19 23:31 UTC

See Also:
Crash report or crash signature:

Both lines are autocompleted. Only the first occurrence displays the doubled character. (12.03 KB, image/png)
2011-02-15 21:05 UTC, Samphan Raruenrom
The experimental patch :- (530 bytes, patch)
2011-03-01 01:21 UTC, Tantai
The result content.xml :- (2.82 KB, text/xml)
2011-03-01 01:22 UTC, Tantai
The bugdoc, odt with overlapped autocomplete text (8.78 KB, application/vnd.oasis.opendocument.text)
2011-03-02 04:39 UTC, Samphan Raruenrom
Experimental patch to see what happens if we don't SetLanguage() when inserting a string from autocorrect (1.21 KB, patch)
2011-04-12 04:14 UTC, Samphan Raruenrom
does this work for you ? (11.22 KB, patch)
2011-06-03 08:49 UTC, Caolán McNamara

Description Tantai 2011-01-13 23:59:06 UTC
Steps to reproduce:

1. Open a blank Writer document.
2. Switch the keyboard to Thai.
3. Type the text “มิถ” by pressing “,b5”.
4. The autocomplete function will display the full word.

Expected result:
Autocomplete should display the text as “มิถุนายน”.

Actual result:
It displays the text with the second character doubled and overlapped.
(See the pictures)

This problem occurs when some text is split into 2 runs. If a non-spacing
vowel mark (e.g. ุ) is the first character of the second run, the text
is displayed overlapped. However, the problem occurs only in the first
autocompleted word in a document (first by position in the document). I've
pasted 2 snippets of content.xml to show examples:

A: Western autocomplete case:
<text:p text:style-name="Standard">January</text:p><text:p

B1: Thai autocomplete case with overlapped text:
<text:p text:style-name="Standard">มิถ<text:span text:style-name="T1">
ุนายน</text:span></text:p><text:p text:style-name="Standard"><text:span

B2: Thai autocomplete case without problem:
<text:p text:style-name="Standard">กรก<text:span text:style-name="T1">
ฎาคม</text:span></text:p><text:p text:style-name="Standard"><text:span

In both B1 and B2, the first autocompleted word is split into two runs; in B1
(e.g. "มิถุนายน") the split falls right before the vowel mark (ุ). In the second
autocompleted word (e.g. "มิถุนายน"), by contrast, the entire autocompleted word is
placed in the text:span element. OOo seems unable to display a non-spacing mark
at the beginning of a text run, which is why the text is displayed overlapped.

Note that the autocompleted word is always split if you insert it before any
other autocompleted word. That is, the split happens only for the first
occurrence of such autocompleted words.
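The condition described above (a run that begins with a non-spacing mark) can be checked mechanically. The following is a minimal sketch, not Writer's own logic: it assumes the runs have already been extracted as plain strings, and the helper names are hypothetical. It relies only on the Unicode general category "Mn" (non-spacing mark), which is the category of ุ (U+0E38).

```python
import unicodedata

# Runs taken from cases B1 and B2, written as escapes so the leading
# combining mark cannot be lost in copy and paste.
B1_RUNS = ["\u0e21\u0e34\u0e16", "\u0e38\u0e19\u0e32\u0e22\u0e19"]  # "มิถ" + "ุนายน"
B2_RUNS = ["\u0e01\u0e23\u0e01", "\u0e0e\u0e32\u0e04\u0e21"]        # "กรก" + "ฎาคม"


def starts_with_nonspacing_mark(run: str) -> bool:
    """True if the first character of the run is a non-spacing mark (Mn)."""
    return bool(run) and unicodedata.category(run[0]) == "Mn"


def risky_run_indexes(runs):
    """Indexes of runs (after the first) whose first character is a mark
    that has to attach to the last character of the previous run."""
    return [i for i, run in enumerate(runs)
            if i > 0 and starts_with_nonspacing_mark(run)]


print(risky_run_indexes(B1_RUNS))  # [1]
print(risky_run_indexes(B2_RUNS))  # []
```

Only B1 is flagged, which matches the observation that B2 is split into two runs as well but renders without overlap.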
Comment 1 Samphan Raruenrom 2011-02-15 21:05:17 UTC
Created attachment 43411 [details]
Both lines are autocompleted. Only the first occurrence displays the doubled character.
Comment 2 Caolán McNamara 2011-02-16 06:55:09 UTC
Because the autocorrect seems to only collect words that are typed, and not pasted in, what are the keystrokes to generate the full มิถุนายน string?
Comment 3 Michael Meeks 2011-02-17 03:27:57 UTC
I did a related fix around some of this code here:

commit 4635182111d70e7c0dfc3a2fc1de61846c82e8a1
Author: Michael Meeks <michael.meeks@novell.com>
Date:   Fri Dec 3 20:26:24 2010 +0000

    autocomplete using the context's case i#22961

22      2       sw/source/ui/docvw/edtwin.cxx

possibly breaking CTL of course ;-)
Comment 4 Tantai 2011-03-01 01:21:20 UTC
Created attachment 43953 [details]
The experimental patch :-
Comment 5 Tantai 2011-03-01 01:21:40 UTC
I experimented with the attached patch and then retried the autocomplete test case above. The resulting content.xml is attached below. The bug appears to be solved. I wonder why Writer must specify the language when it autocompletes some text. Why is that needed?
Comment 6 Tantai 2011-03-01 01:22:42 UTC
Created attachment 43954 [details]
The result content.xml :-
Comment 7 Samphan Raruenrom 2011-03-02 04:37:37 UTC
A: content.xml from LO before applying the experimental patch (see the next attachment)
<style:style style:name="T1" style:family="text">
   <style:text-properties style:language-complex="th" style:country-complex="TH"/>
<text:p text:style-name="Standard">มิถ<text:span text:style-name="T1">

B: after applying the experimental patch, there's no auto-style T1
<text:p text:style-name="Standard">มิถุนายน</text:p>

I'm wondering why, in the first case, the auto-style T1 that switches to locale th_TH is needed at all.
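To inspect a document for this pattern, the runs and their explicit complex-script language can be read straight from content.xml. The sketch below is only illustrative: SAMPLE is a hand-built, well-formed document modeled on fragment A above (the wrapper elements are added here, and the span is kept on one line), and paragraph_runs is a hypothetical helper. The namespace URIs are the standard ODF ones.

```python
import xml.etree.ElementTree as ET

NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}

# Assumption: a minimal document built around fragment A, not the real attachment.
SAMPLE = """<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <office:automatic-styles>
    <style:style style:name="T1" style:family="text">
      <style:text-properties style:language-complex="th" style:country-complex="TH"/>
    </style:style>
  </office:automatic-styles>
  <office:body><office:text>
    <text:p text:style-name="Standard">มิถ<text:span text:style-name="T1">ุนายน</text:span></text:p>
  </office:text></office:body>
</office:document-content>"""


def q(prefix, name):
    """Build the '{namespace}name' form that ElementTree uses."""
    return "{%s}%s" % (NS[prefix], name)


def paragraph_runs(xml_text):
    """Return, per paragraph, a list of (text, explicit complex language or None)."""
    root = ET.fromstring(xml_text)
    # Map each style name to the complex-script language it sets, if any.
    langs = {}
    for st in root.iter(q("style", "style")):
        props = st.find(q("style", "text-properties"))
        if props is not None:
            langs[st.get(q("style", "name"))] = props.get(q("style", "language-complex"))
    paragraphs = []
    for p in root.iter(q("text", "p")):
        runs = []
        if p.text:
            runs.append((p.text, None))  # text directly in the paragraph
        for span in p.findall(q("text", "span")):
            style_name = span.get(q("text", "style-name"))
            runs.append((span.text or "", langs.get(style_name)))
            if span.tail:
                runs.append((span.tail, None))  # text after the span
        paragraphs.append(runs)
    return paragraphs


for runs in paragraph_runs(SAMPLE):
    print(runs)
```

For the sample, the paragraph comes back as two runs: the typed prefix with no explicit language, and the completion carrying an explicit "th".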
Comment 8 Samphan Raruenrom 2011-03-02 04:39:49 UTC
Created attachment 44017 [details]
The bugdoc, odt with overlapped autocomplete text
Comment 9 Samphan Raruenrom 2011-04-12 04:14:43 UTC
Created attachment 45511 [details]
Experimental patch to see what happens if we don't SetLanguage() when inserting a string from autocorrect

The result is that this bug is gone. We haven't seen any side effects yet. So I am wondering whether SetLanguage() before inserting a string from autocorrect is needed at all, and if so, why. The user is typing in the same language as the autocorrect string anyway, so why switch to the same language?
Comment 10 Caolán McNamara 2011-06-03 06:14:34 UTC
Because the autocorrect seems to only collect words that are typed, and not
pasted in, what are the keystrokes to generate the full มิถุนายน string?
Comment 11 Caolán McNamara 2011-06-03 07:02:04 UTC
Okey-dokey. Windows and Linux IMs work differently. The Windows one provides the language used, while the Linux ones don't. That's why I couldn't reproduce this.

With some hackery in place I can see this problem now.
Comment 12 Caolán McNamara 2011-06-03 08:48:38 UTC
So I think comment #9 is basically correct. I don't have Windows running here at the moment, so I'm going to speculate...

On a Thai system with a Thai IM (Input Method), no explicit language is forced onto text entered through the IM, as the defaults are considered to be OK. When the autocorrect is triggered, it asks for the current IM language, which is Thai, and forces it onto the accepted output. That gives two runs: a "standard" run, which is determined to be Thai because it has CTL chars in it and the doc default CTL language is Thai, and another run which is explicitly Thai. This triggers some other bugs down the line.

So, would the following patch work? It follows the same pattern as comment #9, but takes the chunk of code that decides "should the language be forced to be the same as the current Windows IM" during normal typing, moves it into a shared location, and reuses that logic for setting the language of the autocomplete text.
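The two-run effect speculated about above can be illustrated with a toy model. This is not Writer code: the run representation, append_text and language_to_force are all hypothetical, and the exact condition used by the patch is not shown in this thread, so the simple equality test below is an assumption.

```python
from typing import List, Optional, Tuple

# A run is (text, explicit language); None means "inherit the document default".
Run = Tuple[str, Optional[str]]

TYPED = "\u0e21\u0e34\u0e16"                  # "มิถ", entered through the IM
COMPLETION = "\u0e38\u0e19\u0e32\u0e22\u0e19"  # "ุนายน", supplied by autocomplete


def append_text(runs: List[Run], text: str, lang: Optional[str]) -> List[Run]:
    """Append text, merging into the last run when the explicit language matches."""
    if runs and runs[-1][1] == lang:
        return runs[:-1] + [(runs[-1][0] + text, lang)]
    return runs + [(text, lang)]


def language_to_force(im_lang: str, doc_default_lang: str) -> Optional[str]:
    """Assumed shared check: force a language only when the IM language
    differs from what the text would inherit anyway."""
    return None if im_lang == doc_default_lang else im_lang


typed_runs = [(TYPED, None)]

# Old behaviour: the IM language is always forced onto the completion.
always_forced = append_text(typed_runs, COMPLETION, "th")

# With the shared check: nothing is forced when IM and default language agree.
shared_check = append_text(typed_runs, COMPLETION, language_to_force("th", "th"))

print(len(always_forced))  # 2
print(len(shared_check))   # 1
```

In the first case the completion becomes a separate run that starts with the vowel mark; in the second it merges into the typed run, so no run boundary falls before the mark.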
Comment 13 Caolán McNamara 2011-06-03 08:49:16 UTC
Created attachment 47481 [details]
does this work for you ?
Comment 14 Samphan Raruenrom 2011-06-04 22:39:18 UTC
Yes. It's working fine here. We'll do more testing anyway.
Comment 15 Caolán McNamara 2011-06-07 05:18:07 UTC
Let's assume this is the correct approach, and open a new bug (against me) if there turn out to be more side effects.
Comment 16 Samphan Raruenrom 2011-09-30 02:09:21 UTC
I've seen the patch in master and tested it. It works fine. However, the Thai users still face the problem because the patch isn't in 3.4.x. Can someone pick the patch for the next releases?
Comment 17 Caolán McNamara 2011-09-30 12:08:33 UTC
Can you send a mail to the dev mailing list if you want to propose this for 3-4? [REVIEW] is the usual header. Mention the bug id and that you want to propose it for backporting/cherry-picking to the 3-4 series.