Bug 148991 - Inputting Chinese characters in overwrite mode results in "!!broken!!" errors
Summary: Inputting Chinese characters in overwrite mode results in "!!broken!!" errors
Status: REOPENED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 4.2.0.4 release
Hardware: x86-64 (AMD64) All
Importance: medium normal
Assignee: Jonathan Clark
URL:
Whiteboard: target:26.2.0 target:25.8.4
Keywords: bibisected, bisected, regression
Depends on:
Blocks: CJK IME
Reported: 2022-05-09 01:31 UTC by Fudo Altto
Modified: 2025-11-18 00:53 UTC
CC List: 5 users

See Also:
Crash report or crash signature:


Description Fudo Altto 2022-05-09 01:31:33 UTC
Description:
When I type Chinese characters in the keyboard's insert mode directly in front of several existing Chinese characters, the existing characters are corrupted and "!!broken!!" is displayed. In my tests this happens when more than 2 Chinese characters are typed in front, although I am not sure whether the number of characters matters. Also, in my simple test, the same problem does not occur with ASCII characters.

Example: 示例 --> 示例二!!broken!!

Steps to Reproduce:
1. Create a new .odt file and open it.
2. Type several Chinese characters, like "示例".
3. Move the cursor to the start of the previously typed characters (for example, press the "Home" key).
4. Press the "Insert" key to switch to insert mode.
5. Type more than 2 Chinese characters, like "示例".

Actual Results:
示例 --> 示例!!broken!!

Expected Results:
示例示例


Reproducible: Always


User Profile Reset: Yes


OpenGL enabled: Yes

Additional Info:
Version: 7.2.6.2 (x64) / LibreOffice Community
Build ID: b0ec3a565991f7569a5a7f5d24fed7f52653d754
CPU threads: 8; OS: Windows 10.0 Build 19044; UI render: Skia/Vulkan; VCL: win
Locale: en-GB (zh_CN); UI: en-GB
Calc: CL
Comment 1 Kevin Suo 2022-05-14 05:08:44 UTC
Would you please clarify the input method you have used?
Comment 2 Fudo Altto 2022-05-15 07:36:57 UTC
(In reply to Kevin Suo from comment #1)
> Would you please clarify the input method you have used?

I used Microsoft Pinyin, which is a officially supported input method by Microsoft for Chinese characters input.
Comment 3 Fudo Altto 2022-05-15 07:42:17 UTC
(In reply to Fudo Altto from comment #2)
> (In reply to Kevin Suo from comment #1)
> > Would you please clarify the input method you have used?
> 
> I used Microsoft Pinyin, which is a officially supported input method by
> Microsoft for Chinese characters input.

Sorry for the few grammar mistakes. I need to be more careful before submitting a reply next time; I was not aware that a submitted comment cannot be retracted.
Comment 4 Ayush Jain 2022-06-06 19:00:26 UTC
Thank you for reporting the bug. I can confirm that the bug is present in

7.3.3.2 (x64)
Comment 5 Fudo Altto 2022-06-13 05:07:42 UTC
(In reply to Ayush Jain from comment #4)
> Thank you for reporting the bug. I can confirm that the bug is present in
> 
> 7.3.3.2 (x64)

It is my pleasure! And it is I who should thank all of you for your contributions to this software suite, which helps me work with electronic diagrams, slides, and especially documents. Thank you!
Comment 6 Faisal 2023-02-02 12:40:20 UTC
Can still be reproduced with:

Version: 7.4.4.2 (x64) / LibreOffice Community
Build ID: 85569322deea74ec9134968a29af2df5663baa21
CPU threads: 4; OS: Windows 10.0 Build 19045; UI render: Skia/Raster; VCL: win
Locale: en-US (en_US); UI: en-US
Calc: CL

I can reproduce the issue with Microsoft Pinyin and Microsoft Wubi IMEs in the Chinese (Simplified, China) locale. I can also reproduce with Microsoft Bopomofo in the Chinese (Traditional, Taiwan) locale. I cannot reproduce with Microsoft CangJie IME in the Taiwan locale.

Also, shouldn't the expected result be 示例 and not 示例示例, since overwrite mode is turned on (via the Insert key) and the newly typed characters will replace the old ones entirely?
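
The expectation in the question above can be illustrated with a small sketch. This is only a hypothetical model of overwrite mode, not LibreOffice code: the function name `overwrite` and its parameters are invented for illustration, and it counts Python code points rather than UTF-16 code units.

```python
def overwrite(text: str, pos: int, typed: str) -> str:
    """Hypothetical model of overwrite mode.

    Each typed character replaces one existing character starting at
    pos; typed characters beyond the end of the text are appended.
    """
    return text[:pos] + typed + text[pos + len(typed):]


# Typing the same two characters over "示例" leaves the text unchanged.
print(overwrite("示例", 0, "示例"))

# Typing a single character replaces only the first existing character.
print(overwrite("示例", 0, "示"))
```

Under this model, typing 示例 over an existing 示例 would give 示例, not 示例示例.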
Comment 7 QA Administrators 2025-02-02 03:17:23 UTC Comment hidden (obsolete)
Comment 8 Ming Hua 2025-02-02 05:04:30 UTC
Still reproducible in 25.2:

Version: 25.2.0.3 (X86_64) / LibreOffice Community
Build ID: e1cf4a87eb02d755bce1a01209907ea5ddc8f069
CPU threads: 12; OS: Windows 11 X86_64 (10.0 build 26100); UI render: Skia/Vulkan; VCL: win
Locale: zh-CN (zh_CN); UI: zh-CN
Calc: CL threaded
Comment 9 Takenori Yasuda 2025-05-23 14:19:28 UTC
Microsoft Pinyin: Reproduced
Google Pinyin (ver.2.7.25.128): Not reproduced

Version: 25.2.4.1 (X86_64) / LibreOffice Community
Build ID: 09303ce8b49f86f106fccd32b1324662053027cc
CPU threads: 8; OS: Windows 11 X86_64 (10.0 build 26100); UI render: default; VCL: win
Locale: ja-JP (ja_JP); UI: ja-JP
Calc: threaded

Version: 25.8.0.0.alpha1+ (X86_64) / LibreOffice Community
Build ID: b52384de6f09f124fef9405ccf273c0f5c3339d1
CPU threads: 8; OS: Windows 11 X86_64 (build 26100); UI render: default; VCL: win
Locale: ja-JP (ja_JP); UI: ja-JP
Calc: threaded
Comment 10 Takenori Yasuda 2025-08-05 08:08:01 UTC
This bug also occurs when a single character is entered.
In this case, the following steps will reproduce the issue:

Steps to Reproduce:
1. Enter any single character (both CJK and non-CJK characters are acceptable).
2. Move the cursor to the beginning of the entered character.
3. Switch to overwrite mode.
4. Type any Pinyin string whose length is u + 1 characters,
    where u is the number of UTF-16 code units of the character in step 1.
    Do not confirm the input.
5. Press the Escape key.

Actual Results:
!!br0ken!!

Additional Info:
Multi-character inputs and ligatures appear to follow the same pattern.
However, in those cases, the bug may sometimes occur even with fewer than u + 1 characters of input, so further investigation may be necessary.
As of now, entering a Pinyin string of u + 1 characters reliably triggers the bug.
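
For anyone reproducing this, the value of u from step 4 can be computed with a short sketch. The helper names `utf16_code_units` and `trigger_length` are made up for illustration, and the u + 1 rule is only the observation reported above, not a confirmed description of the root cause.

```python
def utf16_code_units(text: str) -> int:
    """Return the number of UTF-16 code units needed to encode text."""
    # Each UTF-16 code unit is 2 bytes; "utf-16-le" adds no byte order mark.
    return len(text.encode("utf-16-le")) // 2


def trigger_length(existing: str) -> int:
    """Pinyin string length (u + 1) observed to trigger the bug."""
    return utf16_code_units(existing) + 1


# "a" and "示" are in the Basic Multilingual Plane (1 code unit each);
# U+20BB7 is outside it and needs a surrogate pair (2 code units).
for ch in ["a", "示", "\U00020BB7"]:
    print(ch, utf16_code_units(ch), trigger_length(ch))
```

For a character in the Basic Multilingual Plane, u is 1, so a 2-letter Pinyin string is enough; for a character outside it, u is 2 and a 3-letter string is needed.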

Tested with:
Version: 25.8.1.0.0+ (X86_64) / LibreOffice Community
Build ID: 051e3f7490541e1a67111b7f8cf72fa5d2a1bb96
CPU threads: 8; OS: Windows 11 X86_64 (build 26100); UI render: Skia/Raster; VCL: win
Locale: ja-JP (ja_JP); UI: ja-JP
Calc: CL threaded Jumbo

Version: 25.2.5.2 (X86_64) / LibreOffice Community
Build ID: 03d19516eb2e1dd5d4ccd751a0d6f35f35e08022
CPU threads: 8; OS: Windows 11 X86_64 (10.0 build 26100); UI render: Skia/Raster; VCL: win
Locale: ja-JP (ja_JP); UI: ja-JP
Calc: CL threaded Jumbo

Microsoft Pinyin
Comment 11 Takenori Yasuda 2025-08-05 15:25:13 UTC
I've just confirmed that this bug affects not only Chinese but also Japanese input.
It also occurs with the following Japanese IMEs:

- Microsoft IME (Japanese) — Windows 11 built-in
- Google Japanese Input — Version 2.30.5620.0+24.11.9

The same steps described in comment 0 and comment 10 can be used to reproduce the issue.

Version: 25.2.5.2 (X86_64) / LibreOffice Community
Build ID: 03d19516eb2e1dd5d4ccd751a0d6f35f35e08022
CPU threads: 8; OS: Windows 11 X86_64 (10.0 build 26100); UI render: Skia/Raster; VCL: win
Locale: ja-JP (ja_JP); UI: ja-JP
Calc: CL threaded Jumbo

Version: 25.8.1.0.0+ (X86_64) / LibreOffice Community
Build ID: 150bf27c032f615453df8d5da71d86fa767c30de
CPU threads: 8; OS: Windows 11 X86_64 (build 26100); UI render: Skia/Raster; VCL: win
Locale: ja-JP (ja_JP); UI: ja-JP
Calc: CL threaded Jumbo
Comment 12 Saburo 2025-08-08 11:15:08 UTC
Reproduced:
Version: 4.3.7.2
Build ID: 8a35821d8636a03b8bf4e15b48f59794652c68ba

Not reproduced:
Version: 4.0.6.2
Build ID: 2e2573268451a50806fcd60ae2d9fe01dd0ce24

Bibisected with linux-43all and linux-42max:
commit 808d3c669c4c49c2dd5ea7fad7841378b5cc2f8c
author Matteo Casalin

String to OUString, some cleanup

Change-Id: I7d1cdabdaecae1d993730397a1757727fb40a6db
Reviewed-on: https://gerrit.libreoffice.org/5608
Comment 13 Takenori Yasuda 2025-10-18 05:40:46 UTC
I noticed that the IMEs affected by this bug seem to share the following characteristics:

- The language follows a "type -> convert -> confirm" input process.
- The IME displays the unconfirmed (composing) text directly in the document.
Comment 14 Commit Notification 2025-10-28 17:08:47 UTC
Jonathan Clark committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/2d32f9abd43e9308ebc14c9f12fa7440f1cc1654

tdf#148991 sw: fix string corruption after IME dismiss

It will be available in 26.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 15 Jonathan Clark 2025-10-28 17:14:40 UTC
I wasn't able to reproduce the bug by following the instructions in the original report. This fix is based on the instructions in comment 10.

It's possible these are separate bugs, so it would be helpful if someone else can test this fix and reopen the bug if necessary.
Comment 16 Commit Notification 2025-10-28 23:11:43 UTC
Jonathan Clark committed a patch related to this issue.
It has been pushed to "libreoffice-25-8":

https://git.libreoffice.org/core/commit/9301eb8efa519bff1f2f90cd0a2306e4a4e64fea

tdf#148991 sw: fix string corruption after IME dismiss

It will be available in 25.8.4.

Comment 17 Takenori Yasuda 2025-11-01 01:39:11 UTC
Tested with the following nightly build:

Version: 26.2.0.0.alpha0+ (X86_64) / LibreOffice Community  
Build ID: 620(Build:0)  
CPU threads: 8; OS: Windows 11 X86_64 (build 26100); UI render: Skia/Raster; VCL: win  
Locale: ja-JP (ja_JP); UI: ja-JP  
Calc: CL threaded Jumbo  
https://dev-builds.libreoffice.org/daily/master/Win-x86_64@tb103-1-TDF/2025-10-30_20.18.53/

The steps in comment 10 seem to be fixed.
The steps in comment 0 also appear to be fixed.
Though I’m not completely confident — it would be great if someone else could double-check.
Comment 18 Takenori Yasuda 2025-11-01 01:49:28 UTC
Additionally, I found another related behavior.
I'm not sure whether it's a different bug or a leftover case of this one.

Steps to Reproduce:
1. Type "日本".
2. Move the cursor to the beginning.
3. Switch to overwrite mode.
4. Type "にほんこく" (do not convert or confirm).
5. Alternate between pressing the Conversion (Space) and Escape keys.

Actual Results:
Before the patch: 日本国!!br0ken!!!!br0ken!!!!br0ken!!
After the patch: 日本国こくこくこく
The number of "!!br0ken!!" or "こく" repetitions depends on how many times Step 5 is repeated.

Expected Results:
Repeated alternation between "日本国" and "にほんこく" without duplication or corruption.

If this turns out to be a separate issue, please feel free to open a new report for it.
Comment 19 Takenori Yasuda 2025-11-12 04:33:37 UTC
(In reply to Takenori Yasuda from comment #17)
> The steps in comment 0 also appear to be fixed.
It seems the bug might still not be completely fixed after all.

In Step 5 of Comment 0, the bug no longer occurs when the number of entered characters matches that in Step 2. However, when I added one more character and entered "大家好", the bug reappeared.
- Actual Result: 大家好jia'hao (Before patch: 大家好!!br0ken!!)
- Expected Result: 大家好

Since Step 5 of Comment 0 says "Type more than 2 Chinese characters", this suggests that the current patch may not fully cover all cases.

Note:
For this verification, I assumed that the reporter in Comment 0 may have misunderstood Step 4. Based on that assumption, I interpreted it as follows:
- 4. Press the "Insert" key to switch to **overwrite** mode.


If this is considered a remaining part of the same issue, please feel free to reopen this report.
Otherwise, I'll be happy to open a new one if it turns out to have a different root cause.
Comment 20 Jonathan Clark 2025-11-18 00:53:56 UTC
Thanks for testing the fix. It should be fine to handle as another instance of the same bug.