Bug 137639 - Copying and pasting English text converts it to Chinese (kf5 + Wayland)
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 6.4.0.3 release
Hardware: All Linux (All)
Importance: medium normal
Assignee: Michael Weghorn
URL:
Whiteboard: target:7.4.0 target:7.3.4
Keywords: bibisected, bisected, regression
Depends on:
Blocks: Wayland KDE
 
Reported: 2020-10-20 23:21 UTC by Munzir Taha
Modified: 2022-05-18 07:34 UTC (History)

See Also:
Crash report or crash signature:
Regression By: Jan-Marek Glogowski


Attachments
LO file with a link (8.56 KB, application/vnd.oasis.opendocument.text)
2021-08-05 14:11 UTC, Munzir Taha
FC_DEBUG=4 output (26.87 MB, text/plain)
2021-08-05 14:13 UTC, Munzir Taha

Description Munzir Taha 2020-10-20 23:21:11 UTC
Description:
The Insert Footnote/Endnote dialog neither allows Unicode character input using Alt+X nor pastes some characters correctly.

One example is the circled digits. It inserts Chinese characters instead.

I understand China is everywhere but please not here ;)

Steps to Reproduce:
1. Insert → Footnote and Endnote → Footnote or Endnote…
2. Paste ① (U+2460: CIRCLED DIGIT ONE)
3. It magically turns into a Chinese character 釢

Actual Results:
Neither pasting nor Alt-X works

Expected Results:
I expect Alt+X to work, as well as pasting any Unicode character.


Reproducible: Always


User Profile Reset: No



Additional Info:
Version: 7.0.2.2
Build ID: 00(Build:2)
CPU threads: 12; OS: Linux 5.8; UI render: default; VCL: kf5
Locale: en-US (en_US.UTF-8); UI: en-US
7.0.2-1
Calc: threaded
Comment 1 Dieter 2020-11-06 07:37:46 UTC
I can't confirm it with

Version: 7.1.0.0.alpha1+ (x64)
Build ID: f27c4ec5c864395f4cdaec32d7e95ff24e4f43c8
CPU threads: 4; OS: Windows 10.0 Build 19042; UI render: Skia/Raster; VCL: win
Locale: de-DE (de_DE); UI: en-GB
Calc: threaded

I typed ALT+2460 and got £. I could also press "choose" and select a character from the special character dialog.

Perhaps Linux only or related to problems with user profile (please try in SafeMode)
Comment 2 Munzir Taha 2020-12-29 19:33:29 UTC
@Dieter:
I tried in safe mode and the bug is still there. I clearly mentioned that the bug is on Linux (not Windows) and in version 7.0.2. It's also in 7.0.4. I can't install an alpha version right now; I will test it once it's released, but it would help if someone with the latest stable version could test and (un)confirm the bug.

Strangely, you mentioned
> I typed ALT+2460 and got £

How come? U+2460 is not £! The pound sign is U+00A3. Either you are wrong or you have a different bug!
Comment 6 Munzir Taha 2021-08-05 14:11:36 UTC
Created attachment 174106
LO file with a link
Comment 7 Munzir Taha 2021-08-05 14:13:31 UTC
Created attachment 174107
FC_DEBUG=4 output
Comment 8 Munzir Taha 2021-08-05 14:14:45 UTC
Unfortunately, the bug is back. I will do my best to attach more debugging information.

Attached is a bug.odt file that contains a link. Just right-click -> Edit Hyperlink, copy the http URL, close the dialog and paste the text anywhere in the page, and you get the Chinese characters. I left them in the file for now.

Also attached is a bug.txt file that contains the output of FC_DEBUG=4. I don't understand why LO is asking for Chinese patterns and fonts when I just copy an English URL. Maybe these lines mean something to you.


Rule Set: /etc/fonts/conf.d/49-sansserif.conf
FcConfigSubstitute test pattern all family NotEqual "sans-serif"
FcConfigSubstitute test pattern all family NotEqual "serif"
FcConfigSubstitute test pattern all family NotEqual "monospace"
Substitute Edit family AppendLast "sans-serif"

Append list before  "思源宋体"(s) [marker]
Append list after  "思源宋体"(s) "sans-serif"(w)
FcConfigSubstitute editPattern has 8 elts (size 16)
        family: "思源宋体"(s) "sans-serif"(w)
        slant: 0(i)(s)
        weight: 80(i)(s)
        spacing: 0(i)(s)
        hintstyle: 1(i)(w)
        scalable: True(s)
        lang: "zh-cn"(s) "en"(w)
        prgname: "soffice.bin"(s)

FcConfigSubstitute Pattern has 39 elts (size 48)
        family: "KanjiStrokeOrders"(s)
        familylang: "en"(s)
        style: "Medium"(s)
        stylelang: "en"(s)
        fullname: "KanjiStrokeOrders"(s)
 ...
        file: "/usr/share/fonts/kanjistrokeorders/KanjiStrokeOrders.ttf"(w)

Please tell me if there is a better way to help you debug this.
Comment 11 himajin100000 2021-08-05 14:23:07 UTC
memo:

釢(U+91E2)
① E2 91 A0 in UTF-8
Comment 12 Munzir Taha 2021-08-06 03:28:16 UTC
@himajin: Nice catch!
I just tested by copying ww and I got 睷
w (U+0077)
睷 (U+7777)

Now I know why copying single characters sometimes failed, seemingly at random. It's also worth mentioning that after this LO gets sluggish and unresponsive, with many crashes and 100% CPU usage.
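
The arithmetic behind the two observations above can be checked with a short sketch. It assumes (nothing in the report up to this point confirms it) that the pasted UTF-8 bytes are being read as little-endian UTF-16. `misread_utf8_as_utf16` is a hypothetical helper written only for this illustration; it is not LibreOffice code.

```python
def misread_utf8_as_utf16(text: str) -> str:
    """Encode text as UTF-8, then (wrongly) decode the bytes as UTF-16-LE.

    Assumption: the receiver uses little-endian byte order. A trailing odd
    byte cannot form a 16-bit unit, so it is dropped here.
    """
    raw = text.encode("utf-8")
    usable = raw[: len(raw) - (len(raw) % 2)]
    return usable.decode("utf-16-le", errors="replace")


for sample in ("①", "ww"):
    garbled = misread_utf8_as_utf16(sample)
    code_points = [f"U+{ord(c):04X}" for c in garbled]
    print(sample, sample.encode("utf-8").hex(" "), "->", garbled, code_points)
```

Under that assumption, the first two UTF-8 bytes of ① (E2 91) read as one 16-bit unit give U+91E2, and the two bytes of ww (77 77) give U+7777, matching both reported characters.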
Comment 13 Buovjaga 2022-05-11 14:41:16 UTC
(In reply to Munzir Taha from comment #8)
> Unfortunately, the bug comes again. I will do my best attaching more
> debugging information.
> 
> Attached is a bug.odt file that contains a link. Just right-click -> Edit
> Hyperlink, Copy the http url, close the dialog and paste the text any where
> in the page and you get the Chinese characters. I let them live in the file
> for now.

I tried with attachment 174106, but I don't see the problem. Do you still see it with version 7.3?

Set to NEEDINFO.
Change back to UNCONFIRMED, if the problem persists. Change to RESOLVED WORKSFORME, if the problem went away.

Arch Linux 64-bit
Version: 7.3.3.2 / LibreOffice Community
Build ID: 30(Build:2)
CPU threads: 8; OS: Linux 5.17; UI render: default; VCL: kf5 (cairo+xcb)
Locale: fi-FI (fi_FI.UTF-8); UI: en-US
7.3.3-2
Calc: threaded
Comment 14 Munzir Taha 2022-05-12 16:51:19 UTC
> I tried with attachment 174106, but I don't see the problem. Do
> you still see it with version 7.3?

Yes, I just tested version 7.3.3 by copying ww and got 睷.

 Arch Linux
Version: 7.3.3.2 / LibreOffice Community
Build ID: 30(Build:2)
CPU threads: 12; OS: Linux 5.17; UI render: default; VCL: kf5 (cairo+wayland)
Locale: en-US (en_US.UTF-8); UI: en-US
7.3.3-2
Calc: threaded



> CPU threads: 8; OS: Linux 5.17; UI render: default; VCL: kf5 (cairo+xcb)

I see you are using X11 whereas I am using Wayland. It seems the bug is Wayland-only.
Comment 15 Buovjaga 2022-05-12 20:31:40 UTC
Indeed, I can repro in a Wayland session (only with kf5).

I bibisected it with linux-64-6.4 to https://git.libreoffice.org/core/commit/c386f07ce195c2167f1b56d23cfd95292634e2de
tdf#112368 Qt5 handle owned, non-LO clipboard content

In bug 112368 comment 9 I even mention the same issue and there was a follow-up patch, but apparently the patch only fixed it for X11.
Comment 16 Munzir Taha 2022-05-12 21:42:17 UTC
Thanks for bisecting and confirming.
Comment 17 Buovjaga 2022-05-13 06:29:36 UTC
Please don't remove stuff I add to the fields
Comment 18 Michael Weghorn 2022-05-17 05:27:57 UTC
Pending Gerrit change: https://gerrit.libreoffice.org/c/core/+/134456
Comment 19 Buovjaga 2022-05-17 07:59:32 UTC
(In reply to Michael Weghorn from comment #18)
> Pending Gerrit change: https://gerrit.libreoffice.org/c/core/+/134456

I confirm the problem is gone on Wayland with this patch!
Comment 20 Commit Notification 2022-05-18 04:48:32 UTC
Michael Weghorn committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/6fc3ec85a32cd70216b4bbf21e479b4fc32a38dc

tdf#137639 qt: UTF-16-encode mime data for "text/plain;charset=utf-16"

It will be available in 7.4.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
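
The commit title suggests that clipboard data offered as "text/plain;charset=utf-16" was not actually UTF-16-encoded before the fix. A minimal Python sketch of that idea follows. It is not the actual Qt/C++ patch: `encode_for_mime`, the little-endian byte order, and the UTF-8 fallback for other types are all assumptions made for illustration.

```python
def encode_for_mime(text: str, mime_type: str) -> bytes:
    """Return bytes for `text` in the charset that `mime_type` announces.

    Hypothetical helper: only "charset=utf-16" is handled specially, and
    little-endian order without a BOM is assumed. Everything else falls
    back to UTF-8, which is also an assumption.
    """
    params = [p.strip().lower() for p in mime_type.split(";")[1:]]
    if "charset=utf-16" in params:
        return text.encode("utf-16-le")
    return text.encode("utf-8")


payload = encode_for_mime("①", "text/plain;charset=utf-16")
# A receiver that decodes the payload as UTF-16-LE now gets the original text.
print(payload.hex(" "), "->", payload.decode("utf-16-le"))
```

With the payload encoded to match the announced charset, a receiver that decodes it as UTF-16 gets ① back instead of a character built from UTF-8 bytes.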
Comment 21 Michael Weghorn 2022-05-18 04:49:38 UTC
(In reply to Buovjaga from comment #19)
> (In reply to Michael Weghorn from comment #18)
> > Pending Gerrit change: https://gerrit.libreoffice.org/c/core/+/134456
> 
> I confirm the problem is gone on Wayland with this patch!

Thanks for testing! Setting to VERIFIED.
Comment 22 Commit Notification 2022-05-18 07:34:15 UTC
Michael Weghorn committed a patch related to this issue.
It has been pushed to "libreoffice-7-3":

https://git.libreoffice.org/core/commit/a32742936d6da1167c5b8200ef8b2827daf4c13f

tdf#137639 qt: UTF-16-encode mime data for "text/plain;charset=utf-16"

It will be available in 7.3.4.
