Description:
The "voiced consonant mark" is the pair of small dashes or the small circle placed on some KATAKANA characters: e.g. カ (KA) with the small dashes becomes ガ (GA), and ハ (HA) with the small circle becomes パ (PA).
In half-width KATAKANA, the mark (U+FF9E for the dashes, U+FF9F for the circle) is a separate character. For example, "ｶﾞ" is a sequence of two characters (U+FF76, U+FF9E).
In full-width KATAKANA, a character with a voiced consonant mark counts as one character, e.g. "ガ" (U+30AC).
Japanese readers usually perceive KATAKANA with such a mark as one character, even when it is written as a combination of two half-width characters.

If the search string contains KATAKANA with a voiced consonant mark, the search result is incorrect. This problem occurs at least in Calc, Writer, Draw and Impress.

The issue has been reproducible since commit d6336e0b21eeece0e678a8768938c04fa120043f, and was not before that commit.

Steps to Reproduce:
1. Open the attachment with Writer.
2. Open the Find and Replace dialog and uncheck "Match Character Width".
3. Enter a KATAKANA string that contains voiced consonant marks:
   Examination 1: enter "ガギグゲゴ" (U+30AC + U+30AE + U+30B0 + U+30B2 + U+30B4) or "ｶﾞｷﾞｸﾞｹﾞｺﾞ" (U+FF76 + U+FF9E + U+FF77 + U+FF9E + U+FF78 + U+FF9E + U+FF79 + U+FF9E + U+FF7A + U+FF9E)
   Examination 2: enter "ギグゲ" (U+30AE + U+30B0 + U+30B2) or "ｷﾞｸﾞｹﾞ" (U+FF77 + U+FF9E + U+FF78 + U+FF9E + U+FF79 + U+FF9E)
4. Click Find Next.
Actual Results:
Examination 1: "ｶﾞｷﾞｸﾞｹﾞｺﾞ" (U+FF76 + U+FF9E + U+FF77 + U+FF9E + U+FF78 + U+FF9E + U+FF79 + U+FF9E + U+FF7A + U+FF9E) or "ガギグゲゴ01234" (U+30AC + U+30AE + U+30B0 + U+30B2 + U+30B4 + U+0030 + U+0031 + U+0032 + U+0033 + U+0034)
Examination 2: "ｷﾞｸﾞｹﾞ" (U+FF77 + U+FF9E + U+FF78 + U+FF9E + U+FF79 + U+FF9E), "グゲ0123" (U+30B0 + U+30B2 + U+0030 + U+0031 + U+0032 + U+0033) or "グゲゴ012" (U+30B0 + U+30B2 + U+30B4 + U+0030 + U+0031 + U+0032)

Expected Results:
Examination 1: "ガギグゲゴ" (U+30AC + U+30AE + U+30B0 + U+30B2 + U+30B4) or "ｶﾞｷﾞｸﾞｹﾞｺﾞ" (U+FF76 + U+FF9E + U+FF77 + U+FF9E + U+FF78 + U+FF9E + U+FF79 + U+FF9E + U+FF7A + U+FF9E)
Examination 2: "ギグゲ" (U+30AE + U+30B0 + U+30B2) or "ｷﾞｸﾞｹﾞ" (U+FF77 + U+FF9E + U+FF78 + U+FF9E + U+FF79 + U+FF9E)

Reproducible: Always

User Profile Reset: No

Additional Info:
Version: 7.4.1.2 / LibreOffice Community
Build ID: 40(Build:2)
CPU threads: 8; OS: Linux 5.19; UI render: default; VCL: gtk3
Locale: ja-JP (ja_JP.UTF-8); UI: en-US
7.4.1-2
Calc: threaded
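For readers who want to check the character counts in the report, here is a minimal Python sketch. It only inspects the two spellings of the search string and is not related to the LibreOffice code. The variable names are illustrative, and NFKC normalization is used here only as a convenient way to show that the two spellings are compatibility-equivalent.

```python
import unicodedata

# Full-width spelling: each voiced KATAKANA is one code point.
full_width = "\u30AC\u30AE\u30B0\u30B2\u30B4"

# Half-width spelling: each base character is followed by U+FF9E,
# the half-width voiced sound mark.
half_width = "\uFF76\uFF9E\uFF77\uFF9E\uFF78\uFF9E\uFF79\uFF9E\uFF7A\uFF9E"

print(len(full_width))   # 5
print(len(half_width))   # 10

# List the code points of the first half-width pair.
print([f"U+{ord(c):04X}" for c in half_width[:2]])  # ['U+FF76', 'U+FF9E']

# NFKC maps the half-width sequence onto the full-width characters.
normalized = unicodedata.normalize("NFKC", half_width)
print(normalized == full_width)  # True
```

The length difference (10 versus 5) is the same as the difference between the expected result and the over-long "ガギグゲゴ01234" selection described above.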
Created attachment 182643
example file

No Japanese font is embedded in this file. Please install a Japanese font before reviewing the file.
"The issue has reproduced since the commit d6336e0b21eeece0e678a8768938c04fa120043f, and didn't before that commit." I'm sorry, but the commit is wrong. That was of bibisecting: d6336e0b21eeece0e678a8768938c04fa120043f is the first bad commit commit d6336e0b21eeece0e678a8768938c04fa120043f Author: Jenkins Build User <tdf@pollux.tdf> Date: Thu Sep 16 12:16:43 2021 +0200 source c7551e8a46e2f9f8142aa7921a0494221ae096e8 source c7551e8a46e2f9f8142aa7921a0494221ae096e8 instdir/program/libi18npoollo.so | Bin 1617192 -> 1613016 bytes instdir/program/libi18nutil.so | Bin 123104 -> 123104 bytes instdir/program/setuprc | 2 +- instdir/program/versionrc | 2 +- 4 files changed, 2 insertions(+), 2 deletions(-) As you can see, the issue has reproduced since the commit c7551e8a46e2f9f8142aa7921a0494221ae096e8 , and didn't before that commit.
Reproduced with 7.3.6 and 7.4.1 on Windows:

Version: 7.3.6.2 (x64) / LibreOffice Community
Build ID: c28ca90fd6e1a19e189fc16c05f8f8924961e12e
CPU threads: 12; OS: Windows 10.0 Build 22000; UI render: Skia/Vulkan; VCL: win
Locale: zh-CN (zh_CN); UI: en-US
Calc: CL

and

Version: 7.4.1.2 (x64) / LibreOffice Community
Build ID: 3c58a8f3a960df8bc8fd77b461821e42c061c5f0
CPU threads: 12; OS: Windows 10.0 Build 22000; UI render: Skia/Raster; VCL: win
Locale: en-US (zh_CN); UI: zh-CN
Calc: CL

But not reproduced on 7.0.6:

Version: 7.0.6.2 (x64)
Build ID: 144abb84a525d8e30c9dbbefa69cbbf2d8d4ae3b
CPU threads: 12; OS: Windows 10.0 Build 22000; UI render: default; VCL: win
Locale: zh-CN (zh_CN); UI: en-US
Calc: CL
It seems the reporter was using machine translation, so I hope the description below is more concise and clear.

The issue is rather straightforward: when searching for a Japanese string, the result should not include the digits (0123...) that follow the search term. In 7.0 the behavior is normal; in 7.3 and 7.4 the highlighted search result includes the digits when it matches the full-width characters. For example, searching for "ガギグゲゴ" (U+30AC...) selects "ガギグゲゴ01234".

Although I didn't do the bibisection myself, the result in comment #2 is consistent with the regression range I found in testing, so I am setting the keyword and adding Noel to CC.

Noel: Would you please have a look?
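To illustrate how this kind of symptom can arise in general, below is a small Python sketch. It is an assumption-laden model and not the LibreOffice implementation: NFKD folding stands in for the width-insensitive transliteration, and the helper names (`fold_with_offsets`, `find_unmapped`, `find_mapped`) are hypothetical. It shows that applying a match span measured in the folded text directly to the original text over-selects, while mapping the span back through an offset table does not. For the actual cause and fix, see the commits linked in this report.

```python
import unicodedata

def fold_with_offsets(text):
    """Fold text with NFKD, one original character at a time.

    Returns the folded string and, for each folded character, the index
    of the original character it came from.
    """
    folded, offsets = [], []
    for index, char in enumerate(text):
        for piece in unicodedata.normalize("NFKD", char):
            folded.append(piece)
            offsets.append(index)
    return "".join(folded), offsets

def find_unmapped(text, query):
    """Model of the symptom: reuse folded positions on the original text."""
    folded_text, _ = fold_with_offsets(text)
    folded_query, _ = fold_with_offsets(query)
    start = folded_text.find(folded_query)
    if start < 0 or not folded_query:
        return None
    return text[start:start + len(folded_query)]

def find_mapped(text, query):
    """Map the folded match span back to original character indices."""
    folded_text, offsets = fold_with_offsets(text)
    folded_query, _ = fold_with_offsets(query)
    start = folded_text.find(folded_query)
    if start < 0 or not folded_query:
        return None
    first = offsets[start]
    last = offsets[start + len(folded_query) - 1]
    return text[first:last + 1]

# Full-width text followed by digits, as in the example file.
document = "\u30AC\u30AE\u30B0\u30B2\u30B401234"
query_full = "\u30AC\u30AE\u30B0\u30B2\u30B4"
query_half = "\uFF77\uFF9E\uFF78\uFF9E\uFF79\uFF9E"

print(find_unmapped(document, query_full))  # ガギグゲゴ01234
print(find_mapped(document, query_full))    # ガギグゲゴ
print(find_unmapped(document, query_half))  # グゲゴ012
print(find_mapped(document, query_half))    # ギグゲ
```

Under these assumptions the unmapped variant produces the same over-long selections reported in comment #0 ("ガギグゲゴ01234" and "グゲゴ012"), which is why an offset table between the folded and original text matters for this kind of search.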
Noel Grandin committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/core/commit/222e56157c6317435088e09e52a0705bc6a1a83a tdf#151148 Finding KATAKANA which has voice consonant mark wrong It will be available in 7.5.0. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
*** Bug 151141 has been marked as a duplicate of this bug. ***
To Noel: Thank you for providing a patch. It works well for me.

I'm sorry, but I submitted your patch to the libreoffice-7-4 branch as a backport without asking your permission first. That branch is currently being prepared for its next minor release. This regression is serious for Japanese users, and most of them want an immediate fix.

If you don't mind, please see https://gerrit.libreoffice.org/c/core/+/140566
Noel Grandin committed a patch related to this issue. It has been pushed to "libreoffice-7-4": https://git.libreoffice.org/core/commit/a5b6ddf3f0055cebe2713af34c304a647af6c76a tdf#151148 Finding KATAKANA which has voice consonant mark wrong It will be available in 7.4.3.
Noel Grandin committed a patch related to this issue. It has been pushed to "libreoffice-7-3": https://git.libreoffice.org/core/commit/a288453c50f49852c2a83cc4716ec44d6230d37c tdf#151148 Finding KATAKANA which has voice consonant mark wrong It will be available in 7.3.7.
Noel Grandin committed a patch related to this issue. It has been pushed to "libreoffice-7-4-2": https://git.libreoffice.org/core/commit/b46221a4817ca41776446d2a8d81272ce1022c29 tdf#151148 Finding KATAKANA which has voice consonant mark wrong It will be available in 7.4.2.
*** Bug 151396 has been marked as a duplicate of this bug. ***
*** Bug 151477 has been marked as a duplicate of this bug. ***
I can confirm the issue described in comment #0 here is fixed in: Version: 7.4.2.3 (x64) / LibreOffice Community Build ID: 382eef1f22670f7f4118c8c2dd222ec7ad009daf CPU threads: 12; OS: Windows 10.0 Build 22000; UI render: Skia/Raster; VCL: win Locale: en-US (zh_CN); UI: zh-CN Calc: CL Reporters of other bugs that are marked as DUPLICATE: Please test your problem with 7.4.2 (already released) or 7.3.7 (RC1 should be out this coming week). If the new version doesn't resolve your issue, speak up here or set your bug's status back to UNCONFIRMED.
And of course, thanks Noel for the quick fix and Julien for testing!
I confirmed that the issue is fixed in the latest libreoffice-fresh package of Arch Linux:

Version: 7.4.2.3 / LibreOffice Community
Build ID: 40(Build:3)
CPU threads: 8; OS: Linux 6.0; UI render: default; VCL: gtk3
Locale: ja-JP (ja_JP.UTF-8); UI: ja-JP
7.4.2-1
Calc: threaded