Bug 166299 - If the delimiter is not detected, garbled characters will appear.(Japanese)
Summary: If the delimiter is not detected, garbled characters will appear.(Japanese)
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version (earliest affected): 25.2.0.0 alpha0+
Hardware: All All
Importance: medium minor
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: bibisected, bisected, regression
Depends on:
Blocks:
 
Reported: 2025-04-23 03:03 UTC by Saburo
Modified: 2025-07-14 01:09 UTC

See Also:
Crash report or crash signature:


Attachments
text2column-sample-ja (10.56 KB, application/vnd.oasis.opendocument.spreadsheet)
2025-04-23 03:06 UTC, Saburo
Text to column on A1 on Linux and A2 on Windows (212.50 KB, image/png)
2025-04-23 18:11 UTC, Mateusz Wlazłowski
screenshots (236.57 KB, image/png)
2025-04-23 23:23 UTC, Saburo

Description Saburo 2025-04-23 03:03:06 UTC
Description:
When you run "Text to Columns" on text that contains full-width spaces, the delimiter is not detected.
If you select multiple rows and run it, the preview at the bottom of the dialog is garbled. (This can also happen with a single cell.)
If the preview is garbled, selecting "Separated by" under the separator options and specifying a full-width space as the separator does not change the preview, and pressing the OK button simply replaces the contents of the first cell with the garbled characters.

Steps to Reproduce:
1. Enter text that includes full-width spaces.
2. Select any rows.
3. Choose Data - Text to Columns...

Actual Results:
The preview field is garbled

Expected Results:
No garbled characters


Reproducible: Sometimes


User Profile Reset: No

Additional Info:
It works fine if you change the delimiter from a full-width space to a half-width space or a comma.

Version: 25.2.2.2 (X86_64) / LibreOffice Community
Build ID: 7370d4be9e3cf6031a51beef54ff3bda878e3fac
CPU threads: 4; OS: Linux 6.8; UI render: default; VCL: gtk3
Locale: ja-JP (ja_JP.UTF-8); UI: en-US
Calc: threaded

Version: 25.2.2.2 (X86_64) / LibreOffice Community
Build ID: 7370d4be9e3cf6031a51beef54ff3bda878e3fac
CPU threads: 4; OS: Windows 10 X86_64 (10.0 build 19045); UI render: Skia/Raster; VCL: win
Locale: ja-JP (ja_JP); UI: ja-JP
Calc: CL threaded

Version: 25.8.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: eb4977cb6d81b1c15d025435adf25b19e88d3132
CPU threads: 4; OS: Linux 6.8; UI render: default; VCL: gtk3
Locale: ja-JP (ja_JP.UTF-8); UI: ja-JP
Calc: threaded

Works fine in:
Version: 24.8.6.2 (X86_64) / LibreOffice Community
Build ID: 6d98ba145e9a8a39fc57bcc76981d1fb1316c60c
CPU threads: 4; OS: Linux 6.8; UI render: default; VCL: gtk3
Locale: ja-JP (ja_JP.UTF-8); UI: ja-JP
Calc: threaded
Comment 1 Saburo 2025-04-23 03:06:50 UTC
Created attachment 200463
text2column-sample-ja

I've attached a sample for verification.
Comment 2 Saburo 2025-04-23 03:07:50 UTC
bisected

author	Gabriel Masei
commit 565b619d57a3b98b0826c4b49dee6606f9ae70e0

tdf#160582 Preserve settings saving in csv import dialog
Also, improve detection algorithm by replacing the limit
of 20 lines with a time limit of 500ms.

Change-Id: Iac519b6ebe675b91ce84b900646d9d320ea9ddc1
Reviewed-on: https://gerrit.libreoffice.org/c/core/+/165905
Comment 3 Mateusz Wlazłowski 2025-04-23 18:11:22 UTC
Created attachment 200478
Text to column on A1 on Linux and A2 on Windows

What do you mean by garbled? Can you show us a screenshot?


For me, on Linux and Windows, when I run Text to Columns on A1, the preview shows Chinese characters. Is this bug report about that?

On Windows, the Japanese characters appear to be rendered at low resolution. I guess that's a separate issue.


Version: 25.2.2.2 (X86_64) / LibreOffice Community
Build ID: 7370d4be9e3cf6031a51beef54ff3bda878e3fac
CPU threads: 8; OS: Linux 6.11; UI render: default; VCL: gtk3
Locale: en-US (en_US.UTF-8); UI: en-US
Flatpak
Calc: threaded
Comment 4 Saburo 2025-04-23 21:38:11 UTC
(In reply to opp from comment #3)
> What do you mean by garbled ? Can you show us a screenshot?
Same as your screenshot.

Isn't it fair to call it garbled when the A1 cell 'あいう えおか' is displayed as Chinese-like characters?

I rely on machine translation, so my wording may not be appropriate.
Comment 5 Saburo 2025-04-23 23:23:32 UTC
Created attachment 200488
screenshots

I think there is a bug because when I select two lines and run it, one line becomes unreadable text.
Comment 6 Mateusz Wlazłowski 2025-04-24 15:09:05 UTC
(In reply to Saburo from comment #4)

> Isn't it fair to call it garbled when the A1 cell 'あいう えおか' is
> displayed as Chinese-like characters?

I don't know


I confirm the bug
Comment 7 Takenori Yasuda 2025-07-14 01:09:53 UTC
The supposedly garbled string appears to have undergone a byte swap between the high and low bytes of each Unicode code point.

Using Attachment 200478 as an example:

- あいう えおか -> U+3042 U+3044 U+3046 U+3000 U+3048 U+304A U+304B
- 䈰䐰䘰0䠰䨰䬰 -> U+4230 U+4430 U+4630 U+0030 U+4830 U+4A30 U+4B30

This clearly shows that each code point has had its bytes reversed, suggesting an endian-related issue in how the text is being handled internally.
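
The byte swap described above can be reproduced outside LibreOffice with a few lines of Python. This is only an illustrative sketch of the symptom, not LibreOffice's actual code path: it assumes the text is held as UTF-16 and read back with the opposite byte order, and the helper name swap_utf16_bytes is hypothetical.

```python
def swap_utf16_bytes(text: str) -> str:
    """Encode text as UTF-16LE, then decode the same bytes as UTF-16BE.

    This swaps the high and low byte of every 16-bit code unit, which is
    the transformation observed in the garbled preview.
    """
    return text.encode("utf-16-le").decode("utf-16-be")


# Sample string from the screenshot; \u3000 is the full-width space.
original = "あいう\u3000えおか"
garbled = swap_utf16_bytes(original)

# Print both strings with their code points for comparison.
for label, s in (("original", original), ("garbled", garbled)):
    print(label, s, " ".join(f"U+{ord(c):04X}" for c in s))
```

Under that assumption, the full-width space U+3000 becomes U+0030 (the digit 0), which would be consistent with the separator no longer being recognized once the text is mangled. Applying the same swap a second time restores the original string.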