Bug 104927 - Text Import - fixed width mode not adjusting csvtablebox for multi-byte fonts
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version: 5.2.3.3 release (earliest affected)
Hardware: x86-64 (AMD64) Windows (All)
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard: target:7.2.0 target:7.4.0 target:7.3....
Keywords:
Depends on:
Blocks: CJK CSV-Import
Reported: 2016-12-26 06:57 UTC by Dragon Chuang
Modified: 2022-03-21 09:51 UTC (History)

See Also:
Crash report or crash signature:


Attachments
Text File, data width 58 bytes each line (40.43 KB, image/png)
2016-12-26 06:57 UTC, Dragon Chuang
Calc Text Import wrong data width (54.59 KB, image/png)
2016-12-26 06:58 UTC, Dragon Chuang
Import file (fixed width CSV) (1.52 KB, text/plain)
2016-12-27 04:03 UTC, Dragon Chuang
Excel Text Import (26.95 KB, image/png)
2016-12-27 04:15 UTC, Dragon Chuang

Description Dragon Chuang 2016-12-26 06:57:48 UTC
Created attachment 129941 [details]
Text File, data width 58 bytes each line

Text Import uses an incorrect data width for non-ASCII character sets.

The width should be counted in bytes, not characters.
In a DBCS encoding (Chinese, Japanese), one character occupies two bytes.

Thank you.

Text File, data width 58 bytes each line.
Calc Text Import, data width 33 bytes(?) each line.
Comment 1 Dragon Chuang 2016-12-26 06:58:52 UTC
Created attachment 129942 [details]
Calc Text Import wrong data width
Comment 2 Xisco Faulí 2016-12-26 12:46:34 UTC
Hello Dragon,

Thank you for reporting the bug. Please attach a sample document, as this makes it easier for us to verify the bug. 
I have set the bug's status to 'NEEDINFO'. Please change it back to 'UNCONFIRMED' once the requested document is provided.
(Please note that the attachment will be public; remove any sensitive information before attaching it.
See https://wiki.documentfoundation.org/QA/FAQ#How_can_I_eliminate_confidential_data_from_a_sample_document.3F for help on how to do so.)
Comment 3 V Stuart Foote 2016-12-26 15:12:34 UTC
No issue with fixed column import of utf-8 encoded CJK fonts with
Version: 5.2.4.1 (x64)
Build ID: 9b50003582f07ac674d6451e411e9b77cccd2b22
CPU Threads: 8; OS Version: Windows 6.19; UI Render: default; 
Locale: en-US (en_US); Calc: group

Please post your sample document with the Windows-950 encoding.

And personally, with that data set I would reduce the white space and use a "space" (or a replacement character inserted while editing) as the separator for a delimited CSV import.
Comment 4 Dragon Chuang 2016-12-27 04:03:18 UTC
Created attachment 129955 [details]
Import file (fixed width CSV)

Text file for import.
Comment 5 Dragon Chuang 2016-12-27 04:15:30 UTC
Created attachment 129956 [details]
Excel Text Import

I cannot use a "space" as the separator for a delimited CSV import, because the source file is fixed width. Some "space" columns are empty data fields, not delimiters.

(Microsoft Excel imports this file correctly.)
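
A hedged sketch of why space-delimited splitting loses information on such a file. The helper `split_fixed_width`, the sample row, and the byte widths are all hypothetical; the sketch also assumes that every column boundary falls on a character boundary in the chosen encoding.

```python
from typing import List, Sequence


def split_fixed_width(line: str, byte_widths: Sequence[int],
                      encoding: str = "big5") -> List[str]:
    """Split a line into fields at byte offsets (hypothetical helper)."""
    raw = line.encode(encoding)
    fields = []
    pos = 0
    for width in byte_widths:
        # Slice by bytes, then decode and drop the padding spaces.
        fields.append(raw[pos:pos + width].decode(encoding).rstrip())
        pos += width
    return fields


# Hypothetical row: 10-byte name, 6-byte field left blank, 4-byte city.
row = "王小明" + " " * 4 + " " * 6 + "台北"

print(split_fixed_width(row, [10, 6, 4]))  # ['王小明', '', '台北']
print(row.split())                         # ['王小明', '台北']
```

The byte-offset split keeps the empty middle field, while whitespace splitting silently drops it and shifts the later columns.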
Comment 6 V Stuart Foote 2016-12-27 15:15:22 UTC
Confirmed.

The "fixed width" Big5-encoded Traditional Chinese sample document, or a similar one in UTF-8, is not handled correctly by the csvtablebox GUI.

The ruler and column selection do not adjust for multi-byte characters, so the column positions for the "fixed width" import are wrong and cannot be set.

The actual import does honor the encoding and field positions as set, but the GUI (ruler and grid) has the wrong layout, so it is impossible to set column widths correctly.

Testing on Windows 10 Pro 64-bit en-US with
Version: 5.2.4.1 (x64)
Build ID: 9b50003582f07ac674d6451e411e9b77cccd2b22
CPU Threads: 8; OS Version: Windows 6.19; UI Render: default; 
Locale: en-US (en_US); Calc: group

On opening in Calc, the document correctly triggers the Text Import dialog and is detected as Chinese Traditional and fixed width, but the encoding is initially identified as UTF-8. Changing that to Chinese Traditional (Big5) renders the glyphs correctly in the GUI.

=-ref-=
http://opengrok.libreoffice.org/xref/core/sc/source/ui/dbgui/csvtablebox.cxx
http://opengrok.libreoffice.org/xref/core/sc/source/ui/inc/csvtablebox.hxx
Comment 7 Caolán McNamara 2017-02-01 21:24:39 UTC
I don't think it's anything to do with double-byte fonts or encodings, just that we're assuming that the CJK font width is the same as the Western font width.
Comment 8 QA Administrators 2018-07-21 02:40:55 UTC Comment hidden (obsolete)
Comment 9 Dragon Chuang 2018-08-13 07:18:17 UTC
The bug is still present.
It can be reproduced in LibreOffice version 6.1.0.3 (x64).
Build ID: efb621ed25068d70781dc026f7e9c5187a4decd1
Comment 10 Commit Notification 2021-01-09 04:14:22 UTC
Mark Hung committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/621c189173b35ac7f5ce4c578f57045479c63ab6

tdf#104927 consider character width for CSV import

It will be available in 7.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 11 Commit Notification 2022-03-18 10:08:06 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/6b768542ddd52573bbdb0e7b5b85ce5a9dd4551d

Resolves: tdf#148054 Advance offset for all columns, tdf#104927 regression

It will be available in 7.4.0.

Comment 12 Commit Notification 2022-03-19 19:25:16 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "libreoffice-7-3":

https://git.libreoffice.org/core/commit/1ec1e1b6f9353159376419d37ac8d8584312671e

Resolves: tdf#148054 Advance offset for all columns, tdf#104927 regression

It will be available in 7.3.3.

Comment 13 Commit Notification 2022-03-21 09:51:40 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "libreoffice-7-2":

https://git.libreoffice.org/core/commit/479620d84a761297e013ca76fd429d938f3d2d8f

Resolves: tdf#148054 Advance offset for all columns, tdf#104927 regression

It will be available in 7.2.7.
