Bug 108789 - Thai character encoding mismatch
Summary: Thai character encoding mismatch
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version: 5.4.0.1 rc (earliest affected)
Hardware: x86 (IA32) Linux (All)
Importance: medium normal
Assignee: Julien Nabet
URL:
Whiteboard: target:6.0.0 target:5.4.0.3 target:5.3.5
Keywords:
Depends on:
Blocks:
 
Reported: 2017-06-26 13:04 UTC by Viruch Hemapanpairo
Modified: 2017-07-20 20:49 UTC
CC List: 3 users

See Also:
Crash report or crash signature:


Attachments
encoding mismatched of LO5.4rc1 (260.70 KB, image/png)
2017-06-26 13:04 UTC, Viruch Hemapanpairo
correct result of LO5.3.4.2 (260.50 KB, image/png)
2017-06-26 13:05 UTC, Viruch Hemapanpairo
DBF file with items of TiS-620 Thai characters (798.96 KB, application/x-dbf)
2017-06-26 22:26 UTC, Viruch Hemapanpairo
hexdump of DBF file that contains TIS-620 encoding (480 bytes, text/plain)
2017-06-28 02:14 UTC, Viruch Hemapanpairo
hexdump of DBF file that contains TIS-620 encoding (480 bytes, text/plain)
2017-06-28 02:15 UTC, Viruch Hemapanpairo
hexdump of DBF file that contains TIS-620 encoding (480 bytes, text/plain)
2017-06-28 02:15 UTC, Viruch Hemapanpairo
hexdump of DBF file that contains TIS-620 encoding (480 bytes, text/plain)
2017-06-28 02:17 UTC, Viruch Hemapanpairo
hexdump of DBF file that contains TIS-620 encoding (480 bytes, text/plain)
2017-06-28 02:17 UTC, Viruch Hemapanpairo
hexdump of DBF file that contains TIS-620 encoding (480 bytes, text/plain)
2017-06-28 02:17 UTC, Viruch Hemapanpairo
2017-07-04 daily built test screen shot (285.49 KB, image/png)
2017-07-04 21:46 UTC, Viruch Hemapanpairo

Description Viruch Hemapanpairo 2017-06-26 13:04:37 UTC
Created attachment 134291 [details]
encoding mismatched of LO5.4rc1

There seems to be a problem when importing a DBF file with Thai (TIS-620) encoding into LibreOffice 5.4 rc1 Calc. This problem did not occur in earlier versions.

My laptop: Lenovo T430
OS: Lubuntu 14.04
LibreOffice version:
Version: 5.4.0.1
Build ID: 962a9c4e2f56d1dbdd354b1becda28edd471f4f2
CPU threads: 4; OS: Linux 3.13; UI render: default; VCL: gtk2; 
Locale: en-US (en_US.UTF-8); Calc: group

I also attached 2 screenshots from different LO versions for comparison. The correct one is from LibreOffice 5.3.4.2.
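
For readers unfamiliar with this kind of symptom, here is a minimal sketch of what an encoding mismatch looks like. It does not use the attached file: the sample word is an assumption chosen only for illustration. TIS-620 stores each Thai character as a single byte above 0x7F, so reading the same bytes under a Western code page such as IBM 850 produces unrelated characters.

```python
# Encode a Thai word as TIS-620, the way text would be stored in a DBF record.
thai_text = "ไทย"  # sample text, chosen only for illustration
raw = thai_text.encode("tis_620")

# Decoding with the matching codec round-trips the text.
decoded_ok = raw.decode("tis_620")

# Decoding the same bytes as IBM code page 850 succeeds without error,
# but every byte maps to an unrelated character (mojibake).
decoded_wrong = raw.decode("cp850")

print(raw.hex(" "))   # the three stored bytes
print(decoded_ok)     # the original Thai text
print(decoded_wrong)  # garbled text, one character per byte
```

Because both decodes succeed, nothing signals the error at import time; the only visible sign is the garbled cell content.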
Comment 1 Viruch Hemapanpairo 2017-06-26 13:05:33 UTC
Created attachment 134292 [details]
correct result of LO5.3.4.2
Comment 2 Xisco Faulí 2017-06-26 15:15:07 UTC Comment hidden (obsolete)
Comment 3 Julien Nabet 2017-06-26 15:29:09 UTC
I'll take a look; this might be a regression from my dbf patches.

Could you attach the dbf file so I can test with it?
Without it, this could be far more difficult to debug.
Comment 4 Viruch Hemapanpairo 2017-06-26 22:26:14 UTC
Created attachment 134302 [details]
DBF file with items of TiS-620 Thai characters

I have attached a file with more records containing TIS-620 Thai characters for your investigation.
Comment 5 Julien Nabet 2017-06-27 05:15:20 UTC
On a Debian x86-64 PC with master sources updated yesterday, I could reproduce this.
With LO 5.2.7 (Debian package), I can't reproduce it.
I confirm the bug and that it is a regression.

hexdump shows this:
0000000 7503 1b06 2046 0000 00c1 0063 0000 0000
0000010 0000 0000 0000 0000 0000 0000 1b00 0000

0x1B (the byte at offset 0x1D, shown as the word 1b00 on the second line) isn't listed in the encodings.
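
A minimal sketch of how the two hexdump lines above can be read. It assumes the default hexdump output format (16-bit little-endian words, so the two bytes of each word appear swapped) and the commonly documented dBASE III header layout: version at byte 0, last-update date at bytes 1-3, record count at bytes 4-7, header length at bytes 8-9, record length at bytes 10-11, and the language driver ID at byte 29. The function and variable names are illustrative only.

```python
import struct

# The two hexdump lines quoted above.
DUMP = """\
0000000 7503 1b06 2046 0000 00c1 0063 0000 0000
0000010 0000 0000 0000 0000 0000 0000 1b00 0000
"""

def words_to_bytes(dump: str) -> bytes:
    """Turn default hexdump output (16-bit words) back into the raw bytes."""
    out = bytearray()
    for line in dump.splitlines():
        for word in line.split()[1:]:             # skip the offset column
            out += struct.pack("<H", int(word, 16))  # low byte comes first
    return bytes(out)

header = words_to_bytes(DUMP)

version = header[0]
# The date is stored as (year - 1900, month, day), one byte each.
year, month, day = header[1] + 1900, header[2], header[3]
# Record count (uint32), header length (uint16), record length (uint16).
n_records, header_len, record_len = struct.unpack_from("<IHH", header, 4)
language_driver = header[29]

print(f"version=0x{version:02x} date={year}-{month:02d}-{day:02d}")
print(f"records={n_records} header_len={header_len} record_len={record_len}")
print(f"language driver ID=0x{language_driver:02x}")
```

Under these assumptions the header describes 8262 records of 99 bytes after a 193-byte header, which is in line with the 798.96 KB size listed for the attached DBF file, and the language driver byte is 0x1b.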
Comment 6 Julien Nabet 2017-06-27 18:41:27 UTC
Argh my fix for tdf#55631 is wrong.

It seems that, unless the encoding can be read from the header of the dbf file, it will always be RTL_TEXTENCODING_IBM_850, even if the user selects another encoding.
:-(

I'm going to try to understand why the first patch https://cgit.freedesktop.org/libreoffice/core/commit/?id=9fe9685627c51926459a897594ead9f64deee579 was considered wrong (see https://gerrit.libreoffice.org/#/c/38627/).
Comment 7 Julien Nabet 2017-06-27 19:26:03 UTC
OK, 850 is transformed to "DONTKNOW", so LO tries to read the encoding from the header.

In addition, since 0x1b isn't listed in the switch, the header encoding can't be used here even if 850 is selected.

I didn't find any website indicating that 0x1b corresponds to "Thai ISO 8859-11/TIS 620".
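
The interaction described above can be sketched as follows. This is a hypothetical illustration, not LibreOffice's code: the function name, the table, and the fallback are all assumptions, and the table is only a small illustrative subset of language driver IDs. It shows why an unlisted header byte leaves nothing but the fallback once the user's choice has been turned into "don't know".

```python
from typing import Optional

# Illustrative subset of language driver IDs (header byte 29) and the Python
# codecs commonly associated with them. This is NOT LibreOffice's table.
LDID_TO_CODEC = {
    0x01: "cp437",   # US MS-DOS
    0x02: "cp850",   # International MS-DOS
    0x03: "cp1252",  # Windows ANSI
}

def choose_codec(user_choice: Optional[str], header_ldid: int,
                 fallback: str = "cp850") -> str:
    """Pick the codec used to decode text fields.

    user_choice is None when the choice is treated as "don't know".
    """
    if user_choice is not None:
        return user_choice                       # explicit choice wins
    # No usable choice: consult the header, then fall back if it is unlisted.
    return LDID_TO_CODEC.get(header_ldid, fallback)

print(choose_codec(None, 0x1B))        # unlisted header byte, fallback used
print(choose_codec("tis_620", 0x1B))   # explicit choice is honoured
print(choose_codec(None, 0x03))        # listed header byte is used
```

In this sketch the only way to decode a file with header byte 0x1b as Thai is for the explicit choice to reach the decoder unchanged, or for 0x1b to be added to the table.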

Viruch:
could you run hexdump <file> | head -10
to see whether all dbf files containing TIS-620 encoding have this 1b on the second line? (At exactly the same location as in the example already quoted:
0000000 7503 1b06 2046 0000 00c1 0063 0000 0000
0000010 0000 0000 0000 0000 0000 0000 1b00 0000
)
If so, we could start by adding it to the switch.
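
As an alternative to comparing hexdump output by eye, the same check can be sketched in a few lines. The helper name is hypothetical, and the sketch assumes the byte in question sits at offset 29 (0x1D) of the 32-byte file header, which is where 1b appears in the example above.

```python
from typing import Dict, Iterable

LDID_OFFSET = 29  # 0x1D: the byte shown as "1b" in the quoted dump

def language_driver_ids(paths: Iterable[str]) -> Dict[str, int]:
    """Map each file path to the byte stored at offset 29 of its header."""
    result = {}
    for path in paths:
        with open(path, "rb") as fh:
            header = fh.read(32)          # only the fixed header is needed
        if len(header) < 32:
            raise ValueError(f"{path}: too short to contain a DBF header")
        result[str(path)] = header[LDID_OFFSET]
    return result
```

Calling language_driver_ids() with a list of DBF paths and printing each value in hexadecimal shows at a glance whether every file carries the same byte.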
Comment 8 Viruch Hemapanpairo 2017-06-28 02:12:43 UTC
Attached are the hexdump results of 6 different files containing TIS-620 text. Having opened these files with "hexedit", it seems that the "Field_Names" of each .DBF file start at position 0000020 in every file. Some of the dumps therefore look nearly identical, because the files have similar field names.
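
A minimal sketch of reading those field names, assuming the commonly documented dBASE III layout: 32-byte field descriptors starting at offset 0x20, each with an 11-byte null-padded name, a one-byte type at offset 11 and a one-byte length at offset 16, with the list ended by a 0x0D byte. The field names in the sample are hypothetical and are not taken from the attached files.

```python
import struct
from typing import List, Tuple

def field_descriptors(data: bytes) -> List[Tuple[str, str, int]]:
    """Return (name, type, length) for each field descriptor in a DBF header."""
    fields = []
    pos = 32                                    # descriptors start at 0x20
    # Stop at the 0x0D terminator or when no full descriptor remains.
    while pos + 32 <= len(data) and data[pos] != 0x0D:
        raw_name, ftype, length = struct.unpack_from("<11sc4xB", data, pos)
        name = raw_name.split(b"\x00", 1)[0].decode("ascii", "replace")
        fields.append((name, ftype.decode("ascii", "replace"), length))
        pos += 32
    return fields

def make_descriptor(name: str, ftype: str, length: int) -> bytes:
    """Build one 32-byte descriptor for the synthetic example below."""
    return struct.pack("<11sc4xB15x", name.encode("ascii"),
                       ftype.encode("ascii"), length)

# Synthetic header: 32 zero bytes, two descriptors, then the terminator.
sample = (bytes(32)
          + make_descriptor("NAME", "C", 30)
          + make_descriptor("QTY", "N", 8)
          + b"\x0d")
print(field_descriptors(sample))
```

Since the descriptors hold only names, types, and lengths, two files with the same columns would be expected to have matching bytes in this region regardless of their record contents.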
Comment 9 Viruch Hemapanpairo 2017-06-28 02:14:31 UTC
Created attachment 134332 [details]
hexdump of DBF file that contains TIS-620 encoding
Comment 10 Viruch Hemapanpairo 2017-06-28 02:15:14 UTC
Created attachment 134333 [details]
hexdump of DBF file that contains TIS-620 encoding
Comment 11 Viruch Hemapanpairo 2017-06-28 02:15:42 UTC
Created attachment 134334 [details]
hexdump of DBF file that contains TIS-620 encoding
Comment 12 Viruch Hemapanpairo 2017-06-28 02:17:08 UTC
Created attachment 134335 [details]
hexdump of DBF file that contains TIS-620 encoding
Comment 13 Viruch Hemapanpairo 2017-06-28 02:17:34 UTC
Created attachment 134336 [details]
hexdump of DBF file that contains TIS-620 encoding
Comment 14 Viruch Hemapanpairo 2017-06-28 02:17:59 UTC
Created attachment 134337 [details]
hexdump of DBF file that contains TIS-620 encoding
Comment 15 Commit Notification 2017-07-02 11:33:56 UTC
Lionel Elie Mamane committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=7f1465a9599e9665159dd2d823a6e9064cca5703

tdf#108789 and others: overhaul DBase files encoding handling

It will be available in 6.0.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 16 Commit Notification 2017-07-04 07:33:31 UTC
Lionel Elie Mamane committed a patch related to this issue.
It has been pushed to "libreoffice-5-4":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=6e0eafe576436ec229c6d90f654ff1b11ff9bdfd&h=libreoffice-5-4

tdf#108789: branch 5.4 only

It will be available in 5.4.0.2.

Comment 17 Viruch Hemapanpairo 2017-07-04 21:45:03 UTC
It seems to be that the bug was already solved in the latest daily build version as tested on Lubuntu 14.04
Version: 5.4.0.1.0+
Build ID: 6e0eafe576436ec229c6d90f654ff1b11ff9bdfd
CPU threads: 4; OS: Linux 3.13; UI render: default; VCL: gtk2; 
TinderBox: Linux-rpm_deb-x86@71-TDF, Branch:libreoffice-5-4, Time: 2017-07-04_07:33:47
Locale: en-US (en_US.UTF-8); Calc: single
Comment 18 Viruch Hemapanpairo 2017-07-04 21:46:20 UTC
Created attachment 134485 [details]
2017-07-04 daily built test screen shot
Comment 19 Julien Nabet 2017-07-06 05:14:08 UTC
(In reply to Viruch Hemapanpairo from comment #17)
> It seems to be that the bug was already solved in the latest daily build
> ...
There's still https://gerrit.libreoffice.org/#/c/39449/ for the 5.3 branch, but let's set this one to FIXED.
Comment 20 Commit Notification 2017-07-06 18:03:55 UTC
Julien Nabet committed a patch related to this issue.
It has been pushed to "libreoffice-5-3":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=91bafeff8b8a195d9ecc242c0def413361161d79&h=libreoffice-5-3

tdf#108789 quick fix for 5.3 branch only

It will be available in 5.3.5.

Comment 21 Commit Notification 2017-07-19 19:25:45 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=22ae038a56b85e86219922c2759544545f2d813d

Fix crash when saving new spreadsheet as dBase/.dbf, tdf#108789 follow-up

It will be available in 6.0.0.

Comment 22 Commit Notification 2017-07-20 07:59:22 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "libreoffice-5-4":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=e059303c2dc6ecac5247c315f5b452d346512c12&h=libreoffice-5-4

Fix crash when saving new spreadsheet as dBase/.dbf, tdf#108789 follow-up

It will be available in 5.4.1.

Comment 23 Commit Notification 2017-07-20 20:00:38 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "libreoffice-5-4-0":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=a9950f79867c4a7c2d4c542db754c5d24028fadf&h=libreoffice-5-4-0

Fix crash when saving new spreadsheet as dBase/.dbf, tdf#108789 follow-up

It will be available in 5.4.0.
