Bug 116171 - Can't open *.DBF files created by MS Visual FoxPro
Summary: Can't open *.DBF files created by MS Visual FoxPro
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version (earliest affected): 5.4.0.2 rc
Hardware: x86-64 (AMD64) All
Importance: medium normal
Assignee: Stephan Bergmann
URL:
Whiteboard: target:6.1.0 target:6.0.3 target:5.4.6
Keywords: bibisectRequest, regression
Depends on:
Blocks:
 
Reported: 2018-03-04 05:31 UTC by chichang4911
Modified: 2018-03-09 05:15 UTC
CC List: 6 users

See Also:
Crash report or crash signature:


Attachments
sample files and description in zh-TW (49.75 KB, application/x-zip-compressed)
2018-03-04 05:39 UTC, chichang4911
5.4.0.2 failed to open the file (9.58 KB, image/png)
2018-03-04 07:38 UTC, Franklin Weng
5.4.0.1 could open but the encoding is not correctly handled (160.07 KB, image/png)
2018-03-04 07:40 UTC, Franklin Weng
In 5.4.0.0 beta2 it could open and show correctly (187.23 KB, image/png)
2018-03-04 07:41 UTC, Franklin Weng
bt from console log (12.05 KB, text/plain)
2018-03-04 10:45 UTC, Julien Nabet
bt concerning charset mapping (3.04 KB, text/plain)
2018-03-04 14:04 UTC, Julien Nabet

Description chichang4911 2018-03-04 05:31:59 UTC
Description:
The problem is as described in the summary.
When I
1. Choose File - Open.
2. Locate the *.dbf file that I want to import.
3. Click Open.
the Import dBASE Files dialog is not shown as it was before (v5.4.x.x).

However, once this file has been opened in LO v5.4 and saved as a new file in dbf format, the new file can be opened by v6.0.

Steps to Reproduce:
1. Choose File - Open.
2. Locate the *.dbf file that you want to import.
3. Click Open.

Actual Results:
A read error message is shown.

Expected Results:
The Import dBASE Files dialog pops up.


Reproducible: Always


User Profile Reset: Yes


OpenGL enabled: Yes

Additional Info:
The problem never occurred in v5.4.x.x or any earlier version, going back to OOo v1.


User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:47.0) Gecko/20100101 Firefox/47.0
Comment 1 chichang4911 2018-03-04 05:39:56 UTC
Created attachment 140324 [details]
sample files and description in zh-TW
Comment 2 Franklin Weng 2018-03-04 06:03:24 UTC
(In reply to chichang4911 from comment #1)
> Created attachment 140324 [details]
> sample files and description in zh-TW

Additional information in the zip file:

addresses.dbf is the original file generated by Visual FoxPro v6.0. It cannot be opened by LibreOffice 6.0.2, which shows "Read Error.  Impossible to connect to file."

addresses_v5.dbf was generated by successfully opening addresses.dbf in Calc v5.x and saving it as a new dbf file.  It can be opened by Calc 6.0.2.

addresses_mad.dbf was generated by successfully opening addresses.dbf in madedit and saving it as a new dbf file.  It can also be opened by Calc 6.0.2.
Comment 3 Franklin Weng 2018-03-04 07:38:54 UTC
Created attachment 140325 [details]
5.4.0.2 failed to open the file

Bisected:

5.4.0.1 could open the file (but with another problem); 5.4.0.2 failed to open it.

Tested under Linux (Kubuntu 16.04).
Comment 4 Franklin Weng 2018-03-04 07:40:21 UTC
Created attachment 140326 [details]
5.4.0.1 could open but the encoding is not correctly handled

5.4.0.1 could open the file, but when Big5 was chosen as the encoding, the Chinese characters were not shown correctly.
Comment 5 Franklin Weng 2018-03-04 07:41:20 UTC
Created attachment 140327 [details]
In 5.4.0.0 beta2 it could open and show correctly

In 5.4.0.0 beta2 it could be opened and shown correctly with encoding Big5.
Comment 6 Franklin Weng 2018-03-04 07:42:27 UTC
Note that my testing (and bisecting) was done under Linux.  The original reporter is using Windows.  According to his report (to us), he could open the file correctly with 5.4.4 (Windows version).
Comment 7 raal 2018-03-04 08:10:36 UTC
(In reply to Franklin Weng from comment #6)
> Notice that my testing (and bisecting) is under Linux.  The original
> reporter is using Windows.  According to his report (to us) he could open it
> with 5.4.4 (Windows version) correctly.

Hi chichang4911,
please test it with a newer version and let us know if it works. Thank you.

http://www.libreoffice.org/download/libreoffice-fresh/
Comment 8 raal 2018-03-04 08:30:30 UTC
I can confirm with the file addresses.dbf [Chinese Big5 encoding]
Version: 6.1.0.0.alpha0+
Build ID: 44b4ad7d210097fdaed7dd94c5746b03f43592d3
CPU threads: 4; OS: Linux 4.4; UI render: default; VCL: gtk3; 
and Version: 6.1.0.0.alpha0+
Build ID: e108a31a8fee09c2fa4031e45e45ed73bbdb7c6f
CPU threads: 1; OS: Windows 10.0; UI render: default; 
TinderBox: Win-x86@42, Branch:master, Time: 2018-03-03_23:36:02

Regression; works in LO 4.4 on Linux.
Comment 9 Julien Nabet 2018-03-04 09:47:35 UTC
On a Debian x86-64 PC with master sources updated a few days ago, I could reproduce this.
For information, the file can be opened from Base:
1) Launch Base
2) Choose "Connect to an existing DB" then "dBASE"
3) Select path where dbase file is.
4) Click "Finish"
5) Choose file name for the odb file

I'll give it a try.
Comment 10 Julien Nabet 2018-03-04 10:05:09 UTC
Just to be sure:
I used hexdump to view the first 32 bytes of addresses.dbf and got this:
00000000  30 12 03 03 06 00 00 00  48 03 d8 01 00 00 00 00  |0.......H.......|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 03 78 00 00  |.............x..|

According to https://www.clicketyclick.dk/databases/xbase/format/dbf.html#DBF_STRUCT, byte 29 (the 30th byte, since counting starts from 0) is hex 78 (120 in decimal).
In https://opengrok.libreoffice.org/xref/core/include/rtl/textenc.h, I don't see the value 120.

Could you indicate the precise encoding of the file?
Perhaps I'm wrong or missed something.
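
The header check described above can be reproduced with a short script. This is only a minimal sketch: it reuses the 32 bytes from the hexdump in this comment, and the helper name parse_dbf_header is made up for illustration, not something from LibreOffice.

```python
# First 32 bytes of addresses.dbf, copied from the hexdump in this comment.
HEADER_HEX = (
    "30 12 03 03 06 00 00 00 48 03 d8 01 00 00 00 00 "
    "00 00 00 00 00 00 00 00 00 00 00 00 03 78 00 00"
)


def parse_dbf_header(header: bytes) -> dict:
    """Hypothetical helper: pick out the two header bytes discussed here.

    Offset 0 holds the file version, offset 29 the encoding byte
    (both offsets are zero-based).
    """
    if len(header) < 32:
        raise ValueError("expected at least the first 32 header bytes")
    return {"version": header[0], "encoding_byte": header[29]}


info = parse_dbf_header(bytes.fromhex(HEADER_HEX))
print(f"version=0x{info['version']:02x}")
print(f"encoding_byte=0x{info['encoding_byte']:02x} ({info['encoding_byte']} decimal)")
```

To inspect a real file instead, the same helper could be fed the result of open(path, "rb").read(32).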
Comment 11 Julien Nabet 2018-03-04 10:45:57 UTC
Created attachment 140330 [details]
bt from console log

On console, I noticed this log:
DBaseImport: dbtools::OCharsetMap doesn't know text encoding
So here's the bt from this point.
Comment 12 Julien Nabet 2018-03-04 10:50:39 UTC
First byte of dbf file corresponds to the version (see https://www.clicketyclick.dk/databases/xbase/format/dbf.html#DBF_STRUCT)
Mapping of version value:
https://www.clicketyclick.dk/databases/xbase/format/dbf.html#DBF_NOTE_1_TARGET
(note that it might differ slightly in other references)

addresses.dbf => 30 (Visual FoxPro, as expected I suppose)
addresses-v5.dbf => 83 (File with DBT). If I remember correctly, this is the default version LO uses (the DBT is for MEMO fields, which may be used).
addresses-mad.dbf => 03 (File without DBT)

But the problem is not the version; it's the unrecognized encoding "78".

addresses-mad.dbf also has 78,
and addresses-v5.dbf has 0 (equivalent to "don't know").
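
The observations in this comment can be put side by side with a small lookup. This is a sketch only: the labels are the ones quoted above (other references may differ), and the names DBF_VERSION_LABELS, OBSERVED and describe_version are invented for illustration.

```python
# Version-byte labels exactly as quoted in this comment; other references
# may name them slightly differently.
DBF_VERSION_LABELS = {
    0x30: "Visual FoxPro",
    0x83: "File with DBT",
    0x03: "File without DBT",
}

# (version byte, encoding byte at offset 29) observed for the sample files.
OBSERVED = {
    "addresses.dbf": (0x30, 0x78),
    "addresses-v5.dbf": (0x83, 0x00),
    "addresses-mad.dbf": (0x03, 0x78),
}


def describe_version(version_byte: int) -> str:
    """Return the label for a version byte, or "unknown" if it is not listed."""
    return DBF_VERSION_LABELS.get(version_byte, "unknown")


for name, (version, encoding) in OBSERVED.items():
    print(
        f"{name}: version 0x{version:02x} ({describe_version(version)}), "
        f"encoding byte 0x{encoding:02x}"
    )
```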
Comment 13 chichang4911 2018-03-04 12:49:37 UTC
(In reply to raal from comment #7)
> (In reply to Franklin Weng from comment #6)
> > Notice that my testing (and bisecting) is under Linux.  The original
> > reporter is using Windows.  According to his report (to us) he could open it
> > with 5.4.4 (Windows version) correctly.
> 
> Hi chichang4911, 
> please test it with newer version and let us know if it works. Thank you.
> 
> http://www.libreoffice.org/download/libreoffice-fresh/

I tested more versions of LibreOffice.
Test OS: Windows 10 (64-bit)
Test files: addresses.dbf, addresses-mad.dbf, addresses-v5.dbf
Results:
            addresses   addresses-mad   addresses-v5
v5.1.6.2    ok          ok              ok
v5.2.7.2    ok          ok              ok
v5.3.7.2    ok          ok              ok
v5.4.0.1    ok          ok              ok
v5.4.0.2    xx          xx              ok
v5.4.1.2    xx          xx              ok
v5.4.4.2    xx          xx              ok
Comment 14 Julien Nabet 2018-03-04 12:56:14 UTC
(In reply to chichang4911 from comment #13)
> ...
> I will try to test more versions of Libreoffice,
> ...
Thank you for your feedback, but above all, as asked in comment 10, could you indicate the precise encoding of the file (addresses.dbf)?
Comment 15 Franklin Weng 2018-03-04 12:59:09 UTC
(In reply to Julien Nabet from comment #14)
> (In reply to chichang4911 from comment #13)
> > ...
> > I will try to test more versions of Libreoffice,
> > ...
> Thank you for your feedback, but above all, as asked in comment 10, could you
> indicate the precise encoding of the file? (addresses.dbf)

As mentioned in comment #4, the encoding is zh_TW.Big5.  Versions up to 5.4.0.0 beta2 showed Chinese correctly when Big5 was chosen as the encoding.  That broke in 5.4.0.1, although the file could still be opened.
Comment 16 Julien Nabet 2018-03-04 14:04:05 UTC
Created attachment 140333 [details]
bt concerning charset mapping

So hex 78 is handled here in dbfDecodeCharset and mapped to RTL_TEXTENCODING_MS_950.
See https://opengrok.libreoffice.org/xref/core/connectivity/source/commontools/dbtools.cxx#2020
See the attached bt.

Quote from https://en.wikipedia.org/wiki/Code_page_950, "Code page 950 is Microsoft's implementation of the de facto standard Big5".
So there is no problem in this part.
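
As an illustration of the mapping described above (encoding byte hex 78 to code page 950), here is a minimal Python sketch. It is not the LibreOffice code: the table ENCODING_BYTE_TO_CODEC and the function decode_field are hypothetical, only the 0x78 entry comes from this thread, and the sample text is made up. Python's built-in "cp950" codec stands in for RTL_TEXTENCODING_MS_950.

```python
# Hypothetical lookup table; only the 0x78 -> code page 950 entry is taken
# from this thread. Python's "cp950" codec is Microsoft's Big5 variant.
ENCODING_BYTE_TO_CODEC = {0x78: "cp950"}


def decode_field(raw: bytes, encoding_byte: int) -> str:
    """Decode one fixed-width character field using the header's encoding byte."""
    codec = ENCODING_BYTE_TO_CODEC.get(encoding_byte)
    if codec is None:
        raise LookupError(f"unknown encoding byte 0x{encoding_byte:02x}")
    # dBASE character fields are padded with spaces to their fixed width.
    return raw.decode(codec).rstrip(" ")


# Made-up sample: a city name stored in a 10-byte field.
sample_field = "台北市".encode("cp950").ljust(10, b" ")
print(decode_field(sample_field, 0x78))
```

A byte that is missing from the table raises LookupError here, which is only one possible way of reporting an encoding the reader does not know.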
Comment 17 Julien Nabet 2018-03-04 15:37:10 UTC
I submitted a first patch for review here, for the encoding part:
https://gerrit.libreoffice.org/#/c/50731/

+ another patch to deal with timestamps and empty values:
https://gerrit.libreoffice.org/#/c/50732/
Comment 18 Commit Notification 2018-03-04 17:08:06 UTC
Julien Nabet committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=a77b493392ecdfe2e58bb0fcfa7363a8583dffe4

Related tdf#116171: don't try to convert empty value in timestamp

It will be available in 6.1.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 19 chichang4911 2018-03-04 17:24:07 UTC
(In reply to chichang4911 from comment #13)
> (In reply to raal from comment #7)
> > (In reply to Franklin Weng from comment #6)
> > > Notice that my testing (and bisecting) is under Linux.  The original
> > > reporter is using Windows.  According to his report (to us) he could open it
> > > with 5.4.4 (Windows version) correctly.
> > 
> > Hi chichang4911, 
> > please test it with newer version and let us know if it works. Thank you.
> > 
> > http://www.libreoffice.org/download/libreoffice-fresh/
> 
> I will try to test more versions of Libreoffice,
> Test os: windows 10 (64bits)
> Test files: addresses.dbf, addresses-mad.dbf, addresses-v5.dbf 
> Results
> 		addresses	addresses-mad	addresses-v5
> v5.1.6.2	ok		ok		ok		
> v5.2.7.2	ok		ok		ok
> v5.3.7.2	ok		ok		ok
> v5.4.0.1	ok		ok		ok
> v5.4.0.2	xx		xx		ok
> v5.4.1.2	xx		xx		ok
> v5.4.4.2	xx		xx		ok

v6.0.2.1 xx    ok     ok
Comment 20 chichang4911 2018-03-04 17:50:11 UTC
When I test "libo-60-64~2018-03-04_05.34.43_LibreOfficeDev_6.0.3.0.0_Win_x64.msi"

addresses.dbf still shows "read error impossible to connect to the file"
addresses-mad.dbf, addresses-v5.dbf can be imported
Comment 21 Julien Nabet 2018-03-04 18:16:39 UTC
(In reply to chichang4911 from comment #20)
> When I test
> "libo-60-64~2018-03-04_05.34.43_LibreOfficeDev_6.0.3.0.0_Win_x64.msi"
> 
> addresses.dbf still shows "read error impossible to connect to the file"
> addresses-mad.dbf, addresses-v5.dbf can be imported

Since I can reproduce this with master sources, it's expected. No further tests are needed for the moment.
Comment 22 Commit Notification 2018-03-04 18:20:29 UTC
Julien Nabet committed a patch related to this issue.
It has been pushed to "libreoffice-6-0":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=bab7cef648025038055d3284773d33f102d42f13&h=libreoffice-6-0

Related tdf#116171: don't try to convert empty value in timestamp

It will be available in 6.0.3.

Comment 23 Stephan Bergmann 2018-03-04 21:25:00 UTC
(In reply to Franklin Weng from comment #15)
> As mentioned in comment #4, the encoding is zh_TW.Big5.  Version before
> 5.4.0.0 beta2 choosing Big5 as encoding could show Chinese correctly.  But
> it broke in 5.4.0.1, though opening the file was okay.

Is it possible to bibisect this, to find out which commit exactly broke it?
Comment 24 Julien Nabet 2018-03-04 22:05:29 UTC
Stephan: I may be wrong but I think it's due to some work done last year, see https://cgit.freedesktop.org/libreoffice/core/log/?qt=grep&q=dbase
Comment 25 Julien Nabet 2018-03-05 10:34:46 UTC
chichang4911: just to be sure with 5.4.0.2 or before + with a brand new LO profile (see https://wiki.documentfoundation.org/UserProfile#Windows), does LO open addresses.dbf directly or does LO ask about encoding before opening it?
Comment 26 Julien Nabet 2018-03-05 10:51:23 UTC
(In reply to Julien Nabet from comment #25)
> chichang4911: just to be sure with 5.4.0.2 or before + with a brand new LO
> profile (see https://wiki.documentfoundation.org/UserProfile#Windows), does
> LO open addresses.dbf directly or does LO ask about encoding before opening
> it?

Argh, I meant 5.4.0.1 but don't bother, I'll test this. I retrieved the archive.
Comment 27 Julien Nabet 2018-03-05 11:04:59 UTC
I confirm I don't reproduce the problem with 5.4.0.1, since LO asks about the encoding.

The regression is due to https://cgit.freedesktop.org/libreoffice/core/commit/?id=7f1465a9599e9665159dd2d823a6e9064cca5703, but that patch fixes a broken situation and so reveals a bug.
Indeed, load_CharSet in sc/source/ui/unoobj/filtuno.cxx did not look up the encoding of the file; it displayed a list of encodings for the user to choose from, with either the default encoding 850 or the last one used at the top of the list.
See in particular https://cgit.freedesktop.org/libreoffice/core/diff/sc/source/ui/unoobj/filtuno.cxx?id=7f1465a9599e9665159dd2d823a6e9064cca5703
Comment 28 Julien Nabet 2018-03-05 11:15:51 UTC
Let's remove targets since it's not fixed for the moment.
Comment 29 Stephan Bergmann 2018-03-05 11:46:57 UTC
Yeah, whatever the claims that it worked in the past (with or without presenting a dialog first, asking the user to choose an appropriate text encoding from a list), I think I understand now what goes wrong on current master, and how to fix it.  Gerrit change is forthcoming.
Comment 30 Julien Nabet 2018-03-05 13:20:55 UTC
My patch was wrong, but fortunately Stephan proposed one here:
https://gerrit.libreoffice.org/#/c/50772/

(So now, even though I'm still interested in and concerned by this tracker, since this regression is partly due to me, let's not pretend I'm the one who will fix this; I'm unassigning myself.)
Comment 31 Commit Notification 2018-03-05 16:44:51 UTC
Stephan Bergmann committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=5ad62544bce42396faaae2bc79c7517af6ff085b

tdf#116171: Tunnel arbitrary rtl_TextEncoding from sc to sdbc:dbase connection

It will be available in 6.1.0.

Comment 32 Commit Notification 2018-03-06 11:35:30 UTC
Stephan Bergmann committed a patch related to this issue.
It has been pushed to "libreoffice-6-0":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=db7dae40a2082d5d2b1ac22008d32ef9ebf86f4e&h=libreoffice-6-0

tdf#116171: Tunnel arbitrary rtl_TextEncoding from sc to sdbc:dbase connection

It will be available in 6.0.3.

Comment 33 Commit Notification 2018-03-06 11:35:42 UTC
Stephan Bergmann committed a patch related to this issue.
It has been pushed to "libreoffice-5-4":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=8d96cdf9ac8dedd54620d31bafbccc76d75d7757&h=libreoffice-5-4

tdf#116171: Tunnel arbitrary rtl_TextEncoding from sc to sdbc:dbase connection

It will be available in 5.4.7.

Comment 34 Commit Notification 2018-03-07 15:41:29 UTC
Stephan Bergmann committed a patch related to this issue.
It has been pushed to "libreoffice-5-4-6":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=5598fa704df171544a913a8cfda62a183f1a1a66&h=libreoffice-5-4-6

tdf#116171: Tunnel arbitrary rtl_TextEncoding from sc to sdbc:dbase connection

It will be available in 5.4.6.
