69744 – Data in Visual FoxPro DBF is garbled

Status:	RESOLVED INVALID

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	Base
Version: (earliest affected)	Inherited From OOo
Hardware:	Other All

Importance:	medium normal
Assignee:	Not Assigned

Reported:	2013-09-24 05:35 UTC by Urmas
Modified:	2017-06-17 16:18 UTC
CC List:	3 users

Attachments
Russian DBF (410 bytes, application/x-dbase) 2013-09-24 05:35 UTC, Urmas
Screenshot (4.55 KB, image/png) 2013-09-24 22:51 UTC, Urmas
Two charsets in the same *.dbf (17.29 KB, image/png) 2013-09-26 14:39 UTC, Robert Großkopf
Opening the file with MS Excel 2013 (46.42 KB, application/x-zip-compressed) 2013-09-28 10:48 UTC, Mike Kaganski

Description Urmas 2013-09-24 05:35:53 UTC

Created attachment 86430
Russian DBF

When using the attached file in LO Base, the data in the table is garbled.
VFP9 shows the table properly.

Comment 1 Robert Großkopf 2013-09-24 17:50:47 UTC

It would be better to show with screenshots how it should look. I have opened the file with Calc and tried different filters, having no idea what I should look for.
Which program was the file created with?
Which character set was chosen?
It seems Calc doesn't know the right character set for the file. But Calc (and also Base) can change the character set in many ways.

Comment 2 Urmas 2013-09-24 22:51:55 UTC

Created attachment 86493
Screenshot

Comment 3 Robert Großkopf 2013-09-25 19:20:19 UTC

This seems to be a problem because the field name isn't written in the same character set as the content.
To get the right field description, I have to choose "Kyrillisch DOS/OS2-866" in the German version of LO.
To get the right content, I have to use "Kyrillisch PT154" or "Kyrillisch Windows-1251".
The behavior is the same in all LO versions, and in both Base and Calc. I see the same behavior in AOO 4.0. So I don't know whether this is a bug in LO or a bug in the program the file was created with ...

Comment 4 Urmas 2013-09-26 13:38:49 UTC

I think the VFP screenshot gives an unambiguous answer as to whether this is an LO bug or not.

Comment 5 Robert Großkopf 2013-09-26 14:39:07 UTC

Created attachment 86654
Two charsets in the same *.dbf

When I open the *.dbf with different charsets, I can see either the field description or the field content displayed correctly, but not both at once. If a program uses different charsets for the content and the header in the same *.dbf, that program itself can present the right content. But how should another program recognize this?
I have no way here to open the file so that it looks like your Visual FoxPro screenshot. So I can't confirm that this is wrong behavior in LO/AOO/OOo.
Let us hope somebody else will read this and test it, for example, with MS Excel ...

Comment 6 Mike Kaganski 2013-09-28 10:48:02 UTC

Created attachment 86767
Opening the file with MS Excel 2013

I think the MS Excel screenshots give an unambiguous answer as to whether this is an LO bug or not.

Comment 7 Owen Genat (retired) 2014-07-22 04:08:12 UTC

(In reply to comment #3)
> Seems to be a problem, because the field isn't created by the same
> character-set as the content.
> When I try to get the right field-description, I have to chose "Kyrillisch
> DOS/OS2-866" in the German version of LO.
> When I try to get the right content-description, I have to use "Kyrillisch
> PT154" or "Kyrillisch Windows-1251".

I can confirm that opening the provided DBF in Calc under GNU/Linux using v4.3.0.3 (Build ID: 08ebe52789a201dd7d38ef653ef7a48925e7f9f7) displays the following for these character sets:

Cyrillic (DOS/OS2-866/Russian):
A1: НАЗВАНИЕ,C,80
A2: ╨єёёъшщ ЄхъёЄ

Cyrillic (PT154):
A1: ҚҖҮӮҖҚҲ…,C,80
A2: Русский текст

i.e., with DOS/OS2-866/Russian, A1 displays the field description as "NAME", and with PT154, A2 displays the content as "Russian text". This seems consistent with what Robert has reported for Base.
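
The same observation can be reproduced outside LO. The following is only a minimal Python sketch: the field-name bytes and the record bytes are hard-coded from the attached file, and the variable names are purely illustrative.

```python
# Bytes copied from the attached DBF: the field name stored in the header
# and the text stored in the single record.
name_bytes = bytes.fromhex("8d808782808d8885")
content_bytes = bytes.fromhex("d0f3f1f1eae8e920f2e5eaf1f2")

# Decode both byte strings with each candidate code page and print the results.
for label, data in (("field name", name_bytes), ("content", content_bytes)):
    for codec in ("cp866", "cp1251", "ptcp154"):
        print(f"{label:10} {codec:8} {data.decode(codec, errors='replace')}")
```

Only cp866 should give a readable field name, while the record text is readable only with the Windows-style Cyrillic code pages.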

(In reply to comment #5)
> If a program works with different charsets for the content and the header in
> the same *.dbf, the program itself should present the right content. But how
> should another program recognize it.

Agreed. At the very least this would be an enhancement request to expand the existing functionality of DBF import to offer field-by-field character set specification or to cater for a quirk with how VFP writes these files out.
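
To make the idea of separate character sets more concrete, here is a rough Python sketch of a reader that takes one encoding for the field names and another for the record contents. It is not LO code: `read_char_rows` is a hypothetical helper, it assumes the commonly documented dBase header layout, it handles character (C) fields only, and the sample file is rebuilt from values in this report rather than read from the attachment.

```python
import struct

def read_char_rows(data, header_encoding, content_encoding):
    """Read a DBF that holds only character (C) fields, decoding the
    field names and the record contents with separate encodings."""
    n_records, header_len, record_len = struct.unpack_from("<IHH", data, 4)
    fields = []
    pos = 32  # field descriptors start right after the 32-byte file header
    while data[pos] != 0x0D:  # 0x0D terminates the descriptor array
        desc = data[pos:pos + 32]
        name = desc[:11].split(b"\x00", 1)[0].decode(header_encoding)
        fields.append((name, desc[16]))  # byte 16 holds the field length
        pos += 32
    rows = []
    for i in range(n_records):
        start = header_len + i * record_len
        rec = data[start:start + record_len]
        offset = 1  # byte 0 of each record is the deletion flag
        row = {}
        for name, length in fields:
            row[name] = rec[offset:offset + length].decode(content_encoding).rstrip()
            offset += length
        rows.append(row)
    return rows

# Rebuild a file shaped like the attachment (sizes and bytes taken from this report).
header = struct.pack("<4BIHH", 0x30, 13, 9, 24, 1, 328, 81) + bytes(17) + b"\x65" + bytes(2)
descriptor = (bytes.fromhex("8d808782808d8885") + bytes(3) + b"C"
              + struct.pack("<I", 1) + bytes([80]) + bytes(15))
record = b" " + bytes.fromhex("d0f3f1f1eae8e920f2e5eaf1f2").ljust(80, b" ")
sample = header + descriptor + b"\x0d" + bytes(263) + record + b"\x1a"

rows = read_char_rows(sample, header_encoding="cp866", content_encoding="cp1251")
print(rows)
```

The caller still has to know both encodings in advance; the sketch only shows that the two parts of the file can be decoded independently once they are known.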

(In reply to comment #6)
> I think MS Excel screenshots give unambiguous answer whether this is an LO
> bug or not.

Given that MS Excel experiences the same import issue I am tossing this report in the NEEDINFO bucket. It requires developer input as to what is feasible with handling DBF files with multiple character sets.

Comment 8 Alex Thurgood 2015-01-03 17:39:59 UTC

Adding self to CC if not already on

Comment 9 QA Administrators 2015-07-18 17:35:16 UTC

Dear Bug Submitter,

This bug has been in NEEDINFO status with no change for at least 6 months. Please provide the requested information as soon as possible and mark the bug as UNCONFIRMED. Due to regular bug tracker maintenance, if the bug is still in NEEDINFO status with no change in 30 days the QA team will close the bug as INVALID due to lack of needed information.

For more information about our NEEDINFO policy please read the wiki located here: 
https://wiki.documentfoundation.org/QA/FDO/NEEDINFO

If you have already provided the requested information, please mark the bug as UNCONFIRMED so that the QA team knows that the bug is ready to be confirmed.


Thank you for helping us make LibreOffice even better for everyone!


Warm Regards,
QA Team

This NEEDINFO message was generated on: 2015-07-18

Comment 10 QA Administrators 2015-09-04 03:00:33 UTC

Dear Bug Submitter,

Please read this message in its entirety before proceeding.

Your bug report is being closed as INVALID due to inactivity and a lack of information which is needed in order to accurately reproduce and confirm the problem. We encourage you to retest your bug against the latest release. If the issue is still present in the latest stable release, we need the following information (please ignore any that you've already provided):

a) Provide details of your system including your operating system and the latest version of LibreOffice that you have confirmed the bug to be present

b) Provide easy to reproduce steps – the simpler the better

c) Provide any test case(s) which will help us confirm the problem

d) Provide screenshots of the problem if you think it might help

e) Read all comments and provide any requested information

Once all of this is done, please set the bug back to UNCONFIRMED and we will attempt to reproduce the issue. 
Please do not:
a) respond via email 
b) update the version field in the bug or any of the other details on the top section of FDO
Message generated on: 2015-09-03

Comment 11 Julien Nabet 2017-06-17 16:18:37 UTC

Following recent dBase commits (see https://cgit.freedesktop.org/libreoffice/core/log/?qt=grep&q=dbase), the dbf file now opens with RTL_TEXTENCODING_IBM_866 (Russian MS-DOS code page 866).
A hexdump of the file shows this:
0000000 0d30 1809 0001 0000 0148 0051 0000 0000
0000010 0000 0000 0000 0000 0000 0000 6500 0000
0000020 808d 8287 8d80 8588 0000 4300 0001 0000
0000030 0050 0004 0000 0000 0000 0000 0000 0000
0000040 000d 0000 0000 0000 0000 0000 0000 0000
0000050 0000 0000 0000 0000 0000 0000 0000 0000
*
0000140 0000 0000 0000 0000 d020 f1f3 eaf1 e9e8
0000150 f220 eae5 f2f1 2020 2020 2020 2020 2020
0000160 2020 2020 2020 2020 2020 2020 2020 2020
*
0000190 2020 2020 2020 2020 1a20               
000019a

Let's read it the little-endian way (the dump shows 16-bit words, so the two bytes in each word are swapped): the first byte is 30, not 0d.
30 is the version and corresponds here to a Visual FoxPro file (see http://opengrok.libreoffice.org/xref/core/connectivity/source/inc/dbase/DTable.hxx#40).
65 (in the second line) indicates RTL_TEXTENCODING_IBM_866.
The third line gives the field name and its field type, and the "50" at the beginning of the fourth line gives the field length (80 in decimal).
But then lines 7 and 8 (offsets 0000140 and 0000150) give the content of the record but nothing about its encoding.
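
The reading above can be checked mechanically. This is a small Python sketch, assuming the commonly documented dBase header layout (version in byte 0, code page mark in byte 29, 32-byte field descriptors starting at byte 32); the variable names are illustrative only.

```python
import struct

# First four lines of the hexdump above, exactly as printed (16-bit words).
DUMP_WORDS = """
0d30 1809 0001 0000 0148 0051 0000 0000
0000 0000 0000 0000 0000 0000 6500 0000
808d 8287 8d80 8588 0000 4300 0001 0000
0050 0004 0000 0000 0000 0000 0000 0000
""".split()

# hexdump printed each 16-bit word in little-endian order,
# so swap the two bytes of every word to recover the file order.
raw = b"".join(bytes.fromhex(word)[::-1] for word in DUMP_WORDS)

# File header: version, last-update date (YY MM DD), record count,
# header length and record length.
version, yy, mm, dd, n_records, header_len, record_len = struct.unpack_from("<4BIHH", raw, 0)
codepage_mark = raw[29]

# First (and only) field descriptor.
field = raw[32:64]
field_name = field[:11].rstrip(b"\x00")
field_type = chr(field[11])
field_len = field[16]

print(hex(version), hex(codepage_mark), header_len, record_len)
print(field_name.decode("cp866"), field_type, field_len)
```

The header length (328) plus one 81-byte record plus the trailing 1a byte adds up to the 410 bytes of the attachment.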

So I don't know how LO could "guess" the encoding of the content except by testing which value ranges of the candidate charsets the bytes fall into, e.g.:
d0 in https://www.ascii-codes.com/cp866.html gives "Box drawings up double and horizontal single"
d0 in http://www.iana.org/assignments/charset-reg/PTCP154 gives "CYRILLIC CAPITAL LETTER ER"
But even with this, a user could want some non-Cyrillic characters (box drawings) in the content, and the guess would be wrong.
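
A range test of this kind could look roughly like the following Python sketch. It is only an illustration of the heuristic and of its weakness, not a proposal for LO: the scoring rule (share of characters in the basic Russian range) and the function names are assumptions made for the example.

```python
def russian_letter_ratio(text):
    """Share of non-space characters that fall in the basic Russian range А..я."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum("\u0410" <= c <= "\u044f" for c in chars) / len(chars)

def guess_encoding(data, candidates=("cp866", "cp1251")):
    """Pick the candidate whose decoding looks most like Russian text."""
    return max(candidates,
               key=lambda enc: russian_letter_ratio(data.decode(enc, errors="replace")))

content = bytes.fromhex("d0f3f1f1eae8e920f2e5eaf1f2")  # record bytes from the dump
field_name = bytes.fromhex("8d808782808d8885")          # header bytes from the dump
box_line = b"\xcd\xcd\xcd"  # intended as a CP866 double horizontal rule

print(guess_encoding(content))     # record text
print(guess_encoding(field_name))  # field name
print(guess_encoding(box_line))    # box drawings are misread as Cyrillic letters
```

The heuristic separates the header and the content of this particular file, but the last call shows the failure case described above: genuine CP866 box-drawing bytes score better as Windows-1251 letters.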

BTW, I would be interested in original dbf files in different versions (DB2, DB3, DB4... with memo, with sql, ...FoxPro, etc.) and encodings.