LibreOffice does not recognize DBF encoding 0x69. In the list of character sets, Mazovia (CP620) is not offered.
Steps to Reproduce:
Open the attached file.
LibreOffice asks for an encoding, but none of the options is Mazovia.
The file should open with the correct encoding. The correct rendering of the string in question is Ś╫êëτ⌡ś
User Profile Reset: Yes
Visual FoxPro has a few special encodings:
- 0x69 -> Mazovia (Polish) MS-DOS [CP620]
Unicode table: https://github.com/SheetJS/js-codepage/blob/master/codepages/620.TBL
- 0x68 -> Kamenický (Czech) MS-DOS [CP895]
Unicode table: https://github.com/SheetJS/js-codepage/blob/master/codepages/895.TBL
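The two IDs above can be sketched as a small lookup. This is only an illustration, not how LibreOffice or Gnumeric implement it. It assumes the language driver ID sits at byte offset 29 of the DBF header, as in the commonly documented dBASE/FoxPro layout. The names `VFP_SPECIAL_CODEPAGES`, `LANGUAGE_DRIVER_OFFSET` and `dbf_codepage` are hypothetical.

```python
from typing import Optional

# Language driver IDs mentioned in this report, mapped to code page numbers.
VFP_SPECIAL_CODEPAGES = {
    0x68: 895,  # Kamenický (Czech) MS-DOS
    0x69: 620,  # Mazovia (Polish) MS-DOS
}

# Assumption: the language driver ID is the byte at offset 29 of the header.
LANGUAGE_DRIVER_OFFSET = 29


def dbf_codepage(header: bytes) -> Optional[int]:
    """Return the code page for the header's language driver ID.

    Returns None when the ID is not one of the two special cases above.
    """
    if len(header) <= LANGUAGE_DRIVER_OFFSET:
        raise ValueError("DBF header is too short")
    return VFP_SPECIAL_CODEPAGES.get(header[LANGUAGE_DRIVER_OFFSET])
```

A file whose header byte is 0x69 would then map to code page 620, which is the case the attached file exercises.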
If you do not have access to VFP, Gnumeric also recognizes the codepage mapping. On a machine with limited iconv support, the terminal shows a message like
Unable to open an iconv handle from codepage 620 -> UTF-8
File has unknown or missing code page information (69)
which indicates that Gnumeric detects the DBF encoding is 0x69 and that it corresponds to CP620.
Created attachment 182332
On pc Debian x86-64 with master sources updated today, I could reproduce this.
//case 0x68: eEncoding = ; break; // Kamenicky (Czech) MS-DOS
//case 0x69: eEncoding = ; break; // Mazovia (Polish) MS-DOS
Eike: thought you might be interested in this one since it concerns encoding and Calc/Base
These two cases are rightly commented out. A short investigation shows that we don't have conversions for them: there is no RTL_TEXTENCODING_IBM_620 define, and the only 620 is RTL_TEXTENCODING_TIS_620, which is Thai and doesn't fit Polish ;-) The same goes for CP895: there is no RTL_TEXTENCODING_IBM_895.
Look for all places in the code base that use, for example, RTL_TEXTENCODING_IBM_865 to see what would need to be added to support a new encoding.
I've started a patch here: https://gerrit.libreoffice.org/c/core/+/139819
As you may have seen, I put some questions in it.
Just to be sure, can you confirm that you're the one who created https://github.com/SheetJS/js-codepage/blob/master/codepages/620.TBL ?
If yes, perhaps you'd have some insight into https://opengrok.libreoffice.org/xref/core/sal/textenc/tcvtest1.tab?r=98492e9d
and perhaps you may also be interested in contributing by following this link https://wiki.documentfoundation.org/Development/GetInvolved ?
Both files use the same format as the unicode.org tables (e.g. http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/EBCDIC/CP037.TXT). The first column is a byte value and the second column is the equivalent Unicode value.
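As a rough illustration of that two-column format, the sketch below parses such a table into a byte-to-character map and uses it to decode bytes. It assumes whitespace-separated hexadecimal columns and `#` comments, as in the unicode.org files. The function names and the sample rows are hypothetical; the rows are not copied from 620.TBL or 895.TBL.

```python
from typing import Dict


def parse_mapping_table(text: str) -> Dict[int, str]:
    """Parse 'byte<whitespace>codepoint' lines into a byte -> character map."""
    mapping: Dict[int, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue  # blank or comment-only line
        fields = line.split()
        if len(fields) < 2:
            continue  # byte listed without a Unicode equivalent
        mapping[int(fields[0], 16)] = chr(int(fields[1], 16))
    return mapping


def decode(data: bytes, mapping: Dict[int, str]) -> str:
    """Decode bytes with the map; unmapped bytes become U+FFFD."""
    return "".join(mapping.get(b, "\ufffd") for b in data)


# Illustrative rows only (plain ASCII), in the same layout as the tables.
SAMPLE_TABLE = "0x41\t0x0041\t#LATIN CAPITAL LETTER A\n0x42\t0x0042\n"
sample_mapping = parse_mapping_table(SAMPLE_TABLE)
decoded = decode(b"AB", sample_mapping)
```

Loading the full 620.TBL or 895.TBL text through the same parser would give the complete map for the corresponding code page, provided the files follow this layout throughout.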
If https://opengrok.libreoffice.org/xref/core/sal/textenc/tencinfo.cxx?r=b480819d#823 is the master mapping from codepage to encodings, there are a number of issues.
For example, the RTF spec has two tables of supported codepages:
A) Pages 14-15 describe the ANSI codepages specified with \ansicpg# .
B) Pages 20-21 describe the \fcharset control word and associated codepages.
- CP720 is missing
- CP708 is described as "ASMO 708" but it should use the Windows version. Windows 708 fills a number of gaps that ISO-8859-6 leaves undefined, so a separate mapping should be created.
- CP10021 (Mac Thai) is missing (referenced as \fcharset87)
https://github.com/SheetJS/js-codepage/blob/master/codepages/720.TBL : according to our notes, it was enumerated using .NET System.Text.Encoding on a Windows 7 machine.
I abandoned the patch; it is too complicated for me, so I am unassigning myself.
Julien Nabet committed a patch related to this issue.
It has been pushed to "master":
tdf#150877: Add support for Kamenický and Mazovia encodings
It will be available in 7.5.0.
The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
Affected users are encouraged to test the fix and report feedback.
Just to be clear, even though Stephan started from my abandoned patch, he did the most important and difficult part of the job (I am thinking of the mapping).
Thank you again Stephan!
BTW, with master sources updated today, I can no longer reproduce the problem.
I got the string Ś╫êëτ⌡ś (in B3), so let's set this one to VERIFIED.
I added it to the release notes: https://wiki.documentfoundation.org/ReleaseNotes/7.5#Calc