160970 – Problem in command line file conversion (XLSX to DBF) with special character

Summary: Problem in command line file conversion (XLSX to DBF) with special character

Status:	UNCONFIRMED

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	filters and storage
Version: (earliest affected)	7.6.6.3 release
Hardware:	All All

Importance:	medium normal
Assignee:	Not Assigned

URL:	https://ask.libreoffice.org/t/change-...
Whiteboard:
Keywords:

Depends on:
Blocks:	Commandline

Reported:	2024-05-07 08:02 UTC by joerg.goerner
Modified:	2024-11-21 13:49 UTC
CC List:	3 users

See Also:
Crash report or crash signature:

Attachments
Address list as sample (8.62 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet) 2024-05-07 08:06 UTC, joerg.goerner

Description joerg.goerner 2024-05-07 08:02:05 UTC

Description:
I use the file conversion method on the command line like this:
"C:\Program Files\LibreOffice\program\scalc.exe" --convert-to dbf Testlist.xlsx

If a cell contains a string with the Czech character 'š' (byte value 154 in the Windows-1252 code page; it is not an ASCII character), the conversion ends before that row. I have also tried different character sets.
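As a side note on the "154": ASCII only covers codes 0 to 127, so 'š' has no ASCII value. Its Unicode code point is U+0161, and 154 (0x9A) is the byte it gets in the Windows-1252 and Windows-1250 code pages. The short Python sketch below checks this; the codec names are Python's own, not LibreOffice's filter option names.

```python
from typing import Optional

# Character from the failing address ("Antala Staška 2").
CHAR = "š"


def byte_in(codec: str) -> Optional[bytes]:
    """Return CHAR encoded in the given codec, or None if the codec cannot encode it."""
    try:
        return CHAR.encode(codec)
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    print(f"code point: U+{ord(CHAR):04X}")
    # ascii and cp850 print None; cp1252 and cp1250 both print b'\x9a' (= 154).
    for codec in ("ascii", "cp1252", "cp1250", "cp850"):
        print(codec, byte_in(codec))
```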

Steps to Reproduce:
1. Create a simple address list in Excel, like this:
   PLZ	ORT	STRASSE
   14169	Berlin	Teltower Damm 1
   140 00	Praha	Antala Staška 2
   42781	Haan	Schallbruch 3
2. Save the Excel file
3. Convert the Excel file from the command line as shown above

Actual Results:
The DBF file ends after the first row of data.

Expected Results:
The complete address list with all records.
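To make the gap between actual and expected results concrete, the sketch below models the reported behaviour: rows are written in order, and the export stops at the first row containing a character the target code page cannot encode. This is a hypothetical model of what the report describes, not LibreOffice's export code. It assumes the IBM 850 code page (Python codec "cp850"), which Comment 2 identifies as the default; the function names are illustrative only.

```python
from typing import List, Tuple

Row = Tuple[str, str, str]

# Sample rows from the steps to reproduce (PLZ, ORT, STRASSE).
ROWS: List[Row] = [
    ("14169", "Berlin", "Teltower Damm 1"),
    ("140 00", "Praha", "Antala Staška 2"),
    ("42781", "Haan", "Schallbruch 3"),
]


def encodable(row: Row, codec: str) -> bool:
    """True if every field of the row can be encoded in the given codec."""
    try:
        for field in row:
            field.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


def simulate_truncating_export(rows: List[Row], codec: str) -> List[Row]:
    """Hypothetical model: keep rows until the first one that cannot be encoded."""
    written: List[Row] = []
    for row in rows:
        if not encodable(row, codec):
            break  # matches the report: nothing after this row is written
        written.append(row)
    return written


if __name__ == "__main__":
    kept = simulate_truncating_export(ROWS, "cp850")
    print(f"{len(kept)} of {len(ROWS)} records written")
```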


Reproducible: Always


User Profile Reset: No

Additional Info:
Version: 7.6.6.3 (X86_64) / LibreOffice Community
Build ID: d97b2716a9a4a2ce1391dee1765565ea469b0ae7
CPU threads: 12; OS: Windows 10.0 Build 19045; UI render: Skia/Raster; VCL: win
Locale: de-DE (de_DE); UI: de-DE
Calc: CL threaded

Comment 1 joerg.goerner 2024-05-07 08:06:19 UTC

Created attachment 194012 [details]
Address list as sample

Comment 2 Stéphane Guillou (stragu) 2024-05-23 05:17:55 UTC

When using the GUI, the default character set is "Western Europe (DOS/OS2-850/International)", which results in this error message:

Error saving the document Testlist:
Write Error.
Cell SfxBaseModel::impl_store <file:///home/stragu/Downloads/Testlist.dbf> failed: 0x40c03(Error Area:Sc Class:Write Code:3) arg1=C3 arg2=Western Europe (DOS/OS2-850/International) at /home/tdf/lode/jenkins/workspace/lo_gerrit/tb/src_master/sfx2/source/doc/sfxbasemodel.cxx:3304 contains characters that are not representable in the selected target character set "$(ARG2)".

Resulting file only has one address.

Using the command line, I get in the console:

warn:connectivity.drivers:151848:151848:connectivity/source/drivers/dbase/DTable.cxx:521: Parsing warning: 0 records claimed, recovering
warn:sc:151848:151848:sc/source/ui/docshell/docsh8.cxx:986: ScDocShell::DBaseExport com.sun.star.sdbc.SQLException message: "The string “Antala Staška 2” cannot be converted using the encoding “ibm850”. at /home/tdf/lode/jenkins/workspace/lo_gerrit/tb/src_master/connectivity/source/commontools/dbtools2.cxx:910" SQLState: 22018 ErrorCode: 22018
    wrapped: 
warn:sc:151848:151848:sc/source/ui/docshell/docsh8.cxx:1045: ScDocShell::DBaseExport encoding error, string with default replacements: ``Antala Staška 2''
Error: Please verify input parameters... (SfxBaseModel::impl_store <file:///home/stragu/Downloads/Testlist.dbf> failed: 0x40c03(Error Area:Sc Class:Write Code:3) arg1=C3 arg2=Western Europe (DOS/OS2-850/International) at /home/tdf/lode/jenkins/workspace/lo_gerrit/tb/src_master/sfx2/source/doc/sfxbasemodel.cxx:3304 at /home/tdf/lode/jenkins/workspace/lo_gerrit/tb/src_master/sfx2/source/doc/sfxbasemodel.cxx:1822)

Same result.
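The SQLException in the console log ("cannot be converted using the encoding ibm850") can be reproduced outside LibreOffice. The Python sketch below, using the standard "cp850" codec as a stand-in for LibreOffice's ibm850 converter (an assumption: they are separate implementations of the same code page), reports the position and character that cannot be encoded.

```python
from typing import Optional, Tuple

# Street name from the failing record.
TEXT = "Antala Staška 2"


def first_unencodable(text: str, codec: str) -> Optional[Tuple[int, str]]:
    """Return (index, substring) of the first unencodable run, or None if all is encodable."""
    try:
        text.encode(codec)
    except UnicodeEncodeError as exc:
        return exc.start, text[exc.start:exc.end]
    return None


if __name__ == "__main__":
    print(first_unencodable(TEXT, "cp850"))   # position of 'š'
    print(first_unencodable(TEXT, "cp1250"))  # None: Windows-1250 has 'š'
```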

One would need to pick a suitable character set; see: https://help.libreoffice.org/latest/en-US/text/shared/guide/lotusdbasediff.html

For example, this works for me, using the encoding "Windows-1250/WinLatin 2 (Central European)":

soffice --headless --convert-to dbf:dBase:33 ./Testlist.xlsx

Does an equivalent command work for you?
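Picking "a suitable character set" can be done by testing all cell strings against candidate code pages before converting. The Python sketch below is hypothetical: `choose_codec` and `convert_command` are illustrative names, and `DBASE_OPTION` holds only the one value quoted in this thread (33 for Windows-1250). Other option numbers are deliberately left out and should be looked up in the help page linked above. The command layout mirrors the `soffice --headless --convert-to dbf:dBase:33` example.

```python
import shlex
from typing import Dict, Iterable, List, Optional

# Only the filter option value quoted in this thread: 33 = Windows-1250.
DBASE_OPTION: Dict[str, int] = {"cp1250": 33}


def choose_codec(texts: Iterable[str], candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate codec that can encode every string, else None."""
    text_list = list(texts)
    for codec in candidates:
        try:
            for text in text_list:
                text.encode(codec)
        except UnicodeEncodeError:
            continue  # this code page misses at least one character
        return codec
    return None


def convert_command(src: str, option: int) -> List[str]:
    """Build the headless DBF conversion command for a given dBase filter option."""
    return ["soffice", "--headless", "--convert-to", f"dbf:dBase:{option}", src]


if __name__ == "__main__":
    cells = ["Berlin", "Praha", "Antala Staška 2", "Haan"]
    codec = choose_codec(cells, ["cp850", "cp1250"])
    if codec in DBASE_OPTION:
        print(shlex.join(convert_command("./Testlist.xlsx", DBASE_OPTION[codec])))
```

The check only says whether every character has a slot in the code page; it says nothing about how the program reading the DBF later will interpret those bytes.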

Comment 3 QA Administrators 2024-11-20 03:16:52 UTC

Dear joerg.goerner,

This bug has been in NEEDINFO status with no change for at least
6 months. Please provide the requested information as soon as
possible and mark the bug as UNCONFIRMED. Due to regular bug
tracker maintenance, if the bug is still in NEEDINFO status with
no change in 30 days the QA team will close the bug as INSUFFICIENTDATA
due to lack of needed information.

For more information about our NEEDINFO policy please read the
wiki located here:
https://wiki.documentfoundation.org/QA/Bugzilla/Fields/Status/NEEDINFO

If you have already provided the requested information, please
mark the bug as UNCONFIRMED so that the QA team knows that the
bug is ready to be confirmed.
 
Thank you for helping us make LibreOffice even better for everyone!

Warm Regards,
QA Team

MassPing-NeedInfo-Ping

Comment 4 joerg.goerner 2024-11-21 13:49:39 UTC

Sorry for the delay!

The recommended character set dBase:33 causes some other problems.

Result for 'Antala Staška 2' is then 'Antala StaÜka 2'. That's not correct but would not be a big problem for me in this case.
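The 'StaÜka' result is consistent with a code page mismatch: Windows-1250 stores 'š' as byte 0x9A, and IBM 850 decodes 0x9A as 'Ü'. The Python sketch below shows this, assuming (not confirmed in the thread) that the program used to inspect the DBF reads it as IBM 850.

```python
def misread(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one code page and decode the bytes with another."""
    return text.encode(written_as).decode(read_as)


if __name__ == "__main__":
    # Written as Windows-1250 (dBase:33), read back as IBM 850.
    print(misread("Antala Staška 2", "cp1250", "cp850"))
    # Read back with the matching code page, the text is intact.
    print(misread("Antala Staška 2", "cp1250", "cp1250"))
```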

The original file has some more records.

Unfortunately, with this setting all German umlauts such as 'ä', 'ö' and 'ü' are incorrect, and the conversion now ends before the record containing the term 'M³ Raum'.
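Both symptoms fit the same pattern. 'ä', 'ö' and 'ü' exist in Windows-1250 but sit at different byte values than in IBM 850, so a reader assuming IBM 850 would show other characters. '³' has no slot in Windows-1250 at all, so the export would stop at 'M³ Raum'. The Python sketch below checks the strings mentioned in this thread against a few code pages; "cp1252" is included only as a further Python codec to compare, and no dBase option number is claimed for it.

```python
from typing import Iterable, List

# Strings mentioned in this report.
SAMPLES: List[str] = ["Antala Staška 2", "M³ Raum", "ä ö ü"]


def unencodable(texts: Iterable[str], codec: str) -> List[str]:
    """Return the strings that the given codec cannot encode."""
    bad: List[str] = []
    for text in texts:
        try:
            text.encode(codec)
        except UnicodeEncodeError:
            bad.append(text)
    return bad


if __name__ == "__main__":
    for codec in ("cp850", "cp1250", "cp1252"):
        print(codec, unencodable(SAMPLES, codec))
    # Umlauts written as Windows-1250 but read as IBM 850 come out as other characters.
    print("ä ö ü".encode("cp1250").decode("cp850"))
```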

I have tried a lot of different filter parameters, but none of them solved the problem.
Maybe it is not possible to convert a file whose records contain characters from several languages?

In my opinion, it would be acceptable if not all characters were converted correctly, but skipping the following records could be dangerous.
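Until the export itself handles this, one possible workaround along the lines of the reporter's preference is to clean the data before conversion: replace characters the target code page cannot hold, so every record survives. The Python sketch below is a hypothetical pre-processing step, not a LibreOffice option. It tries the Unicode compatibility decomposition without accents first ('š' becomes 's', '³' becomes '3') and falls back to '?'.

```python
import unicodedata


def to_codepage(text: str, codec: str) -> str:
    """Replace characters the codec cannot encode with a plain fallback or '?'."""
    out = []
    for ch in text:
        try:
            ch.encode(codec)
            out.append(ch)  # representable as-is
            continue
        except UnicodeEncodeError:
            pass
        # Compatibility decomposition with combining marks (accents) removed.
        base = "".join(
            c for c in unicodedata.normalize("NFKD", ch) if not unicodedata.combining(c)
        )
        try:
            base.encode(codec)
            out.append(base if base else "?")
        except UnicodeEncodeError:
            out.append("?")
    return "".join(out)


if __name__ == "__main__":
    print(to_codepage("Antala Staška 2", "cp850"))
    print(to_codepage("M³ Raum", "cp1250"))
```

This loses the exact spelling, but it keeps the record count intact, which is the property the reporter cares about most.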