Bug 101975 - Problem with utf8 html-xls to csv conversion
Summary: Problem with utf8 html-xls to csv conversion
Status: RESOLVED DUPLICATE of bug 36313
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version: 3.6.7.2 release (earliest affected)
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: CSV-Export
Reported: 2016-09-07 19:38 UTC by viacheslav.sychov
Modified: 2021-11-02 07:02 UTC

See Also:
Crash report or crash signature:


Attachments
input file (283 bytes, application/vnd.ms-excel)
2016-09-07 19:38 UTC, viacheslav.sychov
output file (134 bytes, application/vnd.ms-excel)
2016-09-07 19:39 UTC, viacheslav.sychov

Description viacheslav.sychov 2016-09-07 19:38:22 UTC
Created attachment 127205 [details]
input file

Hi,

I have a problem converting a file from html-xls format to csv. When I try to convert the file using this command:

# libreoffice --headless --convert-to csv --outdir . test.xls

Output:
convert /root/pc/test.xls -> /root/pc//test.csv using filter : Text - txt - csv (StarCalc)

The output file contains strange byte sequences (such as 0xd1 0x3f) instead of the characters from the original file.

Example:
(original char) -> (replaced char)
0xd1 0x81 -> 0xd1 0x3f
0xd1 0x8d -> 0xd1 0x3f
0xd1 0x8f -> 0xd1 0x3f

0xd0 0x90 -> 0xd0 0x3f
0xd0 0x81 -> 0xd0 0x3f
0xd0 0x9d -> 0xd0 0x3f
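
One possible reading of this list, offered as an assumption rather than a confirmed diagnosis: the second byte of each damaged character may be a value that Windows-1252 leaves undefined, which is what one would expect if the UTF-8 input were imported as Windows-1252 and unmappable bytes were replaced with '?' (0x3f). The Python sketch below only consults Python's own `cp1252` codec, not LibreOffice's internal tables. It prints the UTF-8 continuation-byte values that this codec cannot decode, for comparison with the second bytes listed above. The function name is hypothetical.

```python
def undefined_in_cp1252(lo=0x80, hi=0xBF):
    """Return byte values in [lo, hi] that Python's cp1252 codec cannot decode.

    0x80..0xBF is the range of UTF-8 continuation bytes, i.e. the possible
    second bytes of a two-byte Cyrillic character.
    """
    missing = []
    for value in range(lo, hi + 1):
        try:
            bytes([value]).decode("cp1252")
        except UnicodeDecodeError:
            # The codec has no mapping for this byte.
            missing.append(value)
    return missing


undefined = undefined_in_cp1252()
print([hex(v) for v in undefined])
```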

Output file (base64):
0LDQsdCy0LPQtNC10ZHQttC30LjQudC60LvQvNC90L7Qv9GA0T/RgtGD0YTRhdGG0YfRiNGJ0YrRi9GM0T/RjtE/LNA/0JHQktCT0JTQldA/0JbQl9CY0JnQmtCb0JzQP9Ce0J/QoNCh0KLQo9Ck0KXQptCn0KjQqdCq0KvQrNCt0K7Qrwo=
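
To inspect the output without downloading the attachment, the base64 text above can be decoded and scanned for damaged characters. The following Python sketch looks for a UTF-8 lead byte (0xd0 or 0xd1) followed directly by '?' (0x3f); the helper name `find_damaged_pairs` is hypothetical and the check is specific to this two-byte Cyrillic sample.

```python
import base64

# Base64 of the converted CSV, copied from the report above.
OUTPUT_B64 = (
    "0LDQsdCy0LPQtNC10ZHQttC30LjQudC60LvQvNC90L7Qv9GA0T/RgtGD0YTRhdGG0YfRiNGJ"
    "0YrRi9GM0T/RjtE/LNA/0JHQktCT0JTQldA/0JbQl9CY0JnQmtCb0JzQP9Ce0J/QoNCh0KLQ"
    "o9Ck0KXQptCn0KjQqdCq0KvQrNCt0K7Qrwo="
)


def find_damaged_pairs(data):
    """Return (offset, lead_byte) for each 0xd0/0xd1 lead byte followed by 0x3f."""
    hits = []
    for i in range(len(data) - 1):
        if data[i] in (0xD0, 0xD1) and data[i + 1] == 0x3F:
            hits.append((i, data[i]))
    return hits


raw = base64.b64decode(OUTPUT_B64)
damaged = find_damaged_pairs(raw)
print(len(raw), "bytes,", len(damaged), "damaged characters")
for offset, lead in damaged:
    print("offset", offset, "lead byte", hex(lead))
```

The decoded length matches the 134 bytes reported for the output attachment.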
Comment 1 viacheslav.sychov 2016-09-07 19:39:16 UTC
Created attachment 127206 [details]
output file
Comment 2 viacheslav.sychov 2016-09-07 19:42:10 UTC
Version: LibreOffice 5.2.0.4 066b007f5ebcc236395c7d282ba488bca6720265
Kernel: 3.10.0-327.28.3.el7.x86_64
Comment 3 Buovjaga 2016-09-30 11:26:29 UTC
Reproduced.

Arch Linux 64-bit, KDE Plasma 5
Version: 5.3.0.0.alpha0+
Build ID: 8e812b87ff7f8c5bf2c6f8858646c55effd2eea3
CPU Threads: 8; OS Version: Linux 4.7; UI Render: default; 
Locale: fi-FI (fi_FI.UTF-8); Calc: group
Built on September 30th 2016

Arch Linux 64-bit
Version 3.6.7.2 (Build ID: e183d5b)
Comment 4 QA Administrators 2018-07-21 02:39:23 UTC Comment hidden (obsolete)
Comment 5 QA Administrators 2020-07-21 03:47:11 UTC Comment hidden (obsolete)
Comment 6 Kevin Suo 2021-11-02 07:02:17 UTC
libreoffice --headless --convert-to csv --infilter=CSV:44,34,UTF8 --outdir . test.xls

where "44" denotes the field separator character (the ASCII code for the comma ',') and "34" denotes the quote character (the ASCII code for the double quote '"').

see https://bugs.documentfoundation.org/show_bug.cgi?id=36313#c17
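
As an illustration of how the numeric tokens in this command are derived, here is a small Python sketch that builds the same `--infilter` value from the actual separator and quote characters. The function names are hypothetical, the `UTF8` token is copied from the command above rather than taken from the filter documentation, and the sketch only assembles the argument list without running LibreOffice.

```python
def csv_filter_options(separator=",", quote='"', charset="UTF8"):
    """Build a 'CSV:<sep code>,<quote code>,<charset>' string as in the comment above."""
    # ord() gives the character code: ',' -> 44 and '"' -> 34.
    return "CSV:{},{},{}".format(ord(separator), ord(quote), charset)


def build_convert_command(input_path, outdir="."):
    """Return the argument list for the headless conversion shown above."""
    return [
        "libreoffice", "--headless",
        "--convert-to", "csv",
        "--infilter=" + csv_filter_options(),
        "--outdir", outdir,
        input_path,
    ]


command = build_convert_command("test.xls")
print(" ".join(command))
```

The resulting list could be passed to `subprocess.run` on a machine where `libreoffice` is on the PATH.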

*** This bug has been marked as a duplicate of bug 36313 ***