Bug 123294 - FILEOPEN: csv file doesn't detect utf8 but utf16 and crashes while reading
Summary: FILEOPEN: csv file doesn't detect utf8 but utf16 and crashes while reading
Status: RESOLVED WORKSFORME
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version: 6.1.3.2 release (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-02-09 15:16 UTC by Bernard TREMBLAY
Modified: 2019-02-10 07:30 UTC (History)

See Also:
Crash report or crash signature:


Attachments
CVS file recognized as UTF16 (588.41 KB, application/vnd.ms-excel)
2019-02-09 16:30 UTC, Bernard TREMBLAY
hex view / notepad++ (75.79 KB, image/png)
2019-02-09 17:48 UTC, Oliver Brinzing

Description Bernard TREMBLAY 2019-02-09 15:16:50 UTC
Description:
Opening a CSV file detects the encoding as UTF-16 instead of UTF-8, and Calc then crashes while reading it.

I had to edit the file in a hex editor and add a UTF-8 BOM to be able to read it.

Automatic UTF-8 detection either does not run or gives the wrong result.

Steps to Reproduce:
1. Open the CSV file (checked to be fully normal, without a BOM)

Actual Results:
Crash:
For about one second "UTF-16" is shown and the content is displayed as Chinese characters (UTF-8 read as UTF-16); then the program crashes and asks to send a crash report.

Expected Results:
The import succeeds, as it does once a UTF-8 BOM has been added.


Reproducible: Always


User Profile Reset: No



Additional Info:
I did not find any option for a default import (open CSV) mode,
so the encoding is presumably auto-detected.
Comment 1 Bernard TREMBLAY 2019-02-09 15:21:15 UTC
Expected Results:
The file opens with the encoding auto-detected as UTF-8.

Workaround: to import successfully, add the UTF-8 BOM as the first 3 bytes of the file.
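
The workaround above can also be done without a hex editor. The following is a minimal sketch, not part of the original report; the function name add_utf8_bom and the file names in the usage line are hypothetical. It copies a file and prepends the 3-byte UTF-8 BOM (EF BB BF) only if it is not already present.

```python
import codecs
from pathlib import Path


def add_utf8_bom(src, dst):
    """Copy src to dst, prepending the UTF-8 BOM if it is missing.

    Hypothetical helper for illustration; it does not check that the
    content is actually valid UTF-8.
    """
    data = Path(src).read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        # codecs.BOM_UTF8 is the byte sequence EF BB BF
        data = codecs.BOM_UTF8 + data
    Path(dst).write_bytes(data)
```

Usage, with assumed file names: add_utf8_bom("export.csv", "export_bom.csv"). Writing to a second file keeps the original export untouched.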
Comment 2 Oliver Brinzing 2019-02-09 15:34:02 UTC
Thank you for reporting the bug.

Please attach a sample document, as this makes it easier for us to verify the bug. 
(Please note that the attachment will be public, remove any sensitive information before attaching it. 

See https://wiki.documentfoundation.org/QA/FAQ#How_can_I_eliminate_confidential_data_from_a_sample_document.3F for help on how to do so.)

I have set the bug's status to 'NEEDINFO'. Please change it back to 'UNCONFIRMED' once the requested document is provided.
Comment 3 Bernard TREMBLAY 2019-02-09 16:30:27 UTC
Created attachment 149050 [details]
CVS file recognized as UTF16

File exported from the "Everything" file search tool.

Note: manually entering "csv encoded utf8" as the file type is not accepted, and no list of accepted types is shown.
Comment 4 Oliver Brinzing 2019-02-09 17:48:28 UTC
Created attachment 149055 [details]
hex view / notepad++

I cannot reproduce a crash; the attached *.csv opens fine with LO 6.1.5.2.
It seems to be encoded as UTF-8; please have a look at the attached picture.
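
The same check as in the hex view can be scripted. This is only a rough sketch for inspecting a file by hand; it is not how LibreOffice detects encodings, and the function name sniff_encoding is hypothetical. It looks for a BOM first and otherwise tests whether the bytes decode as strict UTF-8.

```python
import codecs


def sniff_encoding(data: bytes) -> str:
    """Return a rough guess of the encoding of raw file bytes.

    Hypothetical helper for illustration only. Simplification: a UTF-32
    LE BOM also starts with FF FE and would be reported as "utf-16".
    """
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    try:
        # Strict decoding raises on any byte sequence that is not valid UTF-8
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "unknown"
```

For example, sniff_encoding(open("export.csv", "rb").read()) with an assumed file name would report "utf-8" for a BOM-less file whose bytes are all valid UTF-8.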
Comment 5 Bernard TREMBLAY 2019-02-10 00:14:00 UTC
Hi,

There is something wrong but I cannot reproduce anymore.

The first time I opened a CSV (probably after the update):
- none of the default delimiters was selected
- UTF-16 was shown
- the content was displayed as something like Chinese characters
- after one or two seconds (Core i7 at 4 GHz) it crashed

This occurred twice.

Then I added a BOM (and saved the result as another, renamed file).

I opened that file and UTF-16 had become UTF-8, but the delimiters were still not selected. I ignored that, and then I got an error that the maximum field size in characters was exceeded.
Then I reopened it, selected the comma, and the import was fine.

Now when I open the CSV again in a new Calc file, it is OK.

So I think there can be bad or uninitialized values for these parameters on a new installation.

I upgraded recently (maybe two months ago) to the new version, and to 64-bit.

Best regards

Trebly
Comment 6 Oliver Brinzing 2019-02-10 07:30:51 UTC
(In reply to Bernard TREMBLAY from comment #5)
> There is something wrong but I cannot reproduce anymore.

So setting to WORKSFORME.
Please feel free to reopen if you can reproduce it.