Bug 88909

Summary: [4.4.0.3] After upgrading, opening a CSV UTF-8 in Calc defaults to UTF-16 encoding
Product: LibreOffice
Reporter: Chris Heald <cheald>
Component: Calc
Assignee: Not Assigned <libreoffice-bugs>
Status: RESOLVED DUPLICATE
Severity: normal
CC: ilmari.lauhakangas, raal
Priority: medium
Version: 4.4.0.2 rc
Hardware: x86-64 (AMD64)
OS: Windows (All)
Whiteboard:
Crash report or crash signature:
Regression By:
Attachments: Example file which fails in UTF-16

Description Chris Heald 2015-01-29 17:18:41 UTC
Created attachment 112924
Example file which fails in UTF-16

This makes LibreOffice interpret the file incorrectly as Chinese characters. This is fixable by changing the encoding, but it's pretty confusing if you don't know what encodings are, and I'm not sure why UTF-16 should ever be a default for a text-based file format.

http://i.imgur.com/i1YuVXA.png
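
For anyone wondering why the import shows Chinese characters: the sketch below is only an illustration, not a reproduction of the attached file. It assumes the bytes are read as little-endian UTF-16 and uses a made-up header row. Each pair of ASCII bytes is fused into one 16-bit code unit, and for ordinary letters those units tend to land in the CJK Unified Ideographs range.

```python
def misread_as_utf16(text: str) -> str:
    """Encode text as UTF-8, then decode the same bytes as UTF-16 LE.

    Assumption: the importer treats the file as little-endian UTF-16.
    The byte count must be even, otherwise decoding raises an error.
    """
    raw = text.encode("utf-8")
    return raw.decode("utf-16-le")


# Hypothetical header row (8 ASCII bytes), not taken from the attachment.
garbled = misread_as_utf16("name,age")

# Every resulting character falls in the CJK Unified Ideographs block.
all_cjk = all("\u4e00" <= ch <= "\u9fff" for ch in garbled)
print(len(garbled), all_cjk)
```

The eight ASCII bytes collapse into four characters, which is why the preview looks like short runs of unrelated ideographs rather than the original columns.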
Comment 1 Chris Heald 2015-01-29 17:19:24 UTC
So, I haven't had coffee yet. Of course it fails in UTF-16, but the problem is that after upgrading 4.3 to 4.4, the default encoding changed to UTF-16, which "broke" CSV importing.
Comment 2 Buovjaga 2015-01-31 12:06:45 UTC
I've heard about this (on Reddit I guess), yet for me your .csv suggests UTF-8 and not UTF-16.

Win 7 Pro 64-bit, LibO Version: 4.4.0.3
Build ID: de093506bcdc5fafd9023ee680b8c60e3e0645d7

Version: 4.5.0.0.alpha0+
Build ID: 309574394bd4ae3e9e10e5ff0d64bdd7bbbc8b83
TinderBox: Win-x86@62-TDF, Branch:MASTER, Time: 2015-01-29_23:44:46

Ubuntu 14.10 64-bit Version: 4.5.0.0.alpha0+
Build ID: 8fd9c25ac66dd238d4c68be3974241a18cb21705
TinderBox: Linux-rpm_deb-x86_64@46-TDF-dbg, Branch:master, Time: 2015-01-27_22:43:15
Comment 3 raal 2015-02-11 07:49:30 UTC
When I open your file in Version: 4.4.0.3
Build ID: de093506bcdc5fafd9023ee680b8c60e3e0645d7
Locale: cs_CZ
then LO offers WIN-1250 as the default. WIN-1250 is the default for Locale: cs_CZ. Your file is in UTF-8.


The same in Version: 4.5.0.0.alpha0+
Build ID: 1845b6af3991ca5521eef48aafe1d0489e2ff8f6
TinderBox: Win-x86@42, Branch:master, Time: 2015-02-02_09:30:48
Locale: cs_CZ

No problem on Linux; the character set is correctly recognized as UTF-8.

=> On Windows, Calc fails to recognize the encoding of the .csv file and offers the default encoding from the Windows environment.
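
The behaviour one would expect can be sketched roughly as follows. This is not LibreOffice's actual detection code, only a minimal illustration: the function name `guess_csv_encoding` and the `fallback` parameter are hypothetical, and `cp1250` stands in for the WIN-1250 locale default mentioned above. The idea is to check for a byte order mark, then try a strict UTF-8 decode, and only then fall back to the locale default.

```python
import codecs


def guess_csv_encoding(raw: bytes, fallback: str = "cp1250") -> str:
    """Guess a text encoding for raw CSV bytes (illustrative only)."""
    # A byte order mark is the strongest hint, so check it first.
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # Legacy 8-bit text containing non-ASCII bytes is rarely valid UTF-8,
    # so a successful strict decode is a reasonable signal.
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Assumption: the caller passes the locale's default code page.
        return fallback


utf8_guess = guess_csv_encoding("Příliš,žluťoučký".encode("utf-8"))
legacy_guess = guess_csv_encoding("Příliš,žluťoučký".encode("cp1250"))
print(utf8_guess, legacy_guess)
```

With this ordering, a UTF-8 file without a BOM is still recognized, while a genuine WIN-1250 file fails the strict decode and gets the locale default.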
Comment 4 Maxim Monastirsky 2015-07-30 11:47:56 UTC

*** This bug has been marked as a duplicate of bug 82418 ***