It would be great to have charset autodetection as an option for the CSV import in Calc. Currently, the import dialog just remembers the last charset that was used.
I imagine that this could cause quite a bit of trouble for end users who accidentally change their charset import setting to something that is not the default but looks like an innocent choice (e.g. "Unicode").
There is already a basic charset detection in the Writer text import, SwIoSystem::IsDetectableText.
It is old and ugly, but could be a starting point. Obviously, it would have to be moved out of Writer and polished a bit so that it can be used in other applications too.
It would be nice if the implementation also worked with bug 38637 - Better handling for csv-Files
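To make the request a bit more concrete, here is a rough sketch of the kind of heuristic I have in mind, written in Python only for brevity. It is not how SwIoSystem::IsDetectableText works and not a proposal for the actual code; the function name `guess_charset`, the 4096-byte sample size, the 40% threshold and the `cp1252` fallback are all assumptions made up for this illustration.

```python
import codecs

# BOMs are checked longest-first so that the UTF-32-LE BOM (FF FE 00 00)
# is not mistaken for the UTF-16-LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_charset(data: bytes, fallback: str = "cp1252") -> str:
    """Guess a charset for CSV bytes (hypothetical helper, heuristic only)."""
    # 1. A byte order mark is the strongest hint.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name

    # 2. Loose UTF-16 check: mostly-ASCII text encoded as UTF-16 has a
    #    NUL in every other byte position and none in the remaining ones.
    sample = data[:4096]
    if len(sample) >= 2:
        half = len(sample) // 2
        even_nuls = sample[0::2].count(0)
        odd_nuls = sample[1::2].count(0)
        if odd_nuls > 0.4 * half and even_nuls == 0:
            return "utf-16-le"
        if even_nuls > 0.4 * half and odd_nuls == 0:
            return "utf-16-be"

    # 3. Strict UTF-8 decoding rarely succeeds by accident on legacy
    #    8-bit text that contains non-ASCII characters.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # 4. Give up and use the caller's fallback (an assumed default).
        return fallback
```

The result would only preselect the charset in the import dialog; the user could still override it, so a wrong guess is no worse than a wrongly remembered setting.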
[This is an automated message.]
This bug was filed before the changes to Bugzilla on 2011-10-16. Thus it
started right out as NEW without ever being explicitly confirmed. The bug is
changed to state NEEDINFO for this reason. To move this bug from NEEDINFO back
to NEW please check if the bug still persists with the 3.5.0 beta1 or beta2 prereleases.
Details on how to test the 3.5.0 beta1 can be found at:
More detail on this bulk operation: http://nabble.documentfoundation.org/RFC-Operation-Spamzilla-tp3607474p3607474.html
Still an issue in Version: 22.214.171.124.
I actually ended up using Excel for a bunch of CSV files because when I tried to open them in LibreOffice, the import screen (defaulting to UTF-16) showed the file as a string of unintelligible Asian characters and I was in a hurry. Once I had a bit more time, I realised that making them work in LO was as simple as changing the charset to UTF-8. Excel just worked.
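For anyone wondering why the preview turns into Asian characters: when plain ASCII bytes are read as UTF-16, each pair of bytes becomes a single 16-bit code unit, and pairs of lowercase letters land in the CJK ideograph range. A small Python illustration, assuming little-endian UTF-16 and a made-up header line:

```python
# ASCII/UTF-8 bytes of a hypothetical CSV header (even length on purpose,
# because UTF-16 consumes two bytes per code unit).
utf8_bytes = "name,age".encode("utf-8")

# Misinterpret the same bytes as little-endian UTF-16.
misread = utf8_bytes.decode("utf-16-le")

# Every two ASCII bytes collapse into one character.
print(misread)
print([hex(ord(ch)) for ch in misread])
```

Eight ASCII bytes come out as four characters, all of them inside the CJK Unified Ideographs block (U+4E00 to U+9FFF).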
Created attachment 122602
Confirmation that this is an ongoing issue.
If this has been around unchanged since 2011, perhaps it isn't urgent because not enough people encounter it, but that surprises me. Perhaps people run into it, don't know what to do about it, and so don't report it? Anyway, here it is, and yes, switching back to UTF-8 resolves it.
Implemented (at least the loose Unicode UTF-16 detection) since 7.1 with