Bug 35019 - charset autodetection for csv imports
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version (earliest affected): 3.3.1 release
Hardware: Other All
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: CSV-Import 38637
Reported: 2011-03-04 08:11 UTC by Björn Michaelsen
Modified: 2021-08-29 21:44 UTC
CC List: 3 users

See Also:
Crash report or crash signature:


Attachments
Confirmation that this is an ongoing issue. (281.36 KB, image/png)
2016-02-13 00:30 UTC, Malik
Description Björn Michaelsen 2011-03-04 08:11:09 UTC
It would be great to have charset autodetection as an option for the csv import in Calc. Currently the last charset gets remembered.

I imagine that this could cause quite a bit of trouble for end users who accidentally change their charset import setting to something that is not the default but looks like an innocent choice (e.g. "Unicode").

see also:

 https://bugs.launchpad.net/ubuntu/+source/openoffice.org/+bug/694188
Comment 1 Björn Michaelsen 2011-03-04 09:04:09 UTC
The Writer text import already has a basic implementation of charset detection, SwIoSystem::IsDetectableText:

http://opengrok.libreoffice.org/xref/writer/sw/source/filter/basflt/iodetect.cxx#427

It is old and ugly, but could be a starting point. Obviously, it would have to be moved out of Writer and polished a bit so that it can be used in other applications too.
Comment 2 Alexander Balzer 2011-06-24 04:59:53 UTC
It would be nice if the implementation also worked with bug 38637 (Better handling for csv-Files).
Comment 3 Björn Michaelsen 2011-12-23 11:50:46 UTC Comment hidden (obsolete)
Comment 4 lists 2015-07-24 11:20:16 UTC
Still an issue in Version: 4.4.2.2.

I actually ended up using Excel for a bunch of CSV files because when I tried to open them in LibreOffice, the import screen (defaulting to UTF-16) showed the file as a string of unintelligible Asian characters and I was in a hurry. Once I had a bit more time, I realised that making them work in LO was as simple as changing the charset to UTF-8. Excel just worked.
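
A minimal sketch of why the import preview looks like Asian characters in this situation, assuming a mostly-ASCII UTF-8 file and a dialog that decodes it as UTF-16. The sample row is hypothetical and is not taken from the reporter's files:

```python
# Hypothetical sample row; any mostly-ASCII CSV text behaves similarly.
raw = "name,city\n".encode("utf-8")  # 10 bytes, one per character

# Decoding as UTF-16 (little-endian) fuses each pair of ASCII bytes into a
# single 16-bit code unit, and most of those values fall in the CJK range.
wrong = raw.decode("utf-16-le")

# Decoding with the charset the file was actually written in.
right = raw.decode("utf-8")

print(repr(right))
print([f"U+{ord(c):04X}" for c in wrong])
```

Two ASCII letters such as "n" (0x6E) and "a" (0x61) combine into U+616E, which lies in the CJK Unified Ideographs block (U+4E00 to U+9FFF). That is why the preview shows ideographs rather than obviously broken bytes.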
Comment 5 Malik 2016-02-13 00:30:57 UTC
Created attachment 122602
Confirmation that this is an ongoing issue.

If this has been around since 2011 and is unchanged, perhaps it isn't urgent because not enough people encounter it, but that surprises me. Perhaps people run into it, don't know what to do about it, and so don't report it? Anyway, here it is, and yes, switching back to UTF-8 resolves it.
Comment 6 Eike Rathke 2021-08-29 21:44:01 UTC
Implemented (at least the loose Unicode UTF-16 detection) since 7.1 with
https://git.libreoffice.org/core/+/85f12e47f4a086a3923dd3a6b097776d60c6dc82%5E%21/
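
For readers who want a feel for what a loose detection might look like, here is a rough sketch in Python. It is not the logic of the commit above or of SwIoSystem::IsDetectableText. The function name `guess_csv_encoding`, the 4096-byte sample size, the "more than half NUL bytes" threshold, and the `latin-1` fallback are all assumptions made for illustration:

```python
import codecs


def guess_csv_encoding(data: bytes, fallback: str = "latin-1") -> str:
    """Return a best-guess Python codec name for raw CSV bytes (hypothetical helper)."""
    # A byte order mark is the most reliable signal when present.
    # UTF-32 is deliberately ignored in this sketch.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # this codec consumes the BOM and picks the byte order

    # Loose BOM-less check: mostly-ASCII UTF-16 text has a NUL byte in
    # every other position (odd offsets for LE, even offsets for BE).
    pairs = min(len(data), 4096) // 2
    if pairs:
        sample = data[: 2 * pairs]
        even_nuls = sample[0::2].count(0)
        odd_nuls = sample[1::2].count(0)
        if odd_nuls > pairs // 2 and even_nuls == 0:
            return "utf-16-le"
        if even_nuls > pairs // 2 and odd_nuls == 0:
            return "utf-16-be"

    # Strict UTF-8 decoding rejects most legacy 8-bit text, so success
    # is a fairly strong hint.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return fallback  # assumed default; a real importer would ask the user


print(guess_csv_encoding("a,b\n1,2\n".encode("utf-16-le")))
print(guess_csv_encoding("Björn,1\n".encode("utf-8")))
```

A heuristic like this only covers UTF-16 text that is mostly ASCII, so it makes more sense as a preselection in the import dialog than as a silent decision.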