35019 – charset autodetection for csv imports

Status:	RESOLVED FIXED

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	Calc
Version: (earliest affected)	3.3.1 release
Hardware:	Other All

Importance:	medium enhancement
Assignee:	Not Assigned

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:	CSV-Import 38637

Reported:	2011-03-04 08:11 UTC by Björn Michaelsen
Modified:	2021-08-29 21:44 UTC
CC List:	3 users

See Also:	https://launchpad.net/bugs/694188
Crash report or crash signature:

Attachments
Confirmation that this is an ongoing issue. (281.36 KB, image/png) 2016-02-13 00:30 UTC, Malik

Description Björn Michaelsen 2011-03-04 08:11:09 UTC

It would be great to have charset autodetection as an option for the CSV import in Calc. Currently, Calc simply remembers the charset used for the last import.

I imagine that this could cause quite a bit of trouble for end users who accidentally change their charset import setting to something that is not the default, but looks like an innocent choice (e.g. "Unicode").

see also:

 https://bugs.launchpad.net/ubuntu/+source/openoffice.org/+bug/694188

Comment 1 Björn Michaelsen 2011-03-04 09:04:09 UTC

There is already a basic implementation of charset detection in the Writer text import, SwIoSystem::IsDetectableText:

http://opengrok.libreoffice.org/xref/writer/sw/source/filter/basflt/iodetect.cxx#427

It is old and ugly, but could be a starting point. Obviously, it would have to be moved out of Writer and polished a bit so that it can be used in other applications too.
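
For illustration only, here is a minimal Python sketch of the general idea behind this kind of detection: look for a byte order mark first, then fall back to checking whether the bytes are valid UTF-8. It is not a port of SwIoSystem::IsDetectableText; the function name `guess_charset` and the `cp1252` fallback are assumptions made for this example.

```python
import codecs

# Hypothetical helper, not LibreOffice code.
# UTF-32 BOMs are checked before UTF-16 because the UTF-32 LE BOM
# (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_charset(data: bytes, fallback: str = "cp1252") -> str:
    """Return a Python codec name guessed from the raw file bytes."""
    # Step 1: a byte order mark is the strongest available hint.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    # Step 2: text in legacy 8-bit charsets is rarely valid UTF-8,
    # so a clean strict decode is good evidence for UTF-8.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Step 3: give up and use an assumed legacy default.
        return fallback
```

A caller would read the first few kilobytes of the CSV file in binary mode, pass them to `guess_charset`, and preselect the result in the import dialog, still letting the user override it.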

Comment 2 Alexander Balzer 2011-06-24 04:59:53 UTC

It would be nice if the implementation also worked with bug 38637 - Better handling for csv-Files

Comment 3 Björn Michaelsen 2011-12-23 11:50:46 UTC Comment hidden (obsolete)

[This is an automated message.]
This bug was filed before the changes to Bugzilla on 2011-10-16. Thus it
started right out as NEW without ever being explicitly confirmed. The bug is
changed to state NEEDINFO for this reason. To move this bug from NEEDINFO back
to NEW please check if the bug still persists with the 3.5.0 beta1 or beta2 prereleases.
Details on how to test the 3.5.0 beta1 can be found at:
http://wiki.documentfoundation.org/QA/BugHunting_Session_3.5.0.-1

more detail on this bulk operation: http://nabble.documentfoundation.org/RFC-Operation-Spamzilla-tp3607474p3607474.html

Comment 4 lists 2015-07-24 11:20:16 UTC

Still an issue in Version: 4.4.2.2.

I actually ended up using Excel for a bunch of CSV files because when I tried to open them in LibreOffice, the import screen (defaulting to UTF-16) showed the file as a string of unintelligible Asian characters and I was in a hurry. Once I had a bit more time, I realised that making them work in LO was as simple as changing the charset to UTF-8. Excel just worked.
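
The symptom described here can be reproduced outside LibreOffice. The short Python sketch below (the sample text `name,age` is made up for the example) decodes UTF-8 bytes as UTF-16: each pair of ASCII bytes is fused into one 16-bit code unit, and for typical lowercase text those code units land in the CJK Unified Ideographs block.

```python
# ASCII-only CSV header, saved as UTF-8 (8 bytes, so an even length).
raw = "name,age".encode("utf-8")

# Wrong guess: treat the same bytes as little-endian UTF-16.
misread = raw.decode("utf-16-le")

# Every fused byte pair falls into the CJK Unified Ideographs block.
all_cjk = all(0x4E00 <= ord(ch) <= 0x9FFF for ch in misread)

# Right guess: the original text comes back unchanged.
correct = raw.decode("utf-8")
```

Here `misread` holds four characters instead of eight, which is why the preview looks like a short run of unrelated Asian characters rather than recognizable text.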

Comment 5 Malik 2016-02-13 00:30:57 UTC

Created attachment 122602
Confirmation that this is an ongoing issue.

If this has been around since 2011 and is unchanged, perhaps it isn't urgent because not enough people encounter it, but that surprises me. Perhaps people run into it, don't know what to do about it, and so don't report it? Anyway, here it is, and yes, if you just go back to UTF-8 the problem goes away.

Comment 6 Eike Rathke 2021-08-29 21:44:01 UTC

Implemented (at least the loose Unicode UTF-16 detection) since 7.1 with
https://git.libreoffice.org/core/+/85f12e47f4a086a3923dd3a6b097776d60c6dc82%5E%21/
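
As a rough illustration of what a "loose" UTF-16 check without a byte order mark can look like, the Python sketch below counts zero bytes at even and odd offsets: mostly-Latin text in UTF-16 has a zero in every other byte. This is an assumption about the general technique, not a description of the linked commit; the function name, sample size, and thresholds are invented for the example.

```python
from typing import Optional


def looks_like_utf16(data: bytes, sample_size: int = 4096,
                     threshold: float = 0.4) -> Optional[str]:
    """Guess BOM-less UTF-16 from the position of zero bytes.

    Hypothetical heuristic: returns "utf-16-le", "utf-16-be", or None.
    """
    sample = data[:sample_size]
    # Drop a trailing odd byte so the sample splits into 16-bit units.
    sample = sample[: len(sample) - len(sample) % 2]
    if len(sample) < 4:
        return None  # too little data to judge

    pairs = len(sample) // 2
    zeros_even = sample[0::2].count(0)  # high bytes if big-endian
    zeros_odd = sample[1::2].count(0)   # high bytes if little-endian

    # Many zeros on one side and almost none on the other suggests
    # UTF-16 text that is dominated by Latin characters.
    if zeros_odd / pairs >= threshold and zeros_even / pairs < 0.05:
        return "utf-16-le"
    if zeros_even / pairs >= threshold and zeros_odd / pairs < 0.05:
        return "utf-16-be"
    return None
```

The check is deliberately loose: UTF-16 text made up mostly of non-Latin characters contains few zero bytes and returns `None`, so a caller still needs a fallback such as the remembered or default charset.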