Bug 68077 - Accept the fact that Excel's CSV format is de facto standard
Status: RESOLVED INVALID
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version: unspecified (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
Reported: 2013-08-13 18:19 UTC by Ma Xiaojun
Modified: 2014-04-17 16:11 UTC
Description Ma Xiaojun 2013-08-13 18:19:08 UTC
Calc tries to be general about the CSV format. However, this approach causes a lot of trouble for almost zero gain.

The most superficial trouble is the annoying Text Import dialog, which has almost zero value. I have never seen a CSV in a non-Excel dialect, sorry.

A deeper trouble is all sorts of bugs surrounding the Text Import dialog.

An even deeper trouble is that saving CSV may cause data loss or format changes in certain cases. This is really ridiculous for such a simple format.

As a reference, Python's csv module treats Excel's CSV format as de facto standard:
http://docs.python.org/2/library/csv.html

It does not switch to other dialects unless explicitly told to.

To be honest, I'm not totally unbiased. I use Python's csv module (with the default dialect) as a way to generate spreadsheets from simulation results. (I looked at ODS, but the various Python libraries all look immature and/or inactive.) Excel's handling of such CSV is certainly smoother than LO's in my case.
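
For reference, the default behaviour described above can be checked with a minimal sketch. It assumes Python 3 (the linked page documents Python 2, where the default dialect is the same), and the column names and values are made up for illustration:

```python
import csv
import io

# Write two rows without naming a dialect; csv.writer then falls back
# to the built-in "excel" dialect.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["run", "label", "value"])          # illustrative header
writer.writerow([1, 'say "hi", twice', 3.5])        # illustrative data row
output = buf.getvalue()

# The "excel" dialect: comma delimiter, double-quote quoting, CRLF line ends.
default_dialect = csv.get_dialect("excel")

print(repr(output))
print(repr(default_dialect.delimiter), repr(default_dialect.lineterminator))
```

Fields containing a comma or a quote are wrapped in double quotes, and embedded quotes are doubled. Any other dialect has to be requested explicitly, for example with `csv.writer(buf, dialect="excel-tab")`.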
Comment 1 Tomaz Vajngerl 2013-08-13 20:19:20 UTC
What problems do you have with the CSV format in Calc? Which bugs surround the Text Import dialog? In which cases does saving CSV cause data loss? Bugzilla is meant for reporting bugs.

I have seen a tab delimited, a space delimited, a REAL comma delimited and even fixed size files where the Import dialog was a time saver. I am not the only one who thinks it is useful, but you are probably right that if we detect an Excel format CSV, we could skip the dialog and do the right thing. However, I don't even know a reliable way to detect this - it is safer to always ask instead.

Regards, Tomaž
Comment 2 Ma Xiaojun 2013-08-13 20:35:31 UTC
Searching for "csv" in both LO and OO gives a good number of results, and I don't think they are all outdated.

> I have seen a tab delimited, a space delimited, a REAL comma delimited and even fixed size files where the Import dialog was a time saver.

Can you tell me which software is (still) generating such CSV?

I don't think automatic detection is really necessary. Both Calc and Excel have "Data => Text to Columns" at the end of the day.
Comment 3 Ma Xiaojun 2013-08-13 20:40:46 UTC
I meant searching Bugzilla in my previous comment.
Comment 4 m_a_riosv 2013-08-13 22:48:36 UTC
Hi Ma,

Maybe reading about CSV on Wikipedia can clear up the situation a bit.
https://en.wikipedia.org/wiki/Comma-separated_values

Can you attach a sample CSV so we can verify the problems, and explain the issues you have with it?
Comment 5 Ma Xiaojun 2013-08-14 06:12:27 UTC
I know the Comma-separated_values Wikipedia entry mentions that CSV can have many different formats.

However, you may also check another Wikipedia entry:
https://en.wikipedia.org/wiki/CSV_application_support

The list mentions standard libraries and/or open source libraries. Can you find a library that intentionally generates a non-Excel dialect? I've verified that Python's standard csv module does generate the Excel dialect by default. Python is currently my main language.

For some random real world data, check here:
http://www.capitalbikeshare.com/trip-history-data
Comment 6 Tomaz Vajngerl 2013-08-14 10:10:50 UTC
(In reply to comment #2)
> Can you tell me which software is (still) generating such CSV?
I checked my database viewer (DbVisualizer) just for fun and it uses a tab delimiter by default. Next I tried Excel 2003: I saved a simple sheet as CSV and got a semicolon delimited file, because semicolon is the default delimiter for my locale (the locale dependency is also described in [1]).

Opening your "random real world data" CSV file in Excel did not split it into columns for me, again because of locale settings - so much for the Excel CSV "standard".
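
The effect described here can be reproduced outside Excel. The following is only a rough analogy, not Excel's actual import logic: it assumes Python 3's csv module, a made-up two-line sample, and a hypothetical helper named `parse`. A comma delimited file read with a semicolon delimiter ends up as a single column per row:

```python
import csv
import io

def parse(text, delimiter):
    """Parse CSV text with the given delimiter and return a list of rows."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

# Made-up comma delimited sample, as a comma-locale tool would write it.
comma_text = "a,b,c\r\n1,2,3\r\n"

as_comma = parse(comma_text, ",")      # reader agrees with the writer
as_semicolon = parse(comma_text, ";")  # reader expects a semicolon locale

print(as_comma)      # three fields per row
print(as_semicolon)  # one unsplit field per row
```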

CSV is very locale dependent, and this is why we cannot assume any format as the default. As I said, with smart auto-detection we could detect the settings in some cases and the dialog would not be necessary, but this is all we could do IMHO.
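
As an illustration of what "detect the settings in some cases" could look like, here is a minimal sketch using Python's `csv.Sniffer`. This is not how Calc works; the helper name `guess_delimiter`, the candidate list and the samples are assumptions made for the example, and the sniffer is a heuristic that can fail or guess wrong on real files:

```python
import csv

def guess_delimiter(sample, candidates=",;\t"):
    """Return the delimiter csv.Sniffer guesses for the sample, or None.

    Hypothetical helper: only the characters in `candidates` are considered.
    """
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # The heuristic could not settle on a delimiter.
        return None

# Made-up samples: the same table written under two locale conventions.
semicolon_guess = guess_delimiter("x;y;z\n1;2;3\n4;5;6\n")
comma_guess = guess_delimiter("x,y,z\n1,2,3\n4,5,6\n")

print(repr(semicolon_guess), repr(comma_guess))
```

When the helper returns None, a caller would still have to fall back to asking the user, which matches the point that a dialog remains necessary in the general case.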

Regards, Tomaž

[1]: https://en.wikipedia.org/wiki/CSV_application_support
Comment 7 Ma Xiaojun 2013-08-14 10:30:00 UTC
Thank you very much for your information.

I was not aware of the locale annoyance of Excel, though it is not uncommon in MSFT software.

As I checked, you are also right about DbVisualizer.
Comment 8 tommy27 2014-04-14 21:54:29 UTC
should we close this bug report?
Comment 9 Cor Nouws 2014-04-17 16:11:00 UTC
(In reply to comment #8)
> should we close this bug report?

Yes, looking at e.g. comment #6, it's simply invalid.

@Ma: sorry that life isn't easier :)