Bug 117672 - Make default column type 'Text' when importing CSVs, or warn of data modification
Summary: Make default column type 'Text' when importing CSVs, or warn of data modifica...
Status: RESOLVED WONTFIX
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
6.0.4.2 release
Hardware: All All
: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: CSV-Import
  Show dependency treegraph
 
Reported: 2018-05-17 18:11 UTC by totalmassretain
Modified: 2022-08-31 12:25 UTC (History)
5 users (show)

See Also:
Crash report or crash signature:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description totalmassretain 2018-05-17 18:11:12 UTC
Description:
With the default CSV import settings, column types are set to 'Standard.' This means that columns that contain very long numbers (such as serial numbers) get converted to notation, which can result in data loss due to rounding.

The principal of least surprise would indicate that, if I open a CSV, LibreOffice will not make changes to it that I haven't explicitly requested. CSVs do not have a concept of data types -- every field in a CSV is 'text'. If LibreOffice tries to infer meaning based on the content of these text fields, it will get it wrong some of the time, resulting in unexpected behaviour.

The current solution is to select all the cells in the CSV import window and change 'Standard' to 'Text.' Since all of the data in a CSV is text, I would argue that making 'Text' the default column type is more correct and would result in less unexpected behaviour, at least for the use cases I have in mind (opening a CSV, making a minor modification, saving and closing it.)

If it is felt that the current behaviour is preferable for most of your users, an alternative solution would be to provide the user with some warning that LibreOffice is about to modify your data. This could take the form of a dialogue box that appears after you click 'OK' in the Text Import window. This new dialogue box could feature a Cancel button, allowing a user to go back to the Text Import window and change their settings, and could perhaps include a recommendation that the user change their column types to 'Text' if they don't want LibreOffice to modify their data.

Silently changing data by default is the wrong behaviour, in my opinion. Imagine a user working with a large sheet that has long numbers in column Z that don't immediately appear on screen and aren't relevant to the user, so they probably won't bother looking at them. They make their changes to the cells that are relevant to them (say column A or B,) save and send the csv around to their colleagues, never realizing that LibreOffice has modified part their data. Now they look silly because they didn't predict LO's default behaviour. Or they try to import it into the database using column Z (pretend they're serial numbers) as a key. If those serial numbers are actually serial, the same record might get updated repeatedly due to the rounding. Or some records with sufficiently similar keys might get updated multiple times. Or something else -- at this point we are well in to the realm of unexpected, implementation-specific behaviour.

Far better just to not make changes when they aren't asked for. Or at least warn when LO is about to make those changes so the user can prevent it from doing so by changing the import settings.

Steps to Reproduce:
1.echo 12345678901234567890 > test.csv
2.vi test.csv -- notice the number, as echoed
3.open test.csv in lo calc using the default settings
4.save test.csv
5.vi test.csv -- notice that the number has changed, despite not having explicitly done anything to the file and despite receiving no warning that my csv file will essentially be corrupted if I save



Actual Results:  
The cell has been changed to 1.23456789012346E+019

Expected Results:
The cell should appear as 12345678901234567890


Reproducible: Always


User Profile Reset: No



Additional Info:
Version: 6.0.4.2
Build ID: 6.0.4-1
CPU threads: 8; OS: Linux 4.16; UI render: default; VCL: kde4; 
Locale: en-US (en_US.UTF-8); Calc: group


User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:59.0) Gecko/20100101 Firefox/59.0
Comment 1 Xavier Van Wijmeersch 2018-05-18 08:32:21 UTC
The current behaviour; number format; is for all the spreadsheet i know the most preferable use.
If you want text as default make a template for your needs, there are a lot things you can do.
Spreadsheets are for calculation not text writing or database storage.
Use cell or page styling and save it as a template.
A warning that long number will change to a number format that i can understand.
And csv file's are made with database's if i am right? so text based.
Comment 2 V Stuart Foote 2018-05-18 15:43:10 UTC
While LibreOffice can import CSV into and export CSV from Calc it is not intended to be a .CSV editor. Round trip of a CSV (any delimited or fixed width format) through Calc has no obligation to retain text strings in their original format once imported--expecting such is abusive use of a spreadsheet where formulas and cell styling and conditional formatting is involved.

Text editors, e.g. ViM, Notepad++, or even emacs are all infinitely more functional for "editing" delimited text files.

Meaning having to manually set all of the resulting columns to Text on import is reasonable (it is a single click to select from the preview and then apply TEXT from droplist). But users should then understand the marginal results they'll get--both with Calc as a text "editor" or in working with now malformed data as a spreadsheet.

Set them if one likes, but Text as a _default_ for columns of the Text Import dialog is not appropriate to use of a spreadsheet. Simply NO!

I guess we could maybe provide a Tools -> Options -> LO Calc panel for the Text Import dialog and a cook up a control to toggle default from STANDARD to TEXT, and hold setting per user profile. But I still think that fosters user abuse of Calc spreadsheets.

Otherwise, with default STANDARD filter import, you can also hold a long numeric string (e.g. your serial numbers) as TEXT by prepending an apostrophe to it prior to import, i.e. part of preparing your data for import.

IMHO => WONTFIX
Comment 3 totalmassretain 2018-05-20 15:44:57 UTC
I'm sure you've researched how your program is used, and so I respect your views on the most common use cases. There are, however, reasons why one might try to edit a CSV in Calc and not a simple text editor, such as re-ordering columns. Or even the simple fact that a user knows Calc can import CSVs and may prefer to work in that program. While it may not be intended to be a CSV editor, if it can open and save CSVs (via open/save dialogues, no less -- not import/export dialogues), then the CSV editor use case exists.

I acknowledge that this use case isn't really concerned with spreadsheet functionality, and may not line up with how developers might expect users to use the program.

With that in mind, I'll forget the request to change the default and re-iterate the later part of the request - that silently modifying the data is not good practice, and that a warning dialogue be displayed when the user clicks the OK button in the Text Import dialogue. Such a dialogue would inform the user that their import settings will modify their data, and perhaps it could even tell them what to do if they don't want their data modified (change all columns to 'Text'.) That way, if someone imports a CSV with serial numbers in some off-screen column, they won't unwittingly blow them away.

I feel that default toggles and templates would not address the issue of unexpected data modification, as they would require prior knowledge of this issue. Those with prior knowledge of the issue could just do what we currently do and change all the columns types to 'Text'. Such features may be relevant to other use cases, however.

Thank you for your consideration.
Comment 4 Heiko Tietze 2018-05-27 08:36:06 UTC
I agree with the warning when data has been changed automatically. The infobar should provide a proper feedback in those cases.
Comment 5 Mike Kaganski 2022-08-31 06:10:35 UTC
No, there is no use to do this "warning", nor change the default import column type.

The principle of least surprise tells that the software should not confuse most users. And most users would be surprised to get their numbers with multiple digits (like "1.2345678901234567890") not treated as numbers.

The dialog allows one to define column types explicitly. Bug 150652 would make that more discoverable. Using an external format, especially CSV, which on one hand is heavily used by banks and such (so any additional failure to recognize numbers would make *lots* of people suffer), and on the other hand, has absolutely no idea about data types, requires some user involvement. Dumbing the import process to provide "safe" default (like "everything is text") will make major pain to most users; and there is *no* way to tell that there was no data modification, when you e.g. imported a number 0.1, and got its imprecise binary representation equal to 0.1000000000000000055511151231257827021181583404541015625.