Created attachment 190242 [details] csv file from a sql table I imported a csv generated from a sql table. It looked rather good but had several exceptions in the format of the imported data. Subnet triples seperated by two dots in column 2 looked wrong in line 10 and 12+13, but looked good in the other lines. The dot in between had disappeared. Why is it so?
Because your locale used dot as thousand separator; and you didn't mark the IP column as text - so the values that matched three-digit parts separated by thousand separators were recognized as numbers; while those that didn't match that pattern were imported as text. If a column represents what is not a number, then it should be explicitly marked so in the import dialog.
Mike, your answer is nonsense. The first 9 lines also contain the dots and are not recognized as numers. Also would a conversion to a number yield the wrong value. Are you a saboteur?
(In reply to Jens-D Doll from comment #2) > your answer is nonsense. Do you say that every time you do not understand anything, instead of asking others to re-phrase the idea in a simpler words that you can comprehend? > The first 9 lines also contain the dots and are not recognized as numers. 9? You never provided a screenshot of what you actually see; but I guess that the first "problematic" result would appear at row 10, with previous *8* lines having the expected result (and the line 1 would contain column names, so no "dots recognized as numbers"). Now try to look at the data, and re-read what I wrote. In row 2, the text of the second column in CSV is "101.91.148". It has three digits in the leftmost group, then a dot, then *two* digits in the middle group, then dot, and then three digits in the rightmost group. I explained, that when the data does not match the "numeric digits in groups of three, separated by dots", it would not get converted into a number. And here, the middle group does not consist of *thee* digits. What is unclear here? The whole data of the cell does not match the number recognition pattern -> it does not get converted. Same for rows 3 to 9. And row 10 has "149.154.159". It is the first row that contains data that matches the rule. It is considered "one hundred forty nine million one hundred fifty four thousand one hundred fifty nine" Indeed, the leftmost group could also be less than 3 digits (say, "9.154.159" would also match as nine million one hundred fifty four thousand one hundred fifty nine). > Also would a conversion to a number yield the wrong value. It would bring a number, which then would be displayed according to the *default* format rules, which don't include thousand separators. > Are you a saboteur? Did you like my words hinting that you are unable to comprehend simple things? If not, then why do you think it's OK to personally attack someone trying to help you by explaining things?
Regardless of your "arguments" a value must not be interpreted in lines 1-9 one way and in lines 10,12,13 the other way. That should be clear also to simple minds.
No, you are wrong. And your data exactly shows why: it has row 1 as headers (definitely text), while other rows as data (so other rows would definitely need different interpretation). And CSV is used heavily for very different kinds of data, including intermediate results, sub-headings, etc. If you use such an inherently loosely defined format as CSV, it's your duty to learn how to handle the specific flavor you have correctly. The program offers you the very flexible means (the dialog gives you the full control). And you can tell the program exactly that: instead of trying to guess every cell of a column (the standard), just treat it as I say (make it text by choosing the column type). But trying to make program please one group of users would break infinite number of other users.
Sorry for those hard words, Mike. I'll explain it now. A csv collection is similar to a table, where each column has the same type in all lines. So the first line decides over the type of the values in the following lines. Here the first line is all text and so all values in the following lines have to be interpreted as text values.
Sigh. Please read comment 5. I explained it there, telling that there are HUGE VARIETY of CSV flavors. Yours is just one kind.
Have a look at Wikipedia, https://en.wikipedia.org/wiki/Comma-separated_values Comma-separated values (CSV) is a text file format that uses commas to separate values. A CSV file stores tabular data (numbers and text) in plain text, where each line of the file typically represents one data record. Each record consists of the same number of fields, and these are separated by commas in the CSV file. If the field delimiter itself may appear within a field, fields can be surrounded with quotation marks.[3]
Better look at the RFC [1] (which, by the way, is just a memo). And acknowledge the fact, that CSV has evolved through more than 30 years before that attempt of "standardization", having infinite number of flavors (including binary data, or absolutely odd variants like described at [2]). No amount of wishful thinking can change that. New CSVs (and TSVs, and other "tabular" textual formats) are generated each day by thousands of different softwares. [1] https://www.rfc-editor.org/rfc/rfc4180 [2] https://mikekaganski.wordpress.com/2021/02/18/reading-from-mysql-data-with-blobs-dumped-to-csv/
We should look at csv in todays world and not in yesterdays' one. Before Copernicus the world was a plane and centered the world.
In comment 9, I provided you a link to my post from 2021. I explained there a non-standard CSV flavor generated by MySQL, right then, in 2021. If you ignore everything that contradicts your opinion, I have no intention to waste time trying to explain. If you suppose that 2021 is "yesterday's world", you are welcome to test current implementations of e.g. MySQL, and countless of other current applications. I'm done here.
Created attachment 190273 [details] a view of the buggy import
The csv format has always been a database format with fixed types in each column. See https://support.microsoft.com/en-us/office/create-or-edit-csv-files-to-import-into-outlook-4518d70d-8fe9-46ad-94fa-1494247193c7 for reference. So this is a bug.
As noted comment 1, import of a column of data that can be either numbers or text (e.g. the subnet column 2 in a locale where . is the thousands separator) can not be left to detection as 'Standard'. You must understand your data and your number formats and make an appropriate selection. In this case assign a column as 'Text'--or you will get mixed numeric detection. You could also force the "subnet" column to text enclosing the field with "", or during manual entry by prepending a ' Either method => NAB