I am working on a German Windows system. German language uses the dot (".", U+002E) as thousands separator, but the colon (",", U+002C) as fractions sparator. However, usually all CSV files I encountered use the dot as separator for the fractional part of floating point numbers. If I open such a file on a System working in German, the cells are imported as text, not as fractional numbers. Even if checking the option "extended number detection" on the CSV import dialog, it is not possible to parse the numbers as numbers.
This forum post provides a workaround, but it is not convenient: http://en.libreofficeforum.org/node/287
I would appreciate an improvement of this someway, perhaps allowing to choose the decimal import seperator from the CSV import dialog, or making the detection smarter.
Hi @Matthias, thanks for reporting.
There are at least two ways for select it in the import window.
- The securer way is select the column clicking on their head and in the column type select US English. You can use Shift or Ctrl key for a quick selection of several columns.
- Selecting as Language for input a language with those as default as English (UK), it has dot as decimal point and dates like in other EU countries.
Please attach a simple test file so that it's easy for us to see what you're seeing. Marking as NEEDINFO - once you attach a sample set to UNCONFIRMED. Thanks
Created attachment 119953 [details]
Simple import test file
Created attachment 119954 [details]
The import dialog I see
Created attachment 119955 [details]
The result of the import process
Created attachment 119956 [details]
What, imho, the result of the import should be
@m.a.riosv: The numbers column can be selected “US English” as you explained, which causes the import to work like expected. The problem was for me that I failed to guess that function from that label. I would never have come to the idea to set a language to a non text column (In what language is 42?). If I would have been asked what that setting is for, I would have guessed it would set the syllabificator’s or spell checker’s dictionary.
Perhaps a first correction to this would be to change the label to something more self-descriptive, perhaps “Decimal separator is .”?
Requesting UX input and throwing this to NEW.
I don't particularly like changing the label but I'm not entirely sure the extent of impact that selecting the language has (outside of apparently what you've encountered). Thus UX advise is best here to determine how to move forward.
Agree with OP the "US English" is a rather unclear label to indicate handling of decimal mark formatting for import of source CSV. But, the GUI drop-down list of columnar type Text import parsing filters in SCSTR_COLUMN_USER has been that way since early in OOo.
The action of the GUI sets a parsing filter for Decimal Mark as decimal point -- as in English language usage (US & British Commonwealth), helping those locals where Decimal Mark defaults to a decimal comma.
The label should probably reflect that. So would agree that a change in label from "US English" -> "Point - Decimal Mark" would be clearer UI.
And though not currently supported by filter, IMHO adding a "Comma - Decimal Mark" might be appropriate as a new enhancement, to handle data coming the opposite direction.
And in this vein, one could make the argument of a need to filter import UniCode CSV text of other positional numeral systems, i.e. Indic, Eastern Arabic, Hindi, and some of the CTL and CJK encoded texts. But think that would be another much more ambitious enhancement.
(In reply to V Stuart Foote from comment #9)
> The label should probably reflect that. So would agree that a change in
> label from "US English" -> "Point - Decimal Mark" would be clearer UI.
But it does more than that. It effectively sets en-US locale parsing for that column's import, which not only affects the decimal separator but also group separator, date format and day and month names. So actually the current label reflects reality..
(In reply to Eike Rathke from comment #10)
> But it does more than that. It effectively sets en-US locale parsing for
> that column's import, which not only affects the decimal separator but also
> group separator, date format and day and month names. So actually the
> current label reflects reality..
Ugh! That is even more obscure. Then in addition to the "US English" (wouldn't want to break anything), would it be reasonable to add two simpler and specific entries to the SCSTR_COLUMN_USER picklist, and the code to support them, in the column parser?
"Point - decimal mark"
"Comma - decimal mark"
P.S. -- poked around in the source... could follow it as far as http://opengrok.libreoffice.org/xref/core/sc/source/ui/dbgui/scuiasciiopt.cxx#240 and populates the list box. But I'm quickly lost in looking for indication it can distinguish a range of data formats that might be in the column. So no idea how involved this may be--Sorry.
Please add keyword 'needsUXEval' and CC 'email@example.com' if input from UX is needed.
From my point of view, UI does what it says:
- either data are formatted in the language indicated in the UI (for instance German in attachment 119954 [details]) ==> change language according to the language used in the file for formatting
- or data are formatted in US English, as many apparatus may produce ==> change header of affected columns
As Eike mentioned, format affects not only decimal or thousand separators, but also date format, currency, etc.