Bug 94977 - Calc: Decimal number parsing when importing CSV -- function of US English labeled filter needs to be clear (comment 7)
Summary: Calc: Decimal number parsing when importing CSV -- function of US English lab...
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
Inherited From OOo
Hardware: All All
: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: CSV-Import
  Show dependency treegraph
 
Reported: 2015-10-12 10:10 UTC by Matthias Ronge
Modified: 2020-03-09 19:16 UTC (History)
5 users (show)

See Also:
Crash report or crash signature:


Attachments
Simple import test file (34 bytes, text/plain)
2015-10-26 07:29 UTC, Matthias Ronge
Details
The import dialog I see (36.51 KB, image/png)
2015-10-26 07:32 UTC, Matthias Ronge
Details
The result of the import process (8.23 KB, application/vnd.oasis.opendocument.spreadsheet)
2015-10-26 07:33 UTC, Matthias Ronge
Details
What, imho, the result of the import should be (8.32 KB, application/vnd.oasis.opendocument.spreadsheet)
2015-10-26 07:34 UTC, Matthias Ronge
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Matthias Ronge 2015-10-12 10:10:56 UTC
I am working on a German Windows system. German language uses the dot (".", U+002E) as thousands separator, but the colon (",", U+002C) as fractions sparator. However, usually all CSV files I encountered use the dot as separator for the fractional part of floating point numbers. If I open such a file on a System working in German, the cells are imported as text, not as fractional numbers. Even if checking the option "extended number detection" on the CSV import dialog, it is not possible to parse the numbers as numbers.

This forum post provides a workaround, but it is not convenient: http://en.libreofficeforum.org/node/287

I would appreciate an improvement of this someway, perhaps allowing to choose the decimal import seperator from the CSV import dialog, or making the detection smarter.
Comment 1 m_a_riosv 2015-10-12 12:03:55 UTC
Hi @Matthias, thanks for reporting.

There are at least two ways for select it in the import window.

- The securer way is select the column clicking on their head and in the column type select US English. You can use Shift or Ctrl key for a quick selection of several columns.

- Selecting as Language for input a language with those as default as English (UK), it has dot as decimal point and dates like in other EU countries.
Comment 2 Joel Madero 2015-10-24 06:01:57 UTC
Please attach a simple test file so that it's easy for us to see what you're seeing. Marking as NEEDINFO - once you attach a sample set to UNCONFIRMED. Thanks
Comment 3 Matthias Ronge 2015-10-26 07:29:41 UTC
Created attachment 119953 [details]
Simple import test file
Comment 4 Matthias Ronge 2015-10-26 07:32:22 UTC
Created attachment 119954 [details]
The import dialog I see
Comment 5 Matthias Ronge 2015-10-26 07:33:27 UTC
Created attachment 119955 [details]
The result of the import process
Comment 6 Matthias Ronge 2015-10-26 07:34:53 UTC
Created attachment 119956 [details]
What, imho, the result of the import should be
Comment 7 Matthias Ronge 2015-10-26 07:46:21 UTC
@m.a.riosv: The numbers column can be selected “US English” as you explained, which causes the import to work like expected. The problem was for me that I failed to guess that function from that label. I would never have come to the idea to set a language to a non text column (In what language is 42?). If I would have been asked what that setting is for, I would have guessed it would set the syllabificator’s or spell checker’s dictionary.

Perhaps a first correction to this would be to change the label to something more self-descriptive, perhaps “Decimal separator is .”?
Comment 8 Joel Madero 2015-10-30 04:12:53 UTC
Requesting UX input and throwing this to NEW.

I don't particularly like changing the label but I'm not entirely sure the extent of impact that selecting the language has (outside of apparently what you've encountered). Thus UX advise is best here to determine how to move forward.
Comment 9 V Stuart Foote 2015-10-30 16:03:49 UTC
Agree with OP the "US English" is a rather unclear label to indicate handling of decimal mark formatting for import of source CSV.  But, the GUI drop-down list of columnar type Text import parsing filters in SCSTR_COLUMN_USER has been that way since early in OOo.

The action of the GUI sets a parsing filter for Decimal Mark as decimal point -- as in English language usage (US & British Commonwealth), helping those locals where Decimal Mark defaults to a decimal comma.

The label should probably reflect that.  So would agree that a change in label from "US English" -> "Point - Decimal Mark" would be clearer UI.

=-ref-=
http://opengrok.libreoffice.org/xref/core/sc/source/ui/src/scstring.src


And though not currently supported by filter, IMHO adding a "Comma - Decimal Mark" might be appropriate as a new enhancement, to handle data coming the opposite direction.

And in this vein, one could make the argument of a need to filter import UniCode CSV text of other positional numeral systems, i.e. Indic, Eastern Arabic, Hindi, and some of the CTL and CJK encoded texts. But think that would be another much more ambitious enhancement.
Comment 10 Eike Rathke 2015-11-30 14:37:56 UTC
(In reply to V Stuart Foote from comment #9)
> The label should probably reflect that.  So would agree that a change in
> label from "US English" -> "Point - Decimal Mark" would be clearer UI.

But it does more than that. It effectively sets en-US locale parsing for that column's import, which not only affects the decimal separator but also group separator, date format and day and month names. So actually the current label reflects reality..
Comment 11 V Stuart Foote 2015-11-30 15:22:42 UTC
@@Eike, *

(In reply to Eike Rathke from comment #10)
> But it does more than that. It effectively sets en-US locale parsing for
> that column's import, which not only affects the decimal separator but also
> group separator, date format and day and month names. So actually the
> current label reflects reality..

Ugh! That is even more obscure. Then in addition to the "US English" (wouldn't want to break anything), would it be reasonable to add two simpler and specific entries to the SCSTR_COLUMN_USER picklist, and the code to support them, in the column parser?

"Point - decimal mark"
"Comma - decimal mark"

P.S. -- poked around in the source... could follow it as far as http://opengrok.libreoffice.org/xref/core/sc/source/ui/dbgui/scuiasciiopt.cxx#240 and populates the list box.  But I'm quickly lost in looking for indication it can distinguish a range of data formats that might be in the column. So no idea how involved this may be--Sorry.
Comment 12 Xisco Faulí 2020-03-09 13:28:27 UTC Comment hidden (off-topic)
Comment 13 Laurent Balland 2020-03-09 19:16:37 UTC
From my point of view, UI does what it says:
- either data are formatted in the language indicated in the UI (for instance German in attachment 119954 [details]) ==> change language according to the language used in the file for formatting
- or data are formatted in US English, as many apparatus may produce ==> change header of affected columns
As Eike mentioned, format affects not only decimal or thousand separators, but also date format, currency, etc.