Bug 100254 - Incorrect import of BIG INTEGER as Decimal(18) from CSV
Summary: Incorrect import of BIG INTEGER as Decimal(18) from CSV
Status: CLOSED NOTABUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Base (show other bugs)
Version:
(earliest affected)
5.1.3.2 release
Hardware: x86-64 (AMD64) All
: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: dataLoss
Depends on:
Blocks: CSV-Import
  Show dependency treegraph
 
Reported: 2016-06-07 08:49 UTC by kartis56
Modified: 2019-11-11 14:06 UTC (History)
4 users (show)

See Also:
Crash report or crash signature:


Attachments
sample list of csv (61 bytes, text/plain)
2016-06-09 02:39 UTC, kartis56
Details
odb2.odb (3.66 KB, application/vnd.sun.xml.base)
2016-06-09 02:42 UTC, kartis56
Details
CALC import wizard (65.68 KB, image/jpeg)
2016-06-09 08:42 UTC, kartis56
Details
DB wizard (53.52 KB, image/jpeg)
2016-06-09 09:30 UTC, kartis56
Details

Note You need to log in before you can comment on or make changes to this bug.
Description kartis56 2016-06-07 08:49:04 UTC
I expoort Blocked user list from  twitter, and import to BASE.
Attacehd by Text in new odb, and copy table to other odb.

Changed "999455876375535616" to "999455876375535000"
Comment 1 Alex Thurgood 2016-06-08 16:40:15 UTC
kartis: Please provide some more information.

What do you mean when you say "attached by text in ODB" ?

Please give detailed step by step instructions, or else a sample ODB file and instructions so that we can test.

At present, I don't understand : 

1) if the error you are seeing occurs only when you bind/import the CSV data to your first ODB file, or 

2) whether it occurs only when you try and copy/paste table from your first ODB file to a second ODB that has no corresponding table ; or

3) whether it occurs when you try to append table from ODB1 to an existing table in ODB2

  
Setting to needinfo
Comment 2 kartis56 2016-06-09 02:39:40 UTC
Created attachment 125562 [details]
sample list of csv
Comment 3 kartis56 2016-06-09 02:42:32 UTC
Created attachment 125563 [details]
odb2.odb

Table new is copyed as is
new2 is changed field format #,##0
Comment 4 kartis56 2016-06-09 02:55:45 UTC
OK for step by step info. 
it's similar to here 
http://dominoc925.blogspot.jp/2013/05/import-csv-file-into-libreoffice-base.html

0) export twitter block list from twitter>settings>blocked etc
   this csv has one field, id number only.

1) make new odb, Connect to an existing database Text

2) select CSV path, and check *.csv, and Finish

3) save any name odb. (odb1)
   this odb has csv file name talbes, and field type is DECIMAL. no error

4) open another odb, "Create a new database". (odb2)

5) drag and drop table from "odb1". "Definition and data" and finish. yes to make PrimaryKey

6) table has DECIMAL filed type. but view is float type such as " 1.23E+16"
   so change field format  #,##0,  data is zero filled "12,345,678,912,345,700" from "12345678912345678".

 I want "12,345,678,912,345,678" as integer or string.
Comment 5 kartis56 2016-06-09 03:03:33 UTC
next 

5-2) drag and drop table from "odb1". "Definition and data" and field type as BIGINT(19).

6-2) table has BIGINT(19) filed type. but view is float type such as " 1.23E+16"
   so change field format  #,##0,  data is zero filled "12,345,678,912,345,700"
Comment 6 kartis56 2016-06-09 03:06:37 UTC
last

5-3) drag and drop table from "odb1". "Definition and data" and field type as CHAR(20).

6-3) table has CHAR(20) field type. view is zero filled "12345678912345700   "
Comment 7 Alex Thurgood 2016-06-09 07:15:37 UTC
@kartis56 : thanks for the very comprehensive explanation. What you describe appears to be a bug in the data import wizard. I will test and report back.
Comment 8 Alex Thurgood 2016-06-09 07:34:16 UTC
So I did some preliminary testing on the CSV file. When I open this file in my master build of LO Calc, the numbers get truncated and appended with scientific notation. I think the root of the problem lies in this conversion of the integer, as I suspect that the wizard uses Calc's datatype engine in the import wizard. Unfortunately, it would appear that the code responsible for appending the data into to the table doesn't know how to handle numbers represented with this scientific notation correctly.

I have a vague recollection of this issue already having been raised somewhere, but will have to check.
Comment 9 Lionel Elie Mamane 2016-06-09 07:50:52 UTC
This is probably because it goes through a double (IEEE 64bit double precision floating point number) and 15 decimal digits of precision for the mantissa sounds about right.
Comment 10 Alex Thurgood 2016-06-09 08:11:20 UTC
During testing, I also noticed multiple incorrect codepage suggestions through different versions of Calc when opening the CSV file, ranging from UTF-8, to Korean, to Vietnamese, which doesn't help when testing. This behaviour appears to be bug 39124
Comment 11 Alex Thurgood 2016-06-09 08:25:01 UTC
Working with LO 5132 on OSX 10.11.5 :

1) Create new ODB file connecting to folder containing CSV provided by OP.
2) Open table "new" to display data.
3) Data displays as follows :

id
1,23E+16
2,23E+16
3,23E+16

For me the error is already here in the data displayed. The field definition of the table is 

DECIMAL
Length : -1
Decimal places : -1
Default value : N/A
Format example : 0

The value should not be DECIMAL but INT. Confirming
Comment 12 Alex Thurgood 2016-06-09 08:38:08 UTC
Confirming also that a copy-append of the table data and definition in a first ODB connected to the CSV file into a clean ODB file creates by default a DECIMAL datatype, that even when specified as BIGINT fails to display data correctly. Not really surprised by this as the first table from the first ODB already no longer contains an INT type.
Comment 13 kartis56 2016-06-09 08:42:04 UTC
Created attachment 125566 [details]
CALC import wizard
Comment 14 Alex Thurgood 2016-06-09 08:44:10 UTC
Note that when you create a link to a CSV file on setting up the connection, the user gets asked by the connection setup wizard to specify the value separator characters and decimal point separator character. There seems to be no way to avoid this part of the connection setup wizard.
Comment 15 kartis56 2016-06-09 08:48:39 UTC
I'm using locale ja-JP, so at CALC import wizard encoding to ASCII, lang en.
wizard view is correct, but imported data is incorrect such as 1.23E+16

maybe not only murtibyte ploblem...
Comment 16 kartis56 2016-06-09 09:17:32 UTC
Ok I make 2 file and check separator

1) " quote csv
2) original null quote csv

1) makes corerct view. data type is VARCHAR and copy makes correct.
2) has same ploblem. DECIMAL data type and incorrect view.
Comment 17 kartis56 2016-06-09 09:30:30 UTC
Created attachment 125567 [details]
DB wizard

maybe separator {null}  works as field type DECIMAL?
Comment 18 kartis56 2016-06-10 19:24:49 UTC
I see bug 39124   https://bugs.documentfoundation.org//show_bug.cgi?id=39124
and test this

1) " quote csv for test

2) import BASE by wizard as qwuoted ", type VARCHAR

3) view table and, copy all

4) make new CALC file, and paste

5) DATA is float type and zero filled. such as 12,345,678,912,345,700

maybe something wrong at CALC wizard
Comment 19 kartis56 2016-06-10 19:28:23 UTC
and test next

1) open " quote csv by CALC, text separator is ", field separtor is conma
   wizard view is correct. 12345678912345678

2) there is float... 1.23E+16
Comment 20 kartis56 2016-06-10 19:37:47 UTC
and next test

1) make new table at BASE
   ID as INTEGER primary key
   id as BIGINT(19) 

2) id format change to #,##0

3) open table and enter new record such as 
   1 and 1234567890123456789

4) id is zero filled        1,234,567,890,123,460,000
Comment 21 kartis56 2016-06-12 08:24:27 UTC
OK I'll give up using LO .
Comment 22 Robert Großkopf 2016-06-12 15:24:39 UTC
Have tested it a little bit.
1. Opened the database https://bugs.documentfoundation.org/attachment.cgi?id=125563
2. Executed there in Tools > SQL:
CREATE TEXT TABLE "Csv" ("ID" BIGINT PRIMARY KEY);
(creates a text-table with BIGINT)
SET TABLE "Csv" SOURCE "new.csv;ignore_first=true;encoding=UTF-8";
(creates a connection to the first attached *.csv-file)
SELECT * FROM "Csv";
(shows the correct numbers)
3. Opened the same table in the GUI - couldn't display the right number, shows 12345678912345700 instead of 12345678912345678

Displaying numbers with more than 16 digit seems to be impossible in the GUI. But this numbers were saved in the database.

4. Now I created a query:
SELECT CAST("ID" AS VARCHAR(18)) FROM "Csv"
Shows the right value.

The bug for Base and Calc has nothing to do with DECIMAL or BIGINT, but with displaying the right value inside the GUI.
Comment 23 Julien Nabet 2016-06-12 15:50:01 UTC
Thank you Robert, I reproduced exactly what you indicated.

Lionel: any codepointer about table gui management?

(remark: "csv" table can't be seen on table list unless restarting LO but that's another story)
Comment 24 Robert Großkopf 2016-06-12 16:07:29 UTC
(In reply to Julien Nabet from comment #23)

> (remark: "csv" table can't be seen on table list unless restarting LO but
> that's another story)

You have to chose View > Refresh tables (or something like this - haven't looked in English GUI). This is the normal way to show the GUI tables have been added through direct SQL.
Comment 25 Lionel Elie Mamane 2016-06-12 16:38:29 UTC
(In reply to Julien Nabet from comment #23)

> Lionel: any codepointer about table gui management?

My guess is that it is because a formatted field use a double (IEEE 64bit double precision floating point number) for the number that is contained in it and/or for the formatting.

See forms/source/component/FormattedField.[ch]xx

The formatting is done by a system that is shared with Calc; I don't remember out of the top of my head where it is, but look for "SvNumberFormatsSupplier" in svl
Comment 26 kartis56 2016-06-13 05:02:02 UTC
(In reply to Lionel Elie Mamane from comment #25)
> (In reply to Julien Nabet from comment #23)
> 
> > Lionel: any codepointer about table gui management?
> 
> My guess is that it is because a formatted field use a double (IEEE 64bit
> double precision floating point number) for the number that is contained in
> it and/or for the formatting.
> 
> See forms/source/component/FormattedField.[ch]xx
> 
> The formatting is done by a system that is shared with Calc; I don't
> remember out of the top of my head where it is, but look for
> "SvNumberFormatsSupplier" in svl

maybe here?

core\svl\source\numbers\numfmuno.hxx 

or this?

https://docs.libreoffice.org/svl/html/classSvNumberFormatterServiceObj.html
Comment 27 Lionel Elie Mamane 2016-06-13 05:42:28 UTC
(In reply to kartis56 from comment #26)
> (In reply to Lionel Elie Mamane from comment #25)
>> (In reply to Julien Nabet from comment #23)

>> The formatting is done by a system that is shared with Calc; I don't
>> remember out of the top of my head where it is, but look for
>> "SvNumberFormatsSupplier" in svl

> maybe here?

> core\svl\source\numbers\numfmuno.hxx 
> https://docs.libreoffice.org/svl/html/classSvNumberFormatterServiceObj.html

Yes, that looks right.
Comment 28 kartis56 2016-06-15 08:10:01 UTC
\core\svl\source\numbers\zformat.cxx(57,40)

const double EXP_ABS_UPPER_BOUND = 1.0E15;  // use exponential notation above that absolute value.
                                            // Back in time was E16 that lead
                                            // to display rounding errors, see
                                            // also sal/rtl/math.cxx
                                            // doubleToString()

line 1750

void SvNumberformat::ImpGetOutputStandard(double& fNumber, OUString& rOutString)
{
    sal_uInt16 nStandardPrec = rScan.GetStandardPrec();

    if ( fabs(fNumber) > EXP_ABS_UPPER_BOUND )
    {
        nStandardPrec = ::std::min(nStandardPrec, static_cast<sal_uInt16>(14)); // limits to 14 decimals

but why use this ?
Comment 29 QA Administrators 2018-07-21 02:40:08 UTC Comment hidden (obsolete)
Comment 30 Julien Nabet 2019-11-10 16:04:36 UTC
(In reply to kartis56 from comment #28)
> \core\svl\source\numbers\zformat.cxx(57,40)
> 
> const double EXP_ABS_UPPER_BOUND = 1.0E15;  // use exponential notation
> above that absolute value.
>                                             // Back in time was E16 that lead
>                                             // to display rounding errors,
> see
>                                             // also sal/rtl/math.cxx
>                                             // doubleToString()
> 
> line 1750
> 
> void SvNumberformat::ImpGetOutputStandard(double& fNumber, OUString&
> rOutString)
> {
>     sal_uInt16 nStandardPrec = rScan.GetStandardPrec();
> 
>     if ( fabs(fNumber) > EXP_ABS_UPPER_BOUND )
>     {
>         nStandardPrec = ::std::min(nStandardPrec,
> static_cast<sal_uInt16>(14)); // limits to 14 decimals
> 
> but why use this ?

Taking a look at sal/rtl/math.cxx, we got:
    381     switch (eFormat)
    382     {
    383         case rtl_math_StringFormat_Automatic:
    384         {   // E or F depending on exponent magnitude
    385             int nPrec;
    386             if (nExp <= -15 || nExp >= 15)  // was <-16, >16 in ancient versions, which leads to inaccuracies
    387             {
    388                 nPrec = 14;
    389                 eFormat = rtl_math_StringFormat_E;
    390             }
(see https://opengrok.libreoffice.org/xref/core/sal/rtl/math.cxx?r=be8da979#380)

Eike: it seems it might be a WONTFIX, any thoughts?
Comment 31 Eike Rathke 2019-11-11 14:06:11 UTC
See also further above https://opengrok.libreoffice.org/xref/core/sal/rtl/math.cxx?r=be8da979#283 for how representable integer values are handled. If it reaches line 381 it means the value is no such.

The error is on user side to import the ID values as numeric instead of text.
Floating point IEEE 754 double precision values are limited in their accuracy, the largest representable unambiguous whole integer number is (2^53)-1 or 9007199254740991.