100254 – Incorrect import of BIG INTEGER as Decimal(18) from CSV

Bug 100254 - Incorrect import of BIG INTEGER as Decimal(18) from CSV

Summary: Incorrect import of BIG INTEGER as Decimal(18) from CSV

Status:	CLOSED NOTABUG

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	Base (show other bugs)
Version: (earliest affected)	5.1.3.2 release
Hardware:	x86-64 (AMD64) All

Importance:	medium normal
Assignee:	Not Assigned

URL:
Whiteboard:
Keywords:	dataLoss

Depends on:
Blocks:	CSV-Import
	Show dependency tree / graph

Reported:	2016-06-07 08:49 UTC by kartis56
Modified:	2022-12-28 12:21 UTC (History)
CC List:	4 users (show)

See Also:	150635
Crash report or crash signature:

Attachments
sample list of csv (61 bytes, text/plain) 2016-06-09 02:39 UTC, kartis56	Details
odb2.odb (3.66 KB, application/vnd.sun.xml.base) 2016-06-09 02:42 UTC, kartis56	Details
CALC import wizard (65.68 KB, image/jpeg) 2016-06-09 08:42 UTC, kartis56	Details
DB wizard (53.52 KB, image/jpeg) 2016-06-09 09:30 UTC, kartis56	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description kartis56 2016-06-07 08:49:04 UTC

I expoort Blocked user list from  twitter, and import to BASE.
Attacehd by Text in new odb, and copy table to other odb.

Changed "999455876375535616" to "999455876375535000"

Comment 1 Alex Thurgood 2016-06-08 16:40:15 UTC

kartis: Please provide some more information.

What do you mean when you say "attached by text in ODB" ?

Please give detailed step by step instructions, or else a sample ODB file and instructions so that we can test.

At present, I don't understand : 

1) if the error you are seeing occurs only when you bind/import the CSV data to your first ODB file, or 

2) whether it occurs only when you try and copy/paste table from your first ODB file to a second ODB that has no corresponding table ; or

3) whether it occurs when you try to append table from ODB1 to an existing table in ODB2

  
Setting to needinfo

Comment 2 kartis56 2016-06-09 02:39:40 UTC

Created attachment 125562 [details]
sample list of csv

Comment 3 kartis56 2016-06-09 02:42:32 UTC

Created attachment 125563 [details]
odb2.odb

Table new is copyed as is
new2 is changed field format #,##0

Comment 4 kartis56 2016-06-09 02:55:45 UTC

OK for step by step info. 
it's similar to here 
http://dominoc925.blogspot.jp/2013/05/import-csv-file-into-libreoffice-base.html

0) export twitter block list from twitter>settings>blocked etc
   this csv has one field, id number only.

1) make new odb, Connect to an existing database Text

2) select CSV path, and check *.csv, and Finish

3) save any name odb. (odb1)
   this odb has csv file name talbes, and field type is DECIMAL. no error

4) open another odb, "Create a new database". (odb2)

5) drag and drop table from "odb1". "Definition and data" and finish. yes to make PrimaryKey

6) table has DECIMAL filed type. but view is float type such as " 1.23E+16"
   so change field format  #,##0,  data is zero filled "12,345,678,912,345,700" from "12345678912345678".

 I want "12,345,678,912,345,678" as integer or string.

Comment 5 kartis56 2016-06-09 03:03:33 UTC

next 

5-2) drag and drop table from "odb1". "Definition and data" and field type as BIGINT(19).

6-2) table has BIGINT(19) filed type. but view is float type such as " 1.23E+16"
   so change field format  #,##0,  data is zero filled "12,345,678,912,345,700"

Comment 6 kartis56 2016-06-09 03:06:37 UTC

last

5-3) drag and drop table from "odb1". "Definition and data" and field type as CHAR(20).

6-3) table has CHAR(20) field type. view is zero filled "12345678912345700   "

Comment 7 Alex Thurgood 2016-06-09 07:15:37 UTC

@kartis56 : thanks for the very comprehensive explanation. What you describe appears to be a bug in the data import wizard. I will test and report back.

Comment 8 Alex Thurgood 2016-06-09 07:34:16 UTC

So I did some preliminary testing on the CSV file. When I open this file in my master build of LO Calc, the numbers get truncated and appended with scientific notation. I think the root of the problem lies in this conversion of the integer, as I suspect that the wizard uses Calc's datatype engine in the import wizard. Unfortunately, it would appear that the code responsible for appending the data into to the table doesn't know how to handle numbers represented with this scientific notation correctly.

I have a vague recollection of this issue already having been raised somewhere, but will have to check.

Comment 9 Lionel Elie Mamane 2016-06-09 07:50:52 UTC

This is probably because it goes through a double (IEEE 64bit double precision floating point number) and 15 decimal digits of precision for the mantissa sounds about right.

Comment 10 Alex Thurgood 2016-06-09 08:11:20 UTC

During testing, I also noticed multiple incorrect codepage suggestions through different versions of Calc when opening the CSV file, ranging from UTF-8, to Korean, to Vietnamese, which doesn't help when testing. This behaviour appears to be bug 39124

Comment 11 Alex Thurgood 2016-06-09 08:25:01 UTC

Working with LO 5132 on OSX 10.11.5 :

1) Create new ODB file connecting to folder containing CSV provided by OP.
2) Open table "new" to display data.
3) Data displays as follows :

id
1,23E+16
2,23E+16
3,23E+16

For me the error is already here in the data displayed. The field definition of the table is 

DECIMAL
Length : -1
Decimal places : -1
Default value : N/A
Format example : 0

The value should not be DECIMAL but INT. Confirming

Comment 12 Alex Thurgood 2016-06-09 08:38:08 UTC

Confirming also that a copy-append of the table data and definition in a first ODB connected to the CSV file into a clean ODB file creates by default a DECIMAL datatype, that even when specified as BIGINT fails to display data correctly. Not really surprised by this as the first table from the first ODB already no longer contains an INT type.

Comment 13 kartis56 2016-06-09 08:42:04 UTC

Created attachment 125566 [details]
CALC import wizard

Comment 14 Alex Thurgood 2016-06-09 08:44:10 UTC

Note that when you create a link to a CSV file on setting up the connection, the user gets asked by the connection setup wizard to specify the value separator characters and decimal point separator character. There seems to be no way to avoid this part of the connection setup wizard.

Comment 15 kartis56 2016-06-09 08:48:39 UTC

I'm using locale ja-JP, so at CALC import wizard encoding to ASCII, lang en.
wizard view is correct, but imported data is incorrect such as 1.23E+16

maybe not only murtibyte ploblem...

Comment 16 kartis56 2016-06-09 09:17:32 UTC

Ok I make 2 file and check separator

1) " quote csv
2) original null quote csv

1) makes corerct view. data type is VARCHAR and copy makes correct.
2) has same ploblem. DECIMAL data type and incorrect view.

Comment 17 kartis56 2016-06-09 09:30:30 UTC

Created attachment 125567 [details]
DB wizard

maybe separator {null}  works as field type DECIMAL?

Comment 18 kartis56 2016-06-10 19:24:49 UTC

I see bug 39124   https://bugs.documentfoundation.org//show_bug.cgi?id=39124
and test this

1) " quote csv for test

2) import BASE by wizard as qwuoted ", type VARCHAR

3) view table and, copy all

4) make new CALC file, and paste

5) DATA is float type and zero filled. such as 12,345,678,912,345,700

maybe something wrong at CALC wizard

Comment 19 kartis56 2016-06-10 19:28:23 UTC

and test next

1) open " quote csv by CALC, text separator is ", field separtor is conma
   wizard view is correct. 12345678912345678

2) there is float... 1.23E+16

Comment 20 kartis56 2016-06-10 19:37:47 UTC

and next test

1) make new table at BASE
   ID as INTEGER primary key
   id as BIGINT(19) 

2) id format change to #,##0

3) open table and enter new record such as 
   1 and 1234567890123456789

4) id is zero filled        1,234,567,890,123,460,000

Comment 21 kartis56 2016-06-12 08:24:27 UTC

OK I'll give up using LO .

Comment 22 Robert Großkopf 2016-06-12 15:24:39 UTC

Have tested it a little bit.
1. Opened the database https://bugs.documentfoundation.org/attachment.cgi?id=125563
2. Executed there in Tools > SQL:
CREATE TEXT TABLE "Csv" ("ID" BIGINT PRIMARY KEY);
(creates a text-table with BIGINT)
SET TABLE "Csv" SOURCE "new.csv;ignore_first=true;encoding=UTF-8";
(creates a connection to the first attached *.csv-file)
SELECT * FROM "Csv";
(shows the correct numbers)
3. Opened the same table in the GUI - couldn't display the right number, shows 12345678912345700 instead of 12345678912345678

Displaying numbers with more than 16 digit seems to be impossible in the GUI. But this numbers were saved in the database.

4. Now I created a query:
SELECT CAST("ID" AS VARCHAR(18)) FROM "Csv"
Shows the right value.

The bug for Base and Calc has nothing to do with DECIMAL or BIGINT, but with displaying the right value inside the GUI.

Comment 23 Julien Nabet 2016-06-12 15:50:01 UTC

Thank you Robert, I reproduced exactly what you indicated.

Lionel: any codepointer about table gui management?

(remark: "csv" table can't be seen on table list unless restarting LO but that's another story)

Comment 24 Robert Großkopf 2016-06-12 16:07:29 UTC

(In reply to Julien Nabet from comment #23)

> (remark: "csv" table can't be seen on table list unless restarting LO but
> that's another story)

You have to chose View > Refresh tables (or something like this - haven't looked in English GUI). This is the normal way to show the GUI tables have been added through direct SQL.

Comment 25 Lionel Elie Mamane 2016-06-12 16:38:29 UTC

(In reply to Julien Nabet from comment #23)

> Lionel: any codepointer about table gui management?

My guess is that it is because a formatted field use a double (IEEE 64bit double precision floating point number) for the number that is contained in it and/or for the formatting.

See forms/source/component/FormattedField.[ch]xx

The formatting is done by a system that is shared with Calc; I don't remember out of the top of my head where it is, but look for "SvNumberFormatsSupplier" in svl

Comment 26 kartis56 2016-06-13 05:02:02 UTC

(In reply to Lionel Elie Mamane from comment #25)
> (In reply to Julien Nabet from comment #23)
> 
> > Lionel: any codepointer about table gui management?
> 
> My guess is that it is because a formatted field use a double (IEEE 64bit
> double precision floating point number) for the number that is contained in
> it and/or for the formatting.
> 
> See forms/source/component/FormattedField.[ch]xx
> 
> The formatting is done by a system that is shared with Calc; I don't
> remember out of the top of my head where it is, but look for
> "SvNumberFormatsSupplier" in svl

maybe here?

core\svl\source\numbers\numfmuno.hxx 

or this?

https://docs.libreoffice.org/svl/html/classSvNumberFormatterServiceObj.html

Comment 27 Lionel Elie Mamane 2016-06-13 05:42:28 UTC

(In reply to kartis56 from comment #26)
> (In reply to Lionel Elie Mamane from comment #25)
>> (In reply to Julien Nabet from comment #23)

>> The formatting is done by a system that is shared with Calc; I don't
>> remember out of the top of my head where it is, but look for
>> "SvNumberFormatsSupplier" in svl

> maybe here?

> core\svl\source\numbers\numfmuno.hxx 
> https://docs.libreoffice.org/svl/html/classSvNumberFormatterServiceObj.html

Yes, that looks right.

Comment 28 kartis56 2016-06-15 08:10:01 UTC

\core\svl\source\numbers\zformat.cxx(57,40)

const double EXP_ABS_UPPER_BOUND = 1.0E15;  // use exponential notation above that absolute value.
                                            // Back in time was E16 that lead
                                            // to display rounding errors, see
                                            // also sal/rtl/math.cxx
                                            // doubleToString()

line 1750

void SvNumberformat::ImpGetOutputStandard(double& fNumber, OUString& rOutString)
{
    sal_uInt16 nStandardPrec = rScan.GetStandardPrec();

    if ( fabs(fNumber) > EXP_ABS_UPPER_BOUND )
    {
        nStandardPrec = ::std::min(nStandardPrec, static_cast<sal_uInt16>(14)); // limits to 14 decimals

but why use this ?

Comment 29 QA Administrators 2018-07-21 02:40:08 UTC Comment hidden (obsolete)

** Please read this message in its entirety before responding **

To make sure we're focusing on the bugs that affect our users today, LibreOffice QA is asking bug reporters and confirmers to retest open, confirmed bugs which have not been touched for over a year.

There have been thousands of bug fixes and commits since anyone checked on this bug report. During that time, it's possible that the bug has been fixed, or the details of the problem have changed. We'd really appreciate your help in getting confirmation that the bug is still present.

If you have time, please do the following:

Test to see if the bug is still present with the latest version of LibreOffice from https://www.libreoffice.org/download/

If the bug is present, please leave a comment that includes the information from Help - About LibreOffice.

If the bug is NOT present, please set the bug's Status field to RESOLVED-WORKSFORME and leave a comment that includes the information from Help - About LibreOffice.

Please DO NOT

Update the version field
Reply via email (please reply directly on the bug tracker)
Set the bug's Status field to RESOLVED - FIXED (this status has a particular meaning that is not
appropriate in this case)

If you want to do more to help you can test to see if your issue is a REGRESSION. To do so:
1. Download and install oldest version of LibreOffice (usually 3.3 unless your bug pertains to a feature added after 3.3) from http://downloadarchive.documentfoundation.org/libreoffice/old/

2. Test your bug
3. Leave a comment with your results.
4a. If the bug was present with 3.3 - set version to 'inherited from OOo';
4b. If the bug was not present in 3.3 - add 'regression' to keyword

Feel free to come ask questions or to say hello in our QA chat: https://kiwiirc.com/nextclient/irc.freenode.net/#libreoffice-qa

Thank you for helping us make LibreOffice even better for everyone!

Warm Regards,
QA Team

MassPing-UntouchedBug

Comment 30 Julien Nabet 2019-11-10 16:04:36 UTC

(In reply to kartis56 from comment #28)
> \core\svl\source\numbers\zformat.cxx(57,40)
> 
> const double EXP_ABS_UPPER_BOUND = 1.0E15;  // use exponential notation
> above that absolute value.
>                                             // Back in time was E16 that lead
>                                             // to display rounding errors,
> see
>                                             // also sal/rtl/math.cxx
>                                             // doubleToString()
> 
> line 1750
> 
> void SvNumberformat::ImpGetOutputStandard(double& fNumber, OUString&
> rOutString)
> {
>     sal_uInt16 nStandardPrec = rScan.GetStandardPrec();
> 
>     if ( fabs(fNumber) > EXP_ABS_UPPER_BOUND )
>     {
>         nStandardPrec = ::std::min(nStandardPrec,
> static_cast<sal_uInt16>(14)); // limits to 14 decimals
> 
> but why use this ?

Taking a look at sal/rtl/math.cxx, we got:
    381     switch (eFormat)
    382     {
    383         case rtl_math_StringFormat_Automatic:
    384         {   // E or F depending on exponent magnitude
    385             int nPrec;
    386             if (nExp <= -15 || nExp >= 15)  // was <-16, >16 in ancient versions, which leads to inaccuracies
    387             {
    388                 nPrec = 14;
    389                 eFormat = rtl_math_StringFormat_E;
    390             }
(see https://opengrok.libreoffice.org/xref/core/sal/rtl/math.cxx?r=be8da979#380)

Eike: it seems it might be a WONTFIX, any thoughts?

Comment 31 Eike Rathke 2019-11-11 14:06:11 UTC

See also further above https://opengrok.libreoffice.org/xref/core/sal/rtl/math.cxx?r=be8da979#283 for how representable integer values are handled. If it reaches line 381 it means the value is no such.

The error is on user side to import the ID values as numeric instead of text.
Floating point IEEE 754 double precision values are limited in their accuracy, the largest representable unambiguous whole integer number is (2^53)-1 or 9007199254740991.