Bug 87386 - Calculation inaccuracy converting calculated numbers to dates and integers.
Summary: Calculation inaccuracy converting calculated numbers to dates and integers.
Status: RESOLVED NOTABUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
4.3.4.1 release
Hardware: x86 (IA32) All
: medium major
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-12-16 21:47 UTC by klsu
Modified: 2015-04-13 18:41 UTC (History)
5 users (show)

See Also:
Crash report or crash signature:


Attachments
test file (17.05 KB, application/vnd.oasis.opendocument.spreadsheet)
2014-12-17 12:28 UTC, raal
Details

Note You need to log in before you can comment on or make changes to this bug.
Description klsu 2014-12-16 21:47:40 UTC
Cell C13 & D13 contain general number 24.09 and text 24.09.13, respectively (European dates dd.mm and dd.mm.yy in a .CSV opened on a US machine). Formula to convert the first dd.mm and the second year to a date would be =DATE(2000+VALUE(RIGHT(D13,2)),MOD(C13,1)*100,INT(C13)), however, for this particular date the formula returns 08/24/13 rather than 09/24/13. Note that:
   =DATE(2000+VALUE(RIGHT(D13,2)),0.09*100,INT(C13)) works
   =DATE(2000+VALUE(RIGHT(D13,2)),C13*100-INT(C13)*100,INT(C13)) works
   =DATE(2000+VALUE(RIGHT(D13,2)),(C13-INT(C13))*100,INT(C13)) does not
   =DATE(2000+VALUE(RIGHT(D13,2)),MOD(C13,1)*100,INT(C13)) does not
   A similar problem occurs with number 7.05 and text 07.05.13 and others, but not all.
Comment 1 raal 2014-12-17 12:26:45 UTC
I can confirm with LO 4.3.4, win 7.
Comment 2 raal 2014-12-17 12:28:05 UTC
Created attachment 110946 [details]
test file
Comment 3 m.a.riosv 2014-12-18 23:24:25 UTC
As indicated by the help:

https://help.libreoffice.org/scalc/SC_HID_FUNC_DATUM?Language=en-US&System=WIN&Version=4.3#bm_id3149022

DATE use integers as parameters.

MOD(C13,1)*100 = 8,999999999999999

8 is the integer part of the number, which is the value for DATE().

an option is ROUND(MOD(C13,1)*100) or I think better without numeric transformations extracting text.

=DATE(RIGHT(D13;2);MID(D13;4;2);LEFT(D13))

=DATE(RIGHT(D13;2);MID(C13;4;2);LEFT(C13))

But perhaps the better is when importing CSV select properly the options to obtain directly date properly transformed. E.g. by selecting the original language use to create the csv.

In any case I think it is not a bug.

Please if you are not agree, reopen it.
Comment 4 klsu 2014-12-19 19:35:11 UTC
Based on the criteria for determining the importance of a bug, crashes and systems slowing to a crawl are the worst. From the point-of-view of engineering, medicine, logistic, finance, etc. there is something worse: wrong answers that cause losses. To anyone who doesn't understand how computers store and calculate with real numbers (most people), the 4 equations I provided should give identical results. In LibreOffice, they don't.

Either there is a bug in how LibreOffice handles decimal calculations internally (see bug 87506), or there is a bug or omission in how the DATE() function in LibreOffice handles the conversion of real numbers to integers (or both). The DATE() function shouldn't expect the user to make sure that how the month is calculated always returns an integer for the month, because dates are stored as real numbers, not integers (e.g., December 19, 2014 at 1:16 PM = 41992.5526594578). The month should not change because if an inaccuracy of 0.00000864 seconds (1E-010 days).

1 and 0 are integers, so all digital calculations are integer calculations. That means some real numbers cannot be calculated digitally with 100% accuracy. Being off by 1 in the 15th (or greater) decimal place is normally not a problem, except where INT and MOD type functions can turn a difference of 0.000000000000001 into a difference of 1, and can cause logical values that are true for practical purposes to be false. Most users are not aware of the details of digital computations and their limitations, will not know how to adjust their formulas accordingly, and will expect equivalent formulas to result in essentially identical answers. Therefore, digital calculations must account for this problem. In Excel and in most of LibreOffice Calc, they do; but in this case in LibreOffice, they don't.

 In fact, if the 4 equations I provided to reproduce the problem are entered in Excel, all 4 provide the same, correct result. The first two give correct results in LibreOffice because mulitplying by 100 eliminates all significant decimal digits.
Comment 5 klsu 2014-12-19 19:36:07 UTC
P.S. I can reproduce this problem in version 4.4.0Beta2DevDaily.
Comment 6 m.a.riosv 2014-12-20 00:43:25 UTC
Please take a look to this bug report just about MOD() function:

https://bugs.freedesktop.org/show_bug.cgi?id=50299

Specially Kohei comment #4.

Resolved as duplicate, please if you are not agree reopen it.

*** This bug has been marked as a duplicate of bug 50299 ***
Comment 7 klsu 2014-12-22 06:33:20 UTC
(In reply to m.a.riosv from comment #6)
> Please take a look to this bug report just about MOD() function:
> 
> https://bugs.freedesktop.org/show_bug.cgi?id=50299
> 
> Specially Kohei comment #4.
> 
> Resolved as duplicate, please if you are not agree reopen it.
> 
> *** This bug has been marked as a duplicate of bug 50299 ***

Thank you, I've already reopened 50299, with comments 20, 21 & 22, and an attachment showing how the problem is not only a problem, but is worse than many of the other commenters seem to think (in LO Calc 4.4.0 Beta2 Dev Daily, MOD(230,10)=0, MOD(2.3*1000,100)=100). There is clearly an error with how LO Calc calculates the modulo. The problem is not so extreme in Excel. As long as hardware has been around, I doubt Intel and Motorola have been just ignoring postgrad work on fast modulus algorythms (e.g., http://www.dtic.mil/dtic/tr/fulltext/u2/a547555.pdf), so I question hardware as the cause. I am trying to figure out what to tell someone who is familiar with the MOD() function code that will help him/her fix it. So far, what I sure of is that the MOD() function is not rounding the dividend to the correct number of significant digits before doing the math. Real example:
=MOD(ROUND(36.3000000000002,9)*100,10) gives the result 10, which is wrong.
=MOD(ROUND(36.3,9)*100,10) gives the same result.
=MOD(36.8000000000003*100,9),10) gives 2.45563569478691E-011, which you might look better...until you see that
=MOD(36.3*100,10) gives 10

However, that's not the problem that caused me to create this bug report, so I am reopening it. Here, the problem appears to be that DATE() (and INT()) do the same thing; not rounding parameters passed as real numbers to the correct number of significant digits before executing the function. Passing a constant real number, doesn't cause so much trouble, but the whole point of functions is to allow working with calculated values, and those tend to be real until you do something special to them to convert them. The program is supposed to do that when it's needed, because it doesn't always display what it's actual result is, so the user has no way of knowing the general number 9 that's displayed is actually 8.999999999999999. MOD(24.09,1) should equal 0.09, although LO Calc displays 0.09 it is using 0.0899999999999999, so 0.09*100 displays 9 but uses 8.99999999999999. Since that 15 digit number is not correct, any function that expects an integer needs to make sure it's rounded correctly. But multiple calculations tend to adversely affect accuracy in more digits, so after 1000 additions, a value (real example) displayed as 100 may be represented internally as 99.9999999999986, so when dealing with significant digits, rounding must be to digits, not to decimal points, and rounding a 15 digit number to 15 places will eventually give a wrong answer. To use 99.9999999999986 in an integer or date function it must be rounded to 11 decimal places or 13 digits. I've done engineering calculations that added many thousands and were still accurate enough so rounding to 13 digits (not decimal places) was sufficient. If you want to see what number you're actually using, just use the formula =""&[cell reference].
Comment 8 klsu 2015-01-21 18:42:33 UTC
2015-01-21, thinking about doing income taxes soon, installed Open Office 4.1.1, Gnumeric 1.12.6, and Calligra Sheets 2.7.2 and determined that this problem exists in all of them; only Excel does this correctly. A one month error in a date can be the difference between Short-Term and Long-Term. If you believe that the I.R.S. doesn't care about that difference or would accept the excuse "I used OfficeLibre/Open Office/Gnumeric/Calligra Sheets, which has a bug in the DATE() function", dream on...

Since LibreOffice/OpenOffice, Gnumeric and Calligra have been around for years, and all still give wrong answers, not only does there appear to be no suitable alternative to Microsoft Excel for serious spreadsheet use, but the resources being applied to creating one are being spread in so many different directions that none is likely to work in the foreseeable future. The result is that users who want to change to get away from the hook-them-and-then-soak-them philosophy of commercial software development and marketing, can't; they have no viable alternative.
Comment 9 Matthew Francis 2015-04-11 10:06:43 UTC
Unfortunately this is simply not a proper use of general numbers, and will never work reliably. Floating point numbers just don't work that way.

Format the columns as dates, and use the DAY(), MONTH() and YEAR() functions as appropriate - or if you must, format all the columns as text so that the text based dissection you are attempting has a chance of succeeding.

Closing this bug again.
Comment 10 klsu 2015-04-11 17:52:45 UTC
Some people need to learn that computers and software are supposed to work for people. People do not work for computers. Telling someone they have to do what the computer wants doesn't resolve the problem (in this case, random wrong answers). So, "not a proper use of general numbers", "will never work reliably" and "Floating point numbers just don't work that way" don't change the fact that this is a bug.

The whole point of GUIs is "what you see is what you get". If someone sees "0.09", it is reasonable for them to expect it to work in calculations as 0.09, and it normally will. Where it won't is with any function that truncates digits without regard to their significance, so for those types of functions, the software must account for the fact that computers must work with more digits than they show, so they can display the correct answer.

This is not a problem with number types. It is a problem with algorithms used for functions. As has been pointed out in the thread of a related bug with the MOD() function, random wrong answers due to inadequate or improper handling of significant digits is not acceptable to anyone in a job where life or property could be harmed or lost by such an error. That means engineers, logisticians, doctors and accountants should not use the MOD() or DATE() functions in LibreOffice.

The fact that this discussion is ongoing tells me spreadsheet functions haven't been standardized, and that's too bad. I'm stuck with MS Excel for the foreseeable future, since the open source community apparently doesn't understand user requirements for spreadsheets. I got involved in bug reporting, thinking the community might want to know what users need. Now I'm not sure they do.

Someone who wants to get away from Windows won't find a spreadsheet for Linux that can be depended on for all the functions I use. But LibreOffice comes with those functions, and most people don't know they can give wrong answers. You could bury a warning with descriptions of the MOD() and DATE() functions saying they shouldn't be used for important work, but what engineer or accountant needs to read the descriptions of MOD() and DATE(). If the function doesn't work reliably, it shouldn't be implemented.

DATE() and MOD() worked as they should for the more than 20 years I've used them in Excel, so when someone tells me one "will never work reliably", it sounds like they think the current algorithm is the only one possible. I can think of many ways to work around the problem, but why would I unless I know the functions don't work properly? Most LO Calc users don't know, and if they did, I suspect they would not be happy if they use the DATE() function.

If I were programming a DATE() function that expected only integers as input for YEAR, MONTH, or DAY, I would make very sure that every passed value was appropriately rounded BEFORE being used in the function. (However, I probably wouldn't design the DATE() function to expect only integers, because dates are stored as floating point days that can include the information returned by TODAY() or NOW(), so the DATE() function should allow addition of decimal days to the value returned by NOW() or any other date/time.)
Comment 11 GerardF 2015-04-11 19:39:00 UTC
(In reply to klsu from comment #0)
> Cell C13 & D13 contain general number 24.09 and text 24.09.13, respectively
> (European dates dd.mm and dd.mm.yy in a .CSV opened on a US machine).
> Formula to convert the first dd.mm and the second year to a date would be
> =DATE(2000+VALUE(RIGHT(D13,2)),MOD(C13,1)*100,INT(C13)), however, for this
> particular date the formula returns 08/24/13 rather than 09/24/13. Note that:
>    =DATE(2000+VALUE(RIGHT(D13,2)),0.09*100,INT(C13)) works
>    =DATE(2000+VALUE(RIGHT(D13,2)),C13*100-INT(C13)*100,INT(C13)) works
>    =DATE(2000+VALUE(RIGHT(D13,2)),(C13-INT(C13))*100,INT(C13)) does not
>    =DATE(2000+VALUE(RIGHT(D13,2)),MOD(C13,1)*100,INT(C13)) does not
>    A similar problem occurs with number 7.05 and text 07.05.13 and others,
> but not all.

Instead of using DATE, VALUE, RIGHT, INT, MOD and other hundreds of Functions, just open correctly the .csv file and you will have true dates and don't need any functions.
Comment 12 Matthew Francis 2015-04-12 00:49:41 UTC
To repeat, as a member of LibreOffice QA, I am closing this bug. Reopening it will not change the outcome.
Comment 13 klsu 2015-04-13 18:07:06 UTC
(In reply to GerardF from comment #11)
> (In reply to klsu from comment #0)
> > Cell C13 & D13 contain general number 24.09 and text 24.09.13, respectively
> > (European dates dd.mm and dd.mm.yy in a .CSV opened on a US machine).
> > Formula to convert the first dd.mm and the second year to a date would be
> > =DATE(2000+VALUE(RIGHT(D13,2)),MOD(C13,1)*100,INT(C13)), however, for this
> > particular date the formula returns 08/24/13 rather than 09/24/13. Note that:
> >    =DATE(2000+VALUE(RIGHT(D13,2)),0.09*100,INT(C13)) works
> >    =DATE(2000+VALUE(RIGHT(D13,2)),C13*100-INT(C13)*100,INT(C13)) works
> >    =DATE(2000+VALUE(RIGHT(D13,2)),(C13-INT(C13))*100,INT(C13)) does not
> >    =DATE(2000+VALUE(RIGHT(D13,2)),MOD(C13,1)*100,INT(C13)) does not
> >    A similar problem occurs with number 7.05 and text 07.05.13 and others,
> > but not all.
> 
> Instead of using DATE, VALUE, RIGHT, INT, MOD and other hundreds of
> Functions, just open correctly the .csv file and you will have true dates
> and don't need any functions.

As I have said before, programs and computers are supposed to work for users, not vice versa. When programmers start to think they know more than their users about the things their programs are used for, it becomes a problem for users. I know how to open a .CSV; whereas you obviously haven't experienced that opening a .CSV created in a different language from the the machine it is being opened on doesn't always work as the programmers imagined it would. Someone who does nothing but programming doesn't typically know enough about how his program will be used to test it adequately. In addition, today's programmers are much less aware of the problem of unintended consequences because higher level languages make it easier to program without knowing what they're actually causing the computer do. This doesn't just caused bugs, but also severe security problems.

I would have asked for a workaround if I had needed one (you can see by the things I tried above that I didn't need one). If you're a programmer, I suggest you read my response to Matthew Francis' last comment.
Comment 14 klsu 2015-04-13 18:41:57 UTC
(In reply to Matthew Francis from comment #12)
> To repeat, as a member of LibreOffice QA, I am closing this bug. Reopening
> it will not change the outcome.

When I write an application for other people's use, I want to know what's wrong with it, so I can fix it ASAP. That's how professionals work. On the assumption that the people working on LibreOffice take pride in their work, I reported the problems I found. I've done all I can by reporting the bugs, by telling anyone I know to stay away from LibreOffice Calc, and by not using it myself for work that must be correct.

Other users who aren't as careful might not notice there's a problem, but if either of these bugs ever causes loss of life or financial loss, the law may force them to figure out why, and then LibreOffice developers will probably hear about it as well.

If you want to close the bug, it's obviously your prerogative, however, pretending it's not a bug doesn't resolve it. It sounds to me like you believe users should use spreadsheets the way you think spreadsheets should be used. I doubt you can imagine some of the calculations spreadsheets are actually used for or why.

As I explained about the MOD() bug: if a word processor or presentation graphic (eyewash) program has a bug, lives and money aren't at risk. Spreadsheets are different: the ceilings over your and you family's heads, and the bridges you drive over, may well have been designed using spreadsheets. Random wrong answers are worse than crashes; they're unacceptable.