129606 – Drag-fill series produces values like 6.00000000000001

Bug 129606 - Drag-fill series produces values like 6.00000000000001

Summary: Drag-fill series produces values like 6.00000000000001

Status:	RESOLVED FIXED

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	Calc (show other bugs)
Version: (earliest affected)	Inherited From OOo
Hardware:	All All

Importance:	medium normal
Assignee:	Not Assigned

URL:
Whiteboard:	target:6.5.0 target:7.1.0 target:7.2....
Keywords:

Duplicates (1):	138795 (view as bug list)
Depends on:
Blocks:

Reported:	2019-12-25 07:15 UTC by Mike Kaganski
Modified:	2021-04-29 19:32 UTC (History)
CC List:	5 users (show)

See Also:	90419 130724 141970
Crash report or crash signature:

Attachments
file with some 'step-0,1-series' produced with different methods (74.95 KB, application/vnd.oasis.opendocument.spreadsheet) 2019-12-28 19:06 UTC, b.	Details
IEEE 754 values handeled in calc (68.24 KB, image/jpeg) 2020-01-05 01:12 UTC, b.	Details
129606_testsheet1_ori.ods, a sample with leftover floating point irritations (23.71 KB, application/vnd.oasis.opendocument.spreadsheet) 2020-01-26 10:39 UTC, b.	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Mike Kaganski 2019-12-25 07:15:13 UTC

In a new spreadsheet:

1. Put 1 into A1
2. Put 1.1 into A2
3. Select A1:A2
4. Drag-fill down to A51

Cells A45, A47, A49:A51 contain values like 6.00000000000001

When the same sequence is filled using Sheet->Fill Cells...->Fill Series, putting 0.1 as increment, this doesn't happen.

Tested with Version: 6.4.0.1 (x64)
Build ID: 1b6477b31f0334bd8620a96f0aeeb449b587be9f
CPU threads: 12; OS: Windows 10.0 Build 18363; UI render: GL; VCL: win; 
Locale: ru-RU (ru_RU); UI-Language: en-US
Calc: threaded

Comment 1 Mike Kaganski 2019-12-25 07:36:22 UTC

https://gerrit.libreoffice.org/c/core/+/85795

Eike: could you please take a look at the proposed patch?

Comment 2 Mike Kaganski 2019-12-25 08:40:06 UTC

Interestingly, Excel also shows same effect - just starting on 6.50000000000001, not on 5.40000000000001.

Comment 3 Roman Kuznetsov 2019-12-25 13:53:24 UTC

set to NEW. The discussion in https://forumooo.ru/index.php/topic,7934.0.html

Comment 4 b. 2019-12-28 19:06:58 UTC Comment hidden (no-value)

Created attachment 156819 [details]
file with some 'step-0,1-series' produced with different methods

@Mike: you are aware of the source, 

just for others who check in for new: 

it's normal that you get rounding issues sometimes around the 15'th or 16'th significant decimal digit as calc does store numbers in floating point values, and that limits accuracy somehow. (reg. narrowing given or calculated decimal values by binary representation with limited length and thus limited precision some granularity steps in). 

short: 'According to en.wikipedia.org/wiki/IEEE_754 binary double 64bit have a precision of 15.95 digits if leading digit is not null, you have hit representation limit.' (somewhere in 'ask'?) 

irritating is that you can! display - display, not store! - such a value with 20 decimal digits in calc, and then see something like 6.00000000000001000000. in terms of our well beloved 'leading zero' (Trump) the last six zeros in this sample are 'fake zeros'. mnemonic bridge for that: the last sex is with faked holes. 

the digits are used and filled with accuracy on smaller values ... it's a tradeout to use those fakes to get 'nice looking columns' versus pretending incorrect accuracy ... i'd suggest to display them in different colour or with gray background to inform the user about their property as 'valueless placeholders'. 

(may be that the content xml file stores even the faked zeros - it does!, calc can't use them for calculations.)

the issue in your case, and the difference displayed on screen, might result from different ways used for the calculation. just an assumption: 'drag down' (i'll call it procedure 'A') has to do 44 steps of 'f(x(n))=f(x(n-1))+0,1' starting with f(x(1))=1, with 44 times rounding the running total and the rounding errors summed up?, 

while 'fill series' (i'll call it procedure 'B') is somehow 'smart' and calculates 'f(x(45))=(45-1)*(0,1)+1' and rounds only once? 

you can play around with similar functions like 1 in row 1 and '=1+(ROW()-1)*0,1' in the rows below (procedure C) - results similar to procedure 'B', or 1 in D1 and '=D1+0,1' in D2, D2 copied to D3:D1000 (procedure D), produces rounding errors as procedure 'A' but more likely rounded down instead of roundig up (on my system). 

calc itself isn't aware of theese rounded displays, play with comparision between e.g. A45 and B45 in attached sheet ... display is different but calc will handle them as identical, 

as rounding errors are a 'must have', and the mathematical way to achieve a goal is 'free', and the precision achieved is very near to the real world value ... notabug? 
 
but! ... if you look at your procedure 'A' and start with '0' in A1 (procedure 'E') you get 'clean results' ... thus ... 'assuming' (sorry Eike) someone fixed that for values below 1 (imho there was something that calc has one more significant digit for values below 1), but on fixing he'd work with 'small values' only and fixed rounding in 17'th digit to 'human acceptable', without considering the reduced accuracy for values above 1? 

procedures 'behind the scene' may be totally different ... 

you can achieve the accuracy of procedure 'B' for your task by 'fill series' or using procedure 'E' and add 1 to every step ... 

i had a similar issue in a financial sheet sometimes ago and proposed to 'round away' those errors ... it was not accepted reg. 'calc should never do unwanted roundings' or similar ... neglecting that calc already does! roundings - not towards decimals but towards binary values it can deal with. 

enhancement request? 

YES! 

it would be very nice and appreciated (by me!, by others?) and! save much time and headaches for plenty people to get calc working in the same manner as procedure 'B' or 'E' for tasks as procedure 'A' and 'D', not the same accuracy, that's not possible, but the same way to handle the binary results for user to work with ... 

is that a solution? roundsig()? 

https://ask.libreoffice.org/en/question/109372/incorrect-results-produced-by-simple-subtraction/ 

yes and no, it does push away the errors, but at the cost of lost comfort / functionality as e.g. a dragdown of such formulas won't work. 

to long, sorry, i'll make a break here ... :-)

reg. 

b.

Comment 5 Mike Kaganski 2019-12-29 06:02:29 UTC

FTR: the assumptions in comment 4 have nothing to do with reality. Both fill series dialog and drag-fill use the same method of multiplying the increment by number of steps. The difference between these two methods is in determining the increment, which is trivial for fill series (the increment value is given directly), but requires a subtraction of floating-point values of varying magnitude in case of drag-fill. When the magnitude of initial values in drag-fill increases, the absolute precision of the subtraction decreases. See below the comparison of approximate values of subtraction result and 0.1 given directly, stored as IEEE 754 64-bit binary:

1.1 - 1.0 => ~0.10000000000000009
0.1 => ~0,10000000000000001

This can be seen e.g. at http://weitz.de/ieee/.

Also worth mentioning that "15.95" from Wikipedia might be misleading. For any number with decimal representation

a.bcdefghijklmnopq * 10^x

where a is not 0, storing it as 64-bit binary IEEE 754 produces a number which, when converted back to base10 and rounded to original precision, has first *16* correct digits (i.e., a to p); and 17th is off by varying degree. So "15" there in the article is not the number of correct decimal digits, but number of correct digits *after the most significant*.

And last note is that the citation was not from "somewhere in 'ask'", but from stackexchange.com [1].

[1] https://unix.stackexchange.com/questions/472618/can-i-increase-the-number-of-decimal-places-stored-by-libre-office-calc

Comment 6 b. 2020-01-05 01:12:42 UTC Comment hidden (no-value)

Created attachment 156935 [details]
IEEE 754 values handeled in calc

hello @Mike, 

happy new year, 

i already admitted that my comment may be valueless for you, but might be usefule for others less experienced, 

your sample: 

1.1 - 1.0 => ~0.10000000000000009
0.1       => ~0,10000000000000001

is a little 'off' in two respects: 

- it's likely ~0.10000000000000001 - with a dot, 

- and it's handled different in calc, both results 0.10000000000000000000, see attached screenshot, 

that may result from the 'valueless placeholders' you supressed with the valueless comment, 

thanks for correcting the citation, 

regardless of your rough tone i will be happy if you succeed in making this little piece of calc working smooth for users without breaking functionality or correctness ... 

reg. 

b.

Comment 7 Mike Kaganski 2020-01-05 06:08:49 UTC

(In reply to b. from comment #6)
> - it's likely ~0.10000000000000001 - with a dot, 

Yes, I forgot to replace my local comma with a dot (that I usually do for convenience of others, to settle with some convention that seems to be used more here).

> - and it's handled different in calc, both results 0.10000000000000000000,
> see attached screenshot, 

And now change your example:

=0.1*60
=(1.1-1.0)*60

Comment 8 Commit Notification 2020-01-08 14:58:17 UTC

Mike Kaganski committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/8f46501233c164ff91d77a7f5adf74ea16cd0165

tdf#129606: limit precision in ScTable::FillAnalyse

It will be available in 6.5.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.

Comment 9 b. 2020-01-16 19:11:24 UTC

> And now change your example:

> =0.1*60
> =(1.1-1.0)*60

exactly! that fails (in 6.1) and I had wondered about such 'irritating' behavior years ago - at ask? - and was dismissed with 'learn floating point first', and 'leave us professionals alone with your immature talk' ... thanks for taking the 'immature talk' seriously and being able to regulate it ... I hope so, I couldn't test with 6.5 yet. 

best regards, 

b.

Comment 10 Mike Kaganski 2020-01-17 05:18:53 UTC

It's very difficult to discuss anything with an ignorant person refusing to learn and Gish-galloping all the time.

Comment 11 b. 2020-01-19 10:42:52 UTC

@Mike: 

YES!

Comment 12 b. 2020-01-26 10:39:02 UTC

Created attachment 157434 [details]
129606_testsheet1_ori.ods, a sample with leftover floating point irritations

hello @Mike, 

i couldn't find a version 6.5x where to be sure your patch is included, downloaded 'master' - now named 7.0x - shortly, and it looks as if your patch is in and solves the dragfill problem. 

but the problem of unprecise and 'user irritating' decimal - floating.point - binary - floating-point - decimal conversion behaviour seems more general, see red marked cells in attached sheet. 

(e.g. '=(1234+0,12)-1234' which one would expect so result in 0,12 instead produces 0,11999999999989100000 with as well showing the rounding problem as the irritating zeros behind it pretending the result to be more accurate than it really is.)

i couldn't find the old discussion about that problem in short time, instead some other which talks about the problem and! a possible solution?. 

you can find it there: 

https://www.libreoffice-forum.de/viewtopic.php?f=6&t=16543&hilit=floating+point&start=10#p43772

the point: 

'A 'simple' solution to the dilemma would be to use the newer floating point format IEEE 754-2008, because here an exact conversion of decimal values would be possible even with decimal places.'

sounds somehow promising for me, but i can't say how far it would reach. 

neither can i say how much work it would be to implement it and how deep the compatibility cut would be. 

(at all, as the 'new' standard is quite old already, i can't say if it or parts of it have already been implemented in calc.)

wikipedia talks about an even newer version there: https://en.wikipedia.org/wiki/IEEE_754#2019

maybe you'll like to dig into it, or you can give it to persons or groups which are interested or 'in charge'. 

as the discussion is in german i tried a translation with deepl: 

************
Re: Calculation error / imprecision LO Calc

Contribution by Mr. Hobbybyte " Thu Aug 4, 2016, 10:45
after several attempts to narrow the whole thing down a bit,
I have come to the following conclusion :

- Calc calculates internally with 32 bit floating point values
- Intermediate values are stored in the (old) floating point format according to IEEE754

and thus the problem is causal, as mikele above already correctly suspected,
was caused by an inaccurate conversion to the (old) floating point format.
This then leads to the inaccuracies described above when performing additions.
However, I would have suspected that calculations are either caused by an own
arithmetic or at least with the help of more precise intermediate values
take place. However, this is not (yet) the case.

@mikele
Your test shows exactly the problem with the intermediate values.
A 'simple' solution to the dilemma would be to use the newer floating point format IEEE 754-2008, because here an exact conversion of decimal values would be possible even with decimal places.
Since I haven't found the time to look at the source more closely, I don't know if this is also would be easy to implement (bfp754.h), since all basic functions are affected.

@Hiker
The "Accuracy as displayed" option may obscure the error under certain circumstances, but in the calculation with fractions even produce new errors. During the tests this option was shut down everywhere.
I have tested with different Excel versions from 97 - 2013 and found different (in)accuracies has been determined. I even found some effects of the operating system used in Excel (32 or 64 bit)
A 64 Bit Excel was not available for testing
According to my current knowledge, the problem of Excel is also only solved by clever rounding with the help of the last Place(s) bypassed, but not really solved.

Thanks and best regards
************

reg. 

b. 

P.S. @Xisco if - for whatever reason - you'd like a new 'bug' for this enhancement proposal instead warming up 'old stuff' ... feel free to move it accordingly. as well you may decide if this bug should be reopened.

Comment 13 b. 2020-01-26 10:45:17 UTC Comment hidden (off-topic)

sorry, looks as if my red marks are lost in transition, cells B4, B7, B10, B13 and B16 are affected. 

b.

Comment 14 b. 2020-01-27 16:43:22 UTC

opened new bug 

https://bugs.documentfoundation.org/show_bug.cgi?id=130221

for the '=(1234+0,12)-1234' issue and question for IEEE format, 

issue from comment#7: 

=0.1*60        -> 6,00000000000000000000
=(1.1-1.0)*60  -> 6,0000000000000*1*000000

still fails in: 

Version: 7.0.0.0.alpha0+ (x64)
Build ID: 07b1159b79135857dd9a450c3bb9ae0a944ebcf9
CPU threads: 8; OS: Windows 6.1 Service Pack 1 Build 7601; UI render: default; VCL: win; 
Locale: de-DE (de_DE); UI-Language: en-US
Calc: 

suggest reopening

Comment 15 b. 2020-09-21 06:58:30 UTC

again i stumbled over a simple dragfill problem, (or two?),

1.:

B3: 6,9
B4: 6,8
select both cells and dragfill down, numbers like 6,29999999999999 starting in B9, (visible in the input line or when displaying with more decimals),
accumulating, e.g. to a deviation of -0,00000000000013 at -6,0

ver. 7.1 2020-09-05, win7x64,

reopen? new bug? or forget about decimal math precision?

(assumption - only assumption! - correcting the difference used as argument by 'fixed' rounding fails as the arguments loose precision with increasing magnitude, and thus the deviation of the difference increases)

2.:

for this case the workaround with [sheet - fill cells] also fails, 0,1 and -0,1 are filled with 0,0999999999999996 and -~ instead,

same ver. as above,

covered with this bug? reopen? new bug? or forget about decimal math precision?

(assumption - only assumption! - the errors 'more likely' become apparent when applied to ranges with better (decimal or binary) precision, will say small numbers)

suggestion: to find the 'true' increment between two decimal values (e.g. 6,9 and 6,8) you may legally round the decimal calculated from the 'precise' fp-value (but 'fuzzy' regarding decimal - binary - decimal conversion) to the max of the decimal places of the arguments (one decimal place in the 6,9 - 6,8 case), (is 'flexible rounding' an appr. term for that?), reg. school math rules the decimal places of the result of '+' and '-' operations are limited like that,

with that you get an increment value as exact as when keyed into [sheet - fill cells] and can concentrate on the 'fill cells' error,

ok?

Comment 16 Mike Kaganski 2020-09-21 07:48:35 UTC

(In reply to b. from comment #15)
> 1.: 
new bug
for unclear reason, the 6.7 isn't stored as 6.7999999999999998, but as 6.7999999999999989. This is an own bug.

> 2.: 
forget about decimal math precision.

Comment 17 Mike Kaganski 2020-09-21 07:49:36 UTC

(In reply to Mike Kaganski from comment #16)
> for unclear reason, the 6.7
6.8 was meant, naturally.

Comment 18 Mike Kaganski 2020-09-21 08:05:20 UTC

(In reply to Mike Kaganski from comment #16)
> for unclear reason, the 6.7 isn't stored as 6.7999999999999998, but as
> 6.7999999999999989. This is an own bug.

Ah no, it was me confusing myself. I overwrote the 6.8 myself while experimenting.

So:

(In reply to b. from comment #15)
> 1.: 
forget about decimal math precision.

Comment 19 Commit Notification 2020-09-21 09:13:19 UTC

Mike Kaganski committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/d5ebe7c3089ab9f4d3fe0707169fc1ce024cdb70

tdf#129606: Round the mean of the two subtractions

It will be available in 7.1.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.

Comment 20 Mike Kaganski 2020-09-21 11:59:23 UTC

(In reply to Commit Notification from comment #19)

So I decided to make this change, that *seems* reasonable. However, I can't be sure that the change would not introduce a regression for similar cases where it used to give better results: e.g., where c happened to be closer to ideal value than d, and their mean would round to a worse increment value ... that's entirely possible. As mentioned: there is no way to make this work ideally, and there's no goal in that here, only to improve things that could be reasonable improved.

Comment 21 b. 2020-09-27 09:33:33 UTC

OT for the problem, but might be helpful to understand floats: 

------------ 
c#05: "Also worth mentioning that "15.95" from Wikipedia might be misleading. For any number with decimal representation

a.bcdefghijklmnopq * 10^x

where a is not 0, storing it as 64-bit binary IEEE 754 produces a number which, when converted back to base10 and rounded to original precision, has first *16* correct digits (i.e., a to p)" 
------------ 

that holds up to 9,007199254740992*10^15, above that value double floats have a 'granularity' of 2 (52 binary 'after dot' digits with an exponent of 53, thus times 2^53), 
thus for about 10% of the number range from 0 to 9,999999999999999*10^15 the precision is '2' instead of '1', '16 correct significant decimal digits' is not fullfilled, which is probably expressed by the value 15.95 at wikipedia, 

(keying in 9007199254740993.0 into the cited Weitz IEEE calculator is changed on the fly to 9007199254740992.0, while 9007199254740993.1 changes to 9007199254740994.0 (which expresses an own understanding of rounding))

calc does similar but - afais - has integer precision only up to 9,007199254740991 and shortens floats to 15 digits, and does that more by truncating than roundig (only for the display string?) which leads to funny number sequences if you repeat adding 1 to the predecessor starting with e.g. 9007199254740989.0 (ok, '1' is cutted away instead of rounded up, try 1,000000000000001)

similar happens for e.g. 4.503.599.627.370.494 with repeated '+ 0,5'

4.503.599.627.370.494,000
4.503.599.627.370.490,000  too harsh rouding (15 digits), 
                    ^
4.503.599.627.370.495,000  correct
4.503.599.627.370.500,000  too harsh rounding, 
                  ^^^
4.503.599.627.370.496,000  correct
4.503.599.627.370.496,000  not rounded up, 
                    ^ ^
4.503.599.627.370.496,000  wrong value carried on, 
                    ^
4.503.599.627.370.496,000  ...
                    ^
4.503.599.627.370.496,000
                    ^
4.503.599.627.370.496,000
                    ^
4.503.599.627.370.496,000
                    ^
questionable rounding, 
'holding precision internally' not fulfilled, 


4.503.599.627.370.494 repeatedly '+ 0,5000000000000001'

4.503.599.627.370.494,000
4.503.599.627.370.490,000  in 16 digits ~495 (rounded) is possible,  
                    ^
4.503.599.627.370.495,000  correct
4.503.599.627.370.500,000  too harsh rounding, 
                  ^^^
4.503.599.627.370.496,000  correct, 
4.503.599.627.370.497,000
4.503.599.627.370.498,000  rounded value taken as argument? 
                    ^
4.503.599.627.370.499,000  rounded value taken as argument? 
                    ^
4.503.599.627.370.500,000  again ...
                  ^^^
4.503.599.627.370.501,000  again ...
                  ^^^
4.503.599.627.370.502,000  again ...
                  ^^^
4.503.599.627.370.503,000  again ...
                  ^^^
4.503.599.627.370.504,000  again ...
                   ^^
questionable rounding, 
'holding precision internally' not fulfilled, 

irritating for simple minded users like me ... 

similar wants to happen for other values, especially when crossing precision range borders, but overall the 'rounding concept' of calc seems problematic to me ... 

(ex$el (2010) is worse in rounding / truncating, but - afais - produces series with better monotonity while matching or exceeding the other errors of calc, including the 0,0999... instead of 0,1 on sheet - fill)

Comment 22 Mike Kaganski 2020-12-10 12:13:40 UTC

*** Bug 138795 has been marked as a duplicate of this bug. ***

Comment 23 Commit Notification 2021-03-16 13:28:05 UTC

Xisco Fauli committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/6fdff333560e625b7c5c4dd9053c9945a45a52d1

tdf#129606: sc_ucalc: Add unittest

It will be available in 7.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.

Comment 24 Mike Kaganski 2021-04-29 09:25:53 UTC

FTR: As expected, there is a downside of the change in comment 8 (see comment 20 "I can't be sure that the change would not introduce a regression for similar cases where it used to give better results", which applies universally to any such attempt to improve).

For the sequence 78, 78.1, ...
version 7.0 and later shows 78.4999999999999, while 6.4 had 78.5; in 6.4, the first problematic result was 78,9999999999999.

The reason is that after commit 8f46501233c164ff91d77a7f5adf74ea16cd0165, the number 0.099999999999994316 is rounded to 14 decimals, when previously it was rounded to 15 decimals. So before the commit, the increment was ~0.099999999999994, and after the commit it became ~0.09999999999999, i.e. the error has increased.

As expected, there's no silver bullet. At this time, no ideas and incentive how to further improve this, but constructive ideas are welcome *in a new* bug report.

Comment 25 Commit Notification 2021-04-29 14:49:39 UTC

Mike Kaganski committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/aa096331ba468f19951e43e2550105d4dcf50053

tdf#141970 Revert "tdf#129606: Round the mean of the two subtractions"

It will be available in 7.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.

Comment 26 Commit Notification 2021-04-29 19:32:56 UTC

Mike Kaganski committed a patch related to this issue.
It has been pushed to "libreoffice-7-1":

https://git.libreoffice.org/core/commit/80d111c948d1553e9e0015738289b471cfe137f1

tdf#141970 Revert "tdf#129606: Round the mean of the two subtractions"

It will be available in 7.1.4.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.