Bug 129606 - Drag-fill series produces values like 6.00000000000001
Summary: Drag-fill series produces values like 6.00000000000001
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
Inherited From OOo
Hardware: All All
: medium normal
Assignee: Not Assigned
URL:
Whiteboard: target:6.5.0 target:7.1.0 target:7.2....
Keywords:
: 138795 (view as bug list)
Depends on:
Blocks:
 
Reported: 2019-12-25 07:15 UTC by Mike Kaganski
Modified: 2021-04-29 19:32 UTC (History)
5 users (show)

See Also:
Crash report or crash signature:


Attachments
file with some 'step-0,1-series' produced with different methods (74.95 KB, application/vnd.oasis.opendocument.spreadsheet)
2019-12-28 19:06 UTC, b.
Details
IEEE 754 values handeled in calc (68.24 KB, image/jpeg)
2020-01-05 01:12 UTC, b.
Details
129606_testsheet1_ori.ods, a sample with leftover floating point irritations (23.71 KB, application/vnd.oasis.opendocument.spreadsheet)
2020-01-26 10:39 UTC, b.
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Mike Kaganski 2019-12-25 07:15:13 UTC
In a new spreadsheet:

1. Put 1 into A1
2. Put 1.1 into A2
3. Select A1:A2
4. Drag-fill down to A51

Cells A45, A47, A49:A51 contain values like 6.00000000000001

When the same sequence is filled using Sheet->Fill Cells...->Fill Series, putting 0.1 as increment, this doesn't happen.

Tested with Version: 6.4.0.1 (x64)
Build ID: 1b6477b31f0334bd8620a96f0aeeb449b587be9f
CPU threads: 12; OS: Windows 10.0 Build 18363; UI render: GL; VCL: win; 
Locale: ru-RU (ru_RU); UI-Language: en-US
Calc: threaded
Comment 1 Mike Kaganski 2019-12-25 07:36:22 UTC
https://gerrit.libreoffice.org/c/core/+/85795

Eike: could you please take a look at the proposed patch?
Comment 2 Mike Kaganski 2019-12-25 08:40:06 UTC
Interestingly, Excel also shows same effect - just starting on 6.50000000000001, not on 5.40000000000001.
Comment 3 Roman Kuznetsov 2019-12-25 13:53:24 UTC
set to NEW. The discussion in https://forumooo.ru/index.php/topic,7934.0.html
Comment 4 b. 2019-12-28 19:06:58 UTC Comment hidden (no-value)
Comment 5 Mike Kaganski 2019-12-29 06:02:29 UTC
FTR: the assumptions in comment 4 have nothing to do with reality. Both fill series dialog and drag-fill use the same method of multiplying the increment by number of steps. The difference between these two methods is in determining the increment, which is trivial for fill series (the increment value is given directly), but requires a subtraction of floating-point values of varying magnitude in case of drag-fill. When the magnitude of initial values in drag-fill increases, the absolute precision of the subtraction decreases. See below the comparison of approximate values of subtraction result and 0.1 given directly, stored as IEEE 754 64-bit binary:

1.1 - 1.0 => ~0.10000000000000009
0.1       => ~0,10000000000000001

This can be seen e.g. at http://weitz.de/ieee/.

Also worth mentioning that "15.95" from Wikipedia might be misleading. For any number with decimal representation

a.bcdefghijklmnopq * 10^x

where a is not 0, storing it as 64-bit binary IEEE 754 produces a number which, when converted back to base10 and rounded to original precision, has first *16* correct digits (i.e., a to p); and 17th is off by varying degree. So "15" there in the article is not the number of correct decimal digits, but number of correct digits *after the most significant*.

And last note is that the citation was not from "somewhere in 'ask'", but from stackexchange.com [1].

[1] https://unix.stackexchange.com/questions/472618/can-i-increase-the-number-of-decimal-places-stored-by-libre-office-calc
Comment 6 b. 2020-01-05 01:12:42 UTC Comment hidden (no-value)
Comment 7 Mike Kaganski 2020-01-05 06:08:49 UTC
(In reply to b. from comment #6)
> - it's likely ~0.10000000000000001 - with a dot, 

Yes, I forgot to replace my local comma with a dot (that I usually do for convenience of others, to settle with some convention that seems to be used more here).

> - and it's handled different in calc, both results 0.10000000000000000000,
> see attached screenshot, 

And now change your example:

=0.1*60
=(1.1-1.0)*60
Comment 8 Commit Notification 2020-01-08 14:58:17 UTC
Mike Kaganski committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/8f46501233c164ff91d77a7f5adf74ea16cd0165

tdf#129606: limit precision in ScTable::FillAnalyse

It will be available in 6.5.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 9 b. 2020-01-16 19:11:24 UTC
> And now change your example:

> =0.1*60
> =(1.1-1.0)*60

exactly! that fails (in 6.1) and I had wondered about such 'irritating' behavior years ago - at ask? - and was dismissed with 'learn floating point first', and 'leave us professionals alone with your immature talk' ... thanks for taking the 'immature talk' seriously and being able to regulate it ... I hope so, I couldn't test with 6.5 yet. 

best regards, 

b.
Comment 10 Mike Kaganski 2020-01-17 05:18:53 UTC
It's very difficult to discuss anything with an ignorant person refusing to learn and Gish-galloping all the time.
Comment 11 b. 2020-01-19 10:42:52 UTC
@Mike: 

YES!
Comment 12 b. 2020-01-26 10:39:02 UTC
Created attachment 157434 [details]
129606_testsheet1_ori.ods, a sample with leftover floating point irritations

hello @Mike, 

i couldn't find a version 6.5x where to be sure your patch is included, downloaded 'master' - now named 7.0x - shortly, and it looks as if your patch is in and solves the dragfill problem. 

but the problem of unprecise and 'user irritating' decimal - floating.point - binary - floating-point - decimal conversion behaviour seems more general, see red marked cells in attached sheet. 

(e.g. '=(1234+0,12)-1234' which one would expect so result in 0,12 instead produces 0,11999999999989100000 with as well showing the rounding problem as the irritating zeros behind it pretending the result to be more accurate than it really is.)

i couldn't find the old discussion about that problem in short time, instead some other which talks about the problem and! a possible solution?. 

you can find it there: 

https://www.libreoffice-forum.de/viewtopic.php?f=6&t=16543&hilit=floating+point&start=10#p43772

the point: 

'A 'simple' solution to the dilemma would be to use the newer floating point format IEEE 754-2008, because here an exact conversion of decimal values would be possible even with decimal places.'

sounds somehow promising for me, but i can't say how far it would reach. 

neither can i say how much work it would be to implement it and how deep the compatibility cut would be. 

(at all, as the 'new' standard is quite old already, i can't say if it or parts of it have already been implemented in calc.)

wikipedia talks about an even newer version there: https://en.wikipedia.org/wiki/IEEE_754#2019

maybe you'll like to dig into it, or you can give it to persons or groups which are interested or 'in charge'. 

as the discussion is in german i tried a translation with deepl: 

************
Re: Calculation error / imprecision LO Calc

Contribution by Mr. Hobbybyte " Thu Aug 4, 2016, 10:45
after several attempts to narrow the whole thing down a bit,
I have come to the following conclusion :

- Calc calculates internally with 32 bit floating point values
- Intermediate values are stored in the (old) floating point format according to IEEE754

and thus the problem is causal, as mikele above already correctly suspected,
was caused by an inaccurate conversion to the (old) floating point format.
This then leads to the inaccuracies described above when performing additions.
However, I would have suspected that calculations are either caused by an own
arithmetic or at least with the help of more precise intermediate values
take place. However, this is not (yet) the case.

@mikele
Your test shows exactly the problem with the intermediate values.
A 'simple' solution to the dilemma would be to use the newer floating point format IEEE 754-2008, because here an exact conversion of decimal values would be possible even with decimal places.
Since I haven't found the time to look at the source more closely, I don't know if this is also would be easy to implement (bfp754.h), since all basic functions are affected.

@Hiker
The "Accuracy as displayed" option may obscure the error under certain circumstances, but in the calculation with fractions even produce new errors. During the tests this option was shut down everywhere.
I have tested with different Excel versions from 97 - 2013 and found different (in)accuracies has been determined. I even found some effects of the operating system used in Excel (32 or 64 bit)
A 64 Bit Excel was not available for testing
According to my current knowledge, the problem of Excel is also only solved by clever rounding with the help of the last Place(s) bypassed, but not really solved.

Thanks and best regards
************

reg. 

b. 

P.S. @Xisco if - for whatever reason - you'd like a new 'bug' for this enhancement proposal instead warming up 'old stuff' ... feel free to move it accordingly. as well you may decide if this bug should be reopened.
Comment 13 b. 2020-01-26 10:45:17 UTC Comment hidden (off-topic)
Comment 14 b. 2020-01-27 16:43:22 UTC
opened new bug 

https://bugs.documentfoundation.org/show_bug.cgi?id=130221

for the '=(1234+0,12)-1234' issue and question for IEEE format, 

issue from comment#7: 

=0.1*60        -> 6,00000000000000000000
=(1.1-1.0)*60  -> 6,0000000000000*1*000000

still fails in: 

Version: 7.0.0.0.alpha0+ (x64)
Build ID: 07b1159b79135857dd9a450c3bb9ae0a944ebcf9
CPU threads: 8; OS: Windows 6.1 Service Pack 1 Build 7601; UI render: default; VCL: win; 
Locale: de-DE (de_DE); UI-Language: en-US
Calc: 

suggest reopening
Comment 15 b. 2020-09-21 06:58:30 UTC
again i stumbled over a simple dragfill problem, (or two?), 

1.: 

B3: 6,9
B4: 6,8
select both cells and dragfill down, numbers like 6,29999999999999 starting in B9, (visible in the input line or when displaying with more decimals), 
accumulating, e.g. to a deviation of -0,00000000000013 at -6,0

ver. 7.1 2020-09-05, win7x64, 

reopen? new bug? or forget about decimal math precision? 

(assumption - only assumption! - correcting the difference used as argument by 'fixed' rounding fails as the arguments loose precision with increasing magnitude, and thus the deviation of the difference increases) 


2.: 

for this case the workaround with [sheet - fill cells] also fails, 0,1 and -0,1 are filled with 0,0999999999999996 and -~ instead, 

same ver. as above, 

covered with this bug? reopen? new bug? or forget about decimal math precision? 

(assumption - only assumption! - the errors 'more likely' become apparent when applied to ranges with better (decimal or binary) precision, will say small numbers)


suggestion: to find the 'true' increment between two decimal values (e.g. 6,9 and 6,8) you may legally round the decimal calculated from the 'precise' fp-value (but 'fuzzy' regarding decimal - binary - decimal conversion) to the max of the decimal places of the arguments (one decimal place in the 6,9 - 6,8 case), (is 'flexible rounding' an appr. term for that?), reg. school math rules the decimal places of the result of '+' and '-' operations are limited like that,  

with that you get an increment value as exact as when keyed into [sheet - fill cells] and can concentrate on the 'fill cells' error, 

ok?
Comment 16 Mike Kaganski 2020-09-21 07:48:35 UTC
(In reply to b. from comment #15)
> 1.: 
new bug
for unclear reason, the 6.7 isn't stored as 6.7999999999999998, but as 6.7999999999999989. This is an own bug.

> 2.: 
forget about decimal math precision.
Comment 17 Mike Kaganski 2020-09-21 07:49:36 UTC
(In reply to Mike Kaganski from comment #16)
> for unclear reason, the 6.7
6.8 was meant, naturally.
Comment 18 Mike Kaganski 2020-09-21 08:05:20 UTC
(In reply to Mike Kaganski from comment #16)
> for unclear reason, the 6.7 isn't stored as 6.7999999999999998, but as
> 6.7999999999999989. This is an own bug.

Ah no, it was me confusing myself. I overwrote the 6.8 myself while experimenting.

So:

(In reply to b. from comment #15)
> 1.: 
forget about decimal math precision.
Comment 19 Commit Notification 2020-09-21 09:13:19 UTC
Mike Kaganski committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/d5ebe7c3089ab9f4d3fe0707169fc1ce024cdb70

tdf#129606: Round the mean of the two subtractions

It will be available in 7.1.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 20 Mike Kaganski 2020-09-21 11:59:23 UTC
(In reply to Commit Notification from comment #19)

So I decided to make this change, that *seems* reasonable. However, I can't be sure that the change would not introduce a regression for similar cases where it used to give better results: e.g., where c happened to be closer to ideal value than d, and their mean would round to a worse increment value ... that's entirely possible. As mentioned: there is no way to make this work ideally, and there's no goal in that here, only to improve things that could be reasonable improved.
Comment 21 b. 2020-09-27 09:33:33 UTC
OT for the problem, but might be helpful to understand floats: 

------------ 
c#05: "Also worth mentioning that "15.95" from Wikipedia might be misleading. For any number with decimal representation

a.bcdefghijklmnopq * 10^x

where a is not 0, storing it as 64-bit binary IEEE 754 produces a number which, when converted back to base10 and rounded to original precision, has first *16* correct digits (i.e., a to p)" 
------------ 

that holds up to 9,007199254740992*10^15, above that value double floats have a 'granularity' of 2 (52 binary 'after dot' digits with an exponent of 53, thus times 2^53), 
thus for about 10% of the number range from 0 to 9,999999999999999*10^15 the precision is '2' instead of '1', '16 correct significant decimal digits' is not fullfilled, which is probably expressed by the value 15.95 at wikipedia, 

(keying in 9007199254740993.0 into the cited Weitz IEEE calculator is changed on the fly to 9007199254740992.0, while 9007199254740993.1 changes to 9007199254740994.0 (which expresses an own understanding of rounding))

calc does similar but - afais - has integer precision only up to 9,007199254740991 and shortens floats to 15 digits, and does that more by truncating than roundig (only for the display string?) which leads to funny number sequences if you repeat adding 1 to the predecessor starting with e.g. 9007199254740989.0 (ok, '1' is cutted away instead of rounded up, try 1,000000000000001)

similar happens for e.g. 4.503.599.627.370.494 with repeated '+ 0,5'

4.503.599.627.370.494,000
4.503.599.627.370.490,000  too harsh rouding (15 digits), 
                    ^
4.503.599.627.370.495,000  correct
4.503.599.627.370.500,000  too harsh rounding, 
                  ^^^
4.503.599.627.370.496,000  correct
4.503.599.627.370.496,000  not rounded up, 
                    ^ ^
4.503.599.627.370.496,000  wrong value carried on, 
                    ^
4.503.599.627.370.496,000  ...
                    ^
4.503.599.627.370.496,000
                    ^
4.503.599.627.370.496,000
                    ^
4.503.599.627.370.496,000
                    ^
questionable rounding, 
'holding precision internally' not fulfilled, 


4.503.599.627.370.494 repeatedly '+ 0,5000000000000001'

4.503.599.627.370.494,000
4.503.599.627.370.490,000  in 16 digits ~495 (rounded) is possible,  
                    ^
4.503.599.627.370.495,000  correct
4.503.599.627.370.500,000  too harsh rounding, 
                  ^^^
4.503.599.627.370.496,000  correct, 
4.503.599.627.370.497,000
4.503.599.627.370.498,000  rounded value taken as argument? 
                    ^
4.503.599.627.370.499,000  rounded value taken as argument? 
                    ^
4.503.599.627.370.500,000  again ...
                  ^^^
4.503.599.627.370.501,000  again ...
                  ^^^
4.503.599.627.370.502,000  again ...
                  ^^^
4.503.599.627.370.503,000  again ...
                  ^^^
4.503.599.627.370.504,000  again ...
                   ^^
questionable rounding, 
'holding precision internally' not fulfilled, 

irritating for simple minded users like me ... 

similar wants to happen for other values, especially when crossing precision range borders, but overall the 'rounding concept' of calc seems problematic to me ... 

(ex$el (2010) is worse in rounding / truncating, but - afais - produces series with better monotonity while matching or exceeding the other errors of calc, including the 0,0999... instead of 0,1 on sheet - fill)
Comment 22 Mike Kaganski 2020-12-10 12:13:40 UTC
*** Bug 138795 has been marked as a duplicate of this bug. ***
Comment 23 Commit Notification 2021-03-16 13:28:05 UTC
Xisco Fauli committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/6fdff333560e625b7c5c4dd9053c9945a45a52d1

tdf#129606: sc_ucalc: Add unittest

It will be available in 7.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 24 Mike Kaganski 2021-04-29 09:25:53 UTC
FTR: As expected, there is a downside of the change in comment 8 (see comment 20 "I can't be sure that the change would not introduce a regression for similar cases where it used to give better results", which applies universally to any such attempt to improve).

For the sequence 78, 78.1, ...
version 7.0 and later shows 78.4999999999999, while 6.4 had 78.5; in 6.4, the first problematic result was 78,9999999999999.

The reason is that after commit 8f46501233c164ff91d77a7f5adf74ea16cd0165, the number 0.099999999999994316 is rounded to 14 decimals, when previously it was rounded to 15 decimals. So before the commit, the increment was ~0.099999999999994, and after the commit it became ~0.09999999999999, i.e. the error has increased.

As expected, there's no silver bullet. At this time, no ideas and incentive how to further improve this, but constructive ideas are welcome *in a new* bug report.
Comment 25 Commit Notification 2021-04-29 14:49:39 UTC
Mike Kaganski committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/aa096331ba468f19951e43e2550105d4dcf50053

tdf#141970 Revert "tdf#129606: Round the mean of the two subtractions"

It will be available in 7.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 26 Commit Notification 2021-04-29 19:32:56 UTC
Mike Kaganski committed a patch related to this issue.
It has been pushed to "libreoffice-7-1":

https://git.libreoffice.org/core/commit/80d111c948d1553e9e0015738289b471cfe137f1

tdf#141970 Revert "tdf#129606: Round the mean of the two subtractions"

It will be available in 7.1.4.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.