Bug 137679 - Implement a Kahan summation algorithm to reduce the numerical error in the total
Summary: Implement a Kahan summation algorithm to reduce the numerical error in the total
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
unspecified
Hardware: All All
: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard: target:7.2.0
Keywords: difficultyInteresting, easyHack
: 141752 144156 (view as bug list)
Depends on:
Blocks: Calculate
  Show dependency treegraph
 
Reported: 2020-10-22 12:53 UTC by Roman Kuznetsov
Modified: 2021-09-13 07:46 UTC (History)
10 users (show)

See Also:
Crash report or crash signature:


Attachments
Example of the Kahan Sum (24.65 KB, application/vnd.oasis.opendocument.spreadsheet)
2021-05-01 21:43 UTC, dante19031999
Details
testsheet kahan_sum with calc 7.2.0.0.a0+ (31.71 KB, application/vnd.oasis.opendocument.spreadsheet)
2021-05-06 09:04 UTC, b.
Details
testsheet kahan_sum with calc 7.1.4.0.0 (29.15 KB, application/vnd.oasis.opendocument.spreadsheet)
2021-05-06 09:04 UTC, b.
Details
testsheet kahan_sum with gnumeric 1_12_17_portable (16.58 KB, application/vnd.oasis.opendocument.spreadsheet)
2021-05-06 09:14 UTC, b.
Details
testsheet kahan_sum with different versions (14.31 KB, application/vnd.oasis.opendocument.spreadsheet)
2021-05-08 09:47 UTC, b.
Details
testsheet Neumaier Kahan–Babuška working well (84.73 KB, application/vnd.oasis.opendocument.spreadsheet)
2021-05-15 14:52 UTC, b.
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Roman Kuznetsov 2020-10-22 12:53:24 UTC
Description:
Implement a Kahan summation algorithm for reduce the numerical error in the total

https://en.wikipedia.org/wiki/Kahan_summation_algorithm

Eike already wrote about it in https://git.libreoffice.org/core/+/5ac74f15dcdf85faa2ac1cb7317ea77e31fbcd50%5E%21/#F0

It should give an effect for SUM and AVERAGE function

A code point here is https://opengrok.libreoffice.org/xref/core/sc/source/core/tool/interpr6.cxx?r=08e85556&mo=15551&fi=511#511

Steps to Reproduce:
-

Actual Results:
Today we have some problems with a calculation accuracy for floating-point numbers

Expected Results:
We'll have reduce the numerical error in the total for floating-point numbers


Reproducible: Always


User Profile Reset: No



Additional Info:
-
Comment 1 Roman Kuznetsov 2020-10-22 13:15:34 UTC
Noel, Eike, what's your opinion here?
Comment 2 Noel Grandin 2020-10-22 13:21:49 UTC
Sure, you're welcome to work on this
Comment 3 b. 2020-10-22 21:25:49 UTC Comment hidden (off-topic)
Comment 4 Eike Rathke 2020-10-22 22:59:23 UTC Comment hidden (off-topic)
Comment 5 Eike Rathke 2020-10-22 23:17:40 UTC
@Roman: yes, go ahead please.
Comment 6 b. 2020-10-23 08:03:28 UTC Comment hidden (off-topic)
Comment 7 b. 2020-11-23 07:06:14 UTC Comment hidden (off-topic)
Comment 8 Eike Rathke 2020-12-09 21:58:34 UTC Comment hidden (off-topic)
Comment 9 dante19031999 2021-02-09 15:59:05 UTC
Sorry, I could easily introduce code for it. But it would make the code there already is even more impossible tor read. There might be a proper form to do it.
Comment 10 Mike Kaganski 2021-03-16 13:08:27 UTC
(In reply to dante19031999 from comment #9)

It is very unfortunate if the code readability stops you from implementing this. Please don't hesitate to propose a change on gerrit, where we could discuss code structure further.

Thanks!
Comment 11 Buovjaga 2021-03-16 13:17:04 UTC
(In reply to Mike Kaganski from comment #10)
> (In reply to dante19031999 from comment #9)
> 
> It is very unfortunate if the code readability stops you from implementing
> this. Please don't hesitate to propose a change on gerrit, where we could
> discuss code structure further.
> 
> Thanks!

Dante: FYI
Comment 12 TheWebMachine 2021-04-13 15:02:58 UTC
*** Bug 141614 has been marked as a duplicate of this bug. ***
Comment 13 Mike Kaganski 2021-04-19 04:22:19 UTC
*** Bug 141752 has been marked as a duplicate of this bug. ***
Comment 14 b. 2021-04-19 21:31:32 UTC
(In reply to Mike Kaganski from comment #13)
> *** Bug 141752 has been marked as a duplicate of this bug. ***

don't think that fits, 'Kahan' if implemented might solve the other problem, but Kahan is 'not so easy' it'll take some time ... 

in the meantime the rawsubtract-sequence can be solved by other means, e.g. calculation sequence as the subtrahends are numbered ... 

think it might lead to circles if rawsubtract is used - and neccessary? - to calculate the 'kahan artefacts', and one would use kahan inside rawsubtract to avoid deviations from operand ordering ... ??? 

propose unduping,
Comment 15 Mike Kaganski 2021-04-20 05:46:20 UTC
(In reply to b. from comment #14)

I agree that order of RAWSUBTRACT is inconsistent with normal subtraction. As you wrote in bug 141752 comment 0,

=1E+016 - 1 - 1E+016 gives 0, while
=RAWSUBTRACT(1E+016;1;1E+016) gives -1;

=1E+016 - 1E+016 - 1 gives -1, while
=RAWSUBTRACT(1E+016;1E+016;1) gives 0.

This is due to natural processing order of the arguments of a function in Calc, (that is *unspecified* by ODF!) and reversing it is possible here (forming a vector of arguments before performing negations), but is completely useless in view of Kahan summation algorithm (which is not complex by the way, just needs an easy hacker to handle it).

Your bug was actually useful that it highlighted that RAWSUBTRACT is also a subject to the Kahan (or more precisely, improved Kahan–Babuška) algorithm implementation, so also is covered by this bug. Spending even one minute on an alternative "fix" of bug 141752 is just a waste of developers' time , and I disagree with "unduping" (but anyone is still free to suggest a patch with "alternative" fix that would make a single person in the universe happy).
Comment 16 b. 2021-04-20 09:07:13 UTC
(In reply to Mike Kaganski from comment #15)
> (In reply to b. from comment #14)
...
> =1E+016 - 1 - 1E+016 gives 0, while
> =RAWSUBTRACT(1E+016;1;1E+016) gives -1;

good pinpointing, thanks, sometimes i have a feeling about a problem but am short in bringing it precisely 'to the point', 

> This is due to natural processing order of the arguments of a function in
> Calc, 

*uggghhhhh*, that's a 'big point'!, sorry, some of the following is OT for 'Kahan' but 1. have to write it somewhere, 2. it's affecting 'Kahan' too, 

some points, where i can't decide which is more important, 

1. if that - special 'right to left processing' - is really in all functions it might affect more results, (it is in processing of ranges and affects ex$el compatibility, see tdf#140691), 

2. it is - in some respects - effective and intelligent to work that way, but contradicts 'normal occidental way of thinking', and a central point of good programming is not only to be 'somehow circumstantially explainable "right"', but to deliver what the user knows, understands, needs, what best matches his expectations and 'normal mathematics'. 

3. there is! a mathematical convention that operations of the same priority (exponention before 'point calculation' before 'dash calculation'), if not ordered by brackets, are to be processed 'from left to right', isn't it?. 

4. it is questionable whether calc's calculation of a sum '=sum(A3:D3)' in the order D3 + C3 + B3 + A3 is compatible with this convention ... ('=sum(D3:A3)' is not supported by calc, the input is changed arbitrarily).

5. compatibility - it would make sense if different spreadsheets produce 'similar' results, and deviations to 'the big' can be justified with 'mathematically correct' or 'better', but hardly with anything else. 

> (that is *unspecified* by ODF!) 
is a shortcoming as processing sequence affects results, doe's IEEE say something about that? 

> (which is not complex by the way, just needs an easy hacker to handle it).
in principle 'Kahan' is simple, but knitting it in the big and complex code base and all the idiosyncrasies of calc could be difficult and requires a experienced dev, - imho - 

> "alternative" fix that would make a single person in the universe happy).

Please don't always insult me as too idiosyncratic, 'exotic' or too individual, I don't deal with these topics for fun, but so that errors can be eliminated and less people suffer from them and the consequential errors of what the developers were not sufficiently aware of when programming. I could not know beforehand how deep the causes of any problems lie, but i'm sure it's necessary to dig to the bottom. The more problems remain in basic functions the more crap they inject into more complex functions that build on them. 

If you can point me to an concise explanation where is explained how calc works with details about deviations to 'standard math' i'd possibly ask less silly questions ...
Comment 17 Mike Kaganski 2021-04-20 09:11:47 UTC
(In reply to b. from comment #16)
> If you can point me to an concise explanation where is explained how calc
> works with details about deviations to 'standard math' i'd possibly ask less
> silly questions ...

Yes, it's documented in the source code. E.g., RAWSUBTRACT is at https://opengrok.libreoffice.org/xref/core/sc/source/core/tool/interpr6.cxx?r=f3fc16f4&mo=38325&fi=1068#1068
Comment 18 TheWebMachine 2021-04-20 10:29:40 UTC Comment hidden (abusive, spam)
Comment 19 Eike Rathke 2021-04-20 12:16:26 UTC
For RAWSUBTRACT() we could make it such that the arguments are processed from left to right by reversing the stack, which would be more "natural" and expected.
Comment 20 b. 2021-04-21 06:58:51 UTC
(In reply to Mike Kaganski from comment #17)

i have rarely heard more stupid nonsense than declaring code to be its own documentation. i know such approaches used to exist, but in cryptolanguages like C this is silly. in order to analyze idiosyncratic functions or errors of a program, there needs to be an explanation outside the code of what the program is supposed to do, how it is supposed to work, what restrictions it is subject to, and so on. 'we do fp-math as it is historically grown and our dear @Mike Kaganski (doesn't) understand it' is not enough (for me). 

> (In reply to b. from comment #16)
> > If you can point me to an concise explanation where is explained how calc
> > works with details about deviations to 'standard math' i'd possibly ask less
> > silly questions ...
> 
> Yes, it's documented in the source code. E.g., RAWSUBTRACT is at
> https://opengrok.libreoffice.org/xref/core/sc/source/core/tool/interpr6.
> cxx?r=f3fc16f4&mo=38325&fi=1068#1068
Comment 21 Mike Kaganski 2021-04-21 07:05:49 UTC
(In reply to b. from comment #20)

Lol, you clearly try to convert some help from my side into looking bad. No problem :-) Just keep in mind, that:

1. I explained the situation in simple words in comment 15:

> This is due to natural processing order of the arguments of a function in Calc

2. I did not write the code of the function, and I only can point you to what exists;

3. I made some effort to find the only place in existence that has the information that you need;

4. You sound as if you *demand* me to spend my time parsing the code and converting its algorithm into user language, telling that any code pointer is an insult.
Comment 22 b. 2021-04-21 07:13:12 UTC
(In reply to Eike Rathke from comment #19)
> For RAWSUBTRACT() we could make it such that the arguments are processed
> from left to right by reversing the stack, which would be more "natural" and
> expected.

hello @erAck, 
1. think that makes sense, 
2. be aware that there are plenty other functions affected as well, 
3. as having no 'standard' (M.K.) and as it would make sense to have a standard or at least a guideline because it's affecting results ... would you consider start discussing this in the appropriate circles?
Comment 23 Roman Kuznetsov 2021-04-21 08:26:39 UTC
(In reply to b. from comment #22)
> (In reply to Eike Rathke from comment #19)


PLEASE STOP THE DISCUSSION HERE. IF YOU WANT TALK WITH DEVELOPERS THEN WRITE THEM INTO MAILING LIST

HERE IS A SPECIFIC REPORT ABOUT A SPECIFIC REALIZATION OF THE KAHAN'S ALGORITHM. AND IT'S ALL. ALL OTHER THINGS SHOULD BE IN OTHER PLACES
Comment 24 b. 2021-04-21 08:48:03 UTC
hello @Roman Kuznetsov, 

you are right, but no need to shout (capitals), and no need to shout especially at me, it was @Mike Kaganski who pulled another issue here, my comment c#14 was 'that doesn't fit', 

btw. do you make progress with the Kahan thing? 

@Mike Kaganski and @erAck: propose to continue 'operand ordering' / 'processing sequence' in tdf#140691
Comment 25 dante19031999 2021-04-22 16:55:24 UTC
About this issue, what makes the code most illegible is the fact that you are using only one function to perform multiple tasks (sum, average, count, ...) and at same time has to deal with multiple possible input types (double, integer, boolean, text, ...) in a double loop with nested switchs.

Would like to propose an idea:

Without changing the structure of the code, instead of directly implementing Kahan inside the loop, use a kahan summation class. This class contains 3 doubles: the value, the error and the second order error (I believe is enought for general purpose software). This class also overloads the '+' operator which internally handles the sum. Also would have a get double method for returning the final result.

However I'm not an informatic engineer and have no idea of the efficiency impact this would have. So if someone who understands about that kind of issue thinks it's a fine way of implementing it I would propose a patch in gerrit.

This class would also be usable for any other operation with this kind of need.
Comment 26 Mike Kaganski 2021-04-22 18:22:53 UTC
(In reply to dante19031999 from comment #25)

Looks perfectly reasonable - looking forward for the patch! :-)
Comment 27 Roman Kuznetsov 2021-04-27 04:55:57 UTC
I don't know why, but Dante didn't write a link to bug report in his patches:

https://gerrit.libreoffice.org/c/core/+/114567 - added a Kahan summation class

https://gerrit.libreoffice.org/c/core/+/114612 - Use kahan sum for lcl_ named methods in abstract namespace
Comment 28 Commit Notification 2021-04-28 14:32:32 UTC
dante committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/fd4df675dfe1012448285134082f61a0c03a7d15

tdf#137679 Use Kahan summation for scmatrix operations

It will be available in 7.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 29 Commit Notification 2021-04-28 16:10:07 UTC
dante committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/b9b744babd4d2eac73ef3684c28cfefd5b842c11

tdf#137679 Use Kahan summation for interpr1.cxx

It will be available in 7.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 30 Commit Notification 2021-04-29 05:47:41 UTC
dante committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/bd944fe3812fd9fa5a90e98cdac4a77f1a4e6865

tdf#137679 Use Kahan summation for interpr2.cxx

It will be available in 7.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 31 Commit Notification 2021-04-30 20:11:22 UTC
dante committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/296367e0a91d0e6169da280d6a5efa83ae56de5d

tdf#137679 Use kahan summation for ScInterpreter::SumProduct

It will be available in 7.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 32 Commit Notification 2021-04-30 21:31:08 UTC
dante committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/4283fb9d4a6152643364bfe1f98ee1f36aabbb78

tdf#137679 Use kahan summation for ScInterpreter::CalculateSkew

It will be available in 7.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 33 b. 2021-05-01 19:50:47 UTC
hello @dante, 

lemme say first that i highly appreciate that you care for that stuff !!! 

doe's that work for sums already? experimented with alpha from yesterday and saw some small changes, but e.g. summing 100 times 1E15; 0,1; -1E15 or 1E15; 0,3; -1E15 still produces big deviations ... (12,5 and 25 instead of 10 and 30) ... or did i mess up my system?
Comment 34 dante19031999 2021-05-01 21:42:02 UTC
(In reply to b. from comment #33)
> hello @dante, 
> 
> lemme say first that i highly appreciate that you care for that stuff !!! 
> 
> doe's that work for sums already? experimented with alpha from yesterday and
> saw some small changes, but e.g. summing 100 times 1E15; 0,1; -1E15 or 1E15;
> 0,3; -1E15 still produces big deviations ... (12,5 and 25 instead of 10 and
> 30) ... or did i mess up my system?

I have to say I don't use the alpha version.
So I haven't even idea if it has already been incorporated.
With my build I have made some checks and the results are as expected.
But I have to say I get that kind of results with 7.1 .

Check the odf file I will have attached.
Of course auto-recalculate is disabled so you can see original data.

Checking SUM 1 and SUM 2 the results are perfect. Way better than raw sum.
Those are results with numbers of 2 orders combined.

However as expected SUM 3 will fail.
The cause is we are using Kahan first order instead of second.
The community decided Kahan second order was overkill.
Have to say the test was made to wreck it (numbers of 3 orders combined).

I may add that file as a test.
Comment 35 dante19031999 2021-05-01 21:43:16 UTC
Created attachment 171579 [details]
Example of the Kahan Sum
Comment 36 Commit Notification 2021-05-03 11:50:56 UTC
dante committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/9d7db2b4f09e08a607c8f07db38afa722d98a84f

tdf#137679 Use kahan summation for ScInterpreter::CalculatePearsonCovar

It will be available in 7.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 37 Commit Notification 2021-05-03 11:55:08 UTC
dante committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/040451e19087d8cb8ff4376166ca8248f909a4e0

tdf#137679 Use Kahan summation for interpr3.cxx (1/2)

It will be available in 7.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 38 Commit Notification 2021-05-04 18:35:20 UTC
dante committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/94977bb43d8dc91023bee7afa037f6319c36ccc3

tdf#137679 Use kahan summation for ScInterpreter::lcl_IterateInverse

It will be available in 7.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 39 b. 2021-05-06 09:04:00 UTC
Created attachment 171678 [details]
testsheet kahan_sum with calc 7.2.0.0.a0+

> I have to say I don't use the alpha version. ... But I have to say I get that kind of results with 7.1 .

but your patches are announced as 'for 7.2' ... ??? 

> Check the odf file I will have attached.

did so, see attached files here and with next comment, i have slightly scattered results both between 7.2 and your table and between 7.1.4.0.0 from today and your table, see attached examples, see inserted columns and comments between the results, all calculated without! threaded or openCL, 

> However as expected SUM 3 will fail.
> The cause is we are using Kahan first order instead of second.

i tested with 'gnumeric' for comparison, ver. 1.12.17 win portable from June 2014, they had the problem 'sum over one or more ranges' much better under control already 7 years ago, see attachement to the next but one comment

instead of reinventing the wheel (and getting lost in calc's previous correction efforts?) ... gnumeric is open source and a 'free license', we are allowed to look there and copy?, shouldn't we just do that?
Comment 40 b. 2021-05-06 09:04:55 UTC
Created attachment 171679 [details]
testsheet kahan_sum with calc 7.1.4.0.0
Comment 41 b. 2021-05-06 09:14:24 UTC
Created attachment 171680 [details]
testsheet kahan_sum with gnumeric 1_12_17_portable
Comment 42 Mike Kaganski 2021-05-06 09:26:08 UTC
(In reply to b. from comment #39)
> i tested with 'gnumeric' for comparison, ver. 1.12.17 win portable from June
> 2014, they had the problem 'sum over one or more ranges' much better under
> control already 7 years ago, see attachement to the next but one comment
> 
> instead of reinventing the wheel (and getting lost in calc's previous
> correction efforts?) ... gnumeric is open source and a 'free license', we
> are allowed to look there and copy?, shouldn't we just do that?

No. Gnumeric uses incompatible license, which we can't use in our MPL code. We can't even look into its code for inspiration FWIW.
Comment 43 b. 2021-05-06 11:22:09 UTC Comment hidden (off-topic)
Comment 44 Mike Kaganski 2021-05-06 11:36:08 UTC Comment hidden (off-topic)
Comment 45 b. 2021-05-06 12:03:46 UTC Comment hidden (off-topic)
Comment 46 dante19031999 2021-05-06 13:24:41 UTC
> No. Gnumeric uses incompatible license, which we can't use in our MPL code.
> We can't even look into its code for inspiration FWIW.

I do also think we should let the calculations for the pros.
We gain in precision, speed, less code to maintain and more features.

I have no clue about law.
But if we paste gnumeric in a folder and "isolate" it. Can't we even include a header and call it's functions? What would be the point on making unusable libraries?
Comment 47 Commit Notification 2021-05-06 22:23:14 UTC
dante committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/2b90807fd474cb4e26c80ba5407aca4b66f6d3a2

tdf#137679 Use Kahan summation for interpr5.cxx

It will be available in 7.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 48 dante19031999 2021-05-07 11:43:52 UTC
I've already done what I could.
On my part it's done.
Comment 49 Commit Notification 2021-05-07 12:30:52 UTC
dante committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/38c88aca77f8c8ef47470723557ddf9e5d9d7b66

tdf#137679 Use Kahan summation for interpr8.cxx

It will be available in 7.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 50 Eike Rathke 2021-05-07 16:09:28 UTC
(In reply to dante19031999 from comment #48)
> I've already done what I could.
> On my part it's done.
Anything missing in the KahanSum implementation? If not then let's set this bug's status to RESOLVED FIXED.

And thanks for doing the tedious work!

Could you provide a list of affected spreadsheet functions so we can mention it in the release notes?
Comment 51 Roman Kuznetsov 2021-05-07 16:47:53 UTC
(In reply to Eike Rathke from comment #50)
> (In reply to dante19031999 from comment #48)
> > I've already done what I could.
> > On my part it's done.
> Anything missing in the KahanSum implementation? If not then let's set this
> bug's status to RESOLVED FIXED.

I still see some unmerged patches in gerrit
Comment 52 Eike Rathke 2021-05-07 18:28:41 UTC
Yes indeed, https://gerrit.libreoffice.org/q/topic:%22kahansum%22+status:open
Comment 53 dante19031999 2021-05-07 19:34:21 UTC
(In reply to Eike Rathke from comment #52)
> Yes indeed, https://gerrit.libreoffice.org/q/topic:%22kahansum%22+status:open

There are 3 left to merge ...
And it doesn't require many more review.

OK: affected functions:
lcl_GetSumProduct
lcl_GetColumnEuclideanNorm
lcl_TGetColumnEuclideanNorm
lcl_GetColumnMaximumNorm
lcl_GetColumnSumProduct
lcl_TGetColumnSumProduct
lcl_SolveWithUpperRightTriangle
lcl_SolveWithLowerLeftTriangle
lcl_ApplyUpperRightTriangle
lcl_GetMeanOverAll
lcl_CalculateColumnMeans
lcl_CalculateRowMeans
lcl_GetSSresid
lcl_IterResult
lcl_IterateInverse
lcl_MFastMult
lcl_LUP_solve
lcl_GetBinomDistRange
class NumericCellAccumulator
class ScMatrix
class FuncSum
class IterateMatrix ( SUM COUNT COUNT2 PRODUCT SUMSQ AVERAGE )
class ArraySumFunctor ( SSE2 may take a while to review )
ScSumXMY2
ScRawSubtract !!! NOT AFFECTED !!!
ScSumIfs
ScAverageIfs
ScNPV
ScIRR
ScMIRR
ScISPMT
ScDB
ScInterVDB
ScVDB
ScCumIpmt
ScCumPrinc
ScSumProduct
ScHarMean
ScGeoMean
ScStandard
ScAveDev
ScDevSq
ScForecast
SCFTest by after effects of a future patch completely unrelated
ScMatMult
ScChiTest
ScTTest
ScZTest
ScHypGeomDist
ScPoissonDist
ScCritBinom
CalculateSkew
CalculatePearsonCovar
CalculateSlopeIntercept
CalculateSumX2MY2SumX2DY2
CalculateRGPRKP
CalculateTest
IterateParameters
PreprocessDataRange
prefillTrendData
prefillPerIdx
calcAccuracyIndicators
CalcPeriodLen
GetStVarParams
IterateParametersIf
DBIterator
GetDBStVarParams

AND ANY WHICH DEPENDS ON THIS
I believe I haven't left anything to do and also listed everything.
Comment 54 Commit Notification 2021-05-08 07:01:25 UTC
dante committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/5545d8f515a2f7af45f0d2ab3ee40dfbde7fa20f

tdf#137679 Use Kahan summation for interpr3.cxx (2/2)

It will be available in 7.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 55 b. 2021-05-08 09:47:06 UTC
Created attachment 171782 [details]
testsheet kahan_sum with different versions

I don't see this working well yet.  
  
see attached sheet.  
  
Column A contains 100 times the three values 1E15; 0,1; -1E15.  
  
mathematically the sum over them should be 10.  
  
(difficult in FP-math, but solvable with Kahan summation).  
  
Column B contains partial sums, in cell B300 the sum over all values in A1:A300.   

in cell C1 the value of B300 is copied, in C3:D6 the results calculated by different calc versions and gnumeric.  
  
12.47500000000000 result calc 7.2.0.0.a0+ 2021-05-08  
12.50000000000000 result calc 7.1.4.0.0 2021-05-08  
12.50000000000000 result calc 6.1.6.3  
10.00000000000000 result gnumeric 1.12.17  
  
Status: correct results are possible, calc can't do it yet.  
  
I hope i haven't messed up my system? please check the results. if the pending patches solve the problem just ignore this comment.
Comment 56 dante19031999 2021-05-08 13:00:20 UTC
(In reply to b. from comment #55)
> Created attachment 171782 [details]
> testsheet kahan_sum with different versions
> 
> I don't see this working well yet.  
>   
> see attached sheet.  
>   
> Status: correct results are possible, calc can't do it yet.  
>   
> I hope i haven't messed up my system? please check the results. if the
> pending patches solve the problem just ignore this comment.

No you haven't.
I believe the function in charge of summing columns is this one:
ArraySumFunctor::executeSSE2
However, that patch stills on review because it touches non trivial stuff.
It's the last one, but I believe it will take 1 week -2 months,
So, please wait until this is merged:
https://gerrit.libreoffice.org/c/core/+/115113
And then you should get the same results as the example attachment I've uploaded.
Comment 57 b. 2021-05-08 13:19:50 UTC
@Dante, 

thanks for the explanation, 

and please excuse my impatience, i think 'mathematically correct' is important and i'm glad that something is going on, i'll wait patiently now. 

:-)  :-)  :-)
Comment 58 Commit Notification 2021-05-10 10:21:27 UTC
dante committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/a25c9a2925e64f3adb46a749ce18393aa01b1870

tdf#137679 Use KahanSum for SSE2

It will be available in 7.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 59 b. 2021-05-11 09:09:33 UTC
hello @Dante, 

previous comment was the 'last needed patch'? 

testing with 7.2 ver. from 2021-05-11 ~04h, i see: 

your sample sheet working, 

mine 'kahan_sum with different versions' still failing (12,475), 

note that we have different operand sequences, you: 
1e15, 0.1, -1e15, 0.1, 1e15, 0.1, -1e15, 0.1, ... while i tested with: 
1e15, 0.1, -1e15, 1e15, 0.1, -1e15, ... 

tested my sheet fine with your data, 
your sheet failing with my data, 

pls. recheck if the mistake is on my side, 

Version: 7.2.0.0.alpha0+ (x64) / LibreOffice Community
Build ID: d7f734db2c078ced3ce08ad58cd816a79abe3bcf
CPU threads: 8; OS: Windows 6.1 Service Pack 1 Build 7601; UI render: default; VCL: win
Locale: de-DE (de_DE); UI: en-US
Calc:
Comment 60 b. 2021-05-11 12:37:41 UTC
add. tests: 

see already '1; 0,1; -1; 1; 0,1; -1; ...' failing, by ~5E-15, 
and all similar sequences up to '1E15; 0,1; -1E15; ...' increasingly, while 
'1E16; 0,1; -1E16; ...' and '1E17; 0,1; -1E17; ...' hold. 

are you at any point trapped by calcs 'built in prettyfiing'?
Comment 61 dante19031999 2021-05-11 13:56:45 UTC
(In reply to b. from comment #60)
> add. tests: 
> 
> see already '1; 0,1; -1; 1; 0,1; -1; ...' failing, by ~5E-15, 
> and all similar sequences up to '1E15; 0,1; -1E15; ...' increasingly, while 
> '1E16; 0,1; -1E16; ...' and '1E17; 0,1; -1E17; ...' hold. 
> 
> are you at any point trapped by calcs 'built in prettyfiing'?

I finally understood what is going on when you told 1E16 and 1E17 hold.
You've got the chance of falling on what I would call a numerically unstable zone.
Check this out: https://en.wikipedia.org/wiki/Double-precision_floating-point_format
It basically says IEEE754 has 15,955 decimals.

Which means, for E16 and E17 you are falling out of range and 0.1 is being summed at the error term. So all the info is perfectly being kept and you get the good result.

However, if you are on E+15, you fall on the sum interval. Which means the last bits of the sum and the error term are modified. That combined with 0.1 not having exact binary representation you are accumulating error because of finite precision. If you change 0.1 by 0.25 or 0.125 you get the good result.

I believe my example worked by pure chance because a unexpected convenient correction of the error because of the order of the sum.

I'm afraid even Kahan Sum has it's limitations. Sorry to disappoint.
Comment 62 Mike Kaganski 2021-05-11 14:07:57 UTC
(In reply to dante19031999 from comment #61)

But yet Kahan has improved the resulting accuracy even in this problematic case *a bit* (yes I realize how insignificant could this be - the deviation becoming 12.475 where it was 12.5; but anyway, introducing Kahan method can be expected to at least not worsen the result, and in most cases improve).

The correct title of the bug reads: "to *reduce* the numerical error". Yes, it is expected that the error is not eliminated completely here, but just reduced where possible, since the method is not a miracle.
Comment 63 Mike Kaganski 2021-05-11 14:09:03 UTC
(In reply to Mike Kaganski from comment #62)
> the deviation becoming 12.475 where it was 12.5

Sorry for the thinko - of course that was meant to be "2.475 where it was 2.5"
Comment 64 Commit Notification 2021-05-15 06:23:52 UTC
dante committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/112c9d4b2d8a6393f52697c928e8ace06782b6e0

tdf#137679 Use KahanSum for Taylor series

It will be available in 7.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 65 b. 2021-05-15 14:52:04 UTC
Created attachment 172026 [details]
testsheet Neumaier Kahan–Babuška working well

hello @Dante, 

did some 'poking around', and - think - have found proof that better results are possible, 

would you mind having a look into the attached file, explanations in the sheet explanation, 

trying NeumaierSum from https://en.wikipedia.org/wiki/Kahan_summation_algorithm in a calc sheet, works, not 100%, but avoids the '12,45' fail. 

(as with all work / suggestions from me ... recheck at least twice!, i'm human and may fail) 

regards, 

b.
Comment 66 dante19031999 2021-05-15 16:32:01 UTC
(In reply to b. from comment #65)
> Created attachment 172026 [details]
> testsheet Neumaier Kahan–Babuška working well
> 
> hello @Dante, 
> 
> did some 'poking around', and - think - have found proof that better results
> are possible, 
> 
> would you mind having a look into the attached file, explanations in the
> sheet explanation, 
> 
> trying NeumaierSum from
> https://en.wikipedia.org/wiki/Kahan_summation_algorithm in a calc sheet,
> works, not 100%, but avoids the '12,45' fail. 
> 
> (as with all work / suggestions from me ... recheck at least twice!, i'm
> human and may fail) 
> 
> regards, 
> 
> b.

I could say I'm a precision freak and always try to get the best possible result even if at expenses of calculation time. However calc philosophy is the opposite.
I did try to use Klein sum. You can check this first patch:
https://gerrit.libreoffice.org/c/core/+/114567
However it was considered overkill and downgraded to Neumanier sum.

So now I'm left with some doubts. Why Neumanier with calc and Neumanier with C++ differ. I've rechecked all the code involved in the sum and seems legitimate.

My theory is that the problem is in the SSE2 code.
For summing columns in theory you use the SSE2 which uses raw Kahan.
It's here: https://gerrit.libreoffice.org/c/core/+/115113

However using Neumanier in there is beyond my capacities.
Would suggest to create a new tdf for it.
That way someone who understands it could fix it.

Sorry to disappoint again.

This is the sum code:
    void add(double x_i)
    {
        double t = m_fSum + x_i;
        if (std::abs(m_fSum) >= std::abs(x_i))
            m_fError += (m_fSum - t) + x_i;
        else
            m_fError += (x_i - t) + m_fSum;
        m_fSum = t;
    }
Comment 67 b. 2021-05-15 20:41:29 UTC
hello @Dante,  

from a short amateur! look in https://gerrit.libreoffice.org/c/core/+/115113/4/sc/source/core/tool/arraysumSSE2.cxx#30 I would think that: 
- you tried to add 'Neumaier' into 'Kahan' and 'pairwise' instead of replacing the whole construct ... unnecessarily complicated? and that 
- 'err2' and 'err4' have 'fallen by the wayside', i.e. are calculated in 'sum2' and 'sum4' but not evaluated in the rest of the construct ... ??? 

bear with me if i'm mistaken ... 

even if that's not 'the point' I would say 'now we have the fish by the ears' :-)  

> I could say I'm a precision freak and always try to get the best possible result even if at expenses of calculation time. 
:-) 

> However calc philosophy is the opposite. 
:-( 

> I did try to use Klein ... downgraded to Neumanier sum. 
if Neumaier would work it would be 'something', 

> Why Neumanier with calc and Neumanier with C++ differ. I've rechecked all the code involved in the sum and seems legitimate.
:-( but!:  
i have often faced incomprehensible problems in life and in various fields, from the point where i had a working and a non-working way to compare them ... all problems lost. They became 'analyzable'. insofar :-)

> My theory is that the problem is in the SSE2 code.
> ... 
> That way someone who understands it could fix it.
I am hopeful that someone will find it, 

> Sorry to disappoint again.
no, not at all, not in the least, you have moved things forward! great! 

> This is the sum code:
    void add(double x_i)
    {
        double t = m_fSum + x_i;
        if (std::abs(m_fSum) >= std::abs(x_i))
            m_fError += (m_fSum - t) + x_i;
        else
            m_fError += (x_i - t) + m_fSum;
        m_fSum = t;
    }

i'm in good hope you didn't forget to add the accumulated error as a last step (difference between Kahan and Neumaier), and thus can't contribute to the coding part, 
I'm glad if you have looked through my 'in sheet' attempts and also think that they work, then sooner or later a solution will arise ... thank you very much!  


P.S. 1e16 and 1e17 'holding' might have been an error by me taking the wrong dataset, or putting it into the wrong sheet, or messing up 0.1, 1e1, 1e-1, 1, 10 ... or recalc / no autocalc or whatever, retrying it doesn't hold, but holds in gnumeric, 

my general idea is that pretty good accuracy is possible as it's achieved by gnumeric, i've asked Morten how they do and he said 'use an extension of Kahan summation', as @Mike Kaganski is afraid LO might get trouble looking at gnumeric for inspiration i have additional asked if Morten would see a problem in that, not yet answered, will keep you posted.
Comment 68 dante19031999 2021-05-15 21:52:52 UTC
(In reply to b. from comment #67)

> from a short amateur! look in
> https://gerrit.libreoffice.org/c/core/+/115113/4/sc/source/core/tool/
> arraysumSSE2.cxx#30 I would think that: 
> - you tried to add 'Neumaier' into 'Kahan' and 'pairwise' instead of
> replacing the whole construct ... unnecessarily complicated? and that 
> - 'err2' and 'err4' have 'fallen by the wayside', i.e. are calculated in
> 'sum2' and 'sum4' but not evaluated in the rest of the construct ... ??? 

I don't completely understand it. But by the info I found.
If I'm wrong don't hesitate to correct me.

The SSE2 does seem to make use of intel's SSE2 by giving instructions at almost machine level, allowing direct access to the register. I suppose the purpose is to boost speed (by a lot) (wikipedia says some AMD also use it).

We are performing 4 sums at the same time to use an optimization called loop unrolling. It is supposed to speed up results by making the 4 operations at same time. It's usually what modern hardware support.
However I'm not sure if it stills useful. It's generally only worth it if the loop is very simple.

So in order to speed up calculations we enter in the very complicated stuff.
It's quite hardcore to find good documentation to that kind of thing.
Had to guess it by the functions definitions on the header, which by the way is compiler specific.
I'm not even sure it is part of the standard. I don't believe Intel's particular stuff could be part of the standard.

Due to those reasons I had to go for raw Kahan (does not sum the error at the end, so in the last operations the error is ignored: err2 and err4 ).

> i'm in good hope you didn't forget to add the accumulated error as a last
> step (difference between Kahan and Neumaier)

It's there.
And everything was re-checked by expert programmers (while patch review).
So it should be fine.
Comment 69 b. 2021-05-15 23:38:49 UTC
as i'm not familiar with SSE2 ... would that: 

http://blog.zachbjornson.com/2019/08/11/fast-float-summation.html 

and that: 

https://gist.github.com/zbjornson/f90513fcaaee4d6268d26f9c1a0507dd#file-benchmark-cc 

be helpful? 

IMHO that guy knows what he's dealing with ... 

[there might be even more errors in the code, 

(SSE2 is not optimal for Neumaier?)

the calculation of err1 - err4 should be 'dependent on summand size' as in 
        if (std::abs(m_fSum) >= std::abs(x_i))
            m_fError += (m_fSum - t) + x_i;
        else
            m_fError += (x_i - t) + m_fSum;           ??

rows 74-78 should evaluate err2 ?
                          
rows 80-83 should evaluate err4 ?, and calculate a new err3 ?, 

rows 85-88 should evaluate the new err3 ?, and calculate a new err1 ?, 

the new err1 needs to be carried into subsequent calculations ?, either more input to this sum or into the final result?, 

you had to 'truncate' Neumaier to fit it into SSE2? logically errors will occur]
Comment 70 dante19031999 2021-05-16 00:01:41 UTC
> you had to 'truncate' Neumaier to fit it into SSE2? logically errors will
> occur]

It's just Kahan, not Neumanier wich is being used in SSE2 code.
Comment 71 Xisco Faulí 2021-09-13 07:46:01 UTC
*** Bug 144156 has been marked as a duplicate of this bug. ***