The current fast sum implementation uses SSE2 with a basic Kahan sum.
However, it could be upgraded to AVX512 using a Neumaier sum.
User Profile Reset: No
Version: 22.214.171.124 / LibreOffice Community
Build ID: f6099ecf3d29644b5008cc8f48f42f4a40986e4c
CPU threads: 8; OS: Linux 5.11; UI render: default; VCL: gtk3
Locale: es-ES (en_US.UTF-8); UI: en-US
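For readers unfamiliar with the two algorithms named in the description, here is a minimal scalar sketch of how they differ. It is not the LibreOffice code; the function names `kahanSum` and `neumaierSum` are made up for illustration. The point is that Kahan's compensation assumes the running sum is larger than the next term, while Neumaier's variant checks which operand is larger before computing the lost bits.

```cpp
#include <cmath>
#include <vector>

// Illustrative only: classic Kahan summation.
// The compensation step implicitly assumes |sum| >= |x|.
double kahanSum(const std::vector<double>& values)
{
    double sum = 0.0;
    double comp = 0.0; // running compensation for lost low-order bits
    for (double x : values)
    {
        const double y = x - comp;
        const double t = sum + y;
        comp = (t - sum) - y; // recovers what was lost when adding y
        sum = t;
    }
    return sum;
}

// Illustrative only: Neumaier's variant.
// It picks the larger operand before computing the rounding error,
// and applies the accumulated error once at the end.
double neumaierSum(const std::vector<double>& values)
{
    double sum = 0.0;
    double err = 0.0; // accumulated rounding error
    for (double x : values)
    {
        const double t = sum + x;
        if (std::fabs(sum) >= std::fabs(x))
            err += (sum - t) + x; // low-order bits of x were lost
        else
            err += (x - t) + sum; // low-order bits of sum were lost
        sum = t;
    }
    return sum + err;
}
```

With the input 1.0, 1e100, 1.0, -1e100 the Kahan version returns 0.0 while the Neumaier version returns 2.0, assuming plain IEEE double arithmetic (no -ffast-math).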
In the Kahan sum patch .b has pointed me here:
With that new information I believe I should be able to pull this off.
If the info is correct it should be faster and more precise.
It may also be possible to use it in the scmatrix summation code.
And if some conditions are met, it would be possible for 7.3 to bring great speed improvements to statistical functions.
Setting to new as, IMHO, a necessary enhancement (or a bug of the SSE2 module blocking precision),
changing the subject to reflect the correct name 'Neumaier'.
@Dante, possibly also have a look in:
(too scientific for me) :-(
Would you like to 'take' this bug, i.e. assign it to yourself?
Right now I'm working over here: https://gerrit.libreoffice.org/c/core/+/115675
I was able to implement Neumaier for SSE2.
Now your test sheet gives correct output.
AVX512 crashes for now, but is on its way.
We are using the methods in this order:
If AVX512 is available use it.
If not try with SSE2.
If not continue with just unrolled loop.
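The fallback order above could be sketched roughly as follows. This is not the code from the patch: `SumBackend`, `pickSumBackend`, `sumUnrolled` and `fastSum` are hypothetical names, and the runtime check relies on the GCC/Clang builtin `__builtin_cpu_supports`, which is an assumption about the toolchain. The vector kernels are left out, so every branch ends in the unrolled scalar loop to keep the sketch runnable.

```cpp
#include <cstddef>

// Hypothetical names; only the selection order mirrors the comment above.
enum class SumBackend { AVX512, SSE2, Unrolled };

// Picks the widest instruction set the CPU reports at runtime.
// Assumes GCC or Clang on x86; anything else gets the scalar fallback.
SumBackend pickSumBackend()
{
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f"))
        return SumBackend::AVX512;
    if (__builtin_cpu_supports("sse2"))
        return SumBackend::SSE2;
#endif
    return SumBackend::Unrolled;
}

// Scalar fallback: loop unrolled by four with independent accumulators.
double sumUnrolled(const double* p, std::size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
    {
        s0 += p[i];
        s1 += p[i + 1];
        s2 += p[i + 2];
        s3 += p[i + 3];
    }
    double s = (s0 + s1) + (s2 + s3);
    for (; i < n; ++i) // remaining 0 to 3 elements
        s += p[i];
    return s;
}

double fastSum(const double* p, std::size_t n)
{
    switch (pickSumBackend())
    {
        case SumBackend::AVX512: // an AVX512 kernel would be called here
        case SumBackend::SSE2:   // an SSE2 kernel would be called here
        case SumBackend::Unrolled:
            break;
    }
    return sumUnrolled(p, n); // placeholder for all three backends
}
```

Doing the check at runtime rather than at compile time lets a single binary run on machines without AVX512.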
dante committed a patch related to this issue.
It has been pushed to "master":
tdf#142307 - Upgrade SSE2 sum to AVX512 sum with Neumaier 1
It will be available in 7.3.0.
The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
Affected users are encouraged to test the fix and report feedback.
Dante, can you do some test with and without your patch on the same computer? I think it would be interesting to know how much has the calculation accelerated by using AVX
(In reply to Roman Kuznetsov from comment #5)
> Dante, can you do some test with and without your patch on the same
> computer? I think it would be interesting to know how much has the
> calculation accelerated by using AVX
The test can be found in this file:
(It is not yet on opengrok, give it 48 hours)
The original code is the SSE2 version.
So the new and old versions are tested.
But I don't believe this is what you are asking for.
However here you have a performance test (very basic):
However, printf does not seem to work there.
It may also fail due to statistical fluctuations,
particularly on server technology.
So it cannot be merged.
(In reply to dante19031999 from comment #6)
> (In reply to Roman Kuznetsov from comment #5)
> > Dante, can you do some test with and without your patch on the same
> > computer? I think it would be interesting to know how much has the
> > calculation accelerated by using AVX
> But I don't believe this is what you are asking for.
Yeah, I just want to know info like:
"I have a spreadsheet with 1 million cells with data and 100 formulas
It took 1 min for recalculating before
It take 10 sec for recalculating after"
> Yeah, I just want to know info like:
> "I have a spreadsheet with 1 million cells with data and 100 formulas
> It took 1 min for recalculating before
> It take 10 sec for recalculating after"
That depends on the computer; for mine (3*10^6 terms):
Time for sum with NONE: 0.002667 s (default on ARM)
Time for sum with AVX: 0.001426 s (new)
Time for sum with SSE2: 0.001914 s (original)
And I can't tell you about AVX512.
But I can't give you much more info. This kind of thing pays off on auto-generated Calc sheets with insane amounts of data.
So expect an improvement of roughly 1.4x on the sum itself; the time spent in the interpreter comes on top of that.
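For anyone who wants to reproduce this kind of timing on their own machine, a very basic harness might look like the sketch below. It is not the test from the patch; `TimedSum`, `timeSum` and `plainSum` are hypothetical names. It keeps the best of several runs, which is one simple way to damp the statistical fluctuations mentioned earlier.

```cpp
#include <chrono>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical result type: the computed sum and the best wall time seen.
struct TimedSum
{
    double sum;
    double seconds;
};

// Runs fn(data, size) `repeats` times and keeps the fastest run.
// fn is any callable with the signature double(const double*, std::size_t).
template <typename SumFn>
TimedSum timeSum(SumFn fn, const std::vector<double>& data, int repeats)
{
    double best = -1.0; // stays negative if repeats <= 0
    double result = 0.0;
    for (int r = 0; r < repeats; ++r)
    {
        const auto start = std::chrono::steady_clock::now();
        result = fn(data.data(), data.size());
        const auto stop = std::chrono::steady_clock::now();
        const double s = std::chrono::duration<double>(stop - start).count();
        if (best < 0.0 || s < best)
            best = s;
    }
    return { result, best };
}

// Baseline to compare against: a plain sequential sum.
double plainSum(const double* p, std::size_t n)
{
    return std::accumulate(p, p + n, 0.0);
}
```

Calling `timeSum(plainSum, data, 5)` on a vector of 3*10^6 doubles, and then again with the vectorized function in place of `plainSum`, gives two figures comparable to the ones listed above. Absolute values will differ between machines.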