Bug 140691 - calc: calculation: sequence of operations neglected in many tasks, only mathematical better results should be allowed if calc differs from ex$el | was: sum of range: (and similar) - no compatibility with ex$el reg. ordering of operands
Summary: calc: calculation: sequence of operations neglected in many tasks, only mathe...
Status: UNCONFIRMED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
3.5.1 release
Hardware: All All
: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-02-27 01:26 UTC by b.
Modified: 2021-04-27 10:46 UTC (History)
5 users (show)

See Also:
Crash report or crash signature:


Attachments
140691_testsheet_01.xls file showing evil effect of different handling the order of operands in a range by calc, ex$el, in the sheet, and in and their statusbars / auto-sum (12.50 KB, application/vnd.ms-excel)
2021-04-04 17:18 UTC, b.
Details
140691_testsheet_0.xlsx file showing evil effect of different handling the order of operands in a range by calc, ex$el, in the sheet, and in and their statusbars / auto-sum (7.94 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2021-04-04 17:24 UTC, b.
Details
a_file_with_some_calc_only_processing_sequence_problems (18.31 KB, application/vnd.oasis.opendocument.spreadsheet)
2021-04-21 11:23 UTC, b.
Details
sample comparing different operand sequences affecting the sum (14.28 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2021-04-22 18:17 UTC, b.
Details

Note You need to log in before you can comment on or make changes to this bug.
Description b. 2021-02-27 01:26:05 UTC
Description:
spinoff from tdf#55960: 

ex$el (starting with upper left cell in sheet and calculating left->right, then top->down, oczidental style) and calc (starting with bottom right cell in sheet and calculating bottom->up and then right->left, oriental style) are different in simple calculations which are sensitive to the ordering of operands (it's depending on the operands), e.g.
sum(1234; 0,12; -1234) is sensitive), 
it's not a big problem, might bring bridges down or crash airplanes somewhere far away, not a big a problem, but it's a little annoying as some people in some points stated that calc will try to be as good as ex$el or better, and thus the better choice for users ... for that goal and to ease migration for users it should be as easy as possible, and should not! inject errors when you import sheets from one system into the other ...  but it does ... 

citation from tdf#55960: '(it could only be possible if we know exact sequence of the calculations they perform, like if they do it from left-to-right then top-to-bottom, or top-to-bottom then left-to-right, or any other way for a given argument number of SUM)' - ex$el calculates: 
- '=Lx+My+Nz': in the order Lx, My, Nz, as written in the formula, no matter where the cells are positioned in the sheet, imho calc does the same, 
- '=sum(Ax:Cy)': top row from left to right then next row, imho different to calc which acts rightmost column bottom up, then nextleft column, i 'assume' oriental influence, and i'm astonished that nobody mentioned that earlier, as it indeed introduces compatibility issues, 
ex$el does not: do any smart sorting of operands reg. magnitude or similar as sometimes somewhere assumed, neither does calc (afaik), 
(above tests with ex$el 2010 ver 14.0.6023.1000 64-bit) 'home and business', 

happy hacking ... :-) 

b. 


Steps to Reproduce:
1. guess how to test, 
2. found a method: test and see, 
3. not found: keep trying or ask me, 

Actual Results:
calculation order different, 

Expected Results:
identical computation sequence to keep compatibility and avoid irritations / fails, 


Reproducible: Always


User Profile Reset: No



Additional Info:
Version: 7.2.0.0.alpha0+ (x64) / LibreOffice Community
Build ID: 62dff2844b0bf1d1bcb8eb4d6db529ef4a31bee4
CPU threads: 8; OS: Windows 6.1 Service Pack 1 Build 7601; UI render: default; VCL: win
Locale: de-DE (de_DE); UI: en-US
Calc: 

didn't test versions prior 4.1.6.2, assume 'inherited', 
didn't test linux or mac, assume 'all', 
didn't reset user profile, assume apparent on more then one system -> not 'local corruption',
Comment 1 b. 2021-03-12 19:25:23 UTC
have written a little more about compatibility problems there: tdf#109189
Comment 2 Xisco Faulí 2021-03-31 08:53:57 UTC
Thank you for reporting the bug. Please attach a sample document, as this makes it easier for us to verify the bug. 
I have set the bug's status to 'NEEDINFO'. Please change it back to 'UNCONFIRMED' once the requested document is provided.
(Please note that the attachment will be public, remove any sensitive information before attaching it. 
See https://wiki.documentfoundation.org/QA/FAQ#How_can_I_eliminate_confidential_data_from_a_sample_document.3F for help on how to do so.)
Comment 3 b. 2021-04-04 17:18:30 UTC
Created attachment 170951 [details]
140691_testsheet_01.xls file showing evil effect of different handling the order of operands in a range by calc, ex$el, in the sheet, and in and their statusbars / auto-sum

@xisco: yes sir! yes! 

attaching *.xls and *.xlsx as i found no way to preserve the formulas in an *.ods when opening with ex$el, explanations in the sheets, should be macro and virus free ...
Comment 4 b. 2021-04-04 17:24:22 UTC
Created attachment 170952 [details]
140691_testsheet_0.xlsx file showing evil effect of different handling the order of operands in a range by calc, ex$el, in the sheet, and in and their statusbars / auto-sum

@xisco: yes sir! yes! 

attaching *.xls and *.xlsx as i found no way to preserve the formulas in an *.ods when opening with ex$el, explanations in the sheets, should be macro and virus free ...
Comment 5 QA Administrators 2021-04-05 03:54:28 UTC Comment hidden (obsolete)
Comment 6 Mike Kaganski 2021-04-20 11:59:24 UTC
This must be WONTFIX.

We do not try to follow the order of evaluation that Excel does, when it's unspecified in any standard. They may have different data layout that makes their specific order more efficient, and our data layout may make another order preferred; also they may change it tomorrow, without notice. It's for purpose that various standards do not define the order of evaluation.

And then, if you think that it would "improve compatibility", then you are breaking compatibility with our documents. Then, what about Excel following the order used in Calc when opening ODS?

It's just splitting hairs, and is absolutely useless to pursue this. As said in the bug 55960 comment 22 cited here in comment 0, we only worry about real problems, and suggestions that allow to improve calculations. E.g., Excel gives 0 for SUM(1e20;-1;-1e20) and 1 for SUM(1e20;-1e20;-1), and it would be just fine for Calc to start producing proper -1 for both; trying to mimic Excel here is *not* a goal (which would be, if we accept this bug).
Comment 7 b. 2021-04-21 06:37:03 UTC
@Mike Kaganski: 

i won't accept a 'wontfix', 

i'm aware that it's a difficult thing, but it's a general problem which needs care and awareness, what - imho - was neglected yet, 

ex$el compatibility might be no goal for you, but most users demand it, and it's been a decisive argument in many other questions, if calc stays away from it in this respect it should be clearly communicated to the developers, supporters and users, 

as far as it's a matter of 'standards' it's a clear weakness of them, and should be argued about to change and how to define, 

as far as there are three concepts conflicting with each other (backward compatibility to calc vs. ex$el compatibility vs. progress and mathematical correctness) ... there is hardly another way than to wake awareness for that and implement three different modi, which each address one target without compromising, 

as far as you suggest to aim at mathematically correct results as a concept against lazy compromises - which you often liked to wipe aside as too exotic or too difficult - i'm fully behind you, changing subject, two tips: 
- this would also affect the backwards compatibility inside calc, 
- have a look at gnumeric, they have solved this point better, 

> This must be WONTFIX.
- see above, 

> We do not try to follow the order of evaluation that Excel does, when it's unspecified in any standard. 
- i know it's hard for an arrogant person like you, but pls. consider that in some way ex$el is a 'de facto standard' in the world we have, 

> They may have different data layout that makes their specific order more efficient, and our data layout may make another order preferred; 
- yeah, and it's time to get that noticed and be aware of, 

> also they may change it tomorrow, without notice. 
- i can't believe you really believe that, they are! ... 'somewhat problematic' ... but not stupid enough to shock all their users and break their! backward compatibility, 

> It's for purpose that various standards do not define the order of evaluation.
- on purpose? too difficult so far? or just forgotten? - if you say 'on purpose': 1. evidence please, 2. what purpose? 

> And then, if you think that it would "improve compatibility", then you are breaking compatibility with our documents. Then, what about Excel following the order used in Calc when opening ODS?
- i admit it's a difficult thing, i'm yet short to know the ultima ratio, but i know it's relevant to care for, 
- imho it's not a problem of the file, but of calculation sequence at runtime, as of now i expect the results stored in the files to be different if saved by calc or ex$el, 

> It's just splitting hairs, and is absolutely useless to pursue this. 
- no, it would reduce plenty of problems we - and the users - are facing today, at least being aware of such things, 

> As said in the bug 55960 comment 22 cited here in comment 0, we only worry about real problems, 
- it is! a real problem, 
  - as it affects results, and 
  - as you haven't been aware of it, 

> and suggestions that allow to improve calculations. 
- i hope you won't start citing Adolf Hitler? who once spouted such nonsense as: 'Only the one who can solve a task better is entitled to criticism.' 
no, we all make better progress if even the people who can not yet eliminate errors are allowed to point them out. 

> E.g., Excel gives 0 for SUM(1e20;-1;-1e20) and 1 for SUM(1e20;-1e20;-1), and it would be just fine for Calc to start producing proper -1 for both; 
- that sentence gives some hope you change your stubborn mantra 'fp-math is imprecise, it has to stay like that' (i know that you worked for progress, need not to mention, but you humiliate and demotivate other people who try to do this to an extent ... well, we've talked about this before ... shake you ...)  

> trying to mimic Excel here is *not* a goal (which would be, if we accept this bug).
- it's a quite easy 5 step concept: 
1. users demand what they understand, 
2. they would understand correct math, 
3. they would accept wrong result like ex$el, 
4. they will complain about wrong results different than ex$el, thus: 
5. to avoid 4. while you can not reach 2. try 3. (reaching 2. would be better ...) 

doe's 'needsDevEval' say 'easyhack'? then pls. remove, thought 'evaluation by devs'
Comment 8 Mike Kaganski 2021-04-21 07:32:03 UTC
(In reply to b. from comment #7)
> as far as it's a matter of 'standards' it's a clear weakness of them, and
> should be argued about to change and how to define, 
> > They may have different data layout that makes their specific order more efficient, and our data layout may make another order preferred; 
> - yeah, and it's time to get that noticed and be aware of, 

I don't see what you are trying to tell with this "it's time to get that noticed and be aware of". You can't be aware of the undocumented data layout in a closed-source program like Excel. You can't rely on it even if you somehow made assumptions and reverse-engineered that.

> > also they may change it tomorrow, without notice. 
> - i can't believe you really believe that, they are! ... 'somewhat
> problematic' ... but not stupid enough to shock all their users and break
> their! backward compatibility, 

No that will not break "their! backward compatibility". It would of course need to keep the major picture (e.g., it would be illegal to return 0 where they returned 1), but it is OK for any spreadsheet application to change the result in 15th significant decimal, if that is a justified change.

> > It's for purpose that various standards do not define the order of evaluation.
> - on purpose? too difficult so far? or just forgotten? - if you say 'on
> purpose': 1. evidence please, 2. what purpose? 

E.g., a purpose to allow implementers to use different, unknown in advance, optimization strategies (like parallel access). Spreadsheets are well-espablished, and didn't appear yesterday. They have international standards with several revisions, each approved by international bodies of experts. They all know what order of evaluation is (such order is always defined e.g. in most programming languages, and people there in TCs are perfectly aware of pros and contras of strict specification of that). It's just not reasonable to imagine that omission of this spec in all published  standards is "shortcoming" or "omission"; it was mentioned in many places like [1], and is not a well-hidden secret.

> > And then, if you think that it would "improve compatibility", then you are breaking compatibility with our documents. Then, what about Excel following the order used in Calc when opening ODS?
> - imho it's not a problem of the file, but of calculation sequence at
> runtime, as of now i expect the results stored in the files to be different
> if saved by calc or ex$el, 

It's a problem of *standard*. If you assume that e.g. XLS(X) formulas need to have some *specific* order of evaluation, then it would be normal that MS defined it; but then the ODS spreadsheets are a separate thing, and they have features not present in XLSX - so following the train of thought, the order of evaluation of arguments in *that* file format would be something that *its* author (OOO and later LO) defined, so - following it further - MS Excel should now start to evaluate formulas differently when it opens ODS.

> > and suggestions that allow to improve calculations. 
> - i hope you won't start citing Adolf Hitler? who once spouted such nonsense
> as: 'Only the one who can solve a task better is entitled to criticism.' 
> no, we all make better progress if even the people who can not yet eliminate
> errors are allowed to point them out. 

You are attacking personally, putting words that I didn't tell into my mouth. I never wrote that only those who can change can suggest, it's your imagination that told you that. A suggestion "let's calculate SUM(1e20;-1;-1E20) to correct -1" *is* a suggestion that allows to improve calculations, and suggesting it does not require a CS degree; but "let's make LO follow unspecified order of evaluation of a closed-source application" is not.

[1] https://www.sciencedirect.com/science/article/pii/S2590118420300204
Comment 9 b. 2021-04-21 11:23:33 UTC
Created attachment 171329 [details]
a_file_with_some_calc_only_processing_sequence_problems

it's not only ex$el compatibility, but already calc in itself having problems with different processing order, see e.g. the statusbar deviations, and attached sample, 

the red marked results are different than one would expect, 'sum' and 'sumproduct' calculate in different order, without that the pairs of results should at least be identic, 

i put that first as it might point into the right direction: 

i found something about sequences in the OASIS doc: https://docs.oasis-open.org/office/OpenDocument/v1.3/cs02/part4-formula/OpenDocument-v1.3-cs02-part4-formula.html#Operators : 

"4.11.12 Sequences (NumberSequence, NumberSequenceList, DateSequence, LogicalSequence, and ComplexSequence)

Inside a sheet, it is implementation-defined as to whether the values are processed row-at-a-time or column-at-a-time, but it shall be one of these two processing orders. If processing row-at-a-time, the sequence shall be produced by processing each row in turn, from smallest to largest column value (e.g., A1, B1, C1). If processing column-at-a-time, the sequence shall be produced by processing each column at a time, from the smallest to the largest row value (e.g., A1, A2, A3)." 

can't say if that is applicable here or might only be used as a hint that 'processing sequence matters' and 'there are ideas about guidelines', i ask that the more experienced developers check that. 

if that applies it's meaningful, but should - imho - be refined as it matters if you calculate a range row by row or column by column, results may be different. if one cannot agree to a standard give it to the user, make processing: 
*(A1:C3) -> A1, B1, C1, A2, B2, C2, A3, B3, C3,* 
(C1:A3) -> C1, B1, A1, C2, B2, A2, C3, B3, A3,
(A3:C1) -> A3, B3, C3, A2, B2, C2, A1, B1, C1,
(C3:A1) -> C3, B3, A3, C2, B2, A2, C1, B1, A1,
*(1A:3C) -> A1, A2, A3, B1, B2, B3, C1, C2, C3,* 
(3A:1C) -> A3, A2, A1, B3, B2, B1, C3, C2, C1, 
(1C:3A) -> C1, C2, C3, B1, B2, B3, A1, A2, A3, 
(3C:1A) -> C3, C2, C1, B3, B2, B1, A3, A2, A1, 


(In reply to Mike Kaganski from comment #8)
> (In reply to b. from comment #7)
> > as far as it's a matter of 'standards' it's a clear weakness of them, and
...
> > - yeah, and it's time to get that noticed and be aware of, 
> 
> I don't see what you are trying to tell with this "it's time to get that
> noticed and be aware of". You can't be aware of the undocumented data layout
> in a closed-source program like Excel. You can't rely on it even if you
> somehow made assumptions and reverse-engineered that.

it's three things: 
1. be aware of calculation sequence issues, 
2. if we try to position LO calc as a good substitute for ex$el, there are a couple of requirements, 
3. aiming compatibility to *.xls and *.xlsx file format relies on reverse engineered assumptions and is subject to power to change by M$ as well, 


> result in 15th significant decimal, if that is a justified change.
- regarding operation sequence / operand ordering besides mostly only injecting small irritations we are talking about the risk of much bigger differences up to total fail, 


> E.g., a purpose to allow implementers to use different, unknown in advance,
> optimization strategies (like parallel access). Spreadsheets are
> well-espablished, and didn't appear yesterday. They have international
> standards with several revisions, each approved by international bodies of
> experts. They all know what order of evaluation is (such order is always
> defined e.g. in most programming languages, and people there in TCs are
> perfectly aware of pros and contras of strict specification of that). It's
> just not reasonable to imagine that omission of this spec in all published 
> standards is "shortcoming" or "omission"; it was mentioned in many places
> like [1], and is not a well-hidden secret.

will read [1] in detail, on a first glance i'd say they are not a normative committe and talking with too simple scope not taking round-off and cancellation into account ... 


> It's a problem of *standard*. If you assume that e.g. XLS(X) formulas need
> to have some *specific* order of evaluation, then it would be normal that MS
> defined it; but then the ODS spreadsheets are a separate thing, and they
> have features not present in XLSX - so following the train of thought, the
> order of evaluation of arguments in *that* file format would be something
> that *its* author (OOO and later LO) defined, so - following it further - MS
> Excel should now start to evaluate formulas differently when it opens ODS.

imho - but am at the beginning to think about - it's less a point of the file or file format, but of the interpretation of arguments to a function by the program processing the function, i also thought about defining file format related processing sequences, think that would make things worse than better ??? 


> > > and suggestions that allow to improve calculations. 
> > - i hope you won't start citing Adolf Hitler? who once spouted such nonsense
> > as: 'Only the one who can solve a task better is entitled to criticism.' 
> > no, we all make better progress if even the people who can not yet eliminate
> > errors are allowed to point them out. 
> 
> You are attacking personally, 
don't be so / too sensitive, after i've been called an idiot i'm allowed to express myself concretely, 

and yes, you are right, no, you did not quote A.H., but 'we accept only this and that' is close to his egocentric way, that's why I warned to repeat errors of him before! it happens. 


> make LO follow unspecified order of evaluation of a closed-source
> application" is not.
that's M.K. making the same fault he just critizied at others, i never said that, i said i know the - actual - M$ processing sequence and know that it's different to calc, know that it's injecting problems and propose to discuss that, it's not my fault that calc is a excel clone in many respects, i didn't design or program it, i pinpoint some weaknesses, and that's as well that the processing order is problematic for compatibility, as well as it's a problem in calc itself, as well that it's obviously neglected by the programmers and devs and you! so far.  

and instead of meaningful discussions and bringing in your knowledge to benefit the cause, you make stupid statements, and then can't take the answers
Comment 11 b. 2021-04-22 18:17:42 UTC
Created attachment 171360 [details]
sample comparing different operand sequences affecting the sum

did a little more investigation, and found something what blocks consistent sheets in calc ... 

calc has no fixed order how it calculates ranges, 

from experiments with three operands i concluded column by column from right to left and in the column bottom up, 

but attached sheet tells me - already with 4 operands - it's somehow not a fixed one-way work through the stack, and additional reacting on free cells between the operands, 

this blocks: 

ex$el compatibility, 

any efforts to gain better results by operand ordering, what can! be done in ex$el, 

thus in this point calc is weak against excel, and weak in itself, which may contribute to plenty other irritations,
Comment 12 b. 2021-04-26 19:50:15 UTC
hello @Regina Henschel, hello @erAck, 

hope it's ok cc-ing you, i see @erAck as senior programmer and remember @Regina as interested in and with knowledge about Standards, 

i see a problem with operand ordering, have discussed in private with @Mike Kaganski, we can agree: 

- there is a rule in which order operands of e.g. sums over ranges have to be processed: https://docs.oasis-open.org/office/OpenDocument/v1.3/cs02/part4-formula/OpenDocument-v1.3-cs02-part4-formula.html#__RefHeading__1432166_1318037074, there section '4.11.12 Sequences', 

- calc disregards this rule by calculating e.g. '=SUM(A1:A3)' not in the order A1 + A2 + A3, 

- the calculation order is important because different sequences of the same operands sometimes produce different results (non-associativity), 

insofar 'not standard compliant', 'not ex$el compatible' and 'difficult to control for users because results are not predictable', 

we disagree if the argument list of a formula has to be processed in a certain order, 

Example: '=RAWSUBTRACT(A1; A2; A3)' calculates calc as 'A1 - A3 - A2', 

I think it makes sense to make this intuitive for the users or 'according to normal mathematical logic', also I see 'When processing a ReferenceList, the references are processed in order (first, second if any, and so on).' from the standard applicable here, @Mike Kaganski disagrees. 

If it is not required by the standard so far I plead for it to be included in future versions, it just makes sense. 

Further one could / should include in the standard: 'the prescribed processing order may be deviated from if mathematically correct results are achieved with a different order'. 

(something like that is done by gnumeric ...)

And in calc I plead for a work through and adaptation to the standard, simply because it is right. I think we cannot wait for the implementation of 'Kahan' (from which @Mike Kaganski expects a lot) to solve all accuracy problems. that is good for a 'monolithic', 'smooth' stream of values, for a complex sheet with complicated interwoven operations and dependencies it becomes ... complex and complicated. It will not work without storing an additional correction value for almost every cell, this changes the data structure ... larger project. There the activation and use of 'long doubles' would be easier to implement? ... imho ...
Comment 13 Eike Rathke 2021-04-27 10:33:34 UTC
(In reply to b. from comment #12)
> - calc disregards this rule by calculating e.g. '=SUM(A1:A3)' not in the
> order A1 + A2 + A3, 

Is changing with implementation of KahanSum for bug 137679 anyway.

> Example: '=RAWSUBTRACT(A1; A2; A3)' calculates calc as 'A1 - A3 - A2', 

I already mentioned in https://bugs.documentfoundation.org/show_bug.cgi?id=137679#c19 that we can reverse order of processing for RAWSUBTRACT().

> I think it makes sense to make this intuitive for the users or 'according to
> normal mathematical logic', also I see 'When processing a ReferenceList, the
> references are processed in order (first, second if any, and so on).' from
> the standard applicable here, @Mike Kaganski disagrees. 

I agree with Mike disagreeing. A ReferenceList is *not* a list of parameters. A ReferenceList is a type of *one* argument passed where the argument is a list of references. See
https://docs.oasis-open.org/office/OpenDocument/v1.3/cs02/part4-formula/OpenDocument-v1.3-cs02-part4-formula.html#__RefHeading__1017906_715980110
Comment 14 Eike Rathke 2021-04-27 10:46:22 UTC
And please stop your silly childish use of ex$el instead of Excel. I also think everyone would appreciate if you finally after been pointed out for months or years could write actual sentences using some sort of proper grammar and punctuation and write short, precise and concise statements. You're annoying the hell out of readers who still try to grasp your blurbs. Fwiw, I don't read most of it anymore because it wastes my time.