Bug 121103 - ANOVA uses wrong formula when on new sheet
Summary: ANOVA uses wrong formula when on new sheet
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
6.1.2.1 release
Hardware: All All
: medium normal
Assignee: Jaromír Wysoglad
URL:
Whiteboard: target:6.3.0 target:6.2.4 target:6.1....
Keywords: difficultyBeginner, easyHack, skillCpp, topicDebug
Depends on:
Blocks: Data-Statistics
  Show dependency treegraph
 
Reported: 2018-11-01 20:41 UTC by Sondra Kinsey
Modified: 2019-04-04 15:28 UTC (History)
6 users (show)

See Also:
Crash report or crash signature:
Regression By:


Attachments
Example of ANOVA calculation problem (31.03 KB, application/vnd.oasis.opendocument.spreadsheet)
2019-01-09 19:26 UTC, Libomark
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Sondra Kinsey 2018-11-01 20:41:47 UTC
Description:
In LibreOffice 6.1.2.1, when I create a new ANOVA by Menu/Data/Statistics/ANOVA, if the "Results to" field is on a different sheet than the original data, then the Sum of Squares" value for the "Within Groups" comparison is incorrect.
Specifically, the formula for this cell doesn't specify the sheet for the cells referred to. For my data, the formula for this cell was this:
  =SUM(DEVSQ($A$2:$A$30),DEVSQ($B$2:$B$30),DEVSQ($C$2:$C$30),DEVSQ($D$2:$D$30))
when it should have been this:
  =SUM(DEVSQ($Sheet3.$A$2:$A$30),DEVSQ($Sheet3.$B$2:$B$30),DEVSQ($Sheet3.$C$2:$C$30),DEVSQ($Sheet3.$D$2:$D$30))
To the user, this usually appears as a "#DIV/0" error, because $A$2:$A$30 is partially empty in the current ANOVA results sheet. Instead the formula needs to refer to $A$2:$A$30 in the data sheet.


Steps to Reproduce:
1. Select data
2. Menu/Data/Statistics/ANOVA
3. Set "Results to" field of dialog to any cell on a different sheet than the original data.

Actual Results:
The "Sum of Squares" (SS) value for the "Within Groups" comparison is "#DIV/0". The formula refers to the sheet the ANOVA is found on.

Expected Results:
The formula refers to the sheet the originally selected data is found on.


Reproducible: Always


User Profile Reset: No



Additional Info:
Comment 1 m.a.riosv 2018-11-01 22:29:00 UTC
Please rephrase the title, to explain in breve the issue.
Comment 2 Joel Madero 2018-11-20 11:37:06 UTC
We need a simple (very simple) test document. Marking as NEEDINFO, once attachment is added set to UNCONFIRMED. Thank you.
Comment 3 Libomark 2019-01-09 19:26:24 UTC
Created attachment 148180 [details]
Example of ANOVA calculation problem

This is an example sheet illustrating the problem.  Data is in Sheet1.b1:d30

ANOVA (single factor) was performed with Sheet1.G2 as the output start cell: the generated formula for total variation in Sheet1.H13 is incorrect, producing Err:508 - it should simply be SUM(Sheet1.H11:H12) 

ANOVA (single factor) was performed again with Sheet2.B1 as the output start cell.  Cells C11 and C12 on Sheet2 (highlighted in yellow) contain incorrect formulae, as specified above and in the original comment.  This echoes through to dependent cells.

ANOVA was also performed on a two factor basis with Sheet2.B19 as the destination.  This appears to work correctly.
Comment 4 Libomark 2019-01-12 23:20:57 UTC
This should be a very easy fix for anyone who has worked on Scaddins: most of the formulae produced by ANOVA are correct, so there is already a good template for the two that need amending. 

An alternative for the Total cell - Sheet2.C12 in the example file is to replace

=DEVSQ($Sheet1.$B$1:$B$30,$Sheet1.$C$1:$C$30,$Sheet1.$D$1:$D$30)
with
=DEVSQ(($Sheet1.$B$1:$B$30),($Sheet1.$C$1:$C$30),($Sheet1.$D$1:$D$30))

only the extra brackets need inserting - the range and sheet references are correct 

although =SUM($Sheet2.$C$10:$C$11) is simpler as a calculation for the same result

(In this context Sheet2=OutputSheet Sheet1=DataSheet)

Sondra has nailed the amendment needed to the "Withing groups" calculation - refer to the data input sheet in the formula references, rather than omitting it.
Comment 5 Libomark 2019-01-15 02:19:43 UTC
I have discovered that the Err508 problem has been addressed:

https://bugs.documentfoundation.org/show_bug.cgi?id=80583

That still leaves the problem for output on a different sheet to the input data.

The discussion at 80583 is helpful to someone seeking to resolve this bug, because it refers to the code location.
Comment 6 Sondra Kinsey 2019-01-17 21:53:25 UTC
Relevant code is at /sc/source/ui/StatisticsDialogs/AnalysisOfVarianceDialog.cxx

Link: https://gerrit.libreoffice.org/plugins/gitiles/core/+/8e1f198ae521c4b99ce5e8b03bf2c72849badbd2/sc/source/ui/StatisticsDialogs/AnalysisOfVarianceDialog.cxx
Comment 8 Libomark 2019-01-22 23:27:59 UTC
Dennis Francis, author of the new regression statistics interface to be released in 6.2, seems to think that his version (which also provides ANOVA results) works OK:

https://dennisfrancis.wordpress.com/2018/07/03/multivariate-regression-in-calc/#comment-56

He has a slightly different approach to coding, as revealed here:

https://cgit.freedesktop.org/libreoffice/core/commit/?id=b7a02f2bb66b990289eb7f4dc80069d1545179a4
Comment 9 Xisco Faulí 2019-03-21 12:31:52 UTC
Hi Dennis Francis,
I thought you could be interested in this issue...
Is it already implemented by your multivariate regression work ?
Comment 10 Dennis Francis 2019-03-21 13:04:29 UTC
The problem is at https://opengrok.libreoffice.org/xref/core/sc/source/ui/StatisticsDialogs/AnalysisOfVarianceDialog.cxx?r=470a91cb#70 where the ScRefFlags flag is hard-coded to ScRefFlags::RANGE_ABS. Instead it should be decided based on whether the input and output ranges/addresses are in same sheet or not - if they are in the same sheet then ScRefFlags::RANGE_ABS should be fine, else ScRefFlags::RANGE_ABS_3D should be used.

I think this bug can be marked as an easy-hack for the new contributors with the above code-pointer to help them.
Comment 11 Dennis Francis 2019-03-21 13:08:40 UTC
(In reply to Xisco Faulí from comment #9)
> Hi Dennis Francis,
> I thought you could be interested in this issue...
> Is it already implemented by your multivariate regression work ?

Hi Xisco, 

In the regression dialog, it checks if result sheet is same as the input sheet and uses this info to format the addresses/ranges in all formulae written to the result sheet.
Comment 12 Jaromír Wysoglad 2019-03-28 10:18:09 UTC
Hi, I'm quite new here, but if I understand it correctly, the only difference between the ScRefFlags::RANGE_ABS and ScRefFlags::RANGE_ABS_3D is just that the ScRefFlags::RANGE_ABS_3D adds $Sheet to the source cell address. So, wouldn't it be enough to just hard code the value to ScRefFlags::RANGE_ABS_3D and let it add the $Sheet to the address every time? The other ANOVA formulas also reference the sheet every time (even if the results are on the same sheet as the sources).
Comment 13 Dennis Francis 2019-03-28 13:29:16 UTC
(In reply to Jaromír Wysoglad from comment #12)
> Hi, I'm quite new here, but if I understand it correctly, the only
> difference between the ScRefFlags::RANGE_ABS and ScRefFlags::RANGE_ABS_3D is
> just that the ScRefFlags::RANGE_ABS_3D adds $Sheet to the source cell
> address. So, wouldn't it be enough to just hard code the value to
> ScRefFlags::RANGE_ABS_3D and let it add the $Sheet to the address every
> time? The other ANOVA formulas also reference the sheet every time (even if
> the results are on the same sheet as the sources).

Yes sure, here it does not matter if you hardcode the *3D flags as you say. For the purposes of fixing the bug it is just fine.

Generally it is better to use *3D sparingly when input and output sheets are same. Consider a very common practical use-case where the user wants to do same analysis (especially with a dialog with many controls) on say 10 sheets where in each sheet the input data is different but at the same location and has same shape. Using the dialog the user can get the results for sheet1 on sheet1 and then simply copy the output range to all other sheets at same relative location instead of using the dialog 10 times !
Comment 14 Jaromír Wysoglad 2019-03-28 19:15:02 UTC
(In reply to Dennis Francis from comment #13)
> Yes sure, here it does not matter if you hardcode the *3D flags as you say.
> For the purposes of fixing the bug it is just fine.
> 
> Generally it is better to use *3D sparingly when input and output sheets are
> same. Consider a very common practical use-case where the user wants to do
> same analysis (especially with a dialog with many controls) on say 10 sheets
> where in each sheet the input data is different but at the same location and
> has same shape. Using the dialog the user can get the results for sheet1 on
> sheet1 and then simply copy the output range to all other sheets at same
> relative location instead of using the dialog 10 times !

I now understand the benefits of not always using the *3D flag, so I made it to determine if it is needed as you originally sugested.
Comment 15 Eike Rathke 2019-03-29 19:08:38 UTC
(In reply to Dennis Francis from comment #13)
> Generally it is better to use *3D sparingly when input and output sheets are
> same. Consider a very common practical use-case where the user wants to do
> same analysis (especially with a dialog with many controls) on say 10 sheets
> where in each sheet the input data is different but at the same location and
> has same shape. Using the dialog the user can get the results for sheet1 on
> sheet1 and then simply copy the output range to all other sheets at same
> relative location instead of using the dialog 10 times !
As is, that wouldn't work anyway because all other generated formulas pointing into the ANOVA output written use 3D references. So if we wanted to be able to copy&paste to a different sheet all those references would need to be written without 3D.
Comment 16 Commit Notification 2019-04-02 20:20:30 UTC
Jaromir Wysoglad committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/+/66c08a0d308fd6119460546a906434bbc9d23c2e%5E%21

tdf#121103 ANOVA sum of squares formula correction

It will be available in 6.3.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 17 Eike Rathke 2019-04-02 20:21:06 UTC
Pending review https://gerrit.libreoffice.org/70153 for 6-2
Comment 18 Eike Rathke 2019-04-02 20:22:33 UTC
https://gerrit.libreoffice.org/70155 for 6-1
Comment 19 Commit Notification 2019-04-03 09:19:29 UTC
Jaromir Wysoglad committed a patch related to this issue.
It has been pushed to "libreoffice-6-2":

https://git.libreoffice.org/core/+/205f9756f70fe32db169d6f81795f4c66382d014%5E%21

tdf#121103 ANOVA sum of squares formula correction

It will be available in 6.2.4.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 20 Commit Notification 2019-04-03 09:19:41 UTC
Jaromir Wysoglad committed a patch related to this issue.
It has been pushed to "libreoffice-6-1":

https://git.libreoffice.org/core/+/1dc8088162a66a99217dae3258711cbfa8f9bc28%5E%21

tdf#121103 ANOVA sum of squares formula correction

It will be available in 6.1.6.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 21 Eike Rathke 2019-04-03 09:24:51 UTC
So now that the actual bug is fixed, one could work on a non-3D vs 3D references change, i.e. using 3D references to data only when the calculation is on a different sheet, and within the ANOVA formulas use only non-3D references.
Comment 22 Commit Notification 2019-04-04 15:28:59 UTC
Jaromir Wysoglad committed a patch related to this issue.
It has been pushed to "libreoffice-6-2-3":

https://git.libreoffice.org/core/+/7708861765dae49a60386774970db21e598be7c1%5E%21

tdf#121103 ANOVA sum of squares formula correction

It will be available in 6.2.3.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.