Bug 132983 - Automatically calculate known variance values for Z-test
Summary: Automatically calculate known variance values for Z-test
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
6.4.2.2 release
Hardware: x86-64 (AMD64) Windows (All)
: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Data-Statistics
  Show dependency treegraph
 
Reported: 2020-05-12 16:15 UTC by jfwpublic
Modified: 2023-04-03 04:51 UTC (History)
4 users (show)

See Also:
Crash report or crash signature:


Attachments
spreadsheet of data and running LO data|statistic|z-test (91.09 KB, application/vnd.oasis.opendocument.spreadsheet)
2020-05-12 16:19 UTC, jfwpublic
Details
F-test result table (114.42 KB, image/png)
2020-09-05 12:56 UTC, Felipe Viggiano
Details
Z-test result table (56.15 KB, image/png)
2020-09-05 12:56 UTC, Felipe Viggiano
Details

Note You need to log in before you can comment on or make changes to this bug.
Description jfwpublic 2020-05-12 16:15:08 UTC
Description:
Air Quality data, two sites with identical latitude and longitude. testing data from 2007 to 2020 by year to compare reported readings. Used t-pair and as number of readings each year exceeds 30 also using z-test.


Steps to Reproduce:
1.highlight the two columns of data eg all rows where year in 2008
2.select data|statistics|z-test
3.enter destination cell and enter

4. save the file after each year

Actual Results:
z	#DIV/0!
z-test		
Alpha	0.05	
Hypothesised Mean Difference	0	
	Variable 1	Variable 2
Known Variance	0	0
Mean	17.2256168965517	18.5240762068965
Observations	290	290
Observed Mean Difference	-1.29845931034482	
differs according to year
P (Z<=z) one-tail	#DIV/0!
z Critical one-tail	1.64485362695147
P (Z<=z) two-tail	#DIV/0!
z Critical two-tail	1.95996398454005
repeats for all years 

Expected Results:
?
Not able to manually calculate. Do not have minitab, matlab, R etc.


Reproducible: Always


User Profile Reset: No



Additional Info:
Version: 6.4.2.2 (x64)
Build ID: 4e471d8c02c9c90f512f7f9ead8875b57fcb1ec3
CPU threads: 2; OS: Windows 10.0 Build 18362; UI render: default; VCL: win; 
Locale: en-GB (en_GB); UI-Language: en-GB
Calc: threaded
the *.ods file attached as the data is in the public domain.
raw data in *.csv file as hourly readings extracted from airqualityengland. loaded into a database, code (object pascal) to determine daily quartile,mean,stddev, SQL to "select" data for the two sites by date, by pollutant, copied data into calc
Comment 1 jfwpublic 2020-05-12 16:19:47 UTC
Created attachment 160732 [details]
spreadsheet of data and running LO data|statistic|z-test
Comment 2 Buovjaga 2020-09-03 12:03:40 UTC
I am ignorant of statistical methods, but I looked into it a bit and asked on documentation list. You are not supposed to leave the "Known variance" values as zero: https://listarchives.libreoffice.org/global/documentation/2020/msg01762.html

The Help and Calc guides should be updated to make this clear.
Comment 3 Felipe Viggiano 2020-09-05 03:54:37 UTC
Doing a little research on google and found this website doing some explanation about Z-Test:
https://www.solver.com/z-test-two-sample-means

When running the example in Calc and manually inserting the variances in the cells of the Z-Test results, the output was identical to the example on the website.

So, the known variance should not be zero, and Calc should be able to calculate it from the data given in the dialog.

I' not a developer myself (just starting to learn), but looking in the codes (core/sc/source/ui/StatisticsDialogs/ZTestDialog.cxx), and it appears that the Know Variance values are set to zero:

// Known Variance
    aOutput.writeString(ScResId(STR_ZTEST_KNOWN_VARIANCE));
    aOutput.nextColumn();
    aOutput.writeValue(0); <------
    aTemplate.autoReplaceAddress("%KNOWN_VARIANCE_VARIABLE1%", aOutput.current());
    aOutput.nextColumn();
    aOutput.writeValue(0); <------
    aTemplate.autoReplaceAddress("%KNOWN_VARIANCE_VARIABLE2%", aOutput.current());
    aOutput.newLine();




Perhaps if the output was changed to something like:

    aTemplate.setTemplate("=VAR(%VARIABLE1_RANGE%)");
    aTemplate.autoReplaceAddress("%VAR_VARIABLE1%", aOutput.current());
    aOutput.writeFormula(aTemplate.getTemplate());

and 

   aTemplate.setTemplate("=VAR(%VARIABLE2_RANGE%)");
    aTemplate.autoReplaceAddress("%VAR_VARIABLE2%", aOutput.current());
    aOutput.writeFormula(aTemplate.getTemplate());


I believe this could remove the #DIV/0! error.
Comment 4 Felipe Viggiano 2020-09-05 03:57:23 UTC
Complementing the comment above, this issue is also presente in LO 7.0.

Version: 7.0.0.3
Build ID: 8061b3e9204bef6b321a21033174034a5e2ea88e
CPU threads: 4; OS: Linux 5.4; UI render: default; VCL: gtk3
Locale: en-US (pt_BR.UTF-8); UI: en-US
Calc: threaded
Comment 5 Buovjaga 2020-09-05 05:36:34 UTC
(In reply to Felipe Viggiano from comment #3)
> So, the known variance should not be zero, and Calc should be able to
> calculate it from the data given in the dialog.

By this proposal, do you mean that additional fields should be added into the dialog like exist in Excel: https://www.dummies.com/software/microsoft-office/excel/how-to-perform-z-test-calculations-in-excel/

Or do you mean something else?
Comment 6 Felipe Viggiano 2020-09-05 12:55:11 UTC
(In reply to Buovjaga from comment #5)
 
> By this proposal, do you mean that additional fields should be added into
> the dialog like exist in Excel:
> https://www.dummies.com/software/microsoft-office/excel/how-to-perform-z-
> test-calculations-in-excel/
> 
> Or do you mean something else?

Actually I was thinkking in simply insert the variance caculation in the ¨Known variance¨ cells on the results table, instead of setting it to zero. This is already done in the F-test function.
(core/sc/source/ui/StatisticsDialogs/FTestDialog.cxx --> lines 88 to 97) I have uploaded some images of the F-test and the Z-test results. Note the the variance id calculate on the F-test but in the Z-test it is set to zero.


But, if a change in the dialog could be done, that would be even better.
In this case, I would suggest inserting a condition where Calc calculate the variance from the data given if the user keep a null value in the known variance fields.
Comment 7 Felipe Viggiano 2020-09-05 12:56:22 UTC
Created attachment 165179 [details]
F-test result table
Comment 8 Felipe Viggiano 2020-09-05 12:56:50 UTC
Created attachment 165180 [details]
Z-test result table
Comment 9 Buovjaga 2020-09-05 13:18:18 UTC
Ok, let's repurpose this report.
Comment 10 Felipe Viggiano 2023-03-19 00:18:07 UTC
This situation is still going on on LibreOffice 7.4.

When using the z-test feature, LibreOffice is still failing to calculate the variance of the dataset. It continues to return a a zero value for the variance of both inputs, when it should be using a formula like =VAR(Input).

To reproduce:

Insert two columns of random data (about 10 lines is fine). Go to the Z-test feature in Data/Statistics and perform a Z-test using those two.



Version: 7.4.6.2
Build ID: 40(Build:2)
CPU threads: 8; OS: Linux 6.1; UI render: default; VCL: gtk3
Locale: pt-BR (pt_BR.UTF-8); UI: pt-BR
Calc: threaded