Description: Air Quality data, two sites with identical latitude and longitude. testing data from 2007 to 2020 by year to compare reported readings. Used t-pair and as number of readings each year exceeds 30 also using z-test. Steps to Reproduce: 1.highlight the two columns of data eg all rows where year in 2008 2.select data|statistics|z-test 3.enter destination cell and enter 4. save the file after each year Actual Results: z #DIV/0! z-test Alpha 0.05 Hypothesised Mean Difference 0 Variable 1 Variable 2 Known Variance 0 0 Mean 17.2256168965517 18.5240762068965 Observations 290 290 Observed Mean Difference -1.29845931034482 differs according to year P (Z<=z) one-tail #DIV/0! z Critical one-tail 1.64485362695147 P (Z<=z) two-tail #DIV/0! z Critical two-tail 1.95996398454005 repeats for all years Expected Results: ? Not able to manually calculate. Do not have minitab, matlab, R etc. Reproducible: Always User Profile Reset: No Additional Info: Version: 6.4.2.2 (x64) Build ID: 4e471d8c02c9c90f512f7f9ead8875b57fcb1ec3 CPU threads: 2; OS: Windows 10.0 Build 18362; UI render: default; VCL: win; Locale: en-GB (en_GB); UI-Language: en-GB Calc: threaded the *.ods file attached as the data is in the public domain. raw data in *.csv file as hourly readings extracted from airqualityengland. loaded into a database, code (object pascal) to determine daily quartile,mean,stddev, SQL to "select" data for the two sites by date, by pollutant, copied data into calc
Created attachment 160732 [details] spreadsheet of data and running LO data|statistic|z-test
I am ignorant of statistical methods, but I looked into it a bit and asked on documentation list. You are not supposed to leave the "Known variance" values as zero: https://listarchives.libreoffice.org/global/documentation/2020/msg01762.html The Help and Calc guides should be updated to make this clear.
Doing a little research on google and found this website doing some explanation about Z-Test: https://www.solver.com/z-test-two-sample-means When running the example in Calc and manually inserting the variances in the cells of the Z-Test results, the output was identical to the example on the website. So, the known variance should not be zero, and Calc should be able to calculate it from the data given in the dialog. I' not a developer myself (just starting to learn), but looking in the codes (core/sc/source/ui/StatisticsDialogs/ZTestDialog.cxx), and it appears that the Know Variance values are set to zero: // Known Variance aOutput.writeString(ScResId(STR_ZTEST_KNOWN_VARIANCE)); aOutput.nextColumn(); aOutput.writeValue(0); <------ aTemplate.autoReplaceAddress("%KNOWN_VARIANCE_VARIABLE1%", aOutput.current()); aOutput.nextColumn(); aOutput.writeValue(0); <------ aTemplate.autoReplaceAddress("%KNOWN_VARIANCE_VARIABLE2%", aOutput.current()); aOutput.newLine(); Perhaps if the output was changed to something like: aTemplate.setTemplate("=VAR(%VARIABLE1_RANGE%)"); aTemplate.autoReplaceAddress("%VAR_VARIABLE1%", aOutput.current()); aOutput.writeFormula(aTemplate.getTemplate()); and aTemplate.setTemplate("=VAR(%VARIABLE2_RANGE%)"); aTemplate.autoReplaceAddress("%VAR_VARIABLE2%", aOutput.current()); aOutput.writeFormula(aTemplate.getTemplate()); I believe this could remove the #DIV/0! error.
Complementing the comment above, this issue is also presente in LO 7.0. Version: 7.0.0.3 Build ID: 8061b3e9204bef6b321a21033174034a5e2ea88e CPU threads: 4; OS: Linux 5.4; UI render: default; VCL: gtk3 Locale: en-US (pt_BR.UTF-8); UI: en-US Calc: threaded
(In reply to Felipe Viggiano from comment #3) > So, the known variance should not be zero, and Calc should be able to > calculate it from the data given in the dialog. By this proposal, do you mean that additional fields should be added into the dialog like exist in Excel: https://www.dummies.com/software/microsoft-office/excel/how-to-perform-z-test-calculations-in-excel/ Or do you mean something else?
(In reply to Buovjaga from comment #5) > By this proposal, do you mean that additional fields should be added into > the dialog like exist in Excel: > https://www.dummies.com/software/microsoft-office/excel/how-to-perform-z- > test-calculations-in-excel/ > > Or do you mean something else? Actually I was thinkking in simply insert the variance caculation in the ¨Known variance¨ cells on the results table, instead of setting it to zero. This is already done in the F-test function. (core/sc/source/ui/StatisticsDialogs/FTestDialog.cxx --> lines 88 to 97) I have uploaded some images of the F-test and the Z-test results. Note the the variance id calculate on the F-test but in the Z-test it is set to zero. But, if a change in the dialog could be done, that would be even better. In this case, I would suggest inserting a condition where Calc calculate the variance from the data given if the user keep a null value in the known variance fields.
Created attachment 165179 [details] F-test result table
Created attachment 165180 [details] Z-test result table
Ok, let's repurpose this report.
This situation is still going on on LibreOffice 7.4. When using the z-test feature, LibreOffice is still failing to calculate the variance of the dataset. It continues to return a a zero value for the variance of both inputs, when it should be using a formula like =VAR(Input). To reproduce: Insert two columns of random data (about 10 lines is fine). Go to the Z-test feature in Data/Statistics and perform a Z-test using those two. Version: 7.4.6.2 Build ID: 40(Build:2) CPU threads: 8; OS: Linux 6.1; UI render: default; VCL: gtk3 Locale: pt-BR (pt_BR.UTF-8); UI: pt-BR Calc: threaded