Bug 105847 - SUMIF function gives unexpected / wrong results with custom ranges
Summary: SUMIF function gives unexpected / wrong results with custom ranges
Status: RESOLVED NOTABUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
5.2.5.1 release
Hardware: x86-64 (AMD64) Linux (All)
: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-02-08 04:28 UTC by Stéphane Guillou (stragu)
Modified: 2017-02-10 01:19 UTC (History)
3 users (show)

See Also:
Crash report or crash signature:


Attachments
testing spreadsheet with number of cases (orange highlights problematic results) (10.26 KB, application/vnd.oasis.opendocument.spreadsheet)
2017-02-08 04:29 UTC, Stéphane Guillou (stragu)
Details
Sample file adapted. (14.28 KB, application/vnd.oasis.opendocument.spreadsheet)
2017-02-09 10:56 UTC, m_a_riosv
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Stéphane Guillou (stragu) 2017-02-08 04:28:13 UTC
Description:
Using the SUMIF() function with a "range" and a "sum_range" that do not use the same row coordinates can give unexpected / erroneous results.

Steps to Reproduce:
1. Create a spreadsheet
2. Create a column with values for the test in SUMIF()
3. Create a column with values that will be summed by SUMIF(), i.e. the sum_range
4. Create a formula using SUMIF(), in which the test range and the sum range use different row coordinates (for example, whole column A:A for test range, and partial column B2:B5 for sum range)

Actual Results:  
In most cases, the result given will be unexpected / incorrect. No error is returned.

Interesting exception where results are right: when the test range is the only one that is restricted, and starts at the first row. (See rows 4 and 9 in attached document.)

Expected Results:
- The function should perform the calculation on a restricted range (for example, restrict the test range A:A to A2:A5 if the sum range is B2:B5, or restrict the sum range B:B to B4:B10 if the test range is A4:A10).

OR

- Return an error code if this is not the expected usage of the function.

See in attached document several examples, and a comparison with how the related function SUMIFS() behaves for the same conditions (an error code 502 is returned if the ranges don't match).

Not sure if the severity should stay at "Major" or should be changed to "Critical" as it can result silently result in erroneous calculations.

Testing on different versions (5.3 branch especially) and on other spreadsheet software would be appreciated.


Reproducible: Always

User Profile Reset: No

Additional Info:
Version: 5.2.5.1
Build ID: 1:5.2.5~rc1-0ubuntu1~trusty0
CPU Threads: 2; OS Version: Linux 3.13; UI Render: default; VCL: gtk2; 
Locale: en-GB (en_GB.UTF-8); Calc: group


User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:51.0) Gecko/20100101 Firefox/51.0
Comment 1 Stéphane Guillou (stragu) 2017-02-08 04:29:37 UTC
Created attachment 131001 [details]
testing spreadsheet with number of cases (orange highlights problematic results)
Comment 2 m_a_riosv 2017-02-08 23:17:59 UTC
There is not such bug, SUMIF doesn't take in account the end of the range to sume, in fact you can put =SUMIF(A2:A5; "yes"; B2) and it works fine. That I think it's not compatible with other programs.

But it is not the same for other conditional functions they must have the same range's length.
Comment 3 Stéphane Guillou (stragu) 2017-02-08 23:48:38 UTC
Hi m.a.riosv

Thanks for your comment, although I am not sure I understand what you are saying.

If this is not a bug, could you please explain each of the cases that are highlighted in orange in the example spreadsheet? (rows 2, 6, 7 and 10)
They are very unintuitive results. I am particularly interested in what you think of rows 6 and 7 as they don't even result in a multiple of 2.

When you say "you can put =SUMIF(A2:A5; "yes"; B2) and it works fine", do you mean that the sum_range "B2" is interpreted as "B2:B5"? (In which case, the result "4" would make sense)

Or do you mean that it is useless to define the end of the range for sum_range, and that inputting e.g. "B2:B5" would result in "B5" being completely ignored? If that is the case, again, it is a very unintuitive behaviour, and it would not explain the result "4".

Anyway, neither of those explanations would explain the bulk of the cases presented in the attached spreadsheet.

I am switching it back to unconfirmed as I expect more input from the community, and I don't think your comment explains the behaviour. But I could be missing something obvious...?
Comment 4 m_a_riosv 2017-02-09 10:56:50 UTC
Created attachment 131036 [details]
Sample file adapted.

In reply to stragu from comment #3)
> Hi m.a.riosv
> 
> ......
> 
> When you say "you can put =SUMIF(A2:A5; "yes"; B2) and it works fine", do
> you mean that the sum_range "B2" is interpreted as "B2:B5"? (In which case,
> the result "4" would make sense)
Yes for SUMIF() only
> 
> Or do you mean that it is useless to define the end of the range for
> sum_range, and that inputting e.g. "B2:B5" would result in "B5" being
> completely ignored? If that is the case, again, it is a very unintuitive
> behaviour, and it would not explain the result "4".
Yes for SUMIF() only
> 
> Anyway, neither of those explanations would explain the bulk of the cases
> presented in the attached spreadsheet.
> 
Attached you file with what the formulas are doing and the equivalent for SUMIFS, with all cases.

This site is for bug not for questions, so it's better first do the questions on ask.libreoffice.org, that can be founded on several languages, and then if a bug it's appreciated report it here.
Comment 5 Stéphane Guillou (stragu) 2017-02-10 00:00:47 UTC
thanks for getting back to me. Thanks for the explanations in the spreadsheet.

For some reason, my result column had row 6 and 7 stuck with invalid results and I had to recalculate them (with the F9 key), and that solved the issue with odd numbers.

So, tell me if I understand well how SUMIF() works:

- It compares cells at a specific position *in the defined ranges*, not cells that are aligned in the grid (e.g. for SUMIF(A2:A10, "yes", B1:B9), the cell A2 will be matched to B1);
- If <range> and <sum_range> do not have the same dimension, <sum_range> will be modified to have the same dimensions as <range> (e.g. for SUMIF(A1:A10, "yes", B2:B3), <sum_range> will be interpreted as B2:B11)
- If <range> has coordinates in both axes but <sum_range> is a whole column or row, <sum_range> will be interpreted as the range of same dimension as <range> and starting at the first cell in the column or row (e.g. for SUMIF(A3:A10, "yes", B:B), <sum_range> will be interpreted as B1:B8).

If that is how it works, well OK, but I have to say, it feels very unintuitive (it took me way too much time and a bug report to get my head around it), and the LibreOffice help does not explain that at all, from what I have seen.

Plus, how come SUMIF() interprets the sum_range silently but SUMIFS() throws an error if dimensions are different?
Comment 6 m_a_riosv 2017-02-10 01:19:54 UTC
(In reply to stragu from comment #5)
> thanks for getting back to me. Thanks for the explanations in the
> spreadsheet.
> 
> For some reason, my result column had row 6 and 7 stuck with invalid results
> and I had to recalculate them (with the F9 key), and that solved the issue
> with odd numbers.
> 
> So, tell me if I understand well how SUMIF() works:
> 
> - It compares cells at a specific position *in the defined ranges*, not
> cells that are aligned in the grid (e.g. for SUMIF(A2:A10, "yes", B1:B9),
> the cell A2 will be matched to B1);
Ok.

> - If <range> and <sum_range> do not have the same dimension, <sum_range>
> will be modified to have the same dimensions as <range> (e.g. for
> SUMIF(A1:A10, "yes", B2:B3), <sum_range> will be interpreted as B2:B11)
Ok.

> - If <range> has coordinates in both axes but <sum_range> is a whole column
> or row, <sum_range> will be interpreted as the range of same dimension as
> <range> and starting at the first cell in the column or row (e.g. for
> SUMIF(A3:A10, "yes", B:B), <sum_range> will be interpreted as B1:B8).
Ok. B:B it's only an abbreviation of B1:B1048576

> 
> If that is how it works, well OK, but I have to say, it feels very
> unintuitive (it took me way too much time and a bug report to get my head
> around it), and the LibreOffice help does not explain that at all, from what
> I have seen.
Well for me the only unintitive is when ranges has different lenght. The others only give you more power for calculations.

> 
> Plus, how come SUMIF() interprets the sum_range silently but SUMIFS() throws
> an error if dimensions are different?
I think to avoid incompatibility issues with other spreadsheet apps.