Bug 64132 - Calc operators and built-in functions use different algorithm for string-to-number coercion
Summary: Calc operators and built-in functions use different algorithm for string-to-n...
Status: CLOSED NOTABUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
3.5.4 release
Hardware: x86 (IA32) Linux (All)
: medium major
Assignee: Winfried Donkers
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Cell-Formula
  Show dependency treegraph
 
Reported: 2013-05-01 19:16 UTC by Leo
Modified: 2020-06-02 10:26 UTC (History)
7 users (show)

See Also:
Crash report or crash signature:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Leo 2013-05-01 19:16:07 UTC
In calc, a formula like sum(a1:a3) applied to cells which contain a string results in the string being coerced to 0. However, if the the string contains a number, this coercion is incorrect.

This *is* a bug, since the correct behaviour would be to signal an error indicating the formula has been applied to invalid inputs.

This bug bit when copying numbers from one program into libreoffice/calc. One of the numbers was read as a string by calc, and the resulting column tally was off by ~$400. This wasn't noticed til it was too late...

---
Btw, I discovered after the fact that View->Value Highlighting provides a way to diagnose problems with string->0 coercion.

---
Here is a simple example:

a1: "I am not a number"
a2: =sum(a1:a1)

This may be related to
https://bugs.freedesktop.org/show_bug.cgi?id=59309
Comment 1 m_a_riosv 2013-05-01 21:21:17 UTC
Hi Leo,

(In reply to comment #0)
> In calc, a formula like sum(a1:a3) applied to cells which contain a string
> results in the string being coerced to 0. However, if the the string
> contains a number, this coercion is incorrect.

Under you point of view.
This works as intended and I hope this will never change.

> 
> This *is* a bug, since the correct behaviour would be to signal an error
> indicating the formula has been applied to invalid inputs.

An error result only leads to make unusable the SUM() function for many people like me. 

> 
> This bug bit when copying numbers from one program into libreoffice/calc.
> One of the numbers was read as a string by calc, and the resulting column
> tally was off by ~$400. This wasn't noticed til it was too late...

you have options to make the import in number properly, e.g. with paste special - unformatted text, you can select the right data type.

> 
> ---
> Btw, I discovered after the fact that View->Value Highlighting provides a
> way to diagnose problems with string->0 coercion.
> 
> ---
> Here is a simple example:
> 
> a1: "I am not a number"
> a2: =sum(a1:a1)

Result 0, what it is right, nothing to sum.

> 
> This may be related to
> https://bugs.freedesktop.org/show_bug.cgi?id=59309

I think it is your responsibility have the data in the proper way, numbers are numbers not text.
If you want you can set easily verifications to be sure no text data in the list.
Comment 2 Leo 2013-05-02 00:05:39 UTC
(In reply to comment #1)
> Hi Leo,
> 
> (In reply to comment #0)
> > In calc, a formula like sum(a1:a3) applied to cells which contain a string
> > results in the string being coerced to 0. However, if the the string
> > contains a number, this coercion is incorrect.
> 
> Under you point of view.
> This works as intended and I hope this will never change.

We have opposing desires.

> 
> > 
> > This *is* a bug, since the correct behaviour would be to signal an error
> > indicating the formula has been applied to invalid inputs.
> 
> An error result only leads to make unusable the SUM() function for many
> people like me. 

Look, you are completely missing why this is a bug.

A user copies a "number" like 190.20 from thunderbird or wherever into calc. Calc sees this number as a string and coerces this "number" to 0 in sum, et. al. No warnings, no errors, nothing. That is absolute crap. And how is an unsophisticated user even to know where or why the error happened?

> 
> > 
> > This bug bit when copying numbers from one program into libreoffice/calc.
> > One of the numbers was read as a string by calc, and the resulting column
> > tally was off by ~$400. This wasn't noticed til it was too late...
> 
> you have options to make the import in number properly, e.g. with paste
> special - unformatted text, you can select the right data type.

Let me be clear, I identified this bug because another user came to me telling me what crap libreoffice is, it can't even sum a column of numbers correctly. Who would expect that pasting a damn number into libreoffice office would require such damn contortions as you are suggesting? Come on.

> I think it is your responsibility have the data in the proper way, numbers
> are numbers not text.

Indeed, and when calc, by default, displays a stringified number as a number and *silently* coerces that string to 0, then calc violates your principal. That is the bug report I am filing.

> If you want you can set easily verifications to be sure no text data in the
> list.

So, an unsophisticated calc user can be bitten in the ass by a bug that can be easily fixed if only the user were more sophisticated. Just so the sophisticated user can take advantage of some short cuts that could be easily customised. Come on.
Comment 3 Leo 2013-05-02 00:14:42 UTC
Please excuse the polemical tone of my last reply.

It is a bug for calc to, as a default, display strings consisting of numbers as numbers. And, it is a bug for calc to, as a default, coerce a string that consists of numbers to 0 without signaling a warning or error.
Comment 4 Joel Madero 2014-11-06 21:54:11 UTC
We need to get another voice in here from QA - moving to UNCONFIRMED. If a second person confirms that the reported desire isn't wanted, we should close this as WONTFIX. Else it should be moved to NEW. Thanks all.
Comment 5 raal 2014-11-12 19:16:07 UTC
Calc is inconsistent in conversion text to numbers. When I format A1 as text and write "1", B1 = A1+1, result is 2. When I write formula =SUM(A1), result is 0. In first case automatic conversion text to numbers works, in second case text is not converted to number.

Version: 4.4.0.0.alpha2+
Build ID: 98f9baa5253f7bb8034f148519f31f548b1452fa
TinderBox: Linux-rpm_deb-x86_64@46-TDF, Branch:master, Time: 2014-11-12_00:13:54

Calc do it the same way as excel.
Comment 6 Robinson Tryon (qubit) 2014-11-12 22:09:58 UTC
(In reply to raal from comment #5)
> Calc is inconsistent in conversion text to numbers. When I format A1 as text
> and write "1", B1 = A1+1, result is 2. When I write formula =SUM(A1), result
> is 0.

CONFIRMED in LO 4.4.0.0.alpha2 + Ubuntu 14.04

To sum up (no pun intended), operators such as "+, -, /, *" coerce numbers formatted as text to their apparent value, whereas functions such as SUM() and AVERAGE() convert all strings to a value of zero.

> In first case automatic conversion text to numbers works, in second
> case text is not converted to number.

In both cases the text is converted to a number, it's just that the built-in functions map all text inputs to zero :P

> Calc do it the same way as excel.

Hmm...that's a good data point, even if I think that the behavior is suboptimal. At the very least, we need to document this nuance of Calc. What version of Excel are you using to test?

Status -> NEW
Comment 7 raal 2014-11-13 18:13:18 UTC
(In reply to Robinson Tryon (qubit) from comment #6)
> (In reply to raal from comment #5)
 What
> version of Excel are you using to test?

Excel 2010
Comment 8 Leo 2014-11-13 19:56:28 UTC
(In reply to Robinson Tryon (qubit) from comment #6)
> (In reply to raal from comment #5)
> > Calc is inconsistent in conversion text to numbers. When I format A1 as text
> > and write "1", B1 = A1+1, result is 2. When I write formula =SUM(A1), result
> > is 0.
> 
> CONFIRMED in LO 4.4.0.0.alpha2 + Ubuntu 14.04
> 
> To sum up (no pun intended), operators such as "+, -, /, *" coerce numbers
> formatted as text to their apparent value, whereas functions such as SUM()
> and AVERAGE() convert all strings to a value of zero.
> 
> > Calc do it the same way as excel.
> 
> Hmm...that's a good data point, even if I think that the behavior is
> suboptimal. At the very least, we need to document this nuance of Calc. What
> version of Excel are you using to test?
> 
> Status -> NEW

Thanks for looking into this.

Excel's bugs shouldn't be replicated in libreoffice, imo.

a1+a2 and sum(a1:a2) should produce the same output. The latter is just an abbreviation for the former.
Comment 9 GerardF 2014-11-13 20:42:22 UTC
There is a new option in 4.3.0 and following version.
This option allow the choice on how strings must be treated in arithmetic operation.

See http://erack.org/blog/archives/40-LibreOffice-4.3-new-Calc-feature-user-selectable-text-conversion-models.html
Comment 10 Robinson Tryon (qubit) 2014-11-14 05:16:25 UTC
(In reply to GerardF from comment #9)
> There is a new option in 4.3.0 and following version.
> This option allow the choice on how strings must be treated in arithmetic
> operation.
> 
> See
> http://erack.org/blog/archives/40-LibreOffice-4.3-new-Calc-feature-user-
> selectable-text-conversion-models.html

Looks interesting, but I couldn't seem to find an option in that config that would make built-in functions convert strings to (non-zero) numbers, so if A1 = 1, then SUM(A1) = 0, even though SUM(0 + A1) = 1.
Comment 11 QA Administrators 2015-12-20 16:21:30 UTC Comment hidden (obsolete)
Comment 12 QA Administrators 2017-01-03 19:51:15 UTC Comment hidden (obsolete)
Comment 13 Leo 2017-01-03 21:36:10 UTC
I can confirm this bug, as originally reported, is in the current version of libreoffice/calc in debian testing:

Version: 5.2.4.1
Build ID: 1:5.2.4~rc1-1
CPU Threads: 4; OS Version: Linux 4.8;
Locale: en-US (en_US.UTF-8); 

The patch proposed in comment 9 does not affect the behavior (should a separate bug report be filed?)
Comment 14 Winfried Donkers 2018-06-22 11:26:40 UTC
(In reply to Leo from comment #13)
> The patch proposed in comment 9 does not affect the behavior (should a
> separate bug report be filed?)

I don't think that is necessary.

It seems taht only SUM is not affected by the option mentioned in comment #9.
Functions like RAWSUBTRACT and AVERAGE do convert the string to a correct number.

I will see if I can find out what's wrong with SUM and what can be done about this.
Comment 15 Eike Rathke 2018-07-12 18:18:22 UTC
This is not a bug.

There's a great deal of confusion here. A lot of functions take parameters of type NumberSequence which is defined to *ignore* all text cells in cell references, among them are SUM() and AVERAGE(), and no, also AVERAGE() does *not* convert strings to numbers, it ignores all strings as well, specifically they also don't add to the count of elements by which the sum is divided. This is how all major spreadsheet applications work.

For the technically inclined, you can read that up at
http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part2.html#__RefHeading__1017992_715980110
the reference case.
If in that specification you take a look at SUM or AVERAGE, for example, you see those functions are defined to take one or more NumberSequence or NumberSequenceList parameters (NumberSequenceList is the result of a "union" of one or more range references, a list of NumberSequence).
http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part2.html#SUM
http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part2.html#AVERAGE

The difference to operators (+,-,...) and functions that take a single scalar numeric value is that for those when encountering text strings it depends on the configuration how to treat conversion from text to number, under Tools -> Options -> Calc -> Formula, Detailed Calculation Settings, go to Custom Details to see available choices. The default for new installations is still "Convert also locale dependent" because that is what Excel does so when loading such document we treat it the same. BUT, that conversion mode depends on the current locale and may deliver different results or yield an error value for different locales. The recommendation is to set that to "Generate #VALUE! error" so one can spot errors early and correct number text to actual numeric values, which will work in all spreadsheet applications whereas any automatic on-the-fly conversion from text to numbers may or may not yield the same results.
Comment 16 Mike Kaganski 2018-07-13 02:08:14 UTC
(In reply to Eike Rathke from comment #15)

There is still a user-level problem with all this. And that problem needs addressing. It's unrealistic to expect all users to be advanced enough to know this beforehand; and the current situation inevitably leads to a situation that ~every advanced user would obtain this knowledge through making grave errors, sometimes fatal (it might cost job someone).

As the program behaviour is correct in regards to specs and design, the only way to address this user-level problem is to do best informing user about the possible errors. First, in the descriptions of related functions: in help, in function wizard, and possibly in the auto-completion tooltips, should be clarified, that referenced cells with texts looking as numbers will *not* be converted to text, regardless of text-to-number conversion mode. But also the text values themselves should be somehow highlighted in the spreadsheet - something like Excel does; possibly by creating comment-like markers in such cells. Maybe some other way, like making visual clues when defining ranges for functions - highlighting the "wrong" values in the ranges, or turning on Value Highlighting mode when defining ranges or writing formulas......

I'd say that these enhancements should be filed as separate RFEs and tracked individually (help, wizard, tooltips, cell formatting tips ...).