Bug 58838 - : =A1="" returns TRUE when A1 does not contain a blank
Summary: : =A1="" returns TRUE when A1 does not contain a blank
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
Inherited From OOo
Hardware: All All
: medium normal
Assignee: Eike Rathke
URL:
Whiteboard: BSA target:5.1.0 target:5.0.1 target:...
Keywords:
Depends on:
Blocks:
 
Reported: 2012-12-28 19:55 UTC by David Lynch
Modified: 2016-10-25 19:24 UTC (History)
5 users (show)

See Also:
Crash report or crash signature:


Attachments
Example spreadsheet showing comparisons (33.24 KB, application/vnd.oasis.opendocument.spreadsheet)
2013-04-29 05:46 UTC, Owen Genat (retired)
Details

Note You need to log in before you can comment on or make changes to this bug.
Description David Lynch 2012-12-28 19:55:14 UTC
Problem description: 
=A1="" returns TRUE when A1 does not contain a blank
Steps to reproduce:
1. Set a1 equal to soft hyphen ... =char(173)
2. Set a2 to a1=""

Current behavior:
a2 shows TRUE
Expected behavior:
a2 shows FALSE

This behavior also occurs when a1=code(n) for n=0...8,13...31,127
[These latter characters are not printable, but code(173) does occur in printed text.]
[In all cases ISTEXT(A1) returns TRUE.]
Operating System: Windows (other)
Version: 3.6.2.2 release
Comment 1 m_a_riosv 2012-12-29 01:06:05 UTC
Hi David,

I think for this and other cases is there the EXACT() function, which is always false comparing any character against null, except with a blank cell.

I guess the soft hyphenation has a special meaning, so it is treated as null.

I do not think this is a bug.
Comment 2 Owen Genat (retired) 2013-04-29 05:46:11 UTC
Created attachment 78588 [details]
Example spreadsheet showing comparisons

I am not sure if this constitutes a confirmation of this bug or not. I agree with mariosv that the EXACT() function is the preferred method for string comparison. It would seem to me that comparisons via raw equality (equals sign) of the empty string "" with CHAR() and UNICHAR() output are, for some reason, variable i.e., dependent upon the character in question being compared.

In the attached spreadsheet there are columns for comparison of CHAR() output for characters 0 through 256 (one extra to be certain) against a NULL cell, an empty string cell, and a cell that has been edited (for curiosity sake). This is repeated again for UNICHAR() output for the same range. A comparison of the results of these tests is then made i.e., NULL result for CHAR() test against NULL result for UNICHAR() test.

I have conducted these tests under Ubuntu 10.04 x86_64 running TDF LO v.3.5.7.2 and TDF LO v4.0.2.2. There is no difference in output between the LO versions. As can be seen in the attachment most empty string tests return a FALSE value that is the same as that returned for a NULL value. There are some characters though (0-8, 14-31, 127, 249-251, 254-255 for the CHAR() function and 0-8, 14-31, 127-132, 134-159 for the UNICHAR() function) that return a TRUE value. I have not tested higher Unicode values.

I am not sure if these characters constitute an edge case, but the behaviour would certainly appear to be unusual or inconsistent.
Comment 3 David Lynch 2013-04-29 06:46:10 UTC
The ODF documentation copied below (ver 1.2) seems to imply that there is no scope for comparisons via raw equality (equals sign) to be variable:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
6.4.7 Infix Operator "="

Summary: Report if two values are equal

Syntax: Scalar Left = Scalar Right

Returns: Logical

Constraints: None

Semantics: Returns TRUE if two values are equal. If the values differ in type, return FALSE. If the values are both Number, return TRUE if they are considered equal, else return FALSE. If they are both Text, return TRUE if the two values match, else return FALSE. For Text values, if the calculation setting HOST-CASE-SENSITIVE is false, text is compared but characters differencing only in case are considered equal. If they are both Logicals, return TRUE if they are identical, else return FALSE. Error values cannot be compared to a constant Error value to determine if that is the same Error value.

Evaluators may approximate and test equality of two numeric values with an accuracy of the magnitude of the given values scaled by the number of available bits in the mantissa, ignoring some least significant bits and thus providing compensation for not exactly representable values.

The result of “1=TRUE()” is FALSE for evaluators that implement a distinct Logical type and TRUE if they don't. 

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Comment 4 Owen Genat (retired) 2013-05-05 02:38:10 UTC
David, I would agree. Furthermore, under Tools > Options... > LibreOffice Calc > Formula > Detailed calculation settings > Custom > Details... (v4.0.2.2) it makes no difference to the results of the example spreadsheet I provided whether "Treat empty string as zero" is set to True or False (default). On the surface, this would seem problematic, however I am not sure it is that simple.

The reason I was hesitant to suggest my test as a confirmation is because it may illustrate more than one issue or even a different issue. The infix operator "=" may not be in error. Rather it could be the manner in which certain characters are handled or reported to the infix operator. It could also be a problem with the handling of the empty string value. I am not familiar with the code enough to determine is this is the case or not.

From the perspective of a user it does appear, in simple terms, that there is a problem of some kind, but what the technical nature of that problem is, I am uncertain. As mariosv indicates, if some of these characters have special meaning, it may not be possible to conduct equality tests via the infix operator "=". I will be interested in the outcome either way.
Comment 5 Owen Genat (retired) 2013-12-05 02:16:04 UTC
I am now confirming this bug as a result of further testing and to partly to get developer input on what is happening here. Status set to NEW. Version left as-is due to the original report mentioning U+0173 and this does appear to have changed behaviour under v3.6. Platform set to All/All. Expanding on my testing in comment #2:

> I have conducted these tests under Ubuntu 10.04 x86_64 running TDF LO v.3.5.7.2 and 
> TDF LO v4.0.2.2. There is no difference in output between the LO versions. As can 
> be seen in the attachment most empty string tests return a FALSE value that is the 
> same as that returned for a NULL value. There are some characters though (0-8, 
> 14-31, 127, 249-251, 254-255 for the CHAR() function and 0-8, 14-31, 127-132, 
> 134-159 for the UNICHAR() function) that return a TRUE value.

I have now opened the attachment to that comment under Crunchbang 11 x86_64 running:

- v3.3.4.1 OOO330m19 Build:401
- v3.4.6.2 OOO340m1 Build:602
- v3.5.7.2 Build ID: 3215f89-f603614-ab984f2-7348103-1225a5b
- v3.6.7.2 Build ID: e183d5b
- v4.0.6.2 Build ID: 2e2573268451a50806fcd60ae2d9fe01dd0ce24
- v4.1.3.2 Build ID: 70feb7d99726f064edab4605a8ab840c50ec57a

Character comparisons, for the CHAR and UNICHAR functions, that report TRUE when equated with an empty string (via infix operator "=") are listed (refer columns E and I). Anomalies in comparing the returned boolean values (columns O, P, Q) are also indicated:

v3.3.4.1
CHAR:    0-8, 14-31, 127.
UNICHAR: 0-8, 14-31, 127-132, 134-159.
All boolean results compare OK.

v3.4.6.2
As above for v3.3.4.1.

v3.5.7.2
CHAR:    0-8, 14-31, 127, 249-251, 254-255.
UNICHAR: 0-8, 14-31, 127-132, 134-159.
All boolean results compare OK.

v3.6.7.2
CHAR:    0-8, 14-31, 127, 249-251.
UNICHAR: 0-8, 14-31, 127-132, 134-159, 173.
All boolean results compare OK.

v4.0.6.2
As above for v3.6.7.2.

v4.1.3.2
CHAR:    0-8, 14-31, 127, 249-251.
UNICHAR: 0-8, 14-31, 127-132, 134-159, 173.
Boolean results for 254-255 return FALSE even though both character comparisons (CHAR and UNICHAR) also return FALSE. This appears to be erroneous.

Overall the behaviour of characters 173, 249-251, and 254-255, when compared against an empty string, have changed across versions. In itself this is not unusual or worrying other than there does not appear to be any discernible pattern to the changes, especially in terms of improvement or consistency.
Comment 6 Regina Henschel 2014-01-17 15:32:57 UTC
UNICHAR() returns the correct character. If you apply the function UNICODE() to the result of UNICHAR(), you get the original number. So it is a matter of the string comparison of the equal sign. If you test <>"", you get always TRUE.
Comment 7 klsu 2015-01-25 20:27:56 UTC
It doesn't matter whether the effect is a bug or by design (which can also mean "we didn't really think about it" or "it's too difficult"); it can result in an unexpected wrong answer. In a spreadsheet, that is worse than a program or system crash or slowing down to a crawl (ask an accountant, an engineer or a pharmacist). A spreadsheet function that can give an unexpected wrong answer makes the spreadsheet unusable. 

The spreadsheet attached by Owen Genat clearly shows that there is inconsistency with how (an empty string) "" is interpreted. By definition, a character cannot equal an empty string, but 28 apparently do. When I open Owen's spreadsheet in Excel 2003, no characters are shown equal to an empty string.
Comment 8 Julien Nabet 2015-04-19 14:10:29 UTC
On pc Debian x86-64 with master sources updated today or with LO Debian package 4.3.3, I don't reproduce the problem indicated in initial description.

However, even with master sources, there are still "FALSE" cells in O/P/Q columns from Owen's file.
Comment 9 Eike Rathke 2015-07-20 14:53:56 UTC
Equal/NotEqual use collation, which collation is not meant for in this context..
Taking.
Comment 10 Commit Notification 2015-07-20 17:20:23 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=1e5e7b9323c3321a380edf86097ccc2e3f0a7a7a

Resolves: tdf#58838 do not use collation for Equal/NotEqual operators

It will be available in 5.1.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 11 Eike Rathke 2015-07-20 17:54:40 UTC
Pending review
https://gerrit.libreoffice.org/17238 for 4-4
https://gerrit.libreoffice.org/17240 for 5-0
Comment 12 Commit Notification 2015-08-07 12:54:55 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "libreoffice-5-0":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=34bdbdee3baebf5f008f2ad6762f455db7752e48&h=libreoffice-5-0

Resolves: tdf#58838 do not use collation for Equal/NotEqual operators

It will be available in 5.0.1.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 13 Commit Notification 2015-08-21 21:51:21 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "libreoffice-4-4":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=e55ea8312645b6a4274164591bea5dd88fa99022&h=libreoffice-4-4

Resolves: tdf#58838 do not use collation for Equal/NotEqual operators

It will be available in 4.4.6.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.