Bug 155697 - String comparison fails depending on formatting
Summary: String comparison fails depending on formatting
Status: RESOLVED NOTABUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
7.5.3.2 release
Hardware: x86-64 (AMD64) macOS (All)
: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-06-06 02:11 UTC by Shad Sterling
Modified: 2023-06-06 13:06 UTC (History)
1 user (show)

See Also:
Crash report or crash signature:


Attachments
Reproduction (10.33 KB, application/vnd.oasis.opendocument.spreadsheet)
2023-06-06 02:11 UTC, Shad Sterling
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Shad Sterling 2023-06-06 02:11:40 UTC
Created attachment 187742 [details]
Reproduction

String comparisons of the form `=IF(A1="✔","=","<>")` can fail depending on the formatting of A1

I don't know how I made this happen, I noticed it in a fairly complicated file, was not able to reproduce it in a new file by applying the same formatting, but was able to reproduce it by pasting only Values & Formats from an affected cell in the large sheet

In the attached reproduction, A1 and A3 have the value and formatting created in the new file, and A2 and A4 have the value and formatting created in the original file.  They appear identical, and have identical content in the input line, but as you can see in column B they do not compare the same.  None of them appear the same as in the original sheet, where the applied font color appears (in the reproduction the default font color appears)

macOS 13.4 (22F66)
Comment 1 ady 2023-06-06 03:55:35 UTC
Attachment 187742 [details] from comment 0:

* cell A1 contains 1 character, unicode 10004

* Cell A2 contains 2 characters, unicode 10004 and then unicode 65039.

I would suggest closing this as NOTABUG (or perhaps INVALID, as there is simply nothing wrong in any way, no repro of any problem).
Comment 2 Shad Sterling 2023-06-06 04:35:54 UTC
Then I guess there are two bugs: however the content got to be different from entering the same text, and that there's no discernable difference in the input line (nor any apparent way to add or remove the invisible character)
Comment 3 ady 2023-06-06 05:12:48 UTC
(In reply to Shad Sterling from comment #2)
> Then I guess there are two bugs: 

I disagree.


> however the content got to be different
> from entering the same text

IMHO, Calc has no control on how or what exactly users introduce as data; that's for the users to know and to control.

> and that there's no discernable difference in
> the input line (nor any apparent way to add or remove the invisible
> character)

I would suggest using functions such as:
_ D1: LEN(A1)
_ E2: RIGHT(A2)
_ F2: UNICODE(E2)

...among others.

In a similar way, there are formatting marks in text used in Writer (e.g. end of paragraph; toggle formatting marks in Writer to show them).

Perhaps you would want to open an enhancement request for something, but I don't see here some formatting issue affecting the result of a formula as presented in comment 0.
Comment 4 Mike Kaganski 2023-06-06 06:29:03 UTC
Indeed, the "however the content got to be different from entering the same text" is only the user who created that can tell us; and the "there's no discernable difference in the input line" is *by design*, with Unicode explicitly defining *several* different codepoint sequences resulting in identical visual output. There are things kike normalization, that exist for that reason; there are thinks like 'variation selectors' - which your second character in A2 and A4, 'U+FE0F', is: it defines which of the several alternatives for a given "primary" character, like emoji, is used (it could be a skin color, or a hair variant, or whatever; and when your selector co-incides with the default, the resulting look would indeed be identical).

NOTABUG.
Comment 5 Mike Kaganski 2023-06-06 06:42:34 UTC
(In reply to Mike Kaganski from comment #4)
> things kike  ... thinks like ...

Sorry for typos.

As to "however the content got to be different from entering the same text" - note e.g. that different methods exist for entering emojis: e.g., on Windows, one can use charmap (a system app); or LibreOffice's Special Character (Insert menu); or system's Windows logo button + Dot - and the latter method allows you to chose variations; then you can select, copy and paste - and that may be another source of difference. This all is outside of LibreOffice control, and the only LibreOffice responsibility here is to obey, provide correct rendering, and accurately keep what was defined by user.
Comment 6 Shad Sterling 2023-06-06 13:06:07 UTC
As the user who entered everything that was entered in the relevant files, I actually don't know why entering the same character in different places happened with different encodings, and if using the macOS "Emoji & Symbols" tool is doing that I don't have control over that - but you do have control over whether you reveal it.  I certainly didn't manually enter a variant selector to select the default variant and make the entry invisibly different (nor do I know that doing so is even possible) - and if Writer has a way to show such invisible characters I don't see why Calc shouldn't also.  It's extra weird that Calc uses different glyphs in the input line and in the sheet, the same in each place either with or without the invisible selector character, but if you're intentionally confusing the user in the first way I assume you're also intentionally confusing the user in the second way