Bug 153425 - Calc shows different representations (TRUE/FALSE/1/0) of boolean data types stored in .xlsx starting near row 50
Summary: Calc shows different representations (TRUE/FALSE/1/0) of boolean data types s...
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
7.5.0.3 release
Hardware: All All
: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Calc-Cells
  Show dependency treegraph
 
Reported: 2023-02-06 22:54 UTC by PMouse
Modified: 2023-08-22 19:17 UTC (History)
3 users (show)

See Also:
Crash report or crash signature:


Attachments
screenshot of LO doing the wrong thing (18.68 KB, image/png)
2023-02-06 23:01 UTC, PMouse
Details
screenshot of googlesheets with same file (11.23 KB, image/png)
2023-02-06 23:01 UTC, PMouse
Details
screenshot of excell with same file (59.25 KB, image/png)
2023-02-06 23:02 UTC, PMouse
Details
Example xlsx file (7.60 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2023-02-08 00:04 UTC, PMouse
Details
Another example file (12.13 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2023-02-08 00:07 UTC, PMouse
Details
printscreen from excel (35.22 KB, image/png)
2023-02-08 22:37 UTC, raal
Details

Note You need to log in before you can comment on or make changes to this bug.
Description PMouse 2023-02-06 22:54:54 UTC
Description:
Calc isn't consistent when displaying boolean values stored in .xlsx files.  Gnumeric, Excel, and GoogleSheets represent the same spreadsheet file data differently than Calc.

Near row 50, libreoffice-calc switches from "TRUE" or "FALSE" text representation to 1 or 0 integer value representations.

Steps to Reproduce:
1. In Excel, start a new spreadsheet.
2. Leave cell A1 empty.
3. Put 0 in cell A2, 1 in cell A3, and pull down on the lower corner of the cell to fill 101 rows with increasing integers.
4. Put a title for column B in cell B1, like "column" or something
5. Put "TRUE" in B2 and "FALSE" in B3.
6. Select B2:B3 and then pull-down auto-fill, again, so that you have 100 rows of True/False alternating values.
7. Save the file as standard "Excel Workbook" (.xlsx) file which is the newer standard.  I didn't test with "Strict OpenXML".
8. Now, save the same data as "Excel 97-2003" (.xls), too.
9. Open both export data files in Libre Office (calc)

Actual Results:
The .xls version opens fine, looks as expected.
The .xlsx version opens, but data is not shown in a consistent representation.  During my testing, the representation always starts out as "TRUE"/"FALSE", but changes to 0/1 at least once around row 50, but not always on the same row, and sometimes switches back and forth at least one more time within the first 100 rows.

Expected Results:
The same data type should have the same representation where identically formatted.  Since the data wasn't manually formatted, it is the default, and, presumably, all the same in this example.  Regardless, all other spreadsheet programs work as expected with this data, so...


Reproducible: Always


User Profile Reset: No

Additional Info:
Version: 7.5.0.3 (X86_64)
Build ID: 50(Build:3)
CPU threads: 2; OS: Linux 6.2; UI render: default; VCL: gtk3
Locale: en-US (en_US.UTF-8); UI: en-US
Calc: threaded
Comment 1 PMouse 2023-02-06 23:01:03 UTC
Created attachment 185163 [details]
screenshot of LO doing the wrong thing
Comment 2 PMouse 2023-02-06 23:01:39 UTC
Created attachment 185164 [details]
screenshot of googlesheets with same file
Comment 3 PMouse 2023-02-06 23:02:17 UTC
Created attachment 185165 [details]
screenshot of excell with same file
Comment 4 Eike Rathke 2023-02-07 00:50:25 UTC
Please attach a sample .xlsx file to reproduce for the poor lucky souls who don't have Excel.
Comment 5 PMouse 2023-02-08 00:04:41 UTC
Created attachment 185193 [details]
Example xlsx file

Yes, of course; I meant to do that.  Here is a file created via python pandas to test this.
Comment 6 PMouse 2023-02-08 00:07:04 UTC
Created attachment 185194 [details]
Another example file

Here is an example file I created starting with the previous one, but, in Excel, I used the method discussed in my reproduction instructions to created a second column of T/F data and saved to this file.  So, this is an official Excel save file.
Comment 7 QA Administrators 2023-02-08 03:25:01 UTC Comment hidden (obsolete)
Comment 8 raal 2023-02-08 22:12:37 UTC
Confirm in Version: 7.6.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: b052ec2f2fbe0f3044ba824c064a280a5ee9cd7f
CPU threads: 4; OS: Linux 5.15; UI render: default; VCL: gtk3
Locale: cs-CZ (cs_CZ.UTF-8); UI: en-US
Calc: threaded

Works in Version 4.1.0.0.alpha0+ (Build ID: efca6f15609322f62a35619619a6d5fe5c9bd5a)
Comment 9 raal 2023-02-08 22:37:05 UTC
after this commit in LO 4.2 are all cells in range B2:C201 formatted as Boolean 0/1
 add940c952b3dd382be62cb28a0842d81a21e53f is the first bad commit
commit add940c952b3dd382be62cb28a0842d81a21e53f
Author: Matthew Francis <mjay.francis@gmail.com>
Date:   Sat Sep 5 22:39:35 2015 +0800

    source-hash-835fee82efb70b40b94f6babc2706ee1eb66dcf7
    
    commit 835fee82efb70b40b94f6babc2706ee1eb66dcf7
    Author:     Kohei Yoshida <kohei.yoshida@collabora.com>
    AuthorDate: Thu Oct 31 16:25:32 2013 -0400
    Commit:     Kohei Yoshida <kohei.yoshida@collabora.com>
    CommitDate: Mon Nov 4 13:59:17 2013 -0500
    
        Populate raw cell values using ScDocumentImport.
    
        Also fix incorrect const methods. Methods that populate the document
        model should not be marked const even if the compiler allows it.

After this commit in 7.4 are cells formatted as TRUE/FALSE
 72898afa9e57ff1f23eb82bff9f0cf0fbbe693c3 is the first bad commit
commit 72898afa9e57ff1f23eb82bff9f0cf0fbbe693c3
Author: Jenkins Build User <tdf@pollux.tdf>
Date:   Wed Feb 23 16:58:49 2022 +0100

    source 8f5d8669ca74fed8608e438a8436c173e35d43a6

https://git.libreoffice.org/core/+/8f5d8669ca74fed8608e438a8436c173e35d43a6
Comment 10 raal 2023-02-08 22:37:53 UTC
Created attachment 185237 [details]
printscreen from excel
Comment 11 ady 2023-02-09 08:38:31 UTC
FWIW, I can reproduce this without Excel at all. Also, there must be something very specific in order to trigger this issue, because I'm getting mixed results.

I was trying to reproduce this in LO 7.4.5, without starting from an imported file. I tried several different minor variations in the procedure:

* Having a column label (or not).
* Using AutoFilter (or not).
* Using one unique value (either true or false, not both).
* Dragging the mouse in order to fill the data, all in one step (i.e. one dragging movement), or instead making 2 movements (i.e. dragging once up to row 99, release the mouse, dragging again up to row 210).
* Filling more (or less) rows.
* Using TRUE() and FALSE(), or TRUE/FALSE, or combining these.


I could go on with additional minor differences.

Observations:

Using AutoFilter, the first time I get many different values within the AutoFilter list itself (many FALSE, TRUE, 0 and 1s). I mean that AutoFilter itself lists these values repeated times (with its own checkbox, etc.). After I PageDown, PageUp, the list in AutoFilter now shows each value 1 time (i.e. up to 4 different items at most, depending on the values being displayed on the cells). To be clear, I didn't even select a specific value; the resulting list is not filtered, no action was performed other than scroll down/up, and yet the AutoFilter list is different.

Since I haven't modified the values, why AutoFilter would recognize them differently first, but as equivalent values the second time? I kept trying different minor adjustments to the procedure.

Using "Save" or using "Save As" (I'm not sure anymore), I ended up with 2 xlsx files with the same original content, but one file opens up showing TRUE/FALSE only, whereas the other displays TRUE/FALSE/0/1.

In some cases (of all my attempts), I only closed the file, whereas in others I also closed LO entirely. Sometimes the 0/1 values are "in the middle" of the data rage; sometimes they are only seen in the bottom half of the rows. To be clear, I never introduced numbers in these attempts.

Now I have 2 xlsx files created with LO 7.4.5, with the exact same original content, showing 2 different things, with 2 different size weights:

* 6KB (incorrect, with 0/1 too) vs.
* 17KB (with TRUE/FALSE only).


I see problems with AutoFilter, and with how the values are displayed in the respective cells (and in which range). The only problem is that I cannot identify a step-by-step procedure that I could describe in order to reproduce exactly what I did so to obtain these files, or why I see the 0/1 values in different ranges (considering that I am opening the same exact file).

Although I'm not able to describe a clear procedure ATM, I am sharing this experience here, just in case someone else sees some similar behavior.
Comment 12 Justin L 2023-08-21 18:21:07 UTC
apparently fixed in 24.2 with (believe it or not)
commit d15c4caabaa21e0efe3a08ffbe145390e802bab9
Author: Justin Luth on Tue Sep 20 08:14:41 2022 -0400 (code, not commit date)
    tdf#123026 xlsx import: recalc optimal row height on import
Comment 13 ady 2023-08-22 01:30:13 UTC
(In reply to Justin L from comment #12)
> apparently fixed in 24.2 with (believe it or not)
FWIW, I can still see some inconsistencies, or at least differences, depending on version.

I am not saying that the results are incorrect; just that the exporting/saving is slightly different depending on file format.

On a "recent" (around 4 weeks old) LO Dev 24.2, saving/exporting as XLS and reopening:
* shows all TRUE/FALSE values as 1 and 0 respectively,
* cell content is (functions) =TRUE() and =FALSE() respectively,
* with cell format as default (standard).

On the same LO Dev version, saving/exporting as XLSX and reopening:
* shows all TRUE/FALSE values as TRUE and FALSE respectively,
* cell content is (functions) =TRUE() and =FALSE() respectively,
* with cell format as Boolean.

On the same LO Dev version, saving/exporting as ODS and reopening:
* shows all TRUE/FALSE values as TRUE and FALSE respectively,
* cell content is (values) TRUE and FALSE respectively,
* with cell format as Boolean.


Version: 24.2.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: cf8f7b91f41821b79495c0388359c4cb1156ea67
CPU threads: 4; OS: Windows 10.0 Build 19045; UI render: Skia/Raster; VCL: win
Locale: en-US (es_AR); UI: en-US
Calc: CL threaded

This LO Dev build includes the commit mentioned in comment 12 by Justin.
Comment 14 Justin L 2023-08-22 14:26:24 UTC
(In reply to ady from comment #13)
> FWIW, I can still see some inconsistencies, or at least differences,
> depending on format.
However, these really are rather disconnected from this particular bug, which is talking about xlsx format (almost exclusively).

> I am not saying that the results are incorrect; just that the
> exporting/saving is slightly different depending on file format.
Generally you could say the results are incorrect. At least for something as "simple" as true/false verses 1/0 or cell format.

> On a "recent" LO Dev 24.2, saving/exporting as XLS and reopening:
> * shows all TRUE/FALSE values as 1 and 0 respectively,
Sounds like the XLS bug equivalent of 8f5d8669ca74fed8608e438a8436c173e35d43a6
    tdf#122098 xlsx im/export: let formulas guess their number format

Reported as new bug 156861.

> On the same LO Dev version, saving/exporting as ODS and reopening:
> * cell content is (values) TRUE and FALSE respectively,
That seems unlikely, and I can't reproduce. I still see formula =True().
Comment 15 ady 2023-08-22 19:13:48 UTC
(In reply to Justin L from comment #14)

> > On the same LO Dev version, saving/exporting as ODS and reopening:
> > * cell content is (values) TRUE and FALSE respectively,
> That seems unlikely, and I can't reproduce. I still see formula =True().

I shall clarify my steps then:
1. New Calc worksheet (do not start from a previously-saved file).
2. Introduce the following string values, not as functions, not as numbers, and with the default standard cell format:
 A1: TRUE
 A2: FALSE
3. Save the file as ODS, close and reopen it.

Result, as described in comment 13:
* shows all TRUE/FALSE values as TRUE and FALSE respectively,
* cell content is (values) TRUE and FALSE respectively,
* with cell format as Boolean.
Comment 16 ady 2023-08-22 19:17:25 UTC
(In reply to ady from comment #15)

> 3. Save the file as ODS, close and reopen it.
> 
> Result, as described in comment 13:
> * shows all TRUE/FALSE values as TRUE and FALSE respectively,
> * cell content is (values) TRUE and FALSE respectively,
> * with cell format as Boolean.

To be clear, since LO considers those strings as meaning the same as the respective functions (while working in English), this is probably OK for ODS. Importing/exporting from other tools in ODS might have different results, but I have not tested such scenario.