Bug 145384 - Writer table cells have address scheme that differs from Calc, "A-Z,a-z,AB-AZ,..." vs "A-Z,AA-AZ,BA-BZ,..."; lowercase cell refs are more error prone for cell formulas
Summary: Writer table cells have address scheme that differs from Calc, "A-Z,a-z,AB-A...
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer (show other bugs)
Version:
(earliest affected)
Inherited From OOo
Hardware: All All
: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: ODF-import Writer-Tables-Formulas
  Show dependency treegraph
 
Reported: 2021-10-29 14:15 UTC by Dave McKellar
Modified: 2021-11-18 15:23 UTC (History)
6 users (show)

See Also:
Crash report or crash signature:
Regression By:


Attachments
And RTF doc made with Microsoft Word using a lowercase range - this is correctly imported (12.67 KB, application/msword)
2021-10-29 14:15 UTC, Dave McKellar
Details
sample Writer ODF with 60 column table, using cell formulas (11.00 KB, application/vnd.oasis.opendocument.text)
2021-11-04 18:45 UTC, V Stuart Foote
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Dave McKellar 2021-10-29 14:15:00 UTC
Created attachment 176001 [details]
And RTF doc made with Microsoft Word using a lowercase range - this is correctly imported

In Microsoft Word I entered lowercase for a range it saved SUM(a1:a3) in the RTF file.  LibreOffice Writer correctly changed this to SUM(<A1:A3>) when importing.

But if I manually enter a lowercase range SUM(<a1:a3>) in LibreOffice Writer its not tolerated.  The value changes to 0 and when I go back to the formula editor the formula has become SUM(<?:?>).  Perhaps the rules for ranges is uppercase only but this seems overly strict and unfriendly since we are inside angle brackets (meaning range).  Writer should tolerate lowercase ranges or convert it to uppercase.
Comment 1 Mike Kaganski 2021-10-29 15:07:08 UTC
Repro with Version: 7.2.2.2 (x64) / LibreOffice Community
Build ID: 02b2acce88a210515b4a5bb2e46cbfb63fe97d56
CPU threads: 12; OS: Windows 10.0 Build 19043; UI render: default; VCL: win
Locale: ru-RU (ru_RU); UI: en-US
Calc: threaded

and also with OOo 3.2.0.

E.g., creating a table 2x2 in a new document, and entering the formula into A1:

=sum(<b1:b2>)

results in "expression is faulty"; while this formula:

=sum(<B1:B2>)

gives proper result.
Comment 2 V Stuart Foote 2021-10-29 15:52:30 UTC
The edit shell for Calc tables corrects lowercase entered cell refs to uppercase in resulting formula.  

So this is about Writer Table cell formula entry *only*.

Can a similar input validation be added to the edit shell for Writer tables?

Not a regression, but adding to bug 123370

All documentation (help & the 7.2 Writer Guide) shows cell refs Upper case. But has no caution or note of effect of using a cell reference entered with lower case.

Strange it has never been reported against Writer, also checked the AOO bugzilla as well with no clear dupe.
Comment 3 himajin100000 2021-11-01 13:03:49 UTC Comment hidden (obsolete)
Comment 5 Dave McKellar 2021-11-01 18:04:58 UTC
Perhaps use standard functions for that code:

   if (!isalpha(cChar)) break
   if (islower(cChar)) cChar = toupper(cChar);
   cChar -= 'A';

To replace:

    if ((cChar<'A' || cChar>'Z') && (cChar<'a' || cChar>'z'))
        break;
    cChar -= 'A';
    if( cChar >= 26 )
        cChar -= 'a' - '[';
Comment 6 Eike Rathke 2021-11-03 20:46:30 UTC
No, never use isalpha(), islower(), toupper() or similar standard C/C++ library functions if you want to operate on exactly ASCII characters, because these functions may be localized and thus yield unexpected results.

Instead, use rtl::isAsciiAlpha() and rtl::toAsciiUpperCase()
Comment 7 Dave McKellar 2021-11-03 20:55:00 UTC
Thanks for the info, Eike.  I am new to LibreOffice.  That makes my suggestion:

   if (!rtl::isAsciiAlpha(cChar)) break
   if (rtl::isAsciiLowerCase(cChar)) cChar = rtl::toAsciiUpperCase(cChar);
   cChar -= 'A';
Comment 8 Mike Kaganski 2021-11-04 06:33:16 UTC
(In reply to Dave McKellar from comment #7)

And you even may remove one check that is already built into the inline toAsciiUpperCase [1]:

> if (!rtl::isAsciiAlpha(cChar))
>     break;
> cChar = rtl::toAsciiUpperCase(cChar) - 'A';

That would be a nice cleanup, and I see how it would improve the code; but I don't see how would that *change* current behavior.

[1] https://opengrok.libreoffice.org/xref/core/include/rtl/character.hxx?r=b1fb6338#290
Comment 9 V Stuart Foote 2021-11-04 13:47:07 UTC
For calc don't we use it like that for lcl_a1_get_col () in address.cxx [1]. We wouldn't need it to be as complex, but validating input of writer table cell addresses seems worth doing.

=-ref-=

[1] https://opengrok.libreoffice.org/xref/core/sc/source/core/tool/address.cxx?r=96f97eb0#924
Comment 10 Dave McKellar 2021-11-04 15:24:37 UTC
I did a test and my change does fix the code.  Sorry if this post is too big.
I made a test program.  First with the original code and some print's:

#include <stdio.h>
#include <ctype.h>

void convertName(const char *name) {
	printf("name in=%s\n", name);
        for (const char *p = name; *p; p++) {
		char cChar = *p;
		if ((cChar<'A' || cChar>'Z') && (cChar<'a' || cChar>'z'))
			break;
		cChar -= 'A';
		if( cChar >= 26 )
			cChar -= 'a' - '[';
		printf("out cChar=0x%02x\n", cChar);
	}
	printf("\n");
}

int main() {
	convertName("A55");
	convertName("a55");
	return 0;
}

Which prints out:

name in=A55
out cChar=0x00

name in=a55
out cChar=0x1a

In other words, lowercase and uppercase get different results.

Then I added one toupper() line:

#include <stdio.h>
#include <ctype.h>

void convertName(const char *name) {
	printf("name in=%s\n", name);
        for (const char *p = name; *p; p++) {
		char cChar = *p;
		if ((cChar<'A' || cChar>'Z') && (cChar<'a' || cChar>'z'))
			break;
		if (islower(cChar)) cChar = toupper(cChar); // NEW NEW
		cChar -= 'A';
		if( cChar >= 26 )
			cChar -= 'a' - '[';
		printf("out cChar=0x%02x\n", cChar);
	}
	printf("\n");
}

int main() {
	convertName("A55");
	convertName("a55");
	return 0;
}

It has the same output for uppercase and lowercase:

name in=A55
out cChar=0x00

name in=a55
out cChar=0x00
Comment 11 himajin100000 2021-11-04 15:30:37 UTC
by the way, what do you think this line of the orignal code is doing ?
num = num * 52 + cChar;
Comment 12 Dave McKellar 2021-11-04 15:53:16 UTC
Good question!
At that point cChar is binary zero for 'A' and binary 25 for 'Z'.
Adding 52 to that gets nonsense ASCII: 'A' becomes '4' and 'Z' becomes 'M'.
So its probably not an ASCII conversion.
(The difference between 'A' and 'a' is 32 - sort of like 52.)
I guess its making it into the range that the caller of that function expects.
Why?, not sure.
Comment 13 Mike Kaganski 2021-11-04 15:56:58 UTC
(In reply to Dave McKellar from comment #10)

Ah - indeed 'a' - '[' == 6, not 32 as it should be to convert to lowercase. Please do submit your fix to gerrit!
Comment 14 Mike Kaganski 2021-11-04 15:58:28 UTC
(In reply to himajin100000 from comment #11)
> by the way, what do you think this line of the orignal code is doing ?
> num = num * 52 + cChar;

This looks a separate problem, doing nonsense to handle two-character column numbers like AA.
Comment 15 Mike Kaganski 2021-11-04 16:09:55 UTC
Now thinking about it a bit more

It seems that the address is case-sensitive, with A != a, and the full character repertoire being uppercase then lowercase, with Z followed by a, and z followed by AA. Then 52 makes perfect sense being 26*2.
Comment 16 Dave McKellar 2021-11-04 16:12:29 UTC
> It seems that the address is case-sensitive, with A != a, and the full
> character repertoire being uppercase then lowercase, with Z followed by a,
> and z followed by AA. Then 52 makes perfect sense being 26*2.

Its doing base 52 math.
Comment 17 Mike Kaganski 2021-11-04 16:13:19 UTC
Heh, so it is NOTABUG.
Try to insert a table with e.g. 60 columns, and check its columns :-)
Comment 18 Dave McKellar 2021-11-04 16:16:02 UTC
But in my original bug it changed the formula.  That seems wrong.  Even if it was out of range.
Comment 19 Mike Kaganski 2021-11-04 16:22:58 UTC
(In reply to Dave McKellar from comment #18)

The same way as when you try to put '=<K1>' in a table with three columns. Unrelated to upper-vs-lowercase, and specific to the way Writer's tables validate formula references. Self-consistent in a sense...

I am very disappointed to learn that Writer uses totally different column naming convention compared to Calc. But anyway, there is help missing on this topic.
Comment 20 Dave McKellar 2021-11-04 16:29:20 UTC
It seems very unfriendly to delete something the user entered.  Better to keep the formula and display "invalid reference" (or something) for the value.

Since the difference between Writer and Calc doesn't appear to be documented, perhaps Writer can be changed and then documented.  To help users transition if it sees lowercase call names it can display a message.  That would be best in the long run.
Comment 21 Mike Kaganski 2021-11-04 16:36:41 UTC
Fun detail is that it's even can't be justified "for compatibility with Word", since Word uses normal 26-base reference scheme.

Wrt "unfriendly to delete something the user entered": IIUC, the formula text is never kept, it's parsed, tokenized, and the result is stored. Whatever can't be tokenized is not stored. 

Without much change that would highlight the problem, keeping the formula intact and only displaying "invalid reference" would make fixing formulas next to impossible, when user doesn't see which part is wrong.
Comment 22 Mike Kaganski 2021-11-04 16:39:09 UTC
(In reply to Dave McKellar from comment #20)
> Since the difference between Writer and Calc doesn't appear to be
> documented, perhaps Writer can be changed and then documented.  To help
> users transition if it sees lowercase call names it can display a message. 
> That would be best in the long run.

It *looks* reasonable, but dangerous. Personally I support this - but I'd suggest you to post a message to the dev mailing list asking for opinions on the proposal.
Comment 23 Mike Kaganski 2021-11-04 16:42:39 UTC
(In reply to Mike Kaganski from comment #22)
> It *looks* reasonable, but dangerous. Personally I support this - but I'd
> suggest you to post a message to the dev mailing list asking for opinions on
> the proposal.

OTOH, the formula is written into file format using this convention, so the change of this, however undocumented, convention would be incompatible.
Comment 24 Dave McKellar 2021-11-04 16:54:21 UTC
On unparsable formulas:
If you enter a syntax error like: +++++
The formula is retained but the value "** Expression is faulty **" is displayed.
But if you enter something wrong inside angle brackets - eg <junk> it is changed into <?>
That's inconsistent.  Perhaps the representation of a range could be expanded
to hold things it can not parse as a string.
Comment 25 V Stuart Foote 2021-11-04 18:45:17 UTC
Created attachment 176108 [details]
sample Writer ODF with 60 column table, using cell formulas

So in this attached Writer ODF with a 60 column table, it shows the column notation is -- A-Z,a-z,AA-AZ, etc.

It probably would be better to make the cell labeling in Writer match the labeling in Calc, i.e  A-Z,AA-AZ,BA-BZ, etc. 

That is, to eliminate the a-z labeling like calc by using the rtl::isAsciiAlpha() and rtl::toAsciiUpperCase() conversion as Eike suggested in comment 6

There would need to be a convert on edit/save for any existing documents, but IMHO consistency would be better in the long run.
Comment 26 V Stuart Foote 2021-11-04 19:01:50 UTC
(In reply to V Stuart Foote from comment #25)
> Created attachment 176108 [details]
> sample Writer ODF with 60 column table, using cell formulas
> but IMHO consistency would be better in the long run.

For what it is worth, Word 2019 chokes on the ODF.
Comment 27 Eike Rathke 2021-11-05 13:07:06 UTC
(In reply to Mike Kaganski from comment #17)
> Heh, so it is NOTABUG.
> Try to insert a table with e.g. 60 columns, and check its columns :-)

It's still a bug because ODF 9.2.1 Referencing Table Cells
https://docs.oasis-open.org/office/OpenDocument/v1.3/os/part3-schema/OpenDocument-v1.3-os-part3-schema.html#__RefHeading__1415614_253892949
says that after Z the next column 27 is AA.

However, if this is corrected then
a) import must take care of converting everything above A-Z to the proper notation
b) writing the proper values of AA... to file will let older versions address wrong columns
c) when reading file and encountering AA... it needs to decide whether that could be the old broken notation for value 53... or the new 27..., likely by inspecting whether it fits into the actual table width
d) though not recommended and frowned upon, it may be necessary to act on the <meta:generator> value if that is neither LibreOffice nor OpenOffice then assume the proper ODF conform values were written; if it is LibreOffice then its version number could help to determine how to handle c)

So, to remedy b) somewhat, when fixing this in master apply changes for current release branches that when encountering AA... do c) but do *not* the conversion of a) and b) to not propagate the fixed values to even older release versions that don't know anything about this.

Good luck ;-)
Comment 28 Mike Kaganski 2021-11-05 13:13:24 UTC
(In reply to Eike Rathke from comment #27)
> It's still a bug because ODF 9.2.1 Referencing Table Cells
> https://docs.oasis-open.org/office/OpenDocument/v1.3/os/part3-schema/
> OpenDocument-v1.3-os-part3-schema.html#__RefHeading__1415614_253892949
> says that after Z the next column 27 is AA.

Yay! :-D

> Good luck ;-)

I second this.
Comment 29 Heiko Tietze 2021-11-10 08:31:13 UTC
Issue seems to be clear (with a lot of facets). If we keep lowercase and accept =sum(<a1:a3>) it might break backward compatibility.

Removing needsUXEval.
Comment 30 V Stuart Foote 2021-11-10 14:34:33 UTC
(In reply to Heiko Tietze from comment #29)
> Issue seems to be clear (with a lot of facets). If we keep lowercase and
> accept =sum(<a1:a3>) it might break backward compatibility.
> 
> Removing needsUXEval.

Sigh, "fixing" this would require major refactoring of Writer tables. Where using =sum(<a1:a3>) is perfectly acceptable cell reference to 1st, 2nd and 3rd row of the *27th* column of a Writer table! If they don't exist in the table, it alerts.

In a Calc sheet those cells would be referenced by <AA1:AA3>. And, entering that same range in lower case is corrected by Calc edit shell to <A1:A3>. ODF standard does not permit addressing cells with lower case--they are converted in Calc.

Calc complies with ODF, Writer does not.

But this is an UX issue requiring assessment of what extent do we need to accommodate the prior usage?  And how quickly this needs to be corrected? 

UX Considerations:
1) how common are tables in our Writer .ODT wider than 26 columns?
2) how disruptive would it be changing Writer formula input to ODF standard?
3) can we let the current addressing remain unchanged?
4) are we obliged to maintain backward compatibility of Writer table addressing for use on prior builds?
5) is it better to fully convert a Writer document to new "corrected" ODF compliance, and when (on opening, or on save)? Or should we attempt more complex handling of supporting both addressing schemes?


To me it is pretty simple. "Zero" complaints from OOo/AOO/LO means few if any users of Writer have bumped into the issue of mixed-case addressing in Writer tables wider than 26 columns. They are not common!  Not much disruption by fixing it correctly to comply with the ODF document model.  Frankly we must correct it (or the world questions our standards compliance--we have enough issues with SVG already).

The last two are a bit tougher, and interlinked. I'd argue strongly that the best action is to aggressively refactor handling of formula addressing of Writer tables to be ODF table cell address compliant--eliminate lowercase 52*2 addressing. Then match the Calc shell validation to convert lowercase -> upper case cell ref input.  

For simplicity .ODT filter import should convert incoming Table cell formulas from the A-Z,a-z,AA-AZ cell addressing to ODF compliant A-Z,AA-AZ,BA-BZ. And we should write back any formulas in ODF compliant form. The document save will complete the conversion--only closing without save should abandon conversion.

Any new Writer table formulas will be correct, any old documents would be irrevocably converted. Yes someone on an old build of LO opening a new document would have formula issues--but then the old version is not an ODF compliant renderer.

We may annoy a few users, but there needs to be clear UX agreement of the approach to correcting this. To me a clean break is better than muddled attempt to accommodate legacy Writer table cell addressing.
Comment 31 Dave McKellar 2021-11-11 14:03:04 UTC Comment hidden (noise)
Comment 32 Eike Rathke 2021-11-12 12:49:23 UTC Comment hidden (noise)
Comment 33 Heiko Tietze 2021-11-16 11:56:48 UTC
(In reply to V Stuart Foote from comment #30)
> We may annoy a few users, but there needs to be clear UX agreement of the
> approach to correcting this. To me a clean break is better than muddled
> attempt to accommodate legacy Writer table cell addressing.

ODF conformity is prime likewise compatibility. It should be possible to open documents with old/wrong and new/correct references in upcoming and old versions.

OTOH, whether a1 is the first or the 27th column would be unclear to users. I agree that most tables are shorter. So perhaps a clear cut is better than fiddling around with compatibility.

But that's rather a decision for the ESC. And we gave enough input.
Comment 34 Heiko Tietze 2021-11-18 15:23:02 UTC
ESC discussed the topic:

=> OK to switch to openformula in Writer tables if we don't break old documents and somebody steps up to do the work (all)