145384 – Writer table cells have address scheme that differs from Calc, "A-Z,a-z,AA-AZ,..." vs "A-Z,AA-AZ,BA-BZ,..."; lowercase cell refs are more error prone for cell formulas

Bug 145384 - Writer table cells have address scheme that differs from Calc, "A-Z,a-z,AA-AZ,..." vs "A-Z,AA-AZ,BA-BZ,..."; lowercase cell refs are more error prone for cell formulas

Summary: Writer table cells have address scheme that differs from Calc, "A-Z,a-z,AA-A...

Status:	NEW

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	Writer (show other bugs)
Version: (earliest affected)	Inherited From OOo
Hardware:	All All

Importance:	medium normal
Assignee:	Not Assigned

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:	ODF-import Writer-Tables-Formulas
	Show dependency tree / graph

Reported:	2021-10-29 14:15 UTC by Dave McKellar
Modified:	2024-12-11 18:35 UTC (History)
CC List:	7 users (show)

See Also:	137764
Crash report or crash signature:

Attachments
And RTF doc made with Microsoft Word using a lowercase range - this is correctly imported (12.67 KB, application/msword) 2021-10-29 14:15 UTC, Dave McKellar	Details
sample Writer ODF with 60 column table, using cell formulas (11.00 KB, application/vnd.oasis.opendocument.text) 2021-11-04 18:45 UTC, V Stuart Foote	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Dave McKellar 2021-10-29 14:15:00 UTC

Created attachment 176001 [details]
And RTF doc made with Microsoft Word using a lowercase range - this is correctly imported

In Microsoft Word I entered lowercase for a range it saved SUM(a1:a3) in the RTF file.  LibreOffice Writer correctly changed this to SUM(<A1:A3>) when importing.

But if I manually enter a lowercase range SUM(<a1:a3>) in LibreOffice Writer its not tolerated.  The value changes to 0 and when I go back to the formula editor the formula has become SUM(<?:?>).  Perhaps the rules for ranges is uppercase only but this seems overly strict and unfriendly since we are inside angle brackets (meaning range).  Writer should tolerate lowercase ranges or convert it to uppercase.

Comment 1 Mike Kaganski 2021-10-29 15:07:08 UTC

Repro with Version: 7.2.2.2 (x64) / LibreOffice Community
Build ID: 02b2acce88a210515b4a5bb2e46cbfb63fe97d56
CPU threads: 12; OS: Windows 10.0 Build 19043; UI render: default; VCL: win
Locale: ru-RU (ru_RU); UI: en-US
Calc: threaded

and also with OOo 3.2.0.

E.g., creating a table 2x2 in a new document, and entering the formula into A1:

=sum(<b1:b2>)

results in "expression is faulty"; while this formula:

=sum(<B1:B2>)

gives proper result.

Comment 2 V Stuart Foote 2021-10-29 15:52:30 UTC

The edit shell for Calc tables corrects lowercase entered cell refs to uppercase in resulting formula.  

So this is about Writer Table cell formula entry *only*.

Can a similar input validation be added to the edit shell for Writer tables?

Not a regression, but adding to bug 123370

All documentation (help & the 7.2 Writer Guide) shows cell refs Upper case. But has no caution or note of effect of using a cell reference entered with lower case.

Strange it has never been reported against Writer, also checked the AOO bugzilla as well with no clear dupe.

Comment 3 himajin100000 2021-11-01 13:03:49 UTC Comment hidden (obsolete)

haven't looked closer into the code, but POSSIBLY related.

https://opengrok.libreoffice.org/xref/core/sw/source/core/bastyp/calc.cxx?r=78cee244#SwCalc

Comment 4 himajin100000 2021-11-01 17:41:50 UTC

currently reading...

https://opengrok.libreoffice.org/xref/core/sw/source/core/table/swtable.cxx?r=546e7d14&fi=GetBoxNum#1276

Comment 5 Dave McKellar 2021-11-01 18:04:58 UTC

Perhaps use standard functions for that code:

   if (!isalpha(cChar)) break
   if (islower(cChar)) cChar = toupper(cChar);
   cChar -= 'A';

To replace:

    if ((cChar<'A' || cChar>'Z') && (cChar<'a' || cChar>'z'))
        break;
    cChar -= 'A';
    if( cChar >= 26 )
        cChar -= 'a' - '[';

Comment 6 Eike Rathke 2021-11-03 20:46:30 UTC

No, never use isalpha(), islower(), toupper() or similar standard C/C++ library functions if you want to operate on exactly ASCII characters, because these functions may be localized and thus yield unexpected results.

Instead, use rtl::isAsciiAlpha() and rtl::toAsciiUpperCase()

Comment 7 Dave McKellar 2021-11-03 20:55:00 UTC

Thanks for the info, Eike.  I am new to LibreOffice.  That makes my suggestion:

   if (!rtl::isAsciiAlpha(cChar)) break
   if (rtl::isAsciiLowerCase(cChar)) cChar = rtl::toAsciiUpperCase(cChar);
   cChar -= 'A';

Comment 8 Mike Kaganski 2021-11-04 06:33:16 UTC

(In reply to Dave McKellar from comment #7)

And you even may remove one check that is already built into the inline toAsciiUpperCase [1]:

> if (!rtl::isAsciiAlpha(cChar))
>     break;
> cChar = rtl::toAsciiUpperCase(cChar) - 'A';

That would be a nice cleanup, and I see how it would improve the code; but I don't see how would that *change* current behavior.

[1] https://opengrok.libreoffice.org/xref/core/include/rtl/character.hxx?r=b1fb6338#290

Comment 9 V Stuart Foote 2021-11-04 13:47:07 UTC

For calc don't we use it like that for lcl_a1_get_col () in address.cxx [1]. We wouldn't need it to be as complex, but validating input of writer table cell addresses seems worth doing.

=-ref-=

[1] https://opengrok.libreoffice.org/xref/core/sc/source/core/tool/address.cxx?r=96f97eb0#924

Comment 10 Dave McKellar 2021-11-04 15:24:37 UTC

I did a test and my change does fix the code.  Sorry if this post is too big.
I made a test program.  First with the original code and some print's:

#include <stdio.h>
#include <ctype.h>

void convertName(const char *name) {
	printf("name in=%s\n", name);
        for (const char *p = name; *p; p++) {
		char cChar = *p;
		if ((cChar<'A' || cChar>'Z') && (cChar<'a' || cChar>'z'))
			break;
		cChar -= 'A';
		if( cChar >= 26 )
			cChar -= 'a' - '[';
		printf("out cChar=0x%02x\n", cChar);
	}
	printf("\n");
}

int main() {
	convertName("A55");
	convertName("a55");
	return 0;
}

Which prints out:

name in=A55
out cChar=0x00

name in=a55
out cChar=0x1a

In other words, lowercase and uppercase get different results.

Then I added one toupper() line:

#include <stdio.h>
#include <ctype.h>

void convertName(const char *name) {
	printf("name in=%s\n", name);
        for (const char *p = name; *p; p++) {
		char cChar = *p;
		if ((cChar<'A' || cChar>'Z') && (cChar<'a' || cChar>'z'))
			break;
		if (islower(cChar)) cChar = toupper(cChar); // NEW NEW
		cChar -= 'A';
		if( cChar >= 26 )
			cChar -= 'a' - '[';
		printf("out cChar=0x%02x\n", cChar);
	}
	printf("\n");
}

int main() {
	convertName("A55");
	convertName("a55");
	return 0;
}

It has the same output for uppercase and lowercase:

name in=A55
out cChar=0x00

name in=a55
out cChar=0x00

Comment 11 himajin100000 2021-11-04 15:30:37 UTC

by the way, what do you think this line of the orignal code is doing ?
num = num * 52 + cChar;

Comment 12 Dave McKellar 2021-11-04 15:53:16 UTC

Good question!
At that point cChar is binary zero for 'A' and binary 25 for 'Z'.
Adding 52 to that gets nonsense ASCII: 'A' becomes '4' and 'Z' becomes 'M'.
So its probably not an ASCII conversion.
(The difference between 'A' and 'a' is 32 - sort of like 52.)
I guess its making it into the range that the caller of that function expects.
Why?, not sure.

Comment 13 Mike Kaganski 2021-11-04 15:56:58 UTC

(In reply to Dave McKellar from comment #10)

Ah - indeed 'a' - '[' == 6, not 32 as it should be to convert to lowercase. Please do submit your fix to gerrit!

Comment 14 Mike Kaganski 2021-11-04 15:58:28 UTC

(In reply to himajin100000 from comment #11)
> by the way, what do you think this line of the orignal code is doing ?
> num = num * 52 + cChar;

This looks a separate problem, doing nonsense to handle two-character column numbers like AA.

Comment 15 Mike Kaganski 2021-11-04 16:09:55 UTC

Now thinking about it a bit more

It seems that the address is case-sensitive, with A != a, and the full character repertoire being uppercase then lowercase, with Z followed by a, and z followed by AA. Then 52 makes perfect sense being 26*2.

Comment 16 Dave McKellar 2021-11-04 16:12:29 UTC

> It seems that the address is case-sensitive, with A != a, and the full
> character repertoire being uppercase then lowercase, with Z followed by a,
> and z followed by AA. Then 52 makes perfect sense being 26*2.

Its doing base 52 math.

Comment 17 Mike Kaganski 2021-11-04 16:13:19 UTC

Heh, so it is NOTABUG.
Try to insert a table with e.g. 60 columns, and check its columns :-)

Comment 18 Dave McKellar 2021-11-04 16:16:02 UTC

But in my original bug it changed the formula.  That seems wrong.  Even if it was out of range.

Comment 19 Mike Kaganski 2021-11-04 16:22:58 UTC

(In reply to Dave McKellar from comment #18)

The same way as when you try to put '=<K1>' in a table with three columns. Unrelated to upper-vs-lowercase, and specific to the way Writer's tables validate formula references. Self-consistent in a sense...

I am very disappointed to learn that Writer uses totally different column naming convention compared to Calc. But anyway, there is help missing on this topic.

Comment 20 Dave McKellar 2021-11-04 16:29:20 UTC

It seems very unfriendly to delete something the user entered.  Better to keep the formula and display "invalid reference" (or something) for the value.

Since the difference between Writer and Calc doesn't appear to be documented, perhaps Writer can be changed and then documented.  To help users transition if it sees lowercase call names it can display a message.  That would be best in the long run.

Comment 21 Mike Kaganski 2021-11-04 16:36:41 UTC

Fun detail is that it's even can't be justified "for compatibility with Word", since Word uses normal 26-base reference scheme.

Wrt "unfriendly to delete something the user entered": IIUC, the formula text is never kept, it's parsed, tokenized, and the result is stored. Whatever can't be tokenized is not stored. 

Without much change that would highlight the problem, keeping the formula intact and only displaying "invalid reference" would make fixing formulas next to impossible, when user doesn't see which part is wrong.

Comment 22 Mike Kaganski 2021-11-04 16:39:09 UTC

(In reply to Dave McKellar from comment #20)
> Since the difference between Writer and Calc doesn't appear to be
> documented, perhaps Writer can be changed and then documented.  To help
> users transition if it sees lowercase call names it can display a message. 
> That would be best in the long run.

It *looks* reasonable, but dangerous. Personally I support this - but I'd suggest you to post a message to the dev mailing list asking for opinions on the proposal.

Comment 23 Mike Kaganski 2021-11-04 16:42:39 UTC

(In reply to Mike Kaganski from comment #22)
> It *looks* reasonable, but dangerous. Personally I support this - but I'd
> suggest you to post a message to the dev mailing list asking for opinions on
> the proposal.

OTOH, the formula is written into file format using this convention, so the change of this, however undocumented, convention would be incompatible.

Comment 24 Dave McKellar 2021-11-04 16:54:21 UTC

On unparsable formulas:
If you enter a syntax error like: +++++
The formula is retained but the value "** Expression is faulty **" is displayed.
But if you enter something wrong inside angle brackets - eg <junk> it is changed into <?>
That's inconsistent.  Perhaps the representation of a range could be expanded
to hold things it can not parse as a string.

Comment 25 V Stuart Foote 2021-11-04 18:45:17 UTC

Created attachment 176108 [details]
sample Writer ODF with 60 column table, using cell formulas

So in this attached Writer ODF with a 60 column table, it shows the column notation is -- A-Z,a-z,AA-AZ, etc.

It probably would be better to make the cell labeling in Writer match the labeling in Calc, i.e  A-Z,AA-AZ,BA-BZ, etc. 

That is, to eliminate the a-z labeling like calc by using the rtl::isAsciiAlpha() and rtl::toAsciiUpperCase() conversion as Eike suggested in comment 6

There would need to be a convert on edit/save for any existing documents, but IMHO consistency would be better in the long run.

Comment 26 V Stuart Foote 2021-11-04 19:01:50 UTC

(In reply to V Stuart Foote from comment #25)
> Created attachment 176108 [details]
> sample Writer ODF with 60 column table, using cell formulas
> but IMHO consistency would be better in the long run.

For what it is worth, Word 2019 chokes on the ODF.

Comment 27 Eike Rathke 2021-11-05 13:07:06 UTC

(In reply to Mike Kaganski from comment #17)
> Heh, so it is NOTABUG.
> Try to insert a table with e.g. 60 columns, and check its columns :-)

It's still a bug because ODF 9.2.1 Referencing Table Cells
https://docs.oasis-open.org/office/OpenDocument/v1.3/os/part3-schema/OpenDocument-v1.3-os-part3-schema.html#__RefHeading__1415614_253892949
says that after Z the next column 27 is AA.

However, if this is corrected then
a) import must take care of converting everything above A-Z to the proper notation
b) writing the proper values of AA... to file will let older versions address wrong columns
c) when reading file and encountering AA... it needs to decide whether that could be the old broken notation for value 53... or the new 27..., likely by inspecting whether it fits into the actual table width
d) though not recommended and frowned upon, it may be necessary to act on the <meta:generator> value if that is neither LibreOffice nor OpenOffice then assume the proper ODF conform values were written; if it is LibreOffice then its version number could help to determine how to handle c)

So, to remedy b) somewhat, when fixing this in master apply changes for current release branches that when encountering AA... do c) but do *not* the conversion of a) and b) to not propagate the fixed values to even older release versions that don't know anything about this.

Good luck ;-)

Comment 28 Mike Kaganski 2021-11-05 13:13:24 UTC

(In reply to Eike Rathke from comment #27)
> It's still a bug because ODF 9.2.1 Referencing Table Cells
> https://docs.oasis-open.org/office/OpenDocument/v1.3/os/part3-schema/
> OpenDocument-v1.3-os-part3-schema.html#__RefHeading__1415614_253892949
> says that after Z the next column 27 is AA.

Yay! :-D

> Good luck ;-)

I second this.

Comment 29 Heiko Tietze 2021-11-10 08:31:13 UTC

Issue seems to be clear (with a lot of facets). If we keep lowercase and accept =sum(<a1:a3>) it might break backward compatibility.

Removing needsUXEval.

Comment 30 V Stuart Foote 2021-11-10 14:34:33 UTC

(In reply to Heiko Tietze from comment #29)
> Issue seems to be clear (with a lot of facets). If we keep lowercase and
> accept =sum(<a1:a3>) it might break backward compatibility.
> 
> Removing needsUXEval.

Sigh, "fixing" this would require major refactoring of Writer tables. Where using =sum(<a1:a3>) is perfectly acceptable cell reference to 1st, 2nd and 3rd row of the *27th* column of a Writer table! If they don't exist in the table, it alerts.

In a Calc sheet those cells would be referenced by <AA1:AA3>. And, entering that same range in lower case is corrected by Calc edit shell to <A1:A3>. ODF standard does not permit addressing cells with lower case--they are converted in Calc.

Calc complies with ODF, Writer does not.

But this is an UX issue requiring assessment of what extent do we need to accommodate the prior usage?  And how quickly this needs to be corrected? 

UX Considerations:
1) how common are tables in our Writer .ODT wider than 26 columns?
2) how disruptive would it be changing Writer formula input to ODF standard?
3) can we let the current addressing remain unchanged?
4) are we obliged to maintain backward compatibility of Writer table addressing for use on prior builds?
5) is it better to fully convert a Writer document to new "corrected" ODF compliance, and when (on opening, or on save)? Or should we attempt more complex handling of supporting both addressing schemes?


To me it is pretty simple. "Zero" complaints from OOo/AOO/LO means few if any users of Writer have bumped into the issue of mixed-case addressing in Writer tables wider than 26 columns. They are not common!  Not much disruption by fixing it correctly to comply with the ODF document model.  Frankly we must correct it (or the world questions our standards compliance--we have enough issues with SVG already).

The last two are a bit tougher, and interlinked. I'd argue strongly that the best action is to aggressively refactor handling of formula addressing of Writer tables to be ODF table cell address compliant--eliminate lowercase 52*2 addressing. Then match the Calc shell validation to convert lowercase -> upper case cell ref input.  

For simplicity .ODT filter import should convert incoming Table cell formulas from the A-Z,a-z,AA-AZ cell addressing to ODF compliant A-Z,AA-AZ,BA-BZ. And we should write back any formulas in ODF compliant form. The document save will complete the conversion--only closing without save should abandon conversion.

Any new Writer table formulas will be correct, any old documents would be irrevocably converted. Yes someone on an old build of LO opening a new document would have formula issues--but then the old version is not an ODF compliant renderer.

We may annoy a few users, but there needs to be clear UX agreement of the approach to correcting this. To me a clean break is better than muddled attempt to accommodate legacy Writer table cell addressing.

Comment 31 Dave McKellar 2021-11-11 14:03:04 UTC Comment hidden (noise)

Changing Writer cell addressing to be the same as Calc means the code could be shared too.  Besides slightly reducing the codebase this has a UX benefit: the two applications will be in sync for future improvements or bug fixes.

Comment 32 Eike Rathke 2021-11-12 12:49:23 UTC Comment hidden (noise)

(In reply to Dave McKellar from comment #31)
> Changing Writer cell addressing to be the same as Calc means the code could
> be shared too.
Wishful thinking, but no, it does not. They are completely different implementations and suit different needs.

Comment 33 Heiko Tietze 2021-11-16 11:56:48 UTC

(In reply to V Stuart Foote from comment #30)
> We may annoy a few users, but there needs to be clear UX agreement of the
> approach to correcting this. To me a clean break is better than muddled
> attempt to accommodate legacy Writer table cell addressing.

ODF conformity is prime likewise compatibility. It should be possible to open documents with old/wrong and new/correct references in upcoming and old versions.

OTOH, whether a1 is the first or the 27th column would be unclear to users. I agree that most tables are shorter. So perhaps a clear cut is better than fiddling around with compatibility.

But that's rather a decision for the ESC. And we gave enough input.

Comment 34 Heiko Tietze 2021-11-18 15:23:02 UTC

ESC discussed the topic:

=> OK to switch to openformula in Writer tables if we don't break old documents and somebody steps up to do the work (all)

Comment 35 QA Administrators 2024-09-19 03:17:09 UTC Comment hidden (obsolete)

Dear Dave McKellar,

To make sure we're focusing on the bugs that affect our users today, LibreOffice QA is asking bug reporters and confirmers to retest open, confirmed bugs which have not been touched for over a year.

There have been thousands of bug fixes and commits since anyone checked on this bug report. During that time, it's possible that the bug has been fixed, or the details of the problem have changed. We'd really appreciate your help in getting confirmation that the bug is still present.

If you have time, please do the following:

Test to see if the bug is still present with the latest version of LibreOffice from https://www.libreoffice.org/download/

If the bug is present, please leave a comment that includes the information from Help - About LibreOffice.

If the bug is NOT present, please set the bug's Status field to RESOLVED-WORKSFORME and leave a comment that includes the information from Help - About LibreOffice.

Please DO NOT

Update the version field
Reply via email (please reply directly on the bug tracker)
Set the bug's Status field to RESOLVED - FIXED (this status has a particular meaning that is not
appropriate in this case)

If you want to do more to help you can test to see if your issue is a REGRESSION. To do so:
1. Download and install oldest version of LibreOffice (usually 3.3 unless your bug pertains to a feature added after 3.3) from https://downloadarchive.documentfoundation.org/libreoffice/old/

2. Test your bug
3. Leave a comment with your results.
4a. If the bug was present with 3.3 - set version to 'inherited from OOo';
4b. If the bug was not present in 3.3 - add 'regression' to keyword

Feel free to come ask questions or to say hello in our QA chat: https://web.libera.chat/?settings=#libreoffice-qa

Thank you for helping us make LibreOffice even better for everyone!

Warm Regards,
QA Team

MassPing-UntouchedBug

Comment 36 Dave McKellar 2024-09-19 18:57:19 UTC

As requested, I retested this issue in the current version of Libre Writer 24.8.1.2 and it still happens.  Lowercase ranges are not tolerated.