Created attachment 176001 [details] And RTF doc made with Microsoft Word using a lowercase range - this is correctly imported In Microsoft Word I entered lowercase for a range it saved SUM(a1:a3) in the RTF file. LibreOffice Writer correctly changed this to SUM(<A1:A3>) when importing. But if I manually enter a lowercase range SUM(<a1:a3>) in LibreOffice Writer its not tolerated. The value changes to 0 and when I go back to the formula editor the formula has become SUM(<?:?>). Perhaps the rules for ranges is uppercase only but this seems overly strict and unfriendly since we are inside angle brackets (meaning range). Writer should tolerate lowercase ranges or convert it to uppercase.
Repro with Version: 7.2.2.2 (x64) / LibreOffice Community Build ID: 02b2acce88a210515b4a5bb2e46cbfb63fe97d56 CPU threads: 12; OS: Windows 10.0 Build 19043; UI render: default; VCL: win Locale: ru-RU (ru_RU); UI: en-US Calc: threaded and also with OOo 3.2.0. E.g., creating a table 2x2 in a new document, and entering the formula into A1: =sum(<b1:b2>) results in "expression is faulty"; while this formula: =sum(<B1:B2>) gives proper result.
The edit shell for Calc tables corrects lowercase entered cell refs to uppercase in resulting formula. So this is about Writer Table cell formula entry *only*. Can a similar input validation be added to the edit shell for Writer tables? Not a regression, but adding to bug 123370 All documentation (help & the 7.2 Writer Guide) shows cell refs Upper case. But has no caution or note of effect of using a cell reference entered with lower case. Strange it has never been reported against Writer, also checked the AOO bugzilla as well with no clear dupe.
haven't looked closer into the code, but POSSIBLY related. https://opengrok.libreoffice.org/xref/core/sw/source/core/bastyp/calc.cxx?r=78cee244#SwCalc
currently reading... https://opengrok.libreoffice.org/xref/core/sw/source/core/table/swtable.cxx?r=546e7d14&fi=GetBoxNum#1276
Perhaps use standard functions for that code: if (!isalpha(cChar)) break if (islower(cChar)) cChar = toupper(cChar); cChar -= 'A'; To replace: if ((cChar<'A' || cChar>'Z') && (cChar<'a' || cChar>'z')) break; cChar -= 'A'; if( cChar >= 26 ) cChar -= 'a' - '[';
No, never use isalpha(), islower(), toupper() or similar standard C/C++ library functions if you want to operate on exactly ASCII characters, because these functions may be localized and thus yield unexpected results. Instead, use rtl::isAsciiAlpha() and rtl::toAsciiUpperCase()
Thanks for the info, Eike. I am new to LibreOffice. That makes my suggestion: if (!rtl::isAsciiAlpha(cChar)) break if (rtl::isAsciiLowerCase(cChar)) cChar = rtl::toAsciiUpperCase(cChar); cChar -= 'A';
(In reply to Dave McKellar from comment #7) And you even may remove one check that is already built into the inline toAsciiUpperCase [1]: > if (!rtl::isAsciiAlpha(cChar)) > break; > cChar = rtl::toAsciiUpperCase(cChar) - 'A'; That would be a nice cleanup, and I see how it would improve the code; but I don't see how would that *change* current behavior. [1] https://opengrok.libreoffice.org/xref/core/include/rtl/character.hxx?r=b1fb6338#290
For calc don't we use it like that for lcl_a1_get_col () in address.cxx [1]. We wouldn't need it to be as complex, but validating input of writer table cell addresses seems worth doing. =-ref-= [1] https://opengrok.libreoffice.org/xref/core/sc/source/core/tool/address.cxx?r=96f97eb0#924
I did a test and my change does fix the code. Sorry if this post is too big. I made a test program. First with the original code and some print's: #include <stdio.h> #include <ctype.h> void convertName(const char *name) { printf("name in=%s\n", name); for (const char *p = name; *p; p++) { char cChar = *p; if ((cChar<'A' || cChar>'Z') && (cChar<'a' || cChar>'z')) break; cChar -= 'A'; if( cChar >= 26 ) cChar -= 'a' - '['; printf("out cChar=0x%02x\n", cChar); } printf("\n"); } int main() { convertName("A55"); convertName("a55"); return 0; } Which prints out: name in=A55 out cChar=0x00 name in=a55 out cChar=0x1a In other words, lowercase and uppercase get different results. Then I added one toupper() line: #include <stdio.h> #include <ctype.h> void convertName(const char *name) { printf("name in=%s\n", name); for (const char *p = name; *p; p++) { char cChar = *p; if ((cChar<'A' || cChar>'Z') && (cChar<'a' || cChar>'z')) break; if (islower(cChar)) cChar = toupper(cChar); // NEW NEW cChar -= 'A'; if( cChar >= 26 ) cChar -= 'a' - '['; printf("out cChar=0x%02x\n", cChar); } printf("\n"); } int main() { convertName("A55"); convertName("a55"); return 0; } It has the same output for uppercase and lowercase: name in=A55 out cChar=0x00 name in=a55 out cChar=0x00
by the way, what do you think this line of the orignal code is doing ? num = num * 52 + cChar;
Good question! At that point cChar is binary zero for 'A' and binary 25 for 'Z'. Adding 52 to that gets nonsense ASCII: 'A' becomes '4' and 'Z' becomes 'M'. So its probably not an ASCII conversion. (The difference between 'A' and 'a' is 32 - sort of like 52.) I guess its making it into the range that the caller of that function expects. Why?, not sure.
(In reply to Dave McKellar from comment #10) Ah - indeed 'a' - '[' == 6, not 32 as it should be to convert to lowercase. Please do submit your fix to gerrit!
(In reply to himajin100000 from comment #11) > by the way, what do you think this line of the orignal code is doing ? > num = num * 52 + cChar; This looks a separate problem, doing nonsense to handle two-character column numbers like AA.
Now thinking about it a bit more It seems that the address is case-sensitive, with A != a, and the full character repertoire being uppercase then lowercase, with Z followed by a, and z followed by AA. Then 52 makes perfect sense being 26*2.
> It seems that the address is case-sensitive, with A != a, and the full > character repertoire being uppercase then lowercase, with Z followed by a, > and z followed by AA. Then 52 makes perfect sense being 26*2. Its doing base 52 math.
Heh, so it is NOTABUG. Try to insert a table with e.g. 60 columns, and check its columns :-)
But in my original bug it changed the formula. That seems wrong. Even if it was out of range.
(In reply to Dave McKellar from comment #18) The same way as when you try to put '=<K1>' in a table with three columns. Unrelated to upper-vs-lowercase, and specific to the way Writer's tables validate formula references. Self-consistent in a sense... I am very disappointed to learn that Writer uses totally different column naming convention compared to Calc. But anyway, there is help missing on this topic.
It seems very unfriendly to delete something the user entered. Better to keep the formula and display "invalid reference" (or something) for the value. Since the difference between Writer and Calc doesn't appear to be documented, perhaps Writer can be changed and then documented. To help users transition if it sees lowercase call names it can display a message. That would be best in the long run.
Fun detail is that it's even can't be justified "for compatibility with Word", since Word uses normal 26-base reference scheme. Wrt "unfriendly to delete something the user entered": IIUC, the formula text is never kept, it's parsed, tokenized, and the result is stored. Whatever can't be tokenized is not stored. Without much change that would highlight the problem, keeping the formula intact and only displaying "invalid reference" would make fixing formulas next to impossible, when user doesn't see which part is wrong.
(In reply to Dave McKellar from comment #20) > Since the difference between Writer and Calc doesn't appear to be > documented, perhaps Writer can be changed and then documented. To help > users transition if it sees lowercase call names it can display a message. > That would be best in the long run. It *looks* reasonable, but dangerous. Personally I support this - but I'd suggest you to post a message to the dev mailing list asking for opinions on the proposal.
(In reply to Mike Kaganski from comment #22) > It *looks* reasonable, but dangerous. Personally I support this - but I'd > suggest you to post a message to the dev mailing list asking for opinions on > the proposal. OTOH, the formula is written into file format using this convention, so the change of this, however undocumented, convention would be incompatible.
On unparsable formulas: If you enter a syntax error like: +++++ The formula is retained but the value "** Expression is faulty **" is displayed. But if you enter something wrong inside angle brackets - eg <junk> it is changed into <?> That's inconsistent. Perhaps the representation of a range could be expanded to hold things it can not parse as a string.
Created attachment 176108 [details] sample Writer ODF with 60 column table, using cell formulas So in this attached Writer ODF with a 60 column table, it shows the column notation is -- A-Z,a-z,AA-AZ, etc. It probably would be better to make the cell labeling in Writer match the labeling in Calc, i.e A-Z,AA-AZ,BA-BZ, etc. That is, to eliminate the a-z labeling like calc by using the rtl::isAsciiAlpha() and rtl::toAsciiUpperCase() conversion as Eike suggested in comment 6 There would need to be a convert on edit/save for any existing documents, but IMHO consistency would be better in the long run.
(In reply to V Stuart Foote from comment #25) > Created attachment 176108 [details] > sample Writer ODF with 60 column table, using cell formulas > but IMHO consistency would be better in the long run. For what it is worth, Word 2019 chokes on the ODF.
(In reply to Mike Kaganski from comment #17) > Heh, so it is NOTABUG. > Try to insert a table with e.g. 60 columns, and check its columns :-) It's still a bug because ODF 9.2.1 Referencing Table Cells https://docs.oasis-open.org/office/OpenDocument/v1.3/os/part3-schema/OpenDocument-v1.3-os-part3-schema.html#__RefHeading__1415614_253892949 says that after Z the next column 27 is AA. However, if this is corrected then a) import must take care of converting everything above A-Z to the proper notation b) writing the proper values of AA... to file will let older versions address wrong columns c) when reading file and encountering AA... it needs to decide whether that could be the old broken notation for value 53... or the new 27..., likely by inspecting whether it fits into the actual table width d) though not recommended and frowned upon, it may be necessary to act on the <meta:generator> value if that is neither LibreOffice nor OpenOffice then assume the proper ODF conform values were written; if it is LibreOffice then its version number could help to determine how to handle c) So, to remedy b) somewhat, when fixing this in master apply changes for current release branches that when encountering AA... do c) but do *not* the conversion of a) and b) to not propagate the fixed values to even older release versions that don't know anything about this. Good luck ;-)
(In reply to Eike Rathke from comment #27) > It's still a bug because ODF 9.2.1 Referencing Table Cells > https://docs.oasis-open.org/office/OpenDocument/v1.3/os/part3-schema/ > OpenDocument-v1.3-os-part3-schema.html#__RefHeading__1415614_253892949 > says that after Z the next column 27 is AA. Yay! :-D > Good luck ;-) I second this.
Issue seems to be clear (with a lot of facets). If we keep lowercase and accept =sum(<a1:a3>) it might break backward compatibility. Removing needsUXEval.
(In reply to Heiko Tietze from comment #29) > Issue seems to be clear (with a lot of facets). If we keep lowercase and > accept =sum(<a1:a3>) it might break backward compatibility. > > Removing needsUXEval. Sigh, "fixing" this would require major refactoring of Writer tables. Where using =sum(<a1:a3>) is perfectly acceptable cell reference to 1st, 2nd and 3rd row of the *27th* column of a Writer table! If they don't exist in the table, it alerts. In a Calc sheet those cells would be referenced by <AA1:AA3>. And, entering that same range in lower case is corrected by Calc edit shell to <A1:A3>. ODF standard does not permit addressing cells with lower case--they are converted in Calc. Calc complies with ODF, Writer does not. But this is an UX issue requiring assessment of what extent do we need to accommodate the prior usage? And how quickly this needs to be corrected? UX Considerations: 1) how common are tables in our Writer .ODT wider than 26 columns? 2) how disruptive would it be changing Writer formula input to ODF standard? 3) can we let the current addressing remain unchanged? 4) are we obliged to maintain backward compatibility of Writer table addressing for use on prior builds? 5) is it better to fully convert a Writer document to new "corrected" ODF compliance, and when (on opening, or on save)? Or should we attempt more complex handling of supporting both addressing schemes? To me it is pretty simple. "Zero" complaints from OOo/AOO/LO means few if any users of Writer have bumped into the issue of mixed-case addressing in Writer tables wider than 26 columns. They are not common! Not much disruption by fixing it correctly to comply with the ODF document model. Frankly we must correct it (or the world questions our standards compliance--we have enough issues with SVG already). The last two are a bit tougher, and interlinked. I'd argue strongly that the best action is to aggressively refactor handling of formula addressing of Writer tables to be ODF table cell address compliant--eliminate lowercase 52*2 addressing. Then match the Calc shell validation to convert lowercase -> upper case cell ref input. For simplicity .ODT filter import should convert incoming Table cell formulas from the A-Z,a-z,AA-AZ cell addressing to ODF compliant A-Z,AA-AZ,BA-BZ. And we should write back any formulas in ODF compliant form. The document save will complete the conversion--only closing without save should abandon conversion. Any new Writer table formulas will be correct, any old documents would be irrevocably converted. Yes someone on an old build of LO opening a new document would have formula issues--but then the old version is not an ODF compliant renderer. We may annoy a few users, but there needs to be clear UX agreement of the approach to correcting this. To me a clean break is better than muddled attempt to accommodate legacy Writer table cell addressing.
Changing Writer cell addressing to be the same as Calc means the code could be shared too. Besides slightly reducing the codebase this has a UX benefit: the two applications will be in sync for future improvements or bug fixes.
(In reply to Dave McKellar from comment #31) > Changing Writer cell addressing to be the same as Calc means the code could > be shared too. Wishful thinking, but no, it does not. They are completely different implementations and suit different needs.
(In reply to V Stuart Foote from comment #30) > We may annoy a few users, but there needs to be clear UX agreement of the > approach to correcting this. To me a clean break is better than muddled > attempt to accommodate legacy Writer table cell addressing. ODF conformity is prime likewise compatibility. It should be possible to open documents with old/wrong and new/correct references in upcoming and old versions. OTOH, whether a1 is the first or the 27th column would be unclear to users. I agree that most tables are shorter. So perhaps a clear cut is better than fiddling around with compatibility. But that's rather a decision for the ESC. And we gave enough input.
ESC discussed the topic: => OK to switch to openformula in Writer tables if we don't break old documents and somebody steps up to do the work (all)
Dear Dave McKellar, To make sure we're focusing on the bugs that affect our users today, LibreOffice QA is asking bug reporters and confirmers to retest open, confirmed bugs which have not been touched for over a year. There have been thousands of bug fixes and commits since anyone checked on this bug report. During that time, it's possible that the bug has been fixed, or the details of the problem have changed. We'd really appreciate your help in getting confirmation that the bug is still present. If you have time, please do the following: Test to see if the bug is still present with the latest version of LibreOffice from https://www.libreoffice.org/download/ If the bug is present, please leave a comment that includes the information from Help - About LibreOffice. If the bug is NOT present, please set the bug's Status field to RESOLVED-WORKSFORME and leave a comment that includes the information from Help - About LibreOffice. Please DO NOT Update the version field Reply via email (please reply directly on the bug tracker) Set the bug's Status field to RESOLVED - FIXED (this status has a particular meaning that is not appropriate in this case) If you want to do more to help you can test to see if your issue is a REGRESSION. To do so: 1. Download and install oldest version of LibreOffice (usually 3.3 unless your bug pertains to a feature added after 3.3) from https://downloadarchive.documentfoundation.org/libreoffice/old/ 2. Test your bug 3. Leave a comment with your results. 4a. If the bug was present with 3.3 - set version to 'inherited from OOo'; 4b. If the bug was not present in 3.3 - add 'regression' to keyword Feel free to come ask questions or to say hello in our QA chat: https://web.libera.chat/?settings=#libreoffice-qa Thank you for helping us make LibreOffice even better for everyone! Warm Regards, QA Team MassPing-UntouchedBug
As requested, I retested this issue in the current version of Libre Writer 24.8.1.2 and it still happens. Lowercase ranges are not tolerated.