Calc version 6.2 introduced a regression with a possibility of data corruption. This affects locales where the grouping separator is not comma. Steps to reproduce: 1) use a locale like fi_FI for the document 2) enter a value like 12345,67 into a worksheet cell 3) pick a number format like "# ##0,00" for the cell 4) copy the cell to clipboard 5) enter another cell, type a starting = and paste The result is Err:509 in the target cell and the pasted value has an extra zero before the decimal separator, i.e. =123450,67 If the worksheet with such an error is closed and then reopened, the error status gets cleared but the value in the target cell is the one with an extra zero. This happens with both the provided Windows binary (6.2.0.3) and a Linux version (6.2.1.1) compiled from source. If the originating cell has a format with no group separator or if the group separator of the locale is comma, everything works like before. The reason behind the regression appears to be an incomplete fix to bug 42518, since reverting commits 673bd16887c354981120f7ebea44b25b0ab2e41e and 9336286a7ea5385541344f444e6f8702c85bdacb is enough to build a binary without the regression.
I reproduce the bug with locale fr-FR: - Version: 6.2.0.3 (x64) Build ID: 98c6a8a1c6c7b144ce3cc729e34964b47ce25d62 Threads CPU : 2; OS : Windows 6.1; UI Render : par défaut; VCL: win; Locale : fr-FR (fr_FR); Langue IHM : fr-FR Calc: CL I do NOT reproduce with: - Version: 6.1.5.1 (x64) Build ID: f18954c1ba9116b85c32b6bdbc0188d3e0fd24c7 Threads CPU : 2; OS : Windows 6.1; UI Render : par défaut; Locale : fr-FR (fr_FR); Calc: CL
Same bug with new profile Same bug with more or less decimals Same bug with locale en-US No bug without '=' No bug without thousand separator So procedure can be simplified to: 1. Enter a value larger than 1000 2. Add thousand separator 3. Copy cell 4. Type a starting '=' and paste Behavior will depend on value, if it contains '0' or not: some '0' are removed in pasted string 1,000 => 1,0 1,001 => 1,1 1,034,000 => 1,34,0 1,034,001 => 1,34,1 1,034,010 => 1,34,10 103,000 => 103,0 103,001 => 103,1 103,011 => 103,11 I was not able to add an extra zero.
Entering a formula it was never guaranteed to accept group separators in values (as in many locales the group separator also is the parameter separator ','). It may have worked by chance in some locales that use a NO-BREAK SPACE as group separator, but already using an ordinary SPACE character instead made that attempt fail; or if parameter separator was set to ';' a group separator of ',' may have worked. I can investigate why an extra zero is added in 12345,67 => =123450,67 of the original description (fi-FI locale). But in an en-US locale with default ',' parameter separator things like 1,000 => 1,0 are logical because it is parsed as number 1 separator , number 0 Certainly the result of SUM(1,234) shall not be 1234 but 235 instead.. This was not different in earlier versions.
(In reply to Eike Rathke from comment #3) > Entering a formula it was never guaranteed to accept group separators in > values (as in many locales the group separator also is the parameter > separator ','). It may have worked by chance in some locales that use a > NO-BREAK SPACE as group separator, but already using an ordinary SPACE > character instead made that attempt fail; or if parameter separator was set > to ';' a group separator of ',' may have worked. There's still the problem that this happens also when what's copied on the clipboard is the entire cell, not just its displayed value. Changing how a number is formatted in one cell probably shouldn't cause Err:509 in another. Is there a reason to use the display formatted string in paste to a non-string context when the numeric value is also available? > I can investigate why an extra zero is added in > 12345,67 => =123450,67 > of the original description (fi-FI locale). Thanks.
Jouni: thanks for your report and investigation. For bisecting, we have special repos that contain binaries for commits (or commit ranges in the older ones): https://wiki.documentfoundation.org/QA/Bibisect
What happens is that 12_345,67 (here with '_' for the no-break space) is lexically tokenized to 12 _345 ,67 which then leads to three numbers 12 and 345 (because there in the number parser the no-break space is just a leading space) and 0,67 (from which stems the extra 0 because a leading decimal separator is allowed), which then with the formula tokens concatenated as strings produces 123450,67
(In reply to Buovjaga from comment #5) > Jouni: thanks for your report and investigation. For bisecting, we have > special repos that contain binaries for commits (or commit ranges in the > older ones): https://wiki.documentfoundation.org/QA/Bibisect I see that this issue has already been marked as bisected, so I'll just drop this here for completeness' sake: $ git bisect good 9ab19b74662842c303ef72c33f6045cb701ae299 is the first bad commit commit 9ab19b74662842c303ef72c33f6045cb701ae299 Author: Jenkins Build User <tdf@pollux.tdf> Date: Sat Dec 1 03:52:19 2018 +0100 source 9336286a7ea5385541344f444e6f8702c85bdacb source 9336286a7ea5385541344f444e6f8702c85bdacb :040000 040000 1c2382d2d7f50dab3c61ab8d29a824698e3c41cc 9070851048d99c2944c6643546ccbc1ff445e357 M instdir As expected, that source sha is for the commit for the intended fix in bug 42518 comment 24 mentioned in the original report.
(In reply to Eike Rathke from comment #6) > What happens is that 12_345,67 (here with '_' for the no-break space) is > lexically tokenized to > > 12 > _345 > ,67 > > which then leads to three numbers 12 and 345 (because there in the number > parser the no-break space is just a leading space) and 0,67 (from which > stems the extra 0 because a leading decimal separator is allowed), which > then with the formula tokens concatenated as strings produces 123450,67 That would explain the extra zero and also the Err:509 afterwards, since that ends up producing three numeric tokens with no operators in between. It doesn't explain *why* this happens or why it *doesn't* happen when there is a decimal separator but no group separator. Both 6.1 and 6.2 have now been released with this behavior and I can't find anything related in gerrit, is this being worked on or should I just try to figure out some kind of a fix myself?
(In reply to Jouni Kosonen from comment #8) > Both 6.1 and 6.2 have now been released with this behavior and I can't find > anything related in gerrit, is this being worked on or should I just try to > figure out some kind of a fix myself? If you could do that, it would be great and I think Eike would be happy to review your patch.
(In reply to Buovjaga from comment #9) > (In reply to Jouni Kosonen from comment #8) > > Both 6.1 and 6.2 have now been released with this behavior and I can't find > > anything related in gerrit, is this being worked on or should I just try to > > figure out some kind of a fix myself? > > If you could do that, it would be great and I think Eike would be happy to > review your patch. That could take a while, not least because Gentoo Linux just pulled the 6.2.1.2 ebuild I've been using to tinker with this and the 6.2.2.2 they replaced it with doesn't actually build. As far as I can tell, the best way to fix this without breaking bug 42518 again would be to process the group separator so that the formula parser never sees it was there in the first place. I'll look into it but Ḯ'm not overly optimistic.
I think I see some of what's happening now. In i18npool/source/characterclassification/cclass_unicode_parser.cxx cclass_Unicode::initParserTable, KParseTokens::GROUP_SEPARATOR_IN_NUMBER isn't on in nContTypes, but I don't know why. I can't seem to find where that bit is supposed to be set. This means that the cGroupSep code point doesn't get the ParserFlags::VALUE bit set and is actually set to zero itself. This then means that the code in cclass_Unicode::parseText handling cGroupSep not only doesn't look at the right codepoint, it doesn't even get run because it is conditioned on the ParserFlags::VALUE bit and the group separator is handled as nMask ParserFlags::(VALUE_SEP | WORD_SEP | CHAR_DONTCARE). This ssBounces the parser out of the ssGetValue state and the separator gets reevaluated in a new ssGetChar state as nMask ParserFlags::CHAR_WORD. That would be why the value gets emitted as sequence multiple tokens instead of a single parsed number. The numbers following the group separator get parsed in the same ssGetWord state that started from the group separator. If the value ends with a decimal part, that comma ends the word and gets evaluated outside the parseText state machine and appended to the sequence. TLDR: I get a working binary if I simply set the ParserFlags::VALUE bit on for the group separator code point regardless of KParseTokens::GROUP_SEPARATOR_IN_NUMBER being off in nContTypes, but that would presumably have some other effect elsewhere.
Created attachment 150588 [details] Minimal patch to restore functionality This appears to be the simplest patch. If the locale provides a decimal group separator, use the new KParseTokens::GROUP_SEPARATOR_IN_NUMBER to indicate this so that it doesn't get zeroed out. The patch was made against 6.2.3.1 sources.
(In reply to Jouni Kosonen from comment #12) > Created attachment 150588 [details] > Minimal patch to restore functionality > > This appears to be the simplest patch. If the locale provides a decimal > group separator, use the new KParseTokens::GROUP_SEPARATOR_IN_NUMBER to > indicate this so that it doesn't get zeroed out. > > The patch was made against 6.2.3.1 sources. Could you make a patch against master and submit it to gerrit? https://wiki.documentfoundation.org/Development/gerrit
(In reply to Jouni Kosonen from comment #12) > Created attachment 150588 [details] > Minimal patch to restore functionality The patch is wrong. It unconditionally adds the flag for every locale no matter what parser flags the caller set and thus eliminates the purpose of the flag. If at all then the flag should be set at caller level. But as described earlier, doing so may also lead to unexpected behaviour if the group separator equals the function parameter separator. This is a rather twisted situation, and things interfere also with ScCompiler's lexical analysis. Actually the i18npool parser is called only because the ScCompiler parser decided it would not handle the character (left for investigation if and how it could be solved at that level).
(In reply to Eike Rathke from comment #14) > The patch is wrong. It unconditionally adds the flag for every locale no > matter what parser flags the caller set and thus eliminates the purpose of > the flag. If at all then the flag should be set at caller level. But as > described earlier, doing so may also lead to unexpected behaviour if the > group separator equals the function parameter separator. This is a rather > twisted situation, and things interfere also with ScCompiler's lexical > analysis. Actually the i18npool parser is called only because the ScCompiler > parser decided it would not handle the character (left for investigation if > and how it could be solved at that level). Yes, but as I understand it there isn't a single line anywhere in the codebase that would set the flag – as things currently stand the group separator is unconditionally broken for all locales, some just don't show the symptoms. The ScCompiler parser may have decided it wouldn't handle the character but then the i18npool parser isn't allowed to recognize it.
(In reply to Jouni Kosonen from comment #15) > Yes, but as I understand it there isn't a single line anywhere in the > codebase that would set the flag That is the point. It should only be set under very specific circumstances in a well known surrounding. Which so far does not exist.
(In reply to Eike Rathke from comment #16) > (In reply to Jouni Kosonen from comment #15) > > Yes, but as I understand it there isn't a single line anywhere in the > > codebase that would set the flag > That is the point. It should only be set under very specific circumstances > in a well known surrounding. Which so far does not exist. Well, no. By your own comment about the flag, leaving it *unset* introduces new, unspecified behaviour and should only be done under very specific circumstances. Instead, that unspecified behaviour is now the default action and there's no way around it. The way the code is now, KParseTokens::GROUP_SEPARATOR_IN_NUMBER would need to be on by default unless someone explicitly clears it for reasons best known to themselves.
Without the change it was unspecified behaviour, in that it sometimes by chance "worked" under some circumstances to include a group separator in a number, and sometimes parsed a group separator away even if it wasn't meant as a group separator. With the change and KParseTokens::GROUP_SEPARATOR_IN_NUMBER not set it is specified behaviour that group separators in formulas are not ignored and end a number.
Taking a stab at this.
Eike Rathke committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/core/+/4e3e0c6744b3fc33c7f4da7bd48136b861e9ed58%5E%21 Resolves: tdf#123752 allow group separator in formula values if possible It will be available in 6.4.0. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Eike Rathke committed a patch related to this issue. It has been pushed to "libreoffice-6-3": https://git.libreoffice.org/core/+/c4eb7dddf047a613fee78188a65b36f5215b4c21%5E%21 Resolves: tdf#123752 allow group separator in formula values if possible It will be available in 6.3.0.1. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Verified Arch Linux 64-bit Version: 6.4.0.0.alpha0+ Build ID: 6c31c2b01dd32cc7ba1230f2c4a98b8f7def219b CPU threads: 8; OS: Linux 5.1; UI render: default; VCL: gtk3; Locale: fi-FI (fi_FI.UTF-8); UI-Language: en-US Calc: threaded Built on 5 June 2019
Patching with c4eb7dddf047a613fee78188a65b36f5215b4c21 does restore the functionality in 6.2.4.2.
Xisco Fauli committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/core/commit/213430e0bdac0786b30a76a68b43d35647e93912 tdf#123752, tdf#138432: sc_uicalc: Add unittest It will be available in 7.3.0. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.