Bug 97983 - cInt function treats , as decimal
Summary: cInt function treats , as decimal
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: BASIC (show other bugs)
Version:
(earliest affected)
3.5.0 release
Hardware: All All
: medium normal
Assignee: Andreas Heinisch
URL:
Whiteboard: target:6.5.0 target:7.4.0
Keywords:
Depends on:
Blocks: Macro-StarBasic
  Show dependency treegraph
 
Reported: 2016-02-18 15:58 UTC by tim
Modified: 2021-12-09 14:59 UTC (History)
6 users (show)

See Also:
Crash report or crash signature:
Regression By:


Attachments
Demonstration Writer document with macro (9.98 KB, application/vnd.oasis.opendocument.text)
2016-02-18 15:58 UTC, tim
Details
test cInt (9.91 KB, application/vnd.oasis.opendocument.spreadsheet)
2019-12-02 18:33 UTC, Oliver Brinzing
Details
Another test document (10.52 KB, application/vnd.oasis.opendocument.text)
2021-11-23 09:16 UTC, Tor Lillqvist
Details

Note You need to log in before you can comment on or make changes to this bug.
Description tim 2016-02-18 15:58:21 UTC
Created attachment 122772 [details]
Demonstration Writer document with macro

The cInt function treats string "1,234" as being 1.  This is true on 3 different operating systems on 2 different PCs, all of which are defined as being in the EN GB locale using '.' as the decimal point in the language settings (I have double-checked this).

I have tested this on:

ubuntu 14.04 64 bit, Version: 5.1.1.1 Build ID: 1:5.1.1~rc1-0ubuntu1~trusty1 CPU Threads: 8; OS Version: Linux 4.2; UI Render: default; Locale: en-GB (en_GB.UTF-8) (from ubuntu pre-release PPA)

Windows 7 (on same PC) Version 5.0.4.2 (from Libreoffice website)

xubuntu 15.04 (a 64 bit laptop), Version 5.0.5.2 Build ID: 1.5.0.5-rc2-0ubuntu2 Locale en:GB (en GB.UTF-8) (from ubuntu release PPA)

I don't how far back this issue goes.  It's easy to work-around, just use format to remove the ','s, but could fool people for a long time if in the middle of a complex algorithm.

I attach a Writer document demonstrating this. Press the button to see what "1,234" turns into.  The macro is in 'Module1', 'Main', and just says:

Sub Main
	Dim sText as String
	Dim iIntText as Integer
	sText = "1,234"
	iIntText = cInt (sText)
	MsgBox "1,234 becomes" & iIntText
End Sub

I originally found this using Basic in Base.  I thought I'd demonstrate it in Writer just to show it was nothing to do with Base.

Or am I being really stupid?
Comment 1 Buovjaga 2016-02-20 15:53:46 UTC
Repro.

Win 7 Pro 64-bit, Version: 5.1.0.3 (x64)
Build ID: 5e3e00a007d9b3b6efb6797a8b8e57b51ab1f737
CPU Threads: 4; OS Version: Windows 6.1; UI Render: default; 
Locale: fi-FI (fi_FI)

LibreOffice 3.5.0rc3 
Build ID: 7e68ba2-a744ebf-1f241b7-c506db1-7d53735
Comment 2 tim 2016-02-20 15:57:32 UTC
I have tried to verify that my understanding of the correct behaviour of cInt is correct.  I've not found any absolutely definitive source, although several websites recommend using it as opposed to 'Val' because it tolerates regional variations better.
Comment 3 tim 2016-02-20 16:06:00 UTC
(In reply to tim from comment #2)
> I have tried to verify that my understanding of the correct behaviour of
> cInt is correct.  I've not found any absolutely definitive source, although
> several websites recommend using it as opposed to 'Val' because it tolerates
> regional variations better.
I just found this:
http://www.csidata.com/custserv/onlinehelp/vbsdocs/vbs79.htm

Which claims to be from Microsoft for Visual Basic.  It says:

 "Use the CInt function to provide internationally aware conversions from any other data type to an Integer subtype. For example, different decimal separators are properly recognized depending on the locale setting of your system, as are different thousand separators."
Comment 4 QA Administrators 2017-03-06 15:51:32 UTC Comment hidden (obsolete)
Comment 5 tim 2017-03-06 16:55:31 UTC
The bug is still present on 64 bit ubuntu 16.10, librefoffice:

Version: 5.3.1.1
Build ID: 1:5.3.1~rc1-0ubuntu1~yakkety0
CPU Threads: 8; OS Version: Linux 4.8; UI Render: default; VCL: x11; Layout Engine: new; 
Locale: en-GB (en_GB.UTF-8); Calc: group

All Language Settings are set to GB.
Comment 6 QA Administrators 2018-03-07 03:41:54 UTC Comment hidden (obsolete)
Comment 7 tim 2018-03-07 07:52:39 UTC
Retested under Version: 6.0.2.1
Build ID: 1:6.0.2~rc1-0ubuntu0.17.10.1~lo1
CPU threads: 8; OS: Linux 4.13; UI render: GL; VCL: gtk2; 
Locale: en-GB (en_GB.UTF-8); Calc: group

The behaviour is unchanged. 1,234 is still transformed to be 1 (rather than 1234), regardless of the fact that I am in the UK, with UK settings.
Comment 8 QA Administrators 2019-03-08 03:41:07 UTC Comment hidden (obsolete)
Comment 9 tim 2019-03-08 09:48:04 UTC
The bug is still present on:

Version: 6.1.5.2
Build ID: 1:6.1.5~rc2-0ubuntu0.18.10.1~lo3
CPU threads: 8; OS: Linux 4.18; UI render: default; VCL: x11; 
Locale: en-GB (en_GB.UTF-8); Calc: group threaded
Comment 10 Commit Notification 2019-11-28 09:22:22 UTC
Andreas Heinisch committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/4fdc90c51e6a1bbb83c1f1826ad5b90dc1ff0ad6

tdf#97983 - Added localization for numeric types

It will be available in 6.5.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 11 Buovjaga 2019-12-01 15:46:15 UTC
Andreas: thanks for working on this. However, I still get "1,234 becomes1" when I press the button in the example document.

Arch Linux 64-bit
Version: 6.5.0.0.alpha0+
Build ID: 7e403195e574be5174815a51cf5c42f06f76a87a
CPU threads: 8; OS: Linux 5.3; UI render: default; VCL: kf5; 
Locale: fi-FI (fi_FI.UTF-8); UI-Language: en-US
Calc: threaded
Built on 1 December 2019
Comment 12 Andreas Heinisch 2019-12-01 17:15:56 UTC
Yes I see, I missed some of the functions without a standard parameter, I will send a follow up patch tomorrow. Thank you for the test!
Comment 13 Andreas Heinisch 2019-12-02 10:45:07 UTC
Hm, just a stupid question: with your locale (Locale: fi-FI (fi_FI.UTF-8); UI-Language: en-US) this example should return 1 and for an an English locale the example will return 1234. So I think the patch is correct.
Comment 14 Buovjaga 2019-12-02 11:26:48 UTC
(In reply to Andreas Heinisch from comment #13)
> Hm, just a stupid question: with your locale (Locale: fi-FI (fi_FI.UTF-8);
> UI-Language: en-US) this example should return 1 and for an an English
> locale the example will return 1234. So I think the patch is correct.

Argh, sorry, you are right. I tested by running with LC_ALL=C and indeed I get "1,234 becomes1234" with en-US locale.
Comment 15 Oliver Brinzing 2019-12-02 18:33:26 UTC
Created attachment 156252 [details]
test cInt

Please add an entry to the release notes, cause this change can affect existing basic code.
Comment 16 Andreas Heinisch 2019-12-02 20:40:34 UTC
Thank you for the hint! Done:
https://wiki.documentfoundation.org/ReleaseNotes/6.5#Base
Comment 17 Tor Lillqvist 2021-11-22 15:31:15 UTC
If this bug was about the cInt() function specifically, why does the commit from comment #10 seem to be much more generic? Please correct me if I am wrong and it indeed is only about changing what cInt() does.
Comment 18 Andreas Heinisch 2021-11-22 15:45:16 UTC
The patch considered all the functions which retrieve their value from ImpScan [1]. It changed all of the functions retrieving any numerical value (cLng, cByte, etc.). So the patch is more generic and not only for cInt.

https://opengrok.libreoffice.org/xref/core/basic/source/sbx/sbxint.cxx?r=aef7feb3#707
Comment 19 Tor Lillqvist 2021-11-23 09:12:17 UTC
But the patch also changed what it means, in a decimal-comma and grouping-period locale, to simply assign a string that contains a floating-point number to a variable that has been declared as Integer, even with no function involved. I think that is wrong.

Previously,

Dim i1 as Integer
i1 = "1.2"

Made i1 have the value 1.

Now it has the value 12. That is IMHO a far too intrusive change and not good.
Comment 20 Tor Lillqvist 2021-11-23 09:16:03 UTC
And if assigning to a variable declared Single, previously it got the value 1.2, and now then 12. This is particularly surprising and counter-intuitive in my humble opinion. It should be only the CInt etc functions that take the locale into consideration.
Comment 21 Tor Lillqvist 2021-11-23 09:16:57 UTC
Created attachment 176442 [details]
Another test document

Look at the Basic code in this and compare how it works in LibreOffice 6 and 7.
Comment 22 Andreas Heinisch 2021-11-23 09:44:00 UTC
Should we open a new bug report in order to change only the cInt, cLong etc? Or should we reopen this one?
Comment 23 Mike Kaganski 2021-11-23 10:02:56 UTC
Just for the record:

Historic reference: OOo 1.0.3 used decimal dot irrespective to the locale, so it was not like "*originally* in StarOffice was so that plain assignment without any function used locale-specific conversion, then LO changed that to always use decimal-period, and then later that bug was filed and it was changed back" - or at least that then would be even earlier.

Documentation reference: StarOffice 8 Programmierhandbuch für BASIC 8 [1] implicitly tells in its "Implicit and Explicit Type Conversions" section, that implicit conversions match explicit CFoo, as opposed to Val function. It is especially obvious checking the next "Checking the Content of Variables" section, where it's clear that IsNumeric and friends complement implicit conversions, like in the code (from the section):

  If IsNumeric(UserInput) Then
    ValidInput = UserInput
  Else
    ValidInput = 0
    MsgBox "Error message."
  End If

Note how this is user input-oriented; and how this would work with input like ".123", where on decimal-comma-using locales, IsNumeric(".123") is False.

All in all, having this change in the released versions for some time already, it's very difficult to have a "fix" for the situation - with current implementation being *correct*, and older being de-facto standard for quite some time ... unfortunately.

[1] See https://wiki.documentfoundation.org/Macros
Comment 24 Tor Lillqvist 2021-11-23 10:57:47 UTC
The worst thing with the patch was that it also changed the unit test that verified the expected behaviour of implicit conversion from string to number. What is the point in having unit tests that verify some expected semantics if a patch can change those semantics (and break an unknown number of existing documents) as long as it also changes the unit test?
Comment 25 Mike Kaganski 2021-11-23 11:09:06 UTC
(In reply to Tor Lillqvist from comment #24)
> The worst thing with the patch was that it also changed the unit test that
> verified the expected behaviour of implicit conversion from string to
> number.

You are unfair to this commit. The unit test was added by myself in https://git.libreoffice.org/core/+/b24b0b9856b42eb6598c053703a11f86b6a6346c, and was intended to test a different thing. Changing it after inspecting its roots is just fine (but the change itself is indeed breaking...)
Comment 26 tim 2021-11-23 11:21:07 UTC
As the originator of this report (over 5 years ago) I'm feeling a little guilty for causing so much trouble. Sorry! ;-)
Comment 27 Tor Lillqvist 2021-11-23 11:30:31 UTC
it is also fun that we did have unit tests already since 2013 that were supposed to verify the documented locale-specific behaviour of implicit conversion from strings to numbers (but which in fact did not work as documented). But the test were written in such a way that they actually tested was that implicit conversion and explicit conversion with CDbl() give the same result. Which they did also before this patch... (See basic/qa/basic_coverage/da-DK/cdbl.bas and basic/qa/basic_coverage/da-DK/cdbl-2.bas.) If I understand correctly...
Comment 28 Andreas Heinisch 2021-11-23 11:33:57 UTC
I think discussing changes and even changing the underlying unit tests should be acceptable since a patch may change the current behaviour.

This patch was added with the best of knowledge and belief and atm there were no regression reports in bugzilla. I don't know which behaviour should be targeted now, but I am open to any changes which we may decide.
Comment 29 Mike Kaganski 2021-11-23 12:21:51 UTC Comment hidden (off-topic)
Comment 30 Mike Kaganski 2021-11-23 12:26:37 UTC Comment hidden (off-topic)
Comment 31 Tor Lillqvist 2021-11-23 12:38:17 UTC Comment hidden (off-topic)
Comment 32 Tor Lillqvist 2021-11-24 10:42:57 UTC
Suggested revert and re-work to affect *only* the CInt, CLng, CByte, CSng, and CDbl functions in https://gerrit.libreoffice.org/c/core/+/125756
Comment 33 Mike Kaganski 2021-11-26 22:16:18 UTC
Please consider also this:

  Dim d As Double
  Dim s As String
  d = 1.23
  s = d
  d = s
  MsgBox d

Note that prior to the change the result was unexpectedly 1.
Comment 34 Tor Lillqvist 2021-11-29 09:26:42 UTC
It is totally irrelevant whether how BASIC behaved before this change was "unexpected" or not. (To me, most of BASIC is "unexpected" and silly.) What is relevant is that it behaved in a certain way, for tens of years, and users wrote code accordingly. Whether the behaviour matched some old German language documentation that very few had read or not is irrelevant. Now after this change, the same code works differently.
Comment 35 Mike Kaganski 2021-11-29 09:36:20 UTC
(In reply to Tor Lillqvist from comment #34)

I agree wrt breakage :-) What I write here is for the completeness of the whole picture. We have multiple problems in our Basic; and IMO trying to reject any improvements on the basis that it would be a breakage is not constructive per se; we need to allow some breakage potential *if there's a reasonable workaround*, like easy enough fix using compatible functions, if we ever want to allow some progress.

We had a problem "similar" to this one (in its spirit at least; IMO at least ;)) - see https://mikekaganski.wordpress.com/2019/11/09/when-legacy-justifies-errors/. We were able to introduce a change, still maintaining backward compatibility. I doubt here we have some similar possibility, though.

Tor: of course, it's up to you to do the change. Please don't feel blocked by my opinion.
Comment 36 Commit Notification 2021-12-09 11:07:20 UTC
Tor Lillqvist committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/47aabde053a1472dc32770da29d68c8de5bd7919

Make the tdf#97983 changes to BASIC optional

It will be available in 7.4.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 37 tim 2021-12-09 13:41:03 UTC
Having raised this 5 years ago, a change was made 2 to 3 years later, and then it went quiet until this year.  I'm not into all the details, but it seems strange that what I imagined was standard practice in knowing which locale was in use, and therefore how , and . should be treated, has created so many problems.  Is it really a BASIC problem?

No matter. I'm happy to test something when a formal release comes out.  But if it is 'optional' how do I specify what behaviour I expect?
Comment 38 Mike Kaganski 2021-12-09 13:50:39 UTC
(In reply to tim from comment #37)
> Is it really a BASIC problem?

Locales are hard. Really hard :) And having an inconsistent implementation of a widely-used function is really problematic. So it is ... well, it is our Basic problem, and compatibility problem, and general difficult topic problem :-)

> But if it is 'optional' how do I specify what behaviour I expect?

The change is created in such a way, so that the standard behavior is as it was before the change; and there is an expert configuration to enable old (pre-7.0) behavior for people who *really know what they do*. So for all standard uses, this commit does not change behavior.
Comment 39 tim 2021-12-09 14:05:05 UTC
(In reply to Mike Kaganski from comment #38)
> (In reply to tim from comment #37)
> > Is it really a BASIC problem?
> 
> Locales are hard. Really hard :) And having an inconsistent implementation
> of a widely-used function is really problematic. So it is ... well, it is
> our Basic problem, and compatibility problem, and general difficult topic
> problem :-)
> 
> > But if it is 'optional' how do I specify what behaviour I expect?
> 
> The change is created in such a way, so that the standard behavior is as it
> was before the change; and there is an expert configuration to enable old
> (pre-7.0) behavior for people who *really know what they do*. So for all
> standard uses, this commit does not change behavior.
Thanks.

Please forgive me being dim, but which 'standard' behaviour?  The behaviour before this latest change, or the behaviour before the change that was made in or around 2019 in response to my original query?
Comment 40 Mike Kaganski 2021-12-09 14:19:46 UTC
(In reply to tim from comment #39)
> which 'standard' behaviour?  The behaviour
> before this latest change, or the behaviour before the change that was made
> in or around 2019 in response to my original query?

I used word "standard" to mean "the behavior that user sees after installing, without any special configuration". So my phrase was to explain what the "standard" behavior will be (it will be same as before the patch mentioned in commit 36). (Sorry for use of unclear language :))

To make it unambiguous:

1. Before commit 4fdc90c51e6a1bbb83c1f1826ad5b90dc1ff0ad6 (comment 10, landed in LO 7.0), cInt, cDbl, cFlt, and implicit conversion from string to number did not consider locale.

2. The mentioned commit changed all those to use current locale (and thus all conversions using those functions / implicit conversions became locale-aware, so might break).

3. Commit 47aabde053a1472dc32770da29d68c8de5bd7919 (comment 36) introduced an expert configuration, which is *not* enabled, but which, *when enabled*, makes *implicit conversion* of string into number locale-unaware again - so unless you enable it, the behavior will be as it was after #2; and if you enable it, the behavior will be as before #2 - *only for implicit conversion* (CDbl/CFlt will not be affected by the setting!).

Hope it makes sense.
Comment 41 tim 2021-12-09 14:59:43 UTC
(In reply to Mike Kaganski from comment #40)
> (In reply to tim from comment #39)
> > which 'standard' behaviour?  The behaviour
> > before this latest change, or the behaviour before the change that was made
> > in or around 2019 in response to my original query?
> 
> I used word "standard" to mean "the behavior that user sees after
> installing, without any special configuration". So my phrase was to explain
> what the "standard" behavior will be (it will be same as before the patch
> mentioned in commit 36). (Sorry for use of unclear language :))
> 
> To make it unambiguous:
> 
> 1. Before commit 4fdc90c51e6a1bbb83c1f1826ad5b90dc1ff0ad6 (comment 10,
> landed in LO 7.0), cInt, cDbl, cFlt, and implicit conversion from string to
> number did not consider locale.
> 
> 2. The mentioned commit changed all those to use current locale (and thus
> all conversions using those functions / implicit conversions became
> locale-aware, so might break).
> 
> 3. Commit 47aabde053a1472dc32770da29d68c8de5bd7919 (comment 36) introduced
> an expert configuration, which is *not* enabled, but which, *when enabled*,
> makes *implicit conversion* of string into number locale-unaware again - so
> unless you enable it, the behavior will be as it was after #2; and if you
> enable it, the behavior will be as before #2 - *only for implicit
> conversion* (CDbl/CFlt will not be affected by the setting!).
> 
> Hope it makes sense.
Thanks very much - all clear!