Bug 130725 - ImpSvNumberInputScan::StringToDouble may produce inaccurate result
Summary: ImpSvNumberInputScan::StringToDouble may produce inaccurate result
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
Inherited From OOo
Hardware: All All
: medium normal
Assignee: Not Assigned
URL:
Whiteboard: target:7.0.0
Keywords:
Depends on:
Blocks:
 
Reported: 2020-02-17 07:55 UTC by Mike Kaganski
Modified: 2024-03-03 01:19 UTC (History)
5 users (show)

See Also:
Crash report or crash signature:


Attachments
Google Docs doesn't have this problem (20.19 KB, image/png)
2020-02-24 17:58 UTC, Jonny Grant
Details
google sheets doesn't have the bug (24.18 KB, image/png)
2020-02-24 23:57 UTC, Jonny Grant
Details
Google Sheets indeed does not have this bug, because this is not a bug (72.99 KB, image/png)
2020-02-25 04:34 UTC, Mike Kaganski
Details
testing_subtraction_fails_with_random_values (25.20 MB, application/vnd.oasis.opendocument.spreadsheet)
2021-01-31 10:37 UTC, b.
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Mike Kaganski 2020-02-17 07:55:31 UTC
In Calc, put "0.0042" to A1, "0.0043" to A2, "=A2-A1" to A3; make sure to see 19 decimal places in A3.

Expected result:
> 0.0001000000000000000
Actual result:
> 0.0000999999999999994

The problem here is the entered string in A1 has been converted to double 0.0042000000000000006, which is not the nearest double-precision value for decimal "0.0042": 0.0041999999999999997 is. The conversion of the entered string to double happens in ImpSvNumberInputScan::StringToDouble, which handles integer and fractional parts of the string separately, and then multiplies the fractional part by a power 10 exponent. As shown, this does not guarantee the closest possible representation, resulting in immediately observable inaccuracies in following calculations.

A robust conversion is needed here.
Comment 1 Mike Kaganski 2020-02-17 08:23:26 UTC
Can't std::from_chars (locale-independent, taking expected format specification, and designed for fastest-possible throughput) be used here instead, given that the string is expected to be positive, consisting only of plain decimal digits and a dot? (ImpSvNumberInputScan::StringToDouble would need to take/convert the string to 8-bit then.)
Comment 2 Mike Kaganski 2020-02-17 10:25:03 UTC
https://gerrit.libreoffice.org/c/core/+/88854
Comment 3 Mike Kaganski 2020-02-17 11:03:23 UTC
Unfortunately, std::from_chars is still not available in our baseline, so change from comment 2 is abandoned. Eike: do you have some idea for the time being?
Comment 4 Xisco Faulí 2020-02-17 12:02:02 UTC
Reproduced in

Version: 7.0.0.0.alpha0+
Build ID: 28d844a589e52abfe62dc66b888e78665221ba28
CPU threads: 4; OS: Linux 4.19; UI render: default; VCL: gtk3; 
Locale: en-US (en_US.UTF-8); UI-Language: en-US
Calc: threaded

and

LibreOffice 3.3.0 
OOO330m19 (Build:6)
tag libreoffice-3.3.0.4
Comment 5 Oliver Brinzing 2020-02-17 17:32:57 UTC
*** Bug 130728 has been marked as a duplicate of this bug. ***
Comment 6 Jonny Grant 2020-02-17 17:51:52 UTC
Good to see my testcase has made it to to the bug tracker.

It would be better if the fraction of the decimal after the . was represented as an uint64, then we wouldn't have this problem. Spreadsheet is for accounts and invoices, and accounts work in base 10. The default should be as such.

Shoot me down if you don't agree! I'm sure another spreadsheet will come along and do this eventually.
Comment 7 Eike Rathke 2020-02-19 18:26:16 UTC
Would using rtl::math::stringToDouble() there yield better results? If applicable..
Comment 8 Mike Kaganski 2020-02-19 21:02:15 UTC
(In reply to Eike Rathke from comment #7)
> Would using rtl::math::stringToDouble() there yield better results? If
> applicable..

Unfortunately no, https://ci.libreoffice.org/job/gerrit_linux_clang_dbgutil/53635/console shows that unit test is failing with it - see "Value must be the nearest representation of decimal 0.0042" message in the log. The method uses the same technique, so no wonder :-(

Accurate and performant conversion of string to double is *not* a trivial task, unfortunately.
Comment 9 Mike Kaganski 2020-02-19 21:42:40 UTC
[1] does not use floating-point operations. It is MIT, so might be used. It is not guaranteed to give the same result as library functions, but my understanding is that the difference should be only in cases where the value is strictly in the middle between two possible representations, i.e. both are closest possible. Might worth a try using as o3tl::from_chars, to have the same signature, to allow easy transition to std:: version when it's available?

[1] https://github.com/grzegorz-kraszewski/stringtofloat
Comment 10 Mike Kaganski 2020-02-19 21:50:28 UTC
https://github.com/achan001/dtoa-fast
Comment 11 Jonny Grant 2020-02-21 15:42:17 UTC Comment hidden (no-value)
Comment 12 Eike Rathke 2020-02-24 14:56:53 UTC
Oh we love bug comments that just chime in to tell a second time that something is wrong. Maybe we should limit Calc's precision to 8 decimal digits then.
Comment 13 Jonny Grant 2020-02-24 15:04:32 UTC Comment hidden (no-value)
Comment 14 Oliver Brinzing 2020-02-24 17:31:02 UTC
(In reply to Jonny Grant from comment #13)
> Speaking frankly, I give up. Google Calc doesn't suffer this bug.

Are you sure ?

https://webapps.stackexchange.com/questions/80125/why-does-google-spreadsheets-says-zero-is-not-equals-zero

https://webapps.stackexchange.com/questions/75073/google-sheets-rounds-numbers-where-not-wanted
Comment 15 Mike Kaganski 2020-02-24 17:46:30 UTC
(In reply to Oliver Brinzing from comment #14)
> (In reply to Jonny Grant from comment #13)
> > Speaking frankly, I give up. Google Calc doesn't suffer this bug.
> 
> Are you sure ?

Jonny Grant mixes the two problems here in the crusade to achieve Casio precision. The real problem solved here in this specific bug is incorrect conversion between string containing a decimal number, and its closest IEEE 754 double precision representation. And this problem indeed is LibreOffice-specific. But Jonny Grant tries to expand the scope of the bug, which is a wrong thing to do; and then, indeed, the statement that GSheets or Excel or whatever do things differently is wrong.
Comment 16 Jonny Grant 2020-02-24 17:57:34 UTC
(In reply to Mike Kaganski from comment #15)
> (In reply to Oliver Brinzing from comment #14)
> > (In reply to Jonny Grant from comment #13)
> > > Speaking frankly, I give up. Google Calc doesn't suffer this bug.
> > 
> > Are you sure ?
> 
> Jonny Grant mixes the two problems here in the crusade to achieve Casio
> precision. The real problem solved here in this specific bug is incorrect
> conversion between string containing a decimal number, and its closest IEEE
> 754 double precision representation. And this problem indeed is
> LibreOffice-specific. But Jonny Grant tries to expand the scope of the bug,
> which is a wrong thing to do; and then, indeed, the statement that GSheets
> or Excel or whatever do things differently is wrong.

There may be a misunderstanding. I don't seek casio. I seek my accounant, my broker, my auditor to be able to see 0.001 without them suffering because LibreOffice isn't catering for users. It's a very simple problem, and a simple fix?

Mike, Many thanks for filing my bug before I could. It's good you filed it. But the accurate bug description is on the one I filed. https://bugs.documentfoundation.org/show_bug.cgi?id=130728

The screenshot is attached to that. I'll attach the screenshot here for completeness.

Easy steps to reproduce, just open any new Google Docs Spreadsheet, and type the cells as per screengrab
Comment 17 Jonny Grant 2020-02-24 17:58:17 UTC
Created attachment 158147 [details]
Google Docs doesn't have this problem

Easy steps to reproduce. Use latest Chrome browser.
Comment 18 Mike Kaganski 2020-02-24 18:02:05 UTC
(In reply to Jonny Grant from comment #16)
> Mike, Many thanks for filing my bug before I could. It's good you filed it.
> But the accurate bug description is on the one I filed.

I didn't file "your bug". I filed the real problem, that I found when checked your test case; and *this* bug is not about what you would like it to be, but is about what I filed it for.

If you want to reduce the maximum precision of default number format, please file *that* request separately; and in that request, this specific testcase (0.043-0.042) is *not* suitable, because it demonstrates a *different* problem (that is handled here).
Comment 19 Jonny Grant 2020-02-24 18:10:17 UTC
(In reply to Mike Kaganski from comment #18)
> (In reply to Jonny Grant from comment #16)
> > Mike, Many thanks for filing my bug before I could. It's good you filed it.
> > But the accurate bug description is on the one I filed.
> 
> I didn't file "your bug". I filed the real problem, that I found when
> checked your test case; and *this* bug is not about what you would like it
> to be, but is about what I filed it for.
> 
> If you want to reduce the maximum precision of default number format, please
> file *that* request separately; and in that request, this specific testcase
> (0.043-0.042) is *not* suitable, because it demonstrates a *different*
> problem (that is handled here).

What do you think of the screenshot.
Comment 20 Mike Kaganski 2020-02-24 18:39:49 UTC
(In reply to Jonny Grant from comment #19)
> What do you think of the screenshot.

As I mentioned in the AskLibO discussion, when a person claims some level of expertise, the expectations of the other side of the discussion change. I hoped that being "a comp sci", you could understand the essence of the problem here, where LibreOffice incorrectly converts entered text "0.042" into corresponding closest IEEE 754 double precision number. LibreOffice has a bug here, which is identified, and a solution is needed, and is being developed. This has been described here, right in comment 0, so hopefully you could understand that. Taking that into account, if we fix this bug, i.e. will convert "0.042" into the *closest* binary representation (which is still not exactly equal to 0.042: see [1]), then LibreOffice would automatically start showing the *same* result in this example.

Which doesn't mean that it would also show expected results everywhere: e.g., see the example I put into the FAQ [2]. So here comes *your* PoV (which *we* understand clearly): that either Calc must calculate that precisely (which requires software calculations with infinite precision, and is clear WONTFIX), or that it at least should not show that to user automatically (unless explicitly told so), i.e. to limit automatic number format precision - which is what I recommended you to file separately.

I *hope* that now you realize that we do understand you, and it's time for you to make an effort to understand *us*.

[1] https://www.binaryconvert.com/result_double.html?decimal=048046048052050
[2] https://wiki.documentfoundation.org/Faq/Calc/Accuracy
Comment 21 Jonny Grant 2020-02-24 18:58:39 UTC
Thank you for your reply Mike, I appreciate your links.

For users, personally I don't feel infinite precision is not needed. Just accurate decimal precision, up to 8 would be ideal for most accountants etc we work with. If coded in a soft decimal lib, that's only 27 bits, but I appreciate, you'd have to code this in software, as compilers just generate code for double precision.

Could I ask, does your precision patch also resolve this test case?  anyway, good to have a second test case.

30.9
30.1
0.799999999999997
Comment 22 Mike Kaganski 2020-02-24 20:30:49 UTC
(In reply to Jonny Grant from comment #21)
> Just accurate decimal precision, up to 8 would be ideal for most accountants etc
> we work with. If coded in a soft decimal lib, that's only 27 bits, but I
> appreciate, you'd have to code this in software, as compilers just generate
> code for double precision.

Oh.
LibreOffice uses IEEE 754 double precision, which has 53 bits mantissa, and so no less than 15 decimal digits precision. That is much higher than 8 decimal digits that that you look for. You seem to not understand the precision, those ...99999... confusing you.
Comment 23 Mike Kaganski 2020-02-24 22:41:07 UTC
(In reply to Jonny Grant from comment #21)
> Could I ask, does your precision patch also resolve this test case?  anyway,
> good to have a second test case.
> 
> 30.9
> 30.1
> 0.799999999999997

No, this one will continue to give the same result as Google Sheets and MS Excel (if set to show enough decimals).
Comment 24 Jonny Grant 2020-02-24 23:57:24 UTC
Created attachment 158157 [details]
google sheets doesn't have the bug
Comment 25 Mike Kaganski 2020-02-25 04:34:59 UTC
Created attachment 158160 [details]
Google Sheets indeed does not have this bug, because this is not a bug

Please stop spamming, and learn reading (I wrote that you need to set it to show enough digits); learn using the tool (discover for yourself how to control shown precision); learn what precision is (equivalency of 0.50000... and 0.49999... is useful here, studied in intermediate school); and finally, learn using bug tracker - and stop abusing this issue with unrelated messages. Thanks!
Comment 26 Jonny Grant 2020-02-25 08:02:58 UTC Comment hidden (no-value)
Comment 27 Commit Notification 2020-02-27 10:03:00 UTC
Mike Kaganski committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/f3e7004794eded346d98264d3061f4e4aa80ee0a

tdf#130725: use strtod by David M. Gay to make sure we get the nearest

It will be available in 7.0.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 28 Commit Notification 2020-02-27 10:03:13 UTC
Mike Kaganski committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/1782810f886acd26db211d8fdd7ae8796d203c57

Related: tdf#130725: use strtod also in rtl::math::stringToDouble

It will be available in 7.0.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 29 b. 2020-12-14 16:16:19 UTC Comment hidden (spam)
Comment 30 Mike Kaganski 2020-12-15 06:58:53 UTC Comment hidden (off-topic)
Comment 31 b. 2020-12-16 18:28:01 UTC Comment hidden (spam)
Comment 32 Mike Kaganski 2020-12-17 07:36:37 UTC Comment hidden (off-topic)
Comment 33 b. 2021-01-31 10:37:59 UTC
Created attachment 169318 [details]
testing_subtraction_fails_with_random_values

hello @Mike Kaganski, 

i did - testing - see attached sheet, green: good, yellow: ok, red: bad, 

not 4 trillion numbers, that wouldn't have been enough as i would have to work through the square of these possibilities for subtractions alone, but a significant sample, about a million problematic cases, randomly assembled, ~96 percent fails in calc (all tasks except the binary trivial ones), 100 percent correct with some 'auxiliary calculating', i think that's meaningful, 

preliminary note: i appreciate your talents as a programmer and your knowledge about LO calc very much and would like to praise you in the highest terms, and to the same extent your condescending arrogant attitude towards others gets on my nerves to such an extent that i would like to shake you over the head until it falls off, 

please excuse me for trying relatively persistently to contribute to the correctness of calc's calculation results, they are still a little too weak after 35 years of development, 

point of contention 1: calc calculates errors, we agree on that, they are caused by the 'floats' and 'doubles', i think also 'agree', 'on which aspect' we can already argue, you think more 'conversion and representation' and thus unavoidable, i see mostly 'cancellation' as the! problem and one can work on that, 

point 2: are better results possible? you say 'rather not', i say 'yes, see attachment', with 'formula' within the IEEE standard (15 significant digits) 100 percent correct results, calc 'standard': about 96 percent fails, outside the standard (sheet2, 16 digits) the improvements are not yet fully worked out, 

point 3: you say 'such things, effort, rounding and performance impact? NO!', i say once neatly regulated saves so many stupid questions that the effort is worth it, and the performance of the hardware has increased so much that we can afford some computing time for mathematically correct results (we also waste some on background shading and calculating the positioning of comments), 

point 4: performance: what is said to users stumbling over fp-inaccuracies? do appropriate rounding, what does that do on performance? kill ... 

point 5: performance I: i have something in the quiver / in development what is much better than these rounding crutches, i.e. something that achieves the same but is much faster, but while working on it i keep stumbling over things like rounding errors, 4-bit truncation, inaccurate comparisons that hinder the elaboration and verification, so it's not yet ready, 

besides: IEEE 754 defines calculations with 15 sig.-digit figures? thus your sample '= 0.0043 - 0.0042' means '= 
0.00430000000000000 - 
0.00420000000000000' and for that a result of 
0.0000999999999999994 is correct once you round it to the appr. number of digits  and! replace the deviating double representation with that of 
0.00010000000000000

besides: i spotted another irritation, '=4/46' displayed as 0,0869565217391304 with 16 decimal digits, and 0,086956521739131 !! *1*, wrong rounding? !! with 15 decimal digits, do you know how come? new bug? 

regards, 


b.
Comment 34 Jonny Grant 2021-02-12 12:47:03 UTC Comment hidden (no-value)
Comment 35 Mike Kaganski 2021-02-12 12:51:59 UTC Comment hidden (off-topic)
Comment 36 Jonny Grant 2021-02-12 13:16:45 UTC Comment hidden (off-topic)
Comment 37 Eike Rathke 2021-02-12 19:14:32 UTC
Fwiw, this bug was about the conversion from string to double, which was fixed with the commits mentioned. All later comments are completely off-topic and hijacking this bug for unrelated discussion. This helps nothing and no one. Au contraire, repeating those over and over again at several unrelated places and interspersing information all over the place so one can't even follow things is annoying.
Please stop it. Thank you.

Jonny's is likely bug 138920 for the double to string conversion on which I'm working since a while now, trying to get around quirks quirk by quirk.