Bug 125978 - Unreliable float comparisons in bridgetest
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version: 6.2.4.2 release (earliest affected)
Hardware: All All
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: patch
Depends on:
Blocks: Dev-Bugs
Reported: 2019-06-18 08:02 UTC by Marcus Tomlinson
Modified: 2023-09-06 12:56 UTC
CC List: 4 users

See Also:
Crash report or crash signature:


Attachments
bridgetest patch (2.28 KB, patch)
2019-06-18 08:04 UTC, Marcus Tomlinson

Description Marcus Tomlinson 2019-06-18 08:02:54 UTC
Description:
Hi, we've (Ubuntu) been seeing a fair bit of flakiness lately (particularly on s390x) from: testtools/source/bridgetest/bridgetest.cxx

Looking into it, we found the culprit to be a series of 'float == float' checks. Sure, most of the time these comparisons work, but now and then you'll get a rounding error that breaks the test. Ideally the test should compare floats by checking that their difference is within the bounds of an epsilon, would you agree?
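
A minimal sketch of what such an epsilon-based comparison could look like follows. It is not the attached patch: the helper name approxEqual and the default tolerance of four machine epsilons are assumptions chosen purely for illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

// Hypothetical helper (not taken from the attached patch): returns true when
// a and b differ by no more than relTol, scaled by the larger magnitude of
// the two operands (with a floor of 1.0 so values near zero still compare).
// The default tolerance of 4 * epsilon is an assumption, not a value from
// the bug report.
bool approxEqual(float a, float b,
                 float relTol = 4 * std::numeric_limits<float>::epsilon())
{
    if (a == b)
        return true;                      // exact match, including equal infinities
    if (!std::isfinite(a) || !std::isfinite(b))
        return false;                     // NaN or mismatched infinities never match
    float const scale = std::max({1.0f, std::fabs(a), std::fabs(b)});
    return std::fabs(a - b) <= relTol * scale;
}
```

With a helper like this, a check written as 'value == expected' would become 'approxEqual(value, expected)'. Scaling the tolerance by the operands' magnitude keeps the check meaningful for both small and large values, which a fixed absolute epsilon would not.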

I've attached a patch for bridgetest.cxx that fixes the issue there, but the same fix would also need to be applied to the two cli tests and to testiadaptor.

Downstream bug report: https://bugs.launchpad.net/ubuntu/+source/libreoffice/+bug/1832360

Steps to Reproduce:
1. Run bridgetest repeatedly

Actual Results:
Occasionally you'll get:
### float does not match! failed

Expected Results:
Test passes every time.


Reproducible: Sometimes


User Profile Reset: No



Additional Info:
Comment 1 Marcus Tomlinson 2019-06-18 08:04:33 UTC
Created attachment 152263
bridgetest patch
Comment 2 Julien Nabet 2019-06-18 08:58:58 UTC
Thank you for the proposed patch!

If you are interested in contributing on the dev side, you can take a look at https://wiki.documentfoundation.org/Development/GetInvolved
Here are the steps to sum up:
- clone repo
- provide license statement
- create an account on gerrit
- submit your patch on gerrit

Since I suppose you've already cloned the repo, analyzed the problem and created the patch, you did the biggest part! :-)
Comment 3 Xisco Faulí 2019-08-06 14:39:09 UTC
Dear Marcus Tomlinson,
Could you please submit the patch to Gerrit so core developers can review it there? <https://wiki.documentfoundation.org/Development/gerrit/SubmitPatch>
Moving to NEW
Comment 4 Stephan Bergmann 2019-08-06 14:46:38 UTC
(In reply to Xisco Faulí from comment #3)
> Dear Marcus Tomlinson,
> Could you please submit the  patch to gerrit so core developers can review
> it there?

...and provide a link here to that Gerrit patch, please
Comment 5 Caolán McNamara 2019-08-07 09:20:48 UTC
Was this on real hardware or under Hercules?
Comment 6 Marcus Tomlinson 2019-08-07 10:07:47 UTC
(In reply to Caolán McNamara from comment #5)
> was this on real hardware or under hercules ?

Real hardware
Comment 7 Stephan Bergmann 2019-08-07 10:52:16 UTC
...and were the repeated runs of bridgetest (of which some failed in a way that would be fixed by your patch and some succeeded) all done on the same hardware?  Looks a bit odd to me that such failures would happen nondeterministically.
Comment 8 Marcus Tomlinson 2019-08-07 10:57:56 UTC
(In reply to Stephan Bergmann from comment #7)
> ...and were the repeated runs of bridgetest (of which some failed in a way
> that would be fixed by your patch and some succeeded) all done on the same
> hardware?  Looks a bit odd to me that such failures would happen
> nondeterministically.

Same hardware, but a different KVM instance for each run. Weird indeed.
Comment 9 Stephan Bergmann 2019-08-07 11:00:51 UTC
Anyway, if you could upload the patch to Gerrit (and please do the paperwork at <https://wiki.documentfoundation.org/Development/Developers#License_Statements>), the fix should be uncontroversial to get in.
Comment 10 Stephan Bergmann 2023-08-24 06:31:20 UTC
As reported in comments 6 and 10 of <https://bugzilla.redhat.com/show_bug.cgi?id=2136459> "test failure in rawhide/s390x - float does not match!":

"When I last looked into this well-known-on-s390x sporadic failure in September 2020, I inconclusively noted that it 'looks more like a heisenbug related to floating-point behavior'.  (I.e., it passes some hardcoded floating-point value around and then compares with ==.  IIRC, the values were always printing identically in a debugger when I tried to debug that, but still == occasionally failed.  So it wasn't like the values were wildly off and thus clearly indicating an actual bug somewhere in the LibreOffice code.)  The back-then disabled-for-s390x `make unitcheck slowcheck` has since been enabled with <https://src.fedoraproject.org/rpms/libreoffice/c/5be3141a5b44a2d2fc236a676ec2e8325a7a0036> 'renable check for s390x'.  If this particular sporadic failure hits frequently enough, we might want to disable that one test for s390x for the time being?"

and

"I built libreoffice with:
fedpkg build --scratch --arches s390x
20 times in a row with a F38 target without failure. I'm not really convinced the bug is gone, but I can't trigger it."
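
Regarding the remark quoted above that the values printed identically in a debugger while == still failed: one way to tell whether two floats really hold the same value is to log their raw bit patterns alongside a round-trippable decimal form. The sketch below is only an illustration of that idea. The helpers floatBits and describeFloat are hypothetical names, not part of bridgetest, and the code assumes a 32-bit float.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical diagnostic helper: returns the raw bit pattern of a float.
// Assumes float is 32 bits wide, which the static_assert enforces.
std::uint32_t floatBits(float f)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);  // well-defined way to reinterpret the bytes
    return bits;
}

// Hypothetical diagnostic helper: formats a float with 9 significant digits
// (enough to round-trip a 32-bit float), as a hex float, and as raw bits.
std::string describeFloat(float f)
{
    char buf[96];
    std::snprintf(buf, sizeof buf, "%.9g (%a, bits 0x%08x)",
                  f, f, static_cast<unsigned>(floatBits(f)));
    return buf;
}
```

Logging describeFloat for both operands at the failing check would show whether they differ in their bits. If the bit patterns match and == still fails, the difference presumably arises somewhere other than in the stored values, though that is speculation and not something this report confirms.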

And as reported at <https://lists.freedesktop.org/archives/libreoffice/2023-August/090788.html> "Re: Tests failures trying to package LO in Fedora":

"For s390x we get the floating point precision test failure described [in this issue here]. Applying the patch proposed in the bug report didn't help."