Description: I ran ucalc_formula_test and testExternalRefFunctions() always failed with 503 error (538 expected) on RISC-V. I noticed that the error code was put in a NaN floating-point variable (e.g Error 538 is 0x7ff800000000021a) wrapped in a KahanSum class in interpr6.cxx:466~922. But these NaN values were added up in kahan.hxx:add(). On x86, when NaN floating-point added, the significand bits are reserved but on RISC-V they are cleared. Maybe it is just a coincidence that the test passes on x86? Perhaps the error value should be directly reserved in a double variable rather than wrapped in a KahanSum class to avoid being operated? Steps to Reproduce: (On any arch) In source code folder, run `make CppunitTest_sc_ucalc_formula CPPUNITTRACE="gdb --args"` Then in gdb, run 1. break ucalc_formula.cxx:7180 (FormulaError nErr = pFC->GetErrCode();) 2. run 3. break kahan.hxx:59 (double t = m_fSum + m_fMem;) 4. continue Error code(0x7ff800000000021a) was reserved in m_fMem. m_fSum and m_fError are calculated after add() func. Actual Results: In testExternalRefFunctions(), the result is 503 - invalid floating point operation Expected Results: Error 538 - MatrixSizeError Reproducible: Always User Profile Reset: No Additional Info: Nothing.
https://grouper.ieee.org/groups/msc/ANSI_IEEE-Std-754-2019/background/nan-propagation.pdf > The standard specifies that an operation with a quiet NaN input should produce a > quiet NaN result with the same payload. This means that for a "0 + nan", executed in KahanSum::get -> KahanSum::add, the result *must* be identical to the nan payload. > Support for payload propagation is optional on RISC-V processors. So likely, we just need to enable the propagation to force conformance to the standard?
https://groups.google.com/a/groups.riscv.org/g/isa-dev/c/g79dHlV4B_k?pli=1 meaning that the "optional support" means that a given CPU may or may not implement such propagation.
https://groups.google.com/a/groups.riscv.org/g/isa-dev/c/g79dHlV4B_k/m/dtlPNaq8AwAJ from 2016: "We considered supporting payloads but frankly could not find any example of this being used portably in practice." "we’d like to see some compelling portable use cases before mandating extra hardware on every RISC-V FPU implementation" /me wonders if meanwhile they'd heard of LibreOffice. So this just rules out RISC-V as a supported platform I guess? And we maybe should add a test to sal/qa/rtl/math/test-rtl-math.cxx or even a static_assert() in sal/rtl/math.cxx to fail early in the build process.
Eike Rathke committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/core/commit/2fd915b7d2d2a9e25600a07587ab01d23bffbc58 Related: tdf#152943 Test that a quiet NaN payload is propagated It will be available in 7.6.0. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
From https://groups.google.com/a/groups.riscv.org/g/isa-dev/c/g79dHlV4B_k/m/dtlPNaq8AwAJ "Implementors are free to provide a NaN payload propagation scheme as a nonstandard extension enabled by a nonstandard operating mode." I deduce that there may be RISC-V systems that support propagating NaN payloads. So with bad luck a build, even if it did make check on such system, would not run properly on systems that do not support the extension. We might have to have a runtime check in Calc as well and bail out if NaN payloads are not supported.
Currently NaN payload are not supported by most RISC-V machines. It might be necessary for RISC-V porters to know which component is affected by NaN payload. Are there any documents that describe how NaN payload work in LibreOffice?
No explicit documents, but Calc heavily uses quiet NaNs to transport and propagate any runtime error in the calculation interpreter engine and relies on NaN payload propagation in all math operators. See CreateDoubleError() and GetDoubleErrorValue() at https://opengrok.libreoffice.org/xref/core/include/formula/errorcodes.hxx?r=807f238f#96 and their use https://opengrok.libreoffice.org/s?refs=CreateDoubleError&project=core https://opengrok.libreoffice.org/s?refs=GetDoubleErrorValue&project=core
(In reply to Eike Rathke from comment #5) > We might have to have a runtime check in Calc as well and bail out if NaN > payloads are not supported. Yes, a compile time check cannot guarantee that this functionality works well at runtime. So what behaviour should Calc take when detects the lack of NaN payload support at runtime? As I understand it, there might be three solutions: 1. Giving a warning and shutting down calc. 2. Giving a warning and disclaimer. Calc continues running. 3. Silently switch to a non-NaN payload mode. For #1, maybe it is a bit distructive to directly rule out RISC-V as a supported platform. #2 is a transitional solution. For #3, is it possible to add this feature? According to my observation, on riscv64, Calc does return error code when a formula error occurs, but sometimes it returns wrong error code '#NUM'.
I'd personally prefer 2. 1. has the problem that one still ships libreoffice-calc. Shipping something which doesn't work (fails on start, even with a message) is not good. Or 3, but I don't think there's such a mode (yet). And if we had such a mode we wouldn't need that NaN payload thingy at all.
I prefer #2, too. I am wondering whether #2 is acceptable by community. If so, I am willing to do a bit of work on this.
#3 is not possible as there is no non-NaN-payload mode implemented, and implementing it would require not only different error handling but also different transport of errors between intermediary results up to final results of a formula expression; while technically possible I wouldn't call it feasible or viable, as the effort would be huge, add even more runtime penalty, and possibly overlooking things adds uncertainty. #2 might work to some degree, someone would have to actually try on such an inferior architecture what happens. My (hopeful) guess is that _some_ NaN is transported so there will be _some_ error (e.g. indicating numeric failure) but any detail about what specific error will be lost. But that also means that code reacting on specific error information will fail, so not everything will work.
Here is the patch which add NaN payload check at app startup. https://gerrit.libreoffice.org/c/core/+/165391 The content of the patch: 1. check nan payload on app startup 2. emit signal to call 'CreateNanPayloadWarningDialog()' 3. the factory creates dialog 4. record property 'ShowNanPayloadWarning' persistent
I have add some commits related to this bug. For builder: - Add autogen.sh option to skip NaN payload tests https://gerrit.libreoffice.org/c/core/+/174801 - Extract NaN payload test code to make it reusable https://gerrit.libreoffice.org/c/core/+/174964 - Add check at the start of some test sequence https://gerrit.libreoffice.org/c/core/+/175293 For user: - Add check at app startup and inform the risk to the user if NaN payload is not supported. https://gerrit.libreoffice.org/c/core/+/165391 I would be appreciated If someone gives me advice or helps to review the patches.
Sakura286 committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/core/commit/6a7dd6fcd95cb6086e447d1897bed750bceee8c2 tdf#152943: Add NaN payload check at the start of sc_ucalc test sequence It will be available in 25.2.0. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.