Bug 137312 - Calculation error on block formulas (ver 6.4 - 7.1?) - AMD qfx driver issue, '26.20.11016.6002' bad, '27.20.12027.1001' good -
Summary: Calculation error on block formulas (ver 6.4 - 7.1?) - AMD qfx driver issue, ...
Status: RESOLVED NOTOURBUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
6.4.6.2 release
Hardware: x86-64 (AMD64) Windows (All)
: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-10-07 14:18 UTC by Ronald Lammers
Modified: 2020-11-10 16:26 UTC (History)
7 users (show)

See Also:
Crash report or crash signature:


Attachments
Test case (24.04 KB, application/vnd.oasis.opendocument.spreadsheet)
2020-10-07 14:19 UTC, Ronald Lammers
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Ronald Lammers 2020-10-07 14:18:09 UTC
Description:
A sumproduct-formula gives the correct result when adjacent cells do not contain the same formula. If the formula is copied to a range, the result is wrong. Copying cell by cell gives correct results. Recalculation does not have any effect unless a cell in a range with correct results is active: then recalculation results in the wrong results (for that range). This is not the case when a "lonely" cell containing the formula is selected, then the result stays correct.
When the input-range is limited to 112 rows the problem disappears.

Steps to Reproduce:
1.Open attached sheet
2.Compare formulas in cells D3, E3, I3 and K3 
3.Press F9 (no Change), Press shft-ctrl-F9 (col I shows wrong results)
4.Delete rows 115 and below (All cells have correct results)

Actual Results:
1313360

Expected Results:
2191946 is the correct result for the cells in row 3


Reproducible: Always


User Profile Reset: Yes



Additional Info:
Calculate correctly? Problem is probably some memory overflow when a formula is in a block of the same formulas (multiple identical formulas may be in the same memory space resulting in an overflow when the range is larger than abt 100 cells)
Comment 1 Ronald Lammers 2020-10-07 14:19:03 UTC
Created attachment 166161 [details]
Test case
Comment 2 Ronald Lammers 2020-10-07 14:27:55 UTC
I tested version 6.4.6.2 as well. That version has the same problem.
Comment 3 m_a_riosv 2020-10-07 16:48:35 UTC

*** This bug has been marked as a duplicate of bug 125320 ***
Comment 4 Telesto 2020-10-08 06:18:52 UTC

*** This bug has been marked as a duplicate of bug 137248 ***
Comment 5 Telesto 2020-10-08 06:20:05 UTC

*** This bug has been marked as a duplicate of bug 125320 ***
Comment 6 b. 2020-10-08 09:36:57 UTC
checked, repro of wrong values in file, no repro of recalc not working, see: https://bugs.documentfoundation.org/show_bug.cgi?id=125320#c7

check about evolving of fail not possible as no exact description, check of version not possible as not named, 

i would fear a 'sporadic fail' 

at '100 cells' - exactly - there is a threshold as threading by openCL or 'enable multi-threaded calculation' steps in above that value ...
Comment 7 Ronald Lammers 2020-10-08 10:13:16 UTC
Tried again, I get the problem as described on versions 7.0 and 6.4.6, reproducable all the time.
Tried on another laptop using version 6.4.5: recalc worked and the problem disappeared.

Apart from the versions (which I think are not the problem) the difference between the two laptops is the CPU. The first one is AMD, the second one Intel.

Might this be a problem with openCL not working correctly (all the time) on AMD-processors? (the problem-laptop has a Ryzen 5 3500U)
Comment 8 Telesto 2020-10-08 16:14:37 UTC
@Stuart
Any ideas?

I don't repro (Intel), b. doesn't repro.

BTW; to be sure. Ronald,  does this also happen in safe mode?
Comment 9 Ronald Lammers 2020-10-08 21:41:55 UTC
Yes, the problem is also there in safe mode. Same behaviour.
Comment 10 b. 2020-10-09 04:10:44 UTC
(In reply to Ronald Lammers from comment #7)

> Might this be a problem with openCL not working correctly (all the time) on
> AMD-processors? (the problem-laptop has a Ryzen 5 3500U)

hello @Ronald, 

did / can you try what happens with openCL disabled? the 'parallelizing-weaknesses' i had found in the past cleanly disappeared with unchecked openCL and! unchecked 'threading calculations', (sorry, dunno if safe mode disables parallelizing), 

with a not and a working configuration you should be able to narrow down the problem very precisely :-)
Comment 11 Ronald Lammers 2020-10-09 07:50:37 UTC
Disabling OpenCL solves the problem.
Disabling multi-threading does not solve the problem.
Comment 12 Ronald Lammers 2020-10-09 07:53:21 UTC
Just to be precise: I do not need to disable multithreading to get the sheet to calculate correctly. Just disabling OpenCL is enough.
Comment 13 Telesto 2020-10-09 21:07:31 UTC
@Michael and/or Luboš 
There appears to be some kind of OpenCL flaw here or a driver issue. No clue which info is needed to investigate this... (so in need for some dev knowledge) 

Marking it unconfirmed for now. Not seeing OpenCL explicitly mentioned in bug 125320
Comment 14 b. 2020-10-10 10:28:39 UTC
tried hard, couldn't produce wrong results here, even with openCL enabled ... 

@Ronald: as already 'copying' can be done in various ways i'm not sure if i tried the exact steps which fail for you, would you mind: 

- providing the info about the failing and the 'correct' system in detail, 'help - about LO', 

- provide a step by step description how to produce the fail, 'left click on xxx ... select paste from options ... ' that we can be sure to redraw it exactly, 

tia

is somebody around with an AMD machine to do a recheck?
Comment 15 Ronald Lammers 2020-10-10 11:52:00 UTC
Failing system:

Libreoffice
Version: 6.4.6.2 (x64)
Build ID: 0ce51a4fd21bff07a5c061082cc82c5ed232f115
CPU threads: 8; OS: Windows 10.0 Build 18363; UI render: GL; VCL: win; 
Locale: nl-NL (en_NL); UI-Language: en-US
Calc: CL

System
AMD Ryzen 5 3500U with Radeon Vega Mobile Gfx 2.10Ghz
8 Gb Ram, 2 Gb reserved for Gfx
Gfx-driver version 26.20.11016.6002
Win 10 Home 64bit version 1909, build 18363.1082


Reproduce error:

Open uploaded sheet
Use mouse to select D3
Copy formula using F2, shft-Home, Ctrl-C, Esc
Use mouse to select G3
Ctrl-V
Ctrl-C
Goto end of range using left left, ctrl-dn, right, right
Ctrl-V
Select entire range using shft-ctrl-up
Ctrl-V

After the last Ctrl-V the range is recalculated (including the first cell) and shows the wrong results.
Comment 16 Ronald Lammers 2020-10-10 11:54:39 UTC
Working system:

Libreoffice
Version: 6.4.6.2 (x64)
Build ID: 0ce51a4fd21bff07a5c061082cc82c5ed232f115
CPU threads: 8; OS: Windows 10.0 Build 18363; UI render: GL; VCL: win; 
Locale: nl-NL (en_NL); UI-Language: en-US
Calc: threaded
Comment 17 b. 2020-10-14 06:13:12 UTC
no repro on Intel Xeon, Nvidia M4000M, 

have no 6.4.6 ver. at hand, tried with: 

Version: 6.4.5.1 (x64)
Build ID: be964ce243d03404cfeed53d0487f5d6bd49c627
CPU threads: 8; OS: Windows 6.1 Service Pack 1 Build 7601; UI render: GL; VCL: win; 
Locale: de-DE (de_DE); UI-Language: en-US
Calc: CL

assume openCL problem with OP's specific hardware, if so then it's hard to solve as devs would need similar system, may be 'driver-solveable', there are specialists around for that, i'm not ... 

don't know if my GC even works for CL, but had other CL bugs repro here and gone with CL off,
Comment 18 Luboš Luňák 2020-10-16 08:54:01 UTC
I cannot reproduce following the steps from comment #15 using recent master, and I have a very similar setup (Ryzen 5 2500U).
Can you reproduce the problem also with a recent build from https://dev-builds.libreoffice.org/daily/master/ ?
Comment 19 Ronald Lammers 2020-10-16 14:00:24 UTC
I used the 16-10-build of version 7.1 and that shows the same behaviour as the 6.4- and 7.0-versions.
Comment 20 Ronald Lammers 2020-10-16 14:29:20 UTC
I updated the gfx driver to the latest version published by AMD and that does solve the problem.

So it seems to be a graphics driver issue.

Version working correctly is 27.20.12027.1001 dated 26-8-2020(on AMD-site publised as Adrenalin 2020 Edition 20.8.3 Optional)

Version not working correctly is 26.20.11016.6002, dated 22-7-2019(OEM-version, as advised by AMD)

Did a roll back of the driver and the problem reappeared.
Comment 21 b. 2020-10-17 04:48:04 UTC
hello @Ronald Lammers, 

thanks for spotting, reporting, working on and solving this bug, 

imho that: 

'I updated the gfx driver ... and that does solve the problem.
... 
Did a roll back of the driver and the problem reappeared.' 

is clue enough to say 'resolved' ... 

good work 

'- one bug killed -' :-)

recheck would be nice, as well as a check if related to 93478, 115497, 122739, 135370
Comment 22 Ronald Lammers 2020-10-17 12:01:04 UTC
I understand this is not something Libreoffice development can solve.

But it does affect Libreoffice in a negative way and it seems openCL causes more problems.

If there is no way to ensure driver-problems do not cause Libreoffice to malfunction, maybe openCL should be off by default (and show a warning when activated).
Comment 23 Telesto 2020-10-17 12:46:13 UTC
@Meeks and/or @Luboš 

(In reply to Ronald Lammers from comment #22)
> I understand this is not something Libreoffice development can solve.
> 
> But it does affect Libreoffice in a negative way and it seems openCL causes
> more problems.

(A) I assumed there was blacklist - excuse moi - blocklist for incompatible OpenCL drivers? Obviously this driver should be added
(B) Is there no check at first launch? I assumed that OpenCL comp-ability was checked before a run. Or was that only about being available? Or I'm I totally making stuff up here?
(C) And of course: is there still an advantage of OpenCL next to multi-thread calculation? I assume so, only double checking
Comment 24 Michael Meeks 2020-10-19 13:03:53 UTC
There is indeed a black-list; and we should black-list known bad drivers of course. Lubos is the expert there - Lubos ? =)
Comment 25 Luboš Luňák 2020-11-10 16:26:38 UTC
(In reply to Ronald Lammers from comment #20)
> Version not working correctly is 26.20.11016.6002, dated
> 22-7-2019(OEM-version, as advised by AMD)

That is the marketing version number that does not mean much in practice. Please run LO with the faulty driver, which will generate <cachedir>/opencl_devices.log with the proper needed information about the driver (on Windows the <cachedir> should be AppData\Roaming\LibreOffice\4\cache).