Bug 109335 - OpenCL performance degradation
Summary: OpenCL performance degradation
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version (earliest affected): 5.0.3.2 release
Hardware: x86-64 (AMD64) Windows (All)
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: bibisected, bisected, perf, regression
Depends on:
Blocks: OpenCL
 
Reported: 2017-07-25 10:46 UTC by Luís
Modified: 2021-01-27 16:06 UTC
CC List: 7 users

See Also:
Crash report or crash signature:


Attachments
OpenCL Devices log file (1.23 KB, text/plain)
2017-07-26 14:00 UTC, Luís
Bibisect log (3.07 KB, text/plain)
2017-10-09 07:05 UTC, Telesto

Description Luís 2017-07-25 10:46:32 UTC
Description:
On my NVIDIA GTX 750 Ti, OpenCL performance degrades sharply after upgrading from 5.0.3.1 to 5.0.3.2: on the "Ground water" test document, the time goes from 440 ms to 28000 ms.

There is a similar performance degradation between 5.0.4.2 and 5.0.5.1 on the "Dates worked" test document, from 100 ms to 2300 ms.

The "Stock history" document did not suffer a performance degradation in newer versions.

I am using Calc with the default options, except that it is in Brazilian Portuguese and the OpenCL "subset" option is set to false, because for some reason OpenCL never works when it is set to true (maybe a locale bug?).


Steps to Reproduce:
1. Install LibreOffice
2. Enable OpenCL
3. Disable the OpenCL subset of operations
4. Run "Dates worked.xlsm" or "Ground water daily.xls"

Actual Results:  
(After 5.0.5.1)
- Dates worked: 2291 ms to complete
- Ground water: 28000 ms to complete

Expected Results:
- Dates worked: 100 ms to complete
- Ground water: 440 ms to complete
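The size of the regression is easier to see as a slowdown factor (actual time divided by expected time). A minimal Python sketch using only the timings listed above; the names `TIMINGS_MS` and `slowdown` are illustrative and not part of any LibreOffice tooling:

```python
# Reported timings in milliseconds, (expected, actual) per test document,
# copied from the Expected/Actual Results above.
TIMINGS_MS = {
    "Dates worked": (100, 2291),
    "Ground water": (440, 28000),
}


def slowdown(expected_ms: float, actual_ms: float) -> float:
    """Return how many times slower the actual run is than the expected one."""
    if expected_ms <= 0:
        raise ValueError("expected time must be positive")
    return actual_ms / expected_ms


for name, (expected, actual) in TIMINGS_MS.items():
    print(f"{name}: {slowdown(expected, actual):.1f}x slower")
```

Run as-is, it prints "Dates worked: 22.9x slower" and "Ground water: 63.6x slower".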



Reproducible: Always

User Profile Reset: No

Additional Info:


User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36 Edge/15.15063
Comment 1 m.a.riosv 2017-07-25 11:01:56 UTC
Could you please test with a newer version?

5.0 is a bit outdated; a lot of work has been done since then.
Comment 2 Luís 2017-07-25 11:03:50 UTC
Sorry, I should have mentioned this.
These performance degradations have been present since versions 5.0.3.2 and 5.0.5.1, respectively.
I get the same results on 5.3.4.2.
Comment 3 m.a.riosv 2017-07-25 22:00:59 UTC
Can you test with 5.4?
https://downloadarchive.documentfoundation.org/libreoffice/old/?C=M;O=D
Comment 4 Luís 2017-07-26 00:55:40 UTC
Just tested 5.4.0.3.
Same performance as 5.3.4.2, maybe a little worse, but that could be my computer.

Dates worked: ~2500 ms
Ground water: ~31000 ms
Comment 5 m.a.riosv 2017-07-26 13:47:33 UTC
Could you please copy the following file from your user profile here:
..\cache\opencl_devices.log

and measure the times with OpenCL disabled but the software interpreter enabled.
Maybe the driver is not compatible, but LO's test does not detect it.
Comment 6 Luís 2017-07-26 14:00:32 UTC
Created attachment 134873
OpenCL Devices log file

OpenCL device log, as requested
Comment 7 Luís 2017-07-26 14:10:53 UTC
How can I force the use of the software interpreter? When I disable OpenCL and leave the interpreter on, the Calc engine changes to "group". If I disable the software interpreter, it changes to "single".
With both enabled, the engine is "CL".

Anyway, when it is on "group":
- Dates worked: 22000 ms
- Ground water: 50000 ms

(yes, that much)
Comment 8 m.a.riosv 2017-07-26 15:54:51 UTC
Now we need to wait for someone who can test your hardware combination.
Comment 9 Telesto 2017-10-08 12:20:36 UTC
Is there a test-file? Something like: Ground water daily.xls
Comment 10 Luís 2017-10-09 00:11:49 UTC
(In reply to Telesto from comment #9)
> Is there a test-file? Something like: Ground water daily.xls

Yes, I used the test files from here:
http://kohei.us/2014/10/02/opencl-test-documents-for-calc/
Comment 11 Telesto 2017-10-09 07:05:20 UTC
Repro with:
Version: 6.0.0.0.alpha0+
Build ID: c5a93cad149618bbd43632f1660a558c34bdbf7e
CPU threads: 4; OS: Windows 6.3; UI render: default; 
TinderBox: Win-x86@42, Branch:master, Time: 2017-10-07_01:04:25
Locale: nl-NL (nl_NL); Calc: CL
Comment 12 Telesto 2017-10-09 07:05:51 UTC
Created attachment 136858
Bibisect log

Bisected to:

author	Tor Lillqvist <tml@collabora.com>	2015-10-15 09:37:55 (GMT)
committer	Tor Lillqvist <tml@collabora.com>	2015-10-15 10:45:45 (GMT)
commit	03eae494cfdb0c75188e6c2c85a4b59acba0ef12 (patch)
tree	af8bd5fcdb62fcb6e59dc351c0c3e36e3a7cce8f
parent	5e0e953f8f8fc5b27db8421ba15e33cfa664fb7a (diff)
tdf#94924: Return correct result 0 from OpenCL MIN and MAX when all args empty
Used the same style as existing code, added a new virtual isMinOrMax()
and add some special casing in Reduction::GenSlidingWindowFunction(),
and fsim_count() and fmax_count() functions that count how many
non-NaN numbers we actually see. As such, I am not sure at all that
this is an ideal way to do this, but will have to do for now.
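To make the behavior this commit targets concrete: Calc's MIN and MAX return 0 when their arguments contain no numbers, which is the result the commit restores on the OpenCL path. Below is a minimal Python sketch of those semantics, assuming (as the commit message implies) that empty cells arrive as NaN. Like the counting helpers the commit describes, it counts the non-NaN values it sees; the function name `reduce_min_max` is hypothetical, and this is not the OpenCL kernel code LibreOffice generates.

```python
import math
from typing import Iterable


def reduce_min_max(values: Iterable[float], use_max: bool) -> float:
    """MIN or MAX over a range where empty cells are represented as NaN.

    Hypothetical sketch: NaN entries are skipped, and the number of
    non-NaN values seen is counted so that an all-empty range yields 0
    instead of the infinite starting value.
    """
    count = 0
    result = -math.inf if use_max else math.inf
    for v in values:
        if math.isnan(v):
            continue  # empty cell: ignore it
        count += 1
        result = max(result, v) if use_max else min(result, v)
    # No numbers at all: spreadsheet MIN/MAX return 0.
    return result if count > 0 else 0.0


print(reduce_min_max([3.0, math.nan, -2.0], use_max=False))  # -2.0
print(reduce_min_max([3.0, math.nan, -2.0], use_max=True))   # 3.0
print(reduce_min_max([math.nan, math.nan], use_max=False))   # 0.0
```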
Comment 13 Telesto 2017-10-09 07:07:09 UTC
Adding a CC to Tor Lillqvist
Comment 14 Telesto 2017-11-23 15:55:07 UTC
Repro with
Version: 6.0.0.0.alpha1+
Build ID: c24c32bf71b8e64bd0d36e511f554e1f6c015842
CPU threads: 4; OS: Windows 6.3; UI render: default; 
TinderBox: Win-x86@42, Branch:master, Time: 2017-11-22_23:15:41
Locale: nl-NL (nl_NL); Calc: group threaded
Comment 15 QA Administrators 2018-11-24 03:43:49 UTC Comment hidden (obsolete)
Comment 16 Roman Kuznetsov 2018-11-24 17:41:52 UTC
Results:

2768 in file "Ground water daily" 
21139 in file "Dates worksheet"

in 

Version: 6.2.0.0.alpha1+ (x64)
Build ID: 20b2903354138f8ab19261fab74658fcf6af70e3
CPU threads: 4; OS: Windows 10.0; UI render: GL; VCL: win; 
TinderBox: Win-x86_64@42, Branch:master, Time: 2018-11-14_22:57:13
Locale: ru-RU (ru_RU); UI-Language: en-US
Calc: CL

I also disabled multi-threaded calculation.

Still reproducible.

/cache/OpenCL.log

Device Index: 0
  Selected: false
  Device Name: GeForce GTX 1050
  Device Vendor: NVIDIA Corporation
  Device Version: OpenCL 1.2 CUDA
  Driver Version: 397.93
  Device Type: gpu 
  ...
  Device OpenCL C Version: OpenCL C 1.2 
  Device Available: true
  Device Compiler Available: true
  Device Linker Available: true
  Platform Name: NVIDIA CUDA
  Platform Vendor: NVIDIA Corporation
  Platform Version: OpenCL 1.2 CUDA 9.2.127
  Platform Profile: FULL_PROFILE
  ...

Device Index: 1
  Selected: true
  Device Name: Intel(R) HD Graphics 630
  Device Vendor: Intel(R) Corporation
  Device Version: OpenCL 2.1 
  Driver Version: 22.20.16.4708
  Device Type: gpu 
  ... 
  Device OpenCL C Version: OpenCL C 2.0 
  Device Available: true
  Device Compiler Available: true
  Device Linker Available: true
  Platform Name: Intel(R) OpenCL
  Platform Vendor: Intel(R) Corporation
  Platform Version: OpenCL 2.1 
  Platform Profile: FULL_PROFILE
  ...

Device Index: 2
  Selected: false
  Device Name: Intel(R) Core(TM) i5-7300HQ CPU @ 2.50GHz
  Device Vendor: Intel(R) Corporation
  Device Version: OpenCL 2.1 (Build 10)
  Driver Version: 7.2.0.10
  Device Type: cpu 
  ...
  Device OpenCL C Version: OpenCL C 2.0 
  Device Available: true
  Device Compiler Available: true
  Device Linker Available: true
  Platform Name: Intel(R) OpenCL
  Platform Vendor: Intel(R) Corporation
  Platform Version: OpenCL 2.1 
  Platform Profile: FULL_PROFILE
  ...
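One detail worth checking when comparing these logs is which device is marked `Selected: true`; in the log above it is device 1, the Intel(R) HD Graphics 630, not the GeForce GTX 1050. A minimal Python sketch for extracting that from a log, assuming only the layout visible in this thread (a `Device Index:` line followed by indented `Key: value` lines, with `...` elisions ignored); the function names are hypothetical:

```python
from typing import Dict, List, Optional


def parse_devices(log_text: str) -> List[Dict[str, str]]:
    """Split a device log into one dict per 'Device Index:' block."""
    devices: List[Dict[str, str]] = []
    current: Optional[Dict[str, str]] = None
    for raw in log_text.splitlines():
        line = raw.strip()
        if line.startswith("Device Index:"):
            current = {"Device Index": line.split(":", 1)[1].strip()}
            devices.append(current)
        elif current is not None and ":" in line:
            # Split on the first colon only; names like "CPU @ 2.50GHz" stay intact.
            key, value = line.split(":", 1)
            current[key.strip()] = value.strip()
    return devices


def selected_device(log_text: str) -> Optional[str]:
    """Return the name of the device marked 'Selected: true', if any."""
    for dev in parse_devices(log_text):
        if dev.get("Selected") == "true":
            return dev.get("Device Name")
    return None


SAMPLE = """\
Device Index: 0
  Selected: false
  Device Name: GeForce GTX 1050
  ...
Device Index: 1
  Selected: true
  Device Name: Intel(R) HD Graphics 630
"""
print(selected_device(SAMPLE))  # Intel(R) HD Graphics 630
```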
Comment 17 QA Administrators 2019-11-25 03:31:01 UTC Comment hidden (obsolete)
Comment 18 Luís 2019-11-26 09:29:12 UTC
Regression still present, and even worse:
- Dates worked: 11785ms
- Ground water: 23598ms

Tested on the exact same PC with newer drivers. According to Task Manager, the GPU does not seem to be used.

Device Index: 0
  Selected: true
  Device Name: GeForce GTX 750 Ti
  Device Vendor: NVIDIA Corporation
  Device Version: OpenCL 1.2 CUDA
  Driver Version: 436.02
  Device Type: gpu 
  Device Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_nv_create_buffer
  Device OpenCL C Version: OpenCL C 1.2 
  Device Available: true
  Device Compiler Available: true
  Device Linker Available: true
  Platform Name: NVIDIA CUDA
  Platform Vendor: NVIDIA Corporation
  Platform Version: OpenCL 1.2 CUDA 10.1.0
  Platform Profile: FULL_PROFILE
  Platform Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_nv_create_buffer

Device Index: 1
  Selected: false
  Device Name: Intel(R) Core(TM) i5-4570 CPU @ 3.20GHz
  Device Vendor: Intel(R) Corporation
  Device Version: OpenCL 1.2 (Build 10094)
  Driver Version: 5.2.0.10094
  Device Type: cpu 
  Device Extensions: cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_intel_exec_by_local_thread cl_khr_spir cl_khr_dx9_media_sharing cl_intel_dx9_media_sharing cl_khr_d3d11_sharing cl_khr_gl_sharing cl_khr_fp64 
  Device OpenCL C Version: OpenCL C 1.2 
  Device Available: true
  Device Compiler Available: true
  Device Linker Available: true
  Platform Name: Intel(R) OpenCL
  Platform Vendor: Intel(R) Corporation
  Platform Version: OpenCL 1.2 
  Platform Profile: FULL_PROFILE
  Platform Extensions: cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_intel_exec_by_local_thread cl_khr_spir cl_khr_dx9_media_sharing cl_intel_dx9_media_sharing cl_khr_d3d11_sharing cl_khr_gl_sharing cl_khr_fp64

Device Index: 2
  Selected: false
  Device Name: Intel(R) Core(TM) i5-4570 CPU @ 3.20GHz
  Device Vendor: Intel(R) Corporation
  Device Version: OpenCL 1.2 (Build 10094)
  Driver Version: 5.2.0.10094
  Device Type: cpu 
  Device Extensions: cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_intel_exec_by_local_thread cl_khr_spir cl_khr_dx9_media_sharing cl_intel_dx9_media_sharing cl_khr_d3d11_sharing cl_khr_gl_sharing cl_khr_fp64 
  Device OpenCL C Version: OpenCL C 1.2 
  Device Available: true
  Device Compiler Available: true
  Device Linker Available: true
  Platform Name: Intel(R) OpenCL
  Platform Vendor: Intel(R) Corporation
  Platform Version: OpenCL 1.2 
  Platform Profile: FULL_PROFILE
  Platform Extensions: cl_intel_dx9_media_sharing cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_d3d11_sharing cl_khr_depth_images cl_khr_dx9_media_sharing cl_khr_gl_sharing cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_spir
Comment 19 Luís 2020-10-29 17:00:04 UTC
Just FYI, the problem still persists.
Same machine, but on Calc version 7.0.2.2.

- Dates worked: ~14800 ms
- Ground water: ~16500 ms

For reference, on Excel:

- Dates worked: ~4700 ms
- Ground water: ~6600 ms