Bug 88444 - :Access violation error while calculating SumIfs using OpenCL on CPU device causes crash
Summary: :Access violation error while calculating SumIfs using OpenCL on CPU device c...
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
(earliest affected)
Hardware: All Windows (All)
: high critical
Assignee: Not Assigned
Depends on:
Reported: 2015-01-15 06:54 UTC by evgeniy.tyurin
Modified: 2015-05-18 09:31 UTC (History)
4 users (show)

See Also:
Crash report or crash signature:

Attachments zipped: spreadsheet to reproduce, SumIfs OpenCL kernel code, the log. (260.65 KB, application/zip)
2015-01-15 06:54 UTC, evgeniy.tyurin
Patch to add debugging output to the relevant C++ and OpenCL code (12.34 KB, patch)
2015-01-15 09:44 UTC, Tor Lillqvist
Debug output from loading the attached spreadsheet (578.97 KB, text/plain)
2015-01-15 09:46 UTC, Tor Lillqvist
Suggested patch (1.38 KB, patch)
2015-01-16 12:58 UTC, Tor Lillqvist
Replacement scopencllo.dll (1.08 MB, application/x-ms-dos-executable)
2015-01-19 11:06 UTC, Tor Lillqvist
Log with dumping of buffer sizes and handles (129.75 KB, text/plain)
2015-01-19 15:17 UTC, evgeniy.tyurin
Spreadsheet to reproduce (646.72 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2015-02-03 08:57 UTC, evgeniy.tyurin
Reproducer trace log of OpenCL calls (389.10 KB, text/plain)
2015-02-03 09:01 UTC, evgeniy.tyurin
Patch to add even more debugging output (14.73 KB, patch)
2015-02-11 10:58 UTC, Tor Lillqvist

Note You need to log in before you can comment on or make changes to this bug.
Description evgeniy.tyurin 2015-01-15 06:54:15 UTC
Created attachment 112270 [details]
Attachments zipped: spreadsheet to reproduce, SumIfs OpenCL kernel code, the log.

The issue can be identified as an endless run with PCMark8, when selecting Spreadsheet in Work test, using acceleration mode.
After a bit of investigation, we have identify the issue seems like incorrect set/usages of parameters that result out-of-boundaries in OpenCL kernel execution.
And it seems located somewhere around https://gitorious.org/libreoffice/core/source/7d68e43e8b35419dc481ce3a06716113b4045839:sc/source/core/opencl/op_math.cxx#
Line 442 void OpSumIfs::GenSlidingWindowFunction(std::stringstream &ss,               const std::string sSymName, SubArguments &vSubArguments).

More information:

Here is an example of a buffer allocated with wrong size.
DEBUG 13:58:16               4032       c:\work\aderi\vcp_olc-opencl-sdk\src\framework\Context\context_module.cpp                (1635)   Intel::OpenCL::Framework::ContextModule::CreateBuffer          CreateBuffer return handle 120115056
clCreateBuffer(0x2db9228, CL_MEM_READ_ONLY | CL_MEM_ALLOC_HOST_PTR | CL_MEM_COPY_HOST_PTR, 8, 0x153de490, CL_SUCCESS) = 0x728cf70
DEBUG 13:58:16               4032       c:\work\aderi\vcp_olc-opencl-sdk\src\framework\Context\kernel.cpp                (797)                Intel::OpenCL::Framework::Kernel::SetKernelArg            SetKernelArg buffer cl_mem=120115056
clSetKernelArg(0x7234350, 4, 4, 0x1383e10) = CL_SUCCESS
DEBUG 13:58:16               4032       c:\work\aderi\vcp_olc-opencl-sdk\src\devices\cpu_device\cpu_device.cpp      (2019)                Intel::OpenCL::CPUDevice::CPUDevice::clDevFlushCommandList              clDevFlushCommandList Function enter
clEnqueueNDRangeKernel(0x2dd5768, 0x7234350, 2, nullptr, { 256, 100 }, { 256, 1 }, 0, nullptr, nullptr) = CL_SUCCESS

The buffer allocated for 5th argument (argument number 4) of kernel tmp0_0_0_SumIfs_reduction was created with size 8 bytes.
The kernel was executed with global size = {256, 100} and local size {256, 1} => group size = {1, 100}
The kernel tmp0_0_0_SumIfs_reduction access 5th argument __global double *tmp0_0_4 as follows:
tmp4= fsum(tmp0_0_4[get_group_id(1)], 0);

Summary by Tor:

- The version you are using is the tag libreoffice-4-2-milestone-5, in the branch libreoffice-4-2.

- The call to clCreateBuffer() that your log shows is most likely the one in ParallelReductionVectorRef::Marshal() and the clSetKernelArg() call is the one in DynamicKernelSoPArguments::Marshal(), the one in the for (size_t j=0; j< vclmem.size(); j++) loop, and it is the ‘redKernel’ kernel that has the name “tmp0_0_0_SumIfs_reduction”. This all is in the sc/source/core/opencl/formulagroupcl.cxx source file.

- The problematic OpenCL code line tmp4= fsum(tmp0_0_4[get_group_id(1)], 0) is generated in CheckVariables::CheckSubArgumentIsNan2() in sc/source/core/opencl/opbase.cxx.

The issue is reproducible sporadically. We experience failures while running PCMark.
One can trigger SumIfs calculation using spreadsheet in the attachment, if OpenCL acceleration is turned on and maybe even forced.

The archive attached contains:
- spreadsheet that uses SumIfs. SumIfs are located on the sheet named 'Main', starting S3.
- SumIfs kernel source code, that is generated while calculating the formula.
- Log file of OpenCL calls.
Comment 1 Tor Lillqvist 2015-01-15 08:47:14 UTC
The cl_284_fail.log file is rather massive, can you give any hints what to look for in it to find information relevant to the problem?

Also, would it be possible to reduce the sample spreadsheet even further? The ideal would be to have just one column of the problematic SUMIFS formulas. Would make following the output when I add debugging printouts to the OpenCL code and to the LO C++ code much easier.
Comment 2 Tor Lillqvist 2015-01-15 08:55:50 UTC
Do I understand correctly, that you are noticing the array out-of-bound references only sporadically, when the access happens to cross a page boundary and point into a nonexistent page (or a read-protected page)? Even if the out-of-bounds accesses happen presumably regularly, but as long as the result is an access to an existing page, they aren't trapped?

Is there perhaps some "debug mode" in your OpenCL implementation, that would cause it to keep track of the size of each buffer created with clCreateBuffer, and then when a kernel has code that dereferenes a pointer to such a buffer (as in the tmp0_0_4[get_group_id(1)]), it would check that the index does not cause out-of-bound access? Kind of like a dynamic memory checker or memory debugger for OpenCL. (Like Valgrind, Purify etc for host platform code.)
Comment 3 Tor Lillqvist 2015-01-15 09:42:42 UTC
I added some debugging printout to the code to log the OpenCL buffers created, the host buffer address in cases where CL_MEM_USE_HOST_PTR is used, and to the generated OpenCL for the problematic lines, the one where indexing a buffer with get_group_id(1). I could not see any out-of-bound access in the resulting output. Attaching the diff and the output.
Comment 4 Tor Lillqvist 2015-01-15 09:44:08 UTC
Created attachment 112280 [details]
Patch to add debugging output to the relevant C++ and OpenCL code
Comment 5 Tor Lillqvist 2015-01-15 09:46:01 UTC
Created attachment 112281 [details]
Debug output from loading the attached spreadsheet

Output from running SAL_LOG=+WARN.sc.opencl+INFO.opencl+INFO.sc.opencl instdir/program/soffice /home/tml/Downloads/spreadsheet.ods
Comment 6 evgeniy.tyurin 2015-01-15 14:50:48 UTC
Yes, we notice out-of-bounds array references sporadically, but I'm not sure that it is connected with the sheets.

I agree, the spreadsheet I attached is not the reproducer.
Currently we are working on it. For now my suspicion is that there is a subtle sequence of actions produced by an AutoIt script with Calc that causes the issue.

Meantime, please, check that we are working on the save revision:
Build ID: da443ab58158d2b7ffa52742cec2be76e3aa2026.
Comment 7 evgeniy.tyurin 2015-01-15 14:52:08 UTC
I meant on the same revision, of course.
Comment 8 Tor Lillqvist 2015-01-15 14:53:55 UTC
Yes, that is the revision I have in the tree where I am looking at this.
Comment 9 Tor Lillqvist 2015-01-16 12:58:14 UTC
Created attachment 112341 [details]
Suggested patch
Comment 10 evgeniy.tyurin 2015-01-16 13:34:07 UTC
Could you please share with us LibreOffice build that has this patch?
Unfortunately we do not have the environment set up to build it.
Comment 11 Tor Lillqvist 2015-01-19 11:06:34 UTC
Created attachment 112453 [details]
Replacement scopencllo.dll
Comment 12 evgeniy.tyurin 2015-01-19 15:17:33 UTC
Created attachment 112479 [details]
Log with dumping of buffer sizes and handles
Comment 13 evgeniy.tyurin 2015-01-19 15:18:51 UTC
By the way,
I've studied our log (attached cl_3880.log) more closely, and that's what I found:

On line 283 we can see tmp0_0_0_SumIfs_reduction kernel is being queried for arg info. SetKernelArg (uiIndex=4) on line 297, which is 5th argument (i.e. tmp0_0_4), is being initialized with buffer handle 363061576. Above in the line 194 this handle can be seen returned from the CreateBuffer (line 185). The debug output line 168 shows it was created with size 8.

Just a bit below (line 208) the same buffer with handle 363061576 is being set as arg5 of DynamicKernel_nop_SumIfs kernel.

In other words, the buffer is created with size 8 for one kernel - DynamicKernel_nop_SumIfs (but actually tmp0_0_4 is not used there). And then reused in tmp0_0_0_SumIfs_reduction without re-allocating. All kernels seems to be enqeued with { 256, 100 }, { 256, 1 }.

May be this will give you guys a hint.
Please correct me in case I'm mistaken.
Comment 14 Tor Lillqvist 2015-01-19 15:53:35 UTC
Yes; isn't that was described already from the start, that a buffer is allocated with size 8 (one double) but then accessed way beyond that?
Comment 15 evgeniy.tyurin 2015-01-20 07:45:53 UTC
the new information is the buffer is used in 2 kernels.
Comment 16 Tor Lillqvist 2015-01-20 13:39:57 UTC
But that is not a problem, is it?
Comment 17 evgeniy.tyurin 2015-01-20 13:47:09 UTC
No, this is an observation, that might give you a hint regarding the root cause.
Comment 18 evgeniy.tyurin 2015-01-21 09:27:09 UTC
We experience issues with drop-in replacement of the dll. We observe crashes or hanging of the Calc.
Does patched Calc works correctly with/without OpenCL on your side?
Comment 19 evgeniy.tyurin 2015-02-03 08:57:48 UTC
Created attachment 113075 [details]
Spreadsheet to reproduce
Comment 20 evgeniy.tyurin 2015-02-03 09:00:10 UTC
I prepared a reproducer spreadsheet confirmed by trace log of OpenCL calls.
The spreadsheet and the log are attached.

SumIfs are located in Main.S3:S102 in the single column as requested.
Hope this will help you to root cause the issue.

1. Open destination_workbook.xlsx
2. Force Calc to compute formulas using OpenCL
3. Press Ctrl-Shift-F9 to recalculate all the formulas.
Comment 21 evgeniy.tyurin 2015-02-03 09:01:29 UTC
Created attachment 113076 [details]
Reproducer trace log of OpenCL calls
Comment 22 Tor Lillqvist 2015-02-11 09:59:07 UTC
When you say "Spreadsheet to reproduce", do you mean that you see the crash with that document? Is that with or without the suggested patch?
Comment 23 Tor Lillqvist 2015-02-11 10:58:28 UTC
Created attachment 113306 [details]
Patch to add even more debugging output
Comment 24 Tor Lillqvist 2015-02-11 11:00:36 UTC
I don't see that the reproduction document from comment #19 even causes the OpenCL code in tmp0_0_0_SumIfs_reduction() to enter the code paths where the patch from comment #9 adds additional tests to avoid buffer overflow.
Comment 25 evgeniy.tyurin 2015-02-11 11:14:54 UTC
The spreadsheet in comment #19 is intended to demonstrate the creation of the buffer with wrong size while global id ranges from 0 to 99. I used LibreOffice without any patches.
I suppose OpenCL code may not reach the crash point.
Comment 26 Tor Lillqvist 2015-02-12 11:51:01 UTC
For the 4.2 branch, our priority is to fix the crash. The OpenCL-generating code is much different in the 4.3, 4.4 and master branches.
Comment 27 evgeniy.tyurin 2015-02-12 12:05:20 UTC
Thanks, Tor.
I've been looking into 4.2 milestone-7-2, and I have a suspicion that the issue is fixed there. I will perform additional validations to be sure and update this ticket ASAP.
I guess 4.2 milestone-7-2 doesn't differ significantly comparing to the revision in the ticket, is that so?
Comment 28 Tor Lillqvist 2015-02-12 13:18:01 UTC
The changes between libreoffice-4-2-milestone-5 and libreoffice-4-2-milestone-7 can be seen with a simple git command, see below. Especially the "GPU-Calc: Change the szHostBuffer to fix the out of boundary bug" change seems quite relevant to this bug.

$ git log --stat libreoffice-4-2-milestone-5..libreoffice-4-2-milestone-7
commit f6fb7359657425acfb045a6daeaad20ca70a6cb8 (tag: libreoffice-4-2-milestone-7)
Author: Wei Wei <weiwei@multicorewareinc.com>
Date:   Fri Feb 28 15:07:36 2014 -0600

    GPU-Calc: remove Alloc_Host_Ptr for clmem of NAN vector
    Change-Id: I94841f3e4df30265f609b1405453f18f251e6beb

 sc/source/core/opencl/formulagroupcl.cxx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

commit e83eab460b7f31265b209c4bfff7eb4d235e9a77 (tag: libreoffice-4-2-milestone-6)
Author: Wei Wei <weiwei@multicorewareinc.com>
Date:   Fri Feb 28 11:57:56 2014 -0600

    GPU-Calc: remove the flag of USE_HOST_PTR and ALLOC_HOST_PTR
    Change-Id: I000688c93b0feeb6da213b0f1f307a5062a41504

 sc/source/core/opencl/formulagroupcl.cxx | 47 ++++++++++++++++++--------------
 1 file changed, 27 insertions(+), 20 deletions(-)

commit 4f0d08df0f48c1c47f11e6f5206f445823deca1e
Author: Wei Wei <weiwei@multicorewareinc.com>
Date:   Fri Feb 28 11:25:33 2014 -0600

    GPU-Calc: Change the szHostBuffer to fix the out of boundary bug
    Change-Id: I4037fb3c0ddda1a9c4b0e777e1a75bc0b11fd24b

 sc/source/core/opencl/formulagroupcl.cxx | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

commit d1fc65581906ef0ce4c8e276ae67a35fb0fc6dfd
Author: Kohei Yoshida <kohei.yoshida@collabora.com>
Date:   Wed Feb 26 14:32:57 2014 -0500

    Store the length of originally requested array size prior to trimming.
    This change adds GetRequestedArrayLength() method to both single and double
    vector ref tokens, which returns the length of the requested array size prior
    to trimming of the trailing empty cell region.
    Change-Id: Iaba96fa2ea4ff3c8bccb0bc86fa4f1525e2f45fb

 formula/source/core/api/vectortoken.cxx     | 29 +++++++++++++++++++++--------
 include/formula/vectortoken.hxx             | 13 +++++++++----
 sc/source/core/data/grouptokenconverter.cxx | 12 ++++++------
 3 files changed, 36 insertions(+), 18 deletions(-)

commit 7da51df558c1fbc21188a0c58773ba549195ca5a
Author: Kohei Yoshida <kohei.yoshida@collabora.com>
Date:   Wed Feb 26 16:29:27 2014 -0500

    Ensure that numeric array storage is aligned to 256-byte boundary.
    OpenCL devices require this else we would get a performance hit.
    Change-Id: I6b1db6320fa84f933b6446022a0fd02ba267bf21

 sc/inc/formulagroup.hxx |  3 +-
 sc/inc/stlalgorithm.hxx | 91 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 93 insertions(+), 1 deletion(-)
Comment 29 evgeniy.tyurin 2015-05-18 09:31:44 UTC
I confirm that the issue is fixed in libreoffice-4-2-milestone-7-2.
Most likely it's the commit
commit 4f0d08df0f48c1c47f11e6f5206f445823deca1e
Author: Wei Wei <weiwei@multicorewareinc.com>
Date:   Fri Feb 28 11:25:33 2014 -0600

    GPU-Calc: Change the szHostBuffer to fix the out of boundary bug
    Change-Id: I4037fb3c0ddda1a9c4b0e777e1a75bc0b11fd24b

 sc/source/core/opencl/formulagroupcl.cxx | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)