109097 – mdds accelerating random lookups

Bug 109097 - mdds accelerating random lookups

Summary: mdds accelerating random lookups

Status:	RESOLVED FIXED

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	Calc (show other bugs)
Version: (earliest affected)	5.3.3.1 rc
Hardware:	All All

Importance:	medium enhancement
Assignee:	Not Assigned

URL:
Whiteboard:	target:7.0
Keywords:

Depends on:
Blocks:

Reported:	2017-07-13 11:25 UTC by Michael Meeks
Modified:	2020-05-01 14:01 UTC (History)
CC List:	5 users (show)

See Also:	multi_type_vector-regressions
Crash report or crash signature:

Attachments
Callgrind trace (6.88 MB, application/x-xz) 2017-07-13 14:33 UTC, Aron Budea	Details
Log of accessed rows/columns (337.78 KB, application/x-xz) 2017-07-19 09:54 UTC, Aron Budea	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Michael Meeks 2017-07-13 11:25:37 UTC

mdds provides an excellent, compact and high-performance storage for
lots of cases, however we have a few files whereby the performance of
lookups in mdds could be improved. These are for calculation of
formulae in sheets that have rather sparse data in columns -
containing many blocks of blanks particularly interspersed with data.

I understand that the calc team have spent a lot of time fixing the
root causes of many of these issues by using better iterators down
columns and so on, but for this specific case - having better "random
access" performance could be extremely helpful.

	ScColumn::GetCellValue

is the majority of this cost - called ultimately from ScInterpreter.

I wonder if we could prototype this inside ScColumn::GetCellValue in
fact (?) thoughts appreciated. Hopefully this would lead to fixing the
main random access problem, while keeping examples of poor iteration
visible in the profiles.

Code pointers:

workdir/UnpackedTarball/mdds/include/mdds/multi_type_vector.hpp
sc/inc/mtvelements.hxx

typedef mdds::multi_type_vector<CellFunc, CellStoreEvent> CellStoreType;

typedef mdds::multi_type_vector<CellFunc, CellStoreEvent> CellStoreType;
    CellStoreType::const_iterator itBlk = rSrc.begin(), itBlkEnd = rSrc.end();

The problem here being that:

    struct block
    {
        size_type m_size;
        element_block_type* mp_data;
    }

So each block has no idea of its absolute position - which makes a lot
of sense for lots of cases such as mutating the sheet - inserting rows
etc. otherwise we could store absolute position, and size as a
difference to the next/terminating block.

From an optimizing trade-off perspective, optimizing for inserts in
the middle of sheets at the expense of individual cell lookups, or
file-filters appending data is probably not deliberate.

Of course - failing this, during very 'random' access patterns such as
calculation; we could carry around some sort of lookup / caches of
common columns we like to get data from.

Aron - any chance you can add some SAL_DEBUG() lines to that pivot
sheet to print out all the arguments to: ScColumn::GetCellValue -
including the column's members, nCol, nTab and the nRow passed in ?
hopefully that will give us a big log to see if there is indeed any
patterns in at least this case.

And of course, probably this is a duplicate ticket =) Thoughts ?

Comment 1 Michael Meeks 2017-07-13 11:26:42 UTC

Aron - I wonder if you could dig out that data by adding a SAL_DEBUG ? =) my suspicion is that the formula calculation order is semi-random but, perhaps it is periodic enough that we can do something (?).

Comment 2 Markus Mohrhard 2017-07-13 13:42:56 UTC

Do you have a callgrind log for such a case? Normally all cases that I have seen can be somewhat reworked to make better use of the new data storage as it is quite difficult to produce a test document that accesses a huge amount of cells in an unstructured way.

It might not be always trivial to find the place that is actually looping as there are quite often several layers of indirection. Through these indirections cell access might appear to be random instead of being structured.

Comment 3 Aron Budea 2017-07-13 14:33:00 UTC

Created attachment 134615 [details]
Callgrind trace

Here's a Callgrind output (well, the first of the 8 files). Unfortunately the spreadsheet involved cannot be shared.

Comment 4 Markus Mohrhard 2017-07-13 15:26:11 UTC

(In reply to Aron Budea from comment #3)
> Created attachment 134615 [details]
> Callgrind trace
> 
> Here's a Callgrind output (well, the first of the 8 files). Unfortunately
> the spreadsheet involved cannot be shared.

Well, that one shows that the performance problem is in the matrix code where we use index based access to get each value out of two matrices and insert into a new one. We have existing code to do this with iterators which will improve the performance of that code significantly.

At least from a quick look I don't see anything in this file that reflects the ScColumn::GetCellValue problem Michael mentioned. Based on a quick reading 75% of the import time is spend in 12 lcl_GetMinExtend calls with all the time spend in inefficient iteration through the matrix.

Maybe you have a different document showing real problems but I doubt that this one corresponds to the problems mentioned by Michael. This one is actually a quite "easy" fix if you follow the existing ScMatrix performance hacks.

Comment 5 Michael Meeks 2017-07-13 19:47:14 UTC

Ah - so there are indeed several performance problems here; at least, when I search for: multi_type_matrix in the sc/ library - I see 4.7bn perhaps up to 8bn pcycles in that code.

When I search for multi_type_vector I see 11bn pcycles (15%) as the top hit, coming primarily from ScColumn::GetValue; but of course, I live to be mistake =)

Pretty easy to no-op the ScColumn::GetValue method and see what the impact on performance is I guess. Still - if it is easy to improve the matrix code it would be worth doing.

Comment 6 Aron Budea 2017-07-19 09:54:14 UTC

Created attachment 134730 [details]
Log of accessed rows/columns

Michael, the table is read several times with different access patterns, which ranges from a single column's value being read to more than 10 values being read (both per row).
The pattern is mostly the same throughout a loop on the table, but there's a bit of variance as well, and I've also seen few cases of jumping back and forth between rows, otherwise row order is stable. Attaching a log with the accesses.

Markus, the lcl_GetMinExtent(...) you mentioned is being inlined in ScMatrixRef::lcl_MatrixCalculation(...) and ScInterpreter::MatConcat(...), right? I can see ScMul() and CalculateAddSub(...) in ScInterpreter that rely on lcl_MatrixCalculation(...), but their contribution is far from 75%. What else needs considering here?

Comment 7 Kohei Yoshida 2020-05-01 12:12:56 UTC

Since this should have been solved by the latest mdds which implements binary search on block position lookup, I'll mark this as resolved.

Comment 8 Michael Meeks 2020-05-01 14:01:06 UTC

Thanks Kohei ! =)