Bug 121052 - Calc: Slow load of cells with VLOOKUP with references to empty cells
Summary: Calc: Slow load of cells with VLOOKUP with references to empty cells
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
unspecified
Hardware: All All
: medium normal
Assignee: Serge Krot (CIB)
URL:
Whiteboard: target:6.3.0
Keywords: filter:ods, perf
Depends on:
Blocks: File-Opening
  Show dependency treegraph
 
Reported: 2018-10-30 08:46 UTC by Serge Krot (CIB)
Modified: 2019-08-10 21:18 UTC (History)
4 users (show)

See Also:
Crash report or crash signature:


Attachments
Sample document with empty first column (114.60 KB, application/vnd.oasis.opendocument.spreadsheet)
2018-10-30 08:48 UTC, Serge Krot (CIB)
Details
Sample document with filled first column (149.97 KB, application/vnd.oasis.opendocument.spreadsheet)
2018-10-30 08:48 UTC, Serge Krot (CIB)
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Serge Krot (CIB) 2018-10-30 08:46:40 UTC
Calc uses ScDocRowHeightUpdater::update() method during load. Inside this method the document is being recalculated to get height of the cells with real cell values. The document has 2 tables (each table has 2 columns):

Table-1:
A,B
101,VLOOKUP(A1,X1:Z255,2,0)
102,VLOOKUP(A2,X1:Z255,2,0)
103,VLOOKUP(A3,X1:Z255,2,0)
...
199,VLOOKUP(A199,X1:Z255,2,0)

Table-2:
X,Z
1,Product1
2,Product2
3,Product3
...
255,Product255

After load the first table has following visible values:
A,B
101,Product1
102,Product2
103,Product3
...
199,Product199

Now, if we take the same document but the whole A column is empty, the document will be loaded very slow comparing to the document that contains data.

The problem:
* Calculation of the VLOOKUP is too slow when no data is provided

Useful link:
* https://help.libreoffice.org/6.2/en-US/text/scalc/01/04060109.html?DbPAR=CALC#bm_id3153152 - VLOOKUP documentation

Typical call stack:
 	[External Code]	
>	[Inline Frame] sal3.dll!rtl_uString_ImplAlloc(long) Line 1146	C++
 	sal3.dll!rtl_uString_new_WithLength(_rtl_uString * * ppThis, long nLen) Line 1263	C++
 	[Inline Frame] svllo.dll!rtl::OUStringBuffer::{ctor}() Line 71	C++
 	svllo.dll!SvNumberformat::GetOutputString(double fNumber, rtl::OUString & OutString, Color * * ppColor) Line 2392	C++
 	svllo.dll!SvNumberFormatter::GetInputLineString(const double & fOutNumber, unsigned long nFIndex, rtl::OUString & sOutString) Line 1557	C++
 	sclo.dll!ScCellFormat::GetInputString(ScRefCellValue & rCell, unsigned long nFormat, rtl::OUString & rString, SvNumberFormatter & rFormatter, const ScDocument * pDoc) Line 151	C++
 	sclo.dll!`anonymous namespace'::QueryEvaluator::compareByString(ScRefCellValue & rCell, long nRow, const ScQueryEntry & rEntry, const ScQueryEntry::Item & rItem, const ScInterpreterContext * pContext) Line 2497	C++
 	sclo.dll!ScTable::ValidQuery(long nRow, const ScQueryParam & rParam, const ScRefCellValue * pCell, bool * pbTestEqualCondition, const ScInterpreterContext * pContext) Line 2804	C++
 	sclo.dll!ScQueryCellIterator::GetThis() Line 1176	C++
 	sclo.dll!lcl_LookupQuery(ScAddress & o_rResultPos, ScDocument * pDoc, const ScInterpreterContext & rContext, const ScQueryParam & rParam, const ScQueryEntry & rEntry) Line 9640	C++
 	sclo.dll!ScInterpreter::LookupQueryWithCache(ScAddress & o_rResultPos, const ScQueryParam & rParam) Line 9677	C++
 	sclo.dll!ScInterpreter::CalculateLookup(bool bHLookup) Line 7356	C++
	sclo.dll!ScInterpreter::ScVLookup() Line 7432	C++
 	sclo.dll!ScInterpreter::Interpret() Line 4203	C++
 	sclo.dll!ScFormulaCell::InterpretTail(ScInterpreterContext & rContext, ScFormulaCell::ScInterpretTailParameter eTailParam) Line 1868	C++
 	sclo.dll!ScFormulaCell::Interpret() Line 1579	C++
 	[Inline Frame] sclo.dll!ScFormulaCell::MaybeInterpret() Line 2674	C++
 	sclo.dll!ScFormulaCell::GetErrCode() Line 2918	C++
 	sclo.dll!ScCellFormat::GetString(ScRefCellValue & rCell, unsigned long nFormat, rtl::OUString & rString, Color * * ppColor, SvNumberFormatter & rFormatter, const ScDocument * pDoc, bool bNullVals, bool bFormula, bool bUseStarFormat) Line 79	C++
 	sclo.dll!ScCellFormat::GetString(ScDocument & rDoc, const ScAddress & rPos, unsigned long nFormat, Color * * ppColor, SvNumberFormatter & rFormatter, bool bNullVals, bool bFormula) Line 117	C++
 	sclo.dll!ScDocument::GetCellScriptType(const ScAddress & rPos, unsigned long nNumberFormat) Line 120	C++
 	sclo.dll!ScDocument::GetScriptType(short nCol, long nRow, short nTab) Line 149	C++
 	sclo.dll!ScColumn::GetNeededSize(long nRow, OutputDevice * pDev, double nPPTX, double nPPTY, const Fraction & rZoomX, const Fraction & rZoomY, bool bWidth, const ScNeededSizeOptions & rOptions, const ScPatternAttr * * ppPatternChange) Line 249	C++
 	sclo.dll!ScColumn::GetOptimalHeight(sc::RowHeightContext & rCxt, long nStartRow, long nEndRow, unsigned short nMinHeight, long nMinStart) Line 949	C++
 	sclo.dll!`anonymous namespace'::GetOptimalHeightsInColumn(sc::RowHeightContext & rCxt, ScColContainer & rCol, long nStartRow, long nEndRow, ScProgress * pProgress, unsigned long nProgressStart) Line 123	C++
 	sclo.dll!ScTable::SetOptimalHeight(sc::RowHeightContext & rCxt, long nStartRow, long nEndRow, ScProgress * pOuterProgress, unsigned long nProgressStart) Line 480	C++
 	sclo.dll!ScDocRowHeightUpdater::update() Line 2589	C++
 	sclo.dll!ScXMLImport::endDocument() Line 1809	C++
 	expwraplo.dll!sax_fastparser::FastSaxParserImpl::parseStream(const com::sun::star::xml::sax::InputSource & rStructSource) Line 876	C++
 	expwraplo.dll!sax_fastparser::FastSaxParser::parseStream(const com::sun::star::xml::sax::InputSource & aInputSource) Line 1378	C++
 	xolo.dll!SvXMLImport::parseStream(const com::sun::star::xml::sax::InputSource & aInputSource) Line 485	C++
 	sclo.dll!ScXMLImportWrapper::ImportFromComponent(const com::sun::star::uno::Reference<com::sun::star::uno::XComponentContext> & xContext, const com::sun::star::uno::Reference<com::sun::star::frame::XModel> & xModel, const com::sun::star::uno::Reference<com::sun::star::xml::sax::XParser> & xParser, com::sun::star::xml::sax::InputSource & aParserInput, const rtl::OUString & sComponentName, const rtl::OUString & sDocName, const rtl::OUString & sOldDocName, const com::sun::star::uno::Sequence<com::sun::star::uno::Any> & aArgs, bool bMustBeSuccessfull) Line 190	C++
 	sclo.dll!ScXMLImportWrapper::Import(ImportFlags nMode, ErrCode & rError) Line 512	C++
 	[Inline Frame] sclo.dll!ErrCode::operator bool() Line 85	C++
 	sclo.dll!ScDocShell::LoadXML(SfxMedium * pLoadMedium, const com::sun::star::uno::Reference<com::sun::star::embed::XStorage> & xStor) Line 501	C++
 	sclo.dll!ScDocShell::Load(SfxMedium & rMedium) Line 660	C++
 	sfxlo.dll!SfxObjectShell::LoadOwnFormat(SfxMedium & rMedium) Line 3056	C++
 	sfxlo.dll!SfxObjectShell::DoLoad(SfxMedium * pMed) Line 725	C++
 	sfxlo.dll!SfxBaseModel::load(const com::sun::star::uno::Sequence<com::sun::star::beans::PropertyValue> & seqArguments) Line 1795	C++
 	sfxlo.dll!`anonymous namespace'::SfxFrameLoader_Impl::load(const com::sun::star::uno::Sequence<com::sun::star::beans::PropertyValue> & rArgs, const com::sun::star::uno::Reference<com::sun::star::frame::XFrame> & _rTargetFrame) Line 688	C++
 	fwklo.dll!framework::LoadEnv::impl_loadContent() Line 1152	C++
 	fwklo.dll!framework::LoadEnv::startLoading() Line 387	C++
 	fwklo.dll!framework::LoadDispatcher::impl_dispatch(const com::sun::star::util::URL & rURL, const com::sun::star::uno::Sequence<com::sun::star::beans::PropertyValue> & lArguments, const com::sun::star::uno::Reference<com::sun::star::frame::XDispatchResultListener> & xListener) Line 110	C++
 	fwklo.dll!framework::LoadDispatcher::dispatch(const com::sun::star::util::URL & aURL, const com::sun::star::uno::Sequence<com::sun::star::beans::PropertyValue> & lArguments) Line 52	C++
 	sfxlo.dll!sfx2::RecentDocsView::ExecuteHdl_Impl(sfx2::RecentDocsView * __formal, void * p) Line 393	C++
 	[Inline Frame] vcllo.dll!Link<void *,void>::Call(void *) Line 84	C++
 	[Inline Frame] vcllo.dll!ImplHandleUserEvent(ImplSVEvent *) Line 1929	C++
 	vcllo.dll!ImplWindowFrameProc(vcl::Window * _pWindow, SalEvent nEvent, const void * pEvent) Line 2482	C++
 	vcllo.dll!SalFrame::CallCallback(SalEvent nEvent, const void * pEvent) Line 280	C++
 	[Inline Frame] vclplug_winlo.dll!ImplHandleUserEvent(HWND__ *) Line 4071	C++
 	vclplug_winlo.dll!SalFrameWndProc(HWND__ * hWnd, unsigned int nMsg, unsigned int wParam, long lParam, bool & rDef) Line 5724	C++
 	vclplug_winlo.dll!SalFrameWndProcW(HWND__ * hWnd, unsigned int nMsg, unsigned int wParam, long lParam) Line 5832	C++
 	[External Code]	
 	[Inline Frame] vclplug_winlo.dll!ImplSalDispatchMessage(const tagMSG *) Line 409	C++
 	vclplug_winlo.dll!ImplSalYield(bool bWait, bool bHandleAllCurrentEvents) Line 440	C++
 	vclplug_winlo.dll!WinSalInstance::DoYield(bool bWait, bool bHandleAllCurrentEvents) Line 511	C++
 	[Inline Frame] vcllo.dll!ImplYield(bool) Line 469	C++
 	[Inline Frame] vcllo.dll!Application::Yield() Line 537	C++
 	vcllo.dll!Application::Execute() Line 449	C++
 	sofficeapp.dll!desktop::Desktop::Main() Line 1637	C++
 	vcllo.dll!ImplSVMain() Line 199	C++
 	sofficeapp.dll!soffice_main() Line 169	C++
 	[Inline Frame] soffice.bin!sal_main() Line 48	C
 	[Inline Frame] soffice.bin!main(int) Line 47	C
 	soffice.bin!WinMain(void * _hinst, void * _dummy, char * _cmdline, int _nshow) Line 47	C
 	[External Code]
Comment 1 Serge Krot (CIB) 2018-10-30 08:48:22 UTC
Created attachment 146161 [details]
Sample document with empty first column
Comment 2 Serge Krot (CIB) 2018-10-30 08:48:52 UTC
Created attachment 146162 [details]
Sample document with filled first column
Comment 3 Eike Rathke 2018-10-30 18:38:49 UTC
For the subexpression
VLOOKUP(A2; $F$2:$G$10000; 2; 0)
with Sort Order being 0 (= exact lookup, no range lookup) if A2 is empty VLOOKUP() has to skim through the entire range F2:F10000 to find that there is no empty cell value and the result is #N/A, that repeatedly for each and every VLOOKUP() call in B2:B2616

Probably adding a cache entry to ScLookupCache for the empty cell case might help.

Btw, the expression
=IF(ISERROR(VLOOKUP(A2; $F$2:$G$10000; 2; 0));"";(VLOOKUP(A2; $F$2:$G$10000; 2; 0)))
is suboptimal as in the non-error case the VLOOKUP() has to be executed twice.
Better would be
=IFNA(VLOOKUP(A2;$F$2:$G$10000;2;0);"")
which not only executes VLOOKUP() only once but also suppresses only the #N/A error while still propagating other errors. Otherwise, if all errors are to be suppressed use IFERROR() instead of IFNA().
Comment 4 Commit Notification 2018-11-22 20:26:01 UTC
Serge Krot committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/+/9e55c8c7ffc43004ca314edcd4712dcc29adf933%5E%21

tdf#121052 sc: avoid multiple empty value lookups in ranges

It will be available in 6.3.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 5 Commit Notification 2018-11-23 11:32:49 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/+/0c8b936734285ee7f373c41ecbad5a65953789dc%5E%21

Check isEmptyStringQuery() early to avoid call ..., tdf#121052 follow-up

It will be available in 6.3.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 7 Frank Fuchs 2018-12-30 12:16:53 UTC
This fix is listed in the LibO 6.2 release notes (https://wiki.documentfoundation.org/ReleaseNotes/6.2) under the Calc section.
However, the comments imply it has been fixed for LibO 6.3, only.
Comment 8 Xisco Faulí 2019-01-03 13:39:15 UTC
(In reply to Frank Fuchs from comment #7)
> This fix is listed in the LibO 6.2 release notes
> (https://wiki.documentfoundation.org/ReleaseNotes/6.2) under the Calc
> section.
> However, the comments imply it has been fixed for LibO 6.3, only.

Thanks for noticing.
Moved it to 6.3 release notes...
Comment 9 Cor Nouws 2019-07-26 17:48:31 UTC
@eike @serge

Trying to do a graph of performance improvements.
I timed the loading of the vlookup_empty.ods but cannot see difference (see below)
Do you have a number before/after? Thanks

$ time OOO_EXIT_POST_STARTUP=1 libreoffice/527rc2/libreoffice5.2/program/soffice Documenten/Data/Nou\&Off/Partners/Collabora/Marketing/Features/LibreOffice6.3/vlookup_empty.ods	
real	0m6,470s
user	0m2,582s
sys	0m0,242s
	
$ time OOO_EXIT_POST_STARTUP=1 libreoffice/605rc2/libreoffice6.0/program/soffice Documenten/Data/Nou\&Off/Partners/Collabora/Marketing/Features/LibreOffice6.3/vlookup_empty.ods	
real	0m8,603s
user	0m3,215s
sys	0m0,209s
	
$ time OOO_EXIT_POST_STARTUP=1 libreoffice/630rc1/libreoffice6.3/program/soffice Documenten/Data/Nou\&Off/Partners/Collabora/Marketing/Features/LibreOffice6.3/vlookup_empty.ods	
real	0m7,944s
user	0m2,509s
sys	0m0,266s
Comment 10 Xisco Faulí 2019-07-29 10:44:37 UTC
Hi Cor,
I see a huge difference. in

Version: 6.2.0.0.alpha1+
Build ID: 26c375671aa362b2f59d84645784938677ae1719
CPU threads: 4; OS: Linux 4.15; UI render: default; VCL: gtk3; 
Locale: ca-ES (ca_ES.UTF-8); Calc: threaded

it takes

real	1m37,400s
user	1m20,230s
sys	0m9,305s

while in

Version: 6.4.0.0.alpha0+
Build ID: 0d36b32755ac662299e6a8165e9fa57311b74a2f
CPU threads: 4; OS: Linux 4.15; UI render: default; VCL: gtk3; 
Locale: ca-ES (ca_ES.UTF-8); UI-Language: en-US
Calc: threaded

it takes

real	0m13,626s
user	0m6,795s
sys	0m0,864s
Comment 11 Cor Nouws 2019-07-29 17:59:29 UTC
(In reply to Xisco Faulí from comment #10)
> Hi Cor,
> I see a huge difference. in
thanks! No idea why it didn't show that for me :)