Bug 79422 - EDITING: Calc sluggish after find and replace due to large selection of replaced cells
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version: 4.2.4.2 release (earliest affected)
Hardware: All All
Importance: high critical
Assignee: Markus Mohrhard
URL:
Whiteboard: target:4.4.0
Keywords: regression
Depends on:
Blocks:
 
Reported: 2014-05-29 18:16 UTC by W.Muellner
Modified: 2015-01-02 12:40 UTC
CC List: 7 users

See Also:
Crash report or crash signature:


Attachments
Sample data (ODS file) (1.16 MB, application/zip)
2014-05-29 18:16 UTC, W.Muellner
30,000 row trimmed version of the supplied .ods (406.79 KB, application/vnd.oasis.opendocument.spreadsheet)
2014-06-08 17:21 UTC, Yousuf Philips (jay) (retired)
callgrind result (1.86 MB, application/x-bzip)
2014-07-31 10:28 UTC, Seyeong Kim

Description W.Muellner 2014-05-29 18:16:59 UTC
Created attachment 100118 [details]
Sample data (ODS file)

This bug is related to Bug 77515.

Workflow to reproduce:
1. Open the attached file.
2. Replace 'NA' with '' in the whole file.
3. Wait some time: the displayed cells with NA content are changed, then Calc seems to hang in an endless loop.
4. After 4-5 minutes there is still no result...

br, Walter

PS: Calc doesn't seem to be Muellner-proof ;-)
Comment 1 W.Muellner 2014-05-29 18:25:31 UTC
Same behaviour when reducing the number of rows to approx 67000
br, WM
Comment 2 Yousuf Philips (jay) (retired) 2014-06-08 17:03:57 UTC
Did some testing on Windows 7 64-bit on a laptop with an Intel Core 2 CPU @ 1.83 GHz and 2.5 GB of RAM. Here are the results:

LibreOffice 4.2.4 Stats
-----------------
Loading File: 20 to 35 sec
Find & Replace: > 20 min*
* Stopped it after the 20-minute mark and noticed that the 15-minute autorecovery was being activated while the processing was going on, which slowed this down even more.

Word 2013 Stats
---------------
Loading File: 12 sec
Find & Replace: 3 sec

Kingsoft Spreadsheets (ods converted to xlsx in LibO)
---------------
Loading File: 9 sec
Find & Replace: 5 sec

Decided to shrink the file down to 30,000 records and repeat the test, and I discovered that the replacing can take 16 seconds if you click the close button during those 16 seconds. After exiting the 'Find & Replace' dialog, everything becomes slow because all of the replaced cells are selected; clicking on any other cell makes the UI responsive again. Retesting the original file and clicking in the cancel button area right after clicking 'Replace All' shows that the replacing took 4.5 minutes, and deselecting all the selected rows after exiting the dialog takes ~3 minutes.

Observing Kingsoft Spreadsheets and Excel, they don't maintain a selection of these cells during the replacement, and once the replacement is completed, they simply show a replacement result popup. With LibO 4.3 and above, a 'search results' dialog appears after the replacing and hangs the UI, even when I clicked the cancel button after starting the replacing. The 'search results' dialog halted the UI for more than 5 minutes even though I tried to click the close button as soon as it appeared.
Comment 3 Yousuf Philips (jay) (retired) 2014-06-08 17:18:39 UTC
Just another note: as we are replacing 'NA' with '', the search results dialog shows no records, but if we were replacing it with 'ABC', the search results dialog would be filled with records, which may also halt the UI.

My suggestion for the search results dialog is to have an option in the 'Find & Replace' dialog that controls whether this dialog is shown. It should be disabled by default, and by default the simple message box with the total number of replacements should be shown.
Comment 4 Yousuf Philips (jay) (retired) 2014-06-08 17:21:54 UTC
Created attachment 100683 [details]
30,000 row trimmed version of the supplied .ods
Comment 5 Yousuf Philips (jay) (retired) 2014-07-02 16:28:43 UTC
For a similar find and replace issue in Writer, see bug 80715.
Comment 6 Seyeong Kim 2014-07-25 03:47:04 UTC
4.2.5 and 4.3.0 still have this problem.

I tested with the attached 30,000-row ods.

After Replace All, Calc enters a seemingly endless loop that calls the Join function 1000 times inside it.
Comment 7 W.Muellner 2014-07-25 06:59:43 UTC
... and btw: some powerful text editors ("emeditor") can do this change/replace in a fraction of a second in the 131K example csv file.
I know this is something different and cannot really be compared to a spreadsheet application, but...
Comment 8 Nicolas R 2014-07-29 08:04:31 UTC
Same problem here with Win 7 Pro 64-bit and LibO 4.2.5.

On a large ods file (the attached files could be good examples), clicking the 'Find All' button in the 'Find & Replace' window opens a 'Search Results' window with some cell coordinates, then LibO freezes / hangs.

Same problem with 4.3 RC.

No problem with the 4.1 branch.

Perhaps a side effect related to resolved bug 79011?

Can we add Kohei Yoshida to the CC list?
Comment 9 Seyeong Kim 2014-07-30 08:05:05 UTC
When I open the Search & Replace dialog,
a timer that scans all of the matched cells is triggered. (At first this is not heavy.)

After replacing a large amount of data, it becomes heavy and seems never to finish.

On each timer tick, the Join function is called, which is also heavy with large data.

So:

   1. ReplaceAll (or SearchAll) is slow because the Join function performs poorly.

   2. The triggering timer causes a problem after ReplaceAll.


I made the Search & Replace dialog close once ReplaceAll has finished.

With that, problem #2 is gone,

but #1 still exists.
Comment 10 Seyeong Kim 2014-07-31 10:28:51 UTC
Created attachment 103748 [details]
callgrind result
Comment 11 Kohei Yoshida 2014-07-31 20:09:03 UTC
(In reply to comment #10)
> Created attachment 103748 [details]
> callgrind result

Thanks!  This helps.  No need to use tar when compressing though; you can run bzip2 directly on the original file if you are only compressing a single file.  That way kcachegrind can open the compressed data directly without you needing to uncompress the file first.  Just a quick tip.

Now, the profile data suggests that the hot spot is

ScRangeList::Join()

which is interesting.  This call is probably used while calculating the areas where the replacement occurred so that Calc can highlight those areas at the very end.  Finding out why this becomes slow all of a sudden will lead us in the right direction.
Comment 12 Seyeong Kim 2014-08-01 01:04:19 UTC
Right.

Suppose there are around 40,000 rows and 6,200 of them are replaced.

Join() is called 6,200 times, and

inside Join(), the loop runs over entries 0~0 at first and grows to 0~6200. This is the actual replace job.

After Replace All, a timer triggers the calls to Join() again every time, as above.

Below are the function calls I found.

1. When the Replace All button is pushed

In the sc directory:

(some more calls to handle the click)
ScDocument::SearchAndReplace
ScTable::SearchAndReplace
ScTable::ReplaceAll
ScRangeList::Join


2. Triggered every time

svx/source/dialog/srchdlg.cxx
IMPL_LINK(SvxSearchDialog, TimeoutHdl_Impl, Timer*, pTimer)
is triggered,

then in the sc directory:

ScDocument::GetSelectionFunction
ScTable::UpdateSelectionFunction
ScColumn::UpdateSelectionFunction (loops 1024 times here even if only 1 column is selected)
ColumnSpanSet::set
ScMarkData::GetMarkedRanges
ScMarkData::FillRangeListWithMarks
ScRangeList::Join (slow)


I disabled the #2 function; after that, the 100% CPU usage after Replace All no longer happened.
I know that is wrong, but I was just testing.

I also modified ScColumn::UpdateSelectionFunction to loop only as many times as there are selected columns, for testing,
but it was not enough.
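
The cost pattern described in this comment can be illustrated with a small C++ sketch. This is not LibreOffice code: RowRange, joinRange and totalInspections are hypothetical names, and the join logic is a deliberately simplified assumption of what a "scan the whole list, merge if adjacent, else append" routine does. It only shows why calling such a routine once per replaced row can become quadratic when the replaced rows do not touch each other.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a cell range in a single column (rows are inclusive).
struct RowRange {
    long first;
    long last;
};

// Hypothetical simplified join: scan the stored ranges for one that touches
// or overlaps the new range; extend it if found, otherwise append.
// Returns how many stored ranges were inspected, to make the cost visible.
std::size_t joinRange(std::vector<RowRange>& list, RowRange r) {
    assert(r.first <= r.last);
    std::size_t inspected = 0;
    for (RowRange& existing : list) {
        ++inspected;
        if (r.first <= existing.last + 1 && existing.first <= r.last + 1) {
            // Adjacent or overlapping: merge in place and stop scanning.
            if (r.first < existing.first) existing.first = r.first;
            if (r.last > existing.last) existing.last = r.last;
            return inspected;
        }
    }
    list.push_back(r);  // nothing to merge with, so the list grows by one
    return inspected;
}

// Join `count` single-row ranges spaced `stride` rows apart and return the
// total number of stored ranges inspected across all calls.
std::size_t totalInspections(long count, long stride) {
    std::vector<RowRange> list;
    std::size_t total = 0;
    for (long i = 0; i < count; ++i)
        total += joinRange(list, RowRange{i * stride, i * stride});
    return total;
}
```

In this toy model, 6,200 scattered rows (stride 2) cost 6200 * 6199 / 2 inspections, while 6,200 contiguous rows (stride 1) keep merging into one range and cost only 6,199.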
Comment 13 Seyeong Kim 2014-08-01 01:08:58 UTC
(In reply to comment #12)
> right
> 
> if there are around 40000 rows. then assume that replaced 6200 rows.
> 
> Join() called 6200 count and 
> 
> inside Join(), loop from 0~0 to 0~6200. this is actual replace job.
> 
> after replaceall, trigger to call Join() every time above.
> 
> below is function call i found.
> 
> 1. when push replaceall button
> 
> in sc directory
> 
> something more for handle click
> ScDocument::SearchAndReplace
> ScTable::SearchAndReplace
> ScTable::ReplaceAll
> ScRangeList::Join
> 
> 
> 2. triggering every time
> 
> svx/source/dialog/srchdlg.cxx
> IMPL_LINK(SvxSearchDialog, TimeoutHdl_Impl, Timer*, pTimer)
> trigerring
> 
> then in sc directory
>


ScTabViewShell::HasSelection is called first here:

 
> ScDocument::GetSelectionFunction
> ScTable::UpdateSelectionFunction
> ScColumn::UpdateSelectionFunction ( here loop 1024 if selected col is only 1
> )
> ColumnSpanSet::set
> ScMarkData::GetMarkedRanges
> ScMarkData::FillRangeListWithMarks
> ScRangeList::Join ( slow )
> 
> 
> I disabled #2 function. then after replace all cpu 100% usage not happened.
> i know it's wrong but i just tested.
> 
> I modified ScColumn::UpdateSelectionFunction to loop only selected col times
> for testing.
> but it was not enough.
Comment 14 Markus Mohrhard 2014-08-16 03:10:09 UTC
Simple fix for the problem.

We should not call ScMarkData::GetMarkedRanges in ScColumn. If we do it already in ScTable we are down to 10 calls from 10k calls. After that, ScRangeList::Join is not visible anymore in the profile.
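
The idea behind this fix, computing the marked ranges once at the table level instead of once per column, can be sketched as follows. This is only an illustration under assumptions and not the actual patch: Marks, RowSpan, countMarkedCells, sumPerColumn and sumHoisted are hypothetical names, and the counter exists only to show how often the range list gets rebuilt.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy stand-in for one marked row range (inclusive).
struct RowSpan {
    long first;
    long last;
};

// Hypothetical stand-in for the selection data. builds() counts how often
// the (potentially expensive) list of marked ranges is rebuilt.
class Marks {
public:
    explicit Marks(std::vector<RowSpan> spans) : spans_(std::move(spans)) {}

    std::vector<RowSpan> markedRanges() const {
        ++builds_;
        return spans_;
    }

    std::size_t builds() const { return builds_; }

private:
    std::vector<RowSpan> spans_;
    mutable std::size_t builds_ = 0;
};

// Per-column work that only needs to read the ranges.
long countMarkedCells(const std::vector<RowSpan>& ranges) {
    long cells = 0;
    for (const RowSpan& r : ranges) {
        assert(r.first <= r.last);
        cells += r.last - r.first + 1;
    }
    return cells;
}

// Before: every column rebuilds the range list for itself.
long sumPerColumn(const Marks& marks, int columns) {
    long cells = 0;
    for (int col = 0; col < columns; ++col)
        cells += countMarkedCells(marks.markedRanges());
    return cells;
}

// After: the table-level caller builds the list once and passes it down.
long sumHoisted(const Marks& marks, int columns) {
    const std::vector<RowSpan> ranges = marks.markedRanges();
    long cells = 0;
    for (int col = 0; col < columns; ++col)
        cells += countMarkedCells(ranges);
    return cells;
}
```

Both variants return the same result; with 1024 columns the first rebuilds the list 1024 times and the second once.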
Comment 15 Commit Notification 2014-08-16 03:36:17 UTC
Markus Mohrhard committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=1cf19ea84794ca065749667b480dfed2d27d47b7

don't call ScMarkData::GetMarkedRanges in ScColumn, related fdo#79422



The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 16 Seyeong Kim 2014-08-19 06:16:16 UTC
Thanks for fixing.

But is this fix working?

I tested the master branch

but I still got the same problem:

after Replace All on large data, Calc hangs for a long time.

Thanks
Comment 17 Markus Mohrhard 2014-08-21 10:29:37 UTC
(In reply to comment #16)
> Thanks for fixing.
> 
> but.. is this fix working?
> 
> i tested master branch
> 
> but i still got same problem
> 
> after replace all large data, hang for a long time.
> 
> thanks

It is significantly faster. Instead of several minutes it opens here after a few dozen seconds. There are surely more problems but at least 90% of the original time was spent in this code.
Comment 18 Seyeong Kim 2014-08-21 10:37:18 UTC
yeah right

it's faster than before

thanks for fixing
Comment 19 Nicolas R 2014-08-25 08:17:39 UTC
Hi,

Today I tried the attached "30000 rows" ods, replacing the value 'NA' with 'NB':
- select column 'risk factor'
- Ctrl-H for the replace window
- Search for : NA
- Replace with : NB
- Click 'Find All' ... and wait .......

Tried with 
libo-42~2014-08-22_11.48.56_LibreOfficeDev_4.2.7.0.0_Win_x86
and
libo-43~2014-08-23_08.23.00_LibreOfficeDev_4.3.2.0.0_Win_x86

Sorry for the bad news, but it is still almost unusable for me (I often use Calc for this kind of search/replace on big csv/text files).

Try the same global replacement with version 4.1.6. It's almost immediate.


In which branch is the patch?
Comment 20 Yousuf Philips (jay) (retired) 2014-08-25 09:15:45 UTC
@Nicolas: It is patched in the daily master build found at < http://dev-builds.libreoffice.org/daily/master/ >.

Tested the 30k file and the replacement took 5 seconds to complete. The only problems I see now are that the search results dialog halts the UI for a bit, and then, when I close it, the find & replace dialog halts the UI for a bit.
Comment 21 Nicolas R 2014-08-25 10:57:00 UTC
@Jay Philips

Thanks. Following your link, I tested with libo-master~2014-08-25_01.27.12_LibreOfficeDev_4.4.0.0.alpha0_Win_x86

With this version (Win 7 Pro 64-bit, i7, 8 GB RAM)

'30000 rows' test file:
Find All: the Search Results window appears immediately, but it takes around 7 sec before the LibO interface becomes responsive again (e.g. to scroll to the end of the Search Results window).
If I select an entry in the Search Results window, this entry is displayed almost immediately ... but it still takes 3 sec before the interface becomes responsive.
Replace All: like 'Find All' ... around 7 sec before the interface becomes responsive.

Same file with 4.1.6: Find All and Replace All in less than 1 sec.

But with one of my 'real life' files (40,000 addresses), Find All with one column selected and the search restricted to 'selection' (as in the previous example) => still not responsive after 10 minutes! (I force-closed LibO.)

This 'real life' file with 4.1.6: around 3 sec for Find All or Replace All.

So, sorry again, but this is still unusable for me for this kind of job.

This 'Search Results' window with direct access is a good idea, but it is not usable for big files at this time.
Comment 22 Markus Mohrhard 2014-08-25 13:54:32 UTC
Open a new bug report and attach your file. There are surely more issues but please keep this bug report clean.
Comment 23 Nicolas R 2014-08-27 09:55:08 UTC
OK,
I'll prepare an 'anonymized' file of 40,000 addresses for a new bug report, but I really think this performance problem is linked to the new find / replace results window.

I've tested the first attachment, 'Sample data (ODS file)' ... More than 3 minutes for a 'Find All' with LibO 4.4.0 and 5 seconds with LibO 4.1.6.

The increase in response time isn't linear with the increase in file size.
Comment 24 Nicolas R 2014-08-27 12:09:58 UTC
(In reply to comment #22)
> Open a new bug report and attach your file. There are surely more issue but
> please keep this bug report clean.

Ok, new bug 83141
Comment 25 Yousuf Philips (jay) (retired) 2015-01-02 12:40:32 UTC
Just curious whether this patch was also pushed into 4.2 and 4.3.