Created attachment 100118 [details]
Sample data (ODS file)
This bug is related to Bug 77515.
Workflow to reproduce:
1. open attached file
2. replace 'NA' with '' in whole file
3. wait some time, displayed cells with NA content are changed, then Calc seems to hang in endless loop.
4. after 4-5 minutes still no result...
PS: Calc doesn't seem to be Muellner-proof ;-)
Same behaviour when reducing the number of rows to approx 67000
Did some testing on Windows 7 64-bit on a 2.5 GB laptop with an Intel Core 2 CPU @ 1.83 GHz, and here are the results:
LibreOffice 4.2.4 Stats
Loading File: 20 to 35 sec
Find & Replace: > 20 min*
* Stopped it after the 20-minute mark and noticed that the 15-minute autorecovery was being activated while the processing was going on, which slowed this down even more.
Word 2013 Stats
Loading File: 12 sec
Find & Replace: 3 sec
Kingsoft Spreadsheets (ods converted to xlsx in LibO)
Loading File: 9 sec
Find & Replace: 5 sec
Decided to shrink the file down to 30,000 records and repeat the test, and I discovered that the replacing takes 16 seconds if you click the close button during those 16 seconds. After exiting the 'Find & Replace' dialog, everything is slow because all of the replaced cells are selected; clicking on any other cell makes the UI responsive again. Retesting the original file and clicking in the cancel button area right after clicking 'Replace All' shows that the replacing took 4.5 minutes, and deselecting all the selected rows after exiting the dialog takes ~3 minutes.
Observing Kingsoft Spreadsheets and Excel, they don't maintain a selection of these cells during the replacement, and once the replacement is completed, they simply show a replacement result popup. With LibO 4.3 and above, a 'search results' dialog appears after the replacing and hangs the UI, even when I clicked the cancel button after starting the replacing. The 'search results' dialog halted the UI for more than 5 minutes even though I tried to click the close button as soon as it appeared.
Just another note: as we are replacing 'NA' with '', the search results dialog shows no records, but if we were replacing it with 'ABC', the search results dialog would be filled with records, which may also halt the UI.
My suggestion for the search results dialog is to have an option in the 'Find & Replace' dialog that controls whether to show it. The option should be disabled by default, and by default the simple message box with the total number of replacements should be shown.
Created attachment 100683 [details]
30,000 row trimmed version of the supplied .ods
For a similar find and replace issue for writer, check bug 80715.
4.2.5, 4.3.0 still have this problem.
I tested the attached 30,000-row ODS.
After Replace All, Calc sits in a seemingly endless loop, with the Join function called 1000 times inside it.
... and btw: some powerful text editors ("emeditor") can do this change/replace in a fraction of a second in the 131K example csv file.
I know this is something different and cannot really be compared to a spreadsheet application, but...
Same problem here with Win 7 Pro 64bits and LibO 4.2.5.
On a large ODS file (the attached files are good examples), clicking the 'Find All' button in the 'Find & Replace' window opens a 'Search Results' window with some cell coordinates, then LibO freezes / hangs.
Same problem with 4.3 RC
No problem with 4.1 branch
Perhaps side effect related to resolved bug 79011 ?
Can we add Kohei Yoshida to CC list ?
When I open the Search & Replace dialog, a timer that scans all of the matched cells is triggered (at first this is not heavy).
After replacing large data, it becomes heavy and seems to run without limit.
On each timer tick it calls the Join function, which is also heavy with large data.
1. Replace All (or Search All) is slow because the Join function performs badly.
2. The timer being triggered causes a problem after Replace All.
I made the Search & Replace dialog close once Replace All has finished.
With that, problem #2 is gone,
but #1 still exists.
Created attachment 103748 [details]
(In reply to comment #10)
> Created attachment 103748 [details]
> callgrind result
Thanks! This helps. No need to use tar when compressing though; you can run bzip2 directly on the original file if you are only compressing a single file. That way kcachegrind can open the compressed data directly without you needing to uncompress the file first. Just for a quick tip.
Now, the profile data suggests that the hot spot is
which is interesting. This call is used probably while calculating the areas where the replacement occurred so that Calc can highlight those areas at the very end. Finding out why this becomes slow all of a sudden will lead us to the right direction.
Suppose there are around 40,000 rows and 6,200 of them are replaced.
Join() is called 6,200 times, and
inside Join() the loop runs over 0..0 on the first call up to 0..6200 on the last. This is the actual replace job.
After Replace All, a timer triggers the same Join() calls again on every tick.
Below are the function calls I found.
1. When the Replace All button is pushed
in sc directory
something more for handle click
2. Triggered on every timer tick
IMPL_LINK(SvxSearchDialog, TimeoutHdl_Impl, Timer*, pTimer)
then in sc directory
ScColumn::UpdateSelectionFunction (loops 1024 times here even if only 1 column is selected)
ScRangeList::Join (slow)
I disabled the #2 function; after that, Replace All no longer caused 100% CPU usage.
I know that's wrong, but it was just a test.
I also modified ScColumn::UpdateSelectionFunction to loop only over the selected columns for testing,
but it was not enough.
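To illustrate why the call pattern described above gets expensive, here is a minimal Python sketch, not the actual ScRangeList::Join code. It assumes a join that scans every existing range looking for a neighbour to merge with; `join_naive`, `total_inspections` and `scattered_rows` are hypothetical names made up for this example.

```python
def join_naive(ranges, new):
    """Add `new` (start_row, end_row) to `ranges`, merging it into a directly
    adjacent range if one exists. Returns how many existing ranges were
    inspected before a decision was made."""
    inspected = 0
    for i, (start, end) in enumerate(ranges):
        inspected += 1
        if new[0] == end + 1:        # new range directly follows this one
            ranges[i] = (start, new[1])
            return inspected
        if new[1] + 1 == start:      # new range directly precedes this one
            ranges[i] = (new[0], end)
            return inspected
    ranges.append(new)               # no neighbour found: list grows by one
    return inspected


def total_inspections(rows):
    """Join one single-row range per matched row; return (work, range count)."""
    ranges = []
    total = 0
    for row in rows:
        total += join_naive(ranges, (row, row))
    return total, len(ranges)


def scattered_rows(n):
    """Hypothetical workload: n matches on every other row, so nothing merges."""
    return range(0, 2 * n, 2)


if __name__ == "__main__":
    for n in (1000, 2000, 4000):
        print(n, total_inspections(scattered_rows(n)))
```

When the matched rows are scattered, the i-th call inspects i existing ranges, so the total work grows roughly with the square of the number of matches: doubling the matches quadruples the work, and 6,200 scattered matches would mean about 19 million inspections per pass in this model. If a timer repeats that pass on every tick, the cost is paid again and again.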
(In reply to comment #12)
Simple fix for the problem.
We should not call ScMarkData::GetMarkedRanges in ScColumn. If we do it once in ScTable instead, we are down to 10 calls from 10k calls. After that, ScRangeList::Join is no longer visible in the profile.
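The idea of the fix can be sketched in Python as hoisting an expensive call out of a per-column loop. This is only an illustration under stated assumptions, not the committed patch: `MarkData`, `update_per_column` and `update_per_table` are hypothetical stand-ins, and the column count of 1024 is taken from the loop count mentioned earlier in this report.

```python
class MarkData:
    """Illustrative stand-in for the selection state (not the real ScMarkData)."""

    def __init__(self, marked_rows):
        self.marked_rows = marked_rows
        self.calls = 0  # counts how often the expensive method runs

    def get_marked_ranges(self):
        # Expensive in this model: rebuilds the range list on every call.
        self.calls += 1
        ranges = []
        for row in sorted(self.marked_rows):
            if ranges and ranges[-1][1] + 1 == row:
                ranges[-1] = (ranges[-1][0], row)   # extend the last range
            else:
                ranges.append((row, row))           # start a new range
        return ranges


def update_per_column(mark, n_columns):
    """Before: every column asks for the marked ranges itself."""
    return [len(mark.get_marked_ranges()) for _ in range(n_columns)]


def update_per_table(mark, n_columns):
    """After: the table computes the ranges once and shares them."""
    ranges = mark.get_marked_ranges()
    return [len(ranges) for _ in range(n_columns)]


if __name__ == "__main__":
    before = MarkData(set(range(0, 12400, 2)))
    after = MarkData(set(range(0, 12400, 2)))
    assert update_per_column(before, 1024) == update_per_table(after, 1024)
    print("calls before:", before.calls, "calls after:", after.calls)
```

Both variants produce the same per-column result; only the number of calls to the expensive method changes, which is the effect the profile shows.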
Markus Mohrhard committed a patch related to this issue.
It has been pushed to "master":
don't call ScMarkData::GetMarkedRanges in ScColumn, related fdo#79422
The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
Affected users are encouraged to test the fix and report feedback.
Thanks for fixing.
But... is this fix working?
I tested the master branch,
but I still get the same problem:
after Replace All on large data, it hangs for a long time.
(In reply to comment #16)
It is significantly faster. Instead of several minutes it opens here after a few dozen seconds. There are surely more problems but at least 90% of the original time was spent in this code.
It's faster than before.
Thanks for fixing.
I tried the attached "30000 rows" ODS today, replacing the value 'NA' with 'NB':
- select column 'risk factor'
- Ctrl+H for the Find & Replace window
- Search for: NA
- Replace with: NB
- Click 'Find All' ... and wait .......
Sorry for the bad news, but it is still almost unusable for me (I often use Calc for this kind of search/replace on big csv/text files).
Try the same global replacement with version 4.1.6. It's almost immediate.
In which branch is the patch ?
@Nicklos: It is patched in the daily master build found at < http://dev-builds.libreoffice.org/daily/master/ >.
Tested the 30k file and the replacement took 5 seconds to complete. The only problems I see now are that the search results dialog halts the UI for a bit, and then, when I close it, the Find & Replace dialog halts the UI for a bit.
Thanks. Following your link , tested with libo-master~2014-08-25_01.27.12_LibreOfficeDev_220.127.116.11.alpha0_Win_x86
With this version (win 7 pro 64 bits, I7 , 8Gb ram)
'30000 rows' test file :
Find All: the Search Results window appears immediately, but it takes around 7 sec before the LibO interface becomes responsive again (e.g. to scroll to the end of the Search Results window).
If I select an entry in the Search Results window, the entry is displayed almost immediately ... but it still takes 3 sec before the interface becomes responsive.
Replace All: like 'Find All' ... around 7 sec before the interface becomes responsive.
Same file with 4.1.6: Find All and Replace All in less than 1 sec.
But with one of my 'real life' files (40,000 addresses), Find All with one column selected and the search restricted to 'selection' (as in the previous example) => still not responsive after 10 minutes! (I had to force-close LibO.)
This 'real life' file with 4.1.6: around 3 sec for Find All or Replace All.
So, sorry again, but still unusable for me for this kind of job.
This 'Search Results' window with direct access is a good idea, but it is not usable for big files at this time.
Open a new bug report and attach your file. There are surely more issues, but please keep this bug report clean.
I'll prepare an 'anonymized' file with 40,000 addresses for a new bug report, but I really think this performance problem is linked to the new find / replace results window.
I've tested the first attachment, 'Sample data (ODS file)' ... More than 3 minutes for a 'Find All' with LibO 4.4.0 and 5 seconds with LibO 4.1.6.
The increase in response time isn't linear with the increase in file size.
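As a rough back-of-the-envelope check of that observation, the two Find All timings from this thread can be turned into an empirical scaling exponent. This is only a sketch under loud assumptions: the first attachment is taken to have about 131,000 rows (the figure mentioned in an earlier comment), "more than 3 minutes" is treated as a lower bound of 180 seconds, and the 7-second figure for the 30,000-row file came from a different comment and build. `scaling_exponent` is a hypothetical helper written for this example.

```python
import math


def scaling_exponent(rows_small, secs_small, rows_large, secs_large):
    """Estimate k in time ~ rows**k from two (rows, seconds) measurements."""
    return math.log(secs_large / secs_small) / math.log(rows_large / rows_small)


# Assumed inputs, see the caveats above: (30,000 rows, 7 s) and (131,000 rows, 180 s).
k = scaling_exponent(30_000, 7, 131_000, 180)

if __name__ == "__main__":
    print(f"estimated exponent: {k:.2f}")
```

An exponent of 1 would mean linear growth. With these assumed inputs the estimate lands a little above 2, which would be consistent with roughly quadratic behaviour, though two data points from different builds cannot prove it.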
(In reply to comment #22)
> Open a new bug report and attach your file. There are surely more issue but
> please keep this bug report clean.
Ok, new bug 83141
Just curious whether this patch was also pushed into 4.2 and 4.3.