Bug 119286 - Sanitize documents using "Find & Replace" getting slower and slower
Summary: Sanitize documents using "Find & Replace" getting slower and slower
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer (show other bugs)
Version:
(earliest affected)
5.0.0.5 release
Hardware: All All
: medium normal
Assignee: Not Assigned
URL:
Whiteboard: target:7.1.0 target:7.0.2
Keywords: implementationError, perf
: 136436 (view as bug list)
Depends on:
Blocks: Find-Search
  Show dependency treegraph
 
Reported: 2018-08-15 09:51 UTC by Telesto
Modified: 2020-09-09 18:14 UTC (History)
3 users (show)

See Also:
Crash report or crash signature:


Attachments
Example file (23.87 KB, application/vnd.oasis.opendocument.text)
2018-08-15 09:51 UTC, Telesto
Details
Screenshot 'hang' (45.14 KB, image/png)
2018-08-15 09:54 UTC, Telesto
Details
Bibisect log (4.62 KB, text/plain)
2018-08-15 14:03 UTC, Telesto
Details
Flame Graph Win (1.94 MB, image/svg+xml)
2019-05-08 14:43 UTC, Telesto
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Telesto 2018-08-15 09:51:17 UTC
Description:
Sanitize documents using "Find & Replace" getting slower and slower

Steps to Reproduce:
1. Open the attached file 
2. Disable the automatic spell checking if enabled

In the Search for field put "." (a single period)
In the Replace with field put "x"
Make sure Whole words only is unchecked
Click Other Options to expand the dialog, and make sure Regular expressions is checked
Click Replace All


Actual Results:
5 seconds with LibO 4.4.7.2
12 seconds with LibO 5.2.5.0
20 seconds with LibO 6.2.0.0 (and dialog acts as if LibO won't react)

Difference between 5.2.5. and 6.2.0.0 might be Harfbuzz common layout

Expected Results:
Something like 4.4.7.2 would be ideal. At least some improvement


Reproducible: Always


User Profile Reset: No



Additional Info:
Found in
Version: 6.2.0.0.alpha0+
Build ID: c0fdcece6b886912618deee9656cb2d169a9b999
CPU threads: 4; OS: Windows 6.3; UI render: default; 
TinderBox: Win-x86@42, Branch:master, Time: 2018-08-12_00:35:45
Locale: en-US (nl_NL); Calc: CL

and in
Version: 5.2.5.0.0+
Build ID: 78223678b7513ffe46804cb08f2dc5bc899b2bab
CPU Threads: 4; OS Version: Windows 6.29; UI Render: default; 
Locale: nl-NL (nl_NL); Calc: CL

but not in
Versie: 4.4.7.2 
Build ID: f3153a8b245191196a4b6b9abd1d0da16eead600
Locale: nl_NL
Comment 1 Telesto 2018-08-15 09:51:45 UTC
Created attachment 144181 [details]
Example file
Comment 2 Telesto 2018-08-15 09:54:19 UTC
Created attachment 144182 [details]
Screenshot 'hang'
Comment 4 Buovjaga 2018-09-07 17:28:44 UTC
Repro slowness 6.2 vs. 3.6.

Adding Cc: to Miklos Vajna
so he can comment on the blamed commit
Comment 5 Miklos Vajna 2018-09-10 07:35:41 UTC
At some point I tried to improve search so it finds strings not only in Writer text, but also in Writer shape texts. I guess it's probable that the slowness comes from that.

Seems the current situation is the worst of both worlds: replace all still doesn't replace strings from shape text, but already slows down replace all.

Getting back the old performance or being more correct (include shape text when you do replace-all) would be good.
Comment 6 Miklos Vajna 2019-04-02 12:47:35 UTC
I had a look at this: just reverting the above mentioned commit doesn't change anything. I get why it doesn't help: the commit changes how documents containing shapes behave and the bugdoc doesn't have shapes.

I also profiled this: SvxSearchDialog::CommandHdl_Impl() in svx is the UI code, you can measure the cost of ExecuteSynchron():

- 97% is spent in SwFindParaText::DoFind() (OK)
  - 55% of that is sw::ReplaceImpl
  - 40% of that is sw::FindTextImpl
    - here 20% is the SvxSearchItem ctor, which is already moved out of the loop for perf reasons
    - 18% is DoSearch

So in short, nothing obviously stupid. According to the above numbers, it indeed smells like a regression, but I doubt it's from the above commit.
Comment 7 Telesto 2019-04-13 17:25:13 UTC
@Buovjaga
A flame graph would nice. And maybe retry of the bibisect on Linux; the range isn't to helpful, I guess

Looks bit like a Noel type of perf bug.. or I'm a bit to optimistic :-)
Comment 8 Buovjaga 2019-04-13 18:10:35 UTC
For me, it is 5,62 seconds with the last commit in Linux 50max and 10,54 in the first commit.

44max:
last: 10,25
first: 10,27

I don't know what I should bibisect :( I don't have newer repos than 50max on Linux.

Miklos already profiled this in comment 6, so no need for me to do anything.
Comment 9 Telesto 2019-05-08 14:43:04 UTC
Created attachment 151247 [details]
Flame Graph Win

Version: 6.3.0.0.alpha0+
Build ID: 128145e227ef91fb2f23893e73d38ae72cf074e5
CPU threads: 2; OS: Windows 6.3; UI render: default; VCL: win; 
Locale: nl-NL (nl_NL); UI-Language: en-US
Calc: threaded
Comment 10 Telesto 2019-05-29 12:30:09 UTC
Another suggestion to make find & replace even worse:
Enabled change tracking and turn show changes on before running find & replace (bug 121618 comment 6)
Comment 11 Miklos Vajna 2019-06-07 12:33:50 UTC
Forgot to change from regression to implementation error when I commented that the problem did not start at the bisected commit after all.
Comment 12 Telesto 2020-09-03 15:06:47 UTC
*** Bug 136436 has been marked as a duplicate of this bug. ***
Comment 13 Telesto 2020-09-03 15:14:40 UTC
@Noel
In the case you're doing some perf magic again; a suggestion in advance
Comment 14 Noel Grandin 2020-09-08 11:19:44 UTC
I have fixed one part of this

https://gerrit.libreoffice.org/c/core/+/102231

There are two more (which I don't intend to tackle)

(1) We repeatedly revalidate the paragraph signature. Since we're making changes to individual words/letters, this is effectively O(n^2). A smarter thing to do would be to only do this validation after the replace operation.

(2) We create a ton of small UNDO objects. Again, a smarter thing to do would be to create a single UNDO operation.

The first one is tricky because it uses the UNO API to inspect the document, while the rest of the code is using the internal raw API.

The second one is tricky because there are loops at multiple levels of the call stack, and it's not clear how the responsibilities are shared across the different calls.
Comment 15 Telesto 2020-09-08 11:28:56 UTC
I would call a 50% reduction sounds already quite an improvement
Comment 16 Commit Notification 2020-09-09 06:26:18 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/1159f58831c69680e9f10767d5358e13b66579dd

tdf#119286 speed up find/replace

It will be available in 7.1.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 17 Xisco Faulí 2020-09-09 11:04:19 UTC
in my case, times after the patch is 23 seconds, before the patch, it is 42 seconds. Nicee!!

@Noel, thanks for fixing this issue. I'll create a follow-up enhancement with the info from comment 14
Comment 18 Buovjaga 2020-09-09 11:32:34 UTC
(In reply to Xisco Faulí from comment #17)
> in my case, times after the patch is 23 seconds, before the patch, it is 42
> seconds. Nicee!!
> 
> @Noel, thanks for fixing this issue. I'll create a follow-up enhancement
> with the info from comment 14

I already created them and added to See also earlier.
Comment 19 Commit Notification 2020-09-09 18:14:52 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "libreoffice-7-0":

https://git.libreoffice.org/core/commit/ee05abb5a41ed3fc898963dda06675ddbef8dbf5

tdf#119286 speed up find/replace

It will be available in 7.0.2.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.