Bug 119286 - Sanitize documents using "Find & Replace" getting slower and slower
Summary: Sanitize documents using "Find & Replace" getting slower and slower
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer (show other bugs)
Version:
(earliest affected)
5.2.5.1 release
Hardware: All All
: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: bibisectRequest, implementationError, perf
Depends on:
Blocks: Find-Search
  Show dependency treegraph
 
Reported: 2018-08-15 09:51 UTC by Telesto
Modified: 2019-06-07 12:33 UTC (History)
1 user (show)

See Also:
Crash report or crash signature:


Attachments
Example file (23.87 KB, application/vnd.oasis.opendocument.text)
2018-08-15 09:51 UTC, Telesto
Details
Screenshot 'hang' (45.14 KB, image/png)
2018-08-15 09:54 UTC, Telesto
Details
Bibisect log (4.62 KB, text/plain)
2018-08-15 14:03 UTC, Telesto
Details
Flame Graph Win (1.94 MB, image/svg+xml)
2019-05-08 14:43 UTC, Telesto
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Telesto 2018-08-15 09:51:17 UTC
Description:
Sanitize documents using "Find & Replace" getting slower and slower

Steps to Reproduce:
1. Open the attached file 
2. Disable the automatic spell checking if enabled

In the Search for field put "." (a single period)
In the Replace with field put "x"
Make sure Whole words only is unchecked
Click Other Options to expand the dialog, and make sure Regular expressions is checked
Click Replace All


Actual Results:
5 seconds with LibO 4.4.7.2
12 seconds with LibO 5.2.5.0
20 seconds with LibO 6.2.0.0 (and dialog acts as if LibO won't react)

Difference between 5.2.5. and 6.2.0.0 might be Harfbuzz common layout

Expected Results:
Something like 4.4.7.2 would be ideal. At least some improvement


Reproducible: Always


User Profile Reset: No



Additional Info:
Found in
Version: 6.2.0.0.alpha0+
Build ID: c0fdcece6b886912618deee9656cb2d169a9b999
CPU threads: 4; OS: Windows 6.3; UI render: default; 
TinderBox: Win-x86@42, Branch:master, Time: 2018-08-12_00:35:45
Locale: en-US (nl_NL); Calc: CL

and in
Version: 5.2.5.0.0+
Build ID: 78223678b7513ffe46804cb08f2dc5bc899b2bab
CPU Threads: 4; OS Version: Windows 6.29; UI Render: default; 
Locale: nl-NL (nl_NL); Calc: CL

but not in
Versie: 4.4.7.2 
Build ID: f3153a8b245191196a4b6b9abd1d0da16eead600
Locale: nl_NL
Comment 1 Telesto 2018-08-15 09:51:45 UTC
Created attachment 144181 [details]
Example file
Comment 2 Telesto 2018-08-15 09:54:19 UTC
Created attachment 144182 [details]
Screenshot 'hang'
Comment 4 Buovjaga 2018-09-07 17:28:44 UTC
Repro slowness 6.2 vs. 3.6.

Adding Cc: to Miklos Vajna
so he can comment on the blamed commit
Comment 5 Miklos Vajna 2018-09-10 07:35:41 UTC
At some point I tried to improve search so it finds strings not only in Writer text, but also in Writer shape texts. I guess it's probable that the slowness comes from that.

Seems the current situation is the worst of both worlds: replace all still doesn't replace strings from shape text, but already slows down replace all.

Getting back the old performance or being more correct (include shape text when you do replace-all) would be good.
Comment 6 Miklos Vajna 2019-04-02 12:47:35 UTC
I had a look at this: just reverting the above mentioned commit doesn't change anything. I get why it doesn't help: the commit changes how documents containing shapes behave and the bugdoc doesn't have shapes.

I also profiled this: SvxSearchDialog::CommandHdl_Impl() in svx is the UI code, you can measure the cost of ExecuteSynchron():

- 97% is spent in SwFindParaText::DoFind() (OK)
  - 55% of that is sw::ReplaceImpl
  - 40% of that is sw::FindTextImpl
    - here 20% is the SvxSearchItem ctor, which is already moved out of the loop for perf reasons
    - 18% is DoSearch

So in short, nothing obviously stupid. According to the above numbers, it indeed smells like a regression, but I doubt it's from the above commit.
Comment 7 Telesto 2019-04-13 17:25:13 UTC
@Buovjaga
A flame graph would nice. And maybe retry of the bibisect on Linux; the range isn't to helpful, I guess

Looks bit like a Noel type of perf bug.. or I'm a bit to optimistic :-)
Comment 8 Buovjaga 2019-04-13 18:10:35 UTC
For me, it is 5,62 seconds with the last commit in Linux 50max and 10,54 in the first commit.

44max:
last: 10,25
first: 10,27

I don't know what I should bibisect :( I don't have newer repos than 50max on Linux.

Miklos already profiled this in comment 6, so no need for me to do anything.
Comment 9 Telesto 2019-05-08 14:43:04 UTC
Created attachment 151247 [details]
Flame Graph Win

Version: 6.3.0.0.alpha0+
Build ID: 128145e227ef91fb2f23893e73d38ae72cf074e5
CPU threads: 2; OS: Windows 6.3; UI render: default; VCL: win; 
Locale: nl-NL (nl_NL); UI-Language: en-US
Calc: threaded
Comment 10 Telesto 2019-05-29 12:30:09 UTC
Another suggestion to make find & replace even worse:
Enabled change tracking and turn show changes on before running find & replace (bug 121618 comment 6)
Comment 11 Miklos Vajna 2019-06-07 12:33:50 UTC
Forgot to change from regression to implementation error when I commented that the problem did not start at the bisected commit after all.