Bug 115829 - Search and replace is slow in a large document compared to LibO5.0.0.1
Status: RESOLVED DUPLICATE of bug 116242
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 5.1.6.2 release
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: perf, regression
Depends on:
Blocks: Find-Search CPU-AT-100%
Reported: 2018-02-18 16:31 UTC by Telesto
Modified: 2018-04-06 04:24 UTC

See Also:
Crash report or crash signature:


Attachments
Callgrind output from master (7.68 MB, application/x-xz)
2018-02-19 18:46 UTC, Buovjaga
Example document reduced to 24 pages (66.68 KB, application/vnd.oasis.opendocument.text)
2018-02-19 18:59 UTC, Buovjaga

Description Telesto 2018-02-18 16:31:53 UTC
Description:
Search and replace is slow in a large document

Steps to Reproduce:
1. Open Writer
2. Disable automatic spell checking
3. Open https://drive.google.com/file/d/1NPXKo2nsfn9steLOw1AuWvZxQE21wT-n (bug 115757)
4. CTRL+H (Replace dialog)
5. Replace "Dallun" with "ZZZ" using Replace All
6. Monitor the time required. 

Actual Results:  
Slow: more than 2 minutes

Expected Results:
20 seconds or so


Reproducible: Always


User Profile Reset: No



Additional Info:
Repro with
Version: 6.1.0.0.alpha0+
Build ID: b87fe45e8b087a315a65b92bf9c168b1e4c5cc00
CPU threads: 4; OS: Windows 6.3; UI render: default; 
TinderBox: Win-x86@42, Branch:master, Time: 2018-02-16_23:14:35
Locale: nl-NL (nl_NL); Calc: CL

and with
Version: 5.2.5.0.0+
Build ID: 78223678b7513ffe46804cb08f2dc5bc899b2bab
CPU Threads: 4; OS Version: Windows 6.29; UI Render: default; 
Locale: nl-NL (nl_NL); Calc: CL

and with
Version: 5.1.6.2
Build ID: 07ac168c60a517dba0f0d7bc7540f5afa45f0909
CPU Threads: 4; OS Version: Windows 6.2; UI Render: GL;
Locale: nl-NL (nl_NL); Calc: CL

but not with
Version: 5.0.0.1
Build ID: 9a0b23dd0ab9652e0965484934309f2d49a7758e
Locale: nl-NL (nl_NL)


User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0
Comment 1 m.a.riosv 2018-02-18 23:39:52 UTC
Why do you expect 30 seconds? There are 9750 replacements, which implies preserving undo information for each of them and redoing the document at the same time. Using a replacement word of the same length seems to be a bit quicker, so maybe redoing such a long document for every change is the main issue.
Comment 2 Telesto 2018-02-19 08:42:10 UTC
The numbers were misleading.

The replacement finishes in 6 minutes with:
Version: 6.1.0.0.alpha0+
Build ID: b87fe45e8b087a315a65b92bf9c168b1e4c5cc00
CPU threads: 4; OS: Windows 6.3; UI render: default; 
TinderBox: Win-x86@42, Branch:master, Time: 2018-02-16_23:14:35
Locale: nl-NL (nl_NL); Calc: CL

---
The replacement finishes in 50 seconds (there is still some background processing, but it does not interfere with any work) with:
Version: 5.0.0.1
Build ID: 9a0b23dd0ab9652e0965484934309f2d49a7758e
Locale: nl-NL (nl_NL)

and with
Version: 4.4.7.2
Build ID: f3153a8b245191196a4b6b9abd1d0da16eead600
Locale: nl_NL

--
The replacement finishes in 80 seconds, with no background processing, with:
Version: 4.3.7.2
Build ID: 8a35821d8636a03b8bf4e15b48f59794652c68ba

Very Sleepy shows a stack like this for LibO6.1:
icu_60::CompoundTransliterator::handleTransliterate	icuin60
icu_60::Transliterator::filteredTransliterate	icuin60
icu_60::Transliterator::transliterate	icuin60
icu_60::Transliterator::transliterate	icuin60
com_sun_star_i18n_Transliteration_IGNORE_WIDTH_get_implementation	i18npoollo
com_sun_star_i18n_Transliteration_IGNORE_CASE_get_implementation	i18npoollo
com_sun_star_i18n_Transliteration_IGNORE_CASE_get_implementation	i18npoollo
Comment 3 Telesto 2018-02-19 15:28:28 UTC
@buovjaga
A valgrind trace would be nice, after confirmation of course. And don't use the full document; it would take ages ;-)
Comment 4 Buovjaga 2018-02-19 18:46:04 UTC
I confirm the slowness. In 6.1 with the big document it took 1 min 40 secs.
With 3.6 it took 22 secs.

Arch Linux 64-bit
Version: 6.1.0.0.alpha0+
Build ID: c902cbc7dc5294ab721a9aef3a152aa243d00011
CPU threads: 8; OS: Linux 4.15; UI render: default; VCL: kde4; 
Locale: fi-FI (fi_FI.UTF-8); Calc: group
Built on February 17th 2018

Arch Linux 64-bit
Version 3.6.7.2 (Build ID: e183d5b)
Comment 5 Buovjaga 2018-02-19 18:46:53 UTC
Created attachment 139999
Callgrind output from master

I should have reduced the document some more before doing it... it replaced about 2000 times.
Comment 6 Buovjaga 2018-02-19 18:59:16 UTC
Created attachment 140000
Example document reduced to 24 pages
Comment 7 Eike Rathke 2018-03-07 19:52:14 UTC
I believe that for 6.1 and 6.0 this is at least partly due to diacritic transliteration always being enabled, see bug 116242 as a fallout of the change for bug 111846. Please check again with the fix for bug 116242 and under Options enable Diacritic-sensitive once.
Comment 8 Telesto 2018-03-08 10:59:53 UTC
Diacritic-sensitive enabled = decent speed
Diacritic-sensitive disabled = painfully slow

However, Diacritic-sensitive is disabled by default and is not an obvious checkbox to use.
Comment 9 Buovjaga 2018-03-08 12:29:37 UTC
(In reply to Eike Rathke from comment #7)
> Please check again with the fix for bug 116242 and
> under Options enable Diacritic-sensitive once.

Without checking the box it is the same (1 min 40 sec).
Checking the box lowers the time to 21 secs.

Arch Linux 64-bit
Version: 6.1.0.0.alpha0+
Build ID: b8fe96f1da2c42c04a8094ca8c57d49763b7bded
CPU threads: 8; OS: Linux 4.15; UI render: default; VCL: kde4; 
Locale: fi-FI (fi_FI.UTF-8); Calc: group
Built on March 8th 2018
Comment 10 Timur 2018-03-08 13:56:26 UTC
Is this a dupe of bug 116242 then?
Comment 11 Buovjaga 2018-03-08 13:59:41 UTC
(In reply to Timur from comment #10)
> Is this a dupe of bug 116242 then?

Well, I don't understand why this should suddenly turn out to be about diacritics, but maybe Eike can elaborate.
Comment 12 Telesto 2018-03-08 14:35:29 UTC
Diacritic-sensitive will be checked by default; see bug 116242 comment 10.

*** This bug has been marked as a duplicate of bug 116242 ***
Comment 13 Eike Rathke 2018-03-08 16:37:45 UTC
(In reply to Buovjaga from comment #11)
> Well, I don't understand why this should suddenly turn to be about
> diacritics, but maybe Eike can elaborate.
Because when ignoring diacritics (Diacritic-sensitive not checked) an extra transliteration is applied to all text to be searched that decomposes the Unicode character string to remove the diacritics and keep the base letter, for example Äpfel -> A"pfel -> Apfel, and that is a heavy operation for large amounts of text. When replacing text it may get even worse because (a quite awkward implementation of) an indexed sequence is set up to be able to map search positions to replace positions.
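
To make the explanation above concrete, here is a minimal Python sketch of the two steps it describes: decomposing the text and dropping the diacritics, then keeping an index so that a match position in the stripped text can be mapped back to the original text for replacement. This is an illustration only, not LibreOffice's code (which goes through ICU transliteration in C++); the function names strip_diacritics_with_offsets and find_ignoring_diacritics are hypothetical, and using NFD plus unicodedata.combining() is an assumption standing in for the real transliterator.

```python
import unicodedata


def strip_diacritics_with_offsets(text):
    """Return (stripped, offsets); offsets[i] is the index in `text`
    of the character that produced stripped[i]."""
    stripped = []
    offsets = []
    for i, ch in enumerate(text):
        # Canonical decomposition: U+00C4 becomes "A" + U+0308 (combining diaeresis).
        for part in unicodedata.normalize("NFD", ch):
            if unicodedata.combining(part):
                continue  # drop the combining mark, keep the base letter
            stripped.append(part)
            offsets.append(i)
    return "".join(stripped), offsets


def find_ignoring_diacritics(text, needle):
    """Yield (start, end) spans in the ORIGINAL text for each match."""
    hay, offsets = strip_diacritics_with_offsets(text)
    key, _ = strip_diacritics_with_offsets(needle)
    if not key:
        return
    pos = hay.find(key)
    while pos != -1:
        nxt = pos + len(key)
        start = offsets[pos]
        # End at the next kept character so trailing combining marks
        # belonging to the match are included in the span.
        end = offsets[nxt] if nxt < len(offsets) else len(text)
        yield (start, end)
        pos = hay.find(key, nxt)


if __name__ == "__main__":
    sample = "\u00c4pfel, A\u0308pfel"  # precomposed and decomposed spellings
    for span in find_ignoring_diacritics(sample, "Apfel"):
        print(span, repr(sample[span[0]:span[1]]))
```

Every character of the searched text passes through the decomposition step, and the offsets list grows with the text, which gives a rough feel for why this path costs more on a large document than a diacritic-sensitive search that compares the text as-is.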