128078 – Improve loading of large autocorrect lists

Summary: Improve loading of large autocorrect lists

Status:	RESOLVED WORKSFORME

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	Linguistic
Version: (earliest affected)	6.4.0.0.alpha1+
Hardware:	All All

Importance:	medium enhancement
Assignee:	Not Assigned

URL:
Whiteboard:
Keywords:	perf

Duplicates (1):	133700
Depends on:
Blocks:	AutoCorrect-Complete

Reported:	2019-10-10 14:58 UTC by V Stuart Foote
Modified:	2021-10-17 14:26 UTC
CC List:	11 users

See Also:	109158 140635 133874
Crash report or crash signature:

Description V Stuart Foote 2019-10-10 14:58:20 UTC

continuation of work on bug 109158 (5.3 -> 6.0 autocorrect list loading slowdown)

Comment 1 V Stuart Foote 2019-10-10 15:10:35 UTC

Julien N. posted attachment 154391 [details], a flamegraph of the current situation when loading @tommy27's attachment 134684 [details] -- a large autocorrect replacement table for the it_IT locale.

It needs to be renamed to match testing locale when used for testing--to avoid additional delay of language switching--but otherwise provides a good sample size to benchmark.

IIUC we are now doing a very precise calculation of size (x & y) for each tuple in the correction file. Not incorrect, but still annoying compared with 5.3 builds, where load time was acceptably fast.

So while work on 109158 results in correct handling for the general parsing of lists--something more is needed for the autocorrect/emoji list.

Comment 2 Caolán McNamara 2019-10-22 10:58:20 UTC

The idea of a new bug when an original becomes huge and out of control is to start fresh and clearly state a reproducible problem without need to refer to the original bug. And in this particular case it cannot reference a broken rendering/layout point in time and compare against that as some benchmark moment that the current state is a regression from.

Comment 3 Buovjaga 2019-10-22 13:18:20 UTC

Changing to enhancement, so the nature of this is clear.

Comment 4 V Stuart Foote 2019-10-22 14:18:36 UTC

@Caolán, was that a scolding? ;-)

But doesn't the issue summary say it all? 

We have a situation of latency in opening/reopening the AutoCorrect dialog.

Previously supported usage has allowed a user to open the dialog interactively to assemble 'text strings' and associated shortcuts as a writing aid. If they had "boiler plate" to capture for reuse--throw it into the AutoCorrect dialog. Multiple variations? No problem, throw them in, each with a different shortcut string.

The project even did the same thing--we put the entire :emoji: entry feature and localized it into the AutoCorrect dialog.

So at this point (6.3 & master/6.4) we have users who have assembled abusively large (~200K entry) AutoCorrect replacement lists facing a best-case latency of 15 - 20 seconds in opening the tool, as compared to 5 second openings in the past.

It now opens visually correctly, but even with the recent optimizations Noel did, it is just too slow.

Do we force the user to edit back their correction list, or cap the counts?

If not, can we again accommodate their hefty usage by sacrificing visual precision of the listbox display for performance?

And at what count should the threshold be? To me it seems like anything above 3,000 --so that our deployed correction lists will format visually correctly-- and any count beyond that could be allowed to be "sloppy".

Comment 5 V Stuart Foote 2019-10-22 14:45:20 UTC

Seems a subject for UX review, not the GUI but the user expectation.

Pretty certain we inherited the unbounded correction list from the OOo era, and that our GUI formatting of the listbox has now unintentionally reduced usability of this previously functional "feature".

How the project handles this going forward should be deliberated as a UX issue.

Comment 6 tommy27 2019-10-23 05:24:29 UTC

(In reply to V Stuart Foote from comment #4)
> 
> ....
> 
> So at this point (6.3 & master/6.4) we have users who have assembled
> abusively large (~200K entry) AutoCorrect replacement lists given a best
> case latency of 15 - 20 seconds in opening the tool. As compared to 5 second
> openings in the past.
> 
> ...

think about people with dyslexia or hand movement impairments... they heavily rely on autocorrect.

Comment 7 Heiko Tietze 2019-10-23 07:11:41 UTC

(In reply to V Stuart Foote from comment #5)
> Seems a subject for UX review, not the GUI but the user expectation.

Here is what I wrote for KDE some time ago https://hig.kde.org/components/assistance/progress.html. Also quite good is the MS guideline https://docs.microsoft.com/en-us/windows/win32/uxguide/progress-bars

In a nutshell: >200ms - busy pointer, >1s - progress bar with a proper time estimation (don't proceed to 1%, wait, and jump to 95%) and the possibility to cancel the operation. Operation that takes minutes should ideally run in a separate thread.
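The thresholds above can be sketched as a small lookup. This is only an illustration: the function name `feedback_for` is hypothetical, and the 60-second cutoff for "takes minutes" is an assumption, since the guideline summary gives no exact figure.

```python
def feedback_for(expected_seconds: float) -> str:
    """Map an expected duration to the kind of feedback suggested above.

    The 0.2 s and 1 s cutoffs come from the comment; the 60 s cutoff
    for "minutes" is an assumption made for this sketch.
    """
    if expected_seconds > 60:
        return "progress bar, cancellable, separate thread"
    if expected_seconds > 1.0:
        return "progress bar, cancellable"
    if expected_seconds > 0.2:
        return "busy pointer"
    return "none"


# An 18-second dialog load falls in the progress-bar range.
print(feedback_for(18))
```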

Comment 8 Julien Nabet 2019-10-23 21:08:32 UTC

(In reply to tommy27 from comment #6)
> (In reply to V Stuart Foote from comment #4)
>...
> think about people with dyslexia or hand movement impairments... they
> heavily rely on autocorrect.
What about having a limited size by default and offering the 200k size as an extension, or as a kind of option like accessibility, high contrast theme, etc.?

Comment 9 Julien Nabet 2019-10-23 21:11:35 UTC

(In reply to Heiko Tietze from comment #7)
> (In reply to V Stuart Foote from comment #5)
> ...
> In a nutshell: >200ms - busy pointer, >1s - progress bar with a proper time
> estimation (don't proceed to 1%, wait, and jump to 95%) and the possibility
> to cancel the operation. Operation that takes minutes should ideally run in
> a separate thread.

These numbers depend entirely on the power of your PC, so even if you only take into account PCs not older than 10 years, you can have old CPUs with 4 GB of RAM.
Anyway, autocorrect should perhaps be in a separate thread.
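A minimal sketch of the "separate thread" idea, not LibreOffice's implementation: the tab-separated input format and the names `load_pairs` and `load_in_background` are hypothetical. The point is only that the caller gets a future back immediately and can stay responsive while the list is parsed.

```python
from concurrent.futures import ThreadPoolExecutor


def load_pairs(lines):
    """Parse 'shortcut<TAB>replacement' lines (hypothetical format) into pairs."""
    pairs = []
    for line in lines:
        shortcut, sep, replacement = line.partition("\t")
        if sep:  # skip lines without a tab separator
            pairs.append((shortcut, replacement))
    return pairs


def load_in_background(executor, lines):
    """Start parsing on a worker thread and return a Future right away."""
    return executor.submit(load_pairs, lines)


with ThreadPoolExecutor(max_workers=1) as pool:
    future = load_in_background(pool, ["teh\tthe", "adn\tand", "malformed"])
    # The caller could keep handling UI events here.
    pairs = future.result()  # blocks only when the result is actually needed
```

In CPython this keeps the caller unblocked but does not make CPU-bound parsing itself faster; a worker thread addresses responsiveness rather than total load time.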

Comment 10 tommy27 2020-02-16 13:44:31 UTC

retested and compared between different LibO versions on a different Win10 x64 machine than in my original tests.

loading times are respectively:

LibO 5.3.6 -> 5 seconds

LibO 6.1.6 -> 10 seconds

LibO 6.3.4 -> 20 seconds

LibO 7.0.0.0.alpha0+ -> 20 seconds

this shows that there has been a consistent performance drop loading large autocorrect lists.

whilst 5.3.x was very fast, 6.1.x is twice slower, and 6.3.x 4 times slower...
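The ratios quoted above follow directly from the reported timings. A short check, using only the numbers from this comment and 5.3.6 as the baseline:

```python
# Load times in seconds as reported in this comment.
load_seconds = {
    "5.3.6": 5,
    "6.1.6": 10,
    "6.3.4": 20,
    "7.0.0.0.alpha0+": 20,
}

baseline = load_seconds["5.3.6"]
slowdown = {version: secs / baseline for version, secs in load_seconds.items()}

for version, factor in slowdown.items():
    print(f"LibO {version}: {factor:.1f}x the 5.3.6 load time")
```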

Comment 11 V Stuart Foote 2020-02-16 16:45:30 UTC

(In reply to tommy27 from comment #10)
> ... 
> this shows that there has been a consistent performance drop loading large
> autocorrect lists.
> 
> whilst 5.3.x was very fast, 6.1.x is twice slower, and 6.3.x 4 times
> slower...

And again those are totally irrelevant as benchmarks.

Assuming continued UX agreement that supporting auto-correct tables of 200,000 or more entries is needed (and I think it is) this enhancement would require refactoring of the 'AutoCorrect Options...' dialog for populating the widgets on its 'Replace' tab.

That refactoring could take several tracks.

My preference would be to test for autocorrect list size, and for greater than ~3000 entries use a fixed entry size and just dump the pairs into the widget unmeasured. Give up some visual precision for load speed. Not pretty, just fast.

But possibly provide a different 'mode' for large (> 3000) autocorrect tables, presumably in its own thread.
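The size test described above could look roughly like this. It is a sketch under stated assumptions, not the dialog's actual code: `MEASURE_THRESHOLD`, `FIXED_WIDTH` and `column_width` are hypothetical names, and `len()` on the shortcut string stands in for the real per-entry text measurement.

```python
MEASURE_THRESHOLD = 3000  # entry count from the comment above
FIXED_WIDTH = 20          # hypothetical fixed column width, in characters


def column_width(pairs, threshold=MEASURE_THRESHOLD, fixed_width=FIXED_WIDTH):
    """Return the width to use for the shortcut column.

    Small lists are measured entry by entry; lists above the threshold
    skip measurement and fall back to a fixed width.
    """
    if len(pairs) > threshold:
        return fixed_width  # "sloppy" but fast path
    # len() is a stand-in for real text measurement in this sketch.
    return max((len(shortcut) for shortcut, _ in pairs), default=0)


small = [("teh", "the"), ("adn", "and")]
large = [(f"k{i}", f"v{i}") for i in range(3001)]

print(column_width(small), column_width(large))
```

The trade-off is the one named in the comment: lists at or below the threshold keep exact layout, while larger ones give up visual precision for load speed.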

Comment 12 tommy27 2020-03-05 05:45:35 UTC

relevant or not relevant, I can tell that the situation is better in 6.4.1

now the loading takes 16 seconds on the same computer as in my previous benchmark.

still slower than 5.3.6 (5 secs) and 6.1.6 (10 secs) but faster than 6.3.4 (20 seconds)

probably this has been a positive side effect of some other code tweak.
having said that, I hope that some developer will one day address this annoying performance issue.

Stuart's insights about the code refactoring look interesting.

Comment 13 V Stuart Foote 2020-06-09 15:38:33 UTC

*** Bug 133700 has been marked as a duplicate of this bug. ***

Comment 14 tommy27 2021-02-25 11:26:52 UTC

latest test using LibO 7.0.4.2 under Win10x64 

loading a huge autocorrect replacement table (probably 300K entries) takes 18 seconds

the computer has an Intel Core i5-8400 CPU @ 2.80GHz, 16 GB RAM and an SSD disk, so it's not a bad machine.

basically the performance is unchanged from previous tests using the 6.4.x branch

Comment 15 tommy27 2021-10-17 14:26:17 UTC

retested using LibO 7.1.5.2 

autocorrect now takes just 4 seconds to load a huge autocorrect replacement table.

issue is finally solved

thanks to anyone who fixed it

RESOLVED WORKSFORME