Bug 160218 - perf: GetDefaultScriptType
Summary: perf: GetDefaultScriptType
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
24.2.0.0 beta1+
Hardware: All All
: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: perf
Depends on:
Blocks:
 
Reported: 2024-03-15 12:56 UTC by Michael Meeks
Modified: 2024-03-18 09:37 UTC (History)
5 users (show)

See Also:
Crash report or crash signature:


Attachments
a profile mostly of the end of loading (935.58 KB, image/png)
2024-03-15 12:56 UTC, Michael Meeks
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Michael Meeks 2024-03-15 12:56:55 UTC
Created attachment 193129 [details]
a profile mostly of the end of loading

Loading the giant spreadsheet from here:

https://spreadsheets-are-all-you-need.ai/index.html

1.2Gb small - spends 1/4 of its time in ScGlobal::GetDefaultScriptType() - which (it seems to me) - should not consume a staggering amount of time pointer-chasing in liblangtag data-structures.

Of course - we can see that a big chunk of this part of import time is (I assume pointlessly) re-calculating the stored row-heights from Excel but ...

To be fair - I didn't profile the whole load process, mostly just the end so quite possibly the whole picture is more interesting; but this looked like a low hanging fruit =)

=)
Comment 1 Eike Rathke 2024-03-15 13:58:36 UTC
That doesn't look like pointer-chasing but in MsLangId::getScriptType() the chains of nLang.anyOf(...) and primary(nLang).anyOf(...) (that previously before using o3tl::strong_int()::anyOf() were nested switch/case constructs).

My suggestion:
Take a shortcut to handle a few well-known often occurring Western languages/locales beforehand, like

    if (nLang == LANGUAGE_ENGLISH_US ||
            primary(nLang).anyOf(
                primary(LANGUAGE_ENGLISH_US),
                primary(LANGUAGE_SPANISH_MODERN),
                primary(LANGUAGE_FRENCH),
                primary(LANGUAGE_GERMAN)))
        return css::i18n::ScriptType::LATIN;
Comment 2 Michael Meeks 2024-03-15 14:07:17 UTC
Switch sounds good to me =) Somewhere in the profile I saw a red-black tree in use and I suspect RB trees are a pointer-chasing performance hazard on real modern hardware; but perhaps I mis-read this one.

Would it not be better to cache the answer? have a single entry 'nLang -> ScriptType' static cache in that function ? I assume that 99.99% of the time this is the same language ? =)
Comment 3 Noel Grandin 2024-03-15 17:36:46 UTC
mmeeks, this was defintely a profile of an optimised build right? 

I just find it hard to believe that the compiler did not inline the anyOf method into a comparison chain.

If it is an optimised build, I possibly need to attribute anyOf() with __always_inline__
Comment 4 Noel Grandin 2024-03-16 09:19:44 UTC
Hmmm, my optimised build does not show any of that, possibly master has improved somewhat, and that is a different branch?
Comment 5 Michael Meeks 2024-03-18 09:37:03 UTC
Ah! so - of course, it is quite possible that (as you say) this is not optimized; and so - a total waste of time; seems so - my bad - sorry!

Still - this is a lookup we do repeatedly of one fixed lang against some complex machinery so a one item cache might be useful (?) =) or just have both the nLang and the ScriptType set in the global state when they are mutated on set ? =)