168815 – Support conditional ignoring of some aspects of cell text for autocomplete matching


Status:	NEW

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	Calc
Version: (earliest affected)	Inherited From OOo
Hardware:	All All

Importance:	medium enhancement
Assignee:	Not Assigned

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:	CJK AutoCorrect-Complete RTL

Reported:	2025-10-11 18:58 UTC by Eyal Rozenberg
Modified:	2025-10-23 07:15 UTC
CC List:	2 users

See Also:
Crash report or crash signature:


Description Eyal Rozenberg 2025-10-11 18:58:33 UTC

When we start typing text in a cell, the previous cells in the same column are searched to find the best-matching cell value to autocomplete to. This bug is about the way prefixes are matched against previous cells.

The match is either exact, or it allows for ignoring certain differences, or at least canonicalizing characters before the match. Specifically, typing "man" will match MANTISSA, despite the case difference; however, it would _not_ match "mándarin".
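To make the behavior described above concrete, here is a minimal Python sketch of a prefix match that ignores case and nothing else. It is only an illustration of the observable behavior, not Calc's actual implementation; the function name `prefix_matches` is made up for this example.

```python
def prefix_matches(typed: str, candidate: str) -> bool:
    """Case-insensitive prefix test; accents are NOT ignored.

    Hypothetical helper that mimics the behavior described in the report.
    """
    return candidate.casefold().startswith(typed.casefold())


print(prefix_matches("man", "MANTISSA"))        # True: only the case differs
print(prefix_matches("man", "m\u00e1ndarin"))   # False: "á" is not "a"
```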

I suggest that we allow users to choose which kinds of canonicalization and character removals/ignores are performed during these matches, and which aren't. Specifically, users should be able to enable or disable at least the following categories:

* Latin letter case
* Ligatures like ﬆ (st), ß (ss, the beta-like symbol in German), æ (ae) and so on.
* Decimal digit system - western-Arabic (0123456789), eastern-Arabic (٠١٢٣٤٥٦٧٨٩) or other
* Accented European language characters, e.g. á as opposed to a
* Separate accenting characters
* Punctuation and cantillation marks (e.g. in Arabic, Farsi, Hebrew)
* Non-printing characters (like Zero-Width Joiner, Left-to-Right Mark etc.)

... and I'm sure CJKV people can make some more suggestions regarding those scripts.

Naturally, this is not just about the UI exposed to the user, there is probably some backend coding work to support this configurable canonicalization/character removal.
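As a rough illustration of what such configurable canonicalization could look like in the backend, here is a Python sketch built on the standard `unicodedata` module. All names (`MatchOptions`, `canonicalize`, `autocomplete`) are hypothetical, and the mapping from the categories above to Unicode properties is a simplifying assumption: accents, separate accenting characters and cantillation marks are all treated as nonspacing marks (category Mn), and ligature folding only covers compatibility ligatures such as ﬆ, not æ.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchOptions:
    """Hypothetical per-category switches for autocomplete matching."""
    ignore_case: bool = True
    fold_ligatures: bool = False       # compatibility ligatures only (NFKD)
    unify_digits: bool = False         # map any decimal digit to 0-9
    ignore_accents: bool = False       # drop nonspacing marks (category Mn)
    ignore_punctuation: bool = False   # drop categories P*
    ignore_format_chars: bool = False  # drop ZWJ, LRM, ... (category Cf)


def canonicalize(text: str, opts: MatchOptions) -> str:
    # Decompose first so accents become separate combining marks.
    form = "NFKD" if opts.fold_ligatures else "NFD"
    out = []
    for ch in unicodedata.normalize(form, text):
        cat = unicodedata.category(ch)
        if opts.ignore_accents and cat == "Mn":
            continue
        if opts.ignore_punctuation and cat.startswith("P"):
            continue
        if opts.ignore_format_chars and cat == "Cf":
            continue
        if opts.unify_digits and cat == "Nd":
            ch = str(unicodedata.decimal(ch))
        out.append(ch)
    # Recompose whatever marks were kept.
    result = unicodedata.normalize("NFC", "".join(out))
    return result.casefold() if opts.ignore_case else result


def autocomplete(typed: str, column_values, opts: MatchOptions):
    """Return the first earlier value whose canonical form starts with the
    canonical form of what was typed, or None if there is no match."""
    key = canonicalize(typed, opts)
    for value in column_values:
        if canonicalize(value, opts).startswith(key):
            return value
    return None


column = ["MANTISSA", "m\u00e1ndarin"]
print(autocomplete("mand", column, MatchOptions()))                     # None
print(autocomplete("mand", column, MatchOptions(ignore_accents=True)))  # mándarin
```

The categories are not fully independent in this sketch: Python's `str.casefold()` already maps ß to ss, so a real design would have to decide which switch owns such cases.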

Comment 1 Heiko Tietze 2025-10-13 07:35:13 UTC

(In reply to Eyal Rozenberg from comment #0)
>... enable or disable at least the following categories:
Way too much fine-tuning for my taste. If at all I could imagine some Levenshtein algorithm as known from F&R: Similarity Search.

Comment 2 Eyal Rozenberg 2025-10-14 08:07:35 UTC

(In reply to Heiko Tietze from comment #1)
> If at all I could imagine some
> Levenshtein algorithm as known from F&R: Similarity Search.

That would not help, because you still need to decide what counts as similar. Is 'a' similar to 'á'? Is a punctuation mark similar to nothing? A similarity search is a layer above these questions.
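The layering argument can be sketched in Python: a plain Levenshtein distance gives different answers depending on which canonicalization is applied before the comparison. The `canon` parameter and the `strip_accents` helper are assumptions made for this example and are not taken from the Find & Replace similarity search code.

```python
import unicodedata


def strip_accents(s: str) -> str:
    """Hypothetical canonicalizer: drop nonspacing combining marks."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")


def levenshtein(a: str, b: str, canon=lambda s: s) -> int:
    """Classic edit distance, computed after applying `canon` to both inputs."""
    a, b = canon(a), canon(b)
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


print(levenshtein("man", "m\u00e1n"))                 # 1: "a" vs "á" is an edit
print(levenshtein("man", "m\u00e1n", strip_accents))  # 0: treated as equal
```

The distance function itself never decides whether 'a' and 'á' are the same; that decision sits in whatever canonicalization runs before it.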

> (In reply to Eyal Rozenberg from comment #0)
> >... enable or disable at least the following categories:
> Way too much fine-tuning for my taste. 

So, suppose we had just one combination of these choices; which one would it be? The current one is clearly not working out for many/most people. The problem is that different people want different combinations, and without the fine-tuning there is no way to cater to each person's preference.

Comment 3 jan d 2025-10-15 16:16:23 UTC

In short: 
"Way too much fine-tuning for my taste"
I agree with Heiko.

Slightly longer: 
I personally have not run into that problem yet, despite the öäüß in the German I write (I have a hard time thinking of situations where I want an ö to be an o).

But assuming the problem is common and severe enough to make a lot of people think that they want to change the autocomplete and given we would find a way to provide a somehow usable way to configure it, a general configuration might not be helpful, since what is and is not a good suggestion does not depend on personal preferences as much as on input data and its meaning. Which means that it might vary between documents and columns.

Comment 4 Eyal Rozenberg 2025-10-15 19:31:30 UTC

(In reply to jan d from comment #3)
> But assuming the problem is common and severe enough to make a lot of people
> think that they want to change the autocomplete

Not "think they want" but rather "have to, in order for it to be useful". Definitely when it comes to punctuation marks with RTL languages, for example.

> and given we would find a
> way to provide a somehow usable way to configure it, a general configuration
> might not be helpful, since what is and is not a good suggestion does not
> depend on personal preferences as much as on input data and its meaning.
> Which means that it might vary between documents and columns.

That... is an interesting point; it's true that it could also depend on the data. But how exactly it depends on the data will usually be a matter of the person's judgement.

Comment 5 Heiko Tietze 2025-10-23 07:15:00 UTC

We discussed the topic in the design meeting. While Latin-based languages rarely need fuzzy matching, it might be different for others. Punctuation matters in Hebrew, for example: אֱיָל vs. אי.

Ideally we solve it with a simple algorithm like Levenshtein distance, to spare users a complex configuration.