Bug 154366 - Fallback mechanism for hyphenation from specific to general/default language locale
Summary: Fallback mechanism for hyphenation from specific to general/default language ...
Status: UNCONFIRMED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Linguistic
Version (earliest affected): 7.6.0.0 alpha0+
Hardware: All All
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Hyphenation
Reported: 2023-03-24 15:01 UTC by Eyal Rozenberg
Modified: 2023-07-09 17:11 UTC
CC List: 3 users

See Also:
Crash report or crash signature:


Description Eyal Rozenberg 2023-03-24 15:01:23 UTC
Some language locales may lack hyphenation data, but users still want hyphenation, even for obscure locales. It makes sense to allow falling back on a more general locale, or on a popular specific locale that does have hyphenation data available.

Example:

en_IL          - no hyphenation available
en             - would work (if this is defined at all)
en_UK or en_US - good enough fallback options for en_IL

A mechanism for enabling such fallbacks should be implemented and exposed to users.
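A minimal sketch of such a fallback chain, assuming locales are plain `language_REGION` strings, the set of installed hyphenation locales is known, and the user can configure preferred variants per language (the "exposed to users" part). The names `fallback_candidates`, `resolve_hyphenation_locale` and `preferred` are hypothetical and not LibreOffice API.

```python
from typing import AbstractSet, Mapping, Optional, Sequence


def fallback_candidates(locale: str,
                        preferred: Mapping[str, Sequence[str]]) -> list[str]:
    """Return the locales to try, most specific first (hypothetical helper)."""
    language = locale.split("_", 1)[0]
    candidates = [locale]
    # 1. The bare language, e.g. "en" for "en_IL".
    if language != locale:
        candidates.append(language)
    # 2. User-configured variants of that language, e.g. en -> en_GB, en_US.
    for variant in preferred.get(language, ()):
        if variant not in candidates:
            candidates.append(variant)
    return candidates


def resolve_hyphenation_locale(locale: str,
                               available: AbstractSet[str],
                               preferred: Mapping[str, Sequence[str]]) -> Optional[str]:
    """Pick the first candidate that has hyphenation data installed."""
    for candidate in fallback_candidates(locale, preferred):
        if candidate in available:
            return candidate
    return None  # no usable hyphenation data for this language


if __name__ == "__main__":
    installed = {"en_US", "en_GB", "de_DE"}
    prefs = {"en": ["en_GB", "en_US"]}
    print(resolve_hyphenation_locale("en_IL", installed, prefs))  # en_GB
```

Trying the bare language before the user's preferred variants follows the order of the example above; whether a generic `en` pattern set exists at all is, as noted there, uncertain.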
Comment 1 Stéphane Guillou (stragu) 2023-04-26 23:19:00 UTC
This is similar to bug 83561, which would implement the fallback mechanism needed for language variants.

I am unsure how hyphenation rules work, and whether it is possible to separate them from the rest of the dictionary (i.e. use the dictionary for spellchecking but a fallback for hyphenation if it is missing). Marco and Sophie, any opinion on this?
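On splitting hyphenation from the rest of the dictionary: dictionary extensions typically declare spelling and hyphenation data as separate entries (the `DICT_SPELL` and `DICT_HYPH` formats in `dictionaries.xcu`), so in principle each service could resolve its locale independently. A rough sketch, assuming a hypothetical per-service registry (`resolve_for_service` and the registry layout are illustrative only): spellchecking keeps the exact locale, while only hyphenation may fall back to the bare language.

```python
from typing import Mapping, Optional, Set

# Hypothetical registry: service name -> locales that have data for that service.
Registry = Mapping[str, Set[str]]


def resolve_for_service(service: str, locale: str, registry: Registry) -> Optional[str]:
    installed = registry.get(service, set())
    if locale in installed:
        return locale
    if service != "hyphenation":
        # In this sketch spellchecking stays strict: no fallback to another locale.
        return None
    # Hyphenation alone may fall back to the bare language, e.g. "en" for "en_IL".
    language = locale.split("_", 1)[0]
    return language if language in installed else None


if __name__ == "__main__":
    registry = {"spelling": {"en_IL"}, "hyphenation": {"en"}}
    print(resolve_for_service("spelling", "en_IL", registry))     # en_IL
    print(resolve_for_service("hyphenation", "en_IL", registry))  # en
```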
Comment 2 Marco A.G.Pinto 2023-04-26 23:36:30 UTC
(In reply to Stéphane Guillou (stragu) from comment #1)
> In a way similar to bug 83561, which would implement the fallback mechanism
> needed for language variants.
> 
> I am unsure how hyphenation rules work, and how possible it is to split that
> from the rest of the dictionary (i.e. using the dictionary for spellcheck
> but a fallback for hyphenation if it is missing). Marco and Sophie, any
> opinion on it?


Heya, hyphenation is very hard.

I have coded it into Proofing Tool GUI:
https://proofingtoolgui.org

See the subchapter in the user guide that explains how to use it:
https://proofingtoolgui.org/proofingtoolgui_files/ProofingToolGUI_manual_V30.html#5.4.hyphenation

It is very hard.

Anyway, my two cents: for English variants, you can probably use the English hyphenation data already in Gerrit as a fallback.

The English dictionaries that ship with my .oxt commit are updated either by me or by Kevin Atkinson, and neither of us changes the hyphenation.

Kevin simply converts word lists into .dic files automatically, and I don't touch such complex things. So until someone with deep knowledge decides to improve the hyphenator, it will never change (it will always stay as it is).

For other languages, I am not sure.

:-)
Comment 3 Eyal Rozenberg 2023-04-27 18:06:52 UTC
(In reply to Marco A.G.Pinto from comment #2)
> Heya, hyphenation is very hard.

Perhaps, but this bug is not about hyphenation itself, just about choosing which already-implemented hyphenation scheme to use.
Comment 4 Marco A.G.Pinto 2023-04-27 18:46:00 UTC
(In reply to Eyal Rozenberg from comment #3)
> (In reply to Marco A.G.Pinto from comment #2)
> > Heya, hyphenation is very hard.
> 
> Perhaps, but this bug is not about hyphenation. Just about choosing which
> already-implemented hyphenation scheme to use.

scheme?

Use the English hyphenator already there for English variants.

Is this it?
Comment 5 Marco A.G.Pinto 2023-04-27 18:46:46 UTC
I don't understand whether you are asking for something else.

:-)
Comment 6 Eyal Rozenberg 2023-04-27 19:18:29 UTC
(In reply to Marco A.G.Pinto from comment #4)
> Use the English hyphenator already there for English variants.
> 
> Is this it?

My example was for English, but suppose I want to hyphenate text in locale he_PL. That is probably not available, but we are more likely to have hyphenation data for he, or for he_IL. I suggest using one of those as a fallback, and the same for English or any other language.
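For the he_PL case, where the user has not picked a variant, one possible automatic rule is to take any installed locale of the same language. A minimal sketch, assuming installed hyphenation locales are known as `language_REGION` strings; `pick_same_language` and the optional `main_region` table (e.g. he -> IL) are hypothetical, not existing LibreOffice settings.

```python
from typing import Collection, Mapping, Optional


def pick_same_language(locale: str,
                       installed: Collection[str],
                       main_region: Mapping[str, str]) -> Optional[str]:
    """Return the exact locale if installed, else some installed locale of the same language."""
    if locale in installed:
        return locale
    language = locale.split("_", 1)[0]
    same = [loc for loc in installed if loc.split("_", 1)[0] == language]
    if not same:
        return None
    main = f"{language}_{main_region[language]}" if language in main_region else None

    def rank(candidate: str) -> tuple[int, str]:
        # Bare language first, then the language's main region, then alphabetical
        # order, so the result does not depend on installation order.
        if candidate == language:
            return (0, candidate)
        if candidate == main:
            return (1, candidate)
        return (2, candidate)

    return min(same, key=rank)


if __name__ == "__main__":
    print(pick_same_language("he_PL", {"he_IL", "en_US"}, {"he": "IL"}))  # he_IL
```

The deterministic tie-break matters when a language has several regional pattern sets installed (say pt_BR and pt_PT for pt_AO): without a configured main region, the choice would otherwise be arbitrary.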