Bug 148189 - Autocorrect hyphen/minus sign into Hebrew Maqaf when appropriate
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): Inherited From OOo
Hardware: All All
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: AutoCorrect-Complete 114637
Reported: 2022-03-25 19:59 UTC by Eyal Rozenberg
Modified: 2023-09-18 18:14 UTC
CC List: 2 users

See Also:
Crash report or crash signature:


Description Eyal Rozenberg 2022-03-25 19:59:44 UTC
The Hebrew language uses a character named Maqaf - essentially a top hyphen - instead of the inter-word hyphen used in Latin-script languages. It is not a dash (neither en nor em), and is not used as a minus sign. More about this character:

https://codepoints.net/U+05BE

and here's an example:

אבי־סער

the Maqaf separates the two words.

The character does not have its own key on a Hebrew-layout keyboard, and using a hyphen in its stead is quite common by now, so, unfortunately, it is no longer in wide use among people typing text. It can be inserted using Alt+underscore on a Hebrew-layout keyboard on Linux, and it is rendered properly, but that is still not good enough.

The AutoCorrect mechanism, when enabled, should replace a hyphen/minus character with a Maqaf when one types a sequence of at least two Hebrew characters, then a hyphen, then another sequence of at least two Hebrew characters. Typing just one character, mixing in non-Hebrew characters, or using spaces should not result in the replacement, to keep things on the conservative side.
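
To make the proposed rule concrete, here is a minimal sketch in Python. It is only an illustration of the heuristic described above, not LibreOffice code; the names `autocorrect_maqaf`, `HEBREW_LETTER` and `PATTERN` are made up for this example, and treating "Hebrew characters" as the letters U+05D0 to U+05EA (no special handling of vowel points) is an assumption.

```python
import re

MAQAF = "\u05BE"  # HEBREW PUNCTUATION MAQAF

# Assumption: "Hebrew characters" means the letters alef..tav (U+05D0-U+05EA).
HEBREW_LETTER = "[\u05D0-\u05EA]"

# A hyphen-minus preceded by a run of at least two Hebrew letters and followed
# by another such run. The runs must not touch other word characters (Latin
# letters, digits, underscore), and no spaces are allowed around the hyphen.
PATTERN = re.compile(
    rf"(?<!\w)({HEBREW_LETTER}{{2,}})-(?={HEBREW_LETTER}{{2,}}(?!\w))"
)


def autocorrect_maqaf(text: str) -> str:
    """Replace qualifying hyphen-minus characters with a Maqaf (hypothetical helper)."""
    return PATTERN.sub(lambda m: m.group(1) + MAQAF, text)


# Usage, with the example words from this report written as escapes
# to avoid bidi confusion in the source.
ABI = "\u05D0\u05D1\u05D9"   # first word of the example above
SAAR = "\u05E1\u05E2\u05E8"  # second word of the example above
print(autocorrect_maqaf(ABI + "-" + SAAR) == ABI + MAQAF + SAAR)
```

A single letter before the hyphen, digits, Latin letters adjacent to either run, or spaces around the hyphen all leave the text untouched in this sketch, which matches the conservative intent.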


Note: I've split this off from bug 114637, which now becomes a meta-bug, because this capability is self-contained enough and important enough to merit its own bug.
Comment 1 Dieter 2022-04-18 03:32:57 UTC
Not sure if this request is in the scope of AutoCorrect. Should it be part of "Localized Options"?
Comment 2 Eyal Rozenberg 2022-04-18 06:39:56 UTC
(In reply to Dieter from comment #1)
> Not sure, if this request is in the scope of AutoCorrect. Should it be part
> of "Localized Options"?

This is exactly at the heart of the scope of AutoCorrect. How is this different from replacing straight quotes with smart quotes?
Comment 3 Dieter 2023-09-17 12:05:52 UTC
(In reply to Eyal Rozenberg from comment #0)
> The Auto-correct mechanism, when enabled, should replace a hyphen/minus
> character with a Maqaf when one types in a sequence of at least two Hebrew
> characters, then a hyphen, then another sequence of at least 2 Hebrew
> characters. Typing just one character, or mixing in non-Hebrew characters,
> or using spaces, should not result in the replacement, to keep things on the
> conservative side.
As far as I can see, it is not a problem to add a replace option:
1. Copy Maqaf from special characters dialog
2. Tools -> AutoCorrect -> AutoCorrect Options -> Replace
3. Add a unique key combination and insert Maqaf in "With" field
4. Activate AutoCorrect while typing option

Result: Works as expected

Eyal, does this solve your problem?
=> NEEDINFO


I know this is not 100% your idea, but more generally your request is "Make it possible for a defined replacement to happen only in a specific context". If this is possible, it would need a lot of effort, and why should we have such a feature only for Maqaf and not for other characters as well?
Comment 4 Eyal Rozenberg 2023-09-17 12:27:06 UTC
(In reply to Dieter from comment #3)
> Eyal, does this solve your problem?

No, for two reasons:

1. You don't want to always replace a minus sign (-) with a Maqaf (־). It's context-dependent. There are no Maqafs after or before spaces (AFAIK), for example. You also want to make sure it's not a hyphen between numbers, etc.

2. The replacement is not defined by default. I'm asking for something that will be defined and enabled once AutoCorrect is enabled and the user is writing Hebrew text.

> I know it is not your idea for 100%, but more general your option is "Make it
> possible, that defined replacement only happens in a specific context". If
> this is possible it would need a lot of effort

This effort has already been expended for replacement of minus signs by en-dashes, which also happens in specific contexts. 

> and why should we only have
> such a feature for Maquaf and not for other characters also?

I'm not sure what you mean by other characters, but: We have AutoCorrect for double quotes, single quotes, and dashes. Maqaf is a kind of dash, so it is, in principle, covered by the current scope of AutoCorrect.
Comment 5 Dieter 2023-09-17 14:42:31 UTC
(In reply to Eyal Rozenberg from comment #4)
> 1. You don't want to always replace minus sign (-) with Maqaf (־). It's
> context-dependent. There are no Maqaf's after or before spaces (AFAIK), for
> example. You also want to make sure it's not a hyphen between numbers. etc.

AFAIK the idea of context-sensitive AutoCorrect is that a unique key combination (like :-+:) is replaced by Maqaf.
> 
> 2. The replacement is not defined by default. I'm asking for something that
> will be defined and enabled once AutoCorrect is enabled and the user is
> writing Hebrew text.

Let's see if the design team has a solution.

cc: Design-Team
Comment 6 Eyal Rozenberg 2023-09-17 14:59:06 UTC
(In reply to Dieter from comment #5)
> AFAIK the idea of context sensible AutoCorrect is, that a unique key
> combination (like :-+:) is replaced by Maqaf

While that could be defined, it would be ineffective, as few people would use it, and one would always have to remember to type that combination in. Plus, it can't be applied to existing text.

No, I specifically asked for heuristic replacement of minus characters between words with Maqaf, with no special key combination.

> Let's see, if design-team has a solution
> 
> cc: Design-Team

I don't mind the CC, but there's not much of a design issue here IMHO. The design part is all there - same as for minus-to-en-dash, or replacement of << >> with guillemets. Just needs a checkbox in the AutoCorrect options dialog and implementation of the heuristic.
Comment 7 Shai Berger 2023-09-17 16:58:34 UTC
Hi, 

I want to support Eyal's suggestion here. Some background: I was involved with the Hebrew keyboard layout standardization effort which led to the inclusion of Maqaf in the layout; in fact, I authored that part of the "il" XKB symbols file.  AFAIK, the alt+- key combination produces a Maqaf on Windows too, since Windows 10.

I think the suggestion to use an emoji-like construct (":+-:") is missing the point: because this character was unavailable on keyboards for decades, and because it is a 3rd-level key, meaning it is still not engraved on most physical keyboards, people have been taught not to look out for it. However, since it is still used in newspapers and books, its use will make texts look better, and people will be happy to see it offered to them.

I think it is comparable, culturally and technically, to the fixing of ordinal numbers (where 4th is changed to 4^th):
- It is something which makes text look much nicer; most users would just not bother to find the option to do it themselves, but would be happy to see the computer do it for them;
- There are contextual constraints on the transformations -- th is transformed in "4th" and "1234th", but not in "a4th" (not pure number before) nor "4thd" (not end-of-word after). 
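
The contextual constraint on ordinals can be sketched the same way. This is a hedged illustration in Python of the four examples just given, not the actual AutoCorrect implementation; `ORDINAL` and `mark_ordinals` are hypothetical names, the suffix list is limited to English, and the sketch does not check that the suffix fits the number.

```python
import re

# Hypothetical pattern: a pure number followed by an English ordinal suffix,
# with no word character directly before the digits or after the suffix.
ORDINAL = re.compile(r"(?<!\w)(\d+)(st|nd|rd|th)(?!\w)")


def mark_ordinals(text: str) -> str:
    """Insert '^' before each suffix that would be superscripted."""
    return ORDINAL.sub(r"\1^\2", text)


for sample in ("4th", "1234th", "a4th", "4thd"):
    print(sample, "->", mark_ordinals(sample))
```

The Maqaf heuristic would need the same kind of "what comes before, what comes after" check, just with Hebrew letters instead of digits and suffixes.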

Regarding your question of whether this belongs in localized options -- I'm not sure. The number suffixes are localized, as they change from language to language. "Capitalize first letter of every sentence" isn't, as it is apparently relevant to all users of Latin, Greek and Cyrillic scripts. I think this feature is closer to the latter -- relevant to all users of the Hebrew script (the two most significant languages are Hebrew and Yiddish; you can see some uses in Yiddish at https://yi.wikipedia.org/), but I'll accept the argument that the whole script is fringe enough to put the option in "localized".
Comment 8 Heiko Tietze 2023-09-18 10:56:45 UTC
We have three or four options: a) AutoCorrect > Replacement (the emoji-style thing), b) AutoCorrect > Localized Options, c) spell-checker/hyphenation doing the job for a fixed set of terms, d) heuristics. In general, I doubt that heuristics can cover all cases and be flexible enough to allow either the "superscript" or the normal hyphen, depending on what users want.

(In reply to Eyal Rozenberg from comment #6)
> > Let's see, if design-team has a solution
> I don't mind the CC, but there's not much of a design issue here IMHO.

True, local speakers are the experts. And we have at least two comments requesting the enhancement. => NEW
Comment 9 Amir E. Aharoni 2023-09-18 18:14:29 UTC
A few more details.

Actually, it is possible to type the maqaf on Windows 8 and up, just like on desktop Linux systems, using Right Alt, a.k.a. AltGr and the minus key. It has also been possible to type it on Mac systems for a long time, using Option-\ or Shift-\.

But more importantly, I do agree with Eyal that it can be autocorrected. The correction rules must be carefully defined, of course. I can think of the following cases where it can be safely autocorrected:
* Between certain Hebrew letters (משהוכלב) and numbers, e.g. ו-4 to ו־4.
* Between certain Hebrew letters (משהוכלב) and non-Hebrew letters, e.g. ו-LibreOffice to ו־LibreOffice. (Sorry, it may look weird here because I don't have proper bidi markup, but the logical order of the letters is correct.)
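
The two "safe" cases above could be sketched roughly as follows. This is a Python illustration under stated assumptions, not a proposal for the actual implementation: the names `PREFIX_LETTERS`, `PREFIX_HYPHEN` and `autocorrect_prefix_maqaf` are made up, "numbers" is taken to mean ASCII digits, "non-Hebrew letters" is taken to mean any letter outside the Hebrew block U+0590 to U+05FF, and the sketch only looks at the single letter before the hyphen without checking that it really is a prefix.

```python
import re

MAQAF = "\u05BE"  # HEBREW PUNCTUATION MAQAF

# The seven letters listed above: mem, shin, he, vav, kaf, lamed, bet.
PREFIX_LETTERS = "\u05DE\u05E9\u05D4\u05D5\u05DB\u05DC\u05D1"

PREFIX_HYPHEN = re.compile(
    "(?<=[" + PREFIX_LETTERS + "])"       # one of the listed letters just before
    "-"                                    # the typed hyphen-minus
    "(?=[0-9]|[^\\W\\d_\u0590-\u05FF])"    # then a digit or a non-Hebrew letter
)


def autocorrect_prefix_maqaf(text: str) -> str:
    """Replace the hyphen after a listed letter when a digit or foreign letter follows."""
    return PREFIX_HYPHEN.sub(MAQAF, text)


VAV = "\u05D5"
print(autocorrect_prefix_maqaf(VAV + "-4") == VAV + MAQAF + "4")
print(autocorrect_prefix_maqaf(VAV + "-LibreOffice") == VAV + MAQAF + "LibreOffice")
```

A hyphen followed by a Hebrew letter is deliberately left alone here, since that is the uncertain case.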

Between two Hebrew letters, it's less certain. It should be autocorrected to something, either maqaf or en-dash, but it's difficult to say what exactly. For example, in כביש ירושלים-תל-אביב, the first one should be en-dash, and the second should be maqaf (according to the punctuation rules of the Academy of the Hebrew Language). I can't think of a smart automatic way to guess it, unfortunately. Perhaps it's possible to give LibreOffice lists of words where it's likelier to use one or the other, but it still won't be perfect.

If it's possible to offer the user both maqaf and en-dash and let the user choose, that would be nice, but I'm not sure that LibreOffice has such a feature.