Bug 156046 - Autocorrect replaces " - " with " – " instead of " — " (en-dash instead of em-dash)
Status: UNCONFIRMED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): unspecified
Hardware: All All
Importance: medium minor
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: AutoCorrect-Complete
Reported: 2023-06-24 22:15 UTC by Eyal Rozenberg
Modified: 2023-07-14 08:17 UTC

See Also:
Crash report or crash signature:


Attachments

Description Eyal Rozenberg 2023-06-24 22:15:40 UTC
When using a dash to separate a clause of text in a sentence, the appropriate dash is an Em-dash: — , either with or without spaces. But what LO's autocorrect gives us, when we type " - " (space, minus, space), is an En-dash in place of the minus/hyphen: " – ". En-dashes are appropriate mostly for ranges - and those don't have spaces. See here [1] or here [2] for a discussion of this.

Anyway, the default should change to a replacement with an Em-dash. AFAICT, this is not language-dependent, but of course I may be wrong. If I am wrong then we need this to be appropriately localized.



 [1]: https://www.scribbr.com/language-rules/dashes/
 [2]: https://www.merriam-webster.com/words-at-play/em-dash-en-dash-how-to-use
Comment 1 V Stuart Foote 2023-06-25 03:17:14 UTC
Not clear we are doing it incorrectly. Key *is* inclusion of spaces.

In Writer:

Two keyboard hyphen dashes (U+002d) without spaces are converted to a single U+2014 (EM DASH).

A single keyboard hyphen dash (U+002d) with spaces is converted to a single U+2013 (EN DASH).

Two keyboard hyphen dashes (U+002d) with spaces are converted to a single U+2013 (EN DASH).
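
The three rules above can be sketched as a small Python function. This is only an illustration of the behavior described in this comment, not LibreOffice's actual autocorrect code: the name `autocorrect_dashes` and the regex approach are assumptions, and the sketch works on a finished string rather than as-you-type.

```python
import re

EN_DASH = "\u2013"
EM_DASH = "\u2014"


def autocorrect_dashes(text: str) -> str:
    """Apply the three dash rules described above (hypothetical helper)."""
    # One or two hyphens with a space on each side -> EN DASH, spaces kept.
    text = re.sub(r"(?<=\S) -{1,2} (?=\S)", f" {EN_DASH} ", text)
    # Exactly two hyphens directly between word characters -> EM DASH.
    text = re.sub(r"(?<=\w)--(?=\w)", EM_DASH, text)
    return text
```

A plain hyphen inside a word, as in "well-known", is left alone by both patterns.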

And if you require a specific dash, just enter its Unicode and <Alt>+X convert:
U+002d - HYPHEN-MINUS (on keyboard)
U+2010 - HYPHEN
U+2011 - NON-BREAKING HYPHEN
U+2012 - FIGURE DASH
U+2013 - EN DASH
U+2014 - EM DASH
U+2015 - HORIZONTAL BAR

Also, there are autocorrect emoji-style entries: :---: will convert to U+2014 EM DASH, and :-: will convert to U+2212 MINUS SIGN.
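
For anyone who wants to check which character ended up in a document, the code points and emoji-style entries mentioned above can be looked up with Python's standard `unicodedata` module. The `describe` helper and the `SHORTCODES` table are illustrative names only; the table just restates the two entries from this comment and is not read from LibreOffice's autocorrect data.

```python
import unicodedata

# Code points listed in this comment, plus U+2212 from the :-: entry.
CODE_POINTS = [0x002D, 0x2010, 0x2011, 0x2012, 0x2013, 0x2014, 0x2015, 0x2212]

# The two emoji-style autocorrect entries mentioned above (restated by hand).
SHORTCODES = {":---:": "\u2014", ":-:": "\u2212"}


def describe(cp: int) -> str:
    """Return 'U+XXXX <char> <official Unicode name>' for a code point."""
    return f"U+{cp:04X} {chr(cp)} {unicodedata.name(chr(cp))}"


for cp in CODE_POINTS:
    print(describe(cp))

for code, char in SHORTCODES.items():
    print(code, "->", describe(ord(char)))
```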

But, I guess we could tweak the edit engine(s) to replace the two hyphen dashes with a single U+2014 EM DASH.

Only thing I'd ask is that it be consistent for all the modules.
Comment 2 Eyal Rozenberg 2023-06-25 16:11:54 UTC
(In reply to V Stuart Foote from comment #1)
> Not clear we are doing it incorrectly.

...
> A single keyboard hyphen dash (U+002d) with spaces is converted to a single
> U+2013 (EN DASH).

This is the bug. A user that's not well aware of dashes and their importance is likely to use a space hyphen space sequence, and we should interpret that as "I don't know what kind of dash I want", and enter an Em dash.
Comment 3 V Stuart Foote 2023-06-25 17:24:39 UTC
Hmm, but when is it a minus vs. a dash? Folks writing text runs as in-line formulas won't appreciate a U+2014 EM DASH when they really need a U+2212 MINUS, so not converting the <space>-<space> is probably more helpful. No unexpected conversion, and they have to adjust the U+002d.

IMHO the <space>--<space> conversion to U+2013 EN DASH while keeping the spaces is probably incorrect. Suggest we should instead be dropping the spaces and converting the run to a single EM DASH? Closing the whitespace gap (catch a leading but no closing, or closing but no leading, along with the bracketed spaces).

Folks needing an EM DASH with surrounding white space can use the :---: emoji syntax?

Otherwise our "no spaces" conversion of two keyboard U+002d HYPHEN-MINUS to EM DASH is appropriate.
Comment 4 Eyal Rozenberg 2023-06-25 18:31:09 UTC
(In reply to V Stuart Foote from comment #3)
> Hmm, but when is it a minus vs. a dash?

That is a bit of a red herring. We are already replacing it with a dash; this bug is about the question of _which_ dash... 

> Folks writing text runs as in-line formulas won't appreciate a U+2014 EM DASH when they really need a U+2212 MINUS

They don't need that, because:

1. We are talking about runs of text. When we are in something which is in some way designated as math formula, then we can talk. If you suggest we use some heuristic to detect that - please open a bug and we'll discuss it :-)

2. There are no spaces before and after minuses in a proper formula.


> so not converting the <space>-<space> is probably more helpful. No
> unexpected conversion, and they have to adjust the U+002d.

Absolutely disagree. That is a niche use case, and even for that niche, using a formula editor (or a TeX add-on etc.) is the more appropriate thing to do.


> IMHO the <space>--<space> conversion to U+2013 EN DASH while keeping the
> spaces is probably incorrect. Suggest we should instead be dropping the
> spaces and converting the run to a single EM DASH?

No. Why? Because both the with-spaces and without-spaces styles are common. See this question on Writing.SX:

https://writing.stackexchange.com/q/8555

> ... Chicago Style dictates that you must not have spaces before and 
> after the em dash, while AP Style dictates that you should have 
> spaces before and after...

> Folks needing an EM DASH with surrounding white space can use the :---:
> emoji syntax?

We should be thinking not about people who type special syntax aiming for a replacement, but about the typical author, who just writes a single dash/hyphen and is not thinking about it. That is the typical use case we need to cater to. Nobody will know about this syntax or use it.
Comment 5 Heiko Tietze 2023-07-01 10:51:22 UTC
En-dash can be expressed by two hyphens, EM by three. I think we do a good job with converting it. And I wonder if other languages than English have different punctuation guides. My take: NAB.
Comment 6 Eyal Rozenberg 2023-07-01 11:41:32 UTC
(In reply to Heiko Tietze from comment #5)
> En-dash can be expressed by two hyphens, EM by three. I think we do a good
> job with converting it.

This bug is not about that. Please re-read the title and comment #2. Remember, we are talking about the case of a user who is not indicating "explicitly" what they want using multiple-dashes or other sequences of characters intended for replacement.



> And wonder if other languages than English have
> different punctuation guides. 

That's a completely different issue; we currently have a uniform-across-languages behavior and nobody has challenged it, so this objection sounds TBH like a deflection.

The question is very simple: What's the better choice for replacing the hyphen in a " - " sequence: Should it be an En-dash or an Em-dash. Since there is no bug suggesting that we should redo this whole logic, this question stands. So, unless we have a good reason to prefer an En-dash - which nobody has presented AFAICT - we should choose an Em-dash.
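
To make the two options concrete, here is a minimal sketch in which the replacement dash is a parameter. It assumes a regex-based stand-in for the autocorrect rule; `replace_spaced_hyphen` is a hypothetical name and this is not how LibreOffice implements it.

```python
import re

EN_DASH = "\u2013"
EM_DASH = "\u2014"


def replace_spaced_hyphen(text: str, dash: str = EN_DASH) -> str:
    """Replace the hyphen in a ' - ' sequence with `dash`, keeping the spaces."""
    return re.sub(r"(?<=\S) - (?=\S)", f" {dash} ", text)


sentence = "The default - as reported here - is an en dash."
current = replace_spaced_hyphen(sentence)            # behavior described in this report
proposed = replace_spaced_hyphen(sentence, EM_DASH)  # behavior requested in this report
```

Under this sketch the change under discussion amounts to a different default for `dash`; unspaced ranges such as "1-5" are untouched either way.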
Comment 7 Eyal Rozenberg 2023-07-01 18:24:55 UTC
(In reply to Heiko Tietze from comment #5)
also...

> EM by three. I think we do a good job with converting it.

Actually, we don't quite, please have a look at bug 147681. But again, unrelated to what happens to text with a single hyphen where the user did not explicitly indicate what dashes they want.
Comment 8 Heiko Tietze 2023-07-14 07:41:12 UTC
We discussed the proposal in the design meeting.

According to the help [1] we change "A, space, minus, space, B" as well as "A, space, minus, minus, space, B" to EN-dash. This autocorrection aligns with MSO and the demands from English grammar. EN dash is used for ranges, EM dash to substitute commas and to emphasize content. It is convenient (and likely familiar for native English speakers) to distinguish between EN and EM by adding spaces. Changing the autocorrection is WF/NAB.

Without spaces, "A, minus, minus, B" is supposed to become an EM-dash, which is not working => bug. Please file a follow-up ticket, if you can confirm.

[1] https://help.libreoffice.org/latest/en-US/text/shared/01/06040100.html
Comment 9 Eyal Rozenberg 2023-07-14 08:17:17 UTC
(In reply to Heiko Tietze from comment #8)
> We discussed the proposal in the design meeting.

Not really you didn't. You all but ignored the discussion and information here on the bug page.

> According to the help [1] we change "A, space, minus, space, B" as well as "A,
> space, minus, minus, space, B" to EN-dash. 

So, the help says what we do right now. That has no bearing on a bug report.

> This autocorrection aligns with ...the demands from English grammar. 

On the contrary, and that's the whole point. It does not align with the demands of the English grammar.

> EN dash is used for ranges, EM dash to substitute commas and to emphasize content.

Exactly. That's why "A, space, minus, space, B" needs to be substituted with an EM dash, since it's not a range.

> It is convenient (and likely familiar for native English speakers) to distinguish between EN and EM by adding spaces.

1. Exactly, and it's _EM_ dashes which often (not always) get surrounding spaces, EN dashes do not [1]. I even added links to explain this in the opening comment!
2. This bug is about when a user has _not_ indicated which dash they want, and it is up to _us_ to decide. And we're making the _wrong_ decision, because in almost all cases where the user writes "A, space, minus, space, B", they don't mean a range; they are writing a sentence with a pause.



---

 [1] : In UK English, EN dashes are sometimes used instead of EM dashes, and thus may have spaces around them.