Bug 62923 - En dash (not em) should replace two hyphens when inserted between numbers
Summary: En dash (not em) should replace two hyphens when inserted between numbers
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 4.0.1.2 release
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard: target:4.4.0
Keywords:
Depends on:
Blocks:
 
Reported: 2013-03-30 05:20 UTC by Jackson Sul
Modified: 2014-08-30 13:11 UTC

See Also:
Crash report or crash signature:


Description Jackson Sul 2013-03-30 05:20:37 UTC
By default in Writer, when two hyphens (--) are set between two words they are converted into an em dash. This behavior is correct most of the time, but only when words are used. This causes problems with number ranges.

Two examples of the problem would be when typing a page range (e.g. 45--53) or a date range (1245--1312). En, not em, dashes should be used. This is according to every English style guide I can find, including Chicago, MLA, and APA styles. Obviously different languages might follow different patterns.

The behavior should be more nuanced then: 
When a numeric character is typed, followed by two hyphens, then another number, an en dash (U+2013) should be used. The current behavior, placing an em dash (U+2014), should be kept for alphabetic characters only.

So:
A--B = A[em dash]B
#--# = #[en dash]#
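
For illustration only, the two cases above could be sketched in Python as follows. This is not Writer's autocorrect code; the function name `replace_double_hyphen` is hypothetical, and "numeric character" is assumed here to mean the ASCII digits 0-9.

```python
import re

EN_DASH = "\u2013"
EM_DASH = "\u2014"


def replace_double_hyphen(text: str) -> str:
    """Replace "--" with an en dash between digits and an em dash between letters."""
    # digit--digit: number ranges such as 45--53 get an en dash
    text = re.sub(r"(?<=[0-9])--(?=[0-9])", EN_DASH, text)
    # letter--letter: keep the existing em dash behavior
    # ([^\W\d_] matches any Unicode letter)
    text = re.sub(r"(?<=[^\W\d_])--(?=[^\W\d_])", EM_DASH, text)
    return text
```

Mixed cases such as a letter followed by "--" and a digit are deliberately left untouched in this sketch, since the report does not say how they should be handled.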

Thanks!
Comment 1 Adolfo Jayme Barrientos 2013-04-04 03:23:25 UTC
As an alternative, maybe LibreOffice could mimic Wordpress’ behavior:

--  = en dash
--- = em dash
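
A minimal Python sketch of this two-pattern scheme, assuming plain string replacement is enough (the function name `two_pattern_dashes` is made up for the example and is not taken from WordPress or LibreOffice):

```python
EN_DASH = "\u2013"
EM_DASH = "\u2014"


def two_pattern_dashes(text: str) -> str:
    """Map "---" to an em dash and "--" to an en dash."""
    # The longer pattern must be replaced first; otherwise "---" would
    # turn into an en dash followed by a stray hyphen.
    return text.replace("---", EM_DASH).replace("--", EN_DASH)
```

The order of the two replacements is the only subtle point: because the patterns are distinct, the user rather than the surrounding context decides which dash is inserted.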
Comment 2 Stefan Brüns 2013-11-11 15:20:11 UTC
This behaviour is also wrong for e.g. German, where the em-dash is hardly used:

(Using English text, but German typography rules; "--" is used here for en-dash) ...
---
1. I might do this -- but maybe not. (dash, space before and after)
2. Next match: Munich--Nuremberg (e.g. football, no spaces)
3. Opening hours: Monday, 9:00--18:00 (timetables, no spaces)
---

According to the LO help, "replace dashes" will replace the -- with an em-dash for cases 2 and 3, which is wrong for German typography rules.

Also according to the LO help, -- will be replaced with en-dash for Finnish and Hungarian.

Dash replacement should be a language-specific option.
Comment 3 Simo Kaupinmäki 2013-12-06 01:52:17 UTC
Note that according to the Publication Manual of the American Psychological Association (5th ed., section 5.11), an en dash can also occur between words of equal weight in a compound adjective (e.g., "Chicago--London flight", here two hyphens are used for an en dash). So, even in English there may sometimes be a need for a straightforward way to insert an en dash rather than em dash between words.
Comment 4 tommy27 2014-04-22 19:09:16 UTC
pinging just to know if you guys still see this issue in current 4.1.5.3 or 4.2.3.3 LibO releases.
Comment 5 tommy27 2014-08-02 02:51:01 UTC
in forthcoming releases it will be possible to add an autocorrect entry like:

.*--.*  --> en dash

this will automatically convert the two -- into an en dash even when they are inserted between numbers.

you need a 4.4.x master build with Laszlo's fix to test (probably a daily build will be available tomorrow)

see Bug 55292 - autocorrect does not correct two dashes to em-dash *when dashes are not discreet*
Comment 6 tommy27 2014-08-02 02:55:32 UTC
obviously you could set 
.*---.*  --> em dash
as well, similarly to what Adolfo suggested in comment 1
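
A rough Python emulation of how two such wildcard entries might interact. This is a sketch under assumptions, not the LibreOffice implementation: the helper names `literal_part` and `apply_entries` are hypothetical, each ".*" is assumed to cover at least one non-space character within a single word, and longer patterns are assumed to be tried first.

```python
import re

EN_DASH = "\u2013"
EM_DASH = "\u2014"

# Hypothetical replacement table mirroring the two entries discussed above.
ENTRIES = {
    ".*--.*": EN_DASH,
    ".*---.*": EM_DASH,
}


def literal_part(pattern):
    """Strip a leading and a trailing ".*" wildcard from an entry."""
    if pattern.startswith(".*"):
        pattern = pattern[2:]
    if pattern.endswith(".*"):
        pattern = pattern[:-2]
    return pattern


def apply_entries(word, entries=ENTRIES):
    """Apply the first matching entry to a single word, longest pattern first."""
    for pattern in sorted(entries, key=len, reverse=True):
        literal = re.escape(literal_part(pattern))
        # Assumption: each ".*" must cover at least one non-space character.
        match = re.fullmatch(r"(\S+?)" + literal + r"(\S+)", word)
        if match:
            return match.group(1) + entries[pattern] + match.group(2)
    return word
```

Trying the "---" entry before the "--" entry matters in this sketch: with the opposite order, "A---B" would match the "--" entry and leave a stray hyphen behind.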
Comment 7 tommy27 2014-08-03 15:32:25 UTC
this is fixed in 4.4.x master using the new wildcard autocorrect patterns described in detail here:
https://bugs.freedesktop.org/show_bug.cgi?id=55292#c19

new feature is available in 4.4.x current daily build.
if tests show the fix is stable it will probably be backported to 4.2.x and 4.3.x

so I'm closing this one and marking as a duplicate of bug 55292.

let's continue discussion over there.

*** This bug has been marked as a duplicate of bug 55292 ***
Comment 8 tommy27 2014-08-25 15:29:20 UTC
I decided to revert the status from DUPLICATE to NEW since I think that this describes a different issue, namely the effect of the "Replace Dashes" autocorrect option rather than autocorrect replacements as in Bug 55292.

Detailed information about the "Replace Dashes" option is here:
https://help.libreoffice.org/Common/Options_1#Replace_Dashes

basically the issues with that option can be summarized as follows:

1- no distinction between numbers and letters (see comment 0 )

2- conflicts with grammatical rules among different languages (see comment 2 and comment 3 )

probably that option should be rewritten to allow the user to decide which kind of dash (en- or em-) to use under all the possible scenarios.
Comment 9 Simo Kaupinmäki 2014-08-25 17:55:23 UTC
A little off-topic, but looking at the replacement table linked in comment 8, it occurred to me that there is another potential issue with one of the patterns listed:

A --B (A, space, minus, minus, B)

The automatic replacement of the double minus may cause confusion if somebody tries to type instructions for command-line interface, e.g.:

dpkg --help

A command like this will of course not work if the double minus is replaced by an en dash.
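
The effect can be reproduced with a small Python sketch of the "A --B" pattern. The regular expression is only an approximation of the rule in the replacement table, and the function name `replace_spaced_double_hyphen` is made up for the example:

```python
import re

EN_DASH = "\u2013"


def replace_spaced_double_hyphen(text: str) -> str:
    """Approximate the "A --B" rule: word character, space, "--", word character."""
    return re.sub(r"(?<=\w )--(?=\w)", EN_DASH, text)


# The long option is silently turned into an en dash followed by "help".
broken_command = replace_spaced_double_hyphen("dpkg --help")
```

After the replacement, `broken_command` no longer contains the two ASCII hyphens that the shell command needs.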

I realize that this is a marginal use case and that there may, on the other hand, be people who will find this pattern useful, but the effect may sometimes be unexpected and frustrating. (This is why I personally tend to disable all autocorrection features. I prefer to stay in control of what I type.)
Comment 10 tommy27 2014-08-26 14:34:18 UTC
the wiki also says that:

"If the text has the Hungarian or Finnish language attribute, then two hyphens in the sequence A--B are replaced by an en-dash instead of an em-dash"

so if German typography wants en-dash, German should be added to the exception languages alongside Hungarian and Finnish. (comment 2)
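
As a sketch of what such an exception list amounts to (illustrative Python only; the names `EN_DASH_LANGUAGES` and `dash_for_language` are hypothetical, and the language tags are assumed to look like "fi-FI" or "de_DE"):

```python
EN_DASH = "\u2013"
EM_DASH = "\u2014"

# Languages for which "A--B" yields an en dash. "hu" and "fi" follow the
# text quoted above; adding "de" is only the proposal made in this comment.
EN_DASH_LANGUAGES = {"hu", "fi"}


def dash_for_language(lang_tag, en_dash_languages=EN_DASH_LANGUAGES):
    """Return the dash used for "A--B" in text with the given language tag."""
    primary = lang_tag.replace("_", "-").split("-")[0].lower()
    return EN_DASH if primary in en_dash_languages else EM_DASH


# Proposed variant with German added to the exceptions.
with_german = dash_for_language("de-DE", EN_DASH_LANGUAGES | {"de"})
```

Every further language with the same typographic convention would need its own entry in such a list.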

anyway, are we really sure that the "em-dash" rule is 100% accurate in other languages as well?
Comment 11 Commit Notification 2014-08-29 11:25:43 UTC
Laszlo Nemeth committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=f78eeee1b7c6599589c37b456c4b2f1c6c2e249c

fdo#62923 Autocorrect opt. 'Replace dashes' uses en-dash between numbers



The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 12 László Németh 2014-08-29 11:59:09 UTC
I have added the exception for number ranges, and I will extend the help, too. Thanks for your suggestions!

For German and other problems (e.g. missing replacement of em dash before/after punctuation marks and at starting position), it would be better to add a new alternative autocorrect option 'Replace -- with en-dash, --- with em-dash', like in WordPress.
Comment 13 Commit Notification 2014-08-29 12:14:18 UTC
Laszlo Nemeth committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/help/commit/?id=45a3b79f12c5d057302a4c74977e86fefd299208

fdo#62923 replace "--" between digits to en-dash, not em-dash



Comment 14 tommy27 2014-08-29 14:00:00 UTC
thanks Laszlo, this will probably fix a lot of conflicts.
I'm on a low bandwidth connection right now. In a few days I'll be able to download a new master build and give feedback about the effect of the new commits you have done.
Comment 15 Simo Kaupinmäki 2014-08-29 16:46:47 UTC
Reflecting on comment 10.

Basically, I think language-specific exceptions to the autocorrection scheme are far from an optimal solution. The "A--B" rule in itself is quite mechanical and straightforward, but whether it is applicable to a particular language or typographical tradition, is a different matter. As far as English is concerned, it is an oversimplified solution, even after the number-range problem has been fixed.

The cited exception for Finnish is valid simply because the em dash is hardly ever used in modern Finnish typography (this applies to Swedish typography as well, by the way). Therefore it's generally considered safe to replace a double hyphen with an en dash, regardless of the context and function of the dash. However, within a particular language or even within a regional variant of the language, there can be contrasting traditions and house styles. Quoting Wikipedia:

"The Oxford Guide to Style (2002, section 5.10.10) acknowledges that the spaced en dash is used by 'other British publishers', but states that the Oxford University Press—like 'most US publishers'—uses the unspaced em dash."

http://en.wikipedia.org/wiki/Dash#En_dash_versus_em_dash

Of course, the choice between the spaced en dash and the unspaced em dash is not a problem here, since the current autocorrection scheme seems designed to handle just this kind of variation. However, the scheme has (until now) completely failed to take into account that, besides the unspaced em dash, some English style guides also prescribe the use of the unspaced _en_ dash for some specific functions.

The exception made for number ranges should fix this bug as initially described (I haven't tested the fix yet). Unfortunately, this is only a partial solution to the basic problem. Granted, it is a step forward, because you now have a simple way to insert an en dash in a date range, such as "10--12 July". But analogously, the en dash can also occur between the names of months, as in "June--July 1967". Therefore, users will still need an alternative way to insert the en dash between two words. It would seem rather inconsistent to have a different method for inserting the en dash between words than between digits.

Wikipedia has more examples of the unspaced en dash as required by some English style guides:

- Radical--Unionist coalition
- Boston--Hartford route
- Mother--daughter relationship
- The McCain--Feingold bill
- Pre--Civil War era
- Pulitzer Prize--winning novel
- Public-school--private-school rivalries
etc.

http://en.wikipedia.org/wiki/Dash#En_dash

If you want to use the unspaced em dash for one function and the unspaced en dash for another, you need to be able to indicate whether it is the em dash or the en dash that is wanted in a specific context. This cannot rely on simply replacing double hyphens mechanically. What is needed is rather a specific and consistent method for inserting each kind of dash, at the user's discretion.
Comment 16 Owen Genat (retired) 2014-08-30 07:27:47 UTC
(In reply to comment #15)
> Basically, I think language-specific exceptions to the autocorrection scheme
> are far from an optimal solution. 

I generally agree.

> The "A--B" rule in itself is quite mechanical and straightforward

Mechanical yes. Straightforward, less so. Refer:

https://bugs.freedesktop.org/show_bug.cgi?id=55292#c65

The challenge here is what pattern pair to use for dash AutoCorrect replacement i.e., whether to use -- for em dash and neglect en dash, or use --- for em dash and -- for en dash. The former pair is supported by prominent style guides (APA and Chicago), the latter by TeX and wiki notation, yet the latter is a much better solution in terms of AutoCorrection.

> ... whether it is applicable to a particular language or typographical
> tradition, is a different matter. 

Again, generally agreed and there may be a need to at least leave wildcard entries out/off by default in order to cater for this. This is eventually an l10n issue.

> the en dash can also occur between the names of months, as in 
> "June--July 1967". Therefore, users will still need an alternative 
> way to insert the en dash between two words.

This example and all the following ones in the list depend on pattern recognition i.e., a unique pattern to indicate whether em dash or en dash is to be used. The TeX / wiki notation solves exactly this issue.

> If you want to use the unspaced em dash for one function and the unspaced en
> dash for another, you need to be able to indicate whether it is the em dash
> or the en dash that is wanted in a specific context. This cannot rely on
> simply replacing double hyphens mechanically. 

This is further argument for a distinct pattern. Spacing or no spacing, it is the distinct nature of the original pattern that matters.
Comment 17 tommy27 2014-08-30 13:11:36 UTC
(In reply to comment #16)
> (In reply to comment #15)
> ...
> 
> > the en dash can also occur between the names of months, as in 
> > "June--July 1967". Therefore, users will still need an alternative 
> > way to insert the en dash between two words.
> 
> This example and all the following ones in the list depend on pattern
> recognition i.e., a unique pattern to indicate whether em dash or en dash is
> to be used. The TeX / wiki notation solves exactly this issue.
> 

IMHO it's almost impossible to set rules to differentiate all those scenarios in all different languages where the -- has to turn into an en- or em-dash

I think the best choice is to modify the "Replace dashes" option and set A--B to en-dash and A---B to em-dash