Bug 55292 - autocorrect does not correct two dashes to en-dash *when dashes are not discreet*
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version:
(earliest affected)
4.4.0.0.alpha0+ Master
Hardware: All All
: medium normal
Assignee: Not Assigned
URL:
Whiteboard: target:5.0.0
Keywords:
Duplicates: 65565
Depends on:
Blocks:
 
Reported: 2012-09-24 20:44 UTC by vermontpoet
Modified: 2015-07-28 04:53 UTC
6 users (show)

See Also:
Crash report or crash signature:


Attachments
autocorrect list with en-dash and em-dash wildcards (780 bytes, application/octet-stream)
2014-08-09 15:07 UTC, tommy27
Dash test pattern document LOv4400 2014-08-18 (30.05 KB, application/vnd.oasis.opendocument.text)
2014-08-24 01:24 UTC, Owen Genat (retired)
acor_und.dat and acor_en-AU.dat dash autocorrect lists (1.14 KB, application/zip)
2014-08-24 07:24 UTC, tommy27
Dash test pattern results under GNU/Linux deb x86_64 LOv4400 2014-08-26 en-AU (37.42 KB, application/vnd.oasis.opendocument.text)
2014-08-29 05:52 UTC, Owen Genat (retired)
Dash test pattern results under GNU/Linux deb x86_64 LOv4400 2014-08-26 en-AU RD off (37.66 KB, application/vnd.oasis.opendocument.text)
2014-08-30 05:56 UTC, Owen Genat (retired)

Description vermontpoet 2012-09-24 20:44:43 UTC
If one sets auto-correct to replace two dashes (--) with an em-dash, auto-correct only works if the double-dash is discrete (separated from the word it follows).

Test-- :Won't be auto-corrected.

Test -- :*Will* be auto-corrected - but then one runs into the smart quotes bug.

Look forward to a fix for this.

I initially reported this under Bug 48892.
Comment 1 Owen Genat (retired) 2013-06-07 10:04:13 UTC
I think this bug can be updated, confirmed, and made a parent for bug #56307, bug #62923, and bug #65405, which are effectively all duplicates relating to the same issue: how hyphens are replaced with dashes under the AutoCorrect facility. I can provide further details regarding the difficult nature of this issue and confirm it, but I first feel the bug title needs to be changed.

As it stands the title is inaccurate because two non-discrete hyphens *can* be replaced by an em dash e.g., "a--b" will become "a—b" given the right options. The specific example "Test-- " is an edge case and this needs to be made clear. Perhaps change the title to "EDITING: Hyphen/dash replacement under AutoCorrect".

I am making this comment in order for the reporter to consider these matters. Note that obtaining the desired behaviour described in this bug will likely require that the issue raised in corresponding bug #55293 also be addressed. It is therefore possible that this bug DEPENDS ON bug #55293. Adding myself to the CC list.
Comment 2 tommy27 2013-06-07 18:33:29 UTC
Changed "auto-correct" to "autocorrect" in title as that is the actual name of the facility in question and this bug was not showing up in typical searches.
Comment 3 tommy27 2014-07-23 05:36:53 UTC
Hi there,
to obtain what you are looking for you need to use wildcard autocorrection, which is a new feature of 4.2.x (the feature is fully functional in 4.2.4.2 and above)
see release notes at the end of Writer section:
https://wiki.documentfoundation.org/ReleaseNotes/4.2#Writer

enter the autocorrect replacement table and add an entry like:

Replace: .*-- 
With: –    

if you have already set an entry like:
Replace: -- 
With: –   
you should remove it to avoid conflicts.

beware: the solution is not 100% complete, since

"--" will be corrected to "–"
"test--" will be corrected to "test–"

but "test--drive" will not be corrected to "test–drive" and will remain "test--drive"

I will ping the developer of the autocorrect wildcard feature to ask him if there's a way to trigger the autocorrection even when it's in the middle of a compound word
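
For readers trying to picture what the trailing wildcard entry does, here is a rough Python sketch. It is only a toy model, not LibreOffice's autocorrect code: the function name apply_trailing_wildcard is made up for illustration, and it assumes that a .*-- entry fires only when the typed word ends with the two hyphens, which is what the three examples above suggest.

```python
EN_DASH = "\u2013"  # the character used in the "With:" field above


def apply_trailing_wildcard(word, suffix, replacement):
    """Toy model of a '.*SUFFIX' entry (hypothetical helper).

    Assumption: the entry only fires when the word ends with SUFFIX,
    so a suffix sitting in the middle of a word is left untouched.
    """
    if word.endswith(suffix):
        return word[: len(word) - len(suffix)] + replacement
    return word


if __name__ == "__main__":
    for typed in ["--", "test--", "test--drive"]:
        corrected = apply_trailing_wildcard(typed, "--", EN_DASH)
        print(repr(typed), "->", repr(corrected))
```

In this model "--" and "test--" end up with an en-dash, while "test--drive" is returned unchanged, which mirrors the limitation described above.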
Comment 4 Patrick Gillespie 2014-07-27 19:38:40 UTC
Okay, but there still seems to be confusion as concerns the en-dash and em-dash. (See Bug 67364).

Test-- will be corrected to "–"

But:

Test--- will correct to "–-" instead of Test— "em-dash".

The workaround requires that the user include an autocorrect entry as follows:

Correct .*–- to —

This *is* progress. (The autocorrect wildcard is invaluable.) I would think that the next step is to differentiate between the en-dash and em-dash without requiring the user's mediation.
Comment 5 Patrick Gillespie 2014-07-28 19:06:22 UTC
Just noticing that autocorrect unpredictably fails to correct a triple dash to an em-dash when followed by a single character. When I write European style dialog, for example, (instead of quotes -- see James Joyce) I will write:

---I took a book.

or

---A book was taken.

For some reason autocorrect fails to correct these even with the wild card.
Comment 6 László Németh 2014-07-29 09:00:50 UTC
Double/middle wildcard .*pattern.* has not been implemented yet, but it seems it would be useful, so I will check it, as a general solution for similar problems.

(In reply to comment #5)
> For some reason autocorrect fails to correct these even with the wild card.

This bug is fixed in the development version, thanks for your bug report!
Comment 7 Commit Notification 2014-07-29 09:08:36 UTC
Laszlo Nemeth committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=86c0a56a9ee2e3d15286b11afad65568d1a87c11

fdo#55292 paragr. start. autocorr. with a single character (eg. ---A)



The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 8 tommy27 2014-07-29 10:59:28 UTC
(In reply to comment #6)
> Double/middle wildcard .*pattern.* has not implemented yet, but it seems, it
> would be useful, so I will check it, as a general solution for similar
> problems.

yes, I put you on CC in other autocorrect "corner cases" where a double/middle wildcard would really be a blessing. I keep my fingers crossed.


> (In reply to comment #5)
> > For some reason autocorrect fails to correct these even with the wild card.
> 
> This bug is fixed in the development version, thanks for your bug report!

@Patrick
please grab a master build with László's fix and retest to see if it works as you expect.
Comment 9 László Németh 2014-07-29 14:36:33 UTC
(In reply to comment #8)
> yes, I put you on CC  in other autocorrect "corner case" where a
> double/middle wildcard would be really a bless. I keep my fingers crossed.

I plan to add it soon, sorry for my late answer, thanks for your help!
Comment 10 Commit Notification 2014-08-01 16:49:22 UTC
Laszlo Nemeth committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=a07425892205ff8951027ea20459b97370d01de6

fdo#55292 add middle wildcard autocorrection



Comment 11 László Németh 2014-08-01 17:51:56 UTC
Now it is possible to add .*text.* like patterns (useful also for poor man's ligature replacements, etc.). Thanks for your bug report!
Comment 12 Patrick Gillespie 2014-08-02 17:55:14 UTC
OK, I've been testing with the Master4.4.0.0.alpha, and there are still a series of bugs.

In autocorrect, I've entered all possible combinations to produce an en-dash or an em-dash. Here's what happens:

If you like super--man  --> incorrectly inserts an em-dash instead of an en-dash.

If you like super---man --> does not correct to either en-dash or em-dash

"Test--" --> does not correct to an en-dash. Perhaps autocorrect doesn't recognize the quote for wildcard purposes?

"Test---" does not correct to an em-dash.

If you try--  ---> successfully corrects to an en-dash.

If you try--- ---> incorrectly corrects to a dash and en-dash "If I try-–"

"--you interrupted" ---> Does not correct.

“---You interrupted” ---> Does not correct.

So... these are all examples of dialog one might find when writing fiction. Libre still doesn't seem able to properly discriminate between an en-dash or an em-dash and doesn't seem to recognize quotes for wildcard purposes.

I'm adding this comment to bug 67364 as well.
Comment 13 Patrick Gillespie 2014-08-03 02:07:35 UTC
For informational purposes:

The following are the autocorrect entries I use(d) to reproduce the bugs above:

--        --> en-dash
---       --> em-dash

.*--      --> en-dash
.*---     --> em-dash

.*--.*    --> en-dash
.*---.*   --> em-dash

In principle, I would expect the entries above to cover all scenarios. I also changed the version (above) to 4.4.0.0alpha0+ Master.
Comment 14 Patrick Gillespie 2014-08-03 02:10:16 UTC
Changed "em-dash" to "en-dash" in bug title. en-dash = two dashes; em-dash = three dashes.
Comment 15 tommy27 2014-08-03 02:22:55 UTC
hi Patrick.
I haven't yet tried a new build.

anyway, did you try having just these two entries
.*--.*    --> en-dash
.*---.*   --> em-dash

and remove the others?

I'm wondering if having all 3 versions at the same time could cause internal conflicts.

maybe just the .*--.*  is enough and you don't need the -- and the .*-- replacements.
Comment 16 tommy27 2014-08-03 02:32:33 UTC
(In reply to comment #12)
> ....
> 
> If you try--- ---> incorrectly corrects to an a dash and en-dash "If I try-–"
> 

that is a clear conflict: when you type the triple ---, the last 2 -- are corrected into an en-dash by the .*-- wildcard and you get -–

so probably you have to play with it and add some kind of .*-–.* or  .*–-.* wildcard to obtain the em-dash
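
The conflict described here can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not the Writer implementation: autocorrect_word is a hypothetical helper that applies the first trailing entry matching the end of the word, and nothing here says how LibreOffice actually orders its entries. It only shows that if the two-hyphen entry is tried first, "try---" comes out as hyphen plus en-dash, as reported in comment 12.

```python
EN_DASH = "\u2013"
EM_DASH = "\u2014"


def autocorrect_word(word, rules):
    """Apply the first trailing rule that matches the end of the word.

    'rules' is a list of (suffix, replacement) pairs, tried in order.
    This ordering behaviour is an assumption made for illustration only.
    """
    for suffix, replacement in rules:
        if word.endswith(suffix):
            return word[: -len(suffix)] + replacement
    return word


# Two-hyphen entry tried first: it also matches the tail of "---".
short_first = [("--", EN_DASH), ("---", EM_DASH)]
# Same entries, longest pattern tried first.
long_first = sorted(short_first, key=lambda rule: len(rule[0]), reverse=True)

if __name__ == "__main__":
    print(repr(autocorrect_word("try---", short_first)))
    print(repr(autocorrect_word("try---", long_first)))
    print(repr(autocorrect_word("try--", long_first)))
```

With short_first the result is "try-" followed by an en-dash; with long_first the same input becomes "try" followed by an em-dash, and "try--" still becomes an en-dash.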
Comment 17 Patrick Gillespie 2014-08-03 02:49:45 UTC
Yeah, eliminated all autocorrect entries but for the following:

.*--.*
.*---.*

These result in the following:

“--test     --> Does not correct to en-dash.
“---test    --> Does not correct to em-dash.

--test      --> Does not correct to en-dash.
---test     --> Does not correct to em-dash.

super--man  --> Incorrectly corrects to em-dash. 
super---man --> Does not correct to em-dash.

test--      --> Does not correct to en-dash.
test---     --> Does not correct to em-dash.

So.. the results are actually worse. I think the patch remains fundamentally flawed.
Comment 18 Patrick Gillespie 2014-08-03 02:50:38 UTC
(In reply to comment #17)
> Yeah, eliminated all autocorrect entries but for the following:
> 
> .*--.*
> .*---.*
> 
> These result in the following:
> 
> “--test     --> Does not correct to en-dash.
> “---test    --> Does not correct to em-dash.
> 
> --test      --> Does not correct to en-dash.
> ---test     --> Does not correct to em-dash.
> 
> super--man  --> Incorrectly corrects to em-dash. 
> super---man --> Does not correct to em-dash or en-dash.
> 
> test--      --> Does not correct to en-dash.
> test---     --> Does not correct to em-dash.
> 
> So.. the results are actually worse. I think the patch remains fundamentally
> flawed.
Comment 19 tommy27 2014-08-03 15:14:50 UTC
Hi Patrick.
the key is to have these 4 autocorrect entries:

.*--.*   for en-dash
.*---.*  for em-dash
.*–-.*   for em-dash
.*–- .*  for em-dash

note that the last 2 entries are: 
[wildcard] [en-dash] [hyphen] [wildcard]
[wildcard] [en-dash] [hyphen] [space] [wildcard]

If I set these entries every test you showed works fine.
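
A rough way to see why the extra en-dash+hyphen entries help is sketched below in Python. This is only a toy model and rests on an assumption that is not taken from the LibreOffice source: that the two-hyphen entry fires first inside the word and leaves an en-dash followed by a hyphen, which a later entry can then pick up. The helper apply_in_order is hypothetical.

```python
EN_DASH = "\u2013"
EM_DASH = "\u2014"


def apply_in_order(text, rules):
    """Apply each (pattern, replacement) pair once, in the given order.

    Toy model: plain substring replacement scanning left to right,
    standing in for the middle wildcard '.*pattern.*'.
    """
    for pattern, replacement in rules:
        text = text.replace(pattern, replacement)
    return text


two_rules = [("--", EN_DASH), ("---", EM_DASH)]
# Extra entry: [en-dash][hyphen] -> em-dash, as in the list above.
with_catch_up = two_rules + [(EN_DASH + "-", EM_DASH)]

if __name__ == "__main__":
    print(repr(apply_in_order("super---man", two_rules)))
    print(repr(apply_in_order("super---man", with_catch_up)))
    print(repr(apply_in_order("super--man", with_catch_up)))
```

In this model the two plain entries leave "super---man" as en-dash plus hyphen, while adding the catch-up entry turns it into an em-dash and leaves the two-hyphen case as an en-dash.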

are you sure that the 4.4 build you tested already had the fix?

the version I'm using right now is: Version: 4.4.0.0.alpha0+
Build ID: 8dc2ab47b9e5ef0ff381575195a36ceec8789ef1
TinderBox: Win-x86@42, Branch:master, Time: 2014-08-02_23:22:41


see under Help/About LibO Dev
maybe you used a daily which was not yet fixed

I revert status to FIXED.
please give feedback
Comment 20 tommy27 2014-08-03 15:20:45 UTC
*** Bug 65565 has been marked as a duplicate of this bug. ***
Comment 21 tommy27 2014-08-03 15:26:57 UTC
*** Bug 67364 has been marked as a duplicate of this bug. ***
Comment 22 tommy27 2014-08-03 15:32:25 UTC
*** Bug 62923 has been marked as a duplicate of this bug. ***
Comment 23 Patrick Gillespie 2014-08-03 16:15:44 UTC
I'm using:

Version: 4.4.0.0.alpha0+
Build ID: 8cb75e905cef50a2d8a423443d3dcef5f1899027
TinderBox: Linux-rpm_deb-x86_64@46-TDF, Branch:master, Time: 2014-07-30_04:47:1

From here:

http://dev-builds.libreoffice.org/daily/master/Linux-rpm_deb-x86_64@46-TDF/2014-07-30_04.47.13/

The latest version I can find is from 30th of July. Do you have a link for the August build?
Comment 25 tommy27 2014-08-04 15:21:51 UTC
@Sophie 
would you please test this wildcard autocorrect for en- and em-dashes (see comment 19) and see if they cause any conflicts with French grammar rules about dashes?

if the patch has no side effects I would ask for backport from 4.4.x to 4.3.x
Comment 26 Patrick Gillespie 2014-08-04 22:00:04 UTC
(In reply to comment #24)
> try this:
> http://dev-builds.libreoffice.org/daily/master/Linux-rpm_deb-x86_64@46-TDF-
> dbg/2014-08-02_23.49.39/

Okay, so I downloaded the 1.1G (!) archive. Extracted. Now what? I'm looking in the program directory, see a bunch of shared objects? Give me a clue here.
Comment 27 tommy27 2014-08-05 05:21:35 UTC
I'm a Windows user and I don't know how to install those packages on Linux
anyway, it seems that all the Linux tinderboxes are now working again and are updated to August, so download again from the old link you used before.
Comment 28 sophie 2014-08-05 07:14:25 UTC
Hi all, 

So tested with Version: 4.4.0.0.alpha0+
Build ID: 04a65e2704ee80701ca750f2e7c8c0565d2aa830
TinderBox: Linux-rpm_deb-x86_64@46-TDF, Branch:master, Time: 2014-08-04_23:46:59
I have the replacements set to:
--        --> en-dash
---       --> em-dash

.*--      --> en-dash
.*---     --> em-dash

.*--.*    --> en-dash
.*---.*   --> em-dash

result gives:
« -test
« –test
-test
–test
super-man
super–man
test-
test---
==> seems the only one which doesn't work is .*---
@ Patrick, please open a new issue for this one if you confirm it
@ Tommy, yes, the fix could be backported :)
Sophie
Comment 29 tommy27 2014-08-05 07:59:52 UTC
To make the 3 hyphens work
I think you need to set this as well: .–- .*
Comment 30 László Németh 2014-08-05 08:11:49 UTC
Many thanks for your test! Unfortunately, it seems some replacements, for example the ending "---", don't work at the end of paragraphs, so it is better to press a dot or a space instead of Enter to check the patterns.
Comment 31 sophie 2014-08-05 08:24:21 UTC
Hi László, many thanks for your fix :)
I still have an issue with *.--- if I add a space or a punctuation mark after.
typing:
test--- and a space displays test-- (2 hyphens and a space)
test--- and a comma displays test--, (2 hyphens and a comma)
if I hit the space bar after that, I get test-, (1 hyphen, a comma, a space).
Sophie
Comment 32 Patrick Gillespie 2014-08-05 17:25:40 UTC
Testing with:

Version: 4.4.0.0.alpha0+
Build ID: 04a65e2704ee80701ca750f2e7c8c0565d2aa830
TinderBox: Linux-rpm_deb-x86_64@46-TDF, Branch:master, Time: 2014-08-04_23:46:59

“--test.”     --> Does not correct (with period).
“---test.”    --> Does not correct (with period).

--test        --> Does not correct (without period and return).
---test       --> Does not correct (without period and return).

–test.        --> Does Correct.
–-test.       --> Incorrectly corrects to an en-dash and dash.

Super—man     --> Incorrectly corrects two dashes to an em-dash (with return)
Super---man   --> Does not correct (with return).

super—man.    --> Does correct (with period).
Super–-man.   --> Incorrectly corrects to an en-dash and dash (with period).

Test–         --> Does correct (with return).
test---       --> Does not correct (with return).

test–.        --> Does correct (with period)
Test–-.       --> Incorrectly corrects to an en-dash and dash (with period).

“Test--”      --> Does not correct.
“Test---”     --> Does not correct.

The above results were obtained with the recommended auto-correct entries:

.*--.*   for en-dash
.*---.*  for em-dash
.*–-.*   for em-dash
.*–- .*  for em-dash

Reopened the bug. :-/
Comment 33 tommy27 2014-08-05 18:35:47 UTC

(In reply to comment #32)
> ....
> 
> The above results were obtained with the recommended auto-correct entries:
> 
> .*--.*   for en-dash
> .*---.*  for em-dash
> .*–-.*   for em-dash
> .*–- .*  for em-dash
> 
> Reopened the bug. :-/

@Patrick & Sophie
try adding this one as well:
–- .*   -->   em-dash
Comment 34 Patrick Gillespie 2014-08-05 18:53:36 UTC
Much better! Notice the only one that didn't correct:

“–test.”
“—test.”

–test
—test

–test.
—test.

super–man
super—man

Super–man.
Super—man.

test–
test---     ---> Did not correct with return. Can reproduce at will.

Test–.
Test—.

“test–”
“Test—”

These results include your latest addition:

–- .*   -->   em-dash


My advice is that before you mark this bug resolved, you include these entries in autocorrect **by default**. There is little chance that the first time user is going to be able to stumble on this combination before experiencing inordinate frustration; or only after finding this page through a Google search.
Comment 35 tommy27 2014-08-07 07:34:26 UTC
Patrick, please rewrite just the examples where it still doesn't work.
please list them "as you type" so I can retest

I can reproduce the test--- return which doesn't trigger the autocorrect
while test-- is correctly corrected after hitting return
Comment 36 Patrick Gillespie 2014-08-07 11:04:10 UTC
(In reply to comment #35)
> please Patrick rewrite just the example where it still doesn't work.
> please list them "as you type" so I can retest
> 
> I reproduce the test--- return which doesn't trigger the autocorrect
> while test-- is correctly corrected after hitting return

I don't know what you mean by "as you type"? There's no other way *but* "as I type"?

In the meantime, I tried a couple more permutations that also don't work:

The quotes in these first two seem to confuse autocorrect:

“super—man”       --> two dashes incorrectly corrects to em-dash
“super---man”     --> three dashes does not correct 

test---           --> Same as before.

“super—man.”      --> two dashes incorrectly corrects to em-dash 
“super---man.”    --> three dashes does not correct

And then, strangely (this is a very strange bug), I tried to reproduce the first example in the same document, and the following happened:

“super–man”       --> correct: two dashes corrects to en-dash
“super–-man”      --> incorrect: three dashes corrects to en-dash + dash

The document looks like this:

“super—man”
“super---man”

test---

“super—man.”
“super–-man.”

“super–man.” 

“super–man”
“super–-man”

So... within the same document, I got two different results with the same input?
Comment 37 László Németh 2014-08-07 11:27:27 UTC
> So... within the same document, I got two different results with the same input?

Yes, it is possible. The reason is that some user-defined patterns aren't applied the first time, because the separator (colon, comma etc.) could be part of the pattern, as in this case, too. The default dash autocorrection will then modify the word, instead of the user-defined ones. Disabling the default dash autocorrection (Tools->Autocorrect options...->Options->Replace dashes) may help, but I will search for a better solution, too.
Comment 38 tommy27 2014-08-07 13:00:04 UTC
correct. If I disable the "dashes option" that László cited, the --- gets autocorrected into an em-dash after hitting return
Comment 39 tommy27 2014-08-07 13:17:28 UTC
no, wait a moment... now that I've relaunched LibO and that option is still off, the "--- enter" doesn't trigger the autocorrect

moreover I'm experiencing inconsistent behaviour of the triple hyphen between words (super---man)

sometimes it gets autocorrected as expected into super*em-dash*man
while other times it is corrected into super*en-dash+hyphen*man

I don't understand. :-(
Comment 40 tommy27 2014-08-09 15:07:23 UTC
Created attachment 104345
autocorrect list with en-dash and em-dash wildcards

I created an acor_und.dat file containing only en-dash and em-dash wildcard combinations for testing purposes.

It will apply those autocorrect replacements regardless of the language of your document.

put that file under the autocorr subfolder of your user profile.

exact location may be found here:
https://wiki.documentfoundation.org/UserProfile


moreover as I said in my previous post I've found something weird...
the "triple hyphen" pattern is sometimes correctly replaced in some sessions and not in others.

what I've noticed is that once you are in a session where the triple hyphen doesn't work, if you enter the autocorrect options menu and add a new autocorrect entry or change some settings, then when you are back in the document the triple hyphen pattern starts working again as expected and is directly converted into an em-dash.

this is a very weird behaviour of the autocorrect engine...
any idea about it?

anyway, apart from these weird en-/em-dash cases, the middle wildcard pattern (.*sometext.*) works fine in the other scenarios I tested.

for example, in Italian there's no word with "qc" but a lot of words with "cq", and if I set this entry .*qc.* -> cq all possible typing mistakes like:
aqcua
aqcuario
aqcuitrino
subaqcueo
etc. etc.

are correctly corrected as:
acqua
acquario
acquitrino
subacqueo
etc. etc.

the same goes for triple letters (e.g. ttt), which do not exist in Italian, where you have only double letters.

if I set .*ttt.* -> tt
there's a bunch of typing errors that are intercepted
like: tuttto, tutttavia, atttutire  -> tutto, tuttavia, attutire
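
As a rough illustration of the two entries above, the middle wildcard can be modelled in Python as a plain substring replacement inside the typed word. This is a simplified sketch, not the LibreOffice code: apply_middle_wildcard is a hypothetical helper, and replacing only the first occurrence is an assumption made to keep the model small.

```python
def apply_middle_wildcard(word, inner, replacement):
    """Toy model of a '.*INNER.*' entry (hypothetical helper).

    Replaces the first occurrence of INNER anywhere in the word and
    returns the word unchanged when INNER does not occur.
    """
    return word.replace(inner, replacement, 1)


if __name__ == "__main__":
    # .*qc.* -> cq
    for typo in ["aqcua", "aqcuario", "aqcuitrino", "subaqcueo"]:
        print(typo, "->", apply_middle_wildcard(typo, "qc", "cq"))
    # .*ttt.* -> tt
    for typo in ["tuttto", "tutttavia", "atttutire"]:
        print(typo, "->", apply_middle_wildcard(typo, "ttt", "tt"))
```

Run on the typos listed above, the sketch yields the same corrected words that are reported in this comment.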

so I think that the middle wildcard should be backported to 4.3.1 and 4.2.7 releases

the weirdness of the "triple hyphen/em-dash" corner cases will have to be addressed by another commit.

@László
what do you think about it?
Comment 41 Patrick Gillespie 2014-08-09 16:45:58 UTC
(In reply to comment #40)

> anyway apart of these en-em-dash weird cases the middle wildcard pattern
> (.*sometext.*) works fine in other scenarios I test.


Possibly, have you tested them with punctuation and quotes? That sort of thing?

Have you tried autocorrecting two t's *and* three t's like two dashes and three dashes? 

.*tt.*  --> t
.*ttt.* --> tt

Have you tried:

atttutire --> with a return
atttutire.
"atttutire"
"atttutire."

etc...

I understand that one would *not* want to correct 2 t's to one and 3 t's to 2, but trying this might give you some insight into the dash problems.

In other words, is it only dashes, or is this a systemic problem?
Comment 42 Owen Genat (retired) 2014-08-10 02:22:47 UTC
Thanks to tommy27 and László for your continued efforts in this area. I have downloaded attachment 104345 and placed it in my user profile. I also renamed it to acor_en-AU.dat (my locale) so I can see exactly which entries are being used and confirm that ONLY these entries are being used. Tested under v4.4.0.0.alpha0+ Build ID: 4d635dcae4d7275d04a17a0efc11b0531d5d0a82
TinderBox: Linux-rpm_deb-x86_64@46-TDF, Branch:master, Time: 2014-08-08_23:24:32

Results (trailing character is usually a SPACE; <E> = ENTER):
a--b      to n-dash OK
a---b     to n-dash+hyphen NOT OK
“a--b”    to n-dash OK
“a---b”   to m-dash OK
a-- b     to n-dash OK
a--- b    to m-dash OK
a --b     to n-dash OK
a ---b    to m-dash OK
“a--”     to n-dash OK
“a---”    to m-dash OK
“--b”     to n-dash OK
“---b”    to m-dash OK
a--”      to n-dash OK
a---”     to m-dash OK
“--b      to n-dash OK
“---b     to m-dash OK
a--.      to n-dash OK
a---.     to m-dash OK
--b.      to n-dash OK
---b.     to m-dash OK
a--<E>    to n-dash OK
a---<E>   to 3 hyphens NOT OK
--b<E>    to n-dash OK
---b<E>   to n-dash+hyphen NOT OK
“a--<E>   to n-dash OK
“a---<E>  to 3 hyphens NOT OK
“--b<E>   to 2 hyphens NOT OK
“---b<E>  to n-dash+hyphen NOT OK
a--”<E>   to 2 hyphens NOT OK
a---”<E>  to 3 hyphens NOT OK
--b”<E>   to 2 hyphens NOT OK
---b”<E>  to n-dash+hyphen NOT OK
--b”.<E>  to n-dash OK
---b”.<E> to n-dash+hyphen NOT OK

This would seem largely in keeping with what tommy27 has reported.
Comment 43 tommy27 2014-08-10 14:59:17 UTC
(In reply to comment #41)
> (In reply to comment #40)
...
> 
> Have you tried autocorrecting two t's *and* three t's like two dashes and
> three dashes? 
> 
> .*tt.*  --> t
> .*ttt.* --> tt

that kind of replacement would certainly cause conflicts.
just for testing purposes I did:

.*hh.*  --> n
.*hhh.* --> m
.*nh.*  --> m
nh.*    --> m
.*nh .* --> m

and I still have an issue with the "triple h" not being directly replaced with "m"


> Have you tried:
> 
> atttutire --> with a return
> atttutire.
> "atttutire"
> "atttutire."
> 
> etc...

all these examples are correctly replaced by my .*ttt.* --> tt
so on my system the punctuation marks have no negative effect on it.


> I understand that one would *not* want to correct 2 t's to one and 3 t's to
> 2, but trying this might give you some insight into the dash problems.
> 
> In other words, is it only dashes, or is this a systemic problem?

I think it's a general problem when there are entries for double and triple character to be replaced...

what I don't understand is why sometimes LibO correctly replaces the "---" and other times it doesn't...
Comment 44 tommy27 2014-08-10 16:06:15 UTC
@Owen

try this... right after one of the replacements fails, enter the autocorrect replacement table and add a random new autocorrect entry, then go back to the document and try one of the previous tests.

on my PC, as I said before, this temporarily makes the --- work in any combination apart from ---<ENTER> which is not corrected at all
Comment 45 tommy27 2014-08-10 16:22:00 UTC
@László
is it possible to push the middle wildcard rule to 4.2.x or 4.3.x daily builds as well?

I'm suspecting that the inconsistent behaviour I'm describing could be due to some bug of the 4.4.x autocorrect engine so I'd wish to test that new feature on a more stable branches like 4.2.x or 4.3.x
Comment 46 Owen Genat (retired) 2014-08-11 03:16:03 UTC
(In reply to comment #44)
> @Owen
> 
> try this... just soon after one of the replacement fails, enter the
> autocorrect replacement table and add a random new autocorrect entry, then
> go back to the document and try one of the previous tests.
> 
> in my PC, as I said before, this temporarily makes the --- work in any
> combination apart from ---<ENTER> which is not corrected at all

Thanks for pointing this out. I noticed this also, but mainly wanted to first report exactly what was working in the provided AutoCorrect configuration. If I add a new AutoCorrect entry as suggested (e.g., x -> y) these are the improvements to the NOT OK entries in comment 42:

a---b     to m-dash OK
---b<E>   to m-dash OK
“--b<E>   to n-dash OK
“---b<E>  to m-dash OK
a--”<E>   to n-dash OK
a---”<E>  to m-dash OK
--b”<E>   to n-dash OK
---b”<E>  to m-dash OK
---b”.<E> to m-dash OK

... thus the following still have problems:

a---<E>   to 3 hyphens NOT OK
“a---<E>  to 3 hyphens NOT OK
Comment 47 tommy27 2014-08-11 06:25:12 UTC
I think the "triple hyphen enter" may represent a different issue
Comment 48 tommy27 2014-08-12 15:21:40 UTC
after many tests it is clear that what causes the autocorrect problem is that we have trouble handling autocorrect entries with "--" and "---" at the same time

this is exactly Bug 67364 - FORMATTING: Autocorrect no longer functions correctly when replacing two hyphens if also an entry with three hyphens exists

I marked Bug 67364 as DUPLICATE of current Bug 55292 but that was probably not the correct choice, so I'm considering reopening that bug.

anyway I have an idea... why don't we stop "fighting" with this "---" corner case and start using a unique key combination for em-dash?

my proposal would be to have:

.*--.*  for en-dash
.*__.*  for em-dash

my first tests show that with only these 2 entries we get correct replacement of every possible text combination without any conflict and without needing to set all those mixed hyphen/en-dash patterns that did not work as expected.

please give feedback to my proposal.
Comment 49 Owen Genat (retired) 2014-08-13 10:17:47 UTC
(In reply to comment #48)
> after many tests is clear that what causes the autocorrect problem is that
> we have troubles to handle at the same time autocorrect entries with "--"
> and "---" 
> 
> this is exactly Bug 67364 - FORMATTING: Autocorrect no longer functions
> correctly when replacing two hyphens if also an entry with three hyphens
> exists

I initially found this somewhat disheartening but I now think it may lead us to a better solution. There is going to have to be a compromise, as you indicate but I think you did the right thing in closing bug 67364. IMO it is the same issue.

> anyway I have an idea... why we don't stop "fighting" with this "---" corner
> case and start using a unique key combination for em-dash?
> 
> my proposal would be to have:
> 
> .*--.*  for en-dash
> .*__.*  for em-dash

I agree we need a different pattern, but use of low line (U+005F) is NOT a good idea because this character is used extensively in basic text forms e.g., "Name: __________". We do not want these converting to em-dashes. I have tested this and I end up with the first pair of low lines being converted e.g. "—_____" in a similar manner to the 3-hyphen problem. This is where the fun with characters begins.

The main argument for "--" to en-dash and "---" to em-dash comes from TeX and some wiki notation but there really is no consensus about this and I think others need to understand this. Some publishers, universities, and wikis use "--" for em-dash. A 2-character pattern is more restrictive. These would seem to be a simple compromise:

--- for en-dash
=== for em-dash

... but they too have problems. The use of both a different character for each and a 3-char AutoCorrect rule for consistency gives greater options in avoiding conflicts with mathematical notation such as == or ++ etc (=== is however used in JavaScript ... <boo hiss>). Unfortunately these both likely have a potential to conflict with the AutoCorrect for types of horizontal rule:

https://help.libreoffice.org/Common/Drawing_Lines_in_Text#Automatic_lines_in_Writer

Another compromise is to use mixed character combinations:

-_- for en-dash
-+- for em-dash

These avoid conflict with the AutoCorrect for types of horizontal rule, but have no precedent in use case. I feel we may need to end up using HTML notation as a compromise:

&ndash for en-dash
&mdash for em-dash

Not necessarily pretty or as easy (when typing) but the result after autocorrection is the same and there is GOOD precedent for this type of AutoCorrect rule. Like the low line (U+005F) example it also only requires two rules for effective conversion in all use cases. It is also relatively easy to remember.
Comment 50 Owen Genat (retired) 2014-08-13 10:23:24 UTC
(In reply to comment #49)
> I feel we may need to end up using HTML notation as a compromise:
> 
> &ndash for en-dash
> &mdash for em-dash

Of course, 2 seconds after I hit Save Changes I see a problem with this use case also. Editing HTML source code :-( If this is the case, then we are likely back to the unique pattern idea as a compromise. Sigh.
Comment 51 tommy27 2014-08-13 13:23:04 UTC
I think we should keep -- for en-dash because it is simple, and use something else for em-dash like -.- or any other key combination
Comment 52 Patrick Gillespie 2014-08-13 15:35:42 UTC
@ Tommy27. You've already made this bug much, much less problematic than it used to be. Even my copy of WordPerfect X3 doesn't solve this en-dash em-dash issue perfectly. I think you can pat yourself on the back and feel good about backporting the improvements you've already made. If other use cases don't quite work, I'll bet that continued effort will solve them too.
Comment 53 tommy27 2014-08-13 18:24:20 UTC
thanks Patrick but the credit for the fix goes to László Németh, the developer who coded it.

If only we could work out how to make the .*---.* work consistently we would have almost 100% of the issue solved.

anyway, I agree that even with the current 4.4.x situation, a lot of improvement over 4.2.x is already achieved.

so, again, I hope the fix could be backported to 4.2.x and 4.3.x as well where we could use .*--.* for en-dash and .*-.-.* or any other combination for em-dash.
Comment 54 tommy27 2014-08-14 08:28:14 UTC
guys maybe I've found a workaround that makes the "---" consistently turn into an em-dash avoiding the conflict with the .*--.* en-dash wildcard.

you have to make the character that precedes the triple hyphen be part of the wildcard pattern.

let's say
.*0---.*   --> 0— (em-dash)
.*1---.*   --> 1— 
.*a---.*   --> a— 
.*b---.*   --> b— 
...
.*z---.*   --> z— 
.* ---.*   -->  — 

obviously you have to set a separate replacement for each of the keys that may precede an em-dash (numbers, space, letters in upper and lower case, etc.)

so there are 2 choices to make the em-dash work:

1- set a specific key combination for em-dash that doesn't include the double or triple hyphen (i.e.  .*-.-.*)

2- keep using the triple hyphen as an em-dash setting multiple replacements with the preceding key workaround 


the .*--.* wildcard is working fine as a replacement for en-dash and should be kept, and now we have 2 possible working wildcard patterns to make the em-dash do its job.

so I think this can be marked as FIXED again.
feel free to reopen if you don't agree

the only replacement that doesn't work is "---" ENTER which IMHO is a different issue and is probably related to the automatic lines feature in Writer as pointed out by Owen in comment 49: https://help.libreoffice.org/Common/Drawing_Lines_in_Text#Automatic_lines_in_Writer

"if you start a new line in a Writer text document by typing three or more hyphen characters and press the Enter key, the characters are removed and the previous paragraph gets a line as a bottom border".

I wonder if increasing the line trigger from 3 hyphens to 4 hyphens would solve this one as well. UX advice needed before making a change.
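
The per-character entries from workaround 2 do not have to be typed out by hand. Here is a minimal Python sketch, assuming one entry for each ASCII digit, ASCII letter and the space character. The helper name `preceding_char_entries` is made up for illustration, and the script only prints pattern/replacement pairs; it does not touch any AutoCorrect list file.

```python
import string

EM_DASH = "\u2014"

def preceding_char_entries(chars=string.digits + string.ascii_letters + " "):
    """Build (pattern, replacement) pairs such as ('.*a---.*', 'a' + EM_DASH).

    The character set is an assumption; extend it for accented letters,
    quotes, or anything else that may precede an em-dash.
    """
    return [(f".*{c}---.*", f"{c}{EM_DASH}") for c in chars]

entries = preceding_char_entries()
for pattern, replacement in entries[:3]:
    print(pattern, "-->", replacement)
print(len(entries), "entries in total")
```

Even this restricted character set produces several dozen entries, which gives an idea of how much bookkeeping this workaround needs.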
Comment 55 tommy27 2014-08-15 16:05:25 UTC
Hi guys, I have some more thoughts about this and I think the final solution is to avoid the triple hyphen “---” at all as a replacement for em-dash.

One reason is the known conflict with the double hyphen “--” for en-dash, which forces users to set a considerable number of additional autocorrect replacements as a workaround to make the “---” work as an em-dash.

So IMHO it's easier and more elegant to have just 2 autocorrect entries, this one .*--.*  for en-dash and this one .*-.-.* (or whatever you like) for em-dash rather than dozens and dozens of workaround autocorrections.

Moreover there's another possible conflict with the “apply border” option which is triggered again by typing 3 hyphens (or more) followed by Enter at the beginning of a line. 

So to avoid any kind of conflict I strongly suggest using a specific and unique autocorrection for em-dash and definitively forgetting about the “---” as an autocorrect replacement.
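
To illustrate the overlap, here is a rough Python sketch. It assumes `.*` simply means "any run of characters" and emulates that with a regular expression; this is only an analogy, not Writer's actual matching engine, and the helper names are hypothetical.

```python
import re

def wildcard_to_regex(pattern):
    """Translate an AutoCorrect-style pattern: '.*' is a wildcard, the rest is literal."""
    literal_parts = pattern.split(".*")
    return re.compile(".*".join(re.escape(part) for part in literal_parts), re.DOTALL)

def matches(pattern, word):
    return wildcard_to_regex(pattern).fullmatch(word) is not None

for word in ("a--b", "a---b", "a-.-b"):
    hits = [p for p in (".*--.*", ".*---.*", ".*-.-.*") if matches(p, word)]
    print(word, "is matched by", hits)
```

Under that assumption every word matched by .*---.* is also matched by .*--.*, whereas .*-.-.* shares no matches with either of them for these inputs.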
Comment 56 tommy27 2014-08-23 05:54:43 UTC
just to say that I found a new easy workaround to make the "triple" hyphen work.

you have to save just the .*---.* to em-dash in your language autocorrect list
and the .*--.* to en-dash in the [All] autocorrect list.

if the .*---.* and the .*--.* are in separate lists they won't collide anymore and all the possible cases except the ---[enter] will work as expected.
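
One possible reading of why separate lists help, sketched in Python. The lookup order (language list first, then [All], first match wins) is an assumption made for this illustration and is not taken from the LibreOffice source; the glob matching via `fnmatchcase` is likewise only a stand-in for the real wildcard handling.

```python
from fnmatch import fnmatchcase

EN_DASH, EM_DASH = "\u2013", "\u2014"

# Each entry: (wildcard pattern, literal text to replace, replacement).
LANGUAGE_LIST = [(".*---.*", "---", EM_DASH)]  # e.g. the English (USA) list
ALL_LIST = [(".*--.*", "--", EN_DASH)]         # the [All] list

def autocorrect(word):
    """Apply the first matching entry, consulting the language list before [All]."""
    for entries in (LANGUAGE_LIST, ALL_LIST):
        for pattern, literal, replacement in entries:
            if fnmatchcase(word, pattern.replace(".*", "*")):
                return word.replace(literal, replacement)
    return word

for word in ("a--b", "a---b", "a-b"):
    print(word, "->", autocorrect(word))
```

In this model a word with three hyphens is claimed by the language list before the [All] entry is ever consulted, so the two patterns no longer compete.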
Comment 57 Patrick Gillespie 2014-08-23 12:07:40 UTC
What do you mean by "language autocorrect"?
Comment 58 tommy27 2014-08-23 13:52:08 UTC
I mean the list of autocorrect replacements specific to any language.
You see it when you click on "Tools/Autocorrect Options/Replace" and you click on the dropdown menu on the upper right.

let's say your default writing language is English(USA)
so you have to enter the .*---.* to em-dash replacement in that list

then scroll all the way up the dropdown menu till you find the [All] list then enter the .*--.* to en-dash replacement in that list
Comment 59 Owen Genat (retired) 2014-08-24 01:02:41 UTC
(In reply to comment #56)
> just to tell that I found a new easy workaround to make the "triple" hyphen
> work.
> 
> you have to save just the .*---.* to em-dash in your language autocorrect
> list and the .*--.* to en-dash in the [All] autocorrect list.

Hmm. Unfortunately under GNU/Linux using v4.4.0.0.alpha0+ Build ID: e379401618268ed7f7f5885a36b90e1f4f6cd4af TinderBox: Linux-rpm_deb-x86_64@46-TDF, Branch:master, Time: 2014-08-18_05:51:03 I get some strange side-effects for the indicated combination (.*--.* to en-dash in acor_und.dat and .*---.* to em-dash in acor_en-AU.dat). It appears to (again) be influenced by whether the AutoCorrect entries are edited during the session (or not) e.g.,

a---<E>   to 3 hyphens NOT OK
a---.     to 3 hyphens NOT OK[1]
“a---<E>  to 3 hyphens NOT OK
“a---.    to 3 hyphens NOT OK[1]
--b<E>    to 2 hyphens NOT OK
--b.      to 2 hyphens NOT OK
--b”<E>   to 2 hyphens NOT OK
--b”.     to 2 hyphens NOT OK
a--b.     to em-dash NOT OK[1]
“a--b.    to em-dash NOT OK[1]
a---b.    to 3 hyphens NOT OK[1]
“a---b.   to 3 hyphens NOT OK[1]

Usually a trailing full stop is a trigger for dash conversion, but it is more variable with this combination.

[1] After editing the [All] AutoCorrect list (e.g., to add new x -> y entry) these tests work, and this previously working test fails:

---b.     to 3 hyphens NOT OK

There appears to still be something really strange going on with the replacement engine and the en-dash + em-dash entries. Testing this is not at all as straightforward as it seems.
Comment 60 Owen Genat (retired) 2014-08-24 01:24:05 UTC
Created attachment 105180 [details]
Dash test pattern document LOv4400 2014-08-18

Here is the document I use for testing the patterns. Usage:

1. Delete the entries in the "On open" column (leave the column break at end).
2. Re-type each entry from the "Test pattern" column in the "On open" column.
3. Edit the AutoCorrect list (e.g., add a new entry) but do not close the file.
4. Repeat steps 1-2 for the "Edit AutoCorrect" column.
5. Observe differences.
Comment 61 tommy27 2014-08-24 07:24:59 UTC
Created attachment 105184 [details]
acor_und.dat and acor_en-AU.dat dash autocorrect lists

@Owen
please retest using these minimal autocorrect lists containing just those single entries (.*--.* to en-dash in und  and .*---.* in en-AU)

those do work in my Win7x64 master build (*) even without editing the replacement lists.

the only tests not working are all those with triple hyphen and enter, either the a--- or the ---b etc. etc.

by the way thanks for the test file which will be very useful.



(*) Build ID: f4246fab77113147b36706a1f3d93e8724ff826b
TinderBox: Win-x86@42, Branch:master, Time: 2014-08-23_03:27:09
Comment 62 Owen Genat (retired) 2014-08-29 05:52:54 UTC
Created attachment 105408 [details]
Dash test pattern results under GNU/Linux deb x86_64 LOv4400 2014-08-26 en-AU

(In reply to comment #61)
> @Owen
> please retest using this minimal autocorrect lists containing just those
> single entries (.*--.* to en-dash in und  and .*---.* in en-AU)

Thanks for these. I re-tested using the provided AC lists. They are the same as the ones I had already created, but it never hurts to re-test. I must say though, that I am becoming more disheartened by this change as the results are now even more dubious.

The ENTER entries seem to be where the problem lies, but this varies according to whether the [All] AC list or the locale-specific AC list ([en-AU] in my case) is subsequently edited. This is what I did:

 1. Replaced AC lists with provided lists.
 2. Open LO to check lists are as-expected.
 3. Open test file (attachment 105180 [details]).
 4. Delete columns of pre-existing entries.
 5. Enter second column of entries (On open).
 6. Save file.
 7. Edit [en-AU] list to create new entry (x->y).
 8. Enter third column of entries (After edit [en-AU]).
 9. Save file.
10. Edit [en-AU] list to remove previous new entry (x->y).
11. Close file.
12. Re-open test file (attachment 105180 [details]).
13. Edit [All] list to create new entry (x->y).
14. Enter fourth column of entries (After edit [All]).
15. Save file.
16. Edit [All] list to remove previous new entry (x->y).
17. Highlight errors using "Error" paragraph style.
18. Save / exit.

Results are attached (easier to see than to iterate here). Tested under v4.4.0.0.alpha0+ Build ID: 37b9ea92ba81d74764a2345a9c75c65bfd272d2b TinderBox: Linux-rpm_deb-x86_64@46-TDF, Branch:master, Time: 2014-08-26_09:48:30
Comment 63 tommy27 2014-08-29 09:15:23 UTC
Did you turn off the replace dashes option before that test?  See comment 37
Comment 64 Owen Genat (retired) 2014-08-30 05:56:59 UTC
Created attachment 105450 [details]
Dash test pattern results under GNU/Linux deb x86_64 LOv4400 2014-08-26 en-AU RD off

(In reply to comment #63)
> Did you turn off the replace dashes option before that test?  See comment 37

I did not, because I generally perform a user profile reset prior to testing, and forgot to unset this option. I also noticed that the "After edit [en-AU]" entry for ---b". is erroneous in the last test sheet (I entered ---b. by mistake).

Unfortunately, that appears to be the good news. I have performed another full test, using the steps indicated in comment 62, this time with both Replace Dashes options OFF and while the results are different, they are no less problematic. Refer attached.

If that were not enough, I am still getting rather unpredictable results using the build indicated in comment 62. Even going back and trying to re-test some of the results from that comment (i.e., with the Replace Dashes options ON) does not produce a repeatable result. There is something horribly flaky with the entire AutoCorrect facility (at least in the v4.4 builds) or this particular change IMO.

I use the US International keyboard layout with dead keys (i.e., " + e = ë). Perhaps this is related? I will try re-testing later with a basic US layout to see if it makes any difference.
Comment 65 Owen Genat (retired) 2014-08-30 06:59:41 UTC
On reflection, I am wondering if the root of the problem is given in bug 62923:

> By default in Writer, when two hyphens (--) are set between two words 
> they are converted into an em dash.

This seems somewhat controversial / dubious practice (especially given the change proposed here) and yet the comments in the code from the related commits in that bug indicate:

editeng/source/misc/svxacorr.cxx	
> // Replace [A-z0-9]--[A-z0-9] double dash with "emDash" or "enDash"
> // [0-9]--[0-9] double dash always replaced with "enDash"
> // Finnish and Hungarian use enDash instead of emDash.

source/text/shared/01/06040100.xhp
> If the hyphens are there between digits or the text has the Hungarian 
> or Finnish language attribute, then two hyphens in the sequence A--B 
> are replaced by an en-dash instead of an em-dash.<comment>i71908</comment>

Why? I would say many more countries than just Finland and Hungary are affected by this. As I pointed out in comment 49: "Some publishers, universities, and wikis use "--" for em-dash", but there is a specific reason why. Here are a couple of prominent examples:

APA 6th Ed., §4.13 Hyphenation:

> If an em dash is not available on your keyboard, use two hyphens with 
> no space before or after. [...] if the en dash is not available on 
> your keyboard, [use] a single hyphen.

Chicago Manual of Style 16th Ed., §2.13 Dashes:

> For an em dash [...] type two hyphens (leave no space on either side). 
> [...] authors can generally avoid the en dash and use hyphens instead.

In other words, those guides that do indicate this practice tend to use a SINGLE hyphen for en dash. We therefore have either:

a--b = em dash
a-b  = en dash

...or:

a---b = em dash
a--b  = en dash

In both cases, there are distinct differences and there is NEVER a suggestion of using a--b to imply BOTH em dash and en dash, which is what we seem to have at present. It is better to have completely different patterns for replacement of different characters. I would also like to mention that these types of wildcard replacement entries should be at user discretion i.e., not included by default as this makes it easier for locale-specific customisations to be made as necessary.

Given what I am seeing and the questions I have posed in this comment I am setting the status to REOPENED.
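
For reference, the built-in rule described by the quoted svxacorr.cxx comments can be paraphrased in a few lines of Python. This is only a sketch of my reading of those comments, not the actual C++ code: the function name and the two-letter language codes are assumptions, and I use [A-Za-z0-9] where the comment writes [A-z0-9].

```python
import re

EN_DASH, EM_DASH = "\u2013", "\u2014"
EN_DASH_LANGUAGES = {"fi", "hu"}  # Finnish and Hungarian, per the quoted comment

# Exactly two hyphens with an ASCII letter or digit directly on each side.
_DOUBLE_HYPHEN = re.compile(r"(?<=[A-Za-z0-9])--(?=[A-Za-z0-9])")

def replace_double_hyphen(text, language="en"):
    """Replace A--B with an em-dash, or an en-dash between digits or for fi/hu."""
    def pick(match):
        before = text[match.start() - 1]
        after = text[match.end()]
        if before.isdigit() and after.isdigit():
            return EN_DASH
        if language in EN_DASH_LANGUAGES:
            return EN_DASH
        return EM_DASH
    return _DOUBLE_HYPHEN.sub(pick, text)

print(replace_double_hyphen("a--b"))        # em-dash
print(replace_double_hyphen("1--2"))        # en-dash between digits
print(replace_double_hyphen("a--b", "hu"))  # en-dash for Hungarian
```

The sketch makes the point above concrete: the same a--b input yields either dash depending on context and language, which is exactly the double meaning I am questioning.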
Comment 66 László Németh 2015-07-20 11:11:05 UTC
In LibreOffice 5.0, using the default :--: and :---: patterns, it's possible to
insert n-dash and m-dash immediately, without any ambiguity, so I am closing this issue.

Many thanks for your help!
Comment 67 tommy27 2015-07-20 11:48:35 UTC
thanks László.
you finally knocked down this one!!! :-)
Comment 68 vermontpoet 2015-07-20 11:51:06 UTC
(In reply to László Németh from comment #66)
> In LibreOffice 5.0, using the default :--: and :---: patterns, it's possible
> to
> insert n-dash and m-dash immediately, without any ambiguity, so I am closing
> this issue.
> 
> Many thanks for your help!

Hi László, I'm not using 5 yet. Does the same replacement method work in 5 as works in 4?

In other words:

*.---.* ---> em-dash (for example)

etc...

Do you have a recommended set of autocorrect replacement options? 

And does:

"Test---" insert the correct, closing smart quote? This is really the only hang up I continue to notice, and strikes me as, probably, a separate issue.
Comment 69 tommy27 2015-07-20 11:52:02 UTC
@all users
please remember to remove any other autocorrect replacement for en- and em-dash like those with .* wildcards to avoid conflicts with the new :--: and :---: replacements
Comment 70 vermontpoet 2015-07-20 11:55:20 UTC
(In reply to tommy27 from comment #69)
> @all users
> please remember to remove any other autocorrect replacement for en- and
> em-dash like those with .* wildcards to avoid conflicts with the new :--:
> and :---: replacements

So wait (putting my 'stupid hat' on) the new option should look like this?

Replace................With

:--: ..................en-dash
:---:..................em-dash
Comment 71 vermontpoet 2015-07-20 11:56:44 UTC
And is the ":" the new wildcard? Or does this only pertain to en and em-dash replacement?
Comment 72 tommy27 2015-07-20 12:13:56 UTC
the ":" is not exactly a wildcard
you have to type it in order to see the conversion

as László said, type :--: to have en-dash
and :---: to have em-dash

this should work in any situation: isolated, beginning of word, end of word, middle of word

grab a 5.0.x daily build and test it.
at the moment those replacements are available in US and Hungarian autocorrect lists but you may copy them into any other language list as well

see release notes here:
https://wiki.documentfoundation.org/ReleaseNotes/5.0#Emoji_and_in-word_replacement_support
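
A small Python sketch of why the colon-delimited patterns cannot collide with each other, unlike bare hyphen runs. It uses plain string replacement as a stand-in for the autocorrect engine, which is an assumption for illustration only, and the helper name is made up.

```python
EN_DASH, EM_DASH = "\u2013", "\u2014"

def replace_in_order(text, rules):
    """Apply literal replacements one after another, in the given order."""
    for source, target in rules:
        text = text.replace(source, target)
    return text

bare_short_first = [("--", EN_DASH), ("---", EM_DASH)]
bare_long_first = [("---", EM_DASH), ("--", EN_DASH)]
colon_short_first = [(":--:", EN_DASH), (":---:", EM_DASH)]
colon_long_first = [(":---:", EM_DASH), (":--:", EN_DASH)]

# Bare hyphens: the result depends on which rule runs first.
print(replace_in_order("a---b", bare_short_first))
print(replace_in_order("a---b", bare_long_first))

# Colon-delimited: ":--:" never occurs inside ":---:", so order is irrelevant.
print(replace_in_order("a:---:b", colon_short_first))
print(replace_in_order("a:---:b", colon_long_first))
```

With bare hyphens the shorter rule can eat part of the longer sequence; with the closing colon each sequence is complete only when its own final character arrives.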
Comment 73 vermontpoet 2015-07-21 19:08:09 UTC
(In reply to tommy27 from comment #72)
> the ":" is not exactly a wildcard
> you have to type it in order to see the conversion
> 
> as László said, type :--: to have en-dash
> and :---: to have em-dash
> 
> this should work in any situation: isolated, beginning of word, end of word,
> middle of word
> 
> grab a 5.0.x daily build and test it.
> at the moment those replacements are available in US and Hungarian
> autocorrect lists but you may copy them into any other language list as well
> 
> see release notes here:
> https://wiki.documentfoundation.org/ReleaseNotes/5.0#Emoji_and_in-
> word_replacement_support

Okay, I tried it. Seriously? 

Guess I'm not impressed...but...congratulations???

An inelegant, brute force solution. Think I'll go back to what I had. Was working very well. I mean, I could have come up with this 10 years ago &--& or *--*, or #---#, etc...

Or maybe I'm just not getting this...
Comment 74 tommy27 2015-07-21 19:23:51 UTC
it seems you did not understand how the :--: really works

try adding an autocorrect replacement like #--# to en-dash
and try to write super#--#man to see what happens

now try to write super:--:man and see the difference...

IMHO not the right attitude to comment on a fix...
if you have a better solution submit a patch
Comment 75 vermontpoet 2015-07-21 19:41:08 UTC
(In reply to tommy27 from comment #74)
> it seems you did not understand how the :--: really works
> 
> try adding an autocorrect replacement like #--# to en-dash
> and try to write super#--#man to see what happens
> 
> now try to write super:--:man and see the difference...
> 
> IMHO not the right attitude to comment on a fix...
> if you have a better solution submit a patch

Yeah, I didn't even have to bother with LibreOffice50.

Accomplished the same thing in 4.x, like this:

*.#---#*.    ---> em-dash
*.#--#*.     ---> en-dash

*.:---:*.    ---> em-dash
*.:--:*.     ---> en-dash

Works beautifully. Every time. It appears that all you've done is to spare me the trouble of having to enter the wild card (in autocorrect). In exchange for which, for that one time inconvenience, I now have to type two extra characters every single time I want an en-dash or em-dash (that I didn't have to type before). So, yeah, I call that inelegant and brute force.

But then again, maybe I'm not getting something...
Comment 76 tommy27 2015-07-21 20:04:36 UTC
you could also use a mixed approach

.*---.*  for em-dash  (wildcard thing)
and :--: for en dash  (emoji thing)

so you have to type the extra characters for en-dash sequence only


what you probably haven't noticed yet is that the wildcards trigger the autocorrect only when you finish a word and type enter, dot or other punctuation marks.

the emoji immediately trigger the autocorrect even before finishing the word
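
The difference in timing can be shown with a keystroke-by-keystroke simulation of the mixed approach. This is a Python sketch only: the set of word-ending keys and the function name are assumptions for illustration, not how Writer is implemented.

```python
EN_DASH, EM_DASH = "\u2013", "\u2014"

IMMEDIATE = {":--:": EN_DASH}   # emoji-style entry, fires as soon as it is typed
WORD_END = {"---": EM_DASH}     # stands in for the .*---.* wildcard entry
TERMINATORS = set(" .,;!?\n")   # assumed word-ending keys

def type_keys(keys):
    """Return the text as it looks after each keystroke."""
    text, word_start, snapshots = "", 0, []
    for key in keys:
        if key in TERMINATORS:
            # The word is finished: only now are wildcard-style entries applied.
            word = text[word_start:]
            for source, target in WORD_END.items():
                word = word.replace(source, target)
            text = text[:word_start] + word + key
            word_start = len(text)
        else:
            text += key
            # Emoji-style entries are checked after every single key.
            for source, target in IMMEDIATE.items():
                if text.endswith(source):
                    text = text[: -len(source)] + target
                    break
        snapshots.append(text)
    return snapshots

for label, keys in (("emoji", "super:--:man"), ("wildcard", "a---b.")):
    print(label, type_keys(keys))
```

In the emoji run the en-dash appears right after the closing colon, while in the wildcard run the hyphens stay in place until the final dot is typed.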
Comment 77 vermontpoet 2015-07-21 21:00:13 UTC
//...the emoji immediately trigger the autocorrect even before finishing the word...//

Okay.

As long as the wildcards still work, I'm happy. 

Personally, from a writer's perspective, I'll continue using wildcards (which have worked beautifully for me). The very specific and particular use-cases where the extra typing involved in using emoji *might* be worth it --- I'm not seeing yet, but maybe I don't need to.
Comment 78 tommy27 2015-07-22 05:43:43 UTC
(In reply to vermontpoet from comment #77)
> ...
> As long as the wildcards still work, I'm happy. 
> ...

ok, so keep using the wildcards if you prefer them.


(In reply to vermontpoet from comment #75)
> ...
> 
> Accomplished the same thing in 4.x, like this:
> 
> *.#---#*.    ---> em-dash
> *.#--#*.     ---> en-dash
> 
> ....

you would obtain the same with:
*.---*.    ---> em-dash
*.#--#*.   ---> en-dash

so the # for the em-dash is unnecessary