If one sets autocorrect to replace two dashes (--) with an em-dash, the replacement only works if the double dash is discrete (separated from the word it follows). Test-- : Won't be auto-corrected. Test -- : *Will* be auto-corrected, but then one runs into the smart quotes bug. I look forward to a fix for this. I initially reported this under Bug 48892.
I think this bug can be updated, confirmed, and made a parent for bug #56307, bug #62923, and bug #65405, which are effectively all duplicates relating to the same issue: how hyphens are replaced with dashes under the AutoCorrect facility. I can provide further details regarding the difficult nature of this issue and confirm it, but I first feel the bug title needs to be changed. As it stands the title is inaccurate because two non-discrete hyphens *can* be replaced by an em dash, e.g., "a--b" will become "a—b" given the right options. The specific example "Test-- " is an edge case and this needs to be made clear. Perhaps change the title to "EDITING: Hyphen/dash replacement under AutoCorrect". I am making this comment in order for the reporter to consider these matters. Note that obtaining the desired behaviour described in this bug will likely require that the issue raised in the corresponding bug #55293 also be addressed. It is therefore possible that this bug DEPENDS ON bug #55293. Adding myself to the CC list.
Changed "auto-correct" to "autocorrect" in title as that is the actual name of the facility in question and this bug was not showing up in typical searches.
Hi there, to obtain what you are looking for you need to use wildcard autocorrection, which is a new feature of 4.2.x (the feature is fully functional in 4.2.4.2 and above). see the release notes at the end of the Writer section: https://wiki.documentfoundation.org/ReleaseNotes/4.2#Writer
enter the autocorrect replacement table and add an entry like:
Replace: .*--
With: –
if you have already set an entry like:
Replace: --
With: –
you should remove it to avoid conflicts.
beware, the solution is not 100% complete, since
"--" will be corrected to "–"
"test--" will be corrected to "test–"
but "test--drive" will not be corrected to "test–drive" and will remain "test--drive"
I will ping the developer of the autocorrect wildcard feature to ask him if there's a way to trigger the autocorrection even when it's in the middle of a compound word
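For readers who want to reason about the three cases listed in the comment above, here is a minimal Python sketch of how a suffix wildcard entry such as `.*--` behaves. It is not LibreOffice's implementation: the function name `apply_suffix_entry` is hypothetical, and the only assumption is that a `.*X` entry fires when the typed word ends in X.

```python
EN_DASH = "\u2013"

def apply_suffix_entry(word: str, suffix: str, replacement: str) -> str:
    """Mimic an entry of the form '.*<suffix>': replace <suffix> only at the end of the word."""
    if word.endswith(suffix):
        return word[: len(word) - len(suffix)] + replacement
    return word

# The three cases from the comment: a bare "--", a trailing "--", and a "--" in mid-word.
for typed in ["--", "test--", "test--drive"]:
    print(repr(typed), "->", repr(apply_suffix_entry(typed, "--", EN_DASH)))
```

Under that assumption the mid-word case stays untouched, which matches the limitation reported for "test--drive".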
Okay, but there still seems to be confusion as concerns the en-dash and em-dash. (See Bug 67364). Test-- will be corrected to "–" But: Test--- will correct to "–-" instead of Test— "em-dash". The workaround requires that the user include an autocorrect entry as follows: Correct .*–- to — This *is* progress. (The autocorrect wildcard is invaluable.) I would think that the next step is to differentiate between the en-dash and em-dash without requiring the user's mediation.
Just noticing that autocorrect unpredictably fails to correct a triple dash to an em-dash when followed by a single character. When I write European style dialog, for example, (instead of quotes -- see James Joyce) I will write: ---I took a book. or ---A book was taken. For some reason autocorrect fails to correct these even with the wild card.
Double/middle wildcard .*pattern.* has not been implemented yet, but it seems it would be useful, so I will check it as a general solution for similar problems.
(In reply to comment #5)
> For some reason autocorrect fails to correct these even with the wild card.
This bug is fixed in the development version, thanks for your bug report!
Laszlo Nemeth committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=86c0a56a9ee2e3d15286b11afad65568d1a87c11 fdo#55292 paragr. start. autocorr. with a single character (eg. ---A) The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
(In reply to comment #6)
> Double/middle wildcard .*pattern.* has not been implemented yet, but it seems it
> would be useful, so I will check it as a general solution for similar
> problems.
yes, I put you on CC in another autocorrect "corner case" where a double/middle wildcard would be a real blessing. I keep my fingers crossed.
> (In reply to comment #5)
> > For some reason autocorrect fails to correct these even with the wild card.
>
> This bug is fixed in the development version, thanks for your bug report!
@Patrick please grab a master build with Laszlo's fix and retest if it works as you expect.
(In reply to comment #8)
> yes, I put you on CC in another autocorrect "corner case" where a
> double/middle wildcard would be a real blessing. I keep my fingers crossed.
I plan to add it soon. Sorry for being late to answer, and thanks for your help!
Laszlo Nemeth committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=a07425892205ff8951027ea20459b97370d01de6 fdo#55292 add middle wildcard autocorrection The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Now it is possible to add .*text.* like patterns (useful also for poor man's ligature replacements, etc.). Thanks for your bug report!
OK, I've been testing with Master 4.4.0.0.alpha, and there are still a series of bugs. In autocorrect, I've entered all possible combinations to produce an en-dash or an em-dash. Here's what happens:
If you like super--man --> incorrectly inserts an em-dash instead of an en-dash.
If you like super---man --> does not correct to either en-dash or em-dash.
"Test--" --> does not correct to an en-dash. Perhaps autocorrect doesn't recognize the quote for wildcard purposes?
"Test---" --> does not correct to an em-dash.
If you try-- --> successfully corrects to an en-dash.
If you try--- --> incorrectly corrects to a dash and en-dash: "If I try-–"
"--you interrupted" --> Does not correct.
“---You interrupted” --> Does not correct.
So... these are all examples of dialog one might find when writing fiction. Libre still doesn't seem able to properly discriminate between an en-dash and an em-dash and doesn't seem to recognize quotes for wildcard purposes. I'm adding this comment to bug 67364 as well.
For informational purposes: the following are the autocorrect entries I use(d) to reproduce the bugs above:
-- --> en-dash
--- --> em-dash
.*-- --> en-dash
.*--- --> em-dash
.*--.* --> en-dash
.*---.* --> em-dash
In principle, I would expect the entries above to cover all scenarios. I also changed the version (above) to 4.4.0.0alpha0+ Master.
Changed "em-dash" to "en-dash" in bug title. en-dash = two dashes; em-dash = three dashes.
hi Patrick. I haven't yet tried a new build. anyway, did you try to have just these two entries
.*--.* --> en-dash
.*---.* --> em-dash
and remove the others? I'm wondering if having all 3 versions at the same time can be a cause of internal conflicts. maybe just the .*--.* is enough and you don't need the -- and the .*-- replacements.
(In reply to comment #12)
> ....
>
> If you try--- ---> incorrectly corrects to an a dash and en-dash "If I try-–"
>
that is a clear conflict, since when you type the triple --- , the last 2 -- are corrected into an en-dash by the .*-- wildcard and you get -–
so probably you have to play with it and add some kind of .*-–.* or .*–-.* wildcard to obtain the em-dash
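The conflict described in the comment above can be sketched in a few lines of Python. This is only an illustration under an assumption: the thread does not state in which order the autocorrect engine tries its entries, so the hypothetical helper `first_matching_suffix` simply applies the first entry whose suffix matches, and the sketch shows both orderings.

```python
EN_DASH = "\u2013"
EM_DASH = "\u2014"

def first_matching_suffix(word, entries):
    """Apply the first (suffix, replacement) entry whose suffix ends the word."""
    for suffix, replacement in entries:
        if word.endswith(suffix):
            return word[: len(word) - len(suffix)] + replacement
    return word

# Two hypothetical orderings of the same two entries.
two_first = [("--", EN_DASH), ("---", EM_DASH)]
three_first = [("---", EM_DASH), ("--", EN_DASH)]

# "try---" also ends in "--", so the shorter entry can win:
print(first_matching_suffix("try---", two_first))    # hyphen followed by en-dash
print(first_matching_suffix("try---", three_first))  # em-dash
```

If the two-hyphen entry is tried first, the result is the hyphen plus en-dash combination reported in comment #12; trying the three-hyphen entry first would give the em-dash.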
Yeah, eliminated all autocorrect entries but for the following:
.*--.*
.*---.*
These result in the following:
“--test --> Does not correct to en-dash.
“---test --> Does not correct to em-dash.
--test --> Does not correct to en-dash.
---test --> Does not correct to em-dash.
super--man --> Incorrectly corrects to em-dash.
super---man --> Does not correct to em-dash.
test-- --> Does not correct to en-dash.
test--- --> Does not correct to em-dash.
So... the results are actually worse. I think the patch remains fundamentally flawed.
(In reply to comment #17)
> Yeah, eliminated all autocorrect entries but for the following:
>
> .*--.*
> .*---.*
>
> These result in the following:
>
> “--test --> Does not correct to en-dash.
> “---test --> Does not correct to em-dash.
>
> --test --> Does not correct to en-dash.
> ---test --> Does not correct to em-dash.
>
> super--man --> Incorrectly corrects to em-dash.
> super---man --> Does not correct to em-dash or en-dash.
>
> test-- --> Does not correct to en-dash.
> test--- --> Does not correct to em-dash.
>
> So.. the results are actually worse. I think the patch remains fundamentally
> flawed.
Hi Patrick. the key is to have these 4 autocorrect entries:
.*--.* for en-dash
.*---.* for em-dash
.*–-.* for em-dash
.*–- .* for em-dash
note that the last 2 entries are:
[wildcard] [en-dash] [hyphen] [wildcard]
[wildcard] [en-dash] [hyphen] [space] [wildcard]
if I set these entries every test you showed works fine. are you sure that the 4.4 build you tested already had the fix? the version I'm using right now is:
Version: 4.4.0.0.alpha0+
Build ID: 8dc2ab47b9e5ef0ff381575195a36ceec8789ef1
TinderBox: Win-x86@42, Branch:master, Time: 2014-08-02_23:22:41
see under Help/About LibO Dev. maybe you used a daily which was not yet fixed.
I revert status to FIXED. please give feedback
*** Bug 65565 has been marked as a duplicate of this bug. ***
*** Bug 67364 has been marked as a duplicate of this bug. ***
*** Bug 62923 has been marked as a duplicate of this bug. ***
I'm using: Version: 4.4.0.0.alpha0+ Build ID: 8cb75e905cef50a2d8a423443d3dcef5f1899027 TinderBox: Linux-rpm_deb-x86_64@46-TDF, Branch:master, Time: 2014-07-30_04:47:1 From here: http://dev-builds.libreoffice.org/daily/master/Linux-rpm_deb-x86_64@46-TDF/2014-07-30_04.47.13/ The latest version I can find is from 30th of July. Do you have a link for the August build?
try this: http://dev-builds.libreoffice.org/daily/master/Linux-rpm_deb-x86_64@46-TDF-dbg/2014-08-02_23.49.39/
@Sophie would you please test this wildcard autocorrect for en- and em-dashes (see comment 19) and see if they cause any conflicts with french grammar rules about dashes? if the patch has no side effects I would ask for backport from 4.4.x to 4.3.x
(In reply to comment #24) > try this: > http://dev-builds.libreoffice.org/daily/master/Linux-rpm_deb-x86_64@46-TDF- > dbg/2014-08-02_23.49.39/ Okay, so I downloaded the 1.1G (!) archive. Extracted. Now what? I'm looking in the program directory, see a bunch of shared objects? Give me a clue here.
I'm a Windows user and I don't know how to install those packages on Linux. anyway, it seems that now all the Linux tinderboxes are working again and are updated to August, so download again from the old link you already used before.
Hi all,
So tested with Version: 4.4.0.0.alpha0+
Build ID: 04a65e2704ee80701ca750f2e7c8c0565d2aa830
TinderBox: Linux-rpm_deb-x86_64@46-TDF, Branch:master, Time: 2014-08-04_23:46:59
I have the replacements set to:
-- --> en-dash
--- --> em-dash
.*-- --> en-dash
.*--- --> em-dash
.*--.* --> en-dash
.*---.* --> em-dash
result gives:
« -test « –test -test –test super-man super–man test- test---
==> seems the only one which doesn't work is .*---
@ Patrick, please open a new issue for this one if you confirm it
@ Tommy, yes, the fix could be backported :)
Sophie
To make the 3 hyphens work I think you need to set this as well: .–- .*
Many thanks for your test! Unfortunately, it seems, some replacements, for example, the ending "---" don't work in the end of the paragraphs, so it is better to press a dot or a space instead of the Enter to check the patterns.
Hi László, many thanks for your fix :) I still have an issue with *.--- if I add a space or a punctuation mark after. typing:
test--- and a space displays test-- (2 hyphens and a space)
test--- and a comma displays test--, (2 hyphens and a comma)
if I hit the space bar after that, I get test-, (1 hyphen, a comma, a space).
Sophie
Testing with:
Version: 4.4.0.0.alpha0+
Build ID: 04a65e2704ee80701ca750f2e7c8c0565d2aa830
TinderBox: Linux-rpm_deb-x86_64@46-TDF, Branch:master, Time: 2014-08-04_23:46:59
“--test.” --> Does not correct (with period).
“---test.” --> Does not correct (with period).
--test --> Does not correct (without period and return).
---test --> Does not correct (without period and return).
–test. --> Does correct.
–-test. --> Incorrectly corrects to an en-dash and dash.
Super—man --> Incorrectly corrects two dashes to an em-dash (with return).
Super---man --> Does not correct (with return).
super—man. --> Does correct (with period).
Super–-man. --> Incorrectly corrects to an en-dash and dash (with period).
Test– --> Does correct (with return).
test--- --> Does not correct (with return).
test–. --> Does correct (with period).
Test–-. --> Incorrectly corrects to an en-dash and dash (with period).
“Test--” --> Does not correct.
“Test---” --> Does not correct.
The above results were obtained with the recommended autocorrect entries:
.*--.* for en-dash
.*---.* for em-dash
.*–-.* for em-dash
.*–- .* for em-dash
Reopened the bug. :-/
(In reply to comment #32)
> ....
>
> The above results were obtained with the recommended auto-correct entries:
>
> .*--.* for en-dash
> .*---.* for em-dash
> .*–-.* for em-dash
> .*–- .* for em-dash
>
> Reopened the bug. :-/
@Patrick & Sophie try adding this one as well:
–- .* --> em-dash
Much better! Notice the only one that didn't correct:
“–test.” “—test.”
–test —test
–test. —test.
super–man super—man
Super–man. Super—man.
test– test--- ---> Did not correct with return. Can reproduce at will.
Test–. Test—.
“test–” “Test—”
These results include your latest addition:
–- .* --> em-dash
My advice is that before you mark this bug resolved, you include these entries in autocorrect **by default**. There is little chance that the first-time user is going to be able to stumble on this combination before experiencing inordinate frustration; or only after finding this page through a Google search.
please Patrick rewrite just the example where it still doesn't work. please list them "as you type" so I can retest I reproduce the test--- return which doesn't trigger the autocorrect while test-- is correctly corrected after hitting return
(In reply to comment #35) > please Patrick rewrite just the example where it still doesn't work. > please list them "as you type" so I can retest > > I reproduce the test--- return which doesn't trigger the autocorrect > while test-- is correctly corrected after hitting return I don't know what you mean by "as you type"? There's no other way *but* "as I type"? In the meantime, I tried a couple more permutations that also don't work: The quotes in these first two seem to confuse autocorrect: “super—man” --> two dashes incorrectly corrects to em-dash “super---man” --> three dashes does not correct test--- --> Same as before. “super—man.” --> two dashes incorrectly corrects to em-dash “super---man.” --> three dashes does not correct And then, strangely (this is a very strange bug), I tried to reproduce the first example in the same document, and the following happened: “super–man” --> correct: two dashes corrects to en-dash “super–-man” --> incorrect: three dashes corrects to en-dash + dash The document looks like this: “super—man” “super---man” test--- “super—man.” “super–-man.” “super–man.” “super–man” “super–-man” So... within the same document, I got two different results with the same input?
> So... within the same document, I got two different results with the same input? Yes, it is possible. The reason is that some user-defined patterns aren't applied first time, because the separator (colon, comma etc.) could be part of the pattern, as in this case, too. And the default dash autocorrection will modify the word, instead of the user-defined ones. Disabling the default dash autocorrection (Tools->Autocorrect options...->Options->Replace dashes) may help, but I will search a better solution, too.
correct. If I disable the "dashes option" that Laszlo cited, the --- gets autocorrected into an em-dash after hitting return
no, wait a moment... now that I've relaunched LibO and that option is still off, the "--- enter" doesn't trigger the autocorrect. moreover, I'm experiencing inconsistent behaviour of the triple hyphen between words (super---man): sometimes it gets autocorrected as expected into super*em-dash*man, while other times it is corrected into super*en-dash+hyphen*man. I don't understand. :-(
Created attachment 104345 [details] autocorrect list with en-dash and em-dash wildcards I created an acor_und.dat file containing only en-dash and em-dash wildcard combinations for testing purposes. It will apply those autocorrect replacements regardless of the language of your document. put that file under the autocorr subfolder of your user profile. exact location may be found here: https://wiki.documentfoundation.org/UserProfile moreover as I said in my previous post I've found something weird... the "triple hyphen" pattern is sometimes correctly replaced in some sessions and not in others. what I've noticed is that once you are in a session where the triple hyphen doesn't work and then you enter the autocorrect options menu and you add a new autocorrect entry or you change some settings, when you are back in the document the triple hyphen pattern start working again as expected and is directly converted in an em-dash. this is a very weird behaviour of the autocorrect engine... any idea about it? anyway apart of these en-em-dash weird cases the middle wildcard pattern (.*sometext.*) works fine in other scenarios I test. for example in italian there's no word with "qc" but a lot of words with "cq" and if I set this entry .*qc.* -> cq all possible typing mistakes like: aqcua aqcuario aqcuitrino subaqcueo etc. etc. are correctly corrected as: acqua acquario acquitrino subacqueo etc. etc. the same is for triple letter (i.e. ttt) that do not exist in italian where you have only double letters). if I set .*ttt.* -> tt there's a bunch of typing errors that are intercepted like: tuttto, tutttavia, atttutire -> tutto, tuttavia, attutire so I think that the middle wildcard should be backported to 4.3.1 and 4.2.7 releases the weirdness of the "triple hyphen/em-dash" corner cases will have to be addressed by another commit. @Lazlo what do you think about it?
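The Italian examples in the comment above can be checked with a small Python sketch of a middle wildcard entry (`.*text.*`). This is an illustration, not the engine's code: the function name `apply_middle_entry` is hypothetical, and replacing only the first occurrence inside the word is an assumption, since the thread does not say how repeated matches are handled.

```python
def apply_middle_entry(word: str, pattern: str, replacement: str) -> str:
    """Mimic an entry of the form '.*<pattern>.*': replace the first
    occurrence of <pattern> anywhere inside the word (assumed behaviour)."""
    return word.replace(pattern, replacement, 1)

# Misspellings taken from the comment above.
qc_words = ["aqcua", "aqcuario", "aqcuitrino", "subaqcueo"]
ttt_words = ["tuttto", "tutttavia", "atttutire"]

for w in qc_words:
    print(w, "->", apply_middle_entry(w, "qc", "cq"))
for w in ttt_words:
    print(w, "->", apply_middle_entry(w, "ttt", "tt"))
```

Because each of these two entries is the only one that can match its words, there is no competing shorter pattern, which may be why they behave more predictably than the "--" and "---" pair.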
(In reply to comment #40) > anyway apart of these en-em-dash weird cases the middle wildcard pattern > (.*sometext.*) works fine in other scenarios I test. Possibly, have you tested them with punctuation and quotes? That sort of thing? Have you tried autocorrecting two t's *and* three t's like two dashes and three dashes? .*tt.* --> t .*ttt.* --> tt Have you tried: atttutire --> with a return atttutire. "atttutire" "atttutire." etc... I understand that one would *not* want to correct 2 t's to one and 3 t's to 2, but trying this might give you some insight into the dash problems. In other words, is it only dashes, or is this a systemic problem?
Thanks to tommy27 and László for your continued efforts in this area. I have downloaded attachment 104345 [details], placed it in my user profile. I also renamed it to acor_en-AU.dat (my locale) so I can see exactly which entries are being used and confirm that ONLY these entries are being used. Tested under v4.4.0.0.alpha0+ Build ID: 4d635dcae4d7275d04a17a0efc11b0531d5d0a82 TinderBox: Linux-rpm_deb-x86_64@46-TDF, Branch:master, Time: 2014-08-08_23:24:32
Results (trailing character is usually a SPACE; <E> = ENTER):
a--b to n-dash OK
a---b to n-dash+hyphen NOT OK
“a--b” to n-dash OK
“a---b” to m-dash OK
a-- b to n-dash OK
a--- b to m-dash OK
a --b to n-dash OK
a ---b to m-dash OK
“a--” to n-dash OK
“a---” to m-dash OK
“--b” to n-dash OK
“---b” to m-dash OK
a--” to n-dash OK
a---” to m-dash OK
“--b to n-dash OK
“---b to m-dash OK
a--. to n-dash OK
a---. to m-dash OK
--b. to n-dash OK
---b. to m-dash OK
a--<E> to n-dash OK
a---<E> to 3 hyphens NOT OK
--b<E> to n-dash OK
---b<E> to n-dash+hyphen NOT OK
“a--<E> to n-dash OK
“a---<E> to 3 hyphens NOT OK
“--b<E> to 2 hyphens NOT OK
“---b<E> to n-dash+hyphen NOT OK
a--”<E> to 2 hyphens NOT OK
a---”<E> to 3 hyphens NOT OK
--b”<E> to 2 hyphens NOT OK
---b”<E> to n-dash+hyphen NOT OK
--b”.<E> to n-dash OK
---b”.<E> to n-dash+hyphen NOT OK
This would seem largely in keeping with what tommy27 has reported.
(In reply to comment #41)
> (In reply to comment #40)
...
>
> Have you tried autocorrecting two t's *and* three t's like two dashes and
> three dashes?
>
> .*tt.* --> t
> .*ttt.* --> tt
that kind of replacement would certainly cause conflicts. just for testing purposes I did:
.*hh.* --> n
.*hhh.* --> m
.*nh.* --> m
nh.* --> m
.*nh .* --> m
and I still have an issue with the "triple h" not being directly replaced with "m"
> Have you tried:
>
> atttutire --> with a return
> atttutire.
> "atttutire"
> "atttutire."
>
> etc...
all these examples are correctly replaced by my .*ttt.* --> tt so in my system the punctuation marks have no negative effect on it.
> I understand that one would *not* want to correct 2 t's to one and 3 t's to
> 2, but trying this might give you some insight into the dash problems.
>
> In other words, is it only dashes, or is this a systemic problem?
I think it's a general problem when there are entries for double and triple characters to be replaced... what I don't understand is why sometimes LibO correctly replaces the "---" and other times it doesn't...
@Owen try this... just soon after one of the replacement fails, enter the autocorrect replacement table and add a random new autocorrect entry, then go back to the document and try one of the previous tests. in my PC, as I said before, this temporarily makes the --- work in any combination apart from ---<ENTER> which is not corrected at all
@Laszlo is it possible to push the middle wildcard rule to 4.2.x or 4.3.x daily builds as well? I'm suspecting that the inconsistent behaviour I'm describing could be due to some bug of the 4.4.x autocorrect engine, so I'd like to test that new feature on a more stable branch like 4.2.x or 4.3.x
(In reply to comment #44)
> @Owen
>
> try this... just soon after one of the replacement fails, enter the
> autocorrect replacement table and add a random new autocorrect entry, then
> go back to the document and try one of the previous tests.
>
> in my PC, as I said before, this temporarily makes the --- work in any
> combination apart from ---<ENTER> which is not corrected at all
Thanks for pointing this out. I noticed this also, but mainly wanted to first report exactly what was working in the provided AutoCorrect configuration. If I add a new AutoCorrect entry as suggested (e.g., x -> y) these are the improvements to the NOT OK entries in comment 42:
a---b to m-dash OK
---b<E> to m-dash OK
“--b<E> to n-dash OK
“---b<E> to m-dash OK
a--”<E> to n-dash OK
a---”<E> to m-dash OK
--b”<E> to n-dash OK
---b”<E> to m-dash OK
---b”.<E> to m-dash OK
... thus the following still have problems:
a---<E> to 3 hyphens NOT OK
“a---<E> to 3 hyphens NOT OK
I think the "triple hyphen enter" may represent a different issue
after many tests it is clear that what causes the autocorrect problem is that we have trouble handling autocorrect entries with "--" and "---" at the same time.
this is exactly Bug 67364 - FORMATTING: Autocorrect no longer functions correctly when replacing two hyphens if also an entry with three hyphens exists
I marked Bug 67364 as DUPLICATE of current Bug 55292 but probably that was not a correct choice, so I'm considering reopening that bug.
anyway I have an idea... why don't we stop "fighting" with this "---" corner case and start using a unique key combination for em-dash? my proposal would be to have:
.*--.* for en-dash
.*__.* for em-dash
my first tests show that with only these 2 entries we have correct replacement of every possible text combination without any conflict and without the need to set all those mixed patterns of hyphen/en-dash that did not work as expected. please give feedback on my proposal.
(In reply to comment #48) > after many tests is clear that what causes the autocorrect problem is that > we have troubles to handle at the same time autocorrect entries with "--" > and "---" > > this is exactly Bug 67364 - FORMATTING: Autocorrect no longer functions > correctly when replacing two hyphens if also an entry with three hyphens > exists I initially found this somewhat disheartening but I now think it may lead us to a better solution. There is going to have to be a compromise, as you indicate but I think you did the right thing in closing bug 67364. IMO it is the same issue. > anyway I have an idea... why we don't stop "fighting" with this "---" corner > case and start using a unique key combination for em-dash? > > my proposal would be to have: > > .*--.* for en-dash > .*__.* for em-dash I agree we need a different pattern, but use of low line (U+005F) is NOT a good idea because this character is used extensively in basic text forms e.g., "Name: __________". We do not want these converting to em-dashes. I have tested this and I end up with the first pair of low lines being converted e.g. "—_____" in similar manner to the 3-hyphen problem. This is where the fun with characters begins. The main argument for "--" to en-dash and "---" to em-dash comes from TeX and some wiki notation but there really is no consensus about this and I think others need to understand this. Some publishers, universities, and wikis use "--" for em-dash. A 2-character pattern is more restrictive. These would seem to be a simple compromise: --- for en-dash === for em-dash ... but they too have problems. The use of both a different character for each and a 3-char AutoCorrect rule for consistency gives greater options in avoiding conflicts with mathematical notation such as == or ++ etc (=== is however used in JavaScript ... <boo hiss>). 
Unfortunately these both likely have a potential to conflict with the AutoCorrect for types of horizontal rule: https://help.libreoffice.org/Common/Drawing_Lines_in_Text#Automatic_lines_in_Writer Another compromise is to use mixed character combinations: -_- for en-dash -+- for em-dash These avoid conflict with the AutoCorrect for types of horizontal rule, but have no precedent in use case. I feel we may need to end up using HTML notation as a compromise: &ndash for en-dash &mdash for em-dash Not necessarily pretty or as easy (when typing) but the result after autocorrection is the same and there is GOOD precedent for this type of AutoCorrect rule. Like the low line (U+005F) example it also only requires two rules for effective conversion in all use cases. It is also relatively easy to remember.
(In reply to comment #49)
> I feel we may need to end up using HTML notation as a compromise:
>
> &ndash for en-dash
> &mdash for em-dash
Of course, 2 seconds after I hit Save Changes I see a problem with this use case also. Editing HTML source code :-( If this is the case, then we are likely back to the unique pattern idea as a compromise. Sigh.
I think we should keep -- for en-dash because it is simple, and use something else for em-dash, like -.- or any other key combination
@ Tommy27. You've already made this bug much, much less problematic than it used to be. Even my copy of WordPerfect X3 doesn't solve this en-dash em-dash issue perfectly. I think you can pat yourself on the back and feel good about backporting the improvements you've already made. If other use cases don't quite work, I'll bet that continued effort will solve them too.
thanks Patrick, but the credit for the fix goes to Laszlo Nemeth, who's the developer who coded it. If only we could work out how to make the .*---.* work consistently we would have almost 100% of the issue solved. anyway, I agree that even with the current 4.4.x situation, a lot of improvement over 4.2.x is already achieved. so, again, I hope the fix could be backported to 4.2.x and 4.3.x as well, where we could use .*--.* for en-dash and .*-.-.* or any other combination for em-dash.
guys, maybe I've found a workaround that makes the "---" consistently turn into an em-dash, avoiding the conflict with the .*--.* en-dash wildcard. you have to make the character that precedes the triple hyphen be part of the wildcard pattern. let's say
.*0---.* --> 0— (em-dash)
.*1---.* --> 1—
.*a---.* --> a—
.*b---.* --> b—
...
.*z---.* --> z—
.* ---.* --> —
obviously you have to set multiple replacements, one for each of the keys that may precede an em-dash (numbers, space, letters, upper and lower case, etc.)
so there are 2 choices to make the em-dash work:
1- set a specific key combination for em-dash that doesn't include the double or triple hyphen (i.e. .*-.-.*)
2- keep using the triple hyphen as an em-dash, setting multiple replacements with the preceding key workaround
the .*--.* wildcard is working fine as a replacement for en-dash and should be kept, and now we have 2 possible working wildcard patterns to make the em-dash do its job. so I think this can be marked as FIXED again. feel free to reopen if you don't agree.
the only replacement that doesn't work is "---" ENTER which IMHO is a different issue and is probably related to the automatic lines feature in Writer as pointed out by Owen in comment 49: https://help.libreoffice.org/Common/Drawing_Lines_in_Text#Automatic_lines_in_Writer
"if you start a new line in a Writer text document by typing three or more hyphen characters and press the Enter key, the characters are removed and the previous paragraph gets a line as a bottom border".
I wonder if increasing the line trigger from 3 hyphens to 4 hyphens would solve this one as well. UX advice needed before making a change.
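Since the "preceding key" workaround above needs one entry per character, a short Python sketch can generate the pattern/replacement pairs instead of typing them by hand. This is an illustration only: the helper name `preceding_char_entries` is hypothetical, the script does not write any LibreOffice file, and it covers digits and ASCII letters only (the space entry from the comment, and any accented letters, would still have to be added manually).

```python
import string

EM_DASH = "\u2014"

def preceding_char_entries(chars: str) -> list[tuple[str, str]]:
    """Build (pattern, replacement) pairs of the form '.*<ch>---.*' -> '<ch><em-dash>'."""
    return [(f".*{ch}---.*", f"{ch}{EM_DASH}") for ch in chars]

# Digits plus lower- and upper-case ASCII letters: 10 + 52 = 62 entries.
entries = preceding_char_entries(string.digits + string.ascii_letters)

for pattern, replacement in entries[:3]:
    print(pattern, "-->", replacement)
print(len(entries), "entries in total")
```

The size of the generated list gives a sense of the cost of option 2 compared with a single dedicated em-dash pattern.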
Hi guys, I have some more thoughts about this and I think the final solution is to avoid the triple hyphen “---” altogether as a replacement for em-dash. One reason is the known conflict with the double hyphen “--” for en-dash, which forces users to set a considerable number of additional autocorrect replacements as a workaround to make the “---” work as an em-dash. So IMHO it's easier and more elegant to have just 2 autocorrect entries, this one .*--.* for en-dash and this one .*-.-.* (or whatever you like) for em-dash, rather than dozens and dozens of workaround autocorrections. Moreover, there's another possible conflict with the “apply border” option, which is triggered by typing 3 hyphens (or more) followed by Enter at the beginning of a line. So to avoid any kind of conflict I strongly suggest using a specific and unique autocorrection for em-dash and definitely forgetting about the “---” as an autocorrect replacement.
just to tell that I found a new easy workaround to make the "triple" hyphen work. you have to save just the .*---.* to em-dash in your language autocorrect list and the .*--.* to en-dash in the [All] autocorrect list. if the .*---.* and the .*--.* are in separate lists they won't collide anymore and all the possible cases except the ---[enter] will work as expected.
What do you mean by "language autocorrect"?
I mean the list of autocorrect replacements specific to any language. You see it when you click on "Tools/Autocorrect Options/Replace" and you click on the dropdown menu on the upper right. let's say your default writing language is English(USA) so you have to enter the .*---.* to em-dash replacement in that list then scroll all the way up the dropdown menu till you find the [All] list then enter the .*--.* to en-dash replacement in that list
(In reply to comment #56)
> just to tell that I found a new easy workaround to make the "triple" hyphen
> work.
>
> you have to save just the .*---.* to em-dash in your language autocorrect
> list and the .*--.* to en-dash in the [All] autocorrect list.
Hmm. Unfortunately under GNU/Linux using v4.4.0.0.alpha0+ Build ID: e379401618268ed7f7f5885a36b90e1f4f6cd4af TinderBox: Linux-rpm_deb-x86_64@46-TDF, Branch:master, Time: 2014-08-18_05:51:03 I get some strange side-effects for the indicated combination (.*--.* to en-dash in acor_und.dat and .*---.* to em-dash in acor_en-AU.dat). It appears to (again) be influenced by whether the AutoCorrect entries are edited during the session (or not), e.g.,
a---<E> to 3 hyphens NOT OK
a---. to 3 hyphens NOT OK[1]
“a---<E> to 3 hyphens NOT OK
“a---. to 3 hyphens NOT OK[1]
--b<E> to 2 hyphens NOT OK
--b. to 2 hyphens NOT OK
--b”<E> to 2 hyphens NOT OK
--b”. to 2 hyphens NOT OK
a--b. to em-dash NOT OK[1]
“a--b. to em-dash NOT OK[1]
a---b. to 3 hyphens NOT OK[1]
“a---b. to 3 hyphens NOT OK[1]
Usually a trailing full stop is a trigger for dash conversion, but it is more variable with this combination.
[1] After editing the [All] AutoCorrect list (e.g., to add new x -> y entry) these tests work, and this previously working test fails:
---b. to 3 hyphens NOT OK
There appears to still be something really strange going on with the replacement engine and the en-dash + em-dash entries. Testing this is not at all as straightforward as it seems.
Created attachment 105180 [details]
Dash test pattern document LOv4400 2014-08-18
Here is the document I use for testing the patterns. Usage:
1. Delete the entries in the "On open" column (leave the column break at end).
2. Re-type each entry from the "Test pattern" column in the "On open" column.
3. Edit the AutoCorrect list (e.g., add a new entry) but do not close the file.
4. Repeat steps 1-2 for the "Edit AutoCorrect" column.
5. Observe differences.
Created attachment 105184 [details]
acor_und.dat and acor_en-AU.dat dash autocorrect lists
@Owen
please retest using these minimal autocorrect lists, which contain just those single entries (.*--.* to en-dash in und and .*---.* to em-dash in en-AU). those do work in my Win7x64 master build (*) even without editing the replacement lists. the only tests that do not work are those with a triple hyphen followed by enter, whether a--- or ---b etc. by the way, thanks for the test file, which will be very useful.
(*) Build ID: f4246fab77113147b36706a1f3d93e8724ff826b
TinderBox: Win-x86@42, Branch:master, Time: 2014-08-23_03:27:09
Created attachment 105408 [details]
Dash test pattern results under GNU/Linux deb x86_64 LOv4400 2014-08-26 en-AU
(In reply to comment #61)
> @Owen
> please retest using this minimal autocorrect lists containing just those
> single entries (.*--.* to en-dash in und and .*---.* in en-AU)
Thanks for these. I re-tested using the provided AC lists. They are the same as the ones I had already created, but it never hurts to re-test. I must say though, that I am becoming more disheartened by this change as the results are now even more dubious. The ENTER entries seem to be where the problem lies, but this varies according to whether the [All] AC list or the locale-specific AC list ([en-AU] in my case) is subsequently edited. This is what I did:
1. Replaced AC lists with provided lists.
2. Open LO to check lists are as-expected.
3. Open test file (attachment 105180 [details]).
4. Delete columns of pre-existing entries.
5. Enter second column of entries (On open).
6. Save file.
7. Edit [en-AU] list to create new entry (x->y).
8. Enter third column of entries (After edit [en-AU]).
9. Save file.
10. Edit [en-AU] list to remove previous new entry (x->y).
11. Close file.
12. Re-open test file (attachment 105180 [details]).
13. Edit [All] list to create new entry (x->y).
14. Enter fourth column of entries (After edit [All]).
15. Save file.
16. Edit [All] list to remove previous new entry (x->y).
17. Highlight errors using "Error" paragraph style.
18. Save / exit.
Results are attached (easier to see than to iterate here). Tested under v4.4.0.0.alpha0+
Build ID: 37b9ea92ba81d74764a2345a9c75c65bfd272d2b
TinderBox: Linux-rpm_deb-x86_64@46-TDF, Branch:master, Time: 2014-08-26_09:48:30
Did you turn off the replace dashes option before that test? See comment 37
Created attachment 105450 [details]
Dash test pattern results under GNU/Linux deb x86_64 LOv4400 2014-08-26 en-AU RD off
(In reply to comment #63)
> Did you turn off the replace dashes option before that test? See comment 37
I did not, because I generally perform a user profile reset prior to testing, and forgot to unset this option. I also noticed that the "After edit [en-AU]" entry for ---b". is erroneous in the last test sheet (I entered ---b. by mistake). Unfortunately, that appears to be the good news. I have performed another full test, using the steps indicated in comment 62, this time with both Replace Dashes options OFF and while the results are different, they are no less problematic. Refer attached.
If that were not enough, I am still getting rather unpredictable results using the build indicated in comment 62. Even going back and trying to re-test some of the results from that comment (i.e., with the Replace Dashes options ON) does not produce a repeatable result. There is something horribly flaky with the entire AutoCorrect facility (at least in the v4.4 builds) or this particular change IMO.
I use the US International keyboard layout with dead keys (i.e., " + e = ë). Perhaps this is related? I will try re-testing later with a basic US layout to see if it makes any difference.
On reflection, I am wondering if the root of the problem is given in bug 62923:
> By default in Writer, when two hyphens (--) are set between two words
> they are converted into an em dash.
This seems somewhat controversial / dubious practice (especially given the change proposed here) and yet the comments in the code from the related commits in that bug indicate:
editeng/source/misc/svxacorr.cxx
> // Replace [A-z0-9]--[A-z0-9] double dash with "emDash" or "enDash"
> // [0-9]--[0-9] double dash always replaced with "enDash"
> // Finnish and Hungarian use enDash instead of emDash.
source/text/shared/01/06040100.xhp
> If the hyphens are there between digits or the text has the Hungarian
> or Finnish language attribute, then two hyphens in the sequence A--B
> are replaced by an en-dash instead of an em-dash.<comment>i71908</comment>
Why? I would say many more countries than just Finland and Hungary are affected by this. As I pointed out in comment 49: "Some publishers, universities, and wikis use "--" for em-dash", but there is a specific reason why. Here are a couple of prominent examples:
APA 6th Ed., §4.13 Hyphenation:
> If an em dash is not available on your keyboard, use two hyphens with
> no space before or after. [...] if the en dash is not available on
> your keyboard, [use] a single hyphen.
Chicago Manual of Style 16th Ed., §2.13 Dashes:
> For an em dash [...] type two hyphens (leave no space on either side).
> [...] authors can generally avoid the en dash and use hyphens instead.
In other words, those guides that do indicate this practice tend to use a SINGLE hyphen for en dash. We therefore have either:
a--b = em dash
a-b = en dash
...or:
a---b = em dash
a--b = en dash
In both cases, there are distinct differences and there is NEVER a suggestion of using a--b to imply BOTH em dash and en dash, which is what we seem to have at present. It is better to have completely different patterns for replacement of different characters.
I would also like to mention that these types of wildcard replacement entries should be at user discretion i.e., not included by default as this makes it easier for locale-specific customisations to be made as necessary. Given what I am seeing and the questions I have posed in this comment I am setting the status to REOPENED.
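For readers following the argument, the rule described by the quoted svxacorr.cxx comments can be sketched in Python as below. This is only a reading of those three comment lines, not the actual C++ implementation: the function name, the "fi"/"hu" language codes, and the use of [A-Za-z0-9] in place of the comment's [A-z0-9] are assumptions for the illustration.

```python
import re

EN_DASH = "\u2013"
EM_DASH = "\u2014"

# Languages the quoted comment says use an en dash instead of an em dash
# (the two-letter codes are an assumption for this sketch).
EN_DASH_LANGUAGES = {"fi", "hu"}

# Exactly two hyphens with a letter or digit directly on each side.
DOUBLE_HYPHEN = re.compile(r"(?<=[A-Za-z0-9])--(?=[A-Za-z0-9])")


def replace_double_hyphen(text, language="en"):
    """Replace A--B following the rule stated in the quoted code comments."""

    def pick_dash(match):
        before = text[match.start() - 1]
        after = text[match.end()]
        if before.isdigit() and after.isdigit():
            return EN_DASH  # digit--digit is always an en dash
        if language in EN_DASH_LANGUAGES:
            return EN_DASH  # Finnish and Hungarian use an en dash
        return EM_DASH      # everything else gets an em dash

    return DOUBLE_HYPHEN.sub(pick_dash, text)


for sample in ("a--b", "1--2", "Test-- ", "a---b"):
    print(repr(sample), "->", repr(replace_double_hyphen(sample)))
print(repr(replace_double_hyphen("a--b", language="hu")))
```

Under this reading the same typed sequence a--b yields an em dash or an en dash depending on the neighbouring characters and the language, which is the ambiguity objected to above, and the trailing case "Test-- " is left untouched.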
In LibreOffice 5.0, using the default :--: and :---: patterns, it's possible to insert n-dash and m-dash immediately, without any ambiguity, so I am closing this issue. Many thanks for your help!
thanks László. you finally knocked down this one!!! :-)
(In reply to László Németh from comment #66)
> In LibreOffice 5.0, using the default :--: and :---: patterns, it's possible
> to insert n-dash and m-dash immediately, without any ambiguity, so I am closing
> this issue.
>
> Many thanks for your help!
Hi Laszlo, I'm not using 5 yet. Does the same replacement method work in 5 as works in 4? In other words: *.---.* ---> em-dash (for example) etc... Do you have a recommended set of autocorrect replacement options? And does: "Test---" insert the correct, closing smart quote? This is really the only hang-up I continue to notice, and it strikes me as, probably, a separate issue.
@all users please remember to remove any other autocorrect replacement for en- and em-dash like those with .* wildcards to avoid conflicts with the new :--: and :---: replacements
(In reply to tommy27 from comment #69)
> @all users
> please remember to remove any other autocorrect replacement for en- and
> em-dash like those with .* wildcards to avoid conflicts with the new :--:
> and :---: replacements
So wait (putting my 'stupid hat' on) the new option should look like this?
Replace................With
:--: ..................en-dash
:---:..................em-dash
And is the ":" the new wildcard? Or does this only pertain to en and em-dash replacement?
the ":" is not exactly a wildcard: you have to type it in order to see the conversion.
as Laszlo said, type :--: to get an en-dash and :---: to get an em-dash.
this should work in any situation: isolated, beginning of word, end of word, middle of word.
grab a 5.0.x daily build and test it. at the moment those replacements are available in the US and Hungarian autocorrect lists, but you may copy them into any other language list as well.
see release notes here: https://wiki.documentfoundation.org/ReleaseNotes/5.0#Emoji_and_in-word_replacement_support
(In reply to tommy27 from comment #72)
> the ":" is not exactly a wildcard
> you have to type it in order to see the conversion
>
> as Laszlo said, type :--: to have en-dash
> and :---: to have em-dash
>
> this should work in any situation: isolated, beginning of word, end of word,
> middle of word
>
> grab a 5.0.x daily build and test it.
> at the moment those replacements are available in US and Hungarian
> autocorrect lists but you may copy in any other languages as well
>
> see release notes here:
> https://wiki.documentfoundation.org/ReleaseNotes/5.0#Emoji_and_in-word_replacement_support
Okay, I tried it. Seriously? Guess I'm not impressed...but...congratulations??? An inelegant, brute force solution. Think I'll go back to what I had. Was working very well. I mean, I could have come up with this 10 years ago: &--& or *--*, or #---#, etc... Or maybe I'm just not getting this...
it seems you did not understand how the :--: really works.
try adding an autocorrect replacement like #--# to en-dash and try to write super#--#man to see what happens.
now try to write super:--:man and see the difference...
IMHO not the right attitude to comment a fix... if you have a better solution, submit a patch
(In reply to tommy27 from comment #74)
> it seems you did not understand how the :--: really works
>
> try adding an autocorrect replacement like #--# to en-dash
> and try to write super#--#man to see what happens
>
> now try to write super:--:man and see the difference...
>
> IMHO not the right attitude to comment a fix...
> if you have a better solution submit a patch
Yeah, I didn't even have to bother with LibreOffice50. Accomplished the same thing in 4.x, like this:
*.#---#*. ---> em-dash
*.#--#*. ---> en-dash
*.:---:*. ---> em-dash
*.:--:*. ---> en-dash
Works beautifully. Every time. It appears that all you've done is to spare me the trouble of having to enter the wild card (in autocorrect). In exchange for sparing me that one-time inconvenience, I now have to type two extra characters every single time I want an en-dash or em-dash (that I didn't have to type before). So, yeah, I call that inelegant and brute force. But then again, maybe I'm not getting something...
you could also use a mixed approach: .*---.* for em-dash (wildcard thing) and :--: for en-dash (emoji thing), so you have to type the extra characters for the en-dash sequence only.
what you probably haven't noticed yet is that the wildcard triggers the autocorrect only when you finish a word and type enter, a dot or another punctuation mark. the emoji triggers the autocorrect immediately, even before you finish the word
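The timing difference described above can be sketched with a small keystroke simulation in Python. It is an illustration only, not how LibreOffice is implemented: the function names, the set of word-ending keys, and the simplified word splitting are all assumptions.

```python
EN_DASH = "\u2013"
EM_DASH = "\u2014"

# Keys assumed to finish a word in this sketch.
WORD_ENDERS = {" ", "\n", ".", ","}


def type_with_immediate(keys):
    """Colon-delimited entries fire as soon as the closing ':' is typed."""
    buffer = ""
    for key in keys:
        buffer += key
        for trigger, dash in ((":---:", EM_DASH), (":--:", EN_DASH)):
            if buffer.endswith(trigger):
                buffer = buffer[: -len(trigger)] + dash
                break
    return buffer


def type_with_wildcard(keys):
    """Wildcard entries fire only when a word-ending key is typed."""
    buffer = ""
    for key in keys:
        if key in WORD_ENDERS:
            # Only the last space-separated word is examined (simplification).
            head, sep, word = buffer.rpartition(" ")
            if "---" in word:
                word = word.replace("---", EM_DASH, 1)
            elif "--" in word:
                word = word.replace("--", EN_DASH, 1)
            buffer = head + sep + word
        buffer += key
    return buffer


print(repr(type_with_immediate("super:--:")))   # replaced before the word ends
print(repr(type_with_wildcard("super--")))      # still two hyphens
print(repr(type_with_wildcard("super--man.")))  # replaced once '.' is typed
```

In this model the colon form has already been converted while the word is still being typed, whereas the wildcard form waits for the full stop.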
//...the emoji immediately trigger the autocorrect even before finishing the word...// Okay. As long as the wildcards still work, I'm happy. Personally, from a writer's perspective, I'll continue using wildcards (which have worked beautifully for me). The very specific and particular use-cases where the extra typing involved in using emoji *might* be worth it --- I'm not seeing yet, but maybe I don't need to.
(In reply to vermontpoet from comment #77)
> ...
> As long as the wildcards still work, I'm happy.
> ...
ok, so keep using the wildcards if you prefer them.
(In reply to vermontpoet from comment #75)
> ...
>
> Accomplished the same thing in 4.x, like this:
>
> *.#---#*. ---> em-dash
> *.#--#*. ---> en-dash
>
> ....
you would obtain the same with:
*.---*. ---> em-dash
*.#--#*. ---> en-dash
so the # for the em-dash is unnecessary