Bug Hunting Session
Bug 83037 - → and ← and ↔ autocorrect collisions
Summary: → and ← and ↔ autocorrect collisions
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Linguistic
Version (earliest affected): 4.4.0.0.alpha0+ Master
Hardware: Other Windows (All)
Importance: medium normal
Assignee: Caolán McNamara
URL:
Whiteboard: target:4.4.0 target:5.3.0 target:5.2.3
Keywords:
Depends on:
Blocks:
 
Reported: 2014-08-25 05:45 UTC by tommy27
Modified: 2017-05-01 17:00 UTC
CC List: 6 users

See Also:
Crash report or crash signature:


Description tommy27 2014-08-25 05:45:25 UTC
tested under Win7x64 using LibO 4.2.6.2, 4.3.0.4 and 4.4.x master

STEPS TO REPRODUCE

1- open a new document 
2- select language English (USA)
3- be sure autocorrect replacement is active (“Tools/AutoCorrect Options/Options”: flag both M and T in “use replacement table”)
4- be sure an English (USA) autocorrect list is activated in the user profile (“Tools/AutoCorrect Options/Replace”: scroll the language list to English (USA) and add a new custom autocorrect entry)
5- type: -> , --> , <- , <--

with 4.2.x and 4.3.x you will see: → , → , ← , ←- 

with 4.4.x you'll see: → , -→ , ← , ←- 

this is caused by a collision of the following autocorrect replacements in 4.4.x:

.*->.*     to    →     (unicode U+2192)
.*-->.*    to    → 
.*<-.*     to    ←     (unicode U+2190)
.*<--.*    to    ←


these entries are present in the default autocorrect lists of LibO 4.4.x master for the following languages:

Catalan, Czech, Danish, Dutch (NL), Dutch (BE), English (AU), English (US), English (UK), Finnish, French, German, Lithuanian, Luxembourgish, Polish, Portuguese (PT), Portuguese (BR), Russian, Spanish, Swedish, Vietnamese

whilst .*->.* and .*<-.* are correctly replaced as → and ← respectively,
.*-->.* and .*<--.* are erroneously replaced as -→ and ←-

this is due to a collision between the .*->.* and .*-->.* replacements, since the -> pattern is already contained in the --> pattern
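To make the collision concrete, here is a minimal Python sketch. It is a hypothetical model, not LibreOffice's lookup code: it assumes a `.*X.*` entry means "the literal X may occur anywhere inside the typed word" and that entries are tried in list order, first match wins. Under those assumptions both entries match the word `-->`, and the shorter one fires first.

```python
# Simplified model of 4.4.x wildcard entries (an assumption for illustration):
# ".*X.*" is read as "the literal X may appear anywhere inside the typed word".
ENTRIES_44 = [
    (".*->.*", "→"),   # -> to → (U+2192)
    (".*-->.*", "→"),  # --> to →
]


def core(pattern: str) -> str:
    """Strip the leading and trailing '.*' wildcards, leaving the literal text."""
    if pattern.startswith(".*"):
        pattern = pattern[2:]
    if pattern.endswith(".*"):
        pattern = pattern[:-2]
    return pattern


def matching_entries(word: str, entries):
    """Return every entry whose literal core occurs somewhere in word."""
    return [(p, r) for p, r in entries if core(p) in word]


def first_match(word: str, entries) -> str:
    """Replace the first occurrence of the first matching core (list order)."""
    for pattern, replacement in entries:
        if core(pattern) in word:
            return word.replace(core(pattern), replacement, 1)
    return word


def longest_match(word: str, entries) -> str:
    """Same as first_match, but try entries with longer cores first."""
    ordered = sorted(entries, key=lambda e: len(core(e[0])), reverse=True)
    return first_match(word, ordered)


print(matching_entries("-->", ENTRIES_44))  # both entries match
print(first_match("-->", ENTRIES_44))       # -→  (the collision)
print(longest_match("-->", ENTRIES_44))     # →
```

`longest_match` only shows that a longest-entry-first rule would hide this particular collision; the report itself proposes a different remedy, removing the wildcards from the arrow entries.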

the .*X.* wildcard form of these entries is actually only present in 4.4.x master, not in 4.2.x and 4.3.x

the wildcard was added to special-character entries by László Németh to fix Bug 81571 - AutoCorrect incorrectly requires and keeps spaces around special characters

see his commit: http://cgit.freedesktop.org/libreoffice/core/commit/?id=b3b6361c555e54ce852d62c80c0bb3d19c1ec78f


while that fix is fine for other replacements such as .*(r).* to ®, which lets things like Sony(R) be replaced as Sony®, it causes collisions between -> and -->

So I suggest removing the wildcards from those entries, since in 4.2.x and 4.3.x the old “-> to →” and “--> to →” replacements without wildcards do not collide with each other.

To avoid the autocorrect collision of .*<-.* and .*<--.*, removing the wildcards is not enough, since the bug is also present with the plain <- and <-- entries in 4.2.x and 4.3.x.

the other part of the bug is that the hyphen is a trigger character (like “.” or space): it fires the autocorrect replacement before you have finished typing the sequence.
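The keystroke-level effect of the hyphen trigger can be reproduced with a small Python model. Everything here is an assumption for illustration, not LibreOffice code: a replacement is looked up only when a trigger character is typed, the "word" is everything since the last space, plain entries match exactly, and the trigger set is a guess that includes "-". In this model the `<--` entry is never reached, because the second hyphen already fires `<-`.

```python
# Characters that fire autocorrect in this model. Including "-" reflects the
# behaviour described in the report; the exact set is an assumption.
TRIGGERS = set(" .,:;-")

# Plain (non-wildcard) entries, as in 4.2.x and 4.3.x.
PLAIN_ENTRIES = {"<-": "←", "<--": "←"}


def type_keys(keys: str, entries: dict, triggers=TRIGGERS) -> str:
    """Simulate typing keys one at a time with trigger-based autocorrect."""
    text = ""
    for ch in keys:
        if ch in triggers:
            # The "word" is everything typed since the last space.
            start = text.rfind(" ") + 1
            word = text[start:]
            if word in entries:
                text = text[:start] + entries[word]
        text += ch
    return text


print(repr(type_keys("<- ", PLAIN_ENTRIES)))   # '← '
print(repr(type_keys("<-- ", PLAIN_ENTRIES)))  # '←- '
```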

So you have to remove the wildcards from .*<-.* and replace .*<--.* entirely with .*←-.* to ←

In summary, we should change these current 4.4.x replacements:
.*->.*     to    →     (unicode U+2192)
.*-->.*    to    → 
.*<-.*     to    ←     (unicode U+2190)
.*<--.*    to    ←

with:
->      to    →     (unicode U+2192)
-->     to    → 
<-      to    ←     (unicode U+2190)
.*←-.*  to    ←
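The proposed table can be checked with a hypothetical keystroke model in Python (the trigger set, list-order matching and the reading of `.*X.*` as "X anywhere in the word" are all assumptions, not LibreOffice's code). The second hyphen of `<--` still fires `<-` and leaves `←-`, but the new `.*←-.*` entry collapses it to `←` at the next trigger.

```python
# Proposed replacement table from the summary. ".*X.*" entries are read as
# "X anywhere in the word" (a simplifying assumption); the others are exact.
PROPOSED = [
    ("->", "→"),
    ("-->", "→"),
    ("<-", "←"),
    (".*←-.*", "←"),
]
TRIGGERS = set(" .,:;-")  # hyphen is a trigger, as described in the report


def lookup(word: str, entries):
    """Return the corrected word, or None if no entry applies."""
    for pattern, replacement in entries:
        if pattern.startswith(".*") and pattern.endswith(".*"):
            literal = pattern[2:-2]
            if literal in word:
                return word.replace(literal, replacement, 1)
        elif word == pattern:
            return replacement
    return None


def type_keys(keys: str, entries) -> str:
    """Simulate typing keys one at a time with trigger-based autocorrect."""
    text = ""
    for ch in keys:
        if ch in TRIGGERS:
            start = text.rfind(" ") + 1
            fixed = lookup(text[start:], entries)
            if fixed is not None:
                text = text[:start] + fixed
        text += ch
    return text


for seq in ("-> ", "--> ", "<- ", "<-- "):
    print(repr(seq), "=>", repr(type_keys(seq, PROPOSED)))
```

In this model all four sequences end up as a single arrow followed by the space that triggered the last replacement.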
Comment 1 Commit Notification 2014-08-26 09:27:45 UTC
Laszlo Nemeth committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=34827767b1551f7a61bcd53947255ad2d2a9e5da

fdo#83037 fix autocorrect collisions of short and long ASCII arrows



The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 2 László Németh 2014-08-26 09:42:27 UTC
I have restored the original behavior of the single --> sequence.

The bad replacement of the single <-- is an interesting problem; maybe it isn't related to the wildcard patch.

Thanks for your report!
Comment 3 tommy27 2014-08-26 13:11:07 UTC
(In reply to comment #2)
> I have restored the original work of the single --> sequence.
> 
> Bad replacement of the single <-- is an interesting problem, maybe it
> doesn't related to the wildcard patch.

the root of this problem is that the hyphen "-" is an autocorrect trigger, like ".", ",", ":", ";", space and Enter

that's why, even without wildcards, you get a collision between <- and <--

this is because typing the last "-" of "<--" triggers the "<-" autocorrect entry.
basically, typing <-- behaves like typing <- followed by a period: the second hyphen fires the replacement just as the period would, which leaves ←-

this "hyphen autocorrect trigger" issue is also the cause of other autocorrect collisions, which are described here:

Bug 67364 - FORMATTING: Autocorrect no longer functions correctly when replacing two hyphens if also an entry with three hyphens exists

Bug 81301 - "apply border" won't work because of "hyphen" to "en-dash" autocorrect collision (nl-NL and nl-BE locales only)

the only way to avoid all three of these bugs would be to tweak the code and add an exception to the "hyphen autocorrect trigger" behavior: treat the hyphen as a trigger only if it is preceded by a letter, a number or any symbol other than another hyphen.

LibO already supports some kinds of autocorrect exceptions, like "Words with Two Initial Capitals" or "Abbreviations (words with no subsequent capitalization)"

so probably we should do something like "no autocorrect trigger after --"
I don't know if this is technically feasible... what's your opinion, László?
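The exception suggested in this comment can be sketched by adding one condition to the trigger check. This is a hypothetical Python model, not LibreOffice's code and not the fix that was eventually committed: with `skip_double_hyphen` on, a hyphen typed right after another hyphen does not fire autocorrect, so the longer `<--` entry gets a chance to match at the next trigger.

```python
# Hypothetical trigger check for the proposed exception: a hyphen fires
# autocorrect only if the character before it is not another hyphen.
TRIGGERS = set(" .,:;-")  # assumed trigger set, including "-"
PLAIN_ENTRIES = {"<-": "←", "<--": "←"}


def is_trigger(text: str, ch: str, skip_double_hyphen: bool) -> bool:
    """Decide whether typing ch after text should fire autocorrect."""
    if ch not in TRIGGERS:
        return False
    if skip_double_hyphen and ch == "-" and text.endswith("-"):
        return False  # "--": wait until the sequence is complete
    return True


def type_keys(keys: str, entries: dict, skip_double_hyphen: bool) -> str:
    """Simulate typing keys one at a time with trigger-based autocorrect."""
    text = ""
    for ch in keys:
        if is_trigger(text, ch, skip_double_hyphen):
            start = text.rfind(" ") + 1  # word = text since the last space
            word = text[start:]
            if word in entries:
                text = text[:start] + entries[word]
        text += ch
    return text


print(repr(type_keys("<-- ", PLAIN_ENTRIES, skip_double_hyphen=False)))  # '←- '
print(repr(type_keys("<-- ", PLAIN_ENTRIES, skip_double_hyphen=True)))   # '← '
```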
Comment 4 tommy27 2014-08-31 19:03:25 UTC
Hi László,
another existing collision is the one that happens between:

<--> to ↔
and 
.*<-.* to ←
and
.*->.* to → 

(adding this to summary notes as well)

so when you type <--> you get ← → instead of ↔

this is again due to the "hyphen autocorrect trigger issue"

basically, the autocorrect engine does not handle the <--> sequence as a single replacement but splits it into <- + ->: the second hyphen triggers autocorrect to replace <- with ←, and then -> is replaced with →
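The split can be reproduced with the same kind of hypothetical keystroke model (Python; the trigger set, list-order matching and wildcard semantics are assumptions, not LibreOffice's code). By default the second hyphen fires `.*<-.*`, the space then fires `.*->.*`, and the result is ← followed by →. With the "no trigger on a hyphen after a hyphen" exception from comment 3, the whole `<-->` reaches the `↔` entry.

```python
# 4.4.x-style entries from this comment, tried in list order (an assumption).
ENTRIES = [
    ("<-->", "↔"),
    (".*<-.*", "←"),
    (".*->.*", "→"),
]
TRIGGERS = set(" .,:;-")  # assumed trigger set, including "-"


def lookup(word: str):
    """Return the corrected word, or None if no entry applies."""
    for pattern, replacement in ENTRIES:
        if pattern.startswith(".*") and pattern.endswith(".*"):
            literal = pattern[2:-2]
            if literal in word:
                return word.replace(literal, replacement, 1)
        elif word == pattern:
            return replacement
    return None


def type_keys(keys: str, skip_double_hyphen: bool = False) -> str:
    """Simulate typing keys one at a time with trigger-based autocorrect."""
    text = ""
    for ch in keys:
        fire = ch in TRIGGERS
        if skip_double_hyphen and ch == "-" and text.endswith("-"):
            fire = False  # proposed exception: no trigger inside "--"
        if fire:
            start = text.rfind(" ") + 1
            fixed = lookup(text[start:])
            if fixed is not None:
                text = text[:start] + fixed
        text += ch
    return text


print(repr(type_keys("<--> ")))                           # '←→ '
print(repr(type_keys("<--> ", skip_double_hyphen=True)))  # '↔ '
```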

I wonder if you could tweak the code to avoid autocorrect triggering when a hyphen is followed by another hyphen, adding an exception similar to the one used to fix Bug 33899 - Autocorrection replaces dates with fractions

moreover, if that "-- no autocorrect" exception could be achieved, it would also fix the aforementioned Bug 67364 and Bug 81301
Comment 5 tommy27 2014-09-28 06:19:49 UTC
(In reply to comment #4)
> ....
> 
> I wonder if you could tweak the code to avoid autocorrect triggering when an
> hyphen is followed by another hyphen, adding an exception similar to the one
> used to fix Bug 33899 - Autocorrection replaces dates with fractions
> 
> moreoved if that " -- no autocorrect" could be achieved it would also
> represent a fix for aforementioned Bug 67364 and Bug 81301

hi László, have you read my suggestion in the previous comment?
do you think that it's technically feasible?
If my idea is right it would fix 3 bugs in a single shot: current Bug 83037, Bug 67364 and Bug 81301.
Comment 6 tommy27 2014-10-16 02:11:39 UTC
@László

the commit I am referring to is:
http://cgit.freedesktop.org/libreoffice/core/commit/?id=10176b1b6f4801d78695451a1eccabf32701e175

I'm not a developer, but I suppose that if you were able to deactivate "/" as an autocorrect trigger under some circumstances (e.g. dates like 1/2/14), it should be possible to do something similar with "-" in order to prevent autocorrect from kicking in when a hyphen is followed by another hyphen ("--")
Comment 7 tommy27 2015-09-06 11:33:05 UTC
<--  corrected into  ←-  instead of  ← 
and <-->  corrected into ← →  instead of ↔  

still occur in the recent LibO 5.1.0 alpha

both issues are caused by the autocorrect engine kicking in as soon as you type a hyphen, as explained in detail in comment 3 and comment 4
Comment 8 tommy27 2016-09-18 09:43:37 UTC
adding Caolán McNamara to CC list

he recently provided a fix for Bug 96369 - ordinal numbers suffixes autocorrect replacements triggered in between words

so I'm asking him to evaluate whether a similar approach could be used to fix this autocorrect collision as well, specifically the possible solution I suggested in comment 3, which is:

add an exception to the "hyphen autocorrect trigger thing" and use it as a trigger only if preceded by letters, numbers or any other symbol rather than another hyphen.

if this solution is technically feasible, it would resolve at least 3 bugs about hyphen autocorrect collisions: Bug 83037, Bug 81301 and Bug 67364
Comment 9 Caolán McNamara 2016-09-21 19:31:55 UTC
What we could do is go back and revisit bug 55693, which made a hyphen a word break character, and not do that unless certain conditions hold that are good enough for the needs of bug 55693
Comment 10 Commit Notification 2016-09-21 19:40:27 UTC
Caolán McNamara committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=0bfefc0b396bd61cc5f508bf86afc12cfadaa483

Resolves: tdf#83037 <-->  corrected into ← →  instead of ↔

It will be available in 5.3.0.

Comment 11 Caolán McNamara 2016-09-21 19:46:35 UTC
If there are no other problems after testing, let me know and we can backport to 5-2
Comment 12 tommy27 2016-09-22 15:53:01 UTC
thanks Caolán.
your fix works and it would be nice to backport it to 5.2.x
no more autocorrect collision typing <- or <-- or <--> or -> or --> 

VERIFIED under Win7 x64 using
LibO 5.3.0.0.alpha0+
Build ID: 4c70a1a6666a079872b8f1966bd398e924dc1d1a
CPU Threads: 8; OS Version: Windows 6.1; UI Render: default; 
TinderBox: Win-x86@42, Branch:master, Time: 2016-09-22_06:54:24
Locale: it-IT (it_IT); Calc: CL
Comment 13 Commit Notification 2016-09-27 09:12:36 UTC
Caolán McNamara committed a patch related to this issue.
It has been pushed to "libreoffice-5-2":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=3d83d4eb6bf3ed8f0fc52d2edd327a6a931a7f47&h=libreoffice-5-2

Resolves: tdf#83037 <-->  corrected into ← →  instead of ↔

It will be available in 5.2.3.

Comment 14 V Stuart Foote 2017-05-01 13:09:42 UTC Comment hidden (off-topic)
Comment 15 V Stuart Foote 2017-05-01 14:32:12 UTC Comment hidden (off-topic)
Comment 16 Tiago Santos 2017-05-01 16:44:37 UTC Comment hidden (off-topic)
Comment 17 V Stuart Foote 2017-05-01 17:00:04 UTC Comment hidden (off-topic)