Bug 65535 - Correcting a misspelled word with a comment in the middle erases the comment (see comment 38)
Summary: Correcting a misspelled word with a comment in the middle erases the comment...
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer (show other bugs)
Version:
(earliest affected)
4.0.0.3 release
Hardware: Other All
: medium normal
Assignee: Not Assigned
URL:
Whiteboard: BSA target:7.3.0 target:7.2.1
Keywords: bibisected, regression
: 77406 141927 (view as bug list)
Depends on:
Blocks: Spell-Checking Writer-Comments
  Show dependency treegraph
 
Reported: 2013-06-08 08:10 UTC by stragu
Modified: 2021-08-10 13:48 UTC (History)
10 users (show)

See Also:
Crash report or crash signature:


Attachments
Screenshot of the issue (138.53 KB, image/png)
2013-06-08 08:10 UTC, stragu
Details
Failed spell check in a commented word (22.49 KB, image/png)
2014-10-10 18:29 UTC, Tupe
Details
a comment is seen as the end (or start) of a word..... (87.77 KB, image/png)
2014-10-11 11:49 UTC, Luuk
Details
bug as seen in LO Writer 6.2 (94.70 KB, image/png)
2019-06-05 03:37 UTC, stragu
Details
F7_after adding comment (60.79 KB, image/jpeg)
2019-06-07 20:04 UTC, Luuk
Details
0001-sw-spell-do-not-suggest-replacements-if-it-deletes-a.patch (4.82 KB, patch)
2021-07-16 11:13 UTC, Justin L
Details

Note You need to log in before you can comment on or make changes to this bug.
Description stragu 2013-06-08 08:10:26 UTC
Created attachment 80511 [details]
Screenshot of the issue

Problem description: 

When adding a comment in a the middle of a word, the word is then recognised as misspelt by the spellchecker. Furthermore, running the Spelling and Grammar tool and correcting the broken word makes the comment disappear.

Steps to reproduce:
1. With Auto Spellcheck on, write a sentence
2. Click in the middle of a word and add a comment (ctrl + alt + C)
3. Write something in the comment field and click out
4. The word is underlined in red
5. Run the Spelling and Grammar tool (F7) and correct the supposedly misspelt word as proposed.

Current behaviour:

The comment breaks the words, makes it seen as misspelt by the spellchecker, and running the the Spelling and Grammar tool and trying to fix the supposedly misspelt word deletes the comment.

Expected behaviour:

The word where the comment is inserted should not be recognised as misspelt by the spellchecker in the first place.
Operating System: Ubuntu
Version: 4.0.3.3 release
Comment 1 Luuk 2013-06-08 11:52:23 UTC
I can confirm this happens, and it happens on Windows too.
Comment 2 stragu 2013-10-30 04:42:56 UTC
This is still the case in 4.1.2.3
Comment 3 Luuk 2013-12-12 09:19:53 UTC
As simple 'workaround' might be to add the comment not in the middle of a word, but at the end (or at the start).

Is it possible to do this automatically? 
This way spelling checker should not give problems. If the comment is at the start of the word, than adding letters to the start of that word should be done AFTER the comment, and if the comment is at the end of the word, it should be done BEFORE that comment ;)
Comment 4 Luuk 2013-12-12 09:20:52 UTC
oops forgot to say it's still a problem in  4.1.3.2  (OS:Windows7)
Comment 5 Tupe 2014-10-10 18:28:19 UTC
Issue is still there in 4.3.1.2

OS: Win7

See new attachment also
Comment 6 Tupe 2014-10-10 18:29:26 UTC
Created attachment 107675 [details]
Failed spell check in a commented word
Comment 7 Luuk 2014-10-11 11:49:08 UTC
Created attachment 107713 [details]
a comment is seen as the end (or start) of a word.....
Comment 8 Urmas 2014-10-12 14:34:00 UTC
The comment is a character within the word and cannot be ignored by the spellchecker.
Also, it is desirable to have comments everywhere within the word, not only at the start/end.
Comment 9 Luuk 2014-10-12 14:44:56 UTC
"Also, it is desirable to have comments everywhere within the word, not only at the start/end."

Can you clarify that? I'm not sure what you mean by that....

I would call this a BUG if the place of a comment influences the working of the spellings checker.
Comment 10 Gordo 2015-03-12 01:49:27 UTC
When using the Spelling and Grammar tool, pressing either the Ignore Once or Ignore All on the word with the comment, will result in an extra letter being added to the word where the comment used to be.

Also, this happens with range comments within the word.  Having a range comment from the beginning of a word to the middle of a word will be marked as misspelt,  but the Spelling and Grammar tool does not delete the comment when ignored.  Having a range comment from the middle of a word to the end of the word does not show as being misspelled.

All of this is assuming that the word you typed is correct.  If the word is actually misspelled, then the only way to correct it is to run the Spelling and Grammar tool again or manually right click on it.

Summary:
Spellchecker deleting comment.
Spellchecker adding extra letter because of comment.

I propose that Automatic Spell Checking and Spelling and Grammar be adjusted to take into account a comment anchor between letters.

See also #32773.

Version: 4.4.1.2
Build ID: 45e2de17089c24a1fa810c8f975a7171ba4cd432
Comment 11 Gordo 2015-05-02 10:07:43 UTC
*** Bug 77406 has been marked as a duplicate of this bug. ***
Comment 12 Olivier R. 2015-12-06 07:20:58 UTC
This bug has already been fixed previously, so this is a regression.
https://bugs.documentfoundation.org/show_bug.cgi?id=32773
Comment 13 Luuk 2015-12-06 13:39:30 UTC
The word count in a current version of a LibreOffice document is OK.
Bug 32773 talks about word count, therefore this bug (65535) is not a regression of bug 32773.
Comment 14 Aron Budea 2016-05-22 00:45:31 UTC
Works in 3.6.0.4, broken since 4.0.0.3, occurs in master, too.
Comment 15 stragu 2017-08-18 23:12:08 UTC
Reproduced in the current still version of LO:

Version: 5.3.5.2
Build ID: 50d9bf2b0a79cdb85a3814b592608037a682059d
CPU Threads: 2; OS Version: Linux 3.13; UI Render: default; VCL: kde4; Layout Engine: new; 
Locale: en-GB (en_GB.UTF-8); Calc: group

Comment in middle of word still makes spellcheck see the word as misspelled.
"Correcting" the word with spellcheck still makes the comment disappear.

However, I could not reproduce what Gordo described, with the tool adding a letter to the word if it was ignored.
Comment 16 Cliff Carlson 2017-12-05 04:57:14 UTC
Version 5.4.3; OS: MAC OS. 

Issue is still present
Comment 17 Muhammet Kara 2018-06-04 12:30:22 UTC

    ce3917fd9ea98ae2089cf4dca9cc742c10935e7f is the first bad commit
    commit ce3917fd9ea98ae2089cf4dca9cc742c10935e7f
    Author: Bjoern Michaelsen <bjoern.michaelsen@canonical.com>
    Date:   Mon Dec 10 01:49:34 2012 +0000
     
        source-hash-b0f170d7df9cff12535d2ecfd146b32b745a8ef8
       
        commit b0f170d7df9cff12535d2ecfd146b32b745a8ef8
        Author:     Caolán McNamara <caolanm@redhat.com>
        AuthorDate: Wed Jul 18 17:00:52 2012 +0100
        Commit:     Caolán McNamara <caolanm@redhat.com>
        CommitDate: Wed Jul 18 17:27:09 2012 +0100
       
            0xFFF9 is a better choice for CH_TXTATR_INWORD than 0x0002.
       
            a) the default properties for the code point make it not split a word it
            appears in into two different words in any break mode we have. Which is what we
            want from a CH_TXTATR_INWORD
       
            b) unicode TR#20 gives for the interlinear annotation anchor: "What to do if
            detected: In a proxy context or browser context, remove U+FFF9", so when we
            need to strip it from text to run that text through e.g. the spellchecker or
            word counting then there's a solid precedent for stripping it
       
            In addition I *do* want the footnote placeholder to break the word it appears
            in, that gives the desired wordcount and cursor travelling behaviour
       
            The BREAKWORD and other *random* selection of CH_TXTATR are still odd choices,
            and there's way too many of them.
       
            Change-Id: I930ff8ff806af448829bc1a1ae6cb92053e9a284
     
    :100644 100644 9d81536e7984b62a455c0e28fd5742cf487e4184 68f099b9e694f9e81548a97f2ca9d3f0d5db65b8 M  ccache.log
    :100644 100644 795b25464739ab134d722aa7e7d34d6535537138 e8290ad8beece52dc0da61c0d2ce30d3182ed694 M  commitmsg
    :100644 100644 c248b041df516d7e0f62b85c84ac0ebb59545ff3 630d940d889565aa459edecb3919bda5e64f0588 M  dev-install.log
    :100644 100644 acfe8830a332b6879cfcec6f74f902d0fb64c562 32e06662778eeba496c378cc426ad2f20e708d32 M  make.log
    :040000 040000 8b82c14ad77224021d3034b6517a3a67f938064f 36b63cc40dd49eb461a38f39f246f13776b0e07d M  opt
Comment 18 QA Administrators 2019-06-05 02:54:33 UTC Comment hidden (obsolete)
Comment 19 stragu 2019-06-05 03:37:05 UTC
Created attachment 151919 [details]
bug as seen in LO Writer 6.2

Confirmed in both 6.2 and 6.3 branches:

- Commenting inside a word / on a range inside a word makes the word being detected as misspelled (you might have to keep writing a bit more text for spellcheck to notice it)
- The spellcheck window shows the word with a "IAA" rectangle where the comment is located
- Correcting the word as suggested removes the comment

I can't reproduce Gordo's issue though (with a letter being added inside the word when ignoring).

Details about versions:

Version: 6.2.4.2
Build ID: 1:6.2.4-0ubuntu0.18.04.1~lo1
CPU threads: 8; OS: Linux 4.15; UI render: default; VCL: gtk3; 
Locale: en-AU (en_AU.UTF-8); UI-Language: en-GB
Calc: threaded

Version: 6.3.0.0.alpha1
Build ID: 547edd20e527fb02900f6174973770d26306e2e7
CPU threads: 8; OS: Linux 4.15; UI render: default; VCL: gtk3; 
Locale: en-AU (en_AU.UTF-8); UI-Language: en-US
Calc: threaded
Comment 20 Luuk 2019-06-07 20:03:28 UTC Comment hidden (obsolete)
Comment 21 Luuk 2019-06-07 20:04:18 UTC Comment hidden (obsolete)
Comment 22 Luuk 2019-06-07 20:05:37 UTC Comment hidden (obsolete)
Comment 23 stragu 2019-06-08 12:17:34 UTC
Hi Luuk

I also noticed that it might not straight away underline the word in red, but I believe that if you keep editing your document, it will eventually register it and mark it as misspelled with the red wiggly line (also using 6.2).

Would you mind confirming that?
Comment 24 Luuk 2019-06-08 13:04:43 UTC
Indeed, when starting to type some red wiggly line is shown.
Comment 25 Luuk 2019-06-08 13:07:20 UTC
And, when choosing 'Correct' the comment is lost.
Comment 26 stragu 2019-07-11 12:55:45 UTC
This is still present in:

Version: 6.3.0.1
Build ID: 41ac97386aba908b6db860cfb4cfe2da871886ae
CPU threads: 4; OS: Linux 4.15; UI render: default; VCL: gtk3; 
Locale: en-AU (en_AU.UTF-8); UI-Language: en-US
Calc: threaded
Comment 27 stragu 2019-10-22 13:48:36 UTC
Reproduced as described in comment 19, using:

Version: 6.4.0.0.alpha1
Build ID: cc57df8f942f239d29cb575ea5a7cb01405db787
CPU threads: 8; OS: Linux 4.15; UI render: default; VCL: gtk3; 
Locale: en-AU (en_AU.UTF-8); UI-Language: en-US
Calc: threaded
Comment 28 Aron Budea 2020-07-23 15:22:21 UTC
This might be a complete red herring, but recently a bug was found with how search was handling special characters (eg. soft-hyphens, comment or footnote markers), and was fixed with the following commit:
https://cgit.freedesktop.org/libreoffice/core/commit/?id=d8270636a57e7dc68ede51308c380e2098f765d7

The commit is only relevant for search/replace(!), but perhaps the logic is similar in both cases, and if someone gets around to looking at this bug, they might find something useful in that reference.
Comment 29 stragu 2020-10-15 13:27:51 UTC
Still reproducible with:

Version: 7.0.2.2
Build ID: 00(Build:2)
CPU threads: 8; OS: Linux 4.15; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
Ubuntu package version: 1:7.0.2_rc2-0ubuntu0.18.04.2
Calc: threaded
Comment 30 stragu 2021-02-02 13:58:01 UTC
Still reproducible as described in original description and comment 19, with:

Version: 7.1.0.3 / LibreOffice Community
Build ID: f6099ecf3d29644b5008cc8f48f42f4a40986e4c
CPU threads: 8; OS: Linux 4.15; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
Calc: threaded

Could this be considered as a data loss issue, given that running the spellcheck and correcting the words erroneously reported as misspelled would delete the comment?
Comment 31 Aron Budea 2021-04-28 16:46:26 UTC
*** Bug 141927 has been marked as a duplicate of this bug. ***
Comment 32 stragu 2021-06-22 12:17:32 UTC
Reproduced as in Comment 19 with:

Version: 7.3.0.0.alpha0+ / LibreOffice Community
Build ID: e3086b58eb5427d520b86c185f9d911bb6f7a3a0
CPU threads: 8; OS: Linux 4.15; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
TinderBox: Linux-rpm_deb-x86_64@86-TDF, Branch:master, Time: 2021-06-21_15:37:11
Calc: threaded

and:

Version: 7.2.0.0.beta1 / LibreOffice Community
Build ID: c6974f7afec4cd5195617ae48c6ef9aacfe85ddd
CPU threads: 8; OS: Linux 4.15; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
Calc: threaded
Comment 33 Justin L 2021-07-13 19:28:56 UTC
This is a fairly minimal problem, because if the user doesn't like having words marked as misspelled because of a comment in the middle, they can simply not add mid-word comments.

It is a bit of a catch-22. It would take a LOT of effort to avoid deleting the comment when a misspelled word is fixed (comment 25), so it would seem the best and safest approach would just be to ignore any such words. (But perhaps already solved for similar situations like track-change-deleted-stuff and hidden-text?)

On the other hand, if the word truly is misspelled, then the user is going to be upset that it isn't red-squigglied.

Perhaps the best approach would be:
-strip the comment marker out of the word before submitting to spell checker
-offer zero replacement choices (to avoid deleting the comment) forcing the user to fix manually.

Bug 66043 sounds so similar that I am surprised that it didn't fix this too. Bug 74177 also has some common elements.
Comment 34 Justin L 2021-07-14 13:28:53 UTC
(In reply to Justin L from comment #33)
> I am surprised that it didn't fix this too.
It kindo-of did. The word is not flagged as misspelled IF there is a hidden character, or track-changes somewhere in the same paragraph.

This will be rather complicated, and very prone to unintended consequences.
Comment 35 stragu 2021-07-14 23:39:13 UTC
Note that the end of a commented range also breaks the spellcheck.

I can think of two options to mostly fix this without breaking other things:

When the cursor is inside a word, or when a range is selected in which the end of the range is inside a word:

Option 1: ...for no range selection, an inserted comment is moved automatically to the closest word boundary. For a range selection, the end of the range is moved to the end boundary of the last word. (A bit like what Luuk proposed in Comment 3.)

Option 2: ...for no range selection, an inserted comment automatically applies to the whole word's range instead. For a range selection, the end of the range is moved to the end boundary of the last word.

Both these solutions would circumvent the issue of marking the word as misspelled.

I am not sure how much these workarounds would be a problem for different languages and scripts, but I think it would work acceptably well for languages using the Latin script with short words.

I would prefer option 2, as I suspect many people who insert a comment inside a word do so to refer to the whole word, and because it seems more consistent in its behaviour.

Urmas (Comment 8) mentioned that it might be undesired behaviour to not be able to comment in the middle of the word, but I find it hard to think of cases where it would be that much of an issue (especially when compared to the problem it would fix: broken spellcheck, and comment dataloss).

The issue with Justin's solution to "strip the comment character before submitting to spellcheck" (Comment 34) is that the deleting of comments when correcting misspelled words would remain, as far as I understand it.
Comment 36 Justin L 2021-07-15 05:52:49 UTC
(In reply to Justin L from comment #33)
> perhaps already solved for similar situations like
> track-change-deleted-stuff and hidden-text?)
No. Hidden text is just deleted (which makes sense).

Track-changes is an entirely different beast and its whole purpose it to keep deleted content around, so it too can just delete everything. So that won't help in this situation either.

Note that it will be best to check for comments BEFORE calling lcl_MaskRedlinesAndHiddenText, since that junction just adds MORE CH_TXTATR_INWORD's that would need to be tested to see if they are comment markers.
Comment 37 Justin L 2021-07-15 09:04:48 UTC
(In reply to Justin L from comment #33)
> It is a bit of a catch-22. It would take a LOT of effort to avoid deleting
> the comment when a misspelled word is fixed.
This is ALREADY a problem, so it won't be something introduced by trying to ignore comment-anchors during the spell check.

Proposed patch for fixing "the word is then recognized as misspelt" at https://gerrit.libreoffice.org/c/core/+/118972.

Still needed is a patch for "correcting the broken word makes the comment disappear" in case the word truly needed correcting.
Comment 38 Justin L 2021-07-15 19:28:14 UTC
(In reply to Justin L from comment #37)
> Still needed is a patch for "correcting the broken word makes the comment
> disappear" in case the word truly needed correcting.

Right-clicking on the word-with-mid-comment already brings up a different context menu without anything related to spell checking. So at the moment that situation is safe.

Running the spell-check dialog is very unsafe however (especially since you don't really see the context of the word and notice that it has a comment attached). That is MUCH harder to deal with. What looked promising was UPN_MAX_NUMBER_OF_SUGGESTIONS = 0, but that seems to be completely ignored (a bug? Well actually it seems to be used in a different context only). UPH_IS_USE_DICTIONARY_LIST = false doesn't give what is desired either.

The SpellChecker::GetProposals is what gives the choices to 
txtedt.cxx's  SwTextNode::Spell calls to 
pArgs->xSpeller->spell( rWord, static_cast<sal_uInt16>(eActLang), aPropVals );

Trying to trick it with LANGUAGE_NONE doesn't work because it just returns as "not a spelling error".

I don't think I'll be able to solve this last part by tweaking any existing stuff. A work-around hack will need to be created (and I don't like doing that, especially not in areas I am unfamiliar with).
Probably need to make a special function so that you can create an empty Proposal via SpellAlternatives::CreateSpellAlternatives( rWord, LANGUAGE_DONTKNOW, SpellFailure::SPELLING_ERROR, Sequence< OUString >( 0 ) )
Comment 39 stragu 2021-07-16 01:14:58 UTC
Justin, what do you think of the suggestions in Comment 35?
If that was implemented, there would never be a comment character inside a word, and therefore correcting a word would never make the comment disappear (according to what I tested so far).

If the drawbacks of that solution are too big (i.e. if we think that users will want to have the option to comment at a precise spot inside a word), another option could be to move the comment only when the word is corrected. The steps would be:

- User clicks to correct the word / picks a replacement
- LO checks if a comment character is inside the string
- If TRUE, move comment character to end of string
- Replace the string with correction

It would make sense, as there is no straight-forward way to decide where exactly the comment should end up being located in the new string, which might be completely different to the original in length and characters.
Comment 40 Justin L 2021-07-16 06:13:34 UTC
(In reply to stragu from comment #39)
> Justin, what do you think of the suggestions in Comment 35?
Certainly not.
There are many reasons why a user might want to insert a comment in the middle of a word. But the most important reason not to change is because it has already worked that way. It would break existing documents and limit user's choices.

The most important thing is to never let your individual problem drive the solution. The only reason to change existing behaviour is if it is fundamentally wrong.
Comment 41 Justin L 2021-07-16 11:13:15 UTC
Created attachment 173630 [details]
0001-sw-spell-do-not-suggest-replacements-if-it-deletes-a.patch

This patch is a bit too hack-y to actually submit, but it is probably the best I can do.
Comment 42 stragu 2021-07-16 12:12:48 UTC
(In reply to Justin L from comment #40)

> It would break existing documents and limit
> user's choices.

Ah I didn't even think of that! That makes sense :)
Comment 43 BogdanB 2021-07-31 10:37:01 UTC
Retested. The same in
Version: 7.3.0.0.alpha0+ / LibreOffice Community
Build ID: 5aa74aa1e6fac571f99146ebcb6adc9feb1459ad
CPU threads: 4; OS: Linux 5.8; UI render: default; VCL: gtk3
Locale: ro-RO (ro_RO.UTF-8); UI: en-US
TinderBox: Linux-rpm_deb-x86_64@86-TDF, Branch:master, Time: 2021-07-28_19:35:14
Calc: threaded
Comment 44 Commit Notification 2021-08-02 15:23:00 UTC
Justin Luth committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/a6555eb809e2580562cd431085e35a23b7d47f9a

tdf#65535 sw spell: ignore comment marker when checking spelling

It will be available in 7.3.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 45 Commit Notification 2021-08-09 17:31:04 UTC
Justin Luth committed a patch related to this issue.
It has been pushed to "libreoffice-7-2":

https://git.libreoffice.org/core/commit/ae63df06b49c0439f8a7532656dbf6063dccdc4f

tdf#65535 sw spell: ignore comment marker when checking spelling

It will be available in 7.2.1.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 46 stragu 2021-08-10 13:34:26 UTC
Thank you, Justin.

Testing in the following version:

Version: 7.3.0.0.alpha0+ / LibreOffice Community
Build ID: b2130ad3fda841c68a0436fbddf29bcedede0af5
CPU threads: 8; OS: Linux 4.15; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
TinderBox: Linux-rpm_deb-x86_64@86-TDF, Branch:master, Time: 2021-08-09_13:03:07
Calc: threaded

... I can see that your commit does fix the part about "a valid word with a comment ancher inside is recognised as misspelt", which is great!

The remaining issue, however, is "correcting a misspelt word with a comment anchor inside it will delete the comment".