Bug 161737 - Regression: spell checking triggered by NNBSP
Summary: Regression: spell checking triggered by NNBSP
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Linguistic
Version (earliest affected): 24.2.4.2 release
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard: target:25.2.0 target:24.8.0.0.beta2
Keywords: bibisectRequest, regression
Depends on:
Blocks: Word-Line-Break
 
Reported: 2024-06-21 20:00 UTC by Maxime
Modified: 2024-07-08 15:09 UTC
CC List: 9 users

See Also:
Crash report or crash signature:


Attachments
A text document using the NNBSP character. (13.66 KB, application/vnd.oasis.opendocument.text)
2024-06-22 04:05 UTC, Maxime

Description Maxime 2024-06-21 20:00:54 UTC
In LO 24.2.4.2, a word next to a narrow no-break space (NNBSP) is flagged as a spelling mistake, unlike in the previous version (24.2.3.2).
Using a classic no-break space (NBSP) does not trigger any error.

Steps to Reproduce:
1. Create a new document in LibreOffice (Writer, Calc, etc.; the application does not matter)
2. Enable spell checking (if it is not already enabled)
3. Write a sentence and replace a space with an NNBSP (U+202F)
4. Wait for spell checking
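For step 3, typing U+202F directly can be awkward. The small Python sketch below (any Python 3 interpreter) prints a French sample sentence containing one NNBSP (before "!") and one NBSP (before "?") that can be pasted into a document. The sentence and the helper `describe_spaces` are illustrative only, not part of the original report.

```python
# Build sample text for reproducing the bug: one NNBSP and one NBSP.
NNBSP = "\u202f"  # NARROW NO-BREAK SPACE
NBSP = "\u00a0"   # NO-BREAK SPACE

sample = f"Bonjour{NNBSP}! Comment allez-vous{NBSP}?"


def describe_spaces(text):
    """Return (index, code point) pairs for every NNBSP or NBSP in text."""
    return [(i, f"U+{ord(ch):04X}") for i, ch in enumerate(text)
            if ch in (NNBSP, NBSP)]


if __name__ == "__main__":
    print(sample)
    print(describe_spaces(sample))  # [(7, 'U+202F'), (28, 'U+00A0')]
```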

Actual Results:
The word next to the NNBSP is now flagged as misspelled.

Expected Results:
Words should be checked without considering NNBSP, NBSP, etc.

Reproducible: Always

User Profile Reset: No

Version: 24.2.4.2 (X86_64) / LibreOffice Community
Build ID: 420(Build:2)
CPU threads: 8; OS: Linux 6.1; UI render: default; VCL: kf5 (cairo+xcb)
Locale: fr-FR (fr_FR.UTF-8); UI: fr-FR
Debian package version: 4:24.2.4-1~bpo12+1
Calc: threaded
Comment 1 m_a_riosv 2024-06-21 21:56:03 UTC
Please attach a sample file showing the issue.
Comment 2 Maxime 2024-06-22 04:05:28 UTC
Created attachment 194905
A text document using the NNBSP character.

Here is a showcase of the issue.
Best regards.
Comment 3 V Stuart Foote 2024-06-22 10:11:49 UTC
Confirmed. Present in current master, not present in the 24.2.3.2 (433d9c2) release.

Perhaps something more recent changed how NNBSP is handled by the BreakIterator for spell checking. There was Jonathan's work for bug 49885, but that didn't get backported to 24.2.


Version: 25.2.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 7d93b585f5366ce22aaf174a1b463b004739f588
CPU threads: 8; OS: Windows 10 X86_64 (10.0 build 19045); UI render: Skia/Vulkan; VCL: win
Locale: en-US (en_US); UI: en-US
Calc: CL threaded
Comment 4 Heiko Tietze 2024-06-24 08:16:40 UTC
(In reply to Maxime from comment #0)
> Words should be checked without considering NNBSP, NBSP, etc.
Depending on the language you may need NNBSP to type a word.
Comment 5 Maxime 2024-06-24 16:01:53 UTC
(In reply to Heiko Tietze from comment #4)
> (In reply to Maxime from comment #0)
> > Words should be checked without considering NNBSP, NBSP, etc.
> Depending on the language you may need NNBSP to type a word.

You're right. Let me correct that: “words [used to] be checked without considering NNBSP, NBSP, etc.”
As for use cases, I know for sure that neither German nor French needs NNBSP to type a word; it is only related to typography.
Regarding non-European languages, you have a point. According to Wikipedia: “It was introduced in Unicode 3.0 for Mongolian, to separate a suffix from the word stem without indicating a word boundary.” Yet most other text editors still treat NNBSP as a word boundary for spell checking.
In addition, there is no mention of such a change in the LO release notes for 24.2 (nor in those for 24.2.4 RC1 and RC2), which is why I described this behaviour as a bug.
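The character properties discussed above can be inspected with Python's standard `unicodedata` module: both NBSP and NNBSP are space separators (general category `Zs`). For comparison, Python's own `str.split()` treats both as whitespace. This sketch only illustrates Unicode character data; it says nothing about LibreOffice's BreakIterator rules, and the helper `space_info` is a hypothetical name.

```python
import unicodedata


def space_info(ch):
    """Return (code point, Unicode name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


if __name__ == "__main__":
    for ch in ("\u00a0", "\u202f"):
        print(*space_info(ch))
    # With no arguments, str.split() splits on any Unicode whitespace,
    # which includes both NBSP and NNBSP.
    print("un\u202fmot et\u00a0un autre".split())
```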
Comment 6 V Stuart Foote 2024-06-27 11:27:57 UTC
Regression from commit 44699b3de37f07090ac6fee1cd97aa76036e9700
"tdf#49885 BreakIterator rule upgrades"

https://gerrit.libreoffice.org/c/core/+/169618
Comment 7 Commit Notification 2024-06-27 14:49:13 UTC
László Németh committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/6e002da1615b52cda4e9331e87878458b1fe9677

tdf#161737 i18npool: fix fake spelling alarms with NNBSP

It will be available in 25.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 8 Commit Notification 2024-06-27 14:50:16 UTC
László Németh committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/2b9fee5a3e9d1eae65932fb0f08f0216f8a30cf7

tdf#161737 i18npool: fix bad word selection with NNBSP

It will be available in 25.2.0.

Comment 9 László Németh 2024-06-27 14:53:39 UTC
Fixed in master; started back-porting to 24.8.

@Maxime and all: thanks for your bug report and feedback!

Full commit descriptions:

tdf#161737 i18npool: fix fake spelling alarms with NNBSP

Fix word break by excluding the narrow no-break space at the
end of words for spell checking.

This was a problem e.g. for French, where a narrow no-break space
(inserted automatically or manually) is used to get correct
typography before exclamation and question marks, and also after
opening and before closing guillemets, if the OpenType/Graphite
font doesn't have this feature.

Regression from commit 44699b3de37f07090ac6fee1cd97aa76036e9700
"tdf#49885 BreakIterator rule upgrades".

Note: this also fixes the problem where digits separated
by an NNBSP thousands separator weren't handled by spell checking,
raising fake spelling alarms when "Check words with numbers"
was enabled in Tools->Options->Languages and Locales->Writing Aids.
(TODO: in the case of thousands separators, remove the NBSP in the
linguistic module or in the spell checking dictionaries, to allow
checking numbers with thousands separators and the correct suffix.)
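The first commit changes the word-breaking rules so that a trailing NNBSP is no longer part of the word passed to the spell checker, and digits grouped with NNBSP thousands separators no longer cause false alarms. The toy Python sketch below only illustrates that intended outcome. It assumes a simple regular-expression tokenizer, which is not how i18npool implements it (LibreOffice uses BreakIterator rule files); the name `spell_tokens` and the pattern are hypothetical.

```python
import re

NNBSP = "\u202f"

# Hypothetical token pattern (not LibreOffice's rules):
#  - a number: digit groups optionally joined by NNBSP thousands separators
#  - a word: letters, optionally with internal apostrophes or hyphens
TOKEN = re.compile(
    rf"\d+(?:{NNBSP}\d{{3}})*"           # e.g. "1 000 000" with NNBSP separators
    r"|[^\W\d_]+(?:['’-][^\W\d_]+)*"     # letters: \w minus digits and underscore
)


def spell_tokens(text):
    """Return the substrings a spell checker would receive.

    NNBSP is never included at the end of a word, so "Bonjour<NNBSP>!"
    yields "Bonjour" rather than "Bonjour<NNBSP>".
    """
    return [m.group() for m in TOKEN.finditer(text)]


if __name__ == "__main__":
    print(spell_tokens(f"Bonjour{NNBSP}! Ça va\u00a0?"))
    print(spell_tokens(f"Il y a 1{NNBSP}000{NNBSP}000 habitants."))
```

Keeping an NNBSP-grouped number as a single token is one possible design choice here; the TODO above suggests that the real linguistic module might instead strip the separator before checking.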

=======================

tdf#161737 i18npool: fix bad word selection with NNBSP

Fix the word breaking rules for editing, too. Previously
the word was selected together with the following narrow
no-break space, e.g. for French words before exclamation and
question marks (where the narrow no-break space gives correct
typography if the OpenType/Graphite font doesn't have
this feature).

Add this and the previous fixes for Hungarian as well, which
is handled by extra word-breaking rule files.

Follow-up to commit 6e002da1615b52cda4e9331e87878458b1fe9677
"tdf#161737 i18npool: fix fake spelling alarms with NNBSP".
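For the second commit, the visible effect is in editing: selecting a word (for example by double-clicking it) should stop before a following NNBSP. Below is a minimal Python sketch of that behaviour. It assumes a simple character-class scan rather than LibreOffice's actual BreakIterator, and `select_word` and `is_word_char` are illustrative names, not LibreOffice APIs.

```python
NNBSP = "\u202f"


def is_word_char(ch):
    """Letters, digits and apostrophes count as word characters; NNBSP does not."""
    return ch.isalnum() or ch in "'’"


def select_word(text, pos):
    """Return (start, end) of the word containing index pos, end exclusive.

    Returns (pos, pos) if pos is not inside a word.
    """
    if not (0 <= pos < len(text)) or not is_word_char(text[pos]):
        return (pos, pos)
    start = pos
    while start > 0 and is_word_char(text[start - 1]):
        start -= 1
    end = pos
    while end < len(text) and is_word_char(text[end]):
        end += 1
    return (start, end)


if __name__ == "__main__":
    text = f"Vraiment{NNBSP}?"
    start, end = select_word(text, 3)
    print(repr(text[start:end]))  # 'Vraiment' (the NNBSP is not selected)
```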
Comment 10 Commit Notification 2024-07-08 15:09:18 UTC
László Németh committed a patch related to this issue.
It has been pushed to "libreoffice-24-8":

https://git.libreoffice.org/core/commit/fc2bba731459b5ba2ed88fc8212f90b6ae08c15a

tdf#161737 i18npool: fix fake spelling alarms with NNBSP

It will be available in 24.8.0.0.beta2.

Comment 11 Commit Notification 2024-07-08 15:09:20 UTC
László Németh committed a patch related to this issue.
It has been pushed to "libreoffice-24-8":

https://git.libreoffice.org/core/commit/eb815bdee64f9eb9527cb58e6b75f0bd69184c71

tdf#161737 i18npool: fix bad word selection with NNBSP

It will be available in 24.8.0.0.beta2.
