Bug 162514 - Spellchecker fails to recognize words ending in period (typically abbreviation)
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 24.8.0.0 alpha1+
Hardware: All All
Importance: medium normal
Assignee: Jonathan Clark
URL:
Whiteboard: target:25.2.0 target:24.8.5 target:24...
Keywords: bibisected, regression
Depends on:
Blocks: Spell-Checking
Reported: 2024-08-18 18:32 UTC by Lars Jødal
Modified: 2024-12-04 10:21 UTC
CC List: 4 users

See Also:
Crash report or crash signature:


Attachments
Sample Writer file with abbreviations including a period as last character (20.29 KB, application/vnd.oasis.opendocument.text)
2024-11-26 21:20 UTC, Lars Jødal

Description Lars Jødal 2024-08-18 18:32:03 UTC
Description:
In LO Writer 24.8.0.3, the spellchecker seems to have stopped recognizing words ending with a period. In a language like Danish, an ending period is a necessary part of many abbreviations. I have found it more difficult to find English examples (abbreviations like "i.e." and "e.g." do not seem to be included in the standard English dictionary).

As an example, "dr." is an abbreviation in Danish for "doktor" (same word as English "doctor"). This abbreviation has been recognized in LO Writer version 24.2 and earlier, but not with 24.8.0, using the same dictionary.

Interestingly, the word IS recognized in Calc and Impress, so it seems to be a problem specifically for Writer.

Steps to Reproduce:
1. Open LO Writer.
2. To use my example: Type "dr." (or "etc.") and change spell-check language to Danish

Actual Results:
The word "dr." (or the word "etc.") is underlined in red as a spellchecking error.

Expected Results:
The word should be recognized by the spellchecker, as it is in the spellchecking dictionary. This is corroborated by right-clicking the word for suggestions: "dr." (or "etc.") is among the suggestions.


Reproducible: Always


User Profile Reset: Yes

Additional Info:
Tested with LO 24.8.0.3 and 24.8.0.0.alpha1:

Version: 24.8.0.3 (X86_64) / LibreOffice Community
Build ID: 0bdf1299c94fe897b119f97f3c613e9dca6be583
CPU threads: 4; OS: Windows 10 X86_64 (10.0 build 19045); UI render: Skia/Raster; VCL: win
Locale: da-DK (da_DK); UI: da-DK
Calc: threaded

Version: 24.8.0.0.alpha1 (X86_64) / LibreOffice Community
Build ID: a17e39caaf73108bee692d6f64a44c62f4066f1d
CPU threads: 4; OS: Windows 10 X86_64 (10.0 build 19045); UI render: Skia/Raster; VCL: win
Locale: da-DK (da_DK); UI: en-GB
Calc: threaded
Comment 1 Jeppe Bundsgaard 2024-09-12 21:56:19 UTC
I can confirm on: 
Version: 24.8.0.3 (X86_64) / LibreOffice Community
Build ID: 480(Build:3)
CPU threads: 8; OS: Linux 6.8; UI render: default; VCL: gtk3
Locale: da-DK (da_DK.UTF-8); UI: da-DK
Ubuntu package version: 4:24.8.0~rc3-0ubuntu0.24.04.1~lo2
Calc: threaded

It is a really serious regression.
Comment 2 Jeppe Bundsgaard 2024-09-14 11:29:53 UTC
I downloaded the latest Danish spellchecker here: https://stavekontrolden.dk/?dictionaries=1

I installed it in LibreOffice 24.8.1.2.
The correct abbreviation "osv." is considered a spelling mistake. 

I extracted da_DK.dic and .aff and used it in hunspell:

> jeppe@d46703:~$ hunspell -d /home/jeppe/Downloads/da_DK
> Hunspell 1.7.2
> osv.
> *

As you see, it is considered correct. This points to LibreOffice 24.8.1.2 having a bug.
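
The manual check above could also be scripted. The following is a minimal sketch, assuming the `hunspell` command-line tool is installed and on the PATH; the helper names `is_accepted` and `check_word` are hypothetical. It relies on the ispell-style result lines that hunspell prints in pipe mode (`-a`): `*`, `+` and `-` for accepted words, `&` and `#` for rejected ones. How trailing periods are tokenized in pipe mode has not been verified here, so treat the result for abbreviations as something to confirm against the interactive session.

```python
import subprocess


def is_accepted(result_line: str) -> bool:
    """Interpret one ispell-style result line.

    '*' = found, '+' = found via affix rule, '-' = found as compound.
    '&' (miss with suggestions) and '#' (miss, no suggestions) are rejections.
    """
    return result_line[:1] in ("*", "+", "-")


def check_word(word: str, dict_path: str) -> bool:
    """Ask hunspell whether `word` is accepted by the dictionary at dict_path.

    dict_path is the dictionary path without the .dic/.aff extension,
    e.g. the da_DK path used in the session above.
    """
    proc = subprocess.run(
        ["hunspell", "-a", "-d", dict_path],
        input=word + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    # Skip the version banner and the blank separator lines.
    results = [
        line
        for line in proc.stdout.splitlines()
        if line and not line.startswith(("@", "Hunspell"))
    ]
    return bool(results) and all(is_accepted(line) for line in results)
```

With the extracted Danish dictionary, `check_word("osv.", "/home/jeppe/Downloads/da_DK")` would be the scripted counterpart of the session shown above.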
Comment 3 Shantanu 2024-09-17 04:27:31 UTC
The abbreviation 'मा.' stands for the Marathi word 'माननीय' (respected). While it is included in the spell check, it has not been supported by any version I have tested so far.

Version: 24.8.1.2 (X86_64) / LibreOffice Community
Build ID: 87fa9aec1a63e70835390b81c40bb8993f1d4ff6
CPU threads: 1; OS: Windows 10 X86_64 (10.0 build 14393); UI render: Skia/Raster; VCL: win
Locale: en-US (en_US); UI: en-US
Calc: threaded

Version: 24.2.1.2 (AARCH64) / LibreOffice Community
Build ID: 420(Build:2)
CPU threads: 2; OS: Linux 6.5; UI render: default; VCL: gtk3
Locale: en-US (C.UTF-8); UI: en-US
Ubuntu package version: 4:24.2.1~rc2-0ubuntu0.22.04.1~lo1
Calc: threaded

Version: 7.6.7.2 (X86_64) / LibreOffice Community
Build ID: dd47e4b30cb7dab30588d6c79c651f218165e3c5
CPU threads: 1; OS: Windows 10.0 Build 14393; UI render: Skia/Raster; VCL: win
Locale: en-US (en_US); UI: en-US
Calc: threaded

There are only a few such words in Marathi, which is why I had not reported it earlier.
Comment 4 Lars Jødal 2024-11-24 16:24:09 UTC
I have bibisected this bug with this commit as result:
 d311ff6407d88f0f18dc9ef6d05005e5f7473487 is the first bad commit
commit d311ff6407d88f0f18dc9ef6d05005e5f7473487
Author: Norbert Thiebaud <...@gmail.com>
Date:   Thu Jun 6 09:14:29 2024 -0700

    source 44699b3de37f07090ac6fee1cd97aa76036e9700

    source 44699b3de37f07090ac6fee1cd97aa76036e9700

 instdir/program/i18npoollo.dll | Bin 3181568 -> 3099136 bytes
 instdir/program/setup.ini      |   2 +-
 instdir/program/version.ini    |   2 +-
 3 files changed, 2 insertions(+), 2 deletions(-)
Comment 5 Lars Jødal 2024-11-26 21:20:45 UTC
Created attachment 197813 [details]
Sample Writer file with abbreviations including a period as last character

Attached a sample file with explanations in English and example words in Danish. To check it, first install the Danish dictionary, either from the LO distribution or from the official extensions repository. A link is provided in the file. (If you see no spelling errors among the Danish words, check that you have the Danish dictionary installed.)
Comment 6 Commit Notification 2024-11-29 17:41:54 UTC
Jonathan Clark committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/f4fe6df6aa92573368c3fa0edb9fd03e64d9d059

tdf#162514 i18npool: Handle abbreviations in dictionary breakiterator

It will be available in 25.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
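
The commit subject above refers to handling abbreviations in the dictionary break iterator. As an illustration only, here is a minimal sketch of why the word boundary matters for dictionary entries such as "dr." and "osv.". It is not the LibreOffice i18npool code, whose details are not reproduced in this report; the function names and the tiny word list are hypothetical.

```python
import re

# Hypothetical mini-dictionary; a real one would come from the .dic/.aff files.
DICTIONARY = {"dr.", "osv.", "hansen"}


def naive_words(text):
    """Split on word characters only, so a trailing period is never part of a word."""
    return re.findall(r"\w+", text)


def abbreviation_aware_words(text, dictionary):
    """Keep a trailing period when word + '.' is itself a dictionary entry."""
    words = []
    for match in re.finditer(r"\w+", text):
        word = match.group()
        end = match.end()
        followed_by_period = end < len(text) and text[end] == "."
        if followed_by_period and (word + ".").lower() in dictionary:
            word += "."
        words.append(word)
    return words


def misspelled(words, dictionary):
    """Return the words that have no (case-insensitive) dictionary entry."""
    return [w for w in words if w.lower() not in dictionary]


text = "dr. Hansen osv."
print(misspelled(naive_words(text), DICTIONARY))
print(misspelled(abbreviation_aware_words(text, DICTIONARY), DICTIONARY))
```

With the naive split, "dr" and "osv" are looked up without their period and flagged, which matches the symptom reported here. With the abbreviation-aware split, the same text yields no misspellings.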
Comment 7 Commit Notification 2024-12-01 10:51:15 UTC
Jonathan Clark committed a patch related to this issue.
It has been pushed to "libreoffice-24-8":

https://git.libreoffice.org/core/commit/a6e516fd615004d3025f2ffd696b6c28fa494cb4

tdf#162514 i18npool: Handle abbreviations in dictionary breakiterator

It will be available in 24.8.5.
Comment 8 Lars Jødal 2024-12-02 17:10:25 UTC
I can confirm that the bug has been resolved with the current Master version. Thanks for fixing this regression!

Version: 25.2.0.0.alpha1+ (X86_64) / LibreOffice Community
Build ID: 44ccd392be12dad23e216fb3eb2c2e5b275eee75
CPU threads: 4; OS: Windows 10 X86_64 (10.0 build 19045); UI render: Skia/Raster; VCL: win
Locale: da-DK (da_DK); UI: en-US
Calc: threaded
Comment 9 Commit Notification 2024-12-04 10:21:26 UTC
Jonathan Clark committed a patch related to this issue.
It has been pushed to "libreoffice-24-8-4":

https://git.libreoffice.org/core/commit/0eea2bb4f7af0ef704b9bb90c619a3a414652d81

tdf#162514 i18npool: Handle abbreviations in dictionary breakiterator

It will be available in 24.8.4.