Bug 101962 - Words with combining characters fail spelling check (win only)
Status: ASSIGNED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Linguistic
Version (earliest affected): 5.2.1.2 release
Hardware: All Windows (All)
Importance: medium minor
Assignee: László Németh
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Spell-Checking Diacritics
Reported: 2016-09-07 12:47 UTC by Dennis Roczek
Modified: 2024-10-14 22:59 UTC (History)
CC List: 8 users

See Also:
Crash report or crash signature:


Attachments
the first word "fühlen" gets marked as incorrect by the German spell checker (9.56 KB, application/vnd.oasis.opendocument.text)
2016-09-07 12:47 UTC, Dennis Roczek
correct document (9.55 KB, application/vnd.oasis.opendocument.text)
2016-09-07 13:03 UTC, Dennis Roczek

Description Dennis Roczek 2016-09-07 12:47:18 UTC
Created attachment 127190 [details]
the first word "fühlen" gets marked as incorrect by the German spell checker

A decomposed ü (the letter u, 0x75, followed by the Unicode combining diaeresis U+0308) is not recognized as a valid ü by the spell checker; see the attached document.
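
To make the two encodings concrete, here is a minimal Python sketch (not LibreOffice or Hunspell code) that builds "fühlen" both ways. The variable names and the `to_nfc` helper are only illustrative. It shows that the two strings render the same but compare as different until they are normalized to NFC.

```python
import unicodedata

# Decomposed form: 'u' (U+0075) followed by COMBINING DIAERESIS (U+0308).
decomposed = "fu\u0308hlen"
# Precomposed form: a single code point, U+00FC.
precomposed = "f\u00fchlen"


def to_nfc(word: str) -> str:
    """Return the word in Unicode Normalization Form C (composed)."""
    return unicodedata.normalize("NFC", word)


if __name__ == "__main__":
    # The strings look identical on screen but differ code point by code point.
    print(decomposed == precomposed)
    print([hex(ord(c)) for c in decomposed])
    print([hex(ord(c)) for c in precomposed])
    # After NFC normalization the decomposed word matches the precomposed one.
    print(to_nfc(decomposed) == precomposed)
```

A dictionary that only lists the precomposed spelling would therefore miss the decomposed one unless the input is converted first.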


Version: 5.2.1.2 (x64)
Build ID: 31dd62db80d4e60af04904455ec9c9219178d620
CPU Threads: 4; OS Version: Windows 6.19; UI Render: default; 
Locale: de-DE (de_DE); Calc: group
Comment 1 Dennis Roczek 2016-09-07 13:03:46 UTC
Created attachment 127193 [details]
correct document
Comment 2 Dennis Roczek 2016-09-07 13:36:45 UTC
Cannot reproduce under OS X; maybe a Windows-only problem.
Comment 3 Buovjaga 2016-10-03 09:18:12 UTC
Confirmed with German spellchecking installed.

Win 7 Pro 64-bit, Version: 5.2.1.2 (x64)
Build ID: 31dd62db80d4e60af04904455ec9c9219178d620
CPU Threads: 4; OS Version: Windows 6.1; UI Render: default; 
Locale: fi-FI (fi_FI); Calc: CL
Comment 4 steve 2016-10-03 11:08:14 UTC
Cannot reproduce.
Version: 5.2.2.2
Build ID: 8f96e87c890bf8fa77463cd4b640a2312823f3ad
CPU Threads: 4; OS Version: Mac OS X 10.12; UI Render: default; 
Locale: de-DE (de_DE.UTF-8); Calc: group
Comment 5 steve 2016-10-03 11:11:06 UTC
Cannot reproduce on Linux.
Version: 5.1.4.2
Build ID: 1:5.1.4-0ubuntu1
CPU Threads: 1; OS Version: Linux 4.4; UI Render: default; 
Locale: de-DE (de_DE.UTF-8)
Comment 6 Aron Budea 2016-10-04 01:23:49 UTC
There's a rendering issue as well if OpenGL is enabled; I opened bug 102944 for it.
Steve, if you are able to enable OpenGL on Linux or OS X, could you check whether you can confirm it?
Comment 7 Xisco Faulí 2016-10-05 08:27:38 UTC
Related to bug 99677?
Comment 8 Dennis Roczek 2016-10-05 22:58:19 UTC
hah, that would mean that this is a regression, wait a few minutes *testing*.

(some minutes later)

@xisco: does not seem so. :-(
Version: 5.0.6.3
Build ID: 490fc03b25318460cfc54456516ea2519c11d1aa
Locale: de-DE (de_DE)
(portable apps version) shows same behavior as 5.2.2 which should not have the same problem as mentioned in bug 99677.
Comment 9 Olivier R. 2017-01-17 06:06:53 UTC
Confirmed.
This is not a rendering issue.

The spellchecker Hunspell doesn’t recognize combining characters by default. We had the same issue for the French dictionary.

It can be easily solved by adding special commands in the affixes file which describes how the German dictionary behaves.

These commands are simple: they simply replace characters with combining diacritics by the usual ones before the words are parsed by the spellchecker.

Example:
ICONV 2
ICONV ü ü
ICONV ë ë
etc.

Create the list of characters to be replaced at input.
The first one is the character with combining diacritics, the second one is the usual one.
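
As a rough illustration of how such a list could be produced, the Python sketch below derives ICONV lines from a set of precomposed letters by decomposing each one with NFD. The function name `iconv_lines` and the chosen letter set are assumptions made for this example and are not taken from any existing dictionary. It also assumes the .aff file is encoded in UTF-8 (declared with SET UTF-8), which has not been verified here for the German files.

```python
import unicodedata


def iconv_lines(precomposed_chars: str) -> list[str]:
    """Build Hunspell ICONV lines mapping decomposed forms to precomposed ones.

    Characters that have no canonical decomposition (for example 'ß')
    are skipped, because there is nothing to convert for them.
    """
    pairs = []
    for ch in precomposed_chars:
        decomposed = unicodedata.normalize("NFD", ch)
        if decomposed != ch:
            pairs.append((decomposed, ch))
    # The first ICONV line states how many conversion rules follow.
    header = f"ICONV {len(pairs)}"
    return [header] + [f"ICONV {d} {c}" for d, c in pairs]


if __name__ == "__main__":
    # Hypothetical letter set for German: ä ö ü Ä Ö Ü ß (ß gets skipped).
    german = "\u00e4\u00f6\u00fc\u00c4\u00d6\u00dc\u00df"
    for line in iconv_lines(german):
        print(line)
```

The output would be pasted into the affix file by hand; a real dictionary may need more letters than the ones assumed here.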

These commands have to be added to the affixes files of all German dictionaries:
https://cgit.freedesktop.org/libreoffice/dictionaries/tree/de/de_DE_frami.aff
https://cgit.freedesktop.org/libreoffice/dictionaries/tree/de/de_AT_frami.aff
https://cgit.freedesktop.org/libreoffice/dictionaries/tree/de/de_CH_frami.aff

You can have a look at what is done in the French dictionary:
https://cgit.freedesktop.org/libreoffice/dictionaries/tree/fr_FR/fr.aff#n133
(Search for ICONV lines)
Comment 10 Olivier R. 2017-01-17 06:24:17 UTC
I was curious to see why it worked on Linux, so I tried it, and it doesn't work on Linux (Linux Mint) either. But you can't see the problem unless the German packages are installed.
Comment 11 QA Administrators 2018-01-18 03:33:52 UTC Comment hidden (obsolete)
Comment 12 Dennis Roczek 2018-04-01 11:33:17 UTC Comment hidden (obsolete)
Comment 13 QA Administrators 2019-04-02 02:49:14 UTC Comment hidden (obsolete)
Comment 14 Dennis Roczek 2020-05-06 16:43:08 UTC
Still no change: words with a decomposed "ü" are marked as invalid.

Version: 6.4.2.2 (x64)
Build ID: 4e471d8c02c9c90f512f7f9ead8875b57fcb1ec3
CPU threads: 4; OS: Windows 10.0 Build 18363; UI render: default; VCL: win; 
Locale: de-DE (de_DE); UI-Language: de-DE
Calc: CL
Comment 15 Dennis Roczek 2020-05-06 16:46:26 UTC
(In reply to Olivier R. from comment #9)
> The spellchecker Hunspell doesn’t recognize combining characters by default.
> We had the same issue for the French dictionary.
So this is basically a bug in Hunspell. Adding Németh László.

Comment 16 Dennis Roczek 2020-05-06 16:47:39 UTC
(In reply to Olivier R. from comment #9)
> It can be easily solved by adding special commands in the affixes file which
> describes how the German dictionary behaves.
> 
> These commands are simple: they simply replace characters with combining
> diacritics by the usual ones before the words are parsed by the spellchecker.
> 
> Example:
> ICONV 2
> ICONV ü ü
> ICONV ë ë
> etc.
> 
> Create the list of characters to be replaced at input.
> The first one is the character with combining diacritics, the second one is
> the usual one.
And for the workaround, adding Karl Zeiler.
Comment 17 QA Administrators 2022-05-07 03:32:34 UTC Comment hidden (obsolete)
Comment 18 Dennis Roczek 2024-10-14 16:04:42 UTC
Problem still exists with

Version: 24.8.2.1 (X86_64) / LibreOffice Community
Build ID: 0f794b6e29741098670a3b95d60478a65d05ef13
CPU threads: 4; OS: Windows 10 X86_64 (10.0 build 19045); UI render: Skia/Raster; VCL: win
Locale: de-DE (de_DE); UI: de-DE
Calc: threaded
Comment 19 László Németh 2024-10-14 22:59:37 UTC
@Dennis: thanks for the report! I'm going to check it.