Description:
Dutch spell checker produces debatable suggestions

Steps to Reproduce:
1. Open the attached file
2. Right-click the words and look at the suggestions

Actual Results:
wrongleszst = wrongleszet as first suggestion?

Expected Results:
Dutch?

Reproducible: Always

User Profile Reset: No

Additional Info:
Version: 7.1.0.0.alpha0+ (x64)
Build ID: 6640d7f405d2970ba2825a9455926cc803284d01
CPU threads: 4; OS: Windows 6.3 Build 9600; UI render: default; VCL: win
Locale: lb-LU (nl_NL); UI: en-US
Calc: CL
Created attachment 164896 [details] Example file
@Cor
Not too happy with the spelling suggestions. The compositions are way off, if you ask me. But I'm not sure how the whole dictionary thing works, so I can't tell whether to blame Hunspell or the dictionary. Already seen in 3.5.0.3.
Source seems to be https://github.com/OpenTaal/opentaal-hunspell
(In reply to Buovjaga from comment #3)
> Source seems to be https://github.com/OpenTaal/opentaal-hunspell

I posted a ticket on GitHub, but got no response. And this isn't unique to Dutch: the German dictionary also gives odd results (as far as I recall; I would have to try around to find some odd examples).

I can't assess what causes the issue: the Hunspell dictionary or Hunspell itself.
László Németh committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/57d79744c77eef96b4c2bd3b16e0a04317ffcf9e

tdf#136306 offapi linguistic: add options to disable rule-based compounding

It will be available in 7.6.0.

The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
(In reply to Commit Notification from comment #5)
> László Németh committed a patch related to this issue.
> It has been pushed to "master":
>
> https://git.libreoffice.org/core/commit/57d79744c77eef96b4c2bd3b16e0a04317ffcf9e
>
> tdf#136306 offapi linguistic: add options to disable rule-based compounding

I'm not noticing any differences. Is the commit actually related to this bug?
Created attachment 184412 [details]
Additional example file (English)

Another example, in this case US English, with lots of noise. It is even present in
LibreOffice 3.3.0
OOO330m19 (Build:6)
tag libreoffice-3.3.0.4
(In reply to Telesto from comment #6)
> (In reply to Commit Notification from comment #5)
> > László Németh committed a patch related to this issue.
> > It has been pushed to "master":
> >
> > https://git.libreoffice.org/core/commit/57d79744c77eef96b4c2bd3b16e0a04317ffcf9e
> >
> > tdf#136306 offapi linguistic: add options to disable rule-based compounding
>
> I'm not noticing any differences. It the commit actually related to this bug?

Previously the suggested words were accepted as correct words. Now it's possible to reject them, including the English hyphenated compound words, with the new spell-checking options. The next step will be to remove them from the suggestions, too.

The Hunspell 1.7.2 update improved the strange suggestions a little bit: if there is a dictionary word or a 2-word rule-based compound word, rule-based closed compound words with 3 or more words won't be suggested.

Note: There is no ideal solution for the problem, especially because limiting the suggestions can be very slow. I added a limitation for it, but I had to remove it; see the code part

- rv = pAMgr->compound_check(word, 0, 0, 100, 0, NULL, (hentry**)&rwords, 0, 1, 0); // EXT
+ int info = (cpdsuggest == 1) ? SPELL_COMPOUND_2 : 0;
+ rv = pAMgr->compound_check(word, 0, 0, 100, 0, NULL, (hentry**)&rwords, 0, 1, &info); // EXT
+ // TODO filter 3-word or more compound words, as in spell()
+ // (it's too slow to call suggest() here for all possible compound words)

in https://github.com/hunspell/hunspell/commit/ff3591b0f76950f13d73123d03a03edd9a892945
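For illustration, the suggestion rule described for Hunspell 1.7.2 can be sketched roughly as follows. This is not Hunspell's code: the `Candidate` struct, its `parts` field and `filter_suggestions` are hypothetical names, and the sketch assumes the number of word parts of each candidate is already known, which is exactly the expensive step in practice.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical candidate record (not a Hunspell type).
struct Candidate {
    std::string word;
    int parts;  // 1 = plain dictionary word, 2 = two-word compound, 3+ = longer rule-based compound
};

// If any candidate is a dictionary word or a 2-word compound,
// drop the compounds made of 3 or more words; otherwise keep everything.
std::vector<std::string> filter_suggestions(const std::vector<Candidate>& candidates) {
    const bool has_short = std::any_of(
        candidates.begin(), candidates.end(),
        [](const Candidate& c) { return c.parts <= 2; });

    std::vector<std::string> kept;
    for (const Candidate& c : candidates) {
        if (has_short && c.parts >= 3) {
            continue;  // a shorter alternative exists, so skip the long compound
        }
        kept.push_back(c.word);
    }
    return kept;
}
```

With this rule, long compounds still show up when nothing shorter is available, which may explain why some odd suggestions remain.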
(In reply to László Németh from comment #8)

1) I do get that there is no ideal solution for the problem, especially with a rule-based methodology.
2) "The next step will be to remove them from the suggestions, too." Oh well, work in progress: nice :-).
3) So the O-rule mentioned in bug 139319 comment 7 isn't involved?
4) The LibreOffice 4.2 suggestions are often better, in the sense of not containing gibberish.

With 4.2, the suggestions for "bovnmatige" (Dutch) don't contain "boonmatige" (gibberish).
LibreOffice 4.4.0.3 *does* suggest "boonmatige" (and so do all newer versions).

Same for the example in bug 139319: no suggestion "sprachgebundene, sprachgebunden" in 4.2.

It's hard to say something sensible or general. LibreOffice 4.2 isn't consistently better. Sometimes it's better, sometimes it shows the same oddity as today, and sometimes it's worse than today.

Good in 7.6, bad in 4.2:
opinipeilingen
vacinatieprogramma

Odd suggestions, but no invented words/gibberish, with 7.6 as with 4.2:
overengekomen
gedetalleerde

Gibberish with 7.6, not seen with 4.2:
hoofdlettergevoligheid [gibberish: 'hoofdlettergelovigheid']
bvenmatig [gibberish: 'beenmatig' 'boenmatig' 'ovenmatig']
bovnmatige [gibberish: 'boonmatige']