Bug 164366 - invalid mongolian hunspell dictionary: easy fix
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Localization
Version: 25.2.0.0 alpha0+ (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard: target:25.8.0 target:24.8.5 target:25...
Keywords:
Depends on:
Blocks:
 
Reported: 2024-12-18 04:50 UTC by Robert Muir
Modified: 2024-12-18 15:10 UTC

See Also:
Crash report or crash signature:


Attachments
patch for the problem (718 bytes, patch)
2024-12-18 04:51 UTC, Robert Muir

Description Robert Muir 2024-12-18 04:50:16 UTC
Description:
Commit 31bc2a1104a1cd175f900902f994c76dea35c763 to the LibreOffice/dictionaries repository results in an invalid .aff file.

According to the hunspell manual page, the format of REP rules is:
REP number_of_replacement_definitions
REP what replacement

After this commit, the Mongolian dictionary declares 3619 replacement definitions, but 3621 REP rules follow the header.

So the fix is a simple one-line change: https://github.com/LibreOffice/dictionaries/pull/46

I apologize for being unable to figure out the Gerrit setup. I'll also attach a patch file containing the fix to this bug report.

Steps to Reproduce:
1. Open the mn_MN.aff file and note that the "REP 3619" header declares 3619 definitions.
2. Count the REP definitions that follow it: there are 3621.
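
The two steps above can also be checked mechanically. The following is only a minimal sketch, not the check used by Apache Lucene or by hunspell itself: check_rep_count is a hypothetical helper name, and it assumes the first line whose first field is REP is the header carrying the declared count, with every later REP line being one rule.

```python
def check_rep_count(lines):
    """Return (declared, actual) for the REP table in an iterable of .aff lines.

    Assumption: the first "REP" line is the header
    ("REP number_of_replacement_definitions") and every later
    "REP" line is one rule ("REP what replacement").
    """
    declared = None
    actual = 0
    for raw in lines:
        parts = raw.split()
        if not parts or parts[0] != "REP":
            continue  # blank line, comment, or another directive
        if declared is None:
            declared = int(parts[1])  # header line carries the declared count
        else:
            actual += 1  # one replacement rule
    return declared, actual


# Tiny made-up sample: the header declares 2 rules, but 3 follow.
sample = ["SET UTF-8", "REP 2", "REP a b", "REP c d", "REP e f"]
declared, actual = check_rep_count(sample)
print(declared, actual)  # prints: 2 3
```

To run it against a real file, pass an open file object, for example open("mn_MN.aff", encoding="utf-8"), assuming the file is UTF-8 encoded. A mismatch between the two returned numbers indicates the problem described in this report.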

Actual Results:
REP 3619 is specified

Expected Results:
REP 3621 is specified


Reproducible: Always


User Profile Reset: Yes

Additional Info:
Problematic commit: https://github.com/LibreOffice/dictionaries/commit/d1696029d8923ae697cb2d6d4d7d69791b1943f2

Background: We support parsers for the hunspell format in Apache Lucene, and our CI system detected the dictionary issue.
Comment 1 Robert Muir 2024-12-18 04:51:54 UTC
Created attachment 198167
patch for the problem

Simple patch file to fix the REP count, so that it correctly matches the number of REP rules that follow.
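
For illustration only, the same kind of correction can be sketched programmatically. This is not the attached patch (which is a one-line change to the header); fix_rep_header is a hypothetical helper, and it assumes the input lines carry no trailing newlines (e.g. they come from str.splitlines()) and that the first REP line is the count header.

```python
def fix_rep_header(lines):
    """Return a copy of the .aff lines with the REP header count corrected.

    Assumptions: lines have no trailing newline characters, the first
    "REP" line is the header, and all later "REP" lines are rules.
    """
    lines = list(lines)
    header_idx = None
    count = 0
    for i, raw in enumerate(lines):
        parts = raw.split()
        if not parts or parts[0] != "REP":
            continue  # not part of the REP table
        if header_idx is None:
            header_idx = i  # remember where the header sits
        else:
            count += 1  # count the rules that follow it
    if header_idx is not None:
        lines[header_idx] = f"REP {count}"  # rewrite the header with the real count
    return lines


# Made-up sample: the header says 1, but two rules follow.
print(fix_rep_header(["SET UTF-8", "REP 1", "REP a b", "REP c d"]))
# prints: ['SET UTF-8', 'REP 2', 'REP a b', 'REP c d']
```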
Comment 2 Commit Notification 2024-12-18 08:45:28 UTC
Robert Muir committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/dictionaries/commit/c3ff53711dcac4bdec24f23a2c1f9712a0833b67

tdf#164366: fix wrong REP count in mongolian .aff file
Comment 3 Xisco Faulí 2024-12-18 08:47:00 UTC
Hi Robert,
Thanks for the patch. I've applied it for you.
Closing as RESOLVED FIXED.
Comment 4 Commit Notification 2024-12-18 08:48:35 UTC
Robert Muir committed a patch related to this issue.
It has been pushed to "libreoffice-24-8":

https://git.libreoffice.org/dictionaries/commit/adbbc9e9850d609371810fe91054760d878982a3

tdf#164366: fix wrong REP count in mongolian .aff file
Comment 5 Commit Notification 2024-12-18 08:48:38 UTC
Robert Muir committed a patch related to this issue.
It has been pushed to "libreoffice-25-2":

https://git.libreoffice.org/dictionaries/commit/edb2228a1d8caf12d3daf93aae7cf1dd4284ec42

tdf#164366: fix wrong REP count in mongolian .aff file
Comment 6 Xisco Faulí 2024-12-18 09:00:03 UTC
Hi Robert, in https://gerrit.libreoffice.org/c/dictionaries/+/176291 I upgraded the dictionary from https://github.com/bataak/dict-mn/archive/refs/tags/2024.11.08.zip.
I think the real fix should be applied to https://github.com/bataak/dict-mn/
Comment 7 Xisco Faulí 2024-12-18 09:03:29 UTC
Meanwhile, could you please send a license statement to the mailing list? https://wiki.documentfoundation.org/Development/GetInvolved#License_statement
Thanks in advance
Comment 8 Robert Muir 2024-12-18 13:46:44 UTC
> I think the real fix should be applied to https://github.com/bataak/dict-mn/

I agree. I took a quick look at this, but the count is different there!

https://github.com/bataak/dict-mn/blob/main/mn_MN/mn_MN.aff#L49

I'll take another pass and try to understand what is happening and get it fixed there.

> Meanwhile, could you please send a license statement to the mailing list ? 

Done!
Comment 9 Xisco Faulí 2024-12-18 14:50:22 UTC
(In reply to Robert Muir from comment #8)
> > I think the real fix should be applied to https://github.com/bataak/dict-mn/
> 
> I agree, I looked into this at a glance, but the count is different there!
> 
> https://github.com/bataak/dict-mn/blob/main/mn_MN/mn_MN.aff#L49

It seems it was changed in a more recent commit: https://github.com/bataak/dict-mn/commit/874bc3e9e818b62396ac2f9b264364da3ec576ba

> 
> I'll take another pass and try to understand what is happening and get it
> fixed there.
> 
> > Meanwhile, could you please send a license statement to the mailing list ? 
> 
> Done!

Thanks a lot
Comment 10 Robert Muir 2024-12-18 15:10:37 UTC
Thank you! That is the context I was missing. I saw the bataak commit date over Nov 8 and also saw the LibreOffice/dictionaries d169602 commit date of Nov 8 and didn't put two and two together.

I will verify the latest dictionary on the bataak side and send them a PR if necessary.