Bug 115494 - Wrong hyphenation dictionary for Bulgarian in libreoffice-dictionaries
Status: RESOLVED INVALID
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Extensions
Version (earliest affected): unspecified
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-02-06 18:00 UTC by mlodewijck
Modified: 2018-02-07 22:26 UTC

See Also:
Crash report or crash signature:


Attachments
Screenshot of dictionary opened in Notepad++ (28.27 KB, image/png)
2018-02-07 21:00 UTC, mlodewijck

Description mlodewijck 2018-02-06 18:00:23 UTC
Description:
The hyphenation dictionary for Bulgarian (hyph_bg_BG.dic) that one gets after downloading libreoffice-dictionaries-5.4.4.2.tar.xz (goo.gl/XP2w58) is not the right one: the hyphenation patterns in the file are for Hebrew (and the file is therefore ISO-8859-8 encoded), which is far removed from the Cyrillic script used for Bulgarian. Fortunately, the correct file can be found on the Apache OpenOffice Extensions website (goo.gl/vxP4yy) and in the LibreOffice development dictionaries repository (goo.gl/P5pN9L). Radostin Radnev, who maintains this file, has probably not been made aware of this issue either.

I recently sent an e-mail to László Németh and Radostin Radnev, but I have not received any reply to date.


Actual Results:  
_

Expected Results:
_


Reproducible: Always


User Profile Reset: No



Additional Info:


User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:58.0) Gecko/20100101 Firefox/58.0
Comment 1 Julien Nabet 2018-02-06 18:04:15 UTC
Andras/Hristo: thought you might be interested in this one since it concerns Bulgarian dictionary.
Comment 2 Hristo Simeonov Hristov 2018-02-07 18:54:22 UTC
Hi,
I did not upload the Bulgarian dictionary as an extension. I only fixed the dictionary included in the LO source, which can be installed with LO itself.

What is this libreoffice-dictionaries-5.4.4.2.tar.xz file? How is it generated?
Comment 3 Andras Timar 2018-02-07 20:38:18 UTC
I think this bug is invalid. The LibreOffice 5.4 source code release, libreoffice-dictionaries-5.4.4.2.tar.xz, from https://donate.libreoffice.org/home/dl/src/5.4.4/all/libreoffice-dictionaries-5.4.4.2.tar.xz?idx=2 (the reporter shortened this link with goo.gl), contains an older hyphenation dictionary in the CP1251 encoding. I have no idea why it was identified as Hebrew. The master branch has a newer one, from this commit:

commit 06a89d28d971d60c7f7afabddaedec194618c786
Author: Stoyan Dimitrov <stoyan@gmx.com>
Date:   Sun Oct 1 15:19:45 2017 +0200

    tdf#112750: Fix disappearing/insufficient hyphenation points for bg-BG
    
    Also changed encoding of Bulgarian hyphenation dictionary from CP1251 to
    UTF-8
    
    Change-Id: Ic2add1198c281c83d2e8230a1b14273fd22d85f3
    Signed-off-by: Hristo Hristov <h.hristov@icobgr.info>
    Reviewed-on: https://gerrit.libreoffice.org/43000
    Reviewed-by: Adolfo Jayme Barrientos <fitojb@ubuntu.com>
    Tested-by: Adolfo Jayme Barrientos <fitojb@ubuntu.com>
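The difference between the older CP1251 file and the newer UTF-8 one can be checked without relying on an editor's autodetection. The following is only a minimal sketch: the helper names `looks_like_utf8` and `declared_charset` are hypothetical, the sample bytes are made up for illustration and are not taken from hyph_bg_BG.dic, and it assumes the hyphenation dictionary names its character set on the first line.

```python
def looks_like_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode as strict UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def declared_charset(raw: bytes) -> str:
    """Return the first line of the file, assumed here to name the charset."""
    first_line = raw.split(b"\n", 1)[0]
    return first_line.decode("ascii", errors="replace").strip()


# Made-up sample content; the header strings are illustrative only.
new_style = "UTF-8\nа1б\n".encode("utf-8")
old_style = "CP1251\nа1б\n".encode("cp1251")

print(declared_charset(new_style), looks_like_utf8(new_style))
print(declared_charset(old_style), looks_like_utf8(old_style))
```

Strict UTF-8 decoding is a useful test because Cyrillic text stored in a legacy 8-bit encoding almost never forms valid UTF-8 sequences, whereas telling one 8-bit encoding from another needs outside information such as the declared charset.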
Comment 4 mlodewijck 2018-02-07 21:00:54 UTC
Created attachment 139676
Screenshot of dictionary opened in Notepad++
Comment 5 mlodewijck 2018-02-07 21:03:23 UTC
Why "invalid"? Has anybody taken a close look at the file? Please, see the screenshot (attachment).
Comment 6 Andras Timar 2018-02-07 22:06:20 UTC
(In reply to mlodewijck from comment #5)
> Why "invalid"? Has anybody taken a close look at the file? Please, see the
> screenshot (attachment).

How can Notepad++ tell two 8-bit encodings apart? It is not possible. The autodetection was wrong, that's it.
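To illustrate the ambiguity, here is a small sketch showing that the same bytes decode without error both as CP1251 (Cyrillic) and as ISO-8859-8 (Hebrew). The sample word and the helper name `decode_both` are assumptions chosen for illustration; nothing here is taken from the actual dictionary file.

```python
def decode_both(raw: bytes):
    """Decode the same bytes as CP1251 and as ISO-8859-8."""
    return raw.decode("cp1251"), raw.decode("iso8859_8")


# A Bulgarian word ("example") stored as CP1251, one byte per letter.
sample = "пример"
raw = sample.encode("cp1251")

as_cyrillic, as_hebrew = decode_both(raw)

print(raw.hex())      # the bytes on disk are identical in both readings
print(as_cyrillic)    # Cyrillic letters when read as CP1251
print(as_hebrew)      # Hebrew letters when read as ISO-8859-8
```

Both decodings succeed, so the bytes alone do not say which reading is intended. An editor can only guess, and here it guessed Hebrew.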
Comment 7 mlodewijck 2018-02-07 22:26:47 UTC
That's right.