115494 – Wrong hyphenation dictionary for Bulgarian in libreoffice-dictionaries


Status:	RESOLVED INVALID

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	Extensions
Version: (earliest affected)	unspecified
Hardware:	All All

Importance:	medium normal
Assignee:	Not Assigned

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2018-02-06 18:00 UTC by mlodewijck
Modified:	2018-02-07 22:26 UTC
CC List:	4 users

See Also:
Crash report or crash signature:

Attachments
Screenshot of dictionary opened in Notepad++ (28.27 KB, image/png) 2018-02-07 21:00 UTC, mlodewijck


Description mlodewijck 2018-02-06 18:00:23 UTC

Description:
The hyphenation dictionary for Bulgarian (hyph_bg_BG.dic) that ships in libreoffice-dictionaries-5.4.4.2.tar.xz (goo.gl/XP2w58) is not the right one: the hyphenation patterns in the file are for Hebrew (and the file is therefore ISO-8859-8 encoded), which is far from the Cyrillic script used for Bulgarian. Fortunately, the correct file can be found on the Apache OpenOffice Extensions website (goo.gl/vxP4yy) and in the LibreOffice development dictionaries repository (goo.gl/P5pN9L). Radostin Radnev, who maintains this file, has probably not been made aware of this issue either.

I recently e-mailed László Németh and Radostin Radnev, but I have not received any reply to date.


Actual Results:  
_

Expected Results:
_


Reproducible: Always


User Profile Reset: No



Additional Info:


User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:58.0) Gecko/20100101 Firefox/58.0

Comment 1 Julien Nabet 2018-02-06 18:04:15 UTC

Andras/Hristo: thought you might be interested in this one since it concerns the Bulgarian dictionary.

Comment 2 Hristo Simeonov Hristov 2018-02-07 18:54:22 UTC

Hi,
I did not upload the Bulgarian dictionary as an extension. I only fixed the dictionary included in the LO source, which can be installed with LO itself.

What is this libreoffice-dictionaries-5.4.4.2.tar.xz file? How is it generated?

Comment 3 Andras Timar 2018-02-07 20:38:18 UTC

I think this bug is invalid. The LibreOffice 5.4 source code release, libreoffice-dictionaries-5.4.4.2.tar.xz, from https://donate.libreoffice.org/home/dl/src/5.4.4/all/libreoffice-dictionaries-5.4.4.2.tar.xz?idx=2 (the reporter shortened this link with goo.gl), contains an older hyphenation dictionary with the CP1251 encoding. I have no idea why it was identified as Hebrew. The master branch has a newer one, from this commit:

commit 06a89d28d971d60c7f7afabddaedec194618c786
Author: Stoyan Dimitrov <stoyan@gmx.com>
Date:   Sun Oct 1 15:19:45 2017 +0200

    tdf#112750: Fix disappearing/insufficient hyphenation points for bg-BG
    
    Also changed encoding of Bulgarian hyphenation dictionary from CP1251 to
    UTF-8
    
    Change-Id: Ic2add1198c281c83d2e8230a1b14273fd22d85f3
    Signed-off-by: Hristo Hristov <h.hristov@icobgr.info>
    Reviewed-on: https://gerrit.libreoffice.org/43000
    Reviewed-by: Adolfo Jayme Barrientos <fitojb@ubuntu.com>
    Tested-by: Adolfo Jayme Barrientos <fitojb@ubuntu.com>

Comment 4 mlodewijck 2018-02-07 21:00:54 UTC

Created attachment 139676 [details]
Screenshot of dictionary opened in Notepad++

Comment 5 mlodewijck 2018-02-07 21:03:23 UTC

Why "invalid"? Has anybody taken a close look at the file? Please, see the screenshot (attachment).

Comment 6 Andras Timar 2018-02-07 22:06:20 UTC

(In reply to mlodewijck from comment #5)
> Why "invalid"? Has anybody taken a close look at the file? Please, see the
> screenshot (attachment).

How can Notepad++ distinguish between two 8-bit encodings? It is not possible. The autodetection was wrong, that's it.
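
To illustrate the point, here is a minimal Python sketch (not taken from the bug or from LibreOffice). It assumes a sample Bulgarian word, "речник", chosen for illustration only. The same bytes decode without error both as CP1251 and as ISO-8859-8, so an editor has nothing in the bytes themselves to tell the two apart and can only guess.

```python
# Sample Bulgarian word ("dictionary"); chosen for illustration only.
sample = "речник"

# The bytes as they would be stored in a CP1251-encoded file.
raw = sample.encode("cp1251")

# Decoding with the correct encoding recovers the Cyrillic text.
as_cp1251 = raw.decode("cp1251")

# Decoding the very same bytes as ISO-8859-8 also succeeds,
# but yields Hebrew letters instead of Cyrillic ones.
as_hebrew = raw.decode("iso8859_8")

print(as_cp1251)
print(as_hebrew)
```

Both decodes are valid, which is why a single-byte file with no declared encoding can be shown in the wrong script.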

Comment 7 mlodewijck 2018-02-07 22:26:47 UTC

That's right.