Bug 63460 - Northern/Central/Southern Kurdish
Summary: Northern/Central/Southern Kurdish
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Localization
Version (earliest affected): unspecified
Hardware: Other All
Importance: medium enhancement
Assignee: Eike Rathke
URL:
Whiteboard: target:4.2.0 target:4.3.0
Keywords:
Depends on: 37496
Blocks:
Reported: 2013-04-12 09:32 UTC by Andras Timar
Modified: 2019-04-20 17:21 UTC

Description Andras Timar 2013-04-12 09:32:01 UTC
LibreOffice's locales include only Northern Kurdish, and it is named
"Kurdish" instead of the correct "Kurdish, Northern" ("Kurdish" covers
all variants of the language, so it is unspecific). Central Kurdish and
Southern Kurdish need to be added too, since they are essentially different
languages. Central and Southern Kurdish use a modified Arabic script by
default and are RTL, whereas Northern Kurdish uses Latin script by default
and is LTR.

I suggest mappings without country codes, because the variants of Kurdish
are similar across countries and there is little use in separating them by
country. If country codes are an absolute must, I can suggest them anyway.

The locale settings for the Kurdish variants should use the three-letter
ISO 639-3 codes, not the two-letter ISO 639-1 code, and the names suggested
in the following list, not simply "Kurdish".

---

Codes and mappings (ISO 639-3):

Mapping: ckb
Info: RTL, unicode
Language list:
English: Kurdish, Central
French: Kurde, Centrale
German: Kurdisch, Zentral
Kurdish, Northern: Kurdí Sorani
Swedish: Kurdiska, Central
Turkish: Kürtçe, Orta

Mapping: ckb-Latn
Language list:
English: Kurdish, Central (latin)
French: Kurde, Centrale (latin)
German: Kurdisch, Zentral (lateinisch)
Kurdish, Northern: Kurdí Sorani (latînî)
Swedish: Kurdiska, Central (latinsk)
Turkish: Kürtçe, Orta (latin)

Mapping: kmr
Language list:
English: Kurdish, Northern
French: Kurdes du nord
German: Kurdisch, Nördliches
Kurdish, Northern: Kurdí Kurmancî
Swedish: Kurdiska, Nordlig
Turkish: Kürtçe, Kuzey

Mapping: sdh
Info: RTL, unicode
Language list:
English: Kurdish, Southern
French: Kurdes du sud
German: Kurdische, Süd
Kurdish, Northern: Kurdí Xwarig
Swedish: Kurdiska, Sydlig
Turkish: Kürtçe, Güney

Mapping: sdh-Latn
Language list:
English: Kurdish, Southern (latin)
French: Kurdes du sud (latin)
German: Kurdische, Süd (lateinisch)
Kurdish, Northern: Kurdí Xwarig (latînî)
Swedish: Kurdiska, Sydlig (latinsk)
Turkish: Kürtçe, Güney (latin)

note: About the ku-TR and ku-SY entries that currently exist: change their
language list name from "Kurdish" to "Kurdish, Northern". This means there
will be three "Kurdish, Northern" entries in total (ku-TR, ku-SY and kmr).
All language tools and localizations should be updated and assigned to kmr
(not to ku); once that is done, I guess ku-TR and ku-SY can be removed. If
someone still wants to differentiate between Northern Kurdish in different
countries, the mappings kmr-XX can be used (not ku-XX, since ku is the
macrolanguage Kurdish and not specifically Northern Kurdish).

note 2: sdh-Latn and ckb-Latn are LTR and use the same script as kmr. I
think they use Unicode, but I'm not 100% sure.

note 3: In the language list, when I write "Kurdish, Northern:", I mean the
localization that is currently named "Kurdish". Like I said, this
localization should be renamed to "Kurdish, Northern".
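
To make the script and direction defaults proposed above concrete, here is a minimal Python sketch. The table and the function name are hypothetical and are not LibreOffice code; the defaults themselves (ckb and sdh in Arabic script and RTL, kmr and the -Latn variants in Latin script and LTR) are the ones stated in this report.

```python
# Hypothetical helper, not LibreOffice code: default script and text
# direction for the mappings proposed in this report.
DEFAULT_SCRIPT = {
    "ckb": "Arab",  # Central Kurdish: Arabic script by default
    "sdh": "Arab",  # Southern Kurdish: Arabic script by default
    "kmr": "Latn",  # Northern Kurdish: Latin script by default
}

RTL_SCRIPTS = {"Arab"}


def script_and_direction(tag: str) -> tuple:
    """Return (script, direction) for a tag such as 'ckb' or 'sdh-Latn'."""
    subtags = tag.split("-")
    language = subtags[0].lower()
    # An explicit four-letter script subtag overrides the language default.
    explicit = [s.title() for s in subtags[1:] if len(s) == 4 and s.isalpha()]
    script = explicit[0] if explicit else DEFAULT_SCRIPT[language]
    direction = "RTL" if script in RTL_SCRIPTS else "LTR"
    return script, direction
```

A language that is not in the table raises KeyError, which seems preferable here to silently guessing a direction.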

---

Microsoft added Central Kurdish to Windows 8 and Office 2013. Here are the
LCIDs for Windows 8 (they're probably the same for Office 2013):
Central Kurdish, 0x0092, ku
Central Kurdish, 0x7c92, ku-Arab
Central Kurdish Iraq, 0x0492, ku-Arab-IQ

(source: http://download.microsoft.com/download/9/5/E/95EF66AF-9026-4BB0-A41D-A4F81802D92C/[MS-LCID].pdf)

A Word 2013 document written in Sorani/Arabic script and marked as "Central
Kurdish Iraq" is assigned the code "ku-Arab-IQ" in the XML, just as
Microsoft's LCID list says (so 0x0492 should be linked to "ckb").
Text written in Latin script can't be assigned "Central Kurdish Iraq" in Word
2013, which indicates that "Central Kurdish Iraq" ("ku-Arab-IQ") is reserved
for Sorani/Arabic script only. Perhaps, then, "ku" is reserved for Central
Kurdish in Latin script (if true, 0x0092 should be linked to "ckb-Latn") and
"ku-Arab" for Central Kurdish across all countries, i.e. Iraq and Iran,
meaning that 0x7c92 should be linked to "ckb". But it's hard to say at the
moment.
Comment 1 Kevin Scannell 2013-04-12 11:51:23 UTC
I agree this is an important change to make.  Just one comment, regarding Sahand's suggestion to change the language list names for ku-TR and ku-SY from "Kurdish" to "Kurdish, Northern".  I guess this is OK as a temporary hack but really the best solution is to move the existing localizations, spell checkers, etc. for Northern Kurdish to the correct code "kmr", once and for all.
Comment 2 Eike Rathke 2013-05-07 10:15:37 UTC
This needs BCP47 language tags implementation to be fully solved, adding dependency on bug 37496.
Comment 3 Sahand T. 2013-09-17 07:13:50 UTC
(In reply to comment #2)
> This needs BCP47 language tags implementation to be fully solved, adding
> dependency on bug 37496.

If you add this, use these updated language lists and not the above ones:

Mapping: ku (ISO 639-1)
Language list:
	English: Kurdish
	French: Kurde
	German: Kurdisch
	Kurdish, Central (ckb): کوردی
	Kurdish, Northern (kmr): Kurdî 
	Swedish: Kurdiska
	Turkish: Kürtçe


Codes and mappings (ISO 639-3):

Mapping: ckb
Info: RTL, unicode
Language list:
	English: Kurdish, Sorani
	French: Kurde, Sorani
	German: Kurdisch, Sorani
	Kurdish, Central (ckb): کوردیی سۆرانی
	Kurdish, Northern (kmr): Kurdî, Sorani
	Swedish: Kurdiska, Sorani
	Turkish: Kürtçe, Sorani

Mapping: ckb-Latn
Language list:
	English: Kurdish, Sorani (latin)
	French: Kurde, Sorani (latin)
	German: Kurdisch, Sorani (lateinischer)
	Kurdish, Central (ckb): کوردیی سۆرانی (لاتینی)
	Kurdish, Northern (kmr): Kurdî, Soranî (latînî)
	Swedish: Kurdiska, Sorani (latinsk)
	Turkish: Kürtçe, Sorani (latin)

Mapping: kmr
Language list:
	English: Kurdish, Kurmanji
	French: Kurde, Kurmandji
	German: Kurdisch, Kurmandschi
	Kurdish, Central (ckb): کوردیی کرمانجی
	Kurdish, Northern (kmr): Kurdî Kurmancî
	Swedish: Kurdiska, Nord
	Turkish: Kürtçe, Kurmanci

Mapping: sdh
Info: RTL, unicode
Language list:
	English: Kurdish, Southern
	French: Kurdes du sud
	German: Kurdisch, Süd
	Kurdish, Central (ckb): کوردیی باشووری
	Kurdish, Northern (kmr): Kurdî Xwarig
	Swedish: Kurdiska, Syd
	Turkish: Kürtçe, Güney

Mapping: sdh-Latn
Language list:
	English: Kurdish, Southern (latin)
	French: Kurdes du sud (latin)
	German: Kurdisch, Süd (lateinischer)
	Kurdish, Central (ckb): کوردیی باشووری (لاتینی)
	Kurdish, Northern (kmr): Kurdî Xwarig (latînî)
	Swedish: Kurdiska, Syd (latinsk)
	Turkish: Kürtçe, Güney (latin)
Comment 4 Eike Rathke 2013-10-17 21:53:19 UTC
Let's check the current status/assignments and how to transition to the
assignments above. Because previously script tags weren't supported we
defined a workaround that distinguishes them by different country
assignments and we have

LANGUAGE_USER_KURDISH_TURKEY  0x0626  /* sublang 0x01, Latin script */   [ku-TR]
LANGUAGE_USER_KURDISH_SYRIA   0x0A26  /* sublang 0x02, Latin script */   [ku-SY]
LANGUAGE_USER_KURDISH_IRAQ    0x0E26  /* sublang 0x03, Arabic script */  [ku-IQ]
LANGUAGE_USER_KURDISH_IRAN    0x1226  /* sublang 0x04, Arabic script */  [ku-IR]

First of all, 'ku' is an ISO 639-1 macrolanguage code and should be
avoided in distinct language assignments where more specific ISO 639-1
or 639-3 codes exist. That should be 'ckb', 'kmr' and 'sdh' instead,
where 'ckb' and 'sdh' already imply Arabic script.

So according to that PDF Microsoft assigned

Central Kurdish       0x0092  /* sublang 0x00, Latin script? */  [ku]
Central Kurdish       0x7C92  /* sublang 0x1f, Arabic script */  [ku-Arab]
Central Kurdish Iraq  0x0492  /* sublang 0x01, Arabic script */  [ku-Arab-IQ]

and no other Kurdish tags. Great :-/
http://msdn.microsoft.com/library/dd318693.aspx mentions _only_ 0x0492
and nothing else, you gotta love'em :-(

At least one can see that since Windows 7 they seem to use
sublanguage IDs in the range 0x1d to 0x1f for a primary language with
script type and without country. That doesn't help much, because 0x0492
cannot be deduced from it; it is rather 0x0092 with the default 0x01
sublanguage ID.
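
The sublanguage values quoted in the listings above follow from the Windows LANGID layout, where the low 10 bits hold the primary language ID and the upper 6 bits hold the sublanguage ID (what the Windows PRIMARYLANGID and SUBLANGID macros extract). A small Python sketch of that arithmetic follows; the function names are made up for illustration.

```python
# Windows LANGID layout: bits 0-9 are the primary language ID,
# bits 10-15 are the sublanguage ID. Function names are illustrative only.

def primary_lang_id(langid: int) -> int:
    """Low 10 bits of a 16-bit LANGID."""
    return langid & 0x3FF


def sub_lang_id(langid: int) -> int:
    """Upper 6 bits of a 16-bit LANGID."""
    return (langid >> 10) & 0x3F


def make_lang_id(primary: int, sub: int) -> int:
    """Combine a primary language ID and a sublanguage ID."""
    return (sub << 10) | primary


# IDs quoted in this thread.
for langid in (0x0626, 0x0A26, 0x0E26, 0x1226, 0x0092, 0x7C92, 0x0492):
    print(f"0x{langid:04X}: primary 0x{primary_lang_id(langid):03X}, "
          f"sublang 0x{sub_lang_id(langid):02X}")
```

Applied to 0x0492 this gives primary 0x092 with sublanguage 0x01, and 0x7C92 gives the same primary with sublanguage 0x1f, which is the relationship described above.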

When defining new language entries for LibreOffice we should always
include a country to form a default locale of that language to make it
available for locale fallback mechanisms.

So here the results would be something like

LANGUAGE_KURDISH_ARABIC_IRAQ         0x0492  [ckb-IQ]
LANGUAGE_KURDISH_ARABIC_IRAQ         0x0492  [ku-Arab-IQ]
LANGUAGE_KURDISH_ARABIC_IRAQ         0x0492  [ku-IQ]
LANGUAGE_OBSOLETE_USER_KURDISH_IRAQ  0x0E26  [ckb-IQ]

with mappings to the preferred ISO 639-3 code and the ISO 639-1
macrolanguage code and the old values and codes, with

#define LANGUAGE_OBSOLETE_USER_KURDISH_IRAQ  0x0E26
#define LANGUAGE_USER_KURDISH_IRAQ  LANGUAGE_KURDISH_ARABIC_IRAQ
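
The proposal above amounts to a many-to-one table when reading tags and a one-to-one table when writing them. The following Python sketch only illustrates that shape; the dictionary and function names are hypothetical and do not reflect how LibreOffice stores its mappings. The values are the ones listed above.

```python
# Values from the listing above; dictionaries and function names are
# hypothetical and only illustrate the many-to-one mapping.
LANGUAGE_KURDISH_ARABIC_IRAQ = 0x0492
LANGUAGE_OBSOLETE_USER_KURDISH_IRAQ = 0x0E26
# The old name is kept as an alias for the new value.
LANGUAGE_USER_KURDISH_IRAQ = LANGUAGE_KURDISH_ARABIC_IRAQ

# Several tags may be read as the same LANGID ...
TAG_TO_LANGID = {
    "ckb-iq": LANGUAGE_KURDISH_ARABIC_IRAQ,
    "ku-arab-iq": LANGUAGE_KURDISH_ARABIC_IRAQ,
    "ku-iq": LANGUAGE_KURDISH_ARABIC_IRAQ,
}

# ... but each LANGID is written back as one preferred tag.
LANGID_TO_TAG = {
    LANGUAGE_KURDISH_ARABIC_IRAQ: "ckb-IQ",
    LANGUAGE_OBSOLETE_USER_KURDISH_IRAQ: "ckb-IQ",
}


def langid_from_tag(tag: str) -> int:
    """Case-insensitive lookup of a language tag."""
    return TAG_TO_LANGID[tag.lower()]


def tag_from_langid(langid: int) -> str:
    """Preferred tag for a current or obsolete LANGID."""
    return LANGID_TO_TAG[langid]


def upgrade_langid(langid: int) -> int:
    """Map an obsolete LANGID to the current one via its preferred tag."""
    return langid_from_tag(tag_from_langid(langid))
```

With these tables, the obsolete 0x0E26 resolves to "ckb-IQ" and from there to 0x0492.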


How to transition the other three, ku-TR, ku-SY and ku-IR?
ku-TR would be kmr-Latn-TR, Northern Kurdish Latin script.
ku-SY would be kmr-Latn-SY, Northern Kurdish Latin script.
But ku-IR? Which Kurdish would it be? ckb-IR?

The defaults for the new languages would be

Kurdish, Northern, Latin script   [kmr-Latn-TR]
Kurdish, Central, Arabic script   [ckb-IQ]
Kurdish, Southern, Arabic script  [sdh-IQ]
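
As a rough illustration of why each language gets a default locale with a country, here is a Python sketch of a very simple fallback: an exact match wins, otherwise the default locale of the language subtag is used. The function and table names are hypothetical, and LibreOffice's real fallback mechanism is more elaborate than this.

```python
from typing import Optional

# Default locale per language, as proposed above. Names are hypothetical.
DEFAULT_LOCALE = {
    "kmr": "kmr-Latn-TR",  # Kurdish, Northern, Latin script
    "ckb": "ckb-IQ",       # Kurdish, Central, Arabic script
    "sdh": "sdh-IQ",       # Kurdish, Southern, Arabic script
}

KNOWN_LOCALES = set(DEFAULT_LOCALE.values())


def resolve_locale(tag: str) -> Optional[str]:
    """Return the tag itself if known, else the language's default locale."""
    if tag in KNOWN_LOCALES:
        return tag
    language = tag.split("-")[0].lower()
    # None signals that no Kurdish default applies to this language.
    return DEFAULT_LOCALE.get(language)
```

A bare "kmr" then resolves to "kmr-Latn-TR", which would not be possible if the language list contained only country-less entries.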

References I consulted
http://www.ethnologue.com/language/kmr
http://www.ethnologue.com/language/ckb
http://www.ethnologue.com/language/sdh
Comment 5 Sahand T. 2013-11-13 13:18:04 UTC
The transitions should be as you said for ku-TR, ku-SY and ku-IQ.

ku-TR => kmr-Latn-TR, Northern Kurdish Latin script.
ku-SY => kmr-Latn-SY, Northern Kurdish Latin script.
ku-IQ => ckb-IQ, Central Kurdish (Arabic script implied).
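
The three agreed transitions can be written down as a plain lookup table. This is only a Python sketch with made-up names; ku-IR is deliberately left out because its target is still undecided at this point in the thread.

```python
# Agreed transitions from the old country-based workaround tags.
# ku-IR is intentionally absent: its target is not settled here.
LEGACY_TO_NEW = {
    "ku-TR": "kmr-Latn-TR",  # Northern Kurdish, Latin script
    "ku-SY": "kmr-Latn-SY",  # Northern Kurdish, Latin script
    "ku-IQ": "ckb-IQ",       # Central Kurdish, Arabic script implied
}


def migrate_tag(tag: str) -> str:
    """Return the new tag for a legacy one; unknown tags pass through."""
    return LEGACY_TO_NEW.get(tag, tag)
```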

Info from ethnologue about speakers of Kurdish dialects in Iran:
ckb: 3,250,000
sdh: 3,000,000
kmr: 350,000

So it should be either sdh-IR or ckb-IR. Sdh is broken down into subdialects whereas ckb basically represents Sorani alone, therefore ckb-IR might be the better choice. You should consult with someone else though, it's a bit tricky.

Another thing: are you supposed to explicitly write kmr-Latn? As I understand it, kmr is Latin-based by implication, and only the Arabic script should be written out explicitly (i.e. kmr-Arab-TR but not kmr-Latn-TR).
Comment 6 Eike Rathke 2013-11-18 15:08:13 UTC
Thanks for confirming.

BCP 47 does not list a suppress-script for 'kmr' so kmr-Latn-TR at least is technically correct, Ethnologue also lists Arabic and Cyrillic as possible scripts. I can add kmr-TR as well to map to Latin.
Comment 7 Commit Notification 2013-11-18 20:35:04 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=6a826ddc4ee40a9727131cd4b13365bf6ae16319

cleaned up ISO code usage for Kurdish, fdo#63460



The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 8 Commit Notification 2013-11-18 21:28:31 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/dictionaries/commit/?id=d626d8b18cbb14825632900a02c7291912855f73

renamed ku* to kmr-Latn*, fdo#63460
Comment 9 Commit Notification 2013-11-18 21:28:55 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/help/commit/?id=aada8cbd4eb6e04bcf3df4282392f312073c2285

renamed ku* to kmr-Latn*, fdo#63460
Comment 10 Commit Notification 2013-11-18 22:48:41 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/dictionaries/commit/?id=eece7b1bc8579d7bdcbebeac67dcdc676617996e

renamed ku* to kmr-Latn*, fdo#63460
Comment 11 Commit Notification 2013-11-18 23:01:33 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/dictionaries/commit/?id=f79819b8eadd02f6bfc1131d7824b1948e7ee963

renamed ku* to kmr-Latn*, fdo#63460
Comment 12 Commit Notification 2013-11-19 00:22:16 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/dictionaries/commit/?id=8d1a0d88df029906968a3bb12da5f7a832e9b8a1

renamed ku* to kmr-Latn*, fdo#63460
Comment 13 Commit Notification 2013-11-19 13:47:43 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=444ea16c81746518897ed0643c5872d7cb9e277e

renamed ku* to kmr-Latn*, fdo#63460
Comment 14 Sahand T. 2013-11-30 23:02:05 UTC
(In reply to comment #6)
> Thanks for confirming.
> 
> BCP 47 does not list a suppress-script for 'kmr' so kmr-Latn-TR at least is
> technically correct, Ethnologue also lists Arabic and Cyrillic as possible
> scripts. I can add kmr-TR as well to map to Latin.

Hi Eike, thank you for this!

Two things:

1. In commit 6a826ddc4ee40a9727131cd4b13365bf6ae16319 it says the following:
< "Kurdish, Southern (Iraq)" ; LANGUAGE_USER_KURDISH_SOUTHERN_IRAQ ; > ;

How about adding LANGUAGE_USER_KURDISH_SOUTHERN_IRAN in addition to or instead of the Iraq one? (I think Southern Kurdish is more dominant in Iran, with its 3 million speakers, than in Iraq.)

2. kmr-TR is implicitly Latin-based (75% of all speakers, i.e. 15 million out of 20 million). kmr-IQ, kmr-IR and kmr-SY are instead implicitly Arabic.

Because of this, have you considered having kmr and kmr-TR be Latin implicitly (instead of having to write kmr-Latn)? For kmr in Arabic script one can then use kmr-Arab, kmr-Arab-TR, kmr-IQ, kmr-IR or kmr-SY.

Again, great job!
Comment 15 Eike Rathke 2013-12-02 11:43:55 UTC
(In reply to comment #14)
> (In reply to comment #6)
> > BCP 47 does not list a suppress-script for 'kmr' so kmr-Latn-TR at least is
> > technically correct, Ethnologue also lists Arabic and Cyrillic as possible
> > scripts. I can add kmr-TR as well to map to Latin.
> 
> In commit 6a826ddc4ee40a9727131cd4b13365bf6ae16319 it says the following:
> < "Kurdish, Southern (Iraq)" ; LANGUAGE_USER_KURDISH_SOUTHERN_IRAQ ; > ;
> 
> How about adding LANGUAGE_USER_KURDISH_SOUTHERN_IRAN in addition to or
> instead of the Iraq one? (I think Southern Kurdish is more dominant in Iran,
> with its 3 million speakers, than in Iraq.)

Darn, I overlooked that Iraq is only a small portion of speakers, I'll add Iran.

> 2. kmr-TR is Latin based, implicitly, (75% of all speakers = 15 million out
> of 20). kmr-IQ, kmr-IR and kmr-SY are instead Arabic implicitly.
> 
> Because of this, have you considered having kmr and kmr-TR be Latin
> implicitly (instead of having to write kmr-Latn)? For kmr in Arabic script
> one can then use kmr-Arab, kmr-Arab-TR, kmr-IQ, kmr-IR or kmr-SY.

As said in comment 6, the IANA language tag registration does not list a suppress-script that would be implicit (redundant) for kmr. If a language is written in more than one script, it is good practice to state the script explicitly. Deriving the script from a language-region combination is bad practice and should be avoided. I already added that kmr-TR read from a document is accepted and mapped to kmr-Latn-TR, but we'll write the fo:script='Latn' attribute when saving documents; the same goes for kmr-SY mapping to kmr-Latn-SY. Maybe I'll do something similar for kmr-IR to map to kmr-Arab-IR and others; so far we do not have any kmr-Arab-* mappings or language list entries.
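
The read-lenient, write-explicit behaviour described here can be sketched in Python as follows. This is an illustration under assumptions, not the LibreOffice implementation: the alias table and function name are hypothetical, only fo:script is named in this thread, and fo:language and fo:country are assumed to be the companion ODF attributes. The subtag parsing is deliberately simplified to language, optional script and optional two-letter region.

```python
# Hypothetical sketch: accept script-less tags on read, always emit the
# script explicitly on write. Not the LibreOffice implementation.
READ_ALIASES = {
    "kmr-TR": "kmr-Latn-TR",
    "kmr-SY": "kmr-Latn-SY",
}


def odf_language_attributes(tag: str) -> dict:
    """Split a language[-script][-region] tag into ODF-style attributes."""
    tag = READ_ALIASES.get(tag, tag)
    attrs = {}
    for index, subtag in enumerate(tag.split("-")):
        if index == 0:
            attrs["fo:language"] = subtag.lower()
        elif len(subtag) == 4 and subtag.isalpha():
            # Four-letter subtags are scripts, e.g. Latn or Arab.
            attrs["fo:script"] = subtag.title()
        elif len(subtag) == 2 and subtag.isalpha():
            # Two-letter subtags after the language are regions.
            attrs["fo:country"] = subtag.upper()
    return attrs
```

A tag without a script and without an alias, such as ckb-IQ, yields no fo:script attribute at all, so nothing is derived from the language-region combination.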
Comment 16 Commit Notification 2013-12-02 12:01:12 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=fefacbd92f4e3355ecd04841b8eacc75a4a67223

added Kurdish, Southern (Iran) [sdh-IR] to language list, fdo#63460
Comment 17 Commit Notification 2013-12-02 12:05:37 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "libreoffice-4-2":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=1ab139fdbe9a7ed3b781730dfecb83a98e0b671b&h=libreoffice-4-2

added Kurdish, Southern (Iran) [sdh-IR] to language list, fdo#63460

It will be available in LibreOffice 4.2.
Comment 18 Sahand T. 2013-12-08 16:35:06 UTC
(In reply to comment #15)

All right, great then. Thanks!