Locales for LibreOffice only include Northern Kurdish, and it is named "Kurdish" instead of the correct "Kurdish, Northern" ("Kurdish" encompasses all variants of the language, so it is unspecific). Central Kurdish and Southern Kurdish need to be added too, since they are basically different languages. Central and Southern Kurdish use a modified Arabic script by default and are RTL, whereas Northern Kurdish uses Latin script by default and is LTR.

I will suggest the mappings without country codes, because the variants of Kurdish are similar across countries and there is not much use in separating them by country. If country codes are an absolute must I can suggest them anyway.

The three-letter ISO 639-3 codes should be used in the locale settings for the Kurdish variants, not the two-letter ISO 639-1 code, and the suggested names in the following list should be used instead of simply "Kurdish".

---

Codes and mappings (ISO 639-3):

Mapping: ckb
Info: RTL, Unicode
Language list:
  English: Kurdish, Central
  French: Kurde, Centrale
  German: Kurdisch, Zentral
  Kurdish, Northern: Kurdí Sorani
  Swedish: Kurdiska, Central
  Turkish: Kürtçe, Orta

Mapping: ckb-Latn
Language list:
  English: Kurdish, Central (latin)
  French: Kurde, Centrale (latin)
  German: Kurdisch, Zentral (lateinisch)
  Kurdish, Northern: Kurdí Sorani (latînî)
  Swedish: Kurdiska, Central (latinsk)
  Turkish: Kürtçe, Orta (latin)

Mapping: kmr
Language list:
  English: Kurdish, Northern
  French: Kurdes du nord
  German: Kurdisch, Nördliches
  Kurdish, Northern: Kurdí Kurmancî
  Swedish: Kurdiska, Nordlig
  Turkish: Kürtçe, Kuzey

Mapping: sdh
Info: RTL, Unicode
Language list:
  English: Kurdish, Southern
  French: Kurdes du sud
  German: Kurdische, Süd
  Kurdish, Northern: Kurdí Xwarig
  Swedish: Kurdiska, Sydlig
  Turkish: Kürtçe, Güney

Mapping: sdh-Latn
Language list:
  English: Kurdish, Southern (latin)
  French: Kurdes du sud (latin)
  German: Kurdische, Süd (lateinisch)
  Kurdish, Northern: Kurdí Xwarig (latînî)
  Swedish: Kurdiska, Sydlig (latinsk)
  Turkish: Kürtçe, Güney (latin)

Note: About the ku-TR and ku-SY that are currently there: change the language list name for them from "Kurdish" to "Kurdish, Northern". This means that there will be three "Kurdish, Northern" entries in total (ku-TR, ku-SY and kmr). All language tools and localizations should be updated and assigned to kmr (and not to ku). Once that is done, I guess ku-TR and ku-SY can be removed. If someone still wants to differentiate between Northern Kurdish in different countries, the mappings kmr-XX can be used (and not ku-XX, as ku is the macrolanguage Kurdish and not specifically Northern Kurdish).

Note 2: sdh-Latn and ckb-Latn are LTR and use the same script as kmr. I think they use Unicode, but I'm not 100% sure.

Note 3: In the language list, when I write "Kurdish, Northern:", I mean the localization that is currently named "Kurdish". As I said, this localization should be renamed to "Kurdish, Northern".

---

Microsoft added Central Kurdish to Windows 8 and Office 2013. Here are the LCIDs for Windows 8 (they are probably the same for Office 2013):

  Central Kurdish, 0x0092, ku
  Central Kurdish, 0x7c92, ku-Arab
  Central Kurdish Iraq, 0x0492, ku-Arab-IQ

(source: http://download.microsoft.com/download/9/5/E/95EF66AF-9026-4BB0-A41D-A4F81802D92C/[MS-LCID].pdf )

A Word 2013 document written in Sorani/Arabic script and marked as "Central Kurdish Iraq" is assigned the code "ku-Arab-IQ" in the XML, just as Microsoft's LCID list says (so 0x0492 should be linked to "ckb"). Text written in Latin script can't be assigned "Central Kurdish Iraq" in Word 2013, which indicates that "Central Kurdish Iraq" ("ku-Arab-IQ") is reserved for Sorani/Arabic script only. Perhaps, then, "ku" is reserved for Central Kurdish in Latin script (if true, 0x0092 should be linked to "ckb-Latn") and "ku-Arab" for Central Kurdish across all countries (i.e. Iraq and Iran), meaning that 0x7c92 should be linked to "ckb". But it's hard to say at the moment.
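For illustration only, the proposed mappings and their writing direction can be written down as a small lookup table. This is a Python sketch, not LibreOffice code: the names PROPOSED_LOCALES and is_rtl are made up for this example, and the data is simply copied from the proposal in this comment.

```python
# Hypothetical lookup table built from the proposal in this comment.
# Not LibreOffice code; names and structure are assumptions for illustration.
PROPOSED_LOCALES = {
    "ckb":      {"name": "Kurdish, Central",          "rtl": True},
    "ckb-Latn": {"name": "Kurdish, Central (latin)",  "rtl": False},
    "kmr":      {"name": "Kurdish, Northern",         "rtl": False},
    "sdh":      {"name": "Kurdish, Southern",         "rtl": True},
    "sdh-Latn": {"name": "Kurdish, Southern (latin)", "rtl": False},
}


def is_rtl(tag):
    """Return True if the proposed mapping is written right-to-left."""
    return PROPOSED_LOCALES[tag]["rtl"]


if __name__ == "__main__":
    for tag, info in PROPOSED_LOCALES.items():
        direction = "RTL" if info["rtl"] else "LTR"
        print(f"{tag:9} {info['name']:27} {direction}")
```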
I agree this is an important change to make. Just one comment, regarding Sahand's suggestion to change the language list names for ku-TR and ku-SY from "Kurdish" to "Kurdish, Northern". I guess this is OK as a temporary hack but really the best solution is to move the existing localizations, spell checkers, etc. for Northern Kurdish to the correct code "kmr", once and for all.
This needs BCP47 language tags implementation to be fully solved, adding dependency on bug 37496.
(In reply to comment #2)
> This needs BCP47 language tags implementation to be fully solved, adding
> dependency on bug 37496.

If you add this, use these updated language lists and not the above ones:

Mapping: ku (ISO 639-1)
Language list:
  English: Kurdish
  French: Kurde
  German: Kurdisch
  Kurdish, Central (ckb): کوردی
  Kurdish, Northern (kmr): Kurdî
  Swedish: Kurdiska
  Turkish: Kürtçe

Codes and mappings (ISO 639-3):

Mapping: ckb
Info: RTL, Unicode
Language list:
  English: Kurdish, Sorani
  French: Kurde, Sorani
  German: Kurdisch, Sorani
  Kurdish, Central (ckb): کوردیی سۆرانی
  Kurdish, Northern (kmr): Kurdî, Sorani
  Swedish: Kurdiska, Sorani
  Turkish: Kürtçe, Sorani

Mapping: ckb-Latn
Language list:
  English: Kurdish, Sorani (latin)
  French: Kurde, Sorani (latin)
  German: Kurdisch, Sorani (lateinischer)
  Kurdish, Central (ckb): کوردیی سۆرانی (لاتینی)
  Kurdish, Northern (kmr): Kurdî, Soranî (latînî)
  Swedish: Kurdiska, Sorani (latinsk)
  Turkish: Kürtçe, Sorani (latin)

Mapping: kmr
Language list:
  English: Kurdish, Kurmanji
  French: Kurde, Kurmandji
  German: Kurdisch, Kurmandschi
  Kurdish, Central (ckb): کوردیی کرمانجی
  Kurdish, Northern (kmr): Kurdî Kurmancî
  Swedish: Kurdiska, Nord
  Turkish: Kürtçe, Kurmanci

Mapping: sdh
Info: RTL, Unicode
Language list:
  English: Kurdish, Southern
  French: Kurdes du sud
  German: Kurdisch, Süd
  Kurdish, Central (ckb): کوردیی باشووری
  Kurdish, Northern (kmr): Kurdî Xwarig
  Swedish: Kurdiska, Syd
  Turkish: Kürtçe, Güney

Mapping: sdh-Latn
Language list:
  English: Kurdish, Southern (latin)
  French: Kurdes du sud (latin)
  German: Kurdisch, Süd (lateinischer)
  Kurdish, Central (ckb): کوردیی باشووری (لاتینی)
  Kurdish, Northern (kmr): Kurdî Xwarig (latînî)
  Swedish: Kurdiska, Syd (latinsk)
  Turkish: Kürtçe, Güney (latin)
Let's check the current status/assignments and how to transition to the assignments above. Because script tags weren't supported previously, we defined a workaround that distinguishes them by different country assignments, and we have

  LANGUAGE_USER_KURDISH_TURKEY  0x0626  /* sublang 0x01, Latin script */   [ku-TR]
  LANGUAGE_USER_KURDISH_SYRIA   0x0A26  /* sublang 0x02, Latin script */   [ku-SY]
  LANGUAGE_USER_KURDISH_IRAQ    0x0E26  /* sublang 0x03, Arabic script */  [ku-IQ]
  LANGUAGE_USER_KURDISH_IRAN    0x1226  /* sublang 0x04, Arabic script */  [ku-IR]

First of all, 'ku' is an ISO 639-1 macrolanguage code and should be avoided in distinct language assignments where more specific ISO 639-1 or 639-3 codes exist. That should be 'ckb', 'kmr' and 'sdh' instead, where 'ckb' and 'sdh' already imply Arabic script.

So according to that PDF Microsoft assigned

  Central Kurdish       0x0092  /* sublang 0x00, Latin script? */  [ku]
  Central Kurdish       0x7C92  /* sublang 0x1f, Arabic script */  [ku-Arab]
  Central Kurdish Iraq  0x0492  /* sublang 0x01, Arabic script */  [ku-Arab-IQ]

and no other Kurdish tags. Great :-/

http://msdn.microsoft.com/library/dd318693.aspx mentions _only_ 0x0492 and nothing else, you gotta love'em :-(

At least it can be detected that since Windows 7 they seem to use sublanguage IDs in the range 0x1d to 0x1f for a primary language with script type and without country. That doesn't help much, because 0x0492 cannot be deduced from that and is rather 0x0092 with the default 0x01 sublanguage ID.

When defining new language entries for LibreOffice we should always include a country to form a default locale of that language, to make it available for locale fallback mechanisms. So here the results would be something like

  LANGUAGE_KURDISH_ARABIC_IRAQ         0x0492  [ckb-IQ]
  LANGUAGE_KURDISH_ARABIC_IRAQ         0x0492  [ku-Arab-IQ]
  LANGUAGE_KURDISH_ARABIC_IRAQ         0x0492  [ku-IQ]
  LANGUAGE_OBSOLETE_USER_KURDISH_IRAQ  0x0E26  [ckb-IQ]

with mappings to the preferred ISO 639-3 code, the ISO 639-1 macrolanguage code, and the old values and codes, with

  #define LANGUAGE_OBSOLETE_USER_KURDISH_IRAQ 0x0E26
  #define LANGUAGE_USER_KURDISH_IRAQ LANGUAGE_KURDISH_ARABIC_IRAQ

How to transition the other three, ku-TR, ku-SY and ku-IR?

  ku-TR would be kmr-Latn-TR, Northern Kurdish Latin script.
  ku-SY would be kmr-Latn-SY, Northern Kurdish Latin script.

But ku-IR? Which Kurdish would it be? ckb-IR?

The defaults for the new languages would be

  Kurdish, Northern, Latin script  [kmr-Latn-TR]
  Kurdish, Central, Arabic script  [ckb-IQ]
  Kurdish, Southern, Arabic script [sdh-IQ]

References I consulted:
  http://www.ethnologue.com/language/kmr
  http://www.ethnologue.com/language/ckb
  http://www.ethnologue.com/language/sdh
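The sublanguage values quoted in this comment can be checked by splitting each 16-bit value the way Windows LANGIDs are laid out: the low 10 bits are the primary language ID and the high 6 bits are the sublanguage ID. The following Python sketch does only that; the function name split_langid and the labels in the table are made up for this illustration and are not LibreOffice identifiers.

```python
def split_langid(langid):
    """Split a 16-bit Windows LANGID into (primary language ID, sublanguage ID)."""
    primary = langid & 0x3FF          # low 10 bits
    sublang = (langid >> 10) & 0x3F   # high 6 bits
    return primary, sublang


# Values taken from this comment; the labels are informal.
LANGIDS = {
    "ku-TR (user-defined)": 0x0626,
    "ku-SY (user-defined)": 0x0A26,
    "ku-IQ (user-defined)": 0x0E26,
    "ku-IR (user-defined)": 0x1226,
    "ku (Microsoft)": 0x0092,
    "ku-Arab (Microsoft)": 0x7C92,
    "ku-Arab-IQ (Microsoft)": 0x0492,
}

if __name__ == "__main__":
    for label, value in LANGIDS.items():
        primary, sublang = split_langid(value)
        print(f"{label:24} 0x{value:04X} -> primary 0x{primary:03X}, sublang 0x{sublang:02X}")
```

The four user-defined entries share the primary ID 0x226 and differ only in the sublanguage, while the three Microsoft entries share the primary ID 0x92.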
The transitions should be as you said for ku-TR, ku-SY and ku-IQ:

  ku-TR => kmr-Latn-TR, Northern Kurdish Latin script.
  ku-SY => kmr-Latn-SY, Northern Kurdish Latin script.
  ku-IQ => ckb-IQ, Central Kurdish (Arabic script implied).

Info from Ethnologue about speakers of Kurdish dialects in Iran:

  ckb: 3,250,000
  sdh: 3,000,000
  kmr: 350,000

So it should be either sdh-IR or ckb-IR. Sdh is broken down into subdialects whereas ckb basically represents Sorani alone, so ckb-IR might be the better choice. You should consult with someone else though, it's a bit tricky.

Another thing: are you supposed to write kmr-Latn explicitly? As I understand it, kmr is Latin-based by implication, and only the Arabic script should be written out explicitly (i.e. kmr-Arab-TR but not kmr-Latn-TR).
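As a sketch of the transitions agreed on so far, the old tags could be mapped to the new ones with a plain dictionary. This is illustrative Python, not the actual LibreOffice implementation; LEGACY_TO_NEW and migrate_tag are hypothetical names. ku-IR is left out on purpose, because the choice between ckb-IR and sdh-IR is still open at this point in the thread.

```python
# Transitions confirmed in this comment. ku-IR is intentionally missing:
# whether it should become ckb-IR or sdh-IR has not been settled here.
LEGACY_TO_NEW = {
    "ku-TR": "kmr-Latn-TR",  # Northern Kurdish, Latin script
    "ku-SY": "kmr-Latn-SY",  # Northern Kurdish, Latin script
    "ku-IQ": "ckb-IQ",       # Central Kurdish, Arabic script implied
}


def migrate_tag(tag):
    """Return the new tag for a legacy one, or the tag unchanged if no mapping is known."""
    return LEGACY_TO_NEW.get(tag, tag)


if __name__ == "__main__":
    for old in ("ku-TR", "ku-SY", "ku-IQ", "ku-IR"):
        print(f"{old} -> {migrate_tag(old)}")
```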
Thanks for confirming. BCP 47 does not list a suppress-script for 'kmr' so kmr-Latn-TR at least is technically correct, Ethnologue also lists Arabic and Cyrillic as possible scripts. I can add kmr-TR as well to map to Latin.
Eike Rathke committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=6a826ddc4ee40a9727131cd4b13365bf6ae16319 cleaned up ISO code usage for Kurdish, fdo#63460 The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Eike Rathke committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/dictionaries/commit/?id=d626d8b18cbb14825632900a02c7291912855f73 renamed ku* to kmr-Latn*, fdo#63460
Eike Rathke committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/help/commit/?id=aada8cbd4eb6e04bcf3df4282392f312073c2285 renamed ku* to kmr-Latn*, fdo#63460
Eike Rathke committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/dictionaries/commit/?id=eece7b1bc8579d7bdcbebeac67dcdc676617996e renamed ku* to kmr-Latn*, fdo#63460
Eike Rathke committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/dictionaries/commit/?id=f79819b8eadd02f6bfc1131d7824b1948e7ee963 renamed ku* to kmr-Latn*, fdo#63460
Eike Rathke committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/dictionaries/commit/?id=8d1a0d88df029906968a3bb12da5f7a832e9b8a1 renamed ku* to kmr-Latn*, fdo#63460
Eike Rathke committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=444ea16c81746518897ed0643c5872d7cb9e277e renamed ku* to kmr-Latn*, fdo#63460
(In reply to comment #6)
> Thanks for confirming.
>
> BCP 47 does not list a suppress-script for 'kmr' so kmr-Latn-TR at least is
> technically correct, Ethnologue also lists Arabic and Cyrillic as possible
> scripts. I can add kmr-TR as well to map to Latin.

Hi Eike, thank you for this! Two things:

1. In commit 6a826ddc4ee40a9727131cd4b13365bf6ae16319 it says the following:

  < "Kurdish, Southern (Iraq)" ; LANGUAGE_USER_KURDISH_SOUTHERN_IRAQ ; > ;

How about adding LANGUAGE_USER_KURDISH_SOUTHERN_IRAN in addition to or instead of the Iraq one? (I think Southern Kurdish is more dominant in Iran, with its 3 million speakers, than in Iraq.)

2. kmr-TR is Latin based, implicitly (75% of all speakers = 15 million out of 20). kmr-IQ, kmr-IR and kmr-SY are instead Arabic implicitly.

Because of this, have you considered having kmr and kmr-TR be Latin implicitly (instead of having to write kmr-Latn)? For kmr in Arabic script one can then use kmr-Arab, kmr-Arab-TR, kmr-IQ, kmr-IR or kmr-SY.

Again, great job!
(In reply to comment #14)
> (In reply to comment #6)
> > BCP 47 does not list a suppress-script for 'kmr' so kmr-Latn-TR at least is
> > technically correct, Ethnologue also lists Arabic and Cyrillic as possible
> > scripts. I can add kmr-TR as well to map to Latin.
>
> In commit 6a826ddc4ee40a9727131cd4b13365bf6ae16319 it says the following:
> < "Kurdish, Southern (Iraq)" ; LANGUAGE_USER_KURDISH_SOUTHERN_IRAQ ; > ;
>
> How about adding LANGUAGE_USER_KURDISH_SOUTHERN_IRAN in addition to or
> instead of the Iraq one? (I think Southern Kurdish is more dominant in Iran,
> with its 3 million speakers, than in Iraq.)

Darn, I overlooked that Iraq is only a small portion of speakers, I'll add Iran.

> 2. kmr-TR is Latin based, implicitly, (75% of all speakers = 15 million out
> of 20). kmr-IQ, kmr-IR and kmr-SY are instead Arabic implicitly.
>
> Because of this, have you considered having kmr and kmr-TR be Latin
> implicitly (instead of having to write kmr-Latn)? For kmr in Arabic script
> one can then use kmr-Arab, kmr-Arab-TR, kmr-IQ, kmr-IR or kmr-SY.

As said in comment 6, the IANA language tag registration does not list a suppress-script that would be implicit (redundant) for kmr. If a language is written in more than one script it is good practice to explicitly state the script. Deriving the script from a language-region combination is bad practice and should be avoided.

I already added that reading kmr-TR from a document will be accepted and mapped to kmr-Latn-TR, but we'll write the fo:script='Latn' attribute when saving documents; the same goes for kmr-SY mapping to kmr-Latn-SY. Maybe I'll do something similar for kmr-IR to map to kmr-Arab-IR and others; so far we do not have any kmr-Arab-* mappings or language list entries.
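The read/write behaviour described here can be sketched roughly as follows: a tag without a script is accepted on read and completed with an explicit script, and the script is then always written out on save. This is a simplified Python illustration, not the LibreOffice code. The names DEFAULT_SCRIPT, parse_tag, normalize and odf_attributes are hypothetical, the parser only handles plain language[-Script][-REGION] tags, and the fo:language and fo:country attribute names are an assumption based on the ODF format; only fo:script is mentioned in this comment.

```python
# Script to fill in when a tag is read without one. Only the two mappings
# mentioned in this comment are listed; kmr-IR and others are not defined yet.
DEFAULT_SCRIPT = {
    ("kmr", "TR"): "Latn",
    ("kmr", "SY"): "Latn",
}


def parse_tag(tag):
    """Split a simple language[-Script][-REGION] tag into its three parts."""
    parts = tag.split("-")
    language, script, country = parts[0].lower(), None, None
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():
            script = part.title()
        elif len(part) == 2 and part.isalpha():
            country = part.upper()
    return language, script, country


def normalize(tag):
    """Accept a tag without script on read and make the script explicit."""
    language, script, country = parse_tag(tag)
    if script is None:
        script = DEFAULT_SCRIPT.get((language, country))
    return "-".join(p for p in (language, script, country) if p)


def odf_attributes(tag):
    """Attributes to write on save (attribute names partly assumed, see above)."""
    language, script, country = parse_tag(normalize(tag))
    attrs = {"fo:language": language}
    if script:
        attrs["fo:script"] = script
    if country:
        attrs["fo:country"] = country
    return attrs


if __name__ == "__main__":
    for tag in ("kmr-TR", "kmr-SY", "kmr-Latn-TR", "ckb-IQ"):
        print(tag, "->", normalize(tag), odf_attributes(tag))
```

Tags that have no entry in the table, such as ckb-IQ, pass through unchanged, which matches the point that a script should not be guessed from a language-region pair in general.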
Eike Rathke committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=fefacbd92f4e3355ecd04841b8eacc75a4a67223 added Kurdish, Southern (Iran) [sdh-IR] to language list, fdo#63460
Eike Rathke committed a patch related to this issue. It has been pushed to "libreoffice-4-2": http://cgit.freedesktop.org/libreoffice/core/commit/?id=1ab139fdbe9a7ed3b781730dfecb83a98e0b671b&h=libreoffice-4-2 added Kurdish, Southern (Iran) [sdh-IR] to language list, fdo#63460 It will be available in LibreOffice 4.2.
(In reply to comment #15)

All right, great then. Thanks!