| Summary: | Northern/Central/Southern Kurdish | ||
|---|---|---|---|
| Product: | LibreOffice | Reporter: | Andras Timar <timar74> |
| Component: | Localization | Assignee: | Eike Rathke <erack> |
| Status: | RESOLVED FIXED | ||
| Severity: | enhancement | CC: | erack, kscanne, sashther |
| Priority: | medium | ||
| Version: | unspecified | ||
| Hardware: | Other | ||
| OS: | All | ||
| See Also: | https://bugs.documentfoundation.org/show_bug.cgi?id=124856 | ||
| Whiteboard: | target:4.2.0 target:4.3.0 | ||
| Crash report or crash signature: | Regression By: | ||
| Bug Depends on: | 37496 | ||
| Bug Blocks: | |||
|
Description
Andras Timar
2013-04-12 09:32:01 UTC
I agree this is an important change to make. Just one comment, regarding Sahand's suggestion to change the language list names for ku-TR and ku-SY from "Kurdish" to "Kurdish, Northern". I guess this is OK as a temporary hack but really the best solution is to move the existing localizations, spell checkers, etc. for Northern Kurdish to the correct code "kmr", once and for all.

This needs BCP47 language tags implementation to be fully solved, adding dependency on bug 37496.

(In reply to comment #2)
> This needs BCP47 language tags implementation to be fully solved, adding
> dependency on bug 37496.

If you add this, use these updated language lists and not the above ones:

Mapping: ku (ISO 639-1)
Language list:
    English: Kurdish
    French: Kurde
    German: Kurdisch
    Kurdish, Central (ckb): کوردی
    Kurdish, Northern (kmr): Kurdî
    Swedish: Kurdiska
    Turkish: Kürtçe

Codes and mappings (ISO 639-3):

Mapping: ckb
Info: RTL, unicode
Language list:
    English: Kurdish, Sorani
    French: Kurde, Sorani
    German: Kurdisch, Sorani
    Kurdish, Central (ckb): کوردیی سۆرانی
    Kurdish, Northern (kmr): Kurdî, Sorani
    Swedish: Kurdiska, Sorani
    Turkish: Kürtçe, Sorani

Mapping: ckb-Latn
Language list:
    English: Kurdish, Sorani (latin)
    French: Kurde, Sorani (latin)
    German: Kurdisch, Sorani (lateinischer)
    Kurdish, Central (ckb): کوردیی سۆرانی (لاتینی)
    Kurdish, Northern (kmr): Kurdî, Soranî (latînî)
    Swedish: Kurdiska, Sorani (latinsk)
    Turkish: Kürtçe, Sorani (latin)

Mapping: kmr
Language list:
    English: Kurdish, Kurmanji
    French: Kurde, Kurmandji
    German: Kurdisch, Kurmandschi
    Kurdish, Central (ckb): کوردیی کرمانجی
    Kurdish, Northern (kmr): Kurdî Kurmancî
    Swedish: Kurdiska, Nord
    Turkish: Kürtçe, Kurmanci

Mapping: sdh
Info: RTL, unicode
Language list:
    English: Kurdish, Southern
    French: Kurdes du sud
    German: Kurdisch, Süd
    Kurdish, Central (ckb): کوردیی باشووری
    Kurdish, Northern (kmr): Kurdî Xwarig
    Swedish: Kurdiska, Syd
    Turkish: Kürtçe, Güney

Mapping: sdh-Latn
Language list:
    English: Kurdish, Southern (latin)
    French: Kurdes du sud (latin)
    German: Kurdisch, Süd (lateinischer)
    Kurdish, Central (ckb): کوردیی باشووری (لاتینی)
    Kurdish, Northern (kmr): Kurdî Xwarig (latînî)
    Swedish: Kurdiska, Syd (latinsk)
    Turkish: Kürtçe, Güney (latin)

Let's check the current status/assignments and how to transition to the assignments above.

Because previously script tags weren't supported we defined a workaround that distinguishes them by different country assignments and we have

    LANGUAGE_USER_KURDISH_TURKEY 0x0626 /* sublang 0x01, Latin script */ [ku-TR]
    LANGUAGE_USER_KURDISH_SYRIA  0x0A26 /* sublang 0x02, Latin script */ [ku-SY]
    LANGUAGE_USER_KURDISH_IRAQ   0x0E26 /* sublang 0x03, Arabic script */ [ku-IQ]
    LANGUAGE_USER_KURDISH_IRAN   0x1226 /* sublang 0x04, Arabic script */ [ku-IR]

First said, 'ku' is an ISO 639-1 macrolanguage code and should be avoided in distinct language assignments where more specific ISO 639-1 or 639-3 codes exist. That should be 'ckb', 'kmr' and 'sdh' instead, where 'ckb' and 'sdh' already imply Arabic script.

So according to that PDF Microsoft assigned

    Central Kurdish      0x0092 /* sublang 0x00, Latin script? */ [ku]
    Central Kurdish      0x7C92 /* sublang 0x1f, Arabic script */ [ku-Arab]
    Central Kurdish Iraq 0x0492 /* sublang 0x01, Arabic script */ [ku-Arab-IQ]

and no other Kurdish tags. Great :-/

http://msdn.microsoft.com/library/dd318693.aspx mentions _only_ 0x0492 and nothing else, you gotta love'em :-(

At least it can be detected that since Windows 7 they seem to use sublanguage IDs in the range 0x1d to 0x1f for primary language with script type and without country. Which doesn't help much because 0x0492 can not be deduced from that and is rather 0x0092 with the default 0x01 sublanguage ID.

When defining new language entries for LibreOffice we should always include a country to form a default locale of that language to make it available for locale fallback mechanisms.
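
As a side note, the sublanguage values annotated in the lists above can be checked mechanically, because a Windows-style LANGID keeps the primary language in its low 10 bits and the sublanguage in the high 6 bits. The following is only a minimal sketch in Python: the function names and the `QUOTED_IDS` dict are made up for illustration and are not LibreOffice code; only the hex values and tags come from the comment above.

```python
from typing import Tuple

PRIMARY_MASK = 0x3FF   # low 10 bits: primary language ID
SUBLANG_SHIFT = 10     # high 6 bits: sublanguage ID


def split_langid(langid: int) -> Tuple[int, int]:
    """Split a 16-bit LANGID into (primary language, sublanguage)."""
    return langid & PRIMARY_MASK, langid >> SUBLANG_SHIFT


def make_langid(primary: int, sublang: int) -> int:
    """Combine a primary language ID and a sublanguage ID into a LANGID."""
    return (sublang << SUBLANG_SHIFT) | primary


# Hex values and tags as quoted in the comment; the dict is illustrative only.
QUOTED_IDS = {
    "ku-TR": 0x0626,
    "ku-SY": 0x0A26,
    "ku-IQ": 0x0E26,
    "ku-IR": 0x1226,
    "ku": 0x0092,
    "ku-Arab": 0x7C92,
    "ku-Arab-IQ": 0x0492,
}

if __name__ == "__main__":
    for tag, langid in QUOTED_IDS.items():
        primary, sublang = split_langid(langid)
        print(f"{tag:<11} 0x{langid:04X} "
              f"primary=0x{primary:03X} sublang=0x{sublang:02X}")
```

Decoded this way, the four user-defined entries share primary language 0x226 with sublanguages 0x01 to 0x04, and the three Microsoft entries share primary language 0x092, which matches the sublang annotations quoted above.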
So here the results would be something like

    LANGUAGE_KURDISH_ARABIC_IRAQ        0x0492 [ckb-IQ]
    LANGUAGE_KURDISH_ARABIC_IRAQ        0x0492 [ku-Arab-IQ]
    LANGUAGE_KURDISH_ARABIC_IRAQ        0x0492 [ku-IQ]
    LANGUAGE_OBSOLETE_USER_KURDISH_IRAQ 0x0E26 [ckb-IQ]

with mappings to the preferred ISO 639-3 code and the ISO 639-1 macrolanguage code and the old values and codes, with

    #define LANGUAGE_OBSOLETE_USER_KURDISH_IRAQ 0x0E26
    #define LANGUAGE_USER_KURDISH_IRAQ LANGUAGE_KURDISH_ARABIC_IRAQ

How to transition the other three, ku-TR, ku-SY and ku-IR?

ku-TR would be kmr-Latn-TR, Northern Kurdish Latin script.
ku-SY would be kmr-Latn-SY, Northern Kurdish Latin script.
But ku-IR? Which Kurdish would it be? ckb-IR?

The defaults for the new languages would be

    Kurdish, Northern, Latin script [kmr-Latn-TR]
    Kurdish, Central, Arabic script [ckb-IQ]
    Kurdish, Southern, Arabic script [sdh-IQ]

References I consulted
http://www.ethnologue.com/language/kmr
http://www.ethnologue.com/language/ckb
http://www.ethnologue.com/language/sdh

The transitions should be as you said for ku-TR, ku-SY and ku-IQ.

ku-TR => kmr-Latn-TR, Northern Kurdish Latin script.
ku-SY => kmr-Latn-SY, Northern Kurdish Latin script.
ku-IQ => ckb-IQ, Central Kurdish (Arabic script implied).

Info from Ethnologue about speakers of Kurdish dialects in Iran:

ckb: 3,250,000
sdh: 3,000,000
kmr: 350,000

So it should be either sdh-IR or ckb-IR. Sdh is broken down into subdialects whereas ckb basically represents Sorani alone, therefore ckb-IR might be the better choice. You should consult with someone else though, it's a bit tricky.

Another thing, are you supposed to explicitly write kmr-Latn? As I understand it kmr is Latin-based by implication, only the Arabic script should be written out explicitly. (I.e. kmr-Arab-TR but not kmr-Latn-TR.)

Thanks for confirming.

BCP 47 does not list a suppress-script for 'kmr' so kmr-Latn-TR at least is technically correct, Ethnologue also lists Arabic and Cyrillic as possible scripts.
I can add kmr-TR as well to map to Latin.

Eike Rathke committed a patch related to this issue. It has been pushed to "master":
http://cgit.freedesktop.org/libreoffice/core/commit/?id=6a826ddc4ee40a9727131cd4b13365bf6ae16319
cleaned up ISO code usage for Kurdish, fdo#63460
The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.

Eike Rathke committed a patch related to this issue. It has been pushed to "master":
http://cgit.freedesktop.org/libreoffice/dictionaries/commit/?id=d626d8b18cbb14825632900a02c7291912855f73
renamed ku* to kmr-Latn*, fdo#63460

Eike Rathke committed a patch related to this issue. It has been pushed to "master":
http://cgit.freedesktop.org/libreoffice/help/commit/?id=aada8cbd4eb6e04bcf3df4282392f312073c2285
renamed ku* to kmr-Latn*, fdo#63460

Eike Rathke committed a patch related to this issue. It has been pushed to "master":
http://cgit.freedesktop.org/libreoffice/dictionaries/commit/?id=eece7b1bc8579d7bdcbebeac67dcdc676617996e
renamed ku* to kmr-Latn*, fdo#63460

Eike Rathke committed a patch related to this issue. It has been pushed to "master":
http://cgit.freedesktop.org/libreoffice/dictionaries/commit/?id=f79819b8eadd02f6bfc1131d7824b1948e7ee963
renamed ku* to kmr-Latn*, fdo#63460

Eike Rathke committed a patch related to this issue. It has been pushed to "master":
http://cgit.freedesktop.org/libreoffice/dictionaries/commit/?id=8d1a0d88df029906968a3bb12da5f7a832e9b8a1
renamed ku* to kmr-Latn*, fdo#63460

Eike Rathke committed a patch related to this issue. It has been pushed to "master":
http://cgit.freedesktop.org/libreoffice/core/commit/?id=444ea16c81746518897ed0643c5872d7cb9e277e
renamed ku* to kmr-Latn*, fdo#63460

(In reply to comment #6)
> Thanks for confirming.
>
> BCP 47 does not list a suppress-script for 'kmr' so kmr-Latn-TR at least is
> technically correct, Ethnologue also lists Arabic and Cyrillic as possible
> scripts.
> I can add kmr-TR as well to map to Latin.

Hi Eike, thank you for this! Two things:

1. In commit 6a826ddc4ee40a9727131cd4b13365bf6ae16319 it says the following:
< "Kurdish, Southern (Iraq)" ; LANGUAGE_USER_KURDISH_SOUTHERN_IRAQ ; > ;

How about adding LANGUAGE_USER_KURDISH_SOUTHERN_IRAN in addition to or instead of the Iraq one? (I think Southern Kurdish is more dominant in Iran, with its 3 million speakers, than in Iraq.)

2. kmr-TR is Latin based, implicitly, (75% of all speakers = 15 million out of 20). kmr-IQ, kmr-IR and kmr-SY are instead Arabic implicitly.

Because of this, have you considered having kmr and kmr-TR be Latin implicitly (instead of having to write kmr-Latn)? For kmr in Arabic script one can then use kmr-Arab, kmr-Arab-TR, kmr-IQ, kmr-IR or kmr-SY.

Again, great job!

(In reply to comment #14)
> (In reply to comment #6)
> > BCP 47 does not list a suppress-script for 'kmr' so kmr-Latn-TR at least is
> > technically correct, Ethnologue also lists Arabic and Cyrillic as possible
> > scripts. I can add kmr-TR as well to map to Latin.
>
> In commit 6a826ddc4ee40a9727131cd4b13365bf6ae16319 it says the following:
> < "Kurdish, Southern (Iraq)" ; LANGUAGE_USER_KURDISH_SOUTHERN_IRAQ ; > ;
>
> How about adding LANGUAGE_USER_KURDISH_SOUTHERN_IRAN in addition to or
> instead of the Iraq one? (I think Southern Kurdish is more dominant in Iran,
> with its 3 million speakers, than in Iraq.)

Darn, I overlooked that Iraq is only a small portion of speakers, I'll add Iran.

> 2. kmr-TR is Latin based, implicitly, (75% of all speakers = 15 million out
> of 20). kmr-IQ, kmr-IR and kmr-SY are instead Arabic implicitly.
>
> Because of this, have you considered having kmr and kmr-TR be Latin
> implicitly (instead of having to write kmr-Latn)? For kmr in Arabic script
> one can then use kmr-Arab, kmr-Arab-TR, kmr-IQ, kmr-IR or kmr-SY.

As said in comment 6, the IANA language tag registration does not list a suppress-script that would be implicit (redundant) for kmr.
If a language is written in more than one script it is good practice to explicitly state the script. Deriving the script from a language-region combination is bad practice and should be avoided. I already added that reading kmr-TR from a document will be accepted and mapped to kmr-Latn-TR, but we'll write the fo:script='Latn' attribute when saving documents, same for kmr-SY mapping to kmr-Latn-SY. Maybe I'll do similar for kmr-IR to map to kmr-Arab-IR and others, so far we do not have any kmr-Arab-* mappings or language list entries.

Eike Rathke committed a patch related to this issue. It has been pushed to "master":
http://cgit.freedesktop.org/libreoffice/core/commit/?id=fefacbd92f4e3355ecd04841b8eacc75a4a67223
added Kurdish, Southern (Iran) [sdh-IR] to language list, fdo#63460

Eike Rathke committed a patch related to this issue. It has been pushed to "libreoffice-4-2":
http://cgit.freedesktop.org/libreoffice/core/commit/?id=1ab139fdbe9a7ed3b781730dfecb83a98e0b671b&h=libreoffice-4-2
added Kurdish, Southern (Iran) [sdh-IR] to language list, fdo#63460
It will be available in LibreOffice 4.2.

(In reply to comment #15)
All right, great then. Thanks! |
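
For readers who want the outcome of this thread in executable form, here is a rough Python sketch of the tag handling described above: legacy ku-* tags move to the more specific codes, script-less kmr-TR and kmr-SY are accepted on read, and the script is always written out explicitly on save. This is an illustration only, not the LibreOffice implementation. The table and function names are hypothetical, the tag parser is deliberately simplified to `language[-Script][-REGION]`, and pairing `fo:script` with `fo:language` and `fo:country` in one dict is an assumption made for the example. ku-IR is left unmapped because the thread did not settle on ckb-IR or sdh-IR.

```python
from typing import Dict, Optional, Tuple

# Legacy tags and the replacements agreed on in the thread.
# ku-IR is intentionally absent: ckb-IR vs. sdh-IR was left open.
LEGACY_TO_CURRENT = {
    "ku-TR": "kmr-Latn-TR",
    "ku-SY": "kmr-Latn-SY",
    "ku-IQ": "ckb-IQ",
}

# Script-less tags that are accepted on read and given an explicit script.
READ_ALIASES = {
    "kmr-TR": "kmr-Latn-TR",
    "kmr-SY": "kmr-Latn-SY",
}


def normalize(tag: str) -> str:
    """Map a legacy or script-less tag to its explicit form, if one is known."""
    tag = LEGACY_TO_CURRENT.get(tag, tag)
    return READ_ALIASES.get(tag, tag)


def split_tag(tag: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Split a simple language[-Script][-REGION] tag (simplified parser)."""
    parts = tag.split("-")
    language = parts[0].lower()
    script: Optional[str] = None
    region: Optional[str] = None
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():
            script = part.title()    # four letters: script subtag, e.g. Latn
        elif len(part) == 2 and part.isalpha():
            region = part.upper()    # two letters: region subtag, e.g. TR
    return language, script, region


def write_attributes(tag: str) -> Dict[str, str]:
    """Return the attributes to store for a tag, with the script made explicit."""
    language, script, region = split_tag(normalize(tag))
    attrs = {"fo:language": language}
    if script is not None:
        attrs["fo:script"] = script
    if region is not None:
        attrs["fo:country"] = region
    return attrs


if __name__ == "__main__":
    for tag in ("ku-TR", "kmr-SY", "ku-IQ", "ku-IR"):
        print(tag, "->", normalize(tag), write_attributes(tag))
```

Keeping the read aliases separate from the legacy table mirrors the distinction made in the thread: the ku-* entries are a one-time transition, while kmr-TR and kmr-SY stay valid input that is only made explicit on output.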