Bug 66376 - i18n: change locale data [hr-HR]
Summary: i18n: change locale data [hr-HR]
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Localization (show other bugs)
Version:
(earliest affected)
unspecified
Hardware: All All
: medium enhancement
Assignee: Eike Rathke
URL:
Whiteboard: target:4.2.0 target:4.1.1
Keywords:
Depends on:
Blocks:
 
Reported: 2013-06-29 18:02 UTC by Krunoslav Šebetić
Modified: 2013-11-14 14:39 UTC (History)
1 user (show)

See Also:
Crash report or crash signature:


Attachments
hr_HR.xml local file (15.43 KB, text/xml)
2013-06-29 18:02 UTC, Krunoslav Šebetić
Details
diff of re-indented file against current revision (29.82 KB, patch)
2013-07-01 17:46 UTC, Eike Rathke
Details
few modification to a croatian localization file (7.01 KB, patch)
2013-07-18 16:30 UTC, Krunoslav Šebetić
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Krunoslav Šebetić 2013-06-29 18:02:27 UTC
Created attachment 81702 [details]
hr_HR.xml local file

Tried to improve decimals and time specs, most done on dates - added (hope so) genitive case for log date formats, added spaces present date formats where needed.
Comment 1 Mikeyy - L10n HR 2013-06-29 21:21:04 UTC
Please put this on hold until we recheck it.
Comment 2 Eike Rathke 2013-07-01 17:44:16 UTC
Please note that we already have [hr-HR] locale data in i18npool/source/localedata/data/hr_HR.xml, see also http://cgit.freedesktop.org/libreoffice/core/tree/i18npool/source/localedata/data/hr_HR.xml

For already existing files please submit a diff instead of entire new files, preferably against the latest revision available.

I just sorted the FormatElement entries in that committed file by formatindex for comparability, but even with the indentation we use applied to the data you provided there are too many changes, I'll attach the diff so you can check.

Specifically I have doubts about the change of the group separator from '.' dot to ' ' space, http://www.unicode.org/cldr/charts/by_type/patterns.numbers.html#a1ef41eaeb6982d lists the '.' dot as group separator.

Additionally, the negative currency formats introduce the () parentheses formats, which I think are wrong, see also https://wiki.documentfoundation.org/LibreOffice_Localization_Guide/How_To_Submit_New_Locale_Data Pitfalls.

Not sure about the change of LongDateDaySeparator.

-  <LC_FORMAT replaceFrom="[CURRENCY]" replaceTo="[$kn-41A]">
+  <LC_FORMAT replaceFrom="[CURRENCY]" replaceTo="[$Kn-041a]">

changes currency symbol from 'kn' to 'Kn' (uppercase K). According to https://en.wikipedia.org/wiki/Croatian_kuna the curency symbol is 'kn' all lowercase. Note also that the hexadecimal number must be uppercase, so 41A.

+    <IndexKey phonetic="false" default="true" unoid="alphanumeric">A-B-C-Č-Ć-D-Đ-DŽ-E-F-G-H-I-J-K-L-LJ-M-N-NJ-O-P-R-S-Š-T-U-V-Z-Ž</IndexKey>

is wrong in using the '-' between all letters, which is reserved to express a range between two letters, please see documentation in i18npool/source/localedata/data/locale.dtd http://cgit.freedesktop.org/libreoffice/core/tree/i18npool/source/localedata/data/locale.dtd

In the calendar the month names and abbreviations are now capitalized, not sure if that is right or wrong, the ICU locale explorer says they aren't, https://ssl.icu-project.org/icu-bin/locexp?d_=en&_=hr_HR

Changes in <LC_TRANSLITERATION> element probably just due to overwriting existing data.

Elements of <LC_NumberingLevel> are different.

Elements of <LC_OutLineNumberingLevel> are removed and replaced by <LC_OutLineNumberingLevel ref="en_US"/>, most certainly not correct.

Summary: this change can't be applied.
Comment 3 Eike Rathke 2013-07-01 17:46:30 UTC
Created attachment 81826 [details]
diff of re-indented file against current revision
Comment 4 Mikeyy - L10n HR 2013-07-01 17:56:40 UTC
Thanks for your comment.
Does current hr_HR.xml file which is in use in LO contain all needed / possible fields?

Want to know if we need to recheck if all xml fields are there and we can just edit current file, or we maybe need to add new xml fields?
Comment 5 Eike Rathke 2013-07-01 21:28:34 UTC
It contains all elements. Only that it has <LC_COLLATION ref="sh_RS"/> which means that for the LC_COLLATION element it inherits the content defined in the sh_RS.xml file.
Comment 6 Krunoslav Šebetić 2013-07-18 16:30:13 UTC
Created attachment 82620 [details]
few modification to a croatian localization file

Added few date formats, modified LongDateDaySeparator and more...
Comment 7 Eike Rathke 2013-07-18 17:33:05 UTC
Looks good, except some nitpicks I have:

* FormatElement formatindex 50 and 51 are inserted instead of appended
  to the list (really minor nitpick ;-)

* FormatElement formatindex="24" was changed from
  <FormatCode>D, MMM YYYY</FormatCode> to
  <FormatCode>D. M. YYYY.</FormatCode>
  * I suggest to keep the original element and add another element
    instead.
    * Reason: when formats <50 are used with system standard locale (not
    hard set to a specific locale), and the standard locale is switched
    to another locale without reloading the document, the numerically
    equivalent format of that other locale is applied. Formats' meanings
    should match across locales, this is not enforced but it should..

* FormatElement formatindex="52"
  I thought this should use non-breaking spaces, as you asked about it
  in mail? The diff is with ordinary space.

* DateAcceptancePattern elements for the new formats with spaces are
  missing.

The rest looks good. If you have problems with any of these just tell me
and I'll do the necessary changes.

Last but not least:

As this seems to have been your first contribution to the code base
could you please send us a blanket statement that you contribute all
your past and future patches under LGPLv3+ and MPL 1.1 licenses? Best on
the dev (or l10n) mailing list so we can link to it from
https://wiki.documentfoundation.org/Development/Developers

Something like this does nicely:

All of my past & future contributions to LibreOffice may be
licensed under the MPLv2/LGPLv3+ dual license.


We need it to commit your contribution to the code repository.
Thanks.
Comment 8 Krunoslav Šebetić 2013-07-18 18:09:04 UTC
For non-breaking character, should &#160; do?
so need add <DateAcceptancePattern>D.&#160;M.</DateAcceptancePattern>
    <DateAcceptancePattern>D.&#160;M.&#160;YYYY</DateAcceptancePattern>
    <DateAcceptancePattern>DD.&#160;MM.&#160;YYYY.</DateAcceptancePattern>

and

    </FormatElement>
    <FormatElement msgid="DateFormatskey26" default="false" type="medium" usage="DATE" formatindex="54">
      <FormatCode>D.&#160;M.&#160;YYYY</FormatCode>

    <FormatElement msgid="DateFormatskey24" default="false" type="medium" usage="DATE" formatindex="52">
      <FormatCode>DD.&#160;MM.&#160;YYYY.</FormatCode>    
    </FormatElement>?

format index kes 24 back to D, MMM YYYY and formatindex key 23 and 25 to the end of that section?
Comment 9 Krunoslav Šebetić 2013-07-18 18:41:38 UTC
Yes, I do have hard time doing it right. Please, help me and change what needs to be changed, I just can't get it right. When everything is in place, let me no so I'll send a contribution statement to mailing list.

Thanks
Comment 10 Mikeyy - L10n HR 2013-07-18 20:39:02 UTC
Eike, formatindex="24" was changed because it was 100% indentical as formatindex="23" (except afcourse formatindex number and title).
Comment 11 Eike Rathke 2013-07-18 21:10:25 UTC
(In reply to comment #8)
> For non-breaking character, should &#160; do?

Numeric character entities indeed seem to work. For readability I prefer the literal character though.

(In reply to comment #9)
> Yes, I do have hard time doing it right. Please, help me and change what
> needs to be changed,

No problem, I'll do. I'll commit your change and then adapt to the needs with a followup-commit.

(In reply to comment #10)
> Eike, formatindex="24" was changed because it was 100% indentical as
> formatindex="23" (except afcourse formatindex number and title).

Yes, there may be a few duplicate entries with the current conventions, but nothing to worry about, they don't harm. In case you're interested in the historical reason see http://cgit.freedesktop.org/libreoffice/core/tree/offapi/com/sun/star/i18n/NumberFormatIndex.idl
There are future plans to revive the original meanings of the SYSTEM/SYS/DEF formats mentioned and introduce a system settings layer, so we're not weeding everything out.
Comment 12 Mikeyy - L10n HR 2013-07-18 21:30:34 UTC
Please leave DateAcceptancePattern as they are. We discussed this in another bug when you were introducing this and kicked out few of them. This should be fine, especially if it only affect Calc.

Concerning formatindex="24", I don't see why we should keep 2 indentical formattings in UI. If it's all the same to you, we will change it. If there are some other reasons and they must stay same, then we will add new one.

Finaly, does LC_COLLATION (this is new, copied from serbian xml file and edited) and LC_INDEX (changed to represent croatian alphabet) look ok to you?
We kicked out A-Z mark in LC_INDEX which included english only letters like Q, X, W and Y. Will that affect anything?
Comment 13 Eike Rathke 2013-07-18 23:00:15 UTC
(In reply to comment #12)
> Please leave DateAcceptancePattern as they are. We discussed this in another
> bug when you were introducing this and kicked out few of them. This should
> be fine, especially if it only affect Calc.

Ok if dates are not to be entered with spaces in between.


> Concerning formatindex="24", I don't see why we should keep 2 indentical
> formattings in UI. If it's all the same to you, we will change it. If there
> are some other reasons and they must stay same, then we will add new one.

They must not stay the same. I know it may look odd to have duplicated entries. The originally intended purpose also may have resulted in duplicated entries for some locales depending on the current system's Regional Settings. Currently there is no compelling reason though to keep the duplicated entries identical.

> Finaly, does LC_COLLATION (this is new, copied from serbian xml file and
> edited) and LC_INDEX (changed to represent croatian alphabet) look ok to you?

Yes, looks good.

> We kicked out A-Z mark in LC_INDEX which included english only letters like
> Q, X, W and Y. Will that affect anything?

That's ok, index terms starting with a letter not defined in IndexKey will be appended after the last entry with a defined IndexKey.
Comment 14 Commit Notification 2013-07-18 23:21:18 UTC
Krunoslav Å ebetiÄ committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=38fc4d86d96c8d4c9b79299f83fd005ac3bef85c

changed Croatian [hr-HR] locale data, fdo#66376



The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 15 Commit Notification 2013-07-18 23:21:37 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=2cce78137bbf9810421b9fdedd45127b019b1188

small cleanup in [hr-HR] locale data, fdo#66376



The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 16 Mikeyy - L10n HR 2013-07-19 05:52:57 UTC
Looks good to me, thank you.
Will this be included in 4.1.1 maybe or this kind of change only follows major releases?
Comment 17 Eike Rathke 2013-07-19 08:28:22 UTC
I can backport and submit for 4-1 review.
Comment 18 Eike Rathke 2013-07-19 09:08:15 UTC
Pending review for 4-1 as https://gerrit.libreoffice.org/4977
Comment 19 Mikeyy - L10n HR 2013-07-19 09:17:09 UTC
Thank you.
Comment 20 Commit Notification 2013-07-19 10:38:04 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "libreoffice-4-1":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=faf6e4c29463c28fd8c14f72542126139e0c9fbf&h=libreoffice-4-1

changed Croatian [hr-HR] locale data, fdo#66376


It will be available in LibreOffice 4.1.1.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 21 Krunoslav Šebetić 2013-07-19 15:49:34 UTC
Thanks Eike,

Kruno