Bug 151359 - Locale data: The DefaultName for CountryID "TW" should be "Taiwan (Province of China)" rather than "Taiwan"
Status: UNCONFIRMED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Localization
Version (earliest affected): Inherited From OOo
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-10-05 15:07 UTC by Kevin Suo
Modified: 2022-10-08 07:48 UTC (History)
CC List: 13 users

See Also:
Crash report or crash signature:

Description Kevin Suo 2022-10-05 15:07:58 UTC
I hate to discuss political issues in open-source projects, but this one causes a real headache.

Currently, in i18npool/source/localedata/data/zh_TW.xml we have:

  <LC_INFO>
    <Language>
      <LangID>zh</LangID>
      <DefaultName>Traditional Chinese</DefaultName>
    </Language>
    <Country>
      <CountryID>TW</CountryID>
      <DefaultName>Taiwan</DefaultName>
    </Country>
  </LC_INFO>

The DefaultName here is wrong. Taiwan is not a country. I am aware that some people from the Taiwan community may disagree, but you have to admit that as of today, Taiwan is not an independent country recognized world-wide. The current DefaultName creates a potential risk of LibreOffice being banned in China. Even if it is not banned, Chinese government agencies, companies, schools, individuals etc. would not use our software if this issue comes to light.

I am not going to argue the political issues in this thread, but since our software is distributed globally, we have to stick to international standards.

The schema in i18npool/source/localedata/data/locale.dtd defines that "CountryID must be a valid two letter country identifier defined by ISO 3166". Then I assume that DefaultName must be the corresponding name as defined by ISO 3166. From https://www.iso.org/obp/ui/#iso:code:3166:TW we see that the short name for "TW" should be "Taiwan (Province of China)", rather than "Taiwan".
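For illustration only, the two fields discussed above can be read from the LC_INFO fragment with a few lines of Python. This is a minimal sketch, not LibreOffice code: `read_country` and `looks_like_alpha2` are hypothetical helper names, and the check only tests the shape of the code (two uppercase letters), not whether it is actually assigned in ISO 3166.

```python
import re
import xml.etree.ElementTree as ET

# LC_INFO fragment as quoted in this report (the rest of zh_TW.xml is omitted).
SAMPLE = """
<LC_INFO>
  <Language>
    <LangID>zh</LangID>
    <DefaultName>Traditional Chinese</DefaultName>
  </Language>
  <Country>
    <CountryID>TW</CountryID>
    <DefaultName>Taiwan</DefaultName>
  </Country>
</LC_INFO>
"""


def read_country(lc_info_xml):
    """Return (CountryID, DefaultName) from an LC_INFO fragment (hypothetical helper)."""
    root = ET.fromstring(lc_info_xml)
    country = root.find("Country")
    return country.findtext("CountryID"), country.findtext("DefaultName")


def looks_like_alpha2(code):
    """Shape check only: two uppercase ASCII letters. Does not consult ISO 3166."""
    return re.fullmatch(r"[A-Z]{2}", code or "") is not None


country_id, default_name = read_country(SAMPLE)
print(country_id, default_name, looks_like_alpha2(country_id))
```

The sketch shows that the code and the name are independent pieces of data: the DTD sentence quoted above constrains only CountryID.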
Comment 1 Kevin Suo 2022-10-05 15:22:40 UTC
Proposed patch submitted as:
https://gerrit.libreoffice.org/c/core/+/140987
Comment 2 Roman Kuznetsov 2022-10-05 15:54:47 UTC
But what about the other side of this change? What about the Taiwan government, schools, etc.? I would ask the TDF Board of Directors here.
Comment 3 Kevin Suo 2022-10-05 15:57:18 UTC
(In reply to Roman Kuznetsov from comment #2)
That is why we should comply with International Standards.
Comment 4 Eike Rathke 2022-10-05 16:04:21 UTC
Those locale data LC_INFO DefaultName aren't even displayed anywhere, it's only in source code. But if it will make people happy..
Comment 5 Julien Nabet 2022-10-05 18:32:42 UTC
Some countries don't recognize Taiwan as a country, but some other countries do. At least, just leaving "Taiwan" doesn't offend anyone.
So I'm strongly against this one, but I suppose it won't change anything, so it's just for the record.
If it's done, it would be "funny" if Taiwan banned LibreOffice.
Comment 6 V Stuart Foote 2022-10-05 19:20:50 UTC
No, the status quo suffices. While we could follow the IOC, WTO and even Beijing's practice to name it "Chinese Taipei" -- just leave it be and concede that just "Taiwan" is the least offensive to ALL.

Maybe in 20 - 30 years following some form of reunification "Taiwan (Province of China)" might become acceptable. Who knows...

Until then, -1
Comment 7 Julien Nabet 2022-10-06 04:54:57 UTC
Italo/Michael: are we going to give in here, under the cover of ISO, which clearly denies Taiwan's existence as a country, just for marketing?
Comment 8 Kevin Suo 2022-10-06 04:56:59 UTC
In reply to Julien Nabet in comment 5:
> At least, just leaving "Taiwan" doesn't offend anyone.

In reply to V Stuart Foote in comment 6:
> just leave it be and concede that just "Taiwan" is the least offensive to ALL

It won't offend anyone if the word "Taiwan" appears standalone or within a "CountryOrRegion" tag. But it has already offended 1.4 billion Chinese people (including me) when you put the word "Taiwan" *within* the "country" tag:
    <Country>
      <CountryID>TW</CountryID>
      <DefaultName>Taiwan</DefaultName>
    </Country>

You have to be aware that this is not just a Taiwan issue. It may also affect other areas of the world. Each of the locale data files in source/localedata should be reviewed carefully by a high-level governance body of TDF. The principles for such a review should be based on a standard, rather than on the opinion or judgement of several members of the community. If you decide that the locale data should comply with ISO 639 and ISO 3166, then you must follow this rule. That way, if something goes wrong, the blame would go to ISO or the United Nations, and the root cause would be the dispute between the governments of different regions of the world, rather than a fault of ours.

Another solution for this would be renaming the tag to the following for all locales:
    <CountryOrRegion>
      <CountryOrRegionID>TW</CountryOrRegionID>
      <DefaultName>Taiwan</DefaultName>
    </CountryOrRegion>

Or, simply the following per BCP 47:
    <Region>
      <RegionID>TW</RegionID>
      <DefaultName>Taiwan</DefaultName>
    </Region>

Per RFC 5646 BCP 47, for "zh-TW", the tag "zh" is language and the tag "TW" is region. “Region subtags are used to indicate linguistic variations associated with or appropriate to a specific country, territory, or region.”, see https://www.rfc-editor.org/rfc/rfc5646.html#section-2.2.4. As such, the subtag here should be named "Region" rather than "Country".
RFC 5646 also states that "two-letter region subtags were defined according to the assignments found in [ISO3166-1]". ISO3166-1, entitled "Codes for the representation of names of countries and their subdivisions - Part 1: Country Codes", defines the Alpha-2 code, Short name, Independent status etc. for "TW", but it does not define whether "TW" is a "region" or a "country". If you interpret it as a "Country", then you MUST also consider the name ISO3166-1 assigned to it: "Taiwan (Province of China)".
As such, RFC 5646 BCP 47 says "TW" is a "REGION", and ISO3166 says the "Country Code" for Taiwan is "TW" and its Short Name is "Taiwan (Province of China)". It is LibreOffice (or strictly speaking, AOO) which invented that "TW" is a "COUNTRY" and its name is "Taiwan".
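To make the BCP 47 terminology concrete, here is a minimal Python sketch that splits a simple tag such as "zh-TW" into its language and region subtags. `split_language_region` is a hypothetical helper written for this comment; it assumes the plain two-subtag form only and does not handle script, variant, extension or private-use subtags, which a real BCP 47 parser must.

```python
def split_language_region(tag):
    """Split a simple 'language-region' tag, e.g. 'zh-TW' (hypothetical helper).

    Only the two-subtag form is supported; anything else raises ValueError.
    """
    parts = tag.split("-")
    if len(parts) != 2:
        raise ValueError(f"expected 'language-region', got {tag!r}")
    language, region = parts
    # Conventional casing: lowercase language, uppercase two-letter region.
    return language.lower(), region.upper()


language, region = split_language_region("zh-TW")
print(language, region)
```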
Comment 9 Timur 2022-10-06 07:35:43 UTC
This request was properly explained. While technically trivial, politically it's an issue for the BoD, so I am adding the Chairperson to come back with a resolution. CC Thorsten.
Comment 10 Michael Meeks 2022-10-06 08:28:21 UTC
(In reply to Julien Nabet from comment #7)
> Italo/Michael: are we going to give in here, under the cover of ISO, which
> clearly denies Taiwan's existence as a country, just for marketing?

Flattered as I am by the thought that I should have a say in this =) I really expect the elected board to unwind this one as in all cases of delicate compromise. Clearly it is an important topic, thanks for sending it to the right place Julien & Timur.

I would counsel calm and leaving this topic to the board. As I read it things have been this way for a long time and this is not user-visible. The board (as I see it) is working through a large back-log of urgent problems currently - and no doubt will add this to its in-tray. I expect the board would value advice from Taiwan too on what seems reasonable =)
Comment 11 Caolán McNamara 2022-10-06 08:38:23 UTC
FWIW "short names" in 3166 are different than what we use in a bunch of these:
https://www.iso.org/obp/ui/#iso:code:3166:RU is "Russian Federation (the)" while we have "Russia", likewise
https://www.iso.org/obp/ui/#iso:code:3166:US is "United States of America (the)" vs our "United States" and
https://www.iso.org/obp/ui/#iso:code:3166:GB is "United Kingdom of Great Britain and Northern Ireland (the)" vs our "United Kingdom".

if we're not using this field for anything, perhaps we can just remove it entirely
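The mismatch described above can be tabulated with a small Python sketch. The two dictionaries contain only the values quoted in this comment; `differing_names` is a hypothetical helper, and nothing here is fetched from ISO or read from the LibreOffice source tree.

```python
# ISO 3166 short names as quoted in this comment.
ISO_SHORT_NAMES = {
    "RU": "Russian Federation (the)",
    "US": "United States of America (the)",
    "GB": "United Kingdom of Great Britain and Northern Ireland (the)",
}

# DefaultName values as quoted in this comment.
LOCALE_DEFAULT_NAMES = {
    "RU": "Russia",
    "US": "United States",
    "GB": "United Kingdom",
}


def differing_names(iso_names, locale_names):
    """Return the sorted codes whose two names differ (hypothetical helper)."""
    shared = iso_names.keys() & locale_names.keys()
    return sorted(code for code in shared if iso_names[code] != locale_names[code])


for code in differing_names(ISO_SHORT_NAMES, LOCALE_DEFAULT_NAMES):
    print(code, "|", ISO_SHORT_NAMES[code], "|", LOCALE_DEFAULT_NAMES[code])
```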
Comment 12 Thorsten Behrens (allotropia) 2022-10-06 10:24:35 UTC
Thx for raising this, just to state I'm now aware of the report (and associated patch).

Please do continue the technical part of the discussion here (how we use it, whether it's even needed, etc).
Comment 13 Eike Rathke 2022-10-06 10:35:33 UTC
(In reply to Caolán McNamara from comment #11)
> if we're not using this field for anything, perhaps we can just remove it
> entirely

The field needs to be present (or we change the DTD), but we could use
<DefaultName>you know what</DefaultName>

So far I always regarded it as a kind of comment that is preserved by all XML processing.. maybe making it optional and just removing it in zh_TW.xml would indeed be best.


(In reply to Kevin Suo from comment #8)
> Another solution for this would be renaming the tag to the following for all
> locales:
>     <CountryOrRegion>
>       <CountryOrRegionID>TW</CountryOrRegionID>
>       <DefaultName>Taiwan</DefaultName>
>     </CountryOrRegion>
No, because it is not a region ID, it is solely the ISO 3166-1 alpha-2 country code. Everything else per convention goes into the qlt local-use language tag in the Language field with its full BCP 47 tag in the Variant field.


> Or, simply the following per BCP 47:
>     <Region>
>       <RegionID>TW</RegionID>
>       <DefaultName>Taiwan</DefaultName>
>     </Region>
Similarly, no.

> Per RFC 5646 BCP 47, for "zh-TW", the tag "zh" is language and the tag "TW"
> is region. “Region subtags are used to indicate linguistic variations
> associated with or appropriate to a specific country, territory, or
> region.”, see https://www.rfc-editor.org/rfc/rfc5646.html#section-2.2.4.
And that encompasses much more than ISO 3166-1 alpha-2 country codes, to which the field needs to be restricted for Java compatibility if the LC_INFO element's IDs are stuffed into a css:lang::Locale.
Comment 14 Mark Hung 2022-10-06 14:33:08 UTC
Hi Eike,

(In reply to Eike Rathke from comment #13)
> (In reply to Caolán McNamara from comment #11)
> > if we're not using this field for anything, perhaps we can just remove it
> > entirely
> 
> The field needs to be present (or we change the DTD), but we could use

Is there any issue if we change the DTD?
As far as I know LibreOffice supports adding languages by extension, maybe that could be an issue.

> <DefaultName>you know what</DefaultName>

As Caolán mentioned, "short names" in 3166 are different than what LibreOffice has here in some places. While there is no description of the DefaultName property, maybe it is enough to just update the comment for CountryID to state explicitly that it is not the 3166 short name, as a way to avoid unnecessary debate?

> 
> So far I always regarded it as a kind of comment that is preserved by all
> XML processing.. maybe making it optional and just removing it in zh_TW.xml
> would indeed be best.

Is the statement 'best' based on ease of implementation?
Removing the field from zh_TW.xml alone, based on the opinion of non-local users, doesn't sound good to me.
Comment 15 Mark Hung 2022-10-06 14:51:11 UTC
(In reply to Kevin Suo from comment #0)
> I hate to discuss political issues in open-source projects, but this one
> causes a real headache.

The request itself is political. I wish we didn't have to discuss this.


> The DefaultName here is wrong. Taiwan is not a country. I am aware that some
> people from the Taiwan community may disagree, but you have to admit that

Sure.

> as of today, Taiwan is not an independent country recognized world-wide. The
> current DefaultName creates a potential risk of LibreOffice being banned in
> China. Even if it is not banned, Chinese government agencies, companies,
> schools, individuals etc. would not use our software if this issue comes to
> light.

The problem here is trying to censor other people. You're trying to change something you are not a real user of.
Comment 16 Kevin Suo 2022-10-06 15:23:53 UTC
(In reply to Mark Hung from comment #15)

Mark, I understand your opinion and I think we will not come to an agreement on this matter, since you are from the Taiwan community and I am from the (Mainland) China community. I am raising this issue solely for the long-term growth of this project.

Before I discovered this matter, I already knew you well and respected you for your contributions to LibreOffice, and I am also contributing plenty of my spare time to this project. I am not trying to censor anyone. I raise this issue here because I love this project: I help to promote LibreOffice to many people I know, I help with QA and bug fixes, and I don't want my efforts to come to an end due to a political flaw which can be fixed technically. I hope this issue is resolved technically before it becomes a hot topic in the news media and further harms the project, as the relationship between the mainland and Taiwan has worsened after the visit to Taiwan of a lady (whose name I don't want to mention).

I agree that we should continue the discussion on the technical part of this matter, as mentioned by Thorsten in comment 12.
Comment 17 Eike Rathke 2022-10-06 15:31:36 UTC
(In reply to Mark Hung from comment #14)
> (In reply to Eike Rathke from comment #13)
> > (In reply to Caolán McNamara from comment #11)
> > > if we're not using this field for anything, perhaps we can just remove it
> > > entirely
> > 
> > The field needs to be present (or we change the DTD), but we could use
> 
> Is there any issue if we change the DTD?
No issue, we'd just have to mark the element as optional (if anyone actually used the locale.dtd to validate data against, as outlined there and pointed to in https://wiki.documentfoundation.org/LibreOffice_Localization_Guide/How_To_Submit_New_Locale_Data).


> As far as I know LibreOffice supports adding languages by extension, maybe
> that could be an issue.
Locale data can't be added by extension (afaik, unless you implement a complete css::i18n::XLocaleData5 interface and manage to plug that into the css::i18n::LocaleData2 service somehow so that it switches depending on what locale the interface is being asked for), but even if there was such, having an optional element being present wouldn't harm.


> As Caolán mentioned, "short names" in 3166 are different than what
> LibreOffice has here in some places.
Note that names may also change over time (as, even worse, country codes can, hence we go only with IANA type region subtag country codes nowadays); they may have been different in the past.


> While there is no description of the
> DefaultName property, maybe it is enough to just update the comment for
> CountryID to state explicitly that it is not the 3166 short name, as a way
> to avoid unnecessary debate?
? CountryID is *only* ISO 3166-1 alpha-2 country codes, or the IANA type region two letter code equivalent.


> > So far I always regarded it as a kind of comment that is preserved by all
> > XML processing.. maybe making it optional and just removing it in zh_TW.xml
> > would indeed be best.
> 
> Is the statement 'best' based on ease of implementation?
Not only that. It would avoid endless political discussions about what content would be right or wrong. Obviously whatever we put there will be wrong for *someone*. We don't need the element for functionality, it's not shown anywhere in the UI, so let's just get rid of it in this case.
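To illustrate what "optional" would mean for a consumer of the data, here is a minimal Python sketch of a reader that tolerates a missing DefaultName. It is not how LibreOffice processes its locale data; `country_default_name` is a hypothetical helper and both XML fragments are made up for the example, modelled on the LC_INFO snippet from the description.

```python
import xml.etree.ElementTree as ET

# Fragment with the element present, modelled on the description's snippet.
WITH_NAME = """
<LC_INFO>
  <Country>
    <CountryID>TW</CountryID>
    <DefaultName>Taiwan</DefaultName>
  </Country>
</LC_INFO>
"""

# Same fragment with the element removed, as suggested above.
WITHOUT_NAME = """
<LC_INFO>
  <Country>
    <CountryID>TW</CountryID>
  </Country>
</LC_INFO>
"""


def country_default_name(lc_info_xml):
    """Return Country/DefaultName, or None when absent (hypothetical helper)."""
    root = ET.fromstring(lc_info_xml)
    return root.findtext("Country/DefaultName")


print(country_default_name(WITH_NAME))
print(country_default_name(WITHOUT_NAME))
```

A reader written this way keeps working whether or not a given locale file carries the element, which is the property an optional element in the DTD would rely on.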
Comment 18 Eike Rathke 2022-10-06 15:35:47 UTC
Note also that in the UI we do not mention the name Taiwan, but call the languages/locales "Chinese (simplified)" and "Chinese (traditional)", exactly to get around this political issue.
Comment 19 Mark Hung 2022-10-08 02:23:08 UTC Comment hidden (obsolete)
Comment 20 Kevin Suo 2022-10-08 03:16:25 UTC Comment hidden (obsolete)
Comment 21 Julien Nabet 2022-10-08 07:48:45 UTC Comment hidden (obsolete)