Bug 130193 - Japanese Traditional Numeric texts are wrong such as 1,2,3, 10, 1000, 10000
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version (earliest affected): Inherited From OOo
Hardware: All All
Importance: medium normal
Assignee: Naruhiko Ogasawara
URL:
Whiteboard: target:7.0.0
Keywords:
Depends on:
Blocks: CJK CJK-Japanese
Reported: 2020-01-25 14:31 UTC by DaeHyun Sung
Modified: 2020-07-04 10:31 UTC (History)
CC List: 4 users

See Also:
Crash report or crash signature:


Attachments
Excel 2019 CJK Number string check Excel file (14.55 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2020-01-25 14:33 UTC, DaeHyun Sung
traditional Chinese numbers in Excel. (17.35 KB, image/png)
2020-05-20 08:23 UTC, Mark Hung
DBNum3 in LO (24.66 KB, image/png)
2020-05-20 08:40 UTC, Mark Hung

Description DaeHyun Sung 2020-01-25 14:31:42 UTC
Description:
In LibreOffice, some Japanese traditional number text is wrong.

I recommend fixing the Japanese traditional number text for values such as 1, 2, 3, 5, 10, 1000, and 10000.

1: 一 -> 壱
2: 二 -> 弐
3: 三 -> 参
5: 五 -> 伍
10: 十 -> 壱拾
1000: 一千 -> 壱阡
10000: 一万 -> 壱萬

Steps to Reproduce:
1. Open LibreOffice Calc.

Actual Results:
Japanese 日本の大字
0: 〇
1: 一
2: 二
3: 三
4: 四
5: 五
6: 六
7: 七
8: 八
9: 九
10: 十
100: 一百
1000: 一千
10000: 一万

Expected Results:
Japanese 日本の大字
0: 〇
1: 壱
2: 弐
3: 参
4: 四
5: 伍
6: 六
7: 七
8: 八
9: 九
10: 壱拾
100: 一百
1000: 壱阡
10000: 壱萬


Reproducible: Always


User Profile Reset: No



Additional Info:
In Japanese, traditional numeric text uses:
1: 壱 instead of 一
2: 弐 instead of 二
3: 参 instead of 三
5: 伍 instead of 五
10: 壱拾 instead of 十
1000: 壱阡 instead of 一千
10000: 壱萬 instead of 一万
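
The substitutions listed above are mostly character-for-character, with one exception. As a minimal sketch (the table and the helper name `to_daiji_naive` are hypothetical, built only from the pairs in this report, and are not LibreOffice code), a plain character translation reproduces the recommended text for 1, 2, 3, 5, 1000 and 10000, but not for 10, where the recommended text also gains a leading 壱:

```python
# Character pairs taken from the recommendation list in this report.
# Hypothetical helper for illustration only; not how LibreOffice implements it.
DAIJI = str.maketrans({
    "一": "壱", "二": "弐", "三": "参", "五": "伍",
    "十": "拾", "千": "阡", "万": "萬",
})


def to_daiji_naive(modern: str) -> str:
    """Replace each modern kanji numeral with its traditional counterpart."""
    return modern.translate(DAIJI)


# (value, current output, recommended output) as listed in the report.
CASES = [
    (1, "一", "壱"), (2, "二", "弐"), (3, "三", "参"), (5, "五", "伍"),
    (10, "十", "壱拾"), (1000, "一千", "壱阡"), (10000, "一万", "壱萬"),
]


def unmatched(cases):
    """Return the values for which naive substitution misses the recommendation."""
    return [value for value, modern, expected in cases
            if to_daiji_naive(modern) != expected]
```

Calling `unmatched(CASES)` singles out 10, which suggests the traditional style needs more than a digit swap for that value.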
Comment 1 DaeHyun Sung 2020-01-25 14:33:46 UTC
Created attachment 157421 [details]
Excel 2019 CJK Number string check Excel file

Excel 2019 CJK Number string check Excel file
Korean - Japanese - Chinese[Mainland China and Taiwan]
Comment 3 Naruhiko Ogasawara 2020-05-02 10:21:35 UTC
Reproduced with:

Version: 6.4.3.2
Build ID: libreoffice-6.4.3.2-snap1

by following steps:

Step 1. launch Calc
Step 2. input numbers into cells one by one
  1
  2
  3
  5
  10
  1000
  10000
Step 3. select all cells, then Format > Cells, Numbers tab: 
  select Number, Format code: '[DBNum2]#'

Note: according to the help[1], [DBNum2] means

> Japanese: traditional Kanji characters; CAL: 2/5/5 [DBNum2]

and this should give a different result from [DBNum1],

> Japanese: short Kanji characters [DBNum1]; CAL: 1/4/4 [DBNum1]

Currently these two modifiers give the same results, which is the problem.

[1] https://help.libreoffice.org/3.3/Common/Number_Format_Codes
Comment 4 Naruhiko Ogasawara 2020-05-06 11:03:55 UTC
Sorry, my bad, according to the help, this behavior might be intentional.


Excel format modifier "[DBNum2]" means "Traditional Number Text" in Japanese, but in our latest help:

https://help.libreoffice.org/6.4/ja/text/shared/01/05020301.html?&DbPAR=WRITER&System=WIN

[DBNum2] is mapped to [NatNum4], which means "modern long Kanji text" in Calc native, not to [NatNum5], which means "traditional long Kanji text."


As a Japanese native speaker, I see this as an Excel incompatibility and IMHO it should be fixed, but this behavior has been the same since 3.3, so I guess there is some historical reason... I'm asking the Japanese community about it.
Comment 5 Naruhiko Ogasawara 2020-05-17 09:01:41 UTC
Due to discussions in the Japanese community, I have decided to make major changes to the mapping between Excel (DBNum) and Calc (NatNum). This will change the specification, but will instead improve interoperability with Excel.

The new mapping rules are as follows:

from DBNum to NatNum (import):

- DBNum1 -> NatNum4 (modern long Kanji text)
- DBNum2 -> NatNum5 (traditional long Kanji text)
- DBNum3 -> NatNum3 (fullwidth Arabic digits)

from NatNum to DBNum (export):

- NatNum1 -> DBNum1
- NatNum2 -> DBNum2
- NatNum3 -> DBNum3
- NatNum4 -> DBNum1
- NatNum5 -> DBNum2
- NatNum6 -> DBNum3
- NatNum7 -> DBNum1
- NatNum8 -> DBNum2
- NatNum9 -> (DBNum0, as Arabic)

Translated with www.DeepL.com/Translator (free version)

I have already submitted a set of patches that includes unit tests.  After they are merged, I will also have to update the help content.
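
As an illustration of the mapping rules in this comment, here is a minimal Python sketch that rewrites the modifier inside a format code. The function names `import_code` and `export_code` are hypothetical, the tables are copied from the lists above, and treating "DBNum0, as Arabic" as "drop the modifier" is an assumption; the actual patch in LibreOffice core may work quite differently.

```python
import re

# Import (Excel -> Calc) and export (Calc -> Excel) tables from the comment.
DBNUM_TO_NATNUM = {1: 4, 2: 5, 3: 3}
NATNUM_TO_DBNUM = {1: 1, 2: 2, 3: 3, 4: 1, 5: 2, 6: 3, 7: 1, 8: 2, 9: 0}

_DBNUM = re.compile(r"\[DBNum(\d+)\]", re.IGNORECASE)
_NATNUM = re.compile(r"\[NatNum(\d+)\]", re.IGNORECASE)


def import_code(excel_code: str) -> str:
    """Rewrite [DBNumN] modifiers as [NatNumM]; unknown N is left untouched."""
    def repl(match):
        target = DBNUM_TO_NATNUM.get(int(match.group(1)))
        return match.group(0) if target is None else f"[NatNum{target}]"
    return _DBNUM.sub(repl, excel_code)


def export_code(calc_code: str) -> str:
    """Rewrite [NatNumN] modifiers as [DBNumM]; unknown N is left untouched."""
    def repl(match):
        target = NATNUM_TO_DBNUM.get(int(match.group(1)))
        if target is None:
            return match.group(0)
        # Assumption: DBNum0 means plain Arabic digits, so emit no modifier.
        return "" if target == 0 else f"[DBNum{target}]"
    return _NATNUM.sub(repl, calc_code)
```

For example, `import_code("[DBNum2]#")` yields `[NatNum5]#`, and exporting that result gives `[DBNum2]#` back, so the three Excel modifiers survive a round trip under these tables.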
Comment 6 Naruhiko Ogasawara 2020-05-19 14:01:56 UTC
COMMENT NEEDED from Chinese (both of Simplified and Traditional):

Related to this issue, I'd like to fix the Asian numeric format mapping between Calc and Excel. We've already committed to fixing it for Japanese.


At least in Japanese, Excel allows you to specify three different Asian numeric formats in the user interface (example: 123456789).

Modern:            [DBNum1]Standard  一億二千三百四十五萬六千七百八十九
Traditional:       [DBNum2]Standard  壹億貳仟參佰肆拾伍萬陸仟柒佰捌拾玖
Full-width Arabic: [DBNum3]0        123456789

"Standard" and "0" are modifiers meaning "long format" and "short format", respectively.


Unfortunately, Calc can't recognize these modifiers, so [DBNum3]Standard and [DBNum3]0 become the same format after importing into Calc.

So, at the moment, the Chinese mapping specification is:

Modern:            [DBNum1]Standard  一億二千三百四十五萬六千七百八十九
Traditional:       [DBNum2]Standard  壹億貳仟參佰肆拾伍萬陸仟柒佰捌拾玖
Full-width Arabic: [DBNum3]0        1億2千3百4十5萬6千7百8十9


Is the Excel UI spec for Chinese the same as for Japanese?
If so, would this specification still be desirable, even though importing from Excel would change the formatting of the full-width Arabic numerals?
Comment 7 Mark Hung 2020-05-20 08:23:43 UTC
Created attachment 161022 [details]
traditional Chinese numbers in Excel.

Both modern and traditional numerals are available from traditional Chinese MSO 2013. I didn't see an option for full-width Arabic numerals.
Comment 8 Ming Hua 2020-05-20 08:28:31 UTC
Hi Ogasawara-san,

Thanks for looking into the Chinese aspect of this issue.

(In reply to Naruhiko Ogasawara from comment #6)
> COMMENT NEEDED from Chinese (both of Simplified and Traditional):
> So, at the moment, the Chinese mapping specification is:
> 
> Modern:            [DBNum1]Standard  一億二千三百四十五萬六千七百八十九
> Traditional:       [DBNum2]Standard  壹億貳仟參佰肆拾伍萬陸仟柒佰捌拾玖
> Full-width Arabic: [DBNum3]0        1億2千3百4十5萬6千7百8十9

Can't speak for the traditional Chinese users (except that the [DBNum3]0 result is obviously undesirable), but for simplified Chinese in Calc, the result is similar:

[DBNum1]General  一亿二千三百四十五万六千七百八十九
[DBNum2]General  壹亿贰仟叁佰肆拾伍万陆仟柒佰捌拾玖
[DBNum3]0        1亿2千3百4十5万6千7百8十9

The first two look rather good to me (maybe 萬 instead of 万 in the second one, I don't know), while the third is also obviously wrong.  It's probably not a big deal though, as in mainland China full-width Arabic is very rarely used (I've never seen one).
 
> Is the Excel UI spec for Chinese the same as for Japanese?
> If so, would this specification still be desirable, even though importing
> from Excel would change the formatting of the full-width Arabic numerals?

I don't use Excel myself and know little about its spec or how it should be imported.  I've put Kevin Suo in CC who hopefully can give more insight.

Also, if you or DaeHyun feel this bug is for the Japanese aspect and too much talk about Chinese would be confusing, I'm happy to open a new bug and have it referencing bug 130140.
Comment 9 Mark Hung 2020-05-20 08:40:10 UTC
Created attachment 161023 [details]
DBNum3 in LO

So, the current behavior of [DBNum3]0 for zh-TW is wrong. I expect all the digits to be fullwidth Arabic as in Excel; maybe that is consistent with Japanese usage?

HTH.
Comment 10 Naruhiko Ogasawara 2020-05-20 10:56:02 UTC
Ming Hua, Mark Hung, Thanks for your helpful comments!

So in my understanding, not only in Japanese but also in Chinese, a suitable mapping of DBNum3 might be:

- DBNum3 -> NatNum3 (fullwidth Arabic digits)

which is the same as the patch I submitted.  I'll re-submit it with regard to your information.
Comment 11 Naruhiko Ogasawara 2020-05-20 11:36:23 UTC
And I just confirmed that Excel doesn't have a UI to specify full-width Arabic in Simplified/Traditional Chinese, so the [DBNum3] issue is less serious than in Japanese, but I still think it should be fixed.
Comment 12 Ming Hua 2020-05-20 11:51:47 UTC
(In reply to Naruhiko Ogasawara from comment #10)
> Ming Hua, Mark Hung, Thanks for your helpful comments!
> 
> So in my understanding, not only in Japanese but also in Chinese, a suitable
> mapping of DBNum3 might be:
> 
> - DBNum3 -> NatNum3 (fullwidth Arabic digits)
Yes, for simplified Chinese the mapping you proposed in comment 5

> - DBNum1 -> NatNum4 (modern long Kanji text)
> - DBNum2 -> NatNum5 (traditional long Kanji text)
> - DBNum3 -> NatNum3 (fullwidth Arabic digits)

is good.  I'll look through your NatNum -> DBNum export mapping later.

> which is the same as the patch I submitted.  I'll re-submit it with regard
> to your information.
Thanks!
Comment 13 Ming Hua 2020-05-22 12:13:14 UTC
(In reply to Ming Hua from comment #12)
> I'll look through your NatNum -> DBNum export mapping later.
So NatNum formats for simplified Chinese are:

[NatNum1]  一二三四五六七八九
[NatNum2]  壹贰叁肆伍陆柒捌玖
[NatNum3]  123456789
[NatNum4]  一亿二千三百四十五万六千七百八十九
[NatNum5]  壹亿贰仟叁佰肆拾伍万陆仟柒佰捌拾玖
[NatNum6]  1亿2千3百4十5万6千7百8十9
[NatNum7]  亿二千三百四十五万六千七百八十九
[NatNum8]  亿贰仟叁佰肆拾伍万陆仟柒佰捌拾玖

Among these, 1-5 look good and are useful, 6-8 are either incorrect or useless.

According to Help (https://help.libreoffice.org/6.4/en-US/text/shared/01/05020301.html), NatNum6 is "fullwidth text", NatNum7 is "short lower case text", and NatNum8 is "short upper case text" for Chinese.

(In reply to Naruhiko Ogasawara from comment #5)
> from NatNum to DBNum (export):
> 
> - NatNum1 -> DBNum1
> - NatNum2 -> DBNum2
> - NatNum3 -> DBNum3
> - NatNum4 -> DBNum1
> - NatNum5 -> DBNum2
> - NatNum6 -> DBNum3
> - NatNum7 -> DBNum1
> - NatNum8 -> DBNum2
> - NatNum9 -> (DBNum0, as Arabic)
It seems for simplified Chinese we don't need such comprehensive mapping, just

NatNum3 -> DBNum3
NatNum4 -> DBNum1
NatNum5 -> DBNum2

is good enough.  Everything else can be exported to DBNum0 or no special format.
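
To make the difference from the comprehensive table concrete, here is a small sketch. The names `ZH_CN_EXPORT`, `FULL_EXPORT` and `export_dbnum_zh_cn` are hypothetical, and representing "DBNum0 or no special format" as `0` is an assumption made only for this illustration.

```python
# Comprehensive export table quoted from comment 5 (NatNum -> DBNum).
FULL_EXPORT = {1: 1, 2: 2, 3: 3, 4: 1, 5: 2, 6: 3, 7: 1, 8: 2, 9: 0}

# Reduced table suggested here for simplified Chinese.
ZH_CN_EXPORT = {3: 3, 4: 1, 5: 2}


def export_dbnum_zh_cn(natnum: int) -> int:
    """Return the DBNum index; 0 stands for 'DBNum0 or no special format'."""
    return ZH_CN_EXPORT.get(natnum, 0)


def changed_by_reduction():
    """List the NatNum codes whose export target differs between the tables."""
    return [n for n in sorted(FULL_EXPORT)
            if FULL_EXPORT[n] != export_dbnum_zh_cn(n)]
```

Under these assumptions only NatNum1, 2, 6, 7 and 8 would export differently, and NatNum6 to 8 are the ones described above as incorrect or useless for simplified Chinese.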
Comment 14 DaeHyun Sung 2020-05-23 12:35:54 UTC
(In reply to Naruhiko Ogasawara from comment #5)
> Due to discussions in the Japanese community, I have decided to make major
> changes to the mapping between Excel (DBNum) and Calc (NatNum). This will
> change the specification, but will instead improve interoperability with
> Excel.
> 
> The new mapping rules are as follows:
> 
> from DBNum to NatNum (import):
> 
> - DBNum1 -> NatNum4 (modern long Kanji text)
> - DBNum2 -> NatNum5 (traditional long Kanji text)
> - DBNum3 -> NatNum3 (fullwidth Arabic digits)
> 
> from NatNum to DBNum (export):
> 
> - NatNum1 -> DBNum1
> - NatNum2 -> DBNum2
> - NatNum3 -> DBNum3
> - NatNum4 -> DBNum1
> - NatNum5 -> DBNum2
> - NatNum6 -> DBNum3
> - NatNum7 -> DBNum1
> - NatNum8 -> DBNum2
> - NatNum9 -> (DBNum0, as Arabic)
> 
> Translated with www.DeepL.com/Translator (free version)
> 
> I have already submitted a set of patches that includes unit tests.  After they
> are merged, I will also have to update the help content.

Unlike the Japanese and Chinese (Simplified and Traditional) environments, Excel in the Korean environment provides DBNum1 through DBNum4.

I checked the DBNum1 to DBNum4 series in Excel.

DBNum1	1234567890	一十二億三千四百五十六万七千八百九十
DBNum2	1234567890	壹拾貳億參阡四百伍拾六萬七阡八百九拾
DBNum3	1234567890	十2億3千4百5十6万7千8百9十
DBNum4	1234567890	일십이억삼천사백오십육만칠천팔백구십

I also checked the Korean number-to-string conversions in LibreOffice.

TEXT(B2;"[natnum1]#")	123456789012	一二三四五六七八九〇一二
TEXT(B3;"[natnum2]#")	123456789012	壹貳參四五六七八九零壹貳
TEXT(B4;"[natnum3]#")	123456789012	123456789012
TEXT(B5;"[natnum4]#")	123456789012	一千二百三十四億五千六百七十八万九千一十二
TEXT(B6;"[natnum5]#")	123456789012	壹仟貳佰參拾四億五仟六佰七拾八萬九仟壹拾貳
TEXT(B7;"[natnum6]#")	123456789012	1천2백3십4억5천6백7십8만9천1십2
TEXT(B8;"[natnum7]#")	123456789012	千二百三十四億五千六百七十八万九千十二
TEXT(B9;"[natnum8]#")	123456789012	仟貳佰參拾四億五仟六佰七拾八萬九仟拾貳
TEXT(B10;"[natnum9]#")	123456789012	일이삼사오육칠팔구영일이
TEXT(B11;"[natnum10]#")	123456789012	일천이백삼십사억오천육백칠십팔만구천일십이
TEXT(B12;"[natnum11]#")	123456789012	천이백삼십사억오천육백칠십팔만구천십이

As a result, 
1. from DBNum to NatNum (import):
 - DBNum1 -> NatNum4 (Korean Hanja text 한자숫자)
 - DBNum2 -> NatNum5 (Korean Upper Hanja text 갖은자)
 - DBNum3 -> NatNum6 (fullwidth Arabic digits with Korean hanja unit of Numbering)
 - DBNum4 -> NatNum10 (Korean Hangul text)

I found a bug in NatNum6 (I'll change the units from Korean Hangul to Hanja for compatibility).

2. From NatNum to DBNum
 - NatNum1 -> DBNum1
 - NatNum2 -> DBNum2
 - NatNum3 -> DBNum3
 - NatNum4 -> DBNum1
 - NatNum5 -> DBNum2
 - NatNum6 -> DBNum3
 - NatNum7 -> DBNum1
 - NatNum8 -> DBNum2
 - NatNum9 -> DBNum4
 - NatNum10 -> DBNum4 
 - NatNum11 -> DBNum4


I'll submit the new mapping rules.
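
One property worth checking in the Korean tables is whether an Excel modifier survives import followed by export. The sketch below is only an illustration with hypothetical names (`KO_IMPORT`, `KO_EXPORT`, `round_trips`, `lossy_natnums`); the tables are copied from the two lists in this comment.

```python
# Korean import (DBNum -> NatNum) and export (NatNum -> DBNum) tables.
KO_IMPORT = {1: 4, 2: 5, 3: 6, 4: 10}
KO_EXPORT = {1: 1, 2: 2, 3: 3, 4: 1, 5: 2, 6: 3,
             7: 1, 8: 2, 9: 4, 10: 4, 11: 4}


def round_trips(import_map, export_map):
    """True if every DBNum comes back unchanged after import then export."""
    return all(export_map.get(nat) == db for db, nat in import_map.items())


def lossy_natnums(import_map, export_map):
    """NatNum codes that change when exported to Excel and imported again."""
    return sorted(nat for nat, db in export_map.items()
                  if import_map.get(db) != nat)
```

With these tables every Excel modifier round-trips, while a file authored in Calc with NatNum1, 2, 3, 7, 8, 9 or 11 would come back with a different NatNum after a save to Excel format and reload. That is the expected cost of mapping eleven codes onto four.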
Comment 15 Commit Notification 2020-05-25 20:52:15 UTC
Naruhiko Ogasawara committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/9efd7cd637d9d882f2fc8277b657ec117c591e80

tdf#130193: Asian Excel-Calc number format interop

It will be available in 7.0.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 16 Commit Notification 2020-05-25 20:57:26 UTC
DaeHyun Sung committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/51404171449eadcb69057ff03cbb7bdb0117910b

Remapping NatNum-DBNum in Korean for compatibility tdf#130193

It will be available in 7.0.0.
Comment 17 DaeHyun Sung 2020-07-04 10:31:50 UTC
Submitted and applied to the LibreOffice core repo.