Bug 147054 - Native Numbering: Natnum4 for Chinese is wrong for numbers between 10 and 20
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version: Inherited From OOo (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: needsDevAdvice
Depends on:
Blocks: CJK
Reported: 2022-01-29 10:05 UTC by Kevin Suo
Modified: 2022-09-30 14:00 UTC

See Also:
Crash report or crash signature:


Attachments
Chinese_Natnum4_and_Natnum5.ods (33.48 KB, application/vnd.oasis.opendocument.spreadsheet)
2022-01-29 10:05 UTC, Kevin Suo
NatNun4.png (bug btw 10-20) (147.77 KB, image/png)
2022-01-29 10:21 UTC, Kevin Suo
NatNum5.png (correct) (160.74 KB, image/png)
2022-01-29 10:21 UTC, Kevin Suo

Description Kevin Suo 2022-01-29 10:05:18 UTC
Created attachment 177898
Chinese_Natnum4_and_Natnum5.ods

Steps to Reproduce:

1. Change locale to Simplified Chinese. Type in the following numbers in column A:

1
2
3
...
10
11
12
13
14
15
16
17
18
19
20
...

2. Set cell format code to "[Natnum4]0"

Current Result:
The numbers from 10 to 19 are prefixed with "ONE-TEN", e.g.:
10	一十	ONE-TEN
11	一十一	ONE-TEN AND ONE
12	一十二	ONE-TEN AND TWO
13	一十三	ONE-TEN AND THREE
...
17	一十七	ONE-TEN AND SEVEN
18	一十八	ONE-TEN AND EIGHT
19	一十九	ONE-TEN AND NINE


Expected Result:
The numbers from 10 to 19 should be prefixed with "TEN" only, e.g.:
10	十	TEN
11	十一	TEN AND ONE
12	十二	TEN AND TWO
13	十三	TEN AND THREE
...
17	十七	TEN AND SEVEN
18	十八	TEN AND EIGHT
19	十九	TEN AND NINE

See the attached spreadsheet for the comparison.

Note that the current result is correct for all other number ranges and thus should not be changed by a fix.

Also note that the above applies to NatNum4 (i.e. Chinese lower-case numbering). The current result for NatNum5 is correct (i.e. it is correctly prefixed with "ONE-TEN") and thus should not be changed by a fix either.
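
To make the expected difference concrete, here is a minimal C++ sketch of the two behaviours for numbers 1 to 99. It is not LibreOffice code: the function formatUpTo99, its parameters and the two glyph tables are hypothetical names introduced only for illustration, and the source file is assumed to be saved as UTF-8.

```cpp
#include <array>
#include <cassert>
#include <string>

// Hypothetical glyph tables for illustration only; not taken from LibreOffice.
// Index 0 is never emitted by formatUpTo99 and is present only to keep
// the index equal to the digit value.
static const std::array<std::string, 10> kLowerDigits = {
    "〇", "一", "二", "三", "四", "五", "六", "七", "八", "九"};
static const std::array<std::string, 10> kUpperDigits = {
    "零", "壹", "贰", "叁", "肆", "伍", "陆", "柒", "捌", "玖"};

// Formats n in [1, 99].
// upper            : use the upper-case (NatNum5-style) glyphs.
// omitOneBeforeTen : drop the leading ONE for 10..19 only.
std::string formatUpTo99(int n, bool upper, bool omitOneBeforeTen)
{
    const auto& digits = upper ? kUpperDigits : kLowerDigits;
    const std::string ten = upper ? "拾" : "十";
    const int tens = n / 10;
    const int ones = n % 10;

    std::string out;
    if (tens > 0)
    {
        // Only a tens digit of 1 is affected by the omission flag.
        if (!(tens == 1 && omitOneBeforeTen))
            out += digits[tens];
        out += ten;
    }
    if (ones > 0)
        out += digits[ones];
    return out;
}
```

With these assumptions, formatUpTo99(15, false, false) reproduces the current NatNum4 result for 15, formatUpTo99(15, false, true) gives the expected one, and formatUpTo99(15, true, false) corresponds to the NatNum5 form that should stay unchanged.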

Version: 7.4.0.0.alpha0+ / LibreOffice Community
Build ID: b9039e511ed103814dd3c2987c2e408aebb58058
CPU threads: 8; OS: Linux 5.15; UI render: default; VCL: gtk3
Locale: zh-CN (zh_CN.UTF-8); UI: zh-CN
Build Platform: Fedora34@X64, Branch:master, bibisect-linux-64-7.4-CN
Calc: threaded
Comment 1 Kevin Suo 2022-01-29 10:21:07 UTC
Created attachment 177899
NatNun4.png (bug btw 10-20)
Comment 2 Kevin Suo 2022-01-29 10:21:37 UTC
Created attachment 177900
NatNum5.png (correct)
Comment 3 Kevin Suo 2022-01-29 12:46:51 UTC
Could someone give me some more code information on how to omit the ONE for numbers 10 to 19 only, for Chinese NatNum4, while not affecting other number ranges?

I guess it's in /core/i18npool/source/nativenumber/nativenumbersupplier.cxx, but it's difficult to understand. For instance, in "const Number natnum4[4]" there is a NUMBER_OMIT_ZERO_ONE_67 for NumberChar_Modern_ja, and a NUMBER_OMIT_ZERO for NumberChar_Lower_ko. But if I add anything for NumberChar_Lower_zh, the build may fail.
Comment 4 Ming Hua 2022-01-29 13:51:09 UTC
I don't use [NatNumX] formatting myself (unlike Kevin, who I assume uses [NatNum5] daily at work), so I haven't given this issue much thought.  The following are just my two cents from some tangentially related angles:

First, for [NatNum5], the so-called "Chinese uppercase" formatting widely used in finance and accounting, LO Calc seems to give satisfactory results.  The [NatNum4] formatting looks like a simple character-for-character replacement of [NatNum5], and therefore deviates from the convention many people are accustomed to.  The examples of 10-19 given by Kevin are the more obvious ones.

However, I personally disagree with Kevin that 10-19 are the only numbers where [NatNum4/5] deviates from oral convention.  I read 3015 as "三千零十五" myself instead of the "叁仟零壹拾伍/三千零一十五" produced by [NatNum4/5] formatting.  Similarly, I read 30015 and 300150000 as "三万零十五" and "三亿零十五万".  It would be great if there were a national or industry standard for this (I didn't check).

Second, the number range custom field feature [1], which I do use, gives "十五" for 15 "correctly".  This feature is based on the libnumbertext library [2].

1. Menu "Insert > Field > More Fields...", "Variables" tab, "Number range" type, "One, Two, Three, ..." format, fill the desired number in "Value" field, press "Insert" button.
2. https://github.com/Numbertext/libnumbertext
Comment 5 Ming Hua 2022-01-29 14:03:44 UTC
(In reply to Kevin Suo from comment #3)
> Could someone give me some more code information on how to omit the ONE for
> numbers 10 to 19 only, for Chinese NatNum4, while not affecting other number
> ranges?
Not a developer myself, but I'd like to refer to the recent work by Naruhiko Ogasawara and DaeHyun Sung in this area (see bug 130193 and bug 130140, though no exact commit is linked to the latter), which you may find useful.
Comment 6 Kevin Suo 2022-01-29 14:26:33 UTC
(In reply to Ming Hua from comment #4)

Ming Hua:

There is a little difference between NatNum4 and NatNum5. 
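For NatNum5, the ONE is needed, e.g. RMB 10.00 is written "人民币壹拾圆整". I did not find any local regulation on this; the only relevant one is the so-called "中国人民银行支票填写规范" (the People's Bank of China rules for filling in cheques), which states that "票据的出票日期必须使用中文大写,为防止变造票据的出票日期,在填写月、日时、月为壹、贰和壹拾的,日为壹至玖和壹拾、贰拾和叁拾的,应在其前加“零”,日为拾壹至拾玖的应在其前加“壹”,如1月15日应写成零壹月壹拾伍日,再如10月20日应写成零壹拾月零贰拾日". (Roughly: the issue date of a bill must be written in Chinese upper-case numerals. To prevent the date from being altered, months 壹, 贰 and 壹拾 and days 壹 through 玖, 壹拾, 贰拾 and 叁拾 must be prefixed with "零", and days 拾壹 through 拾玖 must be prefixed with "壹"; for example, January 15 is written 零壹月壹拾伍日 and October 20 is written 零壹拾月零贰拾日.) This practice is widely used, not only for dates but also for numbers. The best place to verify this is any VAT invoice.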

However, if we say "I earned 10 dollars today", it is "我今天挣了十美元" (in lower-case form), rather than "我今天挣了一十美元" or "我今天挣了壹拾美元" (in upper-case form).

For 30015, I think it is more accurate to read it as "三万〇一十五". The best place to find such a "standard" is the "中华人民共和国民法典" (PRC Civil Code). For instance, Article 1010 is read as "第一千零一十条" (i.e. there is a ONE before the 2nd "10").
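
The reading described in this comment (drop the ONE only when the whole number is 10 to 19, keep it for a "10" inside a larger number) can be sketched in C++ for values 1 to 9999. This is only an illustration of that rule, not LibreOffice code; the function name formatLowerUpTo9999 is hypothetical, and the use of "零" for an interior zero follows the Civil Code example above and is an assumption (NatNum4 output may use "〇" instead). Ming Hua's alternative reading of 3015 as "三千零十五" is deliberately not implemented here.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper for illustration only; formats n in [1, 9999]
// in lower-case Chinese numerals. Assumes UTF-8 source encoding.
std::string formatLowerUpTo9999(int n)
{
    static const char* const digits[] = {
        "零", "一", "二", "三", "四", "五", "六", "七", "八", "九"};
    static const char* const units[] = {"千", "百", "十", ""};
    static const int place[] = {1000, 100, 10, 1};

    // The ONE is dropped only when the entire number is 10..19.
    const bool wholeNumberIsTeen = (n >= 10 && n <= 19);

    std::string out;
    bool pendingZero = false;
    for (int i = 0; i < 4; ++i)
    {
        const int d = (n / place[i]) % 10;
        if (d == 0)
        {
            // Leading zeros are skipped; an interior zero is remembered
            // and written once, only if a non-zero digit follows.
            if (!out.empty())
                pendingZero = true;
            continue;
        }
        if (pendingZero)
        {
            out += digits[0];
            pendingZero = false;
        }
        if (!(i == 2 && d == 1 && wholeNumberIsTeen))
            out += digits[d];
        out += units[i];
    }
    return out;
}
```

Under these assumptions, 10 becomes "十" while 1010 keeps the ONE before the second ten, matching the Article 1010 example.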
Comment 7 Kevin Suo 2022-01-29 14:43:54 UTC
> the number range custom field feature [1], which I do use, gives "十五" for 15 "correctly".

Thanks for your hint - This "One, Two, Three, ..." format uses capitalize ordinal-number in the "NativeNumberMode::NATNUM12" format code. This is exactly what we need for "一、二、三" numbering in Writer. This solves a part of bug 77803.
Comment 8 Ming Hua 2022-01-29 14:45:09 UTC
(In reply to Kevin Suo from comment #6)
> There is a little difference between NatNum4 and NatNum5. 
> [explanation skipped]
I know.  In my previous comment I meant "the current implementation of NatNum4 and NatNum5 in Calc" seems to be just character-for-character replacement of each other.

As for what differences you want them to have in the future, I don't use this feature and don't really have a horse in the race.
Comment 9 Ming Hua 2022-01-29 14:59:37 UTC
(In reply to Kevin Suo from comment #7)
> > the number range custom field feature [1], which I do use, gives "十五" for 15 "correctly".
> 
> Thanks for your hint - This "One, Two, Three, ..." format uses capitalize
> ordinal-number in the "NativeNumberMode::NATNUM12" format code. This is
> exactly what we need for "一、二、三" numbering in Writer. This solves a part of
> bug 77803.
Alas, this reminded me that the libnumbertext-based formatting is also provided through the [NatNum12] formatting in Calc.  So for your original problem, using the "[NatNum12]0" format code instead of "[NatNum4]0" should give you the desired result, at least on an up-to-date [1] Calc [2].

1. Version 7.2 or earlier has a bug for numbers like 201 and 2345, see https://github.com/Numbertext/libnumbertext/issues/83
2. [NatNum4] is also used for importing Microsoft's .xls/.xlsx files when they contain [DBNum1] format code, if I read bug 130193 correctly.  So interoperability with Excel is still a problem.
Comment 10 Kevin Suo 2022-01-29 15:08:04 UTC
(In reply to Ming Hua from comment #9)
I have used NATNUM12 to resolve the problem in bug 77803 only.

*This* bug should still remain open.
Comment 11 Eike Rathke 2022-01-31 18:16:08 UTC
(In reply to Kevin Suo from comment #3)
> Could someone give me some more code information on how to omit the ONE for
> numbers 10 to 19 only, for Chinese NatNum4, while not affecting other number
> ranges?
> I guess it's in /core/i18npool/source/nativenumber/nativenumbersupplier.cxx
Doesn't that already work if you add NUMBER_OMIT_ONE_1 to the NumberChar_Lower_zh entry in natnum4[]? i.e. if in AsciiToNative_numberMaker() multiChar_index is 0 the native 1 will be omitted? But maybe I misunderstood from a short glance.

If that doesn't work, then you may have to insert another
#define NUMBER_OMIT_ONE_TENTHS (1 << 2)
(or whatever name) and for the other following defines shift them by one more place and change
#define NUMBER_OMIT_ONE_CHECK(bit)  (1 << (2 + bit))
to
#define NUMBER_OMIT_ONE_CHECK(bit)  (1 << (3 + bit))

and in AsciiToNative_numberMaker(), around the NUMBER_OMIT_ONE_CHECK(multiChar_index) condition, additionally check whether NUMBER_OMIT_ONE_TENTHS is set; if so and begin==0, evaluate str[begin] and str[begin+1] (if not past the end) to see whether they form a "1x" that is exactly the input string.

Just an idea, don't know if that actually fits.
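
The second idea can be sketched in isolation as follows. This is a hedged illustration, not a patch: kOmitOneTenths and omitOneCheck mirror the proposed NUMBER_OMIT_ONE_TENTHS define and the shifted NUMBER_OMIT_ONE_CHECK macro, while omitOneForTeens is a hypothetical helper standing in for the extra condition in AsciiToNative_numberMaker(). It uses std::string for the digit string, which is an assumption made to keep the example self-contained and need not match the types of the real function.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Proposed new flag at bit 2 (named NUMBER_OMIT_ONE_TENTHS in the comment).
constexpr unsigned kOmitOneTenths = 1u << 2;

// Per-position "omit one" check, shifted by one more place to make room
// for the new flag: (1 << (3 + bit)) instead of (1 << (2 + bit)).
constexpr unsigned omitOneCheck(int bit)
{
    return 1u << (3 + bit);
}

// Hypothetical helper: returns true when the leading ONE should be dropped,
// i.e. the flag is set, conversion starts at the first character, and the
// whole input string is exactly "1x" with x being a digit.
bool omitOneForTeens(const std::string& str, std::size_t begin, unsigned flags)
{
    if ((flags & kOmitOneTenths) == 0)
        return false;
    if (begin != 0 || str.size() != 2)
        return false;
    return str[0] == '1' && str[1] >= '0' && str[1] <= '9';
}
```

The size check is what keeps inputs such as "115" or the "15" inside "3015" unaffected, so other number ranges keep their current output.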


> But if I add anything for
> NumberChar_Lower_zh the build may fail.
Fail in what sense?
Comment 12 Eike Rathke 2022-01-31 18:22:25 UTC
(In reply to Ming Hua from comment #9)
> 2. [NatNum4] is also used for importing Microsoft's .xls/.xlsx files when
> they contain [DBNum1] format code, if I read bug 130193 correctly.  So
> interoperability with Excel is still a problem.
Yes. See also
https://help.libreoffice.org/7.2/en-GB/text/shared/01/05020301.html?DbPAR=SHARED#bm_id3153514
the long "NatNum modifiers" section and tables.