Bug 77803 - Implement separate numbering styles for Chinese and Japanese (they're similar, but not the same)
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version (earliest affected): Inherited From OOo
Hardware: All All
Importance: medium enhancement
Assignee: Kevin Suo
URL:
Whiteboard: target:7.4.0
Keywords:
Depends on:
Blocks: CJK Numbering-Formats
Reported: 2014-04-23 10:31 UTC by Steph ZHANG
Modified: 2022-02-11 13:04 UTC

See Also:
Crash report or crash signature:
Regression By:


Attachments
a odt file with Chinese-char page numbers (8.75 KB, application/vnd.oasis.opendocument.text)
2014-04-26 05:47 UTC, Kevin Suo
pdf file shows the current bug behaviour (9.58 KB, application/pdf)
2014-04-26 05:48 UTC, Kevin Suo
Numbering list numbered by Chinese numbers from 90 to 111 in odt (17.82 KB, application/vnd.oasis.opendocument.text)
2014-04-26 08:24 UTC, Steph ZHANG
Japanese and Chinese NatNum# native numbering (27.66 KB, application/vnd.oasis.opendocument.spreadsheet)
2014-09-16 16:40 UTC, Eike Rathke

Description Steph ZHANG 2014-04-23 10:31:02 UTC
I recently started a very long article, several hundred pages in length, and found that the page numbers between one hundred and two hundred look wrong.

The correct expression of one hundred in Chinese is 「一百」, but the "one" in "one hundred" is missing and the display becomes 「百」. Furthermore, the "zero" between the hundreds and the ones is missing too. The correct expression of one hundred and one is 「一百零一」, but it is now displayed as 「百一」, which literally means one hundred and ten and is quite confusing.

I have recently written a Java program that converts an integer to its Chinese numeral form; it is available at http://paste.ubuntu.com/7313644 . I know that this program has terrible structure and efficiency, so it is for reference only. It can only convert numbers from 0 to 99,999,999, but I think that's enough.
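For illustration, the conversion rules described above can be sketched in a few lines of Python. This is not the Java program linked above and not LibreOffice code. The names to_chinese and section_to_chinese are hypothetical, and the conventions used (a single 零 for any run of skipped places, and dropping the leading 一 for 10 to 19) are assumptions about common usage. The range matches the 0 to 99,999,999 limit mentioned above.

```python
DIGITS = "零一二三四五六七八九"
UNITS = ["", "十", "百", "千"]  # ones, tens, hundreds, thousands


def section_to_chinese(n):
    """Convert 0 <= n <= 9999 to numerals; returns "" for 0."""
    out = []
    pending_zero = False
    for power in (3, 2, 1, 0):
        digit = (n // 10 ** power) % 10
        if digit == 0:
            # Remember a gap only if a higher place was already written.
            pending_zero = bool(out)
            continue
        if pending_zero:
            out.append("零")  # one 零 stands for any run of skipped places
            pending_zero = False
        out.append(DIGITS[digit] + UNITS[power])
    return "".join(out)


def to_chinese(n):
    """Convert 0 <= n <= 99,999,999 to Chinese numerals (assumed conventions)."""
    if not 0 <= n <= 99_999_999:
        raise ValueError("supported range is 0 to 99,999,999")
    if n == 0:
        return "零"
    high, low = divmod(n, 10_000)
    text = ""
    if high:
        text = section_to_chinese(high) + "万"
        if 0 < low < 1000:
            text += "零"  # the thousands place of the low section is empty
    text += section_to_chinese(low)
    if text.startswith("一十"):
        text = text[1:]  # 10 to 19 are written 十, 十一, ... at the start
    return text


if __name__ == "__main__":
    for value in (100, 101, 110):
        print(value, to_chinese(value))
```

With these assumptions, 100 and 101 come out as 一百 and 一百零一, which are the forms requested in this report.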
Comment 1 Kevin Suo 2014-04-26 05:43:31 UTC
Confirmed in LibreOffice 4.2.3.3, Build ID: 6c3586f855673fa6a1576797f575b31ac6fa0ba3

Set to new.

(However, I don't think it's a good idea to use "一,二,三..." as page numbers, and few people do that.)
Comment 2 Kevin Suo 2014-04-26 05:47:58 UTC
Created attachment 98003 [details]
a odt file with Chinese-char page numbers

I inserted a page break and set the page number to start from 100.

So the 2nd page number should be "一百" (100) rather than "百", and
the 3rd page number should be "一百零一" (101) rather than "百一".
Comment 3 Kevin Suo 2014-04-26 05:48:38 UTC
Created attachment 98004 [details]
pdf file shows the current bug behaviour
Comment 4 Steph ZHANG 2014-04-26 08:24:03 UTC
Created attachment 98009 [details]
Numbering list numbered by Chinese numbers from 90 to 111 in odt

This problem affects not only page numbers but also anything else that uses the same algorithm, such as numbered lists, as this attachment shows.
Comment 5 Kevin Suo 2014-05-24 05:31:03 UTC
I don't think this is a localization issue. Changing component to LibreOffice.
Comment 6 Matthew Francis 2014-08-26 04:19:22 UTC
The issue here appears to be that there are not separate page (and other - list, etc.) numbering styles for Chinese and Japanese. The behaviour of the current 一, 二, 三, ... numbering is correct for Japanese, where one says "Hundred One" (百一) for 101 rather than "One Hundred Zero One" (一百零一) as in Chinese.

The precedent for this is that there are already separate entries for the visually similar Bulgarian, Russian and Serbian numbering, each tagged with the correct language.

If we follow this example, we would need to split the numbering styles into at least:

一, 二, 三, ... (Chinese)
一, 二, 三, ... (Japanese)


There are probably other implications, e.g. for import/export filters and forward/backward compatibility
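As a rough sketch of the split described above, the two styles can share one routine and differ only in two rules: whether a leading 一 is omitted before 十/百/千, and whether a 零 fills skipped places. The names kanji_number and STYLES are hypothetical and are not LibreOffice identifiers. The range is limited to 1..9999, and the special handling of 10 to 19 in Chinese is deliberately left out.

```python
DIGITS = "〇一二三四五六七八九"  # index 0 is never emitted by this sketch
UNITS = ["", "十", "百", "千"]


def kanji_number(n, omit_one, zero_filler):
    """Format 1 <= n <= 9999 under the two assumed style rules."""
    if not 1 <= n <= 9999:
        raise ValueError("sketch only covers 1 to 9999")
    out = []
    gap = False
    for power in (3, 2, 1, 0):
        digit = (n // 10 ** power) % 10
        if digit == 0:
            gap = bool(out)  # a skipped place after something was written
            continue
        if gap and zero_filler:
            out.append(zero_filler)
        gap = False
        if digit == 1 and power > 0 and omit_one:
            out.append(UNITS[power])  # Japanese style: 百, not 一百
        else:
            out.append(DIGITS[digit] + UNITS[power])
    return "".join(out)


# Two entries that would look identical in a style list, told apart by language tag.
STYLES = {
    "ja": lambda n: kanji_number(n, omit_one=True, zero_filler=""),
    "zh": lambda n: kanji_number(n, omit_one=False, zero_filler="零"),
}

if __name__ == "__main__":
    for lang in ("ja", "zh"):
        print(lang, STYLES[lang](100), STYLES[lang](101))
```

Under these assumptions the "ja" entry yields 百 and 百一 for 100 and 101, while the "zh" entry yields 一百 and 一百零一.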
Comment 7 Matthew Francis 2014-09-11 09:59:29 UTC
See also:
http://cgit.freedesktop.org/libreoffice/core/tree/filter/source/xslt/import/wordml/wordml2ooo_page.xsl#n345

...
            <xsl:when test="$number-format = 'chinese-counting-thousand'
                            or $number-format = 'ideograph-digital'
                            or $number-format = 'japanese-counting'
                            or $number-format = 'japanese-digital-ten-thousand'
                            or $number-format = 'taiwanese-counting-thousand'
                            or $number-format = 'taiwanese-counting'
                            or $number-format = 'taiwanese-digital'
                            or $number-format = 'chinese-counting'
                            or $number-format = 'korean-digital2'
                            or $number-format = 'chinese-not-impl'">
                <xsl:attribute name="style:num-format">一, 二, 三, ...</xsl:attribute>
            </xsl:when>
...

If I read this correctly, we are folding numerous OOXML numbering formats into "一, 二, 三, ...". A thorough resolution of this issue should include consideration of precisely how these formats differ, and whether we are doing everything reasonably possible in terms of round trip compatibility.
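To make the extent of the folding visible, here is a small Python sketch that groups the format names from the quoted test by their name prefix. The list is copied from the snippet above. The helper group_by_prefix is hypothetical, and grouping by prefix is only a rough proxy for the language each format is meant for, not a statement about how the formats actually differ.

```python
from collections import defaultdict

# Format names copied from the xsl:when test quoted above.
FOLDED_FORMATS = [
    "chinese-counting-thousand",
    "ideograph-digital",
    "japanese-counting",
    "japanese-digital-ten-thousand",
    "taiwanese-counting-thousand",
    "taiwanese-counting",
    "taiwanese-digital",
    "chinese-counting",
    "korean-digital2",
    "chinese-not-impl",
]


def group_by_prefix(formats):
    """Group format names by the part before the first hyphen."""
    groups = defaultdict(list)
    for name in formats:
        prefix = name.split("-", 1)[0]
        groups[prefix].append(name)
    return dict(groups)


if __name__ == "__main__":
    for prefix, names in sorted(group_by_prefix(FOLDED_FORMATS).items()):
        print(f"{prefix}: {', '.join(names)}")
```

Five distinct prefixes end up mapped to the single "一, 二, 三, ..." style, which is the round-trip concern raised above.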
Comment 8 Eike Rathke 2014-09-16 16:40:12 UTC
Created attachment 106383 [details]
Japanese and Chinese NatNum# native numbering

We have 8 different native number types implemented for each of Chinese and Japanese. You can check these, for example, in Calc by applying number formats to a cell value, e.g.

[NatNum1]General
[NatNum2]General
...
[NatNum8]General

for each language. I'm attaching a document illustrating this. See also offapi/com/sun/star/i18n/NativeNumberMode.idl or http://api.libreoffice.org/docs/idl/ref/namespacecom_1_1sun_1_1star_1_1i18n_1_1NativeNumberMode.html

However, it seems not all are available for native numbering in page numbers and numbering lists. See also offapi/com/sun/star/style/NumberingType.idl or http://api.libreoffice.org/docs/idl/ref/namespacecom_1_1sun_1_1star_1_1style_1_1NumberingType.html

But the exact numbering as mentioned in comment 2 ("一百" (100), "一百零一" (101)) is not present even as NatNum# numbering. The closest would be Chinese NatNum4 with "一百" (100) and "一百〇一" (101).
Comment 9 Kevin Suo 2014-09-17 14:31:03 UTC
(In reply to comment #8)

> But the exact numbering as mentioned in comment 2 ("一百" (100), "一百零一" (101))
> is not present even as NatNum# numbering. The closest would be Chinese
> NatNum4 with "一百" (100) and "一百〇一" (101).

"一百〇一、一百〇二..." are acceptable, not bad Chinese. "百一、百二..." are really bad.
Comment 10 Volga 2016-11-08 16:11:58 UTC
Mozilla Developer Network has some examples of this:
https://developer.mozilla.org/en-US/docs/Web/CSS/list-style-type
Comment 11 Volga 2017-01-25 08:13:29 UTC
More examples can be found on the W3C website:
https://drafts.csswg.org/css-counter-styles-3/#extending-css2
Comment 12 Kevin Suo 2022-01-27 11:18:37 UTC
An interesting implementation can be found in the following link (in Chinese):
http://data.biancheng.net/view/146.html
Comment 13 Kevin Suo 2022-01-29 10:12:26 UTC
I think I know why.

Currently for NUMBER_LOWER_ZH and NUMBER_UPPER_ZH, NATNUM7 and NATNUM8 are used.

For Chinese, the format codes NATNUM7 and NATNUM8 are "short lower case text" and "short upper case text", respectively, see [1]. For numbering purposes we need the full lower and full upper forms here, not the "short" forms. E.g. for the number 100, we need "一百" (lower case) and "壹佰" (upper case), rather than "百", which is the short form.
    
[1] https://help.libreoffice.org/latest/en-US/text/shared/01/05020301.html

The full form should be NATNUM4 and NATNUM4, and we should use these two instead.

However, there is another bug blocking this change, see bug 147054. That is, for NatNum4, numbers from 10 to 19 are prefixed as "一十" (i.e. ONE-TEN). As a result I mark *this* bug to depend on the fix for bug 147054.
Comment 14 Kevin Suo 2022-01-29 14:46:34 UTC
Per hint from Ming Hua, we can use NativeNumberMode::NATNUM12 for Chinese Lower numbering.
Comment 16 Commit Notification 2022-02-07 17:53:57 UTC
Kevin Suo committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/24e6217c0abdaed703a077e77881ad6e8b4f6f0e

tdf#77803: Use NATNUM12 and NATNUM4 for Chinese Numberring

It will be available in 7.4.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 17 Kevin Suo 2022-02-11 13:04:40 UTC
This is now fixed on master.