Bug 102235 - Numerals displayed in Arabic Script for Hindi language document
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): Inherited From OOo
Hardware: All All
Importance: medium normal
Assignee: Laurent BP
Whiteboard: target:5.3.0
Reported: 2016-09-17 08:35 UTC by Shree Devi Kumar
Modified: 2017-01-15 08:40 UTC (History)

Attachments
Numerals displayed in Arabic Script for Hindi Language Documents (174.06 KB, image/png)
2016-09-17 08:35 UTC, Shree Devi Kumar
Libre Office Writer document showing that numerals are displayed in Arabic Script when using Hindi numerals option. (16.25 KB, application/vnd.oasis.opendocument.text)
2016-09-17 12:43 UTC, Shree Devi Kumar
Screenshot (61.19 KB, image/png)
2016-09-17 14:50 UTC, m.a.riosv
Hindi Numerals - Red Circles (131.66 KB, image/png)
2016-09-17 16:21 UTC, Shree Devi Kumar
System Numerals - Black circles (146.82 KB, image/png)
2016-09-17 16:21 UTC, Shree Devi Kumar
Hindi Numerals - Locale setting (143.53 KB, image/png)
2016-09-17 16:22 UTC, Shree Devi Kumar

Description Shree Devi Kumar 2016-09-17 08:35:26 UTC
Created attachment 127376 [details]
Numerals displayed in Arabic Script for Hindi Language Documents

Numbers in the table of contents and in numbered lists of a Hindi document are shown in Arabic script, even though I went to Options > Language Settings > Complex Text Layout > General Options and changed the numerals to Hindi.

I have gotten round the problem by changing the numerals type to System, instead of Hindi.

However for a Hindi language document, the numerals should be in Hindi by default OR should work correctly when Hindi Numerals option is chosen.

--- It is possible that 'Hindi Numerals' under Options does not refer to numerals in devanagari script for Hindi language but to "what the Arabs call the "Hindi numerals", namely the Eastern Arabic numerals (٠‎ - ١‎ - ٢‎ - ٣‎ -٤‎ - ٥‎ - ٦‎ - ٧‎ - ٨‎ - ٩‎) used in the Middle East" - see https://en.wikipedia.org/wiki/Arabic_numerals

However, in that case they should be displayed only when the document is in Arabic language and not for Hindi language.
Comment 1 m.a.riosv 2016-09-17 11:42:24 UTC
For testing, please attach a sample document, as minimal as possible, that shows the issue.
Comment 2 Shree Devi Kumar 2016-09-17 12:43:53 UTC
Created attachment 127379 [details]
Libre Office Writer document showing that numerals are displayed in Arabic Script when using Hindi numerals option.
Comment 3 m.a.riosv 2016-09-17 14:50:21 UTC
Created attachment 127380 [details]
Screenshot

It looks like it works fine for me.
Win10x64
Version: 5.2.2.1 (x64)
Build ID: 3c2231d4aa4c68281f28ad35a100c092cff84f5d
CPU Threads: 4; OS Version: Windows 6.19; UI Render: GL; 
Locale: es-ES (es_ES); Calc: group

Please try resetting the user profile; this sometimes solves strange issues.
https://wiki.documentfoundation.org/UserProfile
Usually it is enough to rename or delete the file "user/registrymodifications.xcu"; this affects all the options in Tools > Options. Note that the files "user/basic/dialog.xlc" and "script.xlc" are overwritten, and custom colors in "user/config/standard.soc" are lost.
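
The rename step above could be scripted roughly as follows. This is only a minimal sketch: the helper name `backup_registry_modifications` and the `.bak` suffix are hypothetical, and the location of the profile's "user" directory differs between platforms, so the caller passes it in instead of the script guessing it. LibreOffice should be closed before running it.

```python
from pathlib import Path
from typing import Optional


def backup_registry_modifications(user_dir: Path) -> Optional[Path]:
    """Rename registrymodifications.xcu inside the profile's "user" directory.

    `user_dir` is an assumption supplied by the caller, because the profile
    path is platform dependent. Returns the backup path, or None if the
    file was not there.
    """
    registry = user_dir / "registrymodifications.xcu"
    if not registry.is_file():
        return None
    backup = registry.with_name(registry.name + ".bak")
    # Path.replace overwrites an older backup instead of failing on it.
    registry.replace(backup)
    return backup
```

Renaming rather than deleting keeps the old settings around, so they can be restored by renaming the file back if the reset does not help.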
Comment 4 Shree Devi Kumar 2016-09-17 16:19:28 UTC
The screenshot that you have attached highlights the problem.

Let me clarify ...

In English and in many Indian languages, numerals are written as 1 2 3 4. In LibreOffice these are called 'Arabic' numerals.

In Hindi, Marathi, Nepali languages written in Devanagari script, numerals are written as १ २ ३ ४. For Hindi language documents, similar to the test case I submitted, numerals are displayed like this when 'System' option is chosen for numerals.

In the Middle East, the Eastern Arabic numerals are written as   ١‎ - ٢‎ - ٣‎ -٤‎ . In LibreOffice, these are called 'Hindi' Numerals - since per wikipedia they are called as such.

Please see https://en.wikipedia.org/wiki/Indian_numerals - Modern Devanagari, Hindu-Arabic and Arabic-Indic headings.

The confusion arises when, using a Hindi locale and the Hindi language, I create a document and the numerals in numbered lists and in the table of contents are displayed as 'Hindi' numerals, i.e. Eastern Arabic numerals.

I am attaching three images.
The first shows the incorrect rendering (from the Hindi language point of view) when Hindi numerals are chosen. I have circled the numerals in red.
The second shows the correct rendering from the Hindi language point of view. I have circled the numerals there in black.
The third screenshot just shows the locale setting.
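
The three digit sets described in this comment can be told apart by their Unicode character names. The following is a small illustrative sketch using Python's standard `unicodedata` module; the `SAMPLES` dictionary and the `describe_digit` helper are names made up for this example and are not part of LibreOffice.

```python
import unicodedata

# One sample digit per numeral set described above.
SAMPLES = {
    "labelled 'Arabic' in the options": "1",
    "labelled 'Hindi' in the options": "\u0661",
    "Devanagari, as used for Hindi": "\u0967",
}


def describe_digit(ch: str) -> str:
    """Return the code point, Unicode name and numeric value of one digit."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} (value {unicodedata.digit(ch)})"


for label, ch in SAMPLES.items():
    print(f"{label}: {describe_digit(ch)}")
```

All three characters carry the same numeric value; only the script differs, which is why the label shown next to each option matters so much.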
Comment 5 Shree Devi Kumar 2016-09-17 16:21:06 UTC
Created attachment 127381 [details]
Hindi Numerals - Red Circles
Comment 6 Shree Devi Kumar 2016-09-17 16:21:45 UTC
Created attachment 127382 [details]
System Numerals - Black circles
Comment 7 Shree Devi Kumar 2016-09-17 16:22:53 UTC
Created attachment 127383 [details]
Hindi Numerals - Locale setting
Comment 8 Shree Devi Kumar 2016-09-17 17:30:10 UTC
At a minimum, I would suggest changing the labels for the numerals options:


Arabic to 1 2 3
Hindi to ١‎ ٢‎ ٣‎

So that users are aware of the kind of numerals they are choosing.
Comment 9 m.a.riosv 2016-09-18 11:03:39 UTC
Sorry, let's see if someone with more knowledge can help.
Comment 10 Laurent BP 2016-09-18 19:04:41 UTC
"Hindi" may be replaced with "Indic" which is less confusing.
Adding sample digits is a good idea
Comment 11 Shree Devi Kumar 2016-09-19 04:14:14 UTC
Adding Khaled Hosny to CC list, to hopefully provide some insight into what the recommended labels should be for Arabic and Hindi numerals. 

The list in https://bugs.documentfoundation.org/show_bug.cgi?id=36038 uses the following:

NN	Representation	Numeral	Main LCID	Other LCID
00	1234567890			
01	1234567890	arabic-Europ		
02	١٢٣٤٥٦٧٨٩٠	arabic-indi	401	1401, 3c01, 0c01, 801, 2c01, 3401, 3001, 1001, 1801, 2001, 4001, 2801, 1c01, 3801, 2401
03	۱۲۳۴۵۶۷۸۹۰	arabic-farsi	429	
04	१२३४५६७८९०	hindi-devanagari	439	44E, 461, 861

------------
I would also suggest that for Hindi language/Hindi locale documents, the default numerals setting should be 'System' so that the numerals display in 'hindi-devanagari' instead of 'Hindi' which is 'arabic-indi' as per the above list.
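
To make the effect of each numerals option concrete, here is a rough sketch of converting digits between the four sets in the list above. It is not LibreOffice's implementation: the `ZERO` table and the `convert_digits` function are hypothetical, and the dictionary keys simply reuse the labels from that list. It relies only on the fact that each set occupies ten consecutive Unicode code points starting at its zero.

```python
# Code point of digit zero in each numeral set from the list above.
ZERO = {
    "arabic-Europ": 0x0030,
    "arabic-indi": 0x0660,
    "arabic-farsi": 0x06F0,
    "hindi-devanagari": 0x0966,
}


def convert_digits(text: str, numeral: str) -> str:
    """Rewrite every decimal digit of the four sets into the set `numeral`."""
    target_zero = ZERO[numeral]
    # Map each digit of every known set to the matching digit of the target.
    table = {
        zero + d: chr(target_zero + d)
        for zero in ZERO.values()
        for d in range(10)
    }
    return text.translate(table)


# A generated list label rendered with two different settings.
print(convert_digits("Chapter 12", "hindi-devanagari"))
print(convert_digits("Chapter 12", "arabic-indi"))
```

With a Hindi document, the first call corresponds to what the reporter expects to see, and the second to what the 'Hindi' option actually produces.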
Comment 12 Khaled Hosny 2016-09-19 20:53:02 UTC
The use of Hindi for Eastern Arabic numerals is very unfortunate and mostly based on an urban legend that made its way into office suites.

As for naming, I’d use European for “123” (as they are named in Unicode) and Eastern Arabic for “١٢٣”.

I’d not recommend changing the default, it should always give you the exact characters you typed. If one wants automatic conversion, it should be opt-in as it is now.
Comment 13 Shree Devi Kumar 2016-09-20 03:53:02 UTC
Thanks for the clarification and suggestion regarding the numerals, Khaled.

So, in the dropdown list for Numerals,

1. 'Arabic' should be changed to 'European'.

2. 'Hindi' should be changed to 'Eastern Arabic'.
Comment 14 Shree Devi Kumar 2016-09-20 03:58:54 UTC
Regarding the default: with the current 'Hindi' numerals setting for the Hindi language and Hindi locale, a numeral typed as part of the Hindi text is in Devanagari script, which is correct. It is the numerals generated by LO Writer for page numbers, numbered lists, headings etc. that are in Arabic script.
Comment 15 Commit Notification 2016-09-28 21:04:46 UTC
Laurent Balland-Poirier committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=80942ca029a632beb48d0e1baf37e28a355c7dd9

tdf#102235 Replace Hindi with Indic and add some sample digits

It will be available in 5.3.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 16 Khaled Hosny 2016-09-28 21:53:04 UTC
The commit does not seem to follow what we discussed here. Indic is even worse than Hindi.
Comment 17 Shree Devi Kumar 2016-09-29 04:59:29 UTC
I agree with Khaled, changing 'Hindi' to 'Indic' is NOT correct.

As mentioned earlier (please see comment 13),

Hindi should have been changed to 'Eastern Arabic'
and
Arabic should have been changed to 'European'.

Adding the sample digits is good, Thanks for that.

-------------

Also ignore comment 14 by me - I have not found a way to delete it or mark it as obsolete. I reset user profile as suggested in comment 3 and that seems to have cleared the problem related to default rendering of numerals.
Comment 18 Laurent BP 2016-09-29 10:20:46 UTC
(In reply to shreeshrii from comment #17)

Why is 'Indic' NOT correct? It is widely used and has no ambiguity. But Eastern Arabic is also OK, and seems to be about as widely used as Indic. "Indic numerals" has 221,000 results on Google, while "Eastern Arabic numerals" has 158,000.

I don't understand the choice of 'European' as these numerals are used in a much wider area. Moreover it seems there can be some ambiguity with this term:
https://en.wikipedia.org/wiki/European_numerals 
'European Arabic' may be more clear?
Comment 19 Khaled Hosny 2016-09-29 10:41:36 UTC
(In reply to Laurent BP from comment #18)
> Why is 'Indic' NOT correct? It is widely used and has no ambiguity. But
> Eastern Arabic is also ok, and seems as used as Indic. "Indic numerals" has
> 221,000 results on Google, while "Eastern Arabic numerals" as 158,000.

Check the 2nd paragraph of the Wikipedia article cited in the commit.

> I don't understand the choice of 'European' as these numerals are used in a
> much wider area. Moreover it seems there can be some ambiguity with this
> term:
> https://en.wikipedia.org/wiki/European_numerals 
> 'European Arabic' may be more clear?

European is the term used by Unicode. I think we shouldn’t try to invent new terminology here and instead try to adopt some widely used ones.

Alternatively (and I’m leaning more towards this), keep the old names (they have a long history in office suites for better or worse) and keep the sample characters that should help clear any confusion.
Comment 20 Shree Devi Kumar 2016-09-29 11:10:48 UTC
Please, do NOT change it back to Hindi numerals. 

Hindi indicates the Hindi language in Devanagari script and causes confusion.

Indic refers to a group of Indo-European languages.

Eastern Arabic would be fine.

----------------------

More background:

Please see https://www.britannica.com/topic/Hindu-Arabic-numerals

"Hindu-Arabic numerals, Set of 10 symbols—1, 2, 3, 4, 5, 6, 7, 8, 9, 0—that represent numbers in the decimal number system. They originated in India in the 6th or 7th century and were introduced to Europe through Arab mathematicians around the 12th century (see al-Khwarizmi). They represented a profound break with previous methods of counting, such as the abacus, and paved the way for the development of algebra."

And, this decimal system of numbering can be contrasted with Roman numerals which are I II III IV V etc.

These numerals 0-9 are written differently in different scripts. 

As I mentioned before, in English and other languages written in Latin script (and nowadays even in some Indian languages) these numbers are written as 1 2 3.

In Hindi language, Devanagari script, they are written as १ २ ३ 

In Arabic language/script, they are written as ١٢٣

Please see comment 9 on https://bugs.documentfoundation.org/show_bug.cgi?id=36038 which lists how these 0-9 numerals are written in various other scripts.

--------------------------

If these additional options are to be given in Libre Office for numerals, 
my vote would be for 

Eastern Arabic for ١٢٣
and 
European Arabic for 123

along with the sample.

Thanks!
Comment 21 Laurent BP 2016-09-29 16:54:17 UTC
(In reply to Khaled Hosny from comment #19)
> European is the term used by Unicode. I think we shouldn’t try to invent new
> terminology here and instead try to adopt some widely used ones.
Actually I've chosen "Indic" and "Arabic" terms because of this page about Unicode glossary:
http://www.ibm.com/developerworks/library/glossaries/unicode.html#numbers

So we agree with "Eastern Arabic" for ١٢٣ numerals.

"European Arabic" would be more symmetric and less confusing as 123 numerals are known as Arabic numerals by most users.

I say "good luck" to the whole l10n team when they have to translate these terms ;-)
Comment 22 Khaled Hosny 2016-09-29 18:41:44 UTC
I do not like inventing new terms; we already have enough confusion, so no from me to “European Arabic”. So Arabic and Eastern-Arabic is fine by me, though less than ideal.
Comment 23 Commit Notification 2016-09-29 23:21:19 UTC
Adolfo Jayme Barrientos committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=89a3f825559753d6600807342ca96c169cd58c87

tdf#102235 Tweak terms to avoid misunderstandings

It will be available in 5.3.0.

Comment 24 Shree Devi Kumar 2016-11-10 14:24:33 UTC
I installed master-2016-11-10_00.11.47_LibreOfficeDev_5.3.0.0.alpha1_Win_x86_en-US_de_ar_ja_ru_vec_qtz.msi today and the numerals display is working as discussed earlier in this thread. Thanks.

However, the 5.3.0-alpha version that I had downloaded from LibreFresh on 5th Nov (LibreOfficeDev_5.3.0.0.alpha1_Win_x86.msi) does not seem to have the change, i.e. it was still displaying Arabic and Hindi as choices and displaying the Arabic and Eastern Arabic numbers for them.

So, when will this change in master be reflected in the version in LibreFresh?

Thanks.
Comment 25 Xisco Faulí 2017-01-13 12:43:27 UTC
Hello,
Is this bug fixed?
If so, could you please close it as RESOLVED FIXED?
Comment 26 Shree Devi Kumar 2017-01-15 08:40:29 UTC
Installed the pre-release version today.

Version: 5.3.0.1 (x64)
Build ID: 3b800451b1d0c48045de03b5b3c7bbbac87f20d9
CPU Threads: 4; OS Version: Windows 6.19; UI Render: default; Layout Engine: new; 
Locale: hi-IN (en_IN); Calc: group

It is working as desired.

Thank you everyone, for your work.