Bug 64977 - Adding Tibetan Language Support
Summary: Adding Tibetan Language Support
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Localization
Version: unspecified (earliest affected)
Hardware: All All
Importance: medium enhancement
Assignee: Eike Rathke
URL:
Whiteboard: target:4.2.0
Keywords:
Depends on:
Blocks:
 
Reported: 2013-05-25 09:43 UTC by Elie Roux
Modified: 2013-06-15 17:41 UTC
CC List: 4 users

See Also:
Crash report or crash signature:


Attachments
patch for the core/ directory (65.18 KB, text/plain)
2013-05-25 09:43 UTC, Elie Roux
corrected bo_CN.xml file (27.28 KB, text/xml)
2013-06-08 07:32 UTC, Elie Roux
corrected bo_IN.xml file (27.51 KB, text/xml)
2013-06-08 07:33 UTC, Elie Roux
bo_charset.txt file (same as dzongkha) (4.29 KB, text/plain)
2013-06-08 07:34 UTC, Elie Roux

Description Elie Roux 2013-05-25 09:43:51 UTC
Created attachment 79782 [details]
patch for the core/ directory

Dear all,

I would like to add Tibetan to the list of available languages (I'm currently writing a hunspell dictionary). I made a git patch for the core/ directory (attached), adding the locales and a few other small things.

I just had a problem: in the files

oovbaapi/ooo/vba/word/WdLanguageID.idl
oovbaapi/ooo/vba/office/MsoLanguageID.idl

the constants associated with Tibetan have number 1105 while it is 2121 in 

l10ntools/source/ulfconv/msi-encodinglist.txt

Is it normal?

Also, I would like to be able to select Tibetan in the list under Tools->Options->Language Settings->Languages->CTL. How should I add it? I wasn't able to find out...

As for my previous bug 64926, I could not test it...

Thank you,
-- 
Elie
Comment 1 Urmas 2013-05-25 10:15:41 UTC
1105 is for China
2121 is for Bhutan
Comment 2 Elie Roux 2013-05-25 12:26:15 UTC
Oh, ok then, thank you!

Do you think someone will review the patches I sent?
Comment 3 Michael Meeks 2013-05-25 16:57:38 UTC
Hi Elie - thanks for your work - this is great :-)
The best way to get patches merged is to mail the developer list, or to push them to gerrit; but this is almost as good.

Andras - any chance you could look into this ?

Thanks for contributing !
Comment 4 Elie Roux 2013-05-25 17:05:59 UTC
Thank you!

Thinking about this again, I remember that I modified /i18npool/source/nativenumber/data/numberchar.h by adding lines for Tibetan, but I don't really know how this is called, so it might well have no effect... This should be easy to improve for people who know the code, though.

I've started https://github.com/eroux/tibetan-spellchecker. The .dic is dumb right now, but tomorrow or Monday it will get a very complete list of words. Once it is more stable, I'll file a bug report for its inclusion (certainly in a few weeks).
Comment 5 Commit Notification 2013-06-08 05:50:32 UTC
Elie Roux committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=0114e1c85bc42fd3bd2a3d0aa33f77f67093b66b

fdo#64977 Adding Tibetan Language Support



The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 6 Commit Notification 2013-06-08 05:58:54 UTC
Fridrich Strba committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=fceb821c473112727520c0952607f8377b62f417

Revert "fdo#64977 Adding Tibetan Language Support"



Comment 7 Elie Roux 2013-06-08 07:32:27 UTC
Created attachment 80507 [details]
corrected bo_CN.xml file

This is the corrected bo_CN.xml file (see comments)
Comment 8 Elie Roux 2013-06-08 07:33:15 UTC
Created attachment 80508 [details]
corrected bo_IN.xml file
Comment 9 Elie Roux 2013-06-08 07:34:17 UTC
Created attachment 80509 [details]
bo_charset.txt file (same as dzongkha)
Comment 10 Elie Roux 2013-06-08 07:38:24 UTC
Dear all,

I just saw that the patch was accepted and then reverted. Re-reading it, I realize there were several mistakes in it:
 - in the locale files, some of the CURRENCY formats were wrong
 - in the locale files, the weekdays were the same as in the Dzongkha calendar, whereas they should be shifted by one
 - the patch for numberchar.h was stupid

So I have attached the corrected files. Should I submit them to gerrit? Also, as the bo_charset.txt file is a copy of dz_charset, is there a way to just make the locale files point to dz_charset?

Thank you,
-- 
Elie
Comment 11 Miklos Vajna 2013-06-10 07:33:36 UTC
Do I understand correctly that you submitted the new version to gerrit as <https://gerrit.libreoffice.org/#/c/4197/>?
Comment 12 Elie Roux 2013-06-10 07:45:14 UTC
Absolutely, I'm sorry I didn't make a link from here!

Thank you,
Comment 13 Michael Meeks 2013-06-10 09:14:13 UTC
one for Eike to track I think :-)
Comment 14 Eike Rathke 2013-06-11 10:22:41 UTC
The numbers under oovbaapi/ are a very limited subset used only by the VBA compatibility API. The actual MS-LangIDs used by LibreOffice are in include/i18nlangtag/lang.h and mappings to ISO codes in i18nlangtag/source/isolang/isolang.cxx

See also https://wiki.documentfoundation.org/LibreOffice_Localization_Guide/Adding_a_New_Language_or_Locale

The bo 2121 in l10ntools/source/ulfconv/msi-encodinglist.txt seems to be wrong and probably should be 1105 instead for bo-CN, for bo-BT it could be 2129. 0x0849 (2121) actually is not related to Tibetan at all but assigned to Tamil Sri Lanka ta-LK.

However, there's some confusion in the MS assignments of "Tibetan" in their IDs, 0x0851 (2129) is used for Tibetan_Bhutan and Dzongkha, see https://issues.apache.org/ooo/show_bug.cgi?id=40713 and https://issues.apache.org/ooo/show_bug.cgi?id=53497 (unfortunately not reachable at the moment of this writing so I couldn't look up details).
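The decimal and hexadecimal values above can be checked by splitting each LangID into its two parts. This is a minimal sketch, assuming the usual Microsoft layout in which the low 10 bits hold the primary language and the upper 6 bits hold the sublanguage; the function name split_langid is made up for illustration and is not part of LibreOffice.

```python
def split_langid(langid):
    """Split a 16-bit MS LangID into (primary language, sublanguage).

    Assumes the usual layout: low 10 bits = primary language,
    upper 6 bits = sublanguage.
    """
    primary = langid & 0x3FF
    sub = langid >> 10
    return primary, sub


# Values taken from the discussion above.
for decimal in (1105, 2129, 2121):
    primary, sub = split_langid(decimal)
    print(f"{decimal} = 0x{decimal:04X}: primary 0x{primary:02X}, sublanguage {sub}")
```

Under that assumption 1105 and 2129 share the primary language 0x51 and differ only in the sublanguage, while 2121 has primary language 0x49, which fits the observation that it is not a Tibetan ID at all.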

To be able to support both bo-CN and bo-IN, we need to add a LangID for bo-IN.

I'll try to sort things out and prepare to be able to apply your patch.
Comment 15 Elie Roux 2013-06-11 12:50:20 UTC
Oh, thank you *very much* for the pointer to the documentation, that's what I was missing!

There is an error in the doc:

i18nlangtag/inc/i18nlangtag/lang.h

should be replaced by 

include/i18nlangtag/lang.h

Also, taking a look quickly, I wonder if Tibetan and Dzongkha should appear in MsLangId::needsSequenceChecking of mslangid.cxx?... what does it do exactly?

Apart from this, the only file that seems to need changes is langtab.src... but shouldn't Dzongkha be added to this file too?

Should I provide the patch for langtab.src?

Thank you!
Comment 16 Elie Roux 2013-06-11 14:53:20 UTC
Taking a closer look:

the only change langtab.src would need is to replace

        < "Tibetan (PR China)" ; LANGUAGE_TIBETAN ; > ;

by

        < "Tibetan" ; LANGUAGE_TIBETAN ; > ;

as Tibetan is also spoken in India (in several regions including Sikkim, Ladakh and Zanskar), Nepal, Bhutan, etc.

Also, why isn't Tibetan in the list of Complex scripts in Options->Language Settings->Languages? How is this list generated?
Comment 17 Caolán McNamara 2013-06-11 15:34:52 UTC
"Also, taking a look quickly, I wonder if Tibetan and Dzongkha should appear in MsLangId::needsSequenceChecking of mslangid.cxx?... what does it do exactly?"

Probably not. Sequence checking is where "invalid" character combinations are rejected when the user inputs them. Thai and Khmer are the classic cases. Best to default to avoiding marking a language as needing sequence checking unless there's a strong reason otherwise.
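To illustrate what "rejecting invalid character combinations" can mean, here is a deliberately simplified toy checker for Thai. It is only a sketch under one assumed rule (a dependent vowel sign must directly follow a consonant); it is not the rule set LibreOffice applies, and the name accepts is hypothetical.

```python
# Thai consonants KO KAI .. HO NOKHUK (U+0E01 .. U+0E2E).
CONSONANTS = range(0x0E01, 0x0E2F)
# Dependent vowel signs written above or below a consonant:
# MAI HAN-AKAT (U+0E31) and SARA I .. PHINTHU (U+0E34 .. U+0E3A).
DEPENDENT_VOWELS = {0x0E31} | set(range(0x0E34, 0x0E3B))


def accepts(text):
    """Toy rule: every dependent vowel must directly follow a consonant."""
    prev = None
    for ch in text:
        cp = ord(ch)
        if cp in DEPENDENT_VOWELS and (prev is None or prev not in CONSONANTS):
            return False
        prev = cp
    return True


print(accepts("\u0e01\u0e34"))        # consonant + vowel sign
print(accepts("\u0e34\u0e01"))        # vowel sign with no base consonant
print(accepts("\u0e01\u0e34\u0e34"))  # two vowel signs stacked on one consonant
```

A language flagged as needing sequence checking would have input filtered by rules of this kind, which is why the flag is best left off unless the script really calls for it.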
Comment 18 Eike Rathke 2013-06-11 17:45:40 UTC
(In reply to comment #15)
> There is an error in the doc:
> 
> i18nlangtag/inc/i18nlangtag/lang.h
> 
> should be replaced by 
> 
> include/i18nlangtag/lang.h

Already corrected, thanks to Andras :-)  The header files were recently moved.


> Apart from this, the only file that seem to need changes is langtab.src...
> but shouldn't Dzongkha be added to this file too?

It is there, line 216:

        < "Dzongkha" ; LANGUAGE_DZONGKHA ; > ;


(In reply to comment #16)
> the only change langtab.src would need is to replace
> 
>         < "Tibetan (PR China)" ; LANGUAGE_TIBETAN ; > ;
> 
> by
> 
>         < "Tibetan" ; LANGUAGE_TIBETAN ; > ;

No, that entry needs to stay as is, but an additional entry will be needed

        < "Tibetan (India)" ; LANGUAGE_TIBETAN_INDIA ; > ;

The LANGUAGE_TIBETAN_INDIA constant needs to be added to lang.h first, along with the ISO code mapping. That is what I referred to earlier as needing to be sorted out, and I'll do it.
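The relationship between the three pieces (the constant, the ISO mapping and the list box entry) can be sketched roughly as follows. This is an illustration only: the real definitions live in C++ and .src files, the value used for LANGUAGE_TIBETAN_INDIA is a made-up placeholder, and the table names are hypothetical.

```python
# LangID constants (lang.h in the real code base).
LANGUAGE_TIBETAN = 0x0451        # 1105, the value discussed in this bug
LANGUAGE_TIBETAN_INDIA = 0xFFFF  # PLACEHOLDER, not the real value

# LangID -> ISO language tag (isolang.cxx in the real code base).
ISO_MAPPING = {
    LANGUAGE_TIBETAN: "bo-CN",
    LANGUAGE_TIBETAN_INDIA: "bo-IN",
}

# Display name -> LangID (langtab.src in the real code base).
LANGUAGE_TABLE = [
    ("Tibetan (PR China)", LANGUAGE_TIBETAN),
    ("Tibetan (India)", LANGUAGE_TIBETAN_INDIA),
]


def listbox_entries():
    """Pair each display name with its ISO tag; fails if a mapping is missing."""
    return [(name, ISO_MAPPING[langid]) for name, langid in LANGUAGE_TABLE]


print(listbox_entries())
```

The lookup raises a KeyError when a list entry has no mapping, which mirrors the ordering described above: the constant and its ISO mapping have to exist before the list entry can be used.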


> Also, why isn't Tibetan in the list of Complex scripts in Options->Language
> Settings->Languages?

The languages listed there for the default document language list boxes appear only if fully supported, i.e. locale data exists. Currently you can see Tibetan only in the character attribution dialog, e.g. in Writer Format->Character->Font CTL Font list.

> How is this list generated?

Classification is obtained from MsLangId::getScriptType() in i18nlangtag/source/isolang/mslangid.cxx
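Put together, the two conditions (script classification and available locale data) amount to a simple filter. The sketch below is a hypothetical model of that behaviour and not the actual implementation; the Language record, the script type strings and the function name are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Language:
    tag: str               # ISO language tag, e.g. "bo-CN"
    script_type: str       # "latin", "asian" or "complex" (assumed labels)
    has_locale_data: bool  # True once a locale data file exists


def ctl_default_choices(languages):
    """Languages offered in the CTL default document language list box."""
    return [
        lang.tag
        for lang in languages
        if lang.script_type == "complex" and lang.has_locale_data
    ]


before = [Language("dz-BT", "complex", True), Language("bo-CN", "complex", False)]
after = [Language("dz-BT", "complex", True), Language("bo-CN", "complex", True)]

print(ctl_default_choices(before))  # Tibetan missing: no locale data yet
print(ctl_default_choices(after))   # Tibetan listed once locale data exists
```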
Comment 19 Eike Rathke 2013-06-12 09:43:55 UTC
http://cgit.freedesktop.org/libreoffice/core/commit/?id=ad3105a2933aff80b8fd471d32c0846440a508c5

Adds the necessary LangID and mapping for bo-IN. The commit summary carries the wrong fdo# though (the number of the OOo issue mentioned above), so it didn't show up automatically in this bug. Don't worry ...
Comment 20 Commit Notification 2013-06-12 11:25:32 UTC
Elie Roux committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=c56f9b76693d0b7f43234afb58796338dcd52489

fdo#64977 Adding Tibetan Language Support


