Bug 103855 - Add a language code via an extension
Summary: Add a language code via an extension
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Linguistic
Version (earliest affected): 5.0.3.2 release
Hardware: All All
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard: target:5.3.0
Keywords:
Depends on:
Blocks: Extension-add-language
Reported: 2016-11-11 10:42 UTC by martin_hosken
Modified: 2019-04-15 12:29 UTC (History)

Description martin_hosken 2016-11-11 10:42:12 UTC
This bug would allow the addition of a new language code via an extension.

Due to the excellent work in i18nlangtag, there is very little that needs to be done. What remains is to associate a language with its script type and with a UI name, for use in the language setting in character formatting.

The planned approach is to add a new registry structure under VCL, since the language list is managed in svtools. The structure allows the specification of a new language tag together with its script type, name and anything else needed at the level of a language tag.
Comment 1 martin_hosken 2016-11-16 15:36:03 UTC
An implementation patch is available for review https://gerrit.libreoffice.org/#/c/30882/

There is no need to support sequence checking, since sequence checking has to be built into code inside LibreOffice and so will require a rebuild anyway. In addition, sequence checking needs are very rare. Likewise, there are unlikely to be many CJK-type scripts, although there may be a few.

An example .oxt that makes use of the new capability is available from: http://scriptsource.org/cms/scripts/page.php?item_id=entry_detail&uid=ccq6hrjrxc

Installing this will add a Shan entry in the language list for complex script fonts in things like the character format dialog.

The set of public values introduced for script type could grow but would never need to shrink. Notice how the RTL value implies another value (CTL); any future additions (none are intended) would be numbers that similarly imply other values.
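The "RTL implies CTL" relationship can be sketched as follows. This is only an illustration: the enum and helper names are hypothetical, and the numeric values other than 3 = CTL (stated later in this report) are assumptions that should be checked against the ExtraLanguage description in VCL.xcs.

```python
from enum import IntEnum


class ScriptType(IntEnum):
    # Only CTL = 3 is confirmed in this report; the other numbers are
    # assumptions and must be checked against VCL.xcs.
    WESTERN = 1
    CJK = 2
    CTL = 3
    RTL = 4  # implies CTL


def is_ctl(script_type: int) -> bool:
    """A language is complex-text-layout if it is CTL or RTL."""
    return script_type in (ScriptType.CTL, ScriptType.RTL)


def is_rtl(script_type: int) -> bool:
    """Only the RTL value marks a right-to-left language."""
    return script_type == ScriptType.RTL
```

A caller that only cares about the western/asian/CTL split would use `is_ctl`, so a later, more specific value that implies CTL only needs to be added in one place.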
Comment 2 martin_hosken 2016-11-17 09:48:06 UTC
I've been asked why the patch changes a core file like languagetag.hxx.

Much of the core application code handles language via LanguageType values, which are 16-bit ids modelled after MS LangIDs. The particular query I need to address is the ScriptType of a given LanguageType. The ScriptType tells us what category of script is used for the language. Primarily this is for the Western vs. CTL vs. Asian split that controls font and language selection based on the characters in the text, but it is also used for RTL and CJK querying in other areas.

One option would be to query the language tag and ask for its script. The problem here is that most language tags and their users do not specify the script in the tag. There is a big ambiguous hole in the language tag standard that says: if you do not have a suppress-script for your language then you may use a script subtag, but you SHOULD NOT include the script if it adds no extra distinguishing information to the tag. Therefore, for most languages in the world, there is no need or desire to specify the script. But that means that for languages with no suppress-script information, we do not know what the default script is for the language. If we were to specify that in some way to the language tag, the language tag would still not know whether or not to include the script in the tag when it canonicalises it. And so I decided not to take this particular implementation down that rather windy rabbit warren.
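To make the gap concrete, here is a simplified sketch that looks for a script subtag in a tag. It assumes only the common language[-extlang][-Script][-REGION] shape of a BCP 47 tag; the helper name `script_subtag` is hypothetical and is not part of LanguageTag.

```python
from typing import Optional


def script_subtag(tag: str) -> Optional[str]:
    """Return the 4-letter script subtag of a simple BCP 47 tag, or None."""
    subtags = tag.split("-")
    # Skip the primary language subtag. Stop at the first singleton
    # (extension or private use), since nothing after it is a script.
    for sub in subtags[1:]:
        if len(sub) == 1:
            break
        # A 4-letter alphabetic subtag is a script; 4-character variants
        # must start with a digit, so they are not matched here.
        if len(sub) == 4 and sub.isascii() and sub.isalpha():
            return sub.title()
    return None
```

For a tag such as "nod-TH" or "shn" this returns None, so the tag alone gives no answer about the script type, whereas "sr-Cyrl-RS" yields "Cyrl".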

LanguageTag has a nice internal map of LanguageType to LanguageTagImpl, and we can store the ScriptType in the LanguageTagImpl, which means it is more stable than in a LanguageTag that can get remade many times. But the only way to get to the LanguageTagImpl interface and the map is via the LanguageTag interface. Hence the changes I have had to make to that interface. Unfortunately, this interface is referenced pretty much everywhere and the result is a large build churn. The hope is that we can minimise any re-changes to this interface, which is considered mature and we don't want to demature it.
Comment 3 Commit Notification 2016-11-17 17:19:06 UTC
Martin Hosken committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=6b35e804198ac45386805e80a3d413ed3405c3b4

Fix tdf#103855 add language codes and names to language lists from extensions

It will be available in 5.3.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 4 Stephan Bergmann 2016-11-21 08:46:34 UTC
Martin, if the commit from comment 3 is all that was needed for this issue, please mark as RESOLVED - FIXED.
Comment 5 martin_hosken 2016-11-21 15:02:39 UTC
This issue is fixed in LibreOffice 5.3.
Comment 6 Eike Rathke 2019-04-08 13:39:11 UTC
@Martin:
The download of the sample from scriptsource.org appears to be broken in the meantime. Could you please add documentation here on how to actually use the new ExtraLanguage Name and ScriptType properties in a dictionary? Thanks.
Comment 7 Eike Rathke 2019-04-09 11:48:02 UTC
Presumably this is how it's supposed to work:

In dictionaries.xcu have

 <node oor:name="ServiceManager">
   <node oor:name="Dictionaries">
     ...
     <prop oor:name="Locales" oor:type="oor:string-list">
       <value>nod-TH</value>
     </prop>
   </node>
   <node oor:name="ExtraLanguages">
     <node oor:name="nod-TH" oor:op="replace">
       <prop oor:name="Name">
         <value xml:lang="en-US">Default US English UI name</value>
         <value xml:lang="...">Your favorite translation here...</value>
       </prop>
       <prop oor:name="ScriptType">
         <value>3</value>
       </prop>
     </node>
   </node>
 </node>

Where "nod-TH" is a BCP 47 language tag for the dictionary's
language/locale to be added to the Language list in dialogs, represented
in the UI by the "Name" defined.

ScriptType value 3 here means CTL. The values are explained in
officecfg/registry/schema/org/openoffice/VCL.xcs under
<group oor:name="ExtraLanguage">
See https://opengrok.libreoffice.org/xref/core/officecfg/registry/schema/org/openoffice/VCL.xcs?r=33e80611#79
Comment 8 Eike Rathke 2019-04-10 13:37:52 UTC
The example has an error in that it adds the node under
  <item oor:path="/org.openoffice.Office.Linguistic">
    <node oor:name="ServiceManager">
which is wrong, instead it should be a separate item with a different path,
  <item oor:path="/org.openoffice.VCL">
    <node oor:name="ExtraLanguages">
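A quick way to catch this mistake in an extension's .xcu file might look like the sketch below. It assumes the file uses the item-based layout with an `oor:items` root element and unprefixed `item`/`node` children; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

OOR = "http://openoffice.org/2001/registry"
OOR_NAME = "{%s}name" % OOR
OOR_PATH = "{%s}path" % OOR


def find_extra_languages(xcu_text):
    """Return (item path, is direct child of item) for each ExtraLanguages node."""
    root = ET.fromstring(xcu_text)
    found = []
    for item in root.iter("item"):
        direct = {id(n) for n in item.findall("node")}
        for node in item.iter("node"):
            if node.get(OOR_NAME) == "ExtraLanguages":
                found.append((item.get(OOR_PATH), id(node) in direct))
    return found


def is_correctly_placed(xcu_text):
    """True if every ExtraLanguages node sits directly under /org.openoffice.VCL."""
    found = find_extra_languages(xcu_text)
    return bool(found) and all(
        path == "/org.openoffice.VCL" and direct for path, direct in found
    )
```

Run against the earlier example, `find_extra_languages` reports the node under /org.openoffice.Office.Linguistic and not as a direct child of the item, so `is_correctly_placed` returns False.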

According to https://lists.freedesktop.org/archives/libreoffice/2019-April/082441.html (see full version of a dictionaries.xcu there) a working example is

  <item oor:path="/org.openoffice.VCL">
    <node oor:name="ExtraLanguages">
      <node oor:name="nod-TH" oor:op="fuse">
        <prop oor:name="Name" oor:type="xs:string">
          <value>Northern Thai</value>
        </prop>
        <prop oor:name="ScriptType" oor:type="xs:int">
          <value>3</value>
        </prop>
      </node>
    </node>
  </item>

With the localization/translation capabilities added, that might then be

  <item oor:path="/org.openoffice.VCL">
    <node oor:name="ExtraLanguages">
      <node oor:name="nod-TH" oor:op="fuse">
        <prop oor:name="Name" oor:type="xs:string">
          <value xml:lang="en-US">Northern Thai</value>
          <value xml:lang="...">Your favorite translation here...</value>
        </prop>
        <prop oor:name="ScriptType" oor:type="xs:int">
          <value>3</value>
        </prop>
      </node>
    </node>
  </item>
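
For a later README, an item of this shape could also be generated rather than written by hand. The following is a minimal sketch using Python's standard xml.etree.ElementTree; the function name `extra_language_item` and the helper `oor` are hypothetical. The surrounding file still has to declare the xs prefix that the oor:type values refer to.

```python
import xml.etree.ElementTree as ET

OOR = "http://openoffice.org/2001/registry"
XML = "http://www.w3.org/XML/1998/namespace"
ET.register_namespace("oor", OOR)  # serialize with the usual oor: prefix


def oor(attr):
    """Qualified attribute name in the oor namespace."""
    return "{%s}%s" % (OOR, attr)


def extra_language_item(tag, names, script_type):
    """Build an <item> adding `tag` to ExtraLanguages.

    names maps a UI locale (e.g. "en-US") to the displayed language name.
    """
    item = ET.Element("item", {oor("path"): "/org.openoffice.VCL"})
    langs = ET.SubElement(item, "node", {oor("name"): "ExtraLanguages"})
    lang = ET.SubElement(langs, "node", {oor("name"): tag, oor("op"): "fuse"})
    name = ET.SubElement(
        lang, "prop", {oor("name"): "Name", oor("type"): "xs:string"}
    )
    for ui_locale, text in names.items():
        value = ET.SubElement(name, "value", {"{%s}lang" % XML: ui_locale})
        value.text = text
    stype = ET.SubElement(
        lang, "prop", {oor("name"): "ScriptType", oor("type"): "xs:int"}
    )
    ET.SubElement(stype, "value").text = str(script_type)
    return item
```

Calling `ET.tostring(extra_language_item("nod-TH", {"en-US": "Northern Thai"}, 3), encoding="unicode")` gives the serialized item, ready to be placed inside the extension's .xcu file.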

I'm not quite sure about the oor:op="fuse" vs oor:op="replace" attribute, i.e. what would actually happen if there are multiple entries for one name.
Comment 9 Stephan Bergmann 2019-04-10 13:48:26 UTC
(In reply to Eike Rathke from comment #8)
> I'm not quite sure about the oor:op="fuse" vs oor:op="replace" attribute,
> i.e. what would actually happen if there are multiple entries for one name.

"replace" just puts the new node in place, completely ignoring any previously existing one, if any, while "fuse" merges with the content of any previously existing node, if any.  But since this node of type ExtraLanguage is a non-extensible group with exactly two member props ("Name" and "ScriptType"), both of which are explicitly set here, there's effectively no difference in "replace" vs. "fuse" in this case.  (And if multiple extensions bring along an "ExtraLanguages" set element named "nod-TH", a random one wins.)
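
As a toy model of the distinction (plain dictionaries standing in for a group node's props; this is not how the configuration manager is implemented, and `apply_op` is a hypothetical name):

```python
def apply_op(existing, new, op):
    """Model oor:op on a flat group node represented as a dict of props."""
    if op == "replace" or existing is None:
        # The new node is put in place, ignoring any previous content.
        return dict(new)
    if op == "fuse":
        # The new props are merged over whatever was there before.
        merged = dict(existing)
        merged.update(new)
        return merged
    raise ValueError("unsupported op: %r" % op)
```

When the new node sets both "Name" and "ScriptType", both operations produce the same result. They differ only if the new node leaves a prop out, in which case "fuse" keeps the old value and "replace" drops it.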
Comment 10 Eike Rathke 2019-04-15 12:29:24 UTC
(Just collecting info and material here for a later README or such)
oxttools, Tools for creating language support oxt extensions for LibreOffice
https://github.com/silnrsi/oxttools