Bug 128200 - Wrong translations of formula function names and identifiers
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Localization
Version (earliest affected): unspecified
Hardware: All All
Importance: high major
Assignee: Not Assigned
Reported: 2019-10-17 10:44 UTC by Eike Rathke
Modified: 2020-05-05 18:14 UTC
CC List: 4 users

Attachments
l10n-wrong-function-names-1.txt function names (45.38 KB, text/plain)
2019-10-17 10:44 UTC, Eike Rathke
l10n-wrong-function-names-2.txt identifiers (19.46 KB, text/plain)
2019-10-17 10:45 UTC, Eike Rathke
l10n-wrong-function-names-1.txt function names (44.89 KB, text/plain)
2020-03-03 22:25 UTC, Eike Rathke
l10n-wrong-function-names-2.txt identifiers (18.88 KB, text/plain)
2020-03-03 22:27 UTC, Eike Rathke
SQL function names (44.11 KB, text/plain)
2020-04-27 21:24 UTC, Lionel Elie Mamane

Description Eike Rathke 2019-10-17 10:44:25 UTC
Created attachment 155069 [details]
l10n-wrong-function-names-1.txt function names

The formula module's translations are in a messy state in general.
Specifically, for names in
msgctxt "RID_STRLIST_FUNCTION_NAMES":

* Translated function names MUST NOT contain spaces, parentheses,
  hyphens, or anything else that would be an operator in
  spreadsheet formula context. Function names may only contain
  letters (of any alphabet or script of course, not just ASCII) or
  digits or '.' dot or '_' underscore, and must start with
  a letter.

  * Attached is l10n-wrong-function-names-1.txt that hopefully
    caught all, created by the command line

    grep -E -A1 'msgid "[A-Z0-9._]+"' translations/source/*/formula/messages.po|grep -B1 'msgstr ".*[-+*/ ()~]'

    (all on one line) also repeated on top of the attached list.
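
The naming rule above can be sketched as a small Python check. This is only a strict reading of the rule as stated here, not Calc's actual tokenizer; the function name `is_valid_function_name` is made up for illustration, and "letter" and "digit" are approximated with Python's `str.isalpha()` and `str.isdecimal()`, which may reject combining marks that Calc itself would accept.

```python
def is_valid_function_name(name: str) -> bool:
    """Strict reading of the rule: only letters, digits, '.' or '_',
    and the first character must be a letter.

    Assumption: 'letter' means str.isalpha() and 'digit' means
    str.isdecimal(); Calc's real compiler may be more permissive.
    """
    if not name or not name[0].isalpha():
        return False
    return all(ch.isalpha() or ch.isdecimal() or ch in "._" for ch in name)
```

With this sketch, a name such as "ERROR.TYPE" passes, while "SUM IF", "SUM-IF" or "2SUM" are rejected.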


* Identifiers that start with a '#' hash character, like "#All" or
  "#Headers", MUST also start with a '#' character in the
  translation. This is vital to recognize them as table reference
  identifiers (or possibly error constants).

  * Attached is l10n-wrong-function-names-2.txt that hopefully
    caught all, created by the command line

    grep -E -A1 'msgid "#' translations/source/*/formula/messages.po | grep -B1 'msgstr "[^#"]'

    (all on one line) also repeated on top of the attached list.
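
For those who prefer a script over grep, the same scan could be approximated in Python. This is a rough sketch that assumes each msgid and msgstr fits on a single line with no escaped quotes; the helper name `find_missing_hash` and the sample entries (including the translations) are invented for illustration and are not taken from the attached lists.

```python
import re

# A msgid starting with '#' directly followed by a non-empty msgstr
# that does not start with '#'. Untranslated entries (msgstr "") are
# skipped, as in the grep above.
PAIR = re.compile(r'^msgid "(#[^"]*)"\nmsgstr "([^#"][^"]*)"$', re.MULTILINE)


def find_missing_hash(po_text: str) -> list[tuple[str, str]]:
    """Return (msgid, msgstr) pairs where the translation lost the '#'."""
    return PAIR.findall(po_text)


# Hypothetical sample entries, only to show the expected input shape.
SAMPLE = '''msgctxt "RID_STRLIST_FUNCTION_NAMES"
msgid "#All"
msgstr "#Alle"

msgctxt "RID_STRLIST_FUNCTION_NAMES"
msgid "#Headers"
msgstr "Kopfzeilen"

msgctxt "RID_STRLIST_FUNCTION_NAMES"
msgid "#Data"
msgstr ""
'''

if __name__ == "__main__":
    for msgid, msgstr in find_missing_hash(SAMPLE):
        print(f"{msgid!r} -> {msgstr!r}")
```

On the sample, only the "#Headers" entry is reported.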


While these wrong translations display correctly when loading
a document, any attempt to compile a formula expression containing a
name that does not meet these criteria will fail.

PLEASE, check the attached lists for your language's translation (the
list is sorted by language codes between translations/source/ and
/formula/messages.po) and correct errors ASAP; corrections should also
end up in the 6-3 branch, Cloph probably can help with that.

Also, (possibly after any change) please load the 6.2 attachment
https://bugs.documentfoundation.org/attachment.cgi?id=150855 of
https://bugs.documentfoundation.org/show_bug.cgi?id=93992 in your
translated UI, to check for duplicated function names.

Thanks
Comment 1 Eike Rathke 2019-10-17 10:45:09 UTC
Created attachment 155070 [details]
l10n-wrong-function-names-2.txt identifiers
Comment 2 Xisco Faulí 2019-10-17 13:41:09 UTC
Hi Eike,
is it something you're working on? Otherwise, can we turn it into an easyhack?
Comment 3 Eike Rathke 2019-10-17 17:16:55 UTC
It's a translation-only problem, nothing that can be worked on otherwise.
Translators will have to fix the malformed translations.
Comment 4 Eike Rathke 2020-03-03 22:25:40 UTC
Created attachment 158357 [details]
l10n-wrong-function-names-1.txt function names

Updated list from current master, still 319 bad translations.
Comment 5 Eike Rathke 2020-03-03 22:27:11 UTC
Created attachment 158358 [details]
l10n-wrong-function-names-2.txt identifiers

Updated list from current master, still 150 bad translations.
Comment 6 Lionel Elie Mamane 2020-04-27 21:24:35 UTC
Created attachment 160011 [details]
SQL function names

SQL function names (msgctxt "RID_RSC_SQL_INTERNATIONAL") seem to suffer from the same issue, see e.g. bug 103736.

I generated this list with (in module translations; in one line)

git grep -A2 -F 'RID_RSC_SQL_INTERNATIONAL' |
egrep -B2 'msgstr ".*[^[:alnum:]_].*"'

This assumes [:alnum:] does the right thing in all languages / character sets.
Comment 7 Eike Rathke 2020-04-28 14:08:48 UTC
Using the predefined character classes [:alpha:] and [:alnum:] may even be better, since it catches things not thought of. It does depend on the current locale, though, and delivers false positives in a C or non-UTF-8 locale.

On the other hand, Calc allows some constructs like mid letter, spacing marks and modifier letters, so I'll have to rethink the Calc grep.
Comment 8 Christian Lohmaier 2020-04-29 16:50:46 UTC
FYI: added a check to weblate for the calc formula issues, using the matches erAck proposed:

in formula/messages.po for strings with context of RID_STRLIST_FUNCTION_NAMES:

Throw error if translation matches the regex [-+*/ ()~]
and throw error if one of source or translation starts with # but the other one doesn't.

I used a bulk change in weblate to flag all the offending strings as fuzzy, so while they'll still be present in the po file, they won't actually be used and are treated as untranslated.

Please keep in mind that weblate is python based, so I'd need python re compatible expressions.
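
The two rules described in this comment could be sketched in plain Python roughly as follows. This is not weblate's check API; the function name `check_function_name_translation` and the error texts are made up, and only the regex and the '#' rule come from the comment itself.

```python
import re

# Character class taken from the comment above.
FORBIDDEN = re.compile(r"[-+*/ ()~]")


def check_function_name_translation(source: str, translation: str) -> list[str]:
    """Return a list of problems; an empty list means both rules pass.

    Hypothetical stand-alone helper, not a weblate plugin.
    """
    errors = []
    if FORBIDDEN.search(translation):
        errors.append("translation contains an operator, space or parenthesis")
    # Error if exactly one of source and translation starts with '#'.
    if source.startswith("#") != translation.startswith("#"):
        errors.append("'#' prefix differs between source and translation")
    return errors
```

A pair such as ("SUMIF", "SUMME WENN") would trigger the first error, and ("#All", "Alle") the second.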
Comment 9 Eike Rathke 2020-04-30 12:31:52 UTC
Found out that translators had even more creative ideas, and this catches more bad cases:
grep -E 'msgstr "(([^[:alpha:]].*)|(.*[-+*/&% ()~<=>!].*))"'

However, Python's re module apparently doesn't know the predefined [:alpha:] class, so this could do as well:
'msgstr \"(([0-9_.].*)|(.*[-+*/&% ()~<=>!].*))\"'

If *only* the function name is to be matched, not the msgstr ..., then this is sufficient:
(^[0-9_.])|(.*[-+*/&% ()~<=>!])
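
To see what the extended expression adds over the earlier [-+*/ ()~] class, here is a small Python comparison. It is only a sketch: the helper `newly_caught` and the example names are invented for illustration, and both patterns are applied to the bare function name, not to the whole msgstr line.

```python
import re

OLD = re.compile(r"[-+*/ ()~]")
NEW = re.compile(r"(^[0-9_.])|(.*[-+*/&% ()~<=>!])")


def newly_caught(names: list[str]) -> list[str]:
    """Names flagged by the extended pattern but missed by the old one."""
    return [n for n in names if NEW.search(n) and not OLD.search(n)]


if __name__ == "__main__":
    # Made-up names, not taken from any real translation.
    print(newly_caught(["SUMMEWENN", "SUM IF", "A&B", "2PI", "X<=Y"]))
```

In this sample, "SUM IF" is already caught by the old class, while "A&B", "2PI" and "X<=Y" are only caught by the extended pattern.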
Comment 10 Christian Lohmaier 2020-05-05 18:14:58 UTC
(In reply to Lionel Elie Mamane from comment #6)
> 
> git grep -A2 -F 'RID_RSC_SQL_INTERNATIONAL' |
> egrep -B2 'msgstr ".*[^[:alnum:]_].*"'
> 
> This assumes [:alnum:] does the right thing in all languages / character
> sets.

Hmm. Please specify what "the right thing" is. I was assuming it should accept digits and letters from all scripts, but the list also rejects combined characters, e.g. the গ্য that is a combination of গ + য (according to http://www.wbsed.gov.in/feedback/bengali_help.html, I have no idea about writing in that script :-))

Rejecting combinations and advanced characters matches python's behaviour with regard to the \w character class, so I added a check for strings with context RID_RSC_SQL_INTERNATIONAL that complains unless the translation matches ^[\w\d]+$
(unicode word and digit characters, \w includes underscore)
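
A quick way to reproduce this behaviour outside weblate is the following Python sketch. The helper names `is_valid_sql_name` and `rejected_chars` are made up for illustration; the Bengali example is written with escapes and is assumed to be the sequence U+0997, U+09CD (virama), U+09AF.

```python
import re
import unicodedata

# Expression from the comment above.
SQL_NAME = re.compile(r"^[\w\d]+$")


def is_valid_sql_name(translation: str) -> bool:
    """True if the translation consists only of \\w / \\d characters."""
    return SQL_NAME.match(translation) is not None


def rejected_chars(translation: str) -> list[tuple[str, str]]:
    """Characters that \\w does not match, with their Unicode category."""
    return [
        (ch, unicodedata.category(ch))
        for ch in translation
        if not re.fullmatch(r"\w", ch)
    ]


if __name__ == "__main__":
    conjunct = "\u0997\u09cd\u09af"  # the combination quoted above
    print(is_valid_sql_name(conjunct))
    print(rejected_chars(conjunct))
```

The two base letters on their own pass; it is the combining virama (category Mn) between them that \w does not match.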