Bug 109258 - Tooling allows (and thus code has) duplicate string entries for translation
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: framework
Version (earliest affected): 5.4.0.1 rc
Hardware: All Mac OS X (All)
Importance: high major
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Duplicates: 111878
Depends on:
Blocks: Spell-Checking
Reported: 2017-07-21 14:22 UTC by Martin Srebotnjak
Modified: 2019-03-21 09:53 UTC (History)
CC List: 5 users

See Also:
Crash report or crash signature:


Attachments
Screenshot for FR spellchecker in 5403 (39.63 KB, image/png)
2017-07-25 08:48 UTC, Alex Thurgood
Screenshot localized character dialog 5403 (56.85 KB, image/png)
2017-07-25 08:53 UTC, Alex Thurgood
Unlocalized list of languages (156.40 KB, image/png)
2017-07-25 12:50 UTC, Martin Srebotnjak
Updated Slovenian misc.po file (13.00 KB, application/zip)
2017-07-28 13:17 UTC, Martin Srebotnjak
Duplicated po strings in LO code (3.28 KB, text/plain)
2017-08-19 08:12 UTC, Martin Srebotnjak
List of duplicate localization strings in 6.1/master (as of 2018-04-04) (4.32 KB, text/plain)
2018-04-05 06:53 UTC, Martin Srebotnjak
List of duplicate localization strings in 6.1/master (as of 2018-04-23) (1.71 KB, text/plain)
2018-04-23 16:50 UTC, Martin Srebotnjak

Description Martin Srebotnjak 2017-07-21 14:22:02 UTC
Description:
The list of languages for selecting the spell-checking language is shown in English.
This applies both to the Character dialog and to the list in the status bar (where that language can also be set).

Steps to Reproduce:
1. Open localized LO.
2. Change spell-checking language via Character dialog or status bar.

Actual Results:  
Listed languages are in English.

Expected Results:
Should be a localized list of languages.


Reproducible: Always

User Profile Reset: No

Additional Info:
I think this is a critical bug for all localized builds, at least a major bug project-wise, should be fixed before the release!


User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:54.0) Gecko/20100101 Firefox/54.0
Comment 1 Alex Thurgood 2017-07-21 15:01:17 UTC
Looks like a DUP of bug 109090
Comment 2 Martin Srebotnjak 2017-07-21 15:56:04 UTC
This is not about the selected language.
Creating a new document with this version of Slovenian LO opens a new document with default language set as Slovenian.
The problem is that the language is listed as "Slovenian" and not as "slovenski", and the whole list of spell-checking languages is populated with English strings instead of localized language names ... That is the problem.
Comment 3 Alex Thurgood 2017-07-25 08:45:20 UTC
No repro with FR langpack and

Version: 5.4.0.3
Build ID: 92c2794a7c181ba4c1c5053618179937228ed1fb
Threads CPU : 4; OS : Mac OS X 10.12.6; UI Render : par défaut; 
Locale : fr-FR (fr_FR.UTF-8); Calc: group
Comment 4 Alex Thurgood 2017-07-25 08:46:05 UTC
In the spellchecker, the names of the various dictionary languages are localized in French as they should be.
Comment 5 Alex Thurgood 2017-07-25 08:48:53 UTC
Created attachment 134831 [details]
Screenshot for FR spellchecker in 5403

See screenshot showing localized lang strings for spellchecker
Comment 6 Alex Thurgood 2017-07-25 08:53:00 UTC
Created attachment 134832 [details]
Screenshot localized character dialog 5403

The character dialog also displays the correct localization for the installed lang-pack.
Comment 7 Alex Thurgood 2017-07-25 08:54:24 UTC
Setting to WFM

@Miles : if this problem is still present for you in the latest 540 test release, please indicate which specific lang-packs are affected.
Comment 8 Martin Srebotnjak 2017-07-25 12:50:10 UTC
Created attachment 134835 [details]
Unlocalized list of languages
Comment 9 Martin Srebotnjak 2017-07-25 12:50:53 UTC
Affected is the sl language pack, maybe some others too, I cannot tell, I do not use them.
Comment 10 Martin Srebotnjak 2017-07-25 12:52:27 UTC
Not only that, as you can see in the screenshot, preceding the spelled-out language names in English are lang-codes of installed spell-checker dictionaries, so it is even more serious.
Comment 11 Martin Srebotnjak 2017-07-25 13:10:17 UTC
It is still present with yesterday's RC3, so please take care of this bug.
For me it is a stopper.
Comment 12 Alex Thurgood 2017-07-25 13:59:44 UTC
No repro either with Italian langpack

Versione: 5.4.0.3
Build ID: 92c2794a7c181ba4c1c5053618179937228ed1fb
Thread CPU: 4; SO: Mac OS X 10.12.6; Resa interfaccia: predefinito; 
Versione locale: it-IT (fr_FR.UTF-8); Calc: group
Comment 13 Alex Thurgood 2017-07-25 14:03:12 UTC
Confirming with Slovenian langpack only :

Različica: 5.4.0.3
ID gradnje: 92c2794a7c181ba4c1c5053618179937228ed1fb
Niti CPE: 4; Op. sistem: Mac OS X 10.12.6; Upodobitev vmesnika: privzeta; 
Področna nastavitev: sl-SI (fr_FR.UTF-8); Calc: group

Changing title to reflect this finding
Comment 14 Alex Thurgood 2017-07-25 14:08:27 UTC
(In reply to miles from comment #11)
> It is still present with yesterday's RC3, so please take care of this bug.
> For me it is a stopper.

I am not a bug fixer, merely a bug report triager volunteer.
The bug is confirmed.

Whether or not it gets fixed for 5.4 lies in someone else's hands, probably either the localisation team for Slovenian if the langpack is at fault, or the package builder if the langpack is missing some kind of required information to make it work.

Adding cloph to cc : any ideas what might be wrong here ?
Comment 15 Martin Srebotnjak 2017-07-25 20:02:26 UTC
OK, this reminds me of a bug from a few months ago.
Slovenian is not in Pootle, it is pulled directly from source that Andras adds from my po files.
Maybe some setting for OSX builds changed during last months?
I checked my installed 54beta2 and it already showed languages in English.
I do not remember this from beta1, but I cannot check that anymore.
Good news is Slovenian is 100 % localized from OpenOffice.org 2.4 onwards so this list is always fully or almost fully localized.
It must be a build thing.

Please do inspect this, ASAP.
Comment 16 Christian Lohmaier 2017-07-28 09:29:00 UTC
invalid/notourbug/worksforme.

Not a bug in the translation process/packaging, but merely a case of a project not using pootle and not updating against templates.

e.g.: svtools/source/misc.po of fr project:
#: langtab.src
msgctxt ""
"langtab.src\n"
"STR_ARR_SVT_LANGUAGE_TABLE\n"
"English (USA)\n"
"itemlist.text"
msgid "English (USA)"
msgstr "Anglais (U.S.A.)"

(above is correct) vs in sl:
#: langtab.src
msgctxt ""
"langtab.src\n"
"STR_ARR_SVT_LANGUAGE_TABLE\n"
"LANGUAGE_ENGLISH_US\n"
"pairedlist.text"
msgid "English (USA)"
msgstr "angleški (ZDA)" 


→ wrong/old msgctxt
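
Why a stale msgctxt leaves the string in English can be shown with a deliberately simplified model. The sketch below assumes that a translation is looked up by the exact (msgctxt, msgid) pair and that the English source text is used when no match is found; the `translate` helper and the two catalogs are hypothetical and only mirror the fr and sl entries quoted above, not the actual LibreOffice 5.4 build pipeline.

```python
def translate(catalog, msgctxt, msgid):
    """Return the translation stored under (msgctxt, msgid), else the source text."""
    return catalog.get((msgctxt, msgid), msgid)


# Context used by the current template (taken from the fr example above).
NEW_CTXT = "langtab.src\nSTR_ARR_SVT_LANGUAGE_TABLE\nEnglish (USA)\nitemlist.text"
# Context still present in the sl file (taken from the sl example above).
OLD_CTXT = "langtab.src\nSTR_ARR_SVT_LANGUAGE_TABLE\nLANGUAGE_ENGLISH_US\npairedlist.text"

fr_catalog = {(NEW_CTXT, "English (USA)"): "Anglais (U.S.A.)"}
sl_catalog = {(OLD_CTXT, "English (USA)"): "angleški (ZDA)"}

# The application asks for the string using the current context.
print(translate(fr_catalog, NEW_CTXT, "English (USA)"))  # Anglais (U.S.A.)
print(translate(sl_catalog, NEW_CTXT, "English (USA)"))  # English (USA)
```

In this model the Slovenian translation exists but is never reached, because it is filed under a key the current template no longer uses.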

Incomplete translation in a single language is not a stopper, there are couple of languages that don't have 100% coverage.
Furthermore the list of languages is no way a critical place.
Comment 17 Martin Srebotnjak 2017-07-28 13:16:38 UTC
Christian, it is a lie that Slovenian was not updated with latest pot files.
Please do choose your words carefully, I do not have time to sue you. I work for free and really do my job very precisely. As Andras can prove, I update my po files to fresh pot files much more regularly than Pootle, which in the end brings additional value to LO project as I can notice errors in strings or strange behaviour.

I did now find a problem reported by Poedit software in the misc.pot file that is the case here. I do not use Poedit for translation, but it reports an error given by msgmerge, so it is problematic with gettext library, which means there is either a bug in gettext library or there is real error in this pot file. So I investigated further, unlike you.

It reports an error with a doubled definition of a string, which should never happen in a pot or po file. Regarding July 6 5.4 pot file (that I have) I have then manually found the error *in LO 5.4 pot file*, not in my translation or my translation system.

Here is the duplicated entry:
#. EM5QV
#: langtab.src
msgctxt ""
"langtab.src\n"
"STR_ARR_SVT_LANGUAGE_TABLE\n"
"Default\n"
"itemlist.text"
msgid "Default"
msgstr ""

#. EM5QV
#: langtab.src
msgctxt ""
"langtab.src\n"
"STR_ARR_SVT_LANGUAGE_TABLE\n"
"Default\n"
"itemlist.text"
msgid "Default"
msgstr ""

In the July 6 file it happens in rows 594-613 of misc.pot file.
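
A duplicate of this kind can also be found without gettext installed. The following is a minimal sketch, not the project's tooling: the helper names `entry_keys` and `find_duplicates` are made up for illustration, the parser only understands the msgctxt/msgid/msgstr layout shown above, and the sample reuses the "Default" entry from this comment plus one further entry from the fr example in comment 16.

```python
import re
from collections import Counter

FIELD_NAMES = ("msgctxt", "msgid", "msgstr")


def entry_keys(po_text):
    """Return one (msgctxt, msgid) pair per entry, in file order.

    Escape sequences are left exactly as written in the file.
    """
    keys = []
    for block in re.split(r"\n\s*\n", po_text.strip()):
        fields, current = {}, None
        for line in (raw.strip() for raw in block.splitlines()):
            name = line.split(" ", 1)[0]
            if name in FIELD_NAMES:
                # Start of a field: keep the text between the outer quotes.
                current = name
                fields[name] = line[len(name):].strip()[1:-1]
            elif line.startswith('"') and current:
                # Continuation line of the field started above.
                fields[current] += line[1:-1]
        if "msgid" in fields:
            keys.append((fields.get("msgctxt"), fields["msgid"]))
    return keys


def find_duplicates(po_text):
    """Return {key: count} for every key that occurs more than once."""
    counts = Counter(entry_keys(po_text))
    return {key: n for key, n in counts.items() if n > 1}


SAMPLE = r'''
#. EM5QV
#: langtab.src
msgctxt ""
"langtab.src\n"
"STR_ARR_SVT_LANGUAGE_TABLE\n"
"Default\n"
"itemlist.text"
msgid "Default"
msgstr ""

#. EM5QV
#: langtab.src
msgctxt ""
"langtab.src\n"
"STR_ARR_SVT_LANGUAGE_TABLE\n"
"Default\n"
"itemlist.text"
msgid "Default"
msgstr ""

#: langtab.src
msgctxt ""
"langtab.src\n"
"STR_ARR_SVT_LANGUAGE_TABLE\n"
"English (USA)\n"
"itemlist.text"
msgid "English (USA)"
msgstr ""
'''

for (ctxt, msgid), count in find_duplicates(SAMPLE).items():
    print(f"{msgid!r} is defined {count} times")
```

The key is the pair of context and source string, so two entries with the same msgid but different msgctxt are not reported.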

So do clear your pot files of errors before this gets really serious also for Pootle users.

I have now allowed Poedit to "fix" (its own wording) the pot file and updated my old po file with the "fixed" pot file. I got 104 fully translated strings, 377 fuzzy strings (probably equal translations with changed tags) and 1 untranslated string ("Kituba" language). I have accepted all fuzzy translations and translated this new string and I am attaching the updated Slovenian po file that should go into the svtools/source folder.

So I guess my translation system found an error while using msgmerge in this file and skipped it. Which is not an error on my side, but an error on LO side.

I accept your apologies, cloph.
Comment 18 Martin Srebotnjak 2017-07-28 13:17:55 UTC
Created attachment 134938 [details]
Updated Slovenian misc.po file

Cleaned and updated Slovenian po file.
Slovenian is the only language of l10n teams in LibreOffice to have error-free po files. Unlike the Pootle languages.
Comment 19 Christian Lohmaier 2017-07-28 21:07:48 UTC
> Christian, it is a lie that Slovenian was not updated with latest pot files.
> Please do choose your words carefully, I do not have time to sue you.

I stick by it. Feel free to sue me. I actually did bother and look at the cause, but you decided to ignore it, that's why I resolve this bug to invalid/works as designed.

> I work for free and really do my job very precisely.

For this issue it doesn't matter who does the job, whether it is individual or a company or some monkeys hitting random keys.

It is a *FACT* that the po files that were added to the libreoffice-5-4-0 branch do not  reflect the current templates.

> As Andras can prove, I update my po files to fresh pot files much more 
> regularly than Pootle,

Is not worth a dime if it doesn't match the final versions.

> which in the end brings additional value to LO project as I can notice errors 
> in strings or strange behaviour.

But doesn't make you better than anyone else, especially not deserving special treatment.

> I did now find a problem reported by Poedit software in the misc.pot file that
> is the case here. I do not use Poedit for translation, but it reports an error
> given by msgmerge, so it is problematic with gettext library, which means
> there is either a bug in gettext library or there is real error in this pot 
> file. So I investigated further, unlike you.

You're bullshitting me, and I don't appreciate that.

> It reports an error with a doubled definition of a string, which should never
> happen in a pot or po file.

Guess what Sherlock, this is not new info. This has been known from the start, mentioned in ESC and on l10n list. 

> Regarding July 6 5.4 pot file (that I have) I have then manually found the 
> error *in LO 5.4 pot file*, not in my translation or my translation system.

Still bullshit/two completely different things. Yes. Strings that were formerly different ones now have a single representation. But that is not the problem why the Slovenian translation is broken.

Don't blame us when you ignore error messages in your tooling/when your tooling cannot cope.

> So do clear your pot files of errors before this gets really serious also for
> Pootle users.

Pootle is totally fine with that, as are other po processing tools.

> So I guess my translation system found an error while using msgmerge in this 
> file and skipped it. Which is not an error on my side, but an error on LO side.

again. your problem for ignoring error messages.

> I accept your apologies, cloph.

You can go pleasure yourself.

You confirmed that the po files that were provided didn't match latest templates, thus invalid, no reason to revert or do another release just  for Slovenian. 5.4.1 will happily accept the fixed po file.
Comment 20 Martin Srebotnjak 2017-07-28 21:51:00 UTC
Oh, buzz off, cloph.
Let's let the monkeys do the translation, and for that matter, the tooling as well.
Comment 21 Martin Srebotnjak 2017-07-28 22:42:59 UTC
Renaming the bug as it shows that the l10n process is not consistent with the gettext/po standards producing double entries which leads to inconsistencies in translations and should at all cost be avoided.

Currently the cost is low but it can lead to unwanted proportions.
Comment 22 Martin Srebotnjak 2017-08-19 08:11:02 UTC
I just wrote a script to find duplicate entries in all po(t) files in LO and it is pretty simple (gettext libraries must be installed):
for f in $(find ./pot_LO54_2017-07-07 -type f -name '*.pot'); do msguniq -d "$f"; done

This script shows there are or were 17 duplicate string entries in 8 po(t) files (as of LO54 on July 7, 2017, which is the latest I have to test with).

I am attaching my shortened version of the report.

I will now update Slovenian po files that are affected by duplicated po strings.
Comment 23 Martin Srebotnjak 2017-08-19 08:12:38 UTC
Created attachment 135654 [details]
Duplicated po strings in LO code

These are the duplicated po strings in LO code that should be removed/tagged uniquely.
Comment 24 Christian Lohmaier 2017-08-22 14:09:54 UTC
re fixing for 5.4 I'll just add what I wrote in mail:

As master branch with gettext migration doesn't have duplicate (non-unique strings in terms of po-format), and that fixing it for 5.4 would be a break of string freeze (and the problem is not in translations themselves, but just when updating against latest templates, which now after string freeze should be rare exception) I consider this wfm/wontfix for the 5-4 branch.

An easy workaround exists (running msguniq over the templates), the code itself doesn't care about duplicates (I think it is first-match-wins, but might be non-deterministic if there are indeed dupes) and pootle based translations are already cleared of the duplicates (and even if there were any, they could be cleaned up using the same msguniq call).
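
What a first-match-wins cleanup amounts to can be sketched as follows. This is not a reimplementation of msguniq (which by default merges duplicates rather than simply dropping them); it is an illustration under simple assumptions: entries are separated by blank lines, an entry is identified by its msgctxt and msgid lines exactly as written, and the helper names and the "A"/"B" contexts are hypothetical.

```python
import re


def entry_key(block):
    """Key an entry by its msgctxt/msgid lines, ignoring comments and msgstr."""
    key_lines = []
    for line in block.splitlines():
        line = line.strip()
        if line.startswith("#"):
            continue
        if line.startswith("msgstr"):
            break
        key_lines.append(line)
    return tuple(key_lines)


def drop_duplicates(po_text):
    """Keep only the first entry for each key, preserving file order."""
    seen = set()
    kept = []
    for block in re.split(r"\n\s*\n", po_text.strip()):
        key = entry_key(block)
        # Comment-only blocks have an empty key and are always kept.
        if key and key in seen:
            continue
        seen.add(key)
        kept.append(block)
    return "\n\n".join(kept) + "\n"


EXAMPLE = (
    'msgctxt "A"\nmsgid "Default"\nmsgstr "first"\n\n'
    'msgctxt "A"\nmsgid "Default"\nmsgstr "second"\n\n'
    'msgctxt "B"\nmsgid "Default"\nmsgstr "third"\n'
)
print(drop_duplicates(EXAMPLE))
```

Because the key uses the lines as written, two copies of an entry that differ only in how a long string is wrapped would not be recognised as duplicates by this sketch.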

Happy to wait with tagging for 5.4.1 rc2 until Thu evening or even Friday. Not sure whether the commit to the branch is an intermediate or already the fully completed version.
Comment 25 Martin Srebotnjak 2017-08-22 22:29:28 UTC
Hi, Cloph,
I believe Andras took in the full po patch, but let him confirm this.
I agree this is not a stopper for 5.4, but for >5.4 there should be some tooling to prevent duplicate strings.
Comment 26 Andras Timar 2017-08-23 11:58:59 UTC
(In reply to miles from comment #25)
> Hi, Cloph,
> I believe Andras took in the full po patch, but let him confirm this.
> I agree this is not a stopper for 5.4, but for >5.4 there should be some
> tooling to prevent duplicate strings.

Yes, I pushed everything that Martin sent me.
Comment 27 Martin Srebotnjak 2017-08-23 21:15:54 UTC
Yes, Christian, that was the complete patch.
Thanks, m.
Comment 28 Martin Srebotnjak 2017-08-25 19:23:05 UTC
*** Bug 111878 has been marked as a duplicate of this bug. ***
Comment 29 Martin Srebotnjak 2017-09-10 20:15:11 UTC
Hi,
I ran msguniq over the today's pot zip of 6.0/master and it found a duplicate in /extensions/messages.pot, where the following string appears 4 times with the same definition in the same file:

#. aPRNZ
#: strings.hrc:286
msgctxt "RID_UPDATE_STR_DLG_TITLE"
msgid "Check for Updates"
msgstr ""

Please, correct this mistake.

Thanks, m.
Comment 30 Martin Srebotnjak 2017-09-11 01:12:44 UTC
Please disregard my previous message.

The on-the-road-to-6.0 reshuffling of strings into messages.po files for translation caused the following number of duplicated definitions appearing in the following files (reported by msginit, but detailed string definitions appearing several times in the same file can be extracted by running msguniq):

/avmedia/messages.pot
msginit: 60 critical errors

/basctl/messages.pot
msginit: 155 critical errors

/basic/messages.pot
msginit: 405 critical errors

/connectivity/messages.pot
msginit: 324 critical errors

/cui/messages.pot
msginit: 1293 critical errors

/extensions/messages.pot
msginit: 1983 critical errors

/forms/messages.pot
msginit: 174 critical errors

/formula/messages.pot
msginit: 1311 critical errors

/fpicker/messages.pot
msginit: 38 critical errors

/framework/messages.pot
msginit: 29 critical errors

/reportdesign/messages.pot
msginit: 774 critical errors

/sc/messages.pot
msginit: 5759 critical errors

/scaddins/messages.pot
msginit: 901 critical errors

/sccomp/messages.pot
msginit: 36 critical errors

/sd/messages.pot
msginit: 3438 critical errors

/sfx2/messages.pot
msginit: 915 critical errors

/starmath/messages.pot
msginit: 1518 critical errors

/svtools/messages.pot
msginit: 1761 critical errors

/svx/messages.pot
msginit: 1146 critical errors

/uui/messages.pot
msginit: 471 critical errors

/xmlsecurity/messages.pot
msginit: 270 critical errors

The last one, for example, using msguniq shows there are about 96 string definitions (if I counted right manually) appearing in one or more repetitions, causing 270 critical errors - i.e. one string (of the 96) has approximately 3 unneeded/unwanted repeated definitions.

Some of these pot files can be fixed by opening them with poEdit, which then saves the pot file without the duplicates, but extensions/messages.pot cannot be cleaned that way; there might be other errors (or this is a bug in poEdit).
Comment 31 Martin Srebotnjak 2017-09-12 16:46:12 UTC
My further investigation (poEdit for some reason would not delete the duplicate string definitions from extensions/messages.pot and opened the file as it is, so I selected the Group by content option) showed that all the newly generated messages.pot files, that is, all messages.pot files that gathered the strings from the existing UI subdir po files in LibreOffice, are faulty!

*They consist of quadrupled gettext strings definitions.*

A simple example is the short sccomp/messages.pot file that you can check from git.
Here is the structure of the file (as of September 10th, 2017):
lines 1-14: header
lines 16-86: first set of string definitions
lines 89-159: second set of string definitions (a complete copy of the first set)
lines 162-232: third set of string definitions (second copy of the first set)
lines 235-305: fourth set of string definitions (third copy of the first set)

You can see where the repetitions begin because there is a double empty line there: 15, 87, 160, 233. So you can check all other messages.pot files for double empty lines and find where it happens.
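
The search for double empty lines described above can be sketched in a few lines. The function name `double_blank_lines` and the tiny sample are made up for illustration; the function reports the 1-based number of the first blank line in every run of two or more blank lines.

```python
def double_blank_lines(text):
    """Return the 1-based line number where each run of 2+ blank lines starts."""
    lines = text.splitlines()
    hits = []
    for i in range(len(lines) - 1):
        starts_run = i == 0 or lines[i - 1].strip() != ""
        if starts_run and lines[i].strip() == "" and lines[i + 1].strip() == "":
            hits.append(i + 1)
    return hits


# Hypothetical miniature template: header, one entry set, then a repeated set.
SAMPLE = "header 1\nheader 2\n\nentry A\nentry B\n\n\nentry A\nentry B\n"
print(double_blank_lines(SAMPLE))  # [6]
```

A single blank line between two entries is not reported, so only the seams between repeated sets show up.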

So all pot files, generated by this huge migration of UI strings, resulted in faults - with quadrupled definitions. This is a huge problem, IMHO.
Comment 32 Martin Srebotnjak 2017-09-12 23:22:30 UTC
I think I might have found what poEdit didn't like about extensions/messages.pot to automatically delete duplicated string entries.

It is probably this entry in the pot file:

#. AXGeC
#: strings.hrc:320
msgctxt "RID_UPDATE_BUBBLE_DOWNLOADING"
msgid ""
msgstr ""

The English string is empty, as you can see ...
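
In the PO format the empty msgid is reserved for the header entry, so a regular entry with an empty source string is at least unusual. Whether that is what stopped poEdit is only a guess here, but such entries are easy to list. The sketch below uses a hypothetical helper `empty_msgid_entries` and a sample made of a minimal header, the entry quoted above, and the "Check for Updates" entry from comment 29.

```python
import re


def empty_msgid_entries(po_text):
    """Return the msgctxt lines of non-header entries whose msgid is empty."""
    flagged = []
    for block in re.split(r"\n\s*\n", po_text.strip()):
        lines = [l.strip() for l in block.splitlines() if not l.startswith("#")]
        ctxt = next((l for l in lines if l.startswith("msgctxt ")), None)
        for i, line in enumerate(lines[:-1]):
            # 'msgid ""' directly followed by msgstr means the source text is
            # really empty (no continuation lines follow the msgid line).
            if line == 'msgid ""' and lines[i + 1].startswith("msgstr") and ctxt:
                flagged.append(ctxt)
    return flagged


SAMPLE = r'''
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"

#. AXGeC
#: strings.hrc:320
msgctxt "RID_UPDATE_BUBBLE_DOWNLOADING"
msgid ""
msgstr ""

#. aPRNZ
#: strings.hrc:286
msgctxt "RID_UPDATE_STR_DLG_TITLE"
msgid "Check for Updates"
msgstr ""
'''

print(empty_msgid_entries(SAMPLE))
```

The header entry is skipped because it carries no msgctxt.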
Comment 33 Martin Srebotnjak 2017-09-24 09:17:21 UTC
Just a report based on the state of UI 6.0/master strings on September 23:
a- the duplicates (quadruplicates) of UI strings are gone from messages.pot files;
b- the UI strings reshuffle for l10n Caolan did and checked in on September 14 went OK (AFAICS);
c- there are some new duplicates (so the tooling still allows duplicate string definitions).

Here are the duplicates: in the sc/messages.pot there are 13 duplicated string definitions from files notebookbar_groupedbar_compact.ui and notebookbar_groupedbar_full.ui (mBSfG, Z7t2R, xeEFE, G3TRo, Hq6JL, FPdH9, sqE94, keb9M, 3ibZN, DGBbw, WtFbH, t9EbD, FFrSw).

As far as how to automatically detect these duplicates at the time of check-in (or whenever that would be most helpful) - that is still the goal of this bug report.
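
One possible shape for such an automatic check, given purely as a sketch: walk a directory of templates, report every .pot file that contains a repeated definition, and return a non-zero status that a hook or build step could act on. The function names are hypothetical, the comparison uses the msgctxt/msgid lines exactly as written, and nothing here reflects how the LibreOffice build or Gerrit is actually wired up.

```python
import re
from collections import Counter
from pathlib import Path


def duplicated_entries(pot_text):
    """Return the msgctxt/msgid line groups that occur more than once."""
    counts = Counter()
    for block in re.split(r"\n\s*\n", pot_text.strip()):
        key = []
        for line in block.splitlines():
            line = line.strip()
            if line.startswith("msgstr"):
                break
            if line and not line.startswith("#"):
                key.append(line)
        if key:
            counts[tuple(key)] += 1
    return [key for key, n in counts.items() if n > 1]


def check_tree(root):
    """Map each .pot file under root to its duplicated keys; empty dict means clean."""
    report = {}
    for path in sorted(Path(root).rglob("*.pot")):
        dupes = duplicated_entries(path.read_text(encoding="utf-8"))
        if dupes:
            report[str(path)] = dupes
    return report


def exit_status(root):
    """Print a summary and return 1 if any template has duplicates, else 0."""
    report = check_tree(root)
    for path, dupes in report.items():
        print(f"{path}: {len(dupes)} duplicated definition(s)")
    return 1 if report else 0
```

A wrapper script could call `exit_status()` on the directory of freshly generated templates and pass the result to `sys.exit()`.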
Comment 34 Martin Srebotnjak 2017-11-06 10:53:09 UTC
As of November 03 state of strings in master, I report of duplicate strings "only" in the sd/messages.pot, here are some string codes:
nTEKy, ZShaH, h6EHi, WfzeY, BHDdD, i8XUZ, 4nboE, DQLzy.

As far as how to automatically detect these duplicates at the time of check-in (or whenever that would be most helpful) - that is still the goal of this bug report.
Comment 35 Martin Srebotnjak 2018-04-05 06:52:36 UTC
Just downloaded pot files for 6.1 (master) dated 04-04.
Running the command
for f in $(find ./pot_LO61_2018-04-04 -type f -name '*.pot'); do msguniq -d "$f"; done
in the folder of pot files, I got a report of duplicated pot entries (making the pot files non-compliant with the gettext rules), apparently (all?) located in sw/uiconfig/swriter/ui (i.e. file sw/messages.pot).
I am attaching the report.
For the time being, I cleaned the file of duplicated entries by opening it with poEdit which automatically fixed the duplicates and saved it back.
Comment 36 Martin Srebotnjak 2018-04-05 06:53:14 UTC
Created attachment 141107 [details]
List of duplicate localization strings in 6.1/master (as of 2018-04-04)
Comment 37 Martin Srebotnjak 2018-04-23 16:42:41 UTC
While last reported duplicates were fixed soon after my reporting,
I must unfortunately report three new duplicated string definitions, two in sd/messages.pot and one in sw/messages.pot.

Will attach a detailed report.

Hopefully this check can be somehow automated in the build or code keeping system.
Comment 38 Martin Srebotnjak 2018-04-23 16:50:46 UTC
Created attachment 141576 [details]
List of duplicate localization strings in 6.1/master (as of 2018-04-23)

Fresh dupe list is attached.
Comment 39 Julien Nabet 2019-02-09 16:36:36 UTC
Martin:
About the part:
#. S4ZPU
#: sd/uiconfig/simpress/ui/notebookbar.ui:6408
#: sd/uiconfig/simpress/ui/notebookbar_groupedbar_full.ui:3816
msgctxt "notebookbar_groupedbar_full|slideshowb"
msgid "_Slide Show"
msgstr ""

seems ok now.

About the part:
#. gQQfL
#: sd/uiconfig/simpress/ui/notebookbar.ui:7031
#: sd/uiconfig/simpress/ui/notebookbar_groupedbar_full.ui:4854
#: sd/uiconfig/simpress/ui/notebookbar_groupedbar_full.ui:6446
msgctxt "notebookbar_groupedbar_full|reviewb"
msgid "_Review"
msgstr ""

I submitted this patch on gerrit:
https://gerrit.libreoffice.org/#/c/67589/1


About the part:
#. 8E3hc
#: sw/uiconfig/swriter/ui/notebookbar.ui:2138
#: sw/uiconfig/swriter/ui/notebookbar_compact.ui:2393
msgctxt "notebookbar_compact|fileb"
msgid "_File"
msgstr ""

In sw/uiconfig/swriter/ui/notebookbar.ui (lines 2079-2080) I got:
<object class="svtlo-ManagedMenuButton" id="File-FileButton:MenuFile">
<property name="label" translatable="yes" context="WriterNotebookbar|FileMenuButton">_File</property>

In sw/uiconfig/swriter/ui/notebookbar_compact.ui (lines 2207-2208):
<object class="svtlo-ManagedMenuButton" id="File-FileButton:MenuFile">
<property name="label" translatable="yes" context="notebookbar_compact|FileMenuButton">_File</property>

Just to be sure: is the fact that they have the same id a problem?
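
One way to look at the question is to compare what a gettext-style key would be for the two snippets. The sketch below assumes that an entry is identified by the context attribute plus the text, as with msgctxt and msgid in a po file; whether the widget id plays any role in the LibreOffice extraction tooling is not confirmed here. The `<interface>` wrapper and closing tags are added only so that the snippets parse, and `translatable_strings` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def translatable_strings(ui_xml):
    """Return (object id, context, text) for every translatable property."""
    found = []
    for obj in ET.fromstring(ui_xml).iter("object"):
        for prop in obj.findall("property"):
            if prop.get("translatable") == "yes":
                found.append((obj.get("id"), prop.get("context"), prop.text))
    return found


NOTEBOOKBAR = """<interface>
  <object class="svtlo-ManagedMenuButton" id="File-FileButton:MenuFile">
    <property name="label" translatable="yes" context="WriterNotebookbar|FileMenuButton">_File</property>
  </object>
</interface>"""

NOTEBOOKBAR_COMPACT = """<interface>
  <object class="svtlo-ManagedMenuButton" id="File-FileButton:MenuFile">
    <property name="label" translatable="yes" context="notebookbar_compact|FileMenuButton">_File</property>
  </object>
</interface>"""

(id_a, ctxt_a, text_a), = translatable_strings(NOTEBOOKBAR)
(id_b, ctxt_b, text_b), = translatable_strings(NOTEBOOKBAR_COMPACT)

print("same id:", id_a == id_b)
print("same (context, text) key:", (ctxt_a, text_a) == (ctxt_b, text_b))
```

Under that assumption the two properties would produce distinct entries despite sharing an id, because their context attributes differ.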
Comment 40 Julien Nabet 2019-02-13 19:30:47 UTC
The quoted patch is now pushed on master, see https://cgit.freedesktop.org/libreoffice/core/commit/?id=51cc835ab8d4c4e8d4f0219635f7300870d49cc6.

So waiting for feedback to know if there are still problems.
Comment 41 Xisco Faulí 2019-03-21 09:53:30 UTC
(In reply to Julien Nabet from comment #40)
> The quoted patch is now pushed on master, see
> https://cgit.freedesktop.org/libreoffice/core/commit/
> ?id=51cc835ab8d4c4e8d4f0219635f7300870d49cc6.
> 
> So waiting for feedback to know if there are still problems.

I don't think NEEDINFO is the proper status here.
Let's put it to RESOLVED FIXED instead. We can always reopen it if the problem is still present.