Description:
Documents I receive from students show {en} as the text language. I have gone into Tools > Options > Language Settings > Languages and checked that English (USA) is the selection in User Interface, Locale Setting, and Default Languages of Documents. However, the setting does not take effect. Documents I receive from students instead appear as having {en} as the language.

Steps to Reproduce:
1. Go to Tools > Options > Language Settings > Languages and check that English (USA) is the selection in User Interface, Locale Setting, and Default Languages of Documents.
2. Open a document, discover in sadness that the setting is not taking effect.

Actual Results:
The setting is not forcing documents to use the desired language selection for documents. This requires you to select the entire document, apply the setting, and then continue with your work. Annoying.

Expected Results:
This should force all documents to use the selected language.

Reproducible: Always

User Profile Reset: Yes

Additional Info:
This should force all documents to use the selected language.
*** Bug 137743 has been marked as a duplicate of this bug. ***
(In reply to Larry Tate from comment #0)
> Description:
> Documents I receive from students show {en} as the text language. I have
> gone into Tools > Options > Language Settings > Languages and checked that
> English (USA) is the selection in User Interface, Locale Setting, and
> Default Languages of Documents. However, the setting does not take effect.
> Documents I receive from students instead appear as having {en} as the
> language.
> ...
> This should force all documents to use the selected language.

No. This setting has no relation to what you see when you open *existing* documents; it applies only to new documents, or to new text that you enter in documents that don't define a language themselves.

Normally all current complex document formats, like ODF or OOXML (or the older binary MSO formats), include information about the language used when creating them. And *that* information must be used when you open those documents, not something that you define as the default for new documents. So this is not a bug.

However, the question is why you see {en}, and not something like "English (UK)" or "English (South Africa)" and so on. The question is whether something goes wrong in LibreOffice at the opening/import stage, or something happens when the files are saved (what format are the documents? which application was used to create them?). You may want to attach a sample document that shows the problem. Then the report may possibly be confirmed and converted to address the actual bug.
Created attachment 166703 [details] Doc that produces the issue I described with language settings.
Thank you for that clarification. I am attaching a representative document where the issue appears. These are student essays, so I had to remove virtually all of the text in the document. However, after I made the edits and saved the file, the issue I describe was still present when I reopened it.
(In reply to Larry Tate from comment #4)

Opening the document, the language indeed is shown as {en}. The document's word/styles.xml contains

> <w:lang w:val="en" ... />

However, I cannot reproduce generating such a value with either LibreOffice 7.0.3.1 or Word 2016. There's no way to select generic "English" in Word's list of languages... So the question remains: "which application was used to create them?"

And the fact that

> I had to remove virtually all of the text in the document

means that you edited and re-saved it in your LibreOffice, which changed the document and made it impossible to tell what was used to generate it. You could ask one of the students to send you a document with dummy text, just for inspection...?
Just a hint: Google Docs downloading .docx file creates such <w:lang w:val="en" /> entry in styles.xml.
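For anyone who wants to check their own files before attaching them, here is a minimal Python sketch (not part of LibreOffice; the function names are hypothetical) that reads word/styles.xml from a .docx and lists the <w:lang> values that carry no region subtag, such as the "en" mentioned above. Assumptions: it only inspects styles.xml, so language attributes set directly in word/document.xml are not covered, and treating "no hyphen" as "language-only" is a simplification.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by <w:lang> in word/styles.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def lang_values(styles_xml: bytes) -> list:
    """Return every tag found in w:val, w:eastAsia and w:bidi of <w:lang> elements."""
    root = ET.fromstring(styles_xml)
    tags = []
    for lang in root.iter(f"{{{W_NS}}}lang"):
        for attr in ("val", "eastAsia", "bidi"):
            value = lang.get(f"{{{W_NS}}}{attr}")
            if value:
                tags.append(value)
    return tags


def language_only_tags(docx_file) -> list:
    """Return the tags in word/styles.xml without a region part (heuristic: no '-')."""
    with zipfile.ZipFile(docx_file) as zf:
        styles = zf.read("word/styles.xml")
    return [t for t in lang_values(styles) if "-" not in t]


if __name__ == "__main__":
    # Usage: python check_lang.py essay1.docx essay2.docx ...
    for path in sys.argv[1:]:
        print(path, language_only_tags(path))
```

A file exported as described in the hint above would be expected to report ['en'], while a file using a full tag like en-US would report an empty list.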
Created attachment 166731 [details] LoremIpsum created using Google Docs shows issue Attached a Google Docs dummy text, which has been set to English and shows "{en}" in status bar for the language in use.
(In reply to Uwe Auer from comment #6) Thanks! So what should be the issue here? Should it be a NAB? Or maybe should it be an enhancement to enable a fallback for generic en case, to use some dictionary (which? en-GB? en-US?)?
I'm getting this problem on 30 out of 30 submissions, so I don't think this is coming exclusively from Google Docs. I am working now on securing a dummy document from a student that I can share with you and surveying the class about their OS and software for composing. Standby! And thanks.
(In reply to Larry Tate from comment #9)
> I'm getting this problem on 30 out of 30 submissions, so I don't think this
> is coming exclusively from Google Docs.

This makes me suspect that there is a template that already has this setting and has been distributed to all students.
(In reply to Mike Kaganski from comment #8)
> Or maybe should it be an enhancement to enable a fallback for generic en
> case, to use some dictionary (which? en-GB? en-US?)?

Hmm, from a pure user's perspective: pop up a selection list stating that the document's language setting is ambiguous, and have the user select the variant of their language (offered languages restricted to the settings '{en}-*'). Finally, to not touch/change the existing document, force a save to a new document. (Just my thoughts, you may immediately forget them.)

More pragmatic: on open, inform the user about
i) the ambiguity/incompleteness in the language setting
ii) a forced change to en-US (or e.g. de-DE, if it was '{de}', whatever it could be)

From a more technical perspective: not a bug of LibreOffice but a bug of the creating application (but in fact I'm not aware of any standard here).
(In reply to Uwe Auer from comment #11)
> More pragmatic: On open inform user about
> i) ambiguity/incompleteness in language setting
> ii) forced a change to en-US (or e.g de-DE, if it was '{de}'-whatever it
> could be)

This seems consistent with the currently existing infobar shown when a hyphenation dictionary is missing. It would feel logical from my PoV.
(In reply to Uwe Auer from comment #10)
> (In reply to Larry Tate from comment #9)
> > I'm getting this problem on 30 out of 30 submissions, so I don't think this
> > is coming exclusively from Google Docs.
> >
>
> This makes me feel that there is a template, already having this setting and
> which has been distributed to all students.

One thought I'd considered is that these docs are all downloaded from our college's LMS (Canvas). I suppose it is possible that these documents are altered in some way before they are offered to the professor for download...
Not much that UX people can contribute here. At least the GDocs export function is broken, and due to the false language information spell-checking won't work as expected. Adding a warning / infobar does not sound really actionable to me: the user would be expected to understand what {en} means, what consequences this setting has, and how to solve the issue. My take: either we fix it silently and convert ISO 639-1 codes (or whatever it is) into proper language tags, or we just blame others (=> NOB).

(needsUX needs UX-advice at CC)
(In reply to Heiko Tietze from comment #14)
> Not much that UX people can contribute here. At least GDocs export function
> is broken and due to false language information the spellchecking won't work
> as expected.

Well ... the GDocs export function is not broken (unfortunately). "en" is a valid BCP-47 tag ... and there's no requirement that the language there also include the country part of the tag. GDocs apparently don't distinguish their English dictionaries by country, or have one of those dictionaries set as "generic English" ... and they are formally correct. We add supported languages as required: people ask Eike to add this locale or that locale ... and then provide locale data, translations and dictionaries ... and it appears in our list. Nothing prevents us (or any other OOXML- or ODF-conformant application) from using the "en" locale (as opposed to e.g. "en-US").
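To illustrate that a language-only tag such as "en" is well-formed, here is a simplified Python sketch that splits the common language[-script][-region] shape of a BCP-47 tag. It is a deliberately reduced subset and an assumption-laden illustration, not a full BCP-47 parser: extlang, variant, extension and private-use subtags are not parsed (anything after the region is ignored), and the names split_tag and SimpleTag are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Simplified pattern: 2-3 letter language, optional 4-letter script,
# optional 2-letter or 3-digit region; any further subtags are ignored.
_SIMPLE_TAG = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
    r"(?:-.+)?"
)


class SimpleTag(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]


def split_tag(tag: str) -> Optional[SimpleTag]:
    """Split 'en', 'en-US' or 'sr-Latn-RS'; return None if the tag doesn't fit this shape."""
    m = _SIMPLE_TAG.fullmatch(tag)
    if m is None:
        return None
    script = m.group("script")
    region = m.group("region")
    return SimpleTag(
        language=m.group("language").lower(),
        script=script.title() if script else None,
        region=region.upper() if region else None,
    )
```

split_tag("en") yields a result whose region is None: a perfectly valid tag that simply does not say which English is meant, which is the case discussed in this bug.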
(In reply to Heiko Tietze from comment #14)
> My take: either we fix it silently and convert ISO 639-1 codes (or whatever
> it is) into proper language tags or just blame others (=> NOB).

But hardcode-mapping "en" to "en-US" (as MS Word does) seems a sane way to solve this.
Other than displaying the language as "{en}" (which means "en" is a valid language tag but there is no "English (generic)" or similar entry in the language/locale list), is there an actual problem with those documents? Of course spell-checking doesn't work with it, because there's no indication of which English dictionary is to be used, unless the system provides one for plain "en" (which AFAIK no system does, for good reasons).

Yes, we could map a bare "en" to "en-US", but that could be equally wrong if it instead should have been "en-GB" (or some other variant). In Google's great US-centric manner en-US might be what is desired here, but..

Asking the user, popping up infobars or making up other fallbacks is not an option, because we accept and preserve *any* syntactically valid BCP47 language tag, including ones unknown to us, on purpose.

Btw, I could not see a language attribution in the GDocs UI; spell-checking seems to use some language recognition, e.g. German words in an English UI check fine, and downloading as .docx or .odt also results in an "en" attribute, so that crap is useless anyway.
Not to contradict Eike (personally I totally agree with his assessment), just wanted to mention two related discussions, which incidentally show how ambiguous this is - so this basically is expected to support Eike's PoV: 1. In this one, a random person argues that for "Guessing the missing parts" problem, *the rule is to select the "original country" of the language. The exceptions are mostly based on population* (mentioning en, pt, and zh as those exception cases). https://stackoverflow.com/questions/2500066/if-you-have-an-application-localized-in-pt-br-and-pt-pt-what-language-you-shoul 2. This one shows that the first fallback for English locales is chosen by Google to be "International English variant", which is en-GB: "After opening a bug report on Google, defaulting to en_GB and not default strings.xml, they mentioned that this in the intended behaviour for Android N above". https://stackoverflow.com/questions/45511769/localization-for-canada-defaults-to-uk-should-default-to-us
So let's do the silent conversion. Since we default to en_US (and ship the localization) I would use rather this (surprised that Google defaults to en_GB).
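The silent conversion proposed here could look roughly like the following Python sketch. The default-region table is a hypothetical example: only "en" to "en-US" comes from the discussion, and the fr and de entries are assumptions added for illustration. This is not the patch that was eventually committed. In line with the point that any valid tag must be preserved, tags that already carry more than a language, and bare languages missing from the table, are passed through unchanged.

```python
# Hypothetical table of default regions for bare language tags.
# Only "en" -> "US" is taken from the discussion; the rest are illustrative.
DEFAULT_REGION = {
    "en": "US",
    "fr": "FR",
    "de": "DE",
}


def expand_language_only(tag: str, defaults: dict = DEFAULT_REGION) -> str:
    """Map a bare language tag such as 'en' to a full tag such as 'en-US'.

    Tags that already contain a subtag separator, and bare languages
    missing from the table, are returned unchanged so they are preserved.
    """
    if "-" in tag:
        return tag
    region = defaults.get(tag.lower())
    if region is None:
        return tag
    return f"{tag.lower()}-{region}"
```

Keeping the table as a parameter makes the policy question explicit: passing {"en": "GB"} would follow the Android en-GB fallback mentioned above instead.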
Note that the problem is not limited to "en". For example, "fr", "zh", "it" are also affected: https://ask.libreoffice.org/en/question/286107 https://ask.libreoffice.org/en/question/291826 https://ask.libreoffice.org/en/question/289004
"el", "ja", "ka" ... : https://ask.libreoffice.org/en/question/293486 https://ask.libreoffice.org/en/question/284727 https://ask.libreoffice.org/en/question/280102 https://ask.libreoffice.org/en/question/298632 https://ask.libreoffice.org/en/question/287776 ... It's pretty annoying to people, and many are affected. I make it "normal" instead of "trivial".
*** Bug 137635 has been marked as a duplicate of this bug. ***
*** Bug 136808 has been marked as a duplicate of this bug. ***
*** Bug 136809 has been marked as a duplicate of this bug. ***
*** Bug 132396 has been marked as a duplicate of this bug. ***
This might help to find a solution. While working on AskLO question 286107, I ran an experiment with Writer 7.0.5.2 (Linux 5.11, VCL kf5) to try to understand when the "Missing hyphenation data" message is issued. Some users complain that the misbehaviour happens on standard .odt files that have never seen Google Docs. A suggested solution was to uncheck the auto-hyphenate box in the Text Flow tab of the paragraph style.

I could not reproduce the misbehaviour at first. But after tampering a lot with Tools > Options, I made Writer crash. After the crash, I let auto-recovery rebuild the document. When it was done, the message was present, although the language pack and the hunspell modules are installed.

I looked at the .fodt version but did not see anything obvious, as I am not familiar with the details of the encoding. However, if I change fo:language="fr" to fo:language="fr_FR", the message no longer appears on open. The trick also works for a language that is not installed.

What puzzles me is that I then created a fresh document from scratch, saved it and made sure it opens without the "Missing hyphenation data" message. When I looked at its XML, the fo:language attributes were simply set to "fr" without a country code (which is in the fo:country=… attribute). So the cause of the problem may be somewhere else.

I'm attaching the faulty file for analysis. Its paragraph styles have been modified a bit after the crash, but the mishap is still there.
Created attachment 170851 [details] Recovered document issuing "Missing hyphenation data"
(In reply to ajlittoz from comment #26)
> I looked at the .fodt version but did not see anything obvious as I am not
> familiar with the details of encoding. However if I change fo:language="fr"
> to fo:language="fr_FR", there is no longer any message on open. The trick
> works also for uninstalled language.

fo:language="fr_FR" is wrong though; the fo:language attribute is to contain *only* the language.

> When I looked at its XML, the fo:language attributes were simply
> set to "fr" without country code (which is in attribute fo:country=…).

Which is correct. fo:language="fr" fo:country="FR" denotes the fr-FR language tag.

> I'm attaching the faulty file for analysis. Its paragraph styles have been a
> bit modified after the crash but the mishap is still there.

All four <style:text-properties> have both fo:language="fr" fo:country="FR" attributes. There is no fo:language="fr" alone.

The missing hyphenation data message when opening the document is reproducible for me, though understandably so, because I don't have any French spell-checking or hyphenation installed. If it also happens when French hyphenation data is installed, then it's likely not because the document contains a wrong language attribution but because of something else.
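As a small illustration of how the two ODF attributes combine into one language tag, here is a Python sketch. The function name odf_language_tag is hypothetical, and only the plain fo:language/fo:country case is handled (script and other subtags are ignored); it rejects a value like "fr_FR" in fo:language, since that attribute is to contain only the language.

```python
import re
from typing import Optional

# fo:language holds only a language code; fo:country holds only a region.
_LANGUAGE = re.compile(r"[A-Za-z]{2,3}")
_COUNTRY = re.compile(r"[A-Za-z]{2}|[0-9]{3}")


def odf_language_tag(fo_language: str, fo_country: Optional[str] = None) -> str:
    """Combine fo:language and fo:country into a tag, e.g. ('fr', 'FR') -> 'fr-FR'."""
    if not _LANGUAGE.fullmatch(fo_language):
        raise ValueError(f"fo:language must hold only a language code, got {fo_language!r}")
    tag = fo_language.lower()
    if fo_country:
        if not _COUNTRY.fullmatch(fo_country):
            raise ValueError(f"unexpected fo:country value {fo_country!r}")
        tag += "-" + fo_country.upper()
    return tag
```

With both attributes present, as in the attached file, the result is "fr-FR"; with fo:language alone it is the language-only "fr".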
(In reply to Eike Rathke from comment #28)
> (In reply to ajlittoz from comment #26)
> All four <style:text-properties> have both fo:language="fr" fo:country="FR"
> attributes. There is no fo:language="fr" alone.

My description was too brief. The fo:country attribute was there, of course. I tried fr_FR in the .fodt just to see, even though I knew this was redundant and contradicted fo:country.

> The missing hyphenation data message when opening the document is
> reproducible for me, though understandably because I don't have any French
> spell-checking or hyphenation installed.

Initially, I "tagged" the first paragraph it_IT because this hyphenation package is not installed. After the crash, to my surprise, the message requested fr, not it. When I changed fr to fr_FR (which is wrong), the message requested it.

> If it happens also if French hyphenation data is installed then it's likely
> not because the document would contain wrong language attribution but
> something else.

This is why I attached it, as I am unable to interpret the rest of it. Make a .fodt from the attached document and change the fo:language to one you have installed. Open it in Writer. Does the message still display?
Fwiw, code pointers for that: the hyphenator check that outputs the info message happens in sw/source/core/text/inftxt.cxx, line 1496, in SwTextFormatInfo::IsHyphenate():

if (!xHyph->hasLocale(g_pBreakIt->GetLocale(eTmp)))

where eTmp is LCID 0x040C (1036) for fr-FR, and the GetLocale() call correctly results in lang::Locale("fr","FR",""). In my debug build, where I do have French hyphenation available, xHyph->hasLocale() then returns true and does not complain.

The info message in case the locale is not found also outputs only the language, not the full language tag, which explains the single "fr" there.
However, I doubt this is even related to the original problem of Google documents being broken by specifying only a language, could you please submit another bug for that? Thanks.
(In reply to Eike Rathke from comment #31) > However, I doubt this is even related to the original problem of Google > documents being broken by specifying only a language, could you please > submit another bug for that? Thanks. See bug 141384.
For the record, the brackets started to be displayed after
https://cgit.freedesktop.org/libreoffice/core/commit/?id=bde834ee6b0cb43cebece47cac55cc9b80aadc24

author     Eike Rathke <erack@redhat.com>  2017-03-14 11:52:52 +0100
committer  Eike Rathke <erack@redhat.com>  2017-03-14 12:48:22 +0100
commit     bde834ee6b0cb43cebece47cac55cc9b80aadc24
tree       ef3f5ebe8340d0e0392905ec3cfab033953a6425
parent     bf63e5a3a6ae458ffe10061c1bcf969a534760c5

display raw language tags in curly brackets
(In reply to Mike Kaganski from comment #20)
> Note that the problem is not limited to "en". For example, "fr", "zh", "it"
> are also affected:

(In reply to Mike Kaganski from comment #21)
> "el", "ja", "ka" ... :

Those are quite useless, though, because none of them provided a sample document or answered whether their document had been processed by Google Docs (except one who claims they didn't, but also did not provide a sample document). Also, the message displayed only the language, not the full language tag; I fixed that with https://gerrit.libreoffice.org/c/core/+/119020 (also for 7-2 and 7-1).
I'll see if I can make something out of the known GDocs 'en' case at least..
Eike Rathke committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/core/commit/23f17b7ea6fbd2f422c7e40192ae60e4df25224c Resolves: tdf#137742 Workaround cheesy Google Docs writing language-only tags It will be available in 7.3.0. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Pending review https://gerrit.libreoffice.org/c/core/+/120438 for 7-2
Eike Rathke committed a patch related to this issue. It has been pushed to "libreoffice-7-2": https://git.libreoffice.org/core/commit/118eb9e426fe729324347685f986ff9e78d49483 Resolves: tdf#137742 Workaround cheesy Google Docs writing language-only tags It will be available in 7.2.1. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Xisco Fauli committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/core/commit/f6a04457b8aa227deb9402e6406ea843fabfcbb0 tdf#137742: sw_ooxmlexport16: Add unittest It will be available in 7.3.0. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
*** Bug 144273 has been marked as a duplicate of this bug. ***