Bug 125411 - FILESAVE DOCX and DOC: Language information lost upon saving text
Summary: FILESAVE DOCX and DOC: Language information lost upon saving text
Status: CLOSED NOTOURBUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: unspecified (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-05-21 10:02 UTC by Lars Jødal
Modified: 2019-05-22 16:28 UTC

See Also:
Crash report or crash signature:


Description Lars Jødal 2019-05-21 10:02:31 UTC
Description:
When I use LO Writer to write text in a language other than that of my locale, I can set the language of the text and get spell-checking in the correct language. (So far, so good.) However, if I save the document as a DOC or DOCX file and open it in MS Word, Word has lost the language information and treats the text as being in my default language (and therefore finds lots of "spelling errors").

Notably, if I reopen the DOC or DOCX file in LO Writer, the language information is retained. Apparently the information is saved within the DOC or DOCX file, but not in a form that MS Word reads correctly. I use MS Word 2013.

Steps to Reproduce:
1. Write a text using a different language than your locale, and choose the correct language for the text.

2. Sample:
This sentence is in English. 
Diesen Satz ist in Deutsch. [German]
Denne sætning er på dansk. [Danish]

3. Spell-check. (If correct languages have been chosen, there should be no spelling errors. This step will only give you assurance if you have the relevant dictionaries installed.)

4. Save the file in DOC format or DOCX format.

5. Open the saved file in MS Word.

6. Spell-check the file within MS Word.

Actual Results:
MS Word will treat all text as your default language and find spelling errors where the language is different.

Expected Results:
The language information should be stored in the DOCX or DOC file in a format readable by MS Word, so that spell-checking in MS Word works correctly for text in a non-default language.


Reproducible: Always


User Profile Reset: No



Additional Info:
Version: 6.2.3.2 (x64)
Build ID: aecc05fe267cc68dde00352a451aa867b3b546ac
CPU threads: 4; OS: Windows 10.0; UI render: GL; VCL: win; 
Locale: da-DK (da_DK); UI language: da-DK
Calc: threaded
Comment 1 Eike Rathke 2019-05-21 11:15:46 UTC
With your example, having unzipped the stored .docx document and inspected the contained word/document.xml file, I spot nothing wrong:

<w:p><w:pPr><w:pStyle w:val="Normal"/><w:rPr></w:rPr></w:pPr><w:r><w:rPr></w:rPr><w:t>This sentence is in English.</w:t></w:r></w:p>
<w:p><w:pPr><w:pStyle w:val="Normal"/><w:rPr><w:lang w:val="de-DE"/></w:rPr></w:pPr><w:r><w:rPr><w:lang w:val="de-DE"/></w:rPr><w:t>Diesen Satz ist in Deutsch.</w:t></w:r></w:p>
<w:p><w:pPr><w:pStyle w:val="Normal"/><w:rPr><w:lang w:val="da-DK"/></w:rPr></w:pPr><w:r><w:rPr><w:lang w:val="da-DK"/></w:rPr><w:t>Denne sætning er på dansk.</w:t></w:r></w:p>

The paragraphs are clearly tagged with <w:lang w:val="de-DE"/> and <w:lang w:val="da-DK"/>.

After selecting only the words within each sentence (excluding the final period) to force a character-level rather than paragraph-level attribution, the result is:

<w:p><w:pPr><w:pStyle w:val="Normal"/><w:rPr></w:rPr></w:pPr><w:r><w:rPr></w:rPr><w:t>This sentence is in English.</w:t></w:r></w:p>
<w:p><w:pPr><w:pStyle w:val="Normal"/><w:rPr></w:rPr></w:pPr><w:r><w:rPr><w:lang w:val="de-DE"/></w:rPr><w:t>Diesen Satz ist in Deutsch</w:t></w:r><w:r><w:rPr></w:rPr><w:t>.</w:t></w:r></w:p>
<w:p><w:pPr><w:pStyle w:val="Normal"/><w:rPr></w:rPr></w:pPr><w:r><w:rPr><w:lang w:val="da-DK"/></w:rPr><w:t>Denne sætning er på dansk</w:t></w:r><w:r><w:rPr></w:rPr><w:t>.</w:t></w:r></w:p>

Nothing wrong either, IMHO (someone with more OOXML WordprocessingML experience may want to verify).

Reloading the .docx into LibreOffice also works as expected, preserving the language attribution.

For reference see the OOXML specification ECMA-376-1:2016 17.3.2.20 lang (Languages for Run Content) and 22.9.2.6 ST_Lang (Language Reference).
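
The manual inspection described in this comment (unzip the .docx, read word/document.xml, look for w:lang) could be scripted roughly as follows. This is only a minimal sketch using the Python standard library; the function names run_languages and docx_run_languages are made up for illustration, and it reads only the w:val attribute on each run's own w:rPr, ignoring languages inherited from styles or document defaults.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, as declared in the document.xml quoted in this report.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"


def run_languages(document_xml):
    """Return a (text, lang) pair for every w:r run.

    lang is the w:val of the run's own w:rPr/w:lang, or None if the run
    carries no direct language tag (hypothetical helper, not a LibreOffice API).
    """
    root = ET.fromstring(document_xml)
    result = []
    for run in root.iter(W + "r"):
        text = "".join(t.text or "" for t in run.iter(W + "t"))
        lang = run.find(W + "rPr/" + W + "lang")
        result.append((text, lang.get(W + "val") if lang is not None else None))
    return result


def docx_run_languages(path_or_file):
    """Open a .docx (a zip archive) and inspect its word/document.xml."""
    with zipfile.ZipFile(path_or_file) as archive:
        return run_languages(archive.read("word/document.xml"))
```

Running docx_run_languages on the file saved by LO, and again on the copy re-saved by Word, would show which runs kept their language tag in each version.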
Comment 2 Lars Jødal 2019-05-21 15:41:48 UTC
Good idea to unzip the docx file. After similar experiments of my own, I suspect that the bug is in MS Word 2013, not in LO Writer:

1) Testing with MS Word 2016 (rather than 2013), the problem disappears.

2) Opening the file in MS Word 2013 (finding apparent spelling errors) and saving from there, the unzipped docx file contains no "lang" tag. I.e., MS Word 2013 missed the tag.

3) Opening the file in MS Word 2013, changing languages and then saving, the unzipped docx file does contain "lang" tags. I am no XML expert, but from a cursory look, the tags look very much like the originals. The main difference seems to be in the introductory formatting.

So, I may quite possibly have reported a bug that does not belong to LO but to an old version of MS Word. 

Just in case somebody can get useful information from it, I include below the opening XML (the root element) from both versions of the file. If it is not interesting (i.e., gives no hints towards a better docx export filter), then this bug can simply be closed.


Original (saved by LO):

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" mc:Ignorable="w14 wp14"><w:body> [...] </w:body></w:document>

Saved from Word 2013 after specifically setting languages:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" mc:Ignorable="w14 w15 wp14"><w:body> [...] </w:body></w:document>
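
The two root elements above are long enough that the differences are hard to spot by eye. A rough way to compare them is to list the namespace prefixes each one declares. The sketch below is an illustration only: the helper names declared_prefixes and prefix_diff are invented here, and the regular expression assumes double-quoted xmlns:prefix="..." attributes as in the two tags quoted above.

```python
import re


def declared_prefixes(root_tag):
    """Map each xmlns prefix declared in a start tag to its namespace URI."""
    return dict(re.findall(r'xmlns:([\w.-]+)="([^"]*)"', root_tag))


def prefix_diff(first, second):
    """Return (prefixes only in first, prefixes only in second), each sorted."""
    a = declared_prefixes(first)
    b = declared_prefixes(second)
    return sorted(set(a) - set(b)), sorted(set(b) - set(a))
```

Applied to the two tags quoted above (LO first, Word second), this should report no prefix that is declared only by LO, and m, w15, wne, wpc and wpi as declared only by Word 2013; in addition, mc:Ignorable lists "w14 w15 wp14" instead of "w14 wp14". As far as I can tell, none of these extra declarations is needed to interpret w:lang, which lives in the w namespace declared by both files.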
Comment 3 Eike Rathke 2019-05-22 16:27:26 UTC
So if Word 2016 opens and saves it correctly but Word 2013 doesn't, let's close this as notourbug.