Bug 115298 - Locale information is wrong when LibreOffice optimizes it from span to paragraph
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 4.2.0.4 release
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: bibisected, bisected, dataLoss, regression
Depends on:
Blocks: Paragraph Save
Reported: 2018-01-29 14:14 UTC by Christian.gruemme
Modified: 2024-02-19 16:33 UTC

See Also:
Crash report or crash signature:


Attachments
the document carrying the spans (9.07 KB, application/vnd.oasis.opendocument.text)
2018-01-29 14:15 UTC, Christian.gruemme
Description Christian.gruemme 2018-01-29 14:14:22 UTC
Description:
When several text:span elements carry the same locale information, LibreOffice moves that information from the spans to the parent paragraph. This is fine, but LibreOffice then sets the locale of the text:span elements to "none" instead of removing it. As a result, those spans have no correct locale information, which leads to, for example, incorrect hyphenation and spell checking.

Steps to Reproduce:
1. Save the attached document and look at its "content.xml". Note that the spans carry the French locale information.
2. Open the document in Writer and save it again. Look at "content.xml" again. Note that the spans now carry the "none" locale information.
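
The inspection in the steps above can also be scripted. The following is a minimal sketch, not part of the original report: it assumes the ODT is an ordinary zip package with a "content.xml" member, and the helper names `locale_attrs` and `locale_attrs_from_odt` are made up for illustration. It lists every style in content.xml that sets fo:language or fo:country, so the values before and after the round trip can be compared.

```python
import xml.etree.ElementTree as ET
import zipfile

# Standard ODF namespace URIs for the style: and fo: prefixes.
STYLE = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
FO = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"


def locale_attrs(content_xml):
    """Return (style name, fo:language, fo:country) for each style that sets either."""
    root = ET.fromstring(content_xml)
    found = []
    for style in root.iter(f"{{{STYLE}}}style"):
        for props in style.iter(f"{{{STYLE}}}text-properties"):
            lang = props.get(f"{{{FO}}}language")
            country = props.get(f"{{{FO}}}country")
            if lang is not None or country is not None:
                found.append((style.get(f"{{{STYLE}}}name"), lang, country))
    return found


def locale_attrs_from_odt(path):
    """Hypothetical convenience wrapper: read content.xml straight from the package."""
    with zipfile.ZipFile(path) as odt:
        return locale_attrs(odt.read("content.xml"))
```

Running `locale_attrs_from_odt` on the attachment before and after re-saving it in Writer should show whether the text styles used by the spans changed from the French locale to "none".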


Actual Results:  
Text style has locale value set to "none".

Expected Results:
Text style has the French locale or no locale.


Reproducible: Always


User Profile Reset: Yes


OpenGL enabled: Yes

Additional Info:
Ideally, when LibreOffice optimizes the locale information, it would remove the "fo:country" and "fo:language" attributes completely instead of setting them to "none".


User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.119 Safari/537.36
Comment 1 Christian.gruemme 2018-01-29 14:15:30 UTC
Created attachment 139435
the document carrying the spans
Comment 2 Christian.gruemme 2018-01-29 14:15:54 UTC
Comment on attachment 139435
the document carrying the spans

This is a test case.
Comment 3 Buovjaga 2018-02-17 15:20:54 UTC
Yep I see

<style:style style:name="T1" style:family="text">
    <style:text-properties fo:language="none" fo:country="none" />
</style:style>

and spans using the style

<text:p text:style-name="P1">
    <text:span text:style-name="T1">Popotan </text:span>
    <text:date style:data-style-name="N79" text:date-value="2018-01-17T11:37:27.375999949" text:fixed="true">mercredi 17 janvier 2018</text:date>
    <text:s />
    <text:span text:style-name="T1">
        <text:s />est un visual novel de type eroge japonais développé par Petit Ferret, avec pour designer Akio Watanabe dessinant sous le pseudonyme de Poyoyon Rock.</text:span>
</text:p>

This actually does not happen in 3.6.

Arch Linux 64-bit
Version: 6.1.0.0.alpha0+
Build ID: 26783527823883ccd5bbf3b9e014a0a3c1e3a022
CPU threads: 8; OS: Linux 4.15; UI render: default; VCL: kde4; 
Locale: fi-FI (fi_FI.UTF-8); Calc: group
Built on February 16th 2018

Arch Linux 64-bit
Version 3.6.7.2 (Build ID: e183d5b)
Comment 4 Buovjaga 2018-02-17 17:19:26 UTC
Hmm, tested with 4.3.0 beta1 and it changes the locale to en-US

<style:style style:name="T1" style:family="text">
	<style:text-properties fo:language="en" fo:country="US" />
</style:style>
Comment 5 Buovjaga 2018-07-10 19:05:33 UTC
Bibisected with Linux 42max. Unfortunately there were lots of skipped commits due to refusal to launch and a peculiar case where the file opened with missing data.

Git finally regurgitated a list of 52 potential bad commits. I edited the list so it had "git log" at the beginning of each line and then triple-click copied & pasted and examined each and every commit. The suspicious commit was of course the last one I checked:

commit d23fb81d81dd4bddd1ddb095fae729a7e10e249f
Author: Matthew Francis <mjay.francis@gmail.com>
Date:   Sat Sep 5 18:57:48 2015 +0800

    source-hash-4935422b410757bb4920b98a2d81da3c11b8e3d7
    
    commit 4935422b410757bb4920b98a2d81da3c11b8e3d7
    Author:     Eike Rathke <erack@redhat.com>
    AuthorDate: Tue Jul 9 15:48:10 2013 +0200
    Commit:     Eike Rathke <erack@redhat.com>
    CommitDate: Tue Jul 9 15:52:21 2013 +0200
    
        read/write ODF *:script* and *:rfc-language-tag*
    
        This prepares to be able to read/write the attributes, it does not
        enable proper handling of unknown language tags yet. An unknown tag
        usually falls back to SYSTEM locale.

Adding Cc: to Eike Rathke
Comment 6 QA Administrators 2019-09-02 09:26:03 UTC Comment hidden (obsolete)
Comment 7 Regina Henschel 2020-09-12 22:38:04 UTC
The problem still exists in Version: 7.1.0.0.alpha0+ (x64)
Build ID: 1e0cfd5662d95cea84e80e4fe10d52c3b1101ae6
CPU threads: 8; OS: Windows 10.0 Build 18362; UI render: Skia/Vulkan; VCL: win
Locale: de-DE (en_US); UI: en-US
Calc: CL

But the bug is not related to ODF. Original and resaved files are valid ODF. Therefore I remove blocks 108198.
Comment 8 QA Administrators 2022-09-14 03:37:51 UTC Comment hidden (obsolete)
Comment 9 Kevin Suo 2024-02-18 15:29:01 UTC
I can still reproduce this on latest master
Version: 24.8.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 95e6f942b3fa5c6f3e5473ac474a4702ab815502
CPU threads: 4; OS: Linux 6.5; UI render: default; VCL: gtk3
Locale: zh-CN (zh_CN.UTF-8); UI: zh-CN
Calc: threaded
Comment 10 Kevin Suo 2024-02-19 06:12:17 UTC
I was able to set up an old Fedora 19 system to build the old LibreOffice 4.2 code.

bisect bad 4935422b410757bb4920b98a2d81da3c11b8e3d7
bisect good cc7a301dad831f8113cc3d737e2f4d23061a65ac

The bad commit:

commit 4935422b410757bb4920b98a2d81da3c11b8e3d7
Author: Eike Rathke <erack@redhat.com>
Date:   Tue Jul 9 15:48:10 2013 +0200

    read/write ODF *:script* and *:rfc-language-tag*
    
    This prepares to be able to read/write the attributes, it does not
    enable proper handling of unknown language tags yet. An unknown tag
    usually falls back to SYSTEM locale.



Adding Eike Rathke to cc: would you please take a look?
Comment 11 Kevin Suo 2024-02-19 07:20:00 UTC
I noticed the following console output which may be relevant:

warn:i18nlangtag:1769941:1769941:i18nlangtag/source/languagetag/languagetag.cxx:1435: LanguageTagImpl::convertLocaleToLang: with bAllowOnTheFlyID invalid 'fr-fr-FR'
Comment 12 Eike Rathke 2024-02-19 14:18:55 UTC
That fr-fr-FR seems to stem from a bad style, either

    <number:date-style number:automatic-order="true" number:country="FR" number:language="fr" number:script="fr" style:name="N79">

or, there are other similar bad paragraph and text styles with

      fo:country="FR" fo:language="fr" fo:script="fr"

with *:script="fr" where "fr" is not an ISO 15924 script code; in fact that attribute shouldn't be present at all for fr-FR.

There's also

    <style:style style:family="paragraph" style:name="a7e81ac" style:parent-style-name="Standard">      
      <style:text-properties fo:country="fr" fo:language="FR"/>
    </style:style>

note that the lower-case country "fr" and the upper-case language "FR" are switched; I'm quite sure we never wrote that.

Also odd is that content.xml contains ^M carriage return characters as if it was edited manually on Windows and repackaged.

The generator is said to be
LibreOffice/5.4.4.2$Linux_X86_64 LibreOffice_project/40m0$Build-2

Does the behaviour described in comment 0 persist if content.xml is edited and all number:script="fr" are removed?
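
To make the diagnosis above concrete, here is a small sketch (an illustration only, not LibreOffice code) that scans an XML part for *:script attributes whose value does not have the four-letter shape of an ISO 15924 code such as "Latn". The function names are hypothetical. The check is on shape only and does not consult the ISO 15924 registry. `naive_tag` merely shows how joining language, script and country with hyphens yields the 'fr-fr-FR' string from the warning in comment 11; how LibreOffice actually assembles the tag is not claimed here.

```python
import re
import xml.etree.ElementTree as ET

# ISO 15924 script codes are four letters (e.g. "Latn"); this checks shape only.
SCRIPT_SHAPE = re.compile(r"[A-Za-z]{4}")


def suspicious_scripts(xml_bytes):
    """Return (element local name, value) for each *:script value that is not four letters."""
    root = ET.fromstring(xml_bytes)
    hits = []
    for elem in root.iter():
        for key, value in elem.attrib.items():
            local = key.rsplit("}", 1)[-1]  # drop the "{namespace}" part
            if local == "script" and not SCRIPT_SHAPE.fullmatch(value):
                hits.append((elem.tag.rsplit("}", 1)[-1], value))
    return hits


def naive_tag(language, script, country):
    """Illustrative only: join the non-empty parts with hyphens."""
    return "-".join(part for part in (language, script, country) if part)
```

For the date style quoted above, `naive_tag("fr", "fr", "FR")` gives "fr-fr-FR", while leaving the script out gives the expected "fr-FR".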
Comment 13 Eike Rathke 2024-02-19 14:20:50 UTC
all *:script="fr"
fwiw..
Comment 14 Kevin Suo 2024-02-19 16:33:12 UTC
> Does the behaviour described in comment 0 persist if content.xml is edited and all *:script="fr" are removed?

I confirm that the behaviour described in comment 0 is gone if content.xml is edited and all the *:script="fr" attributes are removed.
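
For anyone who wants to repeat that check without hand-editing, a rough sketch of the edit follows. It is an assumption about one way to do it, not the procedure actually used here: the names `strip_script_fr` and `rewrite_odt` are hypothetical, the regular expression only handles double-quoted attribute values, and only content.xml is touched. The package is copied member by member so that "mimetype" keeps its position and compression method.

```python
import re
import zipfile

# Matches e.g. ' fo:script="fr"' or ' number:script="fr"' (double quotes only).
SCRIPT_FR = re.compile(r'\s+[A-Za-z_][\w.-]*:script="fr"')


def strip_script_fr(xml_text):
    """Remove every prefixed script="fr" attribute from the XML text."""
    return SCRIPT_FR.sub("", xml_text)


def rewrite_odt(src, dst):
    """Copy the ODT package from src to dst, cleaning content.xml on the way."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for info in zin.infolist():
            data = zin.read(info.filename)
            if info.filename == "content.xml":
                data = strip_script_fr(data.decode("utf-8")).encode("utf-8")
            # Keep each member's original compression (mimetype stays stored).
            zout.writestr(info.filename, data, compress_type=info.compress_type)
```

A text substitution is used instead of parsing and re-serializing the XML so that the rest of content.xml, including its namespace prefixes, stays byte-for-byte as it was.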

So that ODT file was not correct ODF; maybe this should be closed as NOTABUG? Or should we improve our code to auto-correct such *:script="fr" stuff in our import filter?

Christian.gruemme, would you please clarify how the test document in attachment 139435 was generated? If it was generated by LibreOffice, is the ODT file as it was, or did you manually edit the content.xml and re-zip it?