Bug 58060 - autocorrect entries in the [All] list are saved as English(USA)
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Linguistic
Version: 4.0.0.0.beta1 (earliest affected)
Hardware: x86 (IA32) Windows (All)
Importance: medium major
Assignee: Eike Rathke
URL:
Whiteboard: target:4.1.0 target:4.0.0.0.beta0
Keywords: regression
Depends on:
Blocks: mab4.0
Reported: 2012-12-09 20:14 UTC by tommy27
Modified: 2012-12-15 14:32 UTC

See Also:
Crash report or crash signature:


Attachments
test "acor_.dat" (619 bytes, application/octet-stream)
2012-12-13 19:20 UTC, tommy27

Description tommy27 2012-12-09 20:14:24 UTC
Version 4.0.0.0.beta1+ (Build ID: 7061f72159e38e76134bc7fefc8a75cd233889c)
Clean install on Windows Vista 64 Home Premium SP1 (Intel Core 2 Duo CPU P8400 @ 2.26GHz, 4GB RAM).


Start LibO 4.0
click on “Tools/Autocorrect Options”

This opens a replacement table with the autocorrect entries stored in the “acor” files in “C:\Program Files (x86)\LOdev 4.0\share\autocorr” (e.g. acor_de-DE.dat, acor_en-GB.dat, acor_it-IT.dat, etc.).

Each .dat file contains the default set of autocorrect entries for one language, as shipped with LibO and available at first start.

If you add a new custom entry to one of those autocorrect lists, let's say “German (Germany)”, a new acor_de-DE.dat file is created in the user profile under:
 
C:\Users\YourName\AppData\Roaming\LOdev\4\user\autocorr

the new .dat file will contain the default entries previously stored in “share\autocorr” and the new custom entry.

Now open again the replacement table from “Tools/Autocorrect Options” and scroll to the top to the “[All]” list and add a new autocorrect entry.

LibO 4 will create an “acor_en-US.dat” in the user profile

The expected behavior would be the creation of an “acor_.dat” file, as happens in LibO 3.6.4 and previous releases.

Now open the replacement table again and select the “English (USA)” list...
you will see that it contains the same entries as the “[All]” list...

basically the [All] list is saved as an English (USA) list.

This is going to mess up the autocorrect function and IMHO should be considered a LibO 4 MAB.
Comment 1 tommy27 2012-12-11 16:49:42 UTC
the bug was not present on an older master release
Version 3.7.0.0.alpha0+ (Build ID: 24761a6) 

just tested on WinXP 32bit.
new autocorrect entries saved in the [All] list are correctly stored as "acor_.dat" and not as "acor_en-US.dat"

I hope this information will help to exactly identify when the regression took place in the 3.7 --> 4.0 development
Comment 2 Michael Meeks 2012-12-12 22:09:38 UTC
So the problem creeps in in the range:

git log 24761a6..7061f72159e38e76134bc7fefc8a75cd233889c | grep Author | wc -l

five thousand commits or so :-) If I was guessing, I'd think:

commit 378e437fbe313e87b7e56f8f0a1fc4009470679c
Author: Eike Rathke <erack@redhat.com>
Date:   Fri Nov 16 23:22:11 2012 +0100

    use LanguageTag
    
    Change-Id: If056193c803f70f8707373ed7ff7b1abbf953852

1       1       editeng/source/misc/svxacorr.cxx

diff --git a/editeng/source/misc/svxacorr.cxx b/editeng/source/misc/svxacorr.cxx
index 3f2298a..d8ae94c 100644
--- a/editeng/source/misc/svxacorr.cxx
+++ b/editeng/source/misc/svxacorr.cxx
@@ -1905,7 +1905,7 @@ sal_Bool SvxAutoCorrect::FindInCplSttExceptList(LanguageType eLang,
 String SvxAutoCorrect::GetAutoCorrFileName( LanguageType eLang,
                                             sal_Bool bNewFile, sal_Bool bTst ) const
 {
-    String sRet, sExt( MsLangId::convertLanguageToIsoString( eLang ) );
+    String sRet, sExt( LanguageTag( eLang ).getBcp47() );
     sExt.Insert('_', 0);
     sExt.AppendAscii( ".dat" );
     if( bNewFile )

Might be related; is it possible that some corner-case LanguageType enumeration like LANGUAGE_DONTKNOW is getting mapped differently there:

sal_Bool SvxAutoCorrect::AddCplSttException( const String& rNew,
                                        LanguageType eLang )
...
        else if(CreateLanguageFile(LANGUAGE_DONTKNOW, sal_True))
            pLists = pLangTable->find(LANGUAGE_DONTKNOW)->second;

Is liblangtag using the default instead of empty-string perhaps ?
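
To make the suspected mechanism concrete, here is a minimal, self-contained sketch. It is not LibreOffice code: the `Lang` enum, `resolvingTag`, `plainTag` and `autoCorrFileName` are hypothetical stand-ins, and the assumption that the new converter falls back to a default locale for an unknown language is exactly the guess made above.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for a few LanguageType values (illustration only).
enum class Lang { DontKnow, EnglishUS, GermanDE };

// Assumed behaviour of a "resolving" converter: an unknown language is
// silently replaced by a default locale instead of yielding an empty tag.
std::string resolvingTag(Lang lang)
{
    switch (lang)
    {
        case Lang::GermanDE:  return "de-DE";
        case Lang::EnglishUS: return "en-US";
        case Lang::DontKnow:  return "en-US"; // fallback to the default locale
    }
    return "en-US";
}

// Converter that keeps the older behaviour: unknown maps to an empty string.
std::string plainTag(Lang lang)
{
    return lang == Lang::DontKnow ? std::string() : resolvingTag(lang);
}

// Roughly mirrors the shape of GetAutoCorrFileName: "acor_" + tag + ".dat".
std::string autoCorrFileName(const std::string& tag)
{
    return "acor_" + tag + ".dat";
}

// Small check of the two outcomes discussed in this report.
void checkFileNames()
{
    assert(autoCorrFileName(resolvingTag(Lang::DontKnow)) == "acor_en-US.dat");
    assert(autoCorrFileName(plainTag(Lang::DontKnow)) == "acor_.dat");
}
```

Under these assumptions the "[All]" list and the "English (USA)" list end up with the same file name, which would match the reported symptom.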
Comment 3 Michael Meeks 2012-12-12 22:27:41 UTC
As suspected; fixed in master and will pick to -4-0.

Eike - it worries me that other places using DONTKNOW will also behave differently - are you aware of that ?

Tommy - thanks so much for testing it nicely, reporting cleanly and narrowing down the commit range well.

Good work.
Comment 4 Not Assigned 2012-12-12 22:28:54 UTC
Michael Meeks committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=a61928d2f7512d573e598b23c7fd3cf341f97780

fdo#58060 - use empty-string for LANGUAGE_DONTKNOW ie. acorr_.dat



The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 5 Not Assigned 2012-12-12 22:34:38 UTC
Michael Meeks committed a patch related to this issue.
It has been pushed to "libreoffice-4-0":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=08a3a5ee5b64105bb1b690d5fa4da83bbe220337&g=libreoffice-4-0

fdo#58060 - use empty-string for LANGUAGE_DONTKNOW ie. acorr_.dat


It will be available in LibreOffice 4.0.

Comment 6 Eike Rathke 2012-12-13 11:36:34 UTC
Argh.. this is yet another (now the 4th) meaning of an empty locale or string, what a mess.. ok, adding this to the list. There are ~30 places I need to check.

Cleanest would be to have a reserved for local use "All" language 'qaa' or "dontknow" 'qdk' for this purpose instead of an empty string.

The user profile's acor_*.dat files are migrated from 3.x to 4.0, aren't they? Maybe this would be a chance to copy oldprofile/acor_.dat to newprofile/acor_qaa.dat or some such.
Comment 7 tommy27 2012-12-13 11:48:54 UTC
@Eike

In case it helps: if you copy the "acor_.dat" file from a 3.6.x user profile into the bugged 4.0 user profile, it will somehow work...

I mean, LibO 4 still reads old autocorrect entries from the "acor_.dat" file but saves new ones into "acor_en-US.dat" and loads that file when you open the replacement table.

I agree that it would be smart to rename "acor_.dat" to something more intuitive, such as "acor_all-ALL.dat" or anything else that avoids the empty-string locale.
Comment 8 Michael Meeks 2012-12-13 12:53:57 UTC
> The user profile's acor_*.dat files are migrated from 3.x to 4.0,
> aren't they? Maybe this would be a chance to copy oldprofile/acor_.dat
> to newprofile/acor_qaa.dat or some such.

Sounds like a good idea to me ! :-) acor_all.dat or something would be great [ reflects the UI name I guess ]. Once the migration bit is done it'd be a trivial fix in the svxacorr - though I'm concerned by the report that we continued to load the data with the acor_.dat name - I'd have imagined that wouldn't work.
Comment 9 Eike Rathke 2012-12-13 14:49:32 UTC
Actually there's the 'und' ISO 639 code for "undetermined" that would fit here.. so acor_und.dat

> though I'm concerned by the report that we
> continued to load the data with the acor_.dat name - I'd have imagined that
> wouldn't work.

Indeed, that's a surprise to me as well.. while at it I'll try to figure it out.
Comment 10 Eike Rathke 2012-12-13 17:54:13 UTC
Fwiw, I could not reproduce this: neither copying an existing acor_.dat nor renaming the acor_en-US.dat newly created for "[All]" to acor_.dat resulted in that file being read in.
Comment 11 tommy27 2012-12-13 18:00:08 UTC
@Eike.
I'll retest later this evening. maybe I was wrong.
Comment 12 tommy27 2012-12-13 19:20:22 UTC
Created attachment 71455 [details]
test "acor_.dat"

@Eike

I still confirm what I reported before. test it yourself.

download the attached "acor_.dat" file which contains a single autocorrect entry (Tommu --> Tommy).

place it in the autocorr subfolder of LibO 4.0.

then open a blank Writer document and type "Tommu"; it will be corrected to "Tommy".

if you open the autocorrect options menu and select the [All] list you won't see the Tommu --> Tommy entry; instead you will see the "English (USA)" autocorrect entries.

obviously the test I performed was done on the daily build before Michael's fix.
Comment 13 Not Assigned 2012-12-13 19:25:16 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "libreoffice-4-0":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=cbd9177a3985910312d4f6a4b7c2068bbab50619&g=libreoffice-4-0

fdo#58060 use acor_und.dat and LANGUAGE_UNDETERMINED


It will be available in LibreOffice 4.0.

Comment 14 Not Assigned 2012-12-13 19:25:35 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=623410669fa2d5da9a2ce4e3c4b81ce23605a6df

fdo#58060 use acor_und.dat and LANGUAGE_UNDETERMINED



Comment 15 tommy27 2012-12-13 19:30:30 UTC
regarding "acor_und.dat", the most important thing is that it should keep the same "autocorrect power" as the old "acor_.dat".

I mean, autocorrect entries in "acor_.dat" are applied in any language you write... from Arabic to Zulu... since it was intended as a list of universal autocorrect entries.

the same should happen with the "acor_und.dat"
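
The "universal list" behaviour asked for here can be sketched as a two-step lookup. This is only an illustration: `findReplacement`, the string keys and the order (language-specific list first, then "und") are assumptions, not taken from svxacorr.cxx.

```cpp
#include <cassert>
#include <map>
#include <string>

using ReplacementList = std::map<std::string, std::string>;
using ListsByTag = std::map<std::string, ReplacementList>;

// Hypothetical lookup: try the list for the current language first, then
// the language-independent "und" list. Returns nullptr if neither list has
// an entry; the pointer stays valid as long as `lists` is not modified.
const std::string* findReplacement(const ListsByTag& lists,
                                   const std::string& langTag,
                                   const std::string& word)
{
    const std::string keys[] = { langTag, "und" };
    for (const std::string& key : keys)
    {
        ListsByTag::const_iterator list = lists.find(key);
        if (list == lists.end())
            continue;
        ReplacementList::const_iterator hit = list->second.find(word);
        if (hit != list->second.end())
            return &hit->second;
    }
    return nullptr;
}

// The "Tommu" entry lives only in the "und" list but applies to any language.
void checkUniversalList()
{
    ListsByTag lists;
    lists["und"]["Tommu"] = "Tommy";
    lists["de-DE"]["dei"] = "die";
    const std::string* hit = findReplacement(lists, "zu-ZA", "Tommu");
    assert(hit != nullptr && *hit == "Tommy");
    assert(findReplacement(lists, "zu-ZA", "dei") == nullptr);
}
```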
Comment 16 Eike Rathke 2012-12-13 20:02:46 UTC
It does, it's just the language tag and internal mapping from/to ID that changed, the logic is untouched. An existing acor_.dat is copied to acor_und.dat during user profile migration from 3 to 4.
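
As a rough sketch of the renaming rule applied during migration (the helper `migratedAutoCorrName` is hypothetical, and it assumes every other acor_*.dat file keeps its name, which this report does not state):

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: given a file name from the old 3.x profile, return
// the name it would be copied to in the new 4.0 profile.
std::string migratedAutoCorrName(const std::string& oldName)
{
    // Only the language-less "[All]" file gets the new "und" tag.
    if (oldName == "acor_.dat")
        return "acor_und.dat";
    // Assumption: language-specific files keep their name.
    return oldName;
}

void checkMigrationNames()
{
    assert(migratedAutoCorrName("acor_.dat") == "acor_und.dat");
    assert(migratedAutoCorrName("acor_de-DE.dat") == "acor_de-DE.dat");
}
```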
Comment 17 tommy27 2012-12-15 14:32:24 UTC
thanks Eike. I confirm "acor_und.dat" works fine in latest LOdev 4.0 daily build.