This bug is the "LibO twin" of OOo issue 101224, which I opened in April 09:
https://issues.apache.org/ooo/show_bug.cgi?id=101224

I'm replicating it here to propose it as an easy hack:
http://wiki.documentfoundation.org/Development/Easy_Hacks#Progress

DESCRIPTION

LibO autocorrect replacement tables are stored as .dat files in this path under Windows:
"....User\LibreOffice 3\user\autocorr"

There is one “universal replacement table” called "acor_.dat" whose entries are applied in whatever language you are writing in.

There are also separate replacement tables for each language variant:
- UK English --> acor_en-GB.dat
- USA English --> acor_en-US.dat

The same applies to all Spanish, German and even Italian subvariants (e.g. Italian --> acor_it-IT.dat; Swiss Italian --> acor_it-CH.dat).

However, those .dat files are not shared between variants. This separate subtype policy must be kept because of the minority of words that are spelled differently among language variants.

For example I could set a:
- “colour -> color” entry in the acor_en-US.dat file and a
- “color -> colour” entry in the acor_en-GB.dat file

There is however the vast majority of words that have exactly the same spelling. Let's take an example: “yellow”, which is the same in England, USA, South Africa, Australia, Canada etc.

If you make a typing error like “yrllow”, you have to set an autocorrect entry in each of the localized English .dat files, which is too time consuming.

It would be much more user friendly and time saving to have a “non localized” "acor_en.dat" file whose entries are shared by all English subtypes.

It would be great to have something similar to the “universal replacement table” acor_.dat but restricted to certain language groups, something like:
- acor_en.dat working on UK, US, AUS etc. English variants
- acor_it-ALL.dat working on both Italian and Swiss Italian

TECHNICAL INFO

LibO developer John Holesovsky AKA Kendy gave me some interesting hints on the developer mailing list about how to fix the problem. Here's what he said:

The code you want to play with is editeng/source/misc/svxacorr.cxx .
http://docs.libreoffice.org/editeng/html/svxacorr_8cxx_source.html

You probably want to tweak SvxAutoCorrect::SearchWordsInList() so that it falls back to 'en' in case the word is not found in 'en_US', or something like that; but you will probably have to tweak some code around that too, in order to load the shared acor_XY.dat in addition to acor_XY_AB.dat, etc.

I don't think it is hard; but some constructs used in that piece of code are not too obvious, my favorite is this condition:

    else if( ( FStatHelper::IsDocument( sUserDirFile ) ||
               FStatHelper::IsDocument( sShareDirFile =
                   GetAutoCorrFileName( eLang, sal_False, sal_False ) ) ) ||
             ( sShareDirFile = sUserDirFile, bNewFile ))
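To make the proposed fallback concrete, here is a minimal sketch of the order in which tables could be consulted for a document language. It is not taken from svxacorr.cxx: the helper name candidateTables is hypothetical, and the shared file name (acor_en.dat) simply follows the naming proposed in the description.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper, not LibreOffice code: list the replacement tables to
// consult for a document language tag such as "en-US", most specific first.
std::vector<std::string> candidateTables(const std::string& langTag)
{
    std::vector<std::string> files;
    if (!langTag.empty())
    {
        // Table for the exact variant, e.g. acor_en-US.dat.
        files.push_back("acor_" + langTag + ".dat");

        // Proposed shared table for the language group, e.g. acor_en.dat.
        const std::string::size_type dash = langTag.find('-');
        if (dash != std::string::npos)
            files.push_back("acor_" + langTag.substr(0, dash) + ".dat");
    }
    // Universal table that applies to every language.
    files.push_back("acor_.dat");
    return files;
}
```

Under these assumptions, candidateTables("en-US") yields acor_en-US.dat, acor_en.dat, acor_.dat in that order, so an entry in the variant file would shadow the same entry in the shared file.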
easy-hack-ising :-) I think there is enough here to go on - I'm happy to mentor.
Whoever tries hacking this should probably not alter the current behaviour of adding autocorrect entries via the right-click menu, which adds entries to the current language of the document.

I mean, if you write a document in American English and you accept a "right click" autocorrect suggestion, this should go (like it does right now) into the acor_en-US.dat file.

I think of the common acor_en.dat for all English language variants as a database that is accessible only through the replacement table, just like the acor_.dat file (the common autocorrect database for all languages).
I've taken a while looking at this and don't feel confident enough in what I know about the codebase to feel like I can commit a fix.
@jam@jamandbees.net sorry to hear that, but at least you tried so I appreciate your efforts anyway.
Deleted "Easyhack" from summary
adding LibreOffice developer list as CC to unresolved EasyHacks for better visibility. see e.g. http://nabble.documentfoundation.org/minutes-of-ESC-call-td4076214.html for details
Removing comma from whiteboard (please use a space to delimit values in this field) https://wiki.documentfoundation.org/QA/Bugzilla/Fields/Whiteboard#Getting_Started
Tommy27: I tried to use a generic solution for fdo#79276. Would you have some time to give it a try? (needs 4.2.6/4.3.1) For example, I don't know if it's ok with an old profile.
WOW!!! Well done Julien, your fix for Bug 79276 could represent a solution for the current bug as well.

If I copy one of those autocorrect .dat files, manually remove the final sublocalization tag (i.e. acor_it-IT.dat → acor_it.dat) and place it in the autocorr subfolder of the user profile, it works as an unlocalized autocorrect list for that language.

That means that autocorrect entries in the acor_it.dat file will be applied both in documents written in Italian (Italy) and in Italian (Switzerland).

The same will apply to an acor_en.dat file, which could be a universal autocorrect replacement list for all English variants as well.

The only thing missing is that those unlocalized acor .dat files are currently not shown in the UI under Tools/Autocorrect options/Replace, so you have no way to edit, add or remove their entries.

If you find a way to make those unlocalized acor .dat files editable in the UI, the fix will be complete.

We also have to decide how those unlocalized autocorrect lists should look in the language list. I mean, we have Italian (Italy) for acor_it-IT.dat and Italian (Switzerland) for acor_it-CH.dat; what should we show for acor_it.dat?

Maybe we should keep it simple and display it only as Italian, rather than Italian (unlocalized), Italian (common), Italian (General) etc.
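As a rough illustration of the labelling question above, the sketch below derives a language tag from a table file name and turns it into a list label, showing a plain "Italian" for acor_it.dat. Everything here is an assumption for illustration: the function names are hypothetical, and the two small name tables are placeholders, not LibreOffice's real language list.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper: extract the language tag from a table file name,
// e.g. "acor_it-IT.dat" -> "it-IT", "acor_it.dat" -> "it".
// Returns false when the name does not look like an autocorrect table.
bool tagFromFileName(const std::string& fileName, std::string& tag)
{
    const std::string prefix = "acor_";
    const std::string suffix = ".dat";
    if (fileName.size() < prefix.size() + suffix.size())
        return false;
    if (fileName.compare(0, prefix.size(), prefix) != 0)
        return false;
    if (fileName.compare(fileName.size() - suffix.size(), suffix.size(), suffix) != 0)
        return false;
    tag = fileName.substr(prefix.size(),
                          fileName.size() - prefix.size() - suffix.size());
    return true;
}

// Hypothetical helper: build the label shown in the language list.
// A tag without a region gets the bare language name.
std::string displayLabel(const std::string& tag)
{
    // Placeholder name tables, for illustration only.
    static const std::map<std::string, std::string> languages = {
        {"it", "Italian"}, {"en", "English"}};
    static const std::map<std::string, std::string> regions = {
        {"IT", "Italy"}, {"CH", "Switzerland"}, {"GB", "UK"}, {"US", "USA"}};

    const std::string::size_type dash = tag.find('-');
    const std::string lang = tag.substr(0, dash);
    const auto l = languages.find(lang);
    std::string label = (l != languages.end()) ? l->second : lang;

    if (dash != std::string::npos)
    {
        const std::string region = tag.substr(dash + 1);
        const auto r = regions.find(region);
        label += " (" + (r != regions.end() ? r->second : region) + ")";
    }
    return label;
}
```

With these placeholder tables, acor_it.dat would be listed as "Italian" next to "Italian (Italy)" and "Italian (Switzerland)", which is the simple option suggested above.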
Tommy27: I imagined generic language files more as a base for standard or user dictionaries, not as a generic dictionary per se. However, I'm not an i18n expert at all, so I'll let Andras speak.

For example, I put a selection in en-US and another in fr-FR, then I added 1 word for each. I found this result in wordbook/standard.dic (from a brand new profile with master sources updated some days ago):

OOoUserDict1
lang: <none>
type: positive
---
stiro
stari

Is it ok or not? I don't know (I hadn't made this test before).

Andras: I put you in cc of this one because I'm not sure what we should do now.
Sorry, but I do not understand what you are talking about. These are lists for automatic correction of typing errors, not dictionaries. See the "yellow" example in the original description.
Oops, forget what I said, of course you're right :-)
(Sorry again for my previous comment, I was focused on dictionaries.) A second issue about editing a generic unlocalized autocorrect list is what to do with the localized ones (if they've been generated) once the unlocalized autocorrect list is changed. Should we try to propagate the change to the localized autocorrect lists? If yes, what should we do if there's a conflict?
An unlocalized file should have its own list and should not be mixed with the localized files. The reason was explained here:

(In reply to comment #0)
> ...
>
> there are also separate replacement tables for all language variants:
> however those .dat files are not mutual... this separate subtype policy must
> be kept because of the minority of words that have different spelling among
> language variants
>
> For example i could set a:
> - “colour -> color” entry in the acor-en_US.dat file and a
> - “color -> colour” entry in the acor-en_GB.dat file
>
> there's however the vast majority of words that have exactly the same
> spelling... let's take an example: “yellow” which is the same in England,
> USA, South Africa, Australia, Canada etc. etc.
>
> if you come with a typing error like “yrllow” you should set an autocorrect
> entry in each of the localized english .dat files... it would be too time
> consuming...
>
> It would be much user friendly and time saving to have a “non localized”
> "acor-en.dat" file whose entries are shared by all english subtypes.
>
> it would be great to have something similar to the the “universal
> replacement table” acor_.dat but restricted to certain language groups.
> something like:
>
> - acor_en.dat working on both UK, US, AUS etc. ect. english variants
> - acor_it.dat working both on italian and swiss language
Tommy27:
Just to be sure I understand, it would mean:
- a first file for the initial unlocalized list
- a second file for the unlocalized autocorrect list if you edit the unlocalized list
- a third file for your localized autocorrect list if you edit the localized list
=> So the autocorrect process should search in the second and third file first (in which order? A user could have made a mistake and entered the same word to replace but with a different replacement), and if neither of these files exists, it should search in the first file only.
Is that correct?
First of all we have to define the exact position of those autocorrect files.

Default replacements are under
...\LibreOffice 4\share\autocorr

These are used the first time the autocorrect engine is used and are copied into the user profile, which should be under
...LibreOffice 4\user\autocorr

Further edits (addition of new entries, removal or modification of existing ones) will affect the files in the "user" profile, not those under "share".

So in a French scenario, since you have an unlocalized version under "share", which is acor_fr.dat, when you use it for the first time in a French (France) document it will be copied under "user" as acor_fr-FR.dat and will apply just to French (France) documents and not to other variants like French (Canada).

If you want an unlocalized version of the French autocorrect list, you have to manually copy acor_fr.dat from "share" and place it under "user". This will work and apply replacements both in French (France) and in French (Canada) documents.

The problem is that currently you don't see the unlocalized French list in the autocorrect options dropdown menus, so further edits are not possible. The code should be tweaked to display unlocalized language lists as well.

Currently you see:
French (France) --> acor_fr-FR.dat
French (Canada) --> acor_fr-CA.dat
etc.

while you should be able to see:
French --> acor_fr.dat
French (France) --> acor_fr-FR.dat
French (Canada) --> acor_fr-CA.dat
etc.
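The copy-on-first-use behaviour described above can be modelled in a few lines. This is only a sketch of the behaviour as reported in this comment, not of the actual LibreOffice code: the directories are modelled as sets of file names, and the names Profile, CopyResult and firstUse are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy model: each directory is just the set of file names it contains.
struct Profile
{
    std::set<std::string> share; // ...\share\autocorr (shipped defaults)
    std::set<std::string> user;  // ...\user\autocorr  (receives all edits)
};

struct CopyResult
{
    std::string userFile;   // localized file in the user profile
    std::string seededFrom; // default it was copied from ("" if none)
};

// Hypothetical helper: on first use for a document language, create the
// localized user file, seeded from the best matching default under "share".
CopyResult firstUse(Profile& profile, const std::string& langTag)
{
    CopyResult result;
    result.userFile = "acor_" + langTag + ".dat";
    if (profile.user.count(result.userFile))
        return result; // already in the user profile, nothing is copied

    // Unlocalized default for the language group, e.g. acor_fr.dat.
    const std::string base =
        "acor_" + langTag.substr(0, langTag.find('-')) + ".dat";

    if (profile.share.count(result.userFile))
        result.seededFrom = result.userFile;
    else if (profile.share.count(base))
        result.seededFrom = base;

    profile.user.insert(result.userFile);
    return result;
}
```

In this model, a profile whose "share" folder holds only acor_fr.dat ends up with acor_fr-FR.dat under "user" after the first French (France) use, and no acor_fr.dat ever appears under "user" unless it is copied there by hand.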
(In reply to comment #15)
> Tommy27:
> Just to be sure to understand, it would mean:
> - a first file for initial unlocalized file
> - a second file for unlocalized autocorrect if you edit the unlocalized list
> - a third file for your localized autocorrect if you edit localized list
> => So autocorrect process should search in second and third file first (in
> which order? A user could have made a mistake and put a same word to replace
> but a different replacement) and if there's none of these files, should
> search in first file only
> Is it correct?

I made a test to see how the code behaves when there are conflicts.

Let's say you have:
color → colour in acor_en-GB.dat
colour → color in acor_en-US.dat

Each one will apply only in British English and American English documents respectively, with no conflicts.

If you instead have:
color → colour in acor_en.dat
it will apply to American English documents as well.

So it means that currently the autocorrect engine looks first in the unlocalized version (acor_en.dat) rather than the localized version (acor_en-US.dat), which doesn't look good to me.

In my opinion, when you have conflicts the autocorrect engine should look first in the autocorrect list that is specific to the document language, in this case acor_en-US.dat, and only afterwards in the unlocalized version (acor_en.dat), if there's no replacement in the previous file.
(In reply to comment #17)
> (In reply to comment #15)
> > Tommy27:
> > Just to be sure to understand, it would mean:
> > - a first file for initial unlocalized file
> > - a second file for unlocalized autocorrect if you edit the unlocalized list
> > - a third file for your localized autocorrect if you edit localized list
> > => So autocorrect process should search in second and third file first (in
> > which order? A user could have made a mistake and put a same word to replace
> > but a different replacement) and if there's none of these files, should
> > search in first file only
> > Is it correct?
>
> I made a test to see how the code behaves in front of conflicts.
>
> let's say you have:
> color → colour in acor_en-GB.dat
> colour → color in acor_en-US.dat
>
> each one will apply only respectively in british english and american
> english documents with no conflicts.
>
> If you instead have a:
> color → colour in acor_en.dat
> it will apply to american english documents as well
>
> so it means that currently the autocorrect engine looks first in the
> unlocalized version (acor_en.dat) rather than the localized version
> (acor_en-US.dat) which doesn't look good to me.
>
> In my opinion when you have conflicts, the autocorrect engine should look
> first in the autocorrect list which is specific for the document language,
> in this case (acor_en-US.dat), and only in a second time in the unlocalized
> version (acor_en.dat) if there's no replacement in the previous file.

With a fresh build of master sources + French UI by default, here are my tests.

Open autocorrect for French (France), change "afirmer => affirmer" to "afirmer => afffirmer".
I get "afffirmer" (3 f) when I type "afirmer".
I close LO and reopen it, and the change is still the localized one.

I'm quite lost here :-(
Created attachment 104939
autocorrect testkit

Hi Julien,
probably my test in comment 17 was not 100% accurate. Try replicating this new experiment.

a- download the attached .zip file, which contains 3 minimal autocorrect .dat files:

1- acor_und.dat
it has a single entry: test → test1
it will apply in any document regardless of the language, since the acor_und.dat file is the global autocorrect list (you can find it at the top of the language dropdown list in the autocorrect replacement table under [All]; I don't know how that is localized in French)

2- acor_en-GB.dat
it has a single entry: test → test2
it will apply only in documents where the language is English (UK)

3- acor_en.dat
it has a single entry: test → test3
it should apply in any document written in any of the English variants (UK, US, Australia etc.)

b- place these 3 .dat files in the autocorr subfolder of the user profile

c- load a blank new Writer document and set the language to English (UK)

d- type test and see how it gets autocorrected

e- compare with my results with LibO 4.2.6.2 under Win7x64:
- all 3 .dat files present → test is corrected into test2, so the localized variant acor_en-GB.dat wins over acor_und.dat and acor_en.dat
- remove the und.dat file → again you get test2, so en-GB.dat wins over en.dat
- remove only the en-GB.dat file → you get test3, so en.dat wins over und.dat
- remove both en-GB.dat and und.dat → again you get test3, since only en.dat is left
- remove both en-GB.dat and en.dat → you get test1, since only und.dat is left

So basically, in case of autocorrect conflicts the “en-GB” list wins over the “en” list, which in turn wins over the “und” list. I agree this is the correct behavior, since the language of the document should determine which autocorrect list is consulted first.

f- same results if I rename those files to match the Italian locale (i.e. acor_it.dat and acor_it-IT.dat) and I write an Italian (Italy) document.

g- different results if you do the same trick with German or French locales. In those cases the renamed acor_de.dat and acor_fr.dat files have no effect, even if you remove the und.dat, the de-DE.dat and the fr-FR.dat files.

So it seems that the unlocalized variant .dat files don't work in some language subgroups. This is strange and inconsistent with the results for Italian and English, where it worked with no issue.

Do you have any thoughts about that?
P.S. when you remove some .dat files as described in the tests above always remember to close LibO first and then restart the program again.
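The precedence observed with the test kit (en-GB over en over und) can be reproduced with a small in-memory model. This is a sketch under stated assumptions, not the code in svxacorr.cxx: the type names Table and Tables and the function autocorrect are hypothetical, and each .dat file is represented by a map keyed by its language tag.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using Table = std::map<std::string, std::string>; // typed word -> replacement
using Tables = std::map<std::string, Table>;      // language tag -> table ("und" = all languages)

// Hypothetical lookup: search the document language first, then the bare
// language, then the global list. Returns the word unchanged when no
// present table has an entry for it.
std::string autocorrect(const Tables& tables,
                        const std::string& docLang,
                        const std::string& word)
{
    std::vector<std::string> order;
    order.push_back(docLang);                     // e.g. "en-GB"
    const std::string::size_type dash = docLang.find('-');
    if (dash != std::string::npos)
        order.push_back(docLang.substr(0, dash)); // e.g. "en"
    order.push_back("und");                       // global list

    for (const std::string& tag : order)
    {
        const auto table = tables.find(tag);
        if (table == tables.end())
            continue;                             // that .dat file is absent
        const auto hit = table->second.find(word);
        if (hit != table->second.end())
            return hit->second;
    }
    return word;
}
```

Loading the three test kit entries (und: test → test1, en-GB: test → test2, en: test → test3) and removing tables one by one gives test2, test3 and test1 in the same situations as in step e above. The model does not explain why the French and German files behave differently.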
First, I don't think you should manually copy files into user for the tests.

Then, I don't have LO at work so I can't check, but there can't be any unlocalized files in user\autocorrect. With a brand new profile, all autocorrect files (unlocalized and localized) are in share\autocorrect; adding an entry creates a localized autocorrect file in user\autocorrect. This last one is used in priority.

Again, perhaps I'm missing something or am wrong, since I'm not at home to check.
(In reply to comment #21)
> First, I don't think you should manually copy files in user for the tests

Trust me, it's harmless. I've done it multiple times.

> Then, I don't have LO at work so can't check but there can't be any
> unlocalized files in user\autocorrect.
> With the brand new profile all autocorrect files (unlocalized and localized)
> are in share\autocorrect, adding an entry creates a localized autocorrect
> file in user\autocorrect. This last one is used in priority.

It can't be there, since you cannot create a brand new one. The workaround is to add an entry in a language you don't use (let's say Icelandic) and then rename the .dat file that is created inside the user profile. This is what I did to create the acor_en.dat file, and it worked.

What we miss at the moment is the ability to directly create an unlocalized variant of an autocorrect list, since in the language dropdown menu of the replacement table you can only select localized versions of languages like English (UK), English (US), English (Australia) etc., and you don't have the chance to select a plain English item with no indicated variant.

I think that we should tweak that UI and allow support for variantless languages.

> Again, perhaps I miss something or am wrong since I'm not at home to check.

Why don't you download the portable LibreOffice version from WinPenPack? The link is here: http://sourceforge.net/projects/winpenpack/files/X-LibreOffice/releases/

Then you can put it on a USB key and bring it with you everywhere.
There are 2 points to distinguish:

1) In the basic process, there can't be any unlocalized file in user/autocorrect. It seems you may get weird results only if you bypass the process by copying an unlocalized file into it.

2) I understand you'd like to edit the unlocalized file, and I didn't try to implement that. Being able to edit it would indeed mean there could be unlocalized files in user/autocorrect (without manually copying). I don't think I'd be able to do it, since it means:
- being able to list unlocalized languages, as you said
- preventing the conflicts you described when there are localized and unlocalized files (again as you said)

I'm sorry to say I can't help more on this last point :-(

Andras/Michael: any thoughts?
(In reply to comment #23)
> There are 2 points to distinguish:
> 1) In the basic process, there can't be any unlocalized file in
> user/autocorrect. It seems you may have weird result only if you bypass the
> process by copying an unlocalized file in it.

Yes, your fix for Bug 79276 had this side effect and allowed me to manually "hack" the user profile and use an unlocalized autocorrect file inside it. The weird thing is that this works with some languages (Italian, English) and not with others (French, German). I don't understand why...

> 2) I understand you'd like to edit unlocalized file and I didn't try to
> implement it. Being able to edit it would mean indeed mean there could be
> unlocalized files in user/autocorrect (without manually copying). I don't
> think I'd be able to do it since it means:
> - to be able to list unlocalized languages as you said,

Exactly, that is what I wanted to be implemented when I opened this Bug 44580.

> - prevent conflicts you described when there are localized and unlocalized
> files (again as you said)

I think it's up to the user to avoid autocorrect conflicts. You can already have conflicts using the acor_UND.dat file, but it's your own fault if you set discordant autocorrect replacements in different lists.

> I'm sorry to tell I can't help more on this last point :-(

You already did a lot.
Adding self to CC if not already on
Migrating Whiteboard tags to Keywords: (easyHack, difficultyBeginner, skillCpp, topicCleanup)
JanI is default CC for Easy Hacks (Add Jan; remove LibreOffice Dev List from CC) [NinjaEdit]