Bug 58592 - Italian Autocorrection lacks some common mistakes
Summary: Italian Autocorrection lacks some common mistakes
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Localization
Version (earliest affected): 3.6.1.2 release
Hardware: All All
Importance: low enhancement
Assignee: Not Assigned
URL:
Whiteboard: BSA
Keywords:
Depends on:
Blocks:
 
Reported: 2012-12-20 23:26 UTC by Marco Menardi
Modified: 2013-08-23 13:22 UTC
CC List: 7 users

See Also:
Crash report or crash signature:


Attachments
Extended autocorrect DocumentList.xml for acor_it-IT.dat (6.48 KB, text/plain)
2012-12-29 14:33 UTC, Marco Menardi
Extended autocorrect DocumentList.xml for acor_it-IT.dat (6.27 KB, text/plain)
2012-12-29 15:27 UTC, tommy27
Extended autocorrect DocumentList.xml for acor_it-IT.dat (6.27 KB, text/plain)
2012-12-29 15:30 UTC, tommy27
Extended autocorrect DocumentList.xml for acor_it-IT.dat (6.25 KB, text/plain)
2012-12-29 21:18 UTC, tommy27

Description Marco Menardi 2012-12-20 23:26:12 UTC
Problem description: 
In Italian the vowel "e" can carry two different accents: "è" (grave) and "é" (acute).
Handwriting usually does not distinguish them, so picking the wrong one is a common mistake when typing on a PC.
Autocorrect should cover some very common cases; I'm fairly sure they were present some time ago (I did not notice, since I usually type them correctly).
Note that the spell checker works fine and marks the wrong spelling as such.
The missing entries are:
perchè -> perché
affinchè -> affinché
sennonchè -> sennonché


Steps to reproduce:
1. Type one of the words above: it is not replaced automatically, so the spell checker flags it as misspelled.
2. Type poichè, which is correctly converted to poiché.


Current behavior:
There is no substitution for perchè, affinchè, sennonchè.
Expected behavior:
These words should be autocorrected.
              
Operating System: Linux (Other)
Version: 3.6.1.2 release
Comment 1 tommy27 2012-12-28 21:22:28 UTC
confirmed on LibO 3.6.4 using Windows Vista 64-bit.

however I do not know if it really deserves a fix, since I think it is so easy for users to manually add these new autocorrect entries if they need them...

just my personal opinion.
Comment 2 Marco Menardi 2012-12-29 00:38:14 UTC
@tommy27
I reject this reasoning; otherwise we could remove everything from autocorrect. Those are among the most common mistakes, and we must do our best, especially when it is so easy to code (and "other" word processors already do it).
Comment 3 tommy27 2012-12-29 08:37:27 UTC
well, my point was that it's impossible to store all the possible mistakes in the default autocorrection subset... there will always be some missing...

you reported:
affinchè -> affinché
sennonchè -> sennonché

but there are other unrecognized "errors" like:
"benchè, cosicchè, finchè, giacchè, perchè, purchè" 
that require a final "é"

and other "errors" like:
"caffé, é , té"
that require a final "è"

source: http://www.accademiadellacrusca.it/it/lingua-italiana/consulenza-linguistica/domande-risposte/vademecum-sullaccento-indicarlo-pronunciarlo

however if you think there are some entries that MUST be added, look for the default autocorrect file (under Windows it is in "LibreOffice/share/autocorr/acor_it-IT.dat"), manually edit it and add the missing entries, then upload the edited file here with a list of the additions you made.

I added the person responsible for Italian localization to the CC list.
let's see what he thinks about it.
Comment 4 Marco Menardi 2012-12-29 11:36:16 UTC
Indeed, as your link says, there are other "common errors" that should also be included; thanks for pointing it out.
It is not a matter of including ALL possible mistakes, but it would be great to cover the most common ones. If an efficient lookup algorithm is used (such as binary search), enlarging the list will have no noticeable impact on performance. I'm against dangerous "overcorrecting", but these cases are plain errors that are safe to fix.
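To illustrate the performance point, here is a minimal sketch of a binary-search lookup over a sorted replacement table. It is only an assumption about how such a lookup could work, not a description of LibreOffice's actual implementation; the names ENTRIES, WRONG_FORMS and autocorrect are hypothetical.

```python
import bisect

# Hypothetical sorted table of (wrong, right) pairs; not LibreOffice's real data structure.
ENTRIES = sorted([
    ("affinchè", "affinché"),
    ("perchè", "perché"),
    ("poichè", "poiché"),
    ("sennonchè", "sennonché"),
])
# Keys only, in the same order, so that bisect can search them.
WRONG_FORMS = [wrong for wrong, _ in ENTRIES]


def autocorrect(word):
    """Return the replacement for `word`, or `word` itself when no entry matches."""
    index = bisect.bisect_left(WRONG_FORMS, word)
    if index < len(WRONG_FORMS) and WRONG_FORMS[index] == word:
        return ENTRIES[index][1]
    return word
```

With this kind of lookup the cost grows with the logarithm of the table size, so adding a few dozen entries changes almost nothing.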
I've found the file on GNU/Linux, uncompressed it, and loaded DocumentList.xml in Emacs 24, but it cannot be edited easily (it is a single long line of tags, not a structured, line-by-line layout).
I've reformatted it with xmllint --format DocumentList.xml and will provide the formatted version as soon as possible.
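For anyone who wants to repeat the two steps above without Emacs or xmllint, here is a rough Python sketch. It assumes the .dat file is an ordinary ZIP archive containing a member called DocumentList.xml; the function names are hypothetical.

```python
import io
import zipfile
import xml.dom.minidom


def read_document_list(dat_path_or_file):
    """Return the raw bytes of DocumentList.xml stored inside an acor_*.dat archive."""
    # Assumption: the .dat file is a plain ZIP container.
    with zipfile.ZipFile(dat_path_or_file) as archive:
        return archive.read("DocumentList.xml")


def pretty_print(xml_bytes):
    """Re-indent the single-line XML so that each entry sits on its own line."""
    return xml.dom.minidom.parseString(xml_bytes).toprettyxml(indent="  ")
```

Something like print(pretty_print(read_document_list("acor_it-IT.dat"))) should then give roughly the same readable layout as xmllint --format.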
Comment 5 Marco Menardi 2012-12-29 14:31:43 UTC
OK, I've added some words in the attached file, which is a formatted version of DocumentList.xml. I don't know how to "unformat" it, and in chat I was told not to worry about it, so the devs probably have some automatic tool.
I took the 4.0b2 file and inserted the new items respecting alphabetical order.
Already present were:
anziché, né, nonché, poiché, sé

I've added:
affinché, benché, cosicché, finché, giacché, nonché, perché, purché, sennonché, sicché

On the flip side I've added:
caffè, cioè, tè

I've also added these:
aereoporto -> aeroporto
obbiettivo -> obiettivo
pneomatico -> pneumatico
vedavamo -> vedevamo

Thanks a lot
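A minimal sketch of how additions like these could be merged into the list programmatically while skipping entries that are already present. The namespace URI and the attribute names (abbreviated-name, name) are assumptions about the DocumentList.xml layout and should be checked against a real file; add_entries is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumed namespace and attribute names for DocumentList.xml; verify against a real file.
NS = "http://openoffice.org/2001/block-list"
ET.register_namespace("block-list", NS)
ABBREV = f"{{{NS}}}abbreviated-name"
NAME = f"{{{NS}}}name"


def add_entries(xml_text, new_pairs):
    """Add (wrong, right) pairs to the list, skipping duplicates and keeping it sorted."""
    root = ET.fromstring(xml_text)
    existing = {block.get(ABBREV) for block in root}
    for wrong, right in new_pairs:
        if wrong not in existing:
            ET.SubElement(root, f"{{{NS}}}block", {ABBREV: wrong, NAME: right})
            existing.add(wrong)
    # Sorts by code point, which only approximates true alphabetical order.
    root[:] = sorted(root, key=lambda block: block.get(ABBREV))
    return ET.tostring(root, encoding="unicode")
```

The duplicate check matters here because some corrections (nonché, for example) appear both in the "already present" list and in the list of additions.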
Comment 6 Marco Menardi 2012-12-29 14:33:22 UTC
Created attachment 72252 [details]
Extended autocorrect DocumentList.xml for acor_it-IT.dat
Comment 7 tommy27 2012-12-29 15:27:10 UTC
Created attachment 72255 [details]
Extended autocorrect DocumentList.xml for acor_it-IT.dat

nice job. however that .xml must follow a precise format:

1st line: 
<?xml version="1.0"?>

2nd line:
all the remaining lines merged in a single one without spaces between them.

see the attached file, which is a copy of yours modified to match these rules; otherwise it will not work in LibO.

P.S. the text manipulation was done using http://textmechanic.com/ (a nice web tool that can help with tasks like this)
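The same "unformatting" can be sketched in a few lines of Python instead of a web tool. This assumes the formatted file has one tag per line (as xmllint --format produces) and that no attribute value spans several lines; to_two_line_format is a hypothetical name.

```python
def to_two_line_format(formatted_xml, declaration='<?xml version="1.0"?>'):
    """Collapse an indented XML file into a declaration line plus one single-line body."""
    stripped = (line.strip() for line in formatted_xml.splitlines())
    # Drop blank lines and any existing declaration, then join without separators.
    body = "".join(line for line in stripped if line and not line.startswith("<?xml"))
    # Pass declaration=None to emit the body alone, without a first line.
    return f"{declaration}\n{body}" if declaration else body
```

Because blank lines are dropped, this also avoids ending up with an empty second line in the output.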
Comment 8 tommy27 2012-12-29 15:30:18 UTC
Created attachment 72256 [details]
Extended autocorrect DocumentList.xml for acor_it-IT.dat

sorry... the previous file had a mistake (3 lines: the 2nd was blank)... I'm uploading a correct version right now
Comment 9 Marco Menardi 2012-12-29 16:07:48 UTC
@tommy27, comment #7
The original file from 4.0beta2 DOES NOT have <?xml version="1.0"?>, while the one from 3.6.2 does. Is this a new bug that you are fixing, or has something changed so that the line is no longer required?
Note that my file, even though it was taken from 4.0, had the line because xmllint added it automatically.
Thanks a lot
Comment 10 tommy27 2012-12-29 21:12:49 UTC
@Marco Menardi

nice catch!!! I do not know why LibO 4.0 beta .xml files lack that line ( <?xml version="1.0"?> ). However it doesn't seem to cause problems.

from what I know about LibO autocorrection, all the .dat files stored in the “C:\Program Files (x86)\LOdev 4.0\share\autocorr” (i.e. acor_de-DE.dat ; acor_en-GB.dat ; acor_it-IT.dat etc. etc.) represent a kind of temporary default set of autocorrect entries for each language that come with LibO at first start.

if you add a new custom entry in one of those autocorrect lists, let's say “German (Germany)” , a new acor_de-DE.dat file will be created in the user profile under:
 
"C:\Users\YourName\AppData\Roaming\LOdev\4\user\autocorr" (on Vista 64bit)

the new .dat file will contain the default entries previously stored in “share\autocorr” and the new custom entry.

so the .dat files from “C:\Program Files (x86)\LOdev 4.0\share\autocorr”  serve as the base to start a custom new autocorrect database which is stored in the User profile.

the DocumentList.xml files contained inside the .dat files have the structure I described earlier with small differences in the first line.

in LibO 3.6.x the first line is:

<?xml version="1.0"?>   in the "share/autocorr" folder

<?xml version="1.0" encoding="UTF-8"?>  in the "user/autocorr" folder
...................................

in LibO 4.0 beta2 the first line is:

missing in the "share/autocorr" folder as you pointed out

<?xml version="1.0" encoding="UTF-8"?>  in the "user/autocorr" folder

this causes no issues, since from the tests I've done, if you add a new autocorrect entry to a "virgin" database without the first line, let's say "acor_pl-PL" (Polish) in "share/autocorr", the resulting "acor_pl-PL" in the "user/autocorr" folder will have that <?xml version="1.0" encoding="UTF-8"?> first line.
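A quick way to repeat this first-line check on any .dat file, sketched in Python. It assumes the .dat is a ZIP archive with a UTF-8 encoded DocumentList.xml inside; make_dat is only a hypothetical helper that builds an in-memory stand-in for experiments.

```python
import io
import zipfile


def xml_declaration(dat_file):
    """Return the XML declaration of DocumentList.xml in a .dat archive, or None if absent."""
    with zipfile.ZipFile(dat_file) as archive:
        data = archive.read("DocumentList.xml")
    # Assumption: the member is UTF-8; strip a possible byte-order mark.
    text = data.decode("utf-8").lstrip("\ufeff")
    if text.startswith("<?xml"):
        return text[: text.index("?>") + 2]
    return None


def make_dat(xml_bytes):
    """Hypothetical helper: build an in-memory stand-in for an acor_*.dat file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("DocumentList.xml", xml_bytes)
    buffer.seek(0)
    return buffer
```

Running xml_declaration on the copies in "share/autocorr" and "user/autocorr" should show the differences listed above.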

 
P.S. by the way I changed the bug summary, which had some English mistakes... funny, isn't it?
Comment 11 tommy27 2012-12-29 21:18:48 UTC
Created attachment 72276 [details]
Extended autocorrect DocumentList.xml for acor_it-IT.dat

according to what was found in Comment 9 and Comment 10, I'm uploading a new version of the .xml file without the first line <?xml version="1.0"?>

now it's up to the LibO Italian localization team to choose whether or not to use the new .xml file updated by Marco Menardi
Comment 12 tommy27 2013-04-01 15:48:37 UTC
any news from the Italian localization team?
Comment 13 Valter Mura 2013-04-01 18:26:58 UTC
Hi tommy27

could you please send the file directly to me? I'll check it and, if necessary, send it to the devs.

Please be aware that the autocorrect option is fully customizable; if everything is in order, we can integrate it. Please also note that a spellchecker is bundled with the program, so wrong words are highlighted and the user can choose whether to autocorrect them or not.

Ciao
Comment 14 tommy27 2013-04-01 18:33:00 UTC
@Valter

the file is here:
https://bugs.freedesktop.org/attachment.cgi?id=72276
Comment 15 tommy27 2013-06-20 04:42:39 UTC
@Valter
any final decision about that file?
are you gonna use it or not?
Comment 16 Julien Nabet 2013-07-02 21:18:16 UTC
Sophie/Adolfo/Andras: I thought about adding the words quoted in this tracker here:
extras/source/autotext/lang/it/acor/DocumentList.xml
Is it ok for you or should it follow a kind of process?
Comment 17 tommy27 2013-08-18 07:52:58 UTC
is there any news about this report? has a final decision been taken?
are you gonna implement the new autocorrect items or leave the original files unchanged?
Comment 18 Valter Mura 2013-08-22 15:10:36 UTC
Hi All,

the file has been updated and a bug will be opened to upload the new file.

I suppose the changes will be made for the next 4.1.x version.
Comment 19 Julien Nabet 2013-08-22 17:57:23 UTC
Valter: I don't see any changes to DocumentList.xml for acor_it-IT.dat in the master sources. Who did you get this information from?
Comment 20 Valter Mura 2013-08-23 12:39:46 UTC
Hi Julien, I've closed this bug and opened another one (enhancement) with the new file attached.

Sorry, I didn't indicate the new one:

https://bugs.freedesktop.org/show_bug.cgi?id=68440
Comment 21 Andras Timar 2013-08-23 13:22:03 UTC
fixed, see bug 68440