Bug 138495 - Removed duplicate lines from spelling dictionaries
Summary: Removed duplicate lines from spelling dictionaries
Status: RESOLVED WONTFIX
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Linguistic
Version: unspecified (earliest affected)
Hardware: All All
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Dictionaries
Reported: 2020-11-25 15:49 UTC by Pander
Modified: 2021-08-03 16:44 UTC
CC List: 3 users

See Also:
Crash report or crash signature:


Description Pander 2020-11-25 15:49:52 UTC
The following command will list all duplicate lines in .dic dictionary files for the spelling checker:

find . -name '*.dic' -type f -exec sh -c 'echo "$1"; sort "$1" | uniq -d' sh {} \;

Please deduplicate the lines in the dictionaries maintained here and, if possible, report the duplicates to the upstream authors.

Note that any change to a .dic file also requires updating its first line, which lists the number of words in that dictionary.
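
As a rough illustration of that note, here is a minimal sketch that removes duplicate entries from a single .dic file and rewrites the count on its first line. The function name dedup_dic is made up for this example. It assumes the first line holds only the word count, that every later line is an entry, and that duplicates are byte-for-byte identical lines. Files with comments, blank lines or generated content may need different handling.

```shell
#!/bin/sh
# Hypothetical helper, not part of any existing tool: deduplicate one .dic
# file in place and refresh the word count stored on its first line.
dedup_dic() {
    dic=$1
    body=$(mktemp) || return 1
    # Skip the old count line, then keep only the first occurrence of each
    # entry, preserving the original order.
    tail -n +2 "$dic" | awk '!seen[$0]++' > "$body"
    # Write the new count (number of remaining lines) followed by the entries.
    { awk 'END { print NR }' "$body"; cat "$body"; } > "$dic.new" &&
        mv "$dic.new" "$dic"
    rm -f "$body"
}
```

Calling dedup_dic with the path of one .dic file rewrites that file, so it is safest to run it on a copy or inside version control and review the diff first.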

Some dictionaries have hundreds of duplicate lines. Removing them will (marginally) reduce dictionary load times and improve dictionary maintainability.
Comment 1 Roman Kuznetsov 2021-07-30 18:23:28 UTC
Pander, do you want to do it yourself?
Comment 2 Pander 2021-08-03 16:42:14 UTC
No, and on second thought, you can close it. In the end, duplicates are sometimes there for a reason, because the file was authored or generated that way.