138495 – Removed duplicate lines from spelling dictionaries

Status:	RESOLVED WONTFIX

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	Linguistic
Version: (earliest affected)	unspecified
Hardware:	All All

Importance:	medium enhancement
Assignee:	Not Assigned

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:	Dictionaries

Reported:	2020-11-25 15:49 UTC by Pander
Modified:	2021-08-03 16:44 UTC
CC List:	3 users

See Also:
Crash report or crash signature:

Description Pander 2020-11-25 15:49:52 UTC

The following command will list all duplicate lines in .dic dictionary files for the spelling checker:

find . -name '*.dic' -type f | while IFS= read -r i; do printf '%s\n' "$i"; sort "$i" | uniq -c | grep -v '^ *1 '; done

Please deduplicate the lines in the dictionaries maintained here and, if possible, report the duplicates to the upstream authors.

Note that any change to a .dic file requires updating its first line, which lists the number of words in that dictionary.

Some dictionaries have hundreds of duplicate lines. Removing them will (marginally) reduce dictionary load times and improve dictionary maintainability.
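
The deduplication and count update described above could be sketched roughly as follows. This is only an illustration, not a tool shipped with LibreOffice or Hunspell: the function name dedupe_dic and the .new output suffix are made up here, and it assumes that line 1 holds the word count and that every following line is one entry. Only byte-identical lines are treated as duplicates, so entries that differ in their flags are all kept.

```shell
#!/bin/sh
# Sketch only: dedupe_dic is a hypothetical helper, not part of LibreOffice or Hunspell.
# Assumes line 1 of the .dic file is the word count and every later line is one entry.
dedupe_dic() {
    src=$1
    body=$(mktemp) || return 1
    # Drop the count line, then keep only the first occurrence of each exact line,
    # preserving the original order. LC_ALL=C compares raw bytes, so the file's
    # encoding does not matter.
    tail -n +2 "$src" | LC_ALL=C awk '!seen[$0]++' > "$body"
    # Write the new entry count as line 1, followed by the deduplicated entries.
    { wc -l < "$body" | tr -d ' '; cat "$body"; } > "$src.new"
    rm -f "$body"
}
```

Calling dedupe_dic path/to/some.dic leaves the original untouched and writes path/to/some.dic.new, so the result can be checked with diff before it replaces the original.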

Comment 1 Roman Kuznetsov 2021-07-30 18:23:28 UTC

Pander, do you want to do it yourself?

Comment 2 Pander 2021-08-03 16:42:14 UTC

No, and on second thought, you can close it. In the end, duplicates are sometimes there for a reason, because the file was authored or generated that way.