Please remove license information from dictionary files. The .dic files should be as clean as possible. License information should be stored in the appropriate README, LICENSE or COPYRIGHT files. There is more room for it there, and it avoids maintaining license information in multiple places. Additionally, and most importantly, encoding problems can arise from characters with diacritics in license information, especially in names of authors. On top of that, this information is added in different ways: by using whitespace, # or /

1) For Danish, remove everything after the number on the first line, including the whitespace:
   161315 # (c) Stavekontrolden.dk
See:
- https://cgit.freedesktop.org/libreoffice/dictionaries/tree/da_DK/da_DK.dic

2) For German, remove lines 2 to 18, where line 18 is an empty line and the rest start with #
See:
- https://cgit.freedesktop.org/libreoffice/dictionaries/tree/de/de_AT_frami.dic
- https://cgit.freedesktop.org/libreoffice/dictionaries/tree/de/de_CH_frami.dic
- https://cgit.freedesktop.org/libreoffice/dictionaries/tree/de/de_DE_frami.dic
(Something similar has been found in the non-frami German dictionaries. If possible, address those too.)

3) For Italian, remove lines 2 to 34, which start with #
See:
- https://cgit.freedesktop.org/libreoffice/dictionaries/tree/it_IT/it_IT.dic

4) For Guarani, remove the whitespace and the word "wordlist" from the first line, and remove the second line, which is empty
See:
- https://cgit.freedesktop.org/libreoffice/dictionaries/tree/gug/gug.dic

5) For Dutch, remove the last line, which is empty
See:
- https://cgit.freedesktop.org/libreoffice/dictionaries/tree/nl_NL/nl_NL.dic#n142520

6) For Arabic, remove empty line 13553
See:
- https://cgit.freedesktop.org/libreoffice/dictionaries/tree/ar/ar.dic#n13553
- https://bugs.documentfoundation.org/show_bug.cgi?id=117389

7) For Nepali, remove empty line 38029. Note that this is easier to see in the plain file (second URL).
See:
- https://cgit.freedesktop.org/libreoffice/dictionaries/tree/ne_NP/ne_NP.dic#n38029
- https://cgit.freedesktop.org/libreoffice/dictionaries/plain/ne_NP/ne_NP.dic

8) After cleaning up these files, please also check that the line count on the first line is correct. That is, the count should exclude (if I'm not mistaken):
- the first line
- any line starting with a comment character (#)
- any line starting with a slash
- any empty lines
- any lines with only whitespace
This could be a general QA check for the dictionary files.

I noticed these minor improvements while developing for Hunspell/Nuspell and have scripts available for QA or reporting on this. I'm willing to contribute these; however, I am completely unfamiliar with the LibreOffice development environment.
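The count check described in item 8 could be sketched roughly as follows. This is only an illustration of the exclusion rules as stated in this report (which the reporter marks as "if I'm not mistaken"), not the reporter's own QA scripts; the function names `declared_count`, `count_entries` and `check_dic` are made up for this example.

```python
def declared_count(first_line):
    """Return the number at the start of the first line, or None if absent."""
    tokens = first_line.split()
    if not tokens:
        return None
    try:
        return int(tokens[0])
    except ValueError:
        return None


def count_entries(lines):
    """Count entry lines using the exclusion rules listed in this report."""
    count = 0
    for line in lines[1:]:              # the first line holds the count itself
        if not line.strip():            # empty or whitespace-only line
            continue
        if line.startswith(("#", "/")):  # comment-like lines, per the report
            continue
        count += 1
    return count


def check_dic(text):
    """Return (declared, actual) for the decoded contents of a .dic file."""
    # Split on LF only and drop a trailing CR, so CRLF files are handled too.
    lines = [line.rstrip("\r") for line in text.split("\n")]
    return declared_count(lines[0]), count_entries(lines)


if __name__ == "__main__":
    sample = "3 # (c) Example\nfoo\n# comment\n/slash\n\n   \nbar/AB\nbaz\n"
    declared, actual = check_dic(sample)
    print("declared:", declared, "actual:", actual)
```

A mismatch between the two values would flag a file for manual inspection. The caller is assumed to decode the file with the encoding named in the matching .aff file before passing the text in.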
9) Convert .aff and .dic files from DOS format line terminators to UNIX format line terminators with e.g. `flip -u` or `flip -b -u`

This concerns:
- hu_HU/hu_HU.aff: Non-ISO extended-ASCII text, with very long lines, with LF, NEL line terminators
- pt_BR/pt_BR.dic: Non-ISO extended-ASCII text, with CRLF line terminators
- pt_BR/pt_BR.aff: ISO-8859 text, with CRLF line terminators
- ru_RU/ru_RU.dic: ISO-8859 text, with CRLF, LF line terminators
- ne_NP/ne_NP.dic: UTF-8 Unicode text, with CRLF, LF line terminators

Some extra inspection regarding long lines should be done for:
- da_DK/da_DK.aff: UTF-8 Unicode text, with very long lines
- si_LK/si_LK.dic: UTF-8 Unicode text, with very long lines

See also:
- for i in `find dictionaries -type f|grep -v hyph`; do file $i; done|grep 'long lines'
- for i in `find dictionaries -type f|grep -v hyph`; do file $i; done|grep 'line terminators'
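As a rough alternative to `flip`, the line terminator check and conversion from item 9 might look like the sketch below. It works on raw bytes, so the file's encoding (ISO-8859, UTF-8, etc.) is left untouched. The function names are hypothetical, and the NEL case reported for hu_HU.aff is deliberately not handled here because it needs inspection of the actual file first.

```python
def line_endings(data: bytes) -> dict:
    """Count CRLF, lone LF and lone CR terminators in raw file contents."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf   # LF not preceded by CR
    cr = data.count(b"\r") - crlf   # CR not followed by LF
    return {"CRLF": crlf, "LF": lf, "CR": cr}


def to_unix(data: bytes) -> bytes:
    """Convert CRLF terminators to LF; all other bytes are left as they are."""
    return data.replace(b"\r\n", b"\n")


if __name__ == "__main__":
    mixed = b"3\r\nfoo\nbar\r\n"
    print(line_endings(mixed))
    print(line_endings(to_unix(mixed)))
```

To apply this to a real file, it would have to be opened in binary mode (`"rb"` and `"wb"`), so that Python does not translate the line endings itself.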
Adolfo, any opinion here?
@Sophi, do you think we could turn this issue into an easyhack?
(In reply to Xisco Faulí from comment #3)
> @Sophi, do you think we could turn this issue into an easyhack?

I guess so; it seems Pander has already documented the issue well.
Let's turn this into an easy hack then...
Dictionaries in LO are usually downstream, so these changes should be made where the originals are maintained. E.g. the Italian dictionaries are now maintained by LibreItalia. Finding out whether the other dictionaries are still maintained, and whether by the same person or group named in the current READMEs, does not look like a lot of fun, but that is probably what should be done.
Re-evaluating the EasyHack in 2022

This task is still relevant, and it is not finished yet. The credit lines are still there in the dictionary files, and the other cleanups are yet to be done.

But asking someone to find the source of each dictionary and update it upstream is not a straightforward, well-defined project of the kind that is useful for EasyHackers. Therefore, I am removing the EasyHack keyword from this issue.

Although some of the files are not updated regularly (not even yearly), finding the links to the upstream projects and mentioning them here can help. I should also state that cleaning up the files here in the dictionaries repository can also be helpful, at least for some of the rarely updated files.