Created attachment 67077
Telugu test example

(From the bug report by Steven Dickson:)

There appears to be a logic error in the hnj_hyphen_rhmin function in the file hyphen.c. The function is supposed to remove hyphens from the right-hand side of a word based on the value of RIGHTHYPHENMIN defined in the hyphenation pattern file for the language. It works properly for words containing only single-byte characters, but can fail if the word contains multi-byte characters.

The code erroneously assumes that the last character of the word is a single-byte character and starts scanning at the next-to-last byte of the word. This can be corrected by initializing the character count variable, i, to 0 rather than 1 and starting the for loop with j = word_size - 1 rather than j = word_size - 2.

The code also erroneously increments the character count variable, i, while still inside a multi-byte character. This can be corrected by incrementing i only at the first byte of a multi-byte character ((word[j] & 0xc0) == 0xc0) or at a single-byte character ((word[j] & 0x80) != 0x80).

A diff of hyphen.c with the corrections follows.

737c737
< int i = 1;
---
> int i = 0;
743c743
< for (j = word_size - 2; i < rhmin && j > 0; j--) {
---
> for (j = word_size - 1; i < rhmin && j > 0; j--) {
756c756
< if (!utf8 || (word[j] & 0xc0) != 0xc0) i++;
---
> if (!utf8 || (word[j] & 0xc0) == 0xc0 || (word[j] & 0x80) != 0x80) i++;
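
For illustration only, here is a minimal standalone sketch of the counting rule described in the report: scan backwards from the last byte and count a character only at an ASCII byte or at a UTF-8 lead byte. This is not the code from hyphen.c; the helper names starts_char and tail_offset are hypothetical, and the sketch assumes the input is valid UTF-8.

```c
#include <stddef.h>

/* Hypothetical helper (not in hyphen.c): nonzero if byte c begins a
   character, i.e. it is ASCII (0xxxxxxx) or a UTF-8 lead byte (11xxxxxx).
   Continuation bytes (10xxxxxx) return 0. */
static int starts_char(unsigned char c)
{
    return (c & 0x80) != 0x80 || (c & 0xc0) == 0xc0;
}

/* Hypothetical helper: byte offset at which the last n characters of a
   UTF-8 word begin. Returns 0 if the word has n or fewer characters. */
static size_t tail_offset(const char *word, size_t word_size, int n)
{
    size_t j = word_size; /* one past the last byte */
    int i = 0;            /* characters counted so far, starting from 0 */

    while (j > 0 && i < n) {
        j--;
        /* Count only once per character, at its first byte. */
        if (starts_char((unsigned char)word[j])) i++;
    }
    return j;
}
```

With the 7-byte word "a\xe0\xb0\xa4\xe0\xb0\xa4" (an ASCII letter followed by two 3-byte Telugu code points), tail_offset(word, 7, 2) stops at byte offset 1, the lead byte of the second-to-last character. Starting the scan one byte earlier, or counting continuation bytes, would place that boundary inside a multi-byte character.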
Also fixed in the Hyphen CVS: http://hunspell.cvs.sourceforge.net/viewvc/hunspell/hyphen/
Laszlo Nemeth committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=3d654071413bc107e0730dd31261c252f71572bf

fdo#54843 righthyphenmin fix (patch by Steven Dickson)

The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.