Bug 43931 - Unwanted behaviors due to Hyphen 2.8.3
Summary: Unwanted behaviors due to Hyphen 2.8.3
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 3.5.0 Beta1 (earliest affected)
Hardware: Other All
Importance: high major
Assignee: László Németh
QA Contact:
URL:
Whiteboard: BSA
Keywords:
Depends on:
Blocks:
 
Reported: 2011-12-18 08:17 UTC by mlodewijck
Modified: 2013-11-27 14:08 UTC

See Also:
Crash report or crash signature:


Attachments
Current behavior with new Hyphen (12.03 KB, application/pdf)
2011-12-18 08:17 UTC, mlodewijck
Test file (8.91 KB, application/vnd.oasis.opendocument.text)
2011-12-21 06:01 UTC, László Németh

Description mlodewijck 2011-12-18 08:17:04 UTC
Created attachment 54539 [details]
Current behavior with new Hyphen

Version ID : 7362ca8-b5a8e65-af86909-d471f98-61464c4

Problem description: 

Unwanted behaviors due to Hyphen 2.8.3 (2011/10/10) in hyphenated compounds with apostrophe.

Current behavior:

e.g.:
sot=l’y=laisse > sotl'-[y=laisse
va=t’en > vat'-[en

Expected behavior:

sot=l’y=laisse > sot=[l’y=[laisse
va=t’en > va=[t'en

Additional remarks:

Workaround: explicit declaration of NEXTLEVEL in the pattern dictionary file.

I sent an e-mail to László Németh (2011/12/18).


Thanks.
Comment 1 László Németh 2011-12-21 00:25:15 UTC
It seems that the following explicit declaration fixes the problem:

NOHYPHEN ',’
1-1
1'1
1’1
NEXTLEVEL

I suggest removing all of your extra patterns with hyphens and apostrophes.

By the way, this is the recent implicit declaration, with the problematic hyphen replacement. (The replacement was the fix for the old double hyphens that appeared when Hyphen patterns hyphenated at hard hyphens. I can no longer reproduce that problem; maybe a parallel fix in LibreOffice caused the problem with the French and other hyphenation.)

NOHYPHEN ',’
1-1/=,1,1
1'1
1’1
NEXTLEVEL

I will fix it in the source of LibreOffice, too.
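
For readers unfamiliar with the pattern syntax, here is a minimal sketch of how Liang-style patterns such as 1-1, 1'1 and 1’1 mark candidate break positions. It is an illustration only, not libhyphen's implementation: the function names are hypothetical, and it deliberately ignores NOHYPHEN, NEXTLEVEL, word-boundary dots and the non-standard replacement syntax (the "/=,1,1" part).

```python
def parse_pattern(pattern):
    """Split a Liang-style pattern such as "1-1" into its letters and levels.

    levels[i] is the number written before letters[i]; the last entry is the
    number written after the final letter. Missing numbers count as 0.
    """
    letters = []
    levels = [0]
    for ch in pattern:
        if ch.isdigit():
            levels[-1] = int(ch)
        else:
            letters.append(ch)
            levels.append(0)
    return "".join(letters), levels


def break_points(word, patterns):
    """Return indices i where a break is allowed between word[i-1] and word[i].

    Simplified on purpose (assumption): no NOHYPHEN, no NEXTLEVEL,
    no left/right minimum lengths.
    """
    scores = [0] * (len(word) + 1)
    for pattern in patterns:
        letters, levels = parse_pattern(pattern)
        start = word.find(letters)
        while start != -1:
            # Keep the highest level seen at each inter-letter position.
            for offset, level in enumerate(levels):
                pos = start + offset
                scores[pos] = max(scores[pos], level)
            start = word.find(letters, start + 1)
    # Odd levels allow a break; positions at the word edges are excluded.
    return [i for i in range(1, len(word)) if scores[i] % 2 == 1]


if __name__ == "__main__":
    first_level = ["1-1", "1'1", "1’1"]
    print(break_points("va-t’en", first_level))
```

In this toy model the three patterns mark a candidate position on each side of every hyphen and apostrophe; how those positions are then filtered or handed to the next pattern level is left out here.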
Comment 2 László Németh 2011-12-21 06:01:08 UTC
Created attachment 54636 [details]
Test file
Comment 3 László Németh 2011-12-21 06:07:00 UTC
Fixed in LibreOffice: http://cgit.freedesktop.org/libreoffice/core/commit/?id=4a3ca24020bdaa956acbefd911e688917c7fa3dd

Now the French hyphenation works well without explicit NOHYPHEN and NEXTLEVEL declarations in the hyphenation dictionary. See the attached test file.
Comment 4 mlodewijck 2011-12-21 23:11:10 UTC
(In reply to comment #3)
> Fixed in LibreOffice:
> http://cgit.freedesktop.org/libreoffice/core/commit/?id=4a3ca24020bdaa956acbefd911e688917c7fa3dd
> 
> Now the French hyphenation works well without explicit NOHYPHEN and NEXTLEVEL
> declarations in the hyphenation dictionary. See the attached test file.


Thank you, László! I will check this as soon as I can.

By the way, I noticed that the NOHYPHEN parameter has been problematic for French words from the start (when there are no explicit additional patterns) :-/

ml
Comment 5 Caolán McNamara 2011-12-22 15:49:56 UTC
caolanm->László: FWIW I got a little confused in this area today. i.e. I bumped hyphen to version 2.8.3 in master and I believe I accidentally wiped out this fix (for master, not 3-5) because the change happened inside the hyphen-2.7.1-2.8.3 patch which I presumed just upgraded 2.7.1 to 2.8.3 :-(

This fix here isn't actually in hyphen-2.8.3, right? And it isn't in upstream hyphen CVS for the to-be 2.8.4 yet either, right? (Should it be?)

I think I restored this specific fix as http://cgit.freedesktop.org/libreoffice/core/commit/?id=84897d4b3b2a0e4719b00fb06abb8c04e3c20c24
Comment 6 László Németh 2011-12-23 07:44:52 UTC
(In reply to comment #5)
nemeth->Caolán: For now, this is only a quick fix for LibreOffice. If this modification turns out to produce sporadic double hyphens when hyphenating near hard hyphens (I wasn't able to reproduce this problem with the recent 3.5 master), I will limit this patch to French (using explicit patterns) before searching for the root of the apostrophe problem (in hyphen, lingucomponent or another module). The original hyphenation algorithms (the default hyphenation at hard hyphens and the libhyphen-based hyphenation) did not conflict, but since OOo 3.3 libhyphen receives words with hyphens, too, and now we need to hyphenate at hard hyphens to fix the frequent missing hyphenation (not only at hard hyphens).
Thanks for preserving the patch. I will test it again in beta 2 and the recent source.
Comment 7 László Németh 2011-12-27 06:29:34 UTC
I have found double hyphens in my tests (e.g. in the word "va-t’en-touil") and some other anomalies (e.g. a forced break at every position of "touil" without hyphens in va-t’en-touil), so I completely removed the libhyphen-based breaks at hard hyphens:

http://cgit.freedesktop.org/libreoffice/core/commit/?id=af366b733201bc3ed982e807c3ca4bc300b3700d

The implicit declaration:

NOHYPHEN ',’,-
1-1
1'1
1’1
NEXTLEVEL

These patterns fix the bad hyphenation of words with hard hyphens (caused by the different word boundaries), but they no longer fix the frequent missing hyphenation at hard hyphens (caused by the competing hyphenation mechanisms).

> Original hyphenation algorithms (the default hyphenation at hard
> hyphens and the libhyphen based hyphenation) had no conflict, but from OOo
> 3.3 libhyphen gets words with hyphens, too, and now we need to hyphenate at
> hard hyphens to fix the frequent missing hyphenation (not only at hard
> hyphens).

Sorry, the hyphenation can miss only hard hyphens, namely when there is enough space (for a libhyphen-based search for a potential hyphenation break after the hard hyphen) and there is an appropriate hyphenation point before the hard hyphen. E.g. "eighteen-year-old" will be hyphenated as "eigh=teen-year-old" instead of the possible "eighteen=year-old", especially when there is more free space in the line.
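
The "eighteen-year-old" case can be illustrated with a toy model of break selection. This is a sketch under stated assumptions, not LibreOffice's line breaker: choose_break is a hypothetical helper that simply picks the rightmost candidate break whose first fragment fits into the remaining room on the line. A soft break adds a hyphen character; a hard break sits just after an existing hyphen and adds nothing.

```python
def choose_break(word, soft_breaks, hard_breaks, room):
    """Pick the rightmost break whose first fragment fits into `room` columns.

    soft_breaks: indices i where word[:i] + "-" may end the line.
    hard_breaks: indices i just after an existing hyphen, where word[:i]
                 may end the line without an extra hyphen.
    Returns (first_fragment, rest) or None if nothing fits.
    Hypothetical model for illustration only.
    """
    best = None
    for i in soft_breaks:
        # A soft break costs one extra column for the inserted hyphen.
        if i + 1 <= room and (best is None or i > best[0]):
            best = (i, word[:i] + "-")
    for i in hard_breaks:
        if i <= room and (best is None or i > best[0]):
            best = (i, word[:i])
    if best is None:
        return None
    i, first = best
    return first, word[i:]


if __name__ == "__main__":
    word = "eighteen-year-old"
    # Both mechanisms offer their candidates: the hard hyphen wins.
    print(choose_break(word, soft_breaks=[4], hard_breaks=[9], room=12))
    # The hard-hyphen candidate is not offered: the earlier soft break is used.
    print(choose_break(word, soft_breaks=[4], hard_breaks=[], room=12))
```

With enough room on the line, the model breaks after "eighteen-" when the hard-hyphen candidate is available and falls back to "eigh-" when it is not, which matches the missed break described above.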