Bug 42383 - Bad non-standard hyphenation of diaeresis and Unicode f ligatures
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Linguistic
Version (earliest affected): 3.4.3 release
Hardware: All All
Importance: low minor
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Font-Rendering
Reported: 2011-10-29 12:34 UTC by Reinout van Schouwen
Modified: 2023-12-04 03:14 UTC (History)

See Also:
Crash report or crash signature:


Attachments
Test spreadsheet with hyphenation for Calc (16.45 KB, application/vnd.oasis.opendocument.spreadsheet)
2013-04-22 11:19 UTC, László Németh
summary (screenshot in Calc) (41.77 KB, image/png)
2013-04-22 11:23 UTC, László Németh

Description Reinout van Schouwen 2011-10-29 12:34:44 UTC
This bug is carried over from http://openoffice.org/bugzilla/show_bug.cgi?id=71608 .
--------
It seems that OOo's support for non-standard hyphenation (hyphenation with
alternative spelling) has an implementation bug: it doesn't break Dutch and
Greek words with a diaeresis correctly. Non-standard hyphenation mostly works
well (for example, Dutch omaatje -> oma- tje, cafeetje -> café- tje), but not
with a diaeresis: reëel -> ree- el is wrong; it should be re- eel. Maybe we need
a hardwired, language-dependent patch... The problem is not in the external
hyphenator component but in Writer's internal hyphenation support. Test data
attached.

------- Comment #9 From tl@openoffice.org 2007-02-09 10:52:32 -------
When looking at it with SRC680 m200 I found the following:
- in SO the hyphenation position for reëel is re=ëel and the hyphenated word
becomes re=eel. As of m202 the hyphenated word is re=ëel.
- OOo the hyphenated word is also re=ëel

The above results were directly obtained from the hyphenator.
(You may use the Basic script below to check)
Thus it is a problem of the specific implementations.
For SO nothing can be done except to report this to the vendor,
and for OOo someone needs to patch the hyphenation patterns.

Thus I'm reassigning this issue to lingucomponent.

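Sub Main

' Instantiate the hyphenator under test (swap the comment to try the other one)
xH = createUnoService("org.openoffice.lingu.LibHnjHyphenator")
'xH = createUnoService("com.sun.star.lingu2.Proximity.Hyphenator")

' Dutch (Netherlands) locale
dim nl_NL as new com.sun.star.lang.Locale
nl_NL.Language = "nl"
nl_NL.Country  = "NL"

' At most 3 characters may remain before the hyphen; no extra properties
xHW = xH.hyphenate( "reëel", nl_NL, 3, DimArray() )
'xHW = xH.hyphenate( "Hundefutter", nl_NL, 3, DimArray() )

' hyphenate() returns an empty reference when no hyphenation point is found
if IsNull( xHW ) then
    msgbox "no hyphenation point found"
else
    msgtxt = " " & xHW.getHyphenatedWord() & " " & xHW.getHyphenPos()
    msgbox msgtxt
end if

End Sub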
------- 

------- Comment #11 From nemeth@openoffice.org 2007-10-03 20:45:25 -------
Nemeth->TL: I have tried the script with the attached data and got
"re=eel" (reeel 1) and oma=tje (omatje 2), so it seems to me that this is a bug
in OpenOffice.org's implementation, not in the LibHnj non-standard hyphenation
extension. Maybe hyphenpos=1 is wrongly forbidden by the 2-character limit.
Please check my example, not the default Dutch hyphenation patterns. The LibHnj
executable works well on my example. Thanks in advance, Laci

------- Comment #14 From tl@openoffice.org 2007-11-23 11:36:23 -------
Testing with SRC680 m227:
- SO: reëel gets hyphenated in the document as ree-el
  but the hyphenator says it should be re-eel
- OOo: reëel gets hyphenated as re-ëel
  and the hyphenator says the same.

I don't know which hyphenator is right or wrong (and if the SO hyphenator
result is wrong, it can't be fixed on our side; it needs to be reported to the vendor).
But clearly, since the SO hyphenator says re-eel, an actual document should
behave the same way. Thus we have a problem with the algorithm here.

I don't see any problem with OOo hyphenator unless someone says that the result
from the OOo hyphenator should not be re-ëel because that one is wrong.

Does someone have input on the correct hyphenation of reëel?

For the time being I will keep this issue, since there seems to be a
problem with the code for evaluating alternative spellings (as already
expected).
------- Comment #15 From tl@openoffice.org 2007-11-23 11:40:23 -------
TL->Nemeth: I missed that the correct hyphenation for reëel was already listed
as being re-eel. Thus the OOo hyphenator or its dictionary file needs to be
fixed.

Since I will use this issue to fix the problem in the code for evaluating
alternative spellings please submit a new one for either or both of the above
changes in OOo.

------- Comment #16 From nemeth@openoffice.org 2007-11-23 13:10:32 -------
Nemeth->TL: Thanks for your check and comment. This bug report was only a
theoretical problem with attached test data, because nobody was working on Dutch
or Greek non-standard hyphenation patterns years ago, when I checked my
alternative/non-standard hyphenator patch into OpenOffice.org. But now we have
the result of the OpenTaal project, the extended Dutch hyphenation patterns, and
OpenOffice.org (and StarOffice) can't correctly handle half of the Dutch
non-standard hyphenation described by those patterns.

I believe OpenTaal's activity and results (see
http://www.linux.com/feature/116697 for example) and collaboration with
OpenTaal are very important for the future of OpenOffice.org, because we would
have officially certified spell checking and hyphenation in OpenOffice.org for
at least one language. I have modified the language specifics summary according
to your plan. Thanks in advance, Laci
------- Comment #17 From tl@openoffice.org 2007-11-26 13:32:38 -------
When checking this I found that the problem is not the SvxGetAltSpelling function
(which I suspected to be at fault). Instead it is in the actual implementation
that evaluates that result and does the line breaking.

That has two consequences:
a) If that one is to be fixed it needs to be fixed in each application 
   separately. Thus specific issues for Calc and Draw/Impress are required.
b) I was told the area that is affected by the required change is quite 
   tricky and troublesome to change.

Also, it looks to me as if the actual problem is not about the diaeresis
at all, but about the position of the text to be changed.
Comparing it with alternative spelling in the now outdated German pre-reform
orthography, the difference is this:
- in German, Bäc-ker changes to Bäk-ker when hyphenated
- in Dutch, re-ëel should become re-eel
In the German example the character to the left of the hyphenation
position changes (which is sufficient for German), whereas in the Dutch example it
is the one to the right.

The code parts that take care of alternative spellings in Writer are rather old
and were probably implemented for German at that time. No one needed text
changes to the right and thus it was never implemented... :-(
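
The left/right distinction described above can be sketched in a few lines. This is only an illustration in Python, not the Writer code: the function name changed_side and its return values are made up for this example, and the two halves of the hyphenated word are assumed to be given without the hyphen.

```python
def changed_side(word: str, left: str, right: str) -> str:
    """Classify where the hyphenated form left + "-" + right departs from word.

    Hypothetical helper for illustration only. Returns one of
    "none", "left", "right", "both" or "deletion".
    """
    if left + right == word:
        return "none"                 # ordinary hyphenation, spelling unchanged
    left_same = word.startswith(left)
    right_same = word.endswith(right)
    if left_same and right_same:
        # Both halves match the ends of the word, so letters were
        # dropped at the break itself (omaatje -> oma-tje).
        return "deletion"
    if left_same:
        return "right"                # only the text after the hyphen differs
    if right_same:
        return "left"                 # only the text before the hyphen differs
    return "both"


for word, left, right in [("Bäcker", "Bäk", "ker"), ("reëel", "re", "eel")]:
    print(word, "->", left + "-" + right, ":", changed_side(word, left, right))
```

With the two examples from this comment, Bäcker is classified as a change on the left and reëel as a change on the right, which is the case the old Writer code reportedly never handled.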
------- Comment #18 From tl@openoffice.org 2007-11-26 13:43:10 -------
If that gets fixed, it should be done in a future-proof way.
That is:
- the text change need not be directly next to the hyphen
- it should allow for changes of more than one letter to the left
- it should allow for changes of more than one letter to the right
- it should allow for all of the above at the same time

Basically, it should be able to handle all possible results that the
function SvxGetAltSpelling may return. (And that one is flexible enough to
allow for completely new words...)
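
One way to picture the requirements listed above is a change record that replaces an arbitrary span of the word and places the break anywhere inside the replacement. The sketch below is Python and purely illustrative: AltSpelling and break_word are hypothetical names, and the fields are an assumption for this example, not the actual structure returned by SvxGetAltSpelling.

```python
from dataclasses import dataclass


@dataclass
class AltSpelling:
    """Hypothetical description of a non-standard break (illustration only)."""
    change_pos: int     # index in the original word where the replaced span starts
    change_len: int     # number of original characters that get replaced
    replacement: str    # new text for that span, without the hyphen
    break_offset: int   # offset inside `replacement` where the line breaks


def break_word(word: str, alt: AltSpelling) -> tuple[str, str]:
    """Return (end of first line including hyphen, start of next line)."""
    if not 0 <= alt.break_offset <= len(alt.replacement):
        raise ValueError("break_offset must lie inside the replacement")
    head = word[:alt.change_pos]                     # untouched text before the span
    tail = word[alt.change_pos + alt.change_len:]    # untouched text after the span
    left = head + alt.replacement[:alt.break_offset]
    right = alt.replacement[alt.break_offset:] + tail
    return left + "-", right


# Change to the right of the hyphen: reëel -> re- eel
print(break_word("reëel", AltSpelling(2, 1, "e", 0)))
# Change to the left of the hyphen: Bäcker -> Bäk- ker
print(break_word("Bäcker", AltSpelling(2, 1, "k", 1)))
```

Because the break offset is independent of the span boundaries, the same record covers changes on the left, on the right, on both sides at once (Hungarian asszony -> asz- szony would be AltSpelling(1, 3, "szsz", 2)), and changes that are not adjacent to the hyphen, simply by widening the replaced span.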

------- Comment #22 From nemeth@openoffice.org 2010-03-10 10:41:33 -------
This is also a problem for the hyphenation of f ligatures.

efficiency -> ef-ficiency (Not even a simple fi -> f=i hyphenation works.)

(By the way, the automatic OpenType solution for ligature handling also has
potential problems: some languages, for example German, don't use ligatures at
word-part boundaries in compound words. Also, the HYPHENMIN values depend on
the usage of ligatures. The fi- can be at the end of a line in Hungarian,
but this hyphenation is deprecated with ligatures.)
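
To illustrate why a Unicode f ligature needs the same alternative-spelling machinery: a break inside a single ligature code point can only be shown by replacing it with its component letters. The Python sketch below is an assumption-laden illustration, not LibreOffice code; the names F_LIGATURES, expand_f_ligatures and break_expanded are made up here, and the break position is simply passed in rather than computed from hyphenation patterns.

```python
# Unicode f ligatures from the Alphabetic Presentation Forms block.
# An explicit table is used instead of NFKD normalization so that other
# characters (for example a precomposed ë) are left untouched.
F_LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
}


def expand_f_ligatures(word: str) -> str:
    """Replace each f ligature code point with its component letters."""
    return "".join(F_LIGATURES.get(ch, ch) for ch in word)


def break_expanded(word: str, pos: int) -> tuple[str, str]:
    """Break after the first `pos` letters of the expanded word.

    `pos` counts letters of the expanded spelling; where it comes from
    (hyphenation patterns, a dictionary) is outside this sketch.
    """
    expanded = expand_f_ligatures(word)
    return expanded[:pos] + "-", expanded[pos:]


# "e" + U+FB03 (ffi ligature) + "ciency", broken after "ef"
print(break_expanded("e\ufb03ciency", 2))
```

The sketch does not re-compose ligatures in the two halves (the second line could again start with an fi ligature); whether that is desirable depends on the language-specific points raised in the comment above.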
Comment 1 sasha.libreoffice 2012-03-06 07:33:36 UTC
@ erack@redhat.com
Please look at this bug when you have time, and tell me which expert would be the right one to talk to about such problems. I just guessed the expert from FindTheExpert, so I may have picked the wrong one.
Comment 2 Eike Rathke 2012-03-07 06:18:15 UTC
Sorry I can't help here, I'm not familiar at all with hyphenation. Judging from the copied comments #17 and #18 above I'd say that in Writer and EditEngine (that use the result of SvxGetAltSpelling()) the correct character changes would have to be implemented. So I think best would be some Writer developer.
Comment 3 sasha.libreoffice 2012-03-07 06:36:11 UTC
Thanks for reply
Comment 4 Joel Madero 2012-10-23 17:47:10 UTC
Per Eike's recommendation, CC'ing Michael on this one. 

@Michael: any opinions?

Regards,
Joel
Comment 5 László Németh 2013-04-22 11:19:45 UTC
Created attachment 78326 [details]
Test spreadsheet with hyphenation for Calc

Hyphenation patterns for the test file (replacement for the German hyph_de.dic): https://bugs.freedesktop.org/attachment.cgi?id=78225 (see Bug 63711)
Comment 6 László Németh 2013-04-22 11:23:04 UTC
Created attachment 78327 [details]
summary (screenshot in Calc)
Comment 7 László Németh 2013-04-22 11:43:34 UTC
This is also a problem for Calc and drawing shapes (there were more problems there, but most of them have been fixed, see Bug 63711); see the attached screenshot.
Comment 8 QA Administrators 2016-02-21 08:38:16 UTC Comment hidden (obsolete)
Comment 9 QA Administrators 2017-03-06 16:12:40 UTC Comment hidden (obsolete)
Comment 10 QA Administrators 2019-12-03 13:58:45 UTC Comment hidden (obsolete)
Comment 11 QA Administrators 2021-12-03 04:23:39 UTC Comment hidden (obsolete)
Comment 12 QA Administrators 2023-12-04 03:14:56 UTC
Dear Reinout van Schouwen,

To make sure we're focusing on the bugs that affect our users today, LibreOffice QA is asking bug reporters and confirmers to retest open, confirmed bugs which have not been touched for over a year.

There have been thousands of bug fixes and commits since anyone checked on this bug report. During that time, it's possible that the bug has been fixed, or the details of the problem have changed. We'd really appreciate your help in getting confirmation that the bug is still present.

If you have time, please do the following:

Test to see if the bug is still present with the latest version of LibreOffice from https://www.libreoffice.org/download/

If the bug is present, please leave a comment that includes the information from Help - About LibreOffice.
 
If the bug is NOT present, please set the bug's Status field to RESOLVED-WORKSFORME and leave a comment that includes the information from Help - About LibreOffice.

Please DO NOT

Update the version field
Reply via email (please reply directly on the bug tracker)
Set the bug's Status field to RESOLVED - FIXED (this status has a particular meaning that is not appropriate in this case)


If you want to do more to help you can test to see if your issue is a REGRESSION. To do so:
1. Download and install oldest version of LibreOffice (usually 3.3 unless your bug pertains to a feature added after 3.3) from https://downloadarchive.documentfoundation.org/libreoffice/old/

2. Test your bug
3. Leave a comment with your results.
4a. If the bug was present with 3.3 - set version to 'inherited from OOo';
4b. If the bug was not present in 3.3 - add 'regression' to keyword


Feel free to come ask questions or to say hello in our QA chat: https://web.libera.chat/?settings=#libreoffice-qa

Thank you for helping us make LibreOffice even better for everyone!

Warm Regards,
QA Team

MassPing-UntouchedBug