Bug 158885 - Hyphenate compound words at compound constituent boundaries, and not near them
Summary: Hyphenate compound words at compound constituent boundaries, and not near them
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Linguistic
Version (earliest affected): Inherited From OOo
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard: target:24.8.0 inReleaseNotes:24.8
Keywords:
Depends on:
Blocks: Hyphenation
 
Reported: 2023-12-27 11:07 UTC by László Németh
Modified: 2024-07-01 18:53 UTC (History)

See Also:
Crash report or crash signature:


Attachments
Swedish test document (24.19 KB, application/vnd.oasis.opendocument.text)
2023-12-31 14:17 UTC, László Németh
Hungarian test document (28.04 KB, application/vnd.oasis.opendocument.text)
2023-12-31 14:20 UTC, László Németh
Screenshot before fixing Hungarian hyphenation (101.54 KB, image/png)
2023-12-31 14:23 UTC, László Németh
Screenshot after fixing Hungarian hyphenation (125.75 KB, image/png)
2023-12-31 14:26 UTC, László Németh

Description László Németh 2023-12-27 11:07:19 UTC
Description:
For readability and by tradition, the orthography and typography of several languages, such as Danish, Dutch, German, Hungarian, Norwegian and Swedish, prefer or only allow hyphenation of compound words between their stems.

The hyphenation zone exists to avoid excessive or bad hyphenation. Preferring stem boundaries within the hyphenation zone is a natural extension of it: skip hyphenation points inside stems if there is a stem boundary within the hyphenation zone.

More information: “The hyphenation problems in Swedish have to do with the high frequency of compound words (the Swedish vocabulary can’t be enumerated: new compounds are easily created by anyone) and the rule that a compound word shall always be hyphenated between the constituent word parts, to ease the flow of reading.” Quoted in Notes on Compound Word Hyphenation in TeX by Petr Sojka, TUGboat, Volume 16 (1995), No. 3, Proceedings of the 1995 Annual Meeting (https://tug.org/TUGboat/tb16-3/tb48soj2.pdf).

Steps to Reproduce:
1. Install the Hungarian hyphenation patterns, then open the attached test file.

Actual Results:
“főbejárat” (main entrance) is hyphenated as “főbe-járat”, which yields two words with different meanings: “főbe” = “into head”, “járat” = “route/flight”.

Expected Results:
fő-bejárat (“main” and “entrance”)
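
The preference described above, applied to this example, can be sketched as follows. This is only an illustration under simplifying assumptions, not the LibreOffice or libhyphen implementation: the function names `choose_break` and `hyphenate_at` are hypothetical, the hyphenation zone is measured in characters rather than in a length unit, and the candidate break points for “főbejárat” are limited to the two mentioned in this report.

```python
def choose_break(candidates, stem_boundaries, max_fit, zone):
    """Pick the character offset at which to break a word, or None.

    candidates      -- offsets where a hyphen may be inserted
    stem_boundaries -- offsets that lie between compound constituents
    max_fit         -- largest offset that still fits on the current line
    zone            -- hyphenation zone width in characters (assumption:
                       a real layout engine would measure a length)
    """
    fitting = sorted(p for p in candidates if p <= max_fit)
    if not fitting:
        return None
    # Stem boundaries that fit on the line and lie inside the zone.
    stems_in_zone = [
        p for p in fitting
        if p in stem_boundaries and max_fit - p <= zone
    ]
    if stems_in_zone:
        # Prefer the last stem boundary over any break inside a stem.
        return stems_in_zone[-1]
    # No stem boundary in the zone: fall back to the last fitting break.
    return fitting[-1]


def hyphenate_at(word, pos):
    """Split word at pos, adding the hyphen to the first part."""
    return word[:pos] + "-", word[pos:]


word = "főbejárat"
candidates = {2, 4}   # fő|bejárat and főbe|járat (hypothetical input)
stems = {2}           # boundary between "fő" and "bejárat"

pos = choose_break(candidates, stems, max_fit=4, zone=2)
print(hyphenate_at(word, pos))
```

With `zone=2` the stem boundary lies inside the zone and is chosen, giving the expected “fő-bejárat”. With `zone=1` it falls outside the zone, so the sketch falls back to the last fitting point and reproduces “főbe-járat”.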


Reproducible: Always


User Profile Reset: No

Additional Info:
Note: libhyphen contains some compound word support and hard-wired arguments for limiting hyphenation near the stem boundaries, but only the German hyphenation patterns have started to use them, and the solution doesn't work for compounds containing custom words (see the “Grammar By” feature of the custom dictionaries of LibreOffice).
Comment 1 László Németh 2023-12-31 14:17:59 UTC
Created attachment 191668
Swedish test document
Comment 2 László Németh 2023-12-31 14:20:20 UTC
Created attachment 191669
Hungarian test document
Comment 3 László Németh 2023-12-31 14:23:56 UTC
Created attachment 191670
Screenshot before fixing Hungarian hyphenation
Comment 4 László Németh 2023-12-31 14:26:21 UTC
Created attachment 191671
Screenshot after fixing Hungarian hyphenation

(The hyphenation left unchanged in a few lines is there only for testing.)
Comment 5 Commit Notification 2023-12-31 23:43:18 UTC
László Németh committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/c899d3608d30f3ab4c2bc193c1fcd765221614a4

tdf#158885 sw: don't hyphenate right after a stem boundary

It will be available in 24.8.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 6 Commit Notification 2024-03-20 12:04:58 UTC
László Németh committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/3a332d9f1cacb3c6f81fcf6c08afa51d091ddff4

tdf#158885 cui offapi sw xmloff: fix hyphenation at stem boundary

It will be available in 24.8.0.

Comment 7 Commit Notification 2024-04-01 13:33:20 UTC
László Németh committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/help/commit/c2349a58a8e43f44a796f8e21053f4ca6de6fb4c

tdf#106733 tdf#158885 add "Exclude from hyphenation, Compound characters...