Bug 155310 - hyphenation line break function needs to be deprecated
Status: RESOLVED NOTABUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 7.2.2.2 release (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-05-15 06:33 UTC by Ron
Modified: 2023-06-07 16:16 UTC

See Also:
Crash report or crash signature:


Description Ron 2023-05-15 06:33:00 UTC
Description:
In modern usage, a hyphenated word should be treated as a single word for the purpose of line breaks.

In an earlier century, typewriters would ring a bell one character before the margin, and the current word could then be broken across lines using a hyphen. That need categorically no longer exists, least of all in a word processor.

Steps to Reproduce:
1. Place a hyphenated word at the end of a line so that its first part fits on the line and the rest of the word does not.
2. Adjust the margin until the break appears.


Actual Results:
Will a copyrighted OS needs to be downloaded every time you boot? I thin my good friend the editor-
in-chief thinks it’s a good idea.

Expected Results:
Will a copyrighted OS needs to be downloaded every time you boot? I thin my good friend the editor-in-chief thinks it’s a good idea.


Reproducible: Always


User Profile Reset: No

Additional Info:
I notice Google Docs gets this right. I just checked it.
Comment 1 Mike Kaganski 2023-05-15 07:11:00 UTC
Line breaking is discussed in detail in Unicode Standard Annex #14 [1], where the HYPHEN-MINUS is explicitly treated as a line breaking opportunity, except in numeric context.

It is completely unclear why this standardized behavior should be changed; is there some set of rules requiring this change?

[1] https://www.unicode.org/reports/tr14/
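
As a rough illustration of the rule described above, here is a minimal Python sketch. It is a deliberate simplification and not the UAX #14 algorithm, which defines many more character classes and rules. The function names `break_opportunities` and `split_at_opportunities` are hypothetical. The sketch allows a break after a space and after a HYPHEN-MINUS that is followed by a letter, but not after a HYPHEN-MINUS that is followed by a digit.

```python
def break_opportunities(text):
    """Return indices i such that a line may be broken just before text[i].

    Simplified assumption: only spaces and HYPHEN-MINUS are considered.
    """
    result = []
    for i in range(1, len(text)):
        prev, cur = text[i - 1], text[i]
        if prev == " " and cur != " ":
            result.append(i)  # break after a run of spaces
        elif prev == "-" and cur.isalpha():
            result.append(i)  # break after HYPHEN-MINUS before a letter
        # "-" followed by a digit (numeric context): no opportunity
    return result


def split_at_opportunities(text):
    """Cut text into the pieces delimited by the break opportunities."""
    cuts = [0] + break_opportunities(text) + [len(text)]
    return [text[a:b] for a, b in zip(cuts, cuts[1:])]


print(split_at_opportunities("editor-in-chief owes -5"))
# ['editor-', 'in-', 'chief ', 'owes ', '-5']
```

Note that "-5" stays in one piece while "editor-in-chief" may be split after either hyphen.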
Comment 2 Ron 2023-05-15 07:41:08 UTC
Note that there is an automatic hyphenation switch. It makes perfect sense to break a dictionary word when this feature, automatic hyphenation, is on, and to treat a dictionary word containing a hyphen as a single word when it is off.

Breaking dictionary words that contain hyphens at the hyphen while automatic hyphenation is off is inconsistent.

It also leaves the stub of the hyphen at the end of the line. From this artifact it is clear what the correct behavior should be. There is no valid appeal to documentation on this point.
Comment 3 Mike Kaganski 2023-05-15 08:28:08 UTC
(In reply to Ron from comment #2)

Hyphenation, meaning the splitting of words at syllable boundaries to fill lines (for example, when text is justified at both margins), is orthogonal to the HYPHEN-MINUS (and HYPHEN) discussed in the document. Those characters are used where the glyph must be shown irrespective of line breaks, as in your "editor-in-chief", so the sections of the document that discuss them apply to the issue discussed here.

SHY ("soft hyphen") is similar to auto-hyphenation, but that is not what you are discussing here.

Also, there is an explicit NON-BREAKING HYPHEN (U+2011), created specifically to express the author's intention to *suppress* the line breaking opportunity.

So, again: there is an existing rule set defined in the document I mentioned; there are existing means and established ways to avoid breaks; and there are *also* hyphenation rules, implemented separately. So far, you have not provided any reference to a rule set that would justify a change.
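
To make the characters mentioned above concrete, the following Python sketch prints their Unicode names using the standard `unicodedata` module and shows one possible way to mark a compound as unbreakable. The helper `protect_compound` is a hypothetical name introduced only for this illustration; it is not part of LibreOffice.

```python
import unicodedata

# The hyphen-like characters discussed in this thread.
for ch in ("\u002D", "\u2010", "\u2011", "\u00AD"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")


def protect_compound(text, compound):
    """Replace HYPHEN-MINUS with NON-BREAKING HYPHEN inside one compound."""
    return text.replace(compound, compound.replace("-", "\u2011"))


protected = protect_compound("the editor-in-chief thinks", "editor-in-chief")
print("-" in protected)  # False: both hyphens are now U+2011
```

In Writer itself the same character can be entered directly in the text, so no scripting is needed; the sketch only shows which code points are involved.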
Comment 4 TBeholder 2023-05-15 17:30:57 UTC
U+2011 NON-BREAKING HYPHEN (“‑”) is what the example requires, yes.
A better way to use different hyphen symbols (and minus) than “ASCII 0x2D and guessing” would definitely be useful, however.
Perhaps, for documents saved in a format that supports Unicode, any explicitly entered hyphen should be interpreted as non-breaking. When using an engine that supports automatic hyphenation, the inserted hyphen should be something distinct and easy to remove (maybe U+00AD SOFT HYPHEN).
The simplest solution is to use <KP_Minus> for minus and the upper-row <minus> for the (non-breaking) hyphen. The only problem is that some keyboards (laptop and on-screen ones) don’t have numpads.
Comment 5 ⁨خالد حسني⁩ 2023-05-15 18:53:52 UTC
I’m inclined to close this as NOTABUG. As Mike explained, the requested behavior is nonstandard and deviates from the established norms of Unicode and digital typography in general.
Comment 6 Ron 2023-05-16 04:24:59 UTC
When you turn off auto hyphenation, it still auto-hyphenates dictionary words containing hyphens.
Comment 7 Mike Kaganski 2023-05-16 05:07:52 UTC
(In reply to Ron from comment #6)
> When you turn off auto hyphenation, it still auto-hyphenates dictionary
> words containing hyphens.

No. Auto-hyphenation is a process of *inserting* hyphens where they weren't normally shown, and *then* using this as a break opportunity. Pre-existing hyphens are not *inserted* so no, these words are *not* "auto-hyphenated"; and these existing hyphens in compounds are treated as normal breaking opportunities (just like spaces, which are also sometimes part of "dictionary words"). This is called orthography.
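
The distinction can be seen in a tool that has no auto-hyphenation at all. Python's standard `textwrap` module never inserts hyphens, yet by default it treats existing hyphens in compound words as break opportunities (its `break_on_hyphens` option). This is only an analogy and says nothing about how Writer is implemented; the sample sentence and the width of 34 are chosen for illustration.

```python
import textwrap

text = "I think my good friend the editor-in-chief thinks it is a good idea."

# Default: existing hyphens in compounds are break opportunities.
broken = textwrap.wrap(text, width=34)

# Opting out: the compound is kept together and moves to the next line.
kept = textwrap.wrap(text, width=34, break_on_hyphens=False)

# Alternative: keep the default but use U+2011 NON-BREAKING HYPHEN.
protected = textwrap.wrap(text.replace("-", "\u2011"), width=34)

for name, lines in (("broken", broken), ("kept", kept), ("protected", protected)):
    print(name)
    for line in lines:
        print("   ", line)
```

In the first case a line ends in "editor-"; in the other two the compound is never split, and no hyphen is inserted in any of them.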