Bug 106077 - Treat hyphenation character U+002D same as U+2010
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Linguistic
Version (earliest affected): 5.2.5.1 release
Hardware: All Windows (All)
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Hyphenation
Reported: 2017-02-18 15:34 UTC by Alfred Spalt
Modified: 2024-03-09 14:40 UTC (History)

See Also:
Crash report or crash signature:


Attachments
different hyphenation-characters in spell-checking (28.64 KB, image/jpeg)
2017-02-18 15:34 UTC, Alfred Spalt
different hyphenation-charcters in spell-checking - improved (43.34 KB, image/jpeg)
2017-02-19 12:53 UTC, Alfred Spalt

Description Alfred Spalt 2017-02-18 15:34:43 UTC
Created attachment 131319
different hyphenation-characters in spell-checking

Prerequisites:
* One word out of the standard dictionary, e.g. "downhill"
* Another word that was added to the dictionary, e.g. "wahnsinn"

Current Behavior:
When I combine words of the two categories described above (e.g. downhill-wahnsinn) using the linguistically correct hyphenation character U+2010, spelling is shown as correct.

However, when I use the ASCII hyphenation character U+002D, an error is shown.

Expected Behavior:
It would be great if LibreOffice spell checking treated the hyphenation character U+002D the same as U+2010.
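
The requested behavior could be sketched roughly as follows. This is only an illustration in Python, not LibreOffice code: the names `HYPHENS` and `is_compound_correct` and the plain set used as a dictionary are assumptions made for the example.

```python
import re

# Assumed set of characters to accept as a hyphen inside a compound:
# HYPHEN-MINUS (U+002D) and HYPHEN (U+2010).
HYPHENS = "\u002D\u2010"
_SPLIT_RE = re.compile("[" + re.escape(HYPHENS) + "]")


def is_compound_correct(word, known_words):
    """Return True if every hyphen-separated part is a known word.

    `known_words` is a hypothetical stand-in for the spell checker's
    word list (a set of lowercase strings).
    """
    parts = _SPLIT_RE.split(word)
    # An empty part (leading, trailing or doubled hyphen) counts as an error.
    return all(part and part.lower() in known_words for part in parts)


known = {"downhill", "wahnsinn"}
print(is_compound_correct("downhill-wahnsinn", known))       # U+002D
print(is_compound_correct("downhill\u2010wahnsinn", known))  # U+2010
```

With both characters in the separator set, the two spellings get the same result, which is the behavior asked for here.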
Comment 1 Alfred Spalt 2017-02-19 12:53:28 UTC
Created attachment 131339
different hyphenation-charcters in spell-checking - improved
Comment 2 Xisco Faulí 2018-06-12 10:49:32 UTC
Thank you for reporting the bug.
Could you please try to reproduce it with the latest version of LibreOffice
from https://www.libreoffice.org/download/libreoffice-fresh/ ?
I have set the bug's status to 'NEEDINFO'. Please change it back to
'UNCONFIRMED' if the bug is still present in the latest version.
Comment 3 Alfred Spalt 2018-06-20 14:54:22 UTC
The "bug" is still present in version 6.0.4.2.
However, it is not a bug at all. Just a change request. As I stated before: a feature that would be nice to have.
Comment 4 Alfred Spalt 2018-06-20 14:56:13 UTC
I just noticed that character U+2010 is not present in some fonts. I reproduced the "bug" with U+2012, which is present e.g. in the Liberation font family.
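
When reproducing this, it can help to confirm which dash-like code points a test string really contains, since they are hard to tell apart on screen. A small Python sketch using the standard `unicodedata` module (the helper name `dash_characters` is made up for this example):

```python
import unicodedata


def dash_characters(text):
    """List (code point, Unicode name) for every dash-punctuation character.

    Characters are selected by Unicode general category "Pd"
    (Punctuation, dash), which includes U+002D, U+2010 and U+2012.
    """
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch))
        for ch in text
        if unicodedata.category(ch) == "Pd"
    ]


for code_point, name in dash_characters("a-b\u2010c\u2012d"):
    print(code_point, name)
```

According to the Unicode character names, U+002D is HYPHEN-MINUS, U+2010 is HYPHEN, and U+2012 is FIGURE DASH.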
Comment 5 Buovjaga 2018-06-24 19:13:24 UTC

*** This bug has been marked as a duplicate of bug 85731 ***
Comment 6 ⁨خالد حسني⁩ 2018-06-26 17:29:50 UTC
This is a different issue from bug 85731, which is about the character being inserted at a line break during hyphenation (i.e. output), not about which characters are recognized as hyphens during input.
Comment 7 Alfred Spalt 2018-06-26 20:03:25 UTC
From a user's perspective, the situation is even more tricky:

Hyphen characters U+002D and U+2010 are both treated as word separators if both words are in the standard dictionary.
U+002D is no longer treated as a word separator as soon as one of the words is taken from a user's dictionary. See attachment 131339.

So for me it looks like this is not an issue of the hyphenation library but rather one of how LO treats words from different dictionaries.
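
To make the expectation concrete: each part of a compound would be looked up in both the standard and the user dictionary, regardless of which hyphen character joins the parts. The following is a hypothetical Python sketch, not how LibreOffice is implemented; `classify_parts` and the two word sets are assumptions made for illustration.

```python
# Assumed separator set: HYPHEN-MINUS (U+002D) and HYPHEN (U+2010).
HYPHENS = ("\u002D", "\u2010")


def classify_parts(word, standard_words, user_words):
    """Return (part, source) pairs for a hyphenated compound.

    `source` is "standard", "user", or None if the part is unknown.
    Both word lists are hypothetical sets of lowercase strings.
    """
    # Normalize every accepted hyphen to the first one, then split once.
    for hyphen in HYPHENS[1:]:
        word = word.replace(hyphen, HYPHENS[0])
    result = []
    for part in word.split(HYPHENS[0]):
        key = part.lower()
        if key in standard_words:
            source = "standard"
        elif key in user_words:
            source = "user"
        else:
            source = None
        result.append((part, source))
    return result


standard = {"downhill"}
user = {"wahnsinn"}
print(classify_parts("downhill-wahnsinn", standard, user))
print(classify_parts("downhill\u2010wahnsinn", standard, user))
```

In this sketch the compound is accepted when no part has source `None`, so a word from the user dictionary does not change how the separator is handled.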
Comment 8 João Paulo 2024-03-09 14:40:11 UTC
I can confirm this bug still exists on:

Version: 24.2.1.2 (X86_64) / LibreOffice Community
Build ID: db4def46b0453cc22e2d0305797cf981b68ef5ac
CPU threads: 8; OS: Windows 10.0 Build 22631; UI render: Skia/Raster; VCL: win
Locale: pt-BR (pt_BR); UI: pt-BR
Calc: threaded

However, I noticed that, at least in PT-BR, the character U+002D is shown as correct, but the character U+2010 is shown as incorrect, when hyphenating words such as: ampliá‐la (verb+hyphen+pronoun).

From my point of view as a user, both characters should be considered correct for hyphenation, even if the technically more correct character (unambiguous and defined in the Unicode Standard) is U+2010.

It may seem like a frivolous request, since U+002D is on the keyboard and U+2010 is not; however:

* there are fonts that show different glyphs for each of them (the height and/or the size is adjusted), and the difference is more noticeable when using italic or bold formatting;
* one of the use cases of LibreOffice is to create EPUBs and PDFs, and beautiful books use proper typographic characters or even obey a style guide (the Chicago Manual of Style is one of the best known).