Bug 158127 - INDEX should use en dash (not hyphen) for number ranges
Status: ASSIGNED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): unspecified
Hardware: All All
Importance: medium normal
Assignee: Benjamin Johll
URL:
Whiteboard:
Keywords: difficultyMedium, easyHack, skillCpp
Depends on:
Blocks: TableofContents-Indexes Authors
Reported: 2023-11-09 10:16 UTC by R. Green
Modified: 2025-05-02 03:58 UTC
CC List: 7 users

See Also:
Crash report or crash signature:


Description R. Green 2023-11-09 10:16:02 UTC
Version: 7.5.4.2 (X86_64) / LibreOffice Community
Build ID: 36ccfdc35048b057fd9854c757a8b67ec53977b6
CPU threads: 2; OS: Linux 5.4; UI render: default; VCL: gtk3
Locale: en-GB (en_GB.UTF-8); UI: en-GB
Calc: threaded

Number ranges are ALWAYS written by using dashes, e.g. 23–29, NOT hyphens (i.e. NOT 23-29).

Unfortunately, indexes are generated in LO Writer using hyphens rather than the correct em dashes.

So, hyphens in index number ranges need to be replaced with em dashes.
Comment 1 R. Green 2023-11-24 09:51:52 UTC
Big oops! That should have been EN (repeat EN) dashes NOT em dashes.
Comment 2 Dieter 2023-11-26 12:32:52 UTC
As far as I can see, "ALWAYS" is not true. Wikipedia says, for example, that APA style uses an en dash, while AMA style uses a hyphen: https://en.wikipedia.org/wiki/Dash

So perhaps there should be an option in the index dialog. The option "Combine with -" is too vague. Having the options "Combine with hyphen" and "Combine with en dash" would be an enhancement.

cc: Design-Team
Comment 3 Heiko Tietze 2023-11-27 11:08:57 UTC
A quick and dirty solution would be to change 'aNumStr += "-";' in sw/source/core/doc/doctxm.cxx. But I like the idea of an option, which should be offered in the ToC dialog as a dropdown list (I cannot think of any list separators other than dashes) instead of "Combine with -".

I wonder if the file format has any restriction and what MSO makes out of those documents. If I manually replace the dash it's read in both Writer and MSO correctly (of course replaced on update).
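
For illustration, the idea of replacing the hard-coded "-" with a selectable separator could be sketched roughly as below. This is not the code in doctxm.cxx: joinPageRange and its separator parameter are hypothetical names, and std::string with UTF-8 bytes stands in for LibreOffice's own string types.

```cpp
#include <string>

// Hypothetical helper, not LibreOffice API: joins the first and last page
// of a range with a caller-chosen separator instead of a hard-coded "-".
// The default separator is U+2013 EN DASH, written as UTF-8 bytes.
std::string joinPageRange(int first, int last,
                          const std::string& separator = "\xE2\x80\x93")
{
    std::string numStr = std::to_string(first);
    if (last != first)
    {
        // Only append the separator and end page for a real range.
        numStr += separator;
        numStr += std::to_string(last);
    }
    return numStr;
}
```

Calling joinPageRange(23, 29, "-") would reproduce the current hyphen behaviour, so an option in the dialog would only need to pick which separator string is passed in.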
Comment 4 V Stuart Foote 2023-11-27 13:16:01 UTC
Another facet is localization. The TOC/Index generator (core/tox and header) seems to have additional TOC/Index structure for CJK and CTL nodes.

Rather than just the appended U+002D HYPHEN-MINUS as U+2013 EN DASH what could other locales require?
Comment 5 Heiko Tietze 2023-11-27 13:22:15 UTC
(In reply to V Stuart Foote from comment #4)
> Rather than just the appended U+002D HYPHEN-MINUS as U+2013 EN DASH what
> could other locales require?

Wikipedia lists four types: En dash, Em dash, Horizontal bar, Figure dash; adding the U+002D hyphen makes it five. I can also imagine running text, <1> "to" <2> (localized, of course).
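
As a rough sketch, the five candidates for such a dropdown could be modelled as below. RangeSeparator and toUtf8 are hypothetical names chosen for illustration, not existing LibreOffice identifiers; the code points are the standard Unicode ones, written out as UTF-8 bytes.

```cpp
#include <string>

// Hypothetical option type for the index dialog; names are illustrative only.
enum class RangeSeparator { HyphenMinus, FigureDash, EnDash, EmDash, HorizontalBar };

// Returns the chosen separator as a UTF-8 encoded string.
std::string toUtf8(RangeSeparator sep)
{
    switch (sep)
    {
        case RangeSeparator::HyphenMinus:   return "-";             // U+002D
        case RangeSeparator::FigureDash:    return "\xE2\x80\x92";  // U+2012
        case RangeSeparator::EnDash:        return "\xE2\x80\x93";  // U+2013
        case RangeSeparator::EmDash:        return "\xE2\x80\x94";  // U+2014
        case RangeSeparator::HorizontalBar: return "\xE2\x80\x95";  // U+2015
    }
    return "-"; // not reached for valid enum values; avoids a missing-return warning
}
```

A localized word such as "to" would not fit this enum and would need a free-text field or a per-locale string instead.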
Comment 6 Heiko Tietze 2024-01-31 09:09:20 UTC
No further input, let's implement.
Comment 7 Benjamin Johll 2025-04-25 07:28:30 UTC
Working on implementing this.
Comment 8 Tex2002ans 2025-05-02 03:58:10 UTC
Awesome, thanks Benjamin.

Can't wait to see this added. :)

- - -

On Comment #3 and Comment #5: In this case, between number ranges, all we need is the:

- - = U+002D = HYPHEN-MINUS
--- This is the one on your keyboard.
- – = U+2013 = EN DASH
--- This is the typographically correct choice.

- - -

Technical Note on Dashes: There are quite a few other "dash-like" characters in Unicode.

But none of those are used in this specific case, and/or they come with serious side effects, like:

- Missing in many fonts.
- Broken Text-to-Speech.

I've written about this extensively over the years.

For example:

- https://www.reddit.com/r/libreoffice/comments/wxp7ps/make_it_look_beautiful/ilwpn34/
--- See my "Tip #5: Use the Proper Dashes".
--- This covers the most common 3/4 types.
- https://www.reddit.com/r/PubTips/comments/lvfad3/pubq_quick_question_about_the_em_dash/gpqmen9/
--- "Dash/Hyphen Basics"
--- More proper use-cases.
- https://www.reddit.com/r/writing/comments/9q1jzi/punctuation_is_important_too/e88105a/?context=3
--- Covering some Text-to-Speech issues.
- https://www.mobileread.com/forums/showthread.php?p=3952918#post3952918
--- Covering HORIZONTAL BAR / "quotation dash" / U+2015.
--- Some languages use this in the beginning of dialogue/QUOTATIONS, not number ranges.
- https://www.reddit.com/r/libreoffice/comments/1jk0qa5/how_to_replace_with_em_dash/mk15yqw/
--- Covering dozens of extremely obscure "symbols that look like lines".