Bug 128994 - EDITING / FORMATTING Bad management of em dash in Spanish language texts
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: Inherited From OOo (earliest affected)
Hardware: All All
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: needsDevAdvice
Depends on:
Blocks: AutoCorrect-Complete
Reported: 2019-11-24 15:21 UTC by José Moya
Modified: 2023-04-22 11:25 UTC
CC List: 6 users

See Also:
Crash report or crash signature:


Description José Moya 2019-11-24 15:21:22 UTC
Description:
In Spanish, the em dash (—) is a parenthetical sign. This means it has to be treated like brackets or quotes when breaking lines.

In Spanish, dialogue is written this way:
—Blah blah blah —lorem ipsum dolor— blah

Notice there is no space between the first — and "Blah": this marks the start of dialogue.

There is also no space between "lorem ipsum dolor" and the surrounding dashes. This means "lorem ipsum dolor" should be read as an inserted comment.

If a line begins with a dash (—), it is read as quoted dialogue, so the line should never break before the third dash (—), only after it, just as it would if a quote character were in its place.

This problem was reported to OOo years ago, and it seems no one picked it up. I hoped some LibreOffice user would notice the bug report in the forked app, but LibreOffice still handles "—" badly in Spanish.
Your competitor MS Word has behaved correctly since version 2013.

(More information about the use of dashes in Spanish is available on the Royal Spanish Academy website at http://lema.rae.es/dpd/?key=raya )





Steps to Reproduce:
1. Select Spanish as the language.
2. Type "—baah —said little sheep— baah."
3. Add more "baah" before "—said" until "— baah." wraps onto a new line.
4. The second line will read "— baah". This is undesired.

Actual Results:
—baah baah baah —said little sheep
— baah

Expected Results:
—baah baah baah —said little sheep—
baah


Reproducible: Always


User Profile Reset: Yes



Additional Info:
Each language in the world has its own typographical rules. OOo assumed all languages had the same rules as English and hardcoded some line break rules. I found this out when trying to make a quick fix for OOo.
Comment 1 V Stuart Foote 2019-11-24 20:25:09 UTC
Reading Unicode UAX#14 [1], this seems to be covered by the B2 class and the LB17 rule for handling U+2014 EM DASH. It is not clear, though, that the matter is fully resolved for all Unicode participants, and the ICU-8061 issue [2] remains unresolved.

At some point a fix should make it into the ICU libs and become available for possible use in the LibreOffice edit engine(s). Until then it remains incumbent on users to mind their formatting.

And, as suggested in the AOO issue i122337, while writing dialogue one could set the AutoCorrect localized options for the opening and closing 'Double Quotes' to U+2014 by picking the EM DASH character. This is done from Tools -> AutoCorrect -> AutoCorrect Options -> Localized Options.

=-ref-=
[1] http://www.unicode.org/reports/tr14/tr14-40.html
[2] https://unicode-org.atlassian.net/browse/ICU-8061?page=com.atlassian.jira.plugin.system.issuetabpanels%3Aall-tabpanel
Comment 2 V Stuart Foote 2019-11-24 20:49:16 UTC
It is not clear whether the dev effort needed to control line-break handling of U+2014 EM DASH for es-XX locales should be made outside of what the ICU libs provide.

During editing, inserting a ZWNBSP (U+FEFF) before or after the EM DASH is trivial. Annoying to have to do it, but easily done.

Worth keeping open on the back burner? Or, best to issue a WONTFIX until Unicode and ICU can resolve?
Comment 3 Heiko Tietze 2019-11-25 06:58:47 UTC
If resolved, it has to be a NOB. But I would keep it. It would be interesting to know whether the line break depends on OS and program, e.g. I wonder how simple text editors deal with it.
Comment 4 José Moya 2019-11-25 22:21:21 UTC
Hi!


I have reported to Unicode Consortium. I can't believe the Spanish companies and institutions represented at Unicode Consortium did not report this, but, hey, they are Spanish, so they are not supposed to do their job. 

I want to clarify that I am the same "Arcalaus" who wrote about quotes at the link https://bz.apache.org/ooo/show_bug.cgi?id=122337 

But there is a misunderstanding. I am not talking about using the Autoquotes to change quotes into em dashes. That would not solve the situation.

I am talking about what, in Unicode jargon, would be, "removing em dash from B2 and assigning it to QU if the language setting is Spanish".
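
To make the B2 versus QU distinction concrete, here is a minimal Python sketch of a toy line breaker. It is not the UAX#14 algorithm and not LibreOffice or ICU code: the names `break_opportunities` and `wrap` are made up for illustration, character counts stand in for real text width, and the only rules modelled are "a break is allowed after a space" plus, in the B2-like mode, "a break is also allowed directly before or after an em dash".

```python
EM_DASH = "\u2014"


def break_opportunities(text, dash_class):
    """Return indices i where a line may break between text[i-1] and text[i].

    dash_class "B2" treats the em dash as breakable on both sides;
    "QU" treats it like a quotation mark, so only spaces create breaks.
    """
    if dash_class not in ("B2", "QU"):
        raise ValueError("dash_class must be 'B2' or 'QU'")
    points = []
    for i in range(1, len(text)):
        if text[i] == " ":
            continue  # a new line never starts with a space
        after_space = text[i - 1] == " "
        beside_dash = dash_class == "B2" and EM_DASH in (text[i - 1], text[i])
        if after_space or beside_dash:
            points.append(i)
    return points


def wrap(text, width, dash_class):
    """Greedy wrap: each line takes the last break opportunity that still fits."""
    points = break_opportunities(text, dash_class) + [len(text)]
    lines, start = [], 0
    while start < len(text):
        candidates = [p for p in points if p > start]
        # Trailing spaces are not counted against the line width.
        fitting = [p for p in candidates if len(text[start:p].rstrip()) <= width]
        end = fitting[-1] if fitting else candidates[0]
        lines.append(text[start:end].rstrip())
        start = end
    return lines


sample = "\u2014baah baah baah \u2014said little sheep\u2014 baah."
for dash_class in ("B2", "QU"):
    print(dash_class, wrap(sample, 34, dash_class))
```

With a width of 34 characters, the B2-like mode leaves "— baah." alone on the second line, as in the Actual Results above, while the QU-like mode moves "sheep— baah." down together. With a width of 35, the QU-like mode gives exactly the Expected Results.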

 
Also, please notice that inserting a ZWNBSP before every em dash would be a pain in the *ss. Just imagine a 50k word novel with 10000 lines of dialogue, every one featuring between one and three em dashes. Inserting the ZWNBSP by hand is difficult, and programming a macro to do it is far harder (I can program an MS Word macro in minutes, but an OOo / LibreOffice macro takes me weeks).
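
For reference, the workaround itself can be sketched outside the office suite. The Python below is plain string processing, not a LibreOffice macro, and the function name `glue_dashes` is hypothetical. It assumes that an opening dash is one followed directly by a non-space character and a closing dash is one preceded directly by a non-space character. Whether LibreOffice's line breaker honours the inserted U+FEFF is not verified here; U+2060 WORD JOINER can be passed instead, since Unicode recommends it over U+FEFF for this purpose.

```python
EM_DASH = "\u2014"
ZWNBSP = "\ufeff"  # ZERO WIDTH NO-BREAK SPACE, as suggested in comment 2


def glue_dashes(text, joiner=ZWNBSP):
    """Insert `joiner` between each em dash and the word it is attached to."""
    out = []
    for i, ch in enumerate(text):
        if ch != EM_DASH:
            out.append(ch)
            continue
        # Treat the start and end of the text like a space.
        prev = text[i - 1] if i > 0 else " "
        nxt = text[i + 1] if i + 1 < len(text) else " "
        if not prev.isspace() and prev != joiner:
            out.append(joiner)  # closing dash: glue it to the word before
        out.append(ch)
        if not nxt.isspace() and nxt != joiner:
            out.append(joiner)  # opening dash: glue it to the word after
    return "".join(out)


line = "\u2014baah \u2014said little sheep\u2014 baah."
glued = glue_dashes(line)
print(len(glued) - len(line), "joiners inserted")
```

Because an existing joiner next to a dash is skipped, running the function twice on the same text does not add duplicates.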

Finally, there is the question of compatibility between platforms. I am writing to you because at my new job they use LibreOffice. I have Word at home. A literature exam prepared at home, with the em dashes in the right places, turns into a piece of crap (em dashes put in places where they should not go) when I print it at my workplace. Yes, I could use PDF, but I am against PDF for my own reasons.


Yours,

José Moya
Comment 5 V Stuart Foote 2019-11-25 23:50:32 UTC
(In reply to José Moya from comment #4)
> I have reported to Unicode Consortium. I can't believe the Spanish companies
> and institutions represented at Unicode Consortium did not report this...

The issue was already reported and open with Unicode (at least the ICU project) as ICU-8061 [1]; and looks like your report was ICU-10754 [2]. 

Pending any LibreOffice dev comment, we'll either close WF, or set aside to see what comes out in ICU libs (or maybe in CLDR) that we might implement.

=-ref-=
[1] https://unicode-org.atlassian.net/browse/ICU-8061
[2] https://unicode-org.atlassian.net/browse/ICU-10754
Comment 6 Heiko Tietze 2019-11-26 13:34:43 UTC
(In reply to José Moya from comment #4)
> please notice that inserting a ZWNBSP before every em dash would be a pain

Not a zero-width but a narrow non-breaking space, U+202F, is available under Insert > Formatting Mark or via Shift+Alt+Space since 6.3, IIRC.

And you could also create a new rule for autocorrection. That is just a workaround; a correct implementation is needed, of course.