162502 – Treat direction-neutral characters with language according to their role in that language

Status:	UNCONFIRMED

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	LibreOffice
Version: (earliest affected)	Inherited From OOo
Hardware:	All All

Importance:	medium normal
Assignee:	Not Assigned

URL:
Whiteboard:	QA:needsComment
Keywords:

Depends on:	148257
Blocks:	129038 153378 RTL Script-Assignment

Reported:	2024-08-17 16:48 UTC by Eyal Rozenberg
Modified:	2025-04-02 20:12 UTC
CC List:	3 users

See Also:
Crash report or crash signature:

Description Eyal Rozenberg 2024-08-17 16:48:50 UTC

This bug depends on bug 148257 being fixed, i.e. on our having stretches of text which are explicitly/definitely marked as being in a certain language. When that is the case, we should qualify the application of the Unicode Bidirectional Algorithm when it comes to neutral characters like '-', '?', Western Arabic digits etc.:

When a direction-neutral character is marked as being in a language in which it does not interrupt directional runs of text in that language, e.g. '-' in English, where everything is LTR, we should treat it as a strongly-directional character with the directionality of that language.

Thus, for example, if I write "-fax" in an RTL paragraph, the visual layout will be:

fax-

that is, two runs: a 1-character RTL run followed by a 3-character LTR run. But if we mark this text as being in English, we should see:

-fax

that is, a single LTR run, even though '-' is a neutral character in general, because we know that the minus is part of a sequence of characters in English.
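
To make the example concrete, here is a minimal Python sketch of the proposed rule. It is not LibreOffice code and not a full implementation of the Unicode Bidirectional Algorithm: every non-strong character is simply resolved from its nearest strong neighbours (a rough stand-in for rules N1/N2), and explicit embeddings and the weak-type rules for numbers are ignored. The names `LTR_ONLY_LANGS`, `char_direction` and `direction_runs` are hypothetical, and the sketch only covers languages in which everything is LTR; RTL languages would need a finer per-character table.

```python
import unicodedata
from itertools import groupby

# Assumption: languages in which every character belongs to an LTR run.
LTR_ONLY_LANGS = {"en"}

# Strong Unicode bidi classes mapped to a plain direction.
STRONG = {"L": "L", "R": "R", "AL": "R"}


def char_direction(ch, lang=None):
    """Return 'L' or 'R' for a strong character, None for a neutral/weak one.

    Proposed rule: a non-strong character marked as being in an
    LTR-only language is treated as strongly LTR.
    """
    strong = STRONG.get(unicodedata.bidirectional(ch))
    if strong is None and lang in LTR_ONLY_LANGS:
        return "L"
    return strong


def direction_runs(text, para_dir, lang=None):
    """Split text into (direction, substring) runs, in logical order.

    Simplified resolution: a non-strong character takes the direction
    of its surrounding strong characters if they agree, otherwise the
    paragraph direction. Text edges count as the paragraph direction.
    """
    dirs = [char_direction(ch, lang) for ch in text]
    resolved = []
    for i, d in enumerate(dirs):
        if d is None:
            before = next((x for x in reversed(dirs[:i]) if x), para_dir)
            after = next((x for x in dirs[i + 1:] if x), para_dir)
            d = before if before == after else para_dir
        resolved.append(d)
    return [
        (d, "".join(ch for _, ch in group))
        for d, group in groupby(zip(resolved, text), key=lambda p: p[0])
    ]


print(direction_runs("-fax", "R"))             # unmarked text
print(direction_runs("-fax", "R", lang="en"))  # text marked as English
```

For unmarked text the sketch yields the two runs described above; with the text marked as English it yields a single LTR run.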


Caveat: Some languages, like Japanese, may not have a single directionality; in that case we should either treat the character as neutral or apply some other logic.

------------------------------

Alternative, weaker option: Instead of treating the character as strongly-directional, "bias" the direction of the neutral character, so that a stretch of neutral characters takes its language's direction if it has a stretch of characters in the same language either before or after it.
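
One possible reading of this weaker option, as a hedged Python sketch: each character carries a language tag, and a stretch of non-strong characters takes its language's direction only when the strong character immediately before or after the stretch carries the same tag. Otherwise it falls back to a simplified neutral resolution (neighbours agree, else paragraph direction). The table `LANG_DIR` and the function `biased_directions` are hypothetical names, and, as in any such simplification, number handling and embeddings are ignored.

```python
import unicodedata

# Assumption: languages with a single base direction.
LANG_DIR = {"en": "L", "he": "R"}

# Strong Unicode bidi classes mapped to a plain direction.
STRONG = {"L": "L", "R": "R", "AL": "R"}


def biased_directions(chars, para_dir):
    """chars is a list of (character, language tag or None) pairs.

    Returns one 'L' or 'R' per character, in logical order.
    """
    strong = [STRONG.get(unicodedata.bidirectional(ch)) for ch, _ in chars]
    out = list(strong)
    i = 0
    while i < len(chars):
        if strong[i] is not None:
            i += 1
            continue
        # Find the maximal stretch [i, j) of non-strong characters.
        j = i
        while j < len(chars) and strong[j] is None:
            j += 1
        before = strong[i - 1] if i > 0 else para_dir
        after = strong[j] if j < len(chars) else para_dir
        fallback = before if before == after else para_dir
        # Language tags of the strong characters adjacent to the stretch.
        neighbour_langs = {
            chars[k][1] for k in (i - 1, j) if 0 <= k < len(chars)
        }
        for k in range(i, j):
            lang = chars[k][1]
            bias = LANG_DIR.get(lang)
            # Bias only if a same-language character adjoins the stretch.
            out[k] = bias if bias and lang in neighbour_langs else fallback
        i = j
    return out


english = [(c, "en") for c in "-fax"]
untagged = [(c, None) for c in "-fax"]
print(biased_directions(english, "R"))
print(biased_directions(untagged, "R"))
```

Under these assumptions, the '-' in "-fax" becomes LTR when the whole sequence is tagged as English, and keeps the paragraph direction when no language is known.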

Comment 1 Eyal Rozenberg 2024-09-21 15:49:24 UTC Comment hidden (invalid, obsolete)

To "reproduce" this, do the following in LO and in MS Word:

1. Create a new document
2. Make your paragraph LTR
3. Be in an English keyboard layout
4. Type a
5. Be in a Hebrew keyboard layout and type ALEPH, or alternatively paste ALEPH (but don't insert it using the SCD)
6. Be in an English keyboard layout
7. Type: equals, 5

In LO, the direction of ALEPH,equals,5 will flip when you type the 5.
In MS Word - it will not, provided that you were using an English keyboard layout.

I believe this is due to MS Word infusing the characters with a language - and thus not treating them as purely neutral.

You can verify that by selecting individual characters after typing or pasting each additional character. In LO, the equals, for example, will start out as English, but will then apparently become Hebrew once you type the 5. In MS Word, the equals remains English: Word remembers that when you typed it, you were typing English.

Mike notes that in MS Notepad you get the same behavior as in LO, because Notepad is a plain text editor and does not save the language a character is supposed to be in, just the characters themselves.