158099 – Auto numbering needs more options for RTL languages to change direction of numbers

Status:	RESOLVED DUPLICATE of bug 149824

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	LibreOffice
Version: (earliest affected)	7.6.2.1 release
Hardware:	All All

Importance:	medium enhancement
Assignee:	Not Assigned

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:	RTL

Reported:	2023-11-07 06:56 UTC by M.Mahdi
Modified:	2024-08-03 09:15 UTC
CC List:	4 users

See Also:	155470
Crash report or crash signature:

Attachments
Wrong (upper part) and correct (lower part) autonumbering (48.10 KB, image/jpeg) 2023-11-07 07:00 UTC, M.Mahdi

Description M.Mahdi 2023-11-07 06:56:19 UTC

Description:
Hi.
As a native Persian writer, I have a serious problem with auto numbering in LibreOffice, especially in Writer: RTL numbering. Assume I have this text:

1. Chapter 1
1.1. Section 1
1.2. Section 2
1.2.1. Subsection 1
1.2.2. Subsection 2
1.2.3. Subsection 3

These are Heading 1 to Heading 3. In left-to-right languages this is fine because it is semantically correct: the first part of the number is the chapter, the second is the section, and the third is the subsection.
In RTL languages, the standard for numbering is right to left. That is, the text above should look like this:

1. فصل اول
1.1. بخش اول
2.1. بخش دوم
1.2.1. زیربخش اول
2.2.1. زیربخش دوم
3.2.1. زیربخش سوم

But in the current version of LibreOffice, it looks like this:

1. فصل اول
1.1. بخش اول
1.2. بخش دوم
1.2.1. زیر بخش اول
1.2.2. زیربخش دوم
1.2.3. زیربخش سوم

Here the numbering runs left to right, which is semantically wrong.
So my request is to add an option, in the numbering settings or in localization, to make auto numbering run RTL like the text.
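To make the two orders concrete, here is a minimal Python sketch. The function name `format_label` and its `rtl` flag are hypothetical and are not part of LibreOffice; the function only builds the label as a plain string and does not model how the text engine lays it out.

```python
def format_label(levels, rtl=False, sep="."):
    """Build a multi-level label such as '1.2.3.' from outline level counters.

    levels: counters from outermost to innermost, e.g. [1, 2, 3]
            for chapter 1, section 2, subsection 3.
    rtl:    hypothetical option; when True the parts are emitted innermost
            first, so the chapter number ends up at the right-hand side.
    """
    parts = [str(n) for n in levels]
    if rtl:
        parts.reverse()
    return sep.join(parts) + sep


if __name__ == "__main__":
    print(format_label([1, 2, 3]))            # current behaviour
    print(format_label([1, 2, 3], rtl=True))  # requested behaviour
```

With `rtl=True`, chapter 1, section 2, subsection 3 gives `3.2.1.`, matching the requested list above.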

Steps to Reproduce:
1. Write some text.
2. Enable auto numbering by toggling Ordered List, and change the outline format to enable sublevels.
3. Make the text RTL.

Actual Results:

1. فصل اول
1.1. بخش اول
1.2. بخش دوم
1.2.1. زیر بخش اول
1.2.2. زیربخش دوم
1.2.3. زیربخش سوم

Expected Results:

1. فصل اول
1.1. بخش اول
2.1. بخش دوم
1.2.1. زیربخش اول
2.2.1. زیربخش دوم
3.2.1. زیربخش سوم

Reproducible: Always

User Profile Reset: No

Additional Info:
Version: 7.6.2.1 (AARCH64) / LibreOffice Community
Build ID: 56f7684011345957bbf33a7ee678afaf4d2ba333
CPU threads: 8; OS: Mac OS X 14.0; UI render: Skia/Metal; VCL: osx
Locale: en-US (en_US.UTF-8); UI: en-US
Calc: threaded

Comment 1 M.Mahdi 2023-11-07 07:00:38 UTC

Created attachment 190698
Wrong (upper part) and correct (lower part) autonumbering

Comment 2 افشین 2023-11-07 07:52:00 UTC

I confirm it.

Comment 3 Hossein 2023-11-07 11:45:31 UTC

I confirm that at least some universities provide thesis templates that require such numbering.

It is worth noting that the separator here is a full stop (.), which is not common. It has two problems:
1. It may change to / according to the fa_IR locale.
2. The usual separator in Persian is the dash (-).

> In RTL languages, the standard for numbering is Right to Left.
It is not standard or universal. That is why I think this feature request should be implemented as an option that lets users choose whether the numbers are built left to right (as before) or right to left.

On the other hand, in both cases the rendering should be compatible with MS Word. This seems to be the real issue here. As an example, see the table of contents in this document:

TOC on page 11:
https://zand.ac.ir/wp-content/uploads/2020/10/%D9%81%D8%B1%D9%85%D8%AA-%D9%82%D8%A7%D9%84%D8%A8-%D9%BE%D8%B1%D9%88%DA%98%D9%87-%D9%BE%D8%A7%DB%8C%D8%A7%D9%86%DB%8C-%DA%A9%D8%A7%D8%B1%D8%AF%D8%A7%D9%86%DB%8C-%D9%88-%DA%A9%D8%A7%D8%B1%D8%B4%D9%86%D8%A7%D8%B3%DB%8C.docx 

Note that in Word, -1-1-2 comes after -2-1-1, but in LibreOffice it is the other way around.

It is a manual table of contents, but the problem is visible.

@M.Mahdi:
Can you provide an example file that renders as you want in MS Word but renders incorrectly in LibreOffice Writer?

Comment 4 ajlittoz 2023-11-07 21:11:05 UTC

IMHO, the problem comes from too simplistic a number recognition.

In Arabic/Persian, text is written RTL but numbers are written the same as in European scripts, i.e. most significant digit at left, least significant at right. Then numbers look LTR.

A number can contain a decimal separator. Depending on country, this separator is either U+002E FULL STOP or U+002C COMMA. Both these characters are "direction neutral". This means that when met in a sequence of characters they won't change the "directionality" nor create a break. Then any sequence of digits, full stops and commas constitutes a block which will be inserted as is in the output flow.

After the multi-level list item number has been generated, it is passed to the layout engine and its "multi-part" semantics are lost. Though it is an invalid number due to the presence of several decimal separators, the text engine only sees a "homogeneous" sequence and will lay it out as a whole instead of considering each component (level) in turn and each decimal separator as a word separator (which would also change the position of the separator).

Presently, list styles don't allow changing the intermediate level separator, which is hard-wired to U+002E FULL STOP. There is no workaround.
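The "homogeneous sequence" effect can be illustrated with a heavily simplified Python sketch of rule W4 of the Unicode Bidirectional Algorithm. Strictly speaking, Unicode classifies FULL STOP and COMMA as CS (common number separator), a weak type, and W4 turns a single separator between two European numbers into a European number. The function name `resolve_w4` is hypothetical, only the EN case is handled, and all other bidi rules are ignored, so this is not what the layout engine actually runs.

```python
import unicodedata


def resolve_w4(text):
    """Return the bidi class of each character after a simplified rule W4.

    A single CS or ES separator sitting between two EN (European number)
    characters is reclassified as EN, so the digits and the separators
    end up in one left-to-right number run.
    """
    classes = [unicodedata.bidirectional(ch) for ch in text]
    for i in range(1, len(classes) - 1):
        if (classes[i] in ("CS", "ES")
                and classes[i - 1] == "EN"
                and classes[i + 1] == "EN"):
            classes[i] = "EN"
    return classes


if __name__ == "__main__":
    # The intermediate full stops are absorbed into the number run;
    # only the trailing one keeps its separator class.
    print(resolve_w4("1.2.3."))
```

In this sketch the label `1.2.3.` resolves to five EN characters followed by one CS, which is consistent with the whole label being laid out as a single block.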

A possible ugly fix would be to scan the numeric string for multiple decimal separators. If more than one is found, the string is split; then numbers and separators are individually sent to output.

However this does not solve the case of a level-2 numbering of the form 2.1 without final dot.
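As a rough illustration of that scan-and-split idea, here is a minimal Python sketch. The function name `split_label` is hypothetical and the code is not taken from LibreOffice; it also shows the limitation just mentioned, since a label with a single separator is left untouched.

```python
import re


def split_label(label, separators=".,"):
    """Split a label into numbers and separators if it cannot be a real number.

    A string with more than one decimal separator is not a valid number,
    so it is split into individual tokens. A string with zero or one
    separator is returned unchanged as a single token.
    """
    count = sum(label.count(sep) for sep in separators)
    if count <= 1:
        return [label]
    pattern = "([" + re.escape(separators) + "])"
    # The capturing group keeps the separators; empty strings are dropped.
    return [token for token in re.split(pattern, label) if token]


if __name__ == "__main__":
    print(split_label("1.2.3."))  # split into numbers and separators
    print(split_label("2.1"))     # single separator: not split
```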

Part of the solution is to improve list style configuration by allowing more control over the intermediate separator. A probably elegant solution not involving scanning the item number could be to add U+200C ZERO WIDTH NON-JOINER before and after the intermediate separator (and perhaps also before and after separators). Should this be done automatically or left under user control (for more fancy formatting)?

Comment 5 M.Mahdi 2023-11-08 06:30:46 UTC

(In reply to Hossein from comment #3)
> @M.Mahdi:
> Can you provide an example, in which in MS Word the rendering is as you
> want, but the same file renders incorrectly in LibreOffice Writer?

Unfortunately, I don't have MS Office and can't test it.

Comment 6 M.Mahdi 2023-11-08 06:53:26 UTC

(In reply to ajlittoz from comment #4)

Thanks for explaining how the system works. Now the issue makes sense.

> Presently, list styles don't allow to change the intermediate level
> separator which is hard-wired to U+002E FULL STOP. There is no workaround.

> Part of the solution goes through improving list style configuration by
> allowing more control on the intermediate separator. A probably elegant
> solution not involving scanning the item number could be to add U+200C ZERO
> WIDTH NON-JOINER before and after the intermediate separator (and perhaps
> also before and after separators). Should this be done automatically or left
> under user control (for more fancy formatting)?

And that's another feature request! But not in this place. I should first search for previous related requests.
According to your comment, the main issue is the intermediate separator: if it could be the U+2013 EN DASH character, the system wouldn't change the direction of the numbers. Anyway, I tested U+200C ZERO WIDTH NON-JOINER in regular RTL text; it did nothing and the issue is still there. If I use Persian letters as the separator between the numbers, the whole numbering becomes RTL. Characters like ZWNJ or U+002D HYPHEN-MINUS do not work.
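These observations can be checked against the Unicode bidi class of each candidate separator with Python's standard `unicodedata` module. The helper name `bidi_class_table` is hypothetical. The sketch only reports character properties; it does not say how Writer would render any of them.

```python
import unicodedata

# Candidate separators mentioned in this thread, plus one Persian letter.
CANDIDATES = [
    "\u002e",  # FULL STOP
    "\u002d",  # HYPHEN-MINUS
    "\u2013",  # EN DASH
    "\u200c",  # ZERO WIDTH NON-JOINER
    "\u200f",  # RIGHT-TO-LEFT MARK
    "\u0641",  # ARABIC LETTER FEH
]


def bidi_class_table(chars):
    """Map each character to its Unicode bidirectional class."""
    return {ch: unicodedata.bidirectional(ch) for ch in chars}


if __name__ == "__main__":
    for ch, bidi in bidi_class_table(CANDIDATES).items():
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {bidi}")
```

FULL STOP (CS) and HYPHEN-MINUS (ES) are both number separators that get absorbed between digits, and ZWNJ is BN (boundary neutral), which the bidi algorithm ignores. A Persian letter is AL, a strong right-to-left type, which fits the observation that letters between the numbers make the whole numbering RTL.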

> A possible ugly fix would be to scan the numeric string for multiple decimal
> separators. If more than one is found, the string is split; then numbers and
> separators are individually sent to output.
> 
> However this does not solve the case of a level-2 numbering of the form 2.1
> without final dot.

No, this solution isn't good. In some situations, such as version numbers, it produces wrong text.

Comment 7 ajlittoz 2023-11-08 09:09:19 UTC

(In reply to M.Mahdi from comment #6)
> Anyway, I tested U+200C ZERO WIDTH NON-JOINER in a
> regular RTL text and in this case it did nothing and the issue is still
> there.

I know Arabic/Persian letters have 4 shapes: isolated, initial, middle and final. What is the effect of zwnj if you put it inside a word, e.g. w <zwnj> o <zwnj> rd? Would the w and o (I know, weak vowels are not written in Arabic; this is just an example) take the isolated shape? And rd be written as initial+final?

The case of U+002E FULL STOP may need the use of U+200F RIGHT-TO-LEFT MARK to force it RTL because its Unicode direction property is "neutral". Then perhaps we'll get correct ordering.
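A minimal Python sketch of that idea, assuming the mark is placed right after each intermediate full stop (the placement and the function name `add_rlm` are assumptions made for illustration, not something LibreOffice does today):

```python
import re

RLM = "\u200f"  # U+200F RIGHT-TO-LEFT MARK


def add_rlm(label):
    """Insert an RLM after every full stop that is followed by a digit.

    The trailing full stop of the label is left alone because no digit
    follows it.
    """
    return re.sub(r"\.(?=\d)", "." + RLM, label)


if __name__ == "__main__":
    marked = add_rlm("1.2.3.")
    # Show the code points, since the mark itself is invisible.
    print(" ".join(f"U+{ord(ch):04X}" for ch in marked))
```

Because the full stop is then no longer directly between two digits, it can no longer be absorbed into a single number run.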

Comment 8 Khaled Hosny 2023-11-08 21:40:20 UTC


*** This bug has been marked as a duplicate of bug 149824 ***

Comment 9 M.Mahdi 2023-11-09 05:59:35 UTC

(In reply to ajlittoz from comment #7)

> I know Arabic/Persian letters have 4 shapes: isolated, initial, middle and
> final. What is the effect of zwnj if you put it inside a word, e.g. w <zwnj>
> o <zwnj> rd? Would the w and o (I know, weak vowels are not written in
> Arabic; this is just an example) take the isolated shape? And rd be written
> as initial+final?

It depends on whether all 4 shapes of the character are different. If we assume the character has all 4 shapes, then yes: w and o are isolated, and rd is initial+final.

> The case of U+002E FULL stop may need the use of U+200F RIGHT-TO-LEFT MARK
> to force it RTL because its Unicode direction property is "neutral". Then
> perhaps we'll get correct ordering.

Maybe. I tested it and it works in this form:
1.<RTL MARK>2.<RTL MARK>3. Text
But I think the main point here is that we should be able to change the intermediate characters, which is not currently possible. Am I right?