Bug 106306 - RTL: Wrong text language detection for punctuation at the beginning of sentence (with locale fa_IR)
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 5.3.0.3 release (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: RTL-CTL Language-Detection
Reported: 2017-03-03 20:43 UTC by Hossein
Modified: 2023-06-24 07:04 UTC

See Also:
Crash report or crash signature:


Attachments
An example of wrong output with double and single quotation marks. (22.84 KB, image/png)
2017-03-03 20:45 UTC, Hossein
sample (7.99 KB, application/vnd.oasis.opendocument.text)
2017-10-30 19:19 UTC, Yousuf Philips (jay) (retired)

Description Hossein 2017-03-03 20:43:35 UTC
Description:
I am a Persian/Farsi user, and I usually work with Persian/Farsi language documents. Because of this, I use the "Persian" locale in LibreOffice.
Now, when I want to create an English-language document, I switch the keyboard to English, change the paragraph to left-to-right, and start typing. When I look closely at the status bar, I see Persian in the language section, and I have to type at least one character before English appears there. This may not seem to be a problem at first, but it is: when I type a quotation mark, LibreOffice uses the Persian/Farsi quotation mark and not the correct English one.

Steps to Reproduce:
1. Set locale to Persian in Options > Language Settings > Languages > Locale setting
2. Change the paragraph to LTR
3. Start typing something with single or double quotation mark like: "Test" or 'Test'

Actual Results:  
You will see that the first quotation mark is shown as « which is wrong.

Expected Results:
You should see the correct English quotation marks, double quotation mark " or single quotation mark '.


Reproducible: Always

User Profile Reset: No

Additional Info:
This seems to have been introduced in LibreOffice 5.3, which fixed a lot of text rendering issues. The locale problems are not confined to this. Translating numerals into the appropriate shapes according to the context is also wrong in LibreOffice 5.3. If you set the locale to Persian, you will see that all the numerals are Hindi in a completely English, LTR Impress document.


User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:51.0) Gecko/20100101 Firefox/51.0
Comment 1 Hossein 2017-03-03 20:45:05 UTC
Created attachment 131615
An example of wrong output with double and single quotation marks.
Comment 2 m_a_riosv 2017-03-04 01:36:32 UTC
What about selecting 'English' as the font language for the characters, or double-clicking the language in the status bar to select English?
Comment 3 QA Administrators 2017-09-29 08:57:42 UTC Comment hidden (obsolete)
Comment 4 Xisco Faulí 2017-10-30 10:51:21 UTC Comment hidden (obsolete)
Comment 5 Yousuf Philips (jay) (retired) 2017-10-30 19:19:09 UTC
Can repro it with the Arabic locale.

It treats the first single or double quote as if it were in the CTL language, and the second quote, typed after an English word, as Latin.

Version: 6.0.0.0.alpha1+
Build ID: 43d6b11a5c1dda0cc2c1e06c768eece25051a56c
CPU threads: 2; OS: Linux 4.4; UI render: default; VCL: gtk2; 
Locale: ar-AE (en_US.UTF-8); Calc: group
Comment 6 Yousuf Philips (jay) (retired) 2017-10-30 19:19:48 UTC
Created attachment 137380
sample
Comment 7 V Stuart Foote 2017-10-30 20:00:39 UTC
Isn't this a Unicode implementation issue?

Don't these transitions between scripts depend on our ICU library handling? They still need additional boundary logic, though; otherwise, as here, where characters are not assigned to a script (punctuation, symbols, numbers), we get this type of issue at script transitions.

Is there a better way to detect/toggle word boundaries?

=-ref-=
[1] http://unicode.org/reports/tr29/#Word_Boundaries
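
As a rough illustration of the point about characters that are not tied to a script, here is a small Python sketch using the standard `unicodedata` module. It only prints each character's general category and bidirectional class (the Python standard library does not expose the Unicode Script property), and it is not how LibreOffice or ICU classify text. The helper name `describe` is made up for this example.

```python
import unicodedata

def describe(ch):
    """Return (general category, bidirectional class) for one character."""
    return unicodedata.category(ch), unicodedata.bidirectional(ch)

# Quotes and digits carry no strong direction; letters do.
for ch in ['"', "'", "\u00ab", "T", "\u062a", "1"]:
    cat, bidi = describe(ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: category={cat}, bidi={bidi}")
```

The quotation marks come out as neutral (`ON`), while `T` is `L` and the Arabic letter is `AL`, so a quote typed on its own gives no hint about which language should apply to it.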
Comment 8 Hiunn-hué 2017-10-31 10:05:33 UTC Comment hidden (no-value)
Comment 9 ⁨خالد حسني⁩ 2017-10-31 14:02:53 UTC
(In reply to V Stuart Foote from comment #7)
> Isn't this a Unicode implementation issue?

AFAIK, no. The itemization of text into Western/CTL/Asian (or only three categories) is done by Writer and/or other LibreOffice internal code.

My guess is that it is just using the default language for common characters, and then it does not look back when it sees the first script-specific character.
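
A minimal Python sketch of that guess, under loud assumptions: it is not LibreOffice code, the function names are invented for illustration, and it uses the bidirectional class from `unicodedata` as a crude stand-in for script detection (so it only distinguishes "western" from "ctl" and ignores the Asian category entirely). The first function assigns leading neutral characters to the default class and never revisits them; the second re-assigns them once the first strong character is known.

```python
import unicodedata

def strong_class(ch):
    """Classify a character as 'western', 'ctl', or None (no strong hint)."""
    bidi = unicodedata.bidirectional(ch)
    if bidi == "L":
        return "western"
    if bidi in ("R", "AL"):
        return "ctl"
    return None  # punctuation, digits, spaces, ...

def itemize_no_lookback(text, default="ctl"):
    """Neutrals inherit the previous class, or the default at the start."""
    out, current = [], default
    for ch in text:
        current = strong_class(ch) or current
        out.append(current)
    return out

def itemize_with_lookahead(text, default="ctl"):
    """Same, but leading neutrals take the class of the first strong character."""
    classes = itemize_no_lookback(text, default)
    first = next((c for c in map(strong_class, text) if c), None)
    if first:
        for i, ch in enumerate(text):
            if strong_class(ch):
                break
            classes[i] = first
    return classes

print(itemize_no_lookback('"Test"'))     # opening quote stays in the default class
print(itemize_with_lookahead('"Test"'))  # opening quote follows the word after it
```

With a CTL default, the first version puts the opening quote in "ctl" and everything after it in "western", which matches the behaviour described in this report. A real fix would also have to handle text typed one character at a time, where nothing follows the quote yet.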
Comment 10 Omer Zak 2017-11-14 11:03:27 UTC
Still happens in:

Version: 6.0.0.0.alpha1+
Build ID: 9050854c35c389466923f0224a36572d36cd471a
CPU threads: 8; OS: Linux 4.9; UI render: default; VCL: gtk3; 
Locale: en-US (en_US.utf8); Calc: group

OS: Debian 64bit Stretch (Debian 9.2, with some backported packages)
Comment 11 QA Administrators 2018-11-15 03:43:08 UTC Comment hidden (obsolete)
Comment 12 Usama 2019-05-11 15:07:57 UTC
Confirmed on

Version: 6.3.0.0.alpha0+
Build ID: 98630a0bd49bd80652145a21e4e0d0ded792b36b
CPU threads: 4; OS: Linux 4.4; UI render: default; VCL: gtk3; 
TinderBox: Linux-rpm_deb-x86_64@86-TDF, Branch:master, Time: 2019-05-04_04:44:35
Locale: tr-TR (tr_TR.UTF-8); UI-Language: en-US
Calc: threaded
Comment 13 Volga 2019-11-10 19:29:15 UTC
Does ODF 1.3 have a solution for this?
Comment 14 Volga 2021-07-28 07:40:22 UTC
This is still reproducible in

Version: 7.2.0.1 (x64) / LibreOffice Community
Build ID: 32efc3b7f3a71cfa6a7fa3f6c208333df48656cc
CPU threads: 4; OS: Windows 10.0 Build 19043; UI render: Skia/Raster; VCL: win
Locale: zh-CN (zh_CN); UI: zh-CN
Calc: threaded
Comment 15 Hossein 2022-08-23 00:44:05 UTC
Still reproducible with the latest LO 7.5 master:

Version: 7.5.0.0.alpha0+ / LibreOffice Community
Build ID: 947a6455d23bff290319313734c8c30e8f495773
CPU threads: 8; OS: Linux 5.15; UI render: default; VCL: gtk3
Locale: fa-IR (en_US.UTF-8); UI: en-US
Calc: threaded
Comment 16 Eyal Rozenberg 2023-02-10 13:39:10 UTC
I _can't_ reproduce this with:

Version: 7.6.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: ad387d5b984c6666906505d25685065f710ed55d
CPU threads: 4; OS: Linux 6.1; UI render: default; VCL: gtk3
Locale: fa-IR (en_IL); UI: en-US
Calc: threaded

but then, I never reproduced this before. I suspect perhaps the reproduction instructions are partial? Perhaps auto-correct needs to be on?
Comment 17 Hossein 2023-02-10 23:45:55 UTC
Still reproducible with the latest LO 7.6 dev master:
Version: 7.6.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 72959cc2b36749a779b56522f27e290731187043
CPU threads: 4; OS: Windows 10.0 Build 19045; UI render: Skia/Raster; VCL: win
Locale: fa-IR (fa_IR); UI: en-US
Calc: threaded

You can reproduce it with a few steps:

1. Create a new Writer document.

2. Go to "Tools > Options > Language Settings > Languages > Default Language for Documents > Complex text layout". Set the combo box to "Persian".

3. Make the paragraph LTR using "left ctrl+ left shift"

4. Type -> "test" (including the double quotes).

You will get «Test" and this is wrong.
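
The mixed result can be modelled with a short Python sketch. Everything in it is an assumption made for illustration: the `QUOTES` table simply mirrors the characters seen in this report (it is not LibreOffice's autocorrect data), and `replace_quote` and `type_quoted_word` are hypothetical names. The idea is that the replacement for a typed `"` is chosen from the language at the cursor, which is still the CTL default before any Latin letter has been typed.

```python
# Hypothetical quote tables; they mirror the output in this report,
# not LibreOffice's actual autocorrect data.
QUOTES = {"fa": ("\u00ab", "\u00bb"), "en": ('"', '"')}

def replace_quote(lang_at_cursor, opening):
    """Pick the replacement for a typed '"' from the language at the cursor."""
    open_q, close_q = QUOTES[lang_at_cursor]
    return open_q if opening else close_q

def type_quoted_word(word, default_lang="fa", word_lang="en"):
    """Simulate typing "word" into an empty paragraph that reports default_lang."""
    first = replace_quote(default_lang, opening=True)  # no Latin text typed yet
    last = replace_quote(word_lang, opening=False)     # language switched after the word
    return first + word + last

print(type_quoted_word("test"))
print(type_quoted_word("test", default_lang="en"))
```

In this model the first call yields the mismatched pair and the second yields matching English quotes, so the problem comes down to which language the empty paragraph reports when the opening quote is typed.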
Comment 18 Eyal Rozenberg 2023-02-11 08:26:16 UTC
(In reply to Hossein from comment #17)
> You will get «Test" and this is wrong.

I followed your instructions, and got "test". No change of quotes.
Comment 19 Buovjaga 2023-02-17 08:47:31 UTC
I uncommented fa_IR in my /etc/locale.gen and ran sudo locale-gen

Then I launched LibreOffice with

LC_ALL=fa_IR.UTF-8 libreoffice

I could reproduce the issue.

Arch Linux 64-bit, X11
Version: 7.5.0.3 (X86_64) / LibreOffice Community
Build ID: 50(Build:3)
CPU threads: 8; OS: Linux 6.1; UI render: default; VCL: kf5 (cairo+xcb)
Locale: fi-FI (fi_FI.UTF-8); UI: en-US
7.5.0-1
Calc: threaded
Comment 20 Buovjaga 2023-02-17 08:48:10 UTC
(In reply to Buovjaga from comment #19)
> Arch Linux 64-bit, X11
> Version: 7.5.0.3 (X86_64) / LibreOffice Community
> Build ID: 50(Build:3)
> CPU threads: 8; OS: Linux 6.1; UI render: default; VCL: kf5 (cairo+xcb)
> Locale: fi-FI (fi_FI.UTF-8); UI: en-US
> 7.5.0-1
> Calc: threaded

Ignore the locale here, this is just my boilerplate version paste.
Comment 21 Volga 2023-06-24 07:04:08 UTC
I believe LibreOffice could implement smart rules to assign the font face, font size, text direction, etc. for such punctuation.