Bug 106306 - RTL: Wrong text language detection for punctuation at the beginning of sentence
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
(earliest affected) release
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
Depends on:
Blocks: RTL-CTL Language-Detection
Reported: 2017-03-03 20:43 UTC by Hossein
Modified: 2022-08-23 00:44 UTC
CC List: 7 users

See Also:
Crash report or crash signature:
Regression By:

An example of wrong output with double and single quotation marks. (22.84 KB, image/png)
2017-03-03 20:45 UTC, Hossein
sample (7.99 KB, application/vnd.oasis.opendocument.text)
2017-10-30 19:19 UTC, Yousuf Philips (jay) (retired)

Description Hossein 2017-03-03 20:43:35 UTC
I am a Persian/Farsi user, and I usually work with Persian/Farsi language documents. Because of this, I use the "Persian" locale in LibreOffice.
Now, when I want to create an English-language document, I switch the keyboard to English, change the paragraph to left-to-right, and start typing. When I look closely at the status bar, I see Persian in the language section, and I have to type at least one character before English appears there. This may not seem to be a problem at first, but it is: when I type a quotation mark, the Persian/Farsi quotation mark is used instead of the correct English one.

Steps to Reproduce:
1. Set locale to Persian in Options > Language Settings > Languages > Locale setting
2. Change the paragraph to LTR
3. Start typing something with single or double quotation mark like: "Test" or 'Test'

Actual Results:  
You will see that the first quotation mark is shown as « which is wrong.

Expected Results:
You should see the correct English quotation marks, double quotation mark " or single quotation mark '.

Reproducible: Always

User Profile Reset: No

Additional Info:
This seems to have been introduced in LibreOffice 5.3, which fixed a lot of text rendering issues. The locale problems are not confined to this. The translation of numerals into appropriate shapes according to context is also wrong in LibreOffice 5.3: if you set the locale to Persian, all the numerals are Hindi in a completely English, LTR Impress document.

User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:51.0) Gecko/20100101 Firefox/51.0
Comment 1 Hossein 2017-03-03 20:45:05 UTC
Created attachment 131615 [details]
An example of wrong output with double and single quotation marks.
Comment 2 m.a.riosv 2017-03-04 01:36:32 UTC
What happens if you select 'English' as the font language for the characters, or double-click the language in the status bar and select English?
Comment 3 QA Administrators 2017-09-29 08:57:42 UTC Comment hidden (obsolete)
Comment 4 Xisco Faulí 2017-10-30 10:51:21 UTC Comment hidden (obsolete)
Comment 5 Yousuf Philips (jay) (retired) 2017-10-30 19:19:09 UTC
Can repro it with the Arabic locale.

It treats the first single or double quote as if it were in the CTL language, and the second quote, typed after an English word, as Latin.

Build ID: 43d6b11a5c1dda0cc2c1e06c768eece25051a56c
CPU threads: 2; OS: Linux 4.4; UI render: default; VCL: gtk2; 
Locale: ar-AE (en_US.UTF-8); Calc: group
Comment 6 Yousuf Philips (jay) (retired) 2017-10-30 19:19:48 UTC
Created attachment 137380 [details]
Comment 7 V Stuart Foote 2017-10-30 20:00:39 UTC
Isn't this a Unicode implementation issue?

Don't these transitions between scripts depend on our ICU library handling? They still need additional boundary logic; otherwise, as here, characters that Unicode does not assign to a specific script (punctuation, symbols, numbers) cause this type of issue at script transitions.

Is there a better way to detect/toggle word boundaries?

[1] http://unicode.org/reports/tr29/#Word_Boundaries
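
To illustrate the point about characters that are not tied to a script, here is a minimal Python sketch. It is not LibreOffice or ICU code. Python's standard library does not expose the Unicode Script property, so the sketch uses the bidirectional class from `unicodedata` as a stand-in, which is an assumption made only for illustration: quotation marks and digits come back as neutral or weak classes, while letters come back as strong ones.

```python
import unicodedata

# Sample characters: straight quotes, a guillemet, a digit,
# a Latin letter and an Arabic letter (U+062A).
samples = ['"', "'", "\u00ab", "1", "T", "\u062a"]

def bidi_classes(chars):
    """Map each character to its Unicode bidirectional class."""
    return {ch: unicodedata.bidirectional(ch) for ch in chars}

classes = bidi_classes(samples)
for ch, cls in classes.items():
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {cls}")
```

The quotes and the guillemet report `ON` (other neutral) and the digit reports `EN`, so none of them can decide on their own which language a run belongs to; only the letters (`L`, `AL`) can.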
Comment 8 Hiunn-hué 2017-10-31 10:05:33 UTC Comment hidden (no-value)
Comment 9 خالد حسني 2017-10-31 14:02:53 UTC
(In reply to V Stuart Foote from comment #7)
> Isn't this a Unicode implementation issue?

AFAIK, no. The itemization of text into Western/CTL/Asian (the only three categories) is done by Writer and/or other LibreOffice internal code.

My guess is that it is just using the default language for common characters and then does not look back when it sees the first script-specific character.
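
The guess above can be sketched in Python. This is a hypothetical simplification, not Writer's actual itemization code: the function names and the `"western"`/`"ctl"` labels are invented here, and the classification by bidirectional class is an assumption (it ignores the Asian category and CTL scripts that are written left-to-right). The first function assigns neutral characters to whatever category is current, starting from the locale default; the second resolves leading neutral characters from the first script-specific character that follows.

```python
import unicodedata

def strong_class(ch):
    """Return 'ctl' or 'western' for script-specific characters, else None.

    Hypothetical simplification: strong right-to-left characters count as
    CTL, strong left-to-right characters as Western.
    """
    bidi = unicodedata.bidirectional(ch)
    if bidi in ("R", "AL"):
        return "ctl"
    if bidi == "L":
        return "western"
    return None  # punctuation, digits, spaces, symbols

def itemize_no_lookahead(text, default):
    """Neutral characters inherit the current category, starting at default."""
    current = default
    result = []
    for ch in text:
        current = strong_class(ch) or current
        result.append((ch, current))
    return result

def itemize_with_lookahead(text, default):
    """Leading neutral characters take the category of the first strong one."""
    first = next((c for c in map(strong_class, text) if c), default)
    return itemize_no_lookahead(text, first)

sample = '"Test"'
without = itemize_no_lookahead(sample, default="ctl")
with_fix = itemize_with_lookahead(sample, default="ctl")
print(without[0], without[-1])
print(with_fix[0], with_fix[-1])
```

With a CTL default, the first variant labels the opening quote as CTL and the closing quote as Western, which matches the behaviour reported in comment 5; the second variant labels both quotes as Western.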
Comment 10 Omer Zak 2017-11-14 11:03:27 UTC
Still happens in:

Build ID: 9050854c35c389466923f0224a36572d36cd471a
CPU threads: 8; OS: Linux 4.9; UI render: default; VCL: gtk3; 
Locale: en-US (en_US.utf8); Calc: group

OS: Debian 64bit Stretch (Debian 9.2, with some backported packages)
Comment 11 QA Administrators 2018-11-15 03:43:08 UTC Comment hidden (obsolete)
Comment 12 Usama 2019-05-11 15:07:57 UTC
Confirmed on

Build ID: 98630a0bd49bd80652145a21e4e0d0ded792b36b
CPU threads: 4; OS: Linux 4.4; UI render: default; VCL: gtk3; 
TinderBox: Linux-rpm_deb-x86_64@86-TDF, Branch:master, Time: 2019-05-04_04:44:35
Locale: tr-TR (tr_TR.UTF-8); UI-Language: en-US
Calc: threaded
Comment 13 Volga 2019-11-10 19:29:15 UTC
Does ODF 1.3 have a solution for this?
Comment 14 Volga 2021-07-28 07:40:22 UTC
This is still reproducible in

Version: (x64) / LibreOffice Community
Build ID: 32efc3b7f3a71cfa6a7fa3f6c208333df48656cc
CPU threads: 4; OS: Windows 10.0 Build 19043; UI render: Skia/Raster; VCL: win
Locale: zh-CN (zh_CN); UI: zh-CN
Calc: threaded
Comment 15 Hossein 2022-08-23 00:44:05 UTC
Still reproducible with the latest LO 7.5 master:

Version: / LibreOffice Community
Build ID: 947a6455d23bff290319313734c8c30e8f495773
CPU threads: 8; OS: Linux 5.15; UI render: default; VCL: gtk3
Locale: fa-IR (en_US.UTF-8); UI: en-US
Calc: threaded