Bug 106306 - RTL: Wrong text language detection for punctuation at the beginning of sentence
Summary: RTL: Wrong text language detection for punctuation at the beginning of sentence
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
(earliest affected) release
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
Depends on:
Blocks: RTL-CTL Language-Detection
Reported: 2017-03-03 20:43 UTC by Hossein
Modified: 2018-11-15 03:43 UTC
CC List: 6 users

See Also:
Crash report or crash signature:

An example of wrong output with double and single quotation marks. (22.84 KB, image/png)
2017-03-03 20:45 UTC, Hossein
sample (7.99 KB, application/vnd.oasis.opendocument.text)
2017-10-30 19:19 UTC, Yousuf Philips (jay) (retired)

Description Hossein 2017-03-03 20:43:35 UTC
I am a Persian/Farsi user, and I usually work with Persian/Farsi documents. Because of this, I use the "Persian" locale in LibreOffice.
When I want to create an English-language document, I switch the keyboard layout to English, change the paragraph direction to left-to-right, and start typing. Looking closely at the status bar, I see Persian in its language field, and I have to type at least one character before English appears there. This may not seem to cause problems at first, but it does: when I type a quotation mark, LibreOffice uses the Persian/Farsi quotation mark instead of the correct English one.

Steps to Reproduce:
1. Set locale to Persian in Options > Language Settings > Languages > Locale setting
2. Change the paragraph to LTR
3. Start typing something with single or double quotation marks, such as "Test" or 'Test'

Actual Results:  
The first quotation mark is shown as «, which is wrong.

Expected Results:
You should see the correct English quotation marks, double quotation mark " or single quotation mark '.

Reproducible: Always

User Profile Reset: No

Additional Info:
This seems to have been introduced in LibreOffice 5.3, which fixed a lot of text rendering issues. The locale problems are not confined to this. Shaping numerals according to context is also wrong in LibreOffice 5.3: if you set the locale to Persian, all the numerals are shown as Hindi digits, even in a completely English, LTR Impress document.

User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:51.0) Gecko/20100101 Firefox/51.0
Comment 1 Hossein 2017-03-03 20:45:05 UTC
Created attachment 131615
An example of wrong output with double and single quotation marks.
Comment 2 m.a.riosv 2017-03-04 01:36:32 UTC
What happens if you select 'English' as the character's font language, or double-click the language in the status bar and select English?
Comment 3 QA Administrators 2017-09-29 08:57:42 UTC Comment hidden (obsolete)
Comment 4 Xisco Faulí 2017-10-30 10:51:21 UTC Comment hidden (obsolete)
Comment 5 Yousuf Philips (jay) (retired) 2017-10-30 19:19:09 UTC
Can reproduce it with the Arabic locale.

It treats the first single or double quote as if it were in the CTL language, and the second quote, typed after an English word, as Latin (Western) language.

Build ID: 43d6b11a5c1dda0cc2c1e06c768eece25051a56c
CPU threads: 2; OS: Linux 4.4; UI render: default; VCL: gtk2; 
Locale: ar-AE (en_US.UTF-8); Calc: group
Comment 6 Yousuf Philips (jay) (retired) 2017-10-30 19:19:48 UTC
Created attachment 137380
sample
Comment 7 V Stuart Foote 2017-10-30 20:00:39 UTC
Isn't this a Unicode implementation issue?

Don't these transitions between scripts depend on how our ICU library handles them? They still need additional boundary logic; otherwise, for characters whose Unicode usage is not tied to a script (i.e. punctuation, symbols, numbers), we get this type of issue at script transitions.

Is there a better way to detect/toggle word boundaries?

[1] http://unicode.org/reports/tr29/#Word_Boundaries
Comment 8 Hiunn-hué 2017-10-31 10:05:33 UTC Comment hidden (no-value)
Comment 9 Khaled Hosny (inactive) 2017-10-31 14:02:53 UTC
(In reply to V Stuart Foote from comment #7)
> Isn't this a Unicode implementation issue?

AFAIK, no. The itemization of text into Western/CTL/Asian (only these three categories) is done by Writer and/or other LibreOffice internal code.

My guess is that it is just using the default language for common characters and then does not look back when it sees the first script-specific character.
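Comment 9's guess can be made concrete with a toy itemizer. This is a hedged sketch, not LibreOffice code: the three classes (`WESTERN`, `CTL`, `COMMON`), the simplified character ranges in `char_script`, and all function names are assumptions for illustration. `itemize_no_lookback` gives a leading quote the locale's default language and never revisits it, which matches the reported behaviour; `itemize_with_lookback` resolves leading common characters to the first script-specific character that follows.

```python
# Hypothetical illustration only; not LibreOffice's itemization code or API.

def char_script(ch: str) -> str:
    """Classify one character using simplified, illustrative ranges."""
    cp = ord(ch)
    # Assumption: the Arabic block (U+0600-U+06FF) stands in for "CTL".
    if 0x0600 <= cp <= 0x06FF:
        return "CTL"
    # Assumption: Latin letters up to Latin Extended-B stand in for "WESTERN".
    if ch.isalpha() and cp <= 0x024F:
        return "WESTERN"
    # Quotes, spaces, digits, other punctuation: no script of their own.
    return "COMMON"


def merge_runs(text: str, scripts: list[str]) -> list[tuple[str, str]]:
    """Group consecutive characters with the same resolved script into runs."""
    runs: list[tuple[str, str]] = []
    for ch, script in zip(text, scripts):
        if runs and runs[-1][0] == script:
            runs[-1] = (script, runs[-1][1] + ch)
        else:
            runs.append((script, ch))
    return runs


def itemize_no_lookback(text: str, default: str) -> list[tuple[str, str]]:
    """Common characters take the current script, starting from the locale default."""
    current = default
    resolved = []
    for ch in text:
        script = char_script(ch)
        if script != "COMMON":
            current = script
        resolved.append(current)
    return merge_runs(text, resolved)


def itemize_with_lookback(text: str, default: str) -> list[tuple[str, str]]:
    """Leading common characters take the script of the first script-specific character."""
    scripts = [char_script(ch) for ch in text]
    # Fall back to the locale default only if no script-specific character exists.
    current = next((s for s in scripts if s != "COMMON"), default)
    resolved = []
    for script in scripts:
        if script != "COMMON":
            current = script
        resolved.append(current)
    return merge_runs(text, resolved)


if __name__ == "__main__":
    print(itemize_no_lookback('"Test"', "CTL"))
    print(itemize_with_lookback('"Test"', "CTL"))
```

With a Persian (CTL) default, the first call returns `[('CTL', '"'), ('WESTERN', 'Test"')]`, so the opening quote lands in a CTL run, while the second returns `[('WESTERN', '"Test"')]`. This is one possible resolution rule, not necessarily the right one for Writer: while typing, the character after the opening quote does not exist yet, so a typing-time fix would likely also need other hints, such as the paragraph direction.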
Comment 10 Omer Zak 2017-11-14 11:03:27 UTC
Still happens in:

Build ID: 9050854c35c389466923f0224a36572d36cd471a
CPU threads: 8; OS: Linux 4.9; UI render: default; VCL: gtk3; 
Locale: en-US (en_US.utf8); Calc: group

OS: Debian 64bit Stretch (Debian 9.2, with some backported packages)
Comment 11 QA Administrators 2018-11-15 03:43:08 UTC
** Please read this message in its entirety before responding **

To make sure we're focusing on the bugs that affect our users today, LibreOffice QA is asking bug reporters and confirmers to retest open, confirmed bugs which have not been touched for over a year.

There have been thousands of bug fixes and commits since anyone checked on this bug report. During that time, it's possible that the bug has been fixed, or the details of the problem have changed. We'd really appreciate your help in getting confirmation that the bug is still present.

If you have time, please do the following:

Test to see if the bug is still present with the latest version of LibreOffice from https://www.libreoffice.org/download/

If the bug is present, please leave a comment that includes the information from Help - About LibreOffice.
If the bug is NOT present, please set the bug's Status field to RESOLVED-WORKSFORME and leave a comment that includes the information from Help - About LibreOffice.

Please DO NOT

Update the version field
Reply via email (please reply directly on the bug tracker)
Set the bug's Status field to RESOLVED - FIXED (this status has a particular meaning that is not appropriate in this case)

If you want to do more to help you can test to see if your issue is a REGRESSION. To do so:
1. Download and install the oldest version of LibreOffice (usually 3.3, unless your bug pertains to a feature added after 3.3) from http://downloadarchive.documentfoundation.org/libreoffice/old/
2. Test your bug
3. Leave a comment with your results.
4a. If the bug was present with 3.3 - set version to 'inherited from OOo';
4b. If the bug was not present in 3.3 - add 'regression' to keyword

Feel free to come ask questions or to say hello in our QA chat: https://kiwiirc.com/nextclient/irc.freenode.net/#libreoffice-qa

Thank you for helping us make LibreOffice even better for everyone!

Warm Regards,
QA Team