Bug 105091 - Using Unicode character 2042 (asterism) switches language to Hindi
Summary: Using Unicode character 2042 (asterism) switches language to Hindi
Status: RESOLVED NOTABUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Linguistic
Version (earliest affected): 6.1.3.2 release
Hardware: All All
Importance: medium minor
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Language-Detection
Reported: 2017-01-04 09:13 UTC by MonsieurLune
Modified: 2018-12-16 09:42 UTC (History)

See Also:
Crash report or crash signature:


Description MonsieurLune 2017-01-04 09:13:56 UTC
Description:
When using the Unicode character U+2042 (asterism, three stars) in a document, setting or resetting any style changes the language to Hindi and displays the character in the wrong font.
This issue occurs on Windows 10 x64, ArchLinux x64 (up to date LO 5.2.4.2), LUbuntu x64 (LO 5.2.0.4) and other Linux distributions.

Steps to Reproduce:
1. Use a font that contains the asterism (U+2042). I'm using a modified version of DayRoman, which I can share if needed
2. Set this font on any style (for example the Default Style)
3. Pick the character in the Insert>Special char dialog
4. Reset the style (by double-clicking it in the sidebar, or with Format>Clear direct formatting)

Actual Results:  
- The language is set to Hindi, and the font in the document is no longer the original font (sometimes it is Mangal, even though that font is not present on the system; sometimes it is the initial value of the style, before modification).
- When checking the style itself, the font is still set to the right one.
- When inspecting the character properties, the language and the assigned fonts are the right ones (French and Day Roman in my case).

Expected Results:
- U+2042 is part of the General Punctuation block of Unicode and should not fall back to Hindi
- Fonts and language assigned to styles should not be changed without the express consent of the user
- The displayed font and the information in the char properties do not match
- The displayed language (in the information line at the bottom of the screen) and the information in the char properties do not match


Reproducible: Always

User Profile Reset: Yes

Additional Info:
See "Sample U2024.odt" attached


User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:50.0) Gecko/20100101 Firefox/50.0
Comment 1 MonsieurLune 2017-01-04 09:16:45 UTC
I could not add the sample file as an attachment (with embedded fonts, the ODT file is about 19 MB), but you can get it here: http://www.mirari.fr/7Y9K
Comment 2 MonsieurLune 2017-01-04 10:50:36 UTC
Another important detail I found while trying to figure out what is happening:
Tools>Options>Language Settings>Languages>Default Document Languages>Complex Script is unchecked.
When I check it and set it to "no proof", the 'Paragraph style' window allows choosing a specific font for complex scripts, which solves the problem.

I think that there's a triple flaw there:
- Unicode character U+2042 should not be considered a "complex script" character, as it is part of "General Punctuation" (https://en.wikipedia.org/wiki/General_Punctuation)
- Unchecking 'Complex Script' in the Options should not lead to unexpected language changes, or should at least behave the same as checking it and choosing "no proof".
- The option "only for current document" should be in the document options (File>Properties), not in the general options, because it is never clear whether a change applies to the current document or to the whole program.


So it's somehow related to issue https://bugs.documentfoundation.org/show_bug.cgi?id=39935 (and all its duplicates), with the difference that U+2042 belongs to a Western character range.
Comment 3 Buovjaga 2017-01-05 09:49:35 UTC
(In reply to MonsieurLune from comment #1)
> I could not add the sample file as an attachment (with embedded fonts, the
> odt file is about 19Mo), but you can get it there: http://www.mirari.fr/7Y9K

Confirmed the Hindi. Actually already in 3.6, even though it does not support the embedded font feature (the stars are not shown). 3.3 does not show Hindi, but I'm not sure if this can be called a regression; maybe it is some other incompatibility.

MonsieurLune: you seem to know how to use your computer(s), so how would you like joining the QA team? https://wiki.documentfoundation.org/QA/GetInvolved

Arch Linux 64-bit, KDE Plasma 5
Version: 5.4.0.0.alpha0+
Build ID: 1a58cdf8af1aba52ce0a376666dd7d742234d7cf
CPU Threads: 8; OS Version: Linux 4.8; UI Render: default; VCL: kde4; 
Locale: fi-FI (fi_FI.UTF-8); Calc: group
Built on January 4th 2016

Arch Linux 64-bit
Version 3.6.7.2 (Build ID: e183d5b)
Comment 4 ⁨خالد حسني⁩ 2017-01-27 15:33:31 UTC
I can’t reproduce this on master following the mentioned steps; the symbol takes the language of the text around it.
Comment 5 MonsieurLune 2017-01-30 11:11:13 UTC
I still have the issue on the master build:
5.4.0.0.alpha0+
Build ID: c6dd735afb2e1b3837c4f8c5659f52fafab4c56f
CPU Threads: 2
OS Version: Linux 4.4
UI Render: default
VCL: gtk2
TinderBox: Linux-rpm_deb-x86_64@70-TDF, Branch:master, Time: 2017-01-30_03:01:16
Locale: fr-FR (fr_FR.UTF-8)
Calc: group

Simply open a new document, insert the U2042 char, and you're set to Hindi.
Comment 6 QA Administrators 2018-12-15 03:56:30 UTC Comment hidden (obsolete)
Comment 7 Roman Kuznetsov 2018-12-15 07:46:32 UTC
(In reply to MonsieurLune from comment #5)
> I still have the issue on the master build:
> 5.4.0.0.alpha0+
> Build ID: c6dd735afb2e1b3837c4f8c5659f52fafab4c56f
> CPU Threads: 2
> OS Version: Linux 4.4
> UI Render: default
> VCL: gtk2
> TinderBox: Linux-rpm_deb-x86_64@70-TDF, Branch:master, Time:
> 2017-01-30_03:01:16
> Locale: fr-FR (fr_FR.UTF-8)
> Calc: group
> 
> Simply open a new document, insert the U2042 char, and you're set to Hindi.

I used the Alt+X shortcut to insert Unicode symbol 2042 in

Version: 6.3.0.0.alpha0+
Build ID: 3c964980da07892a02d5ac721d80558c459532d0
CPU threads: 4; OS: Windows 6.1; UI render: default; VCL: win; 
TinderBox: Win-x86@42, Branch:master, Time: 2018-12-12_02:07:45
Locale: ru-RU (ru_RU); UI-Language: en-US
Calc: threaded

and the text in my document is still set to Russian.
Comment 8 V Stuart Foote 2018-12-15 15:14:42 UTC
Cannot confirm on Windows 10 Home 64-bit en-US (1803) with
Version: 6.1.4.1 (x64)
Build ID: 25073d18caee244880112e52c4a7e71f6081b3a9
CPU threads: 4; OS: Windows 10.0; UI render: GL; 
Locale: en-US (en_US); Calc: CL

or current master/6.3 build.

And in fact, if I open the ODT archive of the sample document from comment 1, a review of its content.xml shows that the code point for the glyph is *not* U+2042 but rather U+07FA.
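
For anyone who wants to repeat that check without unpacking the archive by hand, here is a minimal Python sketch. It assumes only that an ODT file is a ZIP archive containing a content.xml, as described above; the function name `non_ascii_code_points` is made up for this illustration and is not part of LibreOffice.

```python
import unicodedata
import xml.etree.ElementTree as ET
import zipfile


def non_ascii_code_points(odt_file):
    """List the non-ASCII code points found in the text of content.xml.

    odt_file may be a path or a binary file-like object.
    Hypothetical helper for illustration only.
    """
    with zipfile.ZipFile(odt_file) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    text = "".join(root.itertext())  # all text nodes, markup dropped
    found = {ch for ch in text if ord(ch) > 0x7F}
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in sorted(found)
    ]
```

Calling `non_ascii_code_points("Sample U2024.odt")` on the sample document would list every non-ASCII character it contains, so a stray U+07FA stands out immediately.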

NKo occupies Unicode range U+07C0-U+07FF and is an RTL script. So its use would trigger the ICU lib BiDi handling, which IIUC in LibreOffice *would* trigger the CTL handling, and which gets assigned Hindi with a default user profile.
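
The difference between the two code points can be seen from their Unicode properties alone. The sketch below uses Python's standard `unicodedata` module; the `BLOCKS` table and the `describe` helper are assumptions made for this illustration (only the two block ranges mentioned in this thread are listed), and it says nothing about how LibreOffice or ICU classify text internally.

```python
import unicodedata

# Only the two block ranges mentioned in this thread; not a full block table.
BLOCKS = {
    "General Punctuation": range(0x2000, 0x2070),  # U+2000..U+206F
    "NKo": range(0x07C0, 0x0800),                  # U+07C0..U+07FF
}


def describe(code_point):
    """Return (block name, bidirectional class) for a code point."""
    block = next(
        (name for name, span in BLOCKS.items() if code_point in span),
        "other",
    )
    return block, unicodedata.bidirectional(chr(code_point))


print(describe(0x2042))  # ('General Punctuation', 'ON'): neutral, not RTL
print(describe(0x07FA))  # ('NKo', 'R'): strong right-to-left
```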

Not sure how OP generated the character when inserting into sample document, but when picking U+2042 ASTERISM from the SpecialCharacter dialog, or an external app (e.g. BabelMap), the U+2042 glyph is rendered correctly to canvas and there is no BiDi shift to RTL or change to CTL (Hindi default).

So, WFM. And IMHO => NAB

Guess OP can reopen if providing details on a method of character input that actually reproduces the wrong codepoint.
Comment 9 MonsieurLune 2018-12-16 09:42:55 UTC
@V Stuart Foote
Thank you for your test. I checked where the asterism sits in my font and the problem is there: it was wrongly placed at decimal code point 2042 (U+07FA) instead of hexadecimal 2042 (U+2042).
I confirm that the issue can be closed.
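
For the record, the decimal/hexadecimal mix-up behind this report can be checked with a few lines of Python. This is only an illustrative sketch using the standard library; the variable names are arbitrary.

```python
import unicodedata

label = "2042"
as_hex = int(label, 16)      # intended reading: U+2042
as_decimal = int(label, 10)  # what the font actually used: 2042 == 0x07FA

for cp in (as_hex, as_decimal):
    # Print the code point in U+XXXX form with its Unicode name.
    print(f"U+{cp:04X}", unicodedata.name(chr(cp), "<unnamed>"))
```

The first line printed is the asterism; the second is the NKo character that was found in content.xml.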