Bug 66791 - FORMATTING: Incorrect application of "Asian text font" for quotation marks when the paragraph contains a mixture of western and asian characters
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): Inherited From OOo
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Duplicates: 101751 124657 126387 134350
Depends on:
Blocks: CJK Language-Detection
Reported: 2013-07-10 19:16 UTC by simonjwiles
Modified: 2024-09-27 17:52 UTC (History)
CC List: 15 users

See Also:
Crash report or crash signature:


Attachments
Screenshot (6.99 KB, image/png)
2013-07-10 19:16 UTC, simonjwiles
test cases of English and Chinese quotes (58.94 KB, application/vnd.oasis.opendocument.text)
2013-09-03 03:14 UTC, Kevin Suo
screenshot_including_complex_text_layout (95.42 KB, image/png)
2017-10-29 15:03 UTC, Hiunn-hué
Screenshot on WordPad (30.80 KB, image/png)
2023-06-29 20:53 UTC, Volga
The same file opened with LibreOffice Writer (17.65 KB, image/png)
2023-06-29 20:59 UTC, Volga

Description simonjwiles 2013-07-10 19:16:15 UTC
Created attachment 82294 [details]
Screenshot

If I have an East-Asian character in my (predominantly English) document, followed by a quotation mark (opening or closing), the quotation mark takes the font settings from the "Asian text font" section of the style definition.  This results in very ugly copy.


Steps to reproduce:
1. Type some western text into LO Writer, surrounded by quotation marks (e.g. "sun").
2. Move the cursor to before the opening quotation mark, and type (or paste -- the IME is not relevant) an East-Asian character (e.g. 日).


Current behaviour:
The initial quotation mark takes the settings from "Asian text font" instead of "Western text font".  The behaviour is the same if a (normal-width, western) space comes between the East-Asian character and the opening quotation mark.


Expected behaviour:
The opening quotation mark, being surrounded by a normal-width space on one side, and a Latin letter ("s" in this case) on the other, should take the "Western text font" settings.
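
To make the difference between the current and the expected behaviour concrete, here is a minimal Python sketch. It is a simplified model and not LibreOffice's actual script-detection code: the function names, the "asian"/"western" classes and the character-name test for East-Asian characters are all assumptions made for illustration.

```python
import unicodedata

OPENING_QUOTES = {"\u2018", "\u201C"}
QUOTES = OPENING_QUOTES | {"\u2019", "\u201D"}


def strong_class(ch):
    """Return 'asian' or 'western', or None for quotes, spaces and other weak characters."""
    if ch in QUOTES or ch.isspace():
        return None
    name = unicodedata.name(ch, "")
    # Assumption: a crude name-prefix test stands in for real script detection.
    if name.startswith(("CJK", "HIRAGANA", "KATAKANA", "HANGUL")):
        return "asian"
    return "western" if ch.isalpha() else None


def inherit_previous(text):
    """Model of the reported behaviour: weak characters copy the preceding strong class."""
    classes, current = [], "western"
    for ch in text:
        current = strong_class(ch) or current
        classes.append(current)
    return classes


def resolve_by_neighbours(text):
    """Model of the expected behaviour: an opening quote follows the next strong character."""
    classes = inherit_previous(text)
    for i, ch in enumerate(text):
        if ch in OPENING_QUOTES:
            for following in text[i + 1:]:
                cls = strong_class(following)
                if cls is not None:
                    classes[i] = cls
                    break
    return classes


if __name__ == "__main__":
    sample = "日 \u201Csun\u201D"
    print(inherit_previous(sample))
    print(resolve_by_neighbours(sample))
```

With the sample from the steps to reproduce, the first model tags the opening quotation mark as "asian" even though a space separates it from 日, while the second tags both quotation marks as "western".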


The only workaround for this problem is to select the characters that have been rendered incorrectly and manually force the application of the "Western text font" settings.  Of course, this breaks if "Clear Direct Formatting" is used.

It's not clear to me why typing an opening quotation mark immediately after an East-Asian character results in the insertion of Asian punctuation (e.g. 「 or 『).  If I wanted Asian punctuation, I would, of course, type Asian punctuation.  I don't know if this is connected.


ask.libreoffice.org link: http://ask.libreoffice.org/en/question/19750/problem-with-full-width-asian-punctuation/

May perhaps be linked to this bug: https://bugs.freedesktop.org/show_bug.cgi?id=60106


I'm currently using LO Version 4.0.4.2 (Build ID: 400m0(Build:2)) on Linux Mint 14 amd64, but the problem has been around as long as I can remember and on every platform I've tried.
Comment 1 Kevin Suo 2013-09-03 03:14:09 UTC
Created attachment 85096 [details]
test cases of English and Chinese quotes

I confirm this bug in LibreOffice 4.0.5.2 and 4.1.1.2.

I ran some tests in the attached file; see the highlighted parts. Quotes are rendered incorrectly when they are on the first line or follow text in a different language.

When "double quotes replacement" is disabled in the AutoCorrect options, everything is OK, so it's a replacement problem.
Comment 2 Kevin Suo 2014-06-25 06:24:35 UTC
Today I tested attachment 85096 [details] in 4.3.0.1,
and it seems that it's getting worse.
All the opening quotes at the beginning of a paragraph are always shown as "half-width", regardless of whether the following characters are Western or Asian.
Comment 3 QA Administrators 2015-07-18 17:43:52 UTC Comment hidden (obsolete)
Comment 4 simonjwiles 2015-07-18 17:57:48 UTC
I can confirm this bug is still present:

Version: 4.4.4.3
Build ID: 40m0(Build:3)
Locale: en_GB.UTF-8

(LO from "LibreOffice Fresh" PPA, on Linux Mint 17.2 (package base == Trusty)).
Comment 5 QA Administrators 2016-09-20 10:18:03 UTC Comment hidden (obsolete)
Comment 6 Volga 2016-12-12 02:55:27 UTC
I think using “East Asian text font” is more suitable.
Comment 7 tommy27 2017-02-18 15:08:08 UTC
(In reply to Volga from comment #6)
> I think using “East Asian text font” is more suitable.

@simon
does this help?
Comment 8 Eric Ding 2017-06-23 03:07:33 UTC
Four years after the initial report, this bug still exists in LibreOffice 5.3.4 (running on Windows) with a mix of East Asian (CJK) and non-CJK fonts and text.
Comment 9 Hiunn-hué 2017-10-29 15:03:36 UTC
Created attachment 137355 [details]
screenshot_including_complex_text_layout

This also happens with languages like Thai (complex text layout); please see the attached PNG file.

It's actually quite annoying ...

--
Version: 6.0.0.0.alpha1+
Build ID: 81d50fd137fdf712a0f37988217c43278cf24c26
CPU threads: 4; OS: Linux 4.4; UI render: default; VCL: gtk2; 
TinderBox: Linux-rpm_deb-x86_64@70-TDF, Branch:master, Time: 2017-10-28_00:31:27
Locale: zh-TW (zh_TW.UTF-8); Calc: group
--
Comment 10 QA Administrators 2018-11-01 03:52:22 UTC Comment hidden (obsolete)
Comment 11 Eric Ding 2018-11-07 06:12:28 UTC
I confirm that this bug is still present in:

Version: 6.1.3.2
Build ID: 86daf60bf00efa86ad547e59e09d6bb77c699acb
CPU threads: 4; OS: Linux 4.15; UI render: default; VCL: gtk2; 
Locale: en-US (en_US.UTF-8); Calc: group threaded
Comment 12 Volga 2019-05-12 03:12:03 UTC
(In reply to tommy27 from comment #7)
> (In reply to Volga from comment #6)
> > I think using “East Asian text font” is more suitable.
> 
> @simon
> does this help?
Oh, there was a misunderstanding on my part; I just thought that would be a more proper name.
Comment 13 Volga 2019-05-12 03:23:53 UTC Comment hidden (obsolete)
Comment 14 Volga 2019-05-13 03:46:10 UTC Comment hidden (obsolete)
Comment 15 Volga 2019-05-13 07:11:32 UTC
*** Bug 124657 has been marked as a duplicate of this bug. ***
Comment 16 Volga 2019-07-15 11:21:03 UTC
*** Bug 126387 has been marked as a duplicate of this bug. ***
Comment 17 Volga 2019-07-16 13:35:22 UTC
Does anyone have an idea for this?
Comment 18 Liaison to zh-CN User Community 2019-07-28 07:18:40 UTC
The core issue of this bug, IMHO, is that curly double quotation marks (U+201C and U+201D) are widely used in both English and (simplified) Chinese, so LO has no way to know which style (western or Asian) it should apply to these quotation marks, and has to rely on context.

There are potentially more characters that cause such problem, the most obvious being single quotation marks.  But I've also seen the middle dot (U+00B7) and em dash (U+2014) with similar problems.

The quotation marks are especially visible because the current bug makes them asymmetrical, which causes quite some visual discomfort.  So the obvious brute-force solution is that, instead of determining their style according to context, LO can just make sure the quotation marks consistently use the same style, either through some language/locale setting as comment 14 mentioned, or through a special setting that can be changed by the user.  In other words, treat quotation marks differently from the other characters.
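
The brute-force idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `apply_fixed_policy`, the "asian"/"western" labels and the per-character class list are hypothetical and do not correspond to any existing LibreOffice setting or API. The set of ambiguous characters is limited to the ones named in this comment.

```python
# Characters named in this comment as being shared between Western and Chinese text.
AMBIGUOUS = {
    "\u2018", "\u2019",  # single quotation marks
    "\u201C", "\u201D",  # double quotation marks
    "\u00B7",            # middle dot
    "\u2014",            # em dash
}


def apply_fixed_policy(text, classes, policy):
    """Give every ambiguous character the same class, ignoring its context.

    `classes` is the per-character result of some context-based pass
    (hypothetical input); `policy` is the user's or locale's fixed choice.
    """
    if policy not in ("western", "asian"):
        raise ValueError("policy must be 'western' or 'asian'")
    if len(text) != len(classes):
        raise ValueError("text and classes must have the same length")
    return [policy if ch in AMBIGUOUS else cls for ch, cls in zip(text, classes)]


if __name__ == "__main__":
    text = "日 \u201Csun\u201D"
    # Asymmetric result of a context-based pass: opening quote 'asian', closing 'western'.
    contextual = ["asian", "asian", "asian", "western", "western", "western", "western"]
    print(apply_fixed_policy(text, contextual, "western"))
```

Whatever the context-based pass produced, both quotation marks end up in the same class, so the pair can no longer be asymmetrical. The cost is that the choice is no longer automatic.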
Comment 19 Kevin Suo 2022-11-30 14:58:18 UTC
*** Bug 101751 has been marked as a duplicate of this bug. ***
Comment 20 Volga 2023-06-22 11:40:18 UTC
Mr. Khaled, what do you think of this?
Comment 21 ⁨خالد حسني⁩ 2023-06-23 17:09:37 UTC
(In reply to Volga from comment #20)
> Mr. Khaled, what do you think of this?

I checked MS Word, and it seems to treat the quotation marks as western text unless their language is set to Chinese, then it treats them as Asian text regardless of the context.
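
As a rough Python sketch of the rule as I understand it from that check (this is not MS Word's actual implementation; `quote_font_class` and the class labels are hypothetical names, and only the Chinese case is modelled because that is the only one I checked):

```python
QUOTES = frozenset("\u2018\u2019\u201C\u201D")


def quote_font_class(ch, lang_tag):
    """Pick a font class for a quotation mark from its language attribute alone."""
    if ch not in QUOTES:
        raise ValueError("not a quotation mark")
    # Only the primary language subtag matters, e.g. 'zh' in 'zh-CN'.
    primary = lang_tag.split("-")[0].lower()
    return "asian" if primary == "zh" else "western"


if __name__ == "__main__":
    print(quote_font_class("\u201C", "en-US"))
    print(quote_font_class("\u201C", "zh-CN"))
```

The surrounding characters never enter the decision, which is what makes the rule predictable.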

This seems simpler and more reliable than what we currently do. I wonder if it does this for all punctuation characters.

It feels less smart, though. The smarter, more Unicode-compliant way is to try to resolve common characters based on context like we do now, except that our implementation is buggy.

I’m not sure which is the better way, to be honest, as either option has compatibility considerations (either with older LO versions if we go the MS way, or with both if we fix our current way).

I’m not sure who should decide this.
Comment 22 ⁨خالد حسني⁩ 2023-06-23 17:09:56 UTC
*** Bug 134350 has been marked as a duplicate of this bug. ***
Comment 23 Volga 2023-06-24 07:01:31 UTC
Someone wrote a complaint (a tsukkomi) about this a long time ago:
https://yongweiwu.wordpress.com/2014/12/18/a-complaint-of-odfs-asian-language-support/
Although MS Word sets a good example here, I believe implementing smart assignment rules would be the better choice. That way LibreOffice could assign a font face to such punctuation marks so that they match the predominant language/locale, without breaking text styles or file structure.
Comment 24 Volga 2023-06-29 20:53:26 UTC
Created attachment 188133 [details]
Screenshot on WordPad

Via the blog post in my last comment I found this test file by the blog author:
https://yongweiwu.files.wordpress.com/2014/12/odf_test.odt
Then I remembered WordPad, a native word processor in Windows, so let's see what happens in WordPad.
Comment 25 Volga 2023-06-29 20:59:49 UTC
Created attachment 188134 [details]
The same file opened with LibreOffice Writer

This screenshot was taken after opening the same file in LibreOffice Writer. Note that both apps were running under the zh-CN locale when I looked at them. So Khaled, what happens if you open this ODT in WordPad or MS Word?
Comment 26 himajin100000 2023-07-01 08:42:12 UTC
*** Bug 134350 has been marked as a duplicate of this bug. ***
Comment 27 Volga 2023-07-21 16:13:25 UTC
Have you checked Windows WordPad so far?
Comment 28 Volga 2023-07-28 17:43:45 UTC
Judging from commit d6efe8c302b81886706e18640148c51cf7883bbf, I think there is a way to fix this bug: I believe it could be done by assigning the font face for such punctuation marks depending on the surrounding text.

For the characters that could be affected by this bug, see:
https://www.w3.org/International/clreq/#tables_of_chinese_punctuation_marks
https://www.w3.org/International/jlreq/#cl-01
https://www.w3.org/International/klreq/#chars-grouping
Comment 29 Kevin Suo 2023-12-08 01:24:16 UTC
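See these related articles:

中西文混合排版中标点符号的渲染 (Rendering punctuation marks in mixed Chinese-Western typesetting) https://blog.1a23.com/2020/06/28/zhong-xi-wen-hunhe-paiban-zhong-biaodian-fuhao-de-xuanran/

中英混排中的标点符号问题 (The punctuation problem in mixed Chinese-English typesetting) https://www.hutrua.com/blog/2018/07/22/punctuation.html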
Comment 30 Volga 2024-09-27 17:48:51 UTC Comment hidden (no-value)
Comment 31 Volga 2024-09-27 17:52:29 UTC
Unicode 16.0 added new definitions for the four quotation marks encoded in the General Punctuation block. In my view, if they are followed by U+FE01, they should be rendered with CJK fonts regardless of context.

https://www.unicode.org/charts/PDF/Unicode-16.0/U160-2000.pdf
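
A minimal Python sketch of how such explicitly marked quotation marks could be detected before any context-based guessing. The function name and class labels are hypothetical. Mapping U+FE01 to "asian" follows the suggestion above; mapping U+FE00 to "western" is my own assumption, based on my reading that Unicode 16.0 uses it to request the non-fullwidth form.

```python
QUOTES = frozenset("\u2018\u2019\u201C\u201D")
VS1 = "\uFE00"  # assumption: requests the non-fullwidth form
VS2 = "\uFE01"  # fullwidth form, to be rendered with CJK fonts as suggested above


def explicit_quote_classes(text):
    """Return (index, quote, class) for every quote followed by a variation selector."""
    found = []
    for i, ch in enumerate(text[:-1]):
        if ch not in QUOTES:
            continue
        selector = text[i + 1]
        if selector == VS2:
            found.append((i, ch, "asian"))
        elif selector == VS1:
            found.append((i, ch, "western"))
    return found


if __name__ == "__main__":
    sample = "\u201C\uFE01中文\u201D\uFE01 \u201Csun\u201D"
    for index, quote, cls in explicit_quote_classes(sample):
        print(index, f"U+{ord(quote):04X}", cls)
```

Quotation marks without a variation selector are not reported, so they would still need a context-based or language-based rule.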