Created attachment 123829 [details] This file demonstrate the bug on two versions of LibO [Description] In vertical layout, some characters in Noto Sans CJK are wrongly converted to other characters. Please see attachment. [Environment] OS: elementary OS 0.3.2 Freya (based on Ubuntu 14.04) LibreOffice 5.1.1.2 (1:5.1.1~rc2-0ubuntu1~trusty0) LibreOffice 5.2.0.0.alpha0+ (70dad291a6baf1eff59b61f9d344c10018585dd4)
Created attachment 123830 [details] Example
[Additional Description] This bug report contains two issues: (1) Some characters, ellipsis (…) for example, should rotate 90 degrees in vervical layout, but they don't. This bug depends on the fonts (in my case it's Noto Sans CJK), and it seems to be related with #95836. Note that not only punctuations are affected. The Japanese Kana "ー" doesn't rotate 90 degrees either. (2) Many characters are rendered as different one. For example, the Chinese Bopomofo "ㄡ" becomes Japanese Kana "ア"; the punctuation "」" becomes "〆"; and Kana "ス" becomes Kana "ん". If you copy those and paste to gedit, you'll find they remain the original one.
Created attachment 124153 [details] Screenshot from LibO 5.2 My result does not match yours. I guess I would have to test on Linux. The font: https://github.com/googlei18n/noto-cjk/blob/master/NotoSansCJKtc-Regular.otf Win 7 Pro 64-bit Version: 5.2.0.0.alpha0+ Build ID: 7a7be32e5265f897174f3880adc061dac0203f1f CPU Threads: 4; OS Version: Windows 6.1; UI Render: default; TinderBox: Win-x86@62-merge-TDF, Branch:MASTER, Time: 2016-03-31_07:39:11 Locale: fi-FI (fi_FI)
Confirmed, Linux-only. Arch Linux 64-bit, KDE Plasma 5 Version: 5.2.0.0.alpha0+ Build ID: 9b0069c2833313ba9fab91f45edd354ba334f94b CPU Threads: 8; OS Version: Linux 4.4; UI Render: default; Locale: fi-FI (fi_FI.UTF-8) Built on April 3rd 2016
Created attachment 128163 [details] Comparison after and before setting SAL_USE_COMMON_LAYOUT=1 in Windows10 This issue no more a Linux only issue after 5.3.0-alpha1. :-( Worse yet, punctuation marks from MS built-in traditional Chinese fonts PMing-Liu and ZhengHei are incorrect now (Highlight in red box.
For reference http://www.unicode.org/reports/tr50/
Created attachment 128179 [details] With SAL_USE_COMMON_LAYOUT=1 in Ubuntu 16.03 It get better in Linux with SAL_USE_COMMON_LAYOUT=1
Hi Khaled, Here are some findings and my thoughts about processing vertical writing - For common layout, GetVerticalFlagsForScript() returns GF_NONE for USCRIPT_COMMON, so the CJK punctuation marks in the sample do not have chance to be processed processed as its vertical writing form in CommonSalLayout::LayoutText(). OTOH, I'm thinking maybe GetNextRun should merge runs with script type USCRIPT_COMMON with previous run or set it to script type of the whole text if the run contains CJK punctuation marks as mentioned in 5.1 Handling Characters with the Common Script Property in UAX#24. The last thing is about how LayoutText set vertical flags for each glyph. Define rotation purely based on codepoint seems to be error prone. It also depend on availability of vertical writing form in different font.
(In reply to Mark Hung from comment #8) > Hi Khaled, > > Here are some findings and my thoughts about processing vertical writing - > > For common layout, GetVerticalFlagsForScript() returns GF_NONE for > USCRIPT_COMMON, so the CJK punctuation marks in the sample do not have > chance to be processed processed as its vertical writing form in > CommonSalLayout::LayoutText(). We shouldn’t get any USCRIPT_COMMON by the time this function is called, since it should have been resolved into a specific script by script itemization. > OTOH, I'm thinking maybe GetNextRun should merge runs with script type > USCRIPT_COMMON with previous run or set it to script type of the whole text > if the run contains CJK punctuation marks as mentioned in 5.1 Handling > Characters with the Common Script Property in UAX#24. This should be the case already. > The last thing is about how LayoutText set vertical flags for each glyph. > Define rotation purely based on codepoint seems to be error prone. I caouldn’t come up with a better way to do it, any suggestions are highly appreciated. What I have been thinking is that we should itemize the runs by character orientation for vertical text, in addition to the bidi and script itemization we currently do. But I couldn’t find any resources to know how to that, may be you will have a better luck than me. > It also depend on availability of vertical writing form in different font. I think we will have to depend on the fonts doing the right thing for now, after making sure we handle correct fonts correctly we can look into some fallback behavior for defective fonts.
What about this patch: diff --git a/vcl/source/gdi/CommonSalLayout.cxx b/vcl/source/gdi/CommonSalLayout.cxx index 91d7762..42e33ae 100644 --- a/vcl/source/gdi/CommonSalLayout.cxx +++ b/vcl/source/gdi/CommonSalLayout.cxx @@ -547,9 +547,8 @@ bool CommonSalLayout::LayoutText(ImplLayoutArgs& rArgs) if (bVertical) { int nVertFlag; -#if 0 /* XXX: does not work as expected for Common script */ - UErrorCode error = U_ZERO_ERROR; - nVertFlag = GetVerticalFlagsForScript(uscript_getScript(aChar, &error)); +#if 1 /* XXX: does not work as expected for Common script */ + nVertFlag = GetVerticalFlagsForScript(aScriptRun.maScript); #else nVertFlag = GetVerticalFlags(aChar); if (nVertFlag == GF_ROTR) It seems to improve this test file, but breaks parentheses :(
(In reply to Khaled Hosny from comment #9) > We shouldn’t get any USCRIPT_COMMON by the time this function is called, > since it should have been resolved into a specific script by script > itemization. > > > OTOH, I'm thinking maybe GetNextRun should merge runs with script type > > USCRIPT_COMMON with previous run or set it to script type of the whole text > > if the run contains CJK punctuation marks as mentioned in 5.1 Handling > > Characters with the Common Script Property in UAX#24. > > This should be the case already. While dumping the script type, I found there are a few are UCSCRIPT_COMMON: Char=《 Code=300a Script=0 GR_NONE Char=… Code=2026 Script=0 GR_NONE Char=… Code=2026 Script=0 GR_NONE Char=》 Code=300b Script=0 GR_NONE Char=「 Code=300c Script=0 GR_NONE Char=… Code=2026 Script=25 GR_NONE Char=」 Code=300d Script=25 GR_NONE So I mean there is something wrong in GetNextRun > > > The last thing is about how LayoutText set vertical flags for each glyph. > > Define rotation purely based on codepoint seems to be error prone. > > I caouldn’t come up with a better way to do it, any suggestions are highly > appreciated. > > What I have been thinking is that we should itemize the runs by character > orientation for vertical text, in addition to the bidi and script > itemization we currently do. But I couldn’t find any resources to know how > to that, may be you will have a better luck than me. > > > It also depend on availability of vertical writing form in different font. > > I think we will have to depend on the fonts doing the right thing for now, > after making sure we handle correct fonts correctly we can look into some > fallback behavior for defective fonts. I would trust Microsoft PMingLiu more for Chinese font, but I'm not a font expert.
(In reply to Mark Hung from comment #11) > (In reply to Khaled Hosny from comment #9) > > > We shouldn’t get any USCRIPT_COMMON by the time this function is called, > > since it should have been resolved into a specific script by script > > itemization. > > > > > OTOH, I'm thinking maybe GetNextRun should merge runs with script type > > > USCRIPT_COMMON with previous run or set it to script type of the whole text > > > if the run contains CJK punctuation marks as mentioned in 5.1 Handling > > > Characters with the Common Script Property in UAX#24. > > > > This should be the case already. > > While dumping the script type, I found there are a few are UCSCRIPT_COMMON: > Char=《 Code=300a Script=0 GR_NONE > Char=… Code=2026 Script=0 GR_NONE > Char=… Code=2026 Script=0 GR_NONE > Char=》 Code=300b Script=0 GR_NONE > Char=「 Code=300c Script=0 GR_NONE > Char=… Code=2026 Script=25 GR_NONE > Char=」 Code=300d Script=25 GR_NONE This shouldn’t happen, can you debug it?
Created attachment 128227 [details] Screenshot with SAL_USE_COMMON_LAYOUT This is how the attached document looks like on my system.
(In reply to Khaled Hosny from comment #13) > Created attachment 128227 [details] > Screenshot with SAL_USE_COMMON_LAYOUT > > This is how the attached document looks like on my system. Do you mean that's what it looks like without any code change? That looks similar to mine on Linux ( some ellipsis are exception, not centered ). But we have regression on Windows.
(In reply to Mark Hung from comment #14) > (In reply to Khaled Hosny from comment #13) > > Created attachment 128227 [details] > > Screenshot with SAL_USE_COMMON_LAYOUT > > > > This is how the attached document looks like on my system. > > Do you mean that's what it looks like without any code change? Yes. > That looks similar to mine on Linux ( some ellipsis are exception, not > centered ). But we have regression on Windows. I see, then it is more likely a bug in the Windows rendering code not the cross-platform layout code. I have some local changes to improve Windows rendering code for local text, but not finished yet.
Khaled Hosny committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=3d456dfa6637c6c3ebe7a21f1f1a5b05039cee2a tdf#98879: Fix vertical text on Windows for the new layout It will be available in 5.3.0. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
@Mark: Is this fixed now, or there are still pending issues?
Created attachment 128492 [details] missing 0x300c in Windows10.
Hmm, that is weird. I’m pretty sure they were there when I worked on https://gerrit.libreoffice.org/gitweb?p=core.git;a=commitdiff;h=3d456dfa6637c6c3ebe7a21f1f1a5b05039cee2a. Looks like some fonts are affected and some are not, no idea what is going on.
With latest build from master, brackets are all fine now. But, ellipsis isn't rotated 90 degree clockwise on Windows. It's fine on Linux. Version: 5.4.0.0.alpha0+ (x64) Build ID: 4badf4000837488987b08d67d75b97b442ff0242 CPU Threads: 8; OS Version: Windows 6.19; UI Render: default; Locale: zh-TW (zh_TW); Calc: group
(In reply to Mark Hung from comment #20) > With latest build from master, brackets are all fine now. But, ellipsis > isn't rotated 90 degree clockwise on Windows. It's fine on Linux. > > Version: 5.4.0.0.alpha0+ (x64) > Build ID: 4badf4000837488987b08d67d75b97b442ff0242 > CPU Threads: 8; OS Version: Windows 6.19; UI Render: default; > Locale: zh-TW (zh_TW); Calc: group Confirmed with Noto CJK in attachment 123830 [details] Version: 5.4.0.0.alpha0+ Build ID: 3afe82bd63fde41d2a88418fb64e4ff587b05436 CPU Threads: 4; OS Version: Windows 6.19; UI Render: default; TinderBox: Win-x86@42, Branch:master, Time: 2017-01-04_23:42:57 Locale: fi-FI (fi_FI); Calc: group
Created attachment 130279 [details] screenshot from 5.4.0. I have the following issues on 5.4.0.0.alpha0+ ... (1) Some brackets are wrongly converted when using the Droid Sans Fallback. They were correct on LibreOffice 5.2.3.2. (2) Fullwidth Colon (U+FF1A) and Fullwidth Semicolon (U+FF1B) are not rotated 90 degreed clockwise with fonts such as Droid Sans Fallback or AR PL UMing. They were correct on LibreOffice 5.2.3.2. (3) Althogh the brackets like 《》 or 「」 are now correct, but they are not centered and too close to the left border. Please see attachment. Version: 5.4.0.0.alpha0+ Build ID: 63ddc8dc4ae1f3c3ee2f860c34886688b0ed2d57 CPU Threads: 4; OS Version: Linux 4.4; UI Render: default; VCL: gtk2; TinderBox: Linux-rpm_deb-x86_64@70-TDF, Branch:master, Time: 2017-01-08_00:28:48 Locale: zh-TW (zh_TW.UTF-8); Calc: group
(In reply to Hiunn-hué from comment #22) > (3) Althogh the brackets like 《》 or 「」 are now correct, but they are not > centered and too close to the left border. This problem looks similar to what I have seen in attachment 128631 [details], when I disable vert feature for 《》(Mongolian Baiti:vert=0), they looks close to the top border. Version: 5.3.0.2 (x64) Build ID: 5ad7b2889021c491af62f7930a4b1cb631392f16 CPU Threads: 4; OS Version: Windows 6.19; UI Render: default; 布局引擎:新; Locale: zh-CN (zh_CN); Calc: group
(In reply to Hiunn-hué from comment #22) > (1) Some brackets are wrongly converted when using the Droid Sans Fallback. > They were correct on LibreOffice 5.2.3.2. This is font issue, I found this problem also appearing in Chrome, Firefox, Opera on Android, maybe this font has bad vert feature.
(In reply to Hiunn-hué from comment #22) > (2) Fullwidth Colon (U+FF1A) and Fullwidth Semicolon (U+FF1B) are not > rotated 90 degreed clockwise with fonts such as Droid Sans Fallback or AR PL > UMing. They were correct on LibreOffice 5.2.3.2. According to Unicode data, U+FF1A/FF1B has Tr property, which means they are fallback to Rotated if they does not have proper feature for vertical layout in a font, but consider these characters are different from bracket characters, I suggest LibO should giving an exception.
(In reply to Volga from comment #25) > According to Unicode data, U+FF1A/FF1B has Tr property, which means they are > fallback to Rotated if they does not have proper feature for vertical layout > in a font, but consider these characters are different from bracket > characters, I suggest LibO should giving an exception. See: http://www.unicode.org/reports/tr50/#data
Hi, (In reply to Hiunn-hué from comment #22) > Created attachment 130279 [details] > screenshot from 5.4.0. According to the attachment, characters that use Noto Sans CJK Regular TC are displayed with correct orientation in Linux. Although there are still residual issues, let's create different issues for them if necessary. With knowledge we gain about vertical writing, I'd like to suggest to create different for the combination of 1. Whether OpenGL is enabled 2. the characters under investigation 3. the Font under investigation, and the expectation about the result - Whether the font support vertical writing (i.e. have vhea and vmtx table) make differences. - Each Font have it's own Glyph substitution table (GSUB) for vertical writing so that we can not expect same character have the same result among different fonts in vertical writing. It should be verified against font design first. Intuition about the result or past expectation isn't always correct.
> (2) Fullwidth Colon (U+FF1A) and Fullwidth Semicolon (U+FF1B) are not > rotated 90 degreed clockwise with fonts such as Droid Sans Fallback or AR PL > UMing. They were correct on LibreOffice 5.2.3.2. I opened a new issue for this: https://bugs.documentfoundation.org/show_bug.cgi?id=106295