Bug 114432 - Tangut character aligned incorrectly in vertical layout
Summary: Tangut character aligned incorrectly in vertical layout
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version (earliest affected): 6.0.0.0.beta2
Hardware: All All
Importance: medium normal
Assignee: ⁨خالد حسني⁩
URL:
Whiteboard: target:24.2.0 target:7.6.0.0.beta2
Keywords:
Depends on:
Blocks: CJK Vertical-Text
Reported: 2017-12-13 04:17 UTC by Volga
Modified: 2023-06-21 17:00 UTC
CC List: 4 users

See Also:
Crash report or crash signature:


Attachments
Sample ODT (14.07 KB, application/vnd.oasis.opendocument.text)
2017-12-13 04:19 UTC, Volga
Snapshot (162.84 KB, image/png)
2017-12-13 04:19 UTC, Volga
Screenshot: Comparison (200.83 KB, image/png)
2017-12-16 12:07 UTC, Hiunn-hué
Screenshot: Comparison (107.46 KB, image/png)
2017-12-21 03:51 UTC, Volga
Screenshot from 7.3.0.0.alpha0+ (167.27 KB, image/png)
2021-06-24 08:15 UTC, Volga
Screenshot from LibreOffice Writer 7.5.4.2 (132.82 KB, image/png)
2023-06-21 05:10 UTC, Volga
Original text (93.04 KB, image/png)
2023-06-21 06:21 UTC, Volga

Description Volga 2017-12-13 04:17:46 UTC
Description:
In vertical layout, Tangut characters are shifted to the right, leaving a narrow gap after CJK characters.

Steps to Reproduce:
1. Open Yu Luo Jun Mei Shu.odt

Actual Results:  
See attached screenshot

Expected Results:
LibreOffice should not insert a gap between Chinese and Tangut characters, and Tangut characters should be aligned the same way as Chinese characters in vertical layout.
See the example: https://en.wikipedia.org/w/index.php?title=Wikipedia:Sandbox&oldid=815160812


Reproducible: Always


User Profile Reset: No



Additional Info:
Version: 6.0.0.0.beta2+ (x64)
Build ID: b030bf19e29f031b0a640dd92c38d654785f1a99
CPU threads: 4; OS: Windows 10.0; UI render: default; 
TinderBox: Win-x86_64@42, Branch:libreoffice-6-0, Time: 2017-12-12_05:03:02
Locale: zh-CN (zh_CN); Calc: group threaded


User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0
Comment 1 Volga 2017-12-13 04:19:01 UTC
Created attachment 138407
Sample ODT
Comment 2 Volga 2017-12-13 04:19:37 UTC
Created attachment 138408
Snapshot
Comment 3 Volga 2017-12-13 04:22:18 UTC
You can use the Tangut Yinchuan font for testing.
http://www.babelstone.co.uk/Fonts/Yinchuan.html
Comment 4 Hiunn-hué 2017-12-16 12:07:33 UTC
Created attachment 138474
Screenshot: Comparison


Sorry, I don't know how to describe it properly; please see the attached screenshot.

It seems to depend on ...
    1. the Chinese font you are using, and
    2. the first character of a line.

Compared to other fonts, Noto Sans CJK and Noto Serif CJK seem to shift the text much further to the left, and the characters overlap the underline.


---

Version: 6.1.0.0.alpha0+
Build ID: aad9c6da5154a89c6ef02214d1122d4b444eea23
CPU threads: 4; OS: Linux 4.4; UI render: default; VCL: gtk2; 
TinderBox: Linux-rpm_deb-x86_64@70-TDF, Branch:master, Time: 2017-12-15_23:20:39
Locale: zh-TW (zh_TW.UTF-8); Calc: group threaded
Comment 5 Volga 2017-12-17 09:07:54 UTC
OK, for me the problem looks the same as what you saw with Noto Sans CJK and Noto Serif CJK; however, the characters don't overlap the underline.

Version: 6.0.0.0.beta2+ (x64)
Build ID: bd260a59cfdd050db3aa9a641ef0fa09efcedf5b
CPU threads: 4; OS: Windows 10.0; UI render: default; 
TinderBox: Win-x86_64@42, Branch:libreoffice-6-0, Time: 2017-12-16_05:54:16
Locale: zh-CN (zh_CN); Calc: group threaded
Comment 6 Volga 2017-12-21 03:51:47 UTC
Created attachment 138559
Screenshot: Comparison

Here is what I have seen on Windows.

Version: 6.0.0.0.beta2+ (x64)
Build ID: fe31edb29e5e77bb60e3aa73ec6a0380314acc61
CPU threads: 4; OS: Windows 10.0; UI render: default; 
TinderBox: Win-x86_64@42, Branch:libreoffice-6-0, Time: 2017-12-20_03:45:21
Locale: zh-CN (zh_CN); Calc: group
Comment 7 Buovjaga 2017-12-26 17:04:07 UTC
Let's set to NEW per comment 4
Comment 8 QA Administrators 2018-12-27 03:42:36 UTC Comment hidden (obsolete)
Comment 9 QA Administrators 2020-12-27 03:36:11 UTC Comment hidden (obsolete)
Comment 10 Volga 2021-06-24 08:15:37 UTC
Created attachment 173137
Screenshot from 7.3.0.0.alpha0+

The Tangut characters still have a gap at the top. As a suggestion: in vertical text layout, they should get the same alignment as CJK ideographs, Kana, and Hangul.

Version: 7.3.0.0.alpha0+ (x64) / LibreOffice Community
Build ID: f6b9f671d128c989ce223d61d0d5d43ff1dc9fcb
CPU threads: 4; OS: Windows 10.0 Build 19043; UI render: Skia/Raster; VCL: win
Locale: zh-CN (zh_CN); UI: zh-CN
Calc: threaded

Use this font:
https://mirrors.tuna.tsinghua.edu.cn/adobe-fonts/source-han-serif/OTF/SimplifiedChinese/SourceHanSerifSC-Regular.otf
Comment 11 ⁨خالد حسني⁩ 2023-06-20 11:03:35 UTC
Does not seem to be reproducible any more. Please re-open if reproducible.
Comment 12 Volga 2023-06-21 05:10:22 UTC
Created attachment 188028
Screenshot from LibreOffice Writer 7.5.4.2

I see no change here. Furthermore, Khitan Small Script, another writing system inspired by Chinese characters, is now encoded in Unicode, so it is all the more necessary to fix this bug.

Version: 7.5.4.2 (X86_64) / LibreOffice Community
Build ID: 36ccfdc35048b057fd9854c757a8b67ec53977b6
CPU threads: 4; OS: Windows 10.0 Build 19045; UI render: Skia/Raster; VCL: win
Locale: zh-CN (zh_CN); UI: zh-CN
Calc: threaded
Comment 13 ⁨خالد حسني⁩ 2023-06-21 05:45:09 UTC
OK, I see the issue now. Any new CJK scripts in Unicode other than Tangut and Khitan Small Script?
Comment 14 Volga 2023-06-21 06:21:31 UTC
Created attachment 188029
Original text

This is the original text I extracted; the source can be seen in attachment 138407. Here you can see that although there is some character spacing in the content, there is no extra spacing between Chinese and Tangut characters.
Comment 15 Volga 2023-06-21 06:35:07 UTC
(In reply to ⁨خالد حسني⁩ from comment #13)
> OK, I see the issue now. Any new CJK scripts in Unicode other than Tangut
> and Khitan Small Script?
On the Unicode website (https://www.unicode.org/charts/) I see no other new CJK scripts encoded in Unicode so far. But I believe LibreOffice should treat siniform scripts the same way for content layout, and be prepared for newcomers if they are adopted in future versions of the Unicode Standard, for example the Jurchen script.
Comment 16 ⁨خالد حسني⁩ 2023-06-21 06:38:51 UTC
(In reply to Volga from comment #15)
> (In reply to ⁨خالد حسني⁩ from comment #13)
> > OK, I see the issue now. Any new CJK scripts in Unicode other than Tangut
> > and Khitan Small Script?
> On Unicode website (https://www.unicode.org/charts/) I see no new CJK
> scripts encoded in Unicode so far. But I believe LibreOffice should be
> treated siniform scripts the same for content layout, and be prepared for
> new comer if it was adopted in new version of the Unicode Standard, for
> example, Jurchen script.

How can we identify a siniform script in a future-proof way? We currently use ICU script codes, so we can only check scripts that are already encoded.
Comment 17 Commit Notification 2023-06-21 08:41:32 UTC
Khaled Hosny committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/bb2c55b2c8f72bfbb7f98cf88911cb88ee1a71d6

tdf#114432: classify Tangut and Khitan Small Script as ScriptType::ASIAN

It will be available in 24.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
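The commit title says Tangut and Khitan Small Script are now classified as ScriptType::ASIAN. As a rough illustration of what such a classification amounts to, here is a minimal C++ sketch that maps code points to a reduced stand-in for a script-type enum using Unicode block ranges. This is not the actual patch: per comment 16, LibreOffice works with ICU script codes rather than hand-written ranges. The `ScriptType` enum, the `classify` function, and the selection of blocks are hypothetical and chosen for illustration only.

```cpp
// Hypothetical, reduced stand-in for a script-type enum (illustration only;
// not LibreOffice's real css::i18n::ScriptType definition).
enum class ScriptType { Latin, Asian, Weak };

// Inclusive range of Unicode code points.
struct CodePointRange {
    char32_t first;
    char32_t last;
};

// Blocks treated as Asian in this sketch. Deliberately incomplete: only a few
// CJK blocks plus the Tangut and Khitan Small Script blocks discussed here.
constexpr CodePointRange kAsianRanges[] = {
    {0x3040, 0x309F},    // Hiragana
    {0x30A0, 0x30FF},    // Katakana
    {0x4E00, 0x9FFF},    // CJK Unified Ideographs
    {0xAC00, 0xD7AF},    // Hangul Syllables
    {0x17000, 0x187FF},  // Tangut
    {0x18800, 0x18AFF},  // Tangut Components
    {0x18B00, 0x18CFF},  // Khitan Small Script
    {0x18D00, 0x18D7F},  // Tangut Supplement
};

// Classify one code point. Anything outside the Asian table that is not an
// ASCII letter is reported as Weak, which is a simplification.
ScriptType classify(char32_t cp) {
    for (const CodePointRange& r : kAsianRanges) {
        if (cp >= r.first && cp <= r.last) {
            return ScriptType::Asian;
        }
    }
    if ((cp >= U'A' && cp <= U'Z') || (cp >= U'a' && cp <= U'z')) {
        return ScriptType::Latin;
    }
    return ScriptType::Weak;
}
```

With this table, `classify(0x17000)` (the first Tangut ideograph) and `classify(0x18B00)` (the first Khitan Small Script character) both return `ScriptType::Asian`, so such text would presumably get the same vertical-layout treatment as the neighbouring Chinese characters. As comment 16 points out, any such list, whether of code point ranges or ICU script codes, only covers scripts that are already encoded; a future script such as Jurchen would need a new entry.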
Comment 18 Volga 2023-06-21 15:05:45 UTC
Should this be backported to 7.6 beta?
Comment 19 Commit Notification 2023-06-21 17:00:30 UTC
Khaled Hosny committed a patch related to this issue.
It has been pushed to "libreoffice-7-6":

https://git.libreoffice.org/core/commit/914e45a65818a52b7469161b5970db4b8c7c66a5

tdf#114432: classify Tangut and Khitan Small Script as ScriptType::ASIAN

It will be available in 7.6.0.0.beta2.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.