Bug 114760 - Word Count problem with symbols in Chinese mixed with English text
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version: 3.6.0.4 release (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
Blocks: CJK
Reported: 2017-12-30 02:26 UTC by Cheng-Chia Tseng
Modified: 2023-07-05 03:13 UTC (History)

Description Cheng-Chia Tseng 2017-12-30 02:26:03 UTC
Description:
The Word Count dialog has a "Words" count.
For English text it counts words and ignores symbols, while for Chinese text it counts characters AND symbols.

For Chinese text there are 2 ways of counting: one counts Chinese characters and symbols, and the other counts only Chinese characters (no symbols). The former method, which includes Chinese symbols, is much more common in the press.

So when we are counting a text document including Chinese text and English text, we add the Word count of English (not counting symbols) and the Word count of Chinese (either counting symbols or not) together.

The "Words" count in LibreOffice currently uses the first method above: it counts English "words" plus "Chinese characters and Chinese symbols." I think that is confusing, because we treat "phonogram words" as equivalent to "Chinese characters", not to symbols.

"Words count" should be divided into:
1. Words => corrected to count only words and Chinese characters.
2. Words and Chinese symbols => the method the Words count uses now.
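
A minimal Python sketch of the two proposed counts, for illustration only. The function name `count_words`, the flag `include_cjk_symbols`, and the deliberately narrow character ranges are assumptions made for this example; they are not LibreOffice code and do not cover all CJK blocks. The sample string assumes the Chinese punctuation is full-width (U+FF0C and U+FF01).

```python
import re

# Sample text; the Chinese punctuation is written as escapes so that it is
# unambiguous: U+FF0C (full-width comma), U+FF01 (full-width exclamation mark).
SAMPLE = "Hello, world! 世界\uFF0C你好\uFF01"

# Assumed, simplified character classes (not a complete Unicode treatment).
LATIN_WORD = re.compile(r"[A-Za-z0-9]+")
CJK_IDEOGRAPH = re.compile(r"[\u4E00-\u9FFF]")
CJK_SYMBOL = re.compile(r"[\u3001-\u303F\uFF01-\uFF0F\uFF1A-\uFF20]")


def count_words(text: str, include_cjk_symbols: bool = False) -> int:
    """Count Latin words plus CJK ideographs, optionally adding CJK symbols."""
    total = len(LATIN_WORD.findall(text)) + len(CJK_IDEOGRAPH.findall(text))
    if include_cjk_symbols:
        total += len(CJK_SYMBOL.findall(text))
    return total


print(count_words(SAMPLE))                            # proposed "Words"
print(count_words(SAMPLE, include_cjk_symbols=True))  # "Words and Chinese symbols"
```

With the sample string, the first call corresponds to proposal 1 and the second to proposal 2, which is what the "Words" count reports today.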

Steps to Reproduce:
1. Open Writer
2. Copy and paste "Hello, world! 世界,你好!"
3. Select Tools > Word Count to see the stats

Actual Results:  
1. Words: 8
2. Characters including spaces: 20
3. Characters excluding spaces: 18
4. Asian characters and Korean syllables: 6

Expected Results:
The sentence "Hello, world! 世界,你好!" contains 2 English words (Hello world), 4 Chinese characters (世界你好), 4 symbols (,!,!), of which 2 are Chinese symbols (,!), and 2 spaces.

1. Words: 6 => should be corrected so that "Words" does not include symbols
2. Words and Chinese symbols: 8 => what the Words count reports now
3. Words and symbols: 10
4. Characters including spaces: 20
5. Characters excluding spaces: 18
6. Asian characters and Korean syllables: 6
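
The arithmetic behind these six numbers can be checked with a short Python sketch. It classifies each character using the standard `unicodedata` module; the helper name `tally` and the classification rules are assumptions made for this example and are not how LibreOffice counts. It also assumes that the Chinese punctuation in the sample is full-width and that the "Asian characters" figure includes full-width punctuation, which matches the reported value of 6.

```python
import unicodedata

# Chinese punctuation written as escapes: U+FF0C and U+FF01 (full-width).
SAMPLE = "Hello, world! 世界\uFF0C你好\uFF01"


def tally(text):
    """Classify every character and return the raw category counts."""
    counts = {"latin_words": 0, "cjk_chars": 0,
              "symbols": 0, "cjk_symbols": 0, "spaces": 0}
    in_word = False  # True while inside a run of narrow letters/digits
    for ch in text:
        cat = unicodedata.category(ch)
        wide = unicodedata.east_asian_width(ch) in ("W", "F")
        if ch.isspace():
            counts["spaces"] += 1
        elif cat[0] in "PS":            # punctuation or symbol
            counts["symbols"] += 1
            if wide:
                counts["cjk_symbols"] += 1
        elif cat == "Lo" and wide:      # wide letter, e.g. a Chinese character
            counts["cjk_chars"] += 1
        elif cat[0] in "LN" and not in_word:
            counts["latin_words"] += 1  # first character of a new word
        in_word = cat[0] in "LN" and not wide
    return counts


c = tally(SAMPLE)
words = c["latin_words"] + c["cjk_chars"]
print("Words:", words)
print("Words and Chinese symbols:", words + c["cjk_symbols"])
print("Words and symbols:", words + c["symbols"])
print("Characters including spaces:", len(SAMPLE))
print("Characters excluding spaces:", len(SAMPLE) - c["spaces"])
print("Asian characters:", c["cjk_chars"] + c["cjk_symbols"])
```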


Reproducible: Always


User Profile Reset: No



Additional Info:


User-Agent: Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:57.0) Gecko/20100101 Firefox/57.0
Comment 1 Buovjaga 2018-01-27 18:37:43 UTC
Confirmed.

Arch Linux 64-bit
Version: 6.1.0.0.alpha0+
Build ID: 2d8f17565ebe867210f5769851d91b2e7b612a8f
CPU threads: 8; OS: Linux 4.14; UI render: default; VCL: kde4; 
Locale: fi-FI (fi_FI.UTF-8); Calc: group threaded
Built on January 27th 2018
Comment 2 QA Administrators 2019-01-28 03:42:17 UTC Comment hidden (obsolete)
Comment 3 Ming Hua 2019-05-27 08:25:11 UTC
(In reply to QA Administrators from comment #2)
Still reproducible in 6.2.4.

Version: 6.2.4.2 (x64)
Build ID: 2412653d852ce75f65fbfa83fb7e7b669a126d64
CPU threads: 2; OS: Windows 10.0; UI render: GL; VCL: win; 
Locale: zh-CN (zh_CN); UI-Language: en-US
Calc: threaded
Comment 4 Naruhiko Ogasawara 2019-06-23 04:05:40 UTC
Is it enough to just exclude symbols from the word count?  Or do we need an extra count (what is currently the "word" count)?

I'm digging into this issue now, so I would like to confirm the real problem we should fix.
Comment 5 Cheng-Chia Tseng 2019-06-23 16:51:05 UTC
In my opinion, "words" basically does not include symbols.

The method LibreOffice uses now takes Chinese symbols into account, which helps users in Taiwan or China get the figure the press asks for.

Note: The press/media in Taiwan or China count Chinese symbols as well when calculating writers' pay.

I suggest adding an extra count that gives the "real word" count, regardless of any form of symbols.
Comment 6 Ming Hua 2019-07-04 05:42:03 UTC
In my opinion, there are multiple issues here, some illustrated by the example from the bug submitter, some not.  Maybe I should file separate bugs.

1. Exclude Chinese punctuation and symbols from the "Words" count.  Or alternatively, exclude all Chinese characters and symbols from the "Words" count, as "words" (词/詞) is a rather vague concept in Chinese anyway, and counting each Chinese character as a word would never be correct.

2. Recognize the full-width space (U+3000) in the "Characters excluding spaces" count.

3. Provide an Asian character count that excludes punctuation and symbols, as that number is sometimes preferred.
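
Points 2 and 3 can be illustrated with a small Python sketch. The function names are hypothetical and the rules are assumptions made for this example (East Asian Width "W"/"F" plus a letter category stands in for "Asian character"); this is not LibreOffice's implementation.

```python
import unicodedata


def chars_excluding_spaces(text):
    """Count characters that are not whitespace.

    str.isspace() is True for U+3000 (IDEOGRAPHIC SPACE, category Zs),
    so the full-width space is excluded along with the ASCII space.
    """
    return sum(1 for ch in text if not ch.isspace())


def asian_chars_excluding_punctuation(text):
    """Count wide or full-width letters, skipping punctuation, symbols, spaces."""
    return sum(
        1 for ch in text
        if unicodedata.east_asian_width(ch) in ("W", "F")
        and unicodedata.category(ch).startswith("L")
    )


# Four Chinese characters, one full-width space, one full-width "!".
text = "世界\u3000你好\uFF01"
print(chars_excluding_spaces(text))
print(asian_chars_excluding_punctuation(text))
```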
Comment 7 QA Administrators 2021-07-04 04:36:39 UTC Comment hidden (obsolete)
Comment 8 QA Administrators 2023-07-05 03:13:45 UTC
Dear Cheng-Chia Tseng,

To make sure we're focusing on the bugs that affect our users today, LibreOffice QA is asking bug reporters and confirmers to retest open, confirmed bugs which have not been touched for over a year.

There have been thousands of bug fixes and commits since anyone checked on this bug report. During that time, it's possible that the bug has been fixed, or the details of the problem have changed. We'd really appreciate your help in getting confirmation that the bug is still present.

If you have time, please do the following:

Test to see if the bug is still present with the latest version of LibreOffice from https://www.libreoffice.org/download/

If the bug is present, please leave a comment that includes the information from Help - About LibreOffice.
 
If the bug is NOT present, please set the bug's Status field to RESOLVED-WORKSFORME and leave a comment that includes the information from Help - About LibreOffice.

Please DO NOT

Update the version field
Reply via email (please reply directly on the bug tracker)
Set the bug's Status field to RESOLVED - FIXED (this status has a particular meaning that is not appropriate in this case)


If you want to do more to help you can test to see if your issue is a REGRESSION. To do so:
1. Download and install the oldest version of LibreOffice (usually 3.3 unless your bug pertains to a feature added after 3.3) from https://downloadarchive.documentfoundation.org/libreoffice/old/

2. Test your bug
3. Leave a comment with your results.
4a. If the bug was present with 3.3 - set version to 'inherited from OOo';
4b. If the bug was not present in 3.3 - add 'regression' to keyword


Feel free to come ask questions or to say hello in our QA chat: https://web.libera.chat/?settings=#libreoffice-qa

Thank you for helping us make LibreOffice even better for everyone!

Warm Regards,
QA Team

MassPing-UntouchedBug