Bug 71877 - Word Count Wrong for ZWSP-delimited text in SEA languages (Thai, Lao, Khmer, and Burmese)
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Linguistic
Version: unspecified (earliest affected)
Hardware: Other All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Word-Count
Reported: 2013-11-21 14:11 UTC by Robert M Campbell
Modified: 2018-10-26 02:58 UTC

See Also:
Crash report or crash signature:


Attachments
Test document including ZWSP and non-ZWSP Thai, Lao, Khmer, and Burmese text (42.63 KB, application/vnd.oasis.opendocument.text)
2013-11-21 14:11 UTC, Robert M Campbell
Test document including ZWSP and non-ZWSP Thai, Lao, Khmer, and Burmese text (38.13 KB, application/vnd.oasis.opendocument.text)
2013-11-25 04:21 UTC, Robert M Campbell
Mittaphap (24.57 KB, application/x-font-ttf)
2013-11-25 04:42 UTC, Robert M Campbell
Mittaphap Book (24.63 KB, application/x-font-ttf)
2013-11-25 04:43 UTC, Robert M Campbell

Description Robert M Campbell 2013-11-21 14:11:17 UTC
Created attachment 89590 [details]
Test document including ZWSP and non-ZWSP Thai, Lao, Khmer, and Burmese text

When working with text that uses ZWSPs (zero width spaces) to delimit text, LibreOffice does not count each word. When the ZWSPs are removed, the word count acts fine.

But, word selection (double click) and line breaking work fine with or without ZWSPs.

Testing document attached.
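
To make the symptom concrete, here is a minimal Python sketch contrasting a counter that splits only on ordinary whitespace with one that also treats ZWSP (U+200B) as a delimiter. It is not LibreOffice's code, and it makes no claim about where the miscount actually happens. The function names and the two-word Thai sample are assumptions chosen for illustration.

```python
import re

ZWSP = "\u200b"  # ZERO WIDTH SPACE


def count_words_whitespace_only(text: str) -> int:
    """Count words by splitting on Unicode whitespace only.

    Python's str.split() does not treat U+200B as whitespace,
    so ZWSP-delimited words collapse into a single token.
    """
    return len(text.split())


def count_words_zwsp_aware(text: str) -> int:
    """Count words, treating ZWSP as a delimiter alongside whitespace."""
    tokens = re.split(r"[\s\u200b]+", text)
    # Drop empty strings produced by leading or trailing delimiters.
    return len([t for t in tokens if t])


# Hypothetical sample: two Thai words joined by a ZWSP.
sample = "ภาษา" + ZWSP + "ไทย"

print(count_words_whitespace_only(sample))
print(count_words_zwsp_aware(sample))
```

A real fix would more likely live in the word-boundary logic than in a regex split, but the contrast shows why a counter that ignores U+200B under-reports ZWSP-delimited text.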
Comment 1 Robinson Tryon (qubit) 2013-11-24 22:54:58 UTC
CONFIRMED in LO Version: 4.2.0.0.beta1 + Ubuntu 12.04.3

(In reply to comment #0)
> When working with text that uses ZWSPs (zero width spaces) to delimit text,
> LibreOffice does not count each word. When the ZWSPs are removed, the word
> count acts fine.

Per instructions in Test document:

REPRO STEPS:
- Open test document in LibreOffice
- Highlight first 4 paragraphs

As noted in the document, the bottom bar shows "202 words"

- Highlight the next set of 4 paragraphs

As noted in the document, the bottom bar shows "2 words"

> But, word selection (double click) and line breaking work fine with or
> without ZWSPs.

Well, at least there's that!

> 
> Testing document attached.

Thanks for the test document. Some of the fonts are not present on my system -- would it be possible to change the test document to use fonts included in LO that exercise the same bug? If not, could you point to where the fonts can be downloaded?

Status -> NEW
Comment 2 Robinson Tryon (qubit) 2013-11-24 22:57:05 UTC
Andras - Is this behavior a bug?
Comment 3 Robert M Campbell 2013-11-25 04:17:08 UTC
Paragraphs 1 & 5 (Thai) - No LibreOffice fonts that I can tell
Droid Sans
https://www.google.com/fonts/specimen/Droid+Sans

Paragraphs 2 & 6 (Khmer) - No LibreOffice fonts that I can tell
Khmer OS
http://sourceforge.net/projects/khmer/files/Fonts%20-%20KhmerOS/KhmerOS%20Fonts%204.0-%20LGPL%20License/

Paragraphs 3 & 7 (Lao) - No LibreOffice fonts that I can tell
Mittaphap
http://hg.palaso.org/font-lao2/file/d0764b11848f

Padauk (included in LibreOffice) is the Burmese Font

I'll adjust the document to the fonts listed. Mittaphap in particular is fairly new and only available as source, not ttf yet, but I have generated some fonts and can attach them here if that would be helpful?
Comment 4 Robert M Campbell 2013-11-25 04:21:26 UTC
Created attachment 89726 [details]
Test document including ZWSP and non-ZWSP Thai, Lao, Khmer, and Burmese text
Comment 5 Robinson Tryon (qubit) 2013-11-25 04:38:03 UTC
(In reply to comment #3)
> [...various font things ..] 
> I'll adjust the document to the fonts listed.

thanks

> Mittaphap in particular is
> fairly new and only available as source, not ttf yet, but I have generated
> some fonts and can attach them here if that would be helpful?

As long as the links are stable and the fonts are under some FOSS license so we may test against them, then it's generally fine to link to external font files.
Comment 6 Robert M Campbell 2013-11-25 04:42:58 UTC
Created attachment 89727 [details]
Mittaphap
Comment 7 Robert M Campbell 2013-11-25 04:43:29 UTC
Created attachment 89728 [details]
Mittaphap Book
Comment 8 Robert M Campbell 2013-11-25 04:52:49 UTC
Mittaphap is licensed OFL
Comment 9 Robert M Campbell 2014-01-22 03:16:36 UTC
Any news on this bug? Anything I can do to help?
Comment 10 Robinson Tryon (qubit) 2015-01-14 08:14:19 UTC
(In reply to Robert M Campbell from comment #9)
> Any news on this bug? Anything I can do to help?

Hi Robert,
Good question -- sorry for the late reply here! As you can see, we have a large number of open bug reports filed against LibreOffice, so it's often a matter of finding the right resource to help address a particular bug or set of bugs.

This bug appears to affect a number of different languages including Thai, so I'd suggest that you check with the Thai mailing list and see if others are experiencing the same problem:
https://wiki.documentfoundation.org/Local_Mailing_Lists#Thai

If the problem is affecting many people, then we can try to identify someone who'd be interested in working on a fix. This could be a great opportunity for a university CS student or someone else familiar with programming to learn more about LibreOffice.
Comment 11 QA Administrators 2017-10-25 08:58:19 UTC Comment hidden (obsolete)
Comment 12 Robert M Campbell 2017-10-25 10:48:57 UTC
Sorry, life, travels, and ever-expanding projects seem to eat up time. I've just now reviewed this bug (tested with 5.4.2.2 (x64)) and...

It still behaves the same as before (so it is still not providing correct word counts).

Basically, without any zero-width spaces (ZWSPs), the word counts seem spot on. The problem appears only with text that contains ZWSPs.
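
For anyone retesting, a small helper along these lines can confirm whether a paragraph actually contains ZWSPs and produce a ZWSP-free copy to compare word counts against. This is only a sketch in Python. `zwsp_report` is a hypothetical name and not part of LibreOffice, and the text is assumed to have been copied out of the document as a plain string.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE


def zwsp_report(text: str) -> dict:
    """Summarise ZWSP usage in a string and return a ZWSP-free copy."""
    stripped = text.replace(ZWSP, "")
    return {
        "zwsp_count": text.count(ZWSP),   # number of U+200B characters
        "length_with": len(text),         # code points including ZWSPs
        "length_without": len(stripped),  # code points after removal
        "stripped": stripped,             # text to paste back for comparison
    }


# Hypothetical input: three chunks separated by two ZWSPs.
report = zwsp_report("ab" + ZWSP + "cd" + ZWSP + "ef")
print(report["zwsp_count"], report["stripped"])
```

Pasting the `stripped` text back into a document and comparing the status-bar count with the original paragraph reproduces the with/without comparison described above.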

I'm not exactly sure where this happens in the code. My programming skills outside the web sphere are not super high, but I am willing to look into it if someone can guide me to where I should start looking.

What I don't know, and this may play a major factor in things, is whether all users use zero-width spaces to delimit words (in the case of Thai, Lao, and Khmer this seems to be the case, but I'm not a linguist or language expert, though I can read at varying levels in each listed language). It may be that users sometimes insert ZWSPs specifically to mark line-break opportunities, in cases where in English we'd use a hyphen.

Anyways, point me where I can help, and I'm glad to do what I can.

Thanks!
Comment 13 QA Administrators 2018-10-26 02:58:40 UTC
** Please read this message in its entirety before responding **

To make sure we're focusing on the bugs that affect our users today, LibreOffice QA is asking bug reporters and confirmers to retest open, confirmed bugs which have not been touched for over a year.

There have been thousands of bug fixes and commits since anyone checked on this bug report. During that time, it's possible that the bug has been fixed, or the details of the problem have changed. We'd really appreciate your help in getting confirmation that the bug is still present.

If you have time, please do the following:

Test to see if the bug is still present with the latest version of LibreOffice from https://www.libreoffice.org/download/

If the bug is present, please leave a comment that includes the information from Help - About LibreOffice.
 
If the bug is NOT present, please set the bug's Status field to RESOLVED-WORKSFORME and leave a comment that includes the information from Help - About LibreOffice.

Please DO NOT

Update the version field
Reply via email (please reply directly on the bug tracker)
Set the bug's Status field to RESOLVED - FIXED (this status has a particular meaning that is not appropriate in this case)


If you want to do more to help you can test to see if your issue is a REGRESSION. To do so:
1. Download and install the oldest version of LibreOffice (usually 3.3 unless your bug pertains to a feature added after 3.3) from http://downloadarchive.documentfoundation.org/libreoffice/old/

2. Test your bug
3. Leave a comment with your results.
4a. If the bug was present with 3.3 - set version to 'inherited from OOo';
4b. If the bug was not present in 3.3 - add 'regression' to the keywords


Feel free to come ask questions or to say hello in our QA chat: https://kiwiirc.com/nextclient/irc.freenode.net/#libreoffice-qa

Thank you for helping us make LibreOffice even better for everyone!

Warm Regards,
QA Team

MassPing-UntouchedBug