Bug 131557 - LO could consider rules for word count in different languages to make a more reliable word count
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): Inherited From OOo
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Word-Count
Reported: 2020-03-25 08:20 UTC by Philippe
Modified: 2023-08-20 13:21 UTC (History)
CC List: 4 users

See Also:
Crash report or crash signature:


Attachments
Screenshot of the erroneous count for French (38.13 KB, image/png)
2020-05-13 16:43 UTC, Philippe
Screenshot of the correct count for English (36.79 KB, image/png)
2020-05-13 16:43 UTC, Philippe
ODT file with erroneous count example text (8.45 KB, application/vnd.oasis.opendocument.text)
2020-05-13 16:44 UTC, Philippe

Description Philippe 2020-03-25 08:20:49 UTC
Word count is wrong for French (at least): it reports many more words than there are. I gather that Writer counts words by counting the runs of white space that separate them.
French typographic rules (and hence normal use) call for certain common punctuation marks (most notably quotation marks, semicolons, colons, question and exclamation marks, and dashes) to be separated from words by white space. This wrongly inflates the word count.

Such punctuation should not be counted as words.


So, for French at least, the number of punctuation marks surrounded by white space should be subtracted from the raw word count. (Adding these punctuation marks as white space to the counting regex (I guess?) would also break the word count for gender-neutral writing, with words such as "développeur·se", "développeur(se)", "développeur/se" or ""développeu-se".)
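
The two approaches mentioned in this description can be compared with a short Python sketch. It does not show how Writer actually counts words; the function names and the list of punctuation marks are assumptions made up for illustration. It suggests that subtracting stand-alone punctuation tokens leaves gender-neutral forms intact, while treating punctuation as a separator splits them in two.

```python
import re

# Hypothetical list of marks that French typography sets off with spaces
# (guillemets, semicolon, colon, question/exclamation marks, en and em dash).
SPACED_PUNCTUATION = {"«", "»", ";", ":", "?", "!", "\u2013", "\u2014"}


def naive_count(text: str) -> int:
    """Count whitespace-separated tokens.

    str.split() with no argument also splits on no-break spaces (U+00A0).
    """
    return len(text.split())


def count_minus_punctuation(text: str) -> int:
    """Naive count minus the tokens that are a lone punctuation mark."""
    tokens = text.split()
    return len(tokens) - sum(1 for t in tokens if t in SPACED_PUNCTUATION)


def count_with_punctuation_as_separator(text: str) -> int:
    """Treat punctuation like white space (the approach warned against)."""
    parts = re.split(r"[\s«»;:?!\u2013\u2014·()/-]+", text)
    return sum(1 for p in parts if p)


sentence = "Il a dit : « Bonjour ! »"  # made-up example sentence
print(naive_count(sentence))                                  # 8
print(count_minus_punctuation(sentence))                      # 4
print(count_minus_punctuation("développeur·se"))              # 1
print(count_with_punctuation_as_separator("développeur·se"))  # 2
```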
Comment 1 sophie 2020-05-13 12:11:41 UTC
Hi, could you provide a document where the count is wrong and tell us which version and operating system you are using?
For information, if I cut and paste "développeur·se", "développeur(se)", "développeur/se" or ""développeu-se" into Writer, I get 5 words, 73 characters including spaces and 69 excluding spaces, which is the right number of words. So I can't reproduce your issue in version 6.4.0.2 or 7.0 alpha.
Comment 2 Xisco Faulí 2020-05-13 12:18:36 UTC
Setting to NEEDINFO meanwhile
Comment 3 Julien Nabet 2020-05-13 12:22:41 UTC
Apart from the fact that I'm strongly against gender-neutral writing because it's ridiculous and makes text hard to read, I don't know anything about the word counting process.
I can't help here => uncc myself.
Comment 4 Philippe 2020-05-13 16:43:12 UTC
Created attachment 160765 [details]
Screenshot of the erroneous count for French
Comment 5 Philippe 2020-05-13 16:43:47 UTC
Created attachment 160766 [details]
Screenshot of the correct count for English
Comment 6 Philippe 2020-05-13 16:44:37 UTC
Created attachment 160767 [details]
ODT file with erroneous count example text
Comment 7 Philippe 2020-05-13 16:51:13 UTC
I added an ODT file for which the French count is 4 words instead of 2, a screenshot of the erroneous count and, for contrast, a screenshot of the correct count for the English text (showing that quotation marks should not be counted as words).

I am using 6.4.2.2 on Ubuntu 20.04.

Sorry if I was unclear about the inclusive writing. LO counts it perfectly well; I was only trying to say that what seemed to me the obvious solution could break it.
Comment 8 Dieter 2020-05-17 06:43:33 UTC
(In reply to Philippe from comment #4)
> Created attachment 160765 [details]
> Screenshot of the erroneous count for French

As far as I can see, the word count follows the rules of LO: "In general, every string of characters between two spaces is a word." [1].

So I wouldn't consider this a bug, but an enhancement request. I assume that the "white spaces" are non-breaking spaces, right? Could you change the bug summary so that it makes your idea clearer? Thank you.


[1] https://help.libreoffice.org/7.0/en-GB/text/swriter/01/06040000.html?&DbPAR=WRITER&System=WIN
Comment 9 Francoise 2020-05-17 16:27:41 UTC
(In reply to Philippe from comment #7)
Hello

As a French user of Open Office, I totally agree with you, Philippe: the Word Count feature does not give the expected result for French.

Word counting looks like a trivial task, but French has some rules.
For example, all symbols that are not letters or numbers count for nothing.
This is the case for punctuation marks (comma, period, colon, semicolon, exclamation mark, question mark), hyphens, dashes, apostrophes, quotation marks, parentheses, and square brackets.

There is a web application that takes these rules into account: https://www.combiendemots.com/
The website claims to be more accurate than Word, and therefore than OpenOffice, since the two give the same results.

Enhancing the Word Count feature could be an interesting selling point, because several administrative and educational procedures require precise word counts and may have a big impact on the user's life.

So it could be a great enhancement for OpenOffice to offer a tool accurate enough that French users no longer need to copy and paste their text into another tool.
Comment 10 Dieter 2020-05-18 09:53:19 UTC
So I've changed the bug summary.
Comment 11 Dieter 2020-12-01 08:11:03 UTC
Sophie, in comment 1 you've asked for a document. Is it possible for you to have a look at the document from comment 6?
Comment 12 Jean-Baptiste Faure 2021-08-12 17:59:10 UTC
I confirm that the word count in the test file is clearly wrong. A string without any alphanumeric character can't be considered a word.

For me it is a bug not an enhancement.

Set status to NEW.

Best regards. JBF
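
The criterion stated in comment 12 (a string without any alphanumeric character is not a word) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `is_word` and `count_words` are hypothetical names, the sample text is made up and is not the content of the attached ODT file, and nothing here reflects LibreOffice's own code.

```python
def is_word(token: str) -> bool:
    """A token counts as a word only if it has at least one letter or digit."""
    return any(ch.isalnum() for ch in token)


def count_words(text: str) -> int:
    """Count whitespace-separated tokens that pass the is_word test."""
    return sum(1 for token in text.split() if is_word(token))


sample = "« Bonjour monde »"  # hypothetical text, not the attachment's content
print(len(sample.split()))    # 4: every token between spaces is counted
print(count_words(sample))    # 2: the two guillemets are skipped
```

Because the test relies on Unicode character properties rather than a fixed list of marks, it needs no language-specific table, and forms such as "développeur·se" still count as one word.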
Comment 13 QA Administrators 2023-08-13 03:20:11 UTC
Dear Philippe,

To make sure we're focusing on the bugs that affect our users today, LibreOffice QA is asking bug reporters and confirmers to retest open, confirmed bugs which have not been touched for over a year.

There have been thousands of bug fixes and commits since anyone checked on this bug report. During that time, it's possible that the bug has been fixed, or the details of the problem have changed. We'd really appreciate your help in getting confirmation that the bug is still present.

If you have time, please do the following:

Test to see if the bug is still present with the latest version of LibreOffice from https://www.libreoffice.org/download/

If the bug is present, please leave a comment that includes the information from Help - About LibreOffice.
 
If the bug is NOT present, please set the bug's Status field to RESOLVED-WORKSFORME and leave a comment that includes the information from Help - About LibreOffice.

Please DO NOT

Update the version field
Reply via email (please reply directly on the bug tracker)
Set the bug's Status field to RESOLVED - FIXED (this status has a particular meaning that is not appropriate in this case)


If you want to do more to help you can test to see if your issue is a REGRESSION. To do so:
1. Download and install oldest version of LibreOffice (usually 3.3 unless your bug pertains to a feature added after 3.3) from https://downloadarchive.documentfoundation.org/libreoffice/old/

2. Test your bug
3. Leave a comment with your results.
4a. If the bug was present with 3.3 - set version to 'inherited from OOo';
4b. If the bug was not present in 3.3 - add 'regression' to keyword


Feel free to come ask questions or to say hello in our QA chat: https://web.libera.chat/?settings=#libreoffice-qa

Thank you for helping us make LibreOffice even better for everyone!

Warm Regards,
QA Team

MassPing-UntouchedBug