Description Philippe 2020-03-25 08:20:49 UTC
Word count is wrong for the French language (at least): it counts many more words than there are. I gather that Writer counts words by counting the runs of whitespace that separate them.
French typographic rules (and, hence, normal use) call for certain common punctuation (most notably quote marks, semicolons, colons, question and exclamation marks, and dashes) to be separated from words by white space. This wrongly inflates the word count.

Such punctuation should not be counted as words.

So, for the French language at least, the number of punctuation marks surrounded by white space should be subtracted from the current word count. (Adding these punctuation marks as white space to the counting regex (I guess?) would also break the word count for gender-neutral writing, with words such as "développeur·se", "développeur(se)", "développeur/se" or ""développeu-se".)
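
To make the two options above concrete, here is a minimal Python sketch. It is not Writer's actual counting code, and the function names and sample strings are hypothetical, chosen only for illustration. The first counter splits on whitespace and then subtracts tokens that contain no letter or digit; the second treats punctuation as a separator, which is the approach that would split gender-neutral forms.

```python
import re


def whitespace_tokens(text: str) -> list[str]:
    # str.split() without arguments splits on any Unicode whitespace,
    # including no-break space (U+00A0) and narrow no-break space (U+202F).
    return text.split()


def is_punctuation_only(token: str) -> bool:
    # True when the token contains no letter or digit at all.
    return not any(ch.isalnum() for ch in token)


def count_words(text: str) -> int:
    # Whitespace-based count minus the standalone punctuation tokens.
    tokens = whitespace_tokens(text)
    return len(tokens) - sum(is_punctuation_only(t) for t in tokens)


def count_words_punct_as_separator(text: str) -> int:
    # Alternative: every run of letters/digits is a word, so punctuation
    # inside a token also acts as a separator.
    return len(re.findall(r"\w+", text))


# Hypothetical sample: French quote marks separated by no-break spaces.
quoted = "\u00ab\u00a0Bonjour monde\u00a0\u00bb"
print(len(whitespace_tokens(quoted)))  # whitespace-only count
print(count_words(quoted))             # count without standalone punctuation

# Gender-neutral forms from the description.
for word in ["développeur·se", "développeur(se)", "développeur/se"]:
    print(word, count_words(word), count_words_punct_as_separator(word))
```

With the subtraction approach the quoted sample goes from four tokens to two words and each gender-neutral form stays a single word, whereas the separator approach counts each of those forms as two.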
Comment 1 sophie 2020-05-13 12:11:41 UTC
Hi, could you provide a document where the count is wrong, and tell us which version and operating system you are using?
For information, if I cut and paste "développeur·se", "développeur(se)", "développeur/se" or ""développeu-se" in Writer I get 5 words, 73 char. including spaces, 69 excluding spaces which is the right number of words. So I can't reproduce your issue in version or 7.0 alpha.
Comment 2 Xisco Faulí 2020-05-13 12:18:36 UTC
Setting to NEEDINFO meanwhile
Comment 3 Julien Nabet 2020-05-13 12:22:41 UTC
Apart from the fact that I'm strongly against gender-neutral writing because it's ridiculous and makes text hard to read, I don't know anything about the word counting process.
I can't help here=>uncc myself.
Comment 4 Philippe 2020-05-13 16:43:12 UTC
Created attachment 160765 [details]
Screenshot of the erroneous count for French
Comment 5 Philippe 2020-05-13 16:43:47 UTC
Created attachment 160766 [details]
Screenshot of the correct count for English
Comment 6 Philippe 2020-05-13 16:44:37 UTC
Created attachment 160767 [details]
ODT file with erroneous count example text
Comment 7 Philippe 2020-05-13 16:51:13 UTC
I added an ODT file in which the French count is 4 words instead of 2, a screenshot of the erroneous count and, for contrast, a screenshot of the correct count for the English text (showing that quotation marks should not be counted as words).

I am using on Ubuntu 20.4.

Sorry if I was unclear about the inclusive writing: LO counts it correctly. I was only trying to say that what seemed to me the obvious solution could break it.
Comment 8 Dieter 2020-05-17 06:43:33 UTC
(In reply to Philippe from comment #4)
> Created attachment 160765 [details]
> Screenshot of the erroneous count for French

As far as I can see the word count follows the rules of LO: "In general, every string of characters between two spaces is a word." [1].

So I wouldn't consider this a bug, but an enhancement request. I assume that the "white spaces" are non-breaking spaces, right? Could you change the bug summary so that it makes your idea clearer? Thank you.

[1] https://help.libreoffice.org/7.0/en-GB/text/swriter/01/06040000.html?&DbPAR=WRITER&System=WIN
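
To check which kind of space actually surrounds the punctuation in a given text, a small Python sketch like the following could help. It is only an illustration: the helper name and the sample string are hypothetical, and it says nothing about how Writer itself classifies spaces. It lists the whitespace code points in a string and compares a split on the plain space (U+0020) only with a split on any Unicode whitespace.

```python
import unicodedata


def describe_whitespace(text: str) -> list[tuple[str, str]]:
    # Return (code point, Unicode name) for each whitespace character.
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if ch.isspace()
    ]


# Hypothetical sample: no-break space before "!", narrow no-break space before "?".
sample = "Bonjour\u00a0! Ça va\u202f?"

for code, name in describe_whitespace(sample):
    print(code, name)

# Only the ordinary space separates tokens.
print(len(sample.split(" ")))
# Any Unicode whitespace separates tokens.
print(len(sample.split()))
```

For this sample the first split yields three tokens and the second yields five, so whether non-breaking spaces count as separators directly changes the result.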
Comment 9 Francoise 2020-05-17 16:27:41 UTC
(In reply to Philippe from comment #7)

As a French user of Open Office, I totally agree with you, Philippe: the Word Count feature does not give the expected result for the French language.

Word counting looks like a trivial task, but there are some rules in French.
For example, all symbols that are not letters or numbers count for nothing.
This is the case with punctuation marks (comma, period, colon, semicolon, exclamation mark, question mark), hyphens, dashes, apostrophes, quotation marks, parentheses, and square brackets.

There is a web application that takes into account these rules: https://www.combiendemots.com/
The site explicitly claims to be more accurate than Word, and therefore than OpenOffice, since the two give the same results.

Enhancing the Word Count feature could be an interesting marketing point, because several administrative or educational procedures require precise word counting and may have a big impact on users' lives.

So it could be a great enhancement for OpenOffice to offer a tool accurate enough that French users no longer need to copy and paste their text into another tool.
Comment 10 Dieter 2020-05-18 09:53:19 UTC
So I've changed the bug summary.
Comment 11 Dieter 2020-12-01 08:11:03 UTC
Sophie, in comment 1 you've asked for a document. Is it possible for you to have a look at the document from comment 6?
Comment 12 Jean-Baptiste Faure 2021-08-12 17:59:10 UTC
I confirm that the word count in the test file is clearly wrong. A string without any alphanumeric character can't be considered a word.

For me it is a bug, not an enhancement.

Set status to NEW.

Best regards. JBF
