Bug 55359 - Word count gives wrong results
Summary: Word count gives wrong results
Status: RESOLVED DUPLICATE of bug 99189
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 3.6.1.2 release
Hardware: x86 (IA32) Windows (All)
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Word-Count
Reported: 2012-09-26 13:47 UTC by Urmas
Modified: 2020-03-23 09:51 UTC
CC List: 8 users

See Also:
Crash report or crash signature:


Attachments
Example (1.91 KB, text/rtf)
2012-09-26 13:47 UTC, Urmas
Another Example (13.12 KB, application/vnd.oasis.opendocument.text)
2012-10-07 20:10 UTC, Larry Tate
ODT with same contents as 68225/"Another example", but different word count (9.34 KB, application/vnd.oasis.opendocument.text)
2012-10-11 22:58 UTC, stfhell

Description Urmas 2012-09-26 13:47:18 UTC
Created attachment 67729 [details]
Example

The correct answer is 5.

Word counting in documents is not a joke and needs to work correctly.
Comment 1 Julien Nabet 2012-09-26 19:35:45 UTC
On pc Debian x86-64 with 3.6 branch updated today, I reproduced this behaviour.

I'll give a try with master sources asap.
Comment 2 Julien Nabet 2012-09-26 21:09:33 UTC Comment hidden (obsolete)
Comment 3 Larry Tate 2012-10-07 20:10:00 UTC
Created attachment 68225 [details]
Another Example
Comment 4 Larry Tate 2012-10-07 20:11:56 UTC
The problem also occurs with the parenthetical citations in the MLA citation format. MLA uses page numbers in parentheses for citations. For some reason, the citation (89) is treated as three words!

{See attached example}
Comment 5 stfhell 2012-10-11 22:54:22 UTC
No, word counting is not a joke, but it's not an exact science either. Whatever way Writer counts words, there will always be somebody who disagrees with it, because "word" is not a simple and easy concept for complex documents. (Is a number a word? If yes, are "10.5" and "10,5" 2 words? Is "off-duty" 1 or 2 words, and what about dates like "1/1/2010"? Should headers and footers be included in the count?)
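
To make the ambiguity concrete, here is a minimal Python sketch. It is not Writer's algorithm: both counting rules and their function names are made up for illustration. It counts the example strings from the questions above under two plausible definitions of "word".

```python
import re


def count_whitespace(text: str) -> int:
    """Rule A (hypothetical): a word is any run of non-whitespace characters."""
    return len(text.split())


def count_alnum_runs(text: str) -> int:
    """Rule B (hypothetical): a word is any run of letters or digits,
    so punctuation such as '.', ',', '-' and '/' separates words."""
    return len(re.findall(r"[^\W_]+", text))


samples = ["10.5", "10,5", "off-duty", "1/1/2010"]
for s in samples:
    print(f"{s!r}: rule A = {count_whitespace(s)}, rule B = {count_alnum_runs(s)}")
```

Rule A reports 1 word for every sample, while rule B reports 2, 2, 2 and 3. Neither is wrong; they simply answer the question "what is a word?" differently.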

Word count is something that has been discussed at length; see, for example, these OpenOffice bugs:
80815 (Word count differs from MS Word):
https://issues.apache.org/ooo/show_bug.cgi?id=80815
86537 (word count should display count excluding footnotes):
https://issues.apache.org/ooo/show_bug.cgi?id=86537
102135 (Document the rules used to count words in a document):
https://issues.apache.org/ooo/show_bug.cgi?id=102135

Back then (2009) the developers aimed at having a word count that is compatible with MS Word, not because Word did a particularly good job at that, but because it defines a de facto standard. I think this is still true (judging from the patches made to the word count algorithm for Bug 46757), and Word/Writer compatibility here does make some sense.

Re attachment 67729 [details]: As far as I am concerned, the correct answer is 7: footnotes are part of a document and should be included in the word count because they appear on the page; they are not metadata. (One could argue that counting the 2 footnote characters as 1 word is a bit inconsistent.) Including footnotes/endnotes in the count is not a bug. One should be able to exclude them (as in Word) if an organisation measures text in words _ex_ footnotes, but there are also people who need them in the word count (see OpenOffice-Bug 86537/comments 2+4) and expect LO not to ignore them just like that. Changing this behaviour (having options like in Word) would be an enhancement, not a bug fix.
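
The kind of option meant here can be sketched in a few lines of Python. This is an assumption-laden toy, not Writer code: the `Document` class, the `include_footnotes` parameter and the sample text are all invented for illustration, and the sample text is placeholder content, not the contents of attachment 67729.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Document:
    """Toy document model (hypothetical, not Writer's internal model)."""
    body: str
    footnotes: List[str] = field(default_factory=list)


def word_count(doc: Document, include_footnotes: bool = True) -> int:
    """Count whitespace-separated words, optionally including footnotes."""
    count = len(doc.body.split())
    if include_footnotes:
        count += sum(len(note.split()) for note in doc.footnotes)
    return count


# Placeholder text: five words in the body, two in a footnote.
doc = Document(body="one two three four five", footnotes=["see above"])
print(word_count(doc))                           # 7
print(word_count(doc, include_footnotes=False))  # 5
```

With such a switch, both camps get the number they expect from the same document, which is why this reads as an enhancement request rather than a bug.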

Re attachment 68225 [details]: Writer does not count the brackets as words. If you delete " (89)" from your document, it still has 7 words (and 42 characters!), 2 more than you can actually see. The problem is: Where does Writer see these 2 words? Writer also counts 1 line, but 3 paragraphs. This looks like a bug, but it has nothing to do with the parentheses.
Comment 6 stfhell 2012-10-11 22:58:10 UTC
Created attachment 68463 [details]
ODT with same contents as 68225/"Another example", but different word count

ODT created with LO 3.5.4.2. It has the same text contents as wordcount.odt (the attachment from comment 3), but the correct word count.
Comment 7 Simo Kaupinmäki 2012-10-14 22:35:38 UTC
*** Bug 55586 has been marked as a duplicate of this bug. ***
Comment 8 tommy27 2014-10-24 21:47:13 UTC
(In reply to Urmas from comment #0)
> Created attachment 67729 [details]
> Example
> 
> The correct answer is 5.
> 
> Word counting in documents is not a joke and needs to work correctly.

still reproducible with LibO 4.3.2.2 and 4.4.0.0.alpha1+
Build ID: 6ba8b7f5eacac969e4781d63718083a05491b1bc
TinderBox: Win-x86@42, Branch:master, Time: 2014-10-24_02:23:51
Comment 9 f5d505f9 2015-12-19 09:14:38 UTC
Reproducible under LO 5.0.4.2  (x64)
Build ID: 2b9802c1994aa0b7dc6079e128979269cf95bc78
Locale: nl-BE (nl_BE)
on Windows 10.
Comment 10 QA Administrators 2017-01-03 19:48:45 UTC Comment hidden (obsolete)
Comment 11 tommy27 2017-01-05 23:22:08 UTC
still reproducible under Win8.1 x64 in LibO 5.2.4.2 and a recent 5.4.0.0 daily build
Comment 12 András Novoszáth 2017-06-10 10:54:54 UTC
I have the same problem with version 5.1.6.2 under Ubuntu 16.04 LTS
Comment 13 Cheryl 2018-01-13 05:34:02 UTC
I have the same problem: 9 words written and 14 counted in LO 5.4.3.2.

It seems to be counting spaces and carriage returns as words. That's the only thing I can think of for the discrepancy on a blank page that only contains 9 words. There are no headers, footers, or footnotes or notes to count.
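
For illustration only, this is how a counter could end up treating spaces and line breaks as words. It is a Python sketch of the hypothesis above, with made-up function names and sample text; nothing here confirms that Writer actually works this way.

```python
import re


def naive_count(text: str) -> int:
    """Hypothetical buggy counter: splits at every single space or newline,
    so consecutive or trailing separators produce empty 'words'."""
    return len(re.split(r"[ \n]", text))


def robust_count(text: str) -> int:
    """Splits on runs of whitespace and ignores leading/trailing whitespace."""
    return len(text.split())


# Three visible words, with a double space, a blank line and a trailing space.
sample = "one  two\n\nthree "
print(naive_count(sample))   # 6
print(robust_count(sample))  # 3
```

If something like the naive variant were at work, the count would grow with every extra space or empty paragraph, which would match a blank-looking page reporting more words than were typed.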
Comment 14 Julien Nabet 2018-01-13 17:02:24 UTC Comment hidden (obsolete)
Comment 15 QA Administrators 2019-01-14 03:52:08 UTC Comment hidden (obsolete)
Comment 16 Cheryl 2019-01-15 03:07:46 UTC
I don't have a problem with the word count on the LibreOffice 6.0.4.2 release. Some idiot set up my computer so that I can't download .exe files for earlier releases and open them, and I don't know how to fix it.
Comment 17 Julien Nabet 2019-01-15 14:05:35 UTC
Cheryl: field version must correspond to "earliest affected" as indicated so please don't put a more recent version here.

Just for the record, I could reproduce this on 6.1.4.2 on Win7 with the attached file and on a brand new odt.


Now I agree with stfhell's comment 5: after all, why not count words in footnotes? Both views are understandable. If it's just to do like Word, I don't think that's sufficient reason, but I know that some people are ready to mimic MS Office to the point of sometimes reproducing the same bugs.
Just to avoid some confusion, when I'm talking about bugs, I don't have word counting in mind.
Of course, I suppose we may add another option in UI with default value corresponding to Word.

Michael: as a Writer expert, I thought you might have some opinion here.
Comment 18 Timur 2020-03-23 09:47:38 UTC
This bug and bug 99189 look the same.
Although we normally mark the later bug as the duplicate, and it was a mistake to confirm without searching, I'll mark this one instead, because it has a poor description ("look into document") and some discussion that's beside the point of the bug.
The other bug makes it clear: "add option...". That's the only way forward, since there is no exact solution and no agreement on what to count.

*** This bug has been marked as a duplicate of bug 99189 ***
Comment 19 Timur 2020-03-23 09:51:00 UTC
Note to CC users: please subscribe to bug 99189, because LO has a policy of raising the importance of an enhancement based on its duplicates and users.
(Not that I agree, and not that it will speed up the fix, but that's how it goes.)