Bug 53590 - incorrect word count on odts derived from non-odt documents
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
(earliest affected) release
Hardware: All All
Importance: medium normal
Assignee: Caolán McNamara
Whiteboard: target:3.7.0
Depends on:
Blocks: Word-Count
Reported: 2012-08-16 13:03 UTC by Guy Voets
Modified: 2016-10-24 21:20 UTC (History)


Writer file, originally .doc, with erroneous word count (14.39 KB, application/vnd.oasis.opendocument.text)
2012-08-16 20:20 UTC, Guy Voets
another Writer file (originally .docx) with erroneous count (9.95 KB, application/vnd.oasis.opendocument.text)
2012-08-16 20:30 UTC, Guy Voets

Description Guy Voets 2012-08-16 13:03:46 UTC
Word count of entire document, as well as word count of selected text, are incorrect. Tested on different documents.
-> Documents originated as .odt seem to have the word count correct!
Same erroneous results in status bar as in top menu (Tools > Word count).
Results are erratic: sometimes a paragraph gets an exact count, but most of the time the reported count is 75% or less of the actual number of words in the entire text or selection.

Tested on iMac with Mac OSX 10.8, confirmed on Windows 7
(see https://mail.google.com/mail/?nsr=1&shva=1#inbox/1391c2d245ba246b)
Comment 1 Michael Meeks 2012-08-16 15:13:57 UTC
Can you provide the smallest sample document you can generate that shows an incorrect word count, and attach it to this bug in .doc format (or whatever)?

We really need something we can just load and see the issue in - that'd be most helpful.

Thanks !
Comment 2 Guy Voets 2012-08-16 20:20:58 UTC
Created attachment 65660 [details]
Writer file, originally .doc, with erroneous word count

total count = 183
first paragraph = 215
second chapter = 89 + 48 + 83
Comment 3 Guy Voets 2012-08-16 20:29:16 UTC
wordcount2.odt: Another .odt with erroneous word count.
Originally from a .doc
It seems this bug may be a duplicate of 'non-breaking space corrupts word count' (bug 53399): if all non-breaking spaces are deleted, the word count seems to be OK.
Comment 4 Guy Voets 2012-08-16 20:30:31 UTC
Created attachment 65661 [details]
another Writer file (originally .docx) with erroneous count

possibly bug is duplicate of 53399
Comment 5 Michael Meeks 2012-09-07 13:23:42 UTC
This bug is unrelated to 53399. We still get this wrong, at least on load; as soon as I start to select text and/or edit the document, the count is right. I'm testing a master build from:

commit 346cf4ee5d2f82b59900de1f71160c0d90ffab41
Author: Caolán McNamara <caolanm@redhat.com>
Date:   Fri Sep 7 12:39:02 2012 +0100

Any thoughts, Caolan? It seems a tad odd :-)
Comment 6 Caolán McNamara 2012-09-07 14:06:22 UTC
These documents display a wrong word count because that's the word count stored in their meta.xml, which gets loaded and used until the document is dirtied, at which point it gets recalculated.
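
To check what a given .odt actually carries, the stored statistic can be read straight out of the package. The following is a minimal sketch, not LibreOffice code: it assumes the file is a plain (unencrypted) ODF zip package whose meta.xml holds a meta:document-statistic element with a meta:word-count attribute, as defined by the OpenDocument format. The function name stored_word_count is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# ODF metadata namespace (from the OpenDocument specification).
META_NS = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0"


def stored_word_count(odt):
    """Return the word count saved in an ODT's meta.xml, or None if absent.

    `odt` may be a path or a binary file-like object. The value returned is
    whatever the writing application stored; it is not recomputed here.
    """
    with zipfile.ZipFile(odt) as package:
        root = ET.fromstring(package.read("meta.xml"))
    stat = root.find(f".//{{{META_NS}}}document-statistic")
    if stat is None:
        return None
    value = stat.get(f"{{{META_NS}}}word-count")
    return int(value) if value is not None else None
```

Comparing this number with the count shown right after loading, and again after editing the document, should show whether the displayed value is the stored one or a freshly calculated one.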

Seeing as LibreOffice wrote these .odt files, presumably the word count was wrong after loading the original .doc/.docx. What I'd really like is the original .doc and .docx, to make sure we now have the right word count after loading the original format.

Either way, I guess we'll probably have to mark loaded SwDocStats as suspect and recalculate them on first request.
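
The "mark as suspect, recalculate on first request" idea can be sketched roughly as follows. This is an illustration only, not the actual patch: the DocStat and Document classes are hypothetical stand-ins (only the name SwDocStat comes from the comment above), and the whitespace-based counter is a naive placeholder for Writer's real word-boundary logic.

```python
from dataclasses import dataclass


@dataclass
class DocStat:
    """Hypothetical stand-in for a cached document statistic."""
    word_count: int
    suspect: bool = False  # True when the value was loaded from file metadata


class Document:
    """Toy document that distrusts statistics loaded from a file."""

    def __init__(self, text, stored_word_count=None):
        self.text = text
        if stored_word_count is None:
            # Nothing loaded: compute a trustworthy value right away.
            self._stat = DocStat(self._count_words(), suspect=False)
        else:
            # Loaded from metadata: keep the value but flag it as suspect.
            self._stat = DocStat(stored_word_count, suspect=True)

    def _count_words(self):
        # Naive placeholder: splits on whitespace only.
        return len(self.text.split())

    def word_count(self):
        """Return the word count, recalculating once if the cache is suspect."""
        if self._stat.suspect:
            self._stat = DocStat(self._count_words(), suspect=False)
        return self._stat.word_count
```

With this shape, a wrong count stored in the file is never shown: the first request triggers a recalculation, and later requests reuse the fresh value until the document changes.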
Comment 7 Not Assigned 2012-09-07 15:50:15 UTC
Caolan McNamara committed a patch related to this issue.
It has been pushed to "master":


Resolves: fdo#53590 you can trust no one to tell you the truth

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
Affected users are encouraged to test the fix and report feedback.