Bug 74895 - Wrong character count as compared to MSO2007 and other LO versions
Summary: Wrong character count as compared to MSO2007 and other LO versions
Status: RESOLVED INVALID
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 3.5.4 release
Hardware: Other Linux (All)
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-02-12 18:04 UTC by e324182
Modified: 2015-04-01 14:51 UTC
CC List: 1 user

See Also:
Crash report or crash signature:


Attachments
An archive with test documents and statistics (83.96 KB, application/x-7z-compressed)
2014-02-12 18:04 UTC, e324182

Description e324182 2014-02-12 18:04:43 UTC
Created attachment 93951
An archive with test documents and statistics

Different editors (Microsoft Office 2007 and different LibreOffice versions) can report different character counts (with spaces); see "stat_en" in the attachment. "stat_en" lists the statistics that MSO2007, LO 3.5 and LO 4.0 report for my test file (test.odt).

test.doc was converted from test.odt with LO 3.5. As you can see, the character count differs between the two formats for some reason, even though the text content is the same. test.odt is a document from my work in which every character has been replaced with "a". For a real work document, which I cannot submit, the character count differs even between LibreOffice versions.

Moreover, the character count can change if I change the document properties (for example, set an author) or insert comments (this bug seems to be fixed in later LO versions).

Comparing the statistics from MSO2007 and LO, the difference is largest for the entire doc/odt document. The attachment also contains test_fragm.odt and test_fragm.doc, which are copies of test.odt with the top page and the headers removed. For these files the difference in character count is smaller, but still present.

test_lo.pdf and test_word.pdf show how these files are rendered in LO 3.5 and MSO2007, respectively. For some reason MSO2007 moves some text to the next page (which probably needs a separate bug report). I use Liberation Serif instead of Times New Roman (just a font substitution), but the two fonts are metrically compatible, and my printed copy does not have such empty space.

The character count is a VERY important issue for translators, who rely on the character count rather than the word count. Please note that this bug seems to affect not only 3.5 but other versions as well.
Comment 1 Yousuf Philips (jay) (retired) 2014-07-07 02:45:24 UTC
Hello e324182,

I checked the stats in Microsoft Office 2007 for the doc file and it shows:

Words                    :    65
Characters (no spaces)   : 9,469
Characters (with spaces) : 9,484

when you have the 'include textboxes, footnotes and endnotes' checkbox ticked. While in LibreOffice 4.2.5 it shows:

Words                    :    70
Characters (no spaces)   : 9,537
Characters (with spaces) : 9,521

From these two results, it's easy to see that since the two applications report different word counts, the character counts are likely to differ as well. If I press Ctrl+A to select the entire document body, which excludes the footnotes, etc., LibreOffice 4.2.5 shows:

Words                    :    51
Characters (no spaces)   : 6,234
Characters (with spaces) : 6,232

and MS Word, with footnotes, etc. excluded, shows:

Words                    :    51
Characters (no spaces)   : 6,232
Characters (with spaces) : 6,233

So there isn't really much difference when the two are compared on the same basis. If you would like the word count to also show figures that exclude footnotes, etc., we can turn this bug into an enhancement request for such a feature.
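To put numbers on the comparison above, here is a minimal Python sketch that takes the figures exactly as posted in this comment and computes the LibreOffice-minus-Word gap for each statistic, both absolute and as a percentage of the Word figure. The dictionary names and helper functions are hypothetical, for illustration only. It also flags rows whose "no spaces" count exceeds their "with spaces" count, which should not normally happen because the latter includes the spaces; both LibreOffice rows do, so those two figures may have been transposed when they were copied into the comment (an assumption, not something the report confirms).

```python
"""Compare the statistics quoted above (MSO2007 vs LibreOffice 4.2.5)."""

# Figures copied as posted; the names below are hypothetical, for illustration.
WHOLE_DOCUMENT = {  # textboxes, footnotes and endnotes included
    "MSO2007": {"words": 65, "chars_no_spaces": 9469, "chars_with_spaces": 9484},
    "LO 4.2.5": {"words": 70, "chars_no_spaces": 9537, "chars_with_spaces": 9521},
}
MAIN_TEXT_ONLY = {  # footnotes, etc. excluded
    "MSO2007": {"words": 51, "chars_no_spaces": 6232, "chars_with_spaces": 6233},
    "LO 4.2.5": {"words": 51, "chars_no_spaces": 6234, "chars_with_spaces": 6232},
}


def differences(stats):
    """Return LibreOffice minus MSO2007 for each statistic."""
    mso, lo = stats["MSO2007"], stats["LO 4.2.5"]
    return {key: lo[key] - mso[key] for key in mso}


def percent_differences(stats):
    """Return each difference as a percentage of the MSO2007 figure."""
    mso = stats["MSO2007"]
    return {key: 100.0 * diff / mso[key] for key, diff in differences(stats).items()}


def implausible_rows(stats):
    """List applications whose 'no spaces' count exceeds their 'with spaces' count."""
    return [app for app, row in stats.items()
            if row["chars_no_spaces"] > row["chars_with_spaces"]]


if __name__ == "__main__":
    for label, stats in (("whole document", WHOLE_DOCUMENT),
                         ("main text only", MAIN_TEXT_ONLY)):
        print(label, differences(stats))
        print(label, {k: f"{v:+.2f}%" for k, v in percent_differences(stats).items()})
        print(label, "rows to double-check:", implausible_rows(stats))
```

Run as a script, this shows that for the whole document LibreOffice reports 5 more words, 68 more characters without spaces and 37 more with spaces, while for the main text alone the gaps shrink to 0, 2 and -1, which is the point made above.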
Comment 2 QA Administrators 2015-02-19 04:33:35 UTC
Dear Bug Submitter,

This bug has been in NEEDINFO status with no change for at least 6 months. Please provide the requested information as soon as possible and mark the bug as UNCONFIRMED. Due to regular bug tracker maintenance, if the bug is still in NEEDINFO status with no change in 30 days the QA team will close the bug as INVALID due to lack of needed information.

For more information about our NEEDINFO policy please read the wiki located here: 
https://wiki.documentfoundation.org/QA/FDO/NEEDINFO

If you have already provided the requested information, please mark the bug as UNCONFIRMED so that the QA team knows that the bug is ready to be confirmed.


Thank you for helping us make LibreOffice even better for everyone!


Warm Regards,
QA Team

Message generated on: 2015-02-18
Comment 3 QA Administrators 2015-04-01 14:51:39 UTC
Dear Bug Submitter,

Please read this message in its entirety before proceeding.

Your bug report is being closed as INVALID due to inactivity and
a lack of information which is needed in order to accurately
reproduce and confirm the problem. We encourage you to retest
your bug against the latest release. If the issue is still
present in the latest stable release, we need the following
information (please ignore any that you've already provided):

a) Provide details of your system including your operating
   system and the latest version of LibreOffice that you have
   confirmed the bug to be present

b) Provide easy to reproduce steps – the simpler the better

c) Provide any test case(s) which will help us confirm the problem

d) Provide screenshots of the problem if you think it might help

e) Read all comments and provide any requested information

Once all of this is done, please set the bug back to UNCONFIRMED
and we will attempt to reproduce the issue. Please do not:

a) respond via email 

b) update the version field in the bug or any of the other details
   on the top section of our bug tracker

-- The LibreOffice QA Team
This NEEDINFO message was generated on: 2015-04-01

Warm Regards,
QA Team