Bug 46757 - Words and Character excluding spaces Word Count incorrect with Record Changes enabled
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 3.5.0 Beta3
Hardware: All All
Importance: medium normal
Assignee: Muhammad Haggag
QA Contact:
URL:
Whiteboard: BSA target:3.7.0 target:3.6.0.0.beta2
Keywords: regression
Duplicates: 48072 50590
Depends on:
Blocks:
 
Reported: 2012-02-29 02:37 UTC by mogliii
Modified: 2012-08-03 10:00 UTC
CC List: 6 users

See Also:


Attachments
Screenshot of text showing also the Word Count window (27.35 KB, image/png)
2012-02-29 02:37 UTC, mogliii
Proposed patch. (1.31 KB, patch)
2012-06-12 02:25 UTC, Muhammad Haggag
Updated patch. (1.92 KB, patch)
2012-06-13 07:50 UTC, Muhammad Haggag

Description mogliii 2012-02-29 02:37:50 UTC
Created attachment 57796
Screenshot of text showing also the Word Count window

Problem description: 

Steps to reproduce:
1. Open new writer document and paste the following text:
"Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus eu ligula et arcu dapibus viverra ac ut elit. Proin rhoncus sapien et velit cursus ac molestie justo malesuada. Aliquam pretium, orci nec malesuada laoreet, nisl nisi tristique dui, vitae rutrum ipsum libero sit amet nunc."

2. Open Tools -> Word Count, you will see
Words: 45
Characters: 289
Characters excluding spaces: 245

3. Activate tracking of changes 
Edit -> Changes -> Record

4. Mark everything from "Proin" until "nunc." and delete.

5. Word Count now shows
Words: 45
Characters: 57
Characters excluding spaces: 245

Current behavior:
Deleting text while changes are being recorded only affects the "Characters" figure in Word Count.

Expected behavior:
Tracked deletions should affect either all three counts or none of them.
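The expected behavior amounts to deriving all three counters from the same visible text. The following Python sketch is a simplified model, not Writer's counting code: it treats words as whitespace-separated tokens (Writer uses a locale-aware break iterator), and it represents tracked deletions as hypothetical `(start, end)` character offsets. On the sample text it reproduces the step 2 figures before the deletion and gives mutually consistent figures after it.

```python
def visible_text(text, deleted):
    """Return text with every [start, end) tracked-deletion range removed."""
    parts, pos = [], 0
    for start, end in sorted(deleted):
        parts.append(text[pos:start])
        pos = max(pos, end)  # tolerate overlapping ranges
    parts.append(text[pos:])
    return "".join(parts)


def word_count_stats(text, deleted=()):
    """(words, characters, characters excluding spaces) of the visible text."""
    shown = visible_text(text, deleted)
    words = len(shown.split())  # whitespace tokens, not a real break iterator
    chars = len(shown)
    chars_no_spaces = sum(1 for c in shown if not c.isspace())
    return words, chars, chars_no_spaces


TEXT = ("Lorem ipsum dolor sit amet, consectetur adipiscing elit. "
        "Phasellus eu ligula et arcu dapibus viverra ac ut elit. "
        "Proin rhoncus sapien et velit cursus ac molestie justo malesuada. "
        "Aliquam pretium, orci nec malesuada laoreet, nisl nisi tristique "
        "dui, vitae rutrum ipsum libero sit amet nunc.")

print(word_count_stats(TEXT))                                      # (45, 289, 245)
print(word_count_stats(TEXT, [(TEXT.index("Proin"), len(TEXT))]))  # (18, 113, 95)
```

The 113 characters left after the deletion match the Characters figure in Comment 2; in the report it is the Words and Characters excluding spaces values that fail to follow.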

Platform (if different from the browser): 
Windows 7 64bit
LibreOffice 3.5.0rc3 
Build ID: 7e68ba2-a744ebf-1f241b7-c506db1-7d53735
Comment 1 Jose Manuel 2012-04-20 12:39:19 UTC
[Not Reproducible] with "LibreOffice 3.3.4 - Ubuntu 11.04 (32bit) Spanish UI"
Comment 2 Christopher M. Penalver 2012-05-06 15:18:24 UTC
1) lsb_release -rd
Description: Ubuntu 12.04 LTS
Release: 12.04

2) apt-cache policy libreoffice-writer
libreoffice-writer:
  Installed: 1:3.5.2-2ubuntu1
  Candidate: 1:3.5.2-2ubuntu1
  Version table:
 *** 1:3.5.2-2ubuntu1 0
        500 http://us.archive.ubuntu.com/ubuntu/ precise/main i386 Packages
        100 /var/lib/dpkg/status

3) What is expected to happen: in a blank Writer document, paste the following text:
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus eu ligula
et arcu dapibus viverra ac ut elit. Proin rhoncus sapien et velit cursus ac
molestie justo malesuada. Aliquam pretium, orci nec malesuada laoreet, nisl
nisi tristique dui, vitae rutrum ipsum libero sit amet nunc.

Activate record changes via Edit -> Changes -> Record, highlight everything from:
Proin

until:
nunc.

delete it, and the Word Count should show the same counts as in this Word 2010 screenshot:
https://bugs.launchpad.net/ubuntu/+source/libreoffice/+bug/981033/+attachment/3134188/+files/word2010.png

4) What happens instead is that it shows:
Words: 45
Characters: 113
Characters excluding spaces: 245
Comment 3 Christopher M. Penalver 2012-05-06 15:20:25 UTC
*** Bug 48072 has been marked as a duplicate of this bug. ***
Comment 4 sasha.libreoffice 2012-05-24 07:11:03 UTC
reproduced in 3.5.3 on Fedora 64 bit
not reproduced in 3.3.4, therefore regression
problem only in Tools->Word count, no problem in File->Properties->Statistics
Comment 5 Muhammad Haggag 2012-06-12 01:07:48 UTC
(In reply to comment #4)
> reproduced in 3.5.3 on Fedora 64 bit
> not reproduced in 3.3.4, therefore regression
> problem only in Tools->Word count, no problem in File->Properties->Statistics

Are you sure file statistics aren't suffering from the same problem? It's showing 45 for me (same as the word count dialog and status bar).

Also, when you save the document, the statistics are updated differently from how the word count dialog does it: saving actually counts characters marked for deletion (286), so the word count dialog shows that same count, since it's seeded from the document statistics. As soon as you start typing, the word count code runs again and the number of characters goes back to 113.

The problem is that character counting masks text marked as deleted, as well as hidden text, by replacing it with spaces, but the rest of the word/character counting code doesn't. It seems intentional, although the reason isn't clear to me. See SwTxtNode::CountWords and its call to lcl_MaskRedlinesAndHiddenText: http://opengrok.libreoffice.org/xref/core/sw/source/core/txtnode/txtedt.cxx#1864
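Masking replaces hidden or deleted characters with spaces instead of removing them, so offsets into the paragraph stay valid while the masked stretch no longer contributes any words. The Python sketch below is a hypothetical model, not the code of SwTxtNode::CountWords or lcl_MaskRedlinesAndHiddenText; `mask_ranges` and both counting functions are illustrative names, and splitting on whitespace stands in for the real break iterator. Applying the mask to the character total only yields the same three figures reported in Comment 2, while applying it to every counter gives consistent results.

```python
MASK = " "


def mask_ranges(text, ranges):
    """Replace every character inside a [start, end) range with MASK.

    The result has the same length as the input, so offsets stay valid.
    Also returns how many distinct positions were masked.
    """
    positions = {i for start, end in ranges for i in range(start, end)}
    masked = "".join(MASK if i in positions else c for i, c in enumerate(text))
    return masked, len(positions)


def counts_mask_chars_only(text, ranges):
    """Only the character total honors the mask (the reported symptom)."""
    _, n_masked = mask_ranges(text, ranges)
    return (len(text.split()),
            len(text) - n_masked,
            sum(1 for c in text if not c.isspace()))


def counts_mask_everything(text, ranges):
    """All three counters work on the masked text."""
    masked, n_masked = mask_ranges(text, ranges)
    return (len(masked.split()),
            len(text) - n_masked,
            sum(1 for c in masked if not c.isspace()))


TEXT = ("Lorem ipsum dolor sit amet, consectetur adipiscing elit. "
        "Phasellus eu ligula et arcu dapibus viverra ac ut elit. "
        "Proin rhoncus sapien et velit cursus ac molestie justo malesuada. "
        "Aliquam pretium, orci nec malesuada laoreet, nisl nisi tristique "
        "dui, vitae rutrum ipsum libero sit amet nunc.")

deleted = [(TEXT.index("Proin"), len(TEXT))]
print(counts_mask_chars_only(TEXT, deleted))   # (45, 113, 245)
print(counts_mask_everything(TEXT, deleted))   # (18, 113, 95)
```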

I tracked the change with 'git blame' to the following commit by John LeMoyne Castle:
http://cgit.freedesktop.org/libreoffice/core/commit/?id=4bd28ba4c6d2af96bb6638b88635598e1bb88e8f

Unfortunately, the commit message doesn't explain why it's doing character masking. A google search for "John LeMoyne Castle character count" leads to fdo#30550: https://bugs.freedesktop.org/show_bug.cgi?id=30550

It looks like the initial work was done by Mattias Johnsson, and John then fixed several bugs. The intent of his commit seems to have been to fix the selection case only; the character masking may have been added by mistake, perhaps as a leftover from another commit.

My recommendation is to remove the masking of deleted characters, since getting it to work properly (and consistently) with both word/character count and document statistics would be a lot of work, for no obvious benefit. If there's demand for such a feature, it should be filed and tracked separately.

I'll be posting a patch shortly to remove the masking and make the behavior consistent.
Comment 6 Muhammad Haggag 2012-06-12 02:25:11 UTC
Created attachment 62923
Proposed patch.

After looking at the code closely, I've changed my recommendation. It's actually straightforward to get the word counting code to consistently ignore deleted content. Patch attached.

One problem that remains with this patch is document statistics. When you save the document, a gross word count is computed (including deleted content), saved with the document, and seeded to the word count dialog. The word count dialog (and status bar) show the incorrect count until you edit the document (insert/delete something), at which point the proper word counting logic is invoked and the count is corrected.
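The stale count described here can be modeled as a cache that the save path fills with a gross count and that any edit refreshes through the regular, deletion-aware path. The class and method names below are hypothetical; this is a Python sketch of the described behavior under those assumptions, not Writer's statistics code.

```python
def visible_text(text, deleted):
    """Return text with every [start, end) tracked-deletion range removed."""
    parts, pos = [], 0
    for start, end in sorted(deleted):
        parts.append(text[pos:start])
        pos = max(pos, end)
    parts.append(text[pos:])
    return "".join(parts)


class CachedWordCount:
    """Hypothetical model of a word count seeded from saved document statistics."""

    def __init__(self):
        self.words = 0

    def on_save(self, text, deleted):
        # Save-time statistics: gross count; `deleted` is deliberately ignored.
        self.words = len(text.split())

    def on_edit(self, text, deleted):
        # Any insert/delete runs the regular counting path, which skips deletions.
        self.words = len(visible_text(text, deleted).split())


text, deleted = "keep this drop that", [(10, 19)]
cache = CachedWordCount()
cache.on_save(text, deleted)
print(cache.words)  # 4: gross count the dialog shows right after saving
cache.on_edit(text, deleted)
print(cache.words)  # 2: corrected once the document is edited
```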

I won't post the patch for review/commit yet, in the hope that I can fix the document statistics issue as well. If it looks complicated, I'll get this committed and pursue the statistics issue separately.
Comment 7 Muhammad Haggag 2012-06-13 07:50:52 UTC
Created attachment 62967
Updated patch.

The reason the document statistics are broken comes down to the following commit: http://cgit.freedesktop.org/libreoffice/core/commit/?id=6af264883910fe31433b4164b1956f4f9ed75ecb

It disables redlining of deleted changes (by removing the flag REDLINE_SHOW_DELETE) when exporting (saving) documents. As a result, SwTxtNode::CountWords counts the deleted changes instead of masking them, since it only masks redlined content.
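The interaction with the export flag can be sketched as masking that is conditional on a display flag. `RedlineMode` below is a hypothetical Python stand-in whose `SHOW_DELETE` member mirrors the REDLINE_SHOW_DELETE flag mentioned above; the numeric values and the `count_words` function are illustrative only. Clearing the flag, as the export path does, makes the same text count its deleted words.

```python
from enum import IntFlag


class RedlineMode(IntFlag):
    """Hypothetical stand-in for Writer's redline display flags."""
    SHOW_INSERT = 1
    SHOW_DELETE = 2


def count_words(text, deleted, mode):
    """Mask deleted ranges only while they are shown as redlines."""
    if mode & RedlineMode.SHOW_DELETE:
        chars = list(text)
        for start, end in deleted:
            chars[start:end] = " " * (end - start)  # same length, offsets intact
        text = "".join(chars)
    return len(text.split())


EDIT_MODE = RedlineMode.SHOW_INSERT | RedlineMode.SHOW_DELETE
EXPORT_MODE = RedlineMode.SHOW_INSERT  # SHOW_DELETE cleared while saving

text, deleted = "keep this drop that", [(10, 19)]
print(count_words(text, deleted, EDIT_MODE))    # 2
print(count_words(text, deleted, EXPORT_MODE))  # 4
```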

It appears that was done as a bug fix. Unfortunately, it was fixing a bug from 2001, and information on such bugs is no longer available. The attached patch leaves the broken document statistics behavior as is, and I'll file a separate bug to track it.
Comment 8 Not Assigned 2012-06-15 08:42:41 UTC
Muhammad Haggag committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=6c14d15dbbdc8920e1695b5fdc32b6519508815d

fdo#46757 Word/character count incorrect with record changes enabled
Comment 9 Not Assigned 2012-06-15 10:04:46 UTC
Muhammad Haggag committed a patch related to this issue.
It has been pushed to "libreoffice-3-6":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=553f9ccfc8a6048528b9ffcd535adf7f1cd51fc7&g=libreoffice-3-6

fdo#46757 Word/character count incorrect with record changes enabled


It will be available in LibreOffice 3.6.
Comment 10 Not Assigned 2012-06-15 13:46:04 UTC
Caolan McNamara committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=03a59c7096cde0ced1a88069647c3ec60f86f9d6

Regression test for fdo#46757
Comment 11 Stefan Knorr (astron) 2012-06-28 01:38:59 UTC
Setting to FIXED.

Thank you, Muhammad!
Comment 12 Stefan Knorr (astron) 2012-06-28 01:43:15 UTC
*** Bug 50590 has been marked as a duplicate of this bug. ***
Comment 13 Not Assigned 2012-07-19 13:07:15 UTC
Caolan McNamara committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=3442913accc4e44c3a1ac69a990edee15117948e

Related: fdo#46757 fix weird word/char count with hidden deleted text
Comment 14 Not Assigned 2012-08-03 10:00:09 UTC
Caolan McNamara committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=27c2fe405ca55a2630176a657fb4895c5e31fcea

Related: fdo#46757 extend ModelToViewHelper for more cases
Comment 15 Not Assigned 2012-08-03 10:00:27 UTC
Caolan McNamara committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=92236c0fc4c8704a72f20a3c2e6f22df3c5ae333

Related: fdo#46757 unsafe to pass expanded text to masking
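As its name says, ModelToViewHelper relates positions in Writer's model text to positions in a "view" string in which some content is expanded or left out, and the last commit title warns that expanded text is unsafe to pass to the masking code. The Python class below is a hypothetical sketch of the mapping idea for removed ranges only, not LibreOffice's implementation; it shows why an offset computed on one string must be translated before it is used on the other.

```python
class ModelToView:
    """Hypothetical model-to-view position map for text with removed ranges.

    Assumes the removed [start, end) ranges do not overlap.
    """

    def __init__(self, model_text, removed):
        self.removed = sorted(removed)
        parts, pos = [], 0
        for start, end in self.removed:
            parts.append(model_text[pos:start])
            pos = end
        parts.append(model_text[pos:])
        self.view_text = "".join(parts)

    def to_view(self, model_pos):
        """Translate a model offset into the corresponding view offset."""
        view_pos = model_pos
        for start, end in self.removed:
            if model_pos >= end:
                view_pos -= end - start        # whole gap lies before the position
            elif model_pos > start:
                view_pos -= model_pos - start  # inside a gap: snap to its start
        return view_pos


m = ModelToView("keep this drop that", [(5, 10)])
print(m.view_text)      # keep drop that
print(m.to_view(10))    # 5: "d" in both strings
print(m.view_text[10])  # t: using the raw model offset 10 hits the wrong character
```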