Created attachment 57796 [details]
Screenshot of text showing also the Word Count window

Problem description:

Steps to reproduce:
1. Open a new Writer document and paste the following text:
"Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus eu ligula et arcu dapibus viverra ac ut elit. Proin rhoncus sapien et velit cursus ac molestie justo malesuada. Aliquam pretium, orci nec malesuada laoreet, nisl nisi tristique dui, vitae rutrum ipsum libero sit amet nunc."
2. Open Tools -> Word Count. You will see:
Words: 45
Characters: 289
Characters excluding spaces: 245
3. Activate change tracking via Edit -> Changes -> Record.
4. Select everything from "Proin" until "nunc." and delete it.
5. Word Count now shows:
Words: 45
Characters: 57
Characters excluding spaces: 245

Current behavior: Parts deleted while tracking changes affect only "Characters" in the Word Count.

Expected behavior: Changes affect either all three counts, or none.

Platform (if different from the browser): Windows 7 64bit

LibreOffice 3.5.0rc3
Build ID: 7e68ba2-a744ebf-1f241b7-c506db1-7d53735
[Not Reproducible] with "LibreOffice 3.3.4 - Ubuntu 11.04 (32bit) Spanish UI"
1) lsb_release -rd
Description: Ubuntu 12.04 LTS
Release: 12.04

2) apt-cache policy libreoffice-writer
libreoffice-writer:
  Installed: 1:3.5.2-2ubuntu1
  Candidate: 1:3.5.2-2ubuntu1
  Version table:
 *** 1:3.5.2-2ubuntu1 0
        500 http://us.archive.ubuntu.com/ubuntu/ precise/main i386 Packages
        100 /var/lib/dpkg/status

3) What is expected to happen: in Writer, paste the following text into a blank document:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus eu ligula et arcu dapibus viverra ac ut elit. Proin rhoncus sapien et velit cursus ac molestie justo malesuada. Aliquam pretium, orci nec malesuada laoreet, nisl nisi tristique dui, vitae rutrum ipsum libero sit amet nunc.

Activate record changes via Edit -> Changes -> Record, highlight everything from "Proin" until "nunc.", and delete it. The Word Count should then show the same values as in the Word 2010 screenshot:
https://bugs.launchpad.net/ubuntu/+source/libreoffice/+bug/981033/+attachment/3134188/+files/word2010.png

4) What happens instead is that it shows:
Words: 45
Characters: 113
Characters excluding spaces: 245
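The reported figures can be cross-checked with a few lines of Python. This is only a sketch: it assumes words are whitespace-separated tokens, which is simpler than Writer's real word breaking, and the names `TEXT`, `counts` and `kept` are made up for the example. It suggests that 113 is the character count of the retained text alone, while 45 and 245 still describe the whole text.

```python
# The sample paragraph from the reproduction steps.
TEXT = (
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit. "
    "Phasellus eu ligula et arcu dapibus viverra ac ut elit. "
    "Proin rhoncus sapien et velit cursus ac molestie justo malesuada. "
    "Aliquam pretium, orci nec malesuada laoreet, nisl nisi tristique dui, "
    "vitae rutrum ipsum libero sit amet nunc."
)


def counts(text):
    """Return (words, characters, characters excluding spaces)."""
    words = len(text.split())  # assumption: whitespace-delimited words
    chars = len(text)
    chars_no_spaces = sum(not c.isspace() for c in text)
    return words, chars, chars_no_spaces


# Everything before "Proin" is what remains after the tracked deletion.
kept = TEXT[:TEXT.index("Proin")]

print(counts(TEXT))  # (45, 289, 245)
print(counts(kept))  # (18, 113, 95)
```

If all three counters ignored the deleted range, the dialog would be expected to show 18 / 113 / 95 rather than 45 / 113 / 245.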
*** Bug 48072 has been marked as a duplicate of this bug. ***
reproduced in 3.5.3 on Fedora 64 bit
not reproduced in 3.3.4, therefore regression
problem only in Tools->Word count, no problem in File->Properties->Statistics
(In reply to comment #4)
> reproduced in 3.5.3 on Fedora 64 bit
> not reproduced in 3.3.4 , therefore regression
> problem only in Tools->Word count, no problem in File->Properties->Statistics

Are you sure file statistics aren't suffering from the same problem? It's showing 45 for me (same as the word count dialog and status bar).

Also, when you go ahead and save the document, it updates the statistics differently from how the word count dialog does it: it actually counts characters marked for deletion (286), and so the word count dialog shows the same count (since it's seeded from the document statistics). As soon as you start typing, the word count code is invoked and the number of characters becomes 113 again.

The problem is that character counting masks text marked as deleted and hidden text by replacing it with spaces, but all other word/character counting code doesn't. It seems intentional, although the "why" isn't clear to me. See SwTxtNode::CountWords and its call to lcl_MaskRedlinesAndHiddenText:
http://opengrok.libreoffice.org/xref/core/sw/source/core/txtnode/txtedt.cxx#1864

I tracked the change with 'git blame' to the following commit by John LeMoyne Castle:
http://cgit.freedesktop.org/libreoffice/core/commit/?id=4bd28ba4c6d2af96bb6638b88635598e1bb88e8f

Unfortunately, the commit message doesn't explain why it's doing character masking. A Google search for "John LeMoyne Castle character count" leads to fdo#30550:
https://bugs.freedesktop.org/show_bug.cgi?id=30550

It looks like the initial work was done by Mattias Johnsson, then John fixed several bugs. It seems the intent of his commit was to fix the selection case only. It might be that the character masking bit was erroneously added, perhaps a leftover from another commit.

My recommendation is to remove the masking of deleted characters, since it'll be a lot of work to get that working properly (and consistently) with both word/character count and document statistics, for no obvious benefit. If there's demand for such a feature, it should be filed and tracked separately.

I'll be posting a patch shortly to remove the masking and make the behavior consistent.
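To make the mismatch concrete, here is a toy Python model of the behavior described above. It is not the Writer code: `mask`, `count_inconsistent` and `count_consistent` are hypothetical names, deleted ranges are assumed to be non-overlapping `(start, end)` offsets, and words are assumed to be whitespace-separated.

```python
def mask(text, deleted):
    """Replace every character inside a deleted (start, end) range with a space."""
    chars = list(text)
    for start, end in deleted:
        chars[start:end] = " " * (end - start)
    return "".join(chars)


def count_inconsistent(text, deleted):
    """Only the character count honours the mask (the reported behavior)."""
    n_deleted = sum(end - start for start, end in deleted)
    words = len(text.split())                        # unmasked text
    chars = len(mask(text, deleted)) - n_deleted     # masked text
    no_spaces = sum(not c.isspace() for c in text)   # unmasked text
    return words, chars, no_spaces


def count_consistent(text, deleted):
    """All three counts are taken from the masked text."""
    masked = mask(text, deleted)
    n_deleted = sum(end - start for start, end in deleted)
    words = len(masked.split())
    chars = len(masked) - n_deleted
    no_spaces = sum(not c.isspace() for c in masked)
    return words, chars, no_spaces


sample = "alpha beta gamma delta"
deleted = [(11, 22)]  # "gamma delta" is marked as deleted

print(count_inconsistent(sample, deleted))  # (4, 11, 19)
print(count_consistent(sample, deleted))    # (2, 11, 9)
```

Either dropping the mask everywhere or applying it everywhere would make the three numbers agree; the model only shows why mixing the two produces the figures in the report.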
Created attachment 62923 [details]
Proposed patch.

After looking at the code closely, I change my recommendation. It's actually straightforward to get the word counting code to consistently ignore deleted content. Patch attached.

One problem that remains with this patch is document statistics. When you save the document, a gross word count is computed (including deleted content), saved with the document, and seeded to the word count dialog. The word count dialog (and status bar) show the incorrect count until you edit the document (insert/delete something), at which point the proper word counting logic is invoked and the count is corrected.

I won't post the patch for review/commit yet in the hopes that I can fix the document statistics issue as well. If it looks complicated, I'll get this committed and pursue the statistics issue separately.
Created attachment 62967 [details]
Updated patch.

The reason document statistics is broken comes down to the following commit:
http://cgit.freedesktop.org/libreoffice/core/commit/?id=6af264883910fe31433b4164b1956f4f9ed75ecb

It disables redlining deleted changes (by removing the flag REDLINE_SHOW_DELETE) when exporting (saving) documents, which leads to SwTxtNode::CountWords counting the deleted changes instead of masking them (since it only masks redlined content).

It appears that was done as a bug fix. Unfortunately, it was fixing a bug from 2001, and information on such bugs is not available anymore.

The attached patch leaves the broken document statistics behavior as is, and I'll file a separate bug to track it.
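The save-time behavior can be pictured with a small Python model. This is only an illustration of the explanation above under simplifying assumptions: `count_words`, `ToyDocument`, `save` and `on_edit` are hypothetical names, the `show_delete` argument merely stands in for the REDLINE_SHOW_DELETE flag, and words are whitespace-separated.

```python
def count_words(text, deleted, show_delete):
    """Count words, masking deleted ranges only while they are shown as redlines."""
    # With the flag off, the counter has no redlined ranges to mask.
    ranges = deleted if show_delete else []
    chars = list(text)
    for start, end in ranges:
        chars[start:end] = " " * (end - start)
    return len("".join(chars).split())


class ToyDocument:
    def __init__(self, text, deleted):
        self.text = text
        self.deleted = deleted       # non-overlapping (start, end) offsets
        self.displayed_words = None  # what the dialog/status bar would show

    def save(self):
        # Export runs with the flag removed, so the stored count is gross
        # and is then seeded to the dialog.
        self.displayed_words = count_words(self.text, self.deleted, show_delete=False)

    def on_edit(self):
        # Any edit re-runs the normal counting path with redlines visible.
        self.displayed_words = count_words(self.text, self.deleted, show_delete=True)


doc = ToyDocument("alpha beta gamma delta", [(11, 22)])
doc.save()
print(doc.displayed_words)  # 4 (gross count, includes the deleted words)
doc.on_edit()
print(doc.displayed_words)  # 2 (deleted words are masked again)
```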
Muhammad Haggag committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=6c14d15dbbdc8920e1695b5fdc32b6519508815d fdo#46757 Word/character count incorrect with record changes enabled
Muhammad Haggag committed a patch related to this issue. It has been pushed to "libreoffice-3-6": http://cgit.freedesktop.org/libreoffice/core/commit/?id=553f9ccfc8a6048528b9ffcd535adf7f1cd51fc7&g=libreoffice-3-6 fdo#46757 Word/character count incorrect with record changes enabled It will be available in LibreOffice 3.6.
Caolan McNamara committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=03a59c7096cde0ced1a88069647c3ec60f86f9d6 Regression test for fdo#46757
Setting to FIXED. Thank you, Muhammad!
*** Bug 50590 has been marked as a duplicate of this bug. ***
Caolan McNamara committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=3442913accc4e44c3a1ac69a990edee15117948e Related: fdo#46757 fix weird word/char count with hidden deleted text
Caolan McNamara committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=27c2fe405ca55a2630176a657fb4895c5e31fcea Related: fdo#46757 extend ModelToViewHelper for more cases
Caolan McNamara committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=92236c0fc4c8704a72f20a3c2e6f22df3c5ae333 Related: fdo#46757 unsafe to pass expanded text to masking