Created attachment 65456: word count test document

Non-breaking spaces are handled incorrectly by word count, giving wrong and inconsistent results.

Recipe:
1. Open the attached document wordcount-sample.odt, which contains a text of 6 words. The word count is correct in this case, both in the status line and in the word count window (file wordcount-sample-1.png).
2. Replace the space between the fourth and fifth word with a non-breaking space, but don't move the cursor (file wordcount-sample-2.png).
   Inconsistent behaviour: the word count in the status line switches to 4, whereas the word count in the word count window switches to 5.
   Wrong results: both word counts are wrong, as the number of words didn't change.
3. Move the cursor (file wordcount-sample-3.png).
   Wrong results: the word count in the word count window now also switches to 4, and 'characters excluding spaces' decreases from 24 to 16.

It looks like non-breaking spaces are treated like end-of-file markers by the word count algorithm. Still, that doesn't explain why inconsistent word count statistics are displayed in the status line and in the word count window.
Created attachment 65457: first example - correct results
Created attachment 65458: second example - inconsistent and incorrect results
Created attachment 65459: third example - incorrect results
I've checked with LibreOffice 3.5.2 on Windows XP. Here are the results:

1. After step 2 in the given recipe, the word count in the word count window decreases to 5 as well.
2. After step 3 (moving the cursor), the word count is corrected to 6 again.

That is, part of the bug is not present in LibO 3.5.2.

LibreOffice 3.5.2.2
Build-ID: 281b639-6baa1d3-ef66a77-d866f25-f36d45f
(In reply to comment #4)
> I've checked with LibreOffice 3.5.2 on Windows XP. Here are the results:
>
> 1. After step 2 in the given recipe word count in the word count window
> decreases to 5 as well.
>
> 2. After step 3 (moving the cursor) word count is corrected to 6 again. That
> is, part of the bug is not present in LibO 3.5.2.
>
>
> LibreOffice 3.5.2.2
> Build-ID: 281b639-6baa1d3-ef66a77-d866f25-f36d45f

The same applies to LibO 3.5.4 as shipped with Linux Mint 13.

LibreOffice 3.5.4.2
Build-ID: 350m1(Build:2)
Also, words after the first ZWSP (zero-width space) character are not counted, neither in the status bar nor in the dialog.
Good catch! Same problem in LO 3.6.1 RC1 under Linux (Ubuntu 11.10 x86). :-(

Best regards. JBF
Hi Muhammad,

Can you help here? :-)

Best regards. JBF
Added the "regression" keyword: LibreOffice 3.5.6.2 (Build-ID: e0fbe70-dcba98b-297ab39-994e618-0f858f0) shows the correct word count, treating non-breaking spaces just like ordinary spaces.
(In reply to comment #8)
> Hi Muhammad,
>
> Can you help here ? :-)
>
> Best regards. JBF

Hello. I'm investigating :)
The issue is that non-breaking space (as well as a bunch of other Unicode characters in the separator category) isn't handled as a separator/space character in lcl_IsSkippableWhitespace, defined in sw/source/core/txtnode/txtedit.cxx. I'm working on a fix.
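To illustrate why the choice of whitespace predicate matters, here is a minimal Python sketch contrasting an ASCII-only check with one based on the Unicode general category. The function names are hypothetical and this is not the actual lcl_IsSkippableWhitespace code; it assumes "separator" means the Unicode categories Zs, Zl and Zp (plus tab). It does not reproduce the exact counts from the report; it only shows how a narrow predicate glues words together around U+00A0. Note that U+200B ZERO WIDTH SPACE is in category Cf, so a category-based check alone would not treat it as a separator.

```python
import unicodedata


def is_ascii_space(ch: str) -> bool:
    """Hypothetical narrow check: only ASCII space and tab are separators."""
    return ch in (" ", "\t")


def is_unicode_separator(ch: str) -> bool:
    """Hypothetical broader check: tab or any Unicode separator (Zs, Zl, Zp)."""
    return ch == "\t" or unicodedata.category(ch) in ("Zs", "Zl", "Zp")


def count_words(text: str, is_space) -> int:
    """Count maximal runs of characters that the predicate does not treat as space."""
    words = 0
    in_word = False
    for ch in text:
        if is_space(ch):
            in_word = False
        elif not in_word:
            words += 1
            in_word = True
    return words


# U+00A0 (category Zs) between the fourth and fifth word, as in the recipe.
sample = "one two three four\u00a0five six"
print(count_words(sample, is_ascii_space))        # 5
print(count_words(sample, is_unicode_separator))  # 6
```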
Patch is up for review at: https://gerrit.libreoffice.org/453

The patch doesn't fix the inconsistency between the dialog and the status bar, because the two are updated differently. The dialog relies on hooks in the editing code that update it when text is entered or the selection changes, but it appears there is no hook for the case where the currently selected character is replaced via "Insert special character" (and there may be other missing hooks; it's a tedious way to implement this functionality). The status bar field is updated whenever anything changes in the document or selection, so it is more up to date. I'll file a separate bug to track this issue, and perhaps have the status bar updates drive the dialog as well.
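Letting one update path drive both views amounts to a simple observer arrangement: a single source publishes the statistics and every view subscribes to it, so no view depends on its own set of editing hooks. The Python sketch below is illustrative only; all class and method names are hypothetical and do not correspond to LibreOffice's actual classes, and it assumes every kind of edit ends in one publish call.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class DocStats:
    words: int
    chars_excluding_spaces: int


class StatsPublisher:
    """Hypothetical single source of word-count statistics."""

    def __init__(self) -> None:
        self._listeners: List[Callable[[DocStats], None]] = []

    def subscribe(self, listener: Callable[[DocStats], None]) -> None:
        self._listeners.append(listener)

    def publish(self, stats: DocStats) -> None:
        # Every registered view receives the same numbers from the same update.
        for listener in self._listeners:
            listener(stats)


class StatsView:
    """Hypothetical stand-in for the status bar field or the word count dialog."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.shown: Optional[DocStats] = None

    def update(self, stats: DocStats) -> None:
        self.shown = stats


publisher = StatsPublisher()
status_bar = StatsView("status bar")
dialog = StatsView("word count dialog")
publisher.subscribe(status_bar.update)
publisher.subscribe(dialog.update)

# Assumption of this sketch: typing, selection changes and
# "Insert special character" all end in this one call.
publisher.publish(DocStats(words=6, chars_excluding_spaces=24))
print(status_bar.shown == dialog.shown)  # True
```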
Muhammad Haggag committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=3ba107606682b5e675127483a514f0e6580ecfd1 fdo#53399 Word count is inconsistent and wrong with non-breaking space
(In reply to comment #13)
> Muhammad Haggag committed a patch related to this issue.
> It has been pushed to "master":

Thank you very much for fixing this issue so fast!
Review is ongoing on the mailing list to backport this to 3.6.2. Thanks for the report.
Hi Muhammad,

it seems that there is a fatal side effect of the last fix: in French, double punctuation marks (; ? ! :) must be preceded by a non-breaking space. With the last fix, such a punctuation mark is counted as a word.

Steps to reproduce:
- open a new empty text document
- type two dummy words like aaa bbb -> count = 2 words
- add a non-breaking space followed by a punctuation mark -> 3 words!!!
- add another word -> 4 words.

So your fix works in the sense that a non-breaking space no longer interrupts the word count for the current paragraph.

Best regards. JBF
(In reply to comment #16)
> Hi Muhammad,
>
> it seems that there is a fatal side effect with the last fix: in French double
> punctuation marks (; ? ! :) must be preceeded by a non-breaking space. With the
> last fix this punctuation mark is counted as a word.
> Steps to reproduce :
> - open a new empty text doc
> - type two dummy words like aaa bbb -> count = 2 words
> - add a non-breaking space followed by a punctuation mark -> 3 words !!!
> - add another word -> 4 words. So your fix works in the sense that non-breaking
> space does not interrupt the word count for the current paragraph.
>
> Best regards. JBF

Hi Jean,

I haven't modified word-counting behavior regarding punctuation. Stand-alone punctuation marks are counted as separate words even without my change; I can reproduce that with the official Ubuntu LO package (version 3.5.4.2).
(In reply to comment #17)
> I haven't modified word-counting behavior regarding punctuation. Stand-alone
> punctuation marks are counted as separate words, even without my change. I can
> reproduce that on the official Ubuntu LO package (version 3.5.4.2).

For this, see bug 38983. French punctuation is a special case, of course, but IMHO a proper fix for bug 38983 would fix the French punctuation issue too. Therefore:

@Jean-Baptiste Faure: Could you please add a short comment about the problem with French punctuation to bug 38983? Thank you!
(In reply to comment #18)
> For this, see bug 38983. French interpunctation is a special case, of course,
> but IMHO a proper fix for bug 38983 would fix the issue about French
> interpunctation, too ... therefore:

Well, I was too fast. Bug 38983 is cluttered by a (IMHO somewhat overly sophisticated) discussion about the impossibility of an exact word count. Therefore, while I still think that bug 38983 can be fixed, or at least improved, and that French spacing should be mentioned there, it may be reasonable to file a separate, new bug report about word counting with French punctuation. That case is special in that IMHO it needs no discussion, so fixing it is much easier than fixing the general bug 38983 ... Sorry!

@Jean-Baptiste Faure: So please file an additional bug report about word counting and French punctuation, and mention in it that this is (unlike bug 38983) a matter that does not need much discussion, just a fix ;-) Thank you again!
Hi,

I am not sure that French punctuation is a special case here. If a space is a word separator and a non-breaking space is not, then

aaa bbb ccc ; ddd

(assuming that the space before ; is a non-breaking one) should count as 4 words, not 3 as in LO 3.6, no matter whether the third word is taken to be "ccc" or "ccc ;". In English you have

aaa bbb ccc; ddd

which should also count as 4 words. Instead, in both cases LO 3.6 stops counting at the ;, even though there is another separator after it. For me the question is: why does the space following the ; not act as a separator when the ; is preceded by a non-breaking space?

If non-breaking space is used as a separator, then isolated punctuation marks (that is, marks really used as punctuation) should be separators too, and the counting algorithm should merge consecutive separators in the same way as CSV import in Calc does.

Best regards. JBF
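The counting rule proposed above can be sketched in a few lines of Python, under these assumptions: every whitespace character (including U+00A0) separates words, consecutive separators are merged, and a token made only of punctuation is not a word. The function name is hypothetical; this illustrates the proposal, not LibreOffice's actual counting code.

```python
def count_words_aggregating(text: str) -> int:
    """Hypothetical counter following the proposal in this comment."""
    count = 0
    # str.split() with no argument splits on runs of Unicode whitespace,
    # which includes U+00A0, so consecutive separators are merged.
    for token in text.split():
        # A token with no letter or digit (e.g. a lone ";") is punctuation, not a word.
        if any(ch.isalnum() for ch in token):
            count += 1
    return count


print(count_words_aggregating("aaa bbb ccc\u00a0; ddd"))  # 4 (French spacing)
print(count_words_aggregating("aaa bbb ccc; ddd"))        # 4 (English spacing)
```

Both spellings give the same count, which is the point of the proposal: the non-breaking space before ";" no longer turns the punctuation mark into a word.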
By the way, regarding comment #12: I added some code so that the dialog, if it's open, is updated whenever the status bar is updated with more recent word/character count data. http://cgit.freedesktop.org/libreoffice/core/commit/?id=5192468dd49f5e1d821239cd51cea42f8bac7a4b
Muhammad Haggag committed a patch related to this issue. It has been pushed to "libreoffice-3-6": http://cgit.freedesktop.org/libreoffice/core/commit/?id=48d1979dc3fb4618e04f37e5090c66ddf2fdad3a&g=libreoffice-3-6 fdo#53399 Word count is inconsistent and wrong with non-breaking space It will be available in LibreOffice 3.6.2.
I can confirm that with LibO 3.6.2.1 (Build ID: ba822cc), the word count no longer stays permanently incorrect in the presence of non-breaking spaces. Still, the word count is temporarily inconsistent between the dialog and the status line when a selected space between two words is replaced with a non-breaking space by pressing Shift+Ctrl+Space. I have opened bug 54918 to track this remaining glitch. Thanks!