Bug 30550 - EasyHacks: Character count without spaces
Summary: EasyHacks: Character count without spaces
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version (earliest affected): unspecified
Hardware: All All
Importance: medium normal
Assignee: Arnaud Versini
URL:
Whiteboard:
Keywords: easyHack
Depends on: 33774 38690
Blocks:
Reported: 2010-10-01 15:26 UTC by vilen.looga
Modified: 2015-12-18 09:55 UTC (History)
CC List: 5 users

See Also:
Crash report or crash signature:


Attachments
Screenshot showing the wrong word count. (692.16 KB, image/png)
2010-11-01 06:30 UTC, sophie
Test case for bug, not reproduced. (136.00 KB, image/png)
2010-11-16 21:30 UTC, Brandon Fields

Description vilen.looga 2010-10-01 15:26:37 UTC
This is a long-standing OOorg feature request (8 years to be more precise: http://www.openoffice.org/issues/show_bug.cgi?id=10356), hopefully it can be implemented in LibreOffice.
Comment 1 Raphael Bircher 2010-10-01 15:55:25 UTC
I think we can, but does it really make sense? I mean, professional translators are a small group of OOo users, so for a lot of people this feature is not important. This is the reason why we have extensions.

We could maybe write an extension that also includes a price calculator. OK, the function you request is not big, but maybe we can write something better if we go with the extension solution. What do you think?
Comment 2 Kohei Yoshida 2010-10-01 18:12:31 UTC
Well, I wouldn't be so dismissive of this enhancement right off the bat.  If this satisfies even a small part of the user base, we should at least give it fair consideration.  Also, regarding using an extension, we already have the word count dialog there; it may be much simpler to extend that to include this enhancement than to move the whole thing (or just the requested functionality) into an extension.

Anyway, that's my opinion.
Comment 3 Jan Holesovsky 2010-10-14 05:20:30 UTC
I have added this to the EasyHacks page - sounds reasonably easy for somebody interested.

http://wiki.documentfoundation.org/Development/Easy_Hacks#Count_characters_without_whitespace_in_the_Writer_statistics
Comment 4 Norbert Thiebaud 2010-10-14 06:00:37 UTC
for the counting it looks like the interesting function is:

void SwTxtNode::CountWords( SwDocStat& rStat, xub_StrLen nStt, xub_StrLen nEnd ) const
in writer/sw/source/core/txtnode/txtedt.cxx

something like

diff --git a/sw/source/core/txtnode/txtedt.cxx b/sw/source/core/txtnode/txtedt.cxx
index 19af890..ac83925 100644
--- a/sw/source/core/txtnode/txtedt.cxx
+++ b/sw/source/core/txtnode/txtedt.cxx
@@ -1885,6 +1885,7 @@ void SwTxtNode::CountWords( SwDocStat& rStat,
             ++rStat.nPara;
             ULONG nTmpWords = 0;
             ULONG nTmpChars = 0;
+            ULONG nTmpWordsChars = 0; // Count the number of chars used in words
 
             // Shortcut: Whole paragraph should be considered and cached values
             // are valid:
@@ -1892,6 +1893,7 @@ void SwTxtNode::CountWords( SwDocStat& rStat,
             {
                 nTmpWords = GetParaNumberOfWords();
                 nTmpChars = GetParaNumberOfChars();
+                nTmpWordsChars = GetParaNumberOfWordsChars();
             }
             else
             {
@@ -1925,9 +1927,14 @@ void SwTxtNode::CountWords( SwDocStat& rStat,
 
                     while ( aScanner.NextWord() )
                     {
-                        if ( aScanner.GetLen() > 1 ||
-                             CH_TXTATR_BREAKWORD != aExpandText.match(aBreakWord, aScanner.GetBegin() ) )
-                            ++nTmpWords;
+                        if(CH_TXTATR_BREAKWORD != aExpandText.match(aBreakWord, aScanner.GetBegin() ))
+                        {
+                            if ( aScanner.GetLen() > 1)
+                            {
+                                ++nTmpWords;
+                            }
+                            nTmpWordsChars += aScanner.GetLen();
+                        }
                     }
                 }


This should count the number of characters used in 'words' in a text node,
which then needs to be accumulated in the statistics of the document...

Then of course there are GUI changes needed to display it, and possibly some changes to store it in the saved document (so that it is not recalculated at every open)...
I have no clue yet where and how to do that...
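
For illustration, the per-paragraph counting and the accumulation into document-level statistics described above could look roughly like the following standalone sketch. It is not the Writer code: `DocStat`, `CountParagraph` and `CountDocument` are hypothetical names, plain `std::string` stands in for the text node, and a "word" is simply assumed to be a run of non-whitespace characters rather than whatever the real word scanner decides.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the document statistics; not the real SwDocStat.
struct DocStat {
    std::size_t nPara = 0;
    std::size_t nWord = 0;
    std::size_t nChar = 0;           // all characters, spaces included
    std::size_t nCharExcSpaces = 0;  // characters that are not whitespace
};

// Count one paragraph and add the results to rStat.
void CountParagraph(const std::string& rText, DocStat& rStat)
{
    ++rStat.nPara;
    bool bInWord = false;
    for (unsigned char c : rText)
    {
        ++rStat.nChar;
        if (std::isspace(c))
        {
            bInWord = false;  // whitespace ends the current word
            continue;
        }
        ++rStat.nCharExcSpaces;
        if (!bInWord)
        {
            ++rStat.nWord;  // first non-space character starts a new word
            bInWord = true;
        }
    }
}

// Accumulate the counts of all paragraphs into one document-level result.
DocStat CountDocument(const std::vector<std::string>& rParagraphs)
{
    DocStat aStat;
    for (const std::string& rPara : rParagraphs)
        CountParagraph(rPara, aStat);
    return aStat;
}
```

Caching the per-paragraph values, as the existing shortcut in the diff does for words and characters, would be a separate step on top of this.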
Comment 5 Sebastian@SSpaeth.de 2010-10-21 07:32:51 UTC
Just another opinion, should this important but very specialized feature not be offered in the form of an optional and external extension?

Adding "EasyHacks" to summary to make it findable.
Comment 6 Arnaud Versini 2010-10-21 11:01:27 UTC
I've started a patch for this request.
Comment 7 LeMoyne Castle 2010-10-26 12:01:50 UTC
I have emailed Arnaud and he yielded this project to me - I did not mean to scare him off.  I have done some analysis and proto/junk hacking.  

I am drafting a plan beyond the basic fix requested here.  There are currently several hard-coded exclusions/inclusions for notes, hidden and other textnode subtypes.  It should be possible to come up with a system that gives a user option for where to count (what nodes to include?) and what to count (what is white? what else gets excluded?).  
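
As an illustration only, such an option system could be sketched as below. Everything here is an assumption made for the example: `NodeKind`, `TextNode`, `CountOptions`, `IsCounted` and `CountChars` are hypothetical names, and the real Writer model has more node subtypes and its own rules for what counts as white.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical node kinds; the real Writer model has more subtypes.
enum class NodeKind { Body, Note, Hidden };

struct TextNode {
    NodeKind eKind;
    std::string aText;
};

// Hypothetical user options: where to count and what to exclude.
struct CountOptions {
    bool bIncludeNotes = false;
    bool bIncludeHidden = false;
    std::string aExcluded = " \t\n";  // characters treated as "white"
};

// Decide whether a node takes part in the count at all.
bool IsCounted(const TextNode& rNode, const CountOptions& rOpt)
{
    switch (rNode.eKind)
    {
        case NodeKind::Note:   return rOpt.bIncludeNotes;
        case NodeKind::Hidden: return rOpt.bIncludeHidden;
        case NodeKind::Body:   return true;
    }
    return true;
}

// Count the characters of all included nodes, skipping excluded characters.
std::size_t CountChars(const std::vector<TextNode>& rNodes, const CountOptions& rOpt)
{
    std::size_t nCount = 0;
    for (const TextNode& rNode : rNodes)
    {
        if (!IsCounted(rNode, rOpt))
            continue;
        for (char c : rNode.aText)
            if (rOpt.aExcluded.find(c) == std::string::npos)
                ++nCount;
    }
    return nCount;
}
```

Keeping the inclusion rules and the excluded character set in one options structure would let the hard-coded cases become defaults rather than fixed behaviour.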

The plan will include involving Doc people in specifying what actual users want in this area by using TDF Wiki.  Freelancers and contractors who write or translate for a living use word and character counts to estimate time for bids, to measure progress in the work and to bill for work done.  There are dozens of dup/related bugs in the OO issuezilla (e.g. 4568, 10356) and they make all kinds of feature requests.  The Doc people (OO's committed user base) will have a better take on what users really need and want in this area. 

LeMoyne - JLCastle
jlc@mail2lee.com
Comment 9 LeMoyne Castle 2010-10-27 07:00:06 UTC
Gratz to Mattias Johnsson.  From looking at his patch the basic fix is in.  I let myself get bogged down in warm-up patches, other stuff and the greater problem here.  I still believe that the Documentation people will have much to say about what tools are useful to writers in this area of doc stats.
Comment 10 Bartosz 2010-10-29 23:45:13 UTC
Thanks for this great implementation.

If you add the character count without spaces, it would be great to also add a paragraph counter in the same step.

The paragraph counter is already implemented; we only need to display it.
Comment 11 sophie 2010-11-01 06:30:19 UTC
Created attachment 39950 [details]
Screenshot showing the wrong word count.
Comment 12 sophie 2010-11-01 06:32:42 UTC
There is something wrong with selections ending with punctuation. See the attached screenshot: the number of characters is the same for those counted without spaces, whereas it is not for those counted with spaces. Reopening - Sophie
Comment 13 Linus 2010-11-01 18:00:47 UTC
(In reply to comment #12)
> There is something wrong with selections ending with punctuation. See the
> attached screenshot: the number of characters is the same for those counted
> without spaces, whereas it is not for those counted with spaces. Reopening - Sophie

It's not only with selections ending with a punctuation.
The text "Selection." will always give 10 characters without spaces.
For example, selecting the 'l' will give 1 word, 1 character and 10 characters without spaces. Selecting 'le' will give 1 word, 2 characters and 10 characters without spaces.
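
The numbers above suggest that the length of the whole word is added even when only part of the word is selected. As a hedged sketch of the expected behaviour (not the actual Writer code or its fix), the count can be clipped to the selection range. `SelectionStat` and `CountSelection` are hypothetical names, and a word is assumed to be a run of non-whitespace characters.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

struct SelectionStat {
    std::size_t nWord = 0;
    std::size_t nChar = 0;
    std::size_t nCharExcSpaces = 0;
};

// Count within the half-open range [nStt, nEnd) of rText.
SelectionStat CountSelection(const std::string& rText, std::size_t nStt, std::size_t nEnd)
{
    SelectionStat aStat;
    nEnd = std::min(nEnd, rText.size());
    std::size_t nPos = std::min(nStt, nEnd);
    while (nPos < nEnd)
    {
        // Skip whitespace between words, counting it only in nChar.
        while (nPos < nEnd && std::isspace(static_cast<unsigned char>(rText[nPos])))
        {
            ++aStat.nChar;
            ++nPos;
        }
        // Measure the word, stopping at the selection end rather than the word end.
        const std::size_t nWordStart = nPos;
        while (nPos < nEnd && !std::isspace(static_cast<unsigned char>(rText[nPos])))
            ++nPos;
        const std::size_t nLen = nPos - nWordStart;
        if (nLen > 0)
        {
            ++aStat.nWord;
            aStat.nChar += nLen;
            aStat.nCharExcSpaces += nLen;
        }
    }
    return aStat;
}
```

With the text "Selection.", selecting only the 'l' corresponds to `CountSelection("Selection.", 2, 3)`, which gives 1 word, 1 character and 1 character without spaces in this sketch.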
Comment 14 Brandon Fields 2010-11-16 21:30:24 UTC
Created attachment 40320 [details]
Test case for bug, not reproduced.

I am unable to reproduce sophie's test case on Ubuntu 10.10 with version LIBREOFFICE_3_3_FREEZE-86-gbb8af21
Comment 15 vilen.looga 2010-11-28 09:41:44 UTC
Hi, guys!

I just tried the 3rd beta and I'm really happy with the progress that you have made with this feature. So as my way of saying thanks I made a little donation to LibreOffice.

Keep up the good work!
Comment 16 Rob Snelders 2011-05-27 11:33:38 UTC
I also can't reproduce the bug on Ubuntu 10.04 (64bit) in the trunk
Comment 17 Korrawit Pruegsanusak 2011-07-06 09:30:14 UTC
The issue in comment 11 and comment 12 is fixed in this commit:
http://cgit.freedesktop.org/libreoffice/writer/commit/?h=libreoffice-3-3&id=c450ac7031cd7a2146380b6664df24fd9d2b995c

Note that this commit is specific to the LibO 3.3.x release, because in -3-4 and master the issue was fixed after -3-3 was branched:
http://cgit.freedesktop.org/libreoffice/writer/commit/?id=335534df4946437a12cd3c18b4a24beee188317b

Marking as RESOLVED FIXED
Comment 18 Robinson Tryon (qubit) 2015-12-18 09:55:43 UTC
Migrating Whiteboard tags to Keywords: (EasyHack)
[NinjaEdit]