Bug 126629 - Writer counts dashes (soft hyphen, hyphen, and others) as words when en-dash and em-dash are ignored
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): Inherited From OOo
Hardware: x86-64 (AMD64) All
Importance: medium trivial
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Formatting-Mark Word-Count
Reported: 2019-07-30 17:42 UTC by steve.sottong
Modified: 2023-06-26 16:57 UTC (History)
CC List: 4 users

See Also:
Crash report or crash signature:


Attachments
Shows example of a dash that is not counted as a word and one that is. (8.11 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2019-07-30 17:43 UTC, steve.sottong

Description steve.sottong 2019-07-30 17:42:01 UTC
Description:
While checking the word count of a long document, I found that Writer's count was always 10 words higher than expected. I finally traced it to Writer counting some dashes as words. Neither MS Word nor SoftMaker TextMaker counts these as words. I can provide a document that demonstrates the difference, but the issue doesn't reproduce in an online form.

Steps to Reproduce:
1. Not sure how the dashes that are counted were made.

Actual Results:
Some dashes are counted as words

Expected Results:
The count should have ignored the dashes.


Reproducible: Always


User Profile Reset: No



Additional Info:
Comment 1 steve.sottong 2019-07-30 17:43:41 UTC
Created attachment 153059
Shows example of a dash that is not counted as a word and one that is.
Comment 2 V Stuart Foote 2019-07-30 20:50:53 UTC
In OOXML the run is "<w:t xml:space="preserve">Earth </w:t><w:softHyphen/><w:t>– not</w:t></w:r>" 

On filter import to Writer, this gives a text run of U+0020 U+00AD U+2013 U+0020.

So it seems that the filter-assigned U+00AD (SOFT HYPHEN), in combination with the U+2013 (EN DASH) and bounded by spaces, is treated as a word by the edit engine, increasing the word count.
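
To illustrate the run described in this comment, here is a minimal Python sketch. It only shows the code points between "Earth" and "not" and what a naive whitespace split does with them. The whitespace split is an assumption made for illustration and is not Writer's actual word-break logic.

```python
# Text as it would look after import, per the comment above:
# "Earth", space, SOFT HYPHEN, EN DASH, space, "not".
text = "Earth \u00ad\u2013 not"


def code_points(s):
    """Return the code points of s in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in s]


# Everything between the two real words.
separator_run = text[len("Earth"):text.index("not")]

# Naive tokenization (illustrative assumption): split on whitespace only.
tokens = text.split()

print(code_points(separator_run))  # ['U+0020', 'U+00AD', 'U+2013', 'U+0020']
print(len(tokens))                 # 3
```

With this naive rule, the soft hyphen plus en dash forms a third token between the two real words, which matches the kind of extra count reported here.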
Comment 4 Diana Vides 2023-05-25 01:55:11 UTC
I was able to reproduce this bug, first in version 6.4.7.2. A short dash is counted as a word, but a long (autocorrected) dash is not.
Steps to Reproduce:
1. Type a dash, add a space, type a word, and press Enter
2. Type a word, add a space, type a dash, type a word, and add a space


Actual Results:
The short dash in Step 1 is counted as a word, and the long (autocorrected) dash in Step 2 is not counted as a word.

Expected Results:
Both the short dash and the long dash should be either counted or ignored, depending on the specification. The user guide is ambiguous.
https://help.libreoffice.org/7.2/en-US/text/swriter/guide/words_count.html?&DbPAR=WRITER&System=WIN
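
One way to make the "either counted or ignored" expectation concrete is sketched below in Python. The rule used here (a whitespace-separated token counts as a word only if it contains at least one letter or digit) and the function name `count_words` are hypothetical. They are not taken from the user guide or from Writer's code.

```python
def count_words(text):
    """Count whitespace-separated tokens that contain a letter or digit.

    Hypothetical rule for illustration: a token made only of dashes or
    other punctuation is never counted, whatever kind of dash it is.
    """
    return sum(
        1
        for token in text.split()
        if any(ch.isalnum() for ch in token)
    )


# Step 1 from this comment: short dash, space, word.
print(count_words("- word"))             # 1
# Step 2 from this comment: word, autocorrected long dash, word.
print(count_words("word \u2013 word "))  # 2
```

Under this rule both steps give the same treatment to the dash, so the count no longer depends on whether autocorrect replaced it.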


Version: 6.4.7.2 (x64)
Build ID: 639b8ac485750d5696d7590a72ef1b496725cfb5
CPU threads: 6; OS: Windows 10.0 Build 19045; UI render: default; VCL: win; 
Locale: en-US (en_US); UI-Language: en-US
Calc: CL

I reproduced it in version 7.5.2.2, where it is still present.

Version: 7.5.2.2 (X86_64) / LibreOffice Community
Build ID: 53bb9681a964705cf672590721dbc85eb4d0c3a2
CPU threads: 6; OS: Windows 10.0 Build 19045; UI render: Skia/Raster; VCL: win
Locale: en-US (en_US); UI: en-US
Calc: threaded

I reproduced it in the master version 7.6.0.0, where it is still present.

Version: 7.6.0.0.alpha1+ (X86_64) / LibreOffice Community
Build ID: f4c24da1e7f11664e0d2f688d2531f068e4a3bc0
CPU threads: 6; OS: Windows 10.0 Build 19045; UI render: Skia/Raster; VCL: win
Locale: en-US (en_US); UI: en-US
Calc: CL threaded
Comment 5 Stéphane Guillou (stragu) 2023-06-26 16:57:54 UTC
I checked in OOo 3.3: it was already the case for a simple hyphen and a soft hyphen surrounded by spaces (although the en-dash was also counted back then).

A related issue concerning the documentation is bug 62799.

Testing in 24.2 alpha0+:

Not counted

En – dash: not counted (U+2013)
Em — dash: not counted (U+2014)

Counted

Horizontal ― bar: counted (U+2015)
Figure ‒ dash: counted (U+2012)
Hyphen - minus: counted (U+002D)
Minus − sign: counted (U+2212)
Hyphen ‐ hyphen: counted (U+2010)
Soft ­ hyphen: counted (U+00AD)
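
For reference, the following Python sketch prints the Unicode name and general category of each character in the list above, next to the result observed in this test. The counted/not counted flags are copied from the list. The script only queries Python's `unicodedata` module and says nothing about how Writer decides.

```python
import unicodedata

# Character -> whether Writer counted it as a word in the test above.
DASHES = {
    "\u2013": False,  # en dash
    "\u2014": False,  # em dash
    "\u2015": True,   # horizontal bar
    "\u2012": True,   # figure dash
    "\u002d": True,   # hyphen-minus
    "\u2212": True,   # minus sign
    "\u2010": True,   # hyphen
    "\u00ad": True,   # soft hyphen
}


def describe(ch):
    """Return (code point, Unicode name, general category) for ch."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch)


for ch, counted in DASHES.items():
    cp, name, cat = describe(ch)
    status = "counted" if counted else "not counted"
    print(f"{cp}  {cat}  {name:<15} {status}")

# General categories seen in each group.
counted_cats = {unicodedata.category(c) for c, hit in DASHES.items() if hit}
ignored_cats = {unicodedata.category(c) for c, hit in DASHES.items() if not hit}
```

Both groups contain characters of category Pd (dash punctuation), so the general category alone does not separate the dashes that are counted from those that are not.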

Version: 24.2.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 9fc0b2b9b96d87eb642a3b29e9dcb5d6273265eb
CPU threads: 8; OS: Linux 5.15; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
Calc: threaded