Bug 126629 - Writer reads some n-dashes as words - Editing
Summary: Writer reads some n-dashes as words - Editing
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 6.2.5.2 release
Hardware: x86-64 (AMD64) Windows (All)
Importance: medium trivial
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Formatting-Mark
Reported: 2019-07-30 17:42 UTC by stephen.sottong
Modified: 2019-08-07 16:25 UTC

See Also:
Crash report or crash signature:


Attachments
Shows example of a dash that is not counted as a word and one that is. (8.11 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2019-07-30 17:43 UTC, stephen.sottong

Description stephen.sottong 2019-07-30 17:42:01 UTC
Description:
While checking the word count of a long document, I found that Writer consistently reported 10 more words than other word processors. I eventually traced this to Writer counting some dashes as words. Neither MS Word nor SoftMaker TextMaker counts these as words. I can provide a document that demonstrates the difference, but it doesn't reproduce in an online form.

Steps to Reproduce:
1. Not sure how the dashes that are counted were made.

Actual Results:
Some dashes are counted as words

Expected Results:
The count should have ignored the dashes.


Reproducible: Always


User Profile Reset: No



Additional Info:
Comment 1 stephen.sottong 2019-07-30 17:43:41 UTC
Created attachment 153059
Shows example of a dash that is not counted as a word and one that is.
Comment 2 V Stuart Foote 2019-07-30 20:50:53 UTC
In OOXML the run is "<w:t xml:space="preserve">Earth </w:t><w:softHyphen/><w:t>– not</w:t></w:r>"

On filter import to Writer, this gives a text run of U+0020 U+00AD U+2013 U+0020.

So it seems that the filter-assigned U+00AD (SOFT HYPHEN), in combination with U+2013 (EN DASH) and bounded by spaces, is treated as a word by the edit engine, increasing the word count.
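
For illustration only, here is a minimal Python sketch of how such a miscount could arise. It is not Writer's actual word-boundary code. The helpers `is_word` and `count_words` are hypothetical, and the rule "a token made only of punctuation is not a word" is an assumption chosen to mirror the observed behaviour. Under that rule, a lone EN DASH (category Pd) is skipped, but a preceding SOFT HYPHEN (category Cf) makes the token look like a word unless format characters are ignored as well.

```python
import unicodedata


def is_word(token: str, ignore_format_chars: bool) -> bool:
    """Hypothetical rule: a token is a word if it contains any
    character that is not punctuation (and, optionally, not a
    format character such as U+00AD SOFT HYPHEN)."""
    for ch in token:
        category = unicodedata.category(ch)
        if category.startswith("P"):  # e.g. EN DASH is "Pd"
            continue
        if ignore_format_chars and category == "Cf":  # SOFT HYPHEN is "Cf"
            continue
        return True
    return False


def count_words(text: str, ignore_format_chars: bool = False) -> int:
    """Count whitespace-separated tokens that qualify as words."""
    return sum(is_word(tok, ignore_format_chars) for tok in text.split())


# The two variants from the attachment, as described in this comment:
plain = "Earth \u2013 not"            # space, EN DASH, space
imported = "Earth \u00ad\u2013 not"   # space, SOFT HYPHEN, EN DASH, space

print(count_words(plain))                               # dash alone is skipped
print(count_words(imported))                            # soft hyphen makes it count
print(count_words(imported, ignore_format_chars=True))  # skipped again
```

In this sketch the imported run is counted as three words and the plain run as two, and ignoring Cf characters brings the imported run back to two. Whether the real fix belongs in the import filter or in the word counter is left open here.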