Bug 147828 - "Select to Next Sentence" does not work properly when the current sentence ends with a period and the next sentence does not have a capital letter at the beginning of the first word
Status: UNCONFIRMED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 7.2.5.2 release (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: needsDevEval
Depends on:
Blocks: Selection
 
Reported: 2022-03-07 15:23 UTC by sdc.blanco
Modified: 2022-04-01 15:07 UTC
CC List: 3 users

See Also:
Crash report or crash signature:
Regression By:


Attachments
test cases to demonstrate bug (15.23 KB, application/vnd.oasis.opendocument.text)
2022-03-07 15:23 UTC, sdc.blanco

Description sdc.blanco 2022-03-07 15:23:40 UTC
Created attachment 178700
test cases to demonstrate bug

The problem is with .uno:GotoNextSentenceSel ("Select to Next Sentence")

The attached file gives instructions for how to observe the problem.

Maybe the test cases do not completely identify the boundaries of the problem, but the general issue is that this command selects the rest of the current sentence plus the whole next sentence whenever the current sentence ends with a period and the next sentence does not begin with a capital letter. It should select only the rest of the current sentence.

Friendly advice to testers: make a keyboard shortcut for "Select to Next Sentence", then use the shortcut with the attached test file.
Comment 1 sdc.blanco 2022-03-07 15:25:27 UTC
also repro with 

Version: 7.4.0.0.alpha0+ (x64) / LibreOffice Community
Build ID: 7ac19fbce8a35f559eebb879cd0f232bfc95e703
CPU threads: 8; OS: Windows 10.0 Build 19043; UI render: Skia/Raster; VCL: win
Locale: da-DK (da_DK); UI: pt-BR
Calc: CL
Comment 2 Dieter 2022-03-22 06:09:57 UTC
I confirm it with

Version: 7.3.1.3 (x64) / LibreOffice Community
Build ID: a69ca51ded25f3eefd52d7bf9a5fad8c90b87951
CPU threads: 4; OS: Windows 10.0 Build 19044; UI render: Skia/Raster; VCL: win
Locale: de-DE (de_DE); UI: en-GB
Calc: CL

But the behaviour indirectly indicates the wrong spelling (lowercase at the beginning of the sentence), so it might be useful. If we change the behaviour: how is it possible to distinguish a dot at the end of a sentence from a dot that is part of an abbreviation?

What do you think?
Comment 3 sdc.blanco 2022-03-22 10:41:38 UTC
(In reply to Dieter from comment #2)
> How is it possible to distinguish a dot at the end of a sentence
> from a dot that is part of an abbreviation?
> 
> What do you think?
2-part answer.

   - How many times do you have an abbreviation with a dot in a sentence? 
     (i.e., statistically, relatively rare case for selection)
 
   - admittedly, a missing capitalization is also a rare case, but the current
     behavior requires a "repair" action (i.e., new key strokes to back up 
     because the cursor/selection has gone too far), while the proposed change 
     can easily handle the abbreviation case by repeating the command 
     (i.e., same keystroke, Ctrl+Shift+S, no need to move hands).
    
Here is a test case:  This is abc. and nothing else. how does this work.

Actual behavior: starting at "T" and pressing Ctrl+Shift+S selects the entire line (which then requires backing up).

Proposed behavior:  Stops selection after "abc" and then stops selection after "else".  
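To make the difference between the current and the proposed behavior concrete, here is a toy model. It is a sketch under heavy assumptions, not Writer's implementation: only a "." followed by spaces counts as a sentence end, and `sentence_starts` and `select_to_next_sentence` are hypothetical names, the latter only approximating what .uno:GotoNextSentenceSel does from a given cursor offset.

```python
import re


def sentence_starts(text: str, tailored: bool) -> list[int]:
    """Offsets where a new sentence begins, plus len(text) as a sentinel.

    Simplified model (an assumption, not Writer's code): a candidate break
    follows a '.' and one or more spaces. The default rule suppresses it when
    the next character is lowercase; the proposed tailoring never does.
    """
    starts = []
    for m in re.finditer(r"\. +", text):
        nxt = m.end()
        if nxt < len(text) and (tailored or not text[nxt].islower()):
            starts.append(nxt)
    starts.append(len(text))
    return starts


def select_to_next_sentence(text: str, cursor: int, tailored: bool) -> str:
    """Rough stand-in for .uno:GotoNextSentenceSel: cursor to next start."""
    starts = sentence_starts(text, tailored)
    end = next((b for b in starts if b > cursor), len(text))
    return text[cursor:end]


line = "This is abc. and nothing else. how does this work."
print(repr(select_to_next_sentence(line, 0, tailored=False)))  # whole line
print(repr(select_to_next_sentence(line, 0, tailored=True)))   # 'This is abc. '
print(repr(select_to_next_sentence(line, 13, tailored=True)))  # 'and nothing else. '
```

The tailored rule has the cost Dieter raises for abbreviations: `select_to_next_sentence("Das ist z.B. ein Test.", 0, tailored=True)` stops at `'Das ist z.B. '`, and the user has to press the shortcut again to continue.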

An additional note on your misspelling example: unlike the current behavior, the proposed change would make it easier to place the cursor in the right place for the correction.
Comment 4 sdc.blanco 2022-03-22 10:48:27 UTC
I should have made clear in comment 3 that Ctrl+Shift+S is just an example of a keyboard shortcut assigned to this command.

Also, as the test file shows, the problem arises with fields too. If "Figure 7" is a cross-reference field, the following two sentences are grammatically correct, yet the command selects to the end of the second sentence.

Here is a sentence. Figure 7 shows the diagram.

(This seems like a more common situation than the misspelling or abbreviation cases.)
Comment 5 Dieter 2022-03-30 13:50:24 UTC
(In reply to sdc.blanco from comment #3)

>    - How many times do you have an abbreviation with a dot in a sentence? 
>      (i.e., statistically, relatively rare case for selection)

In a German text very often (z.B., v.a., u.a., evtl., ggf.)

> Here is a test case:  This is abc. and nothing else. how does this work.
> 
> Actual behavior if starting at T and pressing Ctrl+Shift+S, then entire line
> is selected (which then requires having to back up).
> 
> Proposed behavior:  Stops selection after "abc" and then stops selection
> after "else".

I'm fine with that proposal.

Let's ask design-team for decision.
Comment 6 V Stuart Foote 2022-03-30 15:38:32 UTC
This is an ICU word-bound/sentence-bound issue. It also affects the cyclic multi-click mouse selection: double--word, triple--sentence, quad--para.

I believe the logic for the sentence bound is built on ICU lib calls.
Comment 7 V Stuart Foote 2022-03-30 17:25:36 UTC
This is locale specific and, in the general case, should rely on the ICU lib word break / sentence break iterators for the bounds, but I doubt we do so. Instead we use view shell hacks that miss punctuation and grammar in specific cases, as here and in the see-also bug 125174.
Comment 8 Heiko Tietze 2022-03-31 09:10:29 UTC
If the current behavior changes, we can easily continue the selection with uno:GotoNextSentenceSel, but the triple click would be changed irrecoverably (3x = sentence, 4x = paragraph).

According to https://www.unicode.org/reports/tr29/#Sentence_Boundaries, a sentence break is forbidden in 'the resp. leaders are'. The standard is pretty clear to me and we should follow it. This is not a question for UX.
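The rule behind the 'the resp. leaders are' example is SB8 in UAX #29: after an ATerm (such as ".") with optional closing punctuation and spaces, no break is allowed if the next character that is a letter, paragraph separator or sentence terminator is lowercase. The sketch below approximates SB8 with Python's `unicodedata` categories (`Ll` standing in for Lower, any `L*` category for OLetter/Upper/Lower, small hand-picked sets for ATerm, STerm and ParaSep). It is not ICU's implementation and ignores the other SB rules; `sb8_forbids_break` is a hypothetical name.

```python
import unicodedata

# Approximate UAX #29 property classes (assumption: simplified subsets).
ATERM = {".", "\u2024", "\uFE52", "\uFF0E"}
STERM = {"!", "?", "\u3002"}  # small subset of STerm
PARA_SEP = {"\n", "\r", "\u0085", "\u2029"}


def stops_lookahead(ch: str) -> bool:
    """Approximates OLetter | Upper | Lower | ParaSep | SATerm."""
    return (unicodedata.category(ch).startswith("L")
            or ch in PARA_SEP or ch in ATERM or ch in STERM)


def sb8_forbids_break(text: str, i: int) -> bool:
    """True if SB8 forbids a sentence break after the ATerm at text[i]."""
    if text[i] not in ATERM:
        raise ValueError("text[i] must be an ATerm such as '.'")
    j = i + 1
    # Close* and Sp* are consumed by this loop too, because neither class
    # is a letter, ParaSep or SATerm.
    while j < len(text) and not stops_lookahead(text[j]):
        j += 1
    return j < len(text) and unicodedata.category(text[j]) == "Ll"


for s in ["the resp. leaders are", "He left. The end.",
          "This is sentence 1. \uFFFC is the next sentence."]:
    print(repr(s), sb8_forbids_break(s, s.index(".")))  # True, False, True
```

Note the third case: the lookahead skips any non-letter, so a placeholder such as U+FFFC (category `So`) in front of a lowercase word suppresses the break just as a space would.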

To my understanding, this particular issue is NAB (not a bug).
Comment 9 sdc.blanco 2022-03-31 09:48:59 UTC
(In reply to Heiko Tietze from comment #8)
> If the current behavior changes we can easily continue the selection with
> uno:GotoNextSentenceSel but the triple click would be changed irrecoverably
Try triple-clicking in the test cases. Making the change would be an improvement.

> According to https://www.unicode.org/reports/tr29/#Sentence_Boundaries
"As with the other default specifications, implementations are free to override (tailor) the results to meet the requirements of different environments"

Consider this case (also in the test case attachment), where |Field| is meant to represent an inserted field:

This is sentence 1. |Field| is the next sentence. 

Actual (with triple-click in first sentence): both sentences selected.
Expected: only the first.

At least this particular tailoring/override seems appropriate.
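One explanation consistent with the field cases, not verified against the Writer source, is that the break iterator sees a field not as its displayed text ("Figure 7") but as a single non-letter placeholder, so the SB8-style lookahead runs past it to the next lowercase word. The sketch below models that and the proposed tailoring. `FIELD` (U+FFFC here) is a hypothetical stand-in for whatever character Writer actually uses, and `field_starts_sentence` is a hypothetical switch that stops the lookahead at a field and allows the break.

```python
import unicodedata

FIELD = "\uFFFC"  # hypothetical stand-in for an inserted field


def next_sentence_start(text: str, start: int, field_starts_sentence: bool) -> int:
    """Offset where the sentence after `start` begins (len(text) if none).

    Simplified: only '.' ends a sentence, and an SB8-style lookahead
    suppresses the break when the first letter after it is lowercase.
    With the tailoring on, reaching a FIELD ends the lookahead and the
    break is allowed.
    """
    for i in range(start, len(text)):
        if text[i] != ".":
            continue
        k = i + 1
        while k < len(text) and text[k] == " ":
            k += 1  # candidate break position: after the period and spaces
        j = k
        while j < len(text):  # lookahead over non-letters
            ch = text[j]
            if field_starts_sentence and ch == FIELD:
                break  # tailoring: a field may start a sentence
            if unicodedata.category(ch).startswith("L") or ch == ".":
                break
            j += 1
        lower_follows = j < len(text) and unicodedata.category(text[j]) == "Ll"
        if not lower_follows and k < len(text):
            return k
    return len(text)


s = "This is sentence 1. \uFFFC is the next sentence."
print(repr(s[:next_sentence_start(s, 0, field_starts_sentence=False)]))
print(repr(s[:next_sentence_start(s, 0, field_starts_sentence=True)]))
```

With the switch off, the selection runs to the end of the text, matching the "both sentences selected" result; with it on, it stops after `'This is sentence 1. '`. The same applies to "Here is a sentence. Figure 7 shows the diagram." when "Figure 7" is a field.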
Comment 10 V Stuart Foote 2022-04-01 15:00:49 UTC
See the comments on OOo-era bug i24098 [1], from 2004 when the sentence break iterators were being implemented. Not sure we've ever refactored them to keep them current with ICU lib offerings.

=-ref-=

[1] https://bz.apache.org/ooo/show_bug.cgi?id=24098#c12
Comment 11 V Stuart Foote 2022-04-01 15:07:00 UTC
Michael S. did the last substantive rework of the view shell and cursor mgmt here.
But it is unclear whether we use CLDR and the ICU libs to respond fully to the locale. What happens for CJK and CTL users, where "sentence" and word breaks can be much more complex? Are the break iterators appropriate?

=-ref-=

https://opengrok.libreoffice.org/xref/core/sw/inc/breakit.hxx?a=true&r=06bd8d70&h=60#62

https://opengrok.libreoffice.org/xref/core/sw/source/core/crsr/swcrsr.cxx?a=true&r=ec1c4c49&h=1550#1558