Created attachment 178700
test cases to demonstrate bug
The problem is with .uno:GotoNextSentenceSel ("Select to Next Sentence")
The attached file gives instructions for how to observe the problem.
The test cases may not completely identify the boundaries of the problem, but the general issue is that this command selects the rest of the current sentence plus the next sentence when the rest of the sentence ends with a period and is followed by a sentence whose first word does not start with a capital letter. It should select only the rest of the current sentence.
Friendly advice to testers: make a keyboard shortcut for "Select to Next Sentence", then use the shortcut with the attached test file.
Also reproducible with
Version: 22.214.171.124.alpha0+ (x64) / LibreOffice Community
Build ID: 7ac19fbce8a35f559eebb879cd0f232bfc95e703
CPU threads: 8; OS: Windows 10.0 Build 19043; UI render: Skia/Raster; VCL: win
Locale: da-DK (da_DK); UI: pt-BR
I confirm it with
Version: 126.96.36.199 (x64) / LibreOffice Community
Build ID: a69ca51ded25f3eefd52d7bf9a5fad8c90b87951
CPU threads: 4; OS: Windows 10.0 Build 19044; UI render: Skia/Raster; VCL: win
Locale: de-DE (de_DE); UI: en-GB
However, the current behaviour indirectly indicates a spelling mistake (lowercase at the beginning of a sentence), so it might be useful. If we change the behaviour: how is it possible to distinguish a dot at the end of a sentence from a dot that is part of an abbreviation?
What do you think?
(In reply to Dieter from comment #2)
> How is it possible to distinguish a dot at the end of a sentence
> from a dot, that is part of an abbreviation?
> What do you think?
- How many times do you have an abbreviation with a dot in a sentence?
(i.e., statistically, relatively rare case for selection)
- admittedly, a missing capitalization is also a rare case, but the current
behavior requires a "repair" action (i.e., new key strokes to back up
because the cursor/selection has gone too far), while the proposed change
can easily handle the abbreviation case by repeating the command
(i.e., same keystroke, Ctrl+Shift+S, no need to move hands).
Here is a test case: This is abc. and nothing else. how does this work.
Actual behavior: starting at T and pressing Ctrl+Shift+S selects the entire line (which then requires backing up).
Proposed behavior: Stops selection after "abc" and then stops selection after "else".
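A minimal Python sketch of the two behaviors on this test case. It is a simplified model, not Writer's actual code: the names `sentence_stops` and `selections` and the `require_capital` flag are made up for illustration, and the observed rule is only approximated as "a period ends the sentence if the next word starts with a capital or the text ends".

```python
TEXT = "This is abc. and nothing else. how does this work."


def sentence_stops(text: str, require_capital: bool) -> list:
    """Return the offsets just past each period where the selection would stop.

    require_capital=True approximates the observed behavior,
    require_capital=False models the proposed behavior.
    """
    stops = []
    for i, ch in enumerate(text):
        if ch != ".":
            continue
        following = text[i + 1:i + 2]
        if following and not following.isspace():
            continue  # period inside a token, not a candidate sentence end
        rest = text[i + 1:].lstrip()
        # The end of the text always terminates the sentence.
        if not rest or not require_capital or rest[0].isupper():
            stops.append(i + 1)
    return stops


def selections(text: str, require_capital: bool) -> list:
    """Text selected from offset 0 after each successive press of the command."""
    return [text[:stop] for stop in sentence_stops(text, require_capital)]
```

In this model, `selections(TEXT, True)` yields a single selection covering the whole line, while `selections(TEXT, False)` yields three selections that grow by one sentence per press.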
Additional note in relation to your misspelling example. The proposed change would make it easier to place the cursor in the right place for correction, unlike the current behavior.
I should have made clear in comment 3 that Ctrl+Shift+S is only an example of a keyboard shortcut for this command.
Also, as shown in the test file, this problem arises with fields, for example. If "Figure 7" is a cross-reference field, the following two sentences are grammatically correct, yet the command selects to the end of the second sentence.
Here is a sentence. Figure 7 shows the diagram.
(This seems like a more common situation than the misspelling or abbreviation cases.)
(In reply to sdc.blanco from comment #3)
> - How many times do you have an abbreviation with a dot in a sentence?
> (i.e., statistically, relatively rare case for selection)
In a German text very often (z.B., v.a., u.a., evtl., ggf.)
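One conceivable way to tell the two kinds of dot apart is a locale-specific abbreviation list. The Python sketch below assumes such a list exists. The set only contains the examples given here, the function names are hypothetical, and this is not a description of how Writer or ICU currently decide.

```python
# Assumption: a hand-maintained, locale-specific list (only the examples above).
GERMAN_ABBREVIATIONS = {"z.B.", "v.a.", "u.a.", "evtl.", "ggf."}


def period_ends_sentence(text: str, i: int) -> bool:
    """True if the period at text[i] is treated as a sentence end."""
    following = text[i + 1:i + 2]
    if following and not following.isspace():
        return False  # period inside a token, e.g. the first dot of "z.B."
    start = text.rfind(" ", 0, i) + 1  # start of the token ending at this period
    return text[start:i + 1] not in GERMAN_ABBREVIATIONS


def sentence_end_tokens(text: str) -> list:
    """Tokens whose trailing period is classified as a sentence end."""
    return [
        text[text.rfind(" ", 0, i) + 1:i + 1]
        for i, ch in enumerate(text)
        if ch == "." and period_ends_sentence(text, i)
    ]
```

A known limitation of this approach is that an abbreviation standing at the real end of a sentence is not recognised as a sentence end.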
> Here is a test case: This is abc. and nothing else. how does this work.
> Actual behavior if starting at T and pressing Ctrl+Shift+S, then entire line
> is selected (which then requires having to back up).
> Proposed behavior: Stops selection after "abc" and then stops selection
> after "else".
I'm fine with that proposal.
Let's ask design-team for decision.
This is an ICU word-bound/sentence-bound issue. It also affects the cyclic multi-click mouse selection (double: word, triple: sentence, quad: paragraph).
Believe the logic for the sentence bound is structured with ICU lib calls.
This is locale specific and should depend on the ICU lib word break / sentence break iterators for the bounds in the general case, but I doubt we do so. Instead we seem to use view shell hacks that miss punctuation and grammar in specific cases, as here and in the see-also bug 125174.
If the current behavior changes, we can easily continue the selection with uno:GotoNextSentenceSel, but the triple click would be changed irrecoverably (3x = sentence, 4x = paragraph).
According to https://www.unicode.org/reports/tr29/#Sentence_Boundaries the sentence break is forbidden for 'the resp. leaders are'. The standard is pretty clear to me and we should follow it. No question for UX.
To my understanding, the particular issue is NAB (not a bug).
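For reference, a rough Python sketch of the rule in question (SB8 in UAX #29), which forbids a break after a full stop when the next letter that follows is lowercase. It is a deliberate simplification: it uses Python's `str` predicates instead of the real Sentence_Break property classes, ignores the other SB rules, and the function name `sb8_forbids_break` is hypothetical.

```python
PARAGRAPH_SEPARATORS = "\n\r\u0085\u2028\u2029"
SENTENCE_TERMINATORS = ".!?"  # simplified stand-in for the SATerm class


def sb8_forbids_break(text: str, period_index: int) -> bool:
    """Simplified SB8: True if a break after text[period_index] is forbidden."""
    i = period_index + 1
    while i < len(text):
        ch = text[i]
        if ch in PARAGRAPH_SEPARATORS or ch in SENTENCE_TERMINATORS:
            return False  # the rule does not look past these
        if ch.isalpha():
            # The first letter reached decides: lowercase forbids the break.
            return ch.islower()
        i += 1  # spaces, digits, closing punctuation and other non-letters are skipped
    return False
```

Applied to 'the resp. leaders are', the first letter after the full stop is the lowercase "l", so the sketch reports the break as forbidden, matching the example in the standard.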
(In reply to Heiko Tietze from comment #8)
> If the current behavior changes we can easily continue the selection with
> uno:GotoNextSentenceSel but the triple click would be changed irrecoverable
Try triple-click on the test cases. Making the change would be an improvement.
> According https://www.unicode.org/reports/tr29/#Sentence_Boundaries
"As with the other default specifications, implementations are free to override (tailor) the results to meet the requirements of different environments"
Consider this case (also in the test case attachment), where |Field| is meant to represent an inserted field:
This is sentence 1. |Field| is the next sentence.
Actual (with triple-click in first sentence): both sentences selected.
Expected: only the first.
At least this particular tailoring/override seems appropriate.
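One way to picture this tailoring, as a Python sketch under stated assumptions: a field is taken to occupy a single placeholder character in the paragraph text (U+FFFC is used here purely as a stand-in), and the default rule is approximated, ASCII only, as "no break when the first letter after the period is lowercase". The function name `break_after_period` is hypothetical and does not correspond to any Writer or ICU API.

```python
import re

FIELD = "\ufffc"  # assumption: stand-in for the one-character placeholder of a field

# Approximated default rule: after the period, skip non-letters. If the first
# letter reached is lowercase, the break is forbidden.
_LOWERCASE_FOLLOWS = re.compile(r"[^A-Za-z.!?\n]*[a-z]")


def break_after_period(rest: str, tailored: bool) -> bool:
    """Decide whether a sentence break is allowed before `rest` (the text after a period)."""
    if tailored and rest.lstrip(" ").startswith(FIELD):
        # Tailoring: a field directly after the period starts a new sentence.
        return True
    return _LOWERCASE_FOLLOWS.match(rest) is None
```

With `rest = " " + FIELD + " is the next sentence."`, the untailored call reports no break (the placeholder is skipped and the lowercase "i" decides), while the tailored call allows the break. Text without a field is handled identically in both modes.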
See comments around OOo-era bug i24098, when the sentence break iterators were being implemented in 2004. Not sure we've ever refactored to keep them current with ICU lib offerings.
Michael S. did the last substantive rework of the view shell and cursor mgmt here.
But it is unclear whether we are using CLDR and ICU libs to respond fully to locale: what is happening for CJK and CTL users, where sentence and word breaks can be much more complex? Are the break iterators appropriate?