In addition to the issue identified in https://bugs.freedesktop.org/show_bug.cgi?id=35515, there are other instances where the "Capitalize first letter of every sentence" option is more trouble than it's worth. The 'start of sentence detection' should be improved to recognise the following for what they are, and therefore not perform any capitalization:

1. Common contractions, e.g. "esp." for "especially", "incl." for "including", "temp." for "temporary", "e.g.", "i.e.", etc.

2. Things which are clearly acronyms, e.g. "U.S.", "Y.M.C.A.", etc. In regex terms I'd imagine the pattern to be /([a-zA-Z]\.){2,}/, i.e., any two or more occurrences of a letter followed by a period. You could make a judgment call about whether you wanted to limit it to capital letters. On the plus side you're more likely to be looking at something really intended as an acronym, but on the negative side I often use acronyms like "w.r.t." for "with regard to", and suchlike. This might matter more if you thought "e.g." and "i.e." are more accurately classed as acronyms than contractions; I'm not sure the conceptual distinction would make a difference here in practice.

3. Did you spot the 'intentional mistake' in number 1. above? :-) The case where a contraction or acronym falls at the end of a sentence is tricky. Some cursory research ([1],[2]) confirms that in these situations the correct thing to do is to have only one period, which 'does double duty', both indicating the shortening and ending the sentence. Therefore, in these situations LO would probably miss the new sentence and not be able to capitalize. However, both sentences ending with acronyms and (hopefully) people forgetting to capitalize are pretty rare, so I'd vote to suffer this possible intermittent inconvenience in order to have the benefit which 1. and 2. above would bring.
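For illustration, the acronym pattern from item 2 can be tried out in a few lines of Python. This is only a sketch of the regex idea, not LibreOffice code; the names ACRONYM_ANY_CASE, ACRONYM_UPPER_ONLY and looks_like_acronym are made up for this example.

```python
import re

# The pattern from the report: two or more "letter + period" pairs in a row.
ACRONYM_ANY_CASE = re.compile(r"(?:[a-zA-Z]\.){2,}")

# The stricter variant discussed above, limited to capital letters.
ACRONYM_UPPER_ONLY = re.compile(r"(?:[A-Z]\.){2,}")


def looks_like_acronym(token, pattern=ACRONYM_ANY_CASE):
    """Return True if the whole token consists of letter-period pairs."""
    return pattern.fullmatch(token) is not None


if __name__ == "__main__":
    for token in ["U.S.", "Y.M.C.A.", "w.r.t.", "e.g.", "Foo.", "a."]:
        print(token,
              looks_like_acronym(token),
              looks_like_acronym(token, ACRONYM_UPPER_ONLY))
```

The capitals-only variant accepts "U.S." and "Y.M.C.A." but rejects "w.r.t." and "e.g.", which is exactly the trade-off described in item 2.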
As a pie-in-the-sky concept, I guess it'd be possible to do some heuristics using the grammar engine to determine if the writer probably intended to finish the sentence at a certain point, but that seems like a disproportionate amount of effort.

Localization issues
-------------------

/[a-zA-Z]/ is Unicode-unfriendly for a start. I can't remember if LO's regex engine supports Unicode-aware character classes like [[:alpha:]]: if it does, we can use them; if it doesn't, that's another bug report :p

In addition, it's likely that all the rules above would have to be language-contingent. The possible scope of this might be taking us outside the realms of an EasyHack, but it should be possible to lay the groundwork easily enough.

[1] http://ethnicity.rutgers.edu/~jlynch/Writing/p.html#periods
[2] http://english.stackexchange.com/search?q=[punctuation]+etc
Thanks for the idea. IMHO it is much simpler to type two spaces after an abbreviation, and to add an option "Not capitalize after 2 spaces" that does not capitalize the word following a period and two spaces, and just deletes the extra space.
EasyHack tags unification: only allowed in Whiteboard, to make queries easier and more reliable
This is a feature/enhancement request, therefore changed 'Importance' field to 'enhancement'.
Sorry, can't help disagreeing with you there Roman! "Capitalize first letter after a period '.'" is not a feature. The actual feature of the software we're talking about is "Capitalize first letter of *a sentence*". Therefore, to the extent that LO fails to identify properly what is a new sentence and what isn't, this is a *bug* in the existing feature, not a new feature/enhancement request. I feel it might be a little rude to undo your Importance change, but I'd like to hear your reasons why you feel that this issue does not highlight that capitalization-of-new-sentences is partly broken *existing* functionality :-)
We have Tools -> AutoCorrect Options, which comes prefilled with a bunch of exceptions, so for en_US "i.e." and "e.g." are already there. If you have some more words you think should be in there, then http://cgit.freedesktop.org/libreoffice/core/tree/extras/source/autotext/lang/en-US/acor/SentenceExceptList.xml is where the list lives, and it should be straightforward to add some more in there and submit that to us. That should cover the vast majority of cases and is straightforward to do right now.
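As a rough sketch of how an exception list of this kind behaves, the lookup can be modelled in Python as below. This is not how LibreOffice implements it: the set contents are a hand-picked illustration (only "e.g." and "i.e." are confirmed above as being in the en_US list), the case-insensitive comparison is an assumption, and the names SENTENCE_EXCEPTIONS and capitalize_after are hypothetical.

```python
# Illustrative stand-in for the entries of SentenceExceptList.xml.
# Only "e.g." and "i.e." are known from this thread to be in the en_US list;
# the other entries are the additions suggested in the original report.
SENTENCE_EXCEPTIONS = {"e.g.", "i.e.", "esp.", "incl.", "temp."}


def capitalize_after(preceding_word, exceptions=SENTENCE_EXCEPTIONS):
    """Return True if the word following `preceding_word` starts a sentence.

    Assumption: matching ignores case, so "Esp." is treated like "esp.".
    """
    if not preceding_word.endswith("."):
        return False  # no period, so no sentence end to react to
    return preceding_word.lower() not in exceptions


if __name__ == "__main__":
    for word in ["end.", "e.g.", "Esp.", "word"]:
        print(word, capitalize_after(word))
```

Adding a contraction to the list is then just a matter of adding one more entry, which is why this route covers item 1 of the report without touching the detection code.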
Can someone provide me with the code pointer for this enhancement? I would like to take it up!
Code entry point is SvxAutoCorrect::AutoCorrect in editeng/source/misc/svxacorr.cxx

Autocorrect exceptions are stored in extras/source/autotext/lang/*/acor/SentenceExceptList.xml which is where known contractions for a given language go.

so...

a) for part 1. "common contraction", add any common US English missing contractions to the extras/source/autotext/lang/en-US/acor/SentenceExceptList.xml

b) for part 2. I reckon it's likely sufficient to not auto-capitalize the start of a new sentence if there is a previous block of non-white-space-separated text and that previous block has a . as its second last character

e.g.

Foo. bar
     ^ capitalize this to Bar
 ^___ because this is not a period

F.o.o. bar
       ^ do not capitalize this
   ^___ because this is a period

i.e. the best goal is not to make sure to autocapitalize the right things, but to make sure not to autocapitalize the wrong things
What does "second to last" mean here? According to what I can observe, before the "bar" there is a " " and before that there is a "." in both cases! Can someone please elaborate?
F.o.o. bar
       ^ do not capitalize this, because...
    ^____ initially consider o as the last character of the sentence, now examine the second last character of that block of non-whitespace characters
   ^___ this second last character is a period, reject "o" as a sentence and do not capitalize the following word "bar" to "Bar"
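The check described above can be sketched in Python as follows. This is a minimal illustration of the "second last character" heuristic under the reading given in this comment (the trailing period is set aside first), not the code in svxacorr.cxx; the function name should_capitalize_next is hypothetical.

```python
def should_capitalize_next(text_before):
    """Decide whether the word following `text_before` should be capitalized.

    `text_before` is the text preceding the candidate word,
    for example "Foo. " or "F.o.o. ".
    """
    stripped = text_before.rstrip()
    if not stripped.endswith("."):
        return False  # no period, so no sentence end was detected

    # Last block of non-whitespace characters, e.g. "F.o.o."
    block = stripped.split()[-1]

    # Set the trailing period aside: "F.o.o." becomes "F.o.o"
    candidate = block[:-1]

    # If the second last character of what remains is a period,
    # treat the block as an acronym and leave the next word alone.
    if len(candidate) >= 2 and candidate[-2] == ".":
        return False
    return True


if __name__ == "__main__":
    for text in ["Foo. ", "F.o.o. ", "I live in the U.S. ", "no period "]:
        print(repr(text), should_capitalize_next(text))
```

Note that the function only ever declines to capitalize in the doubtful case, in line with the stated goal of not autocapitalizing the wrong things.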
Created attachment 75179
helpful patch
The bug still exists. Should I start working on it?
Created attachment 78265
Fixes the F.o.o. bar auto-capitalization

With this patch, the "b" in "F.o.o. bar" is not capitalized. Likewise, after any other abbreviation such as "U.S.", the next word is not capitalized, even if the abbreviation ends the sentence: since "U.S.." is not valid English, we leave it to the user to decide whether the sentence ends there. So the basic aim, that the next word should not be capitalized unnecessarily, is fulfilled. Please review.
anuragkanungo committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=3a390f36e8931e50009438f992ed0e4cdb02cca4

Resolves: fdo#42893 improve Capitalize first letter of Sentence

The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
I'm going to consider this closed now as the specific scenario is fixed. There's always the possibility to improve autotext in other ways but then you get a bug issue that bloats out of control and turns into a kitchen sink issue. So if there are further suggestions outside of the specific addressed "f.o.o. remain lowercase" scenario then don't reopen this bug, file a new one.
Migrating Whiteboard tags to Keywords: (EasyHack, DifficultyBeginner, ) [NinjaEdit]
Xisco Fauli committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/ac03c0ed332ab0dcd319c72f46a32c76b88c4812

tdf#42893: sw_autocorrect: Add unittest

It will be available in 25.2.0.

The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Xisco Fauli committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/27e8099a740562bd46d716ccc65c8fb42424a557

tdf#42893: sw_autocorrect: Add unittest

It will be available in 25.2.0.

The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.