139185 – "creatine" is detected as a Romanian word

Summary: "creatine" is detected as a Romanian word

Status:	RESOLVED INSUFFICIENTDATA

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	Linguistic
Version: (earliest affected)	7.0.4.2 release
Hardware:	All All

Importance:	medium minor
Assignee:	Not Assigned

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:	Language-Detection

Reported:	2020-12-23 12:03 UTC by Dan Dascalescu
Modified:	2024-12-26 03:16 UTC
CC List:	6 users

See Also:	https://bz.apache.org/ooo/show_bug.cgi?id=73173, bug 95274, bug 76974, bug 113298
Crash report or crash signature:

Attachments
"creatine" is actually a US English word (43.54 KB, image/png) 2020-12-23 12:03 UTC, Dan Dascalescu

Description Dan Dascalescu 2020-12-23 12:03:33 UTC

Created attachment 168450
"creatine" is actually a US English word

Not sure if this should be filed against a dictionary component, please re-file accordingly.

Comment 1 Mike Kaganski 2020-12-23 12:29:36 UTC

What is specifically wrong with the screenshot, and why do you say in the title that "creatine" is detected as a Romanian word? At least the image does not make it obvious.

What I see is that it detects a spelling error on "creatine" written in an unknown language (the status bar, which could show the language information, is not visible in the screenshot), and that there is a "Word is Romanian" suggestion. Again, it is unclear why, given that no OS or LO configuration information is provided in the report. I would guess that it simply suggests the user's locale, or perhaps a language from the list of installed dictionaries, or something similar, without any relation to whether it thinks the word is Romanian or not.

Only if it does not underline the word when the text is set to Romanian, or if there is reason to believe that it shows this suggestion precisely because of the guess (and not because of installed components it happens to suggest), can we conclude that the premise is correct...

Comment 2 Ming Hua 2020-12-23 12:36:35 UTC

I'm not sure including all amino acid names in the general-purpose English dictionary is a good idea.

I don't know anything about Romanian, but "creatine" is probably indeed a common Romanian word, which is why you see the suggestion. It only appears if you have the Romanian dictionary installed (and maybe enabled)?

You can always solve your problem locally by adding "creatine" to your user's dictionary using the "Add to Dictionary" menu item, but you probably already know that.

Comment 3 Ming Hua 2020-12-23 16:14:14 UTC

(In reply to Ming Hua from comment #2)
> It only appears if you have Romanian dictionary installed (and maybe enabled)
I take this back.

I was testing in Writer and didn't see the "Word is Romanian (Romania)" menu item that Dan's screenshot showed. Now that I've tested in Calc, I can see the same menu item even without the Romanian dictionary installed.

Version: 7.1.0.0.beta1 (x64)
Build ID: 828a45a14a0b954e0e539f5a9a10ca31c81d8f53
CPU threads: 2; OS: Windows 10.0 Build 18363; UI render: default; VCL: win
Locale: zh-CN (zh_CN); UI: zh-CN
Calc: threaded

Chinese locale and UI; the default Western language in Tools > Options > Language Settings > Languages is set to "English (USA)". The text "Cellucor creatine" in a cell is detected as English according to the status bar, yet the context menu when right-clicking on "creatine" still offers "Word is Romanian..." and "Paragraph is Romanian..." items.

Comment 4 Mike Kaganski 2020-12-23 16:39:34 UTC

Looking into the code, OP seems to have guessed right.

The menu items are created in EditView::ExecuteSpellPopup (editeng/source/editeng/editview.cxx). It uses a language guesser, implemented in lingucomponent/source/languageguessing/guesslang.cxx.

When used for a single word, EditView::CheckLanguage tries four languages:
* The default document language from "Tools/Options - Language Settings - Languages: Western";
* The one from "Tools/Options - Language Settings - Languages: User interface";
* The one from "Tools/Options - Language Settings - Languages: Locale setting";
* en-US.
The first of them that has an active dictionary is then used.
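The single-word selection order described above can be sketched as follows. This is a minimal illustration under stated assumptions, not LibreOffice's code: the function name `pick_single_word_language` and the `has_active_dictionary` callback are hypothetical stand-ins, returning `None` when no candidate has a dictionary is an assumption, and the real `EditView::CheckLanguage` may apply further checks.

```python
from typing import Callable, Optional, Sequence


def pick_single_word_language(
    candidates: Sequence[str],
    has_active_dictionary: Callable[[str], bool],
) -> Optional[str]:
    """Hypothetical sketch: return the first candidate language that has an
    active dictionary, following the order described in comment 4."""
    for lang in candidates:
        if has_active_dictionary(lang):
            return lang
    # Assumption: if no candidate qualifies, no language is chosen.
    return None


# Candidate order from comment 4, filled in with the configuration from
# comment 3: Western default, UI language, locale setting, en-US fallback.
candidates = ["en-US", "zh-CN", "zh-CN", "en-US"]
installed = {"en-US"}  # hypothetical: only an English dictionary is active
print(pick_single_word_language(candidates, installed.__contains__))  # en-US
```

In this sketch the content of the word plays no part; only the candidate order and the set of active dictionaries decide the result.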

When checking paragraph text, the language guesser uses libexttextcat [1] to perform a "fingerprint-based" guessing. It looks highly unreliable, based on the evidence...
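For readers unfamiliar with fingerprint-based guessing: libtextcat, from which libexttextcat is derived, is generally described as an implementation of Cavnar and Trenkle's N-gram-based text categorization. The sketch below illustrates that general technique, not libexttextcat's actual code; the n-gram lengths, the profile size `top_k`, the `_` boundary marker and all function names are illustrative assumptions. Each text is reduced to a ranked list of its most frequent character n-grams, and the language whose profile has the smallest "out-of-place" distance wins.

```python
from collections import Counter
from typing import Dict, List


def ngram_profile(text: str, max_n: int = 3, top_k: int = 300) -> List[str]:
    """Rank character n-grams (1..max_n) of text by frequency, most frequent first."""
    counts: Counter = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # mark word boundaries (illustrative choice)
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Ties are broken alphabetically so the ranking is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc_profile: List[str], lang_profile: List[str]) -> int:
    """Sum of rank differences; n-grams missing from the language profile
    get a maximum penalty."""
    lang_rank = {gram: rank for rank, gram in enumerate(lang_profile)}
    max_penalty = len(lang_profile)
    total = 0
    for rank, gram in enumerate(doc_profile):
        total += abs(rank - lang_rank[gram]) if gram in lang_rank else max_penalty
    return total


def guess_language(text: str, lang_profiles: Dict[str, List[str]]) -> str:
    """Pick the language whose profile is closest to the text's profile."""
    doc = ngram_profile(text)
    return min(lang_profiles, key=lambda lang: out_of_place(doc, lang_profiles[lang]))
```

A short cell text such as "Cellucor creatine" yields only a few dozen n-grams, so the distance is computed from very little evidence; this is consistent with, though not proof of, the unreliable guesses reported in this bug.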

I suppose it is the same as (part of) tdf#66051. Personally I would just drop it.

[1] https://wiki.documentfoundation.org/Libexttextcat

Comment 5 Mike Kaganski 2020-12-23 16:40:48 UTC

(In reply to Mike Kaganski from comment #4)
> Personally I would just drop it.

... I mean, just drop the language guesser. I don't see it doing anything useful.

Comment 6 Dan Dascalescu 2020-12-23 19:25:14 UTC

Agreed. I would like to disable the language guesser altogether (is there a way to do that?) for the performance gain, because I only use English in my documents. (This is part of an effort to advocate for using English universally, since the global costs of translation exceed those of eliminating hunger, http://bit.ly/translation-vs-world-hunger, but that's a totally separate story.)

FWIW, I don't have any locales installed either. I happen to be Romanian, and "creatine" is actually not a Romanian word (https://dexonline.ro/definitie/creatine).

I would advocate for including it in the English dictionary because it is more than just another amino acid; it's probably the second most popular supplement in the fitness industry.

Comment 7 Mike Kaganski 2020-12-24 07:38:40 UTC

See also: "Language Guessing" at https://www.openoffice.org/development/releases/2.3.0.html

Comment 8 Michael Bauer 2021-12-08 13:00:04 UTC

I agree the language guesser is not working well, but I don't think scrapping it is the solution. Even though I have nothing Romanian installed anywhere on my PC, it suggests Romanian to me.

In any case, as a simple solution, why not use Marco Pinto's English dictionary? It's not on the LO extensions site but on the OO one: https://extensions.openoffice.org/en/project/english-dictionaries-apache-openoffice
It's much more up to date than what LO seems to bundle, and it certainly accepts "creatine" as an English word for me.

Comment 9 Mike Kaganski 2021-12-08 13:12:58 UTC

(In reply to Michael Bauer from comment #8)

Marco Pinto is a great LO contributor: https://gerrit.libreoffice.org/q/owner:marcoagpinto%2540sapo.pt

So it is definitely not the case that "It's much more up to date than what LO seems to bundle". Besides, adding a single word to the dictionary would not solve the underlying issue of "random" guessing of applicable languages based on what is focused (which is also what your bug 95274 is about).

And anyone is of course welcome to provide contributions to our dictionaries :-) - see https://wiki.documentfoundation.org/Development/Dictionaries

Comment 10 Stéphane Guillou (stragu) 2024-05-28 06:19:38 UTC

I suggest closing this as a duplicate of bug 95274, as it boils down to the same issue: libexttextcat not doing a great job, at least in the way we currently use it.
Any objection?

Comment 11 QA Administrators 2024-11-25 03:11:02 UTC

Dear Dan Dascalescu,

This bug has been in NEEDINFO status with no change for at least
6 months. Please provide the requested information as soon as
possible and mark the bug as UNCONFIRMED. Due to regular bug
tracker maintenance, if the bug is still in NEEDINFO status with
no change in 30 days the QA team will close the bug as INSUFFICIENTDATA
due to lack of needed information.

For more information about our NEEDINFO policy please read the
wiki located here:
https://wiki.documentfoundation.org/QA/Bugzilla/Fields/Status/NEEDINFO

If you have already provided the requested information, please
mark the bug as UNCONFIRMED so that the QA team knows that the
bug is ready to be confirmed.
 
Thank you for helping us make LibreOffice even better for everyone!

Warm Regards,
QA Team

MassPing-NeedInfo-Ping

Comment 12 QA Administrators 2024-12-26 03:16:47 UTC

Dear Dan Dascalescu,

Please read this message in its entirety before proceeding.

Your bug report is being closed as INSUFFICIENTDATA due to inactivity and
a lack of information which is needed in order to accurately
reproduce and confirm the problem. We encourage you to retest
your bug against the latest release. If the issue is still
present in the latest stable release, we need the following
information (please ignore any that you've already provided):

a) Provide details of your system including your operating
   system and the latest version of LibreOffice that you have
   confirmed the bug to be present

b) Provide easy to reproduce steps – the simpler the better

c) Provide any test case(s) which will help us confirm the problem

d) Provide screenshots of the problem if you think it might help

e) Read all comments and provide any requested information

Once all of this is done, please set the bug back to UNCONFIRMED
and we will attempt to reproduce the issue. Please do not:

a) respond via email 

b) update the version field in the bug or any of the other details
   on the top section of our bug tracker

Warm Regards,
QA Team

MassPing-NeedInfo-FollowUp