I'm wondering if LibreOffice could add accessibility features like offline speech-to-text and text-to-speech. I noticed that the Whisper model not only does speech recognition but also translation, and it automatically handles punctuation and grammar. You can find it on Hugging Face: https://huggingface.co/openai/whisper-large-v2 (openai/whisper-large-v2). This would make the office suite a lot more accessible for people with disabilities.
It would also help with privacy if speech recognition could be done offline.
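For reference, here is a minimal sketch of what offline transcription with that model could look like, assuming the Python `transformers` package and a locally downloaded openai/whisper-large-v2 checkpoint; the audio file name is hypothetical. The chunking helper reflects Whisper's fixed 30-second processing window:

```python
# Sketch: offline dictation with a locally run Whisper model.
# Assumes the Hugging Face "transformers" package is installed and the
# openai/whisper-large-v2 checkpoint has been downloaded; "speech.wav"
# is a hypothetical input file.

def chunk_bounds(n_samples, sr=16000, window_s=30):
    """Whisper processes audio in 30-second windows; split longer
    recordings into (start, end) sample ranges."""
    step = sr * window_s
    return [(i, min(i + step, n_samples)) for i in range(0, n_samples, step)]

def transcribe(path):
    # Heavy import kept inside the function so the chunking helper
    # above is usable even without the model installed.
    from transformers import pipeline
    asr = pipeline("automatic-speech-recognition",
                   model="openai/whisper-large-v2",
                   chunk_length_s=30)
    return asr(path)["text"]

# Example (not run here; requires the model and an audio file):
# text = transcribe("speech.wav")
```

Nothing here leaves the user's machine, which is the privacy point: the model weights and the audio both stay local.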
So, one request per bug report. Don't ask for "all sorts of accessibility features". If you want N features, open N bugs, and make all of them depend on the accessibility meta-bug, bug 101912. Now, about "offline speech to text": what exactly do you mean by "offline"? Do you want LO to be able to import audio files, extracting the speech in them as text? If not, please be much more concrete in your description.
An OpenAI service would by nature not be implemented on the client system, but, like the current LanguageTool hooks, the project could pass control to an external processing service with a published API. However, there is no UAA interface supporting speech-to-text or text-to-speech, meaning any "handling" would need to be provided by the OS/DE, and for that there is no standard. As with LanguageTool's AI, this would need to be done by extension, and any TDF dev effort to implement it would be out of scope. I do wonder what the LanguageTool take on OpenAI's Whisper will be. IMHO => WF
A feature like what they have in Word, where there is a mic icon you can click to dictate into your microphone and have your speech transcribed. I was looking at different AI models that could possibly handle this sort of task. I could be mistaken, but with the increase in computing power I don't think an API would be necessary: the calculations could be done on the user's computer to do pretty accurate speech recognition. We saw this in the past with programs like Dragon NaturallySpeaking.
[Automated Action] NeedInfo-To-Unconfirmed
Why would it be out of scope to add speech recognition to the office suite? I see that Office 365 has already added it. There is also the Mozilla speech recognition project, and several other speech recognition projects out there if using the Whisper model is an issue. I think accessibility is something the open source community needs to lead on, to make sure it isn't monopolized by a bunch of big corporations.
The assistive technologies of speech-to-text and text-to-speech are not covered by any meaningful standard, meaning each OS/DE will do its own thing. The project cannot afford (in dev effort) the cost of implementing the native code it would take to provide speech-to-text, while text-to-speech is already provided (for better or worse) by the OS/DE. An OpenAI project like Whisper offers a model for speech recognition and an interface (as a replacement for keyboard and mouse HID usage) similar to the externally provided Grammarly or LanguageTool grammar/spelling and style support, but direct integration beyond that *is* out of scope.
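To illustrate the LanguageTool-style handoff being described: the client does no recognition itself, it just posts audio to an external service with a published API and reads back text. This is a sketch only; the endpoint URL and response shape below are hypothetical, not any real service's API:

```python
# Sketch of an external-service handoff for speech-to-text, in the
# spirit of the LanguageTool hooks. The endpoint (localhost:9000) and
# the {"text": ...} response shape are hypothetical assumptions.
import json
import urllib.request

def build_request(audio_bytes, url="http://localhost:9000/transcribe",
                  language="en"):
    # The service, not the office suite, runs the model; swapping the
    # URL swaps the provider without touching client code.
    return urllib.request.Request(
        url,
        data=audio_bytes,
        headers={"Content-Type": "audio/wav",
                 "Accept-Language": language},
        method="POST",
    )

def transcribe(audio_bytes):
    req = build_request(audio_bytes)
    with urllib.request.urlopen(req) as resp:  # network call, not run here
        return json.loads(resp.read())["text"]
```

The point of the sketch is the separation of concerns: all the model-specific work lives behind the published API, which is exactly why direct integration in core would be unnecessary.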
(In reply to V Stuart Foote from comment #3)
> An OpenAI service would by nature not be implemented on the client system,
> but like the current LanguageTool hooks the project could pass control to an
> external processing service with published API.
> ...
> IMHO => WF

+1, we should not rely on one external service.
An extension is very welcome.