Bug 155675 - Accessibility Request -- Whisper OpenAI
Summary: Accessibility Request -- Whisper OpenAI
Status: RESOLVED WONTFIX
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Extensions
Version (earliest affected): unspecified
Hardware: All
OS: All
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: accessibility
Depends on:
Blocks: a11y, Accessibility DoAsMacro
 
Reported: 2023-06-04 20:32 UTC by Misty
Modified: 2023-10-04 12:31 UTC
CC: 6 users

See Also:
Crash report or crash signature:


Attachments

Description Misty 2023-06-04 20:32:18 UTC
I'm wondering if LibreOffice could add accessibility features like offline speech to text and text to speech. I noticed the Whisper model not only does speech recognition but also translation, and it automatically handles grammar. You can find it on Hugging Face at https://huggingface.co/openai/whisper-large-v2 (openai/whisper-large-v2). This would make the office suite a lot more accessible for people with disabilities, and it would also help with privacy if you could do speech recognition.
Comment 1 Misty 2023-06-04 20:34:53 UTC
It would also help with privacy if speech recognition could be done offline.
Comment 2 Eyal Rozenberg 2023-06-04 22:40:24 UTC
So, one request per bug report. Don't ask for "all sorts of accessibility features". If you want N features, open N bugs. Also - make all of them depend on the accessibility meta-bug, bug 101912.

Now, about "offline speech to text" - what exactly do you mean by "offline"? Do you want LO to be able to import audio files, extracting the speech in them as text? If not - please be much more concrete in your description.
Comment 3 V Stuart Foote 2023-06-05 01:54:48 UTC
An OpenAI service would by nature not be implemented on the client system, but, like the current LanguageTool hooks, the project could pass control to an external processing service with a published API.
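
For illustration, a rough sketch of that kind of call-out from a hypothetical extension against OpenAI's published transcription endpoint (the endpoint and field names follow the public API documentation; the function name and key handling are made up for the example, nothing here exists in LibreOffice):

import requests

def transcribe_remote(audio_path: str, api_key: str) -> str:
    # Hand the audio to the external service and return the recognized text.
    with open(audio_path, "rb") as audio_file:
        response = requests.post(
            "https://api.openai.com/v1/audio/transcriptions",
            headers={"Authorization": f"Bearer {api_key}"},
            data={"model": "whisper-1"},
            files={"file": audio_file},
        )
    response.raise_for_status()
    return response.json()["text"]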

However, there is no UAA interface supporting speech-to-text or text-to-speech, meaning any "handling" would need to be provided by extension or by the os/DE, and for that there is no standard.

Like LanguageTool's AI, this would need to be done by extension, and any TDF dev effort to implement it would be out of scope. Kind of wonder what LanguageTool's take on OpenAI's Whisper will be.

IMHO => WF
Comment 4 Misty 2023-06-05 03:37:14 UTC
A feature like what they have in Word, where there is a mic icon you can click on and dictate into your microphone, transcribing your speech. I was looking at different AI models that could possibly handle this sort of task. With the increase in computing power, I could be mistaken, but I don't think an API would be necessary. I think the calculations could be done on a user's computer to do pretty accurate speech recognition. We saw this in the past with programs like Dragon NaturallySpeaking.
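
For example, I believe the open-source Whisper Python package can already run entirely on the local machine, with no external API involved. A rough sketch (the model size and file name are placeholders; accuracy and speed depend on the user's hardware):

import whisper

model = whisper.load_model("base")           # model is downloaded once and runs locally
result = model.transcribe("dictation.wav")   # no network call during transcription
print(result["text"])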
Comment 5 QA Administrators 2023-06-06 03:13:26 UTC Comment hidden (obsolete)
Comment 6 Misty 2023-06-06 19:42:34 UTC
Why would it be out of scope to add speech recognition to the office suite? I see that Office 365 has already added it. There is also the Mozilla speech recognition project, and several other speech recognition projects out there, if using the Whisper model is an issue. I think accessibility is something we need to lead on as an open source community, to make sure it's not monopolized by a bunch of big corporations.
Comment 7 V Stuart Foote 2023-06-07 16:42:58 UTC
The assistive technologies of speech-to-text and text-to-speech are not covered by any meaningful standard, meaning each os/DE will do its own thing.

The project cannot afford (in dev effort) the cost of implementing the native code it would take to provide speech-to-text, while text-to-speech is already provided (for better or worse) by the os/DE.

An OpenAI project like Whisper offers a model for speech recognition and an interface (as a replacement for keyboard and mouse HID usage) similar to the grammar/spelling and style support provided by the external Grammarly or LanguageTool services, but direct integration beyond that *is* out of scope.
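
For the record, a minimal sketch of what an extension-level Python macro (not core LibreOffice code) might look like. It assumes the Scripting Framework provides XSCRIPTCONTEXT and that the open-source whisper package has been installed into LibreOffice's bundled Python, which is an assumption, not something that ships today:

import whisper  # assumption: installed in LibreOffice's bundled Python

def insert_dictation(audio_path="dictation.wav"):
    # Append locally transcribed speech to the current Writer document via UNO.
    doc = XSCRIPTCONTEXT.getDocument()        # provided by the Scripting Framework
    text = doc.getText()
    cursor = text.createTextCursor()
    cursor.gotoEnd(False)
    transcript = whisper.load_model("base").transcribe(audio_path)["text"]
    text.insertString(cursor, transcript, False)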
Comment 8 Heiko Tietze 2023-10-02 08:47:38 UTC
(In reply to V Stuart Foote from comment #3)
> An OpenAI service would by nature not be implemented on the client system,
> but, like the current LanguageTool hooks, the project could pass control to
> an external processing service with a published API.
> ...
> IMHO => WF

+1, we should not rely on one external service.
Comment 9 Heiko Tietze 2023-10-04 12:31:04 UTC
An extension is very welcome.