 
aliciamartin@google.com
Google AI
Tuesday, November 14th
9:00 - 13:00
Room 2
https://forms.office.com/r/qAx2nujh21
When computers are able to recognize more diverse speech patterns, they can help provide more resources for people who have trouble being understood by technology or by other people in their daily lives. Project Euphonia is a research initiative that aims to make speech recognition more accessible for people with non-standard speech. In many cases, when someone with non-standard speech uses a voice-activated assistant, it does not understand them. Speech recognition models have not been trained on data that includes non-standard speech samples, leading to lower accuracy for individuals with speech challenges. Since the launch of our research, volunteers have contributed more than 1,600 hours of speech samples, creating the largest known non-standard speech dataset in the world (Jiang, P. (2022). "Euphonia Project: Automatic speech recognition research expands to include new languages, including Spanish").
                              
Our research demonstrated the potential for personalized automatic speech recognition (ASR) models to help individuals with non-standard speech be better understood by technology and by other people. We found that an ASR model fine-tuned with an individual's voice recordings could recognize that individual's voice better than human transcribers (Green, J., et al. (2023). "Automatic Speech Recognition of Disordered Speech: Personalized Models Outperforming Human Listeners on Short Phrases." Science 382.6387: 34-36. doi:10.1126/science.abm7687). In many circumstances, such as voice-activated smart home assistants, personalized ASR models required only 20 minutes of speech data for most individuals (Tobin, J., et al. (2021). "Personalized Automatic Speech Recognition Trained on Small Disordered Speech Datasets." doi.org/10.48550/arXiv.2110.04612).
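To give a concrete sense of what this kind of personalization involves, the sketch below fine-tunes a publicly available CTC-based ASR model on a small set of one speaker's recordings. This is not the Euphonia implementation; the base model name, data layout, and hyperparameters are illustrative assumptions.

# Illustrative sketch only: fine-tune a public ASR model on a small
# personalized dataset (e.g., roughly 20 minutes of one speaker's 16 kHz audio).
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

MODEL_NAME = "facebook/wav2vec2-base-960h"  # assumed public base model
processor = Wav2Vec2Processor.from_pretrained(MODEL_NAME)
model = Wav2Vec2ForCTC.from_pretrained(MODEL_NAME)

def fine_tune(recordings, epochs=5, lr=1e-5):
    """recordings: hypothetical list of (waveform, transcript) pairs for one speaker."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for waveform, transcript in recordings:
            # Convert raw audio to model inputs and the transcript to label ids.
            inputs = processor(waveform, sampling_rate=16_000, return_tensors="pt")
            labels = processor.tokenizer(transcript, return_tensors="pt").input_ids
            # CTC loss between the model's predictions and the reference transcript.
            loss = model(inputs.input_values, labels=labels).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    return model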
                              
Working closely with Trusted Testers in the Euphonia program, we learned that personalized models can be very useful, but that for many users, recording dozens or hundreds of examples can be challenging. In addition, the personalized models did not always perform well in freeform conversation. To address these challenges, Euphonia's research efforts have focused on speaker-independent ASR (SI-ASR), so that models work better out of the box for people with non-standard speech and no additional training is necessary. We demonstrated that using Euphonia's speech corpus for model fine-tuning could improve performance on non-standard speech by ~30% (Tobin, J., et al. (2023). "Responsible AI at Google Research: AI for Social Good").
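For context, a relative improvement like the ~30% figure above is typically measured by comparing word error rate (WER) on the same held-out test set before and after fine-tuning. The snippet below is a generic sketch of that calculation; the transcripts passed in are hypothetical placeholders, not Euphonia data.

# Generic sketch: relative WER improvement of a fine-tuned model over a baseline.
from jiwer import wer  # assumes the jiwer library for WER computation

def relative_wer_improvement(references, baseline_hyps, finetuned_hyps):
    """Fraction by which the fine-tuned model reduces WER relative to the baseline."""
    baseline_wer = wer(references, baseline_hyps)
    finetuned_wer = wer(references, finetuned_hyps)
    return (baseline_wer - finetuned_wer) / baseline_wer

# Example: a baseline WER of 0.40 dropping to 0.28 is a 0.30 (30%) relative improvement.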
                              
These contributions have enabled Google's speech and research teams to conduct cutting-edge machine learning research in speech recognition, including the ability to create personalized models that understand individual people, as well as speech-to-speech recognition that allows words to be repeated in a clear synthesized voice. This research also helped us launch Project Relate, an Android app currently in beta, which gives people access to a personalized model that helps make communication more accessible.
                              
We have now expanded our efforts to more languages, including Spanish. This data collection will help Google build more inclusive speech recognition models, including for Spanish speakers.
Researchers and LATAM accessibility groups.
No prerequisites required.
None required.