Brain-computer interfaces are a cutting-edge technology that can help paralyzed people regain functions they have lost, such as moving an arm. These devices record signals from the brain and decode the user’s intended action, bypassing the damaged or degraded nerves that would normally carry those brain signals to the muscles.
Since 2006, demonstrations of brain-computer interfaces in humans have focused primarily on restoring arm and hand movements by enabling people to control computer cursors or robotic arms. Recently, researchers have begun developing brain-computer speech interfaces to restore communication to people who cannot speak.
As the user tries to speak, these brain-computer interfaces record the brain signals associated with the attempted muscle movements of speech and then translate them into words. These words can then be displayed as text on a screen or spoken aloud using text-to-speech software.
I am a researcher in the Neuroprosthetics Lab at the University of California, Davis, which is part of the BrainGate2 clinical trial. My colleagues and I recently demonstrated a brain-computer speech interface that decodes the speech effort of a man with ALS, or amyotrophic lateral sclerosis, also known as Lou Gehrig’s disease. The interface converts neural signals to text with more than 97% accuracy. Key to our system is a set of artificial intelligence language models – artificial neural networks that help interpret natural ones.
Recording brain signals
The first step in our brain-computer speech interface is to record brain signals. There are several sources of brain signals, some of which require surgery to record. Surgically implanted recording devices can capture high-quality brain signals because they are placed closer to neurons, resulting in stronger signals with less interference. These neural recording devices include grids of electrodes placed on the surface of the brain or electrodes implanted directly into brain tissue.
In our study, we used electrode arrays that were surgically implanted in participant Casey Harrell’s speech motor cortex, the part of the brain that controls speech-related muscles. We recorded neural activity from 256 electrodes while Harrell tried to speak.
Decoding brain signals
The next challenge is to associate the complex brain signals with the words the user wants to say.
One approach is to map neural activity patterns directly to spoken words. This method requires multiple recordings of brain signals corresponding to each word to identify the average correlation between neural activity and specific words. While this strategy works well for small vocabularies, as a 2021 study showed with a 50-word vocabulary, it becomes impractical for larger vocabularies. Imagine asking the user of a brain-computer interface to try to say every word in the dictionary multiple times – it could take months, and it still wouldn’t work for new words.
Instead, we use an alternative strategy: mapping brain signals to phonemes, the basic units of sound that make up words. In English, there are 39 phonemes, including ch, er, oo and sh, which can be combined to form any word. We can measure the neural activity associated with each phoneme by asking the participant to attempt to read a few sentences aloud. By accurately mapping neural activity to phonemes, we can combine them into any English word, even ones the system has not been explicitly trained on.
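As a toy illustration of this idea, a pronunciation lexicon maps each word to its phoneme spelling. The handful of entries below are hand-written examples, not the lexicon used in the study:

```python
# A toy illustration: any English word can be spelled as a sequence drawn
# from a small, fixed set of phonemes. These entries are hand-written
# examples, not the study's actual lexicon.
PHONEME_SPELLINGS = {
    "hello":  ["HH", "AH", "L", "OW"],
    "world":  ["W", "ER", "L", "D"],
    "speech": ["S", "P", "IY", "CH"],
}

def spell(word: str) -> list[str]:
    """Look up a word's phoneme sequence in the tiny toy lexicon."""
    return PHONEME_SPELLINGS[word.lower()]

print(spell("speech"))  # ['S', 'P', 'IY', 'CH']
```

Because the decoder only has to recognize the small set of phonemes, a lexicon like this can be searched in reverse: a decoded phoneme sequence can be matched against the spellings of a full vocabulary, including words the participant never practiced.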
To map brain signals to phonemes, we use advanced machine learning models. These models are particularly suited to this task because of their ability to find patterns in large amounts of complex data that would be impossible for humans to recognize. Think of these models as super-smart listeners that can pick out important information from noisy brain signals, like focusing on a conversation in a crowded room. Using these models, we were able to decipher phoneme sequences during attempted speech with over 90% accuracy.
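To make this concrete, here is a minimal sketch in PyTorch of the kind of model that could map a time series of neural features to per-timestep phoneme probabilities. The architecture, layer sizes and the extra “silence” class are illustrative assumptions; this article does not specify the study’s actual network.

```python
# A minimal sketch, not the study's actual decoder: a recurrent network that
# turns a time series of neural features (e.g., activity from 256 electrodes)
# into per-timestep probabilities over 39 phonemes plus a "silence" class.
# All layer sizes here are illustrative assumptions.
import torch
import torch.nn as nn

NUM_ELECTRODES = 256   # feature channels recorded from the electrode arrays
NUM_CLASSES = 39 + 1   # 39 English phonemes plus a silence class

class PhonemeDecoder(nn.Module):
    def __init__(self, hidden_size: int = 512):
        super().__init__()
        self.rnn = nn.GRU(NUM_ELECTRODES, hidden_size,
                          num_layers=3, batch_first=True)
        self.classifier = nn.Linear(hidden_size, NUM_CLASSES)

    def forward(self, neural_features: torch.Tensor) -> torch.Tensor:
        # neural_features: (batch, time_steps, NUM_ELECTRODES)
        hidden, _ = self.rnn(neural_features)
        # Log-probabilities over phoneme classes at every time step.
        return self.classifier(hidden).log_softmax(dim=-1)

# Example: decode 500 time bins of simulated neural activity.
decoder = PhonemeDecoder()
fake_activity = torch.randn(1, 500, NUM_ELECTRODES)
phoneme_log_probs = decoder(fake_activity)  # shape: (1, 500, 40)
```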
From phonemes to words
Once we have the phoneme sequences, we need to convert them into words and sentences. This is challenging, especially when the decoded phoneme sequence contains errors. To solve this problem, we use two complementary types of machine learning language models.
The first is n-gram language models, which predict how likely a word is to follow a given sequence of words. We trained a 5-gram, or five-word, language model on millions of sentences to predict the probability of a word based on the previous four words, capturing local context and common phrases. For example, after “I’m very good,” it might suggest that “today” is more likely than “potato”. Using this model, we convert our phoneme sequences into the 100 most likely word sequences, each with an associated probability.
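A count-based toy version of the n-gram idea is sketched below. The tiny training corpus and the simple maximum-likelihood estimate are assumptions for illustration; the model in the study was trained on millions of sentences.

```python
# A minimal sketch of a count-based 5-gram model: count how often each word
# follows each four-word context, then turn counts into probabilities.
# The toy corpus and the bare-bones estimate are purely illustrative.
from collections import Counter, defaultdict

def train_ngram(sentences, n=5):
    """Count how often each word follows each (n-1)-word context."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        padded = ["<s>"] * (n - 1) + words
        for i in range(len(words)):
            context = tuple(padded[i:i + n - 1])
            counts[context][padded[i + n - 1]] += 1
    return counts

def next_word_probability(counts, context, word):
    """Estimate P(word | previous n-1 words) from the counts."""
    context_counts = counts[tuple(w.lower() for w in context)]
    total = sum(context_counts.values())
    return context_counts[word] / total if total else 0.0

corpus = ["i'm very good today", "i'm very good thanks", "i ate a potato"]
model = train_ngram(corpus, n=5)
context = ("<s>", "i'm", "very", "good")
print(next_word_probability(model, context, "today"))   # 0.5
print(next_word_probability(model, context, "potato"))  # 0.0
```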
The second is large language models, which power AI chatbots and also predict which words are likely to follow others. We use large language models to refine our choices. These models, trained on huge amounts of different texts, have a broader understanding of language structure and meaning. They help us decide which of our 100 candidate sentences make the most sense in a wider context.
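One common way to get such a score from an off-the-shelf large language model is to compute how probable it finds each candidate sentence. The sketch below uses GPT-2 through the Hugging Face transformers library purely as an example; the article does not say which model the study used.

```python
# A minimal sketch of scoring candidate sentences with an off-the-shelf
# causal language model (GPT-2 here, chosen only for illustration).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def sentence_log_likelihood(sentence: str) -> float:
    """Total log-probability the language model assigns to a sentence."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        # Passing labels makes the model return the mean cross-entropy over
        # the predicted tokens; scale back up to a total log-likelihood.
        outputs = model(**inputs, labels=inputs["input_ids"])
    num_predicted = inputs["input_ids"].shape[1] - 1
    return -outputs.loss.item() * num_predicted

candidates = ["I'm very good today", "I'm very good potato"]
scores = {s: sentence_log_likelihood(s) for s in candidates}
for sentence, score in sorted(scores.items(), key=lambda kv: kv[1],
                              reverse=True):
    print(f"{score:8.2f}  {sentence}")
```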
By carefully balancing the probabilities from the n-gram model, the large language model and our initial phoneme predictions, we can make highly educated guesses about what the user of the brain-computer interface is trying to say. This multi-step process allows us to deal with the uncertainties of decoding phonemes and produce coherent, context-appropriate sentences.
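Putting the pieces together, the final choice can be framed as a weighted combination of the three scores. The numbers and weights below are made up for illustration; weights like these would typically be tuned on held-out data rather than set by hand.

```python
# A minimal sketch of the final rescoring step, with made-up numbers:
# each candidate sentence gets a combined score from the phoneme decoder,
# the n-gram model and the large language model.

# Hypothetical log-probabilities for two of the 100 candidate sentences.
candidates = {
    "I'm very good today":  {"phoneme": -12.1, "ngram": -8.4,  "llm": -20.3},
    "I'm very good potato": {"phoneme": -11.8, "ngram": -14.9, "llm": -31.7},
}

# Weights controlling how much each model is trusted (assumed values).
WEIGHTS = {"phoneme": 1.0, "ngram": 0.8, "llm": 0.6}

def combined_score(scores: dict) -> float:
    """Weighted sum of the log-probabilities from each model."""
    return sum(WEIGHTS[name] * logp for name, logp in scores.items())

best = max(candidates, key=lambda s: combined_score(candidates[s]))
print(best)  # "I'm very good today"
```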
Benefits in real life
In practice, this speech decoding strategy has been extremely successful. We enabled Casey Harrell, a man with ALS, to “speak” with over 97% accuracy using his thoughts. This advancement allows him to easily chat with his family and friends for the first time in years, all in the comfort of his own home.
Brain-computer speech interfaces represent a significant step forward in restoring communication. As we continue to refine these devices, they promise to give a voice to those who have lost the ability to speak, reconnecting them with their loved ones and the world around them.
However, challenges remain, such as making the technology more accessible, portable and durable over years of use. Despite these barriers, brain-computer speech interfaces are a powerful example of how science and technology can come together to solve complex problems and dramatically improve people’s lives.
This article is republished from The Conversation, a non-profit, independent news organization that brings you reliable facts and analysis to help you make sense of our complex world. Written by: Nicholas Card, University of California, Davis
Nicholas Card does not work for, consult, own shares in or receive funding from any company or organization that would benefit from this article, and has disclosed no relevant affiliations beyond their academic appointment.