Scientists are getting closer to developing a computer-based tool that would allow people with severe speech impairments, like the late cosmologist Stephen Hawking, to communicate verbally.
In a paper published today in the journal Nature, a team of researchers at the University of California San Francisco (UCSF) report that they’re working on an early computerized system that can decode brain signals from movements made while speaking, and then translate those movements into sounds. The authors said in a press briefing that the study is a proof of principle that it’s possible to synthesize speech by reading brain activity. “It’s been a long-standing goal of our lab to create technologies to restore communications for people with severe speech disability,” says co-author Dr. Edward Chang, a neurosurgeon at UCSF.
The UCSF team’s system works in two stages. In the first, a device surgically attached to the surface of the brain picks up neural activity for vocal tract movements. That neural activity is used to estimate the physical movements of the jaw, larynx, lips, and tongue while a person is speaking. In the second stage, those movements are decoded so the computer can recreate the sounds as synthesized speech, like an artificial vocal tract.
To train the system, the researchers asked people without speech disabilities to carefully read sentences while the researchers recorded their neural activity. The men and women, who suffer from epilepsy, already needed a sensor on their brains as part of their clinical treatment and agreed to participate in the study.
The authors report that their computer’s synthesizer-produced speech had energy patterns that closely tracked those of the original spoken sentences. But there are still challenges to making computer-synthesized sounds understandable. In a test reported in this study, men and women were asked to transcribe the synthesized speech by choosing from a list of provided words, and the authors report that about 70% of words were correctly transcribed.
One of the team’s most impressive achievements is that they managed to decode speech signals in the brain in real time. In the synthetic speech system that Hawking famously used, he controlled a cursor by moving his cheek, and the software’s predictive algorithm would help him select words he wanted to type. A spelling-based system like that could produce about 5 to 10 words per minute, according to Chang. The new system works at natural rates of speaking, around 120 to 150 words per minute, and has the potential to help people communicate far faster than spelling-based systems.
In another test, the team had study subjects mime the sentences without producing any sound. The system was still able to decode the signals based on the brain activity from the vocal tract movements. “This is an interesting finding in the context of future speech prostheses for people unable to speak,” says Blaise Yvert of the University Grenoble Alpes, who has also published research on speech synthesizers and was not involved with the study. “Yet this result should be confirmed in other participants and also when participants imagine speaking without performing any movement.”
Other research groups are also getting closer to a functioning brain activity-based speech decoder. Nima Mesgarani of Columbia University led a team that published similar experiments earlier this year in Scientific Reports. Mesgarani’s team focused on neural activity in the sensory cortex, the part of the brain where speech perception happens, while the UCSF team focused on the motor cortex, the part of the brain where the muscular movements behind speech production are controlled. “What approach will ultimately prove better for decoding the imagined speech condition remains to be seen, but it is likely that a hybrid of the two may be the best,” says Mesgarani.
By modeling the vocal tract movements, “the authors tap into existing neural processes for speech production that are likely generative as they demonstrated in their mimed-speech condition, and somewhat more intuitive for individuals to use in future clinical applications to restore speech for individuals with severe speech and physical impairments,” says Jonathan Brumberg of the University of Kansas, who was not involved with the study.
While the UCSF team is mostly focused on the engineering of the system itself, they acknowledge the potential for clinical trials and studies that will include people with communication disabilities. But while the underlying mechanisms of how brain activity relates to speech are the same across people, each person’s brain is different, so future attempts to use this technology in someone who cannot speak would need to be personalized appropriately.
The system also requires placing a sensor directly on the brain, which limits the pool of people available to train the system. (No ethics board would allow study subjects to have an invasive device put on their brain solely for research, which is why the Nature study relied on people who already needed to have such devices implanted.) “It will be exciting to see over coming years whether similar results can be obtained using noninvasive brain-imaging approaches that do not involve surgery,” says Ian Wiggins of the Nottingham Biomedical Research Centre, who was not involved with the study. “If so, this could really open up new possibilities for people who have lost the ability to communicate because of neurological damage.”