A New Chip Makes Voice Control More Efficient, Less Creepy

Voice assistants suck down a lot of battery power. MIT found a way of making them much thriftier.

Maximizing battery life remains the great challenge for every smartphone manufacturer. People use their phones for everything these days, and of course they want a battery that lasts forever and charges in minutes. Engineers have a few ways of tackling this problem beyond packing a bigger, and potentially more dangerous, lithium-ion battery inside. The most effective trick is making the chips, drivers, and other components as energy efficient as possible.

The obvious targets include big screens, 4G modems, and Bluetooth. But researchers are taking a close look at the always-on voice assistants that let you bark a command without touching a button. Those things can suck down a lot of power, but researchers at MIT's Microsystems Technology Laboratories found a way of making them thriftier, too.

Keep It on the Chip

Alexa, Siri, and Google Assistant use the cloud to process voice commands. In a clever twist, the MIT chip handles much more of that processing itself, easing the burden on other components and saving power. “Even if power consumption isn't an issue, hardware accelerators can be useful for making devices simpler and lower cost,” says Michael Price, who designed the new chip. “If you can offload a difficult computation from the main processor, that processor doesn't have to be as fast.”

That means manufacturers can use a less expensive processor. Cheaper is good, but increased efficiency is better, and Price set out to radically reduce the amount of power required to drive voice-assistant features. Smartphones generally need 1 watt of power to drive a single speech-recognition query, Price says. The system his team developed requires about 1/100 as much in the worst case.1 Some basic voice-processing functions sipped just 0.2 milliwatts, making them 5,000 times more efficient than the 1-watt benchmark.
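As a rough sanity check on those figures (a back-of-the-envelope calculation using only the numbers quoted above, not anything from MIT's paper), the arithmetic works out like this:

```python
# Back-of-the-envelope check of the power figures quoted above.
phone_query_w = 1.0                 # typical smartphone speech-recognition query
worst_case_w = phone_query_w / 100  # the MIT chip's worst case
basic_function_w = 0.2e-3           # basic voice-processing functions: 0.2 milliwatts

print(worst_case_w * 1000)               # 10.0 -> the worst case is about 10 milliwatts
print(phone_query_w / basic_function_w)  # 5000.0 -> the quoted 5,000x figure
```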

Power Save

The primary energy savings come from making the chip more adept at recognizing speech. Instead of streaming audio over a web connection to a server, the processor converts speech to text locally. Handling those queries as text consumes far less power. The system is also more power-savvy when detecting speech: a low-power circuit within the chip notices when ambient noise is interrupted by a voice, then triggers the primary system once a voice command is registered.
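That two-stage arrangement, a cheap always-on detector gating a heavier recognizer, is a common pattern in wake-on-voice hardware. Here is a minimal sketch of the idea in Python; the threshold and function names are hypothetical illustrations, not details of the MIT design:

```python
# Illustrative sketch of the two-stage "wake on voice" pattern described above.
# Thresholds and helper names are hypothetical, not taken from the MIT chip.

def energy_of(frame):
    """Mean squared amplitude of one audio frame (a list of samples)."""
    return sum(s * s for s in frame) / len(frame)

def voice_detected(frame, threshold=0.01):
    """Cheap, always-on check: does this frame rise above ambient noise?"""
    return energy_of(frame) > threshold

def run_recognizer(frames):
    """Stand-in for the expensive speech-to-text stage; only runs when triggered."""
    return "<decoded command text>"

def listen(audio_frames):
    """Wake the power-hungry recognizer only after the cheap detector fires."""
    buffered = []
    for frame in audio_frames:
        if voice_detected(frame):
            buffered.append(frame)           # accumulate the spoken command
        elif buffered:
            return run_recognizer(buffered)  # silence again: decode what was heard
    return run_recognizer(buffered) if buffered else None
```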

The research turned up a counterintuitive finding. Price's team tested three voice-detection circuits and found that the most power-hungry of them delivered the greatest overall energy savings. Why? Because it registered fewer false positives than the others, which often woke the speech-processing chip after mistaking ambient noise for a voice command.
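A quick worked example shows why that can happen. The numbers below are invented purely for illustration, not taken from the study; the point is that every false positive wakes the expensive recognizer, so a pickier detector can repay its extra draw many times over:

```python
# Invented numbers illustrating the false-positive tradeoff; none are from the study.
RECOGNIZER_POWER_MW = 10.0  # full speech recognizer while it is active
WAKE_DURATION_S = 2.0       # how long each false wake-up keeps it running
DAY_S = 24 * 3600

def daily_energy_mj(detector_power_mw, false_wakes_per_day):
    """Total daily energy in millijoules: always-on detector plus wasted wake-ups."""
    detector = detector_power_mw * DAY_S
    wasted = false_wakes_per_day * RECOGNIZER_POWER_MW * WAKE_DURATION_S
    return detector + wasted

print(daily_energy_mj(detector_power_mw=0.05, false_wakes_per_day=2000))  # ~44,320 mJ
print(daily_energy_mj(detector_power_mw=0.20, false_wakes_per_day=50))    # ~18,280 mJ
```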

Its minuscule power requirement means you could see the chip providing voice-control capabilities in the next wave of smaller IoT devices with tiny batteries. That said, Price couldn't comment on whether you'll see the technology in consumer products anytime soon. MIT developed the chip specifically for battery-powered gadgets, but similar components could impact how plug-in devices such as the Amazon Echo and Google Home work.

“If an in-home device is doing speech recognition locally and that turns out to be a processing bottleneck, then our technology could be useful,” Price says.

Find Your Voice

The sound clips recorded by your Echo and Home stay on the companies' servers even after they're processed. Amazon and Google have excellent security and privacy records, but more on-device processing means less personal data stored in the cloud. Price says converting speech to text on the device before zinging it to a server would strip some information from the captured data, such as cues to the speaker’s age, accent, and gender, along with any background noise.

“Of course, privacy is up to the system designer,” Price says. “There's nothing stopping them from saving the audio on a device or transmitting it, even if the speech recognition is done locally.” True. But any technology that makes voice commands more efficient and less creepy is a good thing.

1 UPDATE March 10, 2017, 10:00 a.m. ET: This story has been updated to correct statistics for MIT's new chip technology. The worst-case scenario in their testing drew 1/100th of a watt, not 1/1,000th of a watt.