Being able to hack a speech-to-text system may seem virtually pointless on the surface. So you can ruin someone’s day by messing with their transcription? I speak from experience: if someone has spent hours transcribing, their day has already hit rock bottom.
But researchers from the University of California, Berkeley believe that their latest achievement – embedding hidden messages within audio files to trick a speech-to-text programme – could have serious repercussions in a world where Siri, Google, Alexa, Bixby and Cortana are lining up to organise our information.
To be clear, that’s not what they’ve achieved yet: for now they have worked only with Mozilla’s open-source DeepSpeech programme, but the principles are the same. As the white paper explains: “Given any audio waveform, we can produce another that is over 99.9% similar but transcribes as any phrase we choose (at a rate of up to 50 characters per second.)” The attack worked with a 100% success rate.
So far, so sinister, but it gets worse: “By starting with an arbitrary waveform instead of speech (such as music), we can embed speech into audio that should not be recognised as speech; and by choosing silence as the target, we can hide audio from a speech-to-text system.”
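To make the "nearly identical audio, different transcript" idea concrete, here is a minimal sketch in Python. It is not DeepSpeech and not the researchers' method: the "recogniser" is a made-up linear model that only ever outputs "yes" or "no", and every name in it (`toy_transcribe`, `craft_adversarial`, `make_example`) is hypothetical. It only illustrates how many tiny, bounded changes to individual samples can add up to a different output.

```python
import math
import random


def toy_transcribe(waveform, weights, bias):
    """Hypothetical two-word 'recogniser': the sign of a linear score picks the label."""
    score = sum(w * x for w, x in zip(weights, waveform)) + bias
    return "yes" if score > 0 else "no"


def craft_adversarial(waveform, weights, bias, target, eps=0.01, step=0.001, max_iters=100):
    """Shift each sample by at most eps until the toy model outputs target.

    Returns the altered waveform, or None if no result was found within the budget.
    """
    direction = 1.0 if target == "yes" else -1.0
    delta = [0.0] * len(waveform)
    for _ in range(max_iters):
        candidate = [x + d for x, d in zip(waveform, delta)]
        if toy_transcribe(candidate, weights, bias) == target:
            return candidate
        for i, w in enumerate(weights):
            # For a linear score, the gradient with respect to sample i is just w.
            sign = (w > 0) - (w < 0)
            moved = delta[i] + direction * step * sign
            delta[i] = max(-eps, min(eps, moved))  # stay within the budget
    return None


def make_example(n=1000, seed=0):
    """Build a 440 Hz tone and a random toy model that hears it as 'no'."""
    rng = random.Random(seed)
    weights = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    tone = [0.5 * math.sin(2 * math.pi * 440 * t / 16000) for t in range(n)]
    clean_score = sum(w * x for w, x in zip(weights, tone))
    bias = -clean_score - 1.0  # places the clean tone's score at about -1
    return tone, weights, bias


if __name__ == "__main__":
    tone, weights, bias = make_example()
    adversarial = craft_adversarial(tone, weights, bias, target="yes")
    print(toy_transcribe(tone, weights, bias), "->", toy_transcribe(adversarial, weights, bias))
    print("largest change:", max(abs(a - b) for a, b in zip(adversarial, tone)))
```

In this toy, no sample moves by more than `eps`, a small fraction of the tone's 0.5 peak, yet the label flips. The paper's attack targets a full neural network rather than a linear score, so treat this purely as an illustration of the general idea.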
So in theory, using the same principles, they should be able to get virtual assistants to do their bidding as well, whether it’s an Amazon Echo or a Google Home device. When you say a phrase to your smart speaker, the first thing it does is convert the audio to text so it can understand what you want to be done. In fact, one of the researchers had previously managed to get Google Assistant to do tricks without asking it directly, as in the video below:
Kind of neat, but not too worrying. I think most people would smell a rat if they heard the girl from The Exorcist chatting to their Google Home. But what the new research shows is that the same results can be delivered so subtly, you might not even know it’s happening. Nicholas Carlini, one of the authors of the paper, told The Next Web that he’d “feel confident in saying that with some more work, someone will be able to make our audio adversarial examples also work over-the-air.”
In other words, a cyber attack aimed at the Echo’s virtual ears could be embedded in a hit song on the radio or an advert on TV. We’ve already seen overt but benign tricks where adverts and TV shows deliberately trigger the Amazon Echo into doing their bidding. The chances that malicious troublemakers will aim for something more damaging in the future seem pretty high to me. Right now that has limited potential for mischief – getting Alexa to remotely lower your smart thermostat is irritating, but hardly the end of the world. If we start outsourcing more serious matters to our virtual assistants – banking, say – then this might be one piece of research that Amazon and Google’s engineers keep returning to.