
In brief
- Researchers at Zhejiang University developed AudioHijack, which hides imperceptible commands in audio to manipulate large audio-language models with a 79–96% success rate.
- The attack transferred from open models to commercial voice AI from Microsoft and Mistral; most standard defenses stopped only a small fraction of attempts.
- The team is now investigating whether the technique can reach closed models from OpenAI and Anthropic through shared open-source audio components.
University researchers in China have found a way to alter the behavior of AI voice models by embedding commands that humans cannot hear inside audio clips. The attack succeeds as often as 96% of the time, according to research out of Zhejiang University.
The attack method, presented at the 47th IEEE Symposium on Security and Privacy in San Francisco, targets large audio-language models, or LALMs, which can process spoken commands and interact with external tools and applications.
“It takes just half an hour to train this signal, and then, because this signal is context-agnostic, you can use it to attack the target model whenever you want, no matter what the user says,” lead author Meng Chen, a Ph.D. student at Zhejiang University, said in a statement.
The attack works by modifying the numerical values inside a digital audio waveform in ways that are not perceptible to human listeners but still affect how AI models interpret the signal. Researchers said the manipulated audio can override or redirect a model’s behavior even when legitimate user instructions are included with the clip.
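To make the idea of changing waveform values below the threshold of hearing concrete, here is a minimal sketch. It is not the researchers' method: the random noise is only a stand-in for their trained signal, and the sample rate, the per-sample bound `EPSILON`, and all function names are assumptions chosen for illustration. It shows how small a bounded change is relative to the underlying sound, measured as a signal-to-noise ratio.

```python
import math
import random

SAMPLE_RATE = 16_000  # samples per second (assumed for this sketch)
EPSILON = 0.002       # largest allowed change per sample on a [-1, 1] scale (assumed)


def make_tone(freq_hz=440.0, seconds=0.5, amplitude=0.5):
    """Generate a sine tone as a list of float samples in [-1, 1]."""
    n = int(SAMPLE_RATE * seconds)
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
        for i in range(n)
    ]


def perturb(samples, epsilon=EPSILON, seed=0):
    """Shift every sample by at most epsilon.

    Random noise is a placeholder here; it carries no hidden instruction.
    """
    rng = random.Random(seed)
    out = []
    for s in samples:
        delta = rng.uniform(-epsilon, epsilon)
        # Keep the result inside the valid sample range.
        out.append(max(-1.0, min(1.0, s + delta)))
    return out


def snr_db(clean, modified):
    """Signal-to-noise ratio in decibels between a clip and its modified copy."""
    signal_power = sum(s * s for s in clean)
    noise_power = sum((m - s) ** 2 for s, m in zip(clean, modified))
    return 10 * math.log10(signal_power / noise_power)


clean = make_tone()
modified = perturb(clean)
max_change = max(abs(m - s) for s, m in zip(clean, modified))

print(f"largest per-sample change: {max_change:.5f}")
print(f"signal-to-noise ratio: {snr_db(clean, modified):.1f} dB")
```

The numbers in the two clips differ at almost every sample, yet each difference is a tiny fraction of the tone's amplitude. A model reads those numbers directly, which is why changes of this size can matter to it even when a listener would not notice them.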
AudioHijack differs from traditional prompt injection attacks because it does not manipulate what the user says to the AI. Instead, it alters the audio signal itself, embedding hidden instructions as changes humans cannot hear. Researchers said that makes the attack harder to defend against because it bypasses safeguards designed to detect suspicious text prompts.
The researchers tested AudioHijack on 13 open-source AI voice models and found that it could make them refuse requests, spread false information, insert harmful links, change personality, or perform actions the user never asked for, including running web searches, downloading files, and sending emails containing personal data. The attacks also worked on commercial voice AI systems from Microsoft and Mistral that use similar technology.
“Many previous attacks on generative models required the attacker to have complete control over both the final audio input and original instructions given to the model, essentially acting as the user,” the study said. “Here, the attacker manipulates only the audio data being processed by the model, which makes it possible to attack a model while it’s being used by someone else.”
According to the study, possible delivery methods include online videos, music clips, voice notes, or audio from Zoom calls uploaded to AI transcription services. The team also said unpublished follow-up work demonstrated similar attacks in live AI voice chats.
The researchers said monitoring a model’s internal attention mechanisms was the most effective defense they tested. However, they also found that attackers aware of the defense could reduce the strength of the manipulation while maintaining much of the attack’s effectiveness.
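The article does not describe how the attention monitoring works, so the following is only a hypothetical sketch of what such a check could look like, not the detector the researchers tested. It assumes a monitor can read, for each generated token, how much attention went to each input position, and that it knows which positions came from the audio clip rather than the user's instruction. The function names, the toy attention values, and the 0.8 threshold are all invented for illustration.

```python
def audio_attention_share(attn_rows, audio_positions):
    """Average fraction of attention that lands on audio-derived input positions.

    attn_rows: one list of non-negative attention weights per generated token.
    audio_positions: indices of input positions that came from the audio clip.
    """
    shares = []
    for row in attn_rows:
        total = sum(row)
        on_audio = sum(row[i] for i in audio_positions)
        shares.append(on_audio / total)
    return sum(shares) / len(shares)


def looks_suspicious(attn_rows, audio_positions, threshold=0.8):
    """Flag a response whose attention is dominated by the audio clip (assumed rule)."""
    return audio_attention_share(attn_rows, audio_positions) > threshold


# Toy inputs: positions 0-1 hold the user's instruction, positions 2-3 the audio.
AUDIO_POSITIONS = [2, 3]
benign_rows = [[0.5, 0.3, 0.1, 0.1], [0.4, 0.4, 0.1, 0.1]]
hijacked_rows = [[0.05, 0.05, 0.5, 0.4], [0.05, 0.05, 0.45, 0.45]]

print("benign flagged:", looks_suspicious(benign_rows, AUDIO_POSITIONS))
print("hijacked flagged:", looks_suspicious(hijacked_rows, AUDIO_POSITIONS))
```

A fixed threshold like this also illustrates the weakness the researchers reported: an attacker who knows the rule can aim for a weaker manipulation that keeps the audio's share of attention just under the cutoff.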
“These single-point defenses struggle to resist our attack because we found it’s very hard for these models to distinguish the normal user intent and our adversary attack,” Chen said.