Amazon enters real-time AI voice race with Nova Sonic, a unified voice model that senses emotion

Apr 8, 2025 - 14:29

Amazon CEO Andy Jassy teased today’s announcement when he unveiled Amazon’s Nova initiative in December at AWS re:Invent in Las Vegas. (GeekWire Photo / Todd Bishop)

What happens when the AI senses the frustration or joy in your voice?

A new speech-to-speech AI model from Amazon, called Nova Sonic, unifies speech recognition and generation to deliver more natural voice interactions — part of the Seattle tech giant’s broader effort to develop human-like intelligence in competition with Google, OpenAI, and others.

Among other advances, Amazon says Nova Sonic picks up on tone of voice, adapting to the style and emotions of users. An angry customer on a support call might hear a calm, steady voice in return, while someone sounding excited could get a more upbeat response.

“I think of intelligence as inseparable from context,” said Rohit Prasad, Amazon’s senior vice president of artificial general intelligence, who leads a central team working on the company’s most advanced AI technology.

“If you’re excited about Hawaii, it will be excited about it,” he explained, as an example. “If you’re not, then it will suggest a separate destination.”

Nova Sonic will be available to third-party developers through Amazon’s Bedrock service. Amazon is already using components of the model internally, in products including its newly released Alexa+ voice assistant.

Unlike traditional voice systems that stitch together separate models for speech recognition, language processing, and text-to-speech, Nova Sonic combines all three in a single architecture, according to the company.

Amazon says this integration allows the model to preserve the full context of a conversation — including intonation, pacing, and intent — making interactions feel more conversational and responsive.

It can also take action in the middle of a conversation, like pulling up flight options or checking an account, without breaking the flow of the interaction.

Amazon is making Nova Sonic available via a new streaming API built for real-time voice applications. It currently supports English with a few different voices and accents. Amazon says it’s working on support for more languages.

Rohit Prasad, Amazon’s senior vice president of AGI. (Amazon Photo)

Nova Sonic enters a growing field of voice and multimodal AI models, as companies race to build more human-like digital assistants. OpenAI recently launched GPT-4o, its own real-time speech model, while Google has added conversational voice capabilities to its Gemini assistant.

Based on its testing, Amazon says Nova Sonic outperforms these rivals on speed and cost, with lower latency and better pricing.

For example, Amazon says Nova Sonic responds in just over a second on average, faster than both OpenAI’s GPT-4o and Google’s Gemini 2.0 Flash in tests run by the research firm Artificial Analysis. The company says Nova Sonic is nearly 80% cheaper to use than GPT-4o for real-time voice interactions.

Prasad, who was previously Alexa’s chief scientist, now oversees Amazon’s AGI group, reporting to Amazon CEO Andy Jassy.

The long-term goal, Prasad said in an interview, is to create unified models that can handle any kind of input and respond in the most natural way — delivering the “general” in artificial general intelligence.

“I actually think you’re merging the powers of the human and machine together,” Prasad said of AGI initiatives. “That’s why this is so important.”

He called Nova Sonic “a huge step” in that direction.

Companies testing Nova Sonic include ASAPP, for customer service calls; Education First, applying it to language learning tools; and Stats Perform, which is using it to deliver real-time sports insights through voice.

Amazon says Nova Sonic is designed to integrate with company systems to access real-time information such as pricing, availability, or schedules. The model can also be used to carry out tasks mid-conversation, including making reservations or offering alternative options.
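
To make the idea concrete, here is a minimal sketch of how an application might route a model's mid-conversation request to a company system and hand back a result to be spoken. It is not Amazon's API: the request shape, the tool names (`check_availability`, `make_reservation`), and the sample data are all hypothetical, chosen only to illustrate the pattern the company describes.

```python
from typing import Callable, Dict

# Hypothetical company-system lookups; names and data are illustrative only.
def check_availability(item: str) -> dict:
    inventory = {"flight-HNL": 4, "flight-SFO": 0}  # stand-in for a real backend
    return {"item": item, "seats": inventory.get(item, 0)}

def make_reservation(item: str, party: int) -> dict:
    return {"item": item, "party": party, "status": "confirmed"}

# Registry mapping tool names the model may request to local functions.
TOOLS: Dict[str, Callable[..., dict]] = {
    "check_availability": check_availability,
    "make_reservation": make_reservation,
}

def handle_tool_request(request: dict) -> dict:
    """Run the requested tool and return a result the model can respond from."""
    name = request.get("name")
    tool = TOOLS.get(name)
    if tool is None:
        return {"error": f"unknown tool: {name}"}
    try:
        return {"result": tool(**request.get("arguments", {}))}
    except TypeError as exc:
        # Missing or unexpected arguments are reported rather than raised.
        return {"error": str(exc)}
```

In this sketch, errors are returned as data instead of raised, so a voice application could let the conversation continue, for example by offering an alternative option, rather than dropping the call.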

Nova Sonic is the latest addition to Amazon’s Nova line of AI models, introduced by Jassy at AWS re:Invent in December, which includes AI for generating and understanding text, images, and video. It follows Amazon’s recent release of a research preview of Nova Act, for building web-based AI agents.
