Google’s DeepMind Unveils Video-to-Audio Technology for AI-Generated Videos

Google’s DeepMind lab has made significant progress on “video-to-audio” (V2A) technology, which generates synchronized audio for AI-generated videos. Sound is central to how films and videos connect with audiences emotionally; even in the silent-film era, a live accompanist set the tone and evoked specific emotions.

The emergence of generative AI video has presented a new challenge: these videos are often silent. To address this, Google has been working on V2A technology, which automatically generates soundtracks and dialogue synchronized with AI-generated videos. The work is part of Google’s broader push to compete with rivals such as OpenAI, whose Sora generates AI video and whose GPT-4o produces AI voice responses.

While companies like Meta and Suno have explored AI-generated audio and music, pairing audio with video is relatively new ground. DeepMind’s V2A stands out because it doesn’t require text prompts to generate audio, unlike tools such as ElevenLabs’ offering, which match audio to video from a text description. Instead, V2A uses a diffusion model, trained on visual inputs, natural-language prompts, and video annotations, that gradually refines random noise into audio fitting the tone and context of the video.
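DeepMind has not released code, but the iterative-refinement loop at the heart of any diffusion model can be sketched in a few lines. The toy Python example below starts from random noise and repeatedly nudges it toward a signal derived from (fake) video frames; every function, name, and shape here is an illustrative assumption, not DeepMind’s architecture.

```python
import numpy as np

# Toy illustration of the diffusion idea described above -- NOT DeepMind's
# code. encode_video, denoise_step, and all shapes are assumptions made
# purely for demonstration.

rng = np.random.default_rng(0)

def encode_video(frames: np.ndarray) -> float:
    """Stand-in for a learned video encoder that maps raw pixels to a
    conditioning signal. A real system would use a neural network."""
    return float(frames.mean())

def denoise_step(audio: np.ndarray, cond: float, t: int, steps: int) -> np.ndarray:
    """Stand-in for a learned denoiser: nudge noisy audio toward a signal
    consistent with the video conditioning. Here the 'signal' is just a
    sine wave whose frequency depends on the conditioning value."""
    target = np.sin(np.linspace(0.0, 2.0 * np.pi * 20.0 * (1.0 + cond), audio.size))
    alpha = 1.0 / (steps - t)  # later steps move more decisively
    return audio + alpha * (target - audio)

frames = rng.random((16, 32, 32, 3))   # fake video: 16 RGB frames of 32x32
cond = encode_video(frames)            # visual conditioning, no text prompt

steps = 50
audio = rng.standard_normal(8000)      # start from pure random noise
for t in range(steps):
    audio = denoise_step(audio, cond, t, steps)

print(audio[:5])  # refined samples: structured waveform, no longer raw noise
```

The point of the sketch is the shape of the process: generation begins as pure noise and is refined step by step under conditioning from the video itself, which is why no text prompt is strictly required.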

V2A can be paired with AI video generators such as Google Veo, or applied to archival footage and silent films, to create soundtracks, sound effects, and even dialogue. Because the model understands raw pixels, text prompts are unnecessary, though they can improve accuracy. Users can also supply “positive” prompts to steer the audio toward desired sounds, or “negative” prompts to steer it away from unwanted ones, as sketched below.
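Since V2A is not yet public, its interface is unknown; the following is a purely hypothetical Python sketch of what prompt-optional generation with positive and negative prompts might look like. The class, function, and parameter names are all assumptions, not a real Google API.

```python
from dataclasses import dataclass
from typing import Optional

# Purely hypothetical client sketch -- V2A has no public API. V2ARequest,
# generate_audio, and both prompt fields are assumptions for illustration.

@dataclass
class V2ARequest:
    video_path: str
    positive_prompt: Optional[str] = None  # steer toward desired sounds
    negative_prompt: Optional[str] = None  # steer away from unwanted sounds

def generate_audio(request: V2ARequest) -> bytes:
    """Placeholder: a real implementation would run the diffusion model on
    the video's raw pixels, conditioned on any prompts, and return a
    synchronized audio track."""
    raise NotImplementedError("V2A has not been released publicly")

# Prompts are optional -- the model reads raw pixels -- but they can sharpen
# the result, and a negative prompt can suppress unwanted sounds:
request = V2ARequest(
    video_path="cowboy_at_sunset.mp4",
    positive_prompt="mellow harmonica over desert wind",
    negative_prompt="dialogue",
)
```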

To demonstrate the capabilities of V2A, DeepMind released several demo videos. These include a video of a dark, creepy hallway accompanied by horror music, a lone cowboy at sunset scored to a mellow harmonica tune, and an animated figure discussing its dinner. DeepMind plans to include Google’s SynthID watermarking as a measure to prevent misuse of the technology. Currently, V2A is undergoing testing and will be released to the public once it meets the necessary quality standards.

In conclusion, Google’s DeepMind lab has made significant progress on video-to-audio (V2A) technology that generates synchronized audio for AI-generated videos, sharply reducing the manual work of creating soundtracks, sound effects, and dialogue. Its diffusion model, trained on visual inputs and video annotations, refines random noise into audio matched to a video’s tone and context; text prompts are optional but can improve accuracy. V2A is still in testing and will be released to the public in the future.