Resemble AI Releases Detect-2B: A Highly Accurate Deepfake Detection Model

Resemble AI, a voice cloning company, has unveiled its latest deepfake detection model, Detect-2B, which the company reports is around 94% accurate. Detect-2B combines pre-trained sub-models with fine-tuning to analyze audio clips and determine whether they were generated using AI.

The company explained that Detect-2B is a significant improvement over its previous model, with enhancements in model architecture, training data, and overall performance. In a blog post, Resemble AI described Detect-2B as an exceptionally robust and accurate deepfake detection model, a claim it says is backed by extensive evaluation against a vast dataset of real and fake audio clips.

Detect-2B’s sub-models consist of a frozen audio representation model with an adaptation module inserted into its key layers. The adaptation module shifts the model's focus to artifacts, the accidental sounds that are often present in real audio but absent in fake audio. Detect-2B predicts how much of a clip is AI-generated, without needing to be retrained, and uses that prediction to distinguish authentic clips from AI-generated ones. The sub-models are also trained on large datasets to improve performance.
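The idea of a frozen model with a small trainable adaptation module can be sketched in Python. This is only an illustration under stated assumptions, not Resemble AI's implementation: the article does not describe the adapter's internals, so the residual bottleneck design, the class names `FrozenLayer` and `Adapter`, and all numbers are hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a matrix (sequence of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class FrozenLayer:
    """Stand-in for one pretrained layer (hypothetical).

    Weights are stored as tuples so they cannot be changed in place.
    """

    def __init__(self, weights):
        self.weights = tuple(tuple(row) for row in weights)

    def __call__(self, x):
        return matvec(self.weights, x)


class Adapter:
    """Small trainable bottleneck added residually to a layer's output.

    Assumed design: down-projection, ReLU, up-projection, residual add.
    """

    def __init__(self, dim, bottleneck):
        self.down = [[0.01] * dim for _ in range(bottleneck)]
        # Zero-initialized up-projection: the adapter starts as a no-op.
        self.up = [[0.0] * bottleneck for _ in range(dim)]

    def __call__(self, h):
        z = [max(0.0, v) for v in matvec(self.down, h)]
        delta = matvec(self.up, z)
        return [hi + di for hi, di in zip(h, delta)]


layer = FrozenLayer([[1.0, 0.0], [0.5, 0.5]])
adapter = Adapter(dim=2, bottleneck=1)
x = [2.0, 4.0]

before = adapter(layer(x))  # identical to the frozen layer's output
adapter.up[0][0] = 0.5      # stand-in for a fine-tuning update
after = adapter(layer(x))   # only the adapter changed the result
print(before, after)
```

In this sketch, fine-tuning touches only the adapter's few parameters while the pretrained weights stay fixed, which is the general reason adapter-style designs are cheap to train.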

To determine the authenticity of a recording, Detect-2B aggregates the sub-models' prediction scores and compares the result against a carefully calibrated threshold. Resemble AI designed the model to be fast to train and to require little computing power to deploy.
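The decision step described above can be sketched in a few lines of Python. This is a minimal sketch, not the company's method: the article does not say how scores are combined or what the threshold is, so the mean aggregation, the function name `classify_clip`, the 0.5 default, and the sample scores are all assumptions.

```python
from statistics import fmean


def classify_clip(sub_model_scores, threshold=0.5):
    """Aggregate per-sub-model fake-probability scores and apply a threshold.

    Hypothetical helper: mean aggregation and the 0.5 default are
    assumptions, not values published for Detect-2B.
    """
    if not sub_model_scores:
        raise ValueError("need at least one score")
    if any(not 0.0 <= s <= 1.0 for s in sub_model_scores):
        raise ValueError("scores must lie in [0, 1]")
    aggregate = fmean(sub_model_scores)
    label = "fake" if aggregate >= threshold else "real"
    return aggregate, label


# Made-up scores from three hypothetical sub-models for a single clip.
score, label = classify_clip([0.91, 0.84, 0.77])
print(f"{score:.2f} {label}")
```

In practice the threshold would be calibrated on held-out data to trade off false alarms against missed deepfakes; the fixed default here is only a placeholder.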

Detect-2B's architecture is based on Mamba-SSM, a type of state space model. Rather than relying on static data or recurring patterns, it is described as using a stochastic, or random probabilistic, model. This architecture is said to be particularly effective for audio detection because it captures the varying dynamics in an audio clip, adapts between different states of an audio signal, and maintains performance even with low-quality recordings.
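For readers unfamiliar with state space models, the sketch below shows the basic recurrence they are built on: a hidden state is updated at each time step from the previous state and the current input, and the output is read from that state. It covers only the generic, deterministic linear form with a scalar state. It does not implement Mamba's input-dependent selection mechanism or any part of Detect-2B, and the function name `ssm_scan` and the parameter values are assumptions chosen for illustration.

```python
def ssm_scan(inputs, a=0.9, b=0.1, c=1.0):
    """Run a scalar discrete-time state space model over a sequence.

    state update:  x_t = a * x_{t-1} + b * u_t
    readout:       y_t = c * x_t
    """
    state = 0.0
    outputs = []
    for u in inputs:
        state = a * state + b * u
        outputs.append(c * state)
    return outputs


# A constant input of 1.0: the output rises toward b / (1 - a) = 1.0,
# showing how the state carries information from earlier samples forward.
ys = ssm_scan([1.0] * 50)
print(ys[0], ys[-1])
```

Because the state summarizes everything seen so far, the per-step cost stays constant regardless of sequence length, which is one reason this family of models is attractive for long signals such as audio.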

Resemble AI evaluated Detect-2B using a test set that included unseen speakers, deepfake-generated audio, and different languages. The model detected deepfake audio with at least 93% accuracy across six different languages.

Detect-2B will be available through an API and can be integrated into various applications. The ability to identify AI-generated voices and videos has become increasingly important, especially in the lead-up to the 2024 U.S. presidential election. Because AI voices can mislead voters and spread misinformation, tools like Detect-2B play a crucial role in detecting and exposing deepfakes before they reach the public. Resemble AI is not alone in this effort: other companies, including McAfee and Meta, are working on their own methods of detecting AI voice clones and adding watermarks to AI-generated audio.

Resemble AI acknowledges that there is still work to be done, recognizing the need for continuous improvement in deepfake detection as generative AI capabilities advance. The company plans to focus on areas such as representation learning, advanced model architectures, and data expansion to further enhance Detect-2B’s performance.