
Assembly AI Introduces Universal-1 Model with a Remarkable 30% Reduction in Hallucinations Compared to Whisper

Assembly AI, an AI-as-a-service provider, has introduced a new speech recognition model called Universal-1. The model was trained on over 12.5 million hours of multilingual audio data, and the company reports improved speech-to-text accuracy across English, Spanish, French, and German.

A key claim for Universal-1 is fewer hallucinations: according to Assembly AI, the model hallucinates 30% less on speech data and 90% less on ambient noise than OpenAI’s Whisper Large-v3. The company presents this as evidence of its focus on accurate, robust speech-to-text across multiple languages.

Universal-1 also offers improved timestamp estimation, which is useful for tasks such as audio and video editing and conversation analytics. Assembly AI claims that Universal-1 is 13% better than its predecessor, Conformer-2, on this measure, which in turn yields better speaker diarization: a 14% improvement in concatenated minimum-permutation word error rate (cpWER) and more accurate speaker count estimation.
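For readers unfamiliar with cpWER, the sketch below shows how the metric is commonly defined: each speaker's utterances are concatenated, the word error rate is computed for every possible pairing of reference and hypothesis speakers, and the lowest result is kept. It is a minimal illustration, not Assembly AI's evaluation code; the function names and the dict-of-utterances input format are assumptions made for this example.

```python
from itertools import permutations


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cpwer(ref_by_speaker, hyp_by_speaker):
    """Concatenated minimum-permutation WER.

    Both arguments map a speaker label to that speaker's utterances
    (a hypothetical input format chosen for this sketch).
    """
    # Step 1: concatenate each speaker's utterances into one word stream.
    refs = [" ".join(utts).split() for utts in ref_by_speaker.values()]
    hyps = [" ".join(utts).split() for utts in hyp_by_speaker.values()]

    # Pad with empty streams so both sides have the same speaker count.
    n = max(len(refs), len(hyps))
    refs += [[]] * (n - len(refs))
    hyps += [[]] * (n - len(hyps))

    total_ref_words = sum(len(r) for r in refs)
    if total_ref_words == 0:
        raise ValueError("reference transcript is empty")

    # Step 2: try every speaker assignment and keep the fewest errors.
    # This is factorial in the speaker count, fine for a handful of speakers.
    best_errors = min(
        sum(edit_distance(r, h) for r, h in zip(refs, perm))
        for perm in permutations(hyps)
    )
    return best_errors / total_ref_words


reference = {"A": ["hello there", "how are you"], "B": ["fine thanks"]}
hypothesis = {"spk0": ["fine thank"], "spk1": ["hello there how are you"]}
print(f"cpWER: {cpwer(reference, hypothesis):.3f}")
```

Because the metric searches over speaker assignments, a system is not penalized for naming speakers differently from the reference, only for attributing words to the wrong speaker or transcribing them incorrectly.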

Another advantage of Universal-1 is its efficient parallel inference, which significantly reduces processing time for long audio files. Assembly AI compared the processing speed of Universal-1 with Whisper Large-v3 on NVIDIA Tesla T4 GPUs with 16 GB of VRAM. In these tests, Universal-1 was five times faster than Whisper Large-v3 when transcribing one hour of audio using a batch size of 64. Even with a smaller batch size of 24, Universal-1 still outperformed Whisper Large-v3, taking only 21 seconds compared to 107 seconds.
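The two timings quoted above can be turned into a speedup and a real-time factor with simple arithmetic. The snippet below is only a worked calculation on the reported numbers (21 seconds and 107 seconds for one hour of audio); the helper names are illustrative and nothing here reproduces the benchmark itself.

```python
def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


def real_time_factor(processing_seconds, audio_seconds):
    """Processing time per second of audio; lower is faster."""
    return processing_seconds / audio_seconds


AUDIO_SECONDS = 60 * 60      # one hour of audio, as in the reported test
UNIVERSAL_1_SECONDS = 21     # reported Universal-1 processing time
WHISPER_SECONDS = 107        # reported Whisper Large-v3 processing time

ratio = speedup(WHISPER_SECONDS, UNIVERSAL_1_SECONDS)
print(f"Speedup: {ratio:.1f}x")  # about 5.1x
print(f"Universal-1 RTF: {real_time_factor(UNIVERSAL_1_SECONDS, AUDIO_SECONDS):.4f}")
print(f"Whisper RTF:     {real_time_factor(WHISPER_SECONDS, AUDIO_SECONDS):.4f}")
```

The ratio of 107 to 21 seconds works out to roughly 5.1, which is in line with the "five times faster" figure.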

More accurate speech-to-text models like Universal-1 have several practical benefits. Notetakers can generate more accurate notes with fewer hallucinations, while applications with AI-powered video editing workflows can benefit from better metadata, such as correctly identified proper nouns and speaker timing information. Telehealth platforms can also use Universal-1 for automated clinical note entry and claims submission, where accuracy is crucial.

The Universal-1 model is available through Assembly AI’s API, providing developers and customers worldwide with the opportunity to build various Speech AI applications with improved speech-to-text capabilities.
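As a rough idea of what calling a hosted transcription API can look like, the sketch below builds a request that submits an audio URL for transcription using only the Python standard library. The base URL, the `/transcript` path, the `authorization` header, and the `audio_url` and `speaker_labels` fields are assumptions based on Assembly AI's public REST API and should be checked against the current documentation; the API key and audio URL are placeholders.

```python
import json
import urllib.request

# Assumed base URL; verify against the provider's current documentation.
API_BASE = "https://api.assemblyai.com/v2"


def build_transcript_request(api_key, audio_url, speaker_labels=True):
    """Build (but do not send) a POST request that submits audio for transcription."""
    payload = {
        "audio_url": audio_url,            # publicly reachable audio file
        "speaker_labels": speaker_labels,  # assumed flag for speaker diarization
    }
    return urllib.request.Request(
        f"{API_BASE}/transcript",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "authorization": api_key,
            "content-type": "application/json",
        },
        method="POST",
    )


def submit(request):
    """Send the request and return the decoded JSON response (needs network access)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Placeholder values for illustration only.
request = build_transcript_request("YOUR_API_KEY", "https://example.com/meeting.mp3")
print(request.get_method(), request.full_url)
# job = submit(request)  # uncomment with a real key and audio URL
```

Transcription services of this kind typically process audio asynchronously, so a real client would usually poll for the finished transcript or register a webhook after submitting.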

In conclusion, Universal-1 is a notable step in Assembly AI’s effort to provide accurate and robust speech-to-text in multiple languages. With fewer hallucinations, improved timestamp estimation, and efficient parallel inference, the model is positioned to support a wide range of speech recognition applications across industries.
