Israeli startup aiOla has announced an advance in speech recognition that may help solve a persistent problem: models failing to understand industry-specific jargon and vocabulary. The development could improve the accuracy and responsiveness of speech recognition systems, making them better suited to complex enterprise settings. The startup adapted OpenAI’s Whisper model using its technique and reports a lower word error rate and improved detection accuracy. aiOla says the approach is not tied to Whisper and can work with any speech recognition model, including Meta’s MMS model and proprietary models.
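For readers unfamiliar with the metric, word error rate is commonly computed as the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The following is a minimal sketch of that standard definition; the function name and the aviation-flavored sample sentences are illustrative assumptions, not taken from aiOla's evaluation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative only: one misheard jargon word out of five.
print(word_error_rate("check the pitot tube heater",
                      "check the pivot tube heater"))
```

A single misrecognized technical term counts the same as any other word error, which is one reason jargon-heavy audio can score poorly even when the rest of the transcript is correct.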
Jargon has been a challenge for organizations using state-of-the-art automatic speech recognition (ASR) models like Whisper. While these models approach human-level robustness and accuracy in English speech recognition, they can struggle with real-world audio, whether because of environmental conditions such as background noise or because of industry-specific terminology. Many organizations have tried to solve this by fine-tuning ASR models for their industry’s unique requirements. However, this process can be time-consuming and costly, requiring extensive data collection and manual transcription.
To address this issue, aiOla has developed a two-step approach called “contextual biasing.” First, its keyword spotting model, AdaKWS, identifies domain-specific jargon in a given speech sample. Then the identified keywords are used to prompt the ASR decoder, guiding it to incorporate the jargon into the final transcribed text. In initial tests with Whisper, aiOla found that both variants it tried, keyword-guided Whisper and a prompt-tuning version, improved the model’s performance across various datasets, even in challenging acoustic environments.
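The two steps described above can be sketched as a small pipeline. This is a hedged illustration of the control flow only: the spotter and the recognizer are passed in as plain callables, and the names `KeywordSpotter`, `PromptedASR`, `build_prompt`, and `contextual_transcribe` are hypothetical. It does not reproduce AdaKWS, aiOla's adapted Whisper, or any aiOla API.

```python
from typing import Callable, List, Sequence

# Hypothetical interfaces for illustration, not aiOla's actual API.
# A spotter takes audio plus a jargon list and returns the terms it detected.
KeywordSpotter = Callable[[bytes, Sequence[str]], List[str]]
# A prompted recognizer takes audio plus a text prompt and returns a transcript.
PromptedASR = Callable[[bytes, str], str]


def build_prompt(keywords: Sequence[str]) -> str:
    """Join the spotted keywords into one text prompt for the decoder."""
    # dict.fromkeys drops duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(keywords))
    return ", ".join(unique)


def contextual_transcribe(
    audio: bytes,
    vocabulary: Sequence[str],
    spot: KeywordSpotter,
    transcribe: PromptedASR,
) -> str:
    """Step 1: spot jargon in the audio. Step 2: prompt the ASR decoder with it."""
    spotted = spot(audio, vocabulary)
    prompt = build_prompt(spotted)
    return transcribe(audio, prompt)
```

Because only the keyword list changes between domains, updating the vocabulary does not require touching the recognizer. As one possible stand-in for the second callable, the open-source `whisper` package accepts an `initial_prompt` argument in `transcribe`, though how closely that matches aiOla's decoder prompting is an assumption.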
What sets aiOla’s approach apart, the company says, is its compatibility with different ASR models. Enterprises can pair it with whichever ASR model they already use, getting a bespoke recognition system without retraining. They only need to provide a list of their industry-specific words to the keyword spotter and update it as needed. This adaptability could benefit industries that rely heavily on technical jargon, such as aviation, transportation, manufacturing, supply chain, and logistics.
Fortune 500 enterprises have already started deploying aiOla’s adaptive model, experiencing increased efficiency in handling jargon-heavy processes. For example, a global shipping and logistics leader reduced the time required for daily truck inspections from 15 minutes per vehicle to under 60 seconds per vehicle using aiOla’s automated workflow. Similarly, a leading Canadian grocer used the models to inspect product and meat temperatures, resulting in projected time savings of 110,000 hours annually and expected cost savings of over $2.5 million.
aiOla hopes that its research will inspire other AI research teams to build on its work. At present, however, the company is not providing API access to the adapted model or releasing the weights. Enterprises can use it only through aiOla’s product suite, which is sold on a subscription basis. If it holds up across deployments, the approach could make speech recognition far more practical in jargon-heavy industries, saving businesses time and resources.