Breaking Language Barriers with Aya: Cohere for AI’s New Multilingual Language Models

Cohere for AI (C4AI), the non-profit research arm of Canadian enterprise AI startup Cohere, has announced the open weights release of Aya 23, a family of state-of-the-art multilingual language models. The models, available in 8B- and 35B-parameter variants, aim to tackle the poor non-English performance that has limited previous large language models (LLMs).

The Problem: English-Centric Models and Lack of Instruction-Style Training Data

While LLMs have made significant strides in recent years, most of the focus has been on English-centric models. As a result, these models tend to struggle with low-resource languages, or with any language outside of the dominant few. C4AI researchers attribute this to two root causes: a lack of robust multilingual pre-trained models, and a scarcity of instruction-style training data covering a diverse range of languages.

The Solution: The Aya Initiative and Aya 101

To address these issues, C4AI launched the Aya initiative, bringing together over 3,000 independent researchers from 119 countries. The group created the Aya Collection, a massive multilingual instruction-style dataset consisting of 513 million instances of prompts and completions. Using this dataset, they developed the Aya 101 model, which supports 101 different languages and was released as an open source LLM in February 2024.
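
For readers who want to inspect this kind of data, the sketch below streams a few examples with the Hugging Face datasets library. The repository name “CohereForAI/aya_dataset” (the human-annotated portion of the Aya effort) is an assumption based on the release; the full 513-million-instance Aya Collection is far larger, which is why streaming is used instead of a full download.

```python
# A minimal sketch, assuming the human-annotated Aya data is published
# on Hugging Face as "CohereForAI/aya_dataset". Streaming avoids
# materializing the dataset on disk.
from itertools import islice

from datasets import load_dataset

# Stream the training split lazily over the network.
aya = load_dataset("CohereForAI/aya_dataset", split="train", streaming=True)

# Each row pairs an instruction-style prompt with a completion,
# tagged with its language; print a few to see the format.
for example in islice(aya, 3):
    print(example)
```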

Moving from Breadth to Depth: Introducing Aya 23

While Aya 101 was a significant step forward, it had its limitations. It was built on an older base model (the mT5 architecture) and prioritized breadth over depth, sacrificing performance in any individual language. With the release of Aya 23, Cohere for AI aims to strike a balance between the two. The new models, built on Cohere’s Command series and fine-tuned on the Aya Collection, allocate more capacity to fewer languages (23 in total), resulting in improved generation quality across them.

Impressive Performance: Outperforming Aya 101 and Other Open Models

The Aya 23 models deliver strong results on both discriminative and generative multilingual benchmarks, outperforming Aya 101 as well as open models such as Google’s Gemma and Mistral’s open-weight releases. The researchers report improvements over Aya 101 of up to 14% on discriminative tasks, up to 20% on generative tasks, and up to 41.6% on multilingual MMLU (Massive Multitask Language Understanding). Across all comparisons, the Aya 23 models are consistently preferred.

Open Weights Release: Accessible for Researchers and Practitioners

To make this research broadly accessible, Cohere for AI has released the weights for both the 8B and 35B models on Hugging Face under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. By doing so, the team aims to empower researchers and practitioners to advance multilingual models and applications. Users can also try the new models for free in the Cohere Playground.
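
As a rough illustration of how practitioners might load the released weights, here is a minimal sketch using the Hugging Face transformers library. The repository id “CohereForAI/aya-23-8B” follows the release announcement; the prompt and generation settings are illustrative, not a prescribed configuration.

```python
# A minimal sketch of loading the Aya 23 8B weights with Hugging Face
# `transformers`; the repo id follows the release (the 35B variant is
# "CohereForAI/aya-23-35B"), and the prompt is only an example.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CohereForAI/aya-23-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to fit on a single GPU
    device_map="auto",
)

# Aya 23 ships with a chat template, so format the prompt as a chat turn.
messages = [{"role": "user", "content": "Translate to Turkish: 'The weights are open.'"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate a completion and decode only the newly produced tokens.
outputs = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```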

In conclusion, Cohere for AI’s Aya 23 models represent a significant advancement in the field of multilingual language modeling. By addressing the limitations of previous models and focusing on depth rather than just breadth, these models have shown superior performance across a range of languages. With the open weights release, Cohere for AI hopes to foster further innovation and collaboration in this field, ultimately breaking down language barriers and enabling more inclusive AI applications.