
AI2 Releases OLMoE: A Cost-Effective and Open-Source Language Model with 7 Billion Parameters

How is OLMoE built?

The Allen Institute for AI (AI2) has developed a new open-source model called OLMoE to address the need for a large language model (LLM) that is both high-performing and cost-effective. OLMoE uses a sparse mixture-of-experts (MoE) architecture with a total of 7 billion parameters, of which only about 1 billion are active for any given input token. It comes in two versions: the general-purpose OLMoE-1B-7B and the instruction-tuned OLMoE-1B-7B-Instruct.

Unlike most MoE models, OLMoE is fully open-source, providing researchers and academics with access to its model weights, training data, code, and recipes. This openness is a significant step forward in building cost-efficient MoEs that can rival closed-source models in terms of capabilities.

Nathan Lambert, a research scientist at AI2, shared that the release of OLMoE aligns with the institute’s goal of creating open-sourced models that perform as well as closed models. He emphasized that AI2 is continuously improving its open-source infrastructure and data, making it accessible to others in the field.

To build OLMoE, AI2 adopted a fine-grained routing approach that uses 64 small experts, of which eight are activated for each input token. In its experiments, AI2 showed that OLMoE performs on par with comparable models while requiring significantly lower inference cost and a smaller memory footprint.
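For readers who want a concrete picture of what fine-grained routing means, below is a minimal PyTorch sketch of a top-k MoE layer in the spirit of the setup described above (64 small experts, eight active per token). The class name, hidden sizes, and the simple softmax-over-top-k weighting are illustrative assumptions, not AI2's actual OLMoE code.

```python
# Illustrative top-k mixture-of-experts layer (not AI2's implementation).
# Dimensions and weighting scheme are assumptions chosen for readability.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMoELayer(nn.Module):
    def __init__(self, d_model=1024, d_expert=512, n_experts=64, top_k=8):
        super().__init__()
        self.top_k = top_k
        # Router scores every token against every expert.
        self.router = nn.Linear(d_model, n_experts, bias=False)
        # Each expert is a small feed-forward block.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_expert), nn.GELU(),
                          nn.Linear(d_expert, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                          # x: (num_tokens, d_model)
        scores = self.router(x)                    # (num_tokens, n_experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)       # renormalize over active experts
        out = torch.zeros_like(x)
        # Only the chosen experts run for each token, so far fewer parameters
        # participate per token than the layer holds in total.
        for slot in range(self.top_k):
            for e in chosen[:, slot].unique().tolist():
                mask = chosen[:, slot] == e
                out[mask] += weights[mask, slot:slot + 1] * self.experts[e](x[mask])
        return out


layer = SparseMoELayer()
tokens = torch.randn(4, 1024)
print(layer(tokens).shape)  # torch.Size([4, 1024])
```

The per-token compute scales with the eight active experts rather than all 64, which is the source of the lower inference cost mentioned above.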

OLMoE builds upon AI2’s previous open-source model, OLMo 1.7-7B, which supported a context window of 4,096 tokens. OLMoE was trained on a combination of data from DCLM and Dolma, including a filtered subset of Common Crawl, Dolma CC, RefinedWeb, StarCoder, C4, Stack Exchange, OpenWebMath, Project Gutenberg, Wikipedia, and more.

In benchmark tests, OLMoE-1B-7B consistently outperformed other models with a similar number of active parameters, even surpassing larger models like Llama2-13B-Chat and DeepSeekMoE-16B. Among models with roughly 1 billion parameters, OLMoE-1B-7B also beat open-source models such as Pythia, TinyLlama, and AI2’s own earlier OLMo model.

Open-sourcing mixture of experts

AI2’s motive behind developing OLMoE was to provide fully open-source AI models to researchers, particularly for the mixture-of-experts (MoE) architecture, which has gained popularity among developers. Several model developers have adopted MoE, including Mistral with Mixtral and xAI with Grok. However, AI2 highlights that many of these models lack full openness and do not provide information about their training data or source code.

MoEs present unique design questions for language models, such as determining the number of total versus active parameters to use, whether to employ many small or few large experts, and the choice of routing algorithm. Despite these complexities, not many MoE models offer the level of openness required to address these design challenges.
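As a rough illustration of the total-versus-active trade-off mentioned above, the sketch below counts the feed-forward parameters of a single hypothetical MoE layer under a "many small experts" layout and a "few large experts" layout. Every dimension here is a made-up assumption for the sake of the arithmetic; none of these numbers reflect OLMoE's real configuration.

```python
# Back-of-the-envelope parameter counts for one MoE feed-forward layer.
# All sizes are illustrative assumptions, not OLMoE's actual configuration.

def moe_ffn_params(d_model, d_expert, n_experts, top_k):
    """Return (total, active-per-token) parameter counts for router + experts."""
    per_expert = 2 * d_model * d_expert                # up- and down-projection
    router = d_model * n_experts
    total = n_experts * per_expert + router
    active = top_k * per_expert + router               # the router always runs
    return total, active

for n_experts, top_k, d_expert in [(64, 8, 512), (8, 2, 4096)]:
    total, active = moe_ffn_params(d_model=2048, d_expert=d_expert,
                                   n_experts=n_experts, top_k=top_k)
    print(f"{n_experts:>2} experts, top-{top_k}: "
          f"total {total / 1e6:6.1f}M params, active {active / 1e6:6.1f}M per token")
```

Both layouts hold roughly the same total parameters, but the many-small-experts layout activates far fewer of them per token; openness at the level of weights, data, and code is what lets researchers measure trade-offs like this directly.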

The Open Source Initiative, an organization that promotes and defines open-source standards, has started exploring what it means for AI models to be open source. AI2’s release of OLMoE contributes to this discussion by providing an example of an open-source MoE model that offers transparency and accessibility to the wider research community.

In conclusion, AI2’s OLMoE model represents a significant step forward in developing large language models that are both high-performing and cost-effective. By combining a sparse mixture-of-experts architecture with full openness, OLMoE demonstrates that cost-efficient models can rival closed-source counterparts. Researchers and academics now have access to an open-source MoE model that outperforms other models with similar active parameter counts, driving innovation and collaboration in the field of AI language models.
