DeepSeek, a Chinese AI startup, has announced the release of DeepSeek Coder V2, an open-source mixture of experts (MoE) code language model. This model, built upon DeepSeek-V2, excels in coding and math tasks and supports over 300 programming languages. It outperforms closed-source models like GPT-4 Turbo, Claude 3 Opus, and Gemini 1.5 Pro, making it the first open model to achieve this feat. Additionally, DeepSeek Coder V2 maintains comparable performance in general reasoning and language capabilities.
DeepSeek’s mission is to unravel the mystery of AGI (Artificial General Intelligence) with curiosity. As a notable player in the Chinese AI race, DeepSeek has already open-sourced several models, including the DeepSeek Coder family. The original DeepSeek Coder performed well on benchmarks but had limitations in language support and context window size. DeepSeek Coder V2 addresses these limitations by expanding language support to 338 and context window size to 128K, enabling it to handle more complex coding tasks.
When tested on various benchmarks, including MBPP+, HumanEval, Aider, MATH, and GSM8K, DeepSeek Coder V2 consistently scored high, outperforming most closed and open-source models. The only model that surpassed it across multiple benchmarks was GPT-4o. DeepSeek achieved these technical advancements by using its Mixture of Experts framework as a foundation and pre-training the base V2 model on a dataset of 6 trillion tokens sourced from GitHub and CommonCrawl.
In addition to excelling at coding and math tasks, DeepSeek Coder V2 also delivers decent performance in general reasoning and language understanding tasks. It scored well in the MMLU benchmark, which evaluates language understanding across multiple tasks. This showcases that open coding-specific models are closing in on state-of-the-art closed-source models, not just in their core use cases.
DeepSeek Coder V2 is currently offered under a MIT license, allowing for both research and unrestricted commercial use. Users can download the models in different sizes via Hugging Face or access them through DeepSeek’s platform using an API under a pay-as-you-go model. The company also offers the option to interact with DeepSeek Coder V2 via a chatbot for those who want to test its capabilities firsthand.