DeepSeek’s stated mission is to unravel the mystery of AGI (Artificial General Intelligence) with curiosity. As a notable player in the Chinese AI race, DeepSeek has already open-sourced several models, including the DeepSeek Coder family. The original DeepSeek Coder performed well on benchmarks but was limited in language coverage and context window size. DeepSeek Coder V2 addresses both limitations, expanding support to 338 programming languages and the context window to 128K tokens, enabling it to handle more complex coding tasks.
When tested on various benchmarks, including MBPP+, HumanEval, Aider, MATH, and GSM8K, DeepSeek Coder V2 consistently scored highly, outperforming most closed-source and open-source models. The only model that surpassed it across multiple benchmarks was GPT-4o. DeepSeek achieved these advances by building on its Mixture-of-Experts (MoE) framework and pre-training the base V2 model on a dataset of 6 trillion tokens sourced from GitHub and CommonCrawl.
In addition to excelling at coding and math tasks, DeepSeek Coder V2 delivers solid performance in general reasoning and language understanding. It scored well on the MMLU benchmark, which evaluates language understanding across a broad range of tasks. This shows that open coding-specific models are closing in on state-of-the-art closed-source models, and not only in their core use cases.
DeepSeek Coder V2 is currently offered under an MIT license, allowing both research and unrestricted commercial use. Users can download the models in different sizes via Hugging Face or access them through DeepSeek’s platform using an API under a pay-as-you-go model. The company also offers the option to interact with DeepSeek Coder V2 via a chatbot for those who want to test its capabilities firsthand.
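As a rough illustration of the pay-as-you-go API access mentioned above: DeepSeek exposes an OpenAI-compatible chat completions endpoint. The endpoint URL and model name below are assumptions based on DeepSeek's public documentation at the time of writing and should be checked against the current docs before use.

```python
import json
import urllib.request

# Assumed OpenAI-compatible endpoint -- verify against DeepSeek's API docs.
API_URL = "https://api.deepseek.com/chat/completions"

def build_request(prompt: str, api_key: str, model: str = "deepseek-coder"):
    """Build an HTTP request for DeepSeek's chat completions API.

    The model name "deepseek-coder" is an assumption; consult the
    platform docs for the identifier that maps to Coder V2.
    """
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful coding assistant."},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": 512,
    }
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    return urllib.request.Request(
        API_URL, data=json.dumps(payload).encode("utf-8"), headers=headers
    )

# Sending the request requires a real API key from the DeepSeek platform:
# with urllib.request.urlopen(build_request("Write a binary search.", key)) as resp:
#     reply = json.loads(resp.read())
#     print(reply["choices"][0]["message"]["content"])
```

Because the endpoint follows the OpenAI wire format, existing OpenAI client libraries can typically be pointed at it by overriding the base URL instead of hand-building requests as above.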