Introducing Inflection AI’s Latest Model for Pi Chatbot, Offering Comparable Performance to GPT-4

Inflection AI, the Palo Alto-based startup founded by DeepMind co-founder Mustafa Suleyman and LinkedIn co-founder Reid Hoffman, has announced the launch of its latest model, Inflection-2.5. The new foundation model builds on the company’s previous work, outperforms its original Inflection-1, and is comparable to OpenAI’s GPT-4, especially in STEM subjects. Inflection-2.5 powers the company’s Pi assistant, which aims to compete with ChatGPT and Gemini. The model can be tested on mobile and on the web.

The introduction of Inflection-2.5 is part of a broader effort within the AI space to challenge the dominance of OpenAI. Anthropic recently released Claude 3 Opus, which has been described as the first model to surpass GPT-4. Inflection AI, for its part, aims to create an “empathetic, useful, and safe” AI that converses in a more personal and colloquial way than other models, including the GPT series. The company says it has used its own empathetic fine-tuning to give the Pi assistant a distinct personality and strong emotional intelligence (EQ).

With Inflection-2.5, Inflection AI is focusing on building up the IQ side of its model, particularly in areas like physics and mathematics. Users interacting with Pi can discuss a wide range of topics, from hobbies to coding, and can use it to check answers on biology papers or draft business plans. The upgraded model shows significant improvements over Inflection-1 and comes close to GPT-4 in benchmarks, although it still trails GPT-4 on each of the tests reported.

In benchmark tests, Inflection-2.5 scored 85.5 on MMLU, just below GPT-4’s score of 87.3. On STEM exams, the model came reasonably close to GPT-4, scoring 63 on the Hungarian Math exam (vs. GPT-4’s 68) and placing in the 85th percentile on the Physics GRE (compared to GPT-4’s 97th percentile). It also scored 86.3 on GSM8K, a benchmark consisting of grade school math problems, compared to GPT-4’s score of 92. For code generation, Inflection-2.5 scored 73.8 on the 0-shot HumanEval benchmark, while GPT-4 scored 79.3.
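To put those figures side by side, the short Python sketch below computes the point gap and the score ratio for each benchmark quoted above. The names `SCORES` and `compare` are hypothetical, chosen only for this illustration, and the ratio is a simple way to read the numbers rather than a metric Inflection AI is stated to use. The Physics GRE result is left out because it is reported as a percentile rather than a score.

```python
# Benchmark scores as quoted in the article: (Inflection-2.5, GPT-4).
# The Physics GRE is omitted because it is given as a percentile, not a score.
SCORES = {
    "MMLU": (85.5, 87.3),
    "Hungarian Math exam": (63.0, 68.0),
    "GSM8K": (86.3, 92.0),
    "HumanEval (0-shot)": (73.8, 79.3),
}


def compare(scores):
    """Return the point gap and Inflection-2.5/GPT-4 ratio per benchmark."""
    rows = {}
    for name, (inflection, gpt4) in scores.items():
        rows[name] = {
            "gap": round(gpt4 - inflection, 1),      # points behind GPT-4
            "ratio": round(inflection / gpt4, 3),    # share of GPT-4's score
        }
    return rows


if __name__ == "__main__":
    for name, row in compare(SCORES).items():
        print(f"{name}: {row['gap']} points behind, {row['ratio']:.1%} of GPT-4's score")
```

Read this way, the gap is smallest on MMLU and widest on GSM8K among the four scored benchmarks.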

Inflection AI emphasizes that Inflection-2.5 reached this level of performance with more efficient training than GPT-4. The company states that the model required only 40% of the training FLOPs (compute) used for GPT-4. Like GPT-4, Inflection-2.5 also incorporates real-time web search, giving users up-to-date information on current events. The upgrade matters because Inflection AI positions the Pi assistant as an AI for everyone.

Inflection AI has already rolled out Inflection-2.5 to its Pi chatbot, so users can test its capabilities now. The company has not given specific details on how users are benefiting from the upgraded model, but it says the rollout has had a significant impact on user sentiment, engagement, and retention, accelerating organic user growth. The Pi chatbot currently has one million daily active users and six million monthly active users, who have exchanged more than four billion messages with the AI. Users can access Pi on Android, iOS, the web, and desktop applications.

With the launch of Inflection-2.5, Inflection AI continues to push the boundaries of AI technology, aiming to provide a more empathetic and useful AI experience. As the AI space evolves rapidly, companies like Inflection AI are striving to challenge established players like OpenAI and contribute to the development of AI for humanity.