Comparing the Costs of GPT-4o and Llama 3: Which LLM is More Cost Effective for Conversational AI?

# How do you calculate LLM costs for a conversational AI?

When weighing the costs of implementing a conversational AI tool, there are two primary financial factors: set-up costs and ongoing processing costs. Set-up costs include the development and operational expenses required to get the large language model (LLM) up and running, while processing costs are the actual cost of each conversation once the tool is live.

The cost-to-value ratio of set-up costs depends on the purpose and usage of the LLM. If speed is crucial and you need to deploy your product quickly, you might opt for a model like GPT-4o, which requires minimal set-up. However, if you manage a large number of clients or require more control over your LLM, you may choose to invest more in set-up costs for greater long-term benefits.

In terms of conversation processing costs, it’s essential to consider token usage. LLMs like GPT-4o and Llama 3 use tokens as their unit of text processing and pricing. Different LLMs tokenize text differently, so the same conversation can yield different token counts on each model, which makes direct comparisons challenging. By simplifying the costs as much as possible, though, we can still arrive at an approximate comparison.

Our internal research found that while GPT-4o is cheaper in terms of upfront costs, Llama 3 is more cost-effective over time. Using each model’s price per 1,000 input and output tokens, we estimated that the benchmark conversation costs approximately $0.16 on GPT-4o and around $0.08 on Llama 3-70B. This suggests that Llama 3 is almost 50% less expensive per conversation once both models are fully set up.
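To make "more cost-effective over time" concrete, here is a minimal break-even sketch. The per-conversation figures are the ones quoted above; the extra set-up cost is a purely hypothetical placeholder (the article gives no set-up figures), and the function name is ours.

```python
import math


def break_even_conversations(extra_setup_cost, cost_per_conv_a, cost_per_conv_b):
    """Conversations needed before option B's lower per-conversation
    cost has repaid its higher set-up cost."""
    saving = cost_per_conv_a - cost_per_conv_b
    if saving <= 0:
        raise ValueError("option B must be cheaper per conversation")
    return math.ceil(extra_setup_cost / saving)


# Per-conversation estimates quoted in the article (USD).
GPT4O_PER_CONVERSATION = 0.16
LLAMA3_PER_CONVERSATION = 0.08

# ASSUMPTION: illustrative only. Substitute your own estimate of the
# additional engineering and hosting set-up cost for Llama 3.
HYPOTHETICAL_EXTRA_SETUP_COST = 20_000.0

conversations_needed = break_even_conversations(
    HYPOTHETICAL_EXTRA_SETUP_COST,
    GPT4O_PER_CONVERSATION,
    LLAMA3_PER_CONVERSATION,
)
print(f"Break-even after about {conversations_needed:,} conversations")
```

The lower your expected conversation volume, the longer the extra set-up investment takes to pay for itself, which is why usage is central to the decision.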

It’s important to note that these calculations provide a snapshot of the overall costs, and other variables come into play when developing a product for unique needs. Additionally, for companies that view conversational AI as a core service but not a fundamental element of their brand, off-the-shelf products may offer sufficient quality without the need for in-house development.

# What are the foundational costs of each LLM?

Before diving into the costs per conversation, it’s crucial to understand the foundational costs of each LLM. GPT-4o is a closed-source model hosted by OpenAI, so it requires minimal set-up: all you need to do is make API calls to OpenAI’s infrastructure and data libraries. Llama 3, on the other hand, is an open-source model that must be hosted either on your own private servers or with a cloud infrastructure provider. This incurs additional hosting costs, which depend on the provider’s pricing structure.

For example, Amazon Bedrock, a managed, serverless platform by AWS, calculates costs based on the number of tokens processed. This could be a cost-effective solution, especially for businesses with low usage volumes. However, hosting Llama 3 requires more time and money for operations, server selection, maintenance, and development of necessary tools and alerts.

When calculating the foundational cost-to-value ratio, factors such as time to deployment, product usage, and the level of control needed over your product and data should be considered. Open source models like Llama 3 offer greater control, but they require more investment in terms of time and resources.

# What are the costs per conversation for major LLMs?

Now let’s explore the costs per conversation for major LLMs. Using a heuristic where 1,000 words equal 7,515 characters or 1,870 tokens, an average consumer conversation of 16 messages between AI and humans would amount to approximately 30,390 tokens.
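The heuristic above can be written as a small conversion helper. This is a rough sketch: the ratios are the article's own, the function names are ours, and real token counts depend on each model's tokenizer.

```python
# Ratios taken from the article's heuristic:
# 1,000 words ~ 7,515 characters ~ 1,870 tokens.
WORDS_REF = 1_000
CHARS_REF = 7_515
TOKENS_REF = 1_870


def tokens_from_words(words):
    """Approximate token count for a given word count."""
    return round(words * TOKENS_REF / WORDS_REF)


def tokens_from_chars(chars):
    """Approximate token count for a given character count."""
    return round(chars * TOKENS_REF / CHARS_REF)


def words_from_tokens(tokens):
    """Approximate word count represented by a token count."""
    return round(tokens * WORDS_REF / TOKENS_REF)


benchmark_tokens = 30_390
benchmark_words = words_from_tokens(benchmark_tokens)
print(f"{benchmark_tokens:,} tokens is roughly {benchmark_words:,} words")
```

Under this heuristic, the benchmark conversation corresponds to a little over 16,000 words of processed text.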

For GPT-4o, the price per 1,000 input tokens is $0.005, and per 1,000 output tokens is $0.015. Therefore, the benchmark conversation on GPT-4o costs around $0.16.

In comparison, Llama 3-70B on Amazon Bedrock has a price per 1,000 input tokens of $0.00265, and per 1,000 output tokens of $0.00350. This results in the benchmark conversation on Llama 3-70B costing approximately $0.08.
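The arithmetic behind both figures can be sketched as follows. The prices are those quoted above. The text here gives only the 30,390-token total, so the input/output split in the code is an assumption: it was chosen because it sums to that total and reproduces both rounded figures.

```python
def conversation_cost(input_tokens, output_tokens,
                      input_price_per_1k, output_price_per_1k):
    """Cost in USD of one conversation at per-1,000-token prices."""
    return (input_tokens / 1_000 * input_price_per_1k
            + output_tokens / 1_000 * output_price_per_1k)


# ASSUMPTION: split of the 30,390-token benchmark conversation.
# Input dominates because each turn resends the conversation so far.
INPUT_TOKENS = 29_920
OUTPUT_TOKENS = 470

# Prices per 1,000 tokens (USD), as quoted in the article.
PRICES = {
    "GPT-4o": (0.005, 0.015),
    "Llama 3-70B (Amazon Bedrock)": (0.00265, 0.00350),
}

costs = {
    model: conversation_cost(INPUT_TOKENS, OUTPUT_TOKENS, in_price, out_price)
    for model, (in_price, out_price) in PRICES.items()
}

for model, cost in costs.items():
    print(f"{model}: ${cost:.2f} per conversation")

saving = 1 - costs["Llama 3-70B (Amazon Bedrock)"] / costs["GPT-4o"]
print(f"Llama 3 saving per conversation: {saving:.0%}")
```

Because input tokens make up almost all of the total in this split, the input price drives the result; a workload with longer model responses would shift the comparison toward the output prices.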

In summary, once both models are fully set up, the cost of a conversation on Llama 3 is almost 50% less than on GPT-4o. However, it’s important to factor in any server costs associated with Llama 3.

It’s worth noting that these calculations provide a general idea of the costs, and other variables, such as the use of multi-prompt or single-prompt approaches, can influence the overall expenses.

In conclusion, integrating conversational AI can be highly beneficial, but it’s essential to consider what makes sense for your company’s context and your customers’ needs. Whether you choose to build in-house or opt for off-the-shelf products, always prioritize what aligns with your goals and resources.

*Sam Oliver is a Scottish tech entrepreneur and serial startup founder.*
