Using LLMs comes at a price (or several prices). Let’s go through the costs associated with LLM use and how to manage them.
The cost of using Large Language Models (LLMs) for chatbots and AI services varies significantly depending on the approach taken and the scale of implementation. Before choosing between the available pricing models and cost-management strategies, it's crucial to understand the key factors that contribute to LLM expenditure:
Tokenisation is the way that LLMs break down language into chunks (or tokens) they can understand, be it a single letter, a whole word, or even part of a word.
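As a rough illustration of the idea, the toy greedy matcher below splits text into the longest chunks it recognises. Real tokenisers use learned algorithms such as byte-pair encoding, so both the function and the tiny vocabulary here are invented for the sketch.

```python
def toy_tokenise(text, vocab):
    """Greedily match the longest known chunk at each position,
    falling back to single characters for unknown text."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(len(text) - i, 0, -1):
            chunk = text[i:i + size]
            if chunk in vocab or size == 1:
                tokens.append(chunk)
                i += size
                break
    return tokens

vocab = {"token", "is", "ation"}
print(toy_tokenise("tokenisation", vocab))  # ['token', 'is', 'ation']
```

Note how a single word can become several tokens; this is why token counts are usually higher than word counts.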
Most popular LLM providers offer pay-as-you-go models based on token usage. These costs vary between providers and models, with generally more complex models charging a higher amount per token.
As of February 2025, prices per million tokens range from as little as $0.15 for input and $0.60 for output (GPT-4o-mini) to $15 for input and $75 for output (Claude 3 Opus), and beyond.
A single conversation with an LLM usually uses thousands of tokens both in inputs and outputs, meaning these prices can escalate quickly in a large organisation, but you only pay for what you use.
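Using the February 2025 GPT-4o-mini rates quoted above, a back-of-envelope estimate of per-conversation cost might look like this sketch; the token counts are illustrative assumptions, not measurements.

```python
def conversation_cost(input_tokens, output_tokens,
                      input_price_per_m, output_price_per_m):
    """Cost in dollars, given per-million-token prices."""
    return (input_tokens / 1_000_000 * input_price_per_m
            + output_tokens / 1_000_000 * output_price_per_m)

# A hypothetical chat with 3,000 input and 1,500 output tokens
# at GPT-4o-mini rates ($0.15 in / $0.60 out per million tokens):
cost = conversation_cost(3_000, 1_500, 0.15, 0.60)
print(f"${cost:.4f}")  # a fraction of a cent per conversation
```

Multiply by thousands of conversations a day across an organisation, or swap in the Claude 3 Opus rates, and the totals become material quickly.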
Chatbot subscription costs vary widely depending on the provider, features, and scale of implementation. These subscriptions often offer a more predictable pricing model for businesses compared to pay-as-you-go token-based pricing.
Taking Anthropic’s Claude as an example, subscriptions range from free to $25 per user/month, with custom pricing for plans better suited to businesses operating at scale. OpenAI charges similarly for their ChatGPT subscriptions.
Rather than paying subscription or per-token costs, some organisations choose to self-host their own language model. While upfront costs are high, self-hosting can be more economical for certain applications.
For organisations hosting their own models, the upfront cost of setting up the infrastructure is likely to be the main expense. This includes the cost of specialised hardware, cloud computing, and energy.
These costs vary widely, but according to the 2024 AI Index Report, “the training costs of state-of-the-art AI models have reached unprecedented levels”, with Google’s Gemini Ultra costing $191 million worth of compute to train. In addition, according to Epoch AI, research and development (R&D) staff costs can account for a substantial 29-49% of the cost of training.
To manage GenAI costs efficiently, consider the following approaches:
The complexity and size of an LLM significantly impact its cost. Larger models with more parameters require more computational resources and are consequently more expensive to operate.
Being selective and finding the right model for your specific use case is crucial.
Maintaining awareness of your budget and keeping an eye out for potential savings is one of the simplest and most valuable ways to reduce costs.
Use budget management tools, such as those built into Narus, to track spend against budgets and spot opportunities for savings.
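As a hypothetical sketch of what such budget tracking can look like under the hood (the class, figures, and alert threshold here are invented for illustration; dedicated tools provide this out of the box):

```python
class BudgetTracker:
    """Track LLM spend against a monthly budget and flag when an
    alert threshold (a fraction of the budget) is crossed."""

    def __init__(self, monthly_budget, alert_at=0.8):
        self.monthly_budget = monthly_budget
        self.alert_at = alert_at
        self.spent = 0.0

    def record(self, cost):
        """Record a cost; return True if the alert threshold is reached."""
        self.spent += cost
        return self.spent >= self.alert_at * self.monthly_budget

tracker = BudgetTracker(monthly_budget=500.0)
tracker.record(150.0)           # well under budget, no alert
over = tracker.record(300.0)    # $450 of $500 spent: past the 80% line
print(over)                     # True
```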
LLM providers typically charge based on the number of tokens processed. A token can be a word, part of a word, or even a single character. The more tokens your requests contain, the higher the cost. This applies to both input (the text you send to the model) and output (the response generated by the model).
Through careful prompt engineering, you can avoid spending tokens unnecessarily, cutting costs while often getting more accurate responses from LLMs.
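As a rough sketch of the saving, compare a padded prompt with a concise one. The four-characters-per-token figure is a common approximation rather than an exact count, and both prompts are invented examples.

```python
def approx_tokens(text):
    """Very rough token estimate: about four characters per token."""
    return max(1, len(text) // 4)

verbose = ("Hello! I was wondering if you could possibly help me out by "
           "providing a short summary of the following customer review, "
           "if that's not too much trouble: ...")
concise = "Summarise this customer review in two sentences: ..."

print(approx_tokens(verbose), approx_tokens(concise))
```

The concise prompt costs a fraction of the tokens and is also less ambiguous, which tends to improve the response.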
Repeating the same prompts and generating the same responses unnecessarily is a wasted expense. Caching in LLMs is storing previously computed results to reuse them. This avoids redoing the same calculations, making the LLM faster and cheaper to run, especially when similar prompts are used repeatedly.
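A minimal exact-match cache can be sketched in a few lines; `call_llm` here is a hypothetical stand-in for a real provider API call, and the fake below only exists to show that repeated prompts trigger one paid call.

```python
import hashlib

cache = {}

def cached_completion(prompt, call_llm):
    """Return a cached response if this exact prompt was seen before;
    otherwise make the (paid) call and store the result."""
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    if key not in cache:
        cache[key] = call_llm(prompt)
    return cache[key]

calls = []
def fake_llm(prompt):
    calls.append(prompt)  # stands in for a billable API request
    return f"response to: {prompt}"

cached_completion("What is our refund policy?", fake_llm)
cached_completion("What is our refund policy?", fake_llm)
print(len(calls))  # 1 -- the second request was served from the cache
```

Production caches also need eviction and expiry policies, and some providers offer prompt caching on their side, but the cost principle is the same: identical work is paid for once.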
To reduce, reuse, and recycle, cache previous outputs where possible and keep a shared library of effective prompts.
RAG is an advanced AI technique that combines the power of LLMs with external data sources, such as uploaded documents.
It’s worth considering whether RAG could save your business money, as this technique is generally more cost-effective than fine-tuning (another technique aimed at producing more accurate outputs from LLMs), which requires significantly more computational resources.
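The idea behind RAG can be sketched in a few lines: retrieve the most relevant snippet from your own data and prepend it to the prompt. Here naive word overlap stands in for the vector search a production system would use, and the documents are invented examples.

```python
import re

documents = [
    "Refunds are available within 30 days of purchase.",
    "Our support desk is open Monday to Friday, 9am to 5pm.",
]

def words(text):
    """Lower-case word set, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, docs):
    # Return the document sharing the most words with the query.
    return max(docs, key=lambda doc: len(words(query) & words(doc)))

def build_prompt(query, docs):
    context = retrieve(query, docs)
    return f"Answer using this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("Are refunds available after purchase?", documents))
```

Because the model answers from retrieved context rather than retrained weights, updating its knowledge means updating documents, not paying for another round of fine-tuning.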
Organisations are increasingly seeing financial benefits from AI. In a recent McKinsey survey, 42% of respondents reported cost reductions and 59% saw revenue growth, with the share of organisations reporting decreased costs up 10% on the previous year.
Despite the cost of implementing them, using LLMs where appropriate can save businesses a lot of money. By carefully considering these factors and implementing strategic cost management practices, businesses can harness the power of GenAI while keeping expenses under control. It's crucial to regularly review and adjust your approach as both the technology and pricing models in this field are rapidly evolving.