Using LLMs comes at a price (or several prices). Let’s go through the costs associated with LLM use and how to manage them.
The cost of using Large Language Models (LLMs) for chatbots and AI services varies significantly depending on the approach taken and the scale of implementation. Before choosing between the available pricing models and cost-management strategies, it's crucial to understand the key factors that contribute to LLM expenditure:
Tokenisation is the way that LLMs break down language into chunks (or tokens) they can understand, be it a single letter, a whole word, or even part of a word.
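As a rough illustration of the idea, the toy greedy matcher below splits text into the longest chunks it recognises. Real tokenisers use learned algorithms such as byte-pair encoding, so both the function and the tiny vocabulary here are invented for the sketch.

```python
def toy_tokenise(text, vocab):
    """Greedily match the longest known chunk at each position,
    falling back to single characters for unknown text."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(len(text) - i, 0, -1):
            chunk = text[i:i + size]
            if chunk in vocab or size == 1:
                tokens.append(chunk)
                i += size
                break
    return tokens

vocab = {"token", "is", "ation"}
print(toy_tokenise("tokenisation", vocab))  # ['token', 'is', 'ation']
```

Note how a single word can become several tokens; this is why token counts are usually higher than word counts.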
Most popular LLM providers offer pay-as-you-go models based on token usage. These costs vary between providers and models, with generally more complex models charging a higher amount per token.
As of February 2025, prices per million tokens range from as little as $0.15 for input and $0.60 for output (GPT-4o-mini) to $15 for input and $75 for output (Claude 3 Opus), and beyond.
A single conversation with an LLM usually uses thousands of tokens both in inputs and outputs, meaning these prices can escalate quickly in a large organisation, but you only pay for what you use.
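Using the February 2025 GPT-4o-mini rates quoted above, a back-of-envelope estimate of per-conversation cost might look like this sketch; the token counts are illustrative assumptions, not measurements.

```python
def conversation_cost(input_tokens, output_tokens,
                      input_price_per_m, output_price_per_m):
    """Cost in dollars, given per-million-token prices."""
    return (input_tokens / 1_000_000 * input_price_per_m
            + output_tokens / 1_000_000 * output_price_per_m)

# A hypothetical chat with 3,000 input and 1,500 output tokens
# at GPT-4o-mini rates ($0.15 in / $0.60 out per million tokens):
cost = conversation_cost(3_000, 1_500, 0.15, 0.60)
print(f"${cost:.4f}")  # a fraction of a cent per conversation
```

Multiply by thousands of conversations a day across an organisation, or swap in the Claude 3 Opus rates, and the totals become material quickly.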
Chatbot subscription costs vary widely depending on the provider, features, and scale of implementation. These subscriptions often offer a more predictable pricing model for businesses compared to pay-as-you-go token-based pricing.
Taking Anthropic’s Claude as an example, subscriptions range from free to $25 per user/month, with custom pricing for plans better suited to businesses operating at scale. OpenAI charges similarly for their ChatGPT subscriptions.
Rather than paying subscription or per-token costs, some organisations choose to self-host their own language model. While upfront costs are high, self-hosting can be more economical for certain applications.
For organisations hosting their own models, the upfront cost of setting up the infrastructure is likely to be the main expense. This includes the cost of specialised hardware, cloud computing, and energy.
These costs vary widely, but according to the 2024 AI Index Report, “the training costs of state-of-the-art AI models have reached unprecedented levels”, with Google’s Gemini Ultra costing $191 million worth of compute to train. In addition, according to Epoch AI, research and development (R&D) staff costs can account for a substantial 29-49% of the cost of training.
To manage GenAI costs efficiently, consider the following approaches:
The complexity and size of an LLM significantly impact its cost. Larger models with more parameters require more computational resources and are consequently more expensive to operate.
Being selective and finding the right model for your specific use case is crucial.
Maintaining awareness of your budget and keeping an eye out for potential savings is one of the simplest and most valuable ways to reduce costs.
Use budget management tools, such as those built into Narus, to track spend against budgets and spot opportunities for savings.
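As a hypothetical sketch of what such budget tracking can look like under the hood (the class, figures, and alert threshold here are invented for illustration; dedicated tools provide this out of the box):

```python
class BudgetTracker:
    """Track LLM spend against a monthly budget and flag when an
    alert threshold (a fraction of the budget) is crossed."""

    def __init__(self, monthly_budget, alert_at=0.8):
        self.monthly_budget = monthly_budget
        self.alert_at = alert_at
        self.spent = 0.0

    def record(self, cost):
        """Record a cost; return True if the alert threshold is reached."""
        self.spent += cost
        return self.spent >= self.alert_at * self.monthly_budget

tracker = BudgetTracker(monthly_budget=500.0)
tracker.record(150.0)           # well under budget, no alert
over = tracker.record(300.0)    # $450 of $500 spent: past the 80% line
print(over)                     # True
```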
LLM providers typically charge based on the number of tokens processed. A token can be a word, part of a word, or even a single character. The more tokens your requests contain, the higher the cost. This applies to both input (the text you send to the model) and output (the response generated by the model).
Through careful prompt engineering, you can avoid spending tokens unnecessarily, cutting costs while often getting more accurate responses from LLMs.
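As a rough sketch of the saving, compare a padded prompt with a concise one. The four-characters-per-token figure is a common approximation rather than an exact count, and both prompts are invented examples.

```python
def approx_tokens(text):
    """Very rough token estimate: about four characters per token."""
    return max(1, len(text) // 4)

verbose = ("Hello! I was wondering if you could possibly help me out by "
           "providing a short summary of the following customer review, "
           "if that's not too much trouble: ...")
concise = "Summarise this customer review in two sentences: ..."

print(approx_tokens(verbose), approx_tokens(concise))
```

The concise prompt costs a fraction of the tokens and is also less ambiguous, which tends to improve the response.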
Repeating the same prompts and generating the same responses unnecessarily is a wasted expense. Caching in LLMs is storing previously computed results to reuse them. This avoids redoing the same calculations, making the LLM faster and cheaper to run, especially when similar prompts are used repeatedly.
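A minimal exact-match cache can be sketched in a few lines; `call_llm` here is a hypothetical stand-in for a real provider API call, and the fake below only exists to show that repeated prompts trigger one paid call.

```python
import hashlib

cache = {}

def cached_completion(prompt, call_llm):
    """Return a cached response if this exact prompt was seen before;
    otherwise make the (paid) call and store the result."""
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    if key not in cache:
        cache[key] = call_llm(prompt)
    return cache[key]

calls = []
def fake_llm(prompt):
    calls.append(prompt)  # stands in for a billable API request
    return f"response to: {prompt}"

cached_completion("What is our refund policy?", fake_llm)
cached_completion("What is our refund policy?", fake_llm)
print(len(calls))  # 1 -- the second request was served from the cache
```

Production caches also need eviction and expiry policies, and some providers offer prompt caching on their side, but the cost principle is the same: identical work is paid for once.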
To reduce, reuse, and recycle, cache previous outputs where possible and keep a shared library of effective prompts.
RAG is an advanced AI technique that combines the power of LLMs with external data sources, such as uploaded documents.
It’s worth considering whether RAG could save your business money, as this technique is generally more cost-effective than fine-tuning (another technique aimed at producing more accurate outputs from LLMs), which requires significantly more computational resources.
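The idea behind RAG can be sketched in a few lines: retrieve the most relevant snippet from your own data and prepend it to the prompt. Here naive word overlap stands in for the vector search a production system would use, and the documents are invented examples.

```python
import re

documents = [
    "Refunds are available within 30 days of purchase.",
    "Our support desk is open Monday to Friday, 9am to 5pm.",
]

def words(text):
    """Lower-case word set, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, docs):
    # Return the document sharing the most words with the query.
    return max(docs, key=lambda doc: len(words(query) & words(doc)))

def build_prompt(query, docs):
    context = retrieve(query, docs)
    return f"Answer using this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("Are refunds available after purchase?", documents))
```

Because the model answers from retrieved context rather than retrained weights, updating its knowledge means updating documents, not paying for another round of fine-tuning.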
Organisations are increasingly seeing financial benefits from AI. In a recent McKinsey survey, 42% of respondents reported cost reductions and 59% saw revenue growth, with the share of organisations reporting decreased costs up 10% on the previous year.
Despite the cost of implementing them, using LLMs where appropriate can save businesses a lot of money. By carefully considering these factors and implementing strategic cost management practices, businesses can harness the power of GenAI while keeping expenses under control. It's crucial to regularly review and adjust your approach as both the technology and pricing models in this field are rapidly evolving.