
LLM Cost Optimisation with Model Routing

Author: Venkata Sudhakar

Not every customer question needs your most powerful and most expensive LLM. A customer asking "What are your store hours?" needs a simple factual answer; routing that to GPT-4o is like sending a courier by private jet. A customer asking for a personalised financial plan comparing three loan products genuinely needs deeper reasoning. Model routing analyses each incoming query and sends it to the cheapest model that can handle it well. For a business chatbot handling ten thousand queries per day, smart routing can cut the LLM bill by 60-80% without a noticeable drop in answer quality.
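Where does a saving of that size come from? The following back-of-the-envelope sketch assumes blended prices of about $0.30 per million tokens for the mini tier (assumed here to be gpt-4o-mini) and $5.00 per million for gpt-4o. These are the rates implied by the output table later in this article, not official list prices. It also assumes each query uses the same number of tokens on either model. The function name `routing_saving` and its parameters are illustrative.

```python
# Assumed blended prices in USD per million tokens, back-calculated from the
# output table in this article; they are not official list prices.
PRICE_PER_M = {"gpt-4o-mini": 0.30, "gpt-4o": 5.00}


def routing_saving(premium_share: float, query_tokens: int = 200,
                   classifier_tokens: int = 0) -> float:
    """Fraction of the all-premium bill saved when only `premium_share`
    of queries reach gpt-4o and the rest go to gpt-4o-mini.

    Assumes each query uses `query_tokens` on either model; the optional
    `classifier_tokens` charge models the routing call on the mini model.
    """
    mini = PRICE_PER_M["gpt-4o-mini"] / 1_000_000
    premium = PRICE_PER_M["gpt-4o"] / 1_000_000
    baseline = query_tokens * premium  # every query answered by gpt-4o
    routed = (query_tokens * (premium_share * premium + (1 - premium_share) * mini)
              + classifier_tokens * mini)  # routing call charged at the mini rate
    return 1 - routed / baseline


for share in (0.05, 0.10, 0.30):
    print(f"{share:.0%} of traffic on gpt-4o -> {routing_saving(share):.1%} saved")
```

When the routing call is not charged, the token count cancels out. The saving then depends only on the premium share and the price ratio between the tiers (here 0.30 / 5.00 = 0.06). Real savings also depend on how many tokens each model actually spends per answer, which this sketch holds fixed.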

The routing step itself is a fast, cheap classification call: ask a small model to rate the complexity of the query. Simple queries (greetings, FAQs, yes/no checks) go to a mini model. Medium queries (product comparisons, eligibility checks, short summaries) also go to the cost-effective model. Only genuinely complex queries (multi-step financial analysis, detailed complaint resolution, long document drafting) go to the premium model. You pay premium prices only for the small fraction of queries that truly need it, typically 5-10% of total volume.
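The following minimal sketch shows the classification step. It assumes the small model is asked to reply with a single tier word and that the mini model is gpt-4o-mini. The prompt text, the `complete` callable and the `fake_complete` stand-in are hypothetical; in production, `complete` would wrap your provider's chat-completion call.

```python
from typing import Callable, Tuple

# Tier -> model mapping described above; "gpt-4o-mini" is an assumed name for the mini model.
TIER_TO_MODEL = {"SIMPLE": "gpt-4o-mini", "MEDIUM": "gpt-4o-mini", "COMPLEX": "gpt-4o"}

# Hypothetical classifier prompt; refine the wording on your own traffic.
CLASSIFIER_PROMPT = (
    "Classify this bank customer query by how much reasoning it needs. "
    "Reply with one word: SIMPLE (greetings, FAQs, yes/no checks), "
    "MEDIUM (product comparisons, eligibility checks, short summaries) or "
    "COMPLEX (multi-step financial analysis, complaint resolution, long drafting).\n\n"
    "Query: {query}"
)


def route(query: str, complete: Callable[[str], str]) -> Tuple[str, str]:
    """Return (tier, model) for `query`.

    `complete` is a hypothetical callable that sends a prompt to the small
    classifier model and returns its text reply.
    """
    reply = complete(CLASSIFIER_PROMPT.format(query=query)).strip().upper()
    tier = reply.split()[0].strip(".,:") if reply else ""
    if tier not in TIER_TO_MODEL:
        tier = "COMPLEX"  # design choice: unreadable replies fail safe to the premium model
    return tier, TIER_TO_MODEL[tier]


def fake_complete(prompt: str) -> str:
    # Offline stand-in for the small-model call so the sketch runs without an API key.
    query = prompt.rsplit("Query:", 1)[-1].lower()
    return "COMPLEX" if "invest" in query else "SIMPLE"


print(route("What are PrimBank branch timings on Sundays?", fake_complete))
print(route("I have Rs 50 lakhs to invest - should I split...", fake_complete))
```

If the classifier reply cannot be parsed, this sketch sends the query to the premium tier. That costs a little more but protects answer quality. Defaulting to MEDIUM would make the opposite trade-off.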

The example below shows a retail bank routing customer queries across model tiers, measuring the token cost of each query and the total saving over a batch of queries.


Running six representative customer queries and comparing costs,
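The following minimal sketch reproduces the cost accounting for that batch in Python. It is not the author's original listing. It takes the tiers and token counts from the output table, uses the table's truncated queries as-is, and assumes blended rates of $0.30 (mini) and $5.00 (gpt-4o) per million tokens, back-calculated from the table's cost column. The helper names `cost_usd` and `report` are illustrative.

```python
# Assumed blended prices in USD per million tokens, back-calculated from the
# output table (95 mini tokens -> $0.000029, 320 gpt-4o tokens -> $0.0016).
PRICE_PER_M = {"mini": 0.30, "gpt-4o": 5.00}
TIER_TO_MODEL = {"SIMPLE": "mini", "MEDIUM": "mini", "COMPLEX": "gpt-4o"}

# (query, tier, tokens) copied from the output table. In a live run the tier
# would come from the classifier and the tokens from each response's usage data.
RESULTS = [
    ("What are PrimBank branch timings on Sundays?", "SIMPLE", 95),
    ("How do I reset my net banking password?", "SIMPLE", 88),
    ("What is the minimum balance for a savings account?", "SIMPLE", 102),
    ("Can I open a joint account with my spouse online?", "MEDIUM", 145),
    ("Compare your home loan and personal loan interest...", "MEDIUM", 210),
    ("I have Rs 50 lakhs to invest - should I split...", "COMPLEX", 320),
]


def cost_usd(model: str, tokens: int) -> float:
    """Token cost at the assumed blended per-million rate."""
    return tokens * PRICE_PER_M[model] / 1_000_000


def report(results) -> float:
    """Print one row per query and return the total cost with routing."""
    print(f"{'Query':<55} {'Tier':<8} {'Model':<7} {'Tokens':<7} Cost USD")
    total = 0.0
    for query, tier, tokens in results:
        model = TIER_TO_MODEL[tier]
        cost = cost_usd(model, tokens)
        total += cost
        print(f"{query:<55} {tier:<8} {model:<7} {tokens:<7} ${cost:.6f}")
    return total


total = report(RESULTS)
print(f"\nTotal with routing: ${total:.5f}")
```

The $0.00405 baseline in the output comes from sending every query to gpt-4o, whose per-query token counts are not shown, so this sketch does not recompute it.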


It gives the following output,

Query                                                   Tier     Model   Tokens  Cost USD
-----------------------------------------------------------------------------------------------
What are PrimBank branch timings on Sundays?            SIMPLE   mini    95      $0.000029
How do I reset my net banking password?                 SIMPLE   mini    88      $0.000026
What is the minimum balance for a savings account?      SIMPLE   mini    102     $0.000031
Can I open a joint account with my spouse online?       MEDIUM   mini    145     $0.000044
Compare your home loan and personal loan interest...    MEDIUM   mini    210     $0.000063
I have Rs 50 lakhs to invest - should I split...        COMPLEX  gpt-4o  320     $0.001600

Total with routing:    $0.00179
Total without routing: $0.00405
Cost saving: 56%

# 5 of 6 queries routed to the cheap model
# Only the complex investment question went to gpt-4o
# At 10,000 queries per day this mix saves about $3.77 a day (roughly $113 per month); absolute savings scale with tokens per query
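The saving percentage and the volume projection follow directly from the two printed totals. Here is a minimal sketch of that arithmetic, assuming a 30-day month and the 10,000 queries per day mentioned above:

```python
# Figures printed in the output above.
with_routing = 0.00179
without_routing = 0.00405
queries_in_batch = 6

saving_pct = (1 - with_routing / without_routing) * 100
per_query_saving = (without_routing - with_routing) / queries_in_batch

# Projection at 10,000 queries per day over an assumed 30-day month.
daily = per_query_saving * 10_000
monthly = daily * 30

print(f"Saving: {saving_pct:.0f}%")
print(f"Per query: ${per_query_saving:.6f}, per day: ${daily:.2f}, per month: ${monthly:.2f}")
```

With queries of only 100-300 tokens, the absolute saving is modest. It grows linearly with tokens per query (longer system prompts, retrieved context, conversation history) and with volume.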

Start by logging your actual query distribution for one week before building a routing system. You will typically find that 60-70% of queries are simple FAQs, 20-25% are of medium complexity, and only 5-10% genuinely need the premium model. Build and test your routing classifier on that real data. Also combine routing with response caching: if the same FAQ is asked twenty times per hour, cache the first answer and serve the other nineteen without another LLM call. Together, routing and caching can reduce LLM costs by 75-85% for many high-volume business applications.
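The following minimal sketch puts a response cache in front of the router. It assumes exact-match lookup on a normalised query string; semantic caching with embeddings is a common alternative and is not shown. The `answer` callable, the one-hour TTL, the normalisation rule and the `FaqCache` name are illustrative assumptions.

```python
import re
import time
from typing import Callable, Dict, Tuple


def normalise(query: str) -> str:
    """Lowercase, drop punctuation and collapse spaces so trivial variants share a key."""
    return " ".join(re.sub(r"[^\w\s]", "", query.lower()).split())


class FaqCache:
    """Answer repeated queries from memory; call the routed model only on a miss.

    `answer` is a hypothetical callable (query -> reply) wrapping the
    classify-then-call-model step. Entries expire after `ttl_seconds` so stale
    answers (for example, changed branch timings) are eventually refreshed.
    There is no size cap; a production cache would also bound memory.
    """

    def __init__(self, answer: Callable[[str], str], ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self.answer = answer
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries: Dict[str, Tuple[float, str]] = {}
        self.hits = 0
        self.misses = 0

    def ask(self, query: str) -> str:
        key = normalise(query)
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            self.hits += 1
            return entry[1]  # served without any LLM call
        self.misses += 1
        reply = self.answer(query)
        self.entries[key] = (now, reply)
        return reply


calls = []


def routed_answer(query: str) -> str:
    # Stand-in for "route the query, then call the chosen model".
    calls.append(query)
    return f"answer to: {query}"


cache = FaqCache(routed_answer)
for q in ["What are PrimBank branch timings on Sundays?"] * 20 + ["what are primbank branch timings on sundays"]:
    cache.ask(q)
print(cache.hits, cache.misses, len(calls))  # prints: 20 1 1
```

Twenty-one requests, including a lowercase variant without punctuation, cost one model call. The injectable `clock` makes TTL expiry easy to test.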
