LLM Cost Optimisation with Model Routing
Author: Venkata Sudhakar
Not every customer question needs your most powerful, most expensive LLM. A customer asking "What are your store hours?" needs a simple factual answer - routing that to GPT-4o is like sending a courier by private jet. A customer asking for a personalised financial plan comparing three loan products genuinely needs deeper reasoning.

Model routing analyses each incoming query and sends it to the cheapest model that can handle it well. For a business chatbot handling ten thousand queries per day, smart routing can reduce your LLM bill by 60-80% with no noticeable drop in quality.

The routing logic itself is a fast, cheap classification call: ask a small model to score the complexity of each query. Simple queries (greetings, FAQs, yes/no checks) go to a mini model. Medium queries (product comparisons, eligibility checks, short summaries) also go to a cost-effective model. Only genuinely complex queries (multi-step financial analysis, detailed complaint resolution, long document drafting) go to the premium model. You pay premium prices only for the small fraction of queries that truly needs it - typically 5-10% of total volume.

The example below shows a retail bank routing customer queries across model tiers, measuring the token cost of each query, and totalling the saving over a batch.
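A minimal sketch of the tiering logic in Python. In production the classifier would itself be a cheap small-model call; here a keyword heuristic stands in for it, and the model names and per-token prices are illustrative assumptions, not real pricing.

```python
# Tier-based model router - a sketch, not a production implementation.
# Per-token rates below are ASSUMED blended prices for illustration only.
PRICE_PER_TOKEN = {
    "mini":   0.0000003,   # cheap tier (assumed rate)
    "gpt-4o": 0.0000050,   # premium tier (assumed rate)
}

# Keyword signals stand in for a small-model complexity classifier.
COMPLEX_SIGNALS = ("invest", "portfolio", "financial plan", "complaint", "draft")
MEDIUM_SIGNALS = ("compare", "eligib", "joint account", "summar")

def classify(query: str) -> str:
    """Score query complexity: SIMPLE, MEDIUM, or COMPLEX."""
    q = query.lower()
    if any(s in q for s in COMPLEX_SIGNALS):
        return "COMPLEX"
    if any(s in q for s in MEDIUM_SIGNALS):
        return "MEDIUM"
    return "SIMPLE"

def route(query: str) -> str:
    """Only COMPLEX queries earn the premium model."""
    return "gpt-4o" if classify(query) == "COMPLEX" else "mini"

def query_cost(query: str, tokens: int) -> float:
    """Cost of serving this query at the routed model's assumed rate."""
    return tokens * PRICE_PER_TOKEN[route(query)]
```

Usage: `route("What are your store hours?")` returns `"mini"`, while an investment-planning question containing a complex signal is routed to `"gpt-4o"`. The key design point is that the classifier runs on every query, so it must be far cheaper than the cost difference it saves.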
Running six representative customer queries and comparing costs gives the following output:
Query                                                 Tier     Model    Tokens  Cost USD
----------------------------------------------------------------------------------------
What are PrimBank branch timings on Sundays?          SIMPLE   mini         95  $0.000029
How do I reset my net banking password?               SIMPLE   mini         88  $0.000026
What is the minimum balance for a savings account?    SIMPLE   mini        102  $0.000031
Can I open a joint account with my spouse online?     MEDIUM   mini        145  $0.000044
Compare your home loan and personal loan interest...  MEDIUM   mini        210  $0.000063
I have Rs 50 lakhs to invest - should I split...      COMPLEX  gpt-4o      320  $0.001600
----------------------------------------------------------------------------------------
Total with routing:    $0.00179
Total without routing: $0.00405
Cost saving:           56%
# 5 of 6 queries routed to the cheap model
# Only the complex investment question went to gpt-4o
# At 10,000 queries per day these per-query savings compound into a significant monthly reduction, and real-world responses with longer outputs widen the gap further
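The batch arithmetic behind a table like this is simple to reproduce. The sketch below uses assumed per-token rates (the same hypothetical mini and premium prices as before), so the exact percentage differs slightly from the figure above, but the shape of the calculation is the same: price each query at its routed model, then compare against pricing every query at the premium rate.

```python
# Back-of-envelope batch saving. Rates are ASSUMED for illustration:
# mini ~ $0.0000003/token, premium ~ $0.000005/token.
MINI, PREMIUM = 3e-7, 5e-6

# (token_count, routed_rate) for a hypothetical six-query batch -
# five routed cheap, one complex query routed premium.
batch = [(95, MINI), (88, MINI), (102, MINI),
         (145, MINI), (210, MINI), (320, PREMIUM)]

with_routing = sum(tokens * rate for tokens, rate in batch)
without_routing = sum(tokens for tokens, _ in batch) * PREMIUM
saving = 1 - with_routing / without_routing
print(f"saving: {saving:.0%}")
```

With these assumed rates the routed batch comes out roughly 60% cheaper, in the same range as the example output.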
Start by logging your actual query distribution for one week before building a routing system. You will typically find that 60-70% of queries are simple FAQs, 20-25% are medium complexity, and only 5-10% genuinely need the premium model. Build your routing classifier on that real data. Then combine routing with response caching: if the same FAQ is asked twenty times an hour, cache the first answer and serve the rest for free. Together, routing and caching reduce LLM costs by 75-85% for most high-volume business applications.
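The caching half can be sketched in a few lines. This is a minimal in-memory cache with a time-to-live, assuming a `fake_llm` stand-in for the real model call; a production system would add better query normalisation (or embedding-based matching) and a shared store such as Redis.

```python
# Minimal TTL response cache - a sketch; fake_llm stands in for an LLM call.
import time

class TTLCache:
    def __init__(self, ttl_seconds: float = 3600):
        self.ttl = ttl_seconds
        self.store = {}  # normalised query -> (answer, timestamp)

    def get_or_compute(self, query, answer_fn):
        """Return (answer, cache_hit). Hits within the TTL cost nothing."""
        key = " ".join(query.lower().split())  # crude normalisation
        hit = self.store.get(key)
        if hit is not None and time.time() - hit[1] < self.ttl:
            return hit[0], True
        answer = answer_fn(query)              # the only paid call
        self.store[key] = (answer, time.time())
        return answer, False

cache = TTLCache()
llm_calls = 0

def fake_llm(q):
    global llm_calls
    llm_calls += 1                             # count paid model calls
    return f"answer to: {q}"

# Twenty identical FAQ requests: only the first pays for a model call.
for _ in range(20):
    answer, was_cached = cache.get_or_compute("What are your branch timings?", fake_llm)
```

After the loop, `llm_calls` is 1 and the other nineteen requests were free cache hits, which is exactly the FAQ scenario described above.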