ADK Agent Cost Optimisation Strategies
Author: Venkata Sudhakar
Gemini API costs scale with token consumption. For production ADK agents handling thousands of conversations daily, unoptimised prompts and redundant API calls can lead to significant unnecessary spend. Cost optimisation is therefore a first-class engineering concern alongside accuracy and latency.
ShopMax India runs ADK agents for customer service, product recommendations, and inventory queries. By applying model tiering, context pruning, response caching, and batch processing, the engineering team reduced API spend by 60% without degrading response quality for end users.
The following example shows model tiering: an ADK classification layer routes simple queries to a cheaper model and complex queries to a more capable one.
It gives the following output:
Query: What are your store hours in Mumbai?
Model: gemini-2.0-flash | Complexity: SIMPLE | Tokens: 48
Query: Analyse Q1 sales trends and recommend inventory adj...
Model: gemini-2.0-pro | Complexity: COMPLEX | Tokens: 1247
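One way to picture the routing decision behind output like this is the minimal sketch below. It is not the ShopMax India implementation: the keyword-and-length heuristic stands in for the ADK classification layer, and the names `classify`, `route`, `COMPLEX_HINTS`, and `max_simple_words` are hypothetical. The model names are copied from the sample output, and no API call is made.

```python
from dataclasses import dataclass

# Model names copied from the sample output; replace them with the models
# your project actually has access to.
CHEAP_MODEL = "gemini-2.0-flash"
CAPABLE_MODEL = "gemini-2.0-pro"

# Hypothetical signal words suggesting the query needs multi-step reasoning.
COMPLEX_HINTS = ("analyse", "analyze", "recommend", "compare", "forecast", "trend")


@dataclass(frozen=True)
class Route:
    model: str
    complexity: str


def classify(query: str, max_simple_words: int = 12) -> str:
    """Return "COMPLEX" or "SIMPLE" using a keyword and length heuristic."""
    text = query.lower()
    if any(hint in text for hint in COMPLEX_HINTS):
        return "COMPLEX"
    if len(text.split()) > max_simple_words:
        return "COMPLEX"
    return "SIMPLE"


def route(query: str) -> Route:
    """Pick the cheaper model unless the query is classified as complex."""
    complexity = classify(query)
    model = CAPABLE_MODEL if complexity == "COMPLEX" else CHEAP_MODEL
    return Route(model=model, complexity=complexity)


if __name__ == "__main__":
    for q in (
        "What are your store hours in Mumbai?",
        "Analyse Q1 sales trends and recommend inventory adjustments",
    ):
        r = route(q)
        print(f"Query: {q}")
        print(f"Model: {r.model} | Complexity: {r.complexity}")
```

A heuristic like this costs nothing to run but misclassifies queries that use unexpected wording. A production system might instead send the query to the cheap model for classification, at the price of one extra small call per request.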
The following example shows response caching and context pruning, which avoid redundant API calls and keep prompt sizes small for ShopMax India agent sessions.
It gives the following output:
[API CALL] tokens=112
ShopMax India accepts returns within 30 days of purchase with original receipt...
--- Second call ---
[CACHE HIT] key=a3f91c2d
ShopMax India accepts returns within 30 days of purchase with original receipt...
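The two techniques can be sketched in plain Python as below. This is a minimal sketch under stated assumptions, not the original listing: `CachedResponder`, `cache_key`, `prune_history`, and `fake_model` are hypothetical names, the cache is an in-memory dictionary, and the model call is a stub so that the code runs without credentials. The 8-character key mirrors the format in the sample output, but the choice of SHA-256 is an assumption.

```python
import hashlib
from typing import Callable, Dict, List


def cache_key(prompt: str) -> str:
    """Return the first 8 hex chars of SHA-256 over the normalised prompt."""
    # Lower-casing and collapsing whitespace lets trivially different
    # phrasings of the same prompt share one cache entry.
    normalised = " ".join(prompt.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()[:8]


class CachedResponder:
    """Wrap a model-calling function with an in-memory response cache."""

    def __init__(self, call_model: Callable[[str], str]) -> None:
        self._call_model = call_model
        self._cache: Dict[str, str] = {}
        self.api_calls = 0  # counts real (non-cached) calls

    def ask(self, prompt: str) -> str:
        key = cache_key(prompt)
        if key in self._cache:
            print(f"[CACHE HIT] key={key}")
            return self._cache[key]
        self.api_calls += 1
        print("[API CALL]")
        answer = self._call_model(prompt)
        self._cache[key] = answer
        return answer


def prune_history(turns: List[str], max_turns: int = 6) -> List[str]:
    """Keep only the most recent max_turns conversation turns."""
    if max_turns <= 0:
        return []
    return turns[-max_turns:]


def fake_model(prompt: str) -> str:
    """Stub standing in for a real Gemini call."""
    return f"stub answer for: {prompt}"


if __name__ == "__main__":
    responder = CachedResponder(fake_model)
    print(responder.ask("What is the return policy?"))
    print("--- Second call ---")
    print(responder.ask("What is the return policy?"))
    print(prune_history(["t1", "t2", "t3", "t4"], max_turns=2))
```

Truncating the hash to 8 characters keeps log lines short but raises the chance of key collisions as the cache grows, so a longer key is safer at scale. Cached answers to policy questions also need an expiry or invalidation rule, which this sketch omits.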
Combining model tiering, response caching, and context pruning in ShopMax India ADK agents delivered a 60% reduction in monthly API spend while maintaining over 95% customer satisfaction scores. These techniques are straightforward to layer into existing ADK agent architectures without structural changes.