Agent Rate Limiting and Abuse Prevention
Author: Venkata Sudhakar
When you expose ADK agents as API services, you must protect them from abuse. Without rate limiting, a single bad actor or runaway client can exhaust your Gemini API quota, spike costs, and degrade service for all other users. Rate limiting enforces fair usage by tracking request counts per user and rejecting excess requests before they reach the LLM.
The pattern uses an in-memory token bucket or sliding window counter per user ID. A FunctionTool checks the counter before each agent action and raises an error if the limit is exceeded. In production, replace the in-memory store with Redis to share rate state across multiple agent instances and survive restarts.
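The sliding window variant of this pattern can be sketched in a few lines of plain Python. This is a minimal illustration, not a definitive implementation: the limit of 5 requests per 60 seconds is an assumption taken from the "x/5 requests used this minute" wording in the sample output, and the names check_rate_limit and RateLimitExceeded, as well as the layout of REQUEST_LOG, are hypothetical. The optional now argument exists only so the behaviour can be exercised without waiting for real time to pass.

```python
import time
from collections import defaultdict, deque
from typing import Optional

MAX_REQUESTS = 5      # assumed limit per user per window
WINDOW_SECONDS = 60   # assumed window length in seconds

# Hypothetical in-memory store: user_id -> timestamps of accepted requests
REQUEST_LOG = defaultdict(deque)


class RateLimitExceeded(Exception):
    """Raised when a user has no requests left in the current window."""


def check_rate_limit(user_id: str, now: Optional[float] = None) -> int:
    """Record one request for user_id and return the count used in the window.

    Raises RateLimitExceeded instead of recording the request when the
    user is already at the limit.
    """
    now = time.monotonic() if now is None else now
    timestamps = REQUEST_LOG[user_id]

    # Discard timestamps that have slid out of the window.
    while timestamps and now - timestamps[0] >= WINDOW_SECONDS:
        timestamps.popleft()

    if len(timestamps) >= MAX_REQUESTS:
        raise RateLimitExceeded(
            f"{user_id}: limit of {MAX_REQUESTS} requests "
            f"per {WINDOW_SECONDS}s reached"
        )

    timestamps.append(now)
    return len(timestamps)
```

A function like this could be exposed to the agent as a tool and called before any other action. In this sketch rejected attempts are not recorded, so a blocked client regains capacity as soon as its oldest accepted request leaves the window; counting rejected attempts as well would be a stricter alternative.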
The example below shows ShopMax India protecting its customer-facing product recommendation agent from abuse.
It gives the following output:
Query: user_id=cust_001 | Recommend a laptop under Rs...
Response: Great news! I found a match for you.
Product: Lenovo IdeaPad | Price: Rs 45,000 | Rating: 4.3/5
This laptop fits within your Rs 50,000 budget. (1/5 requests used this minute)
Query: user_id=cust_001 | Recommend a TV under Rs 400...
Response: Found it! LG 43-inch 4K TV at Rs 35,000.
Rating: 4.5/5 - Excellent choice for home entertainment. (2/5 requests used)
The rate limiter runs before any LLM call, so blocked requests consume zero Gemini API tokens. For ShopMax India in production, store REQUEST_LOG in Redis with automatic key expiry matching the window size. Add per-IP limits alongside per-user limits to catch unauthenticated abuse. Log all rate limit events to Cloud Logging and alert when any single user uses more than 80% of their limit, since this can signal a scraping attempt or a client-side bug that needs fixing.