Agent Rate Limiting and Abuse Prevention

Author: Venkata Sudhakar

When you expose ADK agents as API services, you must protect them from abuse. Without rate limiting, a single bad actor or runaway client can exhaust your Gemini API quota, spike costs, and degrade service for all other users. Rate limiting enforces fair usage by tracking request counts per user and rejecting excess requests before they reach the LLM.

The pattern uses an in-memory token bucket or sliding window counter per user ID. A FunctionTool checks the counter before each agent action and, if the limit is exceeded, returns a denial result that includes how long the caller should wait. In production, replace the in-memory store with Redis to share rate state across multiple agent instances and survive restarts.
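
The example later in this article uses the sliding window variant. For comparison, a token bucket could look roughly like the sketch below. It is an illustration under assumptions rather than part of the ShopMax code: the names `TokenBucket`, `BUCKETS` and `allow` are hypothetical, and the capacity of 5 with a refill of 5 tokens per 60 seconds is chosen only to mirror the limit used in the example.

```python
import time


class TokenBucket:
    """Holds up to `capacity` tokens and refills at `refill_per_sec` tokens per second."""

    def __init__(self, capacity, refill_per_sec, now):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)   # a new user starts with a full bucket
        self.updated_at = now

    def try_consume(self, now):
        # Credit the tokens earned since the last call, capped at capacity.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated_at = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Hypothetical in-memory store: {user_id: TokenBucket}
BUCKETS = {}


def allow(user_id, capacity=5, refill_per_sec=5 / 60, now=None):
    """Return True if user_id may make one more request right now."""
    now = time.monotonic() if now is None else now
    bucket = BUCKETS.get(user_id)
    if bucket is None:
        bucket = BUCKETS[user_id] = TokenBucket(capacity, refill_per_sec, now)
    return bucket.try_consume(now)
```

Unlike the sliding window, a token bucket permits a short burst up to `capacity` and then settles to the steady refill rate, which can suit clients that send requests in clusters.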

The example below shows ShopMax India protecting its customer-facing product recommendation agent from abuse.

import time
from collections import defaultdict

from google.adk.agents import LlmAgent
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.adk.tools import FunctionTool

# In-memory rate limit store: {user_id: [timestamp, ...]}
REQUEST_LOG = defaultdict(list)
RATE_LIMIT = 5    # max requests
WINDOW_SECS = 60  # per minute

def check_rate_limit(user_id: str) -> dict:
    # Sliding window rate limiter
    now = time.time()
    window_start = now - WINDOW_SECS
    # Remove timestamps outside the window
    REQUEST_LOG[user_id] = [t for t in REQUEST_LOG[user_id] if t > window_start]
    count = len(REQUEST_LOG[user_id])
    if count >= RATE_LIMIT:
        retry_after = int(REQUEST_LOG[user_id][0] + WINDOW_SECS - now) + 1
        return {
            "allowed": False,
            "requests_made": count,
            "limit": RATE_LIMIT,
            "retry_after_seconds": retry_after
        }
    REQUEST_LOG[user_id].append(now)
    return {
        "allowed": True,
        "requests_made": count + 1,
        "limit": RATE_LIMIT,
        "remaining": RATE_LIMIT - count - 1
    }

def get_product_recommendation(category: str, budget_rs: int) -> dict:
    # Simulated product recommendation tool
    catalog = {
        "laptop": {"name": "Lenovo IdeaPad", "price": 45000, "rating": 4.3},
        "tv": {"name": "LG 43-inch 4K", "price": 35000, "rating": 4.5},
        "phone": {"name": "Redmi Note 13", "price": 18000, "rating": 4.4}
    }
    rec = catalog.get(category.lower())
    if not rec:
        return {"error": "Category not found"}
    if rec["price"] > budget_rs:
        return {"error": "No product in budget", "cheapest": rec["price"]}
    return rec

rate_tool = FunctionTool(func=check_rate_limit)
rec_tool = FunctionTool(func=get_product_recommendation)

agent = LlmAgent(
    name="shopmax_advisor",
    model="gemini-2.0-flash",
    tools=[rate_tool, rec_tool],
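    # Plain wording instead of {retry_after_seconds}: ADK treats {name} in an
    # instruction as a session-state placeholder.
    instruction="""
    You are the ShopMax India product advisor.
    ALWAYS call check_rate_limit with the user_id first.
    If allowed is False, reply ONLY with: "Rate limit exceeded. Please wait N seconds.",
    replacing N with the retry_after_seconds value returned by the tool.
    If allowed is True, help the user find a product.
    """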
)

session_service = InMemorySessionService()
runner = Runner(agent=agent, app_name="shopmax", session_service=session_service)
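# Note: in recent ADK releases create_session is a coroutine; there, await it
# or wrap this call in asyncio.run(...).
session = session_service.create_session(app_name="shopmax", user_id="cust_001")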

queries = [
    "user_id=cust_001 | Recommend a laptop under Rs 50000",
    "user_id=cust_001 | Recommend a TV under Rs 40000",
]

from google.genai import types
for query in queries:
    msg = types.Content(role="user", parts=[types.Part(text=query)])
    print(f"Query: {query[:46]}...")
    for event in runner.run(user_id="cust_001", session_id=session.id, new_message=msg):
        if event.is_final_response():
            print(f"Response: {event.content.parts[0].text}")
    print()

It produces output similar to the following (the exact wording varies between runs because the model writes the responses):

Query: user_id=cust_001 | Recommend a laptop under Rs...
Response: Great news! I found a match for you.
Product: Lenovo IdeaPad | Price: Rs 45,000 | Rating: 4.3/5
This laptop fits within your Rs 50,000 budget. (1/5 requests used this minute)

Query: user_id=cust_001 | Recommend a TV under Rs 400...
Response: Found it! LG 43-inch 4K TV at Rs 35,000.
Rating: 4.5/5 - Excellent choice for home entertainment. (2/5 requests used)
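
One way to guarantee that a blocked request never reaches the model is to run the check in application code and hand the message to the agent only when it passes. The sketch below illustrates that arrangement under assumptions: `call_agent` is a hypothetical callable standing in for the `runner.run` loop, and `SlidingWindowLimiter`, `guarded_call` and the injectable `clock` are illustrative names, not part of ADK.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Sliding window log: at most `limit` requests per `window_secs` for each key."""

    def __init__(self, limit=5, window_secs=60, clock=time.monotonic):
        self.limit = limit
        self.window_secs = window_secs
        self.clock = clock              # injectable so the limiter can be tested
        self._log = defaultdict(deque)  # {key: deque of request timestamps}

    def check(self, key):
        """Return (allowed, retry_after_seconds) and record the request if allowed."""
        now = self.clock()
        stamps = self._log[key]
        # Drop timestamps that have slid out of the window.
        while stamps and stamps[0] <= now - self.window_secs:
            stamps.popleft()
        if len(stamps) >= self.limit:
            return False, int(stamps[0] + self.window_secs - now) + 1
        stamps.append(now)
        return True, 0


def guarded_call(limiter, user_id, message, call_agent):
    """Check the limit first; invoke the (hypothetical) agent callable only if allowed."""
    allowed, retry_after = limiter.check(user_id)
    if not allowed:
        return f"Rate limit exceeded. Please wait {retry_after} seconds."
    return call_agent(user_id, message)
```

With this arrangement the user ID comes from the authenticated request rather than from text the model parses, and a rejected request returns before any agent code runs.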

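As written, the check runs as a tool that the model decides to call, so every request, including a blocked one, still costs at least one Gemini call, and enforcement depends on the model following its instruction. To make blocked requests consume zero Gemini API tokens, call check_rate_limit in application code before runner.run(). For ShopMax India in production, store REQUEST_LOG in Redis with automatic key expiry matching the window size. Add per-IP limits alongside per-user limits to catch unauthenticated abuse. Log all rate limit events to Cloud Logging and alert when any single user exhausts more than 80% of their limit, since this signals a potential scraping attempt or a client-side bug that needs fixing.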
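
The production suggestions above could be sketched as follows. This is a rough sketch under assumptions, not the article's implementation: it swaps the sliding window for a simpler fixed-window counter built on the Redis INCR and EXPIRE commands, the key names (`rl:user:...`, `rl:ip:...`) and the `hit` and `admit` helpers are hypothetical, and the standard `logging` module stands in for Cloud Logging. `client` is assumed to be a redis-py style client exposing `incr` and `expire`.

```python
import logging

logger = logging.getLogger("rate_limit")

RATE_LIMIT = 5      # max requests per window (same values as the example above)
WINDOW_SECS = 60
ALERT_RATIO = 0.8   # warn once a caller has used more than 80% of the limit


def hit(client, key, limit=RATE_LIMIT, window=WINDOW_SECS):
    """Count one request against `key`; return (allowed, count)."""
    count = client.incr(key)        # atomic increment; creates the key at 1
    if count == 1:
        client.expire(key, window)  # key expiry matches the window size
    if count > limit * ALERT_RATIO:
        logger.warning("rate limit pressure: key=%s count=%d limit=%d", key, count, limit)
    return count <= limit, count


def admit(client, user_id, ip):
    """Apply the per-IP limit and the per-user limit; both must pass."""
    ip_ok, _ = hit(client, f"rl:ip:{ip}")
    user_ok, _ = hit(client, f"rl:user:{user_id}")
    return ip_ok and user_ok
```

With redis-py this might be called as `admit(redis.Redis(), "cust_001", "203.0.113.7")`. INCR and EXPIRE are issued as two separate commands here, so a crash between them could leave a counter without an expiry; a transaction or Lua script would close that gap.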
