MCP Server Rate Limiting and Throttling
Author: Venkata Sudhakar
When multiple ADK agents share an MCP server, a single agent can exhaust downstream API quotas or degrade performance for others. Rate limiting at the MCP server layer ensures fair usage across callers, protects external APIs from overload, and makes tool call behaviour predictable under high concurrency.

In this tutorial, you will add a token bucket rate limiter to an MCP server. Each caller is tracked by a client ID passed as a tool argument. If the bucket is empty, the server returns an error message instead of calling the downstream API. This approach requires no external dependencies beyond the Python standard library.

The implementation below uses a simple in-memory token bucket: each client starts with a full bucket of tokens, one token is consumed per tool call, and tokens refill at a fixed rate based on the elapsed time since the last call.
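A minimal sketch of that token bucket, using only the standard library. The class and parameter names (TokenBucket, RateLimiter, capacity, refill_rate) are illustrative choices, not part of any MCP SDK; in the actual server, the MCP tool handler would call RateLimiter.allow with the client ID argument before invoking the downstream API.

```python
import threading
import time


class TokenBucket:
    """One client's bucket: up to `capacity` tokens, refilled at `refill_rate` per second."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity          # every client starts with a full bucket
        self.last_refill = time.monotonic()

    def try_consume(self) -> bool:
        """Consume one token if available; refill first based on elapsed time."""
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class RateLimiter:
    """Tracks one bucket per client ID; the lock keeps concurrent tool calls safe."""

    def __init__(self, capacity: float = 5, refill_rate: float = 1.0):
        self._buckets: dict[str, TokenBucket] = {}
        self._lock = threading.Lock()
        self._capacity = capacity
        self._refill_rate = refill_rate

    def allow(self, client_id: str) -> bool:
        with self._lock:
            bucket = self._buckets.setdefault(
                client_id, TokenBucket(self._capacity, self._refill_rate)
            )
            return bucket.try_consume()
```

Inside a tool handler, a denied call would return an error string such as "rate limit exceeded, retry later" to the agent instead of hitting the downstream API.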
The test below simulates burst calls from two clients. Calls within the token budget succeed immediately. Once the bucket empties, the server returns a rate limit error that the agent can surface to the user or handle with a retry strategy.
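The burst behaviour can be exercised with a short self-contained simulation. The call_tool function, client names, and the three-token budget below are illustrative assumptions standing in for the real MCP tool handler:

```python
import time

# Hypothetical per-client bucket state, mirroring the in-memory design above:
# client_id -> [remaining_tokens, last_refill_timestamp]
CAPACITY, REFILL_RATE = 3, 1.0          # 3 tokens per client, 1 token/second refill
buckets: dict[str, list[float]] = {}


def call_tool(client_id: str) -> str:
    """Stand-in for an MCP tool handler: check the bucket before doing real work."""
    now = time.monotonic()
    tokens, last = buckets.setdefault(client_id, [CAPACITY, now])
    tokens = min(CAPACITY, tokens + (now - last) * REFILL_RATE)
    if tokens >= 1:
        buckets[client_id] = [tokens - 1, now]
        return "ok"
    buckets[client_id] = [tokens, now]
    return "error: rate limit exceeded, retry later"


# Burst of 5 rapid calls from each of two clients: the first 3 succeed,
# then the empty bucket produces rate limit errors the agent can retry.
for client in ("agent-a", "agent-b"):
    results = [call_tool(client) for _ in range(5)]
    print(client, results)
```

Because refill depends only on elapsed time, a client that pauses for a few seconds regains tokens and its calls succeed again without any reset logic.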
This pattern works well when a shared MCP server is accessed by many agents concurrently. For production use, replace the in-memory dictionary with Redis or Memorystore to share rate limit state across multiple server instances. You can also apply different rate limits per tool or per client tier.
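Per-tool and per-tier limits reduce to a lookup that selects bucket parameters before the check runs. The tier names, tool names, and numbers below are hypothetical; in a Redis-backed deployment only the bucket state moves out of process, while a static table like this can stay in the server:

```python
# Hypothetical (tier, tool) -> bucket parameters table. The bucket state
# itself would live in Redis or Memorystore in production, not here.
LIMITS = {
    ("free", "web_search"):    {"capacity": 5,  "refill_rate": 0.5},
    ("free", "summarize"):     {"capacity": 10, "refill_rate": 1.0},
    ("premium", "web_search"): {"capacity": 50, "refill_rate": 5.0},
}
DEFAULT_LIMIT = {"capacity": 5, "refill_rate": 0.5}


def limit_for(tier: str, tool: str) -> dict:
    """Pick bucket parameters for this caller and tool, falling back to a default."""
    return LIMITS.get((tier, tool), DEFAULT_LIMIT)
```

The tool handler would call limit_for first and construct (or fetch) the matching bucket, so premium callers burst higher without any change to the limiting logic itself.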