Gemini API Rate Limiting and Retry
Author: Venkata Sudhakar
The Gemini API enforces rate limits on requests per minute (RPM) and tokens per minute (TPM). In production applications, exceeding these limits returns HTTP 429 errors, which must be handled gracefully with retry logic. ShopMax India wraps all Gemini calls in a retry wrapper to keep service uninterrupted during peak traffic hours.
The recommended pattern is exponential backoff with jitter: on a 429 response, wait before retrying, increase the delay after each failed attempt, and add a random jitter component so that many clients do not retry at the same instant (the thundering herd problem). The tenacity library simplifies this in Python. For sustained high volume, request a quota increase via the Google Cloud Console.
The example below shows a retry wrapper with exponential backoff for Gemini API calls.
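A minimal sketch of such a wrapper, written with only the standard library rather than tenacity so that the backoff arithmetic is visible. The names `RateLimitError`, `backoff_delay`, and `call_with_retry` are hypothetical and not part of any Gemini SDK; in real code you would catch whatever exception your client raises for HTTP 429, and the default delays here are illustrative assumptions rather than recommended values.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the error a Gemini client raises on HTTP 429."""


def backoff_delay(attempt, base=1.0, cap=60.0, rng=random.random):
    """Delay in seconds before retrying after failed attempt number `attempt` (0-based).

    The exponential part doubles each attempt and is capped at `cap`;
    the jitter part adds a random amount in [0, base).
    """
    exponential = min(cap, base * (2 ** attempt))
    jitter = rng() * base
    return exponential + jitter


def call_with_retry(fn, *args, max_attempts=5, base=1.0, cap=60.0,
                    sleep=time.sleep, rng=random.random, **kwargs):
    """Call fn(*args, **kwargs), retrying on RateLimitError with backoff and jitter.

    `sleep` and `rng` are injectable so the wrapper can be tested without waiting.
    The last RateLimitError is re-raised once `max_attempts` calls have failed.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn(*args, **kwargs)
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(backoff_delay(attempt, base, cap, rng))
```

Here `fn` would be the function that sends a prompt to Gemini and returns the reply text; the wrapper returns that reply as soon as one attempt succeeds.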
It gives the following output:
ShopMax India accepts returns within 10 days of delivery for electronics in original condition with all accessories. Damaged or used items are not eligible unless the damage was present at delivery.
The example below shows a concurrent request manager that throttles parallel Gemini calls to stay within quota limits.
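A minimal sketch of this idea, assuming an asyncio-based application. `run_throttled` is a hypothetical helper, and `generate` stands for any coroutine function that sends one prompt to Gemini and returns the reply; neither name comes from a Gemini SDK. The limit of 5 concurrent calls is an illustrative default, not a documented quota.

```python
import asyncio


async def run_throttled(prompts, generate, max_concurrent=5):
    """Run generate(prompt) for every prompt with at most `max_concurrent` in flight.

    Results are returned in the same order as `prompts`.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def worker(prompt):
        # Each task waits here until one of the `max_concurrent` slots is free.
        async with semaphore:
            return await generate(prompt)

    return await asyncio.gather(*(worker(p) for p in prompts))
```

A caller would build one prompt per product and run `asyncio.run(run_throttled(prompts, generate))`, then print each result next to its product number. Wrapping `generate` in retry logic handles any 429 responses that still occur.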
It gives the following output:
Product 1: ShopMax Product 1 delivers premium quality at an affordable price for Indian homes.
Product 2: ShopMax Product 2 combines modern design with reliable performance.
Product 3: ShopMax Product 3 is the top choice for value-conscious buyers in India.
Product 4: ShopMax Product 4 offers cutting-edge features backed by a 1-year warranty.
Product 5: ShopMax Product 5 is built for durability and everyday convenience.
ShopMax India handles peak loads of 200+ concurrent Gemini requests during sale events. Semaphore-based concurrency control combined with exponential backoff keeps the error rate below 0.1%, even during Diwali sale traffic spikes. For sustained high-volume needs, the team uses the Vertex AI endpoint, which offers dedicated quota separate from the shared API pool.