Gemini API Rate Limiting and Retry

Author: Venkata Sudhakar

The Gemini API enforces rate limits on requests per minute (RPM) and tokens per minute (TPM). In production applications, hitting these limits causes 429 errors that must be handled gracefully with exponential backoff and retry logic. ShopMax India implements a robust retry wrapper around all Gemini calls to ensure uninterrupted service during peak traffic hours.

The recommended pattern is exponential backoff with jitter: on a 429 response, wait before retrying, doubling the delay on each attempt and adding a small random component (jitter) so that many clients do not all retry at the same instant, the so-called thundering herd problem. In Python, the tenacity library simplifies this pattern. For sustained high volume, request a quota increase via the Google Cloud Console.
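To make the delay schedule concrete, here is a minimal sketch of how the wait time could grow across attempts. It assumes a base delay of 1 second and up to 1 second of jitter; the helper name backoff_delay and the max_jitter parameter are illustrative assumptions, not part of any library.

```python
import random


def backoff_delay(attempt, base_delay=1.0, max_jitter=1.0, rng=random):
    """Seconds to wait before retry number `attempt` (0-based).

    The exponential part doubles on every attempt; the jitter part is a
    random value in [0, max_jitter] that spreads retries out in time.
    """
    return base_delay * (2 ** attempt) + rng.uniform(0, max_jitter)


# Seeded generator, used here only so repeated runs print the same schedule.
rng = random.Random(42)
delays = [backoff_delay(attempt, rng=rng) for attempt in range(5)]

for attempt, delay in enumerate(delays):
    print(f"attempt {attempt}: wait about {delay:.2f}s")
```

With these assumed defaults, the waits fall roughly in the ranges 1-2s, 2-3s, 4-5s, 8-9s and 16-17s. A production wrapper would usually also cap the maximum delay.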

The example below shows a retry wrapper with exponential backoff for Gemini API calls.

import google.generativeai as genai
import time
import random
from google.api_core.exceptions import ResourceExhausted, ServiceUnavailable

genai.configure(api_key="your-api-key")
model = genai.GenerativeModel("gemini-2.0-flash")

def call_with_retry(prompt, max_retries=5, base_delay=1.0):
    """Call Gemini, retrying on rate-limit and availability errors."""
    for attempt in range(max_retries):
        try:
            response = model.generate_content(prompt)
            return response.text
        except ResourceExhausted:
            # 429: quota exceeded. Give up after the final attempt.
            if attempt == max_retries - 1:
                raise
            # Exponential backoff with jitter
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            print(f"Rate limit hit. Retrying in {delay:.1f}s (attempt {attempt + 1}/{max_retries})")
            time.sleep(delay)
        except ServiceUnavailable:
            # 503: transient server error. Back off without jitter.
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

result = call_with_retry("Summarise ShopMax India return policy in 2 sentences.")
print(result)

It gives the following output:

ShopMax India accepts returns within 10 days of delivery for electronics in original condition with all accessories. Damaged or used items are not eligible unless the damage was present at delivery.

The example below shows a concurrent request manager that throttles parallel Gemini calls to stay within quota limits.
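The listing for this example is not present in this copy of the article. As a stand-in, here is a minimal sketch of one way such a manager might look, assuming asyncio and an asyncio.Semaphore to cap the number of calls in flight. The names run_throttled, fake_generate and MAX_CONCURRENT are hypothetical, and fake_generate merely simulates a model call so that the sketch runs without an API key.

```python
import asyncio

MAX_CONCURRENT = 2  # assumed cap on parallel calls; tune to your own quota


async def fake_generate(prompt):
    """Hypothetical stand-in for an async Gemini call (runs offline)."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"response to: {prompt}"


async def run_throttled(prompts, generate=fake_generate, limit=MAX_CONCURRENT):
    """Run generate() for every prompt with at most `limit` calls in flight.

    Returns the results in prompt order plus the peak concurrency observed.
    """
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def worker(prompt):
        nonlocal in_flight, peak
        async with semaphore:  # waits while `limit` calls are already running
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await generate(prompt)
            finally:
                in_flight -= 1

    # asyncio.gather returns results in the same order as its inputs.
    results = await asyncio.gather(*(worker(p) for p in prompts))
    return results, peak


prompts = [f"Write a one-line tagline for ShopMax Product {i}." for i in range(1, 6)]
results, peak = asyncio.run(run_throttled(prompts))

for i, text in enumerate(results, start=1):
    print(f"Product {i}: {text}")
print(f"Peak concurrent calls: {peak}")
```

In a real application, the generate argument would be a coroutine that calls the Gemini client, ideally wrapped in retry logic like call_with_retry above. With the stand-in, each printed line simply echoes its prompt; real responses come from the model and vary between runs.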

The article's example gives the following output:

Product 1: ShopMax Product 1 delivers premium quality at an affordable price for Indian homes.
Product 2: ShopMax Product 2 combines modern design with reliable performance.
Product 3: ShopMax Product 3 is the top choice for value-conscious buyers in India.
Product 4: ShopMax Product 4 offers cutting-edge features backed by a 1-year warranty.
Product 5: ShopMax Product 5 is built for durability and everyday convenience.

ShopMax India handles peak loads of 200+ concurrent Gemini requests during sale events. Semaphore-based concurrency control, combined with exponential backoff, keeps the error rate below 0.1% even during Diwali sale traffic spikes. For sustained high-volume needs, the team uses the Vertex AI endpoint, which offers dedicated quota separate from the shared API pool.
