Gemini API with FastAPI and Async Python
Author: Venkata Sudhakar
FastAPI with async Python lets a Gemini API service handle many concurrent requests without blocking: while one request waits on a Gemini response, the server processes other incoming requests in parallel. ShopMax India serves 500+ concurrent customer support queries from a single FastAPI instance with async Gemini calls.

The Gemini Python SDK supports async via generate_content_async(). Combined with FastAPI's async endpoints, this gives fully non-blocking request handling. Adding a request semaphore keeps the service within rate limits without rejecting requests: excess calls simply queue until a slot frees up. The example below shows how ShopMax India builds an async FastAPI service for customer support queries.
Handling multiple concurrent requests asynchronously,
It gives the following output,
cust_0: Laptops at ShopMax India have a 7-day replacement policy... [89 tokens]
cust_1: Yes, Samsung Galaxy S24 is available in 128GB and 256GB... [76 tokens]
cust_2: EMI is available from Rs 5,000. For Rs 50,000 you can... [94 tokens]
cust_3: Delivery to Hyderabad typically takes 2-3 business days... [71 tokens]
cust_4: Yes, ShopMax India offers exchange for old electronics... [83 tokens]
All 5 requests completed concurrently in 1.2s
Deploy this FastAPI app on Cloud Run with --concurrency 80 and --min-instances 2. Because the app is fully async, each Cloud Run instance can serve up to 80 simultaneous connections before autoscaling spins up another, keeping costs low while maintaining fast response times for ShopMax India customers.
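A deploy command along those lines might look like this; the service name, region, and key handling are placeholders, not ShopMax India's actual setup:

```shell
# Deploy from source with the concurrency settings discussed above.
gcloud run deploy shopmax-support \
  --source . \
  --region asia-south1 \
  --concurrency 80 \
  --min-instances 2 \
  --set-env-vars GEMINI_API_KEY=your-key-here
```

In production, prefer --set-secrets with Secret Manager over passing the API key as a plain environment variable.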