Gemini API with FastAPI and Async Python
Author: Venkata Sudhakar
FastAPI with async Python lets a Gemini API service handle many concurrent requests without blocking: while one request waits on a Gemini response, the server processes other incoming requests in parallel. ShopMax India serves 500+ concurrent customer support queries from a single FastAPI instance with async Gemini calls.

The Gemini Python SDK supports async via generate_content_async(). Combined with FastAPI's async endpoints, this gives fully non-blocking request handling. Adding a request semaphore keeps the service within rate limits without rejecting requests: excess calls simply queue until a slot frees up. The example below shows how ShopMax India builds an async FastAPI service for customer support queries.
Handling multiple concurrent requests asynchronously,
It gives the following output,
cust_0: Laptops at ShopMax India have a 7-day replacement policy... [89 tokens]
cust_1: Yes, Samsung Galaxy S24 is available in 128GB and 256GB... [76 tokens]
cust_2: EMI is available from Rs 5,000. For Rs 50,000 you can... [94 tokens]
cust_3: Delivery to Hyderabad typically takes 2-3 business days... [71 tokens]
cust_4: Yes, ShopMax India offers exchange for old electronics... [83 tokens]
All 5 requests completed concurrently in 1.2s
Deploy this FastAPI app on Cloud Run with --concurrency 80 and --min-instances 2. Because the app is fully async, each Cloud Run instance can serve up to 80 simultaneous connections before autoscaling spins up another, keeping costs low while maintaining fast response times for ShopMax India customers.
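A deploy command along those lines might look like this; the service name, region, and key handling are placeholders, not ShopMax India's actual setup:

```shell
# Deploy from source with the concurrency settings discussed above.
gcloud run deploy shopmax-support \
  --source . \
  --region asia-south1 \
  --concurrency 80 \
  --min-instances 2 \
  --set-env-vars GEMINI_API_KEY=your-key-here
```

In production, prefer --set-secrets with Secret Manager over passing the API key as a plain environment variable.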