Gemini API Streaming with Server-Sent Events
Author: Venkata Sudhakar
By default, the Gemini API waits until the full response is generated before returning it. For chat interfaces this creates a poor user experience: the screen stays blank for several seconds, then the complete answer appears all at once. Streaming mode returns tokens as they are generated, enabling a typewriter effect that feels responsive. ShopMax India uses streaming for its customer support chat interface.

Streaming is enabled by passing stream=True to generate_content. Instead of a single response object, the call returns an iterator of partial chunks, and each chunk exposes a text attribute containing the newly generated tokens. For web delivery, wrap this iterator in a Flask endpoint that emits Server-Sent Events (SSE). The example below shows how ShopMax India implements streaming Gemini responses for its customer chat interface.
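A minimal sketch of the streaming call, assuming the google-generativeai package; the API key, model name, and prompt are placeholders, and the chunk-printing logic is split into a small helper:

```python
def print_stream(chunks):
    """Consume an iterator of response chunks, printing each chunk's
    newly generated tokens as soon as it arrives."""
    for chunk in chunks:
        print(chunk.text, end="", flush=True)  # flush so tokens appear immediately
    print()

def main():
    # Lazy import so the helper above is usable without the SDK installed;
    # requires `pip install google-generativeai`.
    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")            # placeholder key
    model = genai.GenerativeModel("gemini-1.5-flash")  # assumed model name
    response = model.generate_content(
        "What is the ShopMax India return policy for electronics?",
        stream=True,  # yield partial chunks instead of waiting for the full answer
    )
    print_stream(response)

if __name__ == "__main__":
    main()
```

Because print_stream only relies on the chunks' text attribute, the same helper works for any iterator of partial responses.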
It gives the following output (tokens appear progressively),
ShopMax India Return Policy for Electronics:
1. 7-Day Replacement: All electronics are eligible for replacement
within 7 days of delivery if found defective...
[tokens stream in real-time]
Total tokens: 312
For web chat interfaces, expose the streaming response as Server-Sent Events from a Flask endpoint. The browser receives tokens in real time and appends them to the chat bubble as they arrive.
It gives the following output (SSE stream from the endpoint),
GET /chat/stream?message=What+is+your+return+policy
data: ShopMax
data: India
data: offers
data: a 7-day
data: replacement
data: policy...
data: [DONE]
On the browser side, use the EventSource JavaScript API to consume the SSE stream and append each token to the chat bubble as it arrives. This gives ShopMax India customers a fast, responsive chat experience even for long answers about product specifications or warranty terms.
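A browser-side sketch. The token-folding logic is pulled into a pure helper so it can be reasoned about separately from the DOM wiring; the element id "bot-bubble" and the query wiring are hypothetical:

```javascript
// Pure helper: fold one SSE data payload into the accumulated bubble text.
// Returns null when the [DONE] sentinel arrives, signalling "close the stream".
function foldToken(bubbleText, data) {
  if (data === "[DONE]") return null;
  return bubbleText + (bubbleText ? " " : "") + data;
}

// Browser wiring (assumes a chat bubble element with id "bot-bubble"):
// const source = new EventSource(
//   "/chat/stream?message=" + encodeURIComponent(userMessage)
// );
// let text = "";
// source.onmessage = (event) => {
//   const next = foldToken(text, event.data);
//   if (next === null) { source.close(); return; }  // server sent [DONE]
//   text = next;
//   document.getElementById("bot-bubble").textContent = text;
// };
```

EventSource automatically reconnects on dropped connections, which is why the explicit [DONE] sentinel and source.close() call matter: without them the browser would reopen the stream after the answer finished.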