|
|
Gemini Live API - Voice Customer Support Agent
Author: Venkata Sudhakar
The Gemini Live API enables real-time, bidirectional audio conversations with Gemini models. Unlike text-based chat, Live API streams audio in and out simultaneously, enabling natural spoken dialogue with sub-second latency. This is ideal for voice IVR systems, call centre automation, and voice-enabled kiosks.
In this tutorial, we demonstrate the Gemini Live API setup for a ShopMax India voice support agent. The agent listens to customer audio, processes queries about orders and products, and responds with synthesised speech - all in a continuous streaming session.
The below example shows how to configure and run a Live API session with function calling for a voice support scenario.
Now run the async Live API session loop,
It gives the following output,
Voice agent ready. Sending simulated customer query...
Audio chunk received: 4096 bytes
Audio chunk received: 4096 bytes
Audio chunk received: 3200 bytes
Transcript: Your order ORD-1001 is currently out for delivery and is expected
to arrive today by 6 PM in Mumbai. Is there anything else I can help you with?
Turn complete.
In production, replace the text prompt with real-time PCM audio bytes from a microphone stream, and pipe the received audio chunks to a speaker. The Live API supports interruption handling - the model stops speaking when it detects the user talking. Use session.send_realtime_input() for continuous microphone streaming.
|
|