Gemini Live API - Voice Customer Support Agent

Author: Venkata Sudhakar

The Gemini Live API enables real-time, bidirectional audio conversations with Gemini models. Unlike text-based chat, the Live API streams audio in and out simultaneously, enabling natural spoken dialogue with sub-second latency. This makes it well suited to IVR systems, call-centre automation, and voice-enabled kiosks.

In this tutorial, we set up the Gemini Live API for a ShopMax India voice support agent. The agent listens to customer audio, answers queries about orders and products, and responds with synthesised speech, all within a single continuous streaming session.

The example below shows how to configure and run a Live API session with function calling for a voice support scenario.

import asyncio
import os
from google import genai
from google.genai import types

# Tool definitions for the voice agent
def get_order_status(order_id: str) -> dict:
    """Check order delivery status for ShopMax India."""
    orders = {
        "ORD-1001": {"status": "Out for Delivery", "eta": "Today by 6 PM", "city": "Mumbai"},
        "ORD-1002": {"status": "Delivered", "date": "Yesterday", "city": "Bangalore"},
        "ORD-1003": {"status": "Processing", "eta": "3-5 business days", "city": "Delhi"},
    }
    return orders.get(order_id, {"status": "Not found", "message": "Please check your order ID"})

# Live API config - voice in, voice out
live_config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    system_instruction=types.Content(
        parts=[types.Part(text="""You are a friendly ShopMax India voice support agent.
Speak clearly and concisely. Use get_order_status to check order details.
Always confirm order IDs by repeating them back to the customer.""")]
    ),
    tools=[types.Tool(
        function_declarations=[types.FunctionDeclaration(
            name="get_order_status",
            description="Check the delivery status of a ShopMax India order",
            parameters=types.Schema(
                type="OBJECT",
                properties={"order_id": types.Schema(type="STRING")},
                required=["order_id"]
            )
        )]
    )],
    speech_config=types.SpeechConfig(
        voice_config=types.VoiceConfig(
            prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Puck")
        )
    )
)
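Because order IDs reach the agent through speech, the model may pass them to the tool in a slightly different form, for example "ord 1001" or "ORD1001". A small normalisation step before the lookup is one way to make get_order_status more forgiving. The sketch below is hypothetical: normalise_order_id and lookup_order are illustrative names, and the pattern assumes every ID follows the ORD-<digits> format seen in the sample data.

```python
import re
from typing import Optional

# Hypothetical pattern: "ORD" prefix, optional spaces/hyphens, then digits.
_ORDER_ID_RE = re.compile(r"^\s*ORD[\s-]*(\d+)\s*$", re.IGNORECASE)

NOT_FOUND = {"status": "Not found", "message": "Please check your order ID"}


def normalise_order_id(raw: str) -> Optional[str]:
    """Map transcribed variants such as 'ord 1001' to the canonical 'ORD-1001'.

    Returns None when the input does not look like an order ID at all.
    """
    match = _ORDER_ID_RE.match(raw)
    if match is None:
        return None
    return f"ORD-{match.group(1)}"


def lookup_order(raw_id: str, orders: dict) -> dict:
    """Normalise the ID first, then look it up with the same fallback as get_order_status."""
    order_id = normalise_order_id(raw_id)
    if order_id is None:
        return dict(NOT_FOUND)
    return orders.get(order_id, dict(NOT_FOUND))


if __name__ == "__main__":
    sample = {"ORD-1001": {"status": "Out for Delivery"}}
    for spoken in ["ORD-1001", "ord 1001", "ORD1001", "order one"]:
        print(spoken, "->", lookup_order(spoken, sample)["status"])
```

Calling normalise_order_id at the top of get_order_status keeps the declared tool schema unchanged while tolerating minor transcription differences; the system instruction's "repeat the ID back" rule still gives the customer a chance to correct a misheard ID.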

Now run the Live API session in an async loop:

async def run_voice_support():
    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

    async with client.aio.live.connect(
        model="gemini-2.0-flash-live-001",
        config=live_config
    ) as session:

        print("Voice agent ready. Sending simulated customer query...")

        # In production: stream microphone audio bytes with session.send_realtime_input()
        # Here we send a text prompt to simulate the voice query
        await session.send_client_content(
            turns=types.Content(
                role="user",
                parts=[types.Part(text="Hi, what is the status of my order ORD-1001?")]
            ),
            turn_complete=True
        )

        # Receive streamed response (audio chunks + tool calls)
        async for response in session.receive():
            if response.tool_call:
                # Run each function the model asked for and send the result back
                for fc in response.tool_call.function_calls:
                    result = get_order_status(**(fc.args or {}))
                    await session.send_tool_response(
                        function_responses=[types.FunctionResponse(
                            name=fc.name,
                            id=fc.id,
                            response=result
                        )]
                    )
            if response.data:  # audio bytes
                print(f"Audio chunk received: {len(response.data)} bytes")
            if response.text:
                print(f"Transcript: {response.text}")
            if response.server_content and response.server_content.turn_complete:
                print("Turn complete.")
                break

if __name__ == "__main__":
    asyncio.run(run_voice_support())

It gives the following output:

Voice agent ready. Sending simulated customer query...
Audio chunk received: 4096 bytes
Audio chunk received: 4096 bytes
Audio chunk received: 3200 bytes
Transcript: Your order ORD-1001 is currently out for delivery and is expected
to arrive today by 6 PM in Mumbai. Is there anything else I can help you with?
Turn complete.
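The loop above calls get_order_status directly. Once the agent has more than one tool, a name-to-function registry keeps the dispatch generic and lets unknown or malformed calls be reported back to the model instead of crashing the session. The sketch below is hypothetical: TOOL_REGISTRY, dispatch_function_call and dispatch_all are illustrative names, the SimpleNamespace objects merely stand in for the SDK's function-call objects (only .name, .id and .args are read, as in the tutorial), and it returns plain dicts so it runs without an API key.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict, List


def get_order_status(order_id: str) -> dict:
    # Trimmed copy of the tutorial's tool so this sketch is self-contained.
    orders = {"ORD-1001": {"status": "Out for Delivery", "eta": "Today by 6 PM", "city": "Mumbai"}}
    return orders.get(order_id, {"status": "Not found", "message": "Please check your order ID"})


# Hypothetical registry: declared tool name -> local Python implementation.
TOOL_REGISTRY: Dict[str, Callable[..., dict]] = {
    "get_order_status": get_order_status,
}


def dispatch_function_call(fc: Any) -> Dict[str, Any]:
    """Run one model-requested function call and build its response payload."""
    func = TOOL_REGISTRY.get(fc.name)
    if func is None:
        # Tell the model the tool does not exist rather than raising mid-session.
        result = {"error": f"Unknown tool: {fc.name}"}
    else:
        try:
            result = func(**(fc.args or {}))
        except TypeError as exc:  # e.g. wrong or missing arguments from the model
            result = {"error": str(exc)}
    return {"name": fc.name, "id": fc.id, "response": result}


def dispatch_all(function_calls: List[Any]) -> List[Dict[str, Any]]:
    """Answer every call in one tool_call message, preserving order."""
    return [dispatch_function_call(fc) for fc in function_calls]


if __name__ == "__main__":
    calls = [
        SimpleNamespace(name="get_order_status", id="call-1", args={"order_id": "ORD-1001"}),
        SimpleNamespace(name="cancel_order", id="call-2", args={"order_id": "ORD-1001"}),
    ]
    for payload in dispatch_all(calls):
        print(payload)
```

In the session loop, the per-call send could then become a single call: `await session.send_tool_response(function_responses=[types.FunctionResponse(**p) for p in dispatch_all(response.tool_call.function_calls)])`. Keeping the `id` from each call matters because it is how the model matches each response to the request it made.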

In production, replace the text prompt with real-time PCM audio bytes from a microphone stream, sent continuously with session.send_realtime_input(), and pipe the received audio chunks to a speaker. The Live API also supports interruption handling: the model stops speaking when it detects the user talking.
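On the microphone side, audio is sent as a stream of small raw-PCM chunks rather than one large buffer. The sketch below assumes the formats described in the Live API documentation at the time of writing (16-bit little-endian mono PCM, 16 kHz for input and 24 kHz for output); verify these against the current docs before relying on them. iter_pcm_chunks and save_pcm_as_wav are hypothetical helpers: the first splits captured audio into fixed-duration chunks that could each be passed to session.send_realtime_input() (the exact keyword arguments depend on your google-genai version), and the second writes the response.data chunks received in the loop to a WAV file using only the standard library, which is handy for checking the agent's voice before building a speaker pipeline.

```python
import io
import wave
from typing import Iterable, Iterator

# Assumed audio formats (check the current Live API documentation).
INPUT_RATE_HZ = 16_000   # microphone audio sent to the model
OUTPUT_RATE_HZ = 24_000  # audio returned in response.data
SAMPLE_WIDTH_BYTES = 2   # 16-bit PCM
CHANNELS = 1             # mono


def iter_pcm_chunks(pcm: bytes, chunk_ms: int = 100, rate_hz: int = INPUT_RATE_HZ) -> Iterator[bytes]:
    """Yield fixed-duration slices of raw PCM; the last slice may be shorter."""
    chunk_bytes = rate_hz * SAMPLE_WIDTH_BYTES * CHANNELS * chunk_ms // 1000
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


def save_pcm_as_wav(chunks: Iterable[bytes], target, rate_hz: int = OUTPUT_RATE_HZ) -> None:
    """Write received PCM chunks to a WAV container (path or seekable file object)."""
    with wave.open(target, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav.setframerate(rate_hz)
        for chunk in chunks:
            wav.writeframes(chunk)


if __name__ == "__main__":
    one_second_of_silence = bytes(INPUT_RATE_HZ * SAMPLE_WIDTH_BYTES)
    sizes = [len(c) for c in iter_pcm_chunks(one_second_of_silence)]
    print(len(sizes), sizes[0])  # 10 chunks of 3200 bytes at 100 ms each

    buffer = io.BytesIO()
    save_pcm_as_wav([bytes(4096), bytes(4096), bytes(3200)], buffer)
    buffer.seek(0)
    with wave.open(buffer, "rb") as wav:
        print(wav.getframerate(), wav.getnframes())
```

Around 100 ms per chunk is a common compromise: smaller chunks reduce latency but add per-message overhead, while larger ones delay the model's voice activity detection and therefore interruption handling.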
