In Browser
	StumbleUpon
	del.icio.us
	Google
	Google Buzz
	reddit
	LinkedIn

	Facebook
	Twitter
	Linkedin
	E-Mail

Generative AI > Google Gemini API > ADK A/B Testing Agents

ADK A/B Testing Agents

Author: Venkata Sudhakar

A/B testing lets you compare two versions of an ADK agent before committing to one in production. For ShopMax India, you might want to test whether a more detailed system instruction produces better customer responses, or whether a different model (gemini-2.0-flash vs gemini-1.5-pro) gives higher quality answers at acceptable latency.

The pattern below routes each request to either Agent A or Agent B based on a hash of the user ID, ensuring consistent assignment per user. Metrics are collected per variant so you can compare them after the test period.

import hashlib
import time
from google.adk.agents import LlmAgent
from google.adk.tools import FunctionTool
from google.adk.sessions import InMemorySessionService
from google.adk.runners import Runner
from google.genai.types import Content, Part

# Metrics storage per variant
metrics = {"A": {"requests": 0, "total_ms": 0}, "B": {"requests": 0, "total_ms": 0}}

def get_product_details(product_id: str) -> dict:
    """Get product details from ShopMax India catalogue."""
    return {"product_id": product_id, "name": "Samsung Galaxy S24", "price_rs": 74999, "stock": 45}

# Variant A - concise instruction
agent_a = LlmAgent(
    name="shopmax_agent_a",
    model="gemini-2.0-flash",
    tools=[FunctionTool(get_product_details)],
    instruction="You are a ShopMax India assistant. Answer customer queries about products."
)

# Variant B - detailed instruction with tone guidance
agent_b = LlmAgent(
    name="shopmax_agent_b",
    model="gemini-2.0-flash",
    tools=[FunctionTool(get_product_details)],
    instruction=(
        "You are a friendly ShopMax India sales assistant. "
        "Always mention the price in Rs, confirm stock availability, "
        "and end with an offer to help with the purchase."
    )
)

session_service = InMemorySessionService()
runner_a = Runner(agent=agent_a, session_service=session_service, app_name="shopmax_ab")
runner_b = Runner(agent=agent_b, session_service=session_service, app_name="shopmax_ab")

def assign_variant(user_id: str) -> str:
    h = int(hashlib.md5(user_id.encode()).hexdigest(), 16)
    return "A" if h % 2 == 0 else "B"

def run_ab_test(user_id: str, query: str):
    variant = assign_variant(user_id)
    runner = runner_a if variant == "A" else runner_b

session = session_service.create_session(app_name="shopmax_ab", user_id=user_id)
    start = time.perf_counter()

events = list(runner.run(
        user_id=user_id,
        session_id=session.id,
        new_message=Content(parts=[Part(text=query)])
    ))

elapsed_ms = (time.perf_counter() - start) * 1000
    metrics[variant]["requests"] += 1
    metrics[variant]["total_ms"] += elapsed_ms

answer = ""
    for event in events:
        if hasattr(event, "content") and event.content:
            for part in event.content.parts:
                if hasattr(part, "text") and part.text:
                    answer = part.text.strip()

print(f"[Variant {variant}] User {user_id}: {round(elapsed_ms)}ms")
    print(f"Response: {answer[:120]}...")
    print()

# Simulate 6 users
test_users = ["u001", "u002", "u003", "u004", "u005", "u006"]
for uid in test_users:
    run_ab_test(uid, "Tell me about product SKU-S24")

# Print summary
print("--- A/B Test Summary ---")
for v, m in metrics.items():
    avg = round(m["total_ms"] / m["requests"]) if m["requests"] else 0
    print(f"Variant {v}: {m['requests']} requests, avg latency {avg} ms")

It gives the following output,

[Variant A] User u001: 1842ms
Response: Samsung Galaxy S24 is available at Rs 74,999 with 45 units in stock...

[Variant B] User u002: 1956ms
Response: Great news! The Samsung Galaxy S24 is priced at Rs 74,999 and we have
45 units in stock. Would you like help placing an order?...

[Variant A] User u003: 1798ms
Response: Samsung Galaxy S24 is available at Rs 74,999 with 45 units in stock...

[Variant B] User u004: 2011ms
Response: Great news! The Samsung Galaxy S24 is priced at Rs 74,999...

[Variant A] User u005: 1821ms
Response: Samsung Galaxy S24 is available at Rs 74,999...

[Variant B] User u006: 1978ms
Response: Great news! The Samsung Galaxy S24 is priced at Rs 74,999...

--- A/B Test Summary ---
Variant A: 3 requests, avg latency 1820 ms
Variant B: 3 requests, avg latency 1982 ms

Variant B produces more friendly, sales-oriented responses but costs an extra 162 ms on average - the longer instruction increases prompt tokens processed by the model. For ShopMax India, this is an acceptable trade-off if the conversion rate improves. Track click-through-to-purchase rates alongside latency to make the final decision.

In production, replace InMemorySessionService with Firestore-backed sessions so that the same user always gets the same variant across browser sessions. Store variant assignments in a separate collection for analysis, and use Cloud Monitoring dashboards to visualise the latency and quality differences over time.

Send your comments, suggestions or queries regarding this site to [email protected].