ADK Shadow Mode Evaluation

Author: Venkata Sudhakar

Shadow mode evaluation lets you run a candidate agent in parallel with your production agent, capturing its outputs for analysis without exposing users to potentially worse responses. For ShopMax India, this is the safest way to test a new model version or updated instructions before a full rollout.

The shadow runner receives the same input as production and generates a response, but that response is never returned to the user. Both responses are logged for offline comparison using quality metrics or human review.
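The pattern itself does not depend on ADK, so it can be sketched with the standard library alone. In the minimal sketch below, `prod_stub`, `candidate_stub`, `serve`, `record_shadow`, and `drain` are hypothetical names introduced for illustration; the two stubs stand in for real agent calls. It shows the two properties that matter: the caller only ever receives the production response, and a failure in the candidate is recorded rather than raised.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timezone

# Hypothetical stand-ins: in a real system these would call the two agents.
def prod_stub(query: str) -> str:
    return f"PROD answer to: {query}"

def candidate_stub(query: str) -> str:
    return f"CANDIDATE answer to: {query}"

comparison_log = []  # both responses, kept for offline review
pending = []         # futures for shadow runs that may still be in flight
executor = ThreadPoolExecutor(max_workers=4)

def record_shadow(query: str, prod_response: str) -> None:
    """Run the candidate and log both responses; never raise to the caller."""
    try:
        shadow_response = candidate_stub(query)
    except Exception as exc:  # a broken candidate must not affect production
        shadow_response = f"<shadow error: {exc}>"
    comparison_log.append({
        "ts": datetime.now(timezone.utc).isoformat(),
        "query": query,
        "prod": prod_response,
        "shadow": shadow_response,
    })

def serve(query: str) -> str:
    """Answer from production, then schedule the shadow run in the background."""
    prod_response = prod_stub(query)
    pending.append(executor.submit(record_shadow, query, prod_response))
    return prod_response  # the shadow response is never returned

def drain() -> None:
    """Block until every scheduled shadow run has finished."""
    while pending:
        pending.pop().result()

reply = serve("What is the status of my order ORD-441?")
drain()
print(reply)
print(comparison_log[0]["shadow"])
```

Waiting on the futures in `drain` is one way to know the log is complete before reading it; a fixed sleep can be too short when the candidate is slow.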

import threading
import json
from datetime import datetime
from google.adk.agents import LlmAgent
from google.adk.tools import FunctionTool
from google.adk.sessions import InMemorySessionService
from google.adk.runners import Runner
from google.genai.types import Content, Part

shadow_log = []

def get_order_status(order_id: str) -> dict:
    """Get order status for a ShopMax India order."""
    return {"order_id": order_id, "status": "Dispatched", "eta": "2 days", "city": "Bangalore"}

# Production agent (stable)
prod_agent = LlmAgent(
    name="prod_agent",
    model="gemini-2.0-flash",
    tools=[FunctionTool(get_order_status)],
    instruction="You are a ShopMax India order support agent. Answer order status queries clearly."
)

# Shadow agent (candidate - new instruction style)
shadow_agent = LlmAgent(
    name="shadow_agent",
    model="gemini-2.0-flash",
    tools=[FunctionTool(get_order_status)],
    instruction=(
        "You are a ShopMax India order support agent. "
        "Always start with the current status, then give the ETA, "
        "and close with a support contact offer."
    )
)

prod_sessions = InMemorySessionService()
shadow_sessions = InMemorySessionService()
prod_runner = Runner(agent=prod_agent, session_service=prod_sessions, app_name="shopmax_prod")
shadow_runner = Runner(agent=shadow_agent, session_service=shadow_sessions, app_name="shopmax_shadow")

def extract_text(events):
    for event in events:
        if hasattr(event, "content") and event.content:
            for part in event.content.parts:
                if hasattr(part, "text") and part.text:
                    return part.text.strip()
    return ""

def run_shadow(user_id, session_id, query, prod_response):
    session = shadow_sessions.create_session(app_name="shopmax_shadow", user_id=user_id)
    events = list(shadow_runner.run(
        user_id=user_id,
        session_id=session.id,
        new_message=Content(parts=[Part(text=query)])
    ))
    shadow_response = extract_text(events)
    shadow_log.append({
        "ts": datetime.utcnow().isoformat(),
        "user_id": user_id,
        "query": query,
        "prod": prod_response,
        "shadow": shadow_response
    })

def handle_request(user_id: str, query: str) -> str:
    session = prod_sessions.create_session(app_name="shopmax_prod", user_id=user_id)
    events = list(prod_runner.run(
        user_id=user_id,
        session_id=session.id,
        new_message=Content(parts=[Part(text=query)])
    ))
    prod_response = extract_text(events)

    # Fire shadow evaluation in background
    t = threading.Thread(target=run_shadow, args=(user_id, session.id, query, prod_response))
    t.daemon = True
    t.start()

    return prod_response  # only prod response goes to user

# Simulate requests
for uid, oid in [("u101", "ORD-441"), ("u102", "ORD-552"), ("u103", "ORD-663")]:
    resp = handle_request(uid, f"What is the status of my order {oid}?")
    print(f"[User {uid}] -> {resp[:100]}")

import time; time.sleep(5)  # wait for shadow threads

print("\n--- Shadow Log (for review) ---")
for entry in shadow_log:
    print(f"Query: {entry['query']}")
    print(f"  PROD:   {entry['prod'][:90]}")
    print(f"  SHADOW: {entry['shadow'][:90]}")
    print()

It gives output similar to the following (the exact wording depends on the model):

[User u101] -> Your order ORD-441 has been dispatched and is expected to arrive in 2 days in Bangalore.
[User u102] -> Your order ORD-552 has been dispatched and is expected to arrive in 2 days in Bangalore.
[User u103] -> Your order ORD-663 has been dispatched and is expected to arrive in 2 days in Bangalore.

--- Shadow Log (for review) ---
Query: What is the status of my order ORD-441?
  PROD:   Your order ORD-441 has been dispatched and is expected to arrive in 2 days in Bangalore.
  SHADOW: Status: Dispatched. Your order ORD-441 will arrive in Bangalore in 2 days.
          Need help? Contact ShopMax India support at any time.

Query: What is the status of my order ORD-552?
  PROD:   Your order ORD-552 has been dispatched...
  SHADOW: Status: Dispatched. Your order ORD-552 will arrive in Bangalore in 2 days...

The shadow log shows both responses side by side. The shadow agent adds a support contact offer as instructed, which makes its responses slightly longer but arguably more helpful. After reviewing a large sample, for example 1,000 shadow comparisons, ShopMax India can decide whether to promote the shadow agent to production based on human reviewer ratings or automated quality scoring.
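One way to turn reviewer ratings into a promotion decision is a simple win-rate rule. The sketch below assumes each comparison has been labelled `"shadow"`, `"prod"`, or `"tie"` by a reviewer; the function name `promotion_decision` and the thresholds `min_samples` and `min_win_rate` are hypothetical choices for illustration, not values from ShopMax India.

```python
from collections import Counter

def promotion_decision(verdicts, min_samples=1000, min_win_rate=0.55):
    """Summarize reviewer verdicts and apply a simple promotion rule.

    verdicts: list of "shadow", "prod", or "tie" labels, one per comparison.
    Ties count toward the sample size but not toward the win rate.
    """
    counts = Counter(verdicts)
    decided = counts["shadow"] + counts["prod"]
    win_rate = counts["shadow"] / decided if decided else 0.0
    enough_data = len(verdicts) >= min_samples
    return {
        "samples": len(verdicts),
        "shadow_win_rate": win_rate,
        "promote": enough_data and win_rate >= min_win_rate,
    }

# Illustrative labels only, not real review data.
example = ["shadow"] * 600 + ["prod"] * 300 + ["tie"] * 100
print(promotion_decision(example))
```

Requiring a minimum sample size keeps a handful of favourable comparisons from triggering a rollout; the thresholds themselves are a product decision.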

In production, write shadow logs to BigQuery instead of an in-memory list. Use a BigQuery table with columns for timestamp, user_id, query, prod_response, and shadow_response. Then run SQL queries to compare response lengths, keyword presence, and sentiment scores across thousands of real customer interactions before making the promotion decision.
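The comparisons described above can be prototyped in plain Python before any SQL is written. The sketch below assumes rows shaped like the table columns named in the text (`prod_response`, `shadow_response`); the function `summarize` and the sample rows are hypothetical, and sentiment scoring is left out because it needs a separate model. In BigQuery the same aggregates would typically be expressed with functions such as `AVG` and `LENGTH`.

```python
def summarize(rows, keyword="support"):
    """Compare average response length and keyword presence for both agents."""
    n = len(rows)
    if n == 0:
        return {}

    def avg_len(field):
        return sum(len(r[field]) for r in rows) / n

    def keyword_rate(field):
        return sum(keyword in r[field].lower() for r in rows) / n

    return {
        "rows": n,
        "avg_prod_len": avg_len("prod_response"),
        "avg_shadow_len": avg_len("shadow_response"),
        "prod_keyword_rate": keyword_rate("prod_response"),
        "shadow_keyword_rate": keyword_rate("shadow_response"),
    }

# Sample rows modelled on the log output shown earlier; illustrative only.
rows = [
    {
        "prod_response": "Your order ORD-441 has been dispatched and is "
                         "expected to arrive in 2 days in Bangalore.",
        "shadow_response": "Status: Dispatched. Your order ORD-441 will arrive "
                           "in Bangalore in 2 days. Need help? Contact ShopMax "
                           "India support at any time.",
    },
]
print(summarize(rows))
```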