ADK Latency Profiling

Author: Venkata Sudhakar

Latency profiling helps you find the slow stages in your ADK agent pipeline. For ShopMax India, where agents handle real-time inventory checks and order processing, response time directly affects customer experience. This tutorial shows how to measure latency at each stage: tool calls, model inference, and the full request cycle.

The approach uses time.perf_counter() from Python's time module to record timestamps around each tool call and around the full request, then combines these into a latency report. You can see how long each component takes and where to focus optimisation efforts.

import time
from google.adk.agents import LlmAgent
from google.adk.tools import FunctionTool
from google.adk.sessions import InMemorySessionService
from google.adk.runners import Runner

# Latency tracker
latency_log = []

def timed_tool(name, fn, *args, **kwargs):
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = (time.perf_counter() - start) * 1000
    latency_log.append({"tool": name, "ms": round(elapsed, 2)})
    return result

# ShopMax India tool implementations
def _check_inventory(product_id: str) -> dict:
    time.sleep(0.04)  # simulates DB lookup
    return {"product_id": product_id, "stock": 120, "warehouse": "Pune"}

def _get_price(product_id: str) -> dict:
    time.sleep(0.02)  # simulates pricing API
    return {"product_id": product_id, "price_rs": 15999}

def check_inventory(product_id: str) -> dict:
    """Check stock levels for a product at ShopMax India warehouses."""
    return timed_tool("check_inventory", _check_inventory, product_id)

def get_price(product_id: str) -> dict:
    """Get current price for a product in Indian Rupees."""
    return timed_tool("get_price", _get_price, product_id)

agent = LlmAgent(
    name="shopmax_agent",
    model="gemini-2.0-flash",
    tools=[FunctionTool(check_inventory), FunctionTool(get_price)],
    instruction="You are a ShopMax India assistant. Help customers check stock and pricing."
)

session_service = InMemorySessionService()
runner = Runner(agent=agent, session_service=session_service, app_name="shopmax")

def run_with_profiling(query: str):
    from google.genai.types import Content, Part

    # Reset the shared log in place so timed_tool keeps appending to the same list
    latency_log.clear()

    # Note: in newer ADK releases create_session may be a coroutine; if that
    # applies to your version, wrap this call in asyncio.run(...)
    session = session_service.create_session(app_name="shopmax", user_id="u1")

    total_start = time.perf_counter()

    # Drain the event stream so the timer covers the whole request cycle
    events = list(runner.run(
        user_id="u1",
        session_id=session.id,
        new_message=Content(role="user", parts=[Part(text=query)])
    ))

    total_ms = (time.perf_counter() - total_start) * 1000

    # Print latency breakdown
    print(f"Query: {query}")
    print("-" * 50)
    for entry in latency_log:
        print(f"  Tool [{entry['tool']}]: {entry['ms']} ms")
    tool_total = sum(e["ms"] for e in latency_log)
    # Everything that is not tool time: model calls, network, and ADK overhead
    model_ms = round(total_ms - tool_total, 2)
    print(f"  Model inference: ~{model_ms} ms")
    print(f"  Total end-to-end: {round(total_ms, 2)} ms")

    # Final answer
    for event in events:
        if hasattr(event, "content") and event.content and event.content.parts:
            for part in event.content.parts:
                if hasattr(part, "text") and part.text:
                    print(f"\nAnswer: {part.text.strip()}")

run_with_profiling("Check stock and price for product SKU-7821")

It produces output similar to the following (exact timings vary from run to run):

Query: Check stock and price for product SKU-7821
--------------------------------------------------
  Tool [check_inventory]: 41.23 ms
  Tool [get_price]: 20.87 ms
  Model inference: ~1643.0 ms
  Total end-to-end: 1705.1 ms

Answer: Product SKU-7821 has 120 units in stock at the Pune warehouse,
priced at Rs 15,999.

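The profiling output shows that the model side dominates: roughly 1,643 ms, against 41 ms and 21 ms for the two tool calls. Note that the "model inference" figure is computed as total time minus tool time, so it also includes network round trips and ADK overhead, and it spans every Gemini call in the request (at least one to choose the tools and one to write the answer). For ShopMax India agents, this means that caching Gemini responses for repeated queries or streaming the output will have the highest impact on perceived latency, while tool-level optimisation offers smaller gains.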
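A single profiled request is a noisy sample, so before acting on the breakdown it helps to summarise several runs. The following is a minimal sketch, assuming you collect the end-to-end total of each run in a list. The helper names percentile and summarise_latencies and the sample timings are hypothetical and not part of ADK.

```python
import math


def percentile(samples, pct):
    """Return the nearest-rank percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Nearest-rank method: smallest value with at least pct% of samples at or below it
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarise_latencies(samples_ms):
    """Summarise a list of latencies (in ms) as count, p50, p95 and max."""
    return {
        "count": len(samples_ms),
        "p50_ms": percentile(samples_ms, 50),
        "p95_ms": percentile(samples_ms, 95),
        "max_ms": max(samples_ms),
    }


# Hypothetical end-to-end totals (ms) from ten profiled runs
totals_ms = [1705.1, 1620.4, 1810.9, 1588.2, 2950.7,
             1699.3, 1742.0, 1655.8, 1730.6, 1601.5]

summary = summarise_latencies(totals_ms)
print(summary)
```

With only ten samples the p95 is simply the slowest run, so treat tail figures as indicative until you have collected many more requests.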

For production ShopMax India deployments, export these latency metrics to Cloud Monitoring using custom metrics. Set up alerts when end-to-end latency exceeds 3,000 ms (3 seconds), which typically indicates model slowdowns or tool timeouts that need investigation before customers notice.
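Before wiring up Cloud Monitoring, the threshold logic can be tried locally. The sketch below is a stand-in for an alerting policy, not the Cloud Monitoring API: it flags a problem when several recent requests exceed the 3,000 ms limit mentioned above. The class name LatencyWatch, the window size, and the breach count are assumptions chosen for illustration.

```python
from collections import deque

THRESHOLD_MS = 3000.0  # end-to-end limit taken from the text above


class LatencyWatch:
    """Track recent requests and flag sustained threshold breaches."""

    def __init__(self, threshold_ms=THRESHOLD_MS, window=5, min_breaches=3):
        self.threshold_ms = threshold_ms
        self.min_breaches = min_breaches
        # Holds True/False per request; old entries drop off automatically
        self._recent = deque(maxlen=window)

    def record(self, total_ms):
        """Record one request; return True if the alert condition holds."""
        self._recent.append(total_ms > self.threshold_ms)
        return sum(self._recent) >= self.min_breaches


watch = LatencyWatch(window=5, min_breaches=3)

# Hypothetical end-to-end totals (ms) for eight consecutive requests
observed_ms = [1705.1, 3200.0, 3100.5, 1650.0, 3400.2, 1600.0, 1620.0, 1580.0]
alerts = [watch.record(ms) for ms in observed_ms]
print(alerts)
```

Requiring several breaches within a window, rather than alerting on a single slow request, avoids paging on one-off spikes. The right window and count depend on your traffic volume.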
