ADK Latency Profiling
Author: Venkata Sudhakar
Latency profiling helps you identify slow spots in your ADK agent pipeline. For ShopMax India, where agents handle real-time inventory checks and order processing, response time directly impacts customer experience. This tutorial shows how to measure latency at each stage: tool calls, model inference, and the full request cycle.
The approach uses Python's time module to record timestamps at key points, then aggregates these into a latency report. You can see exactly how long each component takes and where to focus optimisation efforts.
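As a minimal sketch of this idea, the snippet below wraps each stage in a context manager that records elapsed time with `time.perf_counter`. The `LatencyProfiler` class and the stub tools `check_inventory` and `get_price` are hypothetical stand-ins written for illustration, with `time.sleep` simulating work. The model call is left out; in a real agent it would presumably be wrapped in its own stage in the same way.

```python
import time
from contextlib import contextmanager


class LatencyProfiler:
    """Collects wall-clock durations (in milliseconds) per named stage."""

    def __init__(self):
        self.timings = {}  # stage name -> list of durations in ms

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the wrapped code raises
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self.timings.setdefault(name, []).append(elapsed_ms)

    def report(self):
        # One line per stage, summing repeated calls to the same stage
        return "\n".join(
            f"{name}: {sum(samples):.2f} ms"
            for name, samples in self.timings.items()
        )


# Hypothetical stub tools; the sleeps stand in for real lookups
def check_inventory(sku):
    time.sleep(0.02)
    return {"sku": sku, "units": 120, "warehouse": "Pune"}


def get_price(sku):
    time.sleep(0.01)
    return {"sku": sku, "price_inr": 15999}


profiler = LatencyProfiler()
with profiler.stage("Total end-to-end"):
    with profiler.stage("Tool [check_inventory]"):
        stock = check_inventory("SKU-7821")
    with profiler.stage("Tool [get_price]"):
        price = get_price("SKU-7821")

print(profiler.report())
```

`time.perf_counter` is used rather than `time.time` because it is monotonic and has the highest available resolution, which matters when individual tool calls take only tens of milliseconds.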
Running the profiled agent on a sample query gives the following output:
Query: Check stock and price for product SKU-7821
--------------------------------------------------
Tool [check_inventory]: 41.23 ms
Tool [get_price]: 20.87 ms
Model inference: ~1,643 ms
Total end-to-end: 1,705.10 ms
Answer: Product SKU-7821 has 120 units in stock at the Pune warehouse,
priced at Rs 15,999.
The profiling output shows that model inference dominates: roughly 1,643 ms of the 1,705 ms total (about 96%), while the two tool calls take only 41 ms and 21 ms. For ShopMax India agents, this means that caching Gemini responses or streaming output is likely to have the highest impact on perceived latency, while tool-level optimisation offers smaller gains.
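The breakdown can be checked with a few lines of arithmetic. The sketch below uses the figures from the sample output and assumes the model inference figure is simply the remainder after subtracting the tool calls from the end-to-end total, so it also absorbs any framework overhead.

```python
# Figures copied from the sample output (milliseconds)
tool_timings_ms = {"check_inventory": 41.23, "get_price": 20.87}
total_ms = 1705.10

tool_ms = sum(tool_timings_ms.values())
# Assumption: everything not spent in tools is attributed to the model
model_ms = total_ms - tool_ms

print(f"Tools: {tool_ms:.2f} ms ({tool_ms / total_ms:.1%})")
print(f"Model: {model_ms:.2f} ms ({model_ms / total_ms:.1%})")
```

With these inputs the tools account for about 3.6% of the request and the model for about 96.4%, which is why model-side changes offer the larger gains.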
For production ShopMax India deployments, export these latency measurements to Cloud Monitoring as custom metrics. Set up an alert that fires when end-to-end latency exceeds 3,000 ms (3 seconds), which typically indicates model slowdowns or tool timeouts that need investigation before customers notice.
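The alert itself would normally be configured in Cloud Monitoring, but the same 3,000 ms rule can be sketched locally to sanity-check the threshold before wiring up the export. The helper `exceeds_threshold` and the sample list are hypothetical; only the first value comes from the output above, and the rest are made-up illustrations.

```python
ALERT_THRESHOLD_MS = 3000.0  # threshold from the text: 3 seconds


def exceeds_threshold(samples_ms, threshold_ms=ALERT_THRESHOLD_MS):
    """Return the end-to-end latency samples that breach the threshold."""
    return [s for s in samples_ms if s > threshold_ms]


# First value is from the sample run; the others are illustrative only
recent_latencies_ms = [1705.10, 2240.50, 3120.00, 980.30]

breaches = exceeds_threshold(recent_latencies_ms)
if breaches:
    print(f"ALERT: {len(breaches)} request(s) above {ALERT_THRESHOLD_MS:.0f} ms: {breaches}")
```

The comparison is strict, so a request that takes exactly 3,000 ms does not trigger the alert, matching the "exceeds" wording.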