tl  tr
  Home | Tutorials | Articles | Videos | Products | Tools | Search
Interviews | Open Source | Tag Cloud | Follow Us | Bookmark | Contact   
 Generative AI > Google Gemini API > ADK Agent Versioning and A/B Testing

ADK Agent Versioning and A/B Testing

Author: Venkata Sudhakar

Changing an agent instruction in production is risky without a controlled rollout mechanism. ADK agent versioning lets you deploy multiple versions of an agent simultaneously, route traffic between them, measure performance differences, and promote or rollback based on real metrics. This is the same discipline used for software deployments applied to agent prompt engineering.

ShopMax India A/B tests its product recommendation agent regularly. Version A uses a straightforward recommendation approach; Version B uses a more personalised prompt that factors in browsing history context. By routing 10% of traffic to Version B and measuring add-to-cart rate per recommendation, the team validated a 23% improvement before full rollout. Without A/B testing, this improvement would have been impossible to measure objectively.

The below example shows how to implement a simple traffic-splitting router for two versions of an ADK agent.


It gives the following output,

User CUST-1001 -> version: v1, latency: 1243ms
User CUST-9999 -> version: v2, latency: 1891ms

The below example shows how to log experiment results and compute the performance difference between versions.


It gives the following output,

v1: n=90, CVR=13.3%, latency=1448ms
v2: n=10, CVR=20.0%, latency=1521ms

Conclusion: v2 shows +50% relative improvement in CVR with only 73ms latency increase.
Recommend promoting v2 to 100% traffic.

The deterministic user bucketing via hash(user_id) % 100 ensures each user always gets the same agent version throughout an experiment, preventing the confounding effect of a user seeing different recommendations on different visits. For ShopMax India, this discipline means A/B test results are statistically meaningful and trustworthy - the team can confidently promote a new agent version knowing the measured improvement will hold at scale.


 
  


  
bl  br