
ADK Health Checks and Readiness Probes

Author: Venkata Sudhakar

Cloud Run health probes determine whether an ADK agent instance is ready to serve traffic and whether it stays alive during operation. Without application-level probes, Cloud Run can route traffic to an instance whose port is open but whose agent is still loading (causing request errors), or keep routing to an instance that has entered a broken state (causing silent failures). ShopMax India configures startup, liveness, and readiness probes on all production agents to support zero-downtime deployments and automatic recovery from hung instances.

The three probe types differ in what happens on failure: startup probes hold back traffic and the other probes until the container reports that it has started (useful for slow-starting agents), liveness probes kill and restart the instance once they fail, and readiness probes remove the instance from the load balancer without killing it. Startup probes can use an HTTP endpoint, a TCP port check, or a gRPC health check; liveness probes use HTTP or gRPC. For ADK agents, an HTTP /healthz endpoint is the simplest and most reliable approach.
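The way a probe's period and failure threshold interact can be modelled in a few lines. The sketch below is only an illustration of the usual "consecutive failures" rule, not Cloud Run's implementation; `ProbeTracker` and its method names are hypothetical, and the 30-second period and threshold of 3 are the liveness settings used later in this article.

```python
from dataclasses import dataclass


@dataclass
class ProbeTracker:
    """Illustrative model: a probe trips after N consecutive failures."""

    period_seconds: int
    failure_threshold: int
    consecutive_failures: int = 0

    def record(self, success: bool) -> bool:
        """Record one probe result; return True once the probe has tripped."""
        if success:
            # Any success resets the streak, so one slow response is forgiven.
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures >= self.failure_threshold

    def approx_seconds_to_trip(self) -> int:
        """Rough duration of continuous failure needed before tripping."""
        return self.period_seconds * self.failure_threshold


# Liveness settings assumed from this article: every 30s, 3 misses.
liveness = ProbeTracker(period_seconds=30, failure_threshold=3)
results = [liveness.record(ok) for ok in (False, False, True, False, False, False)]
print(results)                            # [False, False, False, False, False, True]
print(liveness.approx_seconds_to_trip())  # 90
```

In this model a single successful check in the middle of a bad patch resets the count, which is why a threshold above 1 tolerates an occasional slow response.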

The example below shows an ADK agent wrapped in a FastAPI server with health endpoints, together with the corresponding Cloud Run deployment configuration.
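As a minimal, dependency-free sketch of the health endpoint contract, the following uses Python's standard-library `http.server` in place of FastAPI. The names `health_response`, `mark_agent_ready`, `make_server`, and `STATE` are illustrative assumptions, and the ADK agent construction itself is left out; only the /healthz/live and /healthz/ready behaviour is modelled.

```python
import json
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

START_TIME = time.monotonic()
# Hypothetical readiness flag; flipped once the agent has finished loading.
STATE = {"agent_ready": False}


def health_response(path: str) -> tuple[int, dict]:
    """Map a probe path to an HTTP status code and a JSON-serialisable body."""
    if path == "/healthz/live":
        uptime = int(time.monotonic() - START_TIME)
        return 200, {"status": "alive", "uptime_seconds": uptime}
    if path == "/healthz/ready":
        if not STATE["agent_ready"]:
            return 503, {"detail": "Agent not yet initialised"}
        return 200, {"status": "ready"}
    return 404, {"detail": "Not found"}


def mark_agent_ready() -> None:
    """Call after the (assumed) agent initialisation has completed."""
    STATE["agent_ready"] = True


class HealthHandler(BaseHTTPRequestHandler):
    """Serves the two probe endpoints as JSON."""

    def do_GET(self) -> None:
        status, body = health_response(self.path)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def make_server(port: int = 8080) -> ThreadingHTTPServer:
    """Build (but do not start) the HTTP server on the given port."""
    return ThreadingHTTPServer(("0.0.0.0", port), HealthHandler)
```

To run it, call `mark_agent_ready()` once the agent is built and then `make_server().serve_forever()`. In a FastAPI version the same not-ready case would typically be expressed by raising `HTTPException(status_code=503, detail="Agent not yet initialised")`, which produces the `{"detail": ...}` body shape seen in the output.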


It gives the following output,

# GET /healthz/live
{"status": "alive", "uptime_seconds": 142}

# GET /healthz/ready  (before init complete)
HTTP 503: {"detail": "Agent not yet initialised"}

# GET /healthz/ready  (after init complete)
{"status": "ready"}

Applying the probe configuration to the Cloud Run service gives the following output,

Applying new configuration to Cloud Run service [shopmax-support-agent]...
Probes configured:
  Startup  probe: GET /healthz/ready  (max wait 60s)
  Liveness probe: GET /healthz/live   (every 30s, fails after 3 misses -> restart)

Service is healthy and serving traffic.

For ADK agents that connect to Firestore or external APIs during startup, add those connectivity checks to the /healthz/ready endpoint so that traffic is routed only to fully initialised instances. Set the liveness probe period to 30 seconds and the failure threshold to 3, so that a restart requires roughly 90 seconds of consecutive failures and a temporarily slow Gemini API response does not trigger one by mistake. Monitor probe failure counts in Cloud Monitoring to detect recurring startup issues early.
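One way to fold dependency checks into the readiness endpoint is to run a set of named check functions and report ready only when all of them pass. The sketch below is an assumption-laden illustration: `evaluate_readiness`, `firestore_reachable`, and `pricing_api_reachable` are hypothetical names, and the two check functions are stand-ins that simulate results rather than contacting any real service.

```python
from typing import Callable, Dict, Tuple


def evaluate_readiness(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, dict]:
    """Run every dependency check; report ready only if all of them pass."""
    failed = []
    for name, check in checks.items():
        try:
            ok = check()
        except Exception:
            # A check that raises (timeout, connection error) counts as failed.
            ok = False
        if not ok:
            failed.append(name)
    if failed:
        return 503, {"detail": "Dependencies not ready", "failed": failed}
    return 200, {"status": "ready"}


# Hypothetical stand-ins for real connectivity checks.
def firestore_reachable() -> bool:
    return True


def pricing_api_reachable() -> bool:
    raise TimeoutError("simulated timeout")


all_ok = evaluate_readiness({"firestore": firestore_reachable})
one_down = evaluate_readiness(
    {"firestore": firestore_reachable, "pricing_api": pricing_api_reachable}
)
print(all_ok)    # (200, {'status': 'ready'})
print(one_down)  # (503, {'detail': 'Dependencies not ready', 'failed': ['pricing_api']})
```

Real checks should use short timeouts so that the readiness request itself finishes well within the probe's own timeout.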


 
  


  