
ADK Error Handling and Retry Patterns

Author: Venkata Sudhakar

Production ADK agents encounter errors constantly: external APIs time out, databases return unexpected results, third-party services go down temporarily. An agent that crashes or returns a confusing error message on the first failure is not production-ready. Resilient agents handle errors gracefully at every layer: tools catch and return structured error responses so the agent can explain the issue conversationally, retry logic handles transient failures transparently, and fallback tools provide degraded-but-functional responses when primary sources are unavailable.

The key design principle is that tools should never raise unhandled exceptions to the agent. Instead, tools return structured dicts that include an error key when something goes wrong. The agent reads the error, reasons about it, and responds appropriately by apologising, suggesting alternatives, or asking the user to try again. For transient errors like network timeouts, wrap tool calls with exponential backoff retry logic. For persistent failures, provide a fallback tool that returns cached or static data so the agent can still give a useful response.
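One way to enforce the "never raise to the agent" rule is a small decorator that turns any exception into a structured dict. This is a minimal sketch in plain Python, not an ADK API: `safe_tool`, `get_order_status`, and the keys `status`, `error`, and `message` are hypothetical names chosen for illustration.

```python
import functools


def safe_tool(func):
    """Decorator: convert any exception raised by a tool into an error dict."""
    @functools.wraps(func)  # preserves the name, docstring and signature
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            # The agent sees a dict with an "error" key, never a traceback.
            return {"status": "error", "error": type(exc).__name__, "message": str(exc)}
    return wrapper


@safe_tool
def get_order_status(order_id: str) -> dict:
    """Hypothetical tool: look up an order in an in-memory table."""
    orders = {"ORD-88421": "in transit"}
    # An unknown ID raises KeyError, which the decorator converts to a dict.
    return {"status": "ok", "order_id": order_id, "order_status": orders[order_id]}
```

Because the wrapper uses `functools.wraps`, the decorated function keeps its original name, docstring, and signature, which matters for any framework that builds a tool description by introspection.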

The example below shows a resilient order tracking agent with error handling at the tool level, automatic retry with exponential backoff for network errors, and a fallback to cached data when the primary order API is unavailable.
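The tool side of such an agent might look like the following sketch. It is plain Python under stated assumptions: `fetch_order_live` is a stub that simulates an outage, `ORDER_CACHE` is a hypothetical in-memory cache, and the error codes are illustrative. How the function is registered with the agent is not shown.

```python
import time

# Hypothetical last-known statuses; a real deployment might use Redis or a database.
ORDER_CACHE = {"ORD-88421": {"order_status": "in transit"}}


def fetch_order_live(order_id: str) -> dict:
    """Stand-in for the primary order API; this stub simulates an outage."""
    raise ConnectionError("order API unreachable")


def with_retry(func, *args, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(*args); on a network error wait 1s, 2s, ... and try again."""
    for attempt in range(1, attempts + 1):
        try:
            return func(*args)
        except (ConnectionError, TimeoutError):
            if attempt == attempts:
                raise  # attempts exhausted: the caller decides on a fallback
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed. Retrying in {delay:g}s...")
            sleep(delay)


def lookup_order(order_id: str, fetch=fetch_order_live, sleep=time.sleep) -> dict:
    """Always returns a dict; failures are reported under an 'error' key."""
    if not (len(order_id) == 9 and order_id.startswith("ORD-") and order_id[4:].isdigit()):
        return {"error": "invalid_order_id", "order_id": order_id}
    try:
        data = with_retry(fetch, order_id, sleep=sleep)
        return {"order_id": order_id, "source": "live", **data}
    except LookupError:
        return {"error": "order_not_found", "order_id": order_id}
    except (ConnectionError, TimeoutError) as exc:
        cached = ORDER_CACHE.get(order_id)
        if cached is None:
            return {"error": "service_unavailable", "detail": str(exc), "order_id": order_id}
        print("[Cache fallback used]")
        return {"order_id": order_id, "source": "cache", "stale": True, **cached}
    except Exception as exc:  # last line of defence: never raise to the agent
        return {"error": "unexpected_error", "detail": str(exc), "order_id": order_id}


def track_order(order_id: str) -> dict:
    """Single-argument wrapper, the shape you would expose to the agent as a tool."""
    return lookup_order(order_id)
```

The `fetch` and `sleep` parameters exist only so the retry path can be exercised in tests without a real API or real waiting. The `stale` flag lets the agent tell the customer that the status may be out of date.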


Testing the agent under normal and failure conditions:


It gives the following output, showing graceful handling at all failure levels:

Q: Where is order ORD-88421?
Attempt 1 failed. Retrying in 1s...
Attempt 2 failed. Retrying in 2s...
[Cache fallback used]
Your order ORD-88421 shows as in transit based on our last update. Our live
tracking is temporarily experiencing delays, so this may not reflect the
very latest status. For the most current information, please call 1800-SHOPMAX
or check the Delhivery app with your tracking number.

Q: Track order ORD-99999 (invalid ID)
I was not able to find order ORD-99999 in our system. Could you double-check
the order ID from your confirmation email? It should start with ORD- followed
by 5 digits. I am happy to try again once you have the correct number!

# Retry logic: 3 attempts with 1s/2s backoff before fallback
# Cache fallback: stale data is better than a crash or empty response
# Agent never exposed the raw ConnectionError to the customer
# Invalid order: structured error returned, agent responds conversationally

Error handling architecture for production: tools are the error boundary. All exceptions are caught inside the tool function and returned as structured error dicts, so the agent layer never sees raw Python exceptions. Retry all network-dependent tools a small, fixed number of times with exponential backoff (delays of 1s, 2s, 4s, doubling each time); the example above stops after three attempts, so it waits 1s and 2s before falling back. Always provide a cache or static fallback for critical tools like order status, product availability, and account balance, because users tolerate slightly stale data far better than a complete failure. Log every tool error with the session_id and error details using the ADK callback system (Tutorial 325) so your operations team can monitor tool failure rates and fix root causes proactively.
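The logging step can be sketched with the standard library alone. The helper below is hypothetical and deliberately framework-agnostic: `log_tool_result` and the logger name `tool_errors` are illustrative, and wiring it into an ADK callback is left to the callback tutorial referenced above.

```python
import logging

logger = logging.getLogger("tool_errors")


def log_tool_result(tool_name: str, session_id: str, response: dict) -> bool:
    """Log a structured tool error; return True if an error was logged."""
    if not isinstance(response, dict) or "error" not in response:
        return False  # successful responses are not logged here
    logger.warning(
        "tool_error tool=%s session_id=%s error=%s detail=%s",
        tool_name,
        session_id,
        response["error"],
        response.get("detail", ""),
    )
    return True
```

Keeping the fields in a fixed `key=value` layout makes it straightforward to count failures per tool or per session in whatever log aggregation system is in use.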


 
  


  