Vertex AI Model Garden with ADK
Author: Venkata Sudhakar
Vertex AI Model Garden is a curated catalogue of foundation models available on Google Cloud - not just Gemini, but also Meta Llama, Mistral, Anthropic Claude, and many others. For ADK agents this means you are not locked to Gemini: you can build an ADK agent that uses Llama 3 for one task and Gemini for another, or swap the underlying model without changing any tool or session code. Model Garden provides unified endpoints for all models, IAM-controlled access, and enterprise billing through your GCP project.

ADK agents accept any model string that the Vertex AI SDK supports. For Model Garden models you use the full Vertex AI endpoint string or the model publisher path. LiteLLM is the recommended adapter for using non-Gemini models in ADK - it provides a unified interface, so ADK can call Llama, Mistral, or Claude through the same generate_content pattern. Once configured, swapping models is a one-line change in the Agent definition, with no changes to tools, sessions, callbacks, or deployment code.

The example below builds an ADK agent comparison framework that runs the same customer service query through Gemini, Llama 3, and Claude on Vertex AI, measuring response quality and latency to help you choose the right model for your production use case.
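A minimal sketch of the one-line model swap, assuming the google-adk package and its LiteLlm wrapper (the import path follows the ADK docs; the Llama and Claude model identifiers are illustrative and should be checked against your project's Model Garden catalogue):

```python
# Sketch: the underlying model is a one-line change in the Agent definition.
# Gemini models are passed as plain strings; non-Gemini Model Garden models
# go through the LiteLlm wrapper, which gives ADK a unified interface.
# Model identifiers below are assumptions - verify them in your project.
GEMINI = "gemini-2.0-flash"
LLAMA = "vertex_ai/meta/llama3-70b-instruct-maas"
CLAUDE = "vertex_ai/claude-3-5-sonnet@20241022"

def resolve_model(name: str):
    """Return a model value ADK's Agent accepts: a plain string for
    Gemini, or a LiteLlm wrapper for everything else."""
    if name.startswith("gemini"):
        return name
    from google.adk.models.lite_llm import LiteLlm  # assumed import path
    return LiteLlm(model=name)

# agent = Agent(
#     name="support_agent",
#     model=resolve_model(LLAMA),   # <- the only line that changes
#     tools=[get_order_status],     # tools, sessions, callbacks unchanged
# )
```

Because only the `model` argument changes, the same tool and session code runs against any of the three models.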
Benchmarking all three models on the same queries gives the following comparison output:
=== MODEL: gemini-2.0-flash ===
Q: Where is my order ORD-88421?
A: Your order ORD-88421 is out for delivery and expected today by 7pm!
Latency: 842 ms
Q: I cannot find my order ORD-99999 in the system.
A: I checked and ORD-99999 was not found. Could you double-check
the order ID from your confirmation email?
Latency: 761 ms
=== MODEL: meta/llama3-70b-instruct-maas ===
Q: Where is my order ORD-88421?
A: ORD-88421 is currently out for delivery with an ETA of today by 7pm.
Latency: 1240 ms
Q: I cannot find my order ORD-99999 in the system.
A: I was unable to locate order ORD-99999. Please verify the order ID.
Latency: 1180 ms
=== MODEL: claude-3-5-sonnet@20241022 ===
Q: Where is my order ORD-88421?
A: Great news! Order ORD-88421 is out for delivery and should arrive
today by 7pm. Is there anything else I can help you with?
Latency: 1050 ms
# All three models called the get_order_status tool correctly
# Gemini fastest for this use case; Claude most conversational tone
# Model swap = one line change in Agent definition
# Same tools, sessions, callbacks work with all three models
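Output like the transcript above can be produced by a simple timing harness. A sketch with a stubbed model call (in the real framework, `ask` would invoke each ADK agent and return its final response):

```python
import time
from typing import Callable

def benchmark(models: dict[str, Callable[[str], str]],
              queries: list[str]) -> list[dict]:
    """Run every query through every model callable, recording the
    answer and wall-clock latency in milliseconds."""
    results = []
    for name, ask in models.items():
        print(f"=== MODEL: {name} ===")
        for q in queries:
            start = time.perf_counter()
            answer = ask(q)
            latency_ms = (time.perf_counter() - start) * 1000
            print(f"Q: {q}\nA: {answer}\nLatency: {latency_ms:.0f} ms")
            results.append({"model": name, "query": q,
                            "answer": answer, "latency_ms": latency_ms})
    return results

# Stub in place of a real ADK agent call, so the harness shape is clear.
def fake_model(query: str) -> str:
    return f"(stub answer for: {query})"

results = benchmark({"gemini-2.0-flash": fake_model},
                    ["Where is my order ORD-88421?"])
```

The harness treats each model as an opaque callable, which is exactly what the one-line model swap enables: the same loop runs Gemini, Llama, or Claude without modification.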
Model routing selects the right Model Garden model per use case:
Customer service model: gemini-2.0-flash
Analysis model: claude-3-5-sonnet@20241022
Open source model: meta/llama3-70b-instruct-maas
# All models accessed through the same Vertex AI endpoint
# Billing consolidated in one GCP project
# IAM controls which service accounts can use which models
# No separate API keys needed for Llama or Claude on Vertex AI
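The routing above can be captured in a small lookup table; a sketch, where the task categories and the default are illustrative choices rather than anything prescribed by ADK:

```python
# Map each task category to a Model Garden model string.
# Assignments mirror the routing above; adjust per project.
MODEL_ROUTES = {
    "customer_service": "gemini-2.0-flash",
    "analysis": "claude-3-5-sonnet@20241022",
    "open_source": "meta/llama3-70b-instruct-maas",
}

def pick_model(task: str) -> str:
    """Return the model string for a task category, defaulting to
    Gemini 2.0 Flash for unknown categories."""
    return MODEL_ROUTES.get(task, "gemini-2.0-flash")
```

Keeping the routing in data rather than code means adding a new use case, or reassigning one to a different model, never touches the agent, tool, or session definitions.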
Model Garden selection criteria: use Gemini 2.0 Flash as your default for customer-facing agents - it is the fastest, cheapest, and best integrated with ADK features such as built-in search and code execution. Use Claude Sonnet for complex reasoning tasks such as contract analysis, financial modelling, or multi-step planning, where response quality matters more than latency. Use Llama 3 when you have regulatory or contractual requirements to use open-source models, or when you need to fine-tune the model on proprietary data.

Enable Model Garden access in the GCP Console under Vertex AI before using non-Gemini models - some models require accepting the publisher's terms of service before the first call.