ADK with Computer Use - Browser Automation

Author: Venkata Sudhakar

Computer use agents interact with software the way a human does: by looking at the screen and deciding what to click or type. Gemini's vision capabilities, combined with Playwright-style browser control, let agents navigate web UIs, extract information from dynamic pages, and complete multi-step workflows on sites that expose no API.

In this tutorial, we build a ShopMax India price monitoring agent that uses browser automation with Gemini vision to visit a competitor website, read product prices from the rendered page, and return a structured price comparison report, without an HTML scraping library.

The example below shows how to integrate browser screenshot capture with Gemini vision analysis in an ADK tool.

import base64
import json
import os

from playwright.sync_api import sync_playwright
# The ADK imports are used when the tool is wired into an agent.
from google.adk.agents import LlmAgent
from google.adk.tools import FunctionTool
from google.adk.sessions import InMemorySessionService
from google.adk.runners import Runner
from google import genai
from google.genai import types

def capture_page_screenshot(url: str) -> dict:
    """Navigate to a URL and return a base64 screenshot for vision analysis."""
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page(viewport={"width": 1280, "height": 800})
            # Wait for network activity to settle so dynamically loaded prices are rendered.
            page.goto(url, wait_until="networkidle", timeout=15000)
            # Capture only the visible viewport, not the full scrollable page.
            screenshot_bytes = page.screenshot(full_page=False)
        finally:
            # Close the browser even if navigation or capture fails.
            browser.close()
    return {
        "url": url,
        "screenshot_b64": base64.b64encode(screenshot_bytes).decode(),
        "format": "png"
    }

def analyse_page_for_prices(url: str, product_query: str) -> dict:
    """Capture a page screenshot and use Gemini vision to extract product prices."""
    shot = capture_page_screenshot(url)

    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
    image_part = types.Part.from_bytes(
        data=base64.b64decode(shot["screenshot_b64"]),
        mime_type="image/png"
    )
    prompt = (
        f"Look at this webpage screenshot. Find prices for: {product_query}. "
        "Return a JSON list of {product, price, currency} objects. "
        "If not found, return empty list."
    )

    # Ask for a JSON response so the result can be parsed directly.
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents=[image_part, prompt],
        config=types.GenerateContentConfig(response_mime_type="application/json")
    )
    try:
        prices = json.loads(response.text)
    except (json.JSONDecodeError, TypeError):
        # Malformed JSON or an empty response: treat as "no prices found".
        prices = []
    return {"url": url, "product_query": product_query, "prices_found": prices}

Now wire it into an ADK agent for a price comparison workflow.
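
A minimal sketch of that wiring follows, assuming ADK 1.x (where create_session and run_async are coroutines) and the analyse_page_for_prices function defined above. The agent name, app name, instruction text, user and session IDs, and the build_price_agent and ask helpers are illustrative assumptions rather than the author's listing. The ADK imports sit inside the functions only so the snippet can be loaded without ADK installed.

```python
import asyncio

# Instruction text is an assumption made for this sketch.
PRICE_AGENT_INSTRUCTION = (
    "You are a price monitoring assistant for ShopMax India. "
    "Use the analyse_page_for_prices tool to read competitor prices from a URL, "
    "compare them with the ShopMax price given by the user, "
    "and recommend whether a price adjustment is needed."
)


def build_price_agent(price_tool):
    """Wrap a price-extraction function in an ADK LlmAgent (hypothetical helper)."""
    from google.adk.agents import LlmAgent
    from google.adk.tools import FunctionTool

    return LlmAgent(
        name="shopmax_price_monitor",  # assumed name
        model="gemini-2.0-flash",
        instruction=PRICE_AGENT_INSTRUCTION,
        tools=[FunctionTool(func=price_tool)],
    )


async def ask(agent, question: str) -> str:
    """Run one user turn and return the agent's final text reply (hypothetical helper)."""
    from google.adk.runners import Runner
    from google.adk.sessions import InMemorySessionService
    from google.genai import types

    sessions = InMemorySessionService()
    # App, user and session identifiers are placeholders.
    await sessions.create_session(
        app_name="price_monitor", user_id="analyst", session_id="s1"
    )
    runner = Runner(agent=agent, app_name="price_monitor", session_service=sessions)
    message = types.Content(role="user", parts=[types.Part(text=question)])

    reply = ""
    async for event in runner.run_async(
        user_id="analyst", session_id="s1", new_message=message
    ):
        # Keep the text of the final response event; earlier events are tool calls.
        if event.is_final_response() and event.content and event.content.parts:
            reply = event.content.parts[0].text or ""
    return reply


# Example (requires google-adk, Playwright and Gemini API credentials):
# agent = build_price_agent(analyse_page_for_prices)
# print(asyncio.run(ask(agent, "ShopMax sells the Dell Inspiron 15 at Rs 62,000. "
#                              "Check the price at <competitor product URL>.")))
```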

Running the agent gives the following output:

Price Analysis - Dell Inspiron 15

Competitor scan complete:
- Croma: Dell Inspiron 15 found at Rs 63,490

ShopMax current price: Rs 62,000
Competitor price: Rs 63,490
ShopMax is already Rs 1,490 cheaper than Croma.

Recommendation: No price adjustment needed. ShopMax is competitively priced.
Consider promoting the price advantage in marketing to drive volume.
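
The comparison in this report is plain arithmetic on the two prices. In the tutorial's setup the agent does this reasoning itself; the sketch below only makes the calculation explicit. The function name compare_prices and the rule that a review is flagged only when the competitor is cheaper are assumptions for illustration.

```python
def compare_prices(own_price: float, competitor_price: float) -> dict:
    """Compare our price with a competitor's price (hypothetical helper)."""
    difference = competitor_price - own_price
    if difference > 0:
        position = "cheaper"
    elif difference < 0:
        position = "more expensive"
    else:
        position = "equal"
    return {
        "difference": abs(difference),
        "position": position,
        # Assumed rule: only flag a review when the competitor undercuts us.
        "review_price": difference < 0,
    }


# Figures from the report above: ShopMax Rs 62,000, Croma Rs 63,490.
result = compare_prices(62000, 63490)
print(result)  # {'difference': 1490, 'position': 'cheaper', 'review_price': False}
```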

This pattern works for any web UI that has no public API, such as competitor catalogs, government portals, and legacy systems. For form-filling workflows, extend capture_page_screenshot to also accept click coordinates and text inputs, building a full computer-use loop. Always respect robots.txt and terms of service when automating web access.
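
One way to sketch the action side of such a loop is a small dispatcher that replays model-chosen actions on a Playwright page. The action format (dictionaries with a "type" of click, type, or press) and the name apply_actions are assumptions for illustration; page.mouse.click, page.keyboard.type, and page.keyboard.press are the Playwright calls it relies on.

```python
def apply_actions(page, actions: list) -> int:
    """Replay click/type/press actions on a Playwright sync Page.

    `page` is expected to be a playwright.sync_api.Page (or any object with the
    same mouse/keyboard interface). Returns the number of actions performed.
    """
    performed = 0
    for action in actions:
        kind = action.get("type")
        if kind == "click":
            # Click at viewport coordinates taken from the screenshot.
            page.mouse.click(action["x"], action["y"])
        elif kind == "type":
            # Type text into whichever element currently has focus.
            page.keyboard.type(action["text"])
        elif kind == "press":
            # Press a single key such as "Enter" or "Tab".
            page.keyboard.press(action["key"])
        else:
            raise ValueError(f"Unsupported action: {kind!r}")
        performed += 1
    return performed
```

In a full loop, each round would apply the actions, take a fresh screenshot, and send it back to Gemini to decide the next step, stopping when the model reports that the task is complete.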
