
ADK with Computer Use - Browser Automation

Author: Venkata Sudhakar

Computer use agents interact with software the way a human does - by looking at the screen and deciding what to click or type. Gemini's vision capabilities combined with Playwright-style browser control enable agents to navigate web UIs, extract information from dynamic pages, and complete multi-step workflows that have no API.

In this tutorial, we build a ShopMax India price monitoring agent that uses browser automation with Gemini vision to visit a competitor website, read product prices from the rendered page, and return a structured price comparison report - no scraping library required.

The example below shows how to combine browser screenshot capture with Gemini vision analysis in an ADK tool.
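A minimal sketch of such a tool follows, assuming Playwright's async API for the browser and the google-genai SDK for the vision call. The function name capture_page_screenshot comes from this tutorial; the product_name parameter, the parse_model_json helper, the prompt wording, the shape of the returned dictionary, and the gemini-2.5-flash model name are illustrative assumptions, not the author's exact code.

```python
import json
import re

MODEL_NAME = "gemini-2.5-flash"  # assumption: any vision-capable Gemini model


def parse_model_json(text: str) -> dict:
    """Pull the first JSON object out of a model reply (hypothetical helper).

    Tolerates surrounding prose or Markdown fences around the object.
    """
    match = re.search(r"\{.*\}", text, flags=re.DOTALL)
    if match is None:
        return {"status": "error", "message": "no JSON object in model reply"}
    try:
        return {"status": "success", "data": json.loads(match.group(0))}
    except json.JSONDecodeError as exc:
        return {"status": "error", "message": f"invalid JSON: {exc}"}


async def capture_page_screenshot(url: str, product_name: str) -> dict:
    """Screenshot a web page and ask Gemini to read a product price from it.

    Args:
        url: Page to open in a headless browser.
        product_name: Product whose price should be read from the screenshot.

    Returns:
        A dict with "status" and, on success, the parsed "data".
    """
    # Imported lazily so parse_model_json works without these packages installed.
    from google import genai
    from google.genai import types
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        try:
            page = await browser.new_page(viewport={"width": 1280, "height": 800})
            await page.goto(url, wait_until="networkidle", timeout=30_000)
            png_bytes = await page.screenshot()  # PNG bytes of the viewport
        finally:
            await browser.close()

    prompt = (
        f"Find the price of '{product_name}' in this screenshot. "
        'Reply with JSON only: {"product": str, "price_inr": int, "found": bool}.'
    )
    client = genai.Client()  # reads the API key from the environment
    response = await client.aio.models.generate_content(
        model=MODEL_NAME,
        contents=[types.Part.from_bytes(data=png_bytes, mime_type="image/png"), prompt],
    )
    return parse_model_json(response.text or "")
```

The tool is written as a coroutine because Playwright's sync API refuses to start inside a running asyncio event loop, which is where an ADK runner normally executes tools. Returning a dictionary with a status field lets the agent distinguish a failed read from a real price.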


Now wire it into an ADK agent for a price comparison workflow:
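One way to do the wiring is sketched below, assuming the google-adk package and its Agent class. The build_price_agent factory, the compare_prices tool, the agent name, the instruction text, and the model name are all hypothetical choices made for illustration; the screenshot tool is passed in as a parameter so the sketch does not depend on any particular implementation of it.

```python
from typing import Callable


def compare_prices(shopmax_price_inr: int, competitor_price_inr: int) -> dict:
    """Compare ShopMax's price with a competitor's price, both in rupees.

    Args:
        shopmax_price_inr: Current ShopMax price.
        competitor_price_inr: Price read from the competitor page.

    Returns:
        A dict naming who is cheaper and the absolute difference in rupees.
    """
    diff = competitor_price_inr - shopmax_price_inr
    if diff > 0:
        position = "shopmax_cheaper"
    elif diff < 0:
        position = "competitor_cheaper"
    else:
        position = "equal"
    return {"position": position, "difference_inr": abs(diff)}


def build_price_agent(screenshot_tool: Callable):
    """Create an ADK agent that uses the given screenshot tool (hypothetical factory)."""
    # Imported lazily so compare_prices can be used without google-adk installed.
    from google.adk.agents import Agent

    return Agent(
        name="price_monitor",       # assumption: any valid identifier works
        model="gemini-2.5-flash",   # assumption: swap in the model you use
        description="Monitors competitor prices for ShopMax India.",
        instruction=(
            "You are a pricing analyst for ShopMax India. When asked about a "
            "product, call the screenshot tool on the competitor URL to read the "
            "price, call compare_prices with both prices, and reply with a short "
            "report and a pricing recommendation."
        ),
        # ADK wraps plain Python functions as tools and uses their docstrings
        # and type hints to describe them to the model.
        tools=[screenshot_tool, compare_prices],
    )
```

With a screenshot tool in hand, build_price_agent(capture_page_screenshot) returns an agent that can be driven by an ADK runner such as InMemoryRunner or through the adk web development UI. Doing the subtraction in a plain function rather than in the prompt keeps the arithmetic deterministic.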


Running the agent gives the following output:

Price Analysis - Dell Inspiron 15

Competitor scan complete:
- Croma: Dell Inspiron 15 found at Rs 63,490

ShopMax current price: Rs 62,000
Competitor price: Rs 63,490
ShopMax is already Rs 1,490 cheaper than Croma.

Recommendation: No price adjustment needed. ShopMax is competitively priced.
Consider promoting the price advantage in marketing to drive volume.

This pattern works for any web UI that has no public API - competitor catalogs, government portals, legacy systems. For form-filling workflows, extend capture_page_screenshot to also accept click coordinates and text inputs, building a full computer-use loop. Always respect robots.txt and terms of service when automating web access.
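The two suggestions above can be sketched as follows: a step function that applies one click or typing action to an open Playwright page and returns a fresh screenshot, plus a robots.txt check based on the standard library's urllib.robotparser. The UiAction dataclass, validate_action, apply_action_and_capture, allowed_by_robots, and the ShopMaxBot user agent are hypothetical names; the sketch assumes pixel coordinates relative to the viewport, so a model that returns normalized coordinates would need them scaled first.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.robotparser import RobotFileParser

DEFAULT_VIEWPORT = {"width": 1280, "height": 800}  # assumption: fixed viewport


@dataclass
class UiAction:
    """One UI step chosen by the model (hypothetical structure)."""
    kind: str          # "click" or "type"
    x: int = 0         # viewport pixel coordinates
    y: int = 0
    text: str = ""     # only used when kind == "type"


def validate_action(action: UiAction, width: int, height: int) -> Optional[str]:
    """Return an error message if the action is unusable, otherwise None."""
    if action.kind not in ("click", "type"):
        return f"unsupported action kind: {action.kind!r}"
    if not (0 <= action.x < width and 0 <= action.y < height):
        return f"({action.x}, {action.y}) is outside the {width}x{height} viewport"
    if action.kind == "type" and not action.text:
        return "a type action needs non-empty text"
    return None


async def apply_action_and_capture(page, action: UiAction) -> bytes:
    """Apply one action to a Playwright async Page and return a new PNG screenshot."""
    size = page.viewport_size or DEFAULT_VIEWPORT
    error = validate_action(action, size["width"], size["height"])
    if error:
        raise ValueError(error)
    await page.mouse.click(action.x, action.y)      # click, or focus the field
    if action.kind == "type":
        await page.keyboard.type(action.text)       # type into the focused field
    await page.wait_for_load_state("networkidle")   # let the page settle
    return await page.screenshot()


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against the text of a site's robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

A full loop would alternate between sending the latest screenshot to Gemini, turning its reply into a UiAction, and calling apply_action_and_capture until the task is done or a step limit is reached. Calling allowed_by_robots before the first page.goto gives the agent a simple way to skip disallowed paths.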


 
  


  