Gemini Vision - Image Understanding
Author: Venkata Sudhakar
The Gemini API can analyse images natively: identifying objects, reading text, understanding scenes, and answering questions about visual content. ShopMax India uses Gemini Vision to automate product image quality checks, extract text from packaging photos, and power its visual customer support channel, where shoppers send photos of damaged items.
Images can be passed to Gemini in two ways: as inline base64 data when the request (image plus prompt) stays under 20 MB, or via the Files API for larger files. Multiple images can be sent in a single request, which enables comparison and batch analysis workflows.
The example below shows how to analyse a product image and extract structured information using Gemini.
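A minimal sketch of such a request, using only the Python standard library and the REST `generateContent` endpoint, might look like the following. The model name, the prompt wording, the file name `tv_box.jpg`, the `GEMINI_API_KEY` environment variable and the helper names `build_payload`, `generate` and `main` are assumptions made for illustration, not ShopMax India's actual code. The size check applies the 20 MB figure mentioned above to the serialized request body.

```python
import base64
import json
import os
import urllib.request

# Assumed model name; substitute whichever vision-capable Gemini model you use.
MODEL = "gemini-2.0-flash"
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    f"{MODEL}:generateContent"
)
# The 20 MB inline limit from the text, applied here to the whole request body.
MAX_INLINE_BYTES = 20 * 1024 * 1024

# Hypothetical prompt; the field names mirror the sample output in this article.
PROMPT = (
    "Extract the brand, model, price_on_box, warranty and key_features "
    "from this product packaging photo. Reply with JSON only."
)


def build_payload(images, prompt):
    """Build a generateContent body from (mime_type, raw_bytes) pairs plus a prompt."""
    parts = []
    for mime_type, raw in images:
        parts.append({
            "inline_data": {
                "mime_type": mime_type,
                "data": base64.b64encode(raw).decode("ascii"),
            }
        })
    # The text prompt goes after the image parts.
    parts.append({"text": prompt})
    payload = {"contents": [{"parts": parts}]}
    size = len(json.dumps(payload).encode("utf-8"))
    if size >= MAX_INLINE_BYTES:
        raise ValueError("request too large for inline data; use the Files API")
    return payload


def generate(payload, api_key):
    """POST the payload to the API and return the decoded JSON response."""
    request = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def main():
    """Send one packaging photo and print the model's text reply."""
    with open("tv_box.jpg", "rb") as f:  # hypothetical file name
        body = build_payload([("image/jpeg", f.read())], PROMPT)
    reply = generate(body, os.environ["GEMINI_API_KEY"])
    print(reply["candidates"][0]["content"]["parts"][0]["text"])
```

Calling `main()` with `GEMINI_API_KEY` set sends the request. Because `build_payload` accepts a list, passing several `(mime_type, bytes)` pairs puts multiple images into one request for comparison or batch analysis.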
It gives the following output:
{
  "brand": "Samsung",
  "model": "UA55CU7700",
  "price_on_box": "Rs 45,990",
  "warranty": "1 year comprehensive",
  "key_features": [
    "55 inch 4K UHD Crystal Display",
    "PurColor technology",
    "Smart TV with Tizen OS",
    "AirSlim design"
  ]
}
The example below shows damage detection: a customer sends a photo of a broken product and the agent assesses whether it qualifies for a return.
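One way to sketch the damage check is to ask the model for a JSON-only reply and decode the first candidate's text. The prompt wording, the severity values and the helper names `build_damage_request` and `parse_assessment` are assumptions for illustration; the `generationConfig.responseMimeType` setting requests JSON output and depends on the chosen model supporting it. The sample response in the usage lines is hand-written to mirror the API's response shape, not captured from a real call.

```python
import json

# Hypothetical prompt; field names follow the sample output in this article.
DAMAGE_PROMPT = (
    "A customer has sent this photo for order {order_id}. Assess the damage and "
    "reply with a JSON object containing damage_visible (boolean), severity "
    "(none, minor or major), damage_description, return_eligible (boolean), "
    "recommended_action and order_id."
)


def build_damage_request(image_b64, order_id, mime_type="image/jpeg"):
    """Request body asking for a JSON-only damage assessment of one photo."""
    return {
        "contents": [{
            "parts": [
                {"inline_data": {"mime_type": mime_type, "data": image_b64}},
                {"text": DAMAGE_PROMPT.format(order_id=order_id)},
            ]
        }],
        # Ask the API to return JSON text rather than free-form prose.
        "generationConfig": {"responseMimeType": "application/json"},
    }


def parse_assessment(response):
    """Decode the first candidate's text part of a generateContent response."""
    text = response["candidates"][0]["content"]["parts"][0]["text"]
    return json.loads(text)


# Usage with a hand-written sample response (no network call involved).
sample_response = {
    "candidates": [{
        "content": {
            "parts": [{
                "text": json.dumps({
                    "damage_visible": True,
                    "severity": "major",
                    "return_eligible": True,
                    "order_id": "ORD-9921",
                })
            }]
        }
    }]
}
assessment = parse_assessment(sample_response)
```

Passing the order ID in the prompt lets the model echo it back, so each assessment can be matched to its claim without extra bookkeeping.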
It gives the following output:
{
  "damage_visible": true,
  "severity": "major",
  "damage_description": "Large crack across the screen panel, display non-functional",
  "return_eligible": true,
  "recommended_action": "Approve immediate replacement or full refund",
  "order_id": "ORD-9921"
}
ShopMax India processes over 500 damage claim photos per week across Mumbai, Hyderabad, and Delhi. Gemini Vision automates the first-pass triage: eligible claims are fast-tracked for replacement, while edge cases are routed to a senior support agent. This has reduced average claim processing time from 48 hours to under 2 hours.
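The first-pass triage can be pictured as a small routing function over the parsed assessment. The rule below is a hypothetical illustration, since the actual criteria ShopMax India uses are not stated: it fast-tracks only clear-cut cases (visible, major damage marked return-eligible) and sends everything else, including incomplete answers, to a senior agent. The function name `route_claim` and the queue labels are likewise assumptions.

```python
def route_claim(assessment):
    """Return 'fast_track' or 'senior_agent' for one parsed damage assessment."""
    required = ("damage_visible", "severity", "return_eligible", "order_id")
    if any(key not in assessment for key in required):
        # Incomplete model answer: treat as an edge case for a person to decide.
        return "senior_agent"
    clear_cut = (
        assessment["damage_visible"] is True
        and assessment["return_eligible"] is True
        and assessment["severity"] == "major"
    )
    return "fast_track" if clear_cut else "senior_agent"


# Usage: the assessment shown earlier in this article is a clear-cut case.
decision = route_claim({
    "damage_visible": True,
    "severity": "major",
    "return_eligible": True,
    "order_id": "ORD-9921",
})
```

Defaulting to the senior agent whenever a field is missing or ambiguous keeps the automated path conservative, so a malformed model reply never approves a replacement on its own.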