Gemini Multimodal - Document OCR and Form Extraction
Author: Venkata Sudhakar
Gemini can read scanned documents, invoices, and filled forms directly from the image. Unlike traditional OCR tools, which only extract raw text, Gemini understands document structure: it knows that a number below the word 'Total' is a currency amount, not a page number. ShopMax India uses this to automatically process vendor invoices received as scanned PDFs. You send the document image as an inline part or a File API upload, alongside a structured extraction prompt. Gemini returns the fields you asked for and copes with handwriting, poor scan quality, and varied layouts without any template configuration. The following example shows how ShopMax India extracts invoice data from scanned vendor documents using Gemini.
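A minimal sketch of the inline-image approach described above, assuming the google-genai Python SDK and an API key available in the environment. The model name, the prompt wording, and the helper names (build_prompt, parse_json_reply, extract_invoice) are illustrative assumptions, not ShopMax India's actual code.

```python
import json
import mimetypes
from pathlib import Path

# Field list mirrors the keys of the sample JSON output in this article.
INVOICE_FIELDS = [
    "invoice_number", "invoice_date", "vendor_name", "vendor_gstin",
    "line_items", "subtotal", "gst_amount", "grand_total", "payment_due_date",
]


def build_prompt(fields=INVOICE_FIELDS):
    """Build a structured extraction prompt (the wording is illustrative)."""
    return (
        "Extract the following fields from this invoice and return only JSON "
        "with exactly these keys: " + ", ".join(fields) + ". "
        "Use YYYY-MM-DD for dates and plain numbers for amounts. "
        "Use null for any field you cannot read."
    )


def parse_json_reply(text):
    """Parse the model reply, tolerating a Markdown code fence around the JSON."""
    fence = "`" * 3
    cleaned = text.strip()
    if cleaned.startswith(fence):
        # Drop the opening fence line, then everything from the closing fence on.
        cleaned = cleaned.split("\n", 1)[1] if "\n" in cleaned else ""
        cleaned = cleaned.rsplit(fence, 1)[0]
    return json.loads(cleaned)


def extract_invoice(image_path, model="gemini-2.0-flash"):
    """Send one scanned invoice image inline and return the extracted fields.

    The model name is an assumption; use whichever Gemini model you have access to.
    """
    # Imported lazily so the helpers above work without the SDK installed.
    from google import genai
    from google.genai import types

    client = genai.Client()  # expects the API key in the environment
    mime_type = mimetypes.guess_type(image_path)[0] or "image/png"
    image_part = types.Part.from_bytes(
        data=Path(image_path).read_bytes(), mime_type=mime_type
    )
    response = client.models.generate_content(
        model=model,
        contents=[image_part, build_prompt()],
        # Ask for a JSON reply so it can be parsed directly.
        config=types.GenerateContentConfig(response_mime_type="application/json"),
    )
    return parse_json_reply(response.text)
```

A call such as extract_invoice("scanned_invoice.png") (a hypothetical file name) would return a Python dict with the requested keys. Keeping the prompt builder and the parser separate from the API call makes them easy to test without network access.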
The extraction gives the following output:
{
  "invoice_number": "INV-2024-08812",
  "invoice_date": "2024-11-12",
  "vendor_name": "Samsung Electronics India Pvt Ltd",
  "vendor_gstin": "29AAACD1234F1ZL",
  "line_items": [
    {"description": "Galaxy S24 Ultra 256GB", "quantity": 10, "unit_price": 109900, "total": 1099000},
    {"description": "Galaxy A55 128GB", "quantity": 25, "unit_price": 38990, "total": 974750}
  ],
  "subtotal": 2073750,
  "gst_amount": 373275,
  "grand_total": 2447025,
  "payment_due_date": "2024-12-12"
}
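Because the extracted amounts are related arithmetically, a quick consistency check can flag misread digits before any further processing. The sketch below assumes the JSON shape shown above; the function name check_invoice_totals and the tolerance are hypothetical choices for illustration.

```python
def check_invoice_totals(invoice, tolerance=0.01):
    """Return a list of arithmetic problems found in an extracted invoice.

    An empty list means the line totals, subtotal, and grand total agree.
    """
    problems = []

    # Each line total should equal quantity * unit_price.
    for i, item in enumerate(invoice["line_items"], start=1):
        expected = item["quantity"] * item["unit_price"]
        if abs(expected - item["total"]) > tolerance:
            problems.append(
                f"line {i}: quantity * unit_price = {expected}, total = {item['total']}"
            )

    # The subtotal should equal the sum of the line totals.
    line_sum = sum(item["total"] for item in invoice["line_items"])
    if abs(line_sum - invoice["subtotal"]) > tolerance:
        problems.append(
            f"subtotal: line totals sum to {line_sum}, subtotal = {invoice['subtotal']}"
        )

    # The grand total should equal subtotal + GST.
    expected_total = invoice["subtotal"] + invoice["gst_amount"]
    if abs(expected_total - invoice["grand_total"]) > tolerance:
        problems.append(
            f"grand_total: subtotal + gst_amount = {expected_total}, "
            f"grand_total = {invoice['grand_total']}"
        )

    return problems
```

For the sample invoice above the amounts agree (1099000 + 974750 = 2073750, and 2073750 + 373275 = 2447025), so the function returns an empty list.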
For multi-page invoices, upload the PDF using the File API, and Gemini processes all pages together. You can also extract data from handwritten delivery challans and purchase orders by passing the image and asking for specific fields.
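The multi-page flow can be sketched as follows, again assuming the google-genai Python SDK. The model name, the CSV column choice, and the function names (extract_invoice_pdf, process_invoices) are assumptions for illustration; the extractor is passed in as a parameter so the loop can be exercised without calling the API.

```python
import csv
import json
from pathlib import Path

# Columns written to the CSV export (an illustrative subset of the fields).
CSV_COLUMNS = ["file", "invoice_number", "invoice_date", "vendor_name", "grand_total"]


def extract_invoice_pdf(pdf_path, prompt, model="gemini-2.0-flash"):
    """Upload a multi-page PDF through the File API and extract its fields."""
    # Imported lazily so process_invoices can be used without the SDK installed.
    from google import genai
    from google.genai import types

    client = genai.Client()  # expects the API key in the environment
    uploaded = client.files.upload(file=pdf_path)
    response = client.models.generate_content(
        model=model,
        contents=[uploaded, prompt],
        config=types.GenerateContentConfig(response_mime_type="application/json"),
    )
    return json.loads(response.text)


def process_invoices(pdf_paths, extract, csv_path):
    """Run `extract` on each PDF and export the selected fields to a CSV file."""
    rows = []
    for path in pdf_paths:
        invoice = extract(path)
        name = Path(path).name
        row = {"file": name}
        row.update({key: invoice.get(key) for key in CSV_COLUMNS[1:]})
        rows.append(row)
        print(f"Processed: {name} - Invoice {invoice.get('invoice_number')}")

    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=CSV_COLUMNS)
        writer.writeheader()
        writer.writerows(rows)

    print(f"Exported {len(rows)} invoices to CSV")
    return rows
```

In use, `extract` would be something like `lambda p: extract_invoice_pdf(p, prompt)`, where `prompt` is your structured extraction prompt.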
Processing a batch of invoice PDFs and exporting the results gives the following output:
Processed: samsung_inv_08812.pdf - Invoice INV-2024-08812
Processed: lg_inv_03341.pdf - Invoice INV-LG-03341
Processed: bosch_inv_12209.pdf - Invoice BCH-2024-12209
Exported 3 invoices to CSV
For high-volume invoice processing at ShopMax India, use the Batch API to process thousands of invoices overnight at 50% lower cost. Before approving an invoice for payment, validate the extracted amounts against purchase orders in AlloyDB to catch discrepancies automatically.
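The purchase-order check could look roughly like the sketch below. It assumes the purchase-order lines have already been fetched from AlloyDB (for example through a PostgreSQL driver, since AlloyDB is PostgreSQL-compatible) into a plain dictionary; the dictionary shape, the matching on item description, and the function name compare_to_purchase_order are all hypothetical.

```python
from decimal import Decimal


def compare_to_purchase_order(invoice, po_lines, tolerance=Decimal("0.01")):
    """Compare extracted invoice lines with purchase-order lines.

    po_lines maps an item description to a (quantity, unit_price) tuple.
    Returns a list of discrepancy messages; empty means the invoice matches.
    """
    discrepancies = []
    seen = set()

    for item in invoice["line_items"]:
        desc = item["description"]
        seen.add(desc)
        if desc not in po_lines:
            discrepancies.append(f"{desc}: not on purchase order")
            continue

        po_qty, po_price = po_lines[desc]
        if item["quantity"] != po_qty:
            discrepancies.append(
                f"{desc}: invoiced quantity {item['quantity']}, ordered {po_qty}"
            )
        # Compare prices as Decimal to avoid float rounding surprises.
        price_gap = abs(Decimal(str(item["unit_price"])) - Decimal(str(po_price)))
        if price_gap > tolerance:
            discrepancies.append(
                f"{desc}: invoiced unit price {item['unit_price']}, ordered {po_price}"
            )

    # Items that were ordered but do not appear on the invoice.
    for desc in sorted(set(po_lines) - seen):
        discrepancies.append(f"{desc}: on purchase order but missing from invoice")

    return discrepancies
```

An invoice with an empty discrepancy list can move on to payment approval, while anything else can be routed to manual review.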