Gemini Document Understanding
Author: Venkata Sudhakar
The Gemini API can read and understand PDF documents natively. You can upload invoices, reports, contracts, and multi-page forms, then ask Gemini to extract data, summarise content, or answer questions about the document. ShopMax India uses this to automate supplier invoice processing.
PDF files are uploaded via the Gemini Files API. Once a file is uploaded, its URI is passed alongside your prompt. Gemini handles multi-page documents, embedded tables, and mixed text-image content automatically.
The example below shows how to upload a PDF invoice and extract key fields using Gemini.
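A minimal sketch of the upload-and-extract flow, assuming the `google-genai` Python SDK and an API key in the environment. The model name, the prompt wording, and the helper names `parse_json_reply` and `extract_invoice` are illustrative assumptions, not part of the Gemini API.

```python
import json
import re

# Assumption: substitute whichever Gemini model your account can use.
MODEL_NAME = "gemini-2.0-flash"

# Assumption: illustrative prompt wording; adjust the fields to your invoices.
INVOICE_PROMPT = (
    "Extract the following fields from this invoice and return JSON only: "
    "invoice_number, supplier_name, date (YYYY-MM-DD), "
    "line_items (item, qty, unit_price, amount), total_amount, currency."
)

# Three backticks, built at runtime so this listing contains no literal fence.
FENCE = "`" * 3


def parse_json_reply(text: str) -> dict:
    """Parse a model reply as JSON, tolerating an optional Markdown code fence."""
    pattern = rf"^{FENCE}(?:json)?\s*|\s*{FENCE}$"
    cleaned = re.sub(pattern, "", text.strip())
    return json.loads(cleaned)


def extract_invoice(pdf_path: str) -> dict:
    """Upload a PDF via the Files API and ask Gemini for the key fields."""
    # Imported lazily so parse_json_reply works without the SDK installed.
    from google import genai

    client = genai.Client()  # picks up the API key from the environment
    uploaded = client.files.upload(file=pdf_path)
    response = client.models.generate_content(
        model=MODEL_NAME,
        contents=[uploaded, INVOICE_PROMPT],  # uploaded file plus the prompt
        config={"response_mime_type": "application/json"},
    )
    return parse_json_reply(response.text)
```

Calling `extract_invoice("supplier_invoice.pdf")` (a placeholder path) returns a Python dictionary that can be printed with `json.dumps(result, indent=2)`. Requesting `application/json` as the response type keeps the reply machine-readable; the fence-stripping helper is only a fallback.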
It gives the following output:
{
  "invoice_number": "INV-2024-00892",
  "supplier_name": "TechSource Electronics Pvt Ltd",
  "date": "2024-03-15",
  "line_items": [
    {"item": "Samsung 55 inch TV", "qty": 10, "unit_price": 45000, "amount": 450000},
    {"item": "LG Soundbar", "qty": 5, "unit_price": 12000, "amount": 60000}
  ],
  "total_amount": 510000,
  "currency": "INR"
}
The example below shows how to process a multi-page PDF and extract a summary table from a quarterly sales report.
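A minimal sketch of the report-summary flow, again assuming the `google-genai` Python SDK. The model name, the prompt wording, and the helper names `summarise_report` and `parse_markdown_table` are illustrative assumptions. The parser is an optional extra for turning the returned table into rows that code can work with.

```python
# Assumption: substitute whichever Gemini model your account can use.
MODEL_NAME = "gemini-2.0-flash"

# Assumption: illustrative prompt wording for a city-by-category summary.
REPORT_PROMPT = (
    "This PDF is a quarterly sales report spanning several pages. "
    "Return one Markdown table with the columns City, Electronics, "
    "Appliances, Total, one row per city, and nothing else."
)


def summarise_report(pdf_path: str) -> str:
    """Upload a multi-page PDF and ask Gemini for a single summary table."""
    # Imported lazily so parse_markdown_table works without the SDK installed.
    from google import genai

    client = genai.Client()  # picks up the API key from the environment
    uploaded = client.files.upload(file=pdf_path)
    response = client.models.generate_content(
        model=MODEL_NAME,
        contents=[uploaded, REPORT_PROMPT],
    )
    return response.text


def parse_markdown_table(text: str) -> list[dict]:
    """Turn a pipe-delimited table into a list of {header: cell} rows."""
    rows = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip any prose around the table
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if all(set(cell) <= set("-: ") for cell in cells):
            continue  # skip the |---|---| separator row
        rows.append(cells)
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]
```

Calling `summarise_report("q3_sales_report.pdf")` (a placeholder path) returns the table as text, and `parse_markdown_table` converts it into one dictionary per city.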
It gives the following output:
| City | Electronics | Appliances | Total |
|-----------|-------------|------------|------------|
| Mumbai | Rs 12,40,000 | Rs 8,20,000 | Rs 20,60,000 |
| Bangalore | Rs 9,80,000 | Rs 6,50,000 | Rs 16,30,000 |
| Hyderabad | Rs 7,60,000 | Rs 4,90,000 | Rs 12,50,000 |
| Delhi | Rs 11,20,000 | Rs 7,30,000 | Rs 18,50,000 |
ShopMax India processes over 200 supplier invoices per month. With Gemini Document Understanding, the accounts team extracts and validates invoice data in seconds instead of manually keying fields for hours. The same pipeline handles purchase orders, delivery notes, and warranty certificates.
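The validation step can be sketched as a few arithmetic and completeness checks on the extracted JSON. This is an illustrative assumption about what validation might involve, not ShopMax India's actual rules: the function name `validate_invoice` is hypothetical, and the check that line amounts sum to the total assumes the invoice carries no separate tax or discount lines.

```python
REQUIRED_FIELDS = (
    "invoice_number", "supplier_name", "date",
    "line_items", "total_amount", "currency",
)


def validate_invoice(invoice: dict) -> list[str]:
    """Return a list of problems found; an empty list means every check passed."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in invoice]

    line_total = 0
    for i, item in enumerate(invoice.get("line_items", []), start=1):
        expected = item["qty"] * item["unit_price"]
        if item["amount"] != expected:
            problems.append(
                f"line {i}: amount {item['amount']} != qty x unit_price {expected}"
            )
        line_total += item["amount"]

    # Assumption: no tax or discount lines, so the total is the sum of amounts.
    if "total_amount" in invoice and invoice["total_amount"] != line_total:
        problems.append(
            f"total_amount {invoice['total_amount']} != sum of line amounts {line_total}"
        )
    return problems
```

For the sample invoice shown earlier, the line amounts are 10 x 45000 = 450000 and 5 x 12000 = 60000, which sum to the stated total of 510000, so the function returns an empty list.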