Gemini Batch Processing
Author: Venkata Sudhakar
Gemini Batch Processing allows you to submit large volumes of requests asynchronously and retrieve the results once the job completes. Instead of sending thousands of individual API calls, batch mode accepts a JSONL input file, processes all requests in the background, and writes the results to Cloud Storage. ShopMax India uses batch processing to generate product descriptions for its entire 10,000-item catalogue overnight.

Batch jobs are submitted via the Vertex AI Batch Prediction API using a Gemini model endpoint, and each request in the JSONL file is processed independently of the others. Batch mode is priced up to 50% lower than online inference and is not subject to the same rate-limit pressure, making it ideal for large-scale content generation, classification, and analysis tasks. The example below shows how to prepare a JSONL batch input file.
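A minimal sketch of the input-preparation step, assuming the Vertex AI Gemini batch input format in which each JSONL line wraps a GenerateContent-style body in a "request" field; the product names and the prompt wording are illustrative:

```python
# Build a JSONL batch input file for Vertex AI Gemini batch prediction.
# Each line is one independent request in the "request" wrapper format.
import json

# Illustrative catalogue items (real runs would pull from the product database).
products = [
    "Wireless Bluetooth Earbuds",
    "Stainless Steel Water Bottle 1L",
    "Cotton Bedsheet Set, King Size",
]

with open("batch_input.jsonl", "w") as f:
    for name in products:
        request = {
            "request": {
                "contents": [
                    {
                        "role": "user",
                        "parts": [
                            {"text": f"Write a 50-word product description for: {name}"}
                        ],
                    }
                ]
            }
        }
        f.write(json.dumps(request) + "\n")

print(f"Batch input file created with {len(products)} requests")
```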
It gives the following output:
Batch input file created with 3 requests
The example below uploads the input file to Cloud Storage, submits the batch job, and polls until the results are available.
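One way to sketch this with the vertexai SDK's BatchPredictionJob.submit helper. The bucket, project, region, and model version below are assumptions, and the cloud calls require the google-cloud-storage and google-cloud-aiplatform packages plus valid GCP credentials, so they are deferred into a function:

```python
# Sketch: upload the JSONL input, submit a Gemini batch job, and poll to
# completion. Bucket/project/model names are illustrative assumptions.
import time

BUCKET = "shopmax-batch-jobs"
INPUT_URI = f"gs://{BUCKET}/inputs/batch_input.jsonl"
OUTPUT_PREFIX = f"gs://{BUCKET}/outputs"


def run_batch_job() -> None:
    # Imports are deferred so the module loads without GCP libraries installed.
    import vertexai
    from google.cloud import storage
    from vertexai.batch_prediction import BatchPredictionJob

    vertexai.init(project="shopmax-prod", location="us-central1")  # assumed project

    # 1. Upload the prepared JSONL input file to Cloud Storage.
    storage.Client().bucket(BUCKET).blob(
        "inputs/batch_input.jsonl"
    ).upload_from_filename("batch_input.jsonl")

    # 2. Submit the batch prediction job against a Gemini model.
    job = BatchPredictionJob.submit(
        source_model="gemini-1.5-flash-002",
        input_dataset=INPUT_URI,
        output_uri_prefix=OUTPUT_PREFIX,
    )
    print(f"Job submitted: {job.name}")  # resource name of the job

    # 3. Poll until the job reaches a terminal state.
    while not job.has_ended:
        print(f"State: {job.state.name}")
        time.sleep(60)
        job.refresh()

    if job.has_succeeded:
        print(f"Completed. Output: {job.output_location}/")
    else:
        print(f"Failed: {job.error}")


if __name__ == "__main__":
    run_batch_job()
```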
It gives the following output:
Job submitted: shopmax-product-descriptions
State: JOB_STATE_RUNNING
...
Completed. Output: gs://shopmax-batch-jobs/outputs/prediction-shopmax-product-descriptions/
ShopMax India runs the batch job nightly at 2 AM to generate descriptions for new catalogue additions and refresh existing ones. Processing 10,000 products takes under 90 minutes and, at batch pricing, costs 50% less than the equivalent online API calls. Results are written directly to Cloud Storage and picked up by the product database pipeline at 4 AM, before the business day begins.
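The 4 AM pipeline step amounts to a small parser over the batch output JSONL, which pairs each original request with the model response on one line. The sample line below is fabricated for illustration, and the exact response shape should be checked against the Vertex AI batch output documentation:

```python
# Sketch: extract generated descriptions from one line of the batch output
# JSONL. The sample line is fabricated; real output comes from the job's
# Cloud Storage output prefix.
import json

sample_line = json.dumps({
    "request": {
        "contents": [
            {"role": "user", "parts": [{"text": "Describe: Water Bottle"}]}
        ]
    },
    "response": {
        "candidates": [
            {
                "content": {
                    "role": "model",
                    "parts": [{"text": "A durable 1L stainless steel bottle."}],
                }
            }
        ]
    },
})


def extract_description(line: str) -> str:
    """Pull the generated text out of one output JSONL line."""
    record = json.loads(line)
    candidate = record["response"]["candidates"][0]
    return candidate["content"]["parts"][0]["text"]


print(extract_description(sample_line))
```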