
Gemini Multimodal - Video Understanding

Author: Venkata Sudhakar

Gemini can analyse video files directly, not just images. You can upload a recorded product demo, a customer interview, a training video, or a meeting recording, and ask Gemini to extract insights, summarise key points, identify timestamps, transcribe speech, or answer questions about what happens in specific scenes. This opens a new class of business applications: automated quality review of sales call recordings, instant summarisation of long training sessions, content moderation of user-generated video, and intelligent search across video libraries.

Videos are uploaded using the Gemini Files API, which handles files up to 2GB. After upload, the file is processed and becomes available for multimodal queries. You reference the uploaded file by its URI in the contents of a generate_content call. Gemini supports MP4, MOV, AVI, and other common formats. For long videos, Gemini samples frames at regular intervals to understand the content; you can ask about scenes at specific timestamps and it will reference the correct part of the video.
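The upload-then-poll flow can be sketched as follows, assuming the google-generativeai SDK; the file name, API key placeholder, and timeout values are illustrative only:

```python
import time


def wait_until_active(file, get_file, poll_seconds=5, timeout_seconds=600):
    """Poll the Files API until an uploaded video leaves the PROCESSING state.

    `get_file` is the lookup callable (e.g. genai.get_file), injected here so
    the loop itself can be exercised without the SDK.
    """
    waited = 0
    while file.state.name == "PROCESSING":
        if waited >= timeout_seconds:
            raise TimeoutError(f"{file.name} still processing after {timeout_seconds}s")
        time.sleep(poll_seconds)
        waited += poll_seconds
        file = get_file(file.name)
    if file.state.name != "ACTIVE":
        raise RuntimeError(f"upload failed: state={file.state.name}")
    return file


if __name__ == "__main__":
    import google.generativeai as genai  # pip install google-generativeai

    genai.configure(api_key="YOUR_API_KEY")        # placeholder key
    video = genai.upload_file("product_demo.mp4")  # hypothetical file
    video = wait_until_active(video, genai.get_file)
    print("Ready for queries at:", video.uri)
```

Once the file reports ACTIVE, its URI can be passed alongside a text prompt in any generate_content call.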

The example below shows a retail company analysing recorded product demo videos to extract the key features mentioned, the customer questions asked, and a structured summary, automating what previously required a human reviewer watching every recording.


Uploading and analysing a product demo recording,
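A minimal sketch of such a script, assuming the google-generativeai SDK; the model name (gemini-1.5-flash), the API key placeholder, and the prompt wording are illustrative assumptions rather than the exact original code:

```python
import time
from pathlib import Path

# Prompt asking for the structured sections shown in the output below.
ANALYSIS_PROMPT = (
    "Analyse this product demo video and return:\n"
    "SUMMARY: a short overview of the demo\n"
    "FEATURES: each feature mentioned, with its timestamp\n"
    "QUESTIONS: customer questions asked, with timestamps\n"
    "TIMESTAMPS: a chapter outline of the video\n"
    "SENTIMENT: overall tone of the presenter and audience"
)


def report_header(video_path: str) -> str:
    """Build the '=== VIDEO ANALYSIS: ... ===' banner from the file name."""
    return f"=== VIDEO ANALYSIS: {Path(video_path).stem} ==="


if __name__ == "__main__":
    import google.generativeai as genai  # pip install google-generativeai

    genai.configure(api_key="YOUR_API_KEY")  # placeholder key
    path = "shopmax_tv_demo_april.mp4"

    print(f"Uploading video: {path}")
    video = genai.upload_file(path)
    print("Processing video...")
    while video.state.name == "PROCESSING":
        time.sleep(5)
        video = genai.get_file(video.name)
    print("Video ready:", video.uri)

    model = genai.GenerativeModel("gemini-1.5-flash")
    print(report_header(path))
    print(model.generate_content([video, ANALYSIS_PROMPT]).text)

    # Follow-up question about a specific scene in the same video
    followup = model.generate_content(
        [video, "When does the remote control appear, and what does "
                "the presenter highlight about it?"]
    )
    print("Remote control scene:", followup.text)

    # Clean up storage once analysis is done
    genai.delete_file(video.name)
    print("Video file deleted from Gemini Files API")
```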


It gives the following output with structured video analysis,

Uploading video: shopmax_tv_demo_april.mp4
Processing video...
Video ready: https://generativelanguage.googleapis.com/v1beta/files/abc123

=== VIDEO ANALYSIS: shopmax_tv_demo_april ===
SUMMARY: This 8-minute demo showcases the Samsung 65-inch QLED TV with
focus on picture quality, smart features, and gaming capabilities. The
presenter walks through setup, content streaming, and the gaming mode.

FEATURES:
- QLED panel with Quantum HDR 32X (shown at 01:20)
- 4K upscaling for HD content (demonstrated at 02:45)
- Samsung SmartHub with OTT apps (03:10)
- Auto Low Latency Mode for gaming (05:30)
- Multi-View split screen feature (06:15)

QUESTIONS:
- "Does it support Dolby Atmos?" asked at 04:22
- "What is the input lag for gaming?" asked at 05:45

TIMESTAMPS:
00:00 - Introduction and unboxing
01:00 - Picture quality demonstration
03:00 - Smart TV features walkthrough
05:30 - Gaming mode setup
07:00 - Final comparison and pricing

SENTIMENT: Positive - presenter is enthusiastic and the audience's questions show engagement

Remote control scene: The remote appears at 06:45. The presenter highlights
the voice control button, direct OTT app shortcuts, and the solar charging
panel on the back of the remote.

Video file deleted from Gemini Files API

Video analysis production patterns: keep uploaded videos short (under 10 minutes) for fastest analysis; Gemini works best with focused clips rather than multi-hour recordings. For long recordings, split into chapters and analyse each chapter separately. Use the Files API delete endpoint immediately after analysis to avoid storage accumulation. For high-volume use cases like moderating thousands of user-submitted videos daily, combine the Files API upload with the Gemini Batch API (Tutorial 300) to process many videos concurrently at 50 percent reduced cost. Store the analysis results in BigQuery for searchable video metadata across your entire content library.
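The chaptering pattern above can be sketched with a small helper that computes clip boundaries and hands the actual cutting to ffmpeg; the file names, the 10-minute chapter length, and the ffmpeg dependency are assumptions for illustration:

```python
import subprocess


def chapter_ranges(duration_s: int, chapter_s: int = 600):
    """(start, end) boundaries in seconds for <=10-minute analysis chunks."""
    return [(start, min(start + chapter_s, duration_s))
            for start in range(0, duration_s, chapter_s)]


def split_chapter(src: str, start: int, end: int, dst: str):
    """Cut one chapter out of a long recording with ffmpeg (assumed installed).

    -c copy avoids re-encoding, so the cut lands on the nearest keyframe.
    """
    subprocess.run(
        ["ffmpeg", "-ss", str(start), "-i", src,
         "-t", str(end - start), "-c", "copy", dst],
        check=True,
    )


if __name__ == "__main__":
    # Hypothetical 90-minute meeting recording split into 9 chapters
    for i, (start, end) in enumerate(chapter_ranges(90 * 60)):
        split_chapter("all_hands_recording.mp4", start, end, f"chapter_{i:02d}.mp4")
        # ...then upload each chapter, analyse it, and delete it as above
```

Each chapter then goes through the same upload, analyse, delete cycle, keeping every individual request well under the size that slows analysis down.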
