
Building a Local RAG System with Ollama in Python

Author: Venkata Sudhakar

Retrieval Augmented Generation (RAG) combines a retrieval step with LLM generation to answer questions grounded in a specific document set. Instead of relying on the model's knowledge alone, RAG first retrieves the most relevant passages from a document store and then passes them as context to the LLM. With Ollama, the entire pipeline can run locally: no cloud API, no data leaving the machine. At ShopMax India, a local RAG system lets staff query internal policy documents, product manuals, and pricing sheets in natural language.

A minimal RAG pipeline needs three components: a document store (a list of text chunks), an embedding model to vectorize documents and queries, and a generative model to produce the final answer. Ollama provides both the embedding model (nomic-embed-text) and the generative model (llama3.2) locally, making it a self-contained solution.

The example below shows how to build a simple local RAG system using Ollama for ShopMax India's internal policy queries.
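A minimal sketch of such a pipeline, assuming the `ollama` Python client is installed (`pip install ollama`) and a local Ollama server has pulled nomic-embed-text and llama3.2. The policy snippets, prompts, and function names here are illustrative, not part of any library:

```python
import math

# The `ollama` client talks to a locally running Ollama server.
# Import is guarded so the pure helpers below work without the package.
try:
    import ollama
except ImportError:
    ollama = None

# In-memory document store: ShopMax India internal policy snippets.
DOCUMENTS = [
    "ShopMax India offers a 7-day return policy for all electronics "
    "sold in stores.",
    "Employees in Mumbai and Bangalore are eligible for a Rs 5,000 "
    "monthly travel allowance.",
    "Annual performance reviews are conducted every March for all "
    "full-time employees.",
]


def embed(text):
    """Vectorize text with the local nomic-embed-text model."""
    return ollama.embeddings(model="nomic-embed-text", prompt=text)["embedding"]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, index, top_k=1):
    """Return the top_k document chunks most similar to the query vector."""
    ranked = sorted(index, key=lambda d: cosine_similarity(query_vec, d["vec"]),
                    reverse=True)
    return [d["text"] for d in ranked[:top_k]]


def answer(question, index):
    """Retrieve context for the question, then generate a grounded answer."""
    context = "\n".join(retrieve(embed(question), index))
    response = ollama.chat(
        model="llama3.2",
        messages=[
            {"role": "system",
             "content": "Answer using only the provided context."},
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response["message"]["content"]


if __name__ == "__main__":
    try:
        print("Indexing documents...")
        index = [{"text": t, "vec": embed(t)} for t in DOCUMENTS]
        for q in [
            "What is the return policy for electronics?",
            "How much travel allowance do Mumbai employees get?",
            "When are performance reviews held?",
        ]:
            print(f"Q: {q}")
            print(f"A: {answer(q, index)}\n")
    except Exception as exc:  # Ollama package or server unavailable
        print(f"Ollama not reachable: {exc}")
```

The index is built once up front (one embedding call per document); each query then costs one embedding call plus one chat call, keeping everything on the local machine.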


Running the script produces the following output:

Indexing documents...
Q: What is the return policy for electronics?
A: ShopMax India offers a 7-day return policy for all electronics
sold in stores.

Q: How much travel allowance do Mumbai employees get?
A: Employees in Mumbai and Bangalore are eligible for a Rs 5,000
monthly travel allowance.

Q: When are performance reviews held?
A: Annual performance reviews are conducted every March for all
full-time employees.

This minimal RAG pattern is effective for small document sets. For larger document collections at ShopMax India, replace the in-memory cosine search with a vector database like ChromaDB or Qdrant running locally. Chunk long documents into 200 to 400 word segments before embedding to ensure each chunk covers a single focused topic, which improves retrieval precision. The combination of nomic-embed-text and llama3.2 via Ollama provides a completely offline, cost-free RAG solution suitable for internal business tools.
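The chunking step described above can be sketched as a simple word-based splitter with a small overlap between neighbouring chunks so sentences that straddle a boundary appear in both. The function name and parameter values are illustrative:

```python
def chunk_words(text, max_words=300, overlap=50):
    """Split text into chunks of at most max_words words.

    Consecutive chunks share `overlap` words so that content near a
    boundary is retrievable from either side.
    """
    words = text.split()
    step = max_words - overlap  # how far the window advances each chunk
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # final chunk reached the end of the document
    return chunks
```

Each resulting chunk would then be embedded and indexed individually, in place of the whole-document entries used in the minimal example.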
