
Building a Local RAG System with Ollama in Python

Author: Venkata Sudhakar

Retrieval Augmented Generation (RAG) combines a retrieval step with LLM generation to answer questions grounded in a specific document set. Instead of relying on the model's knowledge alone, RAG first retrieves the most relevant passages from a document store and then passes them as context to the LLM. With Ollama, the entire pipeline can run locally: no cloud API, no data leaving the machine. At ShopMax India, a local RAG system lets staff query internal policy documents, product manuals, and pricing sheets in natural language.

A minimal RAG pipeline needs three components: a document store (a list of text chunks), an embedding model to vectorize documents and queries, and a generative model to produce the final answer. Ollama provides both the embedding model (nomic-embed-text) and the generative model (llama3.2) locally, making it a self-contained solution.

The example below shows how to build a simple local RAG system using Ollama for ShopMax India's internal policy queries.
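A minimal sketch of such a pipeline, assuming the `ollama` Python client is installed (`pip install ollama`) and a local Ollama server has pulled nomic-embed-text and llama3.2. The policy snippets, prompts, and function names here are illustrative, not part of any library:

```python
import math

# The `ollama` client talks to a locally running Ollama server.
# Import is guarded so the pure helpers below work without the package.
try:
    import ollama
except ImportError:
    ollama = None

# In-memory document store: ShopMax India internal policy snippets.
DOCUMENTS = [
    "ShopMax India offers a 7-day return policy for all electronics "
    "sold in stores.",
    "Employees in Mumbai and Bangalore are eligible for a Rs 5,000 "
    "monthly travel allowance.",
    "Annual performance reviews are conducted every March for all "
    "full-time employees.",
]


def embed(text):
    """Vectorize text with the local nomic-embed-text model."""
    return ollama.embeddings(model="nomic-embed-text", prompt=text)["embedding"]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, index, top_k=1):
    """Return the top_k document chunks most similar to the query vector."""
    ranked = sorted(index, key=lambda d: cosine_similarity(query_vec, d["vec"]),
                    reverse=True)
    return [d["text"] for d in ranked[:top_k]]


def answer(question, index):
    """Retrieve context for the question, then generate a grounded answer."""
    context = "\n".join(retrieve(embed(question), index))
    response = ollama.chat(
        model="llama3.2",
        messages=[
            {"role": "system",
             "content": "Answer using only the provided context."},
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response["message"]["content"]


if __name__ == "__main__":
    try:
        print("Indexing documents...")
        index = [{"text": t, "vec": embed(t)} for t in DOCUMENTS]
        for q in [
            "What is the return policy for electronics?",
            "How much travel allowance do Mumbai employees get?",
            "When are performance reviews held?",
        ]:
            print(f"Q: {q}")
            print(f"A: {answer(q, index)}\n")
    except Exception as exc:  # Ollama package or server unavailable
        print(f"Ollama not reachable: {exc}")
```

The index is built once up front (one embedding call per document); each query then costs one embedding call plus one chat call, keeping everything on the local machine.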


Running the script produces the following output:

Indexing documents...
Q: What is the return policy for electronics?
A: ShopMax India offers a 7-day return policy for all electronics
sold in stores.

Q: How much travel allowance do Mumbai employees get?
A: Employees in Mumbai and Bangalore are eligible for a Rs 5,000
monthly travel allowance.

Q: When are performance reviews held?
A: Annual performance reviews are conducted every March for all
full-time employees.

This minimal RAG pattern is effective for small document sets. For larger document collections at ShopMax India, replace the in-memory cosine search with a vector database like ChromaDB or Qdrant running locally. Chunk long documents into 200 to 400 word segments before embedding to ensure each chunk covers a single focused topic, which improves retrieval precision. The combination of nomic-embed-text and llama3.2 via Ollama provides a completely offline, cost-free RAG solution suitable for internal business tools.
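The chunking step described above can be sketched as a simple word-based splitter with a small overlap between neighbouring chunks so sentences that straddle a boundary appear in both. The function name and parameter values are illustrative:

```python
def chunk_words(text, max_words=300, overlap=50):
    """Split text into chunks of at most max_words words.

    Consecutive chunks share `overlap` words so that content near a
    boundary is retrievable from either side.
    """
    words = text.split()
    step = max_words - overlap  # how far the window advances each chunk
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # final chunk reached the end of the document
    return chunks
```

Each resulting chunk would then be embedded and indexed individually, in place of the whole-document entries used in the minimal example.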
