LangChain Retrieval QA Chain
Author: Venkata Sudhakar
A Retrieval QA chain connects a vector store retriever to an LLM so the model can answer questions grounded in your own documents. The user asks a question, the retriever fetches the most relevant document chunks from the vector store, and the LLM generates an answer using those chunks as context. This is the core pattern behind knowledge base chatbots, documentation assistants, and internal search tools. In modern LangChain the chain is built entirely with LCEL using the pipe operator, rather than with the legacy RetrievalQA class.

The key components are:
- a document loader, which loads your source;
- a text splitter, which chunks documents into pieces that fit the context window;
- an embedding model, which converts chunks to vectors;
- a vector store, which stores and retrieves vectors;
- a retriever, which wraps the vector store with a search interface;
- a prompt template, which injects retrieved context before the question.

Connecting them with LCEL gives you a chain that can be invoked, streamed, or batched with a single call. The example below builds a complete RAG pipeline that loads documents, indexes them into ChromaDB, and answers questions using LCEL, with source attribution.
The indexing step gives the following output:
Indexed 4 chunks into vector store
Querying the chain then gives the following output:
Q: What MySQL privileges does Debezium need?
A: Debezium requires a dedicated MySQL user with REPLICATION SLAVE
and REPLICATION CLIENT privileges.
Q: How is CDC lag measured?
A: CDC lag is measured in seconds and in pending message count.
Q: What is the capital of France?
A: I do not have that information in the provided context.
The last answer correctly refuses to hallucinate: the chain answers only from the retrieved context.
For production RAG pipelines:
- Use RecursiveCharacterTextSplitter with a chunk_size around 500-1000 tokens and a chunk_overlap of 10-15% to preserve context across chunk boundaries.
- Set the retriever k to 3-5 to balance context richness with token cost.
- Add source metadata to every document so you can show users where each answer came from.
- Use a temperature of 0 for the LLM in RAG applications: you want consistent, factual extraction from context, not creative generation.