RAG Evaluation with RAGAS
Author: Venkata Sudhakar
Building a RAG pipeline is only half the work; measuring whether it actually works correctly is equally important. RAGAS (Retrieval Augmented Generation Assessment) is an open-source framework that evaluates RAG pipelines across four key metrics: faithfulness, answer relevancy, context precision, and context recall. Without evaluation, you cannot confidently improve or deploy a RAG system.

Faithfulness checks whether the generated answer is factually consistent with the retrieved context. Answer relevancy measures how well the answer addresses the question. Context precision checks whether the retrieved chunks are actually useful for answering it. Context recall measures whether all the relevant information was retrieved. Together, these four metrics give a complete picture of RAG quality.

The example below evaluates a ShopMax India product FAQ RAG pipeline with RAGAS, measuring all four metrics on a sample question-answer dataset.
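A minimal sketch of such an evaluation is shown here, assuming the ragas 0.1.x API (`evaluate()` plus importable metric objects) together with the Hugging Face `datasets` library. The ShopMax questions, answers, and contexts are illustrative placeholders, not real data.

```python
# Sketch of a RAGAS evaluation over a small ShopMax FAQ sample.
# The sample rows below are illustrative only; a real test set
# should contain 50-100 labelled questions.
eval_samples = {
    "question": [
        "What is the return window for electronics on ShopMax?",
        "Does ShopMax offer cash on delivery?",
    ],
    "answer": [
        "Electronics can be returned within 10 days of delivery.",
        "Yes, cash on delivery is available for orders under Rs. 50,000.",
    ],
    "contexts": [
        ["ShopMax allows returns of electronics within 10 days of delivery."],
        ["Cash on delivery is supported for orders below Rs. 50,000."],
    ],
    "ground_truth": [
        "Electronics have a 10-day return window.",
        "Cash on delivery is available for orders under Rs. 50,000.",
    ],
}


def run_ragas_eval(samples: dict) -> dict:
    """Score the samples on all four RAGAS metrics.

    Requires the ragas and datasets packages plus an LLM API key
    (RAGAS uses an LLM as judge), so imports are kept local here.
    """
    from datasets import Dataset
    from ragas import evaluate
    from ragas.metrics import (
        faithfulness,
        answer_relevancy,
        context_precision,
        context_recall,
    )

    dataset = Dataset.from_dict(samples)
    result = evaluate(
        dataset,
        metrics=[faithfulness, answer_relevancy,
                 context_precision, context_recall],
    )
    return result  # dict-like mapping of metric name -> score


# To run the evaluation (needs network access and an API key):
#   scores = run_ragas_eval(eval_samples)
```

Each row must supply the question, the pipeline's generated answer, the retrieved context chunks, and a human-written ground-truth answer; RAGAS derives all four metrics from these fields.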
Running the evaluation produces output similar to the following:
RAGAS Evaluation Results:
faithfulness: 0.917
answer_relevancy: 0.894
context_precision: 0.875
context_recall: 0.833
Scores above 0.8 across all metrics indicate a well-functioning RAG pipeline. A low faithfulness score (below 0.7) indicates hallucination - the model is adding facts not present in the context. A low context recall score indicates chunking or retrieval problems. Run RAGAS evaluations on a labelled test set of 50 to 100 questions before deploying any ShopMax RAG system to production, and re-evaluate after every significant change to your chunking strategy, embedding model, or prompt.
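The interpretation rules above can be turned into a simple pre-deployment gate. The sketch below encodes the thresholds described in this section (0.8 for most metrics, 0.7 as the hallucination floor for faithfulness); the threshold values and the example scores are illustrative, not part of the RAGAS library.

```python
# Pre-deployment quality gate on RAGAS scores, using the thresholds
# discussed above. Thresholds are illustrative defaults, not RAGAS built-ins.
THRESHOLDS = {
    "faithfulness": 0.7,       # below this: the model is hallucinating
    "answer_relevancy": 0.8,
    "context_precision": 0.8,
    "context_recall": 0.8,     # below this: chunking/retrieval problems
}


def passes_gate(scores: dict) -> tuple[bool, list]:
    """Return (ok, failures), where failures lists metrics under threshold."""
    failures = [m for m, t in THRESHOLDS.items() if scores.get(m, 0.0) < t]
    return not failures, failures


# Example using the scores from the evaluation output above.
scores = {
    "faithfulness": 0.917,
    "answer_relevancy": 0.894,
    "context_precision": 0.875,
    "context_recall": 0.833,
}
ok, failures = passes_gate(scores)
print("deploy" if ok else f"blocked by: {failures}")  # -> deploy
```

Re-running this gate after every chunking, embedding, or prompt change makes regressions visible before they reach production.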