
RAG Evaluation with RAGAS

Author: Venkata Sudhakar

Building a RAG pipeline is only half the work - measuring whether it actually works correctly is equally important. RAGAS (Retrieval Augmented Generation Assessment) is an open-source framework that evaluates RAG pipelines across four key metrics: faithfulness, answer relevancy, context precision, and context recall. Without such evaluation, you cannot confidently improve or deploy a RAG system.

Faithfulness checks whether the generated answer is factually consistent with the retrieved context. Answer relevancy measures how well the answer addresses the question. Context precision checks whether the retrieved chunks are actually useful. Context recall measures whether all relevant information was retrieved. Together they give a complete picture of RAG quality.
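As a concrete illustration of the faithfulness score: RAGAS decomposes the generated answer into individual claims (using an LLM), verifies each claim against the retrieved context, and scores the answer as the fraction of supported claims. A minimal sketch of that arithmetic, with hypothetical claims and verdicts:

```python
# Hypothetical claim verdicts for one generated answer. In RAGAS the
# claim extraction and verification steps are done by an LLM judge;
# here they are hard-coded to show the scoring arithmetic only.
claim_verdicts = {
    "ShopMax offers 7-day returns": True,      # supported by context
    "Refunds take 3-5 business days": True,    # supported by context
    "Returns are free for all items": False,   # not in context (hallucinated)
}

# Faithfulness = supported claims / total claims.
faithfulness = sum(claim_verdicts.values()) / len(claim_verdicts)
print(f"faithfulness: {faithfulness:.3f}")  # prints "faithfulness: 0.667"
```

One hallucinated claim out of three drags the score to 0.667, which is why faithfulness is the most direct hallucination signal of the four metrics.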

The example below evaluates a ShopMax India product FAQ RAG pipeline using RAGAS, measuring all four metrics on a sample question-answer dataset.
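A minimal sketch of such an evaluation, assuming `ragas` and `datasets` are installed (`pip install ragas datasets`) and an OpenAI API key is available in the environment, since RAGAS uses an LLM as judge. The sample rows are illustrative stand-ins, not real ShopMax data:

```python
# Evaluation dataset in the column layout RAGAS expects:
# question, answer, contexts (list of retrieved chunks per question),
# and ground_truth (reference answer, needed for context recall).
eval_data = {
    "question": [
        "What is ShopMax's return window?",
        "How long do refunds take?",
    ],
    "answer": [
        "ShopMax accepts returns within 7 days of delivery.",
        "Refunds are processed within 3-5 business days.",
    ],
    "contexts": [
        ["Returns are accepted within 7 days of delivery for most items."],
        ["Approved refunds are credited within 3-5 business days."],
    ],
    "ground_truth": [
        "Returns are accepted within 7 days of delivery.",
        "Refunds take 3-5 business days.",
    ],
}

def run_ragas(data):
    # Imports are kept inside the function so the dataset layout above
    # can be read and checked without ragas installed.
    from datasets import Dataset
    from ragas import evaluate
    from ragas.metrics import (
        answer_relevancy,
        context_precision,
        context_recall,
        faithfulness,
    )

    dataset = Dataset.from_dict(data)
    result = evaluate(
        dataset,
        metrics=[faithfulness, answer_relevancy,
                 context_precision, context_recall],
    )
    print("RAGAS Evaluation Results:")
    print(result)  # dict-like mapping of metric name to score
    return result
```

Calling `run_ragas(eval_data)` kicks off LLM-judged scoring, so each metric costs several model calls per row; budget accordingly on larger test sets.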


It gives the following output (exact scores vary with the judge model):

RAGAS Evaluation Results:
  faithfulness: 0.917
  answer_relevancy: 0.894
  context_precision: 0.875
  context_recall: 0.833

Scores above 0.8 across all metrics indicate a well-functioning RAG pipeline. A low faithfulness score (below 0.7) indicates hallucination - the model is adding facts not present in the context. A low context recall score indicates chunking or retrieval problems. Run RAGAS evaluations on a labelled test set of 50 to 100 questions before deploying any ShopMax RAG system to production, and re-evaluate after every significant change to your chunking strategy, embedding model, or prompt.
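The deployment guidance above can be automated as a simple gate in a CI pipeline. A sketch under the article's thresholds; the function name and structure are our own:

```python
# Gate a RAG deployment on RAGAS scores. The 0.8 threshold follows
# the article's guidance for a well-functioning pipeline.
THRESHOLDS = {
    "faithfulness": 0.8,
    "answer_relevancy": 0.8,
    "context_precision": 0.8,
    "context_recall": 0.8,
}

def failing_metrics(scores: dict) -> list[str]:
    """Return the metrics that fall below their threshold."""
    return [m for m, t in THRESHOLDS.items() if scores.get(m, 0.0) < t]

# Scores from the evaluation run shown above.
scores = {
    "faithfulness": 0.917,
    "answer_relevancy": 0.894,
    "context_precision": 0.875,
    "context_recall": 0.833,
}
print(failing_metrics(scores))  # prints "[]" -> safe to deploy
```

A faithfulness result that trips this gate (e.g. 0.65) is the hallucination red flag described above and should block the release until the prompt or retrieval step is fixed.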


 
  


  