
Ragas Eval Service (Python)

This document describes the Flask microservice that evaluates the Retrieval-Augmented Generation (RAG) pipeline using the Ragas library.

Overview

The service exposes an HTTP API that triggers an evaluation run of the RAG pipeline. Each run scores a predefined dataset with the Ragas library and reports metrics such as faithfulness, answer relevancy, context recall, and context precision.

Key Functionality:

  • Receiving a request to evaluate the RAG pipeline.
  • Loading an evaluation dataset.
  • Preparing the dataset for Ragas.
  • Running the Ragas evaluation.
  • Returning the evaluation results.
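
The dataset preparation step above can be sketched as follows. This is a minimal illustration, not the service's actual code: the helper name `to_ragas_columns` and the sample record are hypothetical, and the column names (`question`, `answer`, `contexts`, `ground_truth`) are an assumption based on older Ragas releases; newer releases rename them, so check the version in use.

```python
from typing import Any, Dict, List

# Assumed column names: older Ragas releases read these four fields.
# Newer releases use different names, so adjust to the installed version.
RAGAS_COLUMNS = ("question", "answer", "contexts", "ground_truth")


def to_ragas_columns(records: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Turn per-sample records into a column-oriented dict (hypothetical helper)."""
    columns: Dict[str, List[Any]] = {name: [] for name in RAGAS_COLUMNS}
    for index, record in enumerate(records):
        # Fail early with a clear message instead of deep inside the evaluation.
        missing = [name for name in RAGAS_COLUMNS if name not in record]
        if missing:
            raise ValueError(f"record {index} is missing fields: {missing}")
        for name in RAGAS_COLUMNS:
            value = record[name]
            # "contexts" holds one string per retrieved passage, so a bare
            # string is wrapped into a single-element list.
            if name == "contexts" and isinstance(value, str):
                value = [value]
            columns[name].append(value)
    return columns


# Hypothetical evaluation sample, for illustration only.
sample_records = [
    {
        "question": "What does the service evaluate?",
        "answer": "It evaluates the RAG pipeline.",
        "contexts": "The service evaluates the RAG pipeline with Ragas.",
        "ground_truth": "The RAG pipeline.",
    }
]
prepared = to_ragas_columns(sample_records)
```

The resulting dict could then be wrapped with `datasets.Dataset.from_dict(prepared)` and passed to `ragas.evaluate` together with the chosen metrics; the exact call depends on the installed Ragas version.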

Technology Stack:

  • Flask: For creating the web server and API endpoints.
  • Ragas: The core library for RAG evaluation.
  • google-generativeai: For accessing Gemini, the LLM that Ragas uses to compute its metrics.

API Endpoints

POST /evaluate-rag

Triggers the evaluation of the RAG pipeline.

Responses:

  • 200 OK: A JSON object with the evaluation results.
  • 500 Internal Server Error: If the evaluation fails.
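
The two responses above could be produced by a small helper like the one below. This is a sketch under stated assumptions, not the service's implementation: `evaluate_rag_response`, the injected `run_evaluation` callable, and the `{"error": ...}` body are hypothetical, since the document only specifies the status codes and that a successful response carries the evaluation results. The scores in the stand-in functions are placeholders, not real measurements.

```python
from typing import Any, Callable, Dict, Tuple


def evaluate_rag_response(
    run_evaluation: Callable[[], Dict[str, float]],
) -> Tuple[Dict[str, Any], int]:
    """Map an evaluation run onto a (JSON body, HTTP status) pair."""
    try:
        scores = run_evaluation()
    except Exception as exc:
        # Any failure during evaluation becomes a 500 response.
        # The error body shape is an assumption, not documented behavior.
        return {"error": str(exc)}, 500
    # On success, the metric scores themselves form the JSON body.
    return dict(scores), 200


def fake_evaluation() -> Dict[str, float]:
    """Stand-in for the real Ragas run; the values are placeholders."""
    return {"faithfulness": 0.9, "answer_relevancy": 0.8}


def failing_evaluation() -> Dict[str, float]:
    """Stand-in that simulates a failed run."""
    raise RuntimeError("dataset could not be loaded")


ok_body, ok_status = evaluate_rag_response(fake_evaluation)
err_body, err_status = evaluate_rag_response(failing_evaluation)
```

In a Flask view registered for `POST /evaluate-rag`, the pair can be returned directly: Flask (1.1 and later) serializes a dict return value to JSON and accepts a `(body, status)` tuple. Keeping the evaluation behind an injected callable also lets the response contract be tested without running Ragas or calling Gemini.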