Ragas Eval Service (Python)
This document describes the Flask application that serves as a microservice for evaluating the Retrieval-Augmented Generation (RAG) pipeline with the Ragas library.
Overview
This Flask application provides an API to trigger the evaluation of the RAG pipeline. It uses a predefined dataset and the Ragas library to calculate metrics such as faithfulness, answer relevancy, context recall, and context precision.
Key Functionalities Exposed:
- Receiving a request to evaluate the RAG pipeline.
- Loading an evaluation dataset.
- Preparing the dataset for Ragas.
- Running the Ragas evaluation.
- Returning the evaluation results.
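The loading and preparation steps above can be sketched in plain Python. This is a minimal illustration, not the service's actual code: the sample record and the helper names `load_dataset` and `prepare_for_ragas` are hypothetical, and the column names (`question`, `answer`, `contexts`, `ground_truth`) follow the convention of earlier Ragas releases, so they should be checked against the installed version.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for the predefined evaluation dataset.
SAMPLE_RECORDS = [
    {
        "question": "What does the service evaluate?",
        "answer": "It evaluates the RAG pipeline.",
        "contexts": ["The service evaluates the RAG pipeline using Ragas."],
        "ground_truth": "The RAG pipeline.",
    },
]

# Assumed column names; newer Ragas versions may expect different ones.
REQUIRED_FIELDS = ("question", "answer", "contexts", "ground_truth")


def load_dataset() -> List[dict]:
    """Return the evaluation records (here: a hard-coded sample)."""
    return list(SAMPLE_RECORDS)


def prepare_for_ragas(records: List[dict]) -> Dict[str, list]:
    """Turn row-oriented records into column-oriented lists."""
    columns: Dict[str, list] = {field: [] for field in REQUIRED_FIELDS}
    for index, record in enumerate(records):
        missing = [f for f in REQUIRED_FIELDS if f not in record]
        if missing:
            raise ValueError(f"record {index} is missing fields: {missing}")
        for field in REQUIRED_FIELDS:
            columns[field].append(record[field])
    return columns


columns = prepare_for_ragas(load_dataset())
print(sorted(columns))
```

Validating the fields up front means a malformed record fails with a clear message before any LLM-backed metric is invoked.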
Technology Stack:
- Flask: For creating the web server and API endpoints.
- Ragas: The core library for RAG evaluation.
- google-generativeai: For using Gemini as the LLM for Ragas metrics.
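A rough sketch of how the Ragas call might look, assuming a Ragas release that exposes `evaluate` and metric objects in `ragas.metrics`, plus the Hugging Face `datasets` package (not listed above, so an assumption). The helper name `run_ragas` is hypothetical, and the Gemini wiring is left out because this document does not show how the LLM is configured.

```python
from typing import Dict

# The four metrics named in the overview.
METRIC_NAMES = (
    "faithfulness",
    "answer_relevancy",
    "context_recall",
    "context_precision",
)


def run_ragas(columns: Dict[str, list]):
    """Score column-oriented evaluation data with Ragas (hypothetical helper).

    Imports are local so this module can be imported without Ragas installed.
    """
    from datasets import Dataset
    from ragas import evaluate
    import ragas.metrics as ragas_metrics

    # Look up each metric object by name on the ragas.metrics module.
    metrics = [getattr(ragas_metrics, name) for name in METRIC_NAMES]
    dataset = Dataset.from_dict(columns)
    # No LLM is passed here, so Ragas falls back to its default; the service
    # would presumably supply its Gemini-backed model instead (assumption).
    return evaluate(dataset, metrics=metrics)
```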
API Endpoints
POST /evaluate-rag
Triggers the evaluation of the RAG pipeline.
Responses:
- 200 OK: A JSON object with the evaluation results.
- 500 Internal Server Error: If the evaluation fails.
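One way the endpoint and its two responses could be wired up is sketched below, under stated assumptions: the names `handle_evaluate`, `create_app`, `fake_success`, and `fake_failure` are hypothetical, the `{"error": ...}` body for the 500 case is an assumed shape (the document only says the request fails), and the score in the stand-in evaluator is made up. The status mapping lives in a plain function so it can be exercised without starting a server.

```python
from typing import Callable, Tuple


def handle_evaluate(run_evaluation: Callable[[], dict]) -> Tuple[dict, int]:
    """Run the evaluation and map the outcome to (JSON body, HTTP status)."""
    try:
        return run_evaluation(), 200
    except Exception as exc:  # any failure is reported as a 500
        return {"error": str(exc)}, 500  # assumed error shape


def create_app(run_evaluation: Callable[[], dict]):
    """Build the Flask app; Flask is imported only when the app is built."""
    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route("/evaluate-rag", methods=["POST"])
    def evaluate_rag():
        body, status = handle_evaluate(run_evaluation)
        return jsonify(body), status

    return app


# Stand-in evaluators with made-up values, used only to exercise the handler.
def fake_success() -> dict:
    return {"faithfulness": 0.9}


def fake_failure() -> dict:
    raise RuntimeError("evaluation failed")


ok_body, ok_status = handle_evaluate(fake_success)
err_body, err_status = handle_evaluate(fake_failure)
print(ok_status, err_status)
```

To serve it locally, something like `create_app(my_evaluation).run(port=5000)` would start Flask's development server; `my_evaluation` and the port are placeholders.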