RAGEval

This project supports independent evaluation and end-to-end evaluation for RAG and multimodal RAG:

  • Independent Evaluation: Evaluating the retrieval module on its own. Common retrieval metrics include Hit Rate, Mean Reciprocal Rank (MRR), Normalized Discounted Cumulative Gain (NDCG), Precision, etc. These metrics measure how effectively the system ranks relevant items for a given query or task (see the first sketch after this list).

  • End-to-End Evaluation: Evaluating the final response the RAG model generates for a given input, including the relevance and alignment of the generated answer with the input query. Depending on whether a reference answer is used, the evaluation can be divided into no-reference and reference-based evaluation: no-reference metrics include Context Relevance, Faithfulness, etc.; reference-based metrics include Accuracy, BLEU, ROUGE, etc. (see the second sketch after this list).
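
As a concrete illustration of the retrieval metrics above, the following is a minimal, self-contained sketch with hypothetical data and helper names; it is not part of this framework and assumes binary relevance judgments.

```python
import math

def hit_rate(ranked_ids, relevant_ids, k=10):
    """1 if any relevant document appears in the top-k results, else 0."""
    return int(any(doc_id in relevant_ids for doc_id in ranked_ids[:k]))

def mrr(ranked_ids, relevant_ids, k=10):
    """Reciprocal rank of the first relevant document in the top-k, else 0."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

def ndcg(ranked_ids, relevant_ids, k=10):
    """Binary-relevance NDCG@k: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc_id in enumerate(ranked_ids[:k], start=1)
              if doc_id in relevant_ids)
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0

# Toy example: the ranked retrieval result for one query and its relevant docs.
ranked = ["d3", "d7", "d1", "d9", "d4"]
relevant = {"d1", "d2"}
print(hit_rate(ranked, relevant, k=5))  # 1
print(mrr(ranked, relevant, k=5))       # 0.333...
print(ndcg(ranked, relevant, k=5))      # ~0.31
```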
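
The reference-based generation metrics can be illustrated in the same spirit. The sketch below computes exact-match accuracy directly and ROUGE with the rouge-score package; this is just one common implementation of these metrics, not necessarily the one used by the tools described later, and the example texts are made up.

```python
from rouge_score import rouge_scorer  # pip install rouge-score

prediction = "RAG combines a retriever with a generator."
reference = "RAG couples a retriever module with a text generator."

# Exact-match accuracy over a (trivial) single-example "dataset".
accuracy = float(prediction.strip().lower() == reference.strip().lower())

# ROUGE-1 / ROUGE-L between the reference answer and the generated answer.
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
scores = scorer.score(reference, prediction)  # note: (target, prediction) order

print(f"accuracy = {accuracy}")
print(f"ROUGE-1 F1 = {scores['rouge1'].fmeasure:.3f}")
print(f"ROUGE-L F1 = {scores['rougeL'].fmeasure:.3f}")
```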

See also

Related research on RAG evaluation can be found here.

This framework supports the following:

  • Independent evaluation of the text retrieval module using MTEB/CMTEB.

  • Independent evaluation of the multimodal image-text retrieval module using CLIP Benchmark.

  • End-to-end generation evaluation of RAG and multimodal RAG using RAGAS.

MTEB/CMTEB

For independent evaluation of the retrieval module, supporting embedding models and reranker models.

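As a quick illustration, here is a minimal sketch of running an MTEB retrieval task with the mteb Python package and a SentenceTransformer-style embedding model. The model name, task name, and output folder are placeholders, and the exact API can differ between mteb versions.

```python
import mteb
from sentence_transformers import SentenceTransformer

# Placeholder embedding model and retrieval task; substitute the model and
# the MTEB/CMTEB tasks you actually want to evaluate.
model = SentenceTransformer("BAAI/bge-small-en-v1.5")
tasks = mteb.get_tasks(tasks=["NFCorpus"])

evaluation = mteb.MTEB(tasks=tasks)
evaluation.run(model, output_folder="results/bge-small-en-v1.5")
```
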
CLIP Benchmark

For independent evaluation of the multimodal image-text retrieval module, supporting CLIP models.

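For reference, a hedged sketch of invoking the clip_benchmark command-line tool for zero-shot image-text retrieval on MS-COCO captions is shown below. The dataset name, model/pretrained pair, and flag names follow the CLIP Benchmark CLI and should be treated as assumptions, since they may differ across versions and some datasets require additional download or path options.

```bash
clip_benchmark eval --dataset=mscoco_captions --task=zeroshot_retrieval \
    --model=ViT-B-32 --pretrained=openai \
    --output=result.json --batch_size=64
```
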
RAGAS

For end-to-end generation evaluation of RAG and multimodal RAG, also supporting automatic generation of evaluation sets.

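For illustration, a minimal sketch of scoring a single RAG interaction with the ragas Python package follows. The sample data, metric selection, and column names are assumptions; the expected dataset schema varies across ragas versions, and the LLM-judged metrics require a configured LLM backend (by default an OpenAI API key).

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, context_precision, faithfulness

# One toy RAG interaction; a real evaluation set would contain many rows.
# Column names follow the ragas v0.1-style schema and may differ by version.
data = {
    "question": ["What does RAG stand for?"],
    "contexts": [["RAG stands for Retrieval-Augmented Generation."]],
    "answer": ["RAG stands for Retrieval-Augmented Generation."],
    "ground_truth": ["Retrieval-Augmented Generation."],
}
dataset = Dataset.from_dict(data)

# LLM-judged metrics need a configured LLM backend (e.g. OPENAI_API_KEY).
result = evaluate(dataset, metrics=[faithfulness, answer_relevancy, context_precision])
print(result)
```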