### What problem does this PR solve?
Feature: This PR implements a comprehensive RAG evaluation framework to
address issue #11656.
**Problem**: Developers using RAGFlow lack systematic ways to measure
RAG accuracy and quality. They cannot objectively answer:
1. Are RAG results truly accurate?
2. How should configurations be adjusted to improve quality?
3. How can RAG performance be maintained and improved over time?
**Solution**: This PR adds a complete evaluation system with:
- **Dataset & test case management** - Create ground truth datasets with
questions and expected answers
- **Automated evaluation** - Run the RAG pipeline on test cases and compute
metrics
- **Comprehensive metrics** - Precision, recall, F1 score, MRR, and hit rate
for retrieval quality
- **Smart recommendations** - Analyze results and suggest specific
configuration improvements (e.g., "increase top_k", "enable reranking"); a
sketch of both the metrics and the recommendation rules follows this list
- **20+ REST API endpoints** - Full CRUD operations for datasets, test
cases, and evaluation runs (a hypothetical usage walkthrough appears below)
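
For concreteness, here is a minimal sketch of how the retrieval metrics and
recommendation rules above could be computed, assuming each test case pairs a
ranked list of retrieved chunk IDs with a ground-truth set of relevant chunk
IDs. The function names and thresholds are illustrative assumptions, not the
PR's actual implementation.

```python
from typing import Dict, List, Set

def retrieval_metrics(retrieved: List[str], relevant: Set[str]) -> Dict[str, float]:
    """Per-test-case retrieval metrics over a ranked list of chunk IDs."""
    hits = [cid for cid in retrieved if cid in relevant]
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Reciprocal rank of the first relevant chunk; averaging this value
    # over all test cases yields MRR.
    rr = next((1.0 / rank for rank, cid in enumerate(retrieved, 1)
               if cid in relevant), 0.0)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "reciprocal_rank": rr,
        "hit_rate": 1.0 if hits else 0.0,  # at least one relevant chunk retrieved
    }

def recommend(avg: Dict[str, float]) -> List[str]:
    """Map averaged metrics to configuration suggestions (assumed thresholds)."""
    tips = []
    if avg["recall"] < 0.6:
        tips.append("increase top_k to retrieve more candidate chunks")
    if avg["precision"] < 0.5:
        tips.append("enable reranking to push relevant chunks up the ranking")
    if avg["reciprocal_rank"] < 0.5:
        tips.append("tune chunk size or the embedding model to improve ordering")
    return tips
```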
**Impact**: Enables developers to objectively measure RAG quality,
identify issues, and systematically improve their RAG systems through
data-driven configuration tuning.
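
Below is a hypothetical end-to-end walkthrough of the new API using Python's
`requests`. Every endpoint path, payload field, and response key shown is an
assumption for illustration only; the concrete routes and schemas are defined
in the PR itself.

```python
# All paths, fields, and keys here are hypothetical placeholders; consult
# the PR for the actual routes and schemas.
import requests

BASE = "http://localhost:9380/api/v1/evaluation"  # assumed base URL
HEADERS = {"Authorization": "Bearer <API_KEY>"}

# 1. Create a ground-truth dataset.
dataset = requests.post(f"{BASE}/datasets", headers=HEADERS,
                        json={"name": "faq-regression"}).json()

# 2. Attach a test case: a question plus its expected answer.
requests.post(f"{BASE}/datasets/{dataset['id']}/cases", headers=HEADERS,
              json={"question": "What is RAGFlow?",
                    "expected_answer": "An open-source RAG engine."})

# 3. Run an evaluation against the dataset and read back the results.
run = requests.post(f"{BASE}/runs", headers=HEADERS,
                    json={"dataset_id": dataset["id"]}).json()
print(run["metrics"])           # precision / recall / F1 / MRR / hit rate
print(run["recommendations"])   # e.g. "increase top_k", "enable reranking"
```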
### Type of change
- [x] New Feature (non-breaking change which adds functionality)