mirror of
https://github.com/infiniflow/ragflow.git
synced 2026-01-23 11:36:38 +08:00
### What problem does this PR solve? This PR adds a dedicated HTTP benchmark CLI for RAGFlow chat and retrieval endpoints so we can measure latency/QPS. ### Type of change - [x] Documentation Update - [x] Other (please describe): Adds a CLI benchmarking tool for chat/retrieval latency/QPS --------- Co-authored-by: Liu An <asiro@qq.com>
286 lines
9.2 KiB
Markdown
286 lines
9.2 KiB
Markdown
# RAGFlow HTTP Benchmark CLI
|
|
|
|
Run (from repo root):
|
|
```
|
|
PYTHONPATH=./test uv run -m benchmark [global flags] <chat|retrieval> [command flags]
|
|
Global flags can be placed before or after the command.
|
|
```
|
|
|
|
If you run from another directory:
|
|
```
|
|
PYTHONPATH=/Directory_name/ragflow/test uv run -m benchmark [global flags] <chat|retrieval> [command flags]
|
|
```
|
|
|
|
JSON args:
|
|
For --dataset-payload, --chat-payload, --messages-json, --extra-body, --payload
|
|
- Pass inline JSON: '{"key": "value"}'
|
|
- Or use a file: '@/path/to/file.json'
|
|
|
|
Global flags
|
|
```
|
|
--base-url
|
|
Base server URL.
|
|
Env: RAGFLOW_BASE_URL or HOST_ADDRESS
|
|
--api-version
|
|
API version string (default: v1).
|
|
Env: RAGFLOW_API_VERSION
|
|
--api-key
|
|
API key for Authorization: Bearer <token>.
|
|
--connect-timeout
|
|
Connect timeout seconds (default: 5.0).
|
|
--read-timeout
|
|
Read timeout seconds (default: 60.0).
|
|
--no-verify-ssl
|
|
Disable SSL verification.
|
|
--iterations
|
|
Iterations per benchmark (default: 1).
|
|
--concurrency
|
|
Number of concurrent requests (default: 1). Uses multiprocessing.
|
|
--json
|
|
Output JSON report (plain stdout).
|
|
--print-response
|
|
Print response content per iteration (stdout). With --json, responses are included in the JSON output.
|
|
--response-max-chars
|
|
Truncate printed responses to N chars (0 = no limit).
|
|
```
|
|
|
|
Auth and bootstrap flags (used when --api-key is not provided)
|
|
```
|
|
--login-email
|
|
Login email.
|
|
Env: RAGFLOW_EMAIL
|
|
--login-nickname
|
|
Nickname for registration. If omitted, defaults to email prefix when registering.
|
|
Env: RAGFLOW_NICKNAME
|
|
--login-password
|
|
Login password (encrypted client-side). Requires pycryptodomex in the test group.
|
|
--allow-register
|
|
Attempt /user/register before login (best effort).
|
|
--token-name
|
|
Optional API token name for /system/new_token.
|
|
--bootstrap-llm
|
|
Ensure LLM factory API key is configured via /llm/set_api_key.
|
|
--llm-factory
|
|
LLM factory name for bootstrap.
|
|
Env: RAGFLOW_LLM_FACTORY
|
|
--llm-api-key
|
|
LLM API key for bootstrap.
|
|
Env: ZHIPU_AI_API_KEY
|
|
--llm-api-base
|
|
Optional LLM API base URL.
|
|
Env: RAGFLOW_LLM_API_BASE
|
|
--set-tenant-info
|
|
Set tenant defaults via /user/set_tenant_info.
|
|
--tenant-llm-id
|
|
Tenant chat model ID.
|
|
Env: RAGFLOW_TENANT_LLM_ID
|
|
--tenant-embd-id
|
|
Tenant embedding model ID.
|
|
Env: RAGFLOW_TENANT_EMBD_ID
|
|
--tenant-img2txt-id
|
|
Tenant image2text model ID.
|
|
Env: RAGFLOW_TENANT_IMG2TXT_ID
|
|
--tenant-asr-id
|
|
Tenant ASR model ID (default empty).
|
|
Env: RAGFLOW_TENANT_ASR_ID
|
|
--tenant-tts-id
|
|
Tenant TTS model ID.
|
|
Env: RAGFLOW_TENANT_TTS_ID
|
|
```
|
|
|
|
Dataset/document flags (shared by chat and retrieval)
|
|
```
|
|
--dataset-id
|
|
Existing dataset ID.
|
|
--dataset-ids
|
|
Comma-separated dataset IDs.
|
|
--dataset-name
|
|
Dataset name when creating a new dataset.
|
|
Env: RAGFLOW_DATASET_NAME
|
|
--dataset-payload
|
|
JSON body for dataset creation (see API docs).
|
|
--document-path
|
|
Document path to upload (repeatable).
|
|
--document-paths-file
|
|
File containing document paths, one per line.
|
|
--parse-timeout
|
|
Document parse timeout seconds (default: 120.0).
|
|
--parse-interval
|
|
Document parse poll interval seconds (default: 1.0).
|
|
--teardown
|
|
Delete created resources after run.
|
|
```
|
|
|
|
Chat command flags
|
|
```
|
|
--chat-id
|
|
Existing chat ID. If omitted, a chat is created.
|
|
--chat-name
|
|
Chat name when creating a new chat.
|
|
Env: RAGFLOW_CHAT_NAME
|
|
--chat-payload
|
|
JSON body for chat creation (see API docs).
|
|
--model
|
|
Model field for OpenAI-compatible completion request.
|
|
Env: RAGFLOW_CHAT_MODEL
|
|
--message
|
|
Single user message (required unless --messages-json is provided).
|
|
--messages-json
|
|
JSON list of OpenAI-format messages (required unless --message is provided).
|
|
--extra-body
|
|
JSON extra_body for OpenAI-compatible request.
|
|
```
|
|
|
|
Retrieval command flags
|
|
```
|
|
--question
|
|
Retrieval question (required unless provided in --payload).
|
|
--payload
|
|
JSON body for /api/v1/retrieval (see API docs).
|
|
--document-ids
|
|
Comma-separated document IDs for retrieval.
|
|
```
|
|
|
|
Model selection guidance
|
|
- Embedding model is tied to the dataset.
|
|
Set during dataset creation using --dataset-payload:
|
|
```
|
|
{"name": "...", "embedding_model": "<model_name>@<provider>"}
|
|
```
|
|
Or set tenant defaults via --set-tenant-info with --tenant-embd-id.
|
|
- Chat model is tied to the chat assistant.
|
|
Set during chat creation using --chat-payload:
|
|
```
|
|
{"name": "...", "llm": {"model_name": "<model_name>@<provider>"}}
|
|
```
|
|
Or set tenant defaults via --set-tenant-info with --tenant-llm-id.
|
|
- --model is required by the OpenAI-compatible endpoint but does not override
|
|
the chat assistant's configured model on the server.
|
|
|
|
What this CLI can do
|
|
- This is a benchmark CLI. It always runs either a chat or retrieval benchmark
|
|
and prints a report.
|
|
- It can create datasets, upload documents, trigger parsing, and create chats
|
|
as part of a benchmark run (setup for the benchmark).
|
|
- It is not a general admin CLI; there are no standalone "create-only" or
|
|
"manage" commands. Use the reports to capture created IDs for reuse.
|
|
|
|
Do I need the dataset ID?
|
|
- If the CLI creates a dataset, it uses the returned dataset ID internally.
|
|
You do not need to supply it for that same run.
|
|
- The report prints "Created Dataset ID" so you can reuse it later with
|
|
--dataset-id or --dataset-ids.
|
|
- Dataset name is only used at creation time. Selection is always by ID.
|
|
|
|
Examples
|
|
|
|
Example: chat benchmark creating dataset + upload + parse + chat (login + register)
|
|
```
|
|
PYTHONPATH=./test uv run -m benchmark chat \
|
|
--base-url http://127.0.0.1:9380 \
|
|
--allow-register \
|
|
--login-email "qa@infiniflow.org" \
|
|
--login-password "123" \
|
|
--bootstrap-llm \
|
|
--llm-factory ZHIPU-AI \
|
|
--llm-api-key $ZHIPU_AI_API_KEY \
|
|
--dataset-name "bench_dataset" \
|
|
--dataset-payload '{"name":"bench_dataset","embedding_model":"BAAI/bge-small-en-v1.5@Builtin"}' \
|
|
--document-path test/benchmark/test_docs/Doc1.pdf \
|
|
--document-path test/benchmark/test_docs/Doc2.pdf \
|
|
--document-path test/benchmark/test_docs/Doc3.pdf \
|
|
--chat-name "bench_chat" \
|
|
--chat-payload '{"name":"bench_chat","llm":{"model_name":"glm-4-flash@ZHIPU-AI"}}' \
|
|
--message "What is the purpose of RAGFlow?" \
|
|
--model "glm-4-flash@ZHIPU-AI"
|
|
```
|
|
|
|
Example: chat benchmark with existing dataset + chat id (no creation)
|
|
```
|
|
PYTHONPATH=./test uv run -m benchmark chat \
|
|
--base-url http://127.0.0.1:9380 \
|
|
--chat-id <existing_chat_id> \
|
|
--login-email "qa@infiniflow.org" \
|
|
--login-password "123" \
|
|
--message "What is the purpose of RAGFlow?" \
|
|
--model "glm-4-flash@ZHIPU-AI"
|
|
```
|
|
|
|
Example: retrieval benchmark creating dataset + upload + parse
|
|
```
|
|
PYTHONPATH=./test uv run -m benchmark retrieval \
|
|
--base-url http://127.0.0.1:9380 \
|
|
--allow-register \
|
|
--login-email "qa@infiniflow.org" \
|
|
--login-password "123" \
|
|
--bootstrap-llm \
|
|
--llm-factory ZHIPU-AI \
|
|
--llm-api-key $ZHIPU_AI_API_KEY \
|
|
--dataset-name "bench_dataset" \
|
|
--dataset-payload '{"name":"bench_dataset","embedding_model":"BAAI/bge-small-en-v1.5@Builtin"}' \
|
|
--document-path test/benchmark/test_docs/Doc1.pdf \
|
|
--document-path test/benchmark/test_docs/Doc2.pdf \
|
|
--document-path test/benchmark/test_docs/Doc3.pdf \
|
|
--question "What does RAG mean?"
|
|
```
|
|
|
|
Example: retrieval benchmark with existing dataset IDs
|
|
```
|
|
PYTHONPATH=./test uv run -m benchmark retrieval \
|
|
--base-url http://127.0.0.1:9380 \
|
|
--login-email "qa@infiniflow.org" \
|
|
--login-password "123" \
|
|
--dataset-ids "<dataset_id_1>,<dataset_id_2>" \
|
|
--question "What does RAG mean?"
|
|
```
|
|
|
|
Example: retrieval benchmark with existing dataset IDs and document IDs
|
|
```
|
|
PYTHONPATH=./test uv run -m benchmark retrieval \
|
|
--base-url http://127.0.0.1:9380 \
|
|
--login-email "qa@infiniflow.org" \
|
|
--login-password "123" \
|
|
--dataset-id "<dataset_id>" \
|
|
--document-ids "<doc_id_1>,<doc_id_2>" \
|
|
--question "What does RAG mean?"
|
|
```
|
|
|
|
Quick scripts
|
|
|
|
These scripts create a dataset,
|
|
upload/parse docs from test/benchmark/test_docs, run the benchmark, and clean up.
|
|
The both script runs retrieval then chat on the same dataset, then deletes it.
|
|
|
|
- Make sure to run ```uv sync --python 3.12 --group test ``` before running the commands.
|
|
- It is also necessary to run these commands prior to initializing your containers if you plan on using the built-in embedded model: ```echo -e "TEI_MODEL=BAAI/bge-small-en-v1.5" >> docker/.env```
|
|
and ```echo -e "COMPOSE_PROFILES=\${COMPOSE_PROFILES},tei-cpu" >> docker/.env```
|
|
|
|
Chat only:
|
|
```
|
|
./test/benchmark/run_chat.sh
|
|
```
|
|
|
|
Retrieval only:
|
|
```
|
|
./test/benchmark/run_retrieval.sh
|
|
```
|
|
|
|
Both (retrieval then chat on the same dataset):
|
|
```
|
|
./test/benchmark/run_retrieval_chat.sh
|
|
```
|
|
|
|
Requires:
|
|
- ZHIPU_AI_API_KEY exported in your shell.
|
|
|
|
Defaults used:
|
|
- Base URL: http://127.0.0.1:9380
|
|
- Login: qa@infiniflow.org / 123 (with allow-register)
|
|
- LLM bootstrap: ZHIPU-AI with $ZHIPU_AI_API_KEY
|
|
- Dataset: bench_dataset (BAAI/bge-small-en-v1.5@Builtin)
|
|
- Chat: bench_chat (glm-4-flash@ZHIPU-AI)
|
|
- Chat message: "What is the purpose of RAGFlow?"
|
|
- Retrieval question: "What does RAG mean?"
|
|
- Iterations: 1
|
|
- concurrency:f 4
|