mirror of
https://github.com/infiniflow/ragflow.git
synced 2025-12-08 20:42:30 +08:00
Docs: parser behavior change (#11176)
### What problem does this PR solve?

### Type of change

- [x] Documentation Update
18
docs/faq.mdx
@@ -514,11 +514,11 @@ See [here](./guides/agent/best_practices/accelerate_agent_question_answering.md)
### How to use MinerU to parse PDF documents?
MinerU PDF document parsing is available starting from v0.21.1. RAGFlow supports MinerU (>= 2.6.3) as an optional PDF parser with multiple backends. RAGFlow itself only acts as a client: it calls MinerU to parse documents, reads the output files, and ingests the parsed content into RAGFlow. To use this feature, follow these steps:
MinerU PDF document parsing is available starting from v0.22.0. RAGFlow supports MinerU (>= 2.6.3) as an optional PDF parser with multiple backends. RAGFlow acts only as a client for MinerU, calling it to parse documents, reading the output files, and ingesting the parsed content. To use this feature, follow these steps:
1. **Prepare MinerU**
1. Prepare MinerU
- **If you run RAGFlow from source**, install MinerU into an isolated virtual environment (recommended path: `$HOME/uv_tools`):
- **If you deploy RAGFlow from source**, install MinerU into an isolated virtual environment (recommended path: `$HOME/uv_tools`):
```bash
mkdir -p "$HOME/uv_tools"
@@ -530,7 +530,7 @@ MinerU PDF document parsing is available starting from v0.21.1. RAGFlow supports
# uv pip install -U "mineru[all]" -i https://mirrors.aliyun.com/pypi/simple
```
- **If you run RAGFlow with Docker**, you usually only need to turn on MinerU support in `docker/.env`:
- **If you deploy RAGFlow with Docker**, you usually only need to turn on MinerU support in `docker/.env`:
```bash
# docker/.env
@@ -541,7 +541,7 @@ MinerU PDF document parsing is available starting from v0.21.1. RAGFlow supports
Enabling `USE_MINERU=true` will internally perform the same setup as the manual configuration (including setting the MinerU executable path and related environment variables). You only need the manual installation above if you are running from source or want full control over the MinerU installation.
2. **Start RAGFlow with MinerU enabled**
2. Start RAGFlow with MinerU enabled:
- **Source deployment** – in the RAGFlow repo, export the key MinerU-related variables and start the backend service:
@@ -570,7 +570,7 @@ MinerU PDF document parsing is available starting from v0.21.1. RAGFlow supports
### How to configure MinerU-specific settings?
The table below summarizes the most commonly used MinerU-related environment variables:
The table below summarizes the most frequently used MinerU environment variables:
| Environment variable | Description | Default | Example |
| ---------------------- | ---------------------------------- | ----------------------------------- | ----------------------------------------------------------------------------------------------- |
@@ -583,14 +583,14 @@ The table below summarizes the most commonly used MinerU-related environment var
1. Set `MINERU_EXECUTABLE` to the path to the MinerU executable if the default `mineru` is not on `PATH`.
2. Set `MINERU_DELETE_OUTPUT` to `0` to keep MinerU's output. (Default: `1`, which deletes temporary output.)
3. Set `MINERU_OUTPUT_DIR` to specify the output directory for MinerU (otherwise a system temp directory is used).
3. Set `MINERU_OUTPUT_DIR` to specify the output directory for MinerU; otherwise, a system temp directory is used.
4. Set `MINERU_BACKEND` to specify a parsing backend:
- `"pipeline"` (default): The traditional multimodel pipeline.
- `"vlm-transformers"`: A vision-language model using HuggingFace Transformers.
- `"vlm-vllm-engine"`: A vision-language model using a local vLLM engine (requires a local GPU).
- `"vlm-http-client"`: A vision-language model via HTTP client to a remote vLLM server (RAGFlow only requires CPU).
5. If using the `"vlm-http-client"` backend, you must also set `MINERU_SERVER_URL` to the URL of your vLLM server.
6. If you want RAGFlow to call a **remote MinerU service** (instead of a MinerU process running locally with RAGFlow), set `MINERU_APISERVER` to the URL of the remote MinerU server.
5. If using the `"vlm-http-client"` backend, you must also set `MINERU_SERVER_URL` to your vLLM server's URL.
6. If configuring RAGFlow to call a *remote* MinerU service, set `MINERU_APISERVER` to the MinerU server's URL.
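
As a rough illustration of the settings listed above, the following sketch exports the variables for the `"vlm-http-client"` backend and checks that the server URL is present. The URL, the output directory, and the `check_mineru_env` helper are assumptions made for this example; they are not RAGFlow defaults or part of RAGFlow itself.

```bash
#!/usr/bin/env bash
# Sketch only: every value below is a placeholder, not a RAGFlow default.

export MINERU_BACKEND="vlm-http-client"
export MINERU_SERVER_URL="http://127.0.0.1:30000"  # hypothetical vLLM server URL
export MINERU_OUTPUT_DIR="$HOME/mineru_output"     # hypothetical output directory
export MINERU_DELETE_OUTPUT=0                      # 0 keeps MinerU's output

# Hypothetical helper: fail early when the chosen backend needs a
# server URL that has not been set.
check_mineru_env() {
  if [ "$MINERU_BACKEND" = "vlm-http-client" ] && [ -z "${MINERU_SERVER_URL:-}" ]; then
    echo "MINERU_SERVER_URL must be set for the vlm-http-client backend" >&2
    return 1
  fi
  return 0
}

check_mineru_env
```

Running a check like this before starting the backend service is one way to surface a missing `MINERU_SERVER_URL` early instead of at parse time.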
:::tip NOTE
For information about other environment variables natively supported by MinerU, see [here](https://opendatalab.github.io/MinerU/usage/cli_tools/#environment-variables-description).