Docs: How to use MinerU to parse pdf documents (#10763)

### What problem does this PR solve?



### Type of change

- [x] Documentation Update
writinwaters
2025-10-23 18:56:09 +08:00
committed by GitHub
parent 83e80e3d7f
commit de24e74b4c
3 changed files with 49 additions and 2 deletions


@@ -510,3 +510,27 @@ See [here](./guides/agent/best_practices/accelerate_agent_question_answering.md)
---
### How to use MinerU to parse PDF documents?
MinerU PDF document parsing is available starting from v0.21.1. To use this feature, follow these steps:
1. Before deploying ragflow-server, update your **docker/.env** file:
- Enable `HF_ENDPOINT=https://hf-mirror.com`
- Add a MinerU entry: `MINERU_EXECUTABLE=/ragflow/uv_tools/.venv/bin/mineru`
2. Start the ragflow-server and run the following commands inside the container:
```bash
mkdir uv_tools
cd uv_tools
uv venv .venv
source .venv/bin/activate
uv pip install -U "mineru[core]" -i https://mirrors.aliyun.com/pypi/simple
```
3. Restart the ragflow-server.
4. In the web UI, navigate to the **Configuration** page of your dataset. In the **Ingestion pipeline** section, click **Built-in**, select a chunking method that supports PDF parsing from the **Built-in** dropdown, and then select **MinerU** in **PDF parser**.
5. If you use a custom ingestion pipeline instead, you must also complete the first three steps before selecting **MinerU** in the **Parsing method** section of the **Parser** component.
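
Before restarting in step 3, it can help to confirm that steps 1 and 2 took effect. The following is a minimal sketch of such a check, not part of RAGFlow or MinerU: the function names `env_has_keys` and `is_executable` are hypothetical helpers introduced here for illustration, and the check assumes the env file uses plain `KEY=value` lines with `#` for comments.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with RAGFlow or MinerU).

# Succeeds only if every given key has an uncommented, non-empty
# KEY=value line in the env file. Assumes '#' marks a comment line.
env_has_keys() {
  local env_file="$1"; shift
  local key status=0
  for key in "$@"; do
    if ! grep -Eq "^[[:space:]]*${key}=.+" "$env_file"; then
      echo "not set (missing or commented out): ${key}" >&2
      status=1
    fi
  done
  return "$status"
}

# Succeeds only if the given path exists and is executable.
is_executable() {
  [ -x "$1" ]
}
```

With these functions defined, `env_has_keys docker/.env HF_ENDPOINT MINERU_EXECUTABLE` checks the entries from step 1, and running `is_executable /ragflow/uv_tools/.venv/bin/mineru` inside the container checks that the install in step 2 produced the executable that `MINERU_EXECUTABLE` points to.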