Docs: How to use MinerU to parse PDF documents (#10763)

### What problem does this PR solve?

### Type of change

- [x] Documentation Update
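
The documentation added in this PR points `MINERU_EXECUTABLE` at `/ragflow/uv_tools/.venv/bin/mineru`. As a rough sanity check after the install step, a sketch like the following could be run inside the container to confirm that the path exists and is executable. The `check_mineru` helper is hypothetical and not part of RAGFlow or MinerU; it only tests the file and does not invoke MinerU itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an assumption, not shipped with RAGFlow or MinerU):
# reports whether the MinerU executable configured in docker/.env is present.
check_mineru() {
  # Use the first argument if given, else $MINERU_EXECUTABLE,
  # else the path used in the documented steps.
  local exe="${1:-${MINERU_EXECUTABLE:-/ragflow/uv_tools/.venv/bin/mineru}}"
  if [ -x "$exe" ]; then
    echo "ok: $exe"
  else
    echo "missing: $exe"
    return 1
  fi
}
```

Calling `check_mineru` with no arguments falls back to the documented path; a non-zero exit status suggests the `uv pip install` step has not been run in that location yet.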
2026-02-05 18:15:06 +08:00 · 2025-10-23 18:56:09 +08:00
parent 83e80e3d7f
commit de24e74b4c
3 changed files with 49 additions and 2 deletions
--- a/docs/faq.mdx
+++ b/docs/faq.mdx
@@ -510,3 +510,27 @@ See [here](./guides/agent/best_practices/accelerate_agent_question_answering.md)

 ---

+### How to use MinerU to parse PDF documents?
+
+MinerU PDF document parsing is available starting from v0.21.1. To use this feature, follow these steps:
+
+1. Before deploying ragflow-server, update your **docker/.env** file:  
+   - Enable `HF_ENDPOINT=https://hf-mirror.com`
+   - Add a MinerU entry: `MINERU_EXECUTABLE=/ragflow/uv_tools/.venv/bin/mineru`
+
+2. Start the ragflow-server and run the following commands inside the container:  
+
+```bash
+mkdir uv_tools
+cd uv_tools
+uv venv .venv
+source .venv/bin/activate
+uv pip install -U "mineru[core]" -i https://mirrors.aliyun.com/pypi/simple
+```
+
+3. Restart the ragflow-server.
+4. In the web UI, navigate to the **Configuration** page of your dataset. In the **Ingestion pipeline** section, click **Built-in**, select a chunking method that supports PDF parsing from the **Built-in** dropdown, and select **MinerU** in **PDF parser**.
+5. If you use a custom ingestion pipeline instead, you must also complete the first three steps before selecting **MinerU** in the **Parsing method** section of the **Parser** component.
+
+
+
--- a/docs/guides/dataset/select_pdf_parser.md
+++ b/docs/guides/dataset/select_pdf_parser.md
@@ -35,8 +35,31 @@ RAGFlow isn't one-size-fits-all. It is built for flexibility and supports deeper

  - DeepDoc: (Default) The default visual model performing OCR, TSR, and DLR tasks on PDFs, which can be time-consuming.
  - Naive: Skip OCR, TSR, and DLR tasks if *all* your PDFs are plain text.
+  - MinerU: An experimental feature.
  - A third-party visual model provided by a specific model provider.

+:::danger IMPORTANT
+MinerU PDF document parsing is available starting from v0.21.1. To use this feature, follow these steps:
+
+1. Before deploying ragflow-server, update your **docker/.env** file:  
+   - Enable `HF_ENDPOINT=https://hf-mirror.com`
+   - Add a MinerU entry: `MINERU_EXECUTABLE=/ragflow/uv_tools/.venv/bin/mineru`
+
+2. Start the ragflow-server and run the following commands inside the container:  
+
+```bash
+mkdir uv_tools
+cd uv_tools
+uv venv .venv
+source .venv/bin/activate
+uv pip install -U "mineru[core]" -i https://mirrors.aliyun.com/pypi/simple
+```
+
+3. Restart the ragflow-server.
+4. In the web UI, navigate to the **Configuration** page of your dataset. In the **Ingestion pipeline** section, click **Built-in**, select a chunking method that supports PDF parsing from the **Built-in** dropdown, and select **MinerU** in **PDF parser**.
+5. If you use a custom ingestion pipeline instead, you must also complete the first three steps before selecting **MinerU** in the **Parsing method** section of the **Parser** component.
+:::
+
 :::caution WARNING
 Third-party visual models are marked **Experimental**, because we have not fully tested these models for the aforementioned data extraction tasks.
 :::
--- a/docs/release_notes.md
+++ b/docs/release_notes.md
@@ -28,12 +28,12 @@ Released on October 23, 2025.

 ### New features

- Experimental: Adds support for PDF document parsing using MinerU.
+- Experimental: Adds support for PDF document parsing using MinerU. See [here](./faq.mdx#how-to-use-mineru-to-parse-pdf-documents).

 ### Improvements

 - Enhances UI/UX for the dataset and personal center pages.
- Upgrades RAGFlow's document engine, Infinity, to v0.6.1.
+- Upgrades RAGFlow's document engine, [Infinity](https://github.com/infiniflow/infinity), to v0.6.1.

 ### Fixed issues