From 83fac6d0a082c21745776613959e8a09f090805c Mon Sep 17 00:00:00 2001
From: writinwaters <93570324+writinwaters@users.noreply.github.com>
Date: Wed, 3 Dec 2025 09:35:52 +0800
Subject: [PATCH] Docs: How to specify an ingestion pipeline when creating a
 dataset (#11670)

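### What problem does this PR solve?

Clarifies in the HTTP API reference how to specify an ingestion pipeline (`"parse_type"` plus `"pipeline_id"`) instead of a built-in chunk method when creating a dataset.
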
### Type of change

- [x] Documentation Update
---
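
The mutual-exclusion rule documented in this patch can be sketched as a client-side pre-check run before sending the create-dataset request. This is a minimal illustration under stated assumptions, not RAGFlow's server-side validation: the function name `check_create_dataset_payload`, the return values, and the error messages are hypothetical. Only the rules themselves come from the updated reference text.

```python
import re

# Per the updated docs: 32-character lowercase hexadecimal string.
PIPELINE_ID_RE = re.compile(r"[0-9a-f]{32}")


def check_create_dataset_payload(payload: dict) -> str:
    """Return the ingestion mode a payload selects, or raise ValueError.

    Hypothetical helper: mirrors the documented rules only.
    """
    uses_chunking = "chunk_method" in payload or "parser_config" in payload
    pipeline_keys = payload.keys() & {"parse_type", "pipeline_id"}

    if not pipeline_keys:
        # With no pipeline fields, the built-in chunk method applies;
        # the docs state the default is chunk_method = "naive".
        return "chunk_method"
    if uses_chunking:
        raise ValueError(
            "chunk_method/parser_config cannot be combined with "
            "parse_type/pipeline_id"
        )
    if len(pipeline_keys) != 2:
        raise ValueError("parse_type and pipeline_id must be provided together")

    parse_type = payload["parse_type"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(parse_type, int) or isinstance(parse_type, bool):
        raise ValueError("parse_type must be an integer")

    pipeline_id = payload["pipeline_id"]
    if not isinstance(pipeline_id, str) or not PIPELINE_ID_RE.fullmatch(pipeline_id):
        raise ValueError("pipeline_id must be 32 lowercase hex characters")
    return "pipeline"
```

Calling the helper with `{"name": "test-sdk"}` selects the chunk-method path, while adding both `"parse_type"` and a valid `"pipeline_id"` selects the pipeline path; mixing the two groups raises `ValueError`.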
 docs/references/http_api_reference.md | 53 ++++++++++++++-------------
 1 file changed, 28 insertions(+), 25 deletions(-)
diff --git a/docs/references/http_api_reference.md b/docs/references/http_api_reference.md
index 7f006ec3d..e836c9a31 100644
--- a/docs/references/http_api_reference.md
+++ b/docs/references/http_api_reference.md
@@ -419,17 +419,11 @@ Creates a dataset.
   - `"embedding_model"`: `string`
   - `"permission"`: `string`
   - `"chunk_method"`: `string`
-  - "parser_config": `object`
-  - "parse_type": `int`
-  - "pipeline_id": `string`
+  - `"parser_config"`: `object`
+  - `"parse_type"`: `int`
+  - `"pipeline_id"`: `string`
 
-Note: Choose exactly one ingestion mode when creating a dataset.
-- Chunking method: provide `"chunk_method"` (optionally with `"parser_config"`).
-- Ingestion pipeline: provide both `"parse_type"` and `"pipeline_id"` and do not provide `"chunk_method"`.
-
-These options are mutually exclusive. If all three of `chunk_method`, `parse_type`, and `pipeline_id` are omitted, the system defaults to `chunk_method = "naive"`.
-
-##### Request example
+##### A basic request example
 
 ```bash
 curl --request POST \
@@ -441,9 +435,11 @@ curl --request POST \
       }'
 ```
 
-##### Request example (ingestion pipeline)
+##### A request example specifying an ingestion pipeline
 
-Use this form when specifying an ingestion pipeline (do not include `chunk_method`).
+:::caution WARNING
+You must *not* include `"chunk_method"` or `"parser_config"` when specifying an ingestion pipeline.
+:::
 
 ```bash
 curl --request POST \
@@ -452,15 +448,11 @@ curl --request POST \
   --header 'Authorization: Bearer <YOUR_API_KEY>' \
   --data '{
    "name": "test-sdk",
-   "parse_type": <NUMBER_OF_FORMATS_IN_PARSE>,
+   "parse_type": <NUMBER_OF_PARSERS_IN_YOUR_PARSER_COMPONENT>,
    "pipeline_id": "<PIPELINE_ID_32_HEX>"
   }'
 ```
 
-Notes:
-- `parse_type` is an integer. Replace `<NUMBER_OF_FORMATS_IN_PARSE>` with your pipeline's parse-type value.
-- `pipeline_id` must be a 32-character lowercase hexadecimal string.
-
 ##### Request parameters
 
 - `"name"`: (*Body parameter*), `string`, *Required*  
@@ -488,7 +480,8 @@ Notes:
   - `"team"`: All team members can manage the dataset.
 
 - `"chunk_method"`: (*Body parameter*), `enum<string>`  
-  The chunking method of the dataset to create. Available options:  
+  The default chunk method of the dataset to create. Mutually exclusive with `"parse_type"` and `"pipeline_id"`: if you set `"chunk_method"`, do not include either of them.  
+  Available options:  
   - `"naive"`: General (default)
   - `"book"`: Book
   - `"email"`: Email
@@ -501,7 +494,6 @@ Notes:
   - `"qa"`: Q&A
   - `"table"`: Table
   - `"tag"`: Tag
-  - Mutually exclusive with `parse_type` and `pipeline_id`. If you set `chunk_method`, do not include `parse_type` or `pipeline_id`.
 
 - `"parser_config"`: (*Body parameter*), `object`  
   The configuration settings for the dataset parser. The attributes in this JSON object vary with the selected `"chunk_method"`:  
@@ -538,14 +530,25 @@ Notes:
       - Defaults to: `{"use_raptor": false}`.
   - If `"chunk_method"` is `"table"`, `"picture"`, `"one"`, or `"email"`, `"parser_config"` is an empty JSON object.
 
-- "parse_type": (*Body parameter*), `int`  
-  The ingestion pipeline parse type identifier. Required if and only if you are using an ingestion pipeline (together with `"pipeline_id"`). Must not be provided when `"chunk_method"` is set.
+- `"parse_type"`: (*Body parameter*), `int`  
+  The ingestion pipeline parse type identifier, i.e., the number of parsers in your **Parser** component.  
+  - Required (along with `"pipeline_id"`) if specifying an ingestion pipeline.
+  - Must not be included when `"chunk_method"` is specified.
 
-- "pipeline_id": (*Body parameter*), `string`  
-  The ingestion pipeline ID. Required if and only if you are using an ingestion pipeline (together with `"parse_type"`).  
-  - Must not be provided when `"chunk_method"` is set.
+- `"pipeline_id"`: (*Body parameter*), `string`  
+  The ingestion pipeline ID. You can find it in the corresponding URL in the RAGFlow UI.
+  - Required (along with `"parse_type"`) if specifying an ingestion pipeline.
+  - Must be a 32-character lowercase hexadecimal string, e.g., `"d0bebe30ae2211f0970942010a8e0005"`.
+  - Must not be included when `"chunk_method"` is specified.
 
-Note: If none of `chunk_method`, `parse_type`, and `pipeline_id` are provided, the system will default to `chunk_method = "naive"`.
+:::caution WARNING
+You can choose either of the following ingestion options when creating a dataset, but *not* both:
+
+- Use a built-in chunk method -- specify `"chunk_method"` (optionally with `"parser_config"`).
+- Use an ingestion pipeline -- specify both `"parse_type"` and `"pipeline_id"`.
+
+If `"chunk_method"`, `"parse_type"`, and `"pipeline_id"` are all omitted, the system defaults to `chunk_method = "naive"`.
+:::
 
 #### Response