From 83fac6d0a082c21745776613959e8a09f090805c Mon Sep 17 00:00:00 2001
From: writinwaters <93570324+writinwaters@users.noreply.github.com>
Date: Wed, 3 Dec 2025 09:35:52 +0800
Subject: [PATCH] Docs: How to specify an ingestion pipeline when creating a dataset (#11670)

### What problem does this PR solve?

### Type of change

- [x] Documentation Update
---
 docs/references/http_api_reference.md | 53 ++++++++++++++-------------
 1 file changed, 28 insertions(+), 25 deletions(-)

diff --git a/docs/references/http_api_reference.md b/docs/references/http_api_reference.md
index 7f006ec3d..e836c9a31 100644
--- a/docs/references/http_api_reference.md
+++ b/docs/references/http_api_reference.md
@@ -419,17 +419,11 @@ Creates a dataset.
   - `"embedding_model"`: `string`
   - `"permission"`: `string`
   - `"chunk_method"`: `string`
-  - "parser_config": `object`
-  - "parse_type": `int`
-  - "pipeline_id": `string`
+  - `"parser_config"`: `object`
+  - `"parse_type"`: `int`
+  - `"pipeline_id"`: `string`
 
-Note: Choose exactly one ingestion mode when creating a dataset.
-- Chunking method: provide `"chunk_method"` (optionally with `"parser_config"`).
-- Ingestion pipeline: provide both `"parse_type"` and `"pipeline_id"` and do not provide `"chunk_method"`.
-
-These options are mutually exclusive. If all three of `chunk_method`, `parse_type`, and `pipeline_id` are omitted, the system defaults to `chunk_method = "naive"`.
-
-##### Request example
+##### A basic request example
 
 ```bash
 curl --request POST \
@@ -441,9 +435,11 @@ curl --request POST \
       }'
 ```
 
-##### Request example (ingestion pipeline)
+##### A request example specifying ingestion pipeline
 
-Use this form when specifying an ingestion pipeline (do not include `chunk_method`).
+:::caution WARNING
+You must *not* include `"chunk_method"` or `"parser_config"` when specifying an ingestion pipeline.
+:::
 
 ```bash
 curl --request POST \
@@ -452,15 +448,11 @@ curl --request POST \
      --header 'Authorization: Bearer ' \
      --data '{
       "name": "test-sdk",
-      "parse_type": ,
+      "parse_type": ,
       "pipeline_id": ""
       }'
 ```
 
-Notes:
-- `parse_type` is an integer. Replace `` with your pipeline's parse-type value.
-- `pipeline_id` must be a 32-character lowercase hexadecimal string.
-
 ##### Request parameters
 
 - `"name"`: (*Body parameter*), `string`, *Required*
@@ -488,7 +480,8 @@ Notes:
   - `"team"`: All team members can manage the dataset.
 
 - `"chunk_method"`: (*Body parameter*), `enum`
-  The chunking method of the dataset to create. Available options:
+  The default chunk method of the dataset to create. Mutually exclusive with `"parse_type"` and `"pipeline_id"`. If you set `"chunk_method"`, do not include `"parse_type"` or `"pipeline_id"`.
+  Available options:
   - `"naive"`: General (default)
   - `"book"`: Book
   - `"email"`: Email
@@ -501,7 +494,6 @@ Notes:
   - `"qa"`: Q&A
   - `"table"`: Table
   - `"tag"`: Tag
-  - Mutually exclusive with `parse_type` and `pipeline_id`. If you set `chunk_method`, do not include `parse_type` or `pipeline_id`.
 
 - `"parser_config"`: (*Body parameter*), `object`
   The configuration settings for the dataset parser. The attributes in this JSON object vary with the selected `"chunk_method"`:
@@ -538,14 +530,25 @@ Notes:
       - Defaults to: `{"use_raptor": false}`.
   - If `"chunk_method"` is `"table"`, `"picture"`, `"one"`, or `"email"`, `"parser_config"` is an empty JSON object.
 
-- "parse_type": (*Body parameter*), `int`
-  The ingestion pipeline parse type identifier. Required if and only if you are using an ingestion pipeline (together with `"pipeline_id"`). Must not be provided when `"chunk_method"` is set.
+- `"parse_type"`: (*Body parameter*), `int`
+  The ingestion pipeline parse type identifier, i.e., the number of parsers in your **Parser** component.
+  - Required (along with `"pipeline_id"`) if specifying an ingestion pipeline.
+  - Must not be included when `"chunk_method"` is specified.
 
-- "pipeline_id": (*Body parameter*), `string`
-  The ingestion pipeline ID. Required if and only if you are using an ingestion pipeline (together with `"parse_type"`).
-  - Must not be provided when `"chunk_method"` is set.
+- `"pipeline_id"`: (*Body parameter*), `string`
+  The ingestion pipeline ID. Can be found in the corresponding URL in the RAGFlow UI.
+  - Required (along with `"parse_type"`) if specifying an ingestion pipeline.
+  - Must be a 32-character lowercase hexadecimal string, e.g., `"d0bebe30ae2211f0970942010a8e0005"`.
+  - Must not be included when `"chunk_method"` is specified.
 
-Note: If none of `chunk_method`, `parse_type`, and `pipeline_id` are provided, the system will default to `chunk_method = "naive"`.
+:::caution WARNING
+You can choose either of the following ingestion options when creating a dataset, but *not* both:
+
+- Use a built-in chunk method -- specify `"chunk_method"` (optionally with `"parser_config"`).
+- Use an ingestion pipeline -- specify both `"parse_type"` and `"pipeline_id"`.
+
+If none of `"chunk_method"`, `"parse_type"`, or `"pipeline_id"` are provided, the system defaults to `chunk_method = "naive"`.
+:::
 
 #### Response
 