Refa: http API create dataset and test cases (#7393)

### What problem does this PR solve?

This PR introduces Pydantic-based validation for the create dataset HTTP
API, improving code clarity and robustness. Key changes include:
1. Pydantic Validation
2. Error Handling
3. Test Updates
4. Documentation
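
To make the validation step concrete, here is a minimal standard-library sketch of the kind of checks a create-dataset request model might enforce. It is not the PR's Pydantic model, which this page does not show. The class name `CreateDatasetRequest` is hypothetical, and the whitespace stripping and empty-name rejection are assumptions. The field names, the defaults, and the 128-character name limit come from the updated reference documentation.

```python
from dataclasses import dataclass
from typing import Optional

# Limit stated in the updated reference documentation.
MAX_NAME_LENGTH = 128


@dataclass
class CreateDatasetRequest:
    """Hypothetical request model; fields mirror the documented signature."""

    name: str
    avatar: Optional[str] = None
    description: Optional[str] = None
    embedding_model: Optional[str] = "BAAI/bge-large-zh-v1.5@BAAI"
    permission: str = "me"
    chunk_method: str = "naive"
    pagerank: int = 0

    def __post_init__(self) -> None:
        if not isinstance(self.name, str):
            raise TypeError("name must be a string")
        # Assumption: surrounding whitespace is not significant.
        self.name = self.name.strip()
        if not self.name:
            raise ValueError("name must not be empty")
        if len(self.name) > MAX_NAME_LENGTH:
            raise ValueError(f"name exceeds {MAX_NAME_LENGTH} characters")
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(self.pagerank, bool) or not isinstance(self.pagerank, int):
            raise TypeError("pagerank must be an integer")


if __name__ == "__main__":
    print(CreateDatasetRequest(name="  product docs  "))
    try:
        CreateDatasetRequest(name="x" * (MAX_NAME_LENGTH + 1))
    except ValueError as exc:
        print(f"rejected: {exc}")
```

Centralizing the checks in one model means the HTTP handler can return a single, consistent error for any invalid field instead of scattering ad hoc checks through the endpoint.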

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Documentation Update
- [x] Refactoring

Author: liu an
Date: 2025-04-29 16:53:57 +08:00
Committed by GitHub
Parent: c88e4b3fc0
Commit: 78380fa181
11 changed files with 1239 additions and 812 deletions


@@ -95,11 +95,12 @@ else:
 ```python
 RAGFlow.create_dataset(
     name: str,
-    avatar: str = "",
-    description: str = "",
-    embedding_model: str = "BAAI/bge-large-zh-v1.5",
+    avatar: Optional[str] = None,
+    description: Optional[str] = None,
+    embedding_model: Optional[str] = "BAAI/bge-large-zh-v1.5@BAAI",
     permission: str = "me",
     chunk_method: str = "naive",
+    pagerank: int = 0,
     parser_config: DataSet.ParserConfig = None
 ) -> DataSet
 ```
@@ -112,16 +113,16 @@ Creates a dataset.
 The unique name of the dataset to create. It must adhere to the following requirements:
-- Maximum 65,535 characters.
+- Maximum 128 characters.
 - Case-insensitive.
 ##### avatar: `str`
-Base64 encoding of the avatar. Defaults to `""`
+Base64 encoding of the avatar. Defaults to `None`
 ##### description: `str`
-A brief description of the dataset to create. Defaults to `""`.
+A brief description of the dataset to create. Defaults to `None`.
 ##### permission
@@ -147,6 +148,10 @@ The chunking method of the dataset to create. Available options:
 - `"one"`: One
 - `"email"`: Email
+##### pagerank, `int`
+The pagerank of the dataset to create. Defaults to `0`.
 ##### parser_config
 The parser configuration of the dataset. A `ParserConfig` object's attributes vary based on the selected `chunk_method`: