Fix: Enforce default embedding model in create_dataset / update_dataset (#8486)

### What problem does this PR solve? Previous: - Defaulted to hardcoded model 'BAAI/bge-large-zh-v1.5@BAAI' - Did not respect user-configured default embedding_model Now: - Correctly prioritizes user-configured default embedding_model Other: - Make embedding_model optional in CreateDatasetReq with proper None handling - Add default embedding model fallback in dataset update when empty - Enhance validation utils to handle None values and string normalization - Update SDK default embedding model to None to match API changes - Adjust related test cases to reflect new validation rules ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)
2026-01-28 22:26:36 +08:00 · 2025-06-25 16:41:32 +08:00
parent 340354b79c
commit dac5bcdf17
7 changed files with 52 additions and 30 deletions
--- a/api/apps/sdk/dataset.py
+++ b/api/apps/sdk/dataset.py
@ -347,6 +347,8 @@ def update(tenant_id, dataset_id):
                return get_error_data_result(message=f"Dataset name '{req['name']}' already exists")

        if "embd_id" in req:
+            if not req["embd_id"]:
+                req["embd_id"] = kb.embd_id
            if kb.chunk_num != 0 and req["embd_id"] != kb.embd_id:
                return get_error_data_result(message=f"When chunk_num ({kb.chunk_num}) > 0, embedding_model must remain {kb.embd_id}")
            ok, err = verify_embedding_availability(req["embd_id"], tenant_id)