refactor: optimize agent list payload and improve multimodal detection logic (#12942)

## Description
This PR reduces the payload of the Agent list API and refines the model
capability detection logic in the Agent/Canvas module.

### 1. Performance Optimization (Backend)
- **Changes**: Removed `cls.model.dsl` from the query fields in
`UserCanvasService.get_by_tenant_ids`.
- **Reasoning**: The `dsl` object is large and unnecessary for the Agent
list view. Excluding it reduces the payload size of the
`/v1/canvas/list` API, which speeds up serialization and shortens
response transfer time.
- **Consistency**: Full DSL data remains accessible via the individual
`/v1/canvas/get/<id>` endpoint used in the detail view.
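
The effect of dropping `dsl` from list rows can be sketched as follows. This is an illustration only: the `CanvasRow` shape, the `toListItem` and `makeRow` helpers, and the fake DSL content are all assumptions made for the example, not code from this PR.

```typescript
// Hypothetical row shape for illustration; the real rows carry more columns.
interface CanvasRow {
  id: string;
  title: string;
  description: string;
  dsl?: Record<string, unknown>; // large graph definition, only needed in the detail view
}

// Drop the heavy `dsl` field, mirroring the removal of `cls.model.dsl` from the list query.
function toListItem(row: CanvasRow): Omit<CanvasRow, "dsl"> {
  const { dsl: _dsl, ...rest } = row;
  return rest;
}

// Build a fake row whose DSL has many nodes, standing in for a real agent graph.
function makeRow(i: number): CanvasRow {
  const nodes = Array.from({ length: 200 }, (_, n) => ({
    id: `node-${n}`,
    params: { prompt: "x".repeat(100) },
  }));
  return { id: `canvas-${i}`, title: `Agent ${i}`, description: "demo", dsl: { nodes } };
}

const rows: CanvasRow[] = Array.from({ length: 30 }, (_, i) => makeRow(i));

// Serialized string length is used as a rough proxy for payload size.
const fullSize = JSON.stringify(rows).length;
const slimSize = JSON.stringify(rows.map(toListItem)).length;
console.log({ fullSize, slimSize });
```

With synthetic data like this, almost all of the serialized size comes from `dsl`. The actual saving depends on how large real agent graphs are.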

### 2. Multimodal Detection Refinement (Frontend)
- **Changes**: Replaced `model_type === LlmModelType.Image2text` with
`tags?.includes('IMAGE2TEXT')`.
- **Reasoning**: In RAGFlow, `model_type` defines the primary role of a
model (e.g., `chat`). However, many advanced chat models are also
vision-capable. Because `model_type` is a single-value field, it cannot
represent more than one capability.
- **Solution**: Checking the `tags` field (which supports multiple
attributes) for `IMAGE2TEXT` ensures that the form shows the multimodal
input options for models like `gpt-5.2-pro`.
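
The difference between the two checks can be sketched as below. The `LlmInfo` shape, the enum values, and the comma-separated form of `tags` are assumptions made for illustration; only the two conditions themselves come from this PR.

```typescript
// Assumed enum values; only the member name `Image2text` appears in the PR.
enum LlmModelType {
  Chat = "chat",
  Image2text = "image2text",
}

// Hypothetical model record. `tags` is assumed to be a comma-separated string.
interface LlmInfo {
  model_type: LlmModelType; // single primary role
  tags?: string;            // multiple capabilities, e.g. "LLM,CHAT,IMAGE2TEXT"
}

// Old condition: true only when the primary role is image-to-text.
const oldSupportsVision = (llm?: LlmInfo): boolean =>
  llm?.model_type === LlmModelType.Image2text;

// New condition: true whenever the tags mention IMAGE2TEXT.
const newSupportsVision = (llm?: LlmInfo): boolean =>
  llm?.tags?.includes("IMAGE2TEXT") ?? false;

const visionChat: LlmInfo = { model_type: LlmModelType.Chat, tags: "LLM,CHAT,IMAGE2TEXT" };
const plainChat: LlmInfo = { model_type: LlmModelType.Chat, tags: "LLM,CHAT" };

console.log(oldSupportsVision(visionChat), newSupportsVision(visionChat)); // false true
console.log(oldSupportsVision(plainChat), newSupportsVision(plainChat));   // false false
```

If `tags` is a delimited string, as assumed here, `includes` performs a substring match. Splitting on the delimiter and comparing whole tags would be a stricter variant.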



## Type of Change
- [x] Bug fix (logic correction for multimodal detection)
- [x] Optimization (performance improvement for list API)

## Main Changes
- `api/db/services/canvas_service.py`: Optimized the list query by
excluding the heavy `dsl` field.
- `web/src/pages/agent/form/agent-form/index.tsx`: Switched capability
detection from `model_type` to the `tags` field.

## Verification
- [x] Verified that the Agent list loads faster with the reduced response
payload.
- [x] Confirmed that `chat` models with the `IMAGE2TEXT` tag now
correctly enable the multimodal input UI.

Author: eviaaaaa (committed by GitHub)
Date: 2026-02-02 17:35:54 +08:00
Parent: 0121866ce4
Commit: 2e5a18602b
2 changed files with 1 addition and 2 deletions

api/db/services/canvas_service.py

@@ -146,7 +146,6 @@ class UserCanvasService(CommonService):
 cls.model.id,
 cls.model.avatar,
 cls.model.title,
-cls.model.dsl,
 cls.model.description,
 cls.model.permission,
 cls.model.user_id.alias("tenant_id"),

web/src/pages/agent/form/agent-form/index.tsx

@@ -162,7 +162,7 @@ function AgentForm({ node }: INextOperatorForm) {
 <FormWrapper>
 {isSubAgent && <DescriptionField></DescriptionField>}
 <LargeModelFormField showSpeech2TextModel></LargeModelFormField>
-{findLlmByUuid(llmId)?.model_type === LlmModelType.Image2text && (
+{findLlmByUuid(llmId)?.tags?.includes('IMAGE2TEXT') && (
 <QueryVariable
 name="visual_files_var"
 label="Visual Input File"