## What problem does this PR solve?
This PR addresses three specific issues to improve agent reliability and
model support:
1. **`codeExec` Output Limitation**: Previously, the `codeExec` tool was
strictly limited to returning `string` types. I updated the output
constraint to `object` to support structured data (Dicts, Lists, etc.)
required for complex downstream tasks.
2. **`codeExec` Error Handling**: Improved the execution logic so that
when runtime errors occur, the tool captures the exception and returns
the error message as the output instead of causing the process to abort
or fail silently.
3. **Spark Model Configuration**:
- Added support for the `Spark-Max-32K` model variant.
- Fixed the `Spark-Lite` mapping from `general` to `lite` to match the
latest API specifications.
## Type of change
- [x] Bug Fix (fixes execution logic and model mapping)
- [x] New Feature / Enhancement (adds model support and improves tool
flexibility)
## Key Changes
### `agent/tools/code_exec.py`
- Changed the output type definition from `string` to `object`.
- Refactored the execution flow to gracefully catch exceptions and
return error messages as part of the tool output.
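For illustration, a minimal sketch of the intended behaviour (the function name and dict keys are made up; the real implementation lives in `agent/tools/code_exec.py`):

```python
import traceback

def run_user_code(code: str, env: dict) -> dict:
    """Illustrative sketch only: execute a snippet and return a structured result."""
    try:
        exec(code, env)  # run the user-provided snippet
        return {"success": True, "output": env.get("result")}
    except Exception:
        # Return the error text as the tool output instead of re-raising,
        # so the agent flow keeps going instead of aborting or failing silently.
        return {"success": False, "output": traceback.format_exc()}
```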
### `rag/llm/chat_model.py`
- Added `"Spark-Max-32K": "max-32k"` to the model list.
- Updated `"Spark-Lite"` value from `"general"` to `"lite"`.
## Checklist
- [x] My code follows the style guidelines of this project.
- [x] I have performed a self-review of my own code.
Signed-off-by: evilhero <2278596667@qq.com>
### Issue
When using Qwen3 models (`qwen3-32b`, `qwen3-max`) through the
Tongyi-Qianwen provider for non-streaming calls (e.g., knowledge graph
generation), the API fails with:
```
parameter.enable_thinking must be set to false for non-streaming calls
```
Closes #12424
### Root Cause
In `LiteLLMBase.async_chat()`, the `extra_body={"enable_thinking":
False}` was set in `kwargs` but never forwarded to
`_construct_completion_args()`.
### Fix
Pass merged kwargs to `_construct_completion_args()` using
`**{**gen_conf, **kwargs}` to safely handle potential duplicate
parameters.
### Changes
- `rag/llm/chat_model.py`: Forward kwargs containing `extra_body` to
`_construct_completion_args()` in `async_chat()`
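For reference, a tiny illustration of the `**{**gen_conf, **kwargs}` merge semantics (the values are made up):

```python
gen_conf = {"temperature": 0.7, "extra_body": {"foo": "bar"}}
kwargs = {"extra_body": {"enable_thinking": False}}

# Later entries win on key conflicts, so anything passed through kwargs
# (here extra_body) safely overrides a duplicate coming from gen_conf.
merged = {**gen_conf, **kwargs}
print(merged)  # {'temperature': 0.7, 'extra_body': {'enable_thinking': False}}
```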
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Contribution by Gittensor, see my contribution statistics at
https://gittensor.io/miners/details?githubId=42954461
### What problem does this PR solve?
Feat: Bedrock IAM authentication. #12008
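A minimal sketch of what IAM-based Bedrock access looks like with boto3, relying on the default credential chain (environment, attached IAM role, or assumed role) instead of explicit access keys; the model id and region are placeholders and this is not the exact RAGFlow code:

```python
import boto3

# No access key/secret is passed explicitly; boto3 resolves credentials
# through the default chain (env vars, attached IAM role, assumed role).
client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # example model id
    messages=[{"role": "user", "content": [{"text": "Hello"}]}],
)
print(response["output"]["message"]["content"][0]["text"])
```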
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Azure OpenAI resource not found. #11750
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix: duplicate output from `async_chat_streamly`.
Refactor: revert manual modification.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
## What changes were proposed in this pull request?
Added a return statement after the successful completion of the async
for loop in async_chat_streamly.
## Why are the changes needed?
Previously, the code lacked a break/return mechanism inside the try
block. This caused the retry loop (for attempt in range...) to continue
executing even after the LLM response was successfully generated and
yielded, resulting in duplicate requests (up to max_retries times).
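A self-contained sketch of the bug and the fix (names and retry handling are illustrative, not the actual RAGFlow code):

```python
import asyncio

async def fake_llm_stream():
    for token in ["Hel", "lo"]:
        yield token

async def chat_streamly(max_retries: int = 3):
    for attempt in range(max_retries):
        try:
            async for delta in fake_llm_stream():
                yield delta
            # The added return: without it the retry loop keeps going and the
            # whole response is streamed again, up to max_retries times.
            return
        except Exception:
            if attempt == max_retries - 1:
                raise

async def main():
    print([t async for t in chat_streamly()])  # ['Hel', 'lo'] exactly once

asyncio.run(main())
```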
## Does this PR introduce any user-facing change?
No (it fixes an internal logic bug).
### What problem does this PR solve?
Migrate CV model chat to Async. #11750
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
### What problem does this PR solve?
Clean up synchronous functions in `chat_model` and implement
synchronization for conversation and dialog chats.
### Type of change
- [x] Refactoring
- [x] Performance Improvement
### What problem does this PR solve?
Make RAGFlow more asynchronous (part 2). #11551, #11579, #11619.
### Type of change
- [x] Refactoring
- [x] Performance Improvement
### What problem does this PR solve?
Make RAGFlow more asynchronous (part 2). #11551, #11579, #11619.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring
- [x] Performance Improvement
### What problem does this PR solve?
Add MiniMax-M2 and remove deprecated models.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring
### What problem does this PR solve?
Try to make this more asynchronous. Verified in chat and agent
scenarios, reducing blocking behavior. #11551, #11579.
However, the impact of these changes still requires further
investigation to ensure everything works as expected.
### Type of change
- [x] Refactoring
### What problem does this PR solve?
Add fallbacks for MinerU output path. #11613, #11620.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Add GPT-5.1, GPT-5.1 Instant, and Claude-Opus-4.5. #11548
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Add auth header for Ollama chat model. #11350
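As a rough sketch of what an authenticated request to an Ollama server could look like (using httpx against Ollama's `/api/chat` endpoint; the header value and model are placeholders, and the actual change wires the header into RAGFlow's Ollama chat model):

```python
import httpx

OLLAMA_BASE = "http://localhost:11434"         # example server URL
headers = {"Authorization": "Bearer <token>"}  # placeholder token

resp = httpx.post(
    f"{OLLAMA_BASE}/api/chat",
    headers=headers,
    json={
        "model": "llama3",
        "messages": [{"role": "user", "content": "Hello"}],
        "stream": False,
    },
    timeout=60,
)
print(resp.json()["message"]["content"])
```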
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
Co-authored-by: Jason <ggbbddjm@gmail.com>
### What problem does this PR solve?
#10056
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Issue: [#5787](https://github.com/infiniflow/ragflow/issues/5787)
Change: support specifying the OpenRouter model provider.
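A hedged sketch of how a provider preference can be specified through OpenRouter's OpenAI-compatible API; the model slug, provider name, and exact routing fields are illustrative and may differ from what this PR wires up:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="<OPENROUTER_API_KEY>",  # placeholder
)

resp = client.chat.completions.create(
    model="deepseek/deepseek-chat",  # example model slug
    messages=[{"role": "user", "content": "Hello"}],
    # OpenRouter provider routing preferences go in the request body;
    # the exact fields follow the OpenRouter docs and may evolve.
    extra_body={"provider": {"order": ["DeepInfra"], "allow_fallbacks": False}},
)
print(resp.choices[0].message.content)
```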
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Gemini 2.5 Flash models use reasoning by default, and there is currently no
way to disable this behaviour. This leads to very long response times (over
one minute). Reasoning should be disabled by default and configurable.
Issue #10474
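For context, a sketch of how thinking can be turned off for Gemini 2.5 Flash with the google-genai SDK (assuming a thinking budget of 0 disables reasoning; the key and prompt are placeholders, and the PR's implementation may differ):

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="<GEMINI_API_KEY>")  # placeholder key

resp = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Summarize this paragraph ...",
    config=types.GenerateContentConfig(
        # A thinking budget of 0 disables reasoning for 2.5 Flash,
        # which is what shortens the response time.
        thinking_config=types.ThinkingConfig(thinking_budget=0)
    ),
)
print(resp.text)
```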
### Type of change
- [X] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
The Google Cloud provider does not work correctly with gemini-2.5 models.
Closes #10408
### Type of change
- [X] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
Issue: [#6193](https://github.com/infiniflow/ragflow/issues/6193)
Change: support QwQ reasoning models with non-streaming output.
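A rough sketch of handling a non-streaming reply from a reasoning model; the `reasoning_content` field name and the `<think>` wrapping are assumptions about the provider response, not necessarily the exact code here:

```python
def format_reasoning_answer(response) -> str:
    """Illustrative: merge reasoning and answer from a non-streaming reply."""
    msg = response.choices[0].message
    reasoning = getattr(msg, "reasoning_content", None)  # provider-dependent field
    answer = msg.content or ""
    return f"<think>{reasoning}</think>{answer}" if reasoning else answer
```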
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
**Adds a new feature that enables the LLM to extract a structured table
of contents (TOC) directly from plain text.**
_This implementation prioritizes efficiency over reasoning: the model runs
in a strictly deterministic mode (thinking disabled) to minimize latency.
As a result, overall output quality may be less optimal, but extraction
speed and consistency are ensured._
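A minimal sketch of the idea (prompt wording, field names, and the `chat_fn` callable are illustrative, not the actual implementation):

```python
import json

TOC_PROMPT = """Extract the table of contents from the text below.
Return ONLY a JSON array of objects with "title" and "level" (1-based depth).

TEXT:
{text}
"""

def extract_toc(chat_fn, text: str) -> list:
    """chat_fn is any callable that sends a prompt to the LLM and returns the raw completion string."""
    raw = chat_fn(TOC_PROMPT.format(text=text))
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return []  # fall back to an empty TOC rather than failing the task
```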
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### Related issues
#10078
### What problem does this PR solve?
Integrate DeerAPI provider.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
Co-authored-by: DeerAPI <tensor.null@gmail.com>
### What problem does this PR solve?
Add support for the international DashScope service. #10340
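For illustration, selecting the international service typically comes down to a different base URL (URLs as documented by Alibaba Cloud to the best of my knowledge; verify for your region):

```python
from openai import OpenAI

# Mainland vs. international DashScope OpenAI-compatible endpoints.
DASHSCOPE_BASE_URLS = {
    "cn": "https://dashscope.aliyuncs.com/compatible-mode/v1",
    "intl": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
}

client = OpenAI(
    base_url=DASHSCOPE_BASE_URLS["intl"],  # pick the endpoint by region
    api_key="<DASHSCOPE_API_KEY>",         # placeholder
)
```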
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix invalid COMPONENT_EXEC_TIMEOUT. #10273
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Issue: [Bug]: ERROR: list index out of range #10188
Change: fix a potential list-index-out-of-range error in chat response
parsing by adding explicit checks for empty choices.
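A minimal sketch of the guard (the function name is illustrative):

```python
def safe_content(response) -> str:
    # Guard against providers that occasionally return an empty choices list,
    # which previously raised "list index out of range".
    if not getattr(response, "choices", None):
        return ""
    return response.choices[0].message.content or ""
```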
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Revert to the `chat.completions` API.
### Type of change
- [x] Other (please describe): revert to the `chat.completions` API.
### What problem does this PR solve?
Currently, Azure OpenAI returns per-minute quota-limit responses when the
chat API is used. This change is needed in order to process almost any
document using models deployed in Azure Foundry.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix: resolve hash collisions by switching to UUID and correct logic for
always-true statements. Solves #10165.
Feat: update GPT API integration. Solves #10204.
Feat: support qianwen-deepresearch. Solves #10163.
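For the UUID part, a tiny illustration of the idea (names are made up):

```python
import uuid

# Before (illustrative): an id derived from hashing the content can collide
# when two items share the same text.
# item_id = str(hash(item_text))

def new_item_id() -> str:
    # After: a random UUID4 is unique for all practical purposes.
    return uuid.uuid4().hex
```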
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Migrate OpenAI-compatible chats to LiteLLM.
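As a rough sketch, an OpenAI-compatible endpoint can be reached through LiteLLM roughly like this (the model name, base URL, and key are placeholders):

```python
import litellm

# The "openai/" prefix routes the call through LiteLLM's OpenAI-compatible
# path against a custom base URL.
response = litellm.completion(
    model="openai/my-compatible-model",
    api_base="https://example.com/v1",
    api_key="<API_KEY>",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```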
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### Related issues
#10078
### What problem does this PR solve?
Integrate CometAPI provider.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update