Commit Graph

314 Commits

Author SHA1 Message Date
5fa6f2f151 Update embedding_model.py (#8836)
### What problem does this PR solve?

Remove useless convert for bge encode_queries

### Type of change

- [x] Performance Improvement
2025-07-15 14:04:58 +08:00
5383e254c4 Perf:Remove Useless Convert When BGE Embedding (#8816)
### What problem does this PR solve?

FlagModel internally supports returning results as numpy arrays, so the extra conversion is unnecessary.
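
A minimal sketch of the change, assuming FlagEmbedding's `FlagModel` (the model name and call shape are illustrative, not the exact ragflow code):

```python
# Illustrative sketch: FlagModel.encode already returns a numpy array, so an
# extra np.array(...) conversion around it is redundant.
from FlagEmbedding import FlagModel  # assumed dependency

model = FlagModel("BAAI/bge-large-zh-v1.5", use_fp16=True)

def encode(texts: list[str]):
    # Before (hypothetically): vecs = np.array(model.encode(texts))
    # After: use the numpy array returned by FlagModel directly
    return model.encode(texts)
```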

### Type of change
- [x] Performance Improvement
2025-07-14 14:02:48 +08:00
07208e519b Fix: Wrong_Input_type_for_Gemin (#8783)
### What problem does this PR solve?

https://github.com/infiniflow/ragflow/issues/8763#issuecomment-3055317110

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-07-11 11:34:04 +08:00
1895667573 Feat: add xAI provider (#8781)
### What problem does this PR solve?

Add xAI provider (experimental feature, requires user feedback).

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-07-11 10:35:23 +08:00
8281ceb406 Refa: refine retry gap. (#8773)
### What problem does this PR solve?

### Type of change

- [x] Refactoring
- [x] Performance Improvement
2025-07-10 14:28:57 +08:00
8d027813f5 Refactor: Improve How To Handle QWenEmbed (#8765)
### What problem does this PR solve?

Based on https://github.com/infiniflow/ragflow/issues/8740 
1. Better handling of the "'NoneType' object is not subscriptable" error (see the sketch below)
2. Add some logs to capture the internal error message
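
A hedged sketch of the defensive handling described above, assuming the dashscope `TextEmbedding` API (field names in the real `QWenEmbed` code may differ):

```python
# Illustrative guard-and-log pattern; not the exact QWenEmbed implementation.
import logging
import dashscope

def embed_batch(texts: list[str], model: str = "text-embedding-v2"):
    resp = dashscope.TextEmbedding.call(model=model, input=texts)
    if not resp.output:
        # Previously: resp["output"]["embeddings"] raised
        # "'NoneType' object is not subscriptable" with no useful context.
        logging.error("QWen embedding failed: code=%s message=%s", resp.code, resp.message)
        raise RuntimeError(f"QWen embedding returned no output: {resp.message}")
    return [d["embedding"] for d in resp.output["embeddings"]]
```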

### Type of change

- [x] Refactoring
2025-07-10 10:30:18 +08:00
19419281c3 Fix: Change Ollama Embedding Keep Alive (#8734)
### What problem does this PR solve?
https://github.com/infiniflow/ragflow/issues/8733

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-07-09 12:17:26 +08:00
e60ec0a31b Fix:disallowed special token while embedding (#8692)
### What problem does this PR solve?

https://github.com/infiniflow/ragflow/issues/8567

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-07-07 14:13:37 +08:00
9580e99650 fix: retry embedding with Qwen family models when limits temporarily reached. (#8690)
fix: retry embedding with Qwen family models when limits temporarily
reached.

APIs of Qwen-family models are rate limited. When the limit is reached, the "output" attribute of the "resp" will be None, which in turn causes a TypeError when trying to retrieve "embeddings". Since these limits are almost always temporary, I have added a simple retry mechanism to avoid the failure. Besides, if retry_max is reached, the error is raised early instead of being hidden behind a "TypeError".

### What problem does this PR solve?

Sometimes Qwen blocks calls due to rate limits, which causes the whole parsing procedure to stop when creating a knowledge base. In this situation, resp["output"] will be None, and resp["output"]["embeddings"] will raise a TypeError. Since the limits are temporary, I apply a simple retry mechanism (sketched below) to work around it.
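
A minimal retry sketch for the behaviour described above, again assuming the dashscope `TextEmbedding` API; the retry count and backoff values are illustrative:

```python
# Illustrative retry loop; not the exact code merged in this PR.
import time
import dashscope

def embed_with_retry(texts, model="text-embedding-v2", retry_max=5):
    resp = None
    for attempt in range(retry_max):
        resp = dashscope.TextEmbedding.call(model=model, input=texts)
        if resp.output is not None:
            return resp.output["embeddings"]
        time.sleep(2 ** attempt)  # limit hit: output is None, back off and retry
    # raise early with a clear message instead of a hidden TypeError
    raise RuntimeError(f"QWen embedding still rate-limited after {retry_max} retries: {resp.message}")
```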

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-07-07 12:15:52 +08:00
f8a6987f1e Refa: automatic LLMs registration (#8651)
### What problem does this PR solve?

Support automatic LLMs registration.

### Type of change

- [x] Refactoring
2025-07-03 19:05:31 +08:00
fffb7c0bba Fix: anthropic llm issue. (#8633)
### What problem does this PR solve?

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-07-02 18:37:34 +08:00
898da23caa make dirs with 'exist_ok=True' (#8629)
### What problem does this PR solve?

The following error occurred during local testing; it is fixed by passing 'exist_ok=True' when creating the directory.

```log
set_progress(7461edc2535c11f0a2aa0242c0a82009), progress: -1, progress_msg: 21:41:41 Page(1~100000001): [ERROR][Errno 17] File exists: '/ragflow/tmp'
```
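
A one-line illustration of the fix (the path comes from the log above; the surrounding code is assumed):

```python
import os

# exist_ok=True makes the call idempotent, so a pre-existing /ragflow/tmp no
# longer raises "[Errno 17] File exists".
os.makedirs("/ragflow/tmp", exist_ok=True)
```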

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-07-02 18:35:16 +08:00
d343cb4deb Add Google Cloud Vision API Integration (Image2Text) (#8608)
### What problem does this PR solve?

This PR introduces Google Cloud Vision API integration to enhance image
understanding capabilities in the application. It addresses the need for
advanced image description and chat functionalities by implementing a
new `GoogleCV` class to handle API interactions and updating relevant
configurations. This enables users to leverage Google Cloud Vision for
image-to-text tasks, improving the application's ability to process and
interpret visual data.
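
The snippet below is only a generic illustration of calling the Google Cloud Vision API with the official client library; it is not the `GoogleCV` class added in this PR, whose internals may differ:

```python
# Generic Google Cloud Vision call (credentials come from
# GOOGLE_APPLICATION_CREDENTIALS); illustrative only.
from google.cloud import vision

client = vision.ImageAnnotatorClient()
with open("photo.jpg", "rb") as f:
    image = vision.Image(content=f.read())

response = client.label_detection(image=image)  # describe what the image contains
for label in response.label_annotations:
    print(label.description, round(label.score, 2))
```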

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-07-02 10:02:01 +08:00
1c77b4ed9b fix: Correctly format message parts in GoogleChat (#8596)
### What problem does this PR solve?

This PR addresses an incompatibility issue with the Google Chat API by
correcting the message content format in the `GoogleChat` class.
Previously, the content was directly assigned to the "parts" field,
which did not align with the API's expected format. This change ensures
that messages are properly formatted with a "text" key within a
dictionary, as required by the API.
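
A before/after sketch of the payload shape described above; the exact surrounding fields are assumptions, not the actual `GoogleChat` code:

```python
# Illustration of the formatting fix: "parts" must hold dicts with a "text" key.
content = "Summarize this document."

# Before: the raw string was assigned directly to "parts"
message_before = {"role": "user", "parts": content}

# After: each part is a dict carrying a "text" key, as the API expects
message_after = {"role": "user", "parts": [{"text": content}]}
```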

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-07-01 14:06:07 +08:00
d46c24045f Feat: add GiteeAI as a llm provider. (#8572)
### What problem does this PR solve?

#1853

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-06-30 11:22:11 +08:00
aafeffa292 Feat: add gitee as LLM provider. (#8545)
### What problem does this PR solve?


### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-06-30 09:22:31 +08:00
e441c17c2c Refa: limit embedding concurrency and fix chat_with_tool (#8543)
### What problem does this PR solve?

#8538

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
2025-06-27 19:28:41 +08:00
a10f05f4d7 Fix: chat with tools bug. (#8528)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-27 12:10:53 +08:00
340354b79c fix the error 'Unknown field for GenerationConfig: max_tokens' when u… (#8473)
### What problem does this PR solve?
https://github.com/infiniflow/ragflow/issues/8324

docker image version: v0.19.1

The `_clean_conf` function was not called in the `_chat` and `chat_streamly` methods of the `GeminiChat` class, causing the error "Unknown field for GenerationConfig: max_tokens" when the default LLM config includes the "max_tokens" parameter.

**Buggy Code(ragflow/rag/llm/chat_model.py)**
```python
class GeminiChat(Base):
    def __init__(self, key, model_name, base_url=None, **kwargs):
        super().__init__(key, model_name, base_url=base_url, **kwargs)

        from google.generativeai import GenerativeModel, client

        client.configure(api_key=key)
        _client = client.get_default_generative_client()
        self.model_name = "models/" + model_name
        self.model = GenerativeModel(model_name=self.model_name)
        self.model._client = _client

    def _clean_conf(self, gen_conf):
        for k in list(gen_conf.keys()):
            if k not in ["temperature", "top_p"]:
                del gen_conf[k]
        return gen_conf

    def _chat(self, history, gen_conf):
        from google.generativeai.types import content_types

        system = history[0]["content"] if history and history[0]["role"] == "system" else ""
        hist = []
        for item in history:
            if item["role"] == "system":
                continue
            hist.append(deepcopy(item))
            item = hist[-1]
            if "role" in item and item["role"] == "assistant":
                item["role"] = "model"
            if "role" in item and item["role"] == "system":
                item["role"] = "user"
            if "content" in item:
                item["parts"] = item.pop("content")

        if system:
            self.model._system_instruction = content_types.to_content(system)
        response = self.model.generate_content(hist, generation_config=gen_conf)
        ans = response.text
        return ans, response.usage_metadata.total_token_count

    def chat_streamly(self, system, history, gen_conf):
        from google.generativeai.types import content_types

        if system:
            self.model._system_instruction = content_types.to_content(system)
        # bug: _clean_conf is never called here, and this local filter still keeps max_tokens
        for k in list(gen_conf.keys()):
            if k not in ["temperature", "top_p", "max_tokens"]:
                del gen_conf[k]
        for item in history:
            if "role" in item and item["role"] == "assistant":
                item["role"] = "model"
            if "content" in item:
                item["parts"] = item.pop("content")
        ans = ""
        try:
            response = self.model.generate_content(history, generation_config=gen_conf, stream=True)
            for resp in response:
                ans = resp.text
                yield ans

            yield response._chunks[-1].usage_metadata.total_token_count
        except Exception as e:
            yield ans + "\n**ERROR**: " + str(e)

        yield 0
```
**Fixed code: call the `_clean_conf` function**
```python
class GeminiChat(Base):
    def __init__(self, key, model_name, base_url=None, **kwargs):
        super().__init__(key, model_name, base_url=base_url, **kwargs)

        from google.generativeai import GenerativeModel, client

        client.configure(api_key=key)
        _client = client.get_default_generative_client()
        self.model_name = "models/" + model_name
        self.model = GenerativeModel(model_name=self.model_name)
        self.model._client = _client

    def _clean_conf(self, gen_conf):
        for k in list(gen_conf.keys()):
            if k not in ["temperature", "top_p"]:
                del gen_conf[k]
        return gen_conf

    def _chat(self, history, gen_conf):
        from google.generativeai.types import content_types
        # call _clean_conf to drop unsupported parameters such as max_tokens
        gen_conf = self._clean_conf(gen_conf)

        system = history[0]["content"] if history and history[0]["role"] == "system" else ""
        hist = []
        for item in history:
            if item["role"] == "system":
                continue
            hist.append(deepcopy(item))
            item = hist[-1]
            if "role" in item and item["role"] == "assistant":
                item["role"] = "model"
            if "role" in item and item["role"] == "system":
                item["role"] = "user"
            if "content" in item:
                item["parts"] = item.pop("content")

        if system:
            self.model._system_instruction = content_types.to_content(system)
        response = self.model.generate_content(hist, generation_config=gen_conf)
        ans = response.text
        return ans, response.usage_metadata.total_token_count

    def chat_streamly(self, system, history, gen_conf):
        from google.generativeai.types import content_types
        # call _clean_conf to drop unsupported parameters such as max_tokens
        gen_conf = self._clean_conf(gen_conf)

        if system:
            self.model._system_instruction = content_types.to_content(system)
        # removed the duplicate local parameter filtering loop ("for k in list(gen_conf.keys()): ...")
        for item in history:
            if "role" in item and item["role"] == "assistant":
                item["role"] = "model"
            if "content" in item:
                item["parts"] = item.pop("content")
        ans = ""
        try:
            response = self.model.generate_content(history, generation_config=gen_conf, stream=True)
            for resp in response:
                ans = resp.text
                yield ans

            yield response._chunks[-1].usage_metadata.total_token_count
        except Exception as e:
            yield ans + "\n**ERROR**: " + str(e)

        yield 0
```

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-06-25 16:23:35 +08:00
49d67cbcb7 fix a bug when using huggingface embedding api (#8432)
### What problem does this PR solve?

image_version: v0.19.1
This PR fixes a bug in the HuggingFaceEmbed embedding API method that was causing `AssertionError: assert len(vects) == len(docs)` during the document embedding process.

#### Problem
The HuggingFaceEmbed.encode() method had an early return statement
inside the for loop, causing it to return after processing only the
first text input instead of processing all texts in the input list.

**Error Message**
```python
AssertionError: assert len(vects) == len(docs) # input chunks  != embedded  vectors from embedding api
File "/ragflow/rag/svr/task_executor.py", line 442, in embedding
```



**Buggy code(/ragflow/rag/llm/embedding_model.py)**
```python
class HuggingFaceEmbed(Base):
    def __init__(self, key, model_name, base_url=None):
        if not model_name:
            raise ValueError("Model name cannot be None")
        self.key = key
        self.model_name = model_name.split("___")[0]
        self.base_url = base_url or "http://127.0.0.1:8080"

    def encode(self, texts: list):
        embeddings = []
        for text in texts:
            response = requests.post(...)
            if response.status_code == 200:
                try:
                    embedding = response.json()
                    embeddings.append(embedding[0])
                    # Early return: exits after the first text is embedded
                    return np.array(embeddings), sum([num_tokens_from_string(text) for text in texts])
                except Exception as _e:
                    log_exception(_e, response)
            else:
                raise Exception(...)
```
**Fixed Code (this function is simply rolled back to the v0.19.0 version)**
```python
class HuggingFaceEmbed(Base):
    def __init__(self, key, model_name, base_url=None):
        if not model_name:
            raise ValueError("Model name cannot be None")
        self.key = key
        self.model_name = model_name.split("___")[0]
        self.base_url = base_url or "http://127.0.0.1:8080"

    def encode(self, texts: list):
        embeddings = []
        for text in texts:
            response = requests.post(...)
            if response.status_code == 200:
                embedding = response.json()
                embeddings.append(embedding[0])  # Only append, no early return
            else:
                raise Exception(...)
        return np.array(embeddings), sum([num_tokens_from_string(text) for text in texts])  # Return after processing all texts
```
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-24 09:35:02 +08:00
fd7ac17605 Feat: Scratch MCP tool calling support. (#8263)
### What problem does this PR solve?

This is a cherry-pick from #7781 as requested.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-06-23 17:45:35 +08:00
244d8a47b9 Fix: AzureChat model code (#8426)
### What problem does this PR solve?

- Simplify AzureChat constructor by passing base_url directly
- Clean up spacing and formatting in chat_model.py
- Remove redundant parentheses and improve code consistency
- #8423

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-23 15:59:25 +08:00
ef5e7d8c44 Fix:embedding_model class SILICONFLOWEmbed(Base)Function reusing json (#8378)
### What problem does this PR solve?

https://github.com/infiniflow/ragflow/issues/8360

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-20 11:13:00 +08:00
35034fed73 Fix: Raptor: [Bug]: **ERROR**: Unknown field for GenerationConfig: max_tokens (#8331)
### What problem does this PR solve?

https://github.com/infiniflow/ragflow/issues/8324

### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-18 16:40:57 +08:00
b1117a8717 Fix: base url issue. (#8281)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-16 13:40:25 +08:00
65d5268439 Feat: implement novitaAI embedding and reranking. (#8250)
### What problem does this PR solve?

Close #8227

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-06-13 15:42:17 +08:00
d36c8d18b1 Refa: make exception more clear. (#8224)
### What problem does this PR solve?

#8156

### Type of change
- [x] Refactoring
2025-06-12 17:53:59 +08:00
d5236b71f4 Refa: ollama keep alive issue. (#8216)
### What problem does this PR solve?

#8122

### Type of change

- [x] Refactoring
2025-06-12 15:09:40 +08:00
56ee69e9d9 Refa: chat with tools. (#8210)
### What problem does this PR solve?


### Type of change
- [x] Refactoring
2025-06-12 12:31:10 +08:00
1a5f991d86 Fix: auto-keyword and auto-question fail with qwq model (#8190)
### What problem does this PR solve?

Fix auto-keyword and auto-question fail with qwq model. #8189 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-12 11:37:07 +08:00
69e1fc496d Refa: chat models (#8187)
### What problem does this PR solve?


### Type of change

- [x] Refactoring
2025-06-11 17:20:12 +08:00
a43adafc6b Refa: Add error handling for JSON decode in embedding models (#8162)
### What problem does this PR solve?

Improve robustness of Jina, Nvidia, and SILICONFLOW embedding models by:
1. Adding try-catch blocks for JSON decode errors
2. Logging error details including response content
3. Raising exceptions with meaningful error messages
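
A minimal sketch of the pattern listed above, assuming a requests-based embedding client; helper and field names are illustrative, not the exact ragflow code:

```python
import logging
import requests

def fetch_embeddings(url: str, payload: dict) -> list:
    response = requests.post(url, json=payload, timeout=60)
    try:
        data = response.json()  # raises if the body is not valid JSON
    except Exception as e:
        # log the offending response so the failure is diagnosable
        logging.error("Non-JSON embedding response: %s", response.text[:500])
        raise RuntimeError(f"Embedding endpoint returned invalid JSON: {e}") from e
    return data["data"]
```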

### Type of change

- [x] Refactoring
2025-06-10 19:04:17 +08:00
7ed9efcd4e Fix: QWenCV issue. (#8106)
### What problem does this PR solve?

Close #8097

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-06 17:55:13 +08:00
156290f8d0 Fix: url path join issue. (#8013)
### What problem does this PR solve?

Close #7980

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-03 14:18:40 +08:00
a1f06a4fdc Feat: Support tool calling in Generate component (#7572)
### What problem does this PR solve?

Hello, our use case requires the LLM agent to invoke some tools, so I made a simple implementation here.

This PR does two things:

1. A simple plugin mechanism based on `pluginlib`:

This mechanism lives in the `plugin` directory. It will only load
plugins from `plugin/embedded_plugins` for now.

A sample plugin, `bad_calculator.py`, is placed in `plugin/embedded_plugins/llm_tools`; it accepts two numbers `a` and `b` and deliberately returns a wrong result, `a + b + 100` (a hypothetical sketch of such a plugin appears after the screenshots below).

In the future, the mechanism can load plugins from external locations with little code change.

Plugins are divided into different types. The only plugin type supported in this PR is `llm_tools`, which must implement the `LLMToolPlugin` class defined in `plugin/llm_tool_plugin.py`. More plugin types can be added in the future.

2. A tool selector in the `Generate` component:

Added a tool selector to select one or more tools for LLM:


![image](https://github.com/user-attachments/assets/74a21fdf-9333-4175-991b-43df6524c5dc)

And with the `bad_calculator` tool, it results this with the `qwen-max`
model:


![image](https://github.com/user-attachments/assets/93aff9c4-8550-414a-90a2-1a15a5249d94)
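
As a purely hypothetical illustration of what an `llm_tools` plugin might look like (the actual `LLMToolPlugin` interface in `plugin/llm_tool_plugin.py` may differ), consider:

```python
# Hypothetical sketch; a real plugin would subclass LLMToolPlugin from
# plugin.llm_tool_plugin and be discovered by pluginlib.
class BadCalculatorPlugin:
    name = "bad_calculator"
    description = "Adds two numbers (deliberately off by 100, for testing tool calls)."

    def get_parameters(self) -> dict:
        # JSON-schema-like description the LLM sees when deciding to call the tool
        return {
            "a": {"type": "number", "description": "first operand"},
            "b": {"type": "number", "description": "second operand"},
        }

    def invoke(self, a: float, b: float) -> str:
        # intentionally wrong so it is obvious the tool result was actually used
        return str(a + b + 100)
```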


### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2025-05-16 16:32:19 +08:00
5b626870d0 Refa: remove ollama keep alive. (#7560)
### What problem does this PR solve?

#7518

### Type of change

- [x] Refactoring
2025-05-09 17:51:49 +08:00
65537b8200 Fix:Set CUDA_VISIBLE_DEVICES In DefaultEmbedding (#7465)
### What problem does this PR solve?
https://github.com/infiniflow/ragflow/issues/7420

### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2025-05-06 14:38:36 +08:00
23dcbc94ef feat: replace models of novita (#7360)
### What problem does this PR solve?

Replace models of novita

### Type of change

- [x] Other (please describe): Replace models of novita
2025-04-28 13:35:09 +08:00
97a13ef1ab Fix: Qwen-vl-plus url error (#7281)
### What problem does this PR solve?

Fix Qwen-vl-* url error. #7277

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-25 09:20:10 +08:00
a008b38cf5 Fix: local variable referenced before assignment (#6909)
### What problem does this PR solve?

Fix: local variable referenced before assignment. #6803 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-09 20:29:12 +08:00
dc2c74b249 Feat: add primitive support for function calls (#6840)
### What problem does this PR solve?

This PR introduces **primitive support for function calls**, enabling the system to handle basic function call capabilities. However, this feature is currently experimental and **not yet enabled for general use**, as it is only supported by a subset of models, namely, Qwen and OpenAI models.
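
For context, a minimal, illustrative example of exercising function calling through the OpenAI Python SDK; this is not ragflow's internal implementation, and the tool definition is made up:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)  # the requested function call, if any
```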

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-04-08 16:09:03 +08:00
e7a2a4b7ff Log llm response on exception (#6750)
### What problem does this PR solve?

Log llm response on exception

### Type of change

- [x] Refactoring
2025-04-02 17:10:57 +08:00
46b5e32cd7 Feat: support vision llm for gpustack (#6636)
### What problem does this PR solve?
https://github.com/infiniflow/ragflow/issues/6138

This PR adds vision LLM support for GPUStack and modifies the URL path from `/v1-openai` to `/v1`.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-03-31 15:33:52 +08:00
c61df5dd25 Dynamic Context Window Size for Ollama Chat (#6582)
# Dynamic Context Window Size for Ollama Chat

## Problem Statement
Previously, the Ollama chat implementation used a fixed context window
size of 32768 tokens. This caused two main issues:
1. Performance degradation due to unnecessarily large context windows
for small conversations
2. Potential business logic failures when using smaller fixed sizes
(e.g., 2048 tokens)

## Solution
Implemented a dynamic context window size calculation that:
1. Uses a base context size of 8192 tokens
2. Applies a 1.2x buffer ratio to the total token count
3. Adds multiples of 8192 tokens based on the buffered token count
4. Implements a smart context size update strategy

## Implementation Details

### Token Counting Logic
```python
def count_tokens(text):
    """Calculate token count for text"""
    # Simple calculation: 1 token per ASCII character
    # 2 tokens for non-ASCII characters (Chinese, Japanese, Korean, etc.)
    total = 0
    for char in text:
        if ord(char) < 128:  # ASCII characters
            total += 1
        else:  # Non-ASCII characters
            total += 2
    return total
```

### Dynamic Context Calculation
```python
def _calculate_dynamic_ctx(self, history):
    """Calculate dynamic context window size"""
    # Calculate total tokens for all messages
    total_tokens = 0
    for message in history:
        content = message.get("content", "")
        content_tokens = count_tokens(content)
        role_tokens = 4  # Role marker token overhead
        total_tokens += content_tokens + role_tokens

    # Apply 1.2x buffer ratio
    total_tokens_with_buffer = int(total_tokens * 1.2)
    
    # Calculate context size in multiples of 8192
    if total_tokens_with_buffer <= 8192:
        ctx_size = 8192
    else:
        ctx_multiplier = (total_tokens_with_buffer // 8192) + 1
        ctx_size = ctx_multiplier * 8192
    
    return ctx_size
```
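
For example, a history totaling 10,000 tokens becomes 12,000 after the 1.2x buffer; since 12,000 // 8192 + 1 = 2, the context window is set to 2 * 8192 = 16,384 tokens.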

### Integration in Chat Method
```python
def chat(self, system, history, gen_conf):
    if system:
        history.insert(0, {"role": "system", "content": system})
    if "max_tokens" in gen_conf:
        del gen_conf["max_tokens"]
    try:
        # Calculate new context size
        new_ctx_size = self._calculate_dynamic_ctx(history)
        
        # Prepare options with context size
        options = {
            "num_ctx": new_ctx_size
        }
        # Add other generation options
        if "temperature" in gen_conf:
            options["temperature"] = gen_conf["temperature"]
        if "max_tokens" in gen_conf:
            options["num_predict"] = gen_conf["max_tokens"]
        if "top_p" in gen_conf:
            options["top_p"] = gen_conf["top_p"]
        if "presence_penalty" in gen_conf:
            options["presence_penalty"] = gen_conf["presence_penalty"]
        if "frequency_penalty" in gen_conf:
            options["frequency_penalty"] = gen_conf["frequency_penalty"]
            
        # Make API call with dynamic context size
        response = self.client.chat(
            model=self.model_name,
            messages=history,
            options=options,
            keep_alive=60
        )
        return response["message"]["content"].strip(), response.get("eval_count", 0) + response.get("prompt_eval_count", 0)
    except Exception as e:
        return "**ERROR**: " + str(e), 0
```

## Benefits
1. **Improved Performance**: Uses appropriate context windows based on
conversation length
2. **Better Resource Utilization**: Context window size scales with
content
3. **Maintained Compatibility**: Works with existing business logic
4. **Predictable Scaling**: Context growth in 8192-token increments
5. **Smart Updates**: Context size updates are optimized to reduce
unnecessary model reloads

## Future Considerations
1. Fine-tune buffer ratio based on usage patterns
2. Add monitoring for context window utilization
3. Consider language-specific token counting optimizations
4. Implement adaptive threshold based on conversation patterns
5. Add metrics for context size update frequency

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-03-28 12:38:27 +08:00
d2043ff9f2 Fix: LmStudioChat issue. (#6591)
### What problem does this PR solve?

#6577

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-27 14:59:15 +08:00
df3890827d Refa: change LLM chat output from full to delta (incremental) (#6534)
### What problem does this PR solve?

Change LLM chat output from full to delta (incremental)
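
A minimal, hypothetical sketch of the difference (not the actual ragflow chat-model code):

```python
# Full output re-sends the whole accumulated answer each step; delta output
# sends only the new increment.
def stream_full(chunks):
    ans = ""
    for piece in chunks:
        ans += piece
        yield ans    # before: everything so far, every time

def stream_delta(chunks):
    for piece in chunks:
        yield piece  # after: only the incremental piece
```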

### Type of change

- [x] Refactoring
2025-03-26 19:33:14 +08:00
cc8029a732 Fix: uploading in chat box issue. (#6547)
### What problem does this PR solve?

#6228

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-26 15:37:48 +08:00
12ad746ee6 Fix: Bedrock model invocation error. (#6533)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-26 11:27:12 +08:00
60c3a253ad Fix: api-key issue for xinference. (#6490)
### What problem does this PR solve?

#2792

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-25 15:01:13 +08:00
095fc84cf2 Fix: claude max tokens. (#6484)
### What problem does this PR solve?

#6458

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-25 10:41:55 +08:00