192 Commits

Author SHA1 Message Date
a1f848bfe0 Fix:max_tokens must be at least 1, got -950, BadRequestError (#10252)
### What problem does this PR solve?
https://github.com/infiniflow/ragflow/issues/10235
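
An error like the one in the title can arise when a completion budget is computed by subtraction and goes negative. Below is a minimal sketch of one way to guard against that, assuming the budget is the model's context window minus the prompt length. The helper name `safe_max_tokens` and its parameters are hypothetical and are not taken from this PR.

```python
def safe_max_tokens(context_window: int, prompt_tokens: int, floor: int = 1) -> int:
    """Return a completion budget that is never below `floor`.

    Hypothetical helper: the names and the subtraction rule are assumptions
    made for illustration, not the code changed in this PR.
    """
    remaining = context_window - prompt_tokens
    # Clamp so the value sent as max_tokens is always at least `floor`.
    return max(floor, remaining)
```

In practice a caller may prefer to truncate the prompt instead of clamping, since a budget of one token rarely yields a useful answer.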

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
2025-09-24 10:49:34 +08:00
38be53cf31 fix: prevent list index out of range in chat streaming (#10238)
### What problem does this PR solve?
issue:
[Bug]: ERROR: list index out of range #10188
change:
fix a potential list index out of range error in chat response parsing
by adding explicit checks for empty choices.
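
The kind of check described above might look like the following sketch. It assumes streamed chunks expose a `choices` list whose items carry a `delta` with a `content` field, as OpenAI-style responses do. The helper name `first_delta_text` is hypothetical, and the `SimpleNamespace` objects are stand-ins for real chunks.

```python
from types import SimpleNamespace


def first_delta_text(chunk) -> str:
    """Return the first choice's delta text, or "" when there is none."""
    choices = getattr(chunk, "choices", None)
    if not choices:  # covers both None and an empty list
        return ""
    delta = getattr(choices[0], "delta", None)
    return getattr(delta, "content", None) or ""


# Stand-in chunks for illustration only.
empty_chunk = SimpleNamespace(choices=[])
text_chunk = SimpleNamespace(
    choices=[SimpleNamespace(delta=SimpleNamespace(content="hi"))]
)
```

Returning an empty string lets the streaming loop skip such chunks without special handling.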

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-09-23 19:59:39 +08:00
10cbbb76f8 revert gpt5 integration (#10228)
### What problem does this PR solve?

  Revert back to chat.completions.

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [x] Other (please describe):
  Revert back to chat.completions.
2025-09-23 16:06:12 +08:00
1c84d1b562 Fix: azure OpenAI retry (#10213)
### What problem does this PR solve?

Currently, Azure OpenAI returns one-minute quota-limit responses when
the chat API is used. This change is needed to be able to
process almost any document using models deployed in Azure Foundry.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-09-23 12:19:28 +08:00
da82566304 Fix: resolve hash collisions by switching to UUID &correct logic for always-true statements & Update GPT api integration & Support qianwen-deepresearch (#10208)
### What problem does this PR solve?

Fix: resolve hash collisions by switching to UUID & correct logic for
always-true statements, solved: #10165
Feat: Update GPT API integration, solved: #10204
Feat: Support qianwen-deepresearch, solved: #10163
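
A minimal sketch of the UUID approach named in the first item. The PR text does not say which values were previously hashed, so the helper name `new_id` and its use are assumptions for illustration.

```python
import uuid


def new_id() -> str:
    """Return a random 32-character hex identifier (UUID version 4)."""
    # Hypothetical helper, not the identifier scheme from the PR itself.
    return uuid.uuid4().hex


# Generating many identifiers should not produce duplicates in practice.
ids = {new_id() for _ in range(1000)}
```

Unlike a hash of the content, a random UUID does not depend on the input, so two different inputs cannot map to the same identifier by construction of the hash function.
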
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
2025-09-23 09:34:30 +08:00
94dbd4aac9 Refactor: use the same implement for total token count from res (#10197)
### What problem does this PR solve?
Use the same implementation to get the total token count from the response.

### Type of change

- [x] Refactoring
2025-09-22 17:17:06 +08:00
4693c5382a Feat: migrate OpenAI-compatible chats to LiteLLM (#10148)
### What problem does this PR solve?

Migrate OpenAI-compatible chats to LiteLLM.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-09-18 17:16:59 +08:00
f12b9fdcd4 Feat: add CometAPI to LLMFactory and update related mappings (#10119)
### Related issues
#10078

### What problem does this PR solve?
Integrate CometAPI provider.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
2025-09-18 09:51:29 +08:00
e1d86cfee3 Feat: add TokenPony model provider (#9932)
### What problem does this PR solve?

Add TokenPony as an LLM provider.

Co-authored-by: huangzl <huangzl@shinemo.com>
2025-09-11 17:25:31 +08:00
936f27e9e5 Feat: add LongCat-Flash-Chat (#9973)
### What problem does this PR solve?

Add LongCat-Flash-Chat from Meituan, deepseek v3.1 from SiliconFlow,
kimi-k2-09-05-preview and kimi-k2-turbo-preview from Moonshot.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-09-08 19:00:52 +08:00
91d6fb8061 Fix miscalculated token count (#9776)
### What problem does this PR solve?

The total token count was incorrectly accumulated when using the
OpenAI-API-Compatible API.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-09-05 19:17:21 +08:00
b58e882eaa Feat: add exponential back-off for Chat LiteLLM (#9880)
### What problem does this PR solve?

Add exponential back-off for Chat LiteLLM. #9858.
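
The idea of exponential back-off can be sketched as below. This is not the code from the PR: the function names, the default base, cap, and retry count, and the choice to catch every `Exception` are all assumptions made for illustration. A real implementation would likely retry only on rate-limit or transient errors.

```python
import random
import time


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  jitter: float = 0.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    # The delay doubles on every attempt and is capped at `cap`.
    delay = min(cap, base * (2 ** attempt))
    # Optional random jitter spreads out retries from concurrent callers.
    return delay + random.uniform(0.0, jitter)


def call_with_backoff(fn, max_retries: int = 5, sleep=time.sleep, **delay_kwargs):
    """Call `fn`, retrying with exponential back-off when it raises."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            # Give up and re-raise once the retry budget is exhausted.
            if attempt == max_retries:
                raise
            sleep(backoff_delay(attempt, **delay_kwargs))
```

Passing `sleep` as a parameter keeps the helper easy to test, since a test can substitute a function that does not actually wait.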

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-09-03 13:31:43 +08:00
56cd576876 Refa: revise the implementation of LightRAG and enable response caching (#9828)
### What problem does this PR solve?

This revision performed a comprehensive check on LightRAG to ensure the
correctness of its implementation. It **did not involve** Entity
Resolution and Community Reports Generation. There is an example using
default entity types and the General chunking method, which shows good
results in both time and effectiveness. Moreover, response caching is
enabled for resuming failed tasks.


[The-Necklace.pdf](https://github.com/user-attachments/files/22042432/The-Necklace.pdf)

After:


![img_v3_02pk_177dbc6a-e7cc-4732-b202-ad4682d171fg](https://github.com/user-attachments/assets/5ef1d93a-9109-4fe9-8a7b-a65add16f82b)


```bash
Begin at:
Fri, 29 Aug 2025 16:48:03 GMT
Duration:
222.31 s
Progress:
16:48:04 Task has been received.
16:48:06 Page(1~7): Start to parse.
16:48:06 Page(1~7): OCR started
16:48:08 Page(1~7): OCR finished (1.89s)
16:48:11 Page(1~7): Layout analysis (3.72s)
16:48:11 Page(1~7): Table analysis (0.00s)
16:48:11 Page(1~7): Text merged (0.00s)
16:48:11 Page(1~7): Finish parsing.
16:48:12 Page(1~7): Generate 7 chunks
16:48:12 Page(1~7): Embedding chunks (0.29s)
16:48:12 Page(1~7): Indexing done (0.04s). Task done (7.84s)
16:48:17 Start processing for f421fb06849e11f0bdd32724b93a52b2: She had no dresses, no je...
16:48:17 Start processing for f421fb06849e11f0bdd32724b93a52b2: Her husband, already half...
16:48:17 Start processing for f421fb06849e11f0bdd32724b93a52b2: And this life lasted ten ...
16:48:17 Start processing for f421fb06849e11f0bdd32724b93a52b2: Then she asked, hesitatin...
16:49:30 Completed processing for f421fb06849e11f0bdd32724b93a52b2: She had no dresses, no je... after 1 gleanings, 21985 tokens.
16:49:30 Entities extraction of chunk 3 1/7 done, 12 nodes, 13 edges, 21985 tokens.
16:49:40 Completed processing for f421fb06849e11f0bdd32724b93a52b2: Finally, she replied, hes... after 1 gleanings, 22584 tokens.
16:49:40 Entities extraction of chunk 5 2/7 done, 19 nodes, 19 edges, 22584 tokens.
16:50:02 Completed processing for f421fb06849e11f0bdd32724b93a52b2: Then she asked, hesitatin... after 1 gleanings, 24610 tokens.
16:50:02 Entities extraction of chunk 0 3/7 done, 16 nodes, 28 edges, 24610 tokens.
16:50:03 Completed processing for f421fb06849e11f0bdd32724b93a52b2: And this life lasted ten ... after 1 gleanings, 24031 tokens.
16:50:04 Entities extraction of chunk 1 4/7 done, 24 nodes, 22 edges, 24031 tokens.
16:50:14 Completed processing for f421fb06849e11f0bdd32724b93a52b2: So they begged the jewell... after 1 gleanings, 24635 tokens.
16:50:14 Entities extraction of chunk 6 5/7 done, 27 nodes, 26 edges, 24635 tokens.
16:50:29 Completed processing for f421fb06849e11f0bdd32724b93a52b2: Her husband, already half... after 1 gleanings, 25758 tokens.
16:50:29 Entities extraction of chunk 2 6/7 done, 25 nodes, 35 edges, 25758 tokens.
16:51:35 Completed processing for f421fb06849e11f0bdd32724b93a52b2: The Necklace By Guy de Ma... after 1 gleanings, 27491 tokens.
16:51:35 Entities extraction of chunk 4 7/7 done, 39 nodes, 37 edges, 27491 tokens.
16:51:35 Entities and relationships extraction done, 147 nodes, 177 edges, 171094 tokens, 198.58s.
16:51:35 Entities merging done, 0.01s.
16:51:35 Relationships merging done, 0.01s.
16:51:35 ignored 7 relations due to missing entities.
16:51:35 generated subgraph for doc f421fb06849e11f0bdd32724b93a52b2 in 198.68 seconds.
16:51:35 run_graphrag f421fb06849e11f0bdd32724b93a52b2 graphrag_task_lock acquired
16:51:35 set_graph removed 0 nodes and 0 edges from index in 0.00s.
16:51:35 Get embedding of nodes: 9/147
16:51:35 Get embedding of nodes: 109/147
16:51:37 Get embedding of edges: 9/170
16:51:37 Get embedding of edges: 109/170
16:51:40 set_graph converted graph change to 319 chunks in 4.21s.
16:51:40 Insert chunks: 4/319
16:51:40 Insert chunks: 104/319
16:51:40 Insert chunks: 204/319
16:51:40 Insert chunks: 304/319
16:51:40 set_graph added/updated 147 nodes and 170 edges from index in 0.53s.
16:51:40 merging subgraph for doc f421fb06849e11f0bdd32724b93a52b2 into the global graph done in 4.79 seconds.
16:51:40 Knowledge Graph done (204.29s)
```

Before:


![img_v3_02pk_63370edf-ecee-4ee8-8ac8-69c8d2c712fg](https://github.com/user-attachments/assets/1162eb0f-68c2-4de5-abe0-cdfa168f71de)

```bash
Begin at:
Fri, 29 Aug 2025 17:00:47 GMT
processDuration:
173.38 s
Progress:
17:00:49 Task has been received.
17:00:51 Page(1~7): Start to parse.
17:00:51 Page(1~7): OCR started
17:00:53 Page(1~7): OCR finished (1.82s)
17:00:57 Page(1~7): Layout analysis (3.64s)
17:00:57 Page(1~7): Table analysis (0.00s)
17:00:57 Page(1~7): Text merged (0.00s)
17:00:57 Page(1~7): Finish parsing.
17:00:57 Page(1~7): Generate 7 chunks
17:00:57 Page(1~7): Embedding chunks (0.31s)
17:00:57 Page(1~7): Indexing done (0.03s). Task done (7.88s)
17:00:57 created task graphrag
17:01:00 Task has been received.
17:02:17 Entities extraction of chunk 1 1/7 done, 9 nodes, 9 edges, 10654 tokens.
17:02:31 Entities extraction of chunk 2 2/7 done, 12 nodes, 13 edges, 11066 tokens.
17:02:33 Entities extraction of chunk 4 3/7 done, 9 nodes, 10 edges, 10433 tokens.
17:02:42 Entities extraction of chunk 5 4/7 done, 11 nodes, 14 edges, 11290 tokens.
17:02:52 Entities extraction of chunk 6 5/7 done, 13 nodes, 15 edges, 11039 tokens.
17:02:55 Entities extraction of chunk 3 6/7 done, 14 nodes, 13 edges, 11466 tokens.
17:03:32 Entities extraction of chunk 0 7/7 done, 19 nodes, 18 edges, 13107 tokens.
17:03:32 Entities and relationships extraction done, 71 nodes, 89 edges, 79055 tokens, 149.66s.
17:03:32 Entities merging done, 0.01s.
17:03:32 Relationships merging done, 0.01s.
17:03:32 ignored 1 relations due to missing entities.
17:03:32 generated subgraph for doc b1d9d3b6848711f0aacd7ddc0714c4d3 in 149.69 seconds.
17:03:32 run_graphrag b1d9d3b6848711f0aacd7ddc0714c4d3 graphrag_task_lock acquired
17:03:32 set_graph removed 0 nodes and 0 edges from index in 0.00s.
17:03:32 Get embedding of nodes: 9/71
17:03:33 Get embedding of edges: 9/88
17:03:34 set_graph converted graph change to 161 chunks in 2.27s.
17:03:34 Insert chunks: 4/161
17:03:34 Insert chunks: 104/161
17:03:34 set_graph added/updated 71 nodes and 88 edges from index in 0.28s.
17:03:34 merging subgraph for doc b1d9d3b6848711f0aacd7ddc0714c4d3 into the global graph done in 2.60 seconds.
17:03:34 Knowledge Graph done (153.18s)

```

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
- [x] Performance Improvement
2025-08-29 17:58:36 +08:00
fcd18d7d87 Fix: Ollama chat cannot access remote deployment (#9816)
### What problem does this PR solve?

Fix Ollama chat being able to access only a localhost instance. #9806.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-08-29 13:35:41 +08:00
b6c1ca828e Refa: replace Chat Ollama implementation with LiteLLM (#9693)
### What problem does this PR solve?

Replace the Chat Ollama implementation with LiteLLM.

### Type of change

- [x] Refactoring
2025-08-25 17:56:31 +08:00
3947da10ae Fix: unexpected LLM parameters (#9661)
### What problem does this PR solve?

Remove unexpected LLM parameters.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-08-22 19:33:09 +08:00
a0c2da1219 Fix: Patch LiteLLM (#9416)
### What problem does this PR solve?

Patch LiteLLM refactor. #9408

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-08-12 15:54:30 +08:00
83771e500c Refa: migrate chat models to LiteLLM (#9394)
### What problem does this PR solve?

All models pass the mock response tests, which means that if a model
returns a correct response, everything should work as expected.
However, not all models have been fully tested in a real environment
with a real API_KEY. I suggest actively monitoring the refactored models
over the coming period to ensure they work correctly and fixing them
step by step, or waiting to merge until most have been tested in a
real environment.

### Type of change

- [x] Refactoring
2025-08-12 10:59:20 +08:00
7713e14d6a Update chat_model.py (#9318)
### What problem does this PR solve?
https://github.com/infiniflow/ragflow/issues/9317
Based on
https://discuss.ai.google.dev/t/valueerror-invalid-operation-the-response-text-quick-accessor-requires-the-response-to-contain-a-valid-part-but-none-were-returned/42866
this error can probably be handled by a retry.
### Type of change

- [x] Refactoring
2025-08-08 14:13:07 +08:00
35539092d0 Add **kwargs to model base class constructors (#9252)
Updated constructors for base and derived classes in chat, embedding,
rerank, sequence2txt, and tts models to accept **kwargs. This change
improves extensibility and allows passing additional parameters without
breaking existing interfaces.
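
A minimal sketch of the pattern described above. The constructor signature mirrors the one used by the chat model classes in this file, while the class name `ExampleChat`, the `extra` attribute, and the `timeout` option are hypothetical and only illustrate how unknown keyword arguments can pass through.

```python
class Base:
    def __init__(self, key, model_name, base_url=None, **kwargs):
        self.key = key
        self.model_name = model_name
        self.base_url = base_url
        # Assumption: extra options are kept rather than raising TypeError.
        self.extra = kwargs


class ExampleChat(Base):
    def __init__(self, key, model_name, base_url=None, **kwargs):
        # Forward unknown options to the base class unchanged.
        super().__init__(key, model_name, base_url=base_url, **kwargs)


# A caller can now pass an option the classes do not name explicitly.
chat = ExampleChat("placeholder-key", "example-model", timeout=30)
```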

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: IT: Sop.Son <sop.son@feavn.local>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-08-07 09:45:37 +08:00
e9cbf4611d Fix:Error when parsing files using Gemini: **ERROR**: GENERIC_ERROR - Unknown field for GenerationConfig: max_tokens (#9195)
### What problem does this PR solve?
https://github.com/infiniflow/ragflow/issues/9177
The likely cause is that Gemini internally uses a different parameter name:
`
        max_output_tokens (int):
            Optional. The maximum number of tokens to include in a
            response candidate.

            Note: The default value varies by model, see the
            ``Model.output_token_limit`` attribute of the ``Model``
            returned from the ``getModel`` function.

            This field is a member of `oneof`_ ``_max_output_tokens``.
`
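
One way to handle the parameter-name mismatch described above is to rename the key before the generation config is built. The sketch below assumes that approach. The PR text does not say whether the key is renamed or dropped, and the helper name `to_gemini_conf` is hypothetical.

```python
def to_gemini_conf(gen_conf: dict) -> dict:
    """Return a copy of `gen_conf` that uses Gemini's parameter name."""
    conf = dict(gen_conf)  # copy so the caller's dict is left untouched
    if "max_tokens" in conf:
        # GenerationConfig expects max_output_tokens, not max_tokens.
        conf["max_output_tokens"] = conf.pop("max_tokens")
    return conf
```
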
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-08-04 10:06:09 +08:00
aeaeb169e4 Feat/support 302ai provider (#8742)
### What problem does this PR solve?

Support 302.AI provider.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-07-31 14:48:30 +08:00
d9fe279dde Feat: Redesign and refactor agent module (#9113)
### What problem does this PR solve?

#9082 #6365

<u> **WARNING: it's not compatible with the older version of `Agent`
module, which means that `Agent` from older versions can not work
anymore.**</u>

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-07-30 19:41:09 +08:00
021e8b57ae Fix: fix error 429 api rate limit when building knowledge graph for all chat model and Mistral embedding model (#9106)
### What problem does this PR solve?

fix error 429 api rate limit when building knowledge graph for all chat
model and Mistral embedding model.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-07-30 11:37:49 +08:00
b47dcc9108 Fix issue with keep_alive=-1 for ollama chat model by allowing a user to set an additional configuration option (#9017)
### What problem does this PR solve?

Fix the issue with `keep_alive=-1` for the Ollama chat model by allowing a user
to set an additional configuration option. It is a non-breaking change
because it still uses the previous default value, `keep_alive=-1`.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [X] Performance Improvement
- [X] Other (please describe):
- An additional configuration option has been added to control the behavior of
RAGFlow when working with Ollama LLMs
2025-07-24 11:20:14 +08:00
7ebc1f0943 Feat: add model provider DeepInfra (#9003)
### What problem does this PR solve?

Add model provider DeepInfra. This model list comes from our community. 

NOTE: most endpoints haven't been tested, but they should work as OpenAI
does.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-07-23 18:10:35 +08:00
9e45fcfdb3 Fix: fix typo in OpenAI error logging message (#8865)
### What problem does this PR solve?

Correct the logging message from "OpenAI cat_with_tools" to "OpenAI
chat_with_tools" in the `_exceptions` method of the `Base` class to
accurately reflect the method name and improve error traceability.

### Type of change

- [x] Typo
2025-07-16 15:31:57 +08:00
1895667573 Feat: add xAI provider (#8781)
### What problem does this PR solve?

Add xAI provider (experimental feature, requires user feedback).

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-07-11 10:35:23 +08:00
8281ceb406 Refa: refine retry gap. (#8773)
### What problem does this PR solve?

### Type of change

- [x] Refactoring
- [x] Performance Improvement
2025-07-10 14:28:57 +08:00
f8a6987f1e Refa: automatic LLMs registration (#8651)
### What problem does this PR solve?

Support automatic LLMs registration.

### Type of change

- [x] Refactoring
2025-07-03 19:05:31 +08:00
fffb7c0bba Fix: anthropic llm issue. (#8633)
### What problem does this PR solve?

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-07-02 18:37:34 +08:00
1c77b4ed9b fix: Correctly format message parts in GoogleChat (#8596)
### What problem does this PR solve?

This PR addresses an incompatibility issue with the Google Chat API by
correcting the message content format in the `GoogleChat` class.
Previously, the content was directly assigned to the "parts" field,
which did not align with the API's expected format. This change ensures
that messages are properly formatted with a "text" key within a
dictionary, as required by the API.
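
The format change described above can be sketched as follows. The helper name `to_google_messages` is hypothetical, and wrapping the dictionary in a list under "parts" is an assumption based on the Gemini message format rather than something stated in the PR.

```python
from copy import deepcopy


def to_google_messages(history):
    """Convert OpenAI-style messages into Google-style messages."""
    converted = []
    for item in history:
        msg = deepcopy(item)  # leave the caller's history untouched
        if msg.get("role") == "assistant":
            msg["role"] = "model"
        if "content" in msg:
            # Put the text under a "text" key instead of assigning it
            # directly to "parts".
            msg["parts"] = [{"text": msg.pop("content")}]
        converted.append(msg)
    return converted
```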

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-07-01 14:06:07 +08:00
aafeffa292 Feat: add gitee as LLM provider. (#8545)
### What problem does this PR solve?


### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-06-30 09:22:31 +08:00
e441c17c2c Refa: limit embedding concurrency and fix chat_with_tool (#8543)
### What problem does this PR solve?

#8538

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
2025-06-27 19:28:41 +08:00
a10f05f4d7 Fix: chat with tools bug. (#8528)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-27 12:10:53 +08:00
340354b79c fix the error 'Unknown field for GenerationConfig: max_tokens' when u… (#8473)
### What problem does this PR solve?
https://github.com/infiniflow/ragflow/issues/8324

docker image version: v0.19.1

The `_clean_conf` function was not called in the `_chat` and
`chat_streamly` methods of the `GeminiChat` class, causing the error
"Unknown field for GenerationConfig: max_tokens" when the default LLM
config includes the "max_tokens" parameter.

**Buggy Code (ragflow/rag/llm/chat_model.py)**
```python
class GeminiChat(Base):
    def __init__(self, key, model_name, base_url=None, **kwargs):
        super().__init__(key, model_name, base_url=base_url, **kwargs)

        from google.generativeai import GenerativeModel, client

        client.configure(api_key=key)
        _client = client.get_default_generative_client()
        self.model_name = "models/" + model_name
        self.model = GenerativeModel(model_name=self.model_name)
        self.model._client = _client

    def _clean_conf(self, gen_conf):
        for k in list(gen_conf.keys()):
            if k not in ["temperature", "top_p"]:
                del gen_conf[k]
        return gen_conf

    def _chat(self, history, gen_conf):
        from google.generativeai.types import content_types

        system = history[0]["content"] if history and history[0]["role"] == "system" else ""
        hist = []
        for item in history:
            if item["role"] == "system":
                continue
            hist.append(deepcopy(item))
            item = hist[-1]
            if "role" in item and item["role"] == "assistant":
                item["role"] = "model"
            if "role" in item and item["role"] == "system":
                item["role"] = "user"
            if "content" in item:
                item["parts"] = item.pop("content")

        if system:
            self.model._system_instruction = content_types.to_content(system)
        response = self.model.generate_content(hist, generation_config=gen_conf)
        ans = response.text
        return ans, response.usage_metadata.total_token_count

    def chat_streamly(self, system, history, gen_conf):
        from google.generativeai.types import content_types

        if system:
            self.model._system_instruction = content_types.to_content(system)
        # _clean_conf is not called here; this inline filter still lets max_tokens through
        for k in list(gen_conf.keys()):
            if k not in ["temperature", "top_p", "max_tokens"]:
                del gen_conf[k]
        for item in history:
            if "role" in item and item["role"] == "assistant":
                item["role"] = "model"
            if "content" in item:
                item["parts"] = item.pop("content")
        ans = ""
        try:
            response = self.model.generate_content(history, generation_config=gen_conf, stream=True)
            for resp in response:
                ans = resp.text
                yield ans

            yield response._chunks[-1].usage_metadata.total_token_count
        except Exception as e:
            yield ans + "\n**ERROR**: " + str(e)

        yield 0
```
**Fixed Code (calls the `_clean_conf` function)**
```python
class GeminiChat(Base):
    def __init__(self, key, model_name, base_url=None, **kwargs):
        super().__init__(key, model_name, base_url=base_url, **kwargs)

        from google.generativeai import GenerativeModel, client

        client.configure(api_key=key)
        _client = client.get_default_generative_client()
        self.model_name = "models/" + model_name
        self.model = GenerativeModel(model_name=self.model_name)
        self.model._client = _client

    def _clean_conf(self, gen_conf):
        for k in list(gen_conf.keys()):
            if k not in ["temperature", "top_p"]:
                del gen_conf[k]
        return gen_conf

    def _chat(self, history, gen_conf):
        from google.generativeai.types import content_types
        # Call _clean_conf to drop parameters that GenerationConfig does not accept
        gen_conf = self._clean_conf(gen_conf)

        system = history[0]["content"] if history and history[0]["role"] == "system" else ""
        hist = []
        for item in history:
            if item["role"] == "system":
                continue
            hist.append(deepcopy(item))
            item = hist[-1]
            if "role" in item and item["role"] == "assistant":
                item["role"] = "model"
            if "role" in item and item["role"] == "system":
                item["role"] = "user"
            if "content" in item:
                item["parts"] = item.pop("content")

        if system:
            self.model._system_instruction = content_types.to_content(system)
        response = self.model.generate_content(hist, generation_config=gen_conf)
        ans = response.text
        return ans, response.usage_metadata.total_token_count

    def chat_streamly(self, system, history, gen_conf):
        from google.generativeai.types import content_types
        # Call _clean_conf to drop parameters that GenerationConfig does not accept
        gen_conf = self._clean_conf(gen_conf)

        if system:
            self.model._system_instruction = content_types.to_content(system)
        # Removed the duplicate inline filtering loop ("for k in list(gen_conf.keys()):")
        for item in history:
            if "role" in item and item["role"] == "assistant":
                item["role"] = "model"
            if "content" in item:
                item["parts"] = item.pop("content")
        ans = ""
        try:
            response = self.model.generate_content(history, generation_config=gen_conf, stream=True)
            for resp in response:
                ans = resp.text
                yield ans

            yield response._chunks[-1].usage_metadata.total_token_count
        except Exception as e:
            yield ans + "\n**ERROR**: " + str(e)

        yield 0
```

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-06-25 16:23:35 +08:00
fd7ac17605 Feat: Scratch MCP tool calling support. (#8263)
### What problem does this PR solve?

This is a cherry-pick from #7781 as requested.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-06-23 17:45:35 +08:00
244d8a47b9 Fix: AzureChat model code (#8426)
### What problem does this PR solve?

- Simplify AzureChat constructor by passing base_url directly
- Clean up spacing and formatting in chat_model.py
- Remove redundant parentheses and improve code consistency
- #8423

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-23 15:59:25 +08:00
35034fed73 Fix: Raptor: [Bug]: **ERROR**: Unknown field for GenerationConfig: max_tokens (#8331)
### What problem does this PR solve?

https://github.com/infiniflow/ragflow/issues/8324

### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-18 16:40:57 +08:00
b1117a8717 Fix: base url issue. (#8281)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-16 13:40:25 +08:00
d5236b71f4 Refa: ollama keep alive issue. (#8216)
### What problem does this PR solve?

#8122

### Type of change

- [x] Refactoring
2025-06-12 15:09:40 +08:00
56ee69e9d9 Refa: chat with tools. (#8210)
### What problem does this PR solve?


### Type of change
- [x] Refactoring
2025-06-12 12:31:10 +08:00
1a5f991d86 Fix: auto-keyword and auto-question fail with qwq model (#8190)
### What problem does this PR solve?

Fix auto-keyword and auto-question fail with qwq model. #8189 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-12 11:37:07 +08:00
69e1fc496d Refa: chat models (#8187)
### What problem does this PR solve?


### Type of change

- [x] Refactoring
2025-06-11 17:20:12 +08:00
156290f8d0 Fix: url path join issue. (#8013)
### What problem does this PR solve?

Close #7980

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-03 14:18:40 +08:00
a1f06a4fdc Feat: Support tool calling in Generate component (#7572)
### What problem does this PR solve?

Hello, our use case requires an LLM agent to invoke some tools, so I made a
simple implementation here.

This PR does two things:

1. A simple plugin mechanism based on `pluginlib`:

This mechanism lives in the `plugin` directory. It will only load
plugins from `plugin/embedded_plugins` for now.

A sample plugin `bad_calculator.py` is placed in
`plugin/embedded_plugins/llm_tools`. It accepts two numbers `a` and `b`,
then gives a deliberately wrong result, `a + b + 100`.

In the future, it can load plugins from an external location with little
code change.

Plugins are divided into different types. The only plugin type supported
in this PR is `llm_tools`, which must implement the `LLMToolPlugin`
class in the `plugin/llm_tool_plugin.py`.
More plugin types can be added in the future.

2. A tool selector in the `Generate` component:

Added a tool selector to select one or more tools for LLM:


![image](https://github.com/user-attachments/assets/74a21fdf-9333-4175-991b-43df6524c5dc)

With the `bad_calculator` tool, the `qwen-max` model produces this
result:


![image](https://github.com/user-attachments/assets/93aff9c4-8550-414a-90a2-1a15a5249d94)
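
The behavior of the sample tool described in item 1 can be sketched as below. This stand-in does not subclass the real `LLMToolPlugin`, whose interface is not shown in this PR description. The class name `BadCalculator` and the method name `invoke` are hypothetical.

```python
class BadCalculator:
    """Stand-in for the sample tool; not the actual plugin class."""

    name = "bad_calculator"

    def invoke(self, a: float, b: float) -> float:
        # Deliberately wrong, as the description states: a + b + 100.
        return a + b + 100


result = BadCalculator().invoke(1, 2)
```

A deliberately wrong answer makes it easy to confirm that the model's reply comes from the tool call and not from the model's own arithmetic.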


### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2025-05-16 16:32:19 +08:00
5b626870d0 Refa: remove ollama keep alive. (#7560)
### What problem does this PR solve?

#7518

### Type of change

- [x] Refactoring
2025-05-09 17:51:49 +08:00
97a13ef1ab Fix: Qwen-vl-plus url error (#7281)
### What problem does this PR solve?

Fix Qwen-vl-* url error. #7277

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-25 09:20:10 +08:00
a008b38cf5 Fix: local variable referenced before assignment (#6909)
### What problem does this PR solve?

Fix: local variable referenced before assignment. #6803 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-09 20:29:12 +08:00
dc2c74b249 Feat: add primitive support for function calls (#6840)
### What problem does this PR solve?

This PR introduces **primitive support for function calls**,
enabling the system to handle basic function call capabilities.
However, this feature is currently experimental and **not yet enabled
for general use**, as it is only supported by a subset of models,
namely, Qwen and OpenAI models.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-04-08 16:09:03 +08:00