Commit Graph

112 Commits

Author SHA1 Message Date
5fa6f2f151 Update embedding_model.py (#8836)
### What problem does this PR solve?

Remove useless covert for bge encode_queries

### Type of change

- [x] Performance Improvement
2025-07-15 14:04:58 +08:00
5383e254c4 Perf:Remove Useless Convert When BGE Embedding (#8816)
### What problem does this PR solve?

FlagModel internal support returns as numpy

### Type of change
- [x] Performance Improvement
2025-07-14 14:02:48 +08:00
8d027813f5 Refactor: Improve How To Handle QWenEmbed (#8765)
### What problem does this PR solve?

Based on https://github.com/infiniflow/ragflow/issues/8740 
1. A better handle for 'NoneType' object is not subscriptable
2. Add some logs to get the internal message

### Type of change

- [x] Refactoring
2025-07-10 10:30:18 +08:00
19419281c3 Fix: Change Ollama Embedding Keep Alive (#8734)
### What problem does this PR solve?
https://github.com/infiniflow/ragflow/issues/8733

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-07-09 12:17:26 +08:00
e60ec0a31b Fix:disallowed special token while embedding (#8692)
### What problem does this PR solve?

https://github.com/infiniflow/ragflow/issues/8567

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-07-07 14:13:37 +08:00
9580e99650 fix: retry embedding with Qwen family models when limits temporarily reached. (#8690)
fix: retry embedding with Qwen family models when limits temporarily
reached.

APIs of Qwen family models are limited by calling rates. When reached,
the "output" attribute of the "resp" will be None, and in turn cause
TypeError when trying to retrieve "embeddings". Since these limits are
almost temporary, I have added a simple retry mechanism to avoid it.
Besides, if retry_max reached, the error can be early raised, instead of
hidden behind "TypeError".

### What problem does this PR solve?

Sometimes Qwen blocks calling due to rate limits, but it will cause the
whole parsing procedure stops when creating knowledge base. In this
situation, resp["output"] will be None, and resp["output"]["embeddings"]
will cause TypeError. Since the limits are temporary, I apply a simple
retry mechanism to solve it.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-07-07 12:15:52 +08:00
f8a6987f1e Refa: automatic LLMs registration (#8651)
### What problem does this PR solve?

Support automatic LLMs registration.

### Type of change

- [x] Refactoring
2025-07-03 19:05:31 +08:00
d46c24045f Feat: add GiteeAI as a llm provider. (#8572)
### What problem does this PR solve?

#1853

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-06-30 11:22:11 +08:00
aafeffa292 Feat: add gitee as LLM provider. (#8545)
### What problem does this PR solve?


### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-06-30 09:22:31 +08:00
49d67cbcb7 fix a bug when using huggingface embedding api (#8432)
### What problem does this PR solve?

image_version: v0.19.1
This PR fixes a bug in the HuggingFaceEmBedding API method that was
causing AssertionError: assert len(vects) == len(docs) during the
document embedding process.

#### Problem
The HuggingFaceEmbed.encode() method had an early return statement
inside the for loop, causing it to return after processing only the
first text input instead of processing all texts in the input list.

**Error Messenge**
```python
AssertionError: assert len(vects) == len(docs) # input chunks  != embedded  vectors from embedding api
File "/ragflow/rag/svr/task_executor.py", line 442, in embedding
```



**Buggy code(/ragflow/rag/llm/embedding_model.py)**
```python
class HuggingFaceEmbed(Base):
    def __init__(self, key, model_name, base_url=None):
        if not model_name:
            raise ValueError("Model name cannot be None")
        self.key = key
        self.model_name = model_name.split("___")[0]
        self.base_url = base_url or "http://127.0.0.1:8080"
        def encode(self, texts: list):
            embeddings = []
            for text in texts:
                response = requests.post(...)
                if response.status_code == 200:
                    try:
                        embedding = response.json()
                        embeddings.append(embedding[0])
                        #  Early return
                        return np.array(embeddings), sum([num_tokens_from_string(text) for text in texts]) 
                    except Exception as _e:
                        log_exception(_e, response)
                else:
                    raise Exception(...)
```
**Fixed Code(I just Rollback this function to the v0.19.0 version)**
```python
Class HuggingFaceEmbed(Base):
    def __init__(self, key, model_name, base_url=None):
        if not model_name:
            raise ValueError("Model name cannot be None")
        self.key = key
        self.model_name = model_name.split("___")[0]
        self.base_url = base_url or "http://127.0.0.1:8080"
        def encode(self, texts: list):
            embeddings = []
            for text in texts:
                response = requests.post(...)
                if response.status_code == 200:
                    embedding = response.json()
                    embeddings.append(embedding[0])  #  Only append, no return
                else:
                    raise Exception(...)
            return np.array(embeddings), sum([num_tokens_from_string(text) for text in texts])  #  Return after processing all
```
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-24 09:35:02 +08:00
ef5e7d8c44 Fix:embedding_model class SILICONFLOWEmbed(Base)Function reusing json (#8378)
### What problem does this PR solve?

https://github.com/infiniflow/ragflow/issues/8360

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-20 11:13:00 +08:00
65d5268439 Feat: implement novitaAI embedding and reranking. (#8250)
### What problem does this PR solve?

Close #8227

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-06-13 15:42:17 +08:00
d36c8d18b1 Refa: make exception more clear. (#8224)
### What problem does this PR solve?

#8156

### Type of change
- [x] Refactoring
2025-06-12 17:53:59 +08:00
a43adafc6b Refa: Add error handling for JSON decode in embedding models (#8162)
### What problem does this PR solve?

Improve robustness of Jina, Nvidia, and SILICONFLOW embedding models by:
1. Adding try-catch blocks for JSON decode errors
2. Logging error details including response content
3. Raising exceptions with meaningful error messages

### Type of change

- [x] Refactoring
2025-06-10 19:04:17 +08:00
156290f8d0 Fix: url path join issue. (#8013)
### What problem does this PR solve?

Close #7980

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-03 14:18:40 +08:00
65537b8200 Fix:Set CUDA_VISIBLE_DEVICES In DefaultEmbedding (#7465)
### What problem does this PR solve?
https://github.com/infiniflow/ragflow/issues/7420

### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2025-05-06 14:38:36 +08:00
46b5e32cd7 Feat: support vision llm for gpustack (#6636)
### What problem does this PR solve?
https://github.com/infiniflow/ragflow/issues/6138

This PR is going to support vision llm for gpustack, modify url path
from `/v1-openai` to `/v1`

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-03-31 15:33:52 +08:00
b77ce4e846 Feat: support api-key for Ollama. (#6448)
### What problem does this PR solve?

#6189

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-24 14:53:17 +08:00
85480f6292 Fix: the error of Ollama embeddings interface returning "500 Internal Server Error" (#6350)
### What problem does this PR solve?

Fix the error where the Ollama embeddings interface returns a “500
Internal Server Error” when using models such as xiaobu-embedding-v2 for
embedding.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-21 15:25:48 +08:00
4f2816c01c Add support to boto3 default connection (#5246)
### What problem does this PR solve?
 
This pull request includes changes to the initialization logic of the
`ChatModel` and `EmbeddingModel` classes to enhance the handling of AWS
credentials.

Use cases:
- Use env variables for credentials instead of managing them on the DB 
- Easy connection when deploying on an AWS machine

### Type of change

- [X] New Feature (non-breaking change which adds functionality)
2025-02-24 11:01:14 +08:00
4776fa5e4e Refactor for total_tokens. (#4652)
### What problem does this PR solve?

#4567
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-01-26 13:54:26 +08:00
f1d9f4290e Fix TogetherAIEmbed. (#4623)
### What problem does this PR solve?

#4567

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-01-24 10:29:30 +08:00
be5f830878 Truncate text for zhipu embedding. (#4490)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-01-15 14:36:27 +08:00
7944aacafa Feat: add gpustack model provider (#4469)
### What problem does this PR solve?

Add GPUStack as a new model provider.
[GPUStack](https://github.com/gpustack/gpustack) is an open-source GPU
cluster manager for running LLMs. Currently, locally deployed models in
GPUStack cannot integrate well with RAGFlow. GPUStack provides both
OpenAI compatible APIs (Models / Chat Completions / Embeddings /
Speech2Text / TTS) and other APIs like Rerank. We would like to use
GPUStack as a model provider in ragflow.

[GPUStack Docs](https://docs.gpustack.ai/latest/quickstart/)

Related issue: https://github.com/infiniflow/ragflow/issues/4064.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)



### Testing Instructions
1. Install GPUStack and deploy the `llama-3.2-1b-instruct` llm, `bge-m3`
text embedding model, `bge-reranker-v2-m3` rerank model,
`faster-whisper-medium` Speech-to-Text model, `cosyvoice-300m-sft` in
GPUStack.
2. Add provider in ragflow settings.
3. Testing in ragflow.
2025-01-15 14:15:58 +08:00
b93c136797 Fix gemini embedding error. (#4356)
### What problem does this PR solve?

#4314

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-01-06 14:41:29 +08:00
4abc144d3d Fix error of changing embedding model (#4184)
### What problem does this PR solve?

1. Change embedding model of knowledge base won't change the default
embedding model.
2. Retrieval test bug

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Signed-off-by: jinhai <haijin.chn@gmail.com>
2024-12-23 16:23:54 +08:00
d8fca43017 Make fast embed and default embed mutually exclusive. (#4121)
### What problem does this PR solve?


### Type of change

- [x] Performance Improvement
2024-12-19 17:27:09 +08:00
7474348394 Fix fastembed reloading issue. (#4117)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-12-19 16:18:18 +08:00
593ffc4067 Fix HuggingFace model error. (#3870)
### What problem does this PR solve?

#3865

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-12-05 13:28:42 +08:00
92ab7ef659 Refactor embedding batch_size (#3825)
### What problem does this PR solve?

Refactor embedding batch_size. Close #3657

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
2024-12-03 16:22:39 +08:00
6a0583f5ad Fix voyage embedding. (#3818)
### What problem does this PR solve?

#3816 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-12-03 09:33:54 +08:00
d19f059f34 Detect invalid response from api.siliconflow.cn (#3792)
### What problem does this PR solve?

Detect invalid response from api.siliconflow.cn. Close #2643

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-12-02 12:55:05 +08:00
59a5813f1b add jina new models in jina connector (#3770)
### What problem does this PR solve?

add new models in jinna connector, to allow use models that support
multilingual models

### Type of change

- [X] Other (please describe): new connectors no breaking change
2024-12-02 10:06:39 +08:00
57208d8e53 Fix batch size issue. (#3675)
### What problem does this PR solve?

#3657

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-11-27 18:06:43 +08:00
8b35776916 Fix a bug in VolcEngine (#3658)
### What problem does this PR solve?

Fix a bug in VolcEngine  #3553

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: liuhua <10215101452@stu.ecun.edu.cn>
2024-11-27 09:30:49 +08:00
e5af18d5ea Update docs for v0.14.0 (#3625)
### What problem does this PR solve?


### Type of change

- [x] Documentation Update
2024-11-25 11:37:56 +08:00
d42362deb6 Add api for sessions and add max_tokens for tenant_llm (#3472)
### What problem does this PR solve?

Add api for sessions and add max_tokens for tenant_llm

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: liuhua <10215101452@stu.ecun.edu.cn>
2024-11-19 14:51:33 +08:00
4413683898 Introduced beartype (#3460)
### What problem does this PR solve?

Introduced [beartype](https://github.com/beartype/beartype) for runtime
type-checking.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-11-18 17:38:17 +08:00
1e90a1bf36 Move settings initialization after module init phase (#3438)
### What problem does this PR solve?

1. Module init won't connect database any more.
2. Config in settings need to be used with settings.CONFIG_NAME

### Type of change

- [x] Refactoring

Signed-off-by: jinhai <haijin.chn@gmail.com>
2024-11-15 17:30:56 +08:00
30f6421760 Use consistent log file names, introduced initLogger (#3403)
### What problem does this PR solve?

Use consistent log file names, introduced initLogger

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [x] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
2024-11-14 17:13:48 +08:00
fa54cd5f5c exstract model dir from model‘s full name (#3368)
### What problem does this PR solve?

When model’s group name contains 0-9,we can't find downloaded
model,because we do not correctly exstract model dir's name from model‘s
full name

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: 王志鹏 <zhipeng3.wang@midea.com>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2024-11-13 14:10:16 +08:00
a2a5631da4 Rework logging (#3358)
Unified all log files into one.

### What problem does this PR solve?

Unified all log files into one.

### Type of change

- [x] Refactoring
2024-11-12 17:35:13 +08:00
0dff64f6ad fix: TypeError: only length-1 arrays can be converted to Python scalars (#3211)
### What problem does this PR solve?
fix "TypeError: only length-1 arrays can be converted to Python scalars"
while using cohere embedding model.

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)


![image](https://github.com/user-attachments/assets/2c21a69f-cd76-4d25-b320-058964812db8)
2024-11-06 11:15:00 +08:00
4991107822 Fix keys of Xinference deployed models, especially has the same model name with public hosted models. (#2832)
### What problem does this PR solve?

Fix keys of Xinference deployed models, especially has the same model
name with public hosted models.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: 0000sir <0000sir@gmail.com>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2024-10-16 10:21:08 +08:00
18f80743eb support api-version and change default-model in adding azure-openai and openai (#2799)
### What problem does this PR solve?
#2701 #2712 #2749

### Type of change
-[x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2024-10-11 11:26:42 +08:00
7f44cf543a move import positions (#2753)
### What problem does this PR solve?

### Type of change

- [x] Refactoring
2024-10-09 10:34:58 +08:00
34761fa4ca Fix/bedrock issues (#2718)
### What problem does this PR solve?

Adding a Bedrock API key for Claude Sonnet was broken. I find the issue
came up when trying to test the LLM configuration, the system is a
required parameter in boto3.

As well, there were problems in Bedrock implementation for embeddings
when trying to encode queries.

### Type of change

- [X] Bug Fix (non-breaking change which fixes an issue)
2024-10-05 16:44:50 +08:00
96f56a3c43 add huggingface model (#2624)
### What problem does this PR solve?

#2469

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2024-09-27 19:15:38 +08:00
dda1367ab2 make it lighten (#2577)
### What problem does this PR solve?

#2295

### Type of change

- [x] Refactoring
2024-09-25 13:38:40 +08:00
7bb28ca2bd add lighten control (#2567)
### What problem does this PR solve?

#2295

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
2024-09-24 19:22:01 +08:00