Commit Graph

511 Commits

Author SHA1 Message Date
4776fa5e4e Refactor for total_tokens. (#4652)
### What problem does this PR solve?

#4567
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-01-26 13:54:26 +08:00
c24137bd11 Fix too long integer for Table. (#4651)
### What problem does this PR solve?

#4594

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-01-26 12:54:58 +08:00
4011c8f68c Fix potential error. (#4650)
### What problem does this PR solve?
#4622

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-01-26 12:38:32 +08:00
2cb8edc42c Added GPUStack (#4649)
### What problem does this PR solve?



### Type of change


- [x] Documentation Update
2025-01-26 12:25:02 +08:00
530b0dab17 Make infinity able to cal embedding sim only. (#4644)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-01-26 10:29:52 +08:00
71c132f76d Make infinity adapt (#4635)
### What problem does this PR solve?

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-01-24 17:45:04 +08:00
9d717f0b6e Fix csv reader exception. (#4628)
### What problem does this PR solve?

#4552
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-01-24 14:47:19 +08:00
f1d9f4290e Fix TogetherAIEmbed. (#4623)
### What problem does this PR solve?

#4567

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-01-24 10:29:30 +08:00
55f2b7c4d5 Code format. (#4611)
### What problem does this PR solve?


### Type of change

- [x] Refactoring
2025-01-23 18:43:32 +08:00
86892959a0 Rebuild graph when it's out of time. (#4607)
### What problem does this PR solve?

#4543

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
2025-01-23 17:26:20 +08:00
13f04b7cca Fix pdf applying Q&A issue. (#4599)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-01-23 12:30:46 +08:00
dd0ebbea35 Light GraphRAG (#4585)
### What problem does this PR solve?

#4543

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-01-22 19:43:14 +08:00
3894de895b Update comments (#4569)
### What problem does this PR solve?

Add license statement.

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-01-21 20:52:28 +08:00
3805621564 Fix xinference rerank issue. (#4499)
### What problem does this PR solve?
#4495
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-01-16 11:35:51 +08:00
c852a6dfbf Accelerate titles' embeddings. (#4492)
### What problem does this PR solve?


### Type of change

- [x] Performance Improvement
2025-01-15 15:20:29 +08:00
be5f830878 Truncate text for zhipu embedding. (#4490)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-01-15 14:36:27 +08:00
7944aacafa Feat: add gpustack model provider (#4469)
### What problem does this PR solve?

Add GPUStack as a new model provider.
[GPUStack](https://github.com/gpustack/gpustack) is an open-source GPU
cluster manager for running LLMs. Currently, locally deployed models in
GPUStack cannot integrate well with RAGFlow. GPUStack provides both
OpenAI compatible APIs (Models / Chat Completions / Embeddings /
Speech2Text / TTS) and other APIs like Rerank. We would like to use
GPUStack as a model provider in ragflow.

[GPUStack Docs](https://docs.gpustack.ai/latest/quickstart/)

Related issue: https://github.com/infiniflow/ragflow/issues/4064.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)



### Testing Instructions
1. Install GPUStack and deploy the `llama-3.2-1b-instruct` LLM, the `bge-m3`
text embedding model, the `bge-reranker-v2-m3` rerank model, the
`faster-whisper-medium` Speech-to-Text model, and the `cosyvoice-300m-sft`
TTS model in GPUStack.
2. Add the provider in the RAGFlow settings.
3. Test in RAGFlow.
2025-01-15 14:15:58 +08:00
e478586a8e Refactor. (#4487)
### What problem does this PR solve?

### Type of change

- [x] Refactoring
2025-01-15 14:06:46 +08:00
f556f0239c Fix dify retrieval issue. (#4473)
### What problem does this PR solve?

#4464
#4469 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-01-14 13:16:05 +08:00
fd0bf3adf0 Format: dos2unix (#4467)
### What problem does this PR solve?

Format the code

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-01-13 18:19:01 +08:00
e098fcf6ad Fix csv for TAG. (#4454)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-01-13 12:03:18 +08:00
4dde73f897 Error message: Infinity not support table parsing method (#4439)
### What problem does this PR solve?

Specific error message.

### Type of change

- [x] Refactoring

Signed-off-by: jinhai <haijin.chn@gmail.com>
2025-01-10 16:39:13 +08:00
c5da3cdd97 Tagging (#4426)
### What problem does this PR solve?

#4367

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-01-09 17:07:21 +08:00
d64df4de9c Update error message (#4417)
### What problem does this PR solve?

1. Update error message
2. Remove space characters

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-01-08 20:18:27 +08:00
d9a4e4cc3b Fix page size error. (#4401)
### What problem does this PR solve?

#4400

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-01-07 19:06:31 +08:00
b93c136797 Fix gemini embedding error. (#4356)
### What problem does this PR solve?

#4314

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-01-06 14:41:29 +08:00
bad764bcda Improve storage engine (#4341)
### What problem does this PR solve?

- Bring `STORAGE_IMPL` back in `rag/svr/cache_file_svr.py`
- Simplify storage connection when working with AWS S3

### Type of change

- [x] Refactoring
2025-01-06 12:06:24 +08:00
50f209204e Synchronize with enterprise version (#4325)
### Type of change

- [x] Refactoring
2025-01-02 13:44:44 +08:00
0e5124ec99 Show the errors out. (#4305)
### What problem does this PR solve?


### Type of change

- [x] Refactoring
2024-12-31 15:32:02 +08:00
4ba4f622a5 Refactor (#4303)
### What problem does this PR solve?

### Type of change
- [x] Refactoring
2024-12-31 14:31:31 +08:00
8fb18f37f6 Code refactor. (#4291)
### What problem does this PR solve?

### Type of change

- [x] Refactoring
2024-12-30 18:38:51 +08:00
dd13a5d05c Fix some bugs in text2sql.(#4279)(#4281) (#4280)
### What problem does this PR solve?
- CSV files in the QA knowledge base were parsed incorrectly in the
text2sql scenario. Process CSV files using the csv library and decouple
CSV parsing from TXT parsing.
- Most LLMs return results in markdown format ```sql query ```. Fix the
execution error caused by the LLM outputting SQL in markdown format.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-12-30 10:32:19 +08:00
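
The markdown-format fix described in the PR above can be sketched as follows. This is a minimal illustration, not the code from #4280: the helper name `strip_sql_fence` and the regular expression are assumptions, and the sketch only looks at the first fenced block in the reply.

```python
import re

# Hypothetical helper for illustration; not taken from the RAGFlow code base.
FENCE = "`" * 3  # a markdown code fence marker (three backticks)

# Match an optional "sql" language tag, then capture everything up to the closing fence.
_FENCED_SQL = re.compile(FENCE + r"(?:sql)?\s*(.*?)" + FENCE, re.DOTALL | re.IGNORECASE)


def strip_sql_fence(llm_output: str) -> str:
    """Return the SQL inside the first markdown fence, or the trimmed input if none."""
    match = _FENCED_SQL.search(llm_output)
    sql = match.group(1) if match else llm_output
    return sql.strip()


raw = "Here is the query:\n" + FENCE + "sql\nSELECT name FROM users;\n" + FENCE
print(strip_sql_fence(raw))
```

Falling back to the trimmed input keeps the helper safe for models that already return bare SQL.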
f948c0d9f1 Clean query. (#4259)
### What problem does this PR solve?

#4239

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-12-27 14:25:03 +08:00
722545e5e0 Fix bugs (#4241)
### What problem does this PR solve?

1. Refactor error message
2. Fix the document chunk fetch error that occurs when a knowledge base was
created on ES and can't be found in Infinity.

### Type of change

- [x] Fix bug
- [x] Refactoring

---------

Signed-off-by: jinhai <haijin.chn@gmail.com>
2024-12-26 16:08:17 +08:00
7e063283ba Removing invisible chars before tokenization. (#4233)
### What problem does this PR solve?

#4223

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-12-26 11:48:16 +08:00
b7a7413419 Bump infinity to 0.5.2 (#4207)
### What problem does this PR solve?

Bump infinity to 0.5.2

### Type of change

- [x] Refactoring
2024-12-24 15:17:37 +08:00
321e9f3719 fix: stop rerank by model when search result is empty (#4203)
### What problem does this PR solve?


Stop reranking by model when the search result is empty; otherwise the
rerank model may raise an error (qwen).

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: 刘博 <liubo@ynby.cn>
2024-12-24 14:33:46 +08:00
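
The guard described in the PR above can be sketched as an early return. This is only an illustration under assumptions: `rerank_if_any` and the `rerank` callable are hypothetical names, not RAGFlow's actual interfaces.

```python
from typing import Callable, List, Sequence


def rerank_if_any(
    query: str,
    candidates: Sequence[str],
    rerank: Callable[[str, Sequence[str]], List[float]],
) -> List[float]:
    """Call the rerank model only when there is at least one candidate."""
    if not candidates:
        # Nothing to score: skip the model call, which may fail on empty input.
        return []
    return rerank(query, candidates)


def fake_rerank(query: str, docs: Sequence[str]) -> List[float]:
    """Stand-in model (assumption): rejects empty input, scores by length."""
    if not docs:
        raise ValueError("empty input")
    return [float(len(d)) for d in docs]


print(rerank_if_any("q", [], fake_rerank))
print(rerank_if_any("q", ["ab", "abcd"], fake_rerank))
```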
d030b4a680 Update progress time info (#4193)
### What problem does this PR solve?

Ignore the millisecond and microsecond value.

### Type of change

- [x] Refactoring

Signed-off-by: jinhai <haijin.chn@gmail.com>
2024-12-23 21:04:44 +08:00
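
One way to drop the sub-second part of a timestamp, as the PR above describes, is `datetime.replace(microsecond=0)`. The function name `progress_timestamp` and the output format are assumptions made for this sketch, not the format RAGFlow uses.

```python
from datetime import datetime


def progress_timestamp(now: datetime) -> str:
    """Format a timestamp to whole seconds (hypothetical helper)."""
    # datetime stores milliseconds and microseconds in one field,
    # so zeroing `microsecond` removes both.
    return now.replace(microsecond=0).isoformat(sep=" ")


print(progress_timestamp(datetime(2024, 12, 23, 21, 4, 44, 123456)))
```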
a9fd6066d2 Fix score() issue (#4194)
### What problem does this PR solve?

as title

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Signed-off-by: jinhai <haijin.chn@gmail.com>
2024-12-23 21:01:20 +08:00
c373dba0bc Fix raptor bug. (#4192)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-12-23 18:59:48 +08:00
8d73cf6f02 Added time to progress message (#4185)
### What problem does this PR solve?

Added time to progress message

### Type of change

- [x] Refactoring
2024-12-23 17:25:55 +08:00
4abc144d3d Fix error of changing embedding model (#4184)
### What problem does this PR solve?

1. Changing the embedding model of a knowledge base won't change the default
embedding model.
2. Retrieval test bug

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Signed-off-by: jinhai <haijin.chn@gmail.com>
2024-12-23 16:23:54 +08:00
8f070c3d56 Fix 'SCORE' not found bug (#4178)
### What problem does this PR solve?

As title

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Signed-off-by: jinhai <haijin.chn@gmail.com>
2024-12-23 14:50:12 +08:00
31d67c850e Fetch chunk by batches. (#4177)
### What problem does this PR solve?

#4173

### Type of change

- [x] Performance Improvement
2024-12-23 12:12:15 +08:00
2cbe064080 Add Llama3.3 (#4174)
### What problem does this PR solve?

#4168

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-12-23 11:18:01 +08:00
f13f503952 Use s3 configuration from settings module (#4167)
### What problem does this PR solve?

Fix the retrieval of AWS credentials: read them from the S3 configuration
in the settings module instead of from the environment variables.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2024-12-23 10:22:45 +08:00
cb45431412 Fix Voyage re-rank model. Limit file name length. (#4171)
### What problem does this PR solve?

#4152 
#4154

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-12-23 10:03:50 +08:00
85083ad400 Validate returned chunk at list_chunks and add_chunk (#4153)
### What problem does this PR solve?

Validate returned chunk at list_chunks and add_chunk

### Type of change

- [x] Refactoring
2024-12-20 22:55:45 +08:00
a0dc9e1bdf Fix position_int on infinity (#4144)
### What problem does this PR solve?

Fix position_int on infinity

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-12-20 11:30:33 +08:00
101b8ff813 fix chunk method "Table" losing content when the Excel file has multi… (#4123)
…ple sheets

### What problem does this PR solve?
Discussed in https://github.com/infiniflow/ragflow/pull/4102
- In excel_parser.py, `total` means the total number of rows in the Excel
file, but it was returned in the first iteration, which led to the wrong
`to_page`.
- In table.py, when an Excel file has multiple sheets, it is divided into
multiple parts, each of size 3000; `data` may be empty because it was
already recorded in the previous iteration.
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-12-19 17:30:26 +08:00
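
The first point of the PR above, counting rows across every sheet before computing `to_page`, can be sketched as below. This is a simplified illustration and not the code from excel_parser.py or table.py: `page_ranges` and `sheet_row_counts` are hypothetical names, and only the part size of 3000 comes from the PR description.

```python
from typing import List, Sequence, Tuple


def page_ranges(sheet_row_counts: Sequence[int], part_size: int = 3000) -> List[Tuple[int, int]]:
    """Split the rows of all sheets into (from_page, to_page) parts."""
    # Sum over every sheet first; returning after the first sheet would
    # undercount `total` and produce a wrong `to_page`.
    total = sum(sheet_row_counts)
    ranges = []
    for from_page in range(0, total, part_size):
        to_page = min(from_page + part_size, total)
        ranges.append((from_page, to_page))
    return ranges


# Two sheets with 2000 and 2500 rows (made-up numbers).
print(page_ranges([2000, 2500]))
```

Because the ranges are derived from the full total, no part in this sketch is empty.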