Commit Graph

596 Commits

Author SHA1 Message Date
1333d3c02a Fix: float transfer exception. (#6197)
### What problem does this PR solve?

#6177

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-18 11:13:44 +08:00
7e4d693054 Fix: in case response.choices[0].message.content is None. (#6190)
### What problem does this PR solve?

#6164

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-18 10:00:27 +08:00
3a99c2b5f4 Refa: PARALLEL_DEVICES is a static parameter. (#6168)
### What problem does this PR solve?


### Type of change

- [x] Refactoring
2025-03-17 16:49:54 +08:00
fabc5e9259 Refa: fix re-rank scope. (#6152)
### What problem does this PR solve?

#6140

### Type of change


- [x] Refactoring
2025-03-17 13:26:29 +08:00
bfa8d342b3 Fix: retrieval debug mode issue. (#6150)
### What problem does this PR solve?

#6139

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-17 13:07:13 +08:00
3e19044dee Feat: add OCR's muti-gpus and parallel processing support (#5972)
### What problem does this PR solve?

Add OCR's muti-gpus and parallel processing support

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

@yuzhichang I've tried to resolve the comments in #5697. OCR jobs can
now be done on both CPU and GPU. ( By the way, I've encountered a
“Generate embedding error” issue #5954 that might be due to my outdated
GPUs? idk. ) Please review it and give me suggestions.

GPU:

![gpu_ocr](https://github.com/user-attachments/assets/0ee2ecfb-a665-4e50-8bc7-15941b9cd80e)

![smi](https://github.com/user-attachments/assets/a2312f8c-cf24-443d-bf89-bec50503546d)

CPU:

![cpu_ocr](https://github.com/user-attachments/assets/1ba6bb0b-94df-41ea-be79-790096da4bf1)
2025-03-17 11:58:40 +08:00
89a69eed72 Introduced task priority (#6118)
### What problem does this PR solve?

Introduced task priority

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-03-14 23:43:46 +08:00
e5a8b23684 Fix: empty tag field issue. (#6103)
### What problem does this PR solve?

#6102

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-14 17:35:57 +08:00
4fffee6695 Regards kb_id at ElasticSearch insert, update, delete. (#6105)
### What problem does this PR solve?

Regards kb_id at ElasticSearch insert, update, delete. Close #6066

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-14 17:34:02 +08:00
485bc7d7d6 Fix: limit the depth of DFS (#6101)
### What problem does this PR solve?

#6085

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-14 17:10:38 +08:00
5d75b6be62 Fix executor name (#6080)
### What problem does this PR solve?

Fix executor name

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-14 14:13:47 +08:00
56b228f187 Refa: remove max toekns for image2txt models. (#6078)
### What problem does this PR solve?

#6063

### Type of change


- [x] Refactoring
2025-03-14 13:51:45 +08:00
2d4a60cae6 Fix: Reduce excessive IO operations by loading LLM factory configurations (#6047)
…ions

### What problem does this PR solve?

This PR fixes an issue where the application was repeatedly reading the
llm_factories.json file from disk in multiple places, which could lead
to "Too many open files" errors under high load conditions. The fix
centralizes the file reading operation in the settings.py module and
stores the data in a global variable that can be accessed by other
modules.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [x] Performance Improvement
- [ ] Other (please describe):
2025-03-14 09:54:38 +08:00
4ff609b6a8 Fix: optimize OCR garbage identification to reduce unnecessary filtering (#6027)
### What problem does this PR solve?

Optimize OCR garbage identification to reduce unnecessary filtering.
#5713

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-13 18:48:32 +08:00
9c8060f619 0.17.1 release notes (#6021)
### What problem does this PR solve?



### Type of change

- [x] Documentation Update
2025-03-13 14:43:24 +08:00
e213873852 Optimize graphrag cache get entity (#6018)
### What problem does this PR solve?

Optimize graphrag cache get entity

### Type of change

- [x] Performance Improvement
2025-03-13 14:37:59 +08:00
e05cdc2f9c Fix: encode detect error. (#6006)
### What problem does this PR solve?

#5967

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-13 10:47:58 +08:00
3571270191 Refa: refine the context window size warning. (#5993)
### What problem does this PR solve?


### Type of change
- [x] Refactoring
2025-03-12 19:40:54 +08:00
7cd37c37cd Feat: add CSV file parsing support (#5989)
### What problem does this PR solve?

Add CSV file parsing support #4552, #5849, #5870

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-03-12 19:20:50 +08:00
6e13922bdc Feat: Add qwq model support to Tongyi-Qianwen factory (#5981)
### What problem does this PR solve?

add qwq model support to Tongyi-Qianwen factory
https://github.com/infiniflow/ragflow/issues/5869

### Type of change

- [x] New Feature (non-breaking change which adds functionality)


![image](https://github.com/user-attachments/assets/49f5c6a0-ecaf-41dd-a23a-2009f854d62c)


![image](https://github.com/user-attachments/assets/93ffa303-920e-4942-8188-bcd6b7209204)


![1741774779438](https://github.com/user-attachments/assets/25f2fd1d-8640-4df0-9a08-78ee9daaa8fe)


![image](https://github.com/user-attachments/assets/4763cf6c-1f76-43c4-80ee-74dfd666a184)

Co-authored-by: zhaozhicheng <zhicheng.zhao@fastonetech.com>
2025-03-12 18:54:15 +08:00
1c663b32b9 Fix:signal.SIGUSR1 and signal.SIGUSR2 can't use in window. so don't bind signal.SIGUSR1 and signal.SIGUSR2 in the windows env (#5941)
### What problem does this PR solve?
Fix:signal.SIGUSR1 and signal.SIGUSR2 can't use in window. so don't bind
signal.SIGUSR1 and signal.SIGUSR2 in the windows env

### Type of change

- [✓ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

Co-authored-by: tangyu <1@1.com>
2025-03-12 09:43:18 +08:00
caecaa7562 Feat: apply LLM to optimize citations. (#5935)
### What problem does this PR solve?

#5905

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-03-11 19:56:21 +08:00
6ec6ca6971 Refactor graphrag to remove redis lock (#5828)
### What problem does this PR solve?

Refactor graphrag to remove redis lock

### Type of change

- [x] Refactoring
2025-03-10 15:15:06 +08:00
15736c57c3 Fix: empty query issue. (#5830)
### What problem does this PR solve?

#5214

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-10 13:56:56 +08:00
b29539b442 Fix: CoHereRerank not respecting base_url when provided (#5784)
### What problem does this PR solve?

vLLM provider with a reranking model does not work : as vLLM uses under
the hood the [CoHereRerank
provider](https://github.com/infiniflow/ragflow/blob/v0.17.0/rag/llm/__init__.py#L250)
with a `base_url`, if this URL [is not passed to the Cohere
client](https://github.com/infiniflow/ragflow/blob/v0.17.0/rag/llm/rerank_model.py#L379-L382)
any attempt will endup on the Cohere SaaS (sending your private api key
in the process) instead of your vLLM instance.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
2025-03-10 11:22:06 +08:00
2ad852d8df Fix: truncate message issue. (#5776)
### What problem does this PR solve?

Close #5761
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-07 17:41:56 +08:00
64c6cc4cf3 Fix: truncate message issue. (#5765)
### What problem does this PR solve?

Close #5761

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-07 16:33:25 +08:00
b1bbb9e210 Refa: make Rewrite component effective to relative data expression. (#5752)
### What problem does this PR solve?

#5716

### Type of change

- [x] Refactoring
2025-03-07 13:48:13 +08:00
df9b7b2fe9 Fix: rerank issue. (#5696)
### What problem does this PR solve?

#5673

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-06 15:05:19 +08:00
251ba7f058 Refa: remove max tokens since no one needs it. (#5690)
### What problem does this PR solve?

#5646 #5640

### Type of change

- [x] Refactoring
2025-03-06 11:29:40 +08:00
b8da2eeb69 Feat: support huggingface re-rank model. (#5684)
### What problem does this PR solve?

#5658

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-03-06 10:44:04 +08:00
4326873af6 refactor: no need to inherit in python3 clean the code (#5659)
### What problem does this PR solve?

As title

### Type of change


- [x] Refactoring

Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-03-05 18:03:53 +08:00
e5041749a2 Fix: tavily search error. (#5653)
### What problem does this PR solve?

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-05 17:03:05 +08:00
f65c3ae62b Refactored DocumentService.update_progress (#5642)
### What problem does this PR solve?

Refactored DocumentService.update_progress

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-05 14:48:03 +08:00
b0c21b00d9 Refactor: Optimize error handling and support parsing of XLS(EXCEL97—2003) files. (#5633)
Optimize error handling and support parsing of XLS(EXCEL97—2003) files.
2025-03-05 11:55:27 +08:00
76e8285904 use to_df replace to_pl when get infinity Result (#5604)
### What problem does this PR solve?

_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._

### Type of change

- [x] Performance Improvement

---------

Co-authored-by: wangwei <dwxiayi@163.com>
2025-03-05 09:35:40 +08:00
4d6484b03e Fix nursery.start_soon. Close #5575 (#5591)
### What problem does this PR solve?

Fix nursery.start_soon. Close #5575

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-04 14:46:54 +08:00
c813c1ff4c Made task_executor async to speedup parsing (#5530)
### What problem does this PR solve?

Made task_executor async to speedup parsing

### Type of change

- [x] Performance Improvement
2025-03-03 18:59:49 +08:00
c190086707 Fix: bad case for tokenizer. (#5543)
### What problem does this PR solve?

#5492

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-03 15:36:16 +08:00
8a2542157f Fix: possible memory leaks close #5277 (#5500)
### What problem does this PR solve?

close #5277 by make sure the file close

### Type of change

- [x] Performance Improvement

---------

Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-03-03 10:26:45 +08:00
21943ce0e2 Refine error message while embedding model error, (#5490)
### What problem does this PR solve?

### Type of change

- [x] Refactoring
2025-02-28 17:52:38 +08:00
b418ce5643 Fix table parser issue. (#5482)
### What problem does this PR solve?

#1475
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-02-28 16:09:12 +08:00
85924e898e Fix: enhance aliyun oss access with adding prefix path (#5475)
### What problem does this PR solve?

Enhance aliyun oss access with adding prefix path.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-02-28 15:00:00 +08:00
622b72db4b Fix: add ctrl+c signal for better exit (#5469)
### What problem does this PR solve?

This patch add signal for ctrl + c that can exit the code friendly
cause code base use thread daemon can not exit friendly for being
started.

how to reproduce
1. docker-compose -f docker/docker-compose-base.yml up
2. other window `bash docker/launch_backend_service.sh`
3. stop 1 first
4. try to stop 2 then two thread can not exit which must use `kill pid`

This patch fix it 
and should fix most the related issues in the `issues`

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-02-28 14:52:40 +08:00
651422127c Feat: Accessing Alibaba Cloud OSS with Amazon S3 SDK (#5438)
Accessing Alibaba Cloud OSS with Amazon S3 SDK
2025-02-27 17:02:42 +08:00
4c9a3e918f Fix: add image2text issue. (#5431)
### What problem does this PR solve?

#5356

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-02-27 14:06:49 +08:00
fa76974e24 Fix issue of ask API. (#5400)
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-02-26 19:45:22 +08:00
0284248c93 Fix: correct wrong vLLM rerank model (#5399)
### What problem does this PR solve?

Correct wrong vLLM rerank model #4316 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-02-26 18:59:36 +08:00
cdcaae17c6 Feat: add VLLM (#5380)
### What problem does this PR solve?

Read to add VLMM.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-02-26 16:04:53 +08:00
96e9d50060 Let parallism of RAPTOR controlable. (#5379)
### What problem does this PR solve?

#4874
### Type of change

- [x] Refactoring
2025-02-26 15:58:06 +08:00