Commit Graph

114 Commits

Author SHA1 Message Date
40e84ca41a Use Infinity single-field-multi-index (#11444)
### What problem does this PR solve?

Use Infinity single-field-multi-index

### Type of change

- [x] Refactoring
- [x] Performance Improvement
2025-11-26 11:06:37 +08:00
b846a0f547 Fix: incorrect retrieval total count with pagination enabled (#11400)
### What problem does this PR solve?

Incorrect retrieval total count with pagination enabled.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-20 15:35:09 +08:00
e27ff8d3d4 Fix: rerank algorithm (#11266)
### What problem does this PR solve?

Fix: rerank algorithm #11234

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-14 13:59:54 +08:00
296476ab89 Refactor function name (#11210)
### What problem does this PR solve?

As title

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-12 19:00:15 +08:00
5629fbd2ca Fix: OpenSearch retrieval no return & Add documentation of /retrieval (#11083)
### What problem does this PR solve?

Fix: OpenSearch retrieval no return #11006
Add documentation #11072
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Documentation Update

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
2025-11-07 09:28:42 +08:00
f98b24c9bf Move api.settings to common.settings (#11036)
### What problem does this PR solve?

As title

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-06 09:36:38 +08:00
27f0d82102 Fix: opensearch retrieval error (#10891)
### What problem does this PR solve?

Fix: opensearch retrieval error #10828

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-10-30 17:30:54 +08:00
766d900a41 Refactor: rename rmSpace to remove_redundant_spaces (#10796)
### What problem does this PR solve?

- rename rmSpace to remove_redundant_spaces
- move clean_markdown_block to common module
- add unit tests for remove_redundant_spaces and clean_markdown_block

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-10-28 09:46:32 +08:00
e59458c36b Fix: parsing excel with chartsheet & Clamp begin to a minimum of 0 to prevent negative indexing (#10819)
### What problem does this PR solve?

Fix: parsing excel with chartsheet #10815

Fix: Clamp begin to a minimum of 0 to prevent negative indexing #10804
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-10-28 09:40:37 +08:00
1d57801c0c Fix:ERROR 20 Method rag.nlp.search.Dealer.search() parameter highlight="None" violates type hint bool | list, as <class "builtins.NoneType"> "None" not list or bool. (#10743)
### What problem does this PR solve?

https://github.com/infiniflow/ragflow/issues/10733

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-10-27 09:29:39 +08:00
ea73f13ebf Fix: infinity rerank error. (#10760)
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-10-23 17:38:54 +08:00
43ea312144 Fix: search highlight. (#10616)
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-10-16 18:45:43 +08:00
e48bec1cbf Don't rerank for infinity (#10579)
### What problem does this PR solve?

Don't need rerank for infinity since Infinity normalizes each way score
before fusion.

### Type of change

- [x] Refactoring
2025-10-15 20:15:49 +08:00
0d8791936e Feat: TOC retrieval (#10456)
### What problem does this PR solve?

#10436

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-10-10 17:07:55 +08:00
cbf04ee470 Feat: Use data pipeline to visualize the parsing configuration of the knowledge base (#10423)
### What problem does this PR solve?

#9869

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: jinhai <haijin.chn@gmail.com>
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
Co-authored-by: chanx <1243304602@qq.com>
Co-authored-by: balibabu <cike8899@users.noreply.github.com>
Co-authored-by: Lynn <lynn_inf@hotmail.com>
Co-authored-by: 纷繁下的无奈 <zhileihuang@126.com>
Co-authored-by: huangzl <huangzl@shinemo.com>
Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
Co-authored-by: Wilmer <33392318@qq.com>
Co-authored-by: Adrian Weidig <adrianweidig@gmx.net>
Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Yongteng Lei <yongtengrey@outlook.com>
Co-authored-by: Liu An <asiro@qq.com>
Co-authored-by: buua436 <66937541+buua436@users.noreply.github.com>
Co-authored-by: BadwomanCraZY <511528396@qq.com>
Co-authored-by: cucusenok <31804608+cucusenok@users.noreply.github.com>
Co-authored-by: Russell Valentine <russ@coldstonelabs.org>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Billy Bao <newyorkupperbay@gmail.com>
Co-authored-by: Zhedong Cen <cenzhedong2@126.com>
Co-authored-by: TensorNull <129579691+TensorNull@users.noreply.github.com>
Co-authored-by: TensorNull <tensor.null@gmail.com>
Co-authored-by: TeslaZY <TeslaZY@outlook.com>
Co-authored-by: Ajay <160579663+aybanda@users.noreply.github.com>
Co-authored-by: AB <aj@Ajays-MacBook-Air.local>
Co-authored-by: 天海蒼灆 <huangaoqin@tecpie.com>
Co-authored-by: He Wang <wanghechn@qq.com>
Co-authored-by: Atsushi Hatakeyama <atu729@icloud.com>
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
Co-authored-by: Mohamed Mathari <155896313+melmathari@users.noreply.github.com>
Co-authored-by: Mohamed Mathari <nocodeventure@Mac-mini-van-Mohamed.fritz.box>
Co-authored-by: Stephen Hu <stephenhu@seismic.com>
Co-authored-by: Shaun Zhang <zhangwfjh@users.noreply.github.com>
Co-authored-by: zhimeng123 <60221886+zhimeng123@users.noreply.github.com>
Co-authored-by: mxc <mxc@example.com>
Co-authored-by: Dominik Novotný <50611433+SgtMarmite@users.noreply.github.com>
Co-authored-by: EVGENY M <168018528+rjohny55@users.noreply.github.com>
Co-authored-by: mcoder6425 <mcoder64@gmail.com>
Co-authored-by: lemsn <lemsn@msn.com>
Co-authored-by: lemsn <lemsn@126.com>
Co-authored-by: Adrian Gora <47756404+adagora@users.noreply.github.com>
Co-authored-by: Womsxd <45663319+Womsxd@users.noreply.github.com>
Co-authored-by: FatMii <39074672+FatMii@users.noreply.github.com>
2025-10-09 12:36:19 +08:00
153e430b00 Feat: add meta data filter. (#9405)
### What problem does this PR solve?

#8531 
#7417 
#6761 
#6573
#6477

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-08-12 14:12:56 +08:00
d9fe279dde Feat: Redesign and refactor agent module (#9113)
### What problem does this PR solve?

#9082 #6365

<u> **WARNING: it's not compatible with the older version of `Agent`
module, which means that `Agent` from older versions can not work
anymore.**</u>

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-07-30 19:41:09 +08:00
342a04ec8a Added infinity rank_feature support (#9044)
### What problem does this PR solve?

Added infinity rank_feature support

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-07-29 09:14:23 +08:00
8af0d04ad0 Refactor:Improve the logic in search.py (#8716)
### What problem does this PR solve?

1. Remove the useless pop logic due to already been checked at the if
logic
2. merge log logic

### Type of change

- [x] Refactoring
2025-07-08 12:32:01 +08:00
d4e6e2bd21 Fix: doc_aggs issue. (#8418)
### What problem does this PR solve?

#8406

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-23 14:54:01 +08:00
3d0b440e9f fix(search.py):remove hard page_size (#8242)
### What problem does this PR solve?

Fix the restriction of forcing similarity_threshold=0 and page_size=30
when doc_ids is not empty

#8228

---------

Co-authored-by: shiqing.wusq <shiqing.wusq@dtzhejiang.com>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-06-13 14:56:25 +08:00
0c562f0a9f Refa: change citation mark as [ID:n] (#7923)
### What problem does this PR solve?

Change citation mark as [ID:n], it's easier for LLMs to follow the
instruction :) #7904

### Type of change

- [x] Refactoring
2025-05-29 10:03:51 +08:00
321a280031 Feat: add image preview to retrieval test. (#7610)
### What problem does this PR solve?

#7608

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-05-13 14:30:36 +08:00
573d46a4ef FIX:ZeroDivisionError when using large page_size in client.retrieve() (#7595)
### What problem does this PR solve?

Close #7592

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-05-13 10:46:31 +08:00
67dee2d74e Fix: fix retrieval tesing wrong pagination (#7174)
### What problem does this PR solve?

Fix retrieval testing wrong pagination. #7171 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-04-22 15:16:04 +08:00
d9266ed65a Fix: incorrect total chunks count in retrieval function after similarity filtering (#6741) (#6932)
### Related Issue:
https://github.com/infiniflow/ragflow/issues/6741

### Environment:
Using nightly version
Commit version:
[[6051abb](6051abb4a3)]

### Bug Description:
The retrieval function in rag/nlp/search.py returns the original total
chunks number
even after chunks are filtered by similarity_threshold. This creates
inconsistency
between the actual returned chunks and the reported total.

### Changes Made:
Added code to count how many search results actually meet or exceed the
configured similarity threshold
Positioned the calculation after the doc_ids conditional logic to ensure
special cases are handled correctly
Updated the ranks["total"] value to store this filtered count instead of
using the raw search result count
Using NumPy leverages optimized C-level batch operations to optimize
speed
2025-04-11 12:31:36 +08:00
0758c04941 Refa: token similarity calculations. (#6614)
### What problem does this PR solve?

#6507

### Type of change

- [x] Performance Improvement
2025-03-28 09:33:08 +08:00
cc8029a732 Fix: uploading in chat box issue. (#6547)
### What problem does this PR solve?

#6228

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-26 15:37:48 +08:00
ee5aa51d43 Fix: point in tag issue. (#6436)
### What problem does this PR solve?

#6414

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-24 10:45:29 +08:00
6e8d0e3177 Fix: rank feat issue. (#6225)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-18 16:07:29 +08:00
1333d3c02a Fix: float transfer exception. (#6197)
### What problem does this PR solve?

#6177

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-18 11:13:44 +08:00
fabc5e9259 Refa: fix re-rank scope. (#6152)
### What problem does this PR solve?

#6140

### Type of change


- [x] Refactoring
2025-03-17 13:26:29 +08:00
e5a8b23684 Fix: empty tag field issue. (#6103)
### What problem does this PR solve?

#6102

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-14 17:35:57 +08:00
7b3d700d5f Apply agentic searching. (#5196)
### What problem does this PR solve?

#5173

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-02-20 17:41:01 +08:00
29a59ed7e2 Fix: Use self.dataStore.indexExist in all_tags method of Dealer (#5108)
### What problem does this PR solve?

This PR fixes an AttributeError in the all_tags method of the Dealer
class. Previously, the method incorrectly called
self.docStoreConn.indexExist instead of self.dataStore.indexExist. Since
self.docStoreConn was never set (and self.dataStore is already
initialized in init), this resulted in an error when attempting to check
if the index exists. This change ensures that the proper connector is
used for the index existence check, thereby resolving the issue._

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-02-19 11:50:57 +08:00
9ff825f39d Ignore exceptions when no index ahead. (#5047)
### What problem does this PR solve?

### Type of change

- [x] Refactoring
2025-02-18 09:09:22 +08:00
9bcccadebd Remove use of eval() from search.py (#4887)
Use `json.loads()` instead.

### What problem does this PR solve?

Using `eval()` can lead to code injections. I think this loads a JSON
field, right? If yes, why is this done via `eval()` and not
`json.loads()`?

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-02-12 13:15:38 +08:00
448fa1c4d4 Robust for abnormal response from LLMs. (#4747)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-02-06 17:34:53 +08:00
4011c8f68c Fix potential error. (#4650)
### What problem does this PR solve?
#4622

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-01-26 12:38:32 +08:00
86892959a0 Rebuild graph when it's out of time. (#4607)
### What problem does this PR solve?

#4543

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
2025-01-23 17:26:20 +08:00
dd0ebbea35 Light GraphRAG (#4585)
### What problem does this PR solve?

#4543

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-01-22 19:43:14 +08:00
c5da3cdd97 Tagging (#4426)
### What problem does this PR solve?

#4367

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-01-09 17:07:21 +08:00
d9a4e4cc3b Fix page size error. (#4401)
### What problem does this PR solve?

#4400

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-01-07 19:06:31 +08:00
321e9f3719 fix: stop rerank by model when search result is empty (#4203)
### What problem does this PR solve?


stop rerank by model when search result is empty, otherwise rerank may
raise an error (qwen).

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: 刘博 <liubo@ynby.cn>
2024-12-24 14:33:46 +08:00
c373dba0bc Fix raptor bug. (#4192)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-12-23 18:59:48 +08:00
31d67c850e Fetch chunk by batches. (#4177)
### What problem does this PR solve?

#4173

### Type of change

- [x] Performance Improvement
2024-12-23 12:12:15 +08:00
000cd6d615 Fix position lost issue. (#4068)
### What problem does this PR solve?

#4040

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-12-17 16:31:58 +08:00
03f00c9e6f Rename page_num_list, top_list, position_list (#3940)
### What problem does this PR solve?

Rename page_num_list, top_list, position_list to page_num_int, top_int,
position_int

### Type of change

- [x] Refactoring
2024-12-10 16:32:58 +08:00
56f473b680 Feat: Add question parameter to edit chunk modal (#3875)
### What problem does this PR solve?

Close #3873

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-12-05 14:51:19 +08:00
74b28ef1b0 Add pagerank to KB. (#3809)
### What problem does this PR solve?

#3794

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-12-03 14:30:35 +08:00