Commit Graph

232 Commits

Author SHA1 Message Date
982ed233a2 Fix: doc_aggs not correctly returned when no chunks retrieved. (#11578)
### What problem does this PR solve?

Fix: doc_aggs not correctly returned when no chunks retrieved.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-28 13:09:05 +08:00
9d8b96c1d0 Feat: add context for figure and table (#11547)
### What problem does this PR solve?

Add context for figure table.



![demo_figure_table_context](https://github.com/user-attachments/assets/61b37fac-e22e-40a4-9665-9396c7b4103e)


`==================()` for demonstrating purpose. 
### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-27 10:21:44 +08:00
40e84ca41a Use Infinity single-field-multi-index (#11444)
### What problem does this PR solve?

Use Infinity single-field-multi-index

### Type of change

- [x] Refactoring
- [x] Performance Improvement
2025-11-26 11:06:37 +08:00
74e0b58d89 Fix: excel default optimization. (#11519)
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-25 19:54:20 +08:00
db0f6840d9 Feat: ignore chunk size when using custom delimiters (#11434)
### What problem does this PR solve?

Ignore chunk size when using custom delimiter.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-21 14:36:26 +08:00
b846a0f547 Fix: incorrect retrieval total count with pagination enabled (#11400)
### What problem does this PR solve?

Incorrect retrieval total count with pagination enabled.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-20 15:35:09 +08:00
38234aca53 feat: add OceanBase doc engine (#11228)
### What problem does this PR solve?

Add OceanBase doc engine. Close #5350

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-20 10:00:14 +08:00
7264fb6978 Fix: concat images in word document. (#11310)
### What problem does this PR solve?

Fix: concat images in word document. Partially solved issues in #11063 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-17 19:38:26 +08:00
bd4bc57009 Refactor: move mcp connection utilities to common (#11304)
### What problem does this PR solve?

As title

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-17 15:34:17 +08:00
e27ff8d3d4 Fix: rerank algorithm (#11266)
### What problem does this PR solve?

Fix: rerank algorithm #11234

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-14 13:59:54 +08:00
93422fa8cc Fix: Law parser (#11246)
### What problem does this PR solve?

Fix: Law parser
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-13 15:19:02 +08:00
296476ab89 Refactor function name (#11210)
### What problem does this PR solve?

As title

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-12 19:00:15 +08:00
5629fbd2ca Fix: OpenSearch retrieval no return & Add documentation of /retrieval (#11083)
### What problem does this PR solve?

Fix: OpenSearch retrieval no return #11006
Add documentation #11072
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Documentation Update

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
2025-11-07 09:28:42 +08:00
f98b24c9bf Move api.settings to common.settings (#11036)
### What problem does this PR solve?

As title

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-06 09:36:38 +08:00
360f5c1179 Move token related functions to common (#10942)
### What problem does this PR solve?

As title

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-03 08:50:05 +08:00
44f2d6f5da Move 'get_project_base_directory' to common directory (#10940)
### What problem does this PR solve?

As title

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-02 21:05:28 +08:00
27f0d82102 Fix: opensearch retrieval error (#10891)
### What problem does this PR solve?

Fix: opensearch retrieval error #10828

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-10-30 17:30:54 +08:00
766d900a41 Refactor: rename rmSpace to remove_redundant_spaces (#10796)
### What problem does this PR solve?

- rename rmSpace to remove_redundant_spaces
- move clean_markdown_block to common module
- add unit tests for remove_redundant_spaces and clean_markdown_block

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-10-28 09:46:32 +08:00
e59458c36b Fix: parsing excel with chartsheet & Clamp begin to a minimum of 0 to prevent negative indexing (#10819)
### What problem does this PR solve?

Fix: parsing excel with chartsheet #10815

Fix: Clamp begin to a minimum of 0 to prevent negative indexing #10804
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-10-28 09:40:37 +08:00
501b7d4d01 Fix: prio synonym match than wordnet for english (#10762)
### What problem does this PR solve?

Fix: prio synonym match than wordnet for english

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-10-27 09:32:55 +08:00
1d57801c0c Fix:ERROR 20 Method rag.nlp.search.Dealer.search() parameter highlight="None" violates type hint bool | list, as <class "builtins.NoneType"> "None" not list or bool. (#10743)
### What problem does this PR solve?

https://github.com/infiniflow/ragflow/issues/10733

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-10-27 09:29:39 +08:00
ea73f13ebf Fix: infinity rerank error. (#10760)
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-10-23 17:38:54 +08:00
863c3e3d9c Fix: tree merge (#10691)
### What problem does this PR solve?

Fix: Fix tree merge, solved #10636

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-10-21 13:02:01 +08:00
43ea312144 Fix: search highlight. (#10616)
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-10-16 18:45:43 +08:00
e48bec1cbf Don't rerank for infinity (#10579)
### What problem does this PR solve?

Don't need rerank for infinity since Infinity normalizes each way score
before fusion.

### Type of change

- [x] Refactoring
2025-10-15 20:15:49 +08:00
7d2f65671f Feat: debugging toc part. (#10486)
### What problem does this PR solve?

#10436

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-10-11 18:45:21 +08:00
0d8791936e Feat: TOC retrieval (#10456)
### What problem does this PR solve?

#10436

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-10-10 17:07:55 +08:00
cbf04ee470 Feat: Use data pipeline to visualize the parsing configuration of the knowledge base (#10423)
### What problem does this PR solve?

#9869

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: jinhai <haijin.chn@gmail.com>
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
Co-authored-by: chanx <1243304602@qq.com>
Co-authored-by: balibabu <cike8899@users.noreply.github.com>
Co-authored-by: Lynn <lynn_inf@hotmail.com>
Co-authored-by: 纷繁下的无奈 <zhileihuang@126.com>
Co-authored-by: huangzl <huangzl@shinemo.com>
Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
Co-authored-by: Wilmer <33392318@qq.com>
Co-authored-by: Adrian Weidig <adrianweidig@gmx.net>
Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Yongteng Lei <yongtengrey@outlook.com>
Co-authored-by: Liu An <asiro@qq.com>
Co-authored-by: buua436 <66937541+buua436@users.noreply.github.com>
Co-authored-by: BadwomanCraZY <511528396@qq.com>
Co-authored-by: cucusenok <31804608+cucusenok@users.noreply.github.com>
Co-authored-by: Russell Valentine <russ@coldstonelabs.org>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Billy Bao <newyorkupperbay@gmail.com>
Co-authored-by: Zhedong Cen <cenzhedong2@126.com>
Co-authored-by: TensorNull <129579691+TensorNull@users.noreply.github.com>
Co-authored-by: TensorNull <tensor.null@gmail.com>
Co-authored-by: TeslaZY <TeslaZY@outlook.com>
Co-authored-by: Ajay <160579663+aybanda@users.noreply.github.com>
Co-authored-by: AB <aj@Ajays-MacBook-Air.local>
Co-authored-by: 天海蒼灆 <huangaoqin@tecpie.com>
Co-authored-by: He Wang <wanghechn@qq.com>
Co-authored-by: Atsushi Hatakeyama <atu729@icloud.com>
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
Co-authored-by: Mohamed Mathari <155896313+melmathari@users.noreply.github.com>
Co-authored-by: Mohamed Mathari <nocodeventure@Mac-mini-van-Mohamed.fritz.box>
Co-authored-by: Stephen Hu <stephenhu@seismic.com>
Co-authored-by: Shaun Zhang <zhangwfjh@users.noreply.github.com>
Co-authored-by: zhimeng123 <60221886+zhimeng123@users.noreply.github.com>
Co-authored-by: mxc <mxc@example.com>
Co-authored-by: Dominik Novotný <50611433+SgtMarmite@users.noreply.github.com>
Co-authored-by: EVGENY M <168018528+rjohny55@users.noreply.github.com>
Co-authored-by: mcoder6425 <mcoder64@gmail.com>
Co-authored-by: lemsn <lemsn@msn.com>
Co-authored-by: lemsn <lemsn@126.com>
Co-authored-by: Adrian Gora <47756404+adagora@users.noreply.github.com>
Co-authored-by: Womsxd <45663319+Womsxd@users.noreply.github.com>
Co-authored-by: FatMii <39074672+FatMii@users.noreply.github.com>
2025-10-09 12:36:19 +08:00
9e323a9351 Feat(nlp): add "怎么办" pattern to question word removal (#10284)
### What problem does this PR solve?

Added "怎么办" to the regex pattern in rmWWW method to improve query
cleaning by removing this common question phrase along with other
question words.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-09-25 16:47:56 +08:00
ca9f30e1a1 Add tree_merge for law parsers, significantly outperforming hierarchical_merge (#10202)
### What problem does this PR solve?
Add tree_merge for law parsers, significantly outperforming
hierarchical_merge, solved: #8637
1. Add tree_merge for law parsers, include build_tree and get_tree by
dfs.
2. add Copyright statement for helath_utils
### Type of change

- [x] Documentation Update
- [x] Performance Improvement
2025-09-22 16:33:21 +08:00
0d9c1f1c3c Feat: dataflow supports Spreadsheet and Word processor document (#9996)
### What problem does this PR solve?

Dataflow supports Spreadsheet and Word processor document

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-09-10 13:02:53 +08:00
e9ee9269f5 Feat: user defined prompt. (#9972)
### What problem does this PR solve?


### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-09-08 14:05:01 +08:00
4e16936fa4 Refactor: Use re compile for weight method (#9929)
### What problem does this PR solve?

Use re compile for the weight method

### Type of change

- [x] Refactoring
- [x] Performance Improvement
2025-09-05 12:29:44 +08:00
c27172b3bc Feat: init dataflow. (#9791)
### What problem does this PR solve?

#9790

Close #9782

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-08-28 18:40:32 +08:00
5abd0bbac1 Fix typo (#9766)
### What problem does this PR solve?

As title

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-08-27 18:56:40 +08:00
b5b8032a56 Feat: Support metadata auto filer for Search. (#9524)
### What problem does this PR solve?

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-08-19 10:27:24 +08:00
153e430b00 Feat: add meta data filter. (#9405)
### What problem does this PR solve?

#8531 
#7417 
#6761 
#6573
#6477

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-08-12 14:12:56 +08:00
0a0bfc02a0 Refactor:naive_merge_with_images close useless images (#9296)
### What problem does this PR solve?

naive_merge_with_images close useless images

### Type of change

- [x] Refactoring
2025-08-07 11:07:29 +08:00
7efeaf6548 Fix:remove a img close which can not operate (#9267)
### What problem does this PR solve?


https://github.com/infiniflow/ragflow/issues/9149#issuecomment-3157129587

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-08-06 10:59:49 +08:00
667c5812d0 Fix:Repeated images when parsing markdown files with images (#9196)
### What problem does this PR solve?

https://github.com/infiniflow/ragflow/issues/9149

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-08-04 13:35:58 +08:00
d9fe279dde Feat: Redesign and refactor agent module (#9113)
### What problem does this PR solve?

#9082 #6365

<u> **WARNING: it's not compatible with the older version of `Agent`
module, which means that `Agent` from older versions can not work
anymore.**</u>

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-07-30 19:41:09 +08:00
342a04ec8a Added infinity rank_feature support (#9044)
### What problem does this PR solve?

Added infinity rank_feature support

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-07-29 09:14:23 +08:00
dbc2a8689a Fix: no chunks parsed out for Law (#8842)
### What problem does this PR solve?

Fixes no chunks parsed out for Law. #5113 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-07-15 13:01:56 +08:00
f569401398 Fix: better_handle_different_types (#8775)
### What problem does this PR solve?


https://github.com/infiniflow/ragflow/issues/8719#issuecomment-3055883271

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-07-11 18:21:39 +08:00
00c954755e Fix:use the same logic to handle pos in tokenize_chunks_with_images (#8732)
### What problem does this PR solve?

https://github.com/infiniflow/ragflow/issues/8719

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-07-09 09:31:40 +08:00
8af0d04ad0 Refactor:Improve the logic in search.py (#8716)
### What problem does this PR solve?

1. Remove the useless pop logic due to already been checked at the if
logic
2. merge log logic

### Type of change

- [x] Refactoring
2025-07-08 12:32:01 +08:00
b705ff08fe Refa: improve GraphRAG similarity sensitivity to numeric differences (#8479)
### What problem does this PR solve?

Improve GraphRAG similarity sensitivity to numeric differences. #8444.

### Type of change

- [x] Refactoring
2025-06-25 16:20:59 +08:00
d4e6e2bd21 Fix: doc_aggs issue. (#8418)
### What problem does this PR solve?

#8406

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-23 14:54:01 +08:00
3d0b440e9f fix(search.py):remove hard page_size (#8242)
### What problem does this PR solve?

Fix the restriction of forcing similarity_threshold=0 and page_size=30
when doc_ids is not empty

#8228

---------

Co-authored-by: shiqing.wusq <shiqing.wusq@dtzhejiang.com>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-06-13 14:56:25 +08:00
93f5df716f Fix: order chunks from docx by positions. (#7979)
### What problem does this PR solve?

#7934

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-05-30 17:20:53 +08:00