982ed233a2
Fix: doc_aggs not correctly returned when no chunks retrieved. ( #11578 )
...
### What problem does this PR solve?
Fix: doc_aggs not correctly returned when no chunks retrieved.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-28 13:09:05 +08:00
9d8b96c1d0
Feat: add context for figure and table ( #11547 )
...
### What problem does this PR solve?
Add context for figures and tables.

`==================()` is for demonstration purposes.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
2025-11-27 10:21:44 +08:00
40e84ca41a
Use Infinity single-field-multi-index ( #11444 )
...
### What problem does this PR solve?
Use Infinity single-field-multi-index
### Type of change
- [x] Refactoring
- [x] Performance Improvement
2025-11-26 11:06:37 +08:00
74e0b58d89
Fix: excel default optimization. ( #11519 )
...
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-25 19:54:20 +08:00
db0f6840d9
Feat: ignore chunk size when using custom delimiters ( #11434 )
...
### What problem does this PR solve?
Ignore chunk size when using custom delimiter.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
2025-11-21 14:36:26 +08:00
b846a0f547
Fix: incorrect retrieval total count with pagination enabled ( #11400 )
...
### What problem does this PR solve?
Incorrect retrieval total count with pagination enabled.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
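
A minimal sketch of the kind of issue described above, assuming the total is meant to reflect all matching chunks rather than only the current page. The function `paginate` and its parameters are hypothetical and are not taken from this PR.

```python
from typing import Any


def paginate(chunks: list[Any], page: int, page_size: int) -> dict[str, Any]:
    """Return one page of chunks together with the overall total.

    Hypothetical helper: the total is counted before slicing, so it does not
    shrink to the size of the returned page.
    """
    total = len(chunks)                  # count every match, not just this page
    start = max(0, (page - 1) * page_size)
    return {"total": total, "chunks": chunks[start:start + page_size]}


result = paginate(list(range(25)), page=3, page_size=10)
print(result["total"], len(result["chunks"]))
```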
2025-11-20 15:35:09 +08:00
38234aca53
feat: add OceanBase doc engine ( #11228 )
...
### What problem does this PR solve?
Add OceanBase doc engine. Close #5350
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
2025-11-20 10:00:14 +08:00
7264fb6978
Fix: concat images in word document. ( #11310 )
...
### What problem does this PR solve?
Fix: concat images in word document. Partially resolves the issues in #11063
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-17 19:38:26 +08:00
bd4bc57009
Refactor: move mcp connection utilities to common ( #11304 )
...
### What problem does this PR solve?
As title
### Type of change
- [x] Refactoring
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com >
2025-11-17 15:34:17 +08:00
e27ff8d3d4
Fix: rerank algorithm ( #11266 )
...
### What problem does this PR solve?
Fix: rerank algorithm #11234
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-14 13:59:54 +08:00
93422fa8cc
Fix: Law parser ( #11246 )
...
### What problem does this PR solve?
Fix: Law parser
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-13 15:19:02 +08:00
296476ab89
Refactor function name ( #11210 )
...
### What problem does this PR solve?
As title
### Type of change
- [x] Refactoring
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com >
2025-11-12 19:00:15 +08:00
5629fbd2ca
Fix: OpenSearch retrieval no return & Add documentation of /retrieval ( #11083 )
...
### What problem does this PR solve?
Fix: OpenSearch retrieval no return #11006
Add documentation #11072
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Documentation Update
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com >
Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com >
2025-11-07 09:28:42 +08:00
f98b24c9bf
Move api.settings to common.settings ( #11036 )
...
### What problem does this PR solve?
As title
### Type of change
- [x] Refactoring
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com >
2025-11-06 09:36:38 +08:00
360f5c1179
Move token related functions to common ( #10942 )
...
### What problem does this PR solve?
As title
### Type of change
- [x] Refactoring
Signed-off-by: Jin Hai <haijin.chn@gmail.com >
2025-11-03 08:50:05 +08:00
44f2d6f5da
Move 'get_project_base_directory' to common directory ( #10940 )
...
### What problem does this PR solve?
As title
### Type of change
- [x] Refactoring
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com >
2025-11-02 21:05:28 +08:00
27f0d82102
Fix: opensearch retrieval error ( #10891 )
...
### What problem does this PR solve?
Fix: opensearch retrieval error #10828
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2025-10-30 17:30:54 +08:00
766d900a41
Refactor: rename rmSpace to remove_redundant_spaces ( #10796 )
...
### What problem does this PR solve?
- rename rmSpace to remove_redundant_spaces
- move clean_markdown_block to common module
- add unit tests for remove_redundant_spaces and clean_markdown_block
### Type of change
- [x] Refactoring
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com >
2025-10-28 09:46:32 +08:00
e59458c36b
Fix: parsing excel with chartsheet & Clamp begin to a minimum of 0 to prevent negative indexing ( #10819 )
...
### What problem does this PR solve?
Fix: parsing excel with chartsheet #10815
Fix: Clamp begin to a minimum of 0 to prevent negative indexing #10804
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
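
The clamping fix can be illustrated with a small sketch. The helper `window_start` and its arguments are hypothetical, chosen only to show why a negative `begin` is a problem in Python slicing; they are not the code touched by this PR.

```python
def window_start(center: int, radius: int) -> int:
    """Start index of a window around `center`, clamped to a minimum of 0."""
    return max(0, center - radius)


items = ["a", "b", "c", "d", "e"]

# Without clamping, 1 - 3 = -2 would count from the end of the list and the
# slice items[-2:1] would silently come back empty.
begin = window_start(center=1, radius=3)
print(items[begin:begin + 3])
```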
2025-10-28 09:40:37 +08:00
501b7d4d01
Fix: prio synonym match than wordnet for english ( #10762 )
...
### What problem does this PR solve?
Fix: prioritize synonym matches over WordNet for English.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2025-10-27 09:32:55 +08:00
1d57801c0c
Fix:ERROR 20 Method rag.nlp.search.Dealer.search() parameter highlight="None" violates type hint bool | list, as <class "builtins.NoneType"> "None" not list or bool. ( #10743 )
...
### What problem does this PR solve?
https://github.com/infiniflow/ragflow/issues/10733
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2025-10-27 09:29:39 +08:00
ea73f13ebf
Fix: infinity rerank error. ( #10760 )
...
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2025-10-23 17:38:54 +08:00
863c3e3d9c
Fix: tree merge ( #10691 )
...
### What problem does this PR solve?
Fix: Fix tree merge, solved #10636
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2025-10-21 13:02:01 +08:00
43ea312144
Fix: search highlight. ( #10616 )
...
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2025-10-16 18:45:43 +08:00
e48bec1cbf
Don't rerank for infinity ( #10579 )
...
### What problem does this PR solve?
Reranking is not needed for Infinity, since Infinity normalizes the score
of each retrieval path before fusion.
### Type of change
- [x] Refactoring
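
To make "normalize before fusion" concrete, here is a rough sketch using min-max normalization and a weighted sum. This is an illustration only: the functions `min_max` and `fuse`, the weight, and the choice of min-max scaling are assumptions, not a description of how Infinity implements it.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Scale scores into [0, 1]; constant or empty inputs are handled."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def fuse(text: dict[str, float], vector: dict[str, float],
         text_weight: float = 0.5) -> dict[str, float]:
    """Weighted sum of two score sets after each one is normalized."""
    t, v = min_max(text), min_max(vector)
    return {
        doc_id: text_weight * t.get(doc_id, 0.0)
        + (1.0 - text_weight) * v.get(doc_id, 0.0)
        for doc_id in set(t) | set(v)
    }


# Full-text scores and cosine similarities live on very different scales;
# normalizing first keeps either side from dominating the fused score.
fused = fuse({"a": 10.0, "b": 20.0}, {"a": 0.9, "b": 0.1})
print(sorted(fused.items()))
```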
2025-10-15 20:15:49 +08:00
7d2f65671f
Feat: debugging toc part. ( #10486 )
...
### What problem does this PR solve?
#10436
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
2025-10-11 18:45:21 +08:00
0d8791936e
Feat: TOC retrieval ( #10456 )
...
### What problem does this PR solve?
#10436
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
2025-10-10 17:07:55 +08:00
cbf04ee470
Feat: Use data pipeline to visualize the parsing configuration of the knowledge base ( #10423 )
...
### What problem does this PR solve?
#9869
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Signed-off-by: dependabot[bot] <support@github.com >
Signed-off-by: jinhai <haijin.chn@gmail.com >
Signed-off-by: Jin Hai <haijin.chn@gmail.com >
Co-authored-by: chanx <1243304602@qq.com >
Co-authored-by: balibabu <cike8899@users.noreply.github.com >
Co-authored-by: Lynn <lynn_inf@hotmail.com >
Co-authored-by: 纷繁下的无奈 <zhileihuang@126.com >
Co-authored-by: huangzl <huangzl@shinemo.com >
Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com >
Co-authored-by: Wilmer <33392318@qq.com >
Co-authored-by: Adrian Weidig <adrianweidig@gmx.net >
Co-authored-by: Zhichang Yu <yuzhichang@gmail.com >
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
Co-authored-by: Yongteng Lei <yongtengrey@outlook.com >
Co-authored-by: Liu An <asiro@qq.com >
Co-authored-by: buua436 <66937541+buua436@users.noreply.github.com >
Co-authored-by: BadwomanCraZY <511528396@qq.com >
Co-authored-by: cucusenok <31804608+cucusenok@users.noreply.github.com >
Co-authored-by: Russell Valentine <russ@coldstonelabs.org >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Billy Bao <newyorkupperbay@gmail.com >
Co-authored-by: Zhedong Cen <cenzhedong2@126.com >
Co-authored-by: TensorNull <129579691+TensorNull@users.noreply.github.com >
Co-authored-by: TensorNull <tensor.null@gmail.com >
Co-authored-by: TeslaZY <TeslaZY@outlook.com >
Co-authored-by: Ajay <160579663+aybanda@users.noreply.github.com >
Co-authored-by: AB <aj@Ajays-MacBook-Air.local >
Co-authored-by: 天海蒼灆 <huangaoqin@tecpie.com >
Co-authored-by: He Wang <wanghechn@qq.com >
Co-authored-by: Atsushi Hatakeyama <atu729@icloud.com >
Co-authored-by: Jin Hai <haijin.chn@gmail.com >
Co-authored-by: Mohamed Mathari <155896313+melmathari@users.noreply.github.com >
Co-authored-by: Mohamed Mathari <nocodeventure@Mac-mini-van-Mohamed.fritz.box >
Co-authored-by: Stephen Hu <stephenhu@seismic.com >
Co-authored-by: Shaun Zhang <zhangwfjh@users.noreply.github.com >
Co-authored-by: zhimeng123 <60221886+zhimeng123@users.noreply.github.com >
Co-authored-by: mxc <mxc@example.com >
Co-authored-by: Dominik Novotný <50611433+SgtMarmite@users.noreply.github.com >
Co-authored-by: EVGENY M <168018528+rjohny55@users.noreply.github.com >
Co-authored-by: mcoder6425 <mcoder64@gmail.com >
Co-authored-by: lemsn <lemsn@msn.com >
Co-authored-by: lemsn <lemsn@126.com >
Co-authored-by: Adrian Gora <47756404+adagora@users.noreply.github.com >
Co-authored-by: Womsxd <45663319+Womsxd@users.noreply.github.com >
Co-authored-by: FatMii <39074672+FatMii@users.noreply.github.com >
2025-10-09 12:36:19 +08:00
9e323a9351
Feat(nlp): add "怎么办" pattern to question word removal ( #10284 )
...
### What problem does this PR solve?
Added "怎么办" to the regex pattern in rmWWW method to improve query
cleaning by removing this common question phrase along with other
question words.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
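
A minimal sketch of question-word removal of the kind described above. The pattern below is a hypothetical, much shorter stand-in for the real regex in `rmWWW`, and `strip_question_words` is an illustrative name.

```python
import re

# Hypothetical pattern. "怎么办" is listed before "怎么" so the longer phrase
# is tried first; otherwise "怎么" would match and leave a stray "办" behind.
_QUESTION_WORDS = re.compile(r"(怎么办|为什么|什么|怎么|如何)")


def strip_question_words(query: str) -> str:
    """Remove common Chinese question words from a query string."""
    return _QUESTION_WORDS.sub("", query).strip()


print(strip_question_words("密码忘了怎么办"))
```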
2025-09-25 16:47:56 +08:00
ca9f30e1a1
Add tree_merge for law parsers, significantly outperforming hierarchical_merge ( #10202 )
...
### What problem does this PR solve?
Add tree_merge for law parsers, significantly outperforming
hierarchical_merge, solved: #8637
1. Add tree_merge for law parsers, including build_tree and get_tree via
DFS.
2. Add copyright statement to health_utils
### Type of change
- [x] Documentation Update
- [x] Performance Improvement
2025-09-22 16:33:21 +08:00
0d9c1f1c3c
Feat: dataflow supports Spreadsheet and Word processor document ( #9996 )
...
### What problem does this PR solve?
Dataflow supports Spreadsheet and Word processor document
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
2025-09-10 13:02:53 +08:00
e9ee9269f5
Feat: user defined prompt. ( #9972 )
...
### What problem does this PR solve?
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
2025-09-08 14:05:01 +08:00
4e16936fa4
Refactor: Use re compile for weight method ( #9929 )
...
### What problem does this PR solve?
Use re compile for the weight method
### Type of change
- [x] Refactoring
- [x] Performance Improvement
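
The general idea of this refactoring can be sketched as follows, assuming the weight method matched the same patterns on every call. The patterns, weights, and the name `token_weight` are hypothetical and are not the ones used in the repository.

```python
import re

# Compiled once at import time and reused on every call, instead of passing
# the raw pattern strings to re.match() inside the function each time.
_NUMERIC = re.compile(r"^[0-9,.]{2,}$")
_SHORT_ALPHA = re.compile(r"^[a-z]{1,2}$")


def token_weight(token: str) -> float:
    """Return an illustrative weight for a token (values are made up)."""
    if _NUMERIC.match(token):
        return 2.0
    if _SHORT_ALPHA.match(token):
        return 0.01
    return 1.0


print([token_weight(t) for t in ["2025", "ab", "search"]])
```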
2025-09-05 12:29:44 +08:00
c27172b3bc
Feat: init dataflow. ( #9791 )
...
### What problem does this PR solve?
#9790
Close #9782
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
2025-08-28 18:40:32 +08:00
5abd0bbac1
Fix typo ( #9766 )
...
### What problem does this PR solve?
As title
### Type of change
- [x] Refactoring
Signed-off-by: Jin Hai <haijin.chn@gmail.com >
2025-08-27 18:56:40 +08:00
b5b8032a56
Feat: Support metadata auto filter for Search. ( #9524 )
...
### What problem does this PR solve?
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
2025-08-19 10:27:24 +08:00
153e430b00
Feat: add meta data filter. ( #9405 )
...
### What problem does this PR solve?
#8531
#7417
#6761
#6573
#6477
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
2025-08-12 14:12:56 +08:00
0a0bfc02a0
Refactor:naive_merge_with_images close useless images ( #9296 )
...
### What problem does this PR solve?
Close unused images in naive_merge_with_images.
### Type of change
- [x] Refactoring
2025-08-07 11:07:29 +08:00
7efeaf6548
Fix:remove a img close which can not operate ( #9267 )
...
### What problem does this PR solve?
https://github.com/infiniflow/ragflow/issues/9149#issuecomment-3157129587
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2025-08-06 10:59:49 +08:00
667c5812d0
Fix:Repeated images when parsing markdown files with images ( #9196 )
...
### What problem does this PR solve?
https://github.com/infiniflow/ragflow/issues/9149
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2025-08-04 13:35:58 +08:00
d9fe279dde
Feat: Redesign and refactor agent module ( #9113 )
...
### What problem does this PR solve?
#9082 #6365
<u> **WARNING: this change is not compatible with older versions of the
`Agent` module, which means that an `Agent` created with an older version
will no longer work.**</u>
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
2025-07-30 19:41:09 +08:00
342a04ec8a
Added infinity rank_feature support ( #9044 )
...
### What problem does this PR solve?
Added infinity rank_feature support
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2025-07-29 09:14:23 +08:00
dbc2a8689a
Fix: no chunks parsed out for Law ( #8842 )
...
### What problem does this PR solve?
Fixes no chunks parsed out for Law. #5113
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2025-07-15 13:01:56 +08:00
f569401398
Fix: better_handle_different_types ( #8775 )
...
### What problem does this PR solve?
https://github.com/infiniflow/ragflow/issues/8719#issuecomment-3055883271
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2025-07-11 18:21:39 +08:00
00c954755e
Fix:use the same logic to handle pos in tokenize_chunks_with_images ( #8732 )
...
### What problem does this PR solve?
https://github.com/infiniflow/ragflow/issues/8719
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2025-07-09 09:31:40 +08:00
8af0d04ad0
Refactor:Improve the logic in search.py ( #8716 )
...
### What problem does this PR solve?
1. Remove the redundant pop logic, since the condition is already checked
by the if statement
2. Merge the logging logic
### Type of change
- [x] Refactoring
2025-07-08 12:32:01 +08:00
b705ff08fe
Refa: improve GraphRAG similarity sensitivity to numeric differences ( #8479 )
...
### What problem does this PR solve?
Improve GraphRAG similarity sensitivity to numeric differences. #8444 .
### Type of change
- [x] Refactoring
2025-06-25 16:20:59 +08:00
d4e6e2bd21
Fix: doc_aggs issue. ( #8418 )
...
### What problem does this PR solve?
#8406
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-23 14:54:01 +08:00
3d0b440e9f
fix(search.py):remove hard page_size ( #8242 )
...
### What problem does this PR solve?
Remove the restriction that forced similarity_threshold=0 and page_size=30
when doc_ids is not empty.
#8228
---------
Co-authored-by: shiqing.wusq <shiqing.wusq@dtzhejiang.com >
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com >
2025-06-13 14:56:25 +08:00
93f5df716f
Fix: order chunks from docx by positions. ( #7979 )
...
### What problem does this PR solve?
#7934
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2025-05-30 17:20:53 +08:00