Commit Graph

921 Commits

Author SHA1 Message Date
0d8791936e Feat: TOC retrieval (#10456)
### What problem does this PR solve?

#10436

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-10-10 17:07:55 +08:00
5d167cd772 feat: support qwq reasoning models with non-stream output (#10468)
### What problem does this PR solve?
issue:
[#6193](https://github.com/infiniflow/ragflow/issues/6193)
change:
support qwq reasoning models with non-stream output

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-10-10 16:38:04 +08:00
c802a6ffdd Feat: Add prompts for toc relevance check according to #10436 (#10457)
### What problem does this PR solve?

Feat: Add prompts for toc relevance check according to #10436

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-10-10 11:44:46 +08:00
6ab4c1a6e9 Refactor: improve how NvidiaCV calculate res total token counts (#10455)
### What problem does this PR solve?
improve how NvidiaCV calculate res total token counts

### Type of change
- [x] Refactoring
2025-10-10 11:03:40 +08:00
8aabc2807c Feat: Pipeline Docx file supports Markdown output (#10439)
### What problem does this PR solve?

Pipeline Docx file supports Markdown output.

<img width="1242" height="755" alt="image"
src="https://github.com/user-attachments/assets/63cca75b-20b9-4a90-a01c-c0c2fccf1f2a"
/>

<img width="1227" height="717" alt="image"
src="https://github.com/user-attachments/assets/0dcb94b2-7ba0-48d5-9231-dc6e5c4b4192"
/>


### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-10-10 09:39:15 +08:00
d931c33ced Fix typos: retrievaler -> retriever (#10372)
### What problem does this PR solve?

Fix typos

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-10-10 09:17:36 +08:00
1a47e136e3 Feat: Adds a new feature that enables the LLM to extract a structured table of contents (TOC) directly from plain text. (#10428)
### What problem does this PR solve?

**Adds a new feature that enables the LLM to extract a structured table
of contents (TOC) directly from plain text.**
_This implementation prioritizes efficiency over reasoning — the model
runs in a strictly deterministic mode (thinking disabled) to minimize
latency.
As a result, overall performance may be less optimal, but the extraction
speed and consistency are guaranteed._

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-10-09 13:47:31 +08:00
cbf04ee470 Feat: Use data pipeline to visualize the parsing configuration of the knowledge base (#10423)
### What problem does this PR solve?

#9869

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: jinhai <haijin.chn@gmail.com>
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
Co-authored-by: chanx <1243304602@qq.com>
Co-authored-by: balibabu <cike8899@users.noreply.github.com>
Co-authored-by: Lynn <lynn_inf@hotmail.com>
Co-authored-by: 纷繁下的无奈 <zhileihuang@126.com>
Co-authored-by: huangzl <huangzl@shinemo.com>
Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
Co-authored-by: Wilmer <33392318@qq.com>
Co-authored-by: Adrian Weidig <adrianweidig@gmx.net>
Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Yongteng Lei <yongtengrey@outlook.com>
Co-authored-by: Liu An <asiro@qq.com>
Co-authored-by: buua436 <66937541+buua436@users.noreply.github.com>
Co-authored-by: BadwomanCraZY <511528396@qq.com>
Co-authored-by: cucusenok <31804608+cucusenok@users.noreply.github.com>
Co-authored-by: Russell Valentine <russ@coldstonelabs.org>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Billy Bao <newyorkupperbay@gmail.com>
Co-authored-by: Zhedong Cen <cenzhedong2@126.com>
Co-authored-by: TensorNull <129579691+TensorNull@users.noreply.github.com>
Co-authored-by: TensorNull <tensor.null@gmail.com>
Co-authored-by: TeslaZY <TeslaZY@outlook.com>
Co-authored-by: Ajay <160579663+aybanda@users.noreply.github.com>
Co-authored-by: AB <aj@Ajays-MacBook-Air.local>
Co-authored-by: 天海蒼灆 <huangaoqin@tecpie.com>
Co-authored-by: He Wang <wanghechn@qq.com>
Co-authored-by: Atsushi Hatakeyama <atu729@icloud.com>
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
Co-authored-by: Mohamed Mathari <155896313+melmathari@users.noreply.github.com>
Co-authored-by: Mohamed Mathari <nocodeventure@Mac-mini-van-Mohamed.fritz.box>
Co-authored-by: Stephen Hu <stephenhu@seismic.com>
Co-authored-by: Shaun Zhang <zhangwfjh@users.noreply.github.com>
Co-authored-by: zhimeng123 <60221886+zhimeng123@users.noreply.github.com>
Co-authored-by: mxc <mxc@example.com>
Co-authored-by: Dominik Novotný <50611433+SgtMarmite@users.noreply.github.com>
Co-authored-by: EVGENY M <168018528+rjohny55@users.noreply.github.com>
Co-authored-by: mcoder6425 <mcoder64@gmail.com>
Co-authored-by: lemsn <lemsn@msn.com>
Co-authored-by: lemsn <lemsn@126.com>
Co-authored-by: Adrian Gora <47756404+adagora@users.noreply.github.com>
Co-authored-by: Womsxd <45663319+Womsxd@users.noreply.github.com>
Co-authored-by: FatMii <39074672+FatMii@users.noreply.github.com>
2025-10-09 12:36:19 +08:00
dfc5fa1f4d Feat: add DeerAPI support (#10303)
### Related issues
#10078 

### What problem does this PR solve?
Integrate DeerAPI provider.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update

Co-authored-by: DeerAPI <tensor.null@gmail.com>
2025-10-09 11:14:49 +08:00
4585edc20e Refactor: improve cv model logics (#10414)
1. improve how to get total token count

Improve how to get total token count

### Type of change
- [x] Refactoring
2025-10-09 09:47:36 +08:00
518a00630e Fix highlight with infinity (#10345)
Fix highlight with infinity
Fix on OpenSUSE Tumbleweed

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-09-30 19:15:01 +08:00
fb950079ef Feat/service manage (#10381)
### What problem does this PR solve?

- Admin service support SHOW SERVICE <id>.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

issue: #10241
2025-09-30 16:23:09 +08:00
17757930a3 Feat: add support for international Dashscope service (#10356)
### What problem does this PR solve?

 Add support for international Dashscope service. #10340 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-09-29 14:49:45 +08:00
2d5d10ecbf Feat/admin drop user (#10342)
### What problem does this PR solve?

- Admin client support drop user.

Issue: #10241 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-09-29 10:16:13 +08:00
bd94b5dfb5 feat: add IBM DB2 support (#10306)
### What problem does this PR solve?

issue:#5617
change:add IBM DB2 support in ExeSQL 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-09-26 14:55:19 +08:00
ef59c5bab9 FIX: Rename the CometEmbed and CometSeq2txt classes to CometAPIEmbed and CometAPISeq2txt, and correct supported_models.mdx. (#10298)
### What problem does this PR solve?

Rename the CometEmbed and CometSeq2txt classes to CometAPIEmbed and
CometAPISeq2txt, and correct supported_models.mdx.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-09-26 10:50:56 +08:00
b0b866c8fd Refactor: move some functions out of api/utils/__init__.py (#10216)
### What problem does this PR solve?

Refactor import modules.

### Type of change

- [x] Refactoring

---------

Signed-off-by: jinhai <haijin.chn@gmail.com>
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-09-25 18:04:49 +08:00
9e323a9351 Feat(nlp): add "怎么办" pattern to question word removal (#10284)
### What problem does this PR solve?

Added "怎么办" to the regex pattern in rmWWW method to improve query
cleaning by removing this common question phrase along with other
question words.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-09-25 16:47:56 +08:00
daea357940 Fix: invalid COMPONENT_EXEC_TIMEOUT (#10278)
### What problem does this PR solve?

Fix invalid COMPONENT_EXEC_TIMEOUT. #10273

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-09-25 14:11:09 +08:00
193d93d820 Refactor: Improve the logic clean conf for ZhipuChat (#10274)
### What problem does this PR solve?
Improve the logic clean conf for ZhipuChat

### Type of change
- [x] Refactoring
2025-09-25 10:28:03 +08:00
a1f848bfe0 Fix:max_tokens must be at least 1, got -950, BadRequestError (#10252)
### What problem does this PR solve?
https://github.com/infiniflow/ragflow/issues/10235

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
2025-09-24 10:49:34 +08:00
38be53cf31 fix: prevent list index out of range in chat streaming (#10238)
### What problem does this PR solve?
issue:
[Bug]: ERROR: list index out of range #10188
change:
fix a potential list index out of range error in chat response parsing
by adding explicit checks for empty choices.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-09-23 19:59:39 +08:00
10cbbb76f8 revert gpt5 integration (#10228)
### What problem does this PR solve?

  Revert back to chat.completions.

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [x] Other (please describe):
  Revert back to chat.completions.
2025-09-23 16:06:12 +08:00
1c84d1b562 Fix: azure OpenAI retry (#10213)
### What problem does this PR solve?

Currently, Azure OpenAI returns one minute Quota limit responses when
chat API is utilized. This change is needed in order to be able to
process almost any documents using models deployed in Azure Foundry.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-09-23 12:19:28 +08:00
4eb7659499 Fix bug: broken import from rag.prompts.prompts (#10217)
### What problem does this PR solve?

Fix broken imports

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Signed-off-by: jinhai <haijin.chn@gmail.com>
2025-09-23 10:19:25 +08:00
da82566304 Fix: resolve hash collisions by switching to UUID &correct logic for always-true statements & Update GPT api integration & Support qianwen-deepresearch (#10208)
### What problem does this PR solve?

Fix: resolve hash collisions by switching to UUID &correct logic for
always-true statements, solved: #10165
Feat: Update GPT api integration, solved: #10204 
Feat: Support qianwen-deepresearch, solved: #10163 
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
2025-09-23 09:34:30 +08:00
94dbd4aac9 Refactor: use the same implement for total token count from res (#10197)
### What problem does this PR solve?
use the same implement for total token count from res

### Type of change

- [x] Refactoring
2025-09-22 17:17:06 +08:00
ca9f30e1a1 Add tree_merge for law parsers, significantly outperforming hierarchical_merge (#10202)
### What problem does this PR solve?
Add tree_merge for law parsers, significantly outperforming
hierarchical_merge, solved: #8637
1. Add tree_merge for law parsers, include build_tree and get_tree by
dfs.
2. add Copyright statement for helath_utils
### Type of change

- [x] Documentation Update
- [x] Performance Improvement
2025-09-22 16:33:21 +08:00
902703d145 Fix: skip tag query if tag kbs are invalid (#10168)
### What problem does this PR solve?

Skip `tag_query` step if `tag_kbs` are empty. 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-09-19 19:12:18 +08:00
70ce02faf4 Feat: add support for Anthropic third-party API (#10173)
### What problem does this PR solve?
issue:
[Bug]: anthropic model have not baseurl selecting,need add #8546
change:
This PR adds support for using Anthropic models through a third-party
API by allowing a custom base_url.
It ensures compatibility with both the official Anthropic endpoint and
external providers.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-09-19 19:06:14 +08:00
6c24ad7966 fix: correct rerank_model condition logic (#10174)
### What problem does this PR solve?

fix the rerank_model condition logic by correcting the np.isclose check.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-09-19 16:02:10 +08:00
4693c5382a Feat: migrate OpenAI-compatible chats to LiteLLM (#10148)
### What problem does this PR solve?

Migrate OpenAI-compatible chats to LiteLLM.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-09-18 17:16:59 +08:00
91b609447d Fix: embedding model failure in CometAPI (#10137)
### What problem does this PR solve?

Related PR:
Feat: add CometAPI to LLMFactory and update related mappings #10119 

Change:
Fixes the issue where the embedding model in CometAPI was not being
called correctly

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: TensorNull <tensor.null@gmail.com>
2025-09-18 14:49:47 +08:00
f12b9fdcd4 Feat: add CometAPI to LLMFactory and update related mappings (#10119)
### Related issues
#10078

### What problem does this PR solve?
Integrate CometAPI provider.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
2025-09-18 09:51:29 +08:00
ea0f1d47a5 Support image recognition for url links in Markdown file, fix log error in code_exec (#10139)
### What problem does this PR solve?

Support image recognition with image links in markdown files, solved
issue: #8755
Fixed log info error in code_exec, solved issue: #10064

### Type of change (8755)

- [x] New Feature (non-breaking change which adds functionality)

### Type of change (10064)

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-09-18 09:44:17 +08:00
d353f7f7f8 Feat/parse audio (#10133)
### What problem does this PR solve?

Dataflow support audio.  And fix giteeAI's sequence2text model. 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
2025-09-18 09:31:32 +08:00
152111fd9d Feat/parse img (#10112)
### What problem does this PR solve?

support parse image by OCR or VLM.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-09-16 17:53:37 +08:00
65571e5254 Feat: dataflow supports text (#10058)
### What problem does this PR solve?

dataflow supports text.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-09-11 19:03:51 +08:00
e1d86cfee3 Feat: add TokenPony model provider (#9932)
### What problem does this PR solve?

Add TokenPony as a LLM provider

Co-authored-by: huangzl <huangzl@shinemo.com>
2025-09-11 17:25:31 +08:00
3d39b96c6f Fix: token num exceed (#10046)
### What problem does this PR solve?

fix text input exceed token num limit when using siliconflow's embedding
model BAAI/bge-large-zh-v1.5 and BAAI/bge-large-en-v1.5, truncate before
input.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-09-11 12:02:12 +08:00
179091b1a4 Fix: In ragflow/rag/app /naive.py, if there are multiple images in one line, the other images will be lost (#9968)
### What problem does this PR solve?
https://github.com/infiniflow/ragflow/issues/9966

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-09-11 11:08:31 +08:00
1936ad82d2 Refactor:Improve BytesIO usage for GeminiCV (#10042)
### What problem does this PR solve?
Improve BytesIO usage for GeminiCV

### Type of change
- [x] Refactoring
2025-09-11 11:07:15 +08:00
127af4e45c Refactor:Improve BytesIO usage for image2base64 (#9997)
### What problem does this PR solve?

Improve BytesIO usage for image2base64

### Type of change

- [x] Refactoring
2025-09-10 15:55:33 +08:00
41cdba19ba Feat: dataflow supports markdown (#10003)
### What problem does this PR solve?

Dataflow supports markdown.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-09-10 13:31:02 +08:00
0d9c1f1c3c Feat: dataflow supports Spreadsheet and Word processor document (#9996)
### What problem does this PR solve?

Dataflow supports Spreadsheet and Word processor document

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-09-10 13:02:53 +08:00
38ff2ffc01 Fix: typo. (#10011)
### What problem does this PR solve?


### Type of change
- [x] Refactoring
2025-09-10 11:07:03 +08:00
906969fe4e Fix: exesql issue. (#9995)
### What problem does this PR solve?

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-09-09 19:45:10 +08:00
936f27e9e5 Feat: add LongCat-Flash-Chat (#9973)
### What problem does this PR solve?

Add LongCat-Flash-Chat from Meituan, deepseek v3.1 from SiliconFlow,
kimi-k2-09-05-preview and kimi-k2-turbo-preview from Moonshot.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-09-08 19:00:52 +08:00
e9ee9269f5 Feat: user defined prompt. (#9972)
### What problem does this PR solve?


### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-09-08 14:05:01 +08:00
91d6fb8061 Fix miscalculated token count (#9776)
### What problem does this PR solve?

The total token was incorrectly accumulated when using the
OpenAI-API-Compatible api.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-09-05 19:17:21 +08:00