Commit Graph

497 Commits

Author SHA1 Message Date
6e1ebb2855 Fix: Optimize Prompts and Regex for use_sql() (#11148)
### What problem does this PR solve?

Fix: Optimize Prompts and Regex for use_sql() #11127 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-10 19:02:07 +08:00
d207291217 Fix: add download stats to kb logs. (#11112)
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-10 13:28:07 +08:00
dd1c8c5779 Feat: add auto parse to connector. (#11099)
### What problem does this PR solve?

#10953

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-07 16:49:29 +08:00
34283d4db4 Feat: add data source to pipleline logs . (#11075)
### What problem does this PR solve?

#10953

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-07 11:43:59 +08:00
b7aa6d6c4f Fix: add avatar for UI (#11080)
### What problem does this PR solve?

Add avatar for admin UI.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-07 09:27:31 +08:00
af98763e27 Admin: add 'show version' (#11079)
### What problem does this PR solve?

```
admin> show version;
show_version
+-----------------------+
| version               |
+-----------------------+
| v0.21.0-241-gc6cf58d5 |
+-----------------------+
admin> \q
Goodbye!

```

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-06 19:24:46 +08:00
3bd1fefe1f Feat: debug sync data. (#11073)
### What problem does this PR solve?

#10953 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-06 16:48:04 +08:00
adbb8319e0 Fix: add fields for logs. (#11039)
### What problem does this PR solve?

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-06 09:49:57 +08:00
f98b24c9bf Move api.settings to common.settings (#11036)
### What problem does this PR solve?

As title

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-06 09:36:38 +08:00
cd6ed4b380 Feat: add webhook component. (#11033)
### What problem does this PR solve?

#10427

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-05 19:59:23 +08:00
02d10f8eda Move var from rag.settings to common.globals (#11022)
### What problem does this PR solve?

As title.

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-05 15:48:50 +08:00
8584d4b642 Fix: numeric string miss transformation. (#11025)
### What problem does this PR solve?

#11024

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-05 15:14:30 +08:00
b86e07088b Fix: escape multi-steps issues. (#11016)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-05 14:51:00 +08:00
1a9215bc6f Move some vars to globals (#11017)
### What problem does this PR solve?

As title.

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-05 14:14:38 +08:00
96c015fb85 Fix and refactor imports (#11010)
### What problem does this PR solve?

1. Move EMBEDDING_CFG to common.globals
2. Fix error imports
3. Move signal handles to common/signal_utils.py

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-05 11:07:54 +08:00
bab3fce136 Move some constants to common (#11004)
### What problem does this PR solve?

As title.

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-05 08:01:39 +08:00
4bbbf92331 Refa: link connector to KB. (#10991)
### What problem does this PR solve?

#10953

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-04 20:13:52 +08:00
880a6a0428 Move some enumerate type to constants.py (#10998)
### What problem does this PR solve?

As title.

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-04 19:25:25 +08:00
16d2be623c Minor tweaks (#10987)
### What problem does this PR solve?

1. Rename identifier name
2. Fix some return statement
3. Fix some typos

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-04 14:15:31 +08:00
19f71a961a Fix: Create dataset performance unmatched between HTTP api and web ui (#10960)
### What problem does this PR solve?

Fix: Create dataset performance unmatched between HTTP api and web ui
#10925

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-04 13:45:14 +08:00
3e5a39482e Feat: Support multiple data sources synchronizations (#10954)
### What problem does this PR solve?
#10953

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-03 19:59:18 +08:00
fa210e7c58 Feat: parsing hyperlinks in docx and pdf & Fix: default parser config of toc extraction (#10877)
### What problem does this PR solve?

Feat: parsing hyperlinks in docx and pdf #10848
Fix: default parser config of toc extraction

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-03 09:34:12 +08:00
360f5c1179 Move token related functions to common (#10942)
### What problem does this PR solve?

As title

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-03 08:50:05 +08:00
44f2d6f5da Move 'get_project_base_directory' to common directory (#10940)
### What problem does this PR solve?

As title

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-02 21:05:28 +08:00
6447b737ab Move singleton to common directory (#10935)
### What problem does this PR solve?

As title

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-02 12:24:08 +08:00
f52e56c2d6 Remove 'get_lan_ip' and add common misc_utils.py (#10880)
### What problem does this PR solve?

Add get_uuid, download_img and hash_str2int into misc_utils.py

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-10-31 16:42:01 +08:00
fa38aed01b Fix: the input length exceeds the context length (#10895)
### What problem does this PR solve?

Fix: the input length exceeds the context length #10750

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-10-30 19:00:53 +08:00
55eb525fdc Feat: rename file to avoid package name conflict (#10863)
### What problem does this PR solve?

Feat: rename file to avoid package name conflict

### Type of change

- [x] Refactoring
2025-10-29 12:19:57 +08:00
5a200f7652 Add time utils (#10849)
### What problem does this PR solve?

- Add time utilities and unit tests

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-10-28 19:09:14 +08:00
766d900a41 Refactor: rename rmSpace to remove_redundant_spaces (#10796)
### What problem does this PR solve?

- rename rmSpace to remove_redundant_spaces
- move clean_markdown_block to common module
- add unit tests for remove_redundant_spaces and clean_markdown_block

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-10-28 09:46:32 +08:00
73144e278b Don't release full image (#10654)
### What problem does this PR solve?

Introduced gpu profile in .env
Added Dockerfile_tei
fix datrie
Removed LIGHTEN flag

### Type of change

- [x] Documentation Update
- [x] Refactoring
2025-10-23 23:02:27 +08:00
a82e9b3d91 Fix: can't upload image in ollama model #10447 (#10717)
### What problem does this PR solve?

Fix: can't upload image in ollama model #10447

### Type of change

- [X] Bug Fix (non-breaking change which fixes an issue)


### Change all `image=[]` to `image = None`

Changing `image=[]` to `images=None` avoids Python’s mutable default
parameter issue.
If you keep `images=[]`, all calls share the same list, so modifying it
(e.g., images.append()) will affect later calls.
Using images=None and creating a new list inside the function ensures
each call is independent.
This change does not affect current behavior — it simply makes the code
safer and more predictable.


把 `images=[]` 改成 `images=None` 是为了避免 Python 默认参数的可变对象问题。
如果保留 `images=[]`,所有调用都会共用同一个列表,一旦修改就会影响后续调用。
改成 None 并在函数内部重新创建列表,可以确保每次调用都是独立的。
这个修改不会影响现有运行结果,只是让代码更安全、更可控。
2025-10-22 12:24:12 +08:00
2d491188b8 Refa: improve flow of GraphRAG and RAPTOR (#10709)
### What problem does this PR solve?

Improve flow of GraphRAG and RAPTOR.

### Type of change

- [x] Refactoring
2025-10-22 09:29:20 +08:00
cfdd37820a Feat: Support attribute filtering #8703 (#10670)
### What problem does this PR solve?

Feat: Support attribute filtering #8703

### Type of change

- [X] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
Co-authored-by: writinwaters <cai.keith@gmail.com>
2025-10-21 10:38:40 +08:00
5b2e5dd334 Feat: Gemini supports video parsing (#10671)
### What problem does this PR solve?

Gemini supports video parsing.


![img_v3_02r8_adbd5adc-d665-4756-9a00-3ae0f12224fg](https://github.com/user-attachments/assets/30d8d296-c336-4b55-9823-803979e705ca)


![img_v3_02r8_ab60c046-1727-4029-ad2e-66097fd3ccbg](https://github.com/user-attachments/assets/441b1487-a970-427e-98b6-6e1e002f2bad)

Close: #10617

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-10-20 16:49:47 +08:00
8ee0b6ea54 File: Now parsing support all types of embedded documents, solved #10059 (#10635)
### What problem does this PR solve?

File: Now parsing support all types of embedded documents, solved #10059
Fix: Incomplete words in chat #10530
### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-10-17 18:46:47 +08:00
8a41057236 Fix: Add RetryingPooledPostgresqlDatabase to handle max_retries param (#10524)
## What problem does this PR solve?

Fixes the PostgreSQL connection error that prevents RAGFlow from
starting:

peewee.ProgrammingError: invalid dsn: invalid connection option
"max_retries"


## Problem Analysis

The `BaseDataBase` class in `api/db/db_models.py` adds `max_retries` and
`retry_delay` to the database configuration dict before passing it to
the database connection constructor.

- **MySQL**: Has `RetryingPooledMySQLDatabase` class that properly
extracts these custom parameters using `kwargs.pop()` before calling the
parent constructor
- **PostgreSQL**: Was using the base `PooledPostgresqlDatabase` class
which passes all parameters directly to `psycopg2.connect()`, which
doesn't recognize `max_retries` as a valid connection option

## Solution

Created `RetryingPooledPostgresqlDatabase` class that:
- Extracts `max_retries` and `retry_delay` parameters before
initialization
- Implements retry logic with exponential backoff for connection
failures
- Handles PostgreSQL-specific connection errors (connection refused,
server closed, etc.)
- Mirrors the existing `RetryingPooledMySQLDatabase` implementation

Updated the `PooledDatabase` enum to use the new retrying class for
PostgreSQL.

## Benefits

 Prevents invalid connection parameters from being passed to psycopg2  
 Adds automatic retry logic for PostgreSQL connection failures  
 Provides better error logging for PostgreSQL-specific issues  
 Maintains consistency between MySQL and PostgreSQL database handling  

## Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

## Testing

Tested with PostgreSQL database configuration and verified:
- Server starts without the "invalid dsn" error
- Database connections are established successfully
- Retry logic works correctly on connection failures

Co-authored-by: Andrea Bugeja <andrea.bugeja@gig.com>
2025-10-16 15:08:41 +08:00
86b254d214 Improve file management (#10577)
### What problem does this PR solve?

Improve file management. #10287.

Passed tests:

1. Create folder `A` and `B`.
2. Upload a file inside `A`, called `file`.
3. Create a KB, called `K`.
3. Link `file` to `K`.
4. Parse `file` inside of `K`. (OK)
5. Move `file` from `A` to `B`.
6. Parse `file` inside of `K`. (OK)
7. Move `file` from `B` to `A`.
8. Parse `file` inside of `K`. (OK)
9. Move entire folder `A` into `B`. (B -> A -> file)
10. Parse `file` inside of `K`. (OK)
11. Delete folder `B`.
12. All clear. (There is no document inside of `K`)

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-10-16 09:38:25 +08:00
16b5feadb7 Fix: canvas list with team. (#10549)
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-10-14 19:38:54 +08:00
f92a45dcc4 Feat: let toc run asynchronizly... (#10513)
### What problem does this PR solve?

#10436 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-10-14 14:14:52 +08:00
0283e4098f Fix #10408 (#10471)
### What problem does this PR solve?

Google Cloud model does not work correctly with gemini-2.5 models
Close #10408

### Type of change

- [X] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-10-10 19:18:24 +08:00
0d8791936e Feat: TOC retrieval (#10456)
### What problem does this PR solve?

#10436

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-10-10 17:07:55 +08:00
9b06734ced Feat: add total in List dataset API (#10448)
### What problem does this PR solve?

Feat: add total in List dataset API,  solved #10360 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-10-10 11:20:55 +08:00
d931c33ced Fix typos: retrievaler -> retriever (#10372)
### What problem does this PR solve?

Fix typos

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-10-10 09:17:36 +08:00
cbf04ee470 Feat: Use data pipeline to visualize the parsing configuration of the knowledge base (#10423)
### What problem does this PR solve?

#9869

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: jinhai <haijin.chn@gmail.com>
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
Co-authored-by: chanx <1243304602@qq.com>
Co-authored-by: balibabu <cike8899@users.noreply.github.com>
Co-authored-by: Lynn <lynn_inf@hotmail.com>
Co-authored-by: 纷繁下的无奈 <zhileihuang@126.com>
Co-authored-by: huangzl <huangzl@shinemo.com>
Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
Co-authored-by: Wilmer <33392318@qq.com>
Co-authored-by: Adrian Weidig <adrianweidig@gmx.net>
Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Yongteng Lei <yongtengrey@outlook.com>
Co-authored-by: Liu An <asiro@qq.com>
Co-authored-by: buua436 <66937541+buua436@users.noreply.github.com>
Co-authored-by: BadwomanCraZY <511528396@qq.com>
Co-authored-by: cucusenok <31804608+cucusenok@users.noreply.github.com>
Co-authored-by: Russell Valentine <russ@coldstonelabs.org>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Billy Bao <newyorkupperbay@gmail.com>
Co-authored-by: Zhedong Cen <cenzhedong2@126.com>
Co-authored-by: TensorNull <129579691+TensorNull@users.noreply.github.com>
Co-authored-by: TensorNull <tensor.null@gmail.com>
Co-authored-by: TeslaZY <TeslaZY@outlook.com>
Co-authored-by: Ajay <160579663+aybanda@users.noreply.github.com>
Co-authored-by: AB <aj@Ajays-MacBook-Air.local>
Co-authored-by: 天海蒼灆 <huangaoqin@tecpie.com>
Co-authored-by: He Wang <wanghechn@qq.com>
Co-authored-by: Atsushi Hatakeyama <atu729@icloud.com>
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
Co-authored-by: Mohamed Mathari <155896313+melmathari@users.noreply.github.com>
Co-authored-by: Mohamed Mathari <nocodeventure@Mac-mini-van-Mohamed.fritz.box>
Co-authored-by: Stephen Hu <stephenhu@seismic.com>
Co-authored-by: Shaun Zhang <zhangwfjh@users.noreply.github.com>
Co-authored-by: zhimeng123 <60221886+zhimeng123@users.noreply.github.com>
Co-authored-by: mxc <mxc@example.com>
Co-authored-by: Dominik Novotný <50611433+SgtMarmite@users.noreply.github.com>
Co-authored-by: EVGENY M <168018528+rjohny55@users.noreply.github.com>
Co-authored-by: mcoder6425 <mcoder64@gmail.com>
Co-authored-by: lemsn <lemsn@msn.com>
Co-authored-by: lemsn <lemsn@126.com>
Co-authored-by: Adrian Gora <47756404+adagora@users.noreply.github.com>
Co-authored-by: Womsxd <45663319+Womsxd@users.noreply.github.com>
Co-authored-by: FatMii <39074672+FatMii@users.noreply.github.com>
2025-10-09 12:36:19 +08:00
7c620bdc69 Fix: unexpected operation of document management (#10366)
### What problem does this PR solve?

 Unexpected operation of document management.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-09-30 15:20:38 +08:00
2d5d10ecbf Feat/admin drop user (#10342)
### What problem does this PR solve?

- Admin client support drop user.

Issue: #10241 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-09-29 10:16:13 +08:00
62b7c655c5 Refactor: migrate the function to specific file (#10201)
### What problem does this PR solve?

Move base64 related function to api/common/base64.py

### Type of change

- [x] Refactoring

---------

Signed-off-by: jinhai <haijin.chn@gmail.com>
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-09-25 23:37:50 +08:00
b0b866c8fd Refactor: move some functions out of api/utils/__init__.py (#10216)
### What problem does this PR solve?

Refactor import modules.

### Type of change

- [x] Refactoring

---------

Signed-off-by: jinhai <haijin.chn@gmail.com>
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-09-25 18:04:49 +08:00
3a831d0c28 Fixed the issue where database connections were interrupted under high concurrency (#10126)
### What problem does this PR solve?

Fixed the issue where database connections were interrupted under high
concurrency

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: lemsn <lemsn@126.com>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-09-25 17:03:43 +08:00