Commit Graph

3270 Commits

Author SHA1 Message Date
c01237ec0f Fix: sandbox sandalone context error (#8340)
### What problem does this PR solve?

Fix sandbox sandalone context error. #8307.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-18 12:37:17 +08:00
371f61972d Feat: Add tool nodes and tool drop-down menu #3221 (#8335)
### What problem does this PR solve?

Feat: Add tool nodes and tool drop-down menu #3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-06-18 12:36:44 +08:00
6ce282d462 Feat: Add child nodes and their connecting lines by clicking #3221 (#8314)
### What problem does this PR solve?
Feat: Add child nodes and their connecting lines by clicking #3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-06-18 09:42:56 +08:00
4a2ff633e0 Fix typo in code (#8327)
### What problem does this PR solve?

Fix typo in code

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-06-18 09:41:09 +08:00
09b7ac26ad Doc: Update README badges (#8326)
### What problem does this PR solve?

- Highlight current language in README badges by changing color for
Traditional and Simplified Chinese

### Type of change

- [x] Documentation Update
2025-06-17 18:01:56 +08:00
0a13d79b94 Refa: Implement centralized file name length limit using FILE_NAME_LEN_LIMIT constant (#8318)
### What problem does this PR solve?

- Replace hardcoded 255-byte file name length checks with
FILE_NAME_LEN_LIMIT constant
- Update error messages to show the actual limit value
- #8290

### Type of change

- [x] Refactoring

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-06-17 18:01:30 +08:00
64e281b398 Fix: Add validation for empty filenames in document_app.py (#8321)
### What problem does this PR solve?

- Add validation for empty filenames in document_app.py and trim
whitespace

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-17 15:53:41 +08:00
307d5299e7 Feat: Add a child operator node by clicking the operator node anchor point #3221 (#8309)
### What problem does this PR solve?

Feat: Add a child operator node by clicking the operator node anchor
point #3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-06-17 11:57:07 +08:00
a9532cb9e7 Feat: add authorization header for MCP server based on OAuth 2.1 (#8292)
### What problem does this PR solve?

Add authorization header for MCP server based on [OAuth
2.1](https://datatracker.ietf.org/doc/html/draft-ietf-oauth-v2-1-12#section-5).

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-06-17 09:29:12 +08:00
efc3caf702 Feat: Modify the anchor point positioning of the classification operator node #3221 (#8299)
### What problem does this PR solve?

Feat: Modify the anchor point positioning of the classification operator
node #3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-06-17 09:28:30 +08:00
12303ff18f Update readme (#8304)
### What problem does this PR solve?

Update readme

### Type of change

- [x] Documentation Update
- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-06-16 21:14:50 +08:00
a3bebeb599 Fix: Enforce 255-byte filename limit (#8290)
### What problem does this PR solve?

- Add filename length validation (<=255 bytes) for document
upload/rename in both HTTP and SDK APIs
- Update error messages for consistency
- Fix comparison operator in SDK from '>=' to '>' for filename length
check

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-16 16:39:41 +08:00
bde76d2f55 Feat: Use the node ID as the key to destroy different types of form components to switch the form values ​​of the same type of operators #3221 (#8288)
### What problem does this PR solve?
Feat: Use the node ID as the key to destroy different types of form
components to switch the form values ​​of the same type of operators
#3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-06-16 16:28:20 +08:00
36ee1d271d Feat: Fixed the issue where the parameters could not be set after switching the large model parameter template. #8282 (#8283)
### What problem does this PR solve?

Feat: Fixed the issue where the parameters could not be set after
switching the large model parameter template. #8282

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-06-16 16:28:05 +08:00
601e024d77 Docs: add authorization header for MCP server based on OAuth 2.1 (#8293)
### What problem does this PR solve?

Add documentation of authorization header for MCP server based on OAuth
2.1

### Type of change

- [x] Documentation Update

---------

Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
2025-06-16 16:27:40 +08:00
6287efde18 Docs: add sandbox FAQ (#8284)
### What problem does this PR solve?

Add sandbox FAQ. 

#7699 #7973 #8049 #8196 #8226.

### Type of change

- [x] Documentation Update
- [x] Refactoring
2025-06-16 13:41:27 +08:00
8f9bcb1c74 Feat: make document parsing and embedding batch sizes configurable via environment variables (#8266)
### Description

This PR introduces two new environment variables, ‎`DOC_BULK_SIZE` and
‎`EMBEDDING_BATCH_SIZE`, to allow flexible tuning of batch sizes for
document parsing and embedding vectorization in RAGFlow. By making these
parameters configurable, users can optimize performance and resource
usage according to their hardware capabilities and workload
requirements.

### What problem does this PR solve?

Previously, the batch sizes for document parsing and embedding were
hardcoded, limiting the ability to adjust throughput and memory
consumption. This PR enables users to set these values via environment
variables (in ‎`.env`, Helm chart, or directly in the deployment
environment), improving flexibility and scalability for both small and
large deployments.

- ‎`DOC_BULK_SIZE`: Controls how many document chunks are processed in a
single batch during document parsing (default: 4).
- ‎`EMBEDDING_BATCH_SIZE`: Controls how many text chunks are processed
in a single batch during embedding vectorization (default: 16).

This change updates the codebase, documentation, and configuration files
to reflect the new options.

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
- [ ] Refactoring
- [x] Performance Improvement
- [ ] Other (please describe):

### Additional context
- Updated ‎`.env`, ‎`helm/values.yaml`, and documentation to describe
the new variables.
- Modified relevant code paths to use the environment variables instead
of hardcoded values.
- Users can now tune these parameters to achieve better throughput or
reduce memory usage as needed.

Before:
Default value:
<img width="643" alt="image"
src="https://github.com/user-attachments/assets/086e1173-18f3-419d-a0f5-68394f63866a"
/>
After:
10x:
<img width="777" alt="image"
src="https://github.com/user-attachments/assets/5722bbc0-0bcb-4536-b928-077031e550f1"
/>
2025-06-16 13:40:47 +08:00
b1117a8717 Fix: base url issue. (#8281)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-16 13:40:25 +08:00
0fa1a1469e Fix: avoid mixing different embedding models in document parsing (#8260)
### What problem does this PR solve?

Fix mixing different embedding models in document parsing.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-06-16 13:40:12 +08:00
dabbc852c8 Fix: opendal storage health attribute not found & remove duplicate operator scheme initialization (#8265)
### What problem does this PR solve?

This PR fixes two issues in the OpenDAL storage connector:
1. The ‎`health` method was missing, which prevented health checks on
the storage backend.
3. The initialization of the ‎`opendal.Operator` object included a
redundant scheme parameter, causing unnecessary duplication and
potential confusion.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

### Background
- The absence of a ‎`health` method made it difficult to verify the
availability and reliability of the storage service.
- Initializing ‎`opendal.Operator` with both ‎`self._scheme` and
unpacked ‎`**self._kwargs` could lead to errors or unexpected behavior
if the scheme was already included in the kwargs.

### What is changed and how it works?
- Adds a ‎`health` method that writes a test file to verify storage
availability.
- Removes the duplicate scheme parameter from the ‎`opendal.Operator`
initialization to ensure clarity and prevent conflicts.

before:
<img width="762" alt="企业微信截图_46be646f-2e99-4e5e-be67-b1483426e77c"
src="https://github.com/user-attachments/assets/acecbb8c-4810-457f-8342-6355148551ba"
/>
<img width="767" alt="image"
src="https://github.com/user-attachments/assets/147cd5a2-dde3-466b-a9c1-d1d4f0819e5d"
/>

after:
<img width="1123" alt="企业微信截图_09d62997-8908-4985-b89f-7a78b5da55ac"
src="https://github.com/user-attachments/assets/97dc88c9-0f4e-4d77-88b3-cd818e8da046"
/>
2025-06-16 11:35:51 +08:00
545ea229b6 Refa: Structure Ask Message (#8276)
### What problem does this PR solve?

Refactoring codes for SDK

### Type of change

- [x] Refactoring
2025-06-16 10:17:21 +08:00
df17294865 Docs: Sandbox quickstart (#8264)
### What problem does this PR solve?

### Type of change


- [x] Documentation Update
2025-06-16 09:33:01 +08:00
b8e3852d3b Feat: Reset the default values ​​of large model parameters (#8262)
### What problem does this PR solve?

Feat: Reset the default values ​​of large model parameters

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-06-16 09:29:31 +08:00
0bde5397d0 Feat: Modify the style of the canvas operator node #3221 (#8261)
### What problem does this PR solve?

Feat: Modify the style of the canvas operator node #3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-06-16 09:29:08 +08:00
f7074037ef Feat: Let number of task ahead be visible. (#8259)
### What problem does this PR solve?


![image](https://github.com/user-attachments/assets/d4ef0526-343a-426f-a85a-b05eb8b559a1)

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-06-13 17:32:40 +08:00
1aa991d914 Refa: Translate test file content from Chinese to English in file_utils.py (#8258)
### What problem does this PR solve?

Update all test file creation functions to use English text instead of
Chinese for consistency with the project's language standards. This
includes DOCX, Excel, PPT, PDF, TXT, MD, JSON, EML, and HTML test file
generators.

### Type of change

- [x] Update test case
2025-06-13 17:30:29 +08:00
b2eed8fed1 Fix: incorrect progress updating (#8253)
### What problem does this PR solve?

Progress is only updated if it's valid and not regressive.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-13 17:24:14 +08:00
0c0188b688 Fix: Update customer service template with query references to RewriteQuestion (#8252)
### What problem does this PR solve?

- Add query references to "RewriteQuestion:AllNightsSniff" in multiple
components
- Set "selected" to false for retrieval node

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-13 17:23:53 +08:00
6b58b67d12 Feat: Add canvas node toolbar #3221 (#8249)
### What problem does this PR solve?

Feat: Add canvas node toolbar #3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-06-13 16:52:52 +08:00
64af09ce7b Test: Add web API test suite for knowledge base operations (#8254)
### What problem does this PR solve?

- Implement RAGFlowWebApiAuth class for web API authentication
- Add comprehensive test cases for KB CRUD operations
- Set up common fixtures and utilities in conftest.py
- Add helper functions in common.py for web API requests

The changes establish a complete testing framework for knowledge base
management via web API endpoints.

### Type of change

- [x] Add test case
2025-06-13 16:39:10 +08:00
8f9e7a6f6f Refa: revert to original task message collection logic (#8251)
### What problem does this PR solve?

Get rid of 'RedisDB.get_unacked_iterator queue rag_flow_svr_queue_1
doesn't exist'

----

Edit: revert to original message collection logic.

### Type of change

- [x] Refactoring

---------

Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-06-13 16:38:53 +08:00
65d5268439 Feat: implement novitaAI embedding and reranking. (#8250)
### What problem does this PR solve?

Close #8227

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-06-13 15:42:17 +08:00
6aa0b0819d Fix: unify opendal config key from ‎schema to ‎scheme (#8232)
### What problem does this PR solve?

This PR resolves the inconsistency in the opendal configuration where
both ‎`schema` and ‎`scheme` were used as keys. The code and
configuration file now consistently use ‎`scheme`, which helps prevent
configuration errors and runtime issues. This change improves code
clarity and maintainability.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

### Additional context
- Updated both ‎`conf/service_conf.yaml` and
‎`rag/utils/opendal_conn.py` to use ‎`scheme` instead of ‎`schema`
- No breaking changes to other configuration fields
2025-06-13 14:56:51 +08:00
3d0b440e9f fix(search.py):remove hard page_size (#8242)
### What problem does this PR solve?

Fix the restriction of forcing similarity_threshold=0 and page_size=30
when doc_ids is not empty

#8228

---------

Co-authored-by: shiqing.wusq <shiqing.wusq@dtzhejiang.com>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-06-13 14:56:25 +08:00
800e263f64 Fix: Update customer_service.json (#8238)
### What problem does this PR solve?

The issue of reporting the 「Can't inference the where the component
input is. Please identify whose output is this component's input」error
when creating an Agent using the Customer service template has been
resolved.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-13 14:31:36 +08:00
ce65ea1fc1 Fix: Change allocate_container_blocking Calculate Time by async time (#8206)
### What problem does this PR solve?

Change allocate_container_blocking Calculate Time by async time

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-06-13 14:05:11 +08:00
2341939376 Docs: Miscellaneous editorial updates (#8237)
### What problem does this PR solve?


### Type of change


- [x] Documentation Update
2025-06-13 09:46:24 +08:00
a9d9215547 Feat: Connect conditional operators to other operators #3221 (#8231)
### What problem does this PR solve?

Feat: Connect conditional operators to other operators #3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-06-13 09:30:34 +08:00
99725444f1 Fix: desc parameter parsing (#8229)
### What problem does this PR solve?

- Fix boolean parsing for 'desc' parameter in kb_app.py to properly
handle string values

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-12 19:17:47 +08:00
1ab0f52832 Fix:The OpenAI-Compatible Agent API returns an incorrect message (#8177)
### What problem does this PR solve?

https://github.com/infiniflow/ragflow/issues/8175

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-12 19:17:15 +08:00
24ca4cc6b7 Refa: GraphRAG and explaining GraphRAG stalling behavior on large files (#8223)
### What problem does this PR solve?

This PR investigates the cause of #7957.

TL;DR: Incorrect similarity calculations lead to too many candidates.
Since candidate selection involves interaction with the LLM, this causes
significant delays in the program.

What this PR does:

1. **Fix similarity calculation**:
When processing a 64 pages government document, the corrected similarity
calculation reduces the number of candidates from over 100,000 to around
16,000. With a default batch size of 100 pairs per LLM call, this fix
reduces unnecessary LLM interactions from over 1,000 calls to around
160, a roughly 10x improvement.
2. **Add concurrency and timeout limits**: 
Up to 5 entity types are processed in "parallel", each with a 180-second
timeout. These limits may be configurable in future updates.
3. **Improve logging**:
The candidate resolution process now reports progress in real time.
4. **Mitigates potential concurrency risks**


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
2025-06-12 19:09:50 +08:00
d36c8d18b1 Refa: make exception more clear. (#8224)
### What problem does this PR solve?

#8156

### Type of change
- [x] Refactoring
2025-06-12 17:53:59 +08:00
86a1411b07 Refa: Test configs (#8220)
### What problem does this PR solve?

- Move common constants (HOST_ADDRESS, INVALID_API_TOKEN, etc.) to
configs.py
- Update test imports to use centralized configs
- Clean up duplicate constant definitions across test files

This improves maintainability by centralizing configuration.

### Type of change

- [x] Refactoring test case
2025-06-12 17:42:00 +08:00
54a465f9e8 Test: fix chunk deletion test assertions (#8222)
### What problem does this PR solve?

- Fix test assertions in test_delete_chunks.py to expect empty results
after deletion

Action 7619

### Type of change

- [x] Bug Fix test cases
2025-06-12 17:41:46 +08:00
bf7f7c7027 Feat: Display the connection lines between multiple conditions of the conditional operator #3221 (#8218)
### What problem does this PR solve?

Feat: Display the connection lines between multiple conditions of the
conditional operator #3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-06-12 17:11:24 +08:00
7fbbc9650d Fix: Move pagerank field from create to update dataset API (#8217)
### What problem does this PR solve?

- Remove pagerank from CreateDatasetReq and add to UpdateDatasetReq
- Add pagerank update logic in dataset update endpoint
- Update API documentation to reflect changes
- Modify related test cases and SDK references

#8208

This change makes pagerank a mutable property that can only be set after
dataset creation, and only when using elasticsearch as the doc engine.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-12 15:47:49 +08:00
d0c5ff04a6 Fix: Add pagerank validation for non-elasticsearch doc engines (#8215)
### What problem does this PR solve?

Validate that pagerank updates are only allowed when using elasticsearch
as the document engine. Return an error if pagerank is set while using a
different doc engine, preventing potential inconsistencies in document
scoring.

#8208

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-12 15:47:22 +08:00
d5236b71f4 Refa: ollama keep alive issue. (#8216)
### What problem does this PR solve?

#8122

### Type of change

- [x] Refactoring
2025-06-12 15:09:40 +08:00
e7c85e569b Fix: Improve TS Warning For http_api_reference.md (#8172)
### What problem does this PR solve?

https://github.com/infiniflow/ragflow/issues/8157
The current master code should work fine, but hI ave some warnings, so I
added a declare to improve the warning

### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-12 14:20:15 +08:00
84b4e32c34 Feat: The value selected in the Select component only displays the icon #3221 (#8209)
### What problem does this PR solve?
Feat: The value selected in the Select component only displays the icon
#3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-06-12 12:31:57 +08:00