Commit Graph

161 Commits

Author SHA1 Message Date
94dbd4aac9 Refactor: use the same implement for total token count from res (#10197)
### What problem does this PR solve?
use the same implement for total token count from res

### Type of change

- [x] Refactoring
2025-09-22 17:17:06 +08:00
ddaed541ff Fix S3 client initialization with signature_version and addressing_style (#9911)
### What problem does this PR solve?

Moved `signature_version` and `addressing_style` parameters to a
`Config` object from `botocore.config`
`signature_version` is now passed as `Config(signature_version='v4')`
`addressing_style` is now passed as `Config(s3={'addressing_style':
'path'})`
The `Config` object is then passed to `boto3.client()` via the `config`
parameter



## Changes Made
- Modified `rag/utils/s3_conn.py` in the `__open__()` method
- Updated parameter handling logic to use `config_kwargs` dictionary
- Maintained backward compatibility for configurations without these
parameters



## Related Issue
Fixes #9910


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: Syed Shahmeer Ali <ashahmeer73@gmail.com>
2025-09-05 09:58:30 +08:00
982ec24fa7 Fix kb isolation infinity conn (#9913)
### What problem does this PR solve?

This PR fixes a critical bug in the knowledge base isolation feature
where chat responses were referencing documents from incorrect knowledge
bases. The issue was in the `infinity_conn.py` file where the
`equivalent_condition_to_str()` function was incorrectly skipping
`kb_id` filtering, causing documents from unintended knowledge bases to
be included in search results.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Syed Shahmeer Ali <ashahmeer73@gmail.com>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-09-04 21:14:56 +08:00
c27172b3bc Feat: init dataflow. (#9791)
### What problem does this PR solve?

#9790

Close #9782

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-08-28 18:40:32 +08:00
5abd0bbac1 Fix typo (#9766)
### What problem does this PR solve?

As title

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-08-27 18:56:40 +08:00
4b1b68c5fc Fix: no doc hits after meta data filter. (#9435)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-08-13 12:43:31 +08:00
153e430b00 Feat: add meta data filter. (#9405)
### What problem does this PR solve?

#8531 
#7417 
#6761 
#6573
#6477

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-08-12 14:12:56 +08:00
a2e1f5618d Fix: bytes style image issue. (#9304)
### What problem does this PR solve?

#9302

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-08-07 15:20:01 +08:00
4fc9e42e74 fix: add missing env vars and default values of service_conf.yaml (#9289)
### What problem does this PR solve?

Add missing env var `MYSQL_MAX_PACKET` to service_conf.yaml.template,
and add default values to opendal config to fix npe.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-08-07 10:41:05 +08:00
9ca86d801e Refa: add provider info while adding model. (#9273)
### What problem does this PR solve?
#9248

### Type of change

- [x] Refactoring
2025-08-07 09:40:42 +08:00
a249803961 Refa: ensure Redis stream queue could be created properly (#9223)
### What problem does this PR solve?

Ensure Redis queue could be created properly.

### Type of change

- [x] Refactoring
2025-08-05 09:54:31 +08:00
6ec3f18e22 Fix: self-deployed LLM error, (#9217)
### What problem does this PR solve?

Close #9197
Close #9145

### Type of change

- [x] Refactoring
- [x] Bug fixing.
2025-08-05 09:49:47 +08:00
d9fe279dde Feat: Redesign and refactor agent module (#9113)
### What problem does this PR solve?

#9082 #6365

<u> **WARNING: it's not compatible with the older version of `Agent`
module, which means that `Agent` from older versions can not work
anymore.**</u>

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-07-30 19:41:09 +08:00
342a04ec8a Added infinity rank_feature support (#9044)
### What problem does this PR solve?

Added infinity rank_feature support

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-07-29 09:14:23 +08:00
49f3f26622 Bug fix: OpenSearch chunk update some api error (#9032)
### What problem does this PR solve?

Fix a small non-blocking main workflow bug about chunk update When
OpenSearch is the doc engine.
When you wanna enable/disable a chunk in the web-page “Knowledge Base /
Dataset / Chunk”, the bug ocurred.
<img width="2388" height="662" alt="image"
src="https://github.com/user-attachments/assets/575987a0-c929-4589-bfa0-ba54e137cfd9"
/>

The reaseon why it ocurred is that some api params between OpenSearch
and ES differs. It functioned well no matter enable/disable/rewrite the
chunk after I fixed. I also checked the result when using the chat
web-page.

<img width="2394" height="660" alt="image"
src="https://github.com/user-attachments/assets/8b899dc6-d769-4e80-8dd8-ad0fbbca5f78"
/>

I will still focus on vector-database espeically OpenSearch.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: 张雨豪 <zhangyh80@chinatelecom.cn>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-07-25 09:57:24 +08:00
175f5eaa90 use quote_plus to escape password in opendal's mysql url (#8976)
### What problem does this PR solve?

Use `quote_plus` to escape password in opendal's mysql url to support
special characters like `#`.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-07-23 10:17:34 +08:00
fd97ce3e5a fix s3 init config . (#8886)
### What problem does this PR solve?

when``` if 'signature_version' in self.s3_config:``` and ```if
'addressing_style' in self.s3_config:``` both true.
the config init is error, will be overwrite by last one. 

this pr is for fix that case.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2025-07-17 12:10:15 +08:00
c642dbefca Perf: Enhance timeout handling. (#8826)
### What problem does this PR solve?


### Type of change

- [x] Performance Improvement
2025-07-15 09:36:45 +08:00
1e6bda735a Fix: add ES re-connect once request timeout. (#8678)
### What problem does this PR solve?

#8669

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-07-07 09:22:25 +08:00
62b63acbb5 Refa: more robust mcp tool call (#8631)
### What problem does this PR solve?

More robust MCP tool call conn.

### Type of change

- [x] Refactoring
2025-07-02 18:37:54 +08:00
fffb7c0bba Fix: anthropic llm issue. (#8633)
### What problem does this PR solve?

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-07-02 18:37:34 +08:00
695bfe34a2 fix opendal config 'oss_table' and 'max_allowed_packet' (#8611)
### What problem does this PR solve?

Fix the config option name of the opendal table name and setting of
'max_allowed_packet'.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Signed-off-by: He Wang <wanghechn@qq.com>
2025-07-02 16:45:01 +08:00
32f8b3ad77 Fix: the output log is incorrect (#8577)
### What problem does this PR solve?

Fix: the output log is incorrect

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: liang <xiaofeng.liang@landstech.com.cn>
2025-07-01 10:49:43 +08:00
8801de2772 Refa: change mcp_client module to rag/utils/conn (#8578)
### What problem does this PR solve?

Change mcp_client module to rag/utils/conn.

### Type of change

- [x] Refactoring
2025-07-01 09:29:19 +08:00
f0e0783618 Fix: Database Query Vulnerable to Injection Attacks in rag/utils/opendal_conn.py (#8408)
**Context and Purpose:**

This PR automatically remediates a security vulnerability:
- **Description:** Detected possible formatted SQL query. Use
parameterized queries instead.
- **Rule ID:**
python.lang.security.audit.formatted-sql-query.formatted-sql-query
- **Severity:** HIGH
- **File:** rag/utils/opendal_conn.py
- **Lines Affected:** 98 - 98

This change is necessary to protect the application from potential
security risks associated with this vulnerability.

**Solution Implemented:**

The automated remediation process has applied the necessary changes to
the affected code in `rag/utils/opendal_conn.py` to resolve the
identified issue.

Please review the changes to ensure they are correct and integrate as
expected.
2025-06-23 14:54:25 +08:00
4784aa5b0b fix: List Chunks API fails to return the correct document status. (#8347)
### What problem does this PR solve?

The existing
/api/v1/datasets/{dataset_id}/documents/{document_id}/chunks endpoint
fails to accurately return a document's chunk status. Even when a chunk
is explicitly marked as unavailable, the API still returns true.

![img_v3_02nc_3458a1b7-609e-4f20-8cb7-2156a489848g](https://github.com/user-attachments/assets/ab3b8f69-1284-49c1-8af3-bdfae3416583)

![img_v3_02nc_82f1d96e-7596-4def-ba75-5a2bd10d56cg](https://github.com/user-attachments/assets/a8a4162b-b50d-4dfc-af72-e1d7812a0a93)

Co-authored-by: zhoudeyong <zhoudeyong@idr.ai>
2025-06-19 11:12:53 +08:00
dabbc852c8 Fix: opendal storage health attribute not found & remove duplicate operator scheme initialization (#8265)
### What problem does this PR solve?

This PR fixes two issues in the OpenDAL storage connector:
1. The ‎`health` method was missing, which prevented health checks on
the storage backend.
3. The initialization of the ‎`opendal.Operator` object included a
redundant scheme parameter, causing unnecessary duplication and
potential confusion.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

### Background
- The absence of a ‎`health` method made it difficult to verify the
availability and reliability of the storage service.
- Initializing ‎`opendal.Operator` with both ‎`self._scheme` and
unpacked ‎`**self._kwargs` could lead to errors or unexpected behavior
if the scheme was already included in the kwargs.

### What is changed and how it works?
- Adds a ‎`health` method that writes a test file to verify storage
availability.
- Removes the duplicate scheme parameter from the ‎`opendal.Operator`
initialization to ensure clarity and prevent conflicts.

before:
<img width="762" alt="企业微信截图_46be646f-2e99-4e5e-be67-b1483426e77c"
src="https://github.com/user-attachments/assets/acecbb8c-4810-457f-8342-6355148551ba"
/>
<img width="767" alt="image"
src="https://github.com/user-attachments/assets/147cd5a2-dde3-466b-a9c1-d1d4f0819e5d"
/>

after:
<img width="1123" alt="企业微信截图_09d62997-8908-4985-b89f-7a78b5da55ac"
src="https://github.com/user-attachments/assets/97dc88c9-0f4e-4d77-88b3-cd818e8da046"
/>
2025-06-16 11:35:51 +08:00
6aa0b0819d Fix: unify opendal config key from ‎schema to ‎scheme (#8232)
### What problem does this PR solve?

This PR resolves the inconsistency in the opendal configuration where
both ‎`schema` and ‎`scheme` were used as keys. The code and
configuration file now consistently use ‎`scheme`, which helps prevent
configuration errors and runtime issues. This change improves code
clarity and maintainability.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

### Additional context
- Updated both ‎`conf/service_conf.yaml` and
‎`rag/utils/opendal_conn.py` to use ‎`scheme` instead of ‎`schema`
- No breaking changes to other configuration fields
2025-06-13 14:56:51 +08:00
44287fb05f Oss support opendal(including mysql) (#8204)
### What problem does this PR solve?

#8074
Oss support opendal(including mysql)

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-06-12 11:37:42 +08:00
60ab7027c0 fix: allow to do role auth for S3 bucket use. (#8149)
### What problem does this PR solve?

Close #8148 .

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-06-10 10:50:07 +08:00
5d6bf2224a Fix: Opensearch chunk management (#7802)
### What problem does this PR solve?

This PR solve the problems metioned in the
pr(https://github.com/infiniflow/ragflow/pull/7140) which is also
submitted by me

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):


### Introduction
I fixed the problems when using OpenSearch as the DOC_ENGINE, the
failures of pytest and the wrong API's return.
Mainly about delete chunk, list chunks, update chunk, retrieval chunk.
The pytest comand "cd sdk/python && uv sync --python 3.10 --group test
--frozen && source .venv/bin/activate && cd test/test_http_api &&
DOC_ENGINE=opensearch pytest test_chunk_management_within_dataset -s
--tb=short " is finally successful.

###Others
As some changes between Elasticsearch And Opensearch differ, some pytest
results about OpenSearch are correct and resonable. However, some pytest
params (skipif params) are incompatible. So I changed some pytest params
about skipif.

As a search engine programmer, I will still focus on the usage of vector
databases (especially OpenSearch) for the RAG stuff.
Thanks for your review
2025-05-26 16:57:58 +08:00
2f4d803db1 Delete Corresponding Minio Bucket When Deleting a Knowledge Base (#7841)
### What problem does this PR solve?

Delete Corresponding Minio Bucket When Deleting a Knowledge Base
[issue #4113 ](https://github.com/infiniflow/ragflow/issues/4113)

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
2025-05-26 10:02:51 +08:00
ab27609a64 Fix: whole knowledge graph lost after removing any document in the knowledge base (#7151)
### What problem does this PR solve?

When you removed any document in a knowledge base using knowledge graph,
the graph's `removed_kwd` is set to "Y".
However, in the function `graphrag.utils.get_gaph`, `rebuild_graph`
method is passed and directly return `None` while `removed_kwd=Y`,
making residual part of the graph abandoned (but old entity data still
exist in db).

Besides, infinity instance actually pass deleting graph components'
`source_id` when removing document. It may cause wrong graph after
rebuild.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-30 09:43:17 +08:00
c8c3b756b0 Feat: Adds OpenSearch2.19.1 as the vector_database support (#7140)
### What problem does this PR solve?

This PR adds the support for latest OpenSearch2.19.1 as the store engine
& search engine option for RAGFlow.

### Main Benefit

1. OpenSearch2.19.1 is licensed under the [Apache v2.0 License] which is
much better than Elasticsearch
2. For search, OpenSearch2.19.1 supports full-text
search、vector_search、hybrid_search those are similar with Elasticsearch
on schema
3. For store, OpenSearch2.19.1 stores text、vector those are quite
simliar with Elasticsearch on schema

### Changes

- Support opensearch_python_connetor. I make a lot of adaptions since
the schema and api/method between ES and Opensearch differs in many
ways(especially the knn_search has a significant gap) :
rag/utils/opensearch_coon.py
- Support static config adaptions by changing:
conf/service_conf.yaml、api/settings.py、rag/settings.py
- Supprt some store&search schema changes between OpenSearch and ES:
conf/os_mapping.json
- Support OpenSearch python sdk : pyproject.toml
- Support docker config for OpenSearch2.19.1 :
docker/.env、docker/docker-compose-base.yml、docker/service_conf.yaml.template

### How to use
- I didn't change the priority that ES as the default doc/search engine.
Only if in docker/.env , we set DOC_ENGINE=${DOC_ENGINE:-opensearch}, it
will work.


### Others
Our team tested a lot of docs in our environment by using OpenSearch as
the vector database ,it works very well.
All the conifg for OpenSearch is necessary.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Yongteng Lei <yongtengrey@outlook.com>
Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2025-04-24 16:03:31 +08:00
8362ab405c Fix: don't modify S3 file name when not using prefix_path (#7152)
### What problem does this PR solve?

Hello, I encountered a problem when trying to use a S3 backend
(seaweedfs) for storage in RAGFlow: when calling
`STORAGE_IMPL.get("bucket", "key")`, the actual request sent to S3 is
`bucket/bucket/key`, causing a `NoSuchKey` error.

I compared the code in `s3_conn.py` to `minio_conn.py` and
`oss_conn.py`, then decided to remove the `else` branch in
`use_prefix_path` method, and it works. I didn't configure `prefix_path`
or `bucket` in `s3` section of the `service_conf.yaml`.

I think this is a bug, but not sure.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
2025-04-21 11:55:50 +08:00
d4dbdfb61d feat: Recover pending tasks while pod restart. (#7073)
### What problem does this PR solve?

If you deploy Ragflow using Kubernetes, the hostname will change during
a rolling update. This causes the consumer name of the task executor to
change, making it impossible to schedule tasks that were previously in a
pending state.
To address this, I introduced a recovery task that scans these pending
messages and re-publishes them, allowing the tasks to continue being
processed.

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

---------

Co-authored-by: liuzhenghua-jk <liuzhenghua-jk@360shuke.com>
2025-04-19 16:18:51 +08:00
9e7d052c8d Fix: knowledge graph resolution with infinity raise error tokenizing in specific situations (#7048)
### What problem does this PR solve?

When running graph resolution with infinity, if single quotation marks
appeared in the entities name that to be delete, an error tokenizing of
sqlglot might occur after calling infinity.

For example:
```
INFINITY delete table ragflow_xxx, filter knowledge_graph_kwd IN ('entity') AND entity_kwd IN ('86 IMAGES FROM PREVIOUS CONTESTS', 'ADAM OPTIMIZATION', 'BACKGROUND'ESTIMATION')
```
may raise error
```
Error tokenizing 'TS', 'ADAM OPTIMIZATION', 'BACKGROUND'ESTIMATION''
```
and make the document parsing failed。

Replace one single quotation mark with double single quotation marks can
let sqlglot tokenize the entity name correctly.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-17 16:15:21 +08:00
fdc410e743 Fix set_graph on non-existing edge (#6777)
### What problem does this PR solve?

Fix set_graph on non-existing edge

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-03 11:09:04 +08:00
a73fbc61ff Fix: Handle the case of deleting empty blocks. Update the relevant message (#6643)
…gic to return the correct deletion message. Add handling for empty
arrays to ensure no errors occur during the deletion operation. Update
the test cases to verify the new logic.

### What problem does this PR solve?

fix this bug:https://github.com/infiniflow/ragflow/issues/6607

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: wenju.li <wenju.li@deepctr.cn>
2025-04-02 19:20:17 +08:00
e7a2a4b7ff Log llm response on exception (#6750)
### What problem does this PR solve?

Log llm response on exception

### Type of change

- [x] Refactoring
2025-04-02 17:10:57 +08:00
e2b66628f4 Feat: extend S3 storage compatibility and add knowledge base ID prefix (#6355)
### What problem does this PR solve?

- Added support for S3-compatible protocols.
- Enabled the use of knowledge base ID as a file prefix when storing
files in S3.
- Updated docker/README.md to include detailed S3 and OSS configuration
instructions.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-03-31 16:09:43 +08:00
65a8cd1772 Fix knowledge_graph_kwd on infinity. Close #6476 and #6624 (#6651)
### What problem does this PR solve?

Fix knowledge_graph_kwd on infinity. Close #6476 and #6624

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-28 22:05:40 +08:00
82ccbd2cba fix:  Remove unnecessary minio initialization (#6544)
### What problem does this PR solve?

Prevent applications from failing to start due to calling non-existent
or incorrect Minio connection configurations when using file storage
outside of Minio

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-03-27 09:54:25 +08:00
6bf26e2a81 Optimize graphrag again (#6513)
### What problem does this PR solve?

Removed set_entity and set_relation to avoid accessing doc engine during
graph computation.
Introduced GraphChange to avoid writing unchanged chunks.

### Type of change

- [x] Performance Improvement
2025-03-26 15:34:42 +08:00
ca9c3e59fa Call register_scripts on connecting redis (#6361)
### What problem does this PR solve?

Call register_scripts on connecting redis

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-20 23:20:37 +08:00
dba0caa00b Fix update_progress (#6340)
### What problem does this PR solve?

Fix update_progress

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-20 17:01:28 +08:00
bb869aca33 Fix get_unacked_iterator (#6280)
### What problem does this PR solve?

Fix get_unacked_iterator. Close #6132 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-19 17:46:58 +08:00
49086964b8 Fix: type violations. (#6262)
### What problem does this PR solve?

#6238
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-19 12:12:34 +08:00
dd81c30976 Fix: tag_feas deletion error. (#6257)
### What problem does this PR solve?

#6218

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-19 11:25:11 +08:00
6e8d0e3177 Fix: rank feat issue. (#6225)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-18 16:07:29 +08:00