### What problem does this PR solve?
Add get_uuid, download_img and hash_str2int into misc_utils.py
### Type of change
- [x] Refactoring
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
- rename rmSpace to remove_redundant_spaces
- move clean_markdown_block to common module
- add unit tests for remove_redundant_spaces and clean_markdown_block
### Type of change
- [x] Refactoring
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Improve file management. #10287.
Passed tests:
1. Create folder `A` and `B`.
2. Upload a file inside `A`, called `file`.
3. Create a KB, called `K`.
3. Link `file` to `K`.
4. Parse `file` inside of `K`. (OK)
5. Move `file` from `A` to `B`.
6. Parse `file` inside of `K`. (OK)
7. Move `file` from `B` to `A`.
8. Parse `file` inside of `K`. (OK)
9. Move entire folder `A` into `B`. (B -> A -> file)
10. Parse `file` inside of `K`. (OK)
11. Delete folder `B`.
12. All clear. (There is no document inside of `K`)
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Don't need rerank for infinity since Infinity normalizes each way score
before fusion.
### Type of change
- [x] Refactoring
### What problem does this PR solve?
- Admin service support SHOW SERVICE <id>.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
issue: #10241
### What problem does this PR solve?
- Admin client support drop user.
Issue: #10241
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Refactor import modules.
### Type of change
- [x] Refactoring
---------
Signed-off-by: jinhai <haijin.chn@gmail.com>
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Moved `signature_version` and `addressing_style` parameters to a
`Config` object from `botocore.config`
`signature_version` is now passed as `Config(signature_version='v4')`
`addressing_style` is now passed as `Config(s3={'addressing_style':
'path'})`
The `Config` object is then passed to `boto3.client()` via the `config`
parameter
## Changes Made
- Modified `rag/utils/s3_conn.py` in the `__open__()` method
- Updated parameter handling logic to use `config_kwargs` dictionary
- Maintained backward compatibility for configurations without these
parameters
## Related Issue
Fixes#9910
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Co-authored-by: Syed Shahmeer Ali <ashahmeer73@gmail.com>
### What problem does this PR solve?
This PR fixes a critical bug in the knowledge base isolation feature
where chat responses were referencing documents from incorrect knowledge
bases. The issue was in the `infinity_conn.py` file where the
`equivalent_condition_to_str()` function was incorrectly skipping
`kb_id` filtering, causing documents from unintended knowledge bases to
be included in search results.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: Syed Shahmeer Ali <ashahmeer73@gmail.com>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
Add missing env var `MYSQL_MAX_PACKET` to service_conf.yaml.template,
and add default values to opendal config to fix npe.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
#9082#6365
<u> **WARNING: it's not compatible with the older version of `Agent`
module, which means that `Agent` from older versions can not work
anymore.**</u>
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix a small non-blocking main workflow bug about chunk update When
OpenSearch is the doc engine.
When you wanna enable/disable a chunk in the web-page “Knowledge Base /
Dataset / Chunk”, the bug ocurred.
<img width="2388" height="662" alt="image"
src="https://github.com/user-attachments/assets/575987a0-c929-4589-bfa0-ba54e137cfd9"
/>
The reaseon why it ocurred is that some api params between OpenSearch
and ES differs. It functioned well no matter enable/disable/rewrite the
chunk after I fixed. I also checked the result when using the chat
web-page.
<img width="2394" height="660" alt="image"
src="https://github.com/user-attachments/assets/8b899dc6-d769-4e80-8dd8-ad0fbbca5f78"
/>
I will still focus on vector-database espeically OpenSearch.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Co-authored-by: 张雨豪 <zhangyh80@chinatelecom.cn>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
Use `quote_plus` to escape password in opendal's mysql url to support
special characters like `#`.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
when``` if 'signature_version' in self.s3_config:``` and ```if
'addressing_style' in self.s3_config:``` both true.
the config init is error, will be overwrite by last one.
this pr is for fix that case.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
### What problem does this PR solve?
Fix the config option name of the opendal table name and setting of
'max_allowed_packet'.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Signed-off-by: He Wang <wanghechn@qq.com>
### What problem does this PR solve?
Fix: the output log is incorrect
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
Co-authored-by: liang <xiaofeng.liang@landstech.com.cn>
**Context and Purpose:**
This PR automatically remediates a security vulnerability:
- **Description:** Detected possible formatted SQL query. Use
parameterized queries instead.
- **Rule ID:**
python.lang.security.audit.formatted-sql-query.formatted-sql-query
- **Severity:** HIGH
- **File:** rag/utils/opendal_conn.py
- **Lines Affected:** 98 - 98
This change is necessary to protect the application from potential
security risks associated with this vulnerability.
**Solution Implemented:**
The automated remediation process has applied the necessary changes to
the affected code in `rag/utils/opendal_conn.py` to resolve the
identified issue.
Please review the changes to ensure they are correct and integrate as
expected.
### What problem does this PR solve?
This PR fixes two issues in the OpenDAL storage connector:
1. The `health` method was missing, which prevented health checks on
the storage backend.
3. The initialization of the `opendal.Operator` object included a
redundant scheme parameter, causing unnecessary duplication and
potential confusion.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### Background
- The absence of a `health` method made it difficult to verify the
availability and reliability of the storage service.
- Initializing `opendal.Operator` with both `self._scheme` and
unpacked `**self._kwargs` could lead to errors or unexpected behavior
if the scheme was already included in the kwargs.
### What is changed and how it works?
- Adds a `health` method that writes a test file to verify storage
availability.
- Removes the duplicate scheme parameter from the `opendal.Operator`
initialization to ensure clarity and prevent conflicts.
before:
<img width="762" alt="企业微信截图_46be646f-2e99-4e5e-be67-b1483426e77c"
src="https://github.com/user-attachments/assets/acecbb8c-4810-457f-8342-6355148551ba"
/>
<img width="767" alt="image"
src="https://github.com/user-attachments/assets/147cd5a2-dde3-466b-a9c1-d1d4f0819e5d"
/>
after:
<img width="1123" alt="企业微信截图_09d62997-8908-4985-b89f-7a78b5da55ac"
src="https://github.com/user-attachments/assets/97dc88c9-0f4e-4d77-88b3-cd818e8da046"
/>
### What problem does this PR solve?
This PR resolves the inconsistency in the opendal configuration where
both `schema` and `scheme` were used as keys. The code and
configuration file now consistently use `scheme`, which helps prevent
configuration errors and runtime issues. This change improves code
clarity and maintainability.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### Additional context
- Updated both `conf/service_conf.yaml` and
`rag/utils/opendal_conn.py` to use `scheme` instead of `schema`
- No breaking changes to other configuration fields
### What problem does this PR solve?
#8074
Oss support opendal(including mysql)
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
This PR solve the problems metioned in the
pr(https://github.com/infiniflow/ragflow/pull/7140) which is also
submitted by me
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### Introduction
I fixed the problems when using OpenSearch as the DOC_ENGINE, the
failures of pytest and the wrong API's return.
Mainly about delete chunk, list chunks, update chunk, retrieval chunk.
The pytest comand "cd sdk/python && uv sync --python 3.10 --group test
--frozen && source .venv/bin/activate && cd test/test_http_api &&
DOC_ENGINE=opensearch pytest test_chunk_management_within_dataset -s
--tb=short " is finally successful.
###Others
As some changes between Elasticsearch And Opensearch differ, some pytest
results about OpenSearch are correct and resonable. However, some pytest
params (skipif params) are incompatible. So I changed some pytest params
about skipif.
As a search engine programmer, I will still focus on the usage of vector
databases (especially OpenSearch) for the RAG stuff.
Thanks for your review
### What problem does this PR solve?
Delete Corresponding Minio Bucket When Deleting a Knowledge Base
[issue #4113 ](https://github.com/infiniflow/ragflow/issues/4113)
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
When you removed any document in a knowledge base using knowledge graph,
the graph's `removed_kwd` is set to "Y".
However, in the function `graphrag.utils.get_gaph`, `rebuild_graph`
method is passed and directly return `None` while `removed_kwd=Y`,
making residual part of the graph abandoned (but old entity data still
exist in db).
Besides, infinity instance actually pass deleting graph components'
`source_id` when removing document. It may cause wrong graph after
rebuild.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
This PR adds the support for latest OpenSearch2.19.1 as the store engine
& search engine option for RAGFlow.
### Main Benefit
1. OpenSearch2.19.1 is licensed under the [Apache v2.0 License] which is
much better than Elasticsearch
2. For search, OpenSearch2.19.1 supports full-text
search、vector_search、hybrid_search those are similar with Elasticsearch
on schema
3. For store, OpenSearch2.19.1 stores text、vector those are quite
simliar with Elasticsearch on schema
### Changes
- Support opensearch_python_connetor. I make a lot of adaptions since
the schema and api/method between ES and Opensearch differs in many
ways(especially the knn_search has a significant gap) :
rag/utils/opensearch_coon.py
- Support static config adaptions by changing:
conf/service_conf.yaml、api/settings.py、rag/settings.py
- Supprt some store&search schema changes between OpenSearch and ES:
conf/os_mapping.json
- Support OpenSearch python sdk : pyproject.toml
- Support docker config for OpenSearch2.19.1 :
docker/.env、docker/docker-compose-base.yml、docker/service_conf.yaml.template
### How to use
- I didn't change the priority that ES as the default doc/search engine.
Only if in docker/.env , we set DOC_ENGINE=${DOC_ENGINE:-opensearch}, it
will work.
### Others
Our team tested a lot of docs in our environment by using OpenSearch as
the vector database ,it works very well.
All the conifg for OpenSearch is necessary.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: Yongteng Lei <yongtengrey@outlook.com>
Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
### What problem does this PR solve?
Hello, I encountered a problem when trying to use a S3 backend
(seaweedfs) for storage in RAGFlow: when calling
`STORAGE_IMPL.get("bucket", "key")`, the actual request sent to S3 is
`bucket/bucket/key`, causing a `NoSuchKey` error.
I compared the code in `s3_conn.py` to `minio_conn.py` and
`oss_conn.py`, then decided to remove the `else` branch in
`use_prefix_path` method, and it works. I didn't configure `prefix_path`
or `bucket` in `s3` section of the `service_conf.yaml`.
I think this is a bug, but not sure.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):