Feat: Add OpenSearch 2.19.1 as a vector database option (#7140)

### What problem does this PR solve?

This PR adds support for the latest OpenSearch release, 2.19.1, as a
document store and search engine option for RAGFlow.

### Main Benefit

1. OpenSearch 2.19.1 is licensed under the [Apache v2.0 License], which is
more permissive than Elasticsearch's licensing
2. For search, OpenSearch 2.19.1 supports full-text search, vector search,
and hybrid search, with a schema similar to Elasticsearch's
3. For storage, OpenSearch 2.19.1 stores text and vectors with a schema
quite similar to Elasticsearch's
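
To illustrate how close the two storage schemas are, here is a minimal sketch of an index body with one text field and one vector field for each engine. The helper `vector_field_mapping` and the field names `content` and `embedding` are hypothetical and are not taken from conf/os_mapping.json; only the mapping types (`knn_vector` with `dimension` in OpenSearch, `dense_vector` with `dims` in Elasticsearch) and the `index.knn` setting are standard.

```python
from typing import Any, Dict


def vector_field_mapping(engine: str, dim: int) -> Dict[str, Any]:
    """Build an index body with one text field and one vector field.

    The field names ("content", "embedding") are illustrative only.
    """
    if engine == "opensearch":
        return {
            # OpenSearch needs k-NN enabled at the index level.
            "settings": {"index": {"knn": True}},
            "mappings": {
                "properties": {
                    "content": {"type": "text"},
                    "embedding": {"type": "knn_vector", "dimension": dim},
                }
            },
        }
    if engine == "elasticsearch":
        return {
            "mappings": {
                "properties": {
                    "content": {"type": "text"},
                    "embedding": {"type": "dense_vector", "dims": dim},
                }
            },
        }
    raise ValueError(f"unsupported engine: {engine!r}")
```

The text field is identical in both bodies; only the vector field type, its dimension key, and the index-level k-NN setting differ.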

### Changes

- Support the OpenSearch Python connector. I made many adaptations because
the schema and API methods differ between ES and OpenSearch in many
ways (k-NN search in particular differs significantly):
rag/utils/opensearch_coon.py
- Support static config adaptations by changing:
conf/service_conf.yaml, api/settings.py, rag/settings.py
- Support store and search schema changes between OpenSearch and ES:
conf/os_mapping.json
- Support the OpenSearch Python SDK: pyproject.toml
- Support Docker config for OpenSearch 2.19.1:
docker/.env, docker/docker-compose-base.yml, docker/service_conf.yaml.template
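
The k-NN gap mentioned in the first item can be sketched by comparing the request bodies the two engines expect. This is not the code in rag/utils/opensearch_coon.py; `knn_search_body` is a hypothetical helper, and the `num_candidates` heuristic is an assumption chosen only for illustration. It only builds dictionaries and makes no network calls.

```python
from typing import Any, Dict, List


def knn_search_body(engine: str, field: str, vector: List[float], k: int) -> Dict[str, Any]:
    """Build a k-NN search request body for the given engine."""
    if engine == "opensearch":
        # OpenSearch: k-NN is a query clause keyed by the field name.
        return {
            "size": k,
            "query": {"knn": {field: {"vector": vector, "k": k}}},
        }
    if engine == "elasticsearch":
        # Elasticsearch 8.x: k-NN is a top-level search option.
        return {
            "knn": {
                "field": field,
                "query_vector": vector,
                "k": k,
                # Assumed heuristic; tune for your own recall/latency needs.
                "num_candidates": max(10 * k, 100),
            }
        }
    raise ValueError(f"unsupported engine: {engine!r}")
```

Because the vector clause sits in a different place and uses different key names, a connector cannot simply reuse the Elasticsearch request and has to translate it.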

### How to use
- ES remains the default doc/search engine; I didn't change that priority.
OpenSearch is used only if DOC_ENGINE=${DOC_ENGINE:-opensearch} is set in
docker/.env.
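
As a rough sketch of the opt-in behavior described above, the engine can be resolved from the `DOC_ENGINE` environment variable, with Elasticsearch as the fallback. `resolve_doc_engine` and the `SUPPORTED_ENGINES` tuple are hypothetical names; the actual logic in rag/settings.py may differ and may accept other engines.

```python
import os
from typing import Mapping, Optional

DEFAULT_DOC_ENGINE = "elasticsearch"
# Assumption: only the two engines discussed in this PR are listed here.
SUPPORTED_ENGINES = ("elasticsearch", "opensearch")


def resolve_doc_engine(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the configured doc engine, defaulting to Elasticsearch."""
    env = os.environ if env is None else env
    # Treat unset and empty values alike, as shell ${VAR:-default} does.
    raw = env.get("DOC_ENGINE") or DEFAULT_DOC_ENGINE
    engine = raw.strip().lower()
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(f"unsupported DOC_ENGINE: {raw!r}")
    return engine
```

With no `DOC_ENGINE` set, this returns `"elasticsearch"`, so existing deployments keep their current behavior.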


### Others
Our team tested many documents in our environment using OpenSearch as
the vector database, and it works very well.
All of the config for OpenSearch is necessary.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Yongteng Lei <yongtengrey@outlook.com>
Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
This commit is contained in:
pyyuhao
2025-04-24 16:03:31 +08:00
committed by GitHub
parent 9a8dda8fc7
commit c8c3b756b0
10 changed files with 865 additions and 0 deletions

26
uv.lock generated

@@ -1324,6 +1324,14 @@ wheels = [
{ url = "https://mirrors.aliyun.com/pypi/packages/c1/8b/5fe2cc11fee489817272089c4203e679c63b570a5aaeb18d852ae3cbba6a/et_xmlfile-2.0.0-py3-none-any.whl", hash = "sha256:7a91720bc756843502c3b7504c77b8fe44217c85c537d85037f0f536151b2caa" },
]
[[package]]
name = "events"
version = "0.5"
source = { registry = "https://mirrors.aliyun.com/pypi/simple" }
wheels = [
{ url = "https://mirrors.aliyun.com/pypi/packages/25/ed/e47dec0626edd468c84c04d97769e7ab4ea6457b7f54dcb3f72b17fcd876/Events-0.5-py3-none-any.whl", hash = "sha256:a7286af378ba3e46640ac9825156c93bdba7502174dd696090fdfcd4d80a1abd" },
]
[[package]]
name = "exceptiongroup"
version = "1.2.2"
@@ -3652,6 +3660,22 @@ wheels = [
{ url = "https://mirrors.aliyun.com/pypi/packages/c0/da/977ded879c29cbd04de313843e76868e6e13408a94ed6b987245dc7c8506/openpyxl-3.1.5-py2.py3-none-any.whl", hash = "sha256:5282c12b107bffeef825f4617dc029afaf41d0ea60823bbb665ef3079dc79de2" },
]
[[package]]
name = "opensearch-py"
version = "2.7.1"
source = { registry = "https://mirrors.aliyun.com/pypi/simple" }
dependencies = [
{ name = "certifi" },
{ name = "events" },
{ name = "python-dateutil" },
{ name = "requests" },
{ name = "urllib3" },
]
sdist = { url = "https://mirrors.aliyun.com/pypi/packages/c4/ca/5be52de5c69ecd327c16f3fc0dba82b7ffda5bbd0c0e215bdf23a4d12b12/opensearch_py-2.7.1.tar.gz", hash = "sha256:67ab76e9373669bc71da417096df59827c08369ac3795d5438c9a8be21cbd759" }
wheels = [
{ url = "https://mirrors.aliyun.com/pypi/packages/80/8f/db678ae203d761922a73920215ea53a79faf3bb1ec6aa9511f809c8e234c/opensearch_py-2.7.1-py3-none-any.whl", hash = "sha256:5417650eba98a1c7648e502207cebf3a12beab623ffe0ebbf55f9b1b4b6e44e9" },
]
[[package]]
name = "orjson"
version = "3.10.15"
@@ -4842,6 +4866,7 @@ dependencies = [
{ name = "opencv-python" },
{ name = "opencv-python-headless" },
{ name = "openpyxl" },
{ name = "opensearch-py" },
{ name = "ormsgpack" },
{ name = "pandas" },
{ name = "pdfplumber" },
@@ -4978,6 +5003,7 @@ requires-dist = [
{ name = "opencv-python", specifier = "==4.10.0.84" },
{ name = "opencv-python-headless", specifier = "==4.10.0.84" },
{ name = "openpyxl", specifier = ">=3.1.0,<4.0.0" },
{ name = "opensearch-py", specifier = "==2.7.1" },
{ name = "ormsgpack", specifier = "==1.5.0" },
{ name = "pandas", specifier = ">=2.2.0,<3.0.0" },
{ name = "pdfplumber", specifier = "==0.10.4" },