Feat: Add OpenSearch 2.19.1 as a vector database option (#7140)

### What problem does this PR solve?

This PR adds support for OpenSearch 2.19.1, the latest OpenSearch release, as a storage and search engine option for RAGFlow.

### Main Benefit

1. OpenSearch 2.19.1 is licensed under the Apache License 2.0, which is more permissive than Elasticsearch's licensing.
2. For search, OpenSearch 2.19.1 supports full-text search, vector search, and hybrid search, with query schemas similar to Elasticsearch's.
3. For storage, OpenSearch 2.19.1 stores text and vectors with index schemas quite similar to Elasticsearch's.

### Changes

- Add an OpenSearch Python connector. I made many adaptations because the schema and API methods of Elasticsearch and OpenSearch differ in several ways (kNN search in particular differs significantly): rag/utils/opensearch_coon.py
- Add static configuration adaptations in conf/service_conf.yaml, api/settings.py, and rag/settings.py
- Add the store and search schema changes between OpenSearch and Elasticsearch: conf/os_mapping.json
- Add the OpenSearch Python SDK dependency: pyproject.toml
- Add Docker configuration for OpenSearch 2.19.1: docker/.env, docker/docker-compose-base.yml, docker/service_conf.yaml.template
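Much of the kNN gap mentioned above is in the shape of the requests. As a minimal sketch (not the code in rag/utils/opensearch_coon.py), the helpers below build the same vector field mapping and top-k query for Elasticsearch 8.x and for OpenSearch 2.x. The field name `q_4_vec`, cosine similarity, the Lucene HNSW engine, and `num_candidates=100` are illustrative assumptions, not values taken from conf/os_mapping.json.

```python
"""Sketch: one kNN mapping and query, expressed for Elasticsearch 8.x and
for OpenSearch 2.x. Field names and parameter values are illustrative."""
import json


def es_vector_mapping(field: str, dim: int) -> dict:
    # Elasticsearch: field type `dense_vector` sized with `dims`.
    return {
        "mappings": {
            "properties": {
                field: {"type": "dense_vector", "dims": dim,
                        "index": True, "similarity": "cosine"}
            }
        }
    }


def os_vector_mapping(field: str, dim: int) -> dict:
    # OpenSearch: k-NN is enabled per index, the field type is
    # `knn_vector` sized with `dimension`, and the ANN method is per field.
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                field: {
                    "type": "knn_vector",
                    "dimension": dim,
                    "method": {"name": "hnsw", "space_type": "cosinesimil",
                               "engine": "lucene"},
                }
            }
        },
    }


def es_knn_query(field: str, vector, k: int, num_candidates: int = 100) -> dict:
    # Elasticsearch: a top-level `knn` section that sits beside `query`.
    return {
        "knn": {"field": field, "query_vector": list(vector),
                "k": k, "num_candidates": num_candidates},
        "size": k,
    }


def os_knn_query(field: str, vector, k: int) -> dict:
    # OpenSearch: `knn` is a query clause nested inside `query`,
    # keyed by the vector field name.
    return {"size": k, "query": {"knn": {field: {"vector": list(vector), "k": k}}}}


if __name__ == "__main__":
    vec = [0.1, 0.2, 0.3, 0.4]
    print(json.dumps(es_knn_query("q_4_vec", vec, k=2)))
    print(json.dumps(os_knn_query("q_4_vec", vec, k=2)))
```

Elasticsearch places `knn` next to `query`, while OpenSearch nests it inside `query`; differences like this are one reason the Elasticsearch request bodies cannot be reused unchanged.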

### How to use
- Elasticsearch remains the default document/search engine. OpenSearch is used only if you set DOC_ENGINE=${DOC_ENGINE:-opensearch} in docker/.env.
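A minimal sketch of the selection rule described above, assuming the engine name is read from the environment with the same fallback as the `${DOC_ENGINE:-elasticsearch}` default in docker/.env. The function `resolve_doc_engine` is hypothetical; the real selection logic lives in api/settings.py and rag/settings.py and may differ.

```python
import os

# Engines listed in docker/.env; elasticsearch is the default.
SUPPORTED_DOC_ENGINES = ("elasticsearch", "infinity", "opensearch")


def resolve_doc_engine(env=None) -> str:
    """Hypothetical helper mimicking ${DOC_ENGINE:-elasticsearch}:
    fall back to the default when the variable is unset OR empty."""
    env = os.environ if env is None else env
    engine = (env.get("DOC_ENGINE") or "elasticsearch").lower()
    if engine not in SUPPORTED_DOC_ENGINES:
        raise ValueError(f"unsupported DOC_ENGINE: {engine!r}")
    return engine


if __name__ == "__main__":
    print(resolve_doc_engine({"DOC_ENGINE": "opensearch"}))
```

Note that the opensearch01 service in docker-compose-base.yml is declared under the `opensearch` profile, so the container itself only starts when that profile is active.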


### Others
Our team tested a large number of documents in our environment using OpenSearch as the vector database, and it worked well.
All of the OpenSearch configuration in this PR is required.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Yongteng Lei <yongtengrey@outlook.com>
Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
Author: pyyuhao
Committed by GitHub on 2025-04-24 16:03:31 +08:00
Commit c8c3b756b0 (parent 9a8dda8fc7)
10 changed files with 865 additions and 0 deletions

@@ -2,8 +2,10 @@
# Available options:
# - `elasticsearch` (default)
# - `infinity` (https://github.com/infiniflow/infinity)
# - `opensearch` (https://github.com/opensearch-project/OpenSearch)
DOC_ENGINE=${DOC_ENGINE:-elasticsearch}
# ------------------------------
# docker env var for specifying vector db type at startup
# (based on the vector db type, the corresponding docker
@@ -24,6 +26,16 @@ ES_PORT=1200
# The password for Elasticsearch.
ELASTIC_PASSWORD=infini_rag_flow
# The port used to expose the OpenSearch service to the host machine; it must differ from ES_PORT.
OS_PORT=1201
# The hostname where the OpenSearch service is exposed
OS_HOST=opensearch01
# The password for OpenSearch.
# At least one uppercase letter, one lowercase letter, one digit, and one special character
OPENSEARCH_PASSWORD=infini_rag_flow_OS_01
# The port used to expose the Kibana service to the host machine,
# allowing EXTERNAL access to the service running inside the Docker container.
KIBANA_PORT=6601
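The comment on OPENSEARCH_PASSWORD states a character-class rule. The hypothetical helper below checks only that stated rule; OpenSearch's own startup validation of the initial admin password may apply further checks (such as a minimum length) that this sketch does not reproduce.

```python
import re


def meets_opensearch_password_rule(password: str) -> bool:
    """Check only the rule stated in docker/.env: at least one uppercase
    letter, one lowercase letter, one digit, and one special character."""
    required = (
        r"[A-Z]",         # uppercase letter
        r"[a-z]",         # lowercase letter
        r"[0-9]",         # digit
        r"[^A-Za-z0-9]",  # any other character counts as special
    )
    return all(re.search(pattern, password) for pattern in required)


if __name__ == "__main__":
    print(meets_opensearch_password_rule("infini_rag_flow_OS_01"))
```

The shipped default `infini_rag_flow_OS_01` satisfies this rule, whereas the Elasticsearch default `infini_rag_flow` does not (it has no uppercase letter or digit), which is why the two passwords differ.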

@@ -35,6 +35,44 @@ services:
      - ragflow
    restart: on-failure
  opensearch01:
    container_name: ragflow-opensearch-01
    profiles:
      - opensearch
    image: hub.icert.top/opensearchproject/opensearch:2.19.1
    volumes:
      - osdata01:/usr/share/opensearch/data
    ports:
      - ${OS_PORT}:9201
    env_file: .env
    environment:
      - node.name=opensearch01
      - OPENSEARCH_PASSWORD=${OPENSEARCH_PASSWORD}
      - OPENSEARCH_INITIAL_ADMIN_PASSWORD=${OPENSEARCH_PASSWORD}
      - bootstrap.memory_lock=false
      - discovery.type=single-node
      - plugins.security.disabled=false
      - plugins.security.ssl.http.enabled=false
      - plugins.security.ssl.transport.enabled=true
      - cluster.routing.allocation.disk.watermark.low=5gb
      - cluster.routing.allocation.disk.watermark.high=3gb
      - cluster.routing.allocation.disk.watermark.flood_stage=2gb
      - TZ=${TIMEZONE}
      - http.port=9201
    mem_limit: ${MEM_LIMIT}
    ulimits:
      memlock:
        soft: -1
        hard: -1
    healthcheck:
      test: ["CMD-SHELL", "curl http://localhost:9201"]
      interval: 10s
      timeout: 10s
      retries: 120
    networks:
      - ragflow
    restart: on-failure
  infinity:
    container_name: ragflow-infinity
    profiles:
@@ -133,6 +171,8 @@ services:
volumes:
  esdata01:
    driver: local
  osdata01:
    driver: local
  infinity_data:
    driver: local
  mysql_data:
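The disk watermarks above can look inverted at first glance (low=5gb, high=3gb, flood_stage=2gb). When these settings are given as absolute byte values, they describe the minimum free disk space remaining, so the values are expected to decrease from low to high to flood_stage. The helper below is a hypothetical sketch of that ordering check; it handles only integer values with b/kb/mb/gb/tb suffixes (treated as binary multiples), not percentages or ratios.

```python
import re

# Byte-size suffixes, treated as binary (1024-based) multiples.
_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}


def parse_bytes(value: str) -> int:
    """Parse an absolute size such as '5gb'; reject percentages and ratios."""
    m = re.fullmatch(r"\s*(\d+)\s*([kmgt]?b)\s*", value.lower())
    if not m:
        raise ValueError(f"not an absolute byte value: {value!r}")
    return int(m.group(1)) * _UNITS[m.group(2)]


def watermarks_ordered(low: str, high: str, flood_stage: str) -> bool:
    # Absolute watermarks mean FREE space left, so each successive
    # threshold must be the same size or smaller.
    return parse_bytes(low) >= parse_bytes(high) >= parse_bytes(flood_stage)


if __name__ == "__main__":
    print(watermarks_ordered("5gb", "3gb", "2gb"))
```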

@@ -17,6 +17,10 @@ es:
  hosts: 'http://${ES_HOST:-es01}:9200'
  username: '${ES_USER:-elastic}'
  password: '${ELASTIC_PASSWORD:-infini_rag_flow}'
os:
  hosts: 'http://${OS_HOST:-opensearch01}:9201'
  username: '${OS_USER:-admin}'
  password: '${OPENSEARCH_PASSWORD:-infini_rag_flow_OS_01}'
infinity:
  uri: '${INFINITY_HOST:-infinity}:23817'
  db_name: 'default_db'
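This template relies on shell-style `${NAME:-default}` fallbacks, so a misspelled variable name does not fail loudly: it silently resolves to the default. The hypothetical `expand_defaults` helper below handles only the `${NAME:-default}` form (not the full shell grammar, and not necessarily how the RAGFlow container renders this file) and shows how the `os` block resolves.

```python
import re

# ${NAME:-default}: use `default` when NAME is unset or empty (shell semantics).
_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*):-([^}]*)\}")


def expand_defaults(text: str, env: dict) -> str:
    """Replace each ${NAME:-default} in `text` using the mapping `env`."""
    def repl(m: re.Match) -> str:
        name, default = m.group(1), m.group(2)
        return env.get(name) or default
    return _PATTERN.sub(repl, text)


TEMPLATE = """os:
  hosts: 'http://${OS_HOST:-opensearch01}:9201'
  username: '${OS_USER:-admin}'
  password: '${OPENSEARCH_PASSWORD:-infini_rag_flow_OS_01}'
"""

if __name__ == "__main__":
    # Only OS_HOST is overridden; the other two fall back to their defaults.
    print(expand_defaults(TEMPLATE, {"OS_HOST": "localhost"}))
```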