Fix: Opensearch chunk management (#7802)

### What problem does this PR solve?

This PR fixes the problems mentioned in an earlier PR of mine
(https://github.com/infiniflow/ragflow/pull/7140).

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):


### Introduction
This PR fixes the problems that occur when OpenSearch is used as the
DOC_ENGINE: the pytest failures and the incorrect API return values.
The fixes mainly concern deleting, listing, updating, and retrieving chunks.
With these changes, the following pytest command now succeeds:
`cd sdk/python && uv sync --python 3.10 --group test --frozen && source .venv/bin/activate && cd test/test_http_api && DOC_ENGINE=opensearch pytest test_chunk_management_within_dataset -s --tb=short`

### Others
Because Elasticsearch and OpenSearch behave differently in some cases, some
pytest results under OpenSearch differ from the Elasticsearch ones but are
still correct and reasonable. However, some pytest params (the skipif
params) were incompatible with OpenSearch, so I changed them.
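
A minimal sketch of how an engine-dependent skip condition can be computed before it is handed to `pytest.mark.skipif`. The helper name `skip_condition` and the fallback to `"elasticsearch"` when `DOC_ENGINE` is unset are assumptions made for illustration; they are not taken from the test suite in this PR.

```python
import os


def skip_condition(unsupported_engines, default_engine="elasticsearch"):
    """Return (should_skip, reason) for the engine named in DOC_ENGINE.

    Hypothetical helper: `default_engine` is an assumed fallback for the
    case where the DOC_ENGINE environment variable is not set.
    """
    engine = os.getenv("DOC_ENGINE", default_engine).lower()
    should_skip = engine in unsupported_engines
    reason = f"case not applicable when DOC_ENGINE={engine}"
    return should_skip, reason


# Possible use inside a test module (requires pytest):
#
#   skip, reason = skip_condition({"opensearch"})
#
#   @pytest.mark.skipif(skip, reason=reason)
#   def test_engine_specific_behaviour():
#       ...
```

Keeping the condition in one place means a case that behaves differently under OpenSearch only needs its engine set adjusted, not every decorator.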

As a search engine programmer, I will still focus on the usage of vector
databases (especially OpenSearch) for the RAG stuff.
Thanks for your review
pyyuhao
2025-05-26 16:57:58 +08:00
committed by GitHub
parent c09bd9fe4a
commit 5d6bf2224a
4 changed files with 13 additions and 10 deletions


@@ -217,7 +217,7 @@ class OSConnection(DocStoreConnection):
         if bqry:
             s = s.query(bqry)
         for field in highlightFields:
-            s = s.highlight(field)
+            s = s.highlight(field,force_source=True,no_match_size=30,require_field_match=False)
         if orderBy:
             orders = list()
@@ -269,7 +269,7 @@ class OSConnection(DocStoreConnection):
         for i in range(ATTEMPT_TIME):
             try:
                 res = self.os.get(index=(indexName),
-                                  id=chunkId, source=True, )
+                                  id=chunkId, _source=True, )
                 if str(res.get("timed_out", "")).lower() == "true":
                     raise Exception("Es Timeout.")
                 chunk = res["_source"]
@@ -329,7 +329,7 @@ class OSConnection(DocStoreConnection):
         chunkId = condition["id"]
         for i in range(ATTEMPT_TIME):
             try:
-                self.os.update(index=indexName, id=chunkId, doc=doc)
+                self.os.update(index=indexName, id=chunkId, body=doc)
                 return True
             except Exception as e:
                 logger.exception(
@@ -411,7 +411,10 @@ class OSConnection(DocStoreConnection):
             chunk_ids = condition["id"]
             if not isinstance(chunk_ids, list):
                 chunk_ids = [chunk_ids]
-            qry = Q("ids", values=chunk_ids)
+            if not chunk_ids:  # when chunk_ids is empty, delete all
+                qry = Q("match_all")
+            else:
+                qry = Q("ids", values=chunk_ids)
         else:
             qry = Q("bool")
             for k, v in condition.items():
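
The id handling in the last hunk can be sketched with plain query dictionaries instead of the `Q` objects used in the code. The function name `ids_query` is hypothetical, and the sketch covers only the `"id"` branch; the `bool` branch is not reproduced because it is cut off in the hunk above.

```python
def ids_query(chunk_ids):
    """Build the query clause for deleting chunks by id (hypothetical helper).

    Mirrors the patched branch: a single id is wrapped in a list, an empty
    list falls back to match_all, and anything else becomes an ids query.
    """
    if not isinstance(chunk_ids, list):
        chunk_ids = [chunk_ids]
    if not chunk_ids:
        # An empty id list selects every document instead of none.
        return {"match_all": {}}
    return {"ids": {"values": chunk_ids}}


print(ids_query("c1"))          # {'ids': {'values': ['c1']}}
print(ids_query(["c1", "c2"]))  # {'ids': {'values': ['c1', 'c2']}}
print(ids_query([]))            # {'match_all': {}}
```

Since an empty list widens the delete to everything the request targets, callers that build the id list dynamically may want to check it before calling delete.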