Mirror of https://github.com/infiniflow/ragflow.git, synced 2026-02-03 00:55:10 +08:00
feature: Add OceanBase Storage Support for Table Parser (#12923)
### What problem does this PR solve?

Close #12770. This PR adds OceanBase as a storage backend for the Table Parser. It enables dynamic table-schema storage via JSON and implements OceanBase SQL execution for text-to-SQL retrieval.

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

### Changes

- The Table Parser stores row data in `chunk_data` when the doc engine is OceanBase. (table.py)
- The OceanBase table schema adds a `chunk_data` JSON column and migrates existing tables if needed.
- Implemented OceanBase `sql()` to execute text-to-SQL results. (ob_conn.py)
- Added a `DOC_ENGINE_OCEANBASE` flag for engine detection. (setting.py)

### Test

1. Set `DOC_ENGINE=oceanbase` (e.g. in `docker/.env`).
   <img width="1290" height="783" alt="doc_engine_ob" src="https://github.com/user-attachments/assets/7d1c609f-7bf2-4b2e-b4cc-4243e72ad4f1" />
2. Upload an Excel file to a Knowledge Base (the file below was used for testing).
   <img width="786" height="930" alt="excel" src="https://github.com/user-attachments/assets/bedf82f2-cd00-426b-8f4d-6978a151231a" />
3. Choose **Table** as the parsing method.
   <img width="2550" height="1134" alt="parse_excel" src="https://github.com/user-attachments/assets/aba11769-02be-4905-97e1-e24485e24cd0" />
4. Ask a natural-language query in chat.
   <img width="2550" height="1134" alt="query" src="https://github.com/user-attachments/assets/26a910a6-e503-4ac7-b66a-f5754bbb0e91" />
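The `DOC_ENGINE_OCEANBASE` flag added in setting.py is derived from the `DOC_ENGINE` value shown in the test steps. A minimal sketch of how such per-engine flags can be computed (the helper name and the default engine are assumptions, not ragflow's actual setting.py code):

```python
import os

def detect_doc_engine(value=None):
    """Return boolean flags for each supported doc engine.

    Sketch only: flag names follow this PR; the default of
    "elasticsearch" is an assumption.
    """
    engine = (value if value is not None
              else os.environ.get("DOC_ENGINE", "elasticsearch")).strip().lower()
    return {
        "DOC_ENGINE_ES": engine == "elasticsearch",
        "DOC_ENGINE_INFINITY": engine == "infinity",
        "DOC_ENGINE_OCEANBASE": engine == "oceanbase",
    }
```

With `DOC_ENGINE=oceanbase` in `docker/.env`, only the OceanBase flag is set, which is what the diff below branches on.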
```diff
@@ -457,10 +457,10 @@ def chunk(filename, binary=None, from_page=0, to_page=10000000000, lang="Chinese
     txts.extend([str(c) for c in cln if c])
     clmns_map = [(py_clmns[i].lower() + fields_map[clmn_tys[i]], str(clmns[i]).replace("_", " ")) for i in
                  range(len(clmns))]
-    # For Infinity: Use original column names as keys since they're stored in chunk_data JSON
+    # For Infinity/OceanBase: Use original column names as keys since they're stored in chunk_data JSON
     # For ES/OS: Use full field names with type suffixes (e.g., url_kwd, body_tks)
-    if settings.DOC_ENGINE_INFINITY:
-        # For Infinity: key = original column name, value = display name
+    if settings.DOC_ENGINE_INFINITY or settings.DOC_ENGINE_OCEANBASE:
+        # For Infinity/OceanBase: key = original column name, value = display name
        field_map = {py_clmns[i].lower(): str(clmns[i]).replace("_", " ") for i in range(len(clmns))}
     else:
         # For ES/OS: key = typed field name, value = display name
@@ -480,9 +480,9 @@ def chunk(filename, binary=None, from_page=0, to_page=10000000000, lang="Chinese
                 continue
             if not isinstance(row[clmns[j]], pd.Series) and pd.isna(row[clmns[j]]):
                 continue
-            # For Infinity: Store in chunk_data JSON column
+            # For Infinity/OceanBase: Store in chunk_data JSON column
             # For Elasticsearch/OpenSearch: Store as individual fields with type suffixes
-            if settings.DOC_ENGINE_INFINITY:
+            if settings.DOC_ENGINE_INFINITY or settings.DOC_ENGINE_OCEANBASE:
                 data_json[str(clmns[j])] = row[clmns[j]]
             else:
                 fld = clmns_map[j][0]
@@ -490,8 +490,8 @@ def chunk(filename, binary=None, from_page=0, to_page=10000000000, lang="Chinese
             row_fields.append((clmns[j], row[clmns[j]]))
         if not row_fields:
             continue
-        # Add the data JSON field to the document (for Infinity only)
-        if settings.DOC_ENGINE_INFINITY:
+        # Add the data JSON field to the document (for Infinity/OceanBase)
+        if settings.DOC_ENGINE_INFINITY or settings.DOC_ENGINE_OCEANBASE:
             d["chunk_data"] = data_json
             # Format as a structured text for better LLM comprehension
             # Format each field as "- Field Name: Value" on separate lines
```
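The branching in the diff above can be condensed into a small illustrative helper: JSON-capable engines (Infinity/OceanBase) keep the original column names inside one `chunk_data` dict, while Elasticsearch/OpenSearch store each value under a typed field name. The helper name and data shapes are hypothetical, not table.py's real code:

```python
def build_doc(row, clmns, clmns_map, json_engine):
    """Sketch of the per-row storage branching (hypothetical helper).

    row        -- mapping of column name -> cell value
    clmns      -- original column names
    clmns_map  -- list of (typed_field_name, display_name) pairs
    json_engine -- True for Infinity/OceanBase, False for ES/OS
    """
    d, data_json = {}, {}
    for j, col in enumerate(clmns):
        val = row.get(col)
        if val is None:  # stands in for the pd.isna() check in table.py
            continue
        if json_engine:
            data_json[str(col)] = val      # key = original column name
        else:
            d[clmns_map[j][0]] = val       # key = typed name, e.g. "url_kwd"
    if json_engine and data_json:
        d["chunk_data"] = data_json        # single JSON column for the whole row
    return d
```

This mirrors why the schema change matters: for OceanBase the whole row lands in one `chunk_data` JSON column instead of per-column typed fields.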
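The other half of the feature is the OceanBase `sql()` in ob_conn.py, which executes LLM-generated SQL for retrieval. A generic DB-API 2.0 sketch of that step (the function name and return shape are assumptions; demonstrated here with sqlite3, whereas OceanBase would be reached through a MySQL-protocol driver):

```python
import sqlite3

def run_text2sql(conn, sql, limit=100):
    """Execute a text-to-SQL statement and return (column_names, rows).

    Sketch only: ob_conn.py's real sql() signature is not copied here.
    Works with any DB-API 2.0 connection.
    """
    cur = conn.cursor()
    cur.execute(sql)                          # run the generated SQL
    cols = [c[0] for c in cur.description]    # column names for the answer
    rows = cur.fetchmany(limit)               # cap rows returned to the LLM
    cur.close()
    return cols, rows
```

Capping the result set keeps the retrieved rows small enough to hand back to the chat model as context.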