Compare commits

...

13 Commits

Author SHA1 Message Date
06cef71ba6 Feat: add or logic operations for meta data filters. (#11404)
### What problem does this PR solve?

#11376 #11387

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-20 14:31:12 +08:00
d2b1da0e26 Fix: Optimize edge check & incorrect parameter usage (#11396)
### What problem does this PR solve?

Fix: incorrect parameter usage #8084
Fix: Optimize edge check #10851

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-20 12:49:47 +08:00
7c6d30f4c8 Fix:RagFlow not starting with Postgres DB (#11398)
### What problem does this PR solve?
issue:
#11293 
change:
RagFlow not starting with Postgres DB
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-20 12:49:13 +08:00
ea0352ee4a Fix: Introducing a new JSON editor (#11401)
### What problem does this PR solve?

Fix: Introducing a new JSON editor

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-20 12:44:32 +08:00
fa5cf10f56 Bump infinity to 0.6.6 (#11399)
Bump infinity to 0.6.6

- [x] Refactoring
2025-11-20 11:23:54 +08:00
3fe71ab7dd Use array syntax for commands in docker-compose-base.yml (#11391)
Use array syntax in order to prevent parameter quoting issues. This also
runs the command directly without a bash process, which means signals
(like SIGTERM) will be delivered directly to the server process.

Fixes issue #11390

### What problem does this PR solve?

`${REDIS_PASSWORD}` was not passed correctly, meaning if it was unset or
contains spaces (or shell code!) it was interpreted wrongly.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-20 10:14:56 +08:00
9f715d6bc2 Feature (canvas): Add mind tagging support (#11359)
### What problem does this PR solve?
Resolve the issue of missing thinking labels when viewing pre-existing
conversations
### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-20 10:11:28 +08:00
48de3b26ba locale en add russian language option (#11392)
### What problem does this PR solve?
add russian language option

### Type of change


- [x] Other (please describe):
2025-11-20 10:10:51 +08:00
273c4bc4d3 Locale: update russian language (#11393)
### What problem does this PR solve?

_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._

### Type of change


- [x] Documentation Update
- [x] Other (please describe):
2025-11-20 10:10:39 +08:00
420c97199a Feat: Add TCADP parser for PPTX and spreadsheet document types. (#11041)
### What problem does this PR solve?

- Added TCADP Parser configuration fields to PDF, PPT, and spreadsheet
parsing forms
- Implemented support for setting table result type (Markdown/HTML) and
Markdown image response type (URL/Text)
- Updated TCADP Parser to handle return format settings from
configuration or parameters
- Enhanced frontend to dynamically show TCADP options based on selected
parsing method
- Modified backend to pass format parameters when calling TCADP API
- Optimized form default value logic for TCADP configuration items
- Updated multilingual resource files for new configuration options

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-20 10:08:42 +08:00
ecf0322165 fix(llm): handle None response in total_token_count_from_response (#10941)
### What problem does this PR solve?

Fixes #10933

This PR fixes a `TypeError` in the Gemini model provider where the
`total_token_count_from_response()` function could receive a `None`
response object, causing the error:

TypeError: argument of type 'NoneType' is not iterable

**Root Cause:**
The function attempted to use the `in` operator to check dictionary keys
(lines 48, 54, 60) without first validating that `resp` was not `None`.
When Gemini's `chat_streamly()` method returns `None`, this triggers the
error.

**Solution:**
1. Added a null check at the beginning of the function to return `0` if
`resp is None`
2. Added `isinstance(resp, dict)` checks before all `in` operations to
ensure type safety
3. This defensive programming approach prevents the TypeError while
maintaining backward compatibility

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

### Changes Made

**File:** `rag/utils/__init__.py`

- Line 36-38: Added `if resp is None: return 0` check
- Line 52: Added `isinstance(resp, dict)` before `'usage' in resp`
- Line 58: Added `isinstance(resp, dict)` before `'usage' in resp`  
- Line 64: Added `isinstance(resp, dict)` before `'meta' in resp`

### Testing

- [x] Code compiles without errors
- [x] Follows existing code style and conventions
- [x] Change is minimal and focused on the specific issue

### Additional Notes

This fix ensures robust handling of various response types from LLM
providers, particularly Gemini, w

---------

Signed-off-by: Zhang Zhefang <zhangzhefang@example.com>
2025-11-20 10:04:03 +08:00
38234aca53 feat: add OceanBase doc engine (#11228)
### What problem does this PR solve?

Add OceanBase doc engine. Close #5350

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-20 10:00:14 +08:00
1c06ec39ca fix cohere rerank base_url default (#11353)
### What problem does this PR solve?

**Cohere rerank base_url default handling**

- Background: When no rerank base URL is configured, the settings
pipeline was passing an empty string through RERANK_CFG →
TenantLLMService → CoHereRerank, so the Cohere client received
base_url="" and produced “missing protocol” errors during rerank calls.

- What changed: The CoHereRerank constructor now only forwards base_url
to the Cohere client when it isn’t empty/whitespace, causing the client
to fall back to its default API endpoint otherwise.

- Why it matters: This prevents invalid URL construction in the rerank
workflow and keeps tests/sanity checks that rely on the default Cohere
endpoint from failing when no custom base URL is specified.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: Philipp Heyken Soares <philipp.heyken-soares@am.ai>
2025-11-20 09:46:39 +08:00
52 changed files with 7787 additions and 4052 deletions

View File

@ -132,8 +132,8 @@ class Retrieval(ToolBase, ABC):
metas = DocumentService.get_meta_by_kbs(kb_ids)
if self._param.meta_data_filter.get("method") == "auto":
chat_mdl = LLMBundle(self._canvas.get_tenant_id(), LLMType.CHAT)
filters = gen_meta_filter(chat_mdl, metas, query)
doc_ids.extend(meta_filter(metas, filters))
filters: dict = gen_meta_filter(chat_mdl, metas, query)
doc_ids.extend(meta_filter(metas, filters["conditions"], filters.get("logic", "and")))
if not doc_ids:
doc_ids = None
elif self._param.meta_data_filter.get("method") == "manual":
@ -165,7 +165,7 @@ class Retrieval(ToolBase, ABC):
out_parts.append(s[last:])
flt["value"] = "".join(out_parts)
doc_ids.extend(meta_filter(metas, filters))
doc_ids.extend(meta_filter(metas, filters, self._param.meta_data_filter.get("logic", "and")))
if not doc_ids:
doc_ids = None

View File

@ -305,12 +305,12 @@ async def retrieval_test():
metas = DocumentService.get_meta_by_kbs(kb_ids)
if meta_data_filter.get("method") == "auto":
chat_mdl = LLMBundle(current_user.id, LLMType.CHAT, llm_name=search_config.get("chat_id", ""))
filters = gen_meta_filter(chat_mdl, metas, question)
doc_ids.extend(meta_filter(metas, filters))
filters: dict = gen_meta_filter(chat_mdl, metas, question)
doc_ids.extend(meta_filter(metas, filters["conditions"], filters.get("logic", "and")))
if not doc_ids:
doc_ids = None
elif meta_data_filter.get("method") == "manual":
doc_ids.extend(meta_filter(metas, meta_data_filter["manual"]))
doc_ids.extend(meta_filter(metas, meta_data_filter["manual"], meta_data_filter.get("logic", "and")))
if not doc_ids:
doc_ids = None

View File

@ -159,10 +159,10 @@ async def webhook(tenant_id: str, agent_id: str):
data=False, message=str(e),
code=RetCode.EXCEPTION_ERROR)
def sse():
async def sse():
nonlocal canvas
try:
for ans in canvas.run(query=req.get("query", ""), files=req.get("files", []), user_id=req.get("user_id", tenant_id), webhook_payload=req):
async for ans in canvas.run(query=req.get("query", ""), files=req.get("files", []), user_id=req.get("user_id", tenant_id), webhook_payload=req):
yield "data:" + json.dumps(ans, ensure_ascii=False) + "\n\n"
cvs.dsl = json.loads(str(canvas))

View File

@ -120,7 +120,7 @@ async def retrieval(tenant_id):
retrieval_setting = req.get("retrieval_setting", {})
similarity_threshold = float(retrieval_setting.get("score_threshold", 0.0))
top = int(retrieval_setting.get("top_k", 1024))
metadata_condition = req.get("metadata_condition", {})
metadata_condition = req.get("metadata_condition", {}) or {}
metas = DocumentService.get_meta_by_kbs([kb_id])
doc_ids = []
@ -132,7 +132,7 @@ async def retrieval(tenant_id):
embd_mdl = LLMBundle(kb.tenant_id, LLMType.EMBEDDING.value, llm_name=kb.embd_id)
if metadata_condition:
doc_ids.extend(meta_filter(metas, convert_conditions(metadata_condition)))
doc_ids.extend(meta_filter(metas, convert_conditions(metadata_condition), metadata_condition.get("logic", "and")))
if not doc_ids and metadata_condition:
doc_ids = ["-999"]
ranks = settings.retriever.retrieval(

View File

@ -1442,9 +1442,9 @@ async def retrieval_test(tenant_id):
if doc_id not in doc_ids_list:
return get_error_data_result(f"The datasets don't own the document {doc_id}")
if not doc_ids:
metadata_condition = req.get("metadata_condition", {})
metadata_condition = req.get("metadata_condition", {}) or {}
metas = DocumentService.get_meta_by_kbs(kb_ids)
doc_ids = meta_filter(metas, convert_conditions(metadata_condition))
doc_ids = meta_filter(metas, convert_conditions(metadata_condition), metadata_condition.get("logic", "and"))
similarity_threshold = float(req.get("similarity_threshold", 0.2))
vector_similarity_weight = float(req.get("vector_similarity_weight", 0.3))
top = int(req.get("top_k", 1024))

View File

@ -428,17 +428,15 @@ async def agents_completion_openai_compatibility(tenant_id, agent_id):
return resp
else:
# For non-streaming, just return the response directly
response = next(
completion_openai(
async for response in completion_openai(
tenant_id,
agent_id,
question,
session_id=req.pop("session_id", req.get("id", "")) or req.get("metadata", {}).get("id", ""),
stream=False,
**req,
)
)
return jsonify(response)
):
return jsonify(response)
@manager.route("/agents/<agent_id>/completions", methods=["POST"]) # noqa: F821
@ -977,12 +975,12 @@ async def retrieval_test_embedded():
metas = DocumentService.get_meta_by_kbs(kb_ids)
if meta_data_filter.get("method") == "auto":
chat_mdl = LLMBundle(tenant_id, LLMType.CHAT, llm_name=search_config.get("chat_id", ""))
filters = gen_meta_filter(chat_mdl, metas, question)
doc_ids.extend(meta_filter(metas, filters))
filters: dict = gen_meta_filter(chat_mdl, metas, question)
doc_ids.extend(meta_filter(metas, filters["conditions"], filters.get("logic", "and")))
if not doc_ids:
doc_ids = None
elif meta_data_filter.get("method") == "manual":
doc_ids.extend(meta_filter(metas, meta_data_filter["manual"]))
doc_ids.extend(meta_filter(metas, meta_data_filter["manual"], meta_data_filter.get("logic", "and")))
if not doc_ids:
doc_ids = None

View File

@ -177,7 +177,7 @@ class UserCanvasService(CommonService):
return True
def completion(tenant_id, agent_id, session_id=None, **kwargs):
async def completion(tenant_id, agent_id, session_id=None, **kwargs):
query = kwargs.get("query", "") or kwargs.get("question", "")
files = kwargs.get("files", [])
inputs = kwargs.get("inputs", {})
@ -219,10 +219,14 @@ def completion(tenant_id, agent_id, session_id=None, **kwargs):
"id": message_id
})
txt = ""
for ans in canvas.run(query=query, files=files, user_id=user_id, inputs=inputs):
async for ans in canvas.run(query=query, files=files, user_id=user_id, inputs=inputs):
ans["session_id"] = session_id
if ans["event"] == "message":
txt += ans["data"]["content"]
if ans["data"].get("start_to_think", False):
txt += "<think>"
elif ans["data"].get("end_to_think", False):
txt += "</think>"
yield "data:" + json.dumps(ans, ensure_ascii=False) + "\n\n"
conv.message.append({"role": "assistant", "content": txt, "created_at": time.time(), "id": message_id})
@ -233,7 +237,7 @@ def completion(tenant_id, agent_id, session_id=None, **kwargs):
API4ConversationService.append_message(conv["id"], conv)
def completion_openai(tenant_id, agent_id, question, session_id=None, stream=True, **kwargs):
async def completion_openai(tenant_id, agent_id, question, session_id=None, stream=True, **kwargs):
tiktoken_encoder = tiktoken.get_encoding("cl100k_base")
prompt_tokens = len(tiktoken_encoder.encode(str(question)))
user_id = kwargs.get("user_id", "")
@ -241,7 +245,7 @@ def completion_openai(tenant_id, agent_id, question, session_id=None, stream=Tru
if stream:
completion_tokens = 0
try:
for ans in completion(
async for ans in completion(
tenant_id=tenant_id,
agent_id=agent_id,
session_id=session_id,
@ -300,7 +304,7 @@ def completion_openai(tenant_id, agent_id, question, session_id=None, stream=Tru
try:
all_content = ""
reference = {}
for ans in completion(
async for ans in completion(
tenant_id=tenant_id,
agent_id=agent_id,
session_id=session_id,

View File

@ -15,6 +15,7 @@
#
import logging
from datetime import datetime
import os
from typing import Tuple, List
from anthropic import BaseModel
@ -103,7 +104,8 @@ class SyncLogsService(CommonService):
Knowledgebase.avatar.alias("kb_avatar"),
Connector2Kb.auto_parse,
cls.model.from_beginning.alias("reindex"),
cls.model.status
cls.model.status,
cls.model.update_time
]
if not connector_id:
fields.append(Connector.config)
@ -116,7 +118,11 @@ class SyncLogsService(CommonService):
if connector_id:
query = query.where(cls.model.connector_id == connector_id)
else:
interval_expr = SQL("INTERVAL `t2`.`refresh_freq` MINUTE")
database_type = os.getenv("DB_TYPE", "mysql")
if "postgres" in database_type.lower():
interval_expr = SQL("make_interval(mins => t2.refresh_freq)")
else:
interval_expr = SQL("INTERVAL `t2`.`refresh_freq` MINUTE")
query = query.where(
Connector.input_type == InputType.POLL,
Connector.status == TaskStatus.SCHEDULE,

View File

@ -287,7 +287,7 @@ def convert_conditions(metadata_condition):
]
def meta_filter(metas: dict, filters: list[dict]):
def meta_filter(metas: dict, filters: list[dict], logic: str = "and"):
doc_ids = set([])
def filter_out(v2docs, operator, value):
@ -331,7 +331,10 @@ def meta_filter(metas: dict, filters: list[dict]):
if not doc_ids:
doc_ids = set(ids)
else:
doc_ids = doc_ids & set(ids)
if logic == "and":
doc_ids = doc_ids & set(ids)
else:
doc_ids = doc_ids | set(ids)
if not doc_ids:
return []
return list(doc_ids)
@ -407,12 +410,12 @@ def chat(dialog, messages, stream=True, **kwargs):
if dialog.meta_data_filter:
metas = DocumentService.get_meta_by_kbs(dialog.kb_ids)
if dialog.meta_data_filter.get("method") == "auto":
filters = gen_meta_filter(chat_mdl, metas, questions[-1])
attachments.extend(meta_filter(metas, filters))
filters: dict = gen_meta_filter(chat_mdl, metas, questions[-1])
attachments.extend(meta_filter(metas, filters["conditions"], filters.get("logic", "and")))
if not attachments:
attachments = None
elif dialog.meta_data_filter.get("method") == "manual":
attachments.extend(meta_filter(metas, dialog.meta_data_filter["manual"]))
attachments.extend(meta_filter(metas, dialog.meta_data_filter["manual"], dialog.meta_data_filter.get("logic", "and")))
if not attachments:
attachments = None
@ -778,12 +781,12 @@ def ask(question, kb_ids, tenant_id, chat_llm_name=None, search_config={}):
if meta_data_filter:
metas = DocumentService.get_meta_by_kbs(kb_ids)
if meta_data_filter.get("method") == "auto":
filters = gen_meta_filter(chat_mdl, metas, question)
doc_ids.extend(meta_filter(metas, filters))
filters: dict = gen_meta_filter(chat_mdl, metas, question)
doc_ids.extend(meta_filter(metas, filters["conditions"], filters.get("logic", "and")))
if not doc_ids:
doc_ids = None
elif meta_data_filter.get("method") == "manual":
doc_ids.extend(meta_filter(metas, meta_data_filter["manual"]))
doc_ids.extend(meta_filter(metas, meta_data_filter["manual"], meta_data_filter.get("logic", "and")))
if not doc_ids:
doc_ids = None
@ -853,12 +856,12 @@ def gen_mindmap(question, kb_ids, tenant_id, search_config={}):
if meta_data_filter:
metas = DocumentService.get_meta_by_kbs(kb_ids)
if meta_data_filter.get("method") == "auto":
filters = gen_meta_filter(chat_mdl, metas, question)
doc_ids.extend(meta_filter(metas, filters))
filters: dict = gen_meta_filter(chat_mdl, metas, question)
doc_ids.extend(meta_filter(metas, filters["conditions"], filters.get("logic", "and")))
if not doc_ids:
doc_ids = None
elif meta_data_filter.get("method") == "manual":
doc_ids.extend(meta_filter(metas, meta_data_filter["manual"]))
doc_ids.extend(meta_filter(metas, meta_data_filter["manual"], meta_data_filter.get("logic", "and")))
if not doc_ids:
doc_ids = None

View File

@ -27,6 +27,7 @@ from common.constants import SVR_QUEUE_NAME, Storage
import rag.utils
import rag.utils.es_conn
import rag.utils.infinity_conn
import rag.utils.ob_conn
import rag.utils.opensearch_conn
from rag.utils.azure_sas_conn import RAGFlowAzureSasBlob
from rag.utils.azure_spn_conn import RAGFlowAzureSpnBlob
@ -103,6 +104,7 @@ INFINITY = {}
AZURE = {}
S3 = {}
MINIO = {}
OB = {}
OSS = {}
OS = {}
@ -227,7 +229,7 @@ def init_settings():
FEISHU_OAUTH = get_base_config("oauth", {}).get("feishu")
OAUTH_CONFIG = get_base_config("oauth", {})
global DOC_ENGINE, docStoreConn, ES, OS, INFINITY
global DOC_ENGINE, docStoreConn, ES, OB, OS, INFINITY
DOC_ENGINE = os.environ.get("DOC_ENGINE", "elasticsearch")
# DOC_ENGINE = os.environ.get('DOC_ENGINE', "opensearch")
lower_case_doc_engine = DOC_ENGINE.lower()
@ -240,6 +242,9 @@ def init_settings():
elif lower_case_doc_engine == "opensearch":
OS = get_base_config("os", {})
docStoreConn = rag.utils.opensearch_conn.OSConnection()
elif lower_case_doc_engine == "oceanbase":
OB = get_base_config("oceanbase", {})
docStoreConn = rag.utils.ob_conn.OBConnection()
else:
raise Exception(f"Not supported doc engine: {DOC_ENGINE}")

View File

@ -35,6 +35,12 @@ def num_tokens_from_string(string: str) -> int:
return 0
def total_token_count_from_response(resp):
"""
Extract token count from LLM response in various formats.
Handles None responses and different response structures from various LLM providers.
Returns 0 if token count cannot be determined.
"""
if resp is None:
return 0
@ -50,19 +56,19 @@ def total_token_count_from_response(resp):
except Exception:
pass
if 'usage' in resp and 'total_tokens' in resp['usage']:
if isinstance(resp, dict) and 'usage' in resp and 'total_tokens' in resp['usage']:
try:
return resp["usage"]["total_tokens"]
except Exception:
pass
if 'usage' in resp and 'input_tokens' in resp['usage'] and 'output_tokens' in resp['usage']:
if isinstance(resp, dict) and 'usage' in resp and 'input_tokens' in resp['usage'] and 'output_tokens' in resp['usage']:
try:
return resp["usage"]["input_tokens"] + resp["usage"]["output_tokens"]
except Exception:
pass
if 'meta' in resp and 'tokens' in resp['meta'] and 'input_tokens' in resp['meta']['tokens'] and 'output_tokens' in resp['meta']['tokens']:
if isinstance(resp, dict) and 'meta' in resp and 'tokens' in resp['meta'] and 'input_tokens' in resp['meta']['tokens'] and 'output_tokens' in resp['meta']['tokens']:
try:
return resp["meta"]["tokens"]["input_tokens"] + resp["meta"]["tokens"]["output_tokens"]
except Exception:

View File

@ -28,6 +28,14 @@ os:
infinity:
uri: 'localhost:23817'
db_name: 'default_db'
oceanbase:
scheme: 'oceanbase' # set 'mysql' to create connection using mysql config
config:
db_name: 'test'
user: 'root@ragflow'
password: 'infini_rag_flow'
host: 'localhost'
port: 2881
redis:
db: 1
password: 'infini_rag_flow'
@ -139,5 +147,3 @@ user_default_llm:
# secret_id: 'tencent_secret_id'
# secret_key: 'tencent_secret_key'
# region: 'tencent_region'
# table_result_type: '1'
# markdown_image_response_type: '1'

View File

@ -192,12 +192,16 @@ class TencentCloudAPIClient:
class TCADPParser(RAGFlowPdfParser):
def __init__(self, secret_id: str = None, secret_key: str = None, region: str = "ap-guangzhou"):
def __init__(self, secret_id: str = None, secret_key: str = None, region: str = "ap-guangzhou",
table_result_type: str = None, markdown_image_response_type: str = None):
super().__init__()
# First initialize logger
self.logger = logging.getLogger(self.__class__.__name__)
# Log received parameters
self.logger.info(f"[TCADP] Initializing with parameters - table_result_type: {table_result_type}, markdown_image_response_type: {markdown_image_response_type}")
# Priority: read configuration from RAGFlow configuration system (service_conf.yaml)
try:
tcadp_parser = get_base_config("tcadp_config", {})
@ -205,14 +209,30 @@ class TCADPParser(RAGFlowPdfParser):
self.secret_id = secret_id or tcadp_parser.get("secret_id")
self.secret_key = secret_key or tcadp_parser.get("secret_key")
self.region = region or tcadp_parser.get("region", "ap-guangzhou")
self.table_result_type = tcadp_parser.get("table_result_type", "1")
self.markdown_image_response_type = tcadp_parser.get("markdown_image_response_type", "1")
self.logger.info("[TCADP] Configuration read from service_conf.yaml")
# Set table_result_type and markdown_image_response_type from config or parameters
self.table_result_type = table_result_type if table_result_type is not None else tcadp_parser.get("table_result_type", "1")
self.markdown_image_response_type = markdown_image_response_type if markdown_image_response_type is not None else tcadp_parser.get("markdown_image_response_type", "1")
else:
self.logger.error("[TCADP] Please configure tcadp_config in service_conf.yaml first")
# If config file is empty, use provided parameters or defaults
self.secret_id = secret_id
self.secret_key = secret_key
self.region = region or "ap-guangzhou"
self.table_result_type = table_result_type if table_result_type is not None else "1"
self.markdown_image_response_type = markdown_image_response_type if markdown_image_response_type is not None else "1"
except ImportError:
self.logger.info("[TCADP] Configuration module import failed")
# If config file is not available, use provided parameters or defaults
self.secret_id = secret_id
self.secret_key = secret_key
self.region = region or "ap-guangzhou"
self.table_result_type = table_result_type if table_result_type is not None else "1"
self.markdown_image_response_type = markdown_image_response_type if markdown_image_response_type is not None else "1"
# Log final values
self.logger.info(f"[TCADP] Final values - table_result_type: {self.table_result_type}, markdown_image_response_type: {self.markdown_image_response_type}")
if not self.secret_id or not self.secret_key:
raise ValueError("[TCADP] Please set Tencent Cloud API keys, configure tcadp_config in service_conf.yaml")
@ -400,6 +420,8 @@ class TCADPParser(RAGFlowPdfParser):
"TableResultType": self.table_result_type,
"MarkdownImageResponseType": self.markdown_image_response_type
}
self.logger.info(f"[TCADP] API request config - TableResultType: {self.table_result_type}, MarkdownImageResponseType: {self.markdown_image_response_type}")
result = client.reconstruct_document_sse(
file_type=file_type,

View File

@ -7,6 +7,7 @@
# Available options:
# - `elasticsearch` (default)
# - `infinity` (https://github.com/infiniflow/infinity)
# - `oceanbase` (https://github.com/oceanbase/oceanbase)
# - `opensearch` (https://github.com/opensearch-project/OpenSearch)
DOC_ENGINE=${DOC_ENGINE:-elasticsearch}
@ -62,6 +63,27 @@ INFINITY_THRIFT_PORT=23817
INFINITY_HTTP_PORT=23820
INFINITY_PSQL_PORT=5432
# The hostname where the OceanBase service is exposed
OCEANBASE_HOST=oceanbase
# The port used to expose the OceanBase service
OCEANBASE_PORT=2881
# The username for OceanBase
OCEANBASE_USER=root@ragflow
# The password for OceanBase
OCEANBASE_PASSWORD=infini_rag_flow
# The doc database of the OceanBase service to use
OCEANBASE_DOC_DBNAME=ragflow_doc
# OceanBase container configuration
OB_CLUSTER_NAME=${OB_CLUSTER_NAME:-ragflow}
OB_TENANT_NAME=${OB_TENANT_NAME:-ragflow}
OB_SYS_PASSWORD=${OCEANBASE_PASSWORD:-infini_rag_flow}
OB_TENANT_PASSWORD=${OCEANBASE_PASSWORD:-infini_rag_flow}
OB_MEMORY_LIMIT=${OB_MEMORY_LIMIT:-10G}
OB_SYSTEM_MEMORY=${OB_SYSTEM_MEMORY:-2G}
OB_DATAFILE_SIZE=${OB_DATAFILE_SIZE:-20G}
OB_LOG_DISK_SIZE=${OB_LOG_DISK_SIZE:-20G}
# The password for MySQL.
MYSQL_PASSWORD=infini_rag_flow
# The hostname where the MySQL service is exposed

View File

@ -138,6 +138,15 @@ The [.env](./.env) file contains important environment variables for Docker.
- `password`: The password for MinIO.
- `host`: The MinIO serving IP *and* port inside the Docker container. Defaults to `minio:9000`.
- `oceanbase`
- `scheme`: The connection scheme. Set to `mysql` to use mysql config, or other values to use config below.
- `config`:
- `db_name`: The OceanBase database name.
- `user`: The username for OceanBase.
- `password`: The password for OceanBase.
- `host`: The hostname of the OceanBase service.
- `port`: The port of OceanBase.
- `oss`
- `access_key`: The access key ID used to authenticate requests to the OSS service.
- `secret_key`: The secret access key used to authenticate requests to the OSS service.

View File

@ -72,7 +72,7 @@ services:
infinity:
profiles:
- infinity
image: infiniflow/infinity:v0.6.5
image: infiniflow/infinity:v0.6.6
volumes:
- infinity_data:/var/infinity
- ./infinity_conf.toml:/infinity_conf.toml
@ -96,6 +96,31 @@ services:
retries: 120
restart: on-failure
oceanbase:
profiles:
- oceanbase
image: oceanbase/oceanbase-ce:4.4.1.0-100000032025101610
volumes:
- ./oceanbase/data:/root/ob
- ./oceanbase/conf:/root/.obd/cluster
- ./oceanbase/init.d:/root/boot/init.d
ports:
- ${OCEANBASE_PORT:-2881}:2881
env_file: .env
environment:
- MODE=normal
- OB_SERVER_IP=127.0.0.1
mem_limit: ${MEM_LIMIT}
healthcheck:
test: [ 'CMD-SHELL', 'obclient -h127.0.0.1 -P2881 -uroot@${OB_TENANT_NAME:-ragflow} -p${OB_TENANT_PASSWORD:-infini_rag_flow} -e "CREATE DATABASE IF NOT EXISTS ${OCEANBASE_DOC_DBNAME:-ragflow_doc};"' ]
interval: 10s
retries: 30
start_period: 30s
timeout: 10s
networks:
- ragflow
restart: on-failure
sandbox-executor-manager:
profiles:
- sandbox
@ -154,7 +179,7 @@ services:
minio:
image: quay.io/minio/minio:RELEASE.2025-06-13T11-33-47Z
command: server --console-address ":9001" /data
command: ["server", "--console-address", ":9001", "/data"]
ports:
- ${MINIO_PORT}:9000
- ${MINIO_CONSOLE_PORT}:9001
@ -176,7 +201,7 @@ services:
redis:
# swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/valkey/valkey:8
image: valkey/valkey:8
command: redis-server --requirepass ${REDIS_PASSWORD} --maxmemory 128mb --maxmemory-policy allkeys-lru
command: ["redis-server", "--requirepass", "${REDIS_PASSWORD}", "--maxmemory", "128mb", "--maxmemory-policy", "allkeys-lru"]
env_file: .env
ports:
- ${REDIS_PORT}:6379
@ -256,6 +281,8 @@ volumes:
driver: local
infinity_data:
driver: local
ob_data:
driver: local
mysql_data:
driver: local
minio_data:

View File

@ -1,5 +1,5 @@
[general]
version = "0.6.5"
version = "0.6.6"
time_zone = "utc-8"
[network]

View File

@ -0,0 +1 @@
ALTER SYSTEM SET ob_vector_memory_limit_percentage = 30;

View File

@ -28,6 +28,14 @@ os:
infinity:
uri: '${INFINITY_HOST:-infinity}:23817'
db_name: 'default_db'
oceanbase:
scheme: 'oceanbase' # set 'mysql' to create connection using mysql config
config:
db_name: '${OCEANBASE_DOC_DBNAME:-test}'
user: '${OCEANBASE_USER:-root@ragflow}'
password: '${OCEANBASE_PASSWORD:-infini_rag_flow}'
host: '${OCEANBASE_HOST:-oceanbase}'
port: ${OCEANBASE_PORT:-2881}
redis:
db: 1
password: '${REDIS_PASSWORD:-infini_rag_flow}'
@ -142,5 +150,3 @@ user_default_llm:
# secret_id: '${TENCENT_SECRET_ID}'
# secret_key: '${TENCENT_SECRET_KEY}'
# region: '${TENCENT_REGION}'
# table_result_type: '1'
# markdown_image_response_type: '1'

View File

@ -2085,6 +2085,7 @@ curl --request POST \
"dataset_ids": ["b2a62730759d11ef987d0242ac120004"],
"document_ids": ["77df9ef4759a11ef8bdd0242ac120004"],
"metadata_condition": {
"logic": "and",
"conditions": [
{
"name": "author",

View File

@ -96,7 +96,7 @@ ragflow:
infinity:
image:
repository: infiniflow/infinity
tag: v0.6.5
tag: v0.6.6
pullPolicy: IfNotPresent
pullSecrets: []
storage:

View File

@ -49,7 +49,7 @@ dependencies = [
"html-text==0.6.2",
"httpx[socks]>=0.28.1,<0.29.0",
"huggingface-hub>=0.25.0,<0.26.0",
"infinity-sdk==0.6.5",
"infinity-sdk==0.6.6",
"infinity-emb>=0.0.66,<0.0.67",
"itsdangerous==2.1.2",
"json-repair==0.35.0",
@ -149,6 +149,7 @@ dependencies = [
"captcha>=0.7.1",
"pip>=25.2",
"pypandoc>=1.16",
"pyobvector==0.2.18",
]
[dependency-groups]

View File

@ -116,7 +116,7 @@ def by_plaintext(filename, binary=None, from_page=0, to_page=100000, callback=No
else:
vision_model = LLMBundle(kwargs["tenant_id"], LLMType.IMAGE2TEXT, llm_name=kwargs.get("layout_recognizer", ""), lang=kwargs.get("lang", "Chinese"))
pdf_parser = VisionParser(vision_model=vision_model, **kwargs)
sections, tables = pdf_parser(
filename if not binary else binary,
from_page=from_page,
@ -504,7 +504,7 @@ class Markdown(MarkdownParser):
return images if images else None
def __call__(self, filename, binary=None, separate_tables=True,delimiter=None):
def __call__(self, filename, binary=None, separate_tables=True, delimiter=None):
if binary:
encoding = find_codec(binary)
txt = binary.decode(encoding, errors="ignore")
@ -602,7 +602,7 @@ def chunk(filename, binary=None, from_page=0, to_page=100000,
_SerializedRelationships.load_from_xml = load_from_xml_v2
sections, tables = Docx()(filename, binary)
tables=vision_figure_parser_docx_wrapper(sections=sections,tbls=tables,callback=callback,**kwargs)
tables = vision_figure_parser_docx_wrapper(sections=sections, tbls=tables, callback=callback, **kwargs)
res = tokenize_table(tables, doc, is_english)
callback(0.8, "Finish parsing.")
@ -653,18 +653,47 @@ def chunk(filename, binary=None, from_page=0, to_page=100000,
if name in ["tcadp", "docling", "mineru"]:
parser_config["chunk_token_num"] = 0
res = tokenize_table(tables, doc, is_english)
callback(0.8, "Finish parsing.")
elif re.search(r"\.(csv|xlsx?)$", filename, re.IGNORECASE):
callback(0.1, "Start to parse.")
excel_parser = ExcelParser()
if parser_config.get("html4excel"):
sections = [(_, "") for _ in excel_parser.html(binary, 12) if _]
# Check if tcadp_parser is selected for spreadsheet files
layout_recognizer = parser_config.get("layout_recognize", "DeepDOC")
if layout_recognizer == "TCADP Parser":
table_result_type = parser_config.get("table_result_type", "1")
markdown_image_response_type = parser_config.get("markdown_image_response_type", "1")
tcadp_parser = TCADPParser(
table_result_type=table_result_type,
markdown_image_response_type=markdown_image_response_type
)
if not tcadp_parser.check_installation():
callback(-1, "TCADP parser not available. Please check Tencent Cloud API configuration.")
return res
# Determine file type based on extension
file_type = "XLSX" if re.search(r"\.xlsx?$", filename, re.IGNORECASE) else "CSV"
sections, tables = tcadp_parser.parse_pdf(
filepath=filename,
binary=binary,
callback=callback,
output_dir=os.environ.get("TCADP_OUTPUT_DIR", ""),
file_type=file_type
)
parser_config["chunk_token_num"] = 0
res = tokenize_table(tables, doc, is_english)
callback(0.8, "Finish parsing.")
else:
sections = [(_, "") for _ in excel_parser(binary) if _]
parser_config["chunk_token_num"] = 12800
# Default DeepDOC parser
excel_parser = ExcelParser()
if parser_config.get("html4excel"):
sections = [(_, "") for _ in excel_parser.html(binary, 12) if _]
else:
sections = [(_, "") for _ in excel_parser(binary) if _]
parser_config["chunk_token_num"] = 12800
elif re.search(r"\.(txt|py|js|java|c|cpp|h|php|go|ts|sh|cs|kt|sql)$", filename, re.IGNORECASE):
callback(0.1, "Start to parse.")
@ -676,7 +705,7 @@ def chunk(filename, binary=None, from_page=0, to_page=100000,
elif re.search(r"\.(md|markdown)$", filename, re.IGNORECASE):
callback(0.1, "Start to parse.")
markdown_parser = Markdown(int(parser_config.get("chunk_token_num", 128)))
sections, tables = markdown_parser(filename, binary, separate_tables=False,delimiter=parser_config.get("delimiter", "\n!?;。;!?"))
sections, tables = markdown_parser(filename, binary, separate_tables=False, delimiter=parser_config.get("delimiter", "\n!?;。;!?"))
try:
vision_model = LLMBundle(kwargs["tenant_id"], LLMType.IMAGE2TEXT)

View File

@ -16,6 +16,7 @@ import io
import json
import os
import random
import re
from functools import partial
import trio
@ -83,6 +84,7 @@ class ParserParam(ProcessParamBase):
"output_format": "json",
},
"spreadsheet": {
"parse_method": "deepdoc", # deepdoc/tcadp_parser
"output_format": "html",
"suffix": [
"xls",
@ -102,8 +104,10 @@ class ParserParam(ProcessParamBase):
"output_format": "json",
},
"slides": {
"parse_method": "deepdoc", # deepdoc/tcadp_parser
"suffix": [
"pptx",
"ppt"
],
"output_format": "json",
},
@ -245,7 +249,12 @@ class Parser(ProcessBase):
bboxes.append(box)
elif conf.get("parse_method").lower() == "tcadp parser":
# ADP is a document parsing tool using Tencent Cloud API
tcadp_parser = TCADPParser()
table_result_type = conf.get("table_result_type", "1")
markdown_image_response_type = conf.get("markdown_image_response_type", "1")
tcadp_parser = TCADPParser(
table_result_type=table_result_type,
markdown_image_response_type=markdown_image_response_type
)
sections, _ = tcadp_parser.parse_pdf(
filepath=name,
binary=blob,
@ -301,14 +310,86 @@ class Parser(ProcessBase):
self.callback(random.randint(1, 5) / 100.0, "Start to work on a Spreadsheet.")
conf = self._param.setups["spreadsheet"]
self.set_output("output_format", conf["output_format"])
spreadsheet_parser = ExcelParser()
if conf.get("output_format") == "html":
htmls = spreadsheet_parser.html(blob, 1000000000)
self.set_output("html", htmls[0])
elif conf.get("output_format") == "json":
self.set_output("json", [{"text": txt} for txt in spreadsheet_parser(blob) if txt])
elif conf.get("output_format") == "markdown":
self.set_output("markdown", spreadsheet_parser.markdown(blob))
parse_method = conf.get("parse_method", "deepdoc")
# Handle TCADP parser
if parse_method.lower() == "tcadp parser":
table_result_type = conf.get("table_result_type", "1")
markdown_image_response_type = conf.get("markdown_image_response_type", "1")
tcadp_parser = TCADPParser(
table_result_type=table_result_type,
markdown_image_response_type=markdown_image_response_type
)
if not tcadp_parser.check_installation():
raise RuntimeError("TCADP parser not available. Please check Tencent Cloud API configuration.")
# Determine file type based on extension
if re.search(r"\.xlsx?$", name, re.IGNORECASE):
file_type = "XLSX"
else:
file_type = "CSV"
self.callback(0.2, f"Using TCADP parser for {file_type} file.")
sections, tables = tcadp_parser.parse_pdf(
filepath=name,
binary=blob,
callback=self.callback,
file_type=file_type,
file_start_page=1,
file_end_page=1000
)
# Process TCADP parser output based on configured output_format
output_format = conf.get("output_format", "html")
if output_format == "html":
# For HTML output, combine sections and tables into HTML
html_content = ""
for section, position_tag in sections:
if section:
html_content += section + "\n"
for table in tables:
if table:
html_content += table + "\n"
self.set_output("html", html_content)
elif output_format == "json":
# For JSON output, create a list of text items
result = []
# Add sections as text
for section, position_tag in sections:
if section:
result.append({"text": section})
# Add tables as text
for table in tables:
if table:
result.append({"text": table})
self.set_output("json", result)
elif output_format == "markdown":
# For markdown output, combine into markdown
md_content = ""
for section, position_tag in sections:
if section:
md_content += section + "\n\n"
for table in tables:
if table:
md_content += table + "\n\n"
self.set_output("markdown", md_content)
else:
# Default DeepDOC parser
spreadsheet_parser = ExcelParser()
if conf.get("output_format") == "html":
htmls = spreadsheet_parser.html(blob, 1000000000)
self.set_output("html", htmls[0])
elif conf.get("output_format") == "json":
self.set_output("json", [{"text": txt} for txt in spreadsheet_parser(blob) if txt])
elif conf.get("output_format") == "markdown":
self.set_output("markdown", spreadsheet_parser.markdown(blob))
def _word(self, name, blob):
self.callback(random.randint(1, 5) / 100.0, "Start to work on a Word Processor Document")
@ -326,22 +407,69 @@ class Parser(ProcessBase):
self.set_output("markdown", markdown_text)
def _slides(self, name, blob):
from deepdoc.parser.ppt_parser import RAGFlowPptParser as ppt_parser
self.callback(random.randint(1, 5) / 100.0, "Start to work on a PowerPoint Document")
conf = self._param.setups["slides"]
self.set_output("output_format", conf["output_format"])
ppt_parser = ppt_parser()
txts = ppt_parser(blob, 0, 100000, None)
parse_method = conf.get("parse_method", "deepdoc")
sections = [{"text": section} for section in txts if section.strip()]
# Handle TCADP parser
if parse_method.lower() == "tcadp parser":
table_result_type = conf.get("table_result_type", "1")
markdown_image_response_type = conf.get("markdown_image_response_type", "1")
tcadp_parser = TCADPParser(
table_result_type=table_result_type,
markdown_image_response_type=markdown_image_response_type
)
if not tcadp_parser.check_installation():
raise RuntimeError("TCADP parser not available. Please check Tencent Cloud API configuration.")
# json
assert conf.get("output_format") == "json", "have to be json for ppt"
if conf.get("output_format") == "json":
self.set_output("json", sections)
# Determine file type based on extension
if re.search(r"\.pptx?$", name, re.IGNORECASE):
file_type = "PPTX"
else:
file_type = "PPT"
self.callback(0.2, f"Using TCADP parser for {file_type} file.")
sections, tables = tcadp_parser.parse_pdf(
filepath=name,
binary=blob,
callback=self.callback,
file_type=file_type,
file_start_page=1,
file_end_page=1000
)
# Process TCADP parser output - PPT only supports json format
output_format = conf.get("output_format", "json")
if output_format == "json":
# For JSON output, create a list of text items
result = []
# Add sections as text
for section, position_tag in sections:
if section:
result.append({"text": section})
# Add tables as text
for table in tables:
if table:
result.append({"text": table})
self.set_output("json", result)
else:
# Default DeepDOC parser (supports .pptx format)
from deepdoc.parser.ppt_parser import RAGFlowPptParser as ppt_parser
ppt_parser = ppt_parser()
txts = ppt_parser(blob, 0, 100000, None)
sections = [{"text": section} for section in txts if section.strip()]
# json
assert conf.get("output_format") == "json", "have to be json for ppt"
if conf.get("output_format") == "json":
self.set_output("json", sections)
def _markdown(self, name, blob):
from functools import reduce
@ -579,6 +707,7 @@ class Parser(ProcessBase):
"video": self._video,
"email": self._email,
}
try:
from_upstream = ParserFromUpstream.model_validate(kwargs)
except Exception as e:

View File

@ -234,7 +234,11 @@ class CoHereRerank(Base):
def __init__(self, key, model_name, base_url=None):
from cohere import Client
self.client = Client(api_key=key, base_url=base_url)
# Only pass base_url if it's a non-empty string, otherwise use default Cohere API endpoint
client_kwargs = {"api_key": key}
if base_url and base_url.strip():
client_kwargs["base_url"] = base_url
self.client = Client(**client_kwargs)
self.model_name = model_name.split("___")[0]
def similarity(self, query: str, texts: list):

View File

@ -83,6 +83,7 @@ class FulltextQueryer:
return txt
def question(self, txt, tbl="qa", min_match: float = 0.6):
original_query = txt
txt = FulltextQueryer.add_space_between_eng_zh(txt)
txt = re.sub(
r"[ :|\r\n\t,,。??/`!&^%%()\[\]{}<>]+",
@ -127,7 +128,7 @@ class FulltextQueryer:
q.append(txt)
query = " ".join(q)
return MatchTextExpr(
self.query_fields, query, 100
self.query_fields, query, 100, {"original_query": original_query}
), keywords
def need_fine_grained_tokenize(tk):
@ -212,7 +213,7 @@ class FulltextQueryer:
if not query:
query = otxt
return MatchTextExpr(
self.query_fields, query, 100, {"minimum_should_match": min_match}
self.query_fields, query, 100, {"minimum_should_match": min_match, "original_query": original_query}
), keywords
return None, keywords
@ -259,6 +260,7 @@ class FulltextQueryer:
content_tks = [c.strip() for c in content_tks.strip() if c.strip()]
tks_w = self.tw.weights(content_tks, preprocess=False)
origin_keywords = keywords.copy()
keywords = [f'"{k.strip()}"' for k in keywords]
for tk, w in sorted(tks_w, key=lambda x: x[1] * -1)[:keywords_topn]:
tk_syns = self.syn.lookup(tk)
@ -274,4 +276,4 @@ class FulltextQueryer:
keywords.append(f"{tk}^{w}")
return MatchTextExpr(self.query_fields, " ".join(keywords), 100,
{"minimum_should_match": min(3, len(keywords) // 10)})
{"minimum_should_match": min(3, len(keywords) / 10), "original_query": " ".join(origin_keywords)})

View File

@ -429,7 +429,7 @@ def rank_memories(chat_mdl, goal:str, sub_goal:str, tool_call_summaries: list[st
return re.sub(r"^.*</think>", "", ans, flags=re.DOTALL)
def gen_meta_filter(chat_mdl, meta_data:dict, query: str) -> list:
def gen_meta_filter(chat_mdl, meta_data:dict, query: str) -> dict:
sys_prompt = PROMPT_JINJA_ENV.from_string(META_FILTER).render(
current_date=datetime.datetime.today().strftime('%Y-%m-%d'),
metadata_keys=json.dumps(meta_data),
@ -440,11 +440,13 @@ def gen_meta_filter(chat_mdl, meta_data:dict, query: str) -> list:
ans = re.sub(r"(^.*</think>|```json\n|```\n*$)", "", ans, flags=re.DOTALL)
try:
ans = json_repair.loads(ans)
assert isinstance(ans, list), ans
assert isinstance(ans, dict), ans
assert "conditions" in ans and isinstance(ans["conditions"], list), ans
return ans
except Exception:
logging.exception(f"Loading json failure: {ans}")
return []
return {"conditions": []}
def gen_json(system_prompt:str, user_prompt:str, chat_mdl, gen_conf = None):

View File

@ -9,11 +9,13 @@ You are a metadata filtering condition generator. Analyze the user's question an
}
2. **Output Requirements**:
- Always output a JSON array of filter objects
- Each object must have:
- Always output a JSON dictionary with only 2 keys: 'conditions'(filter objects) and 'logic' between the conditions ('and' or 'or').
- Each filter object in conditions must have:
"key": (metadata attribute name),
"value": (string value to compare),
"op": (operator from allowed list)
- Logic between all the conditions: 'and'(Intersection of results for each condition) / 'or' (union of results for all conditions)
3. **Operator Guide**:
- Use these operators only: ["contains", "not contains", "start with", "end with", "empty", "not empty", "=", "≠", ">", "<", "≥", "≤"]
@ -32,22 +34,97 @@ You are a metadata filtering condition generator. Analyze the user's question an
- Attribute doesn't exist in metadata
- Value has no match in metadata
5. **Example**:
5. **Example A**:
- User query: "上市日期七月份的有哪些商品不要蓝色的"
- Metadata: { "color": {...}, "listing_date": {...} }
- Output:
[
{
"logic": "and",
"conditions": [
{"key": "listing_date", "value": "2025-07-01", "op": "≥"},
{"key": "listing_date", "value": "2025-08-01", "op": "<"},
{"key": "color", "value": "blue", "op": "≠"}
]
}
6. **Final Output**:
- ONLY output valid JSON array
6. **Example B**:
- User query: "Both blue and red are acceptable."
- Metadata: { "color": {...}, "listing_date": {...} }
- Output:
{
"logic": "or",
"conditions": [
{"key": "color", "value": "blue", "op": "="},
{"key": "color", "value": "red", "op": "="}
]
}
7. **Final Output**:
- ONLY output valid JSON dictionary
- NO additional text/explanations
- Json schema is as following:
```json
{
"type": "object",
"properties": {
"logic": {
"type": "string",
"description": "Logic relationship between all the conditions, the default is 'and'.",
"enum": [
"and",
"or"
]
},
"conditions": {
"type": "array",
"items": {
"type": "object",
"properties": {
"key": {
"type": "string",
"description": "Metadata attribute name."
},
"value": {
"type": "string",
"description": "Value to compare."
},
"op": {
"type": "string",
"description": "Operator from allowed list.",
"enum": [
"contains",
"not contains",
"start with",
"end with",
"empty",
"not empty",
"=",
"≠",
">",
"<",
"≥",
"≤"
]
}
},
"required": [
"key",
"value",
"op"
],
"additionalProperties": false
}
}
},
"required": [
"conditions"
],
"additionalProperties": false
}
```
**Current Task**:
- Today's date: {{current_date}}
- Available metadata keys: {{metadata_keys}}
- User query: "{{user_question}}"
- Today's date: {{ current_date }}
- Available metadata keys: {{ metadata_keys }}
- User query: "{{ user_question }}"

1562
rag/utils/ob_conn.py Normal file

File diff suppressed because it is too large Load Diff

View File

@ -69,7 +69,7 @@ class Document(Base):
response = res.json()
actual_keys = set(response.keys())
if actual_keys == error_keys:
raise Exception(res.get("message"))
raise Exception(response.get("message"))
else:
return res.content
except json.JSONDecodeError:

View File

@ -80,6 +80,7 @@ class Session(Base):
def _structure_answer(self, json_data):
answer = ""
if self.__session_type == "agent":
answer = json_data["data"]["content"]
elif self.__session_type == "chat":

6751
uv.lock generated

File diff suppressed because it is too large Load Diff

View File

@ -79,6 +79,7 @@
"input-otp": "^1.4.1",
"js-base64": "^3.7.5",
"jsencrypt": "^3.3.2",
"jsoneditor": "^10.4.2",
"lexical": "^0.23.1",
"lodash": "^4.17.21",
"lucide-react": "^0.546.0",

View File

@ -0,0 +1,132 @@
.ace-tomorrow-night .ace_gutter {
background: var(--bg-card);
color: rgb(var(--text-primary));
}
.ace-tomorrow-night .ace_print-margin {
width: 1px;
background: #25282c;
}
.ace-tomorrow-night {
background: var(--bg-card);
color: rgb(var(--text-primary));
.ace_editor {
background: var(--bg-card);
}
}
.ace-tomorrow-night .ace_cursor {
color: #aeafad;
}
.ace-tomorrow-night .ace_marker-layer .ace_selection {
background: #373b41;
}
.ace-tomorrow-night.ace_multiselect .ace_selection.ace_start {
box-shadow: 0 0 3px 0px #1d1f21;
}
.ace-tomorrow-night .ace_marker-layer .ace_step {
background: rgb(102, 82, 0);
}
.ace-tomorrow-night .ace_marker-layer .ace_bracket {
margin: -1px 0 0 -1px;
border: 1px solid #4b4e55;
}
.ace-tomorrow-night .ace_marker-layer .ace_active-line {
background: var(--bg-card);
}
.ace-tomorrow-night .ace_gutter-active-line {
background-color: var(--bg-card);
}
.ace-tomorrow-night .ace_marker-layer .ace_selected-word {
border: 1px solid #373b41;
}
.ace-tomorrow-night .ace_invisible {
color: #4b4e55;
}
.ace-tomorrow-night .ace_keyword,
.ace-tomorrow-night .ace_meta,
.ace-tomorrow-night .ace_storage,
.ace-tomorrow-night .ace_storage.ace_type,
.ace-tomorrow-night .ace_support.ace_type {
color: #b294bb;
}
.ace-tomorrow-night .ace_keyword.ace_operator {
color: #8abeb7;
}
.ace-tomorrow-night .ace_constant.ace_character,
.ace-tomorrow-night .ace_constant.ace_language,
.ace-tomorrow-night .ace_constant.ace_numeric,
.ace-tomorrow-night .ace_keyword.ace_other.ace_unit,
.ace-tomorrow-night .ace_support.ace_constant,
.ace-tomorrow-night .ace_variable.ace_parameter {
color: #de935f;
}
.ace-tomorrow-night .ace_constant.ace_other {
color: #ced1cf;
}
.ace-tomorrow-night .ace_invalid {
color: #ced2cf;
background-color: #df5f5f;
}
.ace-tomorrow-night .ace_invalid.ace_deprecated {
color: #ced2cf;
background-color: #b798bf;
}
.ace-tomorrow-night .ace_fold {
background-color: #81a2be;
border-color: #c5c8c6;
}
.ace-tomorrow-night .ace_entity.ace_name.ace_function,
.ace-tomorrow-night .ace_support.ace_function,
.ace-tomorrow-night .ace_variable {
color: #81a2be;
}
.ace-tomorrow-night .ace_support.ace_class,
.ace-tomorrow-night .ace_support.ace_type {
color: #f0c674;
}
.ace-tomorrow-night .ace_heading,
.ace-tomorrow-night .ace_markup.ace_heading,
.ace-tomorrow-night .ace_string {
color: #b5bd68;
}
.ace-tomorrow-night .ace_entity.ace_name.ace_tag,
.ace-tomorrow-night .ace_entity.ace_other.ace_attribute-name,
.ace-tomorrow-night .ace_meta.ace_tag,
.ace-tomorrow-night .ace_string.ace_regexp,
.ace-tomorrow-night .ace_variable {
color: #cc6666;
}
.ace-tomorrow-night .ace_comment {
color: #969896;
}
.ace-tomorrow-night .ace_indent-guide {
background: url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAACCAYAAACZgbYnAAAAEklEQVQImWNgYGBgYHB3d/8PAAOIAdULw8qMAAAAAElFTkSuQmCC)
right repeat-y;
}
.ace-tomorrow-night .ace_indent-guide-active {
background: url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAACCAYAAACZgbYnAAAAEklEQVQIW2PQ1dX9zzBz5sz/ABCcBFFentLlAAAAAElFTkSuQmCC)
right repeat-y;
}

View File

@ -0,0 +1,83 @@
.jsoneditor {
border: none;
color: rgb(var(--text-primary));
overflow: auto;
scrollbar-width: none;
background-color: var(--bg-base);
.jsoneditor-menu {
background-color: var(--bg-base);
// border-color: var(--border-button);
border-bottom: thin solid var(--border-button);
}
.jsoneditor-navigation-bar {
border-bottom: 1px solid var(--border-button);
background-color: var(--bg-input);
}
.jsoneditor-tree {
background: var(--bg-base);
}
.jsoneditor-highlight {
background-color: var(--bg-card);
}
}
.jsoneditor-popover,
.jsoneditor-schema-error,
div.jsoneditor td,
div.jsoneditor textarea,
div.jsoneditor th,
div.jsoneditor-field,
div.jsoneditor-value,
pre.jsoneditor-preview {
font-family: consolas, menlo, monaco, 'Ubuntu Mono', source-code-pro,
monospace;
font-size: 14px;
color: rgb(var(--text-primary));
}
div.jsoneditor-field.jsoneditor-highlight,
div.jsoneditor-field[contenteditable='true']:focus,
div.jsoneditor-field[contenteditable='true']:hover,
div.jsoneditor-value.jsoneditor-highlight,
div.jsoneditor-value[contenteditable='true']:focus,
div.jsoneditor-value[contenteditable='true']:hover {
background-color: var(--bg-input);
border: 1px solid var(--border-button);
border-radius: 2px;
}
.jsoneditor-selected,
.jsoneditor-contextmenu .jsoneditor-menu li ul {
background: var(--bg-base);
}
.jsoneditor-contextmenu .jsoneditor-menu button {
color: rgb(var(--text-secondary));
}
.jsoneditor-menu a.jsoneditor-poweredBy {
display: none;
}
.ace-jsoneditor .ace_scroller {
background-color: var(--bg-base);
}
.jsoneditor-statusbar {
border-top: 1px solid var(--border-button);
background-color: var(--bg-base);
color: rgb(var(--text-primary));
}
.jsoneditor-menu > .jsoneditor-modes > button,
.jsoneditor-menu > button {
// color: rgb(var(--text-secondary));
background-color: var(--text-disabled);
}
.jsoneditor-menu > .jsoneditor-modes > button:active,
.jsoneditor-menu > .jsoneditor-modes > button:focus,
.jsoneditor-menu > button:active,
.jsoneditor-menu > button:focus {
background-color: rgb(var(--text-secondary));
}
.jsoneditor-menu > .jsoneditor-modes > button:hover,
.jsoneditor-menu > button:hover {
background-color: rgb(var(--text-secondary));
border: 1px solid var(--border-button);
}

View File

@ -0,0 +1,142 @@
import React, { useEffect, useRef } from 'react';
import { useTranslation } from 'react-i18next';
import './css/cloud9_night.less';
import './css/index.less';
import { JsonEditorOptions, JsonEditorProps } from './interface';
const defaultConfig: JsonEditorOptions = {
mode: 'code',
modes: ['tree', 'code'],
history: false,
search: false,
mainMenuBar: false,
navigationBar: false,
enableSort: false,
enableTransform: false,
indentation: 2,
};
const JsonEditor: React.FC<JsonEditorProps> = ({
value,
onChange,
height = '400px',
className = '',
options = {},
}) => {
const containerRef = useRef<HTMLDivElement>(null);
const editorRef = useRef<any>(null);
const { i18n } = useTranslation();
const currentLanguageRef = useRef<string>(i18n.language);
useEffect(() => {
if (typeof window !== 'undefined') {
const JSONEditor = require('jsoneditor');
import('jsoneditor/dist/jsoneditor.min.css');
if (containerRef.current) {
// Default configuration options
const defaultOptions: JsonEditorOptions = {
...defaultConfig,
language: i18n.language === 'zh' ? 'zh-CN' : 'en',
onChange: () => {
if (editorRef.current && onChange) {
try {
const updatedJson = editorRef.current.get();
onChange(updatedJson);
} catch (err) {
// Do not trigger onChange when parsing error occurs
console.error(err);
}
}
},
...options, // Merge user provided options with defaults
};
editorRef.current = new JSONEditor(
containerRef.current,
defaultOptions,
);
if (value) {
editorRef.current.set(value);
}
}
}
return () => {
if (editorRef.current) {
if (typeof editorRef.current.destroy === 'function') {
editorRef.current.destroy();
}
editorRef.current = null;
}
};
}, []);
useEffect(() => {
// Update language when i18n language changes
// Since JSONEditor doesn't have a setOptions method, we need to recreate the editor
if (editorRef.current && currentLanguageRef.current !== i18n.language) {
currentLanguageRef.current = i18n.language;
// Save current data
let currentData;
try {
currentData = editorRef.current.get();
} catch (e) {
// If there's an error getting data, use the passed value or empty object
currentData = value || {};
}
// Destroy the current editor
if (typeof editorRef.current.destroy === 'function') {
editorRef.current.destroy();
}
// Recreate the editor with new language
const JSONEditor = require('jsoneditor');
const newOptions: JsonEditorOptions = {
...defaultConfig,
language: i18n.language === 'zh' ? 'zh-CN' : 'en',
onChange: () => {
if (editorRef.current && onChange) {
try {
const updatedJson = editorRef.current.get();
onChange(updatedJson);
} catch (err) {
// Do not trigger onChange when parsing error occurs
}
}
},
...options, // Merge user provided options with defaults
};
editorRef.current = new JSONEditor(containerRef.current, newOptions);
editorRef.current.set(currentData);
}
}, [i18n.language, value, onChange, options]);
useEffect(() => {
if (editorRef.current && value !== undefined) {
try {
// Only update the editor when the value actually changes
const currentJson = editorRef.current.get();
if (JSON.stringify(currentJson) !== JSON.stringify(value)) {
editorRef.current.set(value);
}
} catch (err) {
// Skip update if there is a syntax error in the current editor
editorRef.current.set(value);
}
}
}, [value]);
return (
<div
ref={containerRef}
style={{ height }}
className={`ace-tomorrow-night w-full border border-border-button rounded-lg overflow-hidden bg-bg-input ${className} `}
/>
);
};
export default JsonEditor;

View File

@ -0,0 +1,339 @@
// JSONEditor configuration options interface see: https://github.com/josdejong/jsoneditor/blob/master/docs/api.md
export interface JsonEditorOptions {
/**
* Editor mode. Available values: 'tree' (default), 'view', 'form', 'text', and 'code'.
*/
mode?: 'tree' | 'view' | 'form' | 'text' | 'code';
/**
* Array of available modes
*/
modes?: Array<'tree' | 'view' | 'form' | 'text' | 'code'>;
/**
* Field name for the root node. Only applicable for modes 'tree', 'view', and 'form'
*/
name?: string;
/**
* Theme for the editor
*/
theme?: string;
/**
* Enable history (undo/redo). True by default. Only applicable for modes 'tree', 'view', and 'form'
*/
history?: boolean;
/**
* Enable search box. True by default. Only applicable for modes 'tree', 'view', and 'form'
*/
search?: boolean;
/**
* Main menu bar visibility
*/
mainMenuBar?: boolean;
/**
* Navigation bar visibility
*/
navigationBar?: boolean;
/**
* Status bar visibility
*/
statusBar?: boolean;
/**
* If true, object keys are sorted before display. false by default.
*/
sortObjectKeys?: boolean;
/**
* Enable transform functionality
*/
enableTransform?: boolean;
/**
* Enable sort functionality
*/
enableSort?: boolean;
/**
* Limit dragging functionality
*/
limitDragging?: boolean;
/**
* A JSON schema object
*/
schema?: any;
/**
* Schemas that are referenced using the `$ref` property from the JSON schema
*/
schemaRefs?: Record<string, any>;
/**
* Array of template objects
*/
templates?: Array<{
text: string;
title?: string;
className?: string;
field?: string;
value: any;
}>;
/**
* Ace editor instance
*/
ace?: any;
/**
* An instance of Ajv JSON schema validator
*/
ajv?: any;
/**
* Switch to enable/disable autocomplete
*/
autocomplete?: {
confirmKey?: string | string[];
caseSensitive?: boolean;
getOptions?: (
text: string,
path: Array<string | number>,
input: string,
editor: any,
) => string[] | Promise<string[]> | null;
};
/**
* Number of indentation spaces. 4 by default. Only applicable for modes 'text' and 'code'
*/
indentation?: number;
/**
* Available languages
*/
languages?: string[];
/**
* Language of the editor
*/
language?: string;
/**
* Callback method, triggered on change of contents. Does not pass the contents itself.
* See also onChangeJSON and onChangeText.
*/
onChange?: () => void;
/**
* Callback method, triggered in modes on change of contents, passing the changed contents as JSON.
* Only applicable for modes 'tree', 'view', and 'form'.
*/
onChangeJSON?: (json: any) => void;
/**
* Callback method, triggered in modes on change of contents, passing the changed contents as stringified JSON.
*/
onChangeText?: (text: string) => void;
/**
* Callback method, triggered when an error occurs
*/
onError?: (error: Error) => void;
/**
* Callback method, triggered when node is expanded
*/
onExpand?: (node: any) => void;
/**
* Callback method, triggered when node is collapsed
*/
onCollapse?: (node: any) => void;
/**
* Callback method, determines if a node is editable
*/
onEditable?: (node: any) => boolean | { field: boolean; value: boolean };
/**
* Callback method, triggered when an event occurs in a JSON field or value.
* Only applicable for modes 'form', 'tree' and 'view'
*/
onEvent?: (node: any, event: Event) => void;
/**
* Callback method, triggered when the editor comes into focus, passing an object {type, target}.
* Applicable for all modes
*/
onFocus?: (node: any) => void;
/**
* Callback method, triggered when the editor goes out of focus, passing an object {type, target}.
* Applicable for all modes
*/
onBlur?: (node: any) => void;
/**
* Callback method, triggered when creating menu items
*/
onCreateMenu?: (menuItems: any[], node: any) => any[];
/**
* Callback method, triggered on node selection change. Only applicable for modes 'tree', 'view', and 'form'
*/
onSelectionChange?: (selection: any) => void;
/**
* Callback method, triggered on text selection change. Only applicable for modes 'text' and 'code'
*/
onTextSelectionChange?: (selection: any) => void;
/**
* Callback method, triggered when a Node DOM is rendered. Function returns a css class name to be set on a node.
* Only applicable for modes 'form', 'tree' and 'view'
*/
onClassName?: (node: any) => string | undefined;
/**
* Callback method, triggered when validating nodes
*/
onValidate?: (
json: any,
) =>
| Array<{ path: Array<string | number>; message: string }>
| Promise<Array<{ path: Array<string | number>; message: string }>>;
/**
* Callback method, triggered when node name is determined
*/
onNodeName?: (parentNode: any, childNode: any, name: string) => string;
/**
* Callback method, triggered when mode changes
*/
onModeChange?: (newMode: string, oldMode: string) => void;
/**
* Color picker options
*/
colorPicker?: boolean;
/**
* Callback method for color picker
*/
onColorPicker?: (
callback: (color: string) => void,
parent: HTMLElement,
) => void;
/**
* If true, shows timestamp tag
*/
timestampTag?: boolean;
/**
* Format for timestamps
*/
timestampFormat?: string;
/**
* If true, unicode characters are escaped. false by default.
*/
escapeUnicode?: boolean;
/**
* Number of children allowed for a node in 'tree', 'view', or 'form' mode before
* the "show more/show all" buttons appear. 100 by default.
*/
maxVisibleChilds?: number;
/**
* Callback method for validation errors
*/
onValidationError?: (
errors: Array<{ path: Array<string | number>; message: string }>,
) => void;
/**
* Callback method for validation warnings
*/
onValidationWarning?: (
warnings: Array<{ path: Array<string | number>; message: string }>,
) => void;
/**
* The anchor element to apply an overlay and display the modals in a centered location. Defaults to document.body
*/
modalAnchor?: HTMLElement | null;
/**
* Anchor element for popups
*/
popupAnchor?: HTMLElement | null;
/**
* Function to create queries
*/
createQuery?: () => void;
/**
* Function to execute queries
*/
executeQuery?: () => void;
/**
* Query description
*/
queryDescription?: string;
/**
* Allow schema suggestions
*/
allowSchemaSuggestions?: boolean;
/**
* Show error table
*/
showErrorTable?: boolean;
/**
* Validate current JSON object against the configured JSON schema
* Must be implemented by tree mode and text mode
*/
validate?: () => Promise<any[]>;
/**
* Refresh the rendered contents
* Can be implemented by tree mode and text mode
*/
refresh?: () => void;
/**
* Callback method triggered when schema changes
*/
_onSchemaChange?: (schema: any, schemaRefs: any) => void;
}
export interface JsonEditorProps {
// JSON data to be displayed in the editor
value?: any;
// Callback function triggered when the JSON data changes
onChange?: (value: any) => void;
// Height of the editor
height?: string;
// Additional CSS class names
className?: string;
// Configuration options for the JSONEditor
options?: JsonEditorOptions;
}

View File

@ -25,6 +25,7 @@ export default {
portugueseBr: 'Portuguese (Brazil)',
chinese: 'Simplified Chinese',
traditionalChinese: 'Traditional Chinese',
russian: 'Russian',
language: 'Language',
languageMessage: 'Please input your language!',
languagePlaceholder: 'select your language',
@ -1752,6 +1753,8 @@ The variable aggregation node (originally the variable assignment node) is a cru
The Indexer will store the content in the corresponding data structures for the selected methods.`,
// file: 'File',
parserMethod: 'PDF parser',
tableResultType: 'Table Result Type',
markdownImageResponseType: 'Markdown Image Response Type',
// systemPrompt: 'System Prompt',
systemPromptPlaceholder:
'Enter system prompt for image analysis, if empty the system default value will be used',
@ -1934,6 +1937,7 @@ Important structured information may include: names, dates, locations, events, k
japanese: 'Japanese',
korean: 'Korean',
vietnamese: 'Vietnamese',
russian: 'Russian',
},
pagination: {
total: 'Total {{total}}',

File diff suppressed because it is too large Load Diff

View File

@ -1629,6 +1629,8 @@ General实体和关系提取提示来自 GitHub - microsoft/graphrag基于
Tokenizer 会根据所选方式将内容存储为对应的数据结构。`,
filenameEmbdWeight: '文件名嵌入权重',
parserMethod: '解析方法',
tableResultType: '表格返回形式',
markdownImageResponseType: '图片返回形式',
systemPromptPlaceholder:
'请输入用于图像分析的系统提示词,若为空则使用系统缺省值',
exportJson: '导出 JSON',

View File

@ -169,6 +169,7 @@ export const initialParserValues = {
{
fileFormat: FileType.Spreadsheet,
output_format: SpreadsheetOutputFormat.Html,
parse_method: ParseDocumentType.DeepDOC,
},
{
fileFormat: FileType.Image,
@ -192,6 +193,7 @@ export const initialParserValues = {
{
fileFormat: FileType.PowerPoint,
output_format: PptOutputFormat.Json,
parse_method: ParseDocumentType.DeepDOC,
},
],
};
@ -243,7 +245,7 @@ export const FileTypeSuffixMap = {
[FileType.Email]: ['eml', 'msg'],
[FileType.TextMarkdown]: ['md', 'markdown', 'mdx', 'txt'],
[FileType.Docx]: ['doc', 'docx'],
[FileType.PowerPoint]: ['pptx'],
[FileType.PowerPoint]: ['pptx', 'ppt'],
[FileType.Video]: ['mp4', 'avi', 'mkv'],
[FileType.Audio]: [
'da',

View File

@ -34,6 +34,8 @@ import { OutputFormatFormField } from './common-form-fields';
import { EmailFormFields } from './email-form-fields';
import { ImageFormFields } from './image-form-fields';
import { PdfFormFields } from './pdf-form-fields';
import { PptFormFields } from './ppt-form-fields';
import { SpreadsheetFormFields } from './spreadsheet-form-fields';
import { buildFieldNameWithPrefix } from './utils';
import { AudioFormFields, VideoFormFields } from './video-form-fields';
@ -41,6 +43,8 @@ const outputList = buildOutputList(initialParserValues.outputs);
const FileFormatWidgetMap = {
[FileType.PDF]: PdfFormFields,
[FileType.Spreadsheet]: SpreadsheetFormFields,
[FileType.PowerPoint]: PptFormFields,
[FileType.Video]: VideoFormFields,
[FileType.Audio]: AudioFormFields,
[FileType.Email]: EmailFormFields,
@ -65,6 +69,8 @@ export const FormSchema = z.object({
fields: z.array(z.string()).optional(),
llm_id: z.string().optional(),
system_prompt: z.string().optional(),
table_result_type: z.string().optional(),
markdown_image_response_type: z.string().optional(),
}),
),
});
@ -184,6 +190,8 @@ const ParserForm = ({ node }: INextOperatorForm) => {
lang: '',
fields: [],
llm_id: '',
table_result_type: '',
markdown_image_response_type: '',
});
}, [append]);

View File

@ -1,13 +1,30 @@
import { ParseDocumentType } from '@/components/layout-recognize-form-field';
import {
SelectWithSearch,
SelectWithSearchFlagOptionType,
} from '@/components/originui/select-with-search';
import { RAGFlowFormItem } from '@/components/ragflow-form';
import { isEmpty } from 'lodash';
import { useEffect, useMemo } from 'react';
import { useFormContext, useWatch } from 'react-hook-form';
import { useTranslation } from 'react-i18next';
import { LanguageFormField, ParserMethodFormField } from './common-form-fields';
import { CommonProps } from './interface';
import { useSetInitialLanguage } from './use-set-initial-language';
import { buildFieldNameWithPrefix } from './utils';
const tableResultTypeOptions: SelectWithSearchFlagOptionType[] = [
{ label: 'Markdown', value: '0' },
{ label: 'HTML', value: '1' },
];
const markdownImageResponseTypeOptions: SelectWithSearchFlagOptionType[] = [
{ label: 'URL', value: '0' },
{ label: 'Text', value: '1' },
];
export function PdfFormFields({ prefix }: CommonProps) {
const { t } = useTranslation();
const form = useFormContext();
const parseMethodName = buildFieldNameWithPrefix('parse_method', prefix);
@ -25,6 +42,12 @@ export function PdfFormFields({ prefix }: CommonProps) {
);
}, [parseMethod]);
const tcadpOptionsShown = useMemo(() => {
return (
!isEmpty(parseMethod) && parseMethod === ParseDocumentType.TCADPParser
);
}, [parseMethod]);
useSetInitialLanguage({ prefix, languageShown });
useEffect(() => {
@ -36,10 +59,68 @@ export function PdfFormFields({ prefix }: CommonProps) {
}
}, [form, parseMethodName]);
// Set default values for TCADP options when TCADP is selected
useEffect(() => {
if (tcadpOptionsShown) {
const tableResultTypeName = buildFieldNameWithPrefix(
'table_result_type',
prefix,
);
const markdownImageResponseTypeName = buildFieldNameWithPrefix(
'markdown_image_response_type',
prefix,
);
if (isEmpty(form.getValues(tableResultTypeName))) {
form.setValue(tableResultTypeName, '1', {
shouldValidate: true,
shouldDirty: true,
});
}
if (isEmpty(form.getValues(markdownImageResponseTypeName))) {
form.setValue(markdownImageResponseTypeName, '1', {
shouldValidate: true,
shouldDirty: true,
});
}
}
}, [tcadpOptionsShown, form, prefix]);
return (
<>
<ParserMethodFormField prefix={prefix}></ParserMethodFormField>
{languageShown && <LanguageFormField prefix={prefix}></LanguageFormField>}
{tcadpOptionsShown && (
<>
<RAGFlowFormItem
name={buildFieldNameWithPrefix('table_result_type', prefix)}
label={t('flow.tableResultType') || '表格返回形式'}
>
{(field) => (
<SelectWithSearch
value={field.value}
onChange={field.onChange}
options={tableResultTypeOptions}
></SelectWithSearch>
)}
</RAGFlowFormItem>
<RAGFlowFormItem
name={buildFieldNameWithPrefix(
'markdown_image_response_type',
prefix,
)}
label={t('flow.markdownImageResponseType') || '图片返回形式'}
>
{(field) => (
<SelectWithSearch
value={field.value}
onChange={field.onChange}
options={markdownImageResponseTypeOptions}
></SelectWithSearch>
)}
</RAGFlowFormItem>
</>
)}
</>
);
}

View File

@ -0,0 +1,125 @@
import { ParseDocumentType } from '@/components/layout-recognize-form-field';
import {
SelectWithSearch,
SelectWithSearchFlagOptionType,
} from '@/components/originui/select-with-search';
import { RAGFlowFormItem } from '@/components/ragflow-form';
import { isEmpty } from 'lodash';
import { useEffect, useMemo } from 'react';
import { useFormContext, useWatch } from 'react-hook-form';
import { useTranslation } from 'react-i18next';
import { ParserMethodFormField } from './common-form-fields';
import { CommonProps } from './interface';
import { buildFieldNameWithPrefix } from './utils';
const tableResultTypeOptions: SelectWithSearchFlagOptionType[] = [
{ label: 'Markdown', value: '0' },
{ label: 'HTML', value: '1' },
];
const markdownImageResponseTypeOptions: SelectWithSearchFlagOptionType[] = [
{ label: 'URL', value: '0' },
{ label: 'Text', value: '1' },
];
export function PptFormFields({ prefix }: CommonProps) {
const { t } = useTranslation();
const form = useFormContext();
const parseMethodName = buildFieldNameWithPrefix('parse_method', prefix);
const parseMethod = useWatch({
name: parseMethodName,
});
// PPT only supports DeepDOC and TCADPParser
const optionsWithoutLLM = [
{ label: ParseDocumentType.DeepDOC, value: ParseDocumentType.DeepDOC },
{
label: ParseDocumentType.TCADPParser,
value: ParseDocumentType.TCADPParser,
},
];
const tcadpOptionsShown = useMemo(() => {
return (
!isEmpty(parseMethod) && parseMethod === ParseDocumentType.TCADPParser
);
}, [parseMethod]);
useEffect(() => {
if (isEmpty(form.getValues(parseMethodName))) {
form.setValue(parseMethodName, ParseDocumentType.DeepDOC, {
shouldValidate: true,
shouldDirty: true,
});
}
}, [form, parseMethodName]);
// Set default values for TCADP options when TCADP is selected
useEffect(() => {
if (tcadpOptionsShown) {
const tableResultTypeName = buildFieldNameWithPrefix(
'table_result_type',
prefix,
);
const markdownImageResponseTypeName = buildFieldNameWithPrefix(
'markdown_image_response_type',
prefix,
);
if (isEmpty(form.getValues(tableResultTypeName))) {
form.setValue(tableResultTypeName, '1', {
shouldValidate: true,
shouldDirty: true,
});
}
if (isEmpty(form.getValues(markdownImageResponseTypeName))) {
form.setValue(markdownImageResponseTypeName, '1', {
shouldValidate: true,
shouldDirty: true,
});
}
}
}, [tcadpOptionsShown, form, prefix]);
return (
<>
<ParserMethodFormField
prefix={prefix}
optionsWithoutLLM={optionsWithoutLLM}
></ParserMethodFormField>
{tcadpOptionsShown && (
<>
<RAGFlowFormItem
name={buildFieldNameWithPrefix('table_result_type', prefix)}
label={t('flow.tableResultType') || '表格返回形式'}
>
{(field) => (
<SelectWithSearch
value={field.value}
onChange={field.onChange}
options={tableResultTypeOptions}
></SelectWithSearch>
)}
</RAGFlowFormItem>
<RAGFlowFormItem
name={buildFieldNameWithPrefix(
'markdown_image_response_type',
prefix,
)}
label={t('flow.markdownImageResponseType') || '图片返回形式'}
>
{(field) => (
<SelectWithSearch
value={field.value}
onChange={field.onChange}
options={markdownImageResponseTypeOptions}
></SelectWithSearch>
)}
</RAGFlowFormItem>
</>
)}
</>
);
}

View File

@ -0,0 +1,125 @@
import { ParseDocumentType } from '@/components/layout-recognize-form-field';
import {
SelectWithSearch,
SelectWithSearchFlagOptionType,
} from '@/components/originui/select-with-search';
import { RAGFlowFormItem } from '@/components/ragflow-form';
import { isEmpty } from 'lodash';
import { useEffect, useMemo } from 'react';
import { useFormContext, useWatch } from 'react-hook-form';
import { useTranslation } from 'react-i18next';
import { ParserMethodFormField } from './common-form-fields';
import { CommonProps } from './interface';
import { buildFieldNameWithPrefix } from './utils';
const tableResultTypeOptions: SelectWithSearchFlagOptionType[] = [
{ label: 'Markdown', value: '0' },
{ label: 'HTML', value: '1' },
];
const markdownImageResponseTypeOptions: SelectWithSearchFlagOptionType[] = [
{ label: 'URL', value: '0' },
{ label: 'Text', value: '1' },
];
export function SpreadsheetFormFields({ prefix }: CommonProps) {
const { t } = useTranslation();
const form = useFormContext();
const parseMethodName = buildFieldNameWithPrefix('parse_method', prefix);
const parseMethod = useWatch({
name: parseMethodName,
});
// Spreadsheet only supports DeepDOC and TCADPParser
const optionsWithoutLLM = [
{ label: ParseDocumentType.DeepDOC, value: ParseDocumentType.DeepDOC },
{
label: ParseDocumentType.TCADPParser,
value: ParseDocumentType.TCADPParser,
},
];
const tcadpOptionsShown = useMemo(() => {
return (
!isEmpty(parseMethod) && parseMethod === ParseDocumentType.TCADPParser
);
}, [parseMethod]);
useEffect(() => {
if (isEmpty(form.getValues(parseMethodName))) {
form.setValue(parseMethodName, ParseDocumentType.DeepDOC, {
shouldValidate: true,
shouldDirty: true,
});
}
}, [form, parseMethodName]);
// Set default values for TCADP options when TCADP is selected
useEffect(() => {
if (tcadpOptionsShown) {
const tableResultTypeName = buildFieldNameWithPrefix(
'table_result_type',
prefix,
);
const markdownImageResponseTypeName = buildFieldNameWithPrefix(
'markdown_image_response_type',
prefix,
);
if (isEmpty(form.getValues(tableResultTypeName))) {
form.setValue(tableResultTypeName, '1', {
shouldValidate: true,
shouldDirty: true,
});
}
if (isEmpty(form.getValues(markdownImageResponseTypeName))) {
form.setValue(markdownImageResponseTypeName, '1', {
shouldValidate: true,
shouldDirty: true,
});
}
}
}, [tcadpOptionsShown, form, prefix]);
return (
<>
<ParserMethodFormField
prefix={prefix}
optionsWithoutLLM={optionsWithoutLLM}
></ParserMethodFormField>
{tcadpOptionsShown && (
<>
<RAGFlowFormItem
name={buildFieldNameWithPrefix('table_result_type', prefix)}
label={t('flow.tableResultType') || '表格返回形式'}
>
{(field) => (
<SelectWithSearch
value={field.value}
onChange={field.onChange}
options={tableResultTypeOptions}
></SelectWithSearch>
)}
</RAGFlowFormItem>
<RAGFlowFormItem
name={buildFieldNameWithPrefix(
'markdown_image_response_type',
prefix,
)}
label={t('flow.markdownImageResponseType') || '图片返回形式'}
>
{(field) => (
<SelectWithSearch
value={field.value}
onChange={field.onChange}
options={markdownImageResponseTypeOptions}
></SelectWithSearch>
)}
</RAGFlowFormItem>
</>
)}
</>
);
}

View File

@ -1,7 +1,7 @@
import JsonEditor from '@/components/json-edit';
import { BlockButton, Button } from '@/components/ui/button';
import { Input } from '@/components/ui/input';
import { Segmented } from '@/components/ui/segmented';
import { Editor } from '@monaco-editor/react';
import { t } from 'i18next';
import { Trash2, X } from 'lucide-react';
import { useCallback } from 'react';
@ -31,32 +31,80 @@ export const useObjectFields = () => {
},
[],
);
const validateKeys = (
obj: any,
path: (string | number)[] = [],
): Array<{ path: (string | number)[]; message: string }> => {
const errors: Array<{ path: (string | number)[]; message: string }> = [];
if (obj !== null && typeof obj === 'object' && !Array.isArray(obj)) {
for (const key in obj) {
if (obj.hasOwnProperty(key)) {
if (!/^[a-zA-Z_]+$/.test(key)) {
errors.push({
path: [...path, key],
message: `Key "${key}" is invalid. Keys can only contain letters and underscores.`,
});
}
const nestedErrors = validateKeys(obj[key], [...path, key]);
errors.push(...nestedErrors);
}
}
} else if (Array.isArray(obj)) {
obj.forEach((item, index) => {
const nestedErrors = validateKeys(item, [...path, index]);
errors.push(...nestedErrors);
});
}
return errors;
};
const objectRender = useCallback((field: FieldValues) => {
const fieldValue =
typeof field.value === 'object'
? JSON.stringify(field.value, null, 2)
: JSON.stringify({}, null, 2);
console.log('object-render-field', field, fieldValue);
// const fieldValue =
// typeof field.value === 'object'
// ? JSON.stringify(field.value, null, 2)
// : JSON.stringify({}, null, 2);
// console.log('object-render-field', field, fieldValue);
return (
<Editor
height={200}
defaultLanguage="json"
theme="vs-dark"
value={fieldValue}
// <Editor
// height={200}
// defaultLanguage="json"
// theme="vs-dark"
// value={fieldValue}
// onChange={field.onChange}
// />
<JsonEditor
value={field.value}
onChange={field.onChange}
height="400px"
options={{
mode: 'code',
navigationBar: false,
mainMenuBar: true,
history: true,
onValidate: (json) => {
return validateKeys(json);
},
}}
/>
);
}, []);
const objectValidate = useCallback((value: any) => {
try {
if (!JSON.parse(value)) {
throw new Error(t('knowledgeDetails.formatTypeError'));
if (validateKeys(value, [])?.length > 0) {
throw new Error(t('flow.formatTypeError'));
}
if (!z.object({}).safeParse(value).success) {
throw new Error(t('flow.formatTypeError'));
}
if (value && typeof value === 'string' && !JSON.parse(value)) {
throw new Error(t('flow.formatTypeError'));
}
return true;
} catch (e) {
throw new Error(t('knowledgeDetails.formatTypeError'));
console.log('object-render-error', e, value);
throw new Error(t('flow.formatTypeError'));
}
}, []);
@ -219,6 +267,10 @@ export const useObjectFields = () => {
};
const handleCustomSchema = (value: TypesWithArray) => {
switch (value) {
case TypesWithArray.Object:
return z.object({});
case TypesWithArray.ArrayObject:
return z.array(z.object({}));
case TypesWithArray.ArrayString:
return z.array(z.string());
case TypesWithArray.ArrayNumber:

View File

@ -214,6 +214,36 @@ function transformParserParams(params: ParserFormSchemaType) {
parse_method: cur.parse_method,
lang: cur.lang,
};
// Only include TCADP parameters if TCADP Parser is selected
if (cur.parse_method?.toLowerCase() === 'tcadp parser') {
filteredSetup.table_result_type = cur.table_result_type;
filteredSetup.markdown_image_response_type =
cur.markdown_image_response_type;
}
break;
case FileType.Spreadsheet:
filteredSetup = {
...filteredSetup,
parse_method: cur.parse_method,
};
// Only include TCADP parameters if TCADP Parser is selected
if (cur.parse_method?.toLowerCase() === 'tcadp parser') {
filteredSetup.table_result_type = cur.table_result_type;
filteredSetup.markdown_image_response_type =
cur.markdown_image_response_type;
}
break;
case FileType.PowerPoint:
filteredSetup = {
...filteredSetup,
parse_method: cur.parse_method,
};
// Only include TCADP parameters if TCADP Parser is selected
if (cur.parse_method?.toLowerCase() === 'tcadp parser') {
filteredSetup.table_result_type = cur.table_result_type;
filteredSetup.markdown_image_response_type =
cur.markdown_image_response_type;
}
break;
case FileType.Image:
filteredSetup = {

View File

View File

@ -0,0 +1,40 @@
import { ParseDocumentType } from '@/components/layout-recognize-form-field';
import { isEmpty } from 'lodash';
import { useEffect } from 'react';
import { useFormContext } from 'react-hook-form';
import { ParserMethodFormField } from './common-form-fields';
import { CommonProps } from './interface';
import { buildFieldNameWithPrefix } from './utils';
export function PptFormFields({ prefix }: CommonProps) {
const form = useFormContext();
const parseMethodName = buildFieldNameWithPrefix('parse_method', prefix);
// PPT only supports DeepDOC and TCADPParser
const optionsWithoutLLM = [
{ label: ParseDocumentType.DeepDOC, value: ParseDocumentType.DeepDOC },
{
label: ParseDocumentType.TCADPParser,
value: ParseDocumentType.TCADPParser,
},
];
useEffect(() => {
if (isEmpty(form.getValues(parseMethodName))) {
form.setValue(parseMethodName, ParseDocumentType.DeepDOC, {
shouldValidate: true,
shouldDirty: true,
});
}
}, [form, parseMethodName]);
return (
<>
<ParserMethodFormField
prefix={prefix}
optionsWithoutLLM={optionsWithoutLLM}
></ParserMethodFormField>
</>
);
}

View File

@ -0,0 +1,40 @@
import { ParseDocumentType } from '@/components/layout-recognize-form-field';
import { isEmpty } from 'lodash';
import { useEffect } from 'react';
import { useFormContext } from 'react-hook-form';
import { ParserMethodFormField } from './common-form-fields';
import { CommonProps } from './interface';
import { buildFieldNameWithPrefix } from './utils';
export function SpreadsheetFormFields({ prefix }: CommonProps) {
const form = useFormContext();
const parseMethodName = buildFieldNameWithPrefix('parse_method', prefix);
// Spreadsheet only supports DeepDOC and TCADPParser
const optionsWithoutLLM = [
{ label: ParseDocumentType.DeepDOC, value: ParseDocumentType.DeepDOC },
{
label: ParseDocumentType.TCADPParser,
value: ParseDocumentType.TCADPParser,
},
];
useEffect(() => {
if (isEmpty(form.getValues(parseMethodName))) {
form.setValue(parseMethodName, ParseDocumentType.DeepDOC, {
shouldValidate: true,
shouldDirty: true,
});
}
}, [form, parseMethodName]);
return (
<>
<ParserMethodFormField
prefix={prefix}
optionsWithoutLLM={optionsWithoutLLM}
></ParserMethodFormField>
</>
);
}

View File