Feat: API supports toc_enhance. (#11437)

### What problem does this PR solve?

Close #11433

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
Author: Kevin Hu
Date: 2025-11-21 14:51:58 +08:00 (committed by GitHub)
Parent: db0f6840d9
Commit: 249296e417

11 changed files with 25 additions and 9 deletions

```diff
@@ -86,7 +86,7 @@ Try our demo at [https://demo.ragflow.io](https://demo.ragflow.io).
 ## 🔥 Latest Updates
 
 - 2025-11-19 Supports Gemini 3 Pro.
-- 2025-11-12 Supports data synchronization from Confluence, AWS S3, Discord, Google Drive.
+- 2025-11-12 Supports data synchronization from Confluence, S3, Notion, Discord, Google Drive.
 - 2025-10-23 Supports MinerU & Docling as document parsing methods.
 - 2025-10-15 Supports orchestrable ingestion pipeline.
 - 2025-08-08 Supports OpenAI's latest GPT-5 series models.
```

```diff
@@ -86,7 +86,7 @@ Coba demo kami di [https://demo.ragflow.io](https://demo.ragflow.io).
 ## 🔥 Pembaruan Terbaru
 
 - 2025-11-19 Mendukung Gemini 3 Pro.
-- 2025-11-12 Mendukung sinkronisasi data dari Confluence, AWS S3, Discord, Google Drive.
+- 2025-11-12 Mendukung sinkronisasi data dari Confluence, S3, Notion, Discord, Google Drive.
 - 2025-10-23 Mendukung MinerU & Docling sebagai metode penguraian dokumen.
 - 2025-10-15 Dukungan untuk jalur data yang terorkestrasi.
 - 2025-08-08 Mendukung model seri GPT-5 terbaru dari OpenAI.
```

```diff
@@ -67,7 +67,7 @@
 ## 🔥 最新情報
 
 - 2025-11-19 Gemini 3 Proをサポートしています
-- 2025-11-12 Confluence、AWS S3、Discord、Google Drive からのデータ同期をサポートします。
+- 2025-11-12 Confluence、S3、Notion、Discord、Google Drive からのデータ同期をサポートします。
 - 2025-10-23 ドキュメント解析方法として MinerU と Docling をサポートします。
 - 2025-10-15 オーケストレーションされたデータパイプラインのサポート。
 - 2025-08-08 OpenAI の最新 GPT-5 シリーズモデルをサポートします。
```

```diff
@@ -68,7 +68,7 @@
 ## 🔥 업데이트
 
 - 2025-11-19 Gemini 3 Pro를 지원합니다.
-- 2025-11-12 Confluence, AWS S3, Discord, Google Drive에서 데이터 동기화를 지원합니다.
+- 2025-11-12 Confluence, S3, Notion, Discord, Google Drive에서 데이터 동기화를 지원합니다.
 - 2025-10-23 문서 파싱 방법으로 MinerU 및 Docling을 지원합니다.
 - 2025-10-15 조정된 데이터 파이프라인 지원.
 - 2025-08-08 OpenAI의 최신 GPT-5 시리즈 모델을 지원합니다.
```

```diff
@@ -87,7 +87,7 @@ Experimente nossa demo em [https://demo.ragflow.io](https://demo.ragflow.io).
 ## 🔥 Últimas Atualizações
 
 - 19-11-2025 Suporta Gemini 3 Pro.
-- 12-11-2025 Suporta a sincronização de dados do Confluence, AWS S3, Discord e Google Drive.
+- 12-11-2025 Suporta a sincronização de dados do Confluence, S3, Notion, Discord e Google Drive.
 - 23-10-2025 Suporta MinerU e Docling como métodos de análise de documentos.
 - 15-10-2025 Suporte para pipelines de dados orquestrados.
 - 08-08-2025 Suporta a mais recente série GPT-5 da OpenAI.
```

```diff
@@ -86,7 +86,7 @@
 ## 🔥 近期更新
 
 - 2025-11-19 支援 Gemini 3 Pro.
-- 2025-11-12 支援從 Confluence、AWS S3、Discord、Google Drive 進行資料同步。
+- 2025-11-12 支援從 Confluence、S3、Notion、Discord、Google Drive 進行資料同步。
 - 2025-10-23 支援 MinerU 和 Docling 作為文件解析方法。
 - 2025-10-15 支援可編排的資料管道。
 - 2025-08-08 支援 OpenAI 最新的 GPT-5 系列模型。
```

```diff
@@ -86,7 +86,7 @@
 ## 🔥 近期更新
 
 - 2025-11-19 支持 Gemini 3 Pro.
-- 2025-11-12 支持从 Confluence、AWS S3、Discord、Google Drive 进行数据同步。
+- 2025-11-12 支持从 Confluence、S3、Notion、Discord、Google Drive 进行数据同步。
 - 2025-10-23 支持 MinerU 和 Docling 作为文档解析方法。
 - 2025-10-15 支持可编排的数据管道。
 - 2025-08-08 支持 OpenAI 最新的 GPT-5 系列模型。
```

```diff
@@ -32,7 +32,7 @@ class IterationParam(ComponentParamBase):
     def __init__(self):
         super().__init__()
         self.items_ref = ""
-        self.veriable={}
+        self.variable={}
 
     def get_input_form(self) -> dict[str, dict]:
         return {
```

```diff
@@ -24,7 +24,7 @@ from flasgger import Swagger
 from itsdangerous.url_safe import URLSafeTimedSerializer as Serializer
 from quart_cors import cors
 from common.constants import StatusEnum
-from api.db.db_models import close_connection
+from api.db.db_models import close_connection, APIToken
 from api.db.services import UserService
 from api.utils.json_encode import CustomJSONEncoder
 from api.utils import commands
@@ -124,6 +124,10 @@ def _load_user():
     user = UserService.query(
         access_token=access_token, status=StatusEnum.VALID.value
     )
+    if not user and len(authorization.split()) == 2:
+        objs = APIToken.query(token=authorization.split()[1])
+        if objs:
+            user = UserService.query(id=objs[0].tenant_id, status=StatusEnum.VALID.value)
     if user:
         if not user[0].access_token or not user[0].access_token.strip():
             logging.warning(f"User {user[0].email} has empty access_token in database")
```

```diff
@@ -1434,6 +1434,7 @@ async def retrieval_test(tenant_id):
     question = req["question"]
     doc_ids = req.get("document_ids", [])
     use_kg = req.get("use_kg", False)
+    toc_enhance = req.get("toc_enhance", False)
     langs = req.get("cross_languages", [])
     if not isinstance(doc_ids, list):
         return get_error_data_result("`documents` should be a list")
@@ -1487,6 +1488,11 @@ async def retrieval_test(tenant_id):
         highlight=highlight,
         rank_feature=label_question(question, kbs),
     )
+    if toc_enhance:
+        chat_mdl = LLMBundle(kb.tenant_id, LLMType.CHAT)
+        cks = settings.retriever.retrieval_by_toc(question, ranks["chunks"], tenant_ids, chat_mdl, size)
+        if cks:
+            ranks["chunks"] = cks
     if use_kg:
         ck = settings.kg_retriever.retrieval(question, [k.tenant_id for k in kbs], kb_ids, embd_mdl, LLMBundle(kb.tenant_id, LLMType.CHAT))
         if ck["content_with_weight"]:
```

````diff
@@ -2072,6 +2072,7 @@ Retrieves chunks from specified datasets.
 - `"cross_languages"`: `list[string]`
 - `"metadata_condition"`: `object`
 - `"use_kg"`: `boolean`
+- `"toc_enhance"`: `boolean`
 
 ##### Request example
 ```bash
@@ -2122,6 +2123,8 @@ curl --request POST \
   The number of chunks engaged in vector cosine computation. Defaults to `1024`.
 - `"use_kg"`: (*Body parameter*), `boolean`
   The search includes text chunks related to the knowledge graph of the selected dataset to handle complex multi-hop queries. Defaults to `False`.
+- `"toc_enhance"`: (*Body parameter*), `boolean`
+  Whether to apply table-of-contents enhancement to boost the rank of relevant chunks. Requires files parsed with `TOC Enhance` enabled. Defaults to `False`.
 - `"rerank_id"`: (*Body parameter*), `integer`
   The ID of the rerank model.
 - `"keyword"`: (*Body parameter*), `boolean`
@@ -2136,6 +2139,9 @@ curl --request POST \
   The languages that should be translated into, in order to achieve keywords retrievals in different languages.
 - `"metadata_condition"`: (*Body parameter*), `object`
   The metadata condition used for filtering chunks:
+  - `"logic"`: (*Body parameter*), `string`
+    - `"and"`: Intersection of the results from each condition (default).
+    - `"or"`: Union of the results from each condition.
   - `"conditions"`: (*Body parameter*), `array`
     A list of metadata filter conditions.
     - `"name"`: `string` - The metadata field name to filter by, e.g., `"author"`, `"company"`, `"url"`. Ensure this parameter before use. See [Set metadata](../guides/dataset/set_metadata.md) for details.
````