Feat: API supports toc_enhance. (#11437)
### What problem does this PR solve?

Close #11433

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
@@ -86,7 +86,7 @@ Try our demo at [https://demo.ragflow.io](https://demo.ragflow.io).
 ## 🔥 Latest Updates
 
 - 2025-11-19 Supports Gemini 3 Pro.
-- 2025-11-12 Supports data synchronization from Confluence, AWS S3, Discord, Google Drive.
+- 2025-11-12 Supports data synchronization from Confluence, S3, Notion, Discord, Google Drive.
 - 2025-10-23 Supports MinerU & Docling as document parsing methods.
 - 2025-10-15 Supports orchestrable ingestion pipeline.
 - 2025-08-08 Supports OpenAI's latest GPT-5 series models.
@@ -86,7 +86,7 @@ Coba demo kami di [https://demo.ragflow.io](https://demo.ragflow.io).
 ## 🔥 Pembaruan Terbaru
 
 - 2025-11-19 Mendukung Gemini 3 Pro.
-- 2025-11-12 Mendukung sinkronisasi data dari Confluence, AWS S3, Discord, Google Drive.
+- 2025-11-12 Mendukung sinkronisasi data dari Confluence, S3, Notion, Discord, Google Drive.
 - 2025-10-23 Mendukung MinerU & Docling sebagai metode penguraian dokumen.
 - 2025-10-15 Dukungan untuk jalur data yang terorkestrasi.
 - 2025-08-08 Mendukung model seri GPT-5 terbaru dari OpenAI.
@@ -67,7 +67,7 @@
 ## 🔥 最新情報
 
 - 2025-11-19 Gemini 3 Proをサポートしています
-- 2025-11-12 Confluence、AWS S3、Discord、Google Drive からのデータ同期をサポートします。
+- 2025-11-12 Confluence、S3、Notion、Discord、Google Drive からのデータ同期をサポートします。
 - 2025-10-23 ドキュメント解析方法として MinerU と Docling をサポートします。
 - 2025-10-15 オーケストレーションされたデータパイプラインのサポート。
 - 2025-08-08 OpenAI の最新 GPT-5 シリーズモデルをサポートします。
@@ -68,7 +68,7 @@
 ## 🔥 업데이트
 
 - 2025-11-19 Gemini 3 Pro를 지원합니다.
-- 2025-11-12 Confluence, AWS S3, Discord, Google Drive에서 데이터 동기화를 지원합니다.
+- 2025-11-12 Confluence, S3, Notion, Discord, Google Drive에서 데이터 동기화를 지원합니다.
 - 2025-10-23 문서 파싱 방법으로 MinerU 및 Docling을 지원합니다.
 - 2025-10-15 조정된 데이터 파이프라인 지원.
 - 2025-08-08 OpenAI의 최신 GPT-5 시리즈 모델을 지원합니다.
@@ -87,7 +87,7 @@ Experimente nossa demo em [https://demo.ragflow.io](https://demo.ragflow.io).
 ## 🔥 Últimas Atualizações
 
 - 19-11-2025 Suporta Gemini 3 Pro.
-- 12-11-2025 Suporta a sincronização de dados do Confluence, AWS S3, Discord e Google Drive.
+- 12-11-2025 Suporta a sincronização de dados do Confluence, S3, Notion, Discord e Google Drive.
 - 23-10-2025 Suporta MinerU e Docling como métodos de análise de documentos.
 - 15-10-2025 Suporte para pipelines de dados orquestrados.
 - 08-08-2025 Suporta a mais recente série GPT-5 da OpenAI.
@@ -86,7 +86,7 @@
 ## 🔥 近期更新
 
 - 2025-11-19 支援 Gemini 3 Pro.
-- 2025-11-12 支援從 Confluence、AWS S3、Discord、Google Drive 進行資料同步。
+- 2025-11-12 支援從 Confluence、S3、Notion、Discord、Google Drive 進行資料同步。
 - 2025-10-23 支援 MinerU 和 Docling 作為文件解析方法。
 - 2025-10-15 支援可編排的資料管道。
 - 2025-08-08 支援 OpenAI 最新的 GPT-5 系列模型。
@@ -86,7 +86,7 @@
 ## 🔥 近期更新
 
 - 2025-11-19 支持 Gemini 3 Pro.
-- 2025-11-12 支持从 Confluence、AWS S3、Discord、Google Drive 进行数据同步。
+- 2025-11-12 支持从 Confluence、S3、Notion、Discord、Google Drive 进行数据同步。
 - 2025-10-23 支持 MinerU 和 Docling 作为文档解析方法。
 - 2025-10-15 支持可编排的数据管道。
 - 2025-08-08 支持 OpenAI 最新的 GPT-5 系列模型。
@@ -32,7 +32,7 @@ class IterationParam(ComponentParamBase):
     def __init__(self):
         super().__init__()
         self.items_ref = ""
-        self.veriable={}
+        self.variable={}
 
     def get_input_form(self) -> dict[str, dict]:
         return {
@@ -24,7 +24,7 @@ from flasgger import Swagger
 from itsdangerous.url_safe import URLSafeTimedSerializer as Serializer
 from quart_cors import cors
 from common.constants import StatusEnum
-from api.db.db_models import close_connection
+from api.db.db_models import close_connection, APIToken
 from api.db.services import UserService
 from api.utils.json_encode import CustomJSONEncoder
 from api.utils import commands
@@ -124,6 +124,10 @@ def _load_user():
         user = UserService.query(
             access_token=access_token, status=StatusEnum.VALID.value
         )
+        if not user and len(authorization.split()) == 2:
+            objs = APIToken.query(token=authorization.split()[1])
+            if objs:
+                user = UserService.query(id=objs[0].tenant_id, status=StatusEnum.VALID.value)
         if user:
             if not user[0].access_token or not user[0].access_token.strip():
                 logging.warning(f"User {user[0].email} has empty access_token in database")
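With this fallback, a request whose `Authorization` header carries only an API token (`Bearer <token>`) can resolve to the token's tenant as the session user instead of failing the `access_token` lookup. A minimal client-side sketch, assuming a locally running RAGFlow server on its default port and a placeholder API key (both illustrative, not taken from this diff):

```python
import requests

BASE_URL = "http://localhost:9380"  # assumed default RAGFlow API address
API_KEY = "ragflow-xxxxxx"          # placeholder API token; create a real one in the UI

# The header must split into exactly two parts ("Bearer" + token)
# for the APIToken fallback in _load_user() to trigger.
resp = requests.get(
    f"{BASE_URL}/api/v1/datasets",
    headers={"Authorization": f"Bearer {API_KEY}"},
)
print(resp.status_code, resp.json())
```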
@@ -1434,6 +1434,7 @@ async def retrieval_test(tenant_id):
     question = req["question"]
     doc_ids = req.get("document_ids", [])
     use_kg = req.get("use_kg", False)
+    toc_enhance = req.get("toc_enhance", False)
    langs = req.get("cross_languages", [])
     if not isinstance(doc_ids, list):
         return get_error_data_result("`documents` should be a list")
@@ -1487,6 +1488,11 @@ async def retrieval_test(tenant_id):
             highlight=highlight,
             rank_feature=label_question(question, kbs),
         )
+        if toc_enhance:
+            chat_mdl = LLMBundle(kb.tenant_id, LLMType.CHAT)
+            cks = settings.retriever.retrieval_by_toc(question, ranks["chunks"], tenant_ids, chat_mdl, size)
+            if cks:
+                ranks["chunks"] = cks
         if use_kg:
             ck = settings.kg_retriever.retrieval(question, [k.tenant_id for k in kbs], kb_ids, embd_mdl, LLMBundle(kb.tenant_id, LLMType.CHAT))
             if ck["content_with_weight"]:
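The TOC path only runs when the request body sets `toc_enhance`, and it replaces `ranks["chunks"]` only if `retrieval_by_toc` returns a non-empty result. A sketch of a request that exercises this branch, with placeholder URL, key, and dataset ID:

```python
import requests

BASE_URL = "http://localhost:9380"  # assumed default RAGFlow API address
API_KEY = "ragflow-xxxxxx"          # placeholder API token

payload = {
    "question": "What does the chapter on ingestion pipelines cover?",
    "dataset_ids": ["<dataset-id>"],  # placeholder; dataset must be parsed with TOC Enhance
    "toc_enhance": True,              # enables the retrieval_by_toc re-ranking above
}
resp = requests.post(
    f"{BASE_URL}/api/v1/retrieval",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
)
for chunk in resp.json().get("data", {}).get("chunks", []):
    print(chunk.get("content"))
```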
@@ -2072,6 +2072,7 @@ Retrieves chunks from specified datasets.
 - `"cross_languages"`: `list[string]`
 - `"metadata_condition"`: `object`
 - `"use_kg"`: `boolean`
+- `"toc_enhance"`: `boolean`
 ##### Request example
 
 ```bash
@@ -2122,6 +2123,8 @@ curl --request POST \
   The number of chunks engaged in vector cosine computation. Defaults to `1024`.
 - `"use_kg"`: (*Body parameter*), `boolean`
   The search includes text chunks related to the knowledge graph of the selected dataset to handle complex multi-hop queries. Defaults to `False`.
+- `"toc_enhance"`: (*Body parameter*), `boolean`
+  The search applies table-of-contents enhancement to boost the rank of relevant chunks. Files must have been parsed with `TOC Enhance` enabled. Defaults to `False`.
 - `"rerank_id"`: (*Body parameter*), `integer`
   The ID of the rerank model.
 - `"keyword"`: (*Body parameter*), `boolean`
@@ -2136,6 +2139,9 @@ curl --request POST \
   The languages the query should be translated into, to achieve keyword retrieval in different languages.
 - `"metadata_condition"`: (*Body parameter*), `object`
   The metadata condition used for filtering chunks:
+  - `"logic"`: (*Body parameter*), `string`
+    - `"and"`: Intersection of the results from each condition (default).
+    - `"or"`: Union of the results from each condition.
   - `"conditions"`: (*Body parameter*), `array`
     A list of metadata filter conditions.
     - `"name"`: `string` - The metadata field name to filter by, e.g., `"author"`, `"company"`, `"url"`. Ensure this field exists before use. See [Set metadata](../guides/dataset/set_metadata.md) for details.
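For illustration, a `metadata_condition` that keeps chunks from either of two companies might look like the sketch below; only `logic`, `conditions`, and `name` appear in this excerpt, so the `comparison_operator` and `value` keys are assumptions about the condition schema:

```python
# Hypothetical metadata_condition body fragment. "comparison_operator"
# and "value" are assumed field names; only "logic", "conditions",
# and "name" are confirmed by the reference text above.
metadata_condition = {
    "logic": "or",
    "conditions": [
        {"name": "company", "comparison_operator": "=", "value": "Acme"},
        {"name": "company", "comparison_operator": "=", "value": "Globex"},
    ],
}
```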