From 6814ace1aa1d449b792f2a87d5ee5686e41b3081 Mon Sep 17 00:00:00 2001 From: Jimmy Ben Klieve Date: Wed, 7 Jan 2026 10:00:09 +0800 Subject: [PATCH] docs: update docs icons (#12465) ### What problem does this PR solve? Update icons for docs. Trailing spaces are automatically truncated by the editor; this does not affect the real content. ### Type of change - [x] Documentation Update --- docs/basics/rag.md | 14 +- docs/configurations.md | 3 + docs/contribution/_category_.json | 3 + docs/contribution/contributing.md | 5 +- docs/develop/_category_.json | 3 + docs/develop/acquire_ragflow_api_key.md | 3 + docs/develop/build_docker_image.mdx | 3 + docs/develop/launch_ragflow_from_source.md | 13 +- docs/develop/mcp/_category_.json | 3 + docs/develop/mcp/launch_mcp_server.md | 63 +- docs/develop/mcp/mcp_client_example.md | 7 +- docs/develop/mcp/mcp_tools.md | 3 + docs/develop/switch_doc_engine.md | 3 + docs/faq.mdx | 29 +- docs/guides/_category_.json | 3 + docs/guides/admin/_category_.json | 3 + docs/guides/admin/admin_cli.md | 7 +- docs/guides/admin/admin_service.md | 5 +- docs/guides/admin/admin_ui.md | 3 + docs/guides/agent/_category_.json | 3 + .../agent_component_reference/_category_.json | 3 + .../agent/agent_component_reference/agent.mdx | 31 +- .../await_response.mdx | 13 +- .../agent/agent_component_reference/begin.mdx | 11 +- .../agent_component_reference/categorize.mdx | 29 +- .../chunker_title.md | 5 +- .../chunker_token.md | 3 + .../agent/agent_component_reference/code.mdx | 25 +- .../agent_component_reference/execute_sql.md | 5 +- .../agent/agent_component_reference/http.md | 5 +- .../agent_component_reference/indexer.md | 3 + .../agent_component_reference/iteration.mdx | 13 +- .../agent_component_reference/message.mdx | 3 + .../agent/agent_component_reference/parser.md | 7 +- .../agent_component_reference/retrieval.mdx | 11 +- .../agent_component_reference/switch.mdx | 13 +- .../text_processing.mdx | 5 +- .../agent_component_reference/transformer.md | 21 +- 
docs/guides/agent/agent_introduction.md | 7 +- .../agent/best_practices/_category_.json | 3 + docs/guides/agent/embed_agent_into_webpage.md | 3 + docs/guides/agent/sandbox_quickstart.md | 7 +- docs/guides/ai_search.md | 5 +- docs/guides/chat/_category_.json | 3 + .../chat/best_practices/_category_.json | 3 + docs/guides/chat/implement_deep_research.md | 3 + docs/guides/chat/set_chat_variables.md | 9 +- docs/guides/chat/start_chat.md | 17 +- docs/guides/dataset/_category_.json | 3 + .../dataset/add_data_source/_category_.json | 3 + .../add_data_source/add_google_drive.md | 17 +- docs/guides/dataset/auto_metadata.md | 3 + .../dataset/autokeyword_autoquestion.mdx | 19 +- .../dataset/best_practices/_category_.json | 3 + .../configure_child_chunking_strategy.md | 3 + .../dataset/configure_knowledge_base.md | 27 +- .../dataset/construct_knowledge_graph.md | 11 +- docs/guides/dataset/enable_excel2html.md | 3 + docs/guides/dataset/enable_raptor.md | 9 +- .../dataset/extract_table_of_contents.md | 5 +- docs/guides/dataset/manage_metadata.md | 5 +- docs/guides/dataset/run_retrieval_test.md | 9 +- docs/guides/dataset/select_pdf_parser.md | 7 +- docs/guides/dataset/set_context_window.md | 3 + docs/guides/dataset/set_metadata.md | 3 + docs/guides/dataset/set_page_rank.md | 3 + docs/guides/dataset/use_tag_sets.md | 13 +- docs/guides/manage_files.md | 17 +- docs/guides/migration/_category_.json | 3 + docs/guides/models/_category_.json | 3 + docs/guides/models/deploy_local_llm.mdx | 33 +- docs/guides/models/llm_api_key_setup.md | 5 +- docs/guides/team/_category_.json | 3 + docs/guides/team/join_or_leave_team.md | 3 + docs/guides/team/manage_team_members.md | 3 + docs/guides/team/share_agents.md | 5 +- docs/guides/team/share_chat_assistant.md | 3 + docs/guides/team/share_knowledge_bases.md | 3 + docs/guides/team/share_model.md | 3 + docs/guides/tracing.mdx | 29 +- docs/guides/upgrade_ragflow.mdx | 3 + docs/quickstart.mdx | 43 +- docs/references/_category_.json | 3 + 
docs/references/glossary.mdx | 3 + docs/references/http_api_reference.md | 579 +++++++++--------- docs/references/python_api_reference.md | 221 +++---- docs/references/supported_models.mdx | 3 + docs/release_notes.md | 21 +- 88 files changed, 922 insertions(+), 661 deletions(-) diff --git a/docs/basics/rag.md b/docs/basics/rag.md index 4cf2e7997..fc7025a38 100644 --- a/docs/basics/rag.md +++ b/docs/basics/rag.md @@ -3,7 +3,7 @@ sidebar_position: 1 slug: /what-is-rag --- -# What is Retreival-Augmented-Generation (RAG)? +# What is Retrieval-Augmented Generation (RAG)? Since large language models (LLMs) became the focus of technology, their ability to handle general knowledge has been astonishing. However, when questions shift to internal corporate documents, proprietary knowledge bases, or real-time data, the limitations of LLMs become glaringly apparent: they cannot access private information outside their training data. Retrieval-Augmented Generation (RAG) was born precisely to address this core need. Before an LLM generates an answer, it first retrieves the most relevant context from an external knowledge base and inputs it as "reference material" to the LLM, thereby guiding it to produce accurate answers. In short, RAG elevates LLMs from "relying on memory" to "having evidence to rely on," significantly improving their accuracy and trustworthiness in specialized fields and real-time information queries. @@ -86,22 +86,22 @@ They are highly consistent at the technical base (e.g., vector retrieval, keywor RAG has demonstrated clear value in several typical scenarios: -1. Enterprise Knowledge Q&A and Internal Search +1. Enterprise Knowledge Q&A and Internal Search By vectorizing corporate private data and combining it with an LLM, RAG can directly return natural language answers based on authoritative sources, rather than document lists. 
While meeting intelligent Q&A needs, it inherently aligns with corporate requirements for data security, access control, and compliance. -2. Complex Document Understanding and Professional Q&A +2. Complex Document Understanding and Professional Q&A For structurally complex documents like contracts and regulations, the value of RAG lies in its ability to generate accurate, verifiable answers while maintaining context integrity. Its system accuracy largely depends on text chunking and semantic understanding strategies. -3. Dynamic Knowledge Fusion and Decision Support +3. Dynamic Knowledge Fusion and Decision Support In business scenarios requiring the synthesis of information from multiple sources, RAG evolves into a knowledge orchestration and reasoning support system for business decisions. Through a multi-path recall mechanism, it fuses knowledge from different systems and formats, maintaining factual consistency and logical controllability during the generation phase. ## The future of RAG The evolution of RAG is unfolding along several clear paths: -1. RAG as the data foundation for Agents +1. RAG as the data foundation for Agents RAG and agents have an architecture vs. scenario relationship. For agents to achieve autonomous and reliable decision-making and execution, they must rely on accurate and timely knowledge. RAG provides them with a standardized capability to access private domain knowledge and is an inevitable choice for building knowledge-aware agents. -2. Advanced RAG: Using LLMs to optimize retrieval itself +2. Advanced RAG: Using LLMs to optimize retrieval itself The core feature of next-generation RAG is fully utilizing the reasoning capabilities of LLMs to optimize the retrieval process, such as rewriting queries, summarizing or fusing results, or implementing intelligent routing. Empowering every aspect of retrieval with LLMs is key to breaking through current performance bottlenecks. -3. Towards context engineering 2.0 +3. 
Towards context engineering 2.0 Current RAG can be viewed as Context Engineering 1.0, whose core is assembling static knowledge context for single Q&A tasks. The forthcoming Context Engineering 2.0 will extend with RAG technology at its core, becoming a system that automatically and dynamically assembles comprehensive context for agents. The context fused by this system will come not only from documents but also include interaction memory, available tools/skills, and real-time environmental information. This marks the transition of agent development from a "handicraft workshop" model to the industrial starting point of automated context engineering. The essence of RAG is to build a dedicated, efficient, and trustworthy external data interface for large language models; its core is Retrieval, not Generation. Starting from the practical need to solve private data access, its technical depth is reflected in the optimization of retrieval for complex unstructured data. With its deep integration into agent architectures and its development towards automated context engineering, RAG is evolving from a technology that improves Q&A quality into the core infrastructure for building the next generation of trustworthy, controllable, and scalable intelligent applications. diff --git a/docs/configurations.md b/docs/configurations.md index b55042e8f..565354d6c 100644 --- a/docs/configurations.md +++ b/docs/configurations.md @@ -1,6 +1,9 @@ --- sidebar_position: 1 slug: /configurations +sidebar_custom_props: { + sidebarIcon: LucideCog +} --- # Configuration diff --git a/docs/contribution/_category_.json b/docs/contribution/_category_.json index 594fe200b..a9bd348a8 100644 --- a/docs/contribution/_category_.json +++ b/docs/contribution/_category_.json @@ -4,5 +4,8 @@ "link": { "type": "generated-index", "description": "Miscellaneous contribution guides." 
+ }, + "customProps": { + "sidebarIcon": "LucideHandshake" } } diff --git a/docs/contribution/contributing.md b/docs/contribution/contributing.md index 5d1ec19c1..53d5d0839 100644 --- a/docs/contribution/contributing.md +++ b/docs/contribution/contributing.md @@ -1,6 +1,9 @@ --- sidebar_position: 1 slug: /contributing +sidebar_custom_props: { + categoryIcon: LucideBookA +} --- # Contribution guidelines @@ -32,7 +35,7 @@ The list below mentions some contributions you can make, but it is not a complet 1. Fork our GitHub repository. 2. Clone your fork to your local machine: `git clone git@github.com:/ragflow.git` -3. Create a local branch: +3. Create a local branch: `git checkout -b my-branch` 4. Provide sufficient information in your commit message `git commit -m 'Provide sufficient info in your commit message'` diff --git a/docs/develop/_category_.json b/docs/develop/_category_.json index 036bc99a1..c80693175 100644 --- a/docs/develop/_category_.json +++ b/docs/develop/_category_.json @@ -4,5 +4,8 @@ "link": { "type": "generated-index", "description": "Guides for hardcore developers" + }, + "customProps": { + "sidebarIcon": "LucideWrench" } } diff --git a/docs/develop/acquire_ragflow_api_key.md b/docs/develop/acquire_ragflow_api_key.md index 4dc4520fe..fec9f6da3 100644 --- a/docs/develop/acquire_ragflow_api_key.md +++ b/docs/develop/acquire_ragflow_api_key.md @@ -1,6 +1,9 @@ --- sidebar_position: 4 slug: /acquire_ragflow_api_key +sidebar_custom_props: { + categoryIcon: LucideKey +} --- # Acquire RAGFlow API key diff --git a/docs/develop/build_docker_image.mdx b/docs/develop/build_docker_image.mdx index 3d20430f3..3a1ef3506 100644 --- a/docs/develop/build_docker_image.mdx +++ b/docs/develop/build_docker_image.mdx @@ -1,6 +1,9 @@ --- sidebar_position: 1 slug: /build_docker_image +sidebar_custom_props: { + categoryIcon: LucidePackage +} --- # Build RAGFlow Docker image diff --git a/docs/develop/launch_ragflow_from_source.md b/docs/develop/launch_ragflow_from_source.md 
index 0f1542529..11510f717 100644 --- a/docs/develop/launch_ragflow_from_source.md +++ b/docs/develop/launch_ragflow_from_source.md @@ -1,6 +1,9 @@ --- sidebar_position: 2 slug: /launch_ragflow_from_source +sidebar_custom_props: { + categoryIcon: LucideMonitorPlay +} --- # Launch service from source @@ -36,7 +39,7 @@ cd ragflow/ ### Install Python dependencies 1. Install uv: - + ```bash pipx install uv ``` @@ -88,13 +91,13 @@ docker compose -f docker/docker-compose-base.yml up -d ``` 3. **Optional:** If you cannot access HuggingFace, set the HF_ENDPOINT environment variable to use a mirror site: - + ```bash export HF_ENDPOINT=https://hf-mirror.com ``` 4. Check the configuration in **conf/service_conf.yaml**, ensuring all hosts and ports are correctly set. - + 5. Run the **entrypoint.sh** script to launch the backend service: ```shell @@ -123,10 +126,10 @@ docker compose -f docker/docker-compose-base.yml up -d 3. Start up the RAGFlow frontend service: ```bash - npm run dev + npm run dev ``` - *The following message appears, showing the IP address and port number of your frontend service:* + *The following message appears, showing the IP address and port number of your frontend service:* ![](https://github.com/user-attachments/assets/0daf462c-a24d-4496-a66f-92533534e187) diff --git a/docs/develop/mcp/_category_.json b/docs/develop/mcp/_category_.json index d2f129c23..eb7b1444a 100644 --- a/docs/develop/mcp/_category_.json +++ b/docs/develop/mcp/_category_.json @@ -4,5 +4,8 @@ "link": { "type": "generated-index", "description": "Guides and references on accessing RAGFlow's datasets via MCP." 
+ }, + "customProps": { + "categoryIcon": "SiModelcontextprotocol" } } diff --git a/docs/develop/mcp/launch_mcp_server.md b/docs/develop/mcp/launch_mcp_server.md index 2b9f052f0..e3a27e071 100644 --- a/docs/develop/mcp/launch_mcp_server.md +++ b/docs/develop/mcp/launch_mcp_server.md @@ -1,6 +1,9 @@ --- sidebar_position: 1 slug: /launch_mcp_server +sidebar_custom_props: { + categoryIcon: LucideTvMinimalPlay +} --- # Launch RAGFlow MCP server @@ -9,13 +12,13 @@ Launch an MCP server from source or via Docker. --- -A RAGFlow Model Context Protocol (MCP) server is designed as an independent component to complement the RAGFlow server. Note that an MCP server must operate alongside a properly functioning RAGFlow server. +A RAGFlow Model Context Protocol (MCP) server is designed as an independent component to complement the RAGFlow server. Note that an MCP server must operate alongside a properly functioning RAGFlow server. -An MCP server can start up in either self-host mode (default) or host mode: +An MCP server can start up in either self-host mode (default) or host mode: -- **Self-host mode**: +- **Self-host mode**: When launching an MCP server in self-host mode, you must provide an API key to authenticate the MCP server with the RAGFlow server. In this mode, the MCP server can access *only* the datasets of a specified tenant on the RAGFlow server. -- **Host mode**: +- **Host mode**: In host mode, each MCP client can access their own datasets on the RAGFlow server. However, each client request must include a valid API key to authenticate the client with the RAGFlow server. Once a connection is established, an MCP server communicates with its client in MCP HTTP+SSE (Server-Sent Events) mode, unidirectionally pushing responses from the RAGFlow server to its client in real time. 
@@ -29,9 +32,9 @@ Once a connection is established, an MCP server communicates with its client in If you wish to try out our MCP server without upgrading RAGFlow, community contributor [yiminghub2024](https://github.com/yiminghub2024) 👏 shares their recommended steps [here](#launch-an-mcp-server-without-upgrading-ragflow). ::: -## Launch an MCP server +## Launch an MCP server -You can start an MCP server either from source code or via Docker. +You can start an MCP server either from source code or via Docker. ### Launch from source code @@ -48,7 +51,7 @@ uv run mcp/server/server.py --host=127.0.0.1 --port=9382 --base-url=http://127.0 # uv run mcp/server/server.py --host=127.0.0.1 --port=9382 --base-url=http://127.0.0.1:9380 --mode=host ``` -Where: +Where: - `host`: The MCP server's host address. - `port`: The MCP server's listening port. @@ -94,7 +97,7 @@ The MCP server is designed as an optional component that complements the RAGFlow # - --no-json-response # Disables JSON responses for the streamable-HTTP transport ``` -Where: +Where: - `mcp-host`: The MCP server's host address. - `mcp-port`: The MCP server's listening port. @@ -119,13 +122,13 @@ Run `docker compose -f docker-compose.yml up` to launch the RAGFlow server toget docker-ragflow-cpu-1 | Starting MCP Server on 0.0.0.0:9382 with base URL http://127.0.0.1:9380... docker-ragflow-cpu-1 | Starting 1 task executor(s) on host 'dd0b5e07e76f'... 
docker-ragflow-cpu-1 | 2025-04-18 15:41:18,816 INFO 27 ragflow_server log path: /ragflow/logs/ragflow_server.log, log levels: {'peewee': 'WARNING', 'pdfminer': 'WARNING', 'root': 'INFO'} - docker-ragflow-cpu-1 | + docker-ragflow-cpu-1 | docker-ragflow-cpu-1 | __ __ ____ ____ ____ _____ ______ _______ ____ docker-ragflow-cpu-1 | | \/ |/ ___| _ \ / ___|| ____| _ \ \ / / ____| _ \ docker-ragflow-cpu-1 | | |\/| | | | |_) | \___ \| _| | |_) \ \ / /| _| | |_) | docker-ragflow-cpu-1 | | | | | |___| __/ ___) | |___| _ < \ V / | |___| _ < docker-ragflow-cpu-1 | |_| |_|\____|_| |____/|_____|_| \_\ \_/ |_____|_| \_\ - docker-ragflow-cpu-1 | + docker-ragflow-cpu-1 | docker-ragflow-cpu-1 | MCP launch mode: self-host docker-ragflow-cpu-1 | MCP host: 0.0.0.0 docker-ragflow-cpu-1 | MCP port: 9382 @@ -138,13 +141,13 @@ Run `docker compose -f docker-compose.yml up` to launch the RAGFlow server toget docker-ragflow-cpu-1 | 2025-04-18 15:41:23,263 INFO 27 init database on cluster mode successfully docker-ragflow-cpu-1 | 2025-04-18 15:41:25,318 INFO 27 load_model /ragflow/rag/res/deepdoc/det.onnx uses CPU docker-ragflow-cpu-1 | 2025-04-18 15:41:25,367 INFO 27 load_model /ragflow/rag/res/deepdoc/rec.onnx uses CPU - docker-ragflow-cpu-1 | ____ ___ ______ ______ __ + docker-ragflow-cpu-1 | ____ ___ ______ ______ __ docker-ragflow-cpu-1 | / __ \ / | / ____// ____// /____ _ __ docker-ragflow-cpu-1 | / /_/ // /| | / / __ / /_ / // __ \| | /| / / - docker-ragflow-cpu-1 | / _, _// ___ |/ /_/ // __/ / // /_/ /| |/ |/ / - docker-ragflow-cpu-1 | /_/ |_|/_/ |_|\____//_/ /_/ \____/ |__/|__/ - docker-ragflow-cpu-1 | - docker-ragflow-cpu-1 | + docker-ragflow-cpu-1 | / _, _// ___ |/ /_/ // __/ / // /_/ /| |/ |/ / + docker-ragflow-cpu-1 | /_/ |_|/_/ |_|\____//_/ /_/ \____/ |__/|__/ + docker-ragflow-cpu-1 | + docker-ragflow-cpu-1 | docker-ragflow-cpu-1 | 2025-04-18 15:41:29,088 INFO 27 RAGFlow version: v0.18.0-285-gb2c299fa full docker-ragflow-cpu-1 | 2025-04-18 15:41:29,088 INFO 27 project base: 
/ragflow docker-ragflow-cpu-1 | 2025-04-18 15:41:29,088 INFO 27 Current configs, from /ragflow/conf/service_conf.yaml: @@ -153,12 +156,12 @@ Run `docker compose -f docker-compose.yml up` to launch the RAGFlow server toget docker-ragflow-cpu-1 | * Running on all addresses (0.0.0.0) docker-ragflow-cpu-1 | * Running on http://127.0.0.1:9380 docker-ragflow-cpu-1 | * Running on http://172.19.0.6:9380 - docker-ragflow-cpu-1 | ______ __ ______ __ + docker-ragflow-cpu-1 | ______ __ ______ __ docker-ragflow-cpu-1 | /_ __/___ ______/ /__ / ____/ _____ _______ __/ /_____ _____ docker-ragflow-cpu-1 | / / / __ `/ ___/ //_/ / __/ | |/_/ _ \/ ___/ / / / __/ __ \/ ___/ - docker-ragflow-cpu-1 | / / / /_/ (__ ) ,< / /____> 9200/tcp, :::9200->9200/tcp ragflow-es-01 @@ -368,7 +371,7 @@ Yes, we do. See the Python files under the **rag/app** folder. $ docker ps ``` - *The status of a healthy Elasticsearch component should look as follows:* + *The status of a healthy Elasticsearch component should look as follows:* ```bash cd29bcb254bc quay.io/minio/minio:RELEASE.2023-12-20T01-00-02Z "/usr/bin/docker-ent…" 2 weeks ago Up 11 hours 0.0.0.0:9001->9001/tcp, :::9001->9001/tcp, 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp ragflow-minio @@ -451,7 +454,7 @@ See [Upgrade RAGFlow](./guides/upgrade_ragflow.mdx) for more information. To switch your document engine from Elasticsearch to [Infinity](https://github.com/infiniflow/infinity): -1. Stop all running containers: +1. Stop all running containers: ```bash $ docker compose -f docker/docker-compose.yml down -v @@ -461,7 +464,7 @@ To switch your document engine from Elasticsearch to [Infinity](https://github.c ::: 2. In **docker/.env**, set `DOC_ENGINE=${DOC_ENGINE:-infinity}` -3. Restart your Docker image: +3. 
Restart your Docker image: ```bash $ docker compose -f docker-compose.yml up -d @@ -506,12 +509,12 @@ From v0.22.0 onwards, RAGFlow includes MinerU (≥ 2.6.3) as an optional PDF pa - `"vlm-mlx-engine"` - `"vlm-vllm-async-engine"` - `"vlm-lmdeploy-engine"`. - - `MINERU_SERVER_URL`: (optional) The downstream vLLM HTTP server (e.g., `http://vllm-host:30000`). Applicable when `MINERU_BACKEND` is set to `"vlm-http-client"`. + - `MINERU_SERVER_URL`: (optional) The downstream vLLM HTTP server (e.g., `http://vllm-host:30000`). Applicable when `MINERU_BACKEND` is set to `"vlm-http-client"`. - `MINERU_OUTPUT_DIR`: (optional) The local directory for holding the outputs of the MinerU API service (zip/JSON) before ingestion. - `MINERU_DELETE_OUTPUT`: Whether to delete temporary output when a temporary directory is used: - `1`: Delete. - `0`: Retain. -3. In the web UI, navigate to your dataset's **Configuration** page and find the **Ingestion pipeline** section: +3. In the web UI, navigate to your dataset's **Configuration** page and find the **Ingestion pipeline** section: - If you decide to use a chunking method from the **Built-in** dropdown, ensure it supports PDF parsing, then select **MinerU** from the **PDF parser** dropdown. - If you use a custom ingestion pipeline instead, select **MinerU** in the **PDF parser** section of the **Parser** component. diff --git a/docs/guides/_category_.json b/docs/guides/_category_.json index 895506b00..18f4890a9 100644 --- a/docs/guides/_category_.json +++ b/docs/guides/_category_.json @@ -4,5 +4,8 @@ "link": { "type": "generated-index", "description": "Guides for RAGFlow users and developers." 
+ }, + "customProps": { + "sidebarIcon": "LucideBookMarked" } } diff --git a/docs/guides/admin/_category_.json b/docs/guides/admin/_category_.json index 590d62083..fa6d832fc 100644 --- a/docs/guides/admin/_category_.json +++ b/docs/guides/admin/_category_.json @@ -4,5 +4,8 @@ "link": { "type": "generated-index", "description": "RAGFlow administration" + }, + "customProps": { + "categoryIcon": "LucideUserCog" } } diff --git a/docs/guides/admin/admin_cli.md b/docs/guides/admin/admin_cli.md index 5a6cc3b0b..d03afc6f2 100644 --- a/docs/guides/admin/admin_cli.md +++ b/docs/guides/admin/admin_cli.md @@ -1,6 +1,9 @@ --- sidebar_position: 2 slug: /admin_cli +sidebar_custom_props: { + categoryIcon: LucideSquareTerminal +} --- # Admin CLI @@ -27,9 +30,9 @@ The RAGFlow Admin CLI is a command-line-based system administration tool that of The default password is admin. **Parameters:** - + - -h: RAGFlow admin server host address - + - -p: RAGFlow admin server port ## Default administrative account diff --git a/docs/guides/admin/admin_service.md b/docs/guides/admin/admin_service.md index 7e5f13025..52162a5b1 100644 --- a/docs/guides/admin/admin_service.md +++ b/docs/guides/admin/admin_service.md @@ -1,6 +1,9 @@ --- sidebar_position: 0 slug: /admin_service +sidebar_custom_props: { + categoryIcon: LucideActivity +} --- @@ -24,7 +27,7 @@ With its unified interface design, the Admin Service combines the convenience of python admin/server/admin_server.py ``` - The service will start and listen for incoming connections from the CLI on the configured port. + The service will start and listen for incoming connections from the CLI on the configured port. 
### Using docker image diff --git a/docs/guides/admin/admin_ui.md b/docs/guides/admin/admin_ui.md index 148257ae5..67786421e 100644 --- a/docs/guides/admin/admin_ui.md +++ b/docs/guides/admin/admin_ui.md @@ -1,6 +1,9 @@ --- sidebar_position: 1 slug: /admin_ui +sidebar_custom_props: { + categoryIcon: LucidePalette +} --- # Admin UI diff --git a/docs/guides/agent/_category_.json b/docs/guides/agent/_category_.json index 020ba1d3f..dc81d28a4 100644 --- a/docs/guides/agent/_category_.json +++ b/docs/guides/agent/_category_.json @@ -4,5 +4,8 @@ "link": { "type": "generated-index", "description": "RAGFlow v0.8.0 introduces an agent mechanism, featuring a no-code workflow editor on the front end and a comprehensive graph-based task orchestration framework on the backend." + }, + "customProps": { + "categoryIcon": "RagAiAgent" } } diff --git a/docs/guides/agent/agent_component_reference/_category_.json b/docs/guides/agent/agent_component_reference/_category_.json index 7548ec803..c40dadb14 100644 --- a/docs/guides/agent/agent_component_reference/_category_.json +++ b/docs/guides/agent/agent_component_reference/_category_.json @@ -4,5 +4,8 @@ "link": { "type": "generated-index", "description": "A complete reference for RAGFlow's agent components." + }, + "customProps": { + "categoryIcon": "RagAiAgent" } } diff --git a/docs/guides/agent/agent_component_reference/agent.mdx b/docs/guides/agent/agent_component_reference/agent.mdx index 882c22be1..29b0e0d69 100644 --- a/docs/guides/agent/agent_component_reference/agent.mdx +++ b/docs/guides/agent/agent_component_reference/agent.mdx @@ -1,6 +1,9 @@ --- sidebar_position: 2 slug: /agent_component +sidebar_custom_props: { + categoryIcon: RagAiAgent +} --- # Agent component @@ -16,7 +19,7 @@ An **Agent** component fine-tunes the LLM and sets its prompt. From v0.20.5 onwa ## Scenarios -An **Agent** component is essential when you need the LLM to assist with summarizing, translating, or controlling various tasks. 
+An **Agent** component is essential when you need the LLM to assist with summarizing, translating, or controlling various tasks. ## Prerequisites @@ -28,13 +31,13 @@ An **Agent** component is essential when you need the LLM to assist with summari ## Quickstart -### 1. Click on an **Agent** component to show its configuration panel +### 1. Click on an **Agent** component to show its configuration panel The corresponding configuration panel appears to the right of the canvas. Use this panel to define and fine-tune the **Agent** component's behavior. ### 2. Select your model -Click **Model**, and select a chat model from the dropdown menu. +Click **Model**, and select a chat model from the dropdown menu. :::tip NOTE If no model appears, check if you have added a chat model on the **Model providers** page. ::: @@ -55,7 +58,7 @@ In this quickstart, we assume your **Agent** component is used standalone (witho ### 5. Skip Tools and Agent -The **+ Add tools** and **+ Add agent** sections are used *only* when you need to configure your **Agent** component as a planner (with tools or sub-Agents beneath). In this quickstart, we assume your **Agent** component is used standalone (without tools or sub-Agents beneath). +The **+ Add tools** and **+ Add agent** sections are used *only* when you need to configure your **Agent** component as a planner (with tools or sub-Agents beneath). In this quickstart, we assume your **Agent** component is used standalone (without tools or sub-Agents beneath). ### 6. Choose the next component @@ -71,7 +74,7 @@ In this section, we assume your **Agent** will be configured as a planner, with ![](https://raw.githubusercontent.com/infiniflow/ragflow-docs/main/images/mcp_page.jpg) -### 2. Configure your Tavily MCP server +### 2. Configure your Tavily MCP server Update your MCP server's name, URL (including the API key), server type, and other necessary settings. When configured correctly, the available tools will be displayed. 
@@ -110,7 +113,7 @@ On the canvas, click the newly-populated Tavily server to view and select its av Click the dropdown menu of **Model** to show the model configuration window. -- **Model**: The chat model to use. +- **Model**: The chat model to use. - Ensure you set the chat model correctly on the **Model providers** page. - You can use different models for different components to increase flexibility or improve overall performance. - **Creativity**: A shortcut to **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty** settings, indicating the freedom level of the model. From **Improvise**, **Precise**, to **Balance**, each preset configuration corresponds to a unique combination of **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty**. @@ -118,21 +121,21 @@ Click the dropdown menu of **Model** to show the model configuration window. - **Improvise**: Produces more creative responses. - **Precise**: (Default) Produces more conservative responses. - **Balance**: A middle ground between **Improvise** and **Precise**. -- **Temperature**: The randomness level of the model's output. +- **Temperature**: The randomness level of the model's output. Defaults to 0.1. - Lower values lead to more deterministic and predictable outputs. - Higher values lead to more creative and varied outputs. - A temperature of zero results in the same output for the same prompt. -- **Top P**: Nucleus sampling. +- **Top P**: Nucleus sampling. - Reduces the likelihood of generating repetitive or unnatural text by setting a threshold *P* and restricting the sampling to tokens with a cumulative probability exceeding *P*. - Defaults to 0.3. -- **Presence penalty**: Encourages the model to include a more diverse range of tokens in the response. +- **Presence penalty**: Encourages the model to include a more diverse range of tokens in the response. 
- A higher **presence penalty** value results in the model being more likely to generate tokens not yet been included in the generated text. - Defaults to 0.4. -- **Frequency penalty**: Discourages the model from repeating the same words or phrases too frequently in the generated text. +- **Frequency penalty**: Discourages the model from repeating the same words or phrases too frequently in the generated text. - A higher **frequency penalty** value results in the model being more conservative in its use of repeated tokens. - Defaults to 0.7. -- **Max tokens**: +- **Max tokens**: This sets the maximum length of the model's output, measured in the number of tokens (words or pieces of words). It is disabled by default, allowing the model to determine the number of tokens in its responses. :::tip NOTE @@ -142,7 +145,7 @@ Click the dropdown menu of **Model** to show the model configuration window. ### System prompt -Typically, you use the system prompt to describe the task for the LLM, specify how it should respond, and outline other miscellaneous requirements. We do not plan to elaborate on this topic, as it can be as extensive as prompt engineering. However, please be aware that the system prompt is often used in conjunction with keys (variables), which serve as various data inputs for the LLM. +Typically, you use the system prompt to describe the task for the LLM, specify how it should respond, and outline other miscellaneous requirements. We do not plan to elaborate on this topic, as it can be as extensive as prompt engineering. However, please be aware that the system prompt is often used in conjunction with keys (variables), which serve as various data inputs for the LLM. An **Agent** component relies on keys (variables) to specify its data inputs. Its immediate upstream component is *not* necessarily its data input, and the arrows in the workflow indicate *only* the processing sequence. 
Keys in an **Agent** component are used in conjunction with the system prompt to specify data inputs for the LLM. Use a forward slash `/` or the **(x)** button to show the keys to use. @@ -190,11 +193,11 @@ From v0.20.5 onwards, four framework-level prompt blocks are available in the ** The user-defined prompt. Defaults to `sys.query`, the user query. As a general rule, when using the **Agent** component as a standalone module (not as a planner), you usually need to specify the corresponding **Retrieval** component’s output variable (`formalized_content`) here as part of the input to the LLM. -### Tools +### Tools You can use an **Agent** component as a collaborator that reasons and reflects with the aid of other tools; for instance, **Retrieval** can serve as one such tool for an **Agent**. -### Agent +### Agent You use an **Agent** component as a collaborator that reasons and reflects with the aid of subagents or other tools, forming a multi-agent system. diff --git a/docs/guides/agent/agent_component_reference/await_response.mdx b/docs/guides/agent/agent_component_reference/await_response.mdx index 973e1dfa5..4f30c38d0 100644 --- a/docs/guides/agent/agent_component_reference/await_response.mdx +++ b/docs/guides/agent/agent_component_reference/await_response.mdx @@ -1,6 +1,9 @@ --- sidebar_position: 5 slug: /await_response +sidebar_custom_props: { + categoryIcon: LucideMessageSquareDot +} --- # Await response component @@ -23,7 +26,7 @@ Whether to show the message defined in the **Message** field. ### Message -The static message to send out. +The static message to send out. Click **+ Add message** to add message options. When multiple messages are supplied, the **Message** component randomly selects one to send. @@ -31,9 +34,9 @@ Click **+ Add message** to add message options. When multiple messages are suppl You can define global variables within the **Await response** component, which can be either mandatory or optional. 
Once set, users will need to provide values for these variables when engaging with the agent. Click **+** to add a global variable, each with the following attributes: -- **Name**: _Required_ - A descriptive name providing additional details about the variable. -- **Type**: _Required_ +- **Name**: _Required_ + A descriptive name providing additional details about the variable. +- **Type**: _Required_ The type of the variable: - **Single-line text**: Accepts a single line of text without line breaks. - **Paragraph text**: Accepts multiple lines of text, including line breaks. @@ -41,7 +44,7 @@ You can define global variables within the **Await response** component, which c - **file upload**: Requires the user to upload one or multiple files. - **Number**: Accepts a number as input. - **Boolean**: Requires the user to toggle between on and off. -- **Key**: _Required_ +- **Key**: _Required_ The unique variable name. - **Optional**: A toggle indicating whether the variable is optional. diff --git a/docs/guides/agent/agent_component_reference/begin.mdx b/docs/guides/agent/agent_component_reference/begin.mdx index c265bd2c6..921ed898b 100644 --- a/docs/guides/agent/agent_component_reference/begin.mdx +++ b/docs/guides/agent/agent_component_reference/begin.mdx @@ -1,6 +1,9 @@ --- sidebar_position: 1 slug: /begin_component +sidebar_custom_props: { + categoryIcon: LucideHome +} --- # Begin component @@ -36,9 +39,9 @@ An agent in conversational mode begins with an opening greeting. It is the agent You can define global variables within the **Begin** component, which can be either mandatory or optional. Once set, users will need to provide values for these variables when engaging with the agent. Click **+ Add variable** to add a global variable, each with the following attributes: -- **Name**: _Required_ - A descriptive name providing additional details about the variable. 
-- **Type**: _Required_ +- **Name**: _Required_ + A descriptive name providing additional details about the variable. +- **Type**: _Required_ The type of the variable: - **Single-line text**: Accepts a single line of text without line breaks. - **Paragraph text**: Accepts multiple lines of text, including line breaks. @@ -46,7 +49,7 @@ You can define global variables within the **Begin** component, which can be eit - **file upload**: Requires the user to upload one or multiple files. - **Number**: Accepts a number as input. - **Boolean**: Requires the user to toggle between on and off. -- **Key**: _Required_ +- **Key**: _Required_ The unique variable name. - **Optional**: A toggle indicating whether the variable is optional. diff --git a/docs/guides/agent/agent_component_reference/categorize.mdx b/docs/guides/agent/agent_component_reference/categorize.mdx index a40cc3731..9c710318e 100644 --- a/docs/guides/agent/agent_component_reference/categorize.mdx +++ b/docs/guides/agent/agent_component_reference/categorize.mdx @@ -1,11 +1,14 @@ --- sidebar_position: 8 slug: /categorize_component +sidebar_custom_props: { + categoryIcon: LucideSwatchBook +} --- # Categorize component -A component that classifies user inputs and applies strategies accordingly. +A component that classifies user inputs and applies strategies accordingly. --- @@ -23,7 +26,7 @@ A **Categorize** component is essential when you need the LLM to help you identi Select the source for categorization. -The **Categorize** component relies on query variables to specify its data inputs (queries). All global variables defined before the **Categorize** component are available in the dropdown list. +The **Categorize** component relies on query variables to specify its data inputs (queries). All global variables defined before the **Categorize** component are available in the dropdown list. 
### Input @@ -31,7 +34,7 @@ The **Categorize** component relies on query variables to specify its data input The **Categorize** component relies on input variables to specify its data inputs (queries). Click **+ Add variable** in the **Input** section to add the desired input variables. There are two types of input variables: **Reference** and **Text**. - **Reference**: Uses a component's output or a user input as the data source. You are required to select from the dropdown menu: - - A component ID under **Component Output**, or + - A component ID under **Component Output**, or - A global variable under **Begin input**, which is defined in the **Begin** component. - **Text**: Uses fixed text as the query. You are required to enter static text. @@ -39,29 +42,29 @@ The **Categorize** component relies on input variables to specify its data input Click the dropdown menu of **Model** to show the model configuration window. -- **Model**: The chat model to use. +- **Model**: The chat model to use. - Ensure you set the chat model correctly on the **Model providers** page. - You can use different models for different components to increase flexibility or improve overall performance. - **Creativity**: A shortcut to **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty** settings, indicating the freedom level of the model. From **Improvise**, **Precise**, to **Balance**, each preset configuration corresponds to a unique combination of **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty**. - This parameter has three options: + This parameter has three options: - **Improvise**: Produces more creative responses. - **Precise**: (Default) Produces more conservative responses. - **Balance**: A middle ground between **Improvise** and **Precise**. -- **Temperature**: The randomness level of the model's output. - Defaults to 0.1. +- **Temperature**: The randomness level of the model's output. + Defaults to 0.1. 
- Lower values lead to more deterministic and predictable outputs. - Higher values lead to more creative and varied outputs. - A temperature of zero results in the same output for the same prompt. -- **Top P**: Nucleus sampling. +- **Top P**: Nucleus sampling. - Reduces the likelihood of generating repetitive or unnatural text by setting a threshold *P* and restricting the sampling to tokens with a cumulative probability exceeding *P*. - Defaults to 0.3. -- **Presence penalty**: Encourages the model to include a more diverse range of tokens in the response. +- **Presence penalty**: Encourages the model to include a more diverse range of tokens in the response. - A higher **presence penalty** value results in the model being more likely to generate tokens not yet been included in the generated text. - Defaults to 0.4. -- **Frequency penalty**: Discourages the model from repeating the same words or phrases too frequently in the generated text. +- **Frequency penalty**: Discourages the model from repeating the same words or phrases too frequently in the generated text. - A higher **frequency penalty** value results in the model being more conservative in its use of repeated tokens. - Defaults to 0.7. -- **Max tokens**: +- **Max tokens**: This sets the maximum length of the model's output, measured in the number of tokens (words or pieces of words). It is disabled by default, allowing the model to determine the number of tokens in its responses. :::tip NOTE @@ -81,7 +84,7 @@ This feature is used for multi-turn dialogue *only*. If your **Categorize** comp ### Category name -A **Categorize** component must have at least two categories. This field sets the name of the category. Click **+ Add Item** to include the intended categories. +A **Categorize** component must have at least two categories. This field sets the name of the category. Click **+ Add Item** to include the intended categories. :::tip NOTE You will notice that the category name is auto-populated. 
No worries. Each category is assigned a random name upon creation. Feel free to change it to a name that is understandable to the LLM. @@ -89,7 +92,7 @@ You will notice that the category name is auto-populated. No worries. Each categ #### Description -Description of this category. +Description of this category. You can input criteria, situation, or information that may help the LLM determine which inputs belong in this category. diff --git a/docs/guides/agent/agent_component_reference/chunker_title.md b/docs/guides/agent/agent_component_reference/chunker_title.md index 27b8a97ce..f75d8796e 100644 --- a/docs/guides/agent/agent_component_reference/chunker_title.md +++ b/docs/guides/agent/agent_component_reference/chunker_title.md @@ -1,6 +1,9 @@ --- sidebar_position: 31 slug: /chunker_title_component +sidebar_custom_props: { + categoryIcon: LucideBlocks +} --- # Title chunker component @@ -23,7 +26,7 @@ Placing a **Title chunker** after a **Token chunker** is invalid and will cause ### Hierarchy -Specifies the heading level to define chunk boundaries: +Specifies the heading level to define chunk boundaries: - H1 - H2 diff --git a/docs/guides/agent/agent_component_reference/chunker_token.md b/docs/guides/agent/agent_component_reference/chunker_token.md index d93f0ea42..8f9623015 100644 --- a/docs/guides/agent/agent_component_reference/chunker_token.md +++ b/docs/guides/agent/agent_component_reference/chunker_token.md @@ -1,6 +1,9 @@ --- sidebar_position: 32 slug: /chunker_token_component +sidebar_custom_props: { + categoryIcon: LucideBlocks +} --- # Token chunker component diff --git a/docs/guides/agent/agent_component_reference/code.mdx b/docs/guides/agent/agent_component_reference/code.mdx index ea4831581..a9b9c82b8 100644 --- a/docs/guides/agent/agent_component_reference/code.mdx +++ b/docs/guides/agent/agent_component_reference/code.mdx @@ -1,6 +1,9 @@ --- sidebar_position: 13 slug: /code_component +sidebar_custom_props: { + categoryIcon: LucideCodeXml +} --- # 
Code component @@ -33,7 +36,7 @@ If your RAGFlow Sandbox is not working, please be sure to consult the [Troublesh ### 3. (Optional) Install necessary dependencies -If you need to import your own Python or JavaScript packages into Sandbox, please follow the commands provided in the [How to import my own Python or JavaScript packages into Sandbox?](#how-to-import-my-own-python-or-javascript-packages-into-sandbox) section to install the additional dependencies. +If you need to import your own Python or JavaScript packages into Sandbox, please follow the commands provided in the [How to import my own Python or JavaScript packages into Sandbox?](#how-to-import-my-own-python-or-javascript-packages-into-sandbox) section to install the additional dependencies. ### 4. Enable Sandbox-specific settings in RAGFlow @@ -43,11 +46,11 @@ Ensure all Sandbox-specific settings are enabled in **ragflow/docker/.env**. Any changes to the configuration or environment *require* a full service restart to take effect. -## Configurations +## Configurations ### Input -You can specify multiple input sources for the **Code** component. Click **+ Add variable** in the **Input variables** section to include the desired input variables. +You can specify multiple input sources for the **Code** component. Click **+ Add variable** in the **Input variables** section to include the desired input variables. ### Code @@ -59,7 +62,7 @@ If your code implementation includes defined variables, whether input or output #### A Python code example -```Python +```Python def main(arg1: str, arg2: str) -> dict: return { "result": arg1 + arg2, @@ -102,7 +105,7 @@ The defined output variable(s) will be auto-populated here. ### `HTTPConnectionPool(host='sandbox-executor-manager', port=9385): Read timed out.` -**Root cause** +**Root cause** - You did not properly install gVisor and `runsc` was not recognized as a valid Docker runtime. 
- You did not pull the required base images for the runners and no runner was started. @@ -144,11 +147,11 @@ docker build -t sandbox-executor-manager:latest ./sandbox/executor_manager ### `HTTPConnectionPool(host='none', port=9385): Max retries exceeded.` -**Root cause** +**Root cause** `sandbox-executor-manager` is not mapped in `/etc/hosts`. -**Solution** +**Solution** Add a new entry to `/etc/hosts`: @@ -156,11 +159,11 @@ Add a new entry to `/etc/hosts`: ### `Container pool is busy` -**Root cause** +**Root cause** -All runners are currently in use, executing tasks. +All runners are currently in use, executing tasks. -**Solution** +**Solution** Please try again shortly or increase the pool size in the configuration to improve availability and reduce waiting times. @@ -205,7 +208,7 @@ To import your JavaScript packages, navigate to `sandbox_base_image/nodejs` and (ragflow) ➜ ragflow/sandbox main ✓ cd sandbox_base_image/nodejs -(ragflow) ➜ ragflow/sandbox/sandbox_base_image/nodejs main ✓ npm install lodash +(ragflow) ➜ ragflow/sandbox/sandbox_base_image/nodejs main ✓ npm install lodash (ragflow) ➜ ragflow/sandbox/sandbox_base_image/nodejs main ✓ cd ../.. # go back to sandbox root directory diff --git a/docs/guides/agent/agent_component_reference/execute_sql.md b/docs/guides/agent/agent_component_reference/execute_sql.md index 47561eccb..23786df6d 100644 --- a/docs/guides/agent/agent_component_reference/execute_sql.md +++ b/docs/guides/agent/agent_component_reference/execute_sql.md @@ -1,6 +1,9 @@ --- sidebar_position: 25 slug: /execute_sql +sidebar_custom_props: { + categoryIcon: RagSql +} --- # Execute SQL tool @@ -9,7 +12,7 @@ A tool that execute SQL queries on a specified relational database. --- -The **Execute SQL** tool enables you to connect to a relational database and run SQL queries, whether entered directly or generated by the system’s Text2SQL capability via an **Agent** component. 
+The **Execute SQL** tool enables you to connect to a relational database and run SQL queries, whether entered directly or generated by the system’s Text2SQL capability via an **Agent** component. ## Prerequisites diff --git a/docs/guides/agent/agent_component_reference/http.md b/docs/guides/agent/agent_component_reference/http.md index 51277f018..6de2f0e45 100644 --- a/docs/guides/agent/agent_component_reference/http.md +++ b/docs/guides/agent/agent_component_reference/http.md @@ -1,11 +1,14 @@ --- sidebar_position: 30 slug: /http_request_component +sidebar_custom_props: { + categoryIcon: RagHTTP +} --- # HTTP request component -A component that calls remote services. +A component that calls remote services. --- diff --git a/docs/guides/agent/agent_component_reference/indexer.md b/docs/guides/agent/agent_component_reference/indexer.md index 5bc2d925e..236ab6e68 100644 --- a/docs/guides/agent/agent_component_reference/indexer.md +++ b/docs/guides/agent/agent_component_reference/indexer.md @@ -1,6 +1,9 @@ --- sidebar_position: 40 slug: /indexer_component +sidebar_custom_props: { + categoryIcon: LucideListPlus +} --- # Indexer component diff --git a/docs/guides/agent/agent_component_reference/iteration.mdx b/docs/guides/agent/agent_component_reference/iteration.mdx index 9d4907d87..3ec4998e7 100644 --- a/docs/guides/agent/agent_component_reference/iteration.mdx +++ b/docs/guides/agent/agent_component_reference/iteration.mdx @@ -1,6 +1,9 @@ --- sidebar_position: 7 slug: /iteration_component +sidebar_custom_props: { + categoryIcon: LucideRepeat2 +} --- # Iteration component @@ -9,12 +12,12 @@ A component that splits text input into text segments and iterates a predefined --- -An **Interaction** component can divide text input into text segments and apply its built-in component workflow to each segment. +An **Iteration** component can divide text input into text segments and apply its built-in component workflow to each segment. 
## Scenario -An **Iteration** component is essential when a workflow loop is required and the loop count is *not* fixed but depends on number of segments created from the output of specific agent components. +An **Iteration** component is essential when a workflow loop is required and the loop count is *not* fixed but depends on the number of segments created from the output of specific agent components. - If, for instance, you plan to feed several paragraphs into an LLM for content generation, each with its own focus, and feeding them to the LLM all at once could create confusion or contradictions, then you can use an **Iteration** component, which encapsulates a **Generate** component, to repeat the content generation process for each paragraph. - Another example: If you wish to use the LLM to translate a lengthy paper into a target language without exceeding its token limit, consider using an **Iteration** component, which encapsulates a **Generate** component, to break the paper into smaller pieces and repeat the translation process for each one. @@ -29,12 +32,12 @@ Each **Iteration** component includes an internal **IterationItem** component. T The **IterationItem** component is visible *only* to the components encapsulated by the current **Iteration** components. ::: -### Build an internal workflow +### Build an internal workflow You are allowed to pull other components into the **Iteration** component to build an internal workflow, and these "added internal components" are no longer visible to components outside of the current **Iteration** component. :::danger IMPORTANT -To reference the created text segments from an added internal component, simply add a **Reference** variable that equals **IterationItem** within the **Input** section of that internal component. There is no need to reference the corresponding external component, as the **IterationItem** component manages the loop of the workflow for all created text segments. 
+To reference the created text segments from an added internal component, simply add a **Reference** variable that equals **IterationItem** within the **Input** section of that internal component. There is no need to reference the corresponding external component, as the **IterationItem** component manages the loop of the workflow for all created text segments. ::: :::tip NOTE @@ -48,7 +51,7 @@ An added internal component can reference an external component when necessary. The **Iteration** component uses input variables to specify its data inputs, namely the texts to be segmented. You are allowed to specify multiple input sources for the **Iteration** component. Click **+ Add variable** in the **Input** section to include the desired input variables. There are two types of input variables: **Reference** and **Text**. - **Reference**: Uses a component's output or a user input as the data source. You are required to select from the dropdown menu: - - A component ID under **Component Output**, or + - A component ID under **Component Output**, or - A global variable under **Begin input**, which is defined in the **Begin** component. - **Text**: Uses fixed text as the query. You are required to enter static text. 
diff --git a/docs/guides/agent/agent_component_reference/message.mdx b/docs/guides/agent/agent_component_reference/message.mdx index 9e12ba547..a049e3a89 100644 --- a/docs/guides/agent/agent_component_reference/message.mdx +++ b/docs/guides/agent/agent_component_reference/message.mdx @@ -1,6 +1,9 @@ --- sidebar_position: 4 slug: /message_component +sidebar_custom_props: { + categoryIcon: LucideMessageSquareReply +} --- # Message component diff --git a/docs/guides/agent/agent_component_reference/parser.md b/docs/guides/agent/agent_component_reference/parser.md index 0eb0f6bff..8dcb702cf 100644 --- a/docs/guides/agent/agent_component_reference/parser.md +++ b/docs/guides/agent/agent_component_reference/parser.md @@ -1,6 +1,9 @@ --- sidebar_position: 30 slug: /parser_component +sidebar_custom_props: { + categoryIcon: LucideFilePlay +} --- # Parser component @@ -54,12 +57,12 @@ Starting from v0.22.0, RAGFlow includes MinerU (≥ 2.6.3) as an optional PDF p - `"vlm-mlx-engine"` - `"vlm-vllm-async-engine"` - `"vlm-lmdeploy-engine"`. - - `MINERU_SERVER_URL`: (optional) The downstream vLLM HTTP server (e.g., `http://vllm-host:30000`). Applicable when `MINERU_BACKEND` is set to `"vlm-http-client"`. + - `MINERU_SERVER_URL`: (optional) The downstream vLLM HTTP server (e.g., `http://vllm-host:30000`). Applicable when `MINERU_BACKEND` is set to `"vlm-http-client"`. - `MINERU_OUTPUT_DIR`: (optional) The local directory for holding the outputs of the MinerU API service (zip/JSON) before ingestion. - `MINERU_DELETE_OUTPUT`: Whether to delete temporary output when a temporary directory is used: - `1`: Delete. - `0`: Retain. -3. In the web UI, navigate to your dataset's **Configuration** page and find the **Ingestion pipeline** section: +3. 
In the web UI, navigate to your dataset's **Configuration** page and find the **Ingestion pipeline** section: - If you decide to use a chunking method from the **Built-in** dropdown, ensure it supports PDF parsing, then select **MinerU** from the **PDF parser** dropdown. - If you use a custom ingestion pipeline instead, select **MinerU** in the **PDF parser** section of the **Parser** component. diff --git a/docs/guides/agent/agent_component_reference/retrieval.mdx b/docs/guides/agent/agent_component_reference/retrieval.mdx index 1f88669cf..3adc2ab93 100644 --- a/docs/guides/agent/agent_component_reference/retrieval.mdx +++ b/docs/guides/agent/agent_component_reference/retrieval.mdx @@ -1,6 +1,9 @@ --- sidebar_position: 3 slug: /retrieval_component +sidebar_custom_props: { + categoryIcon: LucideFolderSearch +} --- # Retrieval component @@ -21,13 +24,13 @@ Ensure you [have properly configured your target dataset(s)](../../dataset/confi ## Quickstart -### 1. Click on a **Retrieval** component to show its configuration panel +### 1. Click on a **Retrieval** component to show its configuration panel The corresponding configuration panel appears to the right of the canvas. Use this panel to define and fine-tune the **Retrieval** component's search behavior. ### 2. Input query variable(s) -The **Retrieval** component depends on query variables to specify its queries. +The **Retrieval** component depends on query variables to specify its queries. :::caution IMPORTANT - If you use the **Retrieval** component as a standalone workflow module, input query variables in the **Input Variables** text box. @@ -74,7 +77,7 @@ Select the query source for retrieval. Defaults to `sys.query`, which is the def The **Retrieval** component relies on query variables to specify its queries. All global variables defined before the **Retrieval** component can also be used as queries. Use the `(x)` button or type `/` to show all the available query variables. 
-### Knowledge bases +### Knowledge bases Select the dataset(s) to retrieve data from. @@ -110,7 +113,7 @@ Using a rerank model will *significantly* increase the system's response time. ### Empty response -- Set this as a response if no results are retrieved from the dataset(s) for your query, or +- Set this as a response if no results are retrieved from the dataset(s) for your query, or - Leave this field blank to allow the chat model to improvise when nothing is found. :::caution WARNING diff --git a/docs/guides/agent/agent_component_reference/switch.mdx b/docs/guides/agent/agent_component_reference/switch.mdx index 1840e666a..fe9092330 100644 --- a/docs/guides/agent/agent_component_reference/switch.mdx +++ b/docs/guides/agent/agent_component_reference/switch.mdx @@ -1,6 +1,9 @@ --- sidebar_position: 6 slug: /switch_component +sidebar_custom_props: { + categoryIcon: LucideSplit +} --- # Switch component -A component that evaluates whether specified conditions are met and directs the follow of execution accordingly. +A component that evaluates whether specified conditions are met and directs the flow of execution accordingly. --- @@ -13,7 +16,7 @@ A **Switch** component evaluates conditions based on the output of specific comp ## Scenarios -A **Switch** component is essential for condition-based direction of execution flow. 
While it shares similarities with the [Categorize](./categorize.mdx) component, which is also used in multi-pronged strategies, the key distinction lies in their approach: the evaluation of the **Switch** component is rule-based, whereas the **Categorize** component involves AI and uses an LLM for decision-making. ## Configurations @@ -39,12 +42,12 @@ When you have added multiple conditions for a specific case, a **Logical operato - Greater equal - Less than - Less equal - - Contains - - Not contains + - Contains + - Not contains - Starts with - Ends with - Is empty - Not empty -- **Value**: A single value, which can be an integer, float, or string. +- **Value**: A single value, which can be an integer, float, or string. - Delimiters, multiple values, or expressions are *not* supported. diff --git a/docs/guides/agent/agent_component_reference/text_processing.mdx b/docs/guides/agent/agent_component_reference/text_processing.mdx index 626ae67bf..bfc0d9dd4 100644 --- a/docs/guides/agent/agent_component_reference/text_processing.mdx +++ b/docs/guides/agent/agent_component_reference/text_processing.mdx @@ -1,6 +1,9 @@ --- sidebar_position: 15 slug: /text_processing +sidebar_custom_props: { + categoryIcon: LucideType +} --- # Text processing component @@ -24,7 +27,7 @@ Appears only when you select **Split** as method. The variable to be split. Type `/` to quickly insert variables. -### Script +### Script Template for the merge. Appears only when you select **Merge** as method. Type `/` to quickly insert variables. 
diff --git a/docs/guides/agent/agent_component_reference/transformer.md b/docs/guides/agent/agent_component_reference/transformer.md index ad8274ac4..7afcf4de8 100644 --- a/docs/guides/agent/agent_component_reference/transformer.md +++ b/docs/guides/agent/agent_component_reference/transformer.md @@ -1,6 +1,9 @@ --- sidebar_position: 37 slug: /transformer_component +sidebar_custom_props: { + categoryIcon: LucideFileStack +} --- # Transformer component @@ -13,7 +16,7 @@ A **Transformer** component indexes chunks and configures their storage formats ## Scenario -A **Transformer** component is essential when you need the LLM to extract new information, such as keywords, questions, metadata, and summaries, from the original chunks. +A **Transformer** component is essential when you need the LLM to extract new information, such as keywords, questions, metadata, and summaries, from the original chunks. ## Configurations @@ -21,29 +24,29 @@ A **Transformer** component is essential when you need the LLM to extract new in Click the dropdown menu of **Model** to show the model configuration window. -- **Model**: The chat model to use. +- **Model**: The chat model to use. - Ensure you set the chat model correctly on the **Model providers** page. - You can use different models for different components to increase flexibility or improve overall performance. -- **Creativity**: A shortcut to **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty** settings, indicating the freedom level of the model. From **Improvise**, **Precise**, to **Balance**, each preset configuration corresponds to a unique combination of **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty**. +- **Creativity**: A shortcut to **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty** settings, indicating the freedom level of the model. 
From **Improvise**, **Precise**, to **Balance**, each preset configuration corresponds to a unique combination of **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty**. This parameter has three options: - **Improvise**: Produces more creative responses. - **Precise**: (Default) Produces more conservative responses. - **Balance**: A middle ground between **Improvise** and **Precise**. -- **Temperature**: The randomness level of the model's output. +- **Temperature**: The randomness level of the model's output. Defaults to 0.1. - Lower values lead to more deterministic and predictable outputs. - Higher values lead to more creative and varied outputs. - A temperature of zero results in the same output for the same prompt. -- **Top P**: Nucleus sampling. +- **Top P**: Nucleus sampling. - Reduces the likelihood of generating repetitive or unnatural text by setting a threshold *P* and restricting the sampling to tokens with a cumulative probability exceeding *P*. - Defaults to 0.3. -- **Presence penalty**: Encourages the model to include a more diverse range of tokens in the response. +- **Presence penalty**: Encourages the model to include a more diverse range of tokens in the response. - A higher **presence penalty** value results in the model being more likely to generate tokens not yet been included in the generated text. - Defaults to 0.4. -- **Frequency penalty**: Discourages the model from repeating the same words or phrases too frequently in the generated text. +- **Frequency penalty**: Discourages the model from repeating the same words or phrases too frequently in the generated text. - A higher **frequency penalty** value results in the model being more conservative in its use of repeated tokens. - Defaults to 0.7. -- **Max tokens**: +- **Max tokens**: This sets the maximum length of the model's output, measured in the number of tokens (words or pieces of words). 
It is disabled by default, allowing the model to determine the number of tokens in its responses. :::tip NOTE @@ -62,7 +65,7 @@ Select the type of output to be generated by the LLM: ### System prompt -Typically, you use the system prompt to describe the task for the LLM, specify how it should respond, and outline other miscellaneous requirements. We do not plan to elaborate on this topic, as it can be as extensive as prompt engineering. +Typically, you use the system prompt to describe the task for the LLM, specify how it should respond, and outline other miscellaneous requirements. We do not plan to elaborate on this topic, as it can be as extensive as prompt engineering. :::tip NOTE The system prompt here automatically updates to match your selected **Result destination**. diff --git a/docs/guides/agent/agent_introduction.md b/docs/guides/agent/agent_introduction.md index fa21a7810..87d35dbc5 100644 --- a/docs/guides/agent/agent_introduction.md +++ b/docs/guides/agent/agent_introduction.md @@ -1,6 +1,9 @@ --- sidebar_position: 1 slug: /agent_introduction +sidebar_custom_props: { + categoryIcon: LucideBookOpenText +} --- # Introduction to agents @@ -24,7 +27,7 @@ Agents and RAG are complementary techniques, each enhancing the other’s capabi :::tip NOTE -Before proceeding, ensure that: +Before proceeding, ensure that: 1. You have properly set the LLM to use. See the guides on [Configure your API key](../models/llm_api_key_setup.md) or [Deploy a local LLM](../models/deploy_local_llm.mdx) for more information. 2. You have a dataset configured and the corresponding files properly parsed. See the guide on [Configure a dataset](../dataset/configure_knowledge_base.md) for more information. @@ -41,7 +44,7 @@ We also provide templates catered to different business scenarios. You can eithe ![agent_template](https://raw.githubusercontent.com/infiniflow/ragflow-docs/main/images/agent_template_list.jpg) -2. To create an agent from scratch, click **Create Agent**. 
Alternatively, to create an agent from one of our templates, click the desired card, such as **Deep Research**, name your agent in the pop-up dialogue, and click **OK** to confirm. +2. To create an agent from scratch, click **Create Agent**. Alternatively, to create an agent from one of our templates, click the desired card, such as **Deep Research**, name your agent in the pop-up dialogue, and click **OK** to confirm. *You are now taken to the **no-code workflow editor** page.* diff --git a/docs/guides/agent/best_practices/_category_.json b/docs/guides/agent/best_practices/_category_.json index c788383c0..e06d81d63 100644 --- a/docs/guides/agent/best_practices/_category_.json +++ b/docs/guides/agent/best_practices/_category_.json @@ -4,5 +4,8 @@ "link": { "type": "generated-index", "description": "Best practices on Agent configuration." + }, + "customProps": { + "categoryIcon": "LucideStar" } } diff --git a/docs/guides/agent/embed_agent_into_webpage.md b/docs/guides/agent/embed_agent_into_webpage.md index 1b532c4d7..5b4644c34 100644 --- a/docs/guides/agent/embed_agent_into_webpage.md +++ b/docs/guides/agent/embed_agent_into_webpage.md @@ -1,6 +1,9 @@ --- sidebar_position: 3 slug: /embed_agent_into_webpage +sidebar_custom_props: { + categoryIcon: LucideMonitorDot +} --- # Embed agent into webpage diff --git a/docs/guides/agent/sandbox_quickstart.md b/docs/guides/agent/sandbox_quickstart.md index 5baa935a8..2ea3ed0fb 100644 --- a/docs/guides/agent/sandbox_quickstart.md +++ b/docs/guides/agent/sandbox_quickstart.md @@ -1,13 +1,16 @@ --- sidebar_position: 20 slug: /sandbox_quickstart +sidebar_custom_props: { + categoryIcon: LucideCodesandbox +} --- # Sandbox quickstart A secure, pluggable code execution backend designed for RAGFlow and other applications requiring isolated code execution environments. -## Features: +## Features: - Seamless RAGFlow Integration — Works out-of-the-box with the code component of RAGFlow. 
- High Security — Uses gVisor for syscall-level sandboxing to isolate execution. @@ -55,7 +58,7 @@ Next, build the executor manager image: docker build -t sandbox-executor-manager:latest ./executor_manager ``` -## Running with RAGFlow +## Running with RAGFlow 1. Verify that gVisor is properly installed and operational. diff --git a/docs/guides/ai_search.md b/docs/guides/ai_search.md index 6bd533600..609192a21 100644 --- a/docs/guides/ai_search.md +++ b/docs/guides/ai_search.md @@ -1,6 +1,9 @@ --- sidebar_position: 2 slug: /ai_search +sidebar_custom_props: { + categoryIcon: LucideSearch +} --- # Search @@ -9,7 +12,7 @@ Conduct an AI search. --- -An AI search is a single-turn AI conversation using a predefined retrieval strategy (a hybrid search of weighted keyword similarity and weighted vector similarity) and the system's default chat model. It does not involve advanced RAG strategies like knowledge graph, auto-keyword, or auto-question. The related chunks are listed below the chat model's response in descending order based on their similarity scores. +An AI search is a single-turn AI conversation using a predefined retrieval strategy (a hybrid search of weighted keyword similarity and weighted vector similarity) and the system's default chat model. It does not involve advanced RAG strategies like knowledge graph, auto-keyword, or auto-question. The related chunks are listed below the chat model's response in descending order based on their similarity scores. ![Create search app](https://raw.githubusercontent.com/infiniflow/ragflow-docs/main/images/create_search_app.jpg) diff --git a/docs/guides/chat/_category_.json b/docs/guides/chat/_category_.json index 4b33e0c7b..d55b914ec 100644 --- a/docs/guides/chat/_category_.json +++ b/docs/guides/chat/_category_.json @@ -4,5 +4,8 @@ "link": { "type": "generated-index", "description": "Chat-specific guides." 
+ }, + "customProps": { + "categoryIcon": "LucideMessagesSquare" } } diff --git a/docs/guides/chat/best_practices/_category_.json b/docs/guides/chat/best_practices/_category_.json index e92bb793d..a0e97731f 100644 --- a/docs/guides/chat/best_practices/_category_.json +++ b/docs/guides/chat/best_practices/_category_.json @@ -4,5 +4,8 @@ "link": { "type": "generated-index", "description": "Best practices on chat assistant configuration." + }, + "customProps": { + "categoryIcon": "LucideStar" } } diff --git a/docs/guides/chat/implement_deep_research.md b/docs/guides/chat/implement_deep_research.md index b5edd2d92..ec6d8ee8d 100644 --- a/docs/guides/chat/implement_deep_research.md +++ b/docs/guides/chat/implement_deep_research.md @@ -1,6 +1,9 @@ --- sidebar_position: 3 slug: /implement_deep_research +sidebar_custom_props: { + categoryIcon: LucideScanSearch +} --- # Implement deep research diff --git a/docs/guides/chat/set_chat_variables.md b/docs/guides/chat/set_chat_variables.md index 00f1a58c7..a6507a8a7 100644 --- a/docs/guides/chat/set_chat_variables.md +++ b/docs/guides/chat/set_chat_variables.md @@ -1,6 +1,9 @@ --- sidebar_position: 4 slug: /set_chat_variables +sidebar_custom_props: { + categoryIcon: LucideVariable +} --- # Set variables @@ -91,7 +94,7 @@ from ragflow_sdk import RAGFlow rag_object = RAGFlow(api_key="", base_url="http://:9380") assistant = rag_object.list_chats(name="Miss R") assistant = assistant[0] -session = assistant.create_session() +session = assistant.create_session() print("\n==================== Miss R =====================\n") print("Hello. What can I do for you?") @@ -99,9 +102,9 @@ print("Hello. 
What can I do for you?") while True: question = input("\n==================== User =====================\n> ") style = input("Please enter your preferred style (e.g., formal, informal, hilarious): ") - + print("\n==================== Miss R =====================\n") - + cont = "" for ans in session.ask(question, stream=True, style=style): print(ans.content[len(cont):], end='', flush=True) diff --git a/docs/guides/chat/start_chat.md b/docs/guides/chat/start_chat.md index 1e0dd0f10..279ea6230 100644 --- a/docs/guides/chat/start_chat.md +++ b/docs/guides/chat/start_chat.md @@ -1,6 +1,9 @@ --- sidebar_position: 1 slug: /start_chat +sidebar_custom_props: { + categoryIcon: LucideBot +} --- # Start AI chat @@ -42,8 +45,8 @@ You start an AI conversation by creating an assistant. - **Rerank model** sets the reranker model to use. It is left empty by default. - If **Rerank model** is left empty, the hybrid score system uses keyword similarity and vector similarity, and the default weight assigned to the vector similarity component is 1-0.7=0.3. - If **Rerank model** is selected, the hybrid score system uses keyword similarity and reranker score, and the default weight assigned to the reranker score is 1-0.7=0.3. - - [Cross-language search](../../references/glossary.mdx#cross-language-search): Optional - Select one or more target languages from the dropdown menu. The system’s default chat model will then translate your query into the selected target language(s). This translation ensures accurate semantic matching across languages, allowing you to retrieve relevant results regardless of language differences. + - [Cross-language search](../../references/glossary.mdx#cross-language-search): Optional + Select one or more target languages from the dropdown menu. The system’s default chat model will then translate your query into the selected target language(s). 
This translation ensures accurate semantic matching across languages, allowing you to retrieve relevant results regardless of language differences. - When selecting target languages, please ensure that these languages are present in the dataset to guarantee an effective search. - If no target language is selected, the system will search only in the language of your query, which may cause relevant information in other languages to be missed. - **Variable** refers to the variables (keys) to be used in the system prompt. `{knowledge}` is a reserved variable. Click **Add** to add more variables for the system prompt. @@ -55,23 +58,23 @@ You start an AI conversation by creating an assistant. 4. Update Model-specific Settings: - In **Model**: you select the chat model. Though you have selected the default chat model in **System Model Settings**, RAGFlow allows you to choose an alternative chat model for your dialogue. - - **Creativity**: A shortcut to **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty** settings, indicating the freedom level of the model. From **Improvise**, **Precise**, to **Balance**, each preset configuration corresponds to a unique combination of **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty**. + - **Creativity**: A shortcut to **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty** settings, indicating the freedom level of the model. From **Improvise**, **Precise**, to **Balance**, each preset configuration corresponds to a unique combination of **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty**. This parameter has three options: - **Improvise**: Produces more creative responses. - **Precise**: (Default) Produces more conservative responses. - **Balance**: A middle ground between **Improvise** and **Precise**. - - **Temperature**: The randomness level of the model's output. + - **Temperature**: The randomness level of the model's output. Defaults to 0.1. 
- Lower values lead to more deterministic and predictable outputs. - Higher values lead to more creative and varied outputs. - A temperature of zero results in the same output for the same prompt. - - **Top P**: Nucleus sampling. + - **Top P**: Nucleus sampling. - Reduces the likelihood of generating repetitive or unnatural text by setting a threshold *P* and restricting the sampling to tokens with a cumulative probability exceeding *P*. - Defaults to 0.3. - - **Presence penalty**: Encourages the model to include a more diverse range of tokens in the response. + - **Presence penalty**: Encourages the model to include a more diverse range of tokens in the response. - A higher **presence penalty** value results in the model being more likely to generate tokens not yet included in the generated text. - Defaults to 0.4. - - **Frequency penalty**: Discourages the model from repeating the same words or phrases too frequently in the generated text. + - **Frequency penalty**: Discourages the model from repeating the same words or phrases too frequently in the generated text. - A higher **frequency penalty** value results in the model being more conservative in its use of repeated tokens. - Defaults to 0.7. diff --git a/docs/guides/dataset/_category_.json b/docs/guides/dataset/_category_.json index 4c454f51f..9501311fd 100644 --- a/docs/guides/dataset/_category_.json +++ b/docs/guides/dataset/_category_.json @@ -4,5 +4,8 @@ "link": { "type": "generated-index", "description": "Guides on configuring a dataset."
+ }, + "customProps": { + "categoryIcon": "LucideDatabaseZap" } } diff --git a/docs/guides/dataset/add_data_source/_category_.json b/docs/guides/dataset/add_data_source/_category_.json index 42f2b164a..71b3d794d 100644 --- a/docs/guides/dataset/add_data_source/_category_.json +++ b/docs/guides/dataset/add_data_source/_category_.json @@ -4,5 +4,8 @@ "link": { "type": "generated-index", "description": "Add various data sources" + }, + "customProps": { + "categoryIcon": "LucideServer" } } diff --git a/docs/guides/dataset/add_data_source/add_google_drive.md b/docs/guides/dataset/add_data_source/add_google_drive.md index a1f2d895f..d4ee70a87 100644 --- a/docs/guides/dataset/add_data_source/add_google_drive.md +++ b/docs/guides/dataset/add_data_source/add_google_drive.md @@ -1,6 +1,9 @@ --- sidebar_position: 3 slug: /add_google_drive +sidebar_custom_props: { + categoryIcon: SiGoogledrive +} --- # Add Google Drive @@ -10,9 +13,9 @@ slug: /add_google_drive You can either create a dedicated project for RAGFlow or use an existing Google Cloud external project. -**Steps:** +**Steps:** 1. Open the project creation page\ -`https://console.cloud.google.com/projectcreate` +`https://console.cloud.google.com/projectcreate` ![placeholder-image](https://github.com/infiniflow/ragflow-docs/blob/040e4acd4c1eac6dc73dc44e934a6518de78d097/images/google_drive/image1.jpeg?raw=true) 2. 
Select **External** as the Audience ![placeholder-image](https://github.com/infiniflow/ragflow-docs/blob/040e4acd4c1eac6dc73dc44e934a6518de78d097/images/google_drive/image2.png?raw=true) @@ -96,11 +99,11 @@ Navigate to the Google API Library:\ Enable the following APIs: -- Google Drive API -- Admin SDK API -- Google Sheets API +- Google Drive API +- Admin SDK API +- Google Sheets API - Google Docs API - + ![placeholder-image](https://github.com/infiniflow/ragflow-docs/blob/040e4acd4c1eac6dc73dc44e934a6518de78d097/images/google_drive/image15.png?raw=true) @@ -126,7 +129,7 @@ Enable the following APIs: ![placeholder-image](https://github.com/infiniflow/ragflow-docs/blob/040e4acd4c1eac6dc73dc44e934a6518de78d097/images/google_drive/image23.png?raw=true) 5. Click **Authorize with Google** -A browser window will appear. +A browser window will appear. ![placeholder-image](https://github.com/infiniflow/ragflow-docs/blob/040e4acd4c1eac6dc73dc44e934a6518de78d097/images/google_drive/image25.jpeg?raw=true) Click: - **Continue** - **Select All → Continue** - Authorization should succeed - Select **OK** to add the data source diff --git a/docs/guides/dataset/auto_metadata.md b/docs/guides/dataset/auto_metadata.md index 35967b935..2cbf85429 100644 --- a/docs/guides/dataset/auto_metadata.md +++ b/docs/guides/dataset/auto_metadata.md @@ -1,6 +1,9 @@ --- sidebar_position: -6 slug: /auto_metadata +sidebar_custom_props: { + categoryIcon: LucideFileCodeCorner +} --- # Auto-extract metadata diff --git a/docs/guides/dataset/autokeyword_autoquestion.mdx b/docs/guides/dataset/autokeyword_autoquestion.mdx index e91764585..937394e4e 100644 --- a/docs/guides/dataset/autokeyword_autoquestion.mdx +++ b/docs/guides/dataset/autokeyword_autoquestion.mdx @@ -1,6 +1,9 @@ --- sidebar_position: 3 slug: /autokeyword_autoquestion +sidebar_custom_props: { + categoryIcon: LucideSlidersHorizontal +} --- # Auto-keyword Auto-question @@ -20,14 +23,14 @@ Enabling this feature increases document indexing time 
and uses extra tokens, as Auto-keyword refers to the auto-keyword generation feature of RAGFlow. It uses a chat model to generate a set of keywords or synonyms from each chunk to correct errors and enhance retrieval accuracy. This feature is implemented as a slider under **Page rank** on the **Configuration** page of your dataset. -**Values**: +**Values**: -- 0: (Default) Disabled. -- Between 3 and 5 (inclusive): Recommended if you have chunks of approximately 1,000 characters. -- 30 (maximum) +- 0: (Default) Disabled. +- Between 3 and 5 (inclusive): Recommended if you have chunks of approximately 1,000 characters. +- 30 (maximum) :::tip NOTE -- If your chunk size increases, you can increase the value accordingly. Please note, as the value increases, the marginal benefit decreases. +- If your chunk size increases, you can increase the value accordingly. Please note, as the value increases, the marginal benefit decreases. - An Auto-keyword value must be an integer. If you set it to a non-integer, say 1.7, it will be rounded down to the nearest integer, which in this case is 1. ::: @@ -37,12 +40,12 @@ Auto-question is a feature of RAGFlow that automatically generates questions fro **Values**: -- 0: (Default) Disabled. -- 1 or 2: Recommended if you have chunks of approximately 1,000 characters. +- 0: (Default) Disabled. +- 1 or 2: Recommended if you have chunks of approximately 1,000 characters. - 10 (maximum) :::tip NOTE -- If your chunk size increases, you can increase the value accordingly. Please note, as the value increases, the marginal benefit decreases. +- If your chunk size increases, you can increase the value accordingly. Please note, as the value increases, the marginal benefit decreases. - An Auto-question value must be an integer. If you set it to a non-integer, say 1.7, it will be rounded down to the nearest integer, which in this case is 1. 
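The rounding rule in this note can be sketched in a few lines; `clamp_auto_value` is a hypothetical helper written for illustration, not part of RAGFlow, and the cap at the slider maximum is assumed from the value ranges listed above:

```python
import math

def clamp_auto_value(value: float, maximum: int) -> int:
    """Round a slider value down to the nearest integer, capped at the slider maximum."""
    return min(math.floor(value), maximum)

# A non-integer setting such as 1.7 is rounded down to 1.
keyword_value = clamp_auto_value(1.7, 30)   # Auto-keyword slider (maximum 30)
question_value = clamp_auto_value(1.7, 10)  # Auto-question slider (maximum 10)
```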
::: diff --git a/docs/guides/dataset/best_practices/_category_.json b/docs/guides/dataset/best_practices/_category_.json index 79a1103d5..f1fe9fa41 100644 --- a/docs/guides/dataset/best_practices/_category_.json +++ b/docs/guides/dataset/best_practices/_category_.json @@ -4,5 +4,8 @@ "link": { "type": "generated-index", "description": "Best practices on configuring a dataset." + }, + "customProps": { + "categoryIcon": "LucideStar" } } diff --git a/docs/guides/dataset/configure_child_chunking_strategy.md b/docs/guides/dataset/configure_child_chunking_strategy.md index 0be4d2330..267b4b070 100644 --- a/docs/guides/dataset/configure_child_chunking_strategy.md +++ b/docs/guides/dataset/configure_child_chunking_strategy.md @@ -1,6 +1,9 @@ --- sidebar_position: -4 slug: /configure_child_chunking_strategy +sidebar_custom_props: { + categoryIcon: LucideGroup +} --- # Configure child chunking strategy diff --git a/docs/guides/dataset/configure_knowledge_base.md b/docs/guides/dataset/configure_knowledge_base.md index e7aaa50ff..85f00180d 100644 --- a/docs/guides/dataset/configure_knowledge_base.md +++ b/docs/guides/dataset/configure_knowledge_base.md @@ -1,6 +1,9 @@ --- sidebar_position: -10 slug: /configure_knowledge_base +sidebar_custom_props: { + categoryIcon: LucideCog +} --- # Configure dataset @@ -22,7 +25,7 @@ _Each time a dataset is created, a folder with the same name is generated in the ## Configure dataset -The following screenshot shows the configuration page of a dataset. A proper configuration of your dataset is crucial for future AI chats. For example, choosing the wrong embedding model or chunking method would cause unexpected semantic loss or mismatched answers in chats. +The following screenshot shows the configuration page of a dataset. A proper configuration of your dataset is crucial for future AI chats. For example, choosing the wrong embedding model or chunking method would cause unexpected semantic loss or mismatched answers in chats. 
![dataset configuration](https://raw.githubusercontent.com/infiniflow/ragflow-docs/main/images/configure_knowledge_base.jpg) @@ -60,14 +63,14 @@ You can also change a file's chunking method on the **Files** page.
From v0.21.0 onward, RAGFlow supports ingestion pipeline for customized data ingestion and cleansing workflows. - + To use a customized data pipeline: 1. On the **Agent** page, click **+ Create agent** > **Create from blank**. 2. Select **Ingestion pipeline** and name your data pipeline in the popup, then click **Save** to show the data pipeline canvas. 3. After updating your data pipeline, click **Save** on the top right of the canvas. 4. Navigate to the **Configuration** page of your dataset, select **Choose pipeline** in **Ingestion pipeline**. - + *Your saved data pipeline will appear in the dropdown menu below.*
@@ -83,9 +86,9 @@ Some embedding models are optimized for specific languages, so performance may b ### Upload file - RAGFlow's File system allows you to link a file to multiple datasets, in which case each target dataset holds a reference to the file. -- In **Knowledge Base**, you are also given the option of uploading a single file or a folder of files (bulk upload) from your local machine to a dataset, in which case the dataset holds file copies. +- In **Knowledge Base**, you are also given the option of uploading a single file or a folder of files (bulk upload) from your local machine to a dataset, in which case the dataset holds file copies. -While uploading files directly to a dataset seems more convenient, we *highly* recommend uploading files to RAGFlow's File system and then linking them to the target datasets. This way, you can avoid permanently deleting files uploaded to the dataset. +While uploading files directly to a dataset seems more convenient, we *highly* recommend uploading files to RAGFlow's File system and then linking them to the target datasets. This way, you can avoid permanently deleting files uploaded to the dataset. ### Parse file @@ -93,14 +96,14 @@ File parsing is a crucial topic in dataset configuration. The meaning of file pa ![parse file](https://raw.githubusercontent.com/infiniflow/ragflow-docs/main/images/parse_file.jpg) -- As shown above, RAGFlow allows you to use a different chunking method for a particular file, offering flexibility beyond the default method. -- As shown above, RAGFlow allows you to enable or disable individual files, offering finer control over dataset-based AI chats. +- As shown above, RAGFlow allows you to use a different chunking method for a particular file, offering flexibility beyond the default method. +- As shown above, RAGFlow allows you to enable or disable individual files, offering finer control over dataset-based AI chats. 
### Intervene with file parsing results -RAGFlow features visibility and explainability, allowing you to view the chunking results and intervene where necessary. To do so: +RAGFlow features visibility and explainability, allowing you to view the chunking results and intervene where necessary. To do so: -1. Click on the file that completes file parsing to view the chunking results: +1. Click a file that has completed parsing to view its chunking results: _You are taken to the **Chunk** page:_ @@ -113,7 +116,7 @@ RAGFlow features visibility and explainability, allowing you to view the chunkin ![update chunk](https://raw.githubusercontent.com/infiniflow/ragflow-docs/main/images/add_keyword_question.jpg) :::caution NOTE -You can add keywords to a file chunk to increase its ranking for queries containing those keywords. This action increases its keyword weight and can improve its position in search list. +You can add keywords to a file chunk to increase its ranking for queries containing those keywords. This action increases its keyword weight and can improve its position in the search list. ::: 4. In Retrieval testing, ask a quick question in **Test text** to double-check if your configurations work: @@ -141,7 +144,7 @@ As of RAGFlow v0.23.1, the search feature is still in a rudimentary form, suppor You are allowed to delete a dataset. Hover your mouse over the three-dot icon of the intended dataset card and the **Delete** option appears. Once you delete a dataset, the associated folder under the **root/.knowledge** directory is AUTOMATICALLY REMOVED. The consequence is: -- The files uploaded directly to the dataset are gone; -- The file references, which you created from within RAGFlow's File system, are gone, but the associated files still exist. +- The files uploaded directly to the dataset are gone; +- The file references, which you created from within RAGFlow's File system, are gone, but the associated files still exist. 
![delete dataset](https://raw.githubusercontent.com/infiniflow/ragflow-docs/main/images/delete_datasets.jpg) diff --git a/docs/guides/dataset/construct_knowledge_graph.md b/docs/guides/dataset/construct_knowledge_graph.md index 471080811..4c4b56740 100644 --- a/docs/guides/dataset/construct_knowledge_graph.md +++ b/docs/guides/dataset/construct_knowledge_graph.md @@ -1,6 +1,9 @@ --- sidebar_position: 8 slug: /construct_knowledge_graph +sidebar_custom_props: { + categoryIcon: LucideWandSparkles +} --- # Construct knowledge graph @@ -63,7 +66,7 @@ In a knowledge graph, a community is a cluster of entities linked by relationshi ## Quickstart 1. Navigate to the **Configuration** page of your dataset and update: - + - Entity types: *Required* - Specifies the entity types in the knowledge graph to generate. You don't have to stick with the default, but you need to customize them for your documents. - Method: *Optional* - Entity resolution: *Optional* @@ -74,12 +77,12 @@ In a knowledge graph, a community is a cluster of entities linked by relationshi *You can click the pause button in the dropdown to halt the build process when necessary.* -3. Go back to the **Configuration** page: - +3. Go back to the **Configuration** page: + *Once a knowledge graph is generated, the **Knowledge graph** field changes from `Not generated` to `Generated at a specific timestamp`. You can delete it by clicking the recycle bin button to the right of the field.* 4. To use the created knowledge graph, do either of the following: - + - In the **Chat setting** panel of your chat app, switch on the **Use knowledge graph** toggle. - If you are using an agent, click the **Retrieval** agent component to specify the dataset(s) and switch on the **Use knowledge graph** toggle. 
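A community, as described above, is a cluster of entities linked by relationships. The toy sketch below treats each connected group of entities as one community; real pipelines typically rely on a dedicated community-detection algorithm (for example, Leiden), so take this purely as an illustration of the idea:

```python
from collections import defaultdict

def communities(entities, relationships):
    """Group entities into clusters of nodes connected by relationships."""
    graph = defaultdict(set)
    for a, b in relationships:
        graph[a].add(b)
        graph[b].add(a)
    seen, clusters = set(), []
    for entity in entities:
        if entity in seen:
            continue
        stack, cluster = [entity], set()
        while stack:  # depth-first walk over the relationship graph
            node = stack.pop()
            if node in cluster:
                continue
            cluster.add(node)
            stack.extend(graph[node])
        seen |= cluster
        clusters.append(cluster)
    return clusters

people = ["Alice", "Bob", "Carol", "Dave"]
links = [("Alice", "Bob"), ("Carol", "Dave")]
groups = communities(people, links)  # two communities of two entities each
```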
diff --git a/docs/guides/dataset/enable_excel2html.md b/docs/guides/dataset/enable_excel2html.md index 5a7a8fa41..7449ee59b 100644 --- a/docs/guides/dataset/enable_excel2html.md +++ b/docs/guides/dataset/enable_excel2html.md @@ -1,6 +1,9 @@ --- sidebar_position: 4 slug: /enable_excel2html +sidebar_custom_props: { + categoryIcon: LucideToggleRight +} --- # Enable Excel2HTML diff --git a/docs/guides/dataset/enable_raptor.md b/docs/guides/dataset/enable_raptor.md index 2d8fa2453..abe6f6a8c 100644 --- a/docs/guides/dataset/enable_raptor.md +++ b/docs/guides/dataset/enable_raptor.md @@ -1,6 +1,9 @@ --- sidebar_position: 7 slug: /enable_raptor +sidebar_custom_props: { + categoryIcon: LucideNetwork +} --- # Enable RAPTOR @@ -76,7 +79,7 @@ A random seed. Click **+** to change the seed value. ## Quickstart 1. Navigate to the **Configuration** page of your dataset and update: - + - Prompt: *Optional* - We recommend that you keep it as-is until you understand the mechanism behind. - Max token: *Optional* - Threshold: *Optional* @@ -86,8 +89,8 @@ A random seed. Click **+** to change the seed value. *You can click the pause button in the dropdown to halt the build process when necessary.* -3. Go back to the **Configuration** page: - +3. Go back to the **Configuration** page: + *The **RAPTOR** field changes from `Not generated` to `Generated at a specific timestamp` when a RAPTOR hierarchical tree structure is generated. You can delete it by clicking the recycle bin button to the right of the field.* 4. Once a RAPTOR hierarchical tree structure is generated, your chat assistant and **Retrieval** agent component will use it for retrieval as a default. 
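For context on what step 2 above generates: RAPTOR recursively clusters chunks and summarizes each cluster, so retrieval can draw on summary nodes as well as the original chunks. The sketch below only mimics that shape; it groups chunks pairwise and joins text instead of clustering embeddings and calling a chat model, so treat every name in it as illustrative:

```python
def summarize(texts):
    # Placeholder: a real pipeline would ask the chat model for a summary.
    return " / ".join(texts)

def build_raptor_tree(chunks, group_size=2):
    """Recursively group and summarize chunks until a single root summary remains."""
    levels = [list(chunks)]
    while len(levels[-1]) > 1:
        current = levels[-1]
        parents = [
            summarize(current[i:i + group_size])
            for i in range(0, len(current), group_size)
        ]
        levels.append(parents)
    return levels  # level 0: original chunks; last level: one root summary

tree = build_raptor_tree(["c1", "c2", "c3", "c4"])
```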
diff --git a/docs/guides/dataset/extract_table_of_contents.md b/docs/guides/dataset/extract_table_of_contents.md index 58e920613..4e67ecae4 100644 --- a/docs/guides/dataset/extract_table_of_contents.md +++ b/docs/guides/dataset/extract_table_of_contents.md @@ -1,6 +1,9 @@ --- sidebar_position: 4 slug: /enable_table_of_contents +sidebar_custom_props: { + categoryIcon: LucideTableOfContents +} --- # Extract table of contents @@ -28,7 +31,7 @@ The system's default chat model is used to summarize clustered content. Before p 2. Enable **TOC Enhance**. 3. To use this technique during retrieval, do either of the following: - + - In the **Chat setting** panel of your chat app, switch on the **TOC Enhance** toggle. - If you are using an agent, click the **Retrieval** agent component to specify the dataset(s) and switch on the **TOC Enhance** toggle. diff --git a/docs/guides/dataset/manage_metadata.md b/docs/guides/dataset/manage_metadata.md index a848007fb..1f6439f51 100644 --- a/docs/guides/dataset/manage_metadata.md +++ b/docs/guides/dataset/manage_metadata.md @@ -1,6 +1,9 @@ --- sidebar_position: -5 slug: /manage_metadata +sidebar_custom_props: { + categoryIcon: LucideCode +} --- # Manage metadata @@ -19,7 +22,7 @@ From v0.23.0 onwards, RAGFlow allows you to manage metadata both at the dataset ![](https://raw.githubusercontent.com/infiniflow/ragflow-docs/main/images/click_metadata.png) -2. On the **Manage Metadata** page, you can do either of the following: +2. On the **Manage Metadata** page, you can do either of the following: - Edit Values: You can modify existing values. If you rename two values to be identical, they will be automatically merged. - Delete: You can delete specific values or entire fields. These changes will apply to all associated files. 
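Conceptually, the rename-and-merge behavior above works like the sketch below, where a metadata field maps each value to the set of files carrying it. This is an illustration of the rule, not RAGFlow's implementation:

```python
def rename_value(field: dict, old: str, new: str) -> dict:
    """Rename a metadata value; values that become identical merge their file sets."""
    files = field.pop(old, set())
    field[new] = field.get(new, set()) | files
    return field

# Renaming "J. Smith" to "John Smith" merges the two entries into one.
authors = {"J. Smith": {"a.pdf"}, "John Smith": {"b.pdf", "c.pdf"}}
merged = rename_value(authors, "J. Smith", "John Smith")
```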
diff --git a/docs/guides/dataset/run_retrieval_test.md b/docs/guides/dataset/run_retrieval_test.md index 87bd29835..0291043c2 100644 --- a/docs/guides/dataset/run_retrieval_test.md +++ b/docs/guides/dataset/run_retrieval_test.md @@ -1,6 +1,9 @@ --- sidebar_position: 10 slug: /run_retrieval_test +sidebar_custom_props: { + categoryIcon: LucideTextSearch +} --- # Run retrieval test @@ -53,7 +56,7 @@ The switch is disabled by default. When enabled, RAGFlow performs the following 3. Find similar entities and their N-hop relationships from the graph using the embeddings of the extracted query entities. 4. Retrieve similar relationships from the graph using the query embedding. 5. Rank these retrieved entities and relationships by multiplying each one's PageRank value with its similarity score to the query, returning the top n as the final retrieval. -6. Retrieve the report for the community involving the most entities in the final retrieval. +6. Retrieve the report for the community involving the most entities in the final retrieval. *The retrieved entity descriptions, relationship descriptions, and the top 1 community report are sent to the LLM for content generation.* :::danger IMPORTANT @@ -78,10 +81,10 @@ This field is where you put in your testing query. 1. Navigate to the **Retrieval testing** page of your dataset, enter your query in **Test text**, and click **Testing** to run the test. 2. If the results are unsatisfactory, tune the options listed in the Configuration section and rerun the test. - *The following is a screenshot of a retrieval test conducted without using knowledge graph. It demonstrates a hybrid search combining weighted keyword similarity and weighted vector cosine similarity. The overall hybrid similarity score is 28.56, calculated as 25.17 (term similarity score) x 0.7 + 36.49 (vector similarity score) x 0.3:* + *The following is a screenshot of a retrieval test conducted without using knowledge graph. 
It demonstrates a hybrid search combining weighted keyword similarity and weighted vector cosine similarity. The overall hybrid similarity score is 28.56, calculated as 25.17 (term similarity score) x 0.7 + 36.49 (vector similarity score) x 0.3:* ![Image](https://github.com/user-attachments/assets/541554d4-3f3e-44e1-954b-0ae77d7372c6) - *The following is a screenshot of a retrieval test conducted using a knowledge graph. It shows that only vector similarity is used for knowledge graph-generated chunks:* + *The following is a screenshot of a retrieval test conducted using a knowledge graph. It shows that only vector similarity is used for knowledge graph-generated chunks:* ![Image](https://github.com/user-attachments/assets/30a03091-0f7b-4058-901a-f4dc5ca5aa6b) :::caution WARNING diff --git a/docs/guides/dataset/select_pdf_parser.md b/docs/guides/dataset/select_pdf_parser.md index 148314908..95e0305f6 100644 --- a/docs/guides/dataset/select_pdf_parser.md +++ b/docs/guides/dataset/select_pdf_parser.md @@ -1,6 +1,9 @@ --- sidebar_position: -3 slug: /select_pdf_parser +sidebar_custom_props: { + categoryIcon: LucideFileText +} --- # Select PDF parser @@ -54,12 +57,12 @@ Starting from v0.22.0, RAGFlow includes MinerU (≥ 2.6.3) as an optional PDF p - `"vlm-mlx-engine"` - `"vlm-vllm-async-engine"` - `"vlm-lmdeploy-engine"`. - - `MINERU_SERVER_URL`: (optional) The downstream vLLM HTTP server (e.g., `http://vllm-host:30000`). Applicable when `MINERU_BACKEND` is set to `"vlm-http-client"`. + - `MINERU_SERVER_URL`: (optional) The downstream vLLM HTTP server (e.g., `http://vllm-host:30000`). Applicable when `MINERU_BACKEND` is set to `"vlm-http-client"`. - `MINERU_OUTPUT_DIR`: (optional) The local directory for holding the outputs of the MinerU API service (zip/JSON) before ingestion. - `MINERU_DELETE_OUTPUT`: Whether to delete temporary output when a temporary directory is used: - `1`: Delete. - `0`: Retain. -3. 
In the web UI, navigate to your dataset's **Configuration** page and find the **Ingestion pipeline** section: +3. In the web UI, navigate to your dataset's **Configuration** page and find the **Ingestion pipeline** section: - If you decide to use a chunking method from the **Built-in** dropdown, ensure it supports PDF parsing, then select **MinerU** from the **PDF parser** dropdown. - If you use a custom ingestion pipeline instead, select **MinerU** in the **PDF parser** section of the **Parser** component. diff --git a/docs/guides/dataset/set_context_window.md b/docs/guides/dataset/set_context_window.md index 7f9abdd80..e3f84262a 100644 --- a/docs/guides/dataset/set_context_window.md +++ b/docs/guides/dataset/set_context_window.md @@ -1,6 +1,9 @@ --- sidebar_position: -8 slug: /set_context_window +sidebar_custom_props: { + categoryIcon: LucideListChevronsUpDown +} --- # Set context window size diff --git a/docs/guides/dataset/set_metadata.md b/docs/guides/dataset/set_metadata.md index 34db390cd..5af503400 100644 --- a/docs/guides/dataset/set_metadata.md +++ b/docs/guides/dataset/set_metadata.md @@ -1,6 +1,9 @@ --- sidebar_position: -7 slug: /set_metadata +sidebar_custom_props: { + categoryIcon: LucideCode +} --- # Set metadata diff --git a/docs/guides/dataset/set_page_rank.md b/docs/guides/dataset/set_page_rank.md index 5df848a0e..d18b6271b 100644 --- a/docs/guides/dataset/set_page_rank.md +++ b/docs/guides/dataset/set_page_rank.md @@ -1,6 +1,9 @@ --- sidebar_position: -2 slug: /set_page_rank +sidebar_custom_props: { + categoryIcon: LucideStickyNote +} --- # Set page rank diff --git a/docs/guides/dataset/use_tag_sets.md b/docs/guides/dataset/use_tag_sets.md index 389a97b0a..29b005d87 100644 --- a/docs/guides/dataset/use_tag_sets.md +++ b/docs/guides/dataset/use_tag_sets.md @@ -1,6 +1,9 @@ --- sidebar_position: 6 slug: /use_tag_sets +sidebar_custom_props: { + categoryIcon: LucideTags +} --- # Use tag set @@ -43,10 +46,10 @@ A tag set is *not* involved in document 
indexing or retrieval. Do not specify a 1. Click **+ Create dataset** to create a dataset. 2. Navigate to the **Configuration** page of the created dataset, select **Built-in** in **Ingestion pipeline**, then choose **Tag** as the default chunking method from the **Built-in** drop-down menu. -3. Go back to the **Files** page and upload and parse your table file in XLSX, CSV, or TXT formats. - _A tag cloud appears under the **Tag view** section, indicating the tag set is created:_ +3. Go back to the **Files** page and upload and parse your table file in XLSX, CSV, or TXT formats. + _A tag cloud appears under the **Tag view** section, indicating the tag set is created:_ ![Image](https://github.com/user-attachments/assets/abefbcbf-c130-4abe-95e1-267b0d2a0505) -4. Click the **Table** tab to view the tag frequency table: +4. Click the **Table** tab to view the tag frequency table: ![Image](https://github.com/user-attachments/assets/af91d10c-5ea5-491f-ab21-3803d5ebf59f) ## 2. Tag chunks @@ -60,12 +63,12 @@ Once a tag set is created, you can apply it to your dataset: If the tag set is missing from the dropdown, check that it has been created or configured correctly. ::: -3. Re-parse your documents to start the auto-tagging process. +3. Re-parse your documents to start the auto-tagging process. _In an AI chat scenario using auto-tagged datasets, each query will be tagged using the corresponding tag set(s) and chunks with these tags will have a higher chance to be retrieved._ ## 3. Update tag set -Creating a tag set is *not* for once and for all. Oftentimes, you may find it necessary to update or delete existing tags or add new entries. +Creating a tag set is *not* a once-and-for-all task. Oftentimes, you may find it necessary to update or delete existing tags or add new entries. - You can update the existing tag set in the tag frequency table. - To add new entries, you can add and parse new table files in XLSX, CSV, or TXT formats.
diff --git a/docs/guides/manage_files.md b/docs/guides/manage_files.md index 27c6f1d36..2d60c485d 100644 --- a/docs/guides/manage_files.md +++ b/docs/guides/manage_files.md @@ -1,6 +1,9 @@ --- sidebar_position: 6 slug: /manage_files +sidebar_custom_props: { + categoryIcon: LucideFolderDot +} --- # Files @@ -13,7 +16,7 @@ Compared to uploading files directly to various datasets, uploading them to RAGF ## Create folder -RAGFlow's file management allows you to establish your file system with nested folder structures. To create a folder in the root directory of RAGFlow: +RAGFlow's file management allows you to establish your file system with nested folder structures. To create a folder in the root directory of RAGFlow: ![create new folder](https://github.com/infiniflow/ragflow/assets/93570324/3a37a5f4-43a6-426d-a62a-e5cd2ff7a533) @@ -23,7 +26,7 @@ Each dataset in RAGFlow has a corresponding folder under the **root/.knowledgeba ## Upload file -RAGFlow's file management supports file uploads from your local machine, allowing both individual and bulk uploads: +RAGFlow's file management supports file uploads from your local machine, allowing both individual and bulk uploads: ![upload file](https://github.com/infiniflow/ragflow/assets/93570324/5d7ded14-ce2b-4703-8567-9356a978f45c) @@ -45,7 +48,7 @@ RAGFlow's file management allows you to *link* an uploaded file to multiple data ![link knowledgebase](https://github.com/infiniflow/ragflow/assets/93570324/6c6b8db4-3269-4e35-9434-6089887e3e3f) -You can link your file to one dataset or multiple datasets at one time: +You can link your file to one dataset or multiple datasets at one time: ![link multiple kb](https://github.com/infiniflow/ragflow/assets/93570324/6c508803-fb1f-435d-b688-683066fd7fff) @@ -68,9 +71,9 @@ RAGFlow's file management allows you to rename a file or folder: ## Delete files or folders -RAGFlow's file management allows you to delete files or folders individually or in bulk. 
+RAGFlow's file management allows you to delete files or folders individually or in bulk. -To delete a file or folder: +To delete a file or folder: ![delete file](https://github.com/infiniflow/ragflow/assets/93570324/85872728-125d-45e9-a0ee-21e9d4cedb8b) @@ -78,7 +81,7 @@ To bulk delete files or folders: ![bulk delete](https://github.com/infiniflow/ragflow/assets/93570324/519b99ab-ec7f-4c8a-8cea-e0b6dcb3cb46) -> - You are not allowed to delete the **root/.knowledgebase** folder. +> - You are not allowed to delete the **root/.knowledgebase** folder. > - Deleting files that have been linked to datasets will **AUTOMATICALLY REMOVE** all associated file references across the datasets. ## Download uploaded file @@ -87,4 +90,4 @@ RAGFlow's file management allows you to download an uploaded file: ![download_file](https://github.com/infiniflow/ragflow/assets/93570324/cf3b297f-7d9b-4522-bf5f-4f45743e4ed5) -> As of RAGFlow v0.23.1, bulk download is not supported, nor can you download an entire folder. +> As of RAGFlow v0.23.1, bulk download is not supported, nor can you download an entire folder. diff --git a/docs/guides/migration/_category_.json b/docs/guides/migration/_category_.json index dcb812716..1099886f2 100644 --- a/docs/guides/migration/_category_.json +++ b/docs/guides/migration/_category_.json @@ -4,5 +4,8 @@ "link": { "type": "generated-index", "description": "RAGFlow migration guide" + }, + "customProps": { + "categoryIcon": "LucideArrowRightLeft" } } diff --git a/docs/guides/models/_category_.json b/docs/guides/models/_category_.json index 8536f8e47..b4a996b4f 100644 --- a/docs/guides/models/_category_.json +++ b/docs/guides/models/_category_.json @@ -4,5 +4,8 @@ "link": { "type": "generated-index", "description": "Guides on model settings." 
+ }, + "customProps": { + "categoryIcon": "LucideBox" } } diff --git a/docs/guides/models/deploy_local_llm.mdx b/docs/guides/models/deploy_local_llm.mdx index 7d8e58eee..2e141a79e 100644 --- a/docs/guides/models/deploy_local_llm.mdx +++ b/docs/guides/models/deploy_local_llm.mdx @@ -1,6 +1,9 @@ --- sidebar_position: 2 slug: /deploy_local_llm +sidebar_custom_props: { + categoryIcon: LucideMonitorCog +} --- # Deploy local models @@ -53,9 +56,9 @@ $ sudo docker exec ollama ollama pull llama3.2 ``` ```bash -$ sudo docker exec ollama ollama pull bge-m3 -> pulling daec91ffb5dd... 100% ▕████████████████▏ 1.2 GB -> success +$ sudo docker exec ollama ollama pull bge-m3 +> pulling daec91ffb5dd... 100% ▕████████████████▏ 1.2 GB +> success ``` ### 2. Find Ollama URL and ensure it is accessible @@ -105,7 +108,7 @@ Max retries exceeded with url: /api/chat (Caused by NewConnectionError('** **Model providers** **>** **System Model Settings** to update your model: - + - *You should now be able to find **llama3.2** from the dropdown list under **Chat model**, and **bge-m3** from the dropdown list under **Embedding model**.* ### 6. Update Chat Configuration @@ -125,7 +128,7 @@ To deploy a local model, e.g., **Mistral**, using Xinference: ### 1. Check firewall settings -Ensure that your host machine's firewall allows inbound connections on port 9997. +Ensure that your host machine's firewall allows inbound connections on port 9997. ### 2. Start an Xinference instance @@ -148,13 +151,13 @@ In RAGFlow, click on your logo on the top right of the page **>** **Model provid ### 5. Complete basic Xinference settings -Enter an accessible base URL, such as `http://:9997/v1`. +Enter an accessible base URL, such as `http://:9997/v1`. > For rerank model, please use the `http://:9997/v1/rerank` as the base URL. ### 6. Update System Model Settings Click on your logo **>** **Model providers** **>** **System Model Settings** to update your model. 
- + *You should now be able to find **mistral** from the dropdown list under **Chat model**.* ### 7. Update Chat Configuration @@ -170,7 +173,7 @@ To deploy a local model, e.g., **Qwen2**, using IPEX-LLM-accelerated Ollama: ### 1. Check firewall settings Ensure that your host machine's firewall allows inbound connections on port 11434. For example: - + ```bash sudo ufw allow 11434/tcp ``` @@ -179,7 +182,7 @@ sudo ufw allow 11434/tcp #### 2.1 Install IPEX-LLM for Ollama -:::tip NOTE +:::tip NOTE IPEX-LLM supports Ollama on Linux and Windows systems. ::: @@ -191,7 +194,7 @@ For detailed information about installing IPEX-LLM for Ollama, see [Run llama.cp #### 2.2 Initialize Ollama -1. Activate the `llm-cpp` Conda environment and initialize Ollama: +1. Activate the `llm-cpp` Conda environment and initialize Ollama: - + ```bash conda activate llm-cpp init-ollama @@ -218,7 +221,7 @@ For detailed information about installing IPEX-LLM for Ollama, see [Run llama.cp 2. If the installed `ipex-llm[cpp]` requires an upgrade to the Ollama binary files, remove the old binary files and reinitialize Ollama using `init-ollama` (Linux) or `init-ollama.bat` (Windows). - + *A symbolic link to Ollama appears in your current directory, and you can use this executable file following standard Ollama commands.* #### 2.3 Launch Ollama service @@ -226,7 +229,7 @@ For detailed information about installing IPEX-LLM for Ollama, see [Run llama.cp 1. Set the environment variable `OLLAMA_NUM_GPU` to `999` to ensure that all layers of your model run on the Intel GPU; otherwise, some layers may default to CPU. 2. For optimal performance on Intel Arc™ A-Series Graphics with Linux OS (Kernel 6.2), set the following environment variable before launching the Ollama service: - ```bash + ```bash export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ``` 3. Launch the Ollama service: @@ -314,12 +317,12 @@ To enable IPEX-LLM accelerated Ollama in RAGFlow, you must also complete the con
[Update System Model Settings](#6-update-system-model-settings) 4. [Update Chat Configuration](#7-update-chat-configuration) -### 5. Deploy VLLM +### 5. Deploy vLLM (Ubuntu 22.04/24.04) ```bash - pip install vllm + pip install vllm ``` ### 5.1 Run vLLM with best practices diff --git a/docs/guides/models/llm_api_key_setup.md b/docs/guides/models/llm_api_key_setup.md index f61d71c58..b996105c4 100644 --- a/docs/guides/models/llm_api_key_setup.md +++ b/docs/guides/models/llm_api_key_setup.md @@ -1,6 +1,9 @@ --- sidebar_position: 1 slug: /llm_api_key_setup +sidebar_custom_props: { + categoryIcon: LucideKey +} --- # Configure model API key @@ -30,7 +33,7 @@ You have two options for configuring your model API key: - Update `api_key` with yours. - Update `base_url` if you use a proxy to connect to the remote service. 3. Reboot your system for your changes to take effect. -4. Log into RAGFlow. +4. Log into RAGFlow. _After logging into RAGFlow, you will find that your chosen model appears under **Added models** on the **Model providers** page._ ### Configure model API key after logging into RAGFlow diff --git a/docs/guides/team/_category_.json b/docs/guides/team/_category_.json index 37bbf1307..f245a5f35 100644 --- a/docs/guides/team/_category_.json +++ b/docs/guides/team/_category_.json @@ -4,5 +4,8 @@ "link": { "type": "generated-index", "description": "Team-specific guides."
+ }, + "customProps": { + "categoryIcon": "LucideUsers" } } diff --git a/docs/guides/team/join_or_leave_team.md b/docs/guides/team/join_or_leave_team.md index 978523d80..a4acf5737 100644 --- a/docs/guides/team/join_or_leave_team.md +++ b/docs/guides/team/join_or_leave_team.md @@ -1,6 +1,9 @@ --- sidebar_position: 3 slug: /join_or_leave_team +sidebar_custom_props: { + categoryIcon: LucideLogOut +} --- # Join or leave a team diff --git a/docs/guides/team/manage_team_members.md b/docs/guides/team/manage_team_members.md index edd8289cd..c529c1c06 100644 --- a/docs/guides/team/manage_team_members.md +++ b/docs/guides/team/manage_team_members.md @@ -1,6 +1,9 @@ --- sidebar_position: 2 slug: /manage_team_members +sidebar_custom_props: { + categoryIcon: LucideUserCog +} --- # Manage team members diff --git a/docs/guides/team/share_agents.md b/docs/guides/team/share_agents.md index f6be1a728..84f13e7c0 100644 --- a/docs/guides/team/share_agents.md +++ b/docs/guides/team/share_agents.md @@ -1,6 +1,9 @@ --- sidebar_position: 6 slug: /share_agent +sidebar_custom_props: { + categoryIcon: LucideShare2 +} --- # Share Agent @@ -11,7 +14,7 @@ Share an Agent with your team members. When ready, you may share your Agents with your team members so that they can use them. Please note that your Agents are not shared automatically; you must manually enable sharing by selecting the corresponding **Permissions** radio button: -1. Click the intended Agent to open its editing canvas. +1. Click the intended Agent to open its editing canvas. 2. Click **Management** > **Settings** to show the **Agent settings** dialogue. 3. Change **Permissions** from **Only me** to **Team**. 4. Click **Save** to apply your changes. 
diff --git a/docs/guides/team/share_chat_assistant.md b/docs/guides/team/share_chat_assistant.md index f8f172ee5..c8d04eb8b 100644 --- a/docs/guides/team/share_chat_assistant.md +++ b/docs/guides/team/share_chat_assistant.md @@ -1,6 +1,9 @@ --- sidebar_position: 5 slug: /share_chat_assistant +sidebar_custom_props: { + categoryIcon: LucideShare2 +} --- # Share chat assistant diff --git a/docs/guides/team/share_knowledge_bases.md b/docs/guides/team/share_knowledge_bases.md index 4eeccd264..57e67912e 100644 --- a/docs/guides/team/share_knowledge_bases.md +++ b/docs/guides/team/share_knowledge_bases.md @@ -1,6 +1,9 @@ --- sidebar_position: 4 slug: /share_datasets +sidebar_custom_props: { + categoryIcon: LucideShare2 +} --- # Share dataset diff --git a/docs/guides/team/share_model.md b/docs/guides/team/share_model.md index 459641fca..831415baa 100644 --- a/docs/guides/team/share_model.md +++ b/docs/guides/team/share_model.md @@ -1,6 +1,9 @@ --- sidebar_position: 7 slug: /share_model +sidebar_custom_props: { + categoryIcon: LucideShare2 +} --- # Share models diff --git a/docs/guides/tracing.mdx b/docs/guides/tracing.mdx index c9f37ba75..41b5a41a6 100644 --- a/docs/guides/tracing.mdx +++ b/docs/guides/tracing.mdx @@ -1,6 +1,9 @@ --- sidebar_position: 9 slug: /tracing +sidebar_custom_props: { + categoryIcon: LucideLocateFixed +} --- # Tracing @@ -15,10 +18,10 @@ This document is contributed by our community contributor [jannikmaierhoefer](ht RAGFlow ships with a built-in [Langfuse](https://langfuse.com) integration so that you can **inspect and debug every retrieval and generation step** of your RAG pipelines in near real-time. -Langfuse stores traces, spans and prompt payloads in a purpose-built observability backend and offers filtering and visualisations on top. +Langfuse stores traces, spans and prompt payloads in a purpose-built observability backend and offers filtering and visualisations on top. 
:::info NOTE -• RAGFlow **≥ 0.18.0** (contains the Langfuse connector) +• RAGFlow **≥ 0.18.0** (contains the Langfuse connector) • A Langfuse workspace (cloud or self-hosted) with a _Project Public Key_ and _Secret Key_ ::: @@ -26,9 +29,9 @@ Langfuse stores traces, spans and prompt payloads in a purpose-built observabili ## 1. Collect your Langfuse credentials -1. Sign in to your Langfuse dashboard. -2. Open **Settings ▸ Projects** and either create a new project or select an existing one. -3. Copy the **Public Key** and **Secret Key**. +1. Sign in to your Langfuse dashboard. +2. Open **Settings ▸ Projects** and either create a new project or select an existing one. +3. Copy the **Public Key** and **Secret Key**. 4. Note the Langfuse **host** (e.g. `https://cloud.langfuse.com`). Use the base URL of your own installation if you self-host. > The keys are _project-scoped_: one pair of keys is enough for all environments that should write into the same project. @@ -39,10 +42,10 @@ Langfuse stores traces, spans and prompt payloads in a purpose-built observabili RAGFlow stores the credentials _per tenant_. You can configure them either via the web UI or the HTTP API. -1. Log in to RAGFlow and click your avatar in the top-right corner. -2. Select **API ▸ Scroll down to the bottom ▸ Langfuse Configuration**. -3. Fill in you Langfuse **Host**, **Public Key** and **Secret Key**. -4. Click **Save**. +1. Log in to RAGFlow and click your avatar in the top-right corner. +2. Select **API ▸ Scroll down to the bottom ▸ Langfuse Configuration**. +3. Fill in your Langfuse **Host**, **Public Key** and **Secret Key**. +4. Click **Save**. ![Example RAGFlow trace in Langfuse](https://langfuse.com/images/docs/ragflow/ragflow-configuration.gif) @@ -52,14 +55,14 @@ Once saved, RAGFlow starts emitting traces automatically – no code change requ ## 3. Run a pipeline and watch the traces -1. Execute any chat or retrieval pipeline in RAGFlow (e.g. the Quickstart demo). -2.
Open your Langfuse project ▸ **Traces**. +1. Execute any chat or retrieval pipeline in RAGFlow (e.g. the Quickstart demo). +2. Open your Langfuse project ▸ **Traces**. 3. Filter by **name ~ `ragflow-*`** (RAGFlow prefixes each trace with `ragflow-`). For every user request you will see: -• a **trace** representing the overall request -• **spans** for retrieval, ranking and generation steps +• a **trace** representing the overall request +• **spans** for retrieval, ranking and generation steps • the complete **prompts**, **retrieved documents** and **LLM responses** as metadata ![Example RAGFlow trace in Langfuse](https://langfuse.com/images/docs/ragflow/ragflow-trace-frame.png) diff --git a/docs/guides/upgrade_ragflow.mdx b/docs/guides/upgrade_ragflow.mdx index 419fe76e4..e299dc74b 100644 --- a/docs/guides/upgrade_ragflow.mdx +++ b/docs/guides/upgrade_ragflow.mdx @@ -1,6 +1,9 @@ --- sidebar_position: 11 slug: /upgrade_ragflow +sidebar_custom_props: { + categoryIcon: LucideArrowBigUpDash +} --- # Upgrading diff --git a/docs/quickstart.mdx b/docs/quickstart.mdx index 387de9d79..3a0f336eb 100644 --- a/docs/quickstart.mdx +++ b/docs/quickstart.mdx @@ -1,6 +1,9 @@ --- sidebar_position: 0 slug: / +sidebar_custom_props: { + sidebarIcon: LucideRocket +} --- # Get started @@ -12,9 +15,9 @@ RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on d This quick start guide describes a general process from: -- Starting up a local RAGFlow server, -- Creating a dataset, -- Intervening with file parsing, to +- Starting up a local RAGFlow server, +- Creating a dataset, +- Intervening with file parsing, to - Establishing an AI chat based on your datasets. :::danger IMPORTANT @@ -71,7 +74,7 @@ This section provides instructions on setting up the RAGFlow server on Linux. If :::caution WARNING This change will be reset after a system reboot. 
If you forget to update the value the next time you start up the server, you may get a `Can't connect to ES cluster` exception. ::: - + 1.3. To ensure your change remains permanent, add or update the `vm.max_map_count` value in **/etc/sysctl.conf** accordingly: ```bash @@ -145,7 +148,7 @@ This section provides instructions on setting up the RAGFlow server on Linux. If ``` #### If you are on Windows with Docker Desktop WSL 2 backend, then use docker-desktop to set `vm.max_map_count`: - 1.1. Run the following in WSL: + 1.1. Run the following in WSL: ```bash $ wsl -d docker-desktop -u root $ sysctl -w vm.max_map_count=262144 @@ -172,7 +175,7 @@ This section provides instructions on setting up the RAGFlow server on Linux. If ``` ```bash - # Append a line, which reads: + # Append a line, which reads: vm.max_map_count = 262144 ``` ::: @@ -227,13 +230,13 @@ This section provides instructions on setting up the RAGFlow server on Linux. If / /_/ // /| | / / __ / /_ / // __ \| | /| / / / _, _// ___ |/ /_/ // __/ / // /_/ /| |/ |/ / /_/ |_|/_/ |_|\____//_/ /_/ \____/ |__/|__/ - + * Running on all addresses (0.0.0.0) ``` :::danger IMPORTANT If you skip this confirmation step and directly log in to RAGFlow, your browser may prompt a `network anomaly` error because, at that moment, your RAGFlow may not be fully initialized. - ::: + ::: 5. In your web browser, enter the IP address of your server and log in to RAGFlow. @@ -245,24 +248,24 @@ This section provides instructions on setting up the RAGFlow server on Linux. If RAGFlow is a RAG engine and needs to work with an LLM to offer grounded, hallucination-free question-answering capabilities. RAGFlow supports most mainstream LLMs. For a complete list of supported models, please refer to [Supported Models](./references/supported_models.mdx). -:::note -RAGFlow also supports deploying LLMs locally using Ollama, Xinference, or LocalAI, but this part is not covered in this quick start guide. 
+:::note +RAGFlow also supports deploying LLMs locally using Ollama, Xinference, or LocalAI, but this part is not covered in this quick start guide. ::: -To add and configure an LLM: +To add and configure an LLM: 1. Click on your logo on the top right of the page **>** **Model providers**. 2. Click on the desired LLM and update the API key accordingly. -3. Click **System Model Settings** to select the default models: +3. Click **System Model Settings** to select the default models: - - Chat model, - - Embedding model, + - Chat model, + - Embedding model, - Image-to-text model, - and more. -> Some models, such as the image-to-text model **qwen-vl-max**, are subsidiary to a specific LLM. And you may need to update your API key to access these models. +> Some models, such as the image-to-text model **qwen-vl-max**, are subsidiary to a specific LLM. And you may need to update your API key to access these models. ## Create your first dataset @@ -278,21 +281,21 @@ To create your first dataset: ![dataset configuration](https://raw.githubusercontent.com/infiniflow/ragflow-docs/main/images/configure_knowledge_base.jpg) -3. RAGFlow offers multiple chunk templates that cater to different document layouts and file formats. Select the embedding model and chunking method (template) for your dataset. +3. RAGFlow offers multiple chunk templates that cater to different document layouts and file formats. Select the embedding model and chunking method (template) for your dataset. - :::danger IMPORTANT - Once you have selected an embedding model and used it to parse a file, you are no longer allowed to change it. The obvious reason is that we must ensure that all files in a specific dataset are parsed using the *same* embedding model (ensure that they are being compared in the same embedding space). + :::danger IMPORTANT + Once you have selected an embedding model and used it to parse a file, you are no longer allowed to change it. 
The obvious reason is that we must ensure that all files in a specific dataset are parsed using the *same* embedding model (ensure that they are being compared in the same embedding space). ::: _You are taken to the **Dataset** page of your dataset._ -4. Click **+ Add file** **>** **Local files** to start uploading a particular file to the dataset. +4. Click **+ Add file** **>** **Local files** to start uploading a particular file to the dataset. 5. In the uploaded file entry, click the play button to start file parsing: ![parse file](https://raw.githubusercontent.com/infiniflow/ragflow-docs/main/images/parse_file.jpg) - :::caution NOTE + :::caution NOTE - If your file parsing gets stuck at below 1%, see [this FAQ](./faq.mdx#why-does-my-document-parsing-stall-at-under-one-percent). - If your file parsing gets stuck at near completion, see [this FAQ](./faq.mdx#why-does-my-pdf-parsing-stall-near-completion-while-the-log-does-not-show-any-error) ::: diff --git a/docs/references/_category_.json b/docs/references/_category_.json index fec533560..f41a83cc7 100644 --- a/docs/references/_category_.json +++ b/docs/references/_category_.json @@ -4,5 +4,8 @@ "link": { "type": "generated-index", "description": "Miscellaneous References" + }, + "customProps": { + "sidebarIcon": "LucideScrollText" } } diff --git a/docs/references/glossary.mdx b/docs/references/glossary.mdx index ceec555dd..f4cb071e7 100644 --- a/docs/references/glossary.mdx +++ b/docs/references/glossary.mdx @@ -1,6 +1,9 @@ --- sidebar_position: 0 slug: /glossary +sidebar_custom_props: { + categoryIcon: LucideCaseUpper +} --- # Glossary diff --git a/docs/references/http_api_reference.md b/docs/references/http_api_reference.md index 8cc35ac7e..872c3cedb 100644 --- a/docs/references/http_api_reference.md +++ b/docs/references/http_api_reference.md @@ -1,6 +1,9 @@ --- sidebar_position: 4 slug: /http_api_reference +sidebar_custom_props: { + categoryIcon: LucideGlobe +} --- # HTTP API @@ -79,17 +82,17 @@ curl 
--request POST \ ##### Request Parameters -- `model` (*Body parameter*) `string`, *Required* +- `model` (*Body parameter*) `string`, *Required* The model used to generate the response. The server will parse this automatically, so you can set it to any value for now. -- `messages` (*Body parameter*) `list[object]`, *Required* +- `messages` (*Body parameter*) `list[object]`, *Required* A list of historical chat messages used to generate the response. This must contain at least one message with the `user` role. -- `stream` (*Body parameter*) `boolean` +- `stream` (*Body parameter*) `boolean` Whether to receive the response as a stream. Set this to `false` explicitly if you prefer to receive the entire response in one go instead of as a stream. -- `extra_body` (*Body parameter*) `object` - Extra request parameters: +- `extra_body` (*Body parameter*) `object` + Extra request parameters: - `reference`: `boolean` - include reference in the final chunk (stream) or in the final message (non-stream). - `metadata_condition`: `object` - metadata filter conditions applied to retrieval results. @@ -209,16 +212,16 @@ curl --request POST \ ##### Request Parameters -- `model` (*Body parameter*) `string`, *Required* +- `model` (*Body parameter*) `string`, *Required* The model used to generate the response. The server will parse this automatically, so you can set it to any value for now. -- `messages` (*Body parameter*) `list[object]`, *Required* +- `messages` (*Body parameter*) `list[object]`, *Required* A list of historical chat messages used to generate the response. This must contain at least one message with the `user` role. -- `stream` (*Body parameter*) `boolean` +- `stream` (*Body parameter*) `boolean` Whether to receive the response as a stream. Set this to `false` explicitly if you prefer to receive the entire response in one go instead of as a stream. -- `session_id` (*Body parameter*) `string` +- `session_id` (*Body parameter*) `string` Agent session id. 
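The request-parameter rules above (a `messages` list with at least one `user`-role message; optional `stream` and `extra_body.reference`) can be sketched in Python. This is an illustrative helper, not part of RAGFlow: the function name is hypothetical, and `model` is set to a throwaway value because the server parses it automatically.

```python
import json

def build_chat_request(messages, stream=False, reference=False):
    """Validate and assemble the JSON body for a chat completion request."""
    # The reference above requires at least one message with the `user` role.
    if not any(m.get("role") == "user" for m in messages):
        raise ValueError("messages must contain at least one 'user' message")
    body = {"model": "model", "messages": messages, "stream": stream}
    if reference:
        # extra_body.reference asks for citations in the final chunk (stream)
        # or in the final message (non-stream).
        body["extra_body"] = {"reference": True}
    return json.dumps(body)

payload = build_chat_request(
    [{"role": "user", "content": "What is RAGFlow?"}], stream=False)
```

Sending `payload` with your HTTP client of choice is left out deliberately; the sketch only encodes the body-level rules stated in the reference.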
#### Response @@ -474,33 +477,33 @@ curl --request POST \ ##### Request parameters -- `"name"`: (*Body parameter*), `string`, *Required* - The unique name of the dataset to create. It must adhere to the following requirements: +- `"name"`: (*Body parameter*), `string`, *Required* + The unique name of the dataset to create. It must adhere to the following requirements: - Basic Multilingual Plane (BMP) only - Maximum 128 characters - Case-insensitive -- `"avatar"`: (*Body parameter*), `string` +- `"avatar"`: (*Body parameter*), `string` Base64 encoding of the avatar. - Maximum 65535 characters -- `"description"`: (*Body parameter*), `string` +- `"description"`: (*Body parameter*), `string` A brief description of the dataset to create. - Maximum 65535 characters -- `"embedding_model"`: (*Body parameter*), `string` +- `"embedding_model"`: (*Body parameter*), `string` The name of the embedding model to use. For example: `"BAAI/bge-large-zh-v1.5@BAAI"` - Maximum 255 characters - Must follow `model_name@model_factory` format -- `"permission"`: (*Body parameter*), `string` - Specifies who can access the dataset to create. Available options: +- `"permission"`: (*Body parameter*), `string` + Specifies who can access the dataset to create. Available options: - `"me"`: (Default) Only you can manage the dataset. - `"team"`: All team members can manage the dataset. -- `"chunk_method"`: (*Body parameter*), `enum` - The default chunk method of the dataset to create. Mutually exclusive with `"parse_type"` and `"pipeline_id"`. If you set `"chunk_method"`, do not include `"parse_type"` or `"pipeline_id"`. - Available options: +- `"chunk_method"`: (*Body parameter*), `enum` + The default chunk method of the dataset to create. Mutually exclusive with `"parse_type"` and `"pipeline_id"`. If you set `"chunk_method"`, do not include `"parse_type"` or `"pipeline_id"`. 
+ Available options: - `"naive"`: General (default) - `"book"`: Book - `"email"`: Email @@ -514,8 +517,8 @@ curl --request POST \ - `"table"`: Table - `"tag"`: Tag -- `"parser_config"`: (*Body parameter*), `object` - The configuration settings for the dataset parser. The attributes in this JSON object vary with the selected `"chunk_method"`: +- `"parser_config"`: (*Body parameter*), `object` + The configuration settings for the dataset parser. The attributes in this JSON object vary with the selected `"chunk_method"`: - If `"chunk_method"` is `"naive"`, the `"parser_config"` object contains the following attributes: - `"auto_keywords"`: `int` - Defaults to `0` @@ -547,17 +550,17 @@ curl --request POST \ - Defaults to: `{"use_raptor": false}` - `"graphrag"`: `object` GRAPHRAG-specific settings. - Defaults to: `{"use_graphrag": false}` - - If `"chunk_method"` is `"qa"`, `"manuel"`, `"paper"`, `"book"`, `"laws"`, or `"presentation"`, the `"parser_config"` object contains the following attribute: + - If `"chunk_method"` is `"qa"`, `"manuel"`, `"paper"`, `"book"`, `"laws"`, or `"presentation"`, the `"parser_config"` object contains the following attribute: - `"raptor"`: `object` RAPTOR-specific settings. - Defaults to: `{"use_raptor": false}`. - If `"chunk_method"` is `"table"`, `"picture"`, `"one"`, or `"email"`, `"parser_config"` is an empty JSON object. -- `"parse_type"`: (*Body parameter*), `int` - The ingestion pipeline parse type identifier, i.e., the number of parsers in your **Parser** component. +- `"parse_type"`: (*Body parameter*), `int` + The ingestion pipeline parse type identifier, i.e., the number of parsers in your **Parser** component. - Required (along with `"pipeline_id"`) if specifying an ingestion pipeline. - Must not be included when `"chunk_method"` is specified. -- `"pipeline_id"`: (*Body parameter*), `string` +- `"pipeline_id"`: (*Body parameter*), `string` The ingestion pipeline ID. Can be found in the corresponding URL in the RAGFlow UI. 
- Required (along with `"parse_type"`) if specifying an ingestion pipeline. - Must be a 32-character lowercase hexadecimal string, e.g., `"d0bebe30ae2211f0970942010a8e0005"`. @@ -594,10 +597,10 @@ Success: "name": "RAGFlow example", "pagerank": 0, "parser_config": { - "chunk_token_num": 128, - "delimiter": "\\n!?;。;!?", - "html4excel": false, - "layout_recognize": "DeepDOC", + "chunk_token_num": 128, + "delimiter": "\\n!?;。;!?", + "html4excel": false, + "layout_recognize": "DeepDOC", "raptor": { "use_raptor": false } @@ -655,7 +658,7 @@ curl --request DELETE \ ##### Request parameters -- `"ids"`: (*Body parameter*), `list[string]` or `null`, *Required* +- `"ids"`: (*Body parameter*), `list[string]` or `null`, *Required* Specifies the datasets to delete: - If `null`, all datasets will be deleted. - If an array of IDs, only the specified datasets will be deleted. @@ -667,7 +670,7 @@ Success: ```json { - "code": 0 + "code": 0 } ``` @@ -720,32 +723,32 @@ curl --request PUT \ ##### Request parameters -- `dataset_id`: (*Path parameter*) +- `dataset_id`: (*Path parameter*) The ID of the dataset to update. -- `"name"`: (*Body parameter*), `string` +- `"name"`: (*Body parameter*), `string` The revised name of the dataset. - Basic Multilingual Plane (BMP) only - Maximum 128 characters - Case-insensitive -- `"avatar"`: (*Body parameter*), `string` +- `"avatar"`: (*Body parameter*), `string` The updated base64 encoding of the avatar. - Maximum 65535 characters -- `"embedding_model"`: (*Body parameter*), `string` - The updated embedding model name. +- `"embedding_model"`: (*Body parameter*), `string` + The updated embedding model name. - Ensure that `"chunk_count"` is `0` before updating `"embedding_model"`. - Maximum 255 characters - Must follow `model_name@model_factory` format -- `"permission"`: (*Body parameter*), `string` - The updated dataset permission. Available options: +- `"permission"`: (*Body parameter*), `string` + The updated dataset permission. 
Available options: - `"me"`: (Default) Only you can manage the dataset. - `"team"`: All team members can manage the dataset. -- `"pagerank"`: (*Body parameter*), `int` +- `"pagerank"`: (*Body parameter*), `int` refer to [Set page rank](https://ragflow.io/docs/dev/set_page_rank) - Default: `0` - Minimum: `0` - Maximum: `100` -- `"chunk_method"`: (*Body parameter*), `enum` - The chunking method for the dataset. Available options: +- `"chunk_method"`: (*Body parameter*), `enum` + The chunking method for the dataset. Available options: - `"naive"`: General (default) - `"book"`: Book - `"email"`: Email @@ -758,8 +761,8 @@ curl --request PUT \ - `"qa"`: Q&A - `"table"`: Table - `"tag"`: Tag -- `"parser_config"`: (*Body parameter*), `object` - The configuration settings for the dataset parser. The attributes in this JSON object vary with the selected `"chunk_method"`: +- `"parser_config"`: (*Body parameter*), `object` + The configuration settings for the dataset parser. The attributes in this JSON object vary with the selected `"chunk_method"`: - If `"chunk_method"` is `"naive"`, the `"parser_config"` object contains the following attributes: - `"auto_keywords"`: `int` - Defaults to `0` @@ -788,7 +791,7 @@ curl --request PUT \ - Defaults to: `{"use_raptor": false}` - `"graphrag"`: `object` GRAPHRAG-specific settings. - Defaults to: `{"use_graphrag": false}` - - If `"chunk_method"` is `"qa"`, `"manuel"`, `"paper"`, `"book"`, `"laws"`, or `"presentation"`, the `"parser_config"` object contains the following attribute: + - If `"chunk_method"` is `"qa"`, `"manual"`, `"paper"`, `"book"`, `"laws"`, or `"presentation"`, the `"parser_config"` object contains the following attribute: - `"raptor"`: `object` RAPTOR-specific settings. - Defaults to: `{"use_raptor": false}`. - If `"chunk_method"` is `"table"`, `"picture"`, `"one"`, or `"email"`, `"parser_config"` is an empty JSON object.
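The mutual exclusivity between `"chunk_method"` and the ingestion-pipeline parameters (`"pipeline_id"` plus `"parse_type"`) is easy to get wrong from a client. A minimal client-side pre-flight check, sketched in Python; the helper name and error strings are ours, not part of the API:

```python
import re

# Per the docs: a 32-character lowercase hexadecimal string.
PIPELINE_ID_RE = re.compile(r"^[0-9a-f]{32}$")

def validate_dataset_body(body: dict) -> list:
    """Return a list of problems with a create/update-dataset request body.

    Encodes the documented rules: pipeline_id and parse_type are required
    together, and neither may be combined with chunk_method.
    """
    errors = []
    has_pipeline = "pipeline_id" in body or "parse_type" in body
    if has_pipeline:
        if "chunk_method" in body:
            errors.append("chunk_method must not be set with an ingestion pipeline")
        if "pipeline_id" not in body or "parse_type" not in body:
            errors.append("pipeline_id and parse_type are required together")
        pid = body.get("pipeline_id")
        if pid is not None and not PIPELINE_ID_RE.match(pid):
            errors.append("pipeline_id must be a 32-char lowercase hex string")
    return errors
```

Running such a check before issuing the request surfaces a malformed body without a round trip to the server.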
@@ -799,7 +802,7 @@ Success: ```json { - "code": 0 + "code": 0 } ``` @@ -837,19 +840,19 @@ curl --request GET \ ##### Request parameters -- `page`: (*Filter parameter*) +- `page`: (*Filter parameter*) Specifies the page on which the datasets will be displayed. Defaults to `1`. -- `page_size`: (*Filter parameter*) +- `page_size`: (*Filter parameter*) The number of datasets on each page. Defaults to `30`. -- `orderby`: (*Filter parameter*) +- `orderby`: (*Filter parameter*) The field by which datasets should be sorted. Available options: - `create_time` (default) - `update_time` -- `desc`: (*Filter parameter*) +- `desc`: (*Filter parameter*) Indicates whether the retrieved datasets should be sorted in descending order. Defaults to `true`. -- `name`: (*Filter parameter*) +- `name`: (*Filter parameter*) The name of the dataset to retrieve. -- `id`: (*Filter parameter*) +- `id`: (*Filter parameter*) The ID of the dataset to retrieve. #### Response @@ -932,7 +935,7 @@ curl --request GET \ ##### Request parameters -- `dataset_id`: (*Path parameter*) +- `dataset_id`: (*Path parameter*) The ID of the target dataset. #### Response @@ -1012,7 +1015,7 @@ curl --request DELETE \ ##### Request parameters -- `dataset_id`: (*Path parameter*) +- `dataset_id`: (*Path parameter*) The ID of the target dataset. #### Response @@ -1060,7 +1063,7 @@ curl --request POST \ ##### Request parameters -- `dataset_id`: (*Path parameter*) +- `dataset_id`: (*Path parameter*) The ID of the target dataset. #### Response @@ -1110,7 +1113,7 @@ curl --request GET \ ##### Request parameters -- `dataset_id`: (*Path parameter*) +- `dataset_id`: (*Path parameter*) The ID of the target dataset. #### Response @@ -1175,7 +1178,7 @@ curl --request POST \ ##### Request parameters -- `dataset_id`: (*Path parameter*) +- `dataset_id`: (*Path parameter*) The ID of the target dataset. 
#### Response @@ -1225,7 +1228,7 @@ curl --request GET \ ##### Request parameters -- `dataset_id`: (*Path parameter*) +- `dataset_id`: (*Path parameter*) The ID of the target dataset. #### Response @@ -1301,9 +1304,9 @@ curl --request POST \ ##### Request parameters -- `dataset_id`: (*Path parameter*) +- `dataset_id`: (*Path parameter*) The ID of the dataset to which the documents will be uploaded. -- `'file'`: (*Body parameter*) +- `'file'`: (*Body parameter*) A document to upload. #### Response @@ -1378,8 +1381,8 @@ curl --request PUT \ --header 'Content-Type: application/json' \ --data ' { - "name": "manual.txt", - "chunk_method": "manual", + "name": "manual.txt", + "chunk_method": "manual", "parser_config": {"chunk_token_num": 128} }' @@ -1387,14 +1390,14 @@ curl --request PUT \ ##### Request parameters -- `dataset_id`: (*Path parameter*) +- `dataset_id`: (*Path parameter*) The ID of the associated dataset. -- `document_id`: (*Path parameter*) +- `document_id`: (*Path parameter*) The ID of the document to update. - `"name"`: (*Body parameter*), `string` - `"meta_fields"`: (*Body parameter*), `dict[str, Any]` The meta fields of the document. -- `"chunk_method"`: (*Body parameter*), `string` - The parsing method to apply to the document: +- `"chunk_method"`: (*Body parameter*), `string` + The parsing method to apply to the document: - `"naive"`: General - `"manual`: Manual - `"qa"`: Q&A @@ -1406,8 +1409,8 @@ curl --request PUT \ - `"picture"`: Picture - `"one"`: One - `"email"`: Email -- `"parser_config"`: (*Body parameter*), `object` - The configuration settings for the dataset parser. The attributes in this JSON object vary with the selected `"chunk_method"`: +- `"parser_config"`: (*Body parameter*), `object` + The configuration settings for the dataset parser. 
The attributes in this JSON object vary with the selected `"chunk_method"`: - If `"chunk_method"` is `"naive"`, the `"parser_config"` object contains the following attributes: - `"chunk_token_num"`: Defaults to `256`. - `"layout_recognize"`: Defaults to `true`. @@ -1418,10 +1421,10 @@ curl --request PUT \ - If `"chunk_method"` is `"qa"`, `"manuel"`, `"paper"`, `"book"`, `"laws"`, or `"presentation"`, the `"parser_config"` object contains the following attribute: - `"raptor"`: RAPTOR-specific settings. Defaults to: `{"use_raptor": false}`. - If `"chunk_method"` is `"table"`, `"picture"`, `"one"`, or `"email"`, `"parser_config"` is an empty JSON object. -- `"enabled"`: (*Body parameter*), `integer` - Whether the document should be **available** in the knowledge base. - - `1` → (available) - - `0` → (unavailable) +- `"enabled"`: (*Body parameter*), `integer` + Whether the document should be **available** in the knowledge base. + - `1` → (available) + - `0` → (unavailable) #### Response @@ -1545,9 +1548,9 @@ curl --request GET \ ##### Request parameters -- `dataset_id`: (*Path parameter*) +- `dataset_id`: (*Path parameter*) The associated dataset ID. -- `documents_id`: (*Path parameter*) +- `documents_id`: (*Path parameter*) The ID of the document to download. #### Response @@ -1595,30 +1598,30 @@ curl --request GET \ ##### Request parameters -- `dataset_id`: (*Path parameter*) +- `dataset_id`: (*Path parameter*) The associated dataset ID. -- `keywords`: (*Filter parameter*), `string` +- `keywords`: (*Filter parameter*), `string` The keywords used to match document titles. - `page`: (*Filter parameter*), `integer` Specifies the page on which the documents will be displayed. Defaults to `1`. -- `page_size`: (*Filter parameter*), `integer` +- `page_size`: (*Filter parameter*), `integer` The maximum number of documents on each page. Defaults to `30`. 
-- `orderby`: (*Filter parameter*), `string` +- `orderby`: (*Filter parameter*), `string` The field by which documents should be sorted. Available options: - `create_time` (default) - `update_time` -- `desc`: (*Filter parameter*), `boolean` +- `desc`: (*Filter parameter*), `boolean` Indicates whether the retrieved documents should be sorted in descending order. Defaults to `true`. -- `id`: (*Filter parameter*), `string` +- `id`: (*Filter parameter*), `string` The ID of the document to retrieve. -- `create_time_from`: (*Filter parameter*), `integer` +- `create_time_from`: (*Filter parameter*), `integer` Unix timestamp for filtering documents created after this time. 0 means no filter. Defaults to `0`. -- `create_time_to`: (*Filter parameter*), `integer` +- `create_time_to`: (*Filter parameter*), `integer` Unix timestamp for filtering documents created before this time. 0 means no filter. Defaults to `0`. -- `suffix`: (*Filter parameter*), `array[string]` +- `suffix`: (*Filter parameter*), `array[string]` Filter by file suffix. Supports multiple values, e.g., `pdf`, `txt`, and `docx`. Defaults to all suffixes. -- `run`: (*Filter parameter*), `array[string]` - Filter by document processing status. Supports numeric, text, and mixed formats: +- `run`: (*Filter parameter*), `array[string]` + Filter by document processing status. Supports numeric, text, and mixed formats: - Numeric format: `["0", "1", "2", "3", "4"]` - Text format: `[UNSTART, RUNNING, CANCEL, DONE, FAIL]` - Mixed format: `[UNSTART, 1, DONE]` (mixing numeric and text formats) @@ -1627,7 +1630,7 @@ curl --request GET \ - `1` / `RUNNING`: Document is currently being processed - `2` / `CANCEL`: Document processing was cancelled - `3` / `DONE`: Document processing completed successfully - - `4` / `FAIL`: Document processing failed + - `4` / `FAIL`: Document processing failed Defaults to all statuses. 
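Because the `run` filter accepts numeric, text, and mixed formats, a client may want to normalize values to one canonical form before building the query string. A sketch of the documented mapping; the function name is illustrative, not part of the API:

```python
# Documented mapping between numeric and text processing statuses.
RUN_STATUS = {"0": "UNSTART", "1": "RUNNING", "2": "CANCEL", "3": "DONE", "4": "FAIL"}

def normalize_run_filter(values):
    """Normalize a mixed run filter, e.g. ["UNSTART", 1, "DONE"], to text statuses."""
    out = []
    for v in values:
        s = str(v).upper()
        if s in RUN_STATUS:           # numeric form: "0".."4"
            s = RUN_STATUS[s]
        if s not in RUN_STATUS.values():
            raise ValueError(f"unknown run status: {v!r}")
        if s not in out:              # drop duplicates, keep order
            out.append(s)
    return out
```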
- `metadata_condition`: (*Filter parameter*), `object` (JSON in query) Optional metadata filter applied to documents when `document_ids` is not provided. Uses the same structure as retrieval: @@ -1741,9 +1744,9 @@ curl --request DELETE \ ##### Request parameters -- `dataset_id`: (*Path parameter*) +- `dataset_id`: (*Path parameter*) The associated dataset ID. -- `"ids"`: (*Body parameter*), `list[string]` +- `"ids"`: (*Body parameter*), `list[string]` The IDs of the documents to delete. If it is not specified, all documents in the specified dataset will be deleted. #### Response @@ -1798,9 +1801,9 @@ curl --request POST \ ##### Request parameters -- `dataset_id`: (*Path parameter*) +- `dataset_id`: (*Path parameter*) The dataset ID. -- `"document_ids"`: (*Body parameter*), `list[string]`, *Required* +- `"document_ids"`: (*Body parameter*), `list[string]`, *Required* The IDs of the documents to parse. #### Response @@ -1855,9 +1858,9 @@ curl --request DELETE \ ##### Request parameters -- `dataset_id`: (*Path parameter*) +- `dataset_id`: (*Path parameter*) The associated dataset ID. -- `"document_ids"`: (*Body parameter*), `list[string]`, *Required* +- `"document_ids"`: (*Body parameter*), `list[string]`, *Required* The IDs of the documents for which the parsing should be stopped. #### Response @@ -1917,13 +1920,13 @@ curl --request POST \ ##### Request parameters -- `dataset_id`: (*Path parameter*) +- `dataset_id`: (*Path parameter*) The associated dataset ID. -- `document_ids`: (*Path parameter*) +- `document_ids`: (*Path parameter*) The associated document ID. -- `"content"`: (*Body parameter*), `string`, *Required* +- `"content"`: (*Body parameter*), `string`, *Required* The text content of the chunk. -- `"important_keywords`(*Body parameter*), `list[string]` +- `"important_keywords"`: (*Body parameter*), `list[string]` The key terms or phrases to tag with the chunk.
- `"questions"`(*Body parameter*), `list[string]` If there is a given question, the embedded chunks will be based on them @@ -1979,22 +1982,22 @@ Lists chunks in a specified document. ```bash curl --request GET \ --url http://{address}/api/v1/datasets/{dataset_id}/documents/{document_id}/chunks?keywords={keywords}&page={page}&page_size={page_size}&id={chunk_id} \ - --header 'Authorization: Bearer ' + --header 'Authorization: Bearer ' ``` ##### Request parameters -- `dataset_id`: (*Path parameter*) +- `dataset_id`: (*Path parameter*) The associated dataset ID. -- `document_id`: (*Path parameter*) +- `document_id`: (*Path parameter*) The associated document ID. -- `keywords`(*Filter parameter*), `string` +- `keywords`: (*Filter parameter*), `string` The keywords used to match chunk content. -- `page`(*Filter parameter*), `integer` +- `page`: (*Filter parameter*), `integer` Specifies the page on which the chunks will be displayed. Defaults to `1`. -- `page_size`(*Filter parameter*), `integer` +- `page_size`: (*Filter parameter*), `integer` The maximum number of chunks on each page. Defaults to `1024`. -- `id`(*Filter parameter*), `string` +- `id`: (*Filter parameter*), `string` The ID of the chunk to retrieve. #### Response @@ -2099,11 +2102,11 @@ curl --request DELETE \ ##### Request parameters -- `dataset_id`: (*Path parameter*) +- `dataset_id`: (*Path parameter*) The associated dataset ID. -- `document_ids`: (*Path parameter*) +- `document_ids`: (*Path parameter*) The associated document ID. -- `"chunk_ids"`: (*Body parameter*), `list[string]` +- `"chunk_ids"`: (*Body parameter*), `list[string]` The IDs of the chunks to delete. If it is not specified, all chunks of the specified document will be deleted.
#### Response @@ -2153,26 +2156,26 @@ curl --request PUT \ --header 'Content-Type: application/json' \ --header 'Authorization: Bearer ' \ --data ' - { - "content": "ragflow123", - "important_keywords": [] + { + "content": "ragflow123", + "important_keywords": [] }' ``` ##### Request parameters -- `dataset_id`: (*Path parameter*) +- `dataset_id`: (*Path parameter*) The associated dataset ID. -- `document_ids`: (*Path parameter*) +- `document_ids`: (*Path parameter*) The associated document ID. -- `chunk_id`: (*Path parameter*) +- `chunk_id`: (*Path parameter*) The ID of the chunk to update. -- `"content"`: (*Body parameter*), `string` +- `"content"`: (*Body parameter*), `string` The text content of the chunk. -- `"important_keywords"`: (*Body parameter*), `list[string]` +- `"important_keywords"`: (*Body parameter*), `list[string]` A list of key terms or phrases to tag with the chunk. -- `"available"`: (*Body parameter*) `boolean` - The chunk's availability status in the dataset. Value options: +- `"available"`: (*Body parameter*) `boolean` + The chunk's availability status in the dataset. Value options: - `true`: Available (default) - `false`: Unavailable @@ -2248,18 +2251,18 @@ Batch update or delete document-level metadata within a specified dataset. If bo #### Request parameters -- `dataset_id`: (*Path parameter*) +- `dataset_id`: (*Path parameter*) The associated dataset ID. -- `"selector"`: (*Body parameter*), `object`, *optional* - A document selector: - - `"document_ids"`: `list[string]` *optional* - The associated document ID. - - `"metadata_condition"`: `object`, *optional* +- `"selector"`: (*Body parameter*), `object`, *optional* + A document selector: + - `"document_ids"`: `list[string]` *optional* + The associated document ID. + - `"metadata_condition"`: `object`, *optional* - `"logic"`: Defines the logic relation between conditions if multiple conditions are provided. 
Options: - `"and"` (default) - `"or"` - - `"conditions"`: `list[object]` *optional* - Each object: `{ "name": string, "comparison_operator": string, "value": string }` + - `"conditions"`: `list[object]` *optional* + Each object: `{ "name": string, "comparison_operator": string, "value": string }` - `"name"`: `string` The key name to search by. - `"comparison_operator"`: `string` Available options: - `"is"` @@ -2276,14 +2279,14 @@ Batch update or delete document-level metadata within a specified dataset. If bo - `"≤"` - `"empty"` - `"not empty"` - - `"value"`: `string` The key value to search by. -- `"updates"`: (*Body parameter*), `list[object]`, *optional* - Replaces metadata of the retrieved documents. Each object: `{ "key": string, "match": string, "value": string }`. + - `"value"`: `string` The key value to search by. +- `"updates"`: (*Body parameter*), `list[object]`, *optional* + Replaces metadata of the retrieved documents. Each object: `{ "key": string, "match": string, "value": string }`. - `"key"`: `string` The name of the key to update. - `"match"`: `string` *optional* The current value of the key to update. When omitted, the corresponding keys are updated to `"value"` regardless of their current values. - `"value"`: `string` The new value to set for the specified keys. -- `"deletes`: (*Body parameter*), `list[ojbect]`, *optional* - Deletes metadata of the retrieved documents. Each object: `{ "key": string, "value": string }`. +- `"deletes"`: (*Body parameter*), `list[object]`, *optional* + Deletes metadata of the retrieved documents. Each object: `{ "key": string, "value": string }`. - `"key"`: `string` The name of the key to delete. - `"value"`: `string` *Optional* The value of the key to delete. - When provided, only keys with a matching value are deleted. @@ -2345,16 +2348,16 @@ Retrieves chunks from specified datasets.
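The `"updates"` and `"deletes"` rules described above can be pictured as a local transformation on a single document's metadata object. A hedged sketch: the helper is illustrative, and since the docs do not say whether an update inserts keys that are absent, this sketch only rewrites existing keys:

```python
def apply_metadata_ops(meta: dict, updates=(), deletes=()) -> dict:
    """Apply the documented batch-metadata semantics to one document's metadata."""
    meta = dict(meta)  # work on a copy
    for op in updates:
        key = op["key"]
        # When "match" is omitted, the key is updated regardless of its value.
        if key in meta and ("match" not in op or meta[key] == op["match"]):
            meta[key] = op["value"]
    for op in deletes:
        key = op["key"]
        # When "value" is provided, only keys with a matching value are deleted.
        if key in meta and ("value" not in op or meta[key] == op["value"]):
            del meta[key]
    return meta
```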
- `'content-Type: application/json'` - `'Authorization: Bearer '` - Body: - - `"question"`: `string` - - `"dataset_ids"`: `list[string]` + - `"question"`: `string` + - `"dataset_ids"`: `list[string]` - `"document_ids"`: `list[string]` - - `"page"`: `integer` - - `"page_size"`: `integer` - - `"similarity_threshold"`: `float` - - `"vector_similarity_weight"`: `float` - - `"top_k"`: `integer` - - `"rerank_id"`: `string` - - `"keyword"`: `boolean` + - `"page"`: `integer` + - `"page_size"`: `integer` + - `"similarity_threshold"`: `float` + - `"vector_similarity_weight"`: `float` + - `"top_k"`: `integer` + - `"rerank_id"`: `string` + - `"keyword"`: `boolean` - `"highlight"`: `boolean` - `"cross_languages"`: `list[string]` - `"metadata_condition"`: `object` @@ -2393,45 +2396,45 @@ curl --request POST \ ##### Request parameter -- `"question"`: (*Body parameter*), `string`, *Required* +- `"question"`: (*Body parameter*), `string`, *Required* The user query or query keywords. -- `"dataset_ids"`: (*Body parameter*) `list[string]` +- `"dataset_ids"`: (*Body parameter*) `list[string]` The IDs of the datasets to search. If you do not set this argument, ensure that you set `"document_ids"`. -- `"document_ids"`: (*Body parameter*), `list[string]` +- `"document_ids"`: (*Body parameter*), `list[string]` The IDs of the documents to search. Ensure that all selected documents use the same embedding model. Otherwise, an error will occur. If you do not set this argument, ensure that you set `"dataset_ids"`. -- `"page"`: (*Body parameter*), `integer` +- `"page"`: (*Body parameter*), `integer` Specifies the page on which the chunks will be displayed. Defaults to `1`. -- `"page_size"`: (*Body parameter*) +- `"page_size"`: (*Body parameter*) The maximum number of chunks on each page. Defaults to `30`. -- `"similarity_threshold"`: (*Body parameter*) +- `"similarity_threshold"`: (*Body parameter*) The minimum similarity score. Defaults to `0.2`. 
-- `"vector_similarity_weight"`: (*Body parameter*), `float` +- `"vector_similarity_weight"`: (*Body parameter*), `float` The weight of vector cosine similarity. Defaults to `0.3`. If x represents the weight of vector cosine similarity, then (1 - x) is the term similarity weight. -- `"top_k"`: (*Body parameter*), `integer` +- `"top_k"`: (*Body parameter*), `integer` The number of chunks engaged in vector cosine computation. Defaults to `1024`. -- `"use_kg"`: (*Body parameter*), `boolean` +- `"use_kg"`: (*Body parameter*), `boolean` Whether to search chunks related to the generated knowledge graph for multi-hop queries. Defaults to `False`. Before enabling this, ensure you have successfully constructed a knowledge graph for the specified datasets. See [here](https://ragflow.io/docs/dev/construct_knowledge_graph) for details. -- `"toc_enhance"`: (*Body parameter*), `boolean` +- `"toc_enhance"`: (*Body parameter*), `boolean` Whether to search chunks with extracted table of content. Defaults to `False`. Before enabling this, ensure you have enabled `TOC_Enhance` and successfully extracted table of contents for the specified datasets. See [here](https://ragflow.io/docs/dev/enable_table_of_contents) for details. -- `"rerank_id"`: (*Body parameter*), `integer` +- `"rerank_id"`: (*Body parameter*), `integer` The ID of the rerank model. -- `"keyword"`: (*Body parameter*), `boolean` - Indicates whether to enable keyword-based matching: +- `"keyword"`: (*Body parameter*), `boolean` + Indicates whether to enable keyword-based matching: - `true`: Enable keyword-based matching. - `false`: Disable keyword-based matching (default). -- `"highlight"`: (*Body parameter*), `boolean` - Specifies whether to enable highlighting of matched terms in the results: +- `"highlight"`: (*Body parameter*), `boolean` + Specifies whether to enable highlighting of matched terms in the results: - `true`: Enable highlighting of matched terms. - `false`: Disable highlighting of matched terms (default). 
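The `"vector_similarity_weight"` description above implies a simple blend: with weight x on vector cosine similarity, term similarity carries 1 - x, and `"similarity_threshold"` prunes low scorers. A sketch of the arithmetic, not RAGFlow's internal implementation:

```python
def hybrid_scores(pairs, vector_weight=0.3, threshold=0.2):
    """Score (term_sim, vector_sim) pairs per the documented weighting.

    hybrid = x * vector_sim + (1 - x) * term_sim, where x is
    vector_similarity_weight; scores below similarity_threshold drop out.
    """
    scored = [vector_weight * v + (1 - vector_weight) * t for t, v in pairs]
    return [round(s, 6) for s in scored if s >= threshold]
```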
-- `"cross_languages"`: (*Body parameter*) `list[string]` +- `"cross_languages"`: (*Body parameter*) `list[string]` The languages that should be translated into, in order to achieve keywords retrievals in different languages. -- `"metadata_condition"`: (*Body parameter*), `object` - The metadata condition used for filtering chunks: +- `"metadata_condition"`: (*Body parameter*), `object` + The metadata condition used for filtering chunks: - `"logic"`: (*Body parameter*), `string` - `"and"`: Return only results that satisfy *every* condition (default). - `"or"`: Return results that satisfy *any* condition. - - `"conditions"`: (*Body parameter*), `array` - A list of metadata filter conditions. + - `"conditions"`: (*Body parameter*), `array` + A list of metadata filter conditions. - `"name"`: `string` - The metadata field name to filter by, e.g., `"author"`, `"company"`, `"url"`. Ensure this parameter before use. See [Set metadata](../guides/dataset/set_metadata.md) for details. - `comparison_operator`: `string` - The comparison operator. Can be one of: - `"contains"` @@ -2538,16 +2541,16 @@ curl --request POST \ ##### Request parameters -- `"name"`: (*Body parameter*), `string`, *Required* +- `"name"`: (*Body parameter*), `string`, *Required* The name of the chat assistant. -- `"avatar"`: (*Body parameter*), `string` +- `"avatar"`: (*Body parameter*), `string` Base64 encoding of the avatar. -- `"dataset_ids"`: (*Body parameter*), `list[string]` +- `"dataset_ids"`: (*Body parameter*), `list[string]` The IDs of the associated datasets. -- `"llm"`: (*Body parameter*), `object` - The LLM settings for the chat assistant to create. If it is not explicitly set, a JSON object with the following values will be generated as the default. An `llm` JSON object contains the following attributes: - - `"model_name"`, `string` - The chat model name. If not set, the user's default chat model will be used. 
+- `"llm"`: (*Body parameter*), `object` + The LLM settings for the chat assistant to create. If it is not explicitly set, a JSON object with the following values will be generated as the default. An `llm` JSON object contains the following attributes: + - `"model_name"`, `string` + The chat model name. If not set, the user's default chat model will be used. :::caution WARNING `model_type` is an *internal* parameter, serving solely as a temporary workaround for the current model-configuration design limitations. @@ -2558,23 +2561,23 @@ curl --request POST \ - It is subject to change or removal in future releases. ::: - - `"model_type"`: `string` + - `"model_type"`: `string` A model type specifier. Only `"chat"` and `"image2text"` are recognized; any other inputs, or when omitted, are treated as `"chat"`. - `"model_name"`, `string` - - `"temperature"`: `float` - Controls the randomness of the model's predictions. A lower temperature results in more conservative responses, while a higher temperature yields more creative and diverse responses. Defaults to `0.1`. - - `"top_p"`: `float` - Also known as “nucleus sampling”, this parameter sets a threshold to select a smaller set of words to sample from. It focuses on the most likely words, cutting off the less probable ones. Defaults to `0.3` - - `"presence_penalty"`: `float` + - `"temperature"`: `float` + Controls the randomness of the model's predictions. A lower temperature results in more conservative responses, while a higher temperature yields more creative and diverse responses. Defaults to `0.1`. + - `"top_p"`: `float` + Also known as “nucleus sampling”, this parameter sets a threshold to select a smaller set of words to sample from. It focuses on the most likely words, cutting off the less probable ones. Defaults to `0.3` + - `"presence_penalty"`: `float` This discourages the model from repeating the same information by penalizing words that have already appeared in the conversation. Defaults to `0.4`. 
- - `"frequency penalty"`: `float` + - `"frequency_penalty"`: `float` Similar to the presence penalty, this reduces the model’s tendency to repeat the same words frequently. Defaults to `0.7`. -- `"prompt"`: (*Body parameter*), `object` - Instructions for the LLM to follow. If it is not explicitly set, a JSON object with the following values will be generated as the default. A `prompt` JSON object contains the following attributes: +- `"prompt"`: (*Body parameter*), `object` + Instructions for the LLM to follow. If it is not explicitly set, a JSON object with the following values will be generated as the default. A `prompt` JSON object contains the following attributes: - `"similarity_threshold"`: `float` RAGFlow employs either a combination of weighted keyword similarity and weighted vector cosine similarity, or a combination of weighted keyword similarity and weighted reranking score during retrieval. This argument sets the threshold for similarities between the user query and chunks. If a similarity score falls below this threshold, the corresponding chunk will be excluded from the results. The default value is `0.2`. - `"keywords_similarity_weight"`: `float` This argument sets the weight of keyword similarity in the hybrid similarity score with vector cosine similarity or reranking model similarity. By adjusting this weight, you can control the influence of keyword similarity in relation to other similarity measures. The default value is `0.7`. - `"top_n"`: `int` This argument specifies the number of top chunks with similarity scores above the `similarity_threshold` that are fed to the LLM. The LLM will *only* access these 'top N' chunks. The default value is `6`. - - `"variables"`: `object[]` This argument lists the variables to use in the 'System' field of **Chat Configurations**. Note that: + - `"variables"`: `object[]` This argument lists the variables to use in the 'System' field of **Chat Configurations**.
Note that: - `"knowledge"` is a reserved variable, which represents the retrieved chunks. - All the variables in 'System' should be curly bracketed. - The default value is `[{"key": "knowledge", "optional": true}]`. @@ -2682,32 +2685,32 @@ curl --request PUT \ #### Parameters -- `chat_id`: (*Path parameter*) +- `chat_id`: (*Path parameter*) The ID of the chat assistant to update. -- `"name"`: (*Body parameter*), `string`, *Required* +- `"name"`: (*Body parameter*), `string`, *Required* The revised name of the chat assistant. -- `"avatar"`: (*Body parameter*), `string` +- `"avatar"`: (*Body parameter*), `string` Base64 encoding of the avatar. -- `"dataset_ids"`: (*Body parameter*), `list[string]` +- `"dataset_ids"`: (*Body parameter*), `list[string]` The IDs of the associated datasets. -- `"llm"`: (*Body parameter*), `object` - The LLM settings for the chat assistant to create. If it is not explicitly set, a dictionary with the following values will be generated as the default. An `llm` object contains the following attributes: - - `"model_name"`, `string` - The chat model name. If not set, the user's default chat model will be used. - - `"temperature"`: `float` - Controls the randomness of the model's predictions. A lower temperature results in more conservative responses, while a higher temperature yields more creative and diverse responses. Defaults to `0.1`. - - `"top_p"`: `float` - Also known as “nucleus sampling”, this parameter sets a threshold to select a smaller set of words to sample from. It focuses on the most likely words, cutting off the less probable ones. Defaults to `0.3` - - `"presence_penalty"`: `float` +- `"llm"`: (*Body parameter*), `object` + The LLM settings for the chat assistant to create. If it is not explicitly set, a dictionary with the following values will be generated as the default. An `llm` object contains the following attributes: + - `"model_name"`, `string` + The chat model name. 
If not set, the user's default chat model will be used. + - `"temperature"`: `float` + Controls the randomness of the model's predictions. A lower temperature results in more conservative responses, while a higher temperature yields more creative and diverse responses. Defaults to `0.1`. + - `"top_p"`: `float` + Also known as “nucleus sampling”, this parameter sets a threshold to select a smaller set of words to sample from. It focuses on the most likely words, cutting off the less probable ones. Defaults to `0.3` + - `"presence_penalty"`: `float` This discourages the model from repeating the same information by penalizing words that have already appeared in the conversation. Defaults to `0.2`. - - `"frequency penalty"`: `float` + - `"frequency_penalty"`: `float` Similar to the presence penalty, this reduces the model’s tendency to repeat the same words frequently. Defaults to `0.7`. -- `"prompt"`: (*Body parameter*), `object` - Instructions for the LLM to follow. A `prompt` object contains the following attributes: +- `"prompt"`: (*Body parameter*), `object` + Instructions for the LLM to follow. A `prompt` object contains the following attributes: - `"similarity_threshold"`: `float` RAGFlow employs either a combination of weighted keyword similarity and weighted vector cosine similarity, or a combination of weighted keyword similarity and weighted rerank score during retrieval. This argument sets the threshold for similarities between the user query and chunks. If a similarity score falls below this threshold, the corresponding chunk will be excluded from the results. The default value is `0.2`. - `"keywords_similarity_weight"`: `float` This argument sets the weight of keyword similarity in the hybrid similarity score with vector cosine similarity or reranking model similarity. By adjusting this weight, you can control the influence of keyword similarity in relation to other similarity measures. The default value is `0.7`.
- `"top_n"`: `int` This argument specifies the number of top chunks with similarity scores above the `similarity_threshold` that are fed to the LLM. The LLM will *only* access these 'top N' chunks. The default value is `8`. - - `"variables"`: `object[]` This argument lists the variables to use in the 'System' field of **Chat Configurations**. Note that: + - `"variables"`: `object[]` This argument lists the variables to use in the 'System' field of **Chat Configurations**. Note that: - `"knowledge"` is a reserved variable, which represents the retrieved chunks. - All the variables in 'System' should be curly bracketed. - The default value is `[{"key": "knowledge", "optional": true}]` @@ -2769,7 +2772,7 @@ curl --request DELETE \ ##### Request parameters -- `"ids"`: (*Body parameter*), `list[string]` +- `"ids"`: (*Body parameter*), `list[string]` The IDs of the chat assistants to delete. If it is not specified, all chat assistants in the system will be deleted. #### Response @@ -2816,19 +2819,19 @@ curl --request GET \ ##### Request parameters -- `page`: (*Filter parameter*), `integer` +- `page`: (*Filter parameter*), `integer` Specifies the page on which the chat assistants will be displayed. Defaults to `1`. -- `page_size`: (*Filter parameter*), `integer` +- `page_size`: (*Filter parameter*), `integer` The number of chat assistants on each page. Defaults to `30`. -- `orderby`: (*Filter parameter*), `string` +- `orderby`: (*Filter parameter*), `string` The attribute by which the results are sorted. Available options: - `create_time` (default) - `update_time` -- `desc`: (*Filter parameter*), `boolean` +- `desc`: (*Filter parameter*), `boolean` Indicates whether the retrieved chat assistants should be sorted in descending order. Defaults to `true`. -- `id`: (*Filter parameter*), `string` +- `id`: (*Filter parameter*), `string` The ID of the chat assistant to retrieve. 
-- `name`: (*Filter parameter*), `string` +- `name`: (*Filter parameter*), `string` The name of the chat assistant to retrieve. #### Response @@ -2929,11 +2932,11 @@ curl --request POST \ ##### Request parameters -- `chat_id`: (*Path parameter*) +- `chat_id`: (*Path parameter*) The ID of the associated chat assistant. -- `"name"`: (*Body parameter*), `string` +- `"name"`: (*Body parameter*), `string` The name of the chat session to create. -- `"user_id"`: (*Body parameter*), `string` +- `"user_id"`: (*Body parameter*), `string` Optional user-defined ID. #### Response @@ -3004,13 +3007,13 @@ curl --request PUT \ ##### Request Parameter -- `chat_id`: (*Path parameter*) +- `chat_id`: (*Path parameter*) The ID of the associated chat assistant. -- `session_id`: (*Path parameter*) +- `session_id`: (*Path parameter*) The ID of the session to update. -- `"name"`: (*Body Parameter*), `string` +- `"name"`: (*Body Parameter*), `string` The revised name of the session. -- `"user_id"`: (*Body parameter*), `string` +- `"user_id"`: (*Body parameter*), `string` Optional user-defined ID. #### Response @@ -3057,23 +3060,23 @@ curl --request GET \ ##### Request Parameters -- `chat_id`: (*Path parameter*) +- `chat_id`: (*Path parameter*) The ID of the associated chat assistant. -- `page`: (*Filter parameter*), `integer` +- `page`: (*Filter parameter*), `integer` Specifies the page on which the sessions will be displayed. Defaults to `1`. -- `page_size`: (*Filter parameter*), `integer` +- `page_size`: (*Filter parameter*), `integer` The number of sessions on each page. Defaults to `30`. -- `orderby`: (*Filter parameter*), `string` - The field by which sessions should be sorted. Available options: +- `orderby`: (*Filter parameter*), `string` + The field by which sessions should be sorted. 
Available options: - `create_time` (default) - `update_time` -- `desc`: (*Filter parameter*), `boolean` +- `desc`: (*Filter parameter*), `boolean` Indicates whether the retrieved sessions should be sorted in descending order. Defaults to `true`. -- `name`: (*Filter parameter*) `string` +- `name`: (*Filter parameter*) `string` The name of the chat session to retrieve. -- `id`: (*Filter parameter*), `string` +- `id`: (*Filter parameter*), `string` The ID of the chat session to retrieve. -- `user_id`: (*Filter parameter*), `string` +- `user_id`: (*Filter parameter*), `string` The optional user-defined ID passed in when creating session. #### Response @@ -3145,9 +3148,9 @@ curl --request DELETE \ ##### Request Parameters -- `chat_id`: (*Path parameter*) +- `chat_id`: (*Path parameter*) The ID of the associated chat assistant. -- `"ids"`: (*Body Parameter*), `list[string]` +- `"ids"`: (*Body Parameter*), `list[string]` The IDs of the sessions to delete. If it is not specified, all sessions associated with the specified chat assistant will be deleted. #### Response @@ -3243,20 +3246,20 @@ curl --request POST \ ##### Request Parameters -- `chat_id`: (*Path parameter*) +- `chat_id`: (*Path parameter*) The ID of the associated chat assistant. -- `"question"`: (*Body Parameter*), `string`, *Required* +- `"question"`: (*Body Parameter*), `string`, *Required* The question to start an AI-powered conversation. -- `"stream"`: (*Body Parameter*), `boolean` +- `"stream"`: (*Body Parameter*), `boolean` Indicates whether to output responses in a streaming way: - `true`: Enable streaming (default). - `false`: Disable streaming. -- `"session_id"`: (*Body Parameter*) +- `"session_id"`: (*Body Parameter*) The ID of session. If it is not provided, a new session will be generated. -- `"user_id"`: (*Body parameter*), `string` +- `"user_id"`: (*Body parameter*), `string` The optional user-defined ID. Valid *only* when no `session_id` is provided. 
-- `"metadata_condition"`: (*Body parameter*), `object` - Optional metadata filter conditions applied to retrieval results. +- `"metadata_condition"`: (*Body parameter*), `object` + Optional metadata filter conditions applied to retrieval results. - `logic`: `string`, one of `and` / `or` - `conditions`: `list[object]` where each condition contains: - `name`: `string` metadata key @@ -3411,9 +3414,9 @@ curl --request POST \ ##### Request parameters -- `agent_id`: (*Path parameter*) +- `agent_id`: (*Path parameter*) The ID of the associated agent. -- `user_id`: (*Filter parameter*) +- `user_id`: (*Filter parameter*) The optional user-defined ID for parsing docs (especially images) when creating a session while uploading files. #### Response @@ -3625,7 +3628,7 @@ Failure: ### Converse with agent -**POST** `/api/v1/agents/{agent_id}/completions` +**POST** `/api/v1/agents/{agent_id}/completions` Asks a specified agent a question to start an AI-powered conversation. @@ -3687,7 +3690,7 @@ curl --request POST \ }' ``` -- If the **Begin** component takes parameters, include their values in the body of `"inputs"` as follows: +- If the **Begin** component takes parameters, include their values in the body of `"inputs"` as follows: ```bash curl --request POST \ @@ -3740,24 +3743,24 @@ curl --request POST \ ##### Request Parameters -- `agent_id`: (*Path parameter*), `string` +- `agent_id`: (*Path parameter*), `string` The ID of the associated agent. -- `"question"`: (*Body Parameter*), `string`, *Required* +- `"question"`: (*Body Parameter*), `string`, *Required* The question to start an AI-powered conversation. -- `"stream"`: (*Body Parameter*), `boolean` - Indicates whether to output responses in a streaming way: +- `"stream"`: (*Body Parameter*), `boolean` + Indicates whether to output responses in a streaming way: - `true`: Enable streaming (default). - `false`: Disable streaming. 
-- `"session_id"`: (*Body Parameter*) +- `"session_id"`: (*Body Parameter*) The ID of the session. If it is not provided, a new session will be generated. -- `"inputs"`: (*Body Parameter*) - Variables specified in the **Begin** component. -- `"user_id"`: (*Body parameter*), `string` +- `"inputs"`: (*Body Parameter*) + Variables specified in the **Begin** component. +- `"user_id"`: (*Body parameter*), `string` The optional user-defined ID. Valid *only* when no `session_id` is provided. :::tip NOTE -For now, this method does *not* support a file type input/variable. As a workaround, use the following to upload a file to an agent: -`http://{address}/v1/canvas/upload/{agent_id}` +For now, this method does *not* support a file type input/variable. As a workaround, use the following to upload a file to an agent: +`http://{address}/v1/canvas/upload/{agent_id}` *You will get a corresponding file ID from its response body.* ::: @@ -4304,23 +4307,23 @@ curl --request GET \ ##### Request Parameters -- `agent_id`: (*Path parameter*) +- `agent_id`: (*Path parameter*) The ID of the associated agent. -- `page`: (*Filter parameter*), `integer` +- `page`: (*Filter parameter*), `integer` Specifies the page on which the sessions will be displayed. Defaults to `1`. -- `page_size`: (*Filter parameter*), `integer` +- `page_size`: (*Filter parameter*), `integer` The number of sessions on each page. Defaults to `30`. -- `orderby`: (*Filter parameter*), `string` - The field by which sessions should be sorted. Available options: +- `orderby`: (*Filter parameter*), `string` + The field by which sessions should be sorted. Available options: - `create_time` (default) - `update_time` -- `desc`: (*Filter parameter*), `boolean` +- `desc`: (*Filter parameter*), `boolean` Indicates whether the retrieved sessions should be sorted in descending order. Defaults to `true`. -- `id`: (*Filter parameter*), `string` +- `id`: (*Filter parameter*), `string` The ID of the agent session to retrieve. 
-- `user_id`: (*Filter parameter*), `string` +- `user_id`: (*Filter parameter*), `string` The optional user-defined ID passed in when creating session. -- `dsl`: (*Filter parameter*), `boolean` +- `dsl`: (*Filter parameter*), `boolean` Indicates whether to include the dsl field of the sessions in the response. Defaults to `true`. #### Response @@ -4506,9 +4509,9 @@ curl --request DELETE \ ##### Request Parameters -- `agent_id`: (*Path parameter*) +- `agent_id`: (*Path parameter*) The ID of the associated agent. -- `"ids"`: (*Body Parameter*), `list[string]` +- `"ids"`: (*Body Parameter*), `list[string]` The IDs of the sessions to delete. If it is not specified, all sessions associated with the specified agent will be deleted. #### Response @@ -4639,19 +4642,19 @@ curl --request GET \ ##### Request parameters -- `page`: (*Filter parameter*), `integer` +- `page`: (*Filter parameter*), `integer` Specifies the page on which the agents will be displayed. Defaults to `1`. -- `page_size`: (*Filter parameter*), `integer` +- `page_size`: (*Filter parameter*), `integer` The number of agents on each page. Defaults to `30`. -- `orderby`: (*Filter parameter*), `string` +- `orderby`: (*Filter parameter*), `string` The attribute by which the results are sorted. Available options: - `create_time` (default) - `update_time` -- `desc`: (*Filter parameter*), `boolean` +- `desc`: (*Filter parameter*), `boolean` Indicates whether the retrieved agents should be sorted in descending order. Defaults to `true`. -- `id`: (*Filter parameter*), `string` +- `id`: (*Filter parameter*), `string` The ID of the agent to retrieve. -- `title`: (*Filter parameter*), `string` +- `title`: (*Filter parameter*), `string` The name of the agent to retrieve. #### Response @@ -4763,11 +4766,11 @@ curl --request POST \ ##### Request parameters -- `title`: (*Body parameter*), `string`, *Required* +- `title`: (*Body parameter*), `string`, *Required* The title of the agent. 
-- `description`: (*Body parameter*), `string` +- `description`: (*Body parameter*), `string` The description of the agent. Defaults to `None`. -- `dsl`: (*Body parameter*), `object`, *Required* +- `dsl`: (*Body parameter*), `object`, *Required* The canvas DSL object of the agent. #### Response @@ -4829,13 +4832,13 @@ curl --request PUT \ ##### Request parameters -- `agent_id`: (*Path parameter*), `string` +- `agent_id`: (*Path parameter*), `string` The id of the agent to be updated. -- `title`: (*Body parameter*), `string` +- `title`: (*Body parameter*), `string` The title of the agent. -- `description`: (*Body parameter*), `string` +- `description`: (*Body parameter*), `string` The description of the agent. -- `dsl`: (*Body parameter*), `object` +- `dsl`: (*Body parameter*), `object` The canvas DSL object of the agent. Only specify the parameter you want to change in the request body. If a parameter does not exist or is `None`, it won't be updated. @@ -4889,7 +4892,7 @@ curl --request DELETE \ ##### Request parameters -- `agent_id`: (*Path parameter*), `string` +- `agent_id`: (*Path parameter*), `string` The id of the agent to be deleted. #### Response @@ -4943,7 +4946,7 @@ curl --request GET ##### Request parameters -- `address`: (*Path parameter*), string +- `address`: (*Path parameter*), `string` The host and port of the backend service (e.g., `localhost:7897`). --- @@ -4986,11 +4989,11 @@ Content-Type: application/json } ``` -Explanation: +Explanation: -- Each service is reported as "ok" or "nok". -- The top-level `status` reflects overall health. -- If any service is "nok", detailed error info appears in `_meta`. +- Each service is reported as "ok" or "nok". +- The top-level `status` reflects overall health. +- If any service is "nok", detailed error info appears in `_meta`.
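The health-check contract described above (per-service `"ok"`/`"nok"` values, an overall `status`, error details in `_meta`) can be interpreted client-side. Below is a minimal Python sketch, assuming the services appear as top-level keys of the JSON body alongside `status` and `_meta`; the service names and the sample payload are illustrative, not taken from a real deployment:

```python
def failing_services(payload: dict) -> list[str]:
    """Return the names of services reported as "nok" in a health-check payload."""
    return [
        name for name, state in payload.items()
        # "status" and "_meta" are bookkeeping keys, not services
        if name not in ("status", "_meta") and state == "nok"
    ]

# Illustrative payload shaped like the response documented above.
sample = {
    "status": "nok",
    "doc_engine": "ok",
    "redis": "nok",
    "_meta": {"redis": {"error": "connection refused"}},
}

print(failing_services(sample))
```

In practice the payload would come from an HTTP GET against the health endpoint; the interpretation logic above is independent of how the response is fetched.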
--- @@ -5029,9 +5032,9 @@ curl --request POST \ ##### Request parameters -- `'file'`: (*Form parameter*), `file`, *Required* +- `'file'`: (*Form parameter*), `file`, *Required* The file(s) to upload. Multiple files can be uploaded in a single request. -- `'parent_id'`: (*Form parameter*), `string` +- `'parent_id'`: (*Form parameter*), `string` The parent folder ID where the file will be uploaded. If not specified, files will be uploaded to the root folder. #### Response @@ -5100,11 +5103,11 @@ curl --request POST \ ##### Request parameters -- `"name"`: (*Body parameter*), `string`, *Required* +- `"name"`: (*Body parameter*), `string`, *Required* The name of the file or folder to create. -- `"parent_id"`: (*Body parameter*), `string` +- `"parent_id"`: (*Body parameter*), `string` The parent folder ID. If not specified, the file/folder will be created in the root folder. -- `"type"`: (*Body parameter*), `string` +- `"type"`: (*Body parameter*), `string` The type of the file to create. Available options: - `"FOLDER"`: Create a folder - `"VIRTUAL"`: Create a virtual file @@ -5161,18 +5164,18 @@ curl --request GET \ ##### Request parameters -- `parent_id`: (*Filter parameter*), `string` +- `parent_id`: (*Filter parameter*), `string` The folder ID to list files from. If not specified, the root folder is used by default. -- `keywords`: (*Filter parameter*), `string` +- `keywords`: (*Filter parameter*), `string` Search keyword to filter files by name. -- `page`: (*Filter parameter*), `integer` +- `page`: (*Filter parameter*), `integer` Specifies the page on which the files will be displayed. Defaults to `1`. -- `page_size`: (*Filter parameter*), `integer` +- `page_size`: (*Filter parameter*), `integer` The number of files on each page. Defaults to `15`. -- `orderby`: (*Filter parameter*), `string` +- `orderby`: (*Filter parameter*), `string` The field by which files should be sorted. 
Available options: - `create_time` (default) -- `desc`: (*Filter parameter*), `boolean` +- `desc`: (*Filter parameter*), `boolean` Indicates whether the retrieved files should be sorted in descending order. Defaults to `true`. #### Response @@ -5280,7 +5283,7 @@ curl --request GET \ ##### Request parameters -- `file_id`: (*Filter parameter*), `string`, *Required* +- `file_id`: (*Filter parameter*), `string`, *Required* The ID of the file whose immediate parent folder to retrieve. #### Response @@ -5333,7 +5336,7 @@ curl --request GET \ ##### Request parameters -- `file_id`: (*Filter parameter*), `string`, *Required* +- `file_id`: (*Filter parameter*), `string`, *Required* The ID of the file whose parent folders to retrieve. #### Response @@ -5399,7 +5402,7 @@ curl --request POST \ ##### Request parameters -- `"file_ids"`: (*Body parameter*), `list[string]`, *Required* +- `"file_ids"`: (*Body parameter*), `list[string]`, *Required* The IDs of the files or folders to delete. #### Response @@ -5456,9 +5459,9 @@ curl --request POST \ ##### Request parameters -- `"file_id"`: (*Body parameter*), `string`, *Required* +- `"file_id"`: (*Body parameter*), `string`, *Required* The ID of the file or folder to rename. -- `"name"`: (*Body parameter*), `string`, *Required* +- `"name"`: (*Body parameter*), `string`, *Required* The new name for the file or folder. Note: Changing file extensions is *not* supported. #### Response @@ -5516,7 +5519,7 @@ curl --request GET \ ##### Request parameters -- `file_id`: (*Path parameter*), `string`, *Required* +- `file_id`: (*Path parameter*), `string`, *Required* The ID of the file to download. #### Response @@ -5568,9 +5571,9 @@ curl --request POST \ ##### Request parameters -- `"src_file_ids"`: (*Body parameter*), `list[string]`, *Required* +- `"src_file_ids"`: (*Body parameter*), `list[string]`, *Required* The IDs of the files or folders to move. 
-- `"dest_file_id"`: (*Body parameter*), `string`, *Required* +- `"dest_file_id"`: (*Body parameter*), `string`, *Required* The ID of the destination folder. #### Response @@ -5636,9 +5639,9 @@ curl --request POST \ ##### Request parameters -- `"file_ids"`: (*Body parameter*), `list[string]`, *Required* +- `"file_ids"`: (*Body parameter*), `list[string]`, *Required* The IDs of the files to convert. If a folder ID is provided, all files within that folder will be converted. -- `"kb_ids"`: (*Body parameter*), `list[string]`, *Required* +- `"kb_ids"`: (*Body parameter*), `list[string]`, *Required* The IDs of the target datasets. #### Response diff --git a/docs/references/python_api_reference.md b/docs/references/python_api_reference.md index 3689da3f3..089dd9819 100644 --- a/docs/references/python_api_reference.md +++ b/docs/references/python_api_reference.md @@ -1,6 +1,9 @@ --- sidebar_position: 5 slug: /python_api_reference +sidebar_custom_props: { + categoryIcon: SiPython +} --- # Python API @@ -108,7 +111,7 @@ RAGFlow.create_dataset( avatar: Optional[str] = None, description: Optional[str] = None, embedding_model: Optional[str] = "BAAI/bge-large-zh-v1.5@BAAI", - permission: str = "me", + permission: str = "me", chunk_method: str = "naive", parser_config: DataSet.ParserConfig = None ) -> DataSet @@ -136,7 +139,7 @@ A brief description of the dataset to create. Defaults to `None`. ##### permission -Specifies who can access the dataset to create. Available options: +Specifies who can access the dataset to create. Available options: - `"me"`: (Default) Only you can manage the dataset. - `"team"`: All team members can manage the dataset. @@ -161,29 +164,29 @@ The chunking method of the dataset to create. Available options: The parser configuration of the dataset. 
A `ParserConfig` object's attributes vary based on the selected `chunk_method`: -- `chunk_method`=`"naive"`: +- `chunk_method`=`"naive"`: `{"chunk_token_num":512,"delimiter":"\\n","html4excel":False,"layout_recognize":True,"raptor":{"use_raptor":False}}`. -- `chunk_method`=`"qa"`: +- `chunk_method`=`"qa"`: `{"raptor": {"use_raptor": False}}` -- `chunk_method`=`"manuel"`: +- `chunk_method`=`"manual"`: `{"raptor": {"use_raptor": False}}` -- `chunk_method`=`"table"`: +- `chunk_method`=`"table"`: `None` -- `chunk_method`=`"paper"`: +- `chunk_method`=`"paper"`: `{"raptor": {"use_raptor": False}}` -- `chunk_method`=`"book"`: +- `chunk_method`=`"book"`: `{"raptor": {"use_raptor": False}}` -- `chunk_method`=`"laws"`: +- `chunk_method`=`"laws"`: `{"raptor": {"use_raptor": False}}` -- `chunk_method`=`"picture"`: +- `chunk_method`=`"picture"`: `None` -- `chunk_method`=`"presentation"`: +- `chunk_method`=`"presentation"`: `{"raptor": {"use_raptor": False}}` -- `chunk_method`=`"one"`: +- `chunk_method`=`"one"`: `None` -- `chunk_method`=`"knowledge-graph"`: +- `chunk_method`=`"knowledge-graph"`: `{"chunk_token_num":128,"delimiter":"\\n","entity_types":["organization","person","location","event","time"]}` -- `chunk_method`=`"email"`: +- `chunk_method`=`"email"`: `None` #### Returns @@ -236,9 +239,9 @@ rag_object.delete_datasets(ids=["d94a8dc02c9711f0930f7fbc369eab6d","e94a8dc02c97 ```python RAGFlow.list_datasets( - page: int = 1, - page_size: int = 30, - orderby: str = "create_time", + page: int = 1, + page_size: int = 30, + orderby: str = "create_time", desc: bool = True, id: str = None, name: str = None ) @@ -317,25 +320,25 @@ A dictionary representing the attributes to update, with the following keys: - Basic Multilingual Plane (BMP) only - Maximum 128 characters - Case-insensitive -- `"avatar"`: (*Body parameter*), `string` +- `"avatar"`: (*Body parameter*), `string` The updated base64 encoding of the avatar.
- Maximum 65535 characters -- `"embedding_model"`: (*Body parameter*), `string` - The updated embedding model name. +- `"embedding_model"`: (*Body parameter*), `string` + The updated embedding model name. - Ensure that `"chunk_count"` is `0` before updating `"embedding_model"`. - Maximum 255 characters - Must follow `model_name@model_factory` format -- `"permission"`: (*Body parameter*), `string` - The updated dataset permission. Available options: +- `"permission"`: (*Body parameter*), `string` + The updated dataset permission. Available options: - `"me"`: (Default) Only you can manage the dataset. - `"team"`: All team members can manage the dataset. -- `"pagerank"`: (*Body parameter*), `int` +- `"pagerank"`: (*Body parameter*), `int` refer to [Set page rank](https://ragflow.io/docs/dev/set_page_rank) - Default: `0` - Minimum: `0` - Maximum: `100` -- `"chunk_method"`: (*Body parameter*), `enum` - The chunking method for the dataset. Available options: +- `"chunk_method"`: (*Body parameter*), `enum` + The chunking method for the dataset. Available options: - `"naive"`: General (default) - `"book"`: Book - `"email"`: Email @@ -385,7 +388,7 @@ Uploads documents to the current dataset. A list of dictionaries representing the documents to upload, each containing the following keys: -- `"display_name"`: (Optional) The file name to display in the dataset. +- `"display_name"`: (Optional) The file name to display in the dataset. - `"blob"`: (Optional) The binary content of the file to upload. #### Returns @@ -431,29 +434,29 @@ A dictionary representing the attributes to update, with the following keys: - `"one"`: One - `"email"`: Email - `"parser_config"`: `dict[str, Any]` The parsing configuration for the document. Its attributes vary based on the selected `"chunk_method"`: - - `"chunk_method"`=`"naive"`: + - `"chunk_method"`=`"naive"`: `{"chunk_token_num":128,"delimiter":"\\n","html4excel":False,"layout_recognize":True,"raptor":{"use_raptor":False}}`. 
- - `chunk_method`=`"qa"`: + - `chunk_method`=`"qa"`: `{"raptor": {"use_raptor": False}}` - - `chunk_method`=`"manuel"`: + - `chunk_method`=`"manual"`: `{"raptor": {"use_raptor": False}}` - - `chunk_method`=`"table"`: + - `chunk_method`=`"table"`: `None` - - `chunk_method`=`"paper"`: + - `chunk_method`=`"paper"`: `{"raptor": {"use_raptor": False}}` - - `chunk_method`=`"book"`: + - `chunk_method`=`"book"`: `{"raptor": {"use_raptor": False}}` - - `chunk_method`=`"laws"`: + - `chunk_method`=`"laws"`: `{"raptor": {"use_raptor": False}}` - - `chunk_method`=`"presentation"`: + - `chunk_method`=`"presentation"`: `{"raptor": {"use_raptor": False}}` - - `chunk_method`=`"picture"`: + - `chunk_method`=`"picture"`: `None` - - `chunk_method`=`"one"`: + - `chunk_method`=`"one"`: `None` - - `chunk_method`=`"knowledge-graph"`: + - `chunk_method`=`"knowledge-graph"`: `{"chunk_token_num":128,"delimiter":"\\n","entity_types":["organization","person","location","event","time"]}` - - `chunk_method`=`"email"`: + - `chunk_method`=`"email"`: `None` #### Returns @@ -586,27 +589,27 @@ A `Document` object contains the following attributes: - `"FAIL"` - `status`: `str` Reserved for future use. - `parser_config`: `ParserConfig` Configuration object for the parser. Its attributes vary based on the selected `chunk_method`: - - `chunk_method`=`"naive"`: + - `chunk_method`=`"naive"`: `{"chunk_token_num":128,"delimiter":"\\n","html4excel":False,"layout_recognize":True,"raptor":{"use_raptor":False}}`.
- - `chunk_method`=`"qa"`: + - `chunk_method`=`"qa"`: `{"raptor": {"use_raptor": False}}` - - `chunk_method`=`"manuel"`: + - `chunk_method`=`"manual"`: `{"raptor": {"use_raptor": False}}` - - `chunk_method`=`"table"`: + - `chunk_method`=`"table"`: `None` - - `chunk_method`=`"paper"`: + - `chunk_method`=`"paper"`: `{"raptor": {"use_raptor": False}}` - - `chunk_method`=`"book"`: + - `chunk_method`=`"book"`: `{"raptor": {"use_raptor": False}}` - - `chunk_method`=`"laws"`: + - `chunk_method`=`"laws"`: `{"raptor": {"use_raptor": False}}` - - `chunk_method`=`"presentation"`: + - `chunk_method`=`"presentation"`: `{"raptor": {"use_raptor": False}}` - - `chunk_method`=`"picure"`: + - `chunk_method`=`"picture"`: `None` - - `chunk_method`=`"one"`: + - `chunk_method`=`"one"`: `None` - - `chunk_method`=`"email"`: + - `chunk_method`=`"email"`: `None` #### Examples @@ -724,9 +727,9 @@ A list of tuples with detailed parsing results: ... ] ``` -- `status`: The final parsing state (e.g., `success`, `failed`, `cancelled`). -- `chunk_count`: The number of content chunks created from the document. -- `token_count`: The total number of tokens processed. +- `status`: The final parsing state (e.g., `success`, `failed`, `cancelled`). +- `chunk_count`: The number of content chunks created from the document. +- `token_count`: The total number of tokens processed. --- @@ -986,11 +989,11 @@ The user query or query keywords. Defaults to `""`. ##### dataset_ids: `list[str]`, *Required* -The IDs of the datasets to search. Defaults to `None`. +The IDs of the datasets to search. Defaults to `None`. ##### document_ids: `list[str]` -The IDs of the documents to search. Defaults to `None`. You must ensure all selected documents use the same embedding model. Otherwise, an error will occur. +The IDs of the documents to search. Defaults to `None`. You must ensure all selected documents use the same embedding model. Otherwise, an error will occur.
##### page: `int` @@ -1023,7 +1026,7 @@ Indicates whether to enable keyword-based matching: - `True`: Enable keyword-based matching. - `False`: Disable keyword-based matching (default). -##### cross_languages: `list[string]` +##### cross_languages: `list[string]` The target languages into which the query should be translated so that keyword retrieval can be performed across languages. @@ -1064,10 +1067,10 @@ for c in rag_object.retrieve(dataset_ids=[dataset.id],document_ids=[doc.id]): ```python RAGFlow.create_chat( - name: str, - avatar: str = "", - dataset_ids: list[str] = [], - llm: Chat.LLM = None, + name: str, + avatar: str = "", + dataset_ids: list[str] = [], + llm: Chat.LLM = None, prompt: Chat.Prompt = None ) -> Chat ``` @@ -1092,15 +1095,15 @@ The IDs of the associated datasets. Defaults to `[""]`. The LLM settings for the chat assistant to create. Defaults to `None`. When the value is `None`, a dictionary with the following values will be generated as the default. An `LLM` object contains the following attributes: -- `model_name`: `str` - The chat model name. If it is `None`, the user's default chat model will be used. -- `temperature`: `float` - Controls the randomness of the model's predictions. A lower temperature results in more conservative responses, while a higher temperature yields more creative and diverse responses. Defaults to `0.1`. +- `model_name`: `str` + The chat model name. If it is `None`, the user's default chat model will be used. +- `temperature`: `float` + Controls the randomness of the model's predictions. A lower temperature results in more conservative responses, while a higher temperature yields more creative and diverse responses. Defaults to `0.1`.
+- `top_p`: `float` + Also known as “nucleus sampling”, this parameter sets a threshold to select a smaller set of words to sample from. It focuses on the most likely words, cutting off the less probable ones. Defaults to `0.3`. +- `presence_penalty`: `float` This discourages the model from repeating the same information by penalizing words that have already appeared in the conversation. Defaults to `0.2`. -- `frequency penalty`: `float` +- `frequency_penalty`: `float` Similar to the presence penalty, this reduces the model’s tendency to repeat the same words frequently. Defaults to `0.7`. ##### prompt: `Chat.Prompt` @@ -1160,8 +1163,8 @@ A dictionary representing the attributes to update, with the following keys: - `"dataset_ids"`: `list[str]` The datasets to update. - `"llm"`: `dict` The LLM settings: - `"model_name"`, `str` The chat model name. - - `"temperature"`, `float` Controls the randomness of the model's predictions. A lower temperature results in more conservative responses, while a higher temperature yields more creative and diverse responses. + - `"temperature"`, `float` Controls the randomness of the model's predictions. A lower temperature results in more conservative responses, while a higher temperature yields more creative and diverse responses. - - `"top_p"`, `float` Also known as “nucleus sampling”, this parameter sets a threshold to select a smaller set of words to sample from. + - `"top_p"`, `float` Also known as “nucleus sampling”, this parameter sets a threshold to select a smaller set of words to sample from. - `"presence_penalty"`, `float` This discourages the model from repeating the same information by penalizing words that have appeared in the conversation. - `"frequency penalty"`, `float` Similar to presence penalty, this reduces the model’s tendency to repeat the same words. - `"prompt"` : Instructions for the LLM to follow.
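To make the update dictionary documented above concrete, here is a minimal sketch of tightening a chat assistant's sampling settings with the Python SDK. Everything below is illustrative: the assistant name, model name, API key, and base URL are placeholders, and the SDK call is left commented out so that only the payload construction runs without a server:

```python
# Build an update payload using only keys documented in this section.
update_payload = {
    "name": "Support Bot",
    "llm": {
        "model_name": "deepseek-chat",  # illustrative model name
        "temperature": 0.05,            # lower temperature -> more deterministic answers
        "top_p": 0.3,
    },
}

# The call itself requires a running RAGFlow server:
# from ragflow_sdk import RAGFlow
# rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<HOST>:9380")
# chat = rag_object.list_chats(name="Support Bot")[0]
# chat.update(update_payload)

# Sanity-check the payload against the keys this section lists as updatable.
allowed_keys = {"name", "avatar", "dataset_ids", "llm", "prompt"}
assert set(update_payload) <= allowed_keys
print(sorted(update_payload))
```

Keys omitted from the payload are left unchanged by the update, which is why only `name` and `llm` appear here.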
@@ -1231,9 +1234,9 @@ rag_object.delete_chats(ids=["id_1","id_2"]) ```python RAGFlow.list_chats( - page: int = 1, - page_size: int = 30, - orderby: str = "create_time", + page: int = 1, + page_size: int = 30, + orderby: str = "create_time", desc: bool = True, id: str = None, name: str = None @@ -1263,11 +1266,11 @@ The attribute by which the results are sorted. Available options: Indicates whether the retrieved chat assistants should be sorted in descending order. Defaults to `True`. -##### id: `str` +##### id: `str` The ID of the chat assistant to retrieve. Defaults to `None`. -##### name: `str` +##### name: `str` The name of the chat assistant to retrieve. Defaults to `None`. @@ -1367,9 +1370,9 @@ session.update({"name": "updated_name"}) ```python Chat.list_sessions( - page: int = 1, - page_size: int = 30, - orderby: str = "create_time", + page: int = 1, + page_size: int = 30, + orderby: str = "create_time", desc: bool = True, id: str = None, name: str = None @@ -1506,25 +1509,25 @@ The content of the message. Defaults to `"Hi! I am your assistant, can I help yo A list of `Chunk` objects representing references to the message, each containing the following attributes: -- `id` `str` +- `id` `str` The chunk ID. -- `content` `str` +- `content` `str` The content of the chunk. -- `img_id` `str` +- `img_id` `str` The ID of the snapshot of the chunk. Applicable only when the source of the chunk is an image, PPT, PPTX, or PDF file. -- `document_id` `str` +- `document_id` `str` The ID of the referenced document. -- `document_name` `str` +- `document_name` `str` The name of the referenced document. -- `position` `list[str]` +- `position` `list[str]` The location information of the chunk within the referenced document. -- `dataset_id` `str` +- `dataset_id` `str` The ID of the dataset to which the referenced document belongs. 
-- `similarity` `float` +- `similarity` `float` A composite similarity score of the chunk ranging from `0` to `1`, with a higher value indicating greater similarity. It is the weighted sum of `vector_similarity` and `term_similarity`. -- `vector_similarity` `float` +- `vector_similarity` `float` A vector similarity score of the chunk ranging from `0` to `1`, with a higher value indicating greater similarity between vector embeddings. -- `term_similarity` `float` +- `term_similarity` `float` A keyword similarity score of the chunk ranging from `0` to `1`, with a higher value indicating greater similarity between keywords. #### Examples @@ -1535,7 +1538,7 @@ from ragflow_sdk import RAGFlow rag_object = RAGFlow(api_key="", base_url="http://:9380") assistant = rag_object.list_chats(name="Miss R") assistant = assistant[0] -session = assistant.create_session() +session = assistant.create_session() print("\n==================== Miss R =====================\n") print("Hello. What can I do for you?") @@ -1543,7 +1546,7 @@ print("Hello. What can I do for you?") while True: question = input("\n==================== User =====================\n> ") print("\n==================== Miss R =====================\n") - + cont = "" for ans in session.ask(question, stream=True): print(ans.content[len(cont):], end='', flush=True) @@ -1631,25 +1634,25 @@ The content of the message. Defaults to `"Hi! I am your assistant, can I help yo A list of `Chunk` objects representing references to the message, each containing the following attributes: -- `id` `str` +- `id` `str` The chunk ID. -- `content` `str` +- `content` `str` The content of the chunk. -- `image_id` `str` +- `image_id` `str` The ID of the snapshot of the chunk. Applicable only when the source of the chunk is an image, PPT, PPTX, or PDF file. -- `document_id` `str` +- `document_id` `str` The ID of the referenced document. -- `document_name` `str` +- `document_name` `str` The name of the referenced document. 
-- `position` `list[str]` 
+- `position` `list[str]`
   The location information of the chunk within the referenced document.
-- `dataset_id` `str` 
+- `dataset_id` `str`
   The ID of the dataset to which the referenced document belongs.
-- `similarity` `float` 
+- `similarity` `float`
   A composite similarity score of the chunk ranging from `0` to `1`, with a higher value indicating greater similarity. It is the weighted sum of `vector_similarity` and `term_similarity`.
-- `vector_similarity` `float` 
+- `vector_similarity` `float`
   A vector similarity score of the chunk ranging from `0` to `1`, with a higher value indicating greater similarity between vector embeddings.
-- `term_similarity` `float` 
+- `term_similarity` `float`
   A keyword similarity score of the chunk ranging from `0` to `1`, with a higher value indicating greater similarity between keywords.
 
 #### Examples
@@ -1660,7 +1663,7 @@ from ragflow_sdk import RAGFlow, Agent
 rag_object = RAGFlow(api_key="", base_url="http://:9380")
 AGENT_id = "AGENT_ID"
 agent = rag_object.list_agents(id = AGENT_id)[0]
-session = agent.create_session() 
+session = agent.create_session()
 
 print("\n===== Miss R ====\n")
 print("Hello. What can I do for you?")
@@ -1668,7 +1671,7 @@ print("Hello. What can I do for you?")
 while True:
     question = input("\n===== User ====\n> ")
     print("\n==== Miss R ====\n")
-    
+
     cont = ""
     for ans in session.ask(question, stream=True):
         print(ans.content[len(cont):], end='', flush=True)
@@ -1681,9 +1684,9 @@ while True:
 
 ```python
 Agent.list_sessions(
-    page: int = 1, 
-    page_size: int = 30, 
-    orderby: str = "update_time", 
+    page: int = 1,
+    page_size: int = 30,
+    orderby: str = "update_time",
     desc: bool = True,
     id: str = None
 ) -> List[Session]
@@ -1774,9 +1777,9 @@ agent.delete_sessions(ids=["id_1","id_2"])
 
 ```python
 RAGFlow.list_agents(
-    page: int = 1, 
-    page_size: int = 30, 
-    orderby: str = "create_time", 
+    page: int = 1,
+    page_size: int = 30,
+    orderby: str = "create_time",
     desc: bool = True,
     id: str = None,
     title: str = None
@@ -1806,11 +1809,11 @@ The attribute by which the results are sorted. Available options:
 
 Indicates whether the retrieved agents should be sorted in descending order. Defaults to `True`.
 
-##### id: `str` 
+##### id: `str`
 
 The ID of the agent to retrieve. Defaults to `None`.
 
-##### name: `str` 
+##### name: `str`
 
 The name of the agent to retrieve. Defaults to `None`.
 
diff --git a/docs/references/supported_models.mdx b/docs/references/supported_models.mdx
index a572fb849..1d7a0387c 100644
--- a/docs/references/supported_models.mdx
+++ b/docs/references/supported_models.mdx
@@ -1,6 +1,9 @@
 ---
 sidebar_position: 1
 slug: /supported_models
+sidebar_custom_props: {
+  categoryIcon: LucideBox
+}
 ---
 
 # Supported models
diff --git a/docs/release_notes.md b/docs/release_notes.md
index 98d5dfbe0..e724f5037 100644
--- a/docs/release_notes.md
+++ b/docs/release_notes.md
@@ -1,6 +1,9 @@
 ---
 sidebar_position: 2
 slug: /release_notes
+sidebar_custom_props: {
+  sidebarIcon: LucideClipboardPenLine
+}
 ---
 
 # Releases
@@ -20,7 +23,7 @@ Released on December 31, 2025.
 
 ### Fixed issues
 
-- Memory: 
+- Memory:
   - The RAGFlow server failed to start if an empty memory object existed.
   - Unable to delete a newly created empty Memory.
 - RAG: MDX file parsing was not supported.
@@ -256,7 +259,7 @@ Ecommerce Customer Service Workflow: A template designed to handle enquiries abo
 
 ### Fixed issues
 
-- Dataset: 
+- Dataset:
   - Unable to share resources with the team.
   - Inappropriate restrictions on the number and size of uploaded files.
 - Chat:
@@ -272,13 +275,13 @@ Released on August 20, 2025.
 
 ### Improvements
 
-- Revamps the user interface for the **Datasets**, **Chat**, and **Search** pages. 
+- Revamps the user interface for the **Datasets**, **Chat**, and **Search** pages.
 - Search and Chat: Introduces document-level metadata filtering, allowing automatic or manual filtering during chats or searches.
 - Search: Supports creating search apps tailored to various business scenarios
 - Chat: Supports comparing answer performance of up to three chat model settings on a single **Chat** page.
-- Agent: 
-  - Implements a toggle in the **Agent** component to enable or disable citation. 
-  - Introduces a drag-and-drop method for creating components. 
+- Agent:
+  - Implements a toggle in the **Agent** component to enable or disable citation.
+  - Introduces a drag-and-drop method for creating components.
 - Documentation: Corrects inaccuracies in the API reference.
 
 ### New Agent templates
@@ -288,8 +291,8 @@ Released on August 20, 2025.
 ### Fixed issues
 
 - The timeout mechanism introduced in v0.20.0 caused tasks like GraphRAG to halt.
-- Predefined opening greeting in the **Agent** component was missing during conversations. 
-- An automatic line break issue in the prompt editor. 
+- Predefined opening greeting in the **Agent** component was missing during conversations.
+- An automatic line break issue in the prompt editor.
 - A memory leak issue caused by PyPDF. [#9469](https://github.com/infiniflow/ragflow/pull/9469)
 
 ### API changes
@@ -373,7 +376,7 @@ Released on June 23, 2025.
 
 ### Newly supported models
 
-- Qwen 3 Embedding. [#8184](https://github.com/infiniflow/ragflow/pull/8184) 
+- Qwen 3 Embedding. [#8184](https://github.com/infiniflow/ragflow/pull/8184)
 - Voyage Multimodal 3. [#7987](https://github.com/infiniflow/ragflow/pull/7987)
 
 ## v0.19.0
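Note on the SDK examples touched above: the `while True` loops in the `create_session` examples assume `session.ask(question, stream=True)` yields message objects whose `content` grows cumulatively, so each iteration prints only the new suffix via `ans.content[len(cont):]`. A minimal self-contained sketch of that delta logic, using a stub list in place of the real SDK stream (`stream_deltas` is an illustrative helper, not a RAGFlow API):

```python
def stream_deltas(cumulative_answers):
    """Yield only the newly appended text from a stream of cumulative
    answer strings, mirroring `ans.content[len(cont):]` in the examples."""
    cont = ""
    for content in cumulative_answers:
        yield content[len(cont):]  # text added since the last event
        cont = content             # remember what has been emitted so far

# Stub standing in for `session.ask(question, stream=True)`:
# each element is the full answer generated so far.
chunks = ["Hel", "Hello,", "Hello, world"]
print("".join(stream_deltas(chunks)))  # prints "Hello, world"
```

Printing only the suffix of each cumulative payload keeps terminal output incremental even though every streamed event carries the full answer so far.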