Miscellaneous updates (#6245)

### What problem does this PR solve?


### Type of change


- [x] Documentation Update
This commit is contained in:
writinwaters
2025-03-18 19:49:06 +08:00
committed by GitHub
parent d16033dd2c
commit f540559c41
20 changed files with 94 additions and 55 deletions


@ -51,18 +51,25 @@ If a rerank model is selected, a combination of weighted keyword similarity and
Using a rerank model will *significantly* increase the system's response time.
:::
### Tavily API key
If an API key is correctly set here, Tavily-based web searches will be used to supplement knowledge base retrieval.
### Use knowledge graph
It will retrieve descriptions of relevant entities, relations, and community reports, which enhances inference on multi-hop and complex questions.
### Knowledge bases
*Required*
You are required to select the knowledge base(s) to retrieve data from.
Select the knowledge base(s) to retrieve data from.
:::danger IMPORTANT
If you select multiple knowledge bases, ensure that all of the selected knowledge bases (datasets) use the same embedding model; otherwise, an error occurs.
:::
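The same-embedding-model requirement can be checked before a chat assistant is saved. The following is a minimal sketch, not RAGFlow's implementation: the function name `check_same_embedding_model` and the `name` and `embedding_model` dictionary keys are hypothetical and chosen for illustration only.

```python
from typing import Iterable, Mapping


def check_same_embedding_model(datasets: Iterable[Mapping[str, str]]) -> str:
    """Return the shared embedding model, or raise if the datasets disagree.

    Each dataset is a hypothetical dict with "name" and "embedding_model"
    keys; these field names are illustrative, not RAGFlow's actual schema.
    """
    # Group dataset names by the embedding model they use.
    models: dict[str, list[str]] = {}
    for ds in datasets:
        models.setdefault(ds["embedding_model"], []).append(ds["name"])

    if not models:
        raise ValueError("Select at least one knowledge base.")
    if len(models) > 1:
        detail = "; ".join(
            f"{model}: {', '.join(names)}" for model, names in sorted(models.items())
        )
        raise ValueError(f"Knowledge bases use different embedding models ({detail}).")
    # Exactly one model is in use, so return it.
    return next(iter(models))
```

Raising early with the offending dataset names makes it clear which knowledge base needs to be deselected or re-embedded.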
### Empty response
Set this as a response if no results are retrieved from the knowledge bases for your query, or leave this field blank to allow the LLM to improvise when nothing is found.
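The behavior of the **Empty response** field can be pictured as a simple fallback rule. This is a hedged sketch under stated assumptions: `answer`, `chunks`, and the `llm` callable are hypothetical names, and the prompt format is invented for illustration rather than taken from RAGFlow.

```python
from typing import Callable, Sequence


def answer(
    query: str,
    chunks: Sequence[str],
    empty_response: str,
    llm: Callable[[str], str],
) -> str:
    """Pick a reply based on whether retrieval returned anything.

    `llm` is a hypothetical callable that maps a prompt to generated text.
    """
    if chunks:
        # Retrieval succeeded: ground the model in the retrieved chunks.
        context = "\n".join(chunks)
        return llm(f"Context:\n{context}\n\nQuestion: {query}")
    if empty_response.strip():
        # Nothing retrieved and a canned reply is configured: use it verbatim.
        return empty_response
    # Nothing retrieved and the field is blank: let the model improvise.
    return llm(query)
```

Setting a non-blank empty response keeps the assistant from answering outside the knowledge base; leaving it blank trades that guarantee for a more helpful-sounding reply.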
## Examples


@ -39,18 +39,18 @@ This section covers the following topics:
RAGFlow offers multiple chunking templates to facilitate chunking files of different layouts and ensure semantic integrity. In **Chunk method**, you can choose the default template that suits the layouts and formats of your files. The following table describes each supported chunk template and its compatible file formats:
| **Template** | Description | File format |
|--------------|-----------------------------------------------------------------------|------------------------------------------------------------------------------|
| General | Files are consecutively chunked based on a preset chunk token number. | DOCX, XLSX, XLS (Excel97~2003), PPT, PDF, TXT, JPEG, JPG, PNG, TIF, GIF, CSV |
| Q&A | | XLSX, XLS (Excel97~2003), CSV/TXT |
| Manual | | PDF |
| Table | | XLSX, XLS (Excel97~2003), CSV/TXT |
| Paper | | PDF |
| Book | | DOCX, PDF, TXT |
| Laws | | DOCX, PDF, TXT |
| Presentation | | PDF, PPTX |
| Picture | | JPEG, JPG, PNG, TIF, GIF |
| One | The entire document is chunked as one. | DOCX, XLSX, XLS (Excel97~2003), PDF, TXT |
| **Template** | Description | File format |
|--------------|-----------------------------------------------------------------------|-----------------------------------------------------------------------------------------------|
| General | Files are consecutively chunked based on a preset chunk token number. | DOCX, XLSX, XLS (Excel97~2003), PPT, PDF, TXT, JPEG, JPG, PNG, TIF, GIF, CSV, JSON, EML, HTML |
| Q&A | | XLSX, XLS (Excel97~2003), CSV/TXT |
| Manual | | PDF |
| Table | | XLSX, XLS (Excel97~2003), CSV/TXT |
| Paper | | PDF |
| Book | | DOCX, PDF, TXT |
| Laws | | DOCX, PDF, TXT |
| Presentation | | PDF, PPTX |
| Picture | | JPEG, JPG, PNG, TIF, GIF |
| One | The entire document is chunked as one. | DOCX, XLSX, XLS (Excel97~2003), PDF, TXT |
You can also change a file's chunk method on the **Datasets** page.
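The updated table can also be read as a lookup from file extension to compatible templates. The sketch below transcribes the file-format column into a Python dictionary; the names `TEMPLATE_FORMATS` and `compatible_templates` are hypothetical, and the `CSV/TXT` cells are interpreted here as two separate extensions, which is an assumption.

```python
from pathlib import PurePath

# Transcribed from the updated table; "CSV/TXT" is assumed to mean both formats.
TEMPLATE_FORMATS = {
    "General": {"docx", "xlsx", "xls", "ppt", "pdf", "txt", "jpeg", "jpg",
                "png", "tif", "gif", "csv", "json", "eml", "html"},
    "Q&A": {"xlsx", "xls", "csv", "txt"},
    "Manual": {"pdf"},
    "Table": {"xlsx", "xls", "csv", "txt"},
    "Paper": {"pdf"},
    "Book": {"docx", "pdf", "txt"},
    "Laws": {"docx", "pdf", "txt"},
    "Presentation": {"pdf", "pptx"},
    "Picture": {"jpeg", "jpg", "png", "tif", "gif"},
    "One": {"docx", "xlsx", "xls", "pdf", "txt"},
}


def compatible_templates(filename: str) -> list[str]:
    """List the templates whose file-format column includes this file's extension."""
    ext = PurePath(filename).suffix.lower().lstrip(".")
    # Dictionaries keep insertion order, so results follow the table's row order.
    return [name for name, formats in TEMPLATE_FORMATS.items() if ext in formats]
```

For example, `compatible_templates("notes.json")` yields only `General`, reflecting the JSON, EML, and HTML formats this change adds to that row.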


@ -13,6 +13,10 @@ Retrieval accuracy is the touchstone for a production-ready RAG framework. In ad
To use this feature, ensure you have at least one properly configured tag set, specify the tag set(s) on the **Configuration** page of your knowledge base (dataset), and then re-parse your documents to initiate the auto-tag process. During this process, each chunk in your dataset is compared with every entry in the specified tag set(s), and tags are automatically applied based on similarity.
:::danger IMPORTANT
The auto-tagging feature is *unavailable* on the [Infinity](https://github.com/infiniflow/infinity) document engine.
:::
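The auto-tag process described above, in which each chunk is compared with every entry in a tag set and tags are applied based on similarity, can be sketched as follows. This is an illustration only: the source does not specify the similarity measure, so a bag-of-words cosine similarity stands in for it, and `auto_tag`, the `tag_set` mapping of tag name to description, and the `threshold` value are all hypothetical.

```python
import math
import re
from collections import Counter


def _vector(text: str) -> Counter:
    """Turn text into a bag-of-words term-frequency vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def auto_tag(chunk: str, tag_set: dict[str, str], threshold: float = 0.2) -> list[str]:
    """Return tags whose description is similar enough to the chunk, best first.

    `tag_set` maps a tag name to a description; both the structure and the
    threshold are assumptions made for this sketch.
    """
    chunk_vec = _vector(chunk)
    scored = [(tag, _cosine(chunk_vec, _vector(desc))) for tag, desc in tag_set.items()]
    scored.sort(key=lambda pair: -pair[1])
    return [tag for tag, score in scored if score >= threshold]
```

Because every chunk is scored against every tag entry, the cost grows with the product of chunk count and tag-set size, which is one reason re-parsing is required to trigger the process.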
## Scenarios
Auto-tagging applies in situations where chunks are so similar to each other that the intended chunks cannot be distinguished from the rest. For example, when you have a few chunks about iPhones and a majority about iPhone cases or iPhone accessories, it becomes difficult to retrieve the iPhone-specific chunks without additional information.


@ -16,6 +16,10 @@ By default, each RAGFlow user is assigned a single team named after their name.
- Update the default configurations for your datasets.
- Parse documents in your datasets.
:::danger IMPORTANT
To allow your team members to view and update your knowledge base, ensure that you set **Permissions** on its **Configuration** page from **Only me** to **Team**.
:::
:::tip NOTE
Team members are currently *not* allowed to invite users to your team; only you, the team owner, are permitted to do so.
:::
@ -43,3 +47,5 @@ When using email address to invite a team member, ensure it is associated with a
## Accept or decline team invite
![accept_or_decline_team_invite](https://github.com/user-attachments/assets/6a2cb61f-03d5-4423-9ed1-71df97ff4114)
_After accepting the team invite, you should be able to view and update the team owner's knowledge bases whose **Permissions** is set to **Team**._


@ -51,7 +51,7 @@ Released on March 11, 2025.
- A repetitive knowledge graph extraction issue.
- Issues with API calling.
- Options in the **Document parser** dropdown are missing.
- Options in the **PDF parser**, aka **Document parser**, dropdown are missing.
- A Tavily web search issue.
- Unable to preview diagrams or images in an AI chat.
@ -59,7 +59,7 @@ Released on March 11, 2025.
#### Added documents
[Use tag set](./guides/dataset/use_tag_sets.md)
- [Use tag set](./guides/dataset/use_tag_sets.md)
## v0.17.0
@ -71,7 +71,7 @@ Released on March 3, 2025.
- AI chat: Leverages Tavily-based web search to enhance contexts in agentic reasoning. To activate this, enter the correct Tavily API key under the **Assistant Setting** tab of your chat assistant dialogue.
- AI chat: Supports starting a chat without specifying knowledge bases.
- AI chat: HTML files can also be previewed and referenced, in addition to PDF files.
- Dataset: Adds a **Document parser** dropdown menu to dataset configurations. This includes a DeepDoc model option, which is time-consuming, a much faster **naive** option (plain text), which skips DLA (Document Layout Analysis), OCR (Optical Character Recognition), and TSR (Table Structure Recognition) tasks, and several currently *experimental* large model options.
- Dataset: Adds a **PDF parser**, aka **Document parser**, dropdown menu to dataset configurations. This includes a DeepDoc model option, which is time-consuming, a much faster **naive** option (plain text), which skips DLA (Document Layout Analysis), OCR (Optical Character Recognition), and TSR (Table Structure Recognition) tasks, and several currently *experimental* large model options.
- Agent component: **(x)** or a forward slash `/` can be used to insert available keys (variables) in the system prompt field of the **Generate** or **Template** component.
- Object storage: Supports using Aliyun OSS (Object Storage Service) as a file storage option.
- Models: Updates the supported model list for Tongyi-Qianwen (Qwen), adding DeepSeek-specific models; adds ModelScope as a model provider.
@ -99,7 +99,7 @@ Adds a key option `"meta_fields"` to the [Update document](./references/python_a
#### Added documents
[Run retrieval test](./guides/dataset/run_retrieval_test.md)
- [Run retrieval test](./guides/dataset/run_retrieval_test.md)
## v0.16.0