DOC: Miscellaneous UI and editorial updates (#7324)

### What problem does this PR solve?



### Type of change


- [x] Documentation Update
This commit is contained in:
writinwaters
2025-04-27 11:44:08 +08:00
committed by GitHub
parent 3da8776a3c
commit dadd8d9f94
16 changed files with 97 additions and 68 deletions

View File

@ -71,7 +71,7 @@ As mentioned earlier, the **Begin** component is indispensable for an agent. Sti
### Is the uploaded file in a knowledge base?
No. Files uploaded to an agent as input are not stored in a knowledge base and hence will not be processed using RAGFlow's built-in OCR, DLR or TSR models, or chunked using RAGFlow's built-in chunk methods.
No. Files uploaded to an agent as input are not stored in a knowledge base and hence will not be processed using RAGFlow's built-in OCR, DLR or TSR models, or chunked using RAGFlow's built-in chunking methods.
### How to upload a webpage or file from a URL?

View File

@ -22,22 +22,22 @@ _Each time a knowledge base is created, a folder with the same name is generated
## Configure knowledge base
The following screenshot shows the configuration page of a knowledge base. A proper configuration of your knowledge base is crucial for future AI chats. For example, choosing the wrong embedding model or chunk method would cause unexpected semantic loss or mismatched answers in chats.
The following screenshot shows the configuration page of a knowledge base. A proper configuration of your knowledge base is crucial for future AI chats. For example, choosing the wrong embedding model or chunking method would cause unexpected semantic loss or mismatched answers in chats.
![knowledge base configuration](https://github.com/infiniflow/ragflow/assets/93570324/384c671a-8b9c-468c-b1c9-1401128a9b65)
This section covers the following topics:
- Select chunk method
- Select chunking method
- Select embedding model
- Upload file
- Parse file
- Intervene with file parsing results
- Run retrieval testing
### Select chunk method
### Select chunking method
RAGFlow offers multiple chunking template to facilitate chunking files of different layouts and ensure semantic integrity. In **Chunk method**, you can choose the default template that suits the layouts and formats of your files. The following table shows the descriptions and the compatible file formats of each supported chunk template:
RAGFlow offers multiple chunking template to facilitate chunking files of different layouts and ensure semantic integrity. In **Chunking method**, you can choose the default template that suits the layouts and formats of your files. The following table shows the descriptions and the compatible file formats of each supported chunk template:
| **Template** | Description | File format |
|--------------|-----------------------------------------------------------------------|-----------------------------------------------------------------------------------------------|
@ -54,9 +54,9 @@ RAGFlow offers multiple chunking template to facilitate chunking files of differ
| One | Each document is chunked in its entirety (as one). | DOCX, XLSX, XLS (Excel97~2003), PDF, TXT |
| Tag | The knowledge base functions as a tag set for the others. | XLSX, CSV/TXT |
You can also change a file's chunk method on the **Datasets** page.
You can also change a file's chunking method on the **Datasets** page.
![change chunk method](https://github.com/infiniflow/ragflow/assets/93570324/ac116353-2793-42b2-b181-65e7082bed42)
![change chunking method](https://github.com/infiniflow/ragflow/assets/93570324/ac116353-2793-42b2-b181-65e7082bed42)
### Select embedding model
@ -76,13 +76,13 @@ While uploading files directly to a knowledge base seems more convenient, we *hi
### Parse file
File parsing is a crucial topic in knowledge base configuration. The meaning of file parsing in RAGFlow is twofold: chunking files based on file layout and building embedding and full-text (keyword) indexes on these chunks. After having selected the chunk method and embedding model, you can start parsing a file:
File parsing is a crucial topic in knowledge base configuration. The meaning of file parsing in RAGFlow is twofold: chunking files based on file layout and building embedding and full-text (keyword) indexes on these chunks. After having selected the chunking method and embedding model, you can start parsing a file:
![parse file](https://github.com/infiniflow/ragflow/assets/93570324/5311f166-6426-447f-aa1f-bd488f1cfc7b)
- Click the play button next to **UNSTART** to start file parsing.
- Click the red-cross icon and then refresh, if your file parsing stalls for a long time.
- As shown above, RAGFlow allows you to use a different chunk method for a particular file, offering flexibility beyond the default method.
- As shown above, RAGFlow allows you to use a different chunking method for a particular file, offering flexibility beyond the default method.
- As shown above, RAGFlow allows you to enable or disable individual files, offering finer control over knowledge base-based AI chats.
### Intervene with file parsing results

View File

@ -9,7 +9,7 @@ Generate a knowledge graph for your knowledge base.
---
To enhance multi-hop question-answering, RAGFlow adds a knowledge graph construction step between data extraction and indexing, as illustrated below. This step creates additional chunks from existing ones generated by your specified chunk method.
To enhance multi-hop question-answering, RAGFlow adds a knowledge graph construction step between data extraction and indexing, as illustrated below. This step creates additional chunks from existing ones generated by your specified chunking method.
![Image](https://github.com/user-attachments/assets/1ec21d8e-f255-4d65-9918-69b72dfa142b)

View File

@ -67,8 +67,8 @@ It defaults to 0.1, with a maximum limit of 1. A higher **Threshold** means fewe
### Max cluster
The maximum number of clusters to create. Defaults to 108, with a maximum limit of 1024.
The maximum number of clusters to create. Defaults to 64, with a maximum limit of 1024.
### Random seed
A random seed. Click the **+** button to change the seed value.
A random seed. Click **+** to change the seed value.

View File

@ -11,7 +11,7 @@ Conduct a retrieval test on your knowledge base to check whether the intended ch
After your files are uploaded and parsed, it is recommended that you run a retrieval test before proceeding with the chat assistant configuration. Running a retrieval test is *not* an unnecessary or superfluous step at all! Just like fine-tuning a precision instrument, RAGFlow requires careful tuning to deliver optimal question answering performance. Your knowledge base settings, chat assistant configurations, and the specified large and small models can all significantly impact the final results. Running a retrieval test verifies whether the intended chunks can be recovered, allowing you to quickly identify areas for improvement or pinpoint any issue that needs addressing. For instance, when debugging your question answering system, if you know that the correct chunks can be retrieved, you can focus your efforts elsewhere. For example, in issue [#5627](https://github.com/infiniflow/ragflow/issues/5627), the problem was found to be due to the LLM's limitations.
During a retrieval test, chunks created from your specified chunk method are retrieved using a hybrid search. This search combines weighted keyword similarity with either weighted vector cosine similarity or a weighted reranking score, depending on your settings:
During a retrieval test, chunks created from your specified chunking method are retrieved using a hybrid search. This search combines weighted keyword similarity with either weighted vector cosine similarity or a weighted reranking score, depending on your settings:
- If no rerank model is selected, weighted keyword similarity will be combined with weighted vector cosine similarity.
- If a rerank model is selected, weighted keyword similarity will be combined with weighted vector reranking score.

View File

@ -32,7 +32,7 @@ The page rank value must be an integer. Range: [0,100]
If you set the page rank value to a non-integer, say 1.7, it will be rounded down to the nearest integer, which in this case is 1.
:::
## Mechanism
## Scoring mechanism
If you configure a chat assistant's **similarity threshold** to 0.2, only chunks with a hybrid score greater than 0.2 x 100 = 20 will be retrieved and sent to the chat model for content generation. This initial filtering step is crucial for narrowing down relevant information.

View File

@ -42,7 +42,7 @@ As a rule of thumb, consider including the following entries in your tag table:
### Create a tag set
1. Click **+ Create knowledge base** to create a knowledge base.
2. Navigate to the **Configuration** page of the created knowledge base and choose **Tag** as the default chunk method.
2. Navigate to the **Configuration** page of the created knowledge base and choose **Tag** as the default chunking method.
3. Navigate to the **Dataset** page and upload and parse your table file in XLSX, CSV, or TXT formats.
_A tag cloud appears under the **Tag view** section, indicating the tag set is created:_
![Image](https://github.com/user-attachments/assets/abefbcbf-c130-4abe-95e1-267b0d2a0505)