Miscellaneous updates (#6245)

### What problem does this PR solve?


### Type of change


- [x] Documentation Update
This commit is contained in:
writinwaters
2025-03-18 19:49:06 +08:00
committed by GitHub
parent d16033dd2c
commit f540559c41
20 changed files with 94 additions and 55 deletions

View File

@ -39,18 +39,18 @@ This section covers the following topics:
RAGFlow offers multiple chunking template to facilitate chunking files of different layouts and ensure semantic integrity. In **Chunk method**, you can choose the default template that suits the layouts and formats of your files. The following table shows the descriptions and the compatible file formats of each supported chunk template:
| **Template** | Description | File format |
|--------------|-----------------------------------------------------------------------|------------------------------------------------------------------------------|
| General | Files are consecutively chunked based on a preset chunk token number. | DOCX, XLSX, XLS (Excel97~2003), PPT, PDF, TXT, JPEG, JPG, PNG, TIF, GIF, CSV |
| Q&A | | XLSX, XLS (Excel97~2003), CSV/TXT |
| Manual | | PDF |
| Table | | XLSX, XLS (Excel97~2003), CSV/TXT |
| Paper | | PDF |
| Book | | DOCX, PDF, TXT |
| Laws | | DOCX, PDF, TXT |
| Presentation | | PDF, PPTX |
| Picture | | JPEG, JPG, PNG, TIF, GIF |
| One | The entire document is chunked as one. | DOCX, XLSX, XLS (Excel97~2003), PDF, TXT |
| **Template** | Description | File format |
|--------------|-----------------------------------------------------------------------|-----------------------------------------------------------------------------------------------|
| General | Files are consecutively chunked based on a preset chunk token number. | DOCX, XLSX, XLS (Excel97~2003), PPT, PDF, TXT, JPEG, JPG, PNG, TIF, GIF, CSV, JSON, EML, HTML |
| Q&A | | XLSX, XLS (Excel97~2003), CSV/TXT |
| Manual | | PDF |
| Table | | XLSX, XLS (Excel97~2003), CSV/TXT |
| Paper | | PDF |
| Book | | DOCX, PDF, TXT |
| Laws | | DOCX, PDF, TXT |
| Presentation | | PDF, PPTX |
| Picture | | JPEG, JPG, PNG, TIF, GIF |
| One | The entire document is chunked as one. | DOCX, XLSX, XLS (Excel97~2003), PDF, TXT |
You can also change a file's chunk method on the **Datasets** page.

View File

@ -13,6 +13,10 @@ Retrieval accuracy is the touchstone for a production-ready RAG framework. In ad
To use this feature, ensure you have at least one properly configured tag set, specify the tag set(s) on the **Configuration** page of your knowledge base (dataset), and then re-parse your documents to initiate the auto-tag process. During this process, each chunk in your dataset is compared with every entry in the specified tag set(s), and tags are automatically applied based on similarity.
:::danger IMPORTANT
The auto-tagging feature is *unavailable* on the [Infinity](https://github.com/infiniflow/infinity) document engine.
:::
## Scenarios
Auto-tagging applies in situations where chunks are so similar to each other that the intended chunks cannot be distinguished from the rest. For example, when you have a few chunks about iPhone and a majority about iPhone case or iPhone accessaries, it becomes difficult to retrieve the iPhone-specific chunks without additional information.