0321 chunkmethods (#6520)

### What problem does this PR solve?

#6061 

### Type of change


- [x] Documentation Update
This commit is contained in:
writinwaters
2025-03-26 09:03:18 +08:00
committed by GitHub
parent bf483fdf02
commit d17970ebd0
14 changed files with 73 additions and 61 deletions

View File

@ -50,6 +50,10 @@ If your agent's **Begin** component takes a variable, you *cannot* embed it into
- **boolean**: Requires the user to toggle between on and off.
- **Optional**: A toggle indicating whether the variable is optional.
:::danger IMPORTAN
If you set the key type as **file**, ensure the token count of the uploaded file does not exceed your model provider's maximum token limit; otherwise, the plain text in your file will be truncated and incomplete.
:::
## Examples
As mentioned earlier, the **Begin** component is indispensable for an agent. Still, you can take a look at our three-step interpreter agent template, where the **Begin** component takes two global variables:
@ -64,7 +68,7 @@ As mentioned earlier, the **Begin** component is indispensable for an agent. Sti
### Is the uploaded file in a knowledge base?
No. Files uploaded to an agent as input are not stored in a knowledge base and will not be chunked using RAGFlow's built-in chunk methods. However, RAGFlow's built-in OSR, DLR, and TSR models will still be applied to process the document.
No. Files uploaded to an agent as input are not stored in a knowledge base and hence will not be processed using RAGFlow's built-in OCR, DLR or TSR models, or chunked using RAGFlow's built-in chunk methods.
### How to upload a webpage or file from a URL?
@ -74,5 +78,8 @@ If you set the type of a variable as **file**, your users will be able to upload
### File size limit for an uploaded file
The maximum file size for each uploaded file is determined by the variable `MAX_CONTENT_LENGTH` in `/docker/.env`. It defaults to 128 MB. If you change the default file size, ensure you also update the value of `client_max_body_size` in `/docker/nginx/nginx.conf` accordingly.
There is no *specific* file size limit for a file uploaded to an agent. However, note that model providers typically have a default or explicit maximum token setting, which can range from 8196 to 128k: The plain text part of the uploaded file will be passed in as the key value, but if the file's token count exceeds this limit, the string will be truncated and incomplete.
:::tip NOTE
The variables `MAX_CONTENT_LENGTH` in `/docker/.env` and `client_max_body_size` in `/docker/nginx/nginx.conf` set the file size limit for each upload to a knowledge base or **File Management**. These settings DO NOT apply in this scenario.
:::

View File

@ -56,7 +56,11 @@ Click the dropdown menu of **Model** to show the model configuration window.
Typically, you use the system prompt to describe the task for the LLM, specify how it should respond, and outline other miscellaneous requirements. We do not plan to elaborate on this topic, as it can be as extensive as prompt engineering. However, please be aware that the system prompt is often used in conjunction with keys (variables), which serve as various data inputs for the LLM.
Keys in a system prompt should be enclosed in curly braces. Below is a prompt excerpt of a **Generate** component from the **Interpreter** template (component ID: **Reflect**):
:::danger IMPORTANT
A **Generate** component relies on keys (variables) to specify its data inputs. Its immediate upstream component is *not* necessarily its data input, and the arrows in the workflow indicate *only* the processing sequence. Keys in a **Generate** component are used in conjunction with the system prompt to specify data inputs for the LLM. Use a forward slash `/` or the **(x)** button to show the keys to use.
:::
Below is a prompt excerpt of a **Generate** component from the **Interpreter** template (component ID: **Reflect**):
```text
Your task is to read a source text and a translation to {target_lang}, and give constructive suggestions to improve the translation. The source text and initial translation, delimited by XML tags <SOURCE_TEXT></SOURCE_TEXT> and <TRANSLATION></TRANSLATION>, are as follows:
@ -76,11 +80,6 @@ When writing suggestions, pay attention to whether there are ways to improve the
Where `{source_text}` and `{target_lang}` are global variables defined by the **Begin** component, while `{translation_1}` is the output of another **Generate** component with the component ID **Translate directly**.
:::danger IMPORTANT
A **Generate** component relies on keys (variables) to specify its data inputs. Its immediate upstream component is *not* necessarily its data input, and the arrows in the workflow indicate *only* the processing sequence. Keys in a **Generate** component are used in conjunction with the system prompt to specify data inputs for the LLM. Use a forward slash `/` to show the keys to use.
:::
### Cite
This toggle sets whether to cite the original text as reference.

View File

@ -68,6 +68,10 @@ In a knowledge graph, a community is a cluster of entities linked by relationshi
_A **Knowledge graph** entry appears under **Configuration** once a knowledge graph is created._
3. Click **Knowledge graph** to view the details of the generated graph.
4. To use the created knowledge graph, do either of the following:
- In your **Chat Configuration** dialogue, click the **Assistant Setting** tab to add the corresponding knowledge base(s) and click the **Prompt Engine** tab to switch on the **Use knowledge graph** toggle.
- If you are using an agent, click the **Retrieval** agent component to specify the knowledge base(s) and switch on the **Use knowledge graph** toggle.
## Frequently asked questions

View File

@ -9,6 +9,22 @@ A complete reference for RAGFlow's RESTful API. Before proceeding, please ensure
---
## ERROR CODES
---
| Code | Message | Description |
|------|-----------------------|----------------------------|
| 400 | Bad Request | Invalid request parameters |
| 401 | Unauthorized | Unauthorized access |
| 403 | Forbidden | Access denied |
| 404 | Not Found | Resource not found |
| 500 | Internal Server Error | Server internal error |
| 1001 | Invalid Chunk ID | Invalid Chunk ID |
| 1002 | Chunk Update Failed | Chunk update failed |
---
## OpenAI-Compatible API
---
@ -531,24 +547,6 @@ Failure:
---
## Error Codes
---
| Code | Message | Description |
| ---- | --------------------- | -------------------------- |
| 400 | Bad Request | Invalid request parameters |
| 401 | Unauthorized | Unauthorized access |
| 403 | Forbidden | Access denied |
| 404 | Not Found | Resource not found |
| 500 | Internal Server Error | Server internal error |
| 1001 | Invalid Chunk ID | Invalid Chunk ID |
| 1002 | Chunk Update Failed | Chunk update failed |
---
---
## FILE MANAGEMENT WITHIN DATASET
---
@ -1771,7 +1769,7 @@ Lists chat assistants.
#### Request
- Method: GET
- URL: `/api/v1/chats?page={page}&page_size={page_size}&orderby={orderby}&desc={desc}&name={dataset_name}&id={dataset_id}`
- URL: `/api/v1/chats?page={page}&page_size={page_size}&orderby={orderby}&desc={desc}&name={chat_name}&id={chat_id}`
- Headers:
- `'Authorization: Bearer <YOUR_API_KEY>'`
@ -1779,7 +1777,7 @@ Lists chat assistants.
```bash
curl --request GET \
--url http://{address}/api/v1/chats?page={page}&page_size={page_size}&orderby={orderby}&desc={desc}&name={dataset_name}&id={dataset_id} \
--url http://{address}/api/v1/chats?page={page}&page_size={page_size}&orderby={orderby}&desc={desc}&name={chat_name}&id={chat_id} \
--header 'Authorization: Bearer <YOUR_API_KEY>'
```

View File

@ -18,6 +18,22 @@ pip install ragflow-sdk
---
## ERROR CODES
---
| Code | Message | Description |
|------|----------------------|-----------------------------|
| 400 | Bad Request | Invalid request parameters |
| 401 | Unauthorized | Unauthorized access |
| 403 | Forbidden | Access denied |
| 404 | Not Found | Resource not found |
| 500 | Internal Server Error| Server internal error |
| 1001 | Invalid Chunk ID | Invalid Chunk ID |
| 1002 | Chunk Update Failed | Chunk update failed |
---
## OpenAI-Compatible API
---
@ -317,23 +333,6 @@ dataset = rag_object.list_datasets(name="kb_name")
dataset.update({"embedding_model":"BAAI/bge-zh-v1.5", "chunk_method":"manual"})
```
---
## Error Codes
---
| Code | Message | Description |
|------|---------|-------------|
| 400 | Bad Request | Invalid request parameters |
| 401 | Unauthorized | Unauthorized access |
| 403 | Forbidden | Access denied |
| 404 | Not Found | Resource not found |
| 500 | Internal Server Error | Server internal error |
| 1001 | Invalid Chunk ID | Invalid Chunk ID |
| 1002 | Chunk Update Failed | Chunk update failed |
---
## FILE MANAGEMENT WITHIN DATASET