From 2471a6e115de34f1ea6e1ccb657802751427cbae Mon Sep 17 00:00:00 2001
From: writinwaters <93570324+writinwaters@users.noreply.github.com>
Date: Wed, 2 Apr 2025 13:56:55 +0800
Subject: [PATCH] Updated max_tokens descriptions (#6751)

### What problem does this PR solve?

#6721

### Type of change

- [x] Documentation Update

---
 .../agent_component_reference/categorize.mdx |  5 +---
 .../agent_component_reference/keyword.mdx    |  5 +---
 .../agent_component_reference/rewrite.mdx    |  5 +---
 docs/guides/chat/start_chat.md               | 23 +++++++++++++++----
 docs/guides/team/join_or_leave_team.md       |  2 +-
 docs/release_notes.md                        |  7 ++++++
 6 files changed, 30 insertions(+), 17 deletions(-)

diff --git a/docs/guides/agent/agent_component_reference/categorize.mdx b/docs/guides/agent/agent_component_reference/categorize.mdx
index a034e4622..095132515 100644
--- a/docs/guides/agent/agent_component_reference/categorize.mdx
+++ b/docs/guides/agent/agent_component_reference/categorize.mdx
@@ -33,7 +33,7 @@ Click the dropdown menu of **Model** to show the model configuration window.
 - **Model**: The chat model to use.
   - Ensure you set the chat model correctly on the **Model providers** page.
   - You can use different models for different components to increase flexibility or improve overall performance.
-- **Preset configurations**: A shortcut to **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty** settings, indicating the freedom level of the model. From **Improvise**, **Precise**, to **Balance**, each preset configuration corresponds to a unique combination of **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty**.
+- **Freedom**: A shortcut to **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty** settings, indicating the freedom level of the model. Each of the **Improvise**, **Precise**, and **Balance** presets corresponds to a unique combination of **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty**.
   This parameter has three options:
   - **Improvise**: Produces more creative responses.
   - **Precise**: (Default) Produces more conservative responses.
   - **Balance**: A middle ground between **Improvise** and **Precise**.
@@ -52,9 +52,6 @@ Click the dropdown menu of **Model** to show the model configuration window.
 - **Frequency penalty**: Discourages the model from repeating the same words or phrases too frequently in the generated text.
   - A higher **frequency penalty** value results in the model being more conservative in its use of repeated tokens.
   - Defaults to 0.7.
-- **Max tokens**: Sets the maximum length of the model's output, measured in the number of tokens.
-  - Defaults to 512.
-  - If disabled, you lift the maximum token limit, allowing the model to determine the number of tokens in its responses.
 
 :::tip NOTE
 - It is not necessary to stick with the same model for all components. If a specific model is not performing well for a particular task, consider using a different one.
diff --git a/docs/guides/agent/agent_component_reference/keyword.mdx b/docs/guides/agent/agent_component_reference/keyword.mdx
index 0b0c55504..38161f704 100644
--- a/docs/guides/agent/agent_component_reference/keyword.mdx
+++ b/docs/guides/agent/agent_component_reference/keyword.mdx
@@ -34,7 +34,7 @@ Click the dropdown menu of **Model** to show the model configuration window.
 - **Model**: The chat model to use.
   - Ensure you set the chat model correctly on the **Model providers** page.
   - You can use different models for different components to increase flexibility or improve overall performance.
-- **Preset configurations**: A shortcut to **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty** settings, indicating the freedom level of the model. From **Improvise**, **Precise**, to **Balance**, each preset configuration corresponds to a unique combination of **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty**.
+- **Freedom**: A shortcut to **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty** settings, indicating the freedom level of the model. Each of the **Improvise**, **Precise**, and **Balance** presets corresponds to a unique combination of **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty**.
   This parameter has three options:
   - **Improvise**: Produces more creative responses.
   - **Precise**: (Default) Produces more conservative responses.
   - **Balance**: A middle ground between **Improvise** and **Precise**.
@@ -53,9 +53,6 @@ Click the dropdown menu of **Model** to show the model configuration window.
 - **Frequency penalty**: Discourages the model from repeating the same words or phrases too frequently in the generated text.
   - A higher **frequency penalty** value results in the model being more conservative in its use of repeated tokens.
   - Defaults to 0.7.
-- **Max tokens**: Sets the maximum length of the model's output, measured in the number of tokens.
-  - Defaults to 512.
-  - If disabled, you lift the maximum token limit, allowing the model to determine the number of tokens in its responses.
 
 :::tip NOTE
 - It is not necessary to stick with the same model for all components. If a specific model is not performing well for a particular task, consider using a different one.
diff --git a/docs/guides/agent/agent_component_reference/rewrite.mdx b/docs/guides/agent/agent_component_reference/rewrite.mdx
index b89c5a9a7..283432ade 100644
--- a/docs/guides/agent/agent_component_reference/rewrite.mdx
+++ b/docs/guides/agent/agent_component_reference/rewrite.mdx
@@ -32,7 +32,7 @@ Click the dropdown menu of **Model** to show the model configuration window.
 - **Model**: The chat model to use.
   - Ensure you set the chat model correctly on the **Model providers** page.
   - You can use different models for different components to increase flexibility or improve overall performance.
-- **Preset configurations**: A shortcut to **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty** settings, indicating the freedom level of the model. From **Improvise**, **Precise**, to **Balance**, each preset configuration corresponds to a unique combination of **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty**.
+- **Freedom**: A shortcut to **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty** settings, indicating the freedom level of the model. Each of the **Improvise**, **Precise**, and **Balance** presets corresponds to a unique combination of **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty**.
   This parameter has three options:
   - **Improvise**: Produces more creative responses.
   - **Precise**: (Default) Produces more conservative responses.
   - **Balance**: A middle ground between **Improvise** and **Precise**.
@@ -51,9 +51,6 @@ Click the dropdown menu of **Model** to show the model configuration window.
 - **Frequency penalty**: Discourages the model from repeating the same words or phrases too frequently in the generated text.
   - A higher **frequency penalty** value results in the model being more conservative in its use of repeated tokens.
   - Defaults to 0.7.
-- **Max tokens**: Sets the maximum length of the model's output, measured in the number of tokens.
-  - Defaults to 512.
-  - If disabled, you lift the maximum token limit, allowing the model to determine the number of tokens in its responses.
 
 :::tip NOTE
 - It is not necessary to stick with the same model for all components. If a specific model is not performing well for a particular task, consider using a different one.
diff --git a/docs/guides/chat/start_chat.md b/docs/guides/chat/start_chat.md
index 0a5bc23eb..1711fb5a3 100644
--- a/docs/guides/chat/start_chat.md
+++ b/docs/guides/chat/start_chat.md
@@ -48,10 +48,25 @@ You start an AI conversation by creating an assistant.
 4. Update **Model Setting**:
    - In **Model**: you select the chat model. Though you have selected the default chat model in **System Model Settings**, RAGFlow allows you to choose an alternative chat model for your dialogue.
-   - **Preset configurations** refers to the level that the LLM improvises. From **Improvise**, **Precise**, to **Balance**, each preset configuration corresponds to a unique combination of **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty**.
-   - **Temperature**: Level of the prediction randomness of the LLM. The higher the value, the more creative the LLM is.
-   - **Top P** is also known as "nucleus sampling". See [here](https://en.wikipedia.org/wiki/Top-p_sampling) for more information.
-   - **Max Tokens**: The maximum length of the LLM's responses. Note that the responses may be curtailed if this value is set too low.
+   - **Freedom**: A shortcut to **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty** settings, indicating the freedom level of the model. Each of the **Improvise**, **Precise**, and **Balance** presets corresponds to a unique combination of **Temperature**, **Top P**, **Presence penalty**, and **Frequency penalty**.
+     This parameter has three options:
+     - **Improvise**: Produces more creative responses.
+     - **Precise**: (Default) Produces more conservative responses.
+     - **Balance**: A middle ground between **Improvise** and **Precise**.
+   - **Temperature**: The randomness level of the model's output.
+     Defaults to 0.1.
+     - Lower values lead to more deterministic and predictable outputs.
+     - Higher values lead to more creative and varied outputs.
+     - A temperature of zero results in the same output for the same prompt.
+   - **Top P**: Nucleus sampling.
+     - Reduces the likelihood of generating repetitive or unnatural text by setting a threshold *P* and restricting sampling to the smallest set of most likely tokens whose cumulative probability exceeds *P*.
+     - Defaults to 0.3.
+   - **Presence penalty**: Encourages the model to include a more diverse range of tokens in the response.
+     - A higher **presence penalty** value results in the model being more likely to generate tokens that have not yet appeared in the generated text.
+     - Defaults to 0.4.
+   - **Frequency penalty**: Discourages the model from repeating the same words or phrases too frequently in the generated text.
+     - A higher **frequency penalty** value results in the model being more conservative in its use of repeated tokens.
+     - Defaults to 0.7.
 5.
 Now, let's start the show:
diff --git a/docs/guides/team/join_or_leave_team.md b/docs/guides/team/join_or_leave_team.md
index 9bb0d4100..f2aeebb3b 100644
--- a/docs/guides/team/join_or_leave_team.md
+++ b/docs/guides/team/join_or_leave_team.md
@@ -39,4 +39,4 @@ _After accepting the team invite, you should be able to view and update the team
 
 ## Leave a joined team
 
-![Image](https://github.com/user-attachments/assets/4e4c6971-131b-490b-85d8-b362e0811b86)
\ No newline at end of file
+![quit](https://github.com/user-attachments/assets/a9d812a9-382d-4913-83b9-d72cb5e7c953)
\ No newline at end of file
diff --git a/docs/release_notes.md b/docs/release_notes.md
index ea9ee5e0d..454fdfbcc 100644
--- a/docs/release_notes.md
+++ b/docs/release_notes.md
@@ -11,6 +11,13 @@ Key features, improvements and bug fixes in the latest releases.
 
 Released on March 13, 2025.
 
+### Compatibility changes
+
+- Removes the **Max_tokens** setting from **Chat configuration**.
+- Removes the **Max_tokens** setting from the **Generate**, **Rewrite**, **Categorize**, and **Keyword** agent components.
+
+From this release onwards, if RAGFlow's responses still appear truncated, check the **Max_tokens** setting of your model provider.
+
 ### Improvements
 
 - Adds OpenAI-compatible APIs.
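With **Max_tokens** gone from RAGFlow's UI, the output-length cap is now whatever the model provider enforces. Below is a minimal sketch of setting that cap at the request level with an OpenAI-compatible client, using RAGFlow's documented defaults for the sampling parameters; the endpoint, API key, and model name are placeholders, not values from this PR.

```python
from openai import OpenAI

# Placeholder endpoint, key, and model name -- substitute your provider's values.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="your-api-key")

response = client.chat.completions.create(
    model="your-chat-model",
    messages=[{"role": "user", "content": "Explain presence penalty in one sentence."}],
    temperature=0.1,        # RAGFlow default: randomness of the output
    top_p=0.3,              # RAGFlow default: nucleus sampling threshold
    presence_penalty=0.4,   # RAGFlow default: favors tokens not yet generated
    frequency_penalty=0.7,  # RAGFlow default: discourages repeated tokens
    max_tokens=512,         # output cap now lives with the provider, not RAGFlow
)
print(response.choices[0].message.content)
```

If responses come back truncated even without this parameter, the provider likely imposes its own server-side ceiling, which is exactly what the compatibility note above asks you to check.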
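To make the updated **Top P** description concrete, here is a toy illustration of nucleus sampling. This is not RAGFlow code; the function name and the four-token distribution are invented for the example.

```python
import numpy as np

def nucleus_sample(probs: np.ndarray, top_p: float = 0.3, rng=None) -> int:
    """Sample a token index from the smallest set of most likely tokens
    whose cumulative probability exceeds top_p (nucleus sampling)."""
    rng = rng or np.random.default_rng()
    order = np.argsort(probs)[::-1]           # token indices, most probable first
    cumulative = np.cumsum(probs[order])
    # Keep tokens up to and including the first one that pushes the
    # cumulative probability past top_p; discard the long tail.
    cutoff = int(np.searchsorted(cumulative, top_p)) + 1
    nucleus = order[:cutoff]
    renormalized = probs[nucleus] / probs[nucleus].sum()
    return int(rng.choice(nucleus, p=renormalized))

# With top_p=0.3, only the 0.5-probability token survives, so the output is 0.
print(nucleus_sample(np.array([0.5, 0.3, 0.15, 0.05]), top_p=0.3))
```

A low threshold such as RAGFlow's default of 0.3 keeps only the most probable tokens, which is why raising **Top P** produces more varied but potentially less natural text.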