ragflow/docs/guides/dataset/autokeyword_autoquestion.mdx
writinwaters 157cd8b1b0 Docs: Added auto-keyword auto-question guide (#8113)
2025-06-06 19:27:41 +08:00

---
sidebar_position: 3
slug: /autokeyword_autoquestion
---
# Auto-keyword Auto-question

import APITable from '@site/src/components/APITable';

Use a chat model to generate keywords and questions from the original chunks.

---
When selecting a chunking method, you can also enable auto-keyword or auto-question generation to improve retrieval recall. This feature uses a chat model to produce a specified number of keywords and questions from each created chunk, adding a layer of higher-level information on top of the original content.
:::tip NOTE
Enabling this feature increases document indexing time, as all created chunks will be sent to the chat model for keyword or question generation.
:::
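To get a rough feel for the extra indexing cost, the sketch below counts the additional chat model requests. It assumes one request per chunk for each enabled feature; that assumption and the function name `extra_llm_calls` are illustrative and are not taken from RAGFlow's implementation.

```python
def extra_llm_calls(num_chunks: int, auto_keyword: int, auto_question: int) -> int:
    """Estimate additional chat model requests during indexing.

    Assumption (not from the RAGFlow source): each enabled feature costs
    one request per chunk, regardless of how many items it asks for.
    """
    enabled_features = int(auto_keyword > 0) + int(auto_question > 0)
    return num_chunks * enabled_features


# A document split into 200 chunks with both features enabled.
print(extra_llm_calls(200, auto_keyword=5, auto_question=2))  # 400
# Setting both values to 0 adds no requests.
print(extra_llm_calls(200, auto_keyword=0, auto_question=0))  # 0
```

Under this assumption, the cost grows with the number of chunks rather than with the slider values themselves, which is why disabling a feature entirely saves the most time.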
- **Auto-keyword**
  - **Definition:** The number of additional keywords the LLM generates for each chunk. These keywords supply synonyms for text that tokenizes poorly or for multilingual content, which improves recall in full-text or hybrid retrieval. They can also be used to correct bad cases. Disabling this feature can significantly accelerate parsing.
  - **Common values:**
    - `0`: Disabled;
    - `3`-`5`: Recommended (if a chunk has over a thousand characters, more keywords may be needed);
    - `30`: Maximum. Note that the marginal benefit decreases as the number increases.
- **Auto-question**
  - **Definition:** Generates potential FAQ-style questions for each chunk, so that retrieval matches align more closely with real user queries (Who/What/Why). It can also be used to correct bad cases.
  - **Common values:**
    - `0`: Disabled;
    - `1`-`2`: Commonly used (if a chunk has thousands of characters, more may be needed);
    - `30`: Upper limit (to avoid generating too many at once).
  - **Typical use cases:** Scenarios requiring FAQ retrieval, such as product manuals and policy documents.
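
The recall benefit of auto-keyword can be illustrated with a toy full-text matcher. This is a minimal sketch, not RAGFlow's retrieval code: the tokenizer, the `matches` helper, and the generated keywords are all hypothetical stand-ins.

```python
import re
from typing import Sequence


def tokens(text: str) -> set[str]:
    """Lowercase word tokens; a stand-in for a real full-text analyzer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matches(query: str, chunk_text: str, keywords: Sequence[str] = ()) -> bool:
    """Return True if any query token appears in the chunk text or its keywords."""
    indexed = tokens(chunk_text)
    for keyword in keywords:
        indexed |= tokens(keyword)
    return bool(tokens(query) & indexed)


chunk = "Reset the device by holding the power button for ten seconds."
# Hypothetical keywords a chat model might return for this chunk.
generated = ["reboot", "restart", "factory reset"]

print(matches("how to reboot", chunk))             # False
print(matches("how to reboot", chunk, generated))  # True
```

The query uses a synonym ("reboot") that never appears in the chunk, so it only matches once the generated keywords are indexed alongside the original text.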
## Configuration
On the **Configuration** page of your knowledge base, you will find the Auto-keyword and Auto-question sliders under **Page rank**.
:::tip NOTE
The Auto-keyword and Auto-question values must be integers. If you enter a non-integer value, say 1.7, it is rounded down to the nearest integer, which in this case is 1.
:::
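The rounding rule in the note above can be sketched as follows. The helper name `normalize_setting` is hypothetical, and clamping out-of-range input to the 0-30 range is an assumption about how such values might be handled; only the round-down behavior and the limit of 30 come from this guide.

```python
import math


def normalize_setting(value: float, upper_limit: int = 30) -> int:
    """Round a slider value down to an integer and keep it within 0..upper_limit.

    The clamping step is an assumption; the guide only states that
    non-integers are rounded down and that 30 is the maximum.
    """
    floored = math.floor(value)
    return max(0, min(floored, upper_limit))


print(normalize_setting(1.7))  # 1
print(normalize_setting(5))    # 5
print(normalize_setting(42))   # 30
```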
## Best practices
If you are uncertain how to set auto-keyword or auto-question values, here are some best practices gathered from our community:
```mdx-code-block
<APITable>
```
| Use cases or typical scenarios                                          | Document volume/length          | Auto_keyword (0-30)   | Auto_question (0-30)  |
|-------------------------------------------------------------------------|---------------------------------|-----------------------|-----------------------|
| 1. Internal Process Guidance for Employee Handbook                      | Small, under 10 pages           | 0                     | 0                     |
| 2. Customer Service FAQ Hot Questions                                   | Medium, 10-100 pages            | 3-7                   | 1-3                   |
| 3. Technical Whitepapers: Development Standards, Protocol Explanations | Large, over 100 pages           | 2-4                   | 1-2                   |
| 4. Contracts / Regulations / Legal Clause Retrieval                     | Large, over 50 pages            | 2-5                   | 0-1                   |
| 5. Multi-repository Layered New Documents + Old Archive                 | Many                            | Adjust as appropriate | Adjust as appropriate |
| 6. Social Media Comment Pool: Multilingual & Mixed Spelling             | Very large volume of short text | 8-12                  | 0                     |
| 7. Operational Logs for DevOps Troubleshooting                          | Very large volume of short text | 3-6                   | 0                     |
| 8. Marketing Asset Library: Multilingual Product Descriptions           | Medium                          | 6-10                  | 1-2                   |
| 9. Training Courseware / eBooks                                         | Large                           | 2-5                   | 1-2                   |
| 10. Maintenance Manual: Equipment Diagrams + Steps                      | Medium                          | 3-7                   | 1-2                   |
```mdx-code-block
</APITable>
```