Docs: How to accelerate question answering (#10179)
### What problem does this PR solve?

### Type of change

- [x] Documentation Update
@@ -229,18 +229,4 @@ The global variable name for the output of the **Agent** component, which can be
### Why does it take so long for my Agent to respond?
An Agent’s response time generally depends on two key factors: the LLM’s capabilities and the prompt, the latter reflecting task complexity. When using an Agent, you should always balance task demands with the LLM’s ability. See [How to balance task complexity with an Agent's performance and speed?](#how-to-balance-task-complexity-with-an-agents-performance-and-speed) for details.
## Best practices
### How to balance task complexity with an Agent’s performance and speed?
- For simple tasks, such as retrieval, rewriting, formatting, or structured data extraction, use concise prompts, remove planning or reasoning instructions, enforce output length limits, and select smaller or Turbo-class models. This significantly reduces latency and cost with minimal impact on quality.
- For complex tasks, like multi-step reasoning, cross-document synthesis, or tool-based workflows, maintain or enhance prompts that include planning, reflection, and verification steps.
- In multi-Agent orchestration systems, delegate simple subtasks to sub-Agents using smaller, faster models, and reserve more powerful models for the lead Agent to handle complexity and uncertainty, as in the sketch after this list.
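A minimal sketch of this two-tier delegation pattern, assuming an OpenAI-compatible endpoint via the `openai` Python SDK. The model names and the `run_subtask`/`run_lead` helpers are illustrative placeholders, not part of RAGFlow's API:

```python
from openai import OpenAI

client = OpenAI()  # assumes an OpenAI-compatible endpoint; adapt to your provider

# Hypothetical two-tier setup: model names are placeholders.
FAST_MODEL = "gpt-4o-mini"  # sub-Agents: cheap, low-latency
LEAD_MODEL = "gpt-4o"       # lead Agent: handles planning and uncertainty

def run_subtask(instruction: str, text: str) -> str:
    """Delegate a simple, well-scoped subtask (e.g. summarizing) to the fast model."""
    resp = client.chat.completions.create(
        model=FAST_MODEL,
        messages=[
            {"role": "system", "content": instruction},  # concise prompt, no planning steps
            {"role": "user", "content": text},
        ],
        max_tokens=256,  # hard cap on output length keeps latency predictable
    )
    return resp.choices[0].message.content

def run_lead(question: str, evidence: list[str]) -> str:
    """Reserve the stronger model for synthesis across sub-Agent outputs."""
    resp = client.chat.completions.create(
        model=LEAD_MODEL,
        messages=[
            {"role": "system", "content": "Plan, then answer. Verify claims against the evidence."},
            {"role": "user", "content": question + "\n\nEvidence:\n" + "\n".join(evidence)},
        ],
    )
    return resp.choices[0].message.content

# Simple subtasks fan out to the fast model; only the final step pays for the big one.
summaries = [run_subtask("Summarize in 3 bullet points.", doc) for doc in ["...doc 1...", "...doc 2..."]]
print(run_lead("What do these documents agree on?", summaries))
```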
:::tip KEY INSIGHT
Focus on minimizing output tokens — through summarization, bullet points, or explicit length limits — as this has far greater impact on reducing latency than optimizing input size.
:::
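As a concrete illustration of the tip, the two levers are an explicit length limit in the prompt and a hard `max_tokens` cap on generation. A minimal sketch, again assuming an OpenAI-compatible endpoint and a placeholder model name:

```python
from openai import OpenAI

client = OpenAI()  # assumed OpenAI-compatible client; model name below is a placeholder

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        # The prompt steers the model toward short, structured output...
        {"role": "system", "content": "Answer in at most 3 bullet points of 20 words each."},
        {"role": "user", "content": "Why does retrieval-augmented generation reduce hallucinations?"},
    ],
    max_tokens=120,  # ...and max_tokens enforces a hard ceiling on output tokens
)
print(resp.choices[0].message.content)
```

Because output tokens are decoded one at a time while the input is processed in parallel, capping output length bounds the slowest part of the request; trimming the same number of tokens from the input saves far less time.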
See [here](../best_practices/accelerate_agent_question_answering.md) for details.