Docs: How to accelerate question answering (#10179)

### What problem does this PR solve?


### Type of change

- [x] Documentation Update
writinwaters
2025-09-19 18:18:46 +08:00
committed by GitHub
parent 6c24ad7966
commit 3f1741c8c6
7 changed files with 89 additions and 23 deletions


@@ -229,18 +229,4 @@ The global variable name for the output of the **Agent** component, which can be
### Why does it take so long for my Agent to respond?
An Agent's response time generally depends on two key factors: the LLM's capabilities and the prompt, the latter reflecting task complexity. When using an Agent, always balance task demands against the LLM's ability. See [How to balance task complexity with an Agent's performance and speed?](#how-to-balance-task-complexity-with-an-agents-performance-and-speed) for details.
## Best practices
### How to balance task complexity with an Agent's performance and speed?
- For simple tasks, such as retrieval, rewriting, formatting, or structured data extraction, use concise prompts, remove planning or reasoning instructions, enforce output length limits, and select smaller or Turbo-class models (see the sketch below). This significantly reduces latency and cost with minimal impact on quality.
- For complex tasks, like multi-step reasoning, cross-document synthesis, or tool-based workflows, maintain or enhance prompts that include planning, reflection, and verification steps.
- In multi-Agent orchestration systems, delegate simple subtasks to sub-Agents using smaller, faster models, and reserve more powerful models for the lead Agent to handle complexity and uncertainty.
:::tip KEY INSIGHT
Focus on minimizing output tokens — through summarization, bullet points, or explicit length limits — as this has far greater impact on reducing latency than optimizing input size.
:::
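To make the first bullet concrete, here is a minimal sketch (not RAGFlow code) of what a fast path for a simple subtask can look like with an OpenAI-compatible Python client. The model name, prompt, and `max_tokens` value are illustrative assumptions; the point is the combination of a concise, planning-free prompt, a smaller model, and a hard cap on output tokens.

```python
from openai import OpenAI

# Illustrative only: assumes an OpenAI-compatible endpoint and that
# OPENAI_API_KEY (or base_url/api_key) is configured in your environment.
client = OpenAI()

document_text = "Invoice #4821, issued 2025-09-01, total due: $1,240.00"

response = client.chat.completions.create(
    model="gpt-4o-mini",  # smaller/Turbo-class model: enough for simple extraction
    messages=[
        # Concise prompt: no planning, reflection, or verification instructions
        {"role": "system", "content": "Extract the invoice number and total. Reply with JSON only."},
        {"role": "user", "content": document_text},
    ],
    max_tokens=100,  # hard output-token cap: the biggest single lever on latency
    temperature=0,   # deterministic output, no exploratory verbosity
)

print(response.choices[0].message.content)
```

For a complex, multi-step task you would do the opposite: keep the planning, reflection, and verification instructions and move to a stronger model, accepting the extra latency.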
See [here](../best_practices/accelerate_agent_question_answering.md) for details.