Docs: How to accelerate question answering (#10179)
### What problem does this PR solve?

### Type of change

- [x] Documentation Update
@@ -229,18 +229,4 @@ The global variable name for the output of the **Agent** component, which can be
### Why does it take so long for my Agent to respond?
An Agent’s response time generally depends on two key factors: the LLM’s capabilities and the prompt, the latter reflecting task complexity. When using an Agent, you should always balance task demands with the LLM’s ability. See [How to balance task complexity with an Agent's performance and speed?](#how-to-balance-task-complexity-with-an-agents-performance-and-speed) for details.
## Best practices
### How to balance task complexity with an Agent’s performance and speed?
- For simple tasks, such as retrieval, rewriting, formatting, or structured data extraction, use concise prompts, remove planning or reasoning instructions, enforce output length limits, and select smaller or Turbo-class models. This significantly reduces latency and cost with minimal impact on quality.
- For complex tasks, like multi-step reasoning, cross-document synthesis, or tool-based workflows, maintain or enhance prompts that include planning, reflection, and verification steps.
- In multi-Agent orchestration systems, delegate simple subtasks to sub-Agents using smaller, faster models, and reserve more powerful models for the lead Agent to handle complexity and uncertainty, as in the sketch after this list.
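A minimal sketch of this two-tier delegation pattern, assuming an OpenAI-compatible endpoint via the `openai` Python SDK. The model names and the `run_subtask`/`run_lead` helpers are illustrative placeholders, not part of RAGFlow's API:

```python
from openai import OpenAI

client = OpenAI()  # assumes an OpenAI-compatible endpoint; adapt to your provider

# Hypothetical two-tier setup: model names are placeholders.
FAST_MODEL = "gpt-4o-mini"  # sub-Agents: cheap, low-latency
LEAD_MODEL = "gpt-4o"       # lead Agent: handles planning and uncertainty

def run_subtask(instruction: str, text: str) -> str:
    """Delegate a simple, well-scoped subtask (e.g. summarizing) to the fast model."""
    resp = client.chat.completions.create(
        model=FAST_MODEL,
        messages=[
            {"role": "system", "content": instruction},  # concise prompt, no planning steps
            {"role": "user", "content": text},
        ],
        max_tokens=256,  # hard cap on output length keeps latency predictable
    )
    return resp.choices[0].message.content

def run_lead(question: str, evidence: list[str]) -> str:
    """Reserve the stronger model for synthesis across sub-Agent outputs."""
    resp = client.chat.completions.create(
        model=LEAD_MODEL,
        messages=[
            {"role": "system", "content": "Plan, then answer. Verify claims against the evidence."},
            {"role": "user", "content": question + "\n\nEvidence:\n" + "\n".join(evidence)},
        ],
    )
    return resp.choices[0].message.content

# Simple subtasks fan out to the fast model; only the final step pays for the big one.
summaries = [run_subtask("Summarize in 3 bullet points.", doc) for doc in ["...doc 1...", "...doc 2..."]]
print(run_lead("What do these documents agree on?", summaries))
```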
:::tip KEY INSIGHT
Focus on minimizing output tokens — through summarization, bullet points, or explicit length limits — as this has far greater impact on reducing latency than optimizing input size.
:::
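As a concrete illustration of the tip, the two levers are an explicit length limit in the prompt and a hard `max_tokens` cap on generation. A minimal sketch, again assuming an OpenAI-compatible endpoint and a placeholder model name:

```python
from openai import OpenAI

client = OpenAI()  # assumed OpenAI-compatible client; model name below is a placeholder

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        # The prompt steers the model toward short, structured output...
        {"role": "system", "content": "Answer in at most 3 bullet points of 20 words each."},
        {"role": "user", "content": "Why does retrieval-augmented generation reduce hallucinations?"},
    ],
    max_tokens=120,  # ...and max_tokens enforces a hard ceiling on output tokens
)
print(resp.choices[0].message.content)
```

Because output tokens are decoded one at a time while the input is processed in parallel, capping output length bounds the slowest part of the request; trimming the same number of tokens from the input saves far less time.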
See [here](../best_practices/accelerate_agent_question_answering.md) for details.