From 2369be72448f6ba01e00f996e747787e21ea1ae4 Mon Sep 17 00:00:00 2001
From: buua436
Date: Tue, 23 Dec 2025 15:57:55 +0800
Subject: [PATCH] Refactor: enhance next_step prompt (#12117)

### What problem does this PR solve?

change: enhance next_step prompt

### Type of change

- [x] Refactoring
---
 rag/prompts/next_step.md | 62 ++++++++++++++++++++++++++++++++++------
 1 file changed, 54 insertions(+), 8 deletions(-)

diff --git a/rag/prompts/next_step.md b/rag/prompts/next_step.md
index 3e6b608fc..e84aa8f4e 100644
--- a/rag/prompts/next_step.md
+++ b/rag/prompts/next_step.md
@@ -8,9 +8,9 @@ Your job is:
 {{ task_analysis }}
 
 # ========== TOOLS (JSON-Schema) ==========
-You may invoke only the tools listed below.
-Return a JSON array of objects in which item is with exactly two top-level keys:
-• "name": the tool to call
+You may invoke only the tools listed below.
+Return a JSON array of objects in which item is with exactly two top-level keys:
+• "name": the tool to call
 • "arguments": an object whose keys/values satisfy the schema
 
 {{ desc }}
@@ -82,11 +82,57 @@ If you encounter issues:
 
 ⚠️ Any output that is not valid JSON or that contains extra fields will be rejected.
 
-# ========== REASONING & REFLECTION ==========
-You may think privately (not shown to the user) before producing each JSON object.
-Internal guideline:
-1. **Reason**: Analyse the user question; decide which tools (if any) are needed.
-2. **Act**: Emit the JSON object to call the tool.
+# ========== PRIVATE REASONING & REFLECTION ==========
+You may think privately inside `` tags.
+This content will NOT be shown to the user.
+
+## Step 1: Core Reasoning
+- Analyze the task requirements
+- Decide whether tools are required
+- Decide if parallel execution is appropriate
+
+## Step 2: Structured Reflection (MANDATORY before `complete_task`)
+
+### Context
+- Goal: {{ task_analysis }}
+- Executed tool calls so far (if any): reflect from conversation history
+
+### Task Complexity Assessment
+Evaluate the task along these dimensions:
+
+- Scope Breadth: Single-step (1) | Multi-step (2) | Multi-domain (3)
+- Data Dependency: Self-contained (1) | External inputs (2) | Multiple sources (3)
+- Decision Points: Linear (1) | Few branches (2) | Complex logic (3)
+- Risk Level: Low (1) | Medium (2) | High (3)
+
+Compute the **Complexity Score (4–12)**.
+
+### Reflection Depth Control
+- 4–5: Brief sanity check
+- 6–8: Check completeness + risks
+- 9–12: Full reflection with alternatives
+
+### Reflection Checklist
+- Goal alignment: Is the objective truly satisfied?
+- Step completion: Any planned step missing?
+- Information adequacy: Is evidence sufficient?
+- Errors or uncertainty: Any low-confidence result?
+- Tool misuse risk: Wrong tool / missing tool?
+
+### Decision Gate
+Ask yourself explicitly:
+> “If I stop now and call `complete_task`, would a downstream agent or user reasonably say something is missing or wrong?”
+
+If YES → continue with tools
+If NO → safe to call `complete_task`
+
+---
+
+# ========== FINAL ACTION ==========
+After reflection, emit ONLY ONE of the following:
+- A JSON array of tool calls
+- OR a single `complete_task` call
+
 Today is {{ today }}. Remember that success in answering questions accurately is paramount - take all necessary steps to ensure your answer is correct.
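
Reviewer note: the prompt asks the model to sum four 1-to-3 ratings into a Complexity Score and then pick a reflection depth from that score. The sketch below spells out that arithmetic so the bands (4–5, 6–8, 9–12) are easy to check. It is only an illustration, assuming the score is a plain unweighted sum; `ComplexityRatings` and `reflection_depth` are hypothetical names and are not part of this patch or of the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComplexityRatings:
    """Hypothetical container for the four dimensions named in the prompt."""
    scope_breadth: int     # 1 = single-step, 2 = multi-step, 3 = multi-domain
    data_dependency: int   # 1 = self-contained, 2 = external inputs, 3 = multiple sources
    decision_points: int   # 1 = linear, 2 = few branches, 3 = complex logic
    risk_level: int        # 1 = low, 2 = medium, 3 = high

    def score(self) -> int:
        """Return the unweighted sum of the ratings (assumed; range 4..12)."""
        ratings = (
            self.scope_breadth,
            self.data_dependency,
            self.decision_points,
            self.risk_level,
        )
        for rating in ratings:
            if rating not in (1, 2, 3):
                raise ValueError(f"each rating must be 1, 2 or 3, got {rating!r}")
        return sum(ratings)


def reflection_depth(score: int) -> str:
    """Map a complexity score to the depth bands listed in the prompt."""
    if not 4 <= score <= 12:
        raise ValueError(f"score must be between 4 and 12, got {score}")
    if score <= 5:
        return "Brief sanity check"
    if score <= 8:
        return "Check completeness + risks"
    return "Full reflection with alternatives"


if __name__ == "__main__":
    # A multi-step task with external inputs, few branches and medium risk.
    ratings = ComplexityRatings(2, 2, 2, 2)
    print(ratings.score(), "->", reflection_depth(ratings.score()))
```

Because every dimension is at least 1 and at most 3, the sum can never fall outside 4 to 12, which matches the range stated in the prompt; the three bands cover that range without gaps or overlap.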