From 2369be72448f6ba01e00f996e747787e21ea1ae4 Mon Sep 17 00:00:00 2001
From: buua436 <sz_buua@foxmail.com>
Date: Tue, 23 Dec 2025 15:57:55 +0800
Subject: [PATCH] Refactor: enhance next_step prompt (#12117)

### What problem does this PR solve?

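Enhance the `next_step` prompt (`rag/prompts/next_step.md`):

- Strip trailing whitespace in the TOOLS section.
- Replace the two-step REASONING & REFLECTION guideline with a PRIVATE REASONING & REFLECTION section: core reasoning first, then a structured reflection (complexity score, reflection depth, checklist, decision gate) that is mandatory before `complete_task`.
- Add a FINAL ACTION section that restricts the output to either a JSON array of tool calls or a single `complete_task` call.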

### Type of change

- [x] Refactoring
---
 rag/prompts/next_step.md | 62 ++++++++++++++++++++++++++++++++++------
 1 file changed, 54 insertions(+), 8 deletions(-)
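
Note on the complexity rubric: the prompt asks the model to rate four dimensions from 1 to 3 and compute a score in the range 4–12, then pick a reflection depth from that score. The sketch below shows one reading of that arithmetic. It assumes the score is the plain sum of the four ratings (the patch states only the range), and the names `DIMENSIONS`, `complexity_score`, and `reflection_depth` are hypothetical helpers for illustration, not code in this repository.

```python
# Hypothetical illustration of the rubric in next_step.md; not part of the repo.
# Assumption: the Complexity Score is the sum of the four 1-3 ratings,
# which is the only simple rule that yields the stated 4-12 range.
DIMENSIONS = ("scope_breadth", "data_dependency", "decision_points", "risk_level")


def complexity_score(ratings: dict) -> int:
    """Sum the four dimension ratings after checking each is 1, 2, or 3."""
    for name in DIMENSIONS:
        if ratings[name] not in (1, 2, 3):
            raise ValueError(f"{name} must be 1, 2, or 3, got {ratings[name]!r}")
    return sum(ratings[name] for name in DIMENSIONS)


def reflection_depth(score: int) -> str:
    """Map a score to the depth bands listed under 'Reflection Depth Control'."""
    if not 4 <= score <= 12:
        raise ValueError(f"score must be between 4 and 12, got {score}")
    if score <= 5:
        return "brief sanity check"
    if score <= 8:
        return "check completeness + risks"
    return "full reflection with alternatives"


# Example: a multi-step task with external inputs, linear logic, and low risk.
example = {
    "scope_breadth": 2,
    "data_dependency": 2,
    "decision_points": 1,
    "risk_level": 1,
}
score = complexity_score(example)
depth = reflection_depth(score)
```

In this example the ratings add up to 6, which falls in the 6–8 band. The model performs this scoring inside its private reasoning; nothing in the patch adds code that enforces it.
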
diff --git a/rag/prompts/next_step.md b/rag/prompts/next_step.md
index 3e6b608fc..e84aa8f4e 100644
--- a/rag/prompts/next_step.md
+++ b/rag/prompts/next_step.md
@@ -8,9 +8,9 @@ Your job is:
 {{ task_analysis }}
 
 # ==========  TOOLS (JSON-Schema) ==========
-You may invoke only the tools listed below.  
-Return a JSON array of objects in which item is with exactly two top-level keys:  
-• "name": the tool to call  
+You may invoke only the tools listed below.
+Return a JSON array of objects in which each item has exactly two top-level keys:
+• "name": the tool to call
 • "arguments": an object whose keys/values satisfy the schema
 
 {{ desc }}
@@ -82,11 +82,57 @@ If you encounter issues:
 
 ⚠️ Any output that is not valid JSON or that contains extra fields will be rejected.
 
-# ==========  REASONING & REFLECTION ==========
-You may think privately (not shown to the user) before producing each JSON object.  
-Internal guideline:
-1. **Reason**: Analyse the user question; decide which tools (if any) are needed.
-2. **Act**: Emit the JSON object to call the tool.
+# ========== PRIVATE REASONING & REFLECTION ==========
+You may think privately inside `<think>` tags.
+This content will NOT be shown to the user.
+
+## Step 1: Core Reasoning
+- Analyze the task requirements
+- Decide whether tools are required
+- Decide if parallel execution is appropriate
+
+## Step 2: Structured Reflection (MANDATORY before `complete_task`)
+
+### Context
+- Goal: {{ task_analysis }}
+- Executed tool calls so far (if any): review them from the conversation history
+
+### Task Complexity Assessment
+Evaluate the task along these dimensions:
+
+- Scope Breadth: Single-step (1) | Multi-step (2) | Multi-domain (3)
+- Data Dependency: Self-contained (1) | External inputs (2) | Multiple sources (3)
+- Decision Points: Linear (1) | Few branches (2) | Complex logic (3)
+- Risk Level: Low (1) | Medium (2) | High (3)
+
+Compute the **Complexity Score (4–12)** as the sum of the four dimension scores.
+
+### Reflection Depth Control
+- 4–5: Brief sanity check
+- 6–8: Check completeness + risks
+- 9–12: Full reflection with alternatives
+
+### Reflection Checklist
+- Goal alignment: Is the objective truly satisfied?
+- Step completion: Any planned step missing?
+- Information adequacy: Is evidence sufficient?
+- Errors or uncertainty: Any low-confidence result?
+- Tool misuse risk: Wrong tool / missing tool?
+
+### Decision Gate
+Ask yourself explicitly:
+> “If I stop now and call `complete_task`, would a downstream agent or user reasonably say something is missing or wrong?”
+
+- If YES → continue with tools
+- If NO → safe to call `complete_task`
+
+---
+
+# ========== FINAL ACTION ==========
+After reflection, emit ONLY ONE of the following:
+- A JSON array of tool calls
+- OR a single `complete_task` call
+
 
 Today is {{ today }}. Remember that success in answering questions accurately is paramount - take all necessary steps to ensure your answer is correct.