Refactor: enhance next_step prompt (#12117)

### What problem does this PR solve?

change: enhance next_step prompt

### Type of change

- [x] Refactoring
2026-02-08 11:35:04 +08:00 · 2025-12-23 15:57:55 +08:00
parent 00bb6fbd28
commit 2369be7244
1 changed file with 54 additions and 8 deletions
--- a/rag/prompts/next_step.md
+++ b/rag/prompts/next_step.md
@@ -8,9 +8,9 @@ Your job is:
 {{ task_analysis }}

 # ==========  TOOLS (JSON-Schema) ==========
-You may invoke only the tools listed below.  
-Return a JSON array of objects in which item is with exactly two top-level keys:  
-• "name": the tool to call  
+You may invoke only the tools listed below.
+Return a JSON array of objects in which item is with exactly two top-level keys:
+• "name": the tool to call
 • "arguments": an object whose keys/values satisfy the schema

 {{ desc }}
@@ -82,11 +82,57 @@ If you encounter issues:

 ⚠️ Any output that is not valid JSON or that contains extra fields will be rejected.

-# ==========  REASONING & REFLECTION ==========
-You may think privately (not shown to the user) before producing each JSON object.  
-Internal guideline:
-1. **Reason**: Analyse the user question; decide which tools (if any) are needed.
-2. **Act**: Emit the JSON object to call the tool.
+# ========== PRIVATE REASONING & REFLECTION ==========
+You may think privately inside `<think>` tags.
+This content will NOT be shown to the user.
+
+## Step 1: Core Reasoning
+- Analyze the task requirements
+- Decide whether tools are required
+- Decide if parallel execution is appropriate
+
+## Step 2: Structured Reflection (MANDATORY before `complete_task`)
+
+### Context
+- Goal: {{ task_analysis }}
+- Executed tool calls so far (if any): reflect from conversation history
+
+### Task Complexity Assessment
+Evaluate the task along these dimensions:
+
+- Scope Breadth: Single-step (1) | Multi-step (2) | Multi-domain (3)
+- Data Dependency: Self-contained (1) | External inputs (2) | Multiple sources (3)
+- Decision Points: Linear (1) | Few branches (2) | Complex logic (3)
+- Risk Level: Low (1) | Medium (2) | High (3)
+
+Compute the **Complexity Score (4–12)**.
+
+### Reflection Depth Control
+- 4–5: Brief sanity check
+- 6–8: Check completeness + risks
+- 9–12: Full reflection with alternatives
+
+### Reflection Checklist
+- Goal alignment: Is the objective truly satisfied?
+- Step completion: Any planned step missing?
+- Information adequacy: Is evidence sufficient?
+- Errors or uncertainty: Any low-confidence result?
+- Tool misuse risk: Wrong tool / missing tool?
+
+### Decision Gate
+Ask yourself explicitly:
+> “If I stop now and call `complete_task`, would a downstream agent or user reasonably say something is missing or wrong?”
+
+If YES → continue with tools
+If NO → safe to call `complete_task`
+
+---
+
+# ========== FINAL ACTION ==========
+After reflection, emit ONLY ONE of the following:
+- A JSON array of tool calls
+- OR a single `complete_task` call
+

 Today is {{ today }}. Remember that success in answering questions accurately is paramount - take all necessary steps to ensure your answer is correct.
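
The "Task Complexity Assessment" and "Reflection Depth Control" sections added in this change describe simple arithmetic: four dimensions rated 1 to 3, summed to a score of 4 to 12, which then selects a reflection depth. The prompt asks the model to do this in its private reasoning, and this PR adds no code for it. The following is only a minimal sketch of that arithmetic for reviewers; `ComplexityAssessment` and `reflection_depth` are hypothetical names, not part of the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComplexityAssessment:
    """Hypothetical container for the four ratings named in the prompt (each 1-3)."""

    scope_breadth: int    # Single-step (1) | Multi-step (2) | Multi-domain (3)
    data_dependency: int  # Self-contained (1) | External inputs (2) | Multiple sources (3)
    decision_points: int  # Linear (1) | Few branches (2) | Complex logic (3)
    risk_level: int       # Low (1) | Medium (2) | High (3)

    def score(self) -> int:
        """Sum the four ratings into the Complexity Score (4-12)."""
        ratings = (
            self.scope_breadth,
            self.data_dependency,
            self.decision_points,
            self.risk_level,
        )
        for rating in ratings:
            if rating not in (1, 2, 3):
                raise ValueError(f"each rating must be 1, 2, or 3, got {rating!r}")
        return sum(ratings)


def reflection_depth(score: int) -> str:
    """Map a Complexity Score to the depth bands listed in the prompt."""
    if not 4 <= score <= 12:
        raise ValueError(f"score must be between 4 and 12, got {score!r}")
    if score <= 5:
        return "Brief sanity check"
    if score <= 8:
        return "Check completeness + risks"
    return "Full reflection with alternatives"


# Example: multi-step task, multiple sources, few branches, low risk.
assessment = ComplexityAssessment(
    scope_breadth=2, data_dependency=3, decision_points=2, risk_level=1
)
print(assessment.score(), "->", reflection_depth(assessment.score()))
```

In this example the ratings sum to 8, which falls in the 6 to 8 band, so the prompt would call for a completeness and risk check rather than a full reflection with alternatives.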