Feat: Redesign and refactor agent module (#9113)

### What problem does this PR solve?

#9082 #6365

<u>**WARNING: this change is not compatible with the older version of the `Agent` module, which means that agents built with older versions can no longer work.**</u>

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
Commit d9fe279dde (parent 07e37560fc), authored by Kevin Hu on 2025-07-30 19:41:09 +08:00 and committed via GitHub.
124 changed files with 7744 additions and 18226 deletions.

rag/prompts/__init__.py Normal file

@@ -0,0 +1,6 @@
# Re-export every public name from rag.prompts.prompts at the package level.
from . import prompts

__all__ = [name for name in dir(prompts) if not name.startswith('_')]
globals().update({name: getattr(prompts, name) for name in __all__})
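With the shim above, everything defined in `rag/prompts/prompts.py` becomes importable from the package root. A minimal usage sketch (the imported names are functions that appear later in this diff):

```python
# Sketch only: these names are defined in rag/prompts/prompts.py below and
# re-exported here, so callers no longer need the full module path.
from rag.prompts import citation_prompt, kb_prompt

system_suffix = citation_prompt()  # renders citation_prompt.md through Jinja2
```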

rag/prompts/analyze_task_system.md Normal file

@@ -0,0 +1,8 @@
Your responsibility is to execute assigned tasks to a high standard. Please:
1. Carefully analyze the task requirements.
2. Develop a reasonable execution plan.
3. Execute step-by-step and document the reasoning process.
4. Provide clear and accurate results.
If difficulties are encountered, clearly state the problem and explore alternative approaches.

rag/prompts/analyze_task_user.md Normal file

@@ -0,0 +1,20 @@
Please analyze the following task:
Task: {{ task }}
Context: {{ context }}
**Analysis Requirements:**
1. Is it just small talk? (If yes, no further plan or analysis is needed.)
2. What is the core objective of the task?
3. What is the complexity level of the task?
4. What types of specialized skills are required?
5. Does the task need to be decomposed into subtasks? (If yes, propose the subtask structure)
6. How can you tell that the task or its subtasks cannot succeed even after a few rounds of interaction?
7. What are the expected success criteria?
**Available Sub-Agents and Their Specializations:**
{{ tools_desc }}
Provide a detailed analysis of the task based on the above requirements.
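The double-braced placeholders above are Jinja2 variables; `rag/prompts/prompts.py` (later in this diff) renders them before the text is sent to the model. A minimal, self-contained sketch of that rendering step, with placeholder values standing in for the real template and inputs:

```python
import jinja2

# Sketch: render the task-analysis template the same way prompts.py does.
# TEMPLATE_TEXT stands in for the markdown shown above.
TEMPLATE_TEXT = "Task: {{ task }}\nContext: {{ context }}\n{{ tools_desc }}"

env = jinja2.Environment(autoescape=False, trim_blocks=True, lstrip_blocks=True)
user_prompt = env.from_string(TEMPLATE_TEXT).render(
    task="Summarize last quarter's sales figures",  # placeholder task
    context="",                                     # prior conversation, if any
    tools_desc="(JSON-Schema descriptions of the available sub-agents)",
)
print(user_prompt)
```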

rag/prompts/citation_plus.md Normal file

@@ -0,0 +1,13 @@
You are an agent that adds correct citations to the text given by the user.
You are given a piece of text that was generated based on the provided sources, each of which is marked with an [ID:<ID>] tag.
However, those sources are not yet cited in the text.
Your task is to enhance user trust by generating correct, appropriate citations for this report.
{{ example }}
<context>
{{ sources }}
</context>

rag/prompts/citation_prompt.md

@@ -1,46 +1,108 @@
-## Citation Requirements
-Based on the provided document or chat history, add citations to the input text using the format specified later.
-- Use a uniform citation format such as [ID:i] [ID:j], where "i" and "j" are document IDs enclosed in square brackets. Separate multiple IDs with spaces (e.g., [ID:0] [ID:1]).
-- Citation markers must be placed at the end of a sentence, separated by a space from the final punctuation (e.g., period, question mark).
-- A maximum of 4 citations are allowed per sentence.
-- DO NOT insert citations if the content is not from retrieved chunks.
-- DO NOT use standalone Document IDs (e.g., #ID#).
-- Citations MUST always follow the [ID:i] format.
-- STRICTLY prohibit the use of strikethrough symbols (e.g., ~~) or any other non-standard formatting syntax.
-- Any violation of the above rules — including incorrect formatting, prohibited styles, or unsupported citations — will result in no citation being added for that sentence.
+# Citation Requirements:
+---
+## Technical Rules:
+- Use format: [ID:i] or [ID:i] [ID:j] for multiple sources
+- Place citations at the end of sentences, before punctuation
+- Maximum 4 citations per sentence
+- DO NOT cite content not from <context></context>
+- DO NOT modify whitespace or original text
+- STRICTLY prohibit non-standard formatting (~~, etc.)
-## Example START
+## What MUST Be Cited:
+1. **Quantitative data**: Numbers, percentages, statistics, measurements
+2. **Temporal claims**: Dates, timeframes, sequences of events
+3. **Causal relationships**: Claims about cause and effect
+4. **Comparative statements**: Rankings, comparisons, superlatives
+5. **Technical definitions**: Specialized terms, concepts, methodologies
+6. **Direct attributions**: What someone said, did, or believes
+7. **Predictions/forecasts**: Future projections, trend analyses
+8. **Controversial claims**: Disputed facts, minority opinions
-<SYSTEM>: Here is the knowledge base:
+## What Should NOT Be Cited:
+- Common knowledge (e.g., "The sun rises in the east")
+- Transitional phrases
+- General introductions
+- Your own analysis or synthesis (unless directly from source)
-Document: Elon Musk Breaks Silence on Crypto, Warns Against Dogecoin ...
-URL: https://blockworks.co/news/elon-musk-crypto-dogecoin
-ID: 0
-The Tesla co-founder advised against going all-in on dogecoin, but Elon Musk said its still his favorite crypto...
+# Comprehensive Examples:
-Document: Elon Musk's Dogecoin tweet sparks social media frenzy
-ID: 1
-Musk said he is 'willing to serve' D.O.G.E. shorthand for Dogecoin.
+## Example 1: Data and Statistics
+<context>
+ID: 45
+└── Content: The global smartphone market grew by 7.8% in Q3 2024, with Samsung holding 20.6% market share and Apple at 15.8%.
-Document: Causal effect of Elon Musk tweets on Dogecoin price
-ID: 2
-If you think of Dogecoin — the cryptocurrency based on a meme — you cant help but also think of Elon Musk...
+ID: 46
+└── Content: 5G adoption reached 1.5 billion users worldwide by October 2024.
+</context>
-Document: Elon Musk's Tweet Ignites Dogecoin's Future In Public Services
-ID: 3
-The market is heating up after Elon Musk's announcement about Dogecoin. Is this a new era for crypto?...
+USER: How is the smartphone market performing?
-The above is the knowledge base.
+ASSISTANT:
+The smartphone industry is showing strong recovery. The global smartphone market grew by 7.8% in Q3 2024 [ID:45]. Samsung leads with 20.6% market share, followed by Apple at 15.8% [ID:45]. This growth is partly driven by 5G technology, which reached 1.5 billion users worldwide by October 2024 [ID:46].
-<USER>: What's Elon's view on dogecoin?
+## Example 2: Causal Relationships and Predictions
+<context>
+ID: 78
+└── Content: Rising interest rates have led to a 15% decline in housing starts. Economists predict this trend will continue through 2025.
-<ASSISTANT>:
-Musk has consistently expressed his fondness for Dogecoin, often citing its humor and the inclusion of dogs in its branding. He has referred to it as his favorite cryptocurrency [ID:0] [ID:1].
+ID: 79
+└── Content: The Federal Reserve's aggressive monetary policy is the primary driver behind the real estate slowdown.
+</context>
-Recently, Musk has hinted at potential future roles for Dogecoin. His tweets have sparked speculation about Dogecoin's potential integration into public services [ID:3].
+USER: What's happening in the real estate market?
-Overall, while Musk enjoys Dogecoin and often promotes it, he also warns against over-investing in it, reflecting both his personal amusement and caution regarding its speculative nature.
+ASSISTANT:
+The real estate market is experiencing significant challenges. Rising interest rates have led to a 15% decline in housing starts [ID:78]. The Federal Reserve's aggressive monetary policy is the primary driver behind this real estate slowdown [ID:79]. Economists predict this trend will continue through 2025 [ID:78]. These factors suggest a prolonged period of market adjustment ahead.
-## Example END
+## Example 3: Technical Concepts and Comparisons
+<context>
+ID: 101
+└── Content: Quantum computing uses quantum bits (qubits) that can exist in superposition, unlike classical bits which are either 0 or 1.
+ID: 102
+└── Content: Current quantum computers can solve certain problems 158 million times faster than classical supercomputers.
+</context>
+USER: Explain quantum computing advantages.
+ASSISTANT:
+Quantum computing represents a fundamental shift in computational capability. Unlike classical computers, quantum computing uses quantum bits (qubits) that can exist in superposition, rather than being limited to just 0 or 1 [ID:101]. This unique property enables extraordinary performance gains. Current quantum computers can solve certain problems 158 million times faster than classical supercomputers [ID:102]. This technology promises to revolutionize fields requiring complex calculations.
+## Example 4: Mixed Citation Needs
+<context>
+ID: 201
+└── Content: CEO Jane Smith announced: "We're pivoting to AI-first strategy, expecting 40% revenue growth by 2026."
+ID: 202
+└── Content: Industry analysts remain skeptical, with Morgan Stanley rating the stock as "underperform."
+ID: 203
+└── Content: The company's AI division currently represents only 5% of total revenue.
+</context>
+USER: What's the company's new strategy?
+ASSISTANT:
+The company is undergoing a major strategic transformation. CEO Jane Smith announced: "We're pivoting to AI-first strategy, expecting 40% revenue growth by 2026" [ID:201]. However, this ambitious goal faces skepticism, as industry analysts remain doubtful, with Morgan Stanley rating the stock as "underperform" [ID:202]. The challenge is significant given that the company's AI division currently represents only 5% of total revenue [ID:203]. Success will depend on execution of this bold vision.
+## Example 5: When NOT to Cite
+<context>
+ID: 301
+└── Content: Paris is the capital of France. The Eiffel Tower was completed in 1889.
+</context>
+USER: Tell me about Paris.
+ASSISTANT:
+Paris is the capital of France. It's known for its rich history, culture, and architecture. The Eiffel Tower was completed in 1889 [ID:301]. The city attracts millions of tourists annually. Paris remains one of the world's most visited destinations.
+(Note: Only the specific date needs citation, not common knowledge about Paris)
+--- Examples END ---
+REMEMBER:
+- Cite FACTS, not opinions or transitions
+- Each citation supports the ENTIRE sentence
+- When in doubt, ask: "Would a fact-checker need to verify this?"
+- Place citations at sentence end, before punctuation
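Since downstream code has to locate these markers in the generated answer, a small illustration of matching the `[ID:i]` format may help; this is a sketch for clarity, not code from this PR:

```python
import re

# Sketch: pull citation IDs in the [ID:i] format described above out of a sentence.
CITATION_RE = re.compile(r"\[ID:(\d+)\]")

sentence = "Samsung leads with 20.6% market share, followed by Apple at 15.8% [ID:45]."
print([int(i) for i in CITATION_RE.findall(sentence)])  # -> [45]
```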

rag/prompts/next_step.md Normal file

@@ -0,0 +1,63 @@
You are an expert Planning Agent tasked with solving problems efficiently through structured plans.
Your job is:
1. Based on the task analysis, choose the right tools to execute.
2. Track progress and adapt the plan (tool calls) when necessary.
3. Use `complete_task` when there is no further step to take with tools (either all necessary steps are done, or there is little hope of success).
# ========== TASK ANALYSIS =============
{{ task_analisys }}
# ========== TOOLS (JSON-Schema) ==========
You may invoke only the tools listed below.
Return a JSON array of objects, each with exactly two top-level keys:
• "name": the tool to call
• "arguments": an object whose keys/values satisfy the schema
{{ desc }}
# ========== RESPONSE FORMAT ==========
**When you need a tool**
Return ONLY the JSON (no additional keys, no commentary; end with `<|stop|>`), for example:
[{
"name": "<tool_name1>",
"arguments": { /* tool arguments matching its schema */ }
},{
"name": "<tool_name2>",
"arguments": { /* tool arguments matching its schema */ }
}...]<|stop|>
**When you are certain the task is solved OR no further information can be obtained**
Return ONLY:
[{
"name": "complete_task",
"arguments": { "answer": "<final answer text>" }
}]<|stop|>
<verification_steps>
Before providing a final answer:
1. Double-check all gathered information
2. Verify calculations and logic
3. Ensure answer matches exactly what was asked
4. Confirm answer format meets requirements
5. Run additional verification if confidence is not 100%
</verification_steps>
<error_handling>
If you encounter issues:
1. Try alternative approaches before giving up
2. Use different tools or combinations of tools
3. Break complex problems into simpler sub-tasks
4. Verify intermediate results frequently
5. Never return "I cannot answer" without exhausting all options
</error_handling>
⚠️ Any output that is not valid JSON or that contains extra fields will be rejected.
# ========== REASONING & REFLECTION ==========
You may think privately (not shown to the user) before producing each JSON object.
Internal guideline:
1. **Reason**: Analyse the user question; decide which tools (if any) are needed.
2. **Act**: Emit the JSON object to call the tool.
Today is {{ today }}. Remember that success in answering questions accurately is paramount - take all necessary steps to ensure your answer is correct.
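The contract above is plain JSON terminated by `<|stop|>`. For illustration only (the PR's own parsing lives in the agent module), a reply following this format could be consumed roughly like so:

```python
import json_repair

# Sketch: parse a planner reply that follows the response format above.
reply = '[{"name": "complete_task", "arguments": {"answer": "42"}}]<|stop|>'
calls = json_repair.loads(reply.split("<|stop|>")[0])

for call in calls:
    if call["name"] == "complete_task":
        print("final answer:", call["arguments"]["answer"])
    else:
        # otherwise dispatch call["name"] with call["arguments"] to the matching tool
        ...
```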

rag/prompts/prompt_template.py Normal file

@@ -0,0 +1,20 @@
import os
PROMPT_DIR = os.path.dirname(__file__)
_loaded_prompts = {}
def load_prompt(name: str) -> str:
if name in _loaded_prompts:
return _loaded_prompts[name]
path = os.path.join(PROMPT_DIR, f"{name}.md")
if not os.path.isfile(path):
raise FileNotFoundError(f"Prompt file '{name}.md' not found in prompts/ directory.")
with open(path, "r", encoding="utf-8") as f:
content = f.read().strip()
_loaded_prompts[name] = content
return content
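For reference, this loader is what `rag/prompts/prompts.py` (next file) uses to pull each markdown template by name; repeated calls are served from the `_loaded_prompts` cache:

```python
from rag.prompts.prompt_template import load_prompt

# Usage as in rag/prompts/prompts.py below: each <name>.md in this directory
# is read once, cached in _loaded_prompts, and later rendered with Jinja2.
NEXT_STEP = load_prompt("next_step")
CITATION_PROMPT_TEMPLATE = load_prompt("citation_prompt")
```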

rag/prompts/prompts.py Normal file

@@ -0,0 +1,415 @@
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import datetime
import json
import logging
import re
from copy import deepcopy
from typing import Tuple
import jinja2
import json_repair
from api.utils import hash_str2int
from rag.prompts.prompt_template import load_prompt
from rag.settings import TAG_FLD
from rag.utils import encoder, num_tokens_from_string
STOP_TOKEN="<|STOP|>"
COMPLETE_TASK="complete_task"
def get_value(d, k1, k2):
return d.get(k1, d.get(k2))
def chunks_format(reference):
return [
{
"id": get_value(chunk, "chunk_id", "id"),
"content": get_value(chunk, "content", "content_with_weight"),
"document_id": get_value(chunk, "doc_id", "document_id"),
"document_name": get_value(chunk, "docnm_kwd", "document_name"),
"dataset_id": get_value(chunk, "kb_id", "dataset_id"),
"image_id": get_value(chunk, "image_id", "img_id"),
"positions": get_value(chunk, "positions", "position_int"),
"url": chunk.get("url"),
"similarity": chunk.get("similarity"),
"vector_similarity": chunk.get("vector_similarity"),
"term_similarity": chunk.get("term_similarity"),
"doc_type": chunk.get("doc_type_kwd"),
}
for chunk in reference.get("chunks", [])
]
def message_fit_in(msg, max_length=4000):
def count():
nonlocal msg
tks_cnts = []
for m in msg:
tks_cnts.append({"role": m["role"], "count": num_tokens_from_string(m["content"])})
total = 0
for m in tks_cnts:
total += m["count"]
return total
c = count()
if c < max_length:
return c, msg
msg_ = [m for m in msg if m["role"] == "system"]
if len(msg) > 1:
msg_.append(msg[-1])
msg = msg_
c = count()
if c < max_length:
return c, msg
ll = num_tokens_from_string(msg_[0]["content"])
ll2 = num_tokens_from_string(msg_[-1]["content"])
if ll / (ll + ll2) > 0.8:
m = msg_[0]["content"]
m = encoder.decode(encoder.encode(m)[: max_length - ll2])
msg[0]["content"] = m
return max_length, msg
m = msg_[-1]["content"]
m = encoder.decode(encoder.encode(m)[: max_length - ll2])
msg[-1]["content"] = m
return max_length, msg
def kb_prompt(kbinfos, max_tokens, hash_id=False):
from api.db.services.document_service import DocumentService
knowledges = [get_value(ck, "content", "content_with_weight") for ck in kbinfos["chunks"]]
kwlg_len = len(knowledges)
used_token_count = 0
chunks_num = 0
for i, c in enumerate(knowledges):
if not c:
continue
used_token_count += num_tokens_from_string(c)
chunks_num += 1
if max_tokens * 0.97 < used_token_count:
knowledges = knowledges[:i]
logging.warning(f"Not all the retrieval into prompt: {len(knowledges)}/{kwlg_len}")
break
docs = DocumentService.get_by_ids([get_value(ck, "doc_id", "document_id") for ck in kbinfos["chunks"][:chunks_num]])
docs = {d.id: d.meta_fields for d in docs}
def draw_node(k, line):
if not line:
return ""
return f"\n├── {k}: " + re.sub(r"\n+", " ", line, flags=re.DOTALL)
knowledges = []
for i, ck in enumerate(kbinfos["chunks"][:chunks_num]):
cnt = "\nID: {}".format(i if not hash_id else hash_str2int(get_value(ck, "id", "chunk_id"), 100))
cnt += draw_node("Title", get_value(ck, "docnm_kwd", "document_name"))
cnt += draw_node("URL", ck['url']) if "url" in ck else ""
for k, v in docs.get(get_value(ck, "doc_id", "document_id"), {}).items():
cnt += draw_node(k, v)
cnt += "\n└── Content:\n"
cnt += get_value(ck, "content", "content_with_weight")
knowledges.append(cnt)
return knowledges
CITATION_PROMPT_TEMPLATE = load_prompt("citation_prompt")
CITATION_PLUS_TEMPLATE = load_prompt("citation_plus")
CONTENT_TAGGING_PROMPT_TEMPLATE = load_prompt("content_tagging_prompt")
CROSS_LANGUAGES_SYS_PROMPT_TEMPLATE = load_prompt("cross_languages_sys_prompt")
CROSS_LANGUAGES_USER_PROMPT_TEMPLATE = load_prompt("cross_languages_user_prompt")
FULL_QUESTION_PROMPT_TEMPLATE = load_prompt("full_question_prompt")
KEYWORD_PROMPT_TEMPLATE = load_prompt("keyword_prompt")
QUESTION_PROMPT_TEMPLATE = load_prompt("question_prompt")
VISION_LLM_DESCRIBE_PROMPT = load_prompt("vision_llm_describe_prompt")
VISION_LLM_FIGURE_DESCRIBE_PROMPT = load_prompt("vision_llm_figure_describe_prompt")
ANALYZE_TASK_SYSTEM = load_prompt("analyze_task_system")
ANALYZE_TASK_USER = load_prompt("analyze_task_user")
NEXT_STEP = load_prompt("next_step")
REFLECT = load_prompt("reflect")
SUMMARY4MEMORY = load_prompt("summary4memory")
RANK_MEMORY = load_prompt("rank_memory")
PROMPT_JINJA_ENV = jinja2.Environment(autoescape=False, trim_blocks=True, lstrip_blocks=True)
def citation_prompt() -> str:
template = PROMPT_JINJA_ENV.from_string(CITATION_PROMPT_TEMPLATE)
return template.render()
def citation_plus(sources: str) -> str:
template = PROMPT_JINJA_ENV.from_string(CITATION_PLUS_TEMPLATE)
return template.render(example=citation_prompt(), sources=sources)
def keyword_extraction(chat_mdl, content, topn=3):
template = PROMPT_JINJA_ENV.from_string(KEYWORD_PROMPT_TEMPLATE)
rendered_prompt = template.render(content=content, topn=topn)
msg = [{"role": "system", "content": rendered_prompt}, {"role": "user", "content": "Output: "}]
_, msg = message_fit_in(msg, chat_mdl.max_length)
kwd = chat_mdl.chat(rendered_prompt, msg[1:], {"temperature": 0.2})
if isinstance(kwd, tuple):
kwd = kwd[0]
kwd = re.sub(r"^.*</think>", "", kwd, flags=re.DOTALL)
if kwd.find("**ERROR**") >= 0:
return ""
return kwd
def question_proposal(chat_mdl, content, topn=3):
template = PROMPT_JINJA_ENV.from_string(QUESTION_PROMPT_TEMPLATE)
rendered_prompt = template.render(content=content, topn=topn)
msg = [{"role": "system", "content": rendered_prompt}, {"role": "user", "content": "Output: "}]
_, msg = message_fit_in(msg, chat_mdl.max_length)
kwd = chat_mdl.chat(rendered_prompt, msg[1:], {"temperature": 0.2})
if isinstance(kwd, tuple):
kwd = kwd[0]
kwd = re.sub(r"^.*</think>", "", kwd, flags=re.DOTALL)
if kwd.find("**ERROR**") >= 0:
return ""
return kwd
def full_question(tenant_id=None, llm_id=None, messages=[], language=None, chat_mdl=None):
from api.db import LLMType
from api.db.services.llm_service import LLMBundle
from api.db.services.llm_service import TenantLLMService
if not chat_mdl:
if TenantLLMService.llm_id2llm_type(llm_id) == "image2text":
chat_mdl = LLMBundle(tenant_id, LLMType.IMAGE2TEXT, llm_id)
else:
chat_mdl = LLMBundle(tenant_id, LLMType.CHAT, llm_id)
conv = []
for m in messages:
if m["role"] not in ["user", "assistant"]:
continue
conv.append("{}: {}".format(m["role"].upper(), m["content"]))
conversation = "\n".join(conv)
today = datetime.date.today().isoformat()
yesterday = (datetime.date.today() - datetime.timedelta(days=1)).isoformat()
tomorrow = (datetime.date.today() + datetime.timedelta(days=1)).isoformat()
template = PROMPT_JINJA_ENV.from_string(FULL_QUESTION_PROMPT_TEMPLATE)
rendered_prompt = template.render(
today=today,
yesterday=yesterday,
tomorrow=tomorrow,
conversation=conversation,
language=language,
)
ans = chat_mdl.chat(rendered_prompt, [{"role": "user", "content": "Output: "}])
ans = re.sub(r"^.*</think>", "", ans, flags=re.DOTALL)
return ans if ans.find("**ERROR**") < 0 else messages[-1]["content"]
def cross_languages(tenant_id, llm_id, query, languages=[]):
from api.db import LLMType
from api.db.services.llm_service import LLMBundle
from api.db.services.llm_service import TenantLLMService
if llm_id and TenantLLMService.llm_id2llm_type(llm_id) == "image2text":
chat_mdl = LLMBundle(tenant_id, LLMType.IMAGE2TEXT, llm_id)
else:
chat_mdl = LLMBundle(tenant_id, LLMType.CHAT, llm_id)
rendered_sys_prompt = PROMPT_JINJA_ENV.from_string(CROSS_LANGUAGES_SYS_PROMPT_TEMPLATE).render()
rendered_user_prompt = PROMPT_JINJA_ENV.from_string(CROSS_LANGUAGES_USER_PROMPT_TEMPLATE).render(query=query, languages=languages)
ans = chat_mdl.chat(rendered_sys_prompt, [{"role": "user", "content": rendered_user_prompt}], {"temperature": 0.2})
ans = re.sub(r"^.*</think>", "", ans, flags=re.DOTALL)
if ans.find("**ERROR**") >= 0:
return query
return "\n".join([a for a in re.sub(r"(^Output:|\n+)", "", ans, flags=re.DOTALL).split("===") if a.strip()])
def content_tagging(chat_mdl, content, all_tags, examples, topn=3):
template = PROMPT_JINJA_ENV.from_string(CONTENT_TAGGING_PROMPT_TEMPLATE)
for ex in examples:
ex["tags_json"] = json.dumps(ex[TAG_FLD], indent=2, ensure_ascii=False)
rendered_prompt = template.render(
topn=topn,
all_tags=all_tags,
examples=examples,
content=content,
)
msg = [{"role": "system", "content": rendered_prompt}, {"role": "user", "content": "Output: "}]
_, msg = message_fit_in(msg, chat_mdl.max_length)
kwd = chat_mdl.chat(rendered_prompt, msg[1:], {"temperature": 0.5})
if isinstance(kwd, tuple):
kwd = kwd[0]
kwd = re.sub(r"^.*</think>", "", kwd, flags=re.DOTALL)
if kwd.find("**ERROR**") >= 0:
raise Exception(kwd)
try:
obj = json_repair.loads(kwd)
except json_repair.JSONDecodeError:
try:
result = kwd.replace(rendered_prompt[:-1], "").replace("user", "").replace("model", "").strip()
result = "{" + result.split("{")[1].split("}")[0] + "}"
obj = json_repair.loads(result)
except Exception as e:
logging.exception(f"JSON parsing error: {result} -> {e}")
raise e
res = {}
for k, v in obj.items():
try:
if int(v) > 0:
res[str(k)] = int(v)
except Exception:
pass
return res
def vision_llm_describe_prompt(page=None) -> str:
template = PROMPT_JINJA_ENV.from_string(VISION_LLM_DESCRIBE_PROMPT)
return template.render(page=page)
def vision_llm_figure_describe_prompt() -> str:
template = PROMPT_JINJA_ENV.from_string(VISION_LLM_FIGURE_DESCRIBE_PROMPT)
return template.render()
def tool_schema(tools_description: list[dict], complete_task=False):
if not tools_description:
return ""
desc = {}
if complete_task:
desc[COMPLETE_TASK] = {
"type": "function",
"function": {
"name": COMPLETE_TASK,
"description": "When you have the final answer and are ready to complete the task, call this function with your answer",
"parameters": {
"type": "object",
"properties": {"answer":{"type":"string", "description": "The final answer to the user's question"}},
"required": ["answer"]
}
}
}
for tool in tools_description:
desc[tool["function"]["name"]] = tool
return "\n\n".join([f"## {i+1}. {fnm}\n{json.dumps(des, ensure_ascii=False, indent=4)}" for i, (fnm, des) in enumerate(desc.items())])
def form_history(history, limit=-6):
context = ""
for h in history[limit:]:
if h["role"] == "system":
continue
role = "USER"
if h["role"].upper()!= role:
role = "AGENT"
context += f"\n{role}: {h['content'][:2048] + ('...' if len(h['content'])>2048 else '')}"
return context
def analyze_task(chat_mdl, task_name, tools_description: list[dict]):
tools_desc = tool_schema(tools_description)
context = ""
template = PROMPT_JINJA_ENV.from_string(ANALYZE_TASK_USER)
kwd = chat_mdl.chat(ANALYZE_TASK_SYSTEM,[{"role": "user", "content": template.render(task=task_name, context=context, tools_desc=tools_desc)}], {})
if isinstance(kwd, tuple):
kwd = kwd[0]
kwd = re.sub(r"^.*</think>", "", kwd, flags=re.DOTALL)
if kwd.find("**ERROR**") >= 0:
return ""
return kwd
def next_step(chat_mdl, history:list, tools_description: list[dict], task_desc):
if not tools_description:
return ""
desc = tool_schema(tools_description)
template = PROMPT_JINJA_ENV.from_string(NEXT_STEP)
user_prompt = "\nWhat's the next tool to call? If ready OR IMPOSSIBLE TO BE READY, then call `complete_task`."
hist = deepcopy(history)
if hist[-1]["role"] == "user":
hist[-1]["content"] += user_prompt
else:
hist.append({"role": "user", "content": user_prompt})
json_str = chat_mdl.chat(template.render(task_analisys=task_desc, desc=desc, today=datetime.datetime.now().strftime("%Y-%m-%d")),
hist[1:], stop=["<|stop|>"])
tk_cnt = num_tokens_from_string(json_str)
json_str = re.sub(r"^.*</think>", "", json_str, flags=re.DOTALL)
return json_str, tk_cnt
def reflect(chat_mdl, history: list[dict], tool_call_res: list[Tuple]):
tool_calls = [{"name": p[0], "result": p[1]} for p in tool_call_res]
goal = history[1]["content"]
template = PROMPT_JINJA_ENV.from_string(REFLECT)
user_prompt = template.render(goal=goal, tool_calls=tool_calls)
hist = deepcopy(history)
if hist[-1]["role"] == "user":
hist[-1]["content"] += user_prompt
else:
hist.append({"role": "user", "content": user_prompt})
_, msg = message_fit_in(hist, chat_mdl.max_length)
ans = chat_mdl.chat(msg[0]["content"], msg[1:])
ans = re.sub(r"^.*</think>", "", ans, flags=re.DOTALL)
return """
**Observation**
{}
**Reflection**
{}
""".format(json.dumps(tool_calls, ensure_ascii=False, indent=2), ans)
def form_message(system_prompt, user_prompt):
return [{"role": "system", "content": system_prompt},{"role": "user", "content": user_prompt}]
def tool_call_summary(chat_mdl, name: str, params: dict, result: str) -> str:
template = PROMPT_JINJA_ENV.from_string(SUMMARY4MEMORY)
system_prompt = template.render(name=name,
params=json.dumps(params, ensure_ascii=False, indent=2),
result=result)
user_prompt = "→ Summary: "
_, msg = message_fit_in(form_message(system_prompt, user_prompt), chat_mdl.max_length)
ans = chat_mdl.chat(msg[0]["content"], msg[1:])
return re.sub(r"^.*</think>", "", ans, flags=re.DOTALL)
def rank_memories(chat_mdl, goal:str, sub_goal:str, tool_call_summaries: list[str]):
template = PROMPT_JINJA_ENV.from_string(RANK_MEMORY)
system_prompt = template.render(goal=goal, sub_goal=sub_goal, results=[{"i": i, "content": s} for i,s in enumerate(tool_call_summaries)])
user_prompt = " → rank: "
_, msg = message_fit_in(form_message(system_prompt, user_prompt), chat_mdl.max_length)
ans = chat_mdl.chat(msg[0]["content"], msg[1:], stop="<|stop|>")
return re.sub(r"^.*</think>", "", ans, flags=re.DOTALL)

rag/prompts/rank_memory.md Normal file

@@ -0,0 +1,30 @@
**Task**: Sort the tool call results based on relevance to the overall goal and current sub-goal. Return ONLY a sorted list of indices (0-indexed).
**Rules**:
1. Analyze each result's contribution to both:
- The overall goal (primary priority)
- The current sub-goal (secondary priority)
2. Sort from MOST relevant (highest impact) to LEAST relevant
3. Output format: Strictly a Python-style list of integers. Example: [2, 0, 1]
🔹 Overall Goal: {{ goal }}
🔹 Sub-goal: {{ sub_goal }}
**Examples**:
🔹 Tool Response:
- index: 0
> Tokyo temperature is 78°F.
- index: 1
> Error: Authentication failed (expired API key).
- index: 2
> Available: 12 widgets in stock (max 5 per customer).
→ rank: [1,2,0]<|stop|>
**Your Turn**:
🔹 Tool Response:
{% for f in results %}
- index: {{ f.i }}
> {{ f.content }}
{% endfor %}
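The expected reply is a bare Python-style list of indices followed by `<|stop|>`. A minimal parsing sketch (for illustration; not code from this PR):

```python
import ast

# Sketch: recover the ranking from a reply such as "[2, 0, 1]<|stop|>".
reply = "[2, 0, 1]<|stop|>"
order = ast.literal_eval(reply.split("<|stop|>")[0].strip())
print(order)  # [2, 0, 1]
```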

rag/prompts/reflect.md Normal file

@@ -0,0 +1,34 @@
**Context**:
- To achieve the goal: {{ goal }}.
- You have executed the following tool calls:
{% for call in tool_calls %}
Tool call: `{{ call.name }}`
Results: {{ call.result }}
{% endfor %}
**Reflection Instructions:**
Analyze the current state of the overall task ({{ goal }}), then provide structured responses to the following:
## 1. Goal Achievement Status
- Does the current outcome align with the original purpose of this task phase?
- If not, what critical gaps exist?
## 2. Step Completion Check
- Which planned steps were completed? (List verified items)
- Which steps are pending/incomplete? (Specify exactly what's missing)
## 3. Information Adequacy
- Is the collected data sufficient to proceed?
- What key information is still needed? (e.g., metrics, user input, external data)
## 4. Critical Observations
- Unexpected outcomes: [Flag anomalies/errors]
- Risks/blockers: [Identify immediate obstacles]
- Accuracy concerns: [Highlight unreliable results]
## 5. Next-Step Recommendations
- Proposed immediate action: [Concrete next step]
- Alternative strategies if blocked: [Workaround solution]
- Tools/inputs required for next phase: [Specify resources]

rag/prompts/summary4memory.md Normal file

@@ -0,0 +1,35 @@
**Role**: AI Assistant
**Task**: Summarize tool call responses
**Rules**:
1. Context: You've executed a tool (API/function) and received a response.
2. Condense the response into 1-2 short sentences.
3. Never omit:
- Success/error status
- Core results (e.g., data points, decisions)
- Critical constraints (e.g., limits, conditions)
4. Exclude technical details like timestamps/request IDs unless crucial.
5. Use the same language as the main content of the tool response.
**Response Template**:
"[Status] + [Key Outcome] + [Critical Constraints]"
**Examples**:
🔹 Tool Response:
{"status": "success", "temperature": 78.2, "unit": "F", "location": "Tokyo", "timestamp": 16923456}
→ Summary: "Success: Tokyo temperature is 78°F."
🔹 Tool Response:
{"error": "invalid_api_key", "message": "Authentication failed: expired key"}
→ Summary: "Error: Authentication failed (expired API key)."
🔹 Tool Response:
{"available": true, "inventory": 12, "product": "widget", "limit": "max 5 per customer"}
→ Summary: "Available: 12 widgets in stock (max 5 per customer)."
**Your Turn**:
- Tool call: {{ name }}
- Tool inputs as following:
{{ params }}
- Tool Response:
{{ result }}

@@ -0,0 +1,19 @@
**Task Instruction:**
You are tasked with reading and analyzing tool call results based on the following inputs: **Inputs for current call** and **Results**. Your objective is to extract relevant and helpful information for the **Inputs for current call** from the **Results**, and to seamlessly integrate this information into the previous steps to continue reasoning toward the original question.
**Guidelines:**
1. **Analyze the Results:**
- Carefully review the content of each tool call result.
- Identify factual information that is relevant to the **Inputs for current call** and can aid in the reasoning process for the original question.
2. **Extract Relevant Information:**
- Select the information from the **Results** that directly contributes to advancing the previous reasoning steps.
- Ensure that the extracted information is accurate and relevant.
- **Inputs for current call:**
{{ inputs }}
- **Results:**
{{ results }}