Compare commits


38 Commits

Author SHA1 Message Date
834c4d81f3 Update version info to v0.14.1 (#3720)
### What problem does this PR solve?

Update version info to v0.14.1

### Type of change

- [x] Documentation Update

---------

Signed-off-by: jinhai <haijin.chn@gmail.com>
2024-11-28 20:09:20 +08:00
a3e0ac9c0b Fix: Clicking the checkbox of the pop-up window for editing chunk is invalid #3726 (#3727)
### What problem does this PR solve?

Fix: Clicking the checkbox in the pop-up window for editing a chunk has
no effect. #3726

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-11-28 20:08:06 +08:00
80af3cc2d4 Don't log exception if object doesn't exist (#3724)
### What problem does this PR solve?

Don't log exception if object doesn't exist. Close #1483

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-11-28 19:37:01 +08:00
966bcda6b9 Updated descriptions for the Agent components (#3728)
### What problem does this PR solve?



### Type of change


- [x] Documentation Update
2024-11-28 19:32:50 +08:00
112ef42a19 Ensure thumbnail be smaller than 64K (#3722)
### What problem does this PR solve?

Ensure the thumbnail is smaller than 64 KB. Close #1443
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2024-11-28 19:15:31 +08:00
91f1814a87 Fix error response (#3719)
### What problem does this PR solve?



### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Jin Hai <haijin.chn@gmail.com>
2024-11-28 18:56:10 +08:00
4e8e4fe53f Feat: Add Dataset page #3221 (#3721)
### What problem does this PR solve?

Feat: Add Dataset page #3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-11-28 18:44:36 +08:00
cdae8d28fe Fix test cases (#3718)
### What problem does this PR solve?

Fix test cases

### Type of change

- [x] Other (please describe): Fix error cases

---------

Signed-off-by: jinhai <haijin.chn@gmail.com>
2024-11-28 17:37:46 +08:00
964a6f4ec4 Added an infinity configuration file to easily customize the settings of Infinity (#3715)
### What problem does this PR solve?

Added an infinity configuration file to easily customize the settings of
Infinity

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-11-28 15:59:00 +08:00
9fcad0500d Add more web test cases (#3702)
### What problem does this PR solve?

Test cases about dataset

### Type of change

- [x] Other (please describe): test cases

---------

Signed-off-by: jinhai <haijin.chn@gmail.com>
2024-11-28 15:46:35 +08:00
ec560cc99d Feat: Scrolling knowledge base list and set the number of entries per page to 30 #3695 (#3712)
### What problem does this PR solve?

Feat: Scrolling knowledge base list and set the number of entries per
page to 30 #3695

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-11-28 15:25:38 +08:00
7ae8828e61 Added release notes v0.12.0 (#3711)
### What problem does this PR solve?


### Type of change


- [x] Documentation Update
2024-11-28 14:57:50 +08:00
43e367f2ea Detect shape error of embedding (#3710)
### What problem does this PR solve?

Detect shape error of embedding. Close #2997
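
A minimal sketch of the kind of guard this implies (the function name and numpy check are illustrative, not RAGFlow's actual code):

```python
import numpy as np

def check_embedding_shape(vectors, expected_dim):
    # Fail fast if the embedding model returned a malformed batch,
    # instead of surfacing a confusing error later in the doc store.
    arr = np.asarray(vectors, dtype=float)
    if arr.ndim != 2 or arr.shape[1] != expected_dim:
        raise ValueError(f"embedding shape {arr.shape} does not match (*, {expected_dim})")
    return arr
```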

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-11-28 14:10:22 +08:00
e678819f70 Fix RGBA error (#3707)
### What problem does this PR solve?

**The image passed to cv_mdl.describe() was not converted to RGB.**

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-11-28 13:09:02 +08:00
bc701d7b4c Edit chunk shall update instead of insert it (#3709)
### What problem does this PR solve?

Editing a chunk should update it rather than insert a new one. Close #3679

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-11-28 13:00:38 +08:00
9f57534843 Revert "Feat: Scrolling knowledge base list #3695" (#3708)
Reverts infiniflow/ragflow#3703
2024-11-28 11:44:23 +08:00
52b3492b18 Feat: Scrolling knowledge base list #3695 (#3703)
### What problem does this PR solve?

Feat: Scrolling knowledge base list #3695

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-11-28 10:51:30 +08:00
2229431803 Added release notes for v0.13.0 (#3691)
### What problem does this PR solve?



### Type of change


- [x] Documentation Update
2024-11-27 19:26:03 +08:00
57208d8e53 Fix batch size issue. (#3675)
### What problem does this PR solve?

#3657

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-11-27 18:06:43 +08:00
535b15ace9 Feat: Add dataset sidebar #3221 (#3683)
### What problem does this PR solve?

Feat: Add dataset sidebar #3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-11-27 18:06:05 +08:00
2249d5d413 Always open text file for write with UTF-8 (#3688)
### What problem does this PR solve?

Always open text files for writing with UTF-8 encoding. Close #932
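
The failure mode behind this fix: without an explicit encoding, Python's `open()` uses the platform default (e.g. cp1252 or GBK on Windows), so writing non-ASCII text can raise `UnicodeEncodeError` or produce a garbled file. A minimal before/after sketch:

```python
data = '{"name": "配置"}'

# Implicit platform encoding: may raise UnicodeEncodeError on Windows.
# with open("conf.json", "w") as f:
#     f.write(data)

# Explicit UTF-8 behaves the same on every platform.
with open("conf.json", "w", encoding="utf-8") as f:
    f.write(data)
```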

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-11-27 16:24:16 +08:00
6fb1a181aa Added aspose on macosx/arm64 (#3686)
### What problem does this PR solve?

Added aspose on macosx/arm64. Close #3666 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-11-27 15:00:07 +08:00
90ffcb4ddb Fix graphrag + infinity bugs (#3681)
### What problem does this PR solve?

Fix graphrag + infinity bugs

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-11-27 12:45:43 +08:00
7f48acb3fd Fix enable/disable bug (#3662)
### What problem does this PR solve?

Fix enable/disable bug   #3628

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: liuhua <10215101452@stu.ecun.edu.cn>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2024-11-27 09:37:11 +08:00
d61bbe6750 Use polars-lts-cpu on arm64 (#3667)
### What problem does this PR solve?

Use polars-lts-cpu on arm64

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-11-27 09:32:41 +08:00
ee37ee3d28 Feat: Add Datasets page #3221 (#3661)
### What problem does this PR solve?

Feat: Add Datasets page #3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-11-27 09:31:08 +08:00
8b35776916 Fix a bug in VolcEngine (#3658)
### What problem does this PR solve?

Fix a bug in VolcEngine  #3553

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: liuhua <10215101452@stu.ecun.edu.cn>
2024-11-27 09:30:49 +08:00
b6f3f15f0b Fix KB list bugs and add web api test (#3649)
### What problem does this PR solve?

1. Cast the KB list query parameters page_number and page_size to int;
they arrive as strings.
2. Add test cases for creating, listing, and deleting datasets.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Test cases

Signed-off-by: jinhai <haijin.chn@gmail.com>
2024-11-26 18:21:15 +08:00
fa8e2c1678 Added release notes (#3660)
### What problem does this PR solve?


### Type of change


- [x] Documentation Update
2024-11-26 18:11:39 +08:00
7669fc8f52 Fix es get NotFoundError (#3659)
### What problem does this PR solve?

Fix es get NotFoundError

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-11-26 18:07:07 +08:00
98cf1c2a9d Feat: add PromptManagement page #3221 (#3650)
### What problem does this PR solve?

Feat: add PromptManagement page #3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-11-26 16:55:44 +08:00
5337cad7e4 Check model id when set dialog. Close #849 (#3655)
### What problem does this PR solve?

Check model id when set dialog. Close #849

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-11-26 16:32:46 +08:00
0891a393d7 Let ThreadPool exit gracefully. (#3653)
### What problem does this PR solve?

#3646

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-11-26 16:31:07 +08:00
5c59651bda Fix the bug causing garbled text (#3640)
### What problem does this PR solve?

Fix the bug causing garbled text #3613

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: liuhua <10215101452@stu.ecun.edu.cn>
2024-11-26 12:06:56 +08:00
f6c3d7ccf6 Fixed es mapping (#3643)
### What problem does this PR solve?

Fixed es mapping

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-11-26 12:00:19 +08:00
3df1663e4f For security. (#3642)
### What problem does this PR solve?


### Type of change

- [x] Performance Improvement
2024-11-26 09:34:34 +08:00
32cf566a08 Feat: Add ModelManagement page #3221 (#3638)
### What problem does this PR solve?

Feat: Add ModelManagement page #3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-11-26 09:10:48 +08:00
769c67a470 Updated UI (#3639)
### What problem does this PR solve?


### Type of change


- [x] Documentation Update
2024-11-25 19:32:25 +08:00
103 changed files with 1985 additions and 384 deletions

View File

@ -20,7 +20,7 @@
<img alt="Static Badge" src="https://img.shields.io/badge/Online-Demo-4e6b99">
</a>
<a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank">
<img src="https://img.shields.io/badge/docker_pull-ragflow:v0.14.0-brightgreen" alt="docker pull infiniflow/ragflow:v0.14.0">
<img src="https://img.shields.io/badge/docker_pull-ragflow:v0.14.1-brightgreen" alt="docker pull infiniflow/ragflow:v0.14.1">
</a>
<a href="https://github.com/infiniflow/ragflow/releases/latest">
<img src="https://img.shields.io/github/v/release/infiniflow/ragflow?color=blue&label=Latest%20Release" alt="Latest Release">
@ -176,14 +176,14 @@ releases! 🌟
```
> - To download a RAGFlow slim Docker image of a specific version, update the `RAGFLOW_IMAGE` variable in *
*docker/.env** to your desired version. For example, `RAGFLOW_IMAGE=infiniflow/ragflow:v0.14.0-slim`. After
*docker/.env** to your desired version. For example, `RAGFLOW_IMAGE=infiniflow/ragflow:v0.14.1-slim`. After
making this change, rerun the command above to initiate the download.
> - To download the dev version of RAGFlow Docker image *including* embedding models and Python libraries, update the
`RAGFLOW_IMAGE` variable in **docker/.env** to `RAGFLOW_IMAGE=infiniflow/ragflow:dev`. After making this change,
rerun the command above to initiate the download.
> - To download a specific version of RAGFlow Docker image *including* embedding models and Python libraries, update
the `RAGFLOW_IMAGE` variable in **docker/.env** to your desired version. For example,
`RAGFLOW_IMAGE=infiniflow/ragflow:v0.14.0`. After making this change, rerun the command above to initiate the
`RAGFLOW_IMAGE=infiniflow/ragflow:v0.14.1`. After making this change, rerun the command above to initiate the
download.
> **NOTE:** A RAGFlow Docker image that includes embedding models and Python libraries is approximately 9GB in size
@ -333,8 +333,7 @@ docker build -f Dockerfile -t infiniflow/ragflow:dev .
cd web
npm install --force
```
7. Configure frontend to update `proxy.target` in **.umirc.ts** to `http://127.0.0.1:9380`:
8. Launch frontend service:
7. Launch frontend service:
```bash
npm run dev
```

View File

@ -20,7 +20,7 @@
<img alt="Lencana Daring" src="https://img.shields.io/badge/Online-Demo-4e6b99">
</a>
<a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank">
<img src="https://img.shields.io/badge/docker_pull-ragflow:v0.14.0-brightgreen" alt="docker pull infiniflow/ragflow:v0.14.0">
<img src="https://img.shields.io/badge/docker_pull-ragflow:v0.14.1-brightgreen" alt="docker pull infiniflow/ragflow:v0.14.1">
</a>
<a href="https://github.com/infiniflow/ragflow/releases/latest">
<img src="https://img.shields.io/github/v/release/infiniflow/ragflow?color=blue&label=Rilis%20Terbaru" alt="Rilis Terbaru">
@ -169,14 +169,14 @@ Coba demo kami di [https://demo.ragflow.io](https://demo.ragflow.io).
```
> - Untuk mengunduh versi tertentu dari image Docker RAGFlow slim, perbarui variabel `RAGFlow_IMAGE` di *
*docker/.env** sesuai dengan versi yang diinginkan. Misalnya, `RAGFLOW_IMAGE=infiniflow/ragflow:v0.14.0-slim`.
*docker/.env** sesuai dengan versi yang diinginkan. Misalnya, `RAGFLOW_IMAGE=infiniflow/ragflow:v0.14.1-slim`.
Setelah mengubah ini, jalankan ulang perintah di atas untuk memulai unduhan.
> - Untuk mengunduh versi dev dari image Docker RAGFlow *termasuk* model embedding dan library Python, perbarui
variabel `RAGFlow_IMAGE` di **docker/.env** menjadi `RAGFLOW_IMAGE=infiniflow/ragflow:dev`. Setelah mengubah ini,
jalankan ulang perintah di atas untuk memulai unduhan.
> - Untuk mengunduh versi tertentu dari image Docker RAGFlow *termasuk* model embedding dan library Python, perbarui
variabel `RAGFlow_IMAGE` di **docker/.env** sesuai dengan versi yang diinginkan. Misalnya,
`RAGFLOW_IMAGE=infiniflow/ragflow:v0.14.0`. Setelah mengubah ini, jalankan ulang perintah di atas untuk memulai unduhan.
`RAGFLOW_IMAGE=infiniflow/ragflow:v0.14.1`. Setelah mengubah ini, jalankan ulang perintah di atas untuk memulai unduhan.
> **CATATAN:** Image Docker RAGFlow yang mencakup model embedding dan library Python berukuran sekitar 9GB
dan mungkin memerlukan waktu lebih lama untuk dimuat.
@ -307,9 +307,8 @@ docker build -f Dockerfile -t infiniflow/ragflow:dev .
```bash
cd web
npm install --force
```
7. Konfigurasikan frontend untuk memperbarui `proxy.target` di **.umirc.ts** menjadi `http://127.0.0.1:9380`:
8. Jalankan aplikasi frontend:
```
7. Jalankan aplikasi frontend:
```bash
npm run dev
```

View File

@ -20,7 +20,7 @@
<img alt="Static Badge" src="https://img.shields.io/badge/Online-Demo-4e6b99">
</a>
<a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank">
<img src="https://img.shields.io/badge/docker_pull-ragflow:v0.14.0-brightgreen" alt="docker pull infiniflow/ragflow:v0.14.0">
<img src="https://img.shields.io/badge/docker_pull-ragflow:v0.14.1-brightgreen" alt="docker pull infiniflow/ragflow:v0.14.1">
</a>
<a href="https://github.com/infiniflow/ragflow/releases/latest">
<img src="https://img.shields.io/github/v/release/infiniflow/ragflow?color=blue&label=Latest%20Release" alt="Latest Release">
@ -148,9 +148,9 @@
$ docker compose -f docker-compose.yml up -d
```
> - 特定のバージョンのRAGFlow slim Dockerイメージをダウンロードするには、**docker/.env**内の`RAGFlow_IMAGE`変数を希望のバージョンに更新します。例えば、`RAGFLOW_IMAGE=infiniflow/ragflow:v0.14.0`とします。この変更を行った後、上記のコマンドを再実行してダウンロードを開始してください。
> - 特定のバージョンのRAGFlow slim Dockerイメージをダウンロードするには、**docker/.env**内の`RAGFlow_IMAGE`変数を希望のバージョンに更新します。例えば、`RAGFLOW_IMAGE=infiniflow/ragflow:v0.14.1`とします。この変更を行った後、上記のコマンドを再実行してダウンロードを開始してください。
> - RAGFlowの埋め込みモデルとPythonライブラリを含む開発版Dockerイメージをダウンロードするには、**docker/.env**内の`RAGFlow_IMAGE`変数を`RAGFLOW_IMAGE=infiniflow/ragflow:dev`に更新します。この変更を行った後、上記のコマンドを再実行してダウンロードを開始してください。
> - 特定のバージョンのRAGFlow Dockerイメージ(埋め込みモデルとPythonライブラリを含む)をダウンロードするには、**docker/.env**内の`RAGFlow_IMAGE`変数を希望のバージョンに更新します。例えば、`RAGFLOW_IMAGE=infiniflow/ragflow:v0.14.0`とします。この変更を行った後、上記のコマンドを再実行してダウンロードを開始してください。
> - 特定のバージョンのRAGFlow Dockerイメージ(埋め込みモデルとPythonライブラリを含む)をダウンロードするには、**docker/.env**内の`RAGFlow_IMAGE`変数を希望のバージョンに更新します。例えば、`RAGFLOW_IMAGE=infiniflow/ragflow:v0.14.1`とします。この変更を行った後、上記のコマンドを再実行してダウンロードを開始してください。
> **NOTE:** 埋め込みモデルとPythonライブラリを含むRAGFlow Dockerイメージのサイズは約9GBであり、読み込みにかなりの時間がかかる場合があります。
@ -289,8 +289,7 @@ docker build -f Dockerfile -t infiniflow/ragflow:dev .
cd web
npm install --force
```
7. フロントエンドを設定し、**.umirc.ts** の `proxy.target` を `http://127.0.0.1:9380` に更新します:
8. フロントエンドサービスを起動する:
7. フロントエンドサービスを起動する:
```bash
npm run dev
```

View File

@ -20,7 +20,7 @@
<img alt="Static Badge" src="https://img.shields.io/badge/Online-Demo-4e6b99">
</a>
<a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank">
<img src="https://img.shields.io/badge/docker_pull-ragflow:v0.14.0-brightgreen" alt="docker pull infiniflow/ragflow:v0.14.0">
<img src="https://img.shields.io/badge/docker_pull-ragflow:v0.14.1-brightgreen" alt="docker pull infiniflow/ragflow:v0.14.1">
</a>
<a href="https://github.com/infiniflow/ragflow/releases/latest">
<img src="https://img.shields.io/github/v/release/infiniflow/ragflow?color=blue&label=Latest%20Release" alt="Latest Release">
@ -152,9 +152,9 @@
$ docker compose -f docker-compose.yml up -d
```
> - 특정 버전의 RAGFlow slim Docker 이미지를 다운로드하려면, **docker/.env**에서 `RAGFlow_IMAGE` 변수를 원하는 버전으로 업데이트하세요. 예를 들어, `RAGFLOW_IMAGE=infiniflow/ragflow:v0.14.0-slim`으로 설정합니다. 이 변경을 완료한 후, 위의 명령을 다시 실행하여 다운로드를 시작하세요.
> - 특정 버전의 RAGFlow slim Docker 이미지를 다운로드하려면, **docker/.env**에서 `RAGFlow_IMAGE` 변수를 원하는 버전으로 업데이트하세요. 예를 들어, `RAGFLOW_IMAGE=infiniflow/ragflow:v0.14.1-slim`으로 설정합니다. 이 변경을 완료한 후, 위의 명령을 다시 실행하여 다운로드를 시작하세요.
> - RAGFlow의 임베딩 모델과 Python 라이브러리를 포함한 개발 버전 Docker 이미지를 다운로드하려면, **docker/.env**에서 `RAGFlow_IMAGE` 변수를 `RAGFLOW_IMAGE=infiniflow/ragflow:dev`로 업데이트하세요. 이 변경을 완료한 후, 위의 명령을 다시 실행하여 다운로드를 시작하세요.
> - 특정 버전의 RAGFlow Docker 이미지를 임베딩 모델과 Python 라이브러리를 포함하여 다운로드하려면, **docker/.env**에서 `RAGFlow_IMAGE` 변수를 원하는 버전으로 업데이트하세요. 예를 들어, `RAGFLOW_IMAGE=infiniflow/ragflow:v0.14.0` 로 설정합니다. 이 변경을 완료한 후, 위의 명령을 다시 실행하여 다운로드를 시작하세요.
> - 특정 버전의 RAGFlow Docker 이미지를 임베딩 모델과 Python 라이브러리를 포함하여 다운로드하려면, **docker/.env**에서 `RAGFlow_IMAGE` 변수를 원하는 버전으로 업데이트하세요. 예를 들어, `RAGFLOW_IMAGE=infiniflow/ragflow:v0.14.1` 로 설정합니다. 이 변경을 완료한 후, 위의 명령을 다시 실행하여 다운로드를 시작하세요.
> **NOTE:** 임베딩 모델과 Python 라이브러리를 포함한 RAGFlow Docker 이미지의 크기는 약 9GB이며, 로드하는 데 상당히 오랜 시간이 걸릴 수 있습니다.
@ -291,8 +291,7 @@ docker build -f Dockerfile -t infiniflow/ragflow:dev .
cd web
npm install --force
```
7. **.umirc.ts** 에서 `proxy.target` 을 `http://127.0.0.1:9380` 으로 업데이트합니다:
8. 프론트엔드 서비스를 시작합니다:
7. 프론트엔드 서비스를 시작합니다:
```bash
npm run dev
```

View File

@ -20,7 +20,7 @@
<img alt="Static Badge" src="https://img.shields.io/badge/Online-Demo-4e6b99">
</a>
<a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank">
<img src="https://img.shields.io/badge/docker_pull-ragflow:v0.14.0-brightgreen" alt="docker pull infiniflow/ragflow:v0.14.0">
<img src="https://img.shields.io/badge/docker_pull-ragflow:v0.14.1-brightgreen" alt="docker pull infiniflow/ragflow:v0.14.1">
</a>
<a href="https://github.com/infiniflow/ragflow/releases/latest">
<img src="https://img.shields.io/github/v/release/infiniflow/ragflow?color=blue&label=Latest%20Release" alt="Latest Release">
@ -149,9 +149,9 @@
$ docker compose -f docker-compose.yml up -d
```
> - 如果你想下载并运行特定版本的 RAGFlow slim Docker 镜像,请在 **docker/.env** 文件中找到 `RAGFLOW_IMAGE` 变量,将其改为对应版本。例如 `RAGFLOW_IMAGE=infiniflow/ragflow:v0.14.0-slim`,然后再运行上述命令。
> - 如果你想下载并运行特定版本的 RAGFlow slim Docker 镜像,请在 **docker/.env** 文件中找到 `RAGFLOW_IMAGE` 变量,将其改为对应版本。例如 `RAGFLOW_IMAGE=infiniflow/ragflow:v0.14.1-slim`,然后再运行上述命令。
> - 如果您想安装内置 embedding 模型和 Python 库的 dev 版本的 Docker 镜像,需要将 **docker/.env** 文件中的 `RAGFLOW_IMAGE` 变量修改为: `RAGFLOW_IMAGE=infiniflow/ragflow:dev`。
> - 如果您想安装内置 embedding 模型和 Python 库的指定版本的 RAGFlow Docker 镜像,需要将 **docker/.env** 文件中的 `RAGFLOW_IMAGE` 变量修改为: `RAGFLOW_IMAGE=infiniflow/ragflow:v0.14.0`。修改后,再运行上面的命令。
> - 如果您想安装内置 embedding 模型和 Python 库的指定版本的 RAGFlow Docker 镜像,需要将 **docker/.env** 文件中的 `RAGFLOW_IMAGE` 变量修改为: `RAGFLOW_IMAGE=infiniflow/ragflow:v0.14.1`。修改后,再运行上面的命令。
> **注意:** 安装内置 embedding 模型和 Python 库的指定版本的 RAGFlow Docker 镜像大小约 9 GB,可能需要更长时间下载,请耐心等待。
4. 服务器启动成功后再次确认服务器状态:
@ -296,8 +296,7 @@ docker build -f Dockerfile -t infiniflow/ragflow:dev .
cd web
npm install --force
```
7. 配置前端,将 **.umirc.ts** 的 `proxy.target` 更新为 `http://127.0.0.1:9380`
8. 启动前端服务:
7. 启动前端服务:
```bash
npm run dev
```

View File

@ -17,6 +17,7 @@ from abc import ABC
import asyncio
from crawl4ai import AsyncWebCrawler
from agent.component.base import ComponentBase, ComponentParamBase
from api.utils.web_utils import is_valid_url
class CrawlerParam(ComponentParamBase):
@ -39,7 +40,7 @@ class Crawler(ComponentBase, ABC):
def _run(self, history, **kwargs):
ans = self.get_input()
ans = " - ".join(ans["content"]) if "content" in ans else ""
if not ans:
if not is_valid_url(ans):
return Crawler.be_output("")
try:
result = asyncio.run(self.get_web(ans))
@ -64,7 +65,3 @@ class Crawler(ComponentBase, ABC):
elif self._param.extract_type == 'content':
result.extracted_content
return result.markdown
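
The imported `api.utils.web_utils.is_valid_url` is not shown in this diff; a hypothetical sketch of such a validator, for illustration only:

```python
from urllib.parse import urlparse

def is_valid_url(url: str) -> bool:
    # Accept only absolute http(s) URLs with a host part.
    try:
        parts = urlparse(url.strip())
    except ValueError:
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```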

View File

@ -1,7 +1,7 @@
{
"id": 4,
"title": "Interpreter",
"description": "Translation Agent: Agentic translation using reflection workflow.\n\nThis is inspired by Andrew NG's project: https://github.com/andrewyng/translation-agent\n\n",
"description": "A translation agent based on a reflection agentic workflow, inspired by Andrew Ng's project: https://github.com/andrewyng/translation-agent\n\n1. Prompt an LLM to translate a text into the target language.\n2. Have the LLM reflect on the translation and provide constructive suggestions for improvement.\n3. Use these suggestions to improve the translation.",
"canvas_type": "chatbot",
"dsl": {
"answer": [],
@ -473,7 +473,7 @@
{
"data": {
"form": {
"text": "Translation Agent: Agentic translation using reflection workflow\n\nThis is inspired by Andrew NG's project: https://github.com/andrewyng/translation-agent\n\n1. Prompt an LLM to translate a text to target_language;\n2. Have the LLM reflect on the translation to come up with constructive suggestions for improving it;\n3. Use the suggestions to improve the translation."
"text": "Translation Agent: Agentic translation using reflection workflow\n\nThis is inspired by Andrew NG's project: https://github.com/andrewyng/translation-agent\n\n1. Prompt an LLM to translate a text into the target language;\n2. Have the LLM reflect on the translation and provide constructive suggestions for improvement;\n3. Use these suggestions to improve the translation."
},
"label": "Note",
"name": "Breif"

View File

@ -1,7 +1,7 @@
{
"id": 9,
"title": "SEO Blog Generator",
"description": "A generator for Blogs with SEO just by giving title or keywords.",
"description": "A blog generator that creates SEO-optimized content based on your chosen title or keywords.",
"canvas_type": "chatbot",
"dsl": {
"answer": [],

View File

@ -96,7 +96,7 @@ def get():
kb_ids = KnowledgebaseService.get_kb_ids(tenant_id)
chunk = settings.docStoreConn.get(chunk_id, search.index_name(tenant_id), kb_ids)
if chunk is None:
return server_error_response("Chunk not found")
return server_error_response(Exception("Chunk not found"))
k = []
for n in chunk.keys():
if re.search(r"(_vec$|_sm_|_tks|_ltks)", n):
@ -155,7 +155,7 @@ def set():
v, c = embd_mdl.encode([doc.name, req["content_with_weight"]])
v = 0.1 * v[0] + 0.9 * v[1] if doc.parser_id != ParserType.QA else v[1]
d["q_%d_vec" % len(v)] = v.tolist()
settings.docStoreConn.insert([d], search.index_name(tenant_id), doc.kb_id)
settings.docStoreConn.update({"id": req["chunk_id"]}, d, search.index_name(tenant_id), doc.kb_id)
return get_json_result(data=True)
except Exception as e:
return server_error_response(e)

View File

@ -74,11 +74,17 @@ def set_dialog():
e, tenant = TenantService.get_by_id(current_user.id)
if not e:
return get_data_error_result(message="Tenant not found!")
kbs = KnowledgebaseService.get_by_ids(req.get("kb_ids"))
embd_count = len(set([kb.embd_id for kb in kbs]))
if embd_count != 1:
return get_data_error_result(message=f'Datasets use different embedding models: {[kb.embd_id for kb in kbs]}')
llm_id = req.get("llm_id", tenant.llm_id)
if not dialog_id:
if not req.get("kb_ids"):
return get_data_error_result(
message="Fail! Please select knowledgebase!")
dia = {
"id": get_uuid(),
"tenant_id": current_user.id,

View File

@ -29,6 +29,7 @@ from api.db.db_models import File
from api.utils.api_utils import get_json_result
from api import settings
from rag.nlp import search
from api.constants import DATASET_NAME_LIMIT
@manager.route('/create', methods=['post'])
@ -36,10 +37,19 @@ from rag.nlp import search
@validate_request("name")
def create():
req = request.json
req["name"] = req["name"].strip()
req["name"] = duplicate_name(
dataset_name = req["name"]
if not isinstance(dataset_name, str):
return get_data_error_result(message="Dataset name must be string.")
if dataset_name == "":
return get_data_error_result(message="Dataset name can't be empty.")
if len(dataset_name) >= DATASET_NAME_LIMIT:
return get_data_error_result(
message=f"Dataset name length is {len(dataset_name)} which is larger than {DATASET_NAME_LIMIT}")
dataset_name = dataset_name.strip()
dataset_name = duplicate_name(
KnowledgebaseService.query,
name=req["name"],
name=dataset_name,
tenant_id=current_user.id,
status=StatusEnum.VALID.value)
try:
@ -73,7 +83,8 @@ def update():
if not KnowledgebaseService.query(
created_by=current_user.id, id=req["kb_id"]):
return get_json_result(
data=False, message='Only owner of knowledgebase authorized for this operation.', code=settings.RetCode.OPERATING_ERROR)
data=False, message='Only owner of knowledgebase authorized for this operation.',
code=settings.RetCode.OPERATING_ERROR)
e, kb = KnowledgebaseService.get_by_id(req["kb_id"])
if not e:
@ -81,7 +92,8 @@ def update():
message="Can't find this knowledgebase!")
if req["name"].lower() != kb.name.lower() \
and len(KnowledgebaseService.query(name=req["name"], tenant_id=current_user.id, status=StatusEnum.VALID.value)) > 1:
and len(
KnowledgebaseService.query(name=req["name"], tenant_id=current_user.id, status=StatusEnum.VALID.value)) > 1:
return get_data_error_result(
message="Duplicated knowledgebase name.")
@ -125,15 +137,16 @@ def detail():
@manager.route('/list', methods=['GET'])
@login_required
def list_kbs():
page_number = request.args.get("page", 1)
items_per_page = request.args.get("page_size", 150)
keywords = request.args.get("keywords", "")
page_number = int(request.args.get("page", 1))
items_per_page = int(request.args.get("page_size", 150))
orderby = request.args.get("orderby", "create_time")
desc = request.args.get("desc", True)
try:
tenants = TenantService.get_joined_tenants_by_user_id(current_user.id)
kbs = KnowledgebaseService.get_by_tenant_ids(
[m["tenant_id"] for m in tenants], current_user.id, page_number, items_per_page, orderby, desc)
return get_json_result(data=kbs)
kbs, total = KnowledgebaseService.get_by_tenant_ids(
[m["tenant_id"] for m in tenants], current_user.id, page_number, items_per_page, orderby, desc, keywords)
return get_json_result(data={"kbs": kbs, "total": total})
except Exception as e:
return server_error_response(e)
@ -151,10 +164,11 @@ def rm():
)
try:
kbs = KnowledgebaseService.query(
created_by=current_user.id, id=req["kb_id"])
created_by=current_user.id, id=req["kb_id"])
if not kbs:
return get_json_result(
data=False, message='Only owner of knowledgebase authorized for this operation.', code=settings.RetCode.OPERATING_ERROR)
data=False, message='Only owner of knowledgebase authorized for this operation.',
code=settings.RetCode.OPERATING_ERROR)
for doc in DocumentService.query(kb_id=req["kb_id"]):
if not DocumentService.remove_document(doc, kbs[0].tenant_id):
@ -162,13 +176,15 @@ def rm():
message="Database error (Document removal)!")
f2d = File2DocumentService.get_by_document_id(doc.id)
FileService.filter_delete([File.source_type == FileSource.KNOWLEDGEBASE, File.id == f2d[0].file_id])
FileService.filter_delete([File.source_type == FileSource.KNOWLEDGEBASE, File.type == "folder", File.name == kbs[0].name])
File2DocumentService.delete_by_document_id(doc.id)
FileService.filter_delete(
[File.source_type == FileSource.KNOWLEDGEBASE, File.type == "folder", File.name == kbs[0].name])
if not KnowledgebaseService.delete_by_id(req["kb_id"]):
return get_data_error_result(
message="Database error (Knowledgebase removal)!")
settings.docStoreConn.delete({"kb_id": req["kb_id"]}, search.index_name(kbs[0].tenant_id), req["kb_id"])
for kb in kbs:
settings.docStoreConn.delete({"kb_id": kb.id}, search.index_name(kb.tenant_id), kb.id)
settings.docStoreConn.deleteIdx(search.index_name(kb.tenant_id), kb.id)
return get_json_result(data=True)
except Exception as e:
return server_error_response(e)

View File

@ -252,9 +252,9 @@ def delete(tenant_id):
File.id == f2d[0].file_id,
]
)
FileService.filter_delete(
[File.source_type == FileSource.KNOWLEDGEBASE, File.type == "folder", File.name == kbs[0].name])
File2DocumentService.delete_by_document_id(doc.id)
FileService.filter_delete(
[File.source_type == FileSource.KNOWLEDGEBASE, File.type == "folder", File.name == kbs[0].name])
if not KnowledgebaseService.delete_by_id(id):
return get_error_data_result(message="Delete dataset error.(Database error)")
return get_result(code=settings.RetCode.SUCCESS)

View File

@ -252,7 +252,7 @@ def feishu_callback():
if res["code"] != 0:
return redirect("/?error=%s" % res["message"])
if "contact:user.email:readonly" not in res["data"]["scope"].split(" "):
if "contact:user.email:readonly" not in res["data"]["scope"].split():
return redirect("/?error=contact:user.email:readonly not in scope")
session["access_token"] = res["data"]["access_token"]
session["access_token_from"] = "feishu"
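
This hunk, like several of the parser changes below, replaces `.split(" ")` with the bare `.split()`. The two behave differently whenever the input contains runs of spaces, tabs, or leading/trailing whitespace:

```python
s = "a  b\tc "
print(s.split(" "))  # ['a', '', 'b\tc', '']  (empty strings, tab not split)
print(s.split())     # ['a', 'b', 'c']        (any whitespace run, no empties)
```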

View File

@ -23,3 +23,5 @@ API_VERSION = "v1"
RAG_FLOW_SERVICE_NAME = "ragflow"
REQUEST_WAIT_SEC = 2
REQUEST_MAX_WAIT_SEC = 300
DATASET_NAME_LIMIT = 128

View File

@ -16,6 +16,7 @@
from api.db import StatusEnum, TenantPermission
from api.db.db_models import Knowledgebase, DB, Tenant, User, UserTenant,Document
from api.db.services.common_service import CommonService
from peewee import fn
class KnowledgebaseService(CommonService):
@ -34,7 +35,7 @@ class KnowledgebaseService(CommonService):
@classmethod
@DB.connection_context()
def get_by_tenant_ids(cls, joined_tenant_ids, user_id,
page_number, items_per_page, orderby, desc):
page_number, items_per_page, orderby, desc, keywords):
fields = [
cls.model.id,
cls.model.avatar,
@ -51,20 +52,31 @@ class KnowledgebaseService(CommonService):
User.avatar.alias('tenant_avatar'),
cls.model.update_time
]
kbs = cls.model.select(*fields).join(User, on=(cls.model.tenant_id == User.id)).where(
((cls.model.tenant_id.in_(joined_tenant_ids) & (cls.model.permission ==
TenantPermission.TEAM.value)) | (
cls.model.tenant_id == user_id))
& (cls.model.status == StatusEnum.VALID.value)
)
if keywords:
kbs = cls.model.select(*fields).join(User, on=(cls.model.tenant_id == User.id)).where(
((cls.model.tenant_id.in_(joined_tenant_ids) & (cls.model.permission ==
TenantPermission.TEAM.value)) | (
cls.model.tenant_id == user_id))
& (cls.model.status == StatusEnum.VALID.value),
(fn.LOWER(cls.model.name).contains(keywords.lower()))
)
else:
kbs = cls.model.select(*fields).join(User, on=(cls.model.tenant_id == User.id)).where(
((cls.model.tenant_id.in_(joined_tenant_ids) & (cls.model.permission ==
TenantPermission.TEAM.value)) | (
cls.model.tenant_id == user_id))
& (cls.model.status == StatusEnum.VALID.value)
)
if desc:
kbs = kbs.order_by(cls.model.getter_by(orderby).desc())
else:
kbs = kbs.order_by(cls.model.getter_by(orderby).asc())
count = kbs.count()
kbs = kbs.paginate(page_number, items_per_page)
return list(kbs.dicts())
return list(kbs.dicts()), count
@classmethod
@DB.connection_context()

View File

@ -146,7 +146,7 @@ def rewrite_yaml_conf(conf_path, config):
def rewrite_json_file(filepath, json_data):
with open(filepath, "w") as f:
with open(filepath, "w", encoding='utf-8') as f:
json.dump(json_data, f, indent=4, separators=(",", ": "))
f.close()
@ -170,29 +170,52 @@ def filename_type(filename):
return FileType.OTHER.value
def thumbnail_img(filename, blob):
"""
MySQL LongText max length is 65535
"""
filename = filename.lower()
if re.match(r".*\.pdf$", filename):
pdf = pdfplumber.open(BytesIO(blob))
buffered = BytesIO()
pdf.pages[0].to_image(resolution=32).annotated.save(buffered, format="png")
return buffered.getvalue()
resolution = 32
img = None
for _ in range(10):
# https://github.com/jsvine/pdfplumber?tab=readme-ov-file#creating-a-pageimage-with-to_image
pdf.pages[0].to_image(resolution=resolution).annotated.save(buffered, format="png")
img = buffered.getvalue()
if len(img) >= 64000 and resolution >= 2:
resolution = resolution / 2
buffered = BytesIO()
else:
break
return img
if re.match(r".*\.(jpg|jpeg|png|tif|gif|icon|ico|webp)$", filename):
elif re.match(r".*\.(jpg|jpeg|png|tif|gif|icon|ico|webp)$", filename):
image = Image.open(BytesIO(blob))
image.thumbnail((30, 30))
buffered = BytesIO()
image.save(buffered, format="png")
return buffered.getvalue()
if re.match(r".*\.(ppt|pptx)$", filename):
elif re.match(r".*\.(ppt|pptx)$", filename):
import aspose.slides as slides
import aspose.pydrawing as drawing
try:
with slides.Presentation(BytesIO(blob)) as presentation:
buffered = BytesIO()
presentation.slides[0].get_thumbnail(0.03, 0.03).save(
buffered, drawing.imaging.ImageFormat.png)
return buffered.getvalue()
scale = 0.03
img = None
for _ in range(10):
# https://reference.aspose.com/slides/python-net/aspose.slides/slide/get_thumbnail/#float-float
presentation.slides[0].get_thumbnail(scale, scale).save(
buffered, drawing.imaging.ImageFormat.png)
img = buffered.getvalue()
if len(img) >= 64000:
scale = scale / 2.0
buffered = BytesIO()
else:
break
return img
except Exception:
pass
return None

View File

@ -130,7 +130,8 @@
},
{
"string": {
"match": "*_(with_weight|list)$",
"match_pattern": "regex",
"match": "^.*_(with_weight|list)$",
"mapping": {
"type": "text",
"index": "false",

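For context: Elasticsearch interprets `match` as a simple wildcard by default, so the old pattern `*_(with_weight|list)$` mixed wildcard and regex syntax and never matched; setting `match_pattern` to `regex` makes the regular expression take effect. A sketch of applying such a dynamic template with the elasticsearch-py 8.x client (the index name is illustrative):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
template = {
    "string": {
        "match_pattern": "regex",  # interpret "match" as a regex, not a wildcard
        "match": r"^.*_(with_weight|list)$",
        "mapping": {"type": "text", "index": False},
    }
}
es.indices.create(index="ragflow_test", mappings={"dynamic_templates": [template]})
```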
View File

@ -47,7 +47,7 @@ class RAGFlowDocxParser:
for p, n in patt:
if re.search(p, b):
return n
tks = [t for t in rag_tokenizer.tokenize(b).split(" ") if len(t) > 1]
tks = [t for t in rag_tokenizer.tokenize(b).split() if len(t) > 1]
if len(tks) > 3:
if len(tks) < 12:
return "Tx"

View File

@ -108,13 +108,13 @@ class RAGFlowPdfParser:
h = max(self.__height(up), self.__height(down))
y_dis = self._y_dis(up, down)
LEN = 6
tks_down = rag_tokenizer.tokenize(down["text"][:LEN]).split(" ")
tks_up = rag_tokenizer.tokenize(up["text"][-LEN:]).split(" ")
tks_down = rag_tokenizer.tokenize(down["text"][:LEN]).split()
tks_up = rag_tokenizer.tokenize(up["text"][-LEN:]).split()
tks_all = up["text"][-LEN:].strip() \
+ (" " if re.match(r"[a-zA-Z0-9]+",
up["text"][-1] + down["text"][0]) else "") \
+ down["text"][:LEN].strip()
tks_all = rag_tokenizer.tokenize(tks_all).split(" ")
tks_all = rag_tokenizer.tokenize(tks_all).split()
fea = [
up.get("R", -1) == down.get("R", -1),
y_dis / h,
@ -565,13 +565,13 @@ class RAGFlowPdfParser:
if i >= len(self.boxes):
break
prefix = self.boxes[i]["text"].strip()[:3] if not eng else " ".join(
self.boxes[i]["text"].strip().split(" ")[:2])
self.boxes[i]["text"].strip().split()[:2])
while not prefix:
self.boxes.pop(i)
if i >= len(self.boxes):
break
prefix = self.boxes[i]["text"].strip()[:3] if not eng else " ".join(
self.boxes[i]["text"].strip().split(" ")[:2])
self.boxes[i]["text"].strip().split()[:2])
self.boxes.pop(i)
if i >= len(self.boxes) or not prefix:
break

View File

@ -47,7 +47,7 @@ def corpNorm(nm, add_region=True):
nm = re.sub(r"(计算机|技术|(技术|科技|网络)*有限公司|公司|有限|研发中心|中国|总部)$", "", nm, 10000, re.IGNORECASE)
if not nm or (len(nm)<5 and not regions.isName(nm[0:2])):return nm
tks = rag_tokenizer.tokenize(nm).split(" ")
tks = rag_tokenizer.tokenize(nm).split()
reg = [t for i,t in enumerate(tks) if regions.isName(t) and (t != "中国" or i > 0)]
nm = ""
for t in tks:

View File

@ -11,7 +11,10 @@
# limitations under the License.
#
import os, json,re,copy
import os
import json
import re
import copy
import pandas as pd
current_file_path = os.path.dirname(os.path.abspath(__file__))
TBL = pd.read_csv(os.path.join(current_file_path, "res/schools.csv"), sep="\t", header=0).fillna("")
@ -23,7 +26,7 @@ GOOD_SCH = set([re.sub(r"[,. &()]+", "", c) for c in GOOD_SCH])
def loadRank(fnm):
global TBL
TBL["rank"] = 1000000
with open(fnm, "r",encoding='UTF-8') as f:
with open(fnm, "r", encoding='utf-8') as f:
while True:
l = f.readline()
if not l:break
@ -32,7 +35,7 @@ def loadRank(fnm):
nm,rk = l[0].strip(),int(l[1])
#assert len(TBL[((TBL.name_cn == nm) | (TBL.name_en == nm))]),f"<{nm}>"
TBL.loc[((TBL.name_cn == nm) | (TBL.name_en == nm)), "rank"] = rk
except Exception as e:
except Exception:
pass
@ -41,7 +44,7 @@ loadRank(os.path.join(current_file_path, "res/school.rank.csv"))
def split(txt):
tks = []
for t in re.sub(r"[ \t]+", " ",txt).split(" "):
for t in re.sub(r"[ \t]+", " ",txt).split():
if tks and re.match(r".*[a-zA-Z]$", tks[-1]) and \
re.match(r"[a-zA-Z]", t) and tks:
tks[-1] = tks[-1] + " " + t

View File

@ -80,7 +80,7 @@ def refactor(df):
def loadjson(line):
try:
return json.loads(line)
except Exception as e:
except Exception:
pass
return {}
@ -183,4 +183,4 @@ def refactor(df):
"\r",
"\\n"))
# print(df.values.tolist())
return dict(zip([n.split(" ")[0] for n in FIELDS], df.values.tolist()[0]))
return dict(zip([n.split()[0] for n in FIELDS], df.values.tolist()[0]))

View File

@ -100,7 +100,7 @@ def forEdu(cv):
if n.get("school_name") and isinstance(n["school_name"], str):
sch.append(re.sub(r"(211|985|重点大学|[,&;-])", "", n["school_name"]))
e["sch_nm_kwd"] = sch[-1]
fea.append(rag_tokenizer.fine_grained_tokenize(rag_tokenizer.tokenize(n.get("school_name", ""))).split(" ")[-1])
fea.append(rag_tokenizer.fine_grained_tokenize(rag_tokenizer.tokenize(n.get("school_name", ""))).split()[-1])
if n.get("discipline_name") and isinstance(n["discipline_name"], str):
maj.append(n["discipline_name"])
@ -485,7 +485,7 @@ def parse(cv):
nm = re.sub(r"[\n——\-\(\+].*", "", cv["name"].strip())
nm = re.sub(r"[ \t ]+", " ", nm)
if re.match(r"[a-zA-Z ]+$", nm):
if len(nm.split(" ")) > 1:
if len(nm.split()) > 1:
cv["name"] = nm
else:
nm = ""
@ -503,7 +503,7 @@ def parse(cv):
for py in PY.get_pinyins(nm[:20], ''):
for i in range(2, len(py) + 1): cv["name_py_pref_tks"] += " " + py[:i]
for py in PY.get_pinyins(nm[:20], ' '):
py = py.split(" ")
py = py.split()
for i in range(1, len(py) + 1): cv["name_py_pref0_tks"] += " " + "".join(py[:i])
cv["name_kwd"] = name

View File

@ -41,7 +41,7 @@ def main(args):
"score": 1} for b, t in bxs if b[0][0] <= b[1][0] and b[0][1] <= b[-1][1]]
img = draw_box(images[i], bxs, ["ocr"], 1.)
img.save(outputs[i], quality=95)
with open(outputs[i] + ".txt", "w+") as f:
with open(outputs[i] + ".txt", "w+", encoding='utf-8') as f:
f.write("\n".join([o["text"] for o in bxs]))

View File

@ -50,7 +50,7 @@ def main(args):
if args.mode.lower() == "tsr":
#lyt = [t for t in lyt if t["type"] == "table column"]
html = get_table_html(images[i], lyt, ocr)
with open(outputs[i] + ".html", "w+") as f:
with open(outputs[i] + ".html", "w+", encoding='utf-8') as f:
f.write(html)
lyt = [{
"type": t["label"],

View File

@ -117,7 +117,7 @@ class TableStructureRecognizer(Recognizer):
for p, n in patt:
if re.search(p, b["text"].strip()):
return n
tks = [t for t in rag_tokenizer.tokenize(b["text"]).split(" ") if len(t) > 1]
tks = [t for t in rag_tokenizer.tokenize(b["text"]).split() if len(t) > 1]
if len(tks) > 3:
if len(tks) < 12:
return "Tx"

View File

@ -14,7 +14,7 @@
- **docker-compose.yml**
Sets up environment for RAGFlow and its dependencies.
- **docker-compose-base.yml**
Sets up environment for RAGFlow's base services: Elasticsearch, MySQL, MinIO, and Redis.
Sets up environment for RAGFlow's dependencies: Elasticsearch/[Infinity](https://github.com/infiniflow/infinity), MySQL, MinIO, and Redis.
## 🐬 Docker environment variables

View File

@ -39,6 +39,8 @@ services:
image: infiniflow/infinity:v0.5.0-dev5
volumes:
- infinity_data:/var/infinity
- ./infinity_conf.toml:/infinity_conf.toml
command: ["-f", "/infinity_conf.toml"]
ports:
- ${INFINITY_THRIFT_PORT}:23817
- ${INFINITY_HTTP_PORT}:23820

docker/infinity_conf.toml (new file, 66 lines)
View File

@ -0,0 +1,66 @@
[general]
version = "0.5.0"
time_zone = "utc-8"
[network]
server_address = "0.0.0.0"
postgres_port = 5432
http_port = 23820
client_port = 23817
connection_pool_size = 128
[log]
log_filename = "infinity.log"
log_dir = "/var/infinity/log"
log_to_stdout = true
log_file_max_size = "100MB"
log_file_rotate_count = 10
# trace/debug/info/warning/error/critical 6 log levels, default: info
log_level = "info"
[storage]
persistence_dir = "/var/infinity/persistence"
data_dir = "/var/infinity/data"
# periodically activates garbage collection:
# 0 means real-time,
# s means seconds, for example "60s", 60 seconds
# m means minutes, for example "60m", 60 minutes
# h means hours, for example "1h", 1 hour
optimize_interval = "10s"
cleanup_interval = "60s"
compact_interval = "120s"
storage_type = "local"
# dump memory index entry when it reaches the capacity
mem_index_capacity = 65536
# S3 storage config example:
# [storage.object_storage]
# url = "127.0.0.1:9000"
# bucket_name = "infinity"
# access_key = "minioadmin"
# secret_key = "minioadmin"
# enable_https = false
[buffer]
buffer_manager_size = "8GB"
lru_num = 7
temp_dir = "/var/infinity/tmp"
result_cache = "off"
memindex_memory_quota = "4GB"
[wal]
wal_dir = "/var/infinity/wal"
full_checkpoint_interval = "30s"
delta_checkpoint_interval = "5s"
# delta_checkpoint_threshold = 1000000000
wal_compact_threshold = "1GB"
# flush_at_once: write and flush log each commit
# only_write: write log, OS control when to flush the log, default
# flush_per_second: logs are written after each commit and flushed to disk per second.
wal_flush = "only_write"
[resource]
resource_dir = "/var/infinity/resource"

View File

@ -5,7 +5,7 @@ slug: /configurations
# Configurations
Configurations for installing RAGFlow via Docker.
Configurations for deploying RAGFlow via Docker.
## Guidelines
@ -32,7 +32,7 @@ docker compose -f docker/docker-compose.yml up -d
- **docker-compose.yml**
Sets up environment for RAGFlow and its dependencies.
- **docker-compose-base.yml**
Sets up environment for RAGFlow's base services: Elasticsearch, MySQL, MinIO, and Redis.
Sets up environment for RAGFlow's dependencies: Elasticsearch/[Infinity](https://github.com/infiniflow/infinity), MySQL, MinIO, and Redis.
## Docker environment variables

View File

@ -1,6 +1,6 @@
{
"label": "Guides",
"position": 2,
"position": 3,
"link": {
"type": "generated-index",
"description": "Guides for RAGFlow users and developers."

View File

@ -103,7 +103,7 @@ RAGFlow features visibility and explainability, allowing you to view the chunkin
2. Hover over each snapshot for a quick view of each chunk.
3. Double click the chunked texts to add keywords or make *manual* changes where necessary:
3. Double-click the chunked texts to add keywords or make *manual* changes where necessary:
![update chunk](https://github.com/infiniflow/ragflow/assets/93570324/1d84b408-4e9f-46fd-9413-8c1059bf9c76)
@ -111,7 +111,7 @@ RAGFlow features visibility and explainability, allowing you to view the chunkin
You can add keywords to a file chunk to increase its ranking for queries containing those keywords. This action increases its keyword weight and can improve its position in the search list.
:::
4. In Retrieval testing, ask a quick question in **Test text** to double check if your configurations work:
4. In Retrieval testing, ask a quick question in **Test text** to double-check if your configurations work:
_As you can tell from the following, RAGFlow responds with truthful citations._
@ -128,7 +128,7 @@ RAGFlow uses multiple recall of both full-text search and vector search in its c
## Search for knowledge base
As of RAGFlow v0.14.0, the search feature is still in a rudimentary form, supporting only knowledge base search by name.
As of RAGFlow v0.14.1, the search feature is still in a rudimentary form, supporting only knowledge base search by name.
![search knowledge base](https://github.com/infiniflow/ragflow/assets/93570324/836ae94c-2438-42be-879e-c7ad2a59693e)

View File

@ -7,6 +7,8 @@ slug: /deploy_local_llm
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
Run models locally using Ollama, Xinference, or other frameworks.
RAGFlow supports deploying models locally using Ollama, Xinference, IPEX-LLM, or jina. If you have locally deployed models to leverage or wish to enable GPU or CUDA for inference acceleration, you can bind Ollama or Xinference into RAGFlow and use either of them as a local "server" for interacting with your local models.
RAGFlow seamlessly integrates with Ollama and Xinference, without the need for further environment configurations. You can use them to deploy two types of local models in RAGFlow: chat models and embedding models.
@ -106,7 +108,7 @@ Click on your logo **>** **Model Providers** **>** **System Model Settings** to
Update your chat model accordingly in **Chat Configuration**:
> If your local model is an embedding model, update it on the configruation page of your knowledge base.
> If your local model is an embedding model, update it on the configuration page of your knowledge base.
## Deploy a local model using Xinference
@ -159,7 +161,7 @@ Click on your logo **>** **Model Providers** **>** **System Model Settings** to
Update your chat model accordingly in **Chat Configuration**:
> If your local model is an embedding model, update it on the configruation page of your knowledge base.
> If your local model is an embedding model, update it on the configuration page of your knowledge base.
## Deploy a local model using IPEX-LLM

View File

@ -7,7 +7,7 @@ slug: /acquire_ragflow_api_key
A key is required for the RAGFlow server to authenticate your requests via HTTP or a Python API. This document provides instructions on obtaining a RAGFlow API key.
1. Click your avatar on the top right corner of the RAGFlow UI to access the configuration page.
1. Click your avatar in the top right corner of the RAGFlow UI to access the configuration page.
2. Click **API** to switch to the **API** page.
3. Obtain a RAGFlow API key:

View File

@ -81,4 +81,4 @@ RAGFlow's file management allows you to download an uploaded file:
![download_file](https://github.com/infiniflow/ragflow/assets/93570324/cf3b297f-7d9b-4522-bf5f-4f45743e4ed5)
> As of RAGFlow v0.14.0, bulk download is not supported, nor can you download an entire folder.
> As of RAGFlow v0.14.1, bulk download is not supported, nor can you download an entire folder.

View File

@ -17,7 +17,7 @@ By default, each RAGFlow user is assigned a single team named after their name.
Team members are currently *not* allowed to invite users to your team; only you, the team owner, are permitted to do so.
:::
To enter the **Team** page, click on your avatar on the top right corner of the page **>** Team:
To enter the **Team** page, click on your avatar in the top right corner of the page **>** Team:
![team](https://github.com/user-attachments/assets/0eac2503-26bc-4568-b3f2-bcd84069a07a)
@ -27,7 +27,7 @@ _On the **Team** page, you can view the information about members of your team a
You are, by default, the owner of your own team and the only person permitted to invite users to join your team or remove team members.
![invite_team_member](https://github.com/user-attachments/assets/75e19d53-3a00-480e-8b16-fe00c23c4486)
![invite_team_member](https://github.com/user-attachments/assets/d85b55c3-7e86-4f04-a414-ca18a9ee8963)
## Remove team members
@ -36,4 +36,3 @@ You are, by default, the owner of your own team and the only person permitted to
## Accept or decline team invite
![accept_or_decline_team_invite](https://github.com/user-attachments/assets/6a2cb61f-03d5-4423-9ed1-71df97ff4114)

View File

@ -5,7 +5,7 @@ slug: /run_health_check
# Run health check on RAGFlow's dependencies
Double check the health status of RAGFlow's dependencies.
Double-check the health status of RAGFlow's dependencies.
The operation of RAGFlow depends on four services:
@ -16,7 +16,7 @@ The operation of RAGFlow depends on four services:
If an exception or error occurs related to any of the above services, such as `Exception: Can't connect to ES cluster`, refer to this document to check their health status.
You can also click you avatar on the top right corner of the page **>** System to view the visualized health status of RAGFlow's core services. The following screenshot shows that all services are 'green' (running healthily). The task executor displays the *cumulative* number of completed and failed document parsing tasks from the past 30 minutes:
You can also click your avatar in the top right corner of the page **>** System to view the visualized health status of RAGFlow's core services. The following screenshot shows that all services are 'green' (running healthily). The task executor displays the *cumulative* number of completed and failed document parsing tasks from the past 30 minutes:
![system_status_page](https://github.com/user-attachments/assets/b0c1a11e-93e3-4947-b17a-1bfb4cdab6e4)

View File

@ -19,7 +19,7 @@ You start an AI conversation by creating an assistant.
- **Assistant name** is the name of your chat assistant. Each assistant corresponds to a dialogue with a unique combination of knowledge bases, prompts, hybrid search configurations, and large model settings.
- **Empty response**:
- If you wish to *confine* RAGFlow's answers to your knowledge bases, leave a response here. Then when it doesn't retrieve an answer, it *uniformly* responds with what you set here.
- If you wish to *confine* RAGFlow's answers to your knowledge bases, leave a response here. Then, when it doesn't retrieve an answer, it *uniformly* responds with what you set here.
- If you wish RAGFlow to *improvise* when it doesn't retrieve an answer from your knowledge bases, leave it blank, which may give rise to hallucinations.
- **Show Quote**: This is a key feature of RAGFlow and is enabled by default. RAGFlow does not work like a black box; instead, it clearly shows the sources of information that its responses are based on.
- Select the corresponding knowledge bases. You can select one or multiple knowledge bases, but ensure that they use the same embedding model, otherwise an error would occur.

View File

@ -11,7 +11,9 @@ Upgrade RAGFlow to `dev-slim`/`dev` or the latest, published release.
## Upgrade RAGFlow to `dev-slim`/`dev`, the most recent, tested Docker image
`dev-slim` refers to the RAGFlow Docker image *without* embedding models, while `dev` refers to the RAGFlow Docker image with embedding models. For details on their differences, see **docker/.env**.
`dev-slim` refers to the RAGFlow Docker image *without* embedding models, while `dev` refers to the RAGFlow Docker image with embedding models. For details on their differences, see [ragflow/docker/.env](https://github.com/infiniflow/ragflow/blob/main/docker/.env).
To upgrade RAGFlow, you must upgrade **both** your code **and** your Docker image:
1. Clone the repo
@ -52,22 +54,24 @@ RAGFLOW_IMAGE=infiniflow/ragflow:dev
## Upgrade RAGFlow to the most recent, officially published release
To upgrade RAGFlow, you must upgrade **both** your code **and** your Docker image:
1. Clone the repo
```bash
git clone https://github.com/infiniflow/ragflow.git
```
2. Switch to the latest, officially published release, e.g., `v0.14.0`:
2. Switch to the latest, officially published release, e.g., `v0.14.1`:
```bash
git checkout v0.14.0
git checkout v0.14.1
```
3. Update **ragflow/docker/.env** as follows:
```bash
RAGFLOW_IMAGE=infiniflow/ragflow:v0.14.0
RAGFLOW_IMAGE=infiniflow/ragflow:v0.14.1
```
4. Update the RAGFlow image and restart RAGFlow:

View File

@ -32,9 +32,9 @@ This section provides instructions on setting up the RAGFlow server on Linux. If
<details>
<summary>1. Ensure <code>vm.max_map_count</code> &ge; 262144:</summary>
`vm.max_map_count`. This value sets the maximum number of memory map areas a process may have. Its default value is 65530. While most applications require fewer than a thousand maps, reducing this value can result in abmornal behaviors, and the system will throw out-of-memory errors when a process reaches the limitation.
`vm.max_map_count`. This value sets the maximum number of memory map areas a process may have. Its default value is 65530. While most applications require fewer than a thousand maps, reducing this value can result in abnormal behaviors, and the system will throw out-of-memory errors when a process reaches the limitation.
RAGFlow v0.14.0 uses Elasticsearch for multiple recall. Setting the value of `vm.max_map_count` correctly is crucial to the proper functioning of the Elasticsearch component.
RAGFlow v0.14.1 uses Elasticsearch or [Infinity](https://github.com/infiniflow/infinity) for multiple recall. Setting the value of `vm.max_map_count` correctly is crucial to the proper functioning of the Elasticsearch component.
<Tabs
defaultValue="linux"
@ -184,9 +184,9 @@ This section provides instructions on setting up the RAGFlow server on Linux. If
$ docker compose -f docker-compose.yml up -d
```
> - To download a RAGFlow slim Docker image of a specific version, update the `RAGFlOW_IMAGE` variable in **docker/.env** to your desired version. For example, `RAGFLOW_IMAGE=infiniflow/ragflow:v0.14.0-slim`. After making this change, rerun the command above to initiate the download.
> - To download a RAGFlow slim Docker image of a specific version, update the `RAGFlOW_IMAGE` variable in **docker/.env** to your desired version. For example, `RAGFLOW_IMAGE=infiniflow/ragflow:v0.14.1-slim`. After making this change, rerun the command above to initiate the download.
> - To download the dev version of RAGFlow Docker image *including* embedding models and Python libraries, update the `RAGFlOW_IMAGE` variable in **docker/.env** to `RAGFLOW_IMAGE=infiniflow/ragflow:dev`. After making this change, rerun the command above to initiate the download.
> - To download a specific version of RAGFlow Docker image *including* embedding models and Python libraries, update the `RAGFlOW_IMAGE` variable in **docker/.env** to your desired version. For example, `RAGFLOW_IMAGE=infiniflow/ragflow:v0.14.0`. After making this change, rerun the command above to initiate the download.
> - To download a specific version of RAGFlow Docker image *including* embedding models and Python libraries, update the `RAGFlOW_IMAGE` variable in **docker/.env** to your desired version. For example, `RAGFLOW_IMAGE=infiniflow/ragflow:v0.14.1`. After making this change, rerun the command above to initiate the download.
:::tip NOTE
A RAGFlow Docker image that includes embedding models and Python libraries is approximately 9GB in size and may take significantly longer time to load.

View File

@ -1,6 +1,6 @@
{
"label": "References",
"position": 4,
"position": 6,
"link": {
"type": "generated-index",
"description": "Miscellaneous References"

View File

@ -2120,7 +2120,7 @@ Failure:
---
## Create session with an agent
## Create session with agent
**POST** `/api/v1/agents/{agent_id}/sessions`
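
A minimal sketch of calling this endpoint with `requests`; the base URL, Bearer-token header, and empty JSON body are assumptions, so check the rest of this reference for the exact request and response schema:

```python
import requests

BASE_URL = "http://localhost:9380"      # assumed RAGFlow server address
API_KEY = "<YOUR_RAGFLOW_API_KEY>"
AGENT_ID = "<AGENT_ID>"

resp = requests.post(
    f"{BASE_URL}/api/v1/agents/{AGENT_ID}/sessions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={},
)
resp.raise_for_status()
print(resp.json())  # the new session, including its id
```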

View File

@ -4,7 +4,7 @@ slug: /supported_models
---
# Supported models
import APITable from '../../src/components/APITable';
import APITable from '@site/src/components/APITable';
A complete list of models supported by RAGFlow, which will continue to expand.

docs/release_notes.md (new file, 118 lines)
View File

@ -0,0 +1,118 @@
---
sidebar_position: 2
slug: /release_notes
---
# Release notes
Key features and improvements in the latest releases.
## v0.14.0
Released on November 26, 2024.
### New features
- Supports [Infinity](https://github.com/infiniflow/infinity) or Elasticsearch (default) as document engine for vector storage and full-text indexing. [#2894](https://github.com/infiniflow/ragflow/pull/2894)
- Enhances user experience by adding more variables to the Agent and implementing auto-saving.
- Adds a three-step translation agent template, inspired by [Andrew Ng's translation agent](https://github.com/andrewyng/translation-agent).
- Adds an SEO-optimized blog writing agent template.
- Provides HTTP and Python APIs for conversing with an agent.
- Supports the use of English synonyms during retrieval processes.
- Optimizes term weight calculations, reducing the retrieval time by 50%.
- Improves task executor monitoring with additional performance indicators.
- Replaces Redis with Valkey.
- Adds three new UI languages (*contributed by the community*): Indonesian, Spanish, and Vietnamese.
### Compatibility changes
As of this release, **service_config.yaml.template** replaces **service_config.yaml** for configuring backend services. Upon Docker container startup, the environment variables defined in this template file are automatically populated and a **service_config.yaml** is auto-generated from it. [#3341](https://github.com/infiniflow/ragflow/pull/3341)
This approach eliminates the need to manually update **service_config.yaml** after making changes to **.env**, facilitating dynamic environment configurations.
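
The mechanism this describes can be pictured as a small render step at container startup; a sketch under the assumption that plain `${VAR}` placeholders are substituted from the environment (the real entrypoint may differ):

```python
import os

with open("service_config.yaml.template", encoding="utf-8") as f:
    template = f.read()

# expandvars fills $VAR and ${VAR} from the current environment.
rendered = os.path.expandvars(template)

with open("service_config.yaml", "w", encoding="utf-8") as f:
    f.write(rendered)
```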
:::danger IMPORTANT
Ensure that you [upgrade **both** your code **and** Docker image to this release](https://ragflow.io/docs/dev/upgrade_ragflow#upgrade-ragflow-to-the-most-recent-officially-published-release) before trying this new approach.
:::
### Related APIs
#### HTTP APIs
- [Create session with agent](https://ragflow.io/docs/dev/http_api_reference#create-session-with-agent)
- [Converse with agent](https://ragflow.io/docs/dev/http_api_reference#converse-with-agent)
#### Python APIs
- [Create session with agent](https://ragflow.io/docs/dev/python_api_reference#create-session-with-agent)
- [Converse with agent](https://ragflow.io/docs/dev/python_api_reference#converse-with-agent)
### Documentation
#### Added documents
- [Configurations](https://ragflow.io/docs/dev/configurations)
- [Manage team members](https://ragflow.io/docs/dev/manage_team_members)
- [Run health check on RAGFlow's dependencies](https://ragflow.io/docs/dev/run_health_check)
## v0.13.0
Released on October 31, 2024.
### New features
- Adds team management functionality for all users.
- Updates the Agent UI to improve usability.
- Adds support for Markdown chunking in the **General** chunk method.
- Introduces an **invoke** tool within the Agent UI.
- Integrates support for Dify's knowledge base API.
- Adds support for GLM4-9B and Yi-Lightning models.
- Introduces HTTP and Python APIs for dataset management, file management within datasets, and chat assistant management.
:::tip NOTE
To download RAGFlow's Python SDK:
```bash
pip install ragflow-sdk==0.13.0
```
:::
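Once installed, a hedged connection example — `RAGFlow` and `create_dataset` are the names used in the Python API reference; the API key and base URL below are placeholders for your own deployment:

```python
from ragflow_sdk import RAGFlow

# Assumes a running RAGFlow server and a valid API key.
rag = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://127.0.0.1:9380")
dataset = rag.create_dataset(name="kb_demo")
print(dataset.id)
```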
### Documentation
#### Added documents
- [Acquire a RAGFlow API key](https://ragflow.io/docs/dev/acquire_ragflow_api_key)
- [HTTP API Reference](https://ragflow.io/docs/dev/http_api_reference)
- [Python API Reference](https://ragflow.io/docs/dev/python_api_reference)
## v0.12.0
Released on September 30, 2024.
### New features
- Offers slim editions of RAGFlow's Docker images, which do not include built-in BGE/BCE embedding or reranking models.
- Improves the results of multi-round dialogues.
- Enables users to remove added LLM vendors.
- Adds support for OpenTTS and SparkTTS models.
- Implements an **Excel to HTML** toggle in the **General** chunk method, allowing users to parse a spreadsheet into either HTML tables or key-value pairs by row.
- Adds agent tools **YahooFinance** and **Jin10**.
- Adds a template for an investment advisor agent.
### Compatibility changes
As of this release, RAGFlow offers slim editions of its Docker images to improve the experience for users with limited Internet access. A slim edition does not include built-in BGE/BCE embedding models and is about 1GB in size; a full edition is approximately 9GB and includes the built-in embedding models as well as models that are downloaded on demand once you select them in the RAGFlow UI.
The default Docker image edition is `dev-slim`. The following list clarifies the differences between various editions:
- `dev-slim`: The slim edition of the most recent tested Docker image.
- `v0.12.0-slim`: The slim edition of the most recent **officially released** Docker image.
- `dev`: The full edition of the most recent tested Docker image.
- `v0.12.0`: The full edition of the most recent **officially released** Docker image.
See [Upgrade RAGFlow](https://ragflow.io/docs/dev/upgrade_ragflow) for instructions on upgrading.
### Documentation
#### Added documents
- [Upgrade RAGFlow](https://ragflow.io/docs/dev/upgrade_ragflow)


@ -64,27 +64,27 @@ def build_knowledge_graph_chunks(tenant_id: str, chunks: list[str], callback, en
BATCH_SIZE=4
texts, graphs = [], []
cnt = 0
threads = []
max_workers = int(os.environ.get('GRAPH_EXTRACTOR_MAX_WORKERS', 50))
exe = ThreadPoolExecutor(max_workers=max_workers)
for i in range(len(chunks)):
tkn_cnt = num_tokens_from_string(chunks[i])
if cnt+tkn_cnt >= left_token_count and texts:
with ThreadPoolExecutor(max_workers=max_workers) as exe:
threads = []
for i in range(len(chunks)):
tkn_cnt = num_tokens_from_string(chunks[i])
if cnt+tkn_cnt >= left_token_count and texts:
for b in range(0, len(texts), BATCH_SIZE):
threads.append(exe.submit(ext, ["\n".join(texts[b:b+BATCH_SIZE])], {"entity_types": entity_types}, callback))
texts = []
cnt = 0
texts.append(chunks[i])
cnt += tkn_cnt
if texts:
for b in range(0, len(texts), BATCH_SIZE):
threads.append(exe.submit(ext, ["\n".join(texts[b:b+BATCH_SIZE])], {"entity_types": entity_types}, callback))
texts = []
cnt = 0
texts.append(chunks[i])
cnt += tkn_cnt
if texts:
for b in range(0, len(texts), BATCH_SIZE):
threads.append(exe.submit(ext, ["\n".join(texts[b:b+BATCH_SIZE])], {"entity_types": entity_types}, callback))
callback(0.5, "Extracting entities.")
graphs = []
for i, _ in enumerate(threads):
graphs.append(_.result().output)
callback(0.5 + 0.1*i/len(threads), f"Entities extraction progress ... {i+1}/{len(threads)}")
callback(0.5, "Extracting entities.")
graphs = []
for i, _ in enumerate(threads):
graphs.append(_.result().output)
callback(0.5 + 0.1*i/len(threads), f"Entities extraction progress ... {i+1}/{len(threads)}")
graph = reduce(graph_merge, graphs) if graphs else nx.Graph()
er = EntityResolution(llm_bdl)
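The before/after lines in the hunk above are interleaved without diff markers; the net change is to scope the executor in a `with` block so every submitted task is joined before results are read. A condensed, runnable sketch of that batching pattern, with hypothetical stand-ins for the tokenizer and extractor:

```python
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 4
LEFT_TOKEN_COUNT = 6000  # hypothetical token budget per submission

def num_tokens_from_string(s: str) -> int:
    return len(s.split())  # stand-in tokenizer

def ext(texts: list[str], params: dict) -> int:
    return len(texts[0])  # stand-in for the LLM entity extractor

def extract(chunks: list[str]) -> list[int]:
    texts, futures, cnt = [], [], 0
    with ThreadPoolExecutor(max_workers=8) as exe:
        for chunk in chunks:
            tkn_cnt = num_tokens_from_string(chunk)
            if cnt + tkn_cnt >= LEFT_TOKEN_COUNT and texts:
                for b in range(0, len(texts), BATCH_SIZE):
                    futures.append(exe.submit(ext, ["\n".join(texts[b:b + BATCH_SIZE])], {}))
                texts, cnt = [], 0
            texts.append(chunk)
            cnt += tkn_cnt
        if texts:  # flush the final partial batch
            for b in range(0, len(texts), BATCH_SIZE):
                futures.append(exe.submit(ext, ["\n".join(texts[b:b + BATCH_SIZE])], {}))
        # Exiting the with-block waits for all workers to finish.
    return [f.result() for f in futures]

print(extract(["alpha beta gamma"] * 8))
```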


@ -88,26 +88,26 @@ class MindMapExtractor:
prompt_variables = {}
try:
max_workers = int(os.environ.get('MINDMAP_EXTRACTOR_MAX_WORKERS', 12))
exe = ThreadPoolExecutor(max_workers=max_workers)
threads = []
token_count = max(self._llm.max_length * 0.8, self._llm.max_length - 512)
texts = []
res = []
cnt = 0
for i in range(len(sections)):
section_cnt = num_tokens_from_string(sections[i])
if cnt + section_cnt >= token_count and texts:
max_workers = int(os.environ.get('MINDMAP_EXTRACTOR_MAX_WORKERS', 12))
with ThreadPoolExecutor(max_workers=max_workers) as exe:
threads = []
token_count = max(self._llm.max_length * 0.8, self._llm.max_length - 512)
texts = []
cnt = 0
for i in range(len(sections)):
section_cnt = num_tokens_from_string(sections[i])
if cnt + section_cnt >= token_count and texts:
threads.append(exe.submit(self._process_document, "".join(texts), prompt_variables))
texts = []
cnt = 0
texts.append(sections[i])
cnt += section_cnt
if texts:
threads.append(exe.submit(self._process_document, "".join(texts), prompt_variables))
texts = []
cnt = 0
texts.append(sections[i])
cnt += section_cnt
if texts:
threads.append(exe.submit(self._process_document, "".join(texts), prompt_variables))
for i, _ in enumerate(threads):
res.append(_.result())
for i, _ in enumerate(threads):
res.append(_.result())
if not res:
return MindMapResult(output={"id": "root", "children": []})


@ -59,8 +59,8 @@ class KGSearch(Dealer):
q_vec = matchDense.embedding_data
src = req.get("fields", ["docnm_kwd", "content_ltks", "kb_id", "img_id", "title_tks", "important_kwd",
"doc_id", f"q_{len(q_vec)}_vec", "position_list", "name_kwd",
"q_1024_vec", "q_1536_vec", "available_int", "content_with_weight",
"weight_int", "weight_flt", "rank_int"
"available_int", "content_with_weight",
"weight_int", "weight_flt"
])
fusionExpr = FusionExpr("weighted_sum", 32, {"weights": "0.5, 0.5"})

poetry.lock (generated)

@ -196,13 +196,13 @@ files = [
[[package]]
name = "akshare"
version = "1.15.32"
version = "1.15.33"
description = "AKShare is an elegant and simple financial data interface library for Python, built for human beings!"
optional = false
python-versions = ">=3.8"
files = [
{file = "akshare-1.15.32-py3-none-any.whl", hash = "sha256:4424f4adc4364f4f0a63fe3153772a3113578c6bc594a5b46ed065d8805f5084"},
{file = "akshare-1.15.32.tar.gz", hash = "sha256:fb3129cbcd089cc949e15365ff664cba1739f121fd0b7710fddb4adf5b1fee53"},
{file = "akshare-1.15.33-py3-none-any.whl", hash = "sha256:6f56682404cbff892b23ec896528199392f06a11a14c287cdc36f1f57c3ea612"},
{file = "akshare-1.15.33.tar.gz", hash = "sha256:6bd46e9ab767237c57adce35a409b73a6607b12b8c36d3bf2d2fbe83121d6e3d"},
]
[package.dependencies]
@ -3808,84 +3808,86 @@ i18n = ["Babel (>=2.7)"]
[[package]]
name = "jiter"
version = "0.7.1"
version = "0.8.0"
description = "Fast iterable JSON parser."
optional = false
python-versions = ">=3.8"
files = [
{file = "jiter-0.7.1-cp310-cp310-macosx_10_12_x86_64.whl", hash = "sha256:262e96d06696b673fad6f257e6a0abb6e873dc22818ca0e0600f4a1189eb334f"},
{file = "jiter-0.7.1-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:be6de02939aac5be97eb437f45cfd279b1dc9de358b13ea6e040e63a3221c40d"},
{file = "jiter-0.7.1-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:935f10b802bc1ce2b2f61843e498c7720aa7f4e4bb7797aa8121eab017293c3d"},
{file = "jiter-0.7.1-cp310-cp310-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:9cd3cccccabf5064e4bb3099c87bf67db94f805c1e62d1aefd2b7476e90e0ee2"},
{file = "jiter-0.7.1-cp310-cp310-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:4aa919ebfc5f7b027cc368fe3964c0015e1963b92e1db382419dadb098a05192"},
{file = "jiter-0.7.1-cp310-cp310-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:5ae2d01e82c94491ce4d6f461a837f63b6c4e6dd5bb082553a70c509034ff3d4"},
{file = "jiter-0.7.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:9f9568cd66dbbdab67ae1b4c99f3f7da1228c5682d65913e3f5f95586b3cb9a9"},
{file = "jiter-0.7.1-cp310-cp310-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:9ecbf4e20ec2c26512736284dc1a3f8ed79b6ca7188e3b99032757ad48db97dc"},
{file = "jiter-0.7.1-cp310-cp310-musllinux_1_1_aarch64.whl", hash = "sha256:b1a0508fddc70ce00b872e463b387d49308ef02b0787992ca471c8d4ba1c0fa1"},
{file = "jiter-0.7.1-cp310-cp310-musllinux_1_1_x86_64.whl", hash = "sha256:f84c9996664c460f24213ff1e5881530abd8fafd82058d39af3682d5fd2d6316"},
{file = "jiter-0.7.1-cp310-none-win32.whl", hash = "sha256:c915e1a1960976ba4dfe06551ea87063b2d5b4d30759012210099e712a414d9f"},
{file = "jiter-0.7.1-cp310-none-win_amd64.whl", hash = "sha256:75bf3b7fdc5c0faa6ffffcf8028a1f974d126bac86d96490d1b51b3210aa0f3f"},
{file = "jiter-0.7.1-cp311-cp311-macosx_10_12_x86_64.whl", hash = "sha256:ad04a23a91f3d10d69d6c87a5f4471b61c2c5cd6e112e85136594a02043f462c"},
{file = "jiter-0.7.1-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:1e47a554de88dff701226bb5722b7f1b6bccd0b98f1748459b7e56acac2707a5"},
{file = "jiter-0.7.1-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:1e44fff69c814a2e96a20b4ecee3e2365e9b15cf5fe4e00869d18396daa91dab"},
{file = "jiter-0.7.1-cp311-cp311-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:df0a1d05081541b45743c965436f8b5a1048d6fd726e4a030113a2699a6046ea"},
{file = "jiter-0.7.1-cp311-cp311-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:f22cf8f236a645cb6d8ffe2a64edb5d2b66fb148bf7c75eea0cb36d17014a7bc"},
{file = "jiter-0.7.1-cp311-cp311-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:da8589f50b728ea4bf22e0632eefa125c8aa9c38ed202a5ee6ca371f05eeb3ff"},
{file = "jiter-0.7.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:f20de711224f2ca2dbb166a8d512f6ff48c9c38cc06b51f796520eb4722cc2ce"},
{file = "jiter-0.7.1-cp311-cp311-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:8a9803396032117b85ec8cbf008a54590644a062fedd0425cbdb95e4b2b60479"},
{file = "jiter-0.7.1-cp311-cp311-musllinux_1_1_aarch64.whl", hash = "sha256:3d8bae77c82741032e9d89a4026479061aba6e646de3bf5f2fc1ae2bbd9d06e0"},
{file = "jiter-0.7.1-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:3dc9939e576bbc68c813fc82f6620353ed68c194c7bcf3d58dc822591ec12490"},
{file = "jiter-0.7.1-cp311-none-win32.whl", hash = "sha256:f7605d24cd6fab156ec89e7924578e21604feee9c4f1e9da34d8b67f63e54892"},
{file = "jiter-0.7.1-cp311-none-win_amd64.whl", hash = "sha256:f3ea649e7751a1a29ea5ecc03c4ada0a833846c59c6da75d747899f9b48b7282"},
{file = "jiter-0.7.1-cp312-cp312-macosx_10_12_x86_64.whl", hash = "sha256:ad36a1155cbd92e7a084a568f7dc6023497df781adf2390c345dd77a120905ca"},
{file = "jiter-0.7.1-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:7ba52e6aaed2dc5c81a3d9b5e4ab95b039c4592c66ac973879ba57c3506492bb"},
{file = "jiter-0.7.1-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:2b7de0b6f6728b678540c7927587e23f715284596724be203af952418acb8a2d"},
{file = "jiter-0.7.1-cp312-cp312-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:9463b62bd53c2fb85529c700c6a3beb2ee54fde8bef714b150601616dcb184a6"},
{file = "jiter-0.7.1-cp312-cp312-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:627164ec01d28af56e1f549da84caf0fe06da3880ebc7b7ee1ca15df106ae172"},
{file = "jiter-0.7.1-cp312-cp312-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:25d0e5bf64e368b0aa9e0a559c3ab2f9b67e35fe7269e8a0d81f48bbd10e8963"},
{file = "jiter-0.7.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:c244261306f08f8008b3087059601997016549cb8bb23cf4317a4827f07b7d74"},
{file = "jiter-0.7.1-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:7ded4e4b75b68b843b7cea5cd7c55f738c20e1394c68c2cb10adb655526c5f1b"},
{file = "jiter-0.7.1-cp312-cp312-musllinux_1_1_aarch64.whl", hash = "sha256:80dae4f1889b9d09e5f4de6b58c490d9c8ce7730e35e0b8643ab62b1538f095c"},
{file = "jiter-0.7.1-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:5970cf8ec943b51bce7f4b98d2e1ed3ada170c2a789e2db3cb484486591a176a"},
{file = "jiter-0.7.1-cp312-none-win32.whl", hash = "sha256:701d90220d6ecb3125d46853c8ca8a5bc158de8c49af60fd706475a49fee157e"},
{file = "jiter-0.7.1-cp312-none-win_amd64.whl", hash = "sha256:7824c3ecf9ecf3321c37f4e4d4411aad49c666ee5bc2a937071bdd80917e4533"},
{file = "jiter-0.7.1-cp313-cp313-macosx_10_12_x86_64.whl", hash = "sha256:097676a37778ba3c80cb53f34abd6943ceb0848263c21bf423ae98b090f6c6ba"},
{file = "jiter-0.7.1-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:3298af506d4271257c0a8f48668b0f47048d69351675dd8500f22420d4eec378"},
{file = "jiter-0.7.1-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:12fd88cfe6067e2199964839c19bd2b422ca3fd792949b8f44bb8a4e7d21946a"},
{file = "jiter-0.7.1-cp313-cp313-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:dacca921efcd21939123c8ea8883a54b9fa7f6545c8019ffcf4f762985b6d0c8"},
{file = "jiter-0.7.1-cp313-cp313-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:de3674a5fe1f6713a746d25ad9c32cd32fadc824e64b9d6159b3b34fd9134143"},
{file = "jiter-0.7.1-cp313-cp313-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:65df9dbae6d67e0788a05b4bad5706ad40f6f911e0137eb416b9eead6ba6f044"},
{file = "jiter-0.7.1-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:7ba9a358d59a0a55cccaa4957e6ae10b1a25ffdabda863c0343c51817610501d"},
{file = "jiter-0.7.1-cp313-cp313-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:576eb0f0c6207e9ede2b11ec01d9c2182973986514f9c60bc3b3b5d5798c8f50"},
{file = "jiter-0.7.1-cp313-cp313-musllinux_1_1_aarch64.whl", hash = "sha256:e550e29cdf3577d2c970a18f3959e6b8646fd60ef1b0507e5947dc73703b5627"},
{file = "jiter-0.7.1-cp313-cp313-musllinux_1_1_x86_64.whl", hash = "sha256:81d968dbf3ce0db2e0e4dec6b0a0d5d94f846ee84caf779b07cab49f5325ae43"},
{file = "jiter-0.7.1-cp313-none-win32.whl", hash = "sha256:f892e547e6e79a1506eb571a676cf2f480a4533675f834e9ae98de84f9b941ac"},
{file = "jiter-0.7.1-cp313-none-win_amd64.whl", hash = "sha256:0302f0940b1455b2a7fb0409b8d5b31183db70d2b07fd177906d83bf941385d1"},
{file = "jiter-0.7.1-cp38-cp38-macosx_10_12_x86_64.whl", hash = "sha256:c65a3ce72b679958b79d556473f192a4dfc5895e8cc1030c9f4e434690906076"},
{file = "jiter-0.7.1-cp38-cp38-macosx_11_0_arm64.whl", hash = "sha256:e80052d3db39f9bb8eb86d207a1be3d9ecee5e05fdec31380817f9609ad38e60"},
{file = "jiter-0.7.1-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:70a497859c4f3f7acd71c8bd89a6f9cf753ebacacf5e3e799138b8e1843084e3"},
{file = "jiter-0.7.1-cp38-cp38-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:c1288bc22b9e36854a0536ba83666c3b1fb066b811019d7b682c9cf0269cdf9f"},
{file = "jiter-0.7.1-cp38-cp38-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:b096ca72dd38ef35675e1d3b01785874315182243ef7aea9752cb62266ad516f"},
{file = "jiter-0.7.1-cp38-cp38-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:8dbbd52c50b605af13dbee1a08373c520e6fcc6b5d32f17738875847fea4e2cd"},
{file = "jiter-0.7.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:af29c5c6eb2517e71ffa15c7ae9509fa5e833ec2a99319ac88cc271eca865519"},
{file = "jiter-0.7.1-cp38-cp38-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:f114a4df1e40c03c0efbf974b376ed57756a1141eb27d04baee0680c5af3d424"},
{file = "jiter-0.7.1-cp38-cp38-musllinux_1_1_aarch64.whl", hash = "sha256:191fbaee7cf46a9dd9b817547bf556facde50f83199d07fc48ebeff4082f9df4"},
{file = "jiter-0.7.1-cp38-cp38-musllinux_1_1_x86_64.whl", hash = "sha256:0e2b445e5ee627fb4ee6bbceeb486251e60a0c881a8e12398dfdff47c56f0723"},
{file = "jiter-0.7.1-cp38-none-win32.whl", hash = "sha256:47ac4c3cf8135c83e64755b7276339b26cd3c7ddadf9e67306ace4832b283edf"},
{file = "jiter-0.7.1-cp38-none-win_amd64.whl", hash = "sha256:60b49c245cd90cde4794f5c30f123ee06ccf42fb8730a019a2870cd005653ebd"},
{file = "jiter-0.7.1-cp39-cp39-macosx_10_12_x86_64.whl", hash = "sha256:8f212eeacc7203256f526f550d105d8efa24605828382cd7d296b703181ff11d"},
{file = "jiter-0.7.1-cp39-cp39-macosx_11_0_arm64.whl", hash = "sha256:d9e247079d88c00e75e297e6cb3a18a039ebcd79fefc43be9ba4eb7fb43eb726"},
{file = "jiter-0.7.1-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:f0aacaa56360139c53dcf352992b0331f4057a0373bbffd43f64ba0c32d2d155"},
{file = "jiter-0.7.1-cp39-cp39-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:bc1b55314ca97dbb6c48d9144323896e9c1a25d41c65bcb9550b3e0c270ca560"},
{file = "jiter-0.7.1-cp39-cp39-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:f281aae41b47e90deb70e7386558e877a8e62e1693e0086f37d015fa1c102289"},
{file = "jiter-0.7.1-cp39-cp39-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:93c20d2730a84d43f7c0b6fb2579dc54335db742a59cf9776d0b80e99d587382"},
{file = "jiter-0.7.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:e81ccccd8069110e150613496deafa10da2f6ff322a707cbec2b0d52a87b9671"},
{file = "jiter-0.7.1-cp39-cp39-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:0a7d5e85766eff4c9be481d77e2226b4c259999cb6862ccac5ef6621d3c8dcce"},
{file = "jiter-0.7.1-cp39-cp39-musllinux_1_1_aarch64.whl", hash = "sha256:f52ce5799df5b6975439ecb16b1e879d7655e1685b6e3758c9b1b97696313bfb"},
{file = "jiter-0.7.1-cp39-cp39-musllinux_1_1_x86_64.whl", hash = "sha256:e0c91a0304373fdf97d56f88356a010bba442e6d995eb7773cbe32885b71cdd8"},
{file = "jiter-0.7.1-cp39-none-win32.whl", hash = "sha256:5c08adf93e41ce2755970e8aa95262298afe2bf58897fb9653c47cd93c3c6cdc"},
{file = "jiter-0.7.1-cp39-none-win_amd64.whl", hash = "sha256:6592f4067c74176e5f369228fb2995ed01400c9e8e1225fb73417183a5e635f0"},
{file = "jiter-0.7.1.tar.gz", hash = "sha256:448cf4f74f7363c34cdef26214da527e8eeffd88ba06d0b80b485ad0667baf5d"},
{file = "jiter-0.8.0-cp310-cp310-macosx_10_12_x86_64.whl", hash = "sha256:dee4eeb293ffcd2c3b31ebab684dbf7f7b71fe198f8eddcdf3a042cc6e10205a"},
{file = "jiter-0.8.0-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:aad1e6e9b01cf0304dcee14db03e92e0073287a6297caf5caf2e9dbfea16a924"},
{file = "jiter-0.8.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:504099fb7acdbe763e10690d560a25d4aee03d918d6a063f3a761d8a09fb833f"},
{file = "jiter-0.8.0-cp310-cp310-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:2373487caad7fe39581f588ab5c9262fc1ade078d448626fec93f4ffba528858"},
{file = "jiter-0.8.0-cp310-cp310-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:c341ecc3f9bccde952898b0c97c24f75b84b56a7e2f8bbc7c8e38cab0875a027"},
{file = "jiter-0.8.0-cp310-cp310-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:0e48e7a336529b9419d299b70c358d4ebf99b8f4b847ed3f1000ec9f320e8c0c"},
{file = "jiter-0.8.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:f5ee157a8afd2943be690db679f82fafb8d347a8342e8b9c34863de30c538d55"},
{file = "jiter-0.8.0-cp310-cp310-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:d7dceae3549b80087f913aad4acc2a7c1e0ab7cb983effd78bdc9c41cabdcf18"},
{file = "jiter-0.8.0-cp310-cp310-musllinux_1_1_aarch64.whl", hash = "sha256:e29e9ecce53d396772590438214cac4ab89776f5e60bd30601f1050b34464019"},
{file = "jiter-0.8.0-cp310-cp310-musllinux_1_1_x86_64.whl", hash = "sha256:fa1782f22d5f92c620153133f35a9a395d3f3823374bceddd3e7032e2fdfa0b1"},
{file = "jiter-0.8.0-cp310-none-win32.whl", hash = "sha256:f754ef13b4e4f67a3bf59fe974ef4342523801c48bf422f720bd37a02a360584"},
{file = "jiter-0.8.0-cp310-none-win_amd64.whl", hash = "sha256:796f750b65f5d605f5e7acaccc6b051675e60c41d7ac3eab40dbd7b5b81a290f"},
{file = "jiter-0.8.0-cp311-cp311-macosx_10_12_x86_64.whl", hash = "sha256:f6f4e645efd96b4690b9b6091dbd4e0fa2885ba5c57a0305c1916b75b4f30ff6"},
{file = "jiter-0.8.0-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:f61cf6d93c1ade9b8245c9f14b7900feadb0b7899dbe4aa8de268b705647df81"},
{file = "jiter-0.8.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:0396bc5cb1309c6dab085e70bb3913cdd92218315e47b44afe9eace68ee8adaa"},
{file = "jiter-0.8.0-cp311-cp311-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:62d0e42ec5dc772bd8554a304358220be5d97d721c4648b23f3a9c01ccc2cb26"},
{file = "jiter-0.8.0-cp311-cp311-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:ec4b711989860705733fc59fb8c41b2def97041cea656b37cf6c8ea8dee1c3f4"},
{file = "jiter-0.8.0-cp311-cp311-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:859cc35bf304ab066d88f10a44a3251a9cd057fb11ec23e00be22206db878f4f"},
{file = "jiter-0.8.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:5000195921aa293b39b9b5bc959d7fa658e7f18f938c0e52732da8e3cc70a278"},
{file = "jiter-0.8.0-cp311-cp311-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:36050284c0abde57aba34964d3920f3d6228211b65df7187059bb7c7f143759a"},
{file = "jiter-0.8.0-cp311-cp311-musllinux_1_1_aarch64.whl", hash = "sha256:a88f608e050cfe45c48d771e86ecdbf5258314c883c986d4217cc79e1fb5f689"},
{file = "jiter-0.8.0-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:646cf4237665b2e13b4159d8f26d53f59bc9f2e6e135e3a508a2e5dd26d978c6"},
{file = "jiter-0.8.0-cp311-none-win32.whl", hash = "sha256:21fe5b8345db1b3023052b2ade9bb4d369417827242892051244af8fae8ba231"},
{file = "jiter-0.8.0-cp311-none-win_amd64.whl", hash = "sha256:30c2161c5493acf6b6c3c909973fb64ae863747def01cc7574f3954e0a15042c"},
{file = "jiter-0.8.0-cp312-cp312-macosx_10_12_x86_64.whl", hash = "sha256:d91a52d8f49ada2672a4b808a0c5c25d28f320a2c9ca690e30ebd561eb5a1002"},
{file = "jiter-0.8.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:c38cf25cf7862f61410b7a49684d34eb3b5bcbd7ddaf4773eea40e0bd43de706"},
{file = "jiter-0.8.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:c6189beb5c4b3117624be6b2e84545cff7611f5855d02de2d06ff68e316182be"},
{file = "jiter-0.8.0-cp312-cp312-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:e13fa849c0e30643554add089983caa82f027d69fad8f50acadcb21c462244ab"},
{file = "jiter-0.8.0-cp312-cp312-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:d7765ca159d0a58e8e0f8ca972cd6d26a33bc97b4480d0d2309856763807cd28"},
{file = "jiter-0.8.0-cp312-cp312-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:1b0befe7c6e9fc867d5bed21bab0131dfe27d1fa5cd52ba2bced67da33730b7d"},
{file = "jiter-0.8.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:e7d6363d4c6f1052b1d8b494eb9a72667c3ef5f80ebacfe18712728e85327000"},
{file = "jiter-0.8.0-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:a873e57009863eeac3e3969e4653f07031d6270d037d6224415074ac17e5505c"},
{file = "jiter-0.8.0-cp312-cp312-musllinux_1_1_aarch64.whl", hash = "sha256:2582912473c0d9940791479fe1bf2976a34f212eb8e0a82ee9e645ac275c5d16"},
{file = "jiter-0.8.0-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:646163201af42f55393ee6e8f6136b8df488253a6533f4230a64242ecbfe6048"},
{file = "jiter-0.8.0-cp312-none-win32.whl", hash = "sha256:96e75c9abfbf7387cba89a324d2356d86d8897ac58c956017d062ad510832dae"},
{file = "jiter-0.8.0-cp312-none-win_amd64.whl", hash = "sha256:ed6074552b4a32e047b52dad5ab497223721efbd0e9efe68c67749f094a092f7"},
{file = "jiter-0.8.0-cp313-cp313-macosx_10_12_x86_64.whl", hash = "sha256:dd5e351cb9b3e676ec3360a85ea96def515ad2b83c8ae3a251ce84985a2c9a6f"},
{file = "jiter-0.8.0-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:ba9f12b0f801ecd5ed0cec29041dc425d1050922b434314c592fc30d51022467"},
{file = "jiter-0.8.0-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:a7ba461c3681728d556392e8ae56fb44a550155a24905f01982317b367c21dd4"},
{file = "jiter-0.8.0-cp313-cp313-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:3a15ed47ab09576db560dbc5c2c5a64477535beb056cd7d997d5dd0f2798770e"},
{file = "jiter-0.8.0-cp313-cp313-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:cef55042816d0737142b0ec056c0356a5f681fb8d6aa8499b158e87098f4c6f8"},
{file = "jiter-0.8.0-cp313-cp313-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:549f170215adeb5e866f10617c3d019d8eb4e6d4e3c6b724b3b8c056514a3487"},
{file = "jiter-0.8.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:f867edeb279d22020877640d2ea728de5817378c60a51be8af731a8a8f525306"},
{file = "jiter-0.8.0-cp313-cp313-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:aef8845f463093799db4464cee2aa59d61aa8edcb3762aaa4aacbec3f478c929"},
{file = "jiter-0.8.0-cp313-cp313-musllinux_1_1_aarch64.whl", hash = "sha256:d0d6e22e4062c3d3c1bf3594baa2f67fc9dcdda8275abad99e468e0c6540bc54"},
{file = "jiter-0.8.0-cp313-cp313-musllinux_1_1_x86_64.whl", hash = "sha256:079e62e64696241ac3f408e337aaac09137ed760ccf2b72b1094b48745c13641"},
{file = "jiter-0.8.0-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:74d2b56ed3da5760544df53b5f5c39782e68efb64dc3aa0bba4cc08815e6fae8"},
{file = "jiter-0.8.0-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:798dafe108cba58a7bb0a50d4d5971f98bb7f3c974e1373e750de6eb21c1a329"},
{file = "jiter-0.8.0-cp313-none-win32.whl", hash = "sha256:ca6d3064dfc743eb0d3d7539d89d4ba886957c717567adc72744341c1e3573c9"},
{file = "jiter-0.8.0-cp313-none-win_amd64.whl", hash = "sha256:38caedda64fe1f04b06d7011fc15e86b3b837ed5088657bf778656551e3cd8f9"},
{file = "jiter-0.8.0-cp38-cp38-macosx_10_12_x86_64.whl", hash = "sha256:bb5c8a0a8d081c338db22e5b8d53a89a121790569cbb85f7d3cfb1fe0fbe9836"},
{file = "jiter-0.8.0-cp38-cp38-macosx_11_0_arm64.whl", hash = "sha256:202dbe8970bfb166fab950eaab8f829c505730a0b33cc5e1cfb0a1c9dd56b2f9"},
{file = "jiter-0.8.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:9046812e5671fdcfb9ae02881fff1f6a14d484b7e8b3316179a372cdfa1e8026"},
{file = "jiter-0.8.0-cp38-cp38-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:e6ac56425023e52d65150918ae25480d0a1ce2a6bf5ea2097f66a2cc50f6d692"},
{file = "jiter-0.8.0-cp38-cp38-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:7dfcf97210c6eab9d2a1c6af15dd39e1d5154b96a7145d0a97fa1df865b7b834"},
{file = "jiter-0.8.0-cp38-cp38-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:d4e3c8444d418686f78c9a547b9b90031faf72a0a1a46bfec7fb31edbd889c0d"},
{file = "jiter-0.8.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:6507011a299b7f578559084256405a8428875540d8d13530e00b688e41b09493"},
{file = "jiter-0.8.0-cp38-cp38-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:0aae4738eafdd34f0f25c2d3668ce9e8fa0d7cb75a2efae543c9a69aebc37323"},
{file = "jiter-0.8.0-cp38-cp38-musllinux_1_1_aarch64.whl", hash = "sha256:7f5d782e790396b13f2a7b36bdcaa3736a33293bdda80a4bf1a3ce0cd5ef9f15"},
{file = "jiter-0.8.0-cp38-cp38-musllinux_1_1_x86_64.whl", hash = "sha256:cc7f993bc2c4e03015445adbb16790c303282fce2e8d9dc3a3905b1d40e50564"},
{file = "jiter-0.8.0-cp38-none-win32.whl", hash = "sha256:d4a8a6eda018a991fa58ef707dd51524055d11f5acb2f516d70b1be1d15ab39c"},
{file = "jiter-0.8.0-cp38-none-win_amd64.whl", hash = "sha256:4cca948a3eda8ea24ed98acb0ee19dc755b6ad2e570ec85e1527d5167f91ff67"},
{file = "jiter-0.8.0-cp39-cp39-macosx_10_12_x86_64.whl", hash = "sha256:ef89663678d8257063ce7c00d94638e05bd72f662c5e1eb0e07a172e6c1a9a9f"},
{file = "jiter-0.8.0-cp39-cp39-macosx_11_0_arm64.whl", hash = "sha256:c402ddcba90b4cc71db3216e8330f4db36e0da2c78cf1d8a9c3ed8f272602a94"},
{file = "jiter-0.8.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:1a6dfe795b7a173a9f8ba7421cdd92193d60c1c973bbc50dc3758a9ad0fa5eb6"},
{file = "jiter-0.8.0-cp39-cp39-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:8ec29a31b9abd6be39453a2c45da067138a3005d65d2c0507c530e0f1fdcd9a4"},
{file = "jiter-0.8.0-cp39-cp39-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:2a488f8c54bddc3ddefaf3bfd6de4a52c97fc265d77bc2dcc6ee540c17e8c342"},
{file = "jiter-0.8.0-cp39-cp39-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:aeb5561adf4d26ca0d01b5811b4d7b56a8986699a473d700757b4758ef787883"},
{file = "jiter-0.8.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:4ab961858d7ad13132328517d29f121ae1b2d94502191d6bcf96bddcc8bb5d1c"},
{file = "jiter-0.8.0-cp39-cp39-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:a207e718d114d23acf0850a2174d290f42763d955030d9924ffa4227dbd0018f"},
{file = "jiter-0.8.0-cp39-cp39-musllinux_1_1_aarch64.whl", hash = "sha256:733bc9dc8ff718a0ae4695239e9268eb93e88b73b367dfac3ec227d8ce2f1e77"},
{file = "jiter-0.8.0-cp39-cp39-musllinux_1_1_x86_64.whl", hash = "sha256:d1ec27299e22d05e13a06e460bf7f75f26f9aaa0e0fb7d060f40e88df1d81faa"},
{file = "jiter-0.8.0-cp39-none-win32.whl", hash = "sha256:e8dbfcb46553e6661d3fc1f33831598fcddf73d0f67834bce9fc3e9ebfe5c439"},
{file = "jiter-0.8.0-cp39-none-win_amd64.whl", hash = "sha256:af2ce2487b3a93747e2cb5150081d4ae1e5874fce5924fc1a12e9e768e489ad8"},
{file = "jiter-0.8.0.tar.gz", hash = "sha256:86fee98b569d4cc511ff2e3ec131354fafebd9348a487549c31ad371ae730310"},
]
[[package]]
@ -5968,6 +5970,47 @@ timezone = ["backports-zoneinfo", "tzdata"]
xlsx2csv = ["xlsx2csv (>=0.8.0)"]
xlsxwriter = ["xlsxwriter"]
[[package]]
name = "polars-lts-cpu"
version = "1.15.0"
description = "Blazingly fast DataFrame library"
optional = false
python-versions = ">=3.9"
files = [
{file = "polars_lts_cpu-1.15.0-cp39-abi3-macosx_10_12_x86_64.whl", hash = "sha256:21dfba817d74b34b01e9f9c301d07a6308c84e84588f28854413046ea8cc2692"},
{file = "polars_lts_cpu-1.15.0-cp39-abi3-macosx_11_0_arm64.whl", hash = "sha256:3ec8bb702555fe29eca975cb18612acc7eb62e29023de5487b3e2365c8fbfdf4"},
{file = "polars_lts_cpu-1.15.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:cf5414c234de781861f7a258eeb5a14f06280cc98d6d1d3fc08cdc506e764e21"},
{file = "polars_lts_cpu-1.15.0-cp39-abi3-manylinux_2_24_aarch64.whl", hash = "sha256:930acac72f7143d9baee45c8314e2d1bf342347e37df9da7231403d810f13081"},
{file = "polars_lts_cpu-1.15.0-cp39-abi3-win_amd64.whl", hash = "sha256:71be1e40f25ae4987d8279d8881e855fb7bd1957ed51a149c5191006bc430e72"},
{file = "polars_lts_cpu-1.15.0.tar.gz", hash = "sha256:0a1108675521871e6d30e4c1941a0a9962a0af979da7d68fc41fdb1a07efd8d7"},
]
[package.extras]
adbc = ["adbc-driver-manager[dbapi]", "adbc-driver-sqlite[dbapi]"]
all = ["polars-lts-cpu[async,cloudpickle,database,deltalake,excel,fsspec,graph,iceberg,numpy,pandas,plot,pyarrow,pydantic,style,timezone]"]
async = ["gevent"]
calamine = ["fastexcel (>=0.9)"]
cloudpickle = ["cloudpickle"]
connectorx = ["connectorx (>=0.3.2)"]
database = ["nest-asyncio", "polars-lts-cpu[adbc,connectorx,sqlalchemy]"]
deltalake = ["deltalake (>=0.15.0)"]
excel = ["polars-lts-cpu[calamine,openpyxl,xlsx2csv,xlsxwriter]"]
fsspec = ["fsspec"]
gpu = ["cudf-polars-cu12"]
graph = ["matplotlib"]
iceberg = ["pyiceberg (>=0.5.0)"]
numpy = ["numpy (>=1.16.0)"]
openpyxl = ["openpyxl (>=3.0.0)"]
pandas = ["pandas", "polars-lts-cpu[pyarrow]"]
plot = ["altair (>=5.4.0)"]
pyarrow = ["pyarrow (>=7.0.0)"]
pydantic = ["pydantic"]
sqlalchemy = ["polars-lts-cpu[pandas]", "sqlalchemy"]
style = ["great-tables (>=0.8.0)"]
timezone = ["backports-zoneinfo", "tzdata"]
xlsx2csv = ["xlsx2csv (>=0.8.0)"]
xlsxwriter = ["xlsxwriter"]
[[package]]
name = "pooch"
version = "1.8.2"
@ -10036,4 +10079,4 @@ cffi = ["cffi (>=1.11)"]
[metadata]
lock-version = "2.0"
python-versions = ">=3.10,<3.13"
content-hash = "4d6643be9ffc45d46c9a706380c0c6796d3ddef1d0acb74cdf7d074aeab9dbfa"
content-hash = "393f51d55da83dc829e387a5f08087a2e90a40ea63dd034586b7717143a115ca"


@ -16,7 +16,7 @@ azure-identity = "1.17.1"
azure-storage-file-datalake = "12.16.0"
anthropic = "=0.34.1"
arxiv = "2.1.3"
aspose-slides = { version = "^24.9.0", markers = "platform_machine == 'x86_64'" }
aspose-slides = { version = "^24.9.0", markers = "platform_machine == 'x86_64' or (sys_platform == 'darwin' and platform_machine == 'arm64') " }
beartype = "^0.18.5"
bio = "1.7.1"
boto3 = "1.34.140"
@ -118,7 +118,8 @@ pymysql = "^1.1.1"
mini-racer = "^0.12.4"
pyicu = "^2.13.1"
flasgger = "^0.9.7.1"
polars = "^1.9.0"
polars = { version = "^1.9.0", markers = "platform_machine == 'x86_64'" }
polars-lts-cpu = { version = "^1.9.0", markers = "platform_machine == 'arm64'" }
[tool.poetry.group.full]
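The `markers` strings above are standard PEP 508 environment markers: `polars` is installed on x86_64 and `polars-lts-cpu` on arm64, while `aspose-slides` now also covers Apple silicon. A quick way to see how such a marker evaluates on the current interpreter, using the `packaging` library:

```python
from packaging.markers import Marker

for expr in (
    "platform_machine == 'x86_64'",
    "platform_machine == 'arm64'",
    "sys_platform == 'darwin' and platform_machine == 'arm64'",
):
    print(f"{expr!r:60} -> {Marker(expr).evaluate()}")
```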


@ -99,11 +99,11 @@ class Pdf(PdfParser):
i += 1
txt = b["text"].lower().strip()
if re.match("(abstract|摘要)", txt):
if len(txt.split(" ")) > 32 or len(txt) > 64:
if len(txt.split()) > 32 or len(txt) > 64:
abstr = txt + self._line_tag(b, zoomin)
break
txt = self.boxes[i]["text"].lower().strip()
if len(txt.split(" ")) > 32 or len(txt) > 64:
if len(txt.split()) > 32 or len(txt) > 64:
abstr = txt + self._line_tag(self.boxes[i], zoomin)
i += 1
break
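Several hunks in this comparison make the same substitution: `txt.split(" ")` becomes `txt.split()`. The two are not equivalent: splitting on a literal space produces empty strings for consecutive spaces and never splits on tabs or newlines, which skews word counts like the `> 32` checks here. A quick demonstration:

```python
txt = "two  spaces\tand a tab"
print(txt.split(" "))  # ['two', '', 'spaces\tand', 'a', 'tab'] -- empty token, tab kept
print(txt.split())     # ['two', 'spaces', 'and', 'a', 'tab']  -- whitespace-normalized
```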


@ -33,7 +33,7 @@ def chunk(filename, binary, tenant_id, lang, callback=None, **kwargs):
txt = "\n".join([t[0] for _, t in bxs if t[0]])
eng = lang.lower() == "english"
callback(0.4, "Finish OCR: (%s ...)" % txt[:12])
if (eng and len(txt.split(" ")) > 32) or len(txt) > 32:
if (eng and len(txt.split()) > 32) or len(txt) > 32:
tokenize(doc, txt, eng)
callback(0.8, "OCR results is too long to use CV LLM.")
return [doc]
@ -41,7 +41,10 @@ def chunk(filename, binary, tenant_id, lang, callback=None, **kwargs):
try:
callback(0.4, "Use CV LLM to describe the picture.")
cv_mdl = LLMBundle(tenant_id, LLMType.IMAGE2TEXT, lang=lang)
ans = cv_mdl.describe(binary)
img_binary = io.BytesIO()
img.save(img_binary, format='JPEG')
img_binary.seek(0)
ans = cv_mdl.describe(img_binary.read())
callback(0.8, "CV LLM respond: %s ..." % ans[:32])
txt += "\n" + ans
tokenize(doc, txt, eng)
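The fix re-encodes the parsed image to JPEG bytes before handing it to the vision model, rather than passing the raw upload. One caveat, stated as an assumption about inputs rather than about this change: Pillow refuses to write RGBA or palette images as JPEG, so such inputs need an explicit conversion first. A hedged sketch:

```python
import io
from PIL import Image

def to_jpeg_bytes(img: Image.Image) -> bytes:
    if img.mode in ("RGBA", "P"):  # JPEG cannot store alpha or palette modes
        img = img.convert("RGB")
    buf = io.BytesIO()
    img.save(buf, format="JPEG")
    return buf.getvalue()
```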


@ -237,8 +237,8 @@ class Benchmark:
scores = sorted(scores, key=lambda kk: kk[1])
for score in scores[:10]:
f.write('- text: ' + str(texts[score[0]]) + '\t qrel: ' + str(score[1]) + '\n')
json.dump(qrels, open(os.path.join(file_path, dataset + '.qrels.json'), "w+"), indent=2)
json.dump(run, open(os.path.join(file_path, dataset + '.run.json'), "w+"), indent=2)
json.dump(qrels, open(os.path.join(file_path, dataset + '.qrels.json'), "w+", encoding='utf-8'), indent=2)
json.dump(run, open(os.path.join(file_path, dataset + '.run.json'), "w+", encoding='utf-8'), indent=2)
print(os.path.join(file_path, dataset + '_result.md'), 'Saved!')
def __call__(self, dataset, file_path, miracl_corpus=''):


@ -48,6 +48,7 @@ EmbeddingModel = {
"BaiduYiyan": BaiduYiyanEmbed,
"Voyage AI": VoyageEmbed,
"HuggingFace": HuggingFaceEmbed,
"VolcEngine":VolcEngineEmbed,
}
CvModel = {


@ -366,7 +366,7 @@ class OllamaChat(Base):
keep_alive=-1
)
ans = response["message"]["content"].strip()
return ans, response["eval_count"] + response.get("prompt_eval_count", 0)
return ans, response.get("eval_count", 0) + response.get("prompt_eval_count", 0)
except Exception as e:
return "**ERROR**: " + str(e), 0


@ -38,7 +38,7 @@ class Base(ABC):
def __init__(self, key, model_name):
pass
def encode(self, texts: list, batch_size=32):
def encode(self, texts: list, batch_size=16):
raise NotImplementedError("Please implement encode method!")
def encode_queries(self, text: str):
@ -78,7 +78,7 @@ class DefaultEmbedding(Base):
use_fp16=torch.cuda.is_available())
self._model = DefaultEmbedding._model
def encode(self, texts: list, batch_size=32):
def encode(self, texts: list, batch_size=16):
texts = [truncate(t, 2048) for t in texts]
token_count = 0
for t in texts:
@ -101,7 +101,7 @@ class OpenAIEmbed(Base):
self.client = OpenAI(api_key=key, base_url=base_url)
self.model_name = model_name
def encode(self, texts: list, batch_size=32):
def encode(self, texts: list, batch_size=16):
texts = [truncate(t, 8191) for t in texts]
res = self.client.embeddings.create(input=texts,
model=self.model_name)
@ -123,7 +123,7 @@ class LocalAIEmbed(Base):
self.client = OpenAI(api_key="empty", base_url=base_url)
self.model_name = model_name.split("___")[0]
def encode(self, texts: list, batch_size=32):
def encode(self, texts: list, batch_size=16):
res = self.client.embeddings.create(input=texts, model=self.model_name)
return (
np.array([d.embedding for d in res.data]),
@ -200,7 +200,7 @@ class ZhipuEmbed(Base):
self.client = ZhipuAI(api_key=key)
self.model_name = model_name
def encode(self, texts: list, batch_size=32):
def encode(self, texts: list, batch_size=16):
arr = []
tks_num = 0
for txt in texts:
@ -221,7 +221,7 @@ class OllamaEmbed(Base):
self.client = Client(host=kwargs["base_url"])
self.model_name = model_name
def encode(self, texts: list, batch_size=32):
def encode(self, texts: list, batch_size=16):
arr = []
tks_num = 0
for txt in texts:
@ -252,7 +252,7 @@ class FastEmbed(Base):
from fastembed import TextEmbedding
self._model = TextEmbedding(model_name, cache_dir, threads, **kwargs)
def encode(self, texts: list, batch_size=32):
def encode(self, texts: list, batch_size=16):
# Using the internal tokenizer to encode the texts and get the total
# number of tokens
encodings = self._model.model.tokenizer.encode_batch(texts)
@ -278,7 +278,7 @@ class XinferenceEmbed(Base):
self.client = OpenAI(api_key=key, base_url=base_url)
self.model_name = model_name
def encode(self, texts: list, batch_size=32):
def encode(self, texts: list, batch_size=16):
res = self.client.embeddings.create(input=texts,
model=self.model_name)
return np.array([d.embedding for d in res.data]
@ -394,7 +394,7 @@ class MistralEmbed(Base):
self.client = MistralClient(api_key=key)
self.model_name = model_name
def encode(self, texts: list, batch_size=32):
def encode(self, texts: list, batch_size=16):
texts = [truncate(t, 8196) for t in texts]
res = self.client.embeddings(input=texts,
model=self.model_name)
@ -418,7 +418,7 @@ class BedrockEmbed(Base):
self.client = boto3.client(service_name='bedrock-runtime', region_name=self.bedrock_region,
aws_access_key_id=self.bedrock_ak, aws_secret_access_key=self.bedrock_sk)
def encode(self, texts: list, batch_size=32):
def encode(self, texts: list, batch_size=16):
texts = [truncate(t, 8196) for t in texts]
embeddings = []
token_count = 0
@ -456,7 +456,7 @@ class GeminiEmbed(Base):
genai.configure(api_key=key)
self.model_name = 'models/' + model_name
def encode(self, texts: list, batch_size=32):
def encode(self, texts: list, batch_size=16):
texts = [truncate(t, 2048) for t in texts]
token_count = sum(num_tokens_from_string(text) for text in texts)
result = genai.embed_content(
@ -541,7 +541,7 @@ class CoHereEmbed(Base):
self.client = Client(api_key=key)
self.model_name = model_name
def encode(self, texts: list, batch_size=32):
def encode(self, texts: list, batch_size=16):
res = self.client.embed(
texts=texts,
model=self.model_name,
@ -599,7 +599,7 @@ class SILICONFLOWEmbed(Base):
self.base_url = base_url
self.model_name = model_name
def encode(self, texts: list, batch_size=32):
def encode(self, texts: list, batch_size=16):
payload = {
"model": self.model_name,
"input": texts,
@ -628,7 +628,7 @@ class ReplicateEmbed(Base):
self.model_name = model_name
self.client = Client(api_token=key)
def encode(self, texts: list, batch_size=32):
def encode(self, texts: list, batch_size=16):
res = self.client.run(self.model_name, input={"texts": json.dumps(texts)})
return np.array(res), sum([num_tokens_from_string(text) for text in texts])
@ -647,7 +647,7 @@ class BaiduYiyanEmbed(Base):
self.client = qianfan.Embedding(ak=ak, sk=sk)
self.model_name = model_name
def encode(self, texts: list, batch_size=32):
def encode(self, texts: list, batch_size=16):
res = self.client.do(model=self.model_name, texts=texts).body
return (
np.array([r["embedding"] for r in res["data"]]),
@ -669,7 +669,7 @@ class VoyageEmbed(Base):
self.client = voyageai.Client(api_key=key)
self.model_name = model_name
def encode(self, texts: list, batch_size=32):
def encode(self, texts: list, batch_size=16):
res = self.client.embed(
texts=texts, model=self.model_name, input_type="document"
)
@ -691,7 +691,7 @@ class HuggingFaceEmbed(Base):
self.model_name = model_name
self.base_url = base_url or "http://127.0.0.1:8080"
def encode(self, texts: list, batch_size=32):
def encode(self, texts: list, batch_size=16):
embeddings = []
for text in texts:
response = requests.post(
@ -718,3 +718,10 @@ class HuggingFaceEmbed(Base):
else:
raise Exception(f"Error: {response.status_code} - {response.text}")
class VolcEngineEmbed(OpenAIEmbed):
def __init__(self, key, model_name, base_url="https://ark.cn-beijing.volces.com/api/v3"):
if not base_url:
base_url = "https://ark.cn-beijing.volces.com/api/v3"
ark_api_key = json.loads(key).get('ark_api_key', '')
model_name = json.loads(key).get('ep_id', '') + json.loads(key).get('endpoint_id', '')
super().__init__(ark_api_key,model_name,base_url)
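As the new class shows, `VolcEngineEmbed` expects `key` to be a JSON string that carries both the Ark API key and the endpoint id (under `ep_id` or `endpoint_id`). A sketch of constructing such a key, with placeholder values:

```python
import json

key = json.dumps({"ark_api_key": "<ARK_API_KEY>", "ep_id": "<ENDPOINT_ID>"})
# VolcEngineEmbed(key, model_name="") then resolves:
#   api_key    -> <ARK_API_KEY>
#   model_name -> ep_id + endpoint_id (one of the two is empty)
```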


@ -158,6 +158,8 @@ class XInferenceRerank(Base):
def __init__(self, key="xxxxxxx", model_name="", base_url=""):
if base_url.find("/v1") == -1:
base_url = urljoin(base_url, "/v1/rerank")
if base_url.find("/rerank") == -1:
base_url = urljoin(base_url, "/v1/rerank")
self.model_name = model_name
self.base_url = base_url
self.headers = {

View File

@ -28,6 +28,8 @@ from cn2an import cn2an
from PIL import Image
import json
import chardet
all_codecs = [
'utf-8', 'gb2312', 'gbk', 'utf_16', 'ascii', 'big5', 'big5hkscs',
'cp037', 'cp273', 'cp424', 'cp437',
@ -43,12 +45,17 @@ all_codecs = [
'iso8859_14', 'iso8859_15', 'iso8859_16', 'johab', 'koi8_r', 'koi8_t', 'koi8_u',
'kz1048', 'mac_cyrillic', 'mac_greek', 'mac_iceland', 'mac_latin2', 'mac_roman',
'mac_turkish', 'ptcp154', 'shift_jis', 'shift_jis_2004', 'shift_jisx0213',
'utf_32', 'utf_32_be', 'utf_32_le''utf_16_be', 'utf_16_le', 'utf_7'
'utf_32', 'utf_32_be', 'utf_32_le', 'utf_16_be', 'utf_16_le', 'utf_7', 'windows-1250', 'windows-1251',
'windows-1252', 'windows-1253', 'windows-1254', 'windows-1255', 'windows-1256',
'windows-1257', 'windows-1258', 'latin-2'
]
def find_codec(blob):
global all_codecs
detected = chardet.detect(blob[:1024])
if detected['confidence'] > 0.5:
return detected['encoding']
for c in all_codecs:
try:
blob[:1024].decode(c)
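The new fast path asks `chardet` for a guess first and only falls back to brute-force decoding across `all_codecs` when confidence is low. A minimal standalone check of that behavior:

```python
import chardet

blob = "编码检测示例文本，用于演示。".encode("gbk")
detected = chardet.detect(blob[:1024])
print(detected)  # e.g. {'encoding': 'GB2312', 'confidence': 0.99, ...}
if detected["confidence"] > 0.5:
    print(blob.decode(detected["encoding"]))
```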
@ -318,12 +325,12 @@ def remove_contents_table(sections, eng=False):
sections.pop(i)
if i >= len(sections):
break
prefix = get(i)[:3] if not eng else " ".join(get(i).split(" ")[:2])
prefix = get(i)[:3] if not eng else " ".join(get(i).split()[:2])
while not prefix:
sections.pop(i)
if i >= len(sections):
break
prefix = get(i)[:3] if not eng else " ".join(get(i).split(" ")[:2])
prefix = get(i)[:3] if not eng else " ".join(get(i).split()[:2])
sections.pop(i)
if i >= len(sections) or not prefix:
break
@ -382,7 +389,7 @@ def title_frequency(bull, sections):
def not_title(txt):
if re.match(r"第[零一二三四五六七八九十百0-9]+条", txt):
return False
if len(txt.split(" ")) > 12 or (txt.find(" ") < 0 and len(txt) >= 32):
if len(txt.split()) > 12 or (txt.find(" ") < 0 and len(txt) >= 32):
return True
return re.search(r"[,;,。;!!]", txt)


@ -54,7 +54,7 @@ class FulltextQueryer:
def rmWWW(txt):
patts = [
(
r"是*(什么样的|哪家|一下|那家|请问|啥样|咋样了|什么时候|何时|何地|何人|是否|是不是|多少|哪里|怎么|哪儿|怎么样|如何|哪些|是啥|啥是|啊|吗|呢|吧|咋|什么|有没有|呀)是*",
r"是*(什么样的|哪家|一下|那家|请问|啥样|咋样了|什么时候|何时|何地|何人|是否|是不是|多少|哪里|怎么|哪儿|怎么样|如何|哪些|是啥|啥是|啊|吗|呢|吧|咋|什么|有没有|呀|谁|哪位|哪个)是*",
"",
),
(r"(^| )(what|who|how|which|where|why)('re|'s)? ", " "),
@ -74,7 +74,7 @@ class FulltextQueryer:
if not self.isChinese(txt):
txt = FulltextQueryer.rmWWW(txt)
tks = rag_tokenizer.tokenize(txt).split(" ")
tks = rag_tokenizer.tokenize(txt).split()
keywords = [t for t in tks if t]
tks_w = self.tw.weights(tks, preprocess=False)
tks_w = [(re.sub(r"[ \\\"'^]", "", tk), w) for tk, w in tks_w]
@ -83,7 +83,7 @@ class FulltextQueryer:
syns = []
for tk, w in tks_w:
syn = self.syn.lookup(tk)
syn = rag_tokenizer.tokenize(" ".join(syn)).split(" ")
syn = rag_tokenizer.tokenize(" ".join(syn)).split()
keywords.extend(syn)
syn = ["\"{}\"^{:.4f}".format(s, w / 4.) for s in syn]
syns.append(" ".join(syn))
@ -114,7 +114,7 @@ class FulltextQueryer:
txt = FulltextQueryer.rmWWW(txt)
qs, keywords = [], []
for tt in self.tw.split(txt)[:256]: # .split(" "):
for tt in self.tw.split(txt)[:256]: # .split():
if not tt:
continue
keywords.append(tt)
@ -125,7 +125,7 @@ class FulltextQueryer:
tms = []
for tk, w in sorted(twts, key=lambda x: x[1] * -1):
sm = (
rag_tokenizer.fine_grained_tokenize(tk).split(" ")
rag_tokenizer.fine_grained_tokenize(tk).split()
if need_fine_grained_tokenize(tk)
else []
)
@ -194,7 +194,7 @@ class FulltextQueryer:
def toDict(tks):
d = {}
if isinstance(tks, str):
tks = tks.split(" ")
tks = tks.split()
for t, c in self.tw.weights(tks, preprocess=False):
if t not in d:
d[t] = 0


@ -192,7 +192,7 @@ class RagTokenizer:
# if split chars is part of token
res = []
tks = re.sub(r"[ ]+", " ", tks).split(" ")
tks = re.sub(r"[ ]+", " ", tks).split()
s = 0
while True:
if s >= len(tks):
@ -329,7 +329,7 @@ class RagTokenizer:
return self.merge_(res)
def fine_grained_tokenize(self, tks):
tks = tks.split(" ")
tks = tks.split()
zh_num = len([1 for c in tks if c and is_chinese(c[0])])
if zh_num < len(tks) * 0.2:
res = []
@ -393,7 +393,7 @@ def is_alphabet(s):
def naiveQie(txt):
tks = []
for t in txt.split(" "):
for t in txt.split():
if tks and re.match(r".*[a-zA-Z]$", tks[-1]
) and re.match(r".*[a-zA-Z]$", t):
tks.append(" ")


@ -46,6 +46,9 @@ class Dealer:
def get_vector(self, txt, emb_mdl, topk=10, similarity=0.1):
qv, _ = emb_mdl.encode_queries(txt)
shape = np.array(qv).shape
if len(shape) > 1:
raise Exception(f"Dealer.get_vector returned array's shape {shape} doesn't match expectation(exact one dimension).")
embedding_data = [float(v) for v in qv]
vector_column_name = f"q_{len(embedding_data)}_vec"
return MatchDenseExpr(vector_column_name, embedding_data, 'float', 'cosine', topk, {"similarity": similarity})
@ -114,7 +117,7 @@ class Dealer:
for k in keywords:
kwds.add(k)
for kk in rag_tokenizer.fine_grained_tokenize(k).split(" "):
for kk in rag_tokenizer.fine_grained_tokenize(k).split():
if len(kk) < 2:
continue
if kk in kwds:
@ -186,7 +189,7 @@ class Dealer:
assert len(ans_v[0]) == len(chunk_v[0]), "The dimension of query and chunk do not match: {} vs. {}".format(
len(ans_v[0]), len(chunk_v[0]))
chunks_tks = [rag_tokenizer.tokenize(self.qryr.rmWWW(ck)).split(" ")
chunks_tks = [rag_tokenizer.tokenize(self.qryr.rmWWW(ck)).split()
for ck in chunks]
cites = {}
thr = 0.63
@ -195,7 +198,7 @@ class Dealer:
sim, tksim, vtsim = self.qryr.hybrid_similarity(ans_v[i],
chunk_v,
rag_tokenizer.tokenize(
self.qryr.rmWWW(pieces_[i])).split(" "),
self.qryr.rmWWW(pieces_[i])).split(),
chunks_tks,
tkweight, vtweight)
mx = np.max(sim) * 0.99
@ -244,8 +247,8 @@ class Dealer:
sres.field[i]["important_kwd"] = [sres.field[i]["important_kwd"]]
ins_tw = []
for i in sres.ids:
content_ltks = sres.field[i][cfield].split(" ")
title_tks = [t for t in sres.field[i].get("title_tks", "").split(" ") if t]
content_ltks = sres.field[i][cfield].split()
title_tks = [t for t in sres.field[i].get("title_tks", "").split() if t]
important_kwd = sres.field[i].get("important_kwd", [])
tks = content_ltks + title_tks + important_kwd
ins_tw.append(tks)
@ -265,8 +268,8 @@ class Dealer:
sres.field[i]["important_kwd"] = [sres.field[i]["important_kwd"]]
ins_tw = []
for i in sres.ids:
content_ltks = sres.field[i][cfield].split(" ")
title_tks = [t for t in sres.field[i].get("title_tks", "").split(" ") if t]
content_ltks = sres.field[i][cfield].split()
title_tks = [t for t in sres.field[i].get("title_tks", "").split() if t]
important_kwd = sres.field[i].get("important_kwd", [])
tks = content_ltks + title_tks + important_kwd
ins_tw.append(tks)
@ -279,8 +282,8 @@ class Dealer:
def hybrid_similarity(self, ans_embd, ins_embd, ans, inst):
return self.qryr.hybrid_similarity(ans_embd,
ins_embd,
rag_tokenizer.tokenize(ans).split(" "),
rag_tokenizer.tokenize(inst).split(" "))
rag_tokenizer.tokenize(ans).split(),
rag_tokenizer.tokenize(inst).split())
def retrieval(self, question, embd_mdl, tenant_ids, kb_ids, page, page_size, similarity_threshold=0.2,
vector_similarity_weight=0.3, top=1024, doc_ids=None, aggs=True, rerank_mdl=None, highlight=False):


@ -99,7 +99,7 @@ class Dealer:
txt = re.sub(p, r, txt)
res = []
for t in rag_tokenizer.tokenize(txt).split(" "):
for t in rag_tokenizer.tokenize(txt).split():
tk = t
if (stpwd and tk in self.stop_words) or (
re.match(r"[0-9]$", tk) and not num):
@ -150,7 +150,7 @@ class Dealer:
def split(self, txt):
tks = []
for t in re.sub(r"[ \t]+", " ", txt).split(" "):
for t in re.sub(r"[ \t]+", " ", txt).split():
if tks and re.match(r".*[a-zA-Z]$", tks[-1]) and \
re.match(r".*[a-zA-Z]$", t) and tks and \
self.ne.get(t, "") != "func" and self.ne.get(tks[-1], "") != "func":
@ -198,7 +198,7 @@ class Dealer:
s = 0
if not s and len(t) >= 4:
s = [tt for tt in rag_tokenizer.fine_grained_tokenize(t).split(" ") if len(tt) > 1]
s = [tt for tt in rag_tokenizer.fine_grained_tokenize(t).split() if len(tt) > 1]
if len(s) > 1:
s = np.min([freq(tt) for tt in s]) / 6.
else:
@ -214,7 +214,7 @@ class Dealer:
elif re.match(r"[a-z. -]+$", t):
return 300
elif len(t) >= 4:
s = [tt for tt in rag_tokenizer.fine_grained_tokenize(t).split(" ") if len(tt) > 1]
s = [tt for tt in rag_tokenizer.fine_grained_tokenize(t).split() if len(tt) > 1]
if len(s) > 1:
return max(3, np.min([df(tt) for tt in s]) / 6.)
@ -228,7 +228,7 @@ class Dealer:
idf2 = np.array([idf(df(t), 1000000000) for t in tks])
wts = (0.3 * idf1 + 0.7 * idf2) * \
np.array([ner(t) * postag(t) for t in tks])
wts = [math.pow(s, 2) for s in wts]
wts = [s for s in wts]
tw = list(zip(tks, wts))
else:
for tk in tks:
@ -237,7 +237,7 @@ class Dealer:
idf2 = np.array([idf(df(t), 1000000000) for t in tt])
wts = (0.3 * idf1 + 0.7 * idf2) * \
np.array([ner(t) * postag(t) for t in tt])
wts = [math.pow(s, 2) for s in wts]
wts = [s for s in wts]
tw.extend(zip(tt, wts))
S = np.sum([s for _, s in tw])


@ -114,6 +114,7 @@ def set_progress(task_id, from_page=0, to_page=-1, prog=None, msg="Processing...
if prog is not None:
d["progress"] = prog
try:
logging.info(f"set_progress({task_id}), progress: {prog}, progress_msg: {msg}")
TaskService.update_progress(task_id, d)
except Exception:
logging.exception(f"set_progress({task_id}) got exception")
@ -492,6 +493,7 @@ def report_status():
logging.exception("report_status got exception")
time.sleep(30)
def analyze_heap(snapshot1: tracemalloc.Snapshot, snapshot2: tracemalloc.Snapshot, snapshot_id: int, dump_full: bool):
msg = ""
if dump_full:
@ -508,6 +510,7 @@ def analyze_heap(snapshot1: tracemalloc.Snapshot, snapshot2: tracemalloc.Snapsho
msg += '\n'.join(stat.traceback.format())
logging.info(msg)
def main():
settings.init_settings()
background_thread = threading.Thread(target=report_status)


@ -85,6 +85,9 @@ class ESConnection(DocStoreConnection):
logging.exception("ESConnection.createIndex error %s" % (indexName))
def deleteIdx(self, indexName: str, knowledgebaseId: str):
if len(knowledgebaseId) > 0:
# The index needs to stay alive after any KB deletion, since all KBs under this tenant share one index.
return
try:
self.es.indices.delete(index=indexName, allow_no_indices=True)
except NotFoundError:
@ -148,9 +151,9 @@ class ESConnection(DocStoreConnection):
vector_similarity_weight = float(weights.split(",")[1])
for m in matchExprs:
if isinstance(m, MatchTextExpr):
minimum_should_match = "0%"
if "minimum_should_match" in m.extra_options:
minimum_should_match = str(int(m.extra_options["minimum_should_match"] * 100)) + "%"
minimum_should_match = m.extra_options.get("minimum_should_match", 0.0)
if isinstance(minimum_should_match, float):
minimum_should_match = str(int(minimum_should_match * 100)) + "%"
bqry.must.append(Q("query_string", fields=m.fields,
type="best_fields", query=m.matching_text,
minimum_should_match=minimum_should_match,
@ -215,11 +218,11 @@ class ESConnection(DocStoreConnection):
id=chunkId, source=True, )
if str(res.get("timed_out", "")).lower() == "true":
raise Exception("Es Timeout.")
if not res.get("found"):
return None
chunk = res["_source"]
chunk["id"] = chunkId
return chunk
except NotFoundError:
return None
except Exception as e:
logging.exception(f"ESConnection.get({chunkId}) got exception")
if str(e).find("Timeout") > 0:
@ -294,7 +297,7 @@ class ESConnection(DocStoreConnection):
f"Condition `{str(k)}={str(v)}` value type is {str(type(v))}, expected to be int, str or list.")
scripts = []
for k, v in newValue.items():
if not isinstance(k, str) or not v:
if (not isinstance(k, str) or not v) and k != "available_int":
continue
if isinstance(v, str):
scripts.append(f"ctx._source.{k} = '{v}'")
@ -400,7 +403,7 @@ class ESConnection(DocStoreConnection):
if not hlts:
continue
txt = "...".join([a for a in list(hlts.items())[0][1]])
if not is_english(txt.split(" ")):
if not is_english(txt.split()):
ans[d["_id"]] = txt
continue


@ -231,15 +231,10 @@ class InfinityConnection(DocStoreConnection):
if len(filter_cond) != 0:
filter_fulltext = f"({filter_cond}) AND {filter_fulltext}"
logging.debug(f"filter_fulltext: {filter_fulltext}")
minimum_should_match = "0%"
if "minimum_should_match" in matchExpr.extra_options:
minimum_should_match = (
str(int(matchExpr.extra_options["minimum_should_match"] * 100))
+ "%"
)
matchExpr.extra_options.update(
{"minimum_should_match": minimum_should_match}
)
minimum_should_match = matchExpr.extra_options.get("minimum_should_match", 0.0)
if isinstance(minimum_should_match, float):
str_minimum_should_match = str(int(minimum_should_match * 100)) + "%"
matchExpr.extra_options["minimum_should_match"] = str_minimum_should_match
for k, v in matchExpr.extra_options.items():
if not isinstance(v, str):
matchExpr.extra_options[k] = str(v)
@ -315,7 +310,9 @@ class InfinityConnection(DocStoreConnection):
table_name = f"{indexName}_{knowledgebaseId}"
table_instance = db_instance.get_table(table_name)
kb_res = table_instance.output(["*"]).filter(f"id = '{chunkId}'").to_pl()
df_list.append(kb_res)
if len(kb_res) != 0 and kb_res.shape[0] > 0:
df_list.append(kb_res)
self.connPool.release_conn(inf_conn)
res = concat_dataframes(df_list, ["id"])
res_fields = self.getFields(res, res.columns)
@ -424,7 +421,7 @@ class InfinityConnection(DocStoreConnection):
v = list(v)
elif fieldnm == "important_kwd":
assert isinstance(v, str)
v = v.split(" ")
v = v.split()
else:
if not isinstance(v, str):
v = str(v)


@ -1,6 +1,7 @@
import logging
import time
from minio import Minio
from minio.error import S3Error
from io import BytesIO
from rag import settings
from rag.utils import singleton
@ -84,8 +85,11 @@ class RAGFlowMinio(object):
return True
else:
return False
except S3Error as e:
if e.code in ["NoSuchKey", "NoSuchBucket", "ResourceNotFound"]:
return False
except Exception:
logging.exception(f"Not found: {bucket}/{filename}")
logging.exception(f"obj_exist {bucket}/{filename} got exception")
return False
def get_presigned_url(self, bucket, fnm, expires):


@ -7,10 +7,13 @@ import requests
HOST_ADDRESS = os.getenv('HOST_ADDRESS', 'http://127.0.0.1:9380')
def generate_random_email():
return 'user_' + ''.join(random.choices(string.ascii_lowercase + string.digits, k=8))+'@1.com'
# def generate_random_email():
# return 'user_' + ''.join(random.choices(string.ascii_lowercase + string.digits, k=8))+'@1.com'
EMAIL = generate_random_email()
def generate_email():
return 'user_123@1.com'
EMAIL = generate_email()
# password is "123"
PASSWORD='''ctAseGvejiaSWWZ88T/m4FQVOpQyUvP+x7sXtdv3feqZACiQleuewkUi35E16wSd5C5QcnkkcV9cYc8TKPTRZlxappDuirxghxoOvFcJxFU4ixLsD
fN33jCHRoDUW81IH9zjij/vaw8IbVyb6vuwg6MX6inOEBRRzVbRYxXOu1wkWY6SsI8X70oF9aeLFp/PzQpjoe/YbSqpTq8qqrmHzn9vO+yvyYyvmDsphXe
@ -37,7 +40,10 @@ def login():
@pytest.fixture(scope="session")
def get_api_key_fixture():
register()
try:
register()
except Exception as e:
print(e)
auth = login()
url = HOST_ADDRESS + "/v1/system/new_token"
auth = {"Authorization": auth}
@ -49,7 +55,10 @@ def get_api_key_fixture():
@pytest.fixture(scope="session")
def get_auth():
register()
try:
register()
except Exception as e:
print(e)
auth = login()
return auth


@ -1,2 +1,34 @@
import os
HOST_ADDRESS=os.getenv('HOST_ADDRESS', 'http://127.0.0.1:9380')
import requests
HOST_ADDRESS = os.getenv('HOST_ADDRESS', 'http://127.0.0.1:9380')
DATASET_NAME_LIMIT = 128
def create_dataset(auth, dataset_name):
authorization = {"Authorization": auth}
url = f"{HOST_ADDRESS}/v1/kb/create"
json = {"name": dataset_name}
res = requests.post(url=url, headers=authorization, json=json)
return res.json()
def list_dataset(auth, page_number):
authorization = {"Authorization": auth}
url = f"{HOST_ADDRESS}/v1/kb/list?page={page_number}"
res = requests.get(url=url, headers=authorization)
return res.json()
def rm_dataset(auth, dataset_id):
authorization = {"Authorization": auth}
url = f"{HOST_ADDRESS}/v1/kb/rm"
json = {"kb_id": dataset_id}
res = requests.post(url=url, headers=authorization, json=json)
return res.json()
def update_dataset(auth, json_req):
authorization = {"Authorization": auth}
url = f"{HOST_ADDRESS}/v1/kb/update"
res = requests.post(url=url, headers=authorization, json=json_req)
return res.json()
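
Hypothetical usage of these helpers (names and values are illustrative; auth comes from the login flow in conftest.py, and a partial update payload is assumed to be accepted):

auth = login()
res = create_dataset(auth, "demo_dataset")
assert res["code"] == 0, res.get("message")

kbs = list_dataset(auth, 1)["data"]["kbs"]  # first page of knowledge bases
res = update_dataset(auth, {"kb_id": kbs[0]["id"], "name": "demo_renamed"})
assert res["code"] == 0
assert rm_dataset(auth, kbs[0]["id"])["code"] == 0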

View File

@@ -1,10 +1,137 @@
from common import HOST_ADDRESS
import requests
def test_create_dataset(get_auth):
authorization={"Authorization": get_auth}
url = f"{HOST_ADDRESS}/v1/kb/create"
json = {"name":"test_create_dataset"}
res = requests.post(url=url,headers=authorization,json=json)
res = res.json()
assert res.get("code") == 0,f"{res.get('message')}"
from common import HOST_ADDRESS, create_dataset, list_dataset, rm_dataset, update_dataset, DATASET_NAME_LIMIT
import re
import pytest
import random
import string
def test_dataset(get_auth):
# create dataset
res = create_dataset(get_auth, "test_create_dataset")
assert res.get("code") == 0, f"{res.get('message')}"
# list dataset
page_number = 1
dataset_list = []
while True:
res = list_dataset(get_auth, page_number)
data = res.get("data").get("kbs")
for item in data:
dataset_id = item.get("id")
dataset_list.append(dataset_id)
if len(dataset_list) < page_number * 150:
break
page_number += 1
print(f"found {len(dataset_list)} datasets")
# delete dataset
for dataset_id in dataset_list:
res = rm_dataset(get_auth, dataset_id)
assert res.get("code") == 0, f"{res.get('message')}"
print(f"{len(dataset_list)} datasets are deleted")
def test_dataset_1k_dataset(get_auth):
# create dataset
authorization = {"Authorization": get_auth}
url = f"{HOST_ADDRESS}/v1/kb/create"
for i in range(1000):
res = create_dataset(get_auth, f"test_create_dataset_{i}")
assert res.get("code") == 0, f"{res.get('message')}"
# list dataset
page_number = 1
dataset_list = []
while True:
res = list_dataset(get_auth, page_number)
data = res.get("data").get("kbs")
for item in data:
dataset_id = item.get("id")
dataset_list.append(dataset_id)
if len(dataset_list) < page_number * 150:
break
page_number += 1
print(f"found {len(dataset_list)} datasets")
# delete dataset
for dataset_id in dataset_list:
res = rm_dataset(get_auth, dataset_id)
assert res.get("code") == 0, f"{res.get('message')}"
print(f"{len(dataset_list)} datasets are deleted")
def test_duplicated_name_dataset(get_auth):
# create dataset
for i in range(20):
res = create_dataset(get_auth, "test_create_dataset")
assert res.get("code") == 0, f"{res.get('message')}"
# list dataset
res = list_dataset(get_auth, 1)
data = res.get("data").get("kbs")
dataset_list = []
pattern = r'^test_create_dataset.*'
for item in data:
dataset_name = item.get("name")
dataset_id = item.get("id")
dataset_list.append(dataset_id)
match = re.match(pattern, dataset_name)
assert match is not None
for dataset_id in dataset_list:
res = rm_dataset(get_auth, dataset_id)
assert res.get("code") == 0, f"{res.get('message')}"
print(f"{len(dataset_list)} datasets are deleted")
def test_invalid_name_dataset(get_auth):
# create dataset
# with pytest.raises(Exception) as e:
res = create_dataset(get_auth, 0)
assert res['code'] == 102
res = create_dataset(get_auth, "")
assert res['code'] == 102
long_string = ""
while len(long_string) <= DATASET_NAME_LIMIT:
long_string += random.choice(string.ascii_letters + string.digits)
res = create_dataset(get_auth, long_string)
assert res['code'] == 102
print(res)
def test_update_different_params_dataset(get_auth):
# create dataset
res = create_dataset(get_auth, "test_create_dataset")
assert res.get("code") == 0, f"{res.get('message')}"
# list dataset
page_number = 1
dataset_list = []
while True:
res = list_dataset(get_auth, page_number)
data = res.get("data").get("kbs")
for item in data:
dataset_id = item.get("id")
dataset_list.append(dataset_id)
if len(dataset_list) < page_number * 150:
break
page_number += 1
print(f"found {len(dataset_list)} datasets")
dataset_id = dataset_list[0]
json_req = {"kb_id": dataset_id, "name": "test_update_dataset", "description": "test", "permission": "me", "parser_id": "presentation"}
res = update_dataset(get_auth, json_req)
assert res.get("code") == 0, f"{res.get('message')}"
# delete dataset
for dataset_id in dataset_list:
res = rm_dataset(get_auth, dataset_id)
assert res.get("code") == 0, f"{res.get('message')}"
print(f"{len(dataset_list)} datasets are deleted")
# update dataset with different parameters
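
The list-collect-delete loop repeats in three tests above, and its break condition assumes /v1/kb/list returns at most 150 entries per page. A consolidating sketch built on the common.py helpers:

def collect_all_dataset_ids(auth, page_size=150):
    # page_size mirrors the implicit server page size the break condition relies on
    ids, page_number = [], 1
    while True:
        kbs = list_dataset(auth, page_number).get("data").get("kbs")
        ids.extend(item.get("id") for item in kbs)
        if len(ids) < page_number * page_size:  # a short page is the last page
            return ids
        page_number += 1

def purge_datasets(auth):
    ids = collect_all_dataset_ids(auth)
    for dataset_id in ids:
        res = rm_dataset(auth, dataset_id)
        assert res.get("code") == 0, f"{res.get('message')}"
    print(f"{len(ids)} datasets are deleted")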

View File

@@ -190,4 +190,7 @@ def test_retrieve_chunks(get_api_key_fixture):
docs = ds.upload_documents(documents)
doc = docs[0]
doc.add_chunk(content="This is a chunk addition test")
rag.retrieve(dataset_ids=[ds.id],document_ids=[doc.id])
rag.retrieve(dataset_ids=[ds.id],document_ids=[doc.id])
rag.delete_datasets(ids=[ds.id])
# test different parameters for the retrieval

View File

@@ -15,7 +15,7 @@ get_distro_info() {
echo "$distro_id $distro_version (Kernel version: $kernel_version)"
}
# get Git repo name
# get Git repository name
git_repo_name=''
if git rev-parse --is-inside-work-tree > /dev/null 2>&1; then
git_repo_name=$(basename "$(git rev-parse --show-toplevel)")
@@ -48,8 +48,8 @@ else
python_version="Python not installed"
fi
# Print all infomation
echo "Current Repo: $git_repo_name"
# Print all information
echo "Current Repository: $git_repo_name"
# get Commit ID
git_version=$(git log -1 --pretty=format:'%h')

View File

@@ -34,7 +34,7 @@ export default defineConfig({
proxy: [
{
context: ['/api', '/v1'],
target: 'http://127.0.0.1:9456/',
target: 'http://127.0.0.1:9380/',
changeOrigin: true,
ws: true,
logger: console,

web/package-lock.json (generated)
View File

@@ -27,8 +27,10 @@
"@radix-ui/react-switch": "^1.1.1",
"@radix-ui/react-tabs": "^1.1.1",
"@radix-ui/react-toast": "^1.2.2",
"@tailwindcss/line-clamp": "^0.4.4",
"@tanstack/react-query": "^5.40.0",
"@tanstack/react-query-devtools": "^5.51.5",
"@tanstack/react-table": "^8.20.5",
"@uiw/react-markdown-preview": "^5.1.3",
"ahooks": "^3.7.10",
"antd": "^5.12.7",
@@ -56,6 +58,7 @@
"react-force-graph": "^1.44.4",
"react-hook-form": "^7.53.1",
"react-i18next": "^14.0.0",
"react-infinite-scroll-component": "^6.1.0",
"react-markdown": "^9.0.1",
"react-pdf-highlighter": "^6.1.0",
"react-string-replace": "^1.1.1",
@@ -5532,6 +5535,14 @@
"node": ">=6"
}
},
"node_modules/@tailwindcss/line-clamp": {
"version": "0.4.4",
"resolved": "https://registry.npmmirror.com/@tailwindcss/line-clamp/-/line-clamp-0.4.4.tgz",
"integrity": "sha512-5U6SY5z8N42VtrCrKlsTAA35gy2VSyYtHWCsg1H87NU1SXnEfekTVlrga9fzUDrrHcGi2Lb5KenUWb4lRQT5/g==",
"peerDependencies": {
"tailwindcss": ">=2.0.0 || >=3.0.0 || >=3.0.0-alpha.1"
}
},
"node_modules/@tanstack/match-sorter-utils": {
"version": "8.11.3",
"resolved": "https://registry.npmmirror.com/@tanstack/match-sorter-utils/-/match-sorter-utils-8.11.3.tgz",
@@ -5599,6 +5610,37 @@
"url": "https://github.com/sponsors/tannerlinsley"
}
},
"node_modules/@tanstack/react-table": {
"version": "8.20.5",
"resolved": "https://registry.npmmirror.com/@tanstack/react-table/-/react-table-8.20.5.tgz",
"integrity": "sha512-WEHopKw3znbUZ61s9i0+i9g8drmDo6asTWbrQh8Us63DAk/M0FkmIqERew6P71HI75ksZ2Pxyuf4vvKh9rAkiA==",
"dependencies": {
"@tanstack/table-core": "8.20.5"
},
"engines": {
"node": ">=12"
},
"funding": {
"type": "github",
"url": "https://github.com/sponsors/tannerlinsley"
},
"peerDependencies": {
"react": ">=16.8",
"react-dom": ">=16.8"
}
},
"node_modules/@tanstack/table-core": {
"version": "8.20.5",
"resolved": "https://registry.npmmirror.com/@tanstack/table-core/-/table-core-8.20.5.tgz",
"integrity": "sha512-P9dF7XbibHph2PFRz8gfBKEXEY/HJPOhym8CHmjF8y3q5mWpKx9xtZapXQUWCgkqvsK0R46Azuz+VaxD4Xl+Tg==",
"engines": {
"node": ">=12"
},
"funding": {
"type": "github",
"url": "https://github.com/sponsors/tannerlinsley"
}
},
"node_modules/@testing-library/dom": {
"version": "10.1.0",
"resolved": "https://registry.npmmirror.com/@testing-library/dom/-/dom-10.1.0.tgz",
@@ -24696,6 +24738,25 @@
}
}
},
"node_modules/react-infinite-scroll-component": {
"version": "6.1.0",
"resolved": "https://registry.npmmirror.com/react-infinite-scroll-component/-/react-infinite-scroll-component-6.1.0.tgz",
"integrity": "sha512-SQu5nCqy8DxQWpnUVLx7V7b7LcA37aM7tvoWjTLZp1dk6EJibM5/4EJKzOnl07/BsM1Y40sKLuqjCwwH/xV0TQ==",
"dependencies": {
"throttle-debounce": "^2.1.0"
},
"peerDependencies": {
"react": ">=16.0.0"
}
},
"node_modules/react-infinite-scroll-component/node_modules/throttle-debounce": {
"version": "2.3.0",
"resolved": "https://registry.npmmirror.com/throttle-debounce/-/throttle-debounce-2.3.0.tgz",
"integrity": "sha512-H7oLPV0P7+jgvrk+6mwwwBDmxTaxnu9HMXmloNLXwnNO0ZxZ31Orah2n8lU1eMPvsaowP2CX+USCgyovXfdOFQ==",
"engines": {
"node": ">=8"
}
},
"node_modules/react-is": {
"version": "18.2.0",
"resolved": "https://registry.npmmirror.com/react-is/-/react-is-18.2.0.tgz",

View File

@@ -38,8 +38,10 @@
"@radix-ui/react-switch": "^1.1.1",
"@radix-ui/react-tabs": "^1.1.1",
"@radix-ui/react-toast": "^1.2.2",
"@tailwindcss/line-clamp": "^0.4.4",
"@tanstack/react-query": "^5.40.0",
"@tanstack/react-query-devtools": "^5.51.5",
"@tanstack/react-table": "^8.20.5",
"@uiw/react-markdown-preview": "^5.1.3",
"ahooks": "^3.7.10",
"antd": "^5.12.7",
@@ -67,6 +69,7 @@
"react-force-graph": "^1.44.4",
"react-hook-form": "^7.53.1",
"react-i18next": "^14.0.0",
"react-infinite-scroll-component": "^6.1.0",
"react-markdown": "^9.0.1",
"react-pdf-highlighter": "^6.1.0",
"react-string-replace": "^1.1.1",

View File

@@ -0,0 +1,25 @@
import { Filter, Search } from 'lucide-react';
import { PropsWithChildren } from 'react';
import { Button } from './ui/button';
interface IProps {
title: string;
}
export default function ListFilterBar({
title,
children,
}: PropsWithChildren<IProps>) {
return (
<div className="flex justify-between mb-6">
<span className="text-3xl font-bold ">{title}</span>
<div className="flex gap-4 items-center">
<Filter className="size-5" />
<Search className="size-5" />
<Button variant={'tertiary'} size={'sm'}>
{children}
</Button>
</div>
</div>
);
}

View File

@@ -13,7 +13,7 @@ const buttonVariants = cva(
destructive:
'bg-destructive text-destructive-foreground hover:bg-destructive/90',
outline:
'border border-input bg-background hover:bg-accent hover:text-accent-foreground',
'border border-colors-outline-sentiment-primary bg-background hover:bg-accent hover:text-accent-foreground',
secondary:
'bg-secondary text-secondary-foreground hover:bg-secondary/80',
ghost: 'hover:bg-accent hover:text-accent-foreground',

View File

@@ -4,6 +4,8 @@ export enum KnowledgeRouteKey {
Configuration = 'configuration',
}
export const DatasetBaseKey = 'dataset';
export enum RunningStatus {
UNSTART = '0', // need to run
RUNNING = '1', // need to cancel

View File

@@ -3,14 +3,17 @@ import { IKnowledge, ITestingResult } from '@/interfaces/database/knowledge';
import i18n from '@/locales/config';
import kbService from '@/services/knowledge-service';
import {
useInfiniteQuery,
useIsMutating,
useMutation,
useMutationState,
useQuery,
useQueryClient,
} from '@tanstack/react-query';
import { useDebounce } from 'ahooks';
import { message } from 'antd';
import { useSearchParams } from 'umi';
import { useHandleSearchChange } from './logic-hooks';
import { useSetPaginationParams } from './route-hook';
export const useKnowledgeBaseId = (): string => {
@@ -50,7 +53,7 @@ export const useNextFetchKnowledgeList = (
gcTime: 0, // https://tanstack.com/query/latest/docs/framework/react/guides/caching?from=reactQueryV3
queryFn: async () => {
const { data } = await kbService.getList();
const list = data?.data ?? [];
const list = data?.data?.kbs ?? [];
return shouldFilterListWithoutDocument
? list.filter((x: IKnowledge) => x.chunk_num > 0)
: list;
@@ -60,6 +63,52 @@
return { list: data, loading };
};
export const useInfiniteFetchKnowledgeList = () => {
const { searchString, handleInputChange } = useHandleSearchChange();
const debouncedSearchString = useDebounce(searchString, { wait: 500 });
const PageSize = 30;
const {
data,
error,
fetchNextPage,
hasNextPage,
isFetching,
isFetchingNextPage,
status,
} = useInfiniteQuery({
queryKey: ['infiniteFetchKnowledgeList', debouncedSearchString],
queryFn: async ({ pageParam }) => {
const { data } = await kbService.getList({
page: pageParam,
page_size: PageSize,
keywords: debouncedSearchString,
});
const list = data?.data ?? [];
return list;
},
initialPageParam: 1,
getNextPageParam: (lastPage, pages, lastPageParam) => {
if (lastPageParam * PageSize <= lastPage.total) {
return lastPageParam + 1;
}
return undefined;
},
});
return {
data,
loading: isFetching,
error,
fetchNextPage,
hasNextPage,
isFetching,
isFetchingNextPage,
status,
handleInputChange,
searchString,
};
};
export const useCreateKnowledge = () => {
const queryClient = useQueryClient();
const {
@@ -95,7 +144,9 @@ export const useDeleteKnowledge = () => {
const { data } = await kbService.rmKb({ kb_id: id });
if (data.code === 0) {
message.success(i18n.t(`message.deleted`));
queryClient.invalidateQueries({ queryKey: ['fetchKnowledgeList'] });
queryClient.invalidateQueries({
queryKey: ['infiniteFetchKnowledgeList'],
});
}
return data?.data ?? [];
},

View File

@@ -11,7 +11,7 @@ export interface IDocumentInfo {
name: string;
parser_config: IParserConfig;
parser_id: string;
process_begin_at: null;
process_begin_at?: string;
process_duation: number;
progress: number;
progress_msg: string;
@@ -27,11 +27,11 @@ export interface IDocumentInfo {
}
export interface IParserConfig {
delimiter: string;
html4excel: boolean;
layout_recognize: boolean;
delimiter?: string;
html4excel?: boolean;
layout_recognize?: boolean;
pages: any[];
raptor: Raptor;
raptor?: Raptor;
}
interface Raptor {

View File

@@ -0,0 +1,113 @@
import { Avatar, AvatarFallback, AvatarImage } from '@/components/ui/avatar';
import { Button } from '@/components/ui/button';
import { Container } from '@/components/ui/container';
import { Segmented, SegmentedValue } from '@/components/ui/segmented ';
import { useTranslate } from '@/hooks/common-hooks';
import { useNavigateWithFromState } from '@/hooks/route-hook';
import {
ChevronDown,
Cpu,
Github,
Library,
MessageSquareText,
Search,
Star,
Zap,
} from 'lucide-react';
import { useCallback, useMemo, useState } from 'react';
import { useLocation } from 'umi';
export function Header() {
const { t } = useTranslate('header');
const { pathname } = useLocation();
const navigate = useNavigateWithFromState();
const [currentPath, setCurrentPath] = useState('/home');
const tagsData = useMemo(
() => [
{ path: '/home', name: t('knowledgeBase'), icon: Library },
{ path: '/chat', name: t('chat'), icon: MessageSquareText },
{ path: '/search', name: t('search'), icon: Search },
{ path: '/flow', name: t('flow'), icon: Cpu },
// { path: '/file', name: t('fileManager'), icon: FileIcon },
],
[t],
);
const options = useMemo(() => {
return tagsData.map((tag) => {
const HeaderIcon = tag.icon;
return {
label: (
<div className="flex items-center gap-1">
<HeaderIcon className="size-5"></HeaderIcon>
<span>{tag.name}</span>
</div>
),
value: tag.path,
};
});
}, [tagsData]);
// const currentPath = useMemo(() => {
// return tagsData.find((x) => pathname.startsWith(x.path))?.name || 'home';
// }, [pathname, tagsData]);
const handleChange = (path: SegmentedValue) => {
// navigate(path as string);
setCurrentPath(path as string);
};
const handleLogoClick = useCallback(() => {
navigate('/');
}, [navigate]);
return (
<section className="py-6 px-10 flex justify-between items-center border-b">
<div className="flex items-center gap-4">
<img
src={'/logo.svg'}
alt="logo"
className="w-[100] h-[100] mr-[12]"
onClick={handleLogoClick}
/>
<Button variant="secondary">
<Github />
21.5k stars
<Star />
</Button>
</div>
<div>
<Segmented
options={options}
value={currentPath}
onChange={handleChange}
className="bg-colors-background-inverse-standard text-backgroundInverseStandard-foreground"
></Segmented>
</div>
<div className="flex items-center gap-4">
<Container>
V 0.13.0
<Button variant="secondary" className="size-8">
<ChevronDown />
</Button>
</Container>
<Container className="px-3 py-2">
<Avatar className="w-[30px] h-[30px]">
<AvatarImage src="https://github.com/shadcn.png" />
<AvatarFallback>CN</AvatarFallback>
</Avatar>
yifanwu92@gmail.com
<Button
variant="destructive"
className="py-[2px] px-[8px] h-[23px] rounded-[4px]"
>
<Zap />
Pro
</Button>
</Container>
</div>
</section>
);
}

web/src/layouts/next.tsx (new file)
View File

@@ -0,0 +1,11 @@
import { Outlet } from 'umi';
import { Header } from './next-header';
export default function NextLayout() {
return (
<section>
<Header></Header>
<Outlet />
</section>
);
}

View File

@@ -75,6 +75,7 @@ export default {
namePlaceholder: 'Please input name!',
doc: 'Docs',
searchKnowledgePlaceholder: 'Search',
noMoreData: 'No more data',
},
knowledgeDetails: {
dataset: 'Dataset',
@@ -127,17 +128,17 @@ export default {
runningStatus4: 'FAIL',
pageRanges: 'Page Ranges',
pageRangesTip:
'page ranges: Define the page ranges that need to be parsed. The pages that not included in these ranges will be ignored.',
'Range of pages to be parsed; pages outside this range will not be processed.',
fromPlaceholder: 'from',
fromMessage: 'Missing start page number',
toPlaceholder: 'to',
toMessage: 'Missing end page number(excluded)',
toMessage: 'Missing end page number (excluded)',
layoutRecognize: 'Layout recognition',
layoutRecognizeTip:
'Use visual models for layout analysis to better identify document structure, find where the titles, text blocks, images, and tables are. Without this feature, only the plain text of the PDF can be obtained.',
'Use visual models for layout analysis to better understand the structure of the document and effectively locate document titles, text blocks, images, and tables. If disabled, only the plain text from the PDF will be retrieved.',
taskPageSize: 'Task page size',
taskPageSizeMessage: 'Please input your task page size!',
taskPageSizeTip: `If using layout recognize, the PDF file will be split into groups of successive. Layout analysis will be performed parallelly between groups to increase the processing speed. The 'Task page size' determines the size of groups. The larger the page size is, the lower the chance of splitting continuous text between pages into different chunks.`,
taskPageSizeTip: `During layout recognition, a PDF file is split into chunks and processed in parallel to increase processing speed. This parameter sets the size of each chunk. A larger chunk size reduces the likelihood of splitting continuous text between pages.`,
addPage: 'Add page',
greaterThan: 'The current value must be greater than to!',
greaterThanPrevious:
@@ -157,7 +158,7 @@ export default {
topKTip: `K chunks will be fed into rerank models.`,
delimiter: `Delimiter`,
html4excel: 'Excel to HTML',
html4excelTip: `Excel will be parsed into HTML table or not. If it's FALSE, every row in Excel will be formed as a chunk.`,
html4excelTip: `When enabled, the spreadsheet will be parsed into HTML tables; otherwise, it will be parsed into key-value pairs by row.`,
autoKeywords: 'Auto-keyword',
autoKeywordsTip: `Extract N keywords for each chunk to increase their ranking for queries containing those keywords. You can check or update the added keywords for a chunk from the chunk list. Be aware that extra tokens will be consumed by the LLM specified in 'System model settings'.`,
autoQuestions: 'Auto-question',
@@ -693,7 +694,7 @@ The above is the content you need to summarize.`,
"A component that sends out a static message. If multiple messages are supplied, it randomly selects one to send. Ensure its downstream is 'Answer', the interface component.",
keywordDescription: `A component that retrieves top N search results from user's input. Ensure the TopN value is set properly before use.`,
switchDescription: `A component that evaluates conditions based on the output of previous components and directs the flow of execution accordingly. It allows for complex branching logic by defining cases and specifying actions for each case or default action if no conditions are met.`,
wikipediaDescription: `This component is used to get search result from wikipedia.org. Typically, it performs as a supplement to knowledgebases. Top N specifies the number of search results you need to adapt.`,
wikipediaDescription: `A component that searches from wikipedia.org, using TopN to specify the number of search results. It supplements the existing knowledge bases.`,
promptText: `Please summarize the following paragraphs. Be careful with the numbers, do not make things up. Paragraphs as following:
{input}
The above is the content you need to summarize.`,
@@ -717,10 +718,10 @@ The above is the content you need to summarize.`,
keywordExtract: 'Keyword',
keywordExtractDescription: `A component that extracts keywords from a user query, with Top N specifying the number of keywords to extract.`,
baidu: 'Baidu',
baiduDescription: `This component is used to get search result from www.baidu.com. Typically, it performs as a supplement to knowledgebases. Top N specifies the number of search results you need to adapt.`,
baiduDescription: `A component that searches from baidu.com, using TopN to specify the number of search results. It supplements the existing knowledge bases.`,
duckDuckGo: 'DuckDuckGo',
duckDuckGoDescription:
'A component that retrieves search results from duckduckgo.com, with TopN specifying the number of search results. It supplements existing knowledge bases.',
'A component that searches from duckduckgo.com, allowing you to specify the number of search results using TopN. It supplements the existing knowledge bases.',
channel: 'Channel',
channelTip: `Perform text search or news search on the component's input`,
text: 'Text',
@@ -731,23 +732,23 @@ The above is the content you need to summarize.`,
wikipedia: 'Wikipedia',
pubMed: 'PubMed',
pubMedDescription:
'This component is used to get search result from https://pubmed.ncbi.nlm.nih.gov/. Typically, it performs as a supplement to knowledgebases. Top N specifies the number of search results you need to adapt. E-mail is a required field.',
'A component that searches from https://pubmed.ncbi.nlm.nih.gov/, allowing you to specify the number of search results using TopN. It supplements the existing knowledge bases.',
email: 'Email',
emailTip:
'This component is used to get search result from https://pubmed.ncbi.nlm.nih.gov/. Typically, it performs as a supplement to knowledgebases. Top N specifies the number of search results you need to adapt. E-mail is a required field.',
'E-mail is a required field. You must input an E-mail address here.',
arXiv: 'ArXiv',
arXivDescription:
'This component is used to get search result from https://arxiv.org/. Typically, it performs as a supplement to knowledgebases. Top N specifies the number of search results you need to adapt.',
'A component that searches from https://arxiv.org/, allowing you to specify the number of search results using TopN. It supplements the existing knowledge bases.',
sortBy: 'Sort by',
submittedDate: 'Submitted date',
lastUpdatedDate: 'Last updated date',
relevance: 'Relevance',
google: 'Google',
googleDescription:
'This component is used to get search result fromhttps://www.google.com/ . Typically, it performs as a supplement to knowledgebases. Top N and SerpApi API key specifies the number of search results you need to adapt.',
'A component that searches from https://www.google.com/, allowing you to specify the number of search results using TopN. It supplements the existing knowledge bases. Please note that this requires an API key from serpapi.com.',
bing: 'Bing',
bingDescription:
'This component is used to get search result from https://www.bing.com/. Typically, it performs as a supplement to knowledgebases. Top N and Bing Subscription-Key specifies the number of search results you need to adapt.',
'A component that searches from https://www.bing.com/, allowing you to specify the number of search results using TopN. It supplements the existing knowledge bases. Please note that this requires an API key from microsoft.com.',
apiKey: 'API KEY',
country: 'Country&Region',
language: 'Language',

View File

@@ -75,6 +75,7 @@ export default {
namePlaceholder: '請輸入名稱',
doc: '文件',
searchKnowledgePlaceholder: '搜索',
noMoreData: '沒有更多的數據了',
},
knowledgeDetails: {
dataset: '數據集',
@@ -295,7 +296,7 @@ export default {
search: '搜尋',
all: '所有',
enabled: '啟用',
disabled: '禁用',
disabled: '禁用',
keyword: '關鍵詞',
function: '函數',
chunkMessage: '請輸入值!',

View File

@@ -75,6 +75,7 @@ export default {
namePlaceholder: '请输入名称',
doc: '文档',
searchKnowledgePlaceholder: '搜索',
noMoreData: '没有更多的数据了',
},
knowledgeDetails: {
dataset: '数据集',
@@ -312,7 +313,7 @@ export default {
search: '搜索',
all: '所有',
enabled: '启用',
disabled: '禁用',
disabled: '禁用',
keyword: '关键词',
function: '函数',
chunkMessage: '请输入值!',

View File

@@ -2,7 +2,7 @@ import EditTag from '@/components/edit-tag';
import { useFetchChunk } from '@/hooks/chunk-hooks';
import { IModalProps } from '@/interfaces/common';
import { DeleteOutlined } from '@ant-design/icons';
import { Checkbox, Divider, Form, Input, Modal, Space } from 'antd';
import { Divider, Form, Input, Modal, Space, Switch } from 'antd';
import React, { useEffect, useState } from 'react';
import { useTranslation } from 'react-i18next';
import { useDeleteChunkByIds } from '../../hooks';
@@ -31,9 +31,14 @@ const ChunkCreatingModal: React.FC<IModalProps<any> & kFProps> = ({
useEffect(() => {
if (data?.code === 0) {
const { content_with_weight, important_kwd = [] } = data.data;
const {
content_with_weight,
important_kwd = [],
available_int,
} = data.data;
form.setFieldsValue({ content: content_with_weight });
setKeywords(important_kwd);
setChecked(available_int === 1);
}
if (!chunkId) {
@@ -48,6 +53,7 @@ const ChunkCreatingModal: React.FC<IModalProps<any> & kFProps> = ({
onOk?.({
content: values.content,
keywords, // keywords
available_int: checked ? 1 : 0, // available_int
});
} catch (errorInfo) {
console.log('Failed:', errorInfo);
@@ -82,16 +88,19 @@ const ChunkCreatingModal: React.FC<IModalProps<any> & kFProps> = ({
</Form.Item>
</Form>
<section>
<p>{t('chunk.keyword')} *</p>
<p className="mb-2">{t('chunk.keyword')} *</p>
<EditTag tags={keywords} setTags={setKeywords} />
</section>
{chunkId && (
<section>
<Divider></Divider>
<Space size={'large'}>
<Checkbox onChange={handleCheck} checked={checked}>
{t('chunk.enabled')}
</Checkbox>
<Switch
checkedChildren={t('chunk.enabled')}
unCheckedChildren={t('chunk.disabled')}
onChange={handleCheck}
checked={checked}
/>
<span onClick={handleRemove}>
<DeleteOutlined /> {t('common.delete')}

View File

@@ -95,12 +95,21 @@ export const useUpdateChunk = () => {
const { documentId } = useGetKnowledgeSearchParams();
const onChunkUpdatingOk = useCallback(
async ({ content, keywords }: { content: string; keywords: string }) => {
async ({
content,
keywords,
available_int,
}: {
content: string;
keywords: string;
available_int: number;
}) => {
const code = await createChunk({
content_with_weight: content,
doc_id: documentId,
chunk_id: chunkId,
important_kwd: keywords, // keywords
available_int,
});
if (code === 0) {

View File

@@ -0,0 +1,268 @@
'use client';
import {
ColumnDef,
ColumnFiltersState,
SortingState,
VisibilityState,
flexRender,
getCoreRowModel,
getFilteredRowModel,
getPaginationRowModel,
getSortedRowModel,
useReactTable,
} from '@tanstack/react-table';
import { ArrowUpDown, MoreHorizontal } from 'lucide-react';
import * as React from 'react';
import { Button } from '@/components/ui/button';
import { Checkbox } from '@/components/ui/checkbox';
import {
DropdownMenu,
DropdownMenuContent,
DropdownMenuItem,
DropdownMenuLabel,
DropdownMenuSeparator,
DropdownMenuTrigger,
} from '@/components/ui/dropdown-menu';
import {
Table,
TableBody,
TableCell,
TableHead,
TableHeader,
TableRow,
} from '@/components/ui/table';
import { RunningStatus } from '@/constants/knowledge';
import { IDocumentInfo } from '@/interfaces/database/document';
const data: IDocumentInfo[] = [
{
chunk_num: 1,
create_date: 'Thu, 28 Nov 2024 17:10:22 GMT',
create_time: 1732785022792,
created_by: 'b0975cb4bc3111ee9b830aef05f5e94f',
id: '990cb30ead6811efb9b9fa163e197198',
kb_id: '25a8cfbe9cd411efbc12fa163e197198',
location: 'mian.jpg',
name: 'mian.jpg',
parser_config: {
pages: [[1, 1000000]],
},
parser_id: 'picture',
process_begin_at: 'Thu, 28 Nov 2024 17:10:25 GMT',
process_duation: 8.46185,
progress: 1,
progress_msg:
'\nTask has been received.\nPage(1~100000001): Finish OCR: (用小麦粉\n金\nONGXI ...)\nPage(1~100000001): OCR results is too long to use CV LLM.\nPage(1~100000001): Finished slicing files (1 chunks in 0.34s). Start to embedding the content.\nPage(1~100000001): Finished embedding (in 0.35s)! Start to build index!\nPage(1~100000001): Indexing elapsed in 0.02s.\nPage(1~100000001): Done!',
run: RunningStatus.RUNNING,
size: 19692,
source_type: 'local',
status: '1',
thumbnail:
'/v1/document/image/25a8cfbe9cd411efbc12fa163e197198-thumbnail_990cb30ead6811efb9b9fa163e197198.png',
token_num: 115,
type: 'visual',
update_date: 'Thu, 28 Nov 2024 17:10:33 GMT',
update_time: 1732785033462,
},
];
export const columns: ColumnDef<IDocumentInfo>[] = [
{
id: 'select',
header: ({ table }) => (
<Checkbox
checked={
table.getIsAllPageRowsSelected() ||
(table.getIsSomePageRowsSelected() && 'indeterminate')
}
onCheckedChange={(value) => table.toggleAllPageRowsSelected(!!value)}
aria-label="Select all"
/>
),
cell: ({ row }) => (
<Checkbox
checked={row.getIsSelected()}
onCheckedChange={(value) => row.toggleSelected(!!value)}
aria-label="Select row"
/>
),
enableSorting: false,
enableHiding: false,
},
{
accessorKey: 'status',
header: 'Status',
cell: ({ row }) => (
<div className="capitalize">{row.getValue('status')}</div>
),
},
{
accessorKey: 'email',
header: ({ column }) => {
return (
<Button
variant="ghost"
onClick={() => column.toggleSorting(column.getIsSorted() === 'asc')}
>
Email
<ArrowUpDown />
</Button>
);
},
cell: ({ row }) => <div className="lowercase">{row.getValue('email')}</div>,
},
{
accessorKey: 'amount',
header: () => <div className="text-right">Amount</div>,
cell: ({ row }) => {
const amount = parseFloat(row.getValue('amount'));
// Format the amount as a dollar amount
const formatted = new Intl.NumberFormat('en-US', {
style: 'currency',
currency: 'USD',
}).format(amount);
return <div className="text-right font-medium">{formatted}</div>;
},
},
{
id: 'actions',
enableHiding: false,
cell: ({ row }) => {
const payment = row.original;
return (
<DropdownMenu>
<DropdownMenuTrigger asChild>
<Button variant="ghost" className="h-8 w-8 p-0">
<span className="sr-only">Open menu</span>
<MoreHorizontal />
</Button>
</DropdownMenuTrigger>
<DropdownMenuContent align="end">
<DropdownMenuLabel>Actions</DropdownMenuLabel>
<DropdownMenuItem
onClick={() => navigator.clipboard.writeText(payment.id)}
>
Copy payment ID
</DropdownMenuItem>
<DropdownMenuSeparator />
<DropdownMenuItem>View customer</DropdownMenuItem>
<DropdownMenuItem>View payment details</DropdownMenuItem>
</DropdownMenuContent>
</DropdownMenu>
);
},
},
];
export function DatasetTable() {
const [sorting, setSorting] = React.useState<SortingState>([]);
const [columnFilters, setColumnFilters] = React.useState<ColumnFiltersState>(
[],
);
const [columnVisibility, setColumnVisibility] =
React.useState<VisibilityState>({});
const [rowSelection, setRowSelection] = React.useState({});
const table = useReactTable({
data,
columns,
onSortingChange: setSorting,
onColumnFiltersChange: setColumnFilters,
getCoreRowModel: getCoreRowModel(),
getPaginationRowModel: getPaginationRowModel(),
getSortedRowModel: getSortedRowModel(),
getFilteredRowModel: getFilteredRowModel(),
onColumnVisibilityChange: setColumnVisibility,
onRowSelectionChange: setRowSelection,
state: {
sorting,
columnFilters,
columnVisibility,
rowSelection,
},
});
return (
<div className="w-full">
<div className="rounded-md border">
<Table>
<TableHeader>
{table.getHeaderGroups().map((headerGroup) => (
<TableRow key={headerGroup.id}>
{headerGroup.headers.map((header) => {
return (
<TableHead key={header.id}>
{header.isPlaceholder
? null
: flexRender(
header.column.columnDef.header,
header.getContext(),
)}
</TableHead>
);
})}
</TableRow>
))}
</TableHeader>
<TableBody>
{table.getRowModel().rows?.length ? (
table.getRowModel().rows.map((row) => (
<TableRow
key={row.id}
data-state={row.getIsSelected() && 'selected'}
>
{row.getVisibleCells().map((cell) => (
<TableCell key={cell.id}>
{flexRender(
cell.column.columnDef.cell,
cell.getContext(),
)}
</TableCell>
))}
</TableRow>
))
) : (
<TableRow>
<TableCell
colSpan={columns.length}
className="h-24 text-center"
>
No results.
</TableCell>
</TableRow>
)}
</TableBody>
</Table>
</div>
<div className="flex items-center justify-end space-x-2 py-4">
<div className="flex-1 text-sm text-muted-foreground">
{table.getFilteredSelectedRowModel().rows.length} of{' '}
{table.getFilteredRowModel().rows.length} row(s) selected.
</div>
<div className="space-x-2">
<Button
variant="outline"
size="sm"
onClick={() => table.previousPage()}
disabled={!table.getCanPreviousPage()}
>
Previous
</Button>
<Button
variant="outline"
size="sm"
onClick={() => table.nextPage()}
disabled={!table.getCanNextPage()}
>
Next
</Button>
</div>
</div>
</div>
);
}

View File

@@ -0,0 +1,15 @@
import ListFilterBar from '@/components/list-filter-bar';
import { Upload } from 'lucide-react';
import { DatasetTable } from './dataset-table';
export default function Dataset() {
return (
<section className="p-8 text-foreground">
<ListFilterBar title="Files">
<Upload />
Upload file
</ListFilterBar>
<DatasetTable></DatasetTable>
</section>
);
}

View File

@@ -0,0 +1,13 @@
import { Outlet } from 'umi';
import { SideBar } from './sidebar';
export default function DatasetWrapper() {
return (
<div className="text-foreground flex">
<SideBar></SideBar>
<div className="flex-1">
<Outlet />
</div>
</div>
);
}

View File

@@ -0,0 +1,3 @@
export default function DatasetSettings() {
return <div>DatasetSettings</div>;
}

View File

@@ -0,0 +1,16 @@
import { DatasetBaseKey, KnowledgeRouteKey } from '@/constants/knowledge';
import { useCallback } from 'react';
import { useNavigate } from 'umi';
export const useHandleMenuClick = () => {
const navigate = useNavigate();
const handleMenuClick = useCallback(
(key: KnowledgeRouteKey) => () => {
navigate(`/${DatasetBaseKey}/${key}`);
},
[navigate],
);
return { handleMenuClick };
};

View File

@@ -0,0 +1,66 @@
import { Button } from '@/components/ui/button';
import { KnowledgeRouteKey } from '@/constants/knowledge';
import { useSecondPathName } from '@/hooks/route-hook';
import { cn } from '@/lib/utils';
import { Banknote, LayoutGrid, User } from 'lucide-react';
import { useHandleMenuClick } from './hooks';
const items = [
{ icon: User, label: 'Dataset', key: KnowledgeRouteKey.Dataset },
{
icon: LayoutGrid,
label: 'Retrieval testing',
key: KnowledgeRouteKey.Testing,
},
{ icon: Banknote, label: 'Settings', key: KnowledgeRouteKey.Configuration },
];
const dataset = {
id: 1,
title: 'Legal knowledge base',
files: '1,242 files',
size: '152 MB',
created: '12.02.2024',
image: 'https://github.com/shadcn.png',
};
export function SideBar() {
const pathName = useSecondPathName();
const { handleMenuClick } = useHandleMenuClick();
return (
<aside className="w-[303px]">
<div className="p-6 space-y-2 border-b">
<div
className="w-[70px] h-[70px] rounded-xl bg-cover"
style={{ backgroundImage: `url(${dataset.image})` }}
/>
<h3 className="text-lg font-semibold mb-2">{dataset.title}</h3>
<div className="text-sm opacity-80">
{dataset.files} | {dataset.size}
</div>
<div className="text-sm opacity-80">Created {dataset.created}</div>
</div>
<div className="mt-4">
{items.map((item, itemIdx) => {
const active = pathName === item.key;
return (
<Button
key={itemIdx}
variant={active ? 'secondary' : 'ghost'}
className={cn('w-full justify-start gap-2.5 p-6 relative')}
onClick={handleMenuClick(item.key)}
>
<item.icon className="w-6 h-6" />
<span>{item.label}</span>
{active && (
<div className="absolute right-0 w-[5px] h-[66px] bg-primary rounded-l-xl shadow-[0_0_5.94px_#7561ff,0_0_11.88px_#7561ff,0_0_41.58px_#7561ff,0_0_83.16px_#7561ff,0_0_142.56px_#7561ff,0_0_249.48px_#7561ff]" />
)}
</Button>
);
})}
</div>
</aside>
);
}

View File

@@ -0,0 +1,3 @@
export default function RetrievalTesting() {
return <div>Retrieval testing</div>;
}

View File

@@ -0,0 +1,126 @@
import ListFilterBar from '@/components/list-filter-bar';
import { Button } from '@/components/ui/button';
import { Card, CardContent } from '@/components/ui/card';
import { ChevronRight, MoreHorizontal, Plus } from 'lucide-react';
const datasets = [
{
id: 1,
title: 'Legal knowledge base',
files: '1,242 files',
size: '152 MB',
created: '12.02.2024',
image: 'https://github.com/shadcn.png',
},
{
id: 2,
title: 'HR knowledge base',
files: '1,242 files',
size: '152 MB',
created: '12.02.2024',
image: 'https://github.com/shadcn.png',
},
{
id: 3,
title: 'IT knowledge base',
files: '1,242 files',
size: '152 MB',
created: '12.02.2024',
image: 'https://github.com/shadcn.png',
},
{
id: 4,
title: 'Legal knowledge base',
files: '1,242 files',
size: '152 MB',
created: '12.02.2024',
image: 'https://github.com/shadcn.png',
},
{
id: 5,
title: 'Legal knowledge base',
files: '1,242 files',
size: '152 MB',
created: '12.02.2024',
image: 'https://github.com/shadcn.png',
},
{
id: 6,
title: 'Legal knowledge base',
files: '1,242 files',
size: '152 MB',
created: '12.02.2024',
image: 'https://github.com/shadcn.png',
},
{
id: 7,
title: 'Legal knowledge base',
files: '1,242 files',
size: '152 MB',
created: '12.02.2024',
image: 'https://github.com/shadcn.png',
},
{
id: 8,
title: 'Legal knowledge base',
files: '1,242 files',
size: '152 MB',
created: '12.02.2024',
image: 'https://github.com/shadcn.png',
},
{
id: 9,
title: 'Legal knowledge base',
files: '1,242 files',
size: '152 MB',
created: '12.02.2024',
image: 'https://github.com/shadcn.png',
},
];
export default function Datasets() {
return (
<section className="p-8 text-foreground">
<ListFilterBar title="Datasets">
<Plus className="mr-2 h-4 w-4" />
Create dataset
</ListFilterBar>
<div className="grid gap-6 sm:grid-cols-1 md:grid-cols-2 lg:grid-cols-4 xl:grid-cols-6 2xl:grid-cols-8">
{datasets.map((dataset) => (
<Card
key={dataset.id}
className="bg-colors-background-inverse-weak flex-1"
>
<CardContent className="p-4">
<div className="flex justify-between mb-4">
<div
className="w-[70px] h-[70px] rounded-xl bg-cover"
style={{ backgroundImage: `url(${dataset.image})` }}
/>
<Button variant="ghost" size="icon">
<MoreHorizontal className="h-6 w-6" />
</Button>
</div>
<div className="flex justify-between items-end">
<div>
<h3 className="text-lg font-semibold mb-2">
{dataset.title}
</h3>
<p className="text-sm opacity-80">
{dataset.files} | {dataset.size}
</p>
<p className="text-sm opacity-80">
Created {dataset.created}
</p>
</div>
<Button variant="secondary" size="icon">
<ChevronRight className="h-6 w-6" />
</Button>
</div>
</CardContent>
</Card>
))}
</div>
</section>
);
}

View File

@@ -455,7 +455,7 @@ export const initialArXivValues = {
export const initialGoogleValues = {
top_n: 10,
api_key: 'Xxx(get from https://serpapi.com/manage-api-key)',
api_key: 'YOUR_API_KEY (obtained from https://serpapi.com/manage-api-key)',
country: 'cn',
language: 'en',
...initialQueryBaseValues,
@@ -465,7 +465,7 @@ export const initialBingValues = {
top_n: 10,
channel: 'Webpages',
api_key:
'"YOUR_ACCESS_KEY"(get from https://www.microsoft.com/en-us/bing/apis/bing-web-search-api)',
'YOUR_API_KEY (obtained from https://www.microsoft.com/en-us/bing/apis/bing-web-search-api)',
country: 'CH',
language: 'en',
...initialQueryBaseValues,

View File

@@ -2,6 +2,7 @@
.knowledge {
padding: 48px 0;
overflow: auto;
}
.topWrapper {

View File

@@ -1,18 +1,26 @@
import { useNextFetchKnowledgeList } from '@/hooks/knowledge-hooks';
import { useInfiniteFetchKnowledgeList } from '@/hooks/knowledge-hooks';
import { useFetchUserInfo } from '@/hooks/user-setting-hooks';
import { PlusOutlined, SearchOutlined } from '@ant-design/icons';
import { Button, Empty, Flex, Input, Space, Spin } from 'antd';
import {
Button,
Divider,
Empty,
Flex,
Input,
Skeleton,
Space,
Spin,
} from 'antd';
import { useTranslation } from 'react-i18next';
import InfiniteScroll from 'react-infinite-scroll-component';
import { useSaveKnowledge } from './hooks';
import KnowledgeCard from './knowledge-card';
import KnowledgeCreatingModal from './knowledge-creating-modal';
import { useTranslation } from 'react-i18next';
import { useSaveKnowledge, useSearchKnowledge } from './hooks';
import { useMemo } from 'react';
import styles from './index.less';
const KnowledgeList = () => {
const { searchString, handleInputChange } = useSearchKnowledge();
const { loading, list: data } = useNextFetchKnowledgeList();
const list = data.filter((x) => x.name.includes(searchString));
const { data: userInfo } = useFetchUserInfo();
const { t } = useTranslation('translation', { keyPrefix: 'knowledgeList' });
const {
@@ -22,9 +30,23 @@ const KnowledgeList = () => {
onCreateOk,
loading: creatingLoading,
} = useSaveKnowledge();
const {
fetchNextPage,
data,
hasNextPage,
searchString,
handleInputChange,
loading,
} = useInfiniteFetchKnowledgeList();
const nextList = data?.pages?.flatMap((x) => x.kbs) ?? [];
const total = useMemo(() => {
return data?.pages.at(-1)?.total ?? 0;
}, [data?.pages]);
return (
<Flex className={styles.knowledge} vertical flex={1}>
<Flex className={styles.knowledge} vertical flex={1} id="scrollableDiv">
<div className={styles.topWrapper}>
<div>
<span className={styles.title}>
@@ -53,21 +75,30 @@
</Space>
</div>
<Spin spinning={loading}>
<Flex
gap={'large'}
wrap="wrap"
className={styles.knowledgeCardContainer}
<InfiniteScroll
dataLength={nextList?.length ?? 0}
next={fetchNextPage}
hasMore={hasNextPage}
loader={<Skeleton avatar paragraph={{ rows: 1 }} active />}
endMessage={total && <Divider plain>{t('noMoreData')} 🤐</Divider>}
scrollableTarget="scrollableDiv"
>
{list.length > 0 ? (
list.map((item: any) => {
return (
<KnowledgeCard item={item} key={item.name}></KnowledgeCard>
);
})
) : (
<Empty className={styles.knowledgeEmpty}></Empty>
)}
</Flex>
<Flex
gap={'large'}
wrap="wrap"
className={styles.knowledgeCardContainer}
>
{nextList?.length > 0 ? (
nextList.map((item: any) => {
return (
<KnowledgeCard item={item} key={item.name}></KnowledgeCard>
);
})
) : (
<Empty className={styles.knowledgeEmpty}></Empty>
)}
</Flex>
</InfiniteScroll>
</Spin>
<KnowledgeCreatingModal
loading={creatingLoading}

View File

@@ -0,0 +1,5 @@
import { PropsWithChildren } from 'react';
export function Title({ children }: PropsWithChildren) {
return <span className="font-bold text-xl">{children}</span>;
}

View File

@@ -0,0 +1,47 @@
import { Button } from '@/components/ui/button';
import { Input } from '@/components/ui/input';
import {
AddModelCard,
ModelLibraryCard,
SystemModelSetting,
} from './model-card';
const addedModelList = new Array(4).fill(1);
const modelLibraryList = new Array(4).fill(1);
export default function ModelManagement() {
return (
<section className="p-8 space-y-8">
<div className="flex justify-between items-center ">
<h1 className="text-4xl font-bold">Team management</h1>
<Button className="hover:bg-[#6B4FD8] text-white bg-colors-background-core-standard">
Unfinished
</Button>
</div>
<SystemModelSetting></SystemModelSetting>
<section>
<h2 className="text-2xl font-semibold mb-3">Added model</h2>
<div className="grid grid-cols-1 lg:grid-cols-2 xl:grid-cols-4 2xl:grid-cols-4 gap-4">
{addedModelList.map((x, idx) => (
<AddModelCard key={idx}></AddModelCard>
))}
</div>
</section>
<section>
<div className="flex justify-between items-center mb-3">
<h2 className="text-2xl font-semibold ">Model library</h2>
<Input
placeholder="search"
className="bg-colors-background-inverse-weak w-1/5"
></Input>
</div>
<div className="grid grid-cols-2 lg:grid-cols-4 xl:grid-cols-6 2xl:grid-cols-8 gap-4">
{modelLibraryList.map((x, idx) => (
<ModelLibraryCard key={idx}></ModelLibraryCard>
))}
</div>
</section>
</section>
);
}

View File

@@ -0,0 +1,136 @@
import { Avatar, AvatarFallback, AvatarImage } from '@/components/ui/avatar';
import { Button } from '@/components/ui/button';
import { Card, CardContent } from '@/components/ui/card';
import {
Select,
SelectContent,
SelectItem,
SelectTrigger,
SelectValue,
} from '@/components/ui/select';
import { Key, MoreVertical, Plus, Trash2 } from 'lucide-react';
import { PropsWithChildren } from 'react';
const settings = [
{
title: 'GPT Model',
description:
'The default chat LLM that all newly created knowledge bases will use.',
model: 'DeepseekChat',
},
{
title: 'Embedding Model',
description:
'The default embedding model that all newly created knowledge bases will use.',
model: 'DeepseekChat',
},
{
title: 'Image Model',
description:
'The default multimodal model that all newly created knowledge bases will use. It can describe an image or video.',
model: 'DeepseekChat',
},
{
title: 'Speech2TXT Model',
description:
'The default ASR model that all newly created knowledge bases will use. Use this model to transcribe voice to text.',
model: 'DeepseekChat',
},
{
title: 'TTS Model',
description:
'The default text-to-speech model that all newly created knowledge bases will use.',
model: 'DeepseekChat',
},
];
function Title({ children }: PropsWithChildren) {
return <span className="font-bold text-xl">{children}</span>;
}
export function SystemModelSetting() {
return (
<Card>
<CardContent className="p-4 space-y-6">
{settings.map((x, idx) => (
<div key={idx} className="flex items-center">
<div className="flex-1 flex flex-col">
<span className="font-semibold text-base">{x.title}</span>
<span className="text-colors-text-neutral-standard">
{x.description}
</span>
</div>
<div className="flex-1">
<Select defaultValue="english">
<SelectTrigger className="bg-colors-background-inverse-weak">
<SelectValue />
</SelectTrigger>
<SelectContent>
<SelectItem value="english">English</SelectItem>
</SelectContent>
</Select>
</div>
</div>
))}
</CardContent>
</Card>
);
}
export function AddModelCard() {
return (
<Card className="pt-4">
<CardContent className="space-y-4">
<div className="flex justify-between space-y-4">
<Avatar>
<AvatarImage src="https://github.com/shadcn.png" alt="@shadcn" />
<AvatarFallback>CN</AvatarFallback>
</Avatar>
<Button variant={'outline'}>Sub models</Button>
</div>
<Title>Deep seek</Title>
<p>LLM, TEXT EMBEDDING, SPEECH2TEXT, MODERATION</p>
<Card>
<CardContent className="p-3 flex gap-2">
<Button variant={'secondary'}>
deepseek-chat <Trash2 />
</Button>
<Button variant={'secondary'}>
deepseek-code <Trash2 />
</Button>
</CardContent>
</Card>
<div className="flex justify-end gap-2">
<Button variant="secondary" size="icon">
<MoreVertical className="h-4 w-4" />
</Button>
<Button variant={'tertiary'}>
<Key /> API
</Button>
</div>
</CardContent>
</Card>
);
}
export function ModelLibraryCard() {
return (
<Card className="pt-4">
<CardContent className="space-y-4">
<Avatar className="mb-4">
<AvatarImage src="https://github.com/shadcn.png" alt="@shadcn" />
<AvatarFallback>CN</AvatarFallback>
</Avatar>
<Title>Deep seek</Title>
<p>LLM, TEXT EMBEDDING, SPEECH2TEXT, MODERATION</p>
<div className="text-right">
<Button variant={'tertiary'}>
<Plus /> Add
</Button>
</div>
</CardContent>
</Card>
);
}

View File

@@ -0,0 +1,48 @@
import { Button } from '@/components/ui/button';
import { Card, CardContent } from '@/components/ui/card';
import { Plus, Trash2 } from 'lucide-react';
import { Title } from '../components';
const text = `You are an intelligent assistant. Please summarize the content of the knowledge base to answer the question. Please list the data in the knowledge base and answer in detail. When all knowledge base content is irrelevant to the question, your answer must include the sentence "The answer you are looking for is not found in the knowledge base!" Answers need to consider chat history.
Here is the knowledge base:
{knowledge}
The above is the knowledge base.`;
const PromptManagement = () => {
const modelLibraryList = new Array(8).fill(1);
return (
<div className="p-8 ">
<div className="mx-auto">
<div className="flex justify-between items-center mb-8">
<h1 className="text-4xl font-bold">Prompt templates</h1>
<Button variant={'tertiary'} size={'sm'}>
<Plus className="mr-2 h-4 w-4" />
Create template
</Button>
</div>
</div>
<div className="grid grid-cols-2 lg:grid-cols-3 xl:grid-cols-4 2xl:grid-cols-6 gap-4">
{modelLibraryList.map((x, idx) => (
<Card className="p-0" key={idx}>
<CardContent className="space-y-4 p-4">
<Title>Prompt name</Title>
<p className="line-clamp-3">{text}</p>
<div className="flex justify-end gap-2">
<Button size={'sm'} variant={'secondary'}>
<Trash2 />
</Button>
<Button variant={'outline'} size={'sm'}>
Edit
</Button>
</div>
</CardContent>
</Card>
))}
</div>
</div>
);
};
export default PromptManagement;

Some files were not shown because too many files have changed in this diff.