Compare commits
63 Commits
| SHA1 |
|---|
| f1c98aad6b | |||
| ab06f502d7 | |||
| 6329339a32 | |||
| 84b39c60f6 | |||
| eb62c669ae | |||
| f69ff39fa0 | |||
| b1cd203904 | |||
| b75d75e995 | |||
| 76c477f211 | |||
| 1b01c4fe69 | |||
| 188f3ddfc5 | |||
| 1dcd439c58 | |||
| 26003b5076 | |||
| 4130e5c5e5 | |||
| d0af2f92f2 | |||
| 66f8d35632 | |||
| cf9b554c3a | |||
| aeabc0c9a4 | |||
| 9db44da992 | |||
| 51e7697df7 | |||
| b06d6395bb | |||
| b79f0b0cac | |||
| fe51488973 | |||
| 5d1803c31d | |||
| bd76a82c1f | |||
| 2bc9a7cc18 | |||
| 2d228dbf7f | |||
| 369400c483 | |||
| 6405041b4d | |||
| aa71462a9f | |||
| 72384b191d | |||
| 0dfc8ddc0f | |||
| 78402d9a57 | |||
| b448c212ee | |||
| 0aaade088b | |||
| a38e163035 | |||
| 3610e1e5b4 | |||
| 11949f9f2e | |||
| b8e58fe27a | |||
| fc87c20bd8 | |||
| dee6299ddf | |||
| 101df2b470 | |||
| c055f40dff | |||
| 7da3f88e54 | |||
| 10b79effab | |||
| 7e41b4bc94 | |||
| ed6081845a | |||
| cda7b607cb | |||
| 962c66714e | |||
| 39f1feaccb | |||
| 1dada69daa | |||
| fe2f5205fc | |||
| ac574af60a | |||
| 0499a3f621 | |||
| 453c29170f | |||
| e8570da856 | |||
| dd7559a009 | |||
| 3719ff7299 | |||
| 800b5c7aaa | |||
| f12f30bb7b | |||
| 30846c83b2 | |||
| 2afe7a74b3 | |||
| d4e0bfc8a5 |
.gitattributes (new file, 1 addition)
@@ -0,0 +1 @@
+*.sh text eol=lf
.github/ISSUE_TEMPLATE/bug_report.yml (2 changes)
@@ -1,5 +1,5 @@
 name: Bug Report
-description: Create a bug issue for infinity
+description: Create a bug issue for RAGFlow
 title: "[Bug]: "
 labels: [bug]
 body:
.github/ISSUE_TEMPLATE/feature_request.md (2 changes)
@@ -1,7 +1,7 @@
 ---
 name: Feature request
 title: '[Feature Request]: '
-about: Suggest an idea for Infinity
+about: Suggest an idea for RAGFlow
 labels: ''
 ---
 
.github/ISSUE_TEMPLATE/feature_request.yml (2 changes)
@@ -1,5 +1,5 @@
 name: Feature request
-description: Propose a feature request for infinity.
+description: Propose a feature request for RAGFlow.
 title: "[Feature Request]: "
 labels: [feature request]
 body:
.github/ISSUE_TEMPLATE/question.yml (2 changes)
@@ -1,5 +1,5 @@
 name: Question
-description: Ask questions on infinity
+description: Ask questions on RAGFlow
 title: "[Question]: "
 labels: [question]
 body:
.github/ISSUE_TEMPLATE/subtask.yml (2 changes)
@@ -1,5 +1,5 @@
 name: Subtask
-description: "Propose a subtask for infinity"
+description: "Propose a subtask for RAGFlow"
 title: "[Subtask]: "
 labels: [subtask]
 
.github/pull_request_template.md (5 changes)
@@ -2,16 +2,11 @@
 
 _Briefly describe what this PR aims to solve. Include background context that will help reviewers understand the purpose of the PR._
 
-Issue link:#[Link the issue here]
 
 ### Type of change
 
 - [ ] Bug Fix (non-breaking change which fixes an issue)
 - [ ] New Feature (non-breaking change which adds functionality)
-- [ ] Breaking Change (fix or feature that could cause existing functionality not to work as expected)
 - [ ] Documentation Update
 - [ ] Refactoring
 - [ ] Performance Improvement
-- [ ] Test cases
-- [ ] Python SDK impacted, Need to update PyPI
 - [ ] Other (please describe):
.gitignore (6 additions)
@@ -21,3 +21,9 @@ Cargo.lock
 
 .idea/
 .vscode/
+
+# Exclude Mac generated files
+.DS_Store
+
+# Exclude the log folder
+docker/ragflow-logs/
Dockerfile.scratch.oc9 (new file, 56 additions)
@@ -0,0 +1,56 @@
+FROM opencloudos/opencloudos:9.0
+USER root
+
+WORKDIR /ragflow
+
+RUN dnf update -y && dnf install -y wget curl gcc-c++ openmpi-devel
+
+RUN wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh && \
+    bash ~/miniconda.sh -b -p /root/miniconda3 && \
+    rm ~/miniconda.sh && ln -s /root/miniconda3/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \
+    echo ". /root/miniconda3/etc/profile.d/conda.sh" >> ~/.bashrc && \
+    echo "conda activate base" >> ~/.bashrc
+
+ENV PATH /root/miniconda3/bin:$PATH
+
+RUN conda create -y --name py11 python=3.11
+
+ENV CONDA_DEFAULT_ENV py11
+ENV CONDA_PREFIX /root/miniconda3/envs/py11
+ENV PATH $CONDA_PREFIX/bin:$PATH
+
+# RUN curl -sL https://rpm.nodesource.com/setup_14.x | bash -
+RUN dnf install -y nodejs
+
+RUN dnf install -y nginx
+
+ADD ./web ./web
+ADD ./api ./api
+ADD ./conf ./conf
+ADD ./deepdoc ./deepdoc
+ADD ./rag ./rag
+ADD ./requirements.txt ./requirements.txt
+
+RUN dnf install -y openmpi openmpi-devel python3-openmpi
+ENV C_INCLUDE_PATH /usr/include/openmpi-x86_64:$C_INCLUDE_PATH
+ENV LD_LIBRARY_PATH /usr/lib64/openmpi/lib:$LD_LIBRARY_PATH
+RUN rm /root/miniconda3/envs/py11/compiler_compat/ld
+RUN cd ./web && npm i && npm run build
+RUN conda run -n py11 pip install $(grep -ivE "mpi4py" ./requirements.txt) # without mpi4py==3.1.5
+RUN conda run -n py11 pip install redis
+
+RUN dnf update -y && \
+    dnf install -y glib2 mesa-libGL && \
+    dnf clean all
+
+RUN conda run -n py11 pip install ollama
+RUN conda run -n py11 python -m nltk.downloader punkt
+RUN conda run -n py11 python -m nltk.downloader wordnet
+
+ENV PYTHONPATH=/ragflow/
+ENV HF_ENDPOINT=https://hf-mirror.com
+
+ADD docker/entrypoint.sh ./entrypoint.sh
+RUN chmod +x ./entrypoint.sh
+
+ENTRYPOINT ["./entrypoint.sh"]
README.md (21 changes)
@@ -11,11 +11,14 @@
 </p>
 
 <p align="center">
+<a href="https://github.com/infiniflow/ragflow/releases/latest">
+<img src="https://img.shields.io/github/v/release/infiniflow/ragflow?color=blue&label=Latest%20Release" alt="Latest Release">
+</a>
 <a href="https://demo.ragflow.io" target="_blank">
 <img alt="Static Badge" src="https://img.shields.io/badge/RAGFLOW-LLM-white?&labelColor=dd0af7"></a>
 <a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank">
-<img src="https://img.shields.io/badge/docker_pull-ragflow:v1.0-brightgreen"
-alt="docker pull infiniflow/ragflow:v0.2.0"></a>
+<img src="https://img.shields.io/badge/docker_pull-ragflow:v0.4.0-brightgreen"
+alt="docker pull infiniflow/ragflow:v0.4.0"></a>
 <a href="https://github.com/infiniflow/ragflow/blob/main/LICENSE">
 <img height="21" src="https://img.shields.io/badge/License-Apache--2.0-ffffff?style=flat-square&labelColor=d4eaf7&color=7d09f1" alt="license">
 </a>
@@ -55,8 +58,10 @@
 
 ## 📌 Latest Features
 
+- 2024-04-26 Add file management.
+- 2024-04-19 Support conversation API ([detail](./docs/conversation_api.md)).
 - 2024-04-16 Add an embedding model 'bce-embedding-base_v1' from [BCEmbedding](https://github.com/netease-youdao/BCEmbedding).
-- 2024-04-16 Add [FastEmbed](https://github.com/qdrant/fastembed) is designed for light and speeding embedding.
+- 2024-04-16 Add [FastEmbed](https://github.com/qdrant/fastembed), which is designed specifically for light and speedy embedding.
 - 2024-04-11 Support [Xinference](./docs/xinference.md) for local LLM deployment.
 - 2024-04-10 Add a new layout recognization model for analyzing Laws documentation.
 - 2024-04-08 Support [Ollama](./docs/ollama.md) for local LLM deployment.
@@ -72,8 +77,9 @@
 
 ### 📝 Prerequisites
 
-- CPU >= 2 cores
-- RAM >= 8 GB
+- CPU >= 4 cores
+- RAM >= 16 GB
+- Disk >= 50 GB
 - Docker >= 24.0.0 & Docker Compose >= v2.26.1
 > If you have not installed Docker on your local machine (Windows, Mac, or Linux), see [Install Docker Engine](https://docs.docker.com/engine/install/).
 
@@ -137,9 +143,10 @@
 * Running on http://x.x.x.x:9380
 INFO:werkzeug:Press CTRL+C to quit
 ```
+> If you skip this confirmation step and directly log in to RAGFlow, your browser may prompt a `network anomaly` error because, at that moment, your RAGFlow may not be fully initialized.
 
 5. In your web browser, enter the IP address of your server and log in to RAGFlow.
-> In the given scenario, you only need to enter `http://IP_OF_YOUR_MACHINE` (sans port number) as the default HTTP serving port `80` can be omitted when using the default configurations.
+> With default settings, you only need to enter `http://IP_OF_YOUR_MACHINE` (**sans** port number) as the default HTTP serving port `80` can be omitted when using the default configurations.
 6. In [service_conf.yaml](./docker/service_conf.yaml), select the desired LLM factory in `user_default_llm` and update the `API_KEY` field with the corresponding API key.
 
 > See [./docs/llm_api_key_setup.md](./docs/llm_api_key_setup.md) for more information.
@@ -173,7 +180,7 @@ To build the Docker images from source:
 ```bash
 $ git clone https://github.com/infiniflow/ragflow.git
 $ cd ragflow/
-$ docker build -t infiniflow/ragflow:v0.2.0 .
+$ docker build -t infiniflow/ragflow:v0.4.0 .
 $ cd ragflow/docker
 $ chmod +x ./entrypoint.sh
 $ docker compose up -d
README_ja.md (17 changes)
@@ -11,11 +11,14 @@
 </p>
 
 <p align="center">
+<a href="https://github.com/infiniflow/ragflow/releases/latest">
+<img src="https://img.shields.io/github/v/release/infiniflow/ragflow?color=blue&label=Latest%20Release" alt="Latest Release">
+</a>
 <a href="https://demo.ragflow.io" target="_blank">
 <img alt="Static Badge" src="https://img.shields.io/badge/RAGFLOW-LLM-white?&labelColor=dd0af7"></a>
 <a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank">
-<img src="https://img.shields.io/badge/docker_pull-ragflow:v1.0-brightgreen"
-alt="docker pull infiniflow/ragflow:v0.2.0"></a>
+<img src="https://img.shields.io/badge/docker_pull-ragflow:v0.4.0-brightgreen"
+alt="docker pull infiniflow/ragflow:v0.4.0"></a>
 <a href="https://github.com/infiniflow/ragflow/blob/main/LICENSE">
 <img height="21" src="https://img.shields.io/badge/License-Apache--2.0-ffffff?style=flat-square&labelColor=d4eaf7&color=7d09f1" alt="license">
 </a>
@@ -55,6 +58,8 @@
 
 ## 📌 最新の機能
 
+- 2024-04-26 「ファイル管理」機能を追加しました。
+- 2024-04-19 会話 API をサポートします ([詳細](./docs/conversation_api.md))。
 - 2024-04-16 [BCEmbedding](https://github.com/netease-youdao/BCEmbedding) から埋め込みモデル「bce-embedding-base_v1」を追加します。
 - 2024-04-16 [FastEmbed](https://github.com/qdrant/fastembed) は、軽量かつ高速な埋め込み用に設計されています。
 - 2024-04-11 ローカル LLM デプロイメント用に [Xinference](./docs/xinference.md) をサポートします。
@@ -72,8 +77,9 @@
 
 ### 📝 必要条件
 
-- CPU >= 2 cores
-- RAM >= 8 GB
+- CPU >= 4 cores
+- RAM >= 16 GB
+- Disk >= 50 GB
 - Docker >= 24.0.0 & Docker Compose >= v2.26.1
 > ローカルマシン(Windows、Mac、または Linux)に Docker をインストールしていない場合は、[Docker Engine のインストール](https://docs.docker.com/engine/install/) を参照してください。
 
@@ -137,6 +143,7 @@
 * Running on http://x.x.x.x:9380
 INFO:werkzeug:Press CTRL+C to quit
 ```
+> もし確認ステップをスキップして直接 RAGFlow にログインした場合、その時点で RAGFlow が完全に初期化されていない可能性があるため、ブラウザーがネットワーク異常エラーを表示するかもしれません。
 
 5. ウェブブラウザで、プロンプトに従ってサーバーの IP アドレスを入力し、RAGFlow にログインします。
 > デフォルトの設定を使用する場合、デフォルトの HTTP サービングポート `80` は省略できるので、与えられたシナリオでは、`http://IP_OF_YOUR_MACHINE`(ポート番号は省略)だけを入力すればよい。
@@ -173,7 +180,7 @@
 ```bash
 $ git clone https://github.com/infiniflow/ragflow.git
 $ cd ragflow/
-$ docker build -t infiniflow/ragflow:v0.2.0 .
+$ docker build -t infiniflow/ragflow:v0.4.0 .
 $ cd ragflow/docker
 $ chmod +x ./entrypoint.sh
 $ docker compose up -d
README_zh.md (17 changes)
@@ -11,11 +11,14 @@
 </p>
 
 <p align="center">
+<a href="https://github.com/infiniflow/ragflow/releases/latest">
+<img src="https://img.shields.io/github/v/release/infiniflow/ragflow?color=blue&label=Latest%20Release" alt="Latest Release">
+</a>
 <a href="https://demo.ragflow.io" target="_blank">
 <img alt="Static Badge" src="https://img.shields.io/badge/RAGFLOW-LLM-white?&labelColor=dd0af7"></a>
 <a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank">
-<img src="https://img.shields.io/badge/docker_pull-ragflow:v1.0-brightgreen"
-alt="docker pull infiniflow/ragflow:v0.2.0"></a>
+<img src="https://img.shields.io/badge/docker_pull-ragflow:v0.4.0-brightgreen"
+alt="docker pull infiniflow/ragflow:v0.4.0"></a>
 <a href="https://github.com/infiniflow/ragflow/blob/main/LICENSE">
 <img height="21" src="https://img.shields.io/badge/License-Apache--2.0-ffffff?style=flat-square&labelColor=d4eaf7&color=7d09f1" alt="license">
 </a>
@@ -55,6 +58,8 @@
 
 ## 📌 新增功能
 
+- 2024-04-26 增添了'文件管理'功能.
+- 2024-04-19 支持对话 API ([更多](./docs/conversation_api.md)).
 - 2024-04-16 添加嵌入模型 [BCEmbedding](https://github.com/netease-youdao/BCEmbedding) 。
 - 2024-04-16 添加 [FastEmbed](https://github.com/qdrant/fastembed) 专为轻型和高速嵌入而设计。
 - 2024-04-11 支持用 [Xinference](./docs/xinference.md) 本地化部署大模型。
@@ -72,8 +77,9 @@
 
 ### 📝 前提条件
 
-- CPU >= 2 核
-- RAM >= 8 GB
+- CPU >= 4 核
+- RAM >= 16 GB
+- Disk >= 50 GB
 - Docker >= 24.0.0 & Docker Compose >= v2.26.1
 > 如果你并没有在本机安装 Docker(Windows、Mac,或者 Linux), 可以参考文档 [Install Docker Engine](https://docs.docker.com/engine/install/) 自行安装。
 
@@ -137,6 +143,7 @@
 * Running on http://x.x.x.x:9380
 INFO:werkzeug:Press CTRL+C to quit
 ```
+> 如果您跳过这一步系统确认步骤就登录 RAGFlow,你的浏览器有可能会提示 `network anomaly` 或 `网络异常`,因为 RAGFlow 可能并未完全启动成功。
 
 5. 在你的浏览器中输入你的服务器对应的 IP 地址并登录 RAGFlow。
 > 上面这个例子中,您只需输入 http://IP_OF_YOUR_MACHINE 即可:未改动过配置则无需输入端口(默认的 HTTP 服务端口 80)。
@@ -173,7 +180,7 @@
 ```bash
 $ git clone https://github.com/infiniflow/ragflow.git
 $ cd ragflow/
-$ docker build -t infiniflow/ragflow:v0.2.0 .
+$ docker build -t infiniflow/ragflow:v0.4.0 .
 $ cd ragflow/docker
 $ chmod +x ./entrypoint.sh
 $ docker compose up -d
@@ -54,7 +54,7 @@ app.errorhandler(Exception)(server_error_response)
 #app.config["LOGIN_DISABLED"] = True
 app.config["SESSION_PERMANENT"] = False
 app.config["SESSION_TYPE"] = "filesystem"
-app.config['MAX_CONTENT_LENGTH'] = os.environ.get("MAX_CONTENT_LENGTH", 128 * 1024 * 1024)
+app.config['MAX_CONTENT_LENGTH'] = int(os.environ.get("MAX_CONTENT_LENGTH", 128 * 1024 * 1024))
 
 Session(app)
 login_manager = LoginManager()
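The `MAX_CONTENT_LENGTH` change above matters because values read from the environment are strings, while Flask expects an integer for this setting. A minimal standalone sketch of the pattern (not RAGFlow code):

```python
import os

# Environment variables arrive as strings, so a numeric setting has to be
# converted explicitly before it is handed to Flask's MAX_CONTENT_LENGTH.
max_content_length = int(os.environ.get("MAX_CONTENT_LENGTH", 128 * 1024 * 1024))
print(max_content_length)  # e.g. 134217728 when the variable is unset
```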
@@ -13,18 +13,28 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
+import os
+import re
 from datetime import datetime, timedelta
 from flask import request
 from flask_login import login_required, current_user
 
+from api.db import FileType, ParserType
 from api.db.db_models import APIToken, API4Conversation
+from api.db.services import duplicate_name
 from api.db.services.api_service import APITokenService, API4ConversationService
 from api.db.services.dialog_service import DialogService, chat
+from api.db.services.document_service import DocumentService
+from api.db.services.knowledgebase_service import KnowledgebaseService
 from api.db.services.user_service import UserTenantService
 from api.settings import RetCode
 from api.utils import get_uuid, current_timestamp, datetime_format
 from api.utils.api_utils import server_error_response, get_data_error_result, get_json_result, validate_request
 from itsdangerous import URLSafeTimedSerializer
 
+from api.utils.file_utils import filename_type, thumbnail
+from rag.utils import MINIO
 
 
 def generate_confirmation_token(tenent_id):
     serializer = URLSafeTimedSerializer(tenent_id)
@@ -105,8 +115,8 @@ def stats():
         res = {
             "pv": [(o["dt"], o["pv"]) for o in objs],
             "uv": [(o["dt"], o["uv"]) for o in objs],
-            "speed": [(o["dt"], o["tokens"]/o["duration"]) for o in objs],
-            "tokens": [(o["dt"], o["tokens"]/1000.) for o in objs],
+            "speed": [(o["dt"], float(o["tokens"])/(float(o["duration"]+0.1))) for o in objs],
+            "tokens": [(o["dt"], float(o["tokens"])/1000.) for o in objs],
             "round": [(o["dt"], o["round"]) for o in objs],
             "thumb_up": [(o["dt"], o["thumb_up"]) for o in objs]
         }
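The `speed` change guards against conversations whose recorded `duration` is zero: offsetting the denominator keeps the tokens-per-second metric finite instead of raising `ZeroDivisionError`. A standalone sketch of the same idea (the `+0.1` offset mirrors the diff; the sample values are made up):

```python
# Offsetting the denominator keeps tokens-per-second finite even when a
# conversation's recorded duration is 0.
samples = [{"tokens": 1200, "duration": 0}, {"tokens": 300, "duration": 2}]
speed = [float(s["tokens"]) / (float(s["duration"]) + 0.1) for s in samples]
print(speed)  # [12000.0, 142.857...]
```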
@@ -115,8 +125,7 @@ def stats():
         return server_error_response(e)
 
 
-@manager.route('/new_conversation', methods=['POST'])
-@validate_request("user_id")
+@manager.route('/new_conversation', methods=['GET'])
 def set_conversation():
     token = request.headers.get('Authorization').split()[1]
     objs = APIToken.query(token=token)
@@ -131,7 +140,7 @@ def set_conversation():
         conv = {
             "id": get_uuid(),
             "dialog_id": dia.id,
-            "user_id": req["user_id"],
+            "user_id": request.args.get("user_id", ""),
             "message": [{"role": "assistant", "content": dia.prompt_config["prologue"]}]
         }
         API4ConversationService.save(**conv)
@@ -177,7 +186,6 @@ def completion():
         conv.reference.append(ans["reference"])
         conv.message.append({"role": "assistant", "content": ans["answer"]})
         API4ConversationService.append_message(conv.id, conv.to_dict())
-        APITokenService.APITokenService(token)
         return get_json_result(data=ans)
     except Exception as e:
         return server_error_response(e)
@@ -193,4 +201,74 @@ def get(conversation_id):
 
         return get_json_result(data=conv.to_dict())
     except Exception as e:
         return server_error_response(e)
+
+
+@manager.route('/document/upload', methods=['POST'])
+@validate_request("kb_name")
+def upload():
+    token = request.headers.get('Authorization').split()[1]
+    objs = APIToken.query(token=token)
+    if not objs:
+        return get_json_result(
+            data=False, retmsg='Token is not valid!"', retcode=RetCode.AUTHENTICATION_ERROR)
+
+    kb_name = request.form.get("kb_name").strip()
+    tenant_id = objs[0].tenant_id
+
+    try:
+        e, kb = KnowledgebaseService.get_by_name(kb_name, tenant_id)
+        if not e:
+            return get_data_error_result(
+                retmsg="Can't find this knowledgebase!")
+        kb_id = kb.id
+    except Exception as e:
+        return server_error_response(e)
+
+    if 'file' not in request.files:
+        return get_json_result(
+            data=False, retmsg='No file part!', retcode=RetCode.ARGUMENT_ERROR)
+
+    file = request.files['file']
+    if file.filename == '':
+        return get_json_result(
+            data=False, retmsg='No file selected!', retcode=RetCode.ARGUMENT_ERROR)
+    try:
+        if DocumentService.get_doc_count(kb.tenant_id) >= int(os.environ.get('MAX_FILE_NUM_PER_USER', 8192)):
+            return get_data_error_result(
+                retmsg="Exceed the maximum file number of a free user!")
+
+        filename = duplicate_name(
+            DocumentService.query,
+            name=file.filename,
+            kb_id=kb_id)
+        filetype = filename_type(filename)
+        if not filetype:
+            return get_data_error_result(
+                retmsg="This type of file has not been supported yet!")
+
+        location = filename
+        while MINIO.obj_exist(kb_id, location):
+            location += "_"
+        blob = request.files['file'].read()
+        MINIO.put(kb_id, location, blob)
+        doc = {
+            "id": get_uuid(),
+            "kb_id": kb.id,
+            "parser_id": kb.parser_id,
+            "parser_config": kb.parser_config,
+            "created_by": kb.tenant_id,
+            "type": filetype,
+            "name": filename,
+            "location": location,
+            "size": len(blob),
+            "thumbnail": thumbnail(filename, blob)
+        }
+        if doc["type"] == FileType.VISUAL:
+            doc["parser_id"] = ParserType.PICTURE.value
+        if re.search(r"\.(ppt|pptx|pages)$", filename):
+            doc["parser_id"] = ParserType.PRESENTATION.value
+        doc = DocumentService.insert(doc)
+        return get_json_result(data=doc.to_json())
+    except Exception as e:
+        return server_error_response(e)
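Taken together, these hunks turn `/new_conversation` into a GET endpoint that reads `user_id` from the query string and add a token-authenticated `/document/upload` route that writes the blob to MinIO and registers it in the named knowledge base. A hedged usage sketch with the `requests` library; the base URL, the route prefix, and the token value are assumptions, not taken from the diff:

```python
import requests

BASE = "http://ragflow-host/v1/api"                 # assumed URL prefix; adjust to your deployment
HEADERS = {"Authorization": "Bearer <API_TOKEN>"}   # token issued for the conversation API

# /new_conversation is now a GET request; user_id travels in the query string.
conv = requests.get(f"{BASE}/new_conversation",
                    headers=HEADERS, params={"user_id": "alice"}).json()

# The new upload route expects a kb_name form field plus a single multipart file.
with open("manual.pdf", "rb") as f:
    resp = requests.post(f"{BASE}/document/upload",
                         headers=HEADERS,
                         data={"kb_name": "my_kb"},
                         files={"file": f})
print(conv, resp.json())
```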
@@ -23,6 +23,9 @@ import flask
 from elasticsearch_dsl import Q
 from flask import request
 from flask_login import login_required, current_user
 
+from api.db.services.file2document_service import File2DocumentService
+from api.db.services.file_service import FileService
 from rag.nlp import search
 from rag.utils import ELASTICSEARCH
 from api.db.services import duplicate_name
@@ -58,7 +61,8 @@ def upload():
         if not e:
             return get_data_error_result(
                 retmsg="Can't find this knowledgebase!")
-        if DocumentService.get_doc_count(kb.tenant_id) >= int(os.environ.get('MAX_FILE_NUM_PER_USER', 8192)):
+        MAX_FILE_NUM_PER_USER = int(os.environ.get('MAX_FILE_NUM_PER_USER', 0))
+        if MAX_FILE_NUM_PER_USER > 0 and DocumentService.get_doc_count(kb.tenant_id) >= MAX_FILE_NUM_PER_USER:
             return get_data_error_result(
                 retmsg="Exceed the maximum file number of a free user!")
 
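With this hunk the per-user document quota defaults to 0, which disables the check entirely; setting `MAX_FILE_NUM_PER_USER` to a positive value re-enables it. A small standalone sketch of the pattern (the document count is a made-up number):

```python
import os

# 0 (the new default) means "no quota"; a positive value enforces the limit.
max_files = int(os.environ.get("MAX_FILE_NUM_PER_USER", 0))
doc_count = 8200  # hypothetical current count for a tenant

if max_files > 0 and doc_count >= max_files:
    print("Exceed the maximum file number of a free user!")
else:
    print("upload allowed")
```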
@@ -67,7 +71,7 @@ def upload():
             name=file.filename,
             kb_id=kb.id)
         filetype = filename_type(filename)
-        if not filetype:
+        if filetype == FileType.OTHER.value:
             return get_data_error_result(
                 retmsg="This type of file has not been supported yet!")
 
@@ -217,26 +221,37 @@ def change_status():
 @validate_request("doc_id")
 def rm():
     req = request.json
-    try:
-        e, doc = DocumentService.get_by_id(req["doc_id"])
-        if not e:
-            return get_data_error_result(retmsg="Document not found!")
-        tenant_id = DocumentService.get_tenant_id(req["doc_id"])
-        if not tenant_id:
-            return get_data_error_result(retmsg="Tenant not found!")
-        ELASTICSEARCH.deleteByQuery(
-            Q("match", doc_id=doc.id), idxnm=search.index_name(tenant_id))
-
-        DocumentService.increment_chunk_num(
-            doc.id, doc.kb_id, doc.token_num * -1, doc.chunk_num * -1, 0)
-        if not DocumentService.delete(doc):
-            return get_data_error_result(
-                retmsg="Database error (Document removal)!")
-
-        MINIO.rm(doc.kb_id, doc.location)
-        return get_json_result(data=True)
-    except Exception as e:
-        return server_error_response(e)
+    doc_ids = req["doc_id"]
+    if isinstance(doc_ids, str): doc_ids = [doc_ids]
+    errors = ""
+    for doc_id in doc_ids:
+        try:
+            e, doc = DocumentService.get_by_id(doc_id)
+            if not e:
+                return get_data_error_result(retmsg="Document not found!")
+            tenant_id = DocumentService.get_tenant_id(doc_id)
+            if not tenant_id:
+                return get_data_error_result(retmsg="Tenant not found!")
+            ELASTICSEARCH.deleteByQuery(
+                Q("match", doc_id=doc.id), idxnm=search.index_name(tenant_id))
+            DocumentService.increment_chunk_num(
+                doc.id, doc.kb_id, doc.token_num * -1, doc.chunk_num * -1, 0)
+            if not DocumentService.delete(doc):
+                return get_data_error_result(
+                    retmsg="Database error (Document removal)!")
+
+            informs = File2DocumentService.get_by_document_id(doc_id)
+            if not informs:
+                MINIO.rm(doc.kb_id, doc.location)
+            else:
+                File2DocumentService.delete_by_document_id(doc_id)
+        except Exception as e:
+            errors += str(e)
+
+    if errors: return server_error_response(e)
+    return get_json_result(data=True)
 
 
 @manager.route('/run', methods=['POST'])
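The reworked `rm` accepts either a single `doc_id` string or a list, and it collects per-document errors instead of stopping at the first failure. A standalone sketch of the normalization pattern used above:

```python
def normalize_ids(doc_id):
    # Accept both "abc" and ["abc", "def"], as the reworked rm() does.
    return [doc_id] if isinstance(doc_id, str) else list(doc_id)

print(normalize_ids("abc"))           # ['abc']
print(normalize_ids(["abc", "def"]))  # ['abc', 'def']
```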
@@ -288,6 +303,11 @@ def rename():
             return get_data_error_result(
                 retmsg="Database error (Document rename)!")
 
+        informs = File2DocumentService.get_by_document_id(req["doc_id"])
+        if informs:
+            e, file = FileService.get_by_id(informs[0].file_id)
+            FileService.update_by_id(file.id, {"name": req["name"]})
+
         return get_json_result(data=True)
     except Exception as e:
         return server_error_response(e)
@@ -301,7 +321,13 @@ def get(doc_id):
         if not e:
             return get_data_error_result(retmsg="Document not found!")
 
-        response = flask.make_response(MINIO.get(doc.kb_id, doc.location))
+        informs = File2DocumentService.get_by_document_id(doc_id)
+        if not informs:
+            response = flask.make_response(MINIO.get(doc.kb_id, doc.location))
+        else:
+            e, file = FileService.get_by_id(informs[0].file_id)
+            response = flask.make_response(MINIO.get(file.parent_id, doc.location))
+
         ext = re.search(r"\.([^.]+)$", doc.name)
         if ext:
             if doc.type == FileType.VISUAL.value:
api/apps/file2document_app.py (new file, 137 additions)
@@ -0,0 +1,137 @@
+#
+# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License
+#
+from elasticsearch_dsl import Q
+
+from api.db.db_models import File2Document
+from api.db.services.file2document_service import File2DocumentService
+from api.db.services.file_service import FileService
+
+from flask import request
+from flask_login import login_required, current_user
+from api.db.services.knowledgebase_service import KnowledgebaseService
+from api.utils.api_utils import server_error_response, get_data_error_result, validate_request
+from api.utils import get_uuid
+from api.db import FileType
+from api.db.services.document_service import DocumentService
+from api.settings import RetCode
+from api.utils.api_utils import get_json_result
+from rag.nlp import search
+from rag.utils import ELASTICSEARCH
+
+
+@manager.route('/convert', methods=['POST'])
+@login_required
+@validate_request("file_ids", "kb_ids")
+def convert():
+    req = request.json
+    kb_ids = req["kb_ids"]
+    file_ids = req["file_ids"]
+    file2documents = []
+
+    try:
+        for file_id in file_ids:
+            e, file = FileService.get_by_id(file_id)
+            file_ids_list = [file_id]
+            if file.type == FileType.FOLDER.value:
+                file_ids_list = FileService.get_all_innermost_file_ids(file_id, [])
+            for id in file_ids_list:
+                informs = File2DocumentService.get_by_file_id(id)
+                # delete
+                for inform in informs:
+                    doc_id = inform.document_id
+                    e, doc = DocumentService.get_by_id(doc_id)
+                    if not e:
+                        return get_data_error_result(retmsg="Document not found!")
+                    tenant_id = DocumentService.get_tenant_id(doc_id)
+                    if not tenant_id:
+                        return get_data_error_result(retmsg="Tenant not found!")
+                    ELASTICSEARCH.deleteByQuery(
+                        Q("match", doc_id=doc.id), idxnm=search.index_name(tenant_id))
+                    DocumentService.increment_chunk_num(
+                        doc.id, doc.kb_id, doc.token_num * -1, doc.chunk_num * -1, 0)
+                    if not DocumentService.delete(doc):
+                        return get_data_error_result(
+                            retmsg="Database error (Document removal)!")
+                File2DocumentService.delete_by_file_id(id)
+
+                # insert
+                for kb_id in kb_ids:
+                    e, kb = KnowledgebaseService.get_by_id(kb_id)
+                    if not e:
+                        return get_data_error_result(
+                            retmsg="Can't find this knowledgebase!")
+                    e, file = FileService.get_by_id(id)
+                    if not e:
+                        return get_data_error_result(
+                            retmsg="Can't find this file!")
+
+                    doc = DocumentService.insert({
+                        "id": get_uuid(),
+                        "kb_id": kb.id,
+                        "parser_id": kb.parser_id,
+                        "parser_config": kb.parser_config,
+                        "created_by": current_user.id,
+                        "type": file.type,
+                        "name": file.name,
+                        "location": file.location,
+                        "size": file.size
+                    })
+                    file2document = File2DocumentService.insert({
+                        "id": get_uuid(),
+                        "file_id": id,
+                        "document_id": doc.id,
+                    })
+                    file2documents.append(file2document.to_json())
+        return get_json_result(data=file2documents)
+    except Exception as e:
+        return server_error_response(e)
+
+
+@manager.route('/rm', methods=['POST'])
+@login_required
+@validate_request("file_ids")
+def rm():
+    req = request.json
+    file_ids = req["file_ids"]
+    if not file_ids:
+        return get_json_result(
+            data=False, retmsg='Lack of "Files ID"', retcode=RetCode.ARGUMENT_ERROR)
+    try:
+        for file_id in file_ids:
+            informs = File2DocumentService.get_by_file_id(file_id)
+            if not informs:
+                return get_data_error_result(retmsg="Inform not found!")
+            for inform in informs:
+                if not inform:
+                    return get_data_error_result(retmsg="Inform not found!")
+                File2DocumentService.delete_by_file_id(file_id)
+                doc_id = inform.document_id
+                e, doc = DocumentService.get_by_id(doc_id)
+                if not e:
+                    return get_data_error_result(retmsg="Document not found!")
+                tenant_id = DocumentService.get_tenant_id(doc_id)
+                if not tenant_id:
+                    return get_data_error_result(retmsg="Tenant not found!")
+                ELASTICSEARCH.deleteByQuery(
+                    Q("match", doc_id=doc.id), idxnm=search.index_name(tenant_id))
+                DocumentService.increment_chunk_num(
+                    doc.id, doc.kb_id, doc.token_num * -1, doc.chunk_num * -1, 0)
+                if not DocumentService.delete(doc):
+                    return get_data_error_result(
+                        retmsg="Database error (Document removal)!")
+        return get_json_result(data=True)
+    except Exception as e:
+        return server_error_response(e)
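This new blueprint links file-manager entries to knowledge bases: `/convert` registers the selected files (folders are expanded to their innermost files) as documents in one or more knowledge bases, and `/rm` removes those links along with the generated documents and their indexed chunks. A hedged call sketch; the URL prefix and the already-authenticated session are assumptions:

```python
import requests

BASE = "http://ragflow-host/v1/file2document"  # assumed prefix; both routes are @login_required
session = requests.Session()                   # assumed to already carry a logged-in session cookie

# Link two file-manager entries to a knowledge base; folders are expanded recursively.
session.post(f"{BASE}/convert",
             json={"file_ids": ["<file-id-1>", "<file-id-2>"], "kb_ids": ["<kb-id>"]})

# Unlink them again and drop the generated documents and their chunks.
session.post(f"{BASE}/rm", json={"file_ids": ["<file-id-1>", "<file-id-2>"]})
```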
api/apps/file_app.py (new file, 347 additions)
@@ -0,0 +1,347 @@
+#
+# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License
+#
+import os
+import pathlib
+import re
+
+import flask
+from elasticsearch_dsl import Q
+from flask import request
+from flask_login import login_required, current_user
+
+from api.db.services.document_service import DocumentService
+from api.db.services.file2document_service import File2DocumentService
+from api.utils.api_utils import server_error_response, get_data_error_result, validate_request
+from api.utils import get_uuid
+from api.db import FileType
+from api.db.services import duplicate_name
+from api.db.services.file_service import FileService
+from api.settings import RetCode
+from api.utils.api_utils import get_json_result
+from api.utils.file_utils import filename_type
+from rag.nlp import search
+from rag.utils import ELASTICSEARCH
+from rag.utils.minio_conn import MINIO
+
+
+@manager.route('/upload', methods=['POST'])
+@login_required
+# @validate_request("parent_id")
+def upload():
+    pf_id = request.form.get("parent_id")
+
+    if not pf_id:
+        root_folder = FileService.get_root_folder(current_user.id)
+        pf_id = root_folder.id
+
+    if 'file' not in request.files:
+        return get_json_result(
+            data=False, retmsg='No file part!', retcode=RetCode.ARGUMENT_ERROR)
+    file_objs = request.files.getlist('file')
+
+    for file_obj in file_objs:
+        if file_obj.filename == '':
+            return get_json_result(
+                data=False, retmsg='No file selected!', retcode=RetCode.ARGUMENT_ERROR)
+    file_res = []
+    try:
+        for file_obj in file_objs:
+            e, file = FileService.get_by_id(pf_id)
+            if not e:
+                return get_data_error_result(
+                    retmsg="Can't find this folder!")
+            MAX_FILE_NUM_PER_USER = int(os.environ.get('MAX_FILE_NUM_PER_USER', 0))
+            if MAX_FILE_NUM_PER_USER > 0 and DocumentService.get_doc_count(current_user.id) >= MAX_FILE_NUM_PER_USER:
+                return get_data_error_result(
+                    retmsg="Exceed the maximum file number of a free user!")
+
+            # split file name path
+            if not file_obj.filename:
+                e, file = FileService.get_by_id(pf_id)
+                file_obj_names = [file.name, file_obj.filename]
+            else:
+                full_path = '/' + file_obj.filename
+                file_obj_names = full_path.split('/')
+            file_len = len(file_obj_names)
+
+            # get folder
+            file_id_list = FileService.get_id_list_by_id(pf_id, file_obj_names, 1, [pf_id])
+            len_id_list = len(file_id_list)
+
+            # create folder
+            if file_len != len_id_list:
+                e, file = FileService.get_by_id(file_id_list[len_id_list - 1])
+                if not e:
+                    return get_data_error_result(retmsg="Folder not found!")
+                last_folder = FileService.create_folder(file, file_id_list[len_id_list - 1], file_obj_names,
+                                                        len_id_list)
+            else:
+                e, file = FileService.get_by_id(file_id_list[len_id_list - 2])
+                if not e:
+                    return get_data_error_result(retmsg="Folder not found!")
+                last_folder = FileService.create_folder(file, file_id_list[len_id_list - 2], file_obj_names,
+                                                        len_id_list)
+
+            # file type
+            filetype = filename_type(file_obj_names[file_len - 1])
+            location = file_obj_names[file_len - 1]
+            while MINIO.obj_exist(last_folder.id, location):
+                location += "_"
+            blob = file_obj.read()
+            filename = duplicate_name(
+                FileService.query,
+                name=file_obj_names[file_len - 1],
+                parent_id=last_folder.id)
+            file = {
+                "id": get_uuid(),
+                "parent_id": last_folder.id,
+                "tenant_id": current_user.id,
+                "created_by": current_user.id,
+                "type": filetype,
+                "name": filename,
+                "location": location,
+                "size": len(blob),
+            }
+            file = FileService.insert(file)
+            MINIO.put(last_folder.id, location, blob)
+            file_res.append(file.to_json())
+        return get_json_result(data=file_res)
+    except Exception as e:
+        return server_error_response(e)
+
+
+@manager.route('/create', methods=['POST'])
+@login_required
+@validate_request("name")
+def create():
+    req = request.json
+    pf_id = request.json.get("parent_id")
+    input_file_type = request.json.get("type")
+    if not pf_id:
+        root_folder = FileService.get_root_folder(current_user.id)
+        pf_id = root_folder.id
+
+    try:
+        if not FileService.is_parent_folder_exist(pf_id):
+            return get_json_result(
+                data=False, retmsg="Parent Folder Doesn't Exist!", retcode=RetCode.OPERATING_ERROR)
+        if FileService.query(name=req["name"], parent_id=pf_id):
+            return get_data_error_result(
+                retmsg="Duplicated folder name in the same folder.")
+
+        if input_file_type == FileType.FOLDER.value:
+            file_type = FileType.FOLDER.value
+        else:
+            file_type = FileType.VIRTUAL.value
+
+        file = FileService.insert({
+            "id": get_uuid(),
+            "parent_id": pf_id,
+            "tenant_id": current_user.id,
+            "created_by": current_user.id,
+            "name": req["name"],
+            "location": "",
+            "size": 0,
+            "type": file_type
+        })
+
+        return get_json_result(data=file.to_json())
+    except Exception as e:
+        return server_error_response(e)
+
+
+@manager.route('/list', methods=['GET'])
+@login_required
+def list():
+    pf_id = request.args.get("parent_id")
+
+    keywords = request.args.get("keywords", "")
+
+    page_number = int(request.args.get("page", 1))
+    items_per_page = int(request.args.get("page_size", 15))
+    orderby = request.args.get("orderby", "create_time")
+    desc = request.args.get("desc", True)
+    if not pf_id:
+        root_folder = FileService.get_root_folder(current_user.id)
+        pf_id = root_folder.id
+    try:
+        e, file = FileService.get_by_id(pf_id)
+        if not e:
+            return get_data_error_result(retmsg="Folder not found!")
+
+        files, total = FileService.get_by_pf_id(
+            current_user.id, pf_id, page_number, items_per_page, orderby, desc, keywords)
+
+        parent_folder = FileService.get_parent_folder(pf_id)
+        if not FileService.get_parent_folder(pf_id):
+            return get_json_result(retmsg="File not found!")
+
+        return get_json_result(data={"total": total, "files": files, "parent_folder": parent_folder.to_json()})
+    except Exception as e:
+        return server_error_response(e)
+
+
+@manager.route('/root_folder', methods=['GET'])
+@login_required
+def get_root_folder():
+    try:
+        root_folder = FileService.get_root_folder(current_user.id)
+        return get_json_result(data={"root_folder": root_folder.to_json()})
+    except Exception as e:
+        return server_error_response(e)
+
+
+@manager.route('/parent_folder', methods=['GET'])
+@login_required
+def get_parent_folder():
+    file_id = request.args.get("file_id")
+    try:
+        e, file = FileService.get_by_id(file_id)
+        if not e:
+            return get_data_error_result(retmsg="Folder not found!")
+
+        parent_folder = FileService.get_parent_folder(file_id)
+        return get_json_result(data={"parent_folder": parent_folder.to_json()})
+    except Exception as e:
+        return server_error_response(e)
+
+
+@manager.route('/all_parent_folder', methods=['GET'])
+@login_required
+def get_all_parent_folders():
+    file_id = request.args.get("file_id")
+    try:
+        e, file = FileService.get_by_id(file_id)
+        if not e:
+            return get_data_error_result(retmsg="Folder not found!")
+
+        parent_folders = FileService.get_all_parent_folders(file_id)
+        parent_folders_res = []
+        for parent_folder in parent_folders:
+            parent_folders_res.append(parent_folder.to_json())
+        return get_json_result(data={"parent_folders": parent_folders_res})
+    except Exception as e:
+        return server_error_response(e)
+
+
+@manager.route('/rm', methods=['POST'])
+@login_required
+@validate_request("file_ids")
+def rm():
+    req = request.json
+    file_ids = req["file_ids"]
+    try:
+        for file_id in file_ids:
+            e, file = FileService.get_by_id(file_id)
+            if not e:
+                return get_data_error_result(retmsg="File or Folder not found!")
+            if not file.tenant_id:
+                return get_data_error_result(retmsg="Tenant not found!")
+
+            if file.type == FileType.FOLDER.value:
+                file_id_list = FileService.get_all_innermost_file_ids(file_id, [])
+                for inner_file_id in file_id_list:
+                    e, file = FileService.get_by_id(inner_file_id)
+                    if not e:
+                        return get_data_error_result(retmsg="File not found!")
+                    MINIO.rm(file.parent_id, file.location)
+                FileService.delete_folder_by_pf_id(current_user.id, file_id)
+            else:
+                if not FileService.delete(file):
+                    return get_data_error_result(
+                        retmsg="Database error (File removal)!")
+
+            # delete file2document
+            informs = File2DocumentService.get_by_file_id(file_id)
+            for inform in informs:
+                doc_id = inform.document_id
+                e, doc = DocumentService.get_by_id(doc_id)
+                if not e:
+                    return get_data_error_result(retmsg="Document not found!")
+                tenant_id = DocumentService.get_tenant_id(doc_id)
+                if not tenant_id:
+                    return get_data_error_result(retmsg="Tenant not found!")
+                ELASTICSEARCH.deleteByQuery(
+                    Q("match", doc_id=doc.id), idxnm=search.index_name(tenant_id))
+                DocumentService.increment_chunk_num(
+                    doc.id, doc.kb_id, doc.token_num * -1, doc.chunk_num * -1, 0)
+                if not DocumentService.delete(doc):
+                    return get_data_error_result(
+                        retmsg="Database error (Document removal)!")
+            File2DocumentService.delete_by_file_id(file_id)
+
+        return get_json_result(data=True)
+    except Exception as e:
+        return server_error_response(e)
+
+
+@manager.route('/rename', methods=['POST'])
+@login_required
+@validate_request("file_id", "name")
+def rename():
+    req = request.json
+    try:
+        e, file = FileService.get_by_id(req["file_id"])
+        if not e:
+            return get_data_error_result(retmsg="File not found!")
+        if pathlib.Path(req["name"].lower()).suffix != pathlib.Path(
+                file.name.lower()).suffix:
+            return get_json_result(
+                data=False,
+                retmsg="The extension of file can't be changed",
+                retcode=RetCode.ARGUMENT_ERROR)
+        if FileService.query(name=req["name"], pf_id=file.parent_id):
+            return get_data_error_result(
+                retmsg="Duplicated file name in the same folder.")
+
+        if not FileService.update_by_id(
+                req["file_id"], {"name": req["name"]}):
+            return get_data_error_result(
+                retmsg="Database error (File rename)!")
+
+        informs = File2DocumentService.get_by_file_id(req["file_id"])
+        if informs:
+            if not DocumentService.update_by_id(
+                    informs[0].document_id, {"name": req["name"]}):
+                return get_data_error_result(
+                    retmsg="Database error (Document rename)!")
+
+        return get_json_result(data=True)
+    except Exception as e:
+        return server_error_response(e)
+
+
+@manager.route('/get/<file_id>', methods=['GET'])
+# @login_required
+def get(file_id):
+    try:
+        e, doc = FileService.get_by_id(file_id)
+        if not e:
+            return get_data_error_result(retmsg="Document not found!")
+
+        response = flask.make_response(MINIO.get(doc.parent_id, doc.location))
+        ext = re.search(r"\.([^.]+)$", doc.name)
+        if ext:
+            if doc.type == FileType.VISUAL.value:
+                response.headers.set('Content-Type', 'image/%s' % ext.group(1))
+            else:
+                response.headers.set(
+                    'Content-Type',
+                    'application/%s' %
+                    ext.group(1))
+        return response
+    except Exception as e:
+        return server_error_response(e)
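api/apps/file_app.py implements the file-management backend announced in the READMEs: multipart upload into a per-tenant folder tree, folder creation, listing, renaming, and deletion, with blobs stored in MinIO. A hedged sketch of the upload call; the URL prefix and session handling are assumptions, and a path-like filename such as `docs/guide.pdf` makes the endpoint create the intermediate folders:

```python
import requests

BASE = "http://ragflow-host/v1/file"  # assumed prefix for the blueprint
session = requests.Session()          # assumed authenticated session (@login_required)

with open("guide.pdf", "rb") as f:
    # parent_id is optional; without it the file lands in the user's root folder ("/").
    resp = session.post(f"{BASE}/upload",
                        data={"parent_id": "<folder-id>"},
                        files={"file": ("docs/guide.pdf", f)})
print(resp.json())
```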
@@ -111,7 +111,7 @@ def detail():
 @login_required
 def list():
     page_number = request.args.get("page", 1)
-    items_per_page = request.args.get("page_size", 15)
+    items_per_page = request.args.get("page_size", 150)
     orderby = request.args.get("orderby", "create_time")
     desc = request.args.get("desc", True)
     try:
@@ -28,7 +28,7 @@ from rag.llm import EmbeddingModel, ChatModel
 def factories():
     try:
         fac = LLMFactoriesService.get_all()
-        return get_json_result(data=[f.to_dict() for f in fac if f.name not in ["QAnything", "FastEmbed"]])
+        return get_json_result(data=[f.to_dict() for f in fac if f.name not in ["Youdao", "FastEmbed"]])
     except Exception as e:
         return server_error_response(e)
 
@@ -174,7 +174,7 @@ def list():
         llms = [m.to_dict()
                 for m in llms if m.status == StatusEnum.VALID.value]
         for m in llms:
-            m["available"] = m["fid"] in facts or m["llm_name"].lower() == "flag-embedding" or m["fid"] in ["QAnything","FastEmbed"]
+            m["available"] = m["fid"] in facts or m["llm_name"].lower() == "flag-embedding" or m["fid"] in ["Youdao","FastEmbed"]
 
         llm_set = set([m["llm_name"] for m in llms])
         for o in objs:
@ -14,6 +14,7 @@
|
|||||||
# limitations under the License.
|
# limitations under the License.
|
||||||
#
|
#
|
||||||
import re
|
import re
|
||||||
|
from datetime import datetime
|
||||||
|
|
||||||
from flask import request, session, redirect
|
from flask import request, session, redirect
|
||||||
from werkzeug.security import generate_password_hash, check_password_hash
|
from werkzeug.security import generate_password_hash, check_password_hash
|
||||||
@ -22,11 +23,12 @@ from flask_login import login_required, current_user, login_user, logout_user
|
|||||||
from api.db.db_models import TenantLLM
|
from api.db.db_models import TenantLLM
|
||||||
from api.db.services.llm_service import TenantLLMService, LLMService
|
from api.db.services.llm_service import TenantLLMService, LLMService
|
||||||
from api.utils.api_utils import server_error_response, validate_request
|
from api.utils.api_utils import server_error_response, validate_request
|
||||||
from api.utils import get_uuid, get_format_time, decrypt, download_img
|
from api.utils import get_uuid, get_format_time, decrypt, download_img, current_timestamp, datetime_format
|
||||||
from api.db import UserTenantRole, LLMType
|
from api.db import UserTenantRole, LLMType, FileType
|
||||||
from api.settings import RetCode, GITHUB_OAUTH, CHAT_MDL, EMBEDDING_MDL, ASR_MDL, IMAGE2TEXT_MDL, PARSERS, API_KEY, \
|
from api.settings import RetCode, GITHUB_OAUTH, CHAT_MDL, EMBEDDING_MDL, ASR_MDL, IMAGE2TEXT_MDL, PARSERS, API_KEY, \
|
||||||
LLM_FACTORY, LLM_BASE_URL
|
LLM_FACTORY, LLM_BASE_URL
|
||||||
from api.db.services.user_service import UserService, TenantService, UserTenantService
|
from api.db.services.user_service import UserService, TenantService, UserTenantService
|
||||||
|
from api.db.services.file_service import FileService
|
||||||
from api.settings import stat_logger
|
from api.settings import stat_logger
|
||||||
from api.utils.api_utils import get_json_result, cors_reponse
|
from api.utils.api_utils import get_json_result, cors_reponse
|
||||||
|
|
||||||
@ -56,6 +58,8 @@ def login():
|
|||||||
response_data = user.to_json()
|
response_data = user.to_json()
|
||||||
user.access_token = get_uuid()
|
user.access_token = get_uuid()
|
||||||
login_user(user)
|
login_user(user)
|
||||||
|
user.update_time = current_timestamp(),
|
||||||
|
user.update_date = datetime_format(datetime.now()),
|
||||||
user.save()
|
user.save()
|
||||||
msg = "Welcome back!"
|
msg = "Welcome back!"
|
||||||
return cors_reponse(data=response_data, auth=user.get_id(), retmsg=msg)
|
return cors_reponse(data=response_data, auth=user.get_id(), retmsg=msg)
|
||||||
@ -218,6 +222,17 @@ def user_register(user_id, user):
|
|||||||
"invited_by": user_id,
|
"invited_by": user_id,
|
||||||
"role": UserTenantRole.OWNER
|
"role": UserTenantRole.OWNER
|
||||||
}
|
}
|
||||||
|
file_id = get_uuid()
|
||||||
|
file = {
|
||||||
|
"id": file_id,
|
||||||
|
"parent_id": file_id,
|
||||||
|
"tenant_id": user_id,
|
||||||
|
"created_by": user_id,
|
||||||
|
"name": "/",
|
||||||
|
"type": FileType.FOLDER.value,
|
||||||
|
"size": 0,
|
||||||
|
"location": "",
|
||||||
|
}
|
||||||
tenant_llm = []
|
tenant_llm = []
|
||||||
for llm in LLMService.query(fid=LLM_FACTORY):
|
for llm in LLMService.query(fid=LLM_FACTORY):
|
||||||
tenant_llm.append({"tenant_id": user_id,
|
tenant_llm.append({"tenant_id": user_id,
|
||||||
@ -233,6 +248,7 @@ def user_register(user_id, user):
|
|||||||
TenantService.insert(**tenant)
|
TenantService.insert(**tenant)
|
||||||
UserTenantService.insert(**usr_tenant)
|
UserTenantService.insert(**usr_tenant)
|
||||||
TenantLLMService.insert_many(tenant_llm)
|
TenantLLMService.insert_many(tenant_llm)
|
||||||
|
FileService.insert(file)
|
||||||
return UserService.query(email=user["email"])
|
return UserService.query(email=user["email"])
|
||||||
|
|
||||||
|
|
||||||
|
|||||||
@ -45,6 +45,8 @@ class FileType(StrEnum):
|
|||||||
VISUAL = 'visual'
|
VISUAL = 'visual'
|
||||||
AURAL = 'aural'
|
AURAL = 'aural'
|
||||||
VIRTUAL = 'virtual'
|
VIRTUAL = 'virtual'
|
||||||
|
FOLDER = 'folder'
|
||||||
|
OTHER = "other"
|
||||||
|
|
||||||
|
|
||||||
class LLMType(StrEnum):
|
class LLMType(StrEnum):
|
||||||
|
|||||||
@ -629,7 +629,7 @@ class Document(DataBaseModel):
|
|||||||
max_length=128,
|
max_length=128,
|
||||||
null=False,
|
null=False,
|
||||||
default="local",
|
default="local",
|
||||||
help_text="where dose this document from")
|
help_text="where dose this document come from")
|
||||||
type = CharField(max_length=32, null=False, help_text="file extension")
|
type = CharField(max_length=32, null=False, help_text="file extension")
|
||||||
created_by = CharField(
|
created_by = CharField(
|
||||||
max_length=32,
|
max_length=32,
|
||||||
@ -669,6 +669,61 @@ class Document(DataBaseModel):
|
|||||||
db_table = "document"
|
db_table = "document"
|
||||||
|
|
||||||
|
|
||||||
|
class File(DataBaseModel):
|
||||||
|
id = CharField(
|
||||||
|
max_length=32,
|
||||||
|
primary_key=True,
|
||||||
|
)
|
||||||
|
parent_id = CharField(
|
||||||
|
max_length=32,
|
||||||
|
null=False,
|
||||||
|
help_text="parent folder id",
|
||||||
|
index=True)
|
||||||
|
tenant_id = CharField(
|
||||||
|
max_length=32,
|
||||||
|
null=False,
|
||||||
|
help_text="tenant id",
|
||||||
|
index=True)
|
||||||
|
created_by = CharField(
|
||||||
|
max_length=32,
|
||||||
|
null=False,
|
||||||
|
help_text="who created it")
|
||||||
|
name = CharField(
|
||||||
|
max_length=255,
|
||||||
|
null=False,
|
||||||
|
help_text="file name or folder name",
|
||||||
|
index=True)
|
||||||
|
location = CharField(
|
||||||
|
max_length=255,
|
||||||
|
null=True,
|
||||||
|
help_text="where dose it store")
|
||||||
|
size = IntegerField(default=0)
|
||||||
|
type = CharField(max_length=32, null=False, help_text="file extension")
|
||||||
|
|
||||||
|
class Meta:
|
||||||
|
db_table = "file"
|
||||||
|
|
||||||
|
|
||||||
|
class File2Document(DataBaseModel):
|
||||||
|
id = CharField(
|
||||||
|
max_length=32,
|
||||||
|
primary_key=True,
|
||||||
|
)
|
||||||
|
file_id = CharField(
|
||||||
|
max_length=32,
|
||||||
|
null=True,
|
||||||
|
help_text="file id",
|
||||||
|
index=True)
|
||||||
|
document_id = CharField(
|
||||||
|
max_length=32,
|
||||||
|
null=True,
|
||||||
|
help_text="document id",
|
||||||
|
index=True)
|
||||||
|
|
||||||
|
class Meta:
|
||||||
|
db_table = "file2document"
|
||||||
|
|
||||||
|
|
||||||
class Task(DataBaseModel):
|
class Task(DataBaseModel):
|
||||||
id = CharField(max_length=32, primary_key=True)
|
id = CharField(max_length=32, primary_key=True)
|
||||||
doc_id = CharField(max_length=32, null=False, index=True)
|
doc_id = CharField(max_length=32, null=False, index=True)
|
||||||
@ -697,7 +752,7 @@ class Dialog(DataBaseModel):
|
|||||||
null=True,
|
null=True,
|
||||||
default="Chinese",
|
default="Chinese",
|
||||||
help_text="English|Chinese")
|
help_text="English|Chinese")
|
||||||
llm_id = CharField(max_length=32, null=False, help_text="default llm ID")
|
llm_id = CharField(max_length=128, null=False, help_text="default llm ID")
|
||||||
llm_setting = JSONField(null=False, default={"temperature": 0.1, "top_p": 0.3, "frequency_penalty": 0.7,
|
llm_setting = JSONField(null=False, default={"temperature": 0.1, "top_p": 0.3, "frequency_penalty": 0.7,
|
||||||
"presence_penalty": 0.4, "max_tokens": 215})
|
"presence_penalty": 0.4, "max_tokens": 215})
|
||||||
prompt_type = CharField(
|
prompt_type = CharField(
|
||||||
|
|||||||
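The `File` and `File2Document` models introduced in this hunk give each tenant a folder tree (a folder is just a `File` row, and the root points at itself through `parent_id`) plus a link table that ties an uploaded file to the document parsed from it, which is what the rename endpoint above walks through `File2DocumentService.get_by_file_id`. A self-contained sketch of that relationship with stand-in peewee models over in-memory SQLite (field names mirror the diff, but these are illustrative classes, not the project's `DataBaseModel`):

```python
from peewee import SqliteDatabase, Model, CharField, IntegerField

db = SqliteDatabase(":memory:")

class Base(Model):
    class Meta:
        database = db

class File(Base):
    id = CharField(primary_key=True)
    parent_id = CharField(index=True)   # folder tree; the root has parent_id == id
    tenant_id = CharField(index=True)
    name = CharField(index=True)
    type = CharField()
    size = IntegerField(default=0)

class Document(Base):
    id = CharField(primary_key=True)
    name = CharField()

class File2Document(Base):
    id = CharField(primary_key=True)
    file_id = CharField(index=True)
    document_id = CharField(index=True)

db.create_tables([File, Document, File2Document])

root = File.create(id="root", parent_id="root", tenant_id="t1", name="/", type="folder")
pdf = File.create(id="f1", parent_id=root.id, tenant_id="t1", name="a.pdf", type="pdf", size=123)
doc = Document.create(id="d1", name="a.pdf")
File2Document.create(id="l1", file_id=pdf.id, document_id=doc.id)

# What the rename endpoint does conceptually: find the documents linked to a file
# so their names can be kept in sync with the file's new name.
linked = (Document.select()
          .join(File2Document, on=(File2Document.document_id == Document.id))
          .where(File2Document.file_id == pdf.id))
print([d.name for d in linked])   # ['a.pdf']
```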
@ -120,7 +120,7 @@ factory_infos = [{
|
|||||||
"tags": "LLM,TEXT EMBEDDING,SPEECH2TEXT,MODERATION",
|
"tags": "LLM,TEXT EMBEDDING,SPEECH2TEXT,MODERATION",
|
||||||
"status": "1",
|
"status": "1",
|
||||||
},{
|
},{
|
||||||
"name": "QAnything",
|
"name": "Youdao",
|
||||||
"logo": "",
|
"logo": "",
|
||||||
"tags": "LLM,TEXT EMBEDDING,SPEECH2TEXT,MODERATION",
|
"tags": "LLM,TEXT EMBEDDING,SPEECH2TEXT,MODERATION",
|
||||||
"status": "1",
|
"status": "1",
|
||||||
@ -323,7 +323,7 @@ def init_llm_factory():
|
|||||||
"max_tokens": 2147483648,
|
"max_tokens": 2147483648,
|
||||||
"model_type": LLMType.EMBEDDING.value
|
"model_type": LLMType.EMBEDDING.value
|
||||||
},
|
},
|
||||||
# ------------------------ QAnything -----------------------
|
# ------------------------ Youdao -----------------------
|
||||||
{
|
{
|
||||||
"fid": factory_infos[7]["name"],
|
"fid": factory_infos[7]["name"],
|
||||||
"llm_name": "maidalun1020/bce-embedding-base_v1",
|
"llm_name": "maidalun1020/bce-embedding-base_v1",
|
||||||
@ -347,7 +347,9 @@ def init_llm_factory():
|
|||||||
LLMService.filter_delete([LLM.fid == "Local"])
|
LLMService.filter_delete([LLM.fid == "Local"])
|
||||||
LLMService.filter_delete([LLM.fid == "Moonshot", LLM.llm_name == "flag-embedding"])
|
LLMService.filter_delete([LLM.fid == "Moonshot", LLM.llm_name == "flag-embedding"])
|
||||||
TenantLLMService.filter_delete([TenantLLM.llm_factory == "Moonshot", TenantLLM.llm_name == "flag-embedding"])
|
TenantLLMService.filter_delete([TenantLLM.llm_factory == "Moonshot", TenantLLM.llm_name == "flag-embedding"])
|
||||||
|
LLMFactoriesService.filter_delete([LLMFactoriesService.model.name == "QAnything"])
|
||||||
|
LLMService.filter_delete([LLMService.model.fid == "QAnything"])
|
||||||
|
TenantLLMService.filter_update([TenantLLMService.model.llm_factory == "QAnything"], {"llm_factory": "Youdao"})
|
||||||
"""
|
"""
|
||||||
drop table llm;
|
drop table llm;
|
||||||
drop table llm_factories;
|
drop table llm_factories;
|
||||||
|
|||||||
@ -40,8 +40,8 @@ class API4ConversationService(CommonService):
|
|||||||
@classmethod
|
@classmethod
|
||||||
@DB.connection_context()
|
@DB.connection_context()
|
||||||
def append_message(cls, id, conversation):
|
def append_message(cls, id, conversation):
|
||||||
cls.model.update_by_id(id, conversation)
|
cls.update_by_id(id, conversation)
|
||||||
return cls.model.update(round=cls.model.round + 1).where(id=id).execute()
|
return cls.model.update(round=cls.model.round + 1).where(cls.model.id==id).execute()
|
||||||
|
|
||||||
@classmethod
|
@classmethod
|
||||||
@DB.connection_context()
|
@DB.connection_context()
|
||||||
|
|||||||
@ -15,6 +15,11 @@
|
|||||||
#
|
#
|
||||||
from peewee import Expression
|
from peewee import Expression
|
||||||
|
|
||||||
|
from elasticsearch_dsl import Q
|
||||||
|
from rag.utils import ELASTICSEARCH
|
||||||
|
from rag.utils.minio_conn import MINIO
|
||||||
|
from rag.nlp import search
|
||||||
|
|
||||||
from api.db import FileType, TaskStatus
|
from api.db import FileType, TaskStatus
|
||||||
from api.db.db_models import DB, Knowledgebase, Tenant
|
from api.db.db_models import DB, Knowledgebase, Tenant
|
||||||
from api.db.db_models import Document
|
from api.db.db_models import Document
|
||||||
@ -69,6 +74,20 @@ class DocumentService(CommonService):
|
|||||||
raise RuntimeError("Database error (Knowledgebase)!")
|
raise RuntimeError("Database error (Knowledgebase)!")
|
||||||
return cls.delete_by_id(doc.id)
|
return cls.delete_by_id(doc.id)
|
||||||
|
|
||||||
|
@classmethod
|
||||||
|
@DB.connection_context()
|
||||||
|
def remove_document(cls, doc, tenant_id):
|
||||||
|
ELASTICSEARCH.deleteByQuery(
|
||||||
|
Q("match", doc_id=doc.id), idxnm=search.index_name(tenant_id))
|
||||||
|
|
||||||
|
cls.increment_chunk_num(
|
||||||
|
doc.id, doc.kb_id, doc.token_num * -1, doc.chunk_num * -1, 0)
|
||||||
|
if not cls.delete(doc):
|
||||||
|
raise RuntimeError("Database error (Document removal)!")
|
||||||
|
|
||||||
|
MINIO.rm(doc.kb_id, doc.location)
|
||||||
|
return cls.delete_by_id(doc.id)
|
||||||
|
|
||||||
@classmethod
|
@classmethod
|
||||||
@DB.connection_context()
|
@DB.connection_context()
|
||||||
def get_newly_uploaded(cls, tm, mod=0, comm=1, items_per_page=64):
|
def get_newly_uploaded(cls, tm, mod=0, comm=1, items_per_page=64):
|
||||||
|
|||||||
66
api/db/services/file2document_service.py
Normal file
@ -0,0 +1,66 @@
|
|||||||
|
#
|
||||||
|
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
#
|
||||||
|
from datetime import datetime
|
||||||
|
|
||||||
|
from api.db.db_models import DB
|
||||||
|
from api.db.db_models import File, Document, File2Document
|
||||||
|
from api.db.services.common_service import CommonService
|
||||||
|
from api.utils import current_timestamp, datetime_format
|
||||||
|
|
||||||
|
|
||||||
|
class File2DocumentService(CommonService):
|
||||||
|
model = File2Document
|
||||||
|
|
||||||
|
@classmethod
|
||||||
|
@DB.connection_context()
|
||||||
|
def get_by_file_id(cls, file_id):
|
||||||
|
objs = cls.model.select().where(cls.model.file_id == file_id)
|
||||||
|
return objs
|
||||||
|
|
||||||
|
@classmethod
|
||||||
|
@DB.connection_context()
|
||||||
|
def get_by_document_id(cls, document_id):
|
||||||
|
objs = cls.model.select().where(cls.model.document_id == document_id)
|
||||||
|
return objs
|
||||||
|
|
||||||
|
@classmethod
|
||||||
|
@DB.connection_context()
|
||||||
|
def insert(cls, obj):
|
||||||
|
if not cls.save(**obj):
|
||||||
|
raise RuntimeError("Database error (File)!")
|
||||||
|
e, obj = cls.get_by_id(obj["id"])
|
||||||
|
if not e:
|
||||||
|
raise RuntimeError("Database error (File retrieval)!")
|
||||||
|
return obj
|
||||||
|
|
||||||
|
@classmethod
|
||||||
|
@DB.connection_context()
|
||||||
|
def delete_by_file_id(cls, file_id):
|
||||||
|
return cls.model.delete().where(cls.model.file_id == file_id).execute()
|
||||||
|
|
||||||
|
@classmethod
|
||||||
|
@DB.connection_context()
|
||||||
|
def delete_by_document_id(cls, doc_id):
|
||||||
|
return cls.model.delete().where(cls.model.document_id == doc_id).execute()
|
||||||
|
|
||||||
|
@classmethod
|
||||||
|
@DB.connection_context()
|
||||||
|
def update_by_file_id(cls, file_id, obj):
|
||||||
|
obj["update_time"] = current_timestamp()
|
||||||
|
obj["update_date"] = datetime_format(datetime.now())
|
||||||
|
num = cls.model.update(obj).where(cls.model.id == file_id).execute()
|
||||||
|
e, obj = cls.get_by_id(cls.model.id)
|
||||||
|
return obj
|
||||||
243
api/db/services/file_service.py
Normal file
@ -0,0 +1,243 @@
|
|||||||
|
#
|
||||||
|
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
#
|
||||||
|
from flask_login import current_user
|
||||||
|
from peewee import fn
|
||||||
|
|
||||||
|
from api.db import FileType
|
||||||
|
from api.db.db_models import DB, File2Document, Knowledgebase
|
||||||
|
from api.db.db_models import File, Document
|
||||||
|
from api.db.services.common_service import CommonService
|
||||||
|
from api.utils import get_uuid
|
||||||
|
from rag.utils import MINIO
|
||||||
|
|
||||||
|
|
||||||
|
class FileService(CommonService):
|
||||||
|
model = File
|
||||||
|
|
||||||
|
@classmethod
|
||||||
|
@DB.connection_context()
|
||||||
|
def get_by_pf_id(cls, tenant_id, pf_id, page_number, items_per_page,
|
||||||
|
orderby, desc, keywords):
|
||||||
|
if keywords:
|
||||||
|
files = cls.model.select().where(
|
||||||
|
(cls.model.tenant_id == tenant_id)
|
||||||
|
& (cls.model.parent_id == pf_id), (fn.LOWER(cls.model.name).like(f"%%{keywords.lower()}%%")))
|
||||||
|
else:
|
||||||
|
files = cls.model.select().where((cls.model.tenant_id == tenant_id)
|
||||||
|
& (cls.model.parent_id == pf_id))
|
||||||
|
count = files.count()
|
||||||
|
if desc:
|
||||||
|
files = files.order_by(cls.model.getter_by(orderby).desc())
|
||||||
|
else:
|
||||||
|
files = files.order_by(cls.model.getter_by(orderby).asc())
|
||||||
|
|
||||||
|
files = files.paginate(page_number, items_per_page)
|
||||||
|
|
||||||
|
res_files = list(files.dicts())
|
||||||
|
for file in res_files:
|
||||||
|
if file["type"] == FileType.FOLDER.value:
|
||||||
|
file["size"] = cls.get_folder_size(file["id"])
|
||||||
|
file['kbs_info'] = []
|
||||||
|
continue
|
||||||
|
kbs_info = cls.get_kb_id_by_file_id(file['id'])
|
||||||
|
file['kbs_info'] = kbs_info
|
||||||
|
|
||||||
|
return res_files, count
|
||||||
|
|
||||||
|
@classmethod
|
||||||
|
@DB.connection_context()
|
||||||
|
def get_kb_id_by_file_id(cls, file_id):
|
||||||
|
kbs = (cls.model.select(*[Knowledgebase.id, Knowledgebase.name])
|
||||||
|
.join(File2Document, on=(File2Document.file_id == file_id))
|
||||||
|
.join(Document, on=(File2Document.document_id == Document.id))
|
||||||
|
.join(Knowledgebase, on=(Knowledgebase.id == Document.kb_id))
|
||||||
|
.where(cls.model.id == file_id))
|
||||||
|
if not kbs: return []
|
||||||
|
kbs_info_list = []
|
||||||
|
for kb in list(kbs.dicts()):
|
||||||
|
kbs_info_list.append({"kb_id": kb['id'], "kb_name": kb['name']})
|
||||||
|
return kbs_info_list
|
||||||
|
|
||||||
|
@classmethod
|
||||||
|
@DB.connection_context()
|
||||||
|
def get_by_pf_id_name(cls, id, name):
|
||||||
|
file = cls.model.select().where((cls.model.parent_id == id) & (cls.model.name == name))
|
||||||
|
if file.count():
|
||||||
|
e, file = cls.get_by_id(file[0].id)
|
||||||
|
if not e:
|
||||||
|
raise RuntimeError("Database error (File retrieval)!")
|
||||||
|
return file
|
||||||
|
return None
|
||||||
|
|
||||||
|
@classmethod
|
||||||
|
@DB.connection_context()
|
||||||
|
def get_id_list_by_id(cls, id, name, count, res):
|
||||||
|
if count < len(name):
|
||||||
|
file = cls.get_by_pf_id_name(id, name[count])
|
||||||
|
if file:
|
||||||
|
res.append(file.id)
|
||||||
|
return cls.get_id_list_by_id(file.id, name, count + 1, res)
|
||||||
|
else:
|
||||||
|
return res
|
||||||
|
else:
|
||||||
|
return res
|
||||||
|
|
||||||
|
@classmethod
|
||||||
|
@DB.connection_context()
|
||||||
|
def get_all_innermost_file_ids(cls, folder_id, result_ids):
|
||||||
|
subfolders = cls.model.select().where(cls.model.parent_id == folder_id)
|
||||||
|
if subfolders.exists():
|
||||||
|
for subfolder in subfolders:
|
||||||
|
cls.get_all_innermost_file_ids(subfolder.id, result_ids)
|
||||||
|
else:
|
||||||
|
result_ids.append(folder_id)
|
||||||
|
return result_ids
|
||||||
|
|
||||||
|
@classmethod
|
||||||
|
@DB.connection_context()
|
||||||
|
def create_folder(cls, file, parent_id, name, count):
|
||||||
|
if count > len(name) - 2:
|
||||||
|
return file
|
||||||
|
else:
|
||||||
|
file = cls.insert({
|
||||||
|
"id": get_uuid(),
|
||||||
|
"parent_id": parent_id,
|
||||||
|
"tenant_id": current_user.id,
|
||||||
|
"created_by": current_user.id,
|
||||||
|
"name": name[count],
|
||||||
|
"location": "",
|
||||||
|
"size": 0,
|
||||||
|
"type": FileType.FOLDER.value
|
||||||
|
})
|
||||||
|
return cls.create_folder(file, file.id, name, count + 1)
|
||||||
|
|
||||||
|
@classmethod
|
||||||
|
@DB.connection_context()
|
||||||
|
def is_parent_folder_exist(cls, parent_id):
|
||||||
|
parent_files = cls.model.select().where(cls.model.id == parent_id)
|
||||||
|
if parent_files.count():
|
||||||
|
return True
|
||||||
|
cls.delete_folder_by_pf_id(parent_id)
|
||||||
|
return False
|
||||||
|
|
||||||
|
@classmethod
|
||||||
|
@DB.connection_context()
|
||||||
|
def get_root_folder(cls, tenant_id):
|
||||||
|
file = cls.model.select().where(cls.model.tenant_id == tenant_id and
|
||||||
|
cls.model.parent_id == cls.model.id)
|
||||||
|
if not file:
|
||||||
|
file_id = get_uuid()
|
||||||
|
file = {
|
||||||
|
"id": file_id,
|
||||||
|
"parent_id": file_id,
|
||||||
|
"tenant_id": tenant_id,
|
||||||
|
"created_by": tenant_id,
|
||||||
|
"name": "/",
|
||||||
|
"type": FileType.FOLDER.value,
|
||||||
|
"size": 0,
|
||||||
|
"location": "",
|
||||||
|
}
|
||||||
|
cls.save(**file)
|
||||||
|
else:
|
||||||
|
file_id = file[0].id
|
||||||
|
|
||||||
|
e, file = cls.get_by_id(file_id)
|
||||||
|
if not e:
|
||||||
|
raise RuntimeError("Database error (File retrieval)!")
|
||||||
|
return file
|
||||||
|
|
||||||
|
@classmethod
|
||||||
|
@DB.connection_context()
|
||||||
|
def get_parent_folder(cls, file_id):
|
||||||
|
file = cls.model.select().where(cls.model.id == file_id)
|
||||||
|
if file.count():
|
||||||
|
e, file = cls.get_by_id(file[0].parent_id)
|
||||||
|
if not e:
|
||||||
|
raise RuntimeError("Database error (File retrieval)!")
|
||||||
|
else:
|
||||||
|
raise RuntimeError("Database error (File doesn't exist)!")
|
||||||
|
return file
|
||||||
|
|
||||||
|
@classmethod
|
||||||
|
@DB.connection_context()
|
||||||
|
def get_all_parent_folders(cls, start_id):
|
||||||
|
parent_folders = []
|
||||||
|
current_id = start_id
|
||||||
|
while current_id:
|
||||||
|
e, file = cls.get_by_id(current_id)
|
||||||
|
if file.parent_id != file.id and e:
|
||||||
|
parent_folders.append(file)
|
||||||
|
current_id = file.parent_id
|
||||||
|
else:
|
||||||
|
parent_folders.append(file)
|
||||||
|
break
|
||||||
|
return parent_folders
|
||||||
|
|
||||||
|
@classmethod
|
||||||
|
@DB.connection_context()
|
||||||
|
def insert(cls, file):
|
||||||
|
if not cls.save(**file):
|
||||||
|
raise RuntimeError("Database error (File)!")
|
||||||
|
e, file = cls.get_by_id(file["id"])
|
||||||
|
if not e:
|
||||||
|
raise RuntimeError("Database error (File retrieval)!")
|
||||||
|
return file
|
||||||
|
|
||||||
|
@classmethod
|
||||||
|
@DB.connection_context()
|
||||||
|
def delete(cls, file):
|
||||||
|
return cls.delete_by_id(file.id)
|
||||||
|
|
||||||
|
@classmethod
|
||||||
|
@DB.connection_context()
|
||||||
|
def delete_by_pf_id(cls, folder_id):
|
||||||
|
return cls.model.delete().where(cls.model.parent_id == folder_id).execute()
|
||||||
|
|
||||||
|
@classmethod
|
||||||
|
@DB.connection_context()
|
||||||
|
def delete_folder_by_pf_id(cls, user_id, folder_id):
|
||||||
|
try:
|
||||||
|
files = cls.model.select().where((cls.model.tenant_id == user_id)
|
||||||
|
& (cls.model.parent_id == folder_id))
|
||||||
|
for file in files:
|
||||||
|
cls.delete_folder_by_pf_id(user_id, file.id)
|
||||||
|
return cls.model.delete().where((cls.model.tenant_id == user_id)
|
||||||
|
& (cls.model.id == folder_id)).execute(),
|
||||||
|
except Exception as e:
|
||||||
|
print(e)
|
||||||
|
raise RuntimeError("Database error (File retrieval)!")
|
||||||
|
|
||||||
|
@classmethod
|
||||||
|
@DB.connection_context()
|
||||||
|
def get_file_count(cls, tenant_id):
|
||||||
|
files = cls.model.select(cls.model.id).where(cls.model.tenant_id == tenant_id)
|
||||||
|
return len(files)
|
||||||
|
|
||||||
|
@classmethod
|
||||||
|
@DB.connection_context()
|
||||||
|
def get_folder_size(cls, folder_id):
|
||||||
|
size = 0
|
||||||
|
|
||||||
|
def dfs(parent_id):
|
||||||
|
nonlocal size
|
||||||
|
for f in cls.model.select(*[cls.model.id, cls.model.size, cls.model.type]).where(
|
||||||
|
cls.model.parent_id == parent_id, cls.model.id != parent_id):
|
||||||
|
size += f.size
|
||||||
|
if f.type == FileType.FOLDER.value:
|
||||||
|
dfs(f.id)
|
||||||
|
|
||||||
|
dfs(folder_id)
|
||||||
|
return size
|
||||||
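`FileService.get_folder_size` above totals a folder with a recursive depth-first walk over `parent_id` links, adding up the `size` column of everything underneath. The same traversal in stand-alone form, written iteratively so an unusually deep folder chain cannot hit Python's recursion limit (an illustration of the algorithm, not the service code):

```python
def folder_size(folder_id, children, sizes):
    """Sum the sizes of everything under a folder.

    children: {node_id: [child_id, ...]} adjacency built from parent_id links,
    sizes:    {node_id: size_in_bytes}.
    """
    total, stack = 0, [folder_id]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            total += sizes.get(child, 0)
            stack.append(child)          # descend into sub-folders and files alike
    return total

children = {"root": ["docs", "a.pdf"], "docs": ["b.pdf"]}
sizes = {"docs": 0, "a.pdf": 100, "b.pdf": 250}
print(folder_size("root", children, sizes))   # 350
```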
@ -27,7 +27,8 @@ class KnowledgebaseService(CommonService):
|
|||||||
page_number, items_per_page, orderby, desc):
|
page_number, items_per_page, orderby, desc):
|
||||||
kbs = cls.model.select().where(
|
kbs = cls.model.select().where(
|
||||||
((cls.model.tenant_id.in_(joined_tenant_ids) & (cls.model.permission ==
|
((cls.model.tenant_id.in_(joined_tenant_ids) & (cls.model.permission ==
|
||||||
TenantPermission.TEAM.value)) | (cls.model.tenant_id == user_id))
|
TenantPermission.TEAM.value)) | (
|
||||||
|
cls.model.tenant_id == user_id))
|
||||||
& (cls.model.status == StatusEnum.VALID.value)
|
& (cls.model.status == StatusEnum.VALID.value)
|
||||||
)
|
)
|
||||||
if desc:
|
if desc:
|
||||||
@ -56,7 +57,8 @@ class KnowledgebaseService(CommonService):
|
|||||||
cls.model.chunk_num,
|
cls.model.chunk_num,
|
||||||
cls.model.parser_id,
|
cls.model.parser_id,
|
||||||
cls.model.parser_config]
|
cls.model.parser_config]
|
||||||
kbs = cls.model.select(*fields).join(Tenant, on=((Tenant.id == cls.model.tenant_id) & (Tenant.status == StatusEnum.VALID.value))).where(
|
kbs = cls.model.select(*fields).join(Tenant, on=(
|
||||||
|
(Tenant.id == cls.model.tenant_id) & (Tenant.status == StatusEnum.VALID.value))).where(
|
||||||
(cls.model.id == kb_id),
|
(cls.model.id == kb_id),
|
||||||
(cls.model.status == StatusEnum.VALID.value)
|
(cls.model.status == StatusEnum.VALID.value)
|
||||||
)
|
)
|
||||||
@ -86,6 +88,7 @@ class KnowledgebaseService(CommonService):
|
|||||||
old[k] = list(set(old[k] + v))
|
old[k] = list(set(old[k] + v))
|
||||||
else:
|
else:
|
||||||
old[k] = v
|
old[k] = v
|
||||||
|
|
||||||
dfs_update(m.parser_config, config)
|
dfs_update(m.parser_config, config)
|
||||||
cls.update_by_id(id, {"parser_config": m.parser_config})
|
cls.update_by_id(id, {"parser_config": m.parser_config})
|
||||||
|
|
||||||
@ -97,3 +100,15 @@ class KnowledgebaseService(CommonService):
|
|||||||
if k.parser_config and "field_map" in k.parser_config:
|
if k.parser_config and "field_map" in k.parser_config:
|
||||||
conf.update(k.parser_config["field_map"])
|
conf.update(k.parser_config["field_map"])
|
||||||
return conf
|
return conf
|
||||||
|
|
||||||
|
@classmethod
|
||||||
|
@DB.connection_context()
|
||||||
|
def get_by_name(cls, kb_name, tenant_id):
|
||||||
|
kb = cls.model.select().where(
|
||||||
|
(cls.model.name == kb_name)
|
||||||
|
& (cls.model.tenant_id == tenant_id)
|
||||||
|
& (cls.model.status == StatusEnum.VALID.value)
|
||||||
|
)
|
||||||
|
if kb:
|
||||||
|
return True, kb[0]
|
||||||
|
return False, None
|
||||||
|
|||||||
@ -81,7 +81,7 @@ class TenantLLMService(CommonService):
|
|||||||
if not model_config:
|
if not model_config:
|
||||||
if llm_type == LLMType.EMBEDDING.value:
|
if llm_type == LLMType.EMBEDDING.value:
|
||||||
llm = LLMService.query(llm_name=llm_name)
|
llm = LLMService.query(llm_name=llm_name)
|
||||||
if llm and llm[0].fid in ["QAnything", "FastEmbed"]:
|
if llm and llm[0].fid in ["Youdao", "FastEmbed"]:
|
||||||
model_config = {"llm_factory": llm[0].fid, "api_key":"", "llm_name": llm_name, "api_base": ""}
|
model_config = {"llm_factory": llm[0].fid, "api_key":"", "llm_name": llm_name, "api_base": ""}
|
||||||
if not model_config:
|
if not model_config:
|
||||||
if llm_name == "flag-embedding":
|
if llm_name == "flag-embedding":
|
||||||
|
|||||||
@ -13,12 +13,15 @@
|
|||||||
# See the License for the specific language governing permissions and
|
# See the License for the specific language governing permissions and
|
||||||
# limitations under the License.
|
# limitations under the License.
|
||||||
#
|
#
|
||||||
|
import random
|
||||||
|
|
||||||
from peewee import Expression
|
from peewee import Expression
|
||||||
from api.db.db_models import DB
|
from api.db.db_models import DB
|
||||||
from api.db import StatusEnum, FileType, TaskStatus
|
from api.db import StatusEnum, FileType, TaskStatus
|
||||||
from api.db.db_models import Task, Document, Knowledgebase, Tenant
|
from api.db.db_models import Task, Document, Knowledgebase, Tenant
|
||||||
from api.db.services.common_service import CommonService
|
from api.db.services.common_service import CommonService
|
||||||
from api.db.services.document_service import DocumentService
|
from api.db.services.document_service import DocumentService
|
||||||
|
from api.utils import current_timestamp
|
||||||
|
|
||||||
|
|
||||||
class TaskService(CommonService):
|
class TaskService(CommonService):
|
||||||
@ -26,7 +29,7 @@ class TaskService(CommonService):
|
|||||||
|
|
||||||
@classmethod
|
@classmethod
|
||||||
@DB.connection_context()
|
@DB.connection_context()
|
||||||
def get_tasks(cls, tm, mod=0, comm=1, items_per_page=64):
|
def get_tasks(cls, tm, mod=0, comm=1, items_per_page=1, takeit=True):
|
||||||
fields = [
|
fields = [
|
||||||
cls.model.id,
|
cls.model.id,
|
||||||
cls.model.doc_id,
|
cls.model.doc_id,
|
||||||
@ -45,20 +48,47 @@ class TaskService(CommonService):
|
|||||||
Tenant.img2txt_id,
|
Tenant.img2txt_id,
|
||||||
Tenant.asr_id,
|
Tenant.asr_id,
|
||||||
cls.model.update_time]
|
cls.model.update_time]
|
||||||
docs = cls.model.select(*fields) \
|
with DB.lock("get_task", -1):
|
||||||
.join(Document, on=(cls.model.doc_id == Document.id)) \
|
docs = cls.model.select(*fields) \
|
||||||
.join(Knowledgebase, on=(Document.kb_id == Knowledgebase.id)) \
|
.join(Document, on=(cls.model.doc_id == Document.id)) \
|
||||||
.join(Tenant, on=(Knowledgebase.tenant_id == Tenant.id))\
|
.join(Knowledgebase, on=(Document.kb_id == Knowledgebase.id)) \
|
||||||
.where(
|
.join(Tenant, on=(Knowledgebase.tenant_id == Tenant.id))\
|
||||||
Document.status == StatusEnum.VALID.value,
|
.where(
|
||||||
Document.run == TaskStatus.RUNNING.value,
|
Document.status == StatusEnum.VALID.value,
|
||||||
~(Document.type == FileType.VIRTUAL.value),
|
Document.run == TaskStatus.RUNNING.value,
|
||||||
cls.model.progress == 0,
|
~(Document.type == FileType.VIRTUAL.value),
|
||||||
cls.model.update_time >= tm,
|
cls.model.progress == 0,
|
||||||
(Expression(cls.model.create_time, "%%", comm) == mod))\
|
#cls.model.update_time >= tm,
|
||||||
.order_by(cls.model.update_time.asc())\
|
#(Expression(cls.model.create_time, "%%", comm) == mod)
|
||||||
.paginate(1, items_per_page)
|
)\
|
||||||
return list(docs.dicts())
|
.order_by(cls.model.update_time.asc())\
|
||||||
|
.paginate(0, items_per_page)
|
||||||
|
docs = list(docs.dicts())
|
||||||
|
if not docs: return []
|
||||||
|
if not takeit: return docs
|
||||||
|
|
||||||
|
cls.model.update(progress_msg=cls.model.progress_msg + "\n" + "Task has been received.", progress=random.random()/10.).where(
|
||||||
|
cls.model.id == docs[0]["id"]).execute()
|
||||||
|
return docs
|
||||||
|
|
||||||
|
@classmethod
|
||||||
|
@DB.connection_context()
|
||||||
|
def get_ongoing_doc_name(cls):
|
||||||
|
with DB.lock("get_task", -1):
|
||||||
|
docs = cls.model.select(*[Document.kb_id, Document.location]) \
|
||||||
|
.join(Document, on=(cls.model.doc_id == Document.id)) \
|
||||||
|
.where(
|
||||||
|
Document.status == StatusEnum.VALID.value,
|
||||||
|
Document.run == TaskStatus.RUNNING.value,
|
||||||
|
~(Document.type == FileType.VIRTUAL.value),
|
||||||
|
cls.model.progress >= 0,
|
||||||
|
cls.model.progress < 1,
|
||||||
|
cls.model.create_time >= current_timestamp() - 180000
|
||||||
|
)
|
||||||
|
docs = list(docs.dicts())
|
||||||
|
if not docs: return []
|
||||||
|
|
||||||
|
return list(set([(d["kb_id"], d["location"]) for d in docs]))
|
||||||
|
|
||||||
@classmethod
|
@classmethod
|
||||||
@DB.connection_context()
|
@DB.connection_context()
|
||||||
@ -74,9 +104,10 @@ class TaskService(CommonService):
|
|||||||
@classmethod
|
@classmethod
|
||||||
@DB.connection_context()
|
@DB.connection_context()
|
||||||
def update_progress(cls, id, info):
|
def update_progress(cls, id, info):
|
||||||
if info["progress_msg"]:
|
with DB.lock("update_progress", -1):
|
||||||
cls.model.update(progress_msg=cls.model.progress_msg + "\n" + info["progress_msg"]).where(
|
if info["progress_msg"]:
|
||||||
cls.model.id == id).execute()
|
cls.model.update(progress_msg=cls.model.progress_msg + "\n" + info["progress_msg"]).where(
|
||||||
if "progress" in info:
|
cls.model.id == id).execute()
|
||||||
cls.model.update(progress=info["progress"]).where(
|
if "progress" in info:
|
||||||
cls.model.id == id).execute()
|
cls.model.update(progress=info["progress"]).where(
|
||||||
|
cls.model.id == id).execute()
|
||||||
|
|||||||
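The `get_tasks` change above wraps both the select and the "Task has been received." progress update in `DB.lock("get_task", -1)`, so two task executors polling at the same time cannot claim the same pending task. The snippet below sketches the same claim guarantee as a compare-and-set update over sqlite3; it is a self-contained illustration of the pattern, and the exact semantics of the project's `DB.lock` are assumed rather than shown here:

```python
import random
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE task (id TEXT PRIMARY KEY, progress REAL DEFAULT 0)")
con.executemany("INSERT INTO task (id) VALUES (?)", [("t1",), ("t2",)])

def claim_one(con):
    """Pick one pending task and mark it as received, race-safely."""
    row = con.execute(
        "SELECT id FROM task WHERE progress = 0 ORDER BY id LIMIT 1").fetchone()
    if row is None:
        return None
    # Compare-and-set: the UPDATE only wins if no other worker claimed it first.
    claimed = con.execute(
        "UPDATE task SET progress = ? WHERE id = ? AND progress = 0",
        (random.random() / 10.0, row[0])).rowcount == 1
    con.commit()
    return row[0] if claimed else None

print(claim_one(con), claim_one(con), claim_one(con))   # t1 t2 None
```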
@ -147,7 +147,7 @@ def filename_type(filename):
|
|||||||
return FileType.PDF.value
|
return FileType.PDF.value
|
||||||
|
|
||||||
if re.match(
|
if re.match(
|
||||||
r".*\.(docx|doc|ppt|pptx|yml|xml|htm|json|csv|txt|ini|xls|xlsx|wps|rtf|hlp|pages|numbers|key|md)$", filename):
|
r".*\.(doc|docx|ppt|pptx|yml|xml|htm|json|csv|txt|ini|xls|xlsx|wps|rtf|hlp|pages|numbers|key|md)$", filename):
|
||||||
return FileType.DOC.value
|
return FileType.DOC.value
|
||||||
|
|
||||||
if re.match(
|
if re.match(
|
||||||
@ -155,7 +155,9 @@ def filename_type(filename):
|
|||||||
return FileType.AURAL.value
|
return FileType.AURAL.value
|
||||||
|
|
||||||
if re.match(r".*\.(jpg|jpeg|png|tif|gif|pcx|tga|exif|fpx|svg|psd|cdr|pcd|dxf|ufo|eps|ai|raw|WMF|webp|avif|apng|icon|ico|mpg|mpeg|avi|rm|rmvb|mov|wmv|asf|dat|asx|wvx|mpe|mpa|mp4)$", filename):
|
if re.match(r".*\.(jpg|jpeg|png|tif|gif|pcx|tga|exif|fpx|svg|psd|cdr|pcd|dxf|ufo|eps|ai|raw|WMF|webp|avif|apng|icon|ico|mpg|mpeg|avi|rm|rmvb|mov|wmv|asf|dat|asx|wvx|mpe|mpa|mp4)$", filename):
|
||||||
return FileType.VISUAL
|
return FileType.VISUAL.value
|
||||||
|
|
||||||
|
return FileType.OTHER.value
|
||||||
|
|
||||||
|
|
||||||
def thumbnail(filename, blob):
|
def thumbnail(filename, blob):
|
||||||
|
|||||||
@ -1,7 +1,7 @@
|
|||||||
{
|
{
|
||||||
"settings": {
|
"settings": {
|
||||||
"index": {
|
"index": {
|
||||||
"number_of_shards": 4,
|
"number_of_shards": 2,
|
||||||
"number_of_replicas": 0,
|
"number_of_replicas": 0,
|
||||||
"refresh_interval" : "1000ms"
|
"refresh_interval" : "1000ms"
|
||||||
},
|
},
|
||||||
|
|||||||
@ -13,11 +13,16 @@ minio:
|
|||||||
user: 'rag_flow'
|
user: 'rag_flow'
|
||||||
password: 'infini_rag_flow'
|
password: 'infini_rag_flow'
|
||||||
host: 'minio:9000'
|
host: 'minio:9000'
|
||||||
|
redis:
|
||||||
|
db: 1
|
||||||
|
password: 'infini_rag_flow'
|
||||||
|
host: 'redis:6379'
|
||||||
es:
|
es:
|
||||||
hosts: 'http://es01:9200'
|
hosts: 'http://es01:9200'
|
||||||
user_default_llm:
|
user_default_llm:
|
||||||
factory: 'Tongyi-Qianwen'
|
factory: 'Tongyi-Qianwen'
|
||||||
api_key: 'sk-xxxxxxxxxxxxx'
|
api_key: 'sk-xxxxxxxxxxxxx'
|
||||||
|
base_url: ''
|
||||||
oauth:
|
oauth:
|
||||||
github:
|
github:
|
||||||
client_id: xxxxxxxxxxxxxxxxxxxxxxxxx
|
client_id: xxxxxxxxxxxxxxxxxxxxxxxxx
|
||||||
|
|||||||
@ -1 +1,116 @@
|
|||||||
[English](./README.md) | 简体中文
|
[English](./README.md) | 简体中文
|
||||||
|
|
||||||
|
# *Deep*Doc
|
||||||
|
|
||||||
|
- [*Deep*Doc](#deepdoc)
|
||||||
|
- [1. 介绍](#1-介绍)
|
||||||
|
- [2. 视觉处理](#2-视觉处理)
|
||||||
|
- [3. 解析器](#3-解析器)
|
||||||
|
- [简历](#简历)
|
||||||
|
|
||||||
|
<a name="1"></a>
|
||||||
|
## 1. 介绍
|
||||||
|
|
||||||
|
对于来自不同领域、具有不同格式和不同检索要求的大量文档,准确的分析成为一项极具挑战性的任务。*Deep*Doc 就是为了这个目的而诞生的。到目前为止,*Deep*Doc 中有两个组成部分:视觉处理和解析器。如果您对我们的OCR、布局识别和TSR结果感兴趣,您可以运行下面的测试程序。
|
||||||
|
|
||||||
|
```bash
|
||||||
|
python deepdoc/vision/t_ocr.py -h
|
||||||
|
usage: t_ocr.py [-h] --inputs INPUTS [--output_dir OUTPUT_DIR]
|
||||||
|
|
||||||
|
options:
|
||||||
|
-h, --help show this help message and exit
|
||||||
|
--inputs INPUTS Directory where to store images or PDFs, or a file path to a single image or PDF
|
||||||
|
--output_dir OUTPUT_DIR
|
||||||
|
Directory where to store the output images. Default: './ocr_outputs'
|
||||||
|
```
|
||||||
|
|
||||||
|
```bash
|
||||||
|
python deepdoc/vision/t_recognizer.py -h
|
||||||
|
usage: t_recognizer.py [-h] --inputs INPUTS [--output_dir OUTPUT_DIR] [--threshold THRESHOLD] [--mode {layout,tsr}]
|
||||||
|
|
||||||
|
options:
|
||||||
|
-h, --help show this help message and exit
|
||||||
|
--inputs INPUTS Directory where to store images or PDFs, or a file path to a single image or PDF
|
||||||
|
--output_dir OUTPUT_DIR
|
||||||
|
Directory where to store the output images. Default: './layouts_outputs'
|
||||||
|
--threshold THRESHOLD
|
||||||
|
A threshold to filter out detections. Default: 0.5
|
||||||
|
--mode {layout,tsr} Task mode: layout recognition or table structure recognition
|
||||||
|
```
|
||||||
|
|
||||||
|
HuggingFace为我们的模型提供服务。如果你在下载HuggingFace模型时遇到问题,这可能会有所帮助!!
|
||||||
|
|
||||||
|
```bash
|
||||||
|
export HF_ENDPOINT=https://hf-mirror.com
|
||||||
|
```
|
||||||
|
|
||||||
|
<a name="2"></a>
|
||||||
|
## 2. 视觉处理
|
||||||
|
|
||||||
|
作为人类,我们使用视觉信息来解决问题。
|
||||||
|
|
||||||
|
- **OCR(Optical Character Recognition,光学字符识别)**。由于许多文档都是以图像形式呈现的,或者至少能够转换为图像,因此OCR是文本提取的一个非常重要、基本,甚至通用的解决方案。
|
||||||
|
|
||||||
|
```bash
|
||||||
|
python deepdoc/vision/t_ocr.py --inputs=path_to_images_or_pdfs --output_dir=path_to_store_result
|
||||||
|
```
|
||||||
|
|
||||||
|
输入可以是图像或PDF的目录,或者单个图像、PDF文件。您可以查看文件夹 `path_to_store_result` ,其中有演示结果位置的图像,以及包含OCR文本的txt文件。
|
||||||
|
|
||||||
|
<div align="center" style="margin-top:20px;margin-bottom:20px;">
|
||||||
|
<img src="https://github.com/infiniflow/ragflow/assets/12318111/f25bee3d-aaf7-4102-baf5-d5208361d110" width="900"/>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
- 布局识别(Layout recognition)。来自不同领域的文件可能有不同的布局,如报纸、杂志、书籍和简历在布局方面是不同的。只有当机器有准确的布局分析时,它才能决定这些文本部分是连续的还是不连续的,或者这个部分需要表结构识别(Table Structure Recognition,TSR)来处理,或者这个部件是一个图形并用这个标题来描述。我们有10个基本布局组件,涵盖了大多数情况:
|
||||||
|
- 文本
|
||||||
|
- 标题
|
||||||
|
- 配图
|
||||||
|
- 配图标题
|
||||||
|
- 表格
|
||||||
|
- 表格标题
|
||||||
|
- 页头
|
||||||
|
- 页尾
|
||||||
|
- 参考引用
|
||||||
|
- 公式
|
||||||
|
|
||||||
|
请尝试以下命令以查看布局检测结果。
|
||||||
|
|
||||||
|
```bash
|
||||||
|
python deepdoc/vision/t_recognizer.py --inputs=path_to_images_or_pdfs --threshold=0.2 --mode=layout --output_dir=path_to_store_result
|
||||||
|
```
|
||||||
|
|
||||||
|
输入可以是图像或PDF的目录,或者单个图像、PDF文件。您可以查看文件夹 `path_to_store_result` ,其中有显示检测结果的图像,如下所示:
|
||||||
|
<div align="center" style="margin-top:20px;margin-bottom:20px;">
|
||||||
|
<img src="https://github.com/infiniflow/ragflow/assets/12318111/07e0f625-9b28-43d0-9fbb-5bf586cd286f" width="1000"/>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
- **TSR(Table Structure Recognition,表结构识别)**。数据表是一种常用的结构,用于表示包括数字或文本在内的数据。表的结构可能非常复杂,比如层次结构标题、跨单元格和投影行标题。除了TSR,我们还将内容重新组合成LLM可以很好理解的句子。TSR任务有五个标签:
|
||||||
|
- 列
|
||||||
|
- 行
|
||||||
|
- 列标题
|
||||||
|
- 行标题
|
||||||
|
- 合并单元格
|
||||||
|
|
||||||
|
请尝试以下命令以查看布局检测结果。
|
||||||
|
|
||||||
|
```bash
|
||||||
|
python deepdoc/vision/t_recognizer.py --inputs=path_to_images_or_pdfs --threshold=0.2 --mode=tsr --output_dir=path_to_store_result
|
||||||
|
```
|
||||||
|
|
||||||
|
输入可以是图像或PDF的目录,或者单个图像、PDF文件。您可以查看文件夹 `path_to_store_result` ,其中包含图像和html页面,这些页面展示了以下检测结果:
|
||||||
|
|
||||||
|
<div align="center" style="margin-top:20px;margin-bottom:20px;">
|
||||||
|
<img src="https://github.com/infiniflow/ragflow/assets/12318111/cb24e81b-f2ba-49f3-ac09-883d75606f4c" width="1000"/>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
<a name="3"></a>
|
||||||
|
## 3. 解析器
|
||||||
|
|
||||||
|
PDF、DOCX、EXCEL和PPT四种文档格式都有相应的解析器。最复杂的是PDF解析器,因为PDF具有灵活性。PDF解析器的输出包括:
|
||||||
|
- 在PDF中有自己位置的文本块(页码和矩形位置)。
|
||||||
|
- 带有PDF裁剪图像的表格,以及已经翻译成自然语言句子的内容。
|
||||||
|
- 图中带标题和文字的图。
|
||||||
|
|
||||||
|
### 简历
|
||||||
|
|
||||||
|
简历是一种非常复杂的文件。一份由各种布局的非结构化文本组成的简历可以分解为由近百个字段组成的结构化数据。我们还没有打开解析器,因为我们在解析过程之后打开了处理方法。
|
||||||
|
|||||||
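Section 3 of the README above notes that the PDF parser returns text chunks tagged with their page number and rectangle on the page. The `pdf_parser.py` hunks further down serialize that position as `@@pages<TAB>left<TAB>right<TAB>top<TAB>bottom##` appended to the chunk text. A small stand-alone sketch of splitting such a tag back into structured fields (a hypothetical helper, not a function from the project):

```python
import re

TAG = re.compile(r"@@([0-9-]+)\t([0-9.]+)\t([0-9.]+)\t([0-9.]+)\t([0-9.]+)##$")

def split_position_tag(chunk: str):
    """Split '<text>@@pages\\tleft\\tright\\ttop\\tbottom##' into text and position."""
    m = TAG.search(chunk)
    if not m:
        return chunk, None
    pages = [int(p) for p in m.group(1).split("-")]
    left, right, top, bottom = (float(m.group(i)) for i in range(2, 6))
    return chunk[:m.start()], {"pages": pages, "box": (left, right, top, bottom)}

text, pos = split_position_tag("deep document parsing demo@@3\t36.0\t520.5\t88.2\t102.9##")
print(text)   # 'deep document parsing demo'
print(pos)    # {'pages': [3], 'box': (36.0, 520.5, 88.2, 102.9)}
```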
@ -3,6 +3,8 @@ from openpyxl import load_workbook
|
|||||||
import sys
|
import sys
|
||||||
from io import BytesIO
|
from io import BytesIO
|
||||||
|
|
||||||
|
from rag.nlp import find_codec
|
||||||
|
|
||||||
|
|
||||||
class HuExcelParser:
|
class HuExcelParser:
|
||||||
def html(self, fnm):
|
def html(self, fnm):
|
||||||
@ -66,7 +68,8 @@ class HuExcelParser:
|
|||||||
return total
|
return total
|
||||||
|
|
||||||
if fnm.split(".")[-1].lower() in ["csv", "txt"]:
|
if fnm.split(".")[-1].lower() in ["csv", "txt"]:
|
||||||
txt = binary.decode("utf-8")
|
encoding = find_codec(binary)
|
||||||
|
txt = binary.decode(encoding)
|
||||||
return len(txt.split("\n"))
|
return len(txt.split("\n"))
|
||||||
|
|
||||||
|
|
||||||
|
|||||||
@ -11,7 +11,7 @@ import pdfplumber
|
|||||||
import logging
|
import logging
|
||||||
from PIL import Image, ImageDraw
|
from PIL import Image, ImageDraw
|
||||||
import numpy as np
|
import numpy as np
|
||||||
|
from timeit import default_timer as timer
|
||||||
from PyPDF2 import PdfReader as pdf2_read
|
from PyPDF2 import PdfReader as pdf2_read
|
||||||
|
|
||||||
from api.utils.file_utils import get_project_base_directory
|
from api.utils.file_utils import get_project_base_directory
|
||||||
@ -37,17 +37,18 @@ class HuParser:
|
|||||||
self.updown_cnt_mdl.set_param({"device": "cuda"})
|
self.updown_cnt_mdl.set_param({"device": "cuda"})
|
||||||
try:
|
try:
|
||||||
model_dir = os.path.join(
|
model_dir = os.path.join(
|
||||||
get_project_base_directory(),
|
get_project_base_directory(),
|
||||||
"rag/res/deepdoc")
|
"rag/res/deepdoc")
|
||||||
self.updown_cnt_mdl.load_model(os.path.join(
|
self.updown_cnt_mdl.load_model(os.path.join(
|
||||||
model_dir, "updown_concat_xgb.model"))
|
model_dir, "updown_concat_xgb.model"))
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
model_dir = snapshot_download(
|
model_dir = snapshot_download(
|
||||||
repo_id="InfiniFlow/text_concat_xgb_v1.0")
|
repo_id="InfiniFlow/text_concat_xgb_v1.0",
|
||||||
|
local_dir=os.path.join(get_project_base_directory(), "rag/res/deepdoc"),
|
||||||
|
local_dir_use_symlinks=False)
|
||||||
self.updown_cnt_mdl.load_model(os.path.join(
|
self.updown_cnt_mdl.load_model(os.path.join(
|
||||||
model_dir, "updown_concat_xgb.model"))
|
model_dir, "updown_concat_xgb.model"))
|
||||||
|
|
||||||
|
|
||||||
self.page_from = 0
|
self.page_from = 0
|
||||||
"""
|
"""
|
||||||
If you have trouble downloading HuggingFace models, -_^ this might help!!
|
If you have trouble downloading HuggingFace models, -_^ this might help!!
|
||||||
@ -62,7 +63,7 @@ class HuParser:
|
|||||||
"""
|
"""
|
||||||
|
|
||||||
def __char_width(self, c):
|
def __char_width(self, c):
|
||||||
return (c["x1"] - c["x0"]) // len(c["text"])
|
return (c["x1"] - c["x0"]) // max(len(c["text"]), 1)
|
||||||
|
|
||||||
def __height(self, c):
|
def __height(self, c):
|
||||||
return c["bottom"] - c["top"]
|
return c["bottom"] - c["top"]
|
||||||
@ -74,7 +75,7 @@ class HuParser:
|
|||||||
def _y_dis(
|
def _y_dis(
|
||||||
self, a, b):
|
self, a, b):
|
||||||
return (
|
return (
|
||||||
b["top"] + b["bottom"] - a["top"] - a["bottom"]) / 2
|
b["top"] + b["bottom"] - a["top"] - a["bottom"]) / 2
|
||||||
|
|
||||||
def _match_proj(self, b):
|
def _match_proj(self, b):
|
||||||
proj_patt = [
|
proj_patt = [
|
||||||
@ -97,9 +98,9 @@ class HuParser:
|
|||||||
tks_down = huqie.qie(down["text"][:LEN]).split(" ")
|
tks_down = huqie.qie(down["text"][:LEN]).split(" ")
|
||||||
tks_up = huqie.qie(up["text"][-LEN:]).split(" ")
|
tks_up = huqie.qie(up["text"][-LEN:]).split(" ")
|
||||||
tks_all = up["text"][-LEN:].strip() \
|
tks_all = up["text"][-LEN:].strip() \
|
||||||
+ (" " if re.match(r"[a-zA-Z0-9]+",
|
+ (" " if re.match(r"[a-zA-Z0-9]+",
|
||||||
up["text"][-1] + down["text"][0]) else "") \
|
up["text"][-1] + down["text"][0]) else "") \
|
||||||
+ down["text"][:LEN].strip()
|
+ down["text"][:LEN].strip()
|
||||||
tks_all = huqie.qie(tks_all).split(" ")
|
tks_all = huqie.qie(tks_all).split(" ")
|
||||||
fea = [
|
fea = [
|
||||||
up.get("R", -1) == down.get("R", -1),
|
up.get("R", -1) == down.get("R", -1),
|
||||||
@ -121,7 +122,7 @@ class HuParser:
|
|||||||
True if re.search(r"[,,][^。.]+$", up["text"]) else False,
|
True if re.search(r"[,,][^。.]+$", up["text"]) else False,
|
||||||
True if re.search(r"[,,][^。.]+$", up["text"]) else False,
|
True if re.search(r"[,,][^。.]+$", up["text"]) else False,
|
||||||
True if re.search(r"[\((][^\))]+$", up["text"])
|
True if re.search(r"[\((][^\))]+$", up["text"])
|
||||||
and re.search(r"[\))]", down["text"]) else False,
|
and re.search(r"[\))]", down["text"]) else False,
|
||||||
self._match_proj(down),
|
self._match_proj(down),
|
||||||
True if re.match(r"[A-Z]", down["text"]) else False,
|
True if re.match(r"[A-Z]", down["text"]) else False,
|
||||||
True if re.match(r"[A-Z]", up["text"][-1]) else False,
|
True if re.match(r"[A-Z]", up["text"][-1]) else False,
|
||||||
@ -183,7 +184,7 @@ class HuParser:
|
|||||||
continue
|
continue
|
||||||
for tb in tbls: # for table
|
for tb in tbls: # for table
|
||||||
left, top, right, bott = tb["x0"] - MARGIN, tb["top"] - MARGIN, \
|
left, top, right, bott = tb["x0"] - MARGIN, tb["top"] - MARGIN, \
|
||||||
tb["x1"] + MARGIN, tb["bottom"] + MARGIN
|
tb["x1"] + MARGIN, tb["bottom"] + MARGIN
|
||||||
left *= ZM
|
left *= ZM
|
||||||
top *= ZM
|
top *= ZM
|
||||||
right *= ZM
|
right *= ZM
|
||||||
@ -295,7 +296,7 @@ class HuParser:
|
|||||||
for b in bxs:
|
for b in bxs:
|
||||||
if not b["text"]:
|
if not b["text"]:
|
||||||
left, right, top, bott = b["x0"] * ZM, b["x1"] * \
|
left, right, top, bott = b["x0"] * ZM, b["x1"] * \
|
||||||
ZM, b["top"] * ZM, b["bottom"] * ZM
|
ZM, b["top"] * ZM, b["bottom"] * ZM
|
||||||
b["text"] = self.ocr.recognize(np.array(img),
|
b["text"] = self.ocr.recognize(np.array(img),
|
||||||
np.array([[left, top], [right, top], [right, bott], [left, bott]],
|
np.array([[left, top], [right, top], [right, bott], [left, bott]],
|
||||||
dtype=np.float32))
|
dtype=np.float32))
|
||||||
@ -620,7 +621,7 @@ class HuParser:
|
|||||||
i += 1
|
i += 1
|
||||||
continue
|
continue
|
||||||
lout_no = str(self.boxes[i]["page_number"]) + \
|
lout_no = str(self.boxes[i]["page_number"]) + \
|
||||||
"-" + str(self.boxes[i]["layoutno"])
|
"-" + str(self.boxes[i]["layoutno"])
|
||||||
if TableStructureRecognizer.is_caption(self.boxes[i]) or self.boxes[i]["layout_type"] in ["table caption",
|
if TableStructureRecognizer.is_caption(self.boxes[i]) or self.boxes[i]["layout_type"] in ["table caption",
|
||||||
"title",
|
"title",
|
||||||
"figure caption",
|
"figure caption",
|
||||||
@ -828,9 +829,13 @@ class HuParser:
|
|||||||
pn = [bx["page_number"]]
|
pn = [bx["page_number"]]
|
||||||
top = bx["top"] - self.page_cum_height[pn[0] - 1]
|
top = bx["top"] - self.page_cum_height[pn[0] - 1]
|
||||||
bott = bx["bottom"] - self.page_cum_height[pn[0] - 1]
|
bott = bx["bottom"] - self.page_cum_height[pn[0] - 1]
|
||||||
|
page_images_cnt = len(self.page_images)
|
||||||
|
if pn[-1] - 1 >= page_images_cnt: return ""
|
||||||
while bott * ZM > self.page_images[pn[-1] - 1].size[1]:
|
while bott * ZM > self.page_images[pn[-1] - 1].size[1]:
|
||||||
bott -= self.page_images[pn[-1] - 1].size[1] / ZM
|
bott -= self.page_images[pn[-1] - 1].size[1] / ZM
|
||||||
pn.append(pn[-1] + 1)
|
pn.append(pn[-1] + 1)
|
||||||
|
if pn[-1] - 1 >= page_images_cnt:
|
||||||
|
return ""
|
||||||
|
|
||||||
return "@@{}\t{:.1f}\t{:.1f}\t{:.1f}\t{:.1f}##" \
|
return "@@{}\t{:.1f}\t{:.1f}\t{:.1f}\t{:.1f}##" \
|
||||||
.format("-".join([str(p) for p in pn]),
|
.format("-".join([str(p) for p in pn]),
|
||||||
@ -930,6 +935,7 @@ class HuParser:
|
|||||||
self.page_cum_height = [0]
|
self.page_cum_height = [0]
|
||||||
self.page_layout = []
|
self.page_layout = []
|
||||||
self.page_from = page_from
|
self.page_from = page_from
|
||||||
|
st = timer()
|
||||||
try:
|
try:
|
||||||
self.pdf = pdfplumber.open(fnm) if isinstance(
|
self.pdf = pdfplumber.open(fnm) if isinstance(
|
||||||
fnm, str) else pdfplumber.open(BytesIO(fnm))
|
fnm, str) else pdfplumber.open(BytesIO(fnm))
|
||||||
@ -968,6 +974,7 @@ class HuParser:
|
|||||||
self.outlines.append((a["/Title"], depth))
|
self.outlines.append((a["/Title"], depth))
|
||||||
continue
|
continue
|
||||||
dfs(a, depth + 1)
|
dfs(a, depth + 1)
|
||||||
|
|
||||||
dfs(outlines, 0)
|
dfs(outlines, 0)
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
logging.warning(f"Outlines exception: {e}")
|
logging.warning(f"Outlines exception: {e}")
|
||||||
@ -977,13 +984,15 @@ class HuParser:
|
|||||||
logging.info("Images converted.")
|
logging.info("Images converted.")
|
||||||
self.is_english = [re.search(r"[a-zA-Z0-9,/¸;:'\[\]\(\)!@#$%^&*\"?<>._-]{30,}", "".join(
|
self.is_english = [re.search(r"[a-zA-Z0-9,/¸;:'\[\]\(\)!@#$%^&*\"?<>._-]{30,}", "".join(
|
||||||
random.choices([c["text"] for c in self.page_chars[i]], k=min(100, len(self.page_chars[i]))))) for i in
|
random.choices([c["text"] for c in self.page_chars[i]], k=min(100, len(self.page_chars[i]))))) for i in
|
||||||
range(len(self.page_chars))]
|
range(len(self.page_chars))]
|
||||||
if sum([1 if e else 0 for e in self.is_english]) > len(
|
if sum([1 if e else 0 for e in self.is_english]) > len(
|
||||||
self.page_images) / 2:
|
self.page_images) / 2:
|
||||||
self.is_english = True
|
self.is_english = True
|
||||||
else:
|
else:
|
||||||
self.is_english = False
|
self.is_english = False
|
||||||
|
self.is_english = False
|
||||||
|
|
||||||
|
st = timer()
|
||||||
for i, img in enumerate(self.page_images):
|
for i, img in enumerate(self.page_images):
|
||||||
chars = self.page_chars[i] if not self.is_english else []
|
chars = self.page_chars[i] if not self.is_english else []
|
||||||
self.mean_height.append(
|
self.mean_height.append(
|
||||||
@ -1001,15 +1010,11 @@ class HuParser:
|
|||||||
chars[j]["width"]) / 2:
|
chars[j]["width"]) / 2:
|
||||||
chars[j]["text"] += " "
|
chars[j]["text"] += " "
|
||||||
j += 1
|
j += 1
|
||||||
# if i > 0:
|
|
||||||
# if not chars:
|
|
||||||
# self.page_cum_height.append(img.size[1] / zoomin)
|
|
||||||
# else:
|
|
||||||
# self.page_cum_height.append(
|
|
||||||
# np.max([c["bottom"] for c in chars]))
|
|
||||||
self.__ocr(i + 1, img, chars, zoomin)
|
self.__ocr(i + 1, img, chars, zoomin)
|
||||||
if callback:
|
if callback and i % 6 == 5:
|
||||||
callback(prog=(i + 1) * 0.6 / len(self.page_images), msg="")
|
callback(prog=(i + 1) * 0.6 / len(self.page_images), msg="")
|
||||||
|
# print("OCR:", timer()-st)
|
||||||
|
|
||||||
if not self.is_english and not any(
|
if not self.is_english and not any(
|
||||||
[c for c in self.page_chars]) and self.boxes:
|
[c for c in self.page_chars]) and self.boxes:
|
||||||
@ -1045,7 +1050,7 @@ class HuParser:
|
|||||||
left, right, top, bottom = float(left), float(
|
left, right, top, bottom = float(left), float(
|
||||||
right), float(top), float(bottom)
|
right), float(top), float(bottom)
|
||||||
poss.append(([int(p) - 1 for p in pn.split("-")],
|
poss.append(([int(p) - 1 for p in pn.split("-")],
|
||||||
left, right, top, bottom))
|
left, right, top, bottom))
|
||||||
if not poss:
|
if not poss:
|
||||||
if need_position:
|
if need_position:
|
||||||
return None, None
|
return None, None
|
||||||
@ -1071,7 +1076,7 @@ class HuParser:
|
|||||||
self.page_images[pns[0]].crop((left * ZM, top * ZM,
|
self.page_images[pns[0]].crop((left * ZM, top * ZM,
|
||||||
right *
|
right *
|
||||||
ZM, min(
|
ZM, min(
|
||||||
bottom, self.page_images[pns[0]].size[1])
|
bottom, self.page_images[pns[0]].size[1])
|
||||||
))
|
))
|
||||||
)
|
)
|
||||||
if 0 < ii < len(poss) - 1:
|
if 0 < ii < len(poss) - 1:
|
||||||
|
|||||||
@ -43,7 +43,9 @@ class LayoutRecognizer(Recognizer):
|
|||||||
"rag/res/deepdoc")
|
"rag/res/deepdoc")
|
||||||
super().__init__(self.labels, domain, model_dir)
|
super().__init__(self.labels, domain, model_dir)
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
model_dir = snapshot_download(repo_id="InfiniFlow/deepdoc")
|
model_dir = snapshot_download(repo_id="InfiniFlow/deepdoc",
|
||||||
|
local_dir=os.path.join(get_project_base_directory(), "rag/res/deepdoc"),
|
||||||
|
local_dir_use_symlinks=False)
|
||||||
super().__init__(self.labels, domain, model_dir)
|
super().__init__(self.labels, domain, model_dir)
|
||||||
|
|
||||||
self.garbage_layouts = ["footer", "header", "reference"]
|
self.garbage_layouts = ["footer", "header", "reference"]
|
||||||
|
|||||||
@ -486,7 +486,9 @@ class OCR(object):
|
|||||||
self.text_detector = TextDetector(model_dir)
|
self.text_detector = TextDetector(model_dir)
|
||||||
self.text_recognizer = TextRecognizer(model_dir)
|
self.text_recognizer = TextRecognizer(model_dir)
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
model_dir = snapshot_download(repo_id="InfiniFlow/deepdoc")
|
model_dir = snapshot_download(repo_id="InfiniFlow/deepdoc",
|
||||||
|
local_dir=os.path.join(get_project_base_directory(), "rag/res/deepdoc"),
|
||||||
|
local_dir_use_symlinks=False)
|
||||||
self.text_detector = TextDetector(model_dir)
|
self.text_detector = TextDetector(model_dir)
|
||||||
self.text_recognizer = TextRecognizer(model_dir)
|
self.text_recognizer = TextRecognizer(model_dir)
|
||||||
|
|
||||||
|
|||||||
@ -41,7 +41,9 @@ class Recognizer(object):
|
|||||||
"rag/res/deepdoc")
|
"rag/res/deepdoc")
|
||||||
model_file_path = os.path.join(model_dir, task_name + ".onnx")
|
model_file_path = os.path.join(model_dir, task_name + ".onnx")
|
||||||
if not os.path.exists(model_file_path):
|
if not os.path.exists(model_file_path):
|
||||||
model_dir = snapshot_download(repo_id="InfiniFlow/deepdoc")
|
model_dir = snapshot_download(repo_id="InfiniFlow/deepdoc",
|
||||||
|
local_dir=os.path.join(get_project_base_directory(), "rag/res/deepdoc"),
|
||||||
|
local_dir_use_symlinks=False)
|
||||||
model_file_path = os.path.join(model_dir, task_name + ".onnx")
|
model_file_path = os.path.join(model_dir, task_name + ".onnx")
|
||||||
else:
|
else:
|
||||||
model_file_path = os.path.join(model_dir, task_name + ".onnx")
|
model_file_path = os.path.join(model_dir, task_name + ".onnx")
|
||||||
|
|||||||
@ -39,7 +39,9 @@ class TableStructureRecognizer(Recognizer):
|
|||||||
get_project_base_directory(),
|
get_project_base_directory(),
|
||||||
"rag/res/deepdoc"))
|
"rag/res/deepdoc"))
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
super().__init__(self.labels, "tsr", snapshot_download(repo_id="InfiniFlow/deepdoc"))
|
super().__init__(self.labels, "tsr", snapshot_download(repo_id="InfiniFlow/deepdoc",
|
||||||
|
local_dir=os.path.join(get_project_base_directory(), "rag/res/deepdoc"),
|
||||||
|
local_dir_use_symlinks=False))
|
||||||
|
|
||||||
def __call__(self, images, thr=0.2):
|
def __call__(self, images, thr=0.2):
|
||||||
tbls = super().__call__(images, thr)
|
tbls = super().__call__(images, thr)
|
||||||
|
|||||||
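The parser, layout, OCR and recognizer hunks above all make the same change: `snapshot_download` now pins the model files directly under the project's `rag/res/deepdoc` directory as real files instead of cache symlinks, so a later `os.path.join(model_dir, ...)` always finds them. A minimal sketch of that call; the target directory below is an assumption, since the real code derives it from `get_project_base_directory()`:

```python
import os

from huggingface_hub import snapshot_download

model_dir = snapshot_download(
    repo_id="InfiniFlow/deepdoc",
    # assumption: project root is the working directory; the real code builds
    # this path with get_project_base_directory()
    local_dir=os.path.join(os.getcwd(), "rag/res/deepdoc"),
    local_dir_use_symlinks=False,   # materialize real files, not cache symlinks
)
print(model_dir)   # .../rag/res/deepdoc
```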
10
docker/.env
@ -11,16 +11,24 @@ ES_PORT=1200
|
|||||||
KIBANA_PORT=6601
|
KIBANA_PORT=6601
|
||||||
|
|
||||||
# Increase or decrease based on the available host memory (in bytes)
|
# Increase or decrease based on the available host memory (in bytes)
|
||||||
MEM_LIMIT=4073741824
|
|
||||||
|
MEM_LIMIT=8073741824
|
||||||
|
|
||||||
|
|
||||||
MYSQL_PASSWORD=infini_rag_flow
|
MYSQL_PASSWORD=infini_rag_flow
|
||||||
MYSQL_PORT=5455
|
MYSQL_PORT=5455
|
||||||
|
|
||||||
|
# Port to expose minio to the host
|
||||||
|
MINIO_CONSOLE_PORT=9001
|
||||||
|
MINIO_PORT=9000
|
||||||
|
|
||||||
MINIO_USER=rag_flow
|
MINIO_USER=rag_flow
|
||||||
MINIO_PASSWORD=infini_rag_flow
|
MINIO_PASSWORD=infini_rag_flow
|
||||||
|
|
||||||
SVR_HTTP_PORT=9380
|
SVR_HTTP_PORT=9380
|
||||||
|
|
||||||
|
RAGFLOW_VERSION=v0.4.0
|
||||||
|
|
||||||
TIMEZONE='Asia/Shanghai'
|
TIMEZONE='Asia/Shanghai'
|
||||||
|
|
||||||
######## OS setup for ES ###########
|
######## OS setup for ES ###########
|
||||||
|
|||||||
29
docker/docker-compose-CN-oc9.yml
Normal file
@ -0,0 +1,29 @@
|
|||||||
|
include:
|
||||||
|
- path: ./docker-compose-base.yml
|
||||||
|
env_file: ./.env
|
||||||
|
|
||||||
|
services:
|
||||||
|
ragflow:
|
||||||
|
depends_on:
|
||||||
|
mysql:
|
||||||
|
condition: service_healthy
|
||||||
|
es01:
|
||||||
|
condition: service_healthy
|
||||||
|
image: edwardelric233/ragflow:oc9
|
||||||
|
container_name: ragflow-server
|
||||||
|
ports:
|
||||||
|
- ${SVR_HTTP_PORT}:9380
|
||||||
|
- 80:80
|
||||||
|
- 443:443
|
||||||
|
volumes:
|
||||||
|
- ./service_conf.yaml:/ragflow/conf/service_conf.yaml
|
||||||
|
- ./ragflow-logs:/ragflow/logs
|
||||||
|
- ./nginx/ragflow.conf:/etc/nginx/conf.d/ragflow.conf
|
||||||
|
- ./nginx/proxy.conf:/etc/nginx/proxy.conf
|
||||||
|
- ./nginx/nginx.conf:/etc/nginx/nginx.conf
|
||||||
|
environment:
|
||||||
|
- TZ=${TIMEZONE}
|
||||||
|
- HF_ENDPOINT=https://hf-mirror.com
|
||||||
|
networks:
|
||||||
|
- ragflow
|
||||||
|
restart: always
|
||||||
@@ -9,7 +9,7 @@ services:
         condition: service_healthy
       es01:
         condition: service_healthy
-    image: swr.cn-north-4.myhuaweicloud.com/infiniflow/ragflow:v0.2.0
+    image: swr.cn-north-4.myhuaweicloud.com/infiniflow/ragflow:${RAGFLOW_VERSION}
     container_name: ragflow-server
     ports:
       - ${SVR_HTTP_PORT}:9380
@@ -29,23 +29,23 @@ services:
       - ragflow
     restart: always

-  kibana:
-    depends_on:
-      es01:
-        condition: service_healthy
-    image: docker.elastic.co/kibana/kibana:${STACK_VERSION}
-    container_name: ragflow-kibana
-    volumes:
-      - kibanadata:/usr/share/kibana/data
-    ports:
-      - ${KIBANA_PORT}:5601
-    environment:
-      - SERVERNAME=kibana
-      - ELASTICSEARCH_HOSTS=http://es01:9200
-      - TZ=${TIMEZONE}
-    mem_limit: ${MEM_LIMIT}
-    networks:
-      - ragflow
+  #kibana:
+  #  depends_on:
+  #    es01:
+  #      condition: service_healthy
+  #  image: docker.elastic.co/kibana/kibana:${STACK_VERSION}
+  #  container_name: ragflow-kibana
+  #  volumes:
+  #    - kibanadata:/usr/share/kibana/data
+  #  ports:
+  #    - ${KIBANA_PORT}:5601
+  #  environment:
+  #    - SERVERNAME=kibana
+  #    - ELASTICSEARCH_HOSTS=http://es01:9200
+  #    - TZ=${TIMEZONE}
+  #  mem_limit: ${MEM_LIMIT}
+  #  networks:
+  #    - ragflow

   mysql:
     image: mysql:5.7.18

@@ -80,8 +80,8 @@ services:
     container_name: ragflow-minio
     command: server --console-address ":9001" /data
     ports:
-      - 9000:9000
-      - 9001:9001
+      - ${MINIO_PORT}:9000
+      - ${MINIO_CONSOLE_PORT}:9001
     environment:
       - MINIO_ROOT_USER=${MINIO_USER}
       - MINIO_ROOT_PASSWORD=${MINIO_PASSWORD}

@@ -96,8 +96,8 @@ services:
 volumes:
   esdata01:
     driver: local
-  kibanadata:
-    driver: local
+  # kibanadata:
+  #   driver: local
   mysql_data:
     driver: local
   minio_data:
@@ -9,7 +9,7 @@ services:
         condition: service_healthy
       es01:
         condition: service_healthy
-    image: infiniflow/ragflow:v0.2.0
+    image: infiniflow/ragflow:${RAGFLOW_VERSION}
     container_name: ragflow-server
     ports:
      - ${SVR_HTTP_PORT}:9380
@@ -23,13 +23,12 @@ function watch_broker(){
 }

 function task_bro(){
-    sleep 160;
     watch_broker;
 }

 task_bro &

-WS=2
+WS=1
 for ((i=0;i<WS;i++))
 do
   task_exe $i $WS &
@@ -13,6 +13,10 @@ minio:
   user: 'rag_flow'
   password: 'infini_rag_flow'
   host: 'minio:9000'
+redis:
+  db: 1
+  password: 'infini_rag_flow'
+  host: 'redis:6379'
 es:
   hosts: 'http://es01:9200'
 user_default_llm:
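service_conf.yaml now declares a Redis connection next to MinIO and Elasticsearch. A quick connectivity sanity check from the host is sketched below; the container name is an assumption (check `docker ps` for the real one) and `redis-cli` is assumed to ship with the Redis image in use.

```bash
# Sketch: verify the Redis credentials from service_conf.yaml actually work.
# Container name is an assumption; adjust to whatever `docker ps` shows.
docker exec -it ragflow-redis redis-cli -a infini_rag_flow ping   # expect: PONG
```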
@@ -1,5 +1,9 @@
 # Conversation API Instruction

+<div align="center" style="margin-top:20px;margin-bottom:20px;">
+<img src="https://github.com/infiniflow/ragflow/assets/12318111/df0dcc3d-789a-44f7-89f1-7a5f044ab729" width="830"/>
+</div>
+
 ## Base URL
 ```buildoutcfg
 https://demo.ragflow.io/v1/

@@ -7,7 +11,7 @@ https://demo.ragflow.io/v1/

 ## Authorization

-All the APIs are authorized with API-Key. Please keep it save and private. Don't reveal it in any way from the front-end.
+All the APIs are authorized with API-Key. Please keep it safe and private. Don't reveal it in any way from the front-end.
 The API-Key should put in the header of request:
 ```buildoutcfg
 Authorization: Bearer {API_KEY}

@@ -299,5 +303,61 @@ This will be called to get the answer to users' questions.
 ## Get document content or image

 This is usually used when display content of citation.
-### Path: /document/get/\<id\>
+### Path: /api/document/get/\<id\>
 ### Method: GET

+## Upload file
+
+This is usually used to upload a file to a knowledge base.
+### Path: /api/document/upload/
+### Method: POST
+
+### Parameter:
+
+| name    | type   | optional | description                            |
+|---------|--------|----------|----------------------------------------|
+| file    | file   | No       | Upload file.                           |
+| kb_name | string | No       | Choose the upload knowledge base name. |
+
+### Response
+```json
+{
+    "data": {
+        "chunk_num": 0,
+        "create_date": "Thu, 25 Apr 2024 14:30:06 GMT",
+        "create_time": 1714026606921,
+        "created_by": "553ec818fd5711ee8ea63043d7ed348e",
+        "id": "41e9324602cd11ef9f5f3043d7ed348e",
+        "kb_id": "06802686c0a311ee85d6246e9694c130",
+        "location": "readme.txt",
+        "name": "readme.txt",
+        "parser_config": {
+            "field_map": {
+            },
+            "pages": [
+                [
+                    0,
+                    1000000
+                ]
+            ]
+        },
+        "parser_id": "general",
+        "process_begin_at": null,
+        "process_duation": 0.0,
+        "progress": 0.0,
+        "progress_msg": "",
+        "run": "0",
+        "size": 929,
+        "source_type": "local",
+        "status": "1",
+        "thumbnail": null,
+        "token_num": 0,
+        "type": "doc",
+        "update_date": "Thu, 25 Apr 2024 14:30:06 GMT",
+        "update_time": 1714026606921
+    },
+    "retcode": 0,
+    "retmsg": "success"
+}
+```
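The two endpoints added above are easiest to exercise with plain HTTP calls. A rough sketch follows; the API key, knowledge base name, and document id are placeholders, and the way the base URL and paths are joined is assumed from the sections above.

```bash
# Sketch: call the new endpoints with placeholder values.
API_KEY="YOUR_API_KEY"                 # placeholder
BASE="https://demo.ragflow.io/v1"      # or your own server

# Upload a file into a knowledge base
curl -X POST "$BASE/api/document/upload/" \
     -H "Authorization: Bearer $API_KEY" \
     -F "file=@readme.txt" \
     -F "kb_name=my_kb"

# Fetch a document's content or image by id
curl "$BASE/api/document/get/41e9324602cd11ef9f5f3043d7ed348e" \
     -H "Authorization: Bearer $API_KEY" -o readme.txt
```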
docs/faq.md (271 changes)
@@ -2,95 +2,187 @@

 ## General

-### What sets RAGFlow apart from other RAG products?
+### 1. What sets RAGFlow apart from other RAG products?

 The "garbage in garbage out" status quo remains unchanged despite the fact that LLMs have advanced Natural Language Processing (NLP) significantly. In response, RAGFlow introduces two unique features compared to other Retrieval-Augmented Generation (RAG) products.

 - Fine-grained document parsing: Document parsing involves images and tables, with the flexibility for you to intervene as needed.
 - Traceable answers with reduced hallucinations: You can trust RAGFlow's responses as you can view the citations and references supporting them.

-### Which languages does RAGFlow support?
+### 2. Which languages does RAGFlow support?

 English, simplified Chinese, traditional Chinese for now.

 ## Performance

-### Why does it take longer for RAGFlow to parse a document than LangChain?
+### 1. Why does it take longer for RAGFlow to parse a document than LangChain?

 We put painstaking effort into document pre-processing tasks like layout analysis, table structure recognition, and OCR (Optical Character Recognition) using our vision model. This contributes to the additional time required.

+### 2. Why does RAGFlow require more resources than other projects?
+
+RAGFlow has a number of built-in models for document structure parsing, which account for the additional computational resources.
+
 ## Feature

-### Which architectures or devices does RAGFlow support?
+### 1. Which architectures or devices does RAGFlow support?

-ARM64 and Ascend GPU are not supported.
+Currently, we only support x86 CPU and Nvidia GPU.

-### Do you offer an API for integration with third-party applications?
+### 2. Do you offer an API for integration with third-party applications?

-These APIs are still in development. Contributions are welcome.
+The corresponding APIs are now available. See the [Conversation API](./conversation_api.md) for more information.

-### Do you support stream output?
+### 3. Do you support stream output?

 No, this feature is still in development. Contributions are welcome.

-### Is it possible to share dialogue through URL?
+### 4. Is it possible to share dialogue through URL?

+Yes, this feature is now available.
+
+### 5. Do you support multiple rounds of dialogues, i.e., referencing previous dialogues as context for the current dialogue?
+
 This feature and the related APIs are still in development. Contributions are welcome.

-### Do you support multiple rounds of dialogues, i.e., referencing previous dialogues as context for the current dialogue?
-
-This feature and the related APIs are still in development. Contributions are welcome.
-
-## Configurations
-
-### How to increase the length of RAGFlow responses?
-
-1. Right click the desired dialog to display the **Chat Configuration** window.
-2. Switch to the **Model Setting** tab and adjust the **Max Tokens** slider to get the desired length.
-3. Click **OK** to confirm your change.
-
-### What does Empty response mean? How to set it?
-
-You limit what the system responds to what you specify in **Empty response** if nothing is retrieved from your knowledge base. If you do not specify anything in **Empty response**, you let your LLM improvise, giving it a chance to hallucinate.
-
-### Can I set the base URL for OpenAI somewhere?
-
-### How to run RAGFlow with a locally deployed LLM?
-
-You can use Ollama to deploy local LLM. See [here](https://github.com/infiniflow/ragflow/blob/main/docs/ollama.md) for more information.
-
-### How to link up ragflow and ollama servers?
-
-- If RAGFlow is locally deployed, ensure that your RAGFlow and Ollama are in the same LAN.
-- If you are using our online demo, ensure that the IP address of your Ollama server is public and accessible.
+## Troubleshooting
+
+### 1. Issues with docker images
+
+#### 1.1 How to build the RAGFlow image from scratch?
+
+```
+$ git clone https://github.com/infiniflow/ragflow.git
+$ cd ragflow
+$ docker build -t infiniflow/ragflow:v0.4.0 .
+$ cd ragflow/docker
+$ chmod +x ./entrypoint.sh
+$ docker compose up -d
+```
+
+#### 1.2 `process "/bin/sh -c cd ./web && npm i && npm run build"` failed
+
+1. Check your network from within Docker, for example:
+```bash
+curl https://hf-mirror.com
+```
+2. If your network works fine, the issue lies with the Docker network configuration. Replace the Docker building command:
+```bash
+docker build -t infiniflow/ragflow:vX.Y.Z.
+```
+With this:
+```bash
+docker build -t infiniflow/ragflow:vX.Y.Z. --network host
+```
+
+### 2. Issues with huggingface models
+
+#### 2.1 Cannot access https://huggingface.co
+
+A *locally* deployed RAGflow downloads OCR and embedding modules from [Huggingface website](https://huggingface.co) by default. If your machine is unable to access this site, the following error occurs and PDF parsing fails:
+
+```
+FileNotFoundError: [Errno 2] No such file or directory: '/root/.cache/huggingface/hub/models--InfiniFlow--deepdoc/snapshots/be0c1e50eef6047b412d1800aa89aba4d275f997/ocr.res'
+```
+To fix this issue, use https://hf-mirror.com instead:
+
+1. Stop all containers and remove all related resources:
+```bash
+cd ragflow/docker/
+docker compose down
+```
+2. Replace `https://huggingface.co` with `https://hf-mirror.com` in **ragflow/docker/docker-compose.yml**.
+3. Start up the server:
+```bash
+docker compose up -d
+```
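An alternative to editing URLs by hand is to point the Hugging Face client at the mirror through the `HF_ENDPOINT` variable, as the new docker-compose-CN-oc9.yml earlier in this changeset already does; whether it fully replaces step 2 depends on your setup, so treat the sketch below as an option, not the documented procedure.

```bash
# Sketch: route huggingface_hub traffic through the mirror.
export HF_ENDPOINT=https://hf-mirror.com
# or, inside the ragflow service of docker-compose.yml:
#   environment:
#     - HF_ENDPOINT=https://hf-mirror.com
docker compose up -d
```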
-### How to configure RAGFlow to respond with 100% matched results, rather than utilizing LLM?
-
-1. Click the **Knowledge Base** tab in the middle top of the page.
-2. Right click the desired knowledge base to display the **Configuration** dialogue.
-3. Choose **Q&A** as the chunk method and click **Save** to confirm your change.
-
-## Debugging
+#### 2.2 `MaxRetryError: HTTPSConnectionPool(host='hf-mirror.com', port=443)`
+
+This error suggests that you do not have Internet access or are unable to connect to hf-mirror.com. Try the following:
+
+1. Manually download the resource files from [huggingface.co/InfiniFlow/deepdoc](https://huggingface.co/InfiniFlow/deepdoc) to your local folder **~/deepdoc**.
+2. Add a volume to **docker-compose.yml**, for example:
+```
+- ~/deepdoc:/ragflow/rag/res/deepdoc
+```
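For step 1, any Hugging Face download method works; a rough sketch using the `huggingface-cli` tool (tool choice is an assumption, a `git lfs` clone of the model repo works as well):

```bash
# Sketch: fetch the deepdoc resources into ~/deepdoc for the volume mount above.
pip install -U "huggingface_hub[cli]"
huggingface-cli download InfiniFlow/deepdoc --local-dir ~/deepdoc
ls ~/deepdoc   # should now contain ocr.res and the *.onnx models
```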
-### How to handle `WARNING: can't find /raglof/rag/res/borker.tm`?
+#### 2.3 `FileNotFoundError: [Errno 2] No such file or directory: '/ragflow/rag/res/deepdoc/ocr.res'`
+
+1. Check your network from within Docker, for example:
+```bash
+curl https://hf-mirror.com
+```
+2. Run `ifconfig` to check the `mtu` value. If the server's `mtu` is `1450` while the NIC's `mtu` in the container is `1500`, this mismatch may cause network instability. Adjust the `mtu` policy as follows:
+
+```
+vim docker-compose-base.yml
+# Original configuration:
+networks:
+  ragflow:
+    driver: bridge
+# Modified configuration:
+networks:
+  ragflow:
+    driver: bridge
+    driver_opts:
+      com.docker.network.driver.mtu: 1450
+```
+
+### 3. Issues with RAGFlow servers
+
+#### 3.1 `WARNING: can't find /raglof/rag/res/borker.tm`

 Ignore this warning and continue. All system warnings can be ignored.

-### How to handle `Realtime synonym is disabled, since no redis connection`?
+#### 3.2 `network anomaly There is an abnormality in your network and you cannot connect to the server.`
+
+You will not log in to RAGFlow unless the server is fully initialized. Run `docker logs -f ragflow-server`.
+
+*The server is successfully initialized, if your system displays the following:*
+
+```
+    ____                 ______ __
+   / __ \ ____ _ ____ _ / ____// /____  _      __
+  / /_/ // __ `// __ `// /_   / // __ \| | /| / /
+ / _, _// /_/ // /_/ // __/  / // /_/ /| |/ |/ /
+/_/ |_| \__,_/ \__, //_/    /_/ \____/ |__/|__/
+              /____/
+
+ * Running on all addresses (0.0.0.0)
+ * Running on http://127.0.0.1:9380
+ * Running on http://x.x.x.x:9380
+ INFO:werkzeug:Press CTRL+C to quit
+```
+
+### 4. Issues with RAGFlow backend services
+
+#### 4.1 `dependency failed to start: container ragflow-mysql is unhealthy`
+
+`dependency failed to start: container ragflow-mysql is unhealthy` means that your MySQL container failed to start. Try replacing `mysql:5.7.18` with `mariadb:10.5.8` in **docker-compose-base.yml**.
+
+#### 4.2 `Realtime synonym is disabled, since no redis connection`

 Ignore this warning and continue. All system warnings can be ignored.

-### Why does it take so long to parse a 2MB document?
+#### 4.3 Why does it take so long to parse a 2MB document?

 Parsing requests have to wait in queue due to limited server resources. We are currently enhancing our algorithms and increasing computing power.

-### Why does my document parsing stall at under one percent?
+#### 4.4 Why does my document parsing stall at under one percent?

 If your RAGFlow is deployed *locally*, try the following:

@@ -98,20 +190,21 @@ If your RAGFlow is deployed *locally*, try the following:
 ```bash
 docker logs -f ragflow-server
 ```
-2. Check if the **tast_executor.py** process exist.
+2. Check if the **task_executor.py** process exists.
 3. Check if your RAGFlow server can access hf-mirror.com or huggingface.com.

-### How to handle `Index failure`?
+#### 4.5 `Index failure`

 An index failure usually indicates an unavailable Elasticsearch service.

-### How to check the log of RAGFlow?
+#### 4.6 How to check the log of RAGFlow?

 ```bash
 tail -f path_to_ragflow/docker/ragflow-logs/rag/*.log
 ```

-### How to check the status of each component in RAGFlow?
+#### 4.7 How to check the status of each component in RAGFlow?

 ```bash
 $ docker ps

@@ -119,13 +212,13 @@ $ docker ps
 *The system displays the following if all your RAGFlow components are running properly:*

 ```
-5bc45806b680 infiniflow/ragflow:v0.2.0 "./entrypoint.sh" 11 hours ago Up 11 hours 0.0.0.0:80->80/tcp, :::80->80/tcp, 0.0.0.0:443->443/tcp, :::443->443/tcp, 0.0.0.0:9380->9380/tcp, :::9380->9380/tcp ragflow-server
+5bc45806b680 infiniflow/ragflow:v0.4.0 "./entrypoint.sh" 11 hours ago Up 11 hours 0.0.0.0:80->80/tcp, :::80->80/tcp, 0.0.0.0:443->443/tcp, :::443->443/tcp, 0.0.0.0:9380->9380/tcp, :::9380->9380/tcp ragflow-server
 91220e3285dd docker.elastic.co/elasticsearch/elasticsearch:8.11.3 "/bin/tini -- /usr/l…" 11 hours ago Up 11 hours (healthy) 9300/tcp, 0.0.0.0:9200->9200/tcp, :::9200->9200/tcp ragflow-es-01
 d8c86f06c56b mysql:5.7.18 "docker-entrypoint.s…" 7 days ago Up 16 seconds (healthy) 0.0.0.0:3306->3306/tcp, :::3306->3306/tcp ragflow-mysql
 cd29bcb254bc quay.io/minio/minio:RELEASE.2023-12-20T01-00-02Z "/usr/bin/docker-ent…" 2 weeks ago Up 11 hours 0.0.0.0:9001->9001/tcp, :::9001->9001/tcp, 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp ragflow-minio
 ```

-### How to handle `Exception: Can't connect to ES cluster`?
+#### 4.8 `Exception: Can't connect to ES cluster`

 1. Check the status of your Elasticsearch component:

@@ -137,7 +230,7 @@ $ docker ps
 91220e3285dd docker.elastic.co/elasticsearch/elasticsearch:8.11.3 "/bin/tini -- /usr/l…" 11 hours ago Up 11 hours (healthy) 9300/tcp, 0.0.0.0:9200->9200/tcp, :::9200->9200/tcp ragflow-es-01
 ```

-2. If your container keeps restarting, ensure `vm.max_map_count` >= 262144 as per [this README](https://github.com/infiniflow/ragflow?tab=readme-ov-file#-start-up-the-server).
+2. If your container keeps restarting, ensure `vm.max_map_count` >= 262144 as per [this README](https://github.com/infiniflow/ragflow?tab=readme-ov-file#-start-up-the-server). Updating the `vm.max_map_count` value in **/etc/sysctl.conf** is required, if you wish to keep your change permanent. This configuration works only for Linux (see the sysctl sketch after this list).

 3. If your issue persists, ensure that the ES host setting is correct:

@@ -153,12 +246,104 @@ $ docker ps
 ```
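As referenced in step 2 above, a minimal sketch for raising `vm.max_map_count` on a Linux host; the values and the sysctl.conf location follow the linked README.

```bash
# Sketch: raise vm.max_map_count for Elasticsearch (Linux only).
sysctl vm.max_map_count                                           # check the current value
sudo sysctl -w vm.max_map_count=262144                            # apply until the next reboot
echo "vm.max_map_count=262144" | sudo tee -a /etc/sysctl.conf     # keep the change permanent
```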
### How to handle `{"data":null,"retcode":100,"retmsg":"<NotFound '404: Not Found'>"}`?
|
#### 4.9 `{"data":null,"retcode":100,"retmsg":"<NotFound '404: Not Found'>"}`
|
||||||
|
|
||||||
Your IP address or port number may be incorrect. If you are using the default configurations, enter http://<IP_OF_YOUR_MACHINE> (**NOT `localhost`, NOT 9380, AND NO PORT NUMBER REQUIRED!**) in your browser. This should work.
|
Your IP address or port number may be incorrect. If you are using the default configurations, enter http://<IP_OF_YOUR_MACHINE> (**NOT 9380, AND NO PORT NUMBER REQUIRED!**) in your browser. This should work.
|
||||||
|
|
||||||
|
#### 4.10 `Ollama - Mistral instance running at 127.0.0.1:11434 but cannot add Ollama as model in RagFlow`
|
||||||
|
|
||||||
|
A correct Ollama IP address and port is crucial to adding models to Ollama:
|
||||||
|
|
||||||
|
- If you are on demo.ragflow.io, ensure that the server hosting Ollama has a publicly accessible IP address.Note that 127.0.0.1 is not a publicly accessible IP address.
|
||||||
|
- If you deploy RAGFlow locally, ensure that Ollama and RAGFlow are in the same LAN and can comunicate with each other.
|
||||||
|
|
||||||
|
#### 4.11 Do you offer examples of using deepdoc to parse PDF or other files?
|
||||||
|
|
||||||
|
Yes, we do. See the Python files under the **rag/app** folder.
|
||||||
|
|
||||||
|
#### 4.12 Why did I fail to upload a 10MB+ file to my locally deployed RAGFlow?
|
||||||
|
|
||||||
|
You probably forgot to update the **MAX_CONTENT_LENGTH** environment variable:
|
||||||
|
|
||||||
|
1. Add environment variable `MAX_CONTENT_LENGTH` to **ragflow/docker/.env**:
|
||||||
|
```
|
||||||
|
MAX_CONTENT_LENGTH=100000000
|
||||||
|
```
|
||||||
|
2. Update **docker-compose.yml**:
|
||||||
|
```
|
||||||
|
environment:
|
||||||
|
- MAX_CONTENT_LENGTH=${MAX_CONTENT_LENGTH}
|
||||||
|
```
|
||||||
|
3. Restart the RAGFlow server:
|
||||||
|
```
|
||||||
|
docker compose up ragflow -d
|
||||||
|
```
|
||||||
|
*Now you should be able to upload files of sizes less than 100MB.*
|
||||||
|
|
||||||
|
#### 4.13 `Table 'rag_flow.document' doesn't exist`
|
||||||
|
|
||||||
|
This exception occurs when starting up the RAGFlow server. Try the following:
|
||||||
|
|
||||||
|
1. Prolong the sleep time: Go to **docker/entrypoint.sh**, locate line 26, and replace `sleep 60` with `sleep 280`.
|
||||||
|
2. If using Windows, ensure that the **entrypoint.sh** has LF end-lines.
|
||||||
|
3. Go to **docker/docker-compose.yml**, add the following:
|
||||||
|
```
|
||||||
|
./entrypoint.sh:/ragflow/entrypoint.sh
|
||||||
|
```
|
||||||
|
4. Change directory:
|
||||||
|
```bash
|
||||||
|
cd docker
|
||||||
|
```
|
||||||
|
5. Stop the RAGFlow server:
|
||||||
|
```bash
|
||||||
|
docker compose stop
|
||||||
|
```
|
||||||
|
6. Restart up the RAGFlow server:
|
||||||
|
```bash
|
||||||
|
docker compose up
|
||||||
|
```
|
||||||
|
|
||||||
|
#### 4.14 `hint : 102 Fail to access model Connection error`
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
1. Ensure that the RAGFlow server can access the base URL.
|
||||||
|
2. Do not forget to append **/v1/** to **http://IP:port**:
|
||||||
|
**http://IP:port/v1/**
|
||||||
|
|
||||||
|
|
||||||
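For item 4.14, a quick reachability check from inside the RAGFlow container is sketched below; `curl` is assumed to be available in the image, and `IP:port` is a placeholder for your model provider's address.

```bash
# Sketch: confirm the model provider's base URL is reachable from the RAGFlow server.
docker exec -it ragflow-server curl -sv http://IP:port/v1/
```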
+
+## Usage
+
+### 1. How to increase the length of RAGFlow responses?
+
+1. Right click the desired dialog to display the **Chat Configuration** window.
+2. Switch to the **Model Setting** tab and adjust the **Max Tokens** slider to get the desired length.
+3. Click **OK** to confirm your change.
+
+### 2. What does Empty response mean? How to set it?
+
+You limit what the system responds to what you specify in **Empty response** if nothing is retrieved from your knowledge base. If you do not specify anything in **Empty response**, you let your LLM improvise, giving it a chance to hallucinate.
+
+### 3. Can I set the base URL for OpenAI somewhere?
+
+### 4. How to run RAGFlow with a locally deployed LLM?
+
+You can use Ollama to deploy local LLM. See [here](https://github.com/infiniflow/ragflow/blob/main/docs/ollama.md) for more information.
+
+### 5. How to link up ragflow and ollama servers?
+
+- If RAGFlow is locally deployed, ensure that your RAGFlow and Ollama are in the same LAN.
+- If you are using our online demo, ensure that the IP address of your Ollama server is public and accessible.
+
+### 6. How to configure RAGFlow to respond with 100% matched results, rather than utilizing LLM?
+
+1. Click the **Knowledge Base** tab in the middle top of the page.
+2. Right click the desired knowledge base to display the **Configuration** dialogue.
+3. Choose **Q&A** as the chunk method and click **Save** to confirm your change.
+
+### Do I need to connect to Redis?
+
+No, connecting to Redis is not required to use RAGFlow.
printEnvironment.sh (new file, 67 lines)

#!/bin/bash

# The function is used to obtain distribution information
get_distro_info() {
    local distro_id=$(lsb_release -i -s 2>/dev/null)
    local distro_version=$(lsb_release -r -s 2>/dev/null)
    local kernel_version=$(uname -r)

    # If lsb_release is not available, try parsing the /etc/*-release files
    if [ -z "$distro_id" ] || [ -z "$distro_version" ]; then
        distro_id=$(grep '^ID=' /etc/*-release | cut -d= -f2 | tr -d '"')
        distro_version=$(grep '^VERSION_ID=' /etc/*-release | cut -d= -f2 | tr -d '"')
    fi

    echo "$distro_id $distro_version (Kernel version: $kernel_version)"
}

# get Git repo name
git_repo_name=''
if git rev-parse --is-inside-work-tree > /dev/null 2>&1; then
    git_repo_name=$(basename "$(git rev-parse --show-toplevel)")
    if [ $? -ne 0 ]; then
        git_repo_name="(Can't get repo name)"
    fi
else
    git_repo_name="Not a Git repo"
fi

# get CPU type
cpu_model=$(uname -m)

# get memory size
memory_size=$(free -h | grep Mem | awk '{print $2}')

# get docker version
docker_version=''
if command -v docker &> /dev/null; then
    docker_version=$(docker --version | cut -d ' ' -f3)
else
    docker_version="Docker not installed"
fi

# get python version
python_version=''
if command -v python &> /dev/null; then
    python_version=$(python --version | cut -d ' ' -f2)
else
    python_version="Python not installed"
fi

# Print all information
echo "Current Repo: $git_repo_name"

# get Commit ID
git_version=$(git log -1 --pretty=format:'%h')

if [ -z "$git_version" ]; then
    echo "Commit Id: The current directory is not a Git repository, or the Git command is not installed."
else
    echo "Commit Id: $git_version"
fi

echo "Operating system: $(get_distro_info)"
echo "CPU Type: $cpu_model"
echo "Memory: $memory_size"
echo "Docker Version: $docker_version"
echo "Python Version: $python_version"
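The script is meant to be run from the repository root, for example when collecting details for a bug report; a minimal usage sketch:

```bash
# Sketch: collect environment details for a bug report.
cd ragflow
chmod +x printEnvironment.sh
./printEnvironment.sh
# prints repo name, commit id, OS, CPU, memory, Docker and Python versions
```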
@@ -11,11 +11,13 @@
 # limitations under the License.
 #
 import copy
+from tika import parser
 import re
 from io import BytesIO

 from rag.nlp import bullets_category, is_english, tokenize, remove_contents_table, \
-    hierarchical_merge, make_colon_as_title, naive_merge, random_choices, tokenize_table, add_positions, tokenize_chunks
+    hierarchical_merge, make_colon_as_title, naive_merge, random_choices, tokenize_table, add_positions, \
+    tokenize_chunks, find_codec
 from rag.nlp import huqie
 from deepdoc.parser import PdfParser, DocxParser, PlainParser

@@ -23,7 +25,7 @@ from deepdoc.parser import PdfParser, DocxParser, PlainParser
 class Pdf(PdfParser):
     def __call__(self, filename, binary=None, from_page=0,
                  to_page=100000, zoomin=3, callback=None):
         callback(msg="OCR is running...")
         self.__images__(
             filename if not binary else binary,
             zoomin,

@@ -36,7 +38,7 @@ class Pdf(PdfParser):
         start = timer()
         self._layouts_rec(zoomin)
         callback(0.67, "Layout analysis finished")
-        print("paddle layouts:", timer() - start)
+        print("layouts:", timer() - start)
         self._table_transformer_job(zoomin)
         callback(0.68, "Table analysis finished")
         self._text_merge()

@@ -66,7 +68,7 @@ def chunk(filename, binary=None, from_page=0, to_page=100000,
     doc["title_sm_tks"] = huqie.qieqie(doc["title_tks"])
     pdf_parser = None
     sections, tbls = [], []
-    if re.search(r"\.docx?$", filename, re.IGNORECASE):
+    if re.search(r"\.docx$", filename, re.IGNORECASE):
         callback(0.1, "Start to parse.")
         doc_parser = DocxParser()
         # TODO: table of contents need to be removed

@@ -74,6 +76,7 @@ def chunk(filename, binary=None, from_page=0, to_page=100000,
             binary if binary else filename, from_page=from_page, to_page=to_page)
         remove_contents_table(sections, eng=is_english(
             random_choices([t for t, _ in sections], k=200)))
+        tbls = [((None, lns), None) for lns in tbls]
         callback(0.8, "Finish parsing.")

     elif re.search(r"\.pdf$", filename, re.IGNORECASE):

@@ -87,7 +90,8 @@ def chunk(filename, binary=None, from_page=0, to_page=100000,
         callback(0.1, "Start to parse.")
         txt = ""
         if binary:
-            txt = binary.decode("utf-8")
+            encoding = find_codec(binary)
+            txt = binary.decode(encoding)
         else:
             with open(filename, "r") as f:
                 while True:

@@ -101,9 +105,19 @@ def chunk(filename, binary=None, from_page=0, to_page=100000,
             random_choices([t for t, _ in sections], k=200)))
         callback(0.8, "Finish parsing.")
+
+    elif re.search(r"\.doc$", filename, re.IGNORECASE):
+        callback(0.1, "Start to parse.")
+        binary = BytesIO(binary)
+        doc_parsed = parser.from_buffer(binary)
+        sections = doc_parsed['content'].split('\n')
+        sections = [(l, "") for l in sections if l]
+        remove_contents_table(sections, eng=is_english(
+            random_choices([t for t, _ in sections], k=200)))
+        callback(0.8, "Finish parsing.")
+
     else:
         raise NotImplementedError(
-            "file type not supported yet(docx, pdf, txt supported)")
+            "file type not supported yet(doc, docx, pdf, txt supported)")

     make_colon_as_title(sections)
     bull = bullets_category(
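The new `.doc` branch here (and in the sibling parsers below) goes through Apache Tika. Assuming the import refers to the standard `tika` PyPI client, that client launches a local Tika server on first use and therefore needs a Java runtime available; a quick sanity-check sketch, with a placeholder file name:

```bash
# Sketch: prerequisites for the new .doc parsing path (tika client is an assumption).
pip install tika
java -version   # a JRE must be present, otherwise Tika parsing fails at runtime
# smoke test against a placeholder legacy Word file:
python -c "from tika import parser; print(parser.from_file('sample.doc')['content'][:200])"
```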
@@ -11,13 +11,14 @@
 # limitations under the License.
 #
 import copy
+from tika import parser
 import re
 from io import BytesIO
 from docx import Document

 from api.db import ParserType
 from rag.nlp import bullets_category, is_english, tokenize, remove_contents_table, hierarchical_merge, \
-    make_colon_as_title, add_positions, tokenize_chunks
+    make_colon_as_title, add_positions, tokenize_chunks, find_codec
 from rag.nlp import huqie
 from deepdoc.parser import PdfParser, DocxParser, PlainParser
 from rag.settings import cron_logger

@@ -57,7 +58,7 @@ class Pdf(PdfParser):
     def __call__(self, filename, binary=None, from_page=0,
                  to_page=100000, zoomin=3, callback=None):
         callback(msg="OCR is running...")
         self.__images__(
             filename if not binary else binary,
             zoomin,

@@ -71,7 +72,7 @@ class Pdf(PdfParser):
         start = timer()
         self._layouts_rec(zoomin)
         callback(0.67, "Layout analysis finished")
-        cron_logger.info("paddle layouts:".format(
+        cron_logger.info("layouts:".format(
             (timer() - start) / (self.total_page + 0.1)))
         self._naive_vertical_merge()

@@ -93,7 +94,7 @@ def chunk(filename, binary=None, from_page=0, to_page=100000,
     doc["title_sm_tks"] = huqie.qieqie(doc["title_tks"])
     pdf_parser = None
     sections = []
-    if re.search(r"\.docx?$", filename, re.IGNORECASE):
+    if re.search(r"\.docx$", filename, re.IGNORECASE):
         callback(0.1, "Start to parse.")
         for txt in Docx()(filename, binary):
             sections.append(txt)

@@ -111,7 +112,8 @@ def chunk(filename, binary=None, from_page=0, to_page=100000,
         callback(0.1, "Start to parse.")
         txt = ""
         if binary:
-            txt = binary.decode("utf-8")
+            encoding = find_codec(binary)
+            txt = binary.decode(encoding)
         else:
             with open(filename, "r") as f:
                 while True:

@@ -122,9 +124,18 @@ def chunk(filename, binary=None, from_page=0, to_page=100000,
         sections = txt.split("\n")
         sections = [l for l in sections if l]
         callback(0.8, "Finish parsing.")
+
+    elif re.search(r"\.doc$", filename, re.IGNORECASE):
+        callback(0.1, "Start to parse.")
+        binary = BytesIO(binary)
+        doc_parsed = parser.from_buffer(binary)
+        sections = doc_parsed['content'].split('\n')
+        sections = [l for l in sections if l]
+        callback(0.8, "Finish parsing.")
+
     else:
         raise NotImplementedError(
-            "file type not supported yet(docx, pdf, txt supported)")
+            "file type not supported yet(doc, docx, pdf, txt supported)")

     # is it English
     eng = lang.lower() == "english"  # is_english(sections)
@@ -16,7 +16,7 @@ class Pdf(PdfParser):
                  to_page=100000, zoomin=3, callback=None):
         from timeit import default_timer as timer
         start = timer()
         callback(msg="OCR is running...")
         self.__images__(
             filename if not binary else binary,
             zoomin,

@@ -32,7 +32,7 @@ class Pdf(PdfParser):
         self._layouts_rec(zoomin)
         callback(0.65, "Layout analysis finished.")
-        print("paddle layouts:", timer() - start)
+        print("layouts:", timer() - start)
         self._table_transformer_job(zoomin)
         callback(0.67, "Table analysis finished.")
         self._text_merge()
@@ -10,12 +10,13 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
+from tika import parser
 from io import BytesIO
 from docx import Document
+from timeit import default_timer as timer
 import re
 from deepdoc.parser.pdf_parser import PlainParser
-from rag.app import laws
-from rag.nlp import huqie, is_english, tokenize, naive_merge, tokenize_table, add_positions, tokenize_chunks
+from rag.nlp import huqie, naive_merge, tokenize_table, tokenize_chunks, find_codec
 from deepdoc.parser import PdfParser, ExcelParser, DocxParser
 from rag.settings import cron_logger

@@ -67,7 +68,8 @@ class Docx(DocxParser):
 class Pdf(PdfParser):
     def __call__(self, filename, binary=None, from_page=0,
                  to_page=100000, zoomin=3, callback=None):
-        callback(msg="OCR is running...")
+        start = timer()
+        callback(msg="OCR is running...")
         self.__images__(
             filename if not binary else binary,
             zoomin,

@@ -76,12 +78,11 @@ class Pdf(PdfParser):
             callback
         )
         callback(msg="OCR finished")
+        cron_logger.info("OCR({}~{}): {}".format(from_page, to_page, timer() - start))

-        from timeit import default_timer as timer
         start = timer()
         self._layouts_rec(zoomin)
         callback(0.63, "Layout analysis finished.")
-        print("paddle layouts:", timer() - start)
         self._table_transformer_job(zoomin)
         callback(0.65, "Table analysis finished.")
         self._text_merge()

@@ -91,8 +92,7 @@ class Pdf(PdfParser):
         self._concat_downward()
         #self._filter_forpages()

-        cron_logger.info("paddle layouts:".format(
-            (timer() - start) / (self.total_page + 0.1)))
+        cron_logger.info("layouts: {}".format(timer() - start))
         return [(b["text"], self._line_tag(b, zoomin))
                 for b in self.boxes], tbls

@@ -118,7 +118,7 @@ def chunk(filename, binary=None, from_page=0, to_page=100000,
     res = []
     pdf_parser = None
     sections = []
-    if re.search(r"\.docx?$", filename, re.IGNORECASE):
+    if re.search(r"\.docx$", filename, re.IGNORECASE):
         callback(0.1, "Start to parse.")
         sections, tbls = Docx()(filename, binary)
         res = tokenize_table(tbls, doc, eng)

@@ -136,11 +136,12 @@ def chunk(filename, binary=None, from_page=0, to_page=100000,
         excel_parser = ExcelParser()
         sections = [(excel_parser.html(binary), "")]

-    elif re.search(r"\.txt$", filename, re.IGNORECASE):
+    elif re.search(r"\.(txt|md)$", filename, re.IGNORECASE):
         callback(0.1, "Start to parse.")
         txt = ""
         if binary:
-            txt = binary.decode("utf-8")
+            encoding = find_codec(binary)
+            txt = binary.decode(encoding)
         else:
             with open(filename, "r") as f:
                 while True:

@@ -152,16 +153,26 @@ def chunk(filename, binary=None, from_page=0, to_page=100000,
         sections = [(l, "") for l in sections if l]
         callback(0.8, "Finish parsing.")
+
+    elif re.search(r"\.doc$", filename, re.IGNORECASE):
+        callback(0.1, "Start to parse.")
+        binary = BytesIO(binary)
+        doc_parsed = parser.from_buffer(binary)
+        sections = doc_parsed['content'].split('\n')
+        sections = [(l, "") for l in sections if l]
+        callback(0.8, "Finish parsing.")
+
     else:
         raise NotImplementedError(
-            "file type not supported yet(docx, pdf, txt supported)")
+            "file type not supported yet(doc, docx, pdf, txt supported)")

+    st = timer()
     chunks = naive_merge(
         sections, parser_config.get(
             "chunk_token_num", 128), parser_config.get(
                 "delimiter", "\n!?。;!?"))

     res.extend(tokenize_chunks(chunks, doc, eng, pdf_parser))
+    cron_logger.info("naive_merge({}): {}".format(filename, timer() - st))
     return res
@@ -10,16 +10,18 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
+from tika import parser
+from io import BytesIO
 import re
 from rag.app import laws
-from rag.nlp import huqie, tokenize
+from rag.nlp import huqie, tokenize, find_codec
 from deepdoc.parser import PdfParser, ExcelParser, PlainParser


 class Pdf(PdfParser):
     def __call__(self, filename, binary=None, from_page=0,
                  to_page=100000, zoomin=3, callback=None):
         callback(msg="OCR is running...")
         self.__images__(
             filename if not binary else binary,
             zoomin,

@@ -33,7 +35,7 @@ class Pdf(PdfParser):
         start = timer()
         self._layouts_rec(zoomin, drop=False)
         callback(0.63, "Layout analysis finished.")
-        print("paddle layouts:", timer() - start)
+        print("layouts:", timer() - start)
         self._table_transformer_job(zoomin)
         callback(0.65, "Table analysis finished.")
         self._text_merge()

@@ -60,7 +62,7 @@ def chunk(filename, binary=None, from_page=0, to_page=100000,
     eng = lang.lower() == "english"  # is_english(cks)

-    if re.search(r"\.docx?$", filename, re.IGNORECASE):
+    if re.search(r"\.docx$", filename, re.IGNORECASE):
         callback(0.1, "Start to parse.")
         sections = [txt for txt in laws.Docx()(filename, binary) if txt]
         callback(0.8, "Finish parsing.")

@@ -82,7 +84,8 @@ def chunk(filename, binary=None, from_page=0, to_page=100000,
         callback(0.1, "Start to parse.")
         txt = ""
         if binary:
-            txt = binary.decode("utf-8")
+            encoding = find_codec(binary)
+            txt = binary.decode(encoding)
         else:
             with open(filename, "r") as f:
                 while True:

@@ -94,9 +97,17 @@ def chunk(filename, binary=None, from_page=0, to_page=100000,
         sections = [s for s in sections if s]
         callback(0.8, "Finish parsing.")
+
+    elif re.search(r"\.doc$", filename, re.IGNORECASE):
+        callback(0.1, "Start to parse.")
+        binary = BytesIO(binary)
+        doc_parsed = parser.from_buffer(binary)
+        sections = doc_parsed['content'].split('\n')
+        sections = [l for l in sections if l]
+        callback(0.8, "Finish parsing.")
+
     else:
         raise NotImplementedError(
-            "file type not supported yet(docx, pdf, txt supported)")
+            "file type not supported yet(doc, docx, pdf, txt supported)")

     doc = {
         "docnm_kwd": filename,
@@ -28,7 +28,7 @@ class Pdf(PdfParser):
     def __call__(self, filename, binary=None, from_page=0,
                  to_page=100000, zoomin=3, callback=None):
         callback(msg="OCR is running...")
         self.__images__(
             filename if not binary else binary,
             zoomin,

@@ -42,7 +42,7 @@ class Pdf(PdfParser):
         start = timer()
         self._layouts_rec(zoomin)
         callback(0.63, "Layout analysis finished")
-        print("paddle layouts:", timer() - start)
+        print("layouts:", timer() - start)
         self._table_transformer_job(zoomin)
         callback(0.68, "Table analysis finished")
         self._text_merge()

@@ -78,7 +78,7 @@ class Pdf(PdfParser):
         title = ""
         authors = []
         i = 0
-        while i < min(32, len(self.boxes)):
+        while i < min(32, len(self.boxes)-1):
             b = self.boxes[i]
             i += 1
             if b.get("layoutno", "").find("title") >= 0:
@@ -58,7 +58,7 @@ class Pdf(PdfParser):
     def __call__(self, filename, binary=None, from_page=0,
                  to_page=100000, zoomin=3, callback=None):
         callback(msg="OCR is running...")
         self.__images__(filename if not binary else binary,
                         zoomin, from_page, to_page, callback)
         callback(0.8, "Page {}~{}: OCR finished".format(
@@ -15,7 +15,7 @@ from copy import deepcopy
 from io import BytesIO
 from nltk import word_tokenize
 from openpyxl import load_workbook
-from rag.nlp import is_english, random_choices
+from rag.nlp import is_english, random_choices, find_codec
 from rag.nlp import huqie
 from deepdoc.parser import ExcelParser

@@ -106,7 +106,8 @@ def chunk(filename, binary=None, lang="Chinese", callback=None, **kwargs):
         callback(0.1, "Start to parse.")
         txt = ""
         if binary:
-            txt = binary.decode("utf-8")
+            encoding = find_codec(binary)
+            txt = binary.decode(encoding)
         else:
             with open(filename, "r") as f:
                 while True:
@@ -20,7 +20,7 @@ from openpyxl import load_workbook
 from dateutil.parser import parse as datetime_parse

 from api.db.services.knowledgebase_service import KnowledgebaseService
-from rag.nlp import huqie, is_english, tokenize
+from rag.nlp import huqie, is_english, tokenize, find_codec
 from deepdoc.parser import ExcelParser

@@ -147,7 +147,8 @@ def chunk(filename, binary=None, from_page=0, to_page=10000000000,
         callback(0.1, "Start to parse.")
         txt = ""
         if binary:
-            txt = binary.decode("utf-8")
+            encoding = find_codec(binary)
+            txt = binary.decode(encoding)
         else:
             with open(filename, "r") as f:
                 while True:

@@ -199,7 +200,7 @@ def chunk(filename, binary=None, from_page=0, to_page=10000000000,
         re.sub(
             r"(/.*|([^()]+?)|\([^()]+?\))",
             "",
-            n),
+            str(n)),
         '_')[0] for n in clmns]
     clmn_tys = []
     for j in range(len(clmns)):

@@ -208,7 +209,7 @@ def chunk(filename, binary=None, from_page=0, to_page=10000000000,
         df[clmns[j]] = cln
         if ty == "text":
             txts.extend([str(c) for c in cln if c])
-    clmns_map = [(py_clmns[i].lower() + fieds_map[clmn_tys[i]], clmns[i].replace("_", " "))
+    clmns_map = [(py_clmns[i].lower() + fieds_map[clmn_tys[i]], str(clmns[i]).replace("_", " "))
                  for i in range(len(clmns))]

     eng = lang.lower() == "english"  # is_english(txts)

@@ -223,8 +224,8 @@ def chunk(filename, binary=None, from_page=0, to_page=10000000000,
                 continue
             if not str(row[clmns[j]]):
                 continue
-            #if pd.isna(row[clmns[j]]):
-            #    continue
+            if pd.isna(row[clmns[j]]):
+                continue
             fld = clmns_map[j][0]
             d[fld] = row[clmns[j]] if clmn_tys[j] != "text" else huqie.qie(
                 row[clmns[j]])
@@ -25,7 +25,7 @@ EmbeddingModel = {
     "Tongyi-Qianwen": HuEmbedding, #QWenEmbed,
     "ZHIPU-AI": ZhipuEmbed,
     "FastEmbed": FastEmbed,
-    "QAnything": QAnythingEmbed
+    "Youdao": YoudaoEmbed
 }

@@ -153,7 +153,7 @@ class OllamaChat(Base):
                 options=options
             )
             ans = response["message"]["content"].strip()
-            return ans, response["eval_count"] + response["prompt_eval_count"]
+            return ans, response["eval_count"] + response.get("prompt_eval_count", 0)
         except Exception as e:
             return "**ERROR**: " + str(e), 0

@@ -14,6 +14,8 @@
 # limitations under the License.
 #
 from typing import Optional

+from huggingface_hub import snapshot_download
 from zhipuai import ZhipuAI
 import os
 from abc import ABC
@@ -35,7 +37,10 @@ try:
                            query_instruction_for_retrieval="为这个句子生成表示以用于检索相关文章:",
                            use_fp16=torch.cuda.is_available())
 except Exception as e:
-    flag_model = FlagModel("BAAI/bge-large-zh-v1.5",
+    model_dir = snapshot_download(repo_id="BAAI/bge-large-zh-v1.5",
+                                  local_dir=os.path.join(get_project_base_directory(), "rag/res/bge-large-zh-v1.5"),
+                                  local_dir_use_symlinks=False)
+    flag_model = FlagModel(model_dir,
                            query_instruction_for_retrieval="为这个句子生成表示以用于检索相关文章:",
                            use_fp16=torch.cuda.is_available())

@@ -224,19 +229,19 @@ class XinferenceEmbed(Base):
         return np.array(res.data[0].embedding), res.usage.total_tokens


-class QAnythingEmbed(Base):
+class YoudaoEmbed(Base):
     _client = None

     def __init__(self, key=None, model_name="maidalun1020/bce-embedding-base_v1", **kwargs):
         from BCEmbedding import EmbeddingModel as qanthing
-        if not QAnythingEmbed._client:
+        if not YoudaoEmbed._client:
             try:
                 print("LOADING BCE...")
-                QAnythingEmbed._client = qanthing(model_name_or_path=os.path.join(
+                YoudaoEmbed._client = qanthing(model_name_or_path=os.path.join(
                     get_project_base_directory(),
                     "rag/res/bce-embedding-base_v1"))
             except Exception as e:
-                QAnythingEmbed._client = qanthing(
+                YoudaoEmbed._client = qanthing(
                     model_name_or_path=model_name.replace(
                         "maidalun1020", "InfiniFlow"))

@@ -246,10 +251,10 @@ class QAnythingEmbed(Base):
         for t in texts:
             token_count += num_tokens_from_string(t)
         for i in range(0, len(texts), batch_size):
-            embds = QAnythingEmbed._client.encode(texts[i:i + batch_size])
+            embds = YoudaoEmbed._client.encode(texts[i:i + batch_size])
             res.extend(embds)
         return np.array(res), token_count

     def encode_queries(self, text):
-        embds = QAnythingEmbed._client.encode([text])
+        embds = YoudaoEmbed._client.encode([text])
         return np.array(embds[0]), num_tokens_from_string(text)
@@ -6,6 +6,35 @@ from . import huqie
 import re
 import copy

+all_codecs = [
+    'utf-8', 'gb2312', 'gbk', 'utf_16', 'ascii', 'big5', 'big5hkscs',
+    'cp037', 'cp273', 'cp424', 'cp437',
+    'cp500', 'cp720', 'cp737', 'cp775', 'cp850', 'cp852', 'cp855', 'cp856', 'cp857',
+    'cp858', 'cp860', 'cp861', 'cp862', 'cp863', 'cp864', 'cp865', 'cp866', 'cp869',
+    'cp874', 'cp875', 'cp932', 'cp949', 'cp950', 'cp1006', 'cp1026', 'cp1125',
+    'cp1140', 'cp1250', 'cp1251', 'cp1252', 'cp1253', 'cp1254', 'cp1255', 'cp1256',
+    'cp1257', 'cp1258', 'euc_jp', 'euc_jis_2004', 'euc_jisx0213', 'euc_kr',
+    'gb2312', 'gb18030', 'hz', 'iso2022_jp', 'iso2022_jp_1', 'iso2022_jp_2',
+    'iso2022_jp_2004', 'iso2022_jp_3', 'iso2022_jp_ext', 'iso2022_kr', 'latin_1',
+    'iso8859_2', 'iso8859_3', 'iso8859_4', 'iso8859_5', 'iso8859_6', 'iso8859_7',
+    'iso8859_8', 'iso8859_9', 'iso8859_10', 'iso8859_11', 'iso8859_13',
+    'iso8859_14', 'iso8859_15', 'iso8859_16', 'johab', 'koi8_r', 'koi8_t', 'koi8_u',
+    'kz1048', 'mac_cyrillic', 'mac_greek', 'mac_iceland', 'mac_latin2', 'mac_roman',
+    'mac_turkish', 'ptcp154', 'shift_jis', 'shift_jis_2004', 'shift_jisx0213',
+    'utf_32', 'utf_32_be', 'utf_32_le''utf_16_be', 'utf_16_le', 'utf_7'
+]
+
+
+def find_codec(blob):
+    global all_codecs
+    for c in all_codecs:
+        try:
+            blob.decode(c)
+            return c
+        except Exception as e:
+            pass
+    return "utf-8"
+
+
 BULLET_PATTERN = [[
     r"第[零一二三四五六七八九十百0-9]+(分?编|部分)",
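The new `find_codec` helper above is what the QA and table chunkers now call before decoding an uploaded blob. Below is a minimal, hypothetical sketch of that call pattern; the sample bytes are invented for illustration and are not part of the patch.

```python
# Sketch only: decode an upload whose encoding is unknown, the way chunk() now does.
from rag.nlp import find_codec

blob = "问题\t答案".encode("gbk")   # stand-in for bytes fetched from storage or an upload
encoding = find_codec(blob)          # tries each codec in all_codecs until decode() succeeds
text = blob.decode(encoding)         # find_codec falls back to "utf-8" if nothing matches
```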
@@ -8,6 +8,7 @@ import re
 import string
 import sys
 from hanziconv import HanziConv
+from huggingface_hub import snapshot_download
 from nltk import word_tokenize
 from nltk.stem import PorterStemmer, WordNetLemmatizer
 from api.utils.file_utils import get_project_base_directory
@@ -68,7 +68,7 @@ class Dealer:
         pg = int(req.get("page", 1)) - 1
         ps = int(req.get("size", 1000))
         topk = int(req.get("topk", 1024))
-        src = req.get("fields", ["docnm_kwd", "content_ltks", "kb_id", "img_id",
+        src = req.get("fields", ["docnm_kwd", "content_ltks", "kb_id", "img_id", "title_tks", "important_kwd",
                                  "image_id", "doc_id", "q_512_vec", "q_768_vec", "position_int",
                                  "q_1024_vec", "q_1536_vec", "available_int", "content_with_weight"])

|
|||||||
pieces_.append(t)
|
pieces_.append(t)
|
||||||
es_logger.info("{} => {}".format(answer, pieces_))
|
es_logger.info("{} => {}".format(answer, pieces_))
|
||||||
if not pieces_:
|
if not pieces_:
|
||||||
return answer
|
return answer, set([])
|
||||||
|
|
||||||
ans_v, _ = embd_mdl.encode(pieces_)
|
ans_v, _ = embd_mdl.encode(pieces_)
|
||||||
assert len(ans_v[0]) == len(chunk_v[0]), "The dimension of query and chunk do not match: {} vs. {}".format(
|
assert len(ans_v[0]) == len(chunk_v[0]), "The dimension of query and chunk do not match: {} vs. {}".format(
|
||||||
@@ -289,8 +289,18 @@ class Dealer:
             sres.field[i].get("q_%d_vec" % len(sres.query_vector), "\t".join(["0"] * len(sres.query_vector)))) for i in sres.ids]
         if not ins_embd:
             return [], [], []
-        ins_tw = [sres.field[i][cfield].split(" ")
-                  for i in sres.ids]
+
+        for i in sres.ids:
+            if isinstance(sres.field[i].get("important_kwd", []), str):
+                sres.field[i]["important_kwd"] = [sres.field[i]["important_kwd"]]
+        ins_tw = []
+        for i in sres.ids:
+            content_ltks = sres.field[i][cfield].split(" ")
+            title_tks = [t for t in sres.field[i].get("title_tks", "").split(" ") if t]
+            important_kwd = sres.field[i].get("important_kwd", [])
+            tks = content_ltks + title_tks + important_kwd
+            ins_tw.append(tks)
+
         sim, tksim, vtsim = self.qryr.hybrid_similarity(sres.query_vector,
                                                         ins_embd,
                                                         keywords,
|
|||||||
|
|
||||||
def sql_retrieval(self, sql, fetch_size=128, format="json"):
|
def sql_retrieval(self, sql, fetch_size=128, format="json"):
|
||||||
from api.settings import chat_logger
|
from api.settings import chat_logger
|
||||||
sql = re.sub(r"[ ]+", " ", sql)
|
sql = re.sub(r"[ `]+", " ", sql)
|
||||||
sql = sql.replace("%", "")
|
sql = sql.replace("%", "")
|
||||||
es_logger.info(f"Get es sql: {sql}")
|
es_logger.info(f"Get es sql: {sql}")
|
||||||
replaces = []
|
replaces = []
|
||||||
|
|||||||
@@ -17,12 +17,12 @@ class Dealer:
         try:
             self.dictionary = json.load(open(path, 'r'))
         except Exception as e:
-            logging.warn("Miss synonym.json")
+            logging.warning("Missing synonym.json")
             self.dictionary = {}

         if not redis:
             logging.warning(
-                "Realtime synonym is disabled, since no redis connection.")
+                "Real-time synonym is disabled, since no redis connection.")
         if not len(self.dictionary.keys()):
             logging.warning(f"Fail to load synonym")

@@ -25,6 +25,11 @@ SUBPROCESS_STD_LOG_NAME = "std.log"

 ES = get_base_config("es", {})
 MINIO = decrypt_database_config(name="minio")
+try:
+    REDIS = decrypt_database_config(name="redis")
+except Exception as e:
+    REDIS = {}
+    pass
 DOC_MAXIMUM_SIZE = 128 * 1024 * 1024

 # Logger
@@ -39,5 +44,6 @@ LoggerFactory.LEVEL = 30
 es_logger = getLogger("es")
 minio_logger = getLogger("minio")
 cron_logger = getLogger("cron_logger")
+cron_logger.setLevel(20)
 chunk_logger = getLogger("chunk_logger")
 database_logger = getLogger("database")
43
rag/svr/cache_file_svr.py
Normal file
@@ -0,0 +1,43 @@
import random
import time
import traceback

from api.db.db_models import close_connection
from api.db.services.task_service import TaskService
from rag.utils import MINIO
from rag.utils.redis_conn import REDIS_CONN


def collect():
    doc_locations = TaskService.get_ongoing_doc_name()
    #print(tasks)
    if len(doc_locations) == 0:
        time.sleep(1)
        return
    return doc_locations

def main():
    locations = collect()
    if not locations:return
    print("TASKS:", len(locations))
    for kb_id, loc in locations:
        try:
            if REDIS_CONN.is_alive():
                try:
                    key = "{}/{}".format(kb_id, loc)
                    if REDIS_CONN.exist(key):continue
                    file_bin = MINIO.get(kb_id, loc)
                    REDIS_CONN.transaction(key, file_bin, 12 * 60)
                    print("CACHE:", loc)
                except Exception as e:
                    traceback.print_stack(e)
        except Exception as e:
            traceback.print_stack(e)


if __name__ == "__main__":
    while True:
        main()
        close_connection()
        time.sleep(1)
@@ -32,6 +32,9 @@ from api.db.services.document_service import DocumentService
 from api.settings import database_logger
 from api.utils import get_format_time, get_uuid
 from api.utils.file_utils import get_project_base_directory
+from rag.utils.redis_conn import REDIS_CONN
+from api.db.db_models import init_database_tables as init_web_db
+from api.db.init_data import init_web_data


 def collect(tm):
@@ -84,10 +87,16 @@ def dispatch():

         tsks = []
         try:
+            file_bin = MINIO.get(r["kb_id"], r["location"])
+            if REDIS_CONN.is_alive():
+                try:
+                    REDIS_CONN.set("{}/{}".format(r["kb_id"], r["location"]), file_bin, 12*60)
+                except Exception as e:
+                    cron_logger.warning("Put into redis[EXCEPTION]:" + str(e))
+
             if r["type"] == FileType.PDF.value:
                 do_layout = r["parser_config"].get("layout_recognize", True)
-                pages = PdfParser.total_page_number(
-                    r["name"], MINIO.get(r["kb_id"], r["location"]))
+                pages = PdfParser.total_page_number(r["name"], file_bin)
                 page_size = r["parser_config"].get("task_page_size", 12)
                 if r["parser_id"] == "paper":
                     page_size = r["parser_config"].get("task_page_size", 22)
@@ -110,8 +119,7 @@ def dispatch():

             elif r["parser_id"] == "table":
                 rn = HuExcelParser.row_number(
-                    r["name"], MINIO.get(
-                        r["kb_id"], r["location"]))
+                    r["name"], file_bin)
                 for i in range(0, rn, 3000):
                     task = new_task()
                     task["from_page"] = i
@@ -159,7 +167,7 @@ def update_progress():
             info = {
                 "process_duation": datetime.timestamp(
                     datetime.now()) -
                 d["process_begin_at"].timestamp(),
                 "run": status}
             if prg != 0:
                 info["progress"] = prg
@@ -175,6 +183,9 @@ if __name__ == "__main__":
     peewee_logger.propagate = False
     peewee_logger.addHandler(database_logger.handlers[0])
     peewee_logger.setLevel(database_logger.level)
+    # init db
+    init_web_db()
+    init_web_data()

     while True:
         dispatch()
@@ -24,7 +24,7 @@ import sys
 import time
 import traceback
 from functools import partial
+from rag.utils import MINIO
 from api.db.db_models import close_connection
 from rag.settings import database_logger
 from rag.settings import cron_logger, DOC_MAXIMUM_SIZE
@@ -34,7 +34,7 @@ from elasticsearch_dsl import Q
 from multiprocessing.context import TimeoutError
 from api.db.services.task_service import TaskService
 from rag.utils import ELASTICSEARCH
-from rag.utils import MINIO
+from timeit import default_timer as timer
 from rag.utils import rmSpace, findMaxTm

 from rag.nlp import search
@@ -47,6 +47,7 @@ from api.db import LLMType, ParserType
 from api.db.services.document_service import DocumentService
 from api.db.services.llm_service import LLMBundle
 from api.utils.file_utils import get_project_base_directory
+from rag.utils.redis_conn import REDIS_CONN

 BATCH_SIZE = 64

@@ -92,6 +93,7 @@ def set_progress(task_id, from_page=0, to_page=-1,

 def collect(comm, mod, tm):
     tasks = TaskService.get_tasks(tm, mod, comm)
+    #print(tasks)
     if len(tasks) == 0:
         time.sleep(1)
         return pd.DataFrame()
@@ -103,11 +105,22 @@ def collect(comm, mod, tm):

 def get_minio_binary(bucket, name):
     global MINIO
+    if REDIS_CONN.is_alive():
+        try:
+            for _ in range(30):
+                if REDIS_CONN.exist("{}/{}".format(bucket, name)):
+                    time.sleep(1)
+                    break
+                time.sleep(1)
+            r = REDIS_CONN.get("{}/{}".format(bucket, name))
+            if r: return r
+            cron_logger.warning("Cache missing: {}".format(name))
+        except Exception as e:
+            cron_logger.warning("Get redis[EXCEPTION]:" + str(e))
     return MINIO.get(bucket, name)


 def build(row):
-    from timeit import default_timer as timer
     if row["size"] > DOC_MAXIMUM_SIZE:
         set_progress(row["id"], prog=-1, msg="File size exceeds( <= %dMb )" %
                      (int(DOC_MAXIMUM_SIZE / 1024 / 1024)))
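Taken together with the broker and cache-server changes, `get_minio_binary` now follows a cache-aside read path keyed by `"{kb_id}/{location}"`. The sketch below is a simplified illustration of that flow, not the committed code; `fetch_document`, `redis_client`, and `minio_client` are hypothetical names standing in for the module-level `REDIS_CONN` and `MINIO`.

```python
# Simplified cache-aside read (illustration only).
def fetch_document(bucket, name, redis_client, minio_client):
    key = "{}/{}".format(bucket, name)
    if redis_client.is_alive():
        cached = redis_client.get(key)   # bytes staged by task_broker or cache_file_svr
        if cached:
            return cached
    return minio_client.get(bucket, name)  # cache miss: fall back to the object store
```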
@@ -156,6 +169,7 @@ def build(row):
         "doc_id": row["doc_id"],
         "kb_id": [str(row["kb_id"])]
     }
+    el = 0
     for ck in cks:
         d = copy.deepcopy(doc)
         d.update(ck)
@@ -175,10 +189,13 @@ def build(row):
         else:
             d["image"].save(output_buffer, format='JPEG')

+        st = timer()
         MINIO.put(row["kb_id"], d["_id"], output_buffer.getvalue())
+        el += timer() - st
         d["img_id"] = "{}-{}".format(row["kb_id"], d["_id"])
         del d["image"]
         docs.append(d)
+    cron_logger.info("MINIO PUT({}):{}".format(row["name"], el))

     return docs

@@ -243,6 +260,7 @@ def main(comm, mod):
     tmf = open(tm_fnm, "a+")
     for _, r in rows.iterrows():
         callback = partial(set_progress, r["id"], r["from_page"], r["to_page"])
+        #callback(random.random()/10., "Task has been received.")
         try:
             embd_mdl = LLMBundle(r["tenant_id"], LLMType.EMBEDDING, llm_name=r["embd_id"], lang=r["language"])
         except Exception as e:
@@ -250,7 +268,9 @@ def main(comm, mod):
             callback(prog=-1, msg=str(e))
             continue

+        st = timer()
         cks = build(r)
+        cron_logger.info("Build chunks({}): {}".format(r["name"], timer()-st))
         if cks is None:
             continue
         if not cks:
@@ -262,17 +282,21 @@ def main(comm, mod):
         callback(
             msg="Finished slicing files(%d). Start to embedding the content." %
             len(cks))
+        st = timer()
         try:
             tk_count = embedding(cks, embd_mdl, r["parser_config"], callback)
         except Exception as e:
             callback(-1, "Embedding error:{}".format(str(e)))
             cron_logger.error(str(e))
             tk_count = 0
+        cron_logger.info("Embedding elapsed({}): {}".format(r["name"], timer()-st))

-        callback(msg="Finished embedding! Start to build index!")
+        callback(msg="Finished embedding({})! Start to build index!".format(timer()-st))
         init_kb(r)
         chunk_count = len(set([c["_id"] for c in cks]))
+        st = timer()
         es_r = ELASTICSEARCH.bulk(cks, search.index_name(r["tenant_id"]))
+        cron_logger.info("Indexing elapsed({}): {}".format(r["name"], timer()-st))
         if es_r:
             callback(-1, "Index failure!")
             ELASTICSEARCH.deleteByQuery(
@@ -287,8 +311,8 @@ def main(comm, mod):
             DocumentService.increment_chunk_num(
                 r["doc_id"], r["kb_id"], tk_count, chunk_count, 0)
             cron_logger.info(
-                "Chunk doc({}), token({}), chunks({})".format(
-                    r["id"], tk_count, len(cks)))
+                "Chunk doc({}), token({}), chunks({}), elapsed:{}".format(
+                    r["id"], tk_count, len(cks), timer()-st))

         tmf.write(str(r["update_time"]) + "\n")
     tmf.close()
@@ -300,9 +324,8 @@ if __name__ == "__main__":
     peewee_logger.addHandler(database_logger.handlers[0])
     peewee_logger.setLevel(database_logger.level)

-    from mpi4py import MPI
-    comm = MPI.COMM_WORLD
+    #from mpi4py import MPI
+    #comm = MPI.COMM_WORLD
     while True:
         main(int(sys.argv[2]), int(sys.argv[1]))
         close_connection()
@@ -2,6 +2,7 @@ import re
 import json
 import time
 import copy

 import elasticsearch
 from elastic_transport import ConnectionTimeout
 from elasticsearch import Elasticsearch
@@ -56,7 +56,6 @@ class HuMinio(object):
         except Exception as e:
             minio_logger.error(f"Fail rm {bucket}/{fnm}: " + str(e))

-
     def get(self, bucket, fnm):
         for _ in range(1):
             try:
74
rag/utils/redis_conn.py
Normal file
@@ -0,0 +1,74 @@
import json

import redis
import logging
from rag import settings
from rag.utils import singleton

@singleton
class RedisDB:
    def __init__(self):
        self.REDIS = None
        self.config = settings.REDIS
        self.__open__()

    def __open__(self):
        try:
            self.REDIS = redis.Redis(host=self.config.get("host", "redis").split(":")[0],
                                     port=int(self.config.get("host", ":6379").split(":")[1]),
                                     db=int(self.config.get("db", 1)),
                                     password=self.config.get("password"))
        except Exception as e:
            logging.warning("Redis can't be connected.")
        return self.REDIS

    def is_alive(self):
        return self.REDIS is not None

    def exist(self, k):
        if not self.REDIS: return
        try:
            return self.REDIS.exists(k)
        except Exception as e:
            logging.warning("[EXCEPTION]exist" + str(k) + "||" + str(e))
            self.__open__()

    def get(self, k):
        if not self.REDIS: return
        try:
            return self.REDIS.get(k)
        except Exception as e:
            logging.warning("[EXCEPTION]get" + str(k) + "||" + str(e))
            self.__open__()

    def set_obj(self, k, obj, exp=3600):
        try:
            self.REDIS.set(k, json.dumps(obj, ensure_ascii=False), exp)
            return True
        except Exception as e:
            logging.warning("[EXCEPTION]set_obj" + str(k) + "||" + str(e))
            self.__open__()
        return False

    def set(self, k, v, exp=3600):
        try:
            self.REDIS.set(k, v, exp)
            return True
        except Exception as e:
            logging.warning("[EXCEPTION]set" + str(k) + "||" + str(e))
            self.__open__()
        return False

    def transaction(self, key, value, exp=3600):
        try:
            pipeline = self.REDIS.pipeline(transaction=True)
            pipeline.set(key, value, exp, nx=True)
            pipeline.execute()
            return True
        except Exception as e:
            logging.warning("[EXCEPTION]set" + str(key) + "||" + str(e))
            self.__open__()
        return False


REDIS_CONN = RedisDB()
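A short, hypothetical usage sketch of the `RedisDB` wrapper above; the key and payload are invented for illustration. The `SET ... NX` inside `transaction()` means a worker will not overwrite a cache entry another process has already staged.

```python
from rag.utils.redis_conn import REDIS_CONN

if REDIS_CONN.is_alive():
    key = "kb_demo/sample.pdf"      # hypothetical "{kb_id}/{location}" cache key
    payload = b"%PDF-1.4 ..."       # hypothetical file bytes
    REDIS_CONN.transaction(key, payload, exp=12 * 60)  # no-op if the key already exists
    cached = REDIS_CONN.get(key)
```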
@@ -116,6 +116,7 @@ sniffio==1.3.1
 StrEnum==0.4.15
 sympy==1.12
 threadpoolctl==3.3.0
+tika==2.6.0
 tiktoken==0.6.0
 tokenizers==0.15.2
 torch==2.2.1
@@ -133,4 +134,4 @@ xxhash==3.4.1
 yarl==1.9.4
 zhipuai==2.0.1
 BCEmbedding
 loguru==0.7.2
138
web/externals.d.ts
vendored
Normal file
@@ -0,0 +1,138 @@
// This file is generated by Umi automatically
// DO NOT CHANGE IT MANUALLY!
type CSSModuleClasses = { readonly [key: string]: string };
declare module '*.css' { const classes: CSSModuleClasses; export default classes; }
declare module '*.scss' { const classes: CSSModuleClasses; export default classes; }
declare module '*.sass' { const classes: CSSModuleClasses; export default classes; }
declare module '*.less' { const classes: CSSModuleClasses; export default classes; }
declare module '*.styl' { const classes: CSSModuleClasses; export default classes; }
declare module '*.stylus' { const classes: CSSModuleClasses; export default classes; }

// images
declare module '*.jpg' { const src: string; export default src; }
declare module '*.jpeg' { const src: string; export default src; }
declare module '*.png' { const src: string; export default src; }
declare module '*.gif' { const src: string; export default src; }
declare module '*.svg' {
  import * as React from 'react';
  export const ReactComponent: React.FunctionComponent<
    React.SVGProps<SVGSVGElement> & { title?: string }
  >;

  const src: string;
  export default src;
}
declare module '*.ico' { const src: string; export default src; }
declare module '*.webp' { const src: string; export default src; }
declare module '*.avif' { const src: string; export default src; }

// media
declare module '*.mp4' { const src: string; export default src; }
declare module '*.webm' { const src: string; export default src; }
declare module '*.ogg' { const src: string; export default src; }
declare module '*.mp3' { const src: string; export default src; }
declare module '*.wav' { const src: string; export default src; }
declare module '*.flac' { const src: string; export default src; }
declare module '*.aac' { const src: string; export default src; }

// fonts
declare module '*.woff' { const src: string; export default src; }
declare module '*.woff2' { const src: string; export default src; }
declare module '*.eot' { const src: string; export default src; }
declare module '*.ttf' { const src: string; export default src; }
declare module '*.otf' { const src: string; export default src; }

// other
declare module '*.wasm' {
  const initWasm: (
    options: WebAssembly.Imports,
  ) => Promise<WebAssembly.Exports>;
  export default initWasm;
}
declare module '*.webmanifest' { const src: string; export default src; }
declare module '*.pdf' { const src: string; export default src; }
declare module '*.txt' { const src: string; export default src; }
(Eight binary image assets updated in this range; sizes changed: 545 KiB → 406 KiB, 390 KiB → 388 KiB, 321 KiB → 467 KiB, 2.0 MiB → 1.1 MiB, 311 KiB → 966 KiB, 599 KiB → 515 KiB, 872 KiB → 196 KiB, 366 KiB → 296 KiB.)
18
web/src/assets/svg/file-icon/folder.svg
Normal file
@@ -0,0 +1,18 @@
<svg width="24" height="18" viewBox="0 0 24 18" fill="none" xmlns="http://www.w3.org/2000/svg">
  <path
    d="M1.32202e-08 2.54731L21.5 2.54731C22.8807 2.54731 24 3.4977 24 4.67006L24 15.2838C24 16.4562 22.8807 17.4066 21.5 17.4066L12 17.4066L2.5 17.4066C1.11929 17.4066 8.54054e-08 16.4562 7.9321e-08 15.2838L1.32202e-08 2.54731Z"
    fill="#FBBC1A" />
  <path
    d="M2.97454e-08 5.73144L7.49143e-08 14.4347C8.09987e-08 15.6071 1.11929 16.5575 2.5 16.5575L21.5 16.5575C22.8807 16.5575 24 15.6071 24 14.4347L24 5.51916C24 4.3468 22.8807 3.39641 21.5 3.39641L11 3.39641L11 4.45779C11 5.16121 10.3284 5.73144 9.5 5.73144L2.97454e-08 5.73144Z"
    fill="url(#paint0_linear_2323_8307)" />
  <path
    d="M8.81345e-09 1.6982C3.94591e-09 0.760312 0.89543 -4.64716e-09 2 -1.03797e-08L9 -4.67088e-08C10.1046 -5.24413e-08 11 0.760312 11 1.6982L11 2.54731L1.32202e-08 2.54731L8.81345e-09 1.6982Z"
    fill="#FBBC1A" />
  <defs>
    <linearGradient id="paint0_linear_2323_8307" x1="0" y1="0" x2="28.8004" y2="20.3231"
      gradientUnits="userSpaceOnUse">
      <stop stop-color="#FFE69C" />
      <stop offset="1" stop-color="#FFC937" />
    </linearGradient>
  </defs>
</svg>
(New image asset, 1.2 KiB.)
48
web/src/base.ts
Normal file
@@ -0,0 +1,48 @@
import isObject from 'lodash/isObject';
import { DvaModel } from 'umi';
import { BaseState } from './interfaces/common';

type State = Record<string, any>;
type DvaModelKey<T> = keyof DvaModel<T>;

export const modelExtend = <T>(
  baseModel: Partial<DvaModel<any>>,
  extendModel: DvaModel<any>,
): DvaModel<T> => {
  return Object.keys(extendModel).reduce<DvaModel<T>>((pre, cur) => {
    const baseValue = baseModel[cur as DvaModelKey<State>];
    const value = extendModel[cur as DvaModelKey<State>];

    if (isObject(value) && isObject(baseValue) && typeof value !== 'string') {
      const key = cur as Exclude<DvaModelKey<State>, 'namespace'>;

      pre[key] = {
        ...baseValue,
        ...value,
      } as any;
    } else {
      pre[cur as DvaModelKey<State>] = value as any;
    }

    return pre;
  }, {} as DvaModel<T>);
};

export const paginationModel: Partial<DvaModel<BaseState>> = {
  state: {
    searchString: '',
    pagination: {
      total: 0,
      current: 1,
      pageSize: 10,
    },
  },
  reducers: {
    setSearchString(state, { payload }) {
      return { ...state, searchString: payload };
    },
    setPagination(state, { payload }) {
      return { ...state, pagination: { ...state.pagination, ...payload } };
    },
  },
};
web/src/components/highlight-markdown/index.tsx
Normal file
@ -0,0 +1,36 @@
|
|||||||
|
import Markdown from 'react-markdown';
|
||||||
|
import SyntaxHighlighter from 'react-syntax-highlighter';
|
||||||
|
import remarkGfm from 'remark-gfm';
|
||||||
|
|
||||||
|
const HightLightMarkdown = ({
|
||||||
|
children,
|
||||||
|
}: {
|
||||||
|
children: string | null | undefined;
|
||||||
|
}) => {
|
||||||
|
return (
|
||||||
|
<Markdown
|
||||||
|
remarkPlugins={[remarkGfm]}
|
||||||
|
components={
|
||||||
|
{
|
||||||
|
code(props: any) {
|
||||||
|
const { children, className, node, ...rest } = props;
|
||||||
|
const match = /language-(\w+)/.exec(className || '');
|
||||||
|
return match ? (
|
||||||
|
<SyntaxHighlighter {...rest} PreTag="div" language={match[1]}>
|
||||||
|
{String(children).replace(/\n$/, '')}
|
||||||
|
</SyntaxHighlighter>
|
||||||
|
) : (
|
||||||
|
<code {...rest} className={className}>
|
||||||
|
{children}
|
||||||
|
</code>
|
||||||
|
);
|
||||||
|
},
|
||||||
|
} as any
|
||||||
|
}
|
||||||
|
>
|
||||||
|
{children}
|
||||||
|
</Markdown>
|
||||||
|
);
|
||||||
|
};
|
||||||
|
|
||||||
|
export default HightLightMarkdown;
|
||||||
@@ -50,9 +50,10 @@ const data = [

 interface IProps extends CategoricalChartProps {
   data?: Array<{ xAxis: string; yAxis: number }>;
+  showLegend?: boolean;
 }

-const RagLineChart = ({ data }: IProps) => {
+const RagLineChart = ({ data, showLegend = false }: IProps) => {
   return (
     <ResponsiveContainer width="100%" height="100%">
       <LineChart
@@ -72,7 +73,7 @@ const RagLineChart = ({ data }: IProps) => {
         <XAxis dataKey="xAxis" />
         <YAxis />
         <Tooltip />
-        <Legend />
+        {showLegend && <Legend />}
         <Line
           type="monotone"
           dataKey="yAxis"
@@ -1,6 +1,19 @@
 @import url(./inter.less);

+html {
+  height: 100%;
+}
+
 body {
   font-family: Inter;
   margin: 0;
+  height: 100%;
+}
+
+#root {
+  height: 100%;
+}
+
+.ant-app {
+  height: 100%;
 }
@@ -248,3 +248,55 @@ export const useSelectStats = () => {
 };

 //#endregion
+
+//#region shared chat
+
+export const useCreateSharedConversation = () => {
+  const dispatch = useDispatch();
+
+  const createSharedConversation = useCallback(
+    (userId?: string) => {
+      return dispatch<any>({
+        type: 'chatModel/createExternalConversation',
+        payload: { userId },
+      });
+    },
+    [dispatch],
+  );
+
+  return createSharedConversation;
+};
+
+export const useFetchSharedConversation = () => {
+  const dispatch = useDispatch();
+
+  const fetchSharedConversation = useCallback(
+    (conversationId: string) => {
+      return dispatch<any>({
+        type: 'chatModel/getExternalConversation',
+        payload: conversationId,
+      });
+    },
+    [dispatch],
+  );
+
+  return fetchSharedConversation;
+};
+
+export const useCompleteSharedConversation = () => {
+  const dispatch = useDispatch();
+
+  const completeSharedConversation = useCallback(
+    (payload: any) => {
+      return dispatch<any>({
+        type: 'chatModel/completeExternalConversation',
+        payload: payload,
+      });
+    },
+    [dispatch],
+  );
+
+  return completeSharedConversation;
+};
+
+//#endregion
@@ -160,12 +160,12 @@ export const useRemoveDocument = () => {
   const { knowledgeId } = useGetKnowledgeSearchParams();

   const removeDocument = useCallback(
-    (documentId: string) => {
+    (documentIds: string[]) => {
       try {
         return dispatch<any>({
           type: 'kFModel/document_rm',
           payload: {
-            doc_id: documentId,
+            doc_id: documentIds,
             kb_id: knowledgeId,
           },
         });
144
web/src/hooks/fileManagerHooks.ts
Normal file
@@ -0,0 +1,144 @@
import {
  IConnectRequestBody,
  IFileListRequestBody,
} from '@/interfaces/request/file-manager';
import { UploadFile } from 'antd';
import { useCallback } from 'react';
import { useDispatch, useSelector } from 'umi';

export const useFetchFileList = () => {
  const dispatch = useDispatch();

  const fetchFileList = useCallback(
    (payload: IFileListRequestBody) => {
      return dispatch<any>({
        type: 'fileManager/listFile',
        payload,
      });
    },
    [dispatch],
  );

  return fetchFileList;
};

export const useRemoveFile = () => {
  const dispatch = useDispatch();

  const removeFile = useCallback(
    (fileIds: string[], parentId: string) => {
      return dispatch<any>({
        type: 'fileManager/removeFile',
        payload: { fileIds, parentId },
      });
    },
    [dispatch],
  );

  return removeFile;
};

export const useRenameFile = () => {
  const dispatch = useDispatch();

  const renameFile = useCallback(
    (fileId: string, name: string, parentId: string) => {
      return dispatch<any>({
        type: 'fileManager/renameFile',
        payload: { fileId, name, parentId },
      });
    },
    [dispatch],
  );

  return renameFile;
};

export const useFetchParentFolderList = () => {
  const dispatch = useDispatch();

  const fetchParentFolderList = useCallback(
    (fileId: string) => {
      return dispatch<any>({
        type: 'fileManager/getAllParentFolder',
        payload: { fileId },
      });
    },
    [dispatch],
  );

  return fetchParentFolderList;
};

export const useCreateFolder = () => {
  const dispatch = useDispatch();

  const createFolder = useCallback(
    (parentId: string, name: string) => {
      return dispatch<any>({
        type: 'fileManager/createFolder',
        payload: { parentId, name, type: 'folder' },
      });
    },
    [dispatch],
  );

  return createFolder;
};

export const useSelectFileList = () => {
  const fileList = useSelector((state) => state.fileManager.fileList);

  return fileList;
};

export const useSelectParentFolderList = () => {
  const parentFolderList = useSelector(
    (state) => state.fileManager.parentFolderList,
  );
  return parentFolderList.toReversed();
};

export const useUploadFile = () => {
  const dispatch = useDispatch();

  const uploadFile = useCallback(
    (fileList: UploadFile[], parentId: string) => {
      try {
        return dispatch<any>({
          type: 'fileManager/uploadFile',
          payload: {
            file: fileList,
            parentId,
            path: fileList.map((file) => (file as any).webkitRelativePath),
          },
        });
      } catch (errorInfo) {
        console.log('Failed:', errorInfo);
      }
    },
    [dispatch],
  );

  return uploadFile;
};

export const useConnectToKnowledge = () => {
  const dispatch = useDispatch();

  const uploadFile = useCallback(
    (payload: IConnectRequestBody) => {
      try {
        return dispatch<any>({
          type: 'fileManager/connectFileToKnowledge',
          payload,
        });
      } catch (errorInfo) {
        console.log('Failed:', errorInfo);
      }
    },
    [dispatch],
  );

  return uploadFile;
};
@@ -127,13 +127,13 @@ export const useFetchKnowledgeBaseConfiguration = () => {

 export const useFetchKnowledgeList = (
   shouldFilterListWithoutDocument: boolean = false,
-): { list: IKnowledge[]; loading: boolean } => {
+) => {
   const dispatch = useDispatch();
   const loading = useOneNamespaceEffectsLoading('knowledgeModel', ['getList']);

   const knowledgeModel = useSelector((state: any) => state.knowledgeModel);
   const { data = [] } = knowledgeModel;
-  const list = useMemo(() => {
+  const list: IKnowledge[] = useMemo(() => {
     return shouldFilterListWithoutDocument
       ? data.filter((x: IKnowledge) => x.chunk_num > 0)
       : data;
@@ -149,7 +149,7 @@ export const useFetchKnowledgeList = (
     fetchList();
   }, [fetchList]);

-  return { list, loading };
+  return { list, loading, fetchList };
 };

 export const useSelectFileThumbnails = () => {
@@ -1,9 +1,12 @@
 import { LanguageTranslationMap } from '@/constants/common';
+import { Pagination } from '@/interfaces/common';
 import { IKnowledgeFile } from '@/interfaces/database/knowledge';
 import { IChangeParserConfigRequestBody } from '@/interfaces/request/document';
-import { useCallback, useState } from 'react';
+import { PaginationProps } from 'antd';
+import { useCallback, useMemo, useState } from 'react';
 import { useTranslation } from 'react-i18next';
-import { useSetModalState } from './commonHooks';
+import { useDispatch } from 'umi';
+import { useSetModalState, useTranslate } from './commonHooks';
 import { useSetDocumentParser } from './documentHooks';
 import { useOneNamespaceEffectsLoading } from './storeHooks';
 import { useSaveSetting } from './userSettingHook';
@@ -62,3 +65,51 @@ export const useChangeLanguage = () => {

   return changeLanguage;
 };
+
+export const useGetPagination = (
+  total: number,
+  page: number,
+  pageSize: number,
+  onPageChange: PaginationProps['onChange'],
+) => {
+  const { t } = useTranslate('common');
+
+  const pagination: PaginationProps = useMemo(() => {
+    return {
+      showQuickJumper: true,
+      total,
+      showSizeChanger: true,
+      current: page,
+      pageSize: pageSize,
+      pageSizeOptions: [1, 2, 10, 20, 50, 100],
+      onChange: onPageChange,
+      showTotal: (total) => `${t('total')} ${total}`,
+    };
+  }, [t, onPageChange, page, pageSize, total]);
+
+  return {
+    pagination,
+  };
+};
+
+export const useSetPagination = (namespace: string) => {
+  const dispatch = useDispatch();
+
+  const setPagination = useCallback(
+    (pageNumber = 1, pageSize?: number) => {
+      const pagination: Pagination = {
+        current: pageNumber,
+      } as Pagination;
+      if (pageSize) {
+        pagination.pageSize = pageSize;
+      }
+      dispatch({
+        type: `${namespace}/setPagination`,
+        payload: pagination,
+      });
+    },
+    [dispatch, namespace],
+  );
+
+  return setPagination;
+};
@@ -1,6 +1,7 @@
 export interface Pagination {
   current: number;
   pageSize: number;
+  total: number;
 }

 export interface BaseState {
@@ -13,5 +14,5 @@ export interface IModalProps<T> {
   hideModal(): void;
   visible: boolean;
   loading?: boolean;
-  onOk?(payload?: T): Promise<void> | void;
+  onOk?(payload?: T): Promise<any> | void;
 }
30
web/src/interfaces/database/file-manager.ts
Normal file
@@ -0,0 +1,30 @@
export interface IFile {
  create_date: string;
  create_time: number;
  created_by: string;
  id: string;
  kbs_info: { kb_id: string; kb_name: string }[];
  location: string;
  name: string;
  parent_id: string;
  size: number;
  tenant_id: string;
  type: string;
  update_date: string;
  update_time: number;
}

export interface IFolder {
  create_date: string;
  create_time: number;
  created_by: string;
  id: string;
  location: string;
  name: string;
  parent_id: string;
  size: number;
  tenant_id: string;
  type: string;
  update_date: string;
  update_time: number;
}
7
web/src/interfaces/request/base.ts
Normal file
@@ -0,0 +1,7 @@
export interface IPaginationRequestBody {
  keywords?: string;
  page?: number;
  page_size?: number; // name|create|doc_num|create_time|update_time,default:create_time
  orderby?: string;
  desc?: string;
}
14
web/src/interfaces/request/file-manager.ts
Normal file
@@ -0,0 +1,14 @@
import { IPaginationRequestBody } from './base';

export interface IFileListRequestBody extends IPaginationRequestBody {
  parent_id?: string; // folder id
}

interface BaseRequestBody {
  parentId: string;
}

export interface IConnectRequestBody extends BaseRequestBody {
  fileIds: string[];
  kbIds: string[];
}
@@ -1,4 +1,5 @@
 import { ReactComponent as StarIon } from '@/assets/svg/chat-star.svg';
+import { ReactComponent as FileIcon } from '@/assets/svg/file-management.svg';
 import { ReactComponent as KnowledgeBaseIcon } from '@/assets/svg/knowledge-base.svg';
 import { ReactComponent as Logo } from '@/assets/svg/logo.svg';
 import { useTranslate } from '@/hooks/commonHooks';
@@ -24,7 +25,7 @@ const RagHeader = () => {
     () => [
       { path: '/knowledge', name: t('knowledgeBase'), icon: KnowledgeBaseIcon },
       { path: '/chat', name: t('chat'), icon: StarIon },
-      // { path: '/file', name: 'File Management', icon: FileIcon },
+      { path: '/file', name: t('fileManager'), icon: FileIcon },
     ],
     [t],
   );
@@ -33,3 +33,12 @@
 .pointerCursor() {
   cursor: pointer;
 }
+
+.clearCardBody() {
+  :global {
+    .ant-card-body {
+      padding: 0;
+      margin: 0;
+    }
+  }
+}
@@ -22,6 +22,8 @@ export default {
     languagePlaceholder: 'select your language',
     copy: 'Copy',
     copied: 'Copied',
+    comingSoon: 'Coming Soon',
+    download: 'Download',
   },
   login: {
     login: 'Sign in',
@@ -52,6 +54,7 @@ export default {
     home: 'Home',
     setting: '用户设置',
     logout: '登出',
+    fileManager: 'File Management',
   },
   knowledgeList: {
     welcome: 'Welcome back',
||||||
welcome: 'Welcome back',
|
welcome: 'Welcome back',
|
||||||
@ -70,7 +73,7 @@ export default {
|
|||||||
namePlaceholder: 'Please input name!',
|
namePlaceholder: 'Please input name!',
|
||||||
doc: 'Docs',
|
doc: 'Docs',
|
||||||
datasetDescription:
|
datasetDescription:
|
||||||
"Hey, don't forget to adjust the chunk after adding the dataset! 😉",
|
'😉 Questions and answers can only be answered after the parsing is successful.',
|
||||||
addFile: 'Add file',
|
addFile: 'Add file',
|
||||||
searchFiles: 'Search your files',
|
searchFiles: 'Search your files',
|
||||||
localFiles: 'Local files',
|
localFiles: 'Local files',
|
||||||
@ -171,7 +174,7 @@ export default {
|
|||||||
methodTitle: 'Chunking Method Description',
|
methodTitle: 'Chunking Method Description',
|
||||||
methodExamples: 'Examples',
|
methodExamples: 'Examples',
|
||||||
methodExamplesDescription:
|
methodExamplesDescription:
|
||||||
'This visual guides is in order to make understanding easier for you.',
|
'The following screenshots are presented to facilitate understanding.',
|
||||||
dialogueExamplesTitle: 'Dialogue Examples',
|
dialogueExamplesTitle: 'Dialogue Examples',
|
||||||
methodEmpty:
|
methodEmpty:
|
||||||
'This will display a visual explanation of the knowledge base categories',
|
'This will display a visual explanation of the knowledge base categories',
|
||||||
@@ -201,15 +204,27 @@ export default {
     presentation: `<p>The supported file formats are <b>PDF</b>, <b>PPTX</b>.</p><p>
       Every page will be treated as a chunk. And the thumbnail of every page will be stored.</p><p>
       <i>All the PPT files you uploaded will be chunked by using this method automatically, setting-up for every PPT file is not necessary.</i></p>`,
-    qa: `<p><b>EXCEL</b> and <b>CSV/TXT</b> files are supported.</p><p>
-      If the file is in excel format, there should be 2 columns question and answer without header.
-      And question column is ahead of answer column.
-      And it's O.K if it has multiple sheets as long as the columns are rightly composed.</p><p>
-
-      If it's in csv format, it should be UTF-8 encoded. Use TAB as delimiter to separate question and answer.</p><p>
-      <i>All the deformed lines will be ignored.
-      Every pair of Q&A will be treated as a chunk.</i></p>`,
+    qa: `
+    <p>
+      This chunk method supports <b>EXCEL</b> and <b>CSV/TXT</b> file formats.
+    </p>
+    <li>
+      If the file is in <b>Excel</b> format, it should consist of two columns
+      without headers: one for questions and the other for answers, with the
+      question column preceding the answer column. Multiple sheets are
+      acceptable as long as the columns are correctly structured.
+    </li>
+    <li>
+      If the file is in <b>CSV/TXT</b> format, it must be UTF-8 encoded with TAB
+      used as the delimiter to separate questions and answers.
+    </li>
+    <p>
+      <i>
+        Lines of texts that fail to follow the above rules will be ignored, and
+        each Q&A pair will be considered a distinct chunk.
+      </i>
+    </p>
+    `,
     resume: `<p>The supported file formats are <b>DOCX</b>, <b>PDF</b>, <b>TXT</b>.
     </p><p>
     The résumé comes in a variety of formats, just like a person’s personality, but we often have to organize them into structured data that makes it easy to search.
@@ -337,24 +352,31 @@ export default {
       'This sets the maximum length of the model’s output, measured in the number of tokens (words or pieces of words).',
     quote: 'Show Quote',
     quoteTip: 'Should the source of the original text be displayed?',
-    overview: 'Overview',
+    overview: 'Chat Bot API',
     pv: 'Number of messages',
     uv: 'Active user number',
     speed: 'Token output speed',
     tokens: 'Consume the token number',
     round: 'Session Interaction Number',
     thumbUp: 'customer satisfaction',
-    publicUrl: 'Public URL',
     preview: 'Preview',
     embedded: 'Embedded',
     serviceApiEndpoint: 'Service API Endpoint',
     apiKey: 'Api Key',
-    apiReference: 'Api Reference',
+    apiReference: 'API Documents',
     dateRange: 'Date Range:',
     backendServiceApi: 'Backend service API',
     createNewKey: 'Create new key',
     created: 'Created',
     action: 'Action',
+    embedModalTitle: 'Embed into website',
+    comingSoon: 'Coming Soon',
+    fullScreenTitle: 'Full Embed',
+    fullScreenDescription:
+      'Embed the following iframe into your website at the desired location',
+    partialTitle: 'Partial Embed',
+    extensionTitle: 'Chrome Extension',
+    tokenError: 'Please create API Token first!',
   },
   setting: {
     profile: 'Profile',
@@ -363,7 +385,7 @@ export default {
     passwordDescription:
       'Please enter your current password to change your password.',
     model: 'Model Providers',
-    modelDescription: 'Manage your account settings and preferences here.',
+    modelDescription: 'Set the model parameter and API Key here.',
     team: 'Team',
     logout: 'Log out',
     username: 'Username',
@@ -440,6 +462,7 @@ export default {
     renamed: 'Renamed',
     operated: 'Operated',
     updated: 'Updated',
+    uploaded: 'Uploaded',
     200: 'The server successfully returns the requested data.',
     201: 'Create or modify data successfully.',
     202: 'A request has been queued in the background (asynchronous task).',
@@ -461,6 +484,24 @@ export default {
     networkAnomaly: 'network anomaly',
     hint: 'hint',
   },
+  fileManager: {
+    name: 'Name',
+    uploadDate: 'Upload Date',
+    knowledgeBase: 'Knowledge Base',
+    size: 'Size',
+    action: 'Action',
+    addToKnowledge: 'Add to Knowledge Base',
+    pleaseSelect: 'Please select',
+    newFolder: 'New Folder',
+    file: 'File',
+    uploadFile: 'Upload File',
+    directory: 'Directory',
+    uploadTitle: 'Click or drag file to this area to upload',
+    uploadDescription:
+      'Support for a single or bulk upload. Strictly prohibited from uploading company data or other banned files.',
+    local: 'Local uploads',
+    s3: 'S3 uploads',
+  },
   footer: {
     profile: 'All rights reserved @ React',
   },
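The reworded `qa` help text above describes the Q&A chunking input rules: Excel files need two header-less columns with questions before answers, CSV/TXT files must be UTF-8 with a TAB separating each question from its answer, lines that break these rules are ignored, and every valid pair becomes its own chunk. As a rough illustration of the CSV/TXT rules only — a minimal sketch, not RAGFlow's actual chunker; the `parseQATsv` function and `QAChunk` type are hypothetical names — the parsing could look like this:

```typescript
// Hypothetical sketch of the CSV/TXT rules from the `qa` help text above:
// UTF-8 input, one "question<TAB>answer" pair per line, malformed lines
// skipped, and each valid pair kept as its own chunk.
interface QAChunk {
  question: string;
  answer: string;
}

export function parseQATsv(content: string): QAChunk[] {
  const chunks: QAChunk[] = [];
  for (const line of content.split(/\r?\n/)) {
    const [question, answer] = line.split('\t');
    if (!question?.trim() || !answer?.trim()) {
      continue; // line does not follow the question<TAB>answer rule: ignore it
    }
    chunks.push({ question: question.trim(), answer: answer.trim() });
  }
  return chunks;
}
```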