Compare commits


68 Commits

b1cd203904 Update version to 0.3.2 (#550)
### What problem does this PR solve?

### Type of change

- [x] Documentation Update

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2024-04-26 09:58:35 +08:00
b75d75e995 fix youdao bug (#551)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-26 09:58:22 +08:00
76c477f211 chore: disable Kibana volume storage in Docker Compose (#548)
### What problem does this PR solve?

Since the Kibana service is not currently in use, the associated volume
'kibanadata' has been commented out in the Docker Compose file. This
change prevents the allocation of unnecessary resources and simplifies
the configuration.

### Type of change

- [x] Refactoring: remove unused Kibana volume storage
2024-04-26 08:54:27 +08:00
1b01c4fe69 Updated badge link (#545)
### What problem does this PR solve?

### Type of change

- [x] Documentation Update
2024-04-25 19:34:21 +08:00
188f3ddfc5 Update version to v0.3.1 (#544)
### What problem does this PR solve?

Update version to v0.3.1

### Type of change

- [x] Documentation Update

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2024-04-25 19:18:04 +08:00
1dcd439c58 feat: add file icon to table of FileManager #345 (#543)
### What problem does this PR solve?

feat: add file icon to table of FileManager #345
fix: modify datasetDescription

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-04-25 19:06:24 +08:00
26003b5076 Add upload file by knowledge base name API. (#539)
### What problem does this PR solve?
Add upload file by knowledge base name API.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update

---------

Co-authored-by: chrysanthemum-boy <fannc@qq.com>
2024-04-25 15:10:19 +08:00
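A minimal client sketch for calling the new endpoint (the `/document/upload` route itself appears in the diff further down); the base URL, route prefix, and token value here are assumptions, not part of the PR:

```python
import requests

BASE_URL = "http://localhost:9380/api"   # hypothetical prefix; adjust to your deployment
API_TOKEN = "ragflow-xxxxxx"             # an API token issued by your RAGFlow instance


def upload_to_kb(path: str, kb_name: str) -> dict:
    """Upload a local file into the knowledge base identified by name."""
    with open(path, "rb") as f:
        resp = requests.post(
            f"{BASE_URL}/document/upload",
            headers={"Authorization": f"Bearer {API_TOKEN}"},
            data={"kb_name": kb_name},  # the route validates this form field
            files={"file": f},          # the route reads the 'file' multipart part
            timeout=60,
        )
    resp.raise_for_status()
    return resp.json()


print(upload_to_kb("report.pdf", "my_kb"))
```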
4130e5c5e5 Updated badge (#540)
### What problem does this PR solve?

### Type of change

- [x] Documentation Update
2024-04-25 15:08:57 +08:00
d0af2f92f2 Added release badge (#538)
### What problem does this PR solve?

### Type of change

- [x] Documentation Update
2024-04-25 14:31:54 +08:00
66f8d35632 Refactor (#537)
### What problem does this PR solve?

### Type of change

- [x] Refactoring
2024-04-25 14:14:28 +08:00
cf9b554c3a there's no need to connect to Redis in order to use Redis (#536)
### What problem does this PR solve?

### Type of change

- [x] Documentation Update
2024-04-25 14:01:39 +08:00
aeabc0c9a4 Add disk requirements to the README (#535)
### What problem does this PR solve?

Add disk requirements to the README

### Type of change

- [x] Documentation Update

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2024-04-25 14:00:48 +08:00
9db44da992 Add docker support for OpenCloudOS 9 (#526)
### What problem does this PR solve?

This PR aims to add support for running Ragflow on Docker with the
OpenCloudOS 9 distribution.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Co-authored-by: edwardewang <edwardewang@tencent.com>
2024-04-25 08:46:53 +08:00
51e7697df7 feat: upload file in FileManager #345 (#529)
### What problem does this PR solve?

feat: upload file in FileManager #345 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-04-25 08:46:18 +08:00
b06d6395bb Updated minimum RAM capacity (#528)
### What problem does this PR solve?


### Type of change

- [x] Documentation Update
2024-04-24 19:22:00 +08:00
b79f0b0cac Add .DS_Store and docker/ragflow-logs to the git ignore list (#523)
### What problem does this PR solve?

Ignore temporary files to help Mac developers.

### Type of change


- [x] Other (please describe):

Co-authored-by: PLIX870I <plix870i@V-SPDT-XIAOHUI-MB.local>
2024-04-24 17:05:01 +08:00
fe51488973 editorial updates (#525)
### What problem does this PR solve?


### Type of change

- [x] Documentation Update
2024-04-24 17:04:23 +08:00
5d1803c31d Add an entry in Debugging section (#481)
### What problem does this PR solve?

_Add an entry in Debugging section._

### Type of change

- [x] Documentation Update

---------

Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2024-04-24 12:21:41 +08:00
bd76a82c1f Update conversation_api.md (#489)
### What problem does this PR solve?

Fixed a spelling error:
save -> safe

### Type of change

- [x] Documentation Update
2024-04-24 12:21:14 +08:00
2bc9a7cc18 Add Chinese readme for DeepDoc (#515)
### What problem does this PR solve?

Add Chinese explanation for deepdoc

### Type of change

- [x] Documentation Update
2d228dbf7f feat: create folder #345 (#518)
### What problem does this PR solve?

feat: create folder
feat: ensure that all files in the current folder can be correctly
requested after renaming the folder
#345 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-04-24 11:07:22 +08:00
369400c483 fix table bug in docx (#510)
### What problem does this PR solve?
#509 
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-23 19:10:33 +08:00
6405041b4d fix: cannot save the system model setting #468 (#508)
### What problem does this PR solve?

fix: cannot save the system model setting #468
feat: rename file in FileManager
feat: add FileManager
feat: override useSelector type

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-23 17:46:56 +08:00
aa71462a9f fix bug #502 (#504)
### What problem does this PR solve?

#502 
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-23 16:01:46 +08:00
72384b191d Add .doc file parser. (#497)
### What problem does this PR solve?
Add `.doc` file parser, using tika.
```
pip install tika
```
```
from io import BytesIO

from tika import parser  # tika-python needs a Java runtime; it starts a local Tika server on first use


def extract_text_from_doc_bytes(doc_bytes):
    # Wrap the raw bytes in a file-like object and let Tika detect the format.
    file_like_object = BytesIO(doc_bytes)
    parsed = parser.from_buffer(file_like_object)
    return parsed["content"]
```
### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: chrysanthemum-boy <fannc@qq.com>
2024-04-23 15:31:43 +08:00
0dfc8ddc0f enlarge docker memory usage (#501)
### What problem does this PR solve?

### Type of change

- [x] Refactoring
2024-04-23 14:41:10 +08:00
78402d9a57 enlarge docker memory usage (#496)
### What problem does this PR solve?

### Type of change


- [x] Refactoring
2024-04-23 10:28:09 +08:00
b448c212ee Adjust the structure of FAQ (#479)
### Type of change

- [x] Documentation Update
2024-04-22 16:51:28 +08:00
0aaade088b .doc files are not supported yet; fix the regular expression so the right message can be alerted (#487)

### What problem does this PR solve?

.doc files are not supported yet; fix the regular expression so that the
right message can be alerted.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue): issue #474
2024-04-22 16:44:20 +08:00
a38e163035 remove doc from supported processing types (#488)
### What problem does this PR solve?
#474 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-22 15:46:09 +08:00
3610e1e5b4 fix ollama issue (#486)
### What problem does this PR solve?

#477 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-22 15:13:01 +08:00
11949f9f2e feat: support markdown files (#483)
parse markdown files as txt

### What problem does this PR solve?

support markdown files

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-04-22 14:43:36 +08:00
b8e58fe27a add Redis to accelerate access to MinIO (#482)
### What problem does this PR solve?

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-04-22 14:11:09 +08:00
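The caching code itself is not shown in this diff, but the idea is a read-through cache in front of the object store. A rough sketch under that assumption, reusing the `redis` connection settings that appear in service_conf.yaml further down:

```python
import redis  # redis-py client; connection values mirror service_conf.yaml

r = redis.Redis(host="redis", port=6379, db=1, password="infini_rag_flow")


def get_object_cached(bucket: str, key: str, fetch) -> bytes:
    """Serve hot objects from Redis; fall back to MinIO via the `fetch` callable."""
    cache_key = f"{bucket}/{key}"
    blob = r.get(cache_key)
    if blob is None:
        blob = fetch(bucket, key)        # e.g. MINIO.get(bucket, key)
        r.set(cache_key, blob, ex=3600)  # keep the hot copy for an hour
    return blob
```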
fc87c20bd8 fix: 🐛 Fix duplicate ports in docker-compose (#472)
### What problem does this PR solve?

Fix duplicate ports in docker-compose

![image](https://github.com/infiniflow/ragflow/assets/54298540/32649b74-97dc-4004-b9aa-ac5e77b368a5)


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-21 22:46:07 +08:00
dee6299ddf Update format (#467)
### What problem does this PR solve?

Update README format

### Type of change

- [x] Documentation Update

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2024-04-19 20:13:39 +08:00
101df2b470 Refine conversation docs (#465)
### What problem does this PR solve?

### Type of change

- [x] Documentation Update
2024-04-19 19:15:00 +08:00
c055f40dff feat: #345 even if the backend data returns empty, the skeleton of the chart will be displayed. (#461)

### What problem does this PR solve?

feat: #345 even if the backend data returns empty, the skeleton of the
chart will be displayed.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-04-19 19:05:30 +08:00
7da3f88e54 refine docs for chat bot api. (#463)
### What problem does this PR solve?

### Type of change

- [x] Documentation Update
2024-04-19 19:05:15 +08:00
10b79effab trivial updates (#462)
### What problem does this PR solve?

### Type of change

- [x] Documentation Update
2024-04-19 18:54:24 +08:00
7e41b4bc94 change readme for 0.3.0 release (#459)
### What problem does this PR solve?


### Type of change

- [x] Documentation Update
2024-04-19 18:19:15 +08:00
ed6081845a Support a wide range of encodings for text files. (#458)
### What problem does this PR solve?

#384

### Type of change

- [x] Performance Improvement
2024-04-19 18:02:53 +08:00
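The diff further down routes text decoding through a `find_codec` helper whose implementation is not shown here. A rough equivalent built on `chardet` (an assumption; RAGFlow's helper may well differ) could look like:

```python
import chardet  # hypothetical stand-in for RAGFlow's find_codec helper


def find_codec_sketch(blob: bytes) -> str:
    """Guess the encoding of raw bytes, falling back to utf-8."""
    guess = chardet.detect(blob[:8192])  # sampling the head is usually enough
    if guess["encoding"] and guess["confidence"] > 0.5:
        return guess["encoding"]
    return "utf-8"


raw = open("notes.txt", "rb").read()
text = raw.decode(find_codec_sketch(raw), errors="ignore")
```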
cda7b607cb feat: translate EmbedModal #345 (#455)
### What problem does this PR solve?

Embed the chat window into other websites through iframe

#345 

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-04-19 16:55:23 +08:00
962c66714e fix divide by zero bug (#447)
### What problem does this PR solve?

#445 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-19 11:26:38 +08:00
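The actual fix is visible in the api_app.py hunk further down: the per-day speed is computed with a small epsilon added to the duration, so empty buckets no longer divide by zero. Conceptually:

```python
def tokens_per_second(tokens: float, duration: float) -> float:
    # The +0.1 epsilon avoids ZeroDivisionError for buckets with no timed
    # requests; its skew on realistic durations is negligible.
    return float(tokens) / (float(duration) + 0.1)
```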
39f1feaccb Fix index-out-of-range bug in PDF parsing (#440)
### What problem does this PR solve?

fix a bug that occurs when parsing some PDF files #436 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-19 08:44:51 +08:00
1dada69daa fix: replace some pictures of chunk method #437 (#438)
### What problem does this PR solve?

some chunk method pictures are not in English #437

feat: set the height of both html and body to 100%
feat: add SharedChat
feat: add shared hooks

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-18 19:27:53 +08:00
fe2f5205fc add LF line endings in *.sh (#425)
### What problem does this PR solve?

link #279 #266 

### Type of change

- [x] Documentation Update

---------

Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
2024-04-18 17:17:54 +08:00
ac574af60a Add env to expose minio port to the host (#426)
### What problem does this PR solve?

The docker-compose file couldn't configure the MinIO-related ports via the
.env file, so I added `MINIO_CONSOLE_PORT=9001` and `MINIO_PORT=9000` to
the .env file.

### Type of change

- [x] Refactoring
2024-04-18 15:45:09 +08:00
0499a3f621 rm page number exception for pdf parser (#424)
### What problem does this PR solve?

#423 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-18 12:09:56 +08:00
453c29170f make sure the models will not be loaded twice (#422)
### What problem does this PR solve?

#381 
### Type of change

- [x] Refactoring
2024-04-18 09:37:23 +08:00
e8570da856 Update table.py to convert columns to string (#414)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-17 19:48:11 +08:00
dd7559a009 Update PR template (#415)
### What problem does this PR solve?

Update PR template

### Type of change

- [x] Documentation Update

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2024-04-17 16:43:08 +08:00
3719ff7299 Added some debugging FAQs (#413)
### What problem does this PR solve?

### Type of change

- [x] Documentation Update
2024-04-17 16:32:36 +08:00
800b5c7aaa fix bulk error for table method (#407)
### What problem does this PR solve?


Issue link: #366

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-17 12:17:14 +08:00
f12f30bb7b Add automation scripts to support displaying environment information such as RAGFlow repository version, operating system, Python version, etc. in a Linux environment for users to report issues. (#396)
### What problem does this PR solve?
Add automation scripts to support displaying environment information
such as RAGFlow repository version, operating system, Python version,
etc. in a Linux environment for users to report issues.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-04-17 11:54:06 +08:00
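A minimal Python sketch of the same idea; the scripts shipped in the PR are not shown in this diff and may collect more fields:

```python
import platform
import subprocess
import sys


def report_environment() -> None:
    """Print the basics an issue report needs: OS, Python, and repo version."""
    print("OS:", platform.platform())
    print("Python:", sys.version.split()[0])
    try:
        commit = subprocess.check_output(
            ["git", "rev-parse", "--short", "HEAD"], text=True
        ).strip()
        print("RAGFlow commit:", commit)
    except (OSError, subprocess.CalledProcessError):
        print("RAGFlow commit: unavailable (not a git checkout)")


report_environment()
```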
30846c83b2 feat: modify the description of qa (#406)
### What problem does this PR solve?

feat: modify the description of qa

Issue link: #405

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-17 11:51:01 +08:00
2afe7a74b3 Added FAQs (#395)
### What problem does this PR solve?

### Type of change

- [x] Documentation Update
2024-04-16 19:51:20 +08:00
d4e0bfc8a5 fix gb2312 encoding issue (#394)
### What problem does this PR solve?

Issue link: #384
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-16 19:45:14 +08:00
044daff668 Fix document bug (#393)
### What problem does this PR solve?

As title

### Type of change

- [x] Documentation Update

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2024-04-16 19:23:09 +08:00
03f8b01b3b fix bug for FastEmbed (#392)
### What problem does this PR solve?

Issue link: #325

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-16 19:12:12 +08:00
ad6f0a1ce5 feat: add overview (#391)
### What problem does this PR solve?

feat: render stats charts
feat: create api token
feat: delete api token
feat: add ChatApiKeyModal
feat: add RagLineChart


Issue link: #345

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-04-16 19:06:47 +08:00
b3843138f4 change version number in readme (#390)
### What problem does this PR solve?


### Type of change

- [x] Documentation Update
2024-04-16 19:04:29 +08:00
e0bdcbbeba fix: revert #359 (#388)
### What problem does this PR solve?

![图片](https://github.com/infiniflow/ragflow/assets/106524776/e62dc04d-dd72-4ef6-ab1f-a2a219dc197a)

revert #359

Issue link:#359

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-16 18:55:51 +08:00
582340a184 Added FAQ: why document parsing gets stuck (#385)
### What problem does this PR solve?


### Type of change


- [x] Documentation Update
2024-04-16 16:55:44 +08:00
890561703b Add bce-embedding and fastembed (#383)
### What problem does this PR solve?


Issue link: #326

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-04-16 16:42:19 +08:00
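For reference, a minimal FastEmbed usage sketch; the `TextEmbedding` interface and the model name are assumptions about the fastembed package, not code from this PR:

```python
from fastembed import TextEmbedding  # assumes the fastembed >= 0.2 interface

model = TextEmbedding(model_name="BAAI/bge-small-en-v1.5")  # hypothetical model choice
vectors = list(model.embed(["What is RAGFlow?", "RAGFlow is a RAG engine."]))
print(len(vectors), vectors[0].shape)  # one embedding per input text
```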
a7be5d4e8b build ragflow image from scratch (#376)
### What problem does this PR solve?

issue: #205 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-04-16 12:29:58 +08:00
c344486aa0 enlarge the per-user maximum file number limit (#373)
### What problem does this PR solve?

Issue link: #370

### Type of change

- [x] Refactoring
2024-04-16 10:00:52 +08:00
111501af5e make `<xxxx>` visible (#369)
### What problem does this PR solve?


![image](https://github.com/infiniflow/ragflow/assets/106524776/0c526a56-05b1-42f8-8bf5-cb23a97183b8)

make `<xxxx>` visible; it was previously misinterpreted as part of the HTML tags

![image](https://github.com/infiniflow/ragflow/assets/106524776/1c42aef0-6989-40c1-b129-47a835b038a7)

Issue link: None

### Type of change

- [x] Documentation Update
2024-04-16 09:47:57 +08:00
9e75bd4d88 change version number (#368)
### What problem does this PR solve?

Issue link: for release v0.1.0

### Type of change

- [x] Documentation Update
- [x] Refactoring
2024-04-15 19:51:18 +08:00
148 changed files with 5916 additions and 1162 deletions

.gitattributes (vendored, new file, 1 line changed)

@ -0,0 +1 @@
*.sh text eol=lf


@ -1,5 +1,5 @@
name: Bug Report
description: Create a bug issue for infinity
description: Create a bug issue for RAGFlow
title: "[Bug]: "
labels: [bug]
body:


@ -1,7 +1,7 @@
---
name: Feature request
title: '[Feature Request]: '
about: Suggest an idea for Infinity
about: Suggest an idea for RAGFlow
labels: ''
---


@ -1,5 +1,5 @@
name: Feature request
description: Propose a feature request for infinity.
description: Propose a feature request for RAGFlow.
title: "[Feature Request]: "
labels: [feature request]
body:


@ -1,5 +1,5 @@
name: Question
description: Ask questions on infinity
description: Ask questions on RAGFlow
title: "[Question]: "
labels: [question]
body:


@ -1,5 +1,5 @@
name: Subtask
description: "Propose a subtask for infinity"
description: "Propose a subtask for RAGFlow"
title: "[Subtask]: "
labels: [subtask]


@ -2,16 +2,11 @@
_Briefly describe what this PR aims to solve. Include background context that will help reviewers understand the purpose of the PR._
Issue link:#[Link the issue here]
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Breaking Change (fix or feature that could cause existing functionality not to work as expected)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Test cases
- [ ] Python SDK impacted, Need to update PyPI
- [ ] Other (please describe):

.gitignore (vendored, 6 lines changed)

@ -21,3 +21,9 @@ Cargo.lock
.idea/
.vscode/
# Exclude Mac generated files
.DS_Store
# Exclude the log folder
docker/ragflow-logs/


@ -1,20 +1,20 @@
FROM swr.cn-north-4.myhuaweicloud.com/infiniflow/ragflow-base:v1.0
USER root
WORKDIR /ragflow
ADD ./web ./web
RUN cd ./web && npm i && npm run build
ADD ./api ./api
ADD ./conf ./conf
ADD ./deepdoc ./deepdoc
ADD ./rag ./rag
ENV PYTHONPATH=/ragflow/
ENV HF_ENDPOINT=https://hf-mirror.com
ADD docker/entrypoint.sh ./entrypoint.sh
RUN chmod +x ./entrypoint.sh
ENTRYPOINT ["./entrypoint.sh"]
FROM swr.cn-north-4.myhuaweicloud.com/infiniflow/ragflow-base:v1.0
USER root
WORKDIR /ragflow
ADD ./web ./web
RUN cd ./web && npm i && npm run build
ADD ./api ./api
ADD ./conf ./conf
ADD ./deepdoc ./deepdoc
ADD ./rag ./rag
ENV PYTHONPATH=/ragflow/
ENV HF_ENDPOINT=https://hf-mirror.com
ADD docker/entrypoint.sh ./entrypoint.sh
RUN chmod +x ./entrypoint.sh
ENTRYPOINT ["./entrypoint.sh"]

Dockerfile.scratch (new file, 54 lines)

@ -0,0 +1,54 @@
FROM ubuntu:22.04
USER root
WORKDIR /ragflow
RUN apt-get update && apt-get install -y wget curl build-essential libopenmpi-dev
RUN wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh && \
bash ~/miniconda.sh -b -p /root/miniconda3 && \
rm ~/miniconda.sh && ln -s /root/miniconda3/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \
echo ". /root/miniconda3/etc/profile.d/conda.sh" >> ~/.bashrc && \
echo "conda activate base" >> ~/.bashrc
ENV PATH /root/miniconda3/bin:$PATH
RUN conda create -y --name py11 python=3.11
ENV CONDA_DEFAULT_ENV py11
ENV CONDA_PREFIX /root/miniconda3/envs/py11
ENV PATH $CONDA_PREFIX/bin:$PATH
RUN curl -sL https://deb.nodesource.com/setup_14.x | bash -
RUN apt-get install -y nodejs
RUN apt-get install -y nginx
ADD ./web ./web
ADD ./api ./api
ADD ./conf ./conf
ADD ./deepdoc ./deepdoc
ADD ./rag ./rag
ADD ./requirements.txt ./requirements.txt
RUN apt install openmpi-bin openmpi-common libopenmpi-dev
ENV LD_LIBRARY_PATH /usr/lib/x86_64-linux-gnu/openmpi/lib:$LD_LIBRARY_PATH
RUN rm /root/miniconda3/envs/py11/compiler_compat/ld
RUN cd ./web && npm i && npm run build
RUN conda run -n py11 pip install -i https://mirrors.aliyun.com/pypi/simple/ -r ./requirements.txt
RUN apt-get update && \
apt-get install -y libglib2.0-0 libgl1-mesa-glx && \
rm -rf /var/lib/apt/lists/*
RUN conda run -n py11 pip install -i https://mirrors.aliyun.com/pypi/simple/ ollama
RUN conda run -n py11 python -m nltk.downloader punkt
RUN conda run -n py11 python -m nltk.downloader wordnet
ENV PYTHONPATH=/ragflow/
ENV HF_ENDPOINT=https://hf-mirror.com
ADD docker/entrypoint.sh ./entrypoint.sh
RUN chmod +x ./entrypoint.sh
ENTRYPOINT ["./entrypoint.sh"]

Dockerfile.scratch.oc9 (new file, 56 lines)

@ -0,0 +1,56 @@
FROM opencloudos/opencloudos:9.0
USER root
WORKDIR /ragflow
RUN dnf update -y && dnf install -y wget curl gcc-c++ openmpi-devel
RUN wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh && \
bash ~/miniconda.sh -b -p /root/miniconda3 && \
rm ~/miniconda.sh && ln -s /root/miniconda3/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \
echo ". /root/miniconda3/etc/profile.d/conda.sh" >> ~/.bashrc && \
echo "conda activate base" >> ~/.bashrc
ENV PATH /root/miniconda3/bin:$PATH
RUN conda create -y --name py11 python=3.11
ENV CONDA_DEFAULT_ENV py11
ENV CONDA_PREFIX /root/miniconda3/envs/py11
ENV PATH $CONDA_PREFIX/bin:$PATH
# RUN curl -sL https://rpm.nodesource.com/setup_14.x | bash -
RUN dnf install -y nodejs
RUN dnf install -y nginx
ADD ./web ./web
ADD ./api ./api
ADD ./conf ./conf
ADD ./deepdoc ./deepdoc
ADD ./rag ./rag
ADD ./requirements.txt ./requirements.txt
RUN dnf install -y openmpi openmpi-devel python3-openmpi
ENV C_INCLUDE_PATH /usr/include/openmpi-x86_64:$C_INCLUDE_PATH
ENV LD_LIBRARY_PATH /usr/lib64/openmpi/lib:$LD_LIBRARY_PATH
RUN rm /root/miniconda3/envs/py11/compiler_compat/ld
RUN cd ./web && npm i && npm run build
RUN conda run -n py11 pip install $(grep -ivE "mpi4py" ./requirements.txt) # without mpi4py==3.1.5
RUN conda run -n py11 pip install redis
RUN dnf update -y && \
dnf install -y glib2 mesa-libGL && \
dnf clean all
RUN conda run -n py11 pip install ollama
RUN conda run -n py11 python -m nltk.downloader punkt
RUN conda run -n py11 python -m nltk.downloader wordnet
ENV PYTHONPATH=/ragflow/
ENV HF_ENDPOINT=https://hf-mirror.com
ADD docker/entrypoint.sh ./entrypoint.sh
RUN chmod +x ./entrypoint.sh
ENTRYPOINT ["./entrypoint.sh"]


@ -11,11 +11,14 @@
</p>
<p align="center">
<a href="https://github.com/infiniflow/ragflow/releases/latest">
<img src="https://img.shields.io/github/v/release/infiniflow/ragflow?color=blue&label=Latest%20Release" alt="Latest Release">
</a>
<a href="https://demo.ragflow.io" target="_blank">
<img alt="Static Badge" src="https://img.shields.io/badge/RAGFLOW-LLM-white?&labelColor=dd0af7"></a>
<a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank">
<img src="https://img.shields.io/badge/docker_pull-ragflow:v1.0-brightgreen"
alt="docker pull ragflow:v1.0"></a>
<img src="https://img.shields.io/badge/docker_pull-ragflow:v0.3.2-brightgreen"
alt="docker pull infiniflow/ragflow:v0.3.2"></a>
<a href="https://github.com/infiniflow/ragflow/blob/main/LICENSE">
<img height="21" src="https://img.shields.io/badge/License-Apache--2.0-ffffff?style=flat-square&labelColor=d4eaf7&color=7d09f1" alt="license">
</a>
@ -55,6 +58,9 @@
## 📌 Latest Features
- 2024-04-19 Support conversation API ([detail](./docs/conversation_api.md)).
- 2024-04-16 Add an embedding model 'bce-embedding-base_v1' from [BCEmbedding](https://github.com/netease-youdao/BCEmbedding).
- 2024-04-16 Add [FastEmbed](https://github.com/qdrant/fastembed), which is designed specifically for light and speedy embedding.
- 2024-04-11 Support [Xinference](./docs/xinference.md) for local LLM deployment.
- 2024-04-10 Add a new layout recognition model for analyzing Laws documentation.
- 2024-04-08 Support [Ollama](./docs/ollama.md) for local LLM deployment.
@ -70,8 +76,9 @@
### 📝 Prerequisites
- CPU >= 2 cores
- RAM >= 8 GB
- CPU >= 4 cores
- RAM >= 16 GB
- Disk >= 50 GB
- Docker >= 24.0.0 & Docker Compose >= v2.26.1
> If you have not installed Docker on your local machine (Windows, Mac, or Linux), see [Install Docker Engine](https://docs.docker.com/engine/install/).
@ -135,9 +142,10 @@
* Running on http://x.x.x.x:9380
INFO:werkzeug:Press CTRL+C to quit
```
> If you skip this confirmation step and directly log in to RAGFlow, your browser may prompt a `network anomaly` error because, at that moment, your RAGFlow may not be fully initialized.
5. In your web browser, enter the IP address of your server and log in to RAGFlow.
> In the given scenario, you only need to enter `http://IP_OF_YOUR_MACHINE` (sans port number) as the default HTTP serving port `80` can be omitted when using the default configurations.
> With default settings, you only need to enter `http://IP_OF_YOUR_MACHINE` (**sans** port number) as the default HTTP serving port `80` can be omitted when using the default configurations.
6. In [service_conf.yaml](./docker/service_conf.yaml), select the desired LLM factory in `user_default_llm` and update the `API_KEY` field with the corresponding API key.
> See [./docs/llm_api_key_setup.md](./docs/llm_api_key_setup.md) for more information.
@ -171,7 +179,7 @@ To build the Docker images from source:
```bash
$ git clone https://github.com/infiniflow/ragflow.git
$ cd ragflow/
$ docker build -t infiniflow/ragflow:v1.0 .
$ docker build -t infiniflow/ragflow:v0.3.2 .
$ cd ragflow/docker
$ chmod +x ./entrypoint.sh
$ docker compose up -d


@ -11,11 +11,14 @@
</p>
<p align="center">
<a href="https://github.com/infiniflow/ragflow/releases/latest">
<img src="https://img.shields.io/github/v/release/infiniflow/ragflow?color=blue&label=Latest%20Release" alt="Latest Release">
</a>
<a href="https://demo.ragflow.io" target="_blank">
<img alt="Static Badge" src="https://img.shields.io/badge/RAGFLOW-LLM-white?&labelColor=dd0af7"></a>
<a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank">
<img src="https://img.shields.io/badge/docker_pull-ragflow:v1.0-brightgreen"
alt="docker pull ragflow:v1.0"></a>
<img src="https://img.shields.io/badge/docker_pull-ragflow:v0.3.2-brightgreen"
alt="docker pull infiniflow/ragflow:v0.3.2"></a>
<a href="https://github.com/infiniflow/ragflow/blob/main/LICENSE">
<img height="21" src="https://img.shields.io/badge/License-Apache--2.0-ffffff?style=flat-square&labelColor=d4eaf7&color=7d09f1" alt="license">
</a>
@ -55,6 +58,9 @@
## 📌 Latest Features
- 2024-04-19 Support conversation API ([detail](./docs/conversation_api.md)).
- 2024-04-16 Add an embedding model 'bce-embedding-base_v1' from [BCEmbedding](https://github.com/netease-youdao/BCEmbedding).
- 2024-04-16 Add [FastEmbed](https://github.com/qdrant/fastembed), which is designed for light and speedy embedding.
- 2024-04-11 Support [Xinference](./docs/xinference.md) for local LLM deployment.
- 2024-04-10 Add a new layout recognition model for the 'Laws' method.
- 2024-04-08 Support local LLM deployment via [Ollama](./docs/ollama.md).
@ -70,8 +76,9 @@
### 📝 Prerequisites
- CPU >= 2 cores
- RAM >= 8 GB
- CPU >= 4 cores
- RAM >= 16 GB
- Disk >= 50 GB
- Docker >= 24.0.0 & Docker Compose >= v2.26.1
> If you have not installed Docker on your local machine (Windows, Mac, or Linux), see [Install Docker Engine](https://docs.docker.com/engine/install/).
@ -135,6 +142,7 @@
* Running on http://x.x.x.x:9380
INFO:werkzeug:Press CTRL+C to quit
```
> If you skip this confirmation step and log in to RAGFlow directly, your browser may report a network anomaly error because RAGFlow may not be fully initialized at that point.
5. In your web browser, enter the IP address of your server as prompted and log in to RAGFlow.
> When using the default settings, you only need to enter `http://IP_OF_YOUR_MACHINE` (omitting the port number), since the default HTTP serving port `80` can be left out.
@ -171,7 +179,7 @@
```bash
$ git clone https://github.com/infiniflow/ragflow.git
$ cd ragflow/
$ docker build -t infiniflow/ragflow:v1.0 .
$ docker build -t infiniflow/ragflow:v0.3.2 .
$ cd ragflow/docker
$ chmod +x ./entrypoint.sh
$ docker compose up -d


@ -11,11 +11,14 @@
</p>
<p align="center">
<a href="https://github.com/infiniflow/ragflow/releases/latest">
<img src="https://img.shields.io/github/v/release/infiniflow/ragflow?color=blue&label=Latest%20Release" alt="Latest Release">
</a>
<a href="https://demo.ragflow.io" target="_blank">
<img alt="Static Badge" src="https://img.shields.io/badge/RAGFLOW-LLM-white?&labelColor=dd0af7"></a>
<a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank">
<img src="https://img.shields.io/badge/docker_pull-ragflow:v1.0-brightgreen"
alt="docker pull ragflow:v1.0"></a>
<img src="https://img.shields.io/badge/docker_pull-ragflow:v0.3.2-brightgreen"
alt="docker pull infiniflow/ragflow:v0.3.2"></a>
<a href="https://github.com/infiniflow/ragflow/blob/main/LICENSE">
<img height="21" src="https://img.shields.io/badge/License-Apache--2.0-ffffff?style=flat-square&labelColor=d4eaf7&color=7d09f1" alt="license">
</a>
@ -55,6 +58,9 @@
## 📌 Latest Features
- 2024-04-19 Support conversation API ([detail](./docs/conversation_api.md)).
- 2024-04-16 Add an embedding model from [BCEmbedding](https://github.com/netease-youdao/BCEmbedding).
- 2024-04-16 Add [FastEmbed](https://github.com/qdrant/fastembed), which is designed for light and speedy embedding.
- 2024-04-11 Support local LLM deployment with [Xinference](./docs/xinference.md).
- 2024-04-10 Add an underlying model for 'Laws' layout analysis.
- 2024-04-08 Support local LLM deployment with [Ollama](./docs/ollama.md).
@ -70,8 +76,9 @@
### 📝 Prerequisites
- CPU >= 2
- RAM >= 8 GB
- CPU >= 4
- RAM >= 16 GB
- Disk >= 50 GB
- Docker >= 24.0.0 & Docker Compose >= v2.26.1
> If you have not installed Docker on your local machine (Windows, Mac, or Linux), see [Install Docker Engine](https://docs.docker.com/engine/install/).
@ -135,6 +142,7 @@
* Running on http://x.x.x.x:9380
INFO:werkzeug:Press CTRL+C to quit
```
> If you skip this confirmation step and log in to RAGFlow directly, your browser may prompt a `network anomaly` error because RAGFlow may not have fully started.
5. In your browser, enter the IP address of your server and log in to RAGFlow.
> In the example above, you only need to enter http://IP_OF_YOUR_MACHINE: if the configuration is unchanged, there is no need to enter the port (the default HTTP serving port is 80).
@ -171,7 +179,7 @@
```bash
$ git clone https://github.com/infiniflow/ragflow.git
$ cd ragflow/
$ docker build -t infiniflow/ragflow:v1.0 .
$ docker build -t infiniflow/ragflow:v0.3.2 .
$ cd ragflow/docker
$ chmod +x ./entrypoint.sh
$ docker compose up -d


@ -54,7 +54,7 @@ app.errorhandler(Exception)(server_error_response)
#app.config["LOGIN_DISABLED"] = True
app.config["SESSION_PERMANENT"] = False
app.config["SESSION_TYPE"] = "filesystem"
app.config['MAX_CONTENT_LENGTH'] = os.environ.get("MAX_CONTENT_LENGTH", 128 * 1024 * 1024)
app.config['MAX_CONTENT_LENGTH'] = int(os.environ.get("MAX_CONTENT_LENGTH", 128 * 1024 * 1024))
Session(app)
login_manager = LoginManager()


@ -13,18 +13,28 @@
# See the License for the specific language governing permissions and
# limitations under the License.
#
import os
import re
from datetime import datetime, timedelta
from flask import request
from flask_login import login_required, current_user
from api.db import FileType, ParserType
from api.db.db_models import APIToken, API4Conversation
from api.db.services import duplicate_name
from api.db.services.api_service import APITokenService, API4ConversationService
from api.db.services.dialog_service import DialogService, chat
from api.db.services.document_service import DocumentService
from api.db.services.knowledgebase_service import KnowledgebaseService
from api.db.services.user_service import UserTenantService
from api.settings import RetCode
from api.utils import get_uuid, current_timestamp, datetime_format
from api.utils.api_utils import server_error_response, get_data_error_result, get_json_result, validate_request
from itsdangerous import URLSafeTimedSerializer
from api.utils.file_utils import filename_type, thumbnail
from rag.utils import MINIO
def generate_confirmation_token(tenent_id):
serializer = URLSafeTimedSerializer(tenent_id)
@ -105,8 +115,8 @@ def stats():
res = {
"pv": [(o["dt"], o["pv"]) for o in objs],
"uv": [(o["dt"], o["uv"]) for o in objs],
"speed": [(o["dt"], o["tokens"]/o["duration"]) for o in objs],
"tokens": [(o["dt"], o["tokens"]/1000.) for o in objs],
"speed": [(o["dt"], float(o["tokens"])/(float(o["duration"]+0.1))) for o in objs],
"tokens": [(o["dt"], float(o["tokens"])/1000.) for o in objs],
"round": [(o["dt"], o["round"]) for o in objs],
"thumb_up": [(o["dt"], o["thumb_up"]) for o in objs]
}
@ -115,8 +125,7 @@ def stats():
return server_error_response(e)
@manager.route('/new_conversation', methods=['POST'])
@validate_request("user_id")
@manager.route('/new_conversation', methods=['GET'])
def set_conversation():
token = request.headers.get('Authorization').split()[1]
objs = APIToken.query(token=token)
@ -131,7 +140,7 @@ def set_conversation():
conv = {
"id": get_uuid(),
"dialog_id": dia.id,
"user_id": req["user_id"],
"user_id": request.args.get("user_id", ""),
"message": [{"role": "assistant", "content": dia.prompt_config["prologue"]}]
}
API4ConversationService.save(**conv)
@ -177,7 +186,6 @@ def completion():
conv.reference.append(ans["reference"])
conv.message.append({"role": "assistant", "content": ans["answer"]})
API4ConversationService.append_message(conv.id, conv.to_dict())
APITokenService.APITokenService(token)
return get_json_result(data=ans)
except Exception as e:
return server_error_response(e)
@ -193,4 +201,74 @@ def get(conversation_id):
return get_json_result(data=conv.to_dict())
except Exception as e:
return server_error_response(e)
return server_error_response(e)
@manager.route('/document/upload', methods=['POST'])
@validate_request("kb_name")
def upload():
token = request.headers.get('Authorization').split()[1]
objs = APIToken.query(token=token)
if not objs:
return get_json_result(
data=False, retmsg='Token is not valid!"', retcode=RetCode.AUTHENTICATION_ERROR)
kb_name = request.form.get("kb_name").strip()
tenant_id = objs[0].tenant_id
try:
e, kb = KnowledgebaseService.get_by_name(kb_name, tenant_id)
if not e:
return get_data_error_result(
retmsg="Can't find this knowledgebase!")
kb_id = kb.id
except Exception as e:
return server_error_response(e)
if 'file' not in request.files:
return get_json_result(
data=False, retmsg='No file part!', retcode=RetCode.ARGUMENT_ERROR)
file = request.files['file']
if file.filename == '':
return get_json_result(
data=False, retmsg='No file selected!', retcode=RetCode.ARGUMENT_ERROR)
try:
if DocumentService.get_doc_count(kb.tenant_id) >= int(os.environ.get('MAX_FILE_NUM_PER_USER', 8192)):
return get_data_error_result(
retmsg="Exceed the maximum file number of a free user!")
filename = duplicate_name(
DocumentService.query,
name=file.filename,
kb_id=kb_id)
filetype = filename_type(filename)
if not filetype:
return get_data_error_result(
retmsg="This type of file has not been supported yet!")
location = filename
while MINIO.obj_exist(kb_id, location):
location += "_"
blob = request.files['file'].read()
MINIO.put(kb_id, location, blob)
doc = {
"id": get_uuid(),
"kb_id": kb.id,
"parser_id": kb.parser_id,
"parser_config": kb.parser_config,
"created_by": kb.tenant_id,
"type": filetype,
"name": filename,
"location": location,
"size": len(blob),
"thumbnail": thumbnail(filename, blob)
}
if doc["type"] == FileType.VISUAL:
doc["parser_id"] = ParserType.PICTURE.value
if re.search(r"\.(ppt|pptx|pages)$", filename):
doc["parser_id"] = ParserType.PRESENTATION.value
doc = DocumentService.insert(doc)
return get_json_result(data=doc.to_json())
except Exception as e:
return server_error_response(e)
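Taken together, the hunks above turn `/new_conversation` into a GET endpoint that reads `user_id` from the query string and add the `/document/upload` route. A hedged client sketch for the former (base URL and token are assumptions):

```python
import requests

BASE_URL = "http://localhost:9380/api"                # hypothetical prefix
HEADERS = {"Authorization": "Bearer ragflow-xxxxxx"}  # an API token

# user_id now travels in the query string instead of a POST body.
resp = requests.get(
    f"{BASE_URL}/new_conversation",
    headers=HEADERS,
    params={"user_id": "alice"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```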


@ -252,7 +252,7 @@ def retrieval_test():
return get_data_error_result(retmsg="Knowledgebase not found!")
embd_mdl = TenantLLMService.model_instance(
kb.tenant_id, LLMType.EMBEDDING.value)
kb.tenant_id, LLMType.EMBEDDING.value, llm_name=kb.embd_id)
ranks = retrievaler.retrieval(question, embd_mdl, kb.tenant_id, [kb_id], page, size, similarity_threshold,
vector_similarity_weight, top, doc_ids)
for c in ranks["chunks"]:


@ -15,6 +15,7 @@
#
import base64
import os
import pathlib
import re
@ -57,7 +58,8 @@ def upload():
if not e:
return get_data_error_result(
retmsg="Can't find this knowledgebase!")
if DocumentService.get_doc_count(kb.tenant_id) >= 128:
MAX_FILE_NUM_PER_USER = int(os.environ.get('MAX_FILE_NUM_PER_USER', 0))
if MAX_FILE_NUM_PER_USER > 0 and DocumentService.get_doc_count(kb.tenant_id) >= MAX_FILE_NUM_PER_USER:
return get_data_error_result(
retmsg="Exceed the maximum file number of a free user!")


@ -28,7 +28,7 @@ from rag.llm import EmbeddingModel, ChatModel
def factories():
try:
fac = LLMFactoriesService.get_all()
return get_json_result(data=[f.to_dict() for f in fac])
return get_json_result(data=[f.to_dict() for f in fac if f.name not in ["Youdao", "FastEmbed"]])
except Exception as e:
return server_error_response(e)
@ -174,7 +174,7 @@ def list():
llms = [m.to_dict()
for m in llms if m.status == StatusEnum.VALID.value]
for m in llms:
m["available"] = m["fid"] in facts or m["llm_name"].lower() == "flag-embedding"
m["available"] = m["fid"] in facts or m["llm_name"].lower() == "flag-embedding" or m["fid"] in ["Youdao","FastEmbed"]
llm_set = set([m["llm_name"] for m in llms])
for o in objs:


@ -14,6 +14,7 @@
# limitations under the License.
#
import re
from datetime import datetime
from flask import request, session, redirect
from werkzeug.security import generate_password_hash, check_password_hash
@ -22,7 +23,7 @@ from flask_login import login_required, current_user, login_user, logout_user
from api.db.db_models import TenantLLM
from api.db.services.llm_service import TenantLLMService, LLMService
from api.utils.api_utils import server_error_response, validate_request
from api.utils import get_uuid, get_format_time, decrypt, download_img
from api.utils import get_uuid, get_format_time, decrypt, download_img, current_timestamp, datetime_format
from api.db import UserTenantRole, LLMType
from api.settings import RetCode, GITHUB_OAUTH, CHAT_MDL, EMBEDDING_MDL, ASR_MDL, IMAGE2TEXT_MDL, PARSERS, API_KEY, \
LLM_FACTORY, LLM_BASE_URL
@ -56,6 +57,8 @@ def login():
response_data = user.to_json()
user.access_token = get_uuid()
login_user(user)
user.update_time = current_timestamp(),
user.update_date = datetime_format(datetime.now()),
user.save()
msg = "Welcome back!"
return cors_reponse(data=response_data, auth=user.get_id(), retmsg=msg)


@ -629,7 +629,7 @@ class Document(DataBaseModel):
max_length=128,
null=False,
default="local",
help_text="where dose this document from")
help_text="where dose this document come from")
type = CharField(max_length=32, null=False, help_text="file extension")
created_by = CharField(
max_length=32,
@ -697,7 +697,7 @@ class Dialog(DataBaseModel):
null=True,
default="Chinese",
help_text="English|Chinese")
llm_id = CharField(max_length=32, null=False, help_text="default llm ID")
llm_id = CharField(max_length=128, null=False, help_text="default llm ID")
llm_setting = JSONField(null=False, default={"temperature": 0.1, "top_p": 0.3, "frequency_penalty": 0.7,
"presence_penalty": 0.4, "max_tokens": 215})
prompt_type = CharField(


@ -18,7 +18,7 @@ import time
import uuid
from api.db import LLMType, UserTenantRole
from api.db.db_models import init_database_tables as init_web_db, LLMFactories, LLM
from api.db.db_models import init_database_tables as init_web_db, LLMFactories, LLM, TenantLLM
from api.db.services import UserService
from api.db.services.llm_service import LLMFactoriesService, LLMService, TenantLLMService, LLMBundle
from api.db.services.user_service import TenantService, UserTenantService
@ -114,12 +114,16 @@ factory_infos = [{
"logo": "",
"tags": "TEXT EMBEDDING",
"status": "1",
},
{
}, {
"name": "Xinference",
"logo": "",
"tags": "LLM,TEXT EMBEDDING,SPEECH2TEXT,MODERATION",
"status": "1",
},{
"name": "Youdao",
"logo": "",
"tags": "LLM,TEXT EMBEDDING,SPEECH2TEXT,MODERATION",
"status": "1",
},
# {
# "name": "文心一言",
@ -254,12 +258,6 @@ def init_llm_factory():
"tags": "LLM,CHAT,",
"max_tokens": 7900,
"model_type": LLMType.CHAT.value
}, {
"fid": factory_infos[4]["name"],
"llm_name": "flag-embedding",
"tags": "TEXT EMBEDDING,",
"max_tokens": 128 * 1000,
"model_type": LLMType.EMBEDDING.value
}, {
"fid": factory_infos[4]["name"],
"llm_name": "moonshot-v1-32k",
@ -325,6 +323,14 @@ def init_llm_factory():
"max_tokens": 2147483648,
"model_type": LLMType.EMBEDDING.value
},
# ------------------------ Youdao -----------------------
{
"fid": factory_infos[7]["name"],
"llm_name": "maidalun1020/bce-embedding-base_v1",
"tags": "TEXT EMBEDDING,",
"max_tokens": 512,
"model_type": LLMType.EMBEDDING.value
},
]
for info in factory_infos:
try:
@ -337,9 +343,13 @@ def init_llm_factory():
except Exception as e:
pass
LLMFactoriesService.filter_delete([LLMFactories.name=="Local"])
LLMService.filter_delete([LLM.fid=="Local"])
LLMFactoriesService.filter_delete([LLMFactories.name == "Local"])
LLMService.filter_delete([LLM.fid == "Local"])
LLMService.filter_delete([LLM.fid == "Moonshot", LLM.llm_name == "flag-embedding"])
TenantLLMService.filter_delete([TenantLLM.llm_factory == "Moonshot", TenantLLM.llm_name == "flag-embedding"])
LLMFactoriesService.filter_delete([LLMFactoriesService.model.name == "QAnything"])
LLMService.filter_delete([LLMService.model.fid == "QAnything"])
TenantLLMService.filter_update([TenantLLMService.model.llm_factory == "QAnything"], {"llm_factory": "Youdao"})
"""
drop table llm;
drop table llm_factories;


@ -40,8 +40,8 @@ class API4ConversationService(CommonService):
@classmethod
@DB.connection_context()
def append_message(cls, id, conversation):
cls.model.update_by_id(id, conversation)
return cls.model.update(round=cls.model.round + 1).where(id=id).execute()
cls.update_by_id(id, conversation)
return cls.model.update(round=cls.model.round + 1).where(cls.model.id==id).execute()
@classmethod
@DB.connection_context()


@ -80,8 +80,12 @@ def chat(dialog, messages, **kwargs):
raise LookupError("LLM(%s) not found" % dialog.llm_id)
max_tokens = 1024
else: max_tokens = llm[0].max_tokens
kbs = KnowledgebaseService.get_by_ids(dialog.kb_ids)
embd_nms = list(set([kb.embd_id for kb in kbs]))
assert len(embd_nms) == 1, "Knowledge bases use different embedding models."
questions = [m["content"] for m in messages if m["role"] == "user"]
embd_mdl = LLMBundle(dialog.tenant_id, LLMType.EMBEDDING)
embd_mdl = LLMBundle(dialog.tenant_id, LLMType.EMBEDDING, embd_nms[0])
chat_mdl = LLMBundle(dialog.tenant_id, LLMType.CHAT, dialog.llm_id)
prompt_config = dialog.prompt_config


@ -27,7 +27,8 @@ class KnowledgebaseService(CommonService):
page_number, items_per_page, orderby, desc):
kbs = cls.model.select().where(
((cls.model.tenant_id.in_(joined_tenant_ids) & (cls.model.permission ==
TenantPermission.TEAM.value)) | (cls.model.tenant_id == user_id))
TenantPermission.TEAM.value)) | (
cls.model.tenant_id == user_id))
& (cls.model.status == StatusEnum.VALID.value)
)
if desc:
@ -56,7 +57,8 @@ class KnowledgebaseService(CommonService):
cls.model.chunk_num,
cls.model.parser_id,
cls.model.parser_config]
kbs = cls.model.select(*fields).join(Tenant, on=((Tenant.id == cls.model.tenant_id) & (Tenant.status == StatusEnum.VALID.value))).where(
kbs = cls.model.select(*fields).join(Tenant, on=(
(Tenant.id == cls.model.tenant_id) & (Tenant.status == StatusEnum.VALID.value))).where(
(cls.model.id == kb_id),
(cls.model.status == StatusEnum.VALID.value)
)
@ -86,6 +88,7 @@ class KnowledgebaseService(CommonService):
old[k] = list(set(old[k] + v))
else:
old[k] = v
dfs_update(m.parser_config, config)
cls.update_by_id(id, {"parser_config": m.parser_config})
@ -97,3 +100,15 @@ class KnowledgebaseService(CommonService):
if k.parser_config and "field_map" in k.parser_config:
conf.update(k.parser_config["field_map"])
return conf
@classmethod
@DB.connection_context()
def get_by_name(cls, kb_name, tenant_id):
kb = cls.model.select().where(
(cls.model.name == kb_name)
& (cls.model.tenant_id == tenant_id)
& (cls.model.status == StatusEnum.VALID.value)
)
if kb:
return True, kb[0]
return False, None


@ -66,7 +66,7 @@ class TenantLLMService(CommonService):
raise LookupError("Tenant not found")
if llm_type == LLMType.EMBEDDING.value:
mdlnm = tenant.embd_id
mdlnm = tenant.embd_id if not llm_name else llm_name
elif llm_type == LLMType.SPEECH2TEXT.value:
mdlnm = tenant.asr_id
elif llm_type == LLMType.IMAGE2TEXT.value:
@ -77,9 +77,19 @@ class TenantLLMService(CommonService):
assert False, "LLM type error"
model_config = cls.get_api_key(tenant_id, mdlnm)
if model_config: model_config = model_config.to_dict()
if not model_config:
raise LookupError("Model({}) not authorized".format(mdlnm))
model_config = model_config.to_dict()
if llm_type == LLMType.EMBEDDING.value:
llm = LLMService.query(llm_name=llm_name)
if llm and llm[0].fid in ["Youdao", "FastEmbed"]:
model_config = {"llm_factory": llm[0].fid, "api_key":"", "llm_name": llm_name, "api_base": ""}
if not model_config:
if llm_name == "flag-embedding":
model_config = {"llm_factory": "Tongyi-Qianwen", "api_key": "",
"llm_name": llm_name, "api_base": ""}
else:
raise LookupError("Model({}) not authorized".format(mdlnm))
if llm_type == LLMType.EMBEDDING.value:
if model_config["llm_factory"] not in EmbeddingModel:
return


@ -13,12 +13,15 @@
# See the License for the specific language governing permissions and
# limitations under the License.
#
import random
from peewee import Expression
from api.db.db_models import DB
from api.db import StatusEnum, FileType, TaskStatus
from api.db.db_models import Task, Document, Knowledgebase, Tenant
from api.db.services.common_service import CommonService
from api.db.services.document_service import DocumentService
from api.utils import current_timestamp
class TaskService(CommonService):
@ -26,7 +29,7 @@ class TaskService(CommonService):
@classmethod
@DB.connection_context()
def get_tasks(cls, tm, mod=0, comm=1, items_per_page=64):
def get_tasks(cls, tm, mod=0, comm=1, items_per_page=1, takeit=True):
fields = [
cls.model.id,
cls.model.doc_id,
@ -41,24 +44,51 @@ class TaskService(CommonService):
Document.size,
Knowledgebase.tenant_id,
Knowledgebase.language,
Tenant.embd_id,
Knowledgebase.embd_id,
Tenant.img2txt_id,
Tenant.asr_id,
cls.model.update_time]
docs = cls.model.select(*fields) \
.join(Document, on=(cls.model.doc_id == Document.id)) \
.join(Knowledgebase, on=(Document.kb_id == Knowledgebase.id)) \
.join(Tenant, on=(Knowledgebase.tenant_id == Tenant.id))\
.where(
Document.status == StatusEnum.VALID.value,
Document.run == TaskStatus.RUNNING.value,
~(Document.type == FileType.VIRTUAL.value),
cls.model.progress == 0,
cls.model.update_time >= tm,
(Expression(cls.model.create_time, "%%", comm) == mod))\
.order_by(cls.model.update_time.asc())\
.paginate(1, items_per_page)
return list(docs.dicts())
with DB.lock("get_task", -1):
docs = cls.model.select(*fields) \
.join(Document, on=(cls.model.doc_id == Document.id)) \
.join(Knowledgebase, on=(Document.kb_id == Knowledgebase.id)) \
.join(Tenant, on=(Knowledgebase.tenant_id == Tenant.id))\
.where(
Document.status == StatusEnum.VALID.value,
Document.run == TaskStatus.RUNNING.value,
~(Document.type == FileType.VIRTUAL.value),
cls.model.progress == 0,
#cls.model.update_time >= tm,
#(Expression(cls.model.create_time, "%%", comm) == mod)
)\
.order_by(cls.model.update_time.asc())\
.paginate(0, items_per_page)
docs = list(docs.dicts())
if not docs: return []
if not takeit: return docs
cls.model.update(progress_msg=cls.model.progress_msg + "\n" + "Task has been received.", progress=random.random()/10.).where(
cls.model.id == docs[0]["id"]).execute()
return docs
@classmethod
@DB.connection_context()
def get_ongoing_doc_name(cls):
with DB.lock("get_task", -1):
docs = cls.model.select(*[Document.kb_id, Document.location]) \
.join(Document, on=(cls.model.doc_id == Document.id)) \
.where(
Document.status == StatusEnum.VALID.value,
Document.run == TaskStatus.RUNNING.value,
~(Document.type == FileType.VIRTUAL.value),
cls.model.progress >= 0,
cls.model.progress < 1,
cls.model.create_time >= current_timestamp() - 180000
)
docs = list(docs.dicts())
if not docs: return []
return list(set([(d["kb_id"], d["location"]) for d in docs]))
@classmethod
@DB.connection_context()
@ -74,9 +104,10 @@ class TaskService(CommonService):
@classmethod
@DB.connection_context()
def update_progress(cls, id, info):
if info["progress_msg"]:
cls.model.update(progress_msg=cls.model.progress_msg + "\n" + info["progress_msg"]).where(
cls.model.id == id).execute()
if "progress" in info:
cls.model.update(progress=info["progress"]).where(
cls.model.id == id).execute()
with DB.lock("update_progress", -1):
if info["progress_msg"]:
cls.model.update(progress_msg=cls.model.progress_msg + "\n" + info["progress_msg"]).where(
cls.model.id == id).execute()
if "progress" in info:
cls.model.update(progress=info["progress"]).where(
cls.model.id == id).execute()


@ -147,7 +147,7 @@ def filename_type(filename):
return FileType.PDF.value
if re.match(
r".*\.(docx|doc|ppt|pptx|yml|xml|htm|json|csv|txt|ini|xls|xlsx|wps|rtf|hlp|pages|numbers|key|md)$", filename):
r".*\.(doc|docx|ppt|pptx|yml|xml|htm|json|csv|txt|ini|xls|xlsx|wps|rtf|hlp|pages|numbers|key|md)$", filename):
return FileType.DOC.value
if re.match(


@ -1,7 +1,7 @@
{
"settings": {
"index": {
"number_of_shards": 4,
"number_of_shards": 2,
"number_of_replicas": 0,
"refresh_interval" : "1000ms"
},


@ -13,11 +13,16 @@ minio:
user: 'rag_flow'
password: 'infini_rag_flow'
host: 'minio:9000'
redis:
db: 1
password: 'infini_rag_flow'
host: 'redis:6379'
es:
hosts: 'http://es01:9200'
user_default_llm:
factory: 'Tongyi-Qianwen'
api_key: 'sk-xxxxxxxxxxxxx'
base_url: ''
oauth:
github:
client_id: xxxxxxxxxxxxxxxxxxxxxxxxx


@ -1 +1,116 @@
[English](./README.md) | Simplified Chinese
[English](./README.md) | Simplified Chinese
# *Deep*Doc
- [*Deep*Doc](#deepdoc)
- [1. Introduction](#1-介绍)
- [2. Vision Processing](#2-视觉处理)
- [3. Parsers](#3-解析器)
- [Resume](#简历)
<a name="1"></a>
## 1. Introduction
For a huge number of documents from different domains, with different formats and different retrieval requirements, accurate analysis becomes a very challenging task. *Deep*Doc was born for exactly this purpose. So far, *Deep*Doc has two components: vision processing and parsers. If you are interested in our OCR, layout recognition, and TSR results, you can run the test programs below.
```bash
python deepdoc/vision/t_ocr.py -h
usage: t_ocr.py [-h] --inputs INPUTS [--output_dir OUTPUT_DIR]
options:
-h, --help show this help message and exit
--inputs INPUTS Directory where to store images or PDFs, or a file path to a single image or PDF
--output_dir OUTPUT_DIR
Directory where to store the output images. Default: './ocr_outputs'
```
```bash
python deepdoc/vision/t_recognizer.py -h
usage: t_recognizer.py [-h] --inputs INPUTS [--output_dir OUTPUT_DIR] [--threshold THRESHOLD] [--mode {layout,tsr}]
options:
-h, --help show this help message and exit
--inputs INPUTS Directory where to store images or PDFs, or a file path to a single image or PDF
--output_dir OUTPUT_DIR
Directory where to store the output images. Default: './layouts_outputs'
--threshold THRESHOLD
A threshold to filter out detections. Default: 0.5
--mode {layout,tsr} Task mode: layout recognition or table structure recognition
```
HuggingFace serves our models. If you have trouble downloading models from HuggingFace, this might help:
```bash
export HF_ENDPOINT=https://hf-mirror.com
```
<a name="2"></a>
## 2. Vision Processing
As humans, we use visual information to solve problems.
- **OCR (Optical Character Recognition).** Since many documents are presented as images, or can at least be converted to images, OCR is a very important, fundamental, and even universal solution for text extraction.
```bash
python deepdoc/vision/t_ocr.py --inputs=path_to_images_or_pdfs --output_dir=path_to_store_result
```
The inputs can be a directory of images or PDFs, or a single image or PDF file. Look into the folder `path_to_store_result`, where there are images demonstrating the positions of the results, along with txt files containing the OCR text.
<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://github.com/infiniflow/ragflow/assets/12318111/f25bee3d-aaf7-4102-baf5-d5208361d110" width="900"/>
</div>
- **Layout recognition.** Documents from different domains may have different layouts; newspapers, magazines, books, and resumes, for example, all differ in layout. Only with an accurate layout analysis can a machine decide whether text parts are continuous, whether a part needs Table Structure Recognition (TSR) to process, or whether a component is a figure described by a nearby caption. We have 10 basic layout components that cover most cases:
  - Text
  - Title
  - Figure
  - Figure caption
  - Table
  - Table caption
  - Header
  - Footer
  - Reference
  - Equation
Try the following command to see the layout detection results.
```bash
python deepdoc/vision/t_recognizer.py --inputs=path_to_images_or_pdfs --threshold=0.2 --mode=layout --output_dir=path_to_store_result
```
The inputs can be a directory of images or PDFs, or a single image or PDF file. Look into the folder `path_to_store_result`, where there are images showing the detection results, as follows:
<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://github.com/infiniflow/ragflow/assets/12318111/07e0f625-9b28-43d0-9fbb-5bf586cd286f" width="1000"/>
</div>
- **TSR (Table Structure Recognition).** Data tables are a frequently used structure for representing data, including numbers and text. A table's structure can be very complex, with hierarchical headers, spanning cells, and projected row headers. Besides TSR, we also reassemble the content into sentences that an LLM can understand well. The TSR task has five labels:
  - Column
  - Row
  - Column header
  - Row header
  - Spanning (merged) cell
Try the following command to see the detection results.
```bash
python deepdoc/vision/t_recognizer.py --inputs=path_to_images_or_pdfs --threshold=0.2 --mode=tsr --output_dir=path_to_store_result
```
The inputs can be a directory of images or PDFs, or a single image or PDF file. Look into the folder `path_to_store_result`, which contains images and HTML pages presenting the following detection results:
<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://github.com/infiniflow/ragflow/assets/12318111/cb24e81b-f2ba-49f3-ac09-883d75606f4c" width="1000"/>
</div>
<a name="3"></a>
## 3. Parsers
Four kinds of document formats, PDF, DOCX, EXCEL, and PPT, have corresponding parsers. The most complex one is the PDF parser, since PDF is so flexible. The output of the PDF parser includes:
- Text chunks with their own positions in the PDF (page number and rectangular positions).
- Tables with cropped images from the PDF, with their contents already translated into natural-language sentences.
- Figures with captions, together with the text inside the figures.
### Resume
A resume is a very complicated kind of document. A resume made up of unstructured text in all kinds of layouts can be decomposed into structured data consisting of nearly a hundred fields. We haven't opened up the parser yet, as we open up the processing method only after the parsing procedure.


@ -3,6 +3,8 @@ from openpyxl import load_workbook
import sys
from io import BytesIO
from rag.nlp import find_codec
class HuExcelParser:
def html(self, fnm):
@ -66,7 +68,8 @@ class HuExcelParser:
return total
if fnm.split(".")[-1].lower() in ["csv", "txt"]:
txt = binary.decode("utf-8")
encoding = find_codec(binary)
txt = binary.decode(encoding)
return len(txt.split("\n"))


@ -11,7 +11,7 @@ import pdfplumber
import logging
from PIL import Image, ImageDraw
import numpy as np
from timeit import default_timer as timer
from PyPDF2 import PdfReader as pdf2_read
from api.utils.file_utils import get_project_base_directory
@ -37,17 +37,18 @@ class HuParser:
self.updown_cnt_mdl.set_param({"device": "cuda"})
try:
model_dir = os.path.join(
get_project_base_directory(),
"rag/res/deepdoc")
get_project_base_directory(),
"rag/res/deepdoc")
self.updown_cnt_mdl.load_model(os.path.join(
model_dir, "updown_concat_xgb.model"))
except Exception as e:
model_dir = snapshot_download(
repo_id="InfiniFlow/text_concat_xgb_v1.0")
repo_id="InfiniFlow/text_concat_xgb_v1.0",
local_dir=os.path.join(get_project_base_directory(), "rag/res/deepdoc"),
local_dir_use_symlinks=False)
self.updown_cnt_mdl.load_model(os.path.join(
model_dir, "updown_concat_xgb.model"))
self.page_from = 0
"""
If you have trouble downloading HuggingFace models, -_^ this might help!!
@ -62,7 +63,7 @@ class HuParser:
"""
def __char_width(self, c):
return (c["x1"] - c["x0"]) // len(c["text"])
return (c["x1"] - c["x0"]) // max(len(c["text"]), 1)
def __height(self, c):
return c["bottom"] - c["top"]
@ -74,7 +75,7 @@ class HuParser:
def _y_dis(
self, a, b):
return (
b["top"] + b["bottom"] - a["top"] - a["bottom"]) / 2
b["top"] + b["bottom"] - a["top"] - a["bottom"]) / 2
def _match_proj(self, b):
proj_patt = [
@ -97,9 +98,9 @@ class HuParser:
tks_down = huqie.qie(down["text"][:LEN]).split(" ")
tks_up = huqie.qie(up["text"][-LEN:]).split(" ")
tks_all = up["text"][-LEN:].strip() \
+ (" " if re.match(r"[a-zA-Z0-9]+",
up["text"][-1] + down["text"][0]) else "") \
+ down["text"][:LEN].strip()
+ (" " if re.match(r"[a-zA-Z0-9]+",
up["text"][-1] + down["text"][0]) else "") \
+ down["text"][:LEN].strip()
tks_all = huqie.qie(tks_all).split(" ")
fea = [
up.get("R", -1) == down.get("R", -1),
@ -121,7 +122,7 @@ class HuParser:
True if re.search(r"[,][^。.]+$", up["text"]) else False,
True if re.search(r"[,][^。.]+$", up["text"]) else False,
True if re.search(r"[\(][^\)]+$", up["text"])
and re.search(r"[\)]", down["text"]) else False,
and re.search(r"[\)]", down["text"]) else False,
self._match_proj(down),
True if re.match(r"[A-Z]", down["text"]) else False,
True if re.match(r"[A-Z]", up["text"][-1]) else False,
@ -183,7 +184,7 @@ class HuParser:
continue
for tb in tbls: # for table
left, top, right, bott = tb["x0"] - MARGIN, tb["top"] - MARGIN, \
tb["x1"] + MARGIN, tb["bottom"] + MARGIN
tb["x1"] + MARGIN, tb["bottom"] + MARGIN
left *= ZM
top *= ZM
right *= ZM
@ -295,7 +296,7 @@ class HuParser:
for b in bxs:
if not b["text"]:
left, right, top, bott = b["x0"] * ZM, b["x1"] * \
ZM, b["top"] * ZM, b["bottom"] * ZM
ZM, b["top"] * ZM, b["bottom"] * ZM
b["text"] = self.ocr.recognize(np.array(img),
np.array([[left, top], [right, top], [right, bott], [left, bott]],
dtype=np.float32))
@ -620,7 +621,7 @@ class HuParser:
i += 1
continue
lout_no = str(self.boxes[i]["page_number"]) + \
"-" + str(self.boxes[i]["layoutno"])
"-" + str(self.boxes[i]["layoutno"])
if TableStructureRecognizer.is_caption(self.boxes[i]) or self.boxes[i]["layout_type"] in ["table caption",
"title",
"figure caption",
@ -828,9 +829,13 @@ class HuParser:
pn = [bx["page_number"]]
top = bx["top"] - self.page_cum_height[pn[0] - 1]
bott = bx["bottom"] - self.page_cum_height[pn[0] - 1]
page_images_cnt = len(self.page_images)
if pn[-1] - 1 >= page_images_cnt: return ""
while bott * ZM > self.page_images[pn[-1] - 1].size[1]:
bott -= self.page_images[pn[-1] - 1].size[1] / ZM
pn.append(pn[-1] + 1)
if pn[-1] - 1 >= page_images_cnt:
return ""
return "@@{}\t{:.1f}\t{:.1f}\t{:.1f}\t{:.1f}##" \
.format("-".join([str(p) for p in pn]),
@ -930,6 +935,7 @@ class HuParser:
self.page_cum_height = [0]
self.page_layout = []
self.page_from = page_from
st = timer()
try:
self.pdf = pdfplumber.open(fnm) if isinstance(
fnm, str) else pdfplumber.open(BytesIO(fnm))
@ -968,6 +974,7 @@ class HuParser:
self.outlines.append((a["/Title"], depth))
continue
dfs(a, depth + 1)
dfs(outlines, 0)
except Exception as e:
logging.warning(f"Outlines exception: {e}")
@ -977,13 +984,15 @@ class HuParser:
logging.info("Images converted.")
self.is_english = [re.search(r"[a-zA-Z0-9,/¸;:'\[\]\(\)!@#$%^&*\"?<>._-]{30,}", "".join(
random.choices([c["text"] for c in self.page_chars[i]], k=min(100, len(self.page_chars[i]))))) for i in
range(len(self.page_chars))]
range(len(self.page_chars))]
if sum([1 if e else 0 for e in self.is_english]) > len(
self.page_images) / 2:
self.is_english = True
else:
self.is_english = False
self.is_english = False
st = timer()
for i, img in enumerate(self.page_images):
chars = self.page_chars[i] if not self.is_english else []
self.mean_height.append(
@ -1001,15 +1010,11 @@ class HuParser:
chars[j]["width"]) / 2:
chars[j]["text"] += " "
j += 1
# if i > 0:
# if not chars:
# self.page_cum_height.append(img.size[1] / zoomin)
# else:
# self.page_cum_height.append(
# np.max([c["bottom"] for c in chars]))
self.__ocr(i + 1, img, chars, zoomin)
if callback:
if callback and i % 6 == 5:
callback(prog=(i + 1) * 0.6 / len(self.page_images), msg="")
# print("OCR:", timer()-st)
if not self.is_english and not any(
[c for c in self.page_chars]) and self.boxes:
@ -1045,7 +1050,7 @@ class HuParser:
left, right, top, bottom = float(left), float(
right), float(top), float(bottom)
poss.append(([int(p) - 1 for p in pn.split("-")],
left, right, top, bottom))
left, right, top, bottom))
if not poss:
if need_position:
return None, None
@ -1071,7 +1076,7 @@ class HuParser:
self.page_images[pns[0]].crop((left * ZM, top * ZM,
right *
ZM, min(
bottom, self.page_images[pns[0]].size[1])
bottom, self.page_images[pns[0]].size[1])
))
)
if 0 < ii < len(poss) - 1:

View File

@ -1,5 +1,5 @@
# -*- coding: utf-8 -*-
import re, copy, time, datetime, demjson, \
import re, copy, time, datetime, demjson3, \
traceback, signal
import numpy as np
from deepdoc.parser.resume.entities import degrees, schools, corporations
@ -197,7 +197,7 @@ def forProj(cv):
def json_loads(line):
return demjson.decode(re.sub(r": *(True|False)", r": '\1'", line))
return demjson3.decode(re.sub(r": *(True|False)", r": '\1'", line))
def forWork(cv):

View File

@ -43,7 +43,9 @@ class LayoutRecognizer(Recognizer):
"rag/res/deepdoc")
super().__init__(self.labels, domain, model_dir)
except Exception as e:
model_dir = snapshot_download(repo_id="InfiniFlow/deepdoc")
model_dir = snapshot_download(repo_id="InfiniFlow/deepdoc",
local_dir=os.path.join(get_project_base_directory(), "rag/res/deepdoc"),
local_dir_use_symlinks=False)
super().__init__(self.labels, domain, model_dir)
self.garbage_layouts = ["footer", "header", "reference"]

View File

@ -486,7 +486,9 @@ class OCR(object):
self.text_detector = TextDetector(model_dir)
self.text_recognizer = TextRecognizer(model_dir)
except Exception as e:
model_dir = snapshot_download(repo_id="InfiniFlow/deepdoc")
model_dir = snapshot_download(repo_id="InfiniFlow/deepdoc",
local_dir=os.path.join(get_project_base_directory(), "rag/res/deepdoc"),
local_dir_use_symlinks=False)
self.text_detector = TextDetector(model_dir)
self.text_recognizer = TextRecognizer(model_dir)

View File

@ -41,7 +41,9 @@ class Recognizer(object):
"rag/res/deepdoc")
model_file_path = os.path.join(model_dir, task_name + ".onnx")
if not os.path.exists(model_file_path):
model_dir = snapshot_download(repo_id="InfiniFlow/deepdoc")
model_dir = snapshot_download(repo_id="InfiniFlow/deepdoc",
local_dir=os.path.join(get_project_base_directory(), "rag/res/deepdoc"),
local_dir_use_symlinks=False)
model_file_path = os.path.join(model_dir, task_name + ".onnx")
else:
model_file_path = os.path.join(model_dir, task_name + ".onnx")

View File

@ -39,7 +39,9 @@ class TableStructureRecognizer(Recognizer):
get_project_base_directory(),
"rag/res/deepdoc"))
except Exception as e:
super().__init__(self.labels, "tsr", snapshot_download(repo_id="InfiniFlow/deepdoc"))
super().__init__(self.labels, "tsr", snapshot_download(repo_id="InfiniFlow/deepdoc",
local_dir=os.path.join(get_project_base_directory(), "rag/res/deepdoc"),
local_dir_use_symlinks=False))
def __call__(self, images, thr=0.2):
tbls = super().__call__(images, thr)

View File

@ -11,16 +11,24 @@ ES_PORT=1200
KIBANA_PORT=6601
# Increase or decrease based on the available host memory (in bytes)
MEM_LIMIT=4073741824
MEM_LIMIT=8073741824
MYSQL_PASSWORD=infini_rag_flow
MYSQL_PORT=5455
# Port to expose minio to the host
MINIO_CONSOLE_PORT=9001
MINIO_PORT=9000
MINIO_USER=rag_flow
MINIO_PASSWORD=infini_rag_flow
SVR_HTTP_PORT=9380
RAGFLOW_VERSION=v0.3.2
TIMEZONE='Asia/Shanghai'
######## OS setup for ES ###########

View File

@ -0,0 +1,29 @@
include:
- path: ./docker-compose-base.yml
env_file: ./.env
services:
ragflow:
depends_on:
mysql:
condition: service_healthy
es01:
condition: service_healthy
image: edwardelric233/ragflow:oc9
container_name: ragflow-server
ports:
- ${SVR_HTTP_PORT}:9380
- 80:80
- 443:443
volumes:
- ./service_conf.yaml:/ragflow/conf/service_conf.yaml
- ./ragflow-logs:/ragflow/logs
- ./nginx/ragflow.conf:/etc/nginx/conf.d/ragflow.conf
- ./nginx/proxy.conf:/etc/nginx/proxy.conf
- ./nginx/nginx.conf:/etc/nginx/nginx.conf
environment:
- TZ=${TIMEZONE}
- HF_ENDPOINT=https://hf-mirror.com
networks:
- ragflow
restart: always

View File

@ -9,7 +9,7 @@ services:
condition: service_healthy
es01:
condition: service_healthy
image: swr.cn-north-4.myhuaweicloud.com/infiniflow/ragflow:v1.0
image: swr.cn-north-4.myhuaweicloud.com/infiniflow/ragflow:${RAGFLOW_VERSION}
container_name: ragflow-server
ports:
- ${SVR_HTTP_PORT}:9380

View File

@ -29,23 +29,23 @@ services:
- ragflow
restart: always
kibana:
depends_on:
es01:
condition: service_healthy
image: docker.elastic.co/kibana/kibana:${STACK_VERSION}
container_name: ragflow-kibana
volumes:
- kibanadata:/usr/share/kibana/data
ports:
- ${KIBANA_PORT}:5601
environment:
- SERVERNAME=kibana
- ELASTICSEARCH_HOSTS=http://es01:9200
- TZ=${TIMEZONE}
mem_limit: ${MEM_LIMIT}
networks:
- ragflow
#kibana:
# depends_on:
# es01:
# condition: service_healthy
# image: docker.elastic.co/kibana/kibana:${STACK_VERSION}
# container_name: ragflow-kibana
# volumes:
# - kibanadata:/usr/share/kibana/data
# ports:
# - ${KIBANA_PORT}:5601
# environment:
# - SERVERNAME=kibana
# - ELASTICSEARCH_HOSTS=http://es01:9200
# - TZ=${TIMEZONE}
# mem_limit: ${MEM_LIMIT}
# networks:
# - ragflow
mysql:
image: mysql:5.7.18
@ -80,8 +80,8 @@ services:
container_name: ragflow-minio
command: server --console-address ":9001" /data
ports:
- 9000:9000
- 9001:9001
- ${MINIO_PORT}:9000
- ${MINIO_CONSOLE_PORT}:9001
environment:
- MINIO_ROOT_USER=${MINIO_USER}
- MINIO_ROOT_PASSWORD=${MINIO_PASSWORD}
@ -96,8 +96,8 @@ services:
volumes:
esdata01:
driver: local
kibanadata:
driver: local
# kibanadata:
# driver: local
mysql_data:
driver: local
minio_data:

View File

@ -9,7 +9,7 @@ services:
condition: service_healthy
es01:
condition: service_healthy
image: infiniflow/ragflow:v1.0
image: infiniflow/ragflow:${RAGFLOW_VERSION}
container_name: ragflow-server
ports:
- ${SVR_HTTP_PORT}:9380
@ -23,7 +23,7 @@ services:
- ./nginx/nginx.conf:/etc/nginx/nginx.conf
environment:
- TZ=${TIMEZONE}
- HF_ENDPOINT=https://huggingface.com
- HF_ENDPOINT=https://huggingface.co
networks:
- ragflow
restart: always

View File

@ -23,13 +23,12 @@ function watch_broker(){
}
function task_bro(){
sleep 160;
watch_broker;
}
task_bro &
WS=2
WS=1
for ((i=0;i<WS;i++))
do
task_exe $i $WS &

View File

@ -13,6 +13,10 @@ minio:
user: 'rag_flow'
password: 'infini_rag_flow'
host: 'minio:9000'
redis:
db: 1
password: 'infini_rag_flow'
host: 'redis:6379'
es:
hosts: 'http://es01:9200'
user_default_llm:

View File

@ -1,5 +1,9 @@
# Conversation API Instruction
<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://github.com/infiniflow/ragflow/assets/12318111/df0dcc3d-789a-44f7-89f1-7a5f044ab729" width="830"/>
</div>
## Base URL
```buildoutcfg
https://demo.ragflow.io/v1/
@ -7,7 +11,7 @@ https://demo.ragflow.io/v1/
## Authorization
All the APIs are authorized with API-Key. Please keep it save and private. Don't reveal it in any way from the front-end.
All the APIs are authorized with API-Key. Please keep it safe and private. Don't reveal it in any way from the front-end.
The API-Key should be put in the header of the request:
```buildoutcfg
Authorization: Bearer {API_KEY}
@ -299,5 +303,61 @@ This will be called to get the answer to users' questions.
## Get document content or image
This is usually used when displaying the content of a citation.
### Path: /document/get/\<id\>
### Path: /api/document/get/\<id\>
### Method: GET
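For illustration, a minimal request sketch; the document id here is hypothetical, and the use of the `requests` library is an assumption rather than part of the API doc:
```python
import requests

API_KEY = "YOUR_API_KEY"  # keep it safe and private
BASE_URL = "https://demo.ragflow.io/v1"
doc_id = "41e9324602cd11ef9f5f3043d7ed348e"  # hypothetical document id

resp = requests.get(f"{BASE_URL}/api/document/get/{doc_id}",
                    headers={"Authorization": f"Bearer {API_KEY}"})
with open("cited_document.bin", "wb") as f:
    f.write(resp.content)  # raw document or image bytes
```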
## Upload file
This is usually used when uploading a file to a knowledge base.
### Path: /api/document/upload/
### Method: POST
### Parameter:
| name | type | optional | description |
|---------|--------|----------|----------------------------------------|
| file | file | No | Upload file. |
| kb_name | string | No | Name of the knowledge base to upload the file into. |
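A request sketch for this endpoint; the knowledge base name is illustrative, and the use of the `requests` library is an assumption rather than part of the API doc:
```python
import requests

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://demo.ragflow.io/v1"

with open("readme.txt", "rb") as f:
    resp = requests.post(
        f"{BASE_URL}/api/document/upload/",
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"file": f},                      # the file to upload
        data={"kb_name": "my_knowledge_base"},  # illustrative knowledge base name
    )
print(resp.json())  # on success: retcode 0 and the document metadata shown below
```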
### Response
```json
{
"data": {
"chunk_num": 0,
"create_date": "Thu, 25 Apr 2024 14:30:06 GMT",
"create_time": 1714026606921,
"created_by": "553ec818fd5711ee8ea63043d7ed348e",
"id": "41e9324602cd11ef9f5f3043d7ed348e",
"kb_id": "06802686c0a311ee85d6246e9694c130",
"location": "readme.txt",
"name": "readme.txt",
"parser_config": {
"field_map": {
},
"pages": [
[
0,
1000000
]
]
},
"parser_id": "general",
"process_begin_at": null,
"process_duation": 0.0,
"progress": 0.0,
"progress_msg": "",
"run": "0",
"size": 929,
"source_type": "local",
"status": "1",
"thumbnail": null,
"token_num": 0,
"type": "doc",
"update_date": "Thu, 25 Apr 2024 14:30:06 GMT",
"update_time": 1714026606921
},
"retcode": 0,
"retmsg": "success"
}
```

View File

@ -2,105 +2,209 @@
## General
### What sets RAGFlow apart from other RAG products?
### 1. What sets RAGFlow apart from other RAG products?
The "garbage in garbage out" status quo remains unchanged despite the fact that LLMs have advanced Natural Language Processing (NLP) significantly. In response, RAGFlow introduces two unique features compared to other Retrieval-Augmented Generation (RAG) products.
- Fine-grained document parsing: Document parsing involves images and tables, with the flexibility for you to intervene as needed.
- Traceable answers with reduced hallucinations: You can trust RAGFlow's responses as you can view the citations and references supporting them.
### Which languages does RAGFlow support?
### 2. Which languages does RAGFlow support?
English, simplified Chinese, traditional Chinese for now.
## Performance
### Why does it take longer for RAGFlow to parse a document than LangChain?
### 1. Why does it take longer for RAGFlow to parse a document than LangChain?
We put painstaking effort into document pre-processing tasks like layout analysis, table structure recognition, and OCR (Optical Character Recognition) using our vision model. This contributes to the additional time required.
### 2. Why does RAGFlow require more resources than other projects?
RAGFlow has a number of built-in models for document structure parsing, which account for the additional computational resources.
## Feature
### Which architectures or devices does RAGFlow support?
### 1. Which architectures or devices does RAGFlow support?
ARM64 and Ascend GPU are not supported.
Currently, we only support x86 CPU and Nvidia GPU.
### Do you offer an API for integration with third-party applications?
### 2. Do you offer an API for integration with third-party applications?
These APIs are still in development. Contributions are welcome.
The corresponding APIs are now available. See the [Conversation API](./conversation_api.md) for more information.
### Do you support stream output?
### 3. Do you support stream output?
No, this feature is still in development. Contributions are welcome.
### Is it possible to share dialogue through URL?
### 4. Is it possible to share dialogue through URL?
Yes, this feature is now available.
### 5. Do you support multiple rounds of dialogues, i.e., referencing previous dialogues as context for the current dialogue?
This feature and the related APIs are still in development. Contributions are welcome.
### Do you support multiple rounds of dialogues, i.e., referencing previous dialogues as context for the current dialogue?
This feature and the related APIs are still in development. Contributions are welcome.
## Troubleshooting
## Configurations
### 1. Issues with docker images
### How to increase the length of RAGFlow responses?
#### 1.1 How to build the RAGFlow image from scratch?
1. Right click the desired dialog to display the **Chat Configuration** window.
2. Switch to the **Model Setting** tab and adjust the **Max Tokens** slider to get the desired length.
3. Click **OK** to confirm your change.
```
$ git clone https://github.com/infiniflow/ragflow.git
$ cd ragflow
$ docker build -t infiniflow/ragflow:v0.3.2 .
$ cd docker
$ chmod +x ./entrypoint.sh
$ docker compose up -d
```
#### 1.2 `process "/bin/sh -c cd ./web && npm i && npm run build"` failed
### What does Empty response mean? How to set it?
1. Check your network from within Docker, for example:
```bash
curl https://hf-mirror.com
```
If nothing is retrieved from your knowledge base, the system responds with exactly what you specify in **Empty response**. If you do not specify anything in **Empty response**, you let your LLM improvise, giving it a chance to hallucinate.
2. If your network works fine, the issue lies with the Docker network configuration. Replace the Docker building command:
```bash
docker build -t infiniflow/ragflow:vX.Y.Z .
```
With this:
```bash
docker build -t infiniflow/ragflow:vX.Y.Z . --network host
```
### Can I set the base URL for OpenAI somewhere?
### 2. Issues with huggingface models
![](https://github.com/infiniflow/ragflow/assets/93570324/8cfb6fa4-8a97-415d-b9fa-b6f405a055f3)
#### 2.1 Cannot access https://huggingface.co
A *locally* deployed RAGFlow downloads OCR and embedding modules from the [Huggingface website](https://huggingface.co) by default. If your machine is unable to access this site, the following error occurs and PDF parsing fails:
```
FileNotFoundError: [Errno 2] No such file or directory: '/root/.cache/huggingface/hub/models--InfiniFlow--deepdoc/snapshots/be0c1e50eef6047b412d1800aa89aba4d275f997/ocr.res'
```
To fix this issue, use https://hf-mirror.com instead:
### How to run RAGFlow with a locally deployed LLM?
1. Stop all containers and remove all related resources:
You can use Ollama to deploy local LLM. See [here](https://github.com/infiniflow/ragflow/blob/main/docs/ollama.md) for more information.
```bash
cd ragflow/docker/
docker compose down
```
### How to link up RAGFlow and Ollama servers?
2. Replace `https://huggingface.co` with `https://hf-mirror.com` in **ragflow/docker/docker-compose.yml**.
3. Start up the server:
- If RAGFlow is locally deployed, ensure that your RAGFlow and Ollama are in the same LAN.
- If you are using our online demo, ensure that the IP address of your Ollama server is public and accessible.
```bash
docker compose up -d
```
### How to configure RAGFlow to respond with 100% matched results, rather than utilizing LLM?
#### 2.2 `MaxRetryError: HTTPSConnectionPool(host='hf-mirror.com', port=443)`
1. Click the **Knowledge Base** tab in the middle top of the page.
2. Right click the desired knowledge base to display the **Configuration** dialogue.
3. Choose **Q&A** as the chunk method and click **Save** to confirm your change.
This error suggests that you do not have Internet access or are unable to connect to hf-mirror.com. Try the following:
## Debugging
1. Manually download the resource files from [huggingface.co/InfiniFlow/deepdoc](https://huggingface.co/InfiniFlow/deepdoc) to your local folder **~/deepdoc**.
2. Add a volume to **docker-compose.yml**, for example:
```
- ~/deepdoc:/ragflow/rag/res/deepdoc
```
### How to handle `WARNING: can't find /raglof/rag/res/borker.tm`?
#### 2.3 `FileNotFoundError: [Errno 2] No such file or directory: '/ragflow/rag/res/deepdoc/ocr.res'`
1. Check your network from within Docker, for example:
```bash
curl https://hf-mirror.com
```
2. Run `ifconfig` to check the `mtu` value. If the server's `mtu` is `1450` while the NIC's `mtu` in the container is `1500`, this mismatch may cause network instability. Adjust the `mtu` policy as follows:
```
vim docker-compose-base.yml
# Original configuration
networks:
ragflow:
driver: bridge
# Modified configuration
networks:
ragflow:
driver: bridge
driver_opts:
com.docker.network.driver.mtu: 1450
```
### 3. Issues with RAGFlow servers
#### 3.1 `WARNING: can't find /raglof/rag/res/borker.tm`
Ignore this warning and continue. All system warnings can be ignored.
### How to handle `Realtime synonym is disabled, since no redis connection`?
#### 3.2 `network anomaly There is an abnormality in your network and you cannot connect to the server.`
![anomaly](https://github.com/infiniflow/ragflow/assets/93570324/beb7ad10-92e4-4a58-8886-bfb7cbd09e5d)
You will not be able to log in to RAGFlow until the server is fully initialized. Run `docker logs -f ragflow-server`.
*The server is successfully initialized, if your system displays the following:*
```
____ ______ __
/ __ \ ____ _ ____ _ / ____// /____ _ __
/ /_/ // __ `// __ `// /_ / // __ \| | /| / /
/ _, _// /_/ // /_/ // __/ / // /_/ /| |/ |/ /
/_/ |_| \__,_/ \__, //_/ /_/ \____/ |__/|__/
/____/
* Running on all addresses (0.0.0.0)
* Running on http://127.0.0.1:9380
* Running on http://x.x.x.x:9380
INFO:werkzeug:Press CTRL+C to quit
```
### 4. Issues with RAGFlow backend services
#### 4.1 `dependency failed to start: container ragflow-mysql is unhealthy`
`dependency failed to start: container ragflow-mysql is unhealthy` means that your MySQL container failed to start. Try replacing `mysql:5.7.18` with `mariadb:10.5.8` in **docker-compose-base.yml**.
#### 4.2 `Realtime synonym is disabled, since no redis connection`
Ignore this warning and continue. All system warnings can be ignored.
![](https://github.com/infiniflow/ragflow/assets/93570324/ef5a6194-084a-4fe3-bdd5-1c025b40865c)
### Why does it take so long to parse a 2MB document?
#### 4.3 Why does it take so long to parse a 2MB document?
Parsing requests have to wait in queue due to limited server resources. We are currently enhancing our algorithms and increasing computing power.
### How to handle `Index failure`?
#### 4.4 Why does my document parsing stall at under one percent?
![stall](https://github.com/infiniflow/ragflow/assets/93570324/3589cc25-c733-47d5-bbfc-fedb74a3da50)
If your RAGFlow is deployed *locally*, try the following:
1. Check the log of your RAGFlow server to see if it is running properly:
```bash
docker logs -f ragflow-server
```
2. Check if the **task_executor.py** process exists.
3. Check if your RAGFlow server can access hf-mirror.com or huggingface.co.
#### 4.5 `Index failure`
An index failure usually indicates an unavailable Elasticsearch service.
### How to check the log of RAGFlow?
#### 4.6 How to check the log of RAGFlow?
```bash
tail -f path_to_ragflow/docker/ragflow-logs/rag/*.log
```
### How to check the status of each component in RAGFlow?
#### 4.7 How to check the status of each component in RAGFlow?
```bash
$ docker ps
@ -108,13 +212,13 @@ $ docker ps
*The system displays the following if all your RAGFlow components are running properly:*
```
5bc45806b680 infiniflow/ragflow:v1.0 "./entrypoint.sh" 11 hours ago Up 11 hours 0.0.0.0:80->80/tcp, :::80->80/tcp, 0.0.0.0:443->443/tcp, :::443->443/tcp, 0.0.0.0:9380->9380/tcp, :::9380->9380/tcp ragflow-server
5bc45806b680 infiniflow/ragflow:v0.3.2 "./entrypoint.sh" 11 hours ago Up 11 hours 0.0.0.0:80->80/tcp, :::80->80/tcp, 0.0.0.0:443->443/tcp, :::443->443/tcp, 0.0.0.0:9380->9380/tcp, :::9380->9380/tcp ragflow-server
91220e3285dd docker.elastic.co/elasticsearch/elasticsearch:8.11.3 "/bin/tini -- /usr/l…" 11 hours ago Up 11 hours (healthy) 9300/tcp, 0.0.0.0:9200->9200/tcp, :::9200->9200/tcp ragflow-es-01
d8c86f06c56b mysql:5.7.18 "docker-entrypoint.s…" 7 days ago Up 16 seconds (healthy) 0.0.0.0:3306->3306/tcp, :::3306->3306/tcp ragflow-mysql
cd29bcb254bc quay.io/minio/minio:RELEASE.2023-12-20T01-00-02Z "/usr/bin/docker-ent…" 2 weeks ago Up 11 hours 0.0.0.0:9001->9001/tcp, :::9001->9001/tcp, 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp ragflow-minio
```
### How to handle `Exception: Can't connect to ES cluster`?
#### 4.8 `Exception: Can't connect to ES cluster`
1. Check the status of your Elasticsearch component:
@ -126,7 +230,7 @@ $ docker ps
91220e3285dd docker.elastic.co/elasticsearch/elasticsearch:8.11.3 "/bin/tini -- /usr/l…" 11 hours ago Up 11 hours (healthy) 9300/tcp, 0.0.0.0:9200->9200/tcp, :::9200->9200/tcp ragflow-es-01
```
2. If your container keeps restarting, ensure `vm.max_map_count` >= 262144 as per [this README](https://github.com/infiniflow/ragflow?tab=readme-ov-file#-start-up-the-server).
2. If your container keeps restarting, ensure `vm.max_map_count` >= 262144 as per [this README](https://github.com/infiniflow/ragflow?tab=readme-ov-file#-start-up-the-server). To keep your change permanent, update the `vm.max_map_count` value in **/etc/sysctl.conf**. This configuration works only for Linux.
3. If your issue persists, ensure that the ES host setting is correct:
@ -142,12 +246,104 @@ $ docker ps
```
### How to handle `{"data":null,"retcode":100,"retmsg":"<NotFound '404: Not Found'>"}`?
#### 4.9 `{"data":null,"retcode":100,"retmsg":"<NotFound '404: Not Found'>"}`
Your IP address or port number may be incorrect. If you are using the default configurations, enter http://<IP_OF_YOUR_MACHINE> (**NOT `localhost`, NOT 9380, AND NO PORT NUMBER REQUIRED!**) in your browser. This should work.
Your IP address or port number may be incorrect. If you are using the default configurations, enter http://<IP_OF_YOUR_MACHINE> (**NOT 9380, AND NO PORT NUMBER REQUIRED!**) in your browser. This should work.
#### 4.10 `Ollama - Mistral instance running at 127.0.0.1:11434 but cannot add Ollama as model in RagFlow`
A correct Ollama IP address and port are crucial to adding models to Ollama:
- If you are on demo.ragflow.io, ensure that the server hosting Ollama has a publicly accessible IP address. Note that 127.0.0.1 is not a publicly accessible IP address.
- If you deploy RAGFlow locally, ensure that Ollama and RAGFlow are in the same LAN and can communicate with each other.
#### 4.11 Do you offer examples of using deepdoc to parse PDF or other files?
Yes, we do. See the Python files under the **rag/app** folder.
#### 4.12 Why did I fail to upload a 10MB+ file to my locally deployed RAGFlow?
You probably forgot to update the **MAX_CONTENT_LENGTH** environment variable:
1. Add environment variable `MAX_CONTENT_LENGTH` to **ragflow/docker/.env**:
```
MAX_CONTENT_LENGTH=100000000
```
2. Update **docker-compose.yml**:
```
environment:
- MAX_CONTENT_LENGTH=${MAX_CONTENT_LENGTH}
```
3. Restart the RAGFlow server:
```
docker compose up ragflow -d
```
*Now you should be able to upload files smaller than 100MB.*
#### 4.13 `Table 'rag_flow.document' doesn't exist`
This exception occurs when starting up the RAGFlow server. Try the following:
1. Prolong the sleep time: Go to **docker/entrypoint.sh**, locate line 26, and replace `sleep 60` with `sleep 280`.
2. If using Windows, ensure that **entrypoint.sh** has LF line endings.
3. Go to **docker/docker-compose.yml**, add the following:
```
./entrypoint.sh:/ragflow/entrypoint.sh
```
4. Change directory:
```bash
cd docker
```
5. Stop the RAGFlow server:
```bash
docker compose stop
```
6. Restart the RAGFlow server:
```bash
docker compose up
```
#### 4.14 `hint : 102 Fail to access model Connection error`
![hint102](https://github.com/infiniflow/ragflow/assets/93570324/6633d892-b4f8-49b5-9a0a-37a0a8fba3d2)
1. Ensure that the RAGFlow server can access the base URL.
2. Do not forget to append **/v1/** to **http://IP:port**:
**http://IP:port/v1/**
## Usage
### 1. How to increase the length of RAGFlow responses?
1. Right click the desired dialog to display the **Chat Configuration** window.
2. Switch to the **Model Setting** tab and adjust the **Max Tokens** slider to get the desired length.
3. Click **OK** to confirm your change.
### 2. What does Empty response mean? How to set it?
If nothing is retrieved from your knowledge base, the system responds with exactly what you specify in **Empty response**. If you do not specify anything in **Empty response**, you let your LLM improvise, giving it a chance to hallucinate.
### 3. Can I set the base URL for OpenAI somewhere?
![](https://github.com/infiniflow/ragflow/assets/93570324/8cfb6fa4-8a97-415d-b9fa-b6f405a055f3)
### 4. How to run RAGFlow with a locally deployed LLM?
You can use Ollama to deploy local LLM. See [here](https://github.com/infiniflow/ragflow/blob/main/docs/ollama.md) for more information.
### 5. How to link up RAGFlow and Ollama servers?
- If RAGFlow is locally deployed, ensure that your RAGFlow and Ollama are in the same LAN.
- If you are using our online demo, ensure that the IP address of your Ollama server is public and accessible.
### 6. How to configure RAGFlow to respond with 100% matched results, rather than utilizing LLM?
1. Click the **Knowledge Base** tab in the middle top of the page.
2. Right click the desired knowledge base to display the **Configuration** dialogue.
3. Choose **Q&A** as the chunk method and click **Save** to confirm your change.
### 7. Do I need to connect to Redis?
No, connecting to Redis is not required to use RAGFlow.

View File

@ -31,7 +31,7 @@ $ docker exec -it ollama ollama run mistral
<img src="https://github.com/infiniflow/ragflow/assets/12318111/a9df198a-226d-4f30-b8d7-829f00256d46" width="1300"/>
</div>
> Base URL: Enter the base URL where the Ollama service is accessible, like, http://<your-ollama-endpoint-domain>:11434
> Base URL: Enter the base URL where the Ollama service is accessible, like, `http://<your-ollama-endpoint-domain>:11434`.
- Use Ollama Models.

View File

@ -31,7 +31,7 @@ $ xinference launch -u mistral --model-name mistral-v0.1 --size-in-billions 7 --
<img src="https://github.com/infiniflow/ragflow/assets/12318111/bcbf4d7a-ade6-44c7-ad5f-0a92c8a73789" width="1300"/>
</div>
> Base URL: Enter the base URL where the Xinference service is accessible, like, http://<your-xinference-endpoint-domain>:9997/v1
> Base URL: Enter the base URL where the Xinference service is accessible, like, `http://<your-xinference-endpoint-domain>:9997/v1`.
- Use Xinference Models.

67
printEnvironment.sh Normal file
View File

@ -0,0 +1,67 @@
#!/bin/bash
# The function is used to obtain distribution information
get_distro_info() {
local distro_id=$(lsb_release -i -s 2>/dev/null)
local distro_version=$(lsb_release -r -s 2>/dev/null)
local kernel_version=$(uname -r)
# If lsb_release is not available, try parsing the /etc/*-release files
if [ -z "$distro_id" ] || [ -z "$distro_version" ]; then
distro_id=$(grep '^ID=' /etc/*-release | cut -d= -f2 | tr -d '"')
distro_version=$(grep '^VERSION_ID=' /etc/*-release | cut -d= -f2 | tr -d '"')
fi
echo "$distro_id $distro_version (Kernel version: $kernel_version)"
}
# get Git repo name
git_repo_name=''
if git rev-parse --is-inside-work-tree > /dev/null 2>&1; then
git_repo_name=$(basename "$(git rev-parse --show-toplevel)")
if [ $? -ne 0 ]; then
git_repo_name="(Can't get repo name)"
fi
else
git_repo_name="It NOT a Git repo"
fi
# get CPU type
cpu_model=$(uname -m)
# get memory size
memory_size=$(free -h | grep Mem | awk '{print $2}')
# get docker version
docker_version=''
if command -v docker &> /dev/null; then
docker_version=$(docker --version | cut -d ' ' -f3)
else
docker_version="Docker not installed"
fi
# get python version
python_version=''
if command -v python &> /dev/null; then
python_version=$(python --version | cut -d ' ' -f2)
else
python_version="Python not installed"
fi
# Print all information
echo "Current Repo: $git_repo_name"
# get Commit ID
git_version=$(git log -1 --pretty=format:'%h')
if [ -z "$git_version" ]; then
echo "Commit Id: The current directory is not a Git repository, or the Git command is not installed."
else
echo "Commit Id: $git_version"
fi
echo "Operating system: $(get_distro_info)"
echo "CPU Type: $cpu_model"
echo "Memory: $memory_size"
echo "Docker Version: $docker_version"
echo "Python Version: $python_version"

View File

@ -11,11 +11,13 @@
# limitations under the License.
#
import copy
from tika import parser
import re
from io import BytesIO
from rag.nlp import bullets_category, is_english, tokenize, remove_contents_table, \
hierarchical_merge, make_colon_as_title, naive_merge, random_choices, tokenize_table, add_positions, tokenize_chunks
hierarchical_merge, make_colon_as_title, naive_merge, random_choices, tokenize_table, add_positions, \
tokenize_chunks, find_codec
from rag.nlp import huqie
from deepdoc.parser import PdfParser, DocxParser, PlainParser
@ -36,7 +38,7 @@ class Pdf(PdfParser):
start = timer()
self._layouts_rec(zoomin)
callback(0.67, "Layout analysis finished")
print("paddle layouts:", timer() - start)
print("layouts:", timer() - start)
self._table_transformer_job(zoomin)
callback(0.68, "Table analysis finished")
self._text_merge()
@ -66,7 +68,7 @@ def chunk(filename, binary=None, from_page=0, to_page=100000,
doc["title_sm_tks"] = huqie.qieqie(doc["title_tks"])
pdf_parser = None
sections, tbls = [], []
if re.search(r"\.docx?$", filename, re.IGNORECASE):
if re.search(r"\.docx$", filename, re.IGNORECASE):
callback(0.1, "Start to parse.")
doc_parser = DocxParser()
# TODO: table of contents need to be removed
@ -74,6 +76,7 @@ def chunk(filename, binary=None, from_page=0, to_page=100000,
binary if binary else filename, from_page=from_page, to_page=to_page)
remove_contents_table(sections, eng=is_english(
random_choices([t for t, _ in sections], k=200)))
tbls = [((None, lns), None) for lns in tbls]
callback(0.8, "Finish parsing.")
elif re.search(r"\.pdf$", filename, re.IGNORECASE):
@ -87,7 +90,8 @@ def chunk(filename, binary=None, from_page=0, to_page=100000,
callback(0.1, "Start to parse.")
txt = ""
if binary:
txt = binary.decode("utf-8")
encoding = find_codec(binary)
txt = binary.decode(encoding)
else:
with open(filename, "r") as f:
while True:
@ -101,9 +105,19 @@ def chunk(filename, binary=None, from_page=0, to_page=100000,
random_choices([t for t, _ in sections], k=200)))
callback(0.8, "Finish parsing.")
elif re.search(r"\.doc$", filename, re.IGNORECASE):
callback(0.1, "Start to parse.")
binary = BytesIO(binary)
doc_parsed = parser.from_buffer(binary)
sections = doc_parsed['content'].split('\n')
sections = [(l, "") for l in sections if l]
remove_contents_table(sections, eng=is_english(
random_choices([t for t, _ in sections], k=200)))
callback(0.8, "Finish parsing.")
else:
raise NotImplementedError(
"file type not supported yet(docx, pdf, txt supported)")
"file type not supported yet(doc, docx, pdf, txt supported)")
make_colon_as_title(sections)
bull = bullets_category(

View File

@ -11,13 +11,14 @@
# limitations under the License.
#
import copy
from tika import parser
import re
from io import BytesIO
from docx import Document
from api.db import ParserType
from rag.nlp import bullets_category, is_english, tokenize, remove_contents_table, hierarchical_merge, \
make_colon_as_title, add_positions, tokenize_chunks
make_colon_as_title, add_positions, tokenize_chunks, find_codec
from rag.nlp import huqie
from deepdoc.parser import PdfParser, DocxParser, PlainParser
from rag.settings import cron_logger
@ -71,7 +72,7 @@ class Pdf(PdfParser):
start = timer()
self._layouts_rec(zoomin)
callback(0.67, "Layout analysis finished")
cron_logger.info("paddle layouts:".format(
cron_logger.info("layouts:".format(
(timer() - start) / (self.total_page + 0.1)))
self._naive_vertical_merge()
@ -93,7 +94,7 @@ def chunk(filename, binary=None, from_page=0, to_page=100000,
doc["title_sm_tks"] = huqie.qieqie(doc["title_tks"])
pdf_parser = None
sections = []
if re.search(r"\.docx?$", filename, re.IGNORECASE):
if re.search(r"\.docx$", filename, re.IGNORECASE):
callback(0.1, "Start to parse.")
for txt in Docx()(filename, binary):
sections.append(txt)
@ -111,7 +112,8 @@ def chunk(filename, binary=None, from_page=0, to_page=100000,
callback(0.1, "Start to parse.")
txt = ""
if binary:
txt = binary.decode("utf-8")
encoding = find_codec(binary)
txt = binary.decode(encoding)
else:
with open(filename, "r") as f:
while True:
@ -122,9 +124,18 @@ def chunk(filename, binary=None, from_page=0, to_page=100000,
sections = txt.split("\n")
sections = [l for l in sections if l]
callback(0.8, "Finish parsing.")
elif re.search(r"\.doc$", filename, re.IGNORECASE):
callback(0.1, "Start to parse.")
binary = BytesIO(binary)
doc_parsed = parser.from_buffer(binary)
sections = doc_parsed['content'].split('\n')
sections = [l for l in sections if l]
callback(0.8, "Finish parsing.")
else:
raise NotImplementedError(
"file type not supported yet(docx, pdf, txt supported)")
"file type not supported yet(doc, docx, pdf, txt supported)")
# is it English
eng = lang.lower() == "english" # is_english(sections)

View File

@ -32,7 +32,7 @@ class Pdf(PdfParser):
self._layouts_rec(zoomin)
callback(0.65, "Layout analysis finished.")
print("paddle layouts:", timer() - start)
print("layouts:", timer() - start)
self._table_transformer_job(zoomin)
callback(0.67, "Table analysis finished.")
self._text_merge()

View File

@ -10,12 +10,13 @@
# See the License for the specific language governing permissions and
# limitations under the License.
#
from tika import parser
from io import BytesIO
from docx import Document
from timeit import default_timer as timer
import re
from deepdoc.parser.pdf_parser import PlainParser
from rag.app import laws
from rag.nlp import huqie, is_english, tokenize, naive_merge, tokenize_table, add_positions, tokenize_chunks
from rag.nlp import huqie, naive_merge, tokenize_table, tokenize_chunks, find_codec
from deepdoc.parser import PdfParser, ExcelParser, DocxParser
from rag.settings import cron_logger
@ -67,6 +68,7 @@ class Docx(DocxParser):
class Pdf(PdfParser):
def __call__(self, filename, binary=None, from_page=0,
to_page=100000, zoomin=3, callback=None):
start = timer()
callback(msg="OCR is running...")
self.__images__(
filename if not binary else binary,
@ -76,12 +78,11 @@ class Pdf(PdfParser):
callback
)
callback(msg="OCR finished")
cron_logger.info("OCR({}~{}): {}".format(from_page, to_page, timer() - start))
from timeit import default_timer as timer
start = timer()
self._layouts_rec(zoomin)
callback(0.63, "Layout analysis finished.")
print("paddle layouts:", timer() - start)
self._table_transformer_job(zoomin)
callback(0.65, "Table analysis finished.")
self._text_merge()
@ -91,8 +92,7 @@ class Pdf(PdfParser):
self._concat_downward()
#self._filter_forpages()
cron_logger.info("paddle layouts:".format(
(timer() - start) / (self.total_page + 0.1)))
cron_logger.info("layouts: {}".format(timer() - start))
return [(b["text"], self._line_tag(b, zoomin))
for b in self.boxes], tbls
@ -118,7 +118,7 @@ def chunk(filename, binary=None, from_page=0, to_page=100000,
res = []
pdf_parser = None
sections = []
if re.search(r"\.docx?$", filename, re.IGNORECASE):
if re.search(r"\.docx$", filename, re.IGNORECASE):
callback(0.1, "Start to parse.")
sections, tbls = Docx()(filename, binary)
res = tokenize_table(tbls, doc, eng)
@ -136,11 +136,12 @@ def chunk(filename, binary=None, from_page=0, to_page=100000,
excel_parser = ExcelParser()
sections = [(excel_parser.html(binary), "")]
elif re.search(r"\.txt$", filename, re.IGNORECASE):
elif re.search(r"\.(txt|md)$", filename, re.IGNORECASE):
callback(0.1, "Start to parse.")
txt = ""
if binary:
txt = binary.decode("utf-8")
encoding = find_codec(binary)
txt = binary.decode(encoding)
else:
with open(filename, "r") as f:
while True:
@ -152,16 +153,26 @@ def chunk(filename, binary=None, from_page=0, to_page=100000,
sections = [(l, "") for l in sections if l]
callback(0.8, "Finish parsing.")
elif re.search(r"\.doc$", filename, re.IGNORECASE):
callback(0.1, "Start to parse.")
binary = BytesIO(binary)
doc_parsed = parser.from_buffer(binary)
sections = doc_parsed['content'].split('\n')
sections = [(l, "") for l in sections if l]
callback(0.8, "Finish parsing.")
else:
raise NotImplementedError(
"file type not supported yet(docx, pdf, txt supported)")
"file type not supported yet(doc, docx, pdf, txt supported)")
st = timer()
chunks = naive_merge(
sections, parser_config.get(
"chunk_token_num", 128), parser_config.get(
"delimiter", "\n!?。;!?"))
res.extend(tokenize_chunks(chunks, doc, eng, pdf_parser))
cron_logger.info("naive_merge({}): {}".format(filename, timer() - st))
return res

View File

@ -10,9 +10,11 @@
# See the License for the specific language governing permissions and
# limitations under the License.
#
from tika import parser
from io import BytesIO
import re
from rag.app import laws
from rag.nlp import huqie, tokenize
from rag.nlp import huqie, tokenize, find_codec
from deepdoc.parser import PdfParser, ExcelParser, PlainParser
@ -33,7 +35,7 @@ class Pdf(PdfParser):
start = timer()
self._layouts_rec(zoomin, drop=False)
callback(0.63, "Layout analysis finished.")
print("paddle layouts:", timer() - start)
print("layouts:", timer() - start)
self._table_transformer_job(zoomin)
callback(0.65, "Table analysis finished.")
self._text_merge()
@ -60,7 +62,7 @@ def chunk(filename, binary=None, from_page=0, to_page=100000,
eng = lang.lower() == "english" # is_english(cks)
if re.search(r"\.docx?$", filename, re.IGNORECASE):
if re.search(r"\.docx$", filename, re.IGNORECASE):
callback(0.1, "Start to parse.")
sections = [txt for txt in laws.Docx()(filename, binary) if txt]
callback(0.8, "Finish parsing.")
@ -82,7 +84,8 @@ def chunk(filename, binary=None, from_page=0, to_page=100000,
callback(0.1, "Start to parse.")
txt = ""
if binary:
txt = binary.decode("utf-8")
encoding = find_codec(binary)
txt = binary.decode(encoding)
else:
with open(filename, "r") as f:
while True:
@ -94,9 +97,17 @@ def chunk(filename, binary=None, from_page=0, to_page=100000,
sections = [s for s in sections if s]
callback(0.8, "Finish parsing.")
elif re.search(r"\.doc$", filename, re.IGNORECASE):
callback(0.1, "Start to parse.")
binary = BytesIO(binary)
doc_parsed = parser.from_buffer(binary)
sections = doc_parsed['content'].split('\n')
sections = [l for l in sections if l]
callback(0.8, "Finish parsing.")
else:
raise NotImplementedError(
"file type not supported yet(docx, pdf, txt supported)")
"file type not supported yet(doc, docx, pdf, txt supported)")
doc = {
"docnm_kwd": filename,

View File

@ -42,7 +42,7 @@ class Pdf(PdfParser):
start = timer()
self._layouts_rec(zoomin)
callback(0.63, "Layout analysis finished")
print("paddle layouts:", timer() - start)
print("layouts:", timer() - start)
self._table_transformer_job(zoomin)
callback(0.68, "Table analysis finished")
self._text_merge()
@ -78,7 +78,7 @@ class Pdf(PdfParser):
title = ""
authors = []
i = 0
while i < min(32, len(self.boxes)):
while i < min(32, len(self.boxes)-1):
b = self.boxes[i]
i += 1
if b.get("layoutno", "").find("title") >= 0:

View File

@ -15,7 +15,7 @@ from copy import deepcopy
from io import BytesIO
from nltk import word_tokenize
from openpyxl import load_workbook
from rag.nlp import is_english, random_choices
from rag.nlp import is_english, random_choices, find_codec
from rag.nlp import huqie
from deepdoc.parser import ExcelParser
@ -106,7 +106,8 @@ def chunk(filename, binary=None, lang="Chinese", callback=None, **kwargs):
callback(0.1, "Start to parse.")
txt = ""
if binary:
txt = binary.decode("utf-8")
encoding = find_codec(binary)
txt = binary.decode(encoding)
else:
with open(filename, "r") as f:
while True:

View File

@ -20,7 +20,7 @@ from openpyxl import load_workbook
from dateutil.parser import parse as datetime_parse
from api.db.services.knowledgebase_service import KnowledgebaseService
from rag.nlp import huqie, is_english, tokenize
from rag.nlp import huqie, is_english, tokenize, find_codec
from deepdoc.parser import ExcelParser
@ -147,7 +147,8 @@ def chunk(filename, binary=None, from_page=0, to_page=10000000000,
callback(0.1, "Start to parse.")
txt = ""
if binary:
txt = binary.decode("utf-8")
encoding = find_codec(binary)
txt = binary.decode(encoding)
else:
with open(filename, "r") as f:
while True:
@ -199,7 +200,7 @@ def chunk(filename, binary=None, from_page=0, to_page=10000000000,
re.sub(
r"(/.*|[^]+?|\([^()]+?\))",
"",
n),
str(n)),
'_')[0] for n in clmns]
clmn_tys = []
for j in range(len(clmns)):
@ -208,7 +209,7 @@ def chunk(filename, binary=None, from_page=0, to_page=10000000000,
df[clmns[j]] = cln
if ty == "text":
txts.extend([str(c) for c in cln if c])
clmns_map = [(py_clmns[i].lower() + fieds_map[clmn_tys[i]], clmns[i].replace("_", " "))
clmns_map = [(py_clmns[i].lower() + fieds_map[clmn_tys[i]], str(clmns[i]).replace("_", " "))
for i in range(len(clmns))]
eng = lang.lower() == "english" # is_english(txts)
@ -223,8 +224,8 @@ def chunk(filename, binary=None, from_page=0, to_page=10000000000,
continue
if not str(row[clmns[j]]):
continue
#if pd.isna(row[clmns[j]]):
# continue
if pd.isna(row[clmns[j]]):
continue
fld = clmns_map[j][0]
d[fld] = row[clmns[j]] if clmn_tys[j] != "text" else huqie.qie(
row[clmns[j]])

View File

@ -24,8 +24,8 @@ EmbeddingModel = {
"Xinference": XinferenceEmbed,
"Tongyi-Qianwen": HuEmbedding, #QWenEmbed,
"ZHIPU-AI": ZhipuEmbed,
"Moonshot": HuEmbedding,
"FastEmbed": FastEmbed
"FastEmbed": FastEmbed,
"Youdao": YoudaoEmbed
}

View File

@ -153,7 +153,7 @@ class OllamaChat(Base):
options=options
)
ans = response["message"]["content"].strip()
return ans, response["eval_count"] + response["prompt_eval_count"]
return ans, response["eval_count"] + response.get("prompt_eval_count", 0)
except Exception as e:
return "**ERROR**: " + str(e), 0

View File

@ -14,13 +14,14 @@
# limitations under the License.
#
from typing import Optional
from huggingface_hub import snapshot_download
from zhipuai import ZhipuAI
import os
from abc import ABC
from ollama import Client
import dashscope
from openai import OpenAI
from fastembed import TextEmbedding
from FlagEmbedding import FlagModel
import torch
import numpy as np
@ -28,16 +29,20 @@ import numpy as np
from api.utils.file_utils import get_project_base_directory
from rag.utils import num_tokens_from_string
try:
flag_model = FlagModel(os.path.join(
get_project_base_directory(),
"rag/res/bge-large-zh-v1.5"),
get_project_base_directory(),
"rag/res/bge-large-zh-v1.5"),
query_instruction_for_retrieval="为这个句子生成表示以用于检索相关文章:",
use_fp16=torch.cuda.is_available())
except Exception as e:
model_dir = snapshot_download(repo_id="BAAI/bge-large-zh-v1.5",
local_dir=os.path.join(get_project_base_directory(), "rag/res/bge-large-zh-v1.5"),
local_dir_use_symlinks=False)
flag_model = FlagModel(model_dir,
query_instruction_for_retrieval="为这个句子生成表示以用于检索相关文章:",
use_fp16=torch.cuda.is_available())
except Exception as e:
flag_model = FlagModel("BAAI/bge-large-zh-v1.5",
query_instruction_for_retrieval="为这个句子生成表示以用于检索相关文章:",
use_fp16=torch.cuda.is_available())
class Base(ABC):
@ -82,8 +87,10 @@ class HuEmbedding(Base):
class OpenAIEmbed(Base):
def __init__(self, key, model_name="text-embedding-ada-002", base_url="https://api.openai.com/v1"):
if not base_url: base_url="https://api.openai.com/v1"
def __init__(self, key, model_name="text-embedding-ada-002",
base_url="https://api.openai.com/v1"):
if not base_url:
base_url = "https://api.openai.com/v1"
self.client = OpenAI(api_key=key, base_url=base_url)
self.model_name = model_name
@ -142,7 +149,7 @@ class ZhipuEmbed(Base):
tks_num = 0
for txt in texts:
res = self.client.embeddings.create(input=txt,
model=self.model_name)
model=self.model_name)
arr.append(res.data[0].embedding)
tks_num += res.usage.total_tokens
return np.array(arr), tks_num
@ -163,14 +170,14 @@ class OllamaEmbed(Base):
tks_num = 0
for txt in texts:
res = self.client.embeddings(prompt=txt,
model=self.model_name)
model=self.model_name)
arr.append(res["embedding"])
tks_num += 128
return np.array(arr), tks_num
def encode_queries(self, text):
res = self.client.embeddings(prompt=text,
model=self.model_name)
model=self.model_name)
return np.array(res["embedding"]), 128
@ -183,10 +190,12 @@ class FastEmbed(Base):
threads: Optional[int] = None,
**kwargs,
):
from fastembed import TextEmbedding
self._model = TextEmbedding(model_name, cache_dir, threads, **kwargs)
def encode(self, texts: list, batch_size=32):
# Using the internal tokenizer to encode the texts and get the total number of tokens
# Using the internal tokenizer to encode the texts and get the total
# number of tokens
encodings = self._model.model.tokenizer.encode_batch(texts)
total_tokens = sum(len(e) for e in encodings)
@ -195,7 +204,8 @@ class FastEmbed(Base):
return np.array(embeddings), total_tokens
def encode_queries(self, text: str):
# Using the internal tokenizer to encode the texts and get the total number of tokens
# Using the internal tokenizer to encode the texts and get the total
# number of tokens
encoding = self._model.model.tokenizer.encode(text)
embedding = next(self._model.query_embed(text)).tolist()
@ -218,3 +228,33 @@ class XinferenceEmbed(Base):
model=self.model_name)
return np.array(res.data[0].embedding), res.usage.total_tokens
class YoudaoEmbed(Base):
_client = None
def __init__(self, key=None, model_name="maidalun1020/bce-embedding-base_v1", **kwargs):
from BCEmbedding import EmbeddingModel as qanthing
if not YoudaoEmbed._client:
try:
print("LOADING BCE...")
YoudaoEmbed._client = qanthing(model_name_or_path=os.path.join(
get_project_base_directory(),
"rag/res/bce-embedding-base_v1"))
except Exception as e:
YoudaoEmbed._client = qanthing(
model_name_or_path=model_name.replace(
"maidalun1020", "InfiniFlow"))
def encode(self, texts: list, batch_size=10):
res = []
token_count = 0
for t in texts:
token_count += num_tokens_from_string(t)
for i in range(0, len(texts), batch_size):
embds = YoudaoEmbed._client.encode(texts[i:i + batch_size])
res.extend(embds)
return np.array(res), token_count
def encode_queries(self, text):
embds = YoudaoEmbed._client.encode([text])
return np.array(embds[0]), num_tokens_from_string(text)

View File

@ -6,6 +6,35 @@ from . import huqie
import re
import copy
all_codecs = [
'utf-8', 'gb2312', 'gbk', 'utf_16', 'ascii', 'big5', 'big5hkscs',
'cp037', 'cp273', 'cp424', 'cp437',
'cp500', 'cp720', 'cp737', 'cp775', 'cp850', 'cp852', 'cp855', 'cp856', 'cp857',
'cp858', 'cp860', 'cp861', 'cp862', 'cp863', 'cp864', 'cp865', 'cp866', 'cp869',
'cp874', 'cp875', 'cp932', 'cp949', 'cp950', 'cp1006', 'cp1026', 'cp1125',
'cp1140', 'cp1250', 'cp1251', 'cp1252', 'cp1253', 'cp1254', 'cp1255', 'cp1256',
'cp1257', 'cp1258', 'euc_jp', 'euc_jis_2004', 'euc_jisx0213', 'euc_kr',
'gb2312', 'gb18030', 'hz', 'iso2022_jp', 'iso2022_jp_1', 'iso2022_jp_2',
'iso2022_jp_2004', 'iso2022_jp_3', 'iso2022_jp_ext', 'iso2022_kr', 'latin_1',
'iso8859_2', 'iso8859_3', 'iso8859_4', 'iso8859_5', 'iso8859_6', 'iso8859_7',
'iso8859_8', 'iso8859_9', 'iso8859_10', 'iso8859_11', 'iso8859_13',
'iso8859_14', 'iso8859_15', 'iso8859_16', 'johab', 'koi8_r', 'koi8_t', 'koi8_u',
'kz1048', 'mac_cyrillic', 'mac_greek', 'mac_iceland', 'mac_latin2', 'mac_roman',
'mac_turkish', 'ptcp154', 'shift_jis', 'shift_jis_2004', 'shift_jisx0213',
'utf_32', 'utf_32_be', 'utf_32_le', 'utf_16_be', 'utf_16_le', 'utf_7'
]
def find_codec(blob):
global all_codecs
for c in all_codecs:
try:
blob.decode(c)
return c
except Exception as e:
pass
return "utf-8"
BULLET_PATTERN = [[
r"第[零一二三四五六七八九十百0-9]+(分?编|部分)",

View File

@ -8,6 +8,7 @@ import re
import string
import sys
from hanziconv import HanziConv
from huggingface_hub import snapshot_download
from nltk import word_tokenize
from nltk.stem import PorterStemmer, WordNetLemmatizer
from api.utils.file_utils import get_project_base_directory

View File

@ -46,7 +46,7 @@ class Dealer:
"k": topk,
"similarity": sim,
"num_candidates": topk * 2,
"query_vector": list(qv)
"query_vector": [float(v) for v in qv]
}
def search(self, req, idxnm, emb_mdl=None):
@ -68,7 +68,7 @@ class Dealer:
pg = int(req.get("page", 1)) - 1
ps = int(req.get("size", 1000))
topk = int(req.get("topk", 1024))
src = req.get("fields", ["docnm_kwd", "content_ltks", "kb_id", "img_id",
src = req.get("fields", ["docnm_kwd", "content_ltks", "kb_id", "img_id", "title_tks", "important_kwd",
"image_id", "doc_id", "q_512_vec", "q_768_vec", "position_int",
"q_1024_vec", "q_1536_vec", "available_int", "content_with_weight"])
@ -237,7 +237,7 @@ class Dealer:
pieces_.append(t)
es_logger.info("{} => {}".format(answer, pieces_))
if not pieces_:
return answer
return answer, set([])
ans_v, _ = embd_mdl.encode(pieces_)
assert len(ans_v[0]) == len(chunk_v[0]), "The dimension of query and chunk do not match: {} vs. {}".format(
@ -289,8 +289,18 @@ class Dealer:
sres.field[i].get("q_%d_vec" % len(sres.query_vector), "\t".join(["0"] * len(sres.query_vector)))) for i in sres.ids]
if not ins_embd:
return [], [], []
ins_tw = [sres.field[i][cfield].split(" ")
for i in sres.ids]
for i in sres.ids:
if isinstance(sres.field[i].get("important_kwd", []), str):
sres.field[i]["important_kwd"] = [sres.field[i]["important_kwd"]]
ins_tw = []
for i in sres.ids:
content_ltks = sres.field[i][cfield].split(" ")
title_tks = [t for t in sres.field[i].get("title_tks", "").split(" ") if t]
important_kwd = sres.field[i].get("important_kwd", [])
tks = content_ltks + title_tks + important_kwd
ins_tw.append(tks)
sim, tksim, vtsim = self.qryr.hybrid_similarity(sres.query_vector,
ins_embd,
keywords,
@ -368,7 +378,7 @@ class Dealer:
def sql_retrieval(self, sql, fetch_size=128, format="json"):
from api.settings import chat_logger
sql = re.sub(r"[ ]+", " ", sql)
sql = re.sub(r"[ `]+", " ", sql)
sql = sql.replace("%", "")
es_logger.info(f"Get es sql: {sql}")
replaces = []

View File

@ -25,6 +25,11 @@ SUBPROCESS_STD_LOG_NAME = "std.log"
ES = get_base_config("es", {})
MINIO = decrypt_database_config(name="minio")
try:
REDIS = decrypt_database_config(name="redis")
except Exception as e:
REDIS = {}
pass
DOC_MAXIMUM_SIZE = 128 * 1024 * 1024
# Logger
@ -39,5 +44,6 @@ LoggerFactory.LEVEL = 30
es_logger = getLogger("es")
minio_logger = getLogger("minio")
cron_logger = getLogger("cron_logger")
cron_logger.setLevel(20)
chunk_logger = getLogger("chunk_logger")
database_logger = getLogger("database")

43
rag/svr/cache_file_svr.py Normal file
View File

@ -0,0 +1,43 @@
import random
import time
import traceback
from api.db.db_models import close_connection
from api.db.services.task_service import TaskService
from rag.utils import MINIO
from rag.utils.redis_conn import REDIS_CONN
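# Sketch of intent (inferred from the calls below, not documented in this file):
# for every document with an ongoing parsing task, copy its file from MINIO into
# Redis so that task executors can fetch it from cache instead of object storage.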
def collect():
doc_locations = TaskService.get_ongoing_doc_name()
#print(tasks)
if len(doc_locations) == 0:
time.sleep(1)
return
return doc_locations
def main():
locations = collect()
if not locations:return
print("TASKS:", len(locations))
for kb_id, loc in locations:
try:
if REDIS_CONN.is_alive():
try:
key = "{}/{}".format(kb_id, loc)
if REDIS_CONN.exist(key):continue
file_bin = MINIO.get(kb_id, loc)
REDIS_CONN.transaction(key, file_bin, 12 * 60)
print("CACHE:", loc)
except Exception as e:
traceback.print_stack(e)
except Exception as e:
traceback.print_stack(e)
if __name__ == "__main__":
while True:
main()
close_connection()
time.sleep(1)

View File

@ -32,6 +32,9 @@ from api.db.services.document_service import DocumentService
from api.settings import database_logger
from api.utils import get_format_time, get_uuid
from api.utils.file_utils import get_project_base_directory
from rag.utils.redis_conn import REDIS_CONN
from api.db.db_models import init_database_tables as init_web_db
from api.db.init_data import init_web_data
def collect(tm):
@ -84,10 +87,16 @@ def dispatch():
tsks = []
try:
file_bin = MINIO.get(r["kb_id"], r["location"])
if REDIS_CONN.is_alive():
try:
REDIS_CONN.set("{}/{}".format(r["kb_id"], r["location"]), file_bin, 12*60)
except Exception as e:
cron_logger.warning("Put into redis[EXCEPTION]:" + str(e))
if r["type"] == FileType.PDF.value:
do_layout = r["parser_config"].get("layout_recognize", True)
pages = PdfParser.total_page_number(
r["name"], MINIO.get(r["kb_id"], r["location"]))
pages = PdfParser.total_page_number(r["name"], file_bin)
page_size = r["parser_config"].get("task_page_size", 12)
if r["parser_id"] == "paper":
page_size = r["parser_config"].get("task_page_size", 22)
@ -110,8 +119,7 @@ def dispatch():
elif r["parser_id"] == "table":
rn = HuExcelParser.row_number(
r["name"], MINIO.get(
r["kb_id"], r["location"]))
r["name"], file_bin)
for i in range(0, rn, 3000):
task = new_task()
task["from_page"] = i
@ -159,7 +167,7 @@ def update_progress():
info = {
"process_duation": datetime.timestamp(
datetime.now()) -
d["process_begin_at"].timestamp(),
d["process_begin_at"].timestamp(),
"run": status}
if prg != 0:
info["progress"] = prg
@ -175,6 +183,9 @@ if __name__ == "__main__":
peewee_logger.propagate = False
peewee_logger.addHandler(database_logger.handlers[0])
peewee_logger.setLevel(database_logger.level)
# init db
init_web_db()
init_web_data()
while True:
dispatch()

View File

@ -24,7 +24,7 @@ import sys
import time
import traceback
from functools import partial
from rag.utils import MINIO
from api.db.db_models import close_connection
from rag.settings import database_logger
from rag.settings import cron_logger, DOC_MAXIMUM_SIZE
@ -34,7 +34,7 @@ from elasticsearch_dsl import Q
from multiprocessing.context import TimeoutError
from api.db.services.task_service import TaskService
from rag.utils import ELASTICSEARCH
from rag.utils import MINIO
from timeit import default_timer as timer
from rag.utils import rmSpace, findMaxTm
from rag.nlp import search
@ -47,6 +47,7 @@ from api.db import LLMType, ParserType
from api.db.services.document_service import DocumentService
from api.db.services.llm_service import LLMBundle
from api.utils.file_utils import get_project_base_directory
from rag.utils.redis_conn import REDIS_CONN
BATCH_SIZE = 64
@ -92,6 +93,7 @@ def set_progress(task_id, from_page=0, to_page=-1,
def collect(comm, mod, tm):
tasks = TaskService.get_tasks(tm, mod, comm)
#print(tasks)
if len(tasks) == 0:
time.sleep(1)
return pd.DataFrame()
@ -103,11 +105,22 @@ def collect(comm, mod, tm):
def get_minio_binary(bucket, name):
global MINIO
if REDIS_CONN.is_alive():
try:
for _ in range(30):
if REDIS_CONN.exist("{}/{}".format(bucket, name)):
time.sleep(1)
break
time.sleep(1)
r = REDIS_CONN.get("{}/{}".format(bucket, name))
if r: return r
cron_logger.warning("Cache missing: {}".format(name))
except Exception as e:
cron_logger.warning("Get redis[EXCEPTION]:" + str(e))
return MINIO.get(bucket, name)
def build(row):
from timeit import default_timer as timer
if row["size"] > DOC_MAXIMUM_SIZE:
set_progress(row["id"], prog=-1, msg="File size exceeds( <= %dMb )" %
(int(DOC_MAXIMUM_SIZE / 1024 / 1024)))
@ -156,6 +169,7 @@ def build(row):
"doc_id": row["doc_id"],
"kb_id": [str(row["kb_id"])]
}
el = 0
for ck in cks:
d = copy.deepcopy(doc)
d.update(ck)
@ -175,10 +189,13 @@ def build(row):
else:
d["image"].save(output_buffer, format='JPEG')
st = timer()
MINIO.put(row["kb_id"], d["_id"], output_buffer.getvalue())
el += timer() - st
d["img_id"] = "{}-{}".format(row["kb_id"], d["_id"])
del d["image"]
docs.append(d)
cron_logger.info("MINIO PUT({}):{}".format(row["name"], el))
return docs
@ -243,13 +260,17 @@ def main(comm, mod):
tmf = open(tm_fnm, "a+")
for _, r in rows.iterrows():
callback = partial(set_progress, r["id"], r["from_page"], r["to_page"])
#callback(random.random()/10., "Task has been received.")
try:
embd_mdl = LLMBundle(r["tenant_id"], LLMType.EMBEDDING)
embd_mdl = LLMBundle(r["tenant_id"], LLMType.EMBEDDING, llm_name=r["embd_id"], lang=r["language"])
except Exception as e:
traceback.print_exc()
callback(prog=-1, msg=str(e))
continue
st = timer()
cks = build(r)
cron_logger.info("Build chunks({}): {}".format(r["name"], timer()-st))
if cks is None:
continue
if not cks:
@ -261,17 +282,21 @@ def main(comm, mod):
callback(
msg="Finished slicing files(%d). Start to embedding the content." %
len(cks))
st = timer()
try:
tk_count = embedding(cks, embd_mdl, r["parser_config"], callback)
except Exception as e:
callback(-1, "Embedding error:{}".format(str(e)))
cron_logger.error(str(e))
tk_count = 0
cron_logger.info("Embedding elapsed({}): {}".format(r["name"], timer()-st))
callback(msg="Finished embedding! Start to build index!")
callback(msg="Finished embedding({})! Start to build index!".format(timer()-st))
init_kb(r)
chunk_count = len(set([c["_id"] for c in cks]))
st = timer()
es_r = ELASTICSEARCH.bulk(cks, search.index_name(r["tenant_id"]))
cron_logger.info("Indexing elapsed({}): {}".format(r["name"], timer()-st))
if es_r:
callback(-1, "Index failure!")
ELASTICSEARCH.deleteByQuery(
@ -286,8 +311,8 @@ def main(comm, mod):
DocumentService.increment_chunk_num(
r["doc_id"], r["kb_id"], tk_count, chunk_count, 0)
cron_logger.info(
"Chunk doc({}), token({}), chunks({})".format(
r["id"], tk_count, len(cks)))
"Chunk doc({}), token({}), chunks({}), elapsed:{}".format(
r["id"], tk_count, len(cks), timer()-st))
tmf.write(str(r["update_time"]) + "\n")
tmf.close()
@ -299,9 +324,8 @@ if __name__ == "__main__":
peewee_logger.addHandler(database_logger.handlers[0])
peewee_logger.setLevel(database_logger.level)
-from mpi4py import MPI
-comm = MPI.COMM_WORLD
+#from mpi4py import MPI
+#comm = MPI.COMM_WORLD
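# Worker id and worker count now come from the command-line arguments instead of the MPI communicator.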
while True:
main(int(sys.argv[2]), int(sys.argv[1]))
close_connection()

View File

@ -2,6 +2,7 @@ import re
import json
import time
import copy
import elasticsearch
from elastic_transport import ConnectionTimeout
from elasticsearch import Elasticsearch

View File

@ -56,7 +56,6 @@ class HuMinio(object):
except Exception as e:
minio_logger.error(f"Fail rm {bucket}/{fnm}: " + str(e))
def get(self, bucket, fnm):
-for _ in range(1):
try:

rag/utils/redis_conn.py (new file, 74 lines)

@ -0,0 +1,74 @@
import json
import redis
import logging
from rag import settings
from rag.utils import singleton
@singleton
class RedisDB:
def __init__(self):
self.REDIS = None
self.config = settings.REDIS
self.__open__()
def __open__(self):
try:
self.REDIS = redis.Redis(host=self.config.get("host", "redis").split(":")[0],
port=int(self.config.get("host", ":6379").split(":")[1]),
db=int(self.config.get("db", 1)),
password=self.config.get("password"))
except Exception as e:
logging.warning("Redis can't be connected.")
return self.REDIS
def is_alive(self):
return self.REDIS is not None
def exist(self, k):
if not self.REDIS: return
try:
return self.REDIS.exists(k)
except Exception as e:
logging.warning("[EXCEPTION]exist" + str(k) + "||" + str(e))
self.__open__()
def get(self, k):
if not self.REDIS: return
try:
return self.REDIS.get(k)
except Exception as e:
logging.warning("[EXCEPTION]get" + str(k) + "||" + str(e))
self.__open__()
def set_obj(self, k, obj, exp=3600):
try:
self.REDIS.set(k, json.dumps(obj, ensure_ascii=False), exp)
return True
except Exception as e:
logging.warning("[EXCEPTION]set_obj" + str(k) + "||" + str(e))
self.__open__()
return False
def set(self, k, v, exp=3600):
try:
self.REDIS.set(k, v, exp)
return True
except Exception as e:
logging.warning("[EXCEPTION]set" + str(k) + "||" + str(e))
self.__open__()
return False
def transaction(self, key, value, exp=3600):
try:
pipeline = self.REDIS.pipeline(transaction=True)
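# nx=True makes the write set-if-absent, so only the first caller for a key wins.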
pipeline.set(key, value, exp, nx=True)
pipeline.execute()
return True
except Exception as e:
logging.warning("[EXCEPTION]set" + str(key) + "||" + str(e))
self.__open__()
return False
REDIS_CONN = RedisDB()
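The class is consumed as a process-wide singleton. A minimal sketch of the producer/consumer handshake it enables, mirroring the dispatcher and executor diffs above (cache_binary and load_binary are illustrative names; the "{kb_id}/{location}" key scheme and 12-minute TTL come from those diffs):

from rag.utils.redis_conn import REDIS_CONN

def cache_binary(bucket, location, binary):
    # Dispatcher side: best-effort write-through; a dead Redis is simply skipped.
    if REDIS_CONN.is_alive():
        REDIS_CONN.set("{}/{}".format(bucket, location), binary, 12 * 60)

def load_binary(minio_client, bucket, location):
    # Executor side: prefer the cache, fall back to object storage on a miss.
    if REDIS_CONN.is_alive():
        cached = REDIS_CONN.get("{}/{}".format(bucket, location))
        if cached:
            return cached
    return minio_client.get(bucket, location)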

View File

@ -19,7 +19,7 @@ cryptography==42.0.5
dashscope==1.14.1
datasets==2.17.1
datrie==0.8.2
-demjson==2.2.4
+demjson3==3.0.6
dill==0.3.8
distro==1.9.0
elastic-transport==8.12.0
@ -116,6 +116,7 @@ sniffio==1.3.1
StrEnum==0.4.15
sympy==1.12
threadpoolctl==3.3.0
tika==2.6.0
tiktoken==0.6.0
tokenizers==0.15.2
torch==2.2.1
@ -132,3 +133,5 @@ xpinyin==0.7.6
xxhash==3.4.1
yarl==1.9.4
zhipuai==2.0.1
BCEmbedding
loguru==0.7.2

View File

@ -27,7 +27,7 @@ export default defineConfig({
devtool: 'source-map',
proxy: {
'/v1': {
-target: 'http://123.60.95.134:9380/',
+target: 'http://192.168.200.233:9380/',
changeOrigin: true,
// pathRewrite: { '^/v1': '/v1' },
},

web/externals.d.ts (vendored new file, 138 lines)

@ -0,0 +1,138 @@
// This file is generated by Umi automatically
// DO NOT CHANGE IT MANUALLY!
type CSSModuleClasses = { readonly [key: string]: string };
declare module '*.css' {
const classes: CSSModuleClasses;
export default classes;
}
declare module '*.scss' {
const classes: CSSModuleClasses;
export default classes;
}
declare module '*.sass' {
const classes: CSSModuleClasses;
export default classes;
}
declare module '*.less' {
const classes: CSSModuleClasses;
export default classes;
}
declare module '*.styl' {
const classes: CSSModuleClasses;
export default classes;
}
declare module '*.stylus' {
const classes: CSSModuleClasses;
export default classes;
}
// images
declare module '*.jpg' {
const src: string;
export default src;
}
declare module '*.jpeg' {
const src: string;
export default src;
}
declare module '*.png' {
const src: string;
export default src;
}
declare module '*.gif' {
const src: string;
export default src;
}
declare module '*.svg' {
import * as React from 'react';
export const ReactComponent: React.FunctionComponent<
React.SVGProps<SVGSVGElement> & { title?: string }
>;
const src: string;
export default src;
}
declare module '*.ico' {
const src: string;
export default src;
}
declare module '*.webp' {
const src: string;
export default src;
}
declare module '*.avif' {
const src: string;
export default src;
}
// media
declare module '*.mp4' {
const src: string;
export default src;
}
declare module '*.webm' {
const src: string;
export default src;
}
declare module '*.ogg' {
const src: string;
export default src;
}
declare module '*.mp3' {
const src: string;
export default src;
}
declare module '*.wav' {
const src: string;
export default src;
}
declare module '*.flac' {
const src: string;
export default src;
}
declare module '*.aac' {
const src: string;
export default src;
}
// fonts
declare module '*.woff' {
const src: string;
export default src;
}
declare module '*.woff2' {
const src: string;
export default src;
}
declare module '*.eot' {
const src: string;
export default src;
}
declare module '*.ttf' {
const src: string;
export default src;
}
declare module '*.otf' {
const src: string;
export default src;
}
// other
declare module '*.wasm' {
const initWasm: (
options: WebAssembly.Imports,
) => Promise<WebAssembly.Exports>;
export default initWasm;
}
declare module '*.webmanifest' {
const src: string;
export default src;
}
declare module '*.pdf' {
const src: string;
export default src;
}
declare module '*.txt' {
const src: string;
export default src;
}

web/package-lock.json (generated, 345 lines changed)

@ -13,19 +13,21 @@
"antd": "^5.12.7",
"axios": "^1.6.3",
"classnames": "^2.5.1",
"dayjs": "^1.11.10",
"i18next": "^23.7.16",
"js-base64": "^3.7.5",
"jsencrypt": "^3.3.2",
"lodash": "^4.17.21",
"moment": "^2.30.1",
"rc-tween-one": "^3.0.6",
"react-chat-elements": "^12.0.13",
"react-copy-to-clipboard": "^5.1.0",
"react-i18next": "^14.0.0",
"react-infinite-scroll-component": "^6.1.0",
"react-markdown": "^9.0.1",
"react-pdf-highlighter": "^6.1.0",
"react-string-replace": "^1.1.1",
"react-syntax-highlighter": "^15.5.0",
"recharts": "^2.12.4",
"remark-gfm": "^4.0.0",
"umi": "^4.0.90",
"umi-request": "^1.4.0",
@ -36,6 +38,7 @@
"@react-dev-inspector/umi4-plugin": "^2.0.1",
"@types/lodash": "^4.14.202",
"@types/react": "^18.0.33",
"@types/react-copy-to-clipboard": "^5.0.7",
"@types/react-dom": "^18.0.11",
"@types/react-syntax-highlighter": "^15.5.11",
"@types/uuid": "^9.0.8",
@ -2676,6 +2679,60 @@
"@babel/types": "^7.20.7"
}
},
"node_modules/@types/d3-array": {
"version": "3.2.1",
"resolved": "https://registry.npmmirror.com/@types/d3-array/-/d3-array-3.2.1.tgz",
"integrity": "sha512-Y2Jn2idRrLzUfAKV2LyRImR+y4oa2AntrgID95SHJxuMUrkNXmanDSed71sRNZysveJVt1hLLemQZIady0FpEg=="
},
"node_modules/@types/d3-color": {
"version": "3.1.3",
"resolved": "https://registry.npmmirror.com/@types/d3-color/-/d3-color-3.1.3.tgz",
"integrity": "sha512-iO90scth9WAbmgv7ogoq57O9YpKmFBbmoEoCHDB2xMBY0+/KVrqAaCDyCE16dUspeOvIxFFRI+0sEtqDqy2b4A=="
},
"node_modules/@types/d3-ease": {
"version": "3.0.2",
"resolved": "https://registry.npmmirror.com/@types/d3-ease/-/d3-ease-3.0.2.tgz",
"integrity": "sha512-NcV1JjO5oDzoK26oMzbILE6HW7uVXOHLQvHshBUW4UMdZGfiY6v5BeQwh9a9tCzv+CeefZQHJt5SRgK154RtiA=="
},
"node_modules/@types/d3-interpolate": {
"version": "3.0.4",
"resolved": "https://registry.npmmirror.com/@types/d3-interpolate/-/d3-interpolate-3.0.4.tgz",
"integrity": "sha512-mgLPETlrpVV1YRJIglr4Ez47g7Yxjl1lj7YKsiMCb27VJH9W8NVM6Bb9d8kkpG/uAQS5AmbA48q2IAolKKo1MA==",
"dependencies": {
"@types/d3-color": "*"
}
},
"node_modules/@types/d3-path": {
"version": "3.1.0",
"resolved": "https://registry.npmmirror.com/@types/d3-path/-/d3-path-3.1.0.tgz",
"integrity": "sha512-P2dlU/q51fkOc/Gfl3Ul9kicV7l+ra934qBFXCFhrZMOL6du1TM0pm1ThYvENukyOn5h9v+yMJ9Fn5JK4QozrQ=="
},
"node_modules/@types/d3-scale": {
"version": "4.0.8",
"resolved": "https://registry.npmmirror.com/@types/d3-scale/-/d3-scale-4.0.8.tgz",
"integrity": "sha512-gkK1VVTr5iNiYJ7vWDI+yUFFlszhNMtVeneJ6lUTKPjprsvLLI9/tgEGiXJOnlINJA8FyA88gfnQsHbybVZrYQ==",
"dependencies": {
"@types/d3-time": "*"
}
},
"node_modules/@types/d3-shape": {
"version": "3.1.6",
"resolved": "https://registry.npmmirror.com/@types/d3-shape/-/d3-shape-3.1.6.tgz",
"integrity": "sha512-5KKk5aKGu2I+O6SONMYSNflgiP0WfZIQvVUMan50wHsLG1G94JlxEVnCpQARfTtzytuY0p/9PXXZb3I7giofIA==",
"dependencies": {
"@types/d3-path": "*"
}
},
"node_modules/@types/d3-time": {
"version": "3.0.3",
"resolved": "https://registry.npmmirror.com/@types/d3-time/-/d3-time-3.0.3.tgz",
"integrity": "sha512-2p6olUZ4w3s+07q3Tm2dbiMZy5pCDfYwtLXXHUnVzXgQlZ/OyPtUz6OL382BkOuGlLXqfT+wqv8Fw2v8/0geBw=="
},
"node_modules/@types/d3-timer": {
"version": "3.0.2",
"resolved": "https://registry.npmmirror.com/@types/d3-timer/-/d3-timer-3.0.2.tgz",
"integrity": "sha512-Ps3T8E8dZDam6fUyNiMkekK3XUsaUEik+idO9/YjPtfj2qruF8tFBXS7XhtE4iIXBLxhmLjP3SXpLhVf21I9Lw=="
},
"node_modules/@types/debug": {
"version": "4.1.12",
"resolved": "https://registry.npmmirror.com/@types/debug/-/debug-4.1.12.tgz",
@ -2884,6 +2941,15 @@
"csstype": "^3.0.2"
}
},
"node_modules/@types/react-copy-to-clipboard": {
"version": "5.0.7",
"resolved": "https://registry.npmmirror.com/@types/react-copy-to-clipboard/-/react-copy-to-clipboard-5.0.7.tgz",
"integrity": "sha512-Gft19D+as4M+9Whq1oglhmK49vqPhcLzk8WfvfLvaYMIPYanyfLy0+CwFucMJfdKoSFyySPmkkWn8/E6voQXjQ==",
"dev": true,
"dependencies": {
"@types/react": "*"
}
},
"node_modules/@types/react-dom": {
"version": "18.2.18",
"resolved": "https://registry.npmmirror.com/@types/react-dom/-/react-dom-18.2.18.tgz",
@ -5832,6 +5898,14 @@
"node": ">=12"
}
},
"node_modules/clsx": {
"version": "2.1.0",
"resolved": "https://registry.npmmirror.com/clsx/-/clsx-2.1.0.tgz",
"integrity": "sha512-m3iNNWpd9rl3jvvcBnu70ylMdrXt8Vlq4HYadnU5fwcOtvkSQWPmj7amUcDT2qYI7risszBjI5AUIUox9D16pg==",
"engines": {
"node": ">=6"
}
},
"node_modules/coa": {
"version": "2.0.2",
"resolved": "https://registry.npmmirror.com/coa/-/coa-2.0.2.tgz",
@ -6640,11 +6714,132 @@
"resolved": "https://registry.npmmirror.com/d3-array/-/d3-array-1.2.4.tgz",
"integrity": "sha512-KHW6M86R+FUPYGb3R5XiYjXPq7VzwxZ22buHhAEVG5ztoEcZZMLov530mmccaqA1GghZArjQV46fuc8kUqhhHw=="
},
"node_modules/d3-color": {
"version": "3.1.0",
"resolved": "https://registry.npmmirror.com/d3-color/-/d3-color-3.1.0.tgz",
"integrity": "sha512-zg/chbXyeBtMQ1LbD/WSoW2DpC3I0mpmPdW+ynRTj/x2DAWYrIY7qeZIHidozwV24m4iavr15lNwIwLxRmOxhA==",
"engines": {
"node": ">=12"
}
},
"node_modules/d3-ease": {
"version": "3.0.1",
"resolved": "https://registry.npmmirror.com/d3-ease/-/d3-ease-3.0.1.tgz",
"integrity": "sha512-wR/XK3D3XcLIZwpbvQwQ5fK+8Ykds1ip7A2Txe0yxncXSdq1L9skcG7blcedkOX+ZcgxGAmLX1FrRGbADwzi0w==",
"engines": {
"node": ">=12"
}
},
"node_modules/d3-format": {
"version": "3.1.0",
"resolved": "https://registry.npmmirror.com/d3-format/-/d3-format-3.1.0.tgz",
"integrity": "sha512-YyUI6AEuY/Wpt8KWLgZHsIU86atmikuoOmCfommt0LYHiQSPjvX2AcFc38PX0CBpr2RCyZhjex+NS/LPOv6YqA==",
"engines": {
"node": ">=12"
}
},
"node_modules/d3-interpolate": {
"version": "3.0.1",
"resolved": "https://registry.npmmirror.com/d3-interpolate/-/d3-interpolate-3.0.1.tgz",
"integrity": "sha512-3bYs1rOD33uo8aqJfKP3JWPAibgw8Zm2+L9vBKEHJ2Rg+viTR7o5Mmv5mZcieN+FRYaAOWX5SJATX6k1PWz72g==",
"dependencies": {
"d3-color": "1 - 3"
},
"engines": {
"node": ">=12"
}
},
"node_modules/d3-path": {
"version": "3.1.0",
"resolved": "https://registry.npmmirror.com/d3-path/-/d3-path-3.1.0.tgz",
"integrity": "sha512-p3KP5HCf/bvjBSSKuXid6Zqijx7wIfNW+J/maPs+iwR35at5JCbLUT0LzF1cnjbCHWhqzQTIN2Jpe8pRebIEFQ==",
"engines": {
"node": ">=12"
}
},
"node_modules/d3-polygon": {
"version": "1.0.6",
"resolved": "https://registry.npmmirror.com/d3-polygon/-/d3-polygon-1.0.6.tgz",
"integrity": "sha512-k+RF7WvI08PC8reEoXa/w2nSg5AUMTi+peBD9cmFc+0ixHfbs4QmxxkarVal1IkVkgxVuk9JSHhJURHiyHKAuQ=="
},
"node_modules/d3-scale": {
"version": "4.0.2",
"resolved": "https://registry.npmmirror.com/d3-scale/-/d3-scale-4.0.2.tgz",
"integrity": "sha512-GZW464g1SH7ag3Y7hXjf8RoUuAFIqklOAq3MRl4OaWabTFJY9PN/E1YklhXLh+OQ3fM9yS2nOkCoS+WLZ6kvxQ==",
"dependencies": {
"d3-array": "2.10.0 - 3",
"d3-format": "1 - 3",
"d3-interpolate": "1.2.0 - 3",
"d3-time": "2.1.1 - 3",
"d3-time-format": "2 - 4"
},
"engines": {
"node": ">=12"
}
},
"node_modules/d3-scale/node_modules/d3-array": {
"version": "3.2.4",
"resolved": "https://registry.npmmirror.com/d3-array/-/d3-array-3.2.4.tgz",
"integrity": "sha512-tdQAmyA18i4J7wprpYq8ClcxZy3SC31QMeByyCFyRt7BVHdREQZ5lpzoe5mFEYZUWe+oq8HBvk9JjpibyEV4Jg==",
"dependencies": {
"internmap": "1 - 2"
},
"engines": {
"node": ">=12"
}
},
"node_modules/d3-shape": {
"version": "3.2.0",
"resolved": "https://registry.npmmirror.com/d3-shape/-/d3-shape-3.2.0.tgz",
"integrity": "sha512-SaLBuwGm3MOViRq2ABk3eLoxwZELpH6zhl3FbAoJ7Vm1gofKx6El1Ib5z23NUEhF9AsGl7y+dzLe5Cw2AArGTA==",
"dependencies": {
"d3-path": "^3.1.0"
},
"engines": {
"node": ">=12"
}
},
"node_modules/d3-time": {
"version": "3.1.0",
"resolved": "https://registry.npmmirror.com/d3-time/-/d3-time-3.1.0.tgz",
"integrity": "sha512-VqKjzBLejbSMT4IgbmVgDjpkYrNWUYJnbCGo874u7MMKIWsILRX+OpX/gTk8MqjpT1A/c6HY2dCA77ZN0lkQ2Q==",
"dependencies": {
"d3-array": "2 - 3"
},
"engines": {
"node": ">=12"
}
},
"node_modules/d3-time-format": {
"version": "4.1.0",
"resolved": "https://registry.npmmirror.com/d3-time-format/-/d3-time-format-4.1.0.tgz",
"integrity": "sha512-dJxPBlzC7NugB2PDLwo9Q8JiTR3M3e4/XANkreKSUxF8vvXKqm1Yfq4Q5dl8budlunRVlUUaDUgFt7eA8D6NLg==",
"dependencies": {
"d3-time": "1 - 3"
},
"engines": {
"node": ">=12"
}
},
"node_modules/d3-time/node_modules/d3-array": {
"version": "3.2.4",
"resolved": "https://registry.npmmirror.com/d3-array/-/d3-array-3.2.4.tgz",
"integrity": "sha512-tdQAmyA18i4J7wprpYq8ClcxZy3SC31QMeByyCFyRt7BVHdREQZ5lpzoe5mFEYZUWe+oq8HBvk9JjpibyEV4Jg==",
"dependencies": {
"internmap": "1 - 2"
},
"engines": {
"node": ">=12"
}
},
"node_modules/d3-timer": {
"version": "3.0.1",
"resolved": "https://registry.npmmirror.com/d3-timer/-/d3-timer-3.0.1.tgz",
"integrity": "sha512-ndfJ/JxxMd3nw31uyKoY2naivF+r29V+Lc0svZxe1JvvIRmi8hUsrMvdOwgS1o6uBHmiz91geQ0ylPP0aj1VUA==",
"engines": {
"node": ">=12"
}
},
"node_modules/data-uri-to-buffer": {
"version": "4.0.1",
"resolved": "https://registry.npmmirror.com/data-uri-to-buffer/-/data-uri-to-buffer-4.0.1.tgz",
@ -6705,6 +6900,11 @@
"node": ">=0.10.0"
}
},
"node_modules/decimal.js-light": {
"version": "2.5.1",
"resolved": "https://registry.npmmirror.com/decimal.js-light/-/decimal.js-light-2.5.1.tgz",
"integrity": "sha512-qIMFpTMZmny+MMIitAB6D7iVPEorVw6YQRWkvarTkT4tBeSLLiHzcwj6q0MmYSFCiVpiqPJTJEYIrpcPzVEIvg=="
},
"node_modules/decode-named-character-reference": {
"version": "1.0.2",
"resolved": "https://registry.npmmirror.com/decode-named-character-reference/-/decode-named-character-reference-1.0.2.tgz",
@ -7032,6 +7232,15 @@
"utila": "~0.4"
}
},
"node_modules/dom-helpers": {
"version": "5.2.1",
"resolved": "https://registry.npmmirror.com/dom-helpers/-/dom-helpers-5.2.1.tgz",
"integrity": "sha512-nRCa7CK3VTrM2NmGkIy4cbK7IZlgBE/PYMn55rrXefr5xXDP0LdtfPnblFDoVdcAfslJ7or6iqAUnx0CCGIWQA==",
"dependencies": {
"@babel/runtime": "^7.8.7",
"csstype": "^3.0.2"
}
},
"node_modules/dom-serializer": {
"version": "1.4.1",
"resolved": "https://registry.npmmirror.com/dom-serializer/-/dom-serializer-1.4.1.tgz",
@ -8151,6 +8360,11 @@
"es5-ext": "~0.10.14"
}
},
"node_modules/eventemitter3": {
"version": "4.0.7",
"resolved": "https://registry.npmmirror.com/eventemitter3/-/eventemitter3-4.0.7.tgz",
"integrity": "sha512-8guHBZCwKnFhYdHr2ysuRWErTwhoN2X8XELRlrRwpmfeY2jjuUN4taQMsULKUVo1K4DvZl+0pgfyoysHxvmvEw=="
},
"node_modules/events": {
"version": "3.3.0",
"resolved": "https://registry.npmmirror.com/events/-/events-3.3.0.tgz",
@ -8356,6 +8570,14 @@
"resolved": "https://registry.npmmirror.com/fast-deep-equal/-/fast-deep-equal-3.1.3.tgz",
"integrity": "sha512-f3qQ9oQy9j2AhBe/H9VC91wLmKBCCU/gDOnKNAYG5hswO7BLKj09Hc5HYNz9cGI++xlpDCIgDaitVs03ATR84Q=="
},
"node_modules/fast-equals": {
"version": "5.0.1",
"resolved": "https://registry.npmmirror.com/fast-equals/-/fast-equals-5.0.1.tgz",
"integrity": "sha512-WF1Wi8PwwSY7/6Kx0vKXtw8RwuSGoM1bvDaJbu7MxDlR1vovZjIAKrnzyrThgAjm6JDTu0fVgWXDlMGspodfoQ==",
"engines": {
"node": ">=6.0.0"
}
},
"node_modules/fast-glob": {
"version": "3.2.12",
"resolved": "https://registry.npmmirror.com/fast-glob/-/fast-glob-3.2.12.tgz",
@ -9693,6 +9915,14 @@
"node": ">= 0.4"
}
},
"node_modules/internmap": {
"version": "2.0.3",
"resolved": "https://registry.npmmirror.com/internmap/-/internmap-2.0.3.tgz",
"integrity": "sha512-5Hh7Y1wQbvY5ooGgPbDaL5iYLAPzMTUrjMulskHLH6wnv/A+1q5rgEaiuqEjB+oxGXIVZs1FF+R/KPN3ZSQYYg==",
"engines": {
"node": ">=12"
}
},
"node_modules/intersection-observer": {
"version": "0.12.2",
"resolved": "https://registry.npmmirror.com/intersection-observer/-/intersection-observer-0.12.2.tgz",
@ -11925,6 +12155,7 @@
"version": "2.30.1",
"resolved": "https://registry.npmmirror.com/moment/-/moment-2.30.1.tgz",
"integrity": "sha512-uEmtNhbDOrWPFS+hdjFCBfy9f2YoyzRpwcl+DqpC6taX21FzsTLQVbMV/W7PzNSX6x/bhC1zA3c2UQ5NzH6how==",
"devOptional": true,
"engines": {
"node": "*"
}
@ -14356,6 +14587,18 @@
"react-dom": "18.2.0"
}
},
"node_modules/react-copy-to-clipboard": {
"version": "5.1.0",
"resolved": "https://registry.npmmirror.com/react-copy-to-clipboard/-/react-copy-to-clipboard-5.1.0.tgz",
"integrity": "sha512-k61RsNgAayIJNoy9yDsYzDe/yAZAzEbEgcz3DZMhF686LEyukcE1hzurxe85JandPUG+yTfGVFzuEw3xt8WP/A==",
"dependencies": {
"copy-to-clipboard": "^3.3.1",
"prop-types": "^15.8.1"
},
"peerDependencies": {
"react": "^15.3.0 || 16 || 17 || 18"
}
},
"node_modules/react-dev-inspector": {
"version": "2.0.1",
"resolved": "https://registry.npmmirror.com/react-dev-inspector/-/react-dev-inspector-2.0.1.tgz",
@ -14934,6 +15177,20 @@
"react": ">=15"
}
},
"node_modules/react-smooth": {
"version": "4.0.1",
"resolved": "https://registry.npmmirror.com/react-smooth/-/react-smooth-4.0.1.tgz",
"integrity": "sha512-OE4hm7XqR0jNOq3Qmk9mFLyd6p2+j6bvbPJ7qlB7+oo0eNcL2l7WQzG6MBnT3EXY6xzkLMUBec3AfewJdA0J8w==",
"dependencies": {
"fast-equals": "^5.0.1",
"prop-types": "^15.8.1",
"react-transition-group": "^4.4.5"
},
"peerDependencies": {
"react": "^16.8.0 || ^17.0.0 || ^18.0.0",
"react-dom": "^16.8.0 || ^17.0.0 || ^18.0.0"
}
},
"node_modules/react-spinkit": {
"version": "3.0.0",
"resolved": "https://registry.npmmirror.com/react-spinkit/-/react-spinkit-3.0.0.tgz",
@ -14968,6 +15225,21 @@
"react": ">= 0.14.0"
}
},
"node_modules/react-transition-group": {
"version": "4.4.5",
"resolved": "https://registry.npmmirror.com/react-transition-group/-/react-transition-group-4.4.5.tgz",
"integrity": "sha512-pZcd1MCJoiKiBR2NRxeCRg13uCXbydPnmB4EOeRrY7480qNWO8IIgQG6zlDkm6uRMsURXPuKq0GWtiM59a5Q6g==",
"dependencies": {
"@babel/runtime": "^7.5.5",
"dom-helpers": "^5.0.1",
"loose-envify": "^1.4.0",
"prop-types": "^15.6.2"
},
"peerDependencies": {
"react": ">=16.6.0",
"react-dom": ">=16.6.0"
}
},
"node_modules/reactcss": {
"version": "1.2.3",
"resolved": "https://registry.npmmirror.com/reactcss/-/reactcss-1.2.3.tgz",
@ -15145,6 +15417,41 @@
"node": ">= 12.13.0"
}
},
"node_modules/recharts": {
"version": "2.12.4",
"resolved": "https://registry.npmmirror.com/recharts/-/recharts-2.12.4.tgz",
"integrity": "sha512-dM4skmk4fDKEDjL9MNunxv6zcTxePGVEzRnLDXALRpfJ85JoQ0P0APJ/CoJlmnQI0gPjBlOkjzrwrfQrRST3KA==",
"dependencies": {
"clsx": "^2.0.0",
"eventemitter3": "^4.0.1",
"lodash": "^4.17.21",
"react-is": "^16.10.2",
"react-smooth": "^4.0.0",
"recharts-scale": "^0.4.4",
"tiny-invariant": "^1.3.1",
"victory-vendor": "^36.6.8"
},
"engines": {
"node": ">=14"
},
"peerDependencies": {
"react": "^16.0.0 || ^17.0.0 || ^18.0.0",
"react-dom": "^16.0.0 || ^17.0.0 || ^18.0.0"
}
},
"node_modules/recharts-scale": {
"version": "0.4.5",
"resolved": "https://registry.npmmirror.com/recharts-scale/-/recharts-scale-0.4.5.tgz",
"integrity": "sha512-kivNFO+0OcUNu7jQquLXAxz1FIwZj8nrj+YkOKc5694NbjCvcT6aSZiIzNzd2Kul4o4rTto8QVR9lMNtxD4G1w==",
"dependencies": {
"decimal.js-light": "^2.4.1"
}
},
"node_modules/recharts/node_modules/react-is": {
"version": "16.13.1",
"resolved": "https://registry.npmmirror.com/react-is/-/react-is-16.13.1.tgz",
"integrity": "sha512-24e6ynE2H+OKt4kqsOvNd8kBpV65zoxbA4BVsEOB3ARVWQki/DHzaUoC5KuON/BiccDaCCTZBuOcfZs70kR8bQ=="
},
"node_modules/recursive-readdir": {
"version": "2.2.3",
"resolved": "https://registry.npmmirror.com/recursive-readdir/-/recursive-readdir-2.2.3.tgz",
@ -17000,9 +17307,7 @@
"node_modules/tiny-invariant": {
"version": "1.3.1",
"resolved": "https://registry.npmmirror.com/tiny-invariant/-/tiny-invariant-1.3.1.tgz",
"integrity": "sha512-AD5ih2NlSssTCwsMznbvwMZpJ1cbhkGd2uueNxzv2jDlEeZdU04JQfRnggJQ8DrcVBGjAsCKwFBbDlVNtEMlzw==",
"dev": true,
"peer": true
"integrity": "sha512-AD5ih2NlSssTCwsMznbvwMZpJ1cbhkGd2uueNxzv2jDlEeZdU04JQfRnggJQ8DrcVBGjAsCKwFBbDlVNtEMlzw=="
},
"node_modules/tiny-warning": {
"version": "1.0.3",
@ -18221,6 +18526,38 @@
"unist-util-stringify-position": "^4.0.0"
}
},
"node_modules/victory-vendor": {
"version": "36.9.2",
"resolved": "https://registry.npmmirror.com/victory-vendor/-/victory-vendor-36.9.2.tgz",
"integrity": "sha512-PnpQQMuxlwYdocC8fIJqVXvkeViHYzotI+NJrCuav0ZYFoq912ZHBk3mCeuj+5/VpodOjPe1z0Fk2ihgzlXqjQ==",
"dependencies": {
"@types/d3-array": "^3.0.3",
"@types/d3-ease": "^3.0.0",
"@types/d3-interpolate": "^3.0.1",
"@types/d3-scale": "^4.0.2",
"@types/d3-shape": "^3.1.0",
"@types/d3-time": "^3.0.0",
"@types/d3-timer": "^3.0.0",
"d3-array": "^3.1.6",
"d3-ease": "^3.0.1",
"d3-interpolate": "^3.0.1",
"d3-scale": "^4.0.2",
"d3-shape": "^3.1.0",
"d3-time": "^3.0.0",
"d3-timer": "^3.0.1"
}
},
"node_modules/victory-vendor/node_modules/d3-array": {
"version": "3.2.4",
"resolved": "https://registry.npmmirror.com/d3-array/-/d3-array-3.2.4.tgz",
"integrity": "sha512-tdQAmyA18i4J7wprpYq8ClcxZy3SC31QMeByyCFyRt7BVHdREQZ5lpzoe5mFEYZUWe+oq8HBvk9JjpibyEV4Jg==",
"dependencies": {
"internmap": "1 - 2"
},
"engines": {
"node": ">=12"
}
},
"node_modules/vite": {
"version": "4.3.1",
"resolved": "https://registry.npmmirror.com/vite/-/vite-4.3.1.tgz",

View File

@ -17,19 +17,21 @@
"antd": "^5.12.7",
"axios": "^1.6.3",
"classnames": "^2.5.1",
"dayjs": "^1.11.10",
"i18next": "^23.7.16",
"js-base64": "^3.7.5",
"jsencrypt": "^3.3.2",
"lodash": "^4.17.21",
"moment": "^2.30.1",
"rc-tween-one": "^3.0.6",
"react-chat-elements": "^12.0.13",
"react-copy-to-clipboard": "^5.1.0",
"react-i18next": "^14.0.0",
"react-infinite-scroll-component": "^6.1.0",
"react-markdown": "^9.0.1",
"react-pdf-highlighter": "^6.1.0",
"react-string-replace": "^1.1.1",
"react-syntax-highlighter": "^15.5.0",
"recharts": "^2.12.4",
"remark-gfm": "^4.0.0",
"umi": "^4.0.90",
"umi-request": "^1.4.0",
@ -40,6 +42,7 @@
"@react-dev-inspector/umi4-plugin": "^2.0.1",
"@types/lodash": "^4.14.202",
"@types/react": "^18.0.33",
"@types/react-copy-to-clipboard": "^5.0.7",
"@types/react-dom": "^18.0.11",
"@types/react-syntax-highlighter": "^15.5.11",
"@types/uuid": "^9.0.8",

View File

@ -6,6 +6,21 @@ import zh_HK from 'antd/locale/zh_HK';
import React, { ReactNode, useEffect, useState } from 'react';
import storage from './utils/authorizationUtil';
import dayjs from 'dayjs';
import advancedFormat from 'dayjs/plugin/advancedFormat';
import customParseFormat from 'dayjs/plugin/customParseFormat';
import localeData from 'dayjs/plugin/localeData';
import weekday from 'dayjs/plugin/weekday';
import weekOfYear from 'dayjs/plugin/weekOfYear';
import weekYear from 'dayjs/plugin/weekYear';
dayjs.extend(customParseFormat);
dayjs.extend(advancedFormat);
dayjs.extend(weekday);
dayjs.extend(localeData);
dayjs.extend(weekOfYear);
dayjs.extend(weekYear);
const AntLanguageMap = {
en: enUS,
zh: zhCN,

[Eight binary image assets changed; their diffs are suppressed. Sizes before → after: 545 KiB → 406 KiB, 390 KiB → 388 KiB, 321 KiB → 467 KiB, 2.0 MiB → 1.1 MiB, 311 KiB → 966 KiB, 599 KiB → 515 KiB, 872 KiB → 196 KiB, 366 KiB → 296 KiB.]

View File

@ -0,0 +1,18 @@
<svg width="24" height="18" viewBox="0 0 24 18" fill="none" xmlns="http://www.w3.org/2000/svg">
<path
d="M1.32202e-08 2.54731L21.5 2.54731C22.8807 2.54731 24 3.4977 24 4.67006L24 15.2838C24 16.4562 22.8807 17.4066 21.5 17.4066L12 17.4066L2.5 17.4066C1.11929 17.4066 8.54054e-08 16.4562 7.9321e-08 15.2838L1.32202e-08 2.54731Z"
fill="#FBBC1A" />
<path
d="M2.97454e-08 5.73144L7.49143e-08 14.4347C8.09987e-08 15.6071 1.11929 16.5575 2.5 16.5575L21.5 16.5575C22.8807 16.5575 24 15.6071 24 14.4347L24 5.51916C24 4.3468 22.8807 3.39641 21.5 3.39641L11 3.39641L11 4.45779C11 5.16121 10.3284 5.73144 9.5 5.73144L2.97454e-08 5.73144Z"
fill="url(#paint0_linear_2323_8307)" />
<path
d="M8.81345e-09 1.6982C3.94591e-09 0.760312 0.89543 -4.64716e-09 2 -1.03797e-08L9 -4.67088e-08C10.1046 -5.24413e-08 11 0.760312 11 1.6982L11 2.54731L1.32202e-08 2.54731L8.81345e-09 1.6982Z"
fill="#FBBC1A" />
<defs>
<linearGradient id="paint0_linear_2323_8307" x1="0" y1="0" x2="28.8004" y2="20.3231"
gradientUnits="userSpaceOnUse">
<stop stop-color="#FFE69C" />
<stop offset="1" stop-color="#FFC937" />
</linearGradient>
</defs>
</svg>

[New SVG icon asset: 1.2 KiB]

web/src/base.ts (new file, 48 lines)

@ -0,0 +1,48 @@
import isObject from 'lodash/isObject';
import { DvaModel } from 'umi';
import { BaseState } from './interfaces/common';
type State = Record<string, any>;
type DvaModelKey<T> = keyof DvaModel<T>;
export const modelExtend = <T>(
baseModel: Partial<DvaModel<any>>,
extendModel: DvaModel<any>,
): DvaModel<T> => {
return Object.keys(extendModel).reduce<DvaModel<T>>((pre, cur) => {
const baseValue = baseModel[cur as DvaModelKey<State>];
const value = extendModel[cur as DvaModelKey<State>];
if (isObject(value) && isObject(baseValue) && typeof value !== 'string') {
const key = cur as Exclude<DvaModelKey<State>, 'namespace'>;
pre[key] = {
...baseValue,
...value,
} as any;
} else {
pre[cur as DvaModelKey<State>] = value as any;
}
return pre;
}, {} as DvaModel<T>);
};
export const paginationModel: Partial<DvaModel<BaseState>> = {
state: {
searchString: '',
pagination: {
total: 0,
current: 1,
pageSize: 10,
},
},
reducers: {
setSearchString(state, { payload }) {
return { ...state, searchString: payload };
},
setPagination(state, { payload }) {
return { ...state, pagination: { ...state.pagination, ...payload } };
},
},
};

View File

@ -0,0 +1,27 @@
import { useTranslate } from '@/hooks/commonHooks';
import { CheckOutlined, CopyOutlined } from '@ant-design/icons';
import { Tooltip } from 'antd';
import { useState } from 'react';
import { CopyToClipboard as Clipboard, Props } from 'react-copy-to-clipboard';
const CopyToClipboard = ({ text }: Props) => {
const [copied, setCopied] = useState(false);
const { t } = useTranslate('common');
const handleCopy = () => {
setCopied(true);
setTimeout(() => {
setCopied(false);
}, 2000);
};
return (
<Tooltip title={copied ? t('copied') : t('copy')}>
<Clipboard text={text} onCopy={handleCopy}>
{copied ? <CheckOutlined /> : <CopyOutlined />}
</Clipboard>
</Tooltip>
);
};
export default CopyToClipboard;

View File

@ -0,0 +1,36 @@
import Markdown from 'react-markdown';
import SyntaxHighlighter from 'react-syntax-highlighter';
import remarkGfm from 'remark-gfm';
const HightLightMarkdown = ({
children,
}: {
children: string | null | undefined;
}) => {
return (
<Markdown
remarkPlugins={[remarkGfm]}
components={
{
code(props: any) {
const { children, className, node, ...rest } = props;
const match = /language-(\w+)/.exec(className || '');
return match ? (
<SyntaxHighlighter {...rest} PreTag="div" language={match[1]}>
{String(children).replace(/\n$/, '')}
</SyntaxHighlighter>
) : (
<code {...rest} className={className}>
{children}
</code>
);
},
} as any
}
>
{children}
</Markdown>
);
};
export default HightLightMarkdown;

View File

@ -0,0 +1,89 @@
import {
CartesianGrid,
Legend,
Line,
LineChart,
ResponsiveContainer,
Tooltip,
XAxis,
YAxis,
} from 'recharts';
import { CategoricalChartProps } from 'recharts/types/chart/generateCategoricalChart';
const data = [
{
name: 'Page A',
uv: 4000,
pv: 2400,
},
{
name: 'Page B',
uv: 3000,
pv: 1398,
},
{
name: 'Page C',
uv: 2000,
pv: 9800,
},
{
name: 'Page D',
uv: 2780,
pv: 3908,
},
{
name: 'Page E',
uv: 1890,
pv: 4800,
},
{
name: 'Page F',
uv: 2390,
pv: 3800,
},
{
name: 'Page G',
uv: 3490,
pv: 4300,
},
];
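// Note: this sample data is unused; the component renders the `data` prop, which shadows it.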
interface IProps extends CategoricalChartProps {
data?: Array<{ xAxis: string; yAxis: number }>;
showLegend?: boolean;
}
const RagLineChart = ({ data, showLegend = false }: IProps) => {
return (
<ResponsiveContainer width="100%" height="100%">
<LineChart
// width={500}
// height={300}
data={data}
margin={
{
// top: 5,
// right: 30,
// left: 20,
// bottom: 10,
}
}
>
<CartesianGrid strokeDasharray="3 3" />
<XAxis dataKey="xAxis" />
<YAxis />
<Tooltip />
{showLegend && <Legend />}
<Line
type="monotone"
dataKey="yAxis"
stroke="#8884d8"
activeDot={{ r: 8 }}
/>
{/* <Line type="monotone" dataKey="uv" stroke="#82ca9d" /> */}
</LineChart>
</ResponsiveContainer>
);
};
export default RagLineChart;

View File

@ -1,6 +1,19 @@
@import url(./inter.less);
html {
height: 100%;
}
body {
font-family: Inter;
margin: 0;
height: 100%;
}
#root {
height: 100%;
}
.ant-app {
height: 100%;
}

View File

@ -1,4 +1,9 @@
import { IConversation, IDialog } from '@/interfaces/database/chat';
import {
IConversation,
IDialog,
IStats,
IToken,
} from '@/interfaces/database/chat';
import { useCallback } from 'react';
import { useDispatch, useSelector } from 'umi';
@ -164,3 +169,134 @@ export const useCompleteConversation = () => {
return completeConversation;
};
// #region API provided for external calls
export const useCreateToken = (dialogId: string) => {
const dispatch = useDispatch();
const createToken = useCallback(() => {
return dispatch<any>({
type: 'chatModel/createToken',
payload: { dialogId },
});
}, [dispatch, dialogId]);
return createToken;
};
export const useListToken = () => {
const dispatch = useDispatch();
const listToken = useCallback(
(dialogId: string) => {
return dispatch<any>({
type: 'chatModel/listToken',
payload: { dialogId },
});
},
[dispatch],
);
return listToken;
};
export const useSelectTokenList = () => {
const tokenList: IToken[] = useSelector(
(state: any) => state.chatModel.tokenList,
);
return tokenList;
};
export const useRemoveToken = () => {
const dispatch = useDispatch();
const removeToken = useCallback(
(payload: { tenantId: string; dialogId: string; tokens: string[] }) => {
return dispatch<any>({
type: 'chatModel/removeToken',
payload: payload,
});
},
[dispatch],
);
return removeToken;
};
export const useFetchStats = () => {
const dispatch = useDispatch();
const fetchStats = useCallback(
(payload: any) => {
return dispatch<any>({
type: 'chatModel/getStats',
payload,
});
},
[dispatch],
);
return fetchStats;
};
export const useSelectStats = () => {
const stats: IStats = useSelector((state: any) => state.chatModel.stats);
return stats;
};
//#endregion
//#region shared chat
export const useCreateSharedConversation = () => {
const dispatch = useDispatch();
const createSharedConversation = useCallback(
(userId?: string) => {
return dispatch<any>({
type: 'chatModel/createExternalConversation',
payload: { userId },
});
},
[dispatch],
);
return createSharedConversation;
};
export const useFetchSharedConversation = () => {
const dispatch = useDispatch();
const fetchSharedConversation = useCallback(
(conversationId: string) => {
return dispatch<any>({
type: 'chatModel/getExternalConversation',
payload: conversationId,
});
},
[dispatch],
);
return fetchSharedConversation;
};
export const useCompleteSharedConversation = () => {
const dispatch = useDispatch();
const completeSharedConversation = useCallback(
(payload: any) => {
return dispatch<any>({
type: 'chatModel/completeExternalConversation',
payload: payload,
});
},
[dispatch],
);
return completeSharedConversation;
};
//#endregion

View File

@ -0,0 +1,144 @@
import {
IConnectRequestBody,
IFileListRequestBody,
} from '@/interfaces/request/file-manager';
import { UploadFile } from 'antd';
import { useCallback } from 'react';
import { useDispatch, useSelector } from 'umi';
export const useFetchFileList = () => {
const dispatch = useDispatch();
const fetchFileList = useCallback(
(payload: IFileListRequestBody) => {
return dispatch<any>({
type: 'fileManager/listFile',
payload,
});
},
[dispatch],
);
return fetchFileList;
};
export const useRemoveFile = () => {
const dispatch = useDispatch();
const removeFile = useCallback(
(fileIds: string[], parentId: string) => {
return dispatch<any>({
type: 'fileManager/removeFile',
payload: { fileIds, parentId },
});
},
[dispatch],
);
return removeFile;
};
export const useRenameFile = () => {
const dispatch = useDispatch();
const renameFile = useCallback(
(fileId: string, name: string, parentId: string) => {
return dispatch<any>({
type: 'fileManager/renameFile',
payload: { fileId, name, parentId },
});
},
[dispatch],
);
return renameFile;
};
export const useFetchParentFolderList = () => {
const dispatch = useDispatch();
const fetchParentFolderList = useCallback(
(fileId: string) => {
return dispatch<any>({
type: 'fileManager/getAllParentFolder',
payload: { fileId },
});
},
[dispatch],
);
return fetchParentFolderList;
};
export const useCreateFolder = () => {
const dispatch = useDispatch();
const createFolder = useCallback(
(parentId: string, name: string) => {
return dispatch<any>({
type: 'fileManager/createFolder',
payload: { parentId, name, type: 'folder' },
});
},
[dispatch],
);
return createFolder;
};
export const useSelectFileList = () => {
const fileList = useSelector((state) => state.fileManager.fileList);
return fileList;
};
export const useSelectParentFolderList = () => {
const parentFolderList = useSelector(
(state) => state.fileManager.parentFolderList,
);
return parentFolderList.toReversed();
};
export const useUploadFile = () => {
const dispatch = useDispatch();
const uploadFile = useCallback(
(fileList: UploadFile[], parentId: string) => {
try {
return dispatch<any>({
type: 'fileManager/uploadFile',
payload: {
file: fileList,
parentId,
path: fileList.map((file) => (file as any).webkitRelativePath),
},
});
} catch (errorInfo) {
console.log('Failed:', errorInfo);
}
},
[dispatch],
);
return uploadFile;
};
export const useConnectToKnowledge = () => {
const dispatch = useDispatch();
const uploadFile = useCallback(
(payload: IConnectRequestBody) => {
try {
return dispatch<any>({
type: 'fileManager/connectFileToKnowledge',
payload,
});
} catch (errorInfo) {
console.log('Failed:', errorInfo);
}
},
[dispatch],
);
return uploadFile;
};

View File

@ -127,13 +127,13 @@ export const useFetchKnowledgeBaseConfiguration = () => {
export const useFetchKnowledgeList = (
shouldFilterListWithoutDocument: boolean = false,
): { list: IKnowledge[]; loading: boolean } => {
) => {
const dispatch = useDispatch();
const loading = useOneNamespaceEffectsLoading('knowledgeModel', ['getList']);
const knowledgeModel = useSelector((state: any) => state.knowledgeModel);
const { data = [] } = knowledgeModel;
const list = useMemo(() => {
const list: IKnowledge[] = useMemo(() => {
return shouldFilterListWithoutDocument
? data.filter((x: IKnowledge) => x.chunk_num > 0)
: data;
@ -149,7 +149,7 @@ export const useFetchKnowledgeList = (
fetchList();
}, [fetchList]);
-return { list, loading };
+return { list, loading, fetchList };
};
export const useSelectFileThumbnails = () => {

View File

@ -1,9 +1,12 @@
import { LanguageTranslationMap } from '@/constants/common';
import { Pagination } from '@/interfaces/common';
import { IKnowledgeFile } from '@/interfaces/database/knowledge';
import { IChangeParserConfigRequestBody } from '@/interfaces/request/document';
import { useCallback, useState } from 'react';
import { PaginationProps } from 'antd';
import { useCallback, useMemo, useState } from 'react';
import { useTranslation } from 'react-i18next';
import { useSetModalState } from './commonHooks';
import { useDispatch } from 'umi';
import { useSetModalState, useTranslate } from './commonHooks';
import { useSetDocumentParser } from './documentHooks';
import { useOneNamespaceEffectsLoading } from './storeHooks';
import { useSaveSetting } from './userSettingHook';
@ -62,3 +65,51 @@ export const useChangeLanguage = () => {
return changeLanguage;
};
export const useGetPagination = (
total: number,
page: number,
pageSize: number,
onPageChange: PaginationProps['onChange'],
) => {
const { t } = useTranslate('common');
const pagination: PaginationProps = useMemo(() => {
return {
showQuickJumper: true,
total,
showSizeChanger: true,
current: page,
pageSize: pageSize,
pageSizeOptions: [1, 2, 10, 20, 50, 100],
onChange: onPageChange,
showTotal: (total) => `${t('total')} ${total}`,
};
}, [t, onPageChange, page, pageSize, total]);
return {
pagination,
};
};
export const useSetPagination = (namespace: string) => {
const dispatch = useDispatch();
const setPagination = useCallback(
(pageNumber = 1, pageSize?: number) => {
const pagination: Pagination = {
current: pageNumber,
} as Pagination;
if (pageSize) {
pagination.pageSize = pageSize;
}
dispatch({
type: `${namespace}/setPagination`,
payload: pagination,
});
},
[dispatch, namespace],
);
return setPagination;
};

View File

@ -1,6 +1,7 @@
export interface Pagination {
current: number;
pageSize: number;
total: number;
}
export interface BaseState {

View File

@ -91,3 +91,21 @@ export interface Docagg {
// term_similarity: number;
// vector_similarity: number;
// }
export interface IToken {
create_date: string;
create_time: number;
tenant_id: string;
token: string;
update_date?: any;
update_time?: any;
}
export interface IStats {
pv: [string, number][];
uv: [string, number][];
speed: [string, number][];
tokens: [string, number][];
round: [string, number][];
thumb_up: [string, number][];
}

View File

@ -0,0 +1,30 @@
export interface IFile {
create_date: string;
create_time: number;
created_by: string;
id: string;
kbs_info: { kb_id: string; kb_name: string }[];
location: string;
name: string;
parent_id: string;
size: number;
tenant_id: string;
type: string;
update_date: string;
update_time: number;
}
export interface IFolder {
create_date: string;
create_time: number;
created_by: string;
id: string;
location: string;
name: string;
parent_id: string;
size: number;
tenant_id: string;
type: string;
update_date: string;
update_time: number;
}

View File

@ -0,0 +1,7 @@
export interface IPaginationRequestBody {
keywords?: string;
page?: number;
page_size?: number;
orderby?: string; // name | create | doc_num | create_time | update_time; default: create_time
desc?: string;
}

View File

@ -0,0 +1,14 @@
import { IPaginationRequestBody } from './base';
export interface IFileListRequestBody extends IPaginationRequestBody {
parent_id?: string; // folder id
}
interface BaseRequestBody {
parentId: string;
}
export interface IConnectRequestBody extends BaseRequestBody {
fileIds: string[];
kbIds: string[];
}

Some files were not shown because too many files have changed in this diff.