Compare commits

...

175 Commits

Author SHA1 Message Date
cfdccebb17 Feat: Fixed an issue where modifying fields in the agent operator caused the loss of structured data. #10427 (#11388)
### What problem does this PR solve?

Feat: Fixed an issue where modifying fields in the agent operator caused
the loss of structured data. #10427

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-11-19 20:11:53 +08:00
980a883033 Docs: minor (#11385)
### What problem does this PR solve?

### Type of change

- [x] Documentation Update
2025-11-19 19:41:21 +08:00
02d429f0ca Doc: Optimize README (#11386)
### What problem does this PR solve?

Users currently can’t view `git checkout v0.22.1` directly. They need to
scroll the code block all the way to the right to see it.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-19 19:40:55 +08:00
9c24d5d44a Fix some multilingual issues (#11382)
### What problem does this PR solve?

Fix some multilingual issues

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-19 19:14:43 +08:00
0cc5d7a8a6 Feat: If a query variable in a data manipulation operator is deleted, a warning message should be displayed to the user. #10427 #11255 (#11384)
### What problem does this PR solve?

Feat: If a query variable in a data manipulation operator is deleted, a
warning message should be displayed to the user. #10427 #11255

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-11-19 19:10:57 +08:00
c43bf1dcf5 Fix: refine error msg. (#11380)
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-19 19:10:45 +08:00
f76b8279dd Doc: Added v0.22.1 release notes (#11383)
### What problem does this PR solve?


### Type of change


- [x] Documentation Update
2025-11-19 18:40:06 +08:00
db5ec89dc5 Feat: The key for the begin operator can only contain alphanumeric characters and underscores. #10427 (#11377)
### What problem does this PR solve?

Feat: The key for the begin operator can only contain alphanumeric
characters and underscores. #10427
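
As a minimal sketch, a rule like this could be enforced with a regex; the function name and pattern below are illustrative, not RAGFlow's actual implementation.

```python
import re

# Hypothetical validator for the rule above: begin-operator keys may
# contain only alphanumeric characters and underscores.
KEY_PATTERN = re.compile(r"^[A-Za-z0-9_]+$")

def is_valid_begin_key(key: str) -> bool:
    return bool(KEY_PATTERN.match(key))

assert is_valid_begin_key("user_id_1")
assert not is_valid_begin_key("user-id")  # hyphen rejected
```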

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-11-19 16:16:57 +08:00
1c201c4d54 Fix: circular imports issue. (#11374)
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-19 16:13:21 +08:00
ba78d0f0c2 Feat: Structured data will still be stored in outputs for compatibility with older versions. #10427 (#11368)
### What problem does this PR solve?

Feat: Structured data will still be stored in outputs for compatibility
with older versions. #10427

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-11-19 15:15:51 +08:00
add8c63458 Add release notes (#11372)
### What problem does this PR solve?

As title.

### Type of change

- [x] Documentation Update

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-19 14:48:41 +08:00
83661efdaf Update README for supporting Gemini 3 Pro (#11369)
### What problem does this PR solve?

As title

### Type of change

- [x] Documentation Update

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-19 14:16:03 +08:00
971197d595 Feat: Set the outputs type of list operation. #10427 (#11366)
### What problem does this PR solve?

Feat: Set the outputs type of list operation. #10427

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-11-19 13:59:43 +08:00
0884e9a4d9 Fix: bbox not included in mineru output (#11365)
### What problem does this PR solve?

Fix: bbox not included in mineru output #11315

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-19 13:59:32 +08:00
2de42f00b8 Fix: component list operation issue. (#11364)
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-19 13:19:44 +08:00
e8fe580d7a Feat: add Gemini 3 Pro preview (#11361)
### What problem does this PR solve?

Add Gemini 3 Pro preview.

Change `GenerativeModel` to `genai`.
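
A minimal sketch of the client-style call this migrates to, using the `google-genai` SDK; the model id and prompt are placeholders.

```python
from google import genai

# New-style client replacing the legacy GenerativeModel class.
client = genai.Client(api_key="YOUR_API_KEY")
response = client.models.generate_content(
    model="gemini-3-pro-preview",  # placeholder model id
    contents="Summarize RAGFlow in one sentence.",
)
print(response.text)
```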

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-19 13:17:22 +08:00
62505164d5 chore(template): introducing variable aggregator to customer service template (#11352)
### What problem does this PR solve?
Update customer service template

### Type of change
- [x] Other (please describe):
2025-11-19 12:28:06 +08:00
d1dcf3b43c Refactor /stats API (#11363)
### What problem does this PR solve?

Use a single loop for better performance.

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-19 12:27:45 +08:00
f84662d2ee Fix: Fixed an issue where variable aggregation operators could not be connected to other operators. #10427 (#11358)
### What problem does this PR solve?

Fix: Fixed an issue where variable aggregation operators could not be
connected to other operators. #10427

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-11-19 10:29:26 +08:00
1cb6b7f5dd Update version info to v0.22.1 (#11346)
### What problem does this PR solve?

As title

### Type of change

- [x] Other (please describe): Update version info

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-19 09:50:23 +08:00
023f509501 Fix: variable assigner issue. (#11351)
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-19 09:49:40 +08:00
50bc53a1f5 Fix: Modify the personal center style #10703 (#11347)
### What problem does this PR solve?

Fix: Modify the personal center style

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-18 20:07:17 +08:00
8cd4882596 Feat: Display variables in the variable assignment node. #10427 (#11349)
### What problem does this PR solve?

Feat: Display variables in the variable assignment node. #10427

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-11-18 20:07:04 +08:00
35e5fade93 Feat: new component variable assigner (#11050)
### What problem does this PR solve?
issue:
https://github.com/infiniflow/ragflow/issues/10427
change:
new component variable assigner
### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-18 19:14:38 +08:00
4942a23290 Feat: Add a switch to control the display of structured output to the agent form. #10427 (#11344)
### What problem does this PR solve?

Feat: Add a switch to control the display of structured output to the
agent form. #10427

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-11-18 18:58:36 +08:00
d1716d865a Feat: Alter flask to Quart for async API serving. (#11275)
### What problem does this PR solve?

#11277

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-18 17:05:16 +08:00
c2b7c305fa Fix: crop index may go out of range (#11341)
### What problem does this PR solve?

Crop index may go out of range. #11323


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-18 17:01:54 +08:00
341e5904c8 Fix: No results can be found through the API /api/v1/dify/retrieval (#11338)
### What problem does this PR solve?

No results can be found through the API /api/v1/dify/retrieval. #11307 
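
For reference, a hedged example of exercising the endpoint; the field names follow Dify's external-knowledge API convention and may differ slightly from RAGFlow's exact schema.

```python
import requests

resp = requests.post(
    "http://localhost:9380/api/v1/dify/retrieval",  # host/port are placeholders
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "knowledge_id": "YOUR_DATASET_ID",
        "query": "What is RAGFlow?",
        "retrieval_setting": {"top_k": 5, "score_threshold": 0.2},
    },
    timeout=30,
)
print(resp.json())
```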

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-18 15:42:31 +08:00
ded9bf80c5 Fix: limit random sampling range in check_embedding (#11337)
### What problem does this PR solve?
issue:
[#11319](https://github.com/infiniflow/ragflow/issues/11319)
change:
limit random sampling range in check_embedding
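
The shape of the fix, as a minimal sketch (names illustrative): clamp the sample size to the available population so `random.sample()` cannot raise `ValueError`.

```python
import random

def pick_sample_ids(total_chunks: int, k: int = 10) -> list[int]:
    # Clamp k so the sample never exceeds the population size.
    k = min(k, total_chunks)
    return random.sample(range(total_chunks), k)
```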

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-18 15:24:27 +08:00
fea157ba08 Fix: manual parser with mineru (#11336)
### What problem does this PR solve?

Fix: manual parser with mineru #11320
Fix: missing parameter in mineru #11334
Fix: add outlines parameter for pdf parsers

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-18 15:22:52 +08:00
0db00f70b2 Fix: add describe_image_with_prompt for ZHIPU AI (#11317)
### What problem does this PR solve?

Fix: add describe_image_with_prompt for ZHIPU AI  #11289 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-18 13:09:39 +08:00
701761d119 Feat: Fixed the issue where form data assigned by variables was not updated in real time. #10427 (#11333)
### What problem does this PR solve?

Feat: Fixed the issue where form data assigned by variables was not
updated in real time. #10427
### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-11-18 13:07:52 +08:00
2993fc666b Feat: update version to 0.22.1 (#11331)
### What problem does this PR solve?

Update version to 0.22.1

### Type of change

- [x] Documentation Update
2025-11-18 10:49:36 +08:00
8a6d205df0 fix: entrypoint.sh typo for disable datasync command (#11326)
### What problem does this PR solve?

There's a typo in `entrypoint.sh` on line 74: the case statement uses
`--disable-datasyn)` (missing the 'c'), while the usage function and
documentation correctly show `--disable-datasync` (with the 'c'). This
mismatch causes the `--disable-datasync` flag to be unrecognized,
triggering the usage message and causing containers to restart in a loop
when this flag is used.

**Background:**
- Users following the documentation use `--disable-datasync` in their
docker-compose.yml
- The entrypoint script doesn't recognize this flag due to the typo
- The script calls `usage()` and exits, causing Docker containers to
restart continuously
- This makes it impossible to disable the data sync service as intended

**Example scenario:**
When a user adds `--disable-datasync` to their docker-compose command
(as shown in examples), the container fails to start properly because
the argument isn't recognized.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
### Proposed Solution

Fix the typo on line 74 of `entrypoint.sh` by changing:
```bash
    --disable-datasyn)
```
to:
```bash
    --disable-datasync)
```

This matches the spelling used in the usage function (line 9 and 13) and
allows the flag to work as documented.

### Changes Made

- Fixed typo in `entrypoint.sh` line 74: changed `--disable-datasyn)` to
`--disable-datasync)`
- This ensures the argument matches the documented flag name and usage
function

---

**Code change:**

```bash
# Line 74 in entrypoint.sh
# Before:
    --disable-datasyn)
      ENABLE_DATASYNC=0
      shift
      ;;

# After:
    --disable-datasync)
      ENABLE_DATASYNC=0
      shift
      ;;
```

This is a simple one-character fix that resolves the argument parsing
issue.
2025-11-18 10:28:00 +08:00
912b6b023e fix: update check_embedding failed info (#11321)
### What problem does this PR solve?
change:
update check_embedding failed info

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-18 09:39:45 +08:00
89e8818dda Feat: add s3-compatible storage boxes (#11313)
### What problem does this PR solve?

PR implementing S3-compatible storage units. #11240
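
A minimal sketch of talking to an S3-compatible box via `boto3` with a custom endpoint; the endpoint and credentials are placeholders.

```python
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://minio.example.com:9000",  # placeholder endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)
s3.upload_file("report.pdf", "my-bucket", "docs/report.pdf")
```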

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-18 09:39:25 +08:00
1dba6b5bf9 Fix: Fixed an issue where adding session variables multiple times would overwrite them. (#11308)
### What problem does this PR solve?

Fix: Fixed an issue where adding session variables multiple times would
overwrite them.
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-18 09:39:02 +08:00
3fcf2ee54c feat: add new LLM provider Jiekou.AI (#11300)
### What problem does this PR solve?


### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Co-authored-by: Jason <ggbbddjm@gmail.com>
2025-11-17 19:47:46 +08:00
d8f413a885 Feat: Construct a dynamic variable assignment form #10427 (#11316)
### What problem does this PR solve?

Feat: Construct a dynamic variable assignment form #10427

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-11-17 19:45:58 +08:00
7264fb6978 Fix: concat images in word document. (#11310)
### What problem does this PR solve?

Fix: concat images in word document. Partially solved issues in #11063 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-17 19:38:26 +08:00
bd4bc57009 Refactor: move mcp connection utilities to common (#11304)
### What problem does this PR solve?

As title

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-17 15:34:17 +08:00
0569b50fed Fix: create dataset return type inconsistent (#11272)
### What problem does this PR solve?

Fix: create dataset return type inconsistent #11167 
 
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-17 15:27:19 +08:00
6b64641042 Fix: default model base url extraction logic (#11263)
### What problem does this PR solve?

Fixes an issue where default models which used the same factory but
different base URLs would all be initialised with the default chat
model's base URL and would ignore e.g. the embedding model's base URL
config.

For example, with the following service config, the embedding and
reranker models would end up using the base URL for the default chat
model (i.e. `llm1.example.com`):

```yaml
ragflow:
  service_conf:
    user_default_llm:
      factory: OpenAI-API-Compatible
      api_key: not-used
      default_models:
        chat_model:
          name: llm1
          base_url: https://llm1.example.com/v1
        embedding_model:
          name: llm2
          base_url: https://llm2.example.com/v1
        rerank_model:
          name: llm3
          base_url: https://llm3.example.com/v1/rerank

  llm_factories:
    factory_llm_infos:
    - name: OpenAI-API-Compatible
      logo: ""
      tags: "LLM,TEXT EMBEDDING,SPEECH2TEXT,MODERATION"
      status: "1"
      llm:
        - llm_name: llm1
          base_url: 'https://llm1.example.com/v1'
          api_key: not-used
          tags: "LLM,CHAT,IMAGE2TEXT"
          max_tokens: 100000
          model_type: chat
          is_tools: false

        - llm_name: llm2
          base_url: https://llm2.example.com/v1
          api_key: not-used
          tags: "TEXT EMBEDDING"
          max_tokens: 10000
          model_type: embedding

        - llm_name: llm3
          base_url: https://llm3.example.com/v1/rerank
          api_key: not-used
          tags: "RERANK,1k"
          max_tokens: 10000
          model_type: rerank
```
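
A minimal sketch of the intended resolution order, assuming each default model entry may carry its own base_url (names illustrative, not RAGFlow's code): prefer the per-model value and fall back to the factory-level default.

```python
def resolve_base_url(model_conf: dict, factory_default: str | None) -> str | None:
    # Per-model base_url wins; otherwise fall back to the factory default.
    return model_conf.get("base_url") or factory_default
```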

### Type of change

- [X] Bug Fix (non-breaking change which fixes an issue)
2025-11-17 14:21:27 +08:00
9cef3a2625 Fix: Fixed the issue of not being able to select the time zone in the user center. (#11298)

### What problem does this PR solve?

Fix: Fixed the issue of not being able to select the time zone in the
user center.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-17 11:16:55 +08:00
e7e89d3ecb Doc: style fix (#11295)
### What problem does this PR solve?

Style fix based on  #11283
### Type of change

- [x] Documentation Update
2025-11-17 11:16:34 +08:00
13e212c856 Feat: add Jira connector (#11285)
### What problem does this PR solve?

Add Jira connector.

<img width="978" height="925" alt="image"
src="https://github.com/user-attachments/assets/78bb5c77-2710-4569-a76e-9087ca23b227"
/>

---

<img width="1903" height="489" alt="image"
src="https://github.com/user-attachments/assets/193bc5c5-f751-4bd5-883a-2173282c2b96"
/>

---

<img width="1035" height="925" alt="image"
src="https://github.com/user-attachments/assets/1a0aec19-30eb-4ada-9283-61d1c915f59d"
/>

---

<img width="1905" height="601" alt="image"
src="https://github.com/user-attachments/assets/3dde1062-3f27-4717-8e09-fd5fd5e64171"
/>

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-17 09:38:04 +08:00
61cf430dbb Minor tweaks (#11271)
### What problem does this PR solve?

As title.

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-16 19:29:20 +08:00
e841b09d63 Remove unused code and fix performance issue (#11284)
### What problem does this PR solve?

1. remove redundant code
2. fix miner performance issue

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-14 20:39:54 +08:00
b1a1eedf53 Doc: add default username & pwd (#11283)
### What problem does this PR solve?
Doc: add default username & pwd

### Type of change

- [x] Documentation Update

---------

Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
2025-11-14 19:52:58 +08:00
68e3b33ae4 Feat: extract message output to file (#11251)
### What problem does this PR solve?

Feat: extract message output to file

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-14 19:52:11 +08:00
cd55f6c1b8 Fix: ListOperations does not support sorting arrays of objects. (#11278)
### What problem does this PR solve?

pr:
#11276
change:
ListOperations does not support sorting arrays of objects.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-14 19:50:29 +08:00
996b5fe14e Fix: Added the ability to download files in the agent message reply function. (#11281)
### What problem does this PR solve?

Fix: Added the ability to download files in the agent message reply
function.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-14 19:50:01 +08:00
db4fd19c82 Feat: new component list operations (#11276)
### What problem does this PR solve?
issue:
https://github.com/infiniflow/ragflow/issues/10427
change:
new component list operations

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-14 16:33:20 +08:00
12db62b9c7 Refactor: improve mineru_parser get property logic (#11268)
### What problem does this PR solve?

improve mineru_parser get property logic

### Type of change

- [x] Refactoring
2025-11-14 16:32:35 +08:00
b5f2cf16bc Fix: check task executor alive and display status (#11270)
### What problem does this PR solve?

Correctly check task executor alive and display status.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-14 15:52:28 +08:00
e27ff8d3d4 Fix: rerank algorithm (#11266)
### What problem does this PR solve?

Fix: rerank algorithm #11234

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-14 13:59:54 +08:00
5f59418aba Remove leftover account and password from the code (#11248)
Remove legacy accounts and passwords.

### What problem does this PR solve?

Remove leftover account and password in
agent/templates/sql_assistant.json

### Type of change

- [x] Other (please describe):
2025-11-14 13:59:03 +08:00
87e69868c0 Fixes: Added session variable types and modified configuration (#11269)
### What problem does this PR solve?

Fixes: Added session variable types and modified configuration

- Added more types of session variables
- Modified the embedding model switching logic in the knowledge base
configuration

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-14 13:56:56 +08:00
72c20022f6 Refactor service config fetching in admin server (#11267)
### What problem does this PR solve?

As title

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
2025-11-14 12:32:08 +08:00
3f2472f1b9 Skip checking python comments 2025-11-14 11:59:15 +08:00
1d4d67daf8 Fix check_comment_ascii.py 2025-11-14 11:45:32 +08:00
7538e218a5 Fix check_comment_ascii.py 2025-11-14 11:32:55 +08:00
6b52f7df5a CI: check comments of changed Python files 2025-11-14 10:54:07 +08:00
63131ec9b2 Docs: default admin credentials (#11260)
### What problem does this PR solve?

### Type of change

- [x] Documentation Update
2025-11-14 09:35:56 +08:00
e8f1a245a6 Feat: update check_embedding API (#11254)
### What problem does this PR solve?
pr: 
#10854
change:
update check_embedding api

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-13 18:48:25 +08:00
908450509f Feat: add fault-tolerant mechanism to RAPTOR (#11206)
### What problem does this PR solve?

Add fault-tolerant mechanism to RAPTOR.
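
A hypothetical shape of such a mechanism (not the PR's actual code): a failing cluster summary is retried and, if it keeps failing, logged and skipped instead of aborting the whole RAPTOR build.

```python
import logging

async def summarize_clusters(clusters, summarize, retries: int = 2):
    results = []
    for cluster in clusters:
        for attempt in range(retries + 1):
            try:
                results.append(await summarize(cluster))
                break
            except Exception:
                logging.exception("cluster summary failed (attempt %d)", attempt + 1)
        else:
            # All attempts failed; skip this cluster and keep going.
            logging.warning("skipping cluster after %d retries", retries)
    return results
```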

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-13 18:48:07 +08:00
70a0f081f6 Minor tweaks (#11249)
### What problem does this PR solve?

Fix some IDE warnings

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-13 16:11:07 +08:00
93422fa8cc Fix: Law parser (#11246)
### What problem does this PR solve?

Fix: Law parser
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-13 15:19:02 +08:00
bfc84ba95b Test: handle duplicate names by appending "(1)" (#11244)
### What problem does this PR solve?

- Updated tests to reflect new behavior of handling duplicate dataset
names
- Instead of returning an error, the system now appends "(1)" to
duplicate names
- This problem was introduced by PR #10960

### Type of change

- [x] Testcase update
2025-11-13 15:18:32 +08:00
871055b0fc Feat: support API for generating knowledge graph and RAPTOR (#11229)
### What problem does this PR solve?
issue:
[#11195](https://github.com/infiniflow/ragflow/issues/11195)
change:
support API for generating knowledge graph and raptor

### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
2025-11-13 15:17:52 +08:00
ba71160b14 Refa: rm useless code. (#11238)
### Type of change

- [x] Refactoring
2025-11-13 09:59:55 +08:00
bd5dda6b10 Feature/doc upload api add parent path 20251112 (#11231)
### What problem does this PR solve?

Add the specified parent_path to the document upload api interface
(#11230)
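
A hedged example of the upload call with the new field; the parameter name `parent_path` follows the PR title, and the exact endpoint shape may differ.

```python
import requests

with open("contract.pdf", "rb") as f:
    resp = requests.post(
        "http://localhost:9380/api/v1/datasets/DATASET_ID/documents",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        data={"parent_path": "/contracts/2025"},  # assumed parameter name
        files={"file": f},
    )
print(resp.json())
```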

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Co-authored-by: virgilwong <hyhvirgil@gmail.com>
2025-11-13 09:59:39 +08:00
774563970b Fix: update readme (#11212)
### What problem does this PR solve?

Continue updating the README. #11167

### Type of change

- [x] Documentation Update
2025-11-13 09:50:47 +08:00
83d84e90ed Fix: Profile picture cropping supported #10703 (#11221)
### What problem does this PR solve?

Fix: Profile picture cropping supported

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-13 09:50:10 +08:00
8ef2f79d0a Fix: reset the agent component’s output (#11222)
### What problem does this PR solve?

change:
“After each dialogue turn, the agent component’s output is not reset.”

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-13 09:49:12 +08:00
296476ab89 Refactor function name (#11210)
### What problem does this PR solve?

As title

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-12 19:00:15 +08:00
a36a0fe71c Docs: Update version references to v0.22.0 in READMEs and docs (#11211)
### What problem does this PR solve?

- Update version tags in README files (including translations) from
v0.21.1 to v0.22.0
- Modify Docker image references and documentation to reflect new
version
- Update version badges and image descriptions
- Maintain consistency across all language variants of README files

### Type of change

- [x] Documentation Update
2025-11-12 14:54:28 +08:00
a81f6d1b24 Fix: Bug Fixes - Added disabled logic RAPTOR scope #10703 (#11207)
### What problem does this PR solve?

Fix: Bug Fixes #10703

- Fixed the menu order in the user center
- Added a disabled RAPTOR scope
- Fixed some style issues

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-12 14:36:30 +08:00
8406a5ea47 Fix typos (#11208)
### What problem does this PR solve?

As title

### Type of change

- [x] Documentation Update

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-12 14:20:04 +08:00
20b6dafbd8 Update docs (#11204)
### What problem does this PR solve?

as title

### Type of change

- [x] Documentation Update

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-12 14:01:47 +08:00
33cc9cafa9 chore(readme): remove slim image from docs (#11199)
### What problem does this PR solve?

RAGFlow will no longer offer Docker images that contain embedding
models.

### Type of change

- [x] Documentation Update
2025-11-12 13:57:35 +08:00
6567ecf15a Bump infinity to 0.6.5 (#11203)
### What problem does this PR solve?

Bump infinity to 0.6.5

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-12 13:33:33 +08:00
3a7322f5b2 Docs: Added v0.22.0 release notes. (#11202)
### What problem does this PR solve?

### Type of change

- [x] Documentation Update
2025-11-12 13:10:07 +08:00
829e5f287b Fixes: Fixed some bugs #10703 (#11200)
### What problem does this PR solve?

Fixes: Fixed some bugs #10703

- Removed login page animation
- Modified some styles in the user profile center

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-12 12:53:41 +08:00
1e8efa2631 chore(template): update agent template's title (#11201)
### What problem does this PR solve?

Update title

### Type of change

- [x] Other (please describe):
2025-11-12 12:53:28 +08:00
e7f7c09b0b Fix: Fixed an issue that caused the page to crash when a knowledge base variable was selected. #10427 (#11197)
### What problem does this PR solve?

Fix: Fixed an issue that caused the page to crash when a knowledge base
variable was selected. #10427

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-12 12:30:08 +08:00
8ae562504b Fix: GraphRAG and RAPTOR tasks do not affect document status (#11194)
### What problem does this PR solve?

GraphRAG and RAPTOR tasks do not affect document status.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-12 12:03:41 +08:00
bacc9d3ab9 Revert PR#11151 (#11196)
### What problem does this PR solve?

Revert PR#11151

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-12 11:58:02 +08:00
d226764ed0 Fix: connector auto-parse issue. (#11189)
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-12 11:50:39 +08:00
39120d49cf Docs: Removed descriptions of the slim edition. (#11192)
### What problem does this PR solve?

### Type of change

- [x] Documentation Update
2025-11-12 11:34:45 +08:00
27211a9b34 Update Chinese README.md on slim version (#11190)
### What problem does this PR solve?

As title.

### Type of change

- [x] Documentation Update

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-12 11:06:08 +08:00
e9de25c973 Docs: update latest updates. (#11188)
### Type of change

- [x] Documentation Update
2025-11-12 10:38:33 +08:00
09e971dcc8 chore(templates): add user interaction agent (#11185)
### What problem does this PR solve?
Add user interaction agent template

### Type of change

- [x] Other (please describe): new agent template
2025-11-12 09:38:39 +08:00
883df22aa2 Update LLM factories ranks in llm_factories.json (#11184)
### What problem does this PR solve?

[Update LLM factory ranks in llm_factories.json]

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-12 09:38:06 +08:00
2bd7abadd3 Fix: Confluence cannot retrieve updated files (#11182)
### What problem does this PR solve?

Confluence cannot retrieve updated files.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-12 09:37:32 +08:00
435479adb3 Fixes: Fixed some bugs #10703 (#11180)
### What problem does this PR solve?

Fixes: Fixed some bugs #10703

- Removed S3 upload from the file upload component
- Updated the dropdown menu style on the model provider page
- Updated some model provider icons
- Fixed other style issues

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-12 09:36:48 +08:00
2c727a4a9c Docs: parser behavior change (#11176)
### What problem does this PR solve?


### Type of change


- [x] Documentation Update
2025-11-11 21:10:06 +08:00
a15f522dc9 Update Admin UI user guide docs (#11183)
### What problem does this PR solve?

- Update Admin UI user guide docs

### Type of change

- [x] Documentation Update
2025-11-11 20:29:20 +08:00
de53498b39 Fix: Update env to support PPTX and update README for version changes (#11167)
### What problem does this PR solve?

Fix: Update env to support PPTX
Fix: update README for version changes #11138

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Documentation Update

---------

Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
2025-11-11 19:56:54 +08:00
72740eb5b9 Fix: data_operations input return (#11177)
### What problem does this PR solve?

change:
data_operations input return

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-11 19:54:17 +08:00
c30ffb5716 Fix: ollama model list issue. (#11175)
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-11 19:46:41 +08:00
6dcff7db97 Feat: The input parameters of data manipulation operators can only be of type object. #10427 (#11179)
### What problem does this PR solve?

Feat: The input parameters of data manipulation operators can only be of
type object. #10427

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-11-11 19:43:49 +08:00
9213568692 Feat: add mechanism to check cancellation in Agent (#10766)
### What problem does this PR solve?

Add mechanism to check cancellation in Agent.
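
An illustrative pattern for cooperative cancellation (the flag store below stands in for whatever backend RAGFlow actually uses): the agent checks a shared flag between steps and stops early once the task is cancelled.

```python
CANCELLED: set[str] = set()  # stand-in for a shared cancellation store

def run_agent(task_id: str, steps) -> str:
    for step in steps:
        if task_id in CANCELLED:
            return "cancelled"  # stop before the next step runs
        step()
    return "done"
```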

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-11 17:36:48 +08:00
d81e4095de Feat: Google drive supports web-based credentials (#11173)
### What problem does this PR solve?

 Google drive supports web-based credentials.

<img width="1204" height="612" alt="image"
src="https://github.com/user-attachments/assets/70291c63-a2dd-4a80-ae20-807fe034cdbc"
/>


### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-11 17:21:08 +08:00
8ddeaca3d6 Feat: Place the new mcp button at the end of the line. #10427 (#11170)
### What problem does this PR solve?

Feat: Place the new mcp button at the end of the line. #10427

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-11-11 17:11:32 +08:00
f441f8ffc2 Fix: waitForResponse component. (#11172)
### What problem does this PR solve?

#10056

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
2025-11-11 16:58:47 +08:00
522c7b7ac6 Fix: model provider issues and improved some features #10703 (#11168)
### What problem does this PR solve?

Fixes: Fixed model provider issues and improved some features
- Removed the old login page
- Updated model provider icons
- Added RAPTOR modification range parameter

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-11 16:26:26 +08:00
377c0fb4fa Feat: Call the interface to stop the output of the large model #10997 (#11164)
### What problem does this PR solve?

Feat: Call the interface to stop the output of the large model #10997

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-11-11 15:21:08 +08:00
7dd9758056 Add task executor bar chart, add system version string (#11155)
### What problem does this PR solve?

- Add task executor bar chart
- Add system version string

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-11 15:20:37 +08:00
26cf5131c9 Fix: filter builtin llm factories. (#11163)
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-11 14:52:59 +08:00
93207f83ba Changed infinity log level to info (#11165)
### What problem does this PR solve?

Changed infinity log level to info

### Type of change

- [x] Refactoring
2025-11-11 14:43:25 +08:00
f77604db26 Docs: add admin UI user guide (#11156)
### What problem does this PR solve?

Add admin UI user guide

### Type of change

- [x] Documentation Update
2025-11-11 14:20:35 +08:00
dd5b8e2e1a Fix: add auto_parse to kb detail. (#11153)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-11 12:22:43 +08:00
83ff8e8009 Fix: update agent variable name rule (#11124)
### What problem does this PR solve?

change:

1. update agent variable name rule.
2. reset() in Canvas doesn't reset the env var.
3. correct log input binding in message component
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-11 11:18:30 +08:00
7db6cb8ca3 Fixes: Bugs fixed #10703 (#11154)
### What problem does this PR solve?

Fixes: Bugs fixed
- Removed invalid code,
- Modified the user center style,
- Added an automatic data source parsing switch.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-11 11:18:07 +08:00
ba6470a7a5 Chore(config): Added rank values for the LLM vendors and remove deprecated LLM (#11133)
### What problem does this PR solve?

Added vendor ranking so that frequently used model providers appear
higher on the page for easier access.
Removed deprecated LLM configurations from llm_factories.json to
streamline model management.
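
A sketch of how a rank field can drive the ordering; the field name `rank` follows the PR title, and "lower value sorts first" is an assumption.

```python
factories = [
    {"name": "OpenAI", "rank": 1},
    {"name": "SomeNicheVendor", "rank": 90},
    {"name": "Tongyi-Qianwen", "rank": 2},
]
# Unranked entries sink to the bottom.
ordered = sorted(factories, key=lambda f: f.get("rank", 10**9))
print([f["name"] for f in ordered])  # ['OpenAI', 'Tongyi-Qianwen', 'SomeNicheVendor']
```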

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-10 19:17:35 +08:00
df16a80f25 Feat: add initial Google Drive connector support (#11147)
### What problem does this PR solve?

This feature is primarily ported from the
[Onyx](https://github.com/onyx-dot-app/onyx) project with necessary
modifications. Thanks for such a brilliant project.

Minor: consistently use `google_drive` rather than `google_driver`.

<img width="566" height="731" alt="image"
src="https://github.com/user-attachments/assets/6f64e70e-881e-42c7-b45f-809d3e0024a4"
/>

<img width="904" height="830" alt="image"
src="https://github.com/user-attachments/assets/dfa7d1ef-819a-4a82-8c52-0999f48ed4a6"
/>

<img width="911" height="869" alt="image"
src="https://github.com/user-attachments/assets/39e792fb-9fbe-4f3d-9b3c-b2265186bc22"
/>

<img width="947" height="323" alt="image"
src="https://github.com/user-attachments/assets/27d70e96-d9c0-42d9-8c89-276919b6d61d"
/>


### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-10 19:15:02 +08:00
29ea059f90 Feat: Adjust the style of mcp and checkbox. #10427 (#11150)
### What problem does this PR solve?

Feat: Adjust the style of mcp and checkbox. #10427

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-11-10 19:02:41 +08:00
a191933f81 Fix(config): Add raptor_kwd field to infinity mapping (#11146)
### What problem does this PR solve?

fix infinity "INSERT: Column raptor_kwd not found in table" error

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-10 19:02:25 +08:00
6e1ebb2855 Fix: Optimize Prompts and Regex for use_sql() (#11148)
### What problem does this PR solve?

Fix: Optimize Prompts and Regex for use_sql() #11127 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-10 19:02:07 +08:00
68b952abb1 Don't select vector on infinity (#11151)
### What problem does this PR solve?

Don't select vector on infinity

### Type of change

- [x] Performance Improvement
2025-11-10 18:01:40 +08:00
0879b6af2c Feat: Globally defined conversation variables can be selected in the operator's query variables. #10427 (#11135)
### What problem does this PR solve?

Feat: Globally defined conversation variables can be selected in the
operator's query variables. #10427

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-11-10 15:09:33 +08:00
2b9145948f Fix: not enough values to unpack (expected 3, got 2) in general chunk (#11139)
### What problem does this PR solve?
issue:
#11136
change:
not enough values to unpack (expected 3, got 2) in general chunk
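
An illustrative guard for this class of error (not the PR's actual code): tolerate rows that come back with two fields instead of three by padding the missing one.

```python
def split_row(row):
    parts = tuple(row)
    if len(parts) == 2:          # alternate shape: (text, meta)
        text, meta = parts
        pos = None
    else:
        text, meta, pos = parts  # expected 3-tuple
    return text, meta, pos
```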

### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-10 15:08:24 +08:00
726473fd39 Fix: Bugs fixed #10703 (#11132)
### What problem does this PR solve?

Fix: Bugs fixed #10703

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-10 14:12:45 +08:00
d207291217 Fix: add download stats to kb logs. (#11112)
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-10 13:28:07 +08:00
bf382e5c4d Fix: remove unsupported models in siliconflow api (#11126)
### What problem does this PR solve?

Fix: remove unsupported models in siliconflow api

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-10 13:27:42 +08:00
4338e706c6 Fix: missing file formats in hierarchical_manager (#11129)
### What problem does this PR solve?

Fix: missing file formats in hierarchical_manager  #11084 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-10 13:27:22 +08:00
86af330f06 Feat: The keys for data manipulation operators can only be numbers, letters, and underscores. #10427 (#11130)
### What problem does this PR solve?

Feat: The keys for data manipulation operators can only be numbers,
letters, and underscores. #10427

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-11-10 13:27:09 +08:00
d016a06fd5 Feat/monitor task (#11116)
### What problem does this PR solve?

Show task executor.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-10 12:51:39 +08:00
7423a5806e Feature: Added global variable functionality #10703 (#11117)
### What problem does this PR solve?

Feature: Added global variable functionality

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-10 10:16:12 +08:00
b6cd282ccd fix: layout structure to use main tag (#11119)
### What problem does this PR solve?

For proper semantics, the Layout should use the HTML `<main>` element to
wrap the Header and Outlet, which produce `<section>` HTML elements.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-10 10:15:57 +08:00
82ca2e0378 Refactor: QWenCV release temp path (#11122)
### What problem does this PR solve?

QWenCV release temp path

### Type of change
- [x] Refactoring
2025-11-10 10:15:37 +08:00
1cd54832b5 Adjust styles to match the design system (#11118)
### What problem does this PR solve?

- Modify and adjust styles (CSS vars, components) to match the design
system
- Adjust file and directory structure of admin UI

### Type of change

- [x] Refactoring
2025-11-10 10:05:19 +08:00
660386d3b5 Fix: cannot parse images (#11044)
### What problem does this PR solve?

https://github.com/infiniflow/ragflow/issues/11043

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-10 09:31:19 +08:00
4cdaa77545 Docs: refine MinerU part in FAQ (#11111)
### What problem does this PR solve?

Refine MinerU part in FAQ.

### Type of change

- [x] Documentation Update
2025-11-07 19:58:07 +08:00
9fcc4946e2 Feat: add kimi-k2-thinking and moonshot-v1-vision-preview (#11110)
### What problem does this PR solve?

Add kimi-k2-thinking and moonshot-v1-vision-preview.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-07 19:52:57 +08:00
98e9d68c75 Feat: Add Variable aggregator (#11114)
### What problem does this PR solve?
Feat: Add Variable aggregator

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-07 19:52:26 +08:00
8f34824aa4 Feat: Display the selected variables in the variable aggregation node. #10427 (#11113)
### What problem does this PR solve?
Feat: Display the selected variables in the variable aggregation node.
#10427

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-11-07 19:52:04 +08:00
9a6808230a Fix workflows 2025-11-07 17:14:04 +08:00
c7bd0a755c Fix: python api streaming structure (#11105)
### What problem does this PR solve?

Fix: python api streaming structure

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-07 16:50:58 +08:00
dd1c8c5779 Feat: add auto parse to connector. (#11099)
### What problem does this PR solve?

#10953

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-07 16:49:29 +08:00
526ba3388f Feat: The output is derived based on the configuration of the variable aggregation operator. #10427 (#11109)
### What problem does this PR solve?

Feat: The output is derived based on the configuration of the variable
aggregation operator. #10427

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-11-07 16:35:32 +08:00
cb95072ecf Fix workflows 2025-11-07 15:57:33 +08:00
f6aeebc608 Fix: cannot write mode RGBA as JPEG (#11102)
### What problem does this PR solve?
Fix #11091 
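
The standard Pillow fix for this error, as a minimal sketch: JPEG has no alpha channel, so flatten to RGB before saving.

```python
from PIL import Image

img = Image.open("figure.png")  # placeholder path
if img.mode in ("RGBA", "LA", "P"):
    img = img.convert("RGB")  # drop alpha so JPEG encoding succeeds
img.save("figure.jpg", "JPEG")
```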
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-07 15:45:10 +08:00
307f53dae8 Minor tweaks (#11106)
### What problem does this PR solve?

Refactor

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-07 15:44:57 +08:00
fa98cc2bb9 Fix: add huggingface model download functionality (#11101)
### What problem does this PR solve?

Reverts #11048.
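
A minimal sketch of the restored download path using `huggingface_hub`; the repo id and target directory are placeholders.

```python
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="InfiniFlow/deepdoc",        # placeholder repo id
    local_dir="/root/.ragflow/deepdoc",  # placeholder target dir
)
print(local_dir)
```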

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-07 15:12:47 +08:00
c58d95ed69 Bump infinity to 0.6.4 (#11104)
### What problem does this PR solve?

Bump infinity to 0.6.4

Fixed https://github.com/infiniflow/infinity/issues/3048

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-07 14:44:34 +08:00
edbc396bc6 Fix: Added some prompts and polling functionality to retrieve data source logs. #10703 (#11103)
### What problem does this PR solve?

Fix: Added some prompts and polling functionality to retrieve data
source logs.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-07 14:28:45 +08:00
b137de1def Fix: Plain parser is skipped (#11094)
### What problem does this PR solve?

The plain parser is skipped.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-07 13:39:29 +08:00
2cb1046cbf fix: The doc file cannot be parsed(#11092) (#11093)
### What problem does this PR solve?

The doc file cannot be parsed(#11092)

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: virgilwong <hyhvirgil@gmail.com>
2025-11-07 11:46:10 +08:00
a880beb1f6 Feat: Add a form for variable aggregation operators #10427 (#11095)
### What problem does this PR solve?

Feat: Add a form for variable aggregation operators #10427

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-11-07 11:44:22 +08:00
34283d4db4 Feat: add data source to pipeline logs. (#11075)
### What problem does this PR solve?

#10953

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-07 11:43:59 +08:00
5629fbd2ca Fix: OpenSearch retrieval no return & Add documentation of /retrieval (#11083)
### What problem does this PR solve?

Fix: OpenSearch retrieval no return #11006
Add documentation #11072
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Documentation Update

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
2025-11-07 09:28:42 +08:00
b7aa6d6c4f Fix: add avatar for UI (#11080)
### What problem does this PR solve?

Add avatar for admin UI.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-07 09:27:31 +08:00
0b7b88592f Fix: Improve some functional issues with the data source. #10703 (#11081)
### What problem does this PR solve?

Fix: Improve some functional issues with the data source. #10703

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-06 20:07:38 +08:00
42edecc98f Add 'SHOW VERSION' to document (#11082)
### What problem does this PR solve?

As title

### Type of change

- [x] Documentation Update

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-06 19:34:47 +08:00
af98763e27 Admin: add 'show version' (#11079)
### What problem does this PR solve?

```
admin> show version;
show_version
+-----------------------+
| version               |
+-----------------------+
| v0.21.0-241-gc6cf58d5 |
+-----------------------+
admin> \q
Goodbye!

```

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-06 19:24:46 +08:00
5a8fbc5a81 Fix: Can't add more models (#11076)
### What problem does this PR solve?

Currently we cannot add any models, since `factory` is a string while the
return type of get_allowed_llm_factories() is List[object].
https://github.com/infiniflow/ragflow/pull/11003
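
The shape of the bug, sketched (attribute name assumed): comparing a factory name (a string) against a list of factory objects never matches, so the membership test must compare names.

```python
def factory_allowed(factory_name: str, allowed_factories: list) -> bool:
    # Compare name to name, not name to object.
    return factory_name in {f.name for f in allowed_factories}
```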

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-06 18:54:13 +08:00
0cd8024c34 Feat: RAPTOR handle cancel gracefully (#11074)
### What problem does this PR solve?

RAPTOR handles cancellation gracefully.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-06 17:18:03 +08:00
3bd1fefe1f Feat: debug sync data. (#11073)
### What problem does this PR solve?

#10953 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-06 16:48:04 +08:00
e18c408759 Feat: Add variable aggregator node #10427 (#11070)
### What problem does this PR solve?

Feat: Add variable aggregator node #10427

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-11-06 16:18:00 +08:00
23b81eae77 Feat: GraphRAG handle cancel gracefully (#11061)
### What problem does this PR solve?

GraphRAG handles cancellation gracefully. #10997.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-06 16:12:20 +08:00
66c01c7274 Minor tweaks (#11060)
### What problem does this PR solve?

Minor tweaks

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-06 15:28:48 +08:00
4b8ce08050 Fix: fix pdf_parser ignored in rag/app/naive.py (#11065)
### What problem does this PR solve?

Fix: fix pdf_parser ignored in rag/app/naive.py #11000

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-06 15:20:35 +08:00
ca30ef83bf Feat: Add variable assignment node #10427 (#11058)
### What problem does this PR solve?

Feat: Add variable assignment node #10427

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-11-06 14:42:47 +08:00
d469ae6d50 Feat: The agent operator and message operator can only select string variables as prompt words. #10427 (#11054)
### What problem does this PR solve?

Feat: The agent operator and message operator can only select string
variables as prompt words. #10427
### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-11-06 13:58:20 +08:00
f581a1c4e5 Feature: Added data source functionality #10703 (#11046)
### What problem does this PR solve?

Feature: Added data source functionality

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-06 11:53:46 +08:00
15c75bbf15 Refa: Remove HuggingFace repo downloads (#11048)
### What problem does this PR solve?

- Removed download_model function and HuggingFace repo download loop

### Type of change

- [x] Refactoring
2025-11-06 11:53:33 +08:00
adbb8319e0 Fix: add fields for logs. (#11039)
### What problem does this PR solve?

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-06 09:49:57 +08:00
f98b24c9bf Move api.settings to common.settings (#11036)
### What problem does this PR solve?

As title

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-06 09:36:38 +08:00
87c9a054d3 Feat: The value of data operations operators can be either input or referenced from variables. #10427 (#11037)
### What problem does this PR solve?

Feat: The value of data operations operators can be either input or
referenced from variables. #10427

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-11-05 20:04:23 +08:00
cd6ed4b380 Feat: add webhook component. (#11033)
### What problem does this PR solve?

#10427

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-05 19:59:23 +08:00
f29a3dd651 fix: data operations update (#11013)
### What problem does this PR solve?

change:data operations update

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-05 19:59:10 +08:00
e658beee38 Fix: Fixed the issue of errors when using agents created from templates. #10427 (#11035)
### What problem does this PR solve?

Fix: Fixed the issue of errors when using agents created from templates.
#10427

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-11-05 19:15:43 +08:00
17ea5c1dee Fix: MCP cannot handle empty Auth field properly (#11034)
### What problem does this PR solve?

Fix: MCP cannot handle an empty Auth field properly, which results in

```bash
2025-11-05 11:10:41,919 INFO     51209 Negotiated protocol version: 2025-06-18
2025-11-05 11:10:41,920 INFO     51209 client_session initialized successfully
2025-11-05 11:10:41,994 INFO     51209 127.0.0.1 - - [05/Nov/2025 11:10:41] "GET /api/v1/datasets?page=1&page_size=1000&orderby=create_time&desc=True HTTP/1.1" 200 -
2025-11-05 11:10:41,999 INFO     51209 Want to clean up 1 MCP sessions
2025-11-05 11:10:42,000 INFO     51209 1 MCP sessions has been cleaned up. 0 in global context.
2025-11-05 11:10:42,001 INFO     51209 127.0.0.1 - - [05/Nov/2025 11:10:42] "POST /v1/mcp_server/test_mcp HTTP/1.1" 200 -
2025-11-05 11:11:30,441 INFO     51209 Negotiated protocol version: 2025-06-18
2025-11-05 11:11:30,442 INFO     51209 client_session initialized successfully
2025-11-05 11:11:30,520 INFO     51209 127.0.0.1 - - [05/Nov/2025 11:11:30] "GET /api/v1/datasets?page=1&page_size=1000&orderby=create_time&desc=True HTTP/1.1" 200 -
2025-11-05 11:11:30,525 INFO     51209 Want to clean up 1 MCP sessions
2025-11-05 11:11:30,526 INFO     51209 1 MCP sessions has been cleaned up. 0 in global context.
2025-11-05 11:11:30,527 INFO     51209 127.0.0.1 - - [05/Nov/2025 11:11:30] "POST /v1/mcp_server/test_mcp HTTP/1.1" 200 -
2025-11-05 11:11:31,476 INFO     51209 Negotiated protocol version: 2025-06-18
2025-11-05 11:11:31,476 INFO     51209 client_session initialized successfully
2025-11-05 11:11:31,549 INFO     51209 127.0.0.1 - - [05/Nov/2025 11:11:31] "GET /api/v1/datasets?page=1&page_size=1000&orderby=create_time&desc=True HTTP/1.1" 200 -
2025-11-05 11:11:31,552 INFO     51209 Want to clean up 1 MCP sessions
2025-11-05 11:11:31,553 INFO     51209 1 MCP sessions has been cleaned up. 0 in global context.
2025-11-05 11:11:31,554 INFO     51209 127.0.0.1 - - [05/Nov/2025 11:11:31] "POST /v1/mcp_server/test_mcp HTTP/1.1" 200 -
2025-11-05 11:11:51,930 ERROR    51209 unhandled errors in a TaskGroup (1 sub-exception)
  + Exception Group Traceback (most recent call last):
  |   File "/home/xxxxxxxxx/workspace/ragflow/rag/utils/mcp_tool_call_conn.py", line 86, in _mcp_server_loop
  |     async with streamablehttp_client(url, headers) as (read_stream, write_stream, _):
  |   File "/home/xxxxxxxxx/.local/share/uv/python/cpython-3.10.16-linux-x86_64-gnu/lib/python3.10/contextlib.py", line 217, in __aexit__
  |     await self.gen.athrow(typ, value, traceback)
  |   File "/home/xxxxxxxxx/workspace/ragflow/.venv/lib/python3.10/site-packages/mcp/client/streamable_http.py", line 478, in streamablehttp_client
  |     async with anyio.create_task_group() as tg:
  |   File "/home/xxxxxxxxx/workspace/ragflow/.venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 781, in __aexit__
  |     raise BaseExceptionGroup(
  | exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/home/xxxxxxxxx/workspace/ragflow/.venv/lib/python3.10/site-packages/mcp/client/streamable_http.py", line 409, in handle_request_async
    |     await self._handle_post_request(ctx)
    |   File "/home/xxxxxxxxx/workspace/ragflow/.venv/lib/python3.10/site-packages/mcp/client/streamable_http.py", line 278, in _handle_post_request
    |     response.raise_for_status()
    |   File "/home/xxxxxxxxx/workspace/ragflow/.venv/lib/python3.10/site-packages/httpx/_models.py", line 829, in raise_for_status
    |     raise HTTPStatusError(message, request=request, response=self)
    | httpx.HTTPStatusError: Server error '502 Bad Gateway' for url 'http://192.168.1.38:9382/mcp'
    | For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/502
    +------------------------------------
2025-11-05 11:11:51,942 ERROR    51209 Error fetching tools from MCP server: streamable-http: http://192.168.1.38:9382/mcp
Traceback (most recent call last):
  File "/home/xxxxxxxxx/workspace/ragflow/rag/utils/mcp_tool_call_conn.py", line 168, in get_tools
    return future.result(timeout=timeout)
  File "/home/xxxxxxxxx/.local/share/uv/python/cpython-3.10.16-linux-x86_64-gnu/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/home/xxxxxxxxx/.local/share/uv/python/cpython-3.10.16-linux-x86_64-gnu/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "<@beartype(rag.utils.mcp_tool_call_conn.MCPToolCallSession._get_tools_from_mcp_server) at 0x7d58f02e2c20>", line 40, in _get_tools_from_mcp_server
  File "/home/xxxxxxxxx/workspace/ragflow/rag/utils/mcp_tool_call_conn.py", line 160, in _get_tools_from_mcp_server
    result: ListToolsResult = await self._call_mcp_server("list_tools", timeout=timeout)
  File "<@beartype(rag.utils.mcp_tool_call_conn.MCPToolCallSession._call_mcp_server) at 0x7d58f02e2b00>", line 63, in _call_mcp_server
  File "/home/xxxxxxxxx/workspace/ragflow/rag/utils/mcp_tool_call_conn.py", line 139, in _call_mcp_server
    raise result
ValueError: Connection failed (possibly due to auth error). Please check authentication settings first
2025-11-05 11:11:51,943 ERROR    51209 Test MCP error: Connection failed (possibly due to auth error). Please check authentication settings first
Traceback (most recent call last):
  File "/home/xxxxxxxxx/workspace/ragflow/api/apps/mcp_server_app.py", line 429, in test_mcp
    tools = tool_call_session.get_tools(timeout)
  File "<@beartype(rag.utils.mcp_tool_call_conn.MCPToolCallSession.get_tools) at 0x7d58f02e2cb0>", line 40, in get_tools
  File "/home/xxxxxxxxx/workspace/ragflow/rag/utils/mcp_tool_call_conn.py", line 168, in get_tools
    return future.result(timeout=timeout)
  File "/home/xxxxxxxxx/.local/share/uv/python/cpython-3.10.16-linux-x86_64-gnu/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/home/xxxxxxxxx/.local/share/uv/python/cpython-3.10.16-linux-x86_64-gnu/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "<@beartype(rag.utils.mcp_tool_call_conn.MCPToolCallSession._get_tools_from_mcp_server) at 0x7d58f02e2c20>", line 40, in _get_tools_from_mcp_server
  File "/home/xxxxxxxxx/workspace/ragflow/rag/utils/mcp_tool_call_conn.py", line 160, in _get_tools_from_mcp_server
    result: ListToolsResult = await self._call_mcp_server("list_tools", timeout=timeout)
  File "<@beartype(rag.utils.mcp_tool_call_conn.MCPToolCallSession._call_mcp_server) at 0x7d58f02e2b00>", line 63, in _call_mcp_server
  File "/home/xxxxxxxxx/workspace/ragflow/rag/utils/mcp_tool_call_conn.py", line 139, in _call_mcp_server
    raise result
ValueError: Connection failed (possibly due to auth error). Please check authentication settings first
2025-11-05 11:11:51,944 INFO     51209 Want to clean up 1 MCP sessions
2025-11-05 11:11:51,945 INFO     51209 1 MCP sessions has been cleaned up. 0 in global context.
2025-11-05 11:11:51,946 INFO     51209 127.0.0.1 - - [05/Nov/2025 11:11:51] "POST /v1/mcp_server/test_mcp HTTP/1.1" 200 -
2025-11-05 11:12:20,484 INFO     51209 Negotiated protocol version: 2025-06-18
2025-11-05 11:12:20,485 INFO     51209 client_session initialized successfully
2025-11-05 11:12:20,570 INFO     51209 127.0.0.1 - - [05/Nov/2025 11:12:20] "GET /api/v1/datasets?page=1&page_size=1000&orderby=create_time&desc=True HTTP/1.1" 200 -
2025-11-05 11:12:20,573 INFO     51209 Want to clean up 1 MCP sessions
2025-11-05 11:12:20,574 INFO     51209 1 MCP sessions has been cleaned up. 0 in global context.
2025-11-05 11:12:20,575 INFO     51209 127.0.0.1 - - [05/Nov/2025 11:12:20] "POST /v1/mcp_server/test_mcp HTTP/1.1" 200 -
2025-11-05 11:15:02,119 INFO     51209 127.0.0.1 - - [05/Nov/2025 11:15:02] "GET /api/v1/datasets?page=1&page_size=1000&orderby=create_time&desc=True HTTP/1.1" 200 -
2025-11-05 11:16:24,967 INFO     51209 127.0.0.1 - - [05/Nov/2025 11:16:24] "GET /api/v1/datasets?page=1&page_size=1000&orderby=create_time&desc=True HTTP/1.1" 200 -
2025-11-05 11:30:24,284 ERROR    51209 Task was destroyed but it is pending!
task: <Task pending name='Task-58' coro=<MCPToolCallSession._mcp_server_loop() running at <@beartype(rag.utils.mcp_tool_call_conn.MCPToolCallSession._mcp_server_loop) at 0x7d58f02e29e0>:11> wait_for=<Future pending cb=[Task.task_wakeup()]> cb=[_chain_future.<locals>._call_set_state() at /home/xxxxxxxxx/.local/share/uv/python/cpython-3.10.16-linux-x86_64-gnu/lib/python3.10/asyncio/futures.py:392]>
2025-11-05 11:30:24,285 ERROR    51209 Task was destroyed but it is pending!
task: <Task pending name='Task-67' coro=<Queue.get() running at /home/xxxxxxxxx/.local/share/uv/python/cpython-3.10.16-linux-x86_64-gnu/lib/python3.10/asyncio/queues.py:159> wait_for=<Future pending cb=[Task.task_wakeup()]> cb=[_release_waiter(<Future pendi...ask_wakeup()]>)() at /home/xxxxxxxxx/.local/share/uv/python/cpython-3.10.16-linux-x86_64-gnu/lib/python3.10/asyncio/tasks.py:387]>
Exception ignored in: <coroutine object Queue.get at 0x7d585480ace0>
Traceback (most recent call last):
  File "/home/xxxxxxxxx/.local/share/uv/python/cpython-3.10.16-linux-x86_64-gnu/lib/python3.10/asyncio/queues.py", line 161, in get
    getter.cancel()  # Just in case getter is not done yet.
  File "/home/xxxxxxxxx/.local/share/uv/python/cpython-3.10.16-linux-x86_64-gnu/lib/python3.10/asyncio/base_events.py", line 753, in call_soon
    self._check_closed()
  File "/home/xxxxxxxxx/.local/share/uv/python/cpython-3.10.16-linux-x86_64-gnu/lib/python3.10/asyncio/base_events.py", line 515, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed

```
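
The trailing `Task was destroyed but it is pending!` / `RuntimeError: Event loop is closed` messages in the log are the classic symptom of an asyncio event loop being closed while coroutines are still scheduled. A minimal, generic Python sketch of the shutdown pattern that avoids it (illustration only, not the actual `MCPToolCallSession` code):

```python
import asyncio

async def main():
    # A background task that would otherwise still be pending at shutdown.
    task = asyncio.create_task(asyncio.sleep(3600))
    try:
        pass  # ... do the real work here ...
    finally:
        # Cancel and await the task before the loop closes; skipping this
        # is what produces "Task was destroyed but it is pending!".
        task.cancel()
        try:
            await task
        except asyncio.CancelledError:
            pass

asyncio.run(main())
```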

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-05 19:15:27 +08:00
570 changed files with 24208 additions and 18894 deletions

View File

@@ -19,7 +19,7 @@ jobs:
 runs-on: [ "self-hosted", "ragflow-test" ]
 steps:
 - name: Ensure workspace ownership
-run: echo "chown -R $USER $GITHUB_WORKSPACE" && sudo chown -R $USER $GITHUB_WORKSPACE
+run: echo "chown -R ${USER} ${GITHUB_WORKSPACE}" && sudo chown -R ${USER} ${GITHUB_WORKSPACE}
 # https://github.com/actions/checkout/blob/v3/README.md
 - name: Check out code
@@ -31,37 +31,37 @@ jobs:
 - name: Prepare release body
 run: |
-if [[ $GITHUB_EVENT_NAME == 'create' ]]; then
+if [[ ${GITHUB_EVENT_NAME} == "create" ]]; then
 RELEASE_TAG=${GITHUB_REF#refs/tags/}
-if [[ $RELEASE_TAG == 'nightly' ]]; then
+if [[ ${RELEASE_TAG} == "nightly" ]]; then
 PRERELEASE=true
 else
 PRERELEASE=false
 fi
-echo "Workflow triggered by create tag: $RELEASE_TAG"
+echo "Workflow triggered by create tag: ${RELEASE_TAG}"
 else
 RELEASE_TAG=nightly
 PRERELEASE=true
 echo "Workflow triggered by schedule"
 fi
-echo "RELEASE_TAG=$RELEASE_TAG" >> $GITHUB_ENV
-echo "PRERELEASE=$PRERELEASE" >> $GITHUB_ENV
+echo "RELEASE_TAG=${RELEASE_TAG}" >> ${GITHUB_ENV}
+echo "PRERELEASE=${PRERELEASE}" >> ${GITHUB_ENV}
 RELEASE_DATETIME=$(date --rfc-3339=seconds)
-echo Release $RELEASE_TAG created from $GITHUB_SHA at $RELEASE_DATETIME > release_body.md
+echo Release ${RELEASE_TAG} created from ${GITHUB_SHA} at ${RELEASE_DATETIME} > release_body.md
 - name: Move the existing mutable tag
 # https://github.com/softprops/action-gh-release/issues/171
 run: |
 git fetch --tags
-if [[ $GITHUB_EVENT_NAME == 'schedule' ]]; then
+if [[ ${GITHUB_EVENT_NAME} == "schedule" ]]; then
 # Determine if a given tag exists and matches a specific Git commit.
 # actions/checkout@v4 fetch-tags doesn't work when triggered by schedule
-if [ "$(git rev-parse -q --verify "refs/tags/$RELEASE_TAG")" = "$GITHUB_SHA" ]; then
+if [ "$(git rev-parse -q --verify "refs/tags/${RELEASE_TAG}")" = "${GITHUB_SHA}" ]; then
-echo "mutable tag $RELEASE_TAG exists and matches $GITHUB_SHA"
+echo "mutable tag ${RELEASE_TAG} exists and matches ${GITHUB_SHA}"
 else
-git tag -f $RELEASE_TAG $GITHUB_SHA
-git push -f origin $RELEASE_TAG:refs/tags/$RELEASE_TAG
-echo "created/moved mutable tag $RELEASE_TAG to $GITHUB_SHA"
+git tag -f ${RELEASE_TAG} ${GITHUB_SHA}
+git push -f origin ${RELEASE_TAG}:refs/tags/${RELEASE_TAG}
+echo "created/moved mutable tag ${RELEASE_TAG} to ${GITHUB_SHA}"
 fi
 fi
@@ -87,7 +87,7 @@ jobs:
 - name: Build and push image
 run: |
-echo ${{ secrets.DOCKERHUB_TOKEN }} | sudo docker login --username infiniflow --password-stdin
+sudo docker login --username infiniflow --password-stdin <<< ${{ secrets.DOCKERHUB_TOKEN }}
 sudo docker build --build-arg NEED_MIRROR=1 -t infiniflow/ragflow:${RELEASE_TAG} -f Dockerfile .
 sudo docker tag infiniflow/ragflow:${RELEASE_TAG} infiniflow/ragflow:latest
 sudo docker push infiniflow/ragflow:${RELEASE_TAG}

View File

@@ -9,8 +9,11 @@ on:
 - 'docs/**'
 - '*.md'
 - '*.mdx'
-pull_request:
-types: [ labeled, synchronize, reopened ]
+# The only difference between pull_request and pull_request_target is the context in which the workflow runs:
+# — pull_request_target workflows use the workflow files from the default branch, and secrets are available.
+# — pull_request workflows use the workflow files from the pull request branch, and secrets are unavailable.
+pull_request_target:
+types: [ synchronize, ready_for_review ]
 paths-ignore:
 - 'docs/**'
 - '*.md'
@@ -28,7 +31,7 @@ jobs:
 name: ragflow_tests
 # https://docs.github.com/en/actions/using-jobs/using-conditions-to-control-job-execution
 # https://github.com/orgs/community/discussions/26261
-if: ${{ github.event_name != 'pull_request' || contains(github.event.pull_request.labels.*.name, 'ci') }}
+if: ${{ github.event_name != 'pull_request_target' || contains(github.event.pull_request.labels.*.name, 'ci') }}
 runs-on: [ "self-hosted", "ragflow-test" ]
 steps:
 # https://github.com/hmarr/debug-action
@@ -37,19 +40,20 @@ jobs:
 - name: Ensure workspace ownership
 run: |
 echo "Workflow triggered by ${{ github.event_name }}"
-echo "chown -R $USER $GITHUB_WORKSPACE" && sudo chown -R $USER $GITHUB_WORKSPACE
+echo "chown -R ${USER} ${GITHUB_WORKSPACE}" && sudo chown -R ${USER} ${GITHUB_WORKSPACE}
 # https://github.com/actions/checkout/issues/1781
 - name: Check out code
 uses: actions/checkout@v4
 with:
+ref: ${{ (github.event_name == 'pull_request' || github.event_name == 'pull_request_target') && format('refs/pull/{0}/merge', github.event.pull_request.number) || github.sha }}
 fetch-depth: 0
 fetch-tags: true
 - name: Check workflow duplication
-if: ${{ !cancelled() && !failure() && (github.event_name != 'pull_request' || contains(github.event.pull_request.labels.*.name, 'ci')) }}
+if: ${{ !cancelled() && !failure() }}
 run: |
-if [[ "$GITHUB_EVENT_NAME" != "pull_request" && "$GITHUB_EVENT_NAME" != "schedule" ]]; then
+if [[ ${GITHUB_EVENT_NAME} != "pull_request_target" && ${GITHUB_EVENT_NAME} != "schedule" ]]; then
 HEAD=$(git rev-parse HEAD)
 # Find a PR that introduced a given commit
 gh auth login --with-token <<< "${{ secrets.GITHUB_TOKEN }}"
@@ -67,14 +71,14 @@ jobs:
 gh run cancel ${GITHUB_RUN_ID}
 while true; do
 status=$(gh run view ${GITHUB_RUN_ID} --json status -q .status)
-[ "$status" = "completed" ] && break
+[ "${status}" = "completed" ] && break
 sleep 5
 done
 exit 1
 fi
 fi
 fi
-else
+elif [[ ${GITHUB_EVENT_NAME} == "pull_request_target" ]]; then
 PR_NUMBER=${{ github.event.pull_request.number }}
 PR_SHA_FP=${RUNNER_WORKSPACE_PREFIX}/artifacts/${GITHUB_REPOSITORY}/PR_${PR_NUMBER}
 # Calculate the hash of the current workspace content
@@ -91,20 +95,52 @@ jobs:
 version: ">=0.11.x"
 args: "check"
+- name: Check comments of changed Python files
+if: ${{ false }}
+run: |
+if [[ ${{ github.event_name }} == 'pull_request_target' ]]; then
+CHANGED_FILES=$(git diff --name-only ${{ github.event.pull_request.base.sha }}...${{ github.event.pull_request.head.sha }} \
+| grep -E '\.(py)$' || true)
+if [ -n "$CHANGED_FILES" ]; then
+echo "Check comments of changed Python files with check_comment_ascii.py"
+readarray -t files <<< "$CHANGED_FILES"
+HAS_ERROR=0
+for file in "${files[@]}"; do
+if [ -f "$file" ]; then
+if python3 check_comment_ascii.py "$file"; then
+echo "✅ $file"
+else
+echo "❌ $file"
+HAS_ERROR=1
+fi
+fi
+done
+if [ $HAS_ERROR -ne 0 ]; then
+exit 1
+fi
+else
+echo "No Python files changed"
+fi
+fi
 - name: Build ragflow:nightly
 run: |
-RUNNER_WORKSPACE_PREFIX=${RUNNER_WORKSPACE_PREFIX:-$HOME}
+RUNNER_WORKSPACE_PREFIX=${RUNNER_WORKSPACE_PREFIX:-${HOME}}
 RAGFLOW_IMAGE=infiniflow/ragflow:${GITHUB_RUN_ID}
-echo "RAGFLOW_IMAGE=${RAGFLOW_IMAGE}" >> $GITHUB_ENV
+echo "RAGFLOW_IMAGE=${RAGFLOW_IMAGE}" >> ${GITHUB_ENV}
 sudo docker pull ubuntu:22.04
 sudo DOCKER_BUILDKIT=1 docker build --build-arg NEED_MIRROR=1 -f Dockerfile -t ${RAGFLOW_IMAGE} .
-if [[ "$GITHUB_EVENT_NAME" == "schedule" ]]; then
+if [[ ${GITHUB_EVENT_NAME} == "schedule" ]]; then
 export HTTP_API_TEST_LEVEL=p3
 else
 export HTTP_API_TEST_LEVEL=p2
 fi
-echo "HTTP_API_TEST_LEVEL=${HTTP_API_TEST_LEVEL}" >> $GITHUB_ENV
-echo "RAGFLOW_CONTAINER=${GITHUB_RUN_ID}-ragflow-cpu-1" >> $GITHUB_ENV
+echo "HTTP_API_TEST_LEVEL=${HTTP_API_TEST_LEVEL}" >> ${GITHUB_ENV}
+echo "RAGFLOW_CONTAINER=${GITHUB_RUN_ID}-ragflow-cpu-1" >> ${GITHUB_ENV}
 - name: Start ragflow:nightly
 run: |
@@ -154,7 +190,7 @@ jobs:
 echo -e "COMPOSE_PROFILES=\${COMPOSE_PROFILES},tei-cpu" >> docker/.env
 echo -e "TEI_MODEL=BAAI/bge-small-en-v1.5" >> docker/.env
 echo -e "RAGFLOW_IMAGE=${RAGFLOW_IMAGE}" >> docker/.env
-echo "HOST_ADDRESS=http://host.docker.internal:${SVR_HTTP_PORT}" >> $GITHUB_ENV
+echo "HOST_ADDRESS=http://host.docker.internal:${SVR_HTTP_PORT}" >> ${GITHUB_ENV}
 sudo docker compose -f docker/docker-compose.yml -p ${GITHUB_RUN_ID} up -d
 uv sync --python 3.10 --only-group test --no-default-groups --frozen && uv pip install sdk/python
@@ -189,7 +225,8 @@ jobs:
 - name: Stop ragflow:nightly
 if: always() # always run this step even if previous steps failed
 run: |
-sudo docker compose -f docker/docker-compose.yml -p ${GITHUB_RUN_ID} down -v
+sudo docker compose -f docker/docker-compose.yml -p ${GITHUB_RUN_ID} down -v || true
+sudo docker ps -a --filter "label=com.docker.compose.project=${GITHUB_RUN_ID}" -q | xargs -r sudo docker rm -f
 - name: Start ragflow:nightly
 run: |
@@ -226,5 +263,9 @@ jobs:
 - name: Stop ragflow:nightly
 if: always() # always run this step even if previous steps failed
 run: |
-sudo docker compose -f docker/docker-compose.yml -p ${GITHUB_RUN_ID} down -v
-sudo docker rmi -f ${RAGFLOW_IMAGE:-NO_IMAGE} || true
+# Sometimes `docker compose down` fail due to hang container, heavy load etc. Need to remove such containers to release resources(for example, listen ports).
+sudo docker compose -f docker/docker-compose.yml -p ${GITHUB_RUN_ID} down -v || true
+sudo docker ps -a --filter "label=com.docker.compose.project=${GITHUB_RUN_ID}" -q | xargs -r sudo docker rm -f
+if [[ -n ${RAGFLOW_IMAGE} ]]; then
+sudo docker rmi -f ${RAGFLOW_IMAGE}
+fi
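
The new (currently disabled via `if: ${{ false }}`) `Check comments of changed Python files` step shells out to a `check_comment_ascii.py` script that is not part of this diff. A hypothetical sketch of what such a checker could look like, assuming it only needs to flag non-ASCII characters in comments:

```python
# Hypothetical stand-in for check_comment_ascii.py: exit non-zero when a
# Python file contains non-ASCII characters in its comments.
import sys
import tokenize

def check(path: str) -> bool:
    ok = True
    with open(path, "rb") as f:
        # tokenize yields COMMENT tokens with their line numbers
        for tok in tokenize.tokenize(f.readline):
            if tok.type == tokenize.COMMENT and not tok.string.isascii():
                print(f"{path}:{tok.start[0]}: non-ASCII comment")
                ok = False
    return ok

if __name__ == "__main__":
    sys.exit(0 if check(sys.argv[1]) else 1)
```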

View File

@@ -51,7 +51,9 @@ RUN --mount=type=cache,id=ragflow_apt,target=/var/cache/apt,sharing=locked \
 apt install -y libpython3-dev libgtk-4-1 libnss3 xdg-utils libgbm-dev && \
 apt install -y libjemalloc-dev && \
 apt install -y python3-pip pipx nginx unzip curl wget git vim less && \
-apt install -y ghostscript
+apt install -y ghostscript && \
+apt install -y pandoc && \
+apt install -y texlive
 RUN if [ "$NEED_MIRROR" == "1" ]; then \
 pip3 config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple && \

View File

@@ -22,7 +22,7 @@
 <img alt="Static Badge" src="https://img.shields.io/badge/Online-Demo-4e6b99">
 </a>
 <a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank">
-<img src="https://img.shields.io/docker/pulls/infiniflow/ragflow?label=Docker%20Pulls&color=0db7ed&logo=docker&logoColor=white&style=flat-square" alt="docker pull infiniflow/ragflow:v0.21.1">
+<img src="https://img.shields.io/docker/pulls/infiniflow/ragflow?label=Docker%20Pulls&color=0db7ed&logo=docker&logoColor=white&style=flat-square" alt="docker pull infiniflow/ragflow:v0.22.1">
 </a>
 <a href="https://github.com/infiniflow/ragflow/releases/latest">
 <img src="https://img.shields.io/github/v/release/infiniflow/ragflow?color=blue&label=Latest%20Release" alt="Latest Release">
@@ -61,8 +61,7 @@
 - 🔎 [System Architecture](#-system-architecture)
 - 🎬 [Get Started](#-get-started)
 - 🔧 [Configurations](#-configurations)
-- 🔧 [Build a docker image without embedding models](#-build-a-docker-image-without-embedding-models)
-- 🔧 [Build a docker image including embedding models](#-build-a-docker-image-including-embedding-models)
+- 🔧 [Build a Docker image](#-build-a-docker-image)
 - 🔨 [Launch service from source for development](#-launch-service-from-source-for-development)
 - 📚 [Documentation](#-documentation)
 - 📜 [Roadmap](#-roadmap)
@@ -86,6 +85,8 @@ Try our demo at [https://demo.ragflow.io](https://demo.ragflow.io).
 ## 🔥 Latest Updates
+- 2025-11-19 Supports Gemini 3 Pro.
+- 2025-11-12 Supports data synchronization from Confluence, AWS S3, Discord, Google Drive.
 - 2025-10-23 Supports MinerU & Docling as document parsing methods.
 - 2025-10-15 Supports orchestrable ingestion pipeline.
 - 2025-08-08 Supports OpenAI's latest GPT-5 series models.
@@ -93,9 +94,6 @@ Try our demo at [https://demo.ragflow.io](https://demo.ragflow.io).
 - 2025-05-23 Adds a Python/JavaScript code executor component to Agent.
 - 2025-05-05 Supports cross-language query.
 - 2025-03-19 Supports using a multi-modal model to make sense of images within PDF or DOCX files.
-- 2025-02-28 Combined with Internet search (Tavily), supports reasoning like Deep Research for any LLMs.
-- 2024-12-18 Upgrades Document Layout Analysis model in DeepDoc.
-- 2024-08-22 Support text to SQL statements through RAG.
 ## 🎉 Stay Tuned
@@ -189,25 +187,31 @@ releases! 🌟
 > All Docker images are built for x86 platforms. We don't currently offer Docker images for ARM64.
 > If you are on an ARM64 platform, follow [this guide](https://ragflow.io/docs/dev/build_docker_image) to build a Docker image compatible with your system.
-> The command below downloads the `v0.21.1-slim` edition of the RAGFlow Docker image. See the following table for descriptions of different RAGFlow editions. To download a RAGFlow edition different from `v0.21.1-slim`, update the `RAGFLOW_IMAGE` variable accordingly in **docker/.env** before using `docker compose` to start the server.
+> The command below downloads the `v0.22.1` edition of the RAGFlow Docker image. See the following table for descriptions of different RAGFlow editions. To download a RAGFlow edition different from `v0.22.1`, update the `RAGFLOW_IMAGE` variable accordingly in **docker/.env** before using `docker compose` to start the server.
 ```bash
 $ cd ragflow/docker
-# Use CPU for embedding and DeepDoc tasks:
+# git checkout v0.22.1
+# Optional: use a stable tag (see releases: https://github.com/infiniflow/ragflow/releases)
+# This steps ensures the **entrypoint.sh** file in the code matches the Docker image version.
+# Use CPU for DeepDoc tasks:
 $ docker compose -f docker-compose.yml up -d
-# To use GPU to accelerate embedding and DeepDoc tasks:
+# To use GPU to accelerate DeepDoc tasks:
 # sed -i '1i DEVICE=gpu' .env
 # docker compose -f docker-compose.yml up -d
 ```
-| RAGFlow image tag | Image size (GB) | Has embedding models? | Stable? |
-| ----------------- | --------------- | --------------------- | -------------------------- |
-| v0.21.1 | &approx;9 | ✔️ | Stable release |
-| v0.21.1-slim | &approx;2 | ❌ | Stable release |
-| nightly | &approx;2 | ❌ | _Unstable_ nightly build |
-> Note: Starting with `v0.22.0`, we ship only the slim edition and no longer append the **-slim** suffix to the image tag.
+> Note: Prior to `v0.22.0`, we provided both images with embedding models and slim images without embedding models. Details as follows:
+| RAGFlow image tag | Image size (GB) | Has embedding models? | Stable? |
+| ----------------- | --------------- | --------------------- | ------------------------ |
+| v0.21.1 | &approx;9 | ✔️ | Stable release |
+| v0.21.1-slim | &approx;2 | ❌ | Stable release |
+> Starting with `v0.22.0`, we ship only the slim edition and no longer append the **-slim** suffix to the image tag.
 4. Check the server status after having the server up and running:
@@ -288,7 +292,7 @@ RAGFlow uses Elasticsearch by default for storing full text and vectors. To swit
 > [!WARNING]
 > Switching to Infinity on a Linux/arm64 machine is not yet officially supported.
-## 🔧 Build a Docker image without embedding models
+## 🔧 Build a Docker image
 This image is approximately 2 GB in size and relies on external LLM and embedding services.

View File

@@ -22,7 +22,7 @@
 <img alt="Lencana Daring" src="https://img.shields.io/badge/Online-Demo-4e6b99">
 </a>
 <a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank">
-<img src="https://img.shields.io/docker/pulls/infiniflow/ragflow?label=Docker%20Pulls&color=0db7ed&logo=docker&logoColor=white&style=flat-square" alt="docker pull infiniflow/ragflow:v0.21.1">
+<img src="https://img.shields.io/docker/pulls/infiniflow/ragflow?label=Docker%20Pulls&color=0db7ed&logo=docker&logoColor=white&style=flat-square" alt="docker pull infiniflow/ragflow:v0.22.1">
 </a>
 <a href="https://github.com/infiniflow/ragflow/releases/latest">
 <img src="https://img.shields.io/github/v/release/infiniflow/ragflow?color=blue&label=Rilis%20Terbaru" alt="Rilis Terbaru">
@@ -61,8 +61,7 @@
 - 🔎 [Arsitektur Sistem](#-arsitektur-sistem)
 - 🎬 [Mulai](#-mulai)
 - 🔧 [Konfigurasi](#-konfigurasi)
-- 🔧 [Membangun Image Docker tanpa Model Embedding](#-membangun-image-docker-tanpa-model-embedding)
-- 🔧 [Membangun Image Docker dengan Model Embedding](#-membangun-image-docker-dengan-model-embedding)
+- 🔧 [Membangun Image Docker](#-membangun-docker-image)
 - 🔨 [Meluncurkan aplikasi dari Sumber untuk Pengembangan](#-meluncurkan-aplikasi-dari-sumber-untuk-pengembangan)
 - 📚 [Dokumentasi](#-dokumentasi)
 - 📜 [Peta Jalan](#-peta-jalan)
@@ -86,6 +85,8 @@ Coba demo kami di [https://demo.ragflow.io](https://demo.ragflow.io).
 ## 🔥 Pembaruan Terbaru
+- 2025-11-19 Mendukung Gemini 3 Pro.
+- 2025-11-12 Mendukung sinkronisasi data dari Confluence, AWS S3, Discord, Google Drive.
 - 2025-10-23 Mendukung MinerU & Docling sebagai metode penguraian dokumen.
 - 2025-10-15 Dukungan untuk jalur data yang terorkestrasi.
 - 2025-08-08 Mendukung model seri GPT-5 terbaru dari OpenAI.
@@ -93,7 +94,6 @@ Coba demo kami di [https://demo.ragflow.io](https://demo.ragflow.io).
 - 2025-05-23 Menambahkan komponen pelaksana kode Python/JS ke Agen.
 - 2025-05-05 Mendukung kueri lintas bahasa.
 - 2025-03-19 Mendukung penggunaan model multi-modal untuk memahami gambar di dalam file PDF atau DOCX.
-- 2025-02-28 dikombinasikan dengan pencarian Internet (TAVILY), mendukung penelitian mendalam untuk LLM apa pun.
 - 2024-12-18 Meningkatkan model Analisis Tata Letak Dokumen di DeepDoc.
 - 2024-08-22 Dukungan untuk teks ke pernyataan SQL melalui RAG.
@@ -187,25 +187,31 @@ Coba demo kami di [https://demo.ragflow.io](https://demo.ragflow.io).
 > Semua gambar Docker dibangun untuk platform x86. Saat ini, kami tidak menawarkan gambar Docker untuk ARM64.
 > Jika Anda menggunakan platform ARM64, [silakan gunakan panduan ini untuk membangun gambar Docker yang kompatibel dengan sistem Anda](https://ragflow.io/docs/dev/build_docker_image).
-> Perintah di bawah ini mengunduh edisi v0.21.1 dari gambar Docker RAGFlow. Silakan merujuk ke tabel berikut untuk deskripsi berbagai edisi RAGFlow. Untuk mengunduh edisi RAGFlow yang berbeda dari v0.21.1, perbarui variabel RAGFLOW_IMAGE di docker/.env sebelum menggunakan docker compose untuk memulai server.
+> Perintah di bawah ini mengunduh edisi v0.22.1 dari gambar Docker RAGFlow. Silakan merujuk ke tabel berikut untuk deskripsi berbagai edisi RAGFlow. Untuk mengunduh edisi RAGFlow yang berbeda dari v0.22.1, perbarui variabel RAGFLOW_IMAGE di docker/.env sebelum menggunakan docker compose untuk memulai server.
 ```bash
 $ cd ragflow/docker
-# Use CPU for embedding and DeepDoc tasks:
+# git checkout v0.22.1
+# Opsional: gunakan tag stabil (lihat releases: https://github.com/infiniflow/ragflow/releases)
+# This steps ensures the **entrypoint.sh** file in the code matches the Docker image version.
+# Use CPU for DeepDoc tasks:
 $ docker compose -f docker-compose.yml up -d
-# To use GPU to accelerate embedding and DeepDoc tasks:
+# To use GPU to accelerate DeepDoc tasks:
 # sed -i '1i DEVICE=gpu' .env
 # docker compose -f docker-compose.yml up -d
 ```
-| RAGFlow image tag | Image size (GB) | Has embedding models? | Stable? |
-| ----------------- | --------------- | --------------------- | -------------------------- |
-| v0.21.1 | &approx;9 | ✔️ | Stable release |
-| v0.21.1-slim | &approx;2 | ❌ | Stable release |
-| nightly | &approx;2 | ❌ | _Unstable_ nightly build |
-> Catatan: Mulai dari `v0.22.0`, kami hanya menyediakan edisi slim dan tidak lagi menambahkan akhiran **-slim** pada tag image.
+> Catatan: Sebelum `v0.22.0`, kami menyediakan image dengan model embedding dan image slim tanpa model embedding. Detailnya sebagai berikut:
+| RAGFlow image tag | Image size (GB) | Has embedding models? | Stable? |
+| ----------------- | --------------- | --------------------- | ------------------------ |
+| v0.21.1 | &approx;9 | ✔️ | Stable release |
+| v0.21.1-slim | &approx;2 | ❌ | Stable release |
+> Mulai dari `v0.22.0`, kami hanya menyediakan edisi slim dan tidak lagi menambahkan akhiran **-slim** pada tag image.
 1. Periksa status server setelah server aktif dan berjalan:
@@ -260,7 +266,7 @@ Pembaruan konfigurasi ini memerlukan reboot semua kontainer agar efektif:
 > $ docker compose -f docker-compose.yml up -d
 > ```
-## 🔧 Membangun Docker Image tanpa Model Embedding
+## 🔧 Membangun Docker Image
 Image ini berukuran sekitar 2 GB dan bergantung pada aplikasi LLM eksternal dan embedding.

View File

@@ -22,7 +22,7 @@
 <img alt="Static Badge" src="https://img.shields.io/badge/Online-Demo-4e6b99">
 </a>
 <a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank">
-<img src="https://img.shields.io/docker/pulls/infiniflow/ragflow?label=Docker%20Pulls&color=0db7ed&logo=docker&logoColor=white&style=flat-square" alt="docker pull infiniflow/ragflow:v0.21.1">
+<img src="https://img.shields.io/docker/pulls/infiniflow/ragflow?label=Docker%20Pulls&color=0db7ed&logo=docker&logoColor=white&style=flat-square" alt="docker pull infiniflow/ragflow:v0.22.1">
 </a>
 <a href="https://github.com/infiniflow/ragflow/releases/latest">
 <img src="https://img.shields.io/github/v/release/infiniflow/ragflow?color=blue&label=Latest%20Release" alt="Latest Release">
@@ -66,6 +66,8 @@
 ## 🔥 最新情報
+- 2025-11-19 Gemini 3 Proをサポートしています
+- 2025-11-12 Confluence、AWS S3、Discord、Google Drive からのデータ同期をサポートします。
 - 2025-10-23 ドキュメント解析方法として MinerU と Docling をサポートします。
 - 2025-10-15 オーケストレーションされたデータパイプラインのサポート。
 - 2025-08-08 OpenAI の最新 GPT-5 シリーズモデルをサポートします。
@@ -73,7 +75,6 @@
 - 2025-05-23 エージェントに Python/JS コードエグゼキュータコンポーネントを追加しました。
 - 2025-05-05 言語間クエリをサポートしました。
 - 2025-03-19 PDFまたはDOCXファイル内の画像を理解するために、多モーダルモデルを使用することをサポートします。
-- 2025-02-28 インターネット検索 (TAVILY) と組み合わせて、あらゆる LLM の詳細な調査をサポートします。
 - 2024-12-18 DeepDoc のドキュメント レイアウト分析モデルをアップグレードします。
 - 2024-08-22 RAG を介して SQL ステートメントへのテキストをサポートします。
@@ -166,28 +167,34 @@
 > 現在、公式に提供されているすべての Docker イメージは x86 アーキテクチャ向けにビルドされており、ARM64 用の Docker イメージは提供されていません。
 > ARM64 アーキテクチャのオペレーティングシステムを使用している場合は、[このドキュメント](https://ragflow.io/docs/dev/build_docker_image)を参照して Docker イメージを自分でビルドしてください。
-> 以下のコマンドは、RAGFlow Docker イメージの v0.21.1 エディションをダウンロードします。異なる RAGFlow エディションの説明については、以下の表を参照してください。v0.21.1 とは異なるエディションをダウンロードするには、docker/.env ファイルの RAGFLOW_IMAGE 変数を適宜更新し、docker compose を使用してサーバーを起動してください。
+> 以下のコマンドは、RAGFlow Docker イメージの v0.22.1 エディションをダウンロードします。異なる RAGFlow エディションの説明については、以下の表を参照してください。v0.22.1 とは異なるエディションをダウンロードするには、docker/.env ファイルの RAGFLOW_IMAGE 変数を適宜更新し、docker compose を使用してサーバーを起動してください。
 ```bash
 $ cd ragflow/docker
-# Use CPU for embedding and DeepDoc tasks:
+# git checkout v0.22.1
+# 任意: 安定版タグを利用 (一覧: https://github.com/infiniflow/ragflow/releases)
+# この手順は、コード内の entrypoint.sh ファイルが Docker イメージのバージョンと一致していることを確認します。
+# Use CPU for DeepDoc tasks:
 $ docker compose -f docker-compose.yml up -d
-# To use GPU to accelerate embedding and DeepDoc tasks:
+# To use GPU to accelerate DeepDoc tasks:
 # sed -i '1i DEVICE=gpu' .env
 # docker compose -f docker-compose.yml up -d
 ```
-| RAGFlow image tag | Image size (GB) | Has embedding models? | Stable? |
-| ----------------- | --------------- | --------------------- | -------------------------- |
-| v0.21.1 | &approx;9 | ✔️ | Stable release |
-| v0.21.1-slim | &approx;2 | ❌ | Stable release |
-| nightly | &approx;2 | ❌ | _Unstable_ nightly build |
-> 注意:`v0.22.0` 以降、当プロジェクトでは slim エディションのみを提供し、イメージタグに **-slim** サフィックスを付けなくなりました。
+> 注意:`v0.22.0` より前のバージョンでは、embedding モデルを含むイメージと、embedding モデルを含まない slim イメージの両方を提供していました。詳細は以下の通りです:
+| RAGFlow image tag | Image size (GB) | Has embedding models? | Stable? |
+| ----------------- | --------------- | --------------------- | ------------------------ |
+| v0.21.1 | &approx;9 | ✔️ | Stable release |
+| v0.21.1-slim | &approx;2 | ❌ | Stable release |
+> `v0.22.0` 以降、当プロジェクトでは slim エディションのみを提供し、イメージタグに **-slim** サフィックスを付けなくなりました。
 1. サーバーを立ち上げた後、サーバーの状態を確認する:
 ```bash
 $ docker logs -f docker-ragflow-cpu-1
 ```
@@ -259,7 +266,7 @@ RAGFlow はデフォルトで Elasticsearch を使用して全文とベクトル
 > Linux/arm64 マシンでの Infinity への切り替えは正式にサポートされていません。
 >
-## 🔧 ソースコードで Docker イメージを作成(埋め込みモデルなし)
+## 🔧 ソースコードで Docker イメージを作成
 この Docker イメージのサイズは約 1GB で、外部の大モデルと埋め込みサービスに依存しています。

View File

@@ -22,7 +22,7 @@
 <img alt="Static Badge" src="https://img.shields.io/badge/Online-Demo-4e6b99">
 </a>
 <a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank">
-<img src="https://img.shields.io/docker/pulls/infiniflow/ragflow?label=Docker%20Pulls&color=0db7ed&logo=docker&logoColor=white&style=flat-square" alt="docker pull infiniflow/ragflow:v0.21.1">
+<img src="https://img.shields.io/docker/pulls/infiniflow/ragflow?label=Docker%20Pulls&color=0db7ed&logo=docker&logoColor=white&style=flat-square" alt="docker pull infiniflow/ragflow:v0.22.1">
 </a>
 <a href="https://github.com/infiniflow/ragflow/releases/latest">
 <img src="https://img.shields.io/github/v/release/infiniflow/ragflow?color=blue&label=Latest%20Release" alt="Latest Release">
@@ -67,6 +67,8 @@
 ## 🔥 업데이트
+- 2025-11-19 Gemini 3 Pro를 지원합니다.
+- 2025-11-12 Confluence, AWS S3, Discord, Google Drive에서 데이터 동기화를 지원합니다.
 - 2025-10-23 문서 파싱 방법으로 MinerU 및 Docling을 지원합니다.
 - 2025-10-15 조정된 데이터 파이프라인 지원.
 - 2025-08-08 OpenAI의 최신 GPT-5 시리즈 모델을 지원합니다.
@@ -74,7 +76,6 @@
 - 2025-05-23 Agent에 Python/JS 코드 실행기 구성 요소를 추가합니다.
 - 2025-05-05 언어 간 쿼리를 지원합니다.
 - 2025-03-19 PDF 또는 DOCX 파일 내의 이미지를 이해하기 위해 다중 모드 모델을 사용하는 것을 지원합니다.
-- 2025-02-28 인터넷 검색(TAVILY)과 결합되어 모든 LLM에 대한 심층 연구를 지원합니다.
 - 2024-12-18 DeepDoc의 문서 레이아웃 분석 모델 업그레이드.
 - 2024-08-22 RAG를 통해 SQL 문에 텍스트를 지원합니다.
@@ -168,25 +169,31 @@
 > 모든 Docker 이미지는 x86 플랫폼을 위해 빌드되었습니다. 우리는 현재 ARM64 플랫폼을 위한 Docker 이미지를 제공하지 않습니다.
 > ARM64 플랫폼을 사용 중이라면, [시스템과 호환되는 Docker 이미지를 빌드하려면 이 가이드를 사용해 주세요](https://ragflow.io/docs/dev/build_docker_image).
-> 아래 명령어는 RAGFlow Docker 이미지의 v0.21.1 버전을 다운로드합니다. 다양한 RAGFlow 버전에 대한 설명은 다음 표를 참조하십시오. v0.21.1과 다른 RAGFlow 버전을 다운로드하려면, docker/.env 파일에서 RAGFLOW_IMAGE 변수를 적절히 업데이트한 후 docker compose를 사용하여 서버를 시작하십시오.
+> 아래 명령어는 RAGFlow Docker 이미지의 v0.22.1 버전을 다운로드합니다. 다양한 RAGFlow 버전에 대한 설명은 다음 표를 참조하십시오. v0.22.1과 다른 RAGFlow 버전을 다운로드하려면, docker/.env 파일에서 RAGFLOW_IMAGE 변수를 적절히 업데이트한 후 docker compose를 사용하여 서버를 시작하십시오.
 ```bash
 $ cd ragflow/docker
-# Use CPU for embedding and DeepDoc tasks:
+# git checkout v0.22.1
+# Optional: use a stable tag (see releases: https://github.com/infiniflow/ragflow/releases)
+# 이 단계는 코드의 entrypoint.sh 파일이 Docker 이미지 버전과 일치하도록 보장합니다.
+# Use CPU for DeepDoc tasks:
 $ docker compose -f docker-compose.yml up -d
-# To use GPU to accelerate embedding and DeepDoc tasks:
+# To use GPU to accelerate DeepDoc tasks:
 # sed -i '1i DEVICE=gpu' .env
 # docker compose -f docker-compose.yml up -d
 ```
-| RAGFlow image tag | Image size (GB) | Has embedding models? | Stable? |
-| ----------------- | --------------- | --------------------- | ------------------------ |
-| v0.21.1 | &approx;9 | ✔️ | Stable release |
-| v0.21.1-slim | &approx;2 | ❌ | Stable release |
-| nightly | &approx;2 | ❌ | _Unstable_ nightly build |
-> 참고: `v0.22.0`부터는 slim 에디션만 배포하며 이미지 태그에 **-slim** 접미사를 더 이상 붙이지 않습니다.
+> 참고: `v0.22.0` 이전 버전에서는 embedding 모델이 포함된 이미지와 embedding 모델이 포함되지 않은 slim 이미지를 모두 제공했습니다. 자세한 내용은 다음과 같습니다:
+| RAGFlow image tag | Image size (GB) | Has embedding models? | Stable? |
+| ----------------- | --------------- | --------------------- | ------------------------ |
+| v0.21.1 | &approx;9 | ✔️ | Stable release |
+| v0.21.1-slim | &approx;2 | ❌ | Stable release |
+> `v0.22.0`부터는 slim 에디션만 배포하며 이미지 태그에 **-slim** 접미사를 더 이상 붙이지 않습니다.
 1. 서버가 시작된 후 서버 상태를 확인하세요:
@@ -253,7 +260,7 @@ RAGFlow 는 기본적으로 Elasticsearch 를 사용하여 전체 텍스트 및
 > [!WARNING]
 > Linux/arm64 시스템에서 Infinity로 전환하는 것은 공식적으로 지원되지 않습니다.
-## 🔧 소스 코드로 Docker 이미지를 컴파일합니다(임베딩 모델 포함하지 않음)
+## 🔧 소스 코드로 Docker 이미지를 컴파일합니다
 이 Docker 이미지의 크기는 약 1GB이며, 외부 대형 모델과 임베딩 서비스에 의존합니다.

View File

@@ -22,7 +22,7 @@
 <img alt="Badge Estático" src="https://img.shields.io/badge/Online-Demo-4e6b99">
 </a>
 <a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank">
-<img src="https://img.shields.io/docker/pulls/infiniflow/ragflow?label=Docker%20Pulls&color=0db7ed&logo=docker&logoColor=white&style=flat-square" alt="docker pull infiniflow/ragflow:v0.21.1">
+<img src="https://img.shields.io/docker/pulls/infiniflow/ragflow?label=Docker%20Pulls&color=0db7ed&logo=docker&logoColor=white&style=flat-square" alt="docker pull infiniflow/ragflow:v0.22.1">
 </a>
 <a href="https://github.com/infiniflow/ragflow/releases/latest">
 <img src="https://img.shields.io/github/v/release/infiniflow/ragflow?color=blue&label=Última%20Relese" alt="Última Versão">
@@ -86,6 +86,8 @@ Experimente nossa demo em [https://demo.ragflow.io](https://demo.ragflow.io).
 ## 🔥 Últimas Atualizações
+- 19-11-2025 Suporta Gemini 3 Pro.
+- 12-11-2025 Suporta a sincronização de dados do Confluence, AWS S3, Discord e Google Drive.
 - 23-10-2025 Suporta MinerU e Docling como métodos de análise de documentos.
 - 15-10-2025 Suporte para pipelines de dados orquestrados.
 - 08-08-2025 Suporta a mais recente série GPT-5 da OpenAI.
@@ -93,7 +95,6 @@ Experimente nossa demo em [https://demo.ragflow.io](https://demo.ragflow.io).
 - 23-05-2025 Adicione o componente executor de código Python/JS ao Agente.
 - 05-05-2025 Suporte a consultas entre idiomas.
 - 19-03-2025 Suporta o uso de um modelo multi-modal para entender imagens dentro de arquivos PDF ou DOCX.
-- 28-02-2025 combinado com a pesquisa na Internet (T AVI LY), suporta pesquisas profundas para qualquer LLM.
 - 18-12-2024 Atualiza o modelo de Análise de Layout de Documentos no DeepDoc.
 - 22-08-2024 Suporta conversão de texto para comandos SQL via RAG.
@@ -186,25 +187,31 @@ Experimente nossa demo em [https://demo.ragflow.io](https://demo.ragflow.io).
 > Todas as imagens Docker são construídas para plataformas x86. Atualmente, não oferecemos imagens Docker para ARM64.
 > Se você estiver usando uma plataforma ARM64, por favor, utilize [este guia](https://ragflow.io/docs/dev/build_docker_image) para construir uma imagem Docker compatível com o seu sistema.
-> O comando abaixo baixa a edição`v0.21.1` da imagem Docker do RAGFlow. Consulte a tabela a seguir para descrições de diferentes edições do RAGFlow. Para baixar uma edição do RAGFlow diferente da `v0.21.1`, atualize a variável `RAGFLOW_IMAGE` conforme necessário no **docker/.env** antes de usar `docker compose` para iniciar o servidor.
+> O comando abaixo baixa a edição`v0.22.1` da imagem Docker do RAGFlow. Consulte a tabela a seguir para descrições de diferentes edições do RAGFlow. Para baixar uma edição do RAGFlow diferente da `v0.22.1`, atualize a variável `RAGFLOW_IMAGE` conforme necessário no **docker/.env** antes de usar `docker compose` para iniciar o servidor.
 ```bash
 $ cd ragflow/docker
-# Use CPU for embedding and DeepDoc tasks:
+# git checkout v0.22.1
+# Opcional: use uma tag estável (veja releases: https://github.com/infiniflow/ragflow/releases)
+# Esta etapa garante que o arquivo entrypoint.sh no código corresponda à versão da imagem do Docker.
+# Use CPU for DeepDoc tasks:
 $ docker compose -f docker-compose.yml up -d
-# To use GPU to accelerate embedding and DeepDoc tasks:
+# To use GPU to accelerate DeepDoc tasks:
 # sed -i '1i DEVICE=gpu' .env
 # docker compose -f docker-compose.yml up -d
 ```
-| Tag da imagem RAGFlow | Tamanho da imagem (GB) | Possui modelos de incorporação? | Estável? |
-| --------------------- | ---------------------- | --------------------------------- | ------------------------------ |
-| v0.21.1 | &approx;9 | ✔️ | Lançamento estável |
-| v0.21.1-slim | &approx;2 | ❌ | Lançamento estável |
-| nightly | &approx;2 | ❌ | Construção noturna instável |
-> Observação: A partir da`v0.22.0`, distribuímos apenas a edição slim e não adicionamos mais o sufixo **-slim** às tags das imagens.
+> Nota: Antes da `v0.22.0`, fornecíamos imagens com modelos de embedding e imagens slim sem modelos de embedding. Detalhes a seguir:
+| RAGFlow image tag | Image size (GB) | Has embedding models? | Stable? |
+| ----------------- | --------------- | --------------------- | ------------------------ |
+| v0.21.1 | &approx;9 | ✔️ | Stable release |
+| v0.21.1-slim | &approx;2 | ❌ | Stable release |
+> A partir da `v0.22.0`, distribuímos apenas a edição slim e não adicionamos mais o sufixo **-slim** às tags das imagens.
 4. Verifique o status do servidor após tê-lo iniciado:
@@ -274,9 +281,9 @@ O RAGFlow usa o Elasticsearch por padrão para armazenar texto completo e vetore
 ```
 > [!ATENÇÃO]
 > A mudança para o Infinity em uma máquina Linux/arm64 ainda não é oficialmente suportada.
-## 🔧 Criar uma imagem Docker sem modelos de incorporação
+## 🔧 Criar uma imagem Docker
 Esta imagem tem cerca de 2 GB de tamanho e depende de serviços externos de LLM e incorporação.

View File

@@ -22,7 +22,7 @@
 <img alt="Static Badge" src="https://img.shields.io/badge/Online-Demo-4e6b99">
 </a>
 <a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank">
-<img src="https://img.shields.io/docker/pulls/infiniflow/ragflow?label=Docker%20Pulls&color=0db7ed&logo=docker&logoColor=white&style=flat-square" alt="docker pull infiniflow/ragflow:v0.21.1">
+<img src="https://img.shields.io/docker/pulls/infiniflow/ragflow?label=Docker%20Pulls&color=0db7ed&logo=docker&logoColor=white&style=flat-square" alt="docker pull infiniflow/ragflow:v0.22.1">
 </a>
 <a href="https://github.com/infiniflow/ragflow/releases/latest">
 <img src="https://img.shields.io/github/v/release/infiniflow/ragflow?color=blue&label=Latest%20Release" alt="Latest Release">
@@ -85,6 +85,8 @@
 ## 🔥 近期更新
+- 2025-11-19 支援 Gemini 3 Pro.
+- 2025-11-12 支援從 Confluence、AWS S3、Discord、Google Drive 進行資料同步。
 - 2025-10-23 支援 MinerU 和 Docling 作為文件解析方法。
 - 2025-10-15 支援可編排的資料管道。
 - 2025-08-08 支援 OpenAI 最新的 GPT-5 系列模型。
@@ -92,7 +94,6 @@
 - 2025-05-23 為 Agent 新增 Python/JS 程式碼執行器元件。
 - 2025-05-05 支援跨語言查詢。
 - 2025-03-19 PDF和DOCX中的圖支持用多模態大模型去解析得到描述.
-- 2025-02-28 結合網路搜尋(Tavily),對於任意大模型實現類似 Deep Research 的推理功能.
 - 2024-12-18 升級了 DeepDoc 的文檔佈局分析模型。
 - 2024-08-22 支援用 RAG 技術實現從自然語言到 SQL 語句的轉換。
@@ -185,25 +186,31 @@
 > 所有 Docker 映像檔都是為 x86 平台建置的。目前,我們不提供 ARM64 平台的 Docker 映像檔。
 > 如果您使用的是 ARM64 平台,請使用 [這份指南](https://ragflow.io/docs/dev/build_docker_image) 來建置適合您系統的 Docker 映像檔。
-> 執行以下指令會自動下載 RAGFlow slim Docker 映像 `v0.21.1`。請參考下表查看不同 Docker 發行版的說明。如需下載不同於 `v0.21.1` 的 Docker 映像,請在執行 `docker compose` 啟動服務之前先更新 **docker/.env** 檔案內的 `RAGFLOW_IMAGE` 變數。
+> 執行以下指令會自動下載 RAGFlow Docker 映像 `v0.22.1`。請參考下表查看不同 Docker 發行版的說明。如需下載不同於 `v0.22.1` 的 Docker 映像,請在執行 `docker compose` 啟動服務之前先更新 **docker/.env** 檔案內的 `RAGFLOW_IMAGE` 變數。
 ```bash
 $ cd ragflow/docker
-# Use CPU for embedding and DeepDoc tasks:
+# git checkout v0.22.1
+# 可選:使用穩定版標籤(查看發佈:https://github.com/infiniflow/ragflow/releases)
+# 此步驟確保程式碼中的 entrypoint.sh 檔案與 Docker 映像版本一致。
+# Use CPU for DeepDoc tasks:
 $ docker compose -f docker-compose.yml up -d
-# To use GPU to accelerate embedding and DeepDoc tasks:
+# To use GPU to accelerate DeepDoc tasks:
 # sed -i '1i DEVICE=gpu' .env
 # docker compose -f docker-compose.yml up -d
 ```
-| RAGFlow image tag | Image size (GB) | Has embedding models? | Stable? |
-| ----------------- | --------------- | --------------------- | -------------------------- |
-| v0.21.1 | &approx;9 | ✔️ | Stable release |
-| v0.21.1-slim | &approx;2 | ❌ | Stable release |
-| nightly | &approx;2 | ❌ | _Unstable_ nightly build |
-> 注意:自 `v0.22.0` 起,我們僅發佈 slim 版本,並且不再在映像標籤後附加 **-slim** 後綴。
+> 注意:在 `v0.22.0` 之前的版本,我們會同時提供包含 embedding 模型的映像和不含 embedding 模型的 slim 映像。具體如下:
+| RAGFlow image tag | Image size (GB) | Has embedding models? | Stable? |
+| ----------------- | --------------- | --------------------- | ------------------------ |
+| v0.21.1 | &approx;9 | ✔️ | Stable release |
+| v0.21.1-slim | &approx;2 | ❌ | Stable release |
+> 從 `v0.22.0` 開始,我們只發佈 slim 版本,並且不再在映像標籤後附加 **-slim** 後綴。
 > [!TIP]
 > 如果你遇到 Docker 映像檔拉不下來的問題,可以在 **docker/.env** 檔案內根據變數 `RAGFLOW_IMAGE` 的註解提示選擇華為雲或阿里雲的對應映像。
@@ -285,7 +292,7 @@ RAGFlow 預設使用 Elasticsearch 儲存文字和向量資料. 如果要切換
 > [!WARNING]
 > Infinity 目前官方並未正式支援在 Linux/arm64 架構下的機器上運行.
-## 🔧 原始碼編譯 Docker 映像(不含 embedding 模型)
+## 🔧 原始碼編譯 Docker 映像
 本 Docker 映像大小約 2 GB 左右並且依賴外部的大模型和 embedding 服務。

View File

@@ -22,7 +22,7 @@
 <img alt="Static Badge" src="https://img.shields.io/badge/Online-Demo-4e6b99">
 </a>
 <a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank">
-<img src="https://img.shields.io/docker/pulls/infiniflow/ragflow?label=Docker%20Pulls&color=0db7ed&logo=docker&logoColor=white&style=flat-square" alt="docker pull infiniflow/ragflow:v0.21.1">
+<img src="https://img.shields.io/docker/pulls/infiniflow/ragflow?label=Docker%20Pulls&color=0db7ed&logo=docker&logoColor=white&style=flat-square" alt="docker pull infiniflow/ragflow:v0.22.1">
 </a>
 <a href="https://github.com/infiniflow/ragflow/releases/latest">
 <img src="https://img.shields.io/github/v/release/infiniflow/ragflow?color=blue&label=Latest%20Release" alt="Latest Release">
@@ -85,6 +85,8 @@
 ## 🔥 近期更新
+- 2025-11-19 支持 Gemini 3 Pro.
+- 2025-11-12 支持从 Confluence、AWS S3、Discord、Google Drive 进行数据同步。
 - 2025-10-23 支持 MinerU 和 Docling 作为文档解析方法。
 - 2025-10-15 支持可编排的数据管道。
 - 2025-08-08 支持 OpenAI 最新的 GPT-5 系列模型。
@@ -92,7 +94,6 @@
 - 2025-05-23 Agent 新增 Python/JS 代码执行器组件。
 - 2025-05-05 支持跨语言查询。
 - 2025-03-19 PDF 和 DOCX 中的图支持用多模态大模型去解析得到描述.
-- 2025-02-28 结合互联网搜索(Tavily),对于任意大模型实现类似 Deep Research 的推理功能.
 - 2024-12-18 升级了 DeepDoc 的文档布局分析模型。
 - 2024-08-22 支持用 RAG 技术实现从自然语言到 SQL 语句的转换。
@@ -186,25 +187,31 @@
 > 请注意,目前官方提供的所有 Docker 镜像均基于 x86 架构构建,并不提供基于 ARM64 的 Docker 镜像。
 > 如果你的操作系统是 ARM64 架构,请参考[这篇文档](https://ragflow.io/docs/dev/build_docker_image)自行构建 Docker 镜像。
-> 运行以下命令会自动下载 RAGFlow slim Docker 镜像 `v0.21.1`。请参考下表查看不同 Docker 发行版的描述。如需下载不同于 `v0.21.1` 的 Docker 镜像,请在运行 `docker compose` 启动服务之前先更新 **docker/.env** 文件内的 `RAGFLOW_IMAGE` 变量。
+> 运行以下命令会自动下载 RAGFlow Docker 镜像 `v0.22.1`。请参考下表查看不同 Docker 发行版的描述。如需下载不同于 `v0.22.1` 的 Docker 镜像,请在运行 `docker compose` 启动服务之前先更新 **docker/.env** 文件内的 `RAGFLOW_IMAGE` 变量。
 ```bash
 $ cd ragflow/docker
-# Use CPU for embedding and DeepDoc tasks:
+# git checkout v0.22.1
+# 可选:使用稳定版本标签(查看发布:https://github.com/infiniflow/ragflow/releases)
+# 这一步确保代码中的 entrypoint.sh 文件与 Docker 镜像的版本保持一致。
+# Use CPU for DeepDoc tasks:
 $ docker compose -f docker-compose.yml up -d
-# To use GPU to accelerate embedding and DeepDoc tasks:
+# To use GPU to accelerate DeepDoc tasks:
 # sed -i '1i DEVICE=gpu' .env
 # docker compose -f docker-compose.yml up -d
 ```
+> 注意:在 `v0.22.0` 之前的版本,我们会同时提供包含 embedding 模型的镜像和不含 embedding 模型的 slim 镜像。具体如下:
 | RAGFlow image tag | Image size (GB) | Has embedding models? | Stable? |
 | ----------------- | --------------- | --------------------- | ------------------------ |
 | v0.21.1 | &approx;9 | ✔️ | Stable release |
 | v0.21.1-slim | &approx;2 | ❌ | Stable release |
-| nightly | &approx;2 | ❌ | _Unstable_ nightly build |
-> 注意:从 `v0.22.0` 开始,我们只发布 slim 版本,并且不再在镜像标签后附加 **-slim** 后缀。
+> 从 `v0.22.0` 开始,我们只发布 slim 版本,并且不再在镜像标签后附加 **-slim** 后缀。
 > [!TIP]
 > 如果你遇到 Docker 镜像拉不下来的问题,可以在 **docker/.env** 文件内根据变量 `RAGFLOW_IMAGE` 的注释提示选择华为云或者阿里云的相应镜像。
@@ -284,7 +291,7 @@ RAGFlow 默认使用 Elasticsearch 存储文本和向量数据. 如果要切换
 > [!WARNING]
 > Infinity 目前官方并未正式支持在 Linux/arm64 架构下的机器上运行.
-## 🔧 源码编译 Docker 镜像(不含 embedding 模型)
+## 🔧 源码编译 Docker 镜像
 本 Docker 镜像大小约 2 GB 左右并且依赖外部的大模型和 embedding 服务。

View File

@@ -4,7 +4,7 @@
 Admin Service is a dedicated management component designed to monitor, maintain, and administrate the RAGFlow system. It provides comprehensive tools for ensuring system stability, performing operational tasks, and managing users and permissions efficiently.
-The service offers real-time monitoring of critical components, including the RAGFlow server, Task Executor processes, and dependent services such as MySQL, Elasticsearch, Redis, and MinIO. It automatically checks their health status, resource usage, and uptime, and performs restarts in case of failures to minimize downtime.
+The service offers real-time monitoring of critical components, including the RAGFlow server, Task Executor processes, and dependent services such as MySQL, Infinity, Elasticsearch, Redis, and MinIO. It automatically checks their health status, resource usage, and uptime, and performs restarts in case of failures to minimize downtime.
 For user and system management, it supports listing, creating, modifying, and deleting users and their associated resources like knowledge bases and Agents.
@@ -48,7 +48,7 @@ It consists of a server-side Service and a command-line client (CLI), both imple
 1. Ensure the Admin Service is running.
 2. Install ragflow-cli.
 ```bash
-pip install ragflow-cli==0.21.1
+pip install ragflow-cli==0.22.1
 ```
 3. Launch the CLI client:
 ```bash

View File

@@ -23,6 +23,7 @@ from Cryptodome.Cipher import PKCS1_v1_5 as Cipher_pkcs1_v1_5
from typing import Dict, List, Any
from lark import Lark, Transformer, Tree
import requests
+import getpass

GRAMMAR = r"""
start: command
@@ -51,6 +52,7 @@ sql_command: list_services
| revoke_permission
| alter_user_role
| show_user_permission
+| show_version
// meta command definition
meta_command: "\\" meta_command_name [meta_args]
@@ -92,6 +94,7 @@ FOR: "FOR"i
RESOURCES: "RESOURCES"i
ON: "ON"i
SET: "SET"i
+VERSION: "VERSION"i

list_services: LIST SERVICES ";"
show_service: SHOW SERVICE NUMBER ";"
@@ -120,6 +123,8 @@ revoke_permission: REVOKE action_list ON identifier FROM ROLE identifier ";"
alter_user_role: ALTER USER quoted_string SET ROLE identifier ";"
show_user_permission: SHOW USER PERMISSION quoted_string ";"
+show_version: SHOW VERSION ";"
action_list: identifier ("," identifier)*
identifier: WORD
@@ -246,6 +251,9 @@ class AdminTransformer(Transformer):
user_name = items[3]
return {"type": "show_user_permission", "user_name": user_name}

+def show_version(self, items):
+return {"type": "show_version"}

def action_list(self, items):
return items
@@ -359,7 +367,7 @@ class AdminCLI(Cmd):
if single_command:
admin_passwd = arguments['password']
else:
-admin_passwd = input(f"password for {self.admin_account}: ").strip()
+admin_passwd = getpass.getpass(f"password for {self.admin_account}: ").strip()
try:
self.admin_password = encrypt(admin_passwd)
response = self.session.post(url, json={'email': self.admin_account, 'password': self.admin_password})
@@ -370,7 +378,7 @@ class AdminCLI(Cmd):
self.session.headers.update({
'Content-Type': 'application/json',
'Authorization': response.headers['Authorization'],
-'User-Agent': 'RAGFlow-CLI/0.21.1'
+'User-Agent': 'RAGFlow-CLI/0.22.1'
})
print("Authentication successful.")
return True
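The switch from `input()` to `getpass.getpass()` keeps the admin password from being echoed to the terminal or captured in scrollback; a minimal illustration:

```python
import getpass

# input() echoes every keystroke; getpass.getpass() reads from the tty with echo off.
password = getpass.getpass("password for admin@example.com: ").strip()  # hypothetical account
```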
@@ -384,6 +392,23 @@ class AdminCLI(Cmd):
print(str(e))
print(f"Can't access {self.host}, port: {self.port}")
def _format_service_detail_table(self, data):
if isinstance(data, list):
return data
if not all([isinstance(v, list) for v in data.values()]):
# normal table
return data
# handle task_executor heartbeats map, for example {'name': [{'done': 2, 'now': timestamp1}, {'done': 3, 'now': timestamp2}]}
task_executor_list = []
for k, v in data.items():
# display latest status
heartbeats = sorted(v, key=lambda x: x["now"], reverse=True)
task_executor_list.append({
"task_executor_name": k,
**heartbeats[0],
} if heartbeats else {"task_executor_name": k})
return task_executor_list
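Assuming the heartbeat payload shape described in the comment above, the helper flattens the per-executor map into one row per executor, keeping only the most recent heartbeat; this is also why `_print_table_simple` below switches to the union of all row keys:

```python
# Hypothetical payload: executor name -> list of heartbeat dicts.
detail = {
    "executor_a": [{"done": 2, "now": 1732000000}, {"done": 3, "now": 1732000030}],
    "executor_b": [],  # no heartbeats yet -> a row with only the name column
}
rows = cli._format_service_detail_table(detail)  # cli: an AdminCLI instance
# rows == [
#     {"task_executor_name": "executor_a", "done": 3, "now": 1732000030},
#     {"task_executor_name": "executor_b"},
# ]
# Rows can now have different key sets, so the printer must take the union of keys.
```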
def _print_table_simple(self, data):
if not data:
print("No data to print")
@@ -392,7 +417,8 @@ class AdminCLI(Cmd):
# handle single row data
data = [data]
-columns = list(data[0].keys())
+columns = list(set().union(*(d.keys() for d in data)))
+columns.sort()

col_widths = {}

def get_string_width(text):
@@ -555,6 +581,8 @@ class AdminCLI(Cmd):
self._alter_user_role(command_dict)
case 'show_user_permission':
self._show_user_permission(command_dict)
+case 'show_version':
+self._show_version(command_dict)
case 'meta':
self._handle_meta_command(command_dict)
case _:
@@ -585,7 +613,8 @@ class AdminCLI(Cmd):
if isinstance(res_data['message'], str):
print(res_data['message'])
else:
-self._print_table_simple(res_data['message'])
+data = self._format_service_detail_table(res_data['message'])
+self._print_table_simple(data)
else:
print(f"Service {res_data['service_name']} is down, {res_data['message']}")
else:
@@ -622,7 +651,9 @@ class AdminCLI(Cmd):
response = self.session.get(url)
res_json = response.json()
if response.status_code == 200:
-self._print_table_simple(res_json['data'])
+table_data = res_json['data']
+table_data.pop('avatar')
+self._print_table_simple(table_data)
else:
print(f"Fail to get user {user_name}, code: {res_json['code']}, message: {res_json['message']}")
@@ -695,7 +726,10 @@ class AdminCLI(Cmd):
response = self.session.get(url)
res_json = response.json()
if response.status_code == 200:
-self._print_table_simple(res_json['data'])
+table_data = res_json['data']
+for t in table_data:
+t.pop('avatar')
+self._print_table_simple(table_data)
else:
print(f"Fail to get all datasets of {user_name}, code: {res_json['code']}, message: {res_json['message']}")
@@ -707,7 +741,10 @@ class AdminCLI(Cmd):
response = self.session.get(url)
res_json = response.json()
if response.status_code == 200:
-self._print_table_simple(res_json['data'])
+table_data = res_json['data']
+for t in table_data:
+t.pop('avatar')
+self._print_table_simple(table_data)
else:
print(f"Fail to get all agents of {user_name}, code: {res_json['code']}, message: {res_json['message']}")
@@ -861,6 +898,16 @@ class AdminCLI(Cmd):
print(
f"Fail to show user: {user_name_str} permission, code: {res_json['code']}, message: {res_json['message']}")
def _show_version(self, command):
print("show_version")
url = f'http://{self.host}:{self.port}/api/v1/admin/version'
response = self.session.get(url)
res_json = response.json()
if response.status_code == 200:
self._print_table_simple(res_json['data'])
else:
print(f"Fail to show version, code: {res_json['code']}, message: {res_json['message']}")
def _handle_meta_command(self, command):
meta_command = command['command']
args = command.get('args', [])

View File

@@ -1,6 +1,6 @@
[project]
name = "ragflow-cli"
-version = "0.21.1"
+version = "0.22.1"
description = "Admin Service's client of [RAGFlow](https://github.com/infiniflow/ragflow). The Admin Service provides user management and system monitoring. "
authors = [{ name = "Lynn", email = "lynn_inf@hotmail.com" }]
license = { text = "Apache License, Version 2.0" }

View File

@@ -20,17 +20,19 @@ import logging
import time
import threading
import traceback
-from werkzeug.serving import run_simple
from flask import Flask
+from flask_login import LoginManager
+from werkzeug.serving import run_simple
from routes import admin_bp
from common.log_utils import init_root_logger
from common.constants import SERVICE_CONF
from common.config_utils import show_configs
-from api import settings
+from common import settings
from config import load_configurations, SERVICE_CONFIGS
from auth import init_default_admin, setup_auth
from flask_session import Session
-from flask_login import LoginManager
+from common.versions import get_ragflow_version

stop_event = threading.Event()
@@ -52,6 +54,7 @@ if __name__ == '__main__':
os.environ.get("MAX_CONTENT_LENGTH", 1024 * 1024 * 1024)
)
Session(app)
+logging.info(f'RAGFlow version: {get_ragflow_version()}')
show_configs()
login_manager = LoginManager()
login_manager.init_app(app)
@@ -67,7 +70,7 @@
port=9381,
application=app,
threaded=True,
-use_reloader=True,
+use_reloader=False,
use_debugger=True,
)
except Exception:
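Flipping `use_reloader` to `False` matters because Werkzeug's reloader re-executes the module in a child process and restarts it on file changes, which would duplicate the background threads this service starts. A minimal sketch of the same call, assuming a trivial app:

```python
from flask import Flask
from werkzeug.serving import run_simple

app = Flask(__name__)

if __name__ == "__main__":
    # With use_reloader=True, werkzeug spawns a watcher plus a worker process and
    # re-imports the module on changes; anything started at import time runs twice.
    run_simple(hostname="0.0.0.0", port=9381, application=app,
               threaded=True, use_reloader=False, use_debugger=True)
```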

View File

@@ -19,19 +19,20 @@ import logging
import uuid
from functools import wraps
from datetime import datetime
-from flask import request, jsonify
+from flask import jsonify, request
from flask_login import current_user, login_user
from itsdangerous.url_safe import URLSafeTimedSerializer as Serializer
-from api import settings
from api.common.exceptions import AdminException, UserNotFoundError
-from api.db.init_data import encode_to_base64
+from api.common.base64 import encode_to_base64
from api.db.services import UserService
from common.constants import ActiveEnum, StatusEnum
from api.utils.crypt import decrypt
from common.misc_utils import get_uuid
from common.time_utils import current_timestamp, datetime_format, get_format_time
-from common.connection_utils import construct_response
+from common.connection_utils import sync_construct_response
+from common import settings

def setup_auth(login_manager):
@@ -129,7 +130,7 @@ def login_admin(email: str, password: str):
user.last_login_time = get_format_time()
user.save()
msg = "Welcome back!"
-return construct_response(data=resp, auth=user.get_id(), message=msg)
+return sync_construct_response(data=resp, auth=user.get_id(), message=msg)
def check_admin(username: str, password: str):
@@ -169,7 +170,7 @@ def login_verify(f):
username = auth.parameters['username']
password = auth.parameters['password']
try:
-if check_admin(username, password) is False:
+if not check_admin(username, password):
return jsonify({
"code": 500,
"message": "Access denied",

View File

@@ -25,8 +25,21 @@ from common.config_utils import read_config
from urllib.parse import urlparse
class BaseConfig(BaseModel):
id: int
name: str
host: str
port: int
service_type: str
detail_func_name: str
def to_dict(self) -> dict[str, Any]:
return {'id': self.id, 'name': self.name, 'host': self.host, 'port': self.port,
'service_type': self.service_type}
class ServiceConfigs:
-configs = dict
+configs = list[BaseConfig]

def __init__(self):
self.configs = []
@@ -45,19 +58,6 @@ class ServiceType(Enum):
FILE_STORE = "file_store"

-class BaseConfig(BaseModel):
-id: int
-name: str
-host: str
-port: int
-service_type: str
-detail_func_name: str
-
-def to_dict(self) -> dict[str, Any]:
-return {'id': self.id, 'name': self.name, 'host': self.host, 'port': self.port,
-'service_type': self.service_type}
class MetaConfig(BaseConfig):
meta_type: str

@@ -183,11 +183,13 @@ class RAGFlowServerConfig(BaseConfig):

class TaskExecutorConfig(BaseConfig):
+message_queue_type: str

def to_dict(self) -> dict[str, Any]:
result = super().to_dict()
if 'extra' not in result:
result['extra'] = dict()
+result['extra']['message_queue_type'] = self.message_queue_type
return result
@@ -225,7 +227,7 @@ def load_configurations(config_path: str) -> list[BaseConfig]:
ragflow_count = 0
id_count = 0
for k, v in raw_configs.items():
-match (k):
+match k:
case "ragflow":
name: str = f'ragflow_{ragflow_count}'
host: str = v['host']
@@ -299,6 +301,15 @@ def load_configurations(config_path: str) -> list[BaseConfig]:
id_count += 1
case "admin":
pass
case "task_executor":
name: str = 'task_executor'
host: str = v.get('host', '')
port: int = v.get('port', 0)
message_queue_type: str = v.get('message_queue_type')
config = TaskExecutorConfig(id=id_count, name=name, host=host, port=port, message_queue_type=message_queue_type,
service_type="task_executor", detail_func_name="check_task_executor_alive")
configurations.append(config)
id_count += 1
case _:
logging.warning(f"Unknown configuration key: {k}")
continue
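Under the assumption that the service configuration file gains a matching `task_executor` section, the new branch would consume an entry like the following (sketched here as the parsed dict that `load_configurations` iterates over):

```python
# Hypothetical parsed configuration; the key names follow the branch above.
raw_configs = {
    "task_executor": {
        # host and port default to '' and 0 when absent, per v.get(...) above
        "message_queue_type": "redis",
    },
}
# The branch then builds:
# TaskExecutorConfig(id=id_count, name='task_executor', host='', port=0,
#                    message_queue_type='redis', service_type='task_executor',
#                    detail_func_name='check_task_executor_alive')
```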

View File

@ -13,8 +13,6 @@
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
# #
from flask import jsonify from flask import jsonify

View File

@@ -17,13 +17,14 @@
import secrets

from flask import Blueprint, request
-from flask_login import current_user, logout_user, login_required
+from flask_login import current_user, login_required, logout_user

from auth import login_verify, login_admin, check_admin_auth
from responses import success_response, error_response
from services import UserMgr, ServiceMgr, UserServiceMgr
from roles import RoleMgr
from api.common.exceptions import AdminException
+from common.versions import get_ragflow_version

admin_bp = Blueprint('admin', __name__, url_prefix='/api/v1/admin')
@@ -369,3 +370,13 @@ def get_user_permission(user_name: str):
return success_response(res)
except Exception as e:
return error_response(str(e), 500)
@admin_bp.route('/version', methods=['GET'])
@login_required
@check_admin_auth
def show_version():
try:
res = {"version": get_ragflow_version()}
return success_response(res)
except Exception as e:
return error_response(str(e), 500)
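The route follows the same pattern as the other admin endpoints: `@login_required` and `@check_admin_auth` guard access, and the body wraps its result in the shared success/error envelope. The same skeleton, applied to a hypothetical read-only endpoint:

```python
# Hypothetical endpoint reusing the guard-and-envelope pattern above.
@admin_bp.route('/uptime', methods=['GET'])
@login_required
@check_admin_auth
def show_uptime():
    try:
        res = {"uptime_seconds": get_uptime_seconds()}  # hypothetical helper
        return success_response(res)
    except Exception as e:
        return error_response(str(e), 500)
```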

View File

@ -13,8 +13,7 @@
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
# #
import logging
import re import re
from werkzeug.security import check_password_hash from werkzeug.security import check_password_hash
from common.constants import ActiveEnum from common.constants import ActiveEnum
@@ -52,6 +51,7 @@ class UserMgr:
result = []
for user in users:
result.append({
+'avatar': user.avatar,
'email': user.email,
'language': user.language,
'last_login_time': user.last_login_time,
@@ -170,7 +170,8 @@ class UserServiceMgr:
return [{
'title': r['title'],
'permission': r['permission'],
-'canvas_category': r['canvas_category'].split('_')[0]
+'canvas_category': r['canvas_category'].split('_')[0],
+'avatar': r['avatar']
} for r in res]
@@ -188,8 +189,13 @@ class ServiceMgr:
config_dict['status'] = service_detail['status']
else:
config_dict['status'] = 'timeout'
-except Exception:
+except Exception as e:
+logging.warning(f"Can't get service details, error: {e}")
config_dict['status'] = 'timeout'
+if not config_dict['host']:
+config_dict['host'] = '-'
+if not config_dict['port']:
+config_dict['port'] = '-'
result.append(config_dict)

return result
@@ -199,17 +205,13 @@ class ServiceMgr:
@staticmethod
def get_service_details(service_id: int):
-service_id = int(service_id)
+service_idx = int(service_id)
configs = SERVICE_CONFIGS.configs
-service_config_mapping = {
-c.id: {
-'name': c.name,
-'detail_func_name': c.detail_func_name
-} for c in configs
-}
-service_info = service_config_mapping.get(service_id, {})
-if not service_info:
-raise AdminException(f"invalid service_id: {service_id}")
+if service_idx < 0 or service_idx >= len(configs):
+raise AdminException(f"invalid service_index: {service_idx}")
+service_config = configs[service_idx]
+service_info = {'name': service_config.name, 'detail_func_name': service_config.detail_func_name}

detail_func = getattr(health_utils, service_info.get('detail_func_name'))
res = detail_func()

View File

@@ -25,8 +25,9 @@ from typing import Any, Union, Tuple
from agent.component import component_class
from agent.component.base import ComponentBase
-from api.db.services.file_service import FileService
+from api.db.services.task_service import has_canceled
from common.misc_utils import get_uuid, hash_str2int
+from common.exceptions import TaskCanceledException
from rag.prompts.generator import chunks_format
from rag.utils.redis_conn import REDIS_CONN
@@ -126,6 +127,7 @@ class Graph:
self.components[k]["obj"].reset()
try:
REDIS_CONN.delete(f"{self.task_id}-logs")
+REDIS_CONN.delete(f"{self.task_id}-cancel")
except Exception as e:
logging.exception(e)
@@ -153,6 +155,33 @@ class Graph:
def get_tenant_id(self):
return self._tenant_id
def get_value_with_variable(self,value: str) -> Any:
pat = re.compile(r"\{* *\{([a-zA-Z:0-9]+@[A-Za-z0-9_.]+|sys\.[A-Za-z0-9_.]+|env\.[A-Za-z0-9_.]+)\} *\}*")
out_parts = []
last = 0
for m in pat.finditer(value):
out_parts.append(value[last:m.start()])
key = m.group(1)
v = self.get_variable_value(key)
if v is None:
rep = ""
elif isinstance(v, partial):
buf = []
for chunk in v():
buf.append(chunk)
rep = "".join(buf)
elif isinstance(v, str):
rep = v
else:
rep = json.dumps(v, ensure_ascii=False)
out_parts.append(rep)
last = m.end()
out_parts.append(value[last:])
return("".join(out_parts))
def get_variable_value(self, exp: str) -> Any:
exp = exp.strip("{").strip("}").strip(" ").strip("{").strip("}")
if exp.find("@") < 0:
@@ -169,7 +198,7 @@ class Graph:
if not rest:
return root_val
return self.get_variable_param_value(root_val,rest)

def get_variable_param_value(self, obj: Any, path: str) -> Any:
cur = obj
if not path:
@@ -187,6 +216,49 @@ class Graph:
else:
cur = getattr(cur, key, None)
return cur
def set_variable_value(self, exp: str,value):
exp = exp.strip("{").strip("}").strip(" ").strip("{").strip("}")
if exp.find("@") < 0:
self.globals[exp] = value
return
cpn_id, var_nm = exp.split("@")
cpn = self.get_component(cpn_id)
if not cpn:
raise Exception(f"Can't find variable: '{cpn_id}@{var_nm}'")
parts = var_nm.split(".", 1)
root_key = parts[0]
rest = parts[1] if len(parts) > 1 else ""
if not rest:
cpn["obj"].set_output(root_key, value)
return
root_val = cpn["obj"].output(root_key)
if not root_val:
root_val = {}
cpn["obj"].set_output(root_key, self.set_variable_param_value(root_val,rest,value))
def set_variable_param_value(self, obj: Any, path: str, value) -> Any:
cur = obj
keys = path.split('.')
if not path:
return value
for key in keys[:-1]:
if key not in cur or not isinstance(cur[key], dict):
cur[key] = {}
cur = cur[key]
cur[keys[-1]] = value
return obj
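`set_variable_value` is the write-side mirror of `get_variable_value`: a reference like `{cpn_id@outputs.a.b}` updates a nested key inside that component's output, creating intermediate dicts along the way. A standalone sketch of the intended nested-write semantics:

```python
# Standalone illustration of set_variable_param_value's intended behavior.
def set_nested(obj: dict, path: str, value):
    cur = obj
    keys = path.split(".")
    for key in keys[:-1]:
        # Create (or overwrite) intermediates so the full path always exists.
        if key not in cur or not isinstance(cur[key], dict):
            cur[key] = {}
        cur = cur[key]
    cur[keys[-1]] = value
    return obj

print(set_nested({}, "a.b", 1))  # -> {'a': {'b': 1}}
```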
def is_canceled(self) -> bool:
return has_canceled(self.task_id)
def cancel_task(self) -> bool:
try:
REDIS_CONN.set(f"{self.task_id}-cancel", "x")
except Exception as e:
logging.exception(e)
return False
return True
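Cancellation is a cooperative flag in Redis: `cancel_task` writes a `<task_id>-cancel` key, `is_canceled` (via `has_canceled`) reads it, and `reset` clears it together with the log key. A minimal sketch of the flag protocol, assuming a redis-py style client rather than the project's `REDIS_CONN` wrapper:

```python
import redis  # assumption: a redis-py style client

r = redis.Redis()

def cancel(task_id: str) -> None:
    # Writer side: mark the task as canceled (a TTL would bound stale flags).
    r.set(f"{task_id}-cancel", "x")

def has_canceled(task_id: str) -> bool:
    # Reader side: components poll this between steps and exit early when set.
    return r.get(f"{task_id}-cancel") is not None
```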
class Canvas(Graph):

@@ -212,7 +284,7 @@ class Canvas(Graph):
"sys.conversation_turns": 0,
"sys.files": []
}

self.retrieval = self.dsl["retrieval"]
self.memory = self.dsl.get("memory", [])
@@ -229,20 +301,21 @@
self.retrieval = []
self.memory = []
for k in self.globals.keys():
-if isinstance(self.globals[k], str):
-self.globals[k] = ""
-elif isinstance(self.globals[k], int):
-self.globals[k] = 0
-elif isinstance(self.globals[k], float):
-self.globals[k] = 0
-elif isinstance(self.globals[k], list):
-self.globals[k] = []
-elif isinstance(self.globals[k], dict):
-self.globals[k] = {}
-else:
-self.globals[k] = None
+if k.startswith("sys.") or k.startswith("env."):
+    if isinstance(self.globals[k], str):
+        self.globals[k] = ""
+    elif isinstance(self.globals[k], int):
+        self.globals[k] = 0
+    elif isinstance(self.globals[k], float):
+        self.globals[k] = 0
+    elif isinstance(self.globals[k], list):
+        self.globals[k] = []
+    elif isinstance(self.globals[k], dict):
+        self.globals[k] = {}
+    else:
+        self.globals[k] = None
-def run(self, **kwargs):
+async def run(self, **kwargs):
st = time.perf_counter()
self.message_id = get_uuid()
created_at = int(time.time())
@@ -250,6 +323,12 @@ class Canvas(Graph):
for k, cpn in self.components.items():
self.components[k]["obj"].reset(True)
if kwargs.get("webhook_payload"):
for k, cpn in self.components.items():
if self.components[k]["obj"].component_name.lower() == "webhook":
for kk, vv in kwargs["webhook_payload"].items():
self.components[k]["obj"].set_output(kk, vv)
for k in kwargs.keys():
if k in ["query", "user_id", "files"] and kwargs[k]:
if k == "files":
@@ -275,10 +354,20 @@ class Canvas(Graph):
self.path.append("begin")
self.retrieval.append({"chunks": [], "doc_aggs": []})
if self.is_canceled():
msg = f"Task {self.task_id} has been canceled before starting."
logging.info(msg)
raise TaskCanceledException(msg)
yield decorate("workflow_started", {"inputs": kwargs.get("inputs")}) yield decorate("workflow_started", {"inputs": kwargs.get("inputs")})
self.retrieval.append({"chunks": {}, "doc_aggs": {}}) self.retrieval.append({"chunks": {}, "doc_aggs": {}})
def _run_batch(f, t): def _run_batch(f, t):
if self.is_canceled():
msg = f"Task {self.task_id} has been canceled during batch execution."
logging.info(msg)
raise TaskCanceledException(msg)
with ThreadPoolExecutor(max_workers=5) as executor:
thr = []
i = f
@@ -289,7 +378,7 @@ class Canvas(Graph):
i += 1
else:
for _, ele in cpn.get_input_elements().items():
-if isinstance(ele, dict) and ele.get("_cpn_id") and ele.get("_cpn_id") not in self.path[:i]:
+if isinstance(ele, dict) and ele.get("_cpn_id") and ele.get("_cpn_id") not in self.path[:i] and self.path[0].lower().find("userfillup") < 0:
self.path.pop(i)
t -= 1
break
@@ -348,6 +437,10 @@ class Canvas(Graph):
else:
yield decorate("message", {"content": cpn_obj.output("content")})
cite = re.search(r"\[ID:[ 0-9]+\]", cpn_obj.output("content"))
if isinstance(cpn_obj.output("attachment"), tuple):
yield decorate("message", {"attachment": cpn_obj.output("attachment")})
yield decorate("message_end", {"reference": self.get_reference() if cite else None}) yield decorate("message_end", {"reference": self.get_reference() if cite else None})
while partials: while partials:
@@ -420,9 +513,10 @@ class Canvas(Graph):
for c in path:
o = self.get_component_obj(c)
if o.component_name.lower() == "userfillup":
+o.invoke()
another_inputs.update(o.get_input_elements())
if o.get_param("enable_tips"):
-tips = o.get_param("tips")
+tips = o.output("tips")
self.path = path
yield decorate("user_inputs", {"inputs": another_inputs, "tips": tips})
return
@@ -436,6 +530,14 @@ class Canvas(Graph):
"created_at": st,
})
self.history.append(("assistant", self.get_component_obj(self.path[-1]).output()))
elif "Task has been canceled" in self.error:
yield decorate("workflow_finished",
{
"inputs": kwargs.get("inputs"),
"outputs": "Task has been canceled",
"elapsed_time": time.perf_counter() - st,
"created_at": st,
})
def is_reff(self, exp: str) -> bool:
exp = exp.strip("{").strip("}")
@@ -478,6 +580,7 @@ class Canvas(Graph):
return self.components[cpnnm]["obj"].get_input_elements()

def get_files(self, files: Union[None, list[dict]]) -> list[str]:
+from api.db.services.file_service import FileService
if not files:
return []

def image_to_base64(file):

View File

@ -13,7 +13,6 @@
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
# #
import os import os
import importlib import importlib
import inspect import inspect
@@ -50,9 +49,10 @@ del _package_path, _import_submodules, _extract_classes_from_module

def component_class(class_name):
-for mdl in ["agent.component", "agent.tools", "rag.flow"]:
+for module_name in ["agent.component", "agent.tools", "rag.flow"]:
try:
-return getattr(importlib.import_module(mdl), class_name)
+return getattr(importlib.import_module(module_name), class_name)
except Exception:
+# logging.warning(f"Can't import module: {module_name}, error: {e}")
pass
assert False, f"Can't import {class_name}"

View File

@@ -30,7 +30,7 @@ from api.db.services.mcp_server_service import MCPServerService
from common.connection_utils import timeout
from rag.prompts.generator import next_step, COMPLETE_TASK, analyze_task, \
citation_prompt, reflect, rank_memories, kb_prompt, citation_plus, full_question, message_fit_in
-from rag.utils.mcp_tool_call_conn import MCPToolCallSession, mcp_tool_metadata_to_openai_tool
+from common.mcp_tool_call_conn import MCPToolCallSession, mcp_tool_metadata_to_openai_tool

from agent.component.llm import LLMParam, LLM
@@ -139,6 +139,9 @@ class Agent(LLM, ToolBase):
@timeout(int(os.environ.get("COMPONENT_EXEC_TIMEOUT", 20*60)))
def _invoke(self, **kwargs):
if self.check_if_canceled("Agent processing"):
return
if kwargs.get("user_prompt"): if kwargs.get("user_prompt"):
usr_pmt = "" usr_pmt = ""
if kwargs.get("reasoning"): if kwargs.get("reasoning"):
@@ -152,18 +155,15 @@
self._param.prompts = [{"role": "user", "content": usr_pmt}]

if not self.tools:
-if self.check_if_canceled("Agent processing"):
-return
return LLM._invoke(self, **kwargs)

prompt, msg, user_defined_prompt = self._prepare_prompt_variables()
downstreams = self._canvas.get_component(self._id)["downstream"] if self._canvas.get_component(self._id) else []
ex = self.exception_handler()
-output_structure=None
-try:
-output_structure=self._param.outputs['structured']
-except Exception:
-pass
-if any([self._canvas.get_component_obj(cid).component_name.lower()=="message" for cid in downstreams]) and not output_structure and not (ex and ex["goto"]):
+if any([self._canvas.get_component_obj(cid).component_name.lower()=="message" for cid in downstreams]) and not (ex and ex["goto"]):
self.set_output("content", partial(self.stream_output_with_tools, prompt, msg, user_defined_prompt))
return
@@ -171,6 +171,8 @@
use_tools = []
ans = ""
for delta_ans, tk in self._react_with_tools_streamly(prompt, msg, use_tools, user_defined_prompt):
if self.check_if_canceled("Agent processing"):
return
ans += delta_ans

if ans.find("**ERROR**") >= 0:
@@ -191,12 +193,16 @@
answer_without_toolcall = ""
use_tools = []
for delta_ans,_ in self._react_with_tools_streamly(prompt, msg, use_tools, user_defined_prompt):
if self.check_if_canceled("Agent streaming"):
return
if delta_ans.find("**ERROR**") >= 0:
if self.get_exception_default_value():
self.set_output("content", self.get_exception_default_value())
yield self.get_exception_default_value()
else:
self.set_output("_ERROR", delta_ans)
return
answer_without_toolcall += delta_ans
yield delta_ans
@@ -271,6 +277,8 @@
st = timer()
txt = ""
for delta_ans in self._gen_citations(entire_txt):
for delta_ans in self._gen_citations(entire_txt): for delta_ans in self._gen_citations(entire_txt):
if self.check_if_canceled("Agent streaming"):
return
yield delta_ans, 0
txt += delta_ans
@@ -286,6 +294,8 @@
task_desc = analyze_task(self.chat_mdl, prompt, user_request, tool_metas, user_defined_prompt)
self.callback("analyze_task", {}, task_desc, elapsed_time=timer()-st)
for _ in range(self._param.max_rounds + 1):
if self.check_if_canceled("Agent streaming"):
return
response, tk = next_step(self.chat_mdl, hist, tool_metas, task_desc, user_defined_prompt)
# self.callback("next_step", {}, str(response)[:256]+"...")
token_count += tk
@@ -333,6 +343,8 @@ Instructions:
6. Focus on delivering VALUE with the information already gathered

Respond immediately with your final comprehensive answer.
"""
if self.check_if_canceled("Agent final instruction"):
return
append_user_content(hist, final_instruction)

for txt, tkcnt in complete():
@@ -351,11 +363,19 @@ Respond immediately with your final comprehensive answer.
return "Error occurred."

-def reset(self, temp=False):
+def reset(self, only_output=False):
"""
Reset all tools if they have a reset method. This avoids errors for tools like MCPToolCallSession.
"""
for k in self._param.outputs.keys():
self._param.outputs[k]["value"] = None
for k, cpn in self.tools.items():
if hasattr(cpn, "reset") and callable(cpn.reset):
cpn.reset()
if only_output:
return
for k in self._param.inputs.keys():
self._param.inputs[k]["value"] = None
self._param.debug_inputs = {}

View File

@@ -393,7 +393,7 @@ class ComponentParamBase(ABC):

class ComponentBase(ABC):
component_name: str
thread_limiter = trio.CapacityLimiter(int(os.environ.get('MAX_CONCURRENT_CHATS', 10)))
-variable_ref_patt = r"\{* *\{([a-zA-Z:0-9]+@[A-Za-z:0-9_.-]+|sys\.[a-z_]+)\} *\}*"
+variable_ref_patt = r"\{* *\{([a-zA-Z:0-9]+@[A-Za-z0-9_.]+|sys\.[A-Za-z0-9_.]+|env\.[A-Za-z0-9_.]+)\} *\}*"
def __str__(self):
"""
@@ -417,6 +417,20 @@ class ComponentBase(ABC):
self._param = param
self._param.check()
def is_canceled(self) -> bool:
return self._canvas.is_canceled()
def check_if_canceled(self, message: str = "") -> bool:
if self.is_canceled():
task_id = getattr(self._canvas, 'task_id', 'unknown')
log_message = f"Task {task_id} has been canceled"
if message:
log_message += f" during {message}"
logging.info(log_message)
self.set_output("_ERROR", "Task has been canceled")
return True
return False
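With this helper in place, every component's `_invoke` can bail out between expensive steps; the idiom, repeated throughout the diffs below, looks like this (hypothetical component for illustration):

```python
# Hypothetical component showing the cooperative-cancel idiom used below.
class Sleeper(ComponentBase):
    component_name = "Sleeper"

    def _invoke(self, **kwargs):
        for step in range(10):
            # Check between units of work; check_if_canceled also records
            # "_ERROR": "Task has been canceled" on the component's outputs.
            if self.check_if_canceled("Sleeper processing"):
                return
            do_one_step(step)  # hypothetical unit of work
```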
def invoke(self, **kwargs) -> dict[str, Any]:
self.set_output("_created_time", time.perf_counter())
try:
@@ -449,12 +463,15 @@
return self._param.outputs.get("_ERROR", {}).get("value")

def reset(self, only_output=False):
-for k in self._param.outputs.keys():
-self._param.outputs[k]["value"] = None
+outputs: dict = self._param.outputs  # for better performance
+for k in outputs.keys():
+outputs[k]["value"] = None
if only_output:
return
-for k in self._param.inputs.keys():
-self._param.inputs[k]["value"] = None
+inputs: dict = self._param.inputs  # for better performance
+for k in inputs.keys():
+inputs[k]["value"] = None
self._param.debug_inputs = {}
def get_input(self, key: str=None) -> Union[Any, dict[str, Any]]:
@@ -514,6 +531,7 @@
def get_param(self, name):
if hasattr(self._param, name):
return getattr(self._param, name)
+return None
def debug(self, **kwargs):
return self._invoke(**kwargs)
@@ -521,7 +539,7 @@
def get_parent(self) -> Union[object, None]:
pid = self._canvas.get_component(self._id).get("parent_id")
if not pid:
-return
+return None
return self._canvas.get_component(pid)["obj"]

def get_upstream(self) -> List[str]:
@@ -546,7 +564,7 @@
def exception_handler(self):
if not self._param.exception_method:
-return
+return None
return {
"goto": self._param.exception_goto,
"default_value": self._param.exception_default_value

View File

@ -37,7 +37,13 @@ class Begin(UserFillUp):
component_name = "Begin" component_name = "Begin"
def _invoke(self, **kwargs): def _invoke(self, **kwargs):
if self.check_if_canceled("Begin processing"):
return
for k, v in kwargs.get("inputs", {}).items():
if self.check_if_canceled("Begin processing"):
return
if isinstance(v, dict) and v.get("type", "").lower().find("file") >=0:
if v.get("optional") and v.get("value", None) is None:
v = None

View File

@@ -98,6 +98,9 @@ class Categorize(LLM, ABC):
@timeout(int(os.environ.get("COMPONENT_EXEC_TIMEOUT", 10*60)))
def _invoke(self, **kwargs):
if self.check_if_canceled("Categorize processing"):
return
msg = self._canvas.get_history(self._param.message_history_window_size)
if not msg:
msg = [{"role": "user", "content": ""}]
@@ -114,10 +117,18 @@
---- Real Data ----
{}
""".format(" | ".join(["{}: \"{}\"".format(c["role"].upper(), re.sub(r"\n", "", c["content"], flags=re.DOTALL)) for c in msg]))
if self.check_if_canceled("Categorize processing"):
return
ans = chat_mdl.chat(self._param.sys_prompt, [{"role": "user", "content": user_prompt}], self._param.gen_conf())
logging.info(f"input: {user_prompt}, answer: {str(ans)}")
if ERROR_PREFIX in ans:
raise Exception(ans)
if self.check_if_canceled("Categorize processing"):
return
# Count the number of times each category appears in the answer.
category_counts = {}
for c in self._param.category_description.keys():

View File

@ -1,3 +1,18 @@
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
from abc import ABC
import ast
import os
@@ -10,7 +25,7 @@ class DataOperationsParam(ComponentParamBase):
"""
def __init__(self):
super().__init__()
-self.inputs = []
+self.query = []
self.operations = "literal_eval"
self.select_keys = []
self.filter_values=[]
@@ -35,18 +50,19 @@ class DataOperations(ComponentBase,ABC):
def get_input_form(self) -> dict[str, dict]:
return {
k: {"name": o.get("name", ""), "type": "line"}
-for input_item in (self._param.inputs or [])
+for input_item in (self._param.query or [])
for k, o in self.get_input_elements_from_text(input_item).items()
}
@timeout(int(os.environ.get("COMPONENT_EXEC_TIMEOUT", 10*60)))
def _invoke(self, **kwargs):
self.input_objects=[]
-inputs = getattr(self._param, "inputs", None)
+inputs = getattr(self._param, "query", None)
if not isinstance(inputs, (list, tuple)):
inputs = [inputs]
-for input_ref in self._param.inputs:
+for input_ref in inputs:
input_object=self._canvas.get_variable_value(input_ref)
+self.set_input_value(input_ref, input_object)
if input_object is None:
continue
if isinstance(input_object,dict):
@@ -57,7 +73,7 @@
continue
if self._param.operations == "select_keys":
self._select_keys()
-elif self._param.operations == "literal_eval":
+elif self._param.operations == "recursive_eval":
self._literal_eval()
elif self._param.operations == "combine":
self._combine()
@@ -100,7 +116,7 @@
def _combine(self):
result={}
-for obj in self.input_objects():
+for obj in self.input_objects:
for key, value in obj.items():
if key not in result:
result[key] = value
@@ -123,6 +139,7 @@
key = rule.get("key")
op = (rule.get("operator") or "equals").lower()
target = self.norm(rule.get("value"))
+target = self._canvas.get_value_with_variable(target) or target
if key not in obj:
return False
val = obj.get(key, None)
@@ -142,7 +159,7 @@
def _filter_values(self):
results=[]
rules = (getattr(self._param, "filter_values", None) or [])
-for obj in self.input_objects():
+for obj in self.input_objects:
if not rules:
results.append(obj)
continue
@@ -154,7 +171,7 @@
def _append_or_update(self):
results=[]
updates = getattr(self._param, "updates", []) or []
-for obj in self.input_objects():
+for obj in self.input_objects:
new_obj = dict(obj)
for item in updates:
if not isinstance(item, dict):
@@ -162,7 +179,7 @@
k = (item.get("key") or "").strip()
if not k:
continue
-new_obj[k] = item.get("value")
+new_obj[k] = self._canvas.get_value_with_variable(item.get("value")) or item.get("value")
results.append(new_obj)
self.set_output("result", results)

View File

@ -13,7 +13,11 @@
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
# #
from agent.component.base import ComponentBase, ComponentParamBase import json
import re
from functools import partial
from agent.component.base import ComponentParamBase, ComponentBase
class UserFillUpParam(ComponentParamBase):

@@ -31,10 +35,35 @@ class UserFillUp(ComponentBase):
component_name = "UserFillUp"

def _invoke(self, **kwargs):
if self.check_if_canceled("UserFillUp processing"):
return
if self._param.enable_tips:
content = self._param.tips
for k, v in self.get_input_elements_from_text(self._param.tips).items():
v = v["value"]
ans = ""
if isinstance(v, partial):
for t in v():
ans += t
elif isinstance(v, list):
ans = ",".join([str(vv) for vv in v])
elif not isinstance(v, str):
try:
ans = json.dumps(v, ensure_ascii=False)
except Exception:
pass
else:
ans = v
if not ans:
ans = ""
content = re.sub(r"\{%s\}"%k, ans, content)
self.set_output("tips", content)
for k, v in kwargs.get("inputs", {}).items():
if self.check_if_canceled("UserFillUp processing"):
return
self.set_output(k, v)

def thoughts(self) -> str:
return "Waiting for your input..."

View File

@@ -56,6 +56,9 @@ class Invoke(ComponentBase, ABC):
@timeout(int(os.environ.get("COMPONENT_EXEC_TIMEOUT", 3)))
def _invoke(self, **kwargs):
if self.check_if_canceled("Invoke processing"):
return
args = {}
for para in self._param.variables:
if para.get("value"):
@@ -89,6 +92,9 @@
last_e = ""
for _ in range(self._param.max_retries + 1):
if self.check_if_canceled("Invoke processing"):
return
try:
if method == "get":
response = requests.get(url=url, params=args, headers=headers, proxies=proxies, timeout=self._param.timeout)
@@ -121,6 +127,9 @@
return self.output("result")
except Exception as e:
if self.check_if_canceled("Invoke processing"):
return
last_e = e
logging.exception(f"Http request error: {e}")
time.sleep(self._param.delay_after_error)

View File

@@ -32,6 +32,7 @@ class IterationParam(ComponentParamBase):
def __init__(self):
super().__init__()
self.items_ref = ""
self.veriable={}
def get_input_form(self) -> dict[str, dict]:
return {
@@ -56,6 +57,9 @@ class Iteration(ComponentBase, ABC):
return cid

def _invoke(self, **kwargs):
if self.check_if_canceled("Iteration processing"):
return
arr = self._canvas.get_variable_value(self._param.items_ref)
if not isinstance(arr, list):
self.set_output("_ERROR", self._param.items_ref + " must be an array, but its type is "+str(type(arr)))

View File

@@ -33,6 +33,9 @@ class IterationItem(ComponentBase, ABC):
self._idx = 0

def _invoke(self, **kwargs):
if self.check_if_canceled("IterationItem processing"):
return
parent = self.get_parent()
arr = self._canvas.get_variable_value(parent._param.items_ref)
if not isinstance(arr, list):
@@ -40,12 +43,17 @@
raise Exception(parent._param.items_ref + " must be an array, but its type is "+str(type(arr)))

if self._idx > 0:
if self.check_if_canceled("IterationItem processing"):
return
self.output_collation()

if self._idx >= len(arr):
self._idx = -1
return
if self.check_if_canceled("IterationItem processing"):
return
self.set_output("item", arr[self._idx]) self.set_output("item", arr[self._idx])
self.set_output("index", self._idx) self.set_output("index", self._idx)
@@ -80,4 +88,4 @@
return self._idx == -1

def thoughts(self) -> str:
return "Next turn..."

View File

@@ -0,0 +1,168 @@
from abc import ABC
import os
from agent.component.base import ComponentBase, ComponentParamBase
from api.utils.api_utils import timeout
class ListOperationsParam(ComponentParamBase):
"""
Define the List Operations component parameters.
"""
def __init__(self):
super().__init__()
self.query = ""
self.operations = "topN"
self.n=0
self.sort_method = "asc"
self.filter = {
"operator": "=",
"value": ""
}
self.outputs = {
"result": {
"value": [],
"type": "Array of ?"
},
"first": {
"value": "",
"type": "?"
},
"last": {
"value": "",
"type": "?"
}
}
def check(self):
self.check_empty(self.query, "query")
self.check_valid_value(self.operations, "Support operations", ["topN","head","tail","filter","sort","drop_duplicates"])
def get_input_form(self) -> dict[str, dict]:
return {}
class ListOperations(ComponentBase,ABC):
component_name = "ListOperations"
@timeout(int(os.environ.get("COMPONENT_EXEC_TIMEOUT", 10*60)))
def _invoke(self, **kwargs):
self.input_objects=[]
inputs = getattr(self._param, "query", None)
self.inputs = self._canvas.get_variable_value(inputs)
if not isinstance(self.inputs, list):
raise TypeError("The input of List Operations should be an array.")
self.set_input_value(inputs, self.inputs)
if self._param.operations == "topN":
self._topN()
elif self._param.operations == "head":
self._head()
elif self._param.operations == "tail":
self._tail()
elif self._param.operations == "filter":
self._filter()
elif self._param.operations == "sort":
self._sort()
elif self._param.operations == "drop_duplicates":
self._drop_duplicates()
def _coerce_n(self):
try:
return int(getattr(self._param, "n", 0))
except Exception:
return 0
def _set_outputs(self, outputs):
self._param.outputs["result"]["value"] = outputs
self._param.outputs["first"]["value"] = outputs[0] if outputs else None
self._param.outputs["last"]["value"] = outputs[-1] if outputs else None
def _topN(self):
n = self._coerce_n()
if n < 1:
outputs = []
else:
n = min(n, len(self.inputs))
outputs = self.inputs[:n]
self._set_outputs(outputs)
def _head(self):
n = self._coerce_n()
if 1 <= n <= len(self.inputs):
outputs = [self.inputs[n - 1]]
else:
outputs = []
self._set_outputs(outputs)
def _tail(self):
n = self._coerce_n()
if 1 <= n <= len(self.inputs):
outputs = [self.inputs[-n]]
else:
outputs = []
self._set_outputs(outputs)
def _filter(self):
self._set_outputs([i for i in self.inputs if self._eval(self._norm(i),self._param.filter["operator"],self._param.filter["value"])])
def _norm(self,v):
s = "" if v is None else str(v)
return s
def _eval(self, v, operator, value):
if operator == "=":
return v == value
elif operator == "":
return v != value
elif operator == "contains":
return value in v
elif operator == "start with":
return v.startswith(value)
elif operator == "end with":
return v.endswith(value)
else:
return False
def _sort(self):
items = self.inputs or []
method = getattr(self._param, "sort_method", "asc") or "asc"
reverse = method == "desc"
if not items:
self._set_outputs([])
return
first = items[0]
if isinstance(first, dict):
outputs = sorted(
items,
key=lambda x: self._hashable(x),
reverse=reverse,
)
else:
outputs = sorted(items, reverse=reverse)
self._set_outputs(outputs)
def _drop_duplicates(self):
seen = set()
outs = []
for item in self.inputs:
k = self._hashable(item)
if k in seen:
continue
seen.add(k)
outs.append(item)
self._set_outputs(outs)
def _hashable(self,x):
if isinstance(x, dict):
return tuple(sorted((k, self._hashable(v)) for k, v in x.items()))
if isinstance(x, (list, tuple)):
return tuple(self._hashable(v) for v in x)
if isinstance(x, set):
return tuple(sorted(self._hashable(v) for v in x))
return x
def thoughts(self) -> str:
return "ListOperation in progress"

View File

@@ -207,6 +207,9 @@ class LLM(ComponentBase):
@timeout(int(os.environ.get("COMPONENT_EXEC_TIMEOUT", 10*60)))
def _invoke(self, **kwargs):
if self.check_if_canceled("LLM processing"):
return
def clean_formated_answer(ans: str) -> str:
ans = re.sub(r"^.*</think>", "", ans, flags=re.DOTALL)
ans = re.sub(r"^.*```json", "", ans, flags=re.DOTALL)
@@ -216,13 +219,16 @@
error: str = ""
output_structure=None
try:
-output_structure = None#self._param.outputs['structured']
+output_structure = self._param.outputs['structured']
except Exception:
pass
-if output_structure:
+if output_structure and isinstance(output_structure, dict) and output_structure.get("properties"):
schema=json.dumps(output_structure, ensure_ascii=False, indent=2)
prompt += structured_output_prompt(schema)
for _ in range(self._param.max_retries+1):
if self.check_if_canceled("LLM processing"):
return
_, msg = message_fit_in([{"role": "system", "content": prompt}, *msg], int(self.chat_mdl.max_length * 0.97))
error = ""
ans = self._generate(msg)
@@ -243,11 +249,14 @@
downstreams = self._canvas.get_component(self._id)["downstream"] if self._canvas.get_component(self._id) else []
ex = self.exception_handler()
-if any([self._canvas.get_component_obj(cid).component_name.lower()=="message" for cid in downstreams]) and not output_structure and not (ex and ex["goto"]):
+if any([self._canvas.get_component_obj(cid).component_name.lower()=="message" for cid in downstreams]) and not (ex and ex["goto"]):
self.set_output("content", partial(self._stream_output, prompt, msg))
return

for _ in range(self._param.max_retries+1):
if self.check_if_canceled("LLM processing"):
return
_, msg = message_fit_in([{"role": "system", "content": prompt}, *msg], int(self.chat_mdl.max_length * 0.97)) _, msg = message_fit_in([{"role": "system", "content": prompt}, *msg], int(self.chat_mdl.max_length * 0.97))
error = "" error = ""
ans = self._generate(msg) ans = self._generate(msg)
@ -269,6 +278,9 @@ class LLM(ComponentBase):
_, msg = message_fit_in([{"role": "system", "content": prompt}, *msg], int(self.chat_mdl.max_length * 0.97)) _, msg = message_fit_in([{"role": "system", "content": prompt}, *msg], int(self.chat_mdl.max_length * 0.97))
answer = "" answer = ""
for ans in self._generate_streamly(msg): for ans in self._generate_streamly(msg):
if self.check_if_canceled("LLM streaming"):
return
if ans.find("**ERROR**") >= 0: if ans.find("**ERROR**") >= 0:
if self.get_exception_default_value(): if self.get_exception_default_value():
self.set_output("content", self.get_exception_default_value()) self.set_output("content", self.get_exception_default_value())
@ -287,4 +299,4 @@ class LLM(ComponentBase):
def thoughts(self) -> str: def thoughts(self) -> str:
_, msg,_ = self._prepare_prompt_variables() _, msg,_ = self._prepare_prompt_variables()
return "⌛Give me a moment—starting from: \n\n" + re.sub(r"(User's query:|[\\]+)", '', msg[-1]['content'], flags=re.DOTALL) + "\n\nIll figure out our best next move." return "⌛Give me a moment—starting from: \n\n" + re.sub(r"(User's query:|[\\]+)", '', msg[-1]['content'], flags=re.DOTALL) + "\n\nIll figure out our best next move."

View File

@@ -17,6 +17,8 @@ import json
 import os
 import random
 import re
+import logging
+import tempfile
 from functools import partial
 from typing import Any
@@ -24,6 +26,8 @@ from agent.component.base import ComponentBase, ComponentParamBase
 from jinja2 import Template as Jinja2Template
 from common.connection_utils import timeout
+from common.misc_utils import get_uuid
+from common import settings
 class MessageParam(ComponentParamBase):
@@ -34,6 +38,7 @@
         super().__init__()
         self.content = []
         self.stream = True
+        self.output_format = None  # default output format
         self.outputs = {
             "content": {
                 "type": "str"
@@ -89,6 +94,9 @@ class Message(ComponentBase):
         all_content = ""
         cache = {}
         for r in re.finditer(self.variable_ref_patt, rand_cnt, flags=re.DOTALL):
+            if self.check_if_canceled("Message streaming"):
+                return
             all_content += rand_cnt[s: r.start()]
             yield rand_cnt[s: r.start()]
             s = r.end()
@@ -99,30 +107,38 @@
                 continue
             v = self._canvas.get_variable_value(exp)
-            if not v:
+            if v is None:
                 v = ""
             if isinstance(v, partial):
                 cnt = ""
                 for t in v():
+                    if self.check_if_canceled("Message streaming"):
+                        return
                     all_content += t
                     cnt += t
                     yield t
+                self.set_input_value(exp, cnt)
                 continue
             elif not isinstance(v, str):
                 try:
-                    v = json.dumps(v, ensure_ascii=False, indent=2)
+                    v = json.dumps(v, ensure_ascii=False)
                 except Exception:
                     v = str(v)
             yield v
+            self.set_input_value(exp, v)
             all_content += v
             cache[exp] = v
         if s < len(rand_cnt):
+            if self.check_if_canceled("Message streaming"):
+                return
             all_content += rand_cnt[s: ]
             yield rand_cnt[s: ]
         self.set_output("content", all_content)
+        self._convert_content(all_content)
     def _is_jinjia2(self, content:str) -> bool:
         patt = [
@@ -132,6 +148,9 @@ class Message(ComponentBase):
     @timeout(int(os.environ.get("COMPONENT_EXEC_TIMEOUT", 10*60)))
     def _invoke(self, **kwargs):
+        if self.check_if_canceled("Message processing"):
+            return
         rand_cnt = random.choice(self._param.content)
         if self._param.stream and not self._is_jinjia2(rand_cnt):
             self.set_output("content", partial(self._stream, rand_cnt))
@@ -144,10 +163,79 @@
         except Exception:
             pass
+        if self.check_if_canceled("Message processing"):
+            return
         for n, v in kwargs.items():
             content = re.sub(n, v, content)
         self.set_output("content", content)
+        self._convert_content(content)
     def thoughts(self) -> str:
         return ""
+    def _convert_content(self, content):
+        if not self._param.output_format:
+            return
+        import pypandoc
+        doc_id = get_uuid()
+        if self._param.output_format.lower() not in {"markdown", "html", "pdf", "docx"}:
+            self._param.output_format = "markdown"
+        try:
+            if self._param.output_format in {"markdown", "html"}:
+                if isinstance(content, str):
+                    converted = pypandoc.convert_text(
+                        content,
+                        to=self._param.output_format,
+                        format="markdown",
+                    )
+                else:
+                    converted = pypandoc.convert_file(
+                        content,
+                        to=self._param.output_format,
+                        format="markdown",
+                    )
+                binary_content = converted.encode("utf-8")
+            else:  # pdf, docx
+                with tempfile.NamedTemporaryFile(suffix=f".{self._param.output_format}", delete=False) as tmp:
+                    tmp_name = tmp.name
+                try:
+                    if isinstance(content, str):
+                        pypandoc.convert_text(
+                            content,
+                            to=self._param.output_format,
+                            format="markdown",
+                            outputfile=tmp_name,
+                        )
+                    else:
+                        pypandoc.convert_file(
+                            content,
+                            to=self._param.output_format,
+                            format="markdown",
+                            outputfile=tmp_name,
+                        )
+                    with open(tmp_name, "rb") as f:
+                        binary_content = f.read()
+                finally:
+                    if os.path.exists(tmp_name):
+                        os.remove(tmp_name)
+            settings.STORAGE_IMPL.put(self._canvas._tenant_id, doc_id, binary_content)
+            self.set_output("attachment", {
+                "doc_id": doc_id,
+                "format": self._param.output_format,
+                "file_name": f"{doc_id[:8]}.{self._param.output_format}"})
+            logging.info(f"Converted content uploaded as {doc_id} (format={self._param.output_format})")
+        except Exception as e:
+            logging.error(f"Error converting content to {self._param.output_format}: {e}")

View File

@@ -63,17 +63,24 @@ class StringTransform(Message, ABC):
     @timeout(int(os.environ.get("COMPONENT_EXEC_TIMEOUT", 10*60)))
     def _invoke(self, **kwargs):
+        if self.check_if_canceled("StringTransform processing"):
+            return
         if self._param.method == "split":
             self._split(kwargs.get("line"))
         else:
             self._merge(kwargs)
     def _split(self, line:str|None = None):
+        if self.check_if_canceled("StringTransform split processing"):
+            return
         var = self._canvas.get_variable_value(self._param.split_ref) if not line else line
         if not var:
             var = ""
         assert isinstance(var, str), "The input variable is not a string: {}".format(type(var))
         self.set_input_value(self._param.split_ref, var)
         res = []
         for i,s in enumerate(re.split(r"(%s)"%("|".join([re.escape(d) for d in self._param.delimiters])), var, flags=re.DOTALL)):
             if i % 2 == 1:
@@ -82,6 +89,9 @@
         self.set_output("result", res)
     def _merge(self, kwargs:dict[str, str] = {}):
+        if self.check_if_canceled("StringTransform merge processing"):
+            return
         script = self._param.script
         script, kwargs = self.get_kwargs(script, kwargs, self._param.delimiters[0])

View File

@@ -63,9 +63,18 @@ class Switch(ComponentBase, ABC):
     @timeout(int(os.environ.get("COMPONENT_EXEC_TIMEOUT", 3)))
     def _invoke(self, **kwargs):
+        if self.check_if_canceled("Switch processing"):
+            return
         for cond in self._param.conditions:
+            if self.check_if_canceled("Switch processing"):
+                return
             res = []
             for item in cond["items"]:
+                if self.check_if_canceled("Switch processing"):
+                    return
                 if not item["cpn_id"]:
                     continue
                 cpn_v = self._canvas.get_variable_value(item["cpn_id"])
@@ -128,4 +137,4 @@
         raise ValueError('Not supported operator' + operator)
     def thoughts(self) -> str:
         return "I'm weighing a few options and will pick the next step shortly."

View File

@ -0,0 +1,84 @@
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import Any
import os
from common.connection_utils import timeout
from agent.component.base import ComponentBase, ComponentParamBase
class VariableAggregatorParam(ComponentParamBase):
    """
    Parameters for VariableAggregator
    - groups: list of dicts {"group_name": str, "variables": [variable selectors]}
    """

    def __init__(self):
        super().__init__()
        # each group expects: {"group_name": str, "variables": List[str]}
        self.groups = []

    def check(self):
        self.check_empty(self.groups, "[VariableAggregator] groups")
        for g in self.groups:
            if not g.get("group_name"):
                raise ValueError("[VariableAggregator] group_name can not be empty!")
            if not g.get("variables"):
                raise ValueError(
                    f"[VariableAggregator] variables of group `{g.get('group_name')}` can not be empty"
                )
            if not isinstance(g.get("variables"), list):
                raise ValueError(
                    f"[VariableAggregator] variables of group `{g.get('group_name')}` should be a list of strings"
                )

    def get_input_form(self) -> dict[str, dict]:
        return {
            "variables": {
                "name": "Variables",
                "type": "list",
            }
        }


class VariableAggregator(ComponentBase):
    component_name = "VariableAggregator"

    @timeout(int(os.environ.get("COMPONENT_EXEC_TIMEOUT", 3)))
    def _invoke(self, **kwargs):
        # Group mode: for each group, pick the first available variable
        for group in self._param.groups:
            gname = group.get("group_name")
            # record candidate selectors within this group
            self.set_input_value(f"{gname}.variables", list(group.get("variables", [])))
            for selector in group.get("variables", []):
                val = self._canvas.get_variable_value(selector['value'])
                if val:
                    self.set_output(gname, val)
                    break

    @staticmethod
    def _to_object(value: Any) -> Any:
        # Try to convert value to serializable object if it has to_object()
        try:
            return value.to_object()  # type: ignore[attr-defined]
        except Exception:
            return value

    def thoughts(self) -> str:
        return "Aggregating variables from canvas and grouping as configured."

View File

@ -0,0 +1,192 @@
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
from abc import ABC
import os
import numbers
from agent.component.base import ComponentBase, ComponentParamBase
from api.utils.api_utils import timeout
class VariableAssignerParam(ComponentParamBase):
    """
    Define the Variable Assigner component parameters.
    """

    def __init__(self):
        super().__init__()
        self.variables = []

    def check(self):
        return True

    def get_input_form(self) -> dict[str, dict]:
        return {
            "items": {
                "type": "json",
                "name": "Items"
            }
        }


class VariableAssigner(ComponentBase, ABC):
    component_name = "VariableAssigner"

    @timeout(int(os.environ.get("COMPONENT_EXEC_TIMEOUT", 10*60)))
    def _invoke(self, **kwargs):
        if not isinstance(self._param.variables, list):
            return
        for item in self._param.variables:
            if any([not item.get("variable"), not item.get("operator"), not item.get("parameter")]):
                # the original `assert "Variable is not complete."` is a no-op; raise instead
                raise AssertionError("Variable is not complete.")
            variable = item["variable"]
            operator = item["operator"]
            parameter = item["parameter"]
            variable_value = self._canvas.get_variable_value(variable)
            new_variable = self._operate(variable_value, operator, parameter)
            self._canvas.set_variable_value(variable, new_variable)
    def _operate(self, variable, operator, parameter):
        if operator == "overwrite":
            return self._overwrite(parameter)
        elif operator == "clear":
            return self._clear(variable)
        elif operator == "set":
            return self._set(variable, parameter)
        elif operator == "append":
            return self._append(variable, parameter)
        elif operator == "extend":
            return self._extend(variable, parameter)
        elif operator == "remove_first":
            return self._remove_first(variable)
        elif operator == "remove_last":
            return self._remove_last(variable)
        elif operator == "+=":
            return self._add(variable, parameter)
        elif operator == "-=":
            return self._subtract(variable, parameter)
        elif operator == "*=":
            return self._multiply(variable, parameter)
        elif operator == "/=":
            return self._divide(variable, parameter)
        else:
            return

    def _overwrite(self, parameter):
        return self._canvas.get_variable_value(parameter)

    def _clear(self, variable):
        if isinstance(variable, list):
            return []
        elif isinstance(variable, str):
            return ""
        elif isinstance(variable, dict):
            return {}
        elif isinstance(variable, int):
            return 0
        elif isinstance(variable, float):
            return 0.0
        elif isinstance(variable, bool):
            return False
        else:
            return None

    def _set(self, variable, parameter):
        if variable is None:
            return self._canvas.get_value_with_variable(parameter)
        elif isinstance(variable, str):
            return self._canvas.get_value_with_variable(parameter)
        elif isinstance(variable, bool):
            return parameter
        elif isinstance(variable, int):
            return parameter
        elif isinstance(variable, float):
            return parameter
        else:
            return parameter

    def _append(self, variable, parameter):
        parameter = self._canvas.get_variable_value(parameter)
        if variable is None:
            variable = []
        if not isinstance(variable, list):
            return "ERROR:VARIABLE_NOT_LIST"
        elif len(variable) != 0 and not isinstance(parameter, type(variable[0])):
            return "ERROR:PARAMETER_NOT_LIST_ELEMENT_TYPE"
        else:
            variable.append(parameter)
            return variable

    def _extend(self, variable, parameter):
        parameter = self._canvas.get_variable_value(parameter)
        if variable is None:
            variable = []
        if not isinstance(variable, list):
            return "ERROR:VARIABLE_NOT_LIST"
        elif not isinstance(parameter, list):
            return "ERROR:PARAMETER_NOT_LIST"
        elif len(variable) != 0 and len(parameter) != 0 and not isinstance(parameter[0], type(variable[0])):
            return "ERROR:PARAMETER_NOT_LIST_ELEMENT_TYPE"
        else:
            return variable + parameter

    def _remove_first(self, variable):
        if len(variable) == 0:
            return variable
        if not isinstance(variable, list):
            return "ERROR:VARIABLE_NOT_LIST"
        else:
            return variable[1:]

    def _remove_last(self, variable):
        if len(variable) == 0:
            return variable
        if not isinstance(variable, list):
            return "ERROR:VARIABLE_NOT_LIST"
        else:
            return variable[:-1]

    def is_number(self, value):
        if isinstance(value, bool):
            return False
        return isinstance(value, numbers.Number)

    def _add(self, variable, parameter):
        if self.is_number(variable) and self.is_number(parameter):
            return variable + parameter
        else:
            return "ERROR:VARIABLE_NOT_NUMBER or PARAMETER_NOT_NUMBER"

    def _subtract(self, variable, parameter):
        if self.is_number(variable) and self.is_number(parameter):
            return variable - parameter
        else:
            return "ERROR:VARIABLE_NOT_NUMBER or PARAMETER_NOT_NUMBER"

    def _multiply(self, variable, parameter):
        if self.is_number(variable) and self.is_number(parameter):
            return variable * parameter
        else:
            return "ERROR:VARIABLE_NOT_NUMBER or PARAMETER_NOT_NUMBER"

    def _divide(self, variable, parameter):
        if self.is_number(variable) and self.is_number(parameter):
            if parameter == 0:
                return "ERROR:DIVIDE_BY_ZERO"
            else:
                return variable / parameter
        else:
            return "ERROR:VARIABLE_NOT_NUMBER or PARAMETER_NOT_NUMBER"

    def thoughts(self) -> str:
        return "Assign variables from canvas."

View File

@ -0,0 +1,38 @@
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
from agent.component.base import ComponentParamBase, ComponentBase
class WebhookParam(ComponentParamBase):
"""
Define the Begin component parameters.
"""
def __init__(self):
super().__init__()
def get_input_form(self) -> dict[str, dict]:
return getattr(self, "inputs")
class Webhook(ComponentBase):
component_name = "Webhook"
def _invoke(self, **kwargs):
pass
def thoughts(self) -> str:
return ""

File diff suppressed because one or more lines are too long

View File

@@ -83,10 +83,10 @@
                     "value": []
                 }
             },
-            "password": "20010812Yy!",
+            "password": "",
             "port": 3306,
             "sql": "{Agent:WickedGoatsDivide@content}",
-            "username": "13637682833@163.com"
+            "username": ""
         }
     },
     "upstream": [
@@ -527,10 +527,10 @@
                     "value": []
                 }
             },
-            "password": "20010812Yy!",
+            "password": "",
             "port": 3306,
             "sql": "{Agent:WickedGoatsDivide@content}",
-            "username": "13637682833@163.com"
+            "username": ""
         },
         "label": "ExeSQL",
         "name": "ExeSQL"

View File

@ -0,0 +1,519 @@
{
"id": 27,
"title": {
"en": "Interactive Agent",
"zh": "可交互的 Agent"
},
"description": {
"en": "During the Agents execution, users can actively intervene and interact with the Agent to adjust or guide its output, ensuring the final result aligns with their intentions.",
"zh": "在 Agent 的运行过程中,用户可以随时介入,与 Agent 进行交互,以调整或引导生成结果,使最终输出更符合预期。"
},
"canvas_type": "Agent",
"dsl": {
"components": {
"Agent:LargeFliesMelt": {
"downstream": [
"UserFillUp:GoldBroomsRelate"
],
"obj": {
"component_name": "Agent",
"params": {
"cite": true,
"delay_after_error": 1,
"description": "",
"exception_default_value": "",
"exception_goto": [],
"exception_method": "",
"frequencyPenaltyEnabled": false,
"frequency_penalty": 0.7,
"llm_id": "qwen-turbo@Tongyi-Qianwen",
"maxTokensEnabled": false,
"max_retries": 3,
"max_rounds": 1,
"max_tokens": 256,
"mcp": [],
"message_history_window_size": 12,
"outputs": {
"content": {
"type": "string",
"value": ""
},
"structured": {}
},
"presencePenaltyEnabled": false,
"presence_penalty": 0.4,
"prompts": [
{
"content": "User query:{sys.query}",
"role": "user"
}
],
"sys_prompt": "<role>\nYou are the Planning Agent in a multi-agent RAG workflow.\nYour sole job is to design a crisp, executable Search Plan for the next agent. Do not search or answer the users question.\n</role>\n<objectives>\nUnderstand the users task and decompose it into evidence-seeking steps.\nProduce high-quality queries and retrieval settings tailored to the task type (fact lookup, multi-hop reasoning, comparison, statistics, how-to, etc.).\nIdentify missing information that would materially change the plan (≤3 concise questions).\nOptimize for source trustworthiness, diversity, and recency; define stopping criteria to avoid over-searching.\nAnswer in 150 words.\n<objectives>",
"temperature": 0.1,
"temperatureEnabled": false,
"tools": [],
"topPEnabled": false,
"top_p": 0.3,
"user_prompt": "",
"visual_files_var": ""
}
},
"upstream": [
"begin"
]
},
"Agent:TangyWordsType": {
"downstream": [
"Message:FreshWallsStudy"
],
"obj": {
"component_name": "Agent",
"params": {
"cite": true,
"delay_after_error": 1,
"description": "",
"exception_default_value": "",
"exception_goto": [],
"exception_method": "",
"frequencyPenaltyEnabled": false,
"frequency_penalty": 0.7,
"llm_id": "qwen-turbo@Tongyi-Qianwen",
"maxTokensEnabled": false,
"max_retries": 3,
"max_rounds": 1,
"max_tokens": 256,
"mcp": [],
"message_history_window_size": 12,
"outputs": {
"content": {
"type": "string",
"value": ""
},
"structured": {}
},
"presencePenaltyEnabled": false,
"presence_penalty": 0.4,
"prompts": [
{
"content": "Search Plan: {Agent:LargeFliesMelt@content}\n\n\n\nAwait Response feedback:{UserFillUp:GoldBroomsRelate@instructions}\n",
"role": "user"
}
],
"sys_prompt": "<role>\nYou are the Search Agent.\nYour job is to execute the approved Search Plan, integrate the Await Response feedback, retrieve evidence, and produce a well-grounded answer.\n</role>\n<objectives>\nTranslate the plan + feedback into concrete searches.\nCollect diverse, trustworthy, and recent evidence meeting the plans evidence bar.\nSynthesize a concise answer; include citations next to claims they support.\nIf evidence is insufficient or conflicting, clearly state limitations and propose next steps.\n</objectives>\n <tools>\nRetrieval: You must use Retrieval to do the search.\n </tools>\n",
"temperature": 0.1,
"temperatureEnabled": false,
"tools": [
{
"component_name": "Retrieval",
"name": "Retrieval",
"params": {
"cross_languages": [],
"description": "",
"empty_response": "",
"kb_ids": [],
"keywords_similarity_weight": 0.7,
"outputs": {
"formalized_content": {
"type": "string",
"value": ""
},
"json": {
"type": "Array<Object>",
"value": []
}
},
"rerank_id": "",
"similarity_threshold": 0.2,
"toc_enhance": false,
"top_k": 1024,
"top_n": 8,
"use_kg": false
}
}
],
"topPEnabled": false,
"top_p": 0.3,
"user_prompt": "",
"visual_files_var": ""
}
},
"upstream": [
"UserFillUp:GoldBroomsRelate"
]
},
"Message:FreshWallsStudy": {
"downstream": [],
"obj": {
"component_name": "Message",
"params": {
"content": [
"{Agent:TangyWordsType@content}"
]
}
},
"upstream": [
"Agent:TangyWordsType"
]
},
"UserFillUp:GoldBroomsRelate": {
"downstream": [
"Agent:TangyWordsType"
],
"obj": {
"component_name": "UserFillUp",
"params": {
"enable_tips": true,
"inputs": {
"instructions": {
"name": "instructions",
"optional": false,
"options": [],
"type": "paragraph"
}
},
"outputs": {
"instructions": {
"name": "instructions",
"optional": false,
"options": [],
"type": "paragraph"
}
},
"tips": "Here is my search plan:\n{Agent:LargeFliesMelt@content}\nAre you okay with it?"
}
},
"upstream": [
"Agent:LargeFliesMelt"
]
},
"begin": {
"downstream": [
"Agent:LargeFliesMelt"
],
"obj": {
"component_name": "Begin",
"params": {}
},
"upstream": []
}
},
"globals": {
"sys.conversation_turns": 0,
"sys.files": [],
"sys.query": "",
"sys.user_id": ""
},
"graph": {
"edges": [
{
"data": {
"isHovered": false
},
"id": "xy-edge__beginstart-Agent:LargeFliesMeltend",
"source": "begin",
"sourceHandle": "start",
"target": "Agent:LargeFliesMelt",
"targetHandle": "end"
},
{
"data": {
"isHovered": false
},
"id": "xy-edge__Agent:LargeFliesMeltstart-UserFillUp:GoldBroomsRelateend",
"source": "Agent:LargeFliesMelt",
"sourceHandle": "start",
"target": "UserFillUp:GoldBroomsRelate",
"targetHandle": "end"
},
{
"data": {
"isHovered": false
},
"id": "xy-edge__UserFillUp:GoldBroomsRelatestart-Agent:TangyWordsTypeend",
"source": "UserFillUp:GoldBroomsRelate",
"sourceHandle": "start",
"target": "Agent:TangyWordsType",
"targetHandle": "end"
},
{
"id": "xy-edge__Agent:TangyWordsTypetool-Tool:NastyBatsGoend",
"source": "Agent:TangyWordsType",
"sourceHandle": "tool",
"target": "Tool:NastyBatsGo",
"targetHandle": "end"
},
{
"id": "xy-edge__Agent:TangyWordsTypestart-Message:FreshWallsStudyend",
"source": "Agent:TangyWordsType",
"sourceHandle": "start",
"target": "Message:FreshWallsStudy",
"targetHandle": "end"
}
],
"nodes": [
{
"data": {
"label": "Begin",
"name": "begin"
},
"dragging": false,
"id": "begin",
"measured": {
"height": 50,
"width": 200
},
"position": {
"x": 154.9008789064451,
"y": 119.51001744285344
},
"selected": false,
"sourcePosition": "left",
"targetPosition": "right",
"type": "beginNode"
},
{
"data": {
"form": {
"cite": true,
"delay_after_error": 1,
"description": "",
"exception_default_value": "",
"exception_goto": [],
"exception_method": "",
"frequencyPenaltyEnabled": false,
"frequency_penalty": 0.7,
"llm_id": "qwen-turbo@Tongyi-Qianwen",
"maxTokensEnabled": false,
"max_retries": 3,
"max_rounds": 1,
"max_tokens": 256,
"mcp": [],
"message_history_window_size": 12,
"outputs": {
"content": {
"type": "string",
"value": ""
},
"structured": {}
},
"presencePenaltyEnabled": false,
"presence_penalty": 0.4,
"prompts": [
{
"content": "User query:{sys.query}",
"role": "user"
}
],
"sys_prompt": "<role>\nYou are the Planning Agent in a multi-agent RAG workflow.\nYour sole job is to design a crisp, executable Search Plan for the next agent. Do not search or answer the users question.\n</role>\n<objectives>\nUnderstand the users task and decompose it into evidence-seeking steps.\nProduce high-quality queries and retrieval settings tailored to the task type (fact lookup, multi-hop reasoning, comparison, statistics, how-to, etc.).\nIdentify missing information that would materially change the plan (≤3 concise questions).\nOptimize for source trustworthiness, diversity, and recency; define stopping criteria to avoid over-searching.\nAnswer in 150 words.\n<objectives>",
"temperature": 0.1,
"temperatureEnabled": false,
"tools": [],
"topPEnabled": false,
"top_p": 0.3,
"user_prompt": "",
"visual_files_var": ""
},
"label": "Agent",
"name": "Planning Agent"
},
"dragging": false,
"id": "Agent:LargeFliesMelt",
"measured": {
"height": 90,
"width": 200
},
"position": {
"x": 443.96309330796714,
"y": 104.61370811205677
},
"selected": false,
"sourcePosition": "right",
"targetPosition": "left",
"type": "agentNode"
},
{
"data": {
"form": {
"enable_tips": true,
"inputs": {
"instructions": {
"name": "instructions",
"optional": false,
"options": [],
"type": "paragraph"
}
},
"outputs": {
"instructions": {
"name": "instructions",
"optional": false,
"options": [],
"type": "paragraph"
}
},
"tips": "Here is my search plan:\n{Agent:LargeFliesMelt@content}\nAre you okay with it?"
},
"label": "UserFillUp",
"name": "Await Response"
},
"dragging": false,
"id": "UserFillUp:GoldBroomsRelate",
"measured": {
"height": 50,
"width": 200
},
"position": {
"x": 683.3409492927474,
"y": 116.76274137645598
},
"selected": false,
"sourcePosition": "right",
"targetPosition": "left",
"type": "ragNode"
},
{
"data": {
"form": {
"cite": true,
"delay_after_error": 1,
"description": "",
"exception_default_value": "",
"exception_goto": [],
"exception_method": "",
"frequencyPenaltyEnabled": false,
"frequency_penalty": 0.7,
"llm_id": "qwen-turbo@Tongyi-Qianwen",
"maxTokensEnabled": false,
"max_retries": 3,
"max_rounds": 1,
"max_tokens": 256,
"mcp": [],
"message_history_window_size": 12,
"outputs": {
"content": {
"type": "string",
"value": ""
},
"structured": {}
},
"presencePenaltyEnabled": false,
"presence_penalty": 0.4,
"prompts": [
{
"content": "Search Plan: {Agent:LargeFliesMelt@content}\n\n\n\nAwait Response feedback:{UserFillUp:GoldBroomsRelate@instructions}\n",
"role": "user"
}
],
"sys_prompt": "<role>\nYou are the Search Agent.\nYour job is to execute the approved Search Plan, integrate the Await Response feedback, retrieve evidence, and produce a well-grounded answer.\n</role>\n<objectives>\nTranslate the plan + feedback into concrete searches.\nCollect diverse, trustworthy, and recent evidence meeting the plans evidence bar.\nSynthesize a concise answer; include citations next to claims they support.\nIf evidence is insufficient or conflicting, clearly state limitations and propose next steps.\n</objectives>\n <tools>\nRetrieval: You must use Retrieval to do the search.\n </tools>\n",
"temperature": 0.1,
"temperatureEnabled": false,
"tools": [
{
"component_name": "Retrieval",
"name": "Retrieval",
"params": {
"cross_languages": [],
"description": "",
"empty_response": "",
"kb_ids": [],
"keywords_similarity_weight": 0.7,
"outputs": {
"formalized_content": {
"type": "string",
"value": ""
},
"json": {
"type": "Array<Object>",
"value": []
}
},
"rerank_id": "",
"similarity_threshold": 0.2,
"toc_enhance": false,
"top_k": 1024,
"top_n": 8,
"use_kg": false
}
}
],
"topPEnabled": false,
"top_p": 0.3,
"user_prompt": "",
"visual_files_var": ""
},
"label": "Agent",
"name": "Search Agent"
},
"dragging": false,
"id": "Agent:TangyWordsType",
"measured": {
"height": 90,
"width": 200
},
"position": {
"x": 944.6411255659472,
"y": 99.84499066368488
},
"selected": true,
"sourcePosition": "right",
"targetPosition": "left",
"type": "agentNode"
},
{
"data": {
"form": {
"description": "This is an agent for a specific task.",
"user_prompt": "This is the order you need to send to the agent."
},
"label": "Tool",
"name": "flow.tool_0"
},
"id": "Tool:NastyBatsGo",
"measured": {
"height": 50,
"width": 200
},
"position": {
"x": 862.6411255659472,
"y": 239.84499066368488
},
"sourcePosition": "right",
"targetPosition": "left",
"type": "toolNode"
},
{
"data": {
"form": {
"content": [
"{Agent:TangyWordsType@content}"
]
},
"label": "Message",
"name": "Message"
},
"dragging": false,
"id": "Message:FreshWallsStudy",
"measured": {
"height": 50,
"width": 200
},
"position": {
"x": 1216.7057997987163,
"y": 120.48541298149814
},
"selected": false,
"sourcePosition": "right",
"targetPosition": "left",
"type": "messageNode"
}
]
},
"history": [],
"messages": [],
"path": [],
"retrieval": [],
"variables": {}
},
"avatar":
"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAADAAAAAwCAYAAABXAvmHAAAACXBIWXMAABYlAAAWJQFJUiTwAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAA1FSURBVHgBzVppcFRVFv7e672TTjrprGTrkI1tgFjiAI4TmgEGcEFgVFAHxr2mLFe0cKxyBOeHljVjwVDqzDhVTqksOlOiuCBrCwk7hCAkLAGy70kn3eklvb03571e8rrTCQGi5em6ecu9fe85557znXNuh0EUHTGb5/Ast4TnmXsB3oifBTFVxEsVy7PrZ5lM9RE9oZvTZrPeAe51Gvh85ABGMuonJh6Ra9Mzw2KDm2PXm0ymPoS6ReZ5n5n6p4f45QOsj0ihEXxwdGg9Pta3Ax2R98ErH+wIzMFjFFTlhcwkCMEKT4qE+Nc5np8e5Aa8MAf94UNtmEl5DPYFrpHPER8+xn3U3KNkXqDpKtFaSGi1Wm188MH7655+6inoExLR0toW0N211R+8jJ198YHtiFyDH5EHE8Oy7Iccx/1BeJ4yZRI2bXgHSfokyGQsLH19YDgax4RXGJz450A8v1HGMMw62soMEgRxcVqcrDyG9X95E339fSidPh05Odnot9vB0Qe84CB8wMQglSe8HWPI3HAmJVmLYTOYJ55YxVedOYcTxyux59svsfqpJ5GWkorGhhZYLL0oKSnC2pfX4DemOaIg9n473B63ZCFGOu2YU4QIgikIJiF5yaxcuZTPzzfC42WQnZWJLdu2IiXFgG+/3oOs7Ey0t3fC7/cjMyMdU6dOwpyyMswtmwO9PhmdnYE+5hrcS9Eq8MzSHRfuvT7Dj5qbiKeGY4fKsWL1ShQWFKG1uQVz55Thh5pqVJ87D6fLDY/HA7/PD87PiYxMnToRLzz7LI0zoamphQTxYewpKIzUB6OUJcIoOTEOVpjhcroRH69FQ2MLHANOZKZn4PcPP4DikgKoVWrI5CzkcjmUSiUuXriKRx77I2bMuh0tba1IStKL84TXDTU26hp9z0QzJYVVbhBuJdDL8Vy4yeRK+TrBNHp7e9HR0QWXawDNTa3o7ulB2R2zER+nQXZ2FhRKBVwOl7BloO+IVwGpBtxebPv0v0hNN+CO2beHHT4cD6I9Pvo+iobElmtYk6xkQuE6nvOjpaWNmHeDJxPp7bVCp4vHvv0HsXfvQVSeOgOtRoOFi+aJu+C0O6FVa2AwJEOhkMPhcODEiVNwuJ0w5ubRLsbB7/UPMsvH5HTYxvNBwSUAIT5yQcG4wTlk3+/5bt2cX5chIzMdbW3tOH/+EtRaNex2B1RkKkuXLkJCog41NZdw8MBhFBUVICU1WTQDwZwMKcloJJN74snHMeByoq7uCr1XECBkEzB4I3ge6p4xkp3hiOFj+rdcYFpg5vaZs3H3nXehh0znyPFjWLv2NcyeNQPbtn0hDlRr1GKc2L1rP5QqJeQyMiMZRH8QqKujDa+9+gruWf4A4siPlLQzeXlG+Dl/hAjR2B7KnURQEnyDC4kyXPoS5dN7dn7FR0umJ4d8aNUjBKEdePSxh8iMDuDixcsiCglbK5PJ4PMFUEe4F6B01zfbMeD1iTx8vPUTMi0lXn7uJXT1dIr+Eph/EFViAMqoKGBag89sZFIVuMaTplsJWZYtvwv7yQdqai5CQWYhoMyECUWYv8BE2s0RJ8gk03v/3Q24UHsJ27/YjnaKDa4BD1INBpQfrgDlWoFZRRPgcdNBW4hlLBNGM3mEdMGrmzDfSNt/4OBhdLR1Yv26tXj77U0iEqUYUvDvf34AjUpOaUclssaNQ2cXoZdzAC++/Cds3vwZLl+oxtZPt2INPVefOYH6hsbALtwkDabtg8jG8pK0OURtbR34dMtHaKhrImlZvPXW35FsSMKKFfeivOIwsnKMWLDwbrjdbtEHOB8HFfnF8YoD+Md7m7Bv3y7s2rtfXOjkqUqMFYnMR5tQsCdCQwK+t5MQ+3d/g4cfXiHCZHeXBZ98/D/R5gX0sfRZsGTZAzDNW4SG5kbk5uSSL3DIzc1GaekMkNyQ0Z+PN29FYmICxooEPkNN5Hv3zh0jhgq9PhFXrtSh+nw1Kg4dIah04Oix02JfQUEepk2bgs8//0bcocWLfotXXlqD7u5uzF1wp7hIcnIyKr7fjSZKTyDV5HXkO9EkrfjksQZI62Brnw2pqQbMz5yH5fcug4/zUtCqxIcffSRmsI2NrSKjlp5ebNnyGWG/B2++8QYW3TlfTAhZGQOej8EAw0SYbcTaGLk6kxZR8miGY4wWGRAY6+7pFu1PSLH/88G/yAcG8NcNG3H85CkxSgrBT4gPDqcDE4sn4GjKCbBCsGAki7KDO8AybIBRugqLMFFMhuPDCALJI9CBGfxidKkYfi+Mp0mFGCEkUy888wy0Wq3YJyf/sFOa0W/rR093DzLS0tDXZw3WHkPnE9li5IhzdMCu0EGupnn8/sgxLEISBJTMISKSsVKnCDEtvYY+ovLIKQVGQk1waCF56yDs7+zsotjRDlu/DR6fF4sXL0RN9QWwFD/ik1JJwdwQ7XEspSKudqSqFUhsqcWAtUccH4tC/Ilrh/hlmbB84RcjtZgTSwQMPQtayhqXiTPvvIDHJyXjtWVlUCUkQ6ZQBHZDXIvMR6akOmMALtoxTUom9K2XaPesYl9wsmsSG2aQYTDih4ndQvl89LOb0o72wlkonTYZ9y25BwffWYOexstQJ+jhJ8377b1Ic9ngam4A53GC9zqgpCrPd7YcnFwZqB4l64aVHNXYmBoMDkbMgmMUJNqqH6rEFChvW0wMWjFv0RKom37AuX3bkcnbUKCOQ3LqOFgb69Gu1GGgs5HiiwYTbp0JS8UOMBpdxLrDRXI2WruICmrRGh5VEz9kIj4PtIZMqGdQ1HZZEZ+ZhxmFuVD73WT/bhx5/8+wZk1AvHEqrAYjCdoHH9UUxRMnoefIDggFHitjR9QVK9VyQIBYzNxAHhPEeI6EkJM2vfm3QSnzwpA/FZaOeuzdvAn9GUYUlC2Gj2zfkJoPRfpkNFdW0DgWeVlGKJqr4SKQkFNmOxwHbDSTTFjxTFAuaQ9z7Y8EIUK+RdgIXq1DUsFM2Pva0dfWDGu/C8ZxBhiaTyGXbF/ltsBSfRI+dQKhmB8KbTwUcYlgG2vQ095CZaxy0KwxaOLscEyEtyWYx4ey4ZhloFQ/XLD5o5pgDl46V+q4igOU6PVyGtxy10ooEtPQfKYc+z/cBBfrRM7EKZR3dYs7x/IK6JJT4K2vRq8QTxgpfwgaaizn4MN/pFsyfAsWKRFNFlCPCNRC3k7PjVZKCpvrULZ8BV7927vobmxG7dG9OHK4HDYqW6/W1cI3wCAtdyql8c0U/Z3wUuGUWzwFF8w7oIzTiZYptQ2WCUdoVnKVoFL4lgn3D+5UYKx45YVdCupEsmNhUyRUUhrywGWUoGhCKdpbWmGpO42LZ8/BR/EgnmqNxb97FCqFH9UHd8JG6YjT1U9w7IOP6pNJhUYc3f0VtCSElBt5KDeJFEKKnbwElQLh
X0oMI70O52rBBM3nRELRL9HQeh5x8KGy3IzjVbWwWfvxwc7dsFssuHz2NBo7W1Ciywdnt4KOQ+ChkxC1SoWmc+XwmBZRYaUWq0Oe80H26KpV60TNhTQW0fiAVoNFBBPSMs9E+ED08/CNxgm2HZcCy5XjkGt16Dp/Fms3vUfiyGFtv4KK/ftQWFxEhwEMvFQwyTihWFKIJ4Ie+u6V0+XkUjzcDhvcNJ9ckkxgSN6LoNfzsYA08t3ooTYA1UrjrXD+cAC/mj+X0CYJ/cT8oe++ppQiBf2OfjE5tFD2qzekIk9GdXpGDtRUSMnpN4wByptSUhOhdLVK6gFecu4SfY3Nxo0TbX18qhHujHqkjy9G84WT6GptwZUOC2qqziM7Lx16jRI54/MJwTzgKYdiBSgWYopSQ0edJXB6/NhyslYCo+F8Q5KDYGiUHVTiTYkAn9MGfcks9NFRpnAUqc/MIgY5LF1xH9Ioe1247H5MnjQNxZNvAUMptttuQyYdlrXRaYmccqnaLhvSEjWRFdlozEDK+I0KIVZiAsRSkaSdOBfeumPobWkUS9RcOqZJjtchJbsEjt42eEhQpS5ZFBiETHVX68jnPHR040YGHW1GFjQxXED67mbq2IhpJTvJu21QjZ8BFaXRPnq2kP3rE3RouUBVHsGzlQ4PEhLi4CB/aCLo9dLhmRDFs+k3jEt0GC3sQD01Y2jCoatJb6MGXEOe0QnMUPCyI+OWBdDk/AK1x3ejraFeLJYG6DTE7XIhjiC0z9IFJ9UNOq2AOz4UZ2bD6hyokj22enU+zTETN0LMyO16MliOzEmpVsH4i1lIzy2kn0+14CjAUSYHi80uFj8cOfL4wiJkpKfT4QKHLB37HVNuNs9hWc6MkVU5IiL9GCSccMsVKjrOl5HZuKnWtqKj4TI6as+AcffTkaUKfl6WL9rEoQP7NpAenhvNxEOO92LGjx+BGCGfUkKh0ZIv21Bz9vTGJ59+8XlRALPZrFexvLAL0zEGNNLJ80h910FVmoQkU2lpaeBfDYT/OXBzjIm0uhFjQMwN9o2KOH6jhngVmI853xGz2QgZ1lH2MI3yIXFHeP4nNP7YVE9cfMlw7BezTKbvpR3/Bx465XnKBextAAAAAElFTkSuQmCC"
}

View File

@@ -16,7 +16,7 @@
 import argparse
 import os
 from agent.canvas import Canvas
-from api import settings
+from common import settings
 if __name__ == '__main__':
     parser = argparse.ArgumentParser()

View File

@@ -63,12 +63,18 @@ class ArXiv(ToolBase, ABC):
     @timeout(int(os.environ.get("COMPONENT_EXEC_TIMEOUT", 12)))
     def _invoke(self, **kwargs):
+        if self.check_if_canceled("ArXiv processing"):
+            return
         if not kwargs.get("query"):
             self.set_output("formalized_content", "")
             return ""
         last_e = ""
         for _ in range(self._param.max_retries+1):
+            if self.check_if_canceled("ArXiv processing"):
+                return
             try:
                 sort_choices = {"relevance": arxiv.SortCriterion.Relevance,
                                 "lastUpdatedDate": arxiv.SortCriterion.LastUpdatedDate,
@@ -79,12 +85,20 @@
                     max_results=self._param.top_n,
                     sort_by=sort_choices[self._param.sort_by]
                 )
-                self._retrieve_chunks(list(arxiv_client.results(search)),
+                results = list(arxiv_client.results(search))
+                if self.check_if_canceled("ArXiv processing"):
+                    return
+                self._retrieve_chunks(results,
                                       get_title=lambda r: r.title,
                                       get_url=lambda r: r.pdf_url,
                                       get_content=lambda r: r.summary)
                 return self.output("formalized_content")
             except Exception as e:
+                if self.check_if_canceled("ArXiv processing"):
+                    return
                 last_e = e
                 logging.exception(f"ArXiv error: {e}")
                 time.sleep(self._param.delay_after_error)
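
ArXiv, DuckDuckGo, GitHub, Google, and GoogleScholar below all share the same retry loop: up to `max_retries+1` attempts, remember the last exception, sleep between tries, and bail out early on cancellation. A generic sketch of that shape (the helper name and defaults are hypothetical):

```python
import time

def with_retries(fn, max_retries=3, delay=1.0, is_canceled=lambda: False):
    last_e = None
    for _ in range(max_retries + 1):
        if is_canceled():        # cooperative cancellation, checked per attempt
            return None
        try:
            return fn()          # first success wins
        except Exception as e:
            last_e = e           # keep the most recent failure for reporting
            time.sleep(delay)    # back off before retrying
    raise RuntimeError(f"all retries failed: {last_e}")
```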

View File

@@ -21,9 +21,8 @@ from functools import partial
 from typing import TypedDict, List, Any
 from agent.component.base import ComponentParamBase, ComponentBase
 from common.misc_utils import hash_str2int
-from rag.llm.chat_model import ToolCallSession
 from rag.prompts.generator import kb_prompt
-from rag.utils.mcp_tool_call_conn import MCPToolCallSession
+from common.mcp_tool_call_conn import MCPToolCallSession, ToolCallSession
 from timeit import default_timer as timer
@@ -125,6 +124,9 @@ class ToolBase(ComponentBase):
         return self._param.get_meta()
     def invoke(self, **kwargs):
+        if self.check_if_canceled("Tool processing"):
+            return
         self.set_output("_created_time", time.perf_counter())
         try:
             res = self._invoke(**kwargs)
@@ -170,4 +172,4 @@
         self.set_output("formalized_content", "\n".join(kb_prompt({"chunks": chunks, "doc_aggs": aggs}, 200000, True)))
     def thoughts(self) -> str:
         return self._canvas.get_component_name(self._id) + " is running..."

View File

@@ -21,8 +21,8 @@ from strenum import StrEnum
 from typing import Optional
 from pydantic import BaseModel, Field, field_validator
 from agent.tools.base import ToolParamBase, ToolBase, ToolMeta
-from api import settings
 from common.connection_utils import timeout
+from common import settings
 class Language(StrEnum):
@@ -131,10 +131,14 @@ class CodeExec(ToolBase, ABC):
     @timeout(int(os.environ.get("COMPONENT_EXEC_TIMEOUT", 10*60)))
     def _invoke(self, **kwargs):
+        if self.check_if_canceled("CodeExec processing"):
+            return
         lang = kwargs.get("lang", self._param.lang)
         script = kwargs.get("script", self._param.script)
         arguments = {}
         for k, v in self._param.arguments.items():
             if kwargs.get(k):
                 arguments[k] = kwargs[k]
                 continue
@@ -149,15 +153,28 @@
     def _execute_code(self, language: str, code: str, arguments: dict):
         import requests
+        if self.check_if_canceled("CodeExec execution"):
+            return
         try:
             code_b64 = self._encode_code(code)
             code_req = CodeExecutionRequest(code_b64=code_b64, language=language, arguments=arguments).model_dump()
         except Exception as e:
+            if self.check_if_canceled("CodeExec execution"):
+                return
             self.set_output("_ERROR", "construct code request error: " + str(e))
         try:
+            if self.check_if_canceled("CodeExec execution"):
+                return "Task has been canceled"
             resp = requests.post(url=f"http://{settings.SANDBOX_HOST}:9385/run", json=code_req, timeout=int(os.environ.get("COMPONENT_EXEC_TIMEOUT", 10*60)))
             logging.info(f"http://{settings.SANDBOX_HOST}:9385/run, code_req: {code_req}, resp.status_code {resp.status_code}:")
+            if self.check_if_canceled("CodeExec execution"):
+                return "Task has been canceled"
             if resp.status_code != 200:
                 resp.raise_for_status()
             body = resp.json()
@@ -173,16 +190,25 @@
             logging.info(f"http://{settings.SANDBOX_HOST}:9385/run -> {rt}")
             if isinstance(rt, tuple):
                 for i, (k, o) in enumerate(self._param.outputs.items()):
+                    if self.check_if_canceled("CodeExec execution"):
+                        return
                     if k.find("_") == 0:
                         continue
                     o["value"] = rt[i]
             elif isinstance(rt, dict):
                 for i, (k, o) in enumerate(self._param.outputs.items()):
+                    if self.check_if_canceled("CodeExec execution"):
+                        return
                     if k not in rt or k.find("_") == 0:
                         continue
                     o["value"] = rt[k]
             else:
                 for i, (k, o) in enumerate(self._param.outputs.items()):
+                    if self.check_if_canceled("CodeExec execution"):
+                        return
                     if k.find("_") == 0:
                         continue
                     o["value"] = rt
@@ -190,6 +216,9 @@
             self.set_output("_ERROR", "There is no response from sandbox")
         except Exception as e:
+            if self.check_if_canceled("CodeExec execution"):
+                return
             self.set_output("_ERROR", "Exception executing code: " + str(e))
         return self.output()
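
For reference, the round trip `_execute_code` performs: base64-encode the script, wrap it in the request model, and POST it to the sandbox's `/run` endpoint. A minimal sketch following the payload shape visible in the diff; the host and values are placeholders, not a real deployment:

```python
import base64

import requests

code = 'print("hello")'
code_b64 = base64.b64encode(code.encode("utf-8")).decode("ascii")
payload = {"code_b64": code_b64, "language": "python", "arguments": {}}

# SANDBOX_HOST would come from settings in the real component.
resp = requests.post("http://sandbox-host:9385/run", json=payload, timeout=600)
resp.raise_for_status()
print(resp.json())  # sandbox reply with the script's outputs
```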

View File

@@ -29,7 +29,7 @@ class CrawlerParam(ToolParamBase):
         super().__init__()
         self.proxy = None
         self.extract_type = "markdown"
     def check(self):
         self.check_valid_value(self.extract_type, "Type of content from the crawler", ['html', 'markdown', 'content'])
@@ -47,18 +47,24 @@ class Crawler(ToolBase, ABC):
             result = asyncio.run(self.get_web(ans))
             return Crawler.be_output(result)
         except Exception as e:
             return Crawler.be_output(f"An unexpected error occurred: {str(e)}")
     async def get_web(self, url):
+        if self.check_if_canceled("Crawler async operation"):
+            return
         proxy = self._param.proxy if self._param.proxy else None
         async with AsyncWebCrawler(verbose=True, proxy=proxy) as crawler:
             result = await crawler.arun(
                 url=url,
                 bypass_cache=True
             )
+            if self.check_if_canceled("Crawler async operation"):
+                return
             if self._param.extract_type == 'html':
                 return result.cleaned_html
             elif self._param.extract_type == 'markdown':

View File

@@ -46,11 +46,16 @@ class DeepL(ComponentBase, ABC):
     component_name = "DeepL"
     def _run(self, history, **kwargs):
+        if self.check_if_canceled("DeepL processing"):
+            return
         ans = self.get_input()
         ans = " - ".join(ans["content"]) if "content" in ans else ""
         if not ans:
             return DeepL.be_output("")
+        if self.check_if_canceled("DeepL processing"):
+            return
         try:
             translator = deepl.Translator(self._param.auth_key)
             result = translator.translate_text(ans, source_lang=self._param.source_lang,
@@ -58,4 +63,6 @@
             return DeepL.be_output(result.text)
         except Exception as e:
+            if self.check_if_canceled("DeepL processing"):
+                return
             DeepL.be_output("**Error**:" + str(e))

View File

@@ -75,17 +75,30 @@ class DuckDuckGo(ToolBase, ABC):
     @timeout(int(os.environ.get("COMPONENT_EXEC_TIMEOUT", 12)))
     def _invoke(self, **kwargs):
+        if self.check_if_canceled("DuckDuckGo processing"):
+            return
         if not kwargs.get("query"):
             self.set_output("formalized_content", "")
             return ""
         last_e = ""
         for _ in range(self._param.max_retries+1):
+            if self.check_if_canceled("DuckDuckGo processing"):
+                return
             try:
                 if kwargs.get("topic", "general") == "general":
                     with DDGS() as ddgs:
+                        if self.check_if_canceled("DuckDuckGo processing"):
+                            return
                         # {'title': '', 'href': '', 'body': ''}
                         duck_res = ddgs.text(kwargs["query"], max_results=self._param.top_n)
+                        if self.check_if_canceled("DuckDuckGo processing"):
+                            return
                         self._retrieve_chunks(duck_res,
                                               get_title=lambda r: r["title"],
                                               get_url=lambda r: r.get("href", r.get("url")),
@@ -94,8 +107,15 @@
                         return self.output("formalized_content")
                 else:
                     with DDGS() as ddgs:
+                        if self.check_if_canceled("DuckDuckGo processing"):
+                            return
                         # {'date': '', 'title': '', 'body': '', 'url': '', 'image': '', 'source': ''}
                         duck_res = ddgs.news(kwargs["query"], max_results=self._param.top_n)
+                        if self.check_if_canceled("DuckDuckGo processing"):
+                            return
                         self._retrieve_chunks(duck_res,
                                               get_title=lambda r: r["title"],
                                               get_url=lambda r: r.get("href", r.get("url")),
@@ -103,6 +123,9 @@
                         self.set_output("json", duck_res)
                         return self.output("formalized_content")
             except Exception as e:
+                if self.check_if_canceled("DuckDuckGo processing"):
+                    return
                 last_e = e
                 logging.exception(f"DuckDuckGo error: {e}")
                 time.sleep(self._param.delay_after_error)

View File

@@ -101,19 +101,27 @@ class Email(ToolBase, ABC):
     @timeout(int(os.environ.get("COMPONENT_EXEC_TIMEOUT", 60)))
     def _invoke(self, **kwargs):
+        if self.check_if_canceled("Email processing"):
+            return
         if not kwargs.get("to_email"):
             self.set_output("success", False)
             return ""
         last_e = ""
         for _ in range(self._param.max_retries+1):
+            if self.check_if_canceled("Email processing"):
+                return
             try:
                 # Parse JSON string passed from upstream
                 email_data = kwargs
                 # Validate required fields
                 if "to_email" not in email_data:
-                    return Email.be_output("Missing required field: to_email")
+                    self.set_output("_ERROR", "Missing required field: to_email")
+                    self.set_output("success", False)
+                    return False
                 # Create email object
                 msg = MIMEMultipart('alternative')
@@ -133,6 +141,9 @@
                 # Connect to SMTP server and send
                 logging.info(f"Connecting to SMTP server {self._param.smtp_server}:{self._param.smtp_port}")
+                if self.check_if_canceled("Email processing"):
+                    return
                 context = smtplib.ssl.create_default_context()
                 with smtplib.SMTP(self._param.smtp_server, self._param.smtp_port) as server:
                     server.ehlo()
@@ -149,6 +160,10 @@
                     # Send email
                     logging.info(f"Sending email to recipients: {recipients}")
+                    if self.check_if_canceled("Email processing"):
+                        return
                     try:
                         server.send_message(msg, self._param.email, recipients)
                         success = True

View File

@@ -81,6 +81,8 @@ class ExeSQL(ToolBase, ABC):
     @timeout(int(os.environ.get("COMPONENT_EXEC_TIMEOUT", 60)))
     def _invoke(self, **kwargs):
+        if self.check_if_canceled("ExeSQL processing"):
+            return
         def convert_decimals(obj):
             from decimal import Decimal
@@ -96,6 +98,9 @@
         if not sql:
             raise Exception("SQL for `ExeSQL` MUST not be empty.")
+        if self.check_if_canceled("ExeSQL processing"):
+            return
         vars = self.get_input_elements_from_text(sql)
         args = {}
         for k, o in vars.items():
@@ -108,6 +113,9 @@
             self.set_input_value(k, args[k])
         sql = self.string_format(sql, args)
+        if self.check_if_canceled("ExeSQL processing"):
+            return
         sqls = sql.split(";")
         if self._param.db_type in ["mysql", "mariadb"]:
             db = pymysql.connect(db=self._param.database, user=self._param.username, host=self._param.host,
@@ -181,6 +189,10 @@
         sql_res = []
         formalized_content = []
         for single_sql in sqls:
+            if self.check_if_canceled("ExeSQL processing"):
+                ibm_db.close(conn)
+                return
             single_sql = single_sql.replace("```", "").strip()
             if not single_sql:
                 continue
@@ -190,6 +202,9 @@
             rows = []
             row = ibm_db.fetch_assoc(stmt)
             while row and len(rows) < self._param.max_records:
+                if self.check_if_canceled("ExeSQL processing"):
+                    ibm_db.close(conn)
+                    return
                 rows.append(row)
                 row = ibm_db.fetch_assoc(stmt)
@@ -220,6 +235,11 @@
         sql_res = []
         formalized_content = []
         for single_sql in sqls:
+            if self.check_if_canceled("ExeSQL processing"):
+                cursor.close()
+                db.close()
+                return
             single_sql = single_sql.replace('```','')
             if not single_sql:
                 continue
@@ -244,6 +264,9 @@
             sql_res.append(convert_decimals(single_res.to_dict(orient='records')))
             formalized_content.append(single_res.to_markdown(index=False, floatfmt=".6f"))
+        cursor.close()
+        db.close()
         self.set_output("json", sql_res)
         self.set_output("formalized_content", "\n\n".join(formalized_content))
         return self.output("formalized_content")

View File

@@ -59,17 +59,27 @@ class GitHub(ToolBase, ABC):
     @timeout(int(os.environ.get("COMPONENT_EXEC_TIMEOUT", 12)))
     def _invoke(self, **kwargs):
+        if self.check_if_canceled("GitHub processing"):
+            return
         if not kwargs.get("query"):
             self.set_output("formalized_content", "")
             return ""
         last_e = ""
         for _ in range(self._param.max_retries+1):
+            if self.check_if_canceled("GitHub processing"):
+                return
             try:
                 url = 'https://api.github.com/search/repositories?q=' + kwargs["query"] + '&sort=stars&order=desc&per_page=' + str(
                     self._param.top_n)
                 headers = {"Content-Type": "application/vnd.github+json", "X-GitHub-Api-Version": '2022-11-28'}
                 response = requests.get(url=url, headers=headers).json()
+                if self.check_if_canceled("GitHub processing"):
+                    return
                 self._retrieve_chunks(response['items'],
                                       get_title=lambda r: r["name"],
                                       get_url=lambda r: r["html_url"],
@@ -77,6 +87,9 @@
                 self.set_output("json", response['items'])
                 return self.output("formalized_content")
             except Exception as e:
+                if self.check_if_canceled("GitHub processing"):
+                    return
                 last_e = e
                 logging.exception(f"GitHub error: {e}")
                 time.sleep(self._param.delay_after_error)

View File

@@ -118,6 +118,9 @@ class Google(ToolBase, ABC):
     @timeout(int(os.environ.get("COMPONENT_EXEC_TIMEOUT", 12)))
     def _invoke(self, **kwargs):
+        if self.check_if_canceled("Google processing"):
+            return
         if not kwargs.get("q"):
             self.set_output("formalized_content", "")
             return ""
@@ -132,8 +135,15 @@
         }
         last_e = ""
         for _ in range(self._param.max_retries+1):
+            if self.check_if_canceled("Google processing"):
+                return
             try:
                 search = GoogleSearch(params).get_dict()
+                if self.check_if_canceled("Google processing"):
+                    return
                 self._retrieve_chunks(search["organic_results"],
                                       get_title=lambda r: r["title"],
                                       get_url=lambda r: r["link"],
@@ -142,6 +152,9 @@
                 self.set_output("json", search["organic_results"])
                 return self.output("formalized_content")
             except Exception as e:
+                if self.check_if_canceled("Google processing"):
+                    return
                 last_e = e
                 logging.exception(f"Google error: {e}")
                 time.sleep(self._param.delay_after_error)

View File

@@ -65,15 +65,25 @@ class GoogleScholar(ToolBase, ABC):
    @timeout(int(os.environ.get("COMPONENT_EXEC_TIMEOUT", 12)))
    def _invoke(self, **kwargs):
+       if self.check_if_canceled("GoogleScholar processing"):
+           return
        if not kwargs.get("query"):
            self.set_output("formalized_content", "")
            return ""
        last_e = ""
        for _ in range(self._param.max_retries+1):
+           if self.check_if_canceled("GoogleScholar processing"):
+               return
            try:
                scholar_client = scholarly.search_pubs(kwargs["query"], patents=self._param.patents, year_low=self._param.year_low,
                                                       year_high=self._param.year_high, sort_by=self._param.sort_by)
+               if self.check_if_canceled("GoogleScholar processing"):
+                   return
                self._retrieve_chunks(scholar_client,
                                      get_title=lambda r: r['bib']['title'],
                                      get_url=lambda r: r["pub_url"],
@@ -82,6 +92,9 @@ class GoogleScholar(ToolBase, ABC):
                self.set_output("json", list(scholar_client))
                return self.output("formalized_content")
            except Exception as e:
+               if self.check_if_canceled("GoogleScholar processing"):
+                   return
                last_e = e
                logging.exception(f"GoogleScholar error: {e}")
                time.sleep(self._param.delay_after_error)

View File

@@ -50,6 +50,9 @@ class Jin10(ComponentBase, ABC):
    component_name = "Jin10"

    def _run(self, history, **kwargs):
+       if self.check_if_canceled("Jin10 processing"):
+           return
        ans = self.get_input()
        ans = " - ".join(ans["content"]) if "content" in ans else ""
        if not ans:
@@ -58,6 +61,9 @@ class Jin10(ComponentBase, ABC):
        jin10_res = []
        headers = {'secret-key': self._param.secret_key}
        try:
+           if self.check_if_canceled("Jin10 processing"):
+               return
            if self._param.type == "flash":
                params = {
                    'category': self._param.flash_type,
@@ -69,6 +75,8 @@ class Jin10(ComponentBase, ABC):
                    headers=headers, data=json.dumps(params))
                response = response.json()
                for i in response['data']:
+                   if self.check_if_canceled("Jin10 processing"):
+                       return
                    jin10_res.append({"content": i['data']['content']})
            if self._param.type == "calendar":
                params = {
@@ -79,6 +87,8 @@ class Jin10(ComponentBase, ABC):
                    headers=headers, data=json.dumps(params))
                response = response.json()
+               if self.check_if_canceled("Jin10 processing"):
+                   return
                jin10_res.append({"content": pd.DataFrame(response['data']).to_markdown()})
            if self._param.type == "symbols":
                params = {
@@ -90,8 +100,12 @@ class Jin10(ComponentBase, ABC):
                    url='https://open-data-api.jin10.com/data-api/' + self._param.symbols_datatype + '?type=' + self._param.symbols_type,
                    headers=headers, data=json.dumps(params))
                response = response.json()
+               if self.check_if_canceled("Jin10 processing"):
+                   return
                if self._param.symbols_datatype == "symbols":
                    for i in response['data']:
+                       if self.check_if_canceled("Jin10 processing"):
+                           return
                        i['Commodity Code'] = i['c']
                        i['Stock Exchange'] = i['e']
                        i['Commodity Name'] = i['n']
@@ -99,6 +113,8 @@ class Jin10(ComponentBase, ABC):
                        del i['c'], i['e'], i['n'], i['t']
                if self._param.symbols_datatype == "quotes":
                    for i in response['data']:
+                       if self.check_if_canceled("Jin10 processing"):
+                           return
                        i['Selling Price'] = i['a']
                        i['Buying Price'] = i['b']
                        i['Commodity Code'] = i['c']
@@ -120,8 +136,12 @@ class Jin10(ComponentBase, ABC):
                    url='https://open-data-api.jin10.com/data-api/news',
                    headers=headers, data=json.dumps(params))
                response = response.json()
+               if self.check_if_canceled("Jin10 processing"):
+                   return
                jin10_res.append({"content": pd.DataFrame(response['data']).to_markdown()})
        except Exception as e:
+           if self.check_if_canceled("Jin10 processing"):
+               return
            return Jin10.be_output("**ERROR**: " + str(e))
        if not jin10_res:
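The symbols branch above renames Jin10's terse field keys one assignment at a time. The same transform can be read as a mapping-driven rename; a sketch (the 't' entry is an assumption inferred from the del statement, since that rename falls outside the shown hunk):

SYMBOL_FIELDS = {
    "c": "Commodity Code",
    "e": "Stock Exchange",
    "n": "Commodity Name",
    "t": "Type",  # assumption: the remaining deleted key
}


def rename_symbol_fields(item: dict) -> dict:
    """Return a copy of item with Jin10's short keys replaced by readable names."""
    return {SYMBOL_FIELDS.get(k, k): v for k, v in item.items()}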

View File

@@ -71,23 +71,40 @@ class PubMed(ToolBase, ABC):
    @timeout(int(os.environ.get("COMPONENT_EXEC_TIMEOUT", 12)))
    def _invoke(self, **kwargs):
+       if self.check_if_canceled("PubMed processing"):
+           return
        if not kwargs.get("query"):
            self.set_output("formalized_content", "")
            return ""
        last_e = ""
        for _ in range(self._param.max_retries+1):
+           if self.check_if_canceled("PubMed processing"):
+               return
            try:
                Entrez.email = self._param.email
                pubmedids = Entrez.read(Entrez.esearch(db='pubmed', retmax=self._param.top_n, term=kwargs["query"]))['IdList']
+               if self.check_if_canceled("PubMed processing"):
+                   return
                pubmedcnt = ET.fromstring(re.sub(r'<(/?)b>|<(/?)i>', '', Entrez.efetch(db='pubmed', id=",".join(pubmedids),
                                                                                       retmode="xml").read().decode("utf-8")))
+               if self.check_if_canceled("PubMed processing"):
+                   return
                self._retrieve_chunks(pubmedcnt.findall("PubmedArticle"),
                                      get_title=lambda child: child.find("MedlineCitation").find("Article").find("ArticleTitle").text,
                                      get_url=lambda child: "https://pubmed.ncbi.nlm.nih.gov/" + child.find("MedlineCitation").find("PMID").text,
                                      get_content=lambda child: self._format_pubmed_content(child),)
                return self.output("formalized_content")
            except Exception as e:
+               if self.check_if_canceled("PubMed processing"):
+                   return
                last_e = e
                logging.exception(f"PubMed error: {e}")
                time.sleep(self._param.delay_after_error)
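For reference, the PubMed path above does two things before chunk retrieval: strip <b>/<i> markup from the efetch payload, then walk PubmedArticle elements. A condensed sketch using the same element paths as the diff's lambdas (xml_text stands in for the raw efetch response body):

import re
import xml.etree.ElementTree as ET


def parse_pubmed_articles(xml_text: str):
    """Yield (title, url) pairs from a PubMed efetch XML payload."""
    cleaned = re.sub(r"<(/?)b>|<(/?)i>", "", xml_text)   # drop inline markup first
    root = ET.fromstring(cleaned)
    for child in root.findall("PubmedArticle"):
        title = child.find("MedlineCitation").find("Article").find("ArticleTitle").text
        pmid = child.find("MedlineCitation").find("PMID").text
        yield title, "https://pubmed.ncbi.nlm.nih.gov/" + pmid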

View File

@@ -58,12 +58,18 @@ class QWeather(ComponentBase, ABC):
    component_name = "QWeather"

    def _run(self, history, **kwargs):
+       if self.check_if_canceled("Qweather processing"):
+           return
        ans = self.get_input()
        ans = "".join(ans["content"]) if "content" in ans else ""
        if not ans:
            return QWeather.be_output("")

        try:
+           if self.check_if_canceled("Qweather processing"):
+               return
            response = requests.get(
                url="https://geoapi.qweather.com/v2/city/lookup?location=" + ans + "&key=" + self._param.web_apikey).json()
            if response["code"] == "200":
@@ -71,16 +77,23 @@ class QWeather(ComponentBase, ABC):
            else:
                return QWeather.be_output("**Error**" + self._param.error_code[response["code"]])

+           if self.check_if_canceled("Qweather processing"):
+               return
            base_url = "https://api.qweather.com/v7/" if self._param.user_type == 'paid' else "https://devapi.qweather.com/v7/"

            if self._param.type == "weather":
                url = base_url + "weather/" + self._param.time_period + "?location=" + location_id + "&key=" + self._param.web_apikey + "&lang=" + self._param.lang
                response = requests.get(url=url).json()
+               if self.check_if_canceled("Qweather processing"):
+                   return
                if response["code"] == "200":
                    if self._param.time_period == "now":
                        return QWeather.be_output(str(response["now"]))
                    else:
                        qweather_res = [{"content": str(i) + "\n"} for i in response["daily"]]
+                       if self.check_if_canceled("Qweather processing"):
+                           return
                        if not qweather_res:
                            return QWeather.be_output("")
@@ -92,6 +105,8 @@ class QWeather(ComponentBase, ABC):
            elif self._param.type == "indices":
                url = base_url + "indices/1d?type=0&location=" + location_id + "&key=" + self._param.web_apikey + "&lang=" + self._param.lang
                response = requests.get(url=url).json()
+               if self.check_if_canceled("Qweather processing"):
+                   return
                if response["code"] == "200":
                    indices_res = response["daily"][0]["date"] + "\n" + "\n".join(
                        [i["name"] + ": " + i["category"] + ", " + i["text"] for i in response["daily"]])
@@ -103,9 +118,13 @@ class QWeather(ComponentBase, ABC):
            elif self._param.type == "airquality":
                url = base_url + "air/now?location=" + location_id + "&key=" + self._param.web_apikey + "&lang=" + self._param.lang
                response = requests.get(url=url).json()
+               if self.check_if_canceled("Qweather processing"):
+                   return
                if response["code"] == "200":
                    return QWeather.be_output(str(response["now"]))
                else:
                    return QWeather.be_output("**Error**" + self._param.error_code[response["code"]])
        except Exception as e:
+           if self.check_if_canceled("Qweather processing"):
+               return
            return QWeather.be_output("**Error**" + str(e))
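The handlers above build query strings by concatenation. The same city lookup can be sketched with requests' params= to avoid manual escaping; the endpoint and parameter names come from the diff, while the response shape ("location"[0]["id"]) is an assumption, since the line that extracts location_id falls outside the shown hunks:

import requests


def lookup_city(name: str, api_key: str) -> str | None:
    """Resolve a city name to a QWeather location id, or None on a non-200 code."""
    resp = requests.get(
        "https://geoapi.qweather.com/v2/city/lookup",
        params={"location": name, "key": api_key},  # requests handles URL escaping
        timeout=10,
    ).json()
    if resp.get("code") != "200":
        return None
    return resp["location"][0]["id"]  # assumed response shape, see note above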

View File

@@ -24,8 +24,7 @@ from api.db.services.document_service import DocumentService
from api.db.services.dialog_service import meta_filter
from api.db.services.knowledgebase_service import KnowledgebaseService
from api.db.services.llm_service import LLMBundle
-from api import settings
+from common import settings
-from common import globals
from common.connection_utils import timeout
from rag.app.tag import label_question
from rag.prompts.generator import cross_languages, kb_prompt, gen_meta_filter
@@ -83,8 +82,12 @@ class Retrieval(ToolBase, ABC):
    @timeout(int(os.environ.get("COMPONENT_EXEC_TIMEOUT", 12)))
    def _invoke(self, **kwargs):
+       if self.check_if_canceled("Retrieval processing"):
+           return
        if not kwargs.get("query"):
            self.set_output("formalized_content", self._param.empty_response)
+           return

        kb_ids: list[str] = []
        for id in self._param.kb_ids:
@@ -123,7 +126,7 @@ class Retrieval(ToolBase, ABC):
        vars = self.get_input_elements_from_text(kwargs["query"])
        vars = {k:o["value"] for k,o in vars.items()}
        query = self.string_format(kwargs["query"], vars)

        doc_ids=[]
        if self._param.meta_data_filter!={}:
            metas = DocumentService.get_meta_by_kbs(kb_ids)
@@ -136,7 +139,7 @@ class Retrieval(ToolBase, ABC):
        elif self._param.meta_data_filter.get("method") == "manual":
            filters=self._param.meta_data_filter["manual"]
            for flt in filters:
-               pat = re.compile(r"\{* *\{([a-zA-Z:0-9]+@[A-Za-z:0-9_.-]+|sys\.[a-z_]+)\} *\}*")
+               pat = re.compile(self.variable_ref_patt)
                s = flt["value"]
                out_parts = []
                last = 0
@@ -171,7 +174,7 @@ class Retrieval(ToolBase, ABC):
        if kbs:
            query = re.sub(r"^user[:\s]*", "", query, flags=re.IGNORECASE)
-           kbinfos = globals.retriever.retrieval(
+           kbinfos = settings.retriever.retrieval(
                query,
                embd_mdl,
                [kb.tenant_id for kb in kbs],
@@ -185,9 +188,14 @@ class Retrieval(ToolBase, ABC):
                rerank_mdl=rerank_mdl,
                rank_feature=label_question(query, kbs),
            )
+           if self.check_if_canceled("Retrieval processing"):
+               return
            if self._param.toc_enhance:
                chat_mdl = LLMBundle(self._canvas._tenant_id, LLMType.CHAT)
-               cks = globals.retriever.retrieval_by_toc(query, kbinfos["chunks"], [kb.tenant_id for kb in kbs], chat_mdl, self._param.top_n)
+               cks = settings.retriever.retrieval_by_toc(query, kbinfos["chunks"], [kb.tenant_id for kb in kbs], chat_mdl, self._param.top_n)
+               if self.check_if_canceled("Retrieval processing"):
+                   return
                if cks:
                    kbinfos["chunks"] = cks
            if self._param.use_kg:
@@ -196,6 +204,8 @@ class Retrieval(ToolBase, ABC):
                    kb_ids,
                    embd_mdl,
                    LLMBundle(self._canvas.get_tenant_id(), LLMType.CHAT))
+               if self.check_if_canceled("Retrieval processing"):
+                   return
                if ck["content_with_weight"]:
                    kbinfos["chunks"].insert(0, ck)
        else:
@@ -203,6 +213,8 @@ class Retrieval(ToolBase, ABC):
            if self._param.use_kg and kbs:
                ck = settings.kg_retriever.retrieval(query, [kb.tenant_id for kb in kbs], filtered_kb_ids, embd_mdl, LLMBundle(kbs[0].tenant_id, LLMType.CHAT))
+               if self.check_if_canceled("Retrieval processing"):
+                   return
                if ck["content_with_weight"]:
                    ck["content"] = ck["content_with_weight"]
                    del ck["content_with_weight"]

View File

@@ -79,6 +79,9 @@ class SearXNG(ToolBase, ABC):
    @timeout(int(os.environ.get("COMPONENT_EXEC_TIMEOUT", 12)))
    def _invoke(self, **kwargs):
+       if self.check_if_canceled("SearXNG processing"):
+           return
        # Gracefully handle try-run without inputs
        query = kwargs.get("query")
        if not query or not isinstance(query, str) or not query.strip():
@@ -93,6 +96,9 @@ class SearXNG(ToolBase, ABC):
        last_e = ""
        for _ in range(self._param.max_retries+1):
+           if self.check_if_canceled("SearXNG processing"):
+               return
            try:
                search_params = {
                    'q': query,
@@ -110,6 +116,9 @@ class SearXNG(ToolBase, ABC):
                )
                response.raise_for_status()
+               if self.check_if_canceled("SearXNG processing"):
+                   return
                data = response.json()
                if not data or not isinstance(data, dict):
@@ -121,6 +130,9 @@ class SearXNG(ToolBase, ABC):
                results = results[:self._param.top_n]
+               if self.check_if_canceled("SearXNG processing"):
+                   return
                self._retrieve_chunks(results,
                                      get_title=lambda r: r.get("title", ""),
                                      get_url=lambda r: r.get("url", ""),
@@ -130,10 +142,16 @@ class SearXNG(ToolBase, ABC):
                return self.output("formalized_content")
            except requests.RequestException as e:
+               if self.check_if_canceled("SearXNG processing"):
+                   return
                last_e = f"Network error: {e}"
                logging.exception(f"SearXNG network error: {e}")
                time.sleep(self._param.delay_after_error)
            except Exception as e:
+               if self.check_if_canceled("SearXNG processing"):
+                   return
                last_e = str(e)
                logging.exception(f"SearXNG error: {e}")
                time.sleep(self._param.delay_after_error)

View File

@@ -103,6 +103,9 @@ class TavilySearch(ToolBase, ABC):
    @timeout(int(os.environ.get("COMPONENT_EXEC_TIMEOUT", 12)))
    def _invoke(self, **kwargs):
+       if self.check_if_canceled("TavilySearch processing"):
+           return
        if not kwargs.get("query"):
            self.set_output("formalized_content", "")
            return ""
@@ -113,10 +116,16 @@ class TavilySearch(ToolBase, ABC):
            if fld not in kwargs:
                kwargs[fld] = getattr(self._param, fld)
        for _ in range(self._param.max_retries+1):
+           if self.check_if_canceled("TavilySearch processing"):
+               return
            try:
                kwargs["include_images"] = False
                kwargs["include_raw_content"] = False
                res = self.tavily_client.search(**kwargs)
+               if self.check_if_canceled("TavilySearch processing"):
+                   return
                self._retrieve_chunks(res["results"],
                                      get_title=lambda r: r["title"],
                                      get_url=lambda r: r["url"],
@@ -125,6 +134,9 @@ class TavilySearch(ToolBase, ABC):
                self.set_output("json", res["results"])
                return self.output("formalized_content")
            except Exception as e:
+               if self.check_if_canceled("TavilySearch processing"):
+                   return
                last_e = e
                logging.exception(f"Tavily error: {e}")
                time.sleep(self._param.delay_after_error)
@@ -201,6 +213,9 @@ class TavilyExtract(ToolBase, ABC):
    @timeout(int(os.environ.get("COMPONENT_EXEC_TIMEOUT", 10*60)))
    def _invoke(self, **kwargs):
+       if self.check_if_canceled("TavilyExtract processing"):
+           return
        self.tavily_client = TavilyClient(api_key=self._param.api_key)
        last_e = None
        for fld in ["urls", "extract_depth", "format"]:
@@ -209,12 +224,21 @@ class TavilyExtract(ToolBase, ABC):
        if kwargs.get("urls") and isinstance(kwargs["urls"], str):
            kwargs["urls"] = kwargs["urls"].split(",")
        for _ in range(self._param.max_retries+1):
+           if self.check_if_canceled("TavilyExtract processing"):
+               return
            try:
                kwargs["include_images"] = False
                res = self.tavily_client.extract(**kwargs)
+               if self.check_if_canceled("TavilyExtract processing"):
+                   return
                self.set_output("json", res["results"])
                return self.output("json")
            except Exception as e:
+               if self.check_if_canceled("TavilyExtract processing"):
+                   return
                last_e = e
                logging.exception(f"Tavily error: {e}")
        if last_e:

View File

@@ -43,12 +43,18 @@ class TuShare(ComponentBase, ABC):
    component_name = "TuShare"

    def _run(self, history, **kwargs):
+       if self.check_if_canceled("TuShare processing"):
+           return
        ans = self.get_input()
        ans = ",".join(ans["content"]) if "content" in ans else ""
        if not ans:
            return TuShare.be_output("")

        try:
+           if self.check_if_canceled("TuShare processing"):
+               return
            tus_res = []
            params = {
                "api_name": "news",
@@ -58,12 +64,18 @@ class TuShare(ComponentBase, ABC):
            }
            response = requests.post(url="http://api.tushare.pro", data=json.dumps(params).encode('utf-8'))
            response = response.json()
+           if self.check_if_canceled("TuShare processing"):
+               return
            if response['code'] != 0:
                return TuShare.be_output(response['msg'])
            df = pd.DataFrame(response['data']['items'])
            df.columns = response['data']['fields']
+           if self.check_if_canceled("TuShare processing"):
+               return
            tus_res.append({"content": (df[df['content'].str.contains(self._param.keyword, case=False)]).to_markdown()})
        except Exception as e:
+           if self.check_if_canceled("TuShare processing"):
+               return
            return TuShare.be_output("**ERROR**: " + str(e))
        if not tus_res:

View File

@@ -70,19 +70,31 @@ class WenCai(ToolBase, ABC):
    @timeout(int(os.environ.get("COMPONENT_EXEC_TIMEOUT", 12)))
    def _invoke(self, **kwargs):
+       if self.check_if_canceled("WenCai processing"):
+           return
        if not kwargs.get("query"):
            self.set_output("report", "")
            return ""
        last_e = ""
        for _ in range(self._param.max_retries+1):
+           if self.check_if_canceled("WenCai processing"):
+               return
            try:
                wencai_res = []
                res = pywencai.get(query=kwargs["query"], query_type=self._param.query_type, perpage=self._param.top_n)
+               if self.check_if_canceled("WenCai processing"):
+                   return
                if isinstance(res, pd.DataFrame):
                    wencai_res.append(res.to_markdown())
                elif isinstance(res, dict):
                    for item in res.items():
+                       if self.check_if_canceled("WenCai processing"):
+                           return
                        if isinstance(item[1], list):
                            wencai_res.append(item[0] + "\n" + pd.DataFrame(item[1]).to_markdown())
                        elif isinstance(item[1], str):
@@ -100,6 +112,9 @@ class WenCai(ToolBase, ABC):
                self.set_output("report", "\n\n".join(wencai_res))
                return self.output("report")
            except Exception as e:
+               if self.check_if_canceled("WenCai processing"):
+                   return
                last_e = e
                logging.exception(f"WenCai error: {e}")
                time.sleep(self._param.delay_after_error)

View File

@@ -66,17 +66,26 @@ class Wikipedia(ToolBase, ABC):
    @timeout(int(os.environ.get("COMPONENT_EXEC_TIMEOUT", 60)))
    def _invoke(self, **kwargs):
+       if self.check_if_canceled("Wikipedia processing"):
+           return
        if not kwargs.get("query"):
            self.set_output("formalized_content", "")
            return ""
        last_e = ""
        for _ in range(self._param.max_retries+1):
+           if self.check_if_canceled("Wikipedia processing"):
+               return
            try:
                wikipedia.set_lang(self._param.language)
                wiki_engine = wikipedia
                pages = []
                for p in wiki_engine.search(kwargs["query"], results=self._param.top_n):
+                   if self.check_if_canceled("Wikipedia processing"):
+                       return
                    try:
                        pages.append(wikipedia.page(p))
                    except Exception:
@@ -87,6 +96,9 @@ class Wikipedia(ToolBase, ABC):
                                      get_content=lambda r: r.summary)
                return self.output("formalized_content")
            except Exception as e:
+               if self.check_if_canceled("Wikipedia processing"):
+                   return
                last_e = e
                logging.exception(f"Wikipedia error: {e}")
                time.sleep(self._param.delay_after_error)

View File

@@ -74,15 +74,24 @@ class YahooFinance(ToolBase, ABC):
    @timeout(int(os.environ.get("COMPONENT_EXEC_TIMEOUT", 60)))
    def _invoke(self, **kwargs):
+       if self.check_if_canceled("YahooFinance processing"):
+           return
        if not kwargs.get("stock_code"):
            self.set_output("report", "")
            return ""
        last_e = ""
        for _ in range(self._param.max_retries+1):
+           if self.check_if_canceled("YahooFinance processing"):
+               return
            yohoo_res = []
            try:
                msft = yf.Ticker(kwargs["stock_code"])
+               if self.check_if_canceled("YahooFinance processing"):
+                   return
                if self._param.info:
                    yohoo_res.append("# Information:\n" + pd.Series(msft.info).to_markdown() + "\n")
                if self._param.history:
@@ -100,6 +109,9 @@ class YahooFinance(ToolBase, ABC):
                self.set_output("report", "\n\n".join(yohoo_res))
                return self.output("report")
            except Exception as e:
+               if self.check_if_canceled("YahooFinance processing"):
+                   return
                last_e = e
                logging.exception(f"YahooFinance error: {e}")
                time.sleep(self._param.delay_after_error)
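As a rough sketch of what the component above assembles, the same report can be built directly with yfinance's Ticker API (the history call's period argument is a guessed default, since that branch lies outside the shown hunk):

import pandas as pd
import yfinance as yf


def build_report(stock_code: str, with_info: bool = True, with_history: bool = True) -> str:
    """Assemble a markdown report section by section, as the component does."""
    ticker = yf.Ticker(stock_code)
    sections = []
    if with_info:
        sections.append("# Information:\n" + pd.Series(ticker.info).to_markdown() + "\n")
    if with_history:
        # period="1mo" is an assumed default, not taken from the diff
        sections.append("# History:\n" + ticker.history(period="1mo").to_markdown() + "\n")
    return "\n\n".join(sections)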

View File

@@ -18,12 +18,11 @@ import sys
import logging
from importlib.util import module_from_spec, spec_from_file_location
from pathlib import Path
-from flask import Blueprint, Flask
+from quart import Blueprint, Quart, request, g, current_app, session
from werkzeug.wrappers.request import Request
-from flask_cors import CORS
from flasgger import Swagger
from itsdangerous.url_safe import URLSafeTimedSerializer as Serializer
+from quart_cors import cors
from common.constants import StatusEnum
from api.db.db_models import close_connection
from api.db.services import UserService
@@ -31,17 +30,20 @@ from api.utils.json_encode import CustomJSONEncoder
from api.utils import commands
from flask_mail import Mail
-from flask_session import Session
+from quart_auth import Unauthorized
-from flask_login import LoginManager
+from common import settings
-from api import settings
from api.utils.api_utils import server_error_response
from api.constants import API_VERSION
+from common.misc_utils import get_uuid

+settings.init_settings()

__all__ = ["app"]

Request.json = property(lambda self: self.get_json(force=True, silent=True))

-app = Flask(__name__)
+app = Quart(__name__)
+app = cors(app, allow_origin="*")
smtp_mail_server = Mail()

# Add this at the beginning of your file to configure Swagger UI
@@ -76,7 +78,6 @@ swagger = Swagger(
    },
)

-CORS(app, supports_credentials=True, max_age=2592000)
app.url_map.strict_slashes = False
app.json_encoder = CustomJSONEncoder
app.errorhandler(Exception)(server_error_response)
@@ -84,24 +85,150 @@ app.errorhandler(Exception)(server_error_response)
## convince for dev and debug
# app.config["LOGIN_DISABLED"] = True
app.config["SESSION_PERMANENT"] = False
-app.config["SESSION_TYPE"] = "filesystem"
+app.config["SESSION_TYPE"] = "redis"
+app.config["SESSION_REDIS"] = settings.decrypt_database_config(name="redis")
app.config["MAX_CONTENT_LENGTH"] = int(
    os.environ.get("MAX_CONTENT_LENGTH", 1024 * 1024 * 1024)
)
-app.config['SECRET_KEY'] = settings.SECRET_KEY
-Session(app)
+app.secret_key = settings.SECRET_KEY

-login_manager = LoginManager()
-login_manager.init_app(app)
commands.register_commands(app)

+from functools import wraps
+from typing import ParamSpec, TypeVar
+from collections.abc import Awaitable, Callable
+from werkzeug.local import LocalProxy

+T = TypeVar("T")
+P = ParamSpec("P")

+def _load_user():
+   jwt = Serializer(secret_key=settings.SECRET_KEY)
+   authorization = request.headers.get("Authorization")
+   g.user = None
+   if not authorization:
+       return
+   try:
+       access_token = str(jwt.loads(authorization))
+       if not access_token or not access_token.strip():
+           logging.warning("Authentication attempt with empty access token")
+           return None
+       # Access tokens should be UUIDs (32 hex characters)
+       if len(access_token.strip()) < 32:
+           logging.warning(f"Authentication attempt with invalid token format: {len(access_token)} chars")
+           return None
+       user = UserService.query(
+           access_token=access_token, status=StatusEnum.VALID.value
+       )
+       if user:
+           if not user[0].access_token or not user[0].access_token.strip():
+               logging.warning(f"User {user[0].email} has empty access_token in database")
+               return None
+           g.user = user[0]
+           return user[0]
+   except Exception as e:
+       logging.warning(f"load_user got exception {e}")

+current_user = LocalProxy(_load_user)

+def login_required(func: Callable[P, Awaitable[T]]) -> Callable[P, Awaitable[T]]:
+   """A decorator to restrict route access to authenticated users.
+
+   This should be used to wrap a route handler (or view function) to
+   enforce that only authenticated requests can access it. Note that
+   it is important that this decorator be wrapped by the route
+   decorator and not vice, versa, as below.
+
+   .. code-block:: python
+
+       @app.route('/')
+       @login_required
+       async def index():
+           ...
+
+   If the request is not authenticated a
+   `quart.exceptions.Unauthorized` exception will be raised.
+   """
+
+   @wraps(func)
+   async def wrapper(*args: P.args, **kwargs: P.kwargs) -> T:
+       if not current_user:  # or not session.get("_user_id"):
+           raise Unauthorized()
+       else:
+           return await current_app.ensure_async(func)(*args, **kwargs)
+
+   return wrapper

+def login_user(user, remember=False, duration=None, force=False, fresh=True):
+   """
+   Logs a user in. You should pass the actual user object to this. If the
+   user's `is_active` property is ``False``, they will not be logged in
+   unless `force` is ``True``.
+
+   This will return ``True`` if the log in attempt succeeds, and ``False`` if
+   it fails (i.e. because the user is inactive).
+
+   :param user: The user object to log in.
+   :type user: object
+   :param remember: Whether to remember the user after their session expires.
+       Defaults to ``False``.
+   :type remember: bool
+   :param duration: The amount of time before the remember cookie expires. If
+       ``None`` the value set in the settings is used. Defaults to ``None``.
+   :type duration: :class:`datetime.timedelta`
+   :param force: If the user is inactive, setting this to ``True`` will log
+       them in regardless. Defaults to ``False``.
+   :type force: bool
+   :param fresh: setting this to ``False`` will log in the user with a session
+       marked as not "fresh". Defaults to ``True``.
+   :type fresh: bool
+   """
+   if not force and not user.is_active:
+       return False
+
+   session["_user_id"] = user.id
+   session["_fresh"] = fresh
+   session["_id"] = get_uuid()
+   return True

+def logout_user():
+   """
+   Logs a user out. (You do not need to pass the actual user.) This will
+   also clean up the remember me cookie if it exists.
+   """
+   if "_user_id" in session:
+       session.pop("_user_id")
+   if "_fresh" in session:
+       session.pop("_fresh")
+   if "_id" in session:
+       session.pop("_id")
+
+   COOKIE_NAME = "remember_token"
+   cookie_name = current_app.config.get("REMEMBER_COOKIE_NAME", COOKIE_NAME)
+   if cookie_name in request.cookies:
+       session["_remember"] = "clear"
+       if "_remember_seconds" in session:
+           session.pop("_remember_seconds")
+   return True

-def search_pages_path(pages_dir):
+def search_pages_path(page_path):
    app_path_list = [
-       path for path in pages_dir.glob("*_app.py") if not path.name.startswith(".")
+       path for path in page_path.glob("*_app.py") if not path.name.startswith(".")
    ]
    api_path_list = [
-       path for path in pages_dir.glob("*sdk/*.py") if not path.name.startswith(".")
+       path for path in page_path.glob("*sdk/*.py") if not path.name.startswith(".")
    ]
    app_path_list.extend(api_path_list)
    return app_path_list
@@ -138,44 +265,12 @@ pages_dir = [
]

client_urls_prefix = [
-   register_page(path) for dir in pages_dir for path in search_pages_path(dir)
+   register_page(path) for directory in pages_dir for path in search_pages_path(directory)
]

-@login_manager.request_loader
-def load_user(web_request):
-   jwt = Serializer(secret_key=settings.SECRET_KEY)
-   authorization = web_request.headers.get("Authorization")
-   if authorization:
-       try:
-           access_token = str(jwt.loads(authorization))
-           if not access_token or not access_token.strip():
-               logging.warning("Authentication attempt with empty access token")
-               return None
-           # Access tokens should be UUIDs (32 hex characters)
-           if len(access_token.strip()) < 32:
-               logging.warning(f"Authentication attempt with invalid token format: {len(access_token)} chars")
-               return None
-           user = UserService.query(
-               access_token=access_token, status=StatusEnum.VALID.value
-           )
-           if user:
-               if not user[0].access_token or not user[0].access_token.strip():
-                   logging.warning(f"User {user[0].email} has empty access_token in database")
-                   return None
-               return user[0]
-           else:
-               return None
-       except Exception as e:
-           logging.warning(f"load_user got exception {e}")
-           return None
-   else:
-       return None

@app.teardown_request
-def _db_close(exc):
+def _db_close(exception):
+   if exception:
+       logging.exception(f"Request failed: {exception}")
    close_connection()
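How the replacement auth pieces fit together from a consumer's perspective: route modules import login_required and current_user from api.apps (the same import the following diff adds) instead of flask_login. A sketch with a made-up route:

from quart import Blueprint

from api.apps import current_user, login_required

bp = Blueprint("demo", __name__)


@bp.route("/whoami")
@login_required  # raises quart_auth's Unauthorized when _load_user finds no valid token
async def whoami():
    # current_user is a LocalProxy over _load_user, so this re-reads the
    # Authorization header of the active request
    return {"id": current_user.id, "email": current_user.email}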

View File

@@ -13,47 +13,21 @@
# See the License for the specific language governing permissions and
# limitations under the License.
#
-import json
-import os
-import re
from datetime import datetime, timedelta

-from flask import request, Response
+from quart import request
-from api.db.services.llm_service import LLMBundle
+from api.db.db_models import APIToken
-from flask_login import login_required, current_user
-from api.db import VALID_FILE_TYPES, FileType
-from api.db.db_models import APIToken, Task, File
-from api.db.services import duplicate_name
from api.db.services.api_service import APITokenService, API4ConversationService
-from api.db.services.dialog_service import DialogService, chat
-from api.db.services.document_service import DocumentService, doc_upload_and_parse
-from api.db.services.file2document_service import File2DocumentService
-from api.db.services.file_service import FileService
-from api.db.services.knowledgebase_service import KnowledgebaseService
-from api.db.services.task_service import queue_tasks, TaskService
from api.db.services.user_service import UserTenantService
-from common.misc_utils import get_uuid
-from common.constants import RetCode, VALID_TASK_STATUS, LLMType, ParserType, FileSource
from api.utils.api_utils import server_error_response, get_data_error_result, get_json_result, validate_request, \
    generate_confirmation_token
-from api.utils.file_utils import filename_type, thumbnail
-from rag.app.tag import label_question
-from rag.prompts.generator import keyword_extraction
-from rag.utils.storage_factory import STORAGE_IMPL
from common.time_utils import current_timestamp, datetime_format
+from api.apps import login_required, current_user
-from api.db.services.canvas_service import UserCanvasService
-from agent.canvas import Canvas
-from functools import partial
-from pathlib import Path
-from common import globals


@manager.route('/new_token', methods=['POST'])  # noqa: F821
@login_required
-def new_token():
+async def new_token():
-   req = request.json
+   req = await request.json
    try:
        tenants = UserTenantService.query(user_id=current_user.id)
        if not tenants:
@@ -98,8 +72,8 @@ def token_list():
@manager.route('/rm', methods=['POST'])  # noqa: F821
@validate_request("tokens", "tenant_id")
@login_required
-def rm():
+async def rm():
-   req = request.json
+   req = await request.json
    try:
        for token in req["tokens"]:
            APITokenService.filter_delete(
@@ -127,774 +101,19 @@ def stats():
                "to_date",
                datetime.now().strftime("%Y-%m-%d %H:%M:%S")),
            "agent" if "canvas_id" in request.args else None)
-       res = {
-           "pv": [(o["dt"], o["pv"]) for o in objs],
-           "uv": [(o["dt"], o["uv"]) for o in objs],
-           "speed": [(o["dt"], float(o["tokens"]) / (float(o["duration"] + 0.1))) for o in objs],
-           "tokens": [(o["dt"], float(o["tokens"]) / 1000.) for o in objs],
-           "round": [(o["dt"], o["round"]) for o in objs],
-           "thumb_up": [(o["dt"], o["thumb_up"]) for o in objs]
-       }
+       res = {"pv": [], "uv": [], "speed": [], "tokens": [], "round": [], "thumb_up": []}
+
+       for obj in objs:
+           dt = obj["dt"]
+           res["pv"].append((dt, obj["pv"]))
+           res["uv"].append((dt, obj["uv"]))
+           res["speed"].append((dt, float(obj["tokens"]) / (float(obj["duration"]) + 0.1)))  # +0.1 to avoid division by zero
+           res["tokens"].append((dt, float(obj["tokens"]) / 1000.0))  # convert to thousands
+           res["round"].append((dt, obj["round"]))
+           res["thumb_up"].append((dt, obj["thumb_up"]))
        return get_json_result(data=res)
    except Exception as e:
        return server_error_response(e)
@manager.route('/new_conversation', methods=['GET']) # noqa: F821
def set_conversation():
token = request.headers.get('Authorization').split()[1]
objs = APIToken.query(token=token)
if not objs:
return get_json_result(
data=False, message='Authentication error: API key is invalid!"', code=RetCode.AUTHENTICATION_ERROR)
try:
if objs[0].source == "agent":
e, cvs = UserCanvasService.get_by_id(objs[0].dialog_id)
if not e:
return server_error_response("canvas not found.")
if not isinstance(cvs.dsl, str):
cvs.dsl = json.dumps(cvs.dsl, ensure_ascii=False)
canvas = Canvas(cvs.dsl, objs[0].tenant_id)
conv = {
"id": get_uuid(),
"dialog_id": cvs.id,
"user_id": request.args.get("user_id", ""),
"message": [{"role": "assistant", "content": canvas.get_prologue()}],
"source": "agent"
}
API4ConversationService.save(**conv)
return get_json_result(data=conv)
else:
e, dia = DialogService.get_by_id(objs[0].dialog_id)
if not e:
return get_data_error_result(message="Dialog not found")
conv = {
"id": get_uuid(),
"dialog_id": dia.id,
"user_id": request.args.get("user_id", ""),
"message": [{"role": "assistant", "content": dia.prompt_config["prologue"]}]
}
API4ConversationService.save(**conv)
return get_json_result(data=conv)
except Exception as e:
return server_error_response(e)
@manager.route('/completion', methods=['POST']) # noqa: F821
@validate_request("conversation_id", "messages")
def completion():
token = request.headers.get('Authorization').split()[1]
objs = APIToken.query(token=token)
if not objs:
return get_json_result(
data=False, message='Authentication error: API key is invalid!"', code=RetCode.AUTHENTICATION_ERROR)
req = request.json
e, conv = API4ConversationService.get_by_id(req["conversation_id"])
if not e:
return get_data_error_result(message="Conversation not found!")
if "quote" not in req:
req["quote"] = False
msg = []
for m in req["messages"]:
if m["role"] == "system":
continue
if m["role"] == "assistant" and not msg:
continue
msg.append(m)
if not msg[-1].get("id"):
msg[-1]["id"] = get_uuid()
message_id = msg[-1]["id"]
def fillin_conv(ans):
nonlocal conv, message_id
if not conv.reference:
conv.reference.append(ans["reference"])
else:
conv.reference[-1] = ans["reference"]
conv.message[-1] = {"role": "assistant", "content": ans["answer"], "id": message_id}
ans["id"] = message_id
def rename_field(ans):
reference = ans['reference']
if not isinstance(reference, dict):
return
for chunk_i in reference.get('chunks', []):
if 'docnm_kwd' in chunk_i:
chunk_i['doc_name'] = chunk_i['docnm_kwd']
chunk_i.pop('docnm_kwd')
try:
if conv.source == "agent":
stream = req.get("stream", True)
conv.message.append(msg[-1])
e, cvs = UserCanvasService.get_by_id(conv.dialog_id)
if not e:
return server_error_response("canvas not found.")
del req["conversation_id"]
del req["messages"]
if not isinstance(cvs.dsl, str):
cvs.dsl = json.dumps(cvs.dsl, ensure_ascii=False)
if not conv.reference:
conv.reference = []
conv.message.append({"role": "assistant", "content": "", "id": message_id})
conv.reference.append({"chunks": [], "doc_aggs": []})
final_ans = {"reference": [], "content": ""}
canvas = Canvas(cvs.dsl, objs[0].tenant_id)
canvas.messages.append(msg[-1])
canvas.add_user_input(msg[-1]["content"])
answer = canvas.run(stream=stream)
assert answer is not None, "Nothing. Is it over?"
if stream:
assert isinstance(answer, partial), "Nothing. Is it over?"
def sse():
nonlocal answer, cvs, conv
try:
for ans in answer():
for k in ans.keys():
final_ans[k] = ans[k]
ans = {"answer": ans["content"], "reference": ans.get("reference", [])}
fillin_conv(ans)
rename_field(ans)
yield "data:" + json.dumps({"code": 0, "message": "", "data": ans},
ensure_ascii=False) + "\n\n"
canvas.messages.append({"role": "assistant", "content": final_ans["content"], "id": message_id})
canvas.history.append(("assistant", final_ans["content"]))
if final_ans.get("reference"):
canvas.reference.append(final_ans["reference"])
cvs.dsl = json.loads(str(canvas))
API4ConversationService.append_message(conv.id, conv.to_dict())
except Exception as e:
yield "data:" + json.dumps({"code": 500, "message": str(e),
"data": {"answer": "**ERROR**: " + str(e), "reference": []}},
ensure_ascii=False) + "\n\n"
yield "data:" + json.dumps({"code": 0, "message": "", "data": True}, ensure_ascii=False) + "\n\n"
resp = Response(sse(), mimetype="text/event-stream")
resp.headers.add_header("Cache-control", "no-cache")
resp.headers.add_header("Connection", "keep-alive")
resp.headers.add_header("X-Accel-Buffering", "no")
resp.headers.add_header("Content-Type", "text/event-stream; charset=utf-8")
return resp
final_ans["content"] = "\n".join(answer["content"]) if "content" in answer else ""
canvas.messages.append({"role": "assistant", "content": final_ans["content"], "id": message_id})
if final_ans.get("reference"):
canvas.reference.append(final_ans["reference"])
cvs.dsl = json.loads(str(canvas))
result = {"answer": final_ans["content"], "reference": final_ans.get("reference", [])}
fillin_conv(result)
API4ConversationService.append_message(conv.id, conv.to_dict())
rename_field(result)
return get_json_result(data=result)
# ******************For dialog******************
conv.message.append(msg[-1])
e, dia = DialogService.get_by_id(conv.dialog_id)
if not e:
return get_data_error_result(message="Dialog not found!")
del req["conversation_id"]
del req["messages"]
if not conv.reference:
conv.reference = []
conv.message.append({"role": "assistant", "content": "", "id": message_id})
conv.reference.append({"chunks": [], "doc_aggs": []})
def stream():
nonlocal dia, msg, req, conv
try:
for ans in chat(dia, msg, True, **req):
fillin_conv(ans)
rename_field(ans)
yield "data:" + json.dumps({"code": 0, "message": "", "data": ans},
ensure_ascii=False) + "\n\n"
API4ConversationService.append_message(conv.id, conv.to_dict())
except Exception as e:
yield "data:" + json.dumps({"code": 500, "message": str(e),
"data": {"answer": "**ERROR**: " + str(e), "reference": []}},
ensure_ascii=False) + "\n\n"
yield "data:" + json.dumps({"code": 0, "message": "", "data": True}, ensure_ascii=False) + "\n\n"
if req.get("stream", True):
resp = Response(stream(), mimetype="text/event-stream")
resp.headers.add_header("Cache-control", "no-cache")
resp.headers.add_header("Connection", "keep-alive")
resp.headers.add_header("X-Accel-Buffering", "no")
resp.headers.add_header("Content-Type", "text/event-stream; charset=utf-8")
return resp
answer = None
for ans in chat(dia, msg, **req):
answer = ans
fillin_conv(ans)
API4ConversationService.append_message(conv.id, conv.to_dict())
break
rename_field(answer)
return get_json_result(data=answer)
except Exception as e:
return server_error_response(e)
@manager.route('/conversation/<conversation_id>', methods=['GET']) # noqa: F821
# @login_required
def get_conversation(conversation_id):
token = request.headers.get('Authorization').split()[1]
objs = APIToken.query(token=token)
if not objs:
return get_json_result(
data=False, message='Authentication error: API key is invalid!"', code=RetCode.AUTHENTICATION_ERROR)
try:
e, conv = API4ConversationService.get_by_id(conversation_id)
if not e:
return get_data_error_result(message="Conversation not found!")
conv = conv.to_dict()
if token != APIToken.query(dialog_id=conv['dialog_id'])[0].token:
return get_json_result(data=False, message='Authentication error: API key is invalid for this conversation_id!"',
code=RetCode.AUTHENTICATION_ERROR)
for referenct_i in conv['reference']:
if referenct_i is None or len(referenct_i) == 0:
continue
for chunk_i in referenct_i['chunks']:
if 'docnm_kwd' in chunk_i.keys():
chunk_i['doc_name'] = chunk_i['docnm_kwd']
chunk_i.pop('docnm_kwd')
return get_json_result(data=conv)
except Exception as e:
return server_error_response(e)
@manager.route('/document/upload', methods=['POST']) # noqa: F821
@validate_request("kb_name")
def upload():
token = request.headers.get('Authorization').split()[1]
objs = APIToken.query(token=token)
if not objs:
return get_json_result(
data=False, message='Authentication error: API key is invalid!"', code=RetCode.AUTHENTICATION_ERROR)
kb_name = request.form.get("kb_name").strip()
tenant_id = objs[0].tenant_id
try:
e, kb = KnowledgebaseService.get_by_name(kb_name, tenant_id)
if not e:
return get_data_error_result(
message="Can't find this knowledgebase!")
kb_id = kb.id
except Exception as e:
return server_error_response(e)
if 'file' not in request.files:
return get_json_result(
data=False, message='No file part!', code=RetCode.ARGUMENT_ERROR)
file = request.files['file']
if file.filename == '':
return get_json_result(
data=False, message='No file selected!', code=RetCode.ARGUMENT_ERROR)
root_folder = FileService.get_root_folder(tenant_id)
pf_id = root_folder["id"]
FileService.init_knowledgebase_docs(pf_id, tenant_id)
kb_root_folder = FileService.get_kb_folder(tenant_id)
kb_folder = FileService.new_a_file_from_kb(kb.tenant_id, kb.name, kb_root_folder["id"])
try:
if DocumentService.get_doc_count(kb.tenant_id) >= int(os.environ.get('MAX_FILE_NUM_PER_USER', 8192)):
return get_data_error_result(
message="Exceed the maximum file number of a free user!")
filename = duplicate_name(
DocumentService.query,
name=file.filename,
kb_id=kb_id)
filetype = filename_type(filename)
if not filetype:
return get_data_error_result(
message="This type of file has not been supported yet!")
location = filename
while STORAGE_IMPL.obj_exist(kb_id, location):
location += "_"
blob = request.files['file'].read()
STORAGE_IMPL.put(kb_id, location, blob)
doc = {
"id": get_uuid(),
"kb_id": kb.id,
"parser_id": kb.parser_id,
"parser_config": kb.parser_config,
"created_by": kb.tenant_id,
"type": filetype,
"name": filename,
"location": location,
"size": len(blob),
"thumbnail": thumbnail(filename, blob),
"suffix": Path(filename).suffix.lstrip("."),
}
form_data = request.form
if "parser_id" in form_data.keys():
if request.form.get("parser_id").strip() in list(vars(ParserType).values())[1:-3]:
doc["parser_id"] = request.form.get("parser_id").strip()
if doc["type"] == FileType.VISUAL:
doc["parser_id"] = ParserType.PICTURE.value
if doc["type"] == FileType.AURAL:
doc["parser_id"] = ParserType.AUDIO.value
if re.search(r"\.(ppt|pptx|pages)$", filename):
doc["parser_id"] = ParserType.PRESENTATION.value
if re.search(r"\.(eml)$", filename):
doc["parser_id"] = ParserType.EMAIL.value
doc_result = DocumentService.insert(doc)
FileService.add_file_from_kb(doc, kb_folder["id"], kb.tenant_id)
except Exception as e:
return server_error_response(e)
if "run" in form_data.keys():
if request.form.get("run").strip() == "1":
try:
info = {"run": 1, "progress": 0}
info["progress_msg"] = ""
info["chunk_num"] = 0
info["token_num"] = 0
DocumentService.update_by_id(doc["id"], info)
# if str(req["run"]) == TaskStatus.CANCEL.value:
tenant_id = DocumentService.get_tenant_id(doc["id"])
if not tenant_id:
return get_data_error_result(message="Tenant not found!")
# e, doc = DocumentService.get_by_id(doc["id"])
TaskService.filter_delete([Task.doc_id == doc["id"]])
e, doc = DocumentService.get_by_id(doc["id"])
doc = doc.to_dict()
doc["tenant_id"] = tenant_id
bucket, name = File2DocumentService.get_storage_address(doc_id=doc["id"])
queue_tasks(doc, bucket, name, 0)
except Exception as e:
return server_error_response(e)
return get_json_result(data=doc_result.to_json())
@manager.route('/document/upload_and_parse', methods=['POST']) # noqa: F821
@validate_request("conversation_id")
def upload_parse():
token = request.headers.get('Authorization').split()[1]
objs = APIToken.query(token=token)
if not objs:
return get_json_result(
data=False, message='Authentication error: API key is invalid!"', code=RetCode.AUTHENTICATION_ERROR)
if 'file' not in request.files:
return get_json_result(
data=False, message='No file part!', code=RetCode.ARGUMENT_ERROR)
file_objs = request.files.getlist('file')
for file_obj in file_objs:
if file_obj.filename == '':
return get_json_result(
data=False, message='No file selected!', code=RetCode.ARGUMENT_ERROR)
doc_ids = doc_upload_and_parse(request.form.get("conversation_id"), file_objs, objs[0].tenant_id)
return get_json_result(data=doc_ids)
@manager.route('/list_chunks', methods=['POST'])  # noqa: F821
# @login_required
def list_chunks():
    token = request.headers.get('Authorization').split()[1]
    objs = APIToken.query(token=token)
    if not objs:
        return get_json_result(
            data=False, message='Authentication error: API key is invalid!', code=RetCode.AUTHENTICATION_ERROR)

    req = request.json

    try:
        if "doc_name" in req.keys():
            tenant_id = DocumentService.get_tenant_id_by_name(req['doc_name'])
            doc_id = DocumentService.get_doc_id_by_doc_name(req['doc_name'])
        elif "doc_id" in req.keys():
            tenant_id = DocumentService.get_tenant_id(req['doc_id'])
            doc_id = req['doc_id']
        else:
            return get_json_result(
                data=False, message="Can't find doc_name or doc_id"
            )
        kb_ids = KnowledgebaseService.get_kb_ids(tenant_id)

        res = globals.retriever.chunk_list(doc_id, tenant_id, kb_ids)
        res = [
            {
                "content": res_item["content_with_weight"],
                "doc_name": res_item["docnm_kwd"],
                "image_id": res_item["img_id"]
            } for res_item in res
        ]
    except Exception as e:
        return server_error_response(e)
    return get_json_result(data=res)
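A hedged request sketch for this endpoint; the body keys (doc_id or doc_name) and response fields (content, doc_name, image_id) come straight from the handler above, while the URL and token are assumptions:

import requests  # assumed available

resp = requests.post(
    "http://127.0.0.1:9380/v1/api/list_chunks",  # hypothetical base URL
    headers={"Authorization": "Bearer <API_KEY>"},
    json={"doc_id": "<DOC_ID>"},  # or {"doc_name": "report.pdf"}
)
for chunk in resp.json()["data"]:
    print(chunk["doc_name"], chunk["content"][:80])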
@manager.route('/get_chunk/<chunk_id>', methods=['GET'])  # noqa: F821
# @login_required
def get_chunk(chunk_id):
    from rag.nlp import search
    token = request.headers.get('Authorization').split()[1]
    objs = APIToken.query(token=token)
    if not objs:
        return get_json_result(
            data=False, message='Authentication error: API key is invalid!', code=RetCode.AUTHENTICATION_ERROR)

    try:
        tenant_id = objs[0].tenant_id
        kb_ids = KnowledgebaseService.get_kb_ids(tenant_id)
        chunk = globals.docStoreConn.get(chunk_id, search.index_name(tenant_id), kb_ids)
        if chunk is None:
            return server_error_response(Exception("Chunk not found"))
        k = []
        for n in chunk.keys():
            if re.search(r"(_vec$|_sm_|_tks|_ltks)", n):
                k.append(n)
        for n in k:
            del chunk[n]

        return get_json_result(data=chunk)
    except Exception as e:
        return server_error_response(e)
@manager.route('/list_kb_docs', methods=['POST'])  # noqa: F821
# @login_required
def list_kb_docs():
    token = request.headers.get('Authorization').split()[1]
    objs = APIToken.query(token=token)
    if not objs:
        return get_json_result(
            data=False, message='Authentication error: API key is invalid!', code=RetCode.AUTHENTICATION_ERROR)

    req = request.json
    tenant_id = objs[0].tenant_id
    kb_name = req.get("kb_name", "").strip()

    try:
        e, kb = KnowledgebaseService.get_by_name(kb_name, tenant_id)
        if not e:
            return get_data_error_result(
                message="Can't find this knowledgebase!")
        kb_id = kb.id
    except Exception as e:
        return server_error_response(e)

    page_number = int(req.get("page", 1))
    items_per_page = int(req.get("page_size", 15))
    orderby = req.get("orderby", "create_time")
    desc = req.get("desc", True)
    keywords = req.get("keywords", "")
    status = req.get("status", [])
    if status:
        invalid_status = {s for s in status if s not in VALID_TASK_STATUS}
        if invalid_status:
            return get_data_error_result(
                message=f"Invalid filter status conditions: {', '.join(invalid_status)}"
            )
    types = req.get("types", [])
    if types:
        invalid_types = {t for t in types if t not in VALID_FILE_TYPES}
        if invalid_types:
            return get_data_error_result(
                message=f"Invalid filter conditions: {', '.join(invalid_types)} type{'s' if len(invalid_types) > 1 else ''}"
            )

    try:
        docs, tol = DocumentService.get_by_kb_id(
            kb_id, page_number, items_per_page, orderby, desc, keywords, status, types)
        docs = [{"doc_id": doc['id'], "doc_name": doc['name']} for doc in docs]
        return get_json_result(data={"total": tol, "docs": docs})
    except Exception as e:
        return server_error_response(e)
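A hedged sketch of a paginated, filtered listing call; every payload key below appears in the handler, and the status/types values must stay within VALID_TASK_STATUS and VALID_FILE_TYPES. URL and token are placeholders:

import requests  # assumed available

payload = {
    "kb_name": "my_kb",
    "page": 1,
    "page_size": 15,
    "orderby": "create_time",
    "desc": True,
    "keywords": "",
}
resp = requests.post(
    "http://127.0.0.1:9380/v1/api/list_kb_docs",  # hypothetical base URL
    headers={"Authorization": "Bearer <API_KEY>"},
    json=payload,
)
body = resp.json()["data"]
print(body["total"], [d["doc_name"] for d in body["docs"]])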
@manager.route('/document/infos', methods=['POST'])  # noqa: F821
@validate_request("doc_ids")
def docinfos():
    token = request.headers.get('Authorization').split()[1]
    objs = APIToken.query(token=token)
    if not objs:
        return get_json_result(
            data=False, message='Authentication error: API key is invalid!', code=RetCode.AUTHENTICATION_ERROR)
    req = request.json
    doc_ids = req["doc_ids"]
    docs = DocumentService.get_by_ids(doc_ids)
    return get_json_result(data=list(docs.dicts()))
@manager.route('/document', methods=['DELETE'])  # noqa: F821
# @login_required
def document_rm():
    token = request.headers.get('Authorization').split()[1]
    objs = APIToken.query(token=token)
    if not objs:
        return get_json_result(
            data=False, message='Authentication error: API key is invalid!', code=RetCode.AUTHENTICATION_ERROR)

    tenant_id = objs[0].tenant_id
    req = request.json
    try:
        doc_ids = DocumentService.get_doc_ids_by_doc_names(req.get("doc_names", []))
        for doc_id in req.get("doc_ids", []):
            if doc_id not in doc_ids:
                doc_ids.append(doc_id)

        if not doc_ids:
            return get_json_result(
                data=False, message="Can't find doc_names or doc_ids"
            )
    except Exception as e:
        return server_error_response(e)

    root_folder = FileService.get_root_folder(tenant_id)
    pf_id = root_folder["id"]
    FileService.init_knowledgebase_docs(pf_id, tenant_id)

    errors = ""
    docs = DocumentService.get_by_ids(doc_ids)
    doc_dic = {}
    for doc in docs:
        doc_dic[doc.id] = doc
    for doc_id in doc_ids:
        try:
            if doc_id not in doc_dic:
                return get_data_error_result(message="Document not found!")
            doc = doc_dic[doc_id]
            tenant_id = DocumentService.get_tenant_id(doc_id)
            if not tenant_id:
                return get_data_error_result(message="Tenant not found!")

            b, n = File2DocumentService.get_storage_address(doc_id=doc_id)

            if not DocumentService.remove_document(doc, tenant_id):
                return get_data_error_result(
                    message="Database error (Document removal)!")

            f2d = File2DocumentService.get_by_document_id(doc_id)
            FileService.filter_delete([File.source_type == FileSource.KNOWLEDGEBASE, File.id == f2d[0].file_id])
            File2DocumentService.delete_by_document_id(doc_id)

            STORAGE_IMPL.rm(b, n)
        except Exception as e:
            errors += str(e)

    if errors:
        return get_json_result(data=False, message=errors, code=RetCode.SERVER_ERROR)

    return get_json_result(data=True)
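A hedged deletion sketch; both doc_names and doc_ids are accepted and merged by the handler, so either (or both) may be supplied. URL and token remain placeholders:

import requests  # assumed available

resp = requests.delete(
    "http://127.0.0.1:9380/v1/api/document",  # hypothetical base URL
    headers={"Authorization": "Bearer <API_KEY>"},
    json={"doc_names": ["report.pdf"], "doc_ids": ["<DOC_ID>"]},
)
assert resp.json()["data"] is True  # partial failures come back as an error message instead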
@manager.route('/completion_aibotk', methods=['POST'])  # noqa: F821
@validate_request("Authorization", "conversation_id", "word")
def completion_faq():
    import base64
    req = request.json

    token = req["Authorization"]
    objs = APIToken.query(token=token)
    if not objs:
        return get_json_result(
            data=False, message='Authentication error: API key is invalid!', code=RetCode.AUTHENTICATION_ERROR)

    e, conv = API4ConversationService.get_by_id(req["conversation_id"])
    if not e:
        return get_data_error_result(message="Conversation not found!")
    if "quote" not in req:
        req["quote"] = True

    msg = []
    msg.append({"role": "user", "content": req["word"]})
    if not msg[-1].get("id"):
        msg[-1]["id"] = get_uuid()
    message_id = msg[-1]["id"]

    def fillin_conv(ans):
        nonlocal conv, message_id
        if not conv.reference:
            conv.reference.append(ans["reference"])
        else:
            conv.reference[-1] = ans["reference"]
        conv.message[-1] = {"role": "assistant", "content": ans["answer"], "id": message_id}
        ans["id"] = message_id

    try:
        if conv.source == "agent":
            conv.message.append(msg[-1])
            e, cvs = UserCanvasService.get_by_id(conv.dialog_id)
            if not e:
                return server_error_response("canvas not found.")

            if not isinstance(cvs.dsl, str):
                cvs.dsl = json.dumps(cvs.dsl, ensure_ascii=False)

            if not conv.reference:
                conv.reference = []
            conv.message.append({"role": "assistant", "content": "", "id": message_id})
            conv.reference.append({"chunks": [], "doc_aggs": []})

            final_ans = {"reference": [], "doc_aggs": []}
            canvas = Canvas(cvs.dsl, objs[0].tenant_id)

            canvas.messages.append(msg[-1])
            canvas.add_user_input(msg[-1]["content"])
            answer = canvas.run(stream=False)

            assert answer is not None, "Nothing. Is it over?"

            data_type_picture = {
                "type": 3,
                "url": "base64 content"
            }
            data = [
                {
                    "type": 1,
                    "content": ""
                }
            ]
            final_ans["content"] = "\n".join(answer["content"]) if "content" in answer else ""
            canvas.messages.append({"role": "assistant", "content": final_ans["content"], "id": message_id})
            if final_ans.get("reference"):
                canvas.reference.append(final_ans["reference"])
            cvs.dsl = json.loads(str(canvas))

            ans = {"answer": final_ans["content"], "reference": final_ans.get("reference", [])}
            data[0]["content"] += re.sub(r'##\d\$\$', '', ans["answer"])
            fillin_conv(ans)
            API4ConversationService.append_message(conv.id, conv.to_dict())

            chunk_idxs = [int(match[2]) for match in re.findall(r'##\d\$\$', ans["answer"])]
            for chunk_idx in chunk_idxs[:1]:
                if ans["reference"]["chunks"][chunk_idx]["img_id"]:
                    try:
                        bkt, nm = ans["reference"]["chunks"][chunk_idx]["img_id"].split("-")
                        response = STORAGE_IMPL.get(bkt, nm)
                        data_type_picture["url"] = base64.b64encode(response).decode('utf-8')
                        data.append(data_type_picture)
                        break
                    except Exception as e:
                        return server_error_response(e)

            response = {"code": 200, "msg": "success", "data": data}
            return response

        # ******************For dialog******************
        conv.message.append(msg[-1])
        e, dia = DialogService.get_by_id(conv.dialog_id)
        if not e:
            return get_data_error_result(message="Dialog not found!")
        del req["conversation_id"]

        if not conv.reference:
            conv.reference = []
        conv.message.append({"role": "assistant", "content": "", "id": message_id})
        conv.reference.append({"chunks": [], "doc_aggs": []})

        data_type_picture = {
            "type": 3,
            "url": "base64 content"
        }
        data = [
            {
                "type": 1,
                "content": ""
            }
        ]

        ans = ""
        for a in chat(dia, msg, stream=False, **req):
            ans = a
            break
        data[0]["content"] += re.sub(r'##\d\$\$', '', ans["answer"])
        fillin_conv(ans)
        API4ConversationService.append_message(conv.id, conv.to_dict())

        chunk_idxs = [int(match[2]) for match in re.findall(r'##\d\$\$', ans["answer"])]
        for chunk_idx in chunk_idxs[:1]:
            if ans["reference"]["chunks"][chunk_idx]["img_id"]:
                try:
                    bkt, nm = ans["reference"]["chunks"][chunk_idx]["img_id"].split("-")
                    response = STORAGE_IMPL.get(bkt, nm)
                    data_type_picture["url"] = base64.b64encode(response).decode('utf-8')
                    data.append(data_type_picture)
                    break
                except Exception as e:
                    return server_error_response(e)

        response = {"code": 200, "msg": "success", "data": data}
        return response

    except Exception as e:
        return server_error_response(e)
@manager.route('/retrieval', methods=['POST'])  # noqa: F821
@validate_request("kb_id", "question")
def retrieval():
    token = request.headers.get('Authorization').split()[1]
    objs = APIToken.query(token=token)
    if not objs:
        return get_json_result(
            data=False, message='Authentication error: API key is invalid!', code=RetCode.AUTHENTICATION_ERROR)

    req = request.json
    kb_ids = req.get("kb_id", [])
    doc_ids = req.get("doc_ids", [])
    question = req.get("question")
    page = int(req.get("page", 1))
    size = int(req.get("page_size", 30))
    similarity_threshold = float(req.get("similarity_threshold", 0.2))
    vector_similarity_weight = float(req.get("vector_similarity_weight", 0.3))
    top = int(req.get("top_k", 1024))
    highlight = bool(req.get("highlight", False))

    try:
        kbs = KnowledgebaseService.get_by_ids(kb_ids)
        embd_nms = list(set([kb.embd_id for kb in kbs]))
        if len(embd_nms) != 1:
            return get_json_result(
                data=False, message='Knowledge bases use different embedding models or do not exist.',
                code=RetCode.AUTHENTICATION_ERROR)

        embd_mdl = LLMBundle(kbs[0].tenant_id, LLMType.EMBEDDING, llm_name=kbs[0].embd_id)
        rerank_mdl = None
        if req.get("rerank_id"):
            rerank_mdl = LLMBundle(kbs[0].tenant_id, LLMType.RERANK, llm_name=req["rerank_id"])
        if req.get("keyword", False):
            chat_mdl = LLMBundle(kbs[0].tenant_id, LLMType.CHAT)
            question += keyword_extraction(chat_mdl, question)

        ranks = globals.retriever.retrieval(question, embd_mdl, kbs[0].tenant_id, kb_ids, page, size,
                                            similarity_threshold, vector_similarity_weight, top,
                                            doc_ids, rerank_mdl=rerank_mdl, highlight=highlight,
                                            rank_feature=label_question(question, kbs))
        for c in ranks["chunks"]:
            c.pop("vector", None)

        return get_json_result(data=ranks)
    except Exception as e:
        if str(e).find("not_found") > 0:
            return get_json_result(data=False, message='No chunk found! Check the chunk status please!',
                                   code=RetCode.DATA_ERROR)
        return server_error_response(e)
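A hedged retrieval sketch; every payload key below is read by the handler above (note that all kb_id entries must share one embedding model, and keyword=True triggers LLM keyword expansion first). The URL and token are placeholders:

import requests  # assumed available

resp = requests.post(
    "http://127.0.0.1:9380/v1/api/retrieval",  # hypothetical base URL
    headers={"Authorization": "Bearer <API_KEY>"},
    json={
        "kb_id": ["<KB_ID>"],
        "question": "What is the refund policy?",
        "page": 1,
        "page_size": 30,
        "similarity_threshold": 0.2,
        "vector_similarity_weight": 0.3,
        "top_k": 1024,
        "highlight": False,
        "keyword": False,
    },
)
for chunk in resp.json()["data"]["chunks"]:
    print(chunk.get("content_with_weight", "")[:80])  # vectors are stripped server-side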

View File

@@ -34,7 +34,7 @@ class GithubOAuthClient(OAuthClient):
     def fetch_user_info(self, access_token, **kwargs):
         """
-        Fetch github user info.
+        Fetch GitHub user info.
         """
         user_info = {}
         try:

View File

@@ -43,7 +43,8 @@ class OIDCClient(OAuthClient):
         self.jwks_uri = config['jwks_uri']
 
-    def _load_oidc_metadata(self, issuer):
+    @staticmethod
+    def _load_oidc_metadata(issuer):
         """
         Load OIDC metadata from `/.well-known/openid-configuration`.
         """

View File

@@ -18,12 +18,8 @@ import logging
 import re
 import sys
 from functools import partial
-import flask
 import trio
-from flask import request, Response
-from flask_login import login_required, current_user
+from quart import request, Response, make_response
 from agent.component import LLM
 from api.db import CanvasCategory, FileType
 from api.db.services.canvas_service import CanvasTemplateService, UserCanvasService, API4ConversationService
@@ -35,7 +31,8 @@ from api.db.services.user_service import TenantService
 from api.db.services.user_canvas_version import UserCanvasVersionService
 from common.constants import RetCode
 from common.misc_utils import get_uuid
-from api.utils.api_utils import get_json_result, server_error_response, validate_request, get_data_error_result
+from api.utils.api_utils import get_json_result, server_error_response, validate_request, get_data_error_result, \
+    request_json
 from agent.canvas import Canvas
 from peewee import MySQLDatabase, PostgresqlDatabase
 from api.db.db_models import APIToken, Task
@@ -45,7 +42,8 @@ from api.utils.file_utils import filename_type, read_potential_broken_pdf
 from rag.flow.pipeline import Pipeline
 from rag.nlp import search
 from rag.utils.redis_conn import REDIS_CONN
-from common import globals
+from common import settings
+from api.apps import login_required, current_user
 @manager.route('/templates', methods=['GET'])  # noqa: F821
@@ -57,8 +55,9 @@ def templates():
 @manager.route('/rm', methods=['POST'])  # noqa: F821
 @validate_request("canvas_ids")
 @login_required
-def rm():
-    for i in request.json["canvas_ids"]:
+async def rm():
+    req = await request_json()
+    for i in req["canvas_ids"]:
         if not UserCanvasService.accessible(i, current_user.id):
             return get_json_result(
                 data=False, message='Only owner of canvas authorized for this operation.',
@@ -70,8 +69,8 @@ def rm():
 @manager.route('/set', methods=['POST'])  # noqa: F821
 @validate_request("dsl", "title")
 @login_required
-def save():
-    req = request.json
+async def save():
+    req = await request_json()
     if not isinstance(req["dsl"], str):
         req["dsl"] = json.dumps(req["dsl"], ensure_ascii=False)
     req["dsl"] = json.loads(req["dsl"])
@@ -129,8 +128,8 @@ def getsse(canvas_id):
 @manager.route('/completion', methods=['POST'])  # noqa: F821
 @validate_request("id")
 @login_required
-def run():
-    req = request.json
+async def run():
+    req = await request_json()
     query = req.get("query", "")
     files = req.get("files", [])
     inputs = req.get("inputs", {})
@@ -156,20 +155,22 @@ def run():
         return get_json_result(data={"message_id": task_id})
     try:
-        canvas = Canvas(cvs.dsl, current_user.id, req["id"])
+        canvas = Canvas(cvs.dsl, current_user.id)
     except Exception as e:
         return server_error_response(e)
-    def sse():
+    async def sse():
         nonlocal canvas, user_id
         try:
-            for ans in canvas.run(query=query, files=files, user_id=user_id, inputs=inputs):
+            async for ans in canvas.run(query=query, files=files, user_id=user_id, inputs=inputs):
                 yield "data:" + json.dumps(ans, ensure_ascii=False) + "\n\n"
             cvs.dsl = json.loads(str(canvas))
             UserCanvasService.update_by_id(req["id"], cvs.to_dict())
         except Exception as e:
             logging.exception(e)
+            canvas.cancel_task()
             yield "data:" + json.dumps({"code": 500, "message": str(e), "data": False}, ensure_ascii=False) + "\n\n"
     resp = Response(sse(), mimetype="text/event-stream")
@@ -177,14 +178,15 @@ def run():
     resp.headers.add_header("Connection", "keep-alive")
     resp.headers.add_header("X-Accel-Buffering", "no")
     resp.headers.add_header("Content-Type", "text/event-stream; charset=utf-8")
+    #resp.call_on_close(lambda: canvas.cancel_task())
     return resp
 @manager.route('/rerun', methods=['POST'])  # noqa: F821
 @validate_request("id", "dsl", "component_id")
 @login_required
-def rerun():
-    req = request.json
+async def rerun():
+    req = await request_json()
     doc = PipelineOperationLogService.get_documents_info(req["id"])
     if not doc:
         return get_data_error_result(message="Document not found.")
@@ -192,8 +194,8 @@ def rerun():
     if 0 < doc["progress"] < 1:
         return get_data_error_result(message=f"`{doc['name']}` is processing...")
-    if globals.docStoreConn.indexExist(search.index_name(current_user.id), doc["kb_id"]):
-        globals.docStoreConn.delete({"doc_id": doc["id"]}, search.index_name(current_user.id), doc["kb_id"])
+    if settings.docStoreConn.indexExist(search.index_name(current_user.id), doc["kb_id"]):
+        settings.docStoreConn.delete({"doc_id": doc["id"]}, search.index_name(current_user.id), doc["kb_id"])
     doc["progress_msg"] = ""
     doc["chunk_num"] = 0
     doc["token_num"] = 0
@@ -221,8 +223,8 @@ def cancel(task_id):
 @manager.route('/reset', methods=['POST'])  # noqa: F821
 @validate_request("id")
 @login_required
-def reset():
-    req = request.json
+async def reset():
+    req = await request_json()
     if not UserCanvasService.accessible(req["id"], current_user.id):
         return get_json_result(
             data=False, message='Only owner of canvas authorized for this operation.',
@@ -242,7 +244,7 @@ def reset():
 @manager.route("/upload/<canvas_id>", methods=["POST"])  # noqa: F821
-def upload(canvas_id):
+async def upload(canvas_id):
     e, cvs = UserCanvasService.get_by_canvas_id(canvas_id)
     if not e:
         return get_data_error_result(message="canvas not found.")
@@ -308,7 +310,8 @@ def upload(canvas_id):
     except Exception as e:
         return server_error_response(e)
-    file = request.files['file']
+    files = await request.files
+    file = files['file']
     try:
         DocumentService.check_doc_health(user_id, file.filename)
         return get_json_result(data=structured(file.filename, filename_type(file.filename), file.read(), file.content_type))
@@ -339,8 +342,8 @@ def input_form():
 @manager.route('/debug', methods=['POST'])  # noqa: F821
 @validate_request("id", "component_id", "params")
 @login_required
-def debug():
-    req = request.json
+async def debug():
+    req = await request_json()
     if not UserCanvasService.accessible(req["id"], current_user.id):
         return get_json_result(
             data=False, message='Only owner of canvas authorized for this operation.',
@@ -371,8 +374,8 @@ def debug():
 @manager.route('/test_db_connect', methods=['POST'])  # noqa: F821
 @validate_request("db_type", "database", "username", "host", "port", "password")
 @login_required
-def test_db_connect():
-    req = request.json
+async def test_db_connect():
+    req = await request_json()
     try:
         if req["db_type"] in ["mysql", "mariadb"]:
             db = MySQLDatabase(req["database"], user=req["username"], host=req["host"], port=req["port"],
@@ -410,32 +413,31 @@ def test_db_connect():
             ibm_db.close(conn)
             return get_json_result(data="Database Connection Successful!")
         elif req["db_type"] == 'trino':
-            def _parse_catalog_schema(db: str):
-                if not db:
+            def _parse_catalog_schema(db_name: str):
+                if not db_name:
                     return None, None
-                if "." in db:
-                    c, s = db.split(".", 1)
-                elif "/" in db:
-                    c, s = db.split("/", 1)
+                if "." in db_name:
+                    catalog_name, schema_name = db_name.split(".", 1)
+                elif "/" in db_name:
+                    catalog_name, schema_name = db_name.split("/", 1)
                 else:
-                    c, s = db, "default"
-                return c, s
+                    catalog_name, schema_name = db_name, "default"
+                return catalog_name, schema_name
             try:
                 import trino
                 import os
-                from trino.auth import BasicAuthentication
-            except Exception:
-                return server_error_response("Missing dependency 'trino'. Please install: pip install trino")
+            except Exception as e:
+                return server_error_response(f"Missing dependency 'trino'. Please install: pip install trino, detail: {e}")
             catalog, schema = _parse_catalog_schema(req["database"])
             if not catalog:
                 return server_error_response("For Trino, 'database' must be 'catalog.schema' or at least 'catalog'.")
             http_scheme = "https" if os.environ.get("TRINO_USE_TLS", "0") == "1" else "http"
             auth = None
             if http_scheme == "https" and req.get("password"):
-                auth = BasicAuthentication(req.get("username") or "ragflow", req["password"])
+                auth = trino.BasicAuthentication(req.get("username") or "ragflow", req["password"])
             conn = trino.dbapi.connect(
                 host=req["host"],
@@ -468,8 +470,8 @@ def test_db_connect():
 @login_required
 def getlistversion(canvas_id):
     try:
-        list =sorted([c.to_dict() for c in UserCanvasVersionService.list_by_canvas_id(canvas_id)], key=lambda x: x["update_time"]*-1)
-        return get_json_result(data=list)
+        versions =sorted([c.to_dict() for c in UserCanvasVersionService.list_by_canvas_id(canvas_id)], key=lambda x: x["update_time"]*-1)
+        return get_json_result(data=versions)
     except Exception as e:
         return get_data_error_result(message=f"Error getting history files: {e}")
@@ -479,7 +481,6 @@ def getlistversion(canvas_id):
 @login_required
 def getversion( version_id):
     try:
-
         e, version = UserCanvasVersionService.get_by_id(version_id)
         if version:
             return get_json_result(data=version.to_dict())
@@ -518,8 +519,8 @@ def list_canvas():
 @manager.route('/setting', methods=['POST'])  # noqa: F821
 @validate_request("id", "title", "permission")
 @login_required
-def setting():
-    req = request.json
+async def setting():
+    req = await request_json()
     req["user_id"] = current_user.id
     if not UserCanvasService.accessible(req["id"], current_user.id):
@@ -546,11 +547,11 @@ def trace():
     cvs_id = request.args.get("canvas_id")
     msg_id = request.args.get("message_id")
     try:
-        bin = REDIS_CONN.get(f"{cvs_id}-{msg_id}-logs")
-        if not bin:
+        binary = REDIS_CONN.get(f"{cvs_id}-{msg_id}-logs")
+        if not binary:
             return get_json_result(data={})
-        return get_json_result(data=json.loads(bin.encode("utf-8")))
+        return get_json_result(data=json.loads(binary.encode("utf-8")))
     except Exception as e:
         logging.exception(e)
@@ -600,8 +601,8 @@ def prompts():
 @manager.route('/download', methods=['GET'])  # noqa: F821
-def download():
+async def download():
     id = request.args.get("id")
     created_by = request.args.get("created_by")
     blob = FileService.get_blob(created_by, id)
-    return flask.make_response(blob)
+    return await make_response(blob)
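The recurring shape of these hunks is a Flask-to-Quart migration: handlers become async def, request bodies are awaited, and SSE generators turn into async generators. The request_json() helper itself is imported but not shown in this diff; a plausible minimal version, assuming Quart's request API, might look like the sketch below rather than the project's actual implementation:

from quart import request

async def request_json() -> dict:
    # Await the JSON body and fall back to an empty dict so handlers can
    # call req.get(...) without None checks. This is only a sketch of the
    # helper imported from api.utils.api_utils; its real body is not part
    # of this compare view.
    data = await request.get_json(silent=True)
    return data or {}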

View File

@@ -18,32 +18,31 @@ import json
 import re
 import xxhash
-from flask import request
-from flask_login import current_user, login_required
-from api import settings
+from quart import request
 from api.db.services.dialog_service import meta_filter
 from api.db.services.document_service import DocumentService
 from api.db.services.knowledgebase_service import KnowledgebaseService
 from api.db.services.llm_service import LLMBundle
 from api.db.services.search_service import SearchService
 from api.db.services.user_service import UserTenantService
-from api.utils.api_utils import get_data_error_result, get_json_result, server_error_response, validate_request
+from api.utils.api_utils import get_data_error_result, get_json_result, server_error_response, validate_request, \
+    request_json
 from rag.app.qa import beAdoc, rmPrefix
 from rag.app.tag import label_question
 from rag.nlp import rag_tokenizer, search
 from rag.prompts.generator import gen_meta_filter, cross_languages, keyword_extraction
-from rag.settings import PAGERANK_FLD
 from common.string_utils import remove_redundant_spaces
-from common.constants import RetCode, LLMType, ParserType
-from common import globals
+from common.constants import RetCode, LLMType, ParserType, PAGERANK_FLD
+from common import settings
+from api.apps import login_required, current_user
 @manager.route('/list', methods=['POST'])  # noqa: F821
 @login_required
 @validate_request("doc_id")
-def list_chunk():
-    req = request.json
+async def list_chunk():
+    req = await request_json()
     doc_id = req["doc_id"]
     page = int(req.get("page", 1))
     size = int(req.get("size", 30))
@@ -61,7 +60,7 @@ def list_chunk():
     }
     if "available_int" in req:
         query["available_int"] = int(req["available_int"])
-    sres = globals.retriever.search(query, search.index_name(tenant_id), kb_ids, highlight=["content_ltks"])
+    sres = settings.retriever.search(query, search.index_name(tenant_id), kb_ids, highlight=["content_ltks"])
     res = {"total": sres.total, "chunks": [], "doc": doc.to_dict()}
     for id in sres.ids:
         d = {
@@ -99,7 +98,7 @@ def get():
             return get_data_error_result(message="Tenant not found!")
         for tenant in tenants:
             kb_ids = KnowledgebaseService.get_kb_ids(tenant.tenant_id)
-            chunk = globals.docStoreConn.get(chunk_id, search.index_name(tenant.tenant_id), kb_ids)
+            chunk = settings.docStoreConn.get(chunk_id, search.index_name(tenant.tenant_id), kb_ids)
             if chunk:
                 break
         if chunk is None:
@@ -123,8 +122,8 @@ def get():
 @manager.route('/set', methods=['POST'])  # noqa: F821
 @login_required
 @validate_request("doc_id", "chunk_id", "content_with_weight")
-def set():
-    req = request.json
+async def set():
+    req = await request_json()
     d = {
         "id": req["chunk_id"],
         "content_with_weight": req["content_with_weight"]}
@@ -171,7 +170,7 @@ def set():
             v, c = embd_mdl.encode([doc.name, req["content_with_weight"] if not d.get("question_kwd") else "\n".join(d["question_kwd"])])
             v = 0.1 * v[0] + 0.9 * v[1] if doc.parser_id != ParserType.QA else v[1]
             d["q_%d_vec" % len(v)] = v.tolist()
-            globals.docStoreConn.update({"id": req["chunk_id"]}, d, search.index_name(tenant_id), doc.kb_id)
+            settings.docStoreConn.update({"id": req["chunk_id"]}, d, search.index_name(tenant_id), doc.kb_id)
             return get_json_result(data=True)
     except Exception as e:
         return server_error_response(e)
@@ -180,14 +179,14 @@ def set():
 @manager.route('/switch', methods=['POST'])  # noqa: F821
 @login_required
 @validate_request("chunk_ids", "available_int", "doc_id")
-def switch():
-    req = request.json
+async def switch():
+    req = await request_json()
     try:
         e, doc = DocumentService.get_by_id(req["doc_id"])
         if not e:
             return get_data_error_result(message="Document not found!")
         for cid in req["chunk_ids"]:
-            if not globals.docStoreConn.update({"id": cid},
+            if not settings.docStoreConn.update({"id": cid},
                                                {"available_int": int(req["available_int"])},
                                                search.index_name(DocumentService.get_tenant_id(req["doc_id"])),
                                                doc.kb_id):
@@ -200,14 +199,13 @@ def switch():
 @manager.route('/rm', methods=['POST'])  # noqa: F821
 @login_required
 @validate_request("chunk_ids", "doc_id")
-def rm():
-    from rag.utils.storage_factory import STORAGE_IMPL
-    req = request.json
+async def rm():
+    req = await request_json()
     try:
         e, doc = DocumentService.get_by_id(req["doc_id"])
         if not e:
             return get_data_error_result(message="Document not found!")
-        if not globals.docStoreConn.delete({"id": req["chunk_ids"]},
+        if not settings.docStoreConn.delete({"id": req["chunk_ids"]},
                                            search.index_name(DocumentService.get_tenant_id(req["doc_id"])),
                                            doc.kb_id):
             return get_data_error_result(message="Chunk deleting failure")
@@ -215,8 +213,8 @@ def rm():
         chunk_number = len(deleted_chunk_ids)
         DocumentService.decrement_chunk_num(doc.id, doc.kb_id, 1, chunk_number, 0)
         for cid in deleted_chunk_ids:
-            if STORAGE_IMPL.obj_exist(doc.kb_id, cid):
-                STORAGE_IMPL.rm(doc.kb_id, cid)
+            if settings.STORAGE_IMPL.obj_exist(doc.kb_id, cid):
+                settings.STORAGE_IMPL.rm(doc.kb_id, cid)
         return get_json_result(data=True)
     except Exception as e:
         return server_error_response(e)
@@ -225,8 +223,8 @@ def rm():
 @manager.route('/create', methods=['POST'])  # noqa: F821
 @login_required
 @validate_request("doc_id", "content_with_weight")
-def create():
-    req = request.json
+async def create():
+    req = await request_json()
     chunck_id = xxhash.xxh64((req["content_with_weight"] + req["doc_id"]).encode("utf-8")).hexdigest()
     d = {"id": chunck_id, "content_ltks": rag_tokenizer.tokenize(req["content_with_weight"]),
          "content_with_weight": req["content_with_weight"]}
@@ -271,7 +269,7 @@ def create():
         v, c = embd_mdl.encode([doc.name, req["content_with_weight"] if not d["question_kwd"] else "\n".join(d["question_kwd"])])
         v = 0.1 * v[0] + 0.9 * v[1]
         d["q_%d_vec" % len(v)] = v.tolist()
-        globals.docStoreConn.insert([d], search.index_name(tenant_id), doc.kb_id)
+        settings.docStoreConn.insert([d], search.index_name(tenant_id), doc.kb_id)
         DocumentService.increment_chunk_num(
             doc.id, doc.kb_id, c, 1, 0)
@@ -283,8 +281,8 @@ def create():
 @manager.route('/retrieval_test', methods=['POST'])  # noqa: F821
 @login_required
 @validate_request("kb_id", "question")
-def retrieval_test():
-    req = request.json
+async def retrieval_test():
+    req = await request_json()
     page = int(req.get("page", 1))
     size = int(req.get("size", 30))
     question = req["question"]
@@ -347,7 +345,7 @@ def retrieval_test():
         question += keyword_extraction(chat_mdl, question)
     labels = label_question(question, [kb])
-    ranks = globals.retriever.retrieval(question, embd_mdl, tenant_ids, kb_ids, page, size,
+    ranks = settings.retriever.retrieval(question, embd_mdl, tenant_ids, kb_ids, page, size,
                                         float(req.get("similarity_threshold", 0.0)),
                                         float(req.get("vector_similarity_weight", 0.3)),
                                         top,
@@ -386,7 +384,7 @@ def knowledge_graph():
         "doc_ids": [doc_id],
         "knowledge_graph_kwd": ["graph", "mind_map"]
     }
-    sres = globals.retriever.search(req, search.index_name(tenant_id), kb_ids)
+    sres = settings.retriever.search(req, search.index_name(tenant_id), kb_ids)
     obj = {"graph": {}, "mind_map": {}}
     for id in sres.ids[:2]:
         ty = sres.field[id]["knowledge_graph_kwd"]
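Most hunks in this file are the mechanical rename from common.globals to common.settings for the shared singletons (retriever, docStoreConn, STORAGE_IMPL). The actual common/settings.py is not part of this compare view; a hedged sketch of the module-level registry the rename implies:

# common/settings.py -- illustrative sketch only, not the project's real module
retriever = None      # search dealer instance, populated at startup
docStoreConn = None   # document-store connection (Elasticsearch/Infinity wrapper)
STORAGE_IMPL = None   # object-store client (MinIO, S3, ...)

def init(config: dict):
    """Populate the shared singletons once at application startup so route
    modules can simply `from common import settings` and use the attributes."""
    global retriever, docStoreConn, STORAGE_IMPL
    ...  # wiring elided; depends on the deployment backends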

View File

@@ -13,21 +13,32 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
+import asyncio
+import json
+import logging
 import time
+import uuid
+from html import escape
+from typing import Any
-from flask import request
-from flask_login import login_required, current_user
+from quart import request, make_response
+from google_auth_oauthlib.flow import Flow
 from api.db import InputType
-from api.db.services.connector_service import ConnectorService, Connector2KbService, SyncLogsService
-from api.utils.api_utils import get_json_result, validate_request, get_data_error_result
-from common.misc_utils import get_uuid
+from api.db.services.connector_service import ConnectorService, SyncLogsService
+from api.utils.api_utils import get_data_error_result, get_json_result, validate_request
 from common.constants import RetCode, TaskStatus
+from common.data_source.config import GOOGLE_DRIVE_WEB_OAUTH_REDIRECT_URI, DocumentSource
+from common.data_source.google_util.constant import GOOGLE_DRIVE_WEB_OAUTH_POPUP_TEMPLATE, GOOGLE_SCOPES
+from common.misc_utils import get_uuid
+from rag.utils.redis_conn import REDIS_CONN
+from api.apps import login_required, current_user
 @manager.route("/set", methods=["POST"])  # noqa: F821
 @login_required
-def set_connector():
-    req = request.json
+async def set_connector():
+    req = await request.json
     if req.get("id"):
         conn = {fld: req[fld] for fld in ["prune_freq", "refresh_freq", "config", "timeout_secs"] if fld in req}
         ConnectorService.update_by_id(req["id"], conn)
@@ -42,13 +53,12 @@ def set_connector():
         "config": req["config"],
         "refresh_freq": int(req.get("refresh_freq", 30)),
         "prune_freq": int(req.get("prune_freq", 720)),
-        "timeout_secs": int(req.get("timeout_secs", 60*29)),
-        "status": TaskStatus.SCHEDULE
+        "timeout_secs": int(req.get("timeout_secs", 60 * 29)),
+        "status": TaskStatus.SCHEDULE,
     }
-    conn["status"] = TaskStatus.SCHEDULE
-    ConnectorService.save(**conn)
-    time.sleep(1)
+    ConnectorService.save(**conn)
+    await asyncio.sleep(1)
     e, conn = ConnectorService.get_by_id(req["id"])
     return get_json_result(data=conn.to_dict())
@@ -73,13 +83,14 @@ def get_connector(connector_id):
 @login_required
 def list_logs(connector_id):
     req = request.args.to_dict(flat=True)
-    return get_json_result(data=SyncLogsService.list_sync_tasks(connector_id, int(req.get("page", 1)), int(req.get("page_size", 15))))
+    arr, total = SyncLogsService.list_sync_tasks(connector_id, int(req.get("page", 1)), int(req.get("page_size", 15)))
+    return get_json_result(data={"total": total, "logs": arr})
 @manager.route("/<connector_id>/resume", methods=["PUT"])  # noqa: F821
 @login_required
-def resume(connector_id):
-    req = request.json
+async def resume(connector_id):
+    req = await request.json
     if req.get("resume"):
         ConnectorService.resume(connector_id, TaskStatus.SCHEDULE)
     else:
@@ -87,14 +98,14 @@ def resume(connector_id):
     return get_json_result(data=True)
-@manager.route("/<connector_id>/link", methods=["POST"])  # noqa: F821
-@validate_request("kb_ids")
+@manager.route("/<connector_id>/rebuild", methods=["PUT"])  # noqa: F821
 @login_required
-def link_kb(connector_id):
-    req = request.json
-    errors = Connector2KbService.link_kb(connector_id, req["kb_ids"], current_user.id)
-    if errors:
-        return get_json_result(data=False, message=errors, code=RetCode.SERVER_ERROR)
+@validate_request("kb_id")
+async def rebuild(connector_id):
+    req = await request.json
+    err = ConnectorService.rebuild(req["kb_id"], connector_id, current_user.id)
+    if err:
+        return get_json_result(data=False, message=err, code=RetCode.SERVER_ERROR)
     return get_json_result(data=True)
@@ -103,4 +114,182 @@ def link_kb(connector_id):
 def rm_connector(connector_id):
     ConnectorService.resume(connector_id, TaskStatus.CANCEL)
     ConnectorService.delete_by_id(connector_id)
     return get_json_result(data=True)
+
+
+GOOGLE_WEB_FLOW_STATE_PREFIX = "google_drive_web_flow_state"
+GOOGLE_WEB_FLOW_RESULT_PREFIX = "google_drive_web_flow_result"
+WEB_FLOW_TTL_SECS = 15 * 60
+
+
+def _web_state_cache_key(flow_id: str) -> str:
+    return f"{GOOGLE_WEB_FLOW_STATE_PREFIX}:{flow_id}"
+
+
+def _web_result_cache_key(flow_id: str) -> str:
+    return f"{GOOGLE_WEB_FLOW_RESULT_PREFIX}:{flow_id}"
+
+
+def _load_credentials(payload: str | dict[str, Any]) -> dict[str, Any]:
+    if isinstance(payload, dict):
+        return payload
+    try:
+        return json.loads(payload)
+    except json.JSONDecodeError as exc:  # pragma: no cover - defensive
+        raise ValueError("Invalid Google credentials JSON.") from exc
+
+
+def _get_web_client_config(credentials: dict[str, Any]) -> dict[str, Any]:
+    web_section = credentials.get("web")
+    if not isinstance(web_section, dict):
+        raise ValueError("Google OAuth JSON must include a 'web' client configuration to use browser-based authorization.")
+    return {"web": web_section}
+
+
+async def _render_web_oauth_popup(flow_id: str, success: bool, message: str):
+    status = "success" if success else "error"
+    auto_close = "window.close();" if success else ""
+    escaped_message = escape(message)
+    payload_json = json.dumps(
+        {
+            "type": "ragflow-google-drive-oauth",
+            "status": status,
+            "flowId": flow_id or "",
+            "message": message,
+        }
+    )
+    html = GOOGLE_DRIVE_WEB_OAUTH_POPUP_TEMPLATE.format(
+        heading="Authorization complete" if success else "Authorization failed",
+        message=escaped_message,
+        payload_json=payload_json,
+        auto_close=auto_close,
+    )
+    response = await make_response(html, 200)
+    response.headers["Content-Type"] = "text/html; charset=utf-8"
+    return response
+
+
+@manager.route("/google-drive/oauth/web/start", methods=["POST"])  # noqa: F821
+@login_required
+@validate_request("credentials")
+async def start_google_drive_web_oauth():
+    if not GOOGLE_DRIVE_WEB_OAUTH_REDIRECT_URI:
+        return get_json_result(
+            code=RetCode.SERVER_ERROR,
+            message="Google Drive OAuth redirect URI is not configured on the server.",
+        )
+
+    req = await request.json or {}
+    raw_credentials = req.get("credentials", "")
+    try:
+        credentials = _load_credentials(raw_credentials)
+    except ValueError as exc:
+        return get_json_result(code=RetCode.ARGUMENT_ERROR, message=str(exc))
+
+    if credentials.get("refresh_token"):
+        return get_json_result(
+            code=RetCode.ARGUMENT_ERROR,
+            message="Uploaded credentials already include a refresh token.",
+        )
+
+    try:
+        client_config = _get_web_client_config(credentials)
+    except ValueError as exc:
+        return get_json_result(code=RetCode.ARGUMENT_ERROR, message=str(exc))
+
+    flow_id = str(uuid.uuid4())
+    try:
+        flow = Flow.from_client_config(client_config, scopes=GOOGLE_SCOPES[DocumentSource.GOOGLE_DRIVE])
+        flow.redirect_uri = GOOGLE_DRIVE_WEB_OAUTH_REDIRECT_URI
+        authorization_url, _ = flow.authorization_url(
+            access_type="offline",
+            include_granted_scopes="true",
+            prompt="consent",
+            state=flow_id,
+        )
+    except Exception as exc:  # pragma: no cover - defensive
+        logging.exception("Failed to create Google OAuth flow: %s", exc)
+        return get_json_result(
+            code=RetCode.SERVER_ERROR,
+            message="Failed to initialize Google OAuth flow. Please verify the uploaded client configuration.",
+        )
+
+    cache_payload = {
+        "user_id": current_user.id,
+        "client_config": client_config,
+        "created_at": int(time.time()),
+    }
+    REDIS_CONN.set_obj(_web_state_cache_key(flow_id), cache_payload, WEB_FLOW_TTL_SECS)
+
+    return get_json_result(
+        data={
+            "flow_id": flow_id,
+            "authorization_url": authorization_url,
+            "expires_in": WEB_FLOW_TTL_SECS,
+        }
+    )
+
+
+@manager.route("/google-drive/oauth/web/callback", methods=["GET"])  # noqa: F821
+async def google_drive_web_oauth_callback():
+    state_id = request.args.get("state")
+    error = request.args.get("error")
+    error_description = request.args.get("error_description") or error
+
+    if not state_id:
+        return await _render_web_oauth_popup("", False, "Missing OAuth state parameter.")
+
+    state_cache = REDIS_CONN.get(_web_state_cache_key(state_id))
+    if not state_cache:
+        return await _render_web_oauth_popup(state_id, False, "Authorization session expired. Please restart from the main window.")
+
+    state_obj = json.loads(state_cache)
+    client_config = state_obj.get("client_config")
+    if not client_config:
+        REDIS_CONN.delete(_web_state_cache_key(state_id))
+        return await _render_web_oauth_popup(state_id, False, "Authorization session was invalid. Please retry.")
+
+    if error:
+        REDIS_CONN.delete(_web_state_cache_key(state_id))
+        return await _render_web_oauth_popup(state_id, False, error_description or "Authorization was cancelled.")
+
+    code = request.args.get("code")
+    if not code:
+        return await _render_web_oauth_popup(state_id, False, "Missing authorization code from Google.")
+
+    try:
+        flow = Flow.from_client_config(client_config, scopes=GOOGLE_SCOPES[DocumentSource.GOOGLE_DRIVE])
+        flow.redirect_uri = GOOGLE_DRIVE_WEB_OAUTH_REDIRECT_URI
+        flow.fetch_token(code=code)
+    except Exception as exc:  # pragma: no cover - defensive
+        logging.exception("Failed to exchange Google OAuth code: %s", exc)
+        REDIS_CONN.delete(_web_state_cache_key(state_id))
+        return await _render_web_oauth_popup(state_id, False, "Failed to exchange tokens with Google. Please retry.")
+
+    creds_json = flow.credentials.to_json()
+    result_payload = {
+        "user_id": state_obj.get("user_id"),
+        "credentials": creds_json,
+    }
+    REDIS_CONN.set_obj(_web_result_cache_key(state_id), result_payload, WEB_FLOW_TTL_SECS)
+    REDIS_CONN.delete(_web_state_cache_key(state_id))
+
+    return await _render_web_oauth_popup(state_id, True, "Authorization completed successfully.")
+
+
+@manager.route("/google-drive/oauth/web/result", methods=["POST"])  # noqa: F821
+@login_required
+@validate_request("flow_id")
+async def poll_google_drive_web_result():
+    req = await request.json or {}
+    flow_id = req.get("flow_id")
+
+    cache_raw = REDIS_CONN.get(_web_result_cache_key(flow_id))
+    if not cache_raw:
+        return get_json_result(code=RetCode.RUNNING, message="Authorization is still pending.")
+
+    result = json.loads(cache_raw)
+    if result.get("user_id") != current_user.id:
+        return get_json_result(code=RetCode.PERMISSION_ERROR, message="You are not allowed to access this authorization result.")
+
+    REDIS_CONN.delete(_web_result_cache_key(flow_id))
+    return get_json_result(data={"credentials": result.get("credentials")})

View File

@@ -17,8 +17,8 @@ import json
 import re
 import logging
 from copy import deepcopy
-from flask import Response, request
-from flask_login import current_user, login_required
+from quart import Response, request
+from api.apps import current_user, login_required
 from api.db.db_models import APIToken
 from api.db.services.conversation_service import ConversationService, structure_answer
 from api.db.services.dialog_service import DialogService, ask, chat, gen_mindmap
@@ -34,8 +34,8 @@ from common.constants import RetCode, LLMType
 @manager.route("/set", methods=["POST"])  # noqa: F821
 @login_required
-def set_conversation():
-    req = request.json
+async def set_conversation():
+    req = await request.json
     conv_id = req.get("conversation_id")
     is_new = req.get("is_new")
     name = req.get("name", "New conversation")
@@ -85,7 +85,6 @@ def get():
     if not e:
         return get_data_error_result(message="Conversation not found!")
     tenants = UserTenantService.query(user_id=current_user.id)
-    avatar = None
     for tenant in tenants:
         dialog = DialogService.query(tenant_id=tenant.tenant_id, id=conv.dialog_id)
         if dialog and len(dialog) > 0:
@@ -129,8 +128,9 @@ def getsse(dialog_id):
 @manager.route("/rm", methods=["POST"])  # noqa: F821
 @login_required
-def rm():
-    conv_ids = request.json["conversation_ids"]
+async def rm():
+    req = await request.json
+    conv_ids = req["conversation_ids"]
     try:
         for cid in conv_ids:
             exist, conv = ConversationService.get_by_id(cid)
@@ -166,8 +166,8 @@ def list_conversation():
 @manager.route("/completion", methods=["POST"])  # noqa: F821
 @login_required
 @validate_request("conversation_id", "messages")
-def completion():
-    req = request.json
+async def completion():
+    req = await request.json
     msg = []
     for m in req["messages"]:
         if m["role"] == "system":
@@ -251,8 +251,8 @@ def completion():
 @manager.route("/tts", methods=["POST"])  # noqa: F821
 @login_required
-def tts():
-    req = request.json
+async def tts():
+    req = await request.json
     text = req["text"]
     tenants = TenantService.get_info_by(current_user.id)
@@ -284,8 +284,8 @@ def tts():
 @manager.route("/delete_msg", methods=["POST"])  # noqa: F821
 @login_required
 @validate_request("conversation_id", "message_id")
-def delete_msg():
-    req = request.json
+async def delete_msg():
+    req = await request.json
     e, conv = ConversationService.get_by_id(req["conversation_id"])
     if not e:
         return get_data_error_result(message="Conversation not found!")
@@ -307,8 +307,8 @@ def delete_msg():
 @manager.route("/thumbup", methods=["POST"])  # noqa: F821
 @login_required
 @validate_request("conversation_id", "message_id")
-def thumbup():
-    req = request.json
+async def thumbup():
+    req = await request.json
     e, conv = ConversationService.get_by_id(req["conversation_id"])
     if not e:
         return get_data_error_result(message="Conversation not found!")
@@ -334,8 +334,8 @@ def thumbup():
 @manager.route("/ask", methods=["POST"])  # noqa: F821
 @login_required
 @validate_request("question", "kb_ids")
-def ask_about():
-    req = request.json
+async def ask_about():
+    req = await request.json
     uid = current_user.id
     search_id = req.get("search_id", "")
@@ -366,8 +366,8 @@ def ask_about():
 @manager.route("/mindmap", methods=["POST"])  # noqa: F821
 @login_required
 @validate_request("question", "kb_ids")
-def mindmap():
-    req = request.json
+async def mindmap():
+    req = await request.json
     search_id = req.get("search_id", "")
     search_app = SearchService.get_detail(search_id) if search_id else {}
     search_config = search_app.get("search_config", {}) if search_app else {}
@@ -384,8 +384,8 @@ def mindmap():
 @manager.route("/related_questions", methods=["POST"])  # noqa: F821
 @login_required
 @validate_request("question")
-def related_questions():
-    req = request.json
+async def related_questions():
+    req = await request.json
     search_id = req.get("search_id", "")
     search_config = {}

View File

@@ -14,8 +14,7 @@
 # limitations under the License.
 #
-from flask import request
-from flask_login import login_required, current_user
+from quart import request
 from api.db.services import duplicate_name
 from api.db.services.dialog_service import DialogService
 from common.constants import StatusEnum
@@ -26,13 +25,14 @@ from api.utils.api_utils import server_error_response, get_data_error_result, va
 from common.misc_utils import get_uuid
 from common.constants import RetCode
 from api.utils.api_utils import get_json_result
+from api.apps import login_required, current_user
 @manager.route('/set', methods=['POST'])  # noqa: F821
 @validate_request("prompt_config")
 @login_required
-def set_dialog():
-    req = request.json
+async def set_dialog():
+    req = await request.json
     dialog_id = req.get("dialog_id", "")
     is_create = not dialog_id
     name = req.get("name", "New Dialog")
@@ -154,33 +154,34 @@ def get_kb_names(kb_ids):
 @login_required
 def list_dialogs():
     try:
-        diags = DialogService.query(
+        conversations = DialogService.query(
             tenant_id=current_user.id,
             status=StatusEnum.VALID.value,
             reverse=True,
             order_by=DialogService.model.create_time)
-        diags = [d.to_dict() for d in diags]
-        for d in diags:
-            d["kb_ids"], d["kb_names"] = get_kb_names(d["kb_ids"])
-        return get_json_result(data=diags)
+        conversations = [d.to_dict() for d in conversations]
+        for conversation in conversations:
+            conversation["kb_ids"], conversation["kb_names"] = get_kb_names(conversation["kb_ids"])
+        return get_json_result(data=conversations)
     except Exception as e:
         return server_error_response(e)
 @manager.route('/next', methods=['POST'])  # noqa: F821
 @login_required
-def list_dialogs_next():
-    keywords = request.args.get("keywords", "")
-    page_number = int(request.args.get("page", 0))
-    items_per_page = int(request.args.get("page_size", 0))
-    parser_id = request.args.get("parser_id")
-    orderby = request.args.get("orderby", "create_time")
-    if request.args.get("desc", "true").lower() == "false":
+async def list_dialogs_next():
+    args = request.args
+    keywords = args.get("keywords", "")
+    page_number = int(args.get("page", 0))
+    items_per_page = int(args.get("page_size", 0))
+    parser_id = args.get("parser_id")
+    orderby = args.get("orderby", "create_time")
+    if args.get("desc", "true").lower() == "false":
         desc = False
     else:
         desc = True
-    req = request.get_json()
+    req = await request.get_json()
     owner_ids = req.get("owner_ids", [])
     try:
         if not owner_ids:
@@ -207,8 +208,8 @@ def list_dialogs_next():
 @manager.route('/rm', methods=['POST'])  # noqa: F821
 @login_required
 @validate_request("dialog_ids")
-def rm():
-    req = request.json
+async def rm():
+    req = await request.json
     dialog_list=[]
     tenants = UserTenantService.query(user_id=current_user.id)
     try:

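Note the asymmetry the refactor relies on: `request.args` comes from the URL and stays synchronous in Quart, while anything carried in the body (`get_json`, `form`, `files`) is awaitable. A sketch of a handler mixing both, again with a hypothetical route:

```python
from quart import Quart, request

app = Quart(__name__)

@app.route("/search", methods=["POST"])
async def search():
    # Query-string access needs no await; it is parsed from the URL.
    page = int(request.args.get("page", 0))
    # The JSON body is read from the socket, hence the await.
    req = await request.get_json() or {}
    return {"page": page, "filters": req.get("filters", [])}
```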
View File

@@ -18,11 +18,8 @@ import os.path
 import pathlib
 import re
 from pathlib import Path
-import flask
-from flask import request
-from flask_login import current_user, login_required
+from quart import request, make_response
+from api.apps import current_user, login_required
 from api.common.check_team_permission import check_kb_team_permission
 from api.constants import FILE_NAME_LEN_LIMIT, IMG_BASE64_PREFIX
 from api.db import VALID_FILE_TYPES, FileType
@@ -39,7 +36,7 @@ from api.utils.api_utils import (
     get_data_error_result,
     get_json_result,
     server_error_response,
-    validate_request,
+    validate_request, request_json,
 )
 from api.utils.file_utils import filename_type, thumbnail
 from common.file_utils import get_project_base_directory
@@ -47,21 +44,22 @@ from common.constants import RetCode, VALID_TASK_STATUS, ParserType, TaskStatus
 from api.utils.web_utils import CONTENT_TYPE_MAP, html2pdf, is_valid_url
 from deepdoc.parser.html_parser import RAGFlowHtmlParser
 from rag.nlp import search, rag_tokenizer
-from rag.utils.storage_factory import STORAGE_IMPL
-from common import globals
+from common import settings


 @manager.route("/upload", methods=["POST"])  # noqa: F821
 @login_required
 @validate_request("kb_id")
-def upload():
-    kb_id = request.form.get("kb_id")
+async def upload():
+    form = await request.form
+    kb_id = form.get("kb_id")
     if not kb_id:
         return get_json_result(data=False, message='Lack of "KB ID"', code=RetCode.ARGUMENT_ERROR)
-    if "file" not in request.files:
+    files = await request.files
+    if "file" not in files:
         return get_json_result(data=False, message="No file part!", code=RetCode.ARGUMENT_ERROR)
-    file_objs = request.files.getlist("file")
+    file_objs = files.getlist("file")
     for file_obj in file_objs:
         if file_obj.filename == "":
             return get_json_result(data=False, message="No file selected!", code=RetCode.ARGUMENT_ERROR)
@@ -88,12 +86,13 @@ def upload():
 @manager.route("/web_crawl", methods=["POST"])  # noqa: F821
 @login_required
 @validate_request("kb_id", "name", "url")
-def web_crawl():
-    kb_id = request.form.get("kb_id")
+async def web_crawl():
+    form = await request.form
+    kb_id = form.get("kb_id")
     if not kb_id:
         return get_json_result(data=False, message='Lack of "KB ID"', code=RetCode.ARGUMENT_ERROR)
-    name = request.form.get("name")
-    url = request.form.get("url")
+    name = form.get("name")
+    url = form.get("url")
     if not is_valid_url(url):
         return get_json_result(data=False, message="The URL format is invalid", code=RetCode.ARGUMENT_ERROR)
     e, kb = KnowledgebaseService.get_by_id(kb_id)
@@ -119,9 +118,9 @@ def web_crawl():
         raise RuntimeError("This type of file has not been supported yet!")

     location = filename
-    while STORAGE_IMPL.obj_exist(kb_id, location):
+    while settings.STORAGE_IMPL.obj_exist(kb_id, location):
         location += "_"
-    STORAGE_IMPL.put(kb_id, location, blob)
+    settings.STORAGE_IMPL.put(kb_id, location, blob)
     doc = {
         "id": get_uuid(),
         "kb_id": kb.id,
@@ -153,8 +152,8 @@ def web_crawl():
 @manager.route("/create", methods=["POST"])  # noqa: F821
 @login_required
 @validate_request("name", "kb_id")
-def create():
-    req = request.json
+async def create():
+    req = await request_json()
     kb_id = req["kb_id"]
     if not kb_id:
         return get_json_result(data=False, message='Lack of "KB ID"', code=RetCode.ARGUMENT_ERROR)
@@ -209,7 +208,7 @@ def create():
 @manager.route("/list", methods=["POST"])  # noqa: F821
 @login_required
-def list_docs():
+async def list_docs():
     kb_id = request.args.get("kb_id")
     if not kb_id:
         return get_json_result(data=False, message='Lack of "KB ID"', code=RetCode.ARGUMENT_ERROR)
@@ -231,7 +230,7 @@ def list_docs():
     create_time_from = int(request.args.get("create_time_from", 0))
     create_time_to = int(request.args.get("create_time_to", 0))

-    req = request.get_json()
+    req = await request.get_json()
     run_status = req.get("run_status", [])
     if run_status:
@@ -261,6 +260,8 @@ def list_docs():
         for doc_item in docs:
             if doc_item["thumbnail"] and not doc_item["thumbnail"].startswith(IMG_BASE64_PREFIX):
                 doc_item["thumbnail"] = f"/v1/document/image/{kb_id}-{doc_item['thumbnail']}"
+            if doc_item.get("source_type"):
+                doc_item["source_type"] = doc_item["source_type"].split("/")[0]

         return get_json_result(data={"total": tol, "docs": docs})
     except Exception as e:
@@ -269,8 +270,8 @@ def list_docs():
 @manager.route("/filter", methods=["POST"])  # noqa: F821
 @login_required
-def get_filter():
-    req = request.get_json()
+async def get_filter():
+    req = await request.get_json()
     kb_id = req.get("kb_id")
     if not kb_id:
@@ -307,8 +308,8 @@ def get_filter():
 @manager.route("/infos", methods=["POST"])  # noqa: F821
 @login_required
-def docinfos():
-    req = request.json
+async def doc_infos():
+    req = await request_json()
     doc_ids = req["doc_ids"]
     for doc_id in doc_ids:
         if not DocumentService.accessible(doc_id, current_user.id):
@@ -339,8 +340,8 @@ def thumbnails():
 @manager.route("/change_status", methods=["POST"])  # noqa: F821
 @login_required
 @validate_request("doc_ids", "status")
-def change_status():
-    req = request.get_json()
+async def change_status():
+    req = await request.get_json()
     doc_ids = req.get("doc_ids", [])
     status = str(req.get("status", ""))
@@ -367,7 +368,7 @@ def change_status():
                 continue
             status_int = int(status)
-            if not globals.docStoreConn.update({"doc_id": doc_id}, {"available_int": status_int}, search.index_name(kb.tenant_id), doc.kb_id):
+            if not settings.docStoreConn.update({"doc_id": doc_id}, {"available_int": status_int}, search.index_name(kb.tenant_id), doc.kb_id):
                 result[doc_id] = {"error": "Database error (docStore update)!"}
             result[doc_id] = {"status": status}
         except Exception as e:
@@ -379,8 +380,8 @@ def change_status():
 @manager.route("/rm", methods=["POST"])  # noqa: F821
 @login_required
 @validate_request("doc_id")
-def rm():
-    req = request.json
+async def rm():
+    req = await request_json()
     doc_ids = req["doc_id"]
     if isinstance(doc_ids, str):
         doc_ids = [doc_ids]
@@ -400,8 +401,8 @@ def rm():
 @manager.route("/run", methods=["POST"])  # noqa: F821
 @login_required
 @validate_request("doc_ids", "run")
-def run():
-    req = request.json
+async def run():
+    req = await request_json()
     for doc_id in req["doc_ids"]:
         if not DocumentService.accessible(doc_id, current_user.id):
             return get_json_result(data=False, message="No authorization.", code=RetCode.AUTHENTICATION_ERROR)
@@ -432,8 +433,8 @@ def run():
             DocumentService.update_by_id(id, info)
             if req.get("delete", False):
                 TaskService.filter_delete([Task.doc_id == id])
-                if globals.docStoreConn.indexExist(search.index_name(tenant_id), doc.kb_id):
-                    globals.docStoreConn.delete({"doc_id": id}, search.index_name(tenant_id), doc.kb_id)
+                if settings.docStoreConn.indexExist(search.index_name(tenant_id), doc.kb_id):
+                    settings.docStoreConn.delete({"doc_id": id}, search.index_name(tenant_id), doc.kb_id)
             if str(req["run"]) == TaskStatus.RUNNING.value:
                 doc = doc.to_dict()
@@ -447,8 +448,8 @@ def run():
 @manager.route("/rename", methods=["POST"])  # noqa: F821
 @login_required
 @validate_request("doc_id", "name")
-def rename():
-    req = request.json
+async def rename():
+    req = await request_json()
     if not DocumentService.accessible(req["doc_id"], current_user.id):
         return get_json_result(data=False, message="No authorization.", code=RetCode.AUTHENTICATION_ERROR)
     try:
@@ -479,8 +480,8 @@ def rename():
             "title_tks": title_tks,
             "title_sm_tks": rag_tokenizer.fine_grained_tokenize(title_tks),
         }
-        if globals.docStoreConn.indexExist(search.index_name(tenant_id), doc.kb_id):
-            globals.docStoreConn.update(
+        if settings.docStoreConn.indexExist(search.index_name(tenant_id), doc.kb_id):
+            settings.docStoreConn.update(
                 {"doc_id": req["doc_id"]},
                 es_body,
                 search.index_name(tenant_id),
@@ -494,19 +495,20 @@ def rename():
 @manager.route("/get/<doc_id>", methods=["GET"])  # noqa: F821
 # @login_required
-def get(doc_id):
+async def get(doc_id):
     try:
         e, doc = DocumentService.get_by_id(doc_id)
         if not e:
             return get_data_error_result(message="Document not found!")

         b, n = File2DocumentService.get_storage_address(doc_id=doc_id)
-        response = flask.make_response(STORAGE_IMPL.get(b, n))
+        response = await make_response(settings.STORAGE_IMPL.get(b, n))

         ext = re.search(r"\.([^.]+)$", doc.name.lower())
         ext = ext.group(1) if ext else None
         if ext:
             if doc.type == FileType.VISUAL.value:
                 content_type = CONTENT_TYPE_MAP.get(ext, f"image/{ext}")
             else:
                 content_type = CONTENT_TYPE_MAP.get(ext, f"application/{ext}")
@@ -516,12 +518,28 @@ def get(doc_id):
         return server_error_response(e)


+@manager.route("/download/<attachment_id>", methods=["GET"])  # noqa: F821
+@login_required
+async def download_attachment(attachment_id):
+    try:
+        ext = request.args.get("ext", "markdown")
+        data = settings.STORAGE_IMPL.get(current_user.id, attachment_id)
+        # data = settings.STORAGE_IMPL.get("eb500d50bb0411f0907561d2782adda5", attachment_id)
+        response = await make_response(data)
+        response.headers.set("Content-Type", CONTENT_TYPE_MAP.get(ext, f"application/{ext}"))
+        return response
+    except Exception as e:
+        return server_error_response(e)
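`make_response` is likewise a coroutine in Quart, which is why the file-serving routes now `await` it before setting headers. A sketch of the pattern (hypothetical route and byte payload; the `CONTENT_TYPE_MAP` here is a stand-in for any extension-to-MIME lookup):

```python
from quart import Quart, make_response

app = Quart(__name__)

# Stand-in for the project's CONTENT_TYPE_MAP.
CONTENT_TYPE_MAP = {"pdf": "application/pdf", "md": "text/markdown"}

@app.route("/blob/<name>")
async def blob(name: str):
    data = b"example bytes"  # would come from storage in real code
    ext = name.rsplit(".", 1)[-1] if "." in name else "bin"
    response = await make_response(data)  # awaitable in Quart
    response.headers["Content-Type"] = CONTENT_TYPE_MAP.get(ext, f"application/{ext}")
    return response
```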
@manager.route("/change_parser", methods=["POST"]) # noqa: F821 @manager.route("/change_parser", methods=["POST"]) # noqa: F821
@login_required @login_required
@validate_request("doc_id") @validate_request("doc_id")
def change_parser(): async def change_parser():
req = request.json req = await request_json()
if not DocumentService.accessible(req["doc_id"], current_user.id): if not DocumentService.accessible(req["doc_id"], current_user.id):
return get_json_result(data=False, message="No authorization.", code=RetCode.AUTHENTICATION_ERROR) return get_json_result(data=False, message="No authorization.", code=RetCode.AUTHENTICATION_ERROR)
@ -541,8 +559,9 @@ def change_parser():
tenant_id = DocumentService.get_tenant_id(req["doc_id"]) tenant_id = DocumentService.get_tenant_id(req["doc_id"])
if not tenant_id: if not tenant_id:
return get_data_error_result(message="Tenant not found!") return get_data_error_result(message="Tenant not found!")
if globals.docStoreConn.indexExist(search.index_name(tenant_id), doc.kb_id): if settings.docStoreConn.indexExist(search.index_name(tenant_id), doc.kb_id):
globals.docStoreConn.delete({"doc_id": doc.id}, search.index_name(tenant_id), doc.kb_id) settings.docStoreConn.delete({"doc_id": doc.id}, search.index_name(tenant_id), doc.kb_id)
return None
try: try:
if "pipeline_id" in req and req["pipeline_id"] != "": if "pipeline_id" in req and req["pipeline_id"] != "":
@ -571,13 +590,13 @@ def change_parser():
@manager.route("/image/<image_id>", methods=["GET"]) # noqa: F821 @manager.route("/image/<image_id>", methods=["GET"]) # noqa: F821
# @login_required # @login_required
def get_image(image_id): async def get_image(image_id):
try: try:
arr = image_id.split("-") arr = image_id.split("-")
if len(arr) != 2: if len(arr) != 2:
return get_data_error_result(message="Image not found.") return get_data_error_result(message="Image not found.")
bkt, nm = image_id.split("-") bkt, nm = image_id.split("-")
response = flask.make_response(STORAGE_IMPL.get(bkt, nm)) response = await make_response(settings.STORAGE_IMPL.get(bkt, nm))
response.headers.set("Content-Type", "image/JPEG") response.headers.set("Content-Type", "image/JPEG")
return response return response
except Exception as e: except Exception as e:
@ -587,24 +606,25 @@ def get_image(image_id):
@manager.route("/upload_and_parse", methods=["POST"]) # noqa: F821 @manager.route("/upload_and_parse", methods=["POST"]) # noqa: F821
@login_required @login_required
@validate_request("conversation_id") @validate_request("conversation_id")
def upload_and_parse(): async def upload_and_parse():
if "file" not in request.files: files = await request.file
if "file" not in files:
return get_json_result(data=False, message="No file part!", code=RetCode.ARGUMENT_ERROR) return get_json_result(data=False, message="No file part!", code=RetCode.ARGUMENT_ERROR)
file_objs = request.files.getlist("file") file_objs = files.getlist("file")
for file_obj in file_objs: for file_obj in file_objs:
if file_obj.filename == "": if file_obj.filename == "":
return get_json_result(data=False, message="No file selected!", code=RetCode.ARGUMENT_ERROR) return get_json_result(data=False, message="No file selected!", code=RetCode.ARGUMENT_ERROR)
doc_ids = doc_upload_and_parse(request.form.get("conversation_id"), file_objs, current_user.id) form = await request.form
doc_ids = doc_upload_and_parse(form.get("conversation_id"), file_objs, current_user.id)
return get_json_result(data=doc_ids) return get_json_result(data=doc_ids)
@manager.route("/parse", methods=["POST"]) # noqa: F821 @manager.route("/parse", methods=["POST"]) # noqa: F821
@login_required @login_required
def parse(): async def parse():
url = request.json.get("url") if request.json else "" url = await request.json.get("url") if await request.json else ""
if url: if url:
if not is_valid_url(url): if not is_valid_url(url):
return get_json_result(data=False, message="The URL format is invalid", code=RetCode.ARGUMENT_ERROR) return get_json_result(data=False, message="The URL format is invalid", code=RetCode.ARGUMENT_ERROR)
@ -645,10 +665,11 @@ def parse():
txt = FileService.parse_docs([f], current_user.id) txt = FileService.parse_docs([f], current_user.id)
return get_json_result(data=txt) return get_json_result(data=txt)
if "file" not in request.files: files = await request.files
if "file" not in files:
return get_json_result(data=False, message="No file part!", code=RetCode.ARGUMENT_ERROR) return get_json_result(data=False, message="No file part!", code=RetCode.ARGUMENT_ERROR)
file_objs = request.files.getlist("file") file_objs = files.getlist("file")
txt = FileService.parse_docs(file_objs, current_user.id) txt = FileService.parse_docs(file_objs, current_user.id)
return get_json_result(data=txt) return get_json_result(data=txt)
@ -657,8 +678,8 @@ def parse():
@manager.route("/set_meta", methods=["POST"]) # noqa: F821 @manager.route("/set_meta", methods=["POST"]) # noqa: F821
@login_required @login_required
@validate_request("doc_id", "meta") @validate_request("doc_id", "meta")
def set_meta(): async def set_meta():
req = request.json req = await request_json()
if not DocumentService.accessible(req["doc_id"], current_user.id): if not DocumentService.accessible(req["doc_id"], current_user.id):
return get_json_result(data=False, message="No authorization.", code=RetCode.AUTHENTICATION_ERROR) return get_json_result(data=False, message="No authorization.", code=RetCode.AUTHENTICATION_ERROR)
try: try:

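Many handlers above switch from `request.json` to an awaited `request_json()` helper imported from `api.utils.api_utils`. Its body is not part of this diff; a plausible sketch, assuming it merely wraps Quart's `get_json` with a safe default:

```python
# Assumed shape of the request_json() helper; the actual implementation
# is not shown in this changeset.
from quart import request

async def request_json() -> dict:
    # Tolerate empty or non-JSON bodies instead of raising, so handlers
    # can rely on dict semantics (req.get(...), req["key"]).
    req = await request.get_json(silent=True)
    return req if isinstance(req, dict) else {}
```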
View File

@@ -19,8 +19,8 @@ from pathlib import Path
 from api.db.services.file2document_service import File2DocumentService
 from api.db.services.file_service import FileService
-from flask import request
-from flask_login import login_required, current_user
+from quart import request
+from api.apps import login_required, current_user
 from api.db.services.knowledgebase_service import KnowledgebaseService
 from api.utils.api_utils import server_error_response, get_data_error_result, validate_request
 from common.misc_utils import get_uuid
@@ -33,8 +33,8 @@ from api.utils.api_utils import get_json_result
 @manager.route('/convert', methods=['POST'])  # noqa: F821
 @login_required
 @validate_request("file_ids", "kb_ids")
-def convert():
-    req = request.json
+async def convert():
+    req = await request.json
     kb_ids = req["kb_ids"]
     file_ids = req["file_ids"]
     file2documents = []
@@ -103,8 +103,8 @@ def convert():
 @manager.route('/rm', methods=['POST'])  # noqa: F821
 @login_required
 @validate_request("file_ids")
-def rm():
-    req = request.json
+async def rm():
+    req = await request.json
     file_ids = req["file_ids"]
     if not file_ids:
         return get_json_result(

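Every file in the changeset also swaps `flask_login` for `login_required` and `current_user` re-exported from `api.apps`. The shim itself is outside this diff; a minimal sketch of the idea, assuming a decorator that resolves the session user before invoking the coroutine view (all names hypothetical):

```python
import functools
from quart import g

async def _load_user_from_session():
    # Placeholder: a real implementation would validate a session token.
    return getattr(g, "user", None)

def login_required(view):
    @functools.wraps(view)
    async def wrapper(*args, **kwargs):
        user = await _load_user_from_session()
        if user is None:
            return {"message": "Unauthorized"}, 401
        return await view(*args, **kwargs)
    return wrapper
```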
View File

@@ -17,10 +17,8 @@ import logging
 import os
 import pathlib
 import re
-import flask
-from flask import request
-from flask_login import login_required, current_user
+from quart import request, make_response
+from api.apps import login_required, current_user

 from api.common.check_team_permission import check_file_team_permission
 from api.db.services.document_service import DocumentService
@@ -34,23 +32,25 @@ from api.db.services.file_service import FileService
 from api.utils.api_utils import get_json_result
 from api.utils.file_utils import filename_type
 from api.utils.web_utils import CONTENT_TYPE_MAP
-from rag.utils.storage_factory import STORAGE_IMPL
+from common import settings


 @manager.route('/upload', methods=['POST'])  # noqa: F821
 @login_required
 # @validate_request("parent_id")
-def upload():
-    pf_id = request.form.get("parent_id")
+async def upload():
+    form = await request.form
+    pf_id = form.get("parent_id")

     if not pf_id:
         root_folder = FileService.get_root_folder(current_user.id)
         pf_id = root_folder["id"]

-    if 'file' not in request.files:
+    files = await request.files
+    if 'file' not in files:
         return get_json_result(
             data=False, message='No file part!', code=RetCode.ARGUMENT_ERROR)
-    file_objs = request.files.getlist('file')
+    file_objs = files.getlist('file')
     for file_obj in file_objs:
         if file_obj.filename == '':
@@ -95,14 +95,14 @@ def upload():
         # file type
         filetype = filename_type(file_obj_names[file_len - 1])
         location = file_obj_names[file_len - 1]
-        while STORAGE_IMPL.obj_exist(last_folder.id, location):
+        while settings.STORAGE_IMPL.obj_exist(last_folder.id, location):
             location += "_"
         blob = file_obj.read()
         filename = duplicate_name(
             FileService.query,
             name=file_obj_names[file_len - 1],
             parent_id=last_folder.id)
-        STORAGE_IMPL.put(last_folder.id, location, blob)
+        settings.STORAGE_IMPL.put(last_folder.id, location, blob)
         file = {
             "id": get_uuid(),
             "parent_id": last_folder.id,
@@ -123,10 +123,10 @@ def upload():
 @manager.route('/create', methods=['POST'])  # noqa: F821
 @login_required
 @validate_request("name")
-def create():
-    req = request.json
-    pf_id = request.json.get("parent_id")
-    input_file_type = request.json.get("type")
+async def create():
+    req = await request.json
+    pf_id = req.get("parent_id")
+    input_file_type = req.get("type")
     if not pf_id:
         root_folder = FileService.get_root_folder(current_user.id)
         pf_id = root_folder["id"]
@@ -238,16 +238,16 @@ def get_all_parent_folders():
 @manager.route("/rm", methods=["POST"])  # noqa: F821
 @login_required
 @validate_request("file_ids")
-def rm():
-    req = request.json
+async def rm():
+    req = await request.json
     file_ids = req["file_ids"]

     def _delete_single_file(file):
         try:
             if file.location:
-                STORAGE_IMPL.rm(file.parent_id, file.location)
-        except Exception:
-            logging.exception(f"Fail to remove object: {file.parent_id}/{file.location}")
+                settings.STORAGE_IMPL.rm(file.parent_id, file.location)
+        except Exception as e:
+            logging.exception(f"Fail to remove object: {file.parent_id}/{file.location}, error: {e}")
         informs = File2DocumentService.get_by_file_id(file.id)
         for inform in informs:
@@ -299,8 +299,8 @@ def rm():
 @manager.route('/rename', methods=['POST'])  # noqa: F821
 @login_required
 @validate_request("file_id", "name")
-def rename():
-    req = request.json
+async def rename():
+    req = await request.json
     try:
         e, file = FileService.get_by_id(req["file_id"])
         if not e:
@@ -338,7 +338,7 @@ def rename():
 @manager.route('/get/<file_id>', methods=['GET'])  # noqa: F821
 @login_required
-def get(file_id):
+async def get(file_id):
     try:
         e, file = FileService.get_by_id(file_id)
         if not e:
@@ -346,12 +346,12 @@ def get(file_id):
         if not check_file_team_permission(file, current_user.id):
             return get_json_result(data=False, message='No authorization.', code=RetCode.AUTHENTICATION_ERROR)

-        blob = STORAGE_IMPL.get(file.parent_id, file.location)
+        blob = settings.STORAGE_IMPL.get(file.parent_id, file.location)
         if not blob:
             b, n = File2DocumentService.get_storage_address(file_id=file_id)
-            blob = STORAGE_IMPL.get(b, n)
-        response = flask.make_response(blob)
+            blob = settings.STORAGE_IMPL.get(b, n)
+        response = await make_response(blob)
         ext = re.search(r"\.([^.]+)$", file.name.lower())
         ext = ext.group(1) if ext else None
         if ext:
@@ -368,8 +368,8 @@ def get(file_id):
 @manager.route("/mv", methods=["POST"])  # noqa: F821
 @login_required
 @validate_request("src_file_ids", "dest_file_id")
-def move():
-    req = request.json
+async def move():
+    req = await request.json
     try:
         file_ids = req["src_file_ids"]
         dest_parent_id = req["dest_file_id"]
@@ -428,11 +428,11 @@ def move():
             filename = source_file_entry.name
             new_location = filename
-            while STORAGE_IMPL.obj_exist(dest_folder.id, new_location):
+            while settings.STORAGE_IMPL.obj_exist(dest_folder.id, new_location):
                 new_location += "_"
             try:
-                STORAGE_IMPL.move(old_parent_id, old_location, dest_folder.id, new_location)
+                settings.STORAGE_IMPL.move(old_parent_id, old_location, dest_folder.id, new_location)
             except Exception as storage_err:
                 raise RuntimeError(f"Move file failed at storage layer: {str(storage_err)}")

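Two themes recur in this file: the storage backend is now reached through `common.settings` (`settings.STORAGE_IMPL`) rather than a module-level import, and object names are de-duplicated by appending underscores until they are unique in the bucket. A self-contained sketch of that naming loop against a stand-in in-memory store (hypothetical, for illustration only):

```python
class InMemoryStore:
    """Stand-in exposing the same obj_exist/put surface as STORAGE_IMPL."""
    def __init__(self):
        self._objects = {}

    def obj_exist(self, bucket: str, name: str) -> bool:
        return (bucket, name) in self._objects

    def put(self, bucket: str, name: str, blob: bytes) -> None:
        self._objects[(bucket, name)] = blob

def put_unique(store: InMemoryStore, bucket: str, location: str, blob: bytes) -> str:
    # Mirror of the diff's collision loop: append "_" until the name is free.
    while store.obj_exist(bucket, location):
        location += "_"
    store.put(bucket, location, blob)
    return location

store = InMemoryStore()
assert put_unique(store, "kb1", "a.txt", b"x") == "a.txt"
assert put_unique(store, "kb1", "a.txt", b"y") == "a.txt_"
```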
View File

@@ -16,12 +16,11 @@
 import json
 import logging
 import random
-from flask import request
-from flask_login import login_required, current_user
+import re
+from quart import request
 import numpy as np
 from api.db.services.connector_service import Connector2KbService
 from api.db.services.llm_service import LLMBundle
 from api.db.services.document_service import DocumentService, queue_raptor_o_graphrag_tasks
@@ -30,36 +29,40 @@ from api.db.services.file_service import FileService
 from api.db.services.pipeline_operation_log_service import PipelineOperationLogService
 from api.db.services.task_service import TaskService, GRAPH_RAPTOR_FAKE_DOC_ID
 from api.db.services.user_service import TenantService, UserTenantService
-from api.utils.api_utils import get_error_data_result, server_error_response, get_data_error_result, validate_request, not_allowed_parameters
+from api.utils.api_utils import get_error_data_result, server_error_response, get_data_error_result, validate_request, not_allowed_parameters, \
+    request_json
 from api.db import VALID_FILE_TYPES
 from api.db.services.knowledgebase_service import KnowledgebaseService
 from api.db.db_models import File
 from api.utils.api_utils import get_json_result
 from rag.nlp import search
 from api.constants import DATASET_NAME_LIMIT
-from rag.settings import PAGERANK_FLD
 from rag.utils.redis_conn import REDIS_CONN
-from rag.utils.storage_factory import STORAGE_IMPL
 from rag.utils.doc_store_conn import OrderByExpr
-from common.constants import RetCode, PipelineTaskType, StatusEnum, VALID_TASK_STATUS, FileSource, LLMType
-from common import globals
+from common.constants import RetCode, PipelineTaskType, StatusEnum, VALID_TASK_STATUS, FileSource, LLMType, PAGERANK_FLD
+from common import settings
+from api.apps import login_required, current_user


 @manager.route('/create', methods=['post'])  # noqa: F821
 @login_required
 @validate_request("name")
-def create():
-    req = request.json
-    req = KnowledgebaseService.create_with_name(
+async def create():
+    req = await request_json()
+    e, res = KnowledgebaseService.create_with_name(
         name = req.pop("name", None),
         tenant_id = current_user.id,
         parser_id = req.pop("parser_id", None),
         **req
     )
+    if not e:
+        return res
     try:
-        if not KnowledgebaseService.save(**req):
+        if not KnowledgebaseService.save(**res):
             return get_data_error_result()
-        return get_json_result(data={"kb_id":req["id"]})
+        return get_json_result(data={"kb_id":res["id"]})
     except Exception as e:
         return server_error_response(e)
@@ -68,8 +71,8 @@ def create():
 @login_required
 @validate_request("kb_id", "name", "description", "parser_id")
 @not_allowed_parameters("id", "tenant_id", "created_by", "create_time", "update_time", "create_date", "update_date", "created_by")
-def update():
-    req = request.json
+async def update():
+    req = await request_json()
     if not isinstance(req["name"], str):
         return get_data_error_result(message="Dataset name must be string.")
     if req["name"].strip() == "":
@@ -104,24 +107,32 @@ def update():
                 message="Duplicated knowledgebase name.")

         del req["kb_id"]
+        connectors = []
+        if "connectors" in req:
+            connectors = req["connectors"]
+            del req["connectors"]
         if not KnowledgebaseService.update_by_id(kb.id, req):
             return get_data_error_result()

         if kb.pagerank != req.get("pagerank", 0):
             if req.get("pagerank", 0) > 0:
-                globals.docStoreConn.update({"kb_id": kb.id}, {PAGERANK_FLD: req["pagerank"]},
-                                            search.index_name(kb.tenant_id), kb.id)
+                settings.docStoreConn.update({"kb_id": kb.id}, {PAGERANK_FLD: req["pagerank"]},
+                                             search.index_name(kb.tenant_id), kb.id)
             else:
                 # Elasticsearch requires PAGERANK_FLD be non-zero!
-                globals.docStoreConn.update({"exists": PAGERANK_FLD}, {"remove": PAGERANK_FLD},
-                                            search.index_name(kb.tenant_id), kb.id)
+                settings.docStoreConn.update({"exists": PAGERANK_FLD}, {"remove": PAGERANK_FLD},
+                                             search.index_name(kb.tenant_id), kb.id)

         e, kb = KnowledgebaseService.get_by_id(kb.id)
         if not e:
             return get_data_error_result(
                 message="Database error (Knowledgebase rename)!")

+        errors = Connector2KbService.link_connectors(kb.id, [conn for conn in connectors], current_user.id)
+        if errors:
+            logging.error("Link KB errors: %s", errors)

         kb = kb.to_dict()
         kb.update(req)
+        kb["connectors"] = connectors

         return get_json_result(data=kb)
     except Exception as e:
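The reworked `create()` treats `KnowledgebaseService.create_with_name` as returning an `(ok, payload)` pair: on failure the payload is already a response to return, on success it is the record to persist. The service itself is not in this diff; a sketch of that convention with a hypothetical service function:

```python
from typing import Any

def create_with_name(name: str | None, tenant_id: str, **extra) -> tuple[bool, Any]:
    # Hypothetical: (False, error_response) on failure,
    # (True, record_dict) on success, as the new create() expects.
    if not name or not name.strip():
        return False, {"code": 102, "message": "Dataset name can't be empty."}
    record = {"id": "generated-uuid", "name": name.strip(), "tenant_id": tenant_id, **extra}
    return True, record

ok, res = create_with_name("my_kb", tenant_id="t1")
assert ok and res["name"] == "my_kb"
```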
@@ -159,18 +170,19 @@ def detail():
 @manager.route('/list', methods=['POST'])  # noqa: F821
 @login_required
-def list_kbs():
-    keywords = request.args.get("keywords", "")
-    page_number = int(request.args.get("page", 0))
-    items_per_page = int(request.args.get("page_size", 0))
-    parser_id = request.args.get("parser_id")
-    orderby = request.args.get("orderby", "create_time")
-    if request.args.get("desc", "true").lower() == "false":
+async def list_kbs():
+    args = request.args
+    keywords = args.get("keywords", "")
+    page_number = int(args.get("page", 0))
+    items_per_page = int(args.get("page_size", 0))
+    parser_id = args.get("parser_id")
+    orderby = args.get("orderby", "create_time")
+    if args.get("desc", "true").lower() == "false":
         desc = False
     else:
         desc = True
-    req = request.get_json()
+    req = await request_json()
     owner_ids = req.get("owner_ids", [])
     try:
         if not owner_ids:
@@ -192,11 +204,12 @@ def list_kbs():
     except Exception as e:
         return server_error_response(e)


 @manager.route('/rm', methods=['post'])  # noqa: F821
 @login_required
 @validate_request("kb_id")
-def rm():
-    req = request.json
+async def rm():
+    req = await request_json()
     if not KnowledgebaseService.accessible4deletion(req["kb_id"], current_user.id):
         return get_json_result(
             data=False,
@@ -225,10 +238,10 @@ def rm():
                 return get_data_error_result(
                     message="Database error (Knowledgebase removal)!")
         for kb in kbs:
-            globals.docStoreConn.delete({"kb_id": kb.id}, search.index_name(kb.tenant_id), kb.id)
-            globals.docStoreConn.deleteIdx(search.index_name(kb.tenant_id), kb.id)
-            if hasattr(STORAGE_IMPL, 'remove_bucket'):
-                STORAGE_IMPL.remove_bucket(kb.id)
+            settings.docStoreConn.delete({"kb_id": kb.id}, search.index_name(kb.tenant_id), kb.id)
+            settings.docStoreConn.deleteIdx(search.index_name(kb.tenant_id), kb.id)
+            if hasattr(settings.STORAGE_IMPL, 'remove_bucket'):
+                settings.STORAGE_IMPL.remove_bucket(kb.id)
         return get_json_result(data=True)
     except Exception as e:
         return server_error_response(e)
@@ -247,7 +260,7 @@ def list_tags(kb_id):
     tenants = UserTenantService.get_tenants_by_user_id(current_user.id)
     tags = []
     for tenant in tenants:
-        tags += globals.retriever.all_tags(tenant["tenant_id"], [kb_id])
+        tags += settings.retriever.all_tags(tenant["tenant_id"], [kb_id])
     return get_json_result(data=tags)
@@ -266,14 +279,14 @@ def list_tags_from_kbs():
     tenants = UserTenantService.get_tenants_by_user_id(current_user.id)
     tags = []
     for tenant in tenants:
-        tags += globals.retriever.all_tags(tenant["tenant_id"], kb_ids)
+        tags += settings.retriever.all_tags(tenant["tenant_id"], kb_ids)
     return get_json_result(data=tags)


 @manager.route('/<kb_id>/rm_tags', methods=['POST'])  # noqa: F821
 @login_required
-def rm_tags(kb_id):
-    req = request.json
+async def rm_tags(kb_id):
+    req = await request_json()
     if not KnowledgebaseService.accessible(kb_id, current_user.id):
         return get_json_result(
             data=False,
@@ -283,7 +296,7 @@ def rm_tags(kb_id):
     e, kb = KnowledgebaseService.get_by_id(kb_id)

     for t in req["tags"]:
-        globals.docStoreConn.update({"tag_kwd": t, "kb_id": [kb_id]},
+        settings.docStoreConn.update({"tag_kwd": t, "kb_id": [kb_id]},
                                     {"remove": {"tag_kwd": t}},
                                     search.index_name(kb.tenant_id),
                                     kb_id)
@@ -292,8 +305,8 @@ def rm_tags(kb_id):
 @manager.route('/<kb_id>/rename_tag', methods=['POST'])  # noqa: F821
 @login_required
-def rename_tags(kb_id):
-    req = request.json
+async def rename_tags(kb_id):
+    req = await request_json()
     if not KnowledgebaseService.accessible(kb_id, current_user.id):
         return get_json_result(
             data=False,
@@ -302,7 +315,7 @@ def rename_tags(kb_id):
     )
     e, kb = KnowledgebaseService.get_by_id(kb_id)
-    globals.docStoreConn.update({"tag_kwd": req["from_tag"], "kb_id": [kb_id]},
+    settings.docStoreConn.update({"tag_kwd": req["from_tag"], "kb_id": [kb_id]},
                                 {"remove": {"tag_kwd": req["from_tag"].strip()}, "add": {"tag_kwd": req["to_tag"]}},
                                 search.index_name(kb.tenant_id),
                                 kb_id)
@@ -325,9 +338,9 @@ def knowledge_graph(kb_id):
     }
     obj = {"graph": {}, "mind_map": {}}
-    if not globals.docStoreConn.indexExist(search.index_name(kb.tenant_id), kb_id):
+    if not settings.docStoreConn.indexExist(search.index_name(kb.tenant_id), kb_id):
         return get_json_result(data=obj)
-    sres = globals.retriever.search(req, search.index_name(kb.tenant_id), [kb_id])
+    sres = settings.retriever.search(req, search.index_name(kb.tenant_id), [kb_id])
     if not len(sres.ids):
         return get_json_result(data=obj)
@@ -359,7 +372,7 @@ def delete_knowledge_graph(kb_id):
             code=RetCode.AUTHENTICATION_ERROR
         )
     _, kb = KnowledgebaseService.get_by_id(kb_id)
-    globals.docStoreConn.delete({"knowledge_graph_kwd": ["graph", "subgraph", "entity", "relation"]}, search.index_name(kb.tenant_id), kb_id)
+    settings.docStoreConn.delete({"knowledge_graph_kwd": ["graph", "subgraph", "entity", "relation"]}, search.index_name(kb.tenant_id), kb_id)

     return get_json_result(data=True)
@@ -396,7 +409,7 @@ def get_basic_info():
 @manager.route("/list_pipeline_logs", methods=["POST"])  # noqa: F821
 @login_required
-def list_pipeline_logs():
+async def list_pipeline_logs():
     kb_id = request.args.get("kb_id")
     if not kb_id:
         return get_json_result(data=False, message='Lack of "KB ID"', code=RetCode.ARGUMENT_ERROR)
@@ -415,7 +428,7 @@ def list_pipeline_logs():
     if create_date_to > create_date_from:
         return get_data_error_result(message="Create data filter is abnormal.")

-    req = request.get_json()
+    req = await request_json()
     operation_status = req.get("operation_status", [])
     if operation_status:
@@ -440,7 +453,7 @@ def list_pipeline_logs():
 @manager.route("/list_pipeline_dataset_logs", methods=["POST"])  # noqa: F821
 @login_required
-def list_pipeline_dataset_logs():
+async def list_pipeline_dataset_logs():
     kb_id = request.args.get("kb_id")
     if not kb_id:
         return get_json_result(data=False, message='Lack of "KB ID"', code=RetCode.ARGUMENT_ERROR)
@@ -457,7 +470,7 @@ def list_pipeline_dataset_logs():
     if create_date_to > create_date_from:
         return get_data_error_result(message="Create data filter is abnormal.")

-    req = request.get_json()
+    req = await request_json()
     operation_status = req.get("operation_status", [])
     if operation_status:
@@ -474,12 +487,12 @@ def list_pipeline_dataset_logs():
 @manager.route("/delete_pipeline_logs", methods=["POST"])  # noqa: F821
 @login_required
-def delete_pipeline_logs():
+async def delete_pipeline_logs():
     kb_id = request.args.get("kb_id")
     if not kb_id:
         return get_json_result(data=False, message='Lack of "KB ID"', code=RetCode.ARGUMENT_ERROR)

-    req = request.get_json()
+    req = await request_json()
     log_ids = req.get("log_ids", [])
     PipelineOperationLogService.delete_by_ids(log_ids)
@@ -503,8 +516,8 @@ def pipeline_log_detail():
 @manager.route("/run_graphrag", methods=["POST"])  # noqa: F821
 @login_required
-def run_graphrag():
-    req = request.json
+async def run_graphrag():
+    req = await request_json()
     kb_id = req.get("kb_id", "")
     if not kb_id:
@@ -565,15 +578,15 @@ def trace_graphrag():
     ok, task = TaskService.get_by_id(task_id)
     if not ok:
-        return get_error_data_result(message="GraphRAG Task Not Found or Error Occurred")
+        return get_json_result(data={})

     return get_json_result(data=task.to_dict())


 @manager.route("/run_raptor", methods=["POST"])  # noqa: F821
 @login_required
-def run_raptor():
-    req = request.json
+async def run_raptor():
+    req = await request_json()
     kb_id = req.get("kb_id", "")
     if not kb_id:
@@ -641,8 +654,8 @@ def trace_raptor():
 @manager.route("/run_mindmap", methods=["POST"])  # noqa: F821
 @login_required
-def run_mindmap():
-    req = request.json
+async def run_mindmap():
+    req = await request_json()
     kb_id = req.get("kb_id", "")
     if not kb_id:
@@ -725,19 +738,21 @@ def delete_kb_task():
     def cancel_task(task_id):
         REDIS_CONN.set(f"{task_id}-cancel", "x")

+    kb_task_id_field: str = ""
+    kb_task_finish_at: str = ""
     match pipeline_task_type:
         case PipelineTaskType.GRAPH_RAG:
             kb_task_id_field = "graphrag_task_id"
             task_id = kb.graphrag_task_id
             kb_task_finish_at = "graphrag_task_finish_at"
             cancel_task(task_id)
-            globals.docStoreConn.delete({"knowledge_graph_kwd": ["graph", "subgraph", "entity", "relation"]}, search.index_name(kb.tenant_id), kb_id)
+            settings.docStoreConn.delete({"knowledge_graph_kwd": ["graph", "subgraph", "entity", "relation"]}, search.index_name(kb.tenant_id), kb_id)
         case PipelineTaskType.RAPTOR:
             kb_task_id_field = "raptor_task_id"
             task_id = kb.raptor_task_id
             kb_task_finish_at = "raptor_task_finish_at"
             cancel_task(task_id)
-            globals.docStoreConn.delete({"raptor_kwd": ["raptor"]}, search.index_name(kb.tenant_id), kb_id)
+            settings.docStoreConn.delete({"raptor_kwd": ["raptor"]}, search.index_name(kb.tenant_id), kb_id)
         case PipelineTaskType.MINDMAP:
             kb_task_id_field = "mindmap_task_id"
             task_id = kb.mindmap_task_id
@@ -755,7 +770,7 @@ def delete_kb_task():
 @manager.route("/check_embedding", methods=["post"])  # noqa: F821
 @login_required
-def check_embedding():
+async def check_embedding():

     def _guess_vec_field(src: dict) -> str | None:
         for k in src or {}:
@@ -774,14 +789,14 @@ def check_embedding():
     def _to_1d(x):
         a = np.asarray(x, dtype=np.float32)
         return a.reshape(-1)

     def _cos_sim(a, b, eps=1e-12):
         a = _to_1d(a)
         b = _to_1d(b)
         na = np.linalg.norm(a)
         nb = np.linalg.norm(b)
         if na < eps or nb < eps:
             return 0.0
         return float(np.dot(a, b) / (na * nb))
@@ -801,12 +816,12 @@ def check_embedding():
             offset=0, limit=1,
             indexNames=index_nm, knowledgebaseIds=[kb_id]
         )
-        total = docStoreConn.getTotal(res0)
+        total = docStoreConn.get_total(res0)
         if total <= 0:
             return []

         n = min(n, total)
-        offsets = sorted(random.sample(range(total), n))
+        offsets = sorted(random.sample(range(min(total, 1000)), n))

         out = []
         for off in offsets:
@@ -818,8 +833,8 @@ def check_embedding():
                 offset=off, limit=1,
                 indexNames=index_nm, knowledgebaseIds=[kb_id]
             )
-            ids = docStoreConn.getChunkIds(res1)
+            ids = docStoreConn.get_chunk_ids(res1)
             if not ids:
                 continue
             cid = ids[0]
@@ -839,9 +854,14 @@ def check_embedding():
                 "position_int": full_doc.get("position_int"),
                 "top_int": full_doc.get("top_int"),
                 "content_with_weight": full_doc.get("content_with_weight") or "",
+                "question_kwd": full_doc.get("question_kwd") or []
             })
         return out

-    req = request.json
+    def _clean(s: str) -> str:
+        s = re.sub(r"</?(table|td|caption|tr|th)( [^<>]{0,12})?>", " ", s or "")
+        return s if s else "None"
+
+    req = await request_json()
     kb_id = req.get("kb_id", "")
     embd_id = req.get("embd_id", "")
     n = int(req.get("check_num", 5))
@@ -849,12 +869,14 @@ def check_embedding():
     tenant_id = kb.tenant_id
     emb_mdl = LLMBundle(tenant_id, LLMType.EMBEDDING, embd_id)

-    samples = sample_random_chunks_with_vectors(globals.docStoreConn, tenant_id=tenant_id, kb_id=kb_id, n=n)
+    samples = sample_random_chunks_with_vectors(settings.docStoreConn, tenant_id=tenant_id, kb_id=kb_id, n=n)

     results, eff_sims = [], []
     for ck in samples:
-        txt = (ck.get("content_with_weight") or "").strip()
-        if not txt:
+        title = ck.get("doc_name") or "Title"
+        txt_in = "\n".join(ck.get("question_kwd") or []) or ck.get("content_with_weight") or ""
+        txt_in = _clean(txt_in)
+        if not txt_in:
             results.append({"chunk_id": ck["chunk_id"], "reason": "no_text"})
             continue
@@ -863,10 +885,19 @@ def check_embedding():
             continue

         try:
-            qv, _ = emb_mdl.encode_queries(txt)
-            sim = _cos_sim(qv, ck["vector"])
-        except Exception:
-            return get_error_data_result(message="embedding failure")
+            v, _ = emb_mdl.encode([title, txt_in])
+            assert len(v[1]) == len(ck["vector"]), f"The dimension ({len(v[1])}) of given embedding model is different from the original ({len(ck['vector'])})"
+            sim_content = _cos_sim(v[1], ck["vector"])
+            title_w = 0.1
+            qv_mix = title_w * v[0] + (1 - title_w) * v[1]
+            sim_mix = _cos_sim(qv_mix, ck["vector"])
+            sim = sim_content
+            mode = "content_only"
+            if sim_mix > sim:
+                sim = sim_mix
+                mode = "title+content"
+        except Exception as e:
+            return get_error_data_result(message=f"Embedding failure. {e}")

         eff_sims.append(sim)
         results.append({
@@ -886,19 +917,10 @@ def check_embedding():
         "avg_cos_sim": round(float(np.mean(eff_sims)) if eff_sims else 0.0, 6),
         "min_cos_sim": round(float(np.min(eff_sims)) if eff_sims else 0.0, 6),
         "max_cos_sim": round(float(np.max(eff_sims)) if eff_sims else 0.0, 6),
+        "match_mode": mode,
     }
-    if summary["avg_cos_sim"] > 0.99:
+    if summary["avg_cos_sim"] > 0.9:
         return get_json_result(data={"summary": summary, "results": results})
-    return get_json_result(code=RetCode.NOT_EFFECTIVE, message="failed", data={"summary": summary, "results": results})
+    return get_json_result(code=RetCode.NOT_EFFECTIVE, message="Embedding model switch failed: the average similarity between old and new vectors is below 0.9, indicating incompatible vector spaces.", data={"summary": summary, "results": results})
-
-
-@manager.route("/<kb_id>/link", methods=["POST"])  # noqa: F821
-@validate_request("connector_ids")
-@login_required
-def link_connector(kb_id):
-    req = request.json
-    errors = Connector2KbService.link_connectors(kb_id, req["connector_ids"], current_user.id)
-    if errors:
-        return get_json_result(data=False, message=errors, code=RetCode.SERVER_ERROR)
-    return get_json_result(data=True)

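The reworked `check_embedding` compares each stored vector against a fresh encoding of the chunk, scoring both the content alone and a title-weighted mix (`0.1 * title + 0.9 * content`) and keeping the better match. A standalone sketch of that scoring step with synthetic vectors (numpy only, illustrative values):

```python
import numpy as np

def cos_sim(a: np.ndarray, b: np.ndarray, eps: float = 1e-12) -> float:
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na < eps or nb < eps:
        return 0.0
    return float(np.dot(a, b) / (na * nb))

def best_match(title_vec, content_vec, stored_vec, title_w: float = 0.1):
    # Score content alone, then the title-weighted mixture; report whichever
    # lies closer to the stored vector, as the diff's check_embedding does.
    sim_content = cos_sim(content_vec, stored_vec)
    sim_mix = cos_sim(title_w * title_vec + (1 - title_w) * content_vec, stored_vec)
    return (sim_mix, "title+content") if sim_mix > sim_content else (sim_content, "content_only")

rng = np.random.default_rng(0)
t, c = rng.normal(size=64), rng.normal(size=64)
sim, mode = best_match(t, c, 0.1 * t + 0.9 * c)  # stored vector built from the mix
print(round(sim, 4), mode)  # -> 1.0 title+content
```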
View File

@@ -15,8 +15,8 @@
 #
-from flask import request
-from flask_login import current_user, login_required
+from quart import request
+from api.apps import current_user, login_required
 from langfuse import Langfuse

 from api.db.db_models import DB
@@ -27,8 +27,8 @@ from api.utils.api_utils import get_error_data_result, get_json_result, server_e
 @manager.route("/api_key", methods=["POST", "PUT"])  # noqa: F821
 @login_required
 @validate_request("secret_key", "public_key", "host")
-def set_api_key():
-    req = request.get_json()
+async def set_api_key():
+    req = await request.get_json()
     secret_key = req.get("secret_key", "")
     public_key = req.get("public_key", "")
     host = req.get("host", "")

View File

@ -16,8 +16,9 @@
import logging import logging
import json import json
import os import os
from flask import request from quart import request
from flask_login import login_required, current_user
from api.apps import login_required, current_user
from api.db.services.tenant_llm_service import LLMFactoriesService, TenantLLMService from api.db.services.tenant_llm_service import LLMFactoriesService, TenantLLMService
from api.db.services.llm_service import LLMService from api.db.services.llm_service import LLMService
from api.utils.api_utils import server_error_response, get_data_error_result, validate_request from api.utils.api_utils import server_error_response, get_data_error_result, validate_request
@ -33,7 +34,7 @@ from rag.llm import EmbeddingModel, ChatModel, RerankModel, CvModel, TTSModel
def factories(): def factories():
try: try:
fac = get_allowed_llm_factories() fac = get_allowed_llm_factories()
fac = [f.to_dict() for f in fac if f.name not in ["Youdao", "FastEmbed", "BAAI"]] fac = [f.to_dict() for f in fac if f.name not in ["Youdao", "FastEmbed", "BAAI", "Builtin"]]
llms = LLMService.get_all() llms = LLMService.get_all()
mdl_types = {} mdl_types = {}
for m in llms: for m in llms:
@ -52,8 +53,8 @@ def factories():
@manager.route("/set_api_key", methods=["POST"]) # noqa: F821 @manager.route("/set_api_key", methods=["POST"]) # noqa: F821
@login_required @login_required
@validate_request("llm_factory", "api_key") @validate_request("llm_factory", "api_key")
def set_api_key(): async def set_api_key():
req = request.json req = await request.json
# test if api key works # test if api key works
chat_passed, embd_passed, rerank_passed = False, False, False chat_passed, embd_passed, rerank_passed = False, False, False
factory = req["llm_factory"] factory = req["llm_factory"]
@@ -122,13 +123,13 @@ def set_api_key():
@manager.route("/add_llm", methods=["POST"]) # noqa: F821 @manager.route("/add_llm", methods=["POST"]) # noqa: F821
@login_required @login_required
@validate_request("llm_factory") @validate_request("llm_factory")
def add_llm(): async def add_llm():
req = request.json req = await request.json
factory = req["llm_factory"] factory = req["llm_factory"]
api_key = req.get("api_key", "x") api_key = req.get("api_key", "x")
llm_name = req.get("llm_name") llm_name = req.get("llm_name")
if factory not in get_allowed_llm_factories(): if factory not in [f.name for f in get_allowed_llm_factories()]:
return get_data_error_result(message=f"LLM factory {factory} is not allowed") return get_data_error_result(message=f"LLM factory {factory} is not allowed")
def apikey_json(keys): def apikey_json(keys):
@@ -142,11 +143,11 @@ def add_llm():
elif factory == "Tencent Hunyuan": elif factory == "Tencent Hunyuan":
req["api_key"] = apikey_json(["hunyuan_sid", "hunyuan_sk"]) req["api_key"] = apikey_json(["hunyuan_sid", "hunyuan_sk"])
return set_api_key() return await set_api_key()
elif factory == "Tencent Cloud": elif factory == "Tencent Cloud":
req["api_key"] = apikey_json(["tencent_cloud_sid", "tencent_cloud_sk"]) req["api_key"] = apikey_json(["tencent_cloud_sid", "tencent_cloud_sk"])
return set_api_key() return await set_api_key()
elif factory == "Bedrock": elif factory == "Bedrock":
# For Bedrock, due to its special authentication method # For Bedrock, due to its special authentication method
@@ -267,8 +268,8 @@ def add_llm():
@manager.route("/delete_llm", methods=["POST"]) # noqa: F821 @manager.route("/delete_llm", methods=["POST"]) # noqa: F821
@login_required @login_required
@validate_request("llm_factory", "llm_name") @validate_request("llm_factory", "llm_name")
def delete_llm(): async def delete_llm():
req = request.json req = await request.json
TenantLLMService.filter_delete([TenantLLM.tenant_id == current_user.id, TenantLLM.llm_factory == req["llm_factory"], TenantLLM.llm_name == req["llm_name"]]) TenantLLMService.filter_delete([TenantLLM.tenant_id == current_user.id, TenantLLM.llm_factory == req["llm_factory"], TenantLLM.llm_name == req["llm_name"]])
return get_json_result(data=True) return get_json_result(data=True)
@@ -276,8 +277,8 @@ def delete_llm():
@manager.route("/enable_llm", methods=["POST"]) # noqa: F821 @manager.route("/enable_llm", methods=["POST"]) # noqa: F821
@login_required @login_required
@validate_request("llm_factory", "llm_name") @validate_request("llm_factory", "llm_name")
def enable_llm(): async def enable_llm():
req = request.json req = await request.json
TenantLLMService.filter_update( TenantLLMService.filter_update(
[TenantLLM.tenant_id == current_user.id, TenantLLM.llm_factory == req["llm_factory"], TenantLLM.llm_name == req["llm_name"]], {"status": str(req.get("status", "1"))} [TenantLLM.tenant_id == current_user.id, TenantLLM.llm_factory == req["llm_factory"], TenantLLM.llm_name == req["llm_name"]], {"status": str(req.get("status", "1"))}
) )
@@ -287,8 +288,8 @@ def enable_llm():
@manager.route("/delete_factory", methods=["POST"]) # noqa: F821 @manager.route("/delete_factory", methods=["POST"]) # noqa: F821
@login_required @login_required
@validate_request("llm_factory") @validate_request("llm_factory")
def delete_factory(): async def delete_factory():
req = request.json req = await request.json
TenantLLMService.filter_delete([TenantLLM.tenant_id == current_user.id, TenantLLM.llm_factory == req["llm_factory"]]) TenantLLMService.filter_delete([TenantLLM.tenant_id == current_user.id, TenantLLM.llm_factory == req["llm_factory"]])
return get_json_result(data=True) return get_json_result(data=True)
@@ -348,7 +349,7 @@ def list_app():
facts = set([o.to_dict()["llm_factory"] for o in objs if o.api_key and o.status == StatusEnum.VALID.value]) facts = set([o.to_dict()["llm_factory"] for o in objs if o.api_key and o.status == StatusEnum.VALID.value])
status = {(o.llm_name + "@" + o.llm_factory) for o in objs if o.status == StatusEnum.VALID.value} status = {(o.llm_name + "@" + o.llm_factory) for o in objs if o.status == StatusEnum.VALID.value}
llms = LLMService.get_all() llms = LLMService.get_all()
llms = [m.to_dict() for m in llms if m.status == StatusEnum.VALID.value and m.fid not in weighted and (m.llm_name + "@" + m.fid) in status] llms = [m.to_dict() for m in llms if m.status == StatusEnum.VALID.value and m.fid not in weighted and (m.fid == 'Builtin' or (m.llm_name + "@" + m.fid) in status)]
for m in llms: for m in llms:
m["available"] = m["fid"] in facts or m["llm_name"].lower() == "flag-embedding" or m["fid"] in self_deployed m["available"] = m["fid"] in facts or m["llm_name"].lower() == "flag-embedding" or m["fid"] in self_deployed
if "tei-" in os.getenv("COMPOSE_PROFILES", "") and m["model_type"] == LLMType.EMBEDDING and m["fid"] == "Builtin" and m["llm_name"] == os.getenv("TEI_MODEL", ""): if "tei-" in os.getenv("COMPOSE_PROFILES", "") and m["model_type"] == LLMType.EMBEDDING and m["fid"] == "Builtin" and m["llm_name"] == os.getenv("TEI_MODEL", ""):
@@ -358,7 +359,7 @@ def list_app():
for o in objs: for o in objs:
if o.llm_name + "@" + o.llm_factory in llm_set: if o.llm_name + "@" + o.llm_factory in llm_set:
continue continue
llms.append({"llm_name": o.llm_name, "model_type": o.model_type, "fid": o.llm_factory, "available": True}) llms.append({"llm_name": o.llm_name, "model_type": o.model_type, "fid": o.llm_factory, "available": True, "status": StatusEnum.VALID.value})
res = {} res = {}
for m in llms: for m in llms:

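The add_llm change above fixes a membership test that compared a factory name (a string) against factory objects, which could never match. A minimal reproduction of the bug and the fix, using a stand-in factory type:

from dataclasses import dataclass

@dataclass
class LLMFactory:
    name: str

allowed = [LLMFactory("OpenAI"), LLMFactory("Tencent Cloud")]
factory = "OpenAI"

assert factory not in allowed                 # old check: string vs objects, always False
assert factory in [f.name for f in allowed]   # fixed check: compare names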
View File

@@ -13,8 +13,8 @@
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
# #
from flask import Response, request from quart import Response, request
from flask_login import current_user, login_required from api.apps import current_user, login_required
from api.db.db_models import MCPServer from api.db.db_models import MCPServer
from api.db.services.mcp_server_service import MCPServerService from api.db.services.mcp_server_service import MCPServerService
@@ -25,12 +25,12 @@ from common.misc_utils import get_uuid
from api.utils.api_utils import get_data_error_result, get_json_result, server_error_response, validate_request, \ from api.utils.api_utils import get_data_error_result, get_json_result, server_error_response, validate_request, \
get_mcp_tools get_mcp_tools
from api.utils.web_utils import get_float, safe_json_parse from api.utils.web_utils import get_float, safe_json_parse
from rag.utils.mcp_tool_call_conn import MCPToolCallSession, close_multiple_mcp_toolcall_sessions from common.mcp_tool_call_conn import MCPToolCallSession, close_multiple_mcp_toolcall_sessions
@manager.route("/list", methods=["POST"]) # noqa: F821 @manager.route("/list", methods=["POST"]) # noqa: F821
@login_required @login_required
def list_mcp() -> Response: async def list_mcp() -> Response:
keywords = request.args.get("keywords", "") keywords = request.args.get("keywords", "")
page_number = int(request.args.get("page", 0)) page_number = int(request.args.get("page", 0))
items_per_page = int(request.args.get("page_size", 0)) items_per_page = int(request.args.get("page_size", 0))
@@ -40,7 +40,7 @@ def list_mcp() -> Response:
else: else:
desc = True desc = True
req = request.get_json() req = await request.get_json()
mcp_ids = req.get("mcp_ids", []) mcp_ids = req.get("mcp_ids", [])
try: try:
servers = MCPServerService.get_servers(current_user.id, mcp_ids, 0, 0, orderby, desc, keywords) or [] servers = MCPServerService.get_servers(current_user.id, mcp_ids, 0, 0, orderby, desc, keywords) or []
@@ -72,8 +72,8 @@ def detail() -> Response:
@manager.route("/create", methods=["POST"]) # noqa: F821 @manager.route("/create", methods=["POST"]) # noqa: F821
@login_required @login_required
@validate_request("name", "url", "server_type") @validate_request("name", "url", "server_type")
def create() -> Response: async def create() -> Response:
req = request.get_json() req = await request.get_json()
server_type = req.get("server_type", "") server_type = req.get("server_type", "")
if server_type not in VALID_MCP_SERVER_TYPES: if server_type not in VALID_MCP_SERVER_TYPES:
@@ -127,8 +127,8 @@ def create() -> Response:
@manager.route("/update", methods=["POST"]) # noqa: F821 @manager.route("/update", methods=["POST"]) # noqa: F821
@login_required @login_required
@validate_request("mcp_id") @validate_request("mcp_id")
def update() -> Response: async def update() -> Response:
req = request.get_json() req = await request.get_json()
mcp_id = req.get("mcp_id", "") mcp_id = req.get("mcp_id", "")
e, mcp_server = MCPServerService.get_by_id(mcp_id) e, mcp_server = MCPServerService.get_by_id(mcp_id)
@@ -183,8 +183,8 @@ def update() -> Response:
@manager.route("/rm", methods=["POST"]) # noqa: F821 @manager.route("/rm", methods=["POST"]) # noqa: F821
@login_required @login_required
@validate_request("mcp_ids") @validate_request("mcp_ids")
def rm() -> Response: async def rm() -> Response:
req = request.get_json() req = await request.get_json()
mcp_ids = req.get("mcp_ids", []) mcp_ids = req.get("mcp_ids", [])
try: try:
@@ -201,8 +201,8 @@ def rm() -> Response:
@manager.route("/import", methods=["POST"]) # noqa: F821 @manager.route("/import", methods=["POST"]) # noqa: F821
@login_required @login_required
@validate_request("mcpServers") @validate_request("mcpServers")
def import_multiple() -> Response: async def import_multiple() -> Response:
req = request.get_json() req = await request.get_json()
servers = req.get("mcpServers", {}) servers = req.get("mcpServers", {})
if not servers: if not servers:
return get_data_error_result(message="No MCP servers provided.") return get_data_error_result(message="No MCP servers provided.")
@@ -268,8 +268,8 @@ def import_multiple() -> Response:
@manager.route("/export", methods=["POST"]) # noqa: F821 @manager.route("/export", methods=["POST"]) # noqa: F821
@login_required @login_required
@validate_request("mcp_ids") @validate_request("mcp_ids")
def export_multiple() -> Response: async def export_multiple() -> Response:
req = request.get_json() req = await request.get_json()
mcp_ids = req.get("mcp_ids", []) mcp_ids = req.get("mcp_ids", [])
if not mcp_ids: if not mcp_ids:
@@ -300,8 +300,8 @@ def export_multiple() -> Response:
@manager.route("/list_tools", methods=["POST"]) # noqa: F821 @manager.route("/list_tools", methods=["POST"]) # noqa: F821
@login_required @login_required
@validate_request("mcp_ids") @validate_request("mcp_ids")
def list_tools() -> Response: async def list_tools() -> Response:
req = request.get_json() req = await request.get_json()
mcp_ids = req.get("mcp_ids", []) mcp_ids = req.get("mcp_ids", [])
if not mcp_ids: if not mcp_ids:
return get_data_error_result(message="No MCP server IDs provided.") return get_data_error_result(message="No MCP server IDs provided.")
@@ -347,8 +347,8 @@ def list_tools() -> Response:
@manager.route("/test_tool", methods=["POST"]) # noqa: F821 @manager.route("/test_tool", methods=["POST"]) # noqa: F821
@login_required @login_required
@validate_request("mcp_id", "tool_name", "arguments") @validate_request("mcp_id", "tool_name", "arguments")
def test_tool() -> Response: async def test_tool() -> Response:
req = request.get_json() req = await request.get_json()
mcp_id = req.get("mcp_id", "") mcp_id = req.get("mcp_id", "")
if not mcp_id: if not mcp_id:
return get_data_error_result(message="No MCP server ID provided.") return get_data_error_result(message="No MCP server ID provided.")
@@ -380,8 +380,8 @@ def test_tool() -> Response:
@manager.route("/cache_tools", methods=["POST"]) # noqa: F821 @manager.route("/cache_tools", methods=["POST"]) # noqa: F821
@login_required @login_required
@validate_request("mcp_id", "tools") @validate_request("mcp_id", "tools")
def cache_tool() -> Response: async def cache_tool() -> Response:
req = request.get_json() req = await request.get_json()
mcp_id = req.get("mcp_id", "") mcp_id = req.get("mcp_id", "")
if not mcp_id: if not mcp_id:
return get_data_error_result(message="No MCP server ID provided.") return get_data_error_result(message="No MCP server ID provided.")
@@ -403,8 +403,8 @@ def cache_tool() -> Response:
@manager.route("/test_mcp", methods=["POST"]) # noqa: F821 @manager.route("/test_mcp", methods=["POST"]) # noqa: F821
@validate_request("url", "server_type") @validate_request("url", "server_type")
def test_mcp() -> Response: async def test_mcp() -> Response:
req = request.get_json() req = await request.get_json()
url = req.get("url", "") url = req.get("url", "")
if not url: if not url:

View File

@@ -15,8 +15,8 @@
# #
from flask import Response from quart import Response
from flask_login import login_required from api.apps import login_required
from api.utils.api_utils import get_json_result from api.utils.api_utils import get_json_result
from plugin import GlobalPluginManager from plugin import GlobalPluginManager

View File

@@ -15,15 +15,19 @@
# #
import json import json
import logging
import time import time
from typing import Any, cast from typing import Any, cast
from agent.canvas import Canvas
from api.db import CanvasCategory
from api.db.services.canvas_service import UserCanvasService from api.db.services.canvas_service import UserCanvasService
from api.db.services.user_canvas_version import UserCanvasVersionService from api.db.services.user_canvas_version import UserCanvasVersionService
from common.constants import RetCode from common.constants import RetCode
from common.misc_utils import get_uuid from common.misc_utils import get_uuid
from api.utils.api_utils import get_data_error_result, get_error_data_result, get_json_result, token_required from api.utils.api_utils import get_data_error_result, get_error_data_result, get_json_result, token_required
from api.utils.api_utils import get_result from api.utils.api_utils import get_result
from flask import request from quart import request, Response
@manager.route('/agents', methods=['GET']) # noqa: F821 @manager.route('/agents', methods=['GET']) # noqa: F821
@@ -37,19 +41,19 @@ def list_agents(tenant_id):
return get_error_data_result("The agent doesn't exist.") return get_error_data_result("The agent doesn't exist.")
page_number = int(request.args.get("page", 1)) page_number = int(request.args.get("page", 1))
items_per_page = int(request.args.get("page_size", 30)) items_per_page = int(request.args.get("page_size", 30))
orderby = request.args.get("orderby", "update_time") order_by = request.args.get("orderby", "update_time")
if request.args.get("desc") == "False" or request.args.get("desc") == "false": if request.args.get("desc") == "False" or request.args.get("desc") == "false":
desc = False desc = False
else: else:
desc = True desc = True
canvas = UserCanvasService.get_list(tenant_id, page_number, items_per_page, orderby, desc, id, title) canvas = UserCanvasService.get_list(tenant_id, page_number, items_per_page, order_by, desc, id, title)
return get_result(data=canvas) return get_result(data=canvas)
@manager.route("/agents", methods=["POST"]) # noqa: F821 @manager.route("/agents", methods=["POST"]) # noqa: F821
@token_required @token_required
def create_agent(tenant_id: str): async def create_agent(tenant_id: str):
req: dict[str, Any] = cast(dict[str, Any], request.json) req: dict[str, Any] = cast(dict[str, Any], await request.json)
req["user_id"] = tenant_id req["user_id"] = tenant_id
if req.get("dsl") is not None: if req.get("dsl") is not None:
@@ -85,8 +89,8 @@ def create_agent(tenant_id: str):
@manager.route("/agents/<agent_id>", methods=["PUT"]) # noqa: F821 @manager.route("/agents/<agent_id>", methods=["PUT"]) # noqa: F821
@token_required @token_required
def update_agent(tenant_id: str, agent_id: str): async def update_agent(tenant_id: str, agent_id: str):
req: dict[str, Any] = {k: v for k, v in cast(dict[str, Any], request.json).items() if v is not None} req: dict[str, Any] = {k: v for k, v in cast(dict[str, Any], (await request.json)).items() if v is not None}
req["user_id"] = tenant_id req["user_id"] = tenant_id
if req.get("dsl") is not None: if req.get("dsl") is not None:
@@ -127,3 +131,49 @@ def delete_agent(tenant_id: str, agent_id: str):
UserCanvasService.delete_by_id(agent_id) UserCanvasService.delete_by_id(agent_id)
return get_json_result(data=True) return get_json_result(data=True)
@manager.route('/webhook/<agent_id>', methods=['POST']) # noqa: F821
@token_required
async def webhook(tenant_id: str, agent_id: str):
req = await request.json
if not UserCanvasService.accessible(req["id"], tenant_id):
return get_json_result(
data=False, message='Only owner of canvas authorized for this operation.',
code=RetCode.OPERATING_ERROR)
e, cvs = UserCanvasService.get_by_id(req["id"])
if not e:
return get_data_error_result(message="canvas not found.")
if not isinstance(cvs.dsl, str):
cvs.dsl = json.dumps(cvs.dsl, ensure_ascii=False)
if cvs.canvas_category == CanvasCategory.DataFlow:
return get_data_error_result(message="Dataflow can not be triggered by webhook.")
try:
canvas = Canvas(cvs.dsl, tenant_id, agent_id)
except Exception as e:
return get_json_result(
data=False, message=str(e),
code=RetCode.EXCEPTION_ERROR)
def sse():
nonlocal canvas
try:
for ans in canvas.run(query=req.get("query", ""), files=req.get("files", []), user_id=req.get("user_id", tenant_id), webhook_payload=req):
yield "data:" + json.dumps(ans, ensure_ascii=False) + "\n\n"
cvs.dsl = json.loads(str(canvas))
UserCanvasService.update_by_id(req["id"], cvs.to_dict())
except Exception as e:
logging.exception(e)
yield "data:" + json.dumps({"code": 500, "message": str(e), "data": False}, ensure_ascii=False) + "\n\n"
resp = Response(sse(), mimetype="text/event-stream")
resp.headers.add_header("Cache-control", "no-cache")
resp.headers.add_header("Connection", "keep-alive")
resp.headers.add_header("X-Accel-Buffering", "no")
resp.headers.add_header("Content-Type", "text/event-stream; charset=utf-8")
return resp

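The new webhook endpoint streams agent output as Server-Sent Events. A minimal sketch of the same SSE pattern, mirroring the Response construction above (route and payload are illustrative):

import json
from quart import Quart, Response

app = Quart(__name__)

@app.route("/events")
async def events():
    def sse():
        for i in range(3):
            yield "data:" + json.dumps({"step": i}, ensure_ascii=False) + "\n\n"

    resp = Response(sse(), mimetype="text/event-stream")
    resp.headers["Cache-Control"] = "no-cache"
    resp.headers["X-Accel-Buffering"] = "no"   # keep reverse proxies from buffering
    return resp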
View File

@@ -14,22 +14,20 @@
# limitations under the License. # limitations under the License.
# #
import logging import logging
from flask import request
from quart import request
from api.db.services.dialog_service import DialogService from api.db.services.dialog_service import DialogService
from api.db.services.knowledgebase_service import KnowledgebaseService from api.db.services.knowledgebase_service import KnowledgebaseService
from api.db.services.tenant_llm_service import TenantLLMService from api.db.services.tenant_llm_service import TenantLLMService
from api.db.services.user_service import TenantService from api.db.services.user_service import TenantService
from common.misc_utils import get_uuid from common.misc_utils import get_uuid
from common.constants import RetCode, StatusEnum from common.constants import RetCode, StatusEnum
from api.utils.api_utils import check_duplicate_ids, get_error_data_result, get_result, token_required from api.utils.api_utils import check_duplicate_ids, get_error_data_result, get_result, token_required, request_json
@manager.route("/chats", methods=["POST"]) # noqa: F821 @manager.route("/chats", methods=["POST"]) # noqa: F821
@token_required @token_required
def create(tenant_id): async def create(tenant_id):
req = request.json req = await request_json()
ids = [i for i in req.get("dataset_ids", []) if i] ids = [i for i in req.get("dataset_ids", []) if i]
for kb_id in ids: for kb_id in ids:
kbs = KnowledgebaseService.accessible(kb_id=kb_id, user_id=tenant_id) kbs = KnowledgebaseService.accessible(kb_id=kb_id, user_id=tenant_id)
@@ -145,10 +143,10 @@ def create(tenant_id):
@manager.route("/chats/<chat_id>", methods=["PUT"]) # noqa: F821 @manager.route("/chats/<chat_id>", methods=["PUT"]) # noqa: F821
@token_required @token_required
def update(tenant_id, chat_id): async def update(tenant_id, chat_id):
if not DialogService.query(tenant_id=tenant_id, id=chat_id, status=StatusEnum.VALID.value): if not DialogService.query(tenant_id=tenant_id, id=chat_id, status=StatusEnum.VALID.value):
return get_error_data_result(message="You do not own the chat") return get_error_data_result(message="You do not own the chat")
req = request.json req = await request_json()
ids = req.get("dataset_ids", []) ids = req.get("dataset_ids", [])
if "show_quotation" in req: if "show_quotation" in req:
req["do_refer"] = req.pop("show_quotation") req["do_refer"] = req.pop("show_quotation")
@@ -228,10 +226,10 @@ def update(tenant_id, chat_id):
@manager.route("/chats", methods=["DELETE"]) # noqa: F821 @manager.route("/chats", methods=["DELETE"]) # noqa: F821
@token_required @token_required
def delete(tenant_id): async def delete_chats(tenant_id):
errors = [] errors = []
success_count = 0 success_count = 0
req = request.json req = await request_json()
if not req: if not req:
ids = None ids = None
else: else:
@@ -251,8 +249,8 @@ def delete(tenant_id):
errors.append(f"Assistant({id}) not found.") errors.append(f"Assistant({id}) not found.")
continue continue
temp_dict = {"status": StatusEnum.INVALID.value} temp_dict = {"status": StatusEnum.INVALID.value}
DialogService.update_by_id(id, temp_dict) success_count += DialogService.update_by_id(id, temp_dict)
success_count += 1
if errors: if errors:
if success_count > 0: if success_count > 0:

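These hunks start reading request bodies through a request_json helper imported from api.utils.api_utils. Its implementation isn't part of this diff; one plausible minimal shape, assuming it wraps Quart's awaitable JSON parsing and tolerates an empty body (the delete handler above checks `if not req:`):

from quart import request

async def request_json() -> dict:
    # silent=True returns None instead of raising on a missing/invalid body
    return await request.get_json(silent=True) or {}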
View File

@@ -18,13 +18,14 @@
import logging import logging
import os import os
import json import json
from flask import request from quart import request
from peewee import OperationalError from peewee import OperationalError
from api.db.db_models import File from api.db.db_models import File
from api.db.services.document_service import DocumentService from api.db.services.document_service import DocumentService, queue_raptor_o_graphrag_tasks
from api.db.services.file2document_service import File2DocumentService from api.db.services.file2document_service import File2DocumentService
from api.db.services.file_service import FileService from api.db.services.file_service import FileService
from api.db.services.knowledgebase_service import KnowledgebaseService from api.db.services.knowledgebase_service import KnowledgebaseService
from api.db.services.task_service import GRAPH_RAPTOR_FAKE_DOC_ID, TaskService
from api.db.services.user_service import TenantService from api.db.services.user_service import TenantService
from common.constants import RetCode, FileSource, StatusEnum from common.constants import RetCode, FileSource, StatusEnum
from api.utils.api_utils import ( from api.utils.api_utils import (
@@ -47,13 +48,13 @@ from api.utils.validation_utils import (
validate_and_parse_request_args, validate_and_parse_request_args,
) )
from rag.nlp import search from rag.nlp import search
from rag.settings import PAGERANK_FLD from common.constants import PAGERANK_FLD
from common import globals from common import settings
@manager.route("/datasets", methods=["POST"]) # noqa: F821 @manager.route("/datasets", methods=["POST"]) # noqa: F821
@token_required @token_required
def create(tenant_id): async def create(tenant_id):
""" """
Create a new dataset. Create a new dataset.
--- ---
@@ -115,17 +116,19 @@ def create(tenant_id):
# | embedding_model| embd_id | # | embedding_model| embd_id |
# | chunk_method | parser_id | # | chunk_method | parser_id |
req, err = validate_and_parse_json_request(request, CreateDatasetReq) req, err = await validate_and_parse_json_request(request, CreateDatasetReq)
if err is not None: if err is not None:
return get_error_argument_result(err) return get_error_argument_result(err)
req = KnowledgebaseService.create_with_name(
e, req = KnowledgebaseService.create_with_name(
name = req.pop("name", None), name = req.pop("name", None),
tenant_id = tenant_id, tenant_id = tenant_id,
parser_id = req.pop("parser_id", None), parser_id = req.pop("parser_id", None),
**req **req
) )
if not e:
return req
# Insert embedding model(embd id) # Insert embedding model(embd id)
ok, t = TenantService.get_by_id(tenant_id) ok, t = TenantService.get_by_id(tenant_id)
if not ok: if not ok:
@@ -144,7 +147,6 @@ def create(tenant_id):
ok, k = KnowledgebaseService.get_by_id(req["id"]) ok, k = KnowledgebaseService.get_by_id(req["id"])
if not ok: if not ok:
return get_error_data_result(message="Dataset created failed") return get_error_data_result(message="Dataset created failed")
response_data = remap_dictionary_keys(k.to_dict()) response_data = remap_dictionary_keys(k.to_dict())
return get_result(data=response_data) return get_result(data=response_data)
except Exception as e: except Exception as e:
@@ -153,7 +155,7 @@ def create(tenant_id):
@manager.route("/datasets", methods=["DELETE"]) # noqa: F821 @manager.route("/datasets", methods=["DELETE"]) # noqa: F821
@token_required @token_required
def delete(tenant_id): async def delete(tenant_id):
""" """
Delete datasets. Delete datasets.
--- ---
@@ -191,7 +193,7 @@ def delete(tenant_id):
schema: schema:
type: object type: object
""" """
req, err = validate_and_parse_json_request(request, DeleteDatasetReq) req, err = await validate_and_parse_json_request(request, DeleteDatasetReq)
if err is not None: if err is not None:
return get_error_argument_result(err) return get_error_argument_result(err)
@@ -251,7 +253,7 @@ def delete(tenant_id):
@manager.route("/datasets/<dataset_id>", methods=["PUT"]) # noqa: F821 @manager.route("/datasets/<dataset_id>", methods=["PUT"]) # noqa: F821
@token_required @token_required
def update(tenant_id, dataset_id): async def update(tenant_id, dataset_id):
""" """
Update a dataset. Update a dataset.
--- ---
@@ -317,7 +319,7 @@ def update(tenant_id, dataset_id):
# | embedding_model| embd_id | # | embedding_model| embd_id |
# | chunk_method | parser_id | # | chunk_method | parser_id |
extras = {"dataset_id": dataset_id} extras = {"dataset_id": dataset_id}
req, err = validate_and_parse_json_request(request, UpdateDatasetReq, extras=extras, exclude_unset=True) req, err = await validate_and_parse_json_request(request, UpdateDatasetReq, extras=extras, exclude_unset=True)
if err is not None: if err is not None:
return get_error_argument_result(err) return get_error_argument_result(err)
@@ -360,11 +362,11 @@ def update(tenant_id, dataset_id):
return get_error_argument_result(message="'pagerank' can only be set when doc_engine is elasticsearch") return get_error_argument_result(message="'pagerank' can only be set when doc_engine is elasticsearch")
if req["pagerank"] > 0: if req["pagerank"] > 0:
globals.docStoreConn.update({"kb_id": kb.id}, {PAGERANK_FLD: req["pagerank"]}, settings.docStoreConn.update({"kb_id": kb.id}, {PAGERANK_FLD: req["pagerank"]},
search.index_name(kb.tenant_id), kb.id) search.index_name(kb.tenant_id), kb.id)
else: else:
# Elasticsearch requires PAGERANK_FLD be non-zero! # Elasticsearch requires PAGERANK_FLD be non-zero!
globals.docStoreConn.update({"exists": PAGERANK_FLD}, {"remove": PAGERANK_FLD}, settings.docStoreConn.update({"exists": PAGERANK_FLD}, {"remove": PAGERANK_FLD},
search.index_name(kb.tenant_id), kb.id) search.index_name(kb.tenant_id), kb.id)
if not KnowledgebaseService.update_by_id(kb.id, req): if not KnowledgebaseService.update_by_id(kb.id, req):
@@ -493,9 +495,9 @@ def knowledge_graph(tenant_id, dataset_id):
} }
obj = {"graph": {}, "mind_map": {}} obj = {"graph": {}, "mind_map": {}}
if not globals.docStoreConn.indexExist(search.index_name(kb.tenant_id), dataset_id): if not settings.docStoreConn.indexExist(search.index_name(kb.tenant_id), dataset_id):
return get_result(data=obj) return get_result(data=obj)
sres = globals.retriever.search(req, search.index_name(kb.tenant_id), [dataset_id]) sres = settings.retriever.search(req, search.index_name(kb.tenant_id), [dataset_id])
if not len(sres.ids): if not len(sres.ids):
return get_result(data=obj) return get_result(data=obj)
@@ -528,7 +530,161 @@ def delete_knowledge_graph(tenant_id, dataset_id):
code=RetCode.AUTHENTICATION_ERROR code=RetCode.AUTHENTICATION_ERROR
) )
_, kb = KnowledgebaseService.get_by_id(dataset_id) _, kb = KnowledgebaseService.get_by_id(dataset_id)
globals.docStoreConn.delete({"knowledge_graph_kwd": ["graph", "subgraph", "entity", "relation"]}, settings.docStoreConn.delete({"knowledge_graph_kwd": ["graph", "subgraph", "entity", "relation"]},
search.index_name(kb.tenant_id), dataset_id) search.index_name(kb.tenant_id), dataset_id)
return get_result(data=True) return get_result(data=True)
@manager.route("/datasets/<dataset_id>/run_graphrag", methods=["POST"]) # noqa: F821
@token_required
def run_graphrag(tenant_id,dataset_id):
if not dataset_id:
return get_error_data_result(message='Lack of "Dataset ID"')
if not KnowledgebaseService.accessible(dataset_id, tenant_id):
return get_result(
data=False,
message='No authorization.',
code=RetCode.AUTHENTICATION_ERROR
)
ok, kb = KnowledgebaseService.get_by_id(dataset_id)
if not ok:
return get_error_data_result(message="Invalid Dataset ID")
task_id = kb.graphrag_task_id
if task_id:
ok, task = TaskService.get_by_id(task_id)
if not ok:
logging.warning(f"A valid GraphRAG task id is expected for Dataset {dataset_id}")
if task and task.progress not in [-1, 1]:
return get_error_data_result(message=f"Task {task_id} in progress with status {task.progress}. A Graph Task is already running.")
documents, _ = DocumentService.get_by_kb_id(
kb_id=dataset_id,
page_number=0,
items_per_page=0,
orderby="create_time",
desc=False,
keywords="",
run_status=[],
types=[],
suffix=[],
)
if not documents:
return get_error_data_result(message=f"No documents in Dataset {dataset_id}")
sample_document = documents[0]
document_ids = [document["id"] for document in documents]
task_id = queue_raptor_o_graphrag_tasks(sample_doc_id=sample_document, ty="graphrag", priority=0, fake_doc_id=GRAPH_RAPTOR_FAKE_DOC_ID, doc_ids=list(document_ids))
if not KnowledgebaseService.update_by_id(kb.id, {"graphrag_task_id": task_id}):
logging.warning(f"Cannot save graphrag_task_id for Dataset {dataset_id}")
return get_result(data={"graphrag_task_id": task_id})
@manager.route("/datasets/<dataset_id>/trace_graphrag", methods=["GET"]) # noqa: F821
@token_required
def trace_graphrag(tenant_id,dataset_id):
if not dataset_id:
return get_error_data_result(message='Lack of "Dataset ID"')
if not KnowledgebaseService.accessible(dataset_id, tenant_id):
return get_result(
data=False,
message='No authorization.',
code=RetCode.AUTHENTICATION_ERROR
)
ok, kb = KnowledgebaseService.get_by_id(dataset_id)
if not ok:
return get_error_data_result(message="Invalid Dataset ID")
task_id = kb.graphrag_task_id
if not task_id:
return get_result(data={})
ok, task = TaskService.get_by_id(task_id)
if not ok:
return get_result(data={})
return get_result(data=task.to_dict())
@manager.route("/datasets/<dataset_id>/run_raptor", methods=["POST"]) # noqa: F821
@token_required
def run_raptor(tenant_id,dataset_id):
if not dataset_id:
return get_error_data_result(message='Lack of "Dataset ID"')
if not KnowledgebaseService.accessible(dataset_id, tenant_id):
return get_result(
data=False,
message='No authorization.',
code=RetCode.AUTHENTICATION_ERROR
)
ok, kb = KnowledgebaseService.get_by_id(dataset_id)
if not ok:
return get_error_data_result(message="Invalid Dataset ID")
task_id = kb.raptor_task_id
if task_id:
ok, task = TaskService.get_by_id(task_id)
if not ok:
logging.warning(f"A valid RAPTOR task id is expected for Dataset {dataset_id}")
if task and task.progress not in [-1, 1]:
return get_error_data_result(message=f"Task {task_id} in progress with status {task.progress}. A RAPTOR Task is already running.")
documents, _ = DocumentService.get_by_kb_id(
kb_id=dataset_id,
page_number=0,
items_per_page=0,
orderby="create_time",
desc=False,
keywords="",
run_status=[],
types=[],
suffix=[],
)
if not documents:
return get_error_data_result(message=f"No documents in Dataset {dataset_id}")
sample_document = documents[0]
document_ids = [document["id"] for document in documents]
task_id = queue_raptor_o_graphrag_tasks(sample_doc_id=sample_document, ty="raptor", priority=0, fake_doc_id=GRAPH_RAPTOR_FAKE_DOC_ID, doc_ids=list(document_ids))
if not KnowledgebaseService.update_by_id(kb.id, {"raptor_task_id": task_id}):
logging.warning(f"Cannot save raptor_task_id for Dataset {dataset_id}")
return get_result(data={"raptor_task_id": task_id})
@manager.route("/datasets/<dataset_id>/trace_raptor", methods=["GET"]) # noqa: F821
@token_required
def trace_raptor(tenant_id,dataset_id):
if not dataset_id:
return get_error_data_result(message='Lack of "Dataset ID"')
if not KnowledgebaseService.accessible(dataset_id, tenant_id):
return get_result(
data=False,
message='No authorization.',
code=RetCode.AUTHENTICATION_ERROR
)
ok, kb = KnowledgebaseService.get_by_id(dataset_id)
if not ok:
return get_error_data_result(message="Invalid Dataset ID")
task_id = kb.raptor_task_id
if not task_id:
return get_result(data={})
ok, task = TaskService.get_by_id(task_id)
if not ok:
return get_error_data_result(message="RAPTOR Task Not Found or Error Occurred")
return get_result(data=task.to_dict())

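A hedged usage sketch for the new GraphRAG/RAPTOR endpoints above, using the GraphRAG pair: start a run, then poll the trace endpoint until the task reports progress -1 (failed) or 1 (done), matching the in-progress check in run_graphrag. The base URL and Bearer-token header are assumptions, not confirmed by this diff:

import time
import requests

BASE = "http://localhost:9380/api/v1"            # assumed deployment URL
HEADERS = {"Authorization": "Bearer <API_KEY>"}  # @token_required suggests an API token

dataset_id = "<dataset_id>"
r = requests.post(f"{BASE}/datasets/{dataset_id}/run_graphrag", headers=HEADERS)
task_id = r.json()["data"]["graphrag_task_id"]

while True:
    task = requests.get(f"{BASE}/datasets/{dataset_id}/trace_graphrag", headers=HEADERS).json()["data"]
    if not task or task.get("progress") in (-1, 1):
        break
    time.sleep(5)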
View File

@@ -15,22 +15,21 @@
# #
import logging import logging
from flask import request, jsonify from quart import request, jsonify
from api.db.services.document_service import DocumentService from api.db.services.document_service import DocumentService
from api.db.services.knowledgebase_service import KnowledgebaseService from api.db.services.knowledgebase_service import KnowledgebaseService
from api.db.services.llm_service import LLMBundle from api.db.services.llm_service import LLMBundle
from api import settings
from api.utils.api_utils import validate_request, build_error_result, apikey_required from api.utils.api_utils import validate_request, build_error_result, apikey_required
from rag.app.tag import label_question from rag.app.tag import label_question
from api.db.services.dialog_service import meta_filter, convert_conditions from api.db.services.dialog_service import meta_filter, convert_conditions
from common.constants import RetCode, LLMType from common.constants import RetCode, LLMType
from common import globals from common import settings
@manager.route('/dify/retrieval', methods=['POST']) # noqa: F821 @manager.route('/dify/retrieval', methods=['POST']) # noqa: F821
@apikey_required @apikey_required
@validate_request("knowledge_id", "query") @validate_request("knowledge_id", "query")
def retrieval(tenant_id): async def retrieval(tenant_id):
""" """
Dify-compatible retrieval API Dify-compatible retrieval API
--- ---
@@ -114,7 +113,7 @@ def retrieval(tenant_id):
404: 404:
description: Knowledge base or document not found description: Knowledge base or document not found
""" """
req = request.json req = await request.json
question = req["query"] question = req["query"]
kb_id = req["knowledge_id"] kb_id = req["knowledge_id"]
use_kg = req.get("use_kg", False) use_kg = req.get("use_kg", False)
@@ -132,13 +131,11 @@ def retrieval(tenant_id):
return build_error_result(message="Knowledgebase not found!", code=RetCode.NOT_FOUND) return build_error_result(message="Knowledgebase not found!", code=RetCode.NOT_FOUND)
embd_mdl = LLMBundle(kb.tenant_id, LLMType.EMBEDDING.value, llm_name=kb.embd_id) embd_mdl = LLMBundle(kb.tenant_id, LLMType.EMBEDDING.value, llm_name=kb.embd_id)
print(metadata_condition) if metadata_condition:
# print("after", convert_conditions(metadata_condition)) doc_ids.extend(meta_filter(metas, convert_conditions(metadata_condition)))
doc_ids.extend(meta_filter(metas, convert_conditions(metadata_condition))) if not doc_ids and metadata_condition:
# print("doc_ids", doc_ids) doc_ids = ["-999"]
if not doc_ids and metadata_condition is not None: ranks = settings.retriever.retrieval(
doc_ids = ['-999']
ranks = globals.retriever.retrieval(
question, question,
embd_mdl, embd_mdl,
kb.tenant_id, kb.tenant_id,

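The rewritten filter logic above replaces debug prints with a guarded flow: the metadata filter only runs when a condition is present, and a "-999" sentinel doc id forces an empty result when the filter matches nothing, instead of silently falling through to unfiltered retrieval. A minimal reproduction with a stand-in filter (convert_conditions omitted for brevity):

def resolve_doc_ids(metas, metadata_condition, meta_filter):
    doc_ids = []
    if metadata_condition:
        doc_ids.extend(meta_filter(metas, metadata_condition))
    if not doc_ids and metadata_condition:
        doc_ids = ["-999"]   # matches no real document -> empty retrieval
    return doc_ids

assert resolve_doc_ids({}, None, lambda m, c: []) == []             # no filter: unrestricted
assert resolve_doc_ids({}, {"k": "v"}, lambda m, c: []) == ["-999"]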
View File

@@ -20,11 +20,10 @@ import re
from io import BytesIO from io import BytesIO
import xxhash import xxhash
from flask import request, send_file from quart import request, send_file
from peewee import OperationalError from peewee import OperationalError
from pydantic import BaseModel, Field, validator from pydantic import BaseModel, Field, validator
from api import settings
from api.constants import FILE_NAME_LEN_LIMIT from api.constants import FILE_NAME_LEN_LIMIT
from api.db import FileType from api.db import FileType
from api.db.db_models import File, Task from api.db.db_models import File, Task
@@ -36,15 +35,15 @@ from api.db.services.llm_service import LLMBundle
from api.db.services.tenant_llm_service import TenantLLMService from api.db.services.tenant_llm_service import TenantLLMService
from api.db.services.task_service import TaskService, queue_tasks from api.db.services.task_service import TaskService, queue_tasks
from api.db.services.dialog_service import meta_filter, convert_conditions from api.db.services.dialog_service import meta_filter, convert_conditions
from api.utils.api_utils import check_duplicate_ids, construct_json_result, get_error_data_result, get_parser_config, get_result, server_error_response, token_required from api.utils.api_utils import check_duplicate_ids, construct_json_result, get_error_data_result, get_parser_config, get_result, server_error_response, token_required, \
request_json
from rag.app.qa import beAdoc, rmPrefix from rag.app.qa import beAdoc, rmPrefix
from rag.app.tag import label_question from rag.app.tag import label_question
from rag.nlp import rag_tokenizer, search from rag.nlp import rag_tokenizer, search
from rag.prompts.generator import cross_languages, keyword_extraction from rag.prompts.generator import cross_languages, keyword_extraction
from rag.utils.storage_factory import STORAGE_IMPL
from common.string_utils import remove_redundant_spaces from common.string_utils import remove_redundant_spaces
from common.constants import RetCode, LLMType, ParserType, TaskStatus, FileSource from common.constants import RetCode, LLMType, ParserType, TaskStatus, FileSource
from common import globals from common import settings
MAXIMUM_OF_UPLOADING_FILES = 256 MAXIMUM_OF_UPLOADING_FILES = 256
@@ -71,7 +70,7 @@ class Chunk(BaseModel):
@manager.route("/datasets/<dataset_id>/documents", methods=["POST"]) # noqa: F821 @manager.route("/datasets/<dataset_id>/documents", methods=["POST"]) # noqa: F821
@token_required @token_required
def upload(dataset_id, tenant_id): async def upload(dataset_id, tenant_id):
""" """
Upload documents to a dataset. Upload documents to a dataset.
--- ---
@@ -95,6 +94,10 @@ def upload(dataset_id, tenant_id):
type: file type: file
required: true required: true
description: Document files to upload. description: Document files to upload.
- in: formData
name: parent_path
type: string
description: Optional nested path under the parent folder. Uses '/' separators.
responses: responses:
200: 200:
description: Successfully uploaded documents. description: Successfully uploaded documents.
@@ -128,9 +131,11 @@ def upload(dataset_id, tenant_id):
type: string type: string
description: Processing status. description: Processing status.
""" """
if "file" not in request.files: form = await request.form
files = await request.files
if "file" not in files:
return get_error_data_result(message="No file part!", code=RetCode.ARGUMENT_ERROR) return get_error_data_result(message="No file part!", code=RetCode.ARGUMENT_ERROR)
file_objs = request.files.getlist("file") file_objs = files.getlist("file")
for file_obj in file_objs: for file_obj in file_objs:
if file_obj.filename == "": if file_obj.filename == "":
return get_result(message="No file selected!", code=RetCode.ARGUMENT_ERROR) return get_result(message="No file selected!", code=RetCode.ARGUMENT_ERROR)
@@ -153,7 +158,7 @@ def upload(dataset_id, tenant_id):
e, kb = KnowledgebaseService.get_by_id(dataset_id) e, kb = KnowledgebaseService.get_by_id(dataset_id)
if not e: if not e:
raise LookupError(f"Can't find the dataset with ID {dataset_id}!") raise LookupError(f"Can't find the dataset with ID {dataset_id}!")
err, files = FileService.upload_document(kb, file_objs, tenant_id) err, files = FileService.upload_document(kb, file_objs, tenant_id, parent_path=form.get("parent_path"))
if err: if err:
return get_result(message="\n".join(err), code=RetCode.SERVER_ERROR) return get_result(message="\n".join(err), code=RetCode.SERVER_ERROR)
# rename key's name # rename key's name
@@ -177,7 +182,7 @@ def upload(dataset_id, tenant_id):
@manager.route("/datasets/<dataset_id>/documents/<document_id>", methods=["PUT"]) # noqa: F821 @manager.route("/datasets/<dataset_id>/documents/<document_id>", methods=["PUT"]) # noqa: F821
@token_required @token_required
def update_doc(tenant_id, dataset_id, document_id): async def update_doc(tenant_id, dataset_id, document_id):
""" """
Update a document within a dataset. Update a document within a dataset.
--- ---
@@ -226,7 +231,7 @@ def update_doc(tenant_id, dataset_id, document_id):
schema: schema:
type: object type: object
""" """
req = request.json req = await request_json()
if not KnowledgebaseService.query(id=dataset_id, tenant_id=tenant_id): if not KnowledgebaseService.query(id=dataset_id, tenant_id=tenant_id):
return get_error_data_result(message="You don't own the dataset.") return get_error_data_result(message="You don't own the dataset.")
e, kb = KnowledgebaseService.get_by_id(dataset_id) e, kb = KnowledgebaseService.get_by_id(dataset_id)
@@ -308,7 +313,7 @@ def update_doc(tenant_id, dataset_id, document_id):
) )
if not e: if not e:
return get_error_data_result(message="Document not found!") return get_error_data_result(message="Document not found!")
globals.docStoreConn.delete({"doc_id": doc.id}, search.index_name(tenant_id), dataset_id) settings.docStoreConn.delete({"doc_id": doc.id}, search.index_name(tenant_id), dataset_id)
if "enabled" in req: if "enabled" in req:
status = int(req["enabled"]) status = int(req["enabled"])
@@ -317,7 +322,7 @@ def update_doc(tenant_id, dataset_id, document_id):
if not DocumentService.update_by_id(doc.id, {"status": str(status)}): if not DocumentService.update_by_id(doc.id, {"status": str(status)}):
return get_error_data_result(message="Database error (Document update)!") return get_error_data_result(message="Database error (Document update)!")
globals.docStoreConn.update({"doc_id": doc.id}, {"available_int": status}, search.index_name(kb.tenant_id), doc.kb_id) settings.docStoreConn.update({"doc_id": doc.id}, {"available_int": status}, search.index_name(kb.tenant_id), doc.kb_id)
return get_result(data=True) return get_result(data=True)
except Exception as e: except Exception as e:
return server_error_response(e) return server_error_response(e)
@@ -357,7 +362,7 @@ def update_doc(tenant_id, dataset_id, document_id):
@manager.route("/datasets/<dataset_id>/documents/<document_id>", methods=["GET"]) # noqa: F821 @manager.route("/datasets/<dataset_id>/documents/<document_id>", methods=["GET"]) # noqa: F821
@token_required @token_required
def download(tenant_id, dataset_id, document_id): async def download(tenant_id, dataset_id, document_id):
""" """
Download a document from a dataset. Download a document from a dataset.
--- ---
@@ -402,15 +407,15 @@ def download(tenant_id, dataset_id, document_id):
return get_error_data_result(message=f"The dataset not own the document {document_id}.") return get_error_data_result(message=f"The dataset not own the document {document_id}.")
# The process of downloading # The process of downloading
doc_id, doc_location = File2DocumentService.get_storage_address(doc_id=document_id) # minio address doc_id, doc_location = File2DocumentService.get_storage_address(doc_id=document_id) # minio address
file_stream = STORAGE_IMPL.get(doc_id, doc_location) file_stream = settings.STORAGE_IMPL.get(doc_id, doc_location)
if not file_stream: if not file_stream:
return construct_json_result(message="This file is empty.", code=RetCode.DATA_ERROR) return construct_json_result(message="This file is empty.", code=RetCode.DATA_ERROR)
file = BytesIO(file_stream) file = BytesIO(file_stream)
# Use send_file with a proper filename and MIME type # Use send_file with a proper filename and MIME type
return send_file( return await send_file(
file, file,
as_attachment=True, as_attachment=True,
download_name=doc[0].name, attachment_filename=doc[0].name,
mimetype="application/octet-stream", # Set a default MIME type mimetype="application/octet-stream", # Set a default MIME type
) )
@@ -587,7 +592,7 @@ def list_docs(dataset_id, tenant_id):
@manager.route("/datasets/<dataset_id>/documents", methods=["DELETE"]) # noqa: F821 @manager.route("/datasets/<dataset_id>/documents", methods=["DELETE"]) # noqa: F821
@token_required @token_required
def delete(tenant_id, dataset_id): async def delete(tenant_id, dataset_id):
""" """
Delete documents from a dataset. Delete documents from a dataset.
--- ---
@@ -626,7 +631,7 @@ def delete(tenant_id, dataset_id):
""" """
if not KnowledgebaseService.accessible(kb_id=dataset_id, user_id=tenant_id): if not KnowledgebaseService.accessible(kb_id=dataset_id, user_id=tenant_id):
return get_error_data_result(message=f"You don't own the dataset {dataset_id}. ") return get_error_data_result(message=f"You don't own the dataset {dataset_id}. ")
req = request.json req = await request_json()
if not req: if not req:
doc_ids = None doc_ids = None
else: else:
@@ -672,7 +677,7 @@ def delete(tenant_id, dataset_id):
) )
File2DocumentService.delete_by_document_id(doc_id) File2DocumentService.delete_by_document_id(doc_id)
STORAGE_IMPL.rm(b, n) settings.STORAGE_IMPL.rm(b, n)
success_count += 1 success_count += 1
except Exception as e: except Exception as e:
errors += str(e) errors += str(e)
@@ -697,7 +702,7 @@ def delete(tenant_id, dataset_id):
@manager.route("/datasets/<dataset_id>/chunks", methods=["POST"]) # noqa: F821 @manager.route("/datasets/<dataset_id>/chunks", methods=["POST"]) # noqa: F821
@token_required @token_required
def parse(tenant_id, dataset_id): async def parse(tenant_id, dataset_id):
""" """
Start parsing documents into chunks. Start parsing documents into chunks.
--- ---
@@ -736,7 +741,7 @@ def parse(tenant_id, dataset_id):
""" """
if not KnowledgebaseService.accessible(kb_id=dataset_id, user_id=tenant_id): if not KnowledgebaseService.accessible(kb_id=dataset_id, user_id=tenant_id):
return get_error_data_result(message=f"You don't own the dataset {dataset_id}.") return get_error_data_result(message=f"You don't own the dataset {dataset_id}.")
req = request.json req = await request_json()
if not req.get("document_ids"): if not req.get("document_ids"):
return get_error_data_result("`document_ids` is required") return get_error_data_result("`document_ids` is required")
doc_list = req.get("document_ids") doc_list = req.get("document_ids")
@@ -756,7 +761,7 @@ def parse(tenant_id, dataset_id):
return get_error_data_result("Can't parse document that is currently being processed") return get_error_data_result("Can't parse document that is currently being processed")
info = {"run": "1", "progress": 0, "progress_msg": "", "chunk_num": 0, "token_num": 0} info = {"run": "1", "progress": 0, "progress_msg": "", "chunk_num": 0, "token_num": 0}
DocumentService.update_by_id(id, info) DocumentService.update_by_id(id, info)
globals.docStoreConn.delete({"doc_id": id}, search.index_name(tenant_id), dataset_id) settings.docStoreConn.delete({"doc_id": id}, search.index_name(tenant_id), dataset_id)
TaskService.filter_delete([Task.doc_id == id]) TaskService.filter_delete([Task.doc_id == id])
e, doc = DocumentService.get_by_id(id) e, doc = DocumentService.get_by_id(id)
doc = doc.to_dict() doc = doc.to_dict()
@@ -780,7 +785,7 @@ def parse(tenant_id, dataset_id):
@manager.route("/datasets/<dataset_id>/chunks", methods=["DELETE"]) # noqa: F821 @manager.route("/datasets/<dataset_id>/chunks", methods=["DELETE"]) # noqa: F821
@token_required @token_required
def stop_parsing(tenant_id, dataset_id): async def stop_parsing(tenant_id, dataset_id):
""" """
Stop parsing documents into chunks. Stop parsing documents into chunks.
--- ---
@@ -819,7 +824,7 @@ def stop_parsing(tenant_id, dataset_id):
""" """
if not KnowledgebaseService.accessible(kb_id=dataset_id, user_id=tenant_id): if not KnowledgebaseService.accessible(kb_id=dataset_id, user_id=tenant_id):
return get_error_data_result(message=f"You don't own the dataset {dataset_id}.") return get_error_data_result(message=f"You don't own the dataset {dataset_id}.")
req = request.json req = await request_json()
if not req.get("document_ids"): if not req.get("document_ids"):
return get_error_data_result("`document_ids` is required") return get_error_data_result("`document_ids` is required")
@@ -836,7 +841,7 @@ def stop_parsing(tenant_id, dataset_id):
return get_error_data_result("Can't stop parsing document with progress at 0 or 1") return get_error_data_result("Can't stop parsing document with progress at 0 or 1")
info = {"run": "2", "progress": 0, "chunk_num": 0} info = {"run": "2", "progress": 0, "chunk_num": 0}
DocumentService.update_by_id(id, info) DocumentService.update_by_id(id, info)
globals.docStoreConn.delete({"doc_id": doc[0].id}, search.index_name(tenant_id), dataset_id) settings.docStoreConn.delete({"doc_id": doc[0].id}, search.index_name(tenant_id), dataset_id)
success_count += 1 success_count += 1
if duplicate_messages: if duplicate_messages:
if success_count > 0: if success_count > 0:
@@ -969,7 +974,7 @@ def list_chunks(tenant_id, dataset_id, document_id):
res = {"total": 0, "chunks": [], "doc": renamed_doc} res = {"total": 0, "chunks": [], "doc": renamed_doc}
if req.get("id"): if req.get("id"):
chunk = globals.docStoreConn.get(req.get("id"), search.index_name(tenant_id), [dataset_id]) chunk = settings.docStoreConn.get(req.get("id"), search.index_name(tenant_id), [dataset_id])
if not chunk: if not chunk:
return get_result(message=f"Chunk not found: {dataset_id}/{req.get('id')}", code=RetCode.NOT_FOUND) return get_result(message=f"Chunk not found: {dataset_id}/{req.get('id')}", code=RetCode.NOT_FOUND)
k = [] k = []
@@ -996,8 +1001,8 @@ def list_chunks(tenant_id, dataset_id, document_id):
res["chunks"].append(final_chunk) res["chunks"].append(final_chunk)
_ = Chunk(**final_chunk) _ = Chunk(**final_chunk)
elif globals.docStoreConn.indexExist(search.index_name(tenant_id), dataset_id): elif settings.docStoreConn.indexExist(search.index_name(tenant_id), dataset_id):
sres = globals.retriever.search(query, search.index_name(tenant_id), [dataset_id], emb_mdl=None, highlight=True) sres = settings.retriever.search(query, search.index_name(tenant_id), [dataset_id], emb_mdl=None, highlight=True)
res["total"] = sres.total res["total"] = sres.total
for id in sres.ids: for id in sres.ids:
d = { d = {
@@ -1021,7 +1026,7 @@ def list_chunks(tenant_id, dataset_id, document_id):
"/datasets/<dataset_id>/documents/<document_id>/chunks", methods=["POST"] "/datasets/<dataset_id>/documents/<document_id>/chunks", methods=["POST"]
) )
@token_required @token_required
def add_chunk(tenant_id, dataset_id, document_id): async def add_chunk(tenant_id, dataset_id, document_id):
""" """
Add a chunk to a document. Add a chunk to a document.
--- ---
@@ -1091,7 +1096,7 @@ def add_chunk(tenant_id, dataset_id, document_id):
if not doc: if not doc:
return get_error_data_result(message=f"You don't own the document {document_id}.") return get_error_data_result(message=f"You don't own the document {document_id}.")
doc = doc[0] doc = doc[0]
req = request.json req = await request_json()
if not str(req.get("content", "")).strip(): if not str(req.get("content", "")).strip():
return get_error_data_result(message="`content` is required") return get_error_data_result(message="`content` is required")
if "important_keywords" in req: if "important_keywords" in req:
@@ -1121,7 +1126,7 @@ def add_chunk(tenant_id, dataset_id, document_id):
v, c = embd_mdl.encode([doc.name, req["content"] if not d["question_kwd"] else "\n".join(d["question_kwd"])]) v, c = embd_mdl.encode([doc.name, req["content"] if not d["question_kwd"] else "\n".join(d["question_kwd"])])
v = 0.1 * v[0] + 0.9 * v[1] v = 0.1 * v[0] + 0.9 * v[1]
d["q_%d_vec" % len(v)] = v.tolist() d["q_%d_vec" % len(v)] = v.tolist()
globals.docStoreConn.insert([d], search.index_name(tenant_id), dataset_id) settings.docStoreConn.insert([d], search.index_name(tenant_id), dataset_id)
DocumentService.increment_chunk_num(doc.id, doc.kb_id, c, 1, 0) DocumentService.increment_chunk_num(doc.id, doc.kb_id, c, 1, 0)
# rename keys # rename keys
@@ -1150,7 +1155,7 @@ def add_chunk(tenant_id, dataset_id, document_id):
"datasets/<dataset_id>/documents/<document_id>/chunks", methods=["DELETE"] "datasets/<dataset_id>/documents/<document_id>/chunks", methods=["DELETE"]
) )
@token_required @token_required
def rm_chunk(tenant_id, dataset_id, document_id): async def rm_chunk(tenant_id, dataset_id, document_id):
""" """
Remove chunks from a document. Remove chunks from a document.
--- ---
@@ -1197,12 +1202,12 @@ def rm_chunk(tenant_id, dataset_id, document_id):
docs = DocumentService.get_by_ids([document_id]) docs = DocumentService.get_by_ids([document_id])
if not docs: if not docs:
raise LookupError(f"Can't find the document with ID {document_id}!") raise LookupError(f"Can't find the document with ID {document_id}!")
req = request.json req = await request_json()
condition = {"doc_id": document_id} condition = {"doc_id": document_id}
if "chunk_ids" in req: if "chunk_ids" in req:
unique_chunk_ids, duplicate_messages = check_duplicate_ids(req["chunk_ids"], "chunk") unique_chunk_ids, duplicate_messages = check_duplicate_ids(req["chunk_ids"], "chunk")
condition["id"] = unique_chunk_ids condition["id"] = unique_chunk_ids
chunk_number = globals.docStoreConn.delete(condition, search.index_name(tenant_id), dataset_id) chunk_number = settings.docStoreConn.delete(condition, search.index_name(tenant_id), dataset_id)
if chunk_number != 0: if chunk_number != 0:
DocumentService.decrement_chunk_num(document_id, dataset_id, 1, chunk_number, 0) DocumentService.decrement_chunk_num(document_id, dataset_id, 1, chunk_number, 0)
if "chunk_ids" in req and chunk_number != len(unique_chunk_ids): if "chunk_ids" in req and chunk_number != len(unique_chunk_ids):
@@ -1221,7 +1226,7 @@ def rm_chunk(tenant_id, dataset_id, document_id):
"/datasets/<dataset_id>/documents/<document_id>/chunks/<chunk_id>", methods=["PUT"] "/datasets/<dataset_id>/documents/<document_id>/chunks/<chunk_id>", methods=["PUT"]
) )
@token_required @token_required
def update_chunk(tenant_id, dataset_id, document_id, chunk_id): async def update_chunk(tenant_id, dataset_id, document_id, chunk_id):
""" """
Update a chunk within a document. Update a chunk within a document.
--- ---
@@ -1274,7 +1279,7 @@ def update_chunk(tenant_id, dataset_id, document_id, chunk_id):
schema: schema:
type: object type: object
""" """
chunk = globals.docStoreConn.get(chunk_id, search.index_name(tenant_id), [dataset_id]) chunk = settings.docStoreConn.get(chunk_id, search.index_name(tenant_id), [dataset_id])
if chunk is None: if chunk is None:
return get_error_data_result(f"Can't find this chunk {chunk_id}") return get_error_data_result(f"Can't find this chunk {chunk_id}")
if not KnowledgebaseService.accessible(kb_id=dataset_id, user_id=tenant_id): if not KnowledgebaseService.accessible(kb_id=dataset_id, user_id=tenant_id):
@@ -1283,7 +1288,7 @@ def update_chunk(tenant_id, dataset_id, document_id, chunk_id):
if not doc: if not doc:
return get_error_data_result(message=f"You don't own the document {document_id}.") return get_error_data_result(message=f"You don't own the document {document_id}.")
doc = doc[0] doc = doc[0]
req = request.json req = await request_json()
if "content" in req: if "content" in req:
content = req["content"] content = req["content"]
else: else:
@@ -1319,13 +1324,13 @@ def update_chunk(tenant_id, dataset_id, document_id, chunk_id):
v, c = embd_mdl.encode([doc.name, d["content_with_weight"] if not d.get("question_kwd") else "\n".join(d["question_kwd"])]) v, c = embd_mdl.encode([doc.name, d["content_with_weight"] if not d.get("question_kwd") else "\n".join(d["question_kwd"])])
v = 0.1 * v[0] + 0.9 * v[1] if doc.parser_id != ParserType.QA else v[1] v = 0.1 * v[0] + 0.9 * v[1] if doc.parser_id != ParserType.QA else v[1]
d["q_%d_vec" % len(v)] = v.tolist() d["q_%d_vec" % len(v)] = v.tolist()
globals.docStoreConn.update({"id": chunk_id}, d, search.index_name(tenant_id), dataset_id) settings.docStoreConn.update({"id": chunk_id}, d, search.index_name(tenant_id), dataset_id)
return get_result() return get_result()
@manager.route("/retrieval", methods=["POST"]) # noqa: F821 @manager.route("/retrieval", methods=["POST"]) # noqa: F821
@token_required @token_required
def retrieval_test(tenant_id): async def retrieval_test(tenant_id):
""" """
Retrieve chunks based on a query. Retrieve chunks based on a query.
--- ---
@ -1406,7 +1411,7 @@ def retrieval_test(tenant_id):
format: float format: float
description: Similarity score. description: Similarity score.
""" """
req = request.json req = await request_json()
if not req.get("dataset_ids"): if not req.get("dataset_ids"):
return get_error_data_result("`dataset_ids` is required.") return get_error_data_result("`dataset_ids` is required.")
kb_ids = req["dataset_ids"] kb_ids = req["dataset_ids"]
@ -1465,7 +1470,7 @@ def retrieval_test(tenant_id):
chat_mdl = LLMBundle(kb.tenant_id, LLMType.CHAT) chat_mdl = LLMBundle(kb.tenant_id, LLMType.CHAT)
question += keyword_extraction(chat_mdl, question) question += keyword_extraction(chat_mdl, question)
ranks = globals.retriever.retrieval( ranks = settings.retriever.retrieval(
question, question,
embd_mdl, embd_mdl,
tenant_ids, tenant_ids,
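The pattern repeated through these hunks is the Flask-to-Quart migration: handlers become `async def` and the request body is awaited. The `request_json()` helper used by the SDK routes is project code not shown in this diff; a minimal sketch of the same idea using Quart's built-in `request.get_json()`:

```python
# Minimal Quart handler sketch (app and /echo route are illustrative).
# In Quart the body is read asynchronously, so handlers are `async def`
# and the JSON payload must be awaited, unlike Flask's request.json.
from quart import Quart, request, jsonify

app = Quart(__name__)

@app.route("/echo", methods=["POST"])
async def echo():
    req = await request.get_json()  # coroutine in Quart
    return jsonify({"you_sent": req})

if __name__ == "__main__":
    app.run()
```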


@@ -17,9 +17,7 @@
 import pathlib
 import re

-import flask
-from flask import request
+from quart import request, make_response
 from pathlib import Path

 from api.db.services.document_service import DocumentService
@@ -32,12 +30,12 @@ from api.db.services import duplicate_name
 from api.db.services.file_service import FileService
 from api.utils.api_utils import get_json_result
 from api.utils.file_utils import filename_type
-from rag.utils.storage_factory import STORAGE_IMPL
+from common import settings


 @manager.route('/file/upload', methods=['POST'])  # noqa: F821
 @token_required
-def upload(tenant_id):
+async def upload(tenant_id):
     """
     Upload a file to the system.
     ---
@@ -79,15 +77,17 @@ def upload(tenant_id):
             type: string
             description: File type (e.g., document, folder)
     """
-    pf_id = request.form.get("parent_id")
+    form = await request.form
+    files = await request.files
+    pf_id = form.get("parent_id")
     if not pf_id:
         root_folder = FileService.get_root_folder(tenant_id)
         pf_id = root_folder["id"]

-    if 'file' not in request.files:
+    if 'file' not in files:
         return get_json_result(data=False, message='No file part!', code=400)

-    file_objs = request.files.getlist('file')
+    file_objs = files.getlist('file')
     for file_obj in file_objs:
         if file_obj.filename == '':
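Form data and uploads follow the same rule: in Quart, `request.form` and `request.files` are awaitables rather than plain attributes. A rough, self-contained sketch of the upload pattern (route and field names here are illustrative, not the repo's):

```python
# Sketch of async multipart handling in Quart (assumed /upload route).
from quart import Quart, request, jsonify

app = Quart(__name__)

@app.route("/upload", methods=["POST"])
async def upload():
    form = await request.form    # awaited MultiDict of form fields
    files = await request.files  # awaited MultiDict of FileStorage objects
    if "file" not in files:
        return jsonify({"error": "No file part!"}), 400
    saved = []
    for file_obj in files.getlist("file"):
        blob = file_obj.read()   # file contents as bytes
        saved.append({"name": file_obj.filename, "size": len(blob)})
    return jsonify({"parent_id": form.get("parent_id"), "files": saved})
```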
@@ -126,7 +126,7 @@ def upload(tenant_id):
                 filetype = filename_type(file_obj_names[file_len - 1])
                 location = file_obj_names[file_len - 1]
-                while STORAGE_IMPL.obj_exist(last_folder.id, location):
+                while settings.STORAGE_IMPL.obj_exist(last_folder.id, location):
                     location += "_"
                 blob = file_obj.read()
                 filename = duplicate_name(FileService.query, name=file_obj_names[file_len - 1], parent_id=last_folder.id)
@@ -142,7 +142,7 @@ def upload(tenant_id):
                     "size": len(blob),
                 }
                 file = FileService.insert(file)
-                STORAGE_IMPL.put(last_folder.id, location, blob)
+                settings.STORAGE_IMPL.put(last_folder.id, location, blob)
                 file_res.append(file.to_json())
         return get_json_result(data=file_res)
     except Exception as e:
@@ -151,7 +151,7 @@ def upload(tenant_id):
 @manager.route('/file/create', methods=['POST'])  # noqa: F821
 @token_required
-def create(tenant_id):
+async def create(tenant_id):
     """
     Create a new file or folder.
     ---
@@ -193,9 +193,9 @@ def create(tenant_id):
           type:
             type: string
     """
-    req = request.json
-    pf_id = request.json.get("parent_id")
-    input_file_type = request.json.get("type")
+    req = await request.json
+    pf_id = req.get("parent_id")
+    input_file_type = req.get("type")
     if not pf_id:
         root_folder = FileService.get_root_folder(tenant_id)
         pf_id = root_folder["id"]
@@ -450,7 +450,7 @@ def get_all_parent_folders(tenant_id):
 @manager.route('/file/rm', methods=['POST'])  # noqa: F821
 @token_required
-def rm(tenant_id):
+async def rm(tenant_id):
     """
     Delete one or multiple files/folders.
     ---
@@ -481,7 +481,7 @@ def rm(tenant_id):
           type: boolean
           example: true
     """
-    req = request.json
+    req = await request.json
     file_ids = req["file_ids"]
     try:
         for file_id in file_ids:
@@ -497,10 +497,10 @@ def rm(tenant_id):
                     e, file = FileService.get_by_id(inner_file_id)
                     if not e:
                         return get_json_result(message="File not found!", code=404)
-                    STORAGE_IMPL.rm(file.parent_id, file.location)
+                    settings.STORAGE_IMPL.rm(file.parent_id, file.location)
                 FileService.delete_folder_by_pf_id(tenant_id, file_id)
             else:
-                STORAGE_IMPL.rm(file.parent_id, file.location)
+                settings.STORAGE_IMPL.rm(file.parent_id, file.location)
                 if not FileService.delete(file):
                     return get_json_result(message="Database error (File removal)!", code=500)
@@ -524,7 +524,7 @@ def rm(tenant_id):
 @manager.route('/file/rename', methods=['POST'])  # noqa: F821
 @token_required
-def rename(tenant_id):
+async def rename(tenant_id):
     """
     Rename a file.
     ---
@@ -556,7 +556,7 @@ def rename(tenant_id):
           type: boolean
           example: true
     """
-    req = request.json
+    req = await request.json
     try:
         e, file = FileService.get_by_id(req["file_id"])
         if not e:
@@ -585,7 +585,7 @@ def rename(tenant_id):
 @manager.route('/file/get/<file_id>', methods=['GET'])  # noqa: F821
 @token_required
-def get(tenant_id, file_id):
+async def get(tenant_id, file_id):
     """
     Download a file.
     ---
@@ -614,12 +614,12 @@ def get(tenant_id, file_id):
         if not e:
             return get_json_result(message="Document not found!", code=404)

-        blob = STORAGE_IMPL.get(file.parent_id, file.location)
+        blob = settings.STORAGE_IMPL.get(file.parent_id, file.location)
         if not blob:
             b, n = File2DocumentService.get_storage_address(file_id=file_id)
-            blob = STORAGE_IMPL.get(b, n)
+            blob = settings.STORAGE_IMPL.get(b, n)
-        response = flask.make_response(blob)
+        response = await make_response(blob)
         ext = re.search(r"\.([^.]+)$", file.name)
         if ext:
             if file.type == FileType.VISUAL.value:
@@ -633,7 +633,7 @@ def get(tenant_id, file_id):
 @manager.route('/file/mv', methods=['POST'])  # noqa: F821
 @token_required
-def move(tenant_id):
+async def move(tenant_id):
     """
     Move one or multiple files to another folder.
     ---
@@ -667,7 +667,7 @@ def move(tenant_id):
           type: boolean
           example: true
     """
-    req = request.json
+    req = await request.json
     try:
         file_ids = req["src_file_ids"]
         parent_id = req["dest_file_id"]
@@ -693,8 +693,8 @@ def move(tenant_id):
 @manager.route('/file/convert', methods=['POST'])  # noqa: F821
 @token_required
-def convert(tenant_id):
-    req = request.json
+async def convert(tenant_id):
+    req = await request.json
     kb_ids = req["kb_ids"]
     file_ids = req["file_ids"]
     file2documents = []
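Alongside the async changes, these hunks re-home module-level singletons (`STORAGE_IMPL` from `rag.utils.storage_factory`, `globals.docStoreConn`, `globals.retriever`) onto `common.settings`. The settings module's internals are not shown in this diff, so the following is only an illustrative sketch of the call-site effect:

```python
# Illustrative settings module: one place owns the service handles that
# used to be importable singletons; call sites go through settings.*.
class _StorageStub:
    """Stand-in for a real storage backend (MinIO/S3/etc.)."""
    def obj_exist(self, bucket: str, name: str) -> bool:
        return False
    def put(self, bucket: str, name: str, blob: bytes) -> None:
        print(f"put {bucket}/{name}: {len(blob)} bytes")

STORAGE_IMPL = _StorageStub()  # real code would pick a backend from config

# Call site, mirroring the diff:
#   from common import settings
#   settings.STORAGE_IMPL.put(folder_id, location, blob)
```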


@@ -18,10 +18,9 @@ import re
 import time

 import tiktoken
-from flask import Response, jsonify, request
+from quart import Response, jsonify, request

 from agent.canvas import Canvas
-from api import settings
 from api.db.db_models import APIToken
 from api.db.services.api_service import API4ConversationService
 from api.db.services.canvas_service import UserCanvasService, completion_openai
@@ -41,12 +40,12 @@ from rag.app.tag import label_question
 from rag.prompts.template import load_prompt
 from rag.prompts.generator import cross_languages, gen_meta_filter, keyword_extraction, chunks_format
 from common.constants import RetCode, LLMType, StatusEnum
-from common import globals
+from common import settings


 @manager.route("/chats/<chat_id>/sessions", methods=["POST"])  # noqa: F821
 @token_required
-def create(tenant_id, chat_id):
-    req = request.json
+async def create(tenant_id, chat_id):
+    req = await request.json
     req["dialog_id"] = chat_id
     dia = DialogService.query(tenant_id=tenant_id, id=req["dialog_id"], status=StatusEnum.VALID.value)
     if not dia:
@@ -98,8 +97,8 @@ def create_agent_session(tenant_id, agent_id):
 @manager.route("/chats/<chat_id>/sessions/<session_id>", methods=["PUT"])  # noqa: F821
 @token_required
-def update(tenant_id, chat_id, session_id):
-    req = request.json
+async def update(tenant_id, chat_id, session_id):
+    req = await request.json
     req["dialog_id"] = chat_id
     conv_id = session_id
     conv = ConversationService.query(id=conv_id, dialog_id=chat_id)
@@ -120,8 +119,8 @@ def update(tenant_id, chat_id, session_id):
 @manager.route("/chats/<chat_id>/completions", methods=["POST"])  # noqa: F821
 @token_required
-def chat_completion(tenant_id, chat_id):
-    req = request.json
+async def chat_completion(tenant_id, chat_id):
+    req = await request.json
     if not req:
         req = {"question": ""}
     if not req.get("session_id"):
@@ -150,7 +149,7 @@ def chat_completion(tenant_id, chat_id):
 @manager.route("/chats_openai/<chat_id>/chat/completions", methods=["POST"])  # noqa: F821
 @validate_request("model", "messages")  # noqa: F821
 @token_required
-def chat_completion_openai_like(tenant_id, chat_id):
+async def chat_completion_openai_like(tenant_id, chat_id):
     """
     OpenAI-like chat completion API that simulates the behavior of OpenAI's completions endpoint.
@@ -207,7 +206,7 @@ def chat_completion_openai_like(tenant_id, chat_id):
         if reference:
             print(completion.choices[0].message.reference)
     """
-    req = request.get_json()
+    req = await request.get_json()
     need_reference = bool(req.get("reference", False))
@@ -384,8 +383,8 @@ def chat_completion_openai_like(tenant_id, chat_id):
 @manager.route("/agents_openai/<agent_id>/chat/completions", methods=["POST"])  # noqa: F821
 @validate_request("model", "messages")  # noqa: F821
 @token_required
-def agents_completion_openai_compatibility(tenant_id, agent_id):
-    req = request.json
+async def agents_completion_openai_compatibility(tenant_id, agent_id):
+    req = await request.json
     tiktokenenc = tiktoken.get_encoding("cl100k_base")
     messages = req.get("messages", [])
     if not messages:
@@ -444,8 +443,8 @@ def agents_completion_openai_compatibility(tenant_id, agent_id):
 @manager.route("/agents/<agent_id>/completions", methods=["POST"])  # noqa: F821
 @token_required
-def agent_completions(tenant_id, agent_id):
-    req = request.json
+async def agent_completions(tenant_id, agent_id):
+    req = await request.json
     if req.get("stream", True):
@@ -611,13 +610,13 @@ def list_agent_session(tenant_id, agent_id):
 @manager.route("/chats/<chat_id>/sessions", methods=["DELETE"])  # noqa: F821
 @token_required
-def delete(tenant_id, chat_id):
+async def delete(tenant_id, chat_id):
     if not DialogService.query(id=chat_id, tenant_id=tenant_id, status=StatusEnum.VALID.value):
         return get_error_data_result(message="You don't own the chat")
     errors = []
     success_count = 0
-    req = request.json
+    req = await request.json
     convs = ConversationService.query(dialog_id=chat_id)
     if not req:
         ids = None
@@ -662,10 +661,10 @@ def delete(tenant_id, chat_id):
 @manager.route("/agents/<agent_id>/sessions", methods=["DELETE"])  # noqa: F821
 @token_required
-def delete_agent_session(tenant_id, agent_id):
+async def delete_agent_session(tenant_id, agent_id):
     errors = []
     success_count = 0
-    req = request.json
+    req = await request.json
     cvs = UserCanvasService.query(user_id=tenant_id, id=agent_id)
     if not cvs:
         return get_error_data_result(f"You don't own the agent {agent_id}")
@@ -717,8 +716,8 @@ def delete_agent_session(tenant_id, agent_id):
 @manager.route("/sessions/ask", methods=["POST"])  # noqa: F821
 @token_required
-def ask_about(tenant_id):
-    req = request.json
+async def ask_about(tenant_id):
+    req = await request.json
     if not req.get("question"):
         return get_error_data_result("`question` is required.")
     if not req.get("dataset_ids"):
@@ -756,8 +755,8 @@ def ask_about(tenant_id):
 @manager.route("/sessions/related_questions", methods=["POST"])  # noqa: F821
 @token_required
-def related_questions(tenant_id):
-    req = request.json
+async def related_questions(tenant_id):
+    req = await request.json
     if not req.get("question"):
         return get_error_data_result("`question` is required.")
     question = req["question"]
@@ -807,8 +806,8 @@ Related search terms:
 @manager.route("/chatbots/<dialog_id>/completions", methods=["POST"])  # noqa: F821
-def chatbot_completions(dialog_id):
-    req = request.json
+async def chatbot_completions(dialog_id):
+    req = await request.json

     token = request.headers.get("Authorization").split()
     if len(token) != 2:
@@ -857,8 +856,8 @@ def chatbots_inputs(dialog_id):
 @manager.route("/agentbots/<agent_id>/completions", methods=["POST"])  # noqa: F821
-def agent_bot_completions(agent_id):
-    req = request.json
+async def agent_bot_completions(agent_id):
+    req = await request.json

     token = request.headers.get("Authorization").split()
     if len(token) != 2:
@@ -902,7 +901,7 @@ def begin_inputs(agent_id):
 @manager.route("/searchbots/ask", methods=["POST"])  # noqa: F821
 @validate_request("question", "kb_ids")
-def ask_about_embedded():
+async def ask_about_embedded():
     token = request.headers.get("Authorization").split()
     if len(token) != 2:
         return get_error_data_result(message='Authorization is not valid!"')
@@ -911,7 +910,7 @@ def ask_about_embedded():
     if not objs:
         return get_error_data_result(message='Authentication error: API key is invalid!"')

-    req = request.json
+    req = await request.json
     uid = objs[0].tenant_id

     search_id = req.get("search_id", "")
@@ -941,7 +940,7 @@ def ask_about_embedded():
 @manager.route("/searchbots/retrieval_test", methods=["POST"])  # noqa: F821
 @validate_request("kb_id", "question")
-def retrieval_test_embedded():
+async def retrieval_test_embedded():
     token = request.headers.get("Authorization").split()
     if len(token) != 2:
         return get_error_data_result(message='Authorization is not valid!"')
@@ -950,7 +949,7 @@ def retrieval_test_embedded():
     if not objs:
         return get_error_data_result(message='Authentication error: API key is invalid!"')

-    req = request.json
+    req = await request.json
     page = int(req.get("page", 1))
     size = int(req.get("size", 30))
     question = req["question"]
@@ -1016,7 +1015,7 @@ def retrieval_test_embedded():
             question += keyword_extraction(chat_mdl, question)

         labels = label_question(question, [kb])
-        ranks = globals.retriever.retrieval(
+        ranks = settings.retriever.retrieval(
             question, embd_mdl, tenant_ids, kb_ids, page, size, similarity_threshold, vector_similarity_weight, top,
             doc_ids, rerank_mdl=rerank_mdl, highlight=req.get("highlight"), rank_feature=labels
         )
@@ -1040,7 +1039,7 @@ def retrieval_test_embedded():
 @manager.route("/searchbots/related_questions", methods=["POST"])  # noqa: F821
 @validate_request("question")
-def related_questions_embedded():
+async def related_questions_embedded():
     token = request.headers.get("Authorization").split()
     if len(token) != 2:
         return get_error_data_result(message='Authorization is not valid!"')
@@ -1049,7 +1048,7 @@ def related_questions_embedded():
     if not objs:
         return get_error_data_result(message='Authentication error: API key is invalid!"')

-    req = request.json
+    req = await request.json
     tenant_id = objs[0].tenant_id
     if not tenant_id:
         return get_error_data_result(message="permission denined.")
@@ -1116,7 +1115,7 @@ def detail_share_embedded():
 @manager.route("/searchbots/mindmap", methods=["POST"])  # noqa: F821
 @validate_request("question", "kb_ids")
-def mindmap():
+async def mindmap():
     token = request.headers.get("Authorization").split()
     if len(token) != 2:
         return get_error_data_result(message='Authorization is not valid!"')
@@ -1126,7 +1125,7 @@ def mindmap():
         return get_error_data_result(message='Authentication error: API key is invalid!"')
     tenant_id = objs[0].tenant_id

-    req = request.json
+    req = await request.json
     search_id = req.get("search_id", "")
     search_app = SearchService.get_detail(search_id) if search_id else {}
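The embedded "searchbot" and "chatbot" endpoints stay unauthenticated at the decorator level and instead split the `Authorization` header themselves, then look the second part up as an API key. A minimal sketch of just the header check (the `APIToken` lookup is stubbed out; the explicit "bearer" comparison is an addition of this sketch, not in the repo's code):

```python
# Sketch of the two-part Authorization header check used above:
# "Authorization: Bearer <api-key>" must split into exactly two tokens.
from typing import Optional

def extract_api_key(auth_header: Optional[str]) -> Optional[str]:
    if not auth_header:
        return None
    token = auth_header.split()
    if len(token) != 2 or token[0].lower() != "bearer":
        return None  # mirrors the "Authorization is not valid!" branch
    return token[1]  # would then be validated against stored APITokens

assert extract_api_key("Bearer abc123") == "abc123"
assert extract_api_key("abc123") is None
```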


@@ -14,8 +14,8 @@
 # limitations under the License.
 #
-from flask import request
-from flask_login import current_user, login_required
+from quart import request
+from api.apps import current_user, login_required

 from api.constants import DATASET_NAME_LIMIT
 from api.db.db_models import DB
@@ -30,8 +30,8 @@ from api.utils.api_utils import get_data_error_result, get_json_result, not_allo
 @manager.route("/create", methods=["post"])  # noqa: F821
 @login_required
 @validate_request("name")
-def create():
-    req = request.get_json()
+async def create():
+    req = await request.get_json()
     search_name = req["name"]
     description = req.get("description", "")
     if not isinstance(search_name, str):
@@ -65,8 +65,8 @@ def create():
 @login_required
 @validate_request("search_id", "name", "search_config", "tenant_id")
 @not_allowed_parameters("id", "created_by", "create_time", "update_time", "create_date", "update_date", "created_by")
-def update():
-    req = request.get_json()
+async def update():
+    req = await request.get_json()
     if not isinstance(req["name"], str):
         return get_data_error_result(message="Search name must be string.")
     if req["name"].strip() == "":
@@ -140,7 +140,7 @@ def detail():
 @manager.route("/list", methods=["POST"])  # noqa: F821
 @login_required
-def list_search_app():
+async def list_search_app():
     keywords = request.args.get("keywords", "")
     page_number = int(request.args.get("page", 0))
     items_per_page = int(request.args.get("page_size", 0))
@@ -150,7 +150,7 @@ def list_search_app():
     else:
         desc = True

-    req = request.get_json()
+    req = await request.get_json()
     owner_ids = req.get("owner_ids", [])
     try:
         if not owner_ids:
@@ -173,8 +173,8 @@ def list_search_app():
 @manager.route("/rm", methods=["post"])  # noqa: F821
 @login_required
 @validate_request("search_id")
-def rm():
-    req = request.get_json()
+async def rm():
+    req = await request.get_json()
     search_id = req["search_id"]
     if not SearchService.accessible4deletion(search_id, current_user.id):
         return get_json_result(data=False, message="No authorization.", code=RetCode.AUTHENTICATION_ERROR)
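Note that `login_required` and `current_user` now come from `api.apps` rather than `flask_login`, which suggests the app package re-exports Quart-compatible equivalents. The repo's actual `api/apps/__init__.py` is not in this diff, so the following shim is only a plausible sketch based on the public `quart-auth` API:

```python
# Hypothetical api/apps-style shim: re-export quart-auth primitives so
# route modules can keep writing `from api.apps import login_required,
# current_user`. Names from quart-auth's documented API, not this repo.
from quart import Quart
from quart_auth import (
    QuartAuth,
    current_user,    # proxy for the authenticated user
    login_required,  # decorator guarding async routes
    login_user,
    logout_user,
)

def create_app() -> Quart:
    app = Quart(__name__)
    app.secret_key = "change-me"  # required for signed auth cookies
    QuartAuth(app)                # wires session-based auth into the app
    return app
```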


@@ -17,28 +17,26 @@ import logging
 from datetime import datetime
 import json

-from flask_login import login_required, current_user
+from api.apps import login_required, current_user

 from api.db.db_models import APIToken
 from api.db.services.api_service import APITokenService
 from api.db.services.knowledgebase_service import KnowledgebaseService
 from api.db.services.user_service import UserTenantService
-from api import settings
 from api.utils.api_utils import (
     get_json_result,
     get_data_error_result,
     server_error_response,
     generate_confirmation_token,
 )
-from api.versions import get_ragflow_version
+from common.versions import get_ragflow_version
 from common.time_utils import current_timestamp, datetime_format
-from rag.utils.storage_factory import STORAGE_IMPL, STORAGE_IMPL_TYPE
 from timeit import default_timer as timer
 from rag.utils.redis_conn import REDIS_CONN
-from flask import jsonify
+from quart import jsonify
 from api.utils.health_utils import run_health_checks
-from common import globals
+from common import settings


 @manager.route("/version", methods=["GET"])  # noqa: F821
@@ -101,7 +99,7 @@ def status():
     res = {}
     st = timer()
     try:
-        res["doc_engine"] = globals.docStoreConn.health()
+        res["doc_engine"] = settings.docStoreConn.health()
         res["doc_engine"]["elapsed"] = "{:.1f}".format((timer() - st) * 1000.0)
     except Exception as e:
         res["doc_engine"] = {
@@ -113,15 +111,15 @@ def status():
     st = timer()
     try:
-        STORAGE_IMPL.health()
+        settings.STORAGE_IMPL.health()
         res["storage"] = {
-            "storage": STORAGE_IMPL_TYPE.lower(),
+            "storage": settings.STORAGE_IMPL_TYPE.lower(),
             "status": "green",
             "elapsed": "{:.1f}".format((timer() - st) * 1000.0),
         }
     except Exception as e:
         res["storage"] = {
-            "storage": STORAGE_IMPL_TYPE.lower(),
+            "storage": settings.STORAGE_IMPL_TYPE.lower(),
             "status": "red",
             "elapsed": "{:.1f}".format((timer() - st) * 1000.0),
             "error": str(e),


@@ -14,11 +14,7 @@
 # limitations under the License.
 #
-from flask import request
-from flask_login import login_required, current_user
-from api import settings
-from api.apps import smtp_mail_server
+from quart import request

 from api.db import UserTenantRole
 from api.db.db_models import UserTenant
 from api.db.services.user_service import UserTenantService, UserService
@@ -28,6 +24,8 @@ from common.misc_utils import get_uuid
 from common.time_utils import delta_seconds
 from api.utils.api_utils import get_json_result, validate_request, server_error_response, get_data_error_result
 from api.utils.web_utils import send_invite_email
+from common import settings
+from api.apps import smtp_mail_server, login_required, current_user


 @manager.route("/<tenant_id>/user/list", methods=["GET"])  # noqa: F821
@@ -51,14 +49,14 @@ def user_list(tenant_id):
 @manager.route('/<tenant_id>/user', methods=['POST'])  # noqa: F821
 @login_required
 @validate_request("email")
-def create(tenant_id):
+async def create(tenant_id):
     if current_user.id != tenant_id:
         return get_json_result(
             data=False,
             message='No authorization.',
             code=RetCode.AUTHENTICATION_ERROR)

-    req = request.json
+    req = await request.json
     invite_user_email = req["email"]
     invite_users = UserService.query(email=invite_user_email)
     if not invite_users:


@@ -22,11 +22,9 @@
 import time
 from datetime import datetime

-from flask import redirect, request, session, make_response
-from flask_login import current_user, login_required, login_user, logout_user
+from quart import redirect, request, session, make_response
 from werkzeug.security import check_password_hash, generate_password_hash

-from api import settings
 from api.apps.auth import get_auth_client
 from api.db import FileType, UserTenantRole
 from api.db.db_models import TenantLLM
@@ -46,7 +44,7 @@ from api.utils.api_utils import (
 )
 from api.utils.crypt import decrypt
 from rag.utils.redis_conn import REDIS_CONN
-from api.apps import smtp_mail_server
+from api.apps import smtp_mail_server, login_required, current_user, login_user, logout_user
 from api.utils.web_utils import (
     send_email_html,
     OTP_LENGTH,
@@ -58,11 +56,11 @@ from api.utils.web_utils import (
     hash_code,
     captcha_key,
 )
-from common import globals
+from common import settings


 @manager.route("/login", methods=["POST", "GET"])  # noqa: F821
-def login():
+async def login():
     """
     User login endpoint.
     ---
@@ -92,10 +90,11 @@ def login():
         schema:
           type: object
     """
-    if not request.json:
+    json_body = await request.json
+    if not json_body:
         return get_json_result(data=False, code=RetCode.AUTHENTICATION_ERROR, message="Unauthorized!")

-    email = request.json.get("email", "")
+    email = json_body.get("email", "")
     users = UserService.query(email=email)
     if not users:
         return get_json_result(
@@ -104,7 +103,7 @@ def login():
             message=f"Email: {email} is not registered!",
         )

-    password = request.json.get("password")
+    password = json_body.get("password")
     try:
         password = decrypt(password)
     except BaseException:
@@ -126,7 +125,8 @@ def login():
         user.update_date = (datetime_format(datetime.now()),)
         user.save()
         msg = "Welcome back!"
-        return construct_response(data=response_data, auth=user.get_id(), message=msg)
+
+        return await construct_response(data=response_data, auth=user.get_id(), message=msg)
     else:
         return get_json_result(
             data=False,
@@ -502,7 +502,7 @@ def log_out():
 @manager.route("/setting", methods=["POST"])  # noqa: F821
 @login_required
-def setting_user():
+async def setting_user():
     """
     Update user settings.
     ---
@@ -531,7 +531,7 @@ def setting_user():
           type: object
     """
     update_dict = {}
-    request_data = request.json
+    request_data = await request.json
     if request_data.get("password"):
         new_password = request_data.get("new_password")
         if not check_password_hash(current_user.password, decrypt(request_data["password"])):
@@ -624,7 +624,7 @@ def user_register(user_id, user):
         "id": user_id,
         "name": user["nickname"] + "'s Kingdom",
         "llm_id": settings.CHAT_MDL,
-        "embd_id": globals.EMBEDDING_MDL,
+        "embd_id": settings.EMBEDDING_MDL,
         "asr_id": settings.ASR_MDL,
         "parser_ids": settings.PARSERS,
         "img2txt_id": settings.IMAGE2TEXT_MDL,
@@ -661,7 +661,7 @@ def user_register(user_id, user):
 @manager.route("/register", methods=["POST"])  # noqa: F821
 @validate_request("nickname", "email", "password")
-def user_add():
+async def user_add():
     """
     Register a new user.
     ---
@@ -698,7 +698,7 @@ def user_add():
             code=RetCode.OPERATING_ERROR,
         )

-    req = request.json
+    req = await request.json
     email_address = req["email"]

     # Validate the email address
@@ -738,7 +738,7 @@ def user_add():
             raise Exception(f"Same email: {email_address} exists!")
         user = users[0]
         login_user(user)
-        return construct_response(
+        return await construct_response(
             data=user.to_json(),
             auth=user.get_id(),
             message=f"{nickname}, welcome aboard!",
@@ -794,7 +794,7 @@ def tenant_info():
 @manager.route("/set_tenant_info", methods=["POST"])  # noqa: F821
 @login_required
 @validate_request("tenant_id", "asr_id", "embd_id", "img2txt_id", "llm_id")
-def set_tenant_info():
+async def set_tenant_info():
     """
     Update tenant information.
     ---
@@ -831,7 +831,7 @@ def set_tenant_info():
         schema:
           type: object
     """
-    req = request.json
+    req = await request.json
     try:
         tid = req.pop("tenant_id")
         TenantService.update_by_id(tid, req)
@@ -841,7 +841,7 @@ def set_tenant_info():
 @manager.route("/forget/captcha", methods=["GET"])  # noqa: F821
-def forget_get_captcha():
+async def forget_get_captcha():
     """
     GET /forget/captcha?email=<email>
     - Generate an image captcha and cache it in Redis under key captcha:{email} with TTL = OTP_TTL_SECONDS.
@@ -863,19 +863,19 @@ def forget_get_captcha():
     from captcha.image import ImageCaptcha

     image = ImageCaptcha(width=300, height=120, font_sizes=[50, 60, 70])
     img_bytes = image.generate(captcha_text).read()
-    response = make_response(img_bytes)
+    response = await make_response(img_bytes)
     response.headers.set("Content-Type", "image/JPEG")
     return response


 @manager.route("/forget/otp", methods=["POST"])  # noqa: F821
-def forget_send_otp():
+async def forget_send_otp():
     """
     POST /forget/otp
     - Verify the image captcha stored at captcha:{email} (case-insensitive).
     - On success, generate an email OTP (A–Z, length = OTP_LENGTH), store hash + salt (and timestamp) in Redis with TTL, reset attempts and cooldown, and send the OTP via email.
     """
-    req = request.get_json()
+    req = await request.get_json()
     email = req.get("email") or ""
     captcha = (req.get("captcha") or "").strip()
@@ -936,12 +936,12 @@ def forget_send_otp():
 @manager.route("/forget", methods=["POST"])  # noqa: F821
-def forget():
+async def forget():
     """
     POST: Verify email + OTP and reset password, then log the user in.
     Request JSON: { email, otp, new_password, confirm_new_password }
     """
-    req = request.get_json()
+    req = await request.get_json()
     email = req.get("email") or ""
     otp = (req.get("otp") or "").strip()
     new_pwd = req.get("new_password")
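Quart's `make_response` is a coroutine, which is why the captcha handler above gains an `await` when building its binary response. A small sketch of that pattern with a placeholder payload in place of the real captcha bytes:

```python
# Sketch: returning raw bytes with an explicit Content-Type in Quart.
# The payload is a placeholder; the real handler uses ImageCaptcha bytes.
from quart import Quart, make_response

app = Quart(__name__)

@app.route("/captcha")
async def captcha():
    img_bytes = b"\x89PNG..."                  # placeholder image payload
    response = await make_response(img_bytes)  # awaited in Quart
    response.headers.set("Content-Type", "image/png")
    return response
```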


@@ -70,4 +70,7 @@ class PipelineTaskType(StrEnum):
 VALID_PIPELINE_TASK_TYPES = {PipelineTaskType.PARSE, PipelineTaskType.DOWNLOAD, PipelineTaskType.RAPTOR, PipelineTaskType.GRAPH_RAG, PipelineTaskType.MINDMAP}

+PIPELINE_SPECIAL_PROGRESS_FREEZE_TASK_TYPES = {PipelineTaskType.RAPTOR.lower(), PipelineTaskType.GRAPH_RAG.lower(), PipelineTaskType.MINDMAP.lower()}
+
 KNOWLEDGEBASE_FOLDER_NAME=".knowledgebase"


@@ -25,13 +25,13 @@ from datetime import datetime, timezone
 from enum import Enum
 from functools import wraps

-from flask_login import UserMixin
+from quart_auth import AuthUser
 from itsdangerous.url_safe import URLSafeTimedSerializer as Serializer
 from peewee import InterfaceError, OperationalError, BigIntegerField, BooleanField, CharField, CompositeKey, DateTimeField, Field, FloatField, IntegerField, Metadata, Model, TextField
 from playhouse.migrate import MySQLMigrator, PostgresqlMigrator, migrate
 from playhouse.pool import PooledMySQLDatabase, PooledPostgresqlDatabase

-from api import settings, utils
+from api import utils
 from api.db import SerializedType
 from api.utils.json_encode import json_dumps, json_loads
 from api.utils.configs import deserialize_b64, serialize_b64
@@ -39,6 +39,7 @@ from api.utils.configs import deserialize_b64, serialize_b64
 from common.time_utils import current_timestamp, timestamp_to_date, date_string_to_timestamp
 from common.decorator import singleton
 from common.constants import ParserType
+from common import settings

 CONTINUOUS_FIELD_TYPE = {IntegerField, FloatField, DateTimeField}
@@ -304,6 +305,7 @@ class RetryingPooledMySQLDatabase(PooledMySQLDatabase):
                     time.sleep(self.retry_delay * (2 ** attempt))
                 else:
                     raise
+        return None


 class RetryingPooledPostgresqlDatabase(PooledPostgresqlDatabase):
@@ -593,7 +595,7 @@ def fill_db_model_object(model_object, human_model_dict):
     return model_object


-class User(DataBaseModel, UserMixin):
+class User(DataBaseModel, AuthUser):
     id = CharField(max_length=32, primary_key=True)
     access_token = CharField(max_length=255, null=True, index=True)
     nickname = CharField(max_length=100, null=False, help_text="nicky name", index=True)
@@ -668,6 +670,7 @@ class LLMFactories(DataBaseModel):
     name = CharField(max_length=128, null=False, help_text="LLM factory name", primary_key=True)
     logo = TextField(null=True, help_text="llm logo base64")
     tags = CharField(max_length=255, null=False, help_text="LLM, Text Embedding, Image2Text, ASR", index=True)
+    rank = IntegerField(default=0, index=False)
     status = CharField(max_length=1, null=True, help_text="is it validate(0: wasted, 1: validate)", default="1", index=True)

     def __str__(self):
@@ -770,7 +773,7 @@ class Document(DataBaseModel):
     thumbnail = TextField(null=True, help_text="thumbnail base64 string")
     kb_id = CharField(max_length=256, null=False, index=True)
     parser_id = CharField(max_length=32, null=False, help_text="default parser ID", index=True)
-    pipeline_id = CharField(max_length=32, null=True, help_text="pipleline ID", index=True)
+    pipeline_id = CharField(max_length=32, null=True, help_text="pipeline ID", index=True)
     parser_config = JSONField(null=False, default={"pages": [[1, 1000000]]})
     source_type = CharField(max_length=128, null=False, default="local", help_text="where dose this document come from", index=True)
     type = CharField(max_length=32, null=False, help_text="file extension", index=True)
@@ -874,7 +877,7 @@ class Dialog(DataBaseModel):
 class Conversation(DataBaseModel):
     id = CharField(max_length=32, primary_key=True)
     dialog_id = CharField(max_length=32, null=False, index=True)
-    name = CharField(max_length=255, null=True, help_text="converastion name", index=True)
+    name = CharField(max_length=255, null=True, help_text="conversation name", index=True)
     message = JSONField(null=True)
     reference = JSONField(null=True, default=[])
     user_id = CharField(max_length=255, null=True, help_text="user_id", index=True)
@@ -1063,6 +1066,7 @@ class Connector2Kb(DataBaseModel):
     id = CharField(max_length=32, primary_key=True)
     connector_id = CharField(max_length=32, null=False, index=True)
     kb_id = CharField(max_length=32, null=False, index=True)
+    auto_parse = CharField(max_length=1, null=False, default="1", index=False)

     class Meta:
         db_table = "connector2kb"
@@ -1281,4 +1285,12 @@ def migrate_db():
         migrate(migrator.add_column("tenant_llm", "status", CharField(max_length=1, null=False, help_text="is it validate(0: wasted, 1: validate)", default="1", index=True)))
     except Exception:
         pass
+    try:
+        migrate(migrator.add_column("connector2kb", "auto_parse", CharField(max_length=1, null=False, default="1", index=False)))
+    except Exception:
+        pass
+    try:
+        migrate(migrator.add_column("llm_factories", "rank", IntegerField(default=0, index=False)))
+    except Exception:
+        pass
     logging.disable(logging.NOTSET)
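The `migrate_db()` additions follow the codebase's idempotent-migration idiom: each `add_column` is attempted in its own `try/except` so reruns on an already-migrated schema are harmless. A standalone sketch of the idiom (SQLite is used here so it runs anywhere; the repo uses `MySQLMigrator`/`PostgresqlMigrator` with the same `add_column` call shape, and the toy table/columns below are illustrative):

```python
# Self-contained sketch of try/except add_column migrations with peewee.
from peewee import SqliteDatabase, Model, CharField, IntegerField
from playhouse.migrate import SqliteMigrator, migrate

db = SqliteDatabase(":memory:")

class Connector2Kb(Model):
    kb_id = CharField(max_length=32)
    class Meta:
        database = db
        table_name = "connector2kb"

db.create_tables([Connector2Kb])
migrator = SqliteMigrator(db)

for table, column, field in [
    ("connector2kb", "auto_parse", CharField(max_length=1, default="1")),
    ("connector2kb", "rank", IntegerField(default=0)),
]:
    try:
        migrate(migrator.add_column(table, column, field))
    except Exception:
        pass  # column already exists; migration is safe to rerun
```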


@@ -29,10 +29,9 @@ from api.db.services.knowledgebase_service import KnowledgebaseService
 from api.db.services.tenant_llm_service import LLMFactoriesService, TenantLLMService
 from api.db.services.llm_service import LLMService, LLMBundle, get_init_tenant_llm
 from api.db.services.user_service import TenantService, UserTenantService
-from api import settings
 from common.constants import LLMType
 from common.file_utils import get_project_base_directory
-from common import globals
+from common import settings
 from api.common.base64 import encode_to_base64
@@ -50,7 +49,7 @@ def init_superuser():
         "id": user_info["id"],
         "name": user_info["nickname"] + "'s Kingdom",
         "llm_id": settings.CHAT_MDL,
-        "embd_id": globals.EMBEDDING_MDL,
+        "embd_id": settings.EMBEDDING_MDL,
         "asr_id": settings.ASR_MDL,
         "parser_ids": settings.PARSERS,
         "img2txt_id": settings.IMAGE2TEXT_MDL
@@ -90,13 +89,7 @@ def init_superuser():

 def init_llm_factory():
-    try:
-        LLMService.filter_delete([(LLM.fid == "MiniMax" or LLM.fid == "Minimax")])
-        LLMService.filter_delete([(LLM.fid == "cohere")])
-        LLMFactoriesService.filter_delete([LLMFactories.name == "cohere"])
-    except Exception:
-        pass
+    LLMFactoriesService.filter_delete([1 == 1])
     factory_llm_infos = settings.FACTORY_LLM_INFOS
     for factory_llm_info in factory_llm_infos:
         info = deepcopy(factory_llm_info)


@@ -16,7 +16,6 @@
 import logging
 import uuid

-from api import settings
 from api.utils.api_utils import group_by
 from api.db import FileType, UserTenantRole
 from api.db.services.api_service import APITokenService, API4ConversationService
@@ -35,10 +34,9 @@ from api.db.services.task_service import TaskService
 from api.db.services.tenant_llm_service import TenantLLMService
 from api.db.services.user_canvas_version import UserCanvasVersionService
 from api.db.services.user_service import TenantService, UserService, UserTenantService
-from rag.utils.storage_factory import STORAGE_IMPL
 from rag.nlp import search
 from common.constants import ActiveEnum
-from common import globals
+from common import settings


 def create_new_user(user_info: dict) -> dict:
     """
@@ -64,7 +62,7 @@ def create_new_user(user_info: dict) -> dict:
         "id": user_id,
         "name": user_info["nickname"] + "'s Kingdom",
         "llm_id": settings.CHAT_MDL,
-        "embd_id": globals.EMBEDDING_MDL,
+        "embd_id": settings.EMBEDDING_MDL,
         "asr_id": settings.ASR_MDL,
         "parser_ids": settings.PARSERS,
         "img2txt_id": settings.IMAGE2TEXT_MDL,
@@ -159,8 +157,8 @@ def delete_user_data(user_id: str) -> dict:
             if kb_ids:
                 # step1.1.1 delete files in storage, remove bucket
                 for kb_id in kb_ids:
-                    if STORAGE_IMPL.bucket_exists(kb_id):
-                        STORAGE_IMPL.remove_bucket(kb_id)
+                    if settings.STORAGE_IMPL.bucket_exists(kb_id):
+                        settings.STORAGE_IMPL.remove_bucket(kb_id)
                 done_msg += f"- Removed {len(kb_ids)} dataset's buckets.\n"
                 # step1.1.2 delete file and document info in db
                 doc_ids = DocumentService.get_all_doc_ids_by_kb_ids(kb_ids)
@@ -180,7 +178,7 @@ def delete_user_data(user_id: str) -> dict:
                 )
                 done_msg += f"- Deleted {file2doc_delete_res} document-file relation records.\n"
                 # step1.1.3 delete chunk in es
-                r = globals.docStoreConn.delete({"kb_id": kb_ids},
+                r = settings.docStoreConn.delete({"kb_id": kb_ids},
                                                 search.index_name(tenant_id), kb_ids)
                 done_msg += f"- Deleted {r} chunk records.\n"
                 kb_delete_res = KnowledgebaseService.delete_by_ids(kb_ids)
@@ -219,7 +217,7 @@ def delete_user_data(user_id: str) -> dict:
             if created_files:
                 # step2.1.1.1 delete file in storage
                 for f in created_files:
-                    STORAGE_IMPL.rm(f.parent_id, f.location)
+                    settings.STORAGE_IMPL.rm(f.parent_id, f.location)
                 done_msg += f"- Deleted {len(created_files)} uploaded file.\n"
                 # step2.1.1.2 delete file record
                 file_delete_res = FileService.delete_by_ids([f.id for f in created_files])
@@ -238,7 +236,7 @@ def delete_user_data(user_id: str) -> dict:
         kb_doc_info = {}
         for _tenant_id, kb_doc in kb_grouped_doc.items():
             for _kb_id, docs in kb_doc.items():
-                chunk_delete_res += globals.docStoreConn.delete(
+                chunk_delete_res += settings.docStoreConn.delete(
                     {"doc_id": [d["id"] for d in docs]},
                     search.index_name(_tenant_id), _kb_id
                 )


@@ -13,7 +13,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
-from api.versions import get_ragflow_version
+from common.versions import get_ragflow_version
 from .reload_config_base import ReloadConfigBase


@@ -67,6 +67,7 @@ class UserCanvasService(CommonService):
         # will get all permitted agents, be cautious
         fields = [
             cls.model.id,
+            cls.model.avatar,
             cls.model.title,
             cls.model.permission,
             cls.model.canvas_type,


@@ -90,7 +90,7 @@ class CommonService:
         else:
             query_records = cls.model.select()
         if reverse is not None:
-            if not order_by or not hasattr(cls, order_by):
+            if not order_by or not hasattr(cls.model, order_by):
                 order_by = "create_time"
             if reverse is True:
                 query_records = query_records.order_by(cls.model.getter_by(order_by).desc())
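This one-token fix matters because the sort column lives on the peewee model, not on the service wrapper: probing `cls` instead of `cls.model` made the `create_time` fallback fire for perfectly valid column names. A minimal sketch of the corrected fallback with a toy model (`getattr` stands in for the repo's `getter_by` helper):

```python
# Sketch: the order_by attribute must be checked on the peewee model
# class, since that is where the column descriptors live.
from peewee import SqliteDatabase, Model, CharField, BigIntegerField

db = SqliteDatabase(":memory:")

class Doc(Model):
    name = CharField()
    create_time = BigIntegerField(default=0)
    class Meta:
        database = db

class DocService:  # service wrapper, mirroring CommonService's shape
    model = Doc

    @classmethod
    def ordered(cls, order_by: str):
        if not order_by or not hasattr(cls.model, order_by):
            order_by = "create_time"  # fallback for unknown columns
        return cls.model.select().order_by(getattr(cls.model, order_by).desc())

db.create_tables([Doc])
print(DocService.ordered("name"))   # orders by name
print(DocService.ordered("bogus"))  # falls back to create_time
```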


@ -15,6 +15,7 @@
# #
import logging import logging
from datetime import datetime from datetime import datetime
from typing import Tuple, List
from anthropic import BaseModel from anthropic import BaseModel
from peewee import SQL, fn from peewee import SQL, fn
@ -23,7 +24,6 @@ from api.db import InputType
from api.db.db_models import Connector, SyncLogs, Connector2Kb, Knowledgebase from api.db.db_models import Connector, SyncLogs, Connector2Kb, Knowledgebase
from api.db.services.common_service import CommonService from api.db.services.common_service import CommonService
from api.db.services.document_service import DocumentService from api.db.services.document_service import DocumentService
from api.db.services.file_service import FileService
from common.misc_utils import get_uuid from common.misc_utils import get_uuid
from common.constants import TaskStatus from common.constants import TaskStatus
from common.time_utils import current_timestamp, timestamp_to_date from common.time_utils import current_timestamp, timestamp_to_date
@@ -39,17 +39,20 @@ class ConnectorService(CommonService):
         if not task:
             if status == TaskStatus.SCHEDULE:
                 SyncLogsService.schedule(connector_id, c2k.kb_id)
+            ConnectorService.update_by_id(connector_id, {"status": status})
+            return

         if task.status == TaskStatus.DONE:
             if status == TaskStatus.SCHEDULE:
                 SyncLogsService.schedule(connector_id, c2k.kb_id, task.poll_range_end, total_docs_indexed=task.total_docs_indexed)
+            ConnectorService.update_by_id(connector_id, {"status": status})
+            return

         task = task.to_dict()
         task["status"] = status
         SyncLogsService.update_by_id(task["id"], task)
         ConnectorService.update_by_id(connector_id, {"status": status})

     @classmethod
     def list(cls, tenant_id):
         fields = [
@@ -62,26 +65,43 @@ class ConnectorService(CommonService):
             cls.model.tenant_id == tenant_id
         ).dicts())

+    @classmethod
+    def rebuild(cls, kb_id:str, connector_id: str, tenant_id:str):
+        from api.db.services.file_service import FileService
+        e, conn = cls.get_by_id(connector_id)
+        if not e:
+            return None
+        SyncLogsService.filter_delete([SyncLogs.connector_id==connector_id, SyncLogs.kb_id==kb_id])
+        docs = DocumentService.query(source_type=f"{conn.source}/{conn.id}", kb_id=kb_id)
+        err = FileService.delete_docs([d.id for d in docs], tenant_id)
+        SyncLogsService.schedule(connector_id, kb_id, reindex=True)
+        return err
+

 class SyncLogsService(CommonService):
     model = SyncLogs

     @classmethod
-    def list_sync_tasks(cls, connector_id=None, page_number=None, items_per_page=15):
+    def list_sync_tasks(cls, connector_id=None, page_number=None, items_per_page=15) -> Tuple[List[dict], int]:
         fields = [
             cls.model.id,
             cls.model.connector_id,
             cls.model.kb_id,
+            cls.model.update_date,
             cls.model.poll_range_start,
             cls.model.poll_range_end,
             cls.model.new_docs_indexed,
+            cls.model.total_docs_indexed,
             cls.model.error_msg,
+            cls.model.full_exception_trace,
             cls.model.error_count,
             Connector.name,
             Connector.source,
             Connector.tenant_id,
             Connector.timeout_secs,
             Knowledgebase.name.alias("kb_name"),
+            Knowledgebase.avatar.alias("kb_avatar"),
+            Connector2Kb.auto_parse,
             cls.model.from_beginning.alias("reindex"),
             cls.model.status
         ]
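The new `rebuild()` resets a single connector/knowledge-base pair: it drops the pair's sync logs, deletes the documents that were ingested under the `{conn.source}/{conn.id}` source type, and schedules a from-scratch sync. Note the `FileService` import moved inside the method, presumably to break the import cycle a top-level import would create. A hedged usage sketch follows; the module path and the IDs are placeholders, not taken from the repository:

```python
# Hedged usage sketch for the new rebuild(); the import path is assumed from
# the class names in this diff, and the IDs are placeholders.
from api.db.services.connector_service import ConnectorService  # assumed path

err = ConnectorService.rebuild(kb_id="kb_0", connector_id="conn_0", tenant_id="tenant_0")
if err is None:
    print("connector not found")                     # get_by_id() came back empty
elif err:
    print("some documents failed to delete:", err)   # propagated from delete_docs
```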
@@ -105,10 +125,11 @@ class SyncLogsService(CommonService):
             )
         query = query.distinct().order_by(cls.model.update_time.desc())
+        total = query.count()
         if page_number:
             query = query.paginate(page_number, items_per_page)
-        return list(query.dicts())
+        return list(query.dicts()), total

     @classmethod
     def start(cls, id, connector_id):
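`list_sync_tasks` now counts the full result set before `paginate()` is applied and returns a `(rows, total)` tuple, which is exactly what a paged UI needs to render page controls. A minimal consumption sketch, with placeholder values and an assumed import path:

```python
# Minimal sketch of consuming the new (rows, total) return shape.
from api.db.services.connector_service import SyncLogsService  # assumed path

rows, total = SyncLogsService.list_sync_tasks(connector_id="conn_0",
                                              page_number=1, items_per_page=15)
pages = (total + 14) // 15  # ceiling division for the page count
print(f"showing {len(rows)} of {total} sync logs across {pages} page(s)")
```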
@@ -122,13 +143,21 @@ class SyncLogsService(CommonService):
     @classmethod
     def schedule(cls, connector_id, kb_id, poll_range_start=None, reindex=False, total_docs_indexed=0):
+        try:
+            if cls.model.select().where(cls.model.kb_id == kb_id, cls.model.connector_id == connector_id).count() > 100:
+                rm_ids = [m.id for m in cls.model.select(cls.model.id).where(cls.model.kb_id == kb_id, cls.model.connector_id == connector_id).order_by(cls.model.update_time.asc()).limit(70)]
+                deleted = cls.model.delete().where(cls.model.id.in_(rm_ids)).execute()
+                logging.info(f"[SyncLogService] Cleaned {deleted} old logs.")
+        except Exception as e:
+            logging.exception(e)
+
         try:
             e = cls.query(kb_id=kb_id, connector_id=connector_id, status=TaskStatus.SCHEDULE)
             if e:
                 logging.warning(f"{kb_id}--{connector_id} has already had a scheduling sync task which is abnormal.")
                 return None
             reindex = "1" if reindex else "0"
-            ConnectorService.update_by_id(connector_id, {"status": TaskStatus.SCHEDUL})
+            ConnectorService.update_by_id(connector_id, {"status": TaskStatus.SCHEDULE})
             return cls.save(**{
                 "id": get_uuid(),
                 "kb_id": kb_id, "status": TaskStatus.SCHEDULE, "connector_id": connector_id,
@@ -145,7 +174,7 @@ class SyncLogsService(CommonService):
                 full_exception_trace=cls.model.full_exception_trace + str(e)
             ) \
             .where(cls.model.id == task.id).execute()
-        ConnectorService.update_by_id(connector_id, {"status": TaskStatus.SCHEDUL})
+        ConnectorService.update_by_id(connector_id, {"status": TaskStatus.SCHEDULE})

     @classmethod
     def increase_docs(cls, id, min_update, max_update, doc_num, err_msg="", error_count=0):
@@ -161,7 +190,8 @@ class SyncLogsService(CommonService):
             .where(cls.model.id == id).execute()

     @classmethod
-    def duplicate_and_parse(cls, kb, docs, tenant_id, src):
+    def duplicate_and_parse(cls, kb, docs, tenant_id, src, auto_parse=True):
+        from api.db.services.file_service import FileService
         if not docs:
             return None
@@ -173,15 +203,17 @@
                 return self.blob

         errs = []
-        files = [FileObj(filename=d["semantic_identifier"]+f".{d['extension']}", blob=d["blob"]) for d in docs]
+        files = [FileObj(filename=d["semantic_identifier"]+(f"{d['extension']}" if d["semantic_identifier"][::-1].find(d['extension'][::-1])<0 else ""), blob=d["blob"]) for d in docs]
         doc_ids = []
         err, doc_blob_pairs = FileService.upload_document(kb, files, tenant_id, src)
         errs.extend(err)
-        if not err:
-            kb_table_num_map = {}
-            for doc, _ in doc_blob_pairs:
-                DocumentService.run(tenant_id, doc, kb_table_num_map)
-                doc_ids.append(doc["id"])
+        kb_table_num_map = {}
+        for doc, _ in doc_blob_pairs:
+            doc_ids.append(doc["id"])
+            if not auto_parse or auto_parse == "0":
+                continue
+            DocumentService.run(tenant_id, doc, kb_table_num_map)

         return errs, doc_ids
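Two behavioural changes land here. First, the filename no longer blindly appends the extension: the reversed-string `find` is an obfuscated containment test, so the extension is added only when it does not already occur in `semantic_identifier`, avoiding names like `report.pdf.pdf` (the new branch also drops the hard-coded dot, so `extension` is assumed to carry its own; and note it tests containment anywhere, not a strict suffix). Second, every uploaded document's ID is now collected, while `DocumentService.run` (parsing) is skipped when `auto_parse` is off. A readable equivalent of the filename check:

```python
# Readable equivalent of the reversed-string check in the diff:
# ident[::-1].find(ext[::-1]) < 0 is true exactly when ext is NOT a
# substring of ident. Assumes ext carries its leading dot.
def build_filename(ident: str, ext: str) -> str:
    return ident + ("" if ext in ident else ext)

assert build_filename("report.pdf", ".pdf") == "report.pdf"  # already present
assert build_filename("report", ".pdf") == "report.pdf"      # extension appended
```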
@@ -197,43 +229,21 @@ class Connector2KbService(CommonService):
     model = Connector2Kb

     @classmethod
-    def link_kb(cls, conn_id:str, kb_ids: list[str], tenant_id:str):
-        arr = cls.query(connector_id=conn_id)
-        old_kb_ids = [a.kb_id for a in arr]
-        for kb_id in kb_ids:
-            if kb_id in old_kb_ids:
-                continue
-            cls.save(**{
-                "id": get_uuid(),
-                "connector_id": conn_id,
-                "kb_id": kb_id
-            })
-            SyncLogsService.schedule(conn_id, kb_id, reindex=True)
-
-        errs = []
-        e, conn = ConnectorService.get_by_id(conn_id)
-        for kb_id in old_kb_ids:
-            if kb_id in kb_ids:
-                continue
-            cls.filter_delete([cls.model.kb_id==kb_id, cls.model.connector_id==conn_id])
-            SyncLogsService.filter_update([SyncLogs.connector_id==conn_id, SyncLogs.kb_id==kb_id, SyncLogs.status==TaskStatus.SCHEDULE], {"status": TaskStatus.CANCEL})
-            docs = DocumentService.query(source_type=f"{conn.source}/{conn.id}")
-            err = FileService.delete_docs([d.id for d in docs], tenant_id)
-            if err:
-                errs.append(err)
-        return "\n".join(errs)
-
-    @classmethod
-    def link_connectors(cls, kb_id:str, connector_ids: list[str], tenant_id:str):
+    def link_connectors(cls, kb_id:str, connectors: list[dict], tenant_id:str):
         arr = cls.query(kb_id=kb_id)
         old_conn_ids = [a.connector_id for a in arr]
-        for conn_id in connector_ids:
+        connector_ids = []
+        for conn in connectors:
+            conn_id = conn["id"]
+            connector_ids.append(conn_id)
             if conn_id in old_conn_ids:
+                cls.filter_update([cls.model.connector_id==conn_id, cls.model.kb_id==kb_id], {"auto_parse": conn.get("auto_parse", "1")})
                 continue
             cls.save(**{
                 "id": get_uuid(),
                 "connector_id": conn_id,
-                "kb_id": kb_id
+                "kb_id": kb_id,
+                "auto_parse": conn.get("auto_parse", "1")
             })
             SyncLogsService.schedule(conn_id, kb_id, reindex=True)
@@ -243,11 +253,15 @@
                 continue
             cls.filter_delete([cls.model.kb_id==kb_id, cls.model.connector_id==conn_id])
             e, conn = ConnectorService.get_by_id(conn_id)
-            SyncLogsService.filter_update([SyncLogs.connector_id==conn_id, SyncLogs.kb_id==kb_id, SyncLogs.status==TaskStatus.SCHEDULE], {"status": TaskStatus.CANCEL})
-            docs = DocumentService.query(source_type=f"{conn.source}/{conn.id}")
-            err = FileService.delete_docs([d.id for d in docs], tenant_id)
-            if err:
-                errs.append(err)
+            if not e:
+                continue
+            #SyncLogsService.filter_delete([SyncLogs.connector_id==conn_id, SyncLogs.kb_id==kb_id])
+            # Do not delete docs while unlinking.
+            SyncLogsService.filter_update([SyncLogs.connector_id==conn_id, SyncLogs.kb_id==kb_id, SyncLogs.status.in_([TaskStatus.SCHEDULE, TaskStatus.RUNNING])], {"status": TaskStatus.CANCEL})
+            #docs = DocumentService.query(source_type=f"{conn.source}/{conn.id}")
+            #err = FileService.delete_docs([d.id for d in docs], tenant_id)
+            #if err:
+            #    errs.append(err)
         return "\n".join(errs)

     @classmethod
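`link_connectors` replaces the old `link_kb`/`link_connectors` pair: it now receives dicts so each link carries its own `auto_parse` flag (defaulting to `"1"`, i.e. parse on sync), updates the flag in place for links that already exist, and, on unlink, only cancels scheduled or running sync tasks; previously synced documents are deliberately left in the knowledge base. A hedged sketch of the new call shape, with placeholder IDs and an assumed import path:

```python
# Hedged sketch of the new call shape; IDs are placeholders and the
# import path is assumed from the class name in this diff.
from api.db.services.connector_service import Connector2KbService  # assumed path

errs = Connector2KbService.link_connectors(
    kb_id="kb_0",
    connectors=[
        {"id": "conn_a"},                     # auto_parse defaults to "1"
        {"id": "conn_b", "auto_parse": "0"},  # sync only; parse manually later
    ],
    tenant_id="tenant_0",
)
# Any connector previously linked to kb_0 but absent from the list is unlinked,
# but its documents stay in the knowledge base.
```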
@@ -256,6 +270,7 @@ class Connector2KbService(CommonService):
             Connector.id,
             Connector.source,
             Connector.name,
+            cls.model.auto_parse,
             Connector.status
         ]
         return list(cls.model.select(*fields)\

@@ -265,3 +280,5 @@
             ).dicts()
         )


@@ -25,7 +25,6 @@ import trio
 from langfuse import Langfuse
 from peewee import fn
 from agentic_reasoning import DeepResearcher
-from api import settings
 from common.constants import LLMType, ParserType, StatusEnum
 from api.db.db_models import DB, Dialog
 from api.db.services.common_service import CommonService
@@ -44,7 +43,7 @@ from rag.prompts.generator import chunks_format, citation_prompt, cross_language
 from common.token_utils import num_tokens_from_string
 from rag.utils.tavily_conn import Tavily
 from common.string_utils import remove_redundant_spaces
-from common import globals
+from common import settings


 class DialogService(CommonService):
@@ -343,7 +342,7 @@ def chat(dialog, messages, stream=True, **kwargs):
     if not dialog.kb_ids and not dialog.prompt_config.get("tavily_api_key"):
         for ans in chat_solo(dialog, messages, stream):
             yield ans
-        return
+        return None

     chat_start_ts = timer()
@@ -373,7 +372,7 @@
         chat_mdl.bind_tools(toolcall_session, tools)
     bind_models_ts = timer()

-    retriever = globals.retriever
+    retriever = settings.retriever
     questions = [m["content"] for m in messages if m["role"] == "user"][-3:]
     attachments = kwargs["doc_ids"].split(",") if "doc_ids" in kwargs else []
     if "doc_ids" in messages[-1]:
@@ -387,7 +386,7 @@
             ans = use_sql(questions[-1], field_map, dialog.tenant_id, chat_mdl, prompt_config.get("quote", True), dialog.kb_ids)
             if ans:
                 yield ans
-                return
+                return None

     for p in prompt_config["parameters"]:
         if p["key"] == "knowledge":
@@ -618,9 +617,16 @@
             res["audio_binary"] = tts(tts_mdl, answer)
             yield res
+    return None
+

 def use_sql(question, field_map, tenant_id, chat_mdl, quota=True, kb_ids=None):
-    sys_prompt = "You are a Database Administrator. You need to check the fields of the following tables based on the user's list of questions and write the SQL corresponding to the last question."
+    sys_prompt = """
+You are a Database Administrator. You need to check the fields of the following tables based on the user's list of questions and write the SQL corresponding to the last question.
+Ensure that:
+1. Field names should not start with a digit. If any field name starts with a digit, use double quotes around it.
+2. Write only the SQL, no explanations or additional text.
+"""
     user_prompt = """
 Table name: {};
 Table of database fields are as follows:
@@ -641,6 +647,7 @@ Please write the SQL, only SQL, without any other explanations or text.
             sql = re.sub(r".*select ", "select ", sql.lower())
             sql = re.sub(r" +", " ", sql)
             sql = re.sub(r"([;]|```).*", "", sql)
+            sql = re.sub(r"&", "and", sql)
             if sql[: len("select ")] != "select ":
                 return None, None
             if not re.search(r"((sum|avg|max|min)\(|group by )", sql.lower()):
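The generated SQL is post-processed by a small regex pipeline, and the new rule additionally rewrites `&` to `and` so an occasional model-emitted `&` does not break execution. The pipeline as a self-contained sketch; the function name is ours, while the four substitutions and the `select ` guard mirror the diff:

```python
import re

# Self-contained sketch of the sanitising pipeline in get_table(): strip any
# preamble up to the last "select ", collapse runs of spaces, truncate at a
# ";" or code fence, and rewrite "&" as "and". Returns None if the result
# does not start with "select ", mirroring the guard in the diff.
def sanitize_sql(raw: str) -> str | None:
    sql = re.sub(r".*select ", "select ", raw.lower())
    sql = re.sub(r" +", " ", sql)
    sql = re.sub(r"([;]|```).*", "", sql)
    sql = re.sub(r"&", "and", sql)
    return sql if sql.startswith("select ") else None

print(sanitize_sql("Sure! SELECT a & b FROM t; -- explanation"))
# -> "select a and b from t"
```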
@@ -665,7 +672,7 @@ Please write the SQL, only SQL, without any other explanations or text.
             logging.debug(f"{question} get SQL(refined): {sql}")
             tried_times += 1
-        return globals.retriever.sql_retrieval(sql, format="json"), sql
+        return settings.retriever.sql_retrieval(sql, format="json"), sql

     tbl, sql = get_table()
     if tbl is None:
@@ -740,7 +747,7 @@ Please write the SQL, only SQL, without any other explanations or text.

 def tts(tts_mdl, text):
     if not tts_mdl or not text:
-        return
+        return None
     bin = b""
     for chunk in tts_mdl.tts(text):
         bin += chunk
@@ -759,7 +766,7 @@ def ask(question, kb_ids, tenant_id, chat_llm_name=None, search_config={}):
     embedding_list = list(set([kb.embd_id for kb in kbs]))

     is_knowledge_graph = all([kb.parser_id == ParserType.KG for kb in kbs])
-    retriever = globals.retriever if not is_knowledge_graph else settings.kg_retriever
+    retriever = settings.retriever if not is_knowledge_graph else settings.kg_retriever

     embd_mdl = LLMBundle(tenant_id, LLMType.EMBEDDING, embedding_list[0])
     chat_mdl = LLMBundle(tenant_id, LLMType.CHAT, chat_llm_name)
@@ -855,7 +862,7 @@ def gen_mindmap(question, kb_ids, tenant_id, search_config={}):
     if not doc_ids:
         doc_ids = None

-    ranks = globals.retriever.retrieval(
+    ranks = settings.retriever.retrieval(
         question=question,
         embd_mdl=embd_mdl,
         tenant_ids=tenant_ids,

Some files were not shown because too many files have changed in this diff.