### What problem does this PR solve?
Fix: Fixed the issue that the script text of the code operator is not
displayed after refreshing the page after saving the script text of the
code operator #4977
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Refactor the MessageForm with shadcn #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Add more robust fallbacks for citations
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
change default models to buildin models
https://github.com/infiniflow/ragflow/issues/7774
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
1. Add sandbox options for max memory and timeout.
2. Malicious code detection for Python only.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Add code_executor_manager. #4977.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Translate the begin operator #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Reconstruct the QueryTable of BeginForm using shandcn #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Synchronize BeginForm's query data to the canvas #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Co-authored-by: xiaohzho <xiaohzho@cisco.com>
### What problem does this PR solve?
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._
### Type of change
- [x] Documentation Update
### What problem does this PR solve?
Feat: Verify the parameters of the begin operator #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Refactor BeginForm with shadcn #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### What problem does this PR solve?
This small PR resolves the regex library warnings showing in Python3.11:
```python
DeprecationWarning: 'count' is passed as positional argument
```
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [x] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
### What problem does this PR solve?
https://github.com/infiniflow/ragflow/issues/7761
but it may be difficult to achieve 0 delay (which need to pass the
cancel token to all parts)
Another solution is just 0 delay effect at UI.
And task will stop latter
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Add return value widget to CodeForm #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Switching the programming language of the code operator will
switch the corresponding language template #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Fixed the issue where the page would refresh continuously when
opening the sheet on the right side of the canvas #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
delete useless image blobs when the task executor meets edge cases
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Render the agent list page by page #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Migrate the code operator to the new agent. #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: The image displayed in the reply message can also be clicked to
display the location of the source document where the slice is located
#7623
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
This PR introduces Pydantic-based validation for the list datasets HTTP
API, improving code clarity and robustness. Key changes include:
Pydantic Validation
Error Handling
Test Updates
Documentation Updates
### Type of change
- [x] Documentation Update
- [x] Refactoring
### What problem does this PR solve?
Add OAuth `state` parameter for CSRF protection:
- Updated `get_authorization_url()` to accept an optional state
parameter
- Generated a unique state value during OAuth login and stored in
session
- Verified state parameter in callback to ensure request legitimacy
This PR follows OAuth 2.0 security best practices by ensuring that the
authorization request originates from the same user who initiated the
flow.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix:When you create a new API module named xxxa_api, the access route
will become xxx instead of xxxa. For example, when I create a new API
module named 'data_api', the access route will become 'dat' instead of
'data'
Fix:Fixed the issue where the new knowledge base would not be renamed
when there was a knowledge base with the same name
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: tangyu <1@1.com>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
Feat: Modify the Python language template code of the code operator
#4977
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
More fallbacks for bad citation format
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
### What problem does this PR solve?
Fixed uncaptured figure data with position information. #7466, #7681
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
Try the best to repair corrupted PDF files on upload automatically.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
- Updated the dialog settings function to add a default prompt
configuration for no dataset.
- The prompt configuration will be determined based on the presence of
`kb_ids` in the request.
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (Non-breaking change, adding functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
---------
Co-authored-by: wenju.li <wenju.li@deepctr.cn>
### What problem does this PR solve?
## Cause of the bug:
During the execution process, due to improper use of trio
CapacityLimiter, the configuration parameter MAX_CONCURRENT_TASKS is
invalid, causing the executor to take out a large number of tasks from
the Redis queue at one time.
This behavior will cause the task executor to occupy too much memory and
be killed by the OS when a large number of tasks exist at the same time.
As a result, all executing tasks are suspended.
## Fix:
Added the task_manager method to the entry of /rag/svr/task_executor.py
to make CapacityLimiter effective. Deleted the invalid async with
statement.
## Fix result:
After testing, the task executor execution meets expectations, that is:
concurrent execution of up to $MAX_CONCURRENT_TASKS tasks.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### What problem does this PR solve?
Hello, when I input a very long line in the chat input box, it will fail
with following error:
```
2025-05-17 16:11:26,004 ERROR 182558 value too long for type character varying(255)
Traceback (most recent call last):
File "/var/home/sfc/Projects/ragflow/.venv/lib/python3.10/site-packages/peewee.py", line 3291, in execute_sql
cursor.execute(sql, params or ())
psycopg2.errors.StringDataRightTruncation: value too long for type character varying(255)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/var/home/sfc/Projects/ragflow/api/apps/conversation_app.py", line 68, in set_conversation
ConversationService.save(**conv)
File "/var/home/sfc/Projects/ragflow/.venv/lib/python3.10/site-packages/peewee.py", line 3128, in inner
return fn(*args, **kwargs)
File "/var/home/sfc/Projects/ragflow/api/db/services/common_service.py", line 145, in save
return cls.save_n(**kwargs)
File "/var/home/sfc/Projects/ragflow/api/db/services/common_service.py", line 139, in save_n
sample_obj = cls.model(**kwargs).save(force_insert=True)
File "/var/home/sfc/Projects/ragflow/.venv/lib/python3.10/site-packages/peewee.py", line 6923, in save
pk = self.insert(**field_dict).execute()
File "/var/home/sfc/Projects/ragflow/.venv/lib/python3.10/site-packages/peewee.py", line 2011, in inner
return method(self, database, *args, **kwargs)
File "/var/home/sfc/Projects/ragflow/.venv/lib/python3.10/site-packages/peewee.py", line 2082, in execute
return self._execute(database)
File "/var/home/sfc/Projects/ragflow/.venv/lib/python3.10/site-packages/peewee.py", line 2887, in _execute
return super(Insert, self)._execute(database)
File "/var/home/sfc/Projects/ragflow/.venv/lib/python3.10/site-packages/peewee.py", line 2598, in _execute
cursor = self.execute_returning(database)
File "/var/home/sfc/Projects/ragflow/.venv/lib/python3.10/site-packages/peewee.py", line 2605, in execute_returning
cursor = database.execute(self)
File "/var/home/sfc/Projects/ragflow/.venv/lib/python3.10/site-packages/peewee.py", line 3299, in execute
return self.execute_sql(sql, params)
File "/var/home/sfc/Projects/ragflow/.venv/lib/python3.10/site-packages/peewee.py", line 3289, in execute_sql
with __exception_wrapper__:
File "/var/home/sfc/Projects/ragflow/.venv/lib/python3.10/site-packages/peewee.py", line 3059, in __exit__
reraise(new_type, new_type(exc_value, *exc_args), traceback)
File "/var/home/sfc/Projects/ragflow/.venv/lib/python3.10/site-packages/peewee.py", line 192, in reraise
raise value.with_traceback(tb)
File "/var/home/sfc/Projects/ragflow/.venv/lib/python3.10/site-packages/peewee.py", line 3291, in execute_sql
cursor.execute(sql, params or ())
peewee.DataError: value too long for type character varying(255)
```
This PR fix it by truncate the `name` field in the `set_conversation`
method in the `conversation_app.py`.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### What problem does this PR solve?
Feat: Rendering recall test page #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix: Fixed the issue where message references could not be displayed
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Hello, our use case requires LLM agent to invoke some tools, so I made a
simple implementation here.
This PR does two things:
1. A simple plugin mechanism based on `pluginlib`:
This mechanism lives in the `plugin` directory. It will only load
plugins from `plugin/embedded_plugins` for now.
A sample plugin `bad_calculator.py` is placed in
`plugin/embedded_plugins/llm_tools`, it accepts two numbers `a` and `b`,
then give a wrong result `a + b + 100`.
In the future, it can load plugins from external location with little
code change.
Plugins are divided into different types. The only plugin type supported
in this PR is `llm_tools`, which must implement the `LLMToolPlugin`
class in the `plugin/llm_tool_plugin.py`.
More plugin types can be added in the future.
2. A tool selector in the `Generate` component:
Added a tool selector to select one or more tools for LLM:

And with the `bad_calculator` tool, it results this with the `qwen-max`
model:

### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
### What problem does this PR solve?
Improve oauth configuration documentation and examples.
- Related pull requests:
- #7379
- #7553
- #7587
- Related issues:
- #3495
### Type of change
- [x] Documentation Update
### What problem does this PR solve?
Fix: Fixed the issue where the height of the chat page shared externally
did not fill the window #7460
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Launch sandbox from docker-compose.
#4977
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
---------
Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
### What problem does this PR solve?
Close#7655
Based on the codes atthe api_app, I think the reference is one-to-one
with the message
`
def fillin_conv(ans):
nonlocal conv, message_id
if not conv.reference:
conv.reference.append(ans["reference"])
else:
conv.reference[-1] = ans["reference"]
conv.message[-1] = {"role": "assistant", "content": ans["answer"], "id":
message_id}
ans["id"] = message_id
`
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Add code agent component.
#4977
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
This PR introduces Pydantic-based validation for the delete dataset HTTP
API, improving code clarity and robustness. Key changes include:
1. Pydantic Validation
2. Error Handling
3. Test Updates
4. Documentation Updates
### Type of change
- [x] Documentation Update
- [x] Refactoring
### What problem does this PR solve?
Feat: Fixed the issue where the dataset configuration page kept
refreshing #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix: Use DOMPurify to filter out dangerous HTML #7668
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Add the JS code (or other) executor component to Agent. #4977
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Deprecate `/github_callback` route in favor of
`/oauth/callback/<channel>` for GitHub OAuth integration:
- Added GitHub OAuth support in the authentication module
- Introduced `GithubOAuthClient` with methods to fetch and normalize
user info
- Updated `CLIENT_TYPES` to include GitHub OAuth client
- Deprecated `/github_callback` route and suggested using the generic
`/oauth/callback/<channel>` route
---
- Related pull requests:
- #7379
- #7553
### Usage
- [Create a GitHub OAuth
App](https://github.com/settings/applications/new) to obtain the
`client_id` and `client_secret`, configure the authorization callback
url: `https://your-app.com/v1/user/oauth/callback/github`
- Edit `service_conf.yaml.template`:
```yaml
# ...
oauth:
github:
type: "github"
icon: "github"
display_name: "Github"
client_id: "your_client_id"
client_secret: "your_client_secret"
redirect_uri: "https://your-app.com/v1/user/oauth/callback/github"
# ...
```
### Type of change
- [x] Documentation Update
- [x] Refactoring (non-breaking change)
TOML-table-based project.license is deprecated as per PEP 639, see:
https://packaging.python.org/en/latest/guides/writing-pyproject-toml/#license-and-license-files
### What problem does this PR solve?
The following error when building project (e.g. `uv build`)
```
SetuptoolsDeprecationWarning: `project.license` as a TOML table is deprecated
!!
********************************************************************************
Please use a simple string containing a SPDX expression for `project.license`. You can also use `project.license-files`. (Both options available on setuptools>=77.0.0).
By 2026-Feb-18, you need to update your project and remove deprecated calls
or your builds will no longer be supported.
See https://packaging.python.org/en/latest/guides/writing-pyproject-toml/#license for details.
********************************************************************************
!!
```
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### What problem does this PR solve?
For `uv package`/`uv pip install ".[full]"`, bug introduced in #6370:
* Removes erroneous (non-package) directories (`helm`, `flask_session`)
* Adds `mcp.server` package
* Resolves "warning: package would be ignored" ambiguity by changing
`sdk` to `sdk.python.ragflow_sdk`
* Resolves "error: package directory 'intergrations' does not exist" by
including `intergrations.chatgpt-on-wechat.plugins` explicitly
* Also rearranges packages in alphabetical order, for DX.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### What problem does this PR solve?
When Delete Chunk Will Also Delete Chunk Related Image
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._
### Type of change
- [x] Other (please describe): llm factories update
### What problem does this PR solve?
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._
### Type of change
- [x] Documentation Update
### What problem does this PR solve?
Feat: Add data set configuration form #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Display inline (non-quoted) images in the chat and search modules
#7623
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Update 7 readme
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### What problem does this PR solve?
Add libjemalloc installation command. If the operating system does not
have the libjemalloc library, the execution of entrypoint.sh and
launch_backend_service.sh will be interrupted, and the
rag/svr/task_executor.py script will not be started normally.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
Add frontend support for third-party login integration:
- Used `getLoginChannels` API to fetch available login channels from the
server
- Used `loginWithChannel` function to initiate login based on the
selected channel
- Refactored `useLoginWithGithub` hook to `useOAuthCallback` for
generalized OAuth callback handling
- Updated the login page to dynamically render third-party login buttons
based on the fetched channel list
- Styled third-party login buttons to improve user experience
- Removed unused code snippets
> This PR removes the previously hardcoded GitHub login button. Since
the functionality only worked when `location.host` was equal to
`demo.ragflow.io`, and the authentication logic is now based on
`login.ragflow.io`, this change does not affect the existing logic and
is considered a non-breaking change
---
#### Frontend Screenshot && Backend Configuration

```yaml
# docker/service_conf.yaml.template
# ...
oauth:
github:
icon: github
display_name: "Github"
# ...
custom_channel:
display_name: "OIDC"
# ...
custom_channel_2:
display_name: "OAuth2"
# ...
```
---
- Related pull requests:
- #7379
- #7521
- Related issues:
- #3495
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring
- [x] Performance Improvement
### What problem does this PR solve?
Feat: Show images in reply messages #7608
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Fixed the issue where the chat page would jump after entering the
homepage #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Adjust the display position of recall test item images #7608
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Info of whether applying graph resolution and community extraction is
storage in `task["kb_parser_config"]`. However, previous code get
`graphrag_conf` from `task["parser_config"]`, making `with_resolution`
and `with_community` are always false.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### What problem does this PR solve?
Feat: Add FormContainer component #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix HTTP API Create/Update dataset parser config default value error
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Adds support for the GPT-4.1 series from OpenAI.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Hello, we are using ragflow as a backend service, so we need to manage
agents from our own frontend. So adding these http APIs to manage
agents.
The code logic is copied and modified from the `rm` and `save` methods
in `api/apps/canvas_app.py`.
btw, I found that the `save` method in `canvas_app.py` actually allows
to modify an agent to an existing title, so I kept the behavior in the
http api. I'm not sure if this is intentional.
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### What problem does this PR solve?
Since `import markdown.markdown` has been changed to `import markdown`
in `rag/app/naive.py`, previous code for converting markdown tables
would call a markdown module instead of a callable function. This cause
error.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
Modified the chart to retain persistent volumes by default when the
chart is uninstalled, following established best practices in the Helm
community (e.g., Bitnami charts)
### What problem does this PR solve?
Previously, deleting the helm chart would automatically remove all
persistent data, which poses a risk of accidental data loss.
### Rationale
This change aligns with industry standards to safeguard data by
requiring explicit action to remove persistence, rather than making
deletion the default behavior.
### Impact:
Users who intentionally want to remove persistent data will need to do
so manually or by setting appropriate flags during chart uninstallation.
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### What problem does this PR solve?
As RAGFlow has an integration with Langfuse, this docs page shows how to
configure Langfuse tracing.
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### What problem does this PR solve?
This PR introduces Pydantic-based validation for the update dataset HTTP
API, improving code clarity and robustness. Key changes include:
1. Pydantic Validation
2. Error Handling
3. Test Updates
4. Documentation Updates
5. fix bug: #5915
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Documentation Update
- [x] Refactoring
### What problem does this PR solve?
Fixes bug & regression introduced by [PR #7187 - refactor: Update Redis
configuration to use StatefulSet instead of deployment with
pvc](https://github.com/infiniflow/ragflow/pull/7187):
1. Fixes bug #7403 - `redis.persistence.enabled` missing from
`helm/values.yaml` causes helm error:
[ERROR] templates/: template: ragflow/templates/redis.yaml:55:24:
executing "ragflow/templates/redis.yaml" at
<.Values.redis.persistence.enabled>: nil pointer evaluating interface
{}.enabled
2. Fixes regression: reverts hardcoded redis.storage.capacity value back
to using variable `redis.storage.capacity` from `helm/values.yaml`.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### What problem does this PR solve?
1. The MySQL instance is configured with max_connections=1000,
but our connection pool was limited to max_connections: 100.
This mismatch caused connection pool exhaustion during performance
testing.
2. Increase stale_timeout to resolve#6548
### Type of change
- [x] Performance Improvement
### What problem does this PR solve?
Feat: Cross-language query #7376#4503#5710#7470
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### What problem does this PR solve?
Kb detail supports return document total size now.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Add scheduled workflow for daily HTTP API full tests
Configure cron job to trigger at 16:00:00Z(00:00:00+08:00)
### Type of change
- [X] CI update
### What problem does this PR solve?
Feat: Replace the submit form button with ButtonLoading #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
Two Case when local Es tag search has result which is filtered by score
1: Doc has empty tag,and not visi LLM
2: Code may use empty examples in Prompt for LLM search tag
Co-authored-by: huangfuqunze <huangfuqunze.hfqz@alibaba-inc.com>
### What problem does this PR solve?
The parameter positions were incorrect and have been corrected to use
keyword argument passing
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Adjust the operation cell of the table on the file management page
and dataset page #3221.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
fix deepseek-ai/deepseek-vl2 model can not be select as a VL model to
parse pdf image . And add other vl models config from siliconflow
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
---------
Co-authored-by: unknown <taoshi.ln@chinatelecom.cn>
### What problem does this PR solve?
Add `/login/channels` route and improve auth logic to support frontend
integration with third-party login providers:
- Add `/login/channels` route to provide authentication channel list
with `display_name` and `icon`
- Optimize user info parsing logic by prioritizing `avatar_url` and
falling back to `picture`
- Simplify OIDC token validation by removing unnecessary `kid` checks
- Ensure `client_id` is safely cast to string during `audience`
validation
- Fix typo
---
- Related pull request: #7379
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
Fix:When sharing the knowledge base of multiple tenants with one person,
when this person queries the knowledge base of both tenants, they will
only query the question of the first person's knowledge base
Co-authored-by: 杜有强 <duyq@internal.ths.com.cn>
### What problem does this PR solve?
1. Add delete_by_ids method
2. Add get_doc_ids_by_doc_names
3. Improve user_canvan_version's logic (avoid O(n) db IO)
4. Improve document delete logic (avoid O(n) db IO)
### Type of change
- [x] Performance Improvement
### What problem does this PR solve?
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
---------
Co-authored-by: 马继龙 <majilong@ideal.com>
### What problem does this PR solve?
Fix: After deleting the file from the file management menu, it was not
removed from the MinIO bucket.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
Co-authored-by: liuzhenghua-jk <liuzhenghua-jk@360shuke.com>
### What problem does this PR solve?
This is a follow-up of #7088 , adding a knowledge base type input to the
`Begin` component, and a knowledge base selector to the agent flow debug
input panel:

then you can select one or more knowledge bases when testing the agent:

Note: the lines changed in `agent/component/retrieval.py` after line 94
are modified by `ruff format` from the `pre-commit` hooks, no functional
change.
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### What problem does this PR solve?
https://github.com/infiniflow/ragflow/issues/7466
I think due to some times we can not get position
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
When parsing documents containing images, the current code uses a
single-threaded approach to call the VL model, resulting in extremely
slow parsing speed (e.g., parsing a Word document with dozens of images
takes over 20 minutes).
By switching to a multithreaded approach to call the VL model, the
parsing speed can be improved to an acceptable level.
### Type of change
- [x] Performance Improvement
---------
Co-authored-by: liuzhenghua-jk <liuzhenghua-jk@360shuke.com>
### What problem does this PR solve?
fixed errror when vars of cnt begin declare with key contain "begin"
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix https://github.com/infiniflow/ragflow/issues/7224 and
https://github.com/infiniflow/ragflow/issues/6793
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)a
- [x] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
Fix instructions for Ollama
### What problem does this PR solve?
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
1. Use `host.docker.internal` as base URL
2. Fix numbers in list
3. Make clear what is the console input and what is the output
### What problem does this PR solve?
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### What problem does this PR solve?
Fix `filed_map` was incorrectly persisted. #7412
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Modify the style of the dataset page #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
change create dataset delimiter default value to r'\n'
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Modify the dataset list page style #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix#6600
Hello, I have the same business requirement as #6600. My use case is:
We have many departments (> 20 now and increasing), and each department
has its own knowledge base. Because the agent workflow is the same, so I
want to change the knowledge base on the fly, instead of creating agents
for every department.
It now looks like this:

Knowledge bases can be selected from the dropdown, and passed through
the variables in the table. All selected knowledge bases are used for
retrieval.
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### What problem does this PR solve?
https://github.com/infiniflow/ragflow/issues/7407
Based on this context, I think there should be some reasons that let
some LLMs have a mismatch (add the wrong "@xxx"),
So I think when use fid can not fetch llm then tried to just use name
should can fetch it.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Remove unnecessary parameter restrictions in dataset creation API
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Deprecate get_dataset_id_and_document_id fixture, use add_document
instead
### Type of change
- [x] Update test cases
### What problem does this PR solve?
Feat: Using IconFont as an additional icon library #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
When you removed any document in a knowledge base using knowledge graph,
the graph's `removed_kwd` is set to "Y".
However, in the function `graphrag.utils.get_gaph`, `rebuild_graph`
method is passed and directly return `None` while `removed_kwd=Y`,
making residual part of the graph abandoned (but old entity data still
exist in db).
Besides, infinity instance actually pass deleting graph components'
`source_id` when removing document. It may cause wrong graph after
rebuild.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Modify background color of Card #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Qwen3 and more LLMs.
Close#7296
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
Feat: Add a language switch drop-down box to the top navigation bar
#3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Modify the segmented component style #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
This PR introduces Pydantic-based validation for the create dataset HTTP
API, improving code clarity and robustness. Key changes include:
1. Pydantic Validation
2. Error Handling
3. Test Updates
4. Documentation
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Documentation Update
- [x] Refactoring
### What problem does this PR solve?
Fix the redis lock will always timeout (change the logic order release
lock first)
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Adjust the style of the home page #3321
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Bind data to the agent module of the home page #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Add support for OAuth2 and OpenID Connect (OIDC) authentication,
allowing OAuth/OIDC authentication using the specified routes:
- `/login/<channel>`: Initiates the OAuth flow for the specified channel
- `/oauth/callback/<channel>`: Handles the OAuth callback after
successful authentication
The callback URL should be configured in your OAuth provider as:
```
https://your-app.com/oauth/callback/<channel>
```
For detailed instructions on configuring **service_conf.yaml.template**,
see: `./api/apps/auth/README.md#usage`.
- Related issues
#3495
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
### What problem does this PR solve?
When updating a chat assistant using API,if the dataset attached by the
current chat assistant is not empty,setting dataset to
null("dataset_ids":[]) will cause update failure:'dataset_ids' can't be
empty
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Add AsyncTreeSelect component #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
当前graphrag的LOOP_PROMPT,会导致模型输出Y之后,继续补充了实体和关系,比较浪费时间。参照[graph
rag](https://github.com/microsoft/graphrag/blob/main/graphrag/prompts/index/extract_graph.py)最新的代码,修改了LOOP_PROMPT,经过验证,修改后可以稳定的输出Y停止。
Currently, GraphRAG’s LOOP_PROMPT causes the model to keep appending
entities and relationships even after outputting “Y,” which wastes time.
Referring to the latest code in
[graphRAG](https://github.com/microsoft/graphrag/blob/main/graphrag/prompts/index/extract_graph.py),
I modified the LOOP_PROMPT, and after verification the updated prompt
reliably outputs “Y” and stops.
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [x] Performance Improvement
- [ ] Other (please describe):
Co-authored-by: liuzhenghua-jk <liuzhenghua-jk@360shuke.com>
### What problem does this PR solve?
0.18.0 mcp server can not start with upgrade from 0.17.2 or new install
except rebuild all docker
Close#7321
mcp server can not start auto from docker :
2025-04-25 17:30:44,512 INFO 25 task_executor_2a9f3e2de99a_0 reported
heartbeat: {"name": "task_executor_2a9f3e2de99a_0", "now":
"2025-04-25T17:30:44.509+08:00", "boot_at":
"2025-04-25T16:43:33.038+08:00", "pending": 0, "lag": 0, "done": 0,
"failed": 0, "current": {}}
usage: server.py [-h] [--base_url BASE_URL] [--host HOST] [--port PORT]
[--mode MODE] [--api_key API_KEY]
server.py: error: unrecognized arguments:
problem:
server.py in docker start arguments not correct , so mcp server start
fail
reason:
```
1. docker-copose.yaml
example - --mcp-host-api-key="ragflow-12345678" is wrong. do not add "" to key or it says:"api-key wrong"
2.docker file entrypoint.sh can not translate config to exec command , we need mapping file from host to docker
- ./entrypoint.sh:/ragflow/entrypoint.sh
3.just add one code raw fix all probelm
```
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Documentation Update
- [x] Performance Improvement
---------
Co-authored-by: Yongteng Lei <yongtengrey@outlook.com>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
In the generate_confirmation_token method, a spelling error was found
with 'tenent_id'. The correct spelling should be 'tenant_id'.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
Co-authored-by: shengliang xiao <shengliangxiao2024@gmail.com>
With current config will get error "Fail to access model(gemma-7b-it)
using this api key"
Since the model has been removed, according to Groq official document:
https://console.groq.com/docs/models
### Type of change
- [ x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Batch operations on documents in a dataset #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Enhance capability of `list_docs`.
Breaking change: change method from `GET` to `POST`.
### Type of change
- [x] Refactoring
- [x] Enhancement with breaking change
### What problem does this PR solve?
Feat: Create empty document. #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Filter document by running status and file type. #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Save document metadata #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
https://github.com/infiniflow/ragflow/issues/6984
1. Markdown parser supports get pictures
2. For Native, when handling Markdown, it will handle images
3. improve merge and
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
Feat: Save the configuration information of the knowledge base document
#3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Display the document configuration dialog with shadcn #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
This PR adds the support for latest OpenSearch2.19.1 as the store engine
& search engine option for RAGFlow.
### Main Benefit
1. OpenSearch2.19.1 is licensed under the [Apache v2.0 License] which is
much better than Elasticsearch
2. For search, OpenSearch2.19.1 supports full-text
search、vector_search、hybrid_search those are similar with Elasticsearch
on schema
3. For store, OpenSearch2.19.1 stores text、vector those are quite
simliar with Elasticsearch on schema
### Changes
- Support opensearch_python_connetor. I make a lot of adaptions since
the schema and api/method between ES and Opensearch differs in many
ways(especially the knn_search has a significant gap) :
rag/utils/opensearch_coon.py
- Support static config adaptions by changing:
conf/service_conf.yaml、api/settings.py、rag/settings.py
- Supprt some store&search schema changes between OpenSearch and ES:
conf/os_mapping.json
- Support OpenSearch python sdk : pyproject.toml
- Support docker config for OpenSearch2.19.1 :
docker/.env、docker/docker-compose-base.yml、docker/service_conf.yaml.template
### How to use
- I didn't change the priority that ES as the default doc/search engine.
Only if in docker/.env , we set DOC_ENGINE=${DOC_ENGINE:-opensearch}, it
will work.
### Others
Our team tested a lot of docs in our environment by using OpenSearch as
the vector database ,it works very well.
All the conifg for OpenSearch is necessary.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: Yongteng Lei <yongtengrey@outlook.com>
Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
### What problem does this PR solve?
Feat: Delete and rename files in the knowledge base #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Display document parsing status #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
The lock is not released correctly when task_exectuor is abnormal
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### What problem does this PR solve?
Some models force thinking, resulting in the absence of the think tag in
the returned content
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Sometimes after we commit the code and open the PR the CI pipeline fails
in Ruff checks. Including a pre-commit we can identify this problem
early and avoid time loss.
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [X] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### What problem does this PR solve?
Fix the entrypoint file from the docker container to solve #7249
Here is the important part from the logs:
```
docker logs -f ragflow-server
...
usage: server.py [-h] [--base_url BASE_URL] [--host HOST] [--port PORT]
[--mode MODE] [--api_key API_KEY]
server.py: error: unrecognized arguments:
...
```
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### What problem does this PR solve?
This PR fixes an issue with the MCP server configuration in RAGFlow's
Docker deployment where:
1. Incorrect parameter naming (`--mcp--host-api-key` with double
hyphens) caused server startup failures
2. Port binding conflicts occurred due to unexposed MCP ports in Docker
3. Inconsistent host addressing between `0.0.0.0` and `127.0.0.1` led to
connectivity issues
The changes ensure proper MCP server initialization and reliable
inter-service communication.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### Key Changes
1. **Parameter Correction**:
- Fixed `--mcp--host-api-key` → `--mcp-host-api-key`
### What problem does this PR solve?
Feat: Deleting files in batches. #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
The knowledge_graph chunk method is deprecated and should no longer be
used. #7184.
### Type of change
- [x] Refactoring
### What problem does this PR solve?
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._
### Type of change
- [x] Documentation Update
### What problem does this PR solve?
Enhance capability of `list_kbs`.
Breaking change: change method from `GET` to `POST`.
### Type of change
- [x] Refactoring
- [x] Enhancement with breaking change
### What problem does this PR solve?
Feat: Show the owner of this knowledge base on the list card. #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Even if the knowledge base has slices, the chunk method can be
changed #7115
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix: Knowledge Graph Extraction Conflict Between Dataset-Level and
File-Specific Configurations #7198
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix retrieval testing wrong pagination. #7171
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
Put the knowledge base list related hooks into use-knowledge-request.ts
#3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Move langfuse configuration to api page #6155
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Filter the knowledge base list using owner #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
This PR changes Redis to be a statefulset. In some situation when we
Redis pod gets rescheduled to another Node, it gets stuck in pending
state due to the PVC attached to another Kubernetes node.
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [X] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### What problem does this PR solve?
When parsing pptx files, some shapes do not contain the `shape_type`
attribute, which causes the original code to throw an exception during
extraction, leading to failure in content extraction. This optimization
introduces handling logic for such anomalous shapes, providing a safer
and more robust processing mechanism.
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [x] Refactoring
- [x] Performance Improvement
- [ ] Other (please describe):
### What problem does this PR solve?
Add mcp self-host mode, a complement of #7084.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Add mcp self-host mode documentation, a complement of #7141.
### Type of change
- [x] Documentation Update
---------
Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
### What problem does this PR solve?
Hello, I encountered a problem when trying to use a S3 backend
(seaweedfs) for storage in RAGFlow: when calling
`STORAGE_IMPL.get("bucket", "key")`, the actual request sent to S3 is
`bucket/bucket/key`, causing a `NoSuchKey` error.
I compared the code in `s3_conn.py` to `minio_conn.py` and
`oss_conn.py`, then decided to remove the `else` branch in
`use_prefix_path` method, and it works. I didn't configure `prefix_path`
or `bucket` in `s3` section of the `service_conf.yaml`.
I think this is a bug, but not sure.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### What problem does this PR solve?
Add MCP support with a client example.
Issue link: #4344
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Documentation for MCP server
### Type of change
- [x] Documentation Update
---------
Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
### What problem does this PR solve?
If you deploy Ragflow using Kubernetes, the hostname will change during
a rolling update. This causes the consumer name of the task executor to
change, making it impossible to schedule tasks that were previously in a
pending state.
To address this, I introduced a recovery task that scans these pending
messages and re-publishes them, allowing the tasks to continue being
processed.
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
---------
Co-authored-by: liuzhenghua-jk <liuzhenghua-jk@360shuke.com>
## Problem Description
Multiple files in the RAGFlow project contain closure trap issues when
using lambda functions with `trio.open_nursery()`. This problem causes
concurrent tasks created in loops to reference the same variable,
resulting in all tasks processing the same data (the data from the last
iteration) rather than each task processing its corresponding data from
the loop.
## Issue Details
When using a `lambda` to create a closure function and passing it to
`nursery.start_soon()` within a loop, the lambda function captures a
reference to the loop variable rather than its value. For example:
```python
# Problematic code
async with trio.open_nursery() as nursery:
for d in docs:
nursery.start_soon(lambda: doc_keyword_extraction(chat_mdl, d, topn))
```
In this pattern, when concurrent tasks begin execution, `d` has already
become the value after the loop ends (typically the last element),
causing all tasks to use the same data.
## Fix Solution
Changed the way concurrent tasks are created with `nursery.start_soon()`
by leveraging Trio's API design to directly pass the function and its
arguments separately:
```python
# Fixed code
async with trio.open_nursery() as nursery:
for d in docs:
nursery.start_soon(doc_keyword_extraction, chat_mdl, d, topn)
```
This way, each task uses the parameter values at the time of the
function call, rather than references captured through closures.
## Fixed Files
Fixed closure traps in the following files:
1. `rag/svr/task_executor.py`: 3 fixes, involving document keyword
extraction, question generation, and tag processing
2. `rag/raptor.py`: 1 fix, involving document summarization
3. `graphrag/utils.py`: 2 fixes, involving graph node and edge
processing
4. `graphrag/entity_resolution.py`: 2 fixes, involving entity resolution
and graph node merging
5. `graphrag/general/mind_map_extractor.py`: 2 fixes, involving document
processing
6. `graphrag/general/extractor.py`: 3 fixes, involving content
processing and graph node/edge merging
7. `graphrag/general/community_reports_extractor.py`: 1 fix, involving
community report extraction
## Potential Impact
This fix resolves a serious concurrency issue that could have caused:
- Data processing errors (processing duplicate data)
- Performance degradation (all tasks working on the same data)
- Inconsistent results (some data not being processed)
After the fix, all concurrent tasks should correctly process their
respective data, improving system correctness and reliability.
### What problem does this PR solve?
Feat: Rendering a search test list with real data #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
… bytes-like object
### What problem does this PR solve?
fix bug #6990 internal server error ehile chunking:expected string or
bytes-like object
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
Co-authored-by: unknown <taoshi.ln@chinatelecom.cn>
### What problem does this PR solve?
https://github.com/infiniflow/ragflow/issues/7083
Internal due to when returning from ES, fields changed to str, so the
bool conversion does not work as expected.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
when there are multiple entities, the variable `v` may be a list, which
will lead to this error:
```
| File "/mnt/d/wrf/ragflow/ragflow/graphrag/utils.py", line 59, in replace_all
| result = result.replace(f"{{{k}}}", v)
| TypeError: replace() argument 2 must be str, not list
```
this pr assign this `v` to be a str
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### What problem does this PR solve?
When running graph resolution with infinity, if single quotation marks
appeared in the entities name that to be delete, an error tokenizing of
sqlglot might occur after calling infinity.
For example:
```
INFINITY delete table ragflow_xxx, filter knowledge_graph_kwd IN ('entity') AND entity_kwd IN ('86 IMAGES FROM PREVIOUS CONTESTS', 'ADAM OPTIMIZATION', 'BACKGROUND'ESTIMATION')
```
may raise error
```
Error tokenizing 'TS', 'ADAM OPTIMIZATION', 'BACKGROUND'ESTIMATION''
```
and make the document parsing failed。
Replace one single quotation mark with double single quotation marks can
let sqlglot tokenize the entity name correctly.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
The assistant message placeholder is incorrect, I have finished
modifying both Chinese and traditional Chinese characters
### Type of change
- [x] Bug Fix
### Related Issue:
https://github.com/infiniflow/ragflow/issues/6548
### Related PR:
https://github.com/infiniflow/ragflow/pull/6861
### Environment:
Commit version:
[[48730e0](48730e00a8)]
### Bug Description:
Unexpected `pymysql.err.InterfaceError: (0, '') `when using Peewee +
PyMySQL + PooledMySQLDatabase after a long-running `chat streamly`
operation.
This is a common issue with Peewee + PyMySQL + connection pooling: you
end up using a connection that was silently closed by the server, but
Peewee doesn't realize it's dead.
**I found that the error only occurs during longer streaming outputs**
and is unrelated to the database connection context, so it's likely
because:
- The prolonged streaming response caused the database connection to
time out
- The original database connection might have been disconnected by the
server during the streaming process
### Why This Happens
This error happens even when using `@DB.connection_context() `after the
stream is done. After investigation, I found this is caused by MySQL
connection pools that appear to be open but are actually dead (expired
due to` wait_timeout`).
1. `@DB.connection_context()` (as a decorator or context manager) pulls
a connection from the pool.
2. If this connection was idle and expired on the MySQL server (e.g.,
due to `wait_timeout`), but not closed in Python, it will still be
considered “open” (`DB.is_closed() == False`).
3. The real error will occur only when I execute a SQL command (such as
.`get_or_none()`), and PyMySQL tries to send it to the server via a
broken socket.
### Changes Made:
1. I implemented manual connection checks before executing SQL:
```
try:
DB.execute_sql("SELECT 1")
except Exception:
print("Connection dead, reconnecting...")
DB.close()
DB.connect()
```
2. Delayed the token count update until after the streaming response is
completed to ensure the streaming output isn't interrupted by database
operations.
```
total_tokens = 0
for txt in chat_streamly(system, history, gen_conf):
if isinstance(txt, int):
total_tokens = txt
......
break
......
if total_tokens > 0:
if not TenantLLMService.increase_usage(self.tenant_id, self.llm_type, txt, self.llm_name):
logging.error("LLMBundle.chat_streamly can't update token usage for {}/CHAT llm_name: {}, content: {}".format(self.tenant_id, self.llm_name, txt))
```
### What problem does this PR solve?
Fix: Files being parsed are not allowed to be deleted in batches #7065
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
docs(api): Fix request method in Related Questions example (DELETE→POST)
### Type of change
- [x] Documentation Update
### What problem does this PR solve?
https://github.com/infiniflow/ragflow/issues/6905
When deleting a document will check before removing it from storage
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Add fallback for bad citation output. #6948
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix Helm Ingress template; Trying to access a global variable within a
loop
Fix#6191
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### What problem does this PR solve?
Feat: Remove the rotation state of the button that parses the document
#7008
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix: The selected state of the TreeView node cannot be seen on Mac #7000
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix update_progress issue introduced by #6975
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix: Fix api page translation issue. #3221
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Sometimes a slide may trigger a Proxy error (ArgumentException:
Parameter is not valid) due to issues in the original file, and this
error message can be confusing for users.
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [x] Other (please describe):
### What problem does this PR solve?
Considering the ragflow_deps image is only available for `linux/amd64`
platform, if we try to run the docker build commands in ,macOS for
instance, without the platform flag, we get an error due to the
different platform. Specifying the platform in the docker build command
fixes this issue.
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [X] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
i use PdfParser in local(refer to this case:
https://github.com/infiniflow/ragflow/blob/main/rag/app/paper.py) like
this:
```
import re
import openpyxl
from ragflow.api.db import ParserType
from ragflow.rag.nlp import rag_tokenizer, tokenize, tokenize_table, add_positions, bullets_category, \
title_frequency, \
tokenize_chunks
from ragflow.rag.utils import num_tokens_from_string
from ragflow.deepdoc.parser import PdfParser, ExcelParser, DocxParser,PlainParser
def logger(prog=None, msg=""):
print(msg)
class Pdf(PdfParser):
def __init__(self):
self.model_speciess = ParserType.MANUAL.value
super().__init__()
def __call__(self, filename, binary=None, from_page=0,
to_page=100000, zoomin=3, callback=None):
from timeit import default_timer as timer
start = timer()
callback(msg="OCR is running...")
self.__images__(
filename if not binary else binary,
zoomin,
from_page,
to_page,
callback
)
callback(msg="OCR finished.")
print("OCR:", timer() - start)
self._layouts_rec(zoomin)
callback(0.65, "Layout analysis finished.")
print("layouts:", timer() - start)
self._table_transformer_job(zoomin)
callback(0.67, "Table analysis finished.")
self._text_merge()
tbls = self._extract_table_figure(True, zoomin, True, True)
self._concat_downward()
self._filter_forpages()
callback(0.68, "Text merging finished")
# clean mess
for b in self.boxes:
b["text"] = re.sub(r"([\t ]|\u3000){2,}", " ", b["text"].strip())
return [(b["text"], b.get("layout_no", ""), self.get_position(b, zoomin))
for i, b in enumerate(self.boxes)], tbls
```
show err like this:
```
File "xxxxx/third_party/ragflow/deepdoc/parser/pdf_parser.py", line 1039, in __images__
self.pdf.close()
AttributeError: 'PdfReader' object has no attribute 'close'
```
i found ragflow source code use
`pdfplumber.open`(https://github.com/infiniflow/ragflow/blob/main/deepdoc/parser/pdf_parser.py#L1007C28-L1007C43)
and replace` self.pdf `with ` pdf2_read` (from pypdf import PdfReader as
pdf2_read)in line 1024
(https://github.com/infiniflow/ragflow/blob/main/deepdoc/parser/pdf_parser.py#L1024)
```
self.pdf = pdf2_read
```
---
and I found that `pdfplumber` can be used in this way:
```
file_path="xxx.pdf"
res = pdfplumber.open(file_path)
res.close()
```
but `pypdf.PdfReader` source code do not has `close` func, source code
use like this
```
with open(stream, "rb") as fh:
stream = BytesIO(fh.read())
self._stream_opened = True
```
> https://github.com/py-pdf/pypdf/blob/main/pypdf/_reader.py#L156
so I moved the `self.pdf.close` function call and fixed this problem
hoping to help the project😊
ragflow: v0.17 also encountered this problem. #1453 The task table shows
that the actual task has been completed. Since the process_msg of the
task is not synchronized to the document table, there is no progress
update on the page.
This may be caused by the lock not being released when the exception
occurs.
ragflow:v0.17同样碰到这个问题, 看task表实际任务已经完成,由于没有把task的process_msg同步给document表,
所以在页面看没有进度更新。
可能是这里异常时没有释放锁导致的。
```/api/ragflow_server.py
def update_progress():
lock_value = str(uuid.uuid4())
redis_lock = RedisDistributedLock("update_progress", lock_value=lock_value, timeout=60)
logging.info(f"update_progress lock_value: {lock_value}")
while not stop_event.is_set():
try:
if redis_lock.acquire():
DocumentService.update_progress()
redis_lock.release()
stop_event.wait(6)
except Exception:
logging.exception("update_progress exception")
++ if redis_lock.acquired:
++ redis_lock.release()
```
### What problem does this PR solve?
Fix KB update_time changed whenever system relaunched. #6953
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Sometimes, the **s** in **chunks (s, a)** is an empty string. This
causes the condition **if s and len(a) > 0** in the line **chunks = [(s,
a) for s, a in chunks if s and len(a) > 0]** to fail, which changes the
length of the new chunks. As a result, the final assertion **assert
len(chunks) - end == n_clusters, "{} vs. {}".format(len(chunks) - end,
n_clusters)** fails and raises a confusing error like 7 vs. 8
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### What problem does this PR solve?
Fix: In the dark night theme, the message input box is not displayed
correctly. #6950
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### Related Issue:
https://github.com/infiniflow/ragflow/issues/6741
### Environment:
Using nightly version
Commit version:
[[6051abb](6051abb4a3)]
### Bug Description:
The retrieval function in rag/nlp/search.py returns the original total
chunks number
even after chunks are filtered by similarity_threshold. This creates
inconsistency
between the actual returned chunks and the reported total.
### Changes Made:
Added code to count how many search results actually meet or exceed the
configured similarity threshold
Positioned the calculation after the doc_ids conditional logic to ensure
special cases are handled correctly
Updated the ranks["total"] value to store this filtered count instead of
using the raw search result count
Using NumPy leverages optimized C-level batch operations to optimize
speed
### What problem does this PR solve?
Feat: Add translation text to the prompt word of the generate operator
to distinguish it from the prompt word of the knowledge base #6934
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- Returning 3 similarity scores to the chat completion's `reference`
field. It gives the user more transparency and added flexibility to
display/rerank the reference when needed
Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
### What problem does this PR solve?
Fix: remove deprecated KB updating `permission` field. #6911
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix: local variable referenced before assignment. #6803
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
The old logic filters out all assistant messages from messages, which,
in multi-turn conversations, results in only user messages being
retained. This leads to an error in locally deployed models:
Conversation roles must alternate user/assistant/user/assistant/...
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### What problem does this PR solve?
Feat: Install sonner library #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix the issue where waiting tasks couldn't be processed when upstream
components were "switch", "categorize", or "relevant" and the normal
processing path couldn't continue.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### What problem does this PR solve?
This PR introduces **primitive support for function calls**,
enabling the system to handle basic function call capabilities.
However, this feature is currently experimental and **not yet enabled
for general use**, as it is only supported by a subset of models,
namely, Qwen and OpenAI models.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fixes#6548
Add exception handling to prevent exceptions from propagating back to
the web, which may lead to failure in displaying conversation content.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
Co-authored-by: cm <caiming@sict.ac.cn>
[When parsing documents with graph, an error
occurred:[ERROR][Exception]: 'method']
(https://github.com/infiniflow/ragflow/issues/6835)
### What problem does this PR solve?
Close#6786
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
Co-authored-by: cm <caiming@sict.ac.cn>
### What problem does this PR solve?
This PR addresses the build and dependency issues faced by developers in
regions with poor connectivity to official Ubuntu repositories and
standard dependency sources. Currently, developers in these regions
experience slow or failed Docker builds and dependency downloads,
significantly impacting development efficiency.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
The changes include:
1. Modified Dockerfile to use alternative Ubuntu mirrors with better
connectivity in affected regions
2. Added a new script (download_deps_CN.py) that provides
region-specific alternative download links for dependencies
### What problem does this PR solve?
#6731#6722
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._
### Type of change
- [ ] Documentation Update
---------
Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
### What problem does this PR solve?
Feat: Load the dialog page, prohibit calling the dialog/get interface
#6798
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
fix redis pvc in helm deployment
### What problem does this PR solve?
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
add openai agent
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
Feat: Clarify the use of OpenAI-API-compatible #6782
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
…gic to return the correct deletion message. Add handling for empty
arrays to ensure no errors occur during the deletion operation. Update
the test cases to verify the new logic.
### What problem does this PR solve?
fix this bug:https://github.com/infiniflow/ragflow/issues/6607
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: wenju.li <wenju.li@deepctr.cn>
### What problem does this PR solve?
align default values in create chat assistant HTTP API dos with
implementation.
llm.presence_penalty 0.2 -> 0.4
prompt.top_n 8->6
### Type of change
- [x] Documentation Update
### What problem does this PR solve?
Fix: Issue with Markdown Code Blocks Breaking Frontend Layout #5789
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
…cument_ids" to maintain consistency.
### What problem does this PR solve?
Close#6752
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Co-authored-by: wenju.li <wenju.li@deepctr.cn>
### What problem does this PR solve?
Fix: The file upload prompt indicates "No authorization." #6516
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix: Using the Enter key does not send a complete message #6754
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Support deleting knowledge graph #6747
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Add a notification logic to the team member invite feature #6610
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Interrupt streaming #6515
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
fix#6085
RagTokenizer's dfs_() function falls into infinite recursion when
processing text with repetitive Chinese characters (e.g.,
"一一一一一十一十一十一..." or "一一一一一一十十十十十十十二十二十二..."), causing memory leaks.
### Type of change
Implemented three optimizations to the dfs_() function:
1.Added memoization with _memo dictionary to cache computed results
2.Added recursion depth limiting with _depth parameter (max 10 levels)
3.Implemented special handling for repetitive character sequences
- [x] Bug Fix (non-breaking change which fixes an issue)
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
I'm really sorry, I found that in graphrag/general/extractor.py under
def __call__, the line change.removed_nodes.extend(nodes[1:]) causes an
AttributeError: 'set' object has no attribute 'extend'. Could you please
merge the branch e666528 again? I made some modifications.
### What problem does this PR solve?
Feat: Allows users to search for models in the model selection drop-down
box #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### Related Issue:
https://github.com/infiniflow/ragflow/issues/6653
### Environment:
Using nightly version [ece5903]
Elasticsearch database
Thanks for the review! My fault! I realize my initial testing wasn't
passed.
In graphrag/entity_resolution.py
`sub_connect_graph` is a set like` {'HELLO', 'Hi', 'How are you'}`,
Neither accessing `.nodes` nor `.nodes()` will work, **it still causes
`AttributeError: 'set' object has no attribute 'nodes'`**
In graphrag/general/extractor.py
The `list.extend() `method performs an in-place operation, directly
modifying the original list and returning ‘None’ rather than the
modified list.
Neither accessing
`sorted(set(node0_attrs[attr].extend(node1_attrs.get(attr, []))))` nor
`sorted(set(node0_attrs[attr].extend(node1_attrs[attr])))` will work,
**it still causes `TypeError: 'NoneType' object is not iterable`**
### Type of change
- [ ] Bug Fix AttributeError: graphrag/entity_resolution.py
- [ ] Bug Fix TypeError: graphrag/general/extractor.py
### Related Issue: #6653
### Environment:
Using nightly version
Elasticsearch database
### Bug Description:
When clicking the "Entity Resolution" button in KnowledgeGraph,
encountered the following errors:
graphrag/entity_resolution.py
```
list(sub_connect_graph.nodes) AttributeError
```
graphrag/general/extractor.py
```
node0_attrs[attr] = sorted(set(node0_attrs[attr].extend(node1_attrs[attr])))
TypeError: 'NoneType' object is not iterable
```
```
for attr in ["keywords", "source_id"]:
KeyError I think attribute "keywords" is in edges not nodes
```
graphrag/utils.py
```
settings.docStoreConn.delete() # Sync function called as async
```
### Changes Made:
Fixed AttributeError in entity_resolution.py by properly handling graph
nodes
Fixed TypeError and KeyError in extractor.py by separate operations
Corrected async/sync mismatch in document deletion call
### What problem does this PR solve?
- Added support for S3-compatible protocols.
- Enabled the use of knowledge base ID as a file prefix when storing
files in S3.
- Updated docker/README.md to include detailed S3 and OSS configuration
instructions.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
https://github.com/infiniflow/ragflow/issues/6138
This PR is going to support vision llm for gpustack, modify url path
from `/v1-openai` to `/v1`
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
### Type of change
- [x] Documentation Update
---------
Co-authored-by: balibabu <cike8899@users.noreply.github.com>
### What problem does this PR solve?
Fix#6334
Hello, I encountered the same problem in #6334. In the
`api/db/db_models.py`, it calls `obj.create_table()` unconditionally in
`init_database_tables`, before the `migrate_db()`. Specially for the
`permission` field of `user_canvas` table, it has `index=True`, which
causes `peewee` to issue a SQL trying to create the index when the field
does not exist (the `user_canvas` table already exists), so
`psycopg2.errors.UndefinedColumn: column "permission" does not exist`
occurred.
I've added a judgement in the code, to only call `create_table()` when
the table does not exist, delegate the migration process to
`migrate_db()`.
Then another problem occurs: the `migrate_db()` actually does nothing
because it failed on the first migration! The `playhouse` blindly issue
DDLs without things like `IF NOT EXISTS`, so it fails... even if the
exception is `pass`, the transaction is still rolled back. So I removed
the transaction in `migrate_db()` to make it work.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### What problem does this PR solve?
Actually fix#6241
Hello, I ran into the same problem as #6241. When I'm testing my agent
flow in the web ui using `Run` button with a file input, the retrieval
component always gave an empty output.
In the code I found that:
`web/src/pages/flow/debug-content/index.tsx`:
```tsx
const onOk = useCallback(async () => {
const values = await form.validateFields();
const nextValues = Object.entries(values).map(([key, value]) => {
const item = parameters[Number(key)];
let nextValue = value;
if (Array.isArray(value)) {
nextValue = ``;
value.forEach((x) => {
nextValue +=
x?.originFileObj instanceof File
? `${x.name}\n${x.response?.data}\n----\n` // Here, the file content always ends in '\n'
: `${x.url}\n${x.result}\n----\n`;
});
}
return { ...item, value: nextValue };
});
ok(nextValues);
}, [form, ok, parameters]);
```
while in the `agent/component/retrieval.py`:
```python
def _run(self, history, **kwargs):
query = self.get_input()
query = str(query["content"][0]) if "content" in query else ""
lines = query.split('\n') # inputs are split to ['xxx','yyy','----','']
query = lines[-1] if lines else "" # Here we always get '', thus no result
kbs = KnowledgebaseService.get_by_ids(self._param.kb_ids)
if not kbs:
return Retrieval.be_output("")
```
so the code will never got correct result.
I'm not sure why the input needs such a split here, so I just removed
the splitting, and it works well on my side.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### What problem does this PR solve?
Fix#5418
Actually, the fix#4329 also works for agent flows with parameters, so
this PR just relaxes the `else` branch of that. With this PR, it works
fine on my side, may need more testing to make sure this does not break
something.
I guess the real problem may be deeply hidden in the code which relates
to conversation and canvas execution. After a few hours of debugging, I
see the only difference between with and without parameters in `begin`
component, is the `history` field of canvas data. When the `begin`
component contains some parameters, the debug log shows:
```
025-03-29 19:50:38,521 DEBUG 356590 {
"component_name": "Begin",
"params": {"output_var_name": "output", "message_history_window_size": 22, "query": [{"type": "fileUrls", "key": "fileUrls", "name": "files", "optional": true, "value": "问题.txt\n今天天气怎么样"}], "inputs": [], "debug_inputs": [], "prologue": "你好! 我是你的助理,有什么可以帮到你的吗?", "output": null},
"output": null,
"inputs": []
}, history: [["user", "请回答我上传文件中的问题。"]], kwargs: {"stream": false}
2025-03-29 19:50:38,523 DEBUG 356590 {
"component_name": "Answer",
"params": {"output_var_name": "output", "message_history_window_size": 22, "query": [], "inputs": [], "debug_inputs": [], "post_answers": [], "output": null},
"output": null,
"inputs": []
}, history: [["user", "请回答我上传文件中的问题。"]], kwargs: {"stream": false}
```
Then it does not go further along the flow.
When the `begin` component does not contain any parameter, the debug log
shows:
```
2025-03-29 19:41:13,518 DEBUG 353596 {
"component_name": "Begin",
"params": {"output_var_name": "output", "message_history_window_size": 22, "query": [], "inputs": [], "debug_inputs": [], "prologue": "你好! 我是你的助理,有什么可以帮到你的吗?", "output": null},
"output": null,
"inputs": []
}, history: [], kwargs: {"stream": false}
2025-03-29 19:41:13,520 DEBUG 353596 {
"component_name": "Answer",
"params": {"output_var_name": "output", "message_history_window_size": 22, "query": [], "inputs": [], "debug_inputs": [], "post_answers": [], "output": null},
"output": null,
"inputs": []
}, history: [], kwargs: {"stream": false}
2025-03-29 19:41:13,556 INFO 353596 127.0.0.1 - - [29/Mar/2025 19:41:13] "POST /api/v1/agents/fee6886a0c6f11f09b48eb8798e9aa9b/sessions?user_id=123 HTTP/1.1" 200 -
2025-03-29 19:41:21,115 DEBUG 353596 Canvas.prepare2run: Retrieval:LateGuestsNotice
2025-03-29 19:41:21,116 DEBUG 353596 {
"component_name": "Retrieval",
"params": {"output_var_name": "output", "message_history_window_size": 22, "query": [], "inputs": [], "debug_inputs": [], "similarity_threshold": 0.2, "keywords_similarity_weight": 0.3, "top_n": 8, "top_k": 1024, "kb_ids": ["9aca3c700c5911f0811caf35658b9385"], "rerank_id": "", "empty_response": "", "tavily_api_key": "", "use_kg": false, "output": null},
"output": null,
"inputs": []
}, history: [["user", "请回答我上传文件中的问题。"]], kwargs: {"stream": false}
```
It correctly goes along the flow and generates correct answer.
You can see the difference: when the `begin` component has any
parameter, the `history` field is filled from the beginning, while it is
just `[]` if the `begin` component has no parameter.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### What problem does this PR solve?
Fix knowledge_graph_kwd on infinity. Close#6476 and #6624
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix entity_types. Close#6287 and #6608
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
This PR gives better control over how we distribute which service will
be loaded. With this approach, we can create containers to run only the
web server and others to run the task executor. It also introduces the
unique ID per task executor host, this will be important when scaling
task executors horizontally, considering unique task executor ids will
be required.
This new `entrypoint.sh` maintains the default behavior of starting the
web server and task executor in the same host.
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [X] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
# Dynamic Context Window Size for Ollama Chat
## Problem Statement
Previously, the Ollama chat implementation used a fixed context window
size of 32768 tokens. This caused two main issues:
1. Performance degradation due to unnecessarily large context windows
for small conversations
2. Potential business logic failures when using smaller fixed sizes
(e.g., 2048 tokens)
## Solution
Implemented a dynamic context window size calculation that:
1. Uses a base context size of 8192 tokens
2. Applies a 1.2x buffer ratio to the total token count
3. Adds multiples of 8192 tokens based on the buffered token count
4. Implements a smart context size update strategy
## Implementation Details
### Token Counting Logic
```python
def count_tokens(text):
"""Calculate token count for text"""
# Simple calculation: 1 token per ASCII character
# 2 tokens for non-ASCII characters (Chinese, Japanese, Korean, etc.)
total = 0
for char in text:
if ord(char) < 128: # ASCII characters
total += 1
else: # Non-ASCII characters
total += 2
return total
```
### Dynamic Context Calculation
```python
def _calculate_dynamic_ctx(self, history):
"""Calculate dynamic context window size"""
# Calculate total tokens for all messages
total_tokens = 0
for message in history:
content = message.get("content", "")
content_tokens = count_tokens(content)
role_tokens = 4 # Role marker token overhead
total_tokens += content_tokens + role_tokens
# Apply 1.2x buffer ratio
total_tokens_with_buffer = int(total_tokens * 1.2)
# Calculate context size in multiples of 8192
if total_tokens_with_buffer <= 8192:
ctx_size = 8192
else:
ctx_multiplier = (total_tokens_with_buffer // 8192) + 1
ctx_size = ctx_multiplier * 8192
return ctx_size
```
### Integration in Chat Method
```python
def chat(self, system, history, gen_conf):
if system:
history.insert(0, {"role": "system", "content": system})
if "max_tokens" in gen_conf:
del gen_conf["max_tokens"]
try:
# Calculate new context size
new_ctx_size = self._calculate_dynamic_ctx(history)
# Prepare options with context size
options = {
"num_ctx": new_ctx_size
}
# Add other generation options
if "temperature" in gen_conf:
options["temperature"] = gen_conf["temperature"]
if "max_tokens" in gen_conf:
options["num_predict"] = gen_conf["max_tokens"]
if "top_p" in gen_conf:
options["top_p"] = gen_conf["top_p"]
if "presence_penalty" in gen_conf:
options["presence_penalty"] = gen_conf["presence_penalty"]
if "frequency_penalty" in gen_conf:
options["frequency_penalty"] = gen_conf["frequency_penalty"]
# Make API call with dynamic context size
response = self.client.chat(
model=self.model_name,
messages=history,
options=options,
keep_alive=60
)
return response["message"]["content"].strip(), response.get("eval_count", 0) + response.get("prompt_eval_count", 0)
except Exception as e:
return "**ERROR**: " + str(e), 0
```
## Benefits
1. **Improved Performance**: Uses appropriate context windows based on
conversation length
2. **Better Resource Utilization**: Context window size scales with
content
3. **Maintained Compatibility**: Works with existing business logic
4. **Predictable Scaling**: Context growth in 8192-token increments
5. **Smart Updates**: Context size updates are optimized to reduce
unnecessary model reloads
## Future Considerations
1. Fine-tune buffer ratio based on usage patterns
2. Add monitoring for context window utilization
3. Consider language-specific token counting optimizations
4. Implement adaptive threshold based on conversation patterns
5. Add metrics for context size update frequency
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
This PR updates the MySQL container configuration by setting the
parameter --binlog_expire_logs_seconds to 604800 seconds (7 days). This
change ensures that MySQL automatically purges binary logs older than 7
days, helping to conserve disk space and maintain precise log
management.
### What problem does this PR solve?
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### What problem does this PR solve?
Feat: Add RadioGroup component #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix: When Excel is a formula, the parsed result is a formula, but cannot
be correctly parsed as a value type
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Co-authored-by: tangyu <1@1.com>
### What problem does this PR solve?
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._
### Type of change
- [x] add test cases
### What problem does this PR solve?
Introduced delete_knowledge_graph
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] Documentation Update
### What problem does this PR solve?
When I use the categorization operator, I find that if the keyword I
want to Categorize appears repeatedly in the input, then I cannot judge
the word that appears most frequently. Instead, I simply get the word
that matches and return all the ones that have made the following
changes to the categorize filter.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring
- [x] Performance Improvement
### What problem does this PR solve?
Feat: Add logo-with-text-white.svg #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Prevent applications from failing to start due to calling non-existent
or incorrect Minio connection configurations when using file storage
outside of Minio
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
related issue #5882
### What problem does this PR solve?
update helm infinity image version from v0.5.0
image to infiniflow/infinity:v0.6.0-dev3
to solve issue #5882
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### What problem does this PR solve?
Fix:flow DB Assistant module translate to zh
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [x] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### What problem does this PR solve?
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: Chenzy <chenzy901@gmail.com>
### What problem does this PR solve?
There is a small bug in the update dataset of this document. The return
type of rag_oobject.list_datasets is a list type, and the first item
should be taken as' ragflow_stdk.modules.dataset ' DataSet`, Adapt to
the update.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Removed set_entity and set_relation to avoid accessing doc engine during
graph computation.
Introduced GraphChange to avoid writing unchanged chunks.
### Type of change
- [x] Performance Improvement
### What problem does this PR solve?
1. for /mv API use get by ids to avoid O(n) DB IO
2. for /list remove one useless call
### Type of change
- [x] Performance Improvement
Added the with_retry decorator in db_models.py to add a retry mechanism
for database operations. Applied the retry mechanism to the lock and
unlock methods of the PostgresDatabaseLock and MysqlDatabaseLock classes
to enhance the reliability of lock operations.
### What problem does this PR solve?
resolve failed to acquire lock exception with retry mechanism
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: wenju.li <wenju.li@deepctr.cn>
### What problem does this PR solve?
Fix: Resolve FlowSetting not reading Title from .ts files
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Enhance Langfuse API: add project_id and project_name
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
For now, if you use thinking model (deepseek-r1:32b with ollama server
in my case) in "Keyword" node, result contains all <think> block and so
node return not only keywords
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### What problem does this PR solve?
improve the logic to fetch parent folder, remove the useless DB IO logic
### Type of change
- [x] Performance Improvement
### What problem does this PR solve?
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._
### Type of change
- [x] update test cases
### What problem does this PR solve?
1. miss completion delimiter.
2. miss bracket process.
3. doc_ids return by update_graph is a set, and insert operation in
extract_community need a list.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
for batch requests based on get_by_ids to fetch all files first replace
the O(n) IO logic.
### Type of change
- [x] Performance Improvement
### What problem does this PR solve?
Feat: Add LangfuseCard component. #6155
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Langfuse update model has no fields attribute
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: lizheng@ssc-hn.com <lizheng@ssc-hn.com>
### What problem does this PR solve?
Feat: Add background-core-standard to tailwind.css #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
support pic base bullet for PPT
modify one mistake in document
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
When using the online large model API knowledge base to extract
knowledge graphs, frequent Rate Limit Errors were triggered,
causing document parsing to fail. This commit fixes the issue by
optimizing API calls in the following way:
Added exponential backoff and jitter to the API call to reduce the
frequency of Rate Limit Errors.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
###Address Problem:
The original implementation used re.sub(r"(\\\"|\")", "", content) which
stripped all quotes from the processed content. While this worked for
simple Jinja2-rendered templates, it caused formatting issues when :
-Quotes were required in the final output (e.g., JSON, Python Code
strings)
###Solution:
1. Selective JSON Serialization.
2. Removed Global Quote Removal
### What problem does this PR solve?
This PR addresses an issue in template processing where all quotation
marks (" and \") were being removed from content, potentially corrupting
string formatting in rendered outputs. **In fact, extra quotes is
generated by json.dumps(v, ensure_ascii=False).**
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
This PR resolves the issue of multiple top-level packages being detected
in the Python project, which caused errors when using uv pip install.
The problem occurred because the project had multiple directories files
at the root level, leading to a flat-layout error.
To fix this, the pyproject.toml file was updated to explicitly list the
packages using the [tool.setuptools] section. This ensures that the
correct packages are included during installation, avoiding the
flat-layout error.
Type of change
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: If the Transfer item is disabled, the item cannot be edited. #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Adds hierarchical title path tracking for tables in DOCX documents to
improve context association. Previously, extracted tables lacked
positional context within document structure.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix the error where the Ollama embeddings interface returns a “500
Internal Server Error” when using models such as xiaobu-embedding-v2 for
embedding.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- Introduce the `check_duplicate_ids` function in `dataset.py` and
`doc.py` to check for and handle duplicate IDs.
- Update the deletion operation to ensure that when deleting datasets
and documents, error messages regarding duplicate IDs can be returned.
- Implement the `check_duplicate_ids` function in `api_utils.py` to
return unique IDs and error messages for duplicate IDs.
### What problem does this PR solve?
Close https://github.com/infiniflow/ragflow/issues/6234
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: wenju.li <wenju.li@deepctr.cn>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
Feat: Add user registration toggle feature. Added a user registration
toggle REGISTER_ENABLED in the settings and .env config file. The user
creation interface now checks the state of this toggle to control the
enabling and disabling of the user registration feature.
the front-end implementation is done, the registration button does not
appear if registration is not allowed. I did the actual tests on my
local server and it worked smoothly.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: wenju.li <wenju.li@deepctr.cn>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
Fix: Optimized the get_by_id method to resolve the issue of missing
exceptions and improve query performance
### What problem does this PR solve?
Optimized the get_by_id method to resolve the issue of missing
exceptions and improve query performance.
Optimization details:
1. The original method used a custom query method that required
concatenating SQL, which impacted performance.
2. The query method returned a list, which needed to be accessed by
index, posing a risk of index out-of-bounds errors.
3. The original method used except Exception to catch all errors, which
is not a best practice in Python programming and may lead to missing
exceptions. The get_or_none method accurately catches DoesNotExist
errors while allowing other errors to be raised normally.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Performance Improvement
### What problem does this PR solve?
Call register_scripts on connecting redis
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### Change Content
- A new function `load_env_file` has been added to load environment
variables from a .env file in the current script directory.
- If the .env file exists, the variables within it will be loaded; if it
does not exist, a warning message will be output.
I found this issue while testing this pr:
https://github.com/infiniflow/ragflow/pull/6327. The locally started
server did not read the REGISTER_ENABLED variables in the .env. The
result has always been the default True
### What problem does this PR solve?
Follow the tutorial in the README.md to start from source code. base's
container that is es、redis,etc will load .env. Therefore,
`launch_backend_service.sh` should also load .env to be consistent with
the configuration of the docker container when it was started
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
- Fix incorrect progress check condition that prevented re-parsing of
completed documents
- Allow parsing for documents with progress 0.0 (not started) or 1.0
(completed)
- Only block parsing for documents currently in progress (0.0 < progress
< 1.0)
Close#6312
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
Fix: Resolved a bug where sibling components in Canvas were not
restricted to fetching data from the upstream when parallel components
were present.
Issue: When parallel components existed in Canvas, sibling components
incorrectly fetched data without being limited to the upstream scope,
causing data retrieval issues.
Solution: Adjusted the data fetching logic to ensure sibling components
only retrieve data from the upstream scope.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Add fallback for PDF figure parser
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Optimize setting configuration initialization to resolve Minio
initialization error caused by using a specific storage.
Reproduction Scenario:
Using Aliyun OSS as the backend storage with the STORAGE_IMPL
environment variable set to OSS.
The service_conf.yaml.template configuration file contains OSS-related
configurations, while other storage configurations are commented out.
When the service starts, it still attempts to initialize the Minio
storage. Since there is no Minio configuration in
service_conf.yaml.template, it results in an error due to the missing
configuration file.
Optimization Measures:
Automatically determine the required initialization configuration based
on the environment variable.
Do not initialize configurations for unused resources.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Add VLM-boosted PDF parser if VLM is set.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix: In the Agent's workflow, the input content cannot be wrapped, and
\n will not work, otherwise an error will be reported #6241
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
When using LLM for auto-tag, if there are no examples, the tag format
generated by LLM may be wrong. This will cause Elasticsearch insert
errors. Adding basic examples can avoid this problem.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Alter TreeView component #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Add history version save
- Allows users to view and download agent files by version revision
history

_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
Blank and createFromNothing were not read from the i18n file when Agent
was created
创建Agent的时候 Blank 和 createFromNothing 没从i18n文件中读取
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix#5719
Add data type validation for parser_config
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Alter TransferList props #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Add vision LLM PDF parser
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
Add support for non-stream response with session.ask_without_stream and
fix a typo mistake in python API doc
There are requirements for non-stream response, especially for commands
exection, e.g. text2SQL. The commands have to be completed before the
agent is triggered.
### What problem does this PR solve?
It's to fix the [Issue:
6206](https://github.com/infiniflow/ragflow/issues/6206)
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
---------
Co-authored-by: Howard WU <yuanhao.wu@ifudata.com>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
Feat: Add TreeView component #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
For the create_inputs method based on np operation to replace for loop
### Type of change
- [x] Performance Improvement
### What problem does this PR solve?
This pull request (PR) incorporates codes for parsing PPTX files, aiming
to more precisely depict text in list formats (hint list by .).
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [x] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### What problem does this PR solve?
Fix: Fixed the issue that events cannot be triggered after the shadcn-ui
dialog is closed#3221.
Refer to [Combobox in a form in a dialog isn't working.
#1748](https://github.com/shadcn-ui/ui/issues/1748#issuecomment-2720130543)
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Regards kb_id at ElasticSearch insert, update, delete. Close#6066
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Resolve document concurrent upload issue. #6039
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix: Knowledge base page cannot upload folders #6062
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix: Prevent password boxes other than login passwords from displaying
passwords saved in the browser's password manager by default. #6033
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Change “Document parser” to "PDF parser" #6072
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
AWS Bedrock has made deepseek-r1 available on its serverless inference.
This adds the R1 serverless model for use via the bedrock model
abilities.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
…ions
### What problem does this PR solve?
This PR fixes an issue where the application was repeatedly reading the
llm_factories.json file from disk in multiple places, which could lead
to "Too many open files" errors under high load conditions. The fix
centralizes the file reading operation in the settings.py module and
stores the data in a global variable that can be accessed by other
modules.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [x] Performance Improvement
- [ ] Other (please describe):
### What problem does this PR solve?
- The API documentation lacks detailed error code explanations. Added
error code tables to `python_api_reference.md` and
`http_api_reference.md` to clarify possible error codes and their
meanings.
- Error handling in the codebase is inconsistent. Standardized error
handling logic in `sdk/python/ragflow_sdk/modules/chunk.py`.
- Improved API comments by adding standardized docstrings to enhance
code readability and maintainability.
### Type of change
- [x] Documentation Update
- [x] Refactoring
### What problem does this PR solve?
fix chat_completion answer data incorrect
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Co-authored-by: renqi <renqi08266@fxomail.com>
### What problem does this PR solve?
Optimize OCR garbage identification to reduce unnecessary filtering.
#5713
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Set the default value of Chunk token number to 512 #6016
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Add CSV file parsing support #4552, #5849, #5870
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Alter Item to TransferListItemType #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Why can't Retrieval component support internet web search. #5973
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
…session_id does not exist in the session
For an Agent with an Input Begin value, on the first call the return
session_id does not exist in the session
### What problem does this PR solve?
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
When creating and updating chats, add a check for the parsing status of
knowledge base documents. Ensure that all documents have been parsed
before allowing chat creation to improve user experience and system
stability.
**Main Changes:**
- Add document parsing status check logic in `chat.py`.
- Implement the `is_parsed_done` method in `knowledgebase_service.py`.
- Prevent chat creation when documents are being parsed or parsing has
failed.
### What problem does this PR solve?
fix this bug:https://github.com/infiniflow/ragflow/issues/5960
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Co-authored-by: wenju.li <wenju.li@deepctr.cn>
This commit refactors the deep research module (deep_research.py), with
the following major improvements: The complex thinking and retrieval
logic has been broken down into multiple independent private methods,
enhancing code readability and maintainability. Static methods and class
methods have been introduced to simplify the logic for tag processing.
The search and reasoning processes have been optimized, increasing the
modularity of the code. The flexibility of information retrieval and
processing has been improved. The refactored code structure is now
clearer, making it easier to understand and extend the functionality of
the deep research module.
### What problem does this PR solve?
increase the modularity of the code
### Type of change
- [x] Refactoring
Co-authored-by: wenju.li <wenju.li@deepctr.cn>
### What problem does this PR solve?
Fixes#5923
Fixes the readonly variables from payload at
/datasets/<dataset_id>
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._
Now if user tries to modify readonly values then it will show " The
input parameters are invalid. "
invalid_keys = {"id", "embd_id", "chunk_num", "doc_num", "parser_id",
"create_date", "create_time", "created_by",
"status","token_num","update_date","update_time"}
if any(key in req for key in invalid_keys):
return get_error_data_result(message="The input parameters are
invalid.")
i have include those readonly keys in invalid_keys
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: Raghav <2020csb1115@iitrpr.ac.in>
### What problem does this PR solve?
Fix:signal.SIGUSR1 and signal.SIGUSR2 can't use in window. so don't bind
signal.SIGUSR1 and signal.SIGUSR2 in the windows env
### Type of change
- [✓ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
Co-authored-by: tangyu <1@1.com>
### What problem does this PR solve?
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Co-authored-by: renqi <renqi08266@fxomail.com>
### What problem does this PR solve?
Feat: Add Breadcrumb component #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Add Support for german language
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fixed #5839
This PR fix error code 102, stating dataset_ids is required.
curl --request POST \
--url http://{address}/api/v1/chats \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--data '{
"name": "test_chat"
}'
this is not getting datasetids , fix for it.
file location : sdk\python\ragflow_sdk\ragflow.py
added : "dataset_ids": dataset_list if dataset_list else [],
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Co-authored-by: Raghav <2020csb1115@iitrpr.ac.in>
### What problem does this PR solve?
Feat: Add AvatarGroup component. #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
**generate.py 更新:**
问题:部分模型提供商对输入对话内容的格式有严格校验,要求第一条内容的 role 不能为 assistant,否则会报错。
解决:删除了系统设置的 agent 开场白,确保传递给模型的对话内容中,第一条内容的 role 不为 assistant。
**retrieval.py 更新:**
问题:当前知识库检索使用全部对话内容作为输入,可能导致检索结果不准确。
解决:改为仅使用用户最后提出的一个问题进行知识库检索,提高检索的准确性。
**Update generate.py:**
Issue: Some model providers have strict validation rules for the format
of input conversation content, requiring that the role of the first
content must not be assistant. Otherwise, an error will occur.
Solution: Removed the system-set agent opening statement to ensure that
the role of the first content in the conversation passed to the model is
not assistant.
**Update retrieval.py:**
Issue: The current knowledge base retrieval uses the entire conversation
content as input, which may lead to inaccurate retrieval results.
Solution: Changed the retrieval logic to use only the last question
asked by the user for knowledge base retrieval, improving retrieval
accuracy.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Performance Improvement
When accessing the /api/v1/agents/{agent_id}/completions API, sessions
created before agent modifications retain the old DSL data. To use the
latest agent configuration (like new prompts) in historical sessions, I
added the sync_dsl parameter. It defaults to False to maintain existing
behavior and only synchronizes when set to True. If needed, a manual
synchronization API can be created to trigger the sync explicitly.
### What problem does this PR solve?
Fix: keyword compont display issue #5794
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: When selecting a reordering model, give a prompt that it takes too
long. #5834
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Add TransferList component. #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
The `/api/v1/chats` API endpoint was broken, any GET request got the
following response:
```
{"code":100,"data":null,"message":"TypeError(\"'int' object is not callable\")"}
```
With this log ragflow-server side:
```
2025-03-07 14:36:26,297 ERROR 20 'int' object is not callable
Traceback (most recent call last):
File "/ragflow/.venv/lib/python3.10/site-packages/flask/app.py", line 880, in full_dispatch_request
rv = self.dispatch_request()
File "/ragflow/.venv/lib/python3.10/site-packages/flask/app.py", line 865, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args) # type: ignore[no-any-return]
File "/ragflow/api/utils/api_utils.py", line 303, in decorated_function
return func(*args, **kwargs)
File "/ragflow/api/apps/sdk/chat.py", line 323, in list_chat
logging.WARN(f"Don't exist the kb {kb_id}")
TypeError: 'int' object is not callable
2025-03-07 14:36:26,298 INFO 20 172.18.0.6 - - [07/Mar/2025 14:36:26] "GET /api/v1/chats HTTP/1.1" 200 -
```
This was caused by the incorrect use of `logging.WARN` as a method (it's
a loglevel object), instead of the correct `logging.warning()` method.
This PR fixes that, and also rewrites the message to be grammaticaly
correct.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
vLLM provider with a reranking model does not work : as vLLM uses under
the hood the [CoHereRerank
provider](https://github.com/infiniflow/ragflow/blob/v0.17.0/rag/llm/__init__.py#L250)
with a `base_url`, if this URL [is not passed to the Cohere
client](https://github.com/infiniflow/ragflow/blob/v0.17.0/rag/llm/rerank_model.py#L379-L382)
any attempt will endup on the Cohere SaaS (sending your private api key
in the process) instead of your vLLM instance.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### What problem does this PR solve?
fix:when start with source code not in docker env report
"UnicodeDecodeError: 'gbk' codec can't decode byte 0xad in position 5:
illegal multibyte sequence" in windows
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Co-authored-by: tangyu <1@1.com>
### What problem does this PR solve?
Fixed the issue of "stop deleting when encountering invalid dataset ID"
#5760
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
Enhance the recognition of both borderless and bordered Markdown tables.
Add support for extracting HTML tables, including various scenarios with
nested HTML tags. Improve performance by using conditional checks to
reduce unnecessary regular expression matching.
### What problem does this PR solve?
Optimize the table extraction logic in the Markdown parser:
Enhance the recognition of both borderless and bordered Markdown tables.
Add support for extracting HTML tables, including various scenarios with
nested HTML tags.
Improve performance by using conditional checks to reduce unnecessary
regular expression matching.
### Type of change
- [x] Performance Improvement
Co-authored-by: wenju.li <wenju.li@deepctr.cn>
1. **Issue**: When calling `list_agent_session` via the HTTP API, users
may only need to display conversation messages, and do not want to see
the associated dsl, which can be very large. Therefore, consider adding
a control option to determine whether the DSL should be returned, with
the default being to return it.
2. **Documentation Discrepancy**: In the HTTP API documentation, under
"List agent sessions," the "Response" section states that the "data"
field is a dictionary when "success" is returned. However, the actual
returned data is a list. This discrepancy has been corrected.
### What problem does this PR solve?
Fix: Fixed the issue that files cannot be uploaded on the file
management page. #5730
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
The `dialog_id` field was inconsistently defined:
- In the `migrate_db()` function, it was set to `null=True`.
- In the model class, it was defined as `null=False`.
This inconsistency caused an issue during the initial deployment where
the database table did not allow `dialog_id` to be null. As a result,
calling `APITokenService.save(**obj)` in `system_app.py` raised the
following error:
```
peewee.IntegrityError: null value in column "dialog_id" violates not-null constraint
```
### What problem does this PR solve?
Error: peewee.IntegrityError: null value in column "dialog_id" violates
not-null constraint
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._
close#5730
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
### What problem does this PR solve?
Fix: Remove the document language parameter. #5686
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix: Remove the max token parameter. #5640#5646
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Add rerank option to huggingface's model type drop-down box. #5658
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Use react-hook-form to synchronize the data of the categorize form
to the agent node. #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: The parsing method is paper and needs to display Document parser.
#5467
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Refactored DocumentService.update_progress
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
The `ocr.res` file is already included in the model directory
`rag/res/deepdoc`, but it doesn't seem to be utilized here.
### Type of change
- [x] Documentation Update
### What problem does this PR solve?
close issue #5600
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
### What problem does this PR solve?
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._
### Type of change
- [x] Performance Improvement
---------
Co-authored-by: wangwei <dwxiayi@163.com>
### What problem does this PR solve?
Fix the issue where, when getting a user's APIToken, if the user is part
of another user's team, it incorrectly gets the Team owner's APIToken
instead.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Render DynamicCategorize with shadcn-ui. #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Render MessageForm with shadcn-ui. #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
As title export PYTHONPATH in the shell
### Type of change
- [x] Refactoring
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
### What problem does this PR solve?
Introduced jemalloc.
Python uses pymalloc (which is an reimplementation of gblibc malloc) to
manage RES. It has pools for small objects to avoid returning memory to
OS aggressively. My experience is: Replacing pymalloc with
[jemalloc](https://github.com/jemalloc/jemalloc) can reduce RES and
speedup task_executor.py.
### Type of change
- [x] Performance Improvement
### What problem does this PR solve?
Fix may lose part of information of last stream chunck
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Adds a new Kubernetes Service resource to the Helm chart which
specifically targets the RAGFlow API. This feature useful for cases
where you want to expose the RAGFlow HTTP API separately from the web
interface, for example if RAGFlow is running behind an authenticating
proxy it allows a route to bypass the proxy (e.g. by defining a separate
ingress resource which forwards to the separate API-only k8s service
added here) to provide RAGFlow API access. This is still secure since
API access is already authenticated by API keys inside RAGFlow itself.
### Type of change
- [X] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix: LLM with ___ return cannot be deleted #5585
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Render WikipediaForm and BaiduForm with shadcn-ui. #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
the api doc is too long, add a toc might be better

### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### What problem does this PR solve?
Feat: Render QWeatherForm with shadcn-ui. #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Add sessions deletion support for agent in http and python api
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [x] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
Feat: Render RewriteQuestionForm with shadcn-ui #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Combine Select and LlmSettingFieldItems into LLMSelect. #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Add NextLLMSelect with shadcn-ui. #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
This pull request aims to fix a bug that prevents certain email
addresses from signing up. The affected TLDs were returning 'invalid
email address' errors:
.museum
.software
.photography
.technology
.marketing
.education
.international
.community
.construction
.government
.consulting
....
### Type of change
- [X] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
close#5277 by make sure the file close
### Type of change
- [x] Performance Improvement
---------
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
### What problem does this PR solve?
Feat: Add the Experimental text to the option of the large model of the
Image2text type of LayoutRecognizeItem
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Hide the suffix of the large model name. #5433
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Wrap MaxTokenNumber with DatasetConfigurationContainer. #5467
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Put the configuration of different parsing methods into separate
components. #5467
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Documentation Update
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
when develop ragflow local there would be a hash file generate that is
kind of not good for develop
this patch add a regex to `.gitignore` for better developing
### Type of change
- [x] Refactoring
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
### What problem does this PR solve?
Fix lots of typos.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Enhance aliyun oss access with adding prefix path.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
This patch add signal for ctrl + c that can exit the code friendly
cause code base use thread daemon can not exit friendly for being
started.
how to reproduce
1. docker-compose -f docker/docker-compose-base.yml up
2. other window `bash docker/launch_backend_service.sh`
3. stop 1 first
4. try to stop 2 then two thread can not exit which must use `kill pid`
This patch fix it
and should fix most the related issues in the `issues`
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
### What problem does this PR solve?
This patch drop useless fastext which is seems useless in the code base
and its very kind of hard install
should close#4498
### Type of change
- [x] Refactoring
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
### What problem does this PR solve?
This patch fix most of the issues like #4853#5038 and so on
the root reason is that we need to add the hostname to the `/etc/hosts`
which is not wrote in main README
and the code side read `conf/service_conf.yaml` as settings
and its hard for developers to debug, this patch fix it, or maybe can
discuss better solution here
### Type of change
- [x] Refactoring
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
### What problem does this PR solve?
Feat: Modify the parsing method string to an enumeration type. #5467
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: If the user is not logged in, jump to the login page by
refreshing.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix special delimiter parsing issue #5382
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Use shadcn-ui to build GenerateForm. #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
```
Invoke agent
To be able to interact dynamically with the API, there is a customizable Data Type JSON or FormData, the default is JSON
```
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
Feat: Wrap DynamicVariableForm with Collapsible. #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
seems no need use ABC here, there's no `abstractmethod` here
### Type of change
- [x] Performance Improvement
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
### What problem does this PR solve?
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._
This patch remove dangerous code that `may expand into
attacker-controllable code`
more:
```cli
error[template-injection]: code injection via template expansion
--> /Users/hyi/prs/ragflow/.github/workflows/tests.yml:35:9
|
35 | - name: Show PR labels
| ^^^^^^^^^^^^^^^^^^^^ this step
36 | run: |
| _________^
37 | | echo "Workflow triggered by ${{ github.event_name }}"
38 | | if [[ ${{ github.event_name }} == 'pull_request' ]]; then
39 | | echo "PR labels: ${{ join(github.event.pull_request.labels.*.name, ', ') }}"
40 | | fi
| |____________^ github.event.pull_request.labels.*.name may expand into attacker-controllable code
|
= note: audit confidence → High
```
using zizmor to check
https://woodruffw.github.io/zizmor/
but this patch do not fix them all, just remove high audit confidence →
High
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [x] Other (please describe):
---------
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
Fixed OpenAI compatibility stream [DONE]
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
Feat: Add DualRangeSlider #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Add OpenAI-compatible http and python api reference
### Type of change
- [x] Documentation Update
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
### What problem does this PR solve?
Run keyword_extraction, question_proposal, content_tagging in threads
### Type of change
- [x] Performance Improvement
### What problem does this PR solve?
Added OpenAI-like completion api, related to #4672, #4705
This function allows users to interact with a model to get responses
based on a series of messages.
If `stream` is set to True, the response will be streamed in chunks,
mimicking the OpenAI-style API.
#### Example usage:
```bash
curl -X POST https://ragflow_address.com/api/v1/chats_openai/<chat_id>/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $RAGFLOW_API_KEY" \
-d '{
"model": "model",
"messages": [{"role": "user", "content": "Say this is a test!"}],
"stream": true
}'
```
Alternatively, you can use Python's `OpenAI` client:
```python
from openai import OpenAI
model = "model"
client = OpenAI(api_key="ragflow-api-key", base_url=f"http://ragflow_address/api/v1/chats_openai/<chat_id>")
completion = client.chat.completions.create(
model=model,
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Who you are?"},
{"role": "assistant", "content": "I am an AI assistant named..."},
{"role": "user", "content": "Can you tell me how to install neovim"},
],
stream=True
)
stream = True
if stream:
for chunk in completion:
print(chunk)
else:
print(completion.choices[0].message.content)
```
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### Related Issues
Related to #4672, #4705
### What problem does this PR solve?
As issue #3268 mentioned, "Chun not found!" exception will occur,
especially during the teamwork of knowledge bases.
### The reason of this bug
"tenants" are the people on current_user's team, including the team
owner itself. The old one only checks the first "tenant", tenants[0],
which will cause error when anyone editing the chunk that is not in
tenants[0]'s knowledge base.
My modification won't introduce new errors while iterate all the tenant
then retrieve knowledge bases of each.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Add Tavily Api Key to chat configuration modal. #5198
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix: Due to the reference to tailwindcss, the height attribute setting
of the image is invalid, resulting in an uneven model list #5339
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Add DynamicVariableForm with shadcn-ui. #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
```
fixed type-script on MessageInput change to TextArea
```
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Add FormDrawer to agent page. #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Replaced pypi.tuna.tsinghua.edu.cn with mirrors.aliyun.com/pypi.
I notice aliyun.com sometimes is much faster than tsinghua.edu.
### Type of change
- [x] Refactoring
### What problem does this PR solve?
Feat: Render agent details #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Render operator menu by category. #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Disable Max_token by default #5283
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Add PageHeader to DatasetWrapper #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Partitions the upload of documents in parts of 20 to avoid the size
limit error. Allows uploading 100s of documents on a single interaction.
### Type of change
- [X] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
This pull request includes changes to the initialization logic of the
`ChatModel` and `EmbeddingModel` classes to enhance the handling of AWS
credentials.
Use cases:
- Use env variables for credentials instead of managing them on the DB
- Easy connection when deploying on an AWS machine
### Type of change
- [X] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
This pull request includes changes to the `api/settings.py` and
`docker/service_conf.yaml.template` files to add support for default
models in the LLM configuration (specially for LIGHTEN builds). The most
important changes include adding default model configurations and
updating the initialization settings to use these defaults.
For example:
With this configuration Bedrock will be enable by default with claude
and titan embeddings.
```
user_default_llm:
factory: 'Bedrock'
api_key: '{}'
base_url: ''
default_models:
chat_model: 'anthropic.claude-3-5-sonnet-20240620-v1:0'
embedding_model: 'amazon.titan-embed-text-v2:0'
rerank_model: ''
asr_model: ''
image2text_model: ''
```
### Type of change
- [X] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
This PR supports downloading models from ModelScope. The main
modifications are as follows:
-New Feature (non-breaking change which adds functionality)
-Documentation Update
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
Feat: Add RAGFlowSelect component #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
add non-stream mode support to session.ask function
### What problem does this PR solve?
same as title, I do not know why the stream=False is not work on the
server side also, when stream=False, the response in the
session._ask_chat is a fully connnected SSE string.
This is a quick fix on the sdk side to make the response format align
with API docs
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Add AgentTemplates component. #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Optimized Recognizer.sort_X_firstly and Recognizer.sort_Y_firstly
### Type of change
- [x] Performance Improvement
### What problem does this PR solve?
Feat: Add SearchPage component. #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Add reasoning item to chat configuration modal #5173
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
Modify embedding model ID comparison to remove vendor suffixes, ensuring
consistent model identification when working with multiple knowledge
bases. This change affects dialog creation, chat operations, and
document retrieval test functions.
### What problem does this PR solve?
resolve this bug: https://github.com/infiniflow/ragflow/issues/5166
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: wenju.li <wenju.li@deepctr.cn>
### What problem does this PR solve?
Feat: Show formulas when answering, show reference labels in style,
remove cursor flashing effect. #5173
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Right now we cannot embed a chat in website when it has variables in the
begin component.
This PR tries to read the variables values from the query string via a
data_ prefixed variable.
#5016
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: gstrat88 <gstrat@innews.gr>
### What problem does this PR solve?
just resolve issue: [Improve message input handling with Shift+Enter
support](https://github.com/infiniflow/ragflow/issues/5116)
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: wenju.li <wenju.li@deepctr.cn>
### What problem does this PR solve?
Feat: Add insert variable icon in the header of prompt editor. #4764
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Support preview of HTML files #5096
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Allow the Rewrite operator to connect to the Generate operator
#1739
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
add support for update document meta data through api
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
Co-authored-by: wenju.li <wenju.li@deepctr.cn>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
…o that all services (including the es and infinity containers) can be
started correctly, and resolve the Failed to resolve 'es01' #4875
### What problem does this PR solve?
https://github.com/infiniflow/ragflow/issues/4875
### Type of change
- [x] Documentation Update
### What problem does this PR solve?
API options like `stream` was ignored when no session_id was provided.
This PR fixes the issue.
Test command and expected result:
```
curl --request POST \
--url http://:9222/api/v1/chats/2f2e1d30ee6111efafe211749b004925/completions \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer ragflow-xxx' \
--data '{
"question":"Who are you",
"stream":false
}'
{"code":0,"data":"data:{\"code\": 0, \"message\": \"\", \"data\": {\"answer\": \"Hi! I'm your assistant, what can I do for you?\", \"reference\": {}, \"audio_binary\": null, \"id\": null, \"session_id\": \"82ceb0fcee7111efafe211749b004925\"}}\n\n"}
```
### Type of change
- [*] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
This PR fixes an AttributeError in the all_tags method of the Dealer
class. Previously, the method incorrectly called
self.docStoreConn.indexExist instead of self.dataStore.indexExist. Since
self.docStoreConn was never set (and self.dataStore is already
initialized in init), this resulted in an error when attempting to check
if the index exists. This change ensures that the proper connector is
used for the index existence check, thereby resolving the issue._
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Write the thinking style in the MarkdownContent layer #4930
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Invoke component can be used to call third party services.
Tried GET/POST/PUT from web UI, and found PUT request failed like this:
(test api: api/v1/chats/<assistant_id>)
```
{"code":100,"data":null,"message":"AttributeError("'NoneType' object has
no attribute 'get'")"}
```
Root cause: Invoke PUT with a 'data=args' parameter, which is a form-encoded data, however the default content type setting of request header is application/json. The test api could not deal with such case.
Fix: use the 'json' parameter of reqeusts.put(), same as Invoke POST. Do not use the 'data' parameter.
Another way is to use 'data=json.dumps(args)'.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Add ChunkedResultPanel #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix: Chunk problem tag content cannot be displayed completely. #5076
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Extract the common parts of groupImage2TextOptions and
groupOptionsByModelType #5063
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Add LanguageAbbreviation to simplify language resource files.
#5065
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix: The max tokens defined by the tenant are not used (#4297) (#2817)
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
The current design is not well-suited for multimodal models, as each
model can only be configured for a single purpose—either chat or
Img2txt. To work around this limitation, we use model aliases such as
gpt-4o-mini and gpt-4o-mini-2024-07-18.
To fix this, this PR allows specifying the Img2txt model by tag instead
of model_type.
### Type of change
- [x] Refactoring
### What problem does this PR solve?
Update the agent session API "POST /api/v1/agents/{agent_id}/sessions",
to support uploading files while create a new session:
- currently, the API only supports requesting with a json body. If user
wants to upload a doc or image when create session, like what is already
supported on the web client, we need to update the API.
- if upload an image, ragflow will call image2text, and a user_id is
needed for the image2text model. So we need to send user_id in the API
request. As form-data is needed to upload files, not json body, seems we
need to put the user_id in the url as an optional parameter (currently
user_id is an optional in json body).
### Type of change
- [x] Documentation Update
- [x] Other (please describe):
### What problem does this PR solve?
Feat: Replace next-login-bg.svg #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix: Cannot distinguish between export and import icons #5025
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Add a LLM provider: PPIO
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
### What problem does this PR solve?
Fix knowledge graph node not found (#4968)
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Add background color to GraphRag configuration #4980
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Add ParsedPageCard component #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Jump from the chunk page to the dataset page #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Add an id to the dataset testing route #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### 🛠 Fixes `KeyError: 'content'` when using `stream=False`
#### 🔍 Problem
When calling the chat API with `stream=False`, the code attempts to
access `msg[-1]["content"]` without verifying if the key exists. This
causes a `KeyError` when the message structure does not contain
`"content"`.
This issue was discussed in
[#4885](https://github.com/infiniflow/ragflow/issues/4885), where we
analyzed the root cause. The error does not occur with `stream=True`, as
the response is processed differently.
#### ✅ Solution
- **Logging Fix:**
- Before accessing `msg[-1]["content"]`, we check if the key exists.
- If it does not exist, a default value (`"[content not available]"`) is
used to prevent errors.
- **Structural Fix in `msg` Construction:**
- Ensured that every message in `msg` contains the `"content"` key, even
if empty.
- This fixes the issue at its root and ensures consistent behavior
between `stream=True` and `stream=False`.
#### 🔄 Impact
- Prevents the `KeyError` without affecting normal application flow.
- Ensures the integrity of the `msg` structure, avoiding future
failures.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Bind data to datasets page #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Display Think for Deepseek R1 model #4903
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Add ChatInput component #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix: After deleting all conversation lists, the chat input box can still
be used for input. #4907
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Add LlmSettingFieldItems component #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
Use `json.loads()` instead.
### What problem does this PR solve?
Using `eval()` can lead to code injections. I think this loads a JSON
field, right? If yes, why is this done via `eval()` and not
`json.loads()`?
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Use `np.float32()` instead.
### What problem does this PR solve?
Using `eval()` can lead to code injections.
I think `eval()` is only used to parse a floating point number here.
This change preserves the correct behavior if the string `"None"` is
supplied. But if that behavior isn't intended then this part could be
just deleted instead, since `np.float32()` is parsing strings anyway:
```Python
if isinstance(scale, str):
scale = eval(scale)
```
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
if *.xls file is too large, .eg >50M, I get error.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix: Knowledge base page crashes when network connection is lost. #4894
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Add hatPromptEngine component #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Add ChatBasicSetting component #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix categorize agent input content not format error
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Co-authored-by: wangrui <wangrui@haima.me>
### What problem does this PR solve?
Feat: Add Sessions component #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Modify the Preset configurations item style to distinguish it from
other fields #4844
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix: Fail to open console with Firefox #4816
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Remove begin's width from agent templates #4764
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Fixed the issue where the prompt always displayed the initial
value when switching between different generate operators #4764
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Add VariablePickerMenuPlugin to select variables in the prompt
text box by menu #4764
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix: The requested interface timeout will cause the page to crash #4787
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
…e error " message
### What problem does this PR solve?
optimize TenantLLMService.increase_usage Performance
### Type of change
- [x] Performance Improvement
Co-authored-by: che_shuai <che_shuai@massclouds.com>
### What problem does this PR solve?
Feat: Supports docx in the MANUAL chunk method and docx, markdown, and
PDF in the Q&A chunk method #3957
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix: New user can't accept invite without configuring LLM API #4379
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix Portuguese (Brazil) translation
Adding portuguese to Knowledge adding settings.
### Type of change
- [X] Bug Fix (non-breaking change which fixes an issue)
- [X] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
### What problem does this PR solve?
Fix: Chat Assistant page goes blank #4566
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fixed GPU detection on CPU only environment. Close#4692
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
ERROR: 'Stream' object has no attribute 'iter_lines' with reference to
Claude/Anthropic chat streams
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Co-authored-by: Kyle Olmstead <k.olmstead@offensive-security.com>
### What problem does this PR solve?
https://github.com/infiniflow/ragflow/issues/4319
This pull request includes several changes to improve the Docker setup
and documentation for the project. The most important changes include
updating the Dockerfile to support modern versions of Rust, adding a new
Docker Compose configuration for macOS, and updating the build
instructions in the documentation.
Improvements to Docker setup:
*
[`Dockerfile`](diffhunk://#diff-dd2c0eb6ea5cfc6c4bd4eac30934e2d5746747af48fef6da689e85b752f39557L80-R107):
Added installation steps for a modern version of Rust and updated the
logic for installing the correct ODBC driver based on the architecture.
*
[`docker/docker-compose-macos.yml`](diffhunk://#diff-8e8587143bb2442c02f6dff4caa217ebbe3ba4ec8e7c23b2e568886a67b00eafR1-R56):
Added a new Docker Compose configuration file specifically for macOS,
including service dependencies, environment variables, and volume
mappings.
Updates to documentation:
*
[`docs/guides/develop/build_docker_image.mdx`](diffhunk://#diff-d6136bb897f7245aae33b0accbcf7c508ceaef005c545f9f09cad3cada840a19L44-R44):
Updated the build instructions to use the new Docker Compose
configuration for macOS instead of the previous Docker build command.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Documentation Update
---------
Signed-off-by: Samuel Giffard <samuel.giffard@mytomorrows.com>
### What problem does this PR solve?
Fix typos in the documents
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
### What problem does this PR solve?
Rename from 'Diagonal' to 'Forward slash'
Rename from 'Minus' to 'Dash'
issue: #4655
### Type of change
- [x] Refactoring
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Fixed vertical alignment issues between icons and text in API-Key and
System Model Settings buttons. This improves visual consistency across
the settings interface.
### Type of change
- [x] Refactoring
Before: Icons and text were slightly misaligned vertically
<img width="635" alt="Screenshot 2025-01-23 at 20 22 46"
src="https://github.com/user-attachments/assets/28f15637-d3fd-45a2-aae8-ca72fb12a88e"
/>
After: Icons and text are now properly centered with consistent spacing
<img width="540" alt="Screenshot 2025-01-23 at 20 23 02"
src="https://github.com/user-attachments/assets/98bb0ca5-6995-42d8-bd23-8a8f44ec0209"
/>
### What problem does this PR solve?
Removed onnxruntime. It conflicts with the onnxruntime-gpu.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Set the style of the header tag #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Template conversion adds Jinjia2 syntax support
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: wangrui <wangrui@haima.me>
Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
Feat: Add keyword item to AssistantSetting #4543
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix: Capture the problem that the knowledge graph interface returns null
and causes page errors #4543
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Display the knowledge graph on the knowledge base page #4543
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
Remove usage of `eval()` from postprocess.py
### What problem does this PR solve?
The use of `eval()` is a potential security risk. While the use of
`eval()` is guarded and thus not a security risk normally, `assert`s
aren't run if `-O` or `-OO` is passed to the interpreter, and as such
then the guard would not apply. In any case there is no reason to use
`eval()` here at all.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Other (please describe):
Potential security fix if somehow the passed `modul_name` could be user
controlled.
### What problem does this PR solve?
Latest Release button on Portugese Read me is not looking like it shuld
### Type of change
- [X] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Support for Portuguese language #4557
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Add language Portugese from Brazil
### Type of change
- [X] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Make the scroll bar of the DatasetSettings page appear inside
#3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Rename document name #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
`eval(op_type)` -> `getattr(operators, op_type)`
### What problem does this PR solve?
Using `eval()` can lead to code injections and is entirely unnecessary
here.
### Type of change
- [x] Other (please describe):
Best practice code improvement, preventing the possibility of code
injection.
`eval(op_name)` -> `getattr(operators, op_name)`
### What problem does this PR solve?
Using `eval()` can lead to code injections and is entirely unnecessary
here.
### Type of change
- [x] Other (please describe):
Best practice code improvement, preventing the possibility of code
injection.
### What problem does this PR solve?
Fix: Translate the operator options of the Switch operator #1739
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._
### Type of change
- [x] Other (please describe):
### What problem does this PR solve?
Change index url per NEED_MIRROR. Close#4507
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Make the category operator form displayed in collapsed mode by
default #4505
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
[remarkjs/react-markdown/issues/785](https://github.com/remarkjs/react-markdown/issues/785)
Fix: Fixed an issue where math formulas could not be displayed correctly
#4405
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Add the MessageHistoryWindowSizeItem to RewriteQuestionForm #1739
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Add LinkToDatasetDialog #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Bump infinity to v0.6.0-dev2. Close#4477
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Add GPUStack as a new model provider.
[GPUStack](https://github.com/gpustack/gpustack) is an open-source GPU
cluster manager for running LLMs. Currently, locally deployed models in
GPUStack cannot integrate well with RAGFlow. GPUStack provides both
OpenAI compatible APIs (Models / Chat Completions / Embeddings /
Speech2Text / TTS) and other APIs like Rerank. We would like to use
GPUStack as a model provider in ragflow.
[GPUStack Docs](https://docs.gpustack.ai/latest/quickstart/)
Related issue: https://github.com/infiniflow/ragflow/issues/4064.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### Testing Instructions
1. Install GPUStack and deploy the `llama-3.2-1b-instruct` llm, `bge-m3`
text embedding model, `bge-reranker-v2-m3` rerank model,
`faster-whisper-medium` Speech-to-Text model, `cosyvoice-300m-sft` in
GPUStack.
2. Add provider in ragflow settings.
3. Testing in ragflow.
### What problem does this PR solve?
Update description
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Documentation Update
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Feat: Add background to next login page #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Metadata in documents for improve the prompt #3690
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix: Can not select GPT-4o / 4o mini as Chat Model #4421
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix: In order to distinguish the keys of a pair of messages, add a
prefix to the id when rendering the message. #4409
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Display tag word cloud on recall test page #4368
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
duckduckgo search 6.3.0 still has error sometimes, need to update to
7.2.0, after updated, it works ok.
this PR is going to fix this issue
https://github.com/infiniflow/ragflow/issues/4396
### Type of change
Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: xiaohzho <xiaohzho@cisco.com>
### What problem does this PR solve?
Fix: Modify the text of the category operator form #4412
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Add TagFeatureItem #4368
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
1. Update error message
2. Remove space characters
### Type of change
- [x] Refactoring
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Feat: Add tag_kwd parameter to chunk configuration modal #4368
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Add description for tag parsing method #4368
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
fix bug, agent invoke can not get params from begin
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Co-authored-by: wangrui <wangrui@haima.me>
### What problem does this PR solve?
Some old types of machine or virtual machine doesn't support AVX CPU
flag. This PR is to use lts polars module to avoid this fault.
fix issue: #4349
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
Signed-off-by: jinhai <haijin.chn@gmail.com>
### What problem does this PR solve?
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._
### Type of change
- [x] Documentation Update
### What problem does this PR solve?
- Bring `STORAGE_IMPL` back in `rag/svr/cache_file_svr.py`
- Simplify storage connection when working with AWS S3
### Type of change
- [x] Refactoring
### What problem does this PR solve?
Feat: Add ConfirmDeleteDialog #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix potential SSRF attack vulnerability
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Co-authored-by: liuhua <10215101452@stu.ecun.edu.cn>
### What problem does this PR solve?
Feat: Add model id to ExeSql operator form. #1739
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix agent_completion bug #4320
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: liuhua <10215101452@stu.ecun.edu.cn>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
Feat: Add FileUploadDialog #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Update exesql component for agent
### Type of change
- [x] Refactoring
---------
Co-authored-by: liuhua <10215101452@stu.ecun.edu.cn>
### What problem does this PR solve?
Feat: Add DatasetCreatingDialog #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Update displayed_name to display_name
### Type of change
- [x] Refactoring
Co-authored-by: liuhua <10215101452@stu.ecun.edu.cn>
### What problem does this PR solve?
Fix: Fixed the issue that the graph could not display the grouping #4180
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Update error message for agent name conflict
### Type of change
- [x] Refactoring
Co-authored-by: liuhua <10215101452@stu.ecun.edu.cn>
### What problem does this PR solve?
OpenAI has deprecated the gpt-4-vision-preview model. This PR adds
support for the newer gpt-4o and gpt-4o-mini models in the img2text
feature.

This PR add addtional 4o/4o-mini entry for img2text besides original
ones. Utilized [alias](https://platform.openai.com/docs/models#gpt-4o)
model names (e.g., gpt-4o-2024-08-06) because the database schema uses
the model name as the primary key.
- [x] Other (please describe): model update
### What problem does this PR solve?
Add top_k for create_chat and update_chat api #4157
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: liuhua <10215101452@stu.ecun.edu.cn>
### What problem does this PR solve?
Fix bugs in chunk api #4149
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: liuhua <10215101452@stu.ecun.edu.cn>
### What problem does this PR solve?
Fix: After executing npm i --force locally, the login page cannot be
opened #4290
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: The Begin and IterationStart operators cannot be deleted using
shortcut keys #4287
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix the bug in create_dataset function #4136
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: liuhua <10215101452@stu.ecun.edu.cn>
### What problem does this PR solve?
Feat: Translate the system prompt of the generate operator #3993
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix BaiduFanyi TestRun parameter validation and debug method missing
errors
![Uploading Snipaste_2024-12-27_19-56-31.png…]()
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: wangrui <wangrui@haima.me>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
Fix some bugs in text2sql.(#4279)(#4281)
### What problem does this PR solve?
- The incorrect results in parsing CSV files of the QA knowledge base in
the text2sql scenario. Process CSV files using the csv library. Decouple
CSV parsing from TXT parsing
- Most llm return results in markdown format ```sql query ```, Fix
execution error caused by LLM output SQLmarkdown format.### Type of
change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
- Create a text2sql agent document: Receipe, procedure, debug (
including step run), run.
### Type of change
- [ ] Documentation Update
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
Feat: Delete useless code #4242
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix: The edit box for the headers parameter of the invoke operator is
always loading. #4265
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Limit the iteration start node to only be the source node #4242
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Add the iteration Node #4242
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
1. Refactor error message
2. Fix knowledges are created on ES and can't be found in Infinity. The
document chunk fetch error.
### Type of change
- [x] Fix bug
- [x] Refactoring
---------
Signed-off-by: jinhai <haijin.chn@gmail.com>
### What problem does this PR solve?
Update version info to 0.15.1
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Missing update information
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Signed-off-by: jinhai <haijin.chn@gmail.com>
### What problem does this PR solve?
Fix: Rename chat name, missing field 'avatar' #4125
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
---------
Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
### What problem does this PR solve?
stop rerank by model when search result is empty, otherwise rerank may
raise an error (qwen).
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Co-authored-by: 刘博 <liubo@ynby.cn>
### What problem does this PR solve?
Ignore the millisecond and microsecond value.
### Type of change
- [x] Refactoring
Signed-off-by: jinhai <haijin.chn@gmail.com>
### What problem does this PR solve?
as title
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Signed-off-by: jinhai <haijin.chn@gmail.com>
### What problem does this PR solve?
Fix: Fixed the issue that the page crashed when the node ID was the same
as the combo ID #4180
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
1. Change embedding model of knowledge base won't change the default
embedding model.
2. Retrieval test bug
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Signed-off-by: jinhai <haijin.chn@gmail.com>
### What problem does this PR solve?
Feat: If there is no result in the recall test, an empty data image will
be displayed. #4182
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
As title
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Signed-off-by: jinhai <haijin.chn@gmail.com>
The refactor in 0.15 contains a small bug that eliminate the
classification
### What problem does this PR solve
### Type of change
- [X] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix the issue when retrieving AWS credentials from the S3 configuration
from the settings module instead of getting from the environment
variables.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
Update the component of the agent API with parameters.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring
---------
Co-authored-by: liuhua <10215101452@stu.ecun.edu.cn>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
### What problem does this PR solve?
Refactor ask decorator
### Type of change
- [x] Refactoring
---------
Signed-off-by: jinhai <haijin.chn@gmail.com>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
…ple sheets
### What problem does this PR solve?
discussed in https://github.com/infiniflow/ragflow/pull/4102
- In excel_parser.py, `total` means the total number of rows in Excel,
but it return in the first iterate, that lead to the wrong `to_page`
- In table.py, it when Excel file has multiple sheets, it will be
divided into multiple parts, every part size is 3000, `data` may be
empty, because it has recorded in the last iterate.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Add parameters for ask_chat and fix bugs in list_sessions
#4105
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
Co-authored-by: liuhua <10215101452@stu.ecun.edu.cn>
### What problem does this PR solve?
Fixed infinity exception SCORE() / SCORE_FACTORS() requires Fusion or
MATCH TEXT or MATCH TENSOR. Close#4109
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Translate the previous run into parsing #4094
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Refactor inviting team member message.
### Type of change
- [x] Refactoring
Signed-off-by: jinhai <haijin.chn@gmail.com>
### What problem does this PR solve?
Fix: Fixed the issue where the required information in the input box was
incorrect when inviting users #2834
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
1. Fix initial build and load trie
2. Update comment
### Type of change
- [x] Refactoring
Signed-off-by: jinhai <haijin.chn@gmail.com>
### What problem does this PR solve?
Fix conversation bug in agent session
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: liuhua <10215101452@stu.ecun.edu.cn>
### What problem does this PR solve?
Fix: Fixed the issue with external chat box reporting errors #3909
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
fix chinese warning to update model
### Type of change
- [x] Other (please describe): see the message
---------
Signed-off-by: Hui Peng <benquike@gmail.com>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
Fix: The cursor is lost after entering a character in the operator form
#4072
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Add AdvancedSettingForm #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
For the new feature Add mssql support in the Dockerfile, I suggest
including support for msodbcsql18 for ARM64.
Based on current testing results (on macOS ARM64 environment),
msodbcsql18 needs to be installed.
I hope future Dockerfiles can incorporate a conditional check for this.
Specifically:
When $ARCH=arm64 (macOS ARM64 environment), install msodbcsql18.
In other cases (general x86_64 environment), install msodbcsql17.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
Co-authored-by: Mage Lu <magelu@MagedeMac-mini.local>
show avatar dialog instead of default
- [x] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
Elasticsearch disk-based shard allocator use absolute byte values
instead of ratio. Close#4018
### Type of change
- [ ] Documentation Update
- [x] Refactoring
### What problem does this PR solve?
Feat: Bind event to the theme Switch #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix rerank_model bug in chat and markdown bug
#4000#3992
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: liuhua <10215101452@stu.ecun.edu.cn>
### What problem does this PR solve?
Feat: Modify the link address of the agent id #3909
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Modify the text of the embedded website button #3909
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Set the color of the canvas's control button #3851
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix: Hide the upload button in the external chat box #2242
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix: Every time you switch the page number of a chunk, the PDF document
will be reloaded. #4046
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
The removed if statement is unnecessary and adds cognitive load for
readers.
The original code:
```
vector_similarity_weight = req.get("vector_similarity_weight", 0.3)
if vector_similarity_weight is None:
vector_similarity_weight = 0.3
```
has been simplified to:
```
vector_similarity_weight = req.get("vector_similarity_weight", 0.3)
```
### Type of change
- [x] Refactoring
### What problem does this PR solve?
Fix xinfo_groups returns unexpected result. Close#3545
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Reparse a file shall reuse existing chunks if possible #3793
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Supports to debug single component in Agent. #3993
Fix: The github button on the login page is displayed incorrectly #4002
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix bugs in agent api and update api document
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: liuhua <10215101452@stu.ecun.edu.cn>
### What problem does this PR solve?
Fix hierarchical_merge function. From idx vs. actual value to actual
value vs. actual value.
Related issue #4003
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Co-authored-by: luopan <luopan@example.com>
### What problem does this PR solve?
Fix json file parsing
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Signed-off-by: jinhai <haijin.chn@gmail.com>
some thing
- execsql add connection mssql
- fix bug duckduckgo-search rate limit
- update typo vi res
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Co-authored-by: lizheng@ssc-hn.com <lizheng@ssc-hn.com>
### What problem does this PR solve?
Try to reuse existing chunks. Close#3793
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._
### Type of change
- [x] Documentation Update
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
### What problem does this PR solve?
Fix: Fixed the issue where two consecutive indexes were displayed
incorrectly #3839
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Update api documents
### Type of change
- [x] Documentation Update
Co-authored-by: liuhua <10215101452@stu.ecun.edu.cn>
### What problem does this PR solve?
The initial Helm chart implementation added in #3815 suffers from an
issue where the 5GB data volume for the SQL DB is filled up with
[binlog](https://dev.mysql.com/doc/refman/8.4/en/binary-log.html) files
after just a few days. Since the app uses a non-replicated SQL DB config
I think it makes sense to disable the binlog in the SQL DB container.
This is achieved by simply adding the required argument to the container
startup command.
### Type of change
- [X] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix: Answers with links to information not parsing #3839
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fixed retrieval TypeError: unhashable type: 'list'
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Exclude reference from the data returned by the conversation/get
interface #3909
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Modify the data structure of the chunk in the conversation #3909
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Rename page_num_list, top_list, position_list to page_num_int, top_int,
position_int
### Type of change
- [x] Refactoring
### What problem does this PR solve?
- [x] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
DOC_ENGINE="INFINITY" or "Infinity" or "Elasticsearch" also works
### Type of change
- [x] Refactoring
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
This PR organize chunks in the prompt by document and indicate what is
the name of the document in this way
```
Document: {doc_name} \nContains the following relevant fragments:
chunk1
chunk2
chunk3
Document: {doc_name} \nContains the following relevant fragments:
chunk4
chunk5
```
Maybe can be a baseline to add metadata to the documents.
This allow in my case to improve llm context about the orgin of the
information.
### Type of change
- [X] New Feature (non-breaking change which adds functionality)
Co-authored-by: Miguel <your-noreply-github-email>
api http return error when content is not found
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
Add native translation in locales for Japanese 🇯🇵 to support new local
language
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
Co-authored-by: Hiroshi Kameya <kameya_h@sunflare.co.jp>
### What problem does this PR solve?
Added static check at PR CI
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
### What problem does this PR solve?
Update OC9 docker compose files
issue: #3901
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Signed-off-by: jinhai <haijin.chn@gmail.com>
### What problem does this PR solve?
Refactor UI text
### Type of change
- [x] Documentation Update
- [x] Refactoring
Signed-off-by: jinhai <haijin.chn@gmail.com>
### What problem does this PR solve?
Fix: Fixed the issue that the agent list page failed to load #3827
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix: An error occurred when deleting a new conversation #3898
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Import & export the agents. #3851
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Lots of typo, also need to merge into DEMO
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Feat: Add tooltip to question item of ChunkCreatingModal #3873
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Add tooltip to question item of ChunkCreatingModal #3873
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Adjust the input box width of EditTag to 100% #3873
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Add question parameter to edit chunk modal #3873
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix: The right coordinates of Categorize and Switch operators are
misplaced #3868
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Bind the route to the navigation bar in the head #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Add sdk for list agents and sessions
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
Co-authored-by: liuhua <10215101452@stu.ecun.edu.cn>
### What problem does this PR solve?
Fix: Hide the download button if it is an empty file #3762
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Fixed the problem of not finding EmailForm #3837
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
https://github.com/infiniflow/ragflow/issues/3651
### What problem does this PR solve?
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._
Added the function of sending emails through SMTP
Instructions for use-
Corresponding parameters need to be configured
Need to output upstream in a fixed format

### Type of change
- [√] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._
### Type of change
- [x] Documentation Update
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Add api for list agents and agent seesions
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
Co-authored-by: liuhua <10215101452@stu.ecun.edu.cn>
### What problem does this PR solve?
Close issue: #3828
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Signed-off-by: jinhai <haijin.chn@gmail.com>
### What problem does this PR solve?
Feat: Modify the style of the home page in bright mode #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Refactor embedding batch_size. Close#3657
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
### What problem does this PR solve?
Fix the agent reference bug and the session prologue
#3285#3819
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Co-authored-by: liuhua <10215101452@stu.ecun.edu.cn>
### What problem does this PR solve?
Add's a Helm chart for deploying RAGFlow on Kubernetes.
Closes#864.
### Type of change
- [X] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Store the pagerank field at the outermost #3794
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
In #3335 someone suggested to upgrade pdfplumber==0.11.1, but that
didn't solve it.
It's actually the special formatting in some of the pdfs that triggers
the problem.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
1. Failed update dataset case
2. Upload and parse text file
### Type of change
- [x] Other (please describe): test cases
---------
Signed-off-by: jinhai <haijin.chn@gmail.com>
### What problem does this PR solve?
Fix the bug that the agent could not find the context
#3682
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Co-authored-by: liuhua <10215101452@stu.ecun.edu.cn>
### What problem does this PR solve?
Feat: Supports page rank score for different knowledge bases. #3794
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
1. Test not allowed fields of dataset
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Other (please describe): Test cases
Signed-off-by: jinhai <haijin.chn@gmail.com>
### What problem does this PR solve?
Feat: Quit from jointed team #3759
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Detect invalid response from api.siliconflow.cn. Close#2643
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix the bug that prevented modifying `dataset_ids`.
#3772
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Co-authored-by: liuhua <10215101452@stu.ecun.edu.cn>
### What problem does this PR solve?
add new models in jinna connector, to allow use models that support
multilingual models
### Type of change
- [X] Other (please describe): new connectors no breaking change
### What problem does this PR solve?
Refine the file parsing progress info
### Type of change
- [x] Refactoring
Signed-off-by: jinhai <haijin.chn@gmail.com>
### What problem does this PR solve?
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._
### Type of change
- [x] Documentation Update
Signed-off-by: jinhai <haijin.chn@gmail.com>
### What problem does this PR solve?
Update wechat group QR code.
### Type of change
- [x] Documentation Update
Signed-off-by: jinhai <haijin.chn@gmail.com>
### What problem does this PR solve?
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._
### Type of change
- [x] Refactoring
---------
Signed-off-by: jinhai <haijin.chn@gmail.com>
### What problem does this PR solve?
1. Store error type in Infinity
2. position list value read from Infinity isn't correct.
Fix issue: #3729
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Signed-off-by: jinhai <haijin.chn@gmail.com>
### What problem does this PR solve?
Feat: Add tooltip to delimiter filed #1909
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Translate comments of file-util.ts to English #3749
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
The uploaded avatar has been compressed to preserve transparency while
meeting the length requirements for the 'text' type. The current
compressed size is 100x100 pixels.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fixed increase_usage for builtin models. Close#1803
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Support for formulas #1954
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Fix the context of the agent interface call to the context during web
testing, and change it to the context record of user chat
Remove duplicate messages and add them, which can be viewed in the
messages section of database api_4_comversation
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Update version info to v0.14.1
### Type of change
- [x] Documentation Update
---------
Signed-off-by: jinhai <haijin.chn@gmail.com>
### What problem does this PR solve?
Fix: Clicking the checkbox of the pop-up window for editing chunk is
invalid #3726
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Don't log exception if object doesn't exist. Close#1483
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Ensure thumbnail be smaller than 64K. Close#1443
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Fix test cases
### Type of change
- [x] Other (please describe): Fix error cases
---------
Signed-off-by: jinhai <haijin.chn@gmail.com>
### What problem does this PR solve?
Added an infinity configuration file to easily customize the settings of
Infinity
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Test cases about dataset
### Type of change
- [x] Other (please describe): test cases
---------
Signed-off-by: jinhai <haijin.chn@gmail.com>
### What problem does this PR solve?
Feat: Scrolling knowledge base list and set the number of entries per
page to 30 #3695
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Detect shape error of embedding. Close#2997
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
**Passing cv_mdl.describe() is not an RGB converted image**
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Edit chunk shall update instead of insert it. Close#3679
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Scrolling knowledge base list #3695
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Add dataset sidebar #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Always open text file for write with UTF-8. Close#932
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Added aspose on macosx/arm64. Close#3666
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix enable/disable bug #3628
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: liuhua <10215101452@stu.ecun.edu.cn>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
Fix a bug in VolcEngine #3553
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Co-authored-by: liuhua <10215101452@stu.ecun.edu.cn>
### What problem does this PR solve?
1. Read KB list path parameter, page_number and page_size, which type
isn't int
2. Add cases on create / list / delete datasets.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Test cases
Signed-off-by: jinhai <haijin.chn@gmail.com>
### What problem does this PR solve?
Feat: add PromptManagement page #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Check model id when set dialog. Close#849
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix the bug causing garbled text #3613
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Co-authored-by: liuhua <10215101452@stu.ecun.edu.cn>
### What problem does this PR solve?
Feat: Add ModelManagement page #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._
### Type of change
- [x] Documentation Update
2024-11-25 19:32:25 +08:00
1224 changed files with 116758 additions and 45656 deletions
label:Is there an existing issue for the same bug?
description:Please check if an issue already exists for the bug you encountered.
label:Self Checks
description:"Please check the following in order to be responded in time :)"
options:
- label:I have checked the existing issues.
required:true
- label:I have searched for existing issues [search for existing issues](https://github.com/infiniflow/ragflow/issues), including closed ones.
required:true
- label:I confirm that I am using English to submit this report ([Language Policy](https://github.com/infiniflow/ragflow/issues/5910)).
required:true
- label:Non-english title submitions will be closed directly ( 非英文标题的提交将会被直接关闭 ) ([Language Policy](https://github.com/infiniflow/ragflow/issues/5910)).
required:true
- label:"Please do not modify this template :) and fill in all the required fields."
required:true
- type:markdown
attributes:
value:"Please provide the following information to help us understand the issue."
description:Propose a feature request for RAGFlow.
title:"[Feature Request]: "
labels:[feature request]
labels:["💞 feature"]
body:
- type:checkboxes
attributes:
label:Is there an existing issue for the same feature request?
description:Please check if an issue already exists for the feature you request.
label:Self Checks
description:"Please check the following in order to be responded in time :)"
options:
- label:I have checked the existing issues.
- label:I have searched for existing issues [search for existing issues](https://github.com/infiniflow/ragflow/issues), including closed ones.
required:true
- label:I confirm that I am using English to submit this report ([Language Policy](https://github.com/infiniflow/ragflow/issues/5910)).
required:true
- label:Non-english title submitions will be closed directly ( 非英文标题的提交将会被直接关闭 ) ([Language Policy](https://github.com/infiniflow/ragflow/issues/5910)).
required:true
- label:"Please do not modify this template :) and fill in all the required fields."
description:"Please check the following in order to be responded in time :)"
options:
- label:I have searched for existing issues [search for existing issues](https://github.com/infiniflow/ragflow/issues), including closed ones.
required:true
- label:I confirm that I am using English to submit this report ([Language Policy](https://github.com/infiniflow/ragflow/issues/5910)).
required:true
- label:Non-english title submitions will be closed directly ( 非英文标题的提交将会被直接关闭 ) ([Language Policy](https://github.com/infiniflow/ragflow/issues/5910)).
required:true
- label:"Please do not modify this template :) and fill in all the required fields."
- cron:'0 13 * * *'# This schedule runs every 13:00:00Z(21:00:00+08:00)
# The "create tags" trigger is specifically focused on the creation of new tags, while the "push tags" trigger is activated when tags are pushed, including both new tag creations and updates to existing tags.
RUN pip3 config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple && pip3 config set global.trusted-host "pypi.tuna.tsinghua.edu.cn mirrors.pku.edu.cn"&& pip3 config set global.extra-index-url "https://mirrors.pku.edu.cn/pypi/web/simple"\
| tar -xf - --strip-components=2 -C /root/.ragflow)\
fi
# https://github.com/chrismattmann/tika-python
# This is the only way to run python-tika without internet access. Without this set, the default is to check the tika version and pull latest every time from Apache.
RUN --mount=type=bind,from=infiniflow/ragflow_deps:latest,source=/,target=/deps \
| tar -xf - --strip-components=2 -C /root/.ragflow
# Copy nltk data downloaded via download_deps.py
COPY nltk_data /root/nltk_data
# https://github.com/chrismattmann/tika-python
# This is the only way to run python-tika without internet access. Without this set, the default is to check the tika version and pull latest every time from Apache.
RUN pip3 config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple && pip3 config set global.trusted-host "pypi.tuna.tsinghua.edu.cn mirrors.pku.edu.cn"&& pip3 config set global.extra-index-url "https://mirrors.pku.edu.cn/pypi/web/simple"\
RUN --mount=type=cache,id=ragflow_production_apt,target=/var/cache/apt,sharing=locked \
apt update && apt install -y --no-install-recommends nginx libgl1 vim less &&\
rm -rf /var/lib/apt/lists/*
COPY web web
COPY api api
COPY conf conf
COPY deepdoc deepdoc
COPY rag rag
COPY agent agent
COPY graphrag graphrag
COPY pyproject.toml poetry.toml poetry.lock ./
# Copy models downloaded via download_deps.py
RUN mkdir -p /ragflow/rag/res/deepdoc /root/.ragflow
RUN --mount=type=bind,source=huggingface.co,target=/huggingface.co \
tar --exclude='.*' -cf - \
/huggingface.co/InfiniFlow/text_concat_xgb_v1.0 \
/huggingface.co/InfiniFlow/deepdoc \
| tar -xf - --strip-components=3 -C /ragflow/rag/res/deepdoc
# Copy nltk data downloaded via download_deps.py
COPY nltk_data /root/nltk_data
# https://github.com/chrismattmann/tika-python
# This is the only way to run python-tika without internet access. Without this set, the default is to check the tika version and pull latest every time from Apache.
> If you have not installed Docker on your local machine (Windows, Mac, or Linux),
see [Install Docker Engine](https://docs.docker.com/engine/install/).
- [gVisor](https://gvisor.dev/docs/user_guide/install/): Required only if you intend to use the code executor (sandbox) feature of RAGFlow.
> [!TIP]
> If you have not installed Docker on your local machine (Windows, Mac, or Linux), see [Install Docker Engine](https://docs.docker.com/engine/install/).
### 🚀 Start up the server
@ -153,7 +160,7 @@ releases! 🌟
> ```
>
> This change will be reset after a system reboot. To ensure your change remains permanent, add or update the
`vm.max_map_count` value in **/etc/sysctl.conf** accordingly:
> `vm.max_map_count` value in **/etc/sysctl.conf** accordingly:
3. Build the pre-built Docker images and start up the server:
3. Start up the server using the pre-built Docker images:
> The command below downloads the dev version Docker image for RAGFlow slim (`dev-slim`). Note that RAGFlow slim
Docker images do not include embedding models or Python libraries and hence are approximately 1GB in size.
> [!CAUTION]
> All Docker images are built for x86 platforms. We don't currently offer Docker images for ARM64.
> If you are on an ARM64 platform, follow [this guide](https://ragflow.io/docs/dev/build_docker_image) to build a Docker image compatible with your system.
> The command below downloads the `v0.19.0-slim` edition of the RAGFlow Docker image. See the following table for descriptions of different RAGFlow editions. To download a RAGFlow edition different from `v0.19.0-slim`, update the `RAGFLOW_IMAGE` variable accordingly in **docker/.env** before using `docker compose` to start the server. For example: set `RAGFLOW_IMAGE=infiniflow/ragflow:v0.19.0` for the full edition `v0.19.0`.
```bash
$ cd ragflow/docker
# Use CPU for embedding and DeepDoc tasks:
$ docker compose -f docker-compose.yml up -d
# To use GPU to accelerate embedding and DeepDoc tasks:
# docker compose -f docker-compose-gpu.yml up -d
```
> - To download a RAGFlow slim Docker image of a specific version, update the `RAGFLOW_IMAGE` variable in *
*docker/.env** to your desired version. For example, `RAGFLOW_IMAGE=infiniflow/ragflow:v0.14.0-slim`. After
making this change, rerun the command above to initiate the download.
> - To download the dev version of RAGFlow Docker image *including* embedding models and Python libraries, update the
`RAGFLOW_IMAGE` variable in **docker/.env** to `RAGFLOW_IMAGE=infiniflow/ragflow:dev`. After making this change,
rerun the command above to initiate the download.
> - To download a specific version of RAGFlow Docker image *including* embedding models and Python libraries, update
the `RAGFLOW_IMAGE` variable in **docker/.env** to your desired version. For example,
`RAGFLOW_IMAGE=infiniflow/ragflow:v0.14.0`. After making this change, rerun the command above to initiate the
download.
> **NOTE:** A RAGFlow Docker image that includes embedding models and Python libraries is approximately 9GB in size
and may take significantly longer time to load.
| RAGFlow image tag | Image size (GB) | Has embedding models? | Stable? |
@ -127,6 +132,10 @@ Coba demo kami di [https://demo.ragflow.io](https://demo.ragflow.io).
- RAM >= 16 GB
- Disk >= 50 GB
- Docker >= 24.0.0 & Docker Compose >= v2.26.1
- [gVisor](https://gvisor.dev/docs/user_guide/install/): Hanya diperlukan jika Anda ingin menggunakan fitur eksekutor kode (sandbox) dari RAGFlow.
> [!TIP]
> Jika Anda belum menginstal Docker di komputer lokal Anda (Windows, Mac, atau Linux), lihat [Install Docker Engine](https://docs.docker.com/engine/install/).
### 🚀 Menjalankan Server
@ -146,7 +155,7 @@ Coba demo kami di [https://demo.ragflow.io](https://demo.ragflow.io).
> ```
>
> Perubahan ini akan hilang setelah sistem direboot. Untuk membuat perubahan ini permanen, tambahkan atau perbarui nilai
`vm.max_map_count` di **/etc/sysctl.conf**:
> `vm.max_map_count` di **/etc/sysctl.conf**:
>
> ```bash
> vm.max_map_count=262144
@ -160,28 +169,29 @@ Coba demo kami di [https://demo.ragflow.io](https://demo.ragflow.io).
3. Bangun image Docker pre-built dan jalankan server:
> Perintah di bawah ini akan mengunduh versi dev dari Docker image RAGFlow slim (`dev-slim`). Image RAGFlow slim
tidak termasuk model embedding atau library Python dan berukuran sekitar 1GB.
> [!CAUTION]
> Semua gambar Docker dibangun untuk platform x86. Saat ini, kami tidak menawarkan gambar Docker untuk ARM64.
> Jika Anda menggunakan platform ARM64, [silakan gunakan panduan ini untuk membangun gambar Docker yang kompatibel dengan sistem Anda](https://ragflow.io/docs/dev/build_docker_image).
```bash
$ cd ragflow/docker
$ docker compose -f docker-compose.yml up -d
```
> Perintah di bawah ini mengunduh edisi v0.19.0-slim dari gambar Docker RAGFlow. Silakan merujuk ke tabel berikut untuk deskripsi berbagai edisi RAGFlow. Untuk mengunduh edisi RAGFlow yang berbeda dari v0.19.0-slim, perbarui variabel RAGFLOW_IMAGE di docker/.env sebelum menggunakan docker compose untuk memulai server. Misalnya, atur RAGFLOW_IMAGE=infiniflow/ragflow:v0.19.0 untuk edisi lengkap v0.19.0.
> - Untuk mengunduh versi tertentu dari image Docker RAGFlow slim, perbarui variabel `RAGFlow_IMAGE` di *
*docker/.env** sesuai dengan versi yang diinginkan. Misalnya, `RAGFLOW_IMAGE=infiniflow/ragflow:v0.14.0-slim`.
Setelah mengubah ini, jalankan ulang perintah di atas untuk memulai unduhan.
> - Untuk mengunduh versi dev dari image Docker RAGFlow *termasuk* model embedding dan library Python, perbarui
variabel `RAGFlow_IMAGE` di **docker/.env** menjadi `RAGFLOW_IMAGE=infiniflow/ragflow:dev`. Setelah mengubah ini,
jalankan ulang perintah di atas untuk memulai unduhan.
> - Untuk mengunduh versi tertentu dari image Docker RAGFlow *termasuk* model embedding dan library Python, perbarui
variabel `RAGFlow_IMAGE` di **docker/.env** sesuai dengan versi yang diinginkan. Misalnya,
`RAGFLOW_IMAGE=infiniflow/ragflow:v0.14.0`. Setelah mengubah ini, jalankan ulang perintah di atas untuk memulai unduhan.
```bash
$ cd ragflow/docker
# Use CPU for embedding and DeepDoc tasks:
$ docker compose -f docker-compose.yml up -d
> **CATATAN:** Image Docker RAGFlow yang mencakup model embedding dan library Python berukuran sekitar 9GB
dan mungkin memerlukan waktu lebih lama untuk dimuat.
# To use GPU to accelerate embedding and DeepDoc tasks:
# docker compose -f docker-compose-gpu.yml up -d
```
4. Periksa status server setelah server aktif dan berjalan:
| RAGFlow image tag | Image size (GB) | Has embedding models? | Stable? |
[RAGFlow](https://ragflow.io/)는 심층 문서 이해에 기반한 오픈소스 RAG (Retrieval-Augmented Generation) 엔진입니다. 이 엔진은 대규모 언어 모델(LLM)과 결합하여 정확한 질문 응답 기능을 제공하며, 다양한 복잡한 형식의 데이터에서 신뢰할 수 있는 출처를 바탕으로 한 인용을 통해 이를 뒷받침합니다. RAGFlow는 규모에 상관없이 모든 기업에 최적화된 RAG 워크플로우를 제공합니다.
> 로컬 머신(Windows, Mac, Linux)에 Docker가 설치되지 않은 경우, [Docker 엔진 설치]((https://docs.docker.com/engine/install/))를 참조하세요.
- [gVisor](https://gvisor.dev/docs/user_guide/install/): RAGFlow의 코드 실행기(샌드박스) 기능을 사용하려는 경우에만 필요합니다.
> [!TIP]
> 로컬 머신(Windows, Mac, Linux)에 Docker가 설치되지 않은 경우, [Docker 엔진 설치](<(https://docs.docker.com/engine/install/)>)를 참조하세요.
### 🚀 서버 시작하기
1.`vm.max_map_count`가 262144 이상인지 확인하세요:
> `vm.max_map_count`의 값을 아래 명령어를 통해 확인하세요:
>
> ```bash
@ -145,21 +148,29 @@
3. 미리 빌드된 Docker 이미지를 생성하고 서버를 시작하세요:
> 아래의 명령은 RAGFlow slim(dev-slim)의 개발 버전 Docker 이미지를 다운로드합니다. RAGFlow slim Docker 이미지에는 임베딩 모델이나 Python 라이브러리가 포함되어 있지 않으므로 크기는 약 1GB입니다.
> [!CAUTION]
> 모든 Docker 이미지는 x86 플랫폼을 위해 빌드되었습니다. 우리는 현재 ARM64 플랫폼을 위한 Docker 이미지를 제공하지 않습니다.
> ARM64 플랫폼을 사용 중이라면, [시스템과 호환되는 Docker 이미지를 빌드하려면 이 가이드를 사용해 주세요](https://ragflow.io/docs/dev/build_docker_image).
> 아래 명령어는 RAGFlow Docker 이미지의 v0.19.0-slim 버전을 다운로드합니다. 다양한 RAGFlow 버전에 대한 설명은 다음 표를 참조하십시오. v0.19.0-slim과 다른 RAGFlow 버전을 다운로드하려면, docker/.env 파일에서 RAGFLOW_IMAGE 변수를 적절히 업데이트한 후 docker compose를 사용하여 서버를 시작하십시오. 예를 들어, 전체 버전인 v0.19.0을 다운로드하려면 RAGFLOW_IMAGE=infiniflow/ragflow:v0.19.0로 설정합니다.
```bash
$ cd ragflow/docker
# Use CPU for embedding and DeepDoc tasks:
$ docker compose -f docker-compose.yml up -d
# To use GPU to accelerate embedding and DeepDoc tasks:
# docker compose -f docker-compose-gpu.yml up -d
```
> - 특정 버전의 RAGFlow slim Docker 이미지를 다운로드하려면, **docker/.env**에서 `RAGFlow_IMAGE` 변수를 원하는 버전으로 업데이트하세요. 예를 들어, `RAGFLOW_IMAGE=infiniflow/ragflow:v0.14.0-slim`으로 설정합니다. 이 변경을 완료한 후, 위의 명령을 다시 실행하여 다운로드를 시작하세요.
> - RAGFlow의 임베딩 모델과 Python 라이브러리를 포함한 개발 버전 Docker 이미지를 다운로드하려면, **docker/.env**에서 `RAGFlow_IMAGE` 변수를 `RAGFLOW_IMAGE=infiniflow/ragflow:dev`로 업데이트하세요. 이 변경을 완료한 후, 위의 명령을 다시 실행하여 다운로드를 시작하세요.
> - 특정 버전의 RAGFlow Docker 이미지를 임베딩 모델과 Python 라이브러리를 포함하여 다운로드하려면, **docker/.env**에서 `RAGFlow_IMAGE` 변수를 원하는 버전으로 업데이트하세요. 예를 들어, `RAGFLOW_IMAGE=infiniflow/ragflow:v0.14.0` 로 설정합니다. 이 변경을 완료한 후, 위의 명령을 다시 실행하여 다운로드를 시작하세요.
> **NOTE:** 임베딩 모델과 Python 라이브러리를 포함한 RAGFlow Docker 이미지의 크기는 약 9GB이며, 로드하는 데 상당히 오랜 시간이 걸릴 수 있습니다.
| RAGFlow image tag | Image size (GB) | Has embedding models? | Stable? |
> 만약 확인 단계를 건너뛰고 바로 RAGFlow에 로그인하면, RAGFlow가 완전히 초기화되지 않았기 때문에 브라우저에서 `network anormal` 오류가 발생할 수 있습니다.
5. 웹 브라우저에 서버의 IP 주소를 입력하고 RAGFlow에 로그인하세요.
2. 웹 브라우저에 서버의 IP 주소를 입력하고 RAGFlow에 로그인하세요.
> 기본 설정을 사용할 경우, `http://IP_OF_YOUR_MACHINE`만 입력하면 됩니다 (포트 번호는 제외). 기본 HTTP 서비스 포트 `80`은 기본 구성으로 사용할 때 생략할 수 있습니다.
6. [service_conf.yaml](./docker/service_conf.yaml) 파일에서 원하는 LLM 팩토리를 `user_default_llm`에 선택하고, `API_KEY` 필드를 해당 API 키로 업데이트하세요.
3. [service_conf.yaml.template](./docker/service_conf.yaml.template) 파일에서 원하는 LLM 팩토리를 `user_default_llm`에 선택하고, `API_KEY` 필드를 해당 API 키로 업데이트하세요.
> 자세한 내용은 [llm_api_key_setup](https://ragflow.io/docs/dev/llm_api_key_setup)를 참조하세요.
_이제 쇼가 시작됩니다!_
@ -193,36 +203,38 @@
시스템 설정과 관련하여 다음 파일들을 관리해야 합니다:
- [.env](./docker/.env): `SVR_HTTP_PORT`, `MYSQL_PASSWORD`, `MINIO_PASSWORD`와 같은 시스템의 기본 설정을 포함합니다.
- [service_conf.yaml](./docker/service_conf.yaml): 백엔드 서비스를 구성합니다.
- [service_conf.yaml.template](./docker/service_conf.yaml.template): 백엔드 서비스를 구성합니다.
- [docker-compose.yml](./docker/docker-compose.yml): 시스템은 [docker-compose.yml](./docker/docker-compose.yml)을 사용하여 시작됩니다.
[.env](./docker/.env) 파일의 변경 사항이 [service_conf.yaml](./docker/service_conf.yaml) 파일의 내용과 일치하도록 해야 합니다.
[.env](./docker/.env) 파일의 변경 사항이 [service_conf.yaml.template](./docker/service_conf.yaml.template) 파일의 내용과 일치하도록 해야 합니다.
> [./docker/README](./docker/README.md) 파일에는 환경 설정과 서비스 구성에 대한 자세한 설명이 있으며, [./docker/README](./docker/README.md) 파일에 나열된 모든 환경 설정이 [service_conf.yaml](./docker/service_conf.yaml) 파일의 해당 구성과 일치하도록 해야 합니다.
> [./docker/README](./docker/README.md) 파일 ./docker/README은 service_conf.yaml.template 파일에서 ${ENV_VARS}로 사용할 수 있는 환경 설정과 서비스 구성에 대한 자세한 설명을 제공합니다.
기본 HTTP 서비스 포트(80)를 업데이트하려면 [docker-compose.yml](./docker/docker-compose.yml) 파일에서 `80:80`을 `<YOUR_SERVING_PORT>:80`으로 변경하세요.
> 모든 시스템 구성 업데이트는 적용되기 위해 시스템 재부팅이 필요합니다.
>
> ```bash
> $ docker compose -f docker/docker-compose.yml up -d
> $ docker compose -f docker-compose.yml up -d
> ```
### Elasticsearch 에서 Infinity 로 문서 엔진 전환
RAGFlow 는 기본적으로 Elasticsearch 를 사용하여 전체 텍스트 및 벡터를 저장합니다. [Infinity]로 전환(https://github.com/infiniflow/infinity/), 다음 절차를 따르십시오.
- 🔎 [Arquitetura do Sistema](#-arquitetura-do-sistema)
- 🎬 [Primeiros Passos](#-primeiros-passos)
- 🔧 [Configurações](#-configurações)
- 🔧 [Construir uma imagem docker sem incorporar modelos](#-construir-uma-imagem-docker-sem-incorporar-modelos)
- 🔧 [Construir uma imagem docker incluindo modelos](#-construir-uma-imagem-docker-incluindo-modelos)
- 🔨 [Lançar serviço a partir do código-fonte para desenvolvimento](#-lançar-serviço-a-partir-do-código-fonte-para-desenvolvimento)
- 📚 [Documentação](#-documentação)
- 📜 [Roadmap](#-roadmap)
- 🏄 [Comunidade](#-comunidade)
- 🙌 [Contribuindo](#-contribuindo)
</details>
## 💡 O que é o RAGFlow?
[RAGFlow](https://ragflow.io/) é um mecanismo RAG (Geração Aumentada por Recuperação) de código aberto baseado em entendimento profundo de documentos. Ele oferece um fluxo de trabalho RAG simplificado para empresas de qualquer porte, combinando LLMs (Modelos de Linguagem de Grande Escala) para fornecer capacidades de perguntas e respostas verídicas, respaldadas por citações bem fundamentadas de diversos dados complexos formatados.
## 🎮 Demo
Experimente nossa demo em [https://demo.ragflow.io](https://demo.ragflow.io).
- 19-03-2025 Suporta o uso de um modelo multi-modal para entender imagens dentro de arquivos PDF ou DOCX.
- 28-02-2025 combinado com a pesquisa na Internet (T AVI LY), suporta pesquisas profundas para qualquer LLM.
- 26-01-2025 Otimize a extração e aplicação de gráficos de conhecimento e forneça uma variedade de opções de configuração.
- 18-12-2024 Atualiza o modelo de Análise de Layout de Documentos no DeepDoc.
- 01-11-2024 Adiciona extração de palavras-chave e geração de perguntas relacionadas aos blocos analisados para melhorar a precisão da recuperação.
- 22-08-2024 Suporta conversão de texto para comandos SQL via RAG.
## 🎉 Fique Ligado
⭐️ Dê uma estrela no nosso repositório para se manter atualizado com novas funcionalidades e melhorias empolgantes! Receba notificações instantâneas sobre novos lançamentos! 🌟
- Extração de conhecimento baseada em [entendimento profundo de documentos](./deepdoc/README.md) a partir de dados não estruturados com formatos complicados.
- Encontra a "agulha no palheiro de dados" de literalmente tokens ilimitados.
### 🍱 **Fragmentação baseada em templates**
- Inteligente e explicável.
- Muitas opções de templates para escolher.
### 🌱 **Citações fundamentadas com menos alucinações**
- Visualização da fragmentação de texto para permitir intervenção humana.
- Visualização rápida das referências chave e citações rastreáveis para apoiar respostas fundamentadas.
### 🍔 **Compatibilidade com fontes de dados heterogêneas**
- Suporta Word, apresentações, excel, txt, imagens, cópias digitalizadas, dados estruturados, páginas da web e mais.
### 🛀 **Fluxo de trabalho RAG automatizado e sem esforço**
- Orquestração RAG simplificada voltada tanto para negócios pessoais quanto grandes empresas.
- Modelos LLM e de incorporação configuráveis.
- Múltiplas recuperações emparelhadas com reclassificação fundida.
- APIs intuitivas para integração sem problemas com os negócios.
- [gVisor](https://gvisor.dev/docs/user_guide/install/): Necessário apenas se você pretende usar o recurso de executor de código (sandbox) do RAGFlow.
> [!TIP]
> Se você não instalou o Docker na sua máquina local (Windows, Mac ou Linux), veja [Instalar Docker Engine](https://docs.docker.com/engine/install/).
### 🚀 Iniciar o servidor
1. Certifique-se de que `vm.max_map_count` >= 262144:
> Para verificar o valor de `vm.max_map_count`:
>
> ```bash
> $ sysctl vm.max_map_count
> ```
>
> Se necessário, redefina `vm.max_map_count` para um valor de pelo menos 262144:
>
> ```bash
> # Neste caso, defina para 262144:
> $ sudo sysctl -w vm.max_map_count=262144
> ```
>
> Essa mudança será resetada após a reinicialização do sistema. Para garantir que a alteração permaneça permanente, adicione ou atualize o valor de `vm.max_map_count` em **/etc/sysctl.conf**:
3. Inicie o servidor usando as imagens Docker pré-compiladas:
> [!CAUTION]
> Todas as imagens Docker são construídas para plataformas x86. Atualmente, não oferecemos imagens Docker para ARM64.
> Se você estiver usando uma plataforma ARM64, por favor, utilize [este guia](https://ragflow.io/docs/dev/build_docker_image) para construir uma imagem Docker compatível com o seu sistema.
> O comando abaixo baixa a edição `v0.19.0-slim` da imagem Docker do RAGFlow. Consulte a tabela a seguir para descrições de diferentes edições do RAGFlow. Para baixar uma edição do RAGFlow diferente da `v0.19.0-slim`, atualize a variável `RAGFLOW_IMAGE` conforme necessário no **docker/.env** antes de usar `docker compose` para iniciar o servidor. Por exemplo: defina `RAGFLOW_IMAGE=infiniflow/ragflow:v0.19.0` para a edição completa `v0.19.0`.
```bash
$ cd ragflow/docker
# Use CPU for embedding and DeepDoc tasks:
$ docker compose -f docker-compose.yml up -d
# To use GPU to accelerate embedding and DeepDoc tasks:
# docker compose -f docker-compose-gpu.yml up -d
```
| Tag da imagem RAGFlow | Tamanho da imagem (GB) | Possui modelos de incorporação? | Estável? |
4. Verifique o status do servidor após tê-lo iniciado:
```bash
$ docker logs -f ragflow-server
```
_O seguinte resultado confirma o lançamento bem-sucedido do sistema:_
```bash
____ ___ ______ ______ __
/ __ \ / | / ____// ____// /____ _ __
/ /_/ // /| | / / __ / /_ / // __ \| | /| / /
/ _, _// ___ |/ /_/ // __/ / // /_/ /| |/ |/ /
/_/ |_|/_/ |_|\____//_/ /_/ \____/ |__/|__/
* Rodando em todos os endereços (0.0.0.0)
```
> Se você pular essa etapa de confirmação e acessar diretamente o RAGFlow, seu navegador pode exibir um erro `network anormal`, pois, nesse momento, seu RAGFlow pode não estar totalmente inicializado.
5. No seu navegador, insira o endereço IP do seu servidor e faça login no RAGFlow.
> Com as configurações padrão, você só precisa digitar `http://IP_DO_SEU_MÁQUINA` (**sem** o número da porta), pois a porta HTTP padrão `80` pode ser omitida ao usar as configurações padrão.
6. Em [service_conf.yaml.template](./docker/service_conf.yaml.template), selecione a fábrica LLM desejada em `user_default_llm` e atualize o campo `API_KEY` com a chave de API correspondente.
> Consulte [llm_api_key_setup](https://ragflow.io/docs/dev/llm_api_key_setup) para mais informações.
_O show está no ar!_
## 🔧 Configurações
Quando se trata de configurações do sistema, você precisará gerenciar os seguintes arquivos:
- [.env](./docker/.env): Contém as configurações fundamentais para o sistema, como `SVR_HTTP_PORT`, `MYSQL_PASSWORD` e `MINIO_PASSWORD`.
- [service_conf.yaml.template](./docker/service_conf.yaml.template): Configura os serviços de back-end. As variáveis de ambiente neste arquivo serão automaticamente preenchidas quando o contêiner Docker for iniciado. Quaisquer variáveis de ambiente definidas dentro do contêiner Docker estarão disponíveis para uso, permitindo personalizar o comportamento do serviço com base no ambiente de implantação.
- [docker-compose.yml](./docker/docker-compose.yml): O sistema depende do [docker-compose.yml](./docker/docker-compose.yml) para iniciar.
> O arquivo [./docker/README](./docker/README.md) fornece uma descrição detalhada das configurações do ambiente e dos serviços, que podem ser usadas como `${ENV_VARS}` no arquivo [service_conf.yaml.template](./docker/service_conf.yaml.template).
Para atualizar a porta HTTP de serviço padrão (80), vá até [docker-compose.yml](./docker/docker-compose.yml) e altere `80:80` para `<SUA_PORTA_DE_SERVIÇO>:80`.
Atualizações nas configurações acima exigem um reinício de todos os contêineres para que tenham efeito:
> ```bash
> $ docker compose -f docker-compose.yml up -d
> ```
### Mudar o mecanismo de documentos de Elasticsearch para Infinity
O RAGFlow usa o Elasticsearch por padrão para armazenar texto completo e vetores. Para mudar para o [Infinity](https://github.com/infiniflow/infinity/), siga estas etapas:
1. Pare todos os contêineres em execução:
```bash
$ docker compose -f docker/docker-compose.yml down -v
```
Note: `-v` irá deletar os volumes do contêiner, e os dados existentes serão apagados.
2. Defina `DOC_ENGINE` no **docker/.env** para `infinity`.
3. Inicie os contêineres:
```bash
$ docker compose -f docker-compose.yml up -d
```
> [!ATENÇÃO]
> A mudança para o Infinity em uma máquina Linux/arm64 ainda não é oficialmente suportada.
## 🔧 Criar uma imagem Docker sem modelos de incorporação
Esta imagem tem cerca de 2 GB de tamanho e depende de serviços externos de LLM e incorporação.
"content":f'Now you should analyze each web page and find helpful information based on the current search query "{search_query}" and previous reasoning steps.'}],
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
BEGIN_SEARCH_QUERY="<|begin_search_query|>"
END_SEARCH_QUERY="<|end_search_query|>"
BEGIN_SEARCH_RESULT="<|begin_search_result|>"
END_SEARCH_RESULT="<|end_search_result|>"
MAX_SEARCH_LIMIT=6
REASON_PROMPT=(
"You are a reasoning assistant with the ability to perform dataset searches to help "
"you answer the user's question accurately. You have special tools:\n\n"
f"- To perform a search: write {BEGIN_SEARCH_QUERY} your query here {END_SEARCH_QUERY}.\n"
f"Then, the system will search and analyze relevant content, then provide you with helpful information in the format {BEGIN_SEARCH_RESULT} ...search results... {END_SEARCH_RESULT}.\n\n"
f"You can repeat the search process multiple times if necessary. The maximum number of search attempts is limited to {MAX_SEARCH_LIMIT}.\n\n"
"Once you have all the information you need, continue your reasoning.\n\n"
"-- Example 1 --\n"########################################
"Question: \"Are both the directors of Jaws and Casino Royale from the same country?\"\n"
"Assistant:\n"
f"{BEGIN_SEARCH_QUERY}Who is the director of Jaws?{END_SEARCH_QUERY}\n\n"
"User:\n"
f"{BEGIN_SEARCH_RESULT}\nThe director of Jaws is Steven Spielberg...\n{END_SEARCH_RESULT}\n\n"
"Continues reasoning with the new information.\n"
"Assistant:\n"
f"{BEGIN_SEARCH_QUERY}Where is Steven Spielberg from?{END_SEARCH_QUERY}\n\n"
"User:\n"
f"{BEGIN_SEARCH_RESULT}\nSteven Allan Spielberg is an American filmmaker...\n{END_SEARCH_RESULT}\n\n"
"Continues reasoning with the new information...\n\n"
"Assistant:\n"
f"{BEGIN_SEARCH_QUERY}Who is the director of Casino Royale?{END_SEARCH_QUERY}\n\n"
"User:\n"
f"{BEGIN_SEARCH_RESULT}\nCasino Royale is a 2006 spy film directed by Martin Campbell...\n{END_SEARCH_RESULT}\n\n"
"Continues reasoning with the new information...\n\n"
"Assistant:\n"
f"{BEGIN_SEARCH_QUERY}Where is Martin Campbell from?{END_SEARCH_QUERY}\n\n"
"User:\n"
f"{BEGIN_SEARCH_RESULT}\nMartin Campbell (born 24 October 1943) is a New Zealand film and television director...\n{END_SEARCH_RESULT}\n\n"
"Continues reasoning with the new information...\n\n"
"Assistant:\nIt's enough to answer the question\n"
"-- Example 2 --\n"#########################################
"Question: \"When was the founder of craigslist born?\"\n"
"Assistant:\n"
f"{BEGIN_SEARCH_QUERY}Who was the founder of craigslist?{END_SEARCH_QUERY}\n\n"
"User:\n"
f"{BEGIN_SEARCH_RESULT}\nCraigslist was founded by Craig Newmark...\n{END_SEARCH_RESULT}\n\n"
"Continues reasoning with the new information.\n"
"Assistant:\n"
f"{BEGIN_SEARCH_QUERY} When was Craig Newmark born?{END_SEARCH_QUERY}\n\n"
"User:\n"
f"{BEGIN_SEARCH_RESULT}\nCraig Newmark was born on December 6, 1952...\n{END_SEARCH_RESULT}\n\n"
"Continues reasoning with the new information...\n\n"
"Assistant:\nIt's enough to answer the question\n"
"**Remember**:\n"
f"- You have a dataset to search, so you just provide a proper search query.\n"
f"- Use {BEGIN_SEARCH_QUERY} to request a dataset search and end with {END_SEARCH_QUERY}.\n"
"- The language of query MUST be as the same as 'Question' or 'search result'.\n"
"- If no helpful information can be found, rewrite the search query to be less and precise keywords.\n"
"- When done searching, continue your reasoning.\n\n"
'Please answer the following question. You should think step by step to solve it.\n\n'
You are tasked with reading and analyzing web pages based on the following inputs: **Previous Reasoning Steps**, **Current Search Query**, and **Searched Web Pages**. Your objective is to extract relevant and helpful information for **Current Search Query** from the **Searched Web Pages** and seamlessly integrate this information into the **Previous Reasoning Steps** to continue reasoning for the original question.
**Guidelines:**
1. **Analyze the Searched Web Pages:**
- Carefully review the content of each searched web page.
- Identify factual information that is relevant to the **Current Search Query** and can aid in the reasoning process for the original question.
2. **Extract Relevant Information:**
- Select the information from the Searched Web Pages that directly contributes to advancing the **Previous Reasoning Steps**.
- Ensure that the extracted information is accurate and relevant.
3. **Output Format:**
- **If the web pages provide helpful information for current search query:** Present the information beginning with `**Final Information**` as shown below.
- The language of query **MUST BE** as the same as 'Search Query' or 'Web Pages'.\n"
**Final Information**
[Helpful information]
- **If the web pages do not provide any helpful information for current search query:** Output the following text.
Objective: To generate search terms related to the user's search keywords, helping users find more valuable information.
Instructions:
- Based on the keywords provided by the user, generate 5-10 related search terms.
- Each search term should be directly or indirectly related to the keyword, guiding the user to find more valuable information.
- Use common, general terms as much as possible, avoiding obscure words or technical jargon.
- Keep the term length between 2-4 words, concise and clear.
- DO NOT translate, use the language of the original keywords.
Role: You are an AI language model assistant tasked with generating 5-10 related questions based on a user’s original query. These questions should help expand the search query scope and improve search relevance.
### Example:
Keywords: Chinese football
Related search terms:
1. Current status of Chinese football
2. Reform of Chinese football
3. Youth training of Chinese football
4. Chinese football in the Asian Cup
5. Chinese football in the World Cup
Instructions:
Input: You are provided with a user’s question.
Output: Generate 5-10 alternative questions that are related to the original user question. These alternatives should help retrieve a broader range of relevant documents from a vector database.
Context: Focus on rephrasing the original question in different ways, making sure the alternative questions are diverse but still connected to the topic of the original query. Do not create overly obscure, irrelevant, or unrelated questions.
Fallback: If you cannot generate any relevant alternatives, do not return any questions.
Guidance:
1. Each alternative should be unique but still relevant to the original query.
2. Keep the phrasing clear, concise, and easy to understand.
4. Ensure that each question contributes towards improving search results by broadening the search angle, not narrowing it.
Example:
Original Question: What are the benefits of electric vehicles?
Alternative Questions:
1. How do electric vehicles impact the environment?
2. What are the advantages of owning an electric car?
3. What is the cost-effectiveness of electric vehicles?
4. How do electric vehicles compare to traditional cars in terms of fuel efficiency?
5. What are the environmental benefits of switching to electric cars?
6. How do electric vehicles help reduce carbon emissions?
7. Why are electric vehicles becoming more popular?
8. What are the long-term savings of using electric vehicles?
9. How do electric vehicles contribute to sustainability?
10. What are the key benefits of electric vehicles for consumers?
Reason:
- When searching, users often only use one or two keywords, making it difficult to fully express their information needs.
- Generating related search terms can help users dig deeper into relevant information and improve search efficiency.
- At the same time, related terms can also help search engines better understand user needs and return more accurate search results.
Rephrasing the original query into multiple alternative questions helps the user explore different aspects of their search topic, improving the quality of search results.
These questions guide the search engine to provide a more comprehensive set of relevant documents.
returnget_json_result(data=files,message="There seems to be an issue with your file format. Please verify it is correct and not corrupted.",code=settings.RetCode.DATA_ERROR)
File diff suppressed because it is too large
Load Diff
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.