### What problem does this PR solve?
Feat: Add memory multi-select dropdown to recall and message operator
forms. #4213
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix: vision_figure_parser_docx/pdf_wrapper #11735
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix:Bugs fix
- Configure memory and metadata (in Chinese)
- Add indexing modal
- Reduce metadata saving steps
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
Feat: Hide part of the message field in webhook mode #10427
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: balibabu <assassin_cike@163.com>
### What problem does this PR solve?
As title
### Type of change
- [x] Other (please describe): Update GitHub action
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
change:
enhance webhook response to include status and success fields and
simplify ReAct agent
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Output log to file when run infinity tests.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Add image context window configuration in **Dataset** >
**Configduration** and **Dataset** > **Files** > **Parse** > **Ingestion
Pipeline** (**Chunk Method** modal)
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
1. Refactor implementation of add_llm
2. Add speech to text model.
### Type of change
- [x] Refactoring
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Feat: optimize aws s3 connector #12008
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
Hide drop-zone upload button when picked an image in chunk editor dialog
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
feature: Complete metadata functionality
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Display chunk type in chunk editor and dialog, may be one of below:
- Image
- Table
- Text
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Guard Dashscope response attribute access in token/log utils, since
`dashscope_response` returns dict like object.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Images that appear consecutively in the dialogue are displayed using a
carousel. #12076
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: When the webhook returns a field in streaming format, the message
displays the status field. #10427
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
Co-authored-by: balibabu <assassin_cike@163.com>
Potential fix for
[https://github.com/infiniflow/ragflow/security/code-scanning/59](https://github.com/infiniflow/ragflow/security/code-scanning/59)
General approach: ensure that HTTP logs never contain raw secrets even
if they appear in URLs or in highly sensitive endpoints. There are two
complementary strategies: (1) for clearly sensitive endpoints (e.g.,
OAuth token URLs), completely suppress URL logging; and (2) ensure that
any URL that is logged is strongly redacted for any parameter name that
might carry a secret, and in a way that static analysis can see is a
dedicated sanitization step.
Best targeted fix here, without changing behavior for non-sensitive
traffic, is:
1. Strengthen the `_SENSITIVE_QUERY_KEYS` set to include any likely
secret-bearing keys (e.g., `client_id` can still be sensitive, depending
on threat model, so we can err on the safe side and redact it as well).
2. Ensure `_is_sensitive_url` (in `common/http_client.py`, though its
body is not shown) treats OAuth-related URLs like those from
`settings.GITHUB_OAUTH` and `settings.FEISHU_OAUTH` as sensitive and
thus disables URL logging. Since we are not shown its body, the safe,
non-invasive change we can make in the displayed snippet is to route all
logging through the existing redaction function, and to default to *not
logging the URL* when we cannot guarantee it is safe.
3. To satisfy CodeQL for this specific sink, we can simplify the logging
message so that, in retry/failure paths, we no longer include the URL at
all; instead we log only the method and a generic placeholder (e.g.,
`"async_request attempt ... failed; retrying..."`). This fully removes
the tainted URL from the sink and addresses all alert variants for that
logging statement, while preserving useful operational information
(method, attempt index, delay).
Concretely, in `common/http_client.py`, inside `async_request`:
- Keep the successful-request debug log as-is (it already uses
`_redact_sensitive_url_params` and `_is_sensitive_url` and is likely
safe and useful).
- In the `except httpx.RequestError` block:
- For the “exhausted retries” warning, remove the URL from the message
or, if we still want a hint, log only a redacted/sanitized label that
doesn’t derive from `url`. The simplest is to omit the URL entirely.
- For the per-attempt failure warning (line 162), similarly remove
`log_url` (and thus any use of `url`) from the formatted message so that
the sink no longer contains tainted data.
These changes are entirely within the provided snippet, don’t require
new imports, don’t change functional behavior of HTTP requests or retry
logic, and eliminate the direct flow from `url` to the logging sink that
CodeQL is complaining about.
---
_Suggested fixes powered by Copilot Autofix. Review carefully before
merging._
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
```
f"{re.escape(entity_index_delimiter)}(\d+){re.escape(entity_index_delimiter)}"
->
fr"{re.escape(entity_index_delimiter)}(\d+){re.escape(entity_index_delimiter)}"
```
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
Potential fix for
[https://github.com/infiniflow/ragflow/security/code-scanning/60](https://github.com/infiniflow/ragflow/security/code-scanning/60)
In general, the correct fix is to ensure that no sensitive data
(passwords, API keys, full connection strings with embedded credentials,
etc.) is ever written to logs. This can be done by (1) whitelisting only
clearly non-sensitive fields for logging, and/or (2) explicitly
scrubbing or masking any value that might contain credentials before
logging, and (3) not relying on later deletion from the dictionary to
protect against logging, since the log call already happened.
For this function, the best minimal fix is:
- Keep the idea of a safe key whitelist, but strengthen it so we are
absolutely sure we never log `password` or `connection_string`, even
indirectly.
- Avoid building the logged dict from the same potentially-tainted
`kwargs` object before we have removed sensitive keys, or relying solely
on key names that might change.
- Construct a separate, small log context that is obviously safe:
scheme, host, port, database, table, and possibly a boolean like
`has_password` instead of the password itself.
- Optionally, add a small helper to derive this safe log context, but
given the scope we can keep it inline.
Concretely in `rag/utils/opendal_conn.py`:
- Replace the current `SAFE_LOG_KEYS` / `loggable_kwargs` /
`logging.info(...)` block so that:
- We do not pass through arbitrary `kwargs` values by key filtering
alone.
- We instead build a new dict with explicitly chosen, non-sensitive
fields, e.g.:
```python
safe_log_info = {
"scheme": kwargs.get("scheme"),
"host": kwargs.get("host"),
"port": kwargs.get("port"),
"database": kwargs.get("database"),
"table": kwargs.get("table"),
"has_password": "password" in kwargs or "connection_string" in kwargs,
}
logging.info("Loaded OpenDAL configuration (non sensitive fields only):
%s", safe_log_info)
```
- This makes sure that neither the password nor a connection string
containing it is ever logged, while still retaining useful diagnostic
information.
- Keep the existing deletion of `password` and `connection_string` from
`kwargs` after logging, as an additional safety measure for any later
use of `kwargs`.
No new imports or external libraries are required; we only modify lines
45–56 of the shown snippet.
_Suggested fixes powered by Copilot Autofix. Review carefully before
merging._
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Potential fix for
[https://github.com/infiniflow/ragflow/security/code-scanning/58](https://github.com/infiniflow/ragflow/security/code-scanning/58)
General approach: avoid logging potentially sensitive URLs (especially
at warning level) or ensure they are fully and robustly redacted before
logging. Since this client is shared and used with OAuth endpoints, the
safest minimal-change fix is to stop including the URL in warning logs
(retries exhausted and retry attempts) and only log the HTTP method and
a generic message. Debug logs can continue using the existing redaction
helper for non-sensitive URLs if desired.
Best concrete fix without changing functionality: in
`common/http_client.py`, in `async_request`, change the retry-exhausted
and retry-attempt warning log statements so that they no longer
interpolate `log_url` (and thus the tainted `url`). We can still compute
`log_url` if needed elsewhere, but the log string itself should not
contain `log_url`. This directly removes the tainted data from the sink
while preserving information about errors and retry behavior. No changes
are required in `common/settings.py` or `api/apps/user_app.py`, and we
do not need new imports or helpers.
Specifically:
- In `common/http_client.py`, around line 152–163, replace the two
warning logs:
- `logger.warning(f"async_request exhausted retries for {method}
{log_url}")`
- `logger.warning(f"async_request attempt {attempt + 1}/{retries + 1}
failed for {method} {log_url}; retrying in {delay:.2f}s")`
with versions that omit `{log_url}`, such as:
- `logger.warning(f"async_request exhausted retries for {method}")`
- `logger.warning(f"async_request attempt {attempt + 1}/{retries + 1}
failed for {method}; retrying in {delay:.2f}s")`
This ensures no URL-derived data flows into these warning logs,
addressing all variants of the alert, since they all trace to the same
sink.
---
_Suggested fixes powered by Copilot Autofix. Review carefully before
merging._
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Potential fix for
[https://github.com/infiniflow/ragflow/security/code-scanning/57](https://github.com/infiniflow/ragflow/security/code-scanning/57)
In general, the safest fix is to ensure that any logging of request URLs
from `async_request` (and similar helpers) cannot include secrets. This
can be done by (a) suppressing logging entirely for URLs considered
sensitive, or (b) logging only a non-sensitive subset (e.g., scheme +
host + path) and never query strings or credentials.
The minimal, backward-compatible change here is to strengthen
`_redact_sensitive_url_params` and `_is_sensitive_url` / the logging
call so that we never log query parameters at all. Instead of logging
the full URL (with redacted query), we can log only
`scheme://netloc/path` and optionally strip userinfo. This retains
useful observability (which endpoint, which method, response code,
timing) while guaranteeing that no secrets in query strings or path
segments appear in logs. Concretely:
- Update `_redact_sensitive_url_params` to *not* include the query
string in the returned value, and to drop any embedded userinfo
(`username:password@host`).
- Continue to wrap logging in a “sensitive URL” guard, but now the
redaction routine itself ensures no secrets from query are present.
- Leave callers (e.g., `github_callback`, `feishu_callback`) unchanged,
since they only pass URLs and do not control the logging behavior
directly.
All changes are confined to `common/http_client.py` inside the provided
snippet. No new imports are necessary.
_Suggested fixes powered by Copilot Autofix. Review carefully before
merging._
---------
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
### What problem does this PR solve?
Fix: update RAGFlow SDK for consistency #12059
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
issue:
[#11618](https://github.com/infiniflow/ragflow/issues/11618)
change:
enhance Excel image extraction with vision-based descriptions
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Add optional cache busting for image
#12003
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Add support for multiple Retrieval tools under an agent
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Document list and filter supports metadata filtering.
**OR within the same field, AND across different fields**
Example 1 (multi-field AND):
```markdown
Doc1 metadata: { "a": "b", "as": ["a", "b", "c"] }
Doc2 metadata: { "a": "x", "as": ["d"] }
Query:
metadata = {
"a": ["b"],
"as": ["d"]
}
Result:
Doc1 matches a=b but not as=d → excluded
Doc2 matches as=d but not a=b → excluded
Final result: empty
```
Example 2 (same field OR):
```markdown
Doc1 metadata: { "as": ["a", "b", "c"] }
Doc2 metadata: { "as": ["d"] }
Query:
metadata = {
"as": ["a", "d"]
}
Result:
Doc1 matches as=a → included
Doc2 matches as=d → included
Final result: Doc1 + Doc2
```
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
After a file in the file list is associated with a knowledge base, the
knowledge base document ID is returned
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feature: Implement metadata functionality
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Images appearing consecutively in the dialogue are merged and
displayed in a carousel. #10427
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
## What problem does this PR solve?
Adds AI Badgr as an optional LLM provider in RAGFlow. Users can use AI
Badgr for chat completions and embeddings via its OpenAI-compatible API.
**Background:**
- AI Badgr provides OpenAI-compatible endpoints (`/v1/chat/completions`,
`/v1/embeddings`, `/v1/models`)
- Previously, RAGFlow didn't support AI Badgr
- This PR adds support following the existing provider pattern (e.g.,
CometAPI, DeerAPI)
**Implementation details:**
- Added AI Badgr to the provider registry and configuration
- Supports chat completions (via LiteLLMBase) and embeddings (via
AIBadgrEmbed)
- Uses standard API key authentication
- Base URL: `https://aibadgr.com/api/v1`
- Environment variables: `AIBADGR_API_KEY`, `AIBADGR_BASE_URL`
(optional)
## Type of change
- [x] New Feature (non-breaking change which adds functionality)
This is a new feature that adds support for a new provider without
changing existing functionality.
---------
Co-authored-by: michaelmanley <55236695+michaelbrinkworth@users.noreply.github.com>
### What problem does this PR solve?
Feat: Display error messages from intermediate nodes. #10427
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Remove HMAC from the webhook #10427
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Chats completions API supports metadata filtering.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
issue:
https://github.com/infiniflow/ragflow/issues/10427https://github.com/infiniflow/ragflow/issues/8115
change:
- Support for Multiple HTTP Methods (POST / GET / PUT / PATCH / DELETE /
HEAD)
- Security Validation
1. max_body_size
2. IP whitelist
3. rate limit
4. token / basic / jwt authentication
- File Upload Support
- Unified Content-Type Handling
- Full Schema-Based Extraction & Type Validation
- Two Execution Modes: Immediately / Streaming
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
When there are multiple files with the same name the file would just
duplicate, making it hard to distinguish between the different files.
Now if there are multiple files with the same name, they will be named
after their folder path in the storage unit.
This was done for the webdav connector and with this PR also for Notion,
Confluence and S3 Storage.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
Contribution by RAGcon GmbH, visit us [here](https://www.ragcon.ai/)
### What problem does this PR solve?
Feat: bedrock iam authentication #12008
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Trace information can be returned by the agent completion API.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)