ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-01-31 15:45:08 +08:00

Author	SHA1	Message	Date
Tuan Le	7353070f49	Adds retrieval result fields to Chunk (#8478 ) ### What problem does this PR solve? This PR adds fields to the `Chunk` class to store retrieval results like similarity scores, term similarity, vector similarity, positions, and document type. This allows the chunk object to hold all the information needed when returning search results from the vector database. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-06-25 16:53:15 +08:00
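The commit describes a chunk object that also carries retrieval results. Below is a minimal sketch of such a data structure as a Python dataclass. The class and field names (`Chunk`, `similarity`, `term_similarity`, `vector_similarity`, `positions`, `doc_type`) are assumptions for illustration, not the SDK's actual definition.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Chunk:
    """Hypothetical chunk record that can also hold retrieval results."""

    id: str
    content: str
    document_id: str = ""
    # Retrieval-result fields: None means the chunk did not come from a search.
    similarity: Optional[float] = None
    term_similarity: Optional[float] = None
    vector_similarity: Optional[float] = None
    # Position entries (format assumed, e.g. page plus bounding-box coordinates).
    positions: List[List[int]] = field(default_factory=list)
    doc_type: str = ""


hit = Chunk(
    id="c1",
    content="RAGFlow is an open-source RAG engine.",
    similarity=0.82,
    term_similarity=0.75,
    vector_similarity=0.85,
    positions=[[1, 10, 200, 30, 60]],
)
print(hit.similarity, hit.vector_similarity)
```

`default_factory=list` gives each chunk its own `positions` list, so appending to one chunk never affects another.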
Liu An	dac5bcdf17	Fix: Enforce default embedding model in create_dataset / update_dataset (#8486 ) ### What problem does this PR solve? Previous: - Defaulted to hardcoded model 'BAAI/bge-large-zh-v1.5@BAAI' - Did not respect user-configured default embedding_model Now: - Correctly prioritizes user-configured default embedding_model Other: - Make embedding_model optional in CreateDatasetReq with proper None handling - Add default embedding model fallback in dataset update when empty - Enhance validation utils to handle None values and string normalization - Update SDK default embedding model to None to match API changes - Adjust related test cases to reflect new validation rules ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-25 16:41:32 +08:00
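This commit sets a precedence order: use the explicitly requested model, otherwise the user-configured default, with no hardcoded fallback. Blank strings are also treated as missing. A minimal sketch of that order follows. The helper names and model identifiers are hypothetical; this is not RAGFlow's actual validation code.

```python
from typing import Optional


def normalize_optional_str(value: Optional[str]) -> Optional[str]:
    """Strip whitespace and treat empty strings as missing (None)."""
    if value is None:
        return None
    value = value.strip()
    return value or None


def resolve_embedding_model(requested: Optional[str], user_default: Optional[str]) -> str:
    """Prefer the requested model, then the user's configured default.

    Hypothetical helper: no hardcoded model is used as a last resort.
    """
    model = normalize_optional_str(requested) or normalize_optional_str(user_default)
    if model is None:
        raise ValueError("No embedding model requested and no default configured")
    return model


print(resolve_embedding_model(None, "text-embedding-3-small@OpenAI"))  # text-embedding-3-small@OpenAI
print(resolve_embedding_model("  bge-m3@Ollama ", "text-embedding-3-small@OpenAI"))  # bge-m3@Ollama
```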
Rainman	340354b79c	fix the error 'Unknown field for GenerationConfig: max_tokens' when u… (#8473 ) ### What problem does this PR solve?

https://github.com/infiniflow/ragflow/issues/8324

docker image version: v0.19.1

The `_clean_conf` method was not called in the `_chat` and `chat_streamly` methods of the `GeminiChat` class. This caused the error "Unknown field for GenerationConfig: max_tokens" whenever the default LLM config included the `max_tokens` parameter.

Buggy code (ragflow/rag/llm/chat_model.py):

```python
class GeminiChat(Base):
    def __init__(self, key, model_name, base_url=None, **kwargs):
        super().__init__(key, model_name, base_url=base_url, **kwargs)
        from google.generativeai import GenerativeModel, client

        client.configure(api_key=key)
        _client = client.get_default_generative_client()
        self.model_name = "models/" + model_name
        self.model = GenerativeModel(model_name=self.model_name)
        self.model._client = _client

    def _clean_conf(self, gen_conf):
        for k in list(gen_conf.keys()):
            if k not in ["temperature", "top_p"]:
                del gen_conf[k]
        return gen_conf

    def _chat(self, history, gen_conf):
        from google.generativeai.types import content_types

        system = history[0]["content"] if history and history[0]["role"] == "system" else ""
        hist = []
        for item in history:
            if item["role"] == "system":
                continue
            hist.append(deepcopy(item))
            item = hist[-1]
            if "role" in item and item["role"] == "assistant":
                item["role"] = "model"
            if "role" in item and item["role"] == "system":
                item["role"] = "user"
            if "content" in item:
                item["parts"] = item.pop("content")

        if system:
            self.model._system_instruction = content_types.to_content(system)
        response = self.model.generate_content(hist, generation_config=gen_conf)
        ans = response.text
        return ans, response.usage_metadata.total_token_count

    def chat_streamly(self, system, history, gen_conf):
        from google.generativeai.types import content_types

        if system:
            self.model._system_instruction = content_types.to_content(system)
        # ❌ _clean_conf was not called, and this inline filter keeps "max_tokens"
        for k in list(gen_conf.keys()):
            if k not in ["temperature", "top_p", "max_tokens"]:
                del gen_conf[k]
        for item in history:
            if "role" in item and item["role"] == "assistant":
                item["role"] = "model"
            if "content" in item:
                item["parts"] = item.pop("content")
        ans = ""
        try:
            response = self.model.generate_content(history, generation_config=gen_conf, stream=True)
            for resp in response:
                ans = resp.text
                yield ans

            yield response._chunks[-1].usage_metadata.total_token_count
        except Exception as e:
            yield ans + "\n**ERROR**: " + str(e)

        yield 0
```

Fixed code, calling `_clean_conf`:

```python
class GeminiChat(Base):
    def __init__(self, key, model_name, base_url=None, **kwargs):
        super().__init__(key, model_name, base_url=base_url, **kwargs)
        from google.generativeai import GenerativeModel, client

        client.configure(api_key=key)
        _client = client.get_default_generative_client()
        self.model_name = "models/" + model_name
        self.model = GenerativeModel(model_name=self.model_name)
        self.model._client = _client

    def _clean_conf(self, gen_conf):
        for k in list(gen_conf.keys()):
            if k not in ["temperature", "top_p"]:
                del gen_conf[k]
        return gen_conf

    def _chat(self, history, gen_conf):
        from google.generativeai.types import content_types

        # ✅ call _clean_conf to remove unsupported parameters
        gen_conf = self._clean_conf(gen_conf)
        system = history[0]["content"] if history and history[0]["role"] == "system" else ""
        hist = []
        for item in history:
            if item["role"] == "system":
                continue
            hist.append(deepcopy(item))
            item = hist[-1]
            if "role" in item and item["role"] == "assistant":
                item["role"] = "model"
            if "role" in item and item["role"] == "system":
                item["role"] = "user"
            if "content" in item:
                item["parts"] = item.pop("content")

        if system:
            self.model._system_instruction = content_types.to_content(system)
        response = self.model.generate_content(hist, generation_config=gen_conf)
        ans = response.text
        return ans, response.usage_metadata.total_token_count

    def chat_streamly(self, system, history, gen_conf):
        from google.generativeai.types import content_types

        # ✅ call _clean_conf to remove unsupported parameters
        gen_conf = self._clean_conf(gen_conf)
        if system:
            self.model._system_instruction = content_types.to_content(system)
        # ✅ removed the duplicate inline filtering loop ("for k in list(gen_conf.keys()): ...")
        for item in history:
            if "role" in item and item["role"] == "assistant":
                item["role"] = "model"
            if "content" in item:
                item["parts"] = item.pop("content")
        ans = ""
        try:
            response = self.model.generate_content(history, generation_config=gen_conf, stream=True)
            for resp in response:
                ans = resp.text
                yield ans

            yield response._chunks[-1].usage_metadata.total_token_count
        except Exception as e:
            yield ans + "\n**ERROR**: " + str(e)

        yield 0
```

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-06-25 16:23:35 +08:00
balibabu	c4b58ed195	Feat: Filter the query variable drop-down box options by type #3221 (#8485 ) ### What problem does this PR solve? Feat: Filter the query variable drop-down box options by type #3221 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-06-25 16:23:20 +08:00
Yongteng Lei	b705ff08fe	Refa: improve GraphRAG similarity sensitivity to numeric differences (#8479 ) ### What problem does this PR solve? Improve GraphRAG similarity sensitivity to numeric differences. #8444. ### Type of change - [x] Refactoring	2025-06-25 16:20:59 +08:00
Tuan Le	d632046032	Fixes typo in variable name (#8476 ) ### What problem does this PR solve? This PR fixes a typo in the variable name `succesfulFilenames`, correcting it to `successfulFilenames`. This ensures consistency and avoids potential errors due to the misspelled variable. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-25 15:36:54 +08:00
ruansheng	de8ba7298c	RAGFlow service_conf using .env variable (#8454 ) ### What problem does this PR solve? Fix: when using external components, it was impossible to specify the port, because the variables in `docker/.env` were not referenced by `docker/service_conf.yaml.template`. `382d2d0373/docker/.env (L85)` ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-25 15:24:37 +08:00
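The fix makes `service_conf.yaml.template` reference the values defined in `docker/.env`. The Python sketch below illustrates the general idea of expanding shell-style `${VAR:-default}` placeholders from an environment mapping. The placeholder syntax, the YAML keys, and the `render_template` helper are assumptions for illustration, not RAGFlow's actual entrypoint logic.

```python
import re
from typing import Mapping

# Matches ${VAR} or ${VAR:-default} (placeholder syntax assumed).
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def render_template(text: str, env: Mapping[str, str]) -> str:
    """Replace placeholders with values from env, falling back to their defaults."""

    def substitute(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        value = env.get(name)
        if value:  # like shell ':-', an empty value also falls back to the default
            return value
        return default if default is not None else ""

    return _PLACEHOLDER.sub(substitute, text)


template = "mysql:\n  host: '${MYSQL_HOST:-mysql}'\n  port: ${MYSQL_PORT:-3306}\n"
print(render_template(template, {"MYSQL_HOST": "db.internal", "MYSQL_PORT": "5455"}))
```

Because the port comes from the environment rather than being fixed in the template, an external MySQL instance on a non-default port can be configured from `.env` alone.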
balibabu	ece27c66e9	Feat: Insert the node data of the bottom subagent into the tool array of the head agent #3221 (#8471 ) ### What problem does this PR solve? Feat: Insert the node data of the bottom subagent into the tool array of the head agent #3221 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-06-25 15:24:22 +08:00
liuzhenghua	5256980ffb	Fix: Solve the OOM issue when passing large PDF files while using QA chunking method. (#8464 ) ### What problem does this PR solve? Using the QA chunking method with a large PDF (e.g., 300+ pages) may lead to OOM in the ragflow-worker module. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-25 10:25:45 +08:00
Yongteng Lei	f21827bc28	Feat: add MCP streamable-http transport (#8449 ) ### What problem does this PR solve? Add MCP streamable-http transport. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-06-25 10:01:54 +08:00
Stephen Hu	8d9d2cc0a9	Fix: in some cases a task returns without setting progress (#8469 ) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/8466 After going through the code, the current logic is: when `do_handle_task` raises an exception, `handle_task` sets the progress, but in some cases `do_handle_task` just returns without setting the correct progress. In those cases the Redis stream message is acked while the task is still shown as running. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-06-25 09:58:55 +08:00
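The gap described here is an early return inside `do_handle_task` that skips the final progress update, even though the Redis message is still acknowledged. Below is a minimal, hypothetical sketch of one way to close that gap: wrap the handler so that every exit path records a final state before acking. The signature, the `set_progress`/`ack` callbacks, and the progress convention (1.0 = done, negative = failed) are assumptions for illustration. The actual fix may instead set progress at each early return.

```python
import logging
from typing import Callable, Dict

ProgressFn = Callable[..., None]


def handle_task(
    task: Dict,
    do_handle_task: Callable[[Dict, ProgressFn], None],
    set_progress: ProgressFn,
    ack: Callable[[], None],
) -> None:
    """Hypothetical wrapper: ensure a final progress is recorded before acking."""
    finished = False

    def tracked_progress(prog: float, msg: str = "") -> None:
        nonlocal finished
        # Assumed convention: 1.0 means done, a negative value means failed.
        if prog >= 1.0 or prog < 0:
            finished = True
        set_progress(prog, msg)

    try:
        do_handle_task(task, tracked_progress)
        if not finished:
            # do_handle_task returned early without reporting a final state.
            set_progress(-1, "Task returned without reporting a final state")
    except Exception as exc:
        logging.exception("Task %s failed", task.get("id"))
        set_progress(-1, f"[ERROR] {exc}")
    finally:
        ack()  # the stream message is acked only after progress is recorded
```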
Yongteng Lei	af6850c8d8	Feat: add MCP dashboard operations (#8460 ) ### What problem does this PR solve? Add MCP server dashboard operations. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-06-25 09:26:04 +08:00
writinwaters	18fd7983f1	Docs: exporting created knowledge graphs is not supported (#8465 ) ### What problem does this PR solve? ### Type of change - [x] Documentation Update	2025-06-25 09:21:54 +08:00
HaiyangP	d6a941ebf5	Fix the bug of long type value overflow (#8313 ) ### What problem does this PR solve? This PR fixes #8271 by widening a column from int to float when any value in that column is outside the long type range. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-24 18:18:30 +08:00
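Here is a minimal sketch of the rule this commit describes, in plain Python: if any value in an integer column falls outside the signed 64-bit ("long") range, the whole column is widened to float. The function name is hypothetical, and the actual fix operates on RAGFlow's own column data, which is not shown here.

```python
from typing import List, Union

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1

Number = Union[int, float]


def coerce_int_column(values: List[int]) -> List[Number]:
    """Keep an integer column as int unless some value overflows int64.

    Widening to float lets the column be stored, at the cost of exactness
    for very large values.
    """
    if any(v < INT64_MIN or v > INT64_MAX for v in values):
        return [float(v) for v in values]
    return list(values)


print(coerce_int_column([1, 2, 3]))   # [1, 2, 3]
print(coerce_int_column([1, 2**64]))  # [1.0, 1.8446744073709552e+19]
```

The whole column is converted, not only the offending value, so every value in the column keeps the same type.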
balibabu	1c68c9ebd6	Feat: Add IterationNode component #3221 (#8461 ) ### What problem does this PR solve? Feat: Add IterationNode component #3221 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-06-24 18:01:30 +08:00
WuWeiFlow	bc1b837616	FIX:Saving an RGBA image directly as JPEG will cause an error. If the… (#8399 ) Saving an RGBA image directly as JPEG will cause an error. If the image is in RGBA mode, convert it to RGB mode before saving it in JPG format. ### What problem does this PR solve? During document parsing in the knowledge base, we occasionally encounter the error 'cannot write mode RGBA as JPEG.' This occurs because images in RGBA mode cannot be directly saved as JPEG. They must be converted first before saving. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-24 18:01:13 +08:00
Liu An	9f9acf0c49	Test: Add document app tests (#8456 ) ### What problem does this PR solve? - Add new test suite for document app with create/list/parse/upload/remove tests - Update API URLs to use version variable from config in HTTP and web API tests ### Type of change - [x] Add test cases	2025-06-24 17:26:16 +08:00
Stephen Hu	382d2d0373	Refactor: Improve insert file logic (#8445 ) ### What problem does this PR solve? Before the refactor: 1. create the file record; 2. add the file to blob storage. If an exception occurred at step 2, the database would contain a file record with no related blob, which introduces bugs. After the refactor: 1. add the file to blob storage; 2. create the file record. If step 1 succeeds but step 2 fails, only a dirty (orphaned) blob remains in blob storage, which users will not notice. ### Type of change - [x] Refactoring	2025-06-24 13:17:22 +08:00
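The ordering argument in this commit can be illustrated with a small, self-contained sketch. The blob is written first and the database record second. If the second step fails, only an orphaned blob remains, not a dangling record. The in-memory store and table classes and the `insert_file` signature are hypothetical stand-ins, not RAGFlow's storage APIs.

```python
from typing import Dict


class InMemoryBlobStore:
    """Hypothetical stand-in for the blob storage backend."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self.blobs[key] = data


class InMemoryFileTable:
    """Hypothetical stand-in for the file table; can simulate a DB failure."""

    def __init__(self, fail: bool = False) -> None:
        self.rows: Dict[str, dict] = {}
        self.fail = fail

    def insert(self, key: str, name: str, size: int) -> None:
        if self.fail:
            raise RuntimeError("database write failed")
        self.rows[key] = {"name": name, "size": size}


def insert_file(blobs: InMemoryBlobStore, table: InMemoryFileTable,
                key: str, name: str, data: bytes) -> None:
    # 1. Write the blob first: if this fails, no database record exists yet.
    blobs.put(key, data)
    # 2. Then create the record: if this fails, only an orphaned blob remains,
    #    which is invisible to users and could be cleaned up later.
    table.insert(key, name, len(data))


store, files = InMemoryBlobStore(), InMemoryFileTable()
insert_file(store, files, "k1", "a.txt", b"abc")
print(files.rows["k1"])  # {'name': 'a.txt', 'size': 3}
```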
balibabu	07545fbfd3	Feat: Delete the agent and tool nodes downstream of the agent node #3221 (#8450 ) ### What problem does this PR solve? Feat: Delete the agent and tool nodes downstream of the agent node #3221 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-06-24 11:33:01 +08:00
Rainman	49d67cbcb7	fix a bug when using huggingface embedding api (#8432 ) ### What problem does this PR solve?

image_version: v0.19.1

This PR fixes a bug in the `HuggingFaceEmbed` embedding API method that caused `AssertionError: assert len(vects) == len(docs)` during document embedding.

#### Problem

The `HuggingFaceEmbed.encode()` method had an early `return` statement inside the `for` loop. As a result, it returned after processing only the first text instead of processing every text in the input list.

Error message:

```
AssertionError: assert len(vects) == len(docs)  # input chunks != embedded vectors from embedding api
File "/ragflow/rag/svr/task_executor.py", line 442, in embedding
```

Buggy code (/ragflow/rag/llm/embedding_model.py):

```python
class HuggingFaceEmbed(Base):
    def __init__(self, key, model_name, base_url=None):
        if not model_name:
            raise ValueError("Model name cannot be None")
        self.key = key
        self.model_name = model_name.split("___")[0]
        self.base_url = base_url or "http://127.0.0.1:8080"

    def encode(self, texts: list):
        embeddings = []
        for text in texts:
            response = requests.post(...)
            if response.status_code == 200:
                try:
                    embedding = response.json()
                    embeddings.append(embedding[0])
                    # ❌ Early return
                    return np.array(embeddings), sum([num_tokens_from_string(text) for text in texts])
                except Exception as _e:
                    log_exception(_e, response)
            else:
                raise Exception(...)
```

Fixed code (this function is rolled back to the v0.19.0 version):

```python
class HuggingFaceEmbed(Base):
    def __init__(self, key, model_name, base_url=None):
        if not model_name:
            raise ValueError("Model name cannot be None")
        self.key = key
        self.model_name = model_name.split("___")[0]
        self.base_url = base_url or "http://127.0.0.1:8080"

    def encode(self, texts: list):
        embeddings = []
        for text in texts:
            response = requests.post(...)
            if response.status_code == 200:
                embedding = response.json()
                embeddings.append(embedding[0])  # ✅ Only append, no return
            else:
                raise Exception(...)
        return np.array(embeddings), sum([num_tokens_from_string(text) for text in texts])  # ✅ Return after processing all
```

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-24 09:35:02 +08:00
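For comparison with the fixed `encode()` in this commit, here is a dependency-free sketch of the same loop structure. It appends inside the loop and returns only after every text is processed, so the number of vectors always equals the number of inputs. `encode_all`, `fake_embed`, and the whitespace-based token count (a stand-in for `num_tokens_from_string`) are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def encode_all(
    texts: Sequence[str], embed_one: Callable[[str], List[float]]
) -> Tuple[List[List[float]], int]:
    """Embed every text and return only once the whole batch is processed."""
    embeddings: List[List[float]] = []
    for text in texts:
        embeddings.append(embed_one(text))  # append only; no return inside the loop
    # Crude token estimate standing in for num_tokens_from_string.
    token_count = sum(len(t.split()) for t in texts)
    return embeddings, token_count


def fake_embed(text: str) -> List[float]:
    # Hypothetical embedding: a 2-d vector derived from the text length.
    return [float(len(text)), 1.0]


vects, tokens = encode_all(["hello world", "ragflow"], fake_embed)
print(len(vects), tokens)  # 2 3
```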
balibabu	96b63cc81f	Feat: Use the message_id returned by the interface as the id of the reply message #3221 (#8434 ) ### What problem does this PR solve? Feat: Use the message_id returned by the interface as the id of the reply message #3221 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-06-24 09:34:33 +08:00
Song Fuchang	fd7ac17605	Feat: Scratch MCP tool calling support. (#8263 ) ### What problem does this PR solve? This is a cherry-pick from #7781 as requested. ### Type of change - [x] New Feature (non-breaking change which adds functionality) Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-06-23 17:45:35 +08:00
writinwaters	e9c6891e24	Docs: Miscellaneous editorial updates (#8430 ) ### What problem does this PR solve? ### Type of change - [x] Documentation Update	2025-06-23 17:45:20 +08:00
Yongteng Lei	03656da4dd	Refa: upgrade MCP SDK to v1.9.4 (#8421 ) ### What problem does this PR solve? Upgrade MCP SDK to v1.9.4 (latest). ### Type of change - [x] Refactoring	2025-06-23 16:53:59 +08:00
Jason	0427eebe94	Update .env: default to the v0.19.1-slim edition (#8412 ) ### What problem does this PR solve? Update .env: default to the v0.19.1-slim edition. ### Type of change - [x] Other (please describe): Update .env: default to the v0.19.1-slim edition	2025-06-23 16:00:14 +08:00
Liu An	244d8a47b9	Fix: AzureChat model code (#8426 ) ### What problem does this PR solve? - Simplify AzureChat constructor by passing base_url directly - Clean up spacing and formatting in chat_model.py - Remove redundant parentheses and improve code consistency - #8423 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-23 15:59:25 +08:00
Yesid Cano Castro	4760e317d5	Feat: Add HTTPS setup instructions and configuration for Nginx (#8401 ) ### What problem does this PR solve? ### Type of change: Documentation Update/Refactoring #### Summary Adds HTTPS/SSL configuration guide/example to enable secure RAGFlow deployments with proper certificate management. #### Changes - New HTTPS Setup Section: Step-by-step guide for SSL certificate configuration - Let's Encrypt Integration: Complete Certbot setup instructions - Docker Configuration: Volume mapping examples for certificates #### Key Features - Prerequisites checklist - Docker Compose configuration examples - Support for both Let's Encrypt and existing certificates #### Files Modified - `README.md` - `ragflow.https.conf` (new file)	2025-06-23 15:36:15 +08:00
balibabu	71afebb2c0	Feat: The delete button is displayed only when the cursor is hovered over the connection line #3221 (#8422 ) ### What problem does this PR solve? Feat: The delete button is displayed only when the cursor is hovered over the connection line #3221 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-06-23 15:27:34 +08:00
kira-offgrid	f0e0783618	Fix: Database Query Vulnerable to Injection Attacks in rag/utils/opendal_conn.py (#8408 ) Context and Purpose: This PR automatically remediates a security vulnerability: - Description: Detected possible formatted SQL query. Use parameterized queries instead. - Rule ID: python.lang.security.audit.formatted-sql-query.formatted-sql-query - Severity: HIGH - File: rag/utils/opendal_conn.py - Lines Affected: 98 - 98 This change is necessary to protect the application from potential security risks associated with this vulnerability. Solution Implemented: The automated remediation process has applied the necessary changes to the affected code in `rag/utils/opendal_conn.py` to resolve the identified issue. Please review the changes to ensure they are correct and integrate as expected.	2025-06-23 14:54:25 +08:00
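This remediation replaces a formatted SQL string with a parameterized query. The sketch below uses Python's built-in `sqlite3` only to illustrate the difference. The table and data are assumptions, and the actual code in `rag/utils/opendal_conn.py` is not reproduced here. Placeholder style varies by driver: `sqlite3` uses `?`, while many MySQL drivers use `%s`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (name TEXT, size INTEGER)")
conn.executemany("INSERT INTO files VALUES (?, ?)", [("report.pdf", 1024), ("secret.pdf", 2048)])

user_input = "x' OR '1'='1"

# Unsafe: formatting the value into the SQL text lets it rewrite the query.
unsafe_sql = f"SELECT COUNT(*) FROM files WHERE name = '{user_input}'"
unsafe_count = conn.execute(unsafe_sql).fetchone()[0]
print(unsafe_count)  # 2: the injected OR clause matches every row

# Safe: the value is bound separately and compared as a literal string.
safe_count = conn.execute(
    "SELECT COUNT(*) FROM files WHERE name = ?", (user_input,)
).fetchone()[0]
print(safe_count)  # 0
```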
Kevin Hu	d4e6e2bd21	Fix: doc_aggs issue. (#8418 ) ### What problem does this PR solve? #8406 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-23 14:54:01 +08:00
balibabu	81a4c0698c	Feat: Solved the conflict between the Handle click and drag events of the canvas node #3221 (#8413 ) ### What problem does this PR solve? Feat: Solved the conflict between the Handle click and drag events of the canvas node #3221 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-06-23 14:36:01 +08:00
Kevin Hu	83e23f1e8a	Fix: rank feature score should be greater than 0. (#8416 ) ### What problem does this PR solve? #8414 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-23 14:10:13 +08:00
Stephen Hu	794a4102c2	Fix: multiple problems with document parsing via API (#8407 ) ### What problem does this PR solve? #8391 #8404 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-06-23 13:08:11 +08:00
writinwaters	3a50908946	Docs: Added v0.19.1 release notes (#8398 ) ### What problem does this PR solve? ### Type of change - [x] Documentation Update	2025-06-23 09:51:28 +08:00
balibabu	db9e91152d	Feat: Add Tavily operator #3221 (#8400 ) ### What problem does this PR solve? Feat: Add Tavily operator #3221 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-06-23 09:51:09 +08:00
balibabu	887651e5fa	Fix: Fixed the issue where tag content would overflow the container #8392 (#8393 ) ### What problem does this PR solve? Fix: Fixed the issue where tag content would overflow the container #8392 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) v0.19.1	2025-06-20 16:33:46 +08:00
BlueYu-0221	bb3d3f921a	Refa: Pdf 2 Slices page to new style (#8386 ) ### What problem does this PR solve? Refactor Pdf 2 Slices page to new style ### Type of change - [X] Refactoring	2025-06-20 16:18:37 +08:00
balibabu	8695d60055	Feat: Improve the tavily form #3221 (#8390 ) ### What problem does this PR solve? Feat: Improve the tavily form #3221 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-06-20 16:18:22 +08:00
Yongteng Lei	936a91c5fe	Fix: code debugging may be corrupted by previous answers in the history (#8385 ) ### What problem does this PR solve? Fix code debugging being corrupted by previous answers in the history. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-20 14:23:02 +08:00
Stephen Hu	ef5e7d8c44	Fix:embedding_model class SILICONFLOWEmbed(Base)Function reusing json (#8378 ) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/8360 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-20 11:13:00 +08:00
Yongteng Lei	80f1f2723c	Docs: add curl example for interacting with the RAGFlow MCP server (#8372 ) ### What problem does this PR solve? Add curl example for interacting with the RAGFlow MCP server. Special thanks to @writinwaters for his expert refinement. ### Type of change - [x] Documentation Update --------- Co-authored-by: writinwaters <cai.keith@gmail.com>	2025-06-20 10:18:17 +08:00
balibabu	c4e081d4c6	Feat: Synchronize the data of the tavily form to the canvas node #3221 (#8377 ) ### What problem does this PR solve? Feat: Synchronize the data of the tavily form to the canvas node #3221 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-06-20 10:16:32 +08:00
balibabu	972fd919b4	Feat: Deleting the last tool of the agent will delete the tool node #3221 (#8376 ) ### What problem does this PR solve? Feat: Deleting the last tool of the agent will delete the tool node #3221 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-06-19 19:23:16 +08:00
BlueYu-0221	fa3e90c72e	Refactor: Datasets UI #3221 (#8349 ) ### What problem does this PR solve? Refactor Datasets UI #3221. ### Type of change - [X] New Feature (non-breaking change which adds functionality)	2025-06-19 16:40:30 +08:00
balibabu	403efe81a1	Feat: Save the agent tool data to the node #3221 (#8364 ) ### What problem does this PR solve? Feat: Save the agent tool data to the node #3221 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-06-19 16:38:59 +08:00
Liu An	7e87eb2e23	Docs: Update version references to v0.19.1 in READMEs and docs (#8366 ) ### What problem does this PR solve? - Update Docker image version badges and references from v0.19.0 to v0.19.1 - Modify version mentions in all localized README files (id, ja, ko, pt_br, tzh, zh) - Update version in docker/README.md and related documentation files - Includes updates to Helm values and Python SDK dependencies ### Type of change - [x] Documentation Update	2025-06-19 14:39:27 +08:00
Liu An	9077ee8d15	Fix: desc parameter parsing (#8362 ) ### What problem does this PR solve? - Correct boolean parsing for 'desc' parameter in document_app.py to properly handle string values ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-19 14:22:56 +08:00
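Query parameters arrive as strings, and `bool("false")` is `True` in Python, which is why `desc` needs explicit parsing. Below is a minimal sketch of string-aware boolean parsing. The `parse_bool` name and its default of `True` are assumptions, not necessarily how `document_app.py` implements it.

```python
def parse_bool(value, default: bool = True) -> bool:
    """Interpret a query-string flag such as ?desc=false.

    Hypothetical helper: None falls back to the default; strings are compared
    explicitly because any non-empty string is truthy in Python.
    """
    if value is None:
        return default
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in ("true", "1", "yes")


print(bool("false"))        # True: the pitfall
print(parse_bool("false"))  # False
print(parse_bool("True"))   # True
print(parse_bool(None))     # True (default)
```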
changqingla	4784aa5b0b	fix: List Chunks API fails to return the correct document status. (#8347 ) ### What problem does this PR solve? The existing /api/v1/datasets/{dataset_id}/documents/{document_id}/chunks endpoint fails to accurately return a document's chunk status. Even when a chunk is explicitly marked as unavailable, the API still returns true. ![img_v3_02nc_3458a1b7-609e-4f20-8cb7-2156a489848g](https://github.com/user-attachments/assets/ab3b8f69-1284-49c1-8af3-bdfae3416583) ![img_v3_02nc_82f1d96e-7596-4def-ba75-5a2bd10d56cg](https://github.com/user-attachments/assets/a8a4162b-b50d-4dfc-af72-e1d7812a0a93) Co-authored-by: zhoudeyong <zhoudeyong@idr.ai>	2025-06-19 11:12:53 +08:00
Kevin Hu	8f3fe63d73	Fix: duplicated task (#8358 ) ### What problem does this PR solve? ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-19 11:12:29 +08:00
RyanFernandes23	c8b1790c92	Fix typo in dataset name length error message (#8351 ) ### What problem does this PR solve? Fixes a minor grammar issue in a user-facing error message. The original message said "large than" instead of the correct comparative form "larger than". Just a quick fix I noticed while reading the code. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-19 09:54:30 +08:00

3477 Commits