ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-02-06 10:35:06 +08:00

Author	SHA1	Message	Date
Yingfeng	bfef96d56e	Potential fix for code scanning alert no. 58: Clear-text logging of sensitive information (#12070 ) Potential fix for [https://github.com/infiniflow/ragflow/security/code-scanning/58](https://github.com/infiniflow/ragflow/security/code-scanning/58) General approach: avoid logging potentially sensitive URLs (especially at warning level) or ensure they are fully and robustly redacted before logging. Since this client is shared and used with OAuth endpoints, the safest minimal-change fix is to stop including the URL in warning logs (retries exhausted and retry attempts) and only log the HTTP method and a generic message. Debug logs can continue using the existing redaction helper for non-sensitive URLs if desired. Best concrete fix without changing functionality: in `common/http_client.py`, in `async_request`, change the retry-exhausted and retry-attempt warning log statements so that they no longer interpolate `log_url` (and thus the tainted `url`). We can still compute `log_url` if needed elsewhere, but the log string itself should not contain `log_url`. This directly removes the tainted data from the sink while preserving information about errors and retry behavior. No changes are required in `common/settings.py` or `api/apps/user_app.py`, and we do not need new imports or helpers. Specifically: - In `common/http_client.py`, around line 152–163, replace the two warning logs: - `logger.warning(f"async_request exhausted retries for {method} {log_url}")` - `logger.warning(f"async_request attempt {attempt + 1}/{retries + 1} failed for {method} {log_url}; retrying in {delay:.2f}s")` with versions that omit `{log_url}`, such as: - `logger.warning(f"async_request exhausted retries for {method}")` - `logger.warning(f"async_request attempt {attempt + 1}/{retries + 1} failed for {method}; retrying in {delay:.2f}s")` This ensures no URL-derived data flows into these warning logs, addressing all variants of the alert, since they all trace to the same sink. --- _Suggested fixes powered by Copilot Autofix. Review carefully before merging._ Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>	2025-12-22 13:31:25 +08:00
Yingfeng	74adf3d59c	Potential fix for code scanning alert no. 57: Clear-text logging of sensitive information (#12071 ) Potential fix for [https://github.com/infiniflow/ragflow/security/code-scanning/57](https://github.com/infiniflow/ragflow/security/code-scanning/57) In general, the safest fix is to ensure that any logging of request URLs from `async_request` (and similar helpers) cannot include secrets. This can be done by (a) suppressing logging entirely for URLs considered sensitive, or (b) logging only a non-sensitive subset (e.g., scheme + host + path) and never query strings or credentials. The minimal, backward-compatible change here is to strengthen `_redact_sensitive_url_params` and `_is_sensitive_url` / the logging call so that we never log query parameters at all. Instead of logging the full URL (with redacted query), we can log only `scheme://netloc/path` and optionally strip userinfo. This retains useful observability (which endpoint, which method, response code, timing) while guaranteeing that no secrets in query strings or path segments appear in logs. Concretely: - Update `_redact_sensitive_url_params` to not include the query string in the returned value, and to drop any embedded userinfo (`username:password@host`). - Continue to wrap logging in a “sensitive URL” guard, but now the redaction routine itself ensures no secrets from query are present. - Leave callers (e.g., `github_callback`, `feishu_callback`) unchanged, since they only pass URLs and do not control the logging behavior directly. All changes are confined to `common/http_client.py` inside the provided snippet. No new imports are necessary. _Suggested fixes powered by Copilot Autofix. Review carefully before merging._ --------- Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>	2025-12-22 13:31:03 +08:00
Stephen Hu	ba7e087aef	Refactor:remove useless try catch for ppt parser (#12063 ) ### What problem does this PR solve? remove useless try catch for ppt parser ### Type of change - [x] Refactoring	2025-12-22 13:09:42 +08:00
Yongteng Lei	f911aa2997	Fix: list MCP tools may block (#12067 ) ### What problem does this PR solve? List MCP tools may block. #12043 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-12-22 13:08:44 +08:00
Jin Hai	42f9ac997f	Remove Chinese comments and fix function arguments errors (#12052 ) ### What problem does this PR solve? As title ### Type of change - [x] Refactoring Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-12-22 12:59:37 +08:00
Magicbook1108	c7cf7aad4e	Fix: update RAGFlow SDK for consistency (#12065 ) ### What problem does this PR solve? Fix: update RAGFlow SDK for consistency #12059 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-12-22 11:09:56 +08:00
Stephen Hu	2118bc2556	Fix: Python SDK retrieve document_name is empty (#12062 ) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/12056 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-12-22 11:08:39 +08:00
buua436	b49eb6826b	Feat: enhance Excel image extraction with vision-based descriptions (#12054 ) ### What problem does this PR solve? issue: [#11618](https://github.com/infiniflow/ragflow/issues/11618) change: enhance Excel image extraction with vision-based descriptions ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-22 10:17:44 +08:00
Jimmy Ben Klieve	8dd2394e93	feat: add optional cache busting for image (#12055 ) ### What problem does this PR solve? Add optional cache busting for image #12003 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-22 09:36:45 +08:00
Magicbook1108	5aea82d9c4	Feat: Separate connectors from s3 (#12045 ) ### What problem does this PR solve? Feat: Separate connectors from s3 #12008 ### Type of change - [x] New Feature (non-breaking change which adds functionality) Overview: <img width="1500" alt="image" src="https://github.com/user-attachments/assets/d54fea7a-7294-4ec0-ab6c-9753b3f03a72" /> Oracle: <img width="350" alt="image" src="https://github.com/user-attachments/assets/bca140c1-33d8-4950-afdc-153407eedc46" />	2025-12-22 09:36:16 +08:00
Jimmy Ben Klieve	47005ebe10	feat: supports multiple retrieval tool under an agent (#12046 ) ### What problem does this PR solve? Add support for multiple Retrieval tools under an agent ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-22 09:35:34 +08:00
Yongteng Lei	3ee47e4af7	Feat: document list and filter supports metadata filtering (#12053 ) ### What problem does this PR solve? Document list and filter supports metadata filtering. OR within the same field, AND across different fields Example 1 (multi-field AND): ```markdown Doc1 metadata: { "a": "b", "as": ["a", "b", "c"] } Doc2 metadata: { "a": "x", "as": ["d"] } Query: metadata = { "a": ["b"], "as": ["d"] } Result: Doc1 matches a=b but not as=d → excluded Doc2 matches as=d but not a=b → excluded Final result: empty ``` Example 2 (same field OR): ```markdown Doc1 metadata: { "as": ["a", "b", "c"] } Doc2 metadata: { "as": ["d"] } Query: metadata = { "as": ["a", "d"] } Result: Doc1 matches as=a → included Doc2 matches as=d → included Final result: Doc1 + Doc2 ``` ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-22 09:35:11 +08:00
wenjuhao	55c0468ac9	Include document_id in knowledgebase info retrieval (#12041 ) ### What problem does this PR solve? After a file in the file list is associated with a knowledge base, the knowledge base document ID is returned ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-19 19:32:24 +08:00
chanx	eeb36a5ce7	Feature: Implement metadata functionality (#12049 ) ### What problem does this PR solve? Feature: Implement metadata functionality ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-19 19:13:33 +08:00
balibabu	aceca266ff	Feat: Images appearing consecutively in the dialogue are merged and displayed in a carousel. #10427 (#12051 ) ### What problem does this PR solve? Feat: Images appearing consecutively in the dialogue are merged and displayed in a carousel. #10427 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-19 19:13:18 +08:00
miguelmanlyx	d82e502a71	Add AI Badgr as OpenAI-compatible chat model provider (#12018 ) ## What problem does this PR solve? Adds AI Badgr as an optional LLM provider in RAGFlow. Users can use AI Badgr for chat completions and embeddings via its OpenAI-compatible API. Background: - AI Badgr provides OpenAI-compatible endpoints (`/v1/chat/completions`, `/v1/embeddings`, `/v1/models`) - Previously, RAGFlow didn't support AI Badgr - This PR adds support following the existing provider pattern (e.g., CometAPI, DeerAPI) Implementation details: - Added AI Badgr to the provider registry and configuration - Supports chat completions (via LiteLLMBase) and embeddings (via AIBadgrEmbed) - Uses standard API key authentication - Base URL: `https://aibadgr.com/api/v1` - Environment variables: `AIBADGR_API_KEY`, `AIBADGR_BASE_URL` (optional) ## Type of change - [x] New Feature (non-breaking change which adds functionality) This is a new feature that adds support for a new provider without changing existing functionality. --------- Co-authored-by: michaelmanley <55236695+michaelbrinkworth@users.noreply.github.com>	2025-12-19 17:45:20 +08:00
balibabu	0494b92371	Feat: Display error messages from intermediate nodes. #10427 (#12038 ) ### What problem does this PR solve? Feat: Display error messages from intermediate nodes. #10427 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-19 17:44:45 +08:00
writinwaters	8683a5b1b7	Docs: How to call MinerU as a remote service (#12004 ) ### Type of change - [x] Documentation Update	2025-12-19 17:06:32 +08:00
balibabu	4cbe470089	Feat: Display error messages from intermediate nodes of the webhook. #10427 (#11954 ) ### What problem does this PR solve? Feat: Remove HMAC from the webhook #10427 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-19 12:56:56 +08:00
Yongteng Lei	6cd1824a77	Feat: chats completions API supports metadata filtering (#12023 ) ### What problem does this PR solve? Chats completions API supports metadata filtering. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-19 11:36:35 +08:00
Yongteng Lei	2844700dc4	Refa: better UX for adding OCR model (#12034 ) ### What problem does this PR solve? Better UX for adding OCR model. ### Type of change - [x] Refactoring	2025-12-19 11:34:21 +08:00
Magicbook1108	f8fd1ea7e1	Feat: Further update Bedrock model configs (#12029 ) ### What problem does this PR solve? Feat: Further update Bedrock model configs #12020 #12008 <img width="700" alt="2b4f0f7fab803a2a2d5f345c756a2c69" src="https://github.com/user-attachments/assets/e1b9eaad-5c60-47bd-a6f4-88a104ce0c63" /> <img width="700" alt="afe88ec3c58f745f85c5c507b040c250" src="https://github.com/user-attachments/assets/9de39745-395d-4145-930b-96eb452ad6ef" /> <img width="700" alt="1a21bb2b7cd8003dce1e5207f27efc69" src="https://github.com/user-attachments/assets/ddba1682-6654-4954-aa71-41b8ebc04ac0" /> ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-19 11:32:20 +08:00
buua436	57edc215d7	Feat:update webhook component (#11739 ) ### What problem does this PR solve? issue: https://github.com/infiniflow/ragflow/issues/10427 https://github.com/infiniflow/ragflow/issues/8115 change: - Support for Multiple HTTP Methods (POST / GET / PUT / PATCH / DELETE / HEAD) - Security Validation 1. max_body_size 2. IP whitelist 3. rate limit 4. token / basic / jwt authentication - File Upload Support - Unified Content-Type Handling - Full Schema-Based Extraction & Type Validation - Two Execution Modes: Immediately / Streaming ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-18 19:34:39 +08:00
Jonah Hartmann	7a4044b05f	Feat: use filepath for files with the same name for all data source types (#11819 ) ### What problem does this PR solve? When there are multiple files with the same name the file would just duplicate, making it hard to distinguish between the different files. Now if there are multiple files with the same name, they will be named after their folder path in the storage unit. This was done for the webdav connector and with this PR also for Notion, Confluence and S3 Storage. ### Type of change - [x] New Feature (non-breaking change which adds functionality) Contribution by RAGcon GmbH, visit us [here](https://www.ragcon.ai/)	2025-12-18 17:42:43 +08:00
Magicbook1108	e84d5412bc	Feat: bedrock iam authentication (#12020 ) ### What problem does this PR solve? Feat: bedrock iam authentication #12008 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-18 17:13:09 +08:00
Yongteng Lei	151480dc85	Feat: trace information can be returned by the agent completion API (#12019 ) ### What problem does this PR solve? Trace information can be returned by the agent completion API. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-18 15:52:11 +08:00
Magicbook1108	2331b3a270	Refact: Update loggings (#12014 ) ### What problem does this PR solve? Refact: Update loggings ### Type of change - [x] Refactoring	2025-12-18 14:18:03 +08:00
Magicbook1108	5cd1a678c8	Fix: image edit in edit_chunk (#12009 ) ### What problem does this PR solve? Fix: image edit in edit_chunk #11971 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-12-18 11:35:01 +08:00
Jin Hai	cc9546b761	Fix IDE warnings (#12010 ) ### What problem does this PR solve? As title ### Type of change - [x] Refactoring Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-12-18 11:27:02 +08:00
Stephen Hu	a63dcfed6f	Refactor: improve cohere calculate total counts (#12007 ) ### What problem does this PR solve? improve cohere calculate total counts ### Type of change - [x] Refactoring	2025-12-18 10:04:28 +08:00
concertdictate	4dd8cdc38b	task executor issues (#12006 ) ### What problem does this PR solve? Fixes #8706 - `InfinityException: TOO_MANY_CONNECTIONS` when running multiple task executor workers ### Problem Description When running RAGFlow with 8-16 task executor workers, most workers fail to start properly. Checking logs revealed that workers were stuck/hanging during Infinity connection initialization - only 1-2 workers would successfully register in Redis while the rest remained blocked. ### Root Cause The Infinity SDK `ConnectionPool` pre-allocates all connections in `__init__`. With the default `max_size=32` and multiple workers (e.g., 16), this creates 16×32=512 connections immediately on startup, exceeding Infinity's default 128 connection limit. Workers hang while waiting for connections that can never be established. ### Changes 1. Prevent Infinity connection storm (`rag/utils/infinity_conn.py`, `rag/svr/task_executor.py`) - Reduced ConnectionPool `max_size` from 32 to 4 (sufficient since operations are synchronous) - Added staggered startup delay (2s per worker) to spread connection initialization 2. Handle None children_delimiter (`rag/app/naive.py`) - Use `or ""` to handle explicitly set None values from parser config 3. MinerU parser robustness (`deepdoc/parser/mineru_parser.py`) - Use `.get()` for optional output fields that may be missing - Fix DISCARDED block handling: change `pass` to `continue` to skip discarded blocks entirely ### Why `max_size=4` is sufficient \| Workers \| Pool Size \| Total Connections \| Infinity Limit \| \|---------\|-----------\|-------------------\|----------------\| \| 16 \| 32 \| 512 \| 128 ❌ \| \| 16 \| 4 \| 64 \| 128 ✅ \| \| 32 \| 4 \| 128 \| 128 ✅ \| - All RAGFlow operations are synchronous: `get_conn()` → operation → `release_conn()` - No parallel `docStoreConn` operations in the codebase - Maximum 1-2 concurrent connections needed per worker; 4 provides safety margin ### MinerU DISCARDED block bug When MinerU returns blocks with `type: "discarded"` (headers, footers, watermarks, page numbers, artifacts), the previous code used `pass` which left the `section` variable undefined, causing: - UnboundLocalError if DISCARDED is the first block - Duplicate content if DISCARDED follows another block (stale value from previous iteration) Root cause confirmed via MinerU source code: From [`mineru/utils/enum_class.py`](https://github.com/opendatalab/MinerU/blob/main/mineru/utils/enum_class.py#L14): ```python class BlockType: DISCARDED = 'discarded' # VLM 2.5+ also has: HEADER, FOOTER, PAGE_NUMBER, ASIDE_TEXT, PAGE_FOOTNOTE ``` Per [MinerU documentation](https://opendatalab.github.io/MinerU/reference/output_files/), discarded blocks contain content that should be filtered out for clean text extraction. Fix: Changed `pass` to `continue` to skip discarded blocks entirely. ### Testing - Verified all 16 workers now register successfully in Redis - All workers heartbeating correctly - Document parsing works as expected - MinerU parsing with DISCARDED blocks no longer crashes ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: user210 <user210@rt>	2025-12-18 10:03:30 +08:00
Stephen Hu	1a4822d6be	Refactor: Improve the timestamp consistency (#11942 ) ### What problem does this PR solve? Improve the timestamp consistency ### Type of change - [x] Refactoring	2025-12-18 09:40:33 +08:00
Jimmy Ben Klieve	ce161f09cc	feat: add image uploader in edit chunk dialog (#12003 ) ### What problem does this PR solve? Add image uploader in edit chunk dialog for replacing image chunk ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-18 09:33:52 +08:00
Yongteng Lei	672958a192	Fix: model not authorized (#12001 ) ### What problem does this PR solve? Fix model not authorized. #11973. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-12-17 19:48:24 +08:00
Yongteng Lei	3820de916c	Fix: duplicated PDF parser (#12000 ) ### What problem does this PR solve? Fix duplicated PDF parser. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-12-17 19:48:10 +08:00
Jin Hai	ef44979b5c	Fix table format warning in Markdown file (#12002 ) ### What problem does this PR solve? As title ### Type of change - [x] Documentation Update - [x] Refactoring Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-12-17 19:27:47 +08:00
Jin Hai	d38f8a1562	Add license and Fix IDE warnings (#11985 ) ### What problem does this PR solve? - Add license - Fix IDE warnings ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com> Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-12-17 17:04:44 +08:00
Kevin Hu	8e4d011b15	Fix: parent-children chunking method. (#11997 ) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality)	2025-12-17 16:50:36 +08:00
Magicbook1108	7baa67dfe8	Feat: Reject default admin account log in to normal services (#11994 ) ### What problem does this PR solve? Feat: Reject default admin account log in to normal services #11854 #11673 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-17 16:29:20 +08:00
Jimmy Ben Klieve	e58271ef76	feat: add toc option in transformer node in ingestion pipeline (#11992 ) ### What problem does this PR solve? Add TOC (Table of contents) option in Ingestion Pipeline canvas > Transformer node ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-17 15:51:55 +08:00
Magicbook1108	4fd4a41e7c	Fix: add multimodal models in chat api (#11986 ) …tant, but model is available via UI Fix: add multimodal models in chat api Fixes #8549 ### What problem does this PR solve? Add a parameter model_type in chat api. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>	2025-12-17 15:46:43 +08:00
Magicbook1108	82d4e5fb87	Ref: update loggings (#11987 ) ### What problem does this PR solve? Ref: update loggings ### Type of change - [x] Refactoring	2025-12-17 15:43:25 +08:00
chanx	d16643a53d	Fix: Fixed the issue of empty memory parameters (#11988 ) ### What problem does this PR solve? Fix: Fixed the issue of empty memory parameters ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-12-17 15:42:29 +08:00
Magicbook1108	93ca1e0b91	Fix: update document api sample response is out of date. (#11989 ) ### What problem does this PR solve? Fix: update document api sample response is out of date. ### Type of change - [x] Documentation Update	2025-12-17 15:39:12 +08:00
Jimmy Ben Klieve	4046bffaf1	fix: unable to save ingestion pipeline config without modifying children delimiter (#11991 ) ### What problem does this PR solve? Fix the issue of unable to save Files > Ingestion Pipeline (Modal) config without modifying children delimiter ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-12-17 15:37:28 +08:00
Yongteng Lei	03f9be7cbb	Refa: only support MinerU-API now (#11977 ) ### What problem does this PR solve? Only support MinerU-API now, still need to complete frontend for pipeline to allow the configuration of MinerU options. ### Type of change - [x] Refactoring	2025-12-17 12:58:48 +08:00
Jin Hai	5e05f43c3d	Update default prompt (#11984 ) ### What problem does this PR solve? New default prompt: ``` You are an intelligent assistant. Your primary function is to answer questions based strictly on the provided knowledge base. Essential Rules: - Your answer must be derived solely from this knowledge base: `{knowledge}`. - When information is available: Summarize the content to give a detailed answer. - When information is unavailable: Your response must contain this exact sentence: "The answer you are looking for is not found in the knowledge base!" - Always consider the entire conversation history. ``` Also fix some grammar errors. ### Type of change - [x] Refactoring Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-12-17 12:57:24 +08:00
chanx	205a6483f5	Feature: memory function complete (#11982 ) ### What problem does this PR solve? memory function complete ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-17 12:35:26 +08:00
Jimmy Ben Klieve	2595644dfd	feat: add ingestion pipeline children delimiters configs (#11979 ) ### What problem does this PR solve? Add children delimiters for Ingestion pipeline config ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-17 11:18:54 +08:00
Jin Hai	30019dab9f	Change knowledge base to dataset (#11976 ) ### What problem does this PR solve? As title ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-12-17 10:03:33 +08:00
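
Commit 74adf3d59c describes logging only `scheme://netloc/path`, with the query string and any embedded userinfo dropped. A minimal standard-library sketch of that idea is shown below. `redact_url_for_log` is a hypothetical name used for illustration; it is not the actual `_redact_sensitive_url_params` helper in `common/http_client.py`, whose body is not shown in this log.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_url_for_log(url: str) -> str:
    """Return scheme://host[:port]/path, dropping userinfo, query and fragment.

    Hypothetical helper for illustration only; not RAGFlow's implementation.
    """
    parts = urlsplit(url)
    # `hostname` excludes any "user:password@" prefix found in `netloc`.
    host = parts.hostname or ""
    if ":" in host:
        # IPv6 literals lose their brackets in `hostname`; restore them.
        host = f"[{host}]"
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    # Empty strings for query and fragment drop them from the result.
    return urlunsplit((parts.scheme, host, parts.path, "", ""))
```

Note that the path is kept as-is in this sketch, so a secret embedded in a path segment would still reach the log; a stricter variant could log only the scheme and host.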
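
The metadata filter semantics described in commit 3ee47e4af7 (OR within the same field, AND across different fields) can be sketched as follows. The function names and the plain-dict document representation are assumptions made for this example; the commit message does not show the real implementation.

```python
def _as_set(value):
    """Normalize a scalar or a list of metadata values into a set."""
    if isinstance(value, (list, tuple, set)):
        return set(value)
    return {value}


def matches(doc_meta, query):
    """AND across fields; OR among the values requested for one field."""
    for field, wanted in query.items():
        if field not in doc_meta:
            return False
        # A field matches if the document has at least one requested value.
        if not (_as_set(doc_meta[field]) & _as_set(wanted)):
            return False
    return True


def filter_docs(docs, query):
    """Return the names of documents whose metadata satisfies the query."""
    return [name for name, meta in docs.items() if matches(meta, query)]


# Example 1 from the commit message: multi-field AND yields no match.
docs1 = {
    "Doc1": {"a": "b", "as": ["a", "b", "c"]},
    "Doc2": {"a": "x", "as": ["d"]},
}
print(filter_docs(docs1, {"a": ["b"], "as": ["d"]}))  # []

# Example 2 from the commit message: same-field OR matches both documents.
docs2 = {"Doc1": {"as": ["a", "b", "c"]}, "Doc2": {"as": ["d"]}}
print(filter_docs(docs2, {"as": ["a", "d"]}))  # ['Doc1', 'Doc2']
```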
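
The connection arithmetic in commit 4dd8cdc38b (workers × pool size compared against Infinity's default limit of 128) can be checked with a few lines. The helper names are hypothetical, and the linear delay formula is only one plausible reading of "2s per worker"; the commit message does not give the exact expression.

```python
INFINITY_CONNECTION_LIMIT = 128  # default limit quoted in the commit message


def total_connections(workers: int, pool_size: int) -> int:
    """Connections opened at startup when each worker pre-allocates its pool."""
    return workers * pool_size


def fits_limit(workers: int, pool_size: int,
               limit: int = INFINITY_CONNECTION_LIMIT) -> bool:
    """True if the combined pools stay within the server's connection limit."""
    return total_connections(workers, pool_size) <= limit


def startup_delay(worker_index: int, step_seconds: float = 2.0) -> float:
    """Assumed stagger: worker 0 starts at once, worker n waits n * step."""
    return worker_index * step_seconds


# Rows of the table in the commit message.
for workers, pool_size in [(16, 32), (16, 4), (32, 4)]:
    print(workers, pool_size, total_connections(workers, pool_size),
          fits_limit(workers, pool_size))
```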
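
The `pass` versus `continue` bug described in commit 4dd8cdc38b can be reproduced in isolation. The loop below is a simplified stand-in, not the code in `deepdoc/parser/mineru_parser.py`; the block shape (`type` and `text` keys) and the function names are assumptions for the example.

```python
def collect_sections_buggy(blocks):
    """Mimics the old behavior: `pass` falls through to the append below."""
    sections = []
    for block in blocks:
        if block.get("type") == "discarded":
            pass  # `section` keeps its stale value, or is still unassigned
        else:
            section = block.get("text", "")
        sections.append(section)
    return sections


def collect_sections_fixed(blocks):
    """Mimics the fix: `continue` skips discarded blocks entirely."""
    sections = []
    for block in blocks:
        if block.get("type") == "discarded":
            continue
        section = block.get("text", "")
        sections.append(section)
    return sections


blocks = [{"type": "text", "text": "A"}, {"type": "discarded"}]
print(collect_sections_buggy(blocks))  # ['A', 'A'] (duplicate content)
print(collect_sections_fixed(blocks))  # ['A']
```

Calling `collect_sections_buggy([{"type": "discarded"}])` raises `UnboundLocalError`, which matches the first failure mode listed in the commit message.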


4777 Commits