ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2025-12-24 15:36:50 +08:00

Author	SHA1	Message	Date
balibabu	a6bd765a02	Feat: Flatten the request schema of the webhook #10427 (#11917 ) ### What problem does this PR solve? Feat: Flatten the request schema of the webhook #10427 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-12 09:59:54 +08:00
Andrea Bugeja	74afb8d710	feat: Add Single Bucket Mode for MinIO/S3 (#11416 ) ## Overview This PR adds support for Single Bucket Mode in RAGFlow, allowing users to configure MinIO/S3 to use a single bucket with a directory structure instead of creating multiple buckets per Knowledge Base and user folder. ## Problem Statement The current implementation creates one bucket per Knowledge Base and one bucket per user folder, which can be problematic when: - Cloud providers charge per bucket - IAM policies restrict bucket creation - Organizations want centralized data management in a single bucket ## Solution Added a `prefix_path` configuration option to the MinIO connector that enables: - Using a single bucket with directory-based organization - Backward compatibility with existing multi-bucket deployments - Support for MinIO, AWS S3, and other S3-compatible storage backends ## Changes - `rag/utils/minio_conn.py`: Enhanced MinIO connector to support single bucket mode with prefix paths - `conf/service_conf.yaml`: Added new configuration options (`bucket` and `prefix_path`) - `docker/service_conf.yaml.template`: Updated template with single bucket configuration examples - `docker/.env.single-bucket-example`: Added example environment variables for single bucket setup - `docs/single-bucket-mode.md`: Comprehensive documentation covering usage, migration, and troubleshooting ## Configuration Example ```yaml minio: user: "access-key" password: "secret-key" host: "minio.example.com:443" bucket: "ragflow-bucket" # Single bucket name prefix_path: "ragflow" # Optional prefix path ``` ## Backward Compatibility ✅ Fully backward compatible - existing deployments continue to work without any changes - If `bucket` is not configured, uses default multi-bucket behavior - If `bucket` is configured without `prefix_path`, uses bucket root - If both are configured, uses `bucket/prefix_path/` structure ## Testing - Tested with MinIO (local and cloud) - Verified backward compatibility with existing multi-bucket mode - Validated IAM policy restrictions work correctly ## Documentation Included comprehensive documentation in `docs/single-bucket-mode.md` covering: - Configuration examples - Migration guide from multi-bucket to single-bucket mode - IAM policy examples - Troubleshooting guide --- Related Issue: Addresses use cases where bucket creation is restricted or costly	2025-12-11 19:22:47 +08:00
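The backward-compatibility rules listed in 74afb8d710 can be sketched as a small path-mapping function. This is only an illustration, not the code in `rag/utils/minio_conn.py`: the function name `resolve_location` is hypothetical, and placing the former per-Knowledge-Base bucket name as a directory under the prefix is an assumption based on the phrase "directory structure" in the commit message.

```python
def resolve_location(kb_bucket, filename, single_bucket=None, prefix_path=None):
    """Map a logical (bucket, filename) pair to a physical (bucket, object key).

    Hypothetical helper; the directory layout is an assumption, not RAGFlow's code.
    """
    if not single_bucket:
        # `bucket` not configured: default multi-bucket behavior, nothing changes.
        return kb_bucket, filename
    # `bucket` configured: everything lives in one bucket. The optional prefix
    # comes first, then the former bucket name as a directory, then the file.
    parts = [p.strip("/") for p in (prefix_path, kb_bucket, filename) if p]
    return single_bucket, "/".join(parts)


if __name__ == "__main__":
    print(resolve_location("kb1", "doc.pdf"))
    print(resolve_location("kb1", "doc.pdf", single_bucket="ragflow-bucket"))
    print(resolve_location("kb1", "doc.pdf", "ragflow-bucket", "ragflow"))
```

Keeping the mapping in one function means the three documented cases (no `bucket`, `bucket` only, `bucket` plus `prefix_path`) stay easy to test in isolation.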
Kevin Hu	ea4a5cd665	Fix: tokenizer issue. (#11902 ) #11786 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-12-11 17:38:17 +08:00
balibabu	22a51a3868	Feat: Add mineru as a model manufacturer to the system. #10621 (#11903 ) ### What problem does this PR solve? Feat: Add mineru as a model manufacturer to the system. #10621 ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: balibabu <assassin_cike@163.com>	2025-12-11 17:37:10 +08:00
Yongteng Lei	e9710b7aa9	Refa: treat MinerU as an OCR model 2 (#11905 ) ### What problem does this PR solve? Treat MinerU as an OCR model 2. #11903 ### Type of change - [x] Refactoring	2025-12-11 17:33:12 +08:00
TeslaZY	bd0eff2954	Add DeepseekV3.2 of Tongyi-Qianwen and remove unused code (#11898 ) ### What problem does this PR solve? Add DeepseekV3.2 of Tongyi-Qianwen and remove unused code ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-11 13:55:01 +08:00
buua436	e3cfe8e848	Fix:async issue and sensitive logging (#11895 ) ### What problem does this PR solve? change: async issue and sensitive logging ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-12-11 13:54:47 +08:00
TeslaZY	c610bb605a	Added semi-automatic mode to the metadata filter (#11886 ) ### What problem does this PR solve? Retrieval metadata filtering adds semi-automatic mode, and users can manually check the metadata key that participates in LLM to generate filter conditions. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-11 10:45:21 +08:00
David López Carrascal	a6afb7dfe2	Fix data_sync startup crash by properly invoking async main (#11879 ) ### What problem does this PR solve? This PR fixes a startup crash in the data_sync_0 service caused by an incorrect asyncio.run call. The main coroutine was being passed as a function reference instead of being invoked, which raised: `ValueError: a coroutine was expected, got <function main ...>` What I changed - Updated the entrypoint in sync_data_source.py to correctly invoke the coroutine with `asyncio.run(main())`. Testing - Not tested. Related Issue Fixes https://github.com/infiniflow/ragflow/issues/11878 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-12-11 10:09:16 +08:00
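The failure mode described in a6afb7dfe2 can be reproduced in a few lines. The sketch below is a stand-alone illustration, not the contents of sync_data_source.py; `main`, `run_wrong`, and `run_right` are hypothetical names. `TypeError` is caught alongside `ValueError` as a precaution, on the assumption that the exception type may differ between Python versions.

```python
import asyncio


async def main():
    # Stand-in for the service entrypoint coroutine.
    await asyncio.sleep(0)
    return "started"


def run_wrong():
    """Pass the function object itself; asyncio.run rejects it."""
    try:
        asyncio.run(main)  # bug: `main` is not called, so no coroutine exists
    except (ValueError, TypeError) as exc:
        return str(exc)
    return None


def run_right():
    """Call main() to create the coroutine object, then run it."""
    return asyncio.run(main())


if __name__ == "__main__":
    print("wrong:", run_wrong())
    print("right:", run_right())
```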
TeslaZY	7b96113d4c	MinerU supports for the new backend vlm-mlx-engine (#11864 ) ### What problem does this PR solve? The new MinerU version supports the new backend vlm-mlx-engine, see https://github.com/opendatalab/MinerU. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-11 09:59:38 +08:00
Yongteng Lei	8370bc61b7	Feat: enhance metadata operation (#11874 ) ### What problem does this PR solve? Add metadata condition in document list. Add metadata bulk update. Add metadata summary. ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Documentation Update	2025-12-11 09:59:15 +08:00
N0bodycan	74eb894453	Fix `RuntimeError: asyncio.run() cannot be called from a running event loop` when calling mindmap endpoint. (#11880 ) ### What problem does this PR solve? Fix RuntimeError when calling mindmap endpoint by converting `gen_mindmap()` to async function and using `await` instead of `asyncio.run()`. ### Type of change - [ ] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-12-11 09:47:44 +08:00
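The error fixed in 74eb894453 arises whenever `asyncio.run()` is called from code that is already running inside an event loop, as an async request handler is. A minimal sketch follows, with `gen_mindmap_demo` and both handlers as hypothetical stand-ins rather than RAGFlow's actual functions:

```python
import asyncio


async def gen_mindmap_demo():
    # Hypothetical stand-in for an async mindmap generator.
    await asyncio.sleep(0)
    return {"root": "topic"}


async def handler_wrong():
    """Starting a second event loop from inside a running one fails."""
    coro = gen_mindmap_demo()
    try:
        return asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # the coroutine never ran; close it to avoid a warning
        return str(exc)


async def handler_right():
    """Inside a running loop, simply await the coroutine."""
    return await gen_mindmap_demo()


if __name__ == "__main__":
    print(asyncio.run(handler_wrong()))
    print(asyncio.run(handler_right()))
```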
balibabu	34d29d7e8b	Feat: Add configuration for webhook to the begin node. #10427 (#11875 ) ### What problem does this PR solve? Feat: Add configuration for webhook to the begin node. #10427 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-10 19:13:57 +08:00
He Wang	badf33e3b9	feat: enhance OBConnection.search (#11876 ) ### What problem does this PR solve? Enhance OBConnection.search for better performance. Main changes: 1. Use string type of vector array in distance func for better parsing performance. 2. Manually set max_connections as pool size instead of using default value. 3. Set 'fulltext_search_columns' when starting. 4. Cache the results of the table existence check (we will never drop the table). 5. Remove unused 'group_results' logic. 6. Add the `USE_FULLTEXT_FIRST_FUSION_SEARCH` flag, and the corresponding fusion search SQL when it's false. ### Type of change - [x] Performance Improvement	2025-12-10 19:13:37 +08:00
buua436	3cb72377d7	Refa:remove sensitive information (#11873 ) ### What problem does this PR solve? change: remove sensitive information ### Type of change - [x] Refactoring	2025-12-10 19:08:45 +08:00
buua436	ab4b62031f	Fix:csv parse in Table (#11870 ) ### What problem does this PR solve? change: csv parse in Table ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-12-10 16:44:06 +08:00
chanx	80f3ccf1ac	Fix:Modify the name of the Overlapped percent field (#11866 ) ### What problem does this PR solve? Fix:Modify the name of the Overlapped percent field ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-12-10 13:38:24 +08:00
Lynn	a1164b9c89	Feat/memory (#11812 ) ### What problem does this PR solve? Manage and display memory datasets. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-10 13:34:08 +08:00
Russell Valentine	fd7e55b23d	executor_manager updated docker version (#11806 ) ### What problem does this PR solve? The Docker version (24.0.7) installed in the executor manager image is incompatible with the latest stable Docker (29.1.3). The minimum API version that v29.1.3 accepts is 1.44, but 24.0.7 uses API version 1.43. ### Type of change - [X] Other (please describe): This could break things for people who still have an old docker installed on their system. A better approach could be a setting to share	2025-12-10 11:08:11 +08:00
Zhichang Yu	f128a1fa9e	Bump python to >=3.12 (#11846 ) ### What problem does this PR solve? Bump python to >=3.12 ### Type of change - [x] Refactoring	2025-12-09 19:55:25 +08:00
buua436	65a5a56d95	Refa:replace trio with asyncio (#11831 ) ### What problem does this PR solve? change: replace trio with asyncio ### Type of change - [x] Refactoring	2025-12-09 19:23:14 +08:00
Magicbook1108	ca2d6f3301	Fix: duplicate output by async_chat_streamly (#11842 ) ### What problem does this PR solve? Fix: duplicate output by async_chat_streamly Refact: revert manual modification ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Refactoring --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-12-09 19:21:52 +08:00
Yongteng Lei	a94b3b9df2	Refa: treat MinerU as an OCR model (#11849 ) ### What problem does this PR solve? Treat MinerU as an OCR model. ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Refactoring	2025-12-09 18:54:14 +08:00
balibabu	30377319d8	Fix: The variables in the message node are not displaying correctly. #11839 (#11841 ) ### What problem does this PR solve? Fix: The variables in the message node are not displaying correctly. #11839 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-12-09 17:59:49 +08:00
PentaFDevs	07dca37ef0	feat: add Italian language translation support (#11844 ) ### What problem does this PR solve? - Add complete Italian translation file with all UI sections - Register Italian in LanguageAbbreviation enum and language maps - Configure Italian translation in i18n config - Add Italiano to language selector dropdown ### Type of change - [x] Other (please describe): ## What Added complete Italian language translation support to RAGFlow ## Changes - Added comprehensive Italian translation file ([it.ts](ragflow/web/src/locales/it.ts:0:0-0:0)) with all UI sections (1239 lines) - Registered Italian in `LanguageAbbreviation` enum and all language maps - Configured Italian translation in i18n configuration - Added "Italiano" to language selector dropdown ## Impact - Italian users can now use RAGFlow in their native language - All major UI components are translated including: - Login/registration screens - Knowledge base management - Chat interface - Settings and configuration - Admin console - Error messages and notifications ## Testing - Verified all translation keys are present - Confirmed language selector shows "Italiano" correctly - Tested that no translation keys are missing - All UI sections properly translated Co-authored-by: PentaFrame <info@pentaframe.it>	2025-12-09 17:59:21 +08:00
changkeke	036b29f084	Docs: Enhance API reference for file management (#11827 ) ### What problem does this PR solve? The SDK documentation is lacking in file management sections. ### Type of change - [x] Documentation Update	2025-12-09 17:30:53 +08:00
N0bodycan	9863862348	fix: prevent redundant retries in async_chat_streamly upon success (#11832 ) ## What changes were proposed in this pull request? Added a return statement after the successful completion of the async for loop in async_chat_streamly. ## Why are the changes needed? Previously, the code lacked a break/return mechanism inside the try block. This caused the retry loop (for attempt in range...) to continue executing even after the LLM response was successfully generated and yielded, resulting in duplicate requests (up to max_retries times). ## Does this PR introduce any user-facing change? No (it fixes an internal logic bug).	2025-12-09 17:14:30 +08:00
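The bug described in 9863862348 is a general pitfall of retry loops around streaming generators: without an explicit exit after success, the loop issues the request again. The following is a minimal sketch under assumed names (`fake_stream`, `chat_streamly`, and the `stats` counter are all hypothetical), not the real `async_chat_streamly`:

```python
import asyncio

stats = {"requests": 0}  # counts how many times the fake backend is called


async def fake_stream():
    # Hypothetical stand-in for a streaming LLM call.
    stats["requests"] += 1
    for token in ("Hel", "lo"):
        yield token


async def chat_streamly(max_retries=3):
    for attempt in range(max_retries):
        try:
            async for token in fake_stream():
                yield token
            return  # success: leave the retry loop instead of re-requesting
        except ConnectionError:
            await asyncio.sleep(0)  # placeholder for a real backoff delay


async def collect():
    return [token async for token in chat_streamly()]


if __name__ == "__main__":
    print(asyncio.run(collect()), stats)
```

Removing the `return` line makes the loop call `fake_stream()` once per attempt and yield the tokens `max_retries` times, which is the duplicate-output symptom the commit describes.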
Zhichang Yu	bb6022477e	Bump infinity to v0.6.11. Requires python>=3.11 (#11814 ) ### What problem does this PR solve? Bump infinity to v0.6.11. Requires python>=3.11 ### Type of change - [x] Refactoring	2025-12-09 16:23:37 +08:00
chanx	28bc87c5e2	Feature: Memory interface integration testing (#11833 ) ### What problem does this PR solve? Feature: Memory interface integration testing ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-09 14:52:58 +08:00
Yongteng Lei	c51e6b2a58	Refa: migrate CV model chat to Async (#11828 ) ### What problem does this PR solve? Migrate CV model chat to Async. #11750 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Refactoring	2025-12-09 13:08:37 +08:00
Stephen Hu	481192300d	Fix:[ERROR][Exception]: list index out of range (#11826 ) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/11821 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-12-09 09:58:34 +08:00
sjIlll	1777620ea5	fix: set default embedding model for TEI profile in Docker deployment (#11824 ) ## What's changed fix: unify embedding model fallback logic for both TEI and non-TEI Docker deployments > This fix targets Docker / `docker-compose` deployments, ensuring a valid default embedding model is always set, regardless of the compose profile used. ## Changes \| Scenario \| New Behavior \| \|--------\|--------------\| \| Non-`tei-` profile (e.g., default deployment) \| `EMBEDDING_MDL` is now correctly initialized from `EMBEDDING_CFG` (derived from `user_default_llm`), ensuring custom defaults like `bge-m3@Ollama` are properly applied to new tenants. \| \| `tei-` profile (`COMPOSE_PROFILES` contains `tei-`) \| Still respects the `TEI_MODEL` environment variable. If unset, falls back to `EMBEDDING_CFG`. Only when both are empty does it use the built-in default (`BAAI/bge-small-en-v1.5`), preventing an empty embedding model. \| ## Why This Change? - In non-TEI mode: The previous logic would reset `EMBEDDING_MDL` to an empty string, causing pre-configured defaults (e.g., `bge-m3@Ollama` in the Docker image) to be ignored, leading to tenant initialization failures or silent misconfigurations. - In TEI mode: Users need the ability to override the model via `TEI_MODEL`, but without a safe fallback, missing configuration could break the system. The new logic adopts a “config-first, env-var-override” strategy for robustness in containerized environments. ## Implementation - Updated the assignment logic for `EMBEDDING_MDL` in `rag/common/settings.py` to follow a unified fallback chain: EMBEDDING_CFG → TEI_MODEL (if tei- profile active) → built-in default ## Testing Verified in Docker deployments: 1. `COMPOSE_PROFILES=` (no TEI) → New tenants get `bge-m3@Ollama` as the default embedding model 2. `COMPOSE_PROFILES=tei-gpu` with no `TEI_MODEL` set → Falls back to `BAAI/bge-small-en-v1.5` 3. `COMPOSE_PROFILES=tei-gpu` with `TEI_MODEL=my-model` → New tenants use `my-model` as the embedding model Closes #8916 fix #11522 fix #11306	2025-12-09 09:38:44 +08:00
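The fallback behavior in the Changes table of 1777620ea5 can be sketched as a pure function. This is an illustration only and not the code in `rag/common/settings.py`: the function name `pick_embedding_model` is hypothetical, and applying the built-in default in the non-TEI branch as well is an assumption drawn from the statement that a valid default is always set.

```python
BUILTIN_DEFAULT = "BAAI/bge-small-en-v1.5"  # default named in the commit message


def pick_embedding_model(compose_profiles, tei_model, embedding_cfg):
    """Choose the default embedding model for new tenants (hypothetical helper)."""
    if "tei-" in (compose_profiles or ""):
        # TEI profile: TEI_MODEL overrides, then the configured default,
        # then the built-in default so the result is never empty.
        return tei_model or embedding_cfg or BUILTIN_DEFAULT
    # Non-TEI profile: use the configured default instead of resetting it.
    # Falling back to the built-in default here is an assumption.
    return embedding_cfg or BUILTIN_DEFAULT


if __name__ == "__main__":
    print(pick_embedding_model("", None, "bge-m3@Ollama"))
    print(pick_embedding_model("tei-gpu", None, ""))
    print(pick_embedding_model("tei-gpu", "my-model", "bge-m3@Ollama"))
```

The three calls in the usage section mirror the three Docker scenarios listed under Testing in the commit message.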
Levi	f3a03b06b2	fix: align http client proxy kwarg (#11818 ) ### What problem does this PR solve? Our HTTP wrapper still passed proxies to httpx.Client/AsyncClient, which expect proxy. As a result, configured proxies were ignored and calls could fail with ValueError("Failed to fetch OIDC metadata: Client.__init__() got an unexpected keyword argument 'proxies'"). This PR switches to the correct proxy kwarg so proxies are honored and the runtime error is resolved. ### Type of change - [X] Bug Fix (non-breaking change which fixes an issue) --- Contribution during my time at RAGcon GmbH.	2025-12-09 09:35:03 +08:00
buua436	dd046be976	Fix: parent-child chunking method (#11810 ) ### What problem does this PR solve? change: parent-child chunking method ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-12-09 09:34:01 +08:00
lin	5c9672a265	Fix: advanced_ingestion_pipeline (#11816 ) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Co-authored-by: kche0169 <shiratakekanpakuji@gmail.com>	2025-12-09 09:32:42 +08:00
Kevin Hu	09a3854ed8	Fix: chunk method error. (#11807 ) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-12-08 14:28:23 +08:00
Jin Hai	43f51baa96	Fix errors (#11804 ) ### What problem does this PR solve? 1. typos 2. grammar errors. ### Type of change - [x] Refactoring Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-12-08 12:21:18 +08:00
chanx	5a2011e687	Fix: Changed 'HightLightMarkdown' to 'HighLightMarkdown' (#11803 ) ### What problem does this PR solve? Fix: Changed 'HightLightMarkdown' to 'HighLightMarkdown', and replaced the private component with a public component. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-12-08 11:11:48 +08:00
Lynn	7dd9ce0b5f	Feat: default start admin (#11801 ) ### What problem does this PR solve? Default start admin when start with docker-compose.yml ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-12-08 11:11:27 +08:00
Stephen Hu	b66881a371	Refactor:book parser use with to handle bytesIO (#11800 ) ### What problem does this PR solve? book parser use with to handle bytesIO ### Type of change - [x] Refactoring	2025-12-08 10:18:46 +08:00
Rohit	4d7934061e	fix: Correct toast type import path in use-toast hook (#11791 ) This commit resolves an incorrect import path for `ToastProps` and `ToastActionElement` types within the `use-toast.tsx` hook. The current path, `@/registry/default/ui/toast`, does not reflect the actual file location in this repository. The import in `src/components/hooks/use-toast.tsx` has been updated from `@/registry/default/ui/toast` to the correct alias path: `@/components/ui/toast`. This ensures the types are resolved correctly and the codebase remains clean and functional.	2025-12-08 10:18:20 +08:00
chanx	660fa8888b	Features: Memory page rendering and other bug fixes (#11784 ) ### What problem does this PR solve? Features: Memory page rendering and other bug fixes - Rendering of the Memory list page - Rendering of the message list page in Memory - Fixed an issue where the empty state was incorrectly displayed when search criteria were applied - Added a web link for the API-Key - modifying the index_mode attribute of the Confluence data source. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality)	2025-12-08 10:17:56 +08:00
Mustafa Aldemir	3285f09c92	Add huggingface-hub dependency (#11794 ) ### What problem does this PR solve? When a script has a block like this at the top, then uv run download_deps.py ignores the [project].dependencies in pyproject.toml and only uses that dependencies = [...] list. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-12-08 09:50:03 +08:00
Yongteng Lei	51ec708c58	Refa: cleanup synchronous functions in chat_model and implement synchronization for conversation and dialog chats (#11779 ) ### What problem does this PR solve? Cleanup synchronous functions in chat_model and implement synchronization for conversation and dialog chats. ### Type of change - [x] Refactoring - [x] Performance Improvement	2025-12-08 09:43:03 +08:00
buua436	9b8971a9de	Fix:toc in pipeline (#11785 ) ### What problem does this PR solve? change: Fix toc in pipeline ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-12-08 09:42:20 +08:00
Jin Hai	6546f86b4e	Fix errors (#11795 ) ### What problem does this PR solve? - typos - IDE warnings ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-12-08 09:42:10 +08:00
天海蒼灆	8de6b97806	Feature (canvas): Add Api for download "message" component output's file (#11772 ) ### What problem does this PR solve? - Add an API for downloading the output file of the "message" component - Change the attachment output type check from tuple to dictionary, because 'attachement' is not an instance of tuple - Update the message type to message_end to avoid the problem that content does not send an error message when the message type is ans ["data"] ["content"] ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality)	2025-12-05 19:42:35 +08:00
TeslaZY	e4e0a88053	Feat: Fillup component return value not object (#11780 ) ### What problem does this PR solve? Fillup component return value not object ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-05 19:27:36 +08:00
少卿	7719fd6350	Fix MinerU API sanitized-output lookup and manual chunk tuple handling (#11702 ) ### What problem does this PR solve? This PR addresses two independent issues encountered when using the MinerU engine in Ragflow: 1. MinerU API output path mismatch for non-ASCII filenames MinerU sanitizes the root directory name inside the returned ZIP when the original filename contains non-ASCII characters (e.g., Chinese). Ragflow's client-side unzip logic assumed the original filename stem and therefore failed to locate `_content_list.json`. This PR adds: * root-directory detection * fallback lookup using sanitized names * a broadened `_read_output` search with a glob fallback ensuring output files are consistently located regardless of filename encoding. 2. Chunker crash due to tuple-structure mismatch in manual mode Some parsers (e.g., MinerU / Docling) return 2-tuple sections, but Ragflow’s chunker expects 3-tuple sections, leading to: `ValueError: not enough values to unpack (expected 3, got 2)` This PR normalizes all sections to a uniform structure `(text, layout, positions)`: * parse position tags when present * default to empty positions when missing preserving backward compatibility and preventing crashes. ### Type of change * [x] Bug Fix (non-breaking change which fixes an issue) [#11136](https://github.com/infiniflow/ragflow/issues/11136) [#11700](https://github.com/infiniflow/ragflow/issues/11700) [#11620](https://github.com/infiniflow/ragflow/issues/11620) [#11701](https://github.com/infiniflow/ragflow/pull/11701) we need your help [yongtenglei](https://github.com/yongtenglei) --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-12-05 19:25:45 +08:00
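The second fix in 7719fd6350, normalizing parser output to a uniform `(text, layout, positions)` shape, can be sketched as below. The helper name `normalize_section` is hypothetical, and parsing position tags out of the text is deliberately omitted because the tag format is not given in the commit message; missing positions simply default to an empty list.

```python
def normalize_section(section):
    """Return a (text, layout, positions) 3-tuple for any supported input shape.

    Hypothetical helper; position-tag parsing from the commit is not reproduced.
    """
    if isinstance(section, str):
        return (section, "", [])
    if len(section) == 2:
        # 2-tuple from parsers that do not report positions.
        text, layout = section
        return (text, layout, [])
    text, layout, positions = section[:3]
    return (text, layout, list(positions or []))


def normalize_sections(sections):
    return [normalize_section(s) for s in sections]


if __name__ == "__main__":
    mixed = [("Intro", "title"), ("Body text", "text", [(1, 0, 10, 0, 20)])]
    for text, layout, positions in normalize_sections(mixed):
        print(text, layout, positions)  # unpacking three values never fails
```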
Giles Lloyd	15ef6dd72f	fix(mcp-server): Ensure all document meta-data is cached (#11767 ) ### What problem does this PR solve? The document metadata cache is built using the list documents endpoint with default pagination parameters of page=1, page_size=30. This means when using the MCP server to search a dataset, only chunks which come from the first 30 documents in the dataset will have metadata returned. Issue described in more detail here https://github.com/infiniflow/ragflow/issues/11533 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Co-authored-by: Giles Lloyd <giles.af.lloyd@gmail.com>	2025-12-05 19:13:17 +08:00
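The problem described in 15ef6dd72f, building a cache from only the first page of a paginated endpoint, is usually solved by looping until a short page is returned. The sketch below assumes a hypothetical `list_page(page, page_size)` callable and an in-memory fake backend; it does not use the real RAGFlow or MCP server API.

```python
def list_all_documents(list_page, page_size=30):
    """Collect every document by walking pages until one comes back short."""
    documents, page = [], 1
    while True:
        batch = list_page(page=page, page_size=page_size)
        documents.extend(batch)
        if len(batch) < page_size:
            return documents  # last (possibly empty) page reached
        page += 1


def make_fake_endpoint(total):
    """Hypothetical in-memory stand-in for a paginated list endpoint."""
    all_docs = [{"id": i, "meta": {"n": i}} for i in range(total)]

    def list_page(page, page_size):
        start = (page - 1) * page_size
        return all_docs[start:start + page_size]

    return list_page


if __name__ == "__main__":
    docs = list_all_documents(make_fake_endpoint(75))
    cache = {d["id"]: d["meta"] for d in docs}
    print(len(cache))
```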


4698 Commits